Statistics > Machine Learning

arXiv:2506.04813 (stat)
[Submitted on 5 Jun 2025]

Title: Distributional encoding for Gaussian process regression with qualitative inputs

Authors: Sébastien Da Veiga (ENSAI, CREST, RT-UQ)
Abstract: Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a widely used method for optimizing black-box functions. However, when some or all input variables are categorical, building a predictive and computationally efficient GP remains challenging. Starting from the naive target encoding idea, where each original categorical value is replaced with the mean of the target variable for that category, we propose a generalization based on distributional encoding (DE), which uses all samples of the target variable for a category. To handle this type of encoding inside the GP, we build upon recent results on characteristic kernels for probability distributions based on the maximum mean discrepancy and the Wasserstein distance. We also discuss several extensions for classification, multi-task learning, and the incorporation of auxiliary information. Our approach is validated empirically, and we demonstrate state-of-the-art predictive performance on a variety of synthetic and real-world datasets. DE is naturally complementary to recent advances in BO over discrete and mixed spaces.
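To make the encoding concrete, here is a minimal sketch under assumptions, not the paper's exact construction; the function names `category_kernel`, the base-kernel lengthscale `ls`, and the scale parameter `theta` are hypothetical choices for illustration. Each category is encoded by the empirical distribution of the target values observed for it; two categories are compared via the biased empirical maximum mean discrepancy, MMD^2(P, Q) = E[k(X, X')] + E[k(Y, Y')] - 2 E[k(X, Y)]; and a Gaussian-type kernel exp(-MMD^2 / theta) over categories is formed, the kind of characteristic-kernel construction the abstract refers to, and a valid covariance that could be placed on the qualitative input of a GP.

```python
# Minimal sketch of distributional encoding (DE) with an MMD-based
# category kernel; names and hyperparameters are illustrative.
import numpy as np

def rbf(a, b, ls=1.0):
    """Gaussian base kernel between two 1-D sample vectors."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / ls ** 2)

def mmd2(x, y, ls=1.0):
    """Biased empirical squared MMD between sample sets x and y."""
    return rbf(x, x, ls).mean() + rbf(y, y, ls).mean() - 2.0 * rbf(x, y, ls).mean()

def category_kernel(samples, theta=1.0, ls=1.0):
    """Kernel over categories: k(c, c') = exp(-MMD^2(P_c, P_c') / theta).

    `samples` maps each category to the target values observed for it,
    i.e. its distributional encoding.
    """
    cats = list(samples)
    K = np.empty((len(cats), len(cats)))
    for i, ci in enumerate(cats):
        for j, cj in enumerate(cats):
            K[i, j] = np.exp(-mmd2(samples[ci], samples[cj], ls) / theta)
    return cats, K

# Toy usage: target samples observed under three categories.
rng = np.random.default_rng(0)
samples = {
    "A": rng.normal(0.0, 1.0, 30),
    "B": rng.normal(0.1, 1.0, 30),  # close to A in distribution
    "C": rng.normal(3.0, 0.5, 30),  # far from both A and B
}
cats, K = category_kernel(samples)
print(cats)
print(np.round(K, 3))
```

In the toy data, categories A and B have nearly identical target distributions and so receive a kernel value close to 1, while C is nearly decorrelated from both; a one-hot or ordinal encoding would treat all three pairs symmetrically, which is the limitation DE is designed to avoid.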
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2506.04813 [stat.ML]
  (or arXiv:2506.04813v1 [stat.ML] for this version)
  https://doi.org/10.48550/arXiv.2506.04813
arXiv-issued DOI via DataCite

Submission history

From: Sébastien Da Veiga
[v1] Thu, 5 Jun 2025 09:35:02 UTC (5,609 KB)