Understanding uncertainty in Bayesian cluster analysis

Balocchi, Cecilia; Wade, Sara


arXiv:2506.16295 (stat)

[Submitted on 19 June 2025]

Title: Understanding uncertainty in Bayesian cluster analysis

Authors:Cecilia Balocchi, Sara Wade

Abstract: The Bayesian approach to clustering is often appreciated for its ability to provide uncertainty in the partition structure. However, summarizing the posterior distribution over the clustering structure can be challenging, due to the discrete, unordered nature and massive dimension of the space. While recent advancements provide a single clustering estimate to represent the posterior, this ignores uncertainty and may even be unrepresentative in instances where the posterior is multimodal. To enhance our understanding of uncertainty, we propose a WASserstein Approximation for Bayesian clusterIng (WASABI), which summarizes the posterior samples with not one, but multiple clustering estimates, each corresponding to a different part of the space of partitions that receives substantial posterior mass. Specifically, we find such clustering estimates by approximating the posterior distribution in a Wasserstein distance sense, equipped with a suitable metric on the partition space. An interesting byproduct is that a locally optimal solution to this problem can be found using a k-medoids-like algorithm on the partition space to divide the posterior samples into different groups, each represented by one of the clustering estimates. Using both synthetic and real datasets, we show that our proposal helps to improve the understanding of uncertainty, particularly when the data clusters are not well separated or when the employed model is misspecified.
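The k-medoids-like step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Binder's pairwise-disagreement count as the metric on partitions (the abstract only says "a suitable metric"), uses a deterministic farthest-point initialization chosen here for reproducibility, and all function names (`binder_distance`, `k_medoids_partitions`) are hypothetical. Each posterior sample is a tuple of cluster labels, one per data point.

```python
from itertools import combinations


def binder_distance(a, b):
    """Count item pairs that are co-clustered in one partition but not the other.

    Assumption: Binder's distance stands in for the paper's metric.
    It is invariant to relabelling of the clusters.
    """
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def k_medoids_partitions(samples, k, n_iter=50):
    """Group posterior partition samples around k medoid partitions.

    Returns (medoid partitions, weights, assignment), where each weight is
    the share of samples assigned to that medoid.
    """
    n = len(samples)
    dist = [[binder_distance(s, t) for t in samples] for s in samples]

    # Illustrative initialization: start from the overall medoid, then
    # repeatedly add the sample farthest from the medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )

    def assign_all(meds):
        # Each sample goes to its nearest medoid (ties go to the first).
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(n_iter):
        assign = assign_all(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:  # keep the old medoid if a group is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its group.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][i] for i in members))
            )
        if new_medoids == medoids:  # converged to a local optimum
            break
        medoids = new_medoids

    assign = assign_all(medoids)
    weights = [assign.count(c) / n for c in range(k)]
    return [samples[m] for m in medoids], weights, assign


# Toy bimodal "posterior": 6 draws of one partition, 4 of another.
A = (0, 0, 0, 1, 1, 1)
B = (0, 0, 1, 1, 2, 2)
estimates, weights, groups = k_medoids_partitions([A] * 6 + [B] * 4, k=2)
```

In this toy case the two returned estimates are the two distinct partitions, and the weights are the fraction of draws in each group. A single point estimate would report only one of them.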

Subjects:	Computation (stat.CO)
Cite as:	arXiv:2506.16295 [stat.CO]
	(or arXiv:2506.16295v1 [stat.CO] for this version)
	https://doi.org/10.48550/arXiv.2506.16295

Submission history

From: Sara Wade
[v1] Thu, 19 Jun 2025 13:13:27 UTC (11,217 KB)
