Ranking Vectors Clustering: Theory and Applications

Fattahi, Ali; Eshragh, Ali; Aslani, Babak; Rabiee, Meysam

arXiv:2507.12583 (cs)

[Submitted on 16 Jul 2025]

Authors: Ali Fattahi, Ali Eshragh, Babak Aslani, Meysam Rabiee

Abstract: We study the problem of clustering ranking vectors, where each vector represents preferences as an ordered list of distinct integers. Specifically, we focus on the k-centroids ranking vectors clustering problem (KRC), which aims to partition a set of ranking vectors into k clusters and identify the centroid of each cluster. Unlike classical k-means clustering (KMC), KRC constrains both the observations and centroids to be ranking vectors. We establish the NP-hardness of KRC and characterize its feasible set. For the single-cluster case, we derive a closed-form analytical solution for the optimal centroid, which can be computed in linear time. To address the computational challenges of KRC, we develop an efficient approximation algorithm, KRCA, which iteratively refines initial solutions from KMC, referred to as the baseline solution. Additionally, we introduce a branch-and-bound (BnB) algorithm for efficient cluster reconstruction within KRCA, leveraging a decision tree framework to reduce computational time while incorporating a controlling parameter to balance solution quality and efficiency. We establish theoretical error bounds for KRCA and BnB. Through extensive numerical experiments on synthetic and real-world datasets, we demonstrate that KRCA consistently outperforms baseline solutions, delivering significant improvements in solution quality with fast computational times. This work highlights the practical significance of KRC for personalization and large-scale decision making, offering methodological advancements and insights that can be built upon in future studies.
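
The single-cluster case mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper. It assumes that a ranking vector is a permutation of 1..m and that the objective is the sum of squared Euclidean distances, as in k-means. Under those assumptions the best ranking-vector centroid is obtained by ranking the coordinate sums, which follows from the rearrangement inequality. The function names (rank_centroid, total_cost, brute_force_cost) are hypothetical, and the sort used here costs O(m log m), so this sketch does not reproduce the linear-time construction the authors describe.

```python
from itertools import permutations


def rank_centroid(vectors):
    """Return a permutation of 1..m minimizing the summed squared Euclidean
    distance to the given ranking vectors (assumed permutations of 1..m)."""
    m = len(vectors[0])
    # Column sums order the coordinates the same way column means do.
    sums = [sum(v[j] for v in vectors) for j in range(m)]
    # Coordinates by increasing sum; sorted() is stable, so ties go to the lower index.
    order = sorted(range(m), key=lambda j: sums[j])
    centroid = [0] * m
    for rank, j in enumerate(order, start=1):
        centroid[j] = rank
    return centroid


def total_cost(vectors, c):
    """Sum of squared Euclidean distances from every vector to c."""
    return sum(sum((a - b) ** 2 for a, b in zip(v, c)) for v in vectors)


def brute_force_cost(vectors):
    """Smallest total cost over all m! candidate centroids (tiny m only)."""
    m = len(vectors[0])
    return min(total_cost(vectors, p) for p in permutations(range(1, m + 1)))


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
    c = rank_centroid(data)
    print(c, total_cost(data, c), brute_force_cost(data))
```

The brute-force helper is only a sanity check for small m. It confirms that ranking the coordinate sums reaches the same cost as an exhaustive search over all permutations.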

Subjects:	Machine Learning (cs.LG); Computational Complexity (cs.CC); Applications (stat.AP); Methodology (stat.ME)
Cite as:	arXiv:2507.12583 [cs.LG]
	(or arXiv:2507.12583v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.12583

Submission history

From: Ali Eshragh
[v1] Wed, 16 Jul 2025 19:00:09 UTC (4,510 KB)
