Capacitated Fair-Range Clustering: Hardness and Approximation Algorithms

Gadekar, Ameet; Thejaswi, Suhas

计算机科学 > 数据结构与算法

arXiv:2505.15905v1 (cs)

[提交于 2025年5月21日 ]

标题：有容量的公平范围聚类：难解性与近似算法

标题： Capacitated Fair-Range Clustering: Hardness and Approximation Algorithms

Authors:Ameet Gadekar, Suhas Thejaswi

摘要：容量公平范围 $k$-聚类通过引入容量约束和人口统计公平性，推广了经典的 $k$-聚类。在此设置中，每个设施有一个容量限制，并可能属于一个或多个人口统计群体。任务是选择$k$个设施作为中心，并将每个客户分配到一个中心，使得：($a$) 没有一个中心超过其容量，($b$) 从每个组中选择的中心数量位于指定的下限和上限之间（公平范围约束），并且 ($c$) 聚类成本（例如，$k$-中位数或$k$-均值）最小化。Thejaswi 等人的先前工作（KDD 2022）表明，满足公平范围约束是 NP 难的，使得该问题无法以任何多项式因子进行近似。我们通过证明即使公平范围约束可以轻松满足，不可近似性仍然存在，从而加强了这一结果，突显了聚类任务本身的内在计算复杂性。在标准复杂性假设下，我们表明，如果不彻底枚举设施集的所有$k$-子集，则不可能进行非平凡的近似。值得注意的是，我们的不可近似性结果甚至在树度量上以及当组的数量是设施集大小的对数时仍然成立。鉴于这些强有力的不可近似性结果，我们专注于一个更实际的场景，其中组的数量是常数。在此情况下，我们设计了两种近似算法：($i$) 一个多项式时间$O(\log k)$- 和$O(\log^2 k)$-近似算法，用于$k$-中位数和$k$-均值目标，以及 ($ii$) 一个参数化为$k$的固定参数可解算法，分别实现$(3+\epsilon)$- 和$(9 + \epsilon)$-近似。这些结果与无公平范围约束的容量聚类的最佳已知近似保证相匹配，并解决了Zang等人提出的一个开放问题。 (NeurIPS 2024).

摘要： Capacitated fair-range $k$-clustering generalizes classical $k$-clustering by incorporating both capacity constraints and demographic fairness. In this setting, each facility has a capacity limit and may belong to one or more demographic groups. The task is to select $k$ facilities as centers and assign each client to a center such that: ($a$) no center exceeds its capacity, ($b$) the number of centers selected from each group lies within specified lower and upper bounds (fair-range constraints), and ($c$) the clustering cost (e.g., $k$-median or $k$-means) is minimized. Prior work by Thejaswi et al. (KDD 2022) showed that satisfying fair-range constraints is NP-hard, making the problem inapproximable to any polynomial factor. We strengthen this result by showing that inapproximability persists even when the fair-range constraints are trivially satisfiable, highlighting the intrinsic computational complexity of the clustering task itself. Assuming standard complexity conjectures, we show that no non-trivial approximation is possible without exhaustively enumerating all $k$-subsets of the facility set. Notably, our inapproximability results hold even on tree metrics and when the number of groups is logarithmic in the size of the facility set. In light of these strong inapproximability results, we focus on a more practical setting where the number of groups is constant. In this regime, we design two approximation algorithms: ($i$) a polynomial-time $O(\log k)$- and $O(\log^2 k)$-approximation algorithm for the $k$-median and $k$-means objectives, and ($ii$) a fixed-parameter tractable algorithm parameterized by $k$, achieving $(3+\epsilon)$- and $(9 + \epsilon)$-approximation, respectively. These results match the best-known approximation guarantees for capacitated clustering without fair-range constraints and resolves an open question posed by Zang et al. (NeurIPS 2024).

主题：	数据结构与算法 (cs.DS) ; 计算复杂性 (cs.CC)
引用方式：	arXiv:2505.15905 [cs.DS]
	(或者 arXiv:2505.15905v1 [cs.DS] 对于此版本)
	https://doi.org/10.48550/arXiv.2505.15905

提交历史

来自： Suhas Thejaswi [查看电子邮件]
[v1] 星期三， 2025 年 5 月 21 日 18:00:06 UTC (111 KB)

计算机科学 > 数据结构与算法

标题：有容量的公平范围聚类：难解性与近似算法

标题： Capacitated Fair-Range Clustering: Hardness and Approximation Algorithms

提交历史

获取论文：

参考文献与引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

计算机科学 > 数据结构与算法

标题： 有容量的公平范围聚类：难解性与近似算法 显示英文标题

标题： Capacitated Fair-Range Clustering: Hardness and Approximation Algorithms

提交历史

获取论文：

参考文献与引用

BibTeX 格式的引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

标题：有容量的公平范围聚类：难解性与近似算法