$k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation

Greenhut, Daniel; Feldman, Dan

arXiv:2507.14631 (cs)

[Submitted on 19 July 2025]

Authors: Daniel Greenhut, Dan Feldman

Abstract: Given an integer $k\geq1$ and a set $P$ of $n$ points in $\mathbb{R}^d$, the classic $k$-PCA (Principal Component Analysis) approximates the affine \emph{$k$-subspace mean} of $P$, which is the $k$-dimensional affine linear subspace that minimizes the sum of squared Euclidean distances ($\ell_{2,2}$-norm) to the points of $P$, i.e., the mean of these distances. The \emph{$k$-subspace median} is the subspace that minimizes the sum of (non-squared) Euclidean distances ($\ell_{2,1}$-mixed norm), i.e., their median. The median subspace is usually sparser and more robust to noise/outliers than the mean, but also much harder to approximate since, unlike the $\ell_{z,z}$ (non-mixed) norms, it is non-convex for $k<d-1$. We provide the first polynomial-time deterministic algorithm whose running time and approximation factor are both not exponential in $k$. More precisely, the multiplicative approximation factor is $\sqrt{d}$, and the running time is polynomial in the size of the input. We expect that our technique will be useful for many other related problems, such as the $\ell_{2,z}$ norm of distances for $z \notin \{1,2\}$, e.g., $z=\infty$, and handling outliers/sparsity. Open code and experimental results on real-world datasets are also provided.
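
To make the two objectives in the abstract concrete, the following minimal Python sketch evaluates the sum of distances raised to a power $z$ from a point set to a given affine subspace, with $z=2$ for the subspace-mean cost and $z=1$ for the subspace-median cost. It is an illustration only and is not the algorithm of the paper; the function names `dist_to_affine_subspace` and `subspace_cost`, the toy data, and the assumption that the supplied basis is orthonormal are all hypothetical choices made for this example.

```python
import math


def dist_to_affine_subspace(p, origin, basis):
    """Euclidean distance from p to the affine subspace origin + span(basis).

    Assumption: `basis` is a list of orthonormal vectors (not checked here).
    """
    diff = [pi - oi for pi, oi in zip(p, origin)]
    # Squared length of the component of diff lying inside the subspace.
    proj_sq = sum(sum(d * b for d, b in zip(diff, vec)) ** 2 for vec in basis)
    total_sq = sum(d * d for d in diff)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(total_sq - proj_sq, 0.0))


def subspace_cost(points, origin, basis, z):
    """Sum over points of (distance to the subspace) ** z.

    z=2 gives the squared (mean) cost, z=1 the non-squared (median) cost.
    """
    return sum(dist_to_affine_subspace(p, origin, basis) ** z for p in points)


# Toy data: four points on the x-axis plus one outlier far above it.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (1.5, 10.0)]

# Two candidate 1-dimensional affine subspaces (lines) in the plane.
candidates = {
    "y=0": ((0.0, 0.0), [(1.0, 0.0)]),
    "y=2": ((0.0, 2.0), [(1.0, 0.0)]),
}

for name, (origin, basis) in candidates.items():
    cost_median = subspace_cost(points, origin, basis, 1)
    cost_mean = subspace_cost(points, origin, basis, 2)
    print(name, "non-squared cost:", cost_median, "squared cost:", cost_mean)
```

Between these two candidate lines, the squared cost prefers $y=2$ (80 versus 100), which is pulled toward the outlier, while the non-squared cost prefers $y=0$ (10 versus 16), which fits the four inliers exactly. This only compares two hand-picked lines and says nothing about the true optimum or about how the paper approximates it.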

Subjects:	Machine Learning (cs.LG); Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2507.14631 [cs.LG]
	(or arXiv:2507.14631v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.14631

Submission history

From: Daniel Greenhut
[v1] Sat, 19 Jul 2025 14:00:50 UTC (449 KB)
