On the Estimation of Coherence

Mohri, Mehryar; Talwalkar, Ameet

arXiv:1009.0861v1 (stat)

[Submitted on 4 Sep 2010]

Title: On the Estimation of Coherence

Authors: Mehryar Mohri, Ameet Talwalkar

Abstract: Low-rank matrix approximations are often used to help scale standard machine learning algorithms to large-scale problems. Recently, matrix coherence has been used to characterize the ability to extract global information from a subset of matrix entries in the context of these low-rank approximations and other sampling-based algorithms, e.g., matrix completion and robust PCA. Since coherence is defined in terms of the singular vectors of a matrix and is expensive to compute, the practical significance of these results largely hinges on the following question: Can we efficiently and accurately estimate the coherence of a matrix? In this paper we address this question. We propose a novel algorithm for estimating coherence from a small number of columns, formally analyze its behavior, and derive a new coherence-based matrix approximation bound based on this analysis. We then present extensive experimental results on synthetic and real datasets that corroborate our worst-case theoretical analysis, yet provide strong support for the use of our proposed algorithm whenever low-rank approximation is being considered. Our algorithm efficiently and accurately estimates matrix coherence across a wide range of datasets, and these coherence estimates are excellent predictors of the effectiveness of sampling-based matrix approximation on a case-by-case basis.
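
The abstract describes coherence only informally. The sketch below is a minimal pure-Python illustration, assuming the widely used definition mu(U) = (n/r) * max_i ||U_(i)||^2, where U is an n x r orthonormal basis of the matrix's column space and U_(i) is its i-th row; the paper's exact normalization may differ. The function names and the uniform-sampling `estimate_coherence` helper are hypothetical: they illustrate the general idea of "estimating coherence from a small number of columns" and are not the authors' algorithm or sampling schedule.

```python
import random


def orthonormal_basis(columns, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis (as a list of column vectors)
    for the span of `columns`; numerically dependent columns are dropped."""
    basis = []
    for col in columns:
        v = [float(a) for a in col]
        for q in basis:
            # Remove the component of v along the already-accepted direction q.
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def coherence(basis, n):
    """mu = (n / r) * max_i ||Q_(i)||^2 for an n x r orthonormal basis Q.
    Ranges from 1 (incoherent) to n / r (maximally coherent)."""
    r = len(basis)
    if r == 0:
        raise ValueError("the subspace is empty")
    # Squared norm of row i of Q, i.e. the i-th leverage score of the subspace.
    row_norms_sq = [sum(q[i] ** 2 for q in basis) for i in range(n)]
    return (n / r) * max(row_norms_sq)


def columns(matrix, idx):
    """Extract the columns with indices `idx` from a row-major list of lists."""
    return [[row[j] for row in matrix] for j in idx]


def exact_coherence(matrix):
    """Coherence of the full column space (uses every column)."""
    n, m = len(matrix), len(matrix[0])
    return coherence(orthonormal_basis(columns(matrix, range(m))), n)


def estimate_coherence(matrix, num_cols, seed=0):
    """Hypothetical estimator: sample `num_cols` columns uniformly without
    replacement and return the coherence of the subspace they span."""
    n, m = len(matrix), len(matrix[0])
    idx = random.Random(seed).sample(range(m), num_cols)
    return coherence(orthonormal_basis(columns(matrix, idx)), n)
```

For example, the 4 x 2 matrix with columns (1, 1, 1, 1) and (1, -1, 1, -1) has coherence 1 (its mass is spread evenly over the rows), while the matrix with columns e_1 and e_2 reaches the maximum n/r = 2. Because the row norms of an orthonormal basis equal the diagonal of the projection onto the subspace, coherence depends only on the subspace and not on the chosen basis. That is why Gram-Schmidt can stand in for the SVD here; for large matrices a numerically robust QR or SVD routine would be the more practical choice.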

Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:1009.0861 [stat.ML]
	(or arXiv:1009.0861v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1009.0861

Submission history

From: Ameet Talwalkar
[v1] Sat, 4 Sep 2010 19:18:54 UTC (48 KB)