Statistical consistency and asymptotic normality for high-dimensional robust M-estimators

Loh, Po-Ling

arXiv:1501.00312 (math)

[Submitted on 1 Jan 2015]

Title: Statistical consistency and asymptotic normality for high-dimensional robust M-estimators

Authors: Po-Ling Loh

Abstract: We study theoretical properties of regularized robust M-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an l_1-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support---hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex M-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex M-estimator to achieve consistency and a nonconvex M-estimator to increase efficiency.
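The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The following choices are assumptions made here for illustration only: the Huber loss for the convex step, the Cauchy loss and the MCP regularizer for the nonconvex step, the tuning constants (`delta`, `c`, `b`, `lam`), the crude step-size rule, and every function name. The composite gradient update takes a gradient step on the smooth part of the objective and then applies soft-thresholding for the l_1 part; the concave remainder of MCP is folded into the smooth part.

```python
import numpy as np

def huber_psi(r, delta=1.345):
    """Derivative of the convex Huber loss: identity near zero, clipped at +/- delta."""
    return np.clip(r, -delta, delta)

def cauchy_psi(r, c=2.0):
    """Derivative of the nonconvex Cauchy loss (c^2 / 2) * log(1 + (r / c)^2); bounded."""
    return r / (1.0 + (r / c) ** 2)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mcp_concave_grad(beta, lam, b=3.0):
    """Gradient of q(t) = MCP(t) - lam * |t|, the smooth concave part of MCP."""
    return np.where(np.abs(beta) <= b * lam, -beta / b, -lam * np.sign(beta))

def composite_gradient_descent(X, y, psi, lam, beta0, concave_grad=None, n_iter=500):
    """Proximal gradient iterations for (1/n) * sum loss(y_i - x_i' beta) + penalty."""
    n = X.shape[0]
    # Assumed step size: the loss curvature is at most ||X||_2^2 / n because both
    # psi functions have slope at most 1; the extra 1.0 is a crude allowance for
    # the concave penalty part (its curvature is 1/b <= 1 when b >= 1).
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + 1.0)
    beta = np.array(beta0, dtype=float)
    for _ in range(n_iter):
        grad = -X.T @ psi(y - X @ beta) / n
        if concave_grad is not None:
            grad = grad + concave_grad(beta, lam)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def two_step_estimate(X, y, lam=0.2):
    """Step 1: convex Huber + l_1 fit from zero. Step 2: Cauchy + MCP from that fit."""
    beta_init = composite_gradient_descent(X, y, huber_psi, lam, np.zeros(X.shape[1]))
    beta_final = composite_gradient_descent(
        X, y, cauchy_psi, lam, beta_init, concave_grad=mcp_concave_grad
    )
    return beta_init, beta_final

# Synthetic example (hypothetical data): sparse signal with heavy-tailed t errors.
rng = np.random.default_rng(0)
n, p = 200, 50
beta_star = np.zeros(p)
beta_star[:3] = [3.0, -2.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_star + rng.standard_t(df=1.5, size=n)
beta_init, beta_final = two_step_estimate(X, y)
print(np.linalg.norm(beta_init - beta_star), np.linalg.norm(beta_final - beta_star))
```

The convex first step can be started from any point, which is why it is initialized at zero here. The nonconvex second step depends on starting within the local region, so it is warm-started from the first step's output rather than from zero.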

Comments:	56 pages, 8 figures
Subjects:	Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)
MSC classes:	62F12
Cite as:	arXiv:1501.00312 [math.ST]
	(or arXiv:1501.00312v1 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.1501.00312

Submission history

From: Po-Ling Loh
[v1] Thu, 1 Jan 2015 20:52:30 UTC (998 KB)
