Pac-bayesian bounds for sparse regression estimation with exponential weights

Alquier, Pierre; Lounici, Karim

doi:10.1214/11-EJS601

Mathematics > Statistics Theory

arXiv:1009.2707v2 (math)

[Submitted on 14 Sep 2010 (v1), last revised 14 Mar 2011 (this version, v2)]

Title: Pac-bayesian bounds for sparse regression estimation with exponential weights

Authors: Pierre Alquier, Karim Lounici

Abstract: We consider the sparse regression model where the number of parameters $p$ is larger than the sample size $n$. The difficulty in high-dimensional problems is to propose estimators that achieve a good compromise between statistical and computational performance. The BIC estimator, for instance, performs well from the statistical point of view \cite{BTW07} but can only be computed for values of $p$ of at most a few tens. The Lasso estimator is the solution of a convex minimization problem, hence computable for large values of $p$. However, stringent conditions on the design are required to establish fast rates of convergence for this estimator. Dalalyan and Tsybakov \cite{arnak} propose a method achieving a good compromise between the statistical and computational aspects of the problem. Their estimator can be computed for reasonably large $p$ and satisfies nice statistical properties under weak assumptions on the design. However, \cite{arnak} proposes sparsity oracle inequalities in expectation for the empirical excess risk only. In this paper, we propose an aggregation procedure similar to that of \cite{arnak} but with improved statistical performance. Our main theoretical result is a sparsity oracle inequality in probability for the true excess risk for a version of the exponential weight estimator. We also propose an MCMC method to compute our estimator for reasonably large values of $p$.

Comments:	19 pages
Subjects:	Statistics Theory (math.ST); Computation (stat.CO)
MSC classes:	Primary: 62J07, Secondary: 62J05, 62G08, 62F15, 62B10, 68T05
Cite as:	arXiv:1009.2707 [math.ST]
	(or arXiv:1009.2707v2 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.1009.2707
Journal reference:	Electronic Journal of Statistics, Vol 5(2011), 127-145
Related DOI:	https://doi.org/10.1214/11-EJS601

Submission history

From: Karim Lounici
[v1] Tue, 14 Sep 2010 16:17:29 UTC (53 KB)
[v2] Mon, 14 Mar 2011 14:53:23 UTC (45 KB)
