Gradient descent inference in empirical risk minimization

Han, Qiyang; Xu, Xiaocong

Mathematics > Statistics Theory

arXiv:2412.09498 (math)

[Submitted on 12 Dec 2024 (v1), last revised 7 Jan 2025 (this version, v2)]

Title: Gradient descent inference in empirical risk minimization

Authors: Qiyang Han, Xiaocong Xu

Abstract: Gradient descent is one of the most widely used iterative algorithms in modern statistical learning. However, its precise algorithmic dynamics in high-dimensional settings remain only partially understood, which has therefore limited its broader potential for statistical inference applications. This paper provides a precise, non-asymptotic distributional characterization of gradient descent iterates in a broad class of empirical risk minimization problems, in the so-called mean-field regime where the sample size is proportional to the signal dimension. Our non-asymptotic state evolution theory holds for both general non-convex loss functions and non-Gaussian data, and reveals the central role of two Onsager correction matrices that precisely characterize the non-trivial dependence among all gradient descent iterates in the mean-field regime. Although the Onsager correction matrices are typically analytically intractable, our state evolution theory facilitates a generic gradient descent inference algorithm that consistently estimates these matrices across a broad class of models. Leveraging this algorithm, we show that the state evolution can be inverted to construct (i) data-driven estimators for the generalization error of gradient descent iterates and (ii) debiased gradient descent iterates for inference of the unknown signal. Detailed applications to two canonical models, linear regression and (generalized) logistic regression, are worked out to illustrate model-specific features of our general theory and inference methods.

Subjects:	Statistics Theory (math.ST); Information Theory (cs.IT); Optimization and Control (math.OC); Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:2412.09498 [math.ST]
	(or arXiv:2412.09498v2 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.2412.09498

Submission history

From: Qiyang Han
[v1] Thu, 12 Dec 2024 17:47:08 UTC (633 KB)
[v2] Tue, 7 Jan 2025 06:49:09 UTC (728 KB)
