High Probability Guarantees for Random Reshuffling

Yu, Hengxu; Li, Xiao


arXiv:2311.11841 (math)

[Submitted on 20 Nov 2023 (v1), last revised 14 Mar 2025 (this version, v3)]




Abstract: We consider the stochastic gradient method with random reshuffling ($\mathsf{RR}$) for tackling smooth nonconvex optimization problems. $\mathsf{RR}$ finds broad applications in practice, notably in training neural networks. In this work, we provide high probability first-order and second-order complexity guarantees for this method. First, we establish a high probability first-order sample complexity result for driving the Euclidean norm of the gradient (without taking expectation) below $\varepsilon$. Our derived complexity matches the best existing in-expectation one up to a logarithmic term, while neither imposing additional assumptions nor changing $\mathsf{RR}$'s update rule. We then propose a simple and computable stopping criterion for $\mathsf{RR}$ (denoted as $\mathsf{RR}$-$\mathsf{sc}$). This criterion is guaranteed to be triggered after a finite number of iterations, enabling us to prove a high probability first-order complexity guarantee for the last iterate. Second, building on the proposed stopping criterion, we design a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that involves an additional randomized perturbation procedure near stationary points. We show that $\mathsf{p}$-$\mathsf{RR}$ provably escapes strict saddle points and establish a high probability second-order complexity result, without requiring any sub-Gaussian tail-type assumptions on the stochastic gradient errors. The fundamental ingredient in deriving these results is a new concentration property for sampling without replacement in $\mathsf{RR}$, which may be of independent interest. Finally, we conduct numerical experiments on neural network training to support our theoretical findings.
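The abstract describes $\mathsf{RR}$ only in words. As a rough illustration, the following is a minimal Python sketch of the plain $\mathsf{RR}$ update for a finite-sum objective f(x) = (1/n) * sum_i f_i(x), assuming a constant step size. It is not the authors' code: it omits the $\mathsf{RR}$-$\mathsf{sc}$ stopping criterion and the $\mathsf{p}$-$\mathsf{RR}$ perturbation step, whose details are not given in the abstract. The names random_reshuffling, full_grad_norm, grad_i and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical signature: grad_i(i, x) returns the gradient of component f_i at x.
GradFn = Callable[[int, Vector], Vector]


def random_reshuffling(
    grad_i: GradFn,
    x0: Sequence[float],
    n: int,
    step_size: float,
    epochs: int,
    seed: int = 0,
) -> Vector:
    """Plain RR with a constant step size (an illustrative assumption).

    Each epoch draws a fresh random permutation of {0, ..., n-1} and takes one
    gradient step per component, so every f_i is used exactly once per epoch
    (sampling without replacement).
    """
    rng = random.Random(seed)
    x = list(x0)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # new permutation for this epoch
        for i in order:
            g = grad_i(i, x)
            x = [xj - step_size * gj for xj, gj in zip(x, g)]
    return x


def full_grad_norm(grad_i: GradFn, x: Sequence[float], n: int) -> float:
    """Euclidean norm of the full gradient (1/n) * sum_i grad f_i(x).

    Used here only as a diagnostic; it is not the paper's RR-sc criterion.
    """
    total = [0.0] * len(x)
    for i in range(n):
        for j, gj in enumerate(grad_i(i, list(x))):
            total[j] += gj / n
    return math.sqrt(sum(t * t for t in total))


if __name__ == "__main__":
    # Toy finite sum: f_i(x) = 0.5 * ||x - a_i||^2, minimized at the mean of the a_i.
    A = [[1.0, 2.0], [3.0, 0.0], [-1.0, 1.0], [2.0, 2.0], [0.0, -1.0]]

    def grad_i(i: int, x: Vector) -> Vector:
        return [xj - aj for xj, aj in zip(x, A[i])]

    x = random_reshuffling(grad_i, [0.0, 0.0], n=len(A), step_size=0.01, epochs=200)
    print("x =", x, "||grad f(x)|| =", full_grad_norm(grad_i, x, len(A)))
```

With a constant step size, plain RR typically settles only in a neighborhood of a stationary point that shrinks as the step size decreases; the step-size choices behind the paper's high probability guarantees are not reproduced in this sketch.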

Comments:	Reorganized the structure to make the basic ideas clearer
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG)
MSC classes:	90C30, 90C06, 90C26, 90C15
Cite as:	arXiv:2311.11841 [math.OC]
	(or arXiv:2311.11841v3 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2311.11841

Submission history

From: Hengxu Yu
[v1] Mon, 20 Nov 2023 15:17:20 UTC (67 KB)
[v2] Fri, 8 Dec 2023 02:26:17 UTC (67 KB)
[v3] Fri, 14 Mar 2025 09:45:53 UTC (214 KB)