Optimizers Qualitatively Alter Solutions And We Should Leverage This

Pascanu, Razvan; Lyle, Clare; Modoranu, Ionut-Vlad; Borras, Naima Elosegui; Alistarh, Dan; Velickovic, Petar; Chandar, Sarath; De, Soham; Martens, James

arXiv:2507.12224 (cs)

[Submitted on 16 Jul 2025]

Abstract: Due to the nonlinear nature of Deep Neural Networks (DNNs), one cannot guarantee convergence to a unique global minimum of the loss when using optimizers that rely only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs trained with standard protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community toward using convex optimization as a mental model for learning, leading to a focus on training efficiency, measured in required iterations, FLOPs, or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer influences not only the rate of convergence but also the qualitative properties of the learned solutions. Put differently, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim to understand the biases of existing methods and to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than judging them solely by their convergence rates. We hope our arguments will inspire research that improves our understanding of how the learning process can affect the type of solution we converge to, and lead to greater recognition of optimizer design as a critical lever that complements the roles of architecture and data in shaping model outcomes.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2507.12224 [cs.LG]
	(or arXiv:2507.12224v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.12224

Submission history

From: Ionut-Vlad Modoranu
[v1] Wed, 16 Jul 2025 13:33:31 UTC (738 KB)
