Pay Attention to Small Weights

Zhou, Chao; Jacobs, Tom; Gadhikar, Advait; Burkholz, Rebekka

arXiv:2506.21374 (cs)

[Submitted on 26 Jun 2025]

Abstract: Finetuning large pretrained neural networks is known to be resource-intensive, in terms of both memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, the selection criterion is gradient-free, so the parameter subset can be determined without computing gradients; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; third, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
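The abstract describes the selection rule only at a high level. As an illustration, here is a minimal sketch of "update only the small-magnitude weights" wrapped around a standard Adam step. It assumes a flat list of parameters, a fixed fraction `density` of smallest-magnitude entries to train, and a mask that is re-selected every `refresh_every` steps. The names `small_weight_mask`, `MaskedAdam`, `density`, and `refresh_every` are hypothetical, and this is not the authors' NANOADAM implementation.

```python
import math
from dataclasses import dataclass, field


def small_weight_mask(weights, density):
    """Select the `density` fraction of entries with the smallest |w|.

    Only the weights are inspected, so no gradient is needed to pick the subset.
    """
    k = max(1, int(round(density * len(weights))))
    # Indices ordered by absolute value (ties broken by index for determinism).
    order = sorted(range(len(weights)), key=lambda i: (abs(weights[i]), i))
    chosen = set(order[:k])
    return [i in chosen for i in range(len(weights))]


@dataclass
class MaskedAdam:
    """Hypothetical sketch: Adam restricted to small-magnitude weights."""
    lr: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    density: float = 0.1       # hypothetical: fraction of weights that are trained
    refresh_every: int = 100   # hypothetical: steps between mask re-selections
    step_count: int = 0
    m: list = field(default_factory=list)
    v: list = field(default_factory=list)
    mask: list = field(default_factory=list)

    def step(self, weights, grads):
        """Update `weights` in place (and return it); unmasked entries stay frozen."""
        if not self.m:
            # Simplification: Adam moments are stored for every entry.
            self.m = [0.0] * len(weights)
            self.v = [0.0] * len(weights)
        if self.step_count % self.refresh_every == 0:
            # Re-select the trainable subset from current weight magnitudes.
            self.mask = small_weight_mask(weights, self.density)
        self.step_count += 1
        t = self.step_count  # one global counter drives bias correction
        for i, (w, g) in enumerate(zip(weights, grads)):
            if not self.mask[i]:
                continue  # large-magnitude weight: left untouched
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** t)
            v_hat = self.v[i] / (1 - self.beta2 ** t)
            weights[i] = w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return weights


if __name__ == "__main__":
    w = [0.05, -2.0, 0.01, 1.5, -0.02]
    opt = MaskedAdam(lr=0.1, density=0.6)
    opt.step(w, [1.0] * len(w))
    print("mask:", opt.mask)
    print("weights after one step:", w)
```

In this sketch the large weights (`-2.0` and `1.5` in the demo) are never modified, which mirrors the abstract's point about preserving features learned during pretraining. How Adam's moment estimates should be handled when the mask changes is a design choice this sketch does not resolve; it simply keeps state for every entry.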

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.21374 [cs.LG]
	(or arXiv:2506.21374v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.21374

Submission history

From: Chao Zhou
[v1] Thu, 26 Jun 2025 15:22:55 UTC (16,136 KB)
