Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks

Livi, Lorenzo

arXiv:2508.12121 (cs)

[Submitted on 16 Aug 2025 (v1), last revised 24 Aug 2025 (this version, v3)]

Title: Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks

Authors: Lorenzo Livi

Abstract: We study how gating mechanisms in recurrent neural networks (RNNs) implicitly induce adaptive learning-rate behavior, even when training is carried out with a fixed, global learning rate. This effect arises from the coupling between state-space time scales--parametrized by the gates--and parameter-space dynamics during gradient descent. By deriving exact Jacobians for leaky-integrator and gated RNNs, we obtain a first-order expansion that makes explicit how constant, scalar, and multi-dimensional gates reshape gradient propagation, modulate effective step sizes, and introduce anisotropy in parameter updates. These findings reveal that gates not only control information flow, but also act as data-driven preconditioners that adapt optimization trajectories in parameter space. We further draw formal analogies with learning-rate schedules, momentum, and adaptive methods such as Adam. Empirical simulations corroborate these claims: in several sequence tasks, we show that gates induce lag-dependent effective learning rates and directional concentration of gradient flow, with multi-gate models matching or exceeding the anisotropic structure produced by Adam. These results highlight that optimizer-driven and gate-driven adaptivity are complementary but not equivalent mechanisms. Overall, this work provides a unified dynamical systems perspective on how gating couples state evolution with parameter updates, explaining why gated architectures achieve robust trainability and stability in practice.
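The abstract's claim that a constant gate induces lag-dependent gradient scaling can be illustrated with a minimal sketch. It assumes a textbook scalar leaky-integrator update, h_t = (1 - alpha) * h_{t-1} + alpha * tanh(w * h_{t-1} + u * x_t), which may differ from the paper's exact formulation; the function names and the parameters `alpha`, `w`, `u` are hypothetical and chosen only for illustration. The code multiplies the per-step Jacobians along a trajectory and checks the product against a finite-difference estimate.

```python
import math


def step(h, x, alpha, w, u):
    """One leaky-integrator update with a scalar state (assumed form, not the paper's)."""
    return (1.0 - alpha) * h + alpha * math.tanh(w * h + u * x)


def step_jacobian(h, x, alpha, w, u):
    """Analytic derivative d h_t / d h_{t-1} of the update in `step`."""
    a = w * h + u * x
    return (1.0 - alpha) + alpha * (1.0 - math.tanh(a) ** 2) * w


def lag_sensitivity(xs, alpha, w, u, h0=0.0):
    """d h_T / d h_0: product of per-step Jacobians along the trajectory."""
    h, prod = h0, 1.0
    for x in xs:
        prod *= step_jacobian(h, x, alpha, w, u)
        h = step(h, x, alpha, w, u)
    return prod


def finite_difference(xs, alpha, w, u, h0=0.0, eps=1e-6):
    """Central-difference estimate of d h_T / d h_0, used as a sanity check."""
    def run(h):
        for x in xs:
            h = step(h, x, alpha, w, u)
        return h
    return (run(h0 + eps) - run(h0 - eps)) / (2.0 * eps)


if __name__ == "__main__":
    inputs = [0.5, -1.0, 0.3, 0.8, -0.2, 0.1]
    for alpha in (0.1, 0.5, 0.9):
        analytic = lag_sensitivity(inputs, alpha, w=0.7, u=1.2)
        numeric = finite_difference(inputs, alpha, w=0.7, u=1.2)
        print(f"alpha={alpha}: analytic={analytic:.6f} numeric={numeric:.6f}")
```

In this sketch, setting `w = 0` reduces the sensitivity over k steps to (1 - alpha)^k, so the leak rate alone sets how strongly a gradient signal from k steps back is attenuated. That is one simple reading of a lag-dependent effective step size under a fixed global learning rate.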

Comments:	Formatting and more simulations
Subjects:	Machine Learning (cs.LG); Dynamical Systems (math.DS)
Cite as:	arXiv:2508.12121 [cs.LG]
 	(or arXiv:2508.12121v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2508.12121

Submission history

From: Lorenzo Livi
[v1] Sat, 16 Aug 2025 18:19:34 UTC (620 KB)
[v2] Wed, 20 Aug 2025 07:10:59 UTC (1,767 KB)
[v3] Sun, 24 Aug 2025 17:10:20 UTC (2,535 KB)
