KPFlow: An Operator Perspective on Dynamic Collapse Under Gradient Descent Training of Recurrent Networks

Hazelden, James; Driscoll, Laura; Shlizerman, Eli; Shea-Brown, Eric


arXiv:2507.06381 (cs)

[Submitted on 8 Jul 2025]


Abstract: Gradient Descent (GD) and its variants are the primary tools for efficiently training recurrent dynamical systems such as Recurrent Neural Networks (RNNs), Neural ODEs, and Gated Recurrent Units (GRUs). The dynamics that form in these models exhibit features such as neural collapse and the emergence of latent representations, which may support the remarkable generalization properties of these networks. In neuroscience, qualitative features of these representations are used to compare learning in biological and artificial systems. Despite recent progress, theoretical tools are still needed to rigorously understand the mechanisms that shape learned representations, especially in finite, nonlinear models. Here, we show that the gradient flow, which describes how the model's dynamics evolve under GD, can be decomposed into a product involving two operators: a Parameter Operator, K, and a Linearized Flow Propagator, P. K mirrors the Neural Tangent Kernel of feed-forward neural networks, while P appears in Lyapunov stability and optimal control theory. We demonstrate two applications of this decomposition. First, we show how the interplay of the two operators gives rise to low-dimensional latent dynamics under GD and, specifically, how this collapse is a consequence of the network structure, over and above the nature of the underlying task. Second, for multi-task training, we show that the operators can be used to measure how the objectives of individual sub-tasks align. We validate these findings experimentally and theoretically and provide an efficient PyTorch package, KPFlow, that implements robust analysis tools for general recurrent architectures. Taken together, our work moves toward building the next stage of understanding of GD learning in nonlinear recurrent models.
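The abstract states that the gradient flow factors through a Parameter Operator K and a Linearized Flow Propagator P; the precise definitions are in the paper. The snippet below is only a minimal sketch, under our own assumptions, of how such a factorization can arise: a hypothetical discrete-time tanh RNN h_{t+1} = tanh(W h_t + U x_t) in which only W is trained with a squared-error loss on the hidden states. Linearizing one step gives dh_{t+1} = A_t dh_t + B_t dθ, so the trajectory Jacobian splits as J = P B, where P propagates per-step perturbations through the linearized dynamics. Under gradient flow the whole trajectory therefore moves with velocity -P K Pᵀ ∇_h L, where K = B Bᵀ. The model, loss, and function names (`rollout`, `step_jacobians`, `propagator`) are illustrative and are not the KPFlow package's API.

```python
import numpy as np

def rollout(W, U, xs, h0):
    """Hypothetical RNN h_{t+1} = tanh(W h_t + U x_t); returns h_1..h_T, shape (T, n)."""
    hs, h = [], h0
    for x in xs:
        h = np.tanh(W @ h + U @ x)
        hs.append(h)
    return np.stack(hs)

def step_jacobians(W, U, xs, h0):
    """Per-step Jacobians A_t = dh_{t+1}/dh_t and B_t = dh_{t+1}/dvec(W) (row-major vec)."""
    n = W.shape[0]
    A, B, h = [], [], h0
    for x in xs:
        h_next = np.tanh(W @ h + U @ x)
        D = np.diag(1.0 - h_next ** 2)                  # tanh'(pre-activation)
        A.append(D @ W)
        B.append(D @ np.kron(np.eye(n), h[None, :]))     # d(W h)/dW_ij = e_i * h_j
        h = h_next
    return A, B

def propagator(A):
    """Block lower-triangular P solving dh_{t+1} = A_t dh_t + u_t with dh_0 = 0.
    Block (t, s) is A_t ... A_{s+1} for t > s and the identity for t == s."""
    T, n = len(A), A[0].shape[0]
    P = np.zeros((T * n, T * n))
    for s in range(T):
        Phi = np.eye(n)
        for t in range(s, T):
            if t > s:
                Phi = A[t] @ Phi
            P[t * n:(t + 1) * n, s * n:(s + 1) * n] = Phi
    return P

def numerical_jacobian(W, U, xs, h0, eps=1e-6):
    """Central finite differences of the trajectory with respect to vec(W)."""
    n = W.shape[0]
    J = np.zeros((len(xs) * n, n * n))
    for k in range(n * n):
        dW = np.zeros(n * n)
        dW[k] = eps
        dW = dW.reshape(n, n)
        diff = rollout(W + dW, U, xs, h0) - rollout(W - dW, U, xs, h0)
        J[:, k] = diff.ravel() / (2 * eps)
    return J

rng = np.random.default_rng(0)
n, m, T = 3, 2, 5
W = rng.normal(scale=0.5, size=(n, n))
U = rng.normal(size=(n, m))
xs = rng.normal(size=(T, m))
h0 = np.zeros(n)
ys = rng.normal(scale=0.5, size=(T, n))          # hypothetical targets

A, B_blocks = step_jacobians(W, U, xs, h0)
P = propagator(A)
B = np.vstack(B_blocks)                           # shape (T*n, n*n)
K = B @ B.T                                       # parameter operator on trajectory space
J = P @ B                                         # d(trajectory)/d vec(W)

g = (rollout(W, U, xs, h0) - ys).ravel()          # dL/dh for L = 0.5 * sum ||h_t - y_t||^2
dh_dtau = -P @ K @ P.T @ g                        # predicted trajectory velocity under gradient flow

# Compare with one tiny explicit GD step on W (gradient = J^T g).
eta = 1e-6
W_new = W - eta * (J.T @ g).reshape(n, n)
dh_fd = (rollout(W_new, U, xs, h0) - rollout(W, U, xs, h0)).ravel() / eta
print(np.max(np.abs(dh_dtau - dh_fd)))
```

One design note: in this sketch, applying `P.T` to `g` is exactly backpropagation through time. Block s of `P.T @ g` accumulates A_{s+1}ᵀ ⋯ A_tᵀ g_t over later steps t, which is the adjoint (costate) recursion from optimal control.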

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Neurons and Cognition (q-bio.NC)
Cite as:	arXiv:2507.06381 [cs.LG]
	(or arXiv:2507.06381v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.06381

Submission history

From: James Hazelden
[v1] Tue, 8 Jul 2025 20:33:15 UTC (4,431 KB)
