The Vanishing Gradient Problem for Stiff Neural Differential Equations

Fronk, Colby; Petzold, Linda


arXiv:2508.01519 (cs)

[Submitted on 2 Aug 2025]



Abstract: Gradient-based optimization of neural differential equations and other parameterized dynamical systems fundamentally relies on the ability to differentiate numerical solutions with respect to model parameters. In stiff systems, it has been observed that sensitivities to parameters controlling fast-decaying modes become vanishingly small during training, leading to optimization difficulties. In this paper, we show that this vanishing gradient phenomenon is not an artifact of any particular method, but a universal feature of all A-stable and L-stable stiff numerical integration schemes. We analyze the rational stability function for general stiff integration schemes and demonstrate that the relevant parameter sensitivities, governed by the derivative of the stability function, decay to zero for large stiffness. Explicit formulas for common stiff integration schemes are provided, which illustrate the mechanism in detail. Finally, we rigorously prove that the slowest possible rate of decay for the derivative of the stability function is $O(|z|^{-1})$, revealing a fundamental limitation: all A-stable time-stepping methods inevitably suppress parameter gradients in stiff regimes, posing a significant barrier for training and parameter identification in stiff neural ODEs.
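The mechanism can be illustrated on the scalar test equation $y' = \lambda y$: one step of a one-step method gives $y_1 = R(h\lambda)\,y_0$, so by the chain rule $\partial y_1/\partial\lambda = h\,R'(h\lambda)\,y_0$. The Python sketch below is not the authors' code. It assumes a single step of fixed size h = 0.1 and uses two textbook stability functions, backward Euler (L-stable) and the trapezoidal rule (A-stable but not L-stable); all helper names are hypothetical. For both formulas $R'(z)$ falls off like $|z|^{-2}$, which is no slower than the $O(|z|^{-1})$ rate stated above, so the one-step sensitivity shrinks toward zero as $\lambda$ becomes large and negative.

```python
"""Sketch: one-step parameter sensitivity of two A-stable methods on y' = lam*y.

Assumptions (not from the paper): scalar Dahlquist test equation, a single
step of fixed size h, and the hypothetical helper names used below.
"""


def r_backward_euler(z: float) -> float:
    # Backward Euler: R(z) = 1 / (1 - z); L-stable, since R -> 0 as z -> -inf.
    return 1.0 / (1.0 - z)


def dr_backward_euler(z: float) -> float:
    # Closed-form derivative R'(z) = 1 / (1 - z)^2, shrinking like |z|^-2.
    return 1.0 / (1.0 - z) ** 2


def r_trapezoidal(z: float) -> float:
    # Trapezoidal rule: R(z) = (1 + z/2) / (1 - z/2); A-stable, but R -> -1 as z -> -inf.
    return (1.0 + z / 2.0) / (1.0 - z / 2.0)


def dr_trapezoidal(z: float) -> float:
    # Closed-form derivative R'(z) = 1 / (1 - z/2)^2, also shrinking like |z|^-2.
    return 1.0 / (1.0 - z / 2.0) ** 2


def central_difference(f, z: float, eps: float = 1e-6) -> float:
    # Finite-difference approximation used to check the closed-form derivatives.
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


def one_step_sensitivity(dr, lam: float, h: float = 0.1, y0: float = 1.0) -> float:
    # One step gives y1 = R(h*lam) * y0, so dy1/dlam = h * R'(h*lam) * y0.
    return h * dr(h * lam) * y0


if __name__ == "__main__":
    for lam in [-1.0, -1e2, -1e4, -1e6]:
        be = one_step_sensitivity(dr_backward_euler, lam)
        tr = one_step_sensitivity(dr_trapezoidal, lam)
        print(f"lam={lam:>10.0e}  dy1/dlam: BE={be:.3e}  trapezoidal={tr:.3e}")
```

Running the script prints the one-step sensitivity for $\lambda = -1, -10^2, -10^4, -10^6$; for both methods the values drop by many orders of magnitude as $\lambda$ grows more negative, even though neither method is unstable there.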

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Numerical Analysis (math.NA)
Cite as:	arXiv:2508.01519 [cs.LG]
	(or arXiv:2508.01519v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2508.01519

Submission history

From: Colby Fronk
[v1] Sat, 2 Aug 2025 23:44:14 UTC (216 KB)
