Genericity of Polyak-Lojasiewicz Inequalities for Entropic Mean-Field Neural ODEs

Daudin, Samuel; Delarue, François

arXiv:2507.08486 (math)

[Submitted on 11 July 2025]

Authors: Samuel Daudin, François Delarue

Abstract: We address the behavior of idealized deep residual neural networks (ResNets), modeled via an optimal control problem set over continuity (or adjoint transport) equations. The continuity equations describe the statistical evolution of the features in the asymptotic regime where the layers of the network form a continuum. The velocity field is expressed through the network activation function, which is itself viewed as a function of the statistical distribution of the network parameters (weights and biases). From a mathematical standpoint, the control is interpreted in a relaxed sense, taking values in the space of probability measures over the set of parameters. We investigate the optimal behavior of the network when the cost functional arises from a regression problem and includes an additional entropic regularization term on the distribution of the parameters. In this framework, we focus in particular on the existence of stable optimizers, that is, optimizers at which the Hessian of the cost is non-degenerate. We show that, for an open and dense set of initial data, understood here as probability distributions over features and associated labels, there exists a unique stable global minimizer of the control problem. Moreover, we show that such minimizers satisfy a local Polyak-Lojasiewicz inequality, which can lead to exponential convergence of the corresponding gradient descent when the initialization lies sufficiently close to the optimal parameters. This result thus demonstrates the genericity (with respect to the distribution of features and labels) of the Polyak-Lojasiewicz condition in ResNets with a continuum of layers and under entropic penalization.
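
To illustrate why a local Polyak-Lojasiewicz inequality can yield exponential convergence, here is a minimal finite-dimensional sketch. It is not the paper's infinite-dimensional statement: the smooth cost $J$ on $\mathbb{R}^d$, the minimizer $\theta^\star$, the neighborhood $U$, the constant $c > 0$ and the continuous-time gradient flow $\theta_t$ are all assumptions introduced here for illustration only.

```latex
% Assumed local PL inequality on a neighborhood U of a minimizer \theta^\star:
\[
  \tfrac{1}{2}\,\lvert \nabla J(\theta) \rvert^{2}
  \;\ge\; c\,\bigl( J(\theta) - J(\theta^\star) \bigr),
  \qquad \theta \in U .
\]
% Gradient flow started in U (and assumed to remain in U):
\[
  \dot{\theta}_t = -\nabla J(\theta_t).
\]
% Chain rule, then the PL inequality:
\[
  \frac{d}{dt}\bigl( J(\theta_t) - J(\theta^\star) \bigr)
  = -\lvert \nabla J(\theta_t) \rvert^{2}
  \;\le\; -2c\,\bigl( J(\theta_t) - J(\theta^\star) \bigr).
\]
% Gronwall's lemma gives exponential decay of the optimality gap:
\[
  J(\theta_t) - J(\theta^\star)
  \;\le\; e^{-2ct}\,\bigl( J(\theta_0) - J(\theta^\star) \bigr),
  \qquad t \ge 0 .
\]
```

The assumption that the trajectory stays in $U$ is what makes the statement local: it corresponds to the abstract's requirement that the initialization lie sufficiently close to the optimal parameters.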

Subjects:	Optimization and Control (math.OC); Analysis of PDEs (math.AP); Probability (math.PR)
Cite as:	arXiv:2507.08486 [math.OC]
	(or arXiv:2507.08486v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2507.08486

Submission history

From: Samuel Daudin
[v1] Fri, 11 Jul 2025 11:04:46 UTC (112 KB)
