Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality

Jentzen, Arnulf; Kleinberg, Konrad; Kruse, Thomas

Mathematics > Optimization and Control

arXiv:2506.22851 (math)

[Submitted on 28 June 2025]

Title: Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality

Authors: Arnulf Jentzen, Konrad Kleinberg, Thomas Kruse

Abstract: Discrete-time stochastic optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty and as such provide the mathematical framework underlying reinforcement learning theory. A central tool for solving MDPs is the Bellman equation and its solution, the so-called $Q$-function. In this article, we construct deep neural network (DNN) approximations for $Q$-functions associated with MDPs with infinite time horizon and finite control set $A$. More specifically, we show that if the payoff function and the random transition dynamics of the MDP can be suitably approximated by DNNs with leaky rectified linear unit (ReLU) activation, then the solutions $Q_d\colon \mathbb R^d\to \mathbb R^{|A|}$, $d\in \mathbb{N}$, of the associated Bellman equations can also be approximated in the $L^2$-sense by DNNs with leaky ReLU activation whose numbers of parameters grow at most polynomially in both the dimension $d\in \mathbb{N}$ of the state space and the reciprocal $1/\varepsilon$ of the prescribed error $\varepsilon\in (0,1)$. Our proof relies on the recently introduced full-history recursive multilevel fixed-point (MLFP) approximation scheme.
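To make the notion of a $Q$-function as the solution of a Bellman equation concrete, here is a minimal sketch on a tiny tabular MDP. It assumes the standard discounted form $Q(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \max_{b} Q(s',b)$ and solves it by plain fixed-point iteration. The transition table `P`, reward table `R`, discount factor `GAMMA`, and the helper names are all hypothetical and chosen for illustration; this is neither the MLFP scheme nor the DNN construction of the article, which treats state spaces $\mathbb R^d$.

```python
# Illustrative two-state, two-action MDP. All numbers are made up for this
# sketch and do not come from the article.
GAMMA = 0.9  # discount factor (assumed)

# P[s][a] is a list of (probability, next_state) pairs.
P = {
    0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
# R[s][a] is the expected one-step reward for action a in state s.
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}


def bellman_operator(q):
    """Return Tq with (Tq)(s,a) = R(s,a) + GAMMA * E[max_b q(s',b)]."""
    return {
        s: {
            a: R[s][a]
            + GAMMA * sum(p * max(q[t].values()) for p, t in P[s][a])
            for a in P[s]
        }
        for s in P
    }


def solve_q(tol=1e-10, max_iter=10_000):
    """Iterate q <- Tq until successive iterates differ by less than tol."""
    q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(max_iter):
        q_next = bellman_operator(q)
        gap = max(abs(q_next[s][a] - q[s][a]) for s in P for a in P[s])
        q = q_next
        if gap < tol:
            break
    return q


Q = solve_q()
for s in sorted(Q):
    print(s, {a: round(v, 4) for a, v in Q[s].items()})
```

Because the operator is a contraction with factor `GAMMA` in the supremum norm, the iteration converges from any starting point. The table it stores has one entry per state-action pair, which is what becomes infeasible for a continuous state space in high dimension and motivates approximating $Q_d$ by DNNs instead.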

Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR); Machine Learning (stat.ML)
MSC classes:	90C40, 90C39, 60J05, 93E20, 65C05, 68T07
Cite as:	arXiv:2506.22851 [math.OC]
	(or arXiv:2506.22851v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2506.22851

Submission history

From: Thomas Kruse [view email]
[v1] Sat, 28 Jun 2025 11:25:44 UTC (48 KB)
