Improving Reinforcement Learning Sample-Efficiency using Local Approximation

Prashant, Mohit; Easwaran, Arvind

arXiv:2507.12383 (cs)

[Submitted on 16 Jul 2025]

Title: Improving Reinforcement Learning Sample-Efficiency using Local Approximation

Authors: Mohit Prashant, Arvind Easwaran

Abstract: In this study, we derive Probably Approximately Correct (PAC) bounds on the asymptotic sample-complexity of reinforcement learning (RL) in the infinite-horizon Markov Decision Process (MDP) setting that are sharper than those in the existing literature. The premise of our study is twofold. First, the further apart two states are, transition-wise, the less relevant the value of the first state is when learning the $\epsilon$-optimal value of the second. Second, the amount of 'effort', sample-complexity-wise, expended in learning the $\epsilon$-optimal value of a state is independent of the number of samples required to learn the $\epsilon$-optimal value of a second state that is a sufficient number of transitions away from the first. Conversely, states within each other's vicinity have values that depend on each other and require a similar number of samples to learn. By approximating the original MDP with smaller MDPs constructed from subsets of the original's state-space, we reduce the sample-complexity by a logarithmic factor to $O(SA \log A)$ timesteps, where $S$ and $A$ are the state and action space sizes. We extend these results to an infinite-horizon, model-free setting by constructing a PAC-MDP algorithm with the aforementioned sample-complexity. We conclude by showing how significant the improvement is, comparing our algorithm against prior work in an experimental setting.
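The locality premise in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or their bound; it only shows, under assumptions that are not in the abstract, why far-away states matter little. The assumptions are: a discounted setting with factor `gamma`, rewards in [0, 1], a known model, and a toy chain MDP. The helper names `neighbourhood`, `value_iteration`, and `chain_mdp` are hypothetical. The sketch solves a smaller MDP built from the states within `k` transitions of a centre state, treats everything outside as having value 0, and compares the centre's value with the full solution. Under these assumptions the gap is at most `gamma ** (k + 1) / (1 - gamma)`.

```python
from collections import deque


def neighbourhood(P, centre, k):
    """States reachable from `centre` in at most k transitions (breadth-first search)."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        s = queue.popleft()
        if dist[s] == k:
            continue  # do not expand beyond radius k
        for outcomes in P[s]:
            for prob, nxt in outcomes:
                if prob > 0 and nxt not in dist:
                    dist[nxt] = dist[s] + 1
                    queue.append(nxt)
    return set(dist)


def value_iteration(P, R, gamma, states, tol=1e-10):
    """Value iteration restricted to `states`; states outside the set count as value 0."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V.get(nxt, 0.0) for p, nxt in P[s][a])
                for a in range(len(P[s]))
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def chain_mdp(n, slip=0.1):
    """Toy chain (assumed for illustration): action 0 moves left, action 1 moves right,
    and with probability `slip` the move goes the other way. Rewards lie in [0, 1]."""
    P, R = [], []
    for s in range(n):
        left, right = max(s - 1, 0), min(s + 1, n - 1)
        P.append([[(1 - slip, left), (slip, right)],
                  [(1 - slip, right), (slip, left)]])
        R.append([((3 * s + a) % 5) / 4.0 for a in range(2)])
    return P, R


gamma, centre = 0.9, 15
P, R = chain_mdp(30)
v_max = 1.0 / (1.0 - gamma)  # largest possible value with rewards in [0, 1]
v_full = value_iteration(P, R, gamma, set(range(30)))[centre]

errors = {}
for k in (2, 5, 10):
    local = neighbourhood(P, centre, k)
    v_local = value_iteration(P, R, gamma, local)[centre]
    errors[k] = abs(v_full - v_local)
    # columns: radius, sub-MDP size, observed gap at the centre, truncation bound
    print(k, len(local), errors[k], gamma ** (k + 1) * v_max)
```

Growing the radius `k` shrinks the gap geometrically in this toy setting, which is the intuition behind approximating a large MDP with smaller ones built from subsets of its state-space. How the paper chooses those subsets and turns this into a sample-complexity bound is not stated in the abstract.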

Comments:	Preprint
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2507.12383 [cs.LG]
	(or arXiv:2507.12383v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.12383

Submission history

From: Mohit Prashant
[v1] Wed, 16 Jul 2025 16:31:17 UTC (166 KB)
