Average-Reward Reinforcement Learning with Entropy Regularization

Adamczyk, Jacob; Makarenko, Volodymyr; Tiomkin, Stas; Kulkarni, Rahul V.

Computer Science > Machine Learning

arXiv:2501.09080 (cs)

[Submitted on 15 Jan 2025]

Title: Average-Reward Reinforcement Learning with Entropy Regularization

Authors: Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni

Abstract: The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years due to its ability to solve temporally extended problems without discounting. Independently, RL algorithms have benefited from entropy regularization: an approach used to make the optimal policy stochastic and thereby more robust to noise. Despite the distinct benefits of the two approaches, the combination of entropy regularization with an average-reward objective is not well studied in the literature, and there has been limited development of algorithms for this setting. To address this gap in the field, we develop algorithms for solving entropy-regularized average-reward RL problems with function approximation. We experimentally validate our method, comparing it with existing algorithms on standard benchmarks for RL.

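Comments:	Accepted at the AAAI-25 Eighth Workshop on AI Planning and Reinforcement Learning (PRL)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.09080 [cs.LG]
	(or arXiv:2501.09080v1 [cs.LG] for this version)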
	https://doi.org/10.48550/arXiv.2501.09080

Submission history

From: Jacob Adamczyk
[v1] Wed, 15 Jan 2025 19:00:46 UTC (8,190 KB)
