GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

Zhou, Zhehua; Xie, Xuan; Song, Jiayang; Shu, Zhan; Ma, Lei

doi:10.1109/TNNLS.2024.3496492


arXiv:2406.03912 (cs)

[Submitted on 6 Jun 2024 (v1), last revised 14 Jan 2025 (this version, v2)]

Title: GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

Abstract: Safe Reinforcement Learning (SRL) aims to realize a safe learning process for Deep Reinforcement Learning (DRL) algorithms by incorporating safety constraints. However, the efficacy of SRL approaches often relies on accurate function approximations, which are notably challenging to achieve in the early learning stages due to data insufficiency. To address this issue, we introduce in this work a novel Generalizable Safety enhancer (GenSafe) that is able to overcome the challenge of data insufficiency and enhance the performance of SRL approaches. Leveraging model order reduction techniques, we first propose an innovative method to construct a Reduced Order Markov Decision Process (ROMDP) as a low-dimensional approximator of the original safety constraints. Then, by solving the reformulated ROMDP-based constraints, GenSafe refines the actions of the agent to increase the likelihood of constraint satisfaction. Essentially, GenSafe acts as an additional safety layer for SRL algorithms. We evaluate GenSafe on multiple SRL approaches and benchmark problems. The results demonstrate its capability to improve safety performance, especially in the early learning phases, while maintaining satisfactory task performance. Our proposed GenSafe not only offers a novel measure to augment existing SRL methods but also shows broad compatibility with various SRL algorithms, making it applicable to a wide range of systems and SRL problems.

Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Cite as:	arXiv:2406.03912 [cs.AI]
	(or arXiv:2406.03912v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2406.03912
Related DOI:	https://doi.org/10.1109/TNNLS.2024.3496492

Submission history

From: Zhehua Zhou
[v1] Thu, 6 Jun 2024 09:51:30 UTC (5,261 KB)
[v2] Tue, 14 Jan 2025 10:32:32 UTC (14,390 KB)
