Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents

Banker, Thomas; Mesbah, Ali

Computer Science > Machine Learning

arXiv:2507.13491 (cs)

[Submitted on 17 Jul 2025]

Title: Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents

Authors: Thomas Banker, Ali Mesbah

Abstract: Training sophisticated agents for optimal decision-making under uncertainty has been key to the rapid development of modern autonomous systems across fields. Notably, model-free reinforcement learning (RL) has enabled decision-making agents to improve their performance directly through system interactions, with minimal prior knowledge about the system. Yet, model-free RL has generally relied on agents equipped with deep neural network function approximators, appealing to the networks' expressivity to capture the agent's policy and value function for complex systems. However, neural networks amplify the issues of sample inefficiency, unsafe learning, and limited interpretability in model-free RL. To this end, this work introduces model-based agents as a compelling alternative for control policy approximation, leveraging adaptable models of system dynamics, cost, and constraints for safe policy learning. These models can encode prior system knowledge to inform, constrain, and aid in explaining the agent's decisions, while deficiencies due to model mismatch can be remedied with model-free RL. We outline the benefits and challenges of learning model-based agents -- exemplified by model predictive control -- and detail the primary learning approaches: Bayesian optimization, policy search RL, and offline strategies, along with their respective strengths. While model-free RL has long been established, its interplay with model-based agents remains largely unexplored, motivating our perspective on their combined potentials for sample-efficient learning of safe and interpretable decision-making agents.

Subjects:	Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as:	arXiv:2507.13491 [cs.LG]
 	(or arXiv:2507.13491v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.13491

Submission history

From: Thomas Banker
[v1] Thu, 17 Jul 2025 18:59:54 UTC (322 KB)
