"Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations

Hou, Muhan; Hindriks, Koen; Eiben, A. E.; Baraka, Kim


arXiv:2406.03069v2 (cs)

[Submitted on 5 Jun 2024 (v1), revised 6 Jun 2024 (this version, v2), latest version 2 Oct 2024 (v3)]

Title: "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations

Authors: Muhan Hou, Koen Hindriks, A.E. Eiben, Kim Baraka

Abstract: Reinforcement Learning (RL) has achieved great success in sequential decision-making problems, but often at the cost of a large number of agent-environment interactions. To improve sample efficiency, methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process. In practice, these demonstrations are often collected from human users, which makes them costly and therefore limited in number. How to select the set of human demonstrations that is most beneficial for learning therefore becomes a major concern. This paper presents EARLY (Episodic Active Learning from demonstration querY), an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space. Based on a trajectory-level estimate of uncertainty in the agent's current policy, EARLY determines the optimized timing and content for feature-based queries. By querying episodic demonstrations as opposed to isolated state-action pairs, EARLY improves the human teaching experience and achieves better learning performance. We validate the effectiveness of our method in three simulated navigation tasks of increasing difficulty. The results show that our method is able to achieve expert-level performance for all three tasks with convergence over 30% faster than other baseline methods when demonstrations are generated by simulated oracle policies. The results of a follow-up pilot user study (N=18) further validate that our method can still maintain a significantly better convergence in the case of human expert demonstrators while achieving a better user experience in perceived task load and consuming significantly less human time.

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.03069 [cs.AI]
	(or arXiv:2406.03069v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2406.03069

Submission history

From: Muhan Hou
[v1] Wed, 5 Jun 2024 08:52:21 UTC (14,527 KB)
[v2] Thu, 6 Jun 2024 19:04:15 UTC (10,628 KB)
[v3] Wed, 2 Oct 2024 20:03:52 UTC (2,970 KB)
