How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

Vuong, Quan; Vikram, Sharad; Su, Hao; Gao, Sicun; Christensen, Henrik I.


arXiv:1903.11774 (cs)

[Submitted on 28 Mar 2019]

Abstract: Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly to real systems, their performance on more complex systems remains bottlenecked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how best to pick the form and parameters of this distribution, and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world, and that the parameters of this distribution can be optimized to maximize the performance of the trained policies in the real world.
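The two ideas in the abstract (training a policy on a distribution of simulated environments, then tuning that distribution's parameters against real-world performance) can be sketched in a toy setting. This is a minimal illustration under loudly stated assumptions, not the authors' setup or method: the 1-D system, the single randomized gain drawn from a uniform distribution, the names `REAL_GAIN`, `rollout_cost`, `train_policy`, `real_world_return` and `optimize_randomization`, the grid search standing in for RL training, and the random-search outer loop are all hypothetical. The "real world" here is itself a simulator with a hidden gain; on hardware each call to `real_world_return` would be an expensive physical rollout.

```python
import random
from typing import List, Tuple

# Hypothetical toy setup (an assumption for illustration, not the paper's):
# a 1-D system x_{t+1} = x_t + g * u_t whose actuator gain g is randomized.
REAL_GAIN = 1.7   # hidden "real-world" gain; the optimizer only sees rollouts
HORIZON = 20
K_GRID: List[float] = [i / 100 for i in range(201)]  # candidate policies k in [0, 2]


def rollout_cost(k: float, gain: float) -> float:
    """Quadratic state cost of the linear policy u = -k * x, starting at x = 1."""
    x, cost = 1.0, 0.0
    for _ in range(HORIZON):
        cost += x * x
        x = x + gain * (-k * x)
    return cost


def train_policy(low: float, high: float, n_envs: int, rng: random.Random) -> float:
    """Stand-in for RL training in simulation: sample n_envs gains from
    Uniform(low, high) and pick the k with the lowest mean cost over them."""
    gains = [rng.uniform(low, high) for _ in range(n_envs)]
    return min(K_GRID, key=lambda k: sum(rollout_cost(k, g) for g in gains) / n_envs)


def real_world_return(k: float) -> float:
    """'Deploy' the policy on the hidden real system; higher is better."""
    return -rollout_cost(k, REAL_GAIN)


def optimize_randomization(n_trials: int = 30, seed: int = 0) -> Tuple[Tuple[float, float], float]:
    """Random search over the bounds (low, high) of the randomization
    distribution, scoring each candidate by the real-world return of the
    policy trained on it."""
    rng = random.Random(seed)
    best_bounds, best_ret = (0.0, 0.0), float("-inf")
    for _ in range(n_trials):
        low = rng.uniform(0.1, 3.0)
        high = rng.uniform(low, 3.0)
        k = train_policy(low, high, n_envs=16, rng=rng)
        ret = real_world_return(k)  # each call would be a real rollout in practice
        if ret > best_ret:
            best_bounds, best_ret = (low, high), ret
    return best_bounds, best_ret


if __name__ == "__main__":
    bounds, ret = optimize_randomization()
    print(f"best bounds: ({bounds[0]:.2f}, {bounds[1]:.2f}), real return: {ret:.3f}")
```

In this toy, training on Uniform(1.6, 1.8), which brackets the hidden gain, yields a policy whose real cost is close to its minimum of 1, whereas training on Uniform(0.2, 0.4) drives the gain choice to k = 2 and makes the real closed loop unstable. That contrast is a small-scale version of the abstract's claim that the choice of distribution matters. Random search is used only for brevity; any black-box optimizer could replace it in the outer loop.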

Comments:	2-page extended abstract
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1903.11774 [cs.LG]
	(or arXiv:1903.11774v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.11774

Submission history

From: Quan Vuong
[v1] Thu, 28 Mar 2019 03:24:44 UTC (151 KB)