Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Touati, Ahmed; Taiga, Adrien Ali; Bellemare, Marc G.


arXiv:2003.04069v1 (cs)

[Submitted on 9 Mar 2020]

Title: Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Authors: Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare

Abstract: Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity between different states and actions. We propose ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space by zooming in on more promising and frequently visited regions while carefully balancing the exploitation-exploration trade-off. We show that ZoomRL achieves a worst-case regret of $\tilde{O}(H^{\frac{5}{2}} K^{\frac{d+1}{d+2}})$, where $H$ is the planning horizon, $K$ is the number of episodes and $d$ is the covering dimension of the space with respect to the metric. Moreover, our algorithm enjoys improved metric-dependent guarantees that reflect the geometry of the underlying space. Finally, we show that our algorithm is robust to small misspecification errors.
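To make the stated bound concrete, the following is a minimal sketch that evaluates the exponent $\frac{d+1}{d+2}$ and the rate $H^{\frac{5}{2}} K^{\frac{d+1}{d+2}}$ for a few covering dimensions. It is an illustration of the formula in the abstract only, not part of ZoomRL: the function names `regret_exponent` and `regret_rate` are hypothetical, and the constants and logarithmic factors hidden by $\tilde{O}$ are ignored.

```python
from fractions import Fraction


def regret_exponent(d: int) -> Fraction:
    """Exponent of K in the worst-case bound: (d + 1) / (d + 2)."""
    if d < 0:
        raise ValueError("covering dimension must be non-negative")
    return Fraction(d + 1, d + 2)


def regret_rate(H: int, K: int, d: int) -> float:
    """H^(5/2) * K^((d+1)/(d+2)), ignoring constants and log factors."""
    return H ** 2.5 * K ** float(regret_exponent(d))


if __name__ == "__main__":
    # Illustrative values only; H and K here are arbitrary assumptions.
    H, K = 10, 10**6
    for d in (0, 1, 2, 4, 8):
        exponent = regret_exponent(d)
        per_episode = regret_rate(H, K, d) / K
        print(f"d={d}: exponent={exponent}, rate per episode={per_episode:.3g}")
```

The exponent equals 1/2 at $d = 0$ and approaches 1 as $d$ grows, while staying strictly below 1 for every finite $d$. The bound is therefore sublinear in $K$, but the rate degrades as the covering dimension increases.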

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2003.04069 [cs.LG]
	(or arXiv:2003.04069v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2003.04069

Submission history

From: Ahmed Touati
[v1] Mon, 9 Mar 2020 12:32:02 UTC (1,270 KB)
