TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

Amiranashvili, Artemij; Dosovitskiy, Alexey; Koltun, Vladlen; Brox, Thomas

arXiv:1806.01175v1 (cs)

[Submitted on 4 Jun 2018]

Abstract: Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators. These results suggest that RL methods that use temporal differencing (TD) are superior to direct Monte Carlo estimation (MC). How do these results hold up in deep RL, which deals with perceptually complex environments and deep nonlinear models? In this paper, we re-examine the role of TD in modern deep RL, using specially designed environments that control for specific factors that affect performance, such as reward sparsity, reward delay, and the perceptual complexity of the task. When comparing TD with infinite-horizon MC, we are able to reproduce classic results in modern settings. Yet we also find that finite-horizon MC is not inferior to TD, even when rewards are sparse or delayed. This makes MC a viable alternative to TD in deep RL.
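The abstract compares three kinds of value-learning targets: one-step TD, infinite-horizon MC, and finite-horizon MC. The sketch below uses the standard textbook definitions and is not the authors' implementation; in the paper these targets are used to train deep networks. The function names `td0_targets` and `mc_returns` and the `horizon` parameter are hypothetical. TD bootstraps from a current value estimate of the next state. MC sums observed discounted rewards, either to the end of the episode or over a fixed number of future steps.

```python
from typing import List, Optional


def td0_targets(rewards: List[float], values: List[float], gamma: float) -> List[float]:
    """One-step TD targets r_t + gamma * V(s_{t+1}) (hypothetical helper).

    `values[t]` is the current estimate V(s_t); it has one more entry than
    `rewards`, and the last entry is the bootstrap value of the final state
    (0.0 if that state is terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [r + gamma * values[t + 1] for t, r in enumerate(rewards)]


def mc_returns(rewards: List[float], gamma: float,
               horizon: Optional[int] = None) -> List[float]:
    """Monte Carlo return targets (hypothetical helper).

    horizon=None gives the infinite-horizon (episode-to-end) return
    sum_{k>=t} gamma^(k-t) r_k. A positive integer horizon truncates the sum
    to the next `horizon` rewards, giving a finite-horizon return.
    """
    if horizon is not None and horizon <= 0:
        raise ValueError("horizon must be positive or None")
    n = len(rewards)
    targets = []
    for t in range(n):
        end = n if horizon is None else min(n, t + horizon)
        g = 0.0
        # Accumulate backwards: g = r_k + gamma * g for k = end-1 down to t.
        for k in range(end - 1, t - 1, -1):
            g = rewards[k] + gamma * g
        targets.append(g)
    return targets


rewards = [0, 0, 1]  # a sparse reward that arrives only at the last step
print(mc_returns(rewards, gamma=0.5))             # → [0.25, 0.5, 1.0]
print(mc_returns(rewards, gamma=0.5, horizon=2))  # → [0.0, 0.5, 1.0]
print(td0_targets(rewards, [0.0, 0.5, 0.25, 0.0], gamma=0.5))  # → [0.25, 0.125, 1.0]
```

The MC targets depend only on observed rewards. The TD targets instead inherit whatever error the current value estimates contain. With a short horizon, the finite-horizon return at t = 0 does not see the delayed reward at all. This illustrates why the choice of target matters when rewards are sparse or delayed.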

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1806.01175 [cs.LG]
	(or arXiv:1806.01175v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1806.01175

Submission history

From: Artemij Amiranashvili
[v1] Mon, 4 Jun 2018 16:16:51 UTC (1,119 KB)