Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

Tang, Ziyang; Feng, Yihao; Li, Lihong; Zhou, Dengyong; Liu, Qiang

arXiv:1910.07186v1 (cs)

[Submitted on 16 Oct 2019]

Authors: Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu

Abstract: Infinite-horizon off-policy policy evaluation is a highly challenging task due to the excessively large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018a) proposed an approach that significantly reduces the variance of infinite-horizon off-policy evaluation by estimating the stationary density ratio, but at the cost of introducing potentially high bias due to errors in density ratio estimation. In this paper, we develop a bias-reduced augmentation of their method, which can take advantage of a learned value function to obtain higher accuracy. Our method is doubly robust in that the bias vanishes when either the density ratio estimate or the value function estimate is perfect. More generally, when either of them is accurate, the bias is also reduced. Both theoretical and empirical results show that our method yields significant advantages over previous methods.
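The double robustness property described in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only: it assumes a tabular MDP, a normalized discounted reward R = (1 - gamma) * E[sum_t gamma^t r_t], and a doubly robust form R_DR = (1 - gamma) * E_{mu0}[V_hat(s)] + E_{d_b}[w_hat(s) * beta(a|s) * (r + gamma * V_hat(s') - V_hat(s))], where d_b is the behavior policy's discounted state occupancy and beta is the action probability ratio. This form is an assumption for illustration and is not spelled out in the abstract; all function and variable names are hypothetical, and expectations are computed exactly rather than from samples.

```python
import numpy as np

def occupancy_and_value(P, r, pi, mu0, gamma):
    """Exact discounted state occupancy and value function of policy pi."""
    n = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)      # state-to-state transitions under pi
    r_pi = (pi * r).sum(axis=1)                # expected one-step reward under pi
    A = np.eye(n) - gamma * P_pi
    V = np.linalg.solve(A, r_pi)               # V = (I - gamma P_pi)^{-1} r_pi
    d = (1 - gamma) * np.linalg.solve(A.T, mu0)  # normalized discounted occupancy
    return d, V, P_pi, r_pi

def doubly_robust_value(w_hat, V_hat, d_b, P_pi, r_pi, mu0, gamma):
    """Population version of the assumed doubly robust estimator.

    The action ratio beta(a|s) is already folded in: averaging
    beta(a|s) * f(s, a) over behavior actions equals averaging f(s, a)
    over target actions, which is what P_pi and r_pi encode.
    """
    td_residual = r_pi + gamma * P_pi @ V_hat - V_hat
    return (1 - gamma) * mu0 @ V_hat + (d_b * w_hat) @ td_residual

# Hypothetical random MDP with 4 states and 2 actions.
rng = np.random.default_rng(0)
n, m, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(n), size=(n, m))     # P[s, a, s'], shape (n, m, n)
r = rng.uniform(size=(n, m))                   # r[s, a]
pi = rng.dirichlet(np.ones(m), size=n)         # target policy
pi_b = rng.dirichlet(np.ones(m), size=n)       # behavior policy
mu0 = np.full(n, 1.0 / n)                      # initial state distribution

d_pi, V_pi, P_pi, r_pi = occupancy_and_value(P, r, pi, mu0, gamma)
d_b, _, _, _ = occupancy_and_value(P, r, pi_b, mu0, gamma)
w = d_pi / d_b                                 # true stationary density ratio
R_true = d_pi @ r_pi                           # true normalized reward of pi

# Deliberately corrupted estimates of the ratio and the value function.
w_bad = w * rng.uniform(0.5, 1.5, size=n)
V_bad = V_pi + rng.normal(size=n)

dr_exact_ratio = doubly_robust_value(w, V_bad, d_b, P_pi, r_pi, mu0, gamma)
dr_exact_value = doubly_robust_value(w_bad, V_pi, d_b, P_pi, r_pi, mu0, gamma)
dr_both_wrong = doubly_robust_value(w_bad, V_bad, d_b, P_pi, r_pi, mu0, gamma)

print("true value:            ", R_true)
print("exact ratio, bad value:", dr_exact_ratio)
print("bad ratio, exact value:", dr_exact_value)
print("both estimates wrong:  ", dr_both_wrong)
```

Under these assumptions, the first two estimates match the true value up to floating-point error, because the temporal-difference residual is zero when the value function is exact and the occupancy identity d_pi^T (I - gamma P_pi) = (1 - gamma) mu0^T cancels the value terms when the ratio is exact. When both estimates are wrong, no such cancellation is guaranteed.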

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1910.07186 [cs.LG]
	(or arXiv:1910.07186v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.07186

Submission history

From: Yihao Feng
[v1] Wed, 16 Oct 2019 06:33:17 UTC (164 KB)
