Budget-Constrained Bandits over General Cost and Reward Distributions

Cayci, Semih; Eryilmaz, Atilla; Srikant, R.

Computer Science > Machine Learning

arXiv:2003.00365v1 (cs)

[Submitted on 29 Feb 2020]

Title: Budget-Constrained Bandits over General Cost and Reward Distributions

Authors: Semih Cayci, Atilla Eryilmaz, R. Srikant

Abstract: We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B>0$. In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.
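The abstract's remark about extracting common information via linear minimum mean-square error (LMMSE) estimation can be illustrated with a small sketch. It is not the paper's algorithm. It only shows the standard fact that, for a correlated cost-reward pair $(X, R)$, subtracting $\omega X$ with $\omega = \mathrm{Cov}(X, R)/\mathrm{Var}(X)$ leaves a residual whose variance is no larger than that of $R$. The function names, the Gaussian sampling model, and the parameter values below are all assumptions chosen for illustration.

```python
import random


def variance(values):
    """Population variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def lmmse_coefficient(costs, rewards):
    """Empirical LMMSE coefficient Cov(X, R) / Var(X) for cost X and reward R."""
    n = len(costs)
    mean_x = sum(costs) / n
    mean_r = sum(rewards) / n
    cov_xr = sum((x - mean_x) * (r - mean_r) for x, r in zip(costs, rewards)) / n
    return cov_xr / variance(costs)


def sample_arm(rng, n, mean_cost=1.0, mean_reward=0.5, rho=0.8):
    """Draw n correlated Gaussian (cost, reward) pairs (hypothetical arm model).

    Both components have unit variance and correlation rho, so individual
    samples can be negative, as the abstract allows.
    """
    costs, rewards = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        costs.append(mean_cost + z1)
        rewards.append(mean_reward + rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
    return costs, rewards


rng = random.Random(0)
costs, rewards = sample_arm(rng, 5000)
omega = lmmse_coefficient(costs, rewards)

# Residual after removing the part of the reward that is linearly explained by cost.
residuals = [r - omega * x for x, r in zip(costs, rewards)]

print("omega estimate:", omega)
print("reward variance:", variance(rewards))
print("residual variance:", variance(residuals))
```

In this toy model the true coefficient equals `rho`, and the residual variance is roughly `1 - rho**2` times the reward variance, which is why a stronger cost-reward correlation leaves less noise to average out.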

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2003.00365 [cs.LG]
	(or arXiv:2003.00365v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2003.00365

Submission history

From: Semih Cayci
[v1] Sat, 29 Feb 2020 23:50:08 UTC (65 KB)
