Tight Regret Upper and Lower Bounds for Optimistic Hedge in Two-Player Zero-Sum Games

Tsuchiya, Taira


arXiv:2510.11691 (cs)

[Submitted on 13 Oct 2025]


Authors: Taira Tsuchiya


Abstract: In two-player zero-sum games, the learning dynamic based on optimistic Hedge achieves one of the best-known regret upper bounds among strongly-uncoupled learning dynamics. With an appropriately chosen learning rate, the social and individual regrets can be bounded by $O(\log(mn))$, where $m$ and $n$ are the numbers of actions of the two players. This study investigates whether this dependence on $m$ and $n$ is optimal for optimistic Hedge. To this end, we first refine the existing regret analysis and show that, in the strongly-uncoupled setting where the opponent's number of actions is known, both the social and individual regret bounds can be improved to $O(\sqrt{\log m \log n})$. In this analysis, we express the regret upper bound as an optimization problem over the learning rates and the coefficients of certain negative terms, which enables a refined analysis of the leading constants. We then show, by providing algorithm-dependent individual regret lower bounds, that neither the existing social regret bound nor these new social and individual regret upper bounds can be further improved for optimistic Hedge. Importantly, these social regret upper and lower bounds match exactly, including the constant factor in the leading term. Finally, building on these results, we improve the last-iterate convergence rate and the dynamic regret of a learning dynamic based on optimistic Hedge, and complement these bounds with algorithm-dependent dynamic regret lower bounds that match the improved bounds.
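
To make the setting in the abstract concrete, here is a minimal sketch of the optimistic Hedge learning dynamic in self-play, written under stated assumptions rather than as the paper's algorithm. Assumptions: optimistic Hedge is implemented in its optimistic FTRL form with the entropy regularizer, where each player's prediction of the next loss vector is the previous loss vector; the row player minimizes $x^\top A y$ and the column player minimizes $-x^\top A y$; entries of $A$ lie in $[-1, 1]$. The function names `neg_softmax` and `optimistic_hedge_selfplay`, the payoff matrix, and the learning rates `eta_x`, `eta_y` are hypothetical illustrations. The paper tunes the learning rates using $m$ and $n$; those tuned values are not reproduced here.

```python
import math


def neg_softmax(scores, eta):
    """Return the distribution p(i) proportional to exp(-eta * scores[i]).

    Scores are shifted by their minimum first so every exponent is <= 0,
    which avoids overflow when cumulative losses grow with T.
    """
    lo = min(scores)
    weights = [math.exp(-eta * (s - lo)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_hedge_selfplay(A, T, eta_x, eta_y):
    """Run T rounds of optimistic Hedge for both players of the zero-sum game A.

    The row player picks x and suffers x^T A y; the column player picks y and
    suffers -x^T A y. Each player's prediction of the next loss vector is the
    loss vector it observed in the previous round (zero in the first round).
    Returns the individual regrets (regret_x, regret_y) against the best fixed
    action in hindsight.
    """
    m, n = len(A), len(A[0])
    cum_lx, cum_ly = [0.0] * m, [0.0] * n    # cumulative loss vectors
    prev_lx, prev_ly = [0.0] * m, [0.0] * n  # optimistic predictions
    loss_x = loss_y = 0.0                    # expected losses actually incurred
    for _ in range(T):
        # Optimistic FTRL with entropy: weights exp(-eta * (cumulative + predicted)).
        x = neg_softmax([c + p for c, p in zip(cum_lx, prev_lx)], eta_x)
        y = neg_softmax([c + p for c, p in zip(cum_ly, prev_ly)], eta_y)
        # Row player's loss vector is A y; column player's is -A^T x.
        lx = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        ly = [-sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        loss_x += sum(xi * li for xi, li in zip(x, lx))
        loss_y += sum(yj * lj for yj, lj in zip(y, ly))
        cum_lx = [c + l for c, l in zip(cum_lx, lx)]
        cum_ly = [c + l for c, l in zip(cum_ly, ly)]
        prev_lx, prev_ly = lx, ly
    return loss_x - min(cum_lx), loss_y - min(cum_ly)


if __name__ == "__main__":
    # Hypothetical 2x3 payoff matrix with entries in [-1, 1] (illustration only).
    A = [[0.8, -0.3, 0.1],
         [-0.6, 0.4, 0.2]]
    # Placeholder learning rates; the paper's tuned choices are not reproduced here.
    rx, ry = optimistic_hedge_selfplay(A, T=2000, eta_x=0.1, eta_y=0.1)
    print(f"individual regrets: {rx:.3f}, {ry:.3f}; social regret: {rx + ry:.3f}")
```

In this sketch the social regret is taken as the sum of the two individual regrets. Because the game is zero-sum, the players' incurred losses cancel each round, so the social regret equals $T$ times the duality gap of the average strategies and is therefore nonnegative. Individual regrets can be negative.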

Comments:	29 pages, 2 figures
Subjects:	Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Cite as:	arXiv:2510.11691 [cs.LG]
	(or arXiv:2510.11691v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2510.11691

Submission history

From: Taira Tsuchiya
[v1] Mon, 13 Oct 2025 17:52:01 UTC (91 KB)