Robust No-Regret Learning in Min-Max Stackelberg Games

Goktas, Denizalp; Zhao, Jiayi; Greenwald, Amy

Computer Science > Computer Science and Game Theory

arXiv:2203.14126 (cs)

[Submitted on 26 Mar 2022 (v1), last revised 13 Apr 2022 (this version, v2)]

Abstract: The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e., zero-sum) games. In this paper, we investigate the behavior of no-regret learning in min-max games with dependent strategy sets, where the strategy of the first player constrains the behavior of the second. Such games are best understood as sequential, i.e., min-max Stackelberg, games. We consider two settings, one in which only the first player chooses their actions using a no-regret algorithm while the second player best responds, and one in which both players use no-regret algorithms. For the former case, we show that no-regret dynamics converge to a Stackelberg equilibrium. For the latter case, we introduce a new type of regret, which we call Lagrangian regret, and show that if both players minimize their Lagrangian regrets, then play converges to a Stackelberg equilibrium. We then observe that online mirror descent (OMD) dynamics in these two settings correspond respectively to a known nested (i.e., sequential) gradient descent-ascent (GDA) algorithm and a new simultaneous GDA-like algorithm, thereby establishing convergence of these algorithms to Stackelberg equilibrium. Finally, we analyze the robustness of OMD dynamics to perturbations by investigating online min-max Stackelberg games. We prove that OMD dynamics are robust for a large class of online min-max games with independent strategy sets. In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy sets.
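
For orientation, the setting the abstract describes can be written out as below. This is only a sketch in notation assumed here, not taken from the abstract: $f$ stands for the payoff, $X$ and $Y$ for the two players' strategy sets, and $g$ for a hypothetical constraint function through which the first player's choice restricts the second player's feasible strategies.

```latex
% Min-max Stackelberg game with dependent strategy sets (assumed notation):
% the outer player commits to x, then the inner player chooses y
% from a feasible set that depends on x through the constraint g.
\min_{x \in X} \; \max_{y \in Y \,:\, g(x, y) \ge 0} f(x, y)

% Independent strategy sets are the special case where the constraint
% never binds (for example g \equiv 0), recovering the ordinary min-max game:
\min_{x \in X} \; \max_{y \in Y} f(x, y)

% One natural Lagrangian for the inner player's problem,
% with nonnegative multipliers \lambda_k (an assumption of this sketch):
L_x(y, \lambda) = f(x, y) + \sum_{k} \lambda_k \, g_k(x, y), \qquad \lambda \ge 0
```

The "Lagrangian regret" named in the abstract is presumably measured against a Lagrangian of this general kind; the precise definition is given in the paper itself.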

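Comments:	15 pages, 1 figure, 2 tables, 6 algorithms; to appear in AAMAS'22. arXiv admin note: text overlap with arXiv:2110.05192
Subjects:	Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
Cite as:	arXiv:2203.14126 [cs.GT]
	(or arXiv:2203.14126v2 [cs.GT] for this version)
	https://doi.org/10.48550/arXiv.2203.14126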

Submission history

From: Denizalp Goktas
[v1] Sat, 26 Mar 2022 18:12:40 UTC (2,278 KB)
[v2] Wed, 13 Apr 2022 20:44:18 UTC (2,279 KB)
