Harnessing the Power of Reinforcement Learning for Adaptive MCMC

Wang, Congye; Fisher, Matthew A.; Kanagawa, Heishiro; Chen, Wilson; Oates, Chris. J.


arXiv:2507.00671 (stat)

[Submitted on 1 July 2025]

Title: Harnessing the Power of Reinforcement Learning for Adaptive MCMC

Authors: Congye Wang, Matthew A. Fisher, Heishiro Kanagawa, Wilson Chen, Chris. J. Oates

Abstract: Sampling algorithms drive probabilistic machine learning, and recent years have seen an explosion in the diversity of tools for this task. However, the increasing sophistication of sampling algorithms is correlated with an increase in the tuning burden. There is now a greater need than ever to treat the tuning of samplers as a learning task in its own right. In a conceptual breakthrough, Wang et al. (2025) formulated Metropolis-Hastings as a Markov decision process, opening up the possibility for adaptive tuning using Reinforcement Learning (RL). Their emphasis was on theoretical foundations; realising the practical benefit of Reinforcement Learning Metropolis-Hastings (RLMH) was left for subsequent work. The purpose of this paper is twofold: First, we observe the surprising result that natural choices of reward, such as the acceptance rate, or the expected squared jump distance, provide insufficient signal for training RLMH. Instead, we propose a novel reward based on the contrastive divergence, whose superior performance in the context of RLMH is demonstrated. Second, we explore the potential of RLMH and present adaptive gradient-based samplers that balance flexibility of the Markov transition kernel with learnability of the associated RL task. A comprehensive simulation study using the posteriordb benchmark supports the practical effectiveness of RLMH.

Subjects:	Computation (stat.CO); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2507.00671 [stat.CO]
	(or arXiv:2507.00671v1 [stat.CO] for this version)
	https://doi.org/10.48550/arXiv.2507.00671

Submission history

From: Chris Oates
[v1] Tue, 1 Jul 2025 11:12:34 UTC (416 KB)
