A random measure approach to reinforcement learning in continuous time

Bender, Christian; Thuan, Nguyen Tran


arXiv:2409.17200 (cs)

[Submitted on 25 Sep 2024]



Abstract: We present a random measure approach for modeling exploration, i.e., the execution of measure-valued controls, in continuous-time reinforcement learning (RL) with controlled diffusion and jumps. First, we consider the case when sampling the randomized control in continuous time takes place on a discrete-time grid, and we reformulate the resulting stochastic differential equation (SDE) as an equation driven by suitable random measures. The construction of these random measures makes use of the Brownian motion and the Poisson random measure (the sources of noise in the original model dynamics) as well as the additional random variables that are sampled on the grid for the control execution. Then, we prove a limit theorem for these random measures as the mesh size of the sampling grid goes to zero, which leads to the grid-sampling limit SDE, jointly driven by white noise random measures and a Poisson random measure. We also argue that the grid-sampling limit SDE can substitute for the exploratory SDE and the sample SDE of the recent continuous-time RL literature, i.e., it can be used both for the theoretical analysis of exploratory control problems and for the derivation of learning algorithms.
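To make the grid-sampling mechanism concrete, here is a minimal Python sketch that is not taken from the paper: a hypothetical one-dimensional controlled SDE with drift b, diffusion coefficient sigma, and jump coefficient gamma, simulated by an Euler scheme in which the action is drawn from a policy pi only at the points of a sampling grid of mesh h and held constant in between. All names (`simulate_grid_sampled`, `gaussian_policy`, `terminal_variance`), the Gaussian policy N(0, 1), the N(0, 1) jump marks, and the at-most-one-jump-per-step approximation are illustrative assumptions.

```python
import math
import random
import statistics


def simulate_grid_sampled(x0, T, h, dt, b, sigma, gamma, jump_rate, sample_action, rng):
    """Euler scheme for the hypothetical 1-D controlled SDE
        dX = b(X, u) dt + sigma(X, u) dW + gamma(X, u, z) N(dt, dz),
    where the action u is drawn from the policy only on the sampling grid
    0, h, 2h, ... and held constant until the next grid point.
    Assumes T is a multiple of h and h is a multiple of dt."""
    steps_per_sample = round(h / dt)
    n_samples = round(T / h)
    # Approximation: at most one jump per Euler step, with probability 1 - exp(-rate * dt).
    p_jump = 1.0 - math.exp(-jump_rate * dt)
    x = x0
    for _ in range(n_samples):
        u = sample_action(x, rng)  # fresh randomization of the control at each grid point
        for _ in range(steps_per_sample):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
            # Jump mark z ~ N(0, 1) (hypothetical choice of mark distribution).
            jump = gamma(x, u, rng.gauss(0.0, 1.0)) if rng.random() < p_jump else 0.0
            x = x + b(x, u) * dt + sigma(x, u) * dw + jump
    return x


def gaussian_policy(x, rng):
    # Hypothetical state-independent policy pi = N(0, 1).
    return rng.gauss(0.0, 1.0)


def no_jump(x, u, z):
    return 0.0


def terminal_variance(h, b, sigma, T=1.0, n_paths=2000, seed=0):
    """Monte Carlo estimate of Var(X_T) for sampling mesh h (jumps switched off)."""
    rng = random.Random(seed)
    dt = min(h, 0.01)  # Euler step never coarser than the sampling grid
    xs = [
        simulate_grid_sampled(0.0, T, h, dt, b, sigma, no_jump, 0.0, gaussian_policy, rng)
        for _ in range(n_paths)
    ]
    return statistics.pvariance(xs)


if __name__ == "__main__":
    drift_only = (lambda x, u: u, lambda x, u: 0.0)      # b(x, u) = u, sigma = 0
    diffusion_only = (lambda x, u: 0.0, lambda x, u: u)  # b = 0, sigma(x, u) = u
    for h in (0.1, 0.01):
        print(f"h={h}: drift-only Var(X_T) ~ {terminal_variance(h, *drift_only):.4f}, "
              f"diffusion-only Var(X_T) ~ {terminal_variance(h, *diffusion_only):.4f}")
```

In this toy model the drift-only case has Var(X_T) = T·h exactly (a sum of T/h independent terms u_k·h with E[u_k²] = 1), so the randomness injected by sampling the control disappears as h → 0 and only the averaged drift ∫ u π(du) = 0 survives, while the diffusion-only case has Var(X_T) = T·∫ u² π(du) = T for every h. This averaging of b and of sigma² over π matches the coefficients of the exploratory SDE as it is commonly formulated in the continuous-time RL literature; the sketch only illustrates the effect numerically and does not implement the paper's random measure construction or its grid-sampling limit SDE.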

Comments:	33 pages
Subjects:	Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
MSC classes:	Primary: 60G57, Secondary: 28A33, 60H10, 93B52, 93E35
Cite as:	arXiv:2409.17200 [cs.LG]
	(or arXiv:2409.17200v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.17200

Submission history

From: Thuan Nguyen
[v1] Wed, 25 Sep 2024 14:34:09 UTC (38 KB)
