
Computer Science > Machine Learning

arXiv:2510.16185 (cs)
[Submitted on 17 Oct 2025 (v1), last revised 21 Oct 2025 (this version, v2)]

Title: Expressive Reward Synthesis with the Runtime Monitoring Language

Authors: Daniel Donnelly, Angelo Ferrando, Francesco Belardinelli
Abstract: A key challenge in reinforcement learning (RL) is reward (mis)specification, whereby imprecisely defined reward functions can result in unintended, possibly harmful, behaviours. Indeed, reward functions in RL are typically treated as black-box mappings from state-action pairs to scalar values. While effective in many settings, this approach provides no information about why rewards are given, which can hinder learning and interpretability. Reward Machines address this issue by representing reward functions as finite state automata, enabling the specification of structured, non-Markovian reward functions. However, their expressivity is typically bounded by regular languages, leaving them unable to capture more complex behaviours such as counting or parametrised conditions. In this work, we build on the Runtime Monitoring Language (RML) to develop a novel class of language-based Reward Machines. By leveraging the built-in memory of RML, our approach can specify reward functions for non-regular, non-Markovian tasks. We demonstrate the expressiveness of our approach through experiments, highlighting additional advantages in flexible event-handling and task specification over existing Reward Machine-based methods.
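To make the expressivity gap concrete, here is a minimal illustrative sketch (not the paper's code; all class names, events, and the pickup/dropoff task are hypothetical): a classic Reward Machine as a finite-state automaton, alongside a counter-augmented variant showing the kind of non-regular, counting-style task that memory such as RML's can express but a finite automaton cannot.

```python
class RewardMachine:
    """Classic Reward Machine: a finite-state automaton whose
    transitions emit scalar rewards. Expressivity is bounded by
    regular languages."""
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        # Unlisted (state, event) pairs self-loop with zero reward.
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        return reward

# Regular task "visit A, then B": two states suffice.
rm = RewardMachine({("u0", "A"): ("u1", 0.0),
                    ("u1", "B"): ("u_acc", 1.0)}, "u0")

class CountingRewardMachine:
    """Adds an unbounded counter, standing in for the built-in
    memory that RML-style specifications provide. The task
    'reward once every pickup has a matching dropoff' is
    non-regular, so no finite-state Reward Machine captures it."""
    def __init__(self):
        self.count = 0

    def step(self, event):
        if event == "pickup":
            self.count += 1
            return 0.0
        if event == "dropoff" and self.count > 0:
            self.count -= 1
            # Reward only when all pickups are matched.
            return 1.0 if self.count == 0 else 0.0
        return 0.0

crm = CountingRewardMachine()
rewards = [crm.step(e) for e in ["pickup", "pickup", "dropoff", "dropoff"]]
# Reward arrives only on the final, matching dropoff.
```

The counter here is a stand-in for RML's parametrised, stateful event handling; the paper's actual construction compiles RML specifications into reward monitors rather than hand-written Python classes.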
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Machine Learning (stat.ML)
Cite as: arXiv:2510.16185 [cs.LG]
  (or arXiv:2510.16185v2 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2510.16185
arXiv-issued DOI via DataCite

Submission history

From: Francesco Belardinelli
[v1] Fri, 17 Oct 2025 19:54:59 UTC (198 KB)
[v2] Tue, 21 Oct 2025 10:04:30 UTC (200 KB)
