
Computer Science > Machine Learning

arXiv:2510.16185 (cs)
[Submitted on 17 Oct 2025 (v1), last revised 21 Oct 2025 (this version, v2)]

Title: Expressive Reward Synthesis with the Runtime Monitoring Language

Authors: Daniel Donnelly, Angelo Ferrando, Francesco Belardinelli
Abstract: A key challenge in reinforcement learning (RL) is reward (mis)specification, whereby imprecisely defined reward functions can result in unintended, possibly harmful, behaviours. Indeed, reward functions in RL are typically treated as black-box mappings from state-action pairs to scalar values. While effective in many settings, this approach provides no information about why rewards are given, which can hinder learning and interpretability. Reward Machines address this issue by representing reward functions as finite state automata, enabling the specification of structured, non-Markovian reward functions. However, their expressivity is typically bounded by regular languages, leaving them unable to capture more complex behaviours such as counting or parametrised conditions. In this work, we build on the Runtime Monitoring Language (RML) to develop a novel class of language-based Reward Machines. By leveraging the built-in memory of RML, our approach can specify reward functions for non-regular, non-Markovian tasks. We demonstrate the expressiveness of our approach through experiments, highlighting additional advantages in flexible event-handling and task specification over existing Reward Machine-based methods.
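To make the expressivity gap concrete, here is a minimal illustrative sketch (not the paper's code; all class names, events, and the pickup/dropoff task are hypothetical): a classic Reward Machine as a finite-state automaton, alongside a counter-augmented variant showing the kind of non-regular, counting-style task that memory such as RML's can express but a finite automaton cannot.

```python
class RewardMachine:
    """Classic Reward Machine: a finite-state automaton whose
    transitions emit scalar rewards. Expressivity is bounded by
    regular languages."""
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        # Unlisted (state, event) pairs self-loop with zero reward.
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        return reward

# Regular task "visit A, then B": two states suffice.
rm = RewardMachine({("u0", "A"): ("u1", 0.0),
                    ("u1", "B"): ("u_acc", 1.0)}, "u0")

class CountingRewardMachine:
    """Adds an unbounded counter, standing in for the built-in
    memory that RML-style specifications provide. The task
    'reward once every pickup has a matching dropoff' is
    non-regular, so no finite-state Reward Machine captures it."""
    def __init__(self):
        self.count = 0

    def step(self, event):
        if event == "pickup":
            self.count += 1
            return 0.0
        if event == "dropoff" and self.count > 0:
            self.count -= 1
            # Reward only when all pickups are matched.
            return 1.0 if self.count == 0 else 0.0
        return 0.0

crm = CountingRewardMachine()
rewards = [crm.step(e) for e in ["pickup", "pickup", "dropoff", "dropoff"]]
# Reward arrives only on the final, matching dropoff.
```

The counter here is a stand-in for RML's parametrised, stateful event handling; the paper's actual construction compiles RML specifications into reward monitors rather than hand-written Python classes.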
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Machine Learning (stat.ML)
Cite as: arXiv:2510.16185 [cs.LG]
  (or arXiv:2510.16185v2 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2510.16185
arXiv-issued DOI via DataCite

Submission history

From: Francesco Belardinelli
[v1] Fri, 17 Oct 2025 19:54:59 UTC (198 KB)
[v2] Tue, 21 Oct 2025 10:04:30 UTC (200 KB)
