Statistics > Machine Learning

arXiv:2506.00818 (stat)
[Submitted on 1 Jun 2025]

Title: Generalized Linear Markov Decision Process

Authors: Sinian Zhang, Kaicheng Zhang, Ziping Xu, Tianxi Cai, Doudou Zhou
Abstract: The linear Markov Decision Process (MDP) framework offers a principled foundation for reinforcement learning (RL) with strong theoretical guarantees and sample efficiency. However, its restrictive assumption that both transition dynamics and reward functions are linear in the same feature space limits its applicability in real-world domains, where rewards often exhibit nonlinear or discrete structures. Motivated by applications such as healthcare and e-commerce, where data are scarce and reward signals can be binary or count-valued, we propose the Generalized Linear MDP (GLMDP) framework, an extension of the linear MDP framework that models rewards using generalized linear models (GLMs) while maintaining linear transition dynamics. We establish the Bellman completeness of GLMDPs with respect to a new function class that accommodates nonlinear rewards, and we develop two offline RL algorithms: Generalized Pessimistic Value Iteration (GPEVI) and a semi-supervised variant (SS-GPEVI) that uses both labeled and unlabeled trajectories. Our algorithms achieve theoretical guarantees on policy suboptimality and demonstrate improved sample efficiency in settings where reward labels are expensive or limited.
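As a rough illustration of the modeling idea in the abstract (not the authors' code), the sketch below combines a GLM reward with a linear transition term. It assumes a logistic link for a binary reward; the names `glm_q_value`, `phi`, `theta`, and `w` are hypothetical and do not come from the paper. With linear transitions, the expected next-step value is linear in the features, so an action value takes the form g(phi^T theta) + phi^T w.

```python
import numpy as np


def sigmoid(z):
    """Inverse logit link, used here as an assumed GLM link for binary rewards."""
    return 1.0 / (1.0 + np.exp(-z))


def glm_q_value(phi, theta, w, link_inverse=sigmoid):
    """Action value for one feature vector phi (all names are illustrative).

    The reward mean is a GLM in the features, link_inverse(phi @ theta);
    the expected next-step value is linear in the same features, phi @ w.
    """
    return link_inverse(phi @ theta) + phi @ w


# Toy numbers chosen only to make the arithmetic easy to follow.
phi = np.array([1.0, 0.0])     # feature vector of a state-action pair
theta = np.array([0.0, 2.0])   # reward parameter: phi @ theta = 0, so mean reward = 0.5
w = np.array([0.5, -1.0])      # next-value parameter: phi @ w = 0.5
q = glm_q_value(phi, theta, w)
print(q)  # 0.5 + 0.5 = 1.0
```

Swapping `link_inverse` for `np.exp` would give a Poisson-style model for count-valued rewards under the same assumptions.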
Comments: 34 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2506.00818 [stat.ML]
  (or arXiv:2506.00818v1 [stat.ML] for this version)
  https://doi.org/10.48550/arXiv.2506.00818
arXiv-issued DOI via DataCite

Submission history

From: Sinian Zhang
[v1] Sun, 1 Jun 2025 03:50:41 UTC (1,181 KB)