Generalized Linear Markov Decision Process

Zhang, Sinian; Zhang, Kaicheng; Xu, Ziping; Cai, Tianxi; Zhou, Doudou

Statistics > Machine Learning

arXiv:2506.00818 (stat)

[Submitted on 1 Jun 2025]

Title: Generalized Linear Markov Decision Process


Authors: Sinian Zhang, Kaicheng Zhang, Ziping Xu, Tianxi Cai, Doudou Zhou

Abstract: The linear Markov Decision Process (MDP) framework offers a principled foundation for reinforcement learning (RL) with strong theoretical guarantees and sample efficiency. However, its restrictive assumption, that both transition dynamics and reward functions are linear in the same feature space, limits its applicability in real-world domains, where rewards often exhibit nonlinear or discrete structures. Motivated by applications such as healthcare and e-commerce, where data is scarce and reward signals can be binary or count-valued, we propose the Generalized Linear MDP (GLMDP) framework, an extension of the linear MDP framework that models rewards using generalized linear models (GLMs) while maintaining linear transition dynamics. We establish the Bellman completeness of GLMDPs with respect to a new function class that accommodates nonlinear rewards, and we develop two offline RL algorithms: Generalized Pessimistic Value Iteration (GPEVI) and a semi-supervised variant (SS-GPEVI) that utilizes both labeled and unlabeled trajectories. Our algorithms come with theoretical guarantees on policy suboptimality and demonstrate improved sample efficiency in settings where reward labels are expensive or limited.
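Read literally, the abstract describes a model in which the mean reward is a GLM in the features, E[r | s, a] = g(phi(s, a)^T theta) for a link function g, while the transition kernel stays linear, P(s' | s, a) = phi(s, a)^T mu(s'). Under linear transitions, E[V(s') | s, a] = phi(s, a)^T w with w = integral of V(s') mu(s') ds', so a Bellman backup lands in functions of the form g(phi^T theta) + phi^T w, which is presumably why a new function class is needed. What follows is a minimal, hypothetical single-step pessimistic backup under these assumptions: a logistic link (binary rewards), ridge penalties, and a lower confidence bound built from two elliptical bonuses. It is not the authors' GPEVI or SS-GPEVI, and every name in it (fit_logistic_reward, pessimistic_q, beta_r, beta_p, and so on) is illustrative. It also assumes, as one plausible use of unlabeled trajectories, that reward-free transitions enter only the transition regression, since its target V_{h+1}(s') needs no reward label.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_reward(Phi, r, lam=1.0, iters=100, tol=1e-10):
    """Ridge-penalized logistic regression fitted by Newton's method.

    Assumed reward model (logistic link): E[r | s, a] = sigmoid(phi(s, a) @ theta).
    """
    d = Phi.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(Phi @ theta)
        grad = Phi.T @ (p - r) + lam * theta
        hess = Phi.T @ (Phi * (p * (1.0 - p))[:, None]) + lam * np.eye(d)
        step = np.linalg.solve(hess, grad)
        theta -= step
        if np.linalg.norm(step) < tol:
            break
    return theta


def gram(Phi, lam=1.0):
    """Regularized Gram matrix Lambda = Phi^T Phi + lam * I."""
    return Phi.T @ Phi + lam * np.eye(Phi.shape[1])


def elliptical_norm(Phi_q, Lam):
    """Row-wise ||phi||_{Lambda^{-1}} = sqrt(phi^T Lambda^{-1} phi)."""
    sol = np.linalg.solve(Lam, Phi_q.T)  # shape (d, m)
    return np.sqrt(np.sum(Phi_q.T * sol, axis=0))


def pessimistic_q(Phi_q, Phi_lab, r_lab, Phi_all, v_next_all,
                  beta_r=1.0, beta_p=1.0, lam=1.0, q_max=None):
    """Hypothetical pessimistic backup: GLM reward + linear transition term - bonuses.

    Reward part: logistic GLM fitted on labeled rows only.
    Transition part: ridge regression of V_{h+1}(s') on phi over labeled AND
    unlabeled rows (an assumption about how reward-free data could be used).
    """
    theta = fit_logistic_reward(Phi_lab, r_lab, lam)
    Lam_all = gram(Phi_all, lam)
    w = np.linalg.solve(Lam_all, Phi_all.T @ v_next_all)
    bonus = (beta_r * elliptical_norm(Phi_q, gram(Phi_lab, lam))
             + beta_p * elliptical_norm(Phi_q, Lam_all))
    q = sigmoid(Phi_q @ theta) + Phi_q @ w - bonus
    return np.clip(q, 0.0, q_max)  # keep the estimate in a valid value range


# Synthetic illustration (all data here is simulated, not from the paper).
rng = np.random.default_rng(0)
d, n_lab, n_unl = 4, 200, 2000
theta_true = np.array([1.0, -0.5, 0.3, 0.0])
Phi_lab = rng.normal(size=(n_lab, d))
r_lab = rng.binomial(1, sigmoid(Phi_lab @ theta_true)).astype(float)
Phi_unl = rng.normal(size=(n_unl, d))  # transitions without reward labels
Phi_all = np.vstack([Phi_lab, Phi_unl])
v_next_all = rng.uniform(0.0, 1.0, size=n_lab + n_unl)  # stand-in for V_{h+1}(s')
Phi_q = rng.normal(size=(5, d))
q = pessimistic_q(Phi_q, Phi_lab, r_lab, Phi_all, v_next_all, q_max=2.0)
print(q)
```

Because the unlabeled rows only add Phi_unl^T Phi_unl to the Gram matrix, the transition bonus computed from all rows can never exceed the one computed from labeled rows alone; in this sketch, that is the mechanism by which extra reward-free data would tighten the pessimism.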


Comments:	34 pages, 9 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2506.00818 [stat.ML]
	(or arXiv:2506.00818v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2506.00818

Submission history

From: Sinian Zhang
[v1] Sun, 1 Jun 2025 03:50:41 UTC (1,181 KB)
