Group Sequence Policy Optimization

Zheng, Chujie; Liu, Shixuan; Li, Mingze; Chen, Xiong-Hui; Yu, Bowen; Gao, Chang; Dang, Kai; Liu, Yuqiong; Men, Rui; Yang, An; Zhou, Jingren; Lin, Junyang

arXiv:2507.18071 (cs)

[Submitted on 24 Jul 2025 (v1), last revised 28 Jul 2025 (this version, v2)]

Abstract: This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. We demonstrate that GSPO achieves superior training efficiency and performance compared to the GRPO algorithm, notably stabilizes Mixture-of-Experts (MoE) RL training, and has the potential for simplifying the design of RL infrastructure. These merits of GSPO have contributed to the remarkable improvements in the latest Qwen3 models.
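The contrast the abstract draws with token-level importance ratios can be made concrete with a small sketch. The abstract only says that the ratio is defined on sequence likelihood and that clipping happens per sequence. The length normalization, the group-standardized advantages, and the clipping range of 0.2 used below are assumptions made for illustration, and all function names are hypothetical. This is not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def sequence_ratio(new_logps, old_logps):
    """Sequence-level importance ratio for one response.

    Inputs are per-token log-probabilities under the current and the old
    policy. Assumption: the ratio is length-normalized, i.e. it is the
    geometric mean of the per-token ratios.
    """
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("log-prob lists must be non-empty and equal length")
    delta = sum(n - o for n, o in zip(new_logps, old_logps))
    return math.exp(delta / len(new_logps))


def group_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same query.

    Assumption: a GRPO-style group baseline (mean and population std).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_sequence_objective(new_logps, old_logps, rewards, clip_eps=0.2):
    """PPO-style clipped surrogate with one ratio per sequence.

    new_logps / old_logps: one list of token log-probs per response.
    clip_eps is an illustrative value, not one taken from the paper.
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for new, old, adv in zip(new_logps, old_logps, advantages):
        ratio = sequence_ratio(new, old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The whole response is clipped or kept as a unit.
        total += min(ratio * adv, clipped * adv)
    return total / len(rewards)
```

Because a single ratio is computed per response, every token in that response shares the same clipping decision, which is the sequence-level behavior the abstract describes. A real training loop would compute this on framework tensors and maximize it by gradient ascent.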

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2507.18071 [cs.LG]
	(or arXiv:2507.18071v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.18071

Submission history

From: Chujie Zheng
[v1] Thu, 24 Jul 2025 03:50:32 UTC (259 KB)
[v2] Mon, 28 Jul 2025 11:11:33 UTC (259 KB)
