Think before Recommendation: Autonomous Reasoning-enhanced Recommender

Kong, Xiaoyu; Jiang, Junguang; Liu, Bin; Xu, Ziru; Zhu, Han; Xu, Jian; Zheng, Bo; Wu, Jiancan; Wang, Xiang

Computer Science > Information Retrieval

arXiv:2510.23077 (cs)

[Submitted on 27 Oct 2025]

Authors: Xiaoyu Kong, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Jiancan Wu, Xiang Wang

Abstract: The core task of recommender systems is to learn user preferences from historical user-item interactions. With the rapid development of large language models (LLMs), recent research has explored leveraging the reasoning capabilities of LLMs to enhance rating prediction tasks. However, existing distillation-based methods suffer from limitations such as the teacher model's insufficient recommendation capability, costly and static supervision, and superficial transfer of reasoning ability. To address these issues, this paper proposes RecZero, a reinforcement learning (RL)-based recommendation paradigm that abandons the traditional multi-model and multi-stage distillation approach. Instead, RecZero trains a single LLM through pure RL to autonomously develop reasoning capabilities for rating prediction. RecZero consists of two key components: (1) "Think-before-Recommendation" prompt construction, which employs a structured reasoning template to guide the model in step-wise analysis of user interests, item features, and user-item compatibility; and (2) rule-based reward modeling, which adopts group relative policy optimization (GRPO) to compute rewards for reasoning trajectories and optimize the LLM. Additionally, the paper explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL, initializing the model with cold-start reasoning samples and further optimizing it with RL. Experimental results demonstrate that RecZero and RecOne significantly outperform existing baseline methods on multiple benchmark datasets, validating the superiority of the RL paradigm in achieving autonomous reasoning-enhanced recommender systems.
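
To make the second component more concrete, the following is a minimal sketch of how a rule-based reward and GRPO-style group-relative advantages could fit together for rating prediction. It is not the authors' implementation: the `<think>`/`<answer>` output format, the reward shape (one minus the normalized absolute error, zero for unparseable output), and the names `rule_based_reward` and `group_relative_advantages` are all assumptions made for illustration. Only the normalization of each reward against its sampling group's mean and standard deviation reflects the general GRPO idea.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format: free-form reasoning, then <answer>RATING</answer>.
ANSWER_RE = re.compile(r"<answer>\s*([0-9]+(?:\.[0-9]+)?)\s*</answer>")


def rule_based_reward(completion: str, true_rating: float, max_error: float = 4.0) -> float:
    """Assumed reward: 0 if no parseable answer, else 1 - |error| / max_error."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # output did not follow the assumed format
    predicted = float(match.group(1))
    return max(0.0, 1.0 - abs(predicted - true_rating) / max_error)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for the same (user, item) prompt; the true rating is 5.
group = [
    "<think>The user favors this genre.</think><answer>5</answer>",
    "<think>Mixed signals in the history.</think><answer>3</answer>",
    "no structured answer",
]
rewards = [rule_based_reward(c, true_rating=5.0) for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the exact prediction receives a positive advantage and the unparseable completion a negative one, so a policy-gradient update would push the model toward well-formed, accurate reasoning trajectories without needing a learned reward model or a teacher.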

Comments:	NeurIPS 2025 poster
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2510.23077 [cs.IR]
	(or arXiv:2510.23077v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2510.23077

Submission history

From: Xiaoyu Kong
[v1] Mon, 27 Oct 2025 07:26:32 UTC (492 KB)
