Enhancing Speech Large Language Models through Reinforced Behavior Alignment

Liu, Yansong; Li, Jiateng; Liu, Yuan

Computer Science > Computation and Language

arXiv:2509.03526 (cs)

[Submitted on 25 Aug 2025]

Title: Enhancing Speech Large Language Models through Reinforced Behavior Alignment

Authors: Yansong Liu, Jiateng Li, Yuan Liu

Abstract: Recent advances in Large Language Models (LLMs) have spurred considerable research interest in extending their linguistic capabilities beyond text to other modalities, leading to the emergence of speech-based LLMs (SpeechLMs) capable of processing user requests in either speech or text format. However, owing to inter-modal discrepancies, these SpeechLMs still exhibit a significant performance gap compared to their text-based LLM counterparts in instruction following, particularly when confronted with the dynamic and variable nature of user speech. To address this challenge, this paper introduces a framework termed Reinforced Behavior Alignment (RBA), designed to bolster the language generation proficiency of SpeechLMs. Instead of relying on supervised fine-tuning from human annotations, RBA employs a self-synthesis methodology in which a powerful teacher LLM generates extensive, high-fidelity alignment data. The behavior of the SpeechLM is then aligned with that of the teacher using a reinforcement learning-based approach. Experimental results demonstrate that this method effectively enhances the instruction-following capabilities of SpeechLMs and outperforms conventional distillation baselines. Crucially, we demonstrate that RBA can be seamlessly extended to tasks including spoken question answering and speech-to-text translation, attaining state-of-the-art performance on open benchmarks with only self-generated data.

Subjects:	Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2509.03526 [cs.CL]
	(or arXiv:2509.03526v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2509.03526

Submission history

From: Yuan Liu
[v1] Mon, 25 Aug 2025 07:31:48 UTC (916 KB)
