Emotion Omni: Enabling Empathetic Speech Response Generation through Large Language Models

Wang, Haoyu; Zhang, Guangyan; Chen, Jiale; Li, Jingyu; Wang, Yuehai; Guo, Yiwen

arXiv:2508.18655 (cs)

[Submitted on 26 Aug 2025]

Authors: Haoyu Wang, Guangyan Zhang, Jiale Chen, Jingyu Li, Yuehai Wang, Yiwen Guo

Abstract: With the development of speech large language models (speech LLMs), users can now interact directly with assistants via speech. However, most existing models simply convert the response content into speech without fully understanding the rich emotional and paralinguistic cues embedded in the user's query. In many cases, the same sentence can have different meanings depending on the emotional expression. Furthermore, emotional understanding is essential for improving user experience in human-machine interaction. Currently, most speech LLMs with empathetic capabilities are trained on massive datasets, an approach that requires vast amounts of data and significant computational resources. A key challenge is therefore how to develop a speech LLM capable of generating empathetic responses with limited data and without large-scale training. To address this challenge, we propose Emotion Omni, a novel model architecture designed to understand the emotional content of user speech input and generate empathetic speech responses. Additionally, we developed a data generation pipeline based on an open-source TTS framework to construct a 200k-sample emotional dialogue dataset, which supports the construction of an empathetic speech assistant. The demos are available at https://w311411.github.io/omni_demo/

Comments:	5 pages, 1 figure, submitted to ICASSP 2026
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
ACM classes:	I.2.7
Cite as:	arXiv:2508.18655 [cs.CL]
	(or arXiv:2508.18655v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2508.18655

Submission history

From: Haoyu Wang
[v1] Tue, 26 Aug 2025 03:54:39 UTC (166 KB)
