Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody

Sasu, David; Yamoah, Kweku Andoh; Quartey, Benedict; Schluter, Natalie

arXiv:2506.02057 (cs)

[Submitted on 1 Jun 2025]

Authors: David Sasu, Kweku Andoh Yamoah, Benedict Quartey, Natalie Schluter

Abstract: Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.

Comments:	Accepted to Interspeech 2025
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2506.02057 [cs.RO]
	(or arXiv:2506.02057v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2506.02057

Submission history

From: David Sasu
[v1] Sun, 1 Jun 2025 14:06:57 UTC (3,149 KB)
