
Computer Science > Robotics

arXiv:2506.02057 (cs)
[Submitted on 1 Jun 2025]

Title: Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody

Authors: David Sasu, Kweku Andoh Yamoah, Benedict Quartey, Natalie Schluter
Abstract: Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.
Comments: Accepted to Interspeech 2025
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2506.02057 [cs.RO]
  (or arXiv:2506.02057v1 [cs.RO] for this version)
  https://doi.org/10.48550/arXiv.2506.02057
arXiv-issued DOI via DataCite

Submission history

From: David Sasu [view email]
[v1] Sun, 1 Jun 2025 14:06:57 UTC (3,149 KB)