Computer Science > Computation and Language

arXiv:2509.12647 (cs)
[Submitted on 16 Sep 2025]

Title: PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition

Authors: Li Fu, Yu Xin, Sunlu Zeng, Lu Fan, Youzheng Wu, Xiaodong He
Abstract: This paper presents a Pronunciation-Aware Contextualized (PAC) framework to address two key challenges in Large Language Model (LLM)-based Automatic Speech Recognition (ASR) systems: effective pronunciation modeling and robust homophone discrimination. Both are essential for recognizing rare or long-tail words. The proposed approach adopts a two-stage learning paradigm. First, we introduce a pronunciation-guided context learning method. It employs an interleaved grapheme-phoneme context modeling strategy that incorporates grapheme-only distractors, encouraging the model to leverage phonemic cues for accurate recognition. Then, we propose a pronunciation-discriminative reinforcement learning method with perturbed label sampling to further enhance the model's ability to distinguish contextualized homophones. Experimental results on the public English Librispeech and Mandarin AISHELL-1 datasets indicate that PAC: (1) reduces Word Error Rate (WER) by a relative 30.2% and 53.8%, respectively, compared to pre-trained LLM-based ASR models, and (2) achieves relative reductions of 31.8% and 60.5%, respectively, in biased WER for long-tail words compared to strong baselines.
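The headline figures in the abstract are relative reductions, not absolute WER values. As a minimal sketch of how such a figure is usually derived, the block below computes WER with a standard word-level edit distance and then the relative reduction between a baseline and an improved system. The function names and the sample sentences are hypothetical, and the 10.0 and 6.98 values are illustrative only; they are not taken from the paper, and the paper's own scoring tool may differ.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the hypothesis into the reference (Levenshtein distance)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion (reference word missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit errors divided by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n_ref


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One substitution (cat -> hat) and one deletion (down) over 4 reference words.
    print(wer("the cat sat down", "the hat sat"))
    # Illustrative numbers only: a drop from 10.0% to 6.98% WER.
    print(round(relative_reduction(10.0, 6.98), 3))
```

Under these assumptions, a relative reduction of 30.2% means the improved system keeps about 69.8% of the baseline's errors, whatever the baseline's absolute WER is.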
Comments: Submitted to ICASSP 2026
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2509.12647 [cs.CL]
  (or arXiv:2509.12647v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2509.12647
arXiv-issued DOI via DataCite

Submission history

From: Li Fu
[v1] Tue, 16 Sep 2025 04:07:28 UTC (140 KB)