Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2306.01208 (eess)
[Submitted on 1 Jun 2023 (v1), last revised 10 Oct 2023 (this version, v2)]

Title: Adapting an Unadaptable ASR System

Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill
Abstract: As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to be available only via APIs from online service providers, rather than through direct access to the models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem we consider the recently released OpenAI Whisper ASR as an example of a large-scale ASR system to assess adaptation methods. An error correction based approach is adopted, as this does not require access to the model, but can be trained from either the 1-best or N-best outputs that are normally available via the ASR API. LibriSpeech is used as the primary target domain for adaptation. The generalization ability of the system is then evaluated along two distinct dimensions: first, whether the form of correction model is portable to other speech recognition domains, and second, whether it can be used for ASR models having a different architecture.
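The abstract describes an error-correction approach that needs only the transcripts returned by an ASR API, not the underlying model. A minimal sketch of that idea follows, assuming a Hugging Face seq2seq model (t5-small) and toy (hypothesis, reference) pairs; the paper's actual correction model, training data, and hyperparameters are not given here, so everything below is illustrative rather than the authors' method.

```python
# A minimal, illustrative sketch (not the authors' released code): fine-tune a
# small seq2seq model to map ASR hypotheses to reference transcripts. The model
# choice (t5-small), the toy data, and the hyperparameters are all assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# (hypothesis, reference) pairs. In the paper's setting the hypotheses would be
# 1-best (or concatenated N-best) transcripts returned by the ASR API for
# target-domain audio, paired with the reference transcripts.
pairs = [
    ("the cat sad on the mat", "the cat sat on the mat"),
    ("i red the book yesterday", "i read the book yesterday"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for hyp, ref in pairs:
    inputs = tokenizer(hyp, return_tensors="pt")
    labels = tokenizer(ref, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # cross-entropy vs. reference
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At inference time, decode a corrected transcript from a new API hypothesis.
model.eval()
test = tokenizer("she maid a cake for the party", return_tensors="pt")
out = model.generate(**test, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because only API outputs and reference text are needed, the same recipe can in principle be applied to transcripts from a differently architected ASR system, which is the portability question the paper evaluates.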
Comments: Proceedings of INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as: arXiv:2306.01208 [eess.AS]
  (or arXiv:2306.01208v2 [eess.AS] for this version)
  https://doi.org/10.48550/arXiv.2306.01208
arXiv-issued DOI via DataCite
Related DOI: https://doi.org/10.21437/Interspeech.2023-1899
DOI(s) linking to related resources

Submission history

From: Rao Ma
[v1] Thu, 1 Jun 2023 23:54:11 UTC (3,032 KB)
[v2] Tue, 10 Oct 2023 09:44:24 UTC (3,032 KB)