Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2306.01208 (eess)
[Submitted on 1 Jun 2023 (v1), last revised 10 Oct 2023 (this version, v2)]

Title: Adapting an Unadaptable ASR System

Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill
Abstract: As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to be available only via APIs from online service providers, rather than through direct access to the models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem we consider the recently released OpenAI Whisper ASR as an example of a large-scale ASR system to assess adaptation methods. An error correction based approach is adopted, as this does not require access to the model, but can be trained from either the 1-best or N-best outputs that are normally available via the ASR API. LibriSpeech is used as the primary target domain for adaptation. The generalization ability of the system is then evaluated along two distinct dimensions: first, whether the form of correction model is portable to other speech recognition domains, and second, whether it can be used for ASR models having a different architecture.
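The abstract describes an error-correction approach that needs only the transcripts returned by an ASR API, not the underlying model. A minimal sketch of that idea follows, assuming a Hugging Face seq2seq model (t5-small) and toy (hypothesis, reference) pairs; the paper's actual correction model, training data, and hyperparameters are not given here, so everything below is illustrative rather than the authors' method.

```python
# A minimal, illustrative sketch (not the authors' released code): fine-tune a
# small seq2seq model to map ASR hypotheses to reference transcripts. The model
# choice (t5-small), the toy data, and the hyperparameters are all assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# (hypothesis, reference) pairs. In the paper's setting the hypotheses would be
# 1-best (or concatenated N-best) transcripts returned by the ASR API for
# target-domain audio, paired with the reference transcripts.
pairs = [
    ("the cat sad on the mat", "the cat sat on the mat"),
    ("i red the book yesterday", "i read the book yesterday"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for hyp, ref in pairs:
    inputs = tokenizer(hyp, return_tensors="pt")
    labels = tokenizer(ref, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # cross-entropy vs. reference
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At inference time, decode a corrected transcript from a new API hypothesis.
model.eval()
test = tokenizer("she maid a cake for the party", return_tensors="pt")
out = model.generate(**test, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because only API outputs and reference text are needed, the same recipe can in principle be applied to transcripts from a differently architected ASR system, which is the portability question the paper evaluates.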
Comments: Proceedings of INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as: arXiv:2306.01208 [eess.AS]
  (or arXiv:2306.01208v2 [eess.AS] for this version)
  https://doi.org/10.48550/arXiv.2306.01208
arXiv-issued DOI via DataCite
Related DOI: https://doi.org/10.21437/Interspeech.2023-1899
DOI(s) linking to related resources

Submission history

From: Rao Ma
[v1] Thu, 1 Jun 2023 23:54:11 UTC (3,032 KB)
[v2] Tue, 10 Oct 2023 09:44:24 UTC (3,032 KB)