Computer Science > Computation and Language

arXiv:2509.14689v1 (cs)
[Submitted on 18 Sep 2025]

Title: HARNESS: Lightweight Distilled Arabic Speech Foundation Models


Authors: Vrunda N. Sukhadia, Shammur Absar Chowdhury
Abstract: Large pre-trained speech models excel in downstream tasks but their deployment is impractical for resource-limited environments. In this paper, we introduce HArnESS, the first Arabic-centric self-supervised speech model family, designed to capture Arabic speech nuances. Using iterative self-distillation, we train large bilingual HArnESS (HL) SSL models and then distill knowledge into compressed student models (HS, HST), preserving Arabic-specific representations. We use low-rank approximation to further compact the teacher's discrete supervision into shallow, thin models. We evaluate HArnESS on Arabic ASR, Speaker Emotion Recognition (SER), and Dialect Identification (DID), demonstrating effectiveness against HuBERT and XLS-R. With minimal fine-tuning, HArnESS achieves SOTA or comparable performance, making it a lightweight yet powerful alternative for real-world use. We release our distilled models and findings to support responsible research and deployment in low-resource settings.
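The abstract names two mechanisms: self-distillation, where a compact student is trained to predict the frozen teacher's discrete targets, and a low-rank compaction of that supervision. The following PyTorch snippet is a hypothetical sketch of how such a step could look, not the authors' released code. The names (LowRankHead, distill_step), all layer sizes, and the cross-entropy-on-cluster-IDs objective are assumptions; the loss follows the standard HuBERT-style formulation rather than details confirmed by this abstract.

```python
# Hedged sketch (not the HArnESS implementation): one HuBERT-style
# distillation step, assuming the teacher supplies frame-level discrete
# cluster IDs and the student's prediction head is factorized low-rank.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankHead(nn.Module):
    """Prediction head W ~= B @ A with rank r << min(d, k).

    This is one plausible reading of "low-rank approximation of the
    teacher's discrete supervision"; the paper's exact construction
    may differ.
    """

    def __init__(self, d_model: int, n_clusters: int, rank: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)   # A: d -> r
        self.up = nn.Linear(rank, n_clusters, bias=False)  # B: r -> k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Logits over the teacher's cluster vocabulary.
        return self.up(self.down(x))


def distill_step(student, head, feats, teacher_ids, optimizer):
    """One step: the student learns to predict the frozen teacher's
    discrete cluster assignments via frame-level cross-entropy."""
    logits = head(student(feats))                 # (B, T, k)
    loss = F.cross_entropy(logits.flatten(0, 1),  # (B*T, k)
                           teacher_ids.flatten()) # (B*T,)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy usage with random tensors standing in for speech features and
# teacher cluster IDs; all sizes are illustrative, not from the paper.
if __name__ == "__main__":
    d, k, B, T = 768, 500, 2, 100
    student = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
    head = LowRankHead(d, k, rank=64)
    opt = torch.optim.AdamW(
        list(student.parameters()) + list(head.parameters()), lr=1e-4
    )
    feats = torch.randn(B, T, d)
    ids = torch.randint(0, k, (B, T))
    print("loss:", distill_step(student, head, feats, ids, opt))
```

The low-rank head keeps the student shallow and thin: instead of a full d x k projection it stores d x r plus r x k parameters, which is the kind of compression the abstract attributes to the low-rank approximation.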
Comments: 5 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2509.14689 [cs.CL]
  (or arXiv:2509.14689v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2509.14689
arXiv-issued DOI via DataCite

Submission history

From: Vrunda N. Sukhadia [view email]
[v1] Thu, 18 Sep 2025 07:30:37 UTC (449 KB)