
Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2509.13215 (eess)
[Submitted on 16 Sep 2025]

Title: Importance-Weighted Domain Adaptation for Sound Source Tracking


Authors: Bingxiang Zhong, Thomas Dietzen
Abstract: In recent years, deep learning has significantly advanced sound source localization (SSL). However, training such models requires large labeled datasets, and real recordings are costly to annotate, particularly if sources move. While synthetic data generated with simulated room impulse responses (RIRs) and noise offers a practical alternative, models trained on synthetic data suffer from domain shift in real environments. Unsupervised domain adaptation (UDA) can address this by aligning the synthetic and real domains without relying on labels from the latter. The few existing UDA approaches, however, focus on static SSL and do not account for sound source tracking (SST), which presents two specific domain adaptation challenges. First, variable-length input sequences create mismatches in feature dimensionality across domains. Second, the angular coverages of the synthetic and real data may not be well aligned, either due to partial domain overlap or due to batch size constraints, which we refer to as directional diversity mismatch. To address these challenges, we propose a novel UDA approach tailored to SST based on two key features. We employ the final hidden state of a recurrent neural network as a fixed-dimensional feature representation to handle variable-length sequences. Further, we use importance-weighted adversarial training to tackle directional diversity mismatch by prioritizing synthetic samples similar to the real domain. Experimental results demonstrate that our approach successfully adapts synthetic-trained models to real environments, improving SST performance.
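The two key ingredients named in the abstract — a fixed-dimensional representation taken from an RNN's final hidden state, and importance weighting of synthetic samples in the adversarial objective — can be sketched as follows. This is a minimal illustrative sketch in plain NumPy, not the authors' implementation: the function names, the tanh-RNN, and the density-ratio weighting scheme `p/(1-p)` derived from a domain discriminator are all assumptions for illustration.

```python
import numpy as np

def final_hidden_state(sequence, W_in, W_rec, b):
    """Run a simple tanh RNN over a variable-length sequence and return the
    final hidden state as a fixed-dimensional feature vector (illustrative
    stand-in for the paper's recurrent feature extractor)."""
    h = np.zeros(W_rec.shape[0])
    for x_t in sequence:                       # sequence: (T, input_dim), T varies
        h = np.tanh(W_in @ x_t + W_rec @ h + b)
    return h                                   # shape (hidden_dim,), independent of T

def importance_weights(disc_probs, eps=1e-6):
    """Per-sample weights for synthetic samples, from a domain discriminator.
    disc_probs[i] is the discriminator's estimated probability that synthetic
    sample i comes from the real domain; the ratio p/(1-p) is a standard
    density-ratio estimate that up-weights synthetic samples resembling real
    data. Normalized to mean 1 so the overall loss scale is preserved."""
    w = disc_probs / (1.0 - disc_probs + eps)
    return w / (w.mean() + eps)

def weighted_adversarial_loss(p_synth, p_real, weights, eps=1e-6):
    """Importance-weighted binary cross-entropy for the domain discriminator:
    each synthetic sample contributes in proportion to its importance weight,
    while real samples contribute uniformly."""
    loss_synth = -np.mean(weights * np.log(1.0 - p_synth + eps))
    loss_real = -np.mean(np.log(p_real + eps))
    return loss_synth + loss_real
```

Under this weighting, a synthetic sample the discriminator finds real-like (probability near 1) receives a large weight and dominates the adversarial alignment, while synthetic samples from directions absent in the real data are down-weighted — which is the mechanism the abstract describes for handling directional diversity mismatch.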
Comments: Accepted paper: Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025)
Subjects: Audio and Speech Processing (eess.AS)
Cite as: arXiv:2509.13215 [eess.AS]
  (or arXiv:2509.13215v1 [eess.AS] for this version)
  https://doi.org/10.48550/arXiv.2509.13215
arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Bingxiang Zhong
[v1] Tue, 16 Sep 2025 16:20:13 UTC (214 KB)