Computer Science > Sound

arXiv:2509.11976 (cs)
[Submitted on 15 Sep 2025 (v1), last revised 23 Sep 2025 (this version, v3)]

Title: PoolingVQ: A VQVAE Variant for Reducing Audio Redundancy and Boosting Multi-Modal Fusion in Music Emotion Analysis


Authors:Dinghao Zou, Yicheng Gong, Xiaokang Li, Xin Cao, Sunbowen Lee
Abstract: Multimodal music emotion analysis leverages both audio and MIDI modalities to enhance performance. While mainstream approaches focus on complex feature extraction networks, we propose that shortening the length of audio sequence features to mitigate redundancy, especially in contrast to MIDI's compact representation, may effectively boost task performance. To achieve this, we developed PoolingVQ by combining Vector Quantized Variational Autoencoder (VQVAE) with spatial pooling, which directly compresses audio feature sequences through codebook-guided local aggregation to reduce redundancy, then devised a two-stage co-attention approach to fuse audio and MIDI information. Experimental results on the public datasets EMOPIA and VGMIDI demonstrate that our multimodal framework achieves state-of-the-art performance, with PoolingVQ yielding effective improvement. Our proposed metho's code is available at Anonymous GitHub
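The core idea the abstract describes — pooling adjacent audio frames to shorten the sequence, then snapping each pooled vector to its nearest codebook entry — can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' implementation; the function name `pooling_vq`, the window size, and the use of mean pooling are assumptions.

```python
import numpy as np

def pooling_vq(features, codebook, pool_size=4):
    """Sketch of pooled vector quantization.

    features: (T, D) audio feature sequence
    codebook: (K, D) learned code vectors
    Returns the quantized, shortened (T // pool_size, D) sequence
    and the chosen codebook indices.
    """
    T, D = features.shape
    T_trim = (T // pool_size) * pool_size
    # Local aggregation: mean over each non-overlapping window of frames,
    # shortening the sequence by a factor of pool_size.
    pooled = features[:T_trim].reshape(-1, pool_size, D).mean(axis=1)
    # Standard VQ step: nearest-neighbour lookup in the codebook.
    dists = ((pooled[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 8))   # 64 frames of 8-dim audio features
codes = rng.normal(size=(16, 8))   # 16 codebook entries
quantized, idx = pooling_vq(feats, codes)
print(quantized.shape)  # (16, 8): the sequence is 4x shorter
```

The shortened quantized sequence is then what a downstream fusion module (here, the paper's two-stage co-attention) would attend over together with the MIDI representation.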
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
Cite as: arXiv:2509.11976 [cs.SD]
  (or arXiv:2509.11976v3 [cs.SD] for this version)
  https://doi.org/10.48550/arXiv.2509.11976
arXiv-issued DOI via DataCite

Submission history

From: Dinghao Zou
[v1] Mon, 15 Sep 2025 14:24:04 UTC (1,202 KB)
[v2] Mon, 22 Sep 2025 13:57:49 UTC (1,214 KB)
[v3] Tue, 23 Sep 2025 02:20:49 UTC (1,214 KB)
