Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets

Wu, Xinyi; Wang, Haohong; Katsaggelos, Aggelos K.

Computer Science > Multimedia

arXiv:2506.07076 (cs)

[Submitted on 8 Jun 2025]

Title: Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets

Authors: Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos

Abstract: With the popularity of video-based user-generated content (UGC) on social media, harmony, as dictated by human perceptual principles, is critical for assessing the rhythmic consistency of audio-visual UGC and thus for better user engagement. In this work, we propose a novel harmony-aware GAN framework that follows a specifically designed harmony evaluation strategy to enhance rhythmic synchronization in automatic music-to-motion synthesis on a UGC dance dataset. This harmony strategy uses refined cross-modal beat detection to capture closely correlated audio and visual rhythms in an audio-visual pair. To mimic the human attention mechanism, we introduce saliency-based beat weighting and interval-driven beat alignment, which together ensure that the estimated harmony score is accurate and consistent with human perception. Building on this strategy, our model, which employs efficient encoder-decoder and depth-lifting designs, is adversarially trained on categorized musical meter segments to generate realistic and rhythmic 3D human motions. We further incorporate our harmony evaluation strategy as a weakly supervised perceptual constraint that flexibly guides the synchronization of audio and visual rhythms during generation. Experimental results show that our proposed model significantly outperforms other leading music-to-motion methods in rhythmic harmony, both quantitatively and qualitatively, even with limited UGC training data. Live samples can be watched at: https://youtu.be/tWwz7yq4aUs
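
The abstract does not give the exact formulas behind its beat detection or saliency-based weighting. As a rough illustration only, the sketch below combines two common, generic heuristics that are assumptions here and not the authors' method: visual beats taken as local minima of a per-frame motion-speed curve, and a Gaussian-kernel beat-alignment score in which each visual beat carries a hypothetical saliency weight. The function names, the `sigma` tolerance (in seconds), and the weighting scheme are all illustrative.

```python
import math
from typing import List, Optional, Sequence


def kinematic_beats(speeds: Sequence[float], fps: float) -> List[float]:
    """Return times (in seconds) of local minima in a per-frame speed curve.

    Generic heuristic (an assumption, not the paper's detector): a dancer
    momentarily decelerates on a beat, so speed minima mark visual beats.
    """
    beats: List[float] = []
    for i in range(1, len(speeds) - 1):
        # strict drop from the previous frame, no increase into the next frame
        if speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]:
            beats.append(i / fps)
    return beats


def weighted_beat_alignment(
    visual_beats: Sequence[float],
    audio_beats: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    sigma: float = 0.1,
) -> float:
    """Hypothetical saliency-weighted beat-alignment score.

    Each visual beat is scored by a Gaussian of its distance to the nearest
    audio beat; scores are averaged with the (assumed) saliency weights.
    The result lies in [0, 1] when all weights are non-negative.
    """
    if not visual_beats or not audio_beats:
        return 0.0
    if weights is None:
        weights = [1.0] * len(visual_beats)  # uniform saliency by default
    if len(weights) != len(visual_beats):
        raise ValueError("expected one weight per visual beat")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for t, w in zip(visual_beats, weights):
        d = min(abs(t - a) for a in audio_beats)  # seconds to nearest audio beat
        score += w * math.exp(-(d * d) / (2.0 * sigma * sigma))
    return score / total


if __name__ == "__main__":
    speeds = [3.0, 1.0, 3.0, 3.0, 0.0, 3.0]  # toy per-frame speeds at 2 fps
    vis = kinematic_beats(speeds, fps=2.0)  # [0.5, 2.0]
    print(weighted_beat_alignment(vis, [0.5, 2.0]))  # 1.0
```

With uniform weights this reduces to an unweighted Gaussian beat-alignment score; raising a beat's weight makes its misalignment count more, which loosely mirrors the idea of attending to perceptually salient beats.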

Subjects:	Multimedia (cs.MM)
Cite as:	arXiv:2506.07076 [cs.MM]
	(or arXiv:2506.07076v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2506.07076

Submission history

From: Xinyi Wu
[v1] Sun, 8 Jun 2025 10:32:56 UTC (9,396 KB)