Fast Algorithm for Moving Sound Source

Yang, Dong


arXiv:2508.03065 (eess)

[Submitted on 4 Aug 2025 (v1), last revised 17 Aug 2025 (this version, v2)]

Title: Fast Algorithm for Moving Sound Source

Authors: Dong Yang

Abstract: Modern neural-network-based speech processing systems usually need to be robust to reverberation, so training them requires large amounts of reverberant data. Current practice tends either to approximate a dynamic system by sampling static systems or to supplement the training data with real recordings. Neither approach fundamentally solves the problem of simulating motion data that conforms to physical laws. To address the shortage of training data for speech enhancement models in moving scenarios, this paper proposes Yang's motion spatio-temporal sampling reconstruction theory for efficient simulation of continuous time-varying reverberation under motion. The theory overcomes the limitations of the traditional static Image-Source Method (ISM) in time-varying systems. By decomposing the impulse response of a moving image source into two parts, a linear time-invariant modulation and a discrete time-varying fractional delay, it establishes a moving sound field model that conforms to physical laws. Based on the band-limited nature of motion displacement, a hierarchical sampling strategy is proposed: low-order images are sampled at a high rate to retain detail, and high-order images are sampled at a low rate to reduce computational complexity. A fast synthesis architecture is designed to enable real-time simulation. Experiments show that, compared with open-source models, the proposed theory more accurately restores the amplitude and phase changes in moving scenarios, solving the industry problem of simulating moving sound sources and providing high-quality dynamic training data for speech enhancement models.
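To make the term "time-varying fractional delay" concrete, the sketch below renders only the free-field direct path of a moving source by reading the source signal at a non-integer, per-sample delay. It is an illustration under stated assumptions, not the paper's algorithm: the function name `render_moving_source`, the per-sample position array, the 1/r gain, linear interpolation, and evaluating the delay at the receive time are all choices made here for brevity.

```python
import numpy as np

def render_moving_source(x, src_pos, mic_pos, fs, c=343.0):
    """Direct-path rendering of a moving source (hypothetical helper).

    x       : (N,) source signal
    src_pos : (N, 3) source position at each output sample (assumed format)
    mic_pos : (3,) fixed microphone position
    fs      : sampling rate in Hz
    c       : speed of sound in m/s
    """
    x = np.asarray(x, dtype=float)
    src_pos = np.asarray(src_pos, dtype=float)
    mic_pos = np.asarray(mic_pos, dtype=float)

    # Source-to-microphone distance at every output sample.
    dist = np.linalg.norm(src_pos - mic_pos, axis=1)

    # Propagation delay in samples; generally not an integer.
    delay = dist * fs / c

    # Spherical spreading loss, clamped to avoid division by zero.
    gain = 1.0 / np.maximum(dist, 1e-3)

    # Fractional read position into x for each output sample.
    n = np.arange(len(x))
    read_idx = n - delay

    # Linear interpolation between neighbouring samples; positions
    # before the start of the signal read as silence.
    return gain * np.interp(read_idx, n, x, left=0.0, right=0.0)

# Usage: a 440 Hz tone from a source approaching the microphone at 10 m/s.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
path = np.stack([20.0 - 10.0 * t, np.zeros_like(t), np.zeros_like(t)], axis=1)
received = render_moving_source(tone, path, [0.0, 0.0, 0.0], fs)
```

Because the delay shrinks as the source approaches, the received tone is pitched up (Doppler shift) and grows louder. A full room simulation would apply the same idea to every image source, and a higher-order fractional-delay filter would reduce the high-frequency loss introduced by linear interpolation.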

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2508.03065 [eess.AS]
	(or arXiv:2508.03065v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2508.03065

Submission history

From: Dong Yang
[v1] Mon, 4 Aug 2025 09:07:51 UTC (545 KB)
[v2] Sun, 17 Aug 2025 16:55:50 UTC (548 KB)
