FlowSE: Flow Matching-based Speech Enhancement

Lee, Seonggyu; Cheong, Sein; Han, Sangwook; Shin, Jong Won

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2508.06840 (eess)

[Submitted on 9 Aug 2025]

Title: FlowSE: Flow Matching-based Speech Enhancement

Authors: Seonggyu Lee, Sein Cheong, Sangwook Han, Jong Won Shin

Abstract: Diffusion probabilistic models have shown impressive performance for speech enhancement, but they typically require 25 to 60 function evaluations in the inference phase, resulting in heavy computational complexity. Recently, a fine-tuning method was proposed to correct the reverse process, which significantly lowered the number of function evaluations (NFE). Flow matching is a method for training continuous normalizing flows, which model probability paths from known distributions to unknown distributions, including the paths described by diffusion processes. In this paper, we propose a speech enhancement method based on conditional flow matching. With an NFE of 5, the proposed method achieved performance comparable to that of diffusion-based speech enhancement with an NFE of 60, and at each NFE from 1 to 5 it performed similarly to the diffusion model with the corrected reverse process, without any additional fine-tuning procedure. We also show that the corresponding diffusion model, derived from the conditional probability path with a modified optimal transport conditional vector field, achieved similar performance with an NFE of 5 without any fine-tuning procedure.
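As a rough illustration of the ideas named in the abstract, the following is a minimal sketch of generic conditional flow matching with an optimal transport path, together with a fixed-step Euler sampler whose step count plays the role of the NFE. It is not the FlowSE implementation: the paper's modified conditional vector field, its conditioning on noisy speech, and its network are not reproduced here. The function names, the `sigma_min` value, and the toy 8-dimensional vectors are all assumptions made for illustration.

```python
import numpy as np


def ot_cfm_pair(x0, x1, t, sigma_min=1e-4):
    """Sample point and regression target for the generic OT conditional path.

    x0: sample from the known (source) distribution
    x1: sample from the data (target) distribution
    t:  time in [0, 1]
    """
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1.0 - sigma_min) * x0  # constant velocity along the path
    return x_t, u_t


def cfm_loss(v_pred, u_t):
    """Mean squared error between a predicted and a target velocity."""
    return float(np.mean((v_pred - u_t) ** 2))


def euler_sample(vector_field, x0, nfe=5):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1.

    Each Euler step calls vector_field once, so nfe is the number of
    function evaluations.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / nfe
    for k in range(nfe):
        x = x + dt * vector_field(x, k * dt)
    return x


def oracle_field(x1, sigma_min=1e-4):
    """Closed-form conditional vector field for a single known target x1.

    Stands in for a trained network in this toy example (hypothetical helper).
    """
    def field(x, t):
        return (x1 - (1.0 - sigma_min) * x) / (1.0 - (1.0 - sigma_min) * t)
    return field


rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)  # toy "noise" sample
x1 = rng.standard_normal(8)  # toy "clean" sample
sample = euler_sample(oracle_field(x1), x0, nfe=5)
```

Because the conditional path in this sketch is a straight line, the Euler steps follow it exactly and `sample` lands on `x1 + sigma_min * x0` for any step count. A learned vector field trained over many pairs is only approximately straight, which is why the achievable quality at a small NFE has to be measured empirically, as the abstract reports.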

Comments:	Published in ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2508.06840 [eess.AS]
	(or arXiv:2508.06840v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2508.06840

Submission history

From: Seonggyu Lee
[v1] Sat, 9 Aug 2025 05:45:17 UTC (345 KB)
