Flowing Straighter with Conditional Flow Matching for Accurate Speech Enhancement

Cross, Mattias; Ragni, Anton

arXiv:2508.20584 (cs)

[Submitted on 28 Aug 2025]

Title: Flowing Straighter with Conditional Flow Matching for Accurate Speech Enhancement

Authors: Mattias Cross, Anton Ragni

Abstract: Current flow-based generative speech enhancement methods learn curved probability paths which model a mapping between clean and noisy speech. Despite impressive performance, the implications of curved probability paths are unknown. Methods such as Schrödinger bridges focus on curved paths, where time-dependent gradients and variance do not promote straight paths. Findings in machine learning research suggest that straight paths, such as conditional flow matching, are easier to train and offer better generalisation. In this paper we quantify the effect of path straightness on speech enhancement quality. We report experiments with the Schrödinger bridge, where we show that certain configurations lead to straighter paths. Conversely, we propose independent conditional flow matching for speech enhancement, which models straight paths between noisy and clean speech. We demonstrate empirically that a time-independent variance has a greater effect on sample quality than the gradient. Although conditional flow matching improves several speech quality metrics, it requires multiple inference steps. We rectify this with a one-step solution by inferring the trained flow-based model as if it were directly predictive. Our work suggests that straighter time-independent probability paths improve generative speech enhancement over curved time-dependent paths.
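
As a rough illustration of the straight-path idea in the abstract, the sketch below uses the commonly cited independent conditional flow matching form: the path mean interpolates linearly between the noisy and clean signals, the variance `sigma` is constant in time, and the regression target is the constant velocity `clean - noisy`. The abstract does not give the paper's exact parameterisation, so the function names (`icfm_sample`, `euler_enhance`), the plain-list signal representation, and the reading of the one-step solution as a single Euler step are all assumptions made for illustration.

```python
import random


def icfm_sample(x_noisy, x_clean, t, sigma, rng):
    """Draw x_t on a straight noisy-to-clean path and return the target velocity.

    Assumed form (not confirmed by the abstract):
      x_t ~ N((1 - t) * x_noisy + t * x_clean, sigma^2 I), target u = x_clean - x_noisy.
    """
    x_t = [(1.0 - t) * n + t * c + sigma * rng.gauss(0.0, 1.0)
           for n, c in zip(x_noisy, x_clean)]
    u = [c - n for n, c in zip(x_noisy, x_clean)]  # constant along the path
    return x_t, u


def euler_enhance(velocity_fn, x_noisy, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler.

    velocity_fn is a hypothetical stand-in for a trained model.
    n_steps = 1 corresponds to the assumed one-step reading.
    """
    x = list(x_noisy)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noisy = [0.5, -1.0, 2.0]
    clean = [0.0, 1.0, 1.0]
    x_t, u = icfm_sample(noisy, clean, 0.5, 0.1, random.Random(0))
    # An oracle velocity stands in for a trained network here.
    print(euler_enhance(lambda x, t: u, noisy, 1))
```

When the learned velocity field is close to constant along each path, a single Euler step lands near the same endpoint as many small steps, which is one way to see why straighter paths make one-step inference plausible.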

Comments:	Preprint, accepted
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2508.20584 [cs.SD]
	(or arXiv:2508.20584v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2508.20584

Submission history

From: Mattias Cross
[v1] Thu, 28 Aug 2025 09:21:22 UTC (742 KB)
