Recomposer: Event-roll-guided generative audio editing

Ellis, Daniel P. W.; Fonseca, Eduardo; Weiss, Ron J.; Wilson, Kevin; Wisdom, Scott; Erdogan, Hakan; Hershey, John R.; Jansen, Aren; Moore, R. Channing; Plakal, Manoj

arXiv:2509.05256 (cs)

[Submitted on 5 Sep 2025]

Title: Recomposer: Event-roll-guided generative audio editing

Authors: Daniel P. W. Ellis, Eduardo Fonseca, Ron J. Weiss, Kevin Wilson, Scott Wisdom, Hakan Erdogan, John R. Hershey, Aren Jansen, R. Channing Moore, Manoj Plakal

Abstract: Editing complex real-world sound scenes is difficult because individual sound sources overlap in time. Generative models can fill in missing or corrupted details based on their strong prior understanding of the data domain. We present a system for editing individual sound events within complex scenes, able to delete, insert, and enhance individual sound events based on textual edit descriptions (e.g., "enhance Door") and a graphical representation of the event timing derived from an "event roll" transcription. The system is an encoder-decoder transformer operating on SoundStream representations, trained on synthetic (input, desired output) audio example pairs formed by adding isolated sound events to dense, real-world backgrounds. Evaluation reveals the importance of each part of the edit description: action, class, and timing. Our work demonstrates that "recomposition" is an important and practical application.
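
The abstract describes training pairs built by adding an isolated sound event to a real-world background, each paired with a text edit (action plus class) and an event-roll timing grid. The sketch below illustrates that idea on raw sample lists under stated assumptions: the names `EditExample`, `mix`, `event_roll`, and `make_pair`, the 320-sample frame hop, and the 0.25 gain used to build the "enhance" input are all hypothetical choices for illustration, not details from the paper. The actual system operates on SoundStream representations rather than waveforms.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EditExample:
    """One synthetic (input, desired output) pair with its edit description.

    Hypothetical container for illustration; not the paper's data format.
    """
    action: str                   # "delete", "insert", or "enhance"
    label: str                    # sound-event class, e.g. "Door"
    event_roll: list[list[int]]   # [num_classes][num_frames] binary activity grid
    input_audio: list[float]
    target_audio: list[float]

    @property
    def text(self) -> str:
        # Textual edit description in the "<action> <class>" form, e.g. "enhance Door".
        return f"{self.action} {self.label}"


def mix(background, event, onset, gain):
    """Return a copy of `background` with `event` scaled by `gain` added at sample `onset`."""
    out = list(background)
    for i, x in enumerate(event):
        if onset + i < len(out):
            out[onset + i] += gain * x
    return out


def event_roll(classes, label, onset, length, num_samples, hop):
    """Binary class-by-frame grid marking the frames in which the edited event is active."""
    num_frames = math.ceil(num_samples / hop)
    roll = [[0] * num_frames for _ in classes]
    row = classes.index(label)
    first = onset // hop
    last = min(num_frames - 1, (onset + length - 1) // hop)
    for frame in range(first, last + 1):
        roll[row][frame] = 1
    return roll


def make_pair(background, event, label, classes, action, rng, hop=320, low_gain=0.25):
    """Build one training example by placing an isolated event at a random onset.

    Assumes len(event) <= len(background). `hop` and `low_gain` are illustrative values.
    """
    onset = rng.randrange(0, len(background) - len(event) + 1)
    roll = event_roll(classes, label, onset, len(event), len(background), hop)
    with_event = mix(background, event, onset, 1.0)
    if action == "delete":      # input contains the event; target is the clean background
        src, tgt = with_event, list(background)
    elif action == "insert":    # input is the background; target contains the event
        src, tgt = list(background), with_event
    elif action == "enhance":   # input has the event attenuated; target has it at full level
        src, tgt = mix(background, event, onset, low_gain), with_event
    else:
        raise ValueError(f"unknown action: {action}")
    return EditExample(action, label, roll, src, tgt)


if __name__ == "__main__":
    rng = random.Random(0)
    classes = ["Dog", "Door", "Speech"]
    background = [rng.uniform(-0.1, 0.1) for _ in range(16000)]
    door = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(3200)]
    example = make_pair(background, door, "Door", classes, "delete", rng)
    print(example.text)                  # delete Door
    print(sum(example.event_roll[1]))    # number of frames marked active for "Door"
```

Using the same onset for the input, the target, and the event roll is what ties the timing channel to the requested edit. A real pipeline would additionally encode both waveforms with SoundStream and present the roll as the graphical timing input; this sketch omits both steps.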

Comments:	5 pages, 5 figures
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2509.05256 [cs.SD]
	(or arXiv:2509.05256v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2509.05256

Submission history

From: Daniel Ellis
[v1] Fri, 5 Sep 2025 17:14:29 UTC (2,104 KB)
