RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer

Matiyali, Neeraj; Srivastava, Siddharth; Sharma, Gaurav

Computer Science > Sound

arXiv:2508.17031 (cs)

[Submitted on 23 Aug 2025]

Authors: Neeraj Matiyali, Siddharth Srivastava, Gaurav Sharma

Abstract: We propose a method for the task of text-conditioned speech insertion, i.e., inserting a speech sample into an input speech sample, conditioned on the corresponding complete text transcript. An example use case is updating the speech audio when corrections are made to the corresponding text transcript. The proposed method follows a transformer-based non-autoregressive approach that allows speech insertions of variable length, with the length determined dynamically during inference from the text transcript and the tempo of the available partial input. It can maintain the speaker's voice characteristics, prosody, and other spectral properties of the available speech input. Results from our experiments and user study on LibriTTS show that our method outperforms baselines based on an existing adaptive text-to-speech method. We also provide numerous qualitative results that illustrate the quality of the output from the proposed method.

Subjects:	Sound (cs.SD); Computation and Language (cs.CL)
Cite as:	arXiv:2508.17031 [cs.SD]
	(or arXiv:2508.17031v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2508.17031

Submission history

From: Neeraj Matiyali
[v1] Sat, 23 Aug 2025 14:12:49 UTC (198 KB)
