Controlling your Attributes in Voice

Li, Xuyuan; Shang, Zengqiang; Wang, Li; Zhang, Pengyuan

Computer Science > Sound

arXiv:2501.01674v1 (cs)

[Submitted on 3 Jan 2025]

Title: Controlling your Attributes in Voice

Authors: Xuyuan Li, Zengqiang Shang, Li Wang, Pengyuan Zhang

Abstract: Attribute control in generative tasks aims to modify personal attributes, such as age and gender, while preserving the identity information in the source sample. Although significant progress has been made in controlling facial attributes in image generation, similar approaches for speech generation remain largely unexplored. This letter proposes a novel method for controlling speaker attributes in speech without parallel data. Our approach consists of two main components: a GAN-based speaker representation variational autoencoder that extracts speaker identity and attributes from a speaker vector, and a two-stage voice conversion model that captures the natural expression of speaker attributes in speech. Experimental results show that our proposed method not only achieves attribute control at the speaker representation level but also enables manipulation of speaker age and gender at the speech level while preserving speech quality and speaker identity.

Comments:	5 pages, 3 figures
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2501.01674 [cs.SD]
	(or arXiv:2501.01674v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2501.01674

Submission history

From: Xuyuan Li
[v1] Fri, 3 Jan 2025 07:35:08 UTC (1,921 KB)
