Audio and Speech Processing

Authors and titles for June 2025

Total of 502 entries : 1-50 51-100 101-150 151-200 ... 501-502
[1] arXiv:2506.00185 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg
Comments: Accepted to Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[2] arXiv:2506.00273 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction
Tuochao Chen, D Shin, Hakan Erdogan, Sinan Hersek
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[3] arXiv:2506.00454 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Towards Temporally Explainable Dysarthric Speech Clarity Assessment
Seohyun Park, Chitralekha Gupta, Michelle Kah Yian Kwan, Xinhui Fung, Alexander Wenjun Yip, Suranga Nanayakkara
Comments: Accepted in Interspeech 2025. First two authors were equal contributors
Subjects: Audio and Speech Processing (eess.AS) ; Human-Computer Interaction (cs.HC) ; Sound (cs.SD)
[4] arXiv:2506.00466 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction
Cunhang Fan, Ying Chen, Jian Zhou, Zexu Pan, Jingjing Zhang, Youdian Gao, Xiaoke Yang, Zhengqi Wen, Zhao Lv
Comments: Accepted to IJCAI 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[5] arXiv:2506.00506 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024
Marie Kunešová, Aleš Pražák, Jan Lehečka
Comments: Submitted to ICASSP 2026
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[6] arXiv:2506.00733 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis
Miao Zhang, Aref Farhadipour, Annie Baker, Jiachen Ma, Bogdan Pricop, Eleanor Chodroff
Comments: Accepted for Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[7] arXiv:2506.00736 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
Kuan-Po Huang, Shu-wen Yang, Huy Phan, Bo-Ru Lu, Byeonggeun Kim, Sashank Macha, Qingming Tang, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang
Comments: Accepted by ICML 2025. Project website: https://audio-impact.github.io/
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[8] arXiv:2506.00800 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi, Binh Thien Nguyen, Masahiro Yasuda, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada
Comments: Accepted to Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[9] arXiv:2506.00843 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement
Amir Hussein, Sameer Khurana, Gordon Wichern, Francois G. Germain, Jonathan Le Roux
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[10] arXiv:2506.00861 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment
Parismita Gogoi, Vishwanath Pratap Singh, Seema Khadirnaikar, Soma Siddhartha, Sishir Kalita, Jagabandhu Mishra, Md Sahidullah, Priyankoo Sarmah, S. R. M. Prasanna
Comments: Accepted in Interspeech. All code is available in the GitHub repo https://github.com/seemark11/DhiNirnayaAMFM
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[11] arXiv:2506.00950 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Crowdsourcing MUSHRA Tests in the Age of Generative Speech Technologies: A Comparative Analysis of Subjective and Objective Testing Methods
Laura Lechler, Chamran Moradi, Ivana Balic
Comments: This is a preprint of a paper submitted to and accepted for INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[12] arXiv:2506.01014 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo, Shengpeng Ji, Minghui Fang, Mingze Li, Ziyue Jiang, Xize Cheng, Xiaoda Yang, Chen Feiyang, Xinyu Duan, Zhou Zhao
Comments: Accepted by ACL 2025 (Main Conference)
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[13] arXiv:2506.01039 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data
Songjun Cao, Qinghua Wu, Jie Chen, Jin Li, Long Ma
Comments: 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[14] arXiv:2506.01138 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[15] arXiv:2506.01148 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Santanu Roy, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[16] arXiv:2506.01157 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Drishti Singh, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to EUSIPCO 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[17] arXiv:2506.01192 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: GigaAM: Efficient Self-Supervised Learner for Speech Recognition
Aleksandr Kutsakov, Alexandr Maximenko, Georgii Gospodinov, Pavel Bogomolov, Fyodor Minkin
Comments: Accepted to Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[18] arXiv:2506.01256 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Confidence intervals for forced alignment boundaries using model ensembles
Matthew C. Kelley
Comments: submitted for publication; 7 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[19] arXiv:2506.01270 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Online Audio-Visual Autoregressive Speaker Extraction
Zexu Pan, Wupeng Wang, Shengkui Zhao, Chong Zhang, Kun Zhou, Yukun Ma, Bin Ma
Comments: Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[20] arXiv:2506.01483 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction
Wang Dai, Archontis Politis, Tuomas Virtanen
Comments: Accepted by Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[21] arXiv:2506.01510 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: LinearVC: Linear transformations of self-supervised features through the lens of voice conversion
Herman Kamper, Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau
Comments: Accepted to Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL)
[22] arXiv:2506.01611 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian
Comments: 5 pages, 4 figures, 1 table. Accepted by Interspeech 2025. Code available at https://github.com/urgent-challenge/urgent2024_analysis
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD) ; Signal Processing (eess.SP)
[23] arXiv:2506.01618 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech
Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai.-Doss
Comments: Accepted at Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[24] arXiv:2506.01655 (cross-list from eess.AS) [cn-pdf, pdf, other]
Title: Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
Mattson Ogg, Caitlyn Bishop, Han Yi, Sarah Robinson
Comments: Five tables, three figures, twelve pages
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[25] arXiv:2506.01731 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Benchmarking Neural Speech Codec Intelligibility with SITool
Anna Leschanowsky, Kishor Kayyar Lakshminarayana, Anjana Rajasekhar, Lyonel Behringer, Ibrahim Kilinc, Guillaume Fuchs, Emanuël A. P. Habets
Comments: submitted to Interspeech
Subjects: Audio and Speech Processing (eess.AS)
[26] arXiv:2506.01845 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: On-device Streaming Discrete Speech Units
Kwanghee Choi, Masao Someki, Emma Strubell, Shinji Watanabe
Comments: Accepted to Interspeech 2025, source code at https://github.com/Masao-Someki/StreamingDSU
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[27] arXiv:2506.01916 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: DNCASR: End-to-End Training for Speaker-Attributed ASR
Xianrui Zheng, Chao Zhang, Philip C. Woodland
Comments: Accepted by ACL 2025 Main Conference
Subjects: Audio and Speech Processing (eess.AS)
[28] arXiv:2506.02039 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction
Haoshuai Zhou, Changgeng Mo, Boxuan Cao, Linkai Li, Shan Xiang Wang
Comments: Accepted at Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD)
[29] arXiv:2506.02078 (cross-list from eess.AS) [cn-pdf, pdf, other]
Title: Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
Emmy Postma, Cristian Tejedor-Garcia
Comments: Accepted to Interspeech 2025. This publication is part of the project Responsible AI for Voice Diagnostics (RAIVD) with file number NGF.1607.22.013 of the research programme NGF AiNed Fellowship Grants which is financed by the Dutch Research Council (NWO)
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI)
[30] arXiv:2506.02080 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge
Aditya Kamlesh Parikh, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik
Comments: Accepted to Interspeech 2025. This publication is part of the project Responsible AI for Voice Diagnostics (RAIVD) with file number NGF.1607.22.013 of the research programme NGF AiNed Fellowship Grants which is financed by the Dutch Research Council (NWO)
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI)
[31] arXiv:2506.02166 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi
Arnav Rustagi, Satvik Bajpai, Nimrat Kaur, Siddharth Siddharth
Comments: Accepted for publication at Interspeech 2025 to be held in Rotterdam, the Netherlands
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
[32] arXiv:2506.02230 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Towards Machine Unlearning for Paralinguistic Speech Processing
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Shubham Singh, Swarup Ranjan Behera, Vandana Rajan, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[33] arXiv:2506.02232 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[34] arXiv:2506.02258 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Girish, Swarup Ranjan Behera, Ananda Chandra Nayak, Sanjib Kumar Nayak, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to EUSIPCO 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[35] arXiv:2506.02339 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
Jiawen Huang, Felipe Sousa, Emir Demirel, Emmanouil Benetos, Igor Gadelha
Comments: submitted to Interspeech
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[36] arXiv:2506.02505 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Adaptive Differential Denoising for Respiratory Sounds Classification
Gaoyang Dong, Zhicheng Zhang, Ping Sun, Minghui Zhang
Comments: Accepted at Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[37] arXiv:2506.02742 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions
Xiaoxue Gao, Huayun Zhang, Nancy F. Chen
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD) ; Signal Processing (eess.SP)
[38] arXiv:2506.02773 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers
Linya Fu, Yu Liu, Zhijie Liu, Zedong Yang, Zhong-Qiu Wang, Youfu Li, He Kong
Comments: Accepted and to appear at Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[39] arXiv:2506.02777 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: On the influence of language similarity in non-target speaker verification trials
Paul M. Reuter, Michael Jessen
Comments: accepted to Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[40] arXiv:2506.02797 (cross-list from eess.AS) [cn-pdf, pdf, other]
Title: Fast-Converging Distributed Signal Estimation in Topology-Unconstrained Wireless Acoustic Sensor Networks
Paul Didier, Toon van Waterschoot, Simon Doclo, Jörg Bitzer, Marc Moonen
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[41] arXiv:2506.02858 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization
Geonyoung Lee, Geonhee Han, Paul Hongsuck Seo
Comments: Interspeech 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI)
[42] arXiv:2506.02863 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
Helin Wang, Jiarui Hai, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhiali, Najim Dehak
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD)
[43] arXiv:2506.02908 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency
Bunlong Lay, Rostislav Makarov, Timo Gerkmann
Comments: 5 pages, 2 figures, Accepted to Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG)
[44] arXiv:2506.02958 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing
You Zhang, Baotong Tian, Lin Zhang, Zhiyao Duan
Comments: Interspeech 2025 camera ready. Project page: https://yzyouzhang.com/PartialEdit/
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[45] arXiv:2506.03020 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: InfiniteAudio: Infinite-Length Audio Generation with Consistency
Chaeyoung Jung, Hojoon Ki, Ji-Hoon Kim, Junmo Kim, Joon Son Chung
Subjects: Audio and Speech Processing (eess.AS)
[46] arXiv:2506.03364 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Multimedia (cs.MM) ; Sound (cs.SD)
[47] arXiv:2506.03378 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer
Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[48] arXiv:2506.03403 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS)
[49] arXiv:2506.03425 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations
Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj
Comments: 5 pages, 3 figures, accepted at Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
[50] arXiv:2506.03515 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto
Comments: Accepted to INTERSPEECH 2025
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD) ; Signal Processing (eess.SP)