Audio and Speech Processing

Authors and titles for June 2025

Total of 502 entries

[1] arXiv:2506.00185 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Pushing the Limits of Beam Search Decoding for Transducer-based ASR models

Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[2] arXiv:2506.00273 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction

Tuochao Chen, D Shin, Hakan Erdogan, Sinan Hersek

Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[3] arXiv:2506.00454 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Towards Temporally Explainable Dysarthric Speech Clarity Assessment

Seohyun Park, Chitralekha Gupta, Michelle Kah Yian Kwan, Xinhui Fung, Alexander Wenjun Yip, Suranga Nanayakkara

Comments: Accepted in Interspeech 2025. First two authors were equal contributors

Subjects: Audio and Speech Processing (eess.AS) ; Human-Computer Interaction (cs.HC) ; Sound (cs.SD)
[4] arXiv:2506.00466 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction

Cunhang Fan, Ying Chen, Jian Zhou, Zexu Pan, Jingjing Zhang, Youdian Gao, Xiaoke Yang, Zhengqi Wen, Zhao Lv

Comments: Accepted to IJCAI 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[5] arXiv:2506.00506 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024

Marie Kunešová, Aleš Pražák, Jan Lehečka

Comments: Submitted to ICASSP 2026

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[6] arXiv:2506.00733 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis

Miao Zhang, Aref Farhadipour, Annie Baker, Jiachen Ma, Bogdan Pricop, Eleanor Chodroff

Comments: Accepted for Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[7] arXiv:2506.00736 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling

Kuan-Po Huang, Shu-wen Yang, Huy Phan, Bo-Ru Lu, Byeonggeun Kim, Sashank Macha, Qingming Tang, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang

Comments: Accepted by ICML 2025. Project website: https://audio-impact.github.io/

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[8] arXiv:2506.00800 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer

Daiki Takeuchi, Binh Thien Nguyen, Masahiro Yasuda, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[9] arXiv:2506.00843 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement

Amir Hussein, Sameer Khurana, Gordon Wichern, Francois G. Germain, Jonathan Le Roux

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[10] arXiv:2506.00861 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment

Parismita Gogoi, Vishwanath Pratap Singh, Seema Khadirnaikar, Soma Siddhartha, Sishir Kalita, Jagabandhu Mishra, Md Sahidullah, Priyankoo Sarmah, S. R. M. Prasanna

Comments: Accepted at Interspeech. All code is available in the GitHub repo https://github.com/seemark11/DhiNirnayaAMFM

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[11] arXiv:2506.00950 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Crowdsourcing MUSHRA Tests in the Age of Generative Speech Technologies: A Comparative Analysis of Subjective and Objective Testing Methods

Laura Lechler, Chamran Moradi, Ivana Balic

Comments: This is a preprint of a paper submitted to and accepted for INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[12] arXiv:2506.01014 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching

Jialong Zuo, Shengpeng Ji, Minghui Fang, Mingze Li, Ziyue Jiang, Xize Cheng, Xiaoda Yang, Chen Feiyang, Xinyu Duan, Zhou Zhao

Comments: Accepted by ACL 2025 (Main Conference)

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[13] arXiv:2506.01039 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data

Songjun Cao, Qinghua Wu, Jie Chen, Jin Li, Long Ma

Comments: 5 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[14] arXiv:2506.01138 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition

Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[15] arXiv:2506.01148 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism

Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Santanu Roy, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[16] arXiv:2506.01157 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations

Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Drishti Singh, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to EUSIPCO 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[17] arXiv:2506.01192 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: GigaAM: Efficient Self-Supervised Learner for Speech Recognition

Aleksandr Kutsakov, Alexandr Maximenko, Georgii Gospodinov, Pavel Bogomolov, Fyodor Minkin

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[18] arXiv:2506.01256 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Confidence intervals for forced alignment boundaries using model ensembles

Matthew C. Kelley

Comments: submitted for publication; 7 pages, 1 figure

Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[19] arXiv:2506.01270 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Online Audio-Visual Autoregressive Speaker Extraction

Zexu Pan, Wupeng Wang, Shengkui Zhao, Chong Zhang, Kun Zhou, Yukun Ma, Bin Ma

Comments: Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[20] arXiv:2506.01483 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction

Wang Dai, Archontis Politis, Tuomas Virtanen

Comments: Accepted by Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[21] arXiv:2506.01510 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: LinearVC: Linear transformations of self-supervised features through the lens of voice conversion

Herman Kamper, Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau

Comments: Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL)
[22] arXiv:2506.01611 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Lessons Learned from the URGENT 2024 Speech Enhancement Challenge

Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

Comments: 5 pages, 4 figures, 1 table. Accepted by Interspeech 2025. Code available at https://github.com/urgent-challenge/urgent2024_analysis

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD) ; Signal Processing (eess.SP)
[23] arXiv:2506.01618 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech

Karl El Hajal, Enno Hermann, Sevada Hovsepyan, Mathew Magimai.-Doss

Comments: Accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[24] arXiv:2506.01655 (cross-list from eess.AS) [cn-pdf, pdf, other]: Title: Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric

Mattson Ogg, Caitlyn Bishop, Han Yi, Sarah Robinson

Comments: Five tables, three figures, twelve pages

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[25] arXiv:2506.01731 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Benchmarking Neural Speech Codec Intelligibility with SITool

Anna Leschanowsky, Kishor Kayyar Lakshminarayana, Anjana Rajasekhar, Lyonel Behringer, Ibrahim Kilinc, Guillaume Fuchs, Emanuël A. P. Habets

Comments: submitted to Interspeech

Subjects: Audio and Speech Processing (eess.AS)
[26] arXiv:2506.01845 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: On-device Streaming Discrete Speech Units

Kwanghee Choi, Masao Someki, Emma Strubell, Shinji Watanabe

Comments: Accepted to Interspeech 2025, source code at https://github.com/Masao-Someki/StreamingDSU

Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[27] arXiv:2506.01916 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: DNCASR: End-to-End Training for Speaker-Attributed ASR

Xianrui Zheng, Chao Zhang, Philip C. Woodland

Comments: Accepted by ACL 2025 Main Conference

Subjects: Audio and Speech Processing (eess.AS)
[28] arXiv:2506.02039 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction

Haoshuai Zhou, Changgeng Mo, Boxuan Cao, Linkai Li, Shan Xiang Wang

Comments: Accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD)
[29] arXiv:2506.02078 (cross-list from eess.AS) [cn-pdf, pdf, other]: Title: Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data

Emmy Postma, Cristian Tejedor-Garcia

Comments: Accepted to Interspeech 2025. This publication is part of the project Responsible AI for Voice Diagnostics (RAIVD) with file number NGF.1607.22.013 of the research programme NGF AiNed Fellowship Grants which is financed by the Dutch Research Council (NWO)

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI)
[30] arXiv:2506.02080 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge

Aditya Kamlesh Parikh, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

Comments: Accepted to Interspeech 2025. This publication is part of the project Responsible AI for Voice Diagnostics (RAIVD) with file number NGF.1607.22.013 of the research programme NGF AiNed Fellowship Grants which is financed by the Dutch Research Council (NWO)

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI)
[31] arXiv:2506.02166 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Dhvani: A Weakly-supervised Phonemic Error Detection and Personalized Feedback System for Hindi

Arnav Rustagi, Satvik Bajpai, Nimrat Kaur, Siddharth Siddharth

Comments: Accepted for publication at Interspeech 2025 to be held in Rotterdam, the Netherlands

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
[32] arXiv:2506.02230 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Towards Machine Unlearning for Paralinguistic Speech Processing

Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Shubham Singh, Swarup Ranjan Behera, Vandana Rajan, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[33] arXiv:2506.02232 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction

Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[34] arXiv:2506.02258 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Are Mamba-based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?

Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Girish, Swarup Ranjan Behera, Ananda Chandra Nayak, Sanjib Kumar Nayak, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to EUSIPCO 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[35] arXiv:2506.02339 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss

Jiawen Huang, Felipe Sousa, Emir Demirel, Emmanouil Benetos, Igor Gadelha

Comments: submitted to Interspeech

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[36] arXiv:2506.02505 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Adaptive Differential Denoising for Respiratory Sounds Classification

Gaoyang Dong, Zhicheng Zhang, Ping Sun, Minghui Zhang

Comments: Accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[37] arXiv:2506.02742 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions

Xiaoxue Gao, Huayun Zhang, Nancy F. Chen

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD) ; Signal Processing (eess.SP)
[38] arXiv:2506.02773 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers

Linya Fu, Yu Liu, Zhijie Liu, Zedong Yang, Zhong-Qiu Wang, Youfu Li, He Kong

Comments: Accepted and to appear at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[39] arXiv:2506.02777 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: On the influence of language similarity in non-target speaker verification trials

Paul M. Reuter, Michael Jessen

Comments: accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[40] arXiv:2506.02797 (cross-list from eess.AS) [cn-pdf, pdf, other]: Title: Fast-Converging Distributed Signal Estimation in Topology-Unconstrained Wireless Acoustic Sensor Networks

Paul Didier, Toon van Waterschoot, Simon Doclo, Jörg Bitzer, Marc Moonen

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[41] arXiv:2506.02858 (cross-list from cs.SD) [cn-pdf, pdf, html, other]: Title: DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization

Geonyoung Lee, Geonhee Han, Paul Hongsuck Seo

Comments: Interspeech 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI)
[42] arXiv:2506.02863 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech

Helin Wang, Jiarui Hai, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhiali, Najim Dehak

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD)
[43] arXiv:2506.02908 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency

Bunlong Lay, Rostislav Makarov, Timo Gerkmann

Comments: 5 pages, 2 figures, Accepted to Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG)
[44] arXiv:2506.02958 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing

You Zhang, Baotong Tian, Lin Zhang, Zhiyao Duan

Comments: Interspeech 2025 camera ready. Project page: https://yzyouzhang.com/PartialEdit/

Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[45] arXiv:2506.03020 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: InfiniteAudio: Infinite-Length Audio Generation with Consistency

Chaeyoung Jung, Hojoon Ki, Ji-Hoon Kim, Junmo Kim, Joon Son Chung

Subjects: Audio and Speech Processing (eess.AS)
[46] arXiv:2506.03364 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models

Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Priyabrata Mallick, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Multimedia (cs.MM) ; Sound (cs.SD)
[47] arXiv:2506.03378 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer

Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[48] arXiv:2506.03403 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition

Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS)
[49] arXiv:2506.03425 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations

Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj

Comments: 5 pages, 3 figures, accepted at Interspeech 2025

Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
[50] arXiv:2506.03515 (cross-list from eess.AS) [cn-pdf, pdf, html, other]: Title: BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing

Masaya Kawamura, Takuya Hasumi, Yuma Shirahata, Ryuichi Yamamoto

Comments: Accepted to INTERSPEECH 2025

Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD) ; Signal Processing (eess.SP)
