CenXiv.org

Sound

Authors and titles for July 2025

Total of 322 entries; showing entries 1-50.
[1] arXiv:2507.00229 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss
Tarikul Islam Tamiti, Biraj Joshi, Rida Hasan, Rashedul Hasan, Taieba Athay, Nursad Mamun, Anomadarshi Barua
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[2] arXiv:2507.00466 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
Sebastian Murgul, Michael Heizmann
Comments: Accepted to the 22nd Sound and Music Computing Conference (SMC), 2025
Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[3] arXiv:2507.00475 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences
Minoru Kishi, Ryosuke Sakai, Shinnosuke Takamichi, Yusuke Kanamori, Yuki Okamoto
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[4] arXiv:2507.00498 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: MuteSwap: Visual-informed Silent Video Identity Conversion
Yifan Liu, Yu Fang, Zhouhan Lin
Subjects: Sound (cs.SD) ; Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[5] arXiv:2507.00693 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection
Yifan Gao, Jiao Fu, Long Guo, Hong Liu
Comments: Accepted to Interspeech 2025
Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[6] arXiv:2507.00808 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Multi-interaction TTS toward professional recording reproduction
Hiroki Kanagawa, Kenichi Fujita, Aya Watanabe, Yusuke Ijima
Comments: 7 pages, 6 figures, Accepted to Speech Synthesis Workshop 2025 (SSW13)
Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[7] arXiv:2507.00966 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan
Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing for possible publication
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[8] arXiv:2507.01339 (cross-list from cs.SD) [cn-pdf, pdf, other]
Title: User-guided Generative Source Separation
Yutong Wen, Minje Kim, Paris Smaragdis
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[9] arXiv:2507.01563 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Real-Time Emergency Vehicle Siren Detection with Efficient CNNs on Embedded Hardware
Marco Giordano, Stefano Giacomelli, Claudia Rinaldi, Fabio Graziosi
Comments: 10 pages, 10 figures, submitted to https://internetofsounds2025.ieee-is2.org/. arXiv admin note: text overlap with arXiv:2506.23437
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[10] arXiv:2507.01582 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder
Jing Luo, Xinyu Yang, Jie Wei
Comments: Accepted by IEEE SMC 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[11] arXiv:2507.01805 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: A Dataset for Automatic Assessment of TTS Quality in Spanish
Alejandro Sosa Welford, Leonardo Pepino
Comments: 5 pages, 2 figures. Accepted at Interspeech 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[12] arXiv:2507.01974 (cross-list from cs.SD) [cn-pdf, pdf, other]
Title: Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations
Jérémy Rouch (CRNL-ENES), M Ducrettet (CRNL-ENES, ISYEB), S Haupert (ISYEB), R Emonet (LabHC), F Sèbe (CRNL-ENES, OFB - DRAS)
Journal-ref: 17e Congrès Français d'Acoustique, société française d'acoustique, Apr 2025, Paris Université Sorbonne Nouvelle, France
Subjects: Sound (cs.SD) ; Machine Learning (cs.LG)
[13] arXiv:2507.02176 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
Marc-André Carbonneau, Benjamin van Niekerk, Hugo Seuté, Jean-Philippe Letendre, Herman Kamper, Julian Zaïdi
Comments: Accepted at SSW13 - Interspeech 2025 Speech Synthesis Workshop
Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[14] arXiv:2507.02273 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures
Yen-Tung Yeh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji
Comments: ISMIR 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[15] arXiv:2507.02380 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: JoyTTS: LLM-based Spoken Chatbot With Voice Cloning
Fangru Zhou, Jun Zhao, Guoxin Wang
Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[16] arXiv:2507.02391 (cross-list from cs.SD) [cn-pdf, pdf, other]
Title: Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
Mostafa Sadeghi (MULTISPEECH), Jean-Eudes Ayilo (MULTISPEECH), Romain Serizel (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN)
Journal-ref: IEEE Signal Processing Letters, pp.1-5
Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[17] arXiv:2507.02606 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks
Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu
Comments: Accepted by ICML 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[18] arXiv:2507.02666 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning
Junyu Wang, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang
Comments: Accepted at Interspeech 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[19] arXiv:2507.02915 (cross-list from cs.SD) [cn-pdf, pdf, other]
Title: Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning
Ludovic Tuncay (IRIT-SAMoVA), Etienne Labbé (IRIT-SAMoVA), Emmanouil Benetos (QMUL), Thomas Pellegrini (IRIT-SAMoVA)
Journal-ref: ICME 2025, Jun 2025, Nantes, France
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS) ; Signal Processing (eess.SP)
[20] arXiv:2507.03251 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention
HyeYoung Lee, Muhammad Nadeem
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[21] arXiv:2507.03377 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Eigenvoice Synthesis based on Model Editing for Speaker Generation
Masato Murata, Koichi Miyazaki, Tomoki Koriyama, Tomoki Toda
Comments: Accepted by INTERSPEECH 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[22] arXiv:2507.03382 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control
Masato Murata, Koichi Miyazaki, Tomoki Koriyama
Comments: Accepted by INTERSPEECH 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[23] arXiv:2507.03395 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: MaskBeat: Loopable Drum Beat Generation
Luca A. Lanzendörfer, Florian Grötschla, Karim Galal, Roger Wattenhofer
Comments: Extended Abstract ISMIR 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[24] arXiv:2507.03466 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Direction Estimation of Sound Sources Using Microphone Arrays and Signal Strength
Mahdi Ali Pour, Zahra Habibzadeh
Comments: 5 pages
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS) ; Systems and Control (eess.SY)
[25] arXiv:2507.03468 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation
Hieu-Thi Luong, Inbal Rimon, Haim Permuter, Kong Aik Lee, Eng Siong Chng
Comments: APSIPA 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[26] arXiv:2507.03482 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction
Pablo Alonso-Jiménez, Pedro Ramoneda, R. Oguz Araz, Andrea Poltronieri, Dmitry Bogdanov
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[27] arXiv:2507.03594 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification
Terry Yi Zhong, Cristian Tejedor-Garcia, Martha Larson, Bastiaan R. Bloem
Comments: Accepted for TSD 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[28] arXiv:2507.03599 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: MusGO: A Community-Driven Framework For Assessing Openness in Music-Generative AI
Roser Batlle-Roca, Laura Ibáñez-Martínez, Xavier Serra, Emilia Gómez, Martín Rocamora
Comments: Accepted at ISMIR 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY) ; Audio and Speech Processing (eess.AS)
[29] arXiv:2507.04048 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning
Jiacheng Shi, Yanfu Zhang, Ye Gao
Comments: Accepted to Interspeech 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[30] arXiv:2507.04230 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: High-Resolution Sustain Pedal Depth Estimation from Piano Audio Across Room Acoustics
Kun Fang, Hanwen Zhang, Ziyu Wang, Ichiro Fujinaga
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR) ; Audio and Speech Processing (eess.AS)
[31] arXiv:2507.04349 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
Jaeseok Jeong, Yuna Lee, Mingi Kwon, Youngjung Uh
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[32] arXiv:2507.04419 (cross-list from cs.SD) [cn-pdf, pdf, other]
Title: Machine Learning in Acoustics: A Review and Open-Source Repository
Ryan A. McCarthy, You Zhang, Samuel A. Verburg, William F. Jenkins, Peter Gerstoft
Comments: Accepted by npj Acoustics, 22 pages, 12 figures
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS) ; Signal Processing (eess.SP)
[33] arXiv:2507.04554 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Self-supervised learning of speech representations with Dutch archival data
Nik Vaessen, Roeland Ordelman, David A. van Leeuwen
Comments: Accepted at Interspeech 2025
Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[34] arXiv:2507.04598 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li
Comments: Accepted to APSIPA Transactions on Signal and Information Processing
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[35] arXiv:2507.04776 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
Jun-You Wang, Li Su
Comments: Accepted at ISMIR 2025
Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[36] arXiv:2507.04817 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Fast-VGAN: Lightweight Voice Conversion with Explicit Control of F0 and Duration Parameters
Mathilde Abrassart, Nicolas Obin, Axel Roebel
Comments: 8 pages, 4 figures
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[37] arXiv:2507.04858 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu
António Sá Pinto
Comments: Accepted at ISMIR 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[38] arXiv:2507.04864 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation
Alexander Fichtinger, Jan Schlüter, Gerhard Widmer
Comments: Accepted at SMC 2025. Code at https://malex1106.github.io/boomify/
Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[39] arXiv:2507.04955 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation
Fathinah Izzati, Xinyue Li, Gus Xia
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[40] arXiv:2507.04963 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Modeling the Difficulty of Saxophone Music
Šimon Libřický, Jan Hajič jr
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[41] arXiv:2507.04966 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning
Sandipan Dhar, Mayank Gupta, Preeti Rao
Comments: 10 pages, 5 figures, 3 Tables
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[42] arXiv:2507.05657 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Adaptive Linearly Constrained Minimum Variance Framework for Volumetric Active Noise Control
Manan Mittal, Ryan M. Corey, Andrew C. Singer
Comments: 5 pages, 6 figures
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[43] arXiv:2507.05662 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Beamforming with Random Projections: Upper and Lower Bounds
Manan Mittal, Ryan M. Corey, Andrew C. Singer
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[44] arXiv:2507.05729 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners
Katsuhiko Yamamoto, Koichi Miyazaki
Comments: Accepted by INTERSPEECH 2025
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[45] arXiv:2507.05900 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Stable Acoustic Relay Assignment with High Throughput via Laser Chaos-based Reinforcement Learning
Zengjing Chen, Lu Wang, Chengzhi Xing
Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS) ; Optimization and Control (math.OC)
[46] arXiv:2507.05911 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Differentiable Reward Optimization for LLM based TTS system
Changfeng Gao, Zhihao Du, Shiliang Zhang
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[47] arXiv:2507.06070 (cross-list from cs.SD) [cn-pdf, pdf, other]
Title: Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol
Christos Nikou, Theodoros Giannakopoulos
Comments: International Journal of Music Science, Technology and Art, 15 pages, 7 figures
Journal-ref: IJMSTA - Vol. 7 - Issue 1 - January 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[48] arXiv:2507.06116 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis
Xintong Hu, Yixuan Chen, Rui Yang, Wenxiang Guo, Changhao Pan
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[49] arXiv:2507.06329 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing
Michael Clemens, Ana Marasović
Comments: Published at COLM 2025. Code and dataset are available here http://mclemcrew.github.io/mixassist-website
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[50] arXiv:2507.06481 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer
Changheon Han, Yuseop Sim, Hoin Jung, Jiho Lee, Hojun Lee, Yun Seok Kang, Sucheol Woo, Garam Kim, Hyung Wook Park, Martin Byung-Guk Jun
Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)