Sound

Authors and titles for July 2025

Total of 322 entries; showing entries 1-50

[1] arXiv:2507.00229 (cross-list from cs.SD): Title: A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss

Tarikul Islam Tamiti, Biraj Joshi, Rida Hasan, Rashedul Hasan, Taieba Athay, Nursad Mamun, Anomadarshi Barua

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[2] arXiv:2507.00466 (cross-list from cs.SD): Title: Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture

Sebastian Murgul, Michael Heizmann

Comments: Accepted to the 22nd Sound and Music Computing Conference (SMC), 2025

Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[3] arXiv:2507.00475 (cross-list from cs.SD): Title: AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences

Minoru Kishi, Ryosuke Sakai, Shinnosuke Takamichi, Yusuke Kanamori, Yuki Okamoto

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[4] arXiv:2507.00498 (cross-list from cs.SD): Title: MuteSwap: Visual-informed Silent Video Identity Conversion

Yifan Liu, Yu Fang, Zhouhan Lin

Subjects: Sound (cs.SD) ; Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[5] arXiv:2507.00693 (cross-list from cs.SD): Title: Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection

Yifan Gao, Jiao Fu, Long Guo, Hong Liu

Comments: Accepted to Interspeech 2025

Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[6] arXiv:2507.00808 (cross-list from cs.SD): Title: Multi-interaction TTS toward professional recording reproduction

Hiroki Kanagawa, Kenichi Fujita, Aya Watanabe, Yusuke Ijima

Comments: 7 pages, 6 figures, Accepted to Speech Synthesis Workshop 2025 (SSW13)

Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[7] arXiv:2507.00966 (cross-list from cs.SD): Title: MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement

Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing for possible publication

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[8] arXiv:2507.01339 (cross-list from cs.SD): Title: User-guided Generative Source Separation

Yutong Wen, Minje Kim, Paris Smaragdis

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[9] arXiv:2507.01563 (cross-list from cs.SD): Title: Real-Time Emergency Vehicle Siren Detection with Efficient CNNs on Embedded Hardware

Marco Giordano, Stefano Giacomelli, Claudia Rinaldi, Fabio Graziosi

Comments: 10 pages, 10 figures, submitted to https://internetofsounds2025.ieee-is2.org/. arXiv admin note: text overlap with arXiv:2506.23437

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[10] arXiv:2507.01582 (cross-list from cs.SD): Title: Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder

Jing Luo, Xinyu Yang, Jie Wei

Comments: Accepted by IEEE SMC 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[11] arXiv:2507.01805 (cross-list from cs.SD): Title: A Dataset for Automatic Assessment of TTS Quality in Spanish

Alejandro Sosa Welford, Leonardo Pepino

Comments: 5 pages, 2 figures. Accepted at Interspeech 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[12] arXiv:2507.01974 (cross-list from cs.SD): Title: Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations

Jérémy Rouch (CRNL-ENES), M Ducrettet (CRNL-ENES, ISYEB), S Haupert (ISYEB), R Emonet (LabHC), F Sèbe (CRNL-ENES, OFB - DRAS)

Journal-ref: 17e Congrès Français d'Acoustique, Société Française d'Acoustique, Apr 2025, Paris Université Sorbonne Nouvelle, France

Subjects: Sound (cs.SD) ; Machine Learning (cs.LG)
[13] arXiv:2507.02176 (cross-list from cs.SD): Title: Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis

Marc-André Carbonneau, Benjamin van Niekerk, Hugo Seuté, Jean-Philippe Letendre, Herman Kamper, Julian Zaïdi

Comments: Accepted at SSW13 - Interspeech 2025 Speech Synthesis Workshop

Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[14] arXiv:2507.02273 (cross-list from cs.SD): Title: Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures

Yen-Tung Yeh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji

Comments: ISMIR 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[15] arXiv:2507.02380 (cross-list from cs.SD): Title: JoyTTS: LLM-based Spoken Chatbot With Voice Cloning

Fangru Zhou, Jun Zhao, Guoxin Wang

Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[16] arXiv:2507.02391 (cross-list from cs.SD): Title: Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

Mostafa Sadeghi (MULTISPEECH), Jean-Eudes Ayilo (MULTISPEECH), Romain Serizel (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN)

Journal-ref: IEEE Signal Processing Letters, pp.1-5

Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[17] arXiv:2507.02606 (cross-list from cs.SD): Title: De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu

Comments: Accepted by ICML 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[18] arXiv:2507.02666 (cross-list from cs.SD): Title: ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning

Junyu Wang, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

Comments: Accepted at Interspeech 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[19] arXiv:2507.02915 (cross-list from cs.SD): Title: Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning

Ludovic Tuncay (IRIT-SAMoVA), Etienne Labbé (IRIT-SAMoVA), Emmanouil Benetos (QMUL), Thomas Pellegrini (IRIT-SAMoVA)

Journal-ref: ICME 2025, Jun 2025, Nantes, France

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS) ; Signal Processing (eess.SP)
[20] arXiv:2507.03251 (cross-list from cs.SD): Title: Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention

HyeYoung Lee, Muhammad Nadeem

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[21] arXiv:2507.03377 (cross-list from cs.SD): Title: Eigenvoice Synthesis based on Model Editing for Speaker Generation

Masato Murata, Koichi Miyazaki, Tomoki Koriyama, Tomoki Toda

Comments: Accepted by INTERSPEECH 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[22] arXiv:2507.03382 (cross-list from cs.SD): Title: Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control

Masato Murata, Koichi Miyazaki, Tomoki Koriyama

Comments: Accepted by INTERSPEECH 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[23] arXiv:2507.03395 (cross-list from cs.SD): Title: MaskBeat: Loopable Drum Beat Generation

Luca A. Lanzendörfer, Florian Grötschla, Karim Galal, Roger Wattenhofer

Comments: Extended Abstract ISMIR 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[24] arXiv:2507.03466 (cross-list from cs.SD): Title: Direction Estimation of Sound Sources Using Microphone Arrays and Signal Strength

Mahdi Ali Pour, Zahra Habibzadeh

Comments: 5 pages

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS) ; Systems and Control (eess.SY)
[25] arXiv:2507.03468 (cross-list from cs.SD): Title: Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation

Hieu-Thi Luong, Inbal Rimon, Haim Permuter, Kong Aik Lee, Eng Siong Chng

Comments: APSIPA 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[26] arXiv:2507.03482 (cross-list from cs.SD): Title: OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction

Pablo Alonso-Jiménez, Pedro Ramoneda, R. Oguz Araz, Andrea Poltronieri, Dmitry Bogdanov

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[27] arXiv:2507.03594 (cross-list from cs.SD): Title: RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification

Terry Yi Zhong, Cristian Tejedor-Garcia, Martha Larson, Bastiaan R. Bloem

Comments: Accepted for TSD 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Audio and Speech Processing (eess.AS)
[28] arXiv:2507.03599 (cross-list from cs.SD): Title: MusGO: A Community-Driven Framework For Assessing Openness in Music-Generative AI

Roser Batlle-Roca, Laura Ibáñez-Martínez, Xavier Serra, Emilia Gómez, Martín Rocamora

Comments: Accepted at ISMIR 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY) ; Audio and Speech Processing (eess.AS)
[29] arXiv:2507.04048 (cross-list from cs.SD): Title: CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning

Jiacheng Shi, Yanfu Zhang, Ye Gao

Comments: Accepted to Interspeech 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[30] arXiv:2507.04230 (cross-list from cs.SD): Title: High-Resolution Sustain Pedal Depth Estimation from Piano Audio Across Room Acoustics

Kun Fang, Hanwen Zhang, Ziyu Wang, Ichiro Fujinaga

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR) ; Audio and Speech Processing (eess.AS)
[31] arXiv:2507.04349 (cross-list from cs.SD): Title: TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet

Jaeseok Jeong, Yuna Lee, Mingi Kwon, Youngjung Uh

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[32] arXiv:2507.04419 (cross-list from cs.SD): Title: Machine Learning in Acoustics: A Review and Open-Source Repository

Ryan A. McCarthy, You Zhang, Samuel A. Verburg, William F. Jenkins, Peter Gerstoft

Comments: Accepted by npj Acoustics, 22 pages, 12 figures

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS) ; Signal Processing (eess.SP)
[33] arXiv:2507.04554 (cross-list from cs.SD): Title: Self-supervised learning of speech representations with Dutch archival data

Nik Vaessen, Roeland Ordelman, David A. van Leeuwen

Comments: Accepted at Interspeech 2025

Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[34] arXiv:2507.04598 (cross-list from cs.SD): Title: Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis

Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

Comments: Accepted to APSIPA Transactions on Signal and Information Processing

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[35] arXiv:2507.04776 (cross-list from cs.SD): Title: Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction

Jun-You Wang, Li Su

Comments: Accepted at ISMIR 2025

Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[36] arXiv:2507.04817 (cross-list from cs.SD): Title: Fast-VGAN: Lightweight Voice Conversion with Explicit Control of F0 and Duration Parameters

Mathilde Abrassart, Nicolas Obin, Axel Roebel

Comments: 8 pages, 4 figures

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[37] arXiv:2507.04858 (cross-list from cs.SD): Title: Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu

António Sá Pinto

Comments: Accepted at ISMIR 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[38] arXiv:2507.04864 (cross-list from cs.SD): Title: Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation

Alexander Fichtinger, Jan Schlüter, Gerhard Widmer

Comments: Accepted at SMC 2025. Code at https://malex1106.github.io/boomify/

Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[39] arXiv:2507.04955 (cross-list from cs.SD): Title: EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation

Fathinah Izzati, Xinyue Li, Gus Xia

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[40] arXiv:2507.04963 (cross-list from cs.SD): Title: Modeling the Difficulty of Saxophone Music

Šimon Libřický, Jan Hajič jr

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[41] arXiv:2507.04966 (cross-list from cs.SD): Title: LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning

Sandipan Dhar, Mayank Gupta, Preeti Rao

Comments: 10 pages, 5 figures, 3 Tables

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[42] arXiv:2507.05657 (cross-list from cs.SD): Title: Adaptive Linearly Constrained Minimum Variance Framework for Volumetric Active Noise Control

Manan Mittal, Ryan M. Corey, Andrew C. Singer

Comments: 5 pages, 6 figures

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[43] arXiv:2507.05662 (cross-list from cs.SD): Title: Beamforming with Random Projections: Upper and Lower Bounds

Manan Mittal, Ryan M. Corey, Andrew C. Singer

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[44] arXiv:2507.05729 (cross-list from cs.SD): Title: Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners

Katsuhiko Yamamoto, Koichi Miyazaki

Comments: Accepted by INTERSPEECH 2025

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[45] arXiv:2507.05900 (cross-list from cs.SD): Title: Stable Acoustic Relay Assignment with High Throughput via Laser Chaos-based Reinforcement Learning

Zengjing Chen, Lu Wang, Chengzhi Xing

Subjects: Sound (cs.SD) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS) ; Optimization and Control (math.OC)
[46] arXiv:2507.05911 (cross-list from cs.SD): Title: Differentiable Reward Optimization for LLM based TTS system

Changfeng Gao, Zhihao Du, Shiliang Zhang

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[47] arXiv:2507.06070 (cross-list from cs.SD): Title: Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol

Christos Nikou, Theodoros Giannakopoulos

Comments: International Journal of Music Science, Technology and Art, 15 pages, 7 figures

Journal-ref: IJMSTA - Vol. 7 - Issue 1 - January 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR) ; Machine Learning (cs.LG) ; Audio and Speech Processing (eess.AS)
[48] arXiv:2507.06116 (cross-list from cs.SD): Title: Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis

Xintong Hu, Yixuan Chen, Rui Yang, Wenxiang Guo, Changhao Pan

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[49] arXiv:2507.06329 (cross-list from cs.SD): Title: MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing

Michael Clemens, Ana Marasović

Comments: Published at COLM 2025. Code and dataset are available here http://mclemcrew.github.io/mixassist-website

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Audio and Speech Processing (eess.AS)
[50] arXiv:2507.06481 (cross-list from cs.SD): Title: IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer

Changheon Han, Yuseop Sim, Hoin Jung, Jiho Lee, Hojun Lee, Yun Seok Kang, Sucheol Woo, Garam Kim, Hyung Wook Park, Martin Byung-Guk Jun

Subjects: Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
