Audio and Speech Processing

Authors and titles for January 2024

Total of 278 entries : 1-50 51-100 101-150 151-200 ... 251-278
Showing up to 50 entries per page: fewer | more | all
[1] arXiv:2401.00197 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: ODAQ: Open Dataset of Audio Quality
Title: 开放音频质量数据集
Matteo Torcoli, Chih-Wei Wu, Sascha Dick, Phillip A. Williams, Mhd Modar Halimeh, William Wolcott, Emanuel A. P. Habets
Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024
Subjects: Audio and Speech Processing (eess.AS)
[2] arXiv:2401.00225 (cross-list from eess.AS) [cn-pdf, pdf, other]
Title: Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform
Title: 基于经验模态分解和沃尔什-哈达玛变换的构音障碍语音特征表示增强
Ting Zhu, Shufei Duan, Camille Dingam, Huizhi Liang, Wei Zhang
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Signal Processing (eess.SP)
[3] arXiv:2401.00273 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision
Title: 研究近期具有自监督和弱监督的基础模型在普通话-英语代码切换的ASR和语音到文本翻译上的零样本泛化能力
Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee
Comments: Submitted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond workshop
Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL)
[4] arXiv:2401.00813 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Ultraspherical/Gegenbauer polynomials to unify 2D/3D Ambisonic directivity designs
Title: ultraspherical/Gegenbauer多项式统一2D/3D Ambisonic指向性设计
Franz Zotter
Comments: 56 pages, 9 figures
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[5] arXiv:2401.00900 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Detecting the presence of sperm whales echolocation clicks in noisy environments
Title: 在嘈杂环境中检测雄性抹香鲸回声定位脉冲的存在
Guy Gubnitsky, Roee Diamant
Comments: 10 pages and 10 figures
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[6] arXiv:2401.00936 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: The role of direct sound spherical harmonics representation in externalization using binaural reproduction
Title: 双耳再现中直接声球面谐波表示在外化中的作用
Eran Miller, Boaz Rafaely
Journal-ref: Applied Acoustics, Volume 148, 2019, Pages 40-45
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[7] arXiv:2401.01099 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Efficient Parallel Audio Generation using Group Masked Language Modeling
Title: 使用分组掩码语言建模的高效并行音频生成
Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
[8] arXiv:2401.01145 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
Title: HAAQI-Net:一种用于助听器的非侵入式神经音乐音频质量评估模型
Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[9] arXiv:2401.01206 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Room impulse response reconstruction with physics-informed deep learning
Title: 基于物理信息的深度学习的房间脉冲响应重建
Xenofon Karakonstantis, Diego Caviedes-Nozal, Antoine Richard, Efren Fernandez-Grande
Comments: Submitted to Journal of Acoustical Society of America (JASA)
Subjects: Audio and Speech Processing (eess.AS)
[10] arXiv:2401.01255 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: On the Parameter Estimation of Sinusoidal Models for Speech and Audio Signals
Title: 正弦模型在语音和音频信号中的参数估计
George P. Kafentzis
Subjects: Audio and Speech Processing (eess.AS) ; Signal Processing (eess.SP)
[11] arXiv:2401.01473 (cross-list from eess.AS) [cn-pdf, pdf, other]
Title: Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
Title: 通过自蒸馏和在线聚类的自监督反射学习用于说话人表征学习
Danwei Cai, Zexin Cai, Ze Li, Ming Li
Journal-ref: IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 1535-1550, 2025
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[12] arXiv:2401.01498 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Title: 利用神经转导器通过语义标记预测进行两阶段文本到语音
Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[13] arXiv:2401.01792 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: CoMoSVC: Consistency Model-based Singing Voice Conversion
Title: CoMoSVC:基于一致性模型的歌唱语音转换
Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[14] arXiv:2401.02046 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
Title: CTC空白触发的动态层跳过用于高效的基于CTC的语音识别
Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin
Comments: Accepted by ASRU 2023
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[15] arXiv:2401.02164 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Listening broadband physical model for microphones: a first step
Title: 用于麦克风的宽频带物理模型:第一步
Laurent Millot (IDEAT), Antoine Valette, Manuel Lopes, Gérard Pelé (IDEAT), Mohammed Elliq, Dominique Lambert (IDEAT)
Journal-ref: 120th Convention of the Audio Engineering Society, Audio Engineering Society, May 2006, Paris, France
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[16] arXiv:2401.02285 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Optimal Real-Weighted Beamforming With Application to Linear and Spherical Arrays
Title: 最优实权重波束成形及其在直线阵列和球面阵列中的应用
V. Tourbabin, M. Agmon, B. Rafaely, J. Tabrikian
Journal-ref: in IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 9, pp. 2575-2585, Nov. 2012
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[17] arXiv:2401.02386 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Direction of Arrival Estimation Using Microphone Array Processing for Moving Humanoid Robots
Title: 基于麦克风阵列处理的移动人形机器人到达方向估计
Vladimir Tourbabin, Boaz Rafaely
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 2046-2058, Nov. 2015
Subjects: Audio and Speech Processing (eess.AS) ; Robotics (cs.RO) ; Sound (cs.SD)
[18] arXiv:2401.02417 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition
Title: 面向任务的对话作为自监督自动语音识别的催化剂
David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister
Comments: To appear in ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[19] arXiv:2401.02463 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Some clues to build a sound analysis relevant to hearing
Title: 一些有助于建立与听力相关的合理分析的线索
Laurent Millot (ACTE)
Journal-ref: 116th Convention of the Audio Engineering Society, Audio Engineering Society, May 2004, Berlin, Germany
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[20] arXiv:2401.02673 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Title: 统一的多通道远场语音识别系统:结合神经波束形成与基于注意力的端到端模型
Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD)
[21] arXiv:2401.02839 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Pheme: Efficient and Conversational Speech Generation
Title: Pheme:高效且对话的语音生成
Paweł Budzianowski, Taras Sereda, Tomasz Cichy, Ivan Vulić
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
[22] arXiv:2401.03078 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: StreamVC: Real-Time Low-Latency Voice Conversion
Title: StreamVC:实时低延迟语音转换
Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann
Comments: Accepted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[23] arXiv:2401.03251 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
Title: TeLeS:用于估计端到端自动语音识别置信度的时间词素相似性得分
Nagarathna Ravi, Thishyan Raj T, Vipul Arora
Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD) ; Machine Learning (stat.ML)
[24] arXiv:2401.03286 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Theoretical Framework for the Optimization of Microphone Array Configuration for Humanoid Robot Audition
Title: 用于类人机器人听觉的麦克风阵列配置优化的理论框架
Vladimir Tourbabin, Boaz Rafaely
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, 1803-1814, 2014
Subjects: Audio and Speech Processing (eess.AS) ; Robotics (cs.RO) ; Sound (cs.SD)
[25] arXiv:2401.03291 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Design framework for spherical microphone and loudspeaker arrays in a multiple-input multiple-output system
Title: 多输入多输出系统中球形麦克风和扬声器阵列的设计框架
Hai Morgenstern, Boaz Rafaely, Markus Noisternig
Journal-ref: J. Acoust. Soc. Am. 2017, vol 141, no 3, 2024-2038
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[26] arXiv:2401.03441 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Spatial Reverberation and Dereverberation using an Acoustic Multiple-Input Multiple-Output System
Title: 基于声学多输入多输出系统的空间混响与去混响
Hai Morgenstern, Boaz Rafaely
Journal-ref: J. Audio Eng. Soc, vol. 65, no. 1/2, pp. 42-55, 2017
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[27] arXiv:2401.03448 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments
Title: 单麦克风语音分离和噪声及混响环境中的语音活动检测
Renana Opochinsky, Mordehay Moradi, Sharon Gannot
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[28] arXiv:2401.03458 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Modal smoothing for analysis of room reflections measured with spherical microphone and loudspeaker arrays
Title: 模态平滑用于分析使用球形传声器和扬声器阵列测量的房间反射
Hai Morgenstern, Boaz Rafaely
Journal-ref: J. Acoust. Soc. Am., vol. 143, no. 2, pp. 1008-1018, 2018
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[29] arXiv:2401.03468 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Title: 多通道 AV-wav2vec2:一种学习多通道多模态语音表示的框架
Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai
Comments: Accepted by AAAI 2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[30] arXiv:2401.03493 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Theory and investigation of acoustic multiple-input multiple-output systems based on spherical arrays in a room
Title: 基于球形阵列的房间内声学多输入多输出系统理论与研究
Hai Morgenstern, Boaz Rafaely, Franz Zotter
Journal-ref: J. Acoust. Soc. Am., vol. 138, no. 5, pp. 2998-3009, November 2015
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[31] arXiv:2401.03497 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Title: EAT:基于高效音频变换器的自监督预训练
Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen
Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[32] arXiv:2401.03506 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
Title: DiarizationLM:使用大型语言模型的说话人日志后处理
Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao
Journal-ref: Proc. Interspeech 2024, 3754-3758 (2024)
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[33] arXiv:2401.03567 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Hyperbolic Distance-Based Speech Separation
Title: 基于双曲距离的语音分离
Darius Petermann, Minje Kim
Comments: To be published at ICASSP2024, 14th of April 2024, Seoul, South Korea. Copyright (c) 2023 IEEE. 5 pages, 2 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[34] arXiv:2401.03650 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper
Title: DDD:一种感知更优的低响应时间的基于DNN的去削波器
Jayeon Yi, Junghyun Koo, Kyogu Lee
Comments: To appear, ICASSP 2024. Demo samples at https://stet-stet.github.io/DDD, repo at https://github.com/stet-stet/DDD
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD) ; Signal Processing (eess.SP)
[35] arXiv:2401.03687 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators
Title: BS-PLCNet:带多任务学习框架和多判别器的频带分割丢包隐藏网络
Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie
Comments: Submitted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[36] arXiv:2401.03689 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Title: LUPET:将分层信息路径融入多语言自动语音识别
Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee
Comments: Accepted by Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[37] arXiv:2401.03816 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
Title: 从语音清晰度受损的语音中创建个性化合成语音的增强重建损失方法
Yusheng Tian, Jingyu Li, Tan Lee
Comments: Accepted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[38] arXiv:2401.03850 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation
Title: 介电弹性体用于声学激励的超弹性变形的逆非线性补偿
Jin Woo Lee, Gwang Seok An, Jeong-Yun Sun, Kyogu Lee
Journal-ref: IEEE Access 2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[39] arXiv:2401.03936 (cross-list from eess.AS) [cn-pdf, pdf, other]
Title: Exploratory Evaluation of Speech Content Masking
Title: 语音内容遮蔽的探索性评估
Jennifer Williams, Karla Pizzi, Paul-Gauthier Noe, Sneha Das
Comments: Accepted to ITG Speech Conference 2023
Subjects: Audio and Speech Processing (eess.AS) ; Cryptography and Security (cs.CR) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[40] arXiv:2401.03963 (cross-list from eess.AS) [cn-pdf, pdf, other]
Title: Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios
Title: 会议场景说话人日志的逐帧嵌入测地线插值
Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach
Comments: Accepted at ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS)
[41] arXiv:2401.04127 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Using perceptive subbands analysis to perform audio scenes cartography
Title: 使用感知子带分析进行音频场景制图
Laurent Millot (IDEAC), Gérard Pelé (IDEAC), Mohammed Elliq
Journal-ref: 118th Convention of the Audio Engineering Society, Audio Engineering Society, May 2005, Barcelona, Spain
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD) ; Signal Processing (eess.SP) ; Classical Physics (physics.class-ph)
[42] arXiv:2401.04283 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation
Title: FADI-AEC:基于远端信号的快速评分扩散模型用于声学回声消除
Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[43] arXiv:2401.04447 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Class-Incremental Learning for Multi-Label Audio Classification
Title: 类别增量学习用于多标签音频分类
Manjunath Mulimani, Annamaria Mesaros
Comments: Accepted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[44] arXiv:2401.04511 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Title: 零样本音频到音频情感迁移与说话人解耦
Soumya Dutta, Sriram Ganapathy
Comments: 5 pages, 3 figures, accepted at ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[45] arXiv:2401.04976 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
Title: 全频段动态卷积:用于声音事件检测的物理频率依赖卷积
Haobo Yue, Zhicheng Zhang, Da Mu, Yonghao Dang, Jianqin Yin, Jin Tang
Comments: Accepted by ICPR2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[46] arXiv:2401.05187 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Comparison of linear and nonlinear methods for decoding selective attention to speech from ear-EEG recordings
Title: 线性和非线性方法在从耳部脑电图记录中解码选择性注意力的比较
Mike Thornton, Danilo Mandic, Tobias Reichenbach
Subjects: Audio and Speech Processing (eess.AS)
[47] arXiv:2401.05314 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
Title: ANIM-400K:视频自动端到端配音的大规模数据集
Kevin Cai, Chonghua Liu, David M. Chan
Comments: To appear in ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS) ; Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV) ; Sound (cs.SD)
[48] arXiv:2401.05717 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Segment Boundary Detection via Class Entropy Measurements in Connectionist Phoneme Recognition
Title: 通过连接主义音素识别中的类别熵度量进行段边界检测
Giampiero Salvi
Journal-ref: Speech Communication Volume 48, Issue 12, December 2006, Pages 1666-1676
Subjects: Audio and Speech Processing (eess.AS) ; Information Theory (cs.IT) ; Machine Learning (cs.LG) ; Sound (cs.SD)
[49] arXiv:2401.05809 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression
Title: 通过方向加权外部辐射抑制在声场合成中定位声能
Yoshihide Tomita, Shoichi Koyama, Hiroshi Saruwatari
Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)
[50] arXiv:2401.05916 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Neural Ambisonics encoding for compact irregular microphone arrays
Title: 神经 Ambisonics 编码用于紧凑的不规则麦克风阵列
Mikko Heikkinen, Archontis Politis, Tuomas Virtanen
Comments: Accepted for publication in Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Audio and Speech Processing (eess.AS) ; Sound (cs.SD)