CenXiv.org


Multimedia

Authors and titles for June 2025

Total of 153 entries (showing entries 76-100)
[76] arXiv:2506.08200 (cross-list from cs.HC) [cn-pdf, pdf, html, other]
Title: AffectMachine-Pop: A controllable expert system for real-time pop music generation
Kat R. Agres, Adyasha Dash, Phoebe Chua, Stefan K. Ehrlich
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[77] arXiv:2506.08493 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization
Qilin Yin, Wei Lu, Xiangyang Luo, Xiaochun Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[78] arXiv:2506.08524 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Teaching Physical Awareness to LLMs through Sounds
Weiguo Wang, Andy Nie, Wenrui Zhou, Yi Kai, Chengchen Hu
Comments: ICML 2025
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[79] arXiv:2506.08591 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Diversity-Guided MLP Reduction for Efficient Large Vision Transformers
Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, Xinchao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[80] arXiv:2506.09650 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios
Kunyu Peng, Junchao Huang, Xiangsheng Huang, Di Wen, Junwei Zheng, Yufan Chen, Kailun Yang, Jiamin Wu, Chongqing Hao, Rainer Stiefelhagen
Comments: The code is available at https://github.com/KPeng9510/HopaDIFF.git
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Robotics (cs.RO); Image and Video Processing (eess.IV)
[81] arXiv:2506.09792 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li
Comments: Accepted by Interspeech 2025
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[82] arXiv:2506.09999 (cross-list from cs.LG) [cn-pdf, pdf, html, other]
Title: Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion
Yukun Chen, Zihuan Qiu, Fanman Meng, Hongliang Li, Linfeng Xu, Qingbo Wu
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2506.10005 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models
Sridhar S, Nithin A, Shakeel Rifath, Vasantha Raj
Comments: 10 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Graphics (cs.GR); Multimedia (cs.MM)
[84] arXiv:2506.10009 (cross-list from eess.IV) [cn-pdf, pdf, html, other]
Title: The Iris File Extension
Ryan Erik Landvater, Michael David Olp, Mustafa Yousif, Ulysses Balis
Comments: 17 pages, 7 figures
Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[85] arXiv:2506.10452 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts
Guowei Zhong, Ruohong Huan, Mingzhen Wu, Ronghua Liang, Peng Chen
Comments: Submitted to TAC. The code is available at https://github.com/gw-zhong/CIDer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[86] arXiv:2506.10574 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: DanceChat: Large Language Model-Guided Music-to-Dance Generation
Qing Wang, Xiaohang Yang, Yilan Dong, Naveen Raj Govindaraj, Gregory Slabaugh, Shanxin Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2506.10857 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu, Yue Wu, Meng Chu, Zhifei Ren, Zizheng Huang, Pei Chu, Ruijie Zhang, Yinan He, Qirui Li, Songze Li, Zhenxiang Li, Zhongying Tu, Conghui He, Yu Qiao, Yali Wang, Yi Wang, Limin Wang
Comments: ICCV 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[88] arXiv:2506.10932 (cross-list from cs.HC) [cn-pdf, pdf, other]
Title: Video-Mediated Emotion Disclosure: Expressions of Fear, Sadness, and Joy by People with Schizophrenia on YouTube
Jiaying Lizzy Liu, Yan Zhang
Comments: 10 pages
Journal-ref: ASIS&T 2025
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Multimedia (cs.MM)
[89] arXiv:2506.10941 (cross-list from cs.CV) [cn-pdf, pdf, other]
Title: VINCIE: Unlocking In-context Image Editing from Video
Leigang Qu, Feng Cheng, Ziyan Yang, Qi Zhao, Shanchuan Lin, Yichun Shi, Yicong Li, Wenjie Wang, Tat-Seng Chua, Lu Jiang
Comments: Project page: https://vincie2025.github.io/
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[90] arXiv:2506.11036 (cross-list from cs.LG) [cn-pdf, pdf, html, other]
Title: Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
Yang Qin, Chao Chen, Zhihang Fu, Dezhong Peng, Xi Peng, Peng Hu
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)
[91] arXiv:2506.11521 (cross-list from cs.CR) [cn-pdf, pdf, html, other]
Title: Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models
Jinming Wen, Xinyi Wu, Shuai Zhao, Yanhao Jia, Yuwen Li
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[92] arXiv:2506.11737 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Quizzard@INOVA Challenge 2025 -- Track A: Plug-and-Play Technique in Interleaved Multi-Image Model
Dinh Viet Cuong, Hoang-Bao Le, An Pham Ngoc Nguyen, Liting Zhou, Cathal Gurrin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[93] arXiv:2506.11934 (cross-list from cs.SI) [cn-pdf, pdf, html, other]
Title: Temporal Dynamics of Emotions in Italian Online Soccer Fandoms
Salvatore Citraro, Giovanni Mauro, Emanuele Ferragina
Subjects: Social and Information Networks (cs.SI); Multimedia (cs.MM)
[94] arXiv:2506.12269 (cross-list from eess.IV) [cn-pdf, pdf, html, other]
Title: ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing
Babak Naderi, Ross Cutler, Juhee Cho, Nabakumar Khongbantabam, Dejan Ivkovic
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[95] arXiv:2506.12573 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, Hao-Wen Dong
Comments: ISMIR 2025 regular paper. Dataset, code, and demo available at https://havenpersona.github.io/ossl-v1
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[96] arXiv:2506.12935 (cross-list from cs.CL) [cn-pdf, pdf, html, other]
Title: SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui
Comments: Accepted to EMNLP 2025 Main Conference (Oral Presentation)
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2506.13001 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV
Christian Zhou-Zheng, Philippe Pasquier
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[98] arXiv:2506.13038 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs
Zijian Zhang, Xuecheng Wu, Danlei Huang, Siyu Yan, Chong Peng, Xuezhi Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[99] arXiv:2506.13971 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Multimodal Fusion with Semi-Supervised Learning Minimizes Annotation Quantity for Modeling Videoconference Conversation Experience
Andrew Chang, Chenkai Hu, Ji Qi, Zhuojian Wei, Kexin Zhang, Viswadruth Akkaraju, David Poeppel, Dustin Freeman
Comments: Interspeech 2025
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
[100] arXiv:2506.14223 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
Anna Hamberger, Sebastian Murgul, Jochen Schmidt, Michael Heizmann
Comments: Accepted to the 50th International Computer Music Conference (ICMC), 2025
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)