CenXiv.org


Multimedia

Authors and titles for June 2025

Total of 153 entries (showing entries 1-50)
[1] arXiv:2506.00868 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations
Parul Gupta, Shreya Ghosh, Tom Gedeon, Thanh-Toan Do, Abhinav Dhall
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2506.01211 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Iola Walker: A Mobile Footfall Detection System for Music Composition
William B. James
Subjects: Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[3] arXiv:2506.01668 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach
Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang
Subjects: Multimedia (cs.MM) ; Information Retrieval (cs.IR)
[4] arXiv:2506.02380 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR
Zihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV) ; Graphics (cs.GR) ; Human-Computer Interaction (cs.HC)
[5] arXiv:2506.02414 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu
Comments: 5 pages, 2 figures, Accepted by Interspeech 2025, Demo: https://thuhcsi.github.io/StarVC/
Subjects: Multimedia (cs.MM) ; Computation and Language (cs.CL) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[6] arXiv:2506.02997 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation
Yongqi Wang, Chunlei Zhang, Hangting Chen, Zhou Zhao, Dong Yu
Subjects: Multimedia (cs.MM)
[7] arXiv:2506.03530 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: How Far Are We from Generating Missing Modalities with Foundation Models?
Guanzhou Ke, Bo Wang, Guoqing Chao, Weiming Hu, Shengfeng He
Subjects: Multimedia (cs.MM) ; Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV)
[8] arXiv:2506.05851 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection
Marcel Klemt, Carlotta Segna, Anna Rohrbach
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[9] arXiv:2506.05987 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: The JPEG XL Image Coding System: History, Features, Coding Tools, Design Rationale, and Future
Jon Sneyers, Jyrki Alakuijala, Luca Versari, Zoltán Szabadka, Sami Boukortt, Amnon Cohen-Tidhar, Moritz Firsching, Evgenii Kliuchnikov, Tal Lev-Ami, Eric Portis, Thomas Richter, Osamu Watanabe
Comments: 73 pages, 62 figures
Subjects: Multimedia (cs.MM)
[10] arXiv:2506.06018 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models
Chaoyi Zhu, Zaitang Li, Renyi Yang, Robert Birke, Pin-Yu Chen, Tsung-Yi Ho, Lydia Y. Chen
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR)
[11] arXiv:2506.06037 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: SVD: Spatial Video Dataset
M. H. Izadimehr, Milad Ghanbari, Guodong Chen, Wei Zhou, Xiaoshuai Hao, Mallesham Dasari, Christian Timmerer, Hadi Amirpour
Subjects: Multimedia (cs.MM)
[12] arXiv:2506.06691 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: An Efficient Digital Watermarking Technique for Small Scale devices
Kaushik Talathi, Aparna Santra Biswas
Comments: 28 pages, 11 figures, 4 tables
Subjects: Multimedia (cs.MM) ; Cryptography and Security (cs.CR)
[13] arXiv:2506.06743 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24
Allie Tran, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Steve Hodges, Björn Þór Jónsson, Luca Rossetto, Klaus Schoeffmann, Minh-Triet Tran, Lucia Vadicamo, Cathal Gurrin
Subjects: Multimedia (cs.MM) ; Information Retrieval (cs.IR)
[14] arXiv:2506.06938 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: Experimental Evaluation of Static Image Sub-Region-Based Search Models Using CLIP
Bastian Jäckl, Vojtěch Kloda, Daniel A. Keim, Jakub Lokoč
Comments: 14 pages, 4 figures, 2 tables
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2506.07076 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets
Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos
Subjects: Multimedia (cs.MM)
[16] arXiv:2506.09506 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Dynamic Sub-region Search in Homogeneous Collections Using CLIP
Bastian Jäckl, Vojtěch Kloda, Daniel A. Keim, Jakub Lokoč
Comments: 18 pages, 4 figures, 5 tables
Subjects: Multimedia (cs.MM)
[17] arXiv:2506.09795 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment
Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon
Comments: ICME 2025
Subjects: Multimedia (cs.MM)
[18] arXiv:2506.10001 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Semantic Communication-Enabled Cloud-Edge-End-collaborative Metaverse Services Architecure
Yuxuan Li, Sheng Jinag, Bizhu Wang
Comments: arXiv admin note: text overlap with arXiv:2407.13764 by other authors
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
[19] arXiv:2506.10002 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis
Jianwu Fang, Lei-Lei Li, Zhedong Zheng, Hongkai Yu, Jianru Xue, Zhengguo Li, Tat-Seng Chua
Comments: Accepted by IEEE-TMM
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)
[20] arXiv:2506.10003 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Integrating multimedia documents in 3D city models for a better understanding of territories
C. Gautier, J. Delanoy, G. Gesquière
Comments: 8 pages, 11 figures
Journal-ref: sprs-annals-X-4-W2-2022-69-2022
Subjects: Multimedia (cs.MM) ; Human-Computer Interaction (cs.HC)
[21] arXiv:2506.10004 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: Immersive Multimedia Communication: State-of-the-Art on eXtended Reality Streaming
Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik
Comments: accepted by ACM Transactions on Multimedia Computing, Communications, and Applications
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Emerging Technologies (cs.ET) ; Networking and Internet Architecture (cs.NI)
[22] arXiv:2506.10006 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction
Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi
Comments: 8 pages, 6 figures, 3 tables. Accepted by the 33rd ACM International Conference on Multimedia (ACM MM 2025)
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
[23] arXiv:2506.10007 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space
Kangwei Liu, Junwu Liu, Xiaowei Yi, Jinlin Guo, Yun Cao
Comments: Accepted by ICME 2025
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV)
[24] arXiv:2506.10008 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics
Yi-Chun Chen
Comments: This paper has been submitted to ACM Multimedia 2025 and is currently under review
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2506.10010 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: Multimodal Emotion Coupling via Speech-to-Facial and Bodily Gestures in Dyadic Interaction
Von Ralph Dane Marquez Herbuela, Yukie Nagai
Subjects: Multimedia (cs.MM) ; Machine Learning (cs.LG) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[26] arXiv:2506.10011 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: WDMIR: Wavelet-Driven Multimodal Intent Recognition
Weiyin Gong, Kai Zhang, Yanghai Zhang, Qi Liu, Xinjie Sun, Junyu Lu, Linbo Zhu
Comments: Accepted at IJCAI 2025; 9 pages, 6 figures
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV) ; Signal Processing (eess.SP)
[27] arXiv:2506.10012 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: Thief of Truth: VR comics about the relationship between AI and humans
Joonhyung Bae
Subjects: Multimedia (cs.MM) ; Human-Computer Interaction (cs.HC)
[28] arXiv:2506.10013 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Immersive Fantasy Based on Digital Nostalgia: Environmental Narratives for the Korean Millennials and Gen Z
Yerin Doh, Joonhyung Bae
Comments: Accepted at ISEA 2025 (International Symposium on Electronic Art)
Subjects: Multimedia (cs.MM) ; Computers and Society (cs.CY)
[29] arXiv:2506.10016 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: A Survey of Generative Categories and Techniques in Multimodal Large Language Models
Longzhen Han, Awes Mubarak, Almas Baimagambetov, Nikolaos Polatidis, Thar Baker
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
[30] arXiv:2506.10416 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Can Sound Replace Vision in LLaVA With Token Substitution?
Ali Vosoughi, Jing Bi, Pinxin Liu, Yunlong Tang, Chenliang Xu
Comments: Project page: https://ali-vosoughi.github.io/SoundCLIP/
Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[31] arXiv:2506.14803 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Omnidirectional Video Super-Resolution using Deep Learning
Arbind Agrahari Baniya, Tsz-Kwan Lee, Peter W. Eklund, Sunil Aryal
Journal-ref: in IEEE Transactions on Multimedia, vol. 26, pp. 540-554, 2024
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG)
[32] arXiv:2506.16258 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing
Yisu Wang, Yixiang Zhu, Xinjiao Li, Yulong Zhang, Ruilong Wu, Dirk Kutscher
Subjects: Multimedia (cs.MM)
[33] arXiv:2506.16495 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation
Changsheng Gao, Zijie Liu, Li Li, Dong Liu, Xiaoyan Sun, Weisi Lin
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[34] arXiv:2506.17623 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning?
Yuesheng Huang, Peng Zhang, Riliang Liu, Jiaqi Liang
Comments: 4 figures, 7 tables
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[35] arXiv:2506.18055 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings
Jason Clarke, Yoshihiko Gotoh, Stefan Goetze
Comments: Accepted to EUSIPCO 2025. 5 pages, 1 figure. To appear in the Proceedings of the 33rd European Signal Processing Conference (EUSIPCO), September 8-12, 2025, Palermo, Italy
Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[36] arXiv:2506.19769 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects
Shulan Ruan, Rongwei Wang, Xuchen Shen, Huijie Liu, Baihui Xiao, Jun Shi, Kun Zhang, Zhenya Huang, Yu Liu, Enhong Chen, You He
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
[37] arXiv:2506.20944 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs
Van-Hoang Phan, Long-Khanh Pham, Dang Vu, Anh-Duy Tran, Minh-Son Dao
Comments: Accepted to AsiaCCS 2025 @ SCID
Subjects: Multimedia (cs.MM) ; Cryptography and Security (cs.CR)
[38] arXiv:2506.21865 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture
Haofeng Wang, Yilin Guo, Zehao Li, Tong Yue, Yizong Wang, Enci Zhang, Rongqun Lin, Feng Gao, Shiqi Wang, Siwei Ma
Comments: Accepted at the IEEE International Conference on Multimedia and Expo Workshop, 2025
Subjects: Multimedia (cs.MM) ; Computation and Language (cs.CL)
[39] arXiv:2506.23484 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity
Yuzhuo Chen, Zehua Ma, Han Fang, Weiming Zhang, Nenghai Yu
Comments: Camera-ready version for ICCV 2025. Adds GitHub link; acknowledgments; appendix. Abstract and Figure 1 updated for clarity
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV) ; Image and Video Processing (eess.IV)
[40] arXiv:2506.23707 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Efficient and Accurate Image Provenance Analysis: A Scalable Pipeline for Large-scale Images
Jiewei Lai, Lan Zhang, Chen Tang, Pengcheng Sun
Comments: 25 pages, 6 figures
Subjects: Multimedia (cs.MM)
[41] arXiv:2506.00562 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models
Yule Zhu, Ping Liu, Zhedong Zheng, Wei Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[42] arXiv:2506.00667 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis
Vasilii Korolkov
Comments: 24 pages, 8 figures. arXiv preprint only; not yet submitted to a journal
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[43] arXiv:2506.00854 (cross-list from cs.CL) [cn-pdf, pdf, html, other]
Title: EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG
Jacky Tai-Yu Lu, Jung Chiang, Chi-Sheng Chen, Anna Nai-Yun Tung, Hsiang Wei Hu, Yuan Chiao Cheng
Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Neurons and Cognition (q-bio.NC)
[44] arXiv:2506.00974 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Camera Trajectory Generation: A Comprehensive Survey of Methods, Metrics, and Future Directions
Zahra Dehghanian, Pouya Ardekhani, Amir Vahedi, Hamid Beigy, Hamid R. Rabiee
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[45] arXiv:2506.01109 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: CountingFruit: Language-Guided 3D Fruit Counting with Semantic Gaussian Splatting
Fengze Li, Yangle Liu, Jieming Ma, Hai-Ning Liang, Yaochun Shen, Huangxiang Li, Zhijing Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM)
[46] arXiv:2506.01319 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Learning Sparsity for Effective and Efficient Music Performance Question Answering
Xingjian Diao, Tianzhen Yang, Chunhui Zhang, Weiyi Wu, Ming Cheng, Jiang Gui
Comments: Accepted to the main conference of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Subjects: Sound (cs.SD) ; Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[47] arXiv:2506.01478 (cross-list from cs.LG) [cn-pdf, pdf, html, other]
Title: MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug Interactions
Tung-Lam Ngo, Ba-Hoang Tran, Duy-Cat Can, Trung-Hieu Do, Oliver Y. Chén, Hoang-Quynh Le
Subjects: Machine Learning (cs.LG) ; Computation and Language (cs.CL) ; Multimedia (cs.MM) ; Quantitative Methods (q-bio.QM)
[48] arXiv:2506.01482 (cross-list from cs.LG) [cn-pdf, pdf, html, other]
Title: Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
Zijian Zhao, Dian Jin, Zijing Zhou, Xiaoyu Zhang
Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[49] arXiv:2506.01822 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: GSCodec Studio: A Modular Framework for Gaussian Splat Compression
Sicheng Li, Chengzhen Wu, Hao Li, Xiang Gao, Yiyi Liao, Lu Yu
Comments: Repository of the project: https://github.com/JasonLSC/GSCodec_Studio
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[50] arXiv:2506.01850 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs
Wayner Barrios, Andrés Villa, Juan León Alcázar, SouYoung Jin, Bernard Ghanem
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG) ; Multimedia (cs.MM)