Multimedia

Authors and titles for July 2025

Total of 147 entries : 1-50 51-100 101-147
[1] arXiv:2507.00926 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction
Liliang Ye (1), Yunyao Zhang (1), Yafeng Wu (1), Yi-Ping Phoebe Chen (2), Junqing Yu (1), Wei Yang (1), Zikai Song (1) ((1) Huazhong University of Science and Technology, Wuhan, China, (2) La Trobe University, Melbourne, Australia)
Subjects: Multimedia (cs.MM) ; Machine Learning (cs.LG)
[2] arXiv:2507.01320 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Robust Multi-generation Learned Compression of Point Cloud Attribute
Xiangzuo Liu, Zhikai Liu, PengPeng Yu, Ruishan Huang, Fan Liang
Subjects: Multimedia (cs.MM)
[3] arXiv:2507.02080 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation
Yubeen Lee, Sangeun Lee, Chaewon Park, Junyeop Cha, Eunil Park
Comments: 9 pages, 2 figures, 2 tables
Subjects: Multimedia (cs.MM) ; Sound (cs.SD)
[4] arXiv:2507.02626 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Siran Chen, Boyu Chen, Chenyun Yu, Yuxiao Luo, Ouyang Yi, Lei Cheng, Chengxiang Zhuo, Zang Li, Yali Wang
Subjects: Multimedia (cs.MM)
[5] arXiv:2507.04758 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning
Jiayun Hu, Yueyi He, Tianyi Liang, Changbo Wang, Chenhui Li
Subjects: Multimedia (cs.MM)
[6] arXiv:2507.05113 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation
Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang
Comments: 15 pages, 9 figures, 15 tables. To appear in the Proceedings of the 32nd ACM International Conference on Multimedia (MM '25)
Subjects: Multimedia (cs.MM) ; Cryptography and Security (cs.CR) ; Machine Learning (cs.LG)
[7] arXiv:2507.07396 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li
Comments: Accepted by TNNLS
Subjects: Multimedia (cs.MM) ; Machine Learning (cs.LG) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[8] arXiv:2507.07911 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: The Potential of Olfactory Stimuli in Stress Reduction through Virtual Reality
Yasmin Elsaddik Valdivieso, Mohd Faisal, Karim Alghoul, Monireh (Monica) Vahdati, Kamran Gholizadeh Hamlabadi, Fedwa Laamarti, Hussein Al Osman, Abdulmotaleb El Saddik
Comments: Accepted to IEEE Medical Measurements & Applications (MeMeA) 2025
Subjects: Multimedia (cs.MM) ; Human-Computer Interaction (cs.HC)
[9] arXiv:2507.07938 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency
Abolfazl Zarghani, Amirhossein Ebrahimi, Amir Malekesfandiari
Subjects: Multimedia (cs.MM)
[10] arXiv:2507.08064 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning
Yibo Lyu, Rui Shao, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie
Comments: Accepted to ACM MM 2025
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[11] arXiv:2507.08104 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: VideoConviction: A Multimodal Benchmark for Human Conviction and Stock Market Recommendations
Michael Galarnyk, Veer Kejriwal, Agam Shah, Yash Bhardwaj, Nicholas Meyer, Anand Krishnan, Sudheer Chava
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV)
[12] arXiv:2507.08590 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Visual Semantic Description Generation with MLLMs for Image-Text Matching
Junyu Chen, Yihua Gao, Mingyong Li
Comments: Accepted by ICME2025 oral
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[13] arXiv:2507.09647 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
Peican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo
Comments: Accepted by ACM MM 2025
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
[14] arXiv:2507.09945 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li, Yonghao Dang, Ying Xing, Yiming Wang, Jianqin Yin
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2507.10066 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: LayLens: Improving Deepfake Understanding through Simplified Explanations
Abhijeet Narang, Parul Gupta, Liuyijia Su, Abhinav Dhall
Comments: Accepted to ACM ICMI 2025 Demos
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[16] arXiv:2507.10109 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis
Wenjie Tian, Xinfa Zhu, Haohe Liu, Zhixian Zhao, Zihao Chen, Chaofan Ding, Xinhan Di, Junjie Zheng, Lei Xie
Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[17] arXiv:2507.10859 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
Subjects: Multimedia (cs.MM) ; Computation and Language (cs.CL) ; Human-Computer Interaction (cs.HC)
[18] arXiv:2507.13415 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
Peican Zhu, Yubo Jing, Le Cheng, Bin Chen, Xiaodong Cui, Lianwei Wu, Keke Tang
Comments: Accepted by SMC 2025
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
[19] arXiv:2507.14915 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
Xiaojie Li, Ronghui Li, Shukai Fang, Shuzhao Xie, Xiaoyang Guo, Jiaqing Zhou, Junkun Peng, Zhi Wang
Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[20] arXiv:2507.15491 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval
Deyu Zhang, Tingting Long, Jinrui Zhang, Ligeng Chen, Ju Ren, Yaoxue Zhang
Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[21] arXiv:2507.15673 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Point Cloud Streaming with Latency-Driven Implicit Adaptation using MoQ
Andrew Freeman, Michael Rudolph, Amr Rizk
Subjects: Multimedia (cs.MM) ; Networking and Internet Architecture (cs.NI)
[22] arXiv:2507.16396 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Knowledge-aware Diffusion-Enhanced Multimedia Recommendation
Xian Mo, Fei Liu, Rui Tang, Jintao Gao, Hao Liu
Subjects: Multimedia (cs.MM)
[23] arXiv:2507.17232 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task
Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata
Comments: Accepted to ACM Multimedia 2025. The dataset is publicly available at: https://huggingface.co/datasets/mashi6n/nhkrecipe-100-anno-1
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
[24] arXiv:2507.17653 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: QuMAB: Query-based Multi-Annotator Behavior Modeling with Reliability under Sparse Labels
Liyun Zhang, Zheng Lian, Hong Liu, Takanori Takebe, Yuta Nakashima
Comments: 12 pages. arXiv admin note: substantial text overlap with arXiv:2503.15237
Subjects: Multimedia (cs.MM) ; Information Retrieval (cs.IR)
[25] arXiv:2507.18750 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation
Hyunwoo Oh, SeungJu Cha, Kwanyoung Lee, Si-Woo Kim, Dong-Jin Kim
Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[26] arXiv:2507.18932 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
Lei Zhang, Xin Zhou, Chaoyue He, Di Wang, Yi Wu, Hong Xu, Wei Liu, Chunyan Miao
Comments: Accepted at ACM MM 2025
Subjects: Multimedia (cs.MM) ; Computation and Language (cs.CL)
[27] arXiv:2507.19863 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion
Chia-Ming Lee, Bo-Cheng Qiu, Cheng-Jun Kang, Yi-Hsuan Wu, Jun-Lin Chen, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Chung Hsu
Comments: Accepted by ACM Multimedia 2025
Subjects: Multimedia (cs.MM)
[28] arXiv:2507.20627 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: Controllable Video-to-Music Generation with Multiple Time-Varying Conditions
Junxian Wu, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen, Lingyun Sun
Comments: Accepted by the 33rd ACM International Conference on Multimedia (ACMMM 2025). The project page is available at https://kita-wjx.github.io/MCV2M/
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[29] arXiv:2507.20738 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning
Yu Zhao, Ying Zhang, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan
Comments: Accepted by ACM MM 2025
Subjects: Multimedia (cs.MM)
[30] arXiv:2507.21395 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion
Zeyu Deng, Yanhui Lu, Jiashu Liao, Shuang Wu, Chongfeng Wei
Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[31] arXiv:2507.21557 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: PC-JND: Subjective Study and Dataset on Just Noticeable Difference for Point Clouds in 6DoF Virtual Reality
Chunling Fan, Yun Zhang, Dietmar Saupe, Raouf Hamzaoui, Weisi Lin
Comments: 13 pages, 10 figures, Journal
Subjects: Multimedia (cs.MM)
[32] arXiv:2507.21926 (cross-list from cs.MM) [cn-pdf, pdf, other]
Title: Efficient Sub-pixel Motion Compensation in Learned Video Codecs
Théo Ladune, Thomas Leguay, Pierrick Philippe, Gordon Clare, Félix Henry
Subjects: Multimedia (cs.MM) ; Image and Video Processing (eess.IV)
[33] arXiv:2507.22731 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
Quanwei Yang, Luying Huang, Kaisiyuan Wang, Jiazhi Guan, Shengyi He, Fengguo Li, Hang Zhou, Lingyun Yu, Yingying Li, Haocheng Feng, Hongtao Xie
Comments: 10 pages, 5 figures, Accepted by ICCV 2025
Subjects: Multimedia (cs.MM)
[34] arXiv:2507.23444 (cross-list from cs.MM) [cn-pdf, pdf, html, other]
Title: Hybrid CNN-Mamba Enhancement Network for Robust Multimodal Sentiment Analysis
Xiang Li, Xianfu Cheng, Xiaoming Zhang, Zhoujun Li
Subjects: Multimedia (cs.MM)
[35] arXiv:2507.00055 (cross-list from cs.LG) [cn-pdf, pdf, html, other]
Title: Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation
Varsha Pendyala, Pedro Morgado, William Sethares
Comments: Accepted at INTERSPEECH 2025
Subjects: Machine Learning (cs.LG) ; Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS) ; Image and Video Processing (eess.IV) ; Signal Processing (eess.SP)
[36] arXiv:2507.00466 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
Sebastian Murgul, Michael Heizmann
Comments: Accepted to the 22nd Sound and Music Computing Conference (SMC), 2025
Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[37] arXiv:2507.00498 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: MuteSwap: Visual-informed Silent Video Identity Conversion
Yifan Liu, Yu Fang, Zhouhan Lin
Subjects: Sound (cs.SD) ; Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[38] arXiv:2507.00950 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: MVP: Winning Solution to SMP Challenge 2025 Video Track
Liliang Ye (1), Yunyao Zhang (1), Yafeng Wu (1), Yi-Ping Phoebe Chen (2), Junqing Yu (1), Wei Yang (1), Zikai Song (1) ((1) Huazhong University of Science and Technology, Wuhan, China, (2) La Trobe University, Melbourne, Australia)
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG) ; Multimedia (cs.MM)
[39] arXiv:2507.01022 (cross-list from eess.AS) [cn-pdf, pdf, html, other]
Title: Workflow-Based Evaluation of Music Generation Systems
Shayan Dadman, Bernt Arild Bremdal, Andreas Bergsland
Comments: 54 pages, 3 figures, 6 tables, 5 appendices
Subjects: Audio and Speech Processing (eess.AS) ; Human-Computer Interaction (cs.HC) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Sound (cs.SD)
[40] arXiv:2507.01582 (cross-list from cs.SD) [cn-pdf, pdf, html, other]
Title: Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder
Jing Luo, Xinyu Yang, Jie Wei
Comments: Accepted by IEEE SMC 2025
Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[41] arXiv:2507.01652 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective
Yuxin Mao, Zhen Qin, Jinxing Zhou, Hui Deng, Xuyang Shen, Bin Fan, Jing Zhang, Yiran Zhong, Yuchao Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM)
[42] arXiv:2507.01776 (cross-list from cs.HC) [cn-pdf, pdf, html, other]
Title: Human-Machine Collaboration-Guided Space Design: Combination of Machine Learning Models and Humanistic Design Concepts
Yuxuan Yang
Subjects: Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM)
[43] arXiv:2507.01800 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision
Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng
Comments: ICANN 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[44] arXiv:2507.02000 (cross-list from cs.IR) [cn-pdf, pdf, html, other]
Title: Why Multi-Interest Fairness Matters: Hypergraph Contrastive Multi-Interest Learning for Fair Conversational Recommender System
Yongsen Zheng, Zongxuan Xie, Guohua Wang, Ziyao Liu, Liang Lin, Kwok-Yan Lam
Subjects: Information Retrieval (cs.IR) ; Computation and Language (cs.CL) ; Multimedia (cs.MM)
[45] arXiv:2507.02271 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation
Feizhen Huang, Yu Wu, Yutian Lin, Bo Du
Comments: Accepted by IJCAI 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM)
[46] arXiv:2507.02900 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions
Vineet Kumar Rakesh, Soumya Mazumdar, Research Pratim Maity, Sarbajit Pal, Amitabha Das, Tapas Samanta
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Graphics (cs.GR) ; Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM)
[47] arXiv:2507.02941 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: GameTileNet: A Semantic Dataset for Low-Resolution Game Art in Procedural Content Generation
Yi-Chun Chen, Arnav Jhala
Comments: Note: This is a preprint version of a paper submitted to AIIDE 2025. It includes additional discussion of limitations and future directions that were omitted from the conference version due to space constraints
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Multimedia (cs.MM)
[48] arXiv:2507.03286 (cross-list from cs.HC) [cn-pdf, pdf, html, other]
Title: Gaze and Glow: Exploring Editing Processes on Social Media through Interactive Exhibition
Yang Hong, Jie-Yi Feng, Yi-Chun Yao, I-Hsuan Cho, Yu-Ting Lin, Ying-Yu Chen
Comments: 6 pages, 6 figures, to be published in DIS 2025 (Provocations and Works in Progress)
Subjects: Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM)
[49] arXiv:2507.03434 (cross-list from cs.CV) [cn-pdf, pdf, html, other]
Title: Unlearning the Noisy Correspondence Makes CLIP More Robust
Haochen Han, Alex Jinpeng Wang, Peijun Ye, Fangming Liu
Comments: ICCV 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[50] arXiv:2507.03797 (cross-list from cs.HC) [cn-pdf, pdf, html, other]
Title: Assessing the Viability of Wave Field Synthesis in VR-Based Cognitive Research
Benjamin Kahl
Comments: 35 pages
Subjects: Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)