Multimedia

Authors and titles for July 2025

Total of 147 entries (entries 1-50 listed here)

[1] arXiv:2507.00926

Title: HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction

Liliang Ye (1), Yunyao Zhang (1), Yafeng Wu (1), Yi-Ping Phoebe Chen (2), Junqing Yu (1), Wei Yang (1), Zikai Song (1) ((1) Huazhong University of Science and Technology, Wuhan, China, (2) La Trobe University, Melbourne, Australia)

Subjects: Multimedia (cs.MM) ; Machine Learning (cs.LG)
[2] arXiv:2507.01320

Title: Robust Multi-generation Learned Compression of Point Cloud Attribute

Xiangzuo Liu, Zhikai Liu, PengPeng Yu, Ruishan Huang, Fan Liang

Subjects: Multimedia (cs.MM)
[3] arXiv:2507.02080

Title: TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation

Yubeen Lee, Sangeun Lee, Chaewon Park, Junyeop Cha, Eunil Park

Comments: 9 pages, 2 figures, 2 tables

Subjects: Multimedia (cs.MM) ; Sound (cs.SD)
[4] arXiv:2507.02626

Title: VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning

Siran Chen, Boyu Chen, Chenyun Yu, Yuxiao Luo, Ouyang Yi, Lei Cheng, Chengxiang Zhuo, Zang Li, Yali Wang

Subjects: Multimedia (cs.MM)
[5] arXiv:2507.04758

Title: Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning

Jiayun Hu, Yueyi He, Tianyi Liang, Changbo Wang, Chenhui Li

Subjects: Multimedia (cs.MM)
[6] arXiv:2507.05113

Title: CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

Comments: 15 pages, 9 figures, 15 tables. To appear in the Proceedings of the 33rd ACM International Conference on Multimedia (MM '25)

Subjects: Multimedia (cs.MM) ; Cryptography and Security (cs.CR) ; Machine Learning (cs.LG)
[7] arXiv:2507.07396

Title: IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing

Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li

Comments: Accepted by TNNLS

Subjects: Multimedia (cs.MM) ; Machine Learning (cs.LG) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[8] arXiv:2507.07911

Title: The Potential of Olfactory Stimuli in Stress Reduction through Virtual Reality

Yasmin Elsaddik Valdivieso, Mohd Faisal, Karim Alghoul, Monireh (Monica) Vahdati, Kamran Gholizadeh Hamlabadi, Fedwa Laamarti, Hussein Al Osman, Abdulmotaleb El Saddik

Comments: Accepted to IEEE Medical Measurements & Applications (MeMeA) 2025

Subjects: Multimedia (cs.MM) ; Human-Computer Interaction (cs.HC)
[9] arXiv:2507.07938

Title: Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency

Abolfazl Zarghani, Amirhossein Ebrahimi, Amir Malekesfandiari

Subjects: Multimedia (cs.MM)
[10] arXiv:2507.08064

Title: PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning

Yibo Lyu, Rui Shao, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie

Comments: Accepted to ACM MM 2025

Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[11] arXiv:2507.08104

Title: VideoConviction: A Multimodal Benchmark for Human Conviction and Stock Market Recommendations

Michael Galarnyk, Veer Kejriwal, Agam Shah, Yash Bhardwaj, Nicholas Meyer, Anand Krishnan, Sudheer Chava

Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV)
[12] arXiv:2507.08590

Title: Visual Semantic Description Generation with MLLMs for Image-Text Matching

Junyu Chen, Yihua Gao, Mingyong Li

Comments: Accepted by ICME 2025 (oral)

Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[13] arXiv:2507.09647

Title: KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection

Peican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo

Comments: Accepted by ACM MM 2025

Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
[14] arXiv:2507.09945

Title: ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization

Huilai Li, Yonghao Dang, Ying Xing, Yiming Wang, Jianqin Yin

Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2507.10066

Title: LayLens: Improving Deepfake Understanding through Simplified Explanations

Abhijeet Narang, Parul Gupta, Liuyijia Su, Abhinav Dhall

Comments: Accepted to ACM ICMI 2025 Demos

Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[16] arXiv:2507.10109

Title: DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis

Wenjie Tian, Xinfa Zhu, Haohe Liu, Zhixian Zhao, Zihao Chen, Chaofan Ding, Xinhan Di, Junjie Zheng, Lei Xie

Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[17] arXiv:2507.10859

Title: MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions

Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha

Subjects: Multimedia (cs.MM) ; Computation and Language (cs.CL) ; Human-Computer Interaction (cs.HC)
[18] arXiv:2507.13415

Title: SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection

Peican Zhu, Yubo Jing, Le Cheng, Bin Chen, Xiaodong Cui, Lianwei Wu, Keke Tang

Comments: Accepted by SMC 2025

Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
[19] arXiv:2507.14915

Title: Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling

Xiaojie Li, Ronghui Li, Shukai Fang, Shuzhao Xie, Xiaoyang Guo, Jiaqing Zhou, Junkun Peng, Zhi Wang

Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[20] arXiv:2507.15491

Title: Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval

Deyu Zhang, Tingting Long, Jinrui Zhang, Ligeng Chen, Ju Ren, Yaoxue Zhang

Subjects: Multimedia (cs.MM) ; Computer Vision and Pattern Recognition (cs.CV)
[21] arXiv:2507.15673

Title: Point Cloud Streaming with Latency-Driven Implicit Adaptation using MoQ

Andrew Freeman, Michael Rudolph, Amr Rizk

Subjects: Multimedia (cs.MM) ; Networking and Internet Architecture (cs.NI)
[22] arXiv:2507.16396

Title: Knowledge-aware Diffusion-Enhanced Multimedia Recommendation

Xian Mo, Fei Liu, Rui Tang, Jintao Gao, Hao Liu

Subjects: Multimedia (cs.MM)
[23] arXiv:2507.17232

Title: A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task

Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata

Comments: Accepted to ACM Multimedia 2025. The dataset is publicly available at: https://huggingface.co/datasets/mashi6n/nhkrecipe-100-anno-1

Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
[24] arXiv:2507.17653

Title: QuMAB: Query-based Multi-Annotator Behavior Modeling with Reliability under Sparse Labels

Liyun Zhang, Zheng Lian, Hong Liu, Takanori Takebe, Yuta Nakashima

Comments: 12 pages. arXiv admin note: substantial text overlap with arXiv:2503.15237

Subjects: Multimedia (cs.MM) ; Information Retrieval (cs.IR)
[25] arXiv:2507.18750

Title: CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation

Hyunwoo Oh, SeungJu Cha, Kwanyoung Lee, Si-Woo Kim, Dong-Jin Kim

Subjects: Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[26] arXiv:2507.18932

Title: MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks

Lei Zhang, Xin Zhou, Chaoyue He, Di Wang, Yi Wu, Hong Xu, Wei Liu, Chunyan Miao

Comments: Accepted at ACM MM 2025

Subjects: Multimedia (cs.MM) ; Computation and Language (cs.CL)
[27] arXiv:2507.19863

Title: Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion

Chia-Ming Lee, Bo-Cheng Qiu, Cheng-Jun Kang, Yi-Hsuan Wu, Jun-Lin Chen, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Chung Hsu

Comments: Accepted by ACM Multimedia 2025

Subjects: Multimedia (cs.MM)
[28] arXiv:2507.20627

Title: Controllable Video-to-Music Generation with Multiple Time-Varying Conditions

Junxian Wu, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen, Lingyun Sun

Comments: Accepted by the 33rd ACM International Conference on Multimedia (ACMMM 2025). The project page is available at https://kita-wjx.github.io/MCV2M/

Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[29] arXiv:2507.20738

Title: Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning

Yu Zhao, Ying Zhang, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan

Comments: Accepted by ACM MM 2025

Subjects: Multimedia (cs.MM)
[30] arXiv:2507.21395

Title: Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion

Zeyu Deng, Yanhui Lu, Jiashu Liao, Shuang Wu, Chongfeng Wei

Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
[31] arXiv:2507.21557

Title: PC-JND: Subjective Study and Dataset on Just Noticeable Difference for Point Clouds in 6DoF Virtual Reality

Chunling Fan, Yun Zhang, Dietmar Saupe, Raouf Hamzaoui, Weisi Lin

Comments: 13 pages, 10 figures, Journal

Subjects: Multimedia (cs.MM)
[32] arXiv:2507.21926

Title: Efficient Sub-pixel Motion Compensation in Learned Video Codecs

Théo Ladune, Thomas Leguay, Pierrick Philippe, Gordon Clare, Félix Henry

Subjects: Multimedia (cs.MM) ; Image and Video Processing (eess.IV)
[33] arXiv:2507.22731

Title: GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation

Quanwei Yang, Luying Huang, Kaisiyuan Wang, Jiazhi Guan, Shengyi He, Fengguo Li, Hang Zhou, Lingyun Yu, Yingying Li, Haocheng Feng, Hongtao Xie

Comments: 10 pages, 5 figures, Accepted by ICCV 2025

Subjects: Multimedia (cs.MM)
[34] arXiv:2507.23444

Title: Hybrid CNN-Mamba Enhancement Network for Robust Multimodal Sentiment Analysis

Xiang Li, Xianfu Cheng, Xiaoming Zhang, Zhoujun Li

Subjects: Multimedia (cs.MM)
[35] arXiv:2507.00055 (cross-list from cs.LG)

Title: Leveraging Unlabeled Audio-Visual Data in Speech Emotion Recognition using Knowledge Distillation

Varsha Pendyala, Pedro Morgado, William Sethares

Comments: Accepted at INTERSPEECH 2025

Subjects: Machine Learning (cs.LG) ; Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS) ; Image and Video Processing (eess.IV) ; Signal Processing (eess.SP)
[36] arXiv:2507.00466 (cross-list from cs.SD)

Title: Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture

Sebastian Murgul, Michael Heizmann

Comments: Accepted to the 22nd Sound and Music Computing Conference (SMC), 2025

Subjects: Sound (cs.SD) ; Computation and Language (cs.CL) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[37] arXiv:2507.00498 (cross-list from cs.SD)

Title: MuteSwap: Visual-informed Silent Video Identity Conversion

Yifan Liu, Yu Fang, Zhouhan Lin

Subjects: Sound (cs.SD) ; Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[38] arXiv:2507.00950 (cross-list from cs.CV)

Title: MVP: Winning Solution to SMP Challenge 2025 Video Track

Liliang Ye (1), Yunyao Zhang (1), Yafeng Wu (1), Yi-Ping Phoebe Chen (2), Junqing Yu (1), Wei Yang (1), Zikai Song (1) ((1) Huazhong University of Science and Technology, Wuhan, China, (2) La Trobe University, Melbourne, Australia)

Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Machine Learning (cs.LG) ; Multimedia (cs.MM)
[39] arXiv:2507.01022 (cross-list from eess.AS)

Title: Workflow-Based Evaluation of Music Generation Systems

Shayan Dadman, Bernt Arild Bremdal, Andreas Bergsland

Comments: 54 pages, 3 figures, 6 tables, 5 appendices

Subjects: Audio and Speech Processing (eess.AS) ; Human-Computer Interaction (cs.HC) ; Machine Learning (cs.LG) ; Multimedia (cs.MM) ; Sound (cs.SD)
[40] arXiv:2507.01582 (cross-list from cs.SD)

Title: Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder

Jing Luo, Xinyu Yang, Jie Wei

Comments: Accepted by IEEE SMC 2025

Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM) ; Audio and Speech Processing (eess.AS)
[41] arXiv:2507.01652 (cross-list from cs.CV)

Title: Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective

Yuxin Mao, Zhen Qin, Jinxing Zhou, Hui Deng, Xuyang Shen, Bin Fan, Jing Zhang, Yiran Zhong, Yuchao Dai

Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM)
[42] arXiv:2507.01776 (cross-list from cs.HC)

Title: Human-Machine Collaboration-Guided Space Design: Combination of Machine Learning Models and Humanistic Design Concepts

Yuxuan Yang

Subjects: Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM)
[43] arXiv:2507.01800 (cross-list from cs.CV)

Title: HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision

Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng

Comments: ICANN 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[44] arXiv:2507.02000 (cross-list from cs.IR)

Title: Why Multi-Interest Fairness Matters: Hypergraph Contrastive Multi-Interest Learning for Fair Conversational Recommender System

Yongsen Zheng, Zongxuan Xie, Guohua Wang, Ziyao Liu, Liang Lin, Kwok-Yan Lam

Subjects: Information Retrieval (cs.IR) ; Computation and Language (cs.CL) ; Multimedia (cs.MM)
[45] arXiv:2507.02271 (cross-list from cs.CV)

Title: Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation

Feizhen Huang, Yu Wu, Yutian Lin, Bo Du

Comments: Accepted by IJCAI 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Multimedia (cs.MM)
[46] arXiv:2507.02900 (cross-list from cs.CV)

Title: Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions

Vineet Kumar Rakesh, Soumya Mazumdar, Research Pratim Maity, Sarbajit Pal, Amitabha Das, Tapas Samanta

Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Graphics (cs.GR) ; Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM)
[47] arXiv:2507.02941 (cross-list from cs.CV)

Title: GameTileNet: A Semantic Dataset for Low-Resolution Game Art in Procedural Content Generation

Yi-Chun Chen, Arnav Jhala

Comments: Note: This is a preprint version of a paper submitted to AIIDE 2025. It includes additional discussion of limitations and future directions that were omitted from the conference version due to space constraints

Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL) ; Multimedia (cs.MM)
[48] arXiv:2507.03286 (cross-list from cs.HC)

Title: Gaze and Glow: Exploring Editing Processes on Social Media through Interactive Exhibition

Yang Hong, Jie-Yi Feng, Yi-Chun Yao, I-Hsuan Cho, Yu-Ting Lin, Ying-Yu Chen

Comments: 6 pages, 6 figures, to be published in DIS 2025 (Provocations and Works in Progress)

Subjects: Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM)
[49] arXiv:2507.03434 (cross-list from cs.CV)

Title: Unlearning the Noisy Correspondence Makes CLIP More Robust

Haochen Han, Alex Jinpeng Wang, Peijun Ye, Fangming Liu

Comments: ICCV 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Multimedia (cs.MM)
[50] arXiv:2507.03797 (cross-list from cs.HC)

Title: Assessing the Viability of Wave Field Synthesis in VR-Based Cognitive Research

Benjamin Kahl

Comments: 35 pages

Subjects: Human-Computer Interaction (cs.HC) ; Multimedia (cs.MM) ; Sound (cs.SD) ; Audio and Speech Processing (eess.AS)
