Multimedia

Authors and titles for June 2025

Total of 153 entries (entries 1-25 shown)

[1] arXiv:2506.00868 (cross-list from cs.MM): Title: Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations

Parul Gupta, Shreya Ghosh, Tom Gedeon, Thanh-Toan Do, Abhinav Dhall

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[2] arXiv:2506.01211 (cross-list from cs.MM): Title: Iola Walker: A Mobile Footfall Detection System for Music Composition

William B. James

Subjects: Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[3] arXiv:2506.01668 (cross-list from cs.MM): Title: Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach

Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[4] arXiv:2506.02380 (cross-list from cs.MM): Title: EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR

Zihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)
[5] arXiv:2506.02414 (cross-list from cs.MM): Title: StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion

Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu

Comments: 5 pages, 2 figures, Accepted by Interspeech 2025, Demo: https://thuhcsi.github.io/StarVC/

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2506.02997 (cross-list from cs.MM): Title: Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation

Yongqi Wang, Chunlei Zhang, Hangting Chen, Zhou Zhao, Dong Yu

Subjects: Multimedia (cs.MM)
[7] arXiv:2506.03530 (cross-list from cs.MM): Title: How Far Are We from Generating Missing Modalities with Foundation Models?

Guanzhou Ke, Bo Wang, Guoqing Chao, Weiming Hu, Shengfeng He

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[8] arXiv:2506.05851 (cross-list from cs.MM): Title: DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection

Marcel Klemt, Carlotta Segna, Anna Rohrbach

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2506.05987 (cross-list from cs.MM): Title: The JPEG XL Image Coding System: History, Features, Coding Tools, Design Rationale, and Future

Jon Sneyers, Jyrki Alakuijala, Luca Versari, Zoltán Szabadka, Sami Boukortt, Amnon Cohen-Tidhar, Moritz Firsching, Evgenii Kliuchnikov, Tal Lev-Ami, Eric Portis, Thomas Richter, Osamu Watanabe

Comments: 73 pages, 62 figures

Subjects: Multimedia (cs.MM)
[10] arXiv:2506.06018 (cross-list from cs.MM): Title: Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models

Chaoyi Zhu, Zaitang Li, Renyi Yang, Robert Birke, Pin-Yu Chen, Tsung-Yi Ho, Lydia Y. Chen

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[11] arXiv:2506.06037 (cross-list from cs.MM): Title: SVD: Spatial Video Dataset

M. H. Izadimehr, Milad Ghanbari, Guodong Chen, Wei Zhou, Xiaoshuai Hao, Mallesham Dasari, Christian Timmerer, Hadi Amirpour

Subjects: Multimedia (cs.MM)
[12] arXiv:2506.06691 (cross-list from cs.MM): Title: An Efficient Digital Watermarking Technique for Small Scale devices

Kaushik Talathi, Aparna Santra Biswas

Comments: 28 pages, 11 figures, 4 tables

Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR)
[13] arXiv:2506.06743 (cross-list from cs.MM): Title: The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24

Allie Tran, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Steve Hodges, Björn Þór Jónsson, Luca Rossetto, Klaus Schoeffmann, Minh-Triet Tran, Lucia Vadicamo, Cathal Gurrin

Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
[14] arXiv:2506.06938 (cross-list from cs.MM): Title: Experimental Evaluation of Static Image Sub-Region-Based Search Models Using CLIP

Bastian Jäckl, Vojtěch Kloda, Daniel A. Keim, Jakub Lokoč

Comments: 14 pages, 4 figures, 2 tables

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[15] arXiv:2506.07076 (cross-list from cs.MM): Title: Harmony-Aware Music-driven Motion Synthesis with Perceptual Constraint on UGC Datasets

Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos

Subjects: Multimedia (cs.MM)
[16] arXiv:2506.09506 (cross-list from cs.MM): Title: Dynamic Sub-region Search in Homogeneous Collections Using CLIP

Bastian Jäckl, Vojtěch Kloda, Daniel A. Keim, Jakub Lokoč

Comments: 18 pages, 4 figures, 5 tables

Subjects: Multimedia (cs.MM)
[17] arXiv:2506.09795 (cross-list from cs.MM): Title: Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment

Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon

Comments: ICME 2025

Subjects: Multimedia (cs.MM)
[18] arXiv:2506.10001 (cross-list from cs.MM): Title: Semantic Communication-Enabled Cloud-Edge-End-collaborative Metaverse Services Architecture

Yuxuan Li, Sheng Jinag, Bizhu Wang

Comments: arXiv admin note: text overlap with arXiv:2407.13764 by other authors

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[19] arXiv:2506.10002 (cross-list from cs.MM): Title: EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis

Jianwu Fang, Lei-Lei Li, Zhedong Zheng, Hongkai Yu, Jianru Xue, Zhengguo Li, Tat-Seng Chua

Comments: Accepted by IEEE-TMM

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[20] arXiv:2506.10003 (cross-list from cs.MM): Title: Integrating multimedia documents in 3D city models for a better understanding of territories

C. Gautier, J. Delanoy, G. Gesquière

Comments: 8 pages, 11 figures

Journal-ref: sprs-annals-X-4-W2-2022-69-2022

Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[21] arXiv:2506.10004 (cross-list from cs.MM): Title: Immersive Multimedia Communication: State-of-the-Art on eXtended Reality Streaming

Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik

Comments: accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI)
[22] arXiv:2506.10006 (cross-list from cs.MM): Title: HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction

Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi

Comments: 8 pages, 6 figures, 3 tables, accepted by the 33rd ACM International Conference on Multimedia (ACM MM 2025)

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[23] arXiv:2506.10007 (cross-list from cs.MM): Title: Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space

Kangwei Liu, Junwu Liu, Xiaowei Yi, Jinlin Guo, Yun Cao

Comments: Accepted by ICME2025

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[24] arXiv:2506.10008 (cross-list from cs.MM): Title: Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics

Yi-Chun Chen

Comments: This paper has been submitted to ACM Multimedia 2025 and is currently under review

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[25] arXiv:2506.10010 (cross-list from cs.MM): Title: Multimodal Emotion Coupling via Speech-to-Facial and Bodily Gestures in Dyadic Interaction

Von Ralph Dane Marquez Herbuela, Yukie Nagai

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)