TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection

Yan, Zehong; Qi, Peng; Hsu, Wynne; Lee, Mong Li

arXiv:2509.04448v1 (cs)

[Submitted on 4 Sep 2025]

Authors: Zehong Yan, Peng Qi, Wynne Hsu, Mong Li Lee

Abstract: Multimodal misinformation, encompassing textual, visual, and cross-modal distortions, poses an increasing societal threat that is amplified by generative AI. Existing methods typically focus on a single type of distortion and struggle to generalize to unseen scenarios. In this work, we observe that different distortion types share common reasoning capabilities while also requiring task-specific skills. We hypothesize that joint training across distortion types facilitates knowledge sharing and enhances the model's ability to generalize. To this end, we introduce TRUST-VL, a unified and explainable vision-language model for general multimodal misinformation detection. TRUST-VL incorporates a novel Question-Aware Visual Amplifier module, designed to extract task-specific visual features. To support training, we also construct TRUST-Instruct, a large-scale instruction dataset containing 198K samples featuring structured reasoning chains aligned with human fact-checking workflows. Extensive experiments on both in-domain and zero-shot benchmarks demonstrate that TRUST-VL achieves state-of-the-art performance, while also offering strong generalization and interpretability.

Comments:	EMNLP 2025; Project page: https://yanzehong.github.io/trust-vl/
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2509.04448 [cs.CV]
	(or arXiv:2509.04448v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2509.04448

Submission history

From: Zehong Yan
[v1] Thu, 4 Sep 2025 17:59:43 UTC (15,340 KB)
