Computer Science > Social and Information Networks

arXiv:2509.08803 (cs)
[Submitted on 10 Sep 2025]

Title: Scaling Truth: The Confidence Paradox in AI Fact-Checking

Authors: Ihsan A. Qazi, Zohaib Khan, Abdullah Ghani, Agha A. Raza, Zafar A. Qazi, Wassay Sajjad, Ayesha Ali, Asher Javaid, Muhammad Abdullah Sohail, Abdul H. Azeemi
Abstract: The rise of misinformation underscores the need for scalable and reliable fact-checking solutions. Large language models (LLMs) hold promise in automating fact verification, yet their effectiveness across global contexts remains uncertain. We systematically evaluate nine established LLMs across multiple categories (open/closed-source, multiple sizes, diverse architectures, reasoning-based) using 5,000 claims previously assessed by 174 professional fact-checking organizations across 47 languages. Our methodology tests model generalizability on claims postdating training cutoffs and four prompting strategies mirroring both citizen and professional fact-checker interactions, with over 240,000 human annotations as ground truth. Findings reveal a concerning pattern resembling the Dunning-Kruger effect: smaller, accessible models show high confidence despite lower accuracy, while larger models demonstrate higher accuracy but lower confidence. This risks systemic bias in information verification, as resource-constrained organizations typically use smaller models. Performance gaps are most pronounced for non-English languages and claims originating from the Global South, threatening to widen existing information inequalities. These results establish a multilingual benchmark for future research and provide an evidence base for policy aimed at ensuring equitable access to trustworthy, AI-assisted fact-checking.
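The abstract's central finding rests on comparing each model's self-reported confidence with its accuracy against professional fact-checker verdicts. Below is a minimal, hypothetical Python sketch of that kind of per-model confidence-accuracy comparison; the record fields (model, prediction, verdict, confidence) and the function name are illustrative assumptions, not the paper's actual pipeline or data schema.

# Hypothetical sketch: group predictions by model, compute accuracy against
# ground-truth verdicts and mean self-reported confidence, and report the gap.
# A positive "overconfidence" value means confidence exceeds accuracy.
from collections import defaultdict

def confidence_accuracy_gap(records):
    """records: iterable of dicts with keys 'model', 'prediction',
    'verdict' (ground-truth label), and 'confidence' in [0, 1]."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "conf_sum": 0.0})
    for r in records:
        s = stats[r["model"]]
        s["n"] += 1
        s["correct"] += int(r["prediction"] == r["verdict"])
        s["conf_sum"] += r["confidence"]
    out = {}
    for model, s in stats.items():
        accuracy = s["correct"] / s["n"]
        mean_conf = s["conf_sum"] / s["n"]
        out[model] = {
            "accuracy": accuracy,
            "mean_confidence": mean_conf,
            "overconfidence": mean_conf - accuracy,
        }
    return out

if __name__ == "__main__":
    # Toy data illustrating the reported pattern: a smaller model that is
    # confident but often wrong, and a larger model that is accurate but cautious.
    demo = [
        {"model": "small-llm", "prediction": "false", "verdict": "true",  "confidence": 0.95},
        {"model": "small-llm", "prediction": "true",  "verdict": "true",  "confidence": 0.90},
        {"model": "large-llm", "prediction": "true",  "verdict": "true",  "confidence": 0.60},
        {"model": "large-llm", "prediction": "false", "verdict": "false", "confidence": 0.55},
    ]
    for model, m in confidence_accuracy_gap(demo).items():
        print(model, m)

On the toy data, the small model shows 0.5 accuracy with mean confidence 0.925 (overconfident), while the large model shows 1.0 accuracy with mean confidence 0.575 (underconfident), mirroring the Dunning-Kruger-like pattern the abstract describes.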
Comments: 65 pages, 26 figures, 6 tables
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as: arXiv:2509.08803 [cs.SI]
  (or arXiv:2509.08803v1 [cs.SI] for this version)
  https://doi.org/10.48550/arXiv.2509.08803
arXiv-issued DOI via DataCite

Submission history

From: Abdullah Ghani
[v1] Wed, 10 Sep 2025 17:36:25 UTC (11,248 KB)
Full-text links:

Access Paper:

  • View Chinese PDF
  • View PDF
  • HTML (experimental)
  • TeX Source
  • Other Formats
View license