Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?

Chi, Haoang; Li, He; Yang, Wenjing; Liu, Feng; Lan, Long; Ren, Xiaoguang; Liu, Tongliang; Han, Bo

arXiv:2506.21215 (cs)

[Submitted on 26 Jun 2025]

Abstract: Causal reasoning capability is critical for advancing large language models (LLMs) toward strong artificial intelligence. While versatile LLMs appear to understand contextual causality and to provide responses that obey the laws of causality, it remains unclear whether they perform genuine causal reasoning akin to that of humans, and current evidence indicates that they do not. Specifically, LLMs are capable only of shallow (level-1) causal reasoning, which is primarily attributable to the causal knowledge embedded in their parameters; they lack the capacity for genuine human-like (level-2) causal reasoning. To support this hypothesis, methodologically, we examine the autoregression mechanism of transformer-based LLMs and reveal that it is not inherently causal. Empirically, we introduce a new causal Q&A benchmark, CausalProbe-2024, whose corpora are fresh and nearly unseen by the studied LLMs. The LLMs exhibit a significant performance drop on CausalProbe-2024 compared with earlier benchmarks, indicating that they primarily engage in level-1 causal reasoning. To bridge the gap toward level-2 causal reasoning, we draw inspiration from the fact that human reasoning is usually facilitated by general knowledge and intended goals. We propose G^2-Reasoner, a method that incorporates general knowledge and goal-oriented prompts into LLMs' causal reasoning processes. Experiments demonstrate that G^2-Reasoner significantly enhances LLMs' causal reasoning capability, particularly in fresh and counterfactual contexts. This work sheds light on a new path for LLMs to advance toward genuine causal reasoning, going beyond level-1 and making strides toward level-2.
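As general background only (standard textbook notation, not the paper's own derivation), an autoregressive transformer LM factorizes the probability of a token sequence with the chain rule of probability, conditioning each token on the tokens that precede it in the text. The second line shows that the chain rule holds for every ordering sigma of the variables, which is one way to see why a left-to-right factorization encodes textual position rather than, by itself, a causal structure.

```latex
% Autoregressive factorization used by decoder-only LMs with parameters \theta
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad x_{<t} = (x_1, \dots, x_{t-1})

% The chain rule is exact for any permutation \sigma of \{1, \dots, T\}
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p\big(x_{\sigma(t)} \mid x_{\sigma(1)}, \dots, x_{\sigma(t-1)}\big)
```

For two variables A and B, both p(A, B) = p(A) p(B | A) and p(A, B) = p(B) p(A | B) are exact decompositions, so the direction in which a model conditions does not, on its own, determine whether A causes B or B causes A.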

Comments:	24 pages, accepted by NeurIPS 2024
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2506.21215 [cs.AI]
	(or arXiv:2506.21215v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2506.21215
Journal reference:	Advances in Neural Information Processing Systems, 2024, 37: 96640-96670

Submission history

From: He Li
[v1] Thu, 26 Jun 2025 13:11:01 UTC (1,040 KB)