The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters

Zhou, Chulun; Wang, Qiujing; Yu, Mo; Yue, Xiaoqian; Lu, Rui; Li, Jiangnan; Zhou, Yifan; Zhang, Shunchi; Zhou, Jie; Lam, Wai

arXiv:2501.01705v2 (cs)

[Submitted on 3 Jan 2025 (v1), last revised 9 Apr 2025 (this version, v2)]

Abstract: Theory-of-Mind (ToM) is a fundamental psychological capability that allows humans to understand and interpret the mental states of others. Humans infer others' thoughts by integrating causal cues and indirect clues from broad contextual information, often derived from past interactions. In other words, human ToM relies heavily on understanding the backgrounds and life stories of others. Unfortunately, this aspect is largely overlooked in existing benchmarks for evaluating machines' ToM capabilities, because they use short narratives that lack global context, especially the personal backgrounds of characters. In this paper, we verify the importance of comprehensive contextual understanding of personal backgrounds in ToM and assess the performance of large language models (LLMs) in such complex scenarios. To achieve this, we introduce the CharToM benchmark, comprising 1,035 ToM questions based on characters from classic novels. Our human study reveals a significant disparity in performance: the same group of educated participants performs dramatically better when they have read the novels than when they have not. In parallel, our experiments on state-of-the-art LLMs, including the very recent o1 and DeepSeek-R1 models, show that LLMs still perform notably worse than humans, despite having seen these stories during pre-training. This highlights the limitations of current LLMs in capturing the nuanced contextual information required for ToM reasoning.

Comments:	20 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.01705 [cs.CL]
	(or arXiv:2501.01705v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.01705

Submission history

From: Chulun Zhou
[v1] Fri, 3 Jan 2025 09:04:45 UTC (583 KB)
[v2] Wed, 9 Apr 2025 08:36:10 UTC (745 KB)
