Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

Jia, Jingru; Yuan, Zehua; Pan, Junhao; McNamara, Paul E.; Chen, Deming


arXiv:2406.05972 (cs)

[Submitted on 10 Jun 2024 (v1), last revised 1 Nov 2024 (this version, v2)]

Title: Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

Authors:Jingru Jia, Zehua Yuan, Junhao Pan, Paul E. McNamara, Deming Chen

Abstract: When making decisions under uncertainty, individuals often deviate from rational behavior, which can be evaluated across three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Several empirical studies have investigated the rationality and social behavior performance of LLMs, yet their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of LLMs. Through a multiple-choice-list experiment, we estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities. However, there are significant variations in the degree to which these behaviors are expressed across different LLMs. We also explore their behavior when embedded with socio-demographic features, uncovering significant disparities. For instance, when modeled with attributes of sexual minority groups or physical disabilities, Claude-3-Opus displays increased risk aversion, leading to more conservative choices. These findings underscore the need for careful consideration of the ethical implications and potential biases in deploying LLMs in decision-making scenarios. Therefore, this study advocates for developing standards and guidelines to ensure that LLMs operate within ethical boundaries while enhancing their utility in complex decision-making environments.

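Comments:	Jingru Jia and Zehua Yuan contributed equally
Subjects:	Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
Cite as:	arXiv:2406.05972 [cs.AI]
	(or arXiv:2406.05972v2 [cs.AI] for this version)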
	https://doi.org/10.48550/arXiv.2406.05972

Submission history

From: Jingru Jia
[v1] Mon, 10 Jun 2024 02:14:19 UTC (1,211 KB)
[v2] Fri, 1 Nov 2024 00:50:56 UTC (2,187 KB)
