The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games

Mozikov, Mikhail; Severin, Nikita; Bodishtianu, Valeria; Glushanina, Maria; Baklashkin, Mikhail; Savchenko, Andrey V.; Makarov, Ilya

arXiv:2406.03299 (cs)

[Submitted on 5 Jun 2024]

Authors: Mikhail Mozikov, Nikita Severin, Valeria Bodishtianu, Maria Glushanina, Mikhail Baklashkin, Andrey V. Savchenko, Ilya Makarov

Abstract: Behavioral experiments are an important part of modeling society and understanding human interactions. In practice, many behavioral experiments face challenges with internal and external validity, reproducibility, and social bias because of the complexity of social interactions and cooperation in human user studies. Recent advances in Large Language Models (LLMs) have given researchers a promising new tool for simulating human behavior. However, existing LLM-based simulations operate under the unproven hypothesis that LLM agents behave similarly to humans, and they ignore a crucial factor in human decision-making: emotions. In this paper, we introduce a novel methodology and framework to study both the decision-making of LLMs and their alignment with human behavior under emotional states. Experiments with GPT-3.5 and GPT-4 on four games from two different classes of behavioral game theory showed that emotions profoundly impact the performance of LLMs, leading to the development of more optimal strategies. While there is a strong alignment between the behavioral responses of GPT-3.5 and human participants, particularly evident in bargaining games, GPT-4 exhibits consistent behavior, ignoring induced emotions in favor of rational decisions. Surprisingly, emotional prompting, particularly with the 'anger' emotion, can disrupt the "superhuman" alignment of GPT-4, resembling human emotional responses.

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
ACM classes:	I.2.7; J.4
Cite as:	arXiv:2406.03299 [cs.AI]
	(or arXiv:2406.03299v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2406.03299

Submission history

From: Ilya Makarov
[v1] Wed, 5 Jun 2024 14:08:54 UTC (2,837 KB)
