Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Heitzig, Jobst; Potham, Ram

arXiv:2508.00159 (cs)

[Submitted on 31 Jul 2025 (v1), last revised 4 Aug 2025 (this version, v2)]

Title: Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Authors: Jobst Heitzig, Ram Potham

Abstract: Power is a key concept in AI safety: power-seeking as an instrumental goal, sudden or gradual disempowerment of humans, power balance in human-AI interaction and international AI governance. At the same time, power as the ability to pursue diverse goals is essential for wellbeing. This paper explores the idea of promoting both safety and wellbeing by forcing AI agents explicitly to empower humans and to manage the power balance between humans and AI agents in a desirable way. Using a principled, partially axiomatic approach, we design a parametrizable and decomposable objective function that represents an inequality- and risk-averse long-term aggregate of human power. It takes into account humans' bounded rationality and social norms, and, crucially, considers a wide variety of possible human goals. We derive algorithms for computing that metric by backward induction or approximating it via a form of multi-agent reinforcement learning from a given world model. We exemplify the consequences of (softly) maximizing this metric in a variety of paradigmatic situations and describe what instrumental sub-goals it will likely imply. Our cautious assessment is that softly maximizing suitable aggregate metrics of human power might constitute a beneficial objective for agentic AI systems that is safer than direct utility-based objectives.

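Subjects:	Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
MSC classes:	68Txx
ACM classes:	I.2
Cite as:	arXiv:2508.00159 [cs.AI]
	(or arXiv:2508.00159v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2508.00159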

Submission history

From: Jobst Heitzig
[v1] Thu, 31 Jul 2025 20:56:43 UTC (1,867 KB)
[v2] Mon, 4 Aug 2025 21:59:37 UTC (1,868 KB)
