Societal Alignment Frameworks Can Improve LLM Alignment

Stańczak, Karolina; Meade, Nicholas; Bhatia, Mehar; Zhou, Hattie; Böttinger, Konstantin; Barnes, Jeremy; Stanley, Jason; Montgomery, Jessica; Zemel, Richard; Papernot, Nicolas; Chapados, Nicolas; Therien, Denis; Lillicrap, Timothy P.; Marasović, Ana; Delacroix, Sylvie; Hadfield, Gillian K.; Reddy, Siva

arXiv:2503.00069 (cs)

[Submitted on 27 Feb 2025]

Abstract: Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process termed alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than attempting to perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.

Subjects:	Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2503.00069 [cs.CY]
	(or arXiv:2503.00069v1 [cs.CY] for this version)
	https://doi.org/10.48550/arXiv.2503.00069

Submission history

From: Karolina Stańczak
[v1] Thu, 27 Feb 2025 13:26:07 UTC (299 KB)
