Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective

Loghmani, Erfan

arXiv:2506.00152 (cs)

[Submitted on 30 May 2025]

Title: Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective

Authors: Erfan Loghmani

Abstract: Large language models are being widely used across industries to generate content that contributes directly to key performance metrics, such as conversion rates. Pretrained models, however, often fall short when it comes to aligning with human preferences or optimizing for business objectives. As a result, fine-tuning with good-quality labeled data is essential to guide models to generate content that achieves better results. Controlled experiments, like A/B tests, can provide such data, but they are often expensive and come with significant engineering and logistical challenges. Meanwhile, companies have access to a vast amount of historical (observational) data that remains underutilized. In this work, we study the challenges and opportunities of fine-tuning LLMs using observational data. We show that while observational outcomes can provide valuable supervision, directly fine-tuning models on such data can lead them to learn spurious correlations. We present empirical evidence of this issue using various real-world datasets and propose DeconfoundLM, a method that explicitly removes the effect of known confounders from reward signals. Using simulation experiments, we demonstrate that DeconfoundLM improves the recovery of causal relationships and mitigates failure modes found in fine-tuning methods that ignore or naively incorporate confounding variables. Our findings highlight that while observational data presents risks, with the right causal corrections, it can be a powerful source of signal for LLM alignment. Please refer to the project page for code and related resources.
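The abstract's central idea, removing the effect of a known confounder from a reward signal, can be illustrated with a toy simulation. The sketch below is not the paper's DeconfoundLM implementation: it assumes a linear outcome model, a single binary text feature, and one observed confounder, and every name and constant in it (`TRUE_EFFECT`, `CONF_EFFECT`, `solve_ols`, `mean_gap`) is hypothetical. It shows how a raw observational outcome can reward a feature that actually hurts, and how subtracting the estimated confounder contribution restores the sign of the effect.

```python
import random

random.seed(0)
TRUE_EFFECT = -0.5  # assumed causal effect of the text feature on the outcome
CONF_EFFECT = 2.0   # assumed effect of the known confounder on the outcome


def solve_ols(rows, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]


# Simulated observational logs: the feature is used more often when the
# confounder is high, so the two are correlated without any randomization.
data = []
for _ in range(5000):
    z = random.gauss(0, 1)                          # known confounder
    x = 1.0 if z + random.gauss(0, 1) > 0 else 0.0  # text feature on/off
    y = TRUE_EFFECT * x + CONF_EFFECT * z + random.gauss(0, 1)
    data.append((x, z, y))


def mean_gap(rewards):
    """Mean reward with the feature minus mean reward without it."""
    on = [r for (x, _, _), r in zip(data, rewards) if x == 1.0]
    off = [r for (x, _, _), r in zip(data, rewards) if x == 0.0]
    return sum(on) / len(on) - sum(off) / len(off)


# Naive reward: the raw observed outcome.
raw_rewards = [y for _, _, y in data]
naive_gap = mean_gap(raw_rewards)

# Adjusted reward: fit y ~ 1 + x + z, then subtract the confounder's share.
intercept, beta_x, beta_z = solve_ols(
    [[1.0, x, z] for x, z, _ in data], raw_rewards
)
deconf_rewards = [y - beta_z * z for _, z, y in data]
deconf_gap = mean_gap(deconf_rewards)

print(f"naive reward gap:        {naive_gap:+.2f}")
print(f"deconfounded reward gap: {deconf_gap:+.2f}")
print(f"estimated feature effect: {beta_x:+.2f}")
```

In this setup the naive gap comes out positive even though the assumed true effect is negative, so a model fine-tuned on the raw outcome would be pushed toward the harmful feature. The adjusted reward only helps when the confounder is observed and the assumed outcome model is roughly right.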

Comments:	10+12 pages, 8 figures
Subjects:	Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
ACM classes:	I.2.6; I.2.7; H.4.0; J.4
Cite as:	arXiv:2506.00152 [cs.LG]
 	(or arXiv:2506.00152v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.00152

Submission history

From: Erfan Loghmani
[v1] Fri, 30 May 2025 18:44:09 UTC (190 KB)
