Soft Best-of-n Sampling for Model Alignment

Verdun, Claudio Mayrink; Oesterling, Alex; Lakkaraju, Himabindu; Calmon, Flavio P.

arXiv:2505.03156 (cs)

[Submitted on 6 May 2025]

Abstract: Best-of-$n$ (BoN) sampling is a practical approach for aligning language model outputs with human preferences without expensive fine-tuning. BoN sampling is performed by generating $n$ responses to a prompt and then selecting the sample that maximizes a reward function. BoN yields high reward values in practice at a distortion cost, as measured by the KL-divergence between the sampled and original distribution. This distortion is coarsely controlled by varying the number of samples: larger $n$ yields a higher reward at a higher distortion cost. We introduce Soft Best-of-$n$ sampling, a generalization of BoN that allows for smooth interpolation between the original distribution and reward-maximizing distribution through a temperature parameter $\lambda$. We establish theoretical guarantees showing that Soft Best-of-$n$ sampling converges sharply to the optimal tilted distribution at a rate of $O(1/n)$ in KL and the expected (relative) reward. For sequences of discrete outputs, we analyze an additive reward model that reveals the fundamental limitations of blockwise sampling.
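The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the selection rule, so this sketch assumes one common reading: draw $n$ candidates from the base model, then pick one with probability proportional to $\exp(r(x_i)/\lambda)$. Under that assumption, a very small $\lambda$ behaves like standard BoN (the highest-reward candidate is chosen) and a very large $\lambda$ behaves like sampling from the original distribution. The names `soft_best_of_n`, `sample_fn`, `reward_fn`, and `lam` are hypothetical and are not taken from the paper.

```python
import math
import random


def soft_best_of_n(sample_fn, reward_fn, n, lam, rng=random):
    """Draw n candidates and return one chosen with probability
    proportional to exp(reward / lam).

    Assumed selection rule (not confirmed by the abstract):
    softmax over candidate rewards with temperature lam.
    """
    if n < 1 or lam <= 0:
        raise ValueError("n must be >= 1 and lam must be > 0")

    # Step 1: generate n responses from the base (original) distribution.
    candidates = [sample_fn() for _ in range(n)]

    # Step 2: score each response with the reward function.
    rewards = [reward_fn(c) for c in candidates]

    # Step 3: softmax weights; subtract the max reward for numerical stability.
    top = max(rewards)
    weights = [math.exp((r - top) / lam) for r in rewards]

    # Step 4: sample one candidate according to those weights.
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Toy base distribution: integers 0..9, with reward equal to the integer.
    rng = random.Random(0)
    base = lambda: rng.randint(0, 9)
    reward = lambda x: float(x)

    for lam in (0.01, 1.0, 100.0):
        draws = [soft_best_of_n(base, reward, n=8, lam=lam, rng=rng)
                 for _ in range(2000)]
        print(f"lam={lam}: mean reward = {sum(draws) / len(draws):.2f}")
```

In this toy setup the mean reward should fall as `lam` grows, which mirrors the trade-off in the abstract: lower temperature buys more reward at the cost of moving further from the base distribution.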

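Comments:	Accepted for presentation at the 2025 IEEE International Symposium on Information Theory (ISIT 2025)
Subjects:	Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2505.03156 [cs.IT]
	(or arXiv:2505.03156v1 [cs.IT] for this version)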
	https://doi.org/10.48550/arXiv.2505.03156

Submission history

From: Alex Oesterling
[v1] Tue, 6 May 2025 04:03:11 UTC (107 KB)
