Efficient Online Random Sampling via Randomness Recycling

Draper, Thomas L.; Saad, Feras A.

Computer Science > Data Structures and Algorithms

arXiv:2505.18879 (cs)

[Submitted on 24 May 2025 (v1), last revised 17 Jul 2025 (this version, v2)]

Title: Efficient Online Random Sampling via Randomness Recycling

Authors: Thomas L. Draper, Feras A. Saad

Abstract: "Randomness recycling" is a powerful algorithmic technique for reusing a fraction of the random information consumed by a probabilistic algorithm to reduce its entropy requirements. This article presents a family of randomness recycling algorithms for efficiently sampling a sequence $X_1, X_2, X_3, \dots$ of discrete random variables whose joint distribution follows an arbitrary stochastic process. We develop randomness recycling techniques to reduce the entropy cost of a variety of prominent sampling algorithms, which include uniform sampling, inverse transform sampling, lookup-table sampling, alias sampling, and discrete distribution generating (DDG) tree sampling. Our method achieves an expected amortized entropy cost of $H(X_1,\dots,X_k)/k + \varepsilon$ input bits per output sample using $O(\log(1/\varepsilon))$ space as $k\to\infty$, which is arbitrarily close to the optimal Shannon entropy rate of $H(X_1,\dots,X_k)/k$ bits per sample. The combination of space, time, and entropy properties of our method improves upon the Knuth and Yao entropy-optimal algorithm and the Han and Hoshi interval algorithm for sampling a discrete random sequence. On the empirical side, we show that randomness recycling enables state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator, and that it can also speed up discrete Gaussian samplers. Accompanying the manuscript is a performant software library in the C programming language that uses randomness recycling to accelerate several existing algorithms for random sampling.
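
To make the idea of reusing leftover random information concrete, the following is a minimal Python sketch of recycling for uniform sampling. It is an illustration written under stated assumptions, not the paper's algorithm and not the API of the accompanying C library: the class name `RecyclingUniform`, the `bit_source` callable, and the `min_range` threshold are all hypothetical. The sampler keeps a state `(z, m)` with `z` uniform on `{0, ..., m-1}`; each draw splits `z` into an output and a smaller uniform state that is kept for the next draw instead of being thrown away.

```python
import random


class RecyclingUniform:
    """Hypothetical uniform sampler on {0, ..., n-1} that recycles randomness.

    Invariant: self.z is uniformly distributed on {0, ..., self.m - 1}.
    """

    def __init__(self, bit_source, min_range=1 << 32):
        self.bit_source = bit_source  # callable returning one fair bit (0 or 1)
        self.min_range = min_range    # assumed threshold: refill until m >= this
        self.z, self.m = 0, 1         # empty state: no stored randomness yet
        self.bits_used = 0            # number of fresh input bits consumed

    def sample(self, n):
        if not 1 <= n <= self.min_range:
            raise ValueError("n must satisfy 1 <= n <= min_range")
        while True:
            # Refill: each fresh bit doubles the range of the uniform state.
            while self.m < self.min_range:
                self.z = 2 * self.z + self.bit_source()
                self.m *= 2
                self.bits_used += 1
            q, r = divmod(self.m, n)
            if self.z < q * n:
                # Accept: z is uniform on [0, q*n), so (z // n, z % n) are
                # independent and uniform on [0, q) and [0, n) respectively.
                x = self.z % n
                self.z, self.m = self.z // n, q  # recycle the quotient
                return x
            # Reject: z - q*n is uniform on [0, r); recycle it and retry.
            self.z, self.m = self.z - q * n, r


# Usage: roll a six-sided die many times from a seeded fair-bit source.
rng = random.Random(0)
sampler = RecyclingUniform(lambda: rng.getrandbits(1))
rolls = [sampler.sample(6) for _ in range(10_000)]
bits_per_roll = sampler.bits_used / len(rolls)
```

In this sketch the amortized cost `bits_per_roll` should land close to the entropy $\log_2 6 \approx 2.585$ bits per roll. For comparison, plain rejection sampling that draws 3 fresh bits per attempt and accepts with probability $6/8$ uses 4 bits per roll in expectation.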

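Comments:	35 pages, 9 figures, 2 tables, 14 algorithms
Subjects:	Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Probability (math.PR); Computation (stat.CO)
Cite as:	arXiv:2505.18879 [cs.DS]
	(or arXiv:2505.18879v2 [cs.DS] for this version)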
	https://doi.org/10.48550/arXiv.2505.18879

Submission history

From: Feras Saad
[v1] Sat, 24 May 2025 21:34:08 UTC (229 KB)
[v2] Thu, 17 Jul 2025 18:39:50 UTC (7,500 KB)
