Approximate Top-$k$ for Increased Parallelism

Key, Oscar; Ribar, Luka; Cattaneo, Alberto; Hudlass-Galley, Luke; Orr, Douglas

arXiv:2412.04358 (cs)

[Submitted on 5 Dec 2024]

Authors: Oscar Key, Luka Ribar, Alberto Cattaneo, Luke Hudlass-Galley, Douglas Orr

Abstract: We present an evaluation of bucketed approximate top-$k$ algorithms. Computing top-$k$ exactly suffers from limited parallelism, because the $k$ largest values must be aggregated along the vector, and is thus not well suited to computation on highly parallel machine learning accelerators. By relaxing the requirement that the top-$k$ is exact, bucketed algorithms can dramatically increase the parallelism available by independently computing many smaller top-$k$ operations. We explore the design choices of this class of algorithms using both theoretical analysis and empirical evaluation on downstream tasks. Our motivating examples are sparsity algorithms for language models, which often use top-$k$ to select the most important parameters or activations. We also release a fast bucketed top-$k$ implementation for PyTorch.
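
The bucketing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' released PyTorch implementation; the function names (`exact_topk`, `bucketed_topk`, `recall`), the use of contiguous buckets, and the requirement that the bucket count divides both the vector length and $k$ are all assumptions made for illustration. Each bucket's small top-$k$ depends only on that bucket's values, so the loop iterations are independent and could run in parallel.

```python
import heapq


def exact_topk(values, k):
    """Indices of the k largest values (exact reference)."""
    return heapq.nlargest(k, range(len(values)), key=values.__getitem__)


def bucketed_topk(values, k, num_buckets):
    """Approximate top-k: take k // num_buckets indices from each bucket.

    Assumption for this sketch: buckets are contiguous slices and
    num_buckets divides both len(values) and k.
    """
    n = len(values)
    if n % num_buckets or k % num_buckets:
        raise ValueError("num_buckets must divide both len(values) and k")
    bucket_size = n // num_buckets
    per_bucket = k // num_buckets
    selected = []
    # Each iteration touches only its own bucket, so the small top-k
    # operations are independent of one another.
    for b in range(num_buckets):
        start = b * bucket_size
        bucket_indices = range(start, start + bucket_size)
        selected.extend(
            heapq.nlargest(per_bucket, bucket_indices, key=values.__getitem__)
        )
    return selected


def recall(approx_indices, exact_indices):
    """Fraction of the exact top-k that the approximation recovered."""
    return len(set(approx_indices) & set(exact_indices)) / len(exact_indices)
```

When the largest values are spread evenly across buckets, for example `[9, 1, 8, 2, 3, 7, 4, 6]` with `k=4` and two buckets, the sketch recovers the exact top-$k$. When they cluster in one bucket, as in `[9, 8, 7, 6, 1, 2, 3, 4]`, it misses some of them, which is the accuracy cost traded for parallelism.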

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2412.04358 [cs.LG]
	(or arXiv:2412.04358v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.04358

Submission history

From: Oscar Key
[v1] Thu, 5 Dec 2024 17:17:28 UTC (3,788 KB)
