Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine

Kashi, Aditya; Koukpaizan, Nicholson; Lu, Hao; Matheson, Michael; Oral, Sarp; Wang, Feiyi

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2507.11512 (cs)

[Submitted on 15 Jul 2025]

Title: Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine

Authors: Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang

Abstract: Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high-performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low-precision formats such as FP16. However, a majority of scientific simulation applications are memory bandwidth limited. Beyond preliminary studies, the practical gain from using mixed-precision algorithms on a given HPC system is largely unclear. The High Performance GMRES Mixed Precision (HPG-MxP) benchmark has been proposed to measure the useful performance of an HPC system on sparse matrix-based mixed-precision applications. In this work, we present a highly optimized implementation of the HPG-MxP benchmark for an exascale system and describe our algorithm enhancements. We show for the first time a speedup of 1.6x using a combination of double- and single-precision on modern GPU-based supercomputers.

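Comments:	Accepted for presentation at SC25, St. Louis, MO, USA
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Numerical Analysis (math.NA)
MSC classes:	65Y10
ACM classes:	G.4; C.4
Cite as:	arXiv:2507.11512 [cs.DC]
	(or arXiv:2507.11512v1 [cs.DC] for this version)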
	https://doi.org/10.48550/arXiv.2507.11512

Submission history

From: Aditya Kashi
[v1] Tue, 15 Jul 2025 17:26:37 UTC (983 KB)
