Scaled Block Vecchia Approximation for High-Dimensional Gaussian Process Emulation on GPUs

Pan, Qilong; Abdulah, Sameh; Abduljabbar, Mustafa; Ltaief, Hatem; Herten, Andreas; Bode, Mathis; Pratola, Matthew; Fadikar, Arindam; Genton, Marc G.; Keyes, David E.; Sun, Ying

arXiv:2504.12004 (cs)

[Submitted on 16 Apr 2025]

Abstract: Emulating computationally intensive scientific simulations is essential to enable uncertainty quantification, optimization, and decision-making at scale. Gaussian Processes (GPs) offer a flexible and data-efficient foundation for statistical emulation, but their poor scalability limits applicability to large datasets. We introduce the Scaled Block Vecchia (SBV) algorithm for distributed GPU-based systems. SBV integrates the Scaled Vecchia approach for anisotropic input scaling with the Block Vecchia (BV) method to reduce computational and memory complexity while leveraging GPU acceleration techniques for efficient linear algebra operations. To the best of our knowledge, this is the first distributed implementation of any Vecchia-based GP variant. Our implementation employs MPI for inter-node parallelism and the MAGMA library for GPU-accelerated batched matrix computations. We demonstrate the scalability and efficiency of the proposed algorithm through experiments on synthetic and real-world workloads, including a 50M-point simulation from a respiratory disease model. SBV achieves near-linear scalability on up to 64 A100 and GH200 GPUs, handles 320M points, and reduces energy use relative to exact GP solvers, establishing SBV as a scalable and energy-efficient framework for emulating large-scale scientific models on GPU-based distributed systems.
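To make the combination of input scaling and blocking more concrete, here is a minimal single-process NumPy sketch of a block Vecchia log-likelihood evaluated on scaled inputs. It is an illustration under stated assumptions, not the authors' MPI/MAGMA GPU implementation. The assumptions are:

- a squared-exponential kernel;
- blocks formed by chunking the points in their given order;
- conditioning sets chosen as the m earlier points nearest each block centroid in the scaled space.

The function names `scaled_block_vecchia_loglik` and `exact_loglik` are hypothetical. The joint density is approximated by a product of conditional Gaussian densities p(y_b | y_c(b)) over blocks b. Dividing each input dimension by its lengthscale turns the anisotropic kernel into an isotropic one, so nearest-neighbor search in the scaled space reflects each dimension's relevance.

```python
import numpy as np


def sq_exp_kernel(A, B, variance):
    """Isotropic squared-exponential kernel on inputs that are already scaled."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2)


def gaussian_logpdf(r, cov):
    """Log density of N(0, cov) evaluated at r, via a Cholesky factorization."""
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, r)  # alpha = L^{-1} r, so alpha.alpha = r' cov^{-1} r
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(r) * np.log(2 * np.pi)


def exact_loglik(X, y, lengthscales, variance, nugget):
    """Exact GP log-likelihood with an anisotropic (per-dimension) lengthscale."""
    Z = X / lengthscales
    K = sq_exp_kernel(Z, Z, variance) + nugget * np.eye(len(y))
    return gaussian_logpdf(y, K)


def scaled_block_vecchia_loglik(X, y, lengthscales, variance, nugget, block_size, m):
    """Hypothetical block Vecchia log-likelihood on lengthscale-scaled inputs."""
    Z = X / lengthscales  # scaling step: anisotropic -> isotropic distances
    n = len(y)
    # Assumed blocking scheme: consecutive chunks of the given ordering.
    blocks = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    total = 0.0
    for k, idx in enumerate(blocks):
        prev = np.concatenate(blocks[:k]) if k > 0 else np.array([], dtype=int)
        # Conditioning set: the m earlier points closest to the block centroid.
        if len(prev) > m:
            centroid = Z[idx].mean(axis=0)
            dist = ((Z[prev] - centroid) ** 2).sum(axis=1)
            prev = prev[np.argsort(dist)[:m]]
        K_bb = sq_exp_kernel(Z[idx], Z[idx], variance) + nugget * np.eye(len(idx))
        if len(prev) == 0:
            total += gaussian_logpdf(y[idx], K_bb)
            continue
        K_cc = sq_exp_kernel(Z[prev], Z[prev], variance) + nugget * np.eye(len(prev))
        K_bc = sq_exp_kernel(Z[idx], Z[prev], variance)
        W = np.linalg.solve(K_cc, K_bc.T).T  # W = K_bc K_cc^{-1}
        cond_mean = W @ y[prev]
        cond_cov = K_bb - W @ K_bc.T  # Schur complement
        total += gaussian_logpdf(y[idx] - cond_mean, cond_cov)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 6))
    lengthscales = np.array([0.2, 0.5, 1.0, 2.0, 5.0, 10.0])
    y = rng.standard_normal(500)
    approx = scaled_block_vecchia_loglik(X, y, lengthscales, 1.0, 0.1, block_size=25, m=50)
    exact = exact_loglik(X, y, lengthscales, 1.0, 0.1)
    print(f"block Vecchia: {approx:.3f}  exact: {exact:.3f}")
```

When m is at least the number of earlier points, every block conditions on everything before it. The chain rule then makes the product of conditionals equal the exact likelihood. Smaller m trades accuracy for cost. Given the conditioning sets, each block's small factorization is independent of the others. That independence is the kind of work the abstract describes batching on GPUs with MAGMA and distributing with MPI. How the paper actually forms blocks and orders them may differ from this sketch.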

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2504.12004 [cs.DC]
	(or arXiv:2504.12004v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2504.12004

Submission history

From: Sameh Abdulah
[v1] Wed, 16 Apr 2025 11:57:20 UTC (1,814 KB)
