ML-Based Optimum Number of CUDA Streams for the GPU Implementation of the Tridiagonal Partition Method

Veneva, Milena; Imamura, Toshiyuki

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2501.05938 (cs)

[Submitted on 10 Jan 2025]

Title: ML-Based Optimum Number of CUDA Streams for the GPU Implementation of the Tridiagonal Partition Method

Authors: Milena Veneva, Toshiyuki Imamura

Abstract: This paper presents a heuristic for finding the optimum number of CUDA streams, built with tools common in modern AI-oriented approaches and applied to the parallel partition algorithm. A time complexity model for the GPU implementation of the partition method is built. A refined time complexity model is then formulated for the partition algorithm executed on multiple CUDA streams. Computational experiments are conducted for different sizes of the system of linear algebraic equations (SLAE), and the optimum number of CUDA streams for each size is found empirically. Based on the collected data, regression analysis is used to model the sum of the times of the non-dominant GPU operations (those that take part in the stream overlap). A non-linear model is fitted to the overhead time associated with creating CUDA streams. Statistical analysis is carried out for all of the models. An algorithm for finding the optimum number of CUDA streams is formulated and, together with the two models above, used to predict the optimum number of CUDA streams. Comparing the predicted values with the measured data, the algorithm is deemed acceptably good.
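The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the functional form of either fitted model, so everything below is an assumption made for illustration: the function names, the candidate stream counts, the idea that the overlappable time shrinks as 1/s, and the linear toy overhead are all hypothetical and are not the authors' models.

```python
from typing import Callable, Iterable


def optimum_streams(
    t_dominant: float,
    t_overlappable: float,
    overhead: Callable[[int], float],
    candidates: Iterable[int] = (1, 2, 4, 8, 16, 32),
) -> int:
    """Return the candidate stream count with the smallest modeled run time."""

    def modeled_time(s: int) -> float:
        # Toy assumption: with s streams, only 1/s of the overlappable
        # (non-dominant) time stays exposed, plus a stream-creation overhead.
        return t_dominant + t_overlappable / s + overhead(s)

    return min(candidates, key=modeled_time)


def toy_overhead(s: int) -> float:
    # Hypothetical overhead model: zero for one stream, linear growth after that.
    return 0.05 * (s - 1)


# Illustrative numbers only (arbitrary time units), not measurements.
best = optimum_streams(t_dominant=10.0, t_overlappable=4.0, overhead=toy_overhead)
print(best)  # 8
```

In this toy setting the gain from overlap flattens out while the overhead keeps growing, so the minimum sits at an intermediate stream count. In practice both models would be replaced by ones fitted to measured timings.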

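Comments:	7 pages, 4 figures, 5 tables, MMCP 2024 conference, Yerevan, Armenia
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
MSC classes:	65Y05, 65Y10, 90C59, 68T20
Cite as:	arXiv:2501.05938 [cs.DC]
	(or arXiv:2501.05938v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2501.05938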

Submission history

From: Milena Veneva
[v1] Fri, 10 Jan 2025 13:02:22 UTC (163 KB)
