NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System

Jiang, Qingcai; Tu, Buxin; Hao, Xiaoyu; Chen, Junshi; An, Hong

计算机科学 > 硬件架构

arXiv:2504.03451 (cs)

[提交于 2025年4月4日 ]

标题： NDFT：通过近数据计算系统的软硬件协同设计加速密度泛函理论计算

标题： NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System

Authors:Qingcai Jiang, Buxin Tu, Xiaoyu Hao, Junshi Chen, Hong An

摘要： Linear-response time-dependent Density Functional Theory (LR-TDDFT) is a widely used method for accurately predicting the excited-state properties of physical systems. Previous works have attempted to accelerate LR-TDDFT using heterogeneous systems such as GPUs, FPGAs, and the Sunway architecture. However, a major drawback of these approaches is the constant data movement between host memory and the memory of the heterogeneous systems, which results in substantial \textit{数据移动开销}. Moreover, these works focus primarily on optimizing the compute-intensive portions of LR-TDDFT, despite the fact that the calculation steps are fundamentally \textit{内存绑定的}. To address these challenges, we propose NDFT, a \underline{N}ear-\underline{D}ata Density \underline{F}unctional \underline{T}heory framework. Specifically, we design a novel task partitioning and scheduling mechanism to offload each part of LR-TDDFT to the most suitable computing units within a CPU-NDP system. Additionally, we implement a hardware/software co-optimization of a critical kernel in LR-TDDFT to further enhance performance on the CPU-NDP system. Our results show that NDFT achieves performance improvements of 5.2x and 2.5x over CPU and GPU baselines, respectively, on a large physical system.

摘要： Linear-response time-dependent Density Functional Theory (LR-TDDFT) is a widely used method for accurately predicting the excited-state properties of physical systems. Previous works have attempted to accelerate LR-TDDFT using heterogeneous systems such as GPUs, FPGAs, and the Sunway architecture. However, a major drawback of these approaches is the constant data movement between host memory and the memory of the heterogeneous systems, which results in substantial \textit{data movement overhead}. Moreover, these works focus primarily on optimizing the compute-intensive portions of LR-TDDFT, despite the fact that the calculation steps are fundamentally \textit{memory-bound}. To address these challenges, we propose NDFT, a \underline{N}ear-\underline{D}ata Density \underline{F}unctional \underline{T}heory framework. Specifically, we design a novel task partitioning and scheduling mechanism to offload each part of LR-TDDFT to the most suitable computing units within a CPU-NDP system. Additionally, we implement a hardware/software co-optimization of a critical kernel in LR-TDDFT to further enhance performance on the CPU-NDP system. Our results show that NDFT achieves performance improvements of 5.2x and 2.5x over CPU and GPU baselines, respectively, on a large physical system.

主题：	硬件架构 (cs.AR) ; 计算物理 (physics.comp-ph)
引用方式：	arXiv:2504.03451 [cs.AR]
	(或者 arXiv:2504.03451v1 [cs.AR] 对于此版本)
	https://doi.org/10.48550/arXiv.2504.03451

提交历史

来自： Qingcai Jiang [查看电子邮件]
[v1] 星期五， 2025 年 4 月 4 日 13:51:24 UTC (3,619 KB)

计算机科学 > 硬件架构

标题： NDFT：通过近数据计算系统的软硬件协同设计加速密度泛函理论计算

标题： NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System

提交历史

获取论文：

参考文献与引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

计算机科学 > 硬件架构

标题： NDFT：通过近数据计算系统的软硬件协同设计加速密度泛函理论计算 显示英文标题

标题： NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System

提交历史

获取论文：

参考文献与引用

BibTeX 格式的引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

标题： NDFT：通过近数据计算系统的软硬件协同设计加速密度泛函理论计算