FLARE: A Dataflow-Aware and Scalable Hardware Architecture for Neural-Hybrid Scientific Lossy Compression

Jia, Wenqi; Huang, Ying; Xu, Jian; Hu, Zhewen; Jin, Sian; Tian, Jiannan; Ji, Yuede; Yin, Miao


arXiv:2507.01224v1 (cs)

[Submitted on 1 Jul 2025]


Abstract: Scientific simulation on high-performance computing (HPC) systems is crucial for modeling complex systems and phenomena in fields such as astrophysics, climate science, and fluid dynamics, and it generates massive datasets that often reach petabyte to exabyte scales. Managing these data volumes introduces significant I/O and network bottlenecks, limiting practical performance and scalability. Cutting-edge lossy compression frameworks powered by deep neural networks (DNNs) have demonstrated superior compression ratios by capturing complex data correlations, but integrating them into HPC workflows poses substantial challenges: their hybrid non-neural and neural computation patterns cause excessive memory access overhead, large sequential stalls, and limited adaptability to varying data sizes and workloads on existing hardware platforms. To overcome these challenges and push the limit of high-performance scientific computing, we propose, for the first time, FLARE, a dataflow-aware and scalable hardware architecture for neural-hybrid scientific lossy compression. FLARE minimizes off-chip data access, reduces bubble overhead through efficient dataflow, and adopts a modular design that provides both scalability and flexibility, significantly enhancing throughput and energy efficiency on modern HPC systems. In particular, FLARE achieves runtime speedups ranging from $3.50 \times$ to $96.07 \times$ and energy efficiency improvements ranging from $24.51 \times$ to $520.68 \times$ across various datasets and hardware platforms.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2507.01224 [cs.DC]
	(or arXiv:2507.01224v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2507.01224

Submission history

From: Wenqi Jia
[v1] Tue, 1 Jul 2025 22:55:56 UTC (5,269 KB)
