SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan

Konishi, Fumikazu

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2507.02124 (cs)

[Submitted on 2 Jul 2025]

Title: SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan

Authors: Fumikazu Konishi

Abstract: SAKURAONE is a managed high performance computing (HPC) cluster developed and operated by the SAKURA Internet Research Center. It reinforces the "KOKARYOKU PHY" configuration of bare-metal GPU servers and is designed as a cluster computing resource optimized for advanced workloads, including large language model (LLM) training. In the ISC 2025 edition of the TOP500 list, SAKURAONE was ranked 49th in the world based on its High Performance Linpack (HPL) score, demonstrating its global competitiveness. In particular, it is the only system within the top 100 that employs a fully open networking stack based on 800 GbE (Gigabit Ethernet) and the SONiC (Software for Open Networking in the Cloud) operating system, highlighting the viability of open and vendor-neutral technologies in large-scale HPC infrastructure. SAKURAONE achieved a sustained performance of 33.95 PFLOP/s on the HPL benchmark (Rmax), and 396.295 TFLOP/s on the High Performance Conjugate Gradient (HPCG) benchmark. For the HPL-MxP benchmark, which targets low-precision workloads representative of AI applications, SAKURAONE delivered an impressive 339.86 PFLOP/s using FP8 precision. The system comprises 100 compute nodes, each equipped with eight NVIDIA H100 GPUs. It is supported by an all-flash Lustre storage subsystem with a total physical capacity of 2 petabytes, providing high-throughput and low-latency data access. Internode communication is enabled by a full-bisection bandwidth interconnect based on a Rail-Optimized topology, where the Leaf and Spine layers are interconnected via 800 GbE links. This topology, in combination with RoCEv2 (RDMA over Converged Ethernet version 2), enables high-speed, lossless data transfers and mitigates communication bottlenecks in large-scale parallel workloads.
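
The benchmark figures quoted in the abstract can be related to one another with a little arithmetic. The following is a minimal sketch, not part of the paper: it uses only the numbers stated above, and it assumes (the abstract does not say so) that the benchmark runs used all 100 nodes. The function name `derived_metrics` and the dictionary keys are illustrative choices.

```python
# Figures quoted in the abstract, normalized to TFLOP/s.
NODES = 100
GPUS_PER_NODE = 8
HPL_RMAX_TFLOPS = 33.95e3    # 33.95 PFLOP/s (HPL Rmax)
HPCG_TFLOPS = 396.295        # HPCG
HPL_MXP_TFLOPS = 339.86e3    # 339.86 PFLOP/s (HPL-MxP, FP8)


def derived_metrics():
    """Return ratios derived from the abstract's headline numbers.

    Assumption: every benchmark run used all NODES * GPUS_PER_NODE GPUs.
    """
    total_gpus = NODES * GPUS_PER_NODE
    return {
        "total_gpus": total_gpus,
        # Sustained HPL throughput attributed to each GPU.
        "hpl_tflops_per_gpu": HPL_RMAX_TFLOPS / total_gpus,
        # HPCG is memory- and communication-bound, so it reaches only
        # a small fraction of the HPL figure.
        "hpcg_fraction_of_hpl": HPCG_TFLOPS / HPL_RMAX_TFLOPS,
        # Ratio of the FP8 mixed-precision result to the HPL result.
        "mxp_speedup_over_hpl": HPL_MXP_TFLOPS / HPL_RMAX_TFLOPS,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.4g}")
```

Under that assumption, the HPL-MxP result is roughly ten times the HPL Rmax, while HPCG reaches a little over 1% of it.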

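Comments:	13 pages, 2 figures, 10 tables
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
ACM classes:	C.5.5; B.8.2
Cite as:	arXiv:2507.02124 [cs.DC]
	(or arXiv:2507.02124v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2507.02124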

Submission history

From: Fumikazu Konishi
[v1] Wed, 2 Jul 2025 20:13:09 UTC (463 KB)
