Experience Deploying Containerized GenAI Services at an HPC Center

Beltre, Angel M.; Ogden, Jeff; Pedretti, Kevin

doi:10.1145/3731599.3767356

arXiv:2509.20603 (cs)

[Submitted on 24 Sep 2025 (v1), last revised 29 Sep 2025 (this version, v2)]

Authors: Angel M. Beltre, Jeff Ogden, Kevin Pedretti

Abstract: Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected via web-based APIs. While these components are often containerized and deployed in cloud environments, such capabilities are still emerging at High-Performance Computing (HPC) centers. In this paper, we share our experience deploying GenAI workloads within an established HPC center, discussing the integration of HPC and cloud computing environments. We describe our converged computing architecture that integrates HPC and Kubernetes platforms running containerized GenAI workloads, helping with reproducibility. A case study illustrates the deployment of the Llama Large Language Model (LLM) using a containerized inference server (vLLM) across both Kubernetes and HPC platforms using multiple container runtimes. Our experience highlights practical considerations and opportunities for the HPC container community, guiding future research and tool development.

Comments:	10 pages, 12 figures
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Cite as:	arXiv:2509.20603 [cs.DC]
	(or arXiv:2509.20603v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2509.20603
Related DOI:	https://doi.org/10.1145/3731599.3767356

Submission history

From: Angel Beltre
[v1] Wed, 24 Sep 2025 22:54:21 UTC (150 KB)
[v2] Mon, 29 Sep 2025 01:14:19 UTC (572 KB)
