Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication

Riedel, Samuel; Zhang, Yichao; Bertuletti, Marco; Benini, Luca

Computer Science > Hardware Architecture

arXiv:2507.05012 (cs)

[Submitted on 7 Jul 2025]

Title: Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication

Authors: Samuel Riedel, Yichao Zhang, Marco Bertuletti, Luca Benini

Abstract: Next-generation wireless technologies (for immersive-massive communication, joint communication and sensing) demand highly parallel architectures for massive data processing. A common architectural template scales up by grouping tens to hundreds of cores into shared-memory clusters, which are then scaled out as multi-cluster manycore systems. This hierarchical design, used in GPUs and accelerators, requires a balancing act between fewer large clusters and more small clusters, affecting design complexity, synchronization, communication efficiency, and programmability. While all multi-cluster architectures must balance these trade-offs, there is limited insight into optimal cluster sizes. This paper analyzes various cluster configurations, focusing on synchronization, data movement overhead, and programmability for typical wireless sensing and communication workloads. We extend the open-source shared-memory cluster MemPool into a multi-cluster architecture and propose a novel double-buffering barrier that decouples processor and DMA. Our results show that a single 256-core cluster can be twice as fast as sixteen 16-core clusters for memory-bound kernels and up to 24% faster for compute-bound kernels due to reduced synchronization and communication overheads.
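
For readers unfamiliar with double buffering, the general idea can be sketched as follows. This is a minimal, generic illustration and not the barrier proposed in the paper: the function `process_tiles`, the helper `dma_load`, and the use of a Python thread as a stand-in for a DMA engine are all assumptions made for this example. While the processor computes on one buffer, the next tile is transferred into the other, and the two only synchronize when the buffers swap.

```python
import threading


def process_tiles(tiles, compute):
    """Generic double buffering (illustrative only, not the paper's barrier).

    A background thread stands in for a DMA engine: it loads tile i+1
    into the spare buffer while `compute` runs on tile i.
    """
    buffers = [None, None]  # two local buffers that alternate roles
    results = []

    def dma_load(slot, tile):
        # Hypothetical stand-in for a DMA transfer into a local buffer.
        buffers[slot] = list(tile)

    if not tiles:
        return results

    dma_load(0, tiles[0])  # prime the first buffer before computing
    for i in range(len(tiles)):
        cur = i % 2
        nxt = 1 - cur
        transfer = None
        if i + 1 < len(tiles):
            # Start fetching the next tile while the current one is processed.
            transfer = threading.Thread(target=dma_load, args=(nxt, tiles[i + 1]))
            transfer.start()
        results.append(compute(buffers[cur]))
        if transfer is not None:
            transfer.join()  # synchronization point before the buffers swap
    return results
```

Calling `process_tiles([[1, 2], [3, 4], [5]], sum)` applies `sum` to each tile in order. The single `join` per iteration is the only point where compute and transfer wait on each other in this sketch.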

Comments:	6 pages, 8 figures, accepted at IWASI 2025
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2507.05012 [cs.AR]
	(or arXiv:2507.05012v1 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2507.05012

Submission history

From: Samuel Riedel
[v1] Mon, 7 Jul 2025 13:49:59 UTC (1,699 KB)
