SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator

Wang, Peipei; Guan, Wu; Liang, Liping; Wang, Zhijun; Luo, Hanqing; Zhang, Zhibin

Computer Science > Hardware Architecture

arXiv:2507.14139 (cs)

[Submitted on 7 May 2025]

Title: SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator

Authors: Peipei Wang, Wu Guan, Liping Liang, Zhijun Wang, Hanqing Luo, Zhibin Zhang

Abstract: This paper introduces SpeedLLM, a neural network accelerator designed on the Xilinx Alveo U280 platform and optimized for the Tinyllama framework to enhance edge computing performance. Key innovations include data stream parallelism, a memory reuse strategy, and Llama2 operator fusion, which collectively reduce latency and energy consumption. SpeedLLM's data pipeline architecture optimizes the read-compute-write cycle, while the memory strategy minimizes FPGA resource demands. The operator fusion boosts computational density and throughput. Results show that SpeedLLM outperforms traditional Tinyllama implementations, achieving up to 4.8× faster performance and 1.18× lower energy consumption, an improvement for deployment on edge devices.
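
To illustrate why overlapping the read-compute-write cycle can reduce latency, the following is a minimal sketch of an idealized three-stage pipeline model. It is not taken from the paper: the function names, the tile count, and the per-stage cycle costs are all hypothetical, and the model assumes identical tiles and enough buffering between stages that only the slowest stage limits throughput.

```python
def sequential_cycles(n_tiles, read, compute, write):
    """Cycles when each tile is fully read, computed, and written before the next starts."""
    return n_tiles * (read + compute + write)


def pipelined_cycles(n_tiles, read, compute, write):
    """Cycles when the three stages overlap across tiles (idealized pipeline)."""
    if n_tiles == 0:
        return 0
    fill = read + compute + write            # the first tile passes through every stage
    bottleneck = max(read, compute, write)   # each later tile is paced by the slowest stage
    return fill + (n_tiles - 1) * bottleneck


# Hypothetical stage costs in cycles; these are NOT measurements from the paper.
n, r, c, w = 64, 100, 120, 80
seq = sequential_cycles(n, r, c, w)
pipe = pipelined_cycles(n, r, c, w)
print(f"sequential={seq} pipelined={pipe} speedup={seq / pipe:.2f}x")
```

Under this model the achievable gain is bounded by the ratio of the summed stage times to the slowest stage, so balancing the three stages matters as much as overlapping them.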

Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2507.14139 [cs.AR]
	(or arXiv:2507.14139v1 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2507.14139

Submission history

From: Peipei Wang
[v1] Wed, 7 May 2025 05:39:07 UTC (2,519 KB)
