Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Choong, Benjamin Chen Ming; Luo, Tao; Liu, Cheng; He, Bingsheng; Zhang, Wei; Zhou, Joey Tianyi

doi:10.1016/j.sysarc.2022.102507

Computer Science > Emerging Technologies

arXiv:2507.01429 (cs)

[Submitted on 2 Jul 2025]

Title: Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Authors: Benjamin Chen Ming Choong, Tao Luo, Cheng Liu, Bingsheng He, Wei Zhang, Joey Tianyi Zhou

Abstract: Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly researched memory technologies, racetrack memory is a non-volatile technology that allows high data density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate operations. Moreover, we explore the design space of racetrack memory based systems and CNN model architectures, employing co-design to improve the efficiency and performance of CNN inference in racetrack memory while maintaining model accuracy. Our designed circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack memory based embedded systems.

Subjects:	Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Cite as:	arXiv:2507.01429 [cs.ET]
	(or arXiv:2507.01429v1 [cs.ET] for this version)
	https://doi.org/10.48550/arXiv.2507.01429
Related DOI:	https://doi.org/10.1016/j.sysarc.2022.102507

Submission history

From: Benjamin Chen Ming Choong
[v1] Wed, 2 Jul 2025 07:29:53 UTC (937 KB)
