ReHy: A ReRAM-Based Digital/Analog Hybrid PIM Architecture for Accelerating CNN Training
Published in: IEEE Transactions on Parallel and Distributed Systems, 2022-11, Vol. 33 (11), pp. 2872-2884
Main authors:
Format: Article
Language: English
Online access: Full text
Abstract: Processing-In-Memory (PIM) has emerged as a high-performance and energy-efficient computing paradigm for accelerating convolutional neural network (CNN) applications. Resistive random access memory (ReRAM) has been widely used in PIM architectures due to its extremely high efficiency for accelerating matrix-vector multiplications through analog computing. However, because CNN training usually requires high-precision computation in the backward propagation (BP) stage, the limited precision of analog PIM accelerators impedes their adoption in CNN training. In this article, we propose ReHy, a hybrid PIM accelerator to support CNN training in ReRAM arrays. It is composed of Analog PIM (APIM) and Digital PIM (DPIM) modules. ReHy uses APIM to accelerate the feed-forward propagation (FP) stage for high performance, and DPIM to process the BP stage for high accuracy. We exploit the capability of ReRAM for Boolean logic operations to design the DPIM architecture. In particular, we design floating-point multiplication and addition operators to support matrix multiplications in ReRAM arrays. We also propose a performance model to offload high-precision matrix multiplications to DPIM according to data parallelism. Experimental results show that ReHy can speed up CNN training by 48.8× and 2.4×, and reduce energy consumption by 35.1× and 2.33×, compared with CPU/GPU architectures (baseline) and the state-of-the-art FloatPIM, respectively.
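The abstract's central idea, routing the forward pass to low-precision analog PIM and the backward pass to high-precision digital PIM, can be illustrated with a short sketch. This is a minimal illustrative model, not the paper's implementation: all names here (Stage, quantize, analog_matmul, digital_matmul, rehy_matmul) are hypothetical stand-ins, and the 8-bit quantization merely models the limited precision of analog ReRAM crossbars.

```python
# Illustrative sketch of ReHy-style stage-based dispatch (hypothetical names,
# not the authors' API). FP-stage matmuls go to an analog PIM model with
# limited precision; BP-stage matmuls go to a full-precision digital PIM model.
from enum import Enum
import numpy as np

class Stage(Enum):
    FORWARD = "FP"    # feed-forward propagation
    BACKWARD = "BP"   # backward propagation

def quantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Model analog crossbar precision limits by quantizing operands."""
    scale = float(np.max(np.abs(x))) or 1.0
    levels = 2 ** (bits - 1) - 1
    return np.round(x / scale * levels) / levels * scale

def analog_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """APIM: fast analog matrix multiplication at reduced precision."""
    return quantize(a) @ quantize(b)

def digital_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """DPIM: in-array floating-point multiply/add at full precision."""
    return a @ b

def rehy_matmul(a: np.ndarray, b: np.ndarray, stage: Stage) -> np.ndarray:
    """Dispatch by training stage: FP to analog, BP to digital."""
    return analog_matmul(a, b) if stage is Stage.FORWARD else digital_matmul(a, b)

# Usage: the forward result is fast but approximate; the backward result is exact.
W = np.random.randn(64, 128)
x = np.random.randn(128, 1)
y_fp = rehy_matmul(W, x, Stage.FORWARD)
y_bp = rehy_matmul(W, x, Stage.BACKWARD)
```

The paper additionally proposes a performance model that offloads individual high-precision matrix multiplications to DPIM based on their data parallelism; the stage-based dispatch above is only the coarsest version of that decision.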
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2021.3138087