ReHy: A ReRAM-Based Digital/Analog Hybrid PIM Architecture for Accelerating CNN Training
Published in: IEEE Transactions on Parallel and Distributed Systems, 2022-11, Vol. 33 (11), pp. 2872-2884
Main authors:
Format: Article
Language: English
Online access: Full text
Abstract: Processing-In-Memory (PIM) has emerged as a high-performance and energy-efficient computing paradigm for accelerating convolutional neural network (CNN) applications. Resistive random access memory (ReRAM) has been widely used in PIM architectures due to its extremely high efficiency for accelerating matrix-vector multiplications through analog computing. However, because CNN training usually requires high-precision computation in the backward propagation (BP) stage, the limited precision of analog PIM accelerators impedes their adoption in CNN training. In this article, we propose ReHy, a hybrid PIM accelerator to support CNN training in ReRAM arrays. It is composed of Analog PIM (APIM) and Digital PIM (DPIM) modules. ReHy uses APIM to accelerate the feed-forward propagation (FP) stage for high performance, and DPIM to process the BP stage for high accuracy. We exploit the capability of ReRAM for Boolean logic operations to design the DPIM architecture. In particular, we design floating-point multiplication and addition operators to support matrix multiplications in ReRAM arrays. We also propose a performance model to offload high-precision matrix multiplications to DPIM according to data parallelism. Experimental results show that ReHy can speed up CNN training by 48.8× and 2.4×, and reduce energy consumption by 35.1× and 2.33×, compared with CPU/GPU architectures (baseline) and the state-of-the-art FloatPIM, respectively.
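The abstract's central idea, routing the forward pass to low-precision analog PIM and the backward pass to high-precision digital PIM, can be illustrated with a short sketch. This is a minimal illustrative model, not the paper's implementation: all names here (Stage, quantize, analog_matmul, digital_matmul, rehy_matmul) are hypothetical stand-ins, and the 8-bit quantization merely models the limited precision of analog ReRAM crossbars.

```python
# Illustrative sketch of ReHy-style stage-based dispatch (hypothetical names,
# not the authors' API). FP-stage matmuls go to an analog PIM model with
# limited precision; BP-stage matmuls go to a full-precision digital PIM model.
from enum import Enum
import numpy as np

class Stage(Enum):
    FORWARD = "FP"    # feed-forward propagation
    BACKWARD = "BP"   # backward propagation

def quantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Model analog crossbar precision limits by quantizing operands."""
    scale = float(np.max(np.abs(x))) or 1.0
    levels = 2 ** (bits - 1) - 1
    return np.round(x / scale * levels) / levels * scale

def analog_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """APIM: fast analog matrix multiplication at reduced precision."""
    return quantize(a) @ quantize(b)

def digital_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """DPIM: in-array floating-point multiply/add at full precision."""
    return a @ b

def rehy_matmul(a: np.ndarray, b: np.ndarray, stage: Stage) -> np.ndarray:
    """Dispatch by training stage: FP to analog, BP to digital."""
    return analog_matmul(a, b) if stage is Stage.FORWARD else digital_matmul(a, b)

# Usage: the forward result is fast but approximate; the backward result is exact.
W = np.random.randn(64, 128)
x = np.random.randn(128, 1)
y_fp = rehy_matmul(W, x, Stage.FORWARD)
y_bp = rehy_matmul(W, x, Stage.BACKWARD)
```

The paper additionally proposes a performance model that offloads individual high-precision matrix multiplications to DPIM based on their data parallelism; the stage-based dispatch above is only the coarsest version of that decision.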
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2021.3138087