MemUnison: A Racetrack-ReRAM-Combined Pipeline Architecture for Energy-Efficient in-Memory CNNs

Bibliographic details
Published in:	IEEE Transactions on Computers 2022-12, Vol.71 (12), p.3281-3294
Main authors:	Wang, Jihe; Liu, Jun; Wang, Danghui; Zhang, Shengbing; Fan, Xiaoya
Format:	Article
Language:	English
Subjects:	Alliances; Computer architecture; Computer memory; Energy consumption; energy-efficiency; Memory management; Microprocessors; Neural networks; Pipelines; Pipelining (computers); Processing-in-memory; racetrack memory; Racetracks; Random access memory; ReRAM; sequential access; Throughput; write amplification

Description
Abstract:	Though ReRAM has been highly successful in reducing the energy consumption of various neural networks, it still suffers from write amplification in energy, which prevents ReRAM from providing efficient storage for the ubiquitous streaming data in CNNs, such as feature maps. Racetrack memory, an emerging magnetic memory technique, is a suitable candidate for holding streaming data, since it offers fast sequential access with ultra-low operating energy for reads and writes. In this work, we propose a hybrid processing-in-memory architecture, called MemUnison, that coordinates ReRAM and racetrack memory to overcome the costly storage of streaming data in ReRAM. By placing feature maps in racetrack memory and leaving weights in ReRAM, a datapath is constructed between the two sides to form a fetch-process-writeback pipeline. Because the invalid shifts of racetrack memory incur a large number of pipeline bubbles, we propose a row-based access scheme that can read and write a feature map without any invalid shifts. For the row-based operation, a cohesive control method is proposed to coordinate racetrack memory and ReRAM. At runtime, convolution kernels are scheduled in ReRAM banks for cross-channel calculations of one row, by which the computing complexity of a convolutional layer can be reduced by 4 orders of magnitude, exceeding the 2 orders of magnitude of reduction achieved by traditional ReRAM.
ISSN:	0018-9340; 1557-9956
DOI:	10.1109/TC.2022.3148858