ESSENCE: Exploiting Structured Stochastic Gradient Pruning for Endurance-Aware ReRAM-Based In-Memory Training Systems

Bibliographic details
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023-07, Vol. 42 (7), p. 2187-2199
Main authors:	Yang, Xiaoxuan; Yang, Huanrui; Doppa, Janardhan Rao; Pande, Partha Pratim; Chakrabarty, Krishnendu; Li, Hai
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Convolutional neural networks; Datasets; Endurance; Fatigue limit; Kernel; Neural networks; processing-in-memory (PIM); Programming; Pruning; Random access memory; resistive random-access memory (ReRAM); structured gradient pruning; Training; Virtual machine monitors; Writing

Description
Summary:	Processing-in-memory (PIM) enables energy-efficient deployment of convolutional neural networks (CNNs) from edge to cloud. Resistive random-access memory (ReRAM) is one of the most commonly used technologies for PIM architectures. One of the primary limitations of ReRAM-based PIM in neural network training is its limited write endurance, which is strained by frequent weight updates. To make ReRAM-based architectures viable for CNN training, the write endurance issue needs to be addressed. This work aims to reduce the number of weight reprogrammings without compromising the final model accuracy. We propose the ESSENCE framework with an endurance-aware structured stochastic gradient pruning method, which dynamically adjusts the probability of a gradient update based on the current update counts. Experimental results with multiple CNNs and datasets demonstrate that the proposed method can extend ReRAM's lifetime for training. For instance, with the ResNet20 network and the CIFAR-10 dataset, ESSENCE reduces the mean update count by up to 10.29× compared with the stochastic gradient descent method and effectively reduces the maximum update count compared with the No Endurance method. Furthermore, an aggressive tuning method based on ESSENCE can boost the mean update count savings by up to 14.41×.
ISSN:	0278-0070, 1937-4151
DOI:	10.1109/TCAD.2022.3216546
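
The summary describes pruning gradients in structured groups, with an update probability that depends on how often each group has already been written. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the function names, the grouping of weights by row, the keep-probability formula, and the parameters base_keep and strength are all assumptions made for this example. The sketch also does not rescale the surviving gradients.

```python
import numpy as np


def endurance_aware_prune(grad, update_counts, rng, base_keep=0.5, strength=1.0):
    """Stochastically drop whole gradient groups (rows) based on write counts.

    grad          : array of shape (n_groups, group_size), one row per group
    update_counts : int array of shape (n_groups,), writes seen so far
    base_keep, strength : hypothetical tuning knobs, not taken from the paper
    """
    # Groups written more often than average get a lower keep probability.
    mean_count = update_counts.mean() + 1e-12
    relative = update_counts / mean_count
    keep_prob = np.clip(base_keep / (1.0 + strength * relative), 0.0, 1.0)

    # One Bernoulli draw per group: the whole row is kept or zeroed together.
    keep = rng.random(grad.shape[0]) < keep_prob
    pruned = grad * keep[:, None]

    # Only groups that are actually updated consume a write cycle.
    new_counts = update_counts + keep.astype(update_counts.dtype)
    return pruned, new_counts


def simulate(steps=200, n_groups=16, group_size=9, seed=0):
    """Run the pruning rule on random gradients and return final write counts."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_groups, dtype=np.int64)
    for _ in range(steps):
        grad = rng.standard_normal((n_groups, group_size))
        _, counts = endurance_aware_prune(grad, counts, rng)
    return counts


if __name__ == "__main__":
    counts = simulate()
    # Dense SGD would write every group at every step (200 writes per group).
    print("mean writes per group:", counts.mean())
    print("max writes per group:", counts.max())
```

In this sketch, a group that has accumulated more writes than its peers becomes less likely to be updated, which tends to even out the write counts as well as lower their mean. How ESSENCE actually defines its groups and probabilities is specified in the article itself.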