Increasing Throughput of In-Memory DNN Accelerators by Flexible Layerwise DNN Approximation

Bibliographic details
Published in:	IEEE Micro, 2022-11, Vol. 42 (6), p. 17-24
Main authors:	Parra, Cecilia De la; Soliman, Taha; Guntoro, Andre; Kumar, Akash; Wehn, Norbert
Format:	Article
Language:	English
Keywords:	Accelerators; Accuracy; Approximation; Artificial neural networks; Computational modeling; Computer memory; Deep learning; Hardware; Image classification; In-memory computing; Mathematical analysis; Neural networks; Optimization; Quantization (signal); Space exploration; Throughput

Description
Summary:	Approximate computing and mixed-signal in-memory accelerators are promising paradigms to significantly reduce computational requirements of deep neural network (DNN) inference without accuracy loss. In this work, we present a novel in-memory design for layerwise approximate computation at different approximation levels. A sensitivity-based high-dimensional search is performed to explore the optimal approximation level for each DNN layer. Our new methodology offers high flexibility and optimal tradeoff between accuracy and throughput, which we demonstrate by an extensive evaluation on various DNN benchmarks for medium- and large-scale image classification with CIFAR10, CIFAR100, and ImageNet. With our novel approach, we reach an average of 5× and up to 8× speedup without accuracy loss.
ISSN:	0272-1732, 1937-4143
DOI:	10.1109/MM.2022.3196865
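
The summary mentions a sensitivity-based search that picks an approximation level for each DNN layer, but this record does not describe the procedure. The sketch below is only an illustration of the general idea, not the authors' algorithm: it assumes a simple greedy strategy, integer levels where 0 means exact computation and higher values mean more aggressive approximation, and a caller-supplied `evaluate_accuracy` function. All names (`greedy_layerwise_levels`, `toy_accuracy`) and the toy accuracy model are hypothetical.

```python
from typing import Callable, List


def greedy_layerwise_levels(
    num_layers: int,
    num_levels: int,
    evaluate_accuracy: Callable[[List[int]], float],
    baseline_accuracy: float,
    max_accuracy_drop: float = 0.0,
) -> List[int]:
    """Greedy sketch: approximate the least sensitive layers first.

    Assumption: level 0 is exact, level num_levels - 1 is the most
    aggressive (and fastest) approximation.
    """
    levels = [0] * num_layers

    def sensitivity(layer: int) -> float:
        # Accuracy drop when only this layer runs at the most aggressive level.
        trial = [0] * num_layers
        trial[layer] = num_levels - 1
        return baseline_accuracy - evaluate_accuracy(trial)

    # Visit layers from least to most sensitive.
    order = sorted(range(num_layers), key=sensitivity)

    for layer in order:
        # Try the most aggressive level first and back off until the
        # accuracy budget holds; keep level 0 if nothing fits.
        for level in range(num_levels - 1, 0, -1):
            trial = list(levels)
            trial[layer] = level
            if baseline_accuracy - evaluate_accuracy(trial) <= max_accuracy_drop:
                levels = trial
                break
    return levels


def toy_accuracy(levels: List[int]) -> float:
    """Hypothetical stand-in for a real validation run (accuracy in percent)."""
    cost_per_level = [0, 2, 1]  # made-up per-layer sensitivity weights
    penalty = max(0, sum(c * l for c, l in zip(cost_per_level, levels)) - 2)
    return (900 - penalty) / 10


chosen = greedy_layerwise_levels(
    num_layers=3,
    num_levels=3,
    evaluate_accuracy=toy_accuracy,
    baseline_accuracy=toy_accuracy([0, 0, 0]),
)
print(chosen)
```

In a real setting `evaluate_accuracy` would run inference on a validation set with the given per-layer configuration, which makes every call expensive; a greedy pass like this one needs only a number of evaluations linear in the number of layers and levels, at the cost of possibly missing a better joint assignment.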