Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks


Bibliographic Details
Published in: IEEE MICRO 2019-05, Vol. 39 (3), p. 11-19
Main authors: Eckert, Charles; Wang, Xiaowei; Wang, Jingcheng; Subramaniyan, Arun; Sylvester, Dennis; Blaauw, David; Das, Reetuparna; Iyer, Ravi
Format: Article
Language: English
Description
Summary: This article presents the Neural Cache architecture, which repurposes cache structures to transform them into massively parallel compute units capable of running inferences for deep neural networks. Techniques for in situ arithmetic in SRAM arrays, efficient data mapping, and reduced data movement are proposed. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in cache. Our experimental results show that the proposed architecture can improve efficiency over a GPU by 128× while requiring a minimal area overhead of 2%.
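The in situ bit-serial arithmetic the summary refers to can be illustrated with a minimal functional sketch (all names and the data layout here are illustrative, not taken from the paper): operands are stored transposed, one bit per SRAM wordline, and an addition proceeds from the LSB upward, one bit position per cycle, with a ripple carry held per column.

```python
# Illustrative simulation of bit-serial addition as used in in-cache
# (bit-serial) computing. Assumption: operands are stored transposed,
# LSB first, so element i of the list corresponds to wordline i.
def bit_serial_add(a_bits, b_bits):
    """Add two equal-length LSB-first bit vectors, one bit per cycle."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)              # sum bit this cycle
        carry = (a & b) | (carry & (a ^ b))    # carry into next cycle
    out.append(carry)                          # final carry-out bit
    return out

def to_bits(x, n):
    """Integer -> LSB-first list of n bits (the transposed layout)."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """LSB-first bit list -> integer."""
    return sum(b << i for i, b in enumerate(bits))
```

In hardware, the same LSB-to-MSB loop runs simultaneously in every SRAM column, so an array with thousands of bitlines behaves like thousands of one-bit ALUs; the latency grows with operand width, not with the number of elements.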
ISSN: 0272-1732, 1937-4143
DOI: 10.1109/MM.2019.2908101