Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
Published in: | IEEE Micro 2019-05, Vol.39 (3), p.11-19 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Abstract: | This article presents the Neural Cache architecture, which repurposes cache structures into massively parallel compute units capable of running inference for deep neural networks. Techniques are proposed to perform in situ arithmetic in SRAM arrays, create efficient data mappings, and reduce data movement. The Neural Cache architecture can fully execute convolutional, fully connected, and pooling layers in cache. Our experimental results show that the proposed architecture can improve efficiency over a GPU by 128× while requiring a minimal area overhead of 2%. |
---|---|
ISSN: | 0272-1732, 1937-4143 |
DOI: | 10.1109/MM.2019.2908101 |