MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks


Bibliographic Details
Published in: IEEE Transactions on Computers, July 2020, Vol. 69, No. 7, pp. 955-967
Authors: Kim, Byeongho, Chung, Jongwook, Lee, Eojin, Jung, Wonkyung, Lee, Sunjung, Choi, Jaewan, Park, Jaehyun, Wi, Minbok, Lee, Sukhan, Ahn, Jung Ho
Format: Article
Language: English
Abstract: Recurrent Neural Networks (RNNs) spend most of their execution time performing matrix-vector multiplication (MV-mul). Because the matrices in RNNs have poor reusability and their ever-increasing size is too large to fit in the on-chip storage of mobile/IoT devices, the performance and energy efficiency of MV-mul are determined by those of main-memory DRAM. Therefore, computing MV-mul within DRAM has drawn much attention. However, previous studies lacked consideration of matrix sparsity, the power constraints of DRAM devices, and concurrency in accessing DRAM from processors while performing MV-mul. We propose a main-memory architecture called MViD, which performs MV-mul by placing MAC units inside DRAM banks. For higher computational efficiency, we use a sparse matrix format and exploit quantization. Because of the limited power budget for DRAM devices, we implement the MAC units on only a portion of the DRAM banks. We architect MViD to slow down or pause MV-mul so that memory requests from processors can be served concurrently while satisfying the limited power budget. Our results show that MViD provides 7.2x higher throughput than the baseline system with four DRAM ranks (performing MV-mul in a chip-multiprocessor) while running inference of Deep Speech 2 together with a memory-intensive workload.
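
A minimal sketch, not taken from the paper: the core operation described in the abstract is a sparse matrix-vector product over quantized weights, the kind of per-row multiply-accumulate work that MViD places next to DRAM banks. The CSR layout, int8 quantization scheme, and all names below (csr_spmv_int8, row_ptr, col_idx, the scale factors) are illustrative assumptions, not MViD's actual sparse format.

    # Illustrative only: CSR sparse matrix-vector multiply with int8-quantized
    # operands, accumulating with integer MACs and dequantizing per output.
    def csr_spmv_int8(row_ptr, col_idx, w_q, x_q, w_scale, x_scale):
        n_rows = len(row_ptr) - 1
        y = [0.0] * n_rows
        for i in range(n_rows):
            acc = 0  # integer accumulator, one per matrix row
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += w_q[k] * x_q[col_idx[k]]  # MAC over the row's non-zeros only
            y[i] = acc * w_scale * x_scale       # dequantize to a real-valued output
        return y

    # Example: a 3x4 matrix with five non-zeros, stored in CSR
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 3]
    w_q     = [12, -7, 5, 3, 9]    # int8 weight values
    x_q     = [20, 40, -20, 10]    # int8-quantized input vector
    print(csr_spmv_int8(row_ptr, col_idx, w_q, x_q, 0.05, 0.05))

Skipping the zero entries is what makes the sparse format attractive for in-DRAM MAC units: only the stored non-zeros are read and multiplied, which reduces both the data movement and the number of MAC operations per output element.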
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2020.2984496