An Architecture to Accelerate Convolution in Deep Neural Networks
Saved in:
Published in: | IEEE Transactions on Circuits and Systems I: Regular Papers, 2018-04, Vol. 65 (4), p. 1349-1362 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | In the past few years, the demand for real-time hardware implementations of deep neural networks (DNNs), especially convolutional neural networks (CNNs), has dramatically increased, thanks to their excellent performance on a wide range of recognition and classification tasks. When considering real-time action recognition and video/image classification systems, latency is of paramount importance. Therefore, applications strive to maximize the accuracy while keeping the latency under a given application-specific maximum: in most cases, this threshold cannot exceed a few hundred milliseconds. Until now, research on DNNs has mainly focused on achieving better classification or recognition accuracy, whereas very few works in the literature take into account the computational complexity of the model. In this paper, we propose an efficient computational method, inspired by a computational core of fully connected neural networks, to process the convolutional layers of state-of-the-art deep CNNs within strict latency requirements. To this end, we implemented our method customized for VGG and VGG-based networks, which have shown state-of-the-art performance on different classification/recognition data sets. The implementation results in 65-nm CMOS technology show that the proposed accelerator can process the convolutional layers of VGGNet up to 9.5 times faster than state-of-the-art accelerators reported to date while occupying 3.5 mm². |
---|---|
ISSN: | 1549-8328, 1558-0806 |
DOI: | 10.1109/TCSI.2017.2757036 |
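
The abstract describes reusing a fully connected-style computational core to process convolutional layers. As a hedged illustration only, the Python sketch below shows the generic im2col lowering, which expresses a VGG-style 3x3 convolution as a single dense matrix multiplication, i.e. the same multiply-accumulate pattern a fully connected core executes. All function names here are the editor's own; this is not the accelerator architecture proposed in the paper.

```python
# Illustrative sketch only: generic im2col lowering of a convolution,
# not the accelerator architecture described in the paper.
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Lower an input feature map (C, H, W) into a matrix whose columns are
    flattened receptive fields, so the convolution becomes one dense GEMM."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(0, h - kh + 1, stride):
        for j in range(0, w - kw + 1, stride):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols, out_h, out_w

def conv_as_matmul(x, weights, stride=1):
    """weights: (num_filters, C, kh, kw). Returns (num_filters, out_h, out_w)."""
    f, c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    w_mat = weights.reshape(f, c * kh * kw)   # each filter becomes one row
    out = w_mat @ cols                        # one dense matmul replaces the conv
    return out.reshape(f, out_h, out_w)

# Tiny VGG-like 3x3 convolution over a 64-channel feature map.
x = np.random.randn(64, 8, 8).astype(np.float32)
w = np.random.randn(128, 64, 3, 3).astype(np.float32)
y = conv_as_matmul(x, w)
print(y.shape)  # (128, 6, 6)
```

Lowering the convolution this way makes the fully connected and convolutional layers share the same dense multiply-accumulate workload, which is one plausible reading of how a fully connected computational core could be repurposed for convolutions.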