An Architecture to Accelerate Convolution in Deep Neural Networks

Bibliographic details
Published in:	IEEE Transactions on Circuits and Systems I: Regular Papers, 2018-04, Vol.65 (4), p.1349-1362
Main authors:	Ardakani, Arash; Condo, Carlo; Ahmadi, Mehdi; Gross, Warren J.
Format:	Article
Language:	eng
Subjects:	Accelerators; Accuracy; Artificial neural networks; Classification; CMOS; Complexity theory; Computer architecture; Convolution; Convolutional neural networks; deep neural network; Hardware; hardware implementation; Image classification; machine learning; Microprocessors; Neural networks; Neurons; Object recognition; pattern recognition; Real time; State of the art; Three-dimensional displays; very large scale integration (VLSI); Video data

Description
Abstract:	In the past few years, the demand for real-time hardware implementations of deep neural networks (DNNs), especially convolutional neural networks (CNNs), has dramatically increased, thanks to their excellent performance on a wide range of recognition and classification tasks. When considering real-time action recognition and video/image classification systems, latency is of paramount importance. Therefore, applications strive to maximize the accuracy while keeping the latency under a given application-specific maximum: in most cases, this threshold cannot exceed a few hundred milliseconds. Until now, the research on DNNs has mainly focused on achieving a better classification or recognition accuracy, whereas very few works in the literature take into account the computational complexity of the model. In this paper, we propose an efficient computational method, which is inspired by a computational core of fully connected neural networks, to process convolutional layers of state-of-the-art deep CNNs within strict latency requirements. To this end, we implemented our method customized for VGG and VGG-based networks, which have shown state-of-the-art performance on different classification/recognition data sets. The implementation results in 65-nm CMOS technology show that the proposed accelerator can process convolutional layers of VGGNet up to 9.5 times faster than state-of-the-art accelerators reported to date while occupying 3.5 mm².
ISSN:	1549-8328, 1558-0806
DOI:	10.1109/TCSI.2017.2757036