Fast computation of deep neural network and its real‐time implementation for image recognition

Bibliographic details
Published in:	Computational intelligence 2022-04, Vol.38 (2), p.560-585
Authors:	Hsia, Shih‐Chang, Wang, Szu‐Hong, Kuo, Feng‐Yang
Format:	Article
Language:	English
Keywords:	Accuracy; Algorithms; Artificial neural networks; Convolution; Datasets; deep computation; fast algorithm; Feature extraction; Frames per second; neural network; Neural networks; Object recognition; Vgg‐Net

Description
Abstract:	Convolution is widely used in deep neural networks to extract key features, but it requires many additions and multiplications. In this study, a fast computational algorithm is presented that reduces the number of arithmetic operations while maintaining accuracy. The order of the deep convolution is altered to save computational operators. To verify its performance, the proposed algorithm is embedded in the typical deep neural network VggNet. The structure of VggNet is further modified using the proposed summation and concatenation techniques to improve computational accuracy and reduce processing time. Compared with the original VggNet, simulations on various datasets show that the number of FLOPs is reduced by at least 50%. In addition, the training time (epoch per batch) is reduced by about 10%–20%. The proposed fast algorithm reduces the number of parameters and the model size by over 90%, and the recognition accuracy improves by 1%–4% across the tested datasets. Based on the fast network, a real‐time FPGA implementation was realized whose hardware performance reaches 371 GOPs with 642 DSP cores. The processing speed approaches 1k frames per second, and the real‐time recognition rate exceeds 90%.
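The abstract reports FLOP savings and credits part of the restructuring to summation and concatenation techniques, without giving the details here. The following is a minimal sketch, not the authors' algorithm. It only shows how such operation counts are commonly made, under the usual assumption that a dense k x k convolution costs H_out * W_out * C_in * C_out * k^2 multiply-accumulates (MACs), with FLOPs often taken as 2 x MACs (conventions vary). It also shows how merging two feature-map branches by element-wise summation versus channel-wise concatenation changes the input channel count, and therefore the cost, of the next layer. The helper names (`conv2d_macs`, `conv2d_flops`, `merged_channels`) and the two 64-channel branches at 112x112 are hypothetical choices for illustration.

```python
from typing import Sequence

# Hypothetical helpers for illustration only; not taken from the paper.


def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulate count of a dense k x k convolution (bias ignored)."""
    return h_out * w_out * c_in * c_out * k * k


def conv2d_flops(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """FLOPs under the common convention 1 MAC = 1 multiply + 1 add = 2 FLOPs."""
    return 2 * conv2d_macs(h_out, w_out, c_in, c_out, k)


def merged_channels(branch_channels: Sequence[int], mode: str) -> int:
    """Channel count after merging parallel feature maps of equal spatial size."""
    if mode == "sum":
        # Element-wise summation needs equal channel counts and keeps that count.
        if len(set(branch_channels)) != 1:
            raise ValueError("summation needs branches with equal channel counts")
        return branch_channels[0]
    if mode == "concat":
        # Channel-wise concatenation stacks the branches along the channel axis.
        return sum(branch_channels)
    raise ValueError(f"unknown merge mode: {mode}")


if __name__ == "__main__":
    # First 3x3 convolution of VGG-16 on a 224x224 RGB input (stride 1, padding 1).
    first = conv2d_macs(224, 224, 3, 64, 3)
    print(f"VGG-16 conv1_1 MACs: {first:,}")

    # A following 3x3 conv producing 64 channels on 112x112 maps, fed by two
    # hypothetical 64-channel branches merged in each mode.
    for mode in ("sum", "concat"):
        c_in = merged_channels([64, 64], mode)
        macs = conv2d_macs(112, 112, c_in, 64, 3)
        print(f"{mode:>6}: C_in={c_in:3d}, next-layer MACs={macs:,}")
```

Running it reports 86,704,128 MACs for conv1_1 and 462,422,016 versus 924,844,032 MACs for the follow-up layer, so in this toy setting concatenation doubles the next layer's arithmetic while summation keeps it unchanged. How the paper actually combines the two techniques, and how it reorders the convolution, is not described in this record.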
ISSN:	0824-7935; 1467-8640
DOI:	10.1111/coin.12481