FPGA Implementation of a Deep Learning Acceleration Core Architecture for Image Target Detection

Bibliographic details
Published in:	Applied sciences 2023-04, Vol.13 (7), p.4144
Main authors:	Yang, Xu; Zhuang, Chen; Feng, Wenquan; Yang, Zhe; Wang, Qiang
Format:	Article
Language:	English
Subjects:	Acceleration; acceleration core; Accuracy; Algorithms; Analysis; Arithmetic; Computation; Computer architecture; Deep learning; Design; Design optimization; Digital integrated circuits; Digital signal processors; Field programmable gate arrays; FPGA; Frames per second; Mathematical optimization; Neural networks; Optimization; parallel acceleration; pipeline; Resource utilization; Semiconductor industry; Signal processing; target detection; TinyYolo; Unmanned aerial vehicles
Online access:	Full text

Description
Summary:	Due to the flexibility and ease of deployment of Field Programmable Gate Arrays (FPGAs), a growing number of studies have developed and optimized target detection algorithms based on Convolutional Neural Network (CNN) models on FPGAs. However, these studies focus on improving the performance of the core algorithm and optimizing the hardware structure; few address a unified architecture design and corresponding optimization techniques for the algorithm model, which results in inefficient overall model performance. The essential reason is that these studies do not address arithmetic power, speed, and resource consistency. To solve this problem, we propose a deep learning acceleration core architecture based on FPGAs, designed for target detection algorithms with CNN models. It uses multi-channel parallelization of CNN models to improve arithmetic power, uses task scheduling and pipelining of intensive computation to meet the algorithm’s data bandwidth requirements, and unifies the speed and area of the orchestrated computation matrix to save hardware resources. The proposed framework achieves 14 Frames Per Second (FPS) inference performance on the TinyYolo model at 5 Giga Operations Per Second (GOPS), with a 30% higher running clock frequency, 2–4 times higher arithmetic power, and 28% higher Digital Signal Processing (DSP) resource utilization efficiency, while using less than 25% of FPGA resources.
ISSN:	2076-3417
DOI:	10.3390/app13074144