A hardware-efficient computing engine for FPGA-based deep convolutional neural network accelerator
Published in: Microelectronics 2022-10, Vol. 128, p. 105547, Article 105547
Format: Article
Language: English
Online access: Full text
Abstract: Deep convolutional neural networks (DCNNs) have recently emerged as a promising approach for computer vision tasks, with many new DCNN architectures proposed to further improve their performance. However, their significant computation workload limits the deployment of such networks on embedded devices. Research on accelerating DCNN inference usually targets field-programmable gate arrays (FPGAs) because of their programmability; however, hardware efficiency and reconfigurability often do not receive sufficient attention. This paper proposes an efficient accelerator that supports multiple DCNNs and improves hardware utilization from three perspectives. First, a bandwidth-based tiling algorithm improves the data transfer efficiency of direct memory access (DMA). Second, three parallel strategies improve the utilization of the computing units (CUs). Third, a configurable CU is designed to improve digital signal processor (DSP) utilization. The proposed accelerator is implemented on the Xilinx ZYNQ-7 ZC706 Evaluation Board at 200 MHz. It reaches 163 Giga Operations Per Second (GOPS) and 0.36 GOPS/DSP on VGG-16 while consuming only 448 DSPs; it achieves 0.24 GOPS/DSP on ResNet50 and 0.27 GOPS/DSP on YOLOv2-tiny. The experimental results demonstrate that this design achieves a better trade-off between hardware resource consumption, performance, and reconfigurability than previous works.
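The abstract's headline efficiency number can be checked directly from the figures it reports: GOPS/DSP is simply the measured throughput divided by the number of DSP slices consumed. A minimal sketch of that arithmetic, using only the values given in the abstract:

```python
# Sanity-check the reported efficiency figure for VGG-16:
# efficiency (GOPS/DSP) = throughput (GOPS) / DSP slices used.
throughput_gops = 163   # reported VGG-16 throughput
dsp_slices = 448        # reported DSP consumption

efficiency = throughput_gops / dsp_slices
print(f"{efficiency:.2f} GOPS/DSP")  # prints "0.36 GOPS/DSP", matching the abstract
```

By the same relation, the ResNet50 and YOLOv2-tiny figures (0.24 and 0.27 GOPS/DSP) imply lower effective throughput on those networks at the same DSP budget, which is the usual cost of reconfigurability across differing layer shapes.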
ISSN: 1879-2391
DOI: | 10.1016/j.mejo.2022.105547 |