DCP-CNN: Efficient Acceleration of CNNs With Dynamic Computing Parallelism on FPGA
Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024-07, p. 1-1 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Convolutional Neural Networks (CNNs) have demonstrated outstanding accuracy across a range of machine learning tasks. However, their huge computational overhead limits their deployability in real-time applications. For this reason, parallel computing has been extensively employed to accelerate CNNs on parallel computing devices such as GPUs and FPGAs by unrolling multiple loop operations of convolutional layers. Nevertheless, existing CNN accelerators can hardly exploit the different parallelisms offered by CNN algorithms efficiently, since their degrees of parallelism are fixed across dimensions and layers. In this paper, we propose DCP-CNN, an FPGA-based CNN accelerator that implements the CNN with Dynamic Computing Parallelism degrees. DCP-CNN employs a parallel computing architecture that dynamically allocates the computing resources among the data dimensions of each layer according to layer size, so that all computing units work at full capacity and optimal compute efficiency is achieved. Furthermore, to boost throughput, we propose a design space exploration (DSE) framework based on simulated annealing, which automatically generates the parallelism degrees for the different dimensions of each network layer according to the resource constraints and the CNN structure. On an Intel Stratix 10 GX650 FPGA, the proposed DCP-CNN achieves a throughput of more than 800 GOP/s and a compute efficiency of 72% to 98%, outperforming existing state-of-the-art FPGA-based CNN accelerators. |
---|---|
ISSN: | 0278-0070, 1937-4151 |
DOI: | 10.1109/TCAD.2024.3435996 |
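
The abstract's two core ideas are (1) re-splitting a fixed pool of compute units between data dimensions per layer, and (2) searching those splits with simulated annealing. Both can be illustrated with a small Python sketch. This is not the DCP-CNN design. The cost model (a Pm x Pn MAC array that unrolls output and input channels), the definition of compute efficiency as useful MACs / (cycles x PEs), the 256-PE budget, the layer shapes, and every name (`ConvLayer`, `layer_cycles`, `efficiency`, `anneal`, ...) are hypothetical assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical conv layer: M output channels, N input channels,
    an R x C output feature map and a K x K kernel."""
    M: int
    N: int
    R: int
    C: int
    K: int

    def macs(self) -> int:
        return self.M * self.N * self.R * self.C * self.K * self.K


def layer_cycles(layer, pm, pn):
    """Assumed model: a pm x pn MAC array unrolls output x input channels.
    A partially filled tile still costs a full cycle per pixel and kernel tap."""
    tiles = math.ceil(layer.M / pm) * math.ceil(layer.N / pn)
    return tiles * layer.R * layer.C * layer.K * layer.K


def total_cycles(layers, config):
    return sum(layer_cycles(l, pm, pn) for l, (pm, pn) in zip(layers, config))


def efficiency(layers, config, num_pes):
    """Assumed definition: useful MACs / (total cycles * PEs in the array)."""
    useful = sum(l.macs() for l in layers)
    return useful / (total_cycles(layers, config) * num_pes)


def candidate_splits(num_pes):
    """Every (pm, pn) split that keeps all num_pes PEs assigned."""
    return [(d, num_pes // d) for d in range(1, num_pes + 1) if num_pes % d == 0]


def anneal(layers, num_pes, start, steps=2000, t0=1.0, alpha=0.995, seed=0):
    """Simulated annealing over per-layer splits, minimizing total cycles.
    The best configuration seen is returned, so it is never worse than `start`."""
    rng = random.Random(seed)
    splits = candidate_splits(num_pes)
    config = list(start)
    current = total_cycles(layers, config)
    best, best_cost = list(config), current
    temp = t0
    for _ in range(steps):
        cand = list(config)
        cand[rng.randrange(len(layers))] = rng.choice(splits)  # move: re-split one layer
        cost = total_cycles(layers, cand)
        delta = (cost - current) / current  # relative change in cycles
        # Always accept improvements; accept worse moves with prob. exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            config, current = cand, cost
            if cost < best_cost:
                best, best_cost = list(cand), cost
        temp *= alpha  # geometric cooling schedule
    return best


# Example with made-up, VGG-like layer shapes.
NUM_PES = 256
layers = [ConvLayer(64, 3, 112, 112, 3),
          ConvLayer(128, 64, 56, 56, 3),
          ConvLayer(512, 512, 7, 7, 3)]
fixed_cfg = [(16, 16)] * len(layers)  # one split shared by every layer
dyn_cfg = anneal(layers, NUM_PES, fixed_cfg)
print("fixed  :", fixed_cfg, round(efficiency(layers, fixed_cfg, NUM_PES), 3))
print("dynamic:", dyn_cfg, round(efficiency(layers, dyn_cfg, NUM_PES), 3))
```

In this toy model, the fixed 16 x 16 split lets the three-channel first layer fill only 3 of 16 input lanes, which pulls overall efficiency down to about 0.80. Letting each layer choose its own split removes most of that waste. Because this cost is separable by layer, an exhaustive per-layer search would also suffice here. Annealing is used only because it is the search method the abstract names. A real DSE would also have to respect memory and bandwidth constraints, which this sketch omits.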