Reconfigurable Network-on-Chip based Convolutional Neural Network Accelerator

Convolutional Neural Networks (CNNs) have a wide range of applications due to their superior performance in image and pattern classification. However, the performance of CNNs comes at the price of high computational load and memory bandwidth usage. Hardware acceleration has become the primary way to...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of systems architecture 2022-08, Vol.129, p.102567, Article 102567
Hauptverfasser: Firuzan, Arash, Modarressi, Mehdi, Reshadi, Midia
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Convolutional Neural Networks (CNNs) have a wide range of applications due to their superior performance in image and pattern classification. However, the performance of CNNs comes at the price of high computational load and memory bandwidth usage. Hardware acceleration has become the primary way to tackle this ever-increasing complexity of CNNs. Most of the recent accelerators arrange processing units (PEs) as a many-core accelerator architecture, with the inter-PE connections tailored to the specific dataflow of the CNN layers. The performance of such accelerators is maximized if the input feature map and filter size/dimension matches that of the underlying accelerator. However, current fixed-size accelerator structures lead to sever resource underutilization because the same structure is used to compute CNN layers of varying dimensions. In this paper, we tackle this problem by presenting RC-CNN, a reconfigurable accelerator architecture for CNNs that can adapt the structure of accelerator to the size and dataflow pattern of the running CNN layer. RC-CNN relies on a reconfigurable on-chip interconnection fabric that can organize a sub-set of accelerator’s PEs as a PE set with the same size/dimension of the target CNN layer and customize the inter-PE connections for the layer’s dataflow pattern. Since the area/energy overhead does not justify using a full-fledged packet-switched network in accelerators with fine-grained PEs, we use a reconfigurable network with very simple switches in order to efficiently implement the dynamic reconfiguration capability for many-core fine-grained CNN accelerators. Experimental results show that, based on the CNN size and accelerator structure, RC-CNN yields 37% higher PE utilization over a baseline design, on average. It also improves the PE utilization of the state-of-the-art CNN accelerators we selected for comparison purpose by 18%, on average. The results show that these improvements translate to 9%–41% increase in the accelerator’s throughput. Further, RC-CNN reduces the network latency and energy consumption by 28% and 22%, respectively, compared to the state-of-the-art utilization-aware methods that employ packet-switched networks-on-chip.
ISSN:1383-7621
1873-6165
DOI:10.1016/j.sysarc.2022.102567