RASHT: A Partially Reconfigurable Architecture for Efficient Implementation of CNNs

Bibliographic Details
Published in:	IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022-07, Vol. 30 (7), pp. 860-868
Main authors:	Darbani, Paria; Rohbani, Nezam; Beitollahi, Hakem; Lotfi-Kamran, Pejman
Format:	Article
Language:	English
Subjects:	Accelerators; Accuracy; Array accelerator; Arrays; Artificial neural networks; Chips (memory devices); Computational modeling; Computer architecture; convolutional neural network (CNN); Convolutional neural networks; Image processing; image processing and computer vision; Machine learning; machine learning (ML); Performance enhancement; reconfigurable hardware; Reconfiguration; Resource management; Resource utilization; System-on-chip; Very large scale integration

Description
Abstract:	Convolutional neural networks (CNNs) are widely used in machine learning (ML) applications such as image processing. CNNs require heavy computation to achieve high accuracy on many ML tasks. Implementing CNNs efficiently, so that performance improves under limited resources without reducing accuracy, is therefore a challenge for ML systems. One architecture for efficient CNN execution is the array-based accelerator, which consists of an array of similar processing elements (PEs). Array accelerators are popular high-performance architectures because they exploit parallel computing and data reuse. These accelerators are optimized for a set of CNN layers rather than for individual layers. Using the same accelerator dimensions to compute all CNN layers, whose shapes and sizes vary, leads to resource underutilization. We propose a flexible and scalable architecture for array-based accelerators that increases resource utilization by resizing PEs to better match the different shapes of CNN layers. The low-cost partial reconfiguration improves resource utilization and performance, reducing the computation time of GoogLeNet by 23.2% compared with state-of-the-art accelerators. The proposed architecture also decreases the on-chip memory access rate by 26.5% with no accuracy loss.
ISSN:	1063-8210, 1557-9999
DOI:	10.1109/TVLSI.2022.3167449
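
Illustration: The underutilization problem named in the abstract can be made concrete with a toy calculation. The sketch below is not the mapping scheme of the paper. It assumes a simplified model in which a layer's workload is a 2-D grid tiled onto a rectangular PE array, and it compares a fixed 16 x 16 array with a choice among several array shapes that all use the same 256 PEs. The function names, layer names, layer sizes, and candidate shapes are hypothetical and chosen only for illustration.

```python
from math import ceil


def utilization(work_rows, work_cols, pe_rows, pe_cols):
    """Fraction of PE slots doing useful work (toy model).

    The workload is tiled onto the array; a partially filled tile still
    occupies the whole array for one pass, which wastes PE slots.
    """
    passes = ceil(work_rows / pe_rows) * ceil(work_cols / pe_cols)
    return (work_rows * work_cols) / (passes * pe_rows * pe_cols)


def best_shape(work_rows, work_cols, shapes):
    """Pick the candidate array shape with the highest utilization."""
    return max(shapes, key=lambda s: utilization(work_rows, work_cols, *s))


# Hypothetical per-layer workload shapes (rows, cols); not taken from the paper.
LAYERS = {"layer_a": (3, 224), "layer_b": (7, 56), "layer_c": (16, 16)}

FIXED = (16, 16)                              # one shape for every layer
SHAPES = [(16, 16), (8, 32), (4, 64), (32, 8)]  # same 256 PEs, reshaped

for name, (rows, cols) in LAYERS.items():
    fixed_util = utilization(rows, cols, *FIXED)
    shape = best_shape(rows, cols, SHAPES)
    flex_util = utilization(rows, cols, *shape)
    print(f"{name}: fixed {FIXED} -> {fixed_util:.1%}, "
          f"best {shape} -> {flex_util:.1%}")
```

Under these assumptions, a wide and shallow workload leaves most of a square array idle, while a shape closer to the workload's aspect ratio recovers much of the lost utilization. A real accelerator would also have to account for reconfiguration cost and dataflow, which this sketch ignores.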