Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept
An accelerator is a specialized integrated circuit designed to perform specific computations faster than if those were performed by CPU or GPU. A Field-Programmable DNN learning and inference accelerator (FProg-DNN) using hybrid systolic and non-systolic techniques, distributed information-control a...
Gespeichert in:
1. Verfasser: | |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | An accelerator is a specialized integrated circuit designed to perform
specific computations faster than if those were performed by CPU or GPU. A
Field-Programmable DNN learning and inference accelerator (FProg-DNN) using
hybrid systolic and non-systolic techniques, distributed information-control
and deep pipelined structure is proposed and its microarchitecture and
operation presented here. Reconfigurability attends diverse DNN designs and
allows for different number of workers to be assigned to different layers as a
function of the relative difference in computational load among layers. The
computational delay per layer is made roughly the same along pipelined
accelerator structure. VGG-16 and recently proposed Inception Modules are used
for showing the flexibility of the FProg-DNN reconfigurability. Special
structures were also added for a combination of convolution layer, map
coincidence and feedback for state of the art learning with small set of
examples, which is the focus of a companion paper by the author (Franca-Neto,
2018). The accelerator described is able to reconfigure from (1) allocating all
a DNN computations to a single worker in one extreme of sub-optimal performance
to (2) optimally allocating workers per layer according to computational load
in each DNN layer to be realized. Due the pipelined architecture, more than 50x
speedup is achieved relative to GPUs or TPUs. This speed-up is consequence of
hiding the delay in transporting activation outputs from one layer to the next
in a DNN behind the computations in the receiving layer. This FProg-DNN concept
has been simulated and validated at behavioral-functional level. |
---|---|
DOI: | 10.48550/arxiv.1802.04899 |