All-rounder: A flexible DNN accelerator with diverse data format support
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Summary: | Recognizing the explosive increase in the use of DNN-based applications,
several companies have developed custom ASICs (e.g., Google TPU, IBM
RaPiD, Intel NNP-I/NNP-T) and built hyperscale cloud infrastructure
around them. These ASICs run the inference or training operations of the
DNN models that users request. Because DNN models differ in their
data formats and types of operations, an ASIC must support diverse data
formats and remain general across operations. Conventional ASICs, however, do
not fulfill these requirements. To overcome these limitations, we propose a
flexible DNN accelerator called All-rounder. The accelerator is designed with
an area-efficient multiplier that supports multiple precisions of integer and
floating-point datatypes. In addition, it incorporates a flexibly fusible and
fissionable MAC array to support various types of DNN operations efficiently.
We implemented the register transfer level (RTL) design in Verilog and
synthesized it in 28nm CMOS technology. To examine the practical effectiveness of
our proposed designs, we designed two multiply units and three state-of-the-art
DNN accelerators as baselines. We compare our multiplier with the multiply units and
perform an architectural evaluation of performance and energy efficiency with eight
real-world DNN models. Furthermore, we compare the benefits of the All-rounder
accelerator with those of a high-end GPU card, the NVIDIA GeForce RTX 3090. The
proposed All-rounder accelerator consistently achieves higher speedup and energy
efficiency than the baselines across various DNN benchmarks. |
---|---|
DOI: | 10.48550/arxiv.2310.16757 |
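
The summary mentions a multi-precision multiplier and a MAC array that can be fused or fissioned, but it does not describe how either is built. As a rough illustration of the general idea only, the Python sketch below models one common way to get this behavior: four narrow multipliers either work as independent low-precision lanes ("fission") or have their partial products shifted and added to form one wider product ("fusion"). The function names, the 4-bit and 8-bit widths, and the unsigned-integer restriction are all assumptions made for this example and are not taken from the All-rounder design.

```python
from typing import List, Sequence


def mul4(a: int, b: int) -> int:
    """Model of a 4-bit x 4-bit unsigned multiplier (hypothetical building block)."""
    assert 0 <= a < 16 and 0 <= b < 16, "operands must fit in 4 bits"
    return a * b


def fused_mul8(a: int, b: int) -> int:
    """Fusion: one 8-bit x 8-bit unsigned product from four 4-bit partial products."""
    assert 0 <= a < 256 and 0 <= b < 256, "operands must fit in 8 bits"
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    # (16*a_hi + a_lo) * (16*b_hi + b_lo), expanded into shifted partial products.
    return (
        (mul4(a_hi, b_hi) << 8)
        + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
        + mul4(a_lo, b_lo)
    )


def fissioned_mul4(a_lanes: Sequence[int], b_lanes: Sequence[int]) -> List[int]:
    """Fission: the same four multipliers used as independent 4-bit lanes."""
    assert len(a_lanes) == len(b_lanes) == 4, "this sketch assumes exactly four lanes"
    return [mul4(a, b) for a, b in zip(a_lanes, b_lanes)]


def mac(acc: int, products: Sequence[int]) -> int:
    """Accumulate a group of products into a running sum (the 'AC' in MAC)."""
    return acc + sum(products)
```

In this model, `fused_mul8(200, 123)` uses all four narrow multipliers for one 8-bit product, while `fissioned_mul4([1, 2, 3, 4], [5, 6, 7, 8])` uses them for four separate 4-bit products that `mac` can then accumulate. A real design would also have to handle signed and floating-point formats, which this sketch leaves out.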