Area Efficient Compression for Floating-Point Feature Maps in Convolutional Neural Network Accelerators

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023-02, Vol. 70 (2), p. 1-1
Main Authors: Yan, Bai-Kui; Ruan, Shanq-Jang
Format: Article
Language: English
Description
Abstract: Since convolutional neural networks (CNNs) require massive computing resources, many computing architectures have been proposed to improve throughput and energy efficiency. However, these architectures require heavy data movement between the chip and off-chip memory, which causes high energy consumption in the off-chip memory; feature map (fmap) compression has therefore been studied as a way to reduce this data movement, and the design of fmap compression has become one of the main research directions for energy-efficient off-chip memory access in CNN accelerators. In this brief, we propose a floating-point (FP) fmap compression scheme for a hardware accelerator, comprising a hardware design and a compression algorithm. The scheme is compatible with quantization methods such as trained ternary quantization (TTQ), which quantizes only the weights, reducing computation cost with little or no degradation in accuracy. In addition to zero compression, we also compress the nonzero values in the fmap based on the FP format. The compression algorithm achieves low area overhead and a compression ratio similar to the state of the art on the ILSVRC 2012 dataset.
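The abstract names two ideas, zero compression and FP-format-based compression of the nonzero values, without specifying the bitstream layout. The following is a minimal sketch of how such a scheme could work in software, assuming fp16 fmaps, a 1-bit-per-element zero mask, and a sign/exponent/mantissa split of the nonzeros; the function names and the field layout are illustrative assumptions, not the brief's actual hardware design:

import numpy as np

def compress_fmap_fp16(fmap):
    # Flatten and reinterpret the fp16 values as raw 16-bit patterns.
    bits = fmap.astype(np.float16).ravel().view(np.uint16)

    # Zero compression: a 1-bit-per-element mask; only nonzeros are stored.
    mask = bits != 0
    nz = bits[mask]

    # FP-format-aware split of each nonzero into sign / exponent / mantissa
    # streams (an assumed layout, not the brief's actual bitstream).
    sign = (nz >> 15) & np.uint16(0x1)
    exponent = (nz >> 10) & np.uint16(0x1F)
    mantissa = nz & np.uint16(0x3FF)
    return mask, sign, exponent, mantissa

def decompress_fmap_fp16(mask, sign, exponent, mantissa, shape):
    # Rebuild the raw bit patterns, placing nonzeros where the mask is set.
    bits = np.zeros(mask.size, dtype=np.uint16)
    bits[mask] = ((sign.astype(np.uint32) << 15)
                  | (exponent.astype(np.uint32) << 10)
                  | mantissa).astype(np.uint16)
    return bits.view(np.float16).reshape(shape)

# Round trip on a sparse map (post-ReLU activations are mostly zero).
fmap = np.maximum(np.random.randn(8, 8).astype(np.float16), np.float16(0))
mask, s, e, m = compress_fmap_fp16(fmap)
assert np.array_equal(decompress_fmap_fp16(mask, s, e, m, fmap.shape), fmap)

Separating the exponent stream is one plausible way to exploit the FP format: activation magnitudes tend to cluster in a narrow dynamic range, so the exponent stream has low entropy and compresses well with a simple code, while the mask alone already removes the zero-dominated bulk of a post-ReLU fmap.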
ISSN: 1549-7747, 1558-3791
DOI: 10.1109/TCSII.2022.3213847