BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
Saved in:
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Abstract: | Mixed-precision quantized Neural Networks (NNs) are gaining traction
for their efficient realization on hardware, leading to higher throughput
and lower energy. In-Memory Computing (IMC) accelerator architectures are
offered as alternatives to traditional architectures; by relying on a
data-centric computational paradigm, they diminish the memory wall problem
and achieve high throughput and energy efficiency. These accelerators can
support static fixed-precision NNs but lack the flexibility to support
mixed-precision NNs. In this paper, we present BF-IMNA, a bit fluid IMC
accelerator for end-to-end Convolutional NN (CNN) inference that is capable
of static and dynamic mixed-precision without any hardware reconfiguration
overhead at run-time. At the heart of BF-IMNA are Associative Processors
(APs), which are bit-serial, word-parallel Single Instruction, Multiple Data
(SIMD)-like engines. We report the performance of end-to-end ImageNet
inference with AlexNet, VGG16, and ResNet50 on BF-IMNA for different
technologies (eNVM and NVM), mixed-precision configurations, and supply
voltages. To demonstrate bit fluidity, we implement HAWQ-V3's per-layer
mixed-precision configurations for ResNet18 on BF-IMNA under different
latency budgets, and the results reveal a trade-off between accuracy and
Energy-Delay Product (EDP): with a high latency constraint, mixed-precision
achieves the accuracy closest to fixed-precision INT8 but reports a higher
(worse) EDP than fixed-precision INT4; with a low latency constraint,
BF-IMNA reports the EDP closest to fixed-precision INT4, at the cost of a
larger accuracy degradation relative to fixed-precision INT8. We also show
that BF-IMNA with a fixed-precision configuration still delivers performance
comparable to current state-of-the-art accelerators, achieving 20% higher
energy efficiency and 2% higher throughput. |
DOI: | 10.48550/arxiv.2411.01417 |
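
The abstract's claim that bit-serial, word-parallel engines support mixed precision with no hardware reconfiguration can be illustrated with a small software model: if each extra bit of precision costs roughly one more cycle while all words are processed in parallel, then changing a layer's bit-width changes only the cycle count (and hence energy, delay, and EDP), not the datapath. The sketch below is an illustrative model under those assumptions, not the paper's AP design; the function names, the one-cycle-per-bit cost, and the per-cycle energy/time constants are made up for the example.

```python
# Minimal sketch (assumed model, not BF-IMNA's implementation) of a
# bit-serial, word-parallel addition and its EDP scaling with bit-width.
import numpy as np

def bit_serial_add(a, b, bits):
    """Add two vectors of unsigned integers, one bit position per cycle.

    All words are updated in parallel each cycle (word-parallel), while bit
    positions are visited serially, so latency scales with `bits`, not with
    the number of words.
    """
    a = np.asarray(a, dtype=np.uint32)
    b = np.asarray(b, dtype=np.uint32)
    result = np.zeros_like(a)
    carry = np.zeros_like(a)
    cycles = 0
    for k in range(bits):                        # one cycle per bit position (assumed cost)
        abit = (a >> k) & 1
        bbit = (b >> k) & 1
        s = abit ^ bbit ^ carry                  # sum bit for every word at once
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result |= s << k
        cycles += 1
    return result, cycles

def edp(cycles, energy_per_cycle=1.0, cycle_time=1.0):
    """Energy-Delay Product with illustrative per-cycle costs (assumed)."""
    return (cycles * energy_per_cycle) * (cycles * cycle_time)

if __name__ == "__main__":
    x = np.array([3, 7, 12, 200])
    y = np.array([5, 9, 30, 55])
    for bits in (4, 8):                          # e.g. an INT4 layer vs. an INT8 layer
        out, cycles = bit_serial_add(x, y, bits=bits)
        print(f"{bits}-bit: sums={out}, cycles={cycles}, relative EDP={edp(cycles):.0f}")
```

Running the sketch shows the 8-bit pass taking twice the cycles of the 4-bit pass and four times its (relative) EDP, which mirrors the accuracy-versus-EDP trade-off between fixed-precision INT8 and INT4 configurations described in the abstract.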