An Approximate Fault-Tolerance Design for a Convolutional Neural Network Accelerator

Bibliographic Details
Published in:	IT Professional, 2023-07, Vol. 25 (4), pp. 85-90
Main authors:	Wei, Wenda; Wang, Chenyang; Zheng, Xinyang; Yue, Hengshan
Format:	Article
Language:	English
Subjects:	Accelerators; Aging; Artificial neural networks; Convolutional neural networks; Error detection; Fault tolerance; Hardware acceleration; Neural networks; Philosophical considerations; Redundancy; Reliability engineering; Systolic arrays; Termination of employment; Terminations

Description
Abstract:	Today, various domain-specific convolutional neural network (CNN) accelerators are deployed in large-scale systems to satisfy the massive computational demands of current deep CNNs. Although they bring significant performance improvements, these highly integrated CNN accelerators are more susceptible to faults caused by radiation, aging, and process variation. CNNs are increasingly deployed in security-critical areas, which demands more attention to reliable execution. Although classical fault-tolerant approaches are effective against errors, the performance and energy overheads they introduce are nonnegligible, which runs counter to the CNN accelerator design philosophy. In this article, we leverage CNNs' intrinsic tolerance for minor errors to explore approximate fault-tolerance (ApFT) opportunities for reducing the fault-tolerance overhead of CNN accelerators. Specifically, we discuss two branches of ApFT designs: selective duplicating-based approximate fault tolerance (S-ApFT) and imprecise checking-based approximate fault tolerance (I-ApFT). The results show that S-ApFT and I-ApFT can achieve error-detection ability comparable to dual-modular redundancy while achieving significant performance improvements.
ISSN:	1520-9202 1941-045X
DOI:	10.1109/MITP.2023.3264849