Fault-Tolerant Training Enabled by On-Line Fault Detection for RRAM-Based Neural Computing Systems

Bibliographic details
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019-09, Vol. 38 (9), p. 1611-1624
Main authors:	Xia, Lixue; Liu, Mengyun; Ning, Xuefei; Chakrabarty, Krishnendu; Wang, Yu
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial neural networks; Biological neural networks; Circuit faults; Computation; Computer architecture; Fatigue limit; Fault detection; Fault tolerance; Fault tolerant systems; Hardware; Machine learning; Neural network hardware; Nonvolatile memory; Random access memory; Resistance; Resistive random-access memory (RRAM); Training

Description
Abstract:	A resistive random-access memory (RRAM)-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, a high occurrence rate of hard faults, due to immature fabrication processes and limited write endurance, restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between a fault-detection phase and a fault-tolerant training phase. In the fault-detection phase, a quiescent-voltage comparison method is used. In the training phase, a threshold-training method and a remapping scheme are proposed. Our results show that, compared to neural computing without fault tolerance, the recognition accuracy for the CIFAR-10 dataset improves from 37% to 83% when using low-endurance RRAM cells, and from 63% to 76% when using RRAM cells with high endurance but a high percentage of initial faults.
ISSN:	0278-0070 1937-4151
DOI:	10.1109/TCAD.2018.2855145
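
Illustrative sketch (not part of the catalog record or the paper): the abstract describes training that alternates between a fault-detection phase and a fault-tolerant training phase. The toy Python below shows only that alternating structure. Every name in it (`threshold_update`, `train`, `detect_fn`, `grad_fn`, `detect_every`) is hypothetical, and it assumes that "threshold training" means skipping writes whose update magnitude falls below a threshold, which the abstract does not confirm. The quiescent-voltage comparison is replaced by a caller-supplied stand-in, and the remapping scheme is omitted.

```python
def threshold_update(weights, grads, lr, threshold, stuck):
    """Write only updates that are large enough and target a healthy cell.

    Assumption: small updates are skipped to save write endurance, and
    cells listed in `stuck` (hard faults) are never written.
    Returns the number of cell writes performed.
    """
    writes = 0
    for i, g in enumerate(grads):
        delta = -lr * g
        if i in stuck or abs(delta) < threshold:
            continue  # skip faulty cells and negligible updates
        weights[i] += delta
        writes += 1
    return writes


def train(weights, grad_fn, detect_fn, steps, detect_every, lr, threshold):
    """Alternate a detection phase with threshold-gated training steps."""
    stuck = set()
    total_writes = 0
    for step in range(steps):
        if step % detect_every == 0:
            # Detection phase: a stand-in for the paper's on-line detector.
            stuck = set(detect_fn(weights))
        grads = grad_fn(weights)
        total_writes += threshold_update(weights, grads, lr, threshold, stuck)
    return stuck, total_writes
```

For example, `train([0.0, 0.0], lambda w: [x - 1.0 for x in w], lambda w: {1}, 5, 2, 0.1, 0.01)` updates only cell 0, because the stand-in detector reports cell 1 as faulty on every detection pass.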