Stuck-at Fault Tolerance in RRAM Computing Systems
Published in: | IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2018-03, Vol. 8 (1), pp. 102-115 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Emerging metal-oxide resistive switching random-access memory (RRAM) devices and RRAM crossbars have demonstrated their potential to boost the speed and energy efficiency of analog matrix-vector multiplication. However, because the fabrication technology is still immature, Stuck-At-Faults (SAFs) occur frequently and seriously degrade the computational accuracy of an RRAM-based computing system (RCS). In this paper, we present a fault-tolerant framework for RCS. A mapping algorithm with inner fault tolerance is proposed to convert matrix parameters into RRAM conductances in RCS and to tolerate SAFs by fully exploring the available mapping space. Two baseline redundancy schemes are proposed to keep RCS effective when the percentage of faulty RRAM cells is high. To reduce the number of redundant RRAM cells when the SAFs follow a non-uniform or an unknown distribution, a distribution-aware redundancy scheme and a reconfigurable redundancy scheme are proposed to provide dynamic fault tolerance. Simulation results show that the baseline redundancy schemes can raise the recognition accuracy on the MNIST data set to almost the level of the RRAM-fault-free case, with an energy overhead of approximately 30%. When SAFs follow a non-uniform or an unknown distribution, the distribution-aware and reconfigurable schemes can reduce the number of redundant RRAM cells from more than 200% to less than 40% and 60%, respectively, without reducing the recognition accuracy. |
ISSN: | 2156-3357, 2156-3365 |
DOI: | 10.1109/JETCAS.2017.2776980 |
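
The abstract does not spell out the mapping algorithm, so what follows is only a toy Python sketch of the general idea that freedom in how a weight is mapped to conductances can absorb some stuck cells. It is not the paper's algorithm. Assumptions (all hypothetical): each weight w in [-1, 1] is stored as a differential pair g_pos - g_neg, conductances are normalized to [0, 1], fault locations and stuck values are already known (e.g., from a test pass), and the crossbar computes an ideal matrix-vector product with no wire resistance or ADC effects. The names `naive_pair`, `fault_aware_pair`, `program`, and `crossbar_mvm` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Assumed normalized conductance range of a single RRAM cell (hypothetical units).
G_MIN, G_MAX = 0.0, 1.0

# Hypothetical fault map: (row, col, "pos" | "neg") -> conductance the cell is stuck at.
FaultMap = Dict[Tuple[int, int, str], float]
Matrix = List[List[float]]


def clamp(g: float) -> float:
    """Limit a target conductance to the programmable range."""
    return min(max(g, G_MIN), G_MAX)


def naive_pair(w: float) -> Tuple[float, float]:
    """Fault-unaware differential mapping: w >= 0 goes on g_pos, w < 0 on g_neg."""
    return clamp(w), clamp(-w)


def fault_aware_pair(w: float, sp: Optional[float], sn: Optional[float]) -> Tuple[float, float]:
    """Only g_pos - g_neg matters, so a stuck cell can be offset by its healthy partner."""
    if sp is not None and sn is not None:
        return sp, sn              # both cells stuck: no freedom left
    if sp is not None:
        return sp, clamp(sp - w)   # choose g_neg so that sp - g_neg is as close to w as possible
    if sn is not None:
        return clamp(w + sn), sn   # choose g_pos so that g_pos - sn is as close to w as possible
    return naive_pair(w)


def program(weights: Matrix, faults: FaultMap, aware: bool) -> Tuple[Matrix, Matrix]:
    """Return the conductances actually realized on the positive and negative crossbars."""
    rows, cols = len(weights), len(weights[0])
    gp = [[0.0] * cols for _ in range(rows)]
    gn = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            sp = faults.get((i, j, "pos"))
            sn = faults.get((i, j, "neg"))
            if aware:
                p, n = fault_aware_pair(weights[i][j], sp, sn)
            else:
                p, n = naive_pair(weights[i][j])
                # A stuck cell keeps its stuck value regardless of what was written.
                p = sp if sp is not None else p
                n = sn if sn is not None else n
            gp[i][j], gn[i][j] = p, n
    return gp, gn


def crossbar_mvm(x: List[float], gp: Matrix, gn: Matrix) -> List[float]:
    """Ideal analog MVM: y_j = sum_i x_i * (gp[i][j] - gn[i][j])."""
    return [sum(x[i] * (gp[i][j] - gn[i][j]) for i in range(len(x)))
            for j in range(len(gp[0]))]


if __name__ == "__main__":
    weights = [[0.5, -0.3], [0.2, 0.8]]
    faults = {(0, 1, "pos"): G_MAX, (1, 1, "pos"): G_MAX}  # two positive-side cells stuck high
    x = [1.0, 1.0]
    for label, f, aware in [("ideal", {}, False), ("naive", faults, False),
                            ("fault-aware", faults, True)]:
        y = crossbar_mvm(x, *program(weights, f, aware))
        print(label, [round(v, 3) for v in y])
    # prints:
    # ideal [0.7, 0.5]
    # naive [0.7, 1.7]
    # fault-aware [0.7, 0.8]
```

In this toy case the fault-aware mapping fully recovers the weight 0.8 (its stuck-high positive cell is offset by programming g_neg = 0.2), but it can only bring the weight -0.3 to 0 instead of 0.7, because the required g_neg = 1.3 lies outside the conductance range. Residual errors of this kind are the situation the abstract's redundancy schemes are meant to address; how the paper actually allocates redundant cells is not shown here.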