GCN-ABFT: Low-Cost Online Error Checking for Graph Convolutional Networks

Graph convolutional networks (GCNs) are popular for building machine-learning application for graph-structured data. This widespread adoption led to the development of specialized GCN hardware accelerators. In this work, we address a key architectural challenge for GCN accelerators: how to detect er...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on computer-aided design of integrated circuits and systems 2024-12, p.1-1
Hauptverfasser:	Peltekis, Christodoulos, Dimitrakopoulos, Giorgos
Format:	Artikel
Sprache:	eng
Schlagworte:	Accuracy Algorithm-based Fault Tolerance Arithmetic Convolution Design automation Energy Efficiency Fault tolerance Fault tolerant systems Graph Convolution Networks Graph convolutional networks Integrated circuits Machine learning Vectors
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Graph convolutional networks (GCNs) are popular for building machine-learning application for graph-structured data. This widespread adoption led to the development of specialized GCN hardware accelerators. In this work, we address a key architectural challenge for GCN accelerators: how to detect errors in GCN computations arising from random hardware faults with the least computation cost. Each GCN layer performs a graph convolution, mathematically equivalent to multiplying three matrices, computed through two separate matrix multiplications. Existing Algorithm-based Fault Tolerance (ABFT) techniques can check the results of individual matrix multiplications. However, for a GCN layer, this check should be performed twice. To avoid this overhead, this work introduces GCN-ABFT that directly calculates a checksum for the entire three-matrix product within a single GCN layer, providing a cost-effective approach for error detection in GCN accelerators. Experimental results demonstrate that GCN-ABFT reduces the number of operations needed for checksum computation by over 21% on average for representative GCN applications. These savings are achieved without sacrificing fault-detection accuracy, as evidenced by the presented fault-injection analysis.
ISSN:	0278-0070 1937-4151
DOI:	10.1109/TCAD.2024.3523425