RedThreads: An Interface for Application-Level Fault Detection/Correction Through Adaptive Redundant Multithreading

In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-oblivious nature of current HPC programming paradigms...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International journal of parallel programming 2017-02, Vol.46 (2)
Hauptverfasser: Hukerikar, Saurabh, Teranishi, Keita, Diniz, Pedro C., Lucas, Robert F.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-oblivious nature of current HPC programming paradigms and execution environments, HPC applications are insufficiently equipped to deal with errors. We believe that HPC applications should be enabled with capabilities to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. Furthermore, the use of complete redundancy incurs significant overhead to the application performance.
ISSN:0885-7458
1573-7640