FAIL-MPI: How Fault-Tolerant Is Fault-Tolerant MPI?

One of the topics of paramount importance in the development of cluster and grid middleware is the impact of faults since their occurrence in grid infrastructures and in large-scale distributed systems is common. MPI (message passing interface) is a popular abstraction for programming distributed an...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Hoarau, W., Lemarinier, P., Herault, T., Rodriguez, E., Tixeuil, S., Cappello, F.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:One of the topics of paramount importance in the development of cluster and grid middleware is the impact of faults since their occurrence in grid infrastructures and in large-scale distributed systems is common. MPI (message passing interface) is a popular abstraction for programming distributed and parallel applications. FAIL (FAult Injection Language) is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults in a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry quantitative and qualitative faults and stress testing
ISSN:1552-5244
2168-9253
DOI:10.1109/CLUSTR.2006.311851