Improving the Robustness of Distributed Failure Detectors in Adverse Conditions

Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new algorithms for failure detectors which are appropriate as components of a fault tolerance system that can be deployed in situations of adverse network conditions (such...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Revista IEEE América Latina 2012-01, Vol.10 (1), p.1364-1369
Hauptverfasser: Lemos, F. T. C., Sato, L. M.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new algorithms for failure detectors which are appropriate as components of a fault tolerance system that can be deployed in situations of adverse network conditions (such as loosely connected and administered computing grids). It packs redundancy into heartbeat messages, thereby improving on the robustness of the traditional protocols. Results from experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
ISSN:1548-0992
1548-0992
DOI:10.1109/TLA.2012.6142485