Stab-FD: A cooperative and adaptive failure detector for wide area networks

Failure detectors (FDs) are a fundamental abstraction that plays a central role in the design of distributed systems. FDs are distributed oracles that provide processes with unreliable information about process failures, often in the form of a list of trusted or suspected process identities. In this...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of parallel and distributed computing 2024-04, Vol.186, p.104803, Article 104803
Hauptverfasser: Sens, Pierre, Arantes, Luciana, Graciela De Moraes Rossetto, Anubis, Marin, Olivier
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Failure detectors (FDs) are a fundamental abstraction that plays a central role in the design of distributed systems. FDs are distributed oracles that provide processes with unreliable information about process failures, often in the form of a list of trusted or suspected process identities. In this article, we propose a timer-based FD which assesses the quality of its input links, and exchanges its local estimations with other nodes. Nodes use this information to adjust their timers dynamically. Capturing the variations in the quality of each link reduces the number of false suspicions without degrading failure detection time. We present experiments on a dataset of real traces collected on PlanetLab, and compare our approach to well-known state-of-the-art algorithms. Our results show that our new algorithms yield a good trade-off in terms of failure detection speed and accuracy in real scenarios.
ISSN:0743-7315
1096-0848
DOI:10.1016/j.jpdc.2023.104803