Stab-FD: A cooperative and adaptive failure detector for wide area networks
Published in: | Journal of Parallel and Distributed Computing, 2024-04, Vol. 186, p. 104803, Article 104803 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Summary: | Failure detectors (FDs) are a fundamental abstraction that plays a central role in the design of distributed systems. FDs are distributed oracles that provide processes with unreliable information about process failures, often in the form of a list of trusted or suspected process identities. In this article, we propose a timer-based FD which assesses the quality of its input links, and exchanges its local estimations with other nodes. Nodes use this information to adjust their timers dynamically. Capturing the variations in the quality of each link reduces the number of false suspicions without degrading failure detection time. We present experiments on a dataset of real traces collected on PlanetLab, and compare our approach to well-known state-of-the-art algorithms. Our results show that our new algorithms yield a good trade-off in terms of failure detection speed and accuracy in real scenarios. |
ISSN: | 0743-7315 1096-0848 |
DOI: | 10.1016/j.jpdc.2023.104803 |