A Scalable Communication-Induced Checkpointing Algorithm for Distributed Systems

Bibliographic Details
Published in:	IEICE Transactions on Information and Systems 2013/04/01, Vol.E96.D(4), pp.886-896
Main authors:	SIMON, Alberto CALIXTO; HERNANDEZ, Saul E. POMARES; CRUZ, Jose Roberto PEREZ; GOMEZ-GIL, Pilar; DRIRA, Khalil
Format:	Article
Language:	English (eng)
Subjects:	Applied sciences; communication-induced checkpointing; Computer science, control theory, systems; Computer systems and distributed systems. User interface; Distributed computer systems; distributed systems; Electronics; Exact sciences and technology; Hardware; immediate dependency relation; Software
Online access:	Full text

Description
Abstract:	Communication-induced checkpointing (CIC) has two main advantages: first, it allows the processes in a distributed computation to take checkpoints asynchronously, and second, it avoids the domino effect. To achieve both, CIC algorithms piggyback information on application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the per-message overhead and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that addresses the problem of message overhead. To do so, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated, and the results show that it is scalable: the message overhead grows sublinearly as the number of processes and/or the message density increases.
ISSN:	0916-8532, 1745-1361
DOI:	10.1587/transinf.E96.D.886
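
The abstract's general mechanism (piggybacking information on application messages and taking forced checkpoints on receipt) can be illustrated with a minimal sketch. This is not the S-FI or Fully-Informed algorithm, neither of which is specified in this record. It assumes a much simpler index-based scheme in which each message carries a single checkpoint index and a receiver that is behind takes a forced checkpoint before delivery. All names (`Message`, `Process`, `sn`, `log`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # checkpoint index piggybacked by the sender (assumed control data)


@dataclass
class Process:
    name: str
    sn: int = 0  # index of the latest local checkpoint
    log: list = field(default_factory=list)  # (kind, index) for each checkpoint

    def basic_checkpoint(self) -> None:
        """Checkpoint taken on the process's own schedule."""
        self.sn += 1
        self.log.append(("basic", self.sn))

    def send(self, payload: str) -> Message:
        """Piggyback the current checkpoint index on the outgoing message."""
        return Message(payload, self.sn)

    def receive(self, msg: Message) -> str:
        """Take a forced checkpoint before delivery if the sender is ahead."""
        if msg.sn > self.sn:
            self.sn = msg.sn
            self.log.append(("forced", self.sn))
        return msg.payload


p, q = Process("p"), Process("q")
p.basic_checkpoint()        # p advances to index 1 on its own
q.receive(p.send("hello"))  # q is behind, so it is forced to checkpoint
print(q.log)                # [('forced', 1)]
```

In this sketch the per-message overhead is a single integer. The message overhead that the abstract refers to concerns the amount of such piggybacked information carried on each application message.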