A Delayed Checkpoint Approach for Communication-Induced Checkpointing in Autonomic Computing


Bibliographic details
Main authors: Calixto Simon, Alberto Calixto; Hernandez, Saul E. Pomares; Perez Cruz, Jose Roberto
Format: Conference proceedings
Language: English
Description
Summary: Although the initiative of Autonomic Computing was introduced a dozen years ago, several challenges remain open. One of these challenges is efficient runtime monitoring aimed at detecting, diagnosing, and repairing problems that result from failures or bugs in software and/or hardware components. For this purpose, Communication-induced Checkpointing (CIC) can be a useful tool. CIC has been used to attack a wide range of problems that arise in distributed systems, such as rollback recovery, software debugging, and software verification, among others. In CIC algorithms, autonomic components (processes) cooperate asynchronously by piggybacking on application messages information about their saved local states, called checkpoints. CIC aims to form consistent global snapshots by grouping checkpoints (one per component) without explicit coordination. To achieve this, CIC solutions continuously monitor the exchanged control information to identify potentially dangerous checkpointing patterns. When a dangerous pattern is identified, it is broken by locally triggering a forced checkpoint. Nevertheless, as we will show, not all forced checkpoints triggered by current solutions are necessary. In this paper, we present a delayed checkpoint approach suitable for autonomic computing that reduces forced checkpoints by establishing certain triggering rules that we call safe checkpoint conditions. Finally, we present results showing that our proposal is more efficient than other current solutions.
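Illustrative sketch (not part of the catalog record): the summary describes control information carried on application messages and forced checkpoints taken when a dangerous pattern is detected. The Python below shows one classic way this can work, the index-based rule commonly attributed to Briatico, Ciuffoletti and Simoncini, in which each message carries the sender's checkpoint index and a receiver that is behind takes a forced checkpoint before delivery. It is not the delayed approach of the paper, whose safe checkpoint conditions are not stated in this record. The names `Process`, `Message`, `basic_checkpoint`, `send` and `receive` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sender_index: int  # control information piggybacked on the application message


@dataclass
class Process:
    name: str
    index: int = 0  # sequence number of this process's latest checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs
    delivered: list = field(default_factory=list)

    def __post_init__(self):
        # Every process starts from an initial checkpoint with index 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        # Checkpoint taken on the process's own initiative.
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self, payload):
        # Attach the current checkpoint index to the outgoing message.
        return Message(payload, self.index)

    def receive(self, msg):
        # Classic index-based rule (an assumption here, not the paper's rule):
        # if the sender is ahead, checkpoint BEFORE delivering the message so
        # that checkpoints with equal indices stay mutually consistent.
        if msg.sender_index > self.index:
            self.index = msg.sender_index
            self.checkpoints.append((self.index, "forced"))
        self.delivered.append(msg.payload)
```

In this sketch a process whose index is already at least the sender's delivers the message without checkpointing. A delayed scheme such as the one the summary announces would, presumably, postpone or skip some of the forced checkpoints that this simple rule triggers.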
ISSN: 1524-4547; 2641-8169
DOI: 10.1109/WETICE.2013.15