A Delayed Checkpoint Approach for Communication-Induced Checkpointing in Autonomic Computing
Main authors: | , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Summary: | Although the Autonomic Computing initiative was introduced a dozen years ago, several challenges remain open. One of them is efficient runtime monitoring aimed at detecting, diagnosing, and repairing problems that result from failures or bugs in software and/or hardware components. For this purpose, Communication-induced Checkpointing (CIC) can be a useful tool. CIC has been used to attack a wide range of problems that arise in distributed systems, such as rollback recovery, software debugging, and software verification, among others. In CIC algorithms, autonomic components (processes) cooperate asynchronously by piggybacking on application messages information about saved local states, called checkpoints. CIC aims to form consistent global snapshots by grouping checkpoints (one per component) in a non-coordinated way. To achieve this, CIC solutions continuously monitor the exchanged control information to identify potentially dangerous checkpointing patterns. When a dangerous pattern is identified, it is broken by locally triggering a forced checkpoint. Nevertheless, as we will show, not all forced checkpoints triggered by current solutions are necessary. In this paper, we present a delayed checkpoint approach suitable for autonomic computing that reduces forced checkpoints by establishing certain triggering rules that we call safe checkpoint conditions. Finally, we present results showing that our proposal is more efficient than other current solutions. |
---|---|
ISSN: | 1524-4547 2641-8169 |
DOI: | 10.1109/WETICE.2013.15 |