Optimal Checkpoint Interval with Availability as an Objective Function
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | We present a simplified derivation of the optimal checkpoint interval in
Young_1974 [1]. The derivation in [1] minimizes total lost time as the
objective function. Lost time is a function of the checkpoint interval, the
checkpoint save time, and the average failure time. The simplified derivation
yields a lost-time-optimal checkpoint interval identical to the one derived in
[1]. For large scale-out supercomputer or datacenter systems, what matters is
selecting the checkpoint interval that maximizes availability. We show that the
availability-optimal checkpoint interval differs from the one derived in [1].
However, the availability-optimal checkpoint interval is asymptotically the
same as the lost-time-optimal checkpoint interval under certain conditions on
the checkpoint save and recovery times. We show that these optimal checkpoint
intervals hold when the error detection latency is significantly smaller than
any selected checkpoint interval. When the error detection latency is very
large, however, the optimal checkpoint interval is greater than or equal to the
error detection latency. |
---|---|
DOI: | 10.48550/arxiv.2410.18124 |
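
The summary describes the lost-time-optimal interval as a function of checkpoint save time and average failure time but does not state the formula. As a rough illustration, the sketch below uses the widely quoted first-order result attributed to Young (1974), interval = sqrt(2 × save time × mean time between failures), and checks it against a brute-force search. The function names, the simplified lost-time model, and the sample numbers are assumptions made for this example and are not taken from the record; the availability-optimal interval discussed in the summary is not reproduced here.

```python
import math


def young_interval(save_time, mtbf):
    """First-order lost-time-optimal interval: sqrt(2 * save_time * mtbf).

    save_time and mtbf (mean time between failures) share the same time unit.
    """
    if save_time <= 0 or mtbf <= 0:
        raise ValueError("save_time and mtbf must be positive")
    return math.sqrt(2.0 * save_time * mtbf)


def lost_time_fraction(interval, save_time, mtbf):
    """Assumed first-order model of lost time per unit of run time.

    save_time / interval   : overhead of writing one checkpoint per interval
    interval / (2 * mtbf)  : expected rework, assuming a failure lands on
                             average halfway through an interval
    """
    return save_time / interval + interval / (2.0 * mtbf)


def numeric_optimum(save_time, mtbf, steps=100_000):
    """Grid search for the interval that minimizes the assumed model."""
    upper = 4.0 * young_interval(save_time, mtbf)
    candidates = (upper * k / steps for k in range(1, steps + 1))
    return min(candidates, key=lambda t: lost_time_fraction(t, save_time, mtbf))


if __name__ == "__main__":
    # Hypothetical numbers: 60 s to save a checkpoint, one failure per day.
    save_time, mtbf = 60.0, 86_400.0
    print("closed form :", young_interval(save_time, mtbf))
    print("grid search :", numeric_optimum(save_time, mtbf))
```

With these sample values both approaches land near 3220 seconds, which suggests the closed form is the minimizer of the assumed model. It says nothing about the availability objective, which the summary reports leads to a different interval.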