Fail-safe concurrency in the EcliPSe system
Published in: | Concurrency (Chichester, England), 1996-05, Vol. 8 (4), pp. 283-312 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | Local or wide‐area heterogeneous workstation clusters are relatively cheap and highly effective, though inherently unstable, operating environments for long‐running distributed computations. We found this to be the case in early experiments with a prototype of the EcliPSe system, a software toolkit for replicative applications on heterogeneous workstation clusters. Hardware or network failures were not uncommon in computations that ran for more than a day. This work describes a variety of features that incorporate failure resilience into the EcliPSe system. Key characteristics of this fault‐tolerant system are ease of use, low state‐saving cost, system scalability, and good performance. We present experimental results showing low state‐saving overheads and small system‐recovery times as a function of the amount of state saved. |
---|---|
ISSN: | 1040-3108, 1096-9128 |
DOI: | 10.1002/(SICI)1096-9128(199605)8:4<283::AID-CPE224>3.0.CO;2-# |