Analysis for performance and reliability of fault-tolerant parallel software


Bibliographic details
Published in: Systems and computers in Japan 2000-07, Vol.31 (7), p.56-65
Authors: Sugino, Eiji; Yokota, Haruo
Format: Article
Language: English
Online access: Full text
Description
Abstract: We propose a technique for constructing fault-tolerant parallel software for general commercial massively parallel computers that have no special fault-tolerant functions. The technique is a hybrid of the primary/backup approach and the state machine approach, and it makes parallel programs fault-tolerant by automatically converting user programs. In general, when a parallel system is used as a fault-tolerant computer, parallel entities serve as the redundant elements that provide fault tolerance, so the maximum performance decreases as reliability improves. Moreover, when fault tolerance is implemented in software, the performance drop caused by processing that is supplementary to the original program must also be considered. Demonstrating an improvement in reliability alone therefore does not show that the fault-tolerant implementation yields a gain. In this paper, we define an evaluation index that takes into account both the reliability improvement and the performance drop; based on this index, we study the execution environments in which fault-tolerant parallel software is suitable for practical use. © 2000 Scripta Technica, Syst Comp Jpn, 31(7): 56–65, 2000
ISSN: 0882-1666, 1520-684X
DOI: 10.1002/(SICI)1520-684X(200007)31:7<56::AID-SCJ7>3.0.CO;2-U