Analysis for performance and reliability of fault-tolerant parallel software
Published in: | Systems and computers in Japan 2000-07, Vol.31 (7), p.56-65 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | We propose a technique for constructing fault‐tolerant parallel software for general commercial massively parallel computers that are not provided with special fault‐tolerant functions. This technique is a hybrid of the primary/backup approach and the state machine approach, and can make parallel programs fault tolerant by automatically converting user programs. In general, when a parallel system is used as a fault‐tolerant computer, parallel entities serve as redundant elements for obtaining fault tolerance, so the maximum performance decreases as reliability improves. Moreover, a fault‐tolerant implementation in software adds processing that is supplementary to the original program, and the resulting performance drop must also be considered. Therefore, the gain from a fault‐tolerant implementation cannot be shown merely by demonstrating improved reliability. In this paper, we define an evaluation index that takes into account both reliability improvement and performance drop; based on this index, we study the execution environments in which fault‐tolerant parallel software is viable for practical use. © 2000 Scripta Technica Syst Comp Jpn, 31(7): 56–65, 2000 |
---|---|
ISSN: | 0882-1666, 1520-684X |
DOI: | 10.1002/(SICI)1520-684X(200007)31:7<56::AID-SCJ7>3.0.CO;2-U |