Implementation and evaluation of a highly reliable server with QRM
Published in: | Systems and computers in Japan 1998-04, Vol.29 (4), p.11-21 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: | To provide higher reliability for multiple open‐server computers in a distributed processing environment without changing platforms or application programs, we have developed QRM (Quick Rollback Module), an add‐on module that provides higher reliability based on checkpoints and a rollback mechanism. With QRM added, computers recover from most transient hardware failures of nearby processors, from some permanent processor failures in multiprocessor computers, and from most panics caused by operating system bugs. QRM consists of hardware implemented as an attachment board and software implemented as loadable modules. With QRM installed, computers recover from failures within a few seconds; however, there is a slight overhead during normal processing. Although the overhead varies with the programs executed, the overhead in the TPC‐C benchmark is 10.9%. In this paper, we describe the implementation of QRM and the results of its performance evaluation. © 1998 Scripta Technica, Syst Comp Jpn, 29(4): 11–21, 1998 |
---|---|
ISSN: | 0882-1666 1520-684X |
DOI: | 10.1002/(SICI)1520-684X(199804)29:4<11::AID-SCJ2>3.0.CO;2-R |
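The abstract describes recovery based on checkpoints and rollback. The sketch below shows that general idea in software: a toy memory that records the before-image of each location the first time it is written after a checkpoint, so a rollback can restore the last checkpointed state. This is an assumption chosen for illustration (undo/before-image logging). It is not taken from the paper and does not describe how QRM's attachment board or loadable modules actually work; the class and method names are hypothetical.

```python
from typing import Any, Dict, Hashable


class UndoLogMemory:
    """Toy checkpoint/rollback memory using before-image (undo) logging.

    Hypothetical illustration only; not QRM's actual design.
    """

    # Sentinel marking a location that did not exist at the last checkpoint.
    _ABSENT = object()

    def __init__(self) -> None:
        self._mem: Dict[Hashable, Any] = {}
        # Before-images of locations modified since the last checkpoint.
        self._undo: Dict[Hashable, Any] = {}

    def read(self, addr: Hashable) -> Any:
        return self._mem.get(addr)

    def write(self, addr: Hashable, value: Any) -> None:
        # Save the before-image only on the first write after a checkpoint,
        # so the log grows with the number of distinct locations touched.
        if addr not in self._undo:
            self._undo[addr] = self._mem.get(addr, self._ABSENT)
        self._mem[addr] = value

    def checkpoint(self) -> None:
        # The current state becomes the new recovery point.
        self._undo.clear()

    def rollback(self) -> None:
        # Restore every modified location to its value at the last checkpoint.
        for addr, old in self._undo.items():
            if old is self._ABSENT:
                del self._mem[addr]
            else:
                self._mem[addr] = old
        self._undo.clear()


mem = UndoLogMemory()
mem.write("balance", 100)
mem.checkpoint()
mem.write("balance", 250)
mem.write("txn_count", 1)
mem.rollback()  # simulate recovery after a transient failure
print(mem.read("balance"), mem.read("txn_count"))  # prints: 100 None
```

Logging only the first write per location keeps the per-write cost small, which hints at why a checkpoint/rollback scheme can add some overhead to normal processing; the 10.9% TPC‐C figure above is the paper's own measurement, not something this toy reproduces.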