Design for fault-tolerance in System ES/9000 Model 900

Bibliographic details
Main authors: Spainhower, L., Isenberg, J., Chillarege, R., Berding, J.
Format: Conference paper
Language: English
Description
Summary: The authors present the design for fault tolerance in the IBM ES/9000 Model 900 high-end commercial processor. The design combines circuit-level concurrent error detection, fault identification, and reconfiguration with system-level techniques when multiple functional resources are available. It provides true graceful degradation during central processor or channel reconfiguration and repair. The authors discuss the design point for this processor and the trade-offs involved; show the error detection and online repair process of a central processor, with the work recovered on an alternate central processor transparently to the application; describe dynamic path selection and the hot-pluggable channels; and illustrate the fault-tolerance techniques used in the level 1 cache and the central store.
DOI: 10.1109/FTCS.1992.243617