Re-utilizing partially failed resources as network resources

Bibliographic details
Main authors: VRANAS PAVLOS MICHAEL, GARA ALAN, CHEN DONG, LIEBSCH THOMAS ALAN, HEIDELBERGER PHILIP, STEINMACHER-BUROW BURKHARD
Format: Patent
Language: English
Description
Summary: A method and apparatus for re-utilizing partially failed compute resources in a massively parallel supercomputer system. In the preferred embodiments, the compute node comprises a number of clock domains that can be enabled separately. When an error in a compute node is detected and the failure is not in the network communication blocks, a clock enable circuit enables the clocks to the network communication blocks only, so that the partially failed compute node can be re-utilized as a network resource. The computer system can then continue to operate with only slightly diminished performance, thereby improving performance and perceived overall reliability.
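
The decision described in the summary can be illustrated with a minimal sketch. It is a software model only, not the patented clock enable circuit: the domain names (`PROCESSOR`, `MEMORY`, `NETWORK`), the `ComputeNode` class, and the `handle_failure` function are hypothetical names chosen for illustration, and the record does not say how many clock domains a node has or what happens when the network blocks themselves fail (the sketch assumes the node is taken offline in that case).

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Domain(Enum):
    """Hypothetical clock domains; the record does not name them."""
    PROCESSOR = auto()
    MEMORY = auto()
    NETWORK = auto()


class NodeRole(Enum):
    COMPUTE = auto()       # fully functional node
    NETWORK_ONLY = auto()  # partially failed, reused as a network resource
    OFFLINE = auto()       # assumption: network blocks failed, node unusable


@dataclass
class ComputeNode:
    # All clock domains start enabled.
    enabled: set = field(default_factory=lambda: set(Domain))
    role: NodeRole = NodeRole.COMPUTE


def handle_failure(node: ComputeNode, failed: set) -> NodeRole:
    """Model of the clock-enable decision after an error is detected."""
    if not failed:
        return node.role  # nothing failed, leave the node unchanged
    if Domain.NETWORK in failed:
        # Assumption: a node with failed network blocks cannot be reused.
        node.enabled = set()
        node.role = NodeRole.OFFLINE
    else:
        # Enable only the clocks to the network communication blocks.
        node.enabled = {Domain.NETWORK}
        node.role = NodeRole.NETWORK_ONLY
    return node.role


if __name__ == "__main__":
    node = ComputeNode()
    print(handle_failure(node, {Domain.PROCESSOR}).name)
```

In this model, a node whose processor domain fails keeps forwarding traffic for its neighbors, which corresponds to the summary's claim that the system continues with only slightly diminished performance.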