MANAGEMENT OF FAULT CONDITION IN COMPUTING SYSTEM

Systems, apparatuses, and/or methods may manage a fault condition in a computer system. An apparatus may dynamically publish a message over a publisher-subscriber system and dynamically subscribe to amessage over the publisher-subscriber system, wherein at least one message may be used to address a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
1. Verfasser: DASARI ANNAPURNA
Format: Patent
Sprache:chi ; eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Systems, apparatuses, and/or methods may manage a fault condition in a computer system. An apparatus may dynamically publish a message over a publisher-subscriber system and dynamically subscribe to amessage over the publisher-subscriber system, wherein at least one message may be used to address a fault condition in the computing system. The apparatus may predict a fault condition in a high performance computing (HPC) system, communicate fault information to a user, monitor health of the HPC system, respond to the fault condition in the HPC system, recover from the fault condition in the HPCsystem, maintain a rule for a fault management component, and/or communicate the fault information over the publisher- subscriber system in real-time. Messages may also be aggregated to minimize fault information traffic. The publisher-subscriber system may facilitate dynamic and/or real-time coordinated, integrated (e.g., system- wide), and/or scalable fault management. 系统、装置和/或方法可以管理计算机系统中的故障情况。装置可以通过发布者-订户系统动态地发布消息,并且通