MANAGEMENT OF FAULT CONDITION IN COMPUTING SYSTEM
Systems, apparatuses, and/or methods may manage a fault condition in a computer system. An apparatus may dynamically publish a message over a publisher-subscriber system and dynamically subscribe to a message over the publisher-subscriber system, wherein at least one message may be used to address a...
Main Author: | |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Systems, apparatuses, and/or methods may manage a fault condition in a computer system. An apparatus may dynamically publish a message over a publisher-subscriber system and dynamically subscribe to a message over the publisher-subscriber system, wherein at least one message may be used to address a fault condition in the computing system. The apparatus may predict a fault condition in a high performance computing (HPC) system, communicate fault information to a user, monitor health of the HPC system, respond to the fault condition in the HPC system, recover from the fault condition in the HPC system, maintain a rule for a fault management component, and/or communicate the fault information over the publisher-subscriber system in real-time. Messages may also be aggregated to minimize fault information traffic. The publisher-subscriber system may facilitate dynamic and/or real-time coordinated, integrated (e.g., system-wide), and/or scalable fault management. |
---|
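The summary describes fault messages exchanged over a publisher-subscriber system, with aggregation to reduce fault information traffic. The record gives no implementation detail, so the following is only a minimal in-process sketch of that general idea in Python. Every name here (`FaultMessage`, `Broker`, `Aggregator`, the topic string `fault/predicted`, the node identifiers) is a hypothetical illustration and is not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FaultMessage:
    topic: str      # hypothetical topic name, e.g. "fault/predicted"
    node: str       # hypothetical node identifier
    detail: str     # free-form fault description
    count: int = 1  # number of raw reports this message stands for


Handler = Callable[[FaultMessage], None]


class Broker:
    """Toy in-process publisher-subscriber broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Components may subscribe at any time (dynamic subscription).
        self._subscribers.setdefault(topic, []).append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].remove(handler)

    def publish(self, message: FaultMessage) -> None:
        # Iterate over a copy so handlers may (un)subscribe during delivery.
        for handler in list(self._subscribers.get(message.topic, [])):
            handler(message)


class Aggregator:
    """Buffers raw reports and publishes one message per distinct fault."""

    def __init__(self, broker: Broker) -> None:
        self._broker = broker
        self._pending: Dict[Tuple[str, str, str], int] = {}

    def report(self, topic: str, node: str, detail: str) -> None:
        key = (topic, node, detail)
        self._pending[key] = self._pending.get(key, 0) + 1

    def flush(self) -> int:
        """Publish the buffered faults; return how many messages were sent."""
        pending, self._pending = self._pending, {}
        for (topic, node, detail), count in pending.items():
            self._broker.publish(FaultMessage(topic, node, detail, count))
        return len(pending)


if __name__ == "__main__":
    broker = Broker()
    received: List[FaultMessage] = []
    broker.subscribe("fault/predicted", received.append)

    aggregator = Aggregator(broker)
    for _ in range(3):
        aggregator.report("fault/predicted", "node-17", "ECC error rate rising")
    aggregator.report("fault/predicted", "node-42", "fan speed low")

    sent = aggregator.flush()
    print(sent, [(m.node, m.count) for m in received])
```

In this sketch four raw reports collapse into two published messages, which is one simple way to read "aggregated to minimize fault information traffic". A real HPC deployment would presumably use a networked broker and time-based or size-based flushing, neither of which is specified in the record.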