Dependency-aware fault diagnosis with metric-correlation models in enterprise software systems

The normal operation of enterprise software systems can be modeled by stable correlations between various system metrics; errors are detected when some of these correlations fail to hold. The typical approach to diagnosis (i.e., pinpoint the faulty component) based on the correlation models is to us...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Miao Jiang, Munawar, M A, Reidemeister, T, Ward, P A S
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The normal operation of enterprise software systems can be modeled by stable correlations between various system metrics; errors are detected when some of these correlations fail to hold. The typical approach to diagnosis (i.e., pinpoint the faulty component) based on the correlation models is to use the Jaccard coefficient or some variant thereof, without reference to system structure, dependency data, or prior fault data. In this paper we demonstrate the intrinsic limitations of this approach, and propose a solution that mitigates these limitations. We assume knowledge of dependencies between components in the system, and take this information into account when analyzing the correlation models. We also propose the use of the Tanimoto coefficient instead of the Jaccard coefficient to assign anomaly scores to components. We evaluate our new algorithm with a Trade6-based test-bed. We show that we can find the faulty components within top-3 components with the highest anomaly score in four out of nine cases, while the prior method can only find one.
ISSN:2165-9605
DOI:10.1109/CNSM.2010.5691319