Filtering System Metrics for Minimal Correlation-Based Self-Monitoring

Bibliographic details
Main authors: Munawar, M.A., Miao Jiang, Reidemeister, T., Ward, P.A.S.
Format: Conference proceedings
Language: English
Description
Summary: Self-adaptive and self-organizing systems must be self-monitoring. Recent research has shown that self-monitoring can be enabled by using correlations between monitoring variables (metrics). However, computer systems often make a very large number of metrics available for collection. Collecting them all not only reduces system performance, but also creates other overheads related to communication, storage, and processing. To control this overhead, it is necessary to limit collection to a subset of the available metrics. Manual selection of metrics requires a good understanding of system internals, which can be difficult to obtain given the size and complexity of modern computer systems. In this paper, assuming no knowledge of metric semantics or importance and no advance availability of fault data, we investigate automated methods for selecting a subset of available metrics in the context of correlation-based monitoring. Our goal is to collect fewer metrics while maintaining the ability to detect errors. We propose several metric selection methods that require no information besides correlations. We compare these methods on the basis of fault coverage. We show that our minimum spanning tree-based selection performs best, detecting on average 66% of the faults detectable by full monitoring (i.e., using all considered metrics) with only 30% of the metrics.
ISSN: 1949-3673
DOI: 10.1109/SASO.2009.36
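
The summary names a minimum spanning tree-based selection but does not describe it in this record. As a rough illustration only, the Python sketch below shows one way such a selection could look. Everything specific in it is an assumption rather than the paper's method: the edge weight 1 - |r| (so strongly correlated metrics are close), the use of Prim's algorithm, the rule of keeping the metrics with the highest tree degree, and the names `pearson`, `spanning_tree_edges`, and `select_metrics`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant metric has no defined correlation; treat it as uncorrelated.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def spanning_tree_edges(samples):
    """Minimum spanning tree (Prim) over all metrics.

    Assumption: the edge weight is 1 - |r|, so the tree keeps the
    strongest correlations. `samples` maps metric name -> list of values.
    """
    names = sorted(samples)
    weight = {
        (a, b): 1.0 - abs(pearson(samples[a], samples[b]))
        for a in names
        for b in names
        if a != b
    }
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the tree to a metric outside it.
        a, b = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: weight[e],
        )
        in_tree.add(b)
        edges.append((a, b))
    return edges


def select_metrics(samples, fraction):
    """Assumed heuristic: keep the metrics with the highest tree degree."""
    degree = {name: 0 for name in samples}
    for a, b in spanning_tree_edges(samples):
        degree[a] += 1
        degree[b] += 1
    budget = max(1, round(fraction * len(samples)))
    ranked = sorted(samples, key=lambda m: (-degree[m], m))
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical metric samples, invented for illustration.
    demo = {
        "cpu": [1.0, 2.0, 3.0, 4.0, 5.0],
        "requests": [2.1, 3.9, 6.2, 8.0, 9.9],
        "heap": [5.0, 4.0, 4.5, 3.0, 1.0],
        "threads": [1.0, 3.0, 2.0, 5.0, 4.0],
    }
    print(select_metrics(demo, 0.5))
```

Calling `select_metrics(samples, 0.3)` would keep roughly 30% of the metrics, matching the budget quoted in the summary. The sketch says nothing about fault coverage, which depends on the monitored system and the faults injected.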