System Monitoring with Metric-Correlation Models

Modern software systems expose management metrics to help track their health. Recently, it was demonstrated that correlations among these metrics allow errors to be detected and their causes localized. Prior research shows that linear models can capture many of these correlations. However, our resea...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE eTransactions on network and service management 2011-12, Vol.8 (4), p.348-360
Hauptverfasser: Miao Jiang, Munawar, M. A., Reidemeister, T., Ward, P. A. S.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Modern software systems expose management metrics to help track their health. Recently, it was demonstrated that correlations among these metrics allow errors to be detected and their causes localized. Prior research shows that linear models can capture many of these correlations. However, our research shows that several factors may prevent linear models from accurately describing correlations, even if the underlying relationship is linear. Common phenomena we have observed include relationships that evolve, relationships with missing variables, and heterogeneous residual variance of the correlated metrics. Usually these phenomena can be discovered by testing for heteroscedasticity of the underlying linear models. Such behaviour violates the assumptions of simple linear regression, which thus fail to describe system dynamics correctly. In this paper we address the above challenges by employing efficient variants of Ordinary Least Squares regression models. In addition, we automate the process of error detection by introducing the Wilcoxon Rank-Sum test after proper correlations modeling. We validate our models using a realistic Java-Enterprise-Edition application. Using fault-injection experiments we show that our improved models capture system behavior accurately.
ISSN:1932-4537
1932-4537
DOI:10.1109/TNSM.2011.120811.100033