Topology-Aware Correlated Network Anomaly Event Detection and Diagnosis


Bibliographic details
Published in: Journal of Network and Systems Management, 2014-04, Vol. 22 (2), p. 208-234
Main authors: Calyam, Prasad; Dhanapalan, Manojprasadh; Sridharan, Mukundan; Krishnamurthy, Ashok; Ramnath, Rajiv
Format: Article
Language: English
Description
Abstract: For purposes such as end-to-end monitoring, capacity planning, and performance bottleneck troubleshooting across multi-domain networks, there is an increasing trend to deploy interoperable measurement frameworks such as perfSONAR. These deployments expose vast data archives of current and historic measurements, which can be queried using web services. Analyzing these measurements with effective schemes to detect and diagnose anomaly events is vital, since it allows for verifying whether network behavior meets expectations. In addition, it allows for proactive notification of bottlenecks that may be affecting a large number of users. In this paper, we describe our novel topology-aware scheme that can be integrated into perfSONAR deployments for detection and diagnosis of network-wide correlated anomaly events. Our scheme applies spatial and temporal analyses to combined topology and uncorrelated anomaly event information in order to detect correlated anomaly events. Subsequently, a set of ‘filters’ is applied to the detected events to prioritize them based on potential severity, and to drill down into each event's “nature” (e.g., event burstiness) and “root-location(s)” (e.g., edge or core location affinity). To validate our scheme, we use traceroute information and one-way delay measurements collected over 3 months between the various U.S. Department of Energy national lab network locations, published via perfSONAR web services. Further, using real-world case studies, we show how our scheme can provide helpful insights for detection, visualization, and diagnosis of correlated network anomaly events, and can ultimately save time, effort, and costs spent on network management.
ISSN: 1064-7570, 1573-7705
DOI: 10.1007/s10922-013-9286-0
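
The abstract describes combining topology information with per-path (uncorrelated) anomaly events and analysing them spatially and temporally. The sketch below is not the authors' algorithm; it is a minimal illustration of that general idea under stated assumptions: each event is a hypothetical `(path_id, timestamp)` tuple, each path is a traceroute-style list of hop names, two events count as temporally close when they fall within `window_s` seconds, and they count as spatially related when their paths share at least one hop. All names, the sample data, and the pairwise rule are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def correlated_pairs(events, paths, window_s):
    """Pair up events on different paths that are close in time
    (temporal check) and whose paths share a hop (spatial check).

    events: list of (path_id, timestamp_seconds)   -- hypothetical format
    paths:  dict mapping path_id -> list of hop names
    """
    pairs = []
    ordered = sorted(events, key=lambda e: e[1])
    for a, b in combinations(ordered, 2):
        (path_a, t_a), (path_b, t_b) = a, b
        # Skip events on the same path or too far apart in time.
        if path_a == path_b or t_b - t_a > window_s:
            continue
        shared = set(paths[path_a]) & set(paths[path_b])
        if shared:
            pairs.append((a, b, shared))
    return pairs


def rank_hops(pairs):
    """Count how often each hop is shared by correlated event pairs;
    frequently shared hops are candidate root locations."""
    counts = Counter(hop for _, _, shared in pairs for hop in shared)
    return counts.most_common()


# Invented sample topology and events, for illustration only.
paths = {
    "A-B": ["a-edge", "core1", "b-edge"],
    "A-C": ["a-edge", "core1", "core2", "c-edge"],
    "D-C": ["d-edge", "core2", "c-edge"],
}
events = [("A-B", 100), ("A-C", 130), ("D-C", 5000)]

pairs = correlated_pairs(events, paths, window_s=60)
ranking = rank_hops(pairs)
```

With this sample data, only the events on `A-B` and `A-C` are paired, because they occur 30 seconds apart and share the hops `a-edge` and `core1`; the later event on `D-C` falls outside the window. A real deployment would additionally need the prioritization and drill-down filters the abstract mentions, which this sketch does not attempt.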