Network-Wide Anomaly Event Detection and Diagnosis With perfSONAR

High-performance computing (HPC) environments supporting data-intensive applications need multidomain network performance measurements from open frameworks such as perfSONAR. Detected network-wide correlated anomaly events that impact data throughput performance need to be quickly and accurately not...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE eTransactions on network and service management 2016-09, Vol.13 (3), p.666-680
Hauptverfasser: Yuanxun Zhang, Debroy, Saptarshi, Calyam, Prasad
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:High-performance computing (HPC) environments supporting data-intensive applications need multidomain network performance measurements from open frameworks such as perfSONAR. Detected network-wide correlated anomaly events that impact data throughput performance need to be quickly and accurately notified along with a root-cause analysis for remediation. In this paper, we present a novel network anomaly events detection and diagnosis scheme for network-wide visibility that improves accuracy of root-cause analysis. We address analysis limitations in cases where there is absence of complete network topology information, and when measurement probes are mis-calibrated leading to erroneous diagnosis. Our proposed scheme fuses perfSONAR time-series path measurements data from multiple domains using principal component analysis (PCA) to transform data for accurate correlated and uncorrelated anomaly events detection. We quantify the certainty of such detection using a measurement data sanity checking that involves: 1) measurement data reputation analysis to qualify the measurement samples and 2) filter framework to prune potentially misleading samples. Lastly, using actual perfSONAR one-way delay measurement traces, we show our proposed scheme's effectiveness in diagnosing the root-cause of critical network performance anomaly events.
ISSN:1932-4537
1932-4537
DOI:10.1109/TNSM.2016.2546943