Anomaly detection for key performance indicators by fusing self-supervised spatio-temporal graph attention networks

Bibliographic details
Published in: Knowledge-based systems, 2024-09, Vol. 300, p. 112167, Article 112167
Main authors: Chen, Ningjiang; Tu, Huan; Zeng, Haoyang; Ou, Yangjie
Format: Article
Language: English
Online access: Full text
Description
Summary: With the development of Artificial Intelligence for IT Operations (AIOps), numerous software systems and services are monitored by Key Performance Indicator (KPI) collection components. Multivariate KPIs, a type of time series data, are essential for effectively managing an entity's service quality. In recent years, deep learning methods have substantially improved anomaly detection for multivariate time series; however, existing methods do not fully consider how to explicitly capture the correlations among multivariate time series in both the feature dimension and the temporal dimension, which leads to false positives. This paper therefore proposes MAD-STA, a self-supervised anomaly detection method for multivariate KPIs that combines graph structure learning with a spatio-temporal GAT (Graph Attention Network). In the feature dimension, MAD-STA introduces a node embedding mechanism for graph structure learning and then uses a feature-oriented GAT layer to compute graph attention coefficients that capture the correlations between different KPIs. In the temporal dimension, MAD-STA uses a time-oriented GAT layer to compute attention weights between correlated timestamps, and a GRU-based VAE encoder captures long-term dependencies to extract more comprehensive temporal feature representations. Finally, MAD-STA uses a GRU-based VAE decoder to reconstruct the captured high-level features and achieves efficient anomaly detection and localization by calculating the anomaly scores of multiple KPIs. Experiments on multiple data sets show that MAD-STA detects anomalies more accurately than the baseline methods. In particular, on the KPI data sets of the two server clusters SMD and CKM, MAD-STA improves performance, including the overall F1 score, over the best baseline method. In addition, MAD-STA performs well on the false positive rate and offers strong interpretability, which can assist anomaly diagnosis and root cause indicator analysis.
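The summary names two computations without giving formulas: graph attention coefficients between KPIs, and per-KPI anomaly scores derived from a reconstruction. The following is a minimal NumPy sketch of both, not the MAD-STA implementation. It assumes the standard GAT scoring rule (LeakyReLU of a learned vector applied to concatenated projected node features, normalized with softmax) over a fully connected KPI graph, and it assumes squared reconstruction error as the anomaly score. The function names, the shapes, the random weights, and the perturbed array standing in for a decoder output are all hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Elementwise LeakyReLU."""
    return np.where(x > 0, x, slope * x)


def gat_attention(h, W, a, slope=0.2):
    """Standard GAT attention coefficients on a fully connected graph.

    h: (n_nodes, d_in) node features, W: (d_in, d_out), a: (2 * d_out,).
    Returns alpha of shape (n_nodes, n_nodes); each row sums to 1.
    """
    z = h @ W                                  # projected node features
    d_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a destination term.
    src = z @ a[:d_out]
    dst = z @ a[d_out:]
    e = leaky_relu(src[:, None] + dst[None, :], slope)
    e = e - e.max(axis=1, keepdims=True)       # numerically stable softmax
    w = np.exp(e)
    return w / w.sum(axis=1, keepdims=True)


def anomaly_scores(x, x_hat):
    """Assumed score: squared reconstruction error.

    x, x_hat: (n_timestamps, n_kpis).
    Returns per-KPI scores and their sum per timestamp.
    """
    per_kpi = (x - x_hat) ** 2
    return per_kpi, per_kpi.sum(axis=1)


rng = np.random.default_rng(0)
n_kpis, window = 4, 8
x = rng.normal(size=(window, n_kpis))          # toy multivariate KPI window

# Feature-oriented view (assumption): each KPI is a node whose feature
# vector is its values over the window.
alpha = gat_attention(x.T, rng.normal(size=(window, 3)), rng.normal(size=6))

# Stand-in for a decoder output that reconstructs KPI 2 poorly at timestamp 5.
x_hat = x.copy()
x_hat[5, 2] += 3.0
per_kpi, total = anomaly_scores(x, x_hat)

print("attention matrix shape:", alpha.shape)
print("most anomalous timestamp:", int(total.argmax()))
print("KPI contributing most at that timestamp:", int(per_kpi[5].argmax()))
```

In this sketch, detection corresponds to thresholding the per-timestamp total, and localization corresponds to ranking the per-KPI scores at a flagged timestamp. How the paper itself sets thresholds is not stated in this record.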
ISSN:0950-7051
DOI:10.1016/j.knosys.2024.112167