Efficient KPI Anomaly Detection Through Transfer Learning for Large-Scale Web Services

Bibliographic details
Published in: IEEE Journal on Selected Areas in Communications, 2022-08, Vol. 40 (8), p. 2440-2455
Main authors: Zhang, Shenglin; Zhong, Zhenyu; Li, Dongwen; Fan, Qiliang; Sun, Yongqian; Zhu, Man; Zhang, Yuzhi; Pei, Dan; Sun, Jiyan; Liu, Yinlong; Yang, Hui; Zou, Yongqiang
Format: Article
Language: English
Description
Summary: Timely anomaly detection of key performance indicators (KPIs), e.g., service response time and error rate, is of utmost importance to Web services. Over the years, many unsupervised deep learning-based anomaly detection approaches have been proposed. To achieve good performance, they require a long period of KPI data for model training, which is hard to guarantee when services change frequently. Additionally, the training overhead is too high for the vast number of KPIs in large-scale Web services. To address these problems, we propose an unsupervised KPI anomaly detection approach, named AnoTransfer, which combines a novel Variational Auto-Encoder (VAE)-based KPI clustering algorithm with an adaptive transfer learning strategy. Extensive evaluation experiments using real-world data collected from several large-scale Web service providers demonstrate that AnoTransfer reduces the average initialization time by 65.71% and improves the training efficiency by 50.62 times, without significantly degrading anomaly detection accuracy.
ISSN: 0733-8716, 1558-0008
DOI: 10.1109/JSAC.2022.3180785