Timely anomaly detection of key performance indicators (KPIs), such as service response time and error rate, is critical for Web services. Over the years, many unsupervised deep learning-based anomaly detection approaches have been proposed. To perform well, they require a long period of KPI data for model training, which is hard to guarantee when services change frequently. Additionally, the training overhead is too high for the vast number of KPIs in large-scale Web services. To address these problems, we propose an unsupervised KPI anomaly detection approach, named AnoTransfer, which combines a novel Variational Auto-Encoder (VAE)-based KPI clustering algorithm with an adaptive transfer learning strategy. Extensive evaluation experiments using real-world data collected from several large-scale Web service providers demonstrate that AnoTransfer reduces the average initialization time by 65.71% and improves the training efficiency by 50.62 times, without significantly degrading anomaly detection accuracy.
               