Microservice architectures bring many benefits, e.g., faster delivery, improved scalability, and greater autonomy, so they are widely adopted to develop and operate Internet-based applications. How to effectively diagnose the faults… Click to show full abstract
Microservice architectures bring many benefits, e.g., faster delivery, improved scalability, and greater autonomy, so they are widely adopted to develop and operate Internet-based applications. How to effectively diagnose the faults of applications with lots of dynamic microservices has become a key to guarantee applications’ performance and reliability. As a microservice performs various behaviors in different workflows of processing requests, existing approaches often cannot accurately locate the root cause of an application with interactive microservices in a dynamic deployment environment. We propose a workflow-aware automatic fault diagnosis approach for microservice-based applications with statistics. We characterize traces across microservices with calling trees, and then learn trace patterns as baselines. For the faults affecting the workflows of processing requests, we estimate the workflows’ anomaly degrees, and then locate the microservices causing anomalies by comparing the difference between current traces and learned baselines with tree edit distance. For performance anomalies causing significantly increased response time, we employ principal component analysis to extract suspicious microservices with large fluctuation in response time. Finally, we evaluate our approach on three typical microservice-based applications with a series of experiments. The results show that our approach can accurately locate the microservices causing anomalies.
               
Click one of the above tabs to view related content.