Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | With the development of cloud-native technologies, microservice-based
software systems face challenges in accurately localizing root causes when
failures occur. Additionally, the cloud-edge collaborative environment
introduces more difficulties, such as unstable networks and high latency across
network segments. Accurately identifying the root cause of microservices in a
cloud-edge collaborative environment has thus become an urgent problem. In this
paper, we propose MicroCERCL, a novel approach that pinpoints root causes at
the kernel and application level in the cloud-edge collaborative environment.
Our key insight is that failures propagate through direct invocations and
indirect resource-competition dependencies in a cloud-edge collaborative
environment characterized by instability and high latency. Propagation becomes
more complex in hybrid deployments that involve multiple microservice systems
simultaneously. Leveraging this insight, we extract valid contents from
kernel-level logs to prioritize localizing the kernel-level root cause.
Moreover, we construct a heterogeneous dynamic topology stack and train a graph
neural network model to accurately localize the application-level root cause
without relying on historical data. Notably, we released the first benchmark
hybrid deployment microservice system in a cloud-edge collaborative environment
(the largest and most complex to our knowledge). Experiments conducted on
the dataset collected from the benchmark show that MicroCERCL can accurately
localize the root cause of microservice systems in such environments,
significantly outperforming state-of-the-art approaches with an increase of at
least 24.1% in top-1 accuracy. |
---|---|
DOI: | 10.48550/arxiv.2406.13604 |
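
The abstract reports its headline result as top-1 accuracy. This record does not describe the paper's evaluation protocol, so the following is only a minimal sketch of the metric as it is commonly defined for root cause localization: the fraction of failure cases in which the true root cause appears among the first k ranked candidates. The function name `top_k_accuracy`, the service names, and the rankings are hypothetical illustrations, not data or code from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    rankings: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of failure cases whose true root cause is in the first k candidates."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        raise ValueError("at least one failure case is required")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical failure cases: each inner list is a localizer's ranked suspects,
# most suspicious first. These names are made up for illustration.
rankings = [
    ["cart", "checkout", "payment"],
    ["payment", "cart", "checkout"],
    ["checkout", "payment", "cart"],
    ["cart", "payment", "checkout"],
]
truths = ["cart", "cart", "checkout", "payment"]

top1 = top_k_accuracy(rankings, truths, k=1)  # 2 of 4 cases hit at rank 1
top2 = top_k_accuracy(rankings, truths, k=2)  # all 4 cases hit within rank 2
print(f"top-1 = {top1:.2f}, top-2 = {top2:.2f}")
```

Under this definition, a gain in top-1 accuracy means the true root cause is ranked first in a larger share of failure cases.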