Robust Failure Diagnosis of Microservice System through Multimodal Data
Main authors: | , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Automatic failure diagnosis is crucial for large microservice systems.
Currently, most failure diagnosis methods rely solely on single-modal data
(i.e., using either metrics, logs, or traces). In this study, we conduct an
empirical study using real-world failure cases to show that combining these
sources of data (multimodal data) leads to a more accurate diagnosis. However,
effectively representing these data and addressing imbalanced failures remain
challenging. To tackle these issues, we propose DiagFusion, a robust failure
diagnosis approach that uses multimodal data. It leverages embedding techniques
and data augmentation to represent the multimodal data of service instances,
combines deployment data and traces to build a dependency graph, and uses a
graph neural network to localize the root cause instance and determine the
failure type. Our evaluations using real-world datasets show that DiagFusion
outperforms existing methods in terms of root cause instance localization
(improving by 20.9% to 368%) and failure type determination (improving by 11.0%
to 169%). |
---|---|
DOI: | 10.48550/arxiv.2302.10512 |