Robust Failure Diagnosis of Microservice System through Multimodal Data
creator | Zhang, Shenglin ; Jin, Pengxiang ; Lin, Zihan ; Sun, Yongqian ; Zhang, Bicheng ; Xia, Sibo ; Li, Zhengdan ; Zhong, Zhenyu ; Ma, Minghua ; Jin, Wa ; Zhang, Dai ; Zhu, Zhenyu ; Pei, Dan |
---|---|
description | Automatic failure diagnosis is crucial for large microservice systems.
Currently, most failure diagnosis methods rely solely on single-modal data
(i.e., using either metrics, logs, or traces). In this study, we conduct an
empirical analysis of real-world failure cases to show that combining these
sources of data (multimodal data) leads to a more accurate diagnosis. However,
effectively representing these data and addressing imbalanced failures remain
challenging. To tackle these issues, we propose DiagFusion, a robust failure
diagnosis approach that uses multimodal data. It leverages embedding techniques
and data augmentation to represent the multimodal data of service instances,
combines deployment data and traces to build a dependency graph, and uses a
graph neural network to localize the root cause instance and determine the
failure type. Our evaluations using real-world datasets show that DiagFusion
outperforms existing methods in terms of root cause instance localization
(improving by 20.9% to 368%) and failure type determination (improving by 11.0%
to 169%). |
doi_str_mv | 10.48550/arxiv.2302.10512 |
format | Article |
date | 2023-02-21 |
rights | http://creativecommons.org/licenses/by/4.0 |
link | https://arxiv.org/abs/2302.10512 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2302.10512 |
language | eng |
recordid | cdi_arxiv_primary_2302_10512 |
source | arXiv.org |
subjects | Computer Science - Software Engineering |
title | Robust Failure Diagnosis of Microservice System through Multimodal Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T13%3A58%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Failure%20Diagnosis%20of%20Microservice%20System%20through%20Multimodal%20Data&rft.au=Zhang,%20Shenglin&rft.date=2023-02-21&rft_id=info:doi/10.48550/arxiv.2302.10512&rft_dat=%3Carxiv_GOX%3E2302_10512%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |