Robust Failure Diagnosis of Microservice System through Multimodal Data

Automatic failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., using either metrics, logs, or traces). In this study, we conduct an empirical study using real-world failure cases to show that combining these so...

Bibliographic details
Main authors: Zhang, Shenglin, Jin, Pengxiang, Lin, Zihan, Sun, Yongqian, Zhang, Bicheng, Xia, Sibo, Li, Zhengdan, Zhong, Zhenyu, Ma, Minghua, Jin, Wa, Zhang, Dai, Zhu, Zhenyu, Pei, Dan
Format: Article
Language: eng
Online access: Order full text
creator Zhang, Shenglin
Jin, Pengxiang
Lin, Zihan
Sun, Yongqian
Zhang, Bicheng
Xia, Sibo
Li, Zhengdan
Zhong, Zhenyu
Ma, Minghua
Jin, Wa
Zhang, Dai
Zhu, Zhenyu
Pei, Dan
description Automatic failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., using either metrics, logs, or traces). In this study, we conduct an empirical study using real-world failure cases to show that combining these sources of data (multimodal data) leads to a more accurate diagnosis. However, effectively representing these data and addressing imbalanced failures remain challenging. To tackle these issues, we propose DiagFusion, a robust failure diagnosis approach that uses multimodal data. It leverages embedding techniques and data augmentation to represent the multimodal data of service instances, combines deployment data and traces to build a dependency graph, and uses a graph neural network to localize the root cause instance and determine the failure type. Our evaluations using real-world datasets show that DiagFusion outperforms existing methods in terms of root cause instance localization (improving by 20.9% to 368%) and failure type determination (improving by 11.0% to 169%).
doi_str_mv 10.48550/arxiv.2302.10512
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2302_10512</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2302_10512</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-7884b1f00ea07af787e085a439013a85c52a1b9d132c790ca4d13020e95574e13</originalsourceid><addsrcrecordid>eNotz71OwzAUhmEvDKhwAUz4BhKO_7AzopYWpFZI0D06cU9aSwlGtlPRuwdKp--dPulh7E5ArZ0x8IDpOxxrqUDWAoyQ12z1HrspF77EMEyJ-CLg_jPmkHns-Sb4FDOlY_DEP0650MjLIcVpf-CbaShhjDsc-AIL3rCrHodMt5edse3yeTt_qdZvq9f507rCRysr65zuRA9ACBZ76yyBM6hVA0KhM95IFF2zE0p624BH_ZsggRpjrCahZuz-__Ysab9SGDGd2j9RexapHzC5RWw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Robust Failure Diagnosis of Microservice System through Multimodal Data</title><source>arXiv.org</source><creator>Zhang, Shenglin ; Jin, Pengxiang ; Lin, Zihan ; Sun, Yongqian ; Zhang, Bicheng ; Xia, Sibo ; Li, Zhengdan ; Zhong, Zhenyu ; Ma, Minghua ; Jin, Wa ; Zhang, Dai ; Zhu, Zhenyu ; Pei, Dan</creator><creatorcontrib>Zhang, Shenglin ; Jin, Pengxiang ; Lin, Zihan ; Sun, Yongqian ; Zhang, Bicheng ; Xia, Sibo ; Li, Zhengdan ; Zhong, Zhenyu ; Ma, Minghua ; Jin, Wa ; Zhang, Dai ; Zhu, Zhenyu ; Pei, Dan</creatorcontrib><description>Automatic failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., using either metrics, logs, or traces). In this study, we conduct an empirical study using real-world failure cases to show that combining these sources of data (multimodal data) leads to a more accurate diagnosis. However, effectively representing these data and addressing imbalanced failures remain challenging. To tackle these issues, we propose DiagFusion, a robust failure diagnosis approach that uses multimodal data. 
It leverages embedding techniques and data augmentation to represent the multimodal data of service instances, combines deployment data and traces to build a dependency graph, and uses a graph neural network to localize the root cause instance and determine the failure type. Our evaluations using real-world datasets show that DiagFusion outperforms existing methods in terms of root cause instance localization (improving by 20.9% to 368%) and failure type determination (improving by 11.0% to 169%).</description><identifier>DOI: 10.48550/arxiv.2302.10512</identifier><language>eng</language><subject>Computer Science - Software Engineering</subject><creationdate>2023-02</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,886</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2302.10512$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2302.10512$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Shenglin</creatorcontrib><creatorcontrib>Jin, Pengxiang</creatorcontrib><creatorcontrib>Lin, Zihan</creatorcontrib><creatorcontrib>Sun, Yongqian</creatorcontrib><creatorcontrib>Zhang, Bicheng</creatorcontrib><creatorcontrib>Xia, Sibo</creatorcontrib><creatorcontrib>Li, Zhengdan</creatorcontrib><creatorcontrib>Zhong, Zhenyu</creatorcontrib><creatorcontrib>Ma, Minghua</creatorcontrib><creatorcontrib>Jin, Wa</creatorcontrib><creatorcontrib>Zhang, Dai</creatorcontrib><creatorcontrib>Zhu, Zhenyu</creatorcontrib><creatorcontrib>Pei, Dan</creatorcontrib><title>Robust Failure Diagnosis of Microservice System through Multimodal Data</title><description>Automatic 
failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., using either metrics, logs, or traces). In this study, we conduct an empirical study using real-world failure cases to show that combining these sources of data (multimodal data) leads to a more accurate diagnosis. However, effectively representing these data and addressing imbalanced failures remain challenging. To tackle these issues, we propose DiagFusion, a robust failure diagnosis approach that uses multimodal data. It leverages embedding techniques and data augmentation to represent the multimodal data of service instances, combines deployment data and traces to build a dependency graph, and uses a graph neural network to localize the root cause instance and determine the failure type. Our evaluations using real-world datasets show that DiagFusion outperforms existing methods in terms of root cause instance localization (improving by 20.9% to 368%) and failure type determination (improving by 11.0% to 169%).</description><subject>Computer Science - Software Engineering</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz71OwzAUhmEvDKhwAUz4BhKO_7AzopYWpFZI0D06cU9aSwlGtlPRuwdKp--dPulh7E5ArZ0x8IDpOxxrqUDWAoyQ12z1HrspF77EMEyJ-CLg_jPmkHns-Sb4FDOlY_DEP0650MjLIcVpf-CbaShhjDsc-AIL3rCrHodMt5edse3yeTt_qdZvq9f507rCRysr65zuRA9ACBZ76yyBM6hVA0KhM95IFF2zE0p624BH_ZsggRpjrCahZuz-__Ysab9SGDGd2j9RexapHzC5RWw</recordid><startdate>20230221</startdate><enddate>20230221</enddate><creator>Zhang, Shenglin</creator><creator>Jin, Pengxiang</creator><creator>Lin, Zihan</creator><creator>Sun, Yongqian</creator><creator>Zhang, Bicheng</creator><creator>Xia, Sibo</creator><creator>Li, Zhengdan</creator><creator>Zhong, Zhenyu</creator><creator>Ma, Minghua</creator><creator>Jin, Wa</creator><creator>Zhang, Dai</creator><creator>Zhu, 
Zhenyu</creator><creator>Pei, Dan</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230221</creationdate><title>Robust Failure Diagnosis of Microservice System through Multimodal Data</title><author>Zhang, Shenglin ; Jin, Pengxiang ; Lin, Zihan ; Sun, Yongqian ; Zhang, Bicheng ; Xia, Sibo ; Li, Zhengdan ; Zhong, Zhenyu ; Ma, Minghua ; Jin, Wa ; Zhang, Dai ; Zhu, Zhenyu ; Pei, Dan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-7884b1f00ea07af787e085a439013a85c52a1b9d132c790ca4d13020e95574e13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Software Engineering</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Shenglin</creatorcontrib><creatorcontrib>Jin, Pengxiang</creatorcontrib><creatorcontrib>Lin, Zihan</creatorcontrib><creatorcontrib>Sun, Yongqian</creatorcontrib><creatorcontrib>Zhang, Bicheng</creatorcontrib><creatorcontrib>Xia, Sibo</creatorcontrib><creatorcontrib>Li, Zhengdan</creatorcontrib><creatorcontrib>Zhong, Zhenyu</creatorcontrib><creatorcontrib>Ma, Minghua</creatorcontrib><creatorcontrib>Jin, Wa</creatorcontrib><creatorcontrib>Zhang, Dai</creatorcontrib><creatorcontrib>Zhu, Zhenyu</creatorcontrib><creatorcontrib>Pei, Dan</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Shenglin</au><au>Jin, Pengxiang</au><au>Lin, Zihan</au><au>Sun, Yongqian</au><au>Zhang, Bicheng</au><au>Xia, Sibo</au><au>Li, Zhengdan</au><au>Zhong, Zhenyu</au><au>Ma, Minghua</au><au>Jin, Wa</au><au>Zhang, Dai</au><au>Zhu, Zhenyu</au><au>Pei, Dan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Robust Failure Diagnosis of Microservice System through Multimodal 
Data</atitle><date>2023-02-21</date><risdate>2023</risdate><abstract>Automatic failure diagnosis is crucial for large microservice systems. Currently, most failure diagnosis methods rely solely on single-modal data (i.e., using either metrics, logs, or traces). In this study, we conduct an empirical study using real-world failure cases to show that combining these sources of data (multimodal data) leads to a more accurate diagnosis. However, effectively representing these data and addressing imbalanced failures remain challenging. To tackle these issues, we propose DiagFusion, a robust failure diagnosis approach that uses multimodal data. It leverages embedding techniques and data augmentation to represent the multimodal data of service instances, combines deployment data and traces to build a dependency graph, and uses a graph neural network to localize the root cause instance and determine the failure type. Our evaluations using real-world datasets show that DiagFusion outperforms existing methods in terms of root cause instance localization (improving by 20.9% to 368%) and failure type determination (improving by 11.0% to 169%).</abstract><doi>10.48550/arxiv.2302.10512</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2302.10512
language eng
recordid cdi_arxiv_primary_2302_10512
source arXiv.org
subjects Computer Science - Software Engineering
title Robust Failure Diagnosis of Microservice System through Multimodal Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T13%3A58%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Failure%20Diagnosis%20of%20Microservice%20System%20through%20Multimodal%20Data&rft.au=Zhang,%20Shenglin&rft.date=2023-02-21&rft_id=info:doi/10.48550/arxiv.2302.10512&rft_dat=%3Carxiv_GOX%3E2302_10512%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true