MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications

Bibliographic details
Published in: IEEE Access 2024-01, Vol. 12, p. 1-1
Main authors: Tsubouchi, Yuuki; Tsuruta, Hirofumi
Format: Article
Language: English
Online access: Full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE Access
container_volume 12
creator Tsubouchi, Yuuki
Tsuruta, Hirofumi
description Automated fault localization in large-scale cloud-based applications is challenging because it involves mining multivariate time series data from large volumes of operational monitoring metrics. To improve localization accuracy, automated fault localization methods incorporate feature reduction to filter out monitoring metrics unrelated to a failure. However, these methods suffer from inaccuracy, either by removing too many failure-related metrics or by retaining too many failure-unrelated metrics. In this paper, we present MetricSifter, a feature reduction framework designed to accurately identify anomalous metrics caused by faults. Our framework locates the failure time window with the highest density of fault-induced change point times across monitoring metrics, exploiting their temporal proximity. Experimental results indicate that MetricSifter achieves an accuracy of 0.981, significantly higher than that of the selected baseline methods. Furthermore, experiments combining various reduction methods with various localization methods demonstrate that MetricSifter improves recall and time efficiency over the baseline methods.
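The core idea in the description is to find the time window where fault-induced change points cluster most densely and keep only the metrics that change inside it. It can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' MetricSifter implementation. The function names `change_point` and `sift_metrics`, the parameters `window` and `min_shift`, and the example metric names are all hypothetical. A single mean-shift split stands in for whatever change point detector the paper uses, and a fixed-width sliding-window count stands in for its density estimation.

```python
from statistics import mean


def change_point(series, min_shift=1.0):
    """Return the index of the single best mean-shift split of `series`,
    or None if the series is too short or the shift is below `min_shift`.
    Stand-in detector (assumption): one change point per metric."""
    n = len(series)
    best_idx, best_score, best_shift = None, 0.0, 0.0
    # Each candidate split keeps at least two points on each side.
    for i in range(2, n - 1):
        shift = mean(series[i:]) - mean(series[:i])
        # Between-segment sum of squares: largest at the most pronounced shift.
        score = i * (n - i) / n * shift ** 2
        if score > best_score:
            best_idx, best_score, best_shift = i, score, shift
    if best_idx is None or abs(best_shift) < min_shift:
        return None
    return best_idx


def sift_metrics(metrics, window=5, min_shift=1.0):
    """Keep only the metrics whose change point falls inside the densest
    time window of width `window` (in samples). Hypothetical helper."""
    cps = {}
    for name, series in metrics.items():
        cp = change_point(series, min_shift)
        if cp is not None:
            cps[name] = cp
    if not cps:
        return []
    times = sorted(cps.values())
    # Slide a window starting at each observed change point; keep the
    # start that covers the most change points (earliest wins ties).
    best_start, best_count = times[0], 0
    for t in times:
        count = sum(1 for u in times if t <= u < t + window)
        if count > best_count:
            best_start, best_count = t, count
    return sorted(name for name, cp in cps.items()
                  if best_start <= cp < best_start + window)


# Hypothetical metrics sampled on a shared 30-step time grid.
metrics = {
    "cpu_api": [1] * 20 + [9] * 10,       # shifts at t=20
    "latency_api": [2] * 21 + [12] * 9,   # shifts at t=21
    "disk_batch": [3] * 5 + [8] * 25,     # unrelated earlier shift at t=5
    "mem_db": [5] * 30,                   # no change
}
print(sift_metrics(metrics))  # ['cpu_api', 'latency_api']
```

In this toy example, the isolated early shift in `disk_batch` is discarded because it lies outside the densest window. The two temporally close shifts are kept. A practical system would likely need a sturdier detector that handles noise and multiple change points per metric, and a principled way to choose the window width.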
doi_str_mv 10.1109/ACCESS.2024.3374334
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE Access, 2024-01, Vol. 12, p. 1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2024_3374334
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects AIOps
Automation
Cloud computing
Failure Management
Failure times
Fault detection
Fault Localization
Fault location
Incident Response
Localization
Location awareness
Measurement
Monitoring
Multivariate analysis
Reduction
Redundancy
Site Reliability Engineering
Task analysis
Time series
Time series analysis
Visualization
Windows (intervals)
title MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T04%3A35%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MetricSifter:%20Feature%20Reduction%20of%20Multivariate%20Time%20Series%20Data%20for%20Efficient%20Fault%20Localization%20in%20Cloud%20Applications&rft.jtitle=IEEE%20access&rft.au=Tsubouchi,%20Yuuki&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3374334&rft_dat=%3Cproquest_cross%3E2956886554%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2956886554&rft_id=info:pmid/&rft_ieee_id=10462133&rft_doaj_id=oai_doaj_org_article_0fec4f42f1ab4d6da4c91f30aa8f71cf&rfr_iscdi=true