Toward Automated Anomaly Identification in Large-Scale Systems

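When a system fails to function properly, health-related data are collected for troubleshooting. However, it is challenging to effectively identify anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and even worse, not scalable. In this paper, we present an automated mechanism for node-level anomaly identification in large-scale systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. Moreover, we compare two techniques, principal component analysis (PCA) and independent component analysis (ICA), for feature extraction. We evaluate our prototype implementation by injecting a variety of faults into a production system at NCSA. The results show that our mechanism, in particular, the one using ICA-based feature extraction, can effectively identify faulty nodes with high accuracy and low computation overhead.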
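
The abstract outlines a three-stage pipeline: transform the collected data into a uniform format, extract a small number of features, and use unsupervised learning to flag nodes that behave differently from the rest. The Python sketch that follows illustrates that pipeline under stated assumptions and is not the authors' implementation. It assumes the data have already been assembled into a node-by-metric matrix (the transformation step is reduced here to per-metric z-scoring), computes PCA by power iteration, and replaces the paper's detector with a simple stand-in rule: flag nodes whose distance from the coordinate-wise median is more than `z` standard deviations above the mean distance. All function names and the sample metrics are hypothetical.

```python
import math
import statistics
from typing import List, Sequence

Matrix = List[List[float]]


def standardize(rows: Sequence[Sequence[float]]) -> Matrix:
    """Stand-in for the data-transformation step: z-score each metric (column)
    so that metrics measured in different units become comparable."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant metric carries no information; dividing by 1 avoids 0/0.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(x: Matrix) -> Matrix:
    """Covariance matrix of already-centred data (rows = nodes, cols = metrics)."""
    n, d = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / n for j in range(d)] for i in range(d)]


def top_components(cov: Matrix, k: int, iters: int = 500) -> Matrix:
    """Leading k eigenvectors of a symmetric PSD matrix, found by power
    iteration with Hotelling deflation (adequate for a handful of metrics)."""
    d = len(cov)
    a = [row[:] for row in cov]
    comps: Matrix = []
    for _ in range(k):
        start = [1.0 + 0.1 * i for i in range(d)]  # deterministic, non-uniform
        s = math.sqrt(sum(t * t for t in start))
        v = [t / s for t in start]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # nothing left to explain in the remaining matrix
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(x: Matrix, comps: Matrix) -> Matrix:
    """Feature extraction: each node's coordinates on the k components."""
    return [[sum(xi * ci for xi, ci in zip(row, c)) for c in comps] for row in x]


def flag_outliers(features: Matrix, z: float = 2.0) -> List[int]:
    """Return indices of nodes whose distance from the coordinate-wise median
    of all nodes is more than z standard deviations above the mean distance."""
    centre = [statistics.median(col) for col in zip(*features)]
    dist = [math.dist(f, centre) for f in features]
    mu, sd = statistics.fmean(dist), statistics.pstdev(dist)
    if sd == 0.0:
        return []  # all nodes equally far from the centre: none stands out
    return [i for i, dd in enumerate(dist) if (dd - mu) / sd > z]


def find_anomalous_nodes(
    rows: Sequence[Sequence[float]], k: int = 2, z: float = 2.0
) -> List[int]:
    """Hypothetical end-to-end pipeline: transform, extract features, detect."""
    x = standardize(rows)
    comps = top_components(covariance(x), k)
    return flag_outliers(project(x, comps), z)


# Hypothetical per-node metrics: [cpu %, memory %, network MB/s, disk MB/s].
metrics = [
    [50, 30, 10, 5],
    [52, 31, 11, 5],
    [49, 29, 10, 6],
    [51, 30, 9, 5],
    [50, 32, 10, 4],
    [48, 30, 11, 5],
    [51, 29, 10, 5],
    [95, 90, 60, 40],  # node 7: synthetic "faulty" node
]
print(find_anomalous_nodes(metrics))  # [7]
```

Because the later stages only consume the projected feature matrix, the PCA step (`top_components` and `project`) could in principle be swapped for an ICA-based extractor, which the abstract reports as the more accurate option, without changing the detector. With a single outlier among n nodes, a z-score on distances can never exceed sqrt(n - 1), so the threshold `z` must stay below that bound for small clusters.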

Bibliographic details
Published in:	IEEE transactions on parallel and distributed systems, 2010-02, Vol.21 (2), p.174-187
Main authors:	Lan, Zhiling; Zheng, Ziming; Li, Yawei
Format:	Article
Language:	English
Subjects:	Anomalies; Anomaly identification; Application software; Automated; Automation; Computer errors; Computer networks; Data analysis; Fault diagnosis; Feature extraction; Format; Independent component analysis; Large-scale systems; outlier detection; Principal component analysis; Principal components analysis; Production systems; Studies; Transformations; Unsupervised learning

container_end_page	187
container_issue	2
container_start_page	174
container_title	IEEE transactions on parallel and distributed systems
container_volume	21
creator	Lan, Zhiling; Zheng, Ziming; Li, Yawei
description	When a system fails to function properly, health-related data are collected for troubleshooting. However, it is challenging to effectively identify anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and even worse, not scalable. In this paper, we present an automated mechanism for node-level anomaly identification in large-scale systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. Moreover, we compare two techniques, principal component analysis (PCA) and independent component analysis (ICA), for feature extraction. We evaluate our prototype implementation by injecting a variety of faults into a production system at NCSA. The results show that our mechanism, in particular, the one using ICA-based feature extraction, can effectively identify faulty nodes with high accuracy and low computation overhead.
doi_str_mv	10.1109/TPDS.2009.52
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_912031664</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4815224</ieee_id><sourcerecordid>2543477621</sourcerecordid><originalsourceid>FETCH-LOGICAL-c352t-aa791faeed728daa9d3f93138da63ca843c64a30d8f599f2baecd24066396a623</originalsourceid><addsrcrecordid>eNpdkEtLw0AQgBdRsFZv3rwEL15M3Xd2L0Kpr0JBofW8jJuJpORRdxOk_96EigdP8x0-ZoaPkEtGZ4xRe7d5e1jPOKV2pvgRmTClTMqZEccDU6lSy5k9JWcxbillUlE5Ifeb9htCnsz7rq2hw4GaAap9ssyx6cqi9NCVbZOUTbKC8Inp2kOFyXofO6zjOTkpoIp48Tun5P3pcbN4SVevz8vFfJV6oXiXAmSWFYCYZ9zkADYXhRVMDKyFByOF1xIEzU2hrC34B6DPuaRaC6tBczElN4e9u9B-9Rg7V5fRY1VBg20fnckU5SyzajCv_5nbtg_N8JyzjFPBtJaDdHuQfGhjDFi4XShrCHvHqBtTujGlG1M6NV6_OuglIv6p0jDFuRQ_sO1ueg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>912031664</pqid></control><display><type>article</type><title>Toward Automated Anomaly Identification in Large-Scale Systems</title><source>IEEE Xplore</source><creator>Lan, Zhiling ; Zheng, Ziming ; Li, Yawei</creator><creatorcontrib>Lan, Zhiling ; Zheng, Ziming ; Li, Yawei</creatorcontrib><description>When a system fails to function properly, health-related data are collected for troubleshooting. However, it is challenging to effectively identify anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and even worse, not scalable. In this paper, we present an automated mechanism for node-level anomaly identification in large-scale systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. 
Moreover, we compare two techniques, principal component analysis (PCA) and independent component analysis (ICA), for feature extraction. We evaluate our prototype implementation by injecting a variety of faults into a production system at NCSA. The results show that our mechanism, in particular, the one using ICA-based feature extraction, can effectively identify faulty nodes with high accuracy and low computation overhead.</description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2009.52</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Anomalies ; Anomaly identification ; Application software ; Automated ; Automation ; Computer errors ; Computer networks ; Data analysis ; Fault diagnosis ; Feature extraction ; Format ; Independent component analysis ; Large-scale systems ; outlier detection ; Principal component analysis ; Principal components analysis ; Production systems ; Studies ; Transformations ; Unsupervised learning</subject><ispartof>IEEE transactions on parallel and distributed systems, 2010-02, Vol.21 (2), p.174-187</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Feb 2010</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c352t-aa791faeed728daa9d3f93138da63ca843c64a30d8f599f2baecd24066396a623</citedby><cites>FETCH-LOGICAL-c352t-aa791faeed728daa9d3f93138da63ca843c64a30d8f599f2baecd24066396a623</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4815224$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4815224$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lan, Zhiling</creatorcontrib><creatorcontrib>Zheng, Ziming</creatorcontrib><creatorcontrib>Li, Yawei</creatorcontrib><title>Toward Automated Anomaly Identification in Large-Scale Systems</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description>When a system fails to function properly, health-related data are collected for troubleshooting. However, it is challenging to effectively identify anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and even worse, not scalable. In this paper, we present an automated mechanism for node-level anomaly identification in large-scale systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. Moreover, we compare two techniques, principal component analysis (PCA) and independent component analysis (ICA), for feature extraction. 
We evaluate our prototype implementation by injecting a variety of faults into a production system at NCSA. The results show that our mechanism, in particular, the one using ICA-based feature extraction, can effectively identify faulty nodes with high accuracy and low computation overhead.</description><subject>Anomalies</subject><subject>Anomaly identification</subject><subject>Application software</subject><subject>Automated</subject><subject>Automation</subject><subject>Computer errors</subject><subject>Computer networks</subject><subject>Data analysis</subject><subject>Fault diagnosis</subject><subject>Feature extraction</subject><subject>Format</subject><subject>Independent component analysis</subject><subject>Large-scale systems</subject><subject>outlier detection</subject><subject>Principal component analysis</subject><subject>Principal components analysis</subject><subject>Production systems</subject><subject>Studies</subject><subject>Transformations</subject><subject>Unsupervised learning</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkEtLw0AQgBdRsFZv3rwEL15M3Xd2L0Kpr0JBofW8jJuJpORRdxOk_96EigdP8x0-ZoaPkEtGZ4xRe7d5e1jPOKV2pvgRmTClTMqZEccDU6lSy5k9JWcxbillUlE5Ifeb9htCnsz7rq2hw4GaAap9ssyx6cqi9NCVbZOUTbKC8Inp2kOFyXofO6zjOTkpoIp48Tun5P3pcbN4SVevz8vFfJV6oXiXAmSWFYCYZ9zkADYXhRVMDKyFByOF1xIEzU2hrC34B6DPuaRaC6tBczElN4e9u9B-9Rg7V5fRY1VBg20fnckU5SyzajCv_5nbtg_N8JyzjFPBtJaDdHuQfGhjDFi4XShrCHvHqBtTujGlG1M6NV6_OuglIv6p0jDFuRQ_sO1ueg</recordid><startdate>201002</startdate><enddate>201002</enddate><creator>Lan, Zhiling</creator><creator>Zheng, Ziming</creator><creator>Li, Yawei</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>201002</creationdate><title>Toward Automated Anomaly Identification in Large-Scale Systems</title><author>Lan, Zhiling ; Zheng, Ziming ; Li, Yawei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c352t-aa791faeed728daa9d3f93138da63ca843c64a30d8f599f2baecd24066396a623</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Anomalies</topic><topic>Anomaly identification</topic><topic>Application software</topic><topic>Automated</topic><topic>Automation</topic><topic>Computer errors</topic><topic>Computer networks</topic><topic>Data analysis</topic><topic>Fault diagnosis</topic><topic>Feature extraction</topic><topic>Format</topic><topic>Independent component analysis</topic><topic>Large-scale systems</topic><topic>outlier detection</topic><topic>Principal component analysis</topic><topic>Principal components analysis</topic><topic>Production systems</topic><topic>Studies</topic><topic>Transformations</topic><topic>Unsupervised learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lan, Zhiling</creatorcontrib><creatorcontrib>Zheng, Ziming</creatorcontrib><creatorcontrib>Li, Yawei</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lan, Zhiling</au><au>Zheng, Ziming</au><au>Li, Yawei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Toward Automated Anomaly Identification in Large-Scale Systems</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2010-02</date><risdate>2010</risdate><volume>21</volume><issue>2</issue><spage>174</spage><epage>187</epage><pages>174-187</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract>When a system fails to function properly, health-related data are collected for troubleshooting. However, it is challenging to effectively identify anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and even worse, not scalable. In this paper, we present an automated mechanism for node-level anomaly identification in large-scale systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. Moreover, we compare two techniques, principal component analysis (PCA) and independent component analysis (ICA), for feature extraction. 
We evaluate our prototype implementation by injecting a variety of faults into a production system at NCSA. The results show that our mechanism, in particular, the one using ICA-based feature extraction, can effectively identify faulty nodes with high accuracy and low computation overhead.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2009.52</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1045-9219
ispartof	IEEE transactions on parallel and distributed systems, 2010-02, Vol.21 (2), p.174-187
issn	1045-9219 1558-2183
language	eng
recordid	cdi_proquest_journals_912031664
source	IEEE Xplore
subjects	Anomalies; Anomaly identification; Application software; Automated; Automation; Computer errors; Computer networks; Data analysis; Fault diagnosis; Feature extraction; Format; Independent component analysis; Large-scale systems; outlier detection; Principal component analysis; Principal components analysis; Production systems; Studies; Transformations; Unsupervised learning
title	Toward Automated Anomaly Identification in Large-Scale Systems
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T14%3A09%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Toward%20Automated%20Anomaly%20Identification%20in%20Large-Scale%20Systems&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Lan,%20Zhiling&rft.date=2010-02&rft.volume=21&rft.issue=2&rft.spage=174&rft.epage=187&rft.pages=174-187&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2009.52&rft_dat=%3Cproquest_RIE%3E2543477621%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=912031664&rft_id=info:pmid/&rft_ieee_id=4815224&rfr_iscdi=true