Detecting Outliers in Non-IID Data: A Systematic Literature Review
Outlier detection (outlier and anomaly are used interchangeably in this review) in non-independent and identically distributed (non-IID) data refers to identifying unusual or unexpected observations in datasets that do not follow an independent and identically distributed (IID) assumption. This pres...
Published in: | IEEE access 2023, Vol.11, p.70333-70352 |
---|---|
Main authors: | Siddiqi, Shafaq; Qureshi, Faiza; Lindstaedt, Stefanie; Kern, Roman |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 70352 |
---|---|
container_issue | |
container_start_page | 70333 |
container_title | IEEE access |
container_volume | 11 |
creator | Siddiqi, Shafaq; Qureshi, Faiza; Lindstaedt, Stefanie; Kern, Roman |
description | Outlier detection (outlier and anomaly are used interchangeably in this review) in non-independent and identically distributed (non-IID) data refers to identifying unusual or unexpected observations in datasets that do not follow the independent and identically distributed (IID) assumption. This presents a challenge in real-world datasets, where correlations, dependencies, and complex structures are common. Several methods have been proposed in recent literature to address this issue. Each has its own strengths and limitations, and the choice among them depends on the data characteristics and application requirements. However, the literature lacks a comprehensive categorization of these methods. This study systematically reviews outlier detection methods for non-IID data published between 2015 and 2023, focusing on three major aspects: data characteristics, methods, and evaluation measures. Under data characteristics, we discuss the differentiating properties of non-IID data. We then review the recent methods proposed for outlier detection in non-IID data, covering their theoretical foundations and algorithmic approaches. Finally, we discuss the evaluation metrics proposed to measure the performance of these methods. Additionally, we present a taxonomy for organizing these methods and highlight the application domains of outlier detection in non-IID categorical data, outlier detection in federated learning, and outlier detection in attribute graphs. We provide a comprehensive overview of the datasets used in the selected literature. Moreover, we discuss open challenges in outlier detection for non-IID data to shed light on future research directions. By synthesizing the existing literature, this study contributes to advancing the understanding and development of outlier detection techniques in non-IID data settings. |
doi_str_mv | 10.1109/ACCESS.2023.3294096 |
format | Article |
publisher | Piscataway: IEEE |
coden | IAECCG |
ieee_id | 10177747 |
doaj_id | oai_doaj_org_article_f15dc215a71548f792728ed0b913a29f |
pqid | 2842169127 |
linktohtml | https://ieeexplore.ieee.org/document/10177747 |
oa | free_for_read |
orcid | 0000-0003-0202-6100; 0000-0003-3039-2255; 0000-0003-0031-9911; 0000-0002-5414-286X |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
toplevel | peer_reviewed; online_resources |
tpages | 20 |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE Xplore Open Access Journals; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Engineered Materials Abstracts; METADEX; Technology Research Database; Materials Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; DOAJ Directory of Open Access Journals |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2023, Vol.11, p.70333-70352 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2023_3294096 |
source | DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; IEEE Xplore Open Access Journals |
subjects | Anomaly detection; Behavioral sciences; Couplings; Data analysis; data dependency; Data models; Datasets; Feature extraction; heterogeneous data; Literature reviews; non-IID data; Outlier detection; Outliers (statistics); Supervised learning; Taxonomy; Unsupervised learning |
title | Detecting Outliers in Non-IID Data: A Systematic Literature Review |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T05%3A59%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Detecting%20Outliers%20in%20Non-IID%20Data:%20A%20Systematic%20Literature%20Review&rft.jtitle=IEEE%20access&rft.au=Siddiqi,%20Shafaq&rft.date=2023&rft.volume=11&rft.spage=70333&rft.epage=70352&rft.pages=70333-70352&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3294096&rft_dat=%3Cproquest_cross%3E2842169127%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2842169127&rft_id=info:pmid/&rft_ieee_id=10177747&rft_doaj_id=oai_doaj_org_article_f15dc215a71548f792728ed0b913a29f&rfr_iscdi=true |