Detecting Outliers in Non-IID Data: A Systematic Literature Review

Outlier detection (outlier and anomaly are used interchangeably in this review) in non-independent and identically distributed (non-IID) data refers to identifying unusual or unexpected observations in datasets that do not follow an independent and identically distributed (IID) assumption. This pres...

Detailed description

Bibliographic details
Published in: IEEE Access, 2023, Vol. 11, pp. 70333-70352
Main authors: Siddiqi, Shafaq, Qureshi, Faiza, Lindstaedt, Stefanie, Kern, Roman
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 70352
container_issue
container_start_page 70333
container_title IEEE access
container_volume 11
creator Siddiqi, Shafaq
Qureshi, Faiza
Lindstaedt, Stefanie
Kern, Roman
description Outlier detection (outlier and anomaly are used interchangeably in this review) in non-independent and identically distributed (non-IID) data refers to identifying unusual or unexpected observations in datasets that do not follow an independent and identically distributed (IID) assumption. This presents a challenge in real-world datasets where correlations, dependencies, and complex structures are common. In recent literature, several methods have been proposed to address this issue; each method has its own strengths and limitations, and the selection depends on the data characteristics and application requirements. However, there is a lack of a comprehensive categorization of these methods in the literature. This study aims to systematically review outlier detection methods for non-IID data published between 2015 and 2023. This study focuses on three major aspects: data characteristics, methods, and evaluation measures. In data characteristics, we discuss the differentiating properties of non-IID data. Then we review the recent methods proposed for outlier detection in non-IID data, covering their theoretical foundations and algorithmic approaches. Finally, we discuss the evaluation metrics proposed to measure the performance of these methods. Additionally, we present a taxonomy for organizing these methods and highlight the application domains of outlier detection in non-IID categorical data, outlier detection in federated learning, and outlier detection in attribute graphs. We provide a comprehensive overview of datasets used in the selected literature. Moreover, we discuss open challenges in outlier detection for non-IID data to shed light on future research directions. By synthesizing the existing literature, this study contributes to advancing the understanding and development of outlier detection techniques in non-IID data settings.
doi_str_mv 10.1109/ACCESS.2023.3294096
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2023_3294096</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10177747</ieee_id><doaj_id>oai_doaj_org_article_f15dc215a71548f792728ed0b913a29f</doaj_id><sourcerecordid>2842169127</sourcerecordid><originalsourceid>FETCH-LOGICAL-c359t-62ab58c58a744ba4374fcab8c3c5458906f23021a6d6d6e8aa89ab6d6ff0a1333</originalsourceid><addsrcrecordid>eNpNkE1LAzEQhhdRUNRfoIcFz1vzuUm81bZqoViweg6z6URSalezqeK_N3VFOnOYYZj3neEpigtKBpQScz0cjSaLxYARxgecGUFMfVCcMFqbikteH-71x8V5161IDp1HUp0Ut2NM6FLYvJbzbVoHjF0ZNuVju6mm03E5hgQ35bBcfHcJ3yAFV85CwghpG7F8ws-AX2fFkYd1h-d_9bR4uZs8jx6q2fx-OhrOKselSVXNoJHaSQ1KiAYEV8I7aLTjTgqpDak944RRqJc5UQNoA01uvSdAOeenxbT3Xbawsu8xvEH8ti0E-zto46uFmD9co_VULh2jEhSVQntlmGIal6QxlAMzPntd9V7vsf3YYpfsqt3GTX7fMi12vChTeYv3Wy62XRfR_1-lxO7Y25693bG3f-yz6rJXBUTcU1CllFD8BwSbfZ0</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2842169127</pqid></control><display><type>article</type><title>Detecting Outliers in Non-IID Data: A Systematic Literature Review</title><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>IEEE Xplore Open Access Journals</source><creator>Siddiqi, Shafaq ; Qureshi, Faiza ; Lindstaedt, Stefanie ; Kern, Roman</creator><creatorcontrib>Siddiqi, Shafaq ; Qureshi, Faiza ; Lindstaedt, Stefanie ; Kern, Roman</creatorcontrib><description>Outlier detection (outlier and anomaly are used interchangeably in this review) in non-independent and identically distributed (non-IID) data refers to identifying unusual or unexpected observations in datasets that do not follow an independent and identically distributed (IID) assumption. This presents a challenge in real-world datasets where correlations, dependencies, and complex structures are common. 
In recent literature, several methods have been proposed to address this issue and each method has its own strengths and limitations, and the selection depends on the data characteristics and application requirements. However, there is a lack of a comprehensive categorization of these methods in the literature. This study aims to systematically review outlier detection methods for non-IID data published between 2015 and 2023. This study focuses on three major aspects; data characteristics, methods, and evaluation measures. In data characteristics, we discuss the differentiating properties of non-IID data. Then we review the recent methods proposed for outlier detection in non-IID data, covering their theoretical foundations and algorithmic approaches. Finally, we discuss the evaluation metrics proposed to measure the performance of these methods. Additionally, we present a taxonomy for organizing these methods and highlight the application domain of outlier detection in non-IID categorical data, outlier detection in federated learning, and outlier detection in attribute graphs. We provide a comprehensive overview of datasets used in the selected literature. Moreover, we discuss open challenges in outlier detection for non-IID to shed light on future research directions. 
By synthesizing the existing literature, this study contributes to advancing the understanding and development of outlier detection techniques in non-IID data settings.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2023.3294096</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Anomaly detection ; Behavioral sciences ; Couplings ; Data analysis ; data dependency ; Data models ; Datasets ; Feature extraction ; heterogeneous data ; Literature reviews ; non-IID data ; Outlier detection ; Outliers (statistics) ; Supervised learning ; Taxonomy ; Unsupervised learning</subject><ispartof>IEEE access, 2023, Vol.11, p.70333-70352</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c359t-62ab58c58a744ba4374fcab8c3c5458906f23021a6d6d6e8aa89ab6d6ff0a1333</cites><orcidid>0000-0003-0202-6100 ; 0000-0003-3039-2255 ; 0000-0003-0031-9911 ; 0000-0002-5414-286X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10177747$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2096,4010,27610,27900,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Siddiqi, Shafaq</creatorcontrib><creatorcontrib>Qureshi, Faiza</creatorcontrib><creatorcontrib>Lindstaedt, Stefanie</creatorcontrib><creatorcontrib>Kern, Roman</creatorcontrib><title>Detecting Outliers in Non-IID Data: A Systematic Literature Review</title><title>IEEE access</title><addtitle>Access</addtitle><description>Outlier detection (outlier and anomaly are used interchangeably in this review) in 
non-independent and identically distributed (non-IID) data refers to identifying unusual or unexpected observations in datasets that do not follow an independent and identically distributed (IID) assumption. This presents a challenge in real-world datasets where correlations, dependencies, and complex structures are common. In recent literature, several methods have been proposed to address this issue and each method has its own strengths and limitations, and the selection depends on the data characteristics and application requirements. However, there is a lack of a comprehensive categorization of these methods in the literature. This study aims to systematically review outlier detection methods for non-IID data published between 2015 and 2023. This study focuses on three major aspects; data characteristics, methods, and evaluation measures. In data characteristics, we discuss the differentiating properties of non-IID data. Then we review the recent methods proposed for outlier detection in non-IID data, covering their theoretical foundations and algorithmic approaches. Finally, we discuss the evaluation metrics proposed to measure the performance of these methods. Additionally, we present a taxonomy for organizing these methods and highlight the application domain of outlier detection in non-IID categorical data, outlier detection in federated learning, and outlier detection in attribute graphs. We provide a comprehensive overview of datasets used in the selected literature. Moreover, we discuss open challenges in outlier detection for non-IID to shed light on future research directions. 
By synthesizing the existing literature, this study contributes to advancing the understanding and development of outlier detection techniques in non-IID data settings.</description><subject>Anomaly detection</subject><subject>Behavioral sciences</subject><subject>Couplings</subject><subject>Data analysis</subject><subject>data dependency</subject><subject>Data models</subject><subject>Datasets</subject><subject>Feature extraction</subject><subject>heterogeneous data</subject><subject>Literature reviews</subject><subject>non-IID data</subject><subject>Outlier detection</subject><subject>Outliers (statistics)</subject><subject>Supervised learning</subject><subject>Taxonomy</subject><subject>Unsupervised learning</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkE1LAzEQhhdRUNRfoIcFz1vzuUm81bZqoViweg6z6URSalezqeK_N3VFOnOYYZj3neEpigtKBpQScz0cjSaLxYARxgecGUFMfVCcMFqbikteH-71x8V5161IDp1HUp0Ut2NM6FLYvJbzbVoHjF0ZNuVju6mm03E5hgQ35bBcfHcJ3yAFV85CwghpG7F8ws-AX2fFkYd1h-d_9bR4uZs8jx6q2fx-OhrOKselSVXNoJHaSQ1KiAYEV8I7aLTjTgqpDak944RRqJc5UQNoA01uvSdAOeenxbT3Xbawsu8xvEH8ti0E-zto46uFmD9co_VULh2jEhSVQntlmGIal6QxlAMzPntd9V7vsf3YYpfsqt3GTX7fMi12vChTeYv3Wy62XRfR_1-lxO7Y25693bG3f-yz6rJXBUTcU1CllFD8BwSbfZ0</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Siddiqi, Shafaq</creator><creator>Qureshi, Faiza</creator><creator>Lindstaedt, Stefanie</creator><creator>Kern, Roman</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-0202-6100</orcidid><orcidid>https://orcid.org/0000-0003-3039-2255</orcidid><orcidid>https://orcid.org/0000-0003-0031-9911</orcidid><orcidid>https://orcid.org/0000-0002-5414-286X</orcidid></search><sort><creationdate>2023</creationdate><title>Detecting Outliers in Non-IID Data: A Systematic Literature Review</title><author>Siddiqi, Shafaq ; Qureshi, Faiza ; Lindstaedt, Stefanie ; Kern, Roman</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c359t-62ab58c58a744ba4374fcab8c3c5458906f23021a6d6d6e8aa89ab6d6ff0a1333</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Anomaly detection</topic><topic>Behavioral sciences</topic><topic>Couplings</topic><topic>Data analysis</topic><topic>data dependency</topic><topic>Data models</topic><topic>Datasets</topic><topic>Feature extraction</topic><topic>heterogeneous data</topic><topic>Literature reviews</topic><topic>non-IID data</topic><topic>Outlier detection</topic><topic>Outliers (statistics)</topic><topic>Supervised learning</topic><topic>Taxonomy</topic><topic>Unsupervised learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Siddiqi, Shafaq</creatorcontrib><creatorcontrib>Qureshi, Faiza</creatorcontrib><creatorcontrib>Lindstaedt, Stefanie</creatorcontrib><creatorcontrib>Kern, Roman</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 
1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Siddiqi, Shafaq</au><au>Qureshi, Faiza</au><au>Lindstaedt, Stefanie</au><au>Kern, Roman</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Detecting Outliers in Non-IID Data: A Systematic Literature Review</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2023</date><risdate>2023</risdate><volume>11</volume><spage>70333</spage><epage>70352</epage><pages>70333-70352</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Outlier detection (outlier and anomaly are used interchangeably in this review) in non-independent and identically distributed (non-IID) data refers to identifying unusual or unexpected observations in datasets that do not follow an independent and identically distributed (IID) assumption. This presents a challenge in real-world datasets where correlations, dependencies, and complex structures are common. 
In recent literature, several methods have been proposed to address this issue and each method has its own strengths and limitations, and the selection depends on the data characteristics and application requirements. However, there is a lack of a comprehensive categorization of these methods in the literature. This study aims to systematically review outlier detection methods for non-IID data published between 2015 and 2023. This study focuses on three major aspects; data characteristics, methods, and evaluation measures. In data characteristics, we discuss the differentiating properties of non-IID data. Then we review the recent methods proposed for outlier detection in non-IID data, covering their theoretical foundations and algorithmic approaches. Finally, we discuss the evaluation metrics proposed to measure the performance of these methods. Additionally, we present a taxonomy for organizing these methods and highlight the application domain of outlier detection in non-IID categorical data, outlier detection in federated learning, and outlier detection in attribute graphs. We provide a comprehensive overview of datasets used in the selected literature. Moreover, we discuss open challenges in outlier detection for non-IID to shed light on future research directions. By synthesizing the existing literature, this study contributes to advancing the understanding and development of outlier detection techniques in non-IID data settings.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2023.3294096</doi><tpages>20</tpages><orcidid>https://orcid.org/0000-0003-0202-6100</orcidid><orcidid>https://orcid.org/0000-0003-3039-2255</orcidid><orcidid>https://orcid.org/0000-0003-0031-9911</orcidid><orcidid>https://orcid.org/0000-0002-5414-286X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2023, Vol.11, p.70333-70352
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2023_3294096
source DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; IEEE Xplore Open Access Journals
subjects Anomaly detection
Behavioral sciences
Couplings
Data analysis
data dependency
Data models
Datasets
Feature extraction
heterogeneous data
Literature reviews
non-IID data
Outlier detection
Outliers (statistics)
Supervised learning
Taxonomy
Unsupervised learning
title Detecting Outliers in Non-IID Data: A Systematic Literature Review
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T05%3A59%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Detecting%20Outliers%20in%20Non-IID%20Data:%20A%20Systematic%20Literature%20Review&rft.jtitle=IEEE%20access&rft.au=Siddiqi,%20Shafaq&rft.date=2023&rft.volume=11&rft.spage=70333&rft.epage=70352&rft.pages=70333-70352&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3294096&rft_dat=%3Cproquest_cross%3E2842169127%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2842169127&rft_id=info:pmid/&rft_ieee_id=10177747&rft_doaj_id=oai_doaj_org_article_f15dc215a71548f792728ed0b913a29f&rfr_iscdi=true