A procedure for outlier identification in data sets from continuous distributions

We propose a procedure, based on sums of reciprocals of p-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models...
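The abstract only names the ingredients of the procedure, so the following is a minimal sketch under stated assumptions rather than the authors' exact statistic. It assumes a completely specified univariate N(0, 1) model, a two-sided p-value for each observation, and a plain unnormalised sum of reciprocals; the function names and the simulation settings are hypothetical. Under the model each p-value is uniform on (0, 1), so each reciprocal is at least 1 and heavy-tailed, and a single extreme observation dominates the sum.

```python
import math
import random


def two_sided_normal_p_value(x: float) -> float:
    """Two-sided p-value of x under an assumed, fully specified N(0, 1) model."""
    # erfc(|x| / sqrt(2)) equals 2 * (1 - Phi(|x|)).
    return math.erfc(abs(x) / math.sqrt(2.0))


def reciprocal_p_value_sum(sample) -> float:
    """Sum of 1 / p_i over the sample (illustrative form, an assumption)."""
    # Note: for |x| beyond roughly 38 the p-value underflows to 0.0 and
    # the division would fail; this sketch does not guard against that.
    return sum(1.0 / two_sided_normal_p_value(x) for x in sample)


def simulated_quantile(n: int, level: float, reps: int = 2000, seed: int = 0) -> float:
    """Approximate a null quantile of the sum by Monte Carlo simulation."""
    rng = random.Random(seed)
    sims = sorted(
        reciprocal_p_value_sum([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    # Empirical quantile: the order statistic at the requested level.
    return sims[min(reps - 1, int(level * reps))]


if __name__ == "__main__":
    clean = [0.1, -0.3, 0.5, 1.2, -0.8]
    contaminated = clean + [5.0]  # one gross outlier
    threshold = simulated_quantile(n=len(contaminated), level=0.95)
    print("clean sum:       ", reciprocal_p_value_sum(clean))
    print("contaminated sum:", reciprocal_p_value_sum(contaminated))
    print("simulated 95% threshold:", threshold)
```

Comparing the observed sum with a simulated quantile for the same sample size mirrors the abstract's remark that quantiles are obtained by simulation; the paper itself should be consulted for the actual normalisation and for the estimated-parameter case.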

Full description

Bibliographic details
Published in: Test (Madrid, Spain), 2004-06, Vol.13 (1), p.247-262
Main authors: Balakrishnan, N., Quiroz, A. J.
Format: Article
Language: eng
Subjects: Econometrics; Studies
Online access: Full text
container_end_page 262
container_issue 1
container_start_page 247
container_title Test (Madrid, Spain)
container_volume 13
creator Balakrishnan, N.
Quiroz, A. J.
description We propose a procedure, based on sums of reciprocals of p-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models. By simulations, we obtain approximate quantiles for the asymptotic distribution (which does not depend on the specific model or the dimension where the data live) and for the finite sample distribution of our statistic in different dimensions when parameters are estimated, for the multivariate Gaussian model and a multivariate double exponential model with independent coordinates. Monte Carlo evaluation shows that the proposed procedure is effective in the identification of outliers, and that it is sensitive to sample size, a feature seldom found in outlier identification methods.
doi_str_mv 10.1007/BF02603008
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1112393974</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2790608491</sourcerecordid><originalsourceid>FETCH-LOGICAL-c259t-92168cf44b208bb26960c01eee893f6f8b84ca3d22ea19e7ba1ea18beeaed7273</originalsourceid><addsrcrecordid>eNpFkE9LAzEQxYMoWKsXP0HAm7A6SdZscqzFqlAQQc9Lkp1ASrup-XPw27ulBU_zhvnx3vAIuWXwwAC6x-cVcAkCQJ2RGVNSNGrazyfNhGhAKnlJrnLeAMhWcjYjnwu6T9HhUBNSHxONtWwDJhoGHEvwwZkS4kjDSAdTDM1YMvUp7qiL032ssWY6hFxSsPVA5mty4c02481pzsn36uVr-dasP17fl4t14_iTLo3mTCrn29ZyUNZyqSU4YIiotPDSK6taZ8TAORqmsbOGTUJZRINDxzsxJ3dH3-n_n4q59JtY0zhF9owxLrTQXTtR90fKpZhzQt_vU9iZ9Nsz6A-V9f-ViT_gUV79</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1112393974</pqid></control><display><type>article</type><title>A procedure for outlier identification in data sets from continuous distributions</title><source>SpringerLink Journals - AutoHoldings</source><creator>Balakrishnan, N. ; Quiroz, A. J.</creator><creatorcontrib>Balakrishnan, N. ; Quiroz, A. J.</creatorcontrib><description>We propose a procedure, based on sums of reciprocals ofp-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörg (1990), we find the limiting distribution of the relevant statistic for completely specified models. By simulations, we obtain approximate quantiles for the asymptotic distribution, (which does not depend on the specific model or the dimension where the data live) and for the finite sample distribution in different dimensions of our statistic when parameters are estimated, for the multivariate Gaussian model and a multivariate double exponential model with independent coordinates. 
Monte Carlo evaluation shows that the procedure proposed is effective in the identification of outliers, and that it is sensitive to sample size, a feature seldom found in outlier identification methods.[PUBLICATION ABSTRACT]</description><identifier>ISSN: 1133-0686</identifier><identifier>EISSN: 1863-8260</identifier><identifier>DOI: 10.1007/BF02603008</identifier><language>eng</language><publisher>Heidelberg: Springer Nature B.V</publisher><subject>Econometrics ; Studies</subject><ispartof>Test (Madrid, Spain), 2004-06, Vol.13 (1), p.247-262</ispartof><rights>Sociedad Española de Estadística e Investigación Operativa 2004</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c259t-92168cf44b208bb26960c01eee893f6f8b84ca3d22ea19e7ba1ea18beeaed7273</citedby><cites>FETCH-LOGICAL-c259t-92168cf44b208bb26960c01eee893f6f8b84ca3d22ea19e7ba1ea18beeaed7273</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Balakrishnan, N.</creatorcontrib><creatorcontrib>Quiroz, A. J.</creatorcontrib><title>A procedure for outlier identification in data sets from continuous distributions</title><title>Test (Madrid, Spain)</title><description>We propose a procedure, based on sums of reciprocals ofp-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörg (1990), we find the limiting distribution of the relevant statistic for completely specified models. 
By simulations, we obtain approximate quantiles for the asymptotic distribution, (which does not depend on the specific model or the dimension where the data live) and for the finite sample distribution in different dimensions of our statistic when parameters are estimated, for the multivariate Gaussian model and a multivariate double exponential model with independent coordinates. Monte Carlo evaluation shows that the procedure proposed is effective in the identification of outliers, and that it is sensitive to sample size, a feature seldom found in outlier identification methods.[PUBLICATION ABSTRACT]</description><subject>Econometrics</subject><subject>Studies</subject><issn>1133-0686</issn><issn>1863-8260</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpFkE9LAzEQxYMoWKsXP0HAm7A6SdZscqzFqlAQQc9Lkp1ASrup-XPw27ulBU_zhvnx3vAIuWXwwAC6x-cVcAkCQJ2RGVNSNGrazyfNhGhAKnlJrnLeAMhWcjYjnwu6T9HhUBNSHxONtWwDJhoGHEvwwZkS4kjDSAdTDM1YMvUp7qiL032ssWY6hFxSsPVA5mty4c02481pzsn36uVr-dasP17fl4t14_iTLo3mTCrn29ZyUNZyqSU4YIiotPDSK6taZ8TAORqmsbOGTUJZRINDxzsxJ3dH3-n_n4q59JtY0zhF9owxLrTQXTtR90fKpZhzQt_vU9iZ9Nsz6A-V9f-ViT_gUV79</recordid><startdate>20040601</startdate><enddate>20040601</enddate><creator>Balakrishnan, N.</creator><creator>Quiroz, A. 
J.</creator><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>88C</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>FYUFA</scope><scope>F~G</scope><scope>GHDGH</scope><scope>H8D</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0T</scope><scope>M7S</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>PYYUZ</scope><scope>Q9U</scope></search><sort><creationdate>20040601</creationdate><title>A procedure for outlier identification in data sets from continuous distributions</title><author>Balakrishnan, N. ; Quiroz, A. J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c259t-92168cf44b208bb26960c01eee893f6f8b84ca3d22ea19e7ba1ea18beeaed7273</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Econometrics</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Balakrishnan, N.</creatorcontrib><creatorcontrib>Quiroz, A. 
J.</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Healthcare Administration Database (Alumni)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>Health Research Premium Collection</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>Aerospace Database</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>ABI/INFORM Professional 
Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Healthcare Administration Database</collection><collection>Engineering Database</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>ABI/INFORM Collection China</collection><collection>ProQuest Central Basic</collection><jtitle>Test (Madrid, Spain)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Balakrishnan, N.</au><au>Quiroz, A. J.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A procedure for outlier identification in data sets from continuous distributions</atitle><jtitle>Test (Madrid, Spain)</jtitle><date>2004-06-01</date><risdate>2004</risdate><volume>13</volume><issue>1</issue><spage>247</spage><epage>262</epage><pages>247-262</pages><issn>1133-0686</issn><eissn>1863-8260</eissn><abstract>We propose a procedure, based on sums of reciprocals ofp-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörg (1990), we find the limiting distribution of the relevant statistic for completely specified models. 
By simulations, we obtain approximate quantiles for the asymptotic distribution, (which does not depend on the specific model or the dimension where the data live) and for the finite sample distribution in different dimensions of our statistic when parameters are estimated, for the multivariate Gaussian model and a multivariate double exponential model with independent coordinates. Monte Carlo evaluation shows that the procedure proposed is effective in the identification of outliers, and that it is sensitive to sample size, a feature seldom found in outlier identification methods.[PUBLICATION ABSTRACT]</abstract><cop>Heidelberg</cop><pub>Springer Nature B.V</pub><doi>10.1007/BF02603008</doi><tpages>16</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1133-0686
ispartof Test (Madrid, Spain), 2004-06, Vol.13 (1), p.247-262
issn 1133-0686
1863-8260
language eng
recordid cdi_proquest_journals_1112393974
source SpringerLink Journals - AutoHoldings
subjects Econometrics
Studies
title A procedure for outlier identification in data sets from continuous distributions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T21%3A24%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20procedure%20for%20outlier%20identification%20in%20data%20sets%20from%20continuous%20distributions&rft.jtitle=Test%20(Madrid,%20Spain)&rft.au=Balakrishnan,%20N.&rft.date=2004-06-01&rft.volume=13&rft.issue=1&rft.spage=247&rft.epage=262&rft.pages=247-262&rft.issn=1133-0686&rft.eissn=1863-8260&rft_id=info:doi/10.1007/BF02603008&rft_dat=%3Cproquest_cross%3E2790608491%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1112393974&rft_id=info:pmid/&rfr_iscdi=true