A procedure for outlier identification in data sets from continuous distributions

We propose a procedure, based on sums of reciprocals of p-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models...

Detailed description

Bibliographic details
Published in:	Test (Madrid, Spain), 2004-06, Vol.13 (1), p.247-262
Main authors:	Balakrishnan, N.; Quiroz, A. J.
Format:	Article
Language:	English
Subjects:	Econometrics; Studies
Online access:	Full text

container_end_page	262
container_issue	1
container_start_page	247
container_title	Test (Madrid, Spain)
container_volume	13
creator	Balakrishnan, N.; Quiroz, A. J.
description	We propose a procedure, based on sums of reciprocals of p-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models. By simulations, we obtain approximate quantiles for the asymptotic distribution (which does not depend on the specific model or the dimension where the data live) and, for the multivariate Gaussian model and a multivariate double exponential model with independent coordinates, for the finite sample distribution of our statistic in different dimensions when parameters are estimated. Monte Carlo evaluation shows that the proposed procedure is effective in the identification of outliers, and that it is sensitive to sample size, a feature seldom found in outlier identification methods.
doi_str_mv	10.1007/BF02603008
format	Article
publisher	Heidelberg: Springer Nature B.V
rights	Sociedad Española de Estadística e Investigación Operativa 2004
tpages	16
fulltext	fulltext
identifier	ISSN: 1133-0686
ispartof	Test (Madrid, Spain), 2004-06, Vol.13 (1), p.247-262
issn	1133-0686 1863-8260
language	eng
recordid	cdi_proquest_journals_1112393974
source	SpringerLink Journals - AutoHoldings
subjects	Econometrics; Studies
title	A procedure for outlier identification in data sets from continuous distributions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T21%3A24%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20procedure%20for%20outlier%20identification%20in%20data%20sets%20from%20continuous%20distributions&rft.jtitle=Test%20(Madrid,%20Spain)&rft.au=Balakrishnan,%20N.&rft.date=2004-06-01&rft.volume=13&rft.issue=1&rft.spage=247&rft.epage=262&rft.pages=247-262&rft.issn=1133-0686&rft.eissn=1863-8260&rft_id=info:doi/10.1007/BF02603008&rft_dat=%3Cproquest_cross%3E2790608491%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1112393974&rft_id=info:pmid/&rfr_iscdi=true