Small-sample precision of ROC-related estimates

Motivation: The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric...

Detailed description

Bibliographic details
Published in: Bioinformatics 2010-03, Vol.26 (6), p.822-830
Main authors: Hanczar, Blaise, Hua, Jianping, Sima, Chao, Weinstein, John, Bittner, Michael, Dougherty, Edward R.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 830
container_issue 6
container_start_page 822
container_title Bioinformatics
container_volume 26
creator Hanczar, Blaise
Hua, Jianping
Sima, Chao
Weinstein, John
Bittner, Michael
Dougherty, Edward R.
description Motivation: The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics? Results: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results. Availability: Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html Contact: edward@mail.ece.tamu.edu
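The question raised in the description (how far an estimated AUC can lie from the true AUC when the sample is small) can be illustrated with a minimal Python sketch. This is not the authors' code or experimental setup. It assumes a hypothetical data model (two Gaussian classes with identity covariance), a nearest-mean linear discriminant standing in for the classification rules named above, and only the resubstitution estimate. All function names and parameter values are invented for illustration.

```python
import math
import random


def empirical_auc(neg_scores, pos_scores):
    """Fraction of (negative, positive) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_trial(rng, n_per_class, dim, delta):
    """Train on one small sample; return (resubstitution AUC, true AUC)."""
    # Assumed model: class 0 ~ N(0, I), class 1 ~ N(mu, I), mu = (delta, 0, ..., 0).
    mu = [delta] + [0.0] * (dim - 1)
    neg = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_per_class)]
    pos = [[rng.gauss(m, 1.0) for m in mu] for _ in range(n_per_class)]

    # Nearest-mean linear discriminant: w is the difference of the sample means.
    w = [
        sum(x[j] for x in pos) / n_per_class - sum(x[j] for x in neg) / n_per_class
        for j in range(dim)
    ]

    def score(x):
        return sum(wj * xj for wj, xj in zip(w, x))

    # Resubstitution: the AUC is estimated on the same points used for training.
    estimated = empirical_auc([score(x) for x in neg], [score(x) for x in pos])

    # Under the assumed model the score difference between a class-1 and a
    # class-0 point is N(w.mu, 2|w|^2), so the true AUC has a closed form.
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    w_dot_mu = sum(wj * mj for wj, mj in zip(w, mu))
    true = normal_cdf(w_dot_mu / (math.sqrt(2.0) * norm_w))
    return estimated, true


def rms_difference(n_per_class, dim=5, delta=1.0, n_trials=200, seed=0):
    """Root mean square difference between estimated and true AUC over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        estimated, true = one_trial(rng, n_per_class, dim, delta)
        total += (estimated - true) ** 2
    return math.sqrt(total / n_trials)


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, round(rms_difference(n), 3))
```

Running the script prints the root mean square difference for a few sample sizes per class. The closed-form true AUC is only available because the data model is assumed; with real data the true value would have to be approximated on a large held-out set.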
doi_str_mv 10.1093/bioinformatics/btq037
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_746084472</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>746084472</sourcerecordid><originalsourceid>FETCH-LOGICAL-c455t-2dbde2f1b53b7ea232895827e2ecd9b9046296500c1acbe37a52b17b69eb013e3</originalsourceid><addsrcrecordid>eNqFkE1LAzEQhoMotlZ_gtKLeFo7-dp0j1I_KpQWv0C8hCQ7C6u73ZpsQf-9ka0VT54ykGfmnXkIOaZwTiHjI1s25bJofG3a0oWRbd-Bqx3SpyKFhIHMdmPNU5WIMfAeOQjhFUBSIcQ-6TGgHIBlfTJ6qE1VJcHUqwqHK4-uDGWzHDbF8H4xSTxWpsV8iKEtYxKGQ7JXmCrg0eYdkKfrq8fJNJktbm4nF7PECSnbhOU2R1ZQK7lVaBhn40yOmUKGLs9sBiJlWSoBHDXOIldGMkuVTTO0cTfkA3LWzV355n0d43VdBodVZZbYrINW8cyxEIr9T3LOgYnobEBkRzrfhOCx0Csfr_KfmoL-lqr_StWd1Nh3sklY2xrzbdePxQicbgATnKkKb5ZR4y_HJJOgROSSjitDix_bf-PfdKq4knr6_KIncCfn8ynVl_wL0R2S8A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>733302409</pqid></control><display><type>article</type><title>Small-sample precision of ROC-related estimates</title><source>MEDLINE</source><source>Access via Oxford University Press (Open Access Collection)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Hanczar, Blaise ; Hua, Jianping ; Sima, Chao ; Weinstein, John ; Bittner, Michael ; Dougherty, Edward R.</creator><creatorcontrib>Hanczar, Blaise ; Hua, Jianping ; Sima, Chao ; Weinstein, John ; Bittner, Michael ; Dougherty, Edward R.</creatorcontrib><description>Motivation: The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). 
With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics? Results: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results. Availability: Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html Contact: edward@mail.ece.tamu.edu</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btq037</identifier><identifier>PMID: 20130029</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Biological and medical sciences ; False Positive Reactions ; Fundamental and applied biological sciences. Psychology ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects) ; Oligonucleotide Array Sequence Analysis ; Pattern Recognition, Automated - methods ; ROC Curve</subject><ispartof>Bioinformatics, 2010-03, Vol.26 (6), p.822-830</ispartof><rights>2015 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c455t-2dbde2f1b53b7ea232895827e2ecd9b9046296500c1acbe37a52b17b69eb013e3</citedby><cites>FETCH-LOGICAL-c455t-2dbde2f1b53b7ea232895827e2ecd9b9046296500c1acbe37a52b17b69eb013e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=22525074$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/20130029$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Hanczar, Blaise</creatorcontrib><creatorcontrib>Hua, Jianping</creatorcontrib><creatorcontrib>Sima, Chao</creatorcontrib><creatorcontrib>Weinstein, John</creatorcontrib><creatorcontrib>Bittner, Michael</creatorcontrib><creatorcontrib>Dougherty, Edward R.</creatorcontrib><title>Small-sample precision of ROC-related estimates</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). 
With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics? Results: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results. Availability: Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html Contact: edward@mail.ece.tamu.edu</description><subject>Algorithms</subject><subject>Biological and medical sciences</subject><subject>False Positive Reactions</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Oligonucleotide Array Sequence Analysis</subject><subject>Pattern Recognition, Automated - methods</subject><subject>ROC Curve</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkE1LAzEQhoMotlZ_gtKLeFo7-dp0j1I_KpQWv0C8hCQ7C6u73ZpsQf-9ka0VT54ykGfmnXkIOaZwTiHjI1s25bJofG3a0oWRbd-Bqx3SpyKFhIHMdmPNU5WIMfAeOQjhFUBSIcQ-6TGgHIBlfTJ6qE1VJcHUqwqHK4-uDGWzHDbF8H4xSTxWpsV8iKEtYxKGQ7JXmCrg0eYdkKfrq8fJNJktbm4nF7PECSnbhOU2R1ZQK7lVaBhn40yOmUKGLs9sBiJlWSoBHDXOIldGMkuVTTO0cTfkA3LWzV355n0d43VdBodVZZbYrINW8cyxEIr9T3LOgYnobEBkRzrfhOCx0Csfr_KfmoL-lqr_StWd1Nh3sklY2xrzbdePxQicbgATnKkKb5ZR4y_HJJOgROSSjitDix_bf-PfdKq4knr6_KIncCfn8ynVl_wL0R2S8A</recordid><startdate>20100315</startdate><enddate>20100315</enddate><creator>Hanczar, Blaise</creator><creator>Hua, Jianping</creator><creator>Sima, Chao</creator><creator>Weinstein, John</creator><creator>Bittner, Michael</creator><creator>Dougherty, Edward R.</creator><general>Oxford University Press</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7QO</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope></search><sort><creationdate>20100315</creationdate><title>Small-sample precision of ROC-related estimates</title><author>Hanczar, Blaise ; Hua, Jianping ; Sima, Chao ; Weinstein, John ; Bittner, Michael ; Dougherty, Edward 
R.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c455t-2dbde2f1b53b7ea232895827e2ecd9b9046296500c1acbe37a52b17b69eb013e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Algorithms</topic><topic>Biological and medical sciences</topic><topic>False Positive Reactions</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</topic><topic>Oligonucleotide Array Sequence Analysis</topic><topic>Pattern Recognition, Automated - methods</topic><topic>ROC Curve</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hanczar, Blaise</creatorcontrib><creatorcontrib>Hua, Jianping</creatorcontrib><creatorcontrib>Sima, Chao</creatorcontrib><creatorcontrib>Weinstein, John</creatorcontrib><creatorcontrib>Bittner, Michael</creatorcontrib><creatorcontrib>Dougherty, Edward R.</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Biotechnology Research Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hanczar, Blaise</au><au>Hua, Jianping</au><au>Sima, Chao</au><au>Weinstein, John</au><au>Bittner, Michael</au><au>Dougherty, Edward 
R.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Small-sample precision of ROC-related estimates</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2010-03-15</date><risdate>2010</risdate><volume>26</volume><issue>6</issue><spage>822</spage><epage>830</epage><pages>822-830</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Motivation: The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics? Results: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results. Availability: Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html Contact: edward@mail.ece.tamu.edu</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>20130029</pmid><doi>10.1093/bioinformatics/btq037</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2010-03, Vol.26 (6), p.822-830
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_746084472
source MEDLINE; Access via Oxford University Press (Open Access Collection); EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection
subjects Algorithms
Biological and medical sciences
False Positive Reactions
Fundamental and applied biological sciences. Psychology
General aspects
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated - methods
ROC Curve
title Small-sample precision of ROC-related estimates
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T19%3A04%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Small-sample%20precision%20of%20ROC-related%20estimates&rft.jtitle=Bioinformatics&rft.au=Hanczar,%20Blaise&rft.date=2010-03-15&rft.volume=26&rft.issue=6&rft.spage=822&rft.epage=830&rft.pages=822-830&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btq037&rft_dat=%3Cproquest_cross%3E746084472%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=733302409&rft_id=info:pmid/20130029&rfr_iscdi=true