Receiver operating characteristic analysis: a general tool for DNA array data filtration and performance estimation

A critical step for DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes. Commonly accepted rules for such filtration are still absent. We present a rational approach, based on thresholding of intensities with cutoff levels that are estimated...

Detailed description

Bibliographic details
Published in: Genomics (San Diego, Calif.), 2003-02, Vol.81 (2), p.202-209
Main authors: Khodarev, Nikolai N.; Park, James; Kataoka, Yasushi; Nodzenski, Edwardine; Hellman, Samuel; Roizman, Bernard; Weichselbaum, Ralph R.; Pelizzari, Charles A.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 209
container_issue 2
container_start_page 202
container_title Genomics (San Diego, Calif.)
container_volume 81
creator Khodarev, Nikolai N.
Park, James
Kataoka, Yasushi
Nodzenski, Edwardine
Hellman, Samuel
Roizman, Bernard
Weichselbaum, Ralph R.
Pelizzari, Charles A.
description A critical step for DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes. Commonly accepted rules for such filtration are still absent. We present a rational approach, based on thresholding of intensities with cutoff levels that are estimated by receiver operating characteristic (ROC) analysis. The technique compares test results with known distributions of positive and negative signals. We apply the method to Atlas cDNA arrays, GeneFilters, and Affymetrix GeneChip. ROC analysis demonstrates similarities in the distribution of false and true positive data for these different systems. We illustrate the estimation of an optimal cutoff level for intensity-based filtration, providing the highest ratio of true to false signals. For GeneChip arrays, we derived filtration thresholds consistent with the reported data based on replicate hybridizations. Intensity-based filtration optimized with ROC combined with other types of filtration (for example, based on significances of differences and/or ratios), should improve DNA array analysis. ROC methodology is also demonstrated for comparison of the performance of different types of arrays, imagers, and analysis software.
doi_str_mv 10.1016/S0888-7543(02)00042-3
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_73065324</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0888754302000423</els_id><sourcerecordid>73065324</sourcerecordid><originalsourceid>FETCH-LOGICAL-c422t-ab165a3dfe4ca8b15693fe16ed60034f6c82a91f4896976440281abee0b01c393</originalsourceid><addsrcrecordid>eNqFkUlvFDEQhS0EIsPATwD5AoJDQ3lpj5sLipKwSBFILGer2l0ORj3twe6JNP8ezyJyzMkHf-9V1XuMPRfwVoAw736AtbZZtVq9BvkGALRs1AO2EGC7xhptHrLFf-SMPSnlT4U6ZeVjdiakkaA6u2DlO3mKt5R52lDGOU433P_GjH6mHMscPccJx12J5T1HfkNTpUY-pzTykDK__HrOMWfc8QFn5CGO894lTVU28GpZoTVOnjhVs_Xh6yl7FHAs9Oz0Ltmvj1c_Lz43198-fbk4v268lnJusBemRTUE0h5tL1rTqUDC0GAAlA7GW4mdCNp2plsZrUFagT0R9CC86tSSvTr6bnL6u63z3ToWT-OIE6VtcSsFplVS3wsKu1LCGlXB9gj6nErJFNwm16Pyzglw-1rcoRa3z9yBdIda3F734jRg269puFOdeqjAyxOAxeMYco0sljtO15N13WDJPhw5qrndRsqu-Eg13iFm8rMbUrxnlX8IpKqP</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>18731863</pqid></control><display><type>article</type><title>Receiver operating characteristic analysis: a general tool for DNA array data filtration and performance estimation</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Khodarev, Nikolai N. ; Park, James ; Kataoka, Yasushi ; Nodzenski, Edwardine ; Hellman, Samuel ; Roizman, Bernard ; Weichselbaum, Ralph R. ; Pelizzari, Charles A.</creator><creatorcontrib>Khodarev, Nikolai N. ; Park, James ; Kataoka, Yasushi ; Nodzenski, Edwardine ; Hellman, Samuel ; Roizman, Bernard ; Weichselbaum, Ralph R. ; Pelizzari, Charles A.</creatorcontrib><description>A critical step for DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes. Commonly accepted rules for such filtration are still absent. 
We present a rational approach, based on thresholding of intensities with cutoff levels that are estimated by receiver operating characteristic (ROC) analysis. The technique compares test results with known distributions of positive and negative signals. We apply the method to Atlas cDNA arrays, GeneFilters, and Affymetrix GeneChip. ROC analysis demonstrates similarities in the distribution of false and true positive data for these different systems. We illustrate the estimation of an optimal cutoff level for intensity-based filtration, providing the highest ratio of true to false signals. For GeneChip arrays, we derived filtration thresholds consistent with the reported data based on replicate hybridizations. Intensity-based filtration optimized with ROC combined with other types of filtration (for example, based on significances of differences and/or ratios), should improve DNA array analysis. ROC methodology is also demonstrated for comparison of the performance of different types of arrays, imagers, and analysis software.</description><identifier>ISSN: 0888-7543</identifier><identifier>EISSN: 1089-8646</identifier><identifier>DOI: 10.1016/S0888-7543(02)00042-3</identifier><identifier>PMID: 12620398</identifier><language>eng</language><publisher>San Diego, CA: Elsevier Inc</publisher><subject>Biological and medical sciences ; Data filtration ; Data Interpretation, Statistical ; Data quality ; DNA arrays ; False positive ; Fundamental and applied biological sciences. Psychology ; Molecular and cellular biology ; Molecular genetics ; Mutagenesis. 
Repair ; Oligonucleotide Array Sequence Analysis - methods ; ROC Curve ; Sensitivity ; Sensitivity and Specificity ; Specificity</subject><ispartof>Genomics (San Diego, Calif.), 2003-02, Vol.81 (2), p.202-209</ispartof><rights>2003 Elsevier Science (USA)</rights><rights>2003 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c422t-ab165a3dfe4ca8b15693fe16ed60034f6c82a91f4896976440281abee0b01c393</citedby><cites>FETCH-LOGICAL-c422t-ab165a3dfe4ca8b15693fe16ed60034f6c82a91f4896976440281abee0b01c393</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0888754302000423$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=14600486$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/12620398$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Khodarev, Nikolai N.</creatorcontrib><creatorcontrib>Park, James</creatorcontrib><creatorcontrib>Kataoka, Yasushi</creatorcontrib><creatorcontrib>Nodzenski, Edwardine</creatorcontrib><creatorcontrib>Hellman, Samuel</creatorcontrib><creatorcontrib>Roizman, Bernard</creatorcontrib><creatorcontrib>Weichselbaum, Ralph R.</creatorcontrib><creatorcontrib>Pelizzari, Charles A.</creatorcontrib><title>Receiver operating characteristic analysis: a general tool for DNA array data filtration and performance estimation</title><title>Genomics (San Diego, Calif.)</title><addtitle>Genomics</addtitle><description>A critical step for DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes. 
Commonly accepted rules for such filtration are still absent. We present a rational approach, based on thresholding of intensities with cutoff levels that are estimated by receiver operating characteristic (ROC) analysis. The technique compares test results with known distributions of positive and negative signals. We apply the method to Atlas cDNA arrays, GeneFilters, and Affymetrix GeneChip. ROC analysis demonstrates similarities in the distribution of false and true positive data for these different systems. We illustrate the estimation of an optimal cutoff level for intensity-based filtration, providing the highest ratio of true to false signals. For GeneChip arrays, we derived filtration thresholds consistent with the reported data based on replicate hybridizations. Intensity-based filtration optimized with ROC combined with other types of filtration (for example, based on significances of differences and/or ratios), should improve DNA array analysis. ROC methodology is also demonstrated for comparison of the performance of different types of arrays, imagers, and analysis software.</description><subject>Biological and medical sciences</subject><subject>Data filtration</subject><subject>Data Interpretation, Statistical</subject><subject>Data quality</subject><subject>DNA arrays</subject><subject>False positive</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Molecular and cellular biology</subject><subject>Molecular genetics</subject><subject>Mutagenesis. 
Repair</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>ROC Curve</subject><subject>Sensitivity</subject><subject>Sensitivity and Specificity</subject><subject>Specificity</subject><issn>0888-7543</issn><issn>1089-8646</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkUlvFDEQhS0EIsPATwD5AoJDQ3lpj5sLipKwSBFILGer2l0ORj3twe6JNP8ezyJyzMkHf-9V1XuMPRfwVoAw736AtbZZtVq9BvkGALRs1AO2EGC7xhptHrLFf-SMPSnlT4U6ZeVjdiakkaA6u2DlO3mKt5R52lDGOU433P_GjH6mHMscPccJx12J5T1HfkNTpUY-pzTykDK__HrOMWfc8QFn5CGO894lTVU28GpZoTVOnjhVs_Xh6yl7FHAs9Oz0Ltmvj1c_Lz43198-fbk4v268lnJusBemRTUE0h5tL1rTqUDC0GAAlA7GW4mdCNp2plsZrUFagT0R9CC86tSSvTr6bnL6u63z3ToWT-OIE6VtcSsFplVS3wsKu1LCGlXB9gj6nErJFNwm16Pyzglw-1rcoRa3z9yBdIda3F734jRg269puFOdeqjAyxOAxeMYco0sljtO15N13WDJPhw5qrndRsqu-Eg13iFm8rMbUrxnlX8IpKqP</recordid><startdate>20030201</startdate><enddate>20030201</enddate><creator>Khodarev, Nikolai N.</creator><creator>Park, James</creator><creator>Kataoka, Yasushi</creator><creator>Nodzenski, Edwardine</creator><creator>Hellman, Samuel</creator><creator>Roizman, Bernard</creator><creator>Weichselbaum, Ralph R.</creator><creator>Pelizzari, Charles A.</creator><general>Elsevier Inc</general><general>Elsevier</general><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7TM</scope><scope>7X8</scope></search><sort><creationdate>20030201</creationdate><title>Receiver operating characteristic analysis: a general tool for DNA array data filtration and performance estimation</title><author>Khodarev, Nikolai N. ; Park, James ; Kataoka, Yasushi ; Nodzenski, Edwardine ; Hellman, Samuel ; Roizman, Bernard ; Weichselbaum, Ralph R. 
; Pelizzari, Charles A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c422t-ab165a3dfe4ca8b15693fe16ed60034f6c82a91f4896976440281abee0b01c393</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Biological and medical sciences</topic><topic>Data filtration</topic><topic>Data Interpretation, Statistical</topic><topic>Data quality</topic><topic>DNA arrays</topic><topic>False positive</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Molecular and cellular biology</topic><topic>Molecular genetics</topic><topic>Mutagenesis. Repair</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>ROC Curve</topic><topic>Sensitivity</topic><topic>Sensitivity and Specificity</topic><topic>Specificity</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Khodarev, Nikolai N.</creatorcontrib><creatorcontrib>Park, James</creatorcontrib><creatorcontrib>Kataoka, Yasushi</creatorcontrib><creatorcontrib>Nodzenski, Edwardine</creatorcontrib><creatorcontrib>Hellman, Samuel</creatorcontrib><creatorcontrib>Roizman, Bernard</creatorcontrib><creatorcontrib>Weichselbaum, Ralph R.</creatorcontrib><creatorcontrib>Pelizzari, Charles A.</creatorcontrib><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Nucleic Acids Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Genomics (San Diego, Calif.)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Khodarev, Nikolai N.</au><au>Park, James</au><au>Kataoka, Yasushi</au><au>Nodzenski, Edwardine</au><au>Hellman, 
Samuel</au><au>Roizman, Bernard</au><au>Weichselbaum, Ralph R.</au><au>Pelizzari, Charles A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Receiver operating characteristic analysis: a general tool for DNA array data filtration and performance estimation</atitle><jtitle>Genomics (San Diego, Calif.)</jtitle><addtitle>Genomics</addtitle><date>2003-02-01</date><risdate>2003</risdate><volume>81</volume><issue>2</issue><spage>202</spage><epage>209</epage><pages>202-209</pages><issn>0888-7543</issn><eissn>1089-8646</eissn><abstract>A critical step for DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes. Commonly accepted rules for such filtration are still absent. We present a rational approach, based on thresholding of intensities with cutoff levels that are estimated by receiver operating characteristic (ROC) analysis. The technique compares test results with known distributions of positive and negative signals. We apply the method to Atlas cDNA arrays, GeneFilters, and Affymetrix GeneChip. ROC analysis demonstrates similarities in the distribution of false and true positive data for these different systems. We illustrate the estimation of an optimal cutoff level for intensity-based filtration, providing the highest ratio of true to false signals. For GeneChip arrays, we derived filtration thresholds consistent with the reported data based on replicate hybridizations. Intensity-based filtration optimized with ROC combined with other types of filtration (for example, based on significances of differences and/or ratios), should improve DNA array analysis. ROC methodology is also demonstrated for comparison of the performance of different types of arrays, imagers, and analysis software.</abstract><cop>San Diego, CA</cop><pub>Elsevier Inc</pub><pmid>12620398</pmid><doi>10.1016/S0888-7543(02)00042-3</doi><tpages>8</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0888-7543
ispartof Genomics (San Diego, Calif.), 2003-02, Vol.81 (2), p.202-209
issn 0888-7543
1089-8646
language eng
recordid cdi_proquest_miscellaneous_73065324
source MEDLINE; Elsevier ScienceDirect Journals
subjects Biological and medical sciences
Data filtration
Data Interpretation, Statistical
Data quality
DNA arrays
False positive
Fundamental and applied biological sciences. Psychology
Molecular and cellular biology
Molecular genetics
Mutagenesis. Repair
Oligonucleotide Array Sequence Analysis - methods
ROC Curve
Sensitivity
Sensitivity and Specificity
Specificity
title Receiver operating characteristic analysis: a general tool for DNA array data filtration and performance estimation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T13%3A36%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Receiver%20operating%20characteristic%20analysis:%20a%20general%20tool%20for%20DNA%20array%20data%20filtration%20and%20performance%20estimation&rft.jtitle=Genomics%20(San%20Diego,%20Calif.)&rft.au=Khodarev,%20Nikolai%20N.&rft.date=2003-02-01&rft.volume=81&rft.issue=2&rft.spage=202&rft.epage=209&rft.pages=202-209&rft.issn=0888-7543&rft.eissn=1089-8646&rft_id=info:doi/10.1016/S0888-7543(02)00042-3&rft_dat=%3Cproquest_cross%3E73065324%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=18731863&rft_id=info:pmid/12620398&rft_els_id=S0888754302000423&rfr_iscdi=true
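
The description field above outlines intensity-based filtration with a cutoff estimated by ROC analysis against known positive and negative signals. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the sample intensities, and the use of Youden's J (true positive rate minus false positive rate) as the selection criterion are all assumptions made for illustration; the record itself only says the chosen cutoff gives the highest ratio of true to false signals.

```python
from typing import List, Sequence, Tuple


def roc_points(positives: Sequence[float],
               negatives: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, TPR, FPR) for each distinct observed intensity.

    A spot is called 'detected' when its intensity is >= cutoff.
    """
    cutoffs = sorted(set(positives) | set(negatives))
    points = []
    for c in cutoffs:
        tpr = sum(x >= c for x in positives) / len(positives)
        fpr = sum(x >= c for x in negatives) / len(negatives)
        points.append((c, tpr, fpr))
    return points


def best_cutoff(positives: Sequence[float],
                negatives: Sequence[float]) -> float:
    """Pick the cutoff maximizing Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(positives, negatives),
               key=lambda p: p[1] - p[2])[0]


def auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive is brighter; ties count one half."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical intensities: spots known to be expressed vs. blank controls.
    known_pos = [95, 130, 300, 450, 800, 40]
    known_neg = [15, 20, 35, 60, 90, 25]
    print("cutoff:", best_cutoff(known_pos, known_neg))
    print("AUC:", round(auc(known_pos, known_neg), 3))
```

The AUC value gives a single number for comparing platforms, imagers, or analysis software on the same control sets, which is the second use the description mentions. A different selection criterion can be substituted by changing the key function in best_cutoff.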