Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness

Optimal Bayesian feature filtering (OBF) is a fast and memory-efficient algorithm that optimally identifies markers with distributional differences between treatment groups under Gaussian models. Here, we study the performance and robustness of OBF for biomarker discovery. Our contributions are twof...

Bibliographic details
Published in: IEEE/ACM transactions on computational biology and bioinformatics 2020-01, Vol.17 (1), p.250-263
Main authors: Foroughi pour, Ali; Dalton, Lori A.
Format: Article
Language: English
container_end_page 263
container_issue 1
container_start_page 250
container_title IEEE/ACM transactions on computational biology and bioinformatics
container_volume 17
creator Foroughi pour, Ali
Dalton, Lori A.
description Optimal Bayesian feature filtering (OBF) is a fast and memory-efficient algorithm that optimally identifies markers with distributional differences between treatment groups under Gaussian models. Here, we study the performance and robustness of OBF for biomarker discovery. Our contributions are twofold: (1) we examine how OBF performs on data that violates modeling assumptions, and (2) we provide guidelines on how to set input parameters for robust performance. Contribution (1) addresses an important, relevant, and commonplace problem in computational biology, where it is often impossible to validate an algorithm's core assumptions. To accomplish both tasks, we present a battery of simulations that implement OBF with different inputs and challenge each assumption made by OBF. In particular, we examine the robustness of OBF with respect to incorrect input parameters, false independence, imbalanced sample size, and we address the Gaussianity assumption by considering performance on an extensive family of non-Gaussian distributions. We address advantages and disadvantages between different priors and optimization criteria throughout. Finally, we evaluate the utility of OBF in biomarker discovery using acute myeloid leukemia (AML) and colon cancer microarray datasets, and show that OBF is successful at identifying well-known biomarkers for these diseases that rank low under moderated t-test.
doi_str_mv 10.1109/TCBB.2018.2858814
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TCBB_2018_2858814</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8417895</ieee_id><sourcerecordid>2076234815</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-3273b6c108b7429a1b1336275f4b0a4e3c5c4ab009f4ee4c5bee29a5646991963</originalsourceid><addsrcrecordid>eNpdkE1Lw0AQhhdRrFZ_gAgS8OIldT-TXW-mWhWKFannZZNOJDXJ1t1E6L93S2sPnmZgnneYeRC6IHhECFa383GWjSgmckSlkJLwA3RChEhjpRJ-uOm5iIVK2ACder_EmHKF-TEaMIw5ToQ8Qa-zVVc1po4yswZfmTaaVHUHrmo_o9K6KKtsY9wXuOih8oX9Abe-i97AhVlj2gIi0y6id5v3vmvB-zN0VJraw_muDtHH5HE-fo6ns6eX8f00LpiiXcxoyvKkIFjmKafKkJwwltBUlDzHhgMrRMFNjrEqOQAvRA4QMJHwRCkSHhqim-3elbPfPfhON-E8qGvTgu29pjhNKOOSiIBe_0OXtndtuE5TJihRAcKBIluqcNZ7B6VeueDFrTXBeiNbb2TrjWy9kx0yV7vNfd7AYp_4sxuAyy1QAcB-LDlJpRLsFwcDgXo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2352198150</pqid></control><display><type>article</type><title>Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness</title><source>IEEE Electronic Library (IEL)</source><creator>Foroughi pour, Ali ; Dalton, Lori A.</creator><creatorcontrib>Foroughi pour, Ali ; Dalton, Lori A.</creatorcontrib><description>Optimal Bayesian feature filtering (OBF) is a fast and memory-efficient algorithm that optimally identifies markers with distributional differences between treatment groups under Gaussian models. Here, we study the performance and robustness of OBF for biomarker discovery. Our contributions are twofold: (1) we examine how OBF performs on data that violates modeling assumptions, and (2) we provide guidelines on how to set input parameters for robust performance. Contribution (1) addresses an important, relevant, and commonplace problem in computational biology, where it is often impossible to validate an algorithm's core assumptions. 
To accomplish both tasks, we present a battery of simulations that implement OBF with different inputs and challenge each assumption made by OBF. In particular, we examine the robustness of OBF with respect to incorrect input parameters, false independence, imbalanced sample size, and we address the Gaussianity assumption by considering performance on an extensive family of non-Gaussian distributions. We address advantages and disadvantages between different priors and optimization criteria throughout. Finally, we evaluate the utility of OBF in biomarker discovery using acute myeloid leukemia (AML) and colon cancer microarray datasets, and show that OBF is successful at identifying well-known biomarkers for these diseases that rank low under moderated t-test.</description><identifier>ISSN: 1545-5963</identifier><identifier>EISSN: 1557-9964</identifier><identifier>DOI: 10.1109/TCBB.2018.2858814</identifier><identifier>PMID: 30040658</identifier><identifier>CODEN: ITCBCY</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Acute myeloid leukemia ; Algorithms ; Bayes methods ; Bayes Theorem ; Bayesian analysis ; Bayesian modeling ; Bayesian variable selection ; Bioinformatics ; Biological system modeling ; biomarker discovery ; Biomarkers ; Colon ; Colon cancer ; Computational Biology - methods ; Computational modeling ; Computer applications ; Computer simulation ; Data models ; Databases, Factual ; Feature extraction ; feature selection ; Filtration ; Humans ; Leukemia ; Mathematical models ; Myeloid leukemia ; Neoplasms - diagnosis ; Neoplasms - metabolism ; Optimization ; Parameter robustness ; Robustness</subject><ispartof>IEEE/ACM transactions on computational biology and bioinformatics, 2020-01, Vol.17 (1), p.250-263</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-3273b6c108b7429a1b1336275f4b0a4e3c5c4ab009f4ee4c5bee29a5646991963</citedby><cites>FETCH-LOGICAL-c392t-3273b6c108b7429a1b1336275f4b0a4e3c5c4ab009f4ee4c5bee29a5646991963</cites><orcidid>0000-0001-8715-5723 ; 0000-0002-3547-0796</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8417895$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54737</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8417895$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30040658$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Foroughi pour, Ali</creatorcontrib><creatorcontrib>Dalton, Lori A.</creatorcontrib><title>Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness</title><title>IEEE/ACM transactions on computational biology and bioinformatics</title><addtitle>TCBB</addtitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><description>Optimal Bayesian feature filtering (OBF) is a fast and memory-efficient algorithm that optimally identifies markers with distributional differences between treatment groups under Gaussian models. Here, we study the performance and robustness of OBF for biomarker discovery. Our contributions are twofold: (1) we examine how OBF performs on data that violates modeling assumptions, and (2) we provide guidelines on how to set input parameters for robust performance. Contribution (1) addresses an important, relevant, and commonplace problem in computational biology, where it is often impossible to validate an algorithm's core assumptions. 
To accomplish both tasks, we present a battery of simulations that implement OBF with different inputs and challenge each assumption made by OBF. In particular, we examine the robustness of OBF with respect to incorrect input parameters, false independence, imbalanced sample size, and we address the Gaussianity assumption by considering performance on an extensive family of non-Gaussian distributions. We address advantages and disadvantages between different priors and optimization criteria throughout. Finally, we evaluate the utility of OBF in biomarker discovery using acute myeloid leukemia (AML) and colon cancer microarray datasets, and show that OBF is successful at identifying well-known biomarkers for these diseases that rank low under moderated t-test.</description><subject>Acute myeloid leukemia</subject><subject>Algorithms</subject><subject>Bayes methods</subject><subject>Bayes Theorem</subject><subject>Bayesian analysis</subject><subject>Bayesian modeling</subject><subject>Bayesian variable selection</subject><subject>Bioinformatics</subject><subject>Biological system modeling</subject><subject>biomarker discovery</subject><subject>Biomarkers</subject><subject>Colon</subject><subject>Colon cancer</subject><subject>Computational Biology - methods</subject><subject>Computational modeling</subject><subject>Computer applications</subject><subject>Computer simulation</subject><subject>Data models</subject><subject>Databases, Factual</subject><subject>Feature extraction</subject><subject>feature selection</subject><subject>Filtration</subject><subject>Humans</subject><subject>Leukemia</subject><subject>Mathematical models</subject><subject>Myeloid leukemia</subject><subject>Neoplasms - diagnosis</subject><subject>Neoplasms - metabolism</subject><subject>Optimization</subject><subject>Parameter 
robustness</subject><subject>Robustness</subject><issn>1545-5963</issn><issn>1557-9964</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkE1Lw0AQhhdRrFZ_gAgS8OIldT-TXW-mWhWKFannZZNOJDXJ1t1E6L93S2sPnmZgnneYeRC6IHhECFa383GWjSgmckSlkJLwA3RChEhjpRJ-uOm5iIVK2ACder_EmHKF-TEaMIw5ToQ8Qa-zVVc1po4yswZfmTaaVHUHrmo_o9K6KKtsY9wXuOih8oX9Abe-i97AhVlj2gIi0y6id5v3vmvB-zN0VJraw_muDtHH5HE-fo6ns6eX8f00LpiiXcxoyvKkIFjmKafKkJwwltBUlDzHhgMrRMFNjrEqOQAvRA4QMJHwRCkSHhqim-3elbPfPfhON-E8qGvTgu29pjhNKOOSiIBe_0OXtndtuE5TJihRAcKBIluqcNZ7B6VeueDFrTXBeiNbb2TrjWy9kx0yV7vNfd7AYp_4sxuAyy1QAcB-LDlJpRLsFwcDgXo</recordid><startdate>202001</startdate><enddate>202001</enddate><creator>Foroughi pour, Ali</creator><creator>Dalton, Lori A.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-8715-5723</orcidid><orcidid>https://orcid.org/0000-0002-3547-0796</orcidid></search><sort><creationdate>202001</creationdate><title>Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness</title><author>Foroughi pour, Ali ; Dalton, Lori 
A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-3273b6c108b7429a1b1336275f4b0a4e3c5c4ab009f4ee4c5bee29a5646991963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Acute myeloid leukemia</topic><topic>Algorithms</topic><topic>Bayes methods</topic><topic>Bayes Theorem</topic><topic>Bayesian analysis</topic><topic>Bayesian modeling</topic><topic>Bayesian variable selection</topic><topic>Bioinformatics</topic><topic>Biological system modeling</topic><topic>biomarker discovery</topic><topic>Biomarkers</topic><topic>Colon</topic><topic>Colon cancer</topic><topic>Computational Biology - methods</topic><topic>Computational modeling</topic><topic>Computer applications</topic><topic>Computer simulation</topic><topic>Data models</topic><topic>Databases, Factual</topic><topic>Feature extraction</topic><topic>feature selection</topic><topic>Filtration</topic><topic>Humans</topic><topic>Leukemia</topic><topic>Mathematical models</topic><topic>Myeloid leukemia</topic><topic>Neoplasms - diagnosis</topic><topic>Neoplasms - metabolism</topic><topic>Optimization</topic><topic>Parameter robustness</topic><topic>Robustness</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Foroughi pour, Ali</creatorcontrib><creatorcontrib>Dalton, Lori A.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic 
Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Foroughi pour, Ali</au><au>Dalton, Lori A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness</atitle><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle><stitle>TCBB</stitle><addtitle>IEEE/ACM Trans Comput Biol 
Bioinform</addtitle><date>2020-01</date><risdate>2020</risdate><volume>17</volume><issue>1</issue><spage>250</spage><epage>263</epage><pages>250-263</pages><issn>1545-5963</issn><eissn>1557-9964</eissn><coden>ITCBCY</coden><abstract>Optimal Bayesian feature filtering (OBF) is a fast and memory-efficient algorithm that optimally identifies markers with distributional differences between treatment groups under Gaussian models. Here, we study the performance and robustness of OBF for biomarker discovery. Our contributions are twofold: (1) we examine how OBF performs on data that violates modeling assumptions, and (2) we provide guidelines on how to set input parameters for robust performance. Contribution (1) addresses an important, relevant, and commonplace problem in computational biology, where it is often impossible to validate an algorithm's core assumptions. To accomplish both tasks, we present a battery of simulations that implement OBF with different inputs and challenge each assumption made by OBF. In particular, we examine the robustness of OBF with respect to incorrect input parameters, false independence, imbalanced sample size, and we address the Gaussianity assumption by considering performance on an extensive family of non-Gaussian distributions. We address advantages and disadvantages between different priors and optimization criteria throughout. Finally, we evaluate the utility of OBF in biomarker discovery using acute myeloid leukemia (AML) and colon cancer microarray datasets, and show that OBF is successful at identifying well-known biomarkers for these diseases that rank low under moderated t-test.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>30040658</pmid><doi>10.1109/TCBB.2018.2858814</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-8715-5723</orcidid><orcidid>https://orcid.org/0000-0002-3547-0796</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1545-5963
ispartof IEEE/ACM transactions on computational biology and bioinformatics, 2020-01, Vol.17 (1), p.250-263
issn 1545-5963
1557-9964
language eng
recordid cdi_crossref_primary_10_1109_TCBB_2018_2858814
source IEEE Electronic Library (IEL)
subjects Acute myeloid leukemia
Algorithms
Bayes methods
Bayes Theorem
Bayesian analysis
Bayesian modeling
Bayesian variable selection
Bioinformatics
Biological system modeling
biomarker discovery
Biomarkers
Colon
Colon cancer
Computational Biology - methods
Computational modeling
Computer applications
Computer simulation
Data models
Databases, Factual
Feature extraction
feature selection
Filtration
Humans
Leukemia
Mathematical models
Myeloid leukemia
Neoplasms - diagnosis
Neoplasms - metabolism
Optimization
Parameter robustness
Robustness
title Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T01%3A55%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimal%20Bayesian%20Filtering%20for%20Biomarker%20Discovery:%20Performance%20and%20Robustness&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Foroughi%20pour,%20Ali&rft.date=2020-01&rft.volume=17&rft.issue=1&rft.spage=250&rft.epage=263&rft.pages=250-263&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2018.2858814&rft_dat=%3Cproquest_RIE%3E2076234815%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2352198150&rft_id=info:pmid/30040658&rft_ieee_id=8417895&rfr_iscdi=true