ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages

Abstract Motivation Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are sti...

Bibliographic details
Published in: Bioinformatics 2021-05, Vol.37 (8), p.1115-1124
Main authors: Jin, Ting, Nguyen, Nam D, Talos, Flaminia, Wang, Daifeng
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1124
container_issue 8
container_start_page 1115
container_title Bioinformatics
container_volume 37
creator Jin, Ting
Nguyen, Nam D
Talos, Flaminia
Wang, Daifeng
description Abstract. Motivation: Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods were typically limited to building an accurate prediction model as a ‘black box’, barely providing biological and clinical interpretability from the box. Results: To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. Particularly, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification allowing lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. With application to the gene expression data of non-small-cell lung cancer patients, we found that ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified the biomarker genes and gene networks implying the regulatory mechanisms in the lung cancer development. In addition, ECMarker demonstrates clinical interpretability as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates on early cancer medicine.
Availability and implementation: ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker. Supplementary information: Supplementary data are available at Bioinformatics online.
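The abstract describes a classifier that combines a discriminative restricted Boltzmann machine with lateral connections between input genes. The following is a minimal sketch of that general idea, not the ECMarker implementation: it assumes the textbook discriminative-RBM free energy with an added pairwise term over the inputs. All names (`W`, `U`, `L`, `free_energy`, `class_probabilities`) and the toy sizes are hypothetical and chosen only for illustration.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def free_energy(x, y, W, U, b, c, d, L):
    """Free energy F(x, y) of a toy discriminative RBM with lateral input links.

    Assumed (textbook-style) form, not taken from the paper:
      F(x, y) = -d[y] - b.x - 0.5 * x^T L x - sum_j softplus(c[j] + U[j][y] + W[j].x)
    x: input gene values, y: class index, W[j][i]: gene-to-hidden weights,
    U[j][y]: class-to-hidden weights, L[i][k]: gene-to-gene (lateral) weights.
    """
    n = len(x)
    visible = sum(b[i] * x[i] for i in range(n))
    lateral = 0.5 * sum(L[i][k] * x[i] * x[k] for i in range(n) for k in range(n))
    hidden = sum(
        softplus(c[j] + U[j][y] + sum(W[j][i] * x[i] for i in range(n)))
        for j in range(len(c))
    )
    return -(d[y] + visible + lateral + hidden)


def class_probabilities(x, W, U, b, c, d, L):
    """p(y | x) proportional to exp(-F(x, y)), normalised over the classes."""
    energies = [free_energy(x, y, W, U, b, c, d, L) for y in range(len(d))]
    lowest = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - lowest)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy model: 3 genes, 2 hidden units, 2 classes (e.g. early vs. late stage).
    W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    U = [[0.2, -0.1], [-0.3, 0.4]]
    b = [0.0, 0.1, -0.1]
    c = [0.0, 0.1]
    d = [0.0, 0.0]
    L = [[0.0, 0.7, 0.0], [0.7, 0.0, -0.4], [0.0, -0.4, 0.0]]
    print(class_probabilities([1.0, 0.0, 1.0], W, U, b, c, d, L))
```

Under this assumed energy, the lateral term does not depend on the class, so it cancels in p(y | x). In such a formulation the lateral weights shape how the model represents the gene inputs (and could be read as a gene network), while the class decision comes from `W`, `U`, `c` and `d`.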
doi_str_mv 10.1093/bioinformatics/btaa935
format Article
fullrecord Article record (record ID cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8150141), repeating the title, authors, abstract, subjects and journal data listed in the other fields. Additional details from the record: contributor Pier Luigi, Martelli; PMID 33305308; publisher Oxford University Press (England); rights: The Author(s) 2020. Published by Oxford University Press; publication date 2021-05-23; 10 pages; peer reviewed; open access (free to read); ORCID iDs https://orcid.org/0000-0001-5073-0667 and https://orcid.org/0000-0001-9190-3704; full text at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150141/ (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150141/pdf/); PubMed entry at https://www.ncbi.nlm.nih.gov/pubmed/33305308.
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2021-05, Vol.37 (8), p.1115-1124
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8150141
source MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection
subjects Biomarkers
Carcinoma, Non-Small-Cell Lung
Gene Expression
Humans
Lung Neoplasms - genetics
Machine Learning
Original Papers
title ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T04%3A14%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ECMarker:%20interpretable%20machine%20learning%20model%20identifies%20gene%20expression%20biomarkers%20predicting%20clinical%20outcomes%20and%20reveals%20molecular%20mechanisms%20of%20human%20disease%20in%20early%20stages&rft.jtitle=Bioinformatics&rft.au=Jin,%20Ting&rft.date=2021-05-23&rft.volume=37&rft.issue=8&rft.spage=1115&rft.epage=1124&rft.pages=1115-1124&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btaa935&rft_dat=%3Cproquest_pubme%3E2470035245%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2470035245&rft_id=info:pmid/33305308&rft_oup_id=10.1093/bioinformatics/btaa935&rfr_iscdi=true
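The `url` field is an OpenURL-style link resolver query (note `ctx_ver=Z39.88-2004`) that carries the citation as `rft.*` key/value pairs. As a rough sketch, the citation fields can be read back with Python's standard `urllib.parse`. The helper name `openurl_fields` is hypothetical, and the example URL is a shortened subset of the parameters in the record above, not the full link.

```python
from urllib.parse import parse_qs, urlsplit


def openurl_fields(url):
    """Return the referent ('rft.*') fields and 'rft_id' identifiers of an OpenURL."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs percent-decodes values and returns a list per key,
    # because a key such as rft_id may legitimately repeat.
    referent = {key: values for key, values in params.items() if key.startswith("rft.")}
    identifiers = params.get("rft_id", [])
    return referent, identifiers


if __name__ == "__main__":
    # Shortened example built from parameters that appear in the record's url field.
    example = (
        "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
        "&rft.genre=article&rft.jtitle=Bioinformatics&rft.au=Jin,%20Ting"
        "&rft.date=2021-05-23&rft.volume=37&rft.issue=8"
        "&rft.spage=1115&rft.epage=1124&rft.issn=1367-4803"
        "&rft_id=info:doi/10.1093/bioinformatics/btaa935"
        "&rft_id=info:pmid/33305308"
    )
    referent, identifiers = openurl_fields(example)
    for key in sorted(referent):
        print(key, referent[key])
    print("identifiers", identifiers)
```

Keeping each value as a list avoids silently dropping repeated keys; the record's link repeats `rft_id` for the DOI and the PubMed ID.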