Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes

Bibliographic details
Published in: PLoS computational biology 2015-07, Vol.11 (7), p.e1004259-e1004259
Main authors: Himmelstein, Daniel S; Baranzini, Sergio E
Format: Article
Language: eng
Online access: Full text
container_end_page e1004259
container_issue 7
container_start_page e1004259
container_title PLoS computational biology
container_volume 11
creator Himmelstein, Daniel S
Baranzini, Sergio E
description The first decade of genome-wide association studies (GWAS) has uncovered a wealth of disease-associated variants. Two important next steps are translating this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks (graphs with multiple node and edge types) to accomplish both tasks. First, we constructed a network with 18 node types (genes, diseases, tissues, pathophysiologies, and 14 collections from MSigDB, the Molecular Signatures Database) and 19 edge types from high-throughput, publicly available resources. From this network of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model on GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model achieved 132-fold enrichment in precision at 10% recall and outperformed models built on any single domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as influential mechanisms explaining pathogenesis. Our method successfully predicted the results (AUROC = 0.79) of a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) were validated in the masked GWAS. Furthermore, our predictions provide biological support for REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io).
Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains.
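The abstract says the features "describe the topology between specific genes and diseases" on a network with typed nodes and edges. As a rough illustration only, the Python sketch below counts the paths from a gene to a disease that follow a fixed sequence of edge types (a metapath), together with a degree-weighted variant that down-weights paths running through highly connected nodes. The toy graph, the node and edge-type names, the weighting formula, and the damping exponent `w = 0.4` are all assumptions made for this example; it is not the authors' code, data, or exact feature definition.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy heterogeneous network: (node, edge type, node) triples.
# Gene-gene "interacts" edges and gene-disease "associates" edges are both
# treated as undirected.
EDGES: List[Tuple[str, str, str]] = [
    ("GeneA", "interacts", "GeneB"),
    ("GeneA", "interacts", "GeneC"),
    ("GeneD", "interacts", "GeneB"),
    ("GeneB", "associates", "DiseaseX"),
    ("GeneC", "associates", "DiseaseX"),
    ("GeneC", "associates", "DiseaseY"),
]

Adjacency = Dict[Tuple[str, str], Set[str]]


def build_adjacency(edges: List[Tuple[str, str, str]]) -> Adjacency:
    """Map (node, edge type) to the set of neighbours reached by that edge type."""
    adj: Adjacency = defaultdict(set)
    for u, etype, v in edges:
        adj[(u, etype)].add(v)
        adj[(v, etype)].add(u)
    return adj


def metapath_features(
    adj: Adjacency,
    source: str,
    target: str,
    metapath: Sequence[str],
    w: float = 0.4,  # assumed damping exponent, for illustration only
) -> Tuple[int, float]:
    """Count simple paths from source to target whose edges follow `metapath`.

    Returns (path count, degree-weighted path count). Each path's weight is the
    product, over its edges, of (degree of both endpoints within that edge
    type) ** -w, so paths through hubs contribute less.
    """
    path_count = 0
    weighted = 0.0

    def extend(node: str, step: int, visited: Set[str], weight: float) -> None:
        nonlocal path_count, weighted
        if step == len(metapath):
            if node == target:
                path_count += 1
                weighted += weight
            return
        etype = metapath[step]
        for nbr in adj.get((node, etype), set()):
            if nbr in visited:  # skip walks that revisit a node
                continue
            degree_product = len(adj[(node, etype)]) * len(adj[(nbr, etype)])
            extend(nbr, step + 1, visited | {nbr}, weight * degree_product ** -w)

    extend(source, 0, {source}, 1.0)
    return path_count, weighted


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    gene_gene_disease = ["interacts", "associates"]
    for gene in ("GeneA", "GeneD"):
        for disease in ("DiseaseX", "DiseaseY"):
            pc, dwpc = metapath_features(adj, gene, disease, gene_gene_disease)
            print(gene, disease, pc, round(dwpc, 4))
```

Running it prints one line per gene-disease pair. In a full pipeline, features of this kind, computed over many metapaths, would feed a model trained on known GWAS associations, as the abstract describes; this sketch covers only a simplified feature-extraction step.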
doi_str_mv 10.1371/journal.pcbi.1004259
format Article
addtitle PLoS Comput Biol
publisher United States: Public Library of Science
date 2015-07-01
pmid 26158728
rights 2015 Himmelstein, Baranzini
2015 Public Library of Science. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited: Himmelstein DS, Baranzini SE (2015) Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes. PLoS Comput Biol 11(7): e1004259. doi:10.1371/journal.pcbi.1004259
toplevel peer_reviewed
oa free_for_read
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497619/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497619/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/26158728
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2015-07, Vol.11 (7), p.e1004259-e1004259
issn 1553-7358
1553-734X
eissn 1553-7358
language eng
recordid cdi_plos_journals_1704313643
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Public Library of Science (PLoS)
subjects Algorithms
Animals
Chromosome Mapping - methods
Data Mining - methods
Databases, Genetic
Disease
Gene expression
Genetic Predisposition to Disease - genetics
Genome-Wide Association Study - methods
Genomes
Humans
Ontology
Pathogenesis
Protein Interaction Mapping - methods
Proteins
Proteome - genetics
Signal Transduction - genetics
Studies
Systems Integration
title Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T01%3A46%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Heterogeneous%20Network%20Edge%20Prediction:%20A%20Data%20Integration%20Approach%20to%20Prioritize%20Disease-Associated%20Genes&rft.jtitle=PLoS%20computational%20biology&rft.au=Himmelstein,%20Daniel%20S&rft.date=2015-07-01&rft.volume=11&rft.issue=7&rft.spage=e1004259&rft.epage=e1004259&rft.pages=e1004259-e1004259&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1004259&rft_dat=%3Cproquest_plos_%3E1695757116%3C/proquest_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1695757116&rft_id=info:pmid/26158728&rft_doaj_id=oai_doaj_org_article_ccf29d8590a24e97ad20dbeacdc60921&rfr_iscdi=true