Network-constrained regularization and variable selection for analysis of genomic data
Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a...
Published in: | Bioinformatics 2008-05, Vol.24 (9), p.1175-1182 |
---|---|
Main authors: | Li, Caiyan; Li, Hongzhe |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1182 |
---|---|
container_issue | 9 |
container_start_page | 1175 |
container_title | Bioinformatics |
container_volume | 24 |
creator | Li, Caiyan; Li, Hongzhe |
description | Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene-expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by published literatures. Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes. Contact: hongzhe@mail.med.upenn.edu |
doi_str_mv | 10.1093/bioinformatics/btn081 |
format | Article |
fullrecord | <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_69141694</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btn081</oup_id><sourcerecordid>20405730</sourcerecordid><originalsourceid>FETCH-LOGICAL-c587t-9bbee2f129643dea86d23bd276fdebff4b40a7cfb7f34411fa733ec1722fbddc3</originalsourceid><addsrcrecordid>eNqNkctu1jAUhC0EoqXwCKAICXahvsVOlqgFihTBBhDqxvLluHKb2D92ApSnxzS_imADK9vH38zoaBB6TPALggd2bEIK0ac86yXYcmyWiHtyBx0SLnBLcTfcrXcmZMt7zA7Qg1IuMe4I5_w-OiA9I1iQ_hB9egfLt5SvWptiWbIOEVyT4WKddA4_qneKjY6u-Vqf2kzQFJjA3oxreP3S03UJpUm-uYCY5mAbpxf9EN3zeirwaH8eoY-vX304OWvH92_enrwcW9v1cmkHYwCoJ3QQnDnQvXCUGUel8A6M99xwrKX1RnrGOSFeS8bAEkmpN85ZdoSeb767nL6sUBY1h2JhmnSEtBYlBsKJGPg_QYo57iTDFXz6F3iZ1lzXLIoMvWAMS1KhboNsTqVk8GqXw6zztSJY_apH_VmP2uqpuid789XM4H6r9n1U4Nke0MXqyWcdbSi3HMUM98ONEd64tO7-O7vdJKEs8P1WpPOVEpLJTp19Plf8dBxH2kl1zn4CXkvAJg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198633071</pqid></control><display><type>article</type><title>Network-constrained regularization and variable selection for analysis of genomic data</title><source>Oxford Journals Open Access Collection</source><creator>Li, Caiyan ; Li, Hongzhe</creator><creatorcontrib>Li, Caiyan ; Li, Hongzhe</creatorcontrib><description>Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene-expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. 
In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by published literatures. Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes. 
Contact: hongzhe@mail.med.upenn.edu</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btn081</identifier><identifier>PMID: 18310618</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Bioinformatics ; Biological activity ; Biological and medical sciences ; Biology ; Biomarkers, Tumor - metabolism ; Biomedical research ; Chromosome Mapping - methods ; Data analysis ; Datasets ; DNA microarrays ; Encyclopedias ; Feature selection ; Fundamental and applied biological sciences. Psychology ; Gene expression ; Gene Expression Profiling ; General aspects ; Genes ; Genomes ; Genomic analysis ; Glioblastoma ; Glioblastoma - metabolism ; Glioblastoma - mortality ; Graphical representations ; Graphs ; Humans ; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) ; Medical research ; Metabolic pathways ; Neoplasm Proteins - metabolism ; Networks ; Oligonucleotide Array Sequence Analysis - methods ; Pathways ; Penalty function ; Phenotypes ; Regression analysis ; Regularization ; Signal Transduction ; Smoothness ; Statistical analysis ; Survival Analysis ; Survival Rate</subject><ispartof>Bioinformatics, 2008-05, Vol.24 (9), p.1175-1182</ispartof><rights>The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2008</rights><rights>2008 INIST-CNRS</rights><rights>The Author 2008. Published by Oxford University Press. All rights reserved. 
For Permissions, please email: journals.permissions@oxfordjournals.org</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c587t-9bbee2f129643dea86d23bd276fdebff4b40a7cfb7f34411fa733ec1722fbddc3</citedby><cites>FETCH-LOGICAL-c587t-9bbee2f129643dea86d23bd276fdebff4b40a7cfb7f34411fa733ec1722fbddc3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,777,781,1599,27905,27906</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btn081$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20308981$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/18310618$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Caiyan</creatorcontrib><creatorcontrib>Li, Hongzhe</creatorcontrib><title>Network-constrained regularization and variable selection for analysis of genomic data</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene-expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. 
In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by published literatures. Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes. 
Contact: hongzhe@mail.med.upenn.edu</description><subject>Algorithms</subject><subject>Bioinformatics</subject><subject>Biological activity</subject><subject>Biological and medical sciences</subject><subject>Biology</subject><subject>Biomarkers, Tumor - metabolism</subject><subject>Biomedical research</subject><subject>Chromosome Mapping - methods</subject><subject>Data analysis</subject><subject>Datasets</subject><subject>DNA microarrays</subject><subject>Encyclopedias</subject><subject>Feature selection</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene expression</subject><subject>Gene Expression Profiling</subject><subject>General aspects</subject><subject>Genes</subject><subject>Genomes</subject><subject>Genomic analysis</subject><subject>Glioblastoma</subject><subject>Glioblastoma - metabolism</subject><subject>Glioblastoma - mortality</subject><subject>Graphical representations</subject><subject>Graphs</subject><subject>Humans</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Medical research</subject><subject>Metabolic pathways</subject><subject>Neoplasm Proteins - metabolism</subject><subject>Networks</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>Pathways</subject><subject>Penalty function</subject><subject>Phenotypes</subject><subject>Regression analysis</subject><subject>Regularization</subject><subject>Signal Transduction</subject><subject>Smoothness</subject><subject>Statistical analysis</subject><subject>Survival Analysis</subject><subject>Survival Rate</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkctu1jAUhC0EoqXwCKAICXahvsVOlqgFihTBBhDqxvLluHKb2D92ApSnxzS_imADK9vH38zoaBB6TPALggd2bEIK0ac86yXYcmyWiHtyBx0SLnBLcTfcrXcmZMt7zA7Qg1IuMe4I5_w-OiA9I1iQ_hB9egfLt5SvWptiWbIOEVyT4WKddA4_qneKjY6u-Vqf2kzQFJjA3oxreP3S03UJpUm-uYCY5mAbpxf9EN3zeirwaH8eoY-vX304OWvH92_enrwcW9v1cmkHYwCoJ3QQnDnQvXCUGUel8A6M99xwrKX1RnrGOSFeS8bAEkmpN85ZdoSeb767nL6sUBY1h2JhmnSEtBYlBsKJGPg_QYo57iTDFXz6F3iZ1lzXLIoMvWAMS1KhboNsTqVk8GqXw6zztSJY_apH_VmP2uqpuid789XM4H6r9n1U4Nke0MXqyWcdbSi3HMUM98ONEd64tO7-O7vdJKEs8P1WpPOVEpLJTp19Plf8dBxH2kl1zn4CXkvAJg</recordid><startdate>20080501</startdate><enddate>20080501</enddate><creator>Li, Caiyan</creator><creator>Li, Hongzhe</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20080501</creationdate><title>Network-constrained regularization and variable selection for analysis of genomic data</title><author>Li, Caiyan ; Li, Hongzhe</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c587t-9bbee2f129643dea86d23bd276fdebff4b40a7cfb7f34411fa733ec1722fbddc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Bioinformatics</topic><topic>Biological activity</topic><topic>Biological and medical sciences</topic><topic>Biology</topic><topic>Biomarkers, Tumor - metabolism</topic><topic>Biomedical research</topic><topic>Chromosome Mapping - methods</topic><topic>Data analysis</topic><topic>Datasets</topic><topic>DNA microarrays</topic><topic>Encyclopedias</topic><topic>Feature selection</topic><topic>Fundamental and applied biological sciences. 
Psychology</topic><topic>Gene expression</topic><topic>Gene Expression Profiling</topic><topic>General aspects</topic><topic>Genes</topic><topic>Genomes</topic><topic>Genomic analysis</topic><topic>Glioblastoma</topic><topic>Glioblastoma - metabolism</topic><topic>Glioblastoma - mortality</topic><topic>Graphical representations</topic><topic>Graphs</topic><topic>Humans</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</topic><topic>Medical research</topic><topic>Metabolic pathways</topic><topic>Neoplasm Proteins - metabolism</topic><topic>Networks</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Pathways</topic><topic>Penalty function</topic><topic>Phenotypes</topic><topic>Regression analysis</topic><topic>Regularization</topic><topic>Signal Transduction</topic><topic>Smoothness</topic><topic>Statistical analysis</topic><topic>Survival Analysis</topic><topic>Survival Rate</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Caiyan</creatorcontrib><creatorcontrib>Li, Hongzhe</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Nucleic Acids 
Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Li, Caiyan</au><au>Li, Hongzhe</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Network-constrained regularization and variable selection for analysis of genomic data</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2008-05-01</date><risdate>2008</risdate><volume>24</volume><issue>9</issue><spage>1175</spage><epage>1182</epage><pages>1175-1182</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: Graphs or networks are common ways of depicting information. 
In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene-expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by published literatures. Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes. 
Contact: hongzhe@mail.med.upenn.edu</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>18310618</pmid><doi>10.1093/bioinformatics/btn081</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2008-05, Vol.24 (9), p.1175-1182 |
issn | 1367-4803; 1460-2059; 1367-4811 |
language | eng |
recordid | cdi_proquest_miscellaneous_69141694 |
source | Oxford Journals Open Access Collection |
subjects | Algorithms; Bioinformatics; Biological activity; Biological and medical sciences; Biology; Biomarkers, Tumor - metabolism; Biomedical research; Chromosome Mapping - methods; Data analysis; Datasets; DNA microarrays; Encyclopedias; Feature selection; Fundamental and applied biological sciences. Psychology; Gene expression; Gene Expression Profiling; General aspects; Genes; Genomes; Genomic analysis; Glioblastoma; Glioblastoma - metabolism; Glioblastoma - mortality; Graphical representations; Graphs; Humans; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Medical research; Metabolic pathways; Neoplasm Proteins - metabolism; Networks; Oligonucleotide Array Sequence Analysis - methods; Pathways; Penalty function; Phenotypes; Regression analysis; Regularization; Signal Transduction; Smoothness; Statistical analysis; Survival Analysis; Survival Rate |
title | Network-constrained regularization and variable selection for analysis of genomic data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T14%3A34%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Network-constrained%20regularization%20and%20variable%20selection%20for%20analysis%20of%20genomic%20data&rft.jtitle=Bioinformatics&rft.au=Li,%20Caiyan&rft.date=2008-05-01&rft.volume=24&rft.issue=9&rft.spage=1175&rft.epage=1182&rft.pages=1175-1182&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btn081&rft_dat=%3Cproquest_TOX%3E20405730%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198633071&rft_id=info:pmid/18310618&rft_oup_id=10.1093/bioinformatics/btn081&rfr_iscdi=true |
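
Illustrative note: the abstract describes a penalty that combines the L1-norm of the regression coefficients with a smoothness term built from the graph Laplacian. The sketch below shows one way such a penalty might be evaluated. The abstract only says "Laplacian matrix", so the use of the normalized Laplacian here is an assumption, as are the function names `normalized_laplacian` and `network_penalty`, the tuning parameters `lam1` and `lam2`, and the toy three-gene network. It is not the authors' code.

```python
import numpy as np


def normalized_laplacian(adjacency):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix.

    Assumption: the normalized form is used; the abstract does not specify it.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)
    # Isolated nodes (degree 0) get a zero scaling factor and a zero diagonal.
    connected = degree > 0
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    scaled = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def network_penalty(beta, laplacian, lam1, lam2):
    """Evaluate lam1 * ||beta||_1 + lam2 * beta^T L beta (hypothetical form)."""
    beta = np.asarray(beta, dtype=float)
    sparsity = np.abs(beta).sum()        # L1 term: encourages few nonzero genes
    smoothness = beta @ laplacian @ beta  # Laplacian term: penalizes differences along edges
    return lam1 * sparsity + lam2 * smoothness


# Toy network: three genes on a path, gene0 - gene1 - gene2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = normalized_laplacian(A)
penalty = network_penalty([1.0, 0.0, 0.0], L, lam1=0.5, lam2=2.0)
```

With the normalized Laplacian, the quadratic term equals the sum over edges of the squared difference between degree-scaled coefficients, so coefficients that vary smoothly across linked genes incur a small penalty while the L1 term still drives unrelated genes toward zero.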