Network-constrained regularization and variable selection for analysis of genomic data

Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a...

Bibliographic details
Published in: Bioinformatics 2008-05, Vol.24 (9), p.1175-1182
Main authors: Li, Caiyan; Li, Hongzhe
Format: Article
Language: eng
Online access: Order full text
container_end_page 1182
container_issue 9
container_start_page 1175
container_title Bioinformatics
container_volume 24
creator Li, Caiyan
Li, Hongzhe
description Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information, gathered over many years of biomedical research, is a useful supplement to standard numerical genomic data such as microarray gene-expression data. How to incorporate the information encoded by known biological networks or graphs into the analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network.
Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by the published literature.
Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.
Contact: hongzhe@mail.med.upenn.edu
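The penalty described in the abstract can be illustrated with a small sketch. The abstract does not state the formula, so the form used here, lam1 * ||beta||_1 + lam2 * beta^T L beta with L the normalized graph Laplacian, is an assumption, as are the function names `normalized_laplacian` and `network_penalty` and the tuning parameters `lam1` and `lam2`. The code only evaluates such a penalty for a given coefficient vector; it is not the authors' fitting procedure.

```python
import numpy as np


def normalized_laplacian(adj):
    """Normalized Laplacian I - D^{-1/2} A D^{-1/2} of a symmetric adjacency matrix.

    Assumes an undirected graph without self-loops; isolated nodes get a zero row.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    # Off-diagonal entries: -a_uv / sqrt(d_u * d_v)
    lap = -adj * np.outer(inv_sqrt, inv_sqrt)
    # Diagonal entries: 1 for connected nodes, 0 for isolated ones
    lap[np.diag_indices_from(lap)] = connected.astype(float)
    return lap


def network_penalty(beta, lap, lam1, lam2):
    """Assumed penalty: L1 term for sparsity plus a Laplacian term for smoothness."""
    beta = np.asarray(beta, dtype=float)
    sparsity = np.abs(beta).sum()
    smoothness = beta @ lap @ beta
    return lam1 * sparsity + lam2 * smoothness


# Hypothetical three-gene pathway: gene 0 - gene 1 - gene 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
lap = normalized_laplacian(adj)
print(network_penalty([1.0, 0.0, 0.0], lap, lam1=0.5, lam2=2.0))
```

With the normalized Laplacian, the quadratic term equals the sum over edges of (beta_u / sqrt(d_u) - beta_v / sqrt(d_v))^2, so coefficients of linked genes are pulled toward each other after scaling by node degree, while the L1 term sets some coefficients to exactly zero.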
doi_str_mv 10.1093/bioinformatics/btn081
format Article
coden BOINFP
date 2008-05-01
oa free_for_read
pmid 18310618
publisher Oxford: Oxford University Press
rights The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
2008 INIST-CNRS
toplevel peer_reviewed
online_resources
tpages 8
backlink https://www.ncbi.nlm.nih.gov/pubmed/18310618
linktorsrc https://dx.doi.org/10.1093/bioinformatics/btn081
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2008-05, Vol.24 (9), p.1175-1182
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_69141694
source Oxford Journals Open Access Collection
subjects Algorithms
Bioinformatics
Biological activity
Biological and medical sciences
Biology
Biomarkers, Tumor - metabolism
Biomedical research
Chromosome Mapping - methods
Data analysis
Datasets
DNA microarrays
Encyclopedias
Feature selection
Fundamental and applied biological sciences. Psychology
Gene expression
Gene Expression Profiling
General aspects
Genes
Genomes
Genomic analysis
Glioblastoma
Glioblastoma - metabolism
Glioblastoma - mortality
Graphical representations
Graphs
Humans
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Medical research
Metabolic pathways
Neoplasm Proteins - metabolism
Networks
Oligonucleotide Array Sequence Analysis - methods
Pathways
Penalty function
Phenotypes
Regression analysis
Regularization
Signal Transduction
Smoothness
Statistical analysis
Survival Analysis
Survival Rate
title Network-constrained regularization and variable selection for analysis of genomic data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T14%3A34%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Network-constrained%20regularization%20and%20variable%20selection%20for%20analysis%20of%20genomic%20data&rft.jtitle=Bioinformatics&rft.au=Li,%20Caiyan&rft.date=2008-05-01&rft.volume=24&rft.issue=9&rft.spage=1175&rft.epage=1182&rft.pages=1175-1182&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btn081&rft_dat=%3Cproquest_TOX%3E20405730%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198633071&rft_id=info:pmid/18310618&rft_oup_id=10.1093/bioinformatics/btn081&rfr_iscdi=true