Differential gene expression detection and sample classification using penalized linear regression models

Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem...

Bibliographic details
Published in: Bioinformatics 2006-02, Vol.22 (4), p.472-476
Main author: Wu, Baolin
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 476
container_issue 4
container_start_page 472
container_title Bioinformatics
container_volume 22
creator Wu, Baolin
description Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the ‘large p small n’ is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the ‘nearest shrunken centroid’ proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple and intuitive, and have proved useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the ℒ1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the ℒ1 penalized regression models. We also show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection. Availability: For the computer programs, detailed analysis results and R functions for the proposed methods, please see Contact: baolin@biostat.umn.edu Supplementary information:
doi_str_mv 10.1093/bioinformatics/bti827
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_67639312</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>990415801</sourcerecordid><originalsourceid>FETCH-LOGICAL-c480t-9acf8b916da0b07622eeb83b38870d952abf6250b1cb1cffc5410d458ae4c50c3</originalsourceid><addsrcrecordid>eNqFkV2L1TAQhoso7of-BKUIelc3ab6aS9nVPcqCN6vI3oRpOjlkTdOatLD6683xHF30RhiYgfeZl0zeqnpGyWtKNDvr_eSjm9IIi7f5rF9816oH1THlkjQtEfphmZlUDe8IO6pOcr4lRFDO-ePqiEomWin4ceUvvHOYMC4eQr3FiDXezQlz9lOsB1zQLrsJ4lBnGOeAtQ1QVOct_FLW7OO2njFC8D9wqIOPCKlOuP3tMk4DhvykeuQgZHx66KfVp3dvr883zdXHy_fnb64aW166NBqs63pN5QCkJ0q2LWLfsZ51nSKDFi30TraC9NSWcs4KTsnARQfIrSCWnVav9r5zmr6tmBcz-mwxBIg4rdlIJZlmtP0vSDVn5e94AV_8A95Oayr37phOSkrFDhJ7yKYp54TOzMmPkL4bSswuMfN3YmafWNl7fjBf-xGH-61DRAV4eQAgWwguQbQ-33NKMKo6Wbhmz_m84N0fHdLXcjJTwmy-3Jibz5ro6w8bo9hPTkO11g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198661154</pqid></control><display><type>article</type><title>Differential gene expression detection and sample classification using penalized linear regression models</title><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Oxford Journals Open Access Collection</source><source>Alma/SFX Local Collection</source><creator>Wu, Baolin</creator><creatorcontrib>Wu, Baolin</creatorcontrib><description>Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the ‘large p small n’ is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. 
The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the ‘nearest shrunken centroid’ proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the ℒ1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the ℒ1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection. Availability: For the computer programs, detailed analysis results and R functions for the proposed methods, please see Contact: baolin@biostat.umn.edu Supplementary information:</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/bti827</identifier><identifier>PMID: 16352654</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Artificial Intelligence ; Biological and medical sciences ; Cluster Analysis ; Fundamental and applied biological sciences. Psychology ; Gene Expression Profiling - methods ; General aspects ; Linear Models ; Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects) ; Models, Genetic ; Oligonucleotide Array Sequence Analysis - methods ; Pattern Recognition, Automated - methods ; Regression Analysis ; Sample Size</subject><ispartof>Bioinformatics, 2006-02, Vol.22 (4), p.472-476</ispartof><rights>2006 INIST-CNRS</rights><rights>Copyright Oxford University Press(England) Feb 15, 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c480t-9acf8b916da0b07622eeb83b38870d952abf6250b1cb1cffc5410d458ae4c50c3</citedby><cites>FETCH-LOGICAL-c480t-9acf8b916da0b07622eeb83b38870d952abf6250b1cb1cffc5410d458ae4c50c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=17531786$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16352654$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wu, Baolin</creatorcontrib><title>Differential gene expression detection and sample classification using penalized linear regression models</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the ‘large p small n’ is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. 
The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the ‘nearest shrunken centroid’ proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the ℒ1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the ℒ1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection. Availability: For the computer programs, detailed analysis results and R functions for the proposed methods, please see Contact: baolin@biostat.umn.edu Supplementary information:</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Biological and medical sciences</subject><subject>Cluster Analysis</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene Expression Profiling - methods</subject><subject>General aspects</subject><subject>Linear Models</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Models, Genetic</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>Pattern Recognition, Automated - methods</subject><subject>Regression Analysis</subject><subject>Sample Size</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkV2L1TAQhoso7of-BKUIelc3ab6aS9nVPcqCN6vI3oRpOjlkTdOatLD6683xHF30RhiYgfeZl0zeqnpGyWtKNDvr_eSjm9IIi7f5rF9816oH1THlkjQtEfphmZlUDe8IO6pOcr4lRFDO-ePqiEomWin4ceUvvHOYMC4eQr3FiDXezQlz9lOsB1zQLrsJ4lBnGOeAtQ1QVOct_FLW7OO2njFC8D9wqIOPCKlOuP3tMk4DhvykeuQgZHx66KfVp3dvr883zdXHy_fnb64aW166NBqs63pN5QCkJ0q2LWLfsZ51nSKDFi30TraC9NSWcs4KTsnARQfIrSCWnVav9r5zmr6tmBcz-mwxBIg4rdlIJZlmtP0vSDVn5e94AV_8A95Oayr37phOSkrFDhJ7yKYp54TOzMmPkL4bSswuMfN3YmafWNl7fjBf-xGH-61DRAV4eQAgWwguQbQ-33NKMKo6Wbhmz_m84N0fHdLXcjJTwmy-3Jibz5ro6w8bo9hPTkO11g</recordid><startdate>20060215</startdate><enddate>20060215</enddate><creator>Wu, Baolin</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20060215</creationdate><title>Differential 
gene expression detection and sample classification using penalized linear regression models</title><author>Wu, Baolin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c480t-9acf8b916da0b07622eeb83b38870d952abf6250b1cb1cffc5410d458ae4c50c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Biological and medical sciences</topic><topic>Cluster Analysis</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression Profiling - methods</topic><topic>General aspects</topic><topic>Linear Models</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</topic><topic>Models, Genetic</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Pattern Recognition, Automated - methods</topic><topic>Regression Analysis</topic><topic>Sample Size</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wu, Baolin</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Nucleic Acids 
Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wu, Baolin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Differential gene expression detection and sample classification using penalized linear regression models</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2006-02-15</date><risdate>2006</risdate><volume>22</volume><issue>4</issue><spage>472</spage><epage>476</epage><pages>472-476</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Differential gene expression detection and sample classification using microarray data have received much research interest recently. 
Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the ‘large p small n’ is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the ‘nearest shrunken centroid’ proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the ℒ1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the ℒ1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection. Availability: For the computer programs, detailed analysis results and R functions for the proposed methods, please see Contact: baolin@biostat.umn.edu Supplementary information:</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>16352654</pmid><doi>10.1093/bioinformatics/bti827</doi><tpages>5</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2006-02, Vol.22 (4), p.472-476
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_67639312
source MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford Journals Open Access Collection; Alma/SFX Local Collection
subjects Algorithms
Artificial Intelligence
Biological and medical sciences
Cluster Analysis
Fundamental and applied biological sciences. Psychology
Gene Expression Profiling - methods
General aspects
Linear Models
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Models, Genetic
Oligonucleotide Array Sequence Analysis - methods
Pattern Recognition, Automated - methods
Regression Analysis
Sample Size
title Differential gene expression detection and sample classification using penalized linear regression models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T01%3A57%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Differential%20gene%20expression%20detection%20and%20sample%20classification%20using%20penalized%20linear%20regression%20models&rft.jtitle=Bioinformatics&rft.au=Wu,%20Baolin&rft.date=2006-02-15&rft.volume=22&rft.issue=4&rft.spage=472&rft.epage=476&rft.pages=472-476&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/bti827&rft_dat=%3Cproquest_cross%3E990415801%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198661154&rft_id=info:pmid/16352654&rfr_iscdi=true