Differential gene expression detection and sample classification using penalized linear regression models
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem...
Gespeichert in:
Veröffentlicht in: | Bioinformatics 2006-02, Vol.22 (4), p.472-476 |
---|---|
1. Verfasser: | |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the ‘large p small n’ is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the ‘nearest shrunken centroid’ proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the ℒ1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the ℒ1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection. Availability: For the computer programs, detailed analysis results and R functions for the proposed methods, please see Contact: baolin@biostat.umn.edu Supplementary information: |
---|---|
ISSN: | 1367-4803 1460-2059 1367-4811 |
DOI: | 10.1093/bioinformatics/bti827 |