A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data

Motivation: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the e...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Bioinformatics 2005-12, Vol.21 (23), p.4280-4288
Hauptverfasser:	Xie, Yang, Pan, Wei, Khodursky, Arkady B.
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Biological and medical sciences Chromosome Mapping Cluster Analysis Computational Biology - methods Computer Simulation Data Interpretation, Statistical DNA, Complementary - metabolism False Positive Reactions Fundamental and applied biological sciences. Psychology Gene Expression Profiling - methods General aspects Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) Models, Genetic Models, Statistical Oligonucleotide Array Sequence Analysis - methods Pattern Recognition, Automated Reproducibility of Results Sensitivity and Specificity Software
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Motivation: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor any particular method. Permutation methods have become popular to estimate FDR in genomic studies. The purpose of this paper is 2-fold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a more fair criterion to evaluate various statistical methods. Results: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, SAM statistic and Student's t-statistic, are considered. The results show that the standard permutation method overestimates FDR. The overestimation is the most severe for the sample mean statistic while the least for the t-statistic with the SAM-statistic lying between the two extremes, suggesting that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method. Contact: yangxie@biostat.umn.ed
ISSN:	1367-4803 1460-2059 1367-4811
DOI:	10.1093/bioinformatics/bti685