Empirical Bayes Confidence Intervals for Selected Parameters in High-Dimensional Data
Published in: Journal of the American Statistical Association, 2013-06, Vol. 108 (502), pp. 607-618
Format: Article
Language: English
Online access: Full text
Abstract: Modern statistical problems often involve a large number of populations and hence a large number of parameters that characterize these populations. It is common for scientists to use data to select the most significant populations, such as those with the largest t statistics. The scientific interest often lies in studying and making inferences about these parameters, called the selected parameters, corresponding to the selected populations. Current statistical practice either applies a traditional procedure as if no selection had taken place, which is not valid, or uses a Bonferroni-type procedure that is valid but very conservative and often noninformative. In this article, we propose valid and sharp confidence intervals that allow scientists to select parameters and to make inferences for the selected parameters based on the same data. This type of confidence interval allows users to zero in on the most interesting selected parameters without collecting more data. Validity of the confidence intervals is defined as controlling the Bayes coverage probability so that it is no less than a nominal level uniformly over a class of prior distributions for the parameter. When a mixed model is assumed and the random effects are the key parameters, this validity criterion is exactly the frequentist criterion, since the Bayes coverage probability is identical to the frequentist coverage probability. Assuming that the observations are normally distributed with unequal and unknown variances, we select parameters with the largest t statistics. We then construct sharp empirical Bayes confidence intervals for these selected parameters, which have either a large Bayes coverage probability or a small Bayes false coverage rate uniformly over a class of priors. Our intervals, applicable to any high-dimensional data, are applied to microarray data and are shown to be better than all the alternatives. It is also anticipated that the same intervals would be valid for any selection rule. Supplementary materials for this article are available online.
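The selection effect motivating the article can be illustrated with a small simulation sketch. This is not the authors' procedure: it only shows that naive t intervals for parameters chosen by their largest t statistics tend to undercover, which is the problem the proposed empirical Bayes intervals address. The settings below (number of populations, sample sizes, prior, number selected) are arbitrary assumptions for illustration.

```python
# Illustrative sketch (not the paper's method): coverage of naive 95% t intervals
# for parameters selected by the largest t statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n, k, reps = 1000, 5, 10, 200     # populations, obs per population, selected, replications
alpha = 0.05
covered, total = 0, 0

for _ in range(reps):
    theta = rng.normal(0.0, 1.0, size=m)             # parameters drawn from a prior
    sigma = rng.uniform(0.5, 2.0, size=m)            # unequal, unknown standard deviations
    x = rng.normal(theta[:, None], sigma[:, None], size=(m, n))
    xbar = x.mean(axis=1)
    se = x.std(axis=1, ddof=1) / np.sqrt(n)
    t = xbar / se                                    # t statistics for H0: theta_i = 0
    top = np.argsort(t)[-k:]                         # select the k largest t statistics
    half = stats.t.ppf(1 - alpha / 2, df=n - 1) * se[top]
    lo, hi = xbar[top] - half, xbar[top] + half      # naive intervals ignoring selection
    covered += np.sum((lo <= theta[top]) & (theta[top] <= hi))
    total += k

print(f"Coverage of naive 95% intervals for selected parameters: {covered / total:.3f}")
# Typically well below 0.95, which is why selection-adjusted intervals are needed.
```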
ISSN: 0162-1459 (print), 1537-274X (online)
DOI: 10.1080/01621459.2013.771102