Multiple‐Testing Strategy for Analyzing cDNA Array Data on Gene Expression

Bibliographic Details
Published in:	Biometrics 2004-09, Vol.60 (3), p.774-782
Main Authors:	Delongchamp, Robert R.; Bowyer, John F.; Chen, James J.; Kodell, Ralph L.
Format:	Article
Language:	English
Subjects:	Amphetamine - pharmacology; Animals; Biometrics; Biometry; Brain - drug effects; Brain - metabolism; Complementary DNA; Consultant's Forum; Data Interpretation, Statistical; Decision theory; Deoxyribonucleic acid; DNA; Fall lines; False discovery rate; False negative errors; False nondiscovery rate; False positive errors; Gene expression; Gene Expression - drug effects; Gene Expression Profiling - statistics & numerical data; gene expression regulation; Genes; Genomics; Genomics - statistics & numerical data; messenger RNA; Models, Statistical; Null hypothesis; Oligonucleotide Array Sequence Analysis - statistics & numerical data; P values; p-value plot; Rats; ROC Curve; ROC curves; Statistical variance; Subset selection
Online Access:	Full text

Description
Abstract:	An objective of many functional genomics studies is to estimate treatment‐induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By‐gene hypothesis tests evaluate the evidence supporting no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p‐values from these tests order the genes such that a p‐value cutoff divides the genes into two sets. Ideally, one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will have false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p‐values (1 − p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive rates and false negative rates associated with any p‐value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes than in protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends upon the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision‐theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value. Two functional genomics studies that were designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to determine a cutoff to suit their research goals.
ISSN:	0006-341X 1541-0420
DOI:	10.1111/j.0006-341X.2004.00228.x
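
The estimation steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation: the tail-count estimator for the number of true null hypotheses, its tuning parameter `lam`, the function names, and the linear cost with weights `cost_fp` and `cost_fn` are all assumptions made for illustration, and the demonstration data are synthetic.

```python
import random


def estimate_m0(pvalues, lam=0.5):
    """Estimate the number of true null hypotheses (unaffected genes).

    Null p-values are uniform on [0, 1], so roughly m0 * (1 - lam) of them
    exceed lam. ASSUMPTION: this tail-count estimator and the parameter
    `lam` are illustrative; the article's own fit may differ.
    """
    tail = sum(1 for p in pvalues if p > lam)
    return min(float(len(pvalues)), tail / (1.0 - lam))


def error_estimates(pvalues, cutoff, m0):
    """Estimate error counts and rates for the rule 'select gene if p <= cutoff'."""
    m = len(pvalues)
    selected = sum(1 for p in pvalues if p <= cutoff)
    # Expected number of true nulls falling below the cutoff (false positives).
    fp = min(selected, m0 * cutoff)
    tp = selected - fp
    # Affected genes that were not selected (false negatives), clipped to a valid range.
    fn = min(max((m - m0) - tp, 0.0), m - selected)
    fdr = fp / selected if selected else 0.0          # false discovery rate
    fnr = fn / (m - selected) if selected < m else 0.0  # false nondiscovery rate
    return {"selected": selected, "fp": fp, "fn": fn, "fdr": fdr, "fnr": fnr}


def best_cutoff(pvalues, m0, cost_fp=1.0, cost_fn=1.0, grid=None):
    """Pick the cutoff on a grid that minimizes an assumed linear misclassification cost."""
    if grid is None:
        grid = [i / 1000 for i in range(1, 1000)]

    def cost(c):
        e = error_estimates(pvalues, c, m0)
        return cost_fp * e["fp"] + cost_fn * e["fn"]

    return min(grid, key=cost)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data (not from the article): 900 unaffected and 100 affected genes.
    pvals = [random.random() for _ in range(900)]
    pvals += [random.random() ** 6 for _ in range(100)]
    m0_hat = estimate_m0(pvals)
    # Hypothetical costs: a missed affected gene weighs five times a false positive.
    c = best_cutoff(pvals, m0_hat, cost_fp=1.0, cost_fn=5.0)
    print(m0_hat, c, error_estimates(pvals, c, m0_hat))
```

In this sketch, raising `cost_fn` relative to `cost_fp` tends to favor looser cutoffs, which corresponds to the abstract's stated preference for capturing most of the affected genes over guarding against a few false positives.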