Ralph L. Kodell
National Center for Toxicological Research
U.S. Food and Drug Administration,
Jefferson, Arkansas
Using P-Value Plots and ROC Curves to Assess Statistical Significance of
Microarray Data on Gene Expression
cDNA arrays interrogate tissue samples for the levels of mRNA for hundreds
to tens of thousands of genes to measure treatment-induced changes in gene
expression. Selecting a significance level for by-gene hypothesis tests
requires dealing with the multitude of treatment contrasts. The p-values
from these tests order the genes such that a p-value cutoff divides the
genes into two sets. The set of genes selected as affected will have false
positives while the set selected as unaffected will contain false negatives.
A p-value plot (Schweder and Spjotvoll, 1982) allows one to estimate the
number of true null hypotheses (truly unaffected genes). With this
estimate, the false-positive and false-negative rates associated with any
p-value cutoff can be estimated. An optimal cutoff depends upon the
relative cost of falsely classifying a gene as affected versus the cost of
falsely classifying a gene as unaffected. Here, a method analogous to
methods developed for ROC curves is proposed for selecting the cutoff. The
false discovery rate (FDR) and false non-discovery rate (FNR) associated
with the cutoff are then estimated. Two functional genomics studies are
used for illustration.
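
The workflow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method presented in the talk. The number of true nulls is estimated from a single point on the p-value plot (the count of p-values above a threshold lam, divided by 1 - lam), which is a simplification of reading the slope off the plot. The cutoff is chosen by minimizing a cost-weighted sum of estimated false positives and false negatives, which stands in for the ROC-style criterion. All function names, the parameter lam, and the cost weights cost_fp and cost_fn are hypothetical.

```python
def estimate_true_nulls(pvals, lam=0.5):
    """Estimate the number of true null hypotheses (unaffected genes).

    Assumption: p-values above `lam` come almost entirely from true nulls,
    which are uniform on (0, 1), so about m0 * (1 - lam) of them exceed lam.
    """
    m = len(pvals)
    above = sum(1 for p in pvals if p > lam)
    return min(float(m), above / (1.0 - lam))


def error_estimates(pvals, cutoff, m0):
    """Estimate error counts and rates for genes selected with p <= cutoff."""
    m = len(pvals)
    selected = sum(1 for p in pvals if p <= cutoff)
    # Expected number of true nulls falling below the cutoff.
    fp = min(m0 * cutoff, float(selected))
    tp = selected - fp
    # Affected genes (m - m0) that were not selected.
    fn = max((m - m0) - tp, 0.0)
    unselected = m - selected
    fdr = fp / selected if selected else 0.0
    fnr = fn / unselected if unselected else 0.0
    return {"selected": selected, "fp": fp, "fn": fn, "fdr": fdr, "fnr": fnr}


def choose_cutoff(pvals, m0, cost_fp=1.0, cost_fn=1.0):
    """Pick the observed p-value that minimizes the estimated total cost.

    Assumption: total cost = cost_fp * (false positives)
                           + cost_fn * (false negatives).
    """
    best = None
    for c in sorted(set(pvals)):
        est = error_estimates(pvals, c, m0)
        cost = cost_fp * est["fp"] + cost_fn * est["fn"]
        if best is None or cost < best[0]:
            best = (cost, c, est)
    return best[1], best[2]
```

Given a list of by-gene p-values, one would call estimate_true_nulls first, pass the result to choose_cutoff together with the two relative costs, and read the estimated FDR and FNR from the returned dictionary. Raising cost_fn relative to cost_fp moves the chosen cutoff upward, so more genes are declared affected at the price of a higher estimated FDR.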