Ralph L. Kodell

National Center for Toxicological Research, U.S. Food and Drug Administration,

Jefferson, Arkansas


Using P-Value Plots and ROC Curves to Assess Statistical Significance of Microarray Data on Gene Expression

cDNA arrays interrogate tissue samples for the levels of mRNA for hundreds to tens of thousands of genes in order to measure treatment-induced changes in gene expression. Selecting a significance level for by-gene hypothesis tests requires dealing with the multitude of treatment contrasts. The p-values from these tests order the genes, so that any p-value cutoff divides the genes into two sets. The set of genes selected as affected will contain false positives, while the set selected as unaffected will contain false negatives. A p-value plot (Schweder and Spjøtvoll, 1982) allows one to estimate the number of true null hypotheses (truly unaffected genes). With this estimate, the false-positive and false-negative rates associated with any p-value cutoff can be estimated. An optimal cutoff depends upon the cost of falsely classifying a gene as affected relative to the cost of falsely classifying a gene as unaffected. Here, a method analogous to those developed for ROC curves is proposed for selecting the cutoff. The false discovery rate (FDR) and false non-discovery rate (FNR) associated with the chosen cutoff are then estimated. Two functional genomics studies are used for illustration.
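
As a rough illustration of the steps summarized above, the following Python sketch estimates the number of true null hypotheses from a p-value plot and then computes estimated error rates at a given cutoff. It is a minimal sketch under stated assumptions, not the method presented in the talk. The function names, the tail threshold `lam`, the least-squares line through the origin, and the linear cost rule in `choose_cutoff` are all illustrative choices introduced here.

```python
import numpy as np


def estimate_num_true_nulls(pvals, lam=0.5):
    """Estimate the number of true nulls from the p-value plot.

    For large p, the count of p-values >= p is roughly m0 * (1 - p),
    so the slope of a line through the origin fitted to the points
    (1 - p, count) estimates m0. `lam` (assumed) marks where the
    "large p" region starts.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    tail = p[p > lam]
    x = 1.0 - tail
    denom = float(np.sum(x * x))
    if tail.size == 0 or denom == 0.0:
        return 0.0
    # Number of p-values greater than or equal to each tail value.
    counts = m - np.searchsorted(p, tail, side="left")
    slope = float(np.sum(x * counts)) / denom
    return min(slope, float(m))


def error_rates(pvals, cutoff, m0):
    """Estimated FDR, FNR, false positives and false negatives at a cutoff."""
    p = np.asarray(pvals, dtype=float)
    selected = int(np.sum(p <= cutoff))      # genes called affected
    unselected = p.size - selected           # genes called unaffected
    # Under the null, p-values are uniform, so about m0 * cutoff fall below it.
    false_pos = min(m0 * cutoff, float(selected))
    # Unselected genes beyond the expected m0 * (1 - cutoff) true nulls.
    false_neg = max(unselected - m0 * (1.0 - cutoff), 0.0)
    fdr = false_pos / selected if selected else 0.0
    fnr = false_neg / unselected if unselected else 0.0
    return fdr, fnr, false_pos, false_neg


def choose_cutoff(pvals, m0, cost_fp=1.0, cost_fn=1.0):
    """Pick the observed p-value minimizing an assumed linear misclassification cost."""
    best_cutoff, best_cost = None, np.inf
    for c in np.unique(np.asarray(pvals, dtype=float)):
        _, _, fp, fn = error_rates(pvals, c, m0)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_cutoff, best_cost = float(c), cost
    return best_cutoff
```

In use, one would pass the vector of by-gene p-values to `estimate_num_true_nulls`, feed the result to `choose_cutoff` with costs reflecting how the two kinds of misclassification are weighed, and report the estimated FDR and FNR from `error_rates` at the chosen cutoff.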

