Jaxk Reeves

Statistics Department

University of Georgia


Sporulation Data Revisited

The 'hot' topic in Statistical Genetics for the past few years has been the analysis of micro-array data. Micro-array technology allows researchers to measure the RNA expression levels of thousands of genes of an organism simultaneously and over time. Such data can be very useful in biological research, since, if properly analyzed, they can give researchers clues as to the function of some genes or suggest which genes are likely to be involved in the bio-chemical pathways for certain processes. In a typical experiment, the number of genes analyzed, n, is frequently in the thousands, much larger than the number whose effects on the process in question could possibly be measured at a significant level. The statistician's role in this endeavor is two-fold: to find the genes for which there is significant evidence of change in expression level during the process and, among these, to cluster those for which the evidence of mutually associated expression and/or repression is strongest. A 'classic' data set which many researchers in this field use to illustrate their pet techniques is the yeast sporulation data set of Chu, DeRisi, et al. (Chu, Science, 1998), available on-line at http://cmgm.stanford.edu/pbrown/sporulation. Although micro-array techniques have improved significantly in the past 5 years and there has been some move away from the Red/Green intensity measures used in this experiment, the experiment was very well designed for its time and is not atypical of the sort of micro-array analysis currently being done in many biological labs throughout the world. In this talk, I will examine a number of statistical assumptions made in the analysis of this data set and discuss the effects of these assumptions upon the conclusions reached by the authors of the sporulation study and, more generally, by those who use similar statistical analysis techniques.

