Peter J. Waddell

Chugai Research Institute for Molecular Medicine

Chugai Pharmaceutical Company, Japan


Genetic Pathways from MicroArray Data using Clustering, Graphical Models and Correspondence Analysis

A MicroArray experiment can measure the RNA expression level of more than 20,000 genes, and it is feasible to run hundreds of experiments as a series. At present, correlation of expression levels is used to cluster genes that behave similarly, in the hope that they lie on the same genetic pathway. Here I look at both the theory and the practicalities of more elaborate analyses, using the NCI-60 cell line and colon cancer data sets. With my colleague Hiro Kishino, I introduce partial correlation-based distances, tree consensus and inter-tree metrics to extend this field. First applications of Graphical Modeling to elucidate features of pathways show the importance of experimental design in avoiding a complete breakdown of assumptions. To allow modeling to continue when there is an identifiability/latent variable problem caused by having many more genes than experiments, Approximate Partial Correlation via Regression (APCR) is developed, using AIC for variable selection. This approach is now implemented for thousands of genes and is competitive with Bayesian Networks. A potentially useful twist on graphical modeling is illustrated: Hayashi-style MDS based on partial correlations, followed by drawing in links between strongly associated genes. Finally, Correspondence Analysis is seen to be a natural and robust way for biologists to visualise the association of aberrant genes and tissues, with real advantages over both MDS and PCA. In practice, such methods are often as important as explicit modeling for making real discoveries. Given poorly controlled, yet important, factors such as somatic mutations, splice variants and protein phosphorylation, it may be some time before explicit modeling significantly supplements visual methods combined with trained biological insight.
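The idea of approximating a partial correlation by regression, with AIC used to choose the conditioning genes, can be illustrated with a minimal sketch. This is not the APCR implementation described in the talk, whose details are not given here. It assumes ordinary least squares, a Gaussian AIC and a greedy forward search, and the function names (`residuals`, `gaussian_aic`, `forward_select`, `partial_corr`) and the `max_vars` limit are hypothetical choices made for illustration only.

```python
import numpy as np

def residuals(y, Z):
    """Residuals of y after least-squares regression on the columns of Z plus an intercept."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

def gaussian_aic(y, Z):
    """AIC of a Gaussian linear model, up to an additive constant (assumed form)."""
    n = len(y)
    rss = float(np.sum(residuals(y, Z) ** 2))
    k = Z.shape[1] + 2  # slopes + intercept + error variance
    return n * np.log(rss / n) + 2 * k

def forward_select(y, X, max_vars=5):
    """Greedy forward selection of columns of X, stopping when AIC no longer improves."""
    n, p = X.shape
    chosen = []
    best_aic = gaussian_aic(y, np.empty((n, 0)))
    while len(chosen) < max_vars:
        scores = [(gaussian_aic(y, X[:, chosen + [j]]), j)
                  for j in range(p) if j not in chosen]
        if not scores:
            break
        aic, j = min(scores)
        if aic >= best_aic:
            break
        chosen.append(j)
        best_aic = aic
    return chosen

def partial_corr(x, y, Z):
    """Correlation of x and y after regressing both on the conditioning set Z."""
    return float(np.corrcoef(residuals(x, Z), residuals(y, Z))[0, 1])
```

With an expression matrix whose rows are experiments and whose columns are genes, one would select a small conditioning set for a pair of genes with `forward_select` and pass those columns to `partial_corr`. Keeping the conditioning set much smaller than the number of experiments is what makes the regression estimable when genes far outnumber experiments.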

