Peter J. Waddell
Chugai Research Institute for Molecular Medicine
Chugai Pharmaceutical Company, Japan
Genetic Pathways from MicroArray Data using Clustering,
Graphical Models and Correspondence Analysis
A MicroArray experiment can measure the RNA expression levels of more than
20,000 genes, and it is feasible to run hundreds of experiments as a series.
At present, correlations between expression levels are used to cluster genes
that behave similarly, in the hope that they lie on the same genetic pathway. Here
I look at both the theory and practicalities of more elaborate analyses
using NCI-60 cell line and colon cancer data sets. Together with my colleague,
Hiro Kishino, I introduce the use of partial correlation-based distances, tree
consensus and inter-tree metrics to extend this field. First applications of
Graphical Modeling to elucidate features of pathways show the importance of
experimental design to avoid complete breakdown of assumptions. To allow
modeling to continue when there is an identifiability or latent-variable
problem caused by having many more genes than experiments, Approximate Partial
Correlation via Regression (APCR) is developed, using AIC for variable
selection. This approach is now implemented for thousands of genes and is
competitive with Bayesian Networks. A potentially useful twist on graphical
modeling, composed of Hayashi-style MDS based on partial correlations,
followed by drawing in links for strongly associated genes, is illustrated.
Finally, Correspondence Analysis is seen to be a natural and robust way for
biologists to visualise the association of aberrant genes and tissues, with
real advantages over both MDS and PCA. In practice, such methods are often
as important as explicit modeling for making real discoveries. Given poorly
controlled, yet important, factors such as somatic mutations, splice
variants, and protein phosphorylation, it may be some time before explicit
modeling significantly supplements visual methods plus trained biological
insight.
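
As a small illustration of why partial correlations can be more informative than plain correlations for pathway work, the sketch below computes a partial correlation matrix from the inverse of the correlation matrix and turns it into a distance. It is not the implementation used in this work: the function names, the distance 1 - |r|, and the simulated three-gene chain are assumptions made for illustration, and the direct matrix inverse only applies when there are more experiments than genes (the situation APCR is designed to get around).

```python
import numpy as np


def partial_correlations(X):
    """Partial correlation between each pair of columns (genes) of X,
    conditioning on all remaining columns.

    X has shape (experiments, genes). Assumes more experiments than genes,
    so that the correlation matrix can be inverted.
    """
    R = np.corrcoef(X, rowvar=False)      # gene-by-gene correlation matrix
    P = np.linalg.inv(R)                  # precision (inverse correlation) matrix
    d = np.sqrt(np.diag(P))
    pc = -P / np.outer(d, d)              # rescale and flip sign off the diagonal
    np.fill_diagonal(pc, 1.0)
    return pc


def partial_correlation_distance(pc):
    """Illustrative distance (an assumption): strongly linked genes are close."""
    return 1.0 - np.abs(pc)


# Simulated chain a -> b -> c (hypothetical data, not from NCI-60).
rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
X = np.column_stack([a, b, c])

marginal = np.corrcoef(X, rowvar=False)
pc = partial_correlations(X)
dist = partial_correlation_distance(pc)

# Genes a and c are strongly correlated overall, but once b is accounted
# for their partial correlation should be close to zero.
print("correlation a-c:        ", round(marginal[0, 2], 3))
print("partial correlation a-c:", round(pc[0, 2], 3))
```

In this toy chain, ordinary correlation would group all three genes together, whereas the partial correlation distance keeps only the direct links a-b and b-c close, which is the kind of structure a graphical model tries to recover.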