Mahlet G. Tadesse
Department of Mathematics
Georgetown University
Unified Bayesian Methods for Variable Selection and Clustering
The practical utility of variable selection for the analysis of
high-dimensional data is well recognized.
Several methods have been developed for regression and classification
models. However, few contributions have been made in the context of
clustering. In DNA microarray studies, for example, there is often interest
in uncovering disease subgroups and identifying relevant genes. This calls
for methods that cluster the n samples and select genes with distinctive
expression patterns between the different groups. We propose Bayesian
methods that provide a unified approach for addressing these problems
simultaneously. Model-based clustering is used to uncover the clusters and
a stochastic search variable selection method is built into the model to
identify discriminating variables. We consider the general case where the
number of clusters is unknown, and we adopt two different MCMC strategies.
The first formulates the clustering problem in terms of finite mixture
models with an unknown number of components and uses a reversible jump MCMC
technique. The second approach uses infinite mixture models via Dirichlet
process mixture priors. We illustrate the methods with an application to
microarray data from an endometrial cancer study.
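
To give a feel for how a Dirichlet process prior leaves the number of clusters open, the following minimal sketch draws a random partition of n samples from the Chinese restaurant process, the partition distribution induced by a Dirichlet process. It is an illustration only, not the sampler used in the talk: the function name sample_crp_partition, the concentration parameter alpha, and the seed argument are assumptions introduced here, and the sketch covers the prior on cluster labels alone, without the variable selection step or any likelihood.

```python
import random
from collections import Counter


def sample_crp_partition(n_samples, alpha, seed=None):
    """Draw cluster labels for n_samples items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (an assumed name); larger
    values tend to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of items currently assigned to cluster k
    for _ in range(n_samples):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


labels = sample_crp_partition(n_samples=20, alpha=1.0, seed=0)
print(len(set(labels)), "clusters with sizes", dict(Counter(labels)))
```

The number of distinct labels is itself random, which is the sense in which the number of clusters does not have to be fixed in advance.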