Mahlet G. Tadesse

Department of Mathematics

Georgetown University


Unified Bayesian Methods for Variable Selection and Clustering

The practical utility of variable selection for the analysis of high-dimensional data is well recognized. Several methods have been developed for regression and classification models. However, few contributions have been made in the context of clustering. In DNA microarray studies, for example, there is often interest in uncovering disease subgroups and identifying relevant genes. This calls for methods that cluster the n samples and select genes whose expression patterns differ between the groups. We propose Bayesian methods that provide a unified approach for addressing these problems simultaneously. Model-based clustering is used to uncover the clusters, and a stochastic search variable selection method is built into the model to identify discriminating variables. We consider the general case where the number of clusters is unknown, and we adopt two different MCMC strategies. The first formulates the clustering problem in terms of finite mixture models with an unknown number of components and uses a reversible jump MCMC technique. The second approach uses infinite mixture models via Dirichlet process mixture priors. We illustrate the methods with an application to microarray data from an endometrial cancer study.
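
As an illustration of how a Dirichlet process mixture prior leaves the number of clusters open, the following minimal sketch draws a partition of n samples from the Chinese restaurant process, the partition prior that a Dirichlet process induces. It is not the speaker's implementation: the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions introduced here, and the sketch covers only the prior on cluster labels, not the likelihood, the variable selection indicators, or the MCMC updates.

```python
import random


def sample_crp_partition(n_samples, alpha, seed=None):
    """Draw cluster labels for n_samples items from a Chinese restaurant process.

    alpha is the (assumed) concentration parameter; larger values favor
    more clusters. Returns the labels and the size of each cluster.
    """
    rng = random.Random(seed)
    labels = []
    cluster_sizes = []
    for _ in range(n_samples):
        # An existing cluster is chosen with weight equal to its current size;
        # a new cluster is opened with weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)   # open a new cluster
        else:
            cluster_sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, cluster_sizes


if __name__ == "__main__":
    labels, sizes = sample_crp_partition(n_samples=20, alpha=1.0, seed=1)
    print("labels:", labels)
    print("number of clusters:", len(sizes))
```

The number of clusters is an outcome of the draw rather than an input, which is the property that lets the second approach avoid fixing the number of mixture components in advance.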

