Bayesian nonparametric clustering of sparse signals

advertisement
Chris Holmes (Oxford)
Bayesian nonparametric clustering of sparse signals
Bayesian mixture modelling is a widely used method for cluster analysis as it allows for characterisation
of uncertainty surrounding principle dependence structures within data. Clustering is often used as an
exploratory method to uncover hidden structure or to recover suspected structure. As data becomes
ever cheaper to capture there is an increasing need for methods that can adjust towards many recorded
variables being irrelevant to the clustering task. Variable selection has a substantial literature in
regression or supervised learning settings but is much less studied in unsupervised clustering problems.
In this talk we will report on the use of variable selection priors for nonparametric mixture models. Such
priors tend to induce sparsity in the posterior model space and help characterise the relative
information of those explanatory variables useful for clustering. We demonstrate how the use of
hierarchical Dirichlet process priors allows for a principled way to construct these models; providing
accurate indication of irrelevant variables whilst being able to quantify the relative relevance of
informative variables. We pay particular attention to efficient MCMC sampling schemes for inference
allowing for an unknown number of mixture components and an unknown number of relevant variables.
Download