
High-Dimensional Inference
with Applications
University of Kent, 24/25 June 2013
Keynote speakers/abstracts
1. Bayesian Inference in Finite and Infinite Dimensions
Professor Philip Dawid, Cambridge
When the parameter-space is finite-dimensional, all reasonable prior
distributions are mutually absolutely continuous: this implies that their
associated posterior or predictive distributions will become indistinguishable
as more data accrue, so leading to essentially “objective” Bayesian
inference in large samples.
By contrast, in nonparametric problems, involving infinite-dimensional
parameter spaces, distinct prior distributions are generically mutually
singular, which implies that no amount of data can bring their inferences
into agreement.
I will discuss some issues associated with this, with special attention to the
need for caution when specifying prior distributions on infinite-dimensional
spaces.
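A minimal sketch of the finite-dimensional merging phenomenon, using the standard Beta-Bernoulli example (stated here for concreteness, not taken from the abstract):

```latex
% With prior \theta \sim \mathrm{Beta}(a, b) and s successes in n
% Bernoulli trials,
\[
  \theta \mid x_{1:n} \;\sim\; \mathrm{Beta}(a + s,\; b + n - s),
  \qquad
  \mathbb{E}[\theta \mid x_{1:n}] \;=\; \frac{a + s}{a + b + n}
  \;=\; \frac{s}{n} + O\!\left(\tfrac{1}{n}\right),
\]
% so the choice of (a, b) washes out as n grows: any two such priors,
% being mutually absolutely continuous, yield merging inferences.
```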
2. BS in Britain: Mitigating the effects of preferentially selected monitoring sites for
inference and policy
Professor Jim Zidek, University of British Columbia, Canada
Co-author: Dr Gavin Shaddick, Bath
In the 1960s, over 2000 sites in the UK monitored black smoke (BS) air pollution,
owing to concerns about its effects on public health, effects starkly
demonstrated by the famous London smog of 1952. Abatement measures led
to a decline in the levels of BS and hence a reduction in the number of
monitoring sites, to fewer than 200 by 1996.
Treating the BS example as a case study, the speaker will argue that the sites
to be removed were preferentially selected, causing estimates of metrics
used by regulatory agencies to be too high.
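A toy simulation of this bias mechanism (illustrative numbers only, not the speakers' analysis): if the sites that survive thinning are disproportionately the dirty ones, the network mean overstates the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# True pollution levels at 2000 hypothetical sites (log-normal, illustrative).
true_levels = rng.lognormal(mean=3.0, sigma=0.5, size=2000)

# Non-preferential thinning: keep a random subset of 200 sites.
random_subset = rng.choice(true_levels, size=200, replace=False)

# Preferential thinning: keep the 200 sites with the highest levels
# (an extreme caricature of retaining monitors in polluted areas).
preferential_subset = np.sort(true_levels)[-200:]

print(f"true network mean:          {true_levels.mean():.1f}")
print(f"random 200-site mean:       {random_subset.mean():.1f}")       # ~unbiased
print(f"preferential 200-site mean: {preferential_subset.mean():.1f}")  # biased high
```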
Moreover, he will describe an approach to mitigating the effects of
preferential sampling. The large number of monitoring sites, and the
associated high-dimensional data vectors, rules out naïve use of classical
geostatistical methods here, hence the need for the novel analytical
approaches that will be described.
The work has important general implications for the setting of regulatory
standards and the design of monitoring networks. Most importantly it points
anew to the importance of good design in statistical measurement and
testing.
3. Bayesian Models for Integrative Genomics
Professor Marina Vannucci, Rice University, Houston, Texas
Novel methodological questions are now being generated in bioinformatics
and require the integration of different concepts, methods, tools and data
types.
Bayesian methods that employ variable selection have been particularly
successful for genomic applications, as they can handle situations where
the number of measured variables greatly exceeds the number of
observations.
In this talk I will first describe Bayesian variable selection methods for linear
settings that incorporate external biological information into the analysis of
gene expression data. I will then focus on models that achieve an even
greater degree of integration, by incorporating experimental data from
different platforms, together with prior knowledge, into the modelling.
I will look in particular at graphical models, integrating gene expression data
with microRNA expression data, and at a hierarchical mixture model for
imaging genetics data, incorporating functional MRI data and genetic
information measured on the same set of patients. All modelling settings
employ variable selection techniques and prior constructions that cleverly
incorporate biological knowledge about structural dependencies among the
variables.
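One common construction in this literature, sketched below for concreteness (not necessarily the exact priors of the talk), couples a spike-and-slab selection prior with a Markov random field prior that encodes known biological structure:

```latex
% With inclusion indicators \gamma_j \in \{0, 1\} for the coefficients,
\[
  \beta_j \mid \gamma_j \;\sim\;
  \gamma_j\, \mathcal{N}(0, \tau^2) \;+\; (1 - \gamma_j)\, \delta_0 ,
\]
% biological structure (e.g. a gene-network relation matrix R) enters
% through a Markov random field prior on \gamma = (\gamma_1, \dots, \gamma_p),
\[
  p(\gamma) \;\propto\;
  \exp\!\left( \mu\, \mathbf{1}^{\top}\gamma \;+\; \eta\, \gamma^{\top} R\, \gamma \right),
\]
% so that variables connected in the network tend to be selected together.
```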
4. MaTaDOR: Bayesian Object Regression for Complex, High Dimensional Data
Professor Jeffrey S. Morris, University of Texas MD Anderson Cancer Center
The term “object data” generalizes “functional data”: it can be taken to
mean multiple measurements on some type of structured space, and includes,
for example, functions, images, shapes, graphs, and trees. The
internal structure of the objects can be based on geometry or more complex
scientific relationships, and efficient statistical methods should take this
internal structure into account.
In this talk, I will discuss MaTaDOR (MulTi-Domain Object Regression), a very
general and flexible modelling framework, generalizing functional mixed
models, that can be used to perform unified Bayesian regression analyses on
a broad array of such object data while flexibly taking various types of
internal structure into account.
Our strategy involves the use of various types of basis functions to capture
objects’ internal structure, using a modelling strategy that is conducive to
parallel processing and scales up to very large data sets.
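A minimal sketch of this basis-space strategy, with toy data and a plain polynomial basis (the framework itself uses richer bases and full Bayesian mixed models; everything below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "object data": 50 noisy curves observed on a common grid,
# whose shape depends on a scalar covariate x.
grid = np.linspace(0.0, 1.0, 100)
x = rng.normal(size=50)
curves = (np.sin(2 * np.pi * grid) + 0.5 * np.outer(x, grid)
          + 0.1 * rng.normal(size=(50, 100)))

# Step 1: project each curve onto a basis (here a cubic polynomial basis;
# wavelets or splines would capture richer internal structure).
B = np.vander(grid, N=4, increasing=True)             # 100 x 4 basis matrix
coefs, *_ = np.linalg.lstsq(B, curves.T, rcond=None)  # 4 x 50 coefficients

# Step 2: regress each basis coefficient on the covariate independently,
# the step that makes the approach embarrassingly parallel.
X = np.column_stack([np.ones_like(x), x])             # design matrix
beta, *_ = np.linalg.lstsq(X, coefs.T, rcond=None)    # 2 x 4 effects

# Step 3: map the fitted covariate effect back to the curve domain.
effect_of_x = B @ beta[1]            # effect of x, as a function on the grid
print(effect_of_x[:5])
```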
I will discuss some specific methods developed within this framework (some of
which were done in collaboration with Phil Brown), including robust object
regression using scale mixture distributions, object classification using
predictive probabilities, and nonparametric additive models for object data.
5. Making Bayesian Mixture Models Identifiable
Professor Stephen Walker, SMSAS, Kent
We are interested in making the Bayesian mixture model identifiable. It is
known that the model with weights and locations is not identifiable for these
parameters. Hence, the latent allocation variables are also difficult to
interpret, making clustering problematic. In this talk we endeavour to make
the model identifiable by marginalizing over the weights and locations. This
leaves a model for the observations given the allocations and hence a prior
for the allocations is needed. We propose a form of prior which provides
explicit interpretation for the allocations. Supporting theory and illustrations
are presented.
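The non-identifiability in question is the familiar invariance of the mixture likelihood under relabelling:

```latex
% A k-component mixture with weights w_j and locations \mu_j,
\[
  f(y) \;=\; \sum_{j=1}^{k} w_j \, g(y \mid \mu_j),
\]
% is unchanged by any permutation \sigma of the component labels,
\[
  \sum_{j=1}^{k} w_j \, g(y \mid \mu_j)
  \;=\;
  \sum_{j=1}^{k} w_{\sigma(j)} \, g(y \mid \mu_{\sigma(j)}),
\]
% so (w, \mu) cannot be recovered from f, and the allocation labels carry
% no intrinsic meaning.
```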
6. Election-night Forecasting for the BBC: Statisticians versus Presenters and
Swingometers
Dr Clive Payne, Oxford
A review of the statistical methods used in BBC election-night forecasts over
the last 40 years, with particular emphasis on the Brown-Payne ridge
regression method used in the general elections of 1974-2002 and on the exit poll-based
methods used more recently. The paper will include discussion of the media
aspects of the presentation of predictions.
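For reference, the ridge regression estimator named above:

```latex
% With penalty parameter \lambda > 0,
\[
  \hat{\beta}_{\lambda}
  \;=\; \arg\min_{\beta} \;\|y - X\beta\|^2 + \lambda \|\beta\|^2
  \;=\; \left(X^{\top}X + \lambda I\right)^{-1} X^{\top} y ,
\]
% shrinking coefficients towards zero to stabilize predictions when
% predictors are highly collinear or data are scarce.
```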
7. A statistical framework for comparison of climate simulation models with past
climate observations, including a calibration problem
Professor Rolf Sundberg, Stockholm
The variability in a climate simulation model and in actual observational data
have in common only the possible responses to so-called forcings built into
the model. Forcings are more or less well-documented factors thought to
have affected the climate, such as variation in planetary orbit, solar strength,
land use, and greenhouse gases. Climate observations are of instrumental
(relatively recent) or proxy type. The latter are surrogates for instrumental
measurements, based, for example, on tree rings or various kinds of sediments.
We will formulate a statistical framework aimed at comparing climate
models with different built-in forcing effects, with respect to each model's
ability to fit past climate data. The framework will require a discussion of
the calibration of proxies.
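A standard linear calibration setup of the kind alluded to (a sketch, not necessarily the exact model of the talk):

```latex
% Over the instrumental period the proxy z_t is regressed on temperature x_t,
\[
  z_t \;=\; \alpha + \beta x_t + \varepsilon_t,
  \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2),
\]
% and for the pre-instrumental period the fitted relation is inverted,
\[
  \hat{x}_t \;=\; \frac{z_t - \hat{\alpha}}{\hat{\beta}} ,
\]
% a classical (inverse) calibration problem whose uncertainty must be
% propagated into any model-data comparison.
```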
8. An adaptive MCMC scheme for variable selection problems
Professor Jim Griffin
Co-authors: Dr Krzysztof Łatuszyński (Warwick) and Professor Mark Steel (Warwick)
Data sets with many variables are routinely collected in many disciplines. This
has led to interest in variable selection in regression models with a large
number of variables.
A standard Bayesian approach defines a prior on the model space (defined
by all subsets of the variables) and uses Markov chain Monte Carlo methods
to explore the space. Unfortunately, the size of the space ($2^p$ if there are
$p$ variables) and the use of simple proposals in Metropolis-Hastings steps
have led to samplers that often mix poorly.
In this talk, I will describe an adaptive Metropolis-Hastings scheme which
adapts a wide class of proposals to the posterior distribution. This leads to
orders-of-magnitude improvements in mixing over standard algorithms.
The methods will be illustrated on several real regression problems.
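A toy sketch of the adaptive idea (illustrative only, with a deliberately simple target; the talk's scheme adapts a far wider class of proposals, with supporting ergodicity theory): track running estimates of each variable's inclusion probability and concentrate flip proposals on the genuinely uncertain variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n_vars = 20

# Toy target: independent posterior log-odds of inclusion per variable
# (a stand-in for a real marginal likelihood over the 2^p models).
log_odds = rng.normal(scale=2.0, size=n_vars)

def log_post(gamma):
    return float(gamma @ log_odds)

gamma = np.zeros(n_vars, dtype=int)
incl = np.full(n_vars, 0.5)    # running estimates of inclusion probabilities

for t in range(1, 20001):
    # Adaptive proposal: flip variable j with probability proportional to
    # incl_j(1 - incl_j), focusing moves where inclusion is uncertain.
    w = np.clip(incl * (1.0 - incl), 0.01, None)
    j = rng.choice(n_vars, p=w / w.sum())
    prop = gamma.copy()
    prop[j] ^= 1
    # j is chosen with the same probability in both directions, so the
    # proposal is symmetric and the acceptance ratio is the posterior ratio.
    if np.log(rng.uniform()) < log_post(prop) - log_post(gamma):
        gamma = prop
    # Diminishing adaptation: estimates change ever more slowly.
    incl += (gamma - incl) / t

print(np.round(incl, 2))                             # sampler's estimates
print(np.round(1.0 / (1.0 + np.exp(-log_odds)), 2))  # exact answers
```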
9. A Benefit-Risk Analysis of Using Formal Benefit-Risk Approaches for Decision-Making
in Drug Regulation
Professor Deborah Ashby, School of Public Health, Imperial College London
For a medicine to gain a licence, it requires evidence of its efficacy and
safety. Study designs and statistical methods are well developed to deal with
the former, and to a lesser extent the latter. However, until recently,
assessment of the benefit-risk balance for a medicine, especially in relation to
alternatives, has been entirely informal. There is now growing interest among
drug regulators and pharmaceutical companies in the possibilities of more
formal approaches to benefit-risk decision-making.
In this talk, we review the basis of drug regulation, the established statistical
bases for decision-making under uncertainty, and current initiatives in the
area. One such initiative forms part of the Pharmacoepidemiological
Research on Outcomes of Therapeutics by a European Consortium
(PROTECT) project, which is funded under the Innovative Medicines Initiative
and is a collaboration between academic, pharmaceutical, regulatory and
patient organizations.
Based on work from this project we will review current methodological
approaches, and illustrate them with case-studies on medicines where
benefit-risk is finely balanced. The use of formal decision-making in this
context is not uncontroversial, so we end with a somewhat informal and
personal appraisal of the benefits and risks of taking this path.
10. Using decision theory to explore posterior models in cancer genomics
Professor Chris Holmes, Oxford
Bayesian models have proved highly useful in the analysis of genetic variation
arising in cancers. We have previously developed Bayesian Hidden Markov
Models for this task where the hidden states relate to structural changes in
DNA known to be key drivers of cancer initiation and progression.
In this talk we discuss the use of decision theory to help elicit knowledge
contained in the posterior models. That is, having conditioned a model on
data, how can we explore the posterior model for interesting, highly
probable state sequences?
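For concreteness: under 0-1 loss on the whole sequence, the Bayes-optimal report is the maximum a posteriori state path, computable by the Viterbi recursion (a generic sketch, not the talk's cancer models):

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most probable hidden-state path of an HMM (inputs in log scale).

    log_init:  (K,)   log initial state probabilities
    log_trans: (K, K) log_trans[i, j] = log P(next = j | current = i)
    log_emit:  (T, K) log P(observation t | state = k)
    """
    T, K = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: come from i, go to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)           # trace the best path backwards
    path[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Tiny example: two hidden states (e.g. "normal"/"aberrant" copy number),
# sticky transitions, five observations.
li = np.log([0.5, 0.5])
lt = np.log([[0.9, 0.1], [0.1, 0.9]])
le = np.log([[0.95, 0.05], [0.8, 0.2], [0.05, 0.95], [0.05, 0.95], [0.95, 0.05]])
print(viterbi(li, lt, le))   # [0 0 1 1 0]
```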
11. Variable Selection via EMVS
Professor Ed George, Wharton, Pennsylvania
Co-author: Veronika Rockova, Erasmus University, Rotterdam
Despite rapid developments in stochastic search algorithms, the practicality
of Bayesian variable selection has continued to pose challenges. High-dimensional data are now routinely analyzed, typically with many more
covariates than observations.
To broaden the applicability of Bayesian variable selection for such contexts,
we propose EMVS, a deterministic alternative to stochastic search based on
an EM algorithm that quickly finds posterior modes over a nested sequence
of continuous conjugate spike-and-slab priors. Summarizing such dynamic
posterior exploration with a regularization diagram, rigorous evaluation by
posterior model probabilities is used to identify the most promising sparse
submodels. External structural information such as likely covariate groupings
or network topologies is easily incorporated into the EMVS framework.
Deterministic annealing variants are seen to improve search effectiveness by
mitigating posterior multimodality. Both univariate and multivariate regression
examples will be used to illustrate EMVS in action.
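In outline, the prior construction behind EMVS replaces the point-mass spike with a narrow Gaussian, so that EM updates are available in closed form (a sketch of the formulation):

```latex
% Continuous conjugate spike-and-slab prior:
\[
  \beta_j \mid \gamma_j, \sigma^2 \;\sim\;
  (1-\gamma_j)\, \mathcal{N}(0, \sigma^2 v_0)
  \;+\; \gamma_j\, \mathcal{N}(0, \sigma^2 v_1),
  \qquad 0 < v_0 < v_1,
\]
% with \gamma_j \sim \mathrm{Bernoulli}(\theta). Treating \gamma as missing
% data, the E-step yields inclusion probabilities
% p^{*}_j = P(\gamma_j = 1 \mid \beta, \theta, \sigma^2), and the M-step is a
% ridge regression with coefficient-specific penalties
\[
  d^{*}_j \;=\; \frac{1 - p^{*}_j}{v_0} \;+\; \frac{p^{*}_j}{v_1},
\]
% computed along a nested sequence of increasing spike variances v_0
% (the regularization diagram mentioned above).
```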
12. Smooth supersaturated models: SSM
Professor Henry Wynn, London School of Economics
Polynomial models are at the centre of the emerging field of Algebraic
Statistics, and using abstract algebra, particularly ideal theory, leads to
greater understanding of classical issues such as aliasing. However, raw
polynomial regression models have poor smoothness as soon as the degree
exceeds two.
With an elementary procedure one can produce high-degree polynomial
models which have increased smoothness but for which the degree p is
greater than the sample size n. As p gets larger the models approach spline
kernels while, for finite p, remaining analytic. The algebraic methods aid
the careful extension of the degree. Applications include computer
experiments, optimal design and sensitivity analysis, where being analytic
has mathematical advantages. The methods are competitive with popular
methods such as Gaussian kriging.
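A minimal numerical sketch of one way to realize this idea (an interpretation under stated assumptions, not the talk's algebraic machinery): among all polynomials with far more terms than data points, choose the interpolant minimizing the integral of the squared second derivative, via a small KKT system.

```python
import numpy as np

rng = np.random.default_rng(3)

# n data points, to be interpolated by a polynomial with N >> n terms.
n, N = 7, 20
x = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * x)

# Monomial basis 1, x, ..., x^(N-1) evaluated at the data points.
X = np.vander(x, N, increasing=True)                 # n x N design matrix

# Roughness matrix: K[i, j] = integral over [0, 1] of (x^i)''(x^j)'' dx
#                           = i(i-1) j(j-1) / (i + j - 3) for i, j >= 2.
K = np.zeros((N, N))
for i in range(2, N):
    for j in range(2, N):
        K[i, j] = i * (i - 1) * j * (j - 1) / (i + j - 3)

# Smoothest interpolant: minimize theta' K theta subject to X theta = y,
# solved through the KKT system [K  X'; X  0][theta; lam] = [0; y].
kkt = np.block([[K, X.T], [X, np.zeros((n, n))]])
rhs = np.concatenate([np.zeros(N), y])
theta = np.linalg.solve(kkt, rhs)[:N]

print(float(np.abs(X @ theta - y).max()))  # ~0: the data are interpolated,
# yet the supersaturated polynomial stays smooth between the points.
```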