Bayesian Inference and Model Choice for Nonlinear Stochastic Processes: Theo Kypraios

Bayesian Inference and Model Choice for
Nonlinear Stochastic Processes:
Applications to stochastic epidemic modelling
Theo Kypraios
http://www.maths.nott.ac.uk/~tk
School of Mathematical Sciences, University of Nottingham
EPSRC Symposium Workshop on Markov Chain-Monte Carlo,
Warwick, March 2009
Joint work with:
Gareth O. Roberts @ University of Warwick
Phil O’Neill @ University of Nottingham
Ben Cooper @ Health Protection Agency
Background
Understanding how an infectious disease spreads is essential for
preventing major outbreaks.
In general, inference problems for disease outbreak data are
complicated by the facts that
• the data are inherently dependent and
• the data are usually incomplete in the sense that the actual
process of infection is not observed.
However, it is often possible to formulate simple stochastic models
which describe the key features of epidemic spread.
Outline
• Bayesian inference for partially observed stochastic epidemics
• Efficient Markov Chain Monte Carlo algorithms (including
Non-Centered reparameterisations)
• Bayesian Model Choice using RJMCMC and alternative
methods.
• Illustrative simulation study.
The SIR Model
Principles
− Consider a closed population (no births/deaths) of N
individuals
− α initially infected individuals
− at any given time point, each individual i is in one of the three
states:
• Susceptible
• Infected
• Removed
S→I→R
The Stochastic Process
• Each of the infected individuals remains infectious for some
period which is distributed according to the distribution of
random variable D.
• While infectious, an individual makes contacts with each of
the N individuals in the population at times given by the
points of a Poisson process of rate β > 0.
• Any such contact with a susceptible individual immediately
makes the susceptible an infective.
• At the end of its infectious period an infective no longer
makes any contacts and is said to be removed.
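The process described above can be sketched as a small event-driven simulation. This is only an illustrative sketch for the homogeneously mixing case: the function name `simulate_sir`, its arguments and the Gamma infectious period used in the usage note are assumptions, not taken from the talk. Because contacts follow Poisson processes, the waiting time to the next infection can be redrawn after every event, while removal times are scheduled when an individual becomes infected.

```python
import numpy as np

def simulate_sir(n_pop, beta, draw_infectious_period, n_initial=1, seed=0):
    """Simulate one SIR epidemic; returns infection and removal times.

    Individuals that are never infected keep infection/removal time = inf.
    draw_infectious_period(rng, n) must return n draws of the period D.
    """
    rng = np.random.default_rng(seed)
    infection_time = np.full(n_pop, np.inf)
    removal_time = np.full(n_pop, np.inf)
    infection_time[:n_initial] = 0.0
    removal_time[:n_initial] = draw_infectious_period(rng, n_initial)
    t = 0.0
    while True:
        infectious = (infection_time <= t) & (removal_time > t)
        susceptible = np.isinf(infection_time)
        n_inf, n_sus = infectious.sum(), susceptible.sum()
        if n_inf == 0 or n_sus == 0:
            break
        next_removal = removal_time[infectious].min()
        # Each infective contacts each susceptible at rate beta.
        t_candidate = t + rng.exponential(1.0 / (beta * n_inf * n_sus))
        if t_candidate < next_removal:
            t = t_candidate
            j = rng.choice(np.flatnonzero(susceptible))
            infection_time[j] = t
            removal_time[j] = t + draw_infectious_period(rng, 1)[0]
        else:
            # A removal happens first; the infection clock is redrawn.
            t = next_removal
    return infection_time, removal_time
```

For example, `simulate_sir(50, 0.05, lambda rng, n: rng.gamma(shape=2.0, scale=1.0, size=n))` uses a Gamma infectious period with shape 2 and rate 1; an exponential period would correspond to the Markovian special case.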
Infectious Period & Generalisations of the GSE
• The infectious periods of different infectives are assumed to
be independent and identically distributed according to the
distribution of a random variable D, ...
• ... which can have any arbitrary but specified distribution.
• The special (Markovian) case where the infectious period
follows an Exponential distribution is known as the General
Stochastic Epidemic (GSE).
Although assuming an Exponential infectious period is
mathematically convenient (limit results etc.), it is not biologically
motivated.
This has led to various generalisations of this particular model
(infectious period, SEIR, βij . . .)
A Heterogeneously Mixing Stochastic Epidemic Model
Model Specification
Infection Rate:
βij := β0 · h(i, j)    (1)
Infectious Period:
Di ∼ Ga(α, γ)    (2)
The parameters of interest are β0, γ > 0 (α is assumed to be
known).
The (deterministic) function h(i, j) depends on the characteristics
of individuals i and j and may involve additional parameters.
Inference
GOAL: Draw inference for the parameters (β0 , γ)
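Likelihood:
\[
f(I, R \mid \beta_0, \gamma) \propto
\prod_{i=1,\, i \neq k}^{n_I} \Bigl( \sum_{j \in Y_i} \beta_{ji} \Bigr)
\times \exp\Bigl\{ -\int_{I_k}^{T} \sum_{i \in I_{t^-}} \sum_{j \in S_{t^-}} \beta_{ij} \, dt \Bigr\}
\times \prod_{i=1}^{n_I} f_D(R_i - I_i)
\]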
where
• Yi = {j : Ij < Ii < Rj }
• k denotes the initial infective.
• It− and St− denote the sets of infectives and susceptibles just
prior to time t.
• I = (I1, . . . , InI) and R = (R1, . . . , RnR)
• fD(·) is the density of the infectious period distribution.
Bayesian Inference for Stochastic Epidemics
In general, inference for disease outbreak data is hard and often
requires problem-specific methodology:
• Incompleteness
• Actual process not observed (e.g. infection times),
• Undetected colonisation (e.g. imperfect sensitivity).
• Dependence
• Correlation between unobserved data and model parameters
can cause mixing problems for the MCMC.
• Dimensionality
• d1 = 2 (model parameters)
• d2 (missing data - infection times).
MCMC Algorithms for Stochastic Epidemics
Gibson and Renshaw (1998) & Roberts and O’Neill (1999) were
the first to apply MCMC methods for stochastic epidemics.
Outline of the MCMC (centered) algorithm:
1. Initialise the algorithm;
2. Choose uniformly one (or more) infection time(s)
Ij ,j = 1, . . . , nI and update it (them) individually
using a Metropolis Hastings step by proposing a
replacement infection time;
3. Update the parameters β0 and γ via a Gibbs step.
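Step 3 can be sketched as follows, under the assumption (not stated in the slides) that β0 and γ are given independent Gamma priors with shape and rate parameters; the likelihood is then conjugate in both parameters. The function name, the prior defaults and the matrix `h` of values h(i, j) are hypothetical, and never-infected individuals are coded with infinite infection and removal times.

```python
import numpy as np

def gibbs_update_rates(I, R, h, alpha, rng,
                       prior_beta=(1.0, 1e-3), prior_gamma=(1.0, 1e-3)):
    """Draw (beta0, gamma) from their full conditionals given I and R.

    Assumes Gamma(shape, rate) priors on beta0 and gamma (an assumption).
    I, R have length N; never-infected individuals have I = R = inf.
    """
    infected = np.isfinite(I)
    n_inf = infected.sum()
    Ii = I[infected][:, None]
    Ri = R[infected][:, None]
    # Time for which i exerts infectious pressure on j.
    exposure = np.minimum(Ri, I[None, :]) - np.minimum(Ii, I[None, :])
    pressure = np.sum(h[infected, :] * exposure)
    total_infectious_time = np.sum(R[infected] - I[infected])
    # numpy's gamma sampler uses shape and scale = 1 / rate.
    beta0 = rng.gamma(prior_beta[0] + n_inf - 1,
                      1.0 / (prior_beta[1] + pressure))
    gamma = rng.gamma(prior_gamma[0] + alpha * n_inf,
                      1.0 / (prior_gamma[1] + total_infectious_time))
    return beta0, gamma
```

The shape for β0 increases by nI − 1 rather than nI because the initial infective contributes no infection term to the likelihood.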
MCMC Implementation
Terminology:
Random Scan (RS): At each MCMC iteration we update only one
of the infection times (chosen at random).
δ% Deterministic Scan (δ% DS): At each MCMC iteration, we
choose (at random) to update δ% of the infection times.
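The two scanning schemes differ only in how many infection times are selected per iteration. A minimal sketch, where the function name and the convention `delta=None` for random scan are illustrative assumptions:

```python
import numpy as np

def choose_update_indices(n_infected, delta, rng):
    """Indices of the infection times to update in one MCMC iteration.

    delta=None gives random scan (one index); otherwise delta is a
    percentage and delta% of the indices are chosen without replacement.
    """
    if delta is None:
        return rng.integers(n_infected, size=1)
    k = max(1, int(round(delta / 100.0 * n_infected)))
    return rng.choice(n_infected, size=k, replace=False)
```

With `delta=100` every infection time is updated once per iteration, in random order.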
Computation:
The integral involved in the likelihood can be rewritten as follows:
\[
\int_{I_k}^{T} \sum_{i \in I_{t^-}} \sum_{j \in S_{t^-}} \beta_{ij} \, dt
= \sum_{i=1}^{n_I} \sum_{j=1}^{N} \beta_{ij} \, (R_i \wedge I_j - I_i \wedge I_j).
\]
This significantly reduces the computational cost of evaluating the
likelihood.
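The identity above can be checked numerically on a toy example. The sketch below assumes the convention that individuals who are never infected have Ij = ∞ (so that Ri ∧ Ij = Ri); the function names and the toy data are illustrative only. The brute-force version uses the fact that the total infection pressure is piecewise constant between events.

```python
import numpy as np

def pressure_closed_form(beta, I, R):
    """Double sum of beta_ij * (min(R_i, I_j) - min(I_i, I_j))."""
    infected = np.isfinite(I)
    Ii = I[infected][:, None]
    Ri = R[infected][:, None]
    exposure = np.minimum(Ri, I[None, :]) - np.minimum(Ii, I[None, :])
    return float(np.sum(beta[infected, :] * exposure))

def pressure_bruteforce(beta, I, R):
    """Integrate the piecewise-constant pressure between event times."""
    infected = np.isfinite(I)
    events = np.unique(np.concatenate([I[infected], R[infected]]))
    total = 0.0
    for a, b in zip(events[:-1], events[1:]):
        mid = 0.5 * (a + b)
        infectious = infected & (I < mid) & (R > mid)
        susceptible = I > mid
        total += beta[np.ix_(infectious, susceptible)].sum() * (b - a)
    return total

# Toy data: three infected individuals, two never infected.
I = np.array([0.0, 1.0, 2.5, np.inf, np.inf])
R = np.array([2.0, 3.0, 4.0, np.inf, np.inf])
beta = np.random.default_rng(0).uniform(0.1, 1.0, size=(5, 5))
print(pressure_closed_form(beta, I, R), pressure_bruteforce(beta, I, R))
```

The closed form needs a single vectorised pass over an nI × N array, whereas the brute-force version loops over every interval between consecutive events.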
Remarks
− (Efficient) block updating of the infection times is very hard,
especially when the size of the data (nI, N) is large.
− A Gibbs step for the infection times is possible, but difficult to
construct.
− The independence sampler
Ri − Ii ∼ Ga(α, γ)
seems to be the most efficient proposal among the previously
mentioned samplers.
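A minimal sketch of one such independence-sampler update is given below. It assumes the target is π(I, β0, γ | R), so that the proposal density fD(Ri − Ii) cancels with the matching likelihood term and only the infection part of the likelihood enters the acceptance ratio. The callable `log_infection_lik`, which should return the log of the product and exponential terms involving β, is a hypothetical user-supplied function.

```python
import numpy as np

def update_infection_time(i, I, R, alpha, gamma, log_infection_lik, rng):
    """Metropolis-Hastings update of I[i] with proposal R[i] - Ga(alpha, gamma).

    gamma is a rate parameter. Returns the (possibly new) vector of
    infection times and a flag indicating acceptance.
    """
    proposal = I.copy()
    proposal[i] = R[i] - rng.gamma(alpha, 1.0 / gamma)
    # The f_D terms cancel against the proposal density.
    log_accept = log_infection_lik(proposal, R) - log_infection_lik(I, R)
    if np.log(rng.uniform()) < log_accept:
        return proposal, True
    return I, False
```

In practice `log_infection_lik` should return `-np.inf` for configurations in which some infected individual has no possible infector, so that such proposals are always rejected.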
Random Scan vs Deterministic Scan
Updating the infection times I
[Figures: autocorrelation (ACF) against lag (0 to 200) for the infection times I under the centered algorithm [C], one panel each for RS, 10% DS, 50% DS and 100% DS.]
Comments
• Ignoring the CPU time needed to run an algorithm, 100%
Deterministic Scan performs better than any other chosen
algorithm (i.e. δ < 100).
• In general, increasing δ improves the mixing of the MCMC
algorithm, but it is more computationally costly (slower
in CPU time).
• It is of interest to quantify the optimal percentage of
deterministic scan, taking into account mixing and CPU time.
• The optimal δ will depend on the size of the epidemic (nI, N).
This is easy to do computationally but hard to derive theoretically.
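One common way to trade mixing off against CPU time is to compare the effective sample size (ESS) per second of the competing scans; this particular criterion is an assumption here, not something stated in the talk. The sketch below estimates the ACF of a chain and an ESS, truncating the autocorrelation sum at the first non-positive estimate (a simple heuristic).

```python
import numpy as np

def autocorrelation(chain, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = np.ones(max_lag + 1)
    for k in range(1, max_lag + 1):
        acf[k] = np.dot(x[:-k], x[k:]) / denom
    return acf

def effective_sample_size(chain, max_lag=200):
    """ESS = n / (1 + 2 * sum of the leading positive autocorrelations)."""
    rho = autocorrelation(chain, max_lag)[1:]
    nonpos = np.flatnonzero(rho <= 0)
    cut = nonpos[0] if nonpos.size else rho.size
    tau = 1.0 + 2.0 * rho[:cut].sum()
    return len(chain) / tau
```

Dividing `effective_sample_size(chain)` by the measured run time for each value of δ gives a simple computational comparison of the scans.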
Alternative Target Distributions − Any better?
• Due to the conditional independence of β0 and γ (given the
infection and removal times), β0 or γ (or even both) could be
integrated out.
• It turns out that if δ is chosen to be relatively small, the
efficiency is much the same whichever target distribution is used.
• However, if we increase δ, and in particular if we choose a
100% DS, integrating out both model parameters (i.e. β0 and γ)
and choosing as target distribution
π(I|R)
seems to increase (slightly) the algorithm’s efficiency
compared to the standard target distribution π(I, β0, γ|R).
A Working Example
Model’s Equations
βij = β0 exp{−ρ(i, j)}
Ri − Ii ∼ Gamma(α, γ)
N = 501 (1 initial infective)
where ρ(i, j) denotes the distance between individuals i and j.
Simulated Datasets
Dataset   True β0   True α   True γ   nI = nR
D1        0.0025    0.5      0.25     284
D2        0.0025    2        1        275
D3        0.0025    5        2.5      286
A Working Example (cont.)
Distributions of the Infectious Periods
[Figure: density f(x) of the infectious period for datasets D1, D2 and D3, for x from 0 to 5.]
Mixing of the MCMC Algorithms
[Figures: ACF against lag (0 to 200) for γ, one panel per dataset: Dataset 1 with (α, γ) = (0.5, 0.25), Dataset 2 with (α, γ) = (2, 1), Dataset 3 with (α, γ) = (5, 2.5).]
Reasons for Poor Mixing
Consider the part of the likelihood that describes the infectious
period of each individual:
Ri − Ii ∼ Ga(α, γ), for i = 1, . . . , nI,
which implies
\[
\sum_{i=1}^{n_I} (R_i - I_i) \sim \mathrm{Ga}(\alpha n_I, \gamma).
\]
This reveals that γ and I are a priori dependent, which causes
problems for the mixing of the MCMC algorithm.
Reasons for Poor Mixing (cont.)
Thus, for large nI or α, the parameter γ and the sum of the
infectious periods \sum_{i=1}^{n_I} (R_i - I_i) are a priori heavily dependent.
If these two were the parameters of interest, then this a priori
correlation would cause mixing problems in the case of a
two-state Gibbs sampler [see Amit (1991), Roberts and Sahu
(1997)].
Things are more complicated in practice, since the MCMC schemes
used so far involve deterministic-scan updates of each of the
infection times Ii, i = 1, . . . , nI.
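The strength of this dependence can be made concrete with a short calculation, treating γ as a rate parameter (which is consistent with the Gamma sum stated for the infectious periods). Writing S for the sum of the infectious periods:

```latex
\[
S = \sum_{i=1}^{n_I} (R_i - I_i) \mid \gamma \sim \mathrm{Ga}(\alpha n_I, \gamma)
\quad\Longrightarrow\quad
E[S \mid \gamma] = \frac{\alpha n_I}{\gamma},
\qquad
\frac{\sqrt{\mathrm{Var}(S \mid \gamma)}}{E[S \mid \gamma]} = \frac{1}{\sqrt{\alpha n_I}} .
\]
```

Given γ, the sum S is therefore concentrated around αnI/γ with relative spread 1/√(αnI). For the three simulated datasets, αnI = 142, 550 and 1430, giving relative spreads of roughly 0.084, 0.043 and 0.026 respectively, so γ and S are most tightly tied together for D3.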
Correlation between Infection times and γ
[Figure: scatter plots of γ against I, one panel each for Dataset 1, Dataset 2 and Dataset 3.]
Non−Centered Parameterisations (NCP)
The non-centered methodology presented in Papaspiliopoulos et
al. (2003) suggests that appropriate non−centered
parameterisations outperform the existing centered
algorithms over a range of examples.
Their findings can be summarized as follows:
When the observed (missing) data are much more
informative about the parameter of interest than the
missing (observed) data, it is better to choose a centered
(non−centered) parameterisation.
NCP for Stochastic Epidemics
We introduce a non−centered parameterisation:
Ui = γ(Ri − Ii ), i = 1, . . . , nI
It is easy to see that, since Ui ∼ Ga(α, 1), i = 1, . . . , nI, the random
variables U = (U1, . . . , UnI)T and γ are a priori independent.
Change in variables:
(I, β0 , γ, R) → (U, β0 , γ, R).
Implementation
1. Get a sample of (γ, I) via a centered algorithm;
2. Transform the I’s to U’s via
Ui = γ(Ri − Ii);
3. Update γ with a Metropolis-Hastings step
targeting π(γ|U, β0, R).
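The three steps above can be sketched as a single update function. This is a sketch under stated assumptions rather than the implementation used in the talk: γ is proposed by a random walk on the log scale, and `log_post_centered(gamma, I)` is a hypothetical user-supplied function returning the log of the centered posterior density (with β0 and R held fixed) up to a constant. Holding U fixed means that the infection times move with γ through Ii = Ri − Ui/γ, and the change of variables contributes a Jacobian factor γ^(−nI).

```python
import numpy as np

def ncp_update_gamma(gamma, I, R, log_post_centered, rng, step=0.1):
    """Update gamma with U_i = gamma * (R_i - I_i) held fixed.

    I and R contain the times of the infected individuals only.
    Returns the new gamma, the implied infection times and an accept flag.
    """
    n_inf = I.size
    U = gamma * (R - I)
    gamma_new = gamma * np.exp(step * rng.normal())
    I_new = R - U / gamma_new  # infection times implied by (U, gamma_new)
    # Density in (U, gamma) space = centered density * |dI/dU| = ... * gamma^(-n_inf)
    log_old = log_post_centered(gamma, I) - n_inf * np.log(gamma)
    log_new = log_post_centered(gamma_new, I_new) - n_inf * np.log(gamma_new)
    # log(gamma_new / gamma) corrects for proposing on the log scale.
    log_accept = log_new - log_old + np.log(gamma_new / gamma)
    if np.log(rng.uniform()) < log_accept:
        return gamma_new, I_new, True
    return gamma, I, False
```

As with any update of the infection times, `log_post_centered` should return `-np.inf` when the implied infection times are not a valid epidemic, so that such moves are rejected.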
(P)NCP for Stochastic Epidemics (2)
1. Draw samples of (γ, I) via MCMC on π(γ, I|R) → update γ
using random-walk Metropolis (RWM) (Neal and Roberts, 2005)
2. Draw samples of (γ, I) via the most efficient standard
MCMC algorithm (e.g. on π(I|R)) → update γ using various
independence MH samplers (Kypraios, 2007)
Partially Non−Centered Parameterisations
The set of infected individuals in the epidemic is partitioned
into two groups, C and U (individuals are assigned at random with
probability µ).
For those individuals in U, let
Ui = γ(Ri − Ii) (i ∈ U)
Results - Dataset D1
[Figure: ACF against lag (0 to 400) for the centered [C] and partially non-centered [PNC] algorithms.]
Results - Dataset D2
[Figure: ACF against lag (0 to 800) for the centered [C] and partially non-centered [PNC] algorithms.]
Results - Dataset D3
[Figure: ACF against lag (0 to 1000) for the centered [C] and partially non-centered [PNC] algorithms.]
Bayesian Model Choice in
Stochastic Epidemics
Model Choice (1)
Motivated by applications in healthcare-associated infections
(MRSA etc.), various models have been proposed (Kypraios et al.,
2009):
1. M1 : λj = β0
2. M2 : λj = β0 + β1 nC + β2 nQ
3. M3 : λj = β0 + β1 I(nC > 0) + β2 I(nQ > 0)
Question: Is there evidence to support the assumption of linear
colonisation pressure?
• Bayesian Model Choice.
• Posterior Model Probabilities - Bayes Factors
• Within-model prior distributions and Lindley’s paradox:
prior matching & prior sensitivity
• RJMCMC
Matched Prior Distributions
The idea is to match the prior distributions for the two models by
trying to make the pressure experienced by a susceptible individual
similar in both models.
Assign a prior distribution to β1F say, and then derive the prior
distribution for β1S by matching the moments of the prior
distributions:
E [λF ] = E [λS ]
V [λF ] = V [λS ]
Easy to derive the “matched priors” when assuming Exponential
priors.
Computing Marginal Likelihoods?
{Work in progress with Tony Pettitt and Nial Friel}
When the number of models to explore is not too large, it is worth
seeking alternatives to RJMCMC.
The idea of power posteriors (Friel and Pettitt, 2008) seems very
promising when infection times are assumed to be known:
\[
\log \pi(y) = \int_0^1 E_{\theta \mid y, t}\bigl[\log L(y \mid \theta)\bigr] \, dt,
\]
where π(θ|y, t) ∝ L(y|θ)^t π(θ), i.e. the power posterior.
Significant improvements have been made to the original idea of
power posteriors, but more work is still required when infection
times are treated as unknown.
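The identity can be illustrated on a toy model where the answer is known exactly. The sketch below is not an epidemic model: it assumes y_i ∼ N(θ, 1) with prior θ ∼ N(0, τ²), for which the power posterior is Gaussian and can be sampled directly, and it approximates the integral over t with the trapezium rule on the temperature ladder t_k = (k/K)^5. The model, the ladder and all function names are illustrative assumptions.

```python
import numpy as np

def log_marginal_exact(y, tau2):
    """Exact log marginal likelihood for y_i ~ N(theta, 1), theta ~ N(0, tau2)."""
    n, s = y.size, y.sum()
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log1p(n * tau2)
            - 0.5 * (np.sum(y ** 2) - tau2 * s ** 2 / (1 + n * tau2)))

def log_marginal_power_posterior(y, tau2, n_temps=50, n_draws=2000,
                                 power=5, seed=0):
    """Estimate log pi(y) by integrating E[log L] over temperatures in [0, 1]."""
    rng = np.random.default_rng(seed)
    n = y.size
    temps = (np.arange(n_temps + 1) / n_temps) ** power
    expected_loglik = np.empty(temps.size)
    for k, t in enumerate(temps):
        # Power posterior L^t * prior is Gaussian in this conjugate model.
        precision = t * n + 1.0 / tau2
        mean = t * y.sum() / precision
        theta = rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)
        sq = ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
        expected_loglik[k] = np.mean(-0.5 * n * np.log(2 * np.pi) - 0.5 * sq)
    # Trapezium rule over the temperature ladder.
    widths = np.diff(temps)
    heights = 0.5 * (expected_loglik[1:] + expected_loglik[:-1])
    return float(np.sum(widths * heights))
```

In an epidemic model the direct draws of θ would be replaced by an MCMC run at each temperature, which is where unknown infection times make the approach harder.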
Conclusions
• Although “easy” to implement, standard (centered) MCMC
algorithms for SIR-type epidemic models perform poorly as the
number of infected individuals (nI) increases.
• Non−Centered Parameterisations often offer much more
robust algorithms, at relatively little extra implementation
cost.
• “Matched priors” in the model choice context offer an easy
way of avoiding Lindley’s paradox.
• Power posteriors could be used to numerically evaluate
marginal likelihoods in the epidemic modelling context.