Bayesian Inference and Model Choice for Nonlinear Stochastic Processes: Applications to stochastic epidemic modelling

Theo Kypraios
http://www.maths.nott.ac.uk/~tk
School of Mathematical Sciences, University of Nottingham

EPSRC Symposium Workshop on Markov Chain-Monte Carlo, Warwick, March 2009

Joint work with:
Gareth O. Roberts @ University of Warwick
Phil O’Neill @ University of Nottingham
Ben Cooper @ Health Protection Agency

Background

Understanding how an infectious disease spreads is important for preventing major outbreaks.
In general, inference problems for disease outbreak data are complicated by the facts that
• the data are inherently dependent and
• the data are usually incomplete, in the sense that the actual process of infection is not observed.
However, it is often possible to formulate simple stochastic models which describe the key features of epidemic spread.

Outline
• Bayesian inference for partially observed stochastic epidemics
• Efficient Markov Chain Monte Carlo algorithms (including Non-Centered reparameterisations)
• Bayesian Model Choice using RJMCMC and alternative methods
• Illustrative simulation study

The SIR Model

Principles
− Consider a closed population (no births/deaths) of N individuals,
− α initially infected individuals;
− at any given time point, each individual i is in one of three states:
  • Susceptible
  • Infected
  • Removed
S → I → R

The Stochastic Process
• Each infected individual remains infectious for a period distributed according to the distribution of a random variable D.
• While infectious, an individual makes contacts with each of the N individuals in the population at times given by the points of a Poisson process of rate β > 0.
• Any such contact with a susceptible individual immediately makes that susceptible an infective.
• At the end of its infectious period an infective no longer makes any contacts and is said to be removed.
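The process described above can be simulated directly. The following is a minimal sketch, not code from the talk: the names `simulate_sir` and `gamma_period`, the parameter values, and the choice of a Gamma infectious period with shape `alpha` and rate `gamma` are all illustrative assumptions. It relies on the fact that, between events, infections occur at the constant total rate β × (number of susceptibles) × (number of infectives), so the infection clock can be redrawn whenever a removal happens first.

```python
import heapq
import random


def simulate_sir(n, beta, draw_period, n_initial=1, seed=None):
    """Simulate one SIR epidemic; return per-case infection and removal times."""
    rng = random.Random(seed)
    t = 0.0
    susceptible = n - n_initial
    pending = []            # min-heap of removal times of current infectives
    infection_times = []
    removal_times = []

    def infect(now):
        # A new infective draws its infectious period D and schedules its removal.
        removal = now + draw_period(rng)
        heapq.heappush(pending, removal)
        infection_times.append(now)
        removal_times.append(removal)

    for _ in range(n_initial):
        infect(t)

    while pending:
        # Each infective contacts each susceptible at rate beta.
        rate = beta * susceptible * len(pending)
        wait = rng.expovariate(rate) if rate > 0 else float("inf")
        if t + wait < pending[0]:
            t += wait
            susceptible -= 1
            infect(t)
        else:
            # A removal comes first; by memorylessness the infection clock is redrawn.
            t = heapq.heappop(pending)
    return infection_times, removal_times


def gamma_period(rng, alpha=2.0, gamma=1.0):
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return rng.gammavariate(alpha, 1.0 / gamma)


# Hypothetical parameter values, chosen only to produce a small example run.
infection_times, removal_times = simulate_sir(
    n=200, beta=0.01, draw_period=gamma_period, seed=2009
)
print(len(infection_times), "individuals were ever infected")
```

Passing the infectious-period sampler as a function keeps the sketch agnostic about the distribution of D, in line with the "arbitrary but specified" assumption of the model.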
Infectious Period & Generalisations of the GSE
• The infectious periods of different infectives are assumed to be independent and identically distributed according to the distribution of a random variable D, ...
• ... which can have any arbitrary but specified distribution.
• The special (Markovian) case where the infectious period follows an Exponential distribution is known as the General Stochastic Epidemic (GSE).
Although assuming an Exponential infectious period is mathematically convenient (limit results etc.), it is not biologically motivated. This led to various generalisations of this particular model (infectious period, SEIR, β_ij, ...).

A Heterogeneously Mixing Stochastic Epidemic Model

Model Specification
Infection rate:
\[ \beta_{ij} := \beta_0 \cdot h(i, j) \tag{1} \]
Infectious period:
\[ D_i \sim \mathrm{Ga}(\alpha, \gamma) \tag{2} \]
The parameters of interest are β_0, γ > 0 (α is assumed to be known). The (deterministic) function h(i, j) depends on the characteristics of individuals i and j and might involve some other parameters as well.

Inference
GOAL: Draw inference for the parameters (β_0, γ).
Likelihood:
\[ f(I, R \mid \beta_0, \gamma) \propto \prod_{i=1,\, i \neq k}^{n_I} \Big( \sum_{j \in Y_i} \beta_{ji} \Big) \times \exp\Big( - \int_{I_k}^{T} \sum_{i \in I_{t^-}} \sum_{j \in S_{t^-}} \beta_{ij} \, dt \Big) \times \prod_{i=1}^{n_I} f_D(R_i - I_i) \]
where
• Y_i = {j : I_j < I_i < R_j}
• k denotes the initial infective.
• I_{t−} and S_{t−} denote the sets of infectives and susceptibles just prior to time t.
• I = (I_1, ..., I_{n_I}) and R = (R_1, ..., R_{n_R})
• f_D(·) is the density of the infectious period distribution.

Bayesian Inference for Stochastic Epidemics
In general, inference for disease outbreak data is hard and often requires problem-specific methodology:
• Incompleteness
  • Actual process not observed (e.g. infection times),
  • Undetected colonisation (e.g. imperfect sensitivity).
• Dependence
  • Correlation between unobserved data and model parameters can cause mixing problems for the MCMC.
• Dimensionality
  • d_1 = 2 (model parameters)
  • d_2 (missing data: infection times).
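To make the likelihood f(I, R | β_0, γ) given above concrete, here is a minimal sketch of how its logarithm could be evaluated for known infection and removal times. It is an illustration under stated assumptions, not the implementation used in the talk: the function name `log_likelihood` is hypothetical, the epidemic is assumed to be observed until it has finished (T after the last removal), never-infected individuals are coded with an infinite infection time, f_D is taken to be the Gamma density with shape α and rate γ, and `beta[i][j]` holds the rate β_ij. The integral is computed pair by pair as the length of time during which i is infectious while j is still susceptible.

```python
import math


def log_likelihood(inf_times, rem_times, beta, alpha, gamma):
    """Log-likelihood (up to a constant) of infection/removal times.

    inf_times[j] is math.inf for individuals who were never infected.
    beta[i][j] is the infection rate from individual i to individual j.
    """
    n = len(inf_times)
    infected = [i for i in range(n) if math.isfinite(inf_times[i])]
    k = min(infected, key=lambda i: inf_times[i])    # initial infective

    loglik = 0.0
    # Product over non-initial cases of the pressure acting at infection.
    for i in infected:
        if i == k:
            continue
        pressure = sum(
            beta[j][i]
            for j in infected
            if inf_times[j] < inf_times[i] < rem_times[j]
        )
        if pressure <= 0:
            return -math.inf                         # impossible configuration
        loglik += math.log(pressure)

    # Integrated infectious pressure: time i is infectious while j is susceptible.
    total = 0.0
    for i in infected:
        for j in range(n):
            exposure = min(rem_times[i], inf_times[j]) - min(inf_times[j], inf_times[i])
            total += beta[i][j] * exposure
    loglik -= total

    # Gamma(shape alpha, rate gamma) log-density of each infectious period.
    for i in infected:
        d = rem_times[i] - inf_times[i]
        loglik += (alpha * math.log(gamma) - math.lgamma(alpha)
                   + (alpha - 1) * math.log(d) - gamma * d)
    return loglik


# Hypothetical toy data: three individuals, the third is never infected.
toy_inf = [0.0, 1.0, math.inf]
toy_rem = [2.0, 3.0, math.inf]
toy_beta = [[0.5] * 3 for _ in range(3)]
toy_value = log_likelihood(toy_inf, toy_rem, toy_beta, alpha=1.0, gamma=1.0)
print(toy_value)
```

The double loop costs O(n_I × N) per evaluation, which is why avoiding repeated full evaluations matters once infection times are updated many times per MCMC iteration.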
MCMC Algorithms for Stochastic Epidemics
Gibson and Renshaw (1998) and Roberts and O’Neill (1999) were the first to apply MCMC methods to stochastic epidemics.
Outline of the (centered) MCMC algorithm:
1. Initialise the algorithm;
2. Choose uniformly at random one (or more) infection time(s) I_j, j = 1, ..., n_I, and update it (them) individually using a Metropolis-Hastings step by proposing a replacement infection time;
3. Update the parameters β_0 and γ via a Gibbs step.

MCMC Implementation
Terminology:
Random Scan (RS): At each MCMC iteration we update only one of the infection times (chosen at random).
δ% Deterministic Scan (δ% DS): At each MCMC iteration, we choose (at random) δ% of the infection times to update.
Computation: The integral involved in the likelihood can be rewritten as follows:
\[ \int_{I_k}^{T} \sum_{i \in I_{t^-}} \sum_{j \in S_{t^-}} \beta_{ij} \, dt = \sum_{i=1}^{n_I} \sum_{j=1}^{N} \beta_{ij} \, (R_i \wedge I_j - I_j \wedge I_i). \]
This reduces the computational cost of evaluating the likelihood significantly.

Remarks
− (Efficient) block updating of the infection times is very hard, especially when the size of the data (n_I, N) is big.
− A Gibbs step for the infection times is possible, but difficult to construct.
− The independence sampler R_i − I_i ∼ Ga(α, γ) seems to be the most efficient proposal among the previously mentioned samplers.

Random Scan vs Deterministic Scan
Updating the infection times I
[Figures: ACF plots (lag 0 to 200) for the infection times I under the centered algorithm [C], one panel each for RS, 10% DS, 50% DS and 100% DS.]

Comments
• Ignoring the CPU time needed to run an algorithm, the 100% Deterministic Scan performs better than any other choice (i.e. δ < 1).
• In general, increasing δ improves the mixing of the MCMC algorithm, but at a higher computational cost (slower in CPU time).
• It is of interest to quantify the optimal percentage of deterministic scan, taking into account both mixing and CPU time.
• The optimal δ will depend on the size of the epidemic (n_I, N). Easy to do computationally, hard to derive theoretically.

Alternative Target Distributions: Any better?
• Due to the conditional independence of β_0 and γ (given the infection and removal times), β_0 or γ (or even both) can be integrated out.
• It turns out that if δ is chosen to be relatively small, very similar results (regarding efficiency) are obtained.
• However, if we increase δ, and in particular choose a 100% DS, integrating out both model parameters (i.e. β_0 and γ) and choosing π(I|R) as the target distribution seems to increase (slightly) the algorithm’s efficiency compared to the standard target distribution π(I, β_0, γ|R).

A Working Example
Model’s Equations
\[ \beta_{ij} = \beta_0 \exp\{-\rho(i, j)\}, \qquad R_i - I_i \sim \mathrm{Gamma}(\alpha, \gamma), \qquad N = 501 \ \text{(1 initially infective)} \]
where ρ(i, j) denotes the distance between individuals i and j.

Simulated Datasets
Dataset   True β_0   True α   True γ   n_I = n_R
D1        0.0025     0.5      0.25     284
D2        0.0025     2        1        275
D3        0.0025     5        2.5      286

A Working Example (cont.)
[Figure: Distributions of the Infectious Periods; density f(x) for x from 0 to 5, one curve each for D1, D2 and D3.]

Mixing of the MCMC Algorithm
[Figures: ACF plots for γ (lag 0 to 200), one panel per dataset: Dataset 1 with (α, γ) = (0.5, 0.25), Dataset 2 with (α, γ) = (2, 1), Dataset 3 with (α, γ) = (5, 2.5).]

Reasons for Poor Mixing
Consider the part of the likelihood which describes the infectious period of each individual: R_i − I_i ∼ Ga(α, γ).
This reveals that, a priori, γ and I are dependent, and this causes problems for the MCMC mixing:
\[ R_i - I_i \sim \mathrm{Ga}(\alpha, \gamma), \quad i = 1, \ldots, n_I \]
\[ \sum_{i=1}^{n_I} (R_i - I_i) \sim \mathrm{Ga}(\alpha n_I, \gamma) \]

Reasons for Poor Mixing (cont.)
Thus for large n_I or α, the parameter γ and the sum of the infectious periods \sum_{i=1}^{n_I} (R_i − I_i) are a priori heavily dependent.
If these two were the parameters of interest, this a priori correlation would cause mixing problems for a two-state Gibbs sampler [see Amit (1991), Roberts and Sahu (1997)].
Things are more complicated in practice, since the MCMC schemes used so far involve deterministic scan updates of each of the infection times I_i, i = 1, ..., n_I.

Correlation between Infection times and γ
[Figures: plots of γ (vertical axis) against I (horizontal axis), one panel each for Dataset 1, Dataset 2 and Dataset 3.]

Non-Centered Parameterisations (NCP)
The non-centered methodology presented in Papaspiliopoulos et al. (2003) suggests that appropriate non-centered parameterisations outperform the existing centered algorithms over a range of examples.
Their findings can be summarised as follows: when the observed (missing) data are much more informative about the parameter of interest than the missing (observed) data, it is better to choose a centered (non-centered) parameterisation.

NCP for Stochastic Epidemics
We introduce a non-centered parameterisation:
\[ U_i = \gamma (R_i - I_i), \quad i = 1, \ldots, n_I \]
It is easy to see that, since U_i ∼ Ga(α, 1), i = 1, ..., n_I, the random variables U = (U_1, ..., U_{n_I})^T and γ are a priori independent.
Change of variables: (I, β_0, γ, R) → (U, β_0, γ, R).

Implementation
1. Get a sample of (γ, I) via a centered algorithm;
2. Transform the I’s to U’s via U_i = γ(R_i − I_i);
3. Update γ using a Metropolis-Hastings step targeting π(γ|U, β_0, R).

(P)NCP for Stochastic Epidemics (2)
1. Draw samples of (γ, I) via MCMC on π(γ, I|R) → update γ using random-walk Metropolis (RwM) (Neal and Roberts, 2005)
2. Draw samples of (γ, I) via MCMC using the most efficient standard algorithm (e.g. on π(I|R)) → update γ using various independence MH samplers (Kypraios 2007)

Partially Non-Centered Parameterisations
The set of infected individuals in the epidemic is partitioned into two groups, C and U (individuals are assigned to the groups at random, with probability µ).
For those individuals in U, let
\[ U_i = \gamma (R_i - I_i), \quad i \in U \]

Results - Datasets D1, D2 and D3
[Figures: ACF plots comparing the centered [C] and partially non-centered [PNC] algorithms, one pair of panels per dataset: D1 (lag 0 to 400), D2 (lag 0 to 800), D3 (lag 0 to 1000).]

Bayesian Model Choice in Stochastic Epidemics

Model Choice (1)
Motivated by applications in healthcare-associated infections (MRSA etc.), various models have been proposed (Kypraios et al. 2009):
1. M_1: λ_j = β_0
2. M_2: λ_j = β_0 + β_1 n_C + β_2 n_Q
3. M_3: λ_j = β_0 + β_1 I(n_C > 0) + β_2 I(n_Q > 0)
Question: Is there evidence to support the assumption of linear colonisation pressure?
• Bayesian Model Choice
• Posterior Model Probabilities - Bayes Factors
• Within-model prior distributions and Lindley’s paradox: Prior Matching & Prior Sensitivity
• RJMCMC

Matched Prior Distributions
The idea is to match the prior distributions for the two models by trying to make the pressure experienced by a susceptible individual similar in both models.
Assign a prior distribution to β_1^F, say, and then derive the prior distribution for β_1^S by matching the moments of the prior distributions:
\[ E[\lambda^F] = E[\lambda^S], \qquad V[\lambda^F] = V[\lambda^S] \]
It is easy to derive the “matched priors” when assuming Exponential priors.

Computing Marginal Likelihoods?
{Work in progress with Tony Pettitt and Nial Friel}
When the number of models to explore is not too big, it is worth seeking alternative methods to RJMCMC.
The idea of power posteriors (Friel and Pettitt, 2008) seems very promising when infection times are assumed to be known.
\[ \log\{\pi(y)\} = \int_0^1 E_{\theta \mid y, t}\big[ \log\{L(y \mid \theta)\} \big] \, dt \]
where π(θ|y, t) ∝ L(y|θ)^t π(θ), i.e. the power posterior.
Significant improvements have been made to the original idea of power posteriors, but more work is still required when infection times are treated as unknown.

Conclusions
• Although “easy” to implement, standard (centered) MCMC algorithms for SIR-type epidemic models perform poorly as the number of infected individuals (n_I) increases.
• Non-Centered Parameterisations often offer much more robust algorithms, and there is relatively little extra cost in implementing such an algorithm.
• “Matched priors” in the model choice context offer an easy way to avoid Lindley’s paradox.
• Power posteriors could be used to numerically evaluate marginal likelihoods in the epidemic modelling context.
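As a check on the power-posterior identity stated before the conclusions, the sketch below evaluates the temperature integral for a toy conjugate model where the marginal likelihood is known in closed form. Everything here is an illustrative assumption rather than part of the talk: the model (y_i ∼ N(θ, 1) with prior θ ∼ N(0, τ²)), the data values, the function names, and the temperature ladder t_i = (i/K)^4. For this model the power posterior is Normal, so the expectation at each temperature is available exactly; in an epidemic model it would instead be estimated from MCMC output at each temperature.

```python
import math


def exact_log_marginal(y, tau2):
    """Closed-form log pi(y) for y_i ~ N(theta, 1), theta ~ N(0, tau2)."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * tau2)
            - 0.5 * (ss - tau2 * s * s / (1 + n * tau2)))


def expected_loglik(y, tau2, t):
    """E[log L(y | theta)] under the power posterior at temperature t."""
    n, s = len(y), sum(y)
    var = 1.0 / (t * n + 1.0 / tau2)     # power-posterior variance of theta
    mean = t * s * var                   # power-posterior mean of theta
    sq = sum((v - mean) ** 2 for v in y) + n * var
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sq


def power_posterior_log_marginal(y, tau2, steps=200, power=4):
    """Trapezoidal approximation of the integral over t in [0, 1]."""
    temps = [(i / steps) ** power for i in range(steps + 1)]
    vals = [expected_loglik(y, tau2, t) for t in temps]
    return sum(
        0.5 * (temps[i + 1] - temps[i]) * (vals[i + 1] + vals[i])
        for i in range(steps)
    )


# Hypothetical data, used only to compare the two computations.
y_obs = [0.3, -1.2, 0.8, 1.5, 0.1]
estimate = power_posterior_log_marginal(y_obs, tau2=1.0)
exact = exact_log_marginal(y_obs, tau2=1.0)
print(estimate, exact)
```

The ladder places most temperatures near t = 0, where the integrand changes fastest as the power posterior moves away from the prior.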