Bayesian inference and model selection for stochastic epidemics and other coupled hidden Markov models
(with special attention to epidemics of Escherichia coli O157:H7 in cattle)

Simon Spencer
3rd May 2016

Acknowledgements
Bärbel Finkenstädt Rand
Peter Neal
TJ McKinley
Nigel French, Tom Besser and Rowland Cobbold
Panayiota Touloupou

Outline
1. Introduction
2. Bayesian inference for epidemics
3. Model selection for epidemics
4. Scalable inference for epidemics
5. Conclusion

Introduction

A typical epidemic model:
Susceptible → Exposed → Infected → Removed
Infections occur according to an inhomogeneous Poisson process with rate \(\propto S(t)I(t)\).

[Figure: A simulation. Numbers of Susceptible, Exposed, Infected and Removed individuals (0 to 100) against time (0 to 100).]

Comments
Statistical inference for epidemic models is hard.
Intractable likelihood: need to know infection times.
Usual solution: large-scale data augmentation MCMC.
What are the observed data?

Epidemic data
Historically: final size (single number).
Final size in many sub-populations, e.g. households.
Markov models: removal times. Who is removed is not needed / recorded.
Individual level diagnostic test results.
To be realistic, tests are imperfect.
Temporal resolution of 1 day.
⇒ View epidemic as hidden Markov model

Motivating example: Escherichia coli O157
E. coli O157 is a highly pathogenic form of Escherichia coli.
It can cause severe gastrointestinal illness, haemorrhagic diarrhoea and even death.
Outbreaks and endemic cases are associated with food, water or direct contact with infected animals.
Cattle are the main reservoir.
Additional economic burden due to impacts on trade.

Study design
Natural colonization and faecal excretion of E. coli O157 in a commercial feedlot.
20 pens, each containing 8 calves, were sampled 27 times over a 99 day period.
Each sampling event included a faecal pat sample and a recto-anal mucosal swab (RAMS).
Tests were assumed to have perfect specificity but imperfect sensitivity.

Patterns of infection
[Figure: Positive tests, Pen 5 (South). Animal (1 to 8) against time (days, 0 to 100); legend: RAMS, Faecal, Negative.]
[Figure: Positive tests, Pen 7 (North). Animal (1 to 8) against time (days, 0 to 100); legend: RAMS, Faecal, Negative.]

Bayesian inference for epidemics

Intractable likelihood: \(\pi(y \mid \theta)\).
Need to impute infection status of individuals \(x\) for augmented likelihood \(\pi(y \mid x, \theta)\).
Missing data \(x\) typically very high dimensional.

Updating the infection status
Standard method by O'Neill and Roberts (1999) involves 3 steps:
1 Add a period of infection
2 Remove a period of infection
3 Move an end-point of a period of infection
This method was designed for SIR models (where individuals can't be infected twice).
Easily adapted to discrete time models.

Add a period of infection
Current: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Propose: 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
1 Choose a block of zeros at random.
2 Propose changing zeros to ones.
3 Accept or reject based on ratio of posteriors.

Remove a period of infection
Current: 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
Propose: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Choose a complete block of ones.
2 Propose changing ones to zeros.
3 Accept or reject based on ratio of posteriors.

Move an endpoint
Current: 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
Propose: 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
1 Choose an endpoint of a block of ones.
2 Propose a new location for that endpoint.
3 Accept or reject based on ratio of posteriors.

Some pros and cons
Pros:
  Computationally fast
  Can handle non-Markov models
Cons:
  Most of the hidden states are not updated
  High degree of autocorrelation: slow mixing of the chain and long run length
  Tuning of the maximum block length required

Alternative approach: FFBS
Discrete time epidemic is a hidden Markov model.
Gibbs step: sample from the full conditional distribution of the hidden states.
Use Forward Filtering Backward Sampling algorithm (Carter and Kohn, 1994).

Some pros and cons
Pros:
  Very good mixing of the MCMC chains
  No tuning required
Cons:
  Computationally intensive: at each timepoint we need to calculate \(N^C\) summations, giving \(O(T N^{2C})\)
  High memory requirements: all \(T\) forward variables must be stored, and the transition matrix is of dimension \(N^C \times N^C\)
N = number of infection states (e.g. 2)
C = number of cows (e.g. 8)
T = number of timepoints (e.g. 99)

Example: SIS model
Stochastic SIS (Susceptible-Infected-Susceptible) transmission model in discrete time [1].
\(X_{p,i,t}\): infection status for animal \(i\) in pen \(p\) on day \(t\).
\(X_{p,i,t} = 1\): infected/colonized.
\(X_{p,i,t} = 0\): uninfected/susceptible.
We treat \(X_{p,i,t}\) as missing data and infer it using MCMC.
Epidemic model parameters updated via Metropolis-Hastings and test sensitivities updated using Gibbs.
[1] Spencer et al. (2015) 'Super' or just 'above average'? Supershedders and the transmission of Escherichia coli O157:H7 among feedlot cattle. Interface 12, 20150446.

Colonization probability:
\[ P(X_{p,i,t+1} = 1 \mid X_{p,i,t} = 0) = 1 - \exp\Big( -\alpha - \beta \sum_{j=1}^{8} X_{p,j,t}\, \rho^{\mathbb{I}(S_{p,j,t} > \tau)} \Big) \]
Susceptible: \(X_{p,i,t} = 0\)
Colonized: \(X_{p,i,t} = 1\)
Colonization duration: NegativeBinomial(\(r, \mu\))
Pens: \(p = 1, \dots, 20\)
Animals: \(i = 1, \dots, 8\)
Time: \(t = 1, \dots, 99\) days

Example: Posterior infection probabilities
We can calculate the posterior infection probability for every day of the study.
[Figure: Pen 5, Animal 5. Posterior colonization probability (0 to 1) against time (days, 0 to 100); legend: RAMS, Faecal.]
[Figure: Pen 5 and Pen 8. Animal (1 to 8) against time (days, 0 to 100).]

Model selection for epidemics

Many epidemiologically interesting questions take the form of model selection questions.
What is the transmission mechanism of this disease?
Do infected individuals really exhibit an exposed period?
Do water troughs spread E. coli O157?

Posterior probabilities and marginal likelihoods
We would like the posterior probability in favour of model \(i\).
\[ P(M_i \mid y) = \frac{\pi(y \mid M_i)\, P(M_i)}{\sum_j \pi(y \mid M_j)\, P(M_j)} \]
Equivalently, the Bayes factor comparing models \(i\) and \(j\):
\[ B_{ij} = \frac{\pi(y \mid M_i)}{\pi(y \mid M_j)} \]
All we need is the marginal likelihood,
\[ \pi(y \mid M_i) = \int \pi(y \mid \theta, M_i)\, \pi(\theta \mid M_i)\, d\theta \]
but how can we calculate it?

Marginal likelihood estimation
Many existing approaches:
  Chib's method
  Power posteriors
  Harmonic mean
  Bridge sampling
Most direct approach: importance sampling.
Use asymptotic normality of the posterior to find efficient proposal.
But how to deal with the missing data?
Dr Peter Neal

Marginal likelihood estimation using importance sampling
1 Run MCMC as usual.
2 Fit normal distribution to posterior samples [2] ⇒ \(q(\theta)\).
3 Draw \(N\) samples from \(q(\theta)\).
\[ \pi(y) = \int \pi(y \mid \theta)\, \pi(\theta)\, d\theta \approx \frac{1}{N} \sum_{i=1}^{N} \frac{\pi(y \mid \theta_i)\, \pi(\theta_i)}{q(\theta_i)} \]
[2] To avoid problems, make \(q\) overdispersed relative to the posterior.

Marginal likelihood estimation with missing data
1 Run MCMC as usual.
2 Fit normal distribution to posterior samples ⇒ \(q(\theta)\).
3 Draw \(N\) samples from \(q(\theta)\).
4 For each sampled \(\theta_i\) draw missing data \(x_i\) from the full conditional using FFBS.
\[ \pi(y) \approx \frac{1}{N} \sum_{i=1}^{N} \frac{\pi(y \mid x_i, \theta_i)\, \pi(x_i \mid \theta_i)\, \pi(\theta_i)}{\pi(x_i \mid y, \theta_i)\, q(\theta_i)} \]

Simulation study: pneumococcal carriage
Panayiota performed a thorough simulation study [3] based on Melegaro et al. (2004).
Household based longitudinal study on carriage of Streptococcus pneumoniae.
Data consist of repeated diagnostic tests.
Multi-type model with 11 parameters, 2600 observed data and 6500 missing data.
[3] Touloupou et al. (2016) Model comparison with missing data using MCMC and importance sampling. arXiv 1512.04743

Results: marginal likelihood estimation
[Figure: Log marginal likelihood estimates for methods ISN1, ISN2, ISN3, ISN4, ISmix, ISt4, ISt6, ISt8, ISt10, Chib, PP and HM.]

Results: Bayes factor estimation
Do adults and children acquire infection at the same rate?
\(M_1: k_A \neq k_C\)
\(M_2: k_A = k_C\)
[Figure: Log \(B_{12}\) estimates for methods ISmix, Chib, RJcor, RJ, PP and HM. (a) Data simulated from model \(M_1\). (b) Data simulated from model \(M_2\).]

Results: Evolution of the log Bayes factor
[Figure: Log \(B_{12}\) against time (minutes, 0 to 240) for HM, PP, RJcor, Chib and IS; annotations: "Initial MCMC run for IS and Chib", "RJ pilot + RJcor burn in".]

Application 1: E. coli O157 in feedlot cattle
Do animals develop immunity over time?
We compare two models for infection period:
  Geometric: lack of memory.
  Negative Binomial: probability of recovery depends on duration of infection.
The Negative Binomial is a generalisation of the Geometric:
Setting Negative Binomial dispersion parameter \(\kappa = 1\) leads to Geometric.
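
The relationship between the two duration models can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not in the slides: it uses a (κ, p) parameterisation with integer κ, counts duration as the number of extra days before the κ-th success of a daily Bernoulli(p) trial, and picks p = 0.3 arbitrarily. The function names are hypothetical.

```python
from math import comb

def nb_pmf(d, kappa, p):
    """P(D = d) for an assumed Negative Binomial duration:
    d failures before the kappa-th success, success probability p per day."""
    return comb(d + kappa - 1, d) * p**kappa * (1 - p)**d

def geometric_pmf(d, p):
    """P(D = d) for a Geometric duration starting at d = 0."""
    return p * (1 - p)**d

def hazard(d, kappa, p):
    """Probability of recovery on day d given no recovery before day d."""
    survival = 1.0 - sum(nb_pmf(k, kappa, p) for k in range(d))
    return nb_pmf(d, kappa, p) / survival

p = 0.3  # arbitrary illustrative value
# kappa = 1: constant hazard, i.e. lack of memory.
geometric_hazards = [hazard(d, 1, p) for d in range(6)]
# kappa = 2: hazard depends on how long the infection has lasted.
nb_hazards = [hazard(d, 2, p) for d in range(6)]
```

Under these assumptions the κ = 1 hazards all equal p, while the κ = 2 hazards rise with the duration of infection.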
Application 1: Results
RJMCMC and IS agree on the estimate of the Bayes factor.
IS estimator: faster convergence.
Bayes factor supports the Negative Binomial model.
[Figure: Log \(B_{NG}\) (0 to 1.5) against time (in minutes, 0 to 180) for IS and RJMCMC.]
The longer the colonization, the greater the probability of clearance; this may indicate an immune response in the host.

Application 2: Role of pen area/location
[Figure: Feedlot layout, pens 1 to 20. North pen set: pen size 6 m × 17 m (North = small). South pen set: pen size 6 m × 37 m (South = big). Also shown: supplement and premix storage, scale house, catch pens from scale house.]

Application 2: Role of pen area/location
Do north and south pens have different risk of infection?
Allow different external (\(\alpha_s, \alpha_n\)) and/or within-pen (\(\beta_s, \beta_n\)) transmission rates.
Candidate models:
Model   External (North, South)   Within-pen (North, South)
1       αn, αs                    βn, βs
2       α,  α                     βn, βs
3       αn, αs                    β,  β
4       α,  α                     β,  β

Application 2: Posterior probabilities
RJMCMC and IS provide identical conclusions.
Evidence to support different within-pen transmission rates.
[Figure: Posterior probability (0 to 0.8) of models 1 to 4, by method (IS, RJMCMC).]
Animals in smaller pens are more at risk of within-pen infection.

Application 3: Investigating transmission between pens
Additional dataset: pens adjacent in a 12 × 2 rectangular grid.
No direct contact across feed bunk.
Shared waterers between pairs of adjacent pens.
[Figure: Layout of pens 1 to 24.]

Application 3: Investigating transmission between pens
Do waterers spread infection?
(a) Model 1: No contacts between pens
(b) Model 2: Transmission via a waterer
(c) Model 3: Transmission via any boundary

Application 3: Posterior probabilities
RJMCMC: hard to design jump mechanism.
Using IS, results are still possible.
Evidence for transmission between pens sharing a waterer rather than another boundary.
[Figure: Posterior probability (0.2 to 0.6) of models 1 to 3.]

Scalable inference for epidemics

Thus far we have been doing inference for small populations:
  Households
  Pens
The FFBS algorithm scales very badly with population size.
We would like an inference method that scales better with population size.

Graphical representation
Diagram of the Markovian epidemic model.
Circles are hidden states and rectangles are observed data.
Arrows represent dependencies.
[Figure: Graphical model with hidden states \(x_t^{[1]}, x_t^{[2]}, x_t^{[3]}\) and observations \(y_t^{[1]}, y_t^{[2]}, y_t^{[3]}\) over times \(t-2, \dots, t+3\).]

A new approach: the iFFBS algorithm
Update one individual at a time by sampling from the full conditional:
\[ P\big(x_{1:T}^{[c]} \mid y_{1:T}^{[1:C]},\, x_{1:T}^{[-c]},\, \theta\big) \]
Reformulate graph ⇒ View as coupled hidden Markov model
[Figure: Reformulated graph with hidden states \(x_{t-1}^{[c]}, x_t^{[c]}, x_{t+1}^{[c]}\) for \(c = 1, 2, 3\) and observations \(y_t^{[1]}, y_t^{[2]}, y_t^{[3]}\).]
Computational complexity reduced from \(O(T N^{2C})\) to \(O(T C N^2)\).
N = number of infection states (e.g. 2)
C = number of cows (e.g. 8)
T = number of timepoints (e.g.
99)

Comparison of methods
[Figure: Time (in seconds) against animals in pen, and ACF per iteration against lag (0 to 30), for Spencer's, Dong's, fullFFBS and iFFBS.]

Larger populations
[Figure: Relative speed (0 to 175) against animals in pen (100 to 1000) for Spencer's, Dong's and iFFBS.]

Conclusion

FFBS algorithm generates better mixing MCMC for parameter inference.
Unlocks direct approach to marginal likelihood estimation.
Allows important epidemiological questions to be answered via model selection.
iFFBS can perform inference in large populations: it exploits dependence structure in epidemic data.

What I didn't say
All of this work (and much more!) has been done by Panayiota.
FFBS and iFFBS can also be used as a Metropolis-Hastings proposal to fit non-Markovian epidemic models.
Can we do model selection with iFFBS?
Power of iFFBS allows more complex models to be fitted, e.g. multi-strain epidemic models.

Current work
[Figure: Pen 3, Animals 1, 4, 6 and 8. Vertical axis from 0 to 1 against day (1 to 99); legend: Serotype A, C, G, M, O, P, T, U, -.]
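
As a closing illustration of the forward filtering backward sampling step that the talk relies on, here is a minimal sketch for a single discrete hidden Markov chain. It is not the iFFBS algorithm and not the authors' implementation. The function name `ffbs`, the time-homogeneous transition matrix and all numerical values are assumptions made for illustration; only the emission structure (perfect specificity, imperfect sensitivity) follows the study design.

```python
import random

def ffbs(init, trans, emis, obs, rng=random):
    """Draw one hidden path from P(x_1..x_T | y_1..y_T) for a discrete HMM.

    init[k]     : P(x_1 = k)
    trans[j][k] : P(x_{t+1} = k | x_t = j), assumed constant in time
    emis[k][y]  : P(y_t = y | x_t = k)
    obs         : observed symbols y_1..y_T
    """
    n_states, T = len(init), len(obs)
    # Forward filtering: alpha[t][k] = P(x_t = k | y_1..y_t).
    alpha = []
    for t in range(T):
        if t == 0:
            pred = list(init)
        else:
            pred = [sum(alpha[t - 1][j] * trans[j][k] for j in range(n_states))
                    for k in range(n_states)]
        unnorm = [pred[k] * emis[k][obs[t]] for k in range(n_states)]
        total = sum(unnorm)  # zero only if the data are impossible under the model
        alpha.append([u / total for u in unnorm])
    # Backward sampling: draw x_T, then x_t given x_{t+1} for t = T-1, ..., 1.
    path = [0] * T
    path[T - 1] = rng.choices(range(n_states), weights=alpha[T - 1])[0]
    for t in range(T - 2, -1, -1):
        weights = [alpha[t][j] * trans[j][path[t + 1]] for j in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=weights)[0]
    return path

# Hypothetical two-state example: 0 = susceptible, 1 = colonized.
init = [0.9, 0.1]
trans = [[0.9, 0.1], [0.2, 0.8]]   # illustrative values only
emis = [[1.0, 0.0],                # perfect specificity: no false positives
        [0.3, 0.7]]                # assumed sensitivity of 0.7
obs = [0, 1, 0, 1, 1, 0]           # 1 = positive test
path = ffbs(init, trans, emis, obs, rng=random.Random(1))
```

Because specificity is perfect in this toy setup, every sampled path is colonized on days with a positive test; negative days may be sampled as either state.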