Supplementary Information Text S1

Quantifying spatiotemporal heterogeneity of MERS-CoV transmission in the Middle East region: a combined modeling approach

Chiara Poletto (a), Vittoria Colizza (a, b), Pierre-Yves Boëlle (a)

(a) Sorbonne Universités, UPMC Univ Paris 06, INSERM, Institut Pierre Louis d’épidémiologie et de Santé Publique (IPLESP UMRS 1136), F75012, 27 rue Chaligny, 75012 Paris, France.
(b) Institute for Scientific Interchange Foundation, via Alassio 11/c, 10126 Torino, Italy.

Contents

Data
    Available information
    Imputation of epidemic curves
        Distribution of time from onset to hospitalization and notification
        Imputed epidemic curves
Model design (regional analysis)
Results
    Model selection
    Posterior distribution summary
        Estimated geographical scaling factors
        Estimated parameters in the complete information on transmission scenario
    Comparison of simulated and observed epidemics
    Correlation between parameters
    Seasonal model
Sensitivity analysis
    Sensitivity on other modelling assumptions
References

Data

Available information

Data were obtained from the website [Rambaut, A. “MERS-cov Spatial, Temporal and Epidemiological Information”, http://epidemic.bio.ed.ac.uk/coronavirus_background]. Table S1 shows the completeness of the data used for the analysis.

Variable                     % reported cases for whom information is available
Region                       100%
Date of onset                63%
Date of hospitalisation      57%
Date of notification         96%
Onset imputed from
    Known                    63%
    Hospitalisation          8%
    Notification             29%
Secondary cases              34%

Table S1: Completeness of the dataset

Imputation of epidemic curves

Distribution of time from onset to hospitalization and notification

For hospitalization, onset to hospitalization data was available for 31% of the cases, with mean 4.6 days and standard deviation 4.5 days. For onset to notification, data was available for 49% of the cases, with mean 10.5 days and standard deviation 9.2 days.
Figure S1: Time from onset to hospitalisation (left) and from onset to notification (right) of MERS-CoV cases in the Middle East from March 2012 to September 2014.

For imputing missing onset dates, we applied the following approach:
- If the hospitalisation date $t_h$ was available, the onset date was imputed at $t_h - d$, where $d$ was sampled from the distribution of time from onset to hospitalisation, determined from other cases hospitalized in the same period (i.e. [$t_h$ – 30 days, $t_h$ + 30 days]);
- If only the notification date $t_r$ was known, the onset date was imputed at $t_r - d$, where $d$ was sampled from the distribution of time from onset to notification, determined from other cases notified in the same period (i.e. [$t_r$ – 45 days, $t_r$ + 45 days]).

Using time windows around each date allowed potential changes in care and notification over time to be taken into account.

Imputed epidemic curves

Twenty epidemic curves were imputed from the original data, accounting for missing onset dates. The overall profiles were little affected.

Figure S2: MERS-CoV epidemic curves in the Middle East. The gray bars show the variability in incidence due to imputation of missing onset dates.

Model design (regional analysis)

The step 1 model is an analysis of the time series of incident cases in different regions. As transmission is still low, we hypothesized that the epidemic process was independent in the geographical regions. In each region, we postulated (dropping the r superscript for region):
- $D(t) \sim \mathrm{Poisson}\big(E(D^r(t))\big)$, where $D(t)$ is overall incidence at time $t$;
- $s(t) \sim \mathrm{Binomial}\big(\rho\,\beta\,R(t-1)\,D(t-1)/E(D^r(t)),\ D(t)\big)$, where $s(t)$ is incidence of cases described as secondary;
- $\ln(\beta) \sim N(0, \sigma_B^2)$, random effect for transmission strength;
- $\ln(\alpha) \sim N(0, \sigma_A^2)$, random effect for sporadic cases.
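The mean of the Poisson term is defined in the main text rather than in this supplement. As a reading aid, the form implied by the example BUGS script in this section (the line defining `e[k,t]`) can be written as below, with the region superscript dropped. This is a transcription of that script, not a formula quoted from the manuscript: identifying `r[t]` with $p_{sp}(t)$, `exp(alpha[k])` with $\alpha$ and `pop[k]` with a regional population size $N$ is an assumption, and the example script does not include the $\beta$ random effect.

```latex
E\big(D(t)\big) = p_{sp}(t)\,\alpha\,N \;+\; R(t-1)\,D(t-1)
```

Under this reading, the first term is the expected number of sporadic cases and the second the expected number of cases generated by the previous week's cases. The ratio of the second term to the total is the fraction of cases expected to be secondary, which the script multiplies by a reporting factor in the binomial term.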
The prior distributions were:
- $\sigma_B^2 \sim \mathrm{Exp}(1)$
- $\sigma_A^2 \sim \mathrm{Exp}(1)$
- $\rho \sim \mathrm{Beta}(1,1)$
- $R \sim \mathrm{Exp}(0.1)$
- $p'_{sp} \sim \mathrm{Exp}(0.1)$

An example BUGS script is shown below:

    var pR[3], pr[2], alpha[n.province], beta[n.province],
        count[n.province,n.week], count.secondary[n.province,n.week],
        p[n.province], r[n.week], R[n.week]
    model {
      # priors
      s.alpha ~ dexp(1)
      s.beta ~ dexp(1)
      pR[1] ~ dexp(0.1)
      pR[2] ~ dexp(0.1)
      pR[3] ~ dexp(0.1)
      pr[1] ~ dexp(0.1)
      pr[2] ~ dexp(0.1)
      # first observations
      for (k in 1:n.province) {
        alpha[k] ~ dnorm(0, 1/(s.alpha*s.alpha))
        count[k,1] ~ dpois(pr[1] * exp(alpha[k]) * pop[k])
        count.secondary[k,1] ~ dpois(0.01)
        p[k] ~ dbeta(1,1)
      }
      # rest of time series
      for (t in 2:n.week) {
        # sporadic rate: baseline pr[1], rising to pr[2] and back over weeks 87-94
        r[t] <- pr[1] + (pr[2]-pr[1]) * (t - 87)/4 * step(t - 87)*step(90-t)
                      + (pr[2]-pr[1]) * (95-t)/4 * step(t - 91)*step(94-t)
        # reproduction number: pR[1] at baseline, pR[2] in weeks 87-90, pR[3] in weeks 91-94
        R[t-1] <- pR[1] + (pR[2]-pR[1]) * step(t - 87)*step(90-t)
                        + (pR[3]-pR[1]) * step(t - 91)*step(94-t)
        # in each province
        for (k in 1:n.province) {
          e[k,t] <- r[t] * exp(alpha[k]) * pop[k] + R[t-1] * count[k,t-1]
          count[k,t] ~ dpois(e[k,t])
          count.secondary[k,t] ~ dbinom(p[k] * R[t-1] * count[k,t-1]/e[k,t], count[k,t])
        }
      }
    }

Results

Model selection

Gibbs sampling was performed using JAGS and rjags. We obtained posterior samples of size 1000 from 100000 iterations sampled every 100 steps to limit autocorrelation. The DIC was computed by the JAGS module “dic”. DICs for all models were obtained for the 20 imputed epidemics and averaged.

                                              Geographical variation
Scenario               Temporal variation     none    R       psp     both psp and R
Partial information    none                   2169    2131    1973    1959
                       psp                    2148    2115    1891    1882
                       R                      2088    2056    1888    1878
                       both psp and R         2072    2043    1850    1837
Complete information   none                   4288    4273    3128    3113
                       psp                    3382    3368    2222    2207
                       R                      4276    4250    3116    3090
                       both psp and R         3370    3344    2209    2183

Table S2: DIC for all 32 models tested in the scenarios partial information on transmission and complete information on transmission.
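The "temporal variation" models compared in Table S2 let psp and/or R change over time. In the example BUGS script this is encoded with products of `step()` terms, which can be hard to read. The following Python sketch transcribes the two expressions for `r[t]` and `R[t-1]` from that script so that the resulting profiles can be inspected. The function names are hypothetical, and the week indices 87 to 94 are taken as-is from the script.

```python
def step(x):
    """BUGS/JAGS step function: 1 if x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0


def sporadic_rate(t, pr1, pr2):
    """r[t] in the script: baseline pr1 with a linear rise to pr2 at
    week 91 and a linear return to baseline by week 95."""
    rise = (pr2 - pr1) * (t - 87) / 4 * step(t - 87) * step(90 - t)
    fall = (pr2 - pr1) * (95 - t) / 4 * step(t - 91) * step(94 - t)
    return pr1 + rise + fall


def reproduction_number(t, pR1, pR2, pR3):
    """Right-hand side that the script assigns to R[t-1]:
    pR1 at baseline, pR2 for t in 87..90, pR3 for t in 91..94."""
    return (pR1
            + (pR2 - pR1) * step(t - 87) * step(90 - t)
            + (pR3 - pR1) * step(t - 91) * step(94 - t))


# Illustrative parameter values only (not estimates from the paper).
weeks = range(85, 97)
r_profile = [sporadic_rate(t, 1.0, 5.0) for t in weeks]
R_profile = [reproduction_number(t, 1.0, 2.0, 3.0) for t in weeks]
```

Read this way, the sporadic rate follows a triangular profile over eight weeks, while the reproduction number takes one value before the peak and another after it, returning to baseline outside that window.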
Posterior distribution summary

Estimated geographical scaling factors

Posterior means and credible intervals are provided for the best-fit model (partial information; geographical and temporal variation in all regions).

Region              α                       β
UAE                 1.79 [0.59 - 3.89]      1.07 [0.68 - 1.82]
Aseer               0.57 [0.15 - 1.36]      1.06 [0.46 - 2.13]
Al Bahah            0.27 [0.03 - 0.81]      1.02 [0.31 - 2.29]
Border Region       0.81 [0.09 - 2.64]      1.01 [0.30 - 2.27]
Jordan              0.34 [0.09 - 0.80]      0.85 [0.32 - 1.54]
Al-Jawf             2.84 [0.63 - 7.27]      0.78 [0.20 - 1.44]
Kuwait              0.32 [0.06 - 0.89]      1.03 [0.36 - 2.22]
Al Madinah          2.14 [0.50 - 5.42]      1.32 [0.79 - 2.40]
Makkah              2.28 [0.64 - 5.62]      1.45 [0.96 - 2.47]
Nejran              1.67 [0.33 - 4.57]      0.82 [0.22 - 1.55]
Oman                0.46 [0.10 - 1.16]      0.77 [0.21 - 1.40]
Qatar               2.73 [0.79 - 6.38]      0.83 [0.25 - 1.54]
Al-Qassim           0.46 [0.08 - 1.31]      0.94 [0.28 - 1.95]
Riyadh              6.23 [2.27 - 13.07]     1.25 [0.83 - 2.09]
Eastern Province    3.19 [0.99 - 7.13]      1.86 [1.03 - 3.43]
Tabuk               1.25 [0.23 - 3.37]      1.11 [0.57 - 2.19]
Yemen               0.04 [0.01 - 0.10]      0.87 [0.24 - 1.69]

Table S3: Posterior means and credible intervals for parameters $\alpha_r$ and $\beta_r$ are provided for the best-fit model (partial information; geographical and temporal variation in all regions).

Estimated parameters in the complete information on transmission scenario

The parameters estimated in the “complete information” model showed, as expected, less transmission and more sporadic cases.

Parameter       Estimate
$p_{sp,1}$      0.027 × 10^-6 [0.010 – 0.065]
$p_{sp,2}$      1.2 × 10^-6 [0.45 – 2.8]
$R_1$           0.26 [0.15 – 0.38]
$R_2$           0.82 [0.46 – 1.3]
$R_3$           0.33 [0.20 – 0.48]

Table S4: Parameter estimates obtained in the complete information on transmission scenario.

Comparison of simulated and observed epidemics

We simulated outbreaks in the Middle East using parameters sampled in the posterior distribution of the best fitting model. Each week, the number of detected cases was sampled in each region from a Poisson distribution with mean $E(D^r(t))$ as described in the text.
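The simulation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical, and the Poisson sampler (Knuth's method) is chosen only to keep the example free of external libraries. The weekly mean is taken to be a sporadic term plus R times the previous week's count, as in the example BUGS script. The parameter values are placeholders, not posterior draws.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for moderate weekly means; exp(-lam) underflows for very
    large lam, where a different sampler would be needed.
    """
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_region(sporadic_mean, R, first_count, rng):
    """Simulate weekly counts for one region.

    sporadic_mean[t]: expected sporadic cases in week t (hypothetical
                      input standing in for the sporadic term of the model).
    R[t]:             reproduction number applied to the cases of week t.
    """
    counts = [first_count]
    for t in range(1, len(sporadic_mean)):
        expected = sporadic_mean[t] + R[t - 1] * counts[t - 1]
        counts.append(poisson(expected, rng))
    return counts


def pointwise_summary(runs, lower=0.025, upper=0.975):
    """Per-week (lower, median, upper) empirical quantiles across runs."""
    def quantile(sorted_vals, q):
        return sorted_vals[round(q * (len(sorted_vals) - 1))]

    summary = []
    for week_values in zip(*runs):
        ordered = sorted(week_values)
        summary.append((quantile(ordered, lower),
                        quantile(ordered, 0.5),
                        quantile(ordered, upper)))
    return summary


# Placeholder values only (assumption: constant sporadic mean and R).
rng = random.Random(2014)
sporadic = [1.0] * 20
R = [0.5] * 20
runs = [simulate_region(sporadic, R, first_count=1, rng=rng)
        for _ in range(1000)]
envelope = pointwise_summary(runs)
```

In the analysis itself the parameters were sampled from the posterior distribution; a closer imitation would draw a new parameter set for each of the 1000 runs rather than reuse fixed values as done here.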
The envelope of the predicted values was in accordance with the observed epidemic and showed that large stochastic variability was possible.

Figure S3: Observed MERS-CoV cases in the Middle East (line), with median (dashed) and pointwise 95% prediction interval from 1000 simulations.

Correlation between parameters

Overall, mixing in the chains was good, and quantiles of the posterior distribution stabilized over time. We limited autocorrelation in the posterior samples by retaining only 1 iteration every 1000. The scatterplots of the final distributions are shown in Figure S4. The $R$ and $p_{sp}$ distributions were roughly independent, as shown by the shape of the posterior scatterplots. There was more correlation between parameters of the same nature, ($R_1$, $R_2$, $R_3$) and ($p_{sp,1}$, $p_{sp,2}$), but the scatterplots do not suggest problems in estimation (multiple maxima, bimodality, non-identifiability).

Figure S4: Bivariate scatterplot of posterior distributions for main model parameters. Parameter distributions are from the best fitting model. The labels R.base, R.before, R.after, p[sp]base and p[sp]peak denote, in order, $R_1$, $R_2$, $R_3$, $p_{sp,1}$ and $p_{sp,2}$.

Seasonal model

                              Geographical variation
Temporal variation            R       psp     both psp and R
original                      12      206     0
psp seasonal                  52      220     42
R seasonal                    14      213     2
both psp and R seasonal       64      228     53

Table S5: Fit of a model with seasonal change in R and psp. The results show the difference in DIC relative to the best fitting model reported in the manuscript, averaged over 20 imputed epidemics.

Sensitivity analysis

Sensitivity on other modelling assumptions

We explored the impact of arbitrary modelling choices on the distribution of the parameters. We considered five variations of the original model, described below (the corresponding setting in the original model is given in parentheses):
- wide: change in $R$ and $p_{sp}$ over a 10-week-long period (8-week-long) centered around 2014-17.
- peak 2014-16: change in $R$ and $p_{sp}$ in the two periods 2014-12 to 2014-16 and 2014-17 to 2014-20 (change in the two periods 2014-13 to 2014-17 and 2014-18 to 2014-21).
- narrow step: change in $R$ on a 4-week-long period from 2014-13 to 2014-17 (two changes, from 2014-13 to 2014-17 and from 2014-18 to 2014-21).
- large step: change in $R$ on an 8-week-long period from 2014-13 to 2014-21 (two changes, from 2014-13 to 2014-17 and from 2014-18 to 2014-21).
- distrib: taking into account the distribution of the generation time over 3 weeks (generation time was 1 week). The distribution of the generation time was obtained from [1] (lognormal distribution, meanlog=1.9, sdlog=0.49) and discretized over weeks (week 1: 81%, week 2: 17%, week 3: 2%), compared to (week 1: 100%) in the original model.

Table S6 summarizes the fits, as measured by DIC. There was little difference in overall goodness of fit: in all cases, a model allowing changes in both psp and R had better fit than others. More precisely, the shift by one week of the peak time had no effect on the overall DIC (model “peak 2014-16”), and other changes led to small increases in the DIC.

Geographical      Temporal     original   peak 2014-16   narrow step   wide   large step   distrib
none              constant     2169       2147           2147          2147   2147         2098
                  both         1959       1959           2097          1958   1958         1974
psp               constant     2148       2129           2132          2131   2131         2088
                  both         1882       1879           1882          1878   1882         1901
R                 constant     2088       2088           2109          2083   2109         2012
                  both         1888       1887           1921          1897   1921         1878
both psp and R    constant     2072       2054           2096          2071   2096         2006
                  both         1837       1837           1867          1850   1867         1849

Table S6: DICs of the models in the sensitivity analysis. We consider here the four levels of geographical heterogeneity and only two levels of temporal heterogeneity (constant and both).

The distributions obtained in these models are shown in Figure S5 for each region (we multiplied the parameter value by the region-specific modifier). Overall, there were no major changes in the posterior distributions.
As expected, in the “large step” model, the R estimates were lower during the period of epidemic increase ($R_2$) and higher during the decreasing part of the epidemic wave ($R_3$). In the “distrib” model, the $p_{sp}$ parameters decreased while the $R$ parameters increased relative to the original model, but the changes were small in magnitude.

Figure S5: Sensitivity analysis: parameter distributions according to the five model formulations considered in the analysis. Parameter distributions are presented for all regions under study. The labels R.base, R.before, R.after, p[sp]base and p[sp]peak denote, in order, $R_1$, $R_2$, $R_3$, $p_{sp,1}$ and $p_{sp,2}$.

References

1. Assiri A, McGeer A, Perl TM, Price CS, Al Rabeaah AA, Cummings DA et al. Hospital outbreak of Middle East respiratory syndrome coronavirus. New England Journal of Medicine 2013; 369(5):407-16. http://dx.doi.org/10.1056/NEJMoa1306742.