Characterizing Infectious Disease Outbreaks: Traditional and Novel Approaches
Laura F White
15 October 2013

2009 Influenza A H1N1 Pandemic
• The H1N1 pandemic was first noticed in February 2009 in Mexico.
• There was a large outbreak early on in La Gloria, a small village outside of Mexico City.
• The outbreak was studied extensively in the first report on H1N1 (Fraser, Donnelly et al. "Pandemic potential of a strain of influenza A (H1N1): early findings", Science Express, 11 May 2009).

Example: H1N1 Outbreak
Edgar Hernandez (four years old): first confirmed case
[Figure: cases reported in La Gloria]

Quantitative Issues
• How do we determine how fast the disease is spreading?
  – Reproductive number, serial interval
• How do we determine how severe the disease is?
  – Attack rate, case fatality ratio
  – A topic for another talk!
• How do we determine which interventions will be most effective?
  – Mathematical modeling, network models, etc.
  – Estimates of severity and transmission by age group

Importance of parameter estimates
• Good information leads to good policy.
• School closure is expensive.
  – It is important to determine whether it will really help.
• Some estimate that influenza can be controlled if R0 < 2.
• Information on R0 and the serial interval can give a good picture of how a disease might spread.
Source: Fraser et al (2004)

Impact of the serial interval

Some of the challenges in infectious diseases
• Dependency in the data.
  – Chain of infection.
• Undetected cases.
  – Asymptomatic, but still infectious.
  – Unable to detect with existing surveillance.
• Need to act fast with little information.

Approaches to estimation
• Classical: mathematical models
• Network models
• Statistical approaches

Simple approach
• Assume exponential growth for the first part of an epidemic.
• Let $t_d$ be the doubling time of the epidemic and $D$ the average serial interval. Then solve the following for $R_0$:
  $t_d = \frac{(\ln 2)\, D}{R_0 - 1}$
• Overly simplistic, and sensitive to the values of $t_d$ and $D$.
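Rearranging the doubling-time relation gives $R_0 = 1 + D \ln 2 / t_d$. The Python sketch below is a minimal illustration of that arithmetic under the same exponential-growth assumption; the function name and the example inputs (a 3-day serial interval and doubling times of 2 to 4 days) are hypothetical, not values from the talk.

```python
import math


def r0_from_doubling_time(t_d: float, D: float) -> float:
    """Solve t_d = D * ln(2) / (R0 - 1) for R0.

    t_d: doubling time of the epidemic (days).
    D:   average serial interval (days).
    Only meaningful under the assumption of exponential early growth.
    """
    if t_d <= 0 or D <= 0:
        raise ValueError("t_d and D must both be positive")
    growth_rate = math.log(2) / t_d  # r, per day, implied by the doubling time
    return 1.0 + growth_rate * D


# Hypothetical inputs: a 3-day serial interval with three candidate doubling times.
for t_d in (2.0, 3.0, 4.0):
    print(f"t_d = {t_d:.0f} days -> R0 = {r0_from_doubling_time(t_d, 3.0):.2f}")
```

With these made-up inputs the printed estimates are 2.04, 1.69 and 1.52: a one-day change in the doubling time shifts R0 by roughly 0.17 to 0.35, which is one concrete sense in which this approach is sensitive.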
Mathematical models
SIR Model
[Diagram: Susceptible → Infected → Recovered. Susceptibles become infected at rate (contact rate) × (transmission probability) × Infected; infected individuals recover at rate 1/(duration of infectiousness).]
R0 = (transmission probability)(contact rate)(duration of infectiousness)

Mathematical Models: Uses
• Modeling vaccination programs
• Determining optimal intervention strategies to halt or control an epidemic
• Studying HIV transmission routes
• Estimating disease parameters

Mathematical Models: Limitations
• They make a lot of assumptions.
  – Many values must be plugged in to get estimates.
• They do not allow for randomness in the processes: they always give a single number as the answer, with no error bounds.
  – Stochastic epidemic models address this.
• They can oversimplify the problem.
  – The challenge is to balance making the model too simple against making it too complex.

References
• Hethcote: Herbert W. Hethcote. The Mathematics of Infectious Diseases. SIAM Review, Vol. 42, No. 4, pp. 599-653, Dec. 2000.
• Anderson and May: Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, 1992.

Wallinga & Teunis
• Network-based method to estimate the reproductive number on each day of an epidemic.
• Requires knowledge of the serial interval.
• Requires that all cases have been observed and that the epidemic is over.
• Originally developed to analyze SARS.
American Journal of Epidemiology, 2004

[Figure: each dot is an infected person, placed by day of symptom onset (Day 1 to Day 6); case i may have been infected by any earlier case j (all possible infectors), and $p_t$ is the probability of being infected by a case that appeared $t$ days prior.]

Wallinga & Teunis
• If $g(t)$ is the distribution of the serial interval, then the relative probability that case $i$ was infected by case $j$ is:
  $p_{ij} = \frac{g(t_i - t_j)}{\sum_{k \neq i} g(t_i - t_k)}$
• The effective reproductive number for case $j$ (and, by averaging, for the cases with onset on day $t_j$) is then:
  $R_j = \sum_i p_{ij}$

[Figure: Wallinga & Teunis estimates for SARS]

White & Pagano
• Statistical method that uses probability models to estimate the serial interval and the reproductive number.
• Assume that we observe daily counts of new cases: $N_1, N_2, \ldots, N_T$.
• Let $X_{ij}$ be the number of cases with symptoms on day $j$ who were infected by a case with symptoms on day $i$.
Statistics in Medicine, 2008

White & Pagano Method
• Using this scheme, we make some probabilistic assumptions and obtain a likelihood:
  $L(R_0, \mathbf{p} \mid \mathbf{N}) = \prod_{t=1}^{T} \frac{e^{-\mu_t} \mu_t^{N_t}}{N_t!}$
• where $\mu_t = R_0 \sum_{j=1}^{\min(k, t)} p_j N_{t-j}$ and $k$ is the maximum length of the serial interval.
• $p_j$ describes the serial interval (i.e., the probability of having symptoms $j$ days after the infector).
• Use numerical methods to get the MLEs of $R_0$ and $\mathbf{p}$.

H1N1 Example
• In April, the public became aware of a novel strain of influenza that was affecting Mexico.
• Fraser, Donnelly et al published the initial report in Science on 11 May 2009.
• They estimated the reproductive number to be between 1.4 and 1.6.
• They estimated the average serial interval to be 1.91 days.

H1N1 Example
• We obtained data from the CDC with information on each confirmed and suspected case (1368 cases) as of May 8.
• 750 had a date of symptom onset.

Influenza A/H1N1: Serial Interval
• Spanish work estimates the average serial interval to be 3.5 days (range = 1-6 days).
  – Uses contact tracing data.
• Seasonal influenza (Cowling et al, 2009)
  – 3.6 days, SD = 1.6
  – From a household contact study

Influenza A/H1N1: R0 estimates
• Mexico: 1.3-1.4 (Cruz-Pacheco et al)
• Mexico: less than 2.2-3.1 (Boelle et al)
• Japan: 2.3 (Nishiura et al)
• Netherlands: less than 1 (Hahne et al)
• US: 1.7-1.8 (White et al)

Influenza A/H1N1: USA
• Missing dates of symptom onset
  – All cases have a report date, but many lack a date of symptom onset.
  – Calculate the distribution of time between report date and symptom onset for those with both.
  – Impute a date of symptom onset for those with missing information, using this observed distribution.

Reporting delay distribution

Other issues in the data
• Imported cases
  – Make an adjustment in the estimation method to account for those who were known to have traveled to Mexico.
• Reporting delay
  – The decline in cases closer to May 8 is likely due to reporting delays rather than a true drop-off in case numbers.
  – Augment the data at the end, using the reporting delay distribution.

Augmented data

Estimates in the USA
• Using the White & Pagano method with the modifications mentioned, we get estimates of R0 and the serial interval for the initial outbreak in the US.

Serial interval estimate
[Figure: estimates using data up to and including April 25, 2009, and using data up to and including April 27, 2009]

HETEROGENEITY

Heterogeneity
• Variation in transmission between adults and kids, geographically, etc.
• Understanding it can lead to better policy decisions
  – Who gets vaccinated first?
  – Which social distancing measures might be most effective?

Overview
• Social mixing matrices
• Glass method
• Modification of Wallinga and Teunis
• Modification of White and Pagano

Social mixing
• To understand who is most culpable for transmission, we typically need to understand how people interact.
• There are many approaches to this; we choose the most popular one currently: social mixing matrices.

PolyMod study
• Large European study
  – Belgium, Finland, Great Britain, Germany, Italy, Luxembourg, the Netherlands, and Poland
• 97,904 contacts among 7,290 participants
• Participants recorded the number and nature of their contacts in a diary
• Contact matrices were created to describe all close contacts and, separately, close contacts that involve physical touch

Table 1. Mossong J, Hens N, Jit M, Beutels P, et al. (2008) Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases. PLoS Med 5(3): e74. doi:10.1371/journal.pmed.0050074 http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050074

Figure 1. The Mean Proportion of Contacts That Involved Physical Contact, by Duration, Frequency, and Location of Contact in All Countries
Mossong J, Hens N, Jit M, Beutels P, et al. (2008) Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases.
PLoS Med 5(3): e74. doi:10.1371/journal.pmed.0050074 http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050074

Figure 2. The Distribution by Location and by Country of (A) All Reported Contacts and (B) Physical Contacts Only
Source: Mossong et al (2008), PLoS Med 5(3): e74.

Figure 3. Smoothed Contact Matrices for Each Country Based on (A) All Reported Contacts and (B) Physical Contacts Weighted by Sampling Weights
Source: Mossong et al (2008), PLoS Med 5(3): e74.

Other studies
• Similar studies have been conducted in South Africa and Vietnam.
• The first study of this nature was in the Netherlands (Wallinga et al, 2006).
• Johnstone-Robertson et al (2011) carried out a very similar study in a South African township.

Approaches
• Glass et al, 2011
  – Estimate R for adults and children
  – Do not require transmission data
• Modify the Wallinga and Teunis method
  – Estimate Rt (and R0) across age groups
  – Require contact information
• Moser and White, 2013 (in preparation)
  – Bayesian approach to the problem
  – Modify the White & Pagano method to incorporate age contact information
  – Incorporate contact information as a prior distribution

Approach 1: Glass et al
• Modify the Wallinga & Teunis and White & Pagano methods to estimate R for children and adults
• Assume a form for a reproduction matrix:
  $M = \begin{pmatrix} m_{CC} & m_{CA} \\ m_{AC} & m_{AA} \end{pmatrix}$
• $m_{ij}$ describes the number of cases of type $i$ infected by cases of type $j$.
• Some pre-specified structure must be imposed on the matrix $M$ in order to estimate the $m_{ij}$.
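A reproduction matrix like $M$ is often summarized in two ways: column sums give the secondary cases produced by one child ($R_C$) or one adult ($R_A$), and the largest eigenvalue of $M$ is the standard population-level reproduction number for a next-generation matrix. The Python sketch below assumes the slide's convention ($m_{ij}$ = type-$i$ cases infected by one type-$j$ case); the function names and the example matrix are hypothetical, and it illustrates the general next-generation-matrix idea rather than Glass et al's estimation procedure.

```python
import math

# Convention (from the slide): m_ij = mean number of type-i cases infected by
# one type-j case, with C = child and A = adult. Rows are infectees,
# columns are infectors: M = [[m_CC, m_CA], [m_AC, m_AA]].


def type_specific_R(m_CC: float, m_CA: float, m_AC: float, m_AA: float) -> tuple[float, float]:
    """Return (R_C, R_A): total secondary cases caused by one child / one adult.

    Under the convention above these are the column sums of M.
    """
    R_C = m_CC + m_AC  # children plus adults infected by one child
    R_A = m_CA + m_AA  # children plus adults infected by one adult
    return R_C, R_A


def dominant_eigenvalue_2x2(m_CC: float, m_CA: float, m_AC: float, m_AA: float) -> float:
    """Largest eigenvalue of M = [[m_CC, m_CA], [m_AC, m_AA]].

    For a nonnegative next-generation matrix this is the usual population-level
    reproduction number; it is unchanged if M is transposed.
    """
    trace = m_CC + m_AA
    # trace^2 - 4*det, rewritten as (m_CC - m_AA)^2 + 4*m_CA*m_AC,
    # which is >= 0 whenever the off-diagonal entries are nonnegative.
    disc = (m_CC - m_AA) ** 2 + 4.0 * m_CA * m_AC
    return (trace + math.sqrt(disc)) / 2.0


# Hypothetical matrix in which children mostly infect other children.
M = dict(m_CC=2.0, m_CA=0.5, m_AC=0.5, m_AA=0.5)
R_C, R_A = type_specific_R(**M)
R = dominant_eigenvalue_2x2(**M)
print(f"R_C = {R_C:.2f}, R_A = {R_A:.2f}, population R = {R:.2f}")
```

For this made-up matrix the sketch gives $R_C$ = 2.50, $R_A$ = 1.00 and a population-level value of about 2.15. The eigenvalue is an average of the column sums weighted by the stable mix of child and adult cases, so it always lies between the smallest and largest of them.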
Matrix constraints
Source: Glass et al, 2011

Modification of White & Pagano
• Let $X_i \sim \mathrm{Pois}(m_{CC} C_i + m_{AC} A_i)$ and $Y_i \sim \mathrm{Pois}(m_{CA} C_i + m_{AA} A_i)$, where $A_i$ and $C_i$ are the incidence counts for adults and children, respectively. $X_i$ and $Y_i$ are the total numbers of cases infected by children and adults from day $i$, respectively.

Modification of White & Pagano
$\mu_A$ and $\mu_C$ are the expected numbers of adults and children on day $t$.

Modification of White & Pagano
The likelihood is maximized over the $m_{ij}$ to obtain estimates. Applying constraints to $M$ creates relationships between the $m_{ij}$, which makes them identifiable.

Modification of Wallinga & Teunis
• Modify the $p_{ij}$ to incorporate the probability that children infect each other and that adults infect each other.
• $q_{a_i}$ is the probability that an individual of type $a_i$ was infected by someone of the same type ($a_i$ = C or A).

Modification of Wallinga & Teunis
• To estimate $q_A$ and $q_C$, define $f$ to be the proportion of cases that are children.
  – Then write the elements of the matrix in terms of $f$ and $R$ (the population reproductive number).
  – For the separable matrix, $q_C = 1/(1 + x^2)$ and $q_A = x^2/(1 + x^2)$, where $x = (1 - f)/f$.

Approach 1: simulation study
True $R_C$ = 2.5 and true $R_A$ = 1. L, M and U are the 3rd percentile, median and 98th percentile over 100 simulations.

Approach 1: Japanese influenza data
Wallinga & Teunis approach

Approach 1: Japanese influenza data
White & Pagano method
$R_C$ = 3.51, 3.49, 3.52 and 3.57
$R_A$ = 0.34, 0.58, 0.21 and 0.37, for each matrix, respectively

Heterogeneity
APPROACH 2: MODIFICATION OF WALLINGA AND TEUNIS

Approach 2: modification of Wallinga & Teunis
• Similar to Glass et al, allow the probability of infection to depend on more than just how far apart in time two cases are:
  $p_{ij} = \frac{g(t_i - t_j)\, d_{ij}}{\sum_{k \neq i} g(t_i - t_k)\, d_{ik}}$
  where $g(t_i - t_j)$ is the probability of a serial interval of length $t_i - t_j$ and $d_{ij}$ is a similarity measure (similar to the matrices used by Glass et al).
Source: White, Archer and Pagano (submitted, 2013)

Approach 2: modification of Wallinga & Teunis
• Similar to Glass et al, but we do not assume any structure on a similarity matrix, $D = (d_{ij})$.
• We use available data to define this matrix and are able to obtain estimates of $R_j$ for a large number of age groups (or spatial locations, etc.).

Similarity measures
• Individuals who are "close" together, and therefore more likely to infect each other, have larger similarity measures.
• Can be used to address the probability of infection between different geographical regions, age groups, etc.

Similarity measures
Similarity matrix
Use a matrix to define the similarity measure: $x_{ij}$ describes the amount of contact individuals in group $i$ have with those in group $j$.

             Age group 1   Age group 2   Age group 3
Age group 1  x11           x12           x13
Age group 2  x21           x22           x23
Age group 3  x31           x32           x33

Basic similarity measures
• Matrix of all 1's: the original estimator
  – Implies that transmission is equally likely among all individuals
• Diagonal matrix: transmission only occurs within homogeneous groups (no mixing)
  – Comparable to applying the original method to each homogeneous group separately
• Can also use a matrix that describes contact patterns

Example: Pandemic Influenza in South Africa
• Between 6/15/2009 and 11/23/2009 there were 12,630 confirmed cases
Source: Archer et al (2009)

Age Analysis
• We restrict our attention to Gauteng Province (the most populous) to limit geographic effects
• We use two sources of information on contact patterns between age groups:
  – PolyMod study (Mossong et al, 2008)
  – Study in a South African township (Johnstone-Robertson et al, 2011)
JSM 2012

PolyMod contact trace matrix
       0-4   5-9   10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+
0-4    1.92  0.65  0.41  0.24  0.46  0.73  0.67  0.83  0.24  0.22  0.36  0.20  0.20  0.26  0.13
5-9    0.95  6.64  1.09  0.73  0.61  0.75  0.95  1.39  0.90  0.16  0.30  0.22  0.50  0.48  0.20
10-14  0.48  1.31  6.85  1.52  0.27  0.31  0.48  0.76  1.00  0.69  0.32  0.44  0.27  0.41  0.33
15-19  0.33  0.34  1.03  6.71  1.58  0.73  0.42  0.56  0.85  1.16  0.70  0.30  0.20  0.48  0.63
20-24  0.45  0.30  0.22  0.93  2.59  1.49  0.75  0.63  0.77  0.87  0.88  0.61  0.53  0.37  0.33
25-29  0.79  0.66  0.44  0.74  1.29  1.83  0.97  0.71  0.74  0.85  0.88  0.87  0.67  0.74  0.33
30-34  0.97  1.07  0.62  0.50  0.88  1.19  1.67  0.89  1.02  0.91  0.92  0.61  0.76  0.63  0.27
35-39  1.02  0.98  1.26  1.09  0.76  0.95  1.53  1.50  1.32  1.09  0.83  0.69  1.02  0.96  0.20
40-44  0.55  1.00  1.14  0.94  0.73  0.88  0.82  1.23  1.35  1.27  0.89  0.67  0.94  0.81  0.80
45-49  0.29  0.54  0.57  0.77  0.97  0.93  0.57  0.80  1.32  1.87  0.61  0.80  0.61  0.59  0.57
50-54  0.33  0.38  0.40  0.41  0.44  0.85  0.60  0.61  0.71  0.95  0.74  1.06  0.59  0.56  0.57
55-59  0.31  0.21  0.25  0.33  0.39  0.53  0.68  0.53  0.55  0.51  0.82  1.17  0.85  0.85  0.33
60-64  0.26  0.25  0.19  0.24  0.19  0.34  0.40  0.39  0.47  0.55  0.41  0.78  0.65  0.85  0.57
65-69  0.09  0.11  0.12  0.20  0.19  0.22  0.13  0.30  0.23  0.13  0.21  0.28  0.36  0.70  0.60
70+    0.14  0.15  0.21  0.10  0.24  0.17  0.15  0.41  0.50  0.71  0.53  0.76  0.47  0.74  1.47
Great Britain, all contacts

South African township contact matrix
Source: Johnstone-Robertson et al, AJE, 2011

Estimate of Rt
[Figure: epidemic curve]

Estimate of Rt
[Figure: estimates by age group (0-4 through 45+), plotted with the number of cases, using (a) all contacts involving physical touch and (b) all close contacts]

Estimates of R0 by age group
Age group   N (%)          R0, close contacts   R0, all physical contacts
0-4         484 (8.73)     0.94 (0.91-0.97)     0.74 (0.72-0.76)
5-9         927 (16.72)    1.20 (1.17-1.24)     1.29 (1.25-1.33)
10-14       1150 (20.75)   1.53 (1.49-1.58)     1.47 (1.44-1.51)
15-19       1026 (18.52)   1.36 (1.32-1.40)     1.47 (1.42-1.50)
20-24       556 (10.03)    1.06 (1.03-1.09)     1.03 (1.01-1.06)
25-29       389 (7.02)     0.98 (0.94-1.01)     0.97 (0.94-1.01)
30-34       229 (4.13)     0.92 (0.88-0.94)     0.86 (0.82-0.88)
35-39       246 (4.44)     0.85 (0.82-0.88)     0.75 (0.82-0.78)
40-44       171 (3.09)     0.86 (0.83-0.90)     0.83 (0.80-0.87)
45+         363 (6.55)     0.79 (0.75-0.85)     0.75 (0.71-0.81)

R0 by age group, by country whose contact matrix is used
[Figure: (a) results for all close contacts; (b) results for contacts involving physical touch]

Estimates of R0 depending on the contact matrix used
Overall $\hat{R}_0$ assuming homogeneous mixing: 1.28 (1.26-1.31)

Contact matrix used   All close contacts   Contacts with physical touch
South Africa          1.27 (1.25-1.31)     1.27 (1.25-1.31)
Belgium               1.26 (1.24-1.31)     1.27 (1.24-1.31)
Finland               1.27 (1.25-1.32)     1.27 (1.25-1.32)
Great Britain         1.27 (1.25-1.32)     1.27 (1.25-1.31)
Germany               1.27 (1.25-1.32)     1.27 (1.25-1.32)
Italy                 1.27 (1.25-1.31)     1.27 (1.24-1.31)
Luxembourg            1.27 (1.25-1.32)     1.27 (1.25-1.32)
Netherlands           1.27 (1.25-1.32)     1.27 (1.25-1.32)
Poland                1.27 (1.25-1.31)     1.27 (1.24-1.31)

Heterogeneity
APPROACH 3: MODIFICATION OF WHITE AND PAGANO

Moser and White
• Modification of the White and Pagano method to estimate R0 and incorporate heterogeneity in the population
• Revise the likelihood to incorporate heterogeneity in the reproductive numbers
• Consider the scenario where we look at adults and kids only (the two-group scenario)
  – $R_A$ and $R_C$ are the reproductive numbers for adults and children, respectively

Moser and White
• Reparameterize the problem to allow for inclusion of contact matrix information
  – $q_{hg}$ is the probability that an individual of type $h$ has contact with an individual of type $g$
  – Example: $R_{CA} = q_{CA} R_C$
  – $R_C = R_{CA} + R_{CC}$

Derivation of Likelihood Function: Two-Group Example
Day 0: $N_{0C}$, $N_{0A}$
Day 1: $N_{1C} = X_{C0C1} + X_{A0C1}$
       $N_{1A} = X_{C0A1} + X_{A0A1}$
Day 2: $N_{2C} = X_{C0C2} + X_{A0C2} + X_{C1C2} + X_{A1C2}$
       $N_{2A} = X_{C0A2} + X_{A0A2} + X_{C1A2} + X_{A1A2}$
Day 3: $N_{3C} = X_{C0C3} + X_{A0C3} + X_{C1C3} + X_{A1C3} + X_{C2C3} + X_{A2C3}$
       $N_{3A} = X_{C0A3} + X_{A0A3} + X_{C1A3} + X_{A1A3} + X_{C2A3} + X_{A2A3}$
Day 4: $N_{4C} = X_{C1C4} + X_{A1C4} + X_{C2C4} + X_{A2C4} + X_{C3C4} + X_{A3C4}$
       $N_{4A} = X_{C1A4} + X_{A1A4} + X_{C2A4} + X_{A2A4} + X_{C3A4} + X_{A3A4}$
…
Day T: $N_{TC}$, $N_{TA}$
3-day serial interval
• Each day's cases can only have been infected by cases from the previous three days. For example, $X_{C0A2}$ = adults infected on day 2 by a child from day 0.
• Infections caused by children are governed by $R_C$, and infections caused by adults by $R_A$.

Is Mixing Assortative?
• Split the infections caused by a child from day 0: the children infected ($X_{C0C1}$, $X_{C0C2}$, $X_{C0C3}$) are governed by $R_{CC}$, and the adults infected ($X_{C0A1}$, $X_{C0A2}$, $X_{C0A3}$) by $R_{CA}$.

Updated Likelihood
• The likelihood is written in terms of $N_{tg}$, the number of cases on day $t$ from group $g$.
• How do we maximize this likelihood?

Estimation
• We could try a frequentist approach, but there are issues with identifiability.
  – We have four parameters to estimate and, similar to Glass et al, would need to impose constraints on the q's in order to get estimates.
• Alternative approach: MCMC with prior information
  – Use contact frequency matrices from survey data to inform the priors of the q's

Epidemic curves by age in South Africa

Results from South Africa pandemic
Age cutoff                    Age = 20                            Age = 18                            Age = 15
Reproductive number           RC                RA                RC                RA                RC                RA
N (%)                         1589 (66)         801 (34)          1340 (56)         1050 (44)         901 (38)          1489 (62)
Mossong, Prior 1              1.34 (1.11-1.58)  1.37 (1.16-1.60)  1.67 (1.47-1.88)  1.45 (1.21-1.71)  1.47 (1.07-1.90)  1.37 (1.11-1.64)
Mossong, Prior 2              1.51 (1.12-1.91)  1.45 (1.07-1.85)  0.94 (0.67-1.23)  1.41 (1.17-1.67)  1.51 (1.10-1.93)  1.34 (1.08-1.61)
Mossong, Prior 3              1.32 (1.12-1.53)  1.36 (1.15-1.57)  1.55 (1.35-1.76)  1.20 (0.98-1.43)  1.61 (1.22-2.00)  1.28 (1.04-1.54)
Johnstone-Robertson, Prior 1  1.33 (1.11-1.57)  1.37 (1.16-1.60)  1.65 (1.47-1.85)  1.46 (1.21-1.72)  1.44 (1.04-1.88)  1.38 (1.11-1.65)
Johnstone-Robertson, Prior 2  1.52 (1.13-1.92)  1.45 (1.08-1.83)  0.97 (0.71-1.25)  1.43 (1.18-1.69)  1.48 (1.09-1.90)  1.36 (1.10-1.62)
Johnstone-Robertson, Prior 3  1.31 (1.11-1.53)  1.35 (1.14-1.56)  1.54 (1.35-1.74)  1.20 (1.00-1.41)  1.58 (1.23-1.93)  1.30 (1.09-1.53)
Glass, Separable              1.78              0.90              1.65              1.29              1.06              1.75
Glass, HiC2C                  1.73              1.00              1.63              1.31              0.87              1.86
Glass, Contact Freq.          1.90              0.64              1.91              0.95              1.63              1.41
Glass, Proportional           1.89              0.67              1.97              0.87              1.92              1.24

Issues
• Reporting differences across age groups
  – How might this impact our results?
  – Example: kids are much more likely to show up at the clinic and have their cases reported; adults are more likely to stay home.
• Non-uniformity of contact patterns globally?
• Other issues?

Final thoughts
• Quantitative methods are essential to informing policy decisions in a disease outbreak
• Issues we want to address:
  – Severity
  – Transmissibility
  – Heterogeneity
  – Uncertainty
• Challenges with dependency in the data, unobserved events, etc.

Thanks!
• Funding source: National Institute of General Medical Sciences of the National Institutes of Health under Award Number U54GM088558.