Statistical Modeling of SARS Epidemic Propagation via Branching Processes

V. Kamalesh, V. Kuralmani, Goh Li Ping, Qian Long, Fu Xiuju, Terence Hung
Software & Computing Programme, Institute of High Performance Computing

“To succeed in containing SARS in Singapore, everyone must cooperate and play his part.” - Prime Minister Goh Chok Tong

History of Branching Process

The study of branching processes originated with a mathematical puzzle posed by Sir Francis Galton, the noted cousin of Charles Darwin, in the Educational Times of 1 April 1873. A branching process may be viewed as a mathematical representation of the evolution of a population in which reproduction and death are subject to the laws of chance.

Galton’s Puzzle

A large nation, of whom we will only concern ourselves with the adult males, N in number, and who each bear separate surnames, colonise a district. Their law of population is such that, in each generation, P0 per cent of the adult males have no male children who reach adult life; P1 have only one such male child; P2 have 2, and so on up to P5 who have 5. Find (1) what proportion of the surnames will have become extinct after r generations; and (2) how many instances there will be of the same surname being held by m persons.

A solution was proffered by the Rev. Henry William Watson, and from his 1874 joint paper with Galton the mathematical tool of branching emerged: the Galton-Watson process.

Examples of BP

• Propagation of human and animal species and genes
• Nuclear chain reaction
• Electronic cascade phenomena
• Epidemic models

Branching Process

[Figure: tree diagram of a branching process, generations X0 to X3]

Bienayme-Galton-Watson BP

The Bienayme-Galton-Watson BP can be thought of as a stochastic model of an evolving population of particles or individuals. It starts at time 0 with Z(0) particles, each of which splits into a random number of offspring that constitute the first generation, and so on.
The number of “offspring” produced by a single “parent” particle at any time is independent of the history of the process, and of other particles existing at the present.

The archetypal branching process (Galton-Watson):

• Discrete reproduction periods (‘generations’; no overlap or parents equivalent to offspring)
• 1 type of individual, with identical offspring distribution
• Individuals do not affect each other’s reproduction
• Distributions of offspring numbers do not change in time

BP as an epidemic Model

Branching processes can be adopted as models for the spread of epidemic diseases.

• Infections directly due to an infective are the offspring
• One can approximate the infective population during the early stages of the epidemic by a branching process
• Minor epidemic: extinction of the branching process
• Major epidemic: non-extinction of the branching process

Specification & standard details

A Galton-Watson process $\{x_n;\ n = 0, 1, 2, \dots\}$ is a Markov chain defined on a probability space $(\Omega, \Gamma, P)$ with state space $\Delta = \{0, 1, \dots\}$, and it has the representation

$$x_0 = N \quad \text{(some specified positive integer)},$$
$$x_1 = \xi_1 + \xi_2 + \dots + \xi_{x_0},$$
$$x_2 = \xi_{x_0+1} + \xi_{x_0+2} + \dots + \xi_{x_0+x_1},$$
$$\vdots$$
$$x_n = \xi_{x_0+x_1+\dots+x_{n-2}+1} + \dots + \xi_{x_0+x_1+\dots+x_{n-1}},$$

and $x_n = 0$ if $x_{n-1} = 0$, $n \ge 1$, where $\xi_i$, $i = 1, 2, \dots$ are independent and identically distributed (iid) non-negative integer-valued random variables on $(\Omega, \Gamma, P)$ whose common probability law is given by

$$P(\xi_i = k) = p_k, \quad k = 0, 1, \dots; \qquad \sum_k p_k = 1.$$

The Model

A Galton-Watson process is a Markov chain $\{X(n);\ n \ge 0\}$ on the non-negative integers, where for $n \ge 0$

$$X(n+1) = \begin{cases} \xi(n+1,1) + \dots + \xi(n+1,X(n)) & \text{if } X(n) > 0 \\ 0 & \text{if } X(n) = 0 \end{cases}$$

and $\{\xi(n,r);\ r, n \ge 1\}$ are independent random variables, identically distributed like $\xi$ (say) and with other additional assumptions.
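The recursion above, and the minor/major epidemic distinction, can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in this study: the function names and the two example offspring distributions are hypothetical. Besides simulating paths, it computes the probability of extinction by generation g by iterating the offspring probability generating function f(s) = Σ p_k s^k from s = 0, a standard identity for a process started from one individual that is not stated in the slides.

```python
import random


def simulate_generations(pmf, x0=1, max_gen=20, rng=None):
    """Return [X(0), ..., X(max_gen)] for one Galton-Watson path.

    pmf[k] is the probability that one individual has k offspring.
    """
    rng = rng or random.Random()
    ks = range(len(pmf))
    sizes = [x0]
    for _ in range(max_gen):
        n = sizes[-1]
        # X(n+1) is the sum of X(n) iid offspring counts; 0 is absorbing.
        nxt = sum(rng.choices(ks, weights=pmf, k=n)) if n > 0 else 0
        sizes.append(nxt)
    return sizes


def extinct_by_generation(pmf, max_gen):
    """Return [q_1, ..., q_max_gen], q_g = P(X(g) = 0 | X(0) = 1).

    Uses q_g = f(q_{g-1}) with q_0 = 0, where f is the offspring pgf.
    """
    q, out = 0.0, []
    for _ in range(max_gen):
        q = sum(p * q ** k for k, p in enumerate(pmf))
        out.append(q)
    return out


def monte_carlo_extinction(pmf, gen, runs=2000, seed=0):
    """Fraction of simulated paths (from X(0) = 1) that are extinct by `gen`."""
    rng = random.Random(seed)
    dead = sum(
        simulate_generations(pmf, 1, gen, rng)[-1] == 0 for _ in range(runs)
    )
    return dead / runs


if __name__ == "__main__":
    # Hypothetical offspring distributions, chosen only for illustration.
    subcritical = [0.5, 0.5]           # offspring mean 0.5
    supercritical = [0.25, 0.25, 0.5]  # offspring mean 1.25
    for name, pmf in [("sub", subcritical), ("super", supercritical)]:
        exact = extinct_by_generation(pmf, 15)[-1]
        approx = monte_carlo_extinction(pmf, 15)
        print(name, "exact:", exact, "simulated:", approx)
```

Under these assumptions the sub-critical example dies out with probability approaching 1 (a minor epidemic), while the super-critical example survives with positive probability (a possible major epidemic).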
Also $E(\xi_i) = m$.

Offspring mean (m)

• Since the offspring mean of a branching process indicates almost sure extinction or possible explosion of a population, there is considerable interest in knowing the value of this criticality parameter (growth rate parameter, basic reproductive rate)
• The offspring mean (m) is also known as the infection rate and its estimation is of great interest
• The problem of estimation of ‘m’ arises when we deal with the problem of determining vaccination policies aimed at preventing major epidemics

Estimation of offspring mean

A Galton-Watson BP is classified as:

• Sub-critical if m < 1 (always extinction, finite expected time to extinction)
• Critical if m = 1 (always extinction, infinite expected time to extinction)
• Super-critical if m > 1 (probability of extinction smaller than 1)

Offspring mean indicates the (almost) sure extinction or possible explosion of a population.

One of the basic problems of the statistics of a G-W process is to find a ‘good’ estimator for m.

Estimation methods: MLE, Least-squares, Ratio, Moment type, Bayes, etc.

Probability of extinction

A parameter of special interest is the following:

$$q = P\Big(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \{x_k = 0\}\Big) = P(x_n \to 0) = P(E) \ \text{(say)}$$

This is referred to as the probability of extinction of a G-W process with $x_0 = 1$.

It can be verified that: q = 1 if m ≤ 1, and q < 1 if m > 1.

Estimation of q is relevant when one is dealing with the recognition of a new mutation in a genetic population.

Immigration Process

Estimation of the offspring mean ‘m’ breaks down in the sub-critical case (when 0 < m < 1), in view of extinction being almost certain in such situations. The introduction of an immigration process into the system facilitates the estimation of the offspring and immigration mean under the sub-critical case.
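As a rough illustration of that last point, the sketch below simulates a sub-critical process with immigration and recovers both means by least squares, one of the estimation methods listed above. It is not the estimator used in this study. It assumes only that the expected size of the next generation, given the current size x, is m·x + λ, with λ denoting the immigration mean (notation assumed here), so that regressing each generation size on the previous one gives m as the slope and λ as the intercept. The function names and both distributions are hypothetical.

```python
import random


def simulate_with_immigration(offspring_pmf, immigration_pmf, steps, x0=0, seed=0):
    """Return [X(0), ..., X(steps)] for a G-W process with immigration."""
    rng = random.Random(seed)
    off_ks = range(len(offspring_pmf))
    imm_ks = range(len(immigration_pmf))
    xs = [x0]
    for _ in range(steps):
        prev = xs[-1]
        # Offspring of the previous generation (none if it was empty).
        children = (
            sum(rng.choices(off_ks, weights=offspring_pmf, k=prev)) if prev else 0
        )
        # Immigrants arrive every period, independently of the population.
        immigrants = rng.choices(imm_ks, weights=immigration_pmf)[0]
        xs.append(children + immigrants)
    return xs


def least_squares_means(xs):
    """Regress X(t) on X(t-1); slope estimates m, intercept estimates lambda."""
    prev, curr = xs[:-1], xs[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    m_hat = sxy / sxx
    lam_hat = mean_curr - m_hat * mean_prev
    return m_hat, lam_hat


if __name__ == "__main__":
    # Hypothetical distributions: offspring mean 0.6, immigration mean 1.5.
    offspring = [0.5, 0.4, 0.1]
    immigration = [0.2, 0.3, 0.3, 0.2]
    path = simulate_with_immigration(offspring, immigration, steps=5000)
    m_hat, lam_hat = least_squares_means(path)
    print("estimated offspring mean:", m_hat)
    print("estimated immigration mean:", lam_hat)
```

Because immigrants keep arriving, the simulated population never stays at zero, which is what leaves enough data to estimate m even though m < 1.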
The analysis of a G-W process with immigration has some interesting conclusions: for example, if the mean of the offspring distribution is > 1, immigration makes very little difference to the eventual behaviour of the process.

BP with immigration

The simple subcritical G-W process $X = \{X(t);\ t = 0, 1, 2, \dots\}$ with immigration has the specification that $X(0)$ is a non-negative integer-valued random variable, and for $t \ge 1$,

$$X(t) = \begin{cases} z(t,1) + \dots + z(t,X(t-1)) + Y(t) & \text{if } X(t-1) > 0 \\ Y(t) & \text{if } X(t-1) = 0 \end{cases}$$

and $\{z(t,r);\ r, t \ge 1\}$ are independent random variables, identically distributed like $z$ (say) and with other additional assumptions. $Y(t)$ is the immigration component.

Data Source

The data was taken from the following website: http://sarstracker.blogspot.com/ (source: Straits Times 12 April 2003). After careful study of the data, we transformed it into a format which could be used to fit the Galton-Watson branching process.

Singapore SARS Data

[Figure: SARS tree diagram tracing transmission from Super Spreader 1 (Esther Mok) through family members, friends, hospital and health care staff, patients and visitors, including Super Spreaders 4 and 5]

Methodology

• Study the links between the SARS affected patients and identify the generation they belong to. For example, z(0) is the initial number of patients, z(1) the next generation and so on.
• Hence z(0) is the parent and z(1) is the offspring for the first generation. Similarly z(1) is the parent and z(2) is the offspring for the second generation.
• The parents are the infectives and the offspring are the infections.

Methodology (Cont.)

• Calculate the following probabilities:
  – p(0) – probability of 0 persons infected
  – p(1) – probability of 1 person infected
  – p(2) – probability of 2 persons infected
  – p(3) – probability of 3 persons infected
  – p(4) – probability of 4 or more persons infected (super spreader)
• Determine the time period and fit the Galton-Watson branching process

Generation Size

Z is the generation size over 5 generations, Z(0) to Z(5), starting from Z(0) = 1. These have been colour-banded to show clearly the number of offspring at each point.
For example, Z(4) = 17.

The population size of each generation is:

Generation   Size   Females   Males
Z(0)            1         1       0
Z(1)           25        14      11
Z(2)           36        21      15
Z(3)           72        46      26
Z(4)           17        10       7
Z(5)            6         4       2
Total         157        96      61

61.2% of the SARS infected are females and 38.8% of them are males.

Super Infectors

Super Spreader   Sex      No. infected directly
1                Female   25 (14 female + 11 male)
2                Female   23 (13 female + 10 male)
3                Female   24 (18 female + 6 male)
4                Male     43 (25 female + 18 male)
5                Male     11 (4 female + 7 male)

Probability Calculation

• p(0) – probability of 0 persons infected = 0.8344
• p(1) – probability of 1 person infected = 0.0927
• p(2) – probability of 2 persons infected = 0.01986
• p(3) – probability of 3 persons infected = 0.01986
• p(4) – probability of 4 or more persons infected (super spreader) = 0.0331

The mean of the offspring distribution is 1.0331.

Software

To model the SARS epidemic we use a Java program which simulates a single-type BP and computes the extinction probabilities. In this program we specify the offspring distribution of the BP and "Maximum generations", the number of generations over which we wish to observe the BP. The program computes and displays the probabilities that the branching process will die out by generation g, for g = 1 to Maximum Generations.

Source: Written by Julian Devlin, 8/97, for the textbook “Introduction to Probability”, by Charles M. Grinstead & J.
Laurie Snell

Probability of extinction

We set the maximum generations to 30 and the results are:

Generation   Extinction Probability
 1           0.83400005
 2           0.9530404
 3           0.98533565
 4           0.99529344
 5           0.99847656
 6           0.9995056
 7           0.9998395
 8           0.99994797
 9           0.99998313
10           0.9999946
11           0.99999833
12           0.9999995
13           0.9999999
14           1.0
15           1.0
16           1.0

[Figure: Probability of Extinction of the SARS epidemic; x-axis: Generation, y-axis: Probability]

Some Conclusions

• The probability that the SARS epidemic will eventually become extinct is 1.
• This is likely to happen in the 14th generation, where the computed extinction probability first reaches 1.0.
• Since this data has already encountered 5 generations, there can be at most 9 more generations.
• Assuming each generation takes a maximum of 10 days, based on the given data the epidemic will last for at most 90 more days from 8 April 2003.
• This result is conditional upon the same environment and quarantine conditions.

Other related work @ IHPC

Auto-Regressive (AR) model

• Assumptions
  Every time series consists of both deterministic and stochastic components. The deterministic component gives rise to trends, seasonal patterns and cycles, while the stochastic component causes statistical fluctuations which have a short-term correlation structure.
• Methodology
  – Step 1: determine the maximum number of the sample data
  – Step 2: calculate the mean value of the sample data for previous time
  – Step 3: estimate the unknown parameters from historical data
  – Step 4: use the estimated parameters to predict future case numbers
• Software
  – In-house software written in FORTRAN has been developed.
It is compatible with Windows and UNIX systems.

Auto-Regressive (AR) model

Result: two-day prediction (use the previous data to predict the data of two days later)

[Figure: Two-day prediction; predicted vs. observed number of patients by day, starting from Mar 16]

Result: three-day prediction (use the previous data to predict the data of three days later)

[Figure: Three-day prediction; predicted vs. observed number of patients by day, starting from Mar 16]

Future Research …

A Time Series approach to the study of a Branching Process

Motivation: Venkataraman, K.N. (1982) A Time Series approach to the study of the simple subcritical Galton-Watson process with immigration, Adv. Appl. Prob., 14, 1-20.

Let $\varepsilon(t) = 0$ for $t < 0$; $\varepsilon(0) = X(0)$; and for $t \ge 1$,

$$\varepsilon(t) = X(t) - m\,X(t-1) - \lambda,$$

where m is the offspring mean and λ the immigration mean.

Heyde and Seneta (1972) were the first to observe that the above equation is analogous to the first-order autoregressive model for time series.

Vital difference: in a BP, ε(t) is determined by X(t), whereas in the analogous time series model X(t) is determined in terms of ε(t).

Thank you!