Online supplementary material

The Markov chain representation of the compartmental model implies a matrix T, called the transition matrix, whose entries are the rates of flow from one compartment to another. The interpretation is that flux occurs among the compartments of the model according to the various transition probabilities, which correspond to the rates of mortality, infection, reinfection and progression to disease. Assuming a constant population size, extended to constant compartment sizes, makes it possible to construct the Markov transition matrix. In this matrix, the element at the intersection of a row and a column is the probability of a transition from the source (row) compartment to the destination (column) compartment (Table 1). The transition period is chosen to be approximately equal to the delay to diagnosis and initiation of therapy; hence all members of P, the compartment of infectious cases, either die or start therapy during a transition. Thus the transition from P to Ls, the compartment of recovered (i.e. latent) cases, has probability 1 - µp, where µp denotes the mortality rate of infectious cases expressed as a probability per time step. The probability of the transition from the mortality compartment to the live births compartment is shown as 1.0. This is a technical detail that ensures recruitment to the live births compartment exactly equals the losses from all compartments due to mortality. For simplicity, patients who commence therapy are regarded as part of the latent compartment Ls. A separate compartment could have been allocated to this group, but it would merely hold these cases for one Markov time step. As far as the process is concerned it would be an idle compartment, and it is therefore not included.
Using standard matrix notation, if at some time the distribution of the population is given by the row vector $u_1$, then one time step later the distribution vector is $u_2 = u_1 T$. After many consecutive time steps a steady state $v$ may be reached, in which the relative proportions of the population in the various compartments no longer change. This state is given by $v = \lim_{n\to\infty} u_1 T^n$, and in general a matrix $T^\infty = \lim_{n\to\infty} T^n$, called the long-term transition matrix, can be calculated so that $v = u_1 T^\infty$. The long-term transition matrix can indeed be calculated for the compartmental model under consideration (because T is regular), and it has the property that $v = u T^\infty$, where $v$ is the unique steady-state distribution and $u$ is an arbitrary initial distribution. This $v$ corresponds exactly to the constant compartment sizes found in the long-term steady state and has the form

$v = (b, m, s, im, i, p, lp, ls, en, i^*, ex)$.

The elements of $v$ are the proportions of the population in each compartment and correspond, respectively, to live births, deaths, susceptibles, immune, first-time infected cases, infectious cases, first-time latent, second-time latent, endogenous cases, repeat infected cases and exogenous cases. Thus $v$ appropriately represents the epidemiological conditions prevailing in a region where TB has been endemic for an extended period, so that the compartment sizes have become approximately stable. We refer to such conditions as stationary epidemiological conditions. Note that stationary epidemiological conditions, that is, constant compartment sizes, are equivalent to zero rates of change of the compartment sizes. Therefore, instead of having to solve a system of differential equations to describe the epidemic, it is only necessary to solve a system of linear algebraic equations.
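The equivalence between the steady state and a linear system can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation: the helper names `steady_state` and `power_iterate` are hypothetical, and the 3 × 3 matrix `T_toy` is an arbitrary regular chain chosen only to show the technique. `steady_state` solves the balance equations v(T − I) = 0 with one (redundant) equation replaced by the normalization Σv = 1, and the answer is compared with repeated multiplication u ← uT.

```python
def steady_state(T):
    """Solve v (T - I) = 0 with sum(v) = 1, for a row-stochastic, regular T."""
    n = len(T)
    # Row j of A is the balance equation for state j: sum_i v_i * T[i][j] - v_j = 0
    A = [[T[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    b = [0.0] * n
    # For a regular chain one balance equation is redundant; use sum(v) = 1 instead.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for q in range(col, n):
                A[r][q] -= f * A[col][q]
            b[r] -= f * b[col]
    # Back substitution.
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][q] * v[q] for q in range(r + 1, n))
        v[r] = (b[r] - s) / A[r][r]
    return v


def power_iterate(u, T, steps):
    """Apply u <- u T repeatedly (row vector times matrix)."""
    n = len(T)
    for _ in range(steps):
        u = [sum(u[i] * T[i][j] for i in range(n)) for j in range(n)]
    return u


# Toy regular chain (each row sums to 1); this is not the TB model.
T_toy = [
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.5, 0.0, 0.5],
]
v = steady_state(T_toy)
w = power_iterate([1.0, 0.0, 0.0], T_toy, 500)
print([round(x, 6) for x in v])  # [0.588235, 0.294118, 0.117647]
print([round(x, 6) for x in w])
```

Both routes give the same vector; the linear solve does it in one pass, which is the practical advantage the text points to.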
The Markov process paradigm described above is simply another, more convenient, way of setting out this system of linear equations and obtaining its solution.

To calibrate the model for the best fit to the data, the following is taken into consideration. Three parameters are used in constructing the reinfection/incidence graphs: the reinfection factor (ρ), the rate of progression to disease (p) and the mortality rates (µ, with subscripts as appropriate). Figure 3 (text) shows that ρ is the parameter that determines the shape of the graph. The slope of the graph is governed by p (Figure 1), while its vertical position is fixed by the mortality rate (Figure 2). These three parameters therefore describe independent attributes of the graph, and the values that give the best fit to the empirical data are uniquely determined.

Table 1: The Markov transition matrix T (rows: source compartment; columns: destination compartment).

Source  B         M         S         Im        I         P         Lp        Ls        En        I*        Ex
B       0         0         1-n       n         0         0         0         0         0         0         0
M       1         0         0         0         0         0         0         0         0         0         0
S       0         µ         1-µ-λ     0         λ         0         0         0         0         0         0
Im      0         µ         0         1-µ       0         0         0         0         0         0         0
I       0         µ         0         0         1-p-k-µ   p         k         0         0         0         0
P       0         µp        0         0         0         1-µp-rp   0         rp        0         0         0
Lp      0         µ         0         0         0         0         1-µ-a     0         a         0         0
Ls      0         µ         0         0         0         0         0         1-µ-ρλ-c  c         ρλ        0
En      0         µp        0         0         0         0         0         re        1-µp-re   0         0
I*      0         µi        0         0         0         0         0         0         0         1-µi-gp   gp
Ex      0         µp        0         0         0         0         0         re*       0         0         1-µp-re*

Notes: p and λ denote the rate of progression to disease and the rate of infection, respectively. µ, with appropriate subscripts, denotes the various mortality rates; a sensitivity analysis shows that a single global value of µ can be used for all compartments without significant loss of accuracy. The reinfection factor is denoted by ρ, and g is a factor that modifies the rate of progression to disease after reinfection. Because the model is designed so that, within one time step, everyone in an infectious compartment either dies or recovers, the probability of the transition P → P, 1 - µp - rp, is 0. Hence rp = 1 - µp. By the same reasoning, re = 1 - µp and re* = 1 - µp.
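Table 1 can be transcribed into code to confirm that each row is a probability distribution over destination compartments. The following is a minimal Python sketch under stated assumptions: `build_T`, `STATES` and the keyword names are hypothetical, the default values are taken from Table 2 (k* is left out because it does not appear in Table 1), and rp, re and re* are all set to 1 − µp as the notes to Table 1 state. Rows are sources and columns are destinations, matching u2 = u1T.

```python
STATES = ["B", "M", "S", "Im", "I", "P", "Lp", "Ls", "En", "I*", "Ex"]


def build_T(n=0.025, mu=0.005, mu_p=0.05, mu_i=0.005, lam=0.0001,
            p=0.02, k=0.01, a=0.001, c=0.001, rho=4.0, g=1.0):
    """Row-stochastic T: T[i][j] is the probability of moving from STATES[i] to STATES[j]."""
    r = 1.0 - mu_p  # rp = re = re* = 1 - mu_p, so P, En and Ex have no self-loop
    flows = [
        ("B", "S", 1 - n), ("B", "Im", n),
        ("M", "B", 1.0),  # recruitment to births equals total mortality
        ("S", "M", mu), ("S", "S", 1 - mu - lam), ("S", "I", lam),
        ("Im", "M", mu), ("Im", "Im", 1 - mu),
        ("I", "M", mu), ("I", "I", 1 - p - k - mu), ("I", "P", p), ("I", "Lp", k),
        ("P", "M", mu_p), ("P", "Ls", r),
        ("Lp", "M", mu), ("Lp", "Lp", 1 - mu - a), ("Lp", "En", a),
        ("Ls", "M", mu), ("Ls", "Ls", 1 - mu - rho * lam - c),
        ("Ls", "En", c), ("Ls", "I*", rho * lam),
        ("En", "M", mu_p), ("En", "Ls", r),
        ("I*", "M", mu_i), ("I*", "I*", 1 - mu_i - g * p), ("I*", "Ex", g * p),
        ("Ex", "M", mu_p), ("Ex", "Ls", r),
    ]
    idx = {s: i for i, s in enumerate(STATES)}
    T = [[0.0] * len(STATES) for _ in STATES]
    for src, dst, prob in flows:
        T[idx[src]][idx[dst]] = prob
    return T


T = build_T()
for name, row in zip(STATES, T):
    # Each row must be a probability distribution over destinations.
    assert all(0.0 <= x <= 1.0 for x in row), name
    assert abs(sum(row) - 1.0) < 1e-12, name
print("P -> P:", T[STATES.index("P")][STATES.index("P")])  # P -> P: 0.0
```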
Table 2: Default parameter values

Parameter   Value
n           0.025
µ           0.005
µp          0.05
µi          0.005
λ           0.0001
p           0.02
k           0.01
k*          0.01
a, c        0.001
ρ           4
g           1

Note: The default values are estimates (Vynnycky & Fine 1997; Vynnycky & Fine 2000).

Method of implementing the model in Excel

For a given set of parameter values, the matrix T is set out in Excel. Using Excel's matrix multiplication feature, T is multiplied by itself to produce $T^2$. Subsequently, $T^{2n-1}$ for n = 3, 4, 5, … is obtained by reiteration using $T^{2n-1} = T^n T^{n-1}$, where the seed is $T^3 = T^2 T^1$. As n increases, a point is eventually reached where $T^n T^{n-1}$ no longer changes for further values of n; $T^\infty$ denotes this matrix. The rows of $T^\infty$ are all identical, and the steady-state distribution vector is equal to any one of these rows. For the example in Tables 3 and 4, this means that the rate of reinfection as a proportion of all infections is i*/(i + i*) = 0.00034/(0.00034 + 0.00378) = 8.3%. The incidence is (0.000037 + 0.000025 + 0.000003) × 100 000 = 6.5 per 100 000 (per annum).
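The Excel procedure can be mirrored in pure Python. This is a sketch, not the authors' workbook: `matmul` stands in for Excel's matrix multiplication, `build_T` repeats a hypothetical transcription of Table 1 with the Table 2 defaults, and the tolerance is an arbitrary choice. As one reading of the recurrence, each new matrix is formed as the product of the two most recent ones, T_{k+1} = T_k T_{k-1}, seeded with T and T²; plain successive powers of T converge to the same T∞. The resulting vector is not claimed to reproduce Table 3. The two small helpers at the end apply the formulas in the text to the figures quoted there.

```python
STATES = ["B", "M", "S", "Im", "I", "P", "Lp", "Ls", "En", "I*", "Ex"]


def build_T(n=0.025, mu=0.005, mu_p=0.05, mu_i=0.005, lam=0.0001,
            p=0.02, k=0.01, a=0.001, c=0.001, rho=4.0, g=1.0):
    """Row-stochastic T from Table 1 (hypothetical transcription, Table 2 defaults)."""
    r = 1.0 - mu_p  # rp = re = re* = 1 - mu_p
    flows = [
        ("B", "S", 1 - n), ("B", "Im", n), ("M", "B", 1.0),
        ("S", "M", mu), ("S", "S", 1 - mu - lam), ("S", "I", lam),
        ("Im", "M", mu), ("Im", "Im", 1 - mu),
        ("I", "M", mu), ("I", "I", 1 - p - k - mu), ("I", "P", p), ("I", "Lp", k),
        ("P", "M", mu_p), ("P", "Ls", r),
        ("Lp", "M", mu), ("Lp", "Lp", 1 - mu - a), ("Lp", "En", a),
        ("Ls", "M", mu), ("Ls", "Ls", 1 - mu - rho * lam - c),
        ("Ls", "En", c), ("Ls", "I*", rho * lam),
        ("En", "M", mu_p), ("En", "Ls", r),
        ("I*", "M", mu_i), ("I*", "I*", 1 - mu_i - g * p), ("I*", "Ex", g * p),
        ("Ex", "M", mu_p), ("Ex", "Ls", r),
    ]
    idx = {s: i for i, s in enumerate(STATES)}
    T = [[0.0] * len(STATES) for _ in STATES]
    for src, dst, prob in flows:
        T[idx[src]][idx[dst]] = prob
    return T


def matmul(X, Y):
    """Plain matrix product, standing in for Excel's matrix multiplication."""
    size = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def long_term_matrix(T, tol=1e-10, max_iter=100):
    """Multiply the two most recent matrices (T_{k+1} = T_k T_{k-1}) until they stop changing."""
    prev, cur = T, matmul(T, T)  # seeds: T and T^2
    for _ in range(max_iter):
        nxt = matmul(cur, prev)
        change = max(abs(x - y) for rn, rc in zip(nxt, cur) for x, y in zip(rn, rc))
        prev, cur = cur, nxt
        if change < tol:
            return cur
    raise RuntimeError("T_k did not settle; increase max_iter or tol")


def reinfection_share_of_infections(i, i_star):
    """i* / (i + i*): reinfections as a proportion of all infections."""
    return i_star / (i + i_star)


def incidence_per_100k(p, en, ex):
    """Disease compartments P + En + Ex, expressed per 100 000."""
    return (p + en + ex) * 100_000


T_inf = long_term_matrix(build_T())
v = dict(zip(STATES, T_inf[0]))  # every row of T_inf is the steady-state vector
print({s: round(x, 6) for s, x in v.items()})

# The worked figures quoted in the text:
print(f"{reinfection_share_of_infections(0.00378, 0.00034):.1%}")  # 8.3%
print(round(incidence_per_100k(0.000037, 0.000025, 0.000003), 1))  # 6.5
```

The tolerance is deliberately loose (1e-10): each product compounds the tiny floating-point drift in the row sums, so demanding agreement to machine precision can stop the iteration from ever settling.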
Table 3: An example of a typical T∞

      B         M         S         Im        I         P         Lp        Ls        En        I*        Ex
B     0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
M     0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
S     0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
Im    0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
I     0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
P     0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
Lp    0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
Ls    0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
En    0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
I*    0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006
Ex    0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006

Note: The values in each row show the proportion of the population in each compartment.

Table 4: The steady-state distribution vector corresponding to T∞

B         M         S         Im        I         P         Lp        Ls        En        I*        Ex
0.004954  0.004954  0.947111  0.024771  0.002706  0.000054  0.001082  0.014059  0.000022  0.000281  0.000006

Note: The values show the proportion of the population in each compartment.

The proportion of disease due to reinfection is Ex/(Ex + En + P) = 0.000006/(0.000006 + 0.000022 + 0.000054) = 7.3%, which closely matches the proportion observed in the Netherlands (de Boer et al. 2003) and corresponds to an annual risk of infection (ARI) of 0.0001. The simulations give robust results that are relatively insensitive to variations in the model parameters (Table 5).
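The 7.3% figure follows directly from the Table 4 vector. The short check below, using a hypothetical helper name, also confirms that the tabulated proportions sum to 1 and that births equal deaths, as the M → B transition with probability 1.0 requires.

```python
# Steady-state proportions copied from Table 4.
v = {
    "B": 0.004954, "M": 0.004954, "S": 0.947111, "Im": 0.024771,
    "I": 0.002706, "P": 0.000054, "Lp": 0.001082, "Ls": 0.014059,
    "En": 0.000022, "I*": 0.000281, "Ex": 0.000006,
}


def reinfection_share_of_disease(v):
    """Ex / (Ex + En + P): share of disease due to exogenous reinfection."""
    return v["Ex"] / (v["Ex"] + v["En"] + v["P"])


print(f"{reinfection_share_of_disease(v):.1%}")  # 7.3%
print(round(sum(v.values()), 6))                  # 1.0
print(v["B"] == v["M"])                            # True
```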
In particular, simulations were performed with a range of values (0.005 to 0.05) for the mortality rate of infectious cases together with a fixed mortality rate (0.005) for non-infectious cases. These simulations all produced the same steady state, which shows that using a single global mortality rate in place of separate mortality rates for the various compartments is an acceptable simplification.

Table 5: Sensitivity of the slope of the regression line of the proportion of reinfected cases on log(incidence) with respect to the model parameters.

Parameter   Standard value   Modified value   Change in slope (%)
µ           0.005            0.0051           -2.2
µi          0.005            0.0051           -1.26
µp          0.05             0.051            -0.004
p           0.05             0.051            0.65
λ           0.0001           0.000102         0.05
k           0.01             0.0102           1.03
a, c        0.001            0.00102          -1.1

Notes: µ, µi and µp are, respectively, the mortality rate for members of the susceptible and latent compartments, the mortality rate specifically for the compartments of infected and reinfected cases, and the mortality rate for infectious cases. A 2% increase in each parameter in turn changes the slope by at most 2.2%, which is negligible compared with the uncertainty in the data.

The infection, reinfection and relapse rates

Figure 1 (text) shows the following transitions, together with the corresponding infection or reinfection rates: S → I (λ); Ls → I* (ρλ); and Lp → I (b; this rate is not shown because it is incorporated into the counter-flow k). Similarly, there are transitions due to relapse: Ls → En (c) and Lp → En (a). The model is investigated by varying the input parameter λ, and it is fitted to the data through a suitable choice of ρ. The remaining infection and relapse rates, a and c, can be chosen independently; their impact is best investigated by a sensitivity analysis on a and c (Table 5).
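A sensitivity analysis in the spirit of Table 5 can be sketched as follows. Every specific here is an assumption rather than the authors' procedure: the λ grid (10^-5 to 10^-2), the use of Ex/(Ex + En + P) as the reinfection proportion, log10 of (P + En + Ex) × 100 000 as the incidence, an ordinary least-squares slope, and the helper names `build_T`, `steady_state`, `slope` and `sensitivity`, all of which are hypothetical. λ itself is the swept variable, so it is not perturbed here, and the printed numbers are not expected to reproduce Table 5.

```python
import math

STATES = ["B", "M", "S", "Im", "I", "P", "Lp", "Ls", "En", "I*", "Ex"]
# Table 2 defaults except lambda, which is swept (k* does not appear in Table 1).
DEFAULTS = dict(n=0.025, mu=0.005, mu_p=0.05, mu_i=0.005,
                p=0.02, k=0.01, a=0.001, c=0.001, rho=4.0, g=1.0)
# Assumed lambda grid: 1e-5, 10**-4.5, ..., 1e-2.
LAMS = [10 ** (-5 + 0.5 * j) for j in range(7)]


def build_T(lam, n, mu, mu_p, mu_i, p, k, a, c, rho, g):
    """Row-stochastic T from Table 1 (hypothetical transcription)."""
    r = 1.0 - mu_p  # rp = re = re* = 1 - mu_p
    flows = [
        ("B", "S", 1 - n), ("B", "Im", n), ("M", "B", 1.0),
        ("S", "M", mu), ("S", "S", 1 - mu - lam), ("S", "I", lam),
        ("Im", "M", mu), ("Im", "Im", 1 - mu),
        ("I", "M", mu), ("I", "I", 1 - p - k - mu), ("I", "P", p), ("I", "Lp", k),
        ("P", "M", mu_p), ("P", "Ls", r),
        ("Lp", "M", mu), ("Lp", "Lp", 1 - mu - a), ("Lp", "En", a),
        ("Ls", "M", mu), ("Ls", "Ls", 1 - mu - rho * lam - c),
        ("Ls", "En", c), ("Ls", "I*", rho * lam),
        ("En", "M", mu_p), ("En", "Ls", r),
        ("I*", "M", mu_i), ("I*", "I*", 1 - mu_i - g * p), ("I*", "Ex", g * p),
        ("Ex", "M", mu_p), ("Ex", "Ls", r),
    ]
    idx = {s: i for i, s in enumerate(STATES)}
    T = [[0.0] * len(STATES) for _ in STATES]
    for src, dst, prob in flows:
        T[idx[src]][idx[dst]] = prob
    return T


def steady_state(T):
    """Solve v (T - I) = 0 with sum(v) = 1 by Gaussian elimination."""
    size = len(T)
    A = [[T[i][j] - (1.0 if i == j else 0.0) for i in range(size)] for j in range(size)]
    rhs = [0.0] * size
    A[-1], rhs[-1] = [1.0] * size, 1.0  # replace a redundant balance equation
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, size):
            f = A[r][col] / A[col][col]
            for q in range(col, size):
                A[r][q] -= f * A[col][q]
            rhs[r] -= f * rhs[col]
    v = [0.0] * size
    for r in range(size - 1, -1, -1):
        v[r] = (rhs[r] - sum(A[r][q] * v[q] for q in range(r + 1, size))) / A[r][r]
    return v


def slope(params, lams):
    """OLS slope of reinfection share (%) against log10(incidence per 100 000)."""
    xs, ys = [], []
    for lam in lams:
        v = dict(zip(STATES, steady_state(build_T(lam, **params))))
        disease = v["P"] + v["En"] + v["Ex"]
        xs.append(math.log10(disease * 100_000))
        ys.append(100.0 * v["Ex"] / disease)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def sensitivity(names, rel=0.02):
    """Percent change in slope when the named parameters are raised by rel."""
    base = slope(DEFAULTS, LAMS)
    bumped = dict(DEFAULTS)
    for name in names:
        bumped[name] *= 1 + rel
    return 100.0 * (slope(bumped, LAMS) - base) / base


for names in [("mu",), ("mu_i",), ("mu_p",), ("p",), ("k",), ("a", "c")]:
    print(",".join(names), round(sensitivity(names), 3))
```

Perturbing a and c together mirrors the single "a, c" row of Table 5.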
[Figure 1, panel "Reinfection and p-values": reinfection cases as a fraction of all cases (%) against incidence (per annum per 100 000, logarithmic scale from 1 to 10 000), with curves for p = 0.01, 0.02 and 0.04.]
Figure 1: The reinfection rate as a function of incidence for three different values of p and reinfection factor ρ = 7.

[Figure 2, panel "Reinfection rates and mortality rates": reinfection cases as a fraction of all cases (%) against incidence (per annum per 100 000, logarithmic scale from 1 to 10 000), with curves for m = 0.005, 0.01, 0.015 and 0.02.]
Figure 2: The reinfection rate as a function of incidence for four different mortality rates and reinfection factor ρ = 7.

References

de Boer, A. S., Borgdorff, M. W., Vynnycky, E., Sebek, M. M., & van Soolingen, D. 2003, "Exogenous re-infection as a cause of recurrent tuberculosis in a low-incidence area", Int. J. Tuberc. Lung Dis., vol. 7, no. 2, pp. 145-152.

Vynnycky, E. & Fine, P. E. 1997, "The natural history of tuberculosis: the implications of age-dependent risks of disease and the role of reinfection", Epidemiol. Infect., vol. 119, no. 2, pp. 183-201.

Vynnycky, E. & Fine, P. E. 2000, "Lifetime risks, incubation period, and serial interval of tuberculosis", Am. J. Epidemiol., vol. 152, no. 3, pp. 247-263.