Bayesian uncertainty quantification for transmissibility of influenza and norovirus using information geometry

Thomas House1,2, Ashley Ford2, Shiwei Lan3, Samuel Bilson2, Elizabeth Buckingham-Jefferey4,2, and Mark Girolami3

1 School of Mathematics, University of Manchester, Oxford Road, Manchester, M13 9PL, UK.
2 Warwick Infectious Disease Epidemiology Research centre (WIDER), Warwick Mathematics Institute, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.
3 Department of Statistics, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.
4 Complexity Science Doctoral Training Centre, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.

October 8, 2015

Abstract

Infectious diseases exert a large and growing burden on human health, but violate most of the assumptions of classical epidemiological statistics and hence require a mathematically sophisticated approach. In particular, the data produced by human challenge experiments, where volunteers are infected with a disease under controlled conditions and their level of infectiousness over time is estimated, have traditionally been difficult to analyse due to strong, complex correlations between parameters. Here, we show how a Bayesian approach to the inverse problem together with modern Markov chain Monte Carlo algorithms based on information geometry can overcome these difficulties and yield insights into the disease dynamics of two of the most prevalent human pathogens: influenza and norovirus.

1 Introduction

Infectious diseases continue to pose a major threat to human health, with transmission dynamic models playing a key role in developing scientific understanding of their spread and informing public health policy [19]. These models typically require many parameters to make accurate predictions, and these parameters interact with data in complex, non-linear ways. It is seldom possible to perform a series of individual controlled experiments to calibrate these models, meaning that the use of multiple datasets is often necessary [2].

At the population level, infectious disease models most frequently involve a ‘compartmental’ approach in which individuals progress between different discrete disease states, usually at constant rates [22]. We note that alternatives to such a compartmental approach exist, for example the use of a deterministic time-varying infectivity, or allowing for non-Markovian state transitions in a stochastic context [10]. There are theoretical differences between these and compartmental models for predictive epidemic modelling, but since in practice each alternative can be approximated by a sufficiently complex compartmental model (see e.g. [29] for additional discussion of these questions) we do not consider them further here.

At the individual level, there are three separate events that can be represented using different compartments: (i) an individual receiving an infectious dose of a pathogen; (ii) the individual becoming infectious and able to infect new cases; (iii) the individual showing symptoms. The times elapsed between these events are important for understanding the epidemiology of infectious diseases as well as for designing control and mitigation strategies [14, 26]. Here, we consider how the rates of moving between compartments in standard epidemic models can be estimated from ‘shedding’ data generated when individuals are given an infectious dose of a virus under controlled conditions, and the level of live virus that they ‘shed’ is measured over time.
The nature of these models and data means that there are not simple optimal values for the rate parameters; instead, the data essentially constrain these parameters to lie close to a non-linear curve in parameter space. We show that modern, computationally intensive Bayesian methods that make use of information geometry can be used to calculate the posterior density for models of influenza and norovirus. We then use forward modelling based on this posterior knowledge to show that some epidemiological conclusions are robust under the remaining uncertainty, while others require additional information to determine. In particular, for influenza, we show that the predicted effectiveness of quarantine-type interventions is unaffected by the remaining uncertainty, but antiviral pharmaceutical interventions have a bimodal uncertainty structure. For norovirus, we show that the frequency of epidemics is predictable under the remaining uncertainty, but the severity and timing of each seasonal epidemic is not.

2 Methods

2.1 Compartmental models of infectious disease

In a compartmental approach, we model the state of an individual who has been infected with a pathogen at time t = 0 as an integer random variable X(t). We label the possible states after infection i, j, . . . ∈ {1, . . . , m}; while more general structures are needed for other pathogens, for both influenza and norovirus we consider the ‘linear chain’ case where an individual spends an exponentially distributed period of time with mean 1/γi in each state i < m before progressing to state i + 1, and where m is the ‘recovered’ state, corresponding to the end of the infection, from which the individual does not move. In the Supplementary Material we provide a general solution for the quantity

    πi(t; γ) := Pr[X(t) = i | X(0) = 1; γ]        (1)

and its derivatives with respect to the rates {γi}.

Furthermore, we assume that an individual has ‘infection rate’ λi in state i (with λm = 0), which is proportional to the rate at which they would infect new cases in a population-level epidemic model. A key quantity is the expected infectiousness of an individual over time,

    Λ(t; θ) := Σ_{i=1}^{m} λi πi(t; γ) ,        (2)

where the model parameters are θ = (γ, λ) = (θa). Another key quantity in any epidemiological model of infections is the basic reproductive ratio, R0, defined verbally as the expected number of secondary infections produced by a typical infectious individual early in the epidemic [10]. Under the simplifying but frequently made assumption of a homogeneous population, this quantity is given by

    R0(θ) = ∫_{t=0}^{∞} Λ(t; θ) dt .        (3)

Note that the constants of proportionality λ depend on the nature and strength of interactions in the population and therefore cannot be determined from measurements of individuals alone.

2.1.1 Influenza

Influenza is most commonly modelled using the ‘SEEIIR’ or ‘SE2I2R’ framework, for example in the work that was used to inform vaccination policy during the recent H1N1 pandemic [5]. Here m = 5 and individuals spend a 2-Erlang distributed period of time with mean 1/ω in an uninfectious ‘latent’ class, and then a 2-Erlang distributed period of time with mean 1/γ in the infectious state, before recovering. To show the possible different impacts of uncertainty in parameter values, we will consider two models for delayed intervention (implemented after a time period of length d after infection) during an influenza pandemic.
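Before turning to these intervention models, the following short sketch (our own illustration rather than the code used for the analysis; all parameter values are placeholders) shows how π(t), Λ(t; θ) and R0 can be evaluated numerically for the SE2I2R linear chain, using a matrix exponential in place of the analytic solution given in the Supplementary Material.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

def linear_chain_generator(gammas):
    """Generator matrix M of the linear chain; gammas are the exit rates
    gamma_1..gamma_{m-1}, and the final state m is absorbing."""
    m = len(gammas) + 1
    M = np.zeros((m, m))
    for i, g in enumerate(gammas):
        M[i, i] = -g
        M[i + 1, i] = g
    return M

def pi(t, gammas):
    """pi_i(t) = Pr[X(t) = i | X(0) = 1], via the matrix exponential."""
    M = linear_chain_generator(gammas)
    p0 = np.zeros(M.shape[0])
    p0[0] = 1.0
    return expm(M * t) @ p0

def Lambda(t, gammas, lambdas):
    """Expected infectiousness, equation (2)."""
    return np.dot(lambdas, pi(t, gammas))

# SE2I2R example with placeholder values: two latent sub-stages of rate
# 2*omega, two infectious sub-stages of rate 2*gamma, and only the two
# infectious sub-stages (states 3 and 4) transmitting at rate tau.
omega, gamma, tau = 1.0, 0.5, 1.0
gammas = [2 * omega, 2 * omega, 2 * gamma, 2 * gamma]
lambdas = [0.0, 0.0, tau, tau, 0.0]

R0, _ = quad(Lambda, 0, 100, args=(gammas, lambdas))   # equation (3), truncated at t = 100
print(R0)   # approximately tau / gamma = 2.0 in this parametrisation
```

In this parametrisation the two latent sub-stages each have rate 2ω and the two infectious sub-stages each have rate 2γ, so the integral in (3) evaluates to τ/γ.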
The first of these assumes that the intervention is ‘quarantine-like’ and completely halts transmission after administration, leading to an epidemic with reproduction number

    R_d^(1) = ∫_{t=0}^{d} Λ(t; θ) dt .        (4)

The second, however, assumes that the intervention is only effective if the individual has not yet progressed from the ‘latent’ to the ‘infectious’ class, and is therefore similar to some models of administration of antiviral medication like Tamiflu [15, e.g.], leading to an epidemic with reproduction number

    R_d^(2) = (1 − Pr[X(d) < 3]) R0 .        (5)

We note from standard results in mathematical epidemiology [10] that, if the infection rate is β and the mean infectious period is 1/γ, then the basic reproductive ratio is R0 = β/γ, and that for an epidemic with reproduction number R the epidemic final size Z is given by the solution to the transcendental equation

    Z = 1 − e^{−RZ} ,        (6)

which we solve numerically for the two different reproduction numbers R_d^(1) and R_d^(2) above, a range of delays, and fitted values of the parameters (ω, γ) at a fixed value of R0 = 1.4 (as in [5]) to make a fair comparison.

2.1.2 Norovirus

Norovirus is usually assumed to follow the ‘SEIRS’ framework [21], where after infection an individual spends an exponentially distributed period of time with mean 1/ω in a ‘latent’ class, then an exponentially distributed period of time with mean 1/γ in the infectious class. In contrast to influenza, individuals move from the ‘recovered’ class back to the ‘susceptible’ class with rate µ; this loss of immunity is a relatively slow process that does not impact on the analysis of shedding data. It does, however, influence the population-level disease dynamics of norovirus, which can be described using a set of ordinary differential equations, where S(t) stands for the expected number of people who are susceptible and similarly for the other compartments:

    dS/dt = δN + µR − β(t)SI − δS ,
    dE/dt = β(t)SI − ωE − δE ,
    dI/dt = ωE − γI − δI ,
    dR/dt = γI − µR − δR .        (7)

We have a time-varying infection rate since this is necessary to reproduce the regular seasonality seen in real data [21], which we assume takes the sinusoidal form β(t) = β0(1 + A sin(αt)). Since this variability is typically believed to arise from school terms, we take A = 1/3 to be close to existing empirical estimates of the impact of school closures on disease spreading [20, 12], and α can be set to 2π years⁻¹. The demographic rate δ is set to the standard value of 1/70 years⁻¹. From the results of [4], we have that R0 = β0/γ; we vary this and the rate of waning immunity µ within ranges suggested by [21], and then run the model (7) to determine its long-term behaviour for different fitted values of the parameters (ω, γ).

2.2 Shedding model and data

Our aim is to extract parameter estimates for compartmental epidemic models from ‘challenge’ studies in which human volunteers are infected with a pathogen, and the level of live virus they produce (or ‘shed’) is measured over time. This level of shedding is quantified as a ‘titre’, which is essentially an estimate of the concentration of live virus. The relationship between this quantity and transmissibility is complex, but is generally agreed to be sub-linear [31, 9]. We assume here that at each time point t the log titre yt is given by a quantity proportional to the expected infectiousness Λ(t), plus additive Gaussian noise representing experimental error and other factors such as individual-level variability. The details are, however, different for the two datasets we consider and so we define our models separately below.
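As an illustration of this generic observation model (a sketch with a made-up infectiousness curve and placeholder values, not the study data or the code used in our analysis), synthetic log-titres can be generated and their Gaussian log-likelihood evaluated as follows.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(y, times, sigma, Lambda, scale):
    """Gaussian log-likelihood of observed log-titres y at the given times,
    with mean scale * Lambda(t) and noise standard deviation sigma, matching
    the observation model described above (all names are illustrative)."""
    mu = scale * np.array([Lambda(t) for t in times])
    return -0.5 * np.sum(((y - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2))

# toy expected-infectiousness curve standing in for Lambda(t; theta)
Lambda = lambda t: t * np.exp(-t)            # placeholder shape, not a fitted model
times = np.arange(1.0, 9.0)                  # daily observations
truth = 3.0 * np.array([Lambda(t) for t in times])
y = truth + rng.normal(scale=0.3, size=times.size)   # synthetic log-titres

print(log_likelihood(y, times, sigma=0.3, Lambda=Lambda, scale=3.0))
```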
2.2.1 Influenza

For influenza, we use the meta-analysis data of [8] for Influenza A H1N1, which are shown in Figure 1(i). Here, observations are made at regularly-spaced times belonging to a set T. Under our general modelling assumptions, the likelihood of observing a set of mean log-titres (yt) amongst participants, with associated standard deviations (σt), given the model parameters, is

    L(y, σ | θ) = ∏_{t∈T} N(yt ; (τ/β) Λ(t|θ), σt) ,        (8)

where N is the normal distribution probability density function. Here, we take the variability at each time point as measured, as given in [8]. Since the infection rate β depends inter alia on the rate of contact between individuals, which cannot be estimated from shedding data, we rescale the infectiousness Λ, meaning that our parameters for estimation are θ = (τ, ω, γ).

2.2.2 Norovirus

For norovirus, we use data from the study of [3], where observations are made irregularly at times belonging to a set T. These data are shown in Figure 2(i), and since they do not aggregate in the same way as the influenza data the likelihood function is

    L(y | θ) = ∏_{t∈T} N(yt ; (τ/β) Λ(t|θ), σ) ,        (9)

so that σ is here an additional parameter that must be estimated, and we have rescaled the infectiousness as before. This makes the norovirus parameter space four-dimensional, in contrast to three dimensions for influenza, with θ = (τ, ω, γ, σ).

2.3 Statistical framework

2.3.1 The Bayesian approach to identifiability

It is a long-established result that fitting a sum of exponentials to data is complicated; in particular, Acton [1] considers fitting the model y = Ae^{−at} + Be^{−bt} to (y, t) pairs and notes that “there are many combinations of (a, b, A, B) that will fit most exact data quite well indeed [ . . . ] and when experimental noise is thrown into the pot, the entire operation becomes hopeless.” Our compartmental models are more complex than sums of exponentials, but exhibit the same lack of a clear mode in the likelihood function. While there are various methods to address this issue in other applications (e.g. [30]), another response is (informally speaking) to consider all parameter combinations that fit well, and to investigate the epidemiological consequences of this uncertainty in parameter values. More formally, we work in a Bayesian framework, meaning that we attempt to calculate the posterior density p over parameters from the likelihood function L and the prior function f using Bayes’ rule,

    p(θ|y) = L(y|θ) f(θ) / ∫ L(y|ϑ) f(ϑ) dϑ .        (10)

Given fixed data y, the measure p(θ|y) dθ places more mass on more credible regions of parameter space; it can be multimodal and/or have many combinations of parameters with the same level of posterior support.

2.3.2 Markov chain Monte Carlo

Typically the integral in the denominator of (10) is not tractable, so we adopt the popular methodology of defining a Markov chain on parameter space whose stationary distribution has probability density function p, i.e. Markov chain Monte Carlo (MCMC) [16], in particular the Metropolis-Hastings algorithm [27, 18]. This is a discrete-time Markov chain in which a change of state from θn to θ∗ is proposed with probability q(θ∗|θn) and the proposal is accepted with probability

    α = 1 ∧ [ p(θ∗) q(θn|θ∗) ] / [ p(θn) q(θ∗|θn) ] .        (11)

The simplest approaches to MCMC are based on random walks in parameter space, for example letting q be a multivariate normal with constant covariance matrix and mean θn.
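A minimal sketch of such a random-walk Metropolis-Hastings update is given below (our own illustration, with a toy strongly correlated target density standing in for the shedding posterior; all names and values are placeholders).

```python
import numpy as np

rng = np.random.default_rng(2)

def random_walk_metropolis(log_post, theta0, n_steps, step_size):
    """Random-walk Metropolis sampler: the Gaussian proposal is symmetric,
    so the acceptance probability (11) reduces to a ratio of posterior
    densities. All function and variable names here are illustrative."""
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + step_size * rng.normal(size=theta.size)
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with probability 1 ∧ p(θ*)/p(θn)
            theta, lp = proposal, lp_prop
        samples.append(theta.copy())
    return np.array(samples)

# toy curved 'boomerang'-like target, standing in for the shedding posterior
log_post = lambda th: -0.5 * (th[0] ** 2 + 50.0 * (th[1] - th[0] ** 2) ** 2)
chain = random_walk_metropolis(log_post, theta0=[0.0, 0.0], n_steps=5000, step_size=0.1)
print(chain.mean(axis=0))
```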
Such an approach is particularly unsuitable for sampling from the posterior densities that arise from our shedding model, and as such a more sophisticated statistical approach is required [6]; we outline the alternatives below.

2.3.3 Derivative-based MCMC

In the field of numerical optimisation, methods such as gradient descent that make use of first-order derivatives of the function to be optimised are popular. Significant care must be taken when extending these to stochastic algorithms such as MCMC, but there are two popular families of methods that make use of first-order derivatives of L := ln(Lf) (since we use uninformative priors throughout, we will tend to call this the log-likelihood despite its inclusion of the prior). We use the following notation:

    ∂a L := ∂L/∂θa ;   ∂L = (∂a L) .        (12)

There are then two main families of derivative-based MCMC algorithms that we consider.

(i) MALA: The first algorithm family starts with the Langevin equation

    dθ = (1/2) ∂L dt + dW .        (13)

From [32], we know that this stochastic differential equation has stationary distribution p under relatively weak assumptions. The Metropolis-adjusted Langevin algorithm (MALA) uses the Euler approximation to (13) as a proposal, which we detail further below.

(ii) HMC: The second family of algorithms starts from a system of ordinary differential equations

    dθ/dt = v ,   dv/dt = ∂L .        (14)

The randomness in the proposals is generated by picking the starting value of v, by default as a vector of standard normal random variables. MCMC algorithms based on Hamilton’s equations (14) are called Hybrid [11] or Hamiltonian [28] Monte Carlo (HMC).

2.3.4 Geometric concepts in MCMC

While both MALA and HMC remove some of the inefficiencies of random walks through use of local gradients, they are not particularly efficient for the curved ‘boomerang’-shaped posteriors that we see for the shedding models and data defined above. To address this issue, recent work has made progress through use of concepts from differential geometry [17]. This allows distances between parameter values to be modified by use of a metric matrix G; explicitly, the infinitesimal distance between θ and θ + dθ is

    ds(θ, dθ) = ( Σ_{a,b} Gab dθa dθb )^{1/2} .        (15)

The matrix G can be any Riemannian metric matrix, and conceptually speaking this means that even the most complex posterior can be efficiently sampled with a choice of metric that brings all high-density regions of parameter space sufficiently close to each other. In practice, it is not simple to optimise the metric for a particular model, but a generally well-motivated choice is the Fisher-Rao metric

    Gab = ∫ ∂a L ∂b L p(y|θ) dy ,        (16)

as suggested by [17]. Calculations of this quantity for the models under consideration are given in the Supplementary Material. We used two different geometric algorithms depending on the features of the posterior.

(i) SMMALA: The simplified manifold Metropolis-adjusted Langevin algorithm (SMMALA) was introduced in [17] and shown to be competitive in terms of computational effort in several applied contexts by [7, 23]. In this approach the proposal distribution is

    q(θ∗|θn) = N(θ∗ ; θn + (ϵ/2) G⁻¹ ∂L, ϵ G⁻¹) ,        (17)

with standard MALA recovered if we set G = I. We used SMMALA to sample from the norovirus posterior, which has one mode but is strongly correlated, with variable local correlation structure.
(ii) WHLMC: The idea behind Wormhole Lagrangian Monte Carlo (WHLMC) is that, for a multimodal posterior, a metric can be defined that dramatically reduces the distance between modes, and a modified form of the dynamics (14) can exploit this proximity. Full details of the algorithm are highly technical and are given in the papers that first introduced it [24, 25].

3 Results

3.1 Geometry of viral shedding data

Figures 1 and 2 show that the Fisher-Rao metric and the associated geometry generally resolve the difficulties associated with our highly correlated posterior distributions. This allows us to quantify our uncertainty appropriately in epidemiological parameters relating to the duration of the latent and infectious periods of influenza and norovirus.

3.2 Influenza parameters and antiviral treatment

For influenza the credible ranges of individual parameters are close to typical values in the literature – [5], for example, considers scenarios with ω ∈ [0.5, 10] and γ ∈ [0.5, 2.5]. More importantly, however, the bimodal and highly correlated nature of the posterior distribution means that for some models of antiviral action it is not possible to make firm predictions based on parameter values from challenge studies. Figure 3 shows that, for a quarantine-like intervention that is always effective after a delay (plot (i)), the relationship between the delay in administration and the epidemic final size at constant R0 is predictable to within a few percentage points, while for an antiviral-like intervention that is only effective if administered during the latent period (plot (ii)), the absolute uncertainty can be almost 50%.

3.3 Norovirus parameters and seasonality

The credible ranges of individual norovirus parameters are also close to typical values in the literature; e.g. [21] takes ω = 1 and γ = 0.5. Figure 4 shows that the impact of this uncertainty (for other parameter values as given above) is mainly seen in the height (with peak prevalence differing by a factor of 3 or more) and the timing within the year of seasonal epidemics. For the chaotic / irregular scenario (plot (ii)), however, the overall epidemic dynamics are subject to significant uncertainty.

4 Discussion

In summary, we have shown that it is possible to use modern Bayesian MCMC methods, based on derivatives of the log-likelihood and information geometry, to make a full uncertainty quantification of the epidemiological parameters that can be fitted to human challenge data. We have performed our analysis for two of the most prevalent pathogens, influenza and norovirus, and have shown that the epidemiological consequences of uncertainty in key parameters can often be highly significant.

In order to make this progress, we have had to base our analysis on simplifying assumptions that we would hope can be relaxed in future work as the field develops. One example is that the simple likelihood functions (8) and (9) could be extended to include more general functional relationships between titre and infectiousness, most importantly to include individual-level heterogeneity. This would be particularly important if the methodology were extended to diseases such as HIV [13]. An additional assumption we have made is that the parameters we are not fitting, such as the basic reproductive ratio R0, are fixed. This is particularly important to relax if multiple data sources are to be used in a principled way in infectious disease modelling for public health [2].
We hope that the current work has helped to make such evidence synthesis more achievable.

Acknowledgements

Work funded by the UK Engineering and Physical Sciences Research Council.

References

[1] F. S. Acton. Numerical Methods that Work. Harper & Row, 1970.
[2] D. De Angelis, A. M. Presanis, P. J. Birrell, G. S. Tomba, and T. House. Four key challenges in infectious disease modelling using data from multiple sources. To appear, 2014.
[3] R. L. Atmar, A. R. Opekun, M. A. Gilger, M. K. Estes, S. E. Crawford, F. H. Neill, and D. Y. Graham. Norwalk virus shedding after experimental human infection. Emerging Infectious Diseases, 14(10):1553–1557, 2008.
[4] N. Bacaër and R. Ouifki. Growth rate and basic reproduction number for population models with a simple periodic factor. Mathematical Biosciences, 210(2):647–658, 2007.
[5] M. Baguelin, A. J. V. Hoek, M. Jit, S. Flasche, P. J. White, and W. J. Edmunds. Vaccination against pandemic influenza A/H1N1v in England: A real-time economic evaluation. Vaccine, 28(12):2370–2384, 2010.
[6] S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, editors. Handbook of Markov Chain Monte Carlo. CRC Press, 2011.
[7] B. Calderhead and M. Girolami. Statistical analysis of nonlinear dynamical systems using differential geometric sampling methods. Interface Focus, 1(6):821–835, 2011.
[8] F. Carrat, E. Vergu, N. M. Ferguson, M. Lemaitre, S. Cauchemez, S. Leach, and A. J. Valleron. Time lines of infection and disease in human influenza: A review of volunteer challenge studies. American Journal of Epidemiology, 167(7):775–785, 2008.
[9] B. Charleston, B. M. Bankowski, S. Gubbins, M. E. Chase-Topping, D. Schley, R. Howey, P. V. Barnett, D. Gibson, N. D. Juleff, and M. E. J. Woolhouse. Relationship between clinical signs and transmission of an infectious disease and the implications for control. Science, 332(6030):726–729, 2011.
[10] O. Diekmann, H. Heesterbeek, and T. Britton. Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press, 2013.
[11] S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
[12] K. T. D. Eames, N. L. Tilston, and W. J. Edmunds. The impact of school holidays on the social mixing patterns of school children. Epidemics, 3(2):103–108, 2011.
[13] C. Fraser, T. D. Hollingsworth, R. Chapman, F. de Wolf, and W. P. Hanage. Variation in HIV-1 set-point viral load: Epidemiological analysis and an evolutionary hypothesis. PNAS, 104(44):17441–17446, 2007.
[14] C. Fraser, S. Riley, R. M. Anderson, and N. M. Ferguson. Factors that make an infectious disease outbreak controllable. PNAS, 101(16):6146–6151, 2004.
[15] R. Gani, H. Hughes, D. Fleming, T. Griffin, J. Medlock, and S. Leach. Antiviral therapy and hospitalization estimates during possible influenza pandemic. Emerging Infectious Diseases, 11(9), DOI: 10.3201/eid1109.041344, 2005.
[16] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC, 1995.
[17] M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 73(2):123–214, 2011.
[18] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.
[19] H. Heesterbeek, R. M. Anderson, V. Andreasen, S. Bansal, D. De Angelis, C. Dye, K. T. D. Eames, W. J. Edmunds, S. D. W. Frost, S. Funk, T. D. Hollingsworth, T. House, V. Isham, P. Klepac, J. Lessler, J. O.
Lloyd-Smith, C. J. E. Metcalf, D. Mollison, L. Pellis, J. R. C. Pulliam, M. G. Roberts, C. Viboud, and I. N. I. I. Collaboration. Modeling infectious disease dynamics in the complex landscape of global health. Science, 347(6227), 2015.
[20] N. Hens, G. Ayele, N. Goeyvaerts, M. Aerts, J. Mossong, J. Edmunds, and P. Beutels. Estimating the impact of school closure on social mixing behaviour and the transmission of close contact infections in eight European countries. BMC Infectious Diseases, 9(1):187, 2009.
[21] K. Simmons, M. Gambhir, J. Leon, and B. Lopman. Duration of immunity to norovirus gastroenteritis. Emerging Infectious Diseases, 19(8), DOI: 10.3201/eid1908.130472, 2013.
[22] M. J. Keeling and P. Rohani. Modeling Infectious Diseases in Humans and Animals. Princeton University Press, New Jersey, 2007.
[23] A. Kramer, B. Calderhead, and N. Radde. Hamiltonian Monte Carlo methods for efficient parameter estimation in steady state dynamical systems. BMC Bioinformatics, 15(1):253, 2014.
[24] S. Lan, V. Stathopoulos, B. Shahbaba, and M. Girolami. Markov chain Monte Carlo from Lagrangian dynamics. Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2014.902764, 2014.
[25] S. Lan, J. Streets, and B. Shahbaba. Wormhole Hamiltonian Monte Carlo. In AAAI Conference on Artificial Intelligence, 2014.
[26] J. Lessler, N. G. Reich, R. Brookmeyer, T. M. Perl, K. E. Nelson, and D. A. Cummings. Incubation periods of acute respiratory viral infections: a systematic review. The Lancet Infectious Diseases, 9(5):291–300, 2009.
[27] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.
[28] R. M. Neal. MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, editors, Handbook of Markov Chain Monte Carlo, chapter 5. CRC Press, 2011. [arXiv:1206.1901].
[29] L. Pellis, S. E. F. Spencer, and T. House. Real-time growth rate for general stochastic SIR epidemics on unclustered networks. To appear in Mathematical Biosciences [arXiv:1501.04824], 2015.
[30] V. Pereyra and G. Scherer. Exponential Data Fitting and its Applications. Bentham eBooks, 2010.
[31] V. E. Pitzer, G. M. Leung, and M. Lipsitch. Estimating variability in the transmission of severe acute respiratory syndrome to household contacts in Hong Kong, China. American Journal of Epidemiology, 166(3):355–363, 2007.
[32] G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.

Figure 1: Analysis of influenza shedding data. (i) Data and empirical confidence intervals (black) against model mean and 95% credible interval (red). (ii) Posterior samples obtained using WHLMC.
(iii) Diagonal plots: marginal posterior density estimates for each parameter. Bottom left plots: marginal log posterior for pairs of parameters – first-order gradients will be perpendicular to contours. Top right plots: samples (black) with visualisations of the metric – based on expectations of second-order derivatives as detailed in the main text – as red curves, centred on evenly-spaced points, whose radius is inversely proportional to the infinitesimal distance at that angle as defined in (15).

Figure 2: Analysis of norovirus shedding data. (i) Data (black circles) with 67% (solid black line) and 95% (dotted black line) intervals, against model mean and 95% credible interval (red). (ii) Diagonal plots: marginal posterior density estimates for each parameter. Bottom left plots: marginal log posterior for pairs of parameters – first-order gradients will be perpendicular to contours. Top right plots: samples (black) with visualisations of the metric – based on the empirical Fisher information as detailed in the main text – as red curves, centred on evenly-spaced points, whose radius is inversely proportional to the infinitesimal distance at that angle as defined in (15).

Figure 3: Interventions administered after a constant delay against influenza given parameter uncertainty. (i) Assuming that the intervention always stops transmission. (ii) Assuming that the intervention only stops transmission if the individual has not yet become infectious.

Figure 4: Implications of uncertainty in norovirus shedding for long-term temporal behaviour. Deterministic ODE trajectories are shown for 50 different posterior samples, with two randomly chosen ones highlighted in red and blue. (i) R0 = 1.8, TL = 6 months shows regular annual oscillations. (ii) R0 = 1.8, TL = 60 months shows irregular / chaotic behaviour. (iii) R0 = 4, TL = 6 months shows annual oscillation but not well-defined epidemics. (iv) R0 = 4, TL = 60 months shows bi-annual oscillation but with variable peak heights.

Supplementary Material

Here we present technical and mathematical results supporting the main paper.
Contents

1 Solution of the linear compartmental model
  1.1 Model definition
  1.2 Distinct rates
    1.2.1 Solution via eigenvalue decomposition
    1.2.2 Solution via Laplace transform
    1.2.3 Derivatives
  1.3 Arbitrary rates
    1.3.1 Solution
    1.3.2 Derivatives
2 Fisher-Rao metric for the shedding models
3 The WLMC algorithm
  3.1 Lagrangian Monte Carlo (LMC)
  3.2 WLMC
  3.3 Wormhole Metric
  3.4 Wormhole Network

1 Solution of the linear compartmental model

1.1 Model definition

Consider the continuous-time Markov chain X = (X(t) | t ∈ R+) with state space S = {i}_{i=1}^{m}, m ∈ N, and rates γ = (γ1, . . . , γm−1), γi ∈ R+, ∀ i ∈ S \ {m}. For convenience we will set γm = 0. The chain moves up through the states in order,

    1 --γ1--> 2 --γ2--> 3 --γ3--> · · · --γm−2--> m−1 --γm−1--> m .

This has the following m × m generator matrix (all unmarked entries are zero):

        ⎛ −γ1                        ⎞
        ⎜  γ1   −γ2                  ⎟
    M = ⎜        γ2    ⋱             ⎟ ,        (1)
        ⎜              ⋱   −γm−1     ⎟
        ⎝                   γm−1   0 ⎠

which in component form reads

    Mij = γj (δi−1,j − δij)  if j ≠ m ,   Mij = 0  otherwise.        (2)

X has an absorbing state at i = m and a unique stationary distribution π∗ such that [π∗]i = δi,m, but it is not irreducible, ergodic or reversible, since the chain can never move to a lower-numbered state. We now try to solve the system

    dπ/dt = Mπ ,   πi(0) = δi1 ,        (3)

for different relationships between the rates γ.

1.2 Distinct rates

The solution to (3) is simpler if all of the rates are distinct, i.e. if γi ≠ γj ∀ i ≠ j ∈ S \ {m}. We will use two different methods to solve this system.

1.2.1 Solution via eigenvalue decomposition

Here we start by finding the eigenvalues of M, solving the characteristic equation

    det(M − λI) = 0 .        (4)

By noting that the matrix M − λI is lower triangular, we see that its determinant is the product of its diagonal entries, and so

    −λ ∏_{i=1}^{m−1} (−γi − λ) = 0  ⟹  λ = 0, −γ1, −γ2, . . . , −γm−1 .        (5)

The eigenvector equation can then be written in component form as

    Σ_{k=1}^{i} Mik [vj]k = −γj [vj]i ,   ∀ i, j ∈ S .        (6)

Now define a matrix A whose i-th column is the i-th right eigenvector of M, so that M = AΛA⁻¹, where Λ is a diagonal matrix with i-th diagonal element equal to −γi. Substituting for Mij as in (2) and rearranging, we have

    (γi − γj) Aij = γi−1 Ai−1,j ,        (7)

and so, since γi−1 ≠ 0, Aij = 0 for i < j. If we set Aii = 1, then by induction we have

    Aij = [vj]i = ∏_{k=j}^{i−1} γk / (γk+1 − γj)  if i > j ,   1  if i = j ,   0  otherwise,        (8)

and so the matrix A is also lower triangular. We then write the solution to (3) as

    π(t) = e^{Mt} π(0) = A e^{Λt} A⁻¹ π(0) = A e^{Λt} c ,        (9)

where c is a vector obeying Ac = π(0). Clearly c1 = 1, and for i ≠ 1

    Σ_{j=1}^{i−1} cj ∏_{k=j}^{i−1} γk / (γk+1 − γj) + ci = 0 .        (10)

Let us consider the possible solution

    ci = 1  for i = 1 ,   ci = ∏_{j=1}^{i−1} γj / (γj − γi)  for i ≠ 1 .
(11) Then for i ̸= 1, (10) becomes i−1 j−1 & % =⇒ i−1 i−1 % γk γj γk % + =0 γk − γj γk+1 − γj γj − γi j=1 j=1 k=1 k=j ⎛ ⎞ i−1 % i i−1 i−1 % % ⎜& 1 1 ⎟ ⎜ ⎟ + γj = 0 ⎝ γk − γj γk − γi ⎠ j=1 k=1 k̸=j and so i i % & j=1 k=1 k̸=j k=1 (12) j=1 1 = 0, since γi ̸= 0, ∀ i ∈ S \ {m}. γk − γj (13) Now we can show that expression (13) holds for all γi ̸= γj ∈ R, i ̸= j. Proof. Let there be a set {γi }ni=1 s.t. γi ∈ C, n ∈ N and γi ̸= γj ∀ i ̸= j. Now consider the expression n % Pi (x) = (γj − x) j=1 j̸=i Using the formula for partial fraction decomposition n & 1 1 1 = ′ Pi (x) Pi (γj ) x − γj j=1 j̸=i 3 we have that n ! j=1 j̸=i 1 γj − γi n " = = − = − =⇒ n n ! " j=1 k=1 k̸=j 1 γk − γj 1 P ′ (γ ) j=1 i j j̸=i n " 1 γi − γj n ! 1 1 γi − γj γk − γj j=1 j̸=i n ! n " j=1 k=1 j̸=i k̸=j k=1 k̸=i,j 1 γk − γj = 0 Combining this with the eigenvectors/values of M we can substitute into (9) to give πi (t) = i " Aij cj e−γj t j=1 ⎧ e−γ1 t ⎪ ⎪ ⎨ ! i−1 i i ! " 1 −γj t = γk × e ⎪ γk − γj ⎪ ⎩ k=1 j=1 k=1 for i = 1 , (14) for i ̸= 1 . k̸=j 1.2.2 Solution via Laplace transform For the more general case, there are benefits to a Laplace transform approach, which we will introduce here. We define the Laplace transform of a function f : R+ → Rd , t &→ f (t) as L{f } : C → Rd , s &→ f˜(s) such that ' ∞ ˜ L{f }(s) := f (s) = f (t)e−st dt (15) 0 Laplace transforming (3) then gives sπ̃ − π(0) = Qπ̃ . (16) If s is not an eigenvalue of Q, then the matrix s − Q is invertible and π̃ = (s − Q)−1 π(0) , so π(t) = L−1 {(s − Q)−1 π(0)}(t) . (17) We can then calculate the inverse Laplace transform using the residue theorem. Considering the explicit form of (17) for our model, if we let B(s) = s − Q, where s ̸= −γi ∀ i ∈ S, then B is invertible and we need to find B−1 π(0) where [π(0)]i = δi1 , or in component form m " −1 −1 (18) πj (0) = Bi1 Bij [B−1 π(0)]i = j=1 4 Using BB−1 = , and noting that B is lower triangular, we have that i ! −1 Bij Bj1 = δi1 (19) j=1 −1 For i = 1: B11 = 1/B11 = (s + γ1 )−1 . For i ̸= 1: i ! −1 Bij Bj1 =0 j=1 =⇒ i ! −1 (−γj δi−1,j + (s + γj )δij ) Bj1 =0 j=1 −1 =⇒ Bi1 = i " γi−1 −1 γk−1 −1 B , Bi−1,1 = s + γi s + γk 11 k=2 and so ⎧ ⎪ ⎪ ⎪ ⎨ 1 s + γ1 i−1 i [(s − Q)−1 π(0)]i = " " 1 ⎪ ⎪ γ × ⎪ k ⎩ s + γk k=1 k=1 i=1, (20) i ̸= 1 . Then finding πi (t) reduces to calculating L−1 {f˜i (s)} where f˜i (s) = i " k=1 1 . Since s + γk γi ̸= γj ∀ i ̸= j, all the poles s = −γi of f˜ are order 1, and so using the residue theorem we have L{f˜i (s)}(t) = = i ! j=1 i ! j=1 = i ! Res[f˜i (s)est , −γj ] lim (s + γj )est s→−γj e−γj t j=1 i " k=1 i " k=1 k̸=j 1 s + γk 1 , i ̸= 1 γk − γj or e−γ1 t , i = 1 , (21) and so finally πi (t) = L−1 {[(s − Q)−1 π(0)]i } = which is equivalent to (14). 1.2.3 ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ e−γ1 t i−1 " k=1 γk × i ! j=1 e−γj t i=1, i " k=1 k̸=j 1 γk − γj i ̸= 1 , (22) Derivatives Lets now take derivatives with respect to the model parameters {γi }. This can be most easily done in the Laplace domain since we can use the fact that ∂i L{f }(s) = L{∂i f }(s) , where ∂i := ∂/∂γi , 5 (23) and so ∂i f (t) = L−1 {∂i L{f }}(t) . If we let π̃ i (s) = [(s − Q)−1 π(0)]i then ⎧ ⎪ ⎪ π̃j (s) − π̃j (s) ⎪ ⎪ ⎨ γi s + γi π̃i (s) ∂i π̃j (s) = − ⎪ ⎪ ⎪ s + γi ⎪ ⎩ 0 (24) i<j , (25) i=j , i>j . 
Using the convolution theorem for Laplace transforms, we have L −1 % & ' t π̃j (s) π̃j (s) πj (t) πj (u)e−γi (t−u) du − = − γi s + γi γi 0 ⎧ ⎫ ⎪ ⎪ ⎪ ⎪ j−1 j j j ⎨) γ e−γi t − γ e−γn t ( ( 1 − γi t −γi t ( 1 ⎬ 1 i n = γk × + e ⎪ γi (γi − γn ) γk − γn γi γk − γi ⎪ ⎪ ⎪ k=1 k=1 k=1 ⎩n=1 ⎭ n̸=i k̸=i k̸=n = j−1 ( k=1 γk × j ) n=1 n̸=i γn (e−γi t − e−γn t ) + te−γi t γi (γi − γn ) where we used the relationship proved in §1.2.1. Thus ⎧ −te−γ1 t ⎪ ⎪ ⎪ .( i i−1 i−1 ⎪ ) ( ⎪ 1 e−γi t − e−γn t ⎪ −γi t ⎪ + te γ × ⎪ k ⎪ ⎪ γi − γn γk − γn ⎪ n=1 ⎨ k=1 k=1 k̸=n ∂i πj (t) = .( j j−1 j ⎪ ) ( γn (e−γi t − e−γn t ) 1 ⎪ −γ t ⎪ i ⎪ γ × + te k ⎪ ⎪ γi (γi − γn ) γ − γn ⎪ ⎪ n=1 k=1 k=1 k ⎪ ⎪ n̸ = i k̸ = n ⎪ ⎩ 0 .( j k=1 k̸=n 1 γk − γn i=j=1, i = j ̸= 1 , (26) i<j , i>j , which can be checked by computing the derivative directly from (14). 1.3 Arbitrary rates +1 Consider a general, pure birth chain with m = N + 1 states {I}N I=1 , and M distinct M parameters {γi }i=1 , defined by the following (N + 1) × (N + 1) generator matrix Q, where 6 γi ̸= γj ∈ R+ ∀ i ̸= j ∈ {1, 2, . . . , M }, and n " ⎛ −γ1 ⎜ ⎜ γ1 ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ Q =⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1.3.1 #$1 .. .. . n #$2 % " . −γ1 γ1 −γ2 γ2 .. .. !M i=1 ni = N + 1. !M −1 % " i=3 #$ ni nM % " #$ % 0 . . −γ2 γ2 0 .. . .. . .. .. . . −γM γM .. .. . . −γM γM 0 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ (27) Solution Let us find solutions to (3) using the Laplace transform method. Again defining B(s) = s − Q, where s ̸= −γi ∀ i ∈ {1, . . . , M }, and following the same procedure in §1.2.2, we can write down ⎧ 1k I 0 γ1 ⎪ ⎪ ⎪ γ1−1 I ̸= N + 1, mI = 1 , ⎪ ⎪ s + γ ⎪ 1 ⎪ 1k I 1n j ⎪ 0 m2 I −1 0 ⎪ ⎨ γmI γj −1 −1 I ̸= N + 1, mI ̸= 1 , γ mI [(s − Q) π(0)]I = s + γmI s + γj ⎪ j=1 ⎪ ⎪ 1n j ⎪ M 0 ⎪ 2 ⎪ γ 1 j ⎪ ⎪ I =N +1 , ⎪ ⎩ s s + γj j=1 where mI ∈ {1, 2, . . . , M } and kI ∈ {1, 2, . . . , nmI } are counting variables defined by the relationship m I −1 3 nj + kI = I , n0 := 0 . (28) j=0 7 !mI −1 Then to calculate πI (t), we notice that f˜(s) = (s + γmI )−kI j=1 (s + γj )−nj has poles at s = −γ1 , . . . , −γmI −1 , −γmI , of order n = n1 , . . . , nmI −1 , kI respectively, and so −1 L mI " {F (s)}(t) = j=1 Res[f˜(s)est , −γj ] # % I −1 nj m$ 1 (s + γ ) dnj −1 j = est lim (s + γk )−nk (nj − 1)! s→−γj dsnj −1 (s + γmI )kI j=1 k=1 # % m −1 I kI $ 1 (s + γ ) dkI −1 mI + lim est (s + γk )−nk (kI − 1)! s→−γmI dskI −1 (s + γmI )kI k=1 ⎛ ⎞ m I −1 " m I −1 " = j=1 m$ I −1 ⎟ 1 dnj −1 ⎜ est −nk ⎟ ⎜ lim (s + γ ) k ⎠ (nj − 1)! s→−γj dsnj −1 ⎝ (s + γmI )kI k=1 k̸=j dkI −1 1 lim + (kI − 1)! s→−γmI dskI −1 # est m$ I −1 (s + γk )−nk k=1 % . (29) Now consider function g(s) such that est (s + γmI )−kI m$ I −1 (s + γk )−nk k=1 k̸=j = ⎛ ⎞ m I −1 " ⎜ ⎟ exp ⎜ st − nk ln(s + γk ) − kI ln(s + γmI )⎟ ⎝ ⎠ k=1 k̸=j =: eg(s) , (30) and using a simplified form of Faá di Bruno’s formula , dp g(s) g(s) ′ ′′ (p) g (s), g (s), . . . , g (s) , e = e B p dsp (31) where Bp (x1 , x2 , . . . , xp ) is the pth -complete Bell polynomial. If we define a p × p matrix Mp (x1 , . . . , xp ) such that ⎧ 1p−i2 ⎨ j−i xj−i+1 i ≤ j , [Mp ]ij = −1 i=j+1 , ⎩ 0 otherwise, (32) then we can use the identity Bp (x1 , x2 , . . . , xp ) = det Mp (x1 , x2 , . . . , xp ) and so finally ⎫ ⎧ m$ I −1 ⎬ ⎨ st e −nj (s + γ ) = L−1 j ⎭ ⎩ (s + γmI )kI j=1 m I −1 " j=1 + m$ I −1 e−γj t det HI,j (t) (γk − γj )−nk (γmI − γj )kI (nj − 1)! k=0 k̸=j mI −1 e−γmI t det H̃I (t) $ (γk − γmI )−nk . (kI − 1)! 
k=1 8 (33) (34) Here HI,j (t) = 1 if nj = 1, while if nj > 1, HI,j (t) is a (nj − 1) × (nj − 1) matrix defined by ⎧ ⎛ ⎞ ⎪ ⎪ m I −1 ⎪ ( ⎟ ⎪ (nj − 1 − p)! ⎜ kI nk ⎪ ⎜ ⎟ + tδpq p ≤ q , ⎪ + ⎨ ⎠ ⎝ q−p+1 q−p+1 (nj − 1 − q)! (γj − γk ) (γj − γmI ) k=0 [HI,j ]pq (t) = k̸=j ⎪ ⎪ ⎪ ⎪ −1 p=q+1 , ⎪ ⎪ ⎩ 0 otherwise. Also, H̃(t) = 1 if kI = 1, while if kI > 1, H̃(t) is a (kI − 1) × (kI − 1) matrix defined by ⎧ mI −1 ⎪ (kI − 1 − p)! ( nk ⎪ ⎪ + tδpq p ≤ q , ⎨ (kI − 1 − q)! (γmI − γk )q−p+1 k=1 (35) [H̃I ]pq (t) = ⎪ ⎪ −1 p = q + 1 , ⎪ ⎩ 0 otherwise. This gives the overall solution ⎧ kI −1 ⎪ −γ1 t (γ1 t) ⎪ e ⎪ ⎪ (kI − 1)! ⎪ ⎪ ⎪ .n k m kI −1 γ nj e−γj t det H (t) m, ⎪ I −1 I −1 ( ⎪ γ I,j γ m ⎪ j k I ⎪ ⎪ ⎪ ⎪ (γmI − γj )kI (nj − 1)! γk − γj ⎪ k=0 ⎪ ⎨ j=1 k̸=j .n k πI (t) = I −1 kI −1 e−γmI t det H̃ (t) m, γm γk ⎪ I I ⎪ ⎪ + ⎪ ⎪ (kI − 1)! γk − γmI ⎪ ⎪ k=1 ⎪ .nk ⎪ M +1 M M +1 ⎪ , n ( e−γj t det HN +1,j (t) , ⎪ 1 ⎪ k ⎪ γk × ⎪ ⎪ (nj − 1)! γk − γj ⎪ ⎩ k=1 j=1 I ̸= N + 1, mI = 1 , I ̸= N + 1, mI ̸= 1 , I =N +1 , k=1 k̸=j (36) where we define γ0 := 1, γM +1 := 0, nM +1 := 1, mN +1 := M + 2 and kN +1 := 0 for simplicity of notation. 1.3.2 Derivatives An analytic expression also allows us to calculate derivatives in a similar way to §1.2.3. Taking a derivative of the Laplace transform, we get ⎧ nr nr ⎪ π̃I − π̃I r < mI , ⎪ ⎪ γr s + γr ⎨ kI − 1 kI ∂r π̃I = π̃I − π̃I r = mI , ⎪ ⎪ γmI s + γmI ⎪ ⎩ 0 otherwise. So we need to calculate expressions of the form L−1 {(s + γr )−1 π̃j (s)}. Rather than use the convolution theorem as before, here we apply the residue theorem multiple times. The 9 first application gives −1 L ! π̃I s + γr " = %n k I −1 $ kI −1 e−γr t det K̃ (t) m# γm γk I,r I (γmI − γr )kI nr ! γk − γr k=1 k̸=r + m I −1 & j=1 j̸=r n kI −1 γ j e−γj t det(H (t) + K (n )) m# I −1 γm I,j I,r j j I (γmI − γj )kI (γr − γj )(nj − 1)! k=0 k̸=j $ γk γk − γj %n k %n k I −1 $ kI −1 e−γmI t det(H̃ (t) + K (k )) m# γm γk I I,r I I + , (γr − γmI )(kI − 1)! γk − γmI (37) k=1 where K̃I,r (t) is a nr × nr matrix with components ⎧ ⎛ ⎞ ⎪ ⎪ m −1 I ⎪ ⎟ ⎪ (nr − p)! ⎜ & nk kI ⎪ ⎟ + tδpq p ≤ q , ⎜ ⎪ + ⎨ (nr − q)! ⎝ (γr − γk )q−p+1 (γr − γmI )q−p+1 ⎠ k=0 [K̃I,r ]pq (t) = k̸=r ⎪ ⎪ ⎪ ⎪ −1 p=q+1 , ⎪ ⎪ ⎩ 0 otherwisem and where KI,r (a) = 0 if a = 1 or if a > 1 KI,r (a) is a (a − 1) × (a − 1) matrix with components ⎧ 1 ⎨ (nj − 1 − p)! p≤q , [KI,r ]pq (nj ) = (nj − 1 − q)! (γj − γr )q−p+1 ⎩ 0 otherwise, and ⎧ 1 ⎨ (kI − 1 − p)! [KI,r ]pq (kI ) = (k − 1 − q)! (γmI − γr )q−p+1 ⎩ I 0 p≤q , otherwise. The second application of the residue theorem gives −1 L ! π̃I s + γmI " = m I −1 & j=1 + n mI −1 kI −1 γ j e−γj t det(H (t) + K γm I,j I,mI (nj )) # j I (γmI − γj )kI +1 (nj − 1)! k=0 k̸=j $ γk γk − γj %n k I −1 $ kI −1 e−γmI t det H̃ (t) m# γm γk I1 I , kI ! γk − γmI %n k (38) k=1 where H̃I1 (t) is a kI × kI matrix defined by ⎧ m I −1 & ⎪ nk ⎪ ⎪ (kI − p)! + tδpq p ≤ q , ⎨ (kI − q)! (γmI − γk )q−p+1 k=1 [H̃I1 ]pq (t) = ⎪ ⎪ −1 p=q+1 , ⎪ ⎩ 0 otherwise. 10 (39) This gives the full final expression as ⎧ −tπI (t) + tπI−1 (t) ⎪ ⎪ ⎪ m% I −1 ⎪ ⎪ kI − 1 ⎪ kI −1 ⎪ π (t) − k γ γknk × ⎪ I I mI ⎪ ⎪ γ m ⎪ I k=1 ⎪ ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (n k mI −1 ' ⎪ I −1 −γj t ⎨m& ⎪ e det(HI,j (t) + KI,mI (nj )) % 1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (γmI − γj )kI +1 (nj − 1)! γk − γj ⎪ ⎪ ⎪ k=0 ⎩ j=1 ⎪ ⎪ k̸ = j ⎪ ⎪ (n k ) ⎪ m% I −1 ' −γmI t ⎪ ⎪ 1 e det H̃ (t) I1 ⎪ ⎪ + ⎪ ⎪ k ! γk − γmI ⎪ I ⎪ k=1 ⎪ ⎪ m ⎪ I −1 % ⎪ nr ⎪ k −1 ⎪ I ⎪ πI (t) − nr kI γmI γknk × ⎪ ⎪ γ ⎪ ⎪ k=1 ⎧ r ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (nk ⎪ I −1 ' ⎨ e−γr t det K̃ (t) m% ⎪ ⎪ 1 I,r ⎪ ⎪ ⎪ ⎪ ⎪ (γ − γr )kI nr ! 
γk − γr ⎪ ⎪ ⎪ k=1 ⎩ mI ⎪ ⎨ k̸=r (n k mI −1 ' m I −1 ∂r πI (t) = & e−γj t det(HI,j (t) + KI,r (nj )) % 1 ⎪ ⎪ + ⎪ ⎪ (γmI − γj )kI (γr − γj )(nj − 1)! γk − γj ⎪ ⎪ j=1 k=0 ⎪ ⎪ k̸ = j ⎪ j̸=r ⎪ ⎪ (n k ) mI −1 ' ⎪ ⎪ e−γmI t det(H̃I (t) + KI,r (kI )) % 1 ⎪ ⎪ ⎪ + ⎪ ⎪ (γr − γmI )(kI − 1)! γk − γmI ⎪ ⎪ k=1 ⎪ ⎪ M +1 ⎪ % ⎪ nr ⎪ ⎪ γknk × π (t) − n ⎪ N +1 r ⎪ ⎪ γ r ⎪ k=1 ⎪ ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (n k M +1 ' ⎪ ⎨ ⎪ det K̃N +1,r (t) % 1 ⎪ −γ ⎪ rt e ⎪ ⎪ ⎪ ⎪ nr ! γk − γr ⎪ ⎪ ⎪ k=1 ⎩ ⎪ ⎪ k̸ = r ⎪ ⎫ ⎪ ⎪ ⎪ ⎪ ⎪ ' ( M +1 M +1 ⎪ nk ⎪ ⎬ & % ⎪ det(H (t) + K (n )) 1 ⎪ N +1,j N +1,r j −γj t ⎪ + e ⎪ ⎪ ⎪ ⎪ (γr − γj )(nj − 1)! γk − γj ⎪ ⎪ j=1 ⎪ k=0 ⎭ ⎪ ⎪ k̸ = j j̸ = r ⎪ ⎩ 0 2 r = mI = 1, I ̸= N + 1 , r = mI ̸= 1, I ̸= N + 1 , r < mI , I ̸= N + 1 , I =N +1 , otherwise. (40) Fisher-Rao metric for the shedding models We first note the straightforwardly-obtained result that if f (θ) is the pdf of a normal distribution with mean µ(θ) and standard deviation σ(θ), then the Fisher-Rao metric as defined in the main text is ga,b = Ef [∂a ln(f )∂b ln(f )] = 11 2 1 ∂a µ∂b µ + 2 ∂a σ∂b σ . 2 σ σ (41) In two cases considered we have a likelihood given by products of such functions f over different times with µ(t|θ) = τ m ! πi (t, γ) =: τ π(t) , i=1 " σt σ(t) = σ (given in data) for influenza, (one of the parameters) for norovirus. (42) Given the results above for πi (t), we can then write down that for influenza the metric has components gτ,τ = 0 , gi,j = ! t∈T ! π(t) τ π(t)∂ π(t) , g = ∂ ∂i π(t) , j τ,i i σ(t)2 σ(t)2 (43) t∈T and for norovirus there are the additional elements ! 2 gσ,σ = , gσ,a̸=σ = 0 . σ(t)2 (44) t∈T For SMMALA this metric is all that is required, and has the primary benefit of introducing local second-order derivative information into the MCMC algorithm, but for WLMC we make additional use of the possibilities for reduction of global distances possible in a geometric approach. 3 3.1 The WLMC algorithm Lagrangian Monte Carlo (LMC) The geometric approach of [2] involves Hamiltonian dynamics with a discretized integrator, the generalized leapfrog method [6, 2], that requires significant numerical effort, in particular fixed-point iterations, to solve implicit equations. This step is potentially computational intensive (repeated matrix inversion of G(θ) involves O(D 2.373 ) operations in dimension D), and can be numerically unstable when certain contraction conditions are not satisfied [3]. To address this issue, Lan et al. [4] propose an explicit integrator for geometric MCMC by using the following Lagrangian dynamics, equivalent to the Euler-Lagrangian equation in Physics: dv dθ =v, = G(θ)−1 ∂L − vT Γ(θ)v , (45) dt dt where v(0) := G(θ(0))−1 p(0) ∼ N (0, G(θ(0))−1 ), and Γ(θ) are Christoffel Symbol of the second kind whose (i, j, k)-th element is Γkij = 12 gkm (∂i gmj + ∂j gim − ∂m gij ) with g km being the (k, m)-th element of G(θ)−1 . Geometrically speaking, instead of defining dynamics on cotangent bundle as in RHMC, the dynamics is now defined on tangent bundle, which is in the same spirit of [1] to avoid 12 large momentum. The following explicit integrator can then be derived: ! "−1 ! " 1 ε ε v(ℓ+ 2 ) = I + Ω(θ (ℓ) , v(ℓ) )) v(ℓ) − G(θ (ℓ) )−1 ∇θ φ(θ (ℓ) ) , 2 2 (ℓ+1) (ℓ) (ℓ+ 12 ) , θ = θ + εv "−1 ! ! " 1 1 ε ε v(ℓ+ 2 ) − G(θ (ℓ+1) )−1 ∇θ φ(θ (ℓ+1) ) , v(ℓ+1) = I + Ω(θ (ℓ+1) , v(ℓ+ 2 ) )) 2 2 (46) (47) (48) where Ω(θ (ℓ) , v(ℓ) ))kj := (v (ℓ) )i Γ(θ (ℓ) )kij . Such integrator is time reversible but not volume preserving. 
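As an illustration only (not the authors' implementation), a single step of this explicit integrator might be sketched as follows, with `metric`, `christoffel` and `grad_phi` standing for user-supplied functions returning G(θ), the Christoffel symbols Γ(θ) and ∇φ(θ) respectively; these names and the example target are placeholders.

```python
import numpy as np

def lagrangian_leapfrog_step(theta, v, eps, grad_phi, metric, christoffel):
    """One step of the explicit integrator (46)-(48): metric(theta) -> G,
    christoffel(theta) -> Gamma[k, i, j], grad_phi(theta) -> gradient of the
    potential phi (here taken to be -log posterior). Illustrative sketch only."""
    D = theta.size
    I = np.eye(D)

    def omega(th, vel):
        # Omega(theta, v)[k, j] = v_i * Gamma^k_{ij}(theta)
        return np.einsum('i,kij->kj', vel, christoffel(th))

    # first half-step for the velocity, equation (46)
    rhs = v - 0.5 * eps * np.linalg.solve(metric(theta), grad_phi(theta))
    v_half = np.linalg.solve(I + 0.5 * eps * omega(theta, v), rhs)

    # full position step, equation (47)
    theta_new = theta + eps * v_half

    # second half-step for the velocity, equation (48)
    rhs = v_half - 0.5 * eps * np.linalg.solve(metric(theta_new), grad_phi(theta_new))
    v_new = np.linalg.solve(I + 0.5 * eps * omega(theta_new, v_half), rhs)
    return theta_new, v_new

# toy usage: Euclidean metric and a standard Gaussian target, for which the
# scheme reduces to the usual (volume-preserving) leapfrog integrator
D = 2
metric = lambda th: np.eye(D)
christoffel = lambda th: np.zeros((D, D, D))
grad_phi = lambda th: th            # phi = 0.5 * ||theta||^2
theta, v = np.ones(D), np.zeros(D)
for _ in range(10):
    theta, v = lagrangian_leapfrog_step(theta, v, 0.1, grad_phi, metric, christoffel)
print(theta, v)
```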
The acceptance probability is adjusted so that the detailed balance condition holds [4]:

    α = min{ 1, exp(−E(z^(L+1)) + E(z^(1))) |dz^(L+1)/dz^(1)| } ,        (49)

where the Jacobian determinant is

    |dz^(ℓ+1)/dz^(ℓ)| = [ det(I − (ε/2) Ω(θ^(ℓ+1), v^(ℓ+1))) det(I − (ε/2) Ω(θ^(ℓ), v^(ℓ+1/2))) ] / [ det(I + (ε/2) Ω(θ^(ℓ+1), v^(ℓ+1/2))) det(I + (ε/2) Ω(θ^(ℓ), v^(ℓ))) ] ,        (50)

and E(z) is the energy for the Lagrangian dynamics, defined as

    E(θ, v) = − log π(θ|D) − (1/2) log det G(θ) + (1/2) vᵀ G(θ) v .        (51)

The resulting algorithm, Lagrangian Monte Carlo (LMC), is a valid exact sampler and has the same strength in exploring complex geometry as RHMC does; LMC has been shown to be more efficient and stable than RHMC. For more details see [3, 4].

3.2 WLMC

When the target distribution is multimodal, energy-based sampling algorithms tend to fail, as they are easily trapped in some of the modes without visiting all of them. Making proposals by numerically simulating Hamiltonian dynamics, the sampler has difficulty in passing high energy barriers (low probability regions) [7]. Compared to HMC, the geometric RHMC/LMC methods perform even worse in this respect, because they are more adapted to the local geometry and so more likely to be trapped in one mode. To overcome this issue, some kind of global knowledge of the distribution needs to be learned and incorporated. Lan et al. [5] proposed the idea of wormholes to allow these geometric algorithms (HMC/RHMC/LMC) to work on multimodal distributions. The proposed method comes in two parts: a distance-shortening metric and a mode-jumping mechanism.

3.3 Wormhole Metric

Let θ̂1 and θ̂2 be two modes of the target distribution. We define a straight line segment, vW := θ̂2 − θ̂1, and refer to a small neighborhood (tube) of the line segment as a wormhole. Next, we define a wormhole metric, GW(θ), in the vicinity of the wormhole. For a pair of tangent vectors u, w at θ, the wormhole metric GW is defined as follows:

    G∗W(u, w) := ⟨u − ⟨u, v∗W⟩v∗W , w − ⟨w, v∗W⟩v∗W⟩ = uᵀ [I − v∗W (v∗W)ᵀ] w ,        (52)

    GW = G∗W + ε v∗W (v∗W)ᵀ = I − (1 − ε) v∗W (v∗W)ᵀ ,        (53)

where v∗W = vW / ∥vW∥ and 0 < ε ≪ 1 is a small positive number. To see that GW in fact shortens the distance between θ̂1 and θ̂2, consider the simple case of a straight line θ(t) = θ̂1 + vW t, t ∈ [0, 1]. In this case the distance under GW is

    dist(θ̂1, θ̂2) = ∫_0^1 √( vWᵀ GW vW ) dt = √ε ∥vW∥ ≪ ∥vW∥ ,

which is much smaller than the Euclidean distance. Next, we define the overall metric, G, for the whole parameter space of θ as a weighted sum of the base metric G0 and the wormhole metric GW,

    G(θ) = (1 − m(θ)) G0(θ) + m(θ) GW ,        (54)

where m(θ) ∈ (0, 1) is a mollifying function designed to make the wormhole metric GW influential in the vicinity of the wormhole only.

3.4 Wormhole Network

For more than two modes, the above method alone could suffer from two potential shortcomings in higher dimensions. First, the effect of the wormhole metric could diminish quickly as the sampler leaves one mode towards another. Secondly, such a mechanism, which modifies the dynamics in the existing parameter space, could interfere with the native dynamics in the neighborhood of a wormhole, possibly preventing the sampler from properly exploring areas around the modes as well as some low probability regions. To address the first issue, we add an external vector field to enforce the movement between modes.
More specifically, we define a vector field, f(θ, v), in terms of the position parameter θ and the velocity vector v = G(θ)⁻¹ p as follows:

    f(θ, v) := exp{−V(θ)/(DF)} ⟨v, v∗W⟩ v∗W = m(θ) ⟨v, v∗W⟩ v∗W ,        (55)

with mollifier m(θ) := exp{−V(θ)/(DF)}, where D is the dimension, F > 0 is the influence factor, and V(θ) is a vicinity function indicating the Euclidean distance from the line segment vW,

    V(θ) := ⟨θ − θ̂1, θ − θ̂2⟩ + |⟨θ − θ̂1, v∗W⟩| |⟨θ − θ̂2, v∗W⟩| .        (56)

After adding the vector field, we modify the Hamiltonian/Lagrangian dynamics governing the evolution of θ as follows:

    θ̇ = v + f(θ, v) .        (57)

To address the second issue, we allow the wormholes to pass through an extra auxiliary dimension to avoid their interference with the existing dynamics in the given parameter space. In particular we introduce an auxiliary variable θD+1 ∼ N(0, 1) corresponding to an auxiliary dimension. We use θ̃ := (θ, θD+1) to denote the position parameters in the resulting D + 1 dimensional space MD × R. θD+1 can be viewed as random noise independent of θ, and contributes (1/2) θ²D+1 to the total potential energy. Correspondingly, we augment the velocity v with one extra dimension, denoted as ṽ := (v, vD+1). At the end of the sampling, we project θ̃ to the original parameter space and discard θD+1. We refer to MD × {−h} as the real world, and call MD × {+h} the mirror world. Here, h is half of the distance between the two worlds, and it should be on the same scale as the average distance between the modes. For most of the examples discussed here, we set h = 1. Figure 1 illustrates how the two worlds are connected by networks of wormholes. One can refer to [5] for full algorithmic details of Wormhole HMC/LMC, including the case where the modes are initially unknown.

Figure 1: Illustrating a wormhole network connecting the real world to the mirror world (h = 1). As an example, the cylinder shows a wormhole connecting mode 5 in the real world to its mirror image. The dashed lines show two sets of wormholes. The red lines show the wormholes when the sampler is close to mode 1 in the real world, and the magenta lines show the wormholes when the sampler is close to mode 5 in the mirror world.

References

[1] A. Beskos, F. J. Pinski, J. M. Sanz-Serna, and A. M. Stuart. Hybrid Monte Carlo on Hilbert spaces. Stochastic Processes and their Applications, 121(10):2201–2230, 2011.
[2] M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 73(2):123–214, 2011.
[3] S. Lan. Advanced Bayesian computational methods through geometric techniques. PhD thesis, Long Beach, CA, USA, AAI3605182, 2013.
[4] S. Lan, V. Stathopoulos, B. Shahbaba, and M. Girolami. Markov chain Monte Carlo from Lagrangian dynamics. Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2014.902764, 2014.
[5] S. Lan, J. Streets, and B. Shahbaba. Wormhole Hamiltonian Monte Carlo. In AAAI Conference on Artificial Intelligence, 2014.
[6] B. Leimkuhler and S. Reich. Simulating Hamiltonian Dynamics. Cambridge University Press, 2005.
[7] C. Sminchisescu and M. Welling. Generalized darting Monte Carlo. Pattern Recognition, 44(10–11):2738–2748, 2011.