Bayesian uncertainty quantification for transmissibility of influenza and norovirus using information geometry

Thomas House^{1,2}, Ashley Ford^{2}, Shiwei Lan^{3}, Samuel Bilson^{2}, Elizabeth Buckingham-Jefferey^{4,2}, and Mark Girolami^{3}

^1 School of Mathematics, University of Manchester, Oxford Road, Manchester, M13 9PL, UK.
^2 Warwick Infectious Disease Epidemiology Research centre (WIDER), Warwick Mathematics Institute, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.
^3 Department of Statistics, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.
^4 Complexity Science Doctoral Training Centre, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.
October 8, 2015
Abstract
Infectious diseases exert a large and growing burden on human health, but violate
most of the assumptions of classical epidemiological statistics and hence require a
mathematically sophisticated approach. In particular, the data produced by human
challenge experiments, where volunteers are infected with a disease under controlled
conditions and their level of infectiousness over time is estimated, have traditionally
been difficult to analyse due to strong, complex correlations between parameters. Here,
we show how a Bayesian approach to the inverse problem together with modern Markov
chain Monte Carlo algorithms based on information geometry can overcome these
difficulties and yield insights into the disease dynamics of two of the most prevalent
human pathogens: influenza and norovirus.
1 Introduction
Infectious diseases continue to pose a major threat to human health, with transmission
dynamic models playing a key role in developing scientific understanding of their spread
and informing public health policy [19]. These models typically require many parameters
to make accurate predictions, which interact with data in complex, non-linear ways. It is
seldom possible to perform a series of individual controlled experiments to calibrate these
models, meaning that the use of multiple datasets is often necessary [2].
At the population level, infectious disease models most frequently involve a ‘compartmental’ approach in which individuals progress between different discrete disease states, usually at constant rates [22]. We note that alternatives to such a compartmental approach exist, for example the use of a deterministic time-varying infectivity, or allowing for non-Markovian state transitions in a stochastic context [10]. There are theoretical differences between these and compartmental models for predictive epidemic modelling, but since in practice each alternative can be approximated by a sufficiently complex compartmental model (see e.g. [29] for additional discussion of these questions) we do not consider them further here.
At the individual level, there are three separate events that can be represented using different compartments: (i) an individual receiving an infectious dose of a pathogen; (ii)
the individual becoming infectious and able to infect new cases; (iii) the individual showing symptoms. The times elapsed between these events are important for understanding
the epidemiology of infectious diseases as well as designing control and mitigation strategies [14, 26].
Here, we consider how the rates of moving between compartments in standard epidemic
models can be estimated from ‘shedding’ data generated when individuals are given an
infectious dose of a virus under controlled conditions, and the level of live virus that they
‘shed’ is measured over time. The nature of these models and data means that there
are not simple optimal values for the rate parameters, but instead the data essentially
constrain these parameters to lie close to a non-linear curve in parameter space. We
show that modern computationally intensive Bayesian methods that make use of information geometry can be used to calculate the posterior density for models of influenza and
norovirus. We then use forward modelling based on this posterior knowledge to show that
some epidemiological conclusions are robust under the remaining uncertainty, but others require additional information to resolve. In particular, for influenza, we show that
the predicted effectiveness of quarantine-type interventions is unaffected by the remaining
uncertainty, but antiviral pharmaceutical interventions have a bimodal uncertainty structure. For norovirus, we show that the frequency of epidemics is predictable under the
remaining uncertainty, but the severity and timing of each seasonal epidemic is not.
2 Methods

2.1 Compartmental models of infectious disease
In a compartmental approach, we model the state of an individual who has been infected
with a pathogen at time t = 0 as an integer random variable X(t). We label the possible
states after infection i, j, . . . ∈ {1, . . . , m}; while more general structures are needed for
other pathogens, for both influenza and norovirus we consider the ‘linear chain’ case where
an individual spends an exponentially distributed period of time with mean 1/γi in each
state i < m before progressing to state i + 1, and where m is the ‘recovered’ state,
corresponding to the end of the infection, from which the individual does not move. In
the Supplementary Material we provide a general solution for the quantity
π_i(t; γ) := Pr[X(t) = i | X(0) = 1; γ] ,   (1)

and its derivatives with respect to the rates {γ_i}.
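As a concrete numerical check (an illustrative sketch, not code from the paper), π_i(t; γ) can be computed directly by exponentiating the generator matrix of the linear chain; the chain length and rates below are arbitrary example values:

```python
import numpy as np
from scipy.linalg import expm

def occupancy_probabilities(t, gamma):
    """Return pi_i(t) = Pr[X(t) = i | X(0) = 1] for the linear chain.

    gamma: progression rates (gamma_1, ..., gamma_{m-1}); state m is absorbing.
    """
    m = len(gamma) + 1
    M = np.zeros((m, m))
    for i, g in enumerate(gamma):
        M[i, i] -= g      # leave state i+1 at rate gamma_{i+1} ...
        M[i + 1, i] += g  # ... and arrive in state i+2
    pi0 = np.zeros(m)
    pi0[0] = 1.0          # infection occurs at t = 0 in state 1
    return expm(M * t) @ pi0

# Example: three-state chain with rates (0.5, 1.0); probabilities sum to 1.
pi = occupancy_probabilities(2.0, [0.5, 1.0])
```

For a two-state chain with a single rate γ this reduces to the familiar π_1(t) = e^{−γt}, which provides a simple correctness check.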
Furthermore, we assume that an individual has ‘infection rate’ λ_i in state i (with λ_m = 0), which is proportional to the rate at which they would infect new cases in a population-level epidemic model. A key quantity is the expected infectiousness of an individual over time,

Λ(t; θ) := Σ_{i=1}^{m} λ_i π_i(t; γ) ,   (2)
where the model parameters are θ = (γ, λ) = (θa ). A key quantity in any epidemiological
model of infections is the basic reproductive ratio, R0 , defined verbally as the expected
number of secondary infections produced by a typical infectious individual early in the
epidemic [10]. Under the simplifying but frequently made assumption of a homogeneous
population, this quantity is given by
R0(θ) = ∫_0^∞ Λ(t; θ) dt .   (3)
Note that the constants of proportionality λ depend on the nature and strength of interactions in the population and therefore cannot be determined from measurements of
individuals alone.
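Equation (3) can be evaluated by numerical quadrature; the sketch below (illustrative values of λ and γ, assumed purely for demonstration) also checks the result against the closed form Σ_i λ_i/γ_i, which holds for the linear chain because λ_i multiplies the mean time 1/γ_i spent in state i:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

def R0(lam, gamma):
    """R0 = int_0^inf Lambda(t) dt for a linear chain with infection
    rates lam_i in state i (lam_m = 0) and progression rates gamma_i."""
    m = len(gamma) + 1
    M = np.zeros((m, m))
    for i, g in enumerate(gamma):
        M[i, i] -= g
        M[i + 1, i] += g
    pi0 = np.zeros(m)
    pi0[0] = 1.0
    lam = np.asarray(list(lam) + [0.0])   # lam_m = 0: recovered do not transmit
    Lambda = lambda t: lam @ (expm(M * t) @ pi0)
    value, _ = quad(Lambda, 0, np.inf)
    return value

# Latent state sheds nothing; infectious state has rate 2 and mean duration 2,
# so R0 should be 2 * (1 / 0.5) = 4.
r0 = R0([0.0, 2.0], [1.0, 0.5])
```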
2.1.1 Influenza
Influenza is most commonly modelled using the ‘SEEIIR’ or ‘SE²I²R’ framework, for example in the work that was used to inform vaccination policy during the recent H1N1 pandemic [5]. Here m = 5 and individuals spend a 2-Erlang distributed period of time with mean 1/ω in an uninfectious ‘latent’ class, and then a 2-Erlang distributed period of time with mean 1/γ in the infectious state, before recovering.
To show the possible different impacts of uncertainty in parameter values, we will consider
two models for delayed intervention (implemented after a time period of length d after
infection) during an influenza pandemic. The first of these assumes that the intervention
is ‘quarantine-like’ and completely halts transmission after administration, leading to an
epidemic with reproduction number
R_d^{(1)} = ∫_0^d Λ(t; θ) dt .   (4)
The second, however, assumes that the individual must not have progressed from the
‘latent’ to the ‘infectious’ class and is therefore similar to some models of administration
of antiviral medication like Tamiflu [15, e.g.], leading to an epidemic with reproduction
number

R_d^{(2)} = (1 − Pr[X(d) < 3]) R0 .   (5)
We note from standard results in mathematical epidemiology [10] that, where the infection rate is β and the mean infectious period is 1/γ, the basic reproductive ratio is R0 = β/γ, and for an epidemic with reproduction number R the epidemic final size Z is given by the solution to the transcendental equation

Z = 1 − e^{−RZ} ,   (6)

which we solve numerically for the two different reproduction numbers R_d^{(1)} and R_d^{(2)} above, a range of delays, and fitted values of the parameters (ω, γ) at a fixed value of R0 = 1.4 (as in [5]) to make a fair comparison.
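The transcendental final-size equation can be solved by bracketing the non-trivial root; a minimal sketch (not the authors' code), using the fact that for R > 1 the non-zero root lies in (0, 1]:

```python
import numpy as np
from scipy.optimize import brentq

def final_size(R):
    """Solve Z = 1 - exp(-R Z) for the non-trivial root when R > 1."""
    if R <= 1.0:
        return 0.0  # only the trivial root Z = 0 exists
    # 1 - exp(-x) written as -expm1(-x) for accuracy near zero;
    # start the bracket just above 0 to avoid the trivial root.
    f = lambda Z: Z + np.expm1(-R * Z)
    return brentq(f, 1e-12, 1.0)

z = final_size(1.4)   # roughly half the population infected at R = 1.4
```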
2.1.2 Norovirus
Norovirus is usually assumed to follow the ‘SEIRS’ framework [21], where after infection
an individual spends an exponentially distributed period of time with mean 1/ω in a
‘latent’ class, then an exponentially distributed period of time with mean 1/γ in the
infectious class. In contrast to influenza, individuals move from the ‘recovered’ class back
to the ‘susceptible’ class with rate µ; this loss of immunity is a relatively slow process
that does not impact on the analysis of shedding data. It does, however, influence the
population level disease dynamics of norovirus, which can be described using a set of
ordinary differential equations, where S(t) stands for the expected number of people who
are susceptible and similarly for other compartments:
dS/dt = δN + µR − β(t)SI − δS ,
dE/dt = β(t)SI − ωE − δE ,
dI/dt = ωE − γI − δI ,
dR/dt = γI − µR − δR .   (7)
We use a time-varying infection rate since this is necessary to reproduce the regular seasonality seen in real data [21], and we assume it takes the sinusoidal form β(t) = β0(1 + A sin(αt)). Since this variability is typically believed to arise from school terms, we take A = 1/3 to be close to existing empirical estimates of the impact of school closures on disease spreading [20, 12], and set α = 2π years⁻¹. The demographic rate δ is set to the standard value 1/70 years⁻¹. From the results of [4], we have that R0 = β0/γ; we vary this and the rate of waning immunity µ within ranges suggested by [21], and then run the model (7) to determine its long-term behaviour for different fitted values of the parameters (ω, γ).
2.2 Shedding model and data
Our aim is to extract parameter estimates for compartmental epidemic models from ‘challenge’ studies in which human volunteers are infected with a pathogen, and the level of live virus they produce (or ‘shed’) is measured over time. The concentration of live virus is quantified as a ‘titre’. The relationship between this quantity and transmissibility is complex, but generally agreed to be sub-linear [31, 9]. We assume here that at each time point t the log titre y_t is proportional to the expected intensity of infection Λ(t) plus additive Gaussian noise representing experimental error and other factors such as individual-level variability. The details are, however, different for the two datasets we consider, and so we define our models separately below.
2.2.1 Influenza
For influenza, we use the meta-analysis data of [8] for Influenza A H1N1, which is shown
in Figure 1(i). Here, observations are made at regularly-spaced times belonging to a set
T . Under our general modelling assumptions, the likelihood of observing a set of mean
log-titres (yt ) amongst participants, with associated standard deviations (σt ), given the
model parameters, is
L(y, σ | θ) = ∏_{t∈T} N(y_t ; (τ/β) Λ(t|θ), σ_t) ,   (8)
where N is the normal distribution probability density function. Here, we assume that the
variance at each time point is measured, as given in [8]. Since the infection rate β depends
inter alia on the rate of contact between individuals, which cannot be estimated from
shedding data, we rescale the infectiousness Λ meaning that our parameters for estimation
are θ = (τ, ω, γ).
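To make (8) concrete, here is a sketch of the resulting log-likelihood for the SE²I²R chain, with Λ computed by matrix exponentiation; the unit shedding weights on the two infectious states are an illustrative assumption, and β has been absorbed into τ by the rescaling described in the text:

```python
import numpy as np
from scipy.linalg import expm

def log_likelihood(y, sigma, t_obs, tau, omega, gamma):
    """Log of (8): independent Gaussians with mean tau * Lambda(t), sd sigma_t.

    SE2I2R linear chain after infection: states E1, E2, I1, I2, R with
    progression rates (2*omega, 2*omega, 2*gamma, 2*gamma); only I1, I2
    shed, with unit weights (an illustrative assumption)."""
    rates = [2.0 * omega, 2.0 * omega, 2.0 * gamma, 2.0 * gamma]
    m = len(rates) + 1
    M = np.zeros((m, m))
    for i, g in enumerate(rates):
        M[i, i] -= g
        M[i + 1, i] += g
    pi0 = np.zeros(m)
    pi0[0] = 1.0
    lam = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # sheds only while infectious
    ll = 0.0
    for yt, st, t in zip(y, sigma, t_obs):
        mean = tau * (lam @ (expm(M * t) @ pi0))
        ll += -0.5 * np.log(2.0 * np.pi * st**2) - 0.5 * (yt - mean)**2 / st**2
    return ll
```

Data close to the model-predicted shedding curve should score higher than data far from it, which is a useful sanity check when wiring this into a sampler.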
2.2.2 Norovirus
For norovirus, we use data from the study of [3], where observations are made irregularly
at times belonging to a set T . These data are shown in Figure 2(i) and since they do not
aggregate in the same way as the influenza data, the likelihood function is

L(y | θ, σ) = ∏_{t∈T} N(y_t ; (τ/β) Λ(t|θ), σ) ,   (9)

where σ is here an additional parameter that must be estimated, and we have rescaled the infectiousness as before. This makes the norovirus parameter space dimension 4, in contrast to dimension 3 for influenza, with θ = (τ, ω, γ, σ).
2.3 Statistical framework

2.3.1 The Bayesian approach to identifiability
It is a long-established result that fitting of a sum of exponentials to data is complicated;
in particular Acton [1] considers fitting the model y = Ae−at + Be−bt to (y, t) pairs and
notes that “there are many combinations of (a, b, A, B) that will fit most exact data quite
well indeed [ . . . ] and when experimental noise is thrown into the pot, the entire operation
becomes hopeless.”
Our compartmental models are more complex than sums of exponentials, but exhibit the
same lack of a clear mode in the likelihood function. While there are various methods to
address this issue in other applications (e.g. [30]), another response is (informally speaking)
to consider all parameter combinations that fit well, and to investigate the epidemiological
consequences of this uncertainty in parameter values.
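Acton's point is easy to reproduce numerically (an illustrative sketch): the two quite different parameter sets below give curves y = A e^{−at} + B e^{−bt} whose maximum separation is far smaller than typical experimental noise:

```python
import numpy as np

t = np.linspace(0.0, 4.0, 401)

# Two different parameter sets (a, b, A, B) for y = A exp(-a t) + B exp(-b t):
y1 = 1.0 * np.exp(-1.0 * t) + 1.0 * np.exp(-1.2 * t)   # (1.0, 1.2, 1, 1)
y2 = 2.0 * np.exp(-1.1 * t)                            # single exponential, A = 2

# Maximum separation between the two curves over the observation window.
gap = np.max(np.abs(y1 - y2))
```

Here the gap is below 0.005 on curves of height 2, so even modest measurement noise makes the two parameter sets practically indistinguishable.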
More formally, we work in a Bayesian framework, meaning that we attempt to calculate the posterior density p over parameters from the likelihood function L and the prior density f using Bayes' rule

p(θ|y) = L(y|θ) f(θ) / ∫ L(y|ϑ) f(ϑ) dϑ .   (10)

Given fixed data y, the measure p(θ|y)dθ assigns more mass to more credible regions of parameter space, and can be multimodal and/or have many combinations of parameters with the same level of posterior support.
2.3.2 Markov chain Monte Carlo
Typically the integral in the denominator of (10) is not tractable, so we adopt the popular methodology of defining a Markov chain on parameter space whose stationary distribution has probability density function p, i.e. Markov chain Monte Carlo (MCMC) [16], in particular the Metropolis-Hastings algorithm [27, 18]. This is a discrete-time Markov chain in which a change of state from θ_n to θ* is proposed with probability q(θ*|θ_n), and the proposal is accepted with probability

α = 1 ∧ ( p(θ*) q(θ_n|θ*) ) / ( p(θ_n) q(θ*|θ_n) ) .   (11)
The simplest approaches to MCMC are based on random walks in parameter space, for
example letting q be a multivariate normal with constant covariance matrix and mean
θ n . Such an approach is particularly unsuitable for sampling from the posterior densities
that arise from our shedding model and as such a sophisticated statistical approach is
required [6], and we shall outline alternatives below.
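For concreteness, a minimal random-walk Metropolis-Hastings sketch implementing (11) with a symmetric Gaussian proposal (so the q ratio cancels), targeting a standard normal density as a stand-in for a posterior; this is illustrative and not one of the samplers used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(theta):
    return -0.5 * theta**2       # standard normal stand-in for p(theta | y)

def rw_metropolis(n_samples, step=1.0, theta0=0.0):
    theta, lp = theta0, log_p(theta0)
    samples = np.empty(n_samples)
    accepted = 0
    for n in range(n_samples):
        proposal = theta + step * rng.normal()   # symmetric: q(a|b) = q(b|a)
        lp_prop = log_p(proposal)
        # Accept with probability 1 ^ p(theta*)/p(theta_n), cf. (11).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            accepted += 1
        samples[n] = theta
    return samples, accepted / n_samples

samples, acc_rate = rw_metropolis(20000, step=2.4)
```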
2.3.3 Derivative-based MCMC
In the field of numerical optimisation, methods such as gradient descent that make use of
first-order derivatives of the function to be optimised are popular. Significant care must
be taken when extending these to stochastic algorithms such as MCMC, but there are
two popular methods that make use of first order derivatives of L := ln(Lf ) (since we
use uninformative priors throughout we will tend to call this the log-likelihood despite its
inclusion of the prior). We use the following notation:
∂_a L := ∂L / ∂θ^a ;   ∂L = (∂_a L) .   (12)
There are then two main families of derivative-based MCMC algorithms that we consider.
(i) MALA: The first algorithm family starts with the Langevin equation
dθ = (1/2) ∂L dt + dW .   (13)
From [32], we know that this stochastic differential equation has stationary distribution p
under relatively weak assumptions. The Metropolis-adjusted Langevin algorithm (MALA)
uses the Euler approximation to (13) as a proposal, which we detail further below.
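A minimal MALA sketch (illustrative, on a standard normal stand-in target): the Euler discretisation of (13) supplies the proposal, and the asymmetry of the resulting Gaussian proposals enters the Metropolis-Hastings ratio:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_p(theta):
    return -0.5 * theta**2

def grad_log_p(theta):
    return -theta

def mala(n_samples, eps=0.5, theta0=0.0):
    """Euler step of d theta = (1/2) grad log p dt + dW as the proposal,
    corrected by a Metropolis-Hastings accept/reject step."""
    theta = theta0
    samples = np.empty(n_samples)
    for n in range(n_samples):
        mean_fwd = theta + 0.5 * eps * grad_log_p(theta)
        prop = mean_fwd + np.sqrt(eps) * rng.normal()
        mean_bwd = prop + 0.5 * eps * grad_log_p(prop)
        # log q(theta | prop) - log q(prop | theta) for the two Gaussians
        log_q_ratio = (-(theta - mean_bwd)**2 + (prop - mean_fwd)**2) / (2 * eps)
        if np.log(rng.uniform()) < log_p(prop) - log_p(theta) + log_q_ratio:
            theta = prop
        samples[n] = theta
    return samples

samples = mala(20000)
```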
(ii) HMC: The second family of algorithms starts from a system of ordinary differential
equations

dθ/dt = v ,   dv/dt = ∂L .   (14)
The randomness in the proposals is generated by picking the starting value of v, by default
as a vector of standard normal random variables. MCMC algorithms based on Hamilton’s
equations (14) are called Hybrid [11] or Hamiltonian [28] Monte Carlo (HMC).
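A minimal HMC sketch (again on a standard normal stand-in target, not the paper's implementation): the leapfrog integrator approximates (14), and the change in the energy H = −log p + v²/2 governs acceptance:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_p(theta):
    return -0.5 * theta**2

def grad_log_p(theta):
    return -theta

def hmc(n_samples, eps=0.2, n_leapfrog=10, theta0=0.0):
    """Simulate (14) with leapfrog steps, then accept/reject on the
    change in H = -log p + v^2 / 2."""
    theta = theta0
    samples = np.empty(n_samples)
    for n in range(n_samples):
        v = rng.normal()                       # fresh standard-normal velocity
        th = theta
        H0 = -log_p(th) + 0.5 * v**2
        for _ in range(n_leapfrog):            # half-kick, drift, half-kick
            v += 0.5 * eps * grad_log_p(th)
            th += eps * v
            v += 0.5 * eps * grad_log_p(th)
        H1 = -log_p(th) + 0.5 * v**2
        if np.log(rng.uniform()) < H0 - H1:
            theta = th
        samples[n] = theta
    return samples

samples = hmc(5000)
```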
2.3.4 Geometric concepts in MCMC
While both MALA and HMC remove some of the inefficiencies of random walks through
use of local gradients, they are not particularly efficient for the curved ‘boomerang’-shaped
posteriors that we see for the shedding models and data defined above. To address this
issue, recent work has made progress through use of concepts from differential geometry [17]. This allows distances between parameter values to be modified by use of a metric
matrix G; explicitly, the infinitesimal distance between θ and θ + dθ is:
ds(θ, dθ) = [ Σ_{a,b} G_ab dθ^a dθ^b ]^{1/2} .   (15)
The matrix G can be any Riemannian metric matrix, and conceptually speaking this
means that even the most complex posterior can be efficiently sampled with a choice of
metric that brings all high-density regions of parameter space sufficiently close to each
other. In practice, it is not simple to optimise the metric for a particular model, but a
generally well-motivated choice is the Fisher-Rao metric
G_ab = ∫ (∂_a L)(∂_b L) p(y|θ) dy ,   (16)
as suggested by [17]. Calculations of this quantity for the models under consideration
are given in the Supplementary Material. We used two different geometric algorithms
depending on the features of the posterior.
(i) SMMALA: The simplified manifold Metropolis-adjusted Langevin Algorithm (SMMALA) was introduced in [17] and shown to be competitive in terms of computational effort in several applied contexts by [7, 23]. In this approach the proposal distribution is

q(θ*|θ_n) = N(θ* ; θ_n + (ϵ/2) G⁻¹ ∂L, ϵ G⁻¹) ,   (17)

with standard MALA recovered if we set G = I, the identity matrix. We used SMMALA to sample from the norovirus posterior, which has one mode but is strongly correlated with variable local correlation structure.
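A minimal SMMALA sketch implementing the proposal (17), here on a strongly correlated Gaussian stand-in target whose Fisher-Rao metric is its (constant) precision matrix; for a real model the metric would vary with position, which is why it is re-evaluated at each state below:

```python
import numpy as np

rng = np.random.default_rng(4)

# Correlated Gaussian stand-in for a 'boomerang'-shaped posterior.
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
P = np.linalg.inv(Sigma)                 # precision matrix

def log_p(th):
    return -0.5 * th @ P @ th

def grad_log_p(th):
    return -P @ th

def metric(th):
    # Fisher-Rao metric; constant for a Gaussian, but evaluated pointwise
    # exactly as SMMALA would for a position-dependent metric.
    return P

def log_q(dst, src, eps):
    """log N(dst; src + (eps/2) G^{-1} grad, eps G^{-1}) up to constants."""
    G = metric(src)
    mean = src + 0.5 * eps * np.linalg.solve(G, grad_log_p(src))
    diff = dst - mean
    _, logdet = np.linalg.slogdet(G / eps)
    return 0.5 * logdet - 0.5 * diff @ (G / eps) @ diff

def smmala(n_samples, eps=1.0, th0=np.zeros(2)):
    th = th0
    out = np.empty((n_samples, 2))
    for n in range(n_samples):
        G = metric(th)
        mean = th + 0.5 * eps * np.linalg.solve(G, grad_log_p(th))
        L = np.linalg.cholesky(eps * np.linalg.inv(G))
        prop = mean + L @ rng.normal(size=2)
        log_a = (log_p(prop) - log_p(th)
                 + log_q(th, prop, eps) - log_q(prop, th, eps))
        if np.log(rng.uniform()) < log_a:
            th = prop
        out[n] = th
    return out

samples = smmala(20000)
```

Because the proposal covariance ϵG⁻¹ matches the target's correlation structure, the sampler moves efficiently along the correlated ridge that would defeat an isotropic random walk.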
(ii) WHLMC: The idea behind Wormhole Lagrangian Monte Carlo (WHLMC) is that
for a multimodal posterior, a metric can be defined that dramatically reduces the distance
between modes, and a modified form of the dynamics (14) can exploit this proximity.
Full details of the algorithm are highly technical and are given in the papers that first
introduced it [24, 25].
3 Results

3.1 Geometry of viral shedding data
Figures 1 and 2 show that the Fisher-Rao metric and its associated geometry generally resolve the difficulties associated with our highly correlated posterior distributions. This allows us to quantify our uncertainty appropriately in epidemiological parameters relating to the duration of the latent and infectious periods of influenza and norovirus.
3.2 Influenza parameters and antiviral treatment
For influenza the credible ranges of individual parameters are close to typical values in the
literature – [5], for example, considers scenarios with ω ∈ [0.5, 10] and γ ∈ [0.5, 2.5].
More importantly, however, the bimodal and highly correlated nature of the posterior distribution means that for some models of antiviral action it is not possible to make firm predictions based on parameter values from challenge studies. Figure 3 shows that in plot (i), for a quarantine-like intervention that is always effective after a delay, the relationship between delay in antiviral administration and epidemic final size at constant R0 is predictable to within a few percentage points, while in plot (ii), for an antiviral-like intervention that is only effective if administered during the latent period, the absolute uncertainty can be almost 50%.
3.3 Norovirus parameters and seasonality
The credible ranges of individual norovirus parameters are also close to typical values in
the literature; e.g. [21] takes ω = 1 and γ = 0.5.
Figure 4 shows that the impact of this uncertainty (for other parameter values as given
above) is mainly seen in the height (with peak prevalence differing by a factor of 3 or more)
and timing within the year of seasonal epidemics. For the chaotic / irregular scenario (plot (ii)), however, the overall epidemic dynamics are subject to significant uncertainty.
4 Discussion
In summary, we have shown that it is possible to use modern Bayesian MCMC methods, based on derivatives of the log-likelihood and information geometry, to make a full
uncertainty quantification of the epidemiological parameters that can be fitted to human
challenge data. We have performed our analysis for two of the most prevalent pathogens:
influenza and norovirus, and show that the epidemiological consequences of uncertainty in
key parameters can often be highly significant.
In order to make this progress, we have had to base our analysis on simplifying assumptions that we would hope can be relaxed in future work as the field develops. One example
is that the simple likelihood functions (8) and (9) could be extended to include more general functional relationships between titre and infectiousness, most importantly to include
individual-level heterogeneity. This would be particularly important if the methodology
were extended to diseases such as HIV [13].
An additional assumption we have made is that the parameters we are not fitting, such
as the basic reproductive ratio R0 , are fixed. This is particularly important to relax if
multiple data sources are to be used in a principled way in infectious disease modelling
for public health [2]. We hope that the current work has helped to make such evidence
synthesis more achievable.
Acknowledgements
Work funded by the UK Engineering and Physical Sciences Research Council.
References
[1] F. S. Acton. Numerical Methods that Work. Harper & Row, 1970.
[2] D. De Angelis, A. M. Presanis, P. J. Birrell, G. S. Tomba, and T. House. Four key challenges in infectious disease modelling using data from multiple sources. To appear, 2014.
[3] R. L. Atmar, A. R. Opekun, M. A. Gilger, M. K. Estes, S. E. Crawford, F. H. Neill,
and D. Y. Graham. Norwalk virus shedding after experimental human infection.
Emerging Infectious Diseases, 14(10):1553–1557, 2008.
[4] N. Bacaër and R. Ouifki. Growth rate and basic reproduction number for population
models with a simple periodic factor. Mathematical Biosciences, 210(2):647–658, 2007.
[5] M. Baguelin, A. J. V. Hoek, M. Jit, S. Flasche, P. J. White, and W. J. Edmunds.
Vaccination against pandemic influenza A/H1N1v in England: A real-time economic
evaluation. Vaccine, 28(12):2370–2384, 2010.
[6] S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, editors. Handbook of Markov
Chain Monte Carlo. CRC Press, 2011.
[7] B. Calderhead and M. Girolami. Statistical analysis of nonlinear dynamical systems
using differential geometric sampling methods. Interface Focus, 1(6):821–835, 2011.
[8] F. Carrat, E. Vergu, N. M. Ferguson, M. Lemaitre, S. Cauchemez, S. Leach, and
A. J. Valleron. Time lines of infection and disease in human influenza: A review
of volunteer challenge studies. American Journal of Epidemiology, 167(7):775–785,
2008.
[9] B. Charleston, B. M. Bankowski, S. Gubbins, M. E. Chase-Topping, D. Schley,
R. Howey, P. V. Barnett, D. Gibson, N. D. Juleff, and M. E. J. Woolhouse. Relationship between clinical signs and transmission of an infectious disease and the
implications for control. Science, 332(6030):726–729, 2011.
[10] O. Diekmann, H. Heesterbeek, and T. Britton. Mathematical Tools for Understanding
Infectious Diseases Dynamics. Princeton University Press, 2013.
[11] S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
[12] K. T. D. Eames, N. L. Tilston, and W. J. Edmunds. The impact of school holidays
on the social mixing patterns of school children. Epidemics, 3(2):103–108, 2011.
[13] C. Fraser, T. D. Hollingsworth, R. Chapman, F. de Wolf, and W. P. Hanage. Variation in HIV-1 set-point viral load: Epidemiological analysis and an evolutionary
hypothesis. PNAS, 104(44):17441–17446, 2007.
[14] C. Fraser, S. Riley, R. M. Anderson, and N. M. Ferguson. Factors that make an
infectious disease outbreak controllable. PNAS, 101(16):6146–6151, 2004.
[15] R. Gani, H. Hughes, D. Fleming, T. Griffin, J. Medlock, and S. Leach. Antiviral therapy and hospitalization estimates during possible influenza pandemic. Emerging Infectious Diseases, 11(9), 2005. DOI: 10.3201/eid1109.041344.
[16] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in
Practice. Chapman and Hall/CRC, 1995.
[17] M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte
Carlo methods. Journal of the Royal Statistical Society, Series B, 73(2):123–214, 2011.
[18] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57(1):97–109, 1970.
[19] H. Heesterbeek, R. M. Anderson, V. Andreasen, S. Bansal, D. De Angelis, C. Dye,
K. T. D. Eames, W. J. Edmunds, S. D. W. Frost, S. Funk, T. D. Hollingsworth,
T. House, V. Isham, P. Klepac, J. Lessler, J. O. Lloyd-Smith, C. J. E. Metcalf,
D. Mollison, L. Pellis, J. R. C. Pulliam, M. G. Roberts, C. Viboud, and I. N. I. I.
Collaboration. Modeling infectious disease dynamics in the complex landscape of
global health. Science, 347(6227), 2015.
[20] N. Hens, G. Ayele, N. Goeyvaerts, M. Aerts, J. Mossong, J. Edmunds, and P. Beutels.
Estimating the impact of school closure on social mixing behaviour and the transmission of close contact infections in eight European countries. BMC Infectious Diseases,
9(1):187, 2009.
[21] K. Simmons, M. Gambhir, J. Leon, and B. Lopman. Duration of immunity to norovirus gastroenteritis. Emerging Infectious Diseases, 19(8), 2013. DOI: 10.3201/eid1908.130472.
[22] M. J. Keeling and P. Rohani. Modeling Infectious Diseases in Humans and Animals.
Princeton University Press, New Jersey, 2007.
[23] A. Kramer, B. Calderhead, and N. Radde. Hamiltonian Monte Carlo methods for efficient parameter estimation in steady state dynamical systems. BMC Bioinformatics,
15(1):253, 2014.
[24] S. Lan, V. Stathopoulos, B. Shahbaba, and M. Girolami. Markov chain Monte Carlo
from Lagrangian dynamics. Journal of Computational and Graphical Statistics DOI:
10.1080/10618600.2014.902764, 2014.
[25] S. Lan, J. Streets, and B. Shahbaba. Wormhole Hamiltonian Monte Carlo. AAAI Conference on Artificial Intelligence, 2014.
[26] J. Lessler, N. G. Reich, R. Brookmeyer, T. M. Perl, K. E. Nelson, and D. A. Cummings. Incubation periods of acute respiratory viral infections: a systematic review.
The Lancet Infectious Diseases, 9(5):291–300, 2009.
[27] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller.
Equation of state calculations by fast computing machines. The Journal of Chemical
Physics, 21(6):1087–1092, 1953.
[28] R. M. Neal. MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L.
Jones, and X.-L. Meng, editors, Handbook of Markov Chain Monte Carlo, chapter 5.
CRC Press, 2011. [arXiv:1206.1901].
[29] L. Pellis, S. E. F. Spencer, and T. House. Real-time growth rate for general stochastic SIR epidemics on unclustered networks. To appear in Mathematical Biosciences [arXiv:1501.04824], 2015.
[30] V. Pereyra and G. Scherer. Exponential Data Fitting and its Applications. Bentham
eBooks, 2010.
[31] V. E. Pitzer, G. M. Leung, and M. Lipsitch. Estimating variability in the transmission
of severe acute respiratory syndrome to household contacts in Hong Kong, China.
American Journal of Epidemiology, 166(3):355–363, 2007.
[32] G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions
and their discrete approximations. Bernoulli, 2(4):341–363, 12 1996.
Figure 1: Analysis of influenza shedding data. (i) Data and empirical confidence intervals
(black) against model mean and 95% credible interval (red). (ii) Posterior samples obtained using WHLMC. (iii) Diagonal plots: marginal posterior density estimates for each
parameter. Bottom left plots: marginal log posterior for pairs of parameters – first-order
gradients will be perpendicular to contours. Top right plots: samples (black) with visualisations of the metric – based on expectations of second-order derivatives as detailed in
the main text – as red curves, centred on evenly-spaced points, whose radius is inversely
proportional to the infinitesimal distance at that angle as defined in (15).
Figure 2: Analysis of norovirus shedding data. (i) Data (black circles) with 67% (solid black line) and 95% (dotted black line) intervals, against model mean and 95% credible interval (red). (ii) Diagonal plots: marginal posterior density estimates for each parameter. Bottom left plots: marginal log posterior for pairs of parameters – first-order gradients will be perpendicular to contours. Top right plots: samples (black) with visualisations of the metric – based on the empirical Fisher information as detailed in the main text – as red curves, centred on evenly-spaced points, whose radius is inversely proportional to the infinitesimal distance at that angle as defined in (15).
Figure 3: Interventions administered after a constant delay against influenza given parameter uncertainty. (i) Assuming that the intervention always stops transmission. (ii)
Assuming that the intervention only stops transmission if the individual has not yet become infectious.
Figure 4: Implications of uncertainty in norovirus shedding for long-term temporal behaviour. Deterministic ODE trajectories are shown for 50 different posterior samples with two randomly chosen ones highlighted in red and blue. (i) R0 = 1.8, TL = 6 months shows regular annual oscillations. (ii) R0 = 1.8, TL = 60 months shows irregular / chaotic behaviour. (iii) R0 = 4, TL = 6 months shows annual oscillation but not well-defined epidemics. (iv) R0 = 4, TL = 60 months shows bi-annual oscillation but with variable peak heights.
Supplementary Material
Here we present technical and mathematical results supporting the main paper.
Contents

1 Solution of the linear compartmental model
  1.1 Model definition
  1.2 Distinct rates
    1.2.1 Solution via eigenvalue decomposition
    1.2.2 Solution via Laplace transform
    1.2.3 Derivatives
  1.3 Arbitrary rates
    1.3.1 Solution
    1.3.2 Derivatives

2 Fisher-Rao metric for the shedding models

3 The WLMC algorithm
  3.1 Lagrangian Monte Carlo (LMC)
  3.2 WLMC
  3.3 Wormhole Metric
  3.4 Wormhole Network
Solution of the linear compartmental model
1.1 Model definition

Consider the continuous-time Markov chain $X = (X(t) \mid t \in \mathbb{R}^+)$ with state space $S = \{i\}_{i=1}^{m}$, $m \in \mathbb{N}$, and rates $\gamma = (\gamma_1, \ldots, \gamma_{m-1})$, $\gamma_i \in \mathbb{R}^+$ for all $i \in S \setminus \{m\}$. For convenience we will set $\gamma_m = 0$.
The chain moves through the states in sequence:
$$1 \xrightarrow{\;\gamma_1\;} 2 \xrightarrow{\;\gamma_2\;} 3 \xrightarrow{\;\gamma_3\;} \cdots \xrightarrow{\;\gamma_{m-2}\;} m-1 \xrightarrow{\;\gamma_{m-1}\;} m$$
This has the following $m \times m$ generator matrix
$$M = \begin{pmatrix} -\gamma_1 & & & & \\ \gamma_1 & -\gamma_2 & & & \\ & \gamma_2 & \ddots & & \\ & & \ddots & -\gamma_{m-1} & \\ & & & \gamma_{m-1} & 0 \end{pmatrix}, \qquad (1)$$
which in component form reads
$$M_{ij} = \begin{cases} \gamma_j(\delta_{i-1,j} - \delta_{ij}) & \text{if } j \neq m, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$
$X$ has an absorbing state at $i = m$ and a unique stationary distribution $\pi^*$ such that $[\pi^*]_i = \delta_{i,m}$, but it is not irreducible, ergodic or reversible (since $p_{ij}(t) = 0$ for all $i < j \in S$).
We now try to solve the system
$$\frac{d\pi}{dt} = M\pi, \qquad \pi_i(0) = \delta_{i1}, \qquad (3)$$
for different relationships between the rates $\gamma$.
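As a quick sanity check, the generator (1)-(2) can be built numerically and exponentiated. The sketch below uses illustrative rates (not values fitted in the paper) and assumes NumPy and SciPy are available:

```python
# Sanity check of the generator M in (1)-(2): columns sum to zero, pi(t)
# remains a probability vector, and it converges to [pi*]_i = delta_{i,m}.
# The rates are illustrative, not taken from the paper.
import numpy as np
from scipy.linalg import expm

def generator(gamma):
    """Build the m x m generator M for rates gamma = (gamma_1,...,gamma_{m-1})."""
    m = len(gamma) + 1
    M = np.zeros((m, m))
    for j, g in enumerate(gamma):
        M[j, j] = -g        # outflow from state j+1
        M[j + 1, j] = g     # inflow into state j+2
    return M

gamma = np.array([1.3, 0.7, 2.1])          # distinct rates, m = 4
M = generator(gamma)
assert np.allclose(M.sum(axis=0), 0.0)     # generator columns sum to zero

pi0 = np.zeros(4); pi0[0] = 1.0
pi_t = expm(M * 2.5) @ pi0                 # pi(t) = e^{Mt} pi(0), as in (9)
assert np.all(pi_t >= 0) and np.isclose(pi_t.sum(), 1.0)
assert np.allclose(expm(M * 500.0) @ pi0, [0, 0, 0, 1], atol=1e-10)
print("generator checks passed")
```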
1.2 Distinct rates

The solution to (3) is simpler if all of the rates are distinct, i.e. if $\gamma_i \neq \gamma_j$ for all $i \neq j \in S \setminus \{m\}$. We will use two different methods to solve this system.
1.2.1 Solution via eigenvalue decomposition

Here we start by finding the eigenvalues of $M$: we solve the characteristic equation
$$\det(M - \lambda I) = 0. \qquad (4)$$
By noting that the matrix $M - \lambda I$ is lower triangular, we see that its determinant is the product of its diagonal entries, and so
$$-\lambda \prod_{i=1}^{m-1} (-\gamma_i - \lambda) = 0 \implies \lambda = 0, -\gamma_1, -\gamma_2, \ldots, -\gamma_{m-1}. \qquad (5)$$
The eigenvector equation can then be written in component form as
$$\sum_{k=1}^{i} M_{ik} [v_j]_k = -\gamma_j [v_j]_i \quad \forall\, i, j \in S. \qquad (6)$$
Now define a matrix $A$ whose $j$-th column is the $j$-th right eigenvector of $M$, so that $M = A \Lambda A^{-1}$, where $\Lambda$ is a diagonal matrix with $j$-th diagonal element equal to $-\gamma_j$. Substituting for $M_{ij}$ as in (2) and rearranging, we have
$$(\gamma_i - \gamma_j) A_{ij} = \gamma_{i-1} A_{i-1,j} \qquad (7)$$
and so, since $\gamma_{i-1} \neq 0$, $A_{ij} = 0$ for $i < j$. If we set $A_{ii} = 1$, then by induction we have
$$A_{ij} = [v_j]_i = \begin{cases} \displaystyle\prod_{k=j}^{i-1} \frac{\gamma_k}{\gamma_{k+1} - \gamma_j} & \text{if } i > j, \\ 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases} \qquad (8)$$
and so the matrix $A$ is also lower triangular. We then write the solution to (3) as
$$\pi(t) = e^{Mt} \pi(0) = A e^{\Lambda t} A^{-1} \pi(0) = A e^{\Lambda t} \mathbf{c}, \qquad (9)$$
where $\mathbf{c}$ is a vector obeying $A\mathbf{c} = \pi(0)$. Clearly $c_1 = 1$, and for $i \neq 1$
$$\sum_{j=1}^{i-1} c_j \prod_{k=j}^{i-1} \frac{\gamma_k}{\gamma_{k+1} - \gamma_j} + c_i = 0. \qquad (10)$$
Let us consider the possible solution
$$c_i = \begin{cases} 1 & \text{for } i = 1, \\ \displaystyle\prod_{j=1}^{i-1} \frac{\gamma_j}{\gamma_j - \gamma_i} & \text{for } i \neq 1. \end{cases} \qquad (11)$$
Then for $i \neq 1$, (10) becomes
$$\sum_{j=1}^{i-1} \prod_{k=1}^{j-1} \frac{\gamma_k}{\gamma_k - \gamma_j} \prod_{k=j}^{i-1} \frac{\gamma_k}{\gamma_{k+1} - \gamma_j} + \prod_{j=1}^{i-1} \frac{\gamma_j}{\gamma_j - \gamma_i} = 0$$
$$\implies \left( \prod_{j=1}^{i-1} \gamma_j \right) \left( \sum_{j=1}^{i-1} \prod_{\substack{k=1 \\ k \neq j}}^{i} \frac{1}{\gamma_k - \gamma_j} + \prod_{k=1}^{i-1} \frac{1}{\gamma_k - \gamma_i} \right) = 0, \qquad (12)$$
and so
$$\sum_{j=1}^{i} \prod_{\substack{k=1 \\ k \neq j}}^{i} \frac{1}{\gamma_k - \gamma_j} = 0, \quad \text{since } \gamma_i \neq 0 \;\forall\, i \in S \setminus \{m\}. \qquad (13)$$
Now we can show that expression (13) holds for all $\gamma_i \neq \gamma_j \in \mathbb{R}$, $i \neq j$.

Proof. Let there be a set $\{\gamma_i\}_{i=1}^{n}$ such that $\gamma_i \in \mathbb{C}$, $n \in \mathbb{N}$ and $\gamma_i \neq \gamma_j$ for all $i \neq j$. Now consider the expression
$$P_i(x) = \prod_{\substack{j=1 \\ j \neq i}}^{n} (\gamma_j - x).$$
Using the formula for partial fraction decomposition,
$$\frac{1}{P_i(x)} = \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{1}{P_i'(\gamma_j)} \frac{1}{x - \gamma_j},$$
we have, setting $x = \gamma_i$ and noting that $P_i(\gamma_i) = \prod_{j \neq i} (\gamma_j - \gamma_i)$ and $P_i'(\gamma_j) = -\prod_{k \neq i,j} (\gamma_k - \gamma_j)$, that
$$\prod_{\substack{k=1 \\ k \neq i}}^{n} \frac{1}{\gamma_k - \gamma_i} = -\sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{1}{\gamma_i - \gamma_j} \prod_{\substack{k=1 \\ k \neq i,j}}^{n} \frac{1}{\gamma_k - \gamma_j} = -\sum_{\substack{j=1 \\ j \neq i}}^{n} \prod_{\substack{k=1 \\ k \neq j}}^{n} \frac{1}{\gamma_k - \gamma_j}$$
$$\implies \sum_{j=1}^{n} \prod_{\substack{k=1 \\ k \neq j}}^{n} \frac{1}{\gamma_k - \gamma_j} = 0. \qquad \square$$
Combining this with the eigenvectors/values of $M$, we can substitute into (9) to give
$$\pi_i(t) = \sum_{j=1}^{i} A_{ij} c_j e^{-\gamma_j t} = \begin{cases} e^{-\gamma_1 t} & \text{for } i = 1, \\ \displaystyle\prod_{k=1}^{i-1} \gamma_k \times \sum_{j=1}^{i} e^{-\gamma_j t} \prod_{\substack{k=1 \\ k \neq j}}^{i} \frac{1}{\gamma_k - \gamma_j} & \text{for } i \neq 1. \end{cases} \qquad (14)$$
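The closed-form solution (14) can be spot-checked against a direct matrix exponential. A minimal Python sketch with illustrative distinct rates (not values from the paper):

```python
# Compare the closed form (14) with expm(Mt) pi(0) at several times.
import numpy as np
from scipy.linalg import expm

def pi_closed_form(t, gamma):
    """pi(t) from (14); gamma = (gamma_1, ..., gamma_{m-1}, 0), all distinct."""
    m = len(gamma)
    pi = np.empty(m)
    pi[0] = np.exp(-gamma[0] * t)
    for s in range(1, m):                       # 0-based state index
        g = gamma[: s + 1]
        total = sum(np.exp(-g[j] * t) / np.prod(np.delete(g, j) - g[j])
                    for j in range(s + 1))
        pi[s] = np.prod(gamma[:s]) * total      # prefactor gamma_1...gamma_{i-1}
    return pi

gamma = np.array([1.3, 0.7, 2.1, 0.0])          # m = 4, with gamma_m = 0
M = np.zeros((4, 4))
for j in range(3):
    M[j, j] = -gamma[j]
    M[j + 1, j] = gamma[j]
pi0 = np.array([1.0, 0, 0, 0])
for t in (0.3, 1.0, 4.0):
    assert np.allclose(pi_closed_form(t, gamma), expm(M * t) @ pi0, atol=1e-10)
print("closed form matches matrix exponential")
```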
1.2.2 Solution via Laplace transform

For the more general case there are benefits to a Laplace transform approach, which we introduce here. We define the Laplace transform of a function $f : \mathbb{R}^+ \to \mathbb{R}^d$, $t \mapsto f(t)$, as $\mathcal{L}\{f\} : \mathbb{C} \to \mathbb{C}^d$, $s \mapsto \tilde{f}(s)$, such that
$$\mathcal{L}\{f\}(s) := \tilde{f}(s) = \int_0^\infty f(t)\, e^{-st}\, dt. \qquad (15)$$
Laplace transforming (3) then gives
$$s\tilde{\pi} - \pi(0) = M\tilde{\pi}. \qquad (16)$$
If $s$ is not an eigenvalue of $M$, then the matrix $sI - M$ is invertible and
$$\tilde{\pi} = (sI - M)^{-1} \pi(0), \quad \text{so} \quad \pi(t) = \mathcal{L}^{-1}\{(sI - M)^{-1} \pi(0)\}(t). \qquad (17)$$
We can then calculate the inverse Laplace transform using the residue theorem. Considering the explicit form of (17) for our model, if we let $B(s) = sI - M$, where $s \neq -\gamma_i$ for all $i \in S$, then $B$ is invertible and we need to find $B^{-1}\pi(0)$, where $[\pi(0)]_i = \delta_{i1}$, or in component form
$$[B^{-1}\pi(0)]_i = \sum_{j=1}^{m} B^{-1}_{ij} \pi_j(0) = B^{-1}_{i1}. \qquad (18)$$
Using $BB^{-1} = I$, and noting that $B$ is lower triangular, we have that
$$\sum_{j=1}^{i} B_{ij} B^{-1}_{j1} = \delta_{i1}. \qquad (19)$$
For $i = 1$: $B^{-1}_{11} = 1/B_{11} = (s + \gamma_1)^{-1}$. For $i \neq 1$:
$$\sum_{j=1}^{i} B_{ij} B^{-1}_{j1} = 0 \implies \sum_{j=1}^{i} \left( -\gamma_j \delta_{i-1,j} + (s + \gamma_j)\delta_{ij} \right) B^{-1}_{j1} = 0 \implies B^{-1}_{i1} = \frac{\gamma_{i-1}}{s + \gamma_i} B^{-1}_{i-1,1} = \prod_{k=2}^{i} \frac{\gamma_{k-1}}{s + \gamma_k} B^{-1}_{11},$$
and so
$$[(sI - M)^{-1} \pi(0)]_i = \begin{cases} \dfrac{1}{s + \gamma_1} & i = 1, \\ \displaystyle\prod_{k=1}^{i-1} \gamma_k \times \prod_{k=1}^{i} \frac{1}{s + \gamma_k} & i \neq 1. \end{cases} \qquad (20)$$
Then finding $\pi_i(t)$ reduces to calculating $\mathcal{L}^{-1}\{\tilde{f}_i(s)\}$, where $\tilde{f}_i(s) = \prod_{k=1}^{i} \frac{1}{s + \gamma_k}$. Since $\gamma_i \neq \gamma_j$ for all $i \neq j$, all the poles $s = -\gamma_i$ of $\tilde{f}_i$ are of order 1, and so using the residue theorem we have
$$\mathcal{L}^{-1}\{\tilde{f}_i(s)\}(t) = \sum_{j=1}^{i} \operatorname{Res}[\tilde{f}_i(s) e^{st}, -\gamma_j] = \sum_{j=1}^{i} \lim_{s \to -\gamma_j} (s + \gamma_j) e^{st} \prod_{k=1}^{i} \frac{1}{s + \gamma_k} = \sum_{j=1}^{i} e^{-\gamma_j t} \prod_{\substack{k=1 \\ k \neq j}}^{i} \frac{1}{\gamma_k - \gamma_j} \qquad (21)$$
for $i \neq 1$, or $e^{-\gamma_1 t}$ for $i = 1$, and so finally
$$\pi_i(t) = \mathcal{L}^{-1}\{[(sI - M)^{-1}\pi(0)]_i\} = \begin{cases} e^{-\gamma_1 t} & i = 1, \\ \displaystyle\prod_{k=1}^{i-1} \gamma_k \times \sum_{j=1}^{i} e^{-\gamma_j t} \prod_{\substack{k=1 \\ k \neq j}}^{i} \frac{1}{\gamma_k - \gamma_j} & i \neq 1, \end{cases} \qquad (22)$$
which is equivalent to (14).

1.2.3 Derivatives

Let us now take derivatives with respect to the model parameters $\{\gamma_i\}$. This is most easily done in the Laplace domain, since we can use the fact that
$$\partial_i \mathcal{L}\{f\}(s) = \mathcal{L}\{\partial_i f\}(s), \quad \text{where } \partial_i := \partial/\partial\gamma_i, \qquad (23)$$
and so
$$\partial_i f(t) = \mathcal{L}^{-1}\{\partial_i \mathcal{L}\{f\}\}(t). \qquad (24)$$
If we let $\tilde{\pi}_i(s) = [(sI - M)^{-1}\pi(0)]_i$, then
$$\partial_i \tilde{\pi}_j(s) = \begin{cases} \dfrac{\tilde{\pi}_j(s)}{\gamma_i} - \dfrac{\tilde{\pi}_j(s)}{s + \gamma_i} & i < j, \\[1ex] -\dfrac{\tilde{\pi}_i(s)}{s + \gamma_i} & i = j, \\[1ex] 0 & i > j. \end{cases} \qquad (25)$$
Using the convolution theorem for Laplace transforms, we have
$$\mathcal{L}^{-1}\left\{ \frac{\tilde{\pi}_j(s)}{\gamma_i} - \frac{\tilde{\pi}_j(s)}{s + \gamma_i} \right\} = \frac{\pi_j(t)}{\gamma_i} - \int_0^t \pi_j(u)\, e^{-\gamma_i (t-u)}\, du = \prod_{k=1}^{j-1} \gamma_k \times \sum_{\substack{n=1 \\ n \neq i}}^{j} \left[ \frac{\gamma_n (e^{-\gamma_i t} - e^{-\gamma_n t})}{\gamma_i (\gamma_i - \gamma_n)} + t e^{-\gamma_i t} \right] \prod_{\substack{k=1 \\ k \neq n}}^{j} \frac{1}{\gamma_k - \gamma_n},$$
where we used the relationship (13) proved in §1.2.1. Thus
$$\partial_i \pi_j(t) = \begin{cases} -t e^{-\gamma_1 t} & i = j = 1, \\[1ex] \displaystyle\prod_{k=1}^{i-1} \gamma_k \times \sum_{\substack{n=1 \\ n \neq i}}^{i} \left[ \frac{e^{-\gamma_i t} - e^{-\gamma_n t}}{\gamma_i - \gamma_n} + t e^{-\gamma_i t} \right] \prod_{\substack{k=1 \\ k \neq n}}^{i} \frac{1}{\gamma_k - \gamma_n} & i = j \neq 1, \\[1ex] \displaystyle\prod_{k=1}^{j-1} \gamma_k \times \sum_{\substack{n=1 \\ n \neq i}}^{j} \left[ \frac{\gamma_n (e^{-\gamma_i t} - e^{-\gamma_n t})}{\gamma_i (\gamma_i - \gamma_n)} + t e^{-\gamma_i t} \right] \prod_{\substack{k=1 \\ k \neq n}}^{j} \frac{1}{\gamma_k - \gamma_n} & i < j, \\[1ex] 0 & i > j, \end{cases} \qquad (26)$$
which can be checked by computing the derivative directly from (14).
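That check can also be carried out numerically, comparing the derivative formulas against central finite differences of the closed form (14). A Python sketch with illustrative rates (not values from the paper):

```python
# Finite-difference check of the derivatives of pi_j with respect to gamma_i
# for a chain with distinct rates; indices below are 1-based as in the text.
import numpy as np

def pi_cf(t, gamma):
    """Closed-form pi(t) from (14); gamma includes gamma_m = 0 last."""
    m = len(gamma)
    pi = np.empty(m)
    pi[0] = np.exp(-gamma[0] * t)
    for s in range(1, m):
        g = gamma[: s + 1]
        pi[s] = np.prod(gamma[:s]) * sum(
            np.exp(-g[j] * t) / np.prod(np.delete(g, j) - g[j])
            for j in range(s + 1))
    return pi

def dpi(t, gamma, i, j):
    """d pi_j / d gamma_i from (26), for i <= j (1-based indices)."""
    g = gamma[:j]
    gi = gamma[i - 1]
    if i == j == 1:
        return -t * np.exp(-gi * t)
    total = 0.0
    for n in range(j):
        gn = g[n]
        if gn == gi:
            continue                         # skip n = i in the sum
        inner = np.prod(np.delete(g, n) - gn)
        if i == j:
            bracket = (np.exp(-gi * t) - np.exp(-gn * t)) / (gi - gn) \
                      + t * np.exp(-gi * t)
        else:                                # i < j case
            bracket = gn * (np.exp(-gi * t) - np.exp(-gn * t)) \
                      / (gi * (gi - gn)) + t * np.exp(-gi * t)
        total += bracket / inner
    return np.prod(gamma[: j - 1]) * total

gamma = np.array([1.3, 0.7, 2.1, 0.0])
t, h = 1.7, 1e-6
for i, j in [(1, 1), (2, 2), (3, 3), (1, 2), (1, 3), (2, 4)]:
    gp, gm = gamma.copy(), gamma.copy()
    gp[i - 1] += h
    gm[i - 1] -= h
    fd = (pi_cf(t, gp)[j - 1] - pi_cf(t, gm)[j - 1]) / (2 * h)
    assert abs(dpi(t, gamma, i, j) - fd) < 1e-5, (i, j)
print("derivative formulas agree with finite differences")
```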
1.3 Arbitrary rates

Consider a general, pure birth chain with $m = N + 1$ states $\{I\}_{I=1}^{N+1}$ and $M$ distinct parameters $\{\gamma_i\}_{i=1}^{M}$, where the rate $\gamma_i$ is repeated $n_i$ times along the subdiagonal, so that $\sum_{i=1}^{M} n_i = N$. The chain is defined by the following $(N+1) \times (N+1)$ generator matrix $Q$, where $\gamma_i \neq \gamma_j \in \mathbb{R}^+$ for all $i \neq j \in \{1, 2, \ldots, M\}$:
$$Q = \begin{pmatrix} -\gamma_1 & & & & & \\ \gamma_1 & \ddots & & & & \\ & \ddots & -\gamma_1 & & & \\ & & \gamma_1 & -\gamma_2 & & \\ & & & \ddots & \ddots & \\ & & & & \gamma_M & 0 \end{pmatrix}, \qquad (27)$$
with the first $n_1$ diagonal entries equal to $-\gamma_1$, the next $n_2$ equal to $-\gamma_2$, and so on up to the final $n_M$ entries equal to $-\gamma_M$, followed by the absorbing state.
1.3.1 Solution

Let us find solutions to (3) using the Laplace transform method. Again defining $B(s) = sI - Q$, where $s \neq -\gamma_i$ for all $i \in \{1, \ldots, M\}$, and following the same procedure as in §1.2.2, we can write down
$$[(sI - Q)^{-1}\pi(0)]_I = \begin{cases} \gamma_1^{-1} \left( \dfrac{\gamma_1}{s + \gamma_1} \right)^{k_I} & I \neq N+1,\; m_I = 1, \\[1ex] \gamma_{m_I}^{-1} \left( \dfrac{\gamma_{m_I}}{s + \gamma_{m_I}} \right)^{k_I} \displaystyle\prod_{j=1}^{m_I - 1} \left( \dfrac{\gamma_j}{s + \gamma_j} \right)^{n_j} & I \neq N+1,\; m_I \neq 1, \\[1ex] \dfrac{1}{s} \displaystyle\prod_{j=1}^{M} \left( \dfrac{\gamma_j}{s + \gamma_j} \right)^{n_j} & I = N+1, \end{cases}$$
where $m_I \in \{1, 2, \ldots, M\}$ and $k_I \in \{1, 2, \ldots, n_{m_I}\}$ are counting variables defined by the relationship
$$\sum_{j=0}^{m_I - 1} n_j + k_I = I, \qquad n_0 := 0. \qquad (28)$$
Then to calculate $\pi_I(t)$, we notice that $\tilde{f}(s) = (s + \gamma_{m_I})^{-k_I} \prod_{j=1}^{m_I - 1} (s + \gamma_j)^{-n_j}$ has poles at $s = -\gamma_1, \ldots, -\gamma_{m_I - 1}, -\gamma_{m_I}$, of order $n_1, \ldots, n_{m_I - 1}, k_I$ respectively, and so
$$\mathcal{L}^{-1}\{\tilde{f}(s)\}(t) = \sum_{j=1}^{m_I} \operatorname{Res}[\tilde{f}(s) e^{st}, -\gamma_j] = \sum_{j=1}^{m_I - 1} \frac{1}{(n_j - 1)!} \lim_{s \to -\gamma_j} \frac{d^{n_j - 1}}{ds^{n_j - 1}} \left[ \frac{e^{st}}{(s + \gamma_{m_I})^{k_I}} \prod_{\substack{k=1 \\ k \neq j}}^{m_I - 1} (s + \gamma_k)^{-n_k} \right] + \frac{1}{(k_I - 1)!} \lim_{s \to -\gamma_{m_I}} \frac{d^{k_I - 1}}{ds^{k_I - 1}} \left[ e^{st} \prod_{k=1}^{m_I - 1} (s + \gamma_k)^{-n_k} \right]. \qquad (29)$$
Now consider the function $g(s)$ such that
$$e^{st} (s + \gamma_{m_I})^{-k_I} \prod_{\substack{k=1 \\ k \neq j}}^{m_I - 1} (s + \gamma_k)^{-n_k} = \exp\left( st - \sum_{\substack{k=1 \\ k \neq j}}^{m_I - 1} n_k \ln(s + \gamma_k) - k_I \ln(s + \gamma_{m_I}) \right) =: e^{g(s)}, \qquad (30)$$
and use a simplified form of Faà di Bruno's formula,
$$\frac{d^p}{ds^p} e^{g(s)} = e^{g(s)} B_p\left( g'(s), g''(s), \ldots, g^{(p)}(s) \right), \qquad (31)$$
where $B_p(x_1, x_2, \ldots, x_p)$ is the $p$-th complete Bell polynomial. If we define a $p \times p$ matrix $M_p(x_1, \ldots, x_p)$ such that
$$[M_p]_{ij} = \begin{cases} \binom{p-i}{j-i} x_{j-i+1} & i \leq j, \\ -1 & i = j + 1, \\ 0 & \text{otherwise,} \end{cases} \qquad (32)$$
then we can use the identity
$$B_p(x_1, x_2, \ldots, x_p) = \det M_p(x_1, x_2, \ldots, x_p) \qquad (33)$$
and so finally
$$\mathcal{L}^{-1}\left\{ \frac{e^{st}}{(s + \gamma_{m_I})^{k_I}} \prod_{j=1}^{m_I - 1} (s + \gamma_j)^{-n_j} \right\} = \sum_{j=1}^{m_I - 1} \frac{e^{-\gamma_j t} \det \mathbf{H}_{I,j}(t)}{(\gamma_{m_I} - \gamma_j)^{k_I} (n_j - 1)!} \prod_{\substack{k=0 \\ k \neq j}}^{m_I - 1} (\gamma_k - \gamma_j)^{-n_k} + \frac{e^{-\gamma_{m_I} t} \det \tilde{\mathbf{H}}_I(t)}{(k_I - 1)!} \prod_{k=1}^{m_I - 1} (\gamma_k - \gamma_{m_I})^{-n_k}. \qquad (34)$$
Here $\mathbf{H}_{I,j}(t) = 1$ if $n_j = 1$, while if $n_j > 1$, $\mathbf{H}_{I,j}(t)$ is an $(n_j - 1) \times (n_j - 1)$ matrix defined by
$$[\mathbf{H}_{I,j}]_{pq}(t) = \begin{cases} \dfrac{(n_j - 1 - p)!}{(n_j - 1 - q)!} \left( \displaystyle\sum_{\substack{k=0 \\ k \neq j}}^{m_I - 1} \frac{n_k}{(\gamma_j - \gamma_k)^{q-p+1}} + \frac{k_I}{(\gamma_j - \gamma_{m_I})^{q-p+1}} \right) + t \delta_{pq} & p \leq q, \\ -1 & p = q + 1, \\ 0 & \text{otherwise.} \end{cases}$$
Also, $\tilde{\mathbf{H}}_I(t) = 1$ if $k_I = 1$, while if $k_I > 1$, $\tilde{\mathbf{H}}_I(t)$ is a $(k_I - 1) \times (k_I - 1)$ matrix defined by
$$[\tilde{\mathbf{H}}_I]_{pq}(t) = \begin{cases} \dfrac{(k_I - 1 - p)!}{(k_I - 1 - q)!} \displaystyle\sum_{k=1}^{m_I - 1} \frac{n_k}{(\gamma_{m_I} - \gamma_k)^{q-p+1}} + t \delta_{pq} & p \leq q, \\ -1 & p = q + 1, \\ 0 & \text{otherwise.} \end{cases} \qquad (35)$$
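The determinant identity (33) for complete Bell polynomials can be checked numerically against the standard recursion $B_p = \sum_{k=0}^{p-1} \binom{p-1}{k} x_{k+1} B_{p-1-k}$, $B_0 = 1$. A Python sketch with arbitrary test values:

```python
# Check B_p = det M_p with [M_p]_{ij} as in (32), against the recursion
# for complete Bell polynomials.
import math
import numpy as np

def bell_det(x):
    """B_p(x_1, ..., x_p) via the determinant of M_p as in (32)-(33)."""
    p = len(x)
    M = np.zeros((p, p))
    for i in range(p):                 # 0-based indices
        for j in range(i, p):
            M[i, j] = math.comb(p - i - 1, j - i) * x[j - i]
        if i + 1 < p:
            M[i + 1, i] = -1.0
    return np.linalg.det(M)

def bell_rec(x):
    """Complete Bell polynomial via the standard recursion."""
    p = len(x)
    B = [1.0]
    for n in range(1, p + 1):
        B.append(sum(math.comb(n - 1, k) * x[k] * B[n - 1 - k]
                     for k in range(n)))
    return B[p]

x = [1.0, 2.0, 3.0, 4.0]
assert np.isclose(bell_det(x), bell_rec(x))
print(bell_rec(x))                     # B_4(1, 2, 3, 4) = 41
```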
This gives the overall solution
$$\pi_I(t) = \begin{cases} e^{-\gamma_1 t} \dfrac{(\gamma_1 t)^{k_I - 1}}{(k_I - 1)!} & I \neq N+1,\; m_I = 1, \\[1ex] \gamma_{m_I}^{k_I - 1} \displaystyle\sum_{j=1}^{m_I - 1} \frac{\gamma_j^{n_j} e^{-\gamma_j t} \det \mathbf{H}_{I,j}(t)}{(\gamma_{m_I} - \gamma_j)^{k_I} (n_j - 1)!} \prod_{\substack{k=0 \\ k \neq j}}^{m_I - 1} \left( \frac{\gamma_k}{\gamma_k - \gamma_j} \right)^{n_k} + \gamma_{m_I}^{k_I - 1} \frac{e^{-\gamma_{m_I} t} \det \tilde{\mathbf{H}}_I(t)}{(k_I - 1)!} \prod_{k=1}^{m_I - 1} \left( \frac{\gamma_k}{\gamma_k - \gamma_{m_I}} \right)^{n_k} & I \neq N+1,\; m_I \neq 1, \\[1ex] \displaystyle\prod_{k=1}^{M} \gamma_k^{n_k} \times \sum_{j=1}^{M+1} \frac{e^{-\gamma_j t} \det \mathbf{H}_{N+1,j}(t)}{(n_j - 1)!} \prod_{\substack{k=1 \\ k \neq j}}^{M+1} \left( \frac{1}{\gamma_k - \gamma_j} \right)^{n_k} & I = N+1, \end{cases} \qquad (36)$$
where we define $\gamma_0 := 1$, $\gamma_{M+1} := 0$, $n_{M+1} := 1$, $m_{N+1} := M + 2$ and $k_{N+1} := 0$ for simplicity of notation.
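One easily-checked special case: for states in the first block ($m_I = 1$, $k_I = I$) the solution reduces to the Erlang/Poisson form $e^{-\gamma_1 t}(\gamma_1 t)^{I-1}/(I-1)!$. A Python sketch with illustrative repeated rates (not values from the paper):

```python
# For a pure birth chain whose first three transitions share rate gamma_1,
# states in the first block should follow a Poisson-count form; compare
# against the matrix exponential.
import math
import numpy as np
from scipy.linalg import expm

g1, g2 = 1.1, 0.4
rates = [g1, g1, g1, g2]              # n_1 = 3, n_2 = 1, so N + 1 = 5 states
n_states = len(rates) + 1
Q = np.zeros((n_states, n_states))
for j, g in enumerate(rates):
    Q[j, j] = -g
    Q[j + 1, j] = g

pi0 = np.zeros(n_states); pi0[0] = 1.0
t = 2.0
pi_t = expm(Q * t) @ pi0
for I in (1, 2, 3):                   # states within the first block
    erlang = np.exp(-g1 * t) * (g1 * t) ** (I - 1) / math.factorial(I - 1)
    assert np.isclose(pi_t[I - 1], erlang)
print("first-block states match the Erlang/Poisson form")
```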
1.3.2 Derivatives

An analytic expression also allows us to calculate derivatives, in a similar way to §1.2.3. Taking a derivative of the Laplace transform, we get
$$\partial_r \tilde{\pi}_I = \begin{cases} \dfrac{n_r}{\gamma_r} \tilde{\pi}_I - \dfrac{n_r}{s + \gamma_r} \tilde{\pi}_I & r < m_I, \\[1ex] \dfrac{k_I - 1}{\gamma_{m_I}} \tilde{\pi}_I - \dfrac{k_I}{s + \gamma_{m_I}} \tilde{\pi}_I & r = m_I, \\[1ex] 0 & \text{otherwise.} \end{cases}$$
So we need to calculate expressions of the form $\mathcal{L}^{-1}\{(s + \gamma_r)^{-1} \tilde{\pi}_I(s)\}$. Rather than use the convolution theorem as before, here we apply the residue theorem multiple times. The first application gives
$$\mathcal{L}^{-1}\left\{ \frac{\tilde{\pi}_I}{s + \gamma_r} \right\} = \gamma_{m_I}^{k_I - 1} \frac{e^{-\gamma_r t} \det \tilde{\mathbf{K}}_{I,r}(t)}{(\gamma_{m_I} - \gamma_r)^{k_I}\, n_r!} \prod_{\substack{k=1 \\ k \neq r}}^{m_I - 1} \left( \frac{\gamma_k}{\gamma_k - \gamma_r} \right)^{n_k} + \sum_{\substack{j=1 \\ j \neq r}}^{m_I - 1} \gamma_{m_I}^{k_I - 1} \frac{\gamma_j^{n_j} e^{-\gamma_j t} \det(\mathbf{H}_{I,j}(t) + \mathbf{K}_{I,r}(n_j))}{(\gamma_{m_I} - \gamma_j)^{k_I} (\gamma_r - \gamma_j)(n_j - 1)!} \prod_{\substack{k=0 \\ k \neq j}}^{m_I - 1} \left( \frac{\gamma_k}{\gamma_k - \gamma_j} \right)^{n_k} + \gamma_{m_I}^{k_I - 1} \frac{e^{-\gamma_{m_I} t} \det(\tilde{\mathbf{H}}_I(t) + \mathbf{K}_{I,r}(k_I))}{(\gamma_r - \gamma_{m_I})(k_I - 1)!} \prod_{k=1}^{m_I - 1} \left( \frac{\gamma_k}{\gamma_k - \gamma_{m_I}} \right)^{n_k}, \qquad (37)$$
where $\tilde{\mathbf{K}}_{I,r}(t)$ is an $n_r \times n_r$ matrix with components
$$[\tilde{\mathbf{K}}_{I,r}]_{pq}(t) = \begin{cases} \dfrac{(n_r - p)!}{(n_r - q)!} \left( \displaystyle\sum_{\substack{k=0 \\ k \neq r}}^{m_I - 1} \frac{n_k}{(\gamma_r - \gamma_k)^{q-p+1}} + \frac{k_I}{(\gamma_r - \gamma_{m_I})^{q-p+1}} \right) + t \delta_{pq} & p \leq q, \\ -1 & p = q + 1, \\ 0 & \text{otherwise,} \end{cases}$$
and where $\mathbf{K}_{I,r}(a) = 0$ if $a = 1$, while if $a > 1$, $\mathbf{K}_{I,r}(a)$ is an $(a - 1) \times (a - 1)$ matrix with components
$$[\mathbf{K}_{I,r}]_{pq}(n_j) = \begin{cases} \dfrac{(n_j - 1 - p)!}{(n_j - 1 - q)!}\, \dfrac{1}{(\gamma_j - \gamma_r)^{q-p+1}} & p \leq q, \\ 0 & \text{otherwise,} \end{cases}$$
and
$$[\mathbf{K}_{I,r}]_{pq}(k_I) = \begin{cases} \dfrac{(k_I - 1 - p)!}{(k_I - 1 - q)!}\, \dfrac{1}{(\gamma_{m_I} - \gamma_r)^{q-p+1}} & p \leq q, \\ 0 & \text{otherwise.} \end{cases}$$
The second application of the residue theorem gives
$$\mathcal{L}^{-1}\left\{ \frac{\tilde{\pi}_I}{s + \gamma_{m_I}} \right\} = \sum_{j=1}^{m_I - 1} \gamma_{m_I}^{k_I - 1} \frac{\gamma_j^{n_j} e^{-\gamma_j t} \det(\mathbf{H}_{I,j}(t) + \mathbf{K}_{I,m_I}(n_j))}{(\gamma_{m_I} - \gamma_j)^{k_I + 1} (n_j - 1)!} \prod_{\substack{k=0 \\ k \neq j}}^{m_I - 1} \left( \frac{\gamma_k}{\gamma_k - \gamma_j} \right)^{n_k} + \gamma_{m_I}^{k_I - 1} \frac{e^{-\gamma_{m_I} t} \det \tilde{\mathbf{H}}_{I1}(t)}{k_I!} \prod_{k=1}^{m_I - 1} \left( \frac{\gamma_k}{\gamma_k - \gamma_{m_I}} \right)^{n_k}, \qquad (38)$$
where $\tilde{\mathbf{H}}_{I1}(t)$ is a $k_I \times k_I$ matrix defined by
$$[\tilde{\mathbf{H}}_{I1}]_{pq}(t) = \begin{cases} \dfrac{(k_I - p)!}{(k_I - q)!} \displaystyle\sum_{k=1}^{m_I - 1} \frac{n_k}{(\gamma_{m_I} - \gamma_k)^{q-p+1}} + t \delta_{pq} & p \leq q, \\ -1 & p = q + 1, \\ 0 & \text{otherwise.} \end{cases} \qquad (39)$$
This gives the full final expression as
$$\partial_r \pi_I(t) = \begin{cases} -t\pi_I(t) + t\pi_{I-1}(t) & r = m_I = 1,\; I \neq N+1, \\[1ex] \dfrac{k_I - 1}{\gamma_{m_I}} \pi_I(t) - k_I \gamma_{m_I}^{k_I - 1} \displaystyle\prod_{k=1}^{m_I - 1} \gamma_k^{n_k} \times \left\{ \sum_{j=1}^{m_I - 1} \frac{e^{-\gamma_j t} \det(\mathbf{H}_{I,j}(t) + \mathbf{K}_{I,m_I}(n_j))}{(\gamma_{m_I} - \gamma_j)^{k_I + 1} (n_j - 1)!} \prod_{\substack{k=0 \\ k \neq j}}^{m_I - 1} \left( \frac{1}{\gamma_k - \gamma_j} \right)^{n_k} + \frac{e^{-\gamma_{m_I} t} \det \tilde{\mathbf{H}}_{I1}(t)}{k_I!} \prod_{k=1}^{m_I - 1} \left( \frac{1}{\gamma_k - \gamma_{m_I}} \right)^{n_k} \right\} & r = m_I \neq 1,\; I \neq N+1, \\[1ex] \dfrac{n_r}{\gamma_r} \pi_I(t) - n_r \gamma_{m_I}^{k_I - 1} \displaystyle\prod_{k=1}^{m_I - 1} \gamma_k^{n_k} \times \left\{ \frac{e^{-\gamma_r t} \det \tilde{\mathbf{K}}_{I,r}(t)}{(\gamma_{m_I} - \gamma_r)^{k_I}\, n_r!} \prod_{\substack{k=1 \\ k \neq r}}^{m_I - 1} \left( \frac{1}{\gamma_k - \gamma_r} \right)^{n_k} + \sum_{\substack{j=1 \\ j \neq r}}^{m_I - 1} \frac{e^{-\gamma_j t} \det(\mathbf{H}_{I,j}(t) + \mathbf{K}_{I,r}(n_j))}{(\gamma_{m_I} - \gamma_j)^{k_I} (\gamma_r - \gamma_j)(n_j - 1)!} \prod_{\substack{k=0 \\ k \neq j}}^{m_I - 1} \left( \frac{1}{\gamma_k - \gamma_j} \right)^{n_k} + \frac{e^{-\gamma_{m_I} t} \det(\tilde{\mathbf{H}}_I(t) + \mathbf{K}_{I,r}(k_I))}{(\gamma_r - \gamma_{m_I})(k_I - 1)!} \prod_{k=1}^{m_I - 1} \left( \frac{1}{\gamma_k - \gamma_{m_I}} \right)^{n_k} \right\} & r < m_I,\; I \neq N+1, \\[1ex] \dfrac{n_r}{\gamma_r} \pi_{N+1}(t) - n_r \displaystyle\prod_{k=1}^{M} \gamma_k^{n_k} \times \left\{ e^{-\gamma_r t} \frac{\det \tilde{\mathbf{K}}_{N+1,r}(t)}{n_r!} \prod_{\substack{k=1 \\ k \neq r}}^{M+1} \left( \frac{1}{\gamma_k - \gamma_r} \right)^{n_k} + \sum_{\substack{j=1 \\ j \neq r}}^{M+1} e^{-\gamma_j t} \frac{\det(\mathbf{H}_{N+1,j}(t) + \mathbf{K}_{N+1,r}(n_j))}{(\gamma_r - \gamma_j)(n_j - 1)!} \prod_{\substack{k=1 \\ k \neq j}}^{M+1} \left( \frac{1}{\gamma_k - \gamma_j} \right)^{n_k} \right\} & I = N+1, \\[1ex] 0 & \text{otherwise.} \end{cases} \qquad (40)$$

2 Fisher-Rao metric for the shedding models
We first note the straightforwardly-obtained result that if $f(\theta)$ is the pdf of a normal distribution with mean $\mu(\theta)$ and standard deviation $\sigma(\theta)$, then the Fisher-Rao metric as defined in the main text is
$$g_{a,b} = \mathbb{E}_f[\partial_a \ln(f)\, \partial_b \ln(f)] = \frac{1}{\sigma^2}\, \partial_a \mu\, \partial_b \mu + \frac{2}{\sigma^2}\, \partial_a \sigma\, \partial_b \sigma. \qquad (41)$$
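This result can be verified by numerical integration. A Python sketch with illustrative (hypothetical) values of $\mu$ and $\sigma$, for which (41) predicts $g = \operatorname{diag}(1/\sigma^2,\, 2/\sigma^2)$ in the parameters $(\mu, \sigma)$:

```python
# Numerically integrate E_f[d_a log f * d_b log f] for a Gaussian pdf and
# compare with the analytic Fisher-Rao metric in (41).
import numpy as np

mu, sigma = 0.7, 1.9
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200001)
dx = x[1] - x[0]
f = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
d_mu = (x - mu) / sigma ** 2                           # d/dmu log f
d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3    # d/dsigma log f

def expect(h):
    """E_f[h] by a simple Riemann sum on the fine grid."""
    return np.sum(h * f) * dx

assert np.isclose(expect(d_mu ** 2), 1 / sigma ** 2, rtol=1e-5)
assert np.isclose(expect(d_sigma ** 2), 2 / sigma ** 2, rtol=1e-5)
assert abs(expect(d_mu * d_sigma)) < 1e-8              # off-diagonal vanishes
print("Fisher-Rao metric matches (41)")
```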
In the two cases considered, we have a likelihood given by products of such functions $f$ over different times, with
$$\mu(t|\theta) = \tau \sum_{i=1}^{m} \pi_i(t, \gamma) =: \tau \pi(t), \qquad \sigma(t) = \begin{cases} \sigma_t & \text{(given in data) for influenza,} \\ \sigma & \text{(one of the parameters) for norovirus.} \end{cases} \qquad (42)$$
Given the results above for $\pi_i(t)$, we can then write down that for influenza the metric has components
$$g_{\tau,\tau} = \sum_{t \in T} \frac{\pi(t)^2}{\sigma(t)^2}, \qquad g_{i,j} = \sum_{t \in T} \frac{\tau^2}{\sigma(t)^2}\, \partial_i \pi(t)\, \partial_j \pi(t), \qquad g_{\tau,i} = \sum_{t \in T} \frac{\tau\, \pi(t)}{\sigma(t)^2}\, \partial_i \pi(t), \qquad (43)$$
and for norovirus there are the additional elements
$$g_{\sigma,\sigma} = \sum_{t \in T} \frac{2}{\sigma(t)^2}, \qquad g_{\sigma, a \neq \sigma} = 0. \qquad (44)$$
For SMMALA this metric is all that is required, and it has the primary benefit of introducing local second-order derivative information into the MCMC algorithm; for WLMC we additionally exploit the reduction of global distances that a geometric approach makes possible.
3 The WLMC algorithm

3.1 Lagrangian Monte Carlo (LMC)
The geometric approach of [2] involves Hamiltonian dynamics with a discretized integrator, the generalized leapfrog method [6, 2], that requires significant numerical effort, in particular fixed-point iterations to solve implicit equations. This step is potentially computationally intensive (repeated matrix inversion of $G(\theta)$ involves $O(D^{2.373})$ operations in dimension $D$), and can be numerically unstable when certain contraction conditions are not satisfied [3].
To address this issue, Lan et al. [4] propose an explicit integrator for geometric MCMC by using the following Lagrangian dynamics, equivalent to the Euler-Lagrange equation of physics:
$$\frac{d\theta}{dt} = \mathbf{v}, \qquad \frac{d\mathbf{v}}{dt} = G(\theta)^{-1} \partial L - \mathbf{v}^{\mathsf{T}} \Gamma(\theta) \mathbf{v}, \qquad (45)$$
where $\mathbf{v}(0) := G(\theta(0))^{-1} \mathbf{p}(0) \sim \mathcal{N}(0, G(\theta(0))^{-1})$, and $\Gamma(\theta)$ are the Christoffel symbols of the second kind, whose $(i,j,k)$-th element is $\Gamma^k_{ij} = \frac{1}{2} g^{km} (\partial_i g_{mj} + \partial_j g_{im} - \partial_m g_{ij})$, with $g^{km}$ being the $(k,m)$-th element of $G(\theta)^{-1}$.
Geometrically speaking, instead of defining the dynamics on the cotangent bundle as in RHMC, the dynamics are now defined on the tangent bundle, in the same spirit as [1], to avoid large momenta. The following explicit integrator can then be derived:
$$\mathbf{v}^{(\ell+\frac{1}{2})} = \left[ I + \frac{\varepsilon}{2} \Omega(\theta^{(\ell)}, \mathbf{v}^{(\ell)}) \right]^{-1} \left[ \mathbf{v}^{(\ell)} - \frac{\varepsilon}{2} G(\theta^{(\ell)})^{-1} \nabla_\theta \phi(\theta^{(\ell)}) \right], \qquad (46)$$
$$\theta^{(\ell+1)} = \theta^{(\ell)} + \varepsilon \mathbf{v}^{(\ell+\frac{1}{2})}, \qquad (47)$$
$$\mathbf{v}^{(\ell+1)} = \left[ I + \frac{\varepsilon}{2} \Omega(\theta^{(\ell+1)}, \mathbf{v}^{(\ell+\frac{1}{2})}) \right]^{-1} \left[ \mathbf{v}^{(\ell+\frac{1}{2})} - \frac{\varepsilon}{2} G(\theta^{(\ell+1)})^{-1} \nabla_\theta \phi(\theta^{(\ell+1)}) \right], \qquad (48)$$
where $\Omega(\theta^{(\ell)}, \mathbf{v}^{(\ell)})_{kj} := (v^{(\ell)})^i\, \Gamma(\theta^{(\ell)})^k_{ij}$. This integrator is time reversible but not volume preserving, so the acceptance probability is adjusted to make the detailed balance condition hold [4]:
$$\alpha = \min\left\{ 1,\; \exp\left( -E(\mathbf{z}^{(L+1)}) + E(\mathbf{z}^{(1)}) \right) \left| \frac{d\mathbf{z}^{(L+1)}}{d\mathbf{z}^{(1)}} \right| \right\}, \qquad (49)$$
where the Jacobian determinant is
$$\left| \frac{d\mathbf{z}^{(\ell+1)}}{d\mathbf{z}^{(\ell)}} \right| = \frac{\det(I - \frac{\varepsilon}{2}\Omega(\theta^{(\ell+1)}, \mathbf{v}^{(\ell+1)}))\, \det(I - \frac{\varepsilon}{2}\Omega(\theta^{(\ell)}, \mathbf{v}^{(\ell+1/2)}))}{\det(I + \frac{\varepsilon}{2}\Omega(\theta^{(\ell+1)}, \mathbf{v}^{(\ell+1/2)}))\, \det(I + \frac{\varepsilon}{2}\Omega(\theta^{(\ell)}, \mathbf{v}^{(\ell)}))}, \qquad (50)$$
and $E(\mathbf{z})$ is the energy for the Lagrangian dynamics, defined as
$$E(\theta, \mathbf{v}) = -\log \pi(\theta|D) - \frac{1}{2} \log \det G(\theta) + \frac{1}{2} \mathbf{v}^{\mathsf{T}} G(\theta) \mathbf{v}. \qquad (51)$$
The resulting algorithm, Lagrangian Monte Carlo (LMC), is a valid exact sampler and has the same strength in exploring complex geometry as RHMC; LMC has been shown to be more efficient and stable than RHMC. For more details see [3, 4].
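The explicit integrator (46)-(48) can be exercised on a toy problem to confirm its time reversibility. The sketch below uses an illustrative two-dimensional target and metric, not the models of the paper: $\phi(\theta) = \frac{1}{2}|\theta|^2$ and $G(\theta) = \operatorname{diag}(1 + \theta_1^2, 1)$, for which the only non-zero Christoffel symbol is $\Gamma^1_{11} = \theta_1/(1 + \theta_1^2)$:

```python
# One LMC step per (46)-(48); integrating forward, flipping the velocity and
# integrating back should recover the starting point exactly.
import numpy as np

def G_inv(th):
    return np.diag([1.0 / (1.0 + th[0] ** 2), 1.0])

def grad_phi(th):
    return th                                   # phi(theta) = 0.5 |theta|^2

def Omega(th, v):
    """Omega_{kj} = v^i Gamma^k_{ij} for the illustrative metric."""
    Om = np.zeros((2, 2))
    Om[0, 0] = v[0] * th[0] / (1.0 + th[0] ** 2)
    return Om

def lmc_step(th, v, eps):
    I = np.eye(2)
    v_half = np.linalg.solve(I + 0.5 * eps * Omega(th, v),
                             v - 0.5 * eps * G_inv(th) @ grad_phi(th))
    th_new = th + eps * v_half
    v_new = np.linalg.solve(I + 0.5 * eps * Omega(th_new, v_half),
                            v_half - 0.5 * eps * G_inv(th_new) @ grad_phi(th_new))
    return th_new, v_new

th0, v0 = np.array([0.8, -0.3]), np.array([0.5, 1.1])
th, v = th0.copy(), v0.copy()
for _ in range(25):
    th, v = lmc_step(th, v, 0.05)
th_b, v_b = th, -v                              # flip velocity, integrate back
for _ in range(25):
    th_b, v_b = lmc_step(th_b, v_b, 0.05)
assert np.allclose(th_b, th0, atol=1e-10) and np.allclose(-v_b, v0, atol=1e-10)
print("integrator is time reversible")
```

Within a full sampler this proposal would be accepted or rejected with the probability (49), including the Jacobian correction (50), since the map is not volume preserving.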
3.2 WLMC

When the target distribution is multi-modal, energy-based sampling algorithms tend to fail, as they are easily trapped in some of the modes without visiting all of them. Making proposals by numerically simulating Hamiltonian dynamics, the sampler has difficulty passing high energy barriers (low probability regions) [7]. Compared to HMC, the geometric RHMC/LMC methods perform even worse in this respect, because they are more adapted to the local geometry and so more likely to be trapped in one mode.
To overcome this issue, some kind of global knowledge of the distribution needs to be learned and incorporated. Lan et al. [5] proposed the idea of wormholes for these geometric algorithms (HMC/RHMC/LMC) to work on multi-modal distributions. The proposed method comes in two parts: a distance-shortening metric and a mode-jumping mechanism.
3.3 Wormhole Metric

Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two modes of the target distribution. We define a straight line segment, $\mathbf{v}_W := \hat{\theta}_2 - \hat{\theta}_1$, and refer to a small neighborhood (tube) of the line segment as a wormhole. Next, we define a wormhole metric, $G_W(\theta)$, in the vicinity of the wormhole. For a pair of tangent vectors $\mathbf{u}, \mathbf{w}$ at $\theta$, the wormhole metric $G_W$ is defined as follows:
$$G^*_W(\mathbf{u}, \mathbf{w}) := \langle \mathbf{u} - \langle \mathbf{u}, \mathbf{v}^*_W \rangle \mathbf{v}^*_W,\; \mathbf{w} - \langle \mathbf{w}, \mathbf{v}^*_W \rangle \mathbf{v}^*_W \rangle = \mathbf{u}^{\mathsf{T}} [I - \mathbf{v}^*_W (\mathbf{v}^*_W)^{\mathsf{T}}] \mathbf{w}, \qquad (52)$$
$$G_W = G^*_W + \epsilon\, \mathbf{v}^*_W (\mathbf{v}^*_W)^{\mathsf{T}} = I - (1 - \epsilon)\, \mathbf{v}^*_W (\mathbf{v}^*_W)^{\mathsf{T}}, \qquad (53)$$
where $\mathbf{v}^*_W = \mathbf{v}_W / \|\mathbf{v}_W\|$, and $0 < \epsilon \ll 1$ is a small positive number. To see that $G_W$ in fact shortens the distance between $\hat{\theta}_1$ and $\hat{\theta}_2$, consider the simple case of a straight line: $\theta(t) = \hat{\theta}_1 + \mathbf{v}_W t$, $t \in [0, 1]$. In this case, the distance under $G_W$ is
$$\operatorname{dist}(\hat{\theta}_1, \hat{\theta}_2) = \int_0^1 \sqrt{\mathbf{v}_W^{\mathsf{T}} G_W \mathbf{v}_W}\; dt = \sqrt{\epsilon}\, \|\mathbf{v}_W\| \ll \|\mathbf{v}_W\|,$$
which is much smaller than the Euclidean distance.
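The distance-contraction property can be confirmed numerically. A Python sketch with two illustrative modes:

```python
# Build G_W per (53) and check that the speed along the segment between the
# modes is sqrt(eps) * ||v_W||, so the path length contracts by sqrt(eps).
import numpy as np

theta1 = np.array([0.0, 0.0, 0.0])
theta2 = np.array([3.0, -1.0, 2.0])
vW = theta2 - theta1
vWs = vW / np.linalg.norm(vW)
eps = 0.01
GW = np.eye(3) - (1 - eps) * np.outer(vWs, vWs)

# Along theta(t) = theta1 + vW t the integrand sqrt(vW^T GW vW) is constant,
# so the path length over t in [0, 1] is just that constant speed.
dist = np.sqrt(vW @ GW @ vW)
assert np.isclose(dist, np.sqrt(eps) * np.linalg.norm(vW))
print(dist, np.linalg.norm(vW))   # wormhole distance vs Euclidean distance
```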
Next, we define the overall metric, $G$, for the whole parameter space of $\theta$ as a weighted sum of the base metric $G_0$ and the wormhole metric $G_W$,
$$G(\theta) = (1 - m(\theta))\, G_0(\theta) + m(\theta)\, G_W, \qquad (54)$$
where $m(\theta) \in (0, 1)$ is a mollifying function designed to make the wormhole metric $G_W$ influential in the vicinity of the wormhole only.
3.4 Wormhole Network

For more than two modes, the above method alone could suffer from two potential shortcomings in higher dimensions. First, the effect of the wormhole metric could diminish quickly as the sampler leaves one mode for another. Secondly, such a mechanism, which modifies the dynamics in the existing parameter space, could interfere with the native dynamics in the neighborhood of a wormhole, possibly preventing the sampler from properly exploring areas around the modes as well as some low probability regions.
To address the first issue, we add an external vector field to enforce movement between modes. More specifically, we define a vector field, $f(\theta, \mathbf{v})$, in terms of the position parameter $\theta$ and the velocity vector $\mathbf{v} = G(\theta)^{-1} \mathbf{p}$, as follows:
$$f(\theta, \mathbf{v}) := \exp\{-V(\theta)/(DF)\}\, \langle \mathbf{v}, \mathbf{v}^*_W \rangle \mathbf{v}^*_W = m(\theta)\, \langle \mathbf{v}, \mathbf{v}^*_W \rangle \mathbf{v}^*_W, \qquad (55)$$
with mollifier $m(\theta) := \exp\{-V(\theta)/(DF)\}$, where $D$ is the dimension, $F > 0$ is the influence factor, and $V(\theta)$ is a vicinity function indicating the Euclidean distance from the line segment $\mathbf{v}_W$,
$$V(\theta) := \langle \theta - \hat{\theta}_1, \theta - \hat{\theta}_2 \rangle + |\langle \theta - \hat{\theta}_1, \mathbf{v}^*_W \rangle|\, |\langle \theta - \hat{\theta}_2, \mathbf{v}^*_W \rangle|. \qquad (56)$$
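A quick numerical check that the vicinity function vanishes exactly on the segment between the modes and is positive off it (illustrative modes):

```python
# V(theta) from (56): zero on the wormhole segment, positive away from it.
import numpy as np

th1 = np.array([0.0, 0.0])
th2 = np.array([4.0, 2.0])
vWs = (th2 - th1) / np.linalg.norm(th2 - th1)

def V(th):
    a, b = th - th1, th - th2
    return a @ b + abs(a @ vWs) * abs(b @ vWs)

for t in (0.0, 0.25, 0.5, 1.0):                  # points on the segment
    assert abs(V(th1 + t * (th2 - th1))) < 1e-12
off = th1 + 0.5 * (th2 - th1) + np.array([-1.0, 2.0])   # off the segment
assert V(off) > 0
print("V vanishes on the wormhole segment")
```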
After adding the vector field, we modify the Hamiltonian/Lagrangian dynamics governing the evolution of $\theta$ as follows:
$$\dot{\theta} = \mathbf{v} + f(\theta, \mathbf{v}). \qquad (57)$$
To address the second issue, we allow the wormholes to pass through an extra auxiliary dimension to avoid their interference with the existing dynamics in the given parameter space. In particular, we introduce an auxiliary variable $\theta_{D+1} \sim \mathcal{N}(0, 1)$ corresponding to an auxiliary dimension. We use $\tilde{\theta} := (\theta, \theta_{D+1})$ to denote the position parameters in the resulting $D+1$ dimensional space $\mathcal{M}^D \times \mathbb{R}$. $\theta_{D+1}$ can be viewed as random noise independent of $\theta$, and contributes $\frac{1}{2}\theta_{D+1}^2$ to the total potential energy. Correspondingly, we augment the velocity $\mathbf{v}$ with one extra dimension, denoted $\tilde{\mathbf{v}} := (\mathbf{v}, v_{D+1})$. At the end of the sampling, we project $\tilde{\theta}$ to the original parameter space and discard $\theta_{D+1}$. We refer to $\mathcal{M}^D \times \{-h\}$ as the real world, and call $\mathcal{M}^D \times \{+h\}$ the mirror world. Here, $h$ is half of the distance between the two worlds, and it should be on the same scale as the
Figure 1: Illustrating a wormhole network connecting the real world to the mirror world ($h = 1$). As an example, the cylinder shows a wormhole connecting mode 5 in the real world to its mirror image. The dashed lines show two sets of wormholes: the red lines show the wormholes when the sampler is close to mode 1 in the real world, and the magenta lines show the wormholes when the sampler is close to mode 5 in the mirror world.
average distance between the modes. For most of the examples discussed here, we set h = 1.
Figure 1 illustrates how the two worlds are connected by networks of wormholes.
One can refer to [5] for full algorithmic details of Wormhole HMC/LMC, including the
case where the modes are initially unknown.
References

[1] A. Beskos, F. J. Pinski, J. M. Sanz-Serna, and A. M. Stuart. Hybrid Monte Carlo on Hilbert spaces. Stochastic Processes and their Applications, 121(10):2201-2230, 2011.

[2] M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 73(2):123-214, 2011.

[3] S. Lan. Advanced Bayesian computational methods through geometric techniques. PhD thesis, Long Beach, CA, USA, AAI3605182, 2013.

[4] S. Lan, V. Stathopoulos, B. Shahbaba, and M. Girolami. Markov chain Monte Carlo from Lagrangian dynamics. Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2014.902764, 2014.

[5] S. Lan, J. Streets, and B. Shahbaba. Wormhole Hamiltonian Monte Carlo. AAAI Conference on Artificial Intelligence, 2014.

[6] B. Leimkuhler and S. Reich. Simulating Hamiltonian Dynamics. Cambridge University Press, 2005.

[7] C. Sminchisescu and M. Welling. Generalized darting Monte Carlo. Pattern Recognition, 44(10-11):2738-2748, 2011.