MASDOC CDT, University of Warwick

Efficient sequential Monte Carlo sampling of rare trajectories in reverse time

Jere Koskela*,†, Paul A. Jenkins†,‡ and Dario Spanò†
* Mathematics Institute, † Department of Statistics and ‡ Department of Computer Science, University of Warwick
Mail: masdoc.info@warwick.ac.uk
WWW: http://www2.warwick.ac.uk/fac/sci/masdoc/

Introduction

Sequential Monte Carlo (SMC) is a technique for sampling from a sequence of complicated distributions of increasing dimension, each known pointwise up to a normalising constant [DDFG01]. A "cloud" of weighted particles is extended from one distribution to the next by a combination of sequential importance sampling and resampling. Each set of weighted particles forms an empirical approximation of the corresponding distribution in the sequence.

Given an ensemble of weighted particles at time $t$, $\{w_i(t), x_i(t)\}_{i=1}^N$, a proposal distribution $q_t(x, \cdot)$ and a target $\pi_{t+1}(\cdot)$, the ensemble at time $t + 1$ is obtained by repeating the following for $i \in \{1, \ldots, N\}$:

1. Resample an ancestral index $a_i \sim \mathrm{Categorical}(w_1(t), \ldots, w_N(t))$.
2. Propagate $x_i(t + 1) \sim q_t(x_{a_i}(t), \cdot)$.
3. Set $w_i(t + 1) = \pi_{t+1}(x_i(t + 1)) / q_t(x_{a_i}(t), x_i(t + 1))$.

Figure 1: Two steps of an SMC algorithm, with target densities in green and particle locations in red. Particle sizes are proportional to weights. Axes: iteration; density.

The choice of the proposal family $\{q_t(\cdot, \cdot)\}_{t \geq 1}$ largely determines the efficiency of the algorithm. This poster explores the use of proposal distributions progressing in reverse time to sample certain classes of rare events.

Time-reversal of Markov chains

Given a time-homogeneous Markov chain $X := \{X_i\}_{i=1}^{\tau}$ with state space $\Omega := \prod_{i=1}^{\infty} E_i$, initial distribution $\mu$, transition density $p(x, y)$ and a.s. finite life time $\tau$, the time-reversal is given by Nagasawa's formula [RW94]:

$$ \tilde{p}(x, y) = \frac{G(\mu, y)}{G(\mu, x)} \, p(y, x), \tag{1} $$

where $G(\mu, x) := \mathbb{E}_\mu\left[ \sum_{i=1}^{\tau} \mathbf{1}_{\{x\}}(X_i) \right]$ is the Green's function.

Rare trajectories

The rare trajectories under consideration are specified by three sets:

- An initial set $\mathcal{I} \subset \Omega$, with $\mu(\mathcal{I}) = 1$.
- A target set $\mathcal{T} \subset \Omega$.
- An overshoot set $\mathcal{R} \subset \Omega$ which is hit by $X$ with probability 1.

The aim is to sample trajectories started in $\mathcal{I}$ which hit $\mathcal{T}$ before reaching $\mathcal{R}$.

The reverse-time method is advantageous when

1. $\mathcal{I}$ is large in the sense of Lebesgue volume or dimension.
2. $\mathcal{T}$ has small probability under the forward dynamics.
3. $\mathcal{T}$ is small in the sense of Lebesgue volume or dimension.

The ideal setting for a reverse-time algorithm is when $\mathcal{T}$ is a null set. Point 2 ensures that unconditioned reverse-time dynamics mimic forwards-time bridges between $\mathcal{I}$ and $\mathcal{T}$ by rapidly moving from $\mathcal{T}$ to $\mathcal{I}$ in reverse time. Point 3 ensures that the reverse-time algorithm does not have to integrate over an expensive "initial" condition. The opposite of 1 and 3 would hold for a forwards-time algorithm. Thus, reverse-time SMC is best seen as complementary to existing approaches.

Design of proposal distributions

The reverse-time dynamics in (1) would be a highly efficient proposal distribution if they could be simulated, and if $\tilde{p}$ could be evaluated. However, the Green's function is typically at least as difficult to compute as the probability of interest. Progress can be made by approximating $G$, and defining a proposal based on the approximated Green's function $\hat{G}$. This can also lead to a convenient reduction in dimensionality.

Let $\tau_A$ denote the hitting time of a set $A$, and consider a transition of $X$ from $x_{i-1} = (z, y)$ to $x_i = (z, \bar{y})$. Assume that the conditional sampling distribution (CSD) $\pi(x | z) := \mathbb{P}_\mu(Y_i = x | Z_i = z)$ is independent of $i \in \mathbb{N}$ for $\mathbb{P}_\mu$-almost every $z$. Then the ratio of Green's functions in (1) cancels to the ratio of CSDs:

$$ \frac{G(\mu, (z, y))}{G(\mu, (z, \bar{y}))} = \frac{\pi(y | z)}{\pi(\bar{y} | z)}. $$

CSDs can be of substantially lower dimension than $G$; indeed $\pi(y | z)$ can even be a one-dimensional family if the dynamics of $X$ only update one coordinate at a time. However, $\pi(y | z)$ is still typically intractable. Introducing an application-specific, low-dimensional approximation $\hat{\pi}(y | z)$ yields an implementable algorithm. Note that the independence assumption above does not have to hold: it is possible to define a proposal distribution via a ratio of CSDs regardless, albeit at a cost in computational efficiency.

The discretised hyperbolic diffusion

The hyperbolic diffusion [BN78] is the solution of the SDE

$$ dX_t = -\frac{X_t}{\sqrt{1 + X_t^2}} \, dt + dW_t. \tag{2} $$

Its transition probabilities are intractable, but its stationary distribution $\pi$ is known:

$$ \pi(x) = \frac{1}{2 K_1(1)} e^{-\sqrt{1 + x^2}}, \tag{3} $$

where $K_1$ is the modified Bessel function of the second kind. We estimate the hitting probability of a level $b > 0$ before returning to 0, given $X_0 = 0$. The sets defining the problem are

$$ \mathcal{I} = \{0\}, \quad \mathcal{T} = \{b\}, \quad \mathcal{R} = (-\infty, 0]. $$

We consider a discretisation of (2) and use the Euler scheme with grid spacing $\Delta$ to define the family of transition densities:

$$ p_\Delta(x, y) = \frac{1}{\sqrt{2 \pi \Delta}} \exp\left( -\frac{1}{2 \Delta} \left( y - x \left( 1 - \frac{\Delta}{\sqrt{1 + x^2}} \right) \right)^2 \right). \tag{4} $$

We use (4) and (3) to define a discretised reverse-time proposal:

$$ \hat{p}_\Delta(y, x) \propto \left( \int_{-\infty}^{0} \frac{\pi(z)}{\pi(y)} p_\Delta(z, y) \, dz \right) \delta_0(x) + \frac{\pi(x)}{\pi(y)} p_\Delta(x, y) \mathbf{1}_{[0, b)}(x), $$

which can be normalised numerically and sampled by proposing $x \left( 1 - \Delta / \sqrt{1 + x^2} \right)$ from the $N(y, \Delta)$ proposal distribution, solving for $x$ and accepting with probability $e^{-\sqrt{1 + x^2}}$.

Figure 2: Estimated hitting probabilities with effective sample sizes and run times (Intel i5-2520M 2.5 GHz) with 500 particles and $\Delta = 0.01$. Axes: barrier $b$; log (base 10) hitting probability.

The asynchronous transfer mode network

The ATM network [GHSZ99] consists of $n$ sources which are either on or off. Sources which are off do nothing, while sources which are on each generate packets at rate $\lambda$. Packets are serviced by a common server with rate $\mu$ using the first-in-first-out policy. Off sources turn on at rate $\alpha_0$ and on sources turn off at rate $\alpha_1$. The state of the system is specified as $(i, j) \in \mathbb{N}_0 \times \{0, \ldots, n\}$, where $i$ denotes the number of packets in the queue and $j$ the number of on sources. We estimate the probability of an initially empty queue hitting a level $b$ before emptying, with exactly $k$ sources on at the hitting time.

The sets defining the problem are

$$ \mathcal{I} = \mathcal{R} = \bigcup_{j=0}^{n} \{(0, j)\}, \quad \mathcal{T} = \{(b, k)\}, $$

and the initial distribution is specified as

$$ \mu(\{(0, j)\}) = \binom{n}{j} \left( \frac{\alpha_0}{\alpha_0 + \alpha_1} \right)^{j} \left( \frac{\alpha_1}{\alpha_0 + \alpha_1} \right)^{n - j}. $$

The approximate CSDs of $i$ given $j$ and $j$ given $i$ are defined as

$$ \hat{\pi}_i(i | j) \propto \left( \frac{\lambda j}{\mu} \right)^{i}, \qquad \hat{\pi}_j(j | i) \propto \hat{\pi}_i(i | j) \binom{n}{j} \left( \frac{\alpha_0}{\alpha_0 + \alpha_1} \right)^{j} \left( \frac{\alpha_1}{\alpha_0 + \alpha_1} \right)^{n - j} $$

for $i \in \{0, \ldots, b\}$ and $j \in \{0, \ldots, n\}$. These are the correct CSDs for a model with the conditioned-upon variable fixed at its current value.

Figure 3: Estimated hitting probabilities and effective sample sizes with 50 000 particles, $n = 20$, $b = 30$, $\lambda = 0.5$, $\mu = 10.0$, $\alpha_0 = 1.0$ and $\alpha_1 = 3.0$. Run times were approximately 2 min 10 sec per value of $k$ (Intel i5-2520M 2.5 GHz). Axes: number of terminal on-sources; log (base 10) hitting probability.

The spatial Λ-coalescent

The spatial Λ-coalescent [BEV10] is a model of the genetic ancestry of a population in a continuous geography, $\mathbb{T}$. Individual lineages occupy fixed positions $x_i \in \mathbb{T}$, and evolution is driven by a Poisson process $\Pi$ of extinction-recolonisation events occurring on $\mathbb{R}_+ \times \mathbb{T}$ at rate $dt \otimes dz$. At each $(t, z) \in \Pi$, every lineage with $x_i \in B_r(z)$, i.e. within distance $r > 0$ of the event location $z$, participates in the event with probability $u \in (0, 1]$. Participating lineages coalesce to a common ancestor whose location is sampled uniformly from $B_r(z)$. A one-dimensional example is shown in Figure 4.

Figure 4: Example realisation of a spatial Λ-coalescent in one dimension. Lineages 1 and 3 merge in event A to form lineage 5. Lineage 4 jumps in event B, and merges with lineage 5 in event C to form lineage 6. Lineage 2 does not participate in event A, but merges with lineage 6 in event D to form lineage 7: the most recent common ancestor.

Once the most recent common ancestor has been reached, a mutation process on a state space $H$ can be run along the edges of the realised tree. For concreteness we assume there are finitely many types identified with $H = \{1, \ldots, d\}$, a mutation rate $\theta > 0$ and a transition matrix $M$ on $\{1, \ldots, d\}$ with a unique stationary distribution $m$. Then the type of the most recent common ancestor is sampled from $m$, and mutations take place along ancestral edges with rate $\theta$ and transitions in type sampled from $M$.

The coalescent dynamics depicted in Figure 4 can be viewed as a particle system growing from the most recent common ancestor [VW15]. The likelihood of an observed configuration of types $h^* = (h_1, \ldots, h_n) \in \{1, \ldots, d\}^n$ at locations $z^* = (z_1, \ldots, z_n) \in \mathbb{T}^n$ corresponds to the probability of the particle system hitting the observed data: an ideal problem for the reverse-time algorithm since the terminal state is a null set. The sets defining the problem are

$$ \mathcal{I} = \{(z, h) : z \in \mathbb{T}, h \in H\}, \quad \mathcal{T} = \{(z^*, h^*)\}, \quad \mathcal{R} = \bigcup_{k=n+1}^{\infty} \{((z_1, \ldots, z_k), (h_1, \ldots, h_k)) \in \mathbb{T}^k \times H^k\}, $$

and the CSDs can be approximated using standard heuristics [KSJ16].

Figure 5: Simulated likelihood surfaces, effective sample sizes and run times (12 cores on the MidPlus cluster Minerva) for the parameters of the spatial Λ-coalescent, with other parameters assumed known and using 4 million particles. Panels: theta, r and u, each against log (base 10) likelihood.

Acknowledgements and References

JK is supported by EPSRC as part of the MASDOC DTC, grant EP/H023364/1. PJ is supported in part by EPSRC grant EP/L018497/1. The MidPlus cluster Minerva is provided by EPSRC grant EP/K000128/1.

[BEV10] N. H. Barton, A. M. Etheridge, and A. Véber. A new model for evolution in a spatial continuum. Electron. J. Probab., 15(7):162–216, 2010.
[BN78] O. Barndorff-Nielsen. Hyperbolic distributions and distributions on hyperbolae. Scand. J. Statist., 5:151–157, 1978.
[DDFG01] A. Doucet, J. F. G. De Freitas, and N. J. Gordon. Sequential Monte Carlo methods in practice. Springer, New York, 2001.
[GHSZ99] P. Glasserman, P. Heidelberger, P. Shahabuddin, and T. Zajic. Multi-level splitting for estimating rare event probabilities. Oper. Res., 47:585–600, 1999.
[KSJ16] J. Koskela, D. Spanò, and P. A. Jenkins. Inference and rare event simulation for stopped Markov processes via reverse-time sequential Monte Carlo. Preprint, arXiv:1603.02834, 2016.
[RW94] L. C. G. Rogers and D. Williams. Diffusions, Markov processes and martingales, volume 1. Wiley, 2nd edition, 1994.
[VW15] A. Véber and A. Wakolbinger. The spatial Lambda-Fleming-Viot process: an event-based construction and a lookdown representation. Ann. Inst. H. Poincaré Probab. Statist., 51(2), 2015.
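
The three-step update listed in the Introduction (resample, propagate, reweight) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `smc_step` and `gaussian_pdf` are hypothetical, particles are taken to be scalars, and the weight is the plain ratio of target to proposal exactly as written in step 3.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x (hypothetical helper for the usage note)."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def smc_step(rng, particles, weights, propose, proposal_density, target_density):
    """One resample-propagate-reweight step for N scalar particles.

    propose(rng, parents) draws new states given parent states;
    proposal_density(parents, children) and target_density(children)
    evaluate q and the (possibly unnormalised) next target pointwise.
    """
    n = len(particles)
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()
    # Step 1: ancestral indices a_i ~ Categorical(w_1, ..., w_N).
    ancestors = rng.choice(n, size=n, p=probs)
    parents = np.asarray(particles, dtype=float)[ancestors]
    # Step 2: propagate each particle from its resampled ancestor.
    children = propose(rng, parents)
    # Step 3: unnormalised weight = target / proposal.
    new_weights = target_density(children) / proposal_density(parents, children)
    return children, new_weights
```

A possible call uses a Gaussian random-walk proposal, `propose = lambda rng, p: p + rng.normal(size=p.shape)`, with `proposal_density = lambda p, y: gaussian_pdf(y, p, 1.0)`; both are illustrative choices rather than anything prescribed by the poster.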
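
The cancellation of Green's functions into CSDs stated in "Design of proposal distributions" can be traced in a few lines. The following is a heuristic sketch for a discrete state space, not a proof taken from the poster: it writes $X_i = (Z_i, Y_i)$, which is an assumed notation, and reads every probability as restricted to the event $\{i \leq \tau\}$.

```latex
\begin{align*}
G(\mu, (z, y))
  &= \mathbb{E}_\mu\Big[ \sum_{i=1}^{\tau} \mathbf{1}_{\{(z, y)\}}(X_i) \Big]
   = \sum_{i \geq 1} \mathbb{P}_\mu(Z_i = z, Y_i = y) \\
  &= \sum_{i \geq 1} \mathbb{P}_\mu(Y_i = y \mid Z_i = z) \, \mathbb{P}_\mu(Z_i = z)
   = \pi(y \mid z) \sum_{i \geq 1} \mathbb{P}_\mu(Z_i = z).
\end{align*}
```

The last step uses the assumption that the CSD does not depend on $i$. The remaining sum depends on $z$ only, so it cancels in the ratio $G(\mu, (z, y)) / G(\mu, (z, \bar{y}))$ and leaves $\pi(y | z) / \pi(\bar{y} | z)$.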
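
For comparison with the reverse-time approach, the Euler transition density (4) and a naive forwards-time estimator of the hitting probability for the hyperbolic diffusion example can be sketched as below. This is deliberately not the reverse-time algorithm of the poster: it is a crude Monte Carlo baseline, the function names are hypothetical, and the chain is treated as having hit the level once it first reaches $[b, \infty)$, which is an assumption of this sketch.

```python
import numpy as np

def euler_mean(x, delta):
    """Mean of one Euler step from x, i.e. x * (1 - delta / sqrt(1 + x^2))."""
    return x * (1.0 - delta / np.sqrt(1.0 + x ** 2))

def euler_density(x, y, delta):
    """Transition density p_delta(x, y) of equation (4)."""
    resid = y - euler_mean(x, delta)
    return np.exp(-resid ** 2 / (2.0 * delta)) / np.sqrt(2.0 * np.pi * delta)

def forward_hitting_estimate(rng, b, delta, n_paths, max_steps=1_000_000):
    """Fraction of forward paths from X_0 = 0 reaching [b, inf) before (-inf, 0]."""
    x = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    hits = 0
    for _ in range(max_steps):
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break
        # Advance every surviving path by one Euler step.
        x[idx] = euler_mean(x[idx], delta) + np.sqrt(delta) * rng.standard_normal(idx.size)
        hit = x[idx] >= b
        returned = x[idx] <= 0.0
        hits += int(hit.sum())
        # Stop paths that reached the level or fell back to the overshoot set.
        alive[idx[hit | returned]] = False
    return hits / n_paths
```

For large $b$ almost no forward path reaches the level, so the estimate is zero with high probability unless `n_paths` is enormous; this is the regime in which the poster proposes simulating in reverse time instead.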
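
The initial distribution and the approximate CSDs of the ATM network example can be evaluated numerically along the following lines. This is a sketch under assumptions rather than the authors' code: the function names are hypothetical, and since the text defines both CSDs only up to proportionality, the sketch assumes that $\hat{\pi}_i(\cdot | j)$ is normalised over $i \in \{0, \ldots, b\}$ before it enters $\hat{\pi}_j(j | i)$.

```python
import numpy as np
from math import comb

def initial_distribution(n, alpha0, alpha1):
    """Binomial(n, alpha0 / (alpha0 + alpha1)) law of the number of on sources."""
    p_on = alpha0 / (alpha0 + alpha1)
    j = np.arange(n + 1)
    binom = np.array([comb(n, k) for k in j], dtype=float)
    return binom * p_on ** j * (1.0 - p_on) ** (n - j)

def csd_queue_given_sources(j, b, lam, mu):
    """Approximate CSD of queue length i in {0, ..., b} given j on sources."""
    i = np.arange(b + 1)
    unnorm = (lam * j / mu) ** i  # j = 0 gives a point mass at i = 0
    return unnorm / unnorm.sum()

def csd_sources_given_queue(i, n, b, lam, mu, alpha0, alpha1):
    """Approximate CSD of on sources j in {0, ..., n} given queue length i."""
    binomial = initial_distribution(n, alpha0, alpha1)
    queue_term = np.array(
        [csd_queue_given_sources(j, b, lam, mu)[i] for j in range(n + 1)]
    )
    unnorm = queue_term * binomial
    return unnorm / unnorm.sum()
```

With the Figure 3 parameters, `csd_sources_given_queue(30, 20, 30, 0.5, 10.0, 1.0, 3.0)` returns a probability vector over $j \in \{0, \ldots, 20\}$ that places no mass on $j = 0$, since a full queue cannot arise with every source off under this approximation.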