Efficient sequential Monte Carlo sampling of rare trajectories in reverse time
Jere Koskela∗,†, Paul A. Jenkins†,‡ and Dario Spanò†
* Mathematics Institute, † Department of Statistics and ‡ Department of Computer Science, University of Warwick
Introduction
The spatial Λ-coalescent
The discretised hyperbolic diffusion
Sequential Monte Carlo (SMC) is a technique for sampling from a sequence
of complicated distributions of increasing dimension and known pointwise
up to a normalising constant [DDFG01]. A “cloud” of weighted particles is
extended from one distribution to the next by a combination of sequential
importance sampling and resampling. At each step, the weighted particles form
an empirical approximation of the current target distribution.
Given an ensemble of weighted particles at time t, $\{(w_i(t), x_i(t))\}_{i=1}^{N}$, a proposal distribution $q_t(x, \cdot)$ and a target $\pi_{t+1}(\cdot)$, the ensemble at time t + 1 is
obtained by repeating the following for i ∈ {1, . . . , N}:
1. Resample an ancestral index $a_i \sim \mathrm{Categorical}(w_1(t), \dots, w_N(t))$.
2. Propagate $x_i(t+1) \sim q_t(x_{a_i}(t), \cdot)$.
3. Set $w_i(t+1) = \pi_{t+1}(x_i(t+1)) / q_t(x_{a_i}(t), x_i(t+1))$.
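The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the toy Gaussian proposal and target are assumptions made for the example, and the target density is only needed up to a normalising constant.

```python
import math
import random

def smc_step(particles, weights, propose, proposal_density, target_density, rng):
    """One resample-propagate-reweight step for N weighted particles."""
    n = len(particles)
    # 1. Resample ancestral indices in proportion to the current weights.
    ancestors = rng.choices(range(n), weights=weights, k=n)
    new_particles, new_weights = [], []
    for a in ancestors:
        parent = particles[a]
        # 2. Propagate the chosen ancestor through the proposal q.
        child = propose(parent, rng)
        # 3. Reweight: target density divided by proposal density.
        new_particles.append(child)
        new_weights.append(target_density(child) / proposal_density(parent, child))
    return new_particles, new_weights

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Toy example (hypothetical): all particles start at 0, the proposal is N(x, 1)
# and the unnormalised target is exp(-y^2 / 2).
rng = random.Random(1)
particles, weights = smc_step(
    [0.0] * 200,
    [1.0] * 200,
    propose=lambda x, r: r.gauss(x, 1.0),
    proposal_density=lambda x, y: normal_pdf(y, x, 1.0),
    target_density=lambda y: math.exp(-0.5 * y * y),
    rng=rng,
)
```

In this toy case the proposal from 0 already matches the target, so every weight equals the missing normalising constant, the square root of 2π. With a poorly matched proposal the weights would vary widely, which is the inefficiency that the choice of proposal family is meant to control.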
The hyperbolic diffusion [BN78] is the solution of the SDE
\[ dX_t = -\frac{X_t}{\sqrt{1 + X_t^2}}\,dt + dW_t. \tag{2} \]
Its transition probabilities are intractable, but its stationary distribution π is
known:
\[ \pi(x) = \frac{1}{2K_1(1)}\, e^{-\sqrt{1+x^2}}, \tag{3} \]
where K1 is the modified Bessel function of the second kind. We estimate
the hitting probability of a level b > 0 before returning to 0, given X0 = 0.
The sets defining the problem are
I = {0}, T = {b}, R = (−∞, 0].
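For comparison, a naive forwards-time estimator of this hitting probability can be sketched as below. It is an illustration under stated assumptions rather than the method of the poster: the path is simulated with an Euler scheme of step `delta`, and stepping past b is counted as hitting T because a discretised path never lands on b exactly. The function names are hypothetical.

```python
import math
import random

def hits_level_first(b, delta, rng):
    """Simulate one Euler path from X0 = 0; True if it reaches b before (-inf, 0]."""
    x = 0.0
    while True:
        drift = -x / math.sqrt(1.0 + x * x)
        x = x + drift * delta + rng.gauss(0.0, math.sqrt(delta))
        if x <= 0.0:
            return False  # the path has entered the overshoot set R
        if x >= b:
            return True   # the path has reached (or stepped over) the level b

def forward_estimate(b, delta, n_paths, seed=0):
    """Fraction of simulated paths that reach b before returning to R."""
    rng = random.Random(seed)
    hits = sum(hits_level_first(b, delta, rng) for _ in range(n_paths))
    return hits / n_paths

estimate = forward_estimate(b=0.5, delta=0.01, n_paths=2000, seed=1)
```

For a low barrier such as b = 0.5 a few thousand paths give a usable estimate. As b grows, almost every forward path returns to R first, so the number of paths needed grows with the rarity of the event.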
The reverse-time proposal $\hat{p}_\Delta$ built from (3) and (4) can be normalised numerically and sampled by proposing $x\left(1 - \Delta/\sqrt{1+x^2}\right)$ from the N(y, ∆) proposal distribution, solving for x and accepting with probability $e^{-\sqrt{1+x^2}}$.
The coalescent dynamics depicted in Figure 4 can be viewed as a particle
system growing from the most recent common ancestor [VW15]. The likelihood of an observed configuration of types $h^* = (h_1, \dots, h_n) \in \{1, \dots, d\}^n$
at locations $z^* = (z_1, \dots, z_n) \in \mathbb{T}^n$ corresponds to the probability of the
particle system hitting the observed data: an ideal problem for the reverse-time
algorithm since the terminal state is a null set. The sets defining the
problem are
\[ I = \{(z, h) : z \in \mathbb{T},\ h \in H\}, \qquad T = \{(z^*, h^*)\}, \qquad R = \bigcup_{k=n+1}^{\infty} \{((z_1, \dots, z_k), (h_1, \dots, h_k)) \in \mathbb{T}^k \times H^k\}, \]
and the CSDs can be approximated using standard heuristics [KSJ16].
Figure 1: Two steps of an SMC algorithm, with target densities in green and particle
locations in red. Particle sizes are proportional to weights.
The choice of the proposal family {qt (·, ·)}t≥1 largely determines the efficiency of the algorithm. This poster explores the use of proposal distributions progressing in reverse time to sample certain classes of rare events.
The opposite of 1. and 3. would hold for a forwards-time algorithm. Thus,
reverse-time SMC is best seen as complementary to existing approaches.
Design of proposal distributions
The reverse-time dynamics in (1) would be a highly efficient proposal
distribution if they could be simulated, and if p̃ could be evaluated.
However, the Green’s function is typically at least as difficult to compute
as the probability of interest. Progress can be made by approximating G,
and defining a proposal based on the approximated Green’s function Ĝ.
This can also lead to a convenient reduction in dimensionality.
Let $\tau_A$ denote the hitting time of a set A, and consider a transition of X
from $x_{i-1} = (z, y)$ to $x_i = (z, \bar{y})$. Assume that the conditional sampling
distribution (CSD)
\[ \pi(y \mid z) := \mathbb{P}_\mu(Y_i = y \mid Z_i = z) \]
is independent of $i \in \mathbb{N}$ for $\mathbb{P}_\mu$-almost every z. Then the ratio of Green's
functions in (1) cancels to the ratio of CSDs:
\[ \frac{G(\mu, (z, y))}{G(\mu, (z, \bar{y}))} = \frac{\pi(y \mid z)}{\pi(\bar{y} \mid z)}. \]
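A short sketch of why the cancellation holds, writing $X_i = (Z_i, Y_i)$, using the Green's function $G(\mu, x) = \mathbb{E}_\mu[\sum_{i=1}^{\tau} \mathbf{1}_{\{x\}}(X_i)]$ and assuming (this is an added assumption for the sketch) that the CSD is understood conditionally on the chain still being alive at step i:

```latex
\begin{align*}
G(\mu, (z, y))
  &= \sum_{i \ge 1} \mathbb{P}_\mu(Z_i = z,\ Y_i = y,\ i \le \tau) \\
  &= \pi(y \mid z) \sum_{i \ge 1} \mathbb{P}_\mu(Z_i = z,\ i \le \tau).
\end{align*}
```

The remaining sum depends on z but not on y, so it is the same in the numerator and the denominator and drops out of the ratio.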
MASDOC CDT, University of Warwick
For the ATM network, the sets defining the problem are
\[ I = R = \bigcup_{j=0}^{n} \{(0, j)\}, \qquad T = \{(b, k)\}, \]
and the initial distribution is specified as
\[ \mu(\{(0, j)\}) = \binom{n}{j} \left(\frac{\alpha_0}{\alpha_0 + \alpha_1}\right)^{j} \left(\frac{\alpha_1}{\alpha_0 + \alpha_1}\right)^{n-j}. \]
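A forwards-time simulation of the ATM network with these sets can be sketched as follows. It is only an illustration and rests on two assumptions that are not spelled out in the text: the embedded jump chain is used in place of the continuous-time process (hitting events do not depend on holding times), and any step that leaves the queue at level 0, including a source switching while the queue is still empty, counts as hitting R. The function name is hypothetical.

```python
import random

def atm_hits(n, b, k, lam, mu, alpha0, alpha1, rng):
    """Run the embedded jump chain once from the initial distribution.

    Returns True if level b is reached with exactly k sources on before the
    queue is at level 0 again.
    """
    p_on = alpha0 / (alpha0 + alpha1)
    j = sum(rng.random() < p_on for _ in range(n))  # j ~ Binomial(n, p_on)
    i = 0
    while True:
        rates = [
            j * lam,               # packet arrival: i -> i + 1
            mu if i > 0 else 0.0,  # service completion: i -> i - 1
            (n - j) * alpha0,      # an off source turns on: j -> j + 1
            j * alpha1,            # an on source turns off: j -> j - 1
        ]
        move = rng.choices(range(4), weights=rates)[0]
        if move == 0:
            i += 1
        elif move == 1:
            i -= 1
        elif move == 2:
            j += 1
        else:
            j -= 1
        if i == 0:
            return False
        if i == b:
            return j == k

# Small hypothetical configuration, chosen so that hits are not too rare.
runs = [atm_hits(5, 2, 1, 0.5, 1.0, 1.0, 3.0, random.Random(s)) for s in range(200)]
estimate = sum(runs) / len(runs)
```

With the parameter values reported in the Figure 3 caption (n = 20, b = 30, λ = 0.5, µ = 10.0) the event is far too rare for this direct approach to be practical.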
Figure 5: Simulated likelihood surfaces, effective sample sizes and run times (12 cores
on the MidPlus cluster Minerva) for the parameters of the spatial Λ-coalescent, with the other
parameters assumed known and using 4 million particles.
The approximate CSDs of i given j and j given i are defined as
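\[ \hat{\pi}_i(i \mid j) \propto \left(\frac{\lambda j}{\mu}\right)^{i}, \qquad \hat{\pi}_j(j \mid i) \propto \hat{\pi}_i(i \mid j) \binom{n}{j} \left(\frac{\alpha_0}{\alpha_0 + \alpha_1}\right)^{j} \left(\frac{\alpha_1}{\alpha_0 + \alpha_1}\right)^{n-j} \]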
for i ∈ {0, . . . , b} and j ∈ {0, . . . , n}. These are the correct CSDs for a
model with the conditioned upon variable fixed at its current value.
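The approximate CSDs are cheap to tabulate because both live on finite sets. The sketch below normalises them explicitly. It assumes that the factor $\hat{\pi}_i(i \mid j)$ entering $\hat{\pi}_j(j \mid i)$ is the normalised version, which is one reading of the proportionality statements; the function names are hypothetical.

```python
import math

def approx_csd_i(j, b, lam, mu):
    """Normalised pi_hat_i(. | j) on {0, ..., b}, proportional to (lam * j / mu) ** i."""
    rho = lam * j / mu
    weights = [rho ** i for i in range(b + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def approx_csd_j(i, n, b, lam, mu, alpha0, alpha1):
    """Normalised pi_hat_j(. | i) on {0, ..., n}: pi_hat_i(i | j) times a binomial weight."""
    p_on = alpha0 / (alpha0 + alpha1)
    weights = [
        approx_csd_i(j, b, lam, mu)[i]
        * math.comb(n, j) * p_on ** j * (1.0 - p_on) ** (n - j)
        for j in range(n + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Parameter values taken from the Figure 3 caption.
queue_given_sources = approx_csd_i(j=4, b=30, lam=0.5, mu=10.0)
sources_given_queue = approx_csd_j(i=3, n=20, b=30, lam=0.5, mu=10.0, alpha0=1.0, alpha1=3.0)
```

With no sources on (j = 0) the first distribution puts all its mass on an empty queue, so the second assigns zero probability to j = 0 whenever i > 0.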
Acknowledgements and References
JK is supported by EPSRC as part of the MASDOC DTC, grant
EP/H023364/1. PJ is supported in part by EPSRC grant EP/L018497/1.
The MidPlus cluster Minerva is provided by EPSRC grant EP/K000128/1.
[BEV10] N. H. Barton, A. M. Etheridge, and A. Véber. A new model for evolution in a spatial continuum. Electron. J. Probab., 15(7):162–216, 2010.
[BN78] O. Barndorff-Nielsen. Hyperbolic distributions and distributions on hyperbolae. Scand. J. Statist., 5:151–157, 1978.
[DDFG01] A. Doucet, J. F. G. De Freitas, and N. J. Gordon. Sequential Monte Carlo methods in practice. Springer, New York, 2001.
[GHSZ99] P. Glasserman, P. Heidelberger, P. Shahabuddin, and T. Zajic. Multi-level splitting for estimating rare event probabilities. Oper. Res., 47:585–600, 1999.
[KSJ16] J. Koskela, D. Spanò, and P. A. Jenkins. Inference and rare event simulation for stopped Markov processes via reverse-time sequential Monte Carlo. Preprint, arXiv:1603.02834, 2016.
[RW94] L. C. G. Rogers and D. Williams. Diffusions, Markov processes and martingales, volume 1. Wiley, 2nd edition, 1994.
[VW15] A. Véber and A. Wakolbinger. The spatial Lambda-Fleming-Viot process: an event-based construction and a lookdown representation. Ann. Inst. H. Poincaré Probab. Statist., 51(2), 2015.
CSDs can be of substantially lower dimension than G; indeed π(y|z) can
even be a one-dimensional family if the dynamics of X only update one
coordinate at a time. However, π(y|z) is still typically intractable.
Introducing an application-specific, low-dimensional approximation
π̂(y|z) yields an implementable algorithm. Note that the independence
assumption above does not have to hold: it is possible to define a
proposal distribution via a ratio of CSDs regardless, albeit at a cost in
computational efficiency.
The reverse time method is advantageous when
1. I is large in the sense of Lebesgue volume or dimension.
2. T has small probability under the forward dynamics.
3. T is small in the sense of Lebesgue volume or dimension. The ideal
setting for a reverse time algorithm is when T is a null set.
Point 2. ensures that unconditioned reverse-time dynamics mimic forwards-time bridges between I and T by rapidly moving from T to I in reverse time.
Point 3. ensures that the reverse-time algorithm does not have to integrate
over an expensive “initial” condition.
Rare trajectories
The rare trajectories in consideration are specified by three sets:
- An initial set I ⊂ Ω, with µ(I) = 1.
- A target set T ⊂ Ω.
- An overshoot set R ⊂ Ω which is hit by X with probability 1.
The aim is to sample trajectories started in I which hit T before reaching R.
The asynchronous transfer mode network
The ATM network [GHSZ99] consists of n sources which are either on or
off. Sources which are off do nothing, while sources which are on each
generate packets at rate λ. Packets are served by a common server at
rate µ using the first-in-first-out policy. Off sources turn on at rate α0 and
on sources turn off at rate α1. The state of the system is specified as (i, j) ∈
N0 × {0, . . . , n}, where i denotes the number of packets in the queue and j
the number of on sources. We estimate the probability of an initially empty
queue hitting a level b before emptying with exactly k sources on at the
hitting time.
Time-reversal of Markov chains
Given a time-homogeneous Markov chain $X := \{X_i\}_{i=1}^{\tau}$ with state space
$\Omega := \prod_{i=1}^{\infty} E_i$, initial distribution µ, transition density p(x, y) and a.s. finite
life time τ, the time-reversal is given by Nagasawa's formula [RW94]:
\[ \tilde{p}(x, y) = \frac{G(\mu, y)}{G(\mu, x)}\, p(y, x), \tag{1} \]
where $G(\mu, x) := \mathbb{E}_\mu\left[\sum_{i=1}^{\tau} \mathbf{1}_{\{x\}}(X_i)\right]$ is the Green's function.
Figure 2: Estimated hitting probabilities with effective sample sizes and run times (Intel
i5-2520M 2.5 GHz) with 500 particles and ∆ = 0.01.
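Nagasawa's formula can be checked numerically on a small killed Markov chain, where the Green's function is a finite sum that converges quickly. The sketch below uses the convention that the reverse-time step from x to y has probability G(µ, y) p(y, x) / G(µ, x). The three-state chain and its numbers are made up for illustration, and the function names are hypothetical.

```python
def green_function(mu, P, tol=1e-14):
    """Expected number of visits to each state: the sum over k >= 0 of mu P^k."""
    n = len(mu)
    G = [0.0] * n
    row = list(mu)
    while sum(row) > tol:
        G = [g + r for g, r in zip(G, row)]
        row = [sum(row[y] * P[y][x] for y in range(n)) for x in range(n)]
    return G

def reverse_kernel(mu, P):
    """Reverse-time kernel: entry [x][y] is G(y) * p(y, x) / G(x)."""
    G = green_function(mu, P)
    n = len(mu)
    kernel = [[G[y] * P[y][x] / G[x] for y in range(n)] for x in range(n)]
    return kernel, G

mu = [0.5, 0.5, 0.0]
P = [
    [0.2, 0.5, 0.1],
    [0.3, 0.1, 0.4],
    [0.0, 0.3, 0.2],
]  # rows sum to less than 1: the deficit is the probability of being killed
P_rev, G = reverse_kernel(mu, P)
```

Each row of `P_rev` sums to 1 − µ(x)/G(µ, x). The missing mass µ(x)/G(µ, x) is the probability that the reversed chain stops at x, that is, that x was the initial state of the forward chain.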
Once the most recent common ancestor has been reached, a mutation process on a state space H can be run along the edges of the realised tree.
For concreteness we assume there are finitely many types identified with
H = {1, . . . , d}, a mutation rate θ > 0 and a transition matrix M on
{1, . . . , d} with a unique stationary distribution m. Then the type of the
most recent common ancestor is sampled from m, and mutations take place
along ancestral edges with rate θ and transitions in type sampled from M.
Figure 4: Example realisation of a spatial Λ-coalescent in one dimension. Lineages 1
and 3 merge in event A to form lineage 5. Lineage 4 jumps in event B, and merges with
lineage 5 in event C to form lineage 6. Lineage 2 does not participate in event A, but
merges with lineage 6 in event D to form lineage 7: the most recent common ancestor.
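We consider a discretisation of (2) and use the Euler scheme with grid spacing ∆ to define the family of transition densities:
\[ p_\Delta(x, y) = \frac{1}{\sqrt{2\pi\Delta}} \exp\left(-\frac{1}{2\Delta}\left(y - x\left(1 - \frac{\Delta}{\sqrt{1+x^2}}\right)\right)^{2}\right). \tag{4} \]
We use (4) and (3) to define a discretised reverse-time proposal:
\[ \hat{p}_\Delta(y, x) \propto \left(\int_{-\infty}^{0} \frac{\pi(z)}{\pi(y)}\, p_\Delta(z, y)\,dz\right) \delta_0(x) + \frac{\pi(x)}{\pi(y)}\, p_\Delta(x, y)\, \mathbf{1}_{[0,b)}(x). \]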
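The ingredients of the discretised reverse-time proposal are simple to evaluate. The sketch below codes the Euler transition density (4), the stationary density (3) without its normalising constant, and the unnormalised weight π(x) p∆(x, y) / π(y) of a reverse-time move from y to an interior point x. The function names are hypothetical and the snippet illustrates the formulas only, not the authors' sampler.

```python
import math

def euler_density(x, y, delta):
    """Transition density (4): y ~ N(x * (1 - delta / sqrt(1 + x^2)), delta)."""
    mean = x * (1.0 - delta / math.sqrt(1.0 + x * x))
    return math.exp(-((y - mean) ** 2) / (2.0 * delta)) / math.sqrt(2.0 * math.pi * delta)

def stationary_unnormalised(x):
    """Stationary density (3) without the constant 1 / (2 K_1(1))."""
    return math.exp(-math.sqrt(1.0 + x * x))

def reverse_weight(y, x, delta):
    """Unnormalised reverse-time weight pi(x) / pi(y) * p_delta(x, y) for x in (0, b)."""
    return stationary_unnormalised(x) / stationary_unnormalised(y) * euler_density(x, y, delta)

# Sanity check: the Euler density integrates to 1 over y (Riemann sum on a fine grid).
step = 0.001
total = sum(euler_density(0.3, -2.0 + step * k, 0.01) * step for k in range(4001))
```

The ratio of stationary densities does not involve the constant 2K1(1), which is why the unnormalised form is enough here.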
The spatial Λ-coalescent [BEV10] is a model of the genetic ancestry of
a population in a continuous geography, $\mathbb{T}$. Individual lineages occupy
fixed positions $x_i \in \mathbb{T}$, and evolution is driven by a Poisson process Π of
extinction-recolonisation events occurring on $\mathbb{R}_+ \times \mathbb{T}$ at rate $dt \otimes dz$. At each
(t, z) ∈ Π, every lineage with $x_i \in B_r(z)$, i.e. within distance r > 0 of the
event location z, participates in the event with probability u ∈ (0, 1]. Participating lineages coalesce to a common ancestor whose location is sampled
uniformly from $B_r(z)$. A one-dimensional example is shown in Figure 4.
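A single extinction-recolonisation event can be sketched as below. The geography is assumed, for the sake of the example, to be a one-dimensional torus of circumference `length`, and lineages are assumed to participate independently of one another. The function names are hypothetical.

```python
import random

def torus_distance(a, b, length):
    """Shortest distance between two points on a circle of the given circumference."""
    d = abs(a - b) % length
    return min(d, length - d)

def apply_event(positions, z, r, u, length, rng):
    """Apply one event centred at z to a list of lineage positions."""
    participants = [
        idx for idx, x in enumerate(positions)
        if torus_distance(x, z, length) <= r and rng.random() < u
    ]
    if not participants:
        return list(positions)
    # The common ancestor is placed uniformly on the ball B_r(z), wrapped onto the torus.
    parent = (z + rng.uniform(-r, r)) % length
    survivors = [x for idx, x in enumerate(positions) if idx not in participants]
    return survivors + [parent]

rng = random.Random(0)
merged = apply_event([0.10, 0.15, 0.20], z=0.15, r=0.1, u=1.0, length=1.0, rng=rng)
untouched = apply_event([0.5], z=0.0, r=0.1, u=1.0, length=1.0, rng=rng)
```

A lone participant is simply relocated, which corresponds to the jump of lineage 4 in event B of Figure 4.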
Figure 3 : Estimated hitting probabilities and effective sample sizes with 50 000
particles, n = 20, b = 30, λ = 0.5, µ = 10.0, α0 = 1.0 and α1 = 3.0. Run times were
approximately 2 min 10 sec per value of k (Intel i5-2520M 2.5 GHz).
Mail: masdoc.info@warwick.ac.uk
WWW: http://www2.warwick.ac.uk/fac/sci/masdoc/