Rare Event Simulation and (Interacting) Particle Systems
Adam M. Johansen
a.m.johansen@warwick.ac.uk
http://go.warwick.ac.uk/amjohansen/talks/
MASDOC Statistical Frontiers Seminar, 4th February 2016

Context

Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a random element $X : (\Omega, \mathcal{F}) \to (E, \mathcal{E})$, what is
$$\mathbb{P}(X \in A) = \mathbb{P} \circ X^{-1}(A) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in A\})$$
for some $A \in \mathcal{E}$ such that $\mathbb{P}(X \in A) \ll 1$?

[Figure: a diagram of the spaces involved, showing $\Omega$, $X(\Omega) \subseteq E$, the set $A$ and its preimage $X^{-1}(A)$.]

Some Simple Examples: Normal Probabilities

1. A really simple problem.
   - Let
     $$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right).$$
   - What is $\mathbb{P}(X \in A)$ if $A = [a, \infty)$ for $a \gg 1$?
   - Simple semi-analytic solution: $1 - \Phi(a)$.
2. A somewhat harder problem:
   - Let
     $$f(x) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right).$$
   - What is $\mathbb{P}(X \in A)$ if $A = \otimes_{i=1}^d [a_i, b_i]$?
   - What can we say about $\mathrm{Law}(X)|_A$?

The Monte Carlo Method

- To approximate $I(\varphi) = \mathbb{E}[\varphi(X)]$ with $X \sim \pi$:
- sample $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \pi$ and use
  $$\hat{I}_n(\varphi) = \frac{1}{n} \sum_{i=1}^n \varphi(X_i).$$
- SLLN: $\lim_{n\to\infty} \hat{I}_n(\varphi) \overset{\text{a.s.}}{=} \mathbb{E}[\varphi(X)]$.
- CLT: $\sqrt{n}(\hat{I}_n(\varphi) - I(\varphi)) \overset{\mathcal{D}}{\to} Z$, $Z \sim \mathcal{N}(0, \mathrm{Var}(\varphi(X)))$, provided $\mathrm{Var}(\varphi(X)) < \infty$.

The Monte Carlo Method and Rare Events

- Use $\mathbb{P}(X \in A) \equiv \mathbb{E}[I_A(X)] = I(I_A)$.
- Then, directly:
  $$\mathbb{P}(X \in A) \approx \hat{I}_n(I_A) = \frac{|A \cap \{X_1, \ldots, X_n\}|}{n}.$$

Simple Monte Carlo and the Toy Problem

$\log(\hat{I}_{10^k}(I_{[a,\infty)}))$ (blank entries: no sample hit the set):

a | k=1   | k=2   | k=3   | k=4   | k=5    | k=6    | k=7    | log(1 - Φ(a))
1 | -2.30 | -1.66 | -1.80 | -1.82 | -1.83  | -1.84  | -1.84  | -1.84
2 |       | -3.91 | -3.73 | -3.76 | -3.78  | -3.79  | -3.79  | -3.78
3 |       |       | -6.91 | -6.81 | -6.59  | -6.60  | -6.61  | -6.61
4 |       |       |       |       | -10.12 | -10.26 | -10.42 | -10.36
5 |       |       |       |       |        |        | -14.73 | -15.06
6 |       |       |       |       |        |        |        | -20.74

Simple calculations reveal:
- $\mathbb{E}[\hat{I}_n(I_{[a,\infty)})] = \mathbb{P}(X \in [a,\infty))$,
- $\mathrm{Var}[\hat{I}_n(I_{[a,\infty)})] = \frac{1}{n} \mathbb{P}(X \in [a,\infty))(1 - \mathbb{P}(X \in [a,\infty)))$,
- so the relative standard deviation is $\sim (n\,\mathbb{P}(X \in [a,\infty)))^{-1/2}$.
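The relative standard deviation scaling above, of order $(np)^{-1/2}$, is easy to check numerically. A minimal Python sketch of the crude Monte Carlo estimator (the helper name `naive_mc_tail` and the choices of $a$ and $n$ here are illustrative, not from the talk):

```python
import math
import random

def naive_mc_tail(a, n, rng):
    """Crude Monte Carlo estimate of P(X >= a) for X ~ N(0,1)."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) >= a)
    return hits / n

rng = random.Random(1)
a, n = 1.0, 100_000
p_hat = naive_mc_tail(a, n, rng)
p_true = 0.5 * math.erfc(a / math.sqrt(2.0))      # 1 - Phi(a)
rel_sd = math.sqrt((1.0 - p_true) / (n * p_true))  # relative std. dev., ~ (n p)^(-1/2)
print(p_hat, p_true, rel_sd)
```

For $a = 1$ this works well; pushing $a$ towards 5 or 6 with the same $n$ reproduces the empty cells of the table above, since typically no sample lands in the tail.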
Variance Reduction

- Want $\hat{p}_n$ such that $\hat{p}_n \approx \mathbb{P}(X \in A) =: p$:
  - ideally, with $\mathbb{E}[\hat{p}_n] = p$;
  - such that $\mathrm{Var}(\hat{p}_n) \ll p^2$;
  - for modest $n$.
- Controlling variance is the key issue:
  - Importance Sampling.
  - Splitting.
  - Interacting Particle Systems / Sequential Monte Carlo.

Importance Sampling — A Change of Measure View

- If:
  - $X \sim f$,
  - $Y \sim g$,
  - $w(x) := \frac{df}{dg}(x) = \frac{f}{g}(x)$,
- then:
  $$\mathbb{E}[\varphi(X)] \equiv \mathbb{E}[w(Y)\varphi(Y)].$$
- So, if $Y_1, Y_2, \ldots \overset{\text{iid}}{\sim} g$, then:
  $$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n w(Y_i)\varphi(Y_i) \overset{\text{a.s.}}{=} \mathbb{E}[\varphi(X)]$$
  and this is an unbiased estimator for any $n$.

Importance Sampling Variance

The variance of this estimator is:
$$\begin{aligned}
\mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^n w(Y_i)\varphi(Y_i)\right]
&= \frac{1}{n}\mathrm{Var}\left[w(Y_1)\varphi(Y_1)\right]\\
&= \frac{1}{n}\left\{\mathbb{E}\left[(w(Y_1)\varphi(Y_1))^2\right] - \mathbb{E}\left[w(Y_1)\varphi(Y_1)\right]^2\right\}\\
&= \frac{1}{n}\left\{\int (w(y)\varphi(y))^2 g(dy) - \left(\int w(y)\varphi(y)\, g(dy)\right)^2\right\}\\
&= \frac{1}{n}\left\{\int w(y)\varphi^2(y)\, f(dy) - \mathbb{E}[\varphi(X)]^2\right\}.
\end{aligned}$$

Optimal Importance Sampling

Proposition. Let $X \sim f$, where $f(dx) = f(x)dx$, with values in $(E, \mathcal{E})$, and let $\varphi : E \to (0, \infty)$ be a function of interest. The proposal which minimizes the variance of the importance sampling estimator of $\mathbb{E}[\varphi(X)]$ is $g(x)dx$, where:
$$g(x) = \frac{f(x)\varphi(x)}{\int f(y)\varphi(y)\,dy}.$$

Note: if $E \supset A \supset \mathrm{supp}\,\varphi$, it suffices that $f|_A \ll g|_A$.
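For the normal tail toy problem, a proposal concentrated on $[a, \infty)$ tames the variance. A minimal Python sketch of importance sampling with the shifted-exponential proposal $g(x) = \exp(-(x-a))$ on $[a, \infty)$, the proposal used for the importance sampling table that follows (the function name and sample sizes are illustrative):

```python
import math
import random

def is_tail_estimate(a, n, rng):
    """Importance sampling estimate of P(X >= a) for X ~ N(0,1),
    using the shifted-exponential proposal g(x) = exp(-(x - a)) on [a, inf)."""
    total = 0.0
    for _ in range(n):
        y = a + rng.expovariate(1.0)                # Y ~ g
        log_f = -0.5 * y * y - 0.5 * math.log(2.0 * math.pi)
        log_g = -(y - a)
        total += math.exp(log_f - log_g)            # w(y) = f(y) / g(y)
    return total / n

rng = random.Random(2)
a, n = 5.0, 10_000
p_hat = is_tail_estimate(a, n, rng)
p_true = 0.5 * math.erfc(a / math.sqrt(2.0))        # 1 - Phi(a), about 2.9e-7
print(math.log(p_hat), math.log(p_true))
```

Even at $a = 5$, where crude Monte Carlo with $10^4$ samples would almost surely return zero, this recovers the tail probability to within a few percent.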
Importance Sampling and the Toy Problem

$\log(\hat{I}_{10^k}(I_{[a,\infty)}))$:

a | k=1    | k=2    | k=3    | k=4    | k=5    | k=6    | k=7    | log(1 - Φ(a))
1 | -1.72  | -1.84  | -1.83  | -1.84  | -1.84  | -1.84  | -1.84  | -1.84
2 | -3.63  | -3.78  | -3.79  | -3.78  | -3.78  | -3.78  | -3.78  | -3.78
3 | -6.43  | -6.59  | -6.63  | -6.60  | -6.61  | -6.61  | -6.61  | -6.61
4 | -10.16 | -10.34 | -10.40 | -10.35 | -10.36 | -10.36 | -10.36 | -10.36
5 | -14.85 | -15.04 | -15.12 | -15.06 | -15.07 | -15.06 | -15.06 | -15.06
6 | -20.51 | -20.72 | -20.81 | -20.73 | -20.73 | -20.74 | -20.74 | -20.74
7 | -27.16 | -27.37 | -27.46 | -27.38 | -27.39 | -27.38 | -27.38 | -27.38
8 | -34.79 | -35.01 | -35.10 | -35.02 | -35.01 | -35.01 | -35.01 | -35.01
9 | -43.41 | -43.64 | -43.73 | -43.63 | -43.63 | -43.63 | -43.62 | -43.63

Using $g(x) = \exp(-(x-a))\, I_{[a,\infty)}(x)$.

So far

- Rare event probabilities are important,
- but not often available analytically.
- Monte Carlo appears to offer a solution;
- naïve approaches fail due to their high variance.
- Importance sampling can help dramatically;
- but can be very difficult (i.e. impossible) to implement effectively.

Next

- Two classes of problems;
- potential solutions in these settings.

Problem Formulation

Here we consider algorithms which are applicable to two types of rare event, both of which are defined in terms of the canonical Markov chain:
$$\left(\Omega = \prod_{n=0}^\infty E_n,\ \mathcal{F} = \prod_{n=0}^\infty \mathcal{F}_n,\ (X_n)_{n\in\mathbb{N}},\ \mathbb{P}_{\eta_0}\right),$$
where the law $\mathbb{P}_{\eta_0}$ is defined by its finite dimensional distributions:
$$\mathbb{P}_{\eta_0} \circ X_{0:N}^{-1}(dx_{0:N}) = \eta_0(dx_0) \prod_{i=1}^N M_i(x_{i-1}, dx_i).$$

Static Rare Events

We term the first type of rare events which we consider static rare events:
- They are defined as the probability that the first $P+1$ elements of the canonical Markov chain lie in a rare set, $T$.
- That is, we are interested in $\mathbb{P}_{\eta_0}(X_{0:P} \in T)$ and the associated conditional distribution:
  $$\mathbb{P}_{\eta_0}(X_{0:P} \in dx_{0:P} \mid X_{0:P} \in T).$$
- We assume that the rare event is characterised as a level set of a suitable potential function:
  $$V : T \to [\hat{V}, \infty), \qquad V : E_{0:P} \setminus T \to (-\infty, \hat{V}).$$

A Simple Example: Normal Random Walks

A toy example:
- $\eta_0(x_0) = \mathcal{N}(x_0; 0, 1)$.
- $M_n(x_{n-1}, x_n) = \mathcal{N}(x_n; x_{n-1}, 1)$.
- $V(x_{0:P}) = x_P$.
- $T = [\hat{V}, \infty)$.

So:
- $X_P \sim \mathcal{N}(0, P+1)$,
- $\mathbb{P}(X_{0:P} \in T) = \mathbb{P}(X_P \geq \hat{V}) = 1 - \Phi(\hat{V}/\sqrt{P+1})$.

A Slightly Harder Problem

- $\eta_0(x_0) = \mathcal{N}(x_0; 0, 1)$.
- $M_n(x_{n-1}, x_n) = \mathcal{N}(x_n; x_{n-1}, 1)$.
- $V(x_{0:P}) = \max_{0 \leq i \leq P} x_i$.
- $T = [\hat{V}, \infty)$.

A Real Problem: Polarization Mode Dispersion

This example was considered by (3, 8). We have a sequence of polarization vectors, $r_n$, which evolve according to the equation:
$$r_n = R(\theta_n, \phi_n)\, r_{n-1} + \frac{1}{2} S(\theta_n)$$
where:
- $\phi_n \sim U[-\pi, \pi]$,
- $\cos(\theta_n) \sim U[-1, 1]$, $s_n = \mathrm{sgn}(\theta_n) \sim U\{-1, +1\}$,
- $S(\theta) = (\cos(\theta), \sin(\theta), 0)$ and $R(\theta, \phi)$ is the matrix which describes a rotation of $\phi$ about the axis $S(\theta)$.

The rare events of interest are system trajectories for which $|r_P| > D$, where $D$ is some threshold.

Feynman-Kac Formulæ and Interacting Particle Systems... or SMC [Cf. Anthony Lee's Talk]

- Want a black box method, using only:
  - samples from $\eta_0$ and $M_n(x_{n-1}, \cdot)$;
  - pointwise evaluation of $V$.
- Assume spatial homogeneity and that $V : x_{0:P} \mapsto V(x_P)$.
- Recall the core of an SMC algorithm: iteratively, sample
  $$X_n^1, \ldots, X_n^N \overset{\text{iid}}{\sim} \frac{\sum_{i=1}^N G_{n-1}(X_{n-1}^i)\, M_n(X_{n-1}^i, \cdot)}{\sum_{j=1}^N G_{n-1}(X_{n-1}^j)}.$$
- Provides unbiased, consistent estimates of:
  $$\mathbb{E}_{\eta_0}\left[\prod_{p=0}^{P-1} G(X_p)\right] \qquad \text{and} \qquad \mathbb{E}_{\eta_0}\left[\prod_{p=0}^{P-1} G(X_p)\, \varphi(X_P)\right].$$
- Cf. (4) for more details.

The Approach of Del Moral and Garnier

One SMC approach to the static problem makes use of:
- $\eta_0$ and $M_n(x_{n-1}, dx_n)$ from the original model,
- $G_n(x_n) = \exp(\beta V(x_n))$.

Or the more flexible path space formulation, $\tilde{E}_n = \otimes_{p=0}^n E_p$:
- $\tilde{\eta}_0 = \eta_0$ as before and
  $$\tilde{M}_n(x_{n-1,0:n-1}, dx_{n,0:n}) = \delta_{x_{n-1,0:n-1}}(dx_{n,0:n-1})\, M_n(x_{n-1,n-1}, dx_{n,n}),$$
- and either $G_n(x_{0:n}) = \exp(\beta V(x_n))$ or $G_n(x_{0:n}) = \exp\left(\alpha(V(x_n) - V(x_{n-1}))\right)$.
- A black box, but with parameters.

- Underlying identity:
  $$\begin{aligned}
  \mathbb{P}(X_P \in V^{-1}([\hat{V}, \infty)))
  &= \mathbb{P}(V(X_P) \geq \hat{V}) = \mathbb{E}\left[I_{[\hat{V},\infty)}(V(X_P))\right]\\
  &= \mathbb{E}\left[\prod_{n=0}^{P-1} G(X_n) \cdot I_{[\hat{V},\infty)}(V(X_P)) \cdot \prod_{n=0}^{P-1} \frac{1}{G(X_n)}\right].
  \end{aligned}$$
- Basic algorithm:
  - Sample $X_0^1, \ldots, X_0^N \overset{\text{iid}}{\sim} \eta_0$. Set $Y_0^1 = \cdots = Y_0^N = 1$.
  - For $p = 1$ to $P$, for $i = 1$ to $N$:
    - Compute $z_{p-1} = \frac{1}{N} \sum_{j=1}^N G_{p-1}(X_{p-1}^j)$.
    - Sample $A_p^i \sim \sum_{j=1}^N \frac{G_{p-1}(X_{p-1}^j)}{N z_{p-1}} \delta_j(\cdot)$.
    - Sample $X_p^i \sim M_p(X_{p-1}^{A_p^i}, \cdot)$.
    - Set $Y_p^i = Y_{p-1}^{A_p^i} / G_{p-1}(X_{p-1}^{A_p^i})$.
  - Report:
    $$\hat{p} = \prod_{p=0}^{P-1} z_p \cdot \frac{1}{N} \sum_{i=1}^N I_{[\hat{V},\infty)}(X_P^i)\, Y_P^i.$$

SMC Samplers

Actually, SMC techniques can be used to sample from any sequence of distributions...
- Given a sequence of target distributions, $\{\pi_n\}$, on measurable spaces $(E_n, \mathcal{E}_n)$,
- construct a synthetic sequence $\{\tilde{\pi}_n\}$ on the product spaces $\bigotimes_{p=1}^n (E_p, \mathcal{E}_p)$ by introducing arbitrary auxiliary Markov kernels, $L_p : E_{p+1} \times \mathcal{E}_p \to [0, 1]$:
  $$\tilde{\pi}_n(dx_{1:n}) = \pi_n(dx_n) \prod_{p=1}^{n-1} L_p(x_{p+1}, dx_p),$$
  which each admit one of the target distributions as their final time marginal.

SMC Outline — One Iteration

- Given a sample $\{X_{1:n-1}^{(i)}\}_{i=1}^N$ targeting $\tilde{\pi}_{n-1}$, for $i = 1$ to $N$:
  - sample $X_n^{(i)} \sim K_n(X_{n-1}^{(i)}, \cdot)$,
  - calculate
    $$\begin{aligned}
    W_n(X_{1:n}^{(i)}) &= \frac{\tilde{\pi}_n(X_{1:n}^{(i)})}{\tilde{\pi}_{n-1}(X_{1:n-1}^{(i)})\, K_n(X_{n-1}^{(i)}, X_n^{(i)})}
    = \frac{\pi_n(X_n^{(i)}) \prod_{p=1}^{n-1} L_p(X_{p+1}^{(i)}, X_p^{(i)})}{\pi_{n-1}(X_{n-1}^{(i)}) \prod_{p=1}^{n-2} L_p(X_{p+1}^{(i)}, X_p^{(i)})\, K_n(X_{n-1}^{(i)}, X_n^{(i)})}\\
    &= \frac{\pi_n(X_n^{(i)})\, L_{n-1}(X_n^{(i)}, X_{n-1}^{(i)})}{\pi_{n-1}(X_{n-1}^{(i)})\, K_n(X_{n-1}^{(i)}, X_n^{(i)})}.
    \end{aligned}$$
- Resample, yielding $\{X_{1:n}^{(i)}\}_{i=1}^N$ targeting $\tilde{\pi}_n$.

Alternative SMC Summary

At each iteration, given a set of weighted samples $\{X_{n-1}^{(i)}, W_{n-1}^{(i)}\}_{i=1}^N$ targeting $\pi_{n-1}$:
- Sample $X_n^{(i)} \sim K_n(X_{n-1}^{(i)}, \cdot)$, so that $\{(X_{n-1}^{(i)}, X_n^{(i)}), W_{n-1}^{(i)}\}_{i=1}^N \sim \pi_{n-1}(x_{n-1}) K_n(x_{n-1}, x_n)$.
- Set weights
  $$W_n^{(i)} = W_{n-1}^{(i)} \frac{\pi_n(X_n^{(i)})\, L_{n-1}(X_n^{(i)}, X_{n-1}^{(i)})}{\pi_{n-1}(X_{n-1}^{(i)})\, K_n(X_{n-1}^{(i)}, X_n^{(i)})}.$$
- Then $\{(X_{n-1}^{(i)}, X_n^{(i)}), W_n^{(i)}\}_{i=1}^N \sim \pi_n(x_n) L_{n-1}(x_n, x_{n-1})$ and, marginally, $\{X_n^{(i)}, W_n^{(i)}\} \sim \pi_n$.
- Resample to obtain an unweighted particle set.
- Hints that we'd like $L_{n-1}(x_n, x_{n-1}) = \frac{\pi_{n-1}(x_{n-1}) K_n(x_{n-1}, x_n)}{\int \pi_{n-1}(x'_{n-1}) K_n(x'_{n-1}, x_n)\, dx'_{n-1}}$.

Key Points of SMC

- An iterative technique for sampling from a sequence of similar distributions.
- By use of intermediate distributions, we can obtain well behaved weighted samples from intractable distributions,
- and estimate associated normalising constants.
- Can be interpreted as a mean field approximation of a Feynman-Kac flow (2).

Static Rare Events: Path-space Approach

- Begin by sampling a set of paths from the law of the Markov chain.
- Iteratively obtain samples from a sequence of distributions which moves "smoothly" towards one which places the majority of its mass on the rare set.
- We construct our sequence of distributions via a potential function and a sequence of inverse temperature parameters:
  $$\pi_t(dx_{0:P}) \propto \mathbb{P}_{\eta_0}(dx_{0:P})\, g_{t/T}(x_{0:P}),$$
  $$g_\theta(x_{0:P}) = \left[1 + \exp\left(-\alpha(\theta)\left(V(x_{0:P}) - \hat{V}\right)\right)\right]^{-1}.$$
- Estimate the normalising constant of the final distribution and correct via importance sampling.

Path Sampling — An Alternative Approach to Estimating Normalizing Constants

- An integral expression for the log normalising constant of sufficiently regular distributions.
- Given a sequence of densities $p(x|\theta) = q(x|\theta)/z(\theta)$:
  $$\frac{d}{d\theta} \log z(\theta) = \mathbb{E}_\theta\left[\frac{d}{d\theta} \log q(\cdot|\theta)\right] \qquad (\star)$$
  where the expectation is taken with respect to $p(\cdot|\theta)$.
- Consequently, we obtain:
  $$\log \frac{z(1)}{z(0)} = \int_0^1 \mathbb{E}_\theta\left[\frac{d}{d\theta} \log q(\cdot|\theta)\right] d\theta.$$
- See the appendix slide or (5) for details; also (8, 12) in the SMC context. In our case, we use our particle system to approximate both the expectation and the integral.

Static Rare Events: Framework

Initialization proceeds via importance sampling:

At t = 0:
for i = 1 to N do
  Sample $X_0^{(i)} \sim \nu$ for some importance distribution $\nu$.
  Set $W_0^{(i)} \propto \pi_0(X_0^{(i)})/\nu(X_0^{(i)})$ such that $\sum_{j=1}^N W_0^{(j)} = 1$.
end for

Samples are obtained from our sequence of distributions using SMC techniques:

for t = 1 to T do
  if ESS < threshold then resample $\{W_{t-1}^{(i)}, X_{t-1}^{(i)}\}_{i=1}^N$.
  If desired, apply a Markov kernel, $\tilde{K}_{t-1}$, of invariant distribution $\pi_{t-1}$ to improve sample diversity.
  for i = 1 to N do
    Sample $X_t^{(i)} \sim K_t(X_{t-1}^{(i)}, \cdot)$.
    Weight $W_t^{(i)} \propto W_{t-1}^{(i)} w_t^{(i)}$, where the incremental importance weight $w_t^{(i)}$ is defined through
    $$w_t^{(i)} = \frac{\pi_t(X_t^{(i)})\, L_{t-1}(X_t^{(i)}, X_{t-1}^{(i)})}{\pi_{t-1}(X_{t-1}^{(i)})\, K_t(X_{t-1}^{(i)}, X_t^{(i)})}, \qquad \sum_{j=1}^N W_t^{(j)} = 1.$$
  end for
end for

Finally, an estimate can be obtained. Approximate the path sampling identity to estimate the normalising constant:
$$\hat{Z}_1 = \exp\left[\sum_{t=1}^T \left(\alpha(t/T) - \alpha((t-1)/T)\right) \frac{\hat{E}_{t-1} + \hat{E}_t}{2}\right],$$
$$\hat{E}_t = \frac{\sum_{j=1}^N W_t^{(j)}\, \dfrac{V(X_t^{(j)}) - \hat{V}}{1 + \exp\left(\alpha(t/T)\left(V(X_t^{(j)}) - \hat{V}\right)\right)}}{\sum_{j=1}^N W_t^{(j)}}.$$
Estimate the rare event probability using importance sampling:
$$\hat{p} = \hat{Z}_1 \frac{\sum_{j=1}^N W_T^{(j)} \left[1 + \exp\left(-\alpha(1)\left(V(X_T^{(j)}) - \hat{V}\right)\right)\right] I_{(\hat{V},\infty]}\left(V(X_T^{(j)})\right)}{\sum_{j=1}^N W_T^{(j)}}.$$

Example: Gaussian Random Walk

- A toy example: $M_n(x_{n-1}, x_n) = \mathcal{N}(x_n | x_{n-1}, 1)$, $T = [\hat{V}, \infty)$.
- Proposal kernel:
  $$K_n(X_{n-1}, dx_n) = \sum_{j=-S}^S w_n(X_{n-1}, X_n) \prod_{i=1}^P \delta_{X_{n-1,i} + ij\delta}(dx_{n,i}),$$
  where the weight of individual moves is given by $w_n(X_{n-1}, X_n) \propto \pi_n(X_n)$.
- Linear annealing schedule.
- Number of distributions $T \propto \hat{V}^{3/2}$ (T = 2500 when $\hat{V} = 25$).

[Figure: Gaussian random walk example results — log probability against threshold for the true values, IPS(100), IPS(1,000), IPS(20,000) and SMC(100).]

[Figure: a typical SMC run — Markov chain state value against state number for all particles.]

[Figure: a typical IPS run — only those particles which hit the rare set.]

Example: Polarization Mode Dispersion

This example was considered by (3). We have a sequence of polarization vectors, $r_n$, which evolve according to the equation:
$$r_n = R(\theta_n, \phi_n)\, r_{n-1} + \frac{1}{2} \Omega(\theta_n)$$
where:
- $\phi_n \sim U[-\pi, \pi]$,
- $\cos(\theta_n) \sim U[-1, 1]$, $s_n = \mathrm{sgn}(\theta_n) \sim U\{-1, +1\}$,
- $\Omega(\theta) = (\cos(\theta), \sin(\theta), 0)$ and $R(\theta, \phi)$ is the matrix which describes a rotation of $\phi$ about the axis $\Omega(\theta)$.

The rare events of interest are system trajectories for which $|r_P| > D$, where $D$ is some threshold.
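The PMD dynamics above can be simulated directly; a naive Monte Carlo baseline in Python, using Rodrigues' formula for the rotation $R(\theta, \phi)$ (the values of $P$, $D$ and the sample size here are illustrative, and a crude estimator like this is exactly what the IPS/SMC schemes above are designed to improve upon):

```python
import math
import random

def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v by `angle` about the unit vector `axis`."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * b for a, b in zip(axis, v))
    cross = (axis[1] * v[2] - axis[2] * v[1],
             axis[2] * v[0] - axis[0] * v[2],
             axis[0] * v[1] - axis[1] * v[0])
    return tuple(c * v[i] + s * cross[i] + (1 - c) * dot * axis[i] for i in range(3))

def pmd_trajectory(P, rng):
    """Simulate r_n = R(theta_n, phi_n) r_{n-1} + 0.5 * Omega(theta_n) for n = 1..P."""
    r = (0.0, 0.0, 0.0)
    for _ in range(P):
        phi = rng.uniform(-math.pi, math.pi)
        theta = rng.choice((-1, 1)) * math.acos(rng.uniform(-1.0, 1.0))
        omega = (math.cos(theta), math.sin(theta), 0.0)
        r = rotate(r, omega, phi)
        r = tuple(r[i] + 0.5 * omega[i] for i in range(3))
    return r

rng = random.Random(3)
P, D, n = 16, 5.0, 2000
hits = sum(1 for _ in range(n) if math.hypot(*pmd_trajectory(P, rng)) > D)
print(hits / n)
```

Since each step adds a vector of length 1/2 after a norm-preserving rotation, $|r_P| \leq P/2$; as $D$ approaches that bound the event becomes rare and the naive estimator breaks down.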
- We use a $\pi_n$-invariant MCMC proposal for $K_n$ and the associated time-reversal kernel for $L_{n-1}$.
- This leads to:
  $$W_n(X_{n-1}, X_n) = \frac{\pi_n(X_{n-1})}{\pi_{n-1}(X_{n-1})},$$
  allowing sampling and resampling to be exchanged (cf. (7)).
- (Precisely, we employed a Metropolis-Hastings kernel with a proposal which randomly selects two indices uniformly between 1 and $n$ and proposes replacing the $\phi$ and $\cos\theta$ values between those two indices with values drawn from the uniform distribution over $[-\pi, \pi] \times [-1, 1]$. This proposal is then accepted with the usual Metropolis acceptance probability.)

[Figure: PDF estimates for PMD — estimated density of the DGD, on a logarithmic scale, for IPS ($\alpha = 1$), IPS ($\alpha = 3$) and SMC.]

Dynamic Rare Events

The other class of rare events in which we are interested are termed dynamic rare events:
- They correspond to the probability that a Markov process hits some rare set, $T$, before its first entrance into some recurrent set $R$.
- That is, given the stopping time $\tau = \inf\{p : X_p \in T \cup R\}$, we seek $\mathbb{P}_{\eta_0}(X_\tau \in T)$ and the associated conditional distribution:
  $$\mathbb{P}_{\eta_0}(\tau = t, X_{0:t} \in dx_{0:t} \mid X_\tau \in T).$$

[Figure: an illustration of a dynamic rare event — trajectories mostly absorbed by the recurrent set $R$, with one reaching the rare set $T$.]

Importance Sampling

- Use a second Markov process on the same space:
  $$\left(\Omega = \prod_{n=0}^\infty E_n,\ \mathcal{F} = \prod_{n=0}^\infty \mathcal{F}_n,\ (X_n)_{n\in\mathbb{N}},\ \tilde{\mathbb{P}}_{\tilde{\eta}_0}\right),$$
  where the law $\tilde{\mathbb{P}}_{\tilde{\eta}_0}$ is defined by its finite dimensional distributions:
  $$\tilde{\mathbb{P}}_{\tilde{\eta}_0} \circ X_{0:N}^{-1}(dx_{0:N}) = \tilde{\eta}_0(dx_0) \prod_{i=1}^N \tilde{M}_i(x_{i-1}, dx_i).$$
- So that:
  $$\frac{d\mathbb{P}_{\eta_0} \circ X_{0:N}^{-1}}{d\tilde{\mathbb{P}}_{\tilde{\eta}_0} \circ X_{0:N}^{-1}}(x_{0:N}) = \frac{d\eta_0}{d\tilde{\eta}_0}(x_0) \prod_{n=1}^N \frac{dM_n(x_{n-1}, \cdot)}{d\tilde{M}_n(x_{n-1}, \cdot)}(x_n).$$

Importance Sampling

- Then:
  $$\begin{aligned}
  \mathbb{P}_{\eta_0}(X_\tau \in T) &= \mathbb{E}_{\eta_0}[I_T(X_\tau)] = \sum_{t=0}^\infty \mathbb{E}_{\eta_0}\left[I_{\{t\}}(\tau)\, I_T(X_t)\right]\\
  &= \sum_{t=0}^\infty \tilde{\mathbb{E}}_{\tilde{\eta}_0}\left[I_{\{t\}}(\tau)\, I_T(X_t)\, \frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(X_{0:\infty})\right]\\
  &= \sum_{t=0}^\infty \tilde{\mathbb{E}}_{\tilde{\eta}_0}\left[\tilde{\mathbb{E}}_{\tilde{\eta}_0}\left[\left. I_{\{t\}}(\tau)\, I_T(X_t)\, \frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(X_{0:\infty}) \right| \mathcal{F}_t\right]\right]\\
  &= \sum_{t=0}^\infty \tilde{\mathbb{E}}_{\tilde{\eta}_0}\left[I_{\{t\}}(\tau)\, I_T(X_t)\, \frac{d\mathbb{P} \circ X_{0:t}^{-1}}{d\tilde{\mathbb{P}} \circ X_{0:t}^{-1}}(X_{0:t})\right]\\
  &= \tilde{\mathbb{E}}_{\tilde{\eta}_0}\left[I_T(X_\tau)\, \frac{d\mathbb{P} \circ X_{0:\tau}^{-1}}{d\tilde{\mathbb{P}} \circ X_{0:\tau}^{-1}}(X_{0:\tau})\right].
  \end{aligned}$$
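The change-of-measure identity can be sketched for a Gaussian random walk: simulate under a drifted proposal kernel $\tilde{M}_n(x_{n-1}, \cdot) = \mathcal{N}(x_{n-1} + d, 1)$ and accumulate the per-step density ratios $dM_n/d\tilde{M}_n$ along the path. A minimal Python sketch; the sets $T = [b, \infty)$ and $R = (-\infty, 0]$, the drift $d$ and all names are illustrative choices, not from the talk:

```python
import math
import random

def hit_prob_is(x0, b, drift, n, rng):
    """Estimate P(walk hits [b, inf) before (-inf, 0]) for X_n = X_{n-1} + N(0,1),
    simulating under a drifted proposal and reweighting by dM/dM~ per step."""
    total = 0.0
    for _ in range(n):
        x, logw = x0, 0.0
        while 0.0 < x < b:
            u = rng.gauss(drift, 1.0)                    # increment under the proposal
            logw += -drift * u + 0.5 * drift * drift     # log dM/dM~ for this step
            x += u
        if x >= b:                                        # path reached T before R
            total += math.exp(logw)
    return total / n

rng = random.Random(4)
p_hat = hit_prob_is(1.0, 10.0, 0.5, 20_000, rng)
print(p_hat)
```

The drift pushes most proposal paths towards $T$, and the accumulated weight corrects for the change of law; choosing the drift badly, however, blows up the weight variance, which is exactly the difficulty raised on the next slide.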
- But how do you choose $\tilde{\eta}_0$ and $\tilde{M}_n$?

Splitting

- If $V$ increases towards $T$, could consider:
  $$T_1 = V^{-1}([V_1, \infty)) \supset T_2 = V^{-1}([V_2, \infty)) \supset \cdots \supset T_L = T$$
- and the non-decreasing first hitting times:
  $$\tau_l = \inf\{t : X_t \in R \cup T_l\}$$
- which yield the decomposition:
  $$\mathbb{P}(X_\tau \in T) = \mathbb{P}(X_{\tau_1} \in T_1) \prod_{l=2}^L \mathbb{P}\left(X_{\tau_l} \in T_l \mid X_{\tau_{l-1}} \in T_{l-1}\right)$$
- and the multilevel splitting method [dating back to (9)].

[Figure: multilevel splitting — a trajectory of $V(X_t)$ against $t$, crossing the levels $V_0, V_1, V_2, \ldots, \hat{V}$.]

A Simple Splitting Algorithm

- Sample $X_{1:\tau_1^1}^1, \ldots, X_{1:\tau_1^{N_1}}^{N_1}$ iid from $\mathbb{P}$.
- Compute $G(X_{1:\tau_1^i}^i) = I_{T_1}(X_{\tau_1^i}^i)$. Let $S_1 = \sum_{i=1}^{N_1} G(X_{1:\tau_1^i}^i)$.
- For $l = 2$ to $L$:
  - Set $N_l = r S_{l-1}$.
  - Let $(\hat{X}_{\hat{\tau}_{l-1}^i}^i)_{i=1:N_l}$ comprise $r$ copies of each $X_{\tau_{l-1}^i}^i \in T_{l-1}$.
  - For $i = 1, \ldots, N_l$: sample $X_{\hat{\tau}_{l-1}^i:\tau_l^i}^i \sim \mathbb{P}(\cdot \mid \{X_{\tau_{l-1}^i} = \hat{X}_{\hat{\tau}_{l-1}^i}^i\})$.
  - Compute $G(X_{\hat{\tau}_{l-1}^i:\tau_l^i}^i) = I_{T_l}(X_{\tau_l^i}^i)$. Let $S_l = \sum_{i=1}^{N_l} G(X_{\hat{\tau}_{l-1}^i:\tau_l^i}^i)$.
- Compute
  $$\widehat{\mathbb{P}(X_\tau \in T)} = \prod_{l=1}^L \frac{S_l}{N_l}.$$

Adaptive Multilevel Splitting 1 — Choosing $r_l$

- The algorithm describes a branching process. Unless $\mathbb{E}[r S_l] = N_l$, essentially:
  - either the particle system dies out eventually,
  - or $N_l$ grows exponentially fast.
- See (6, 10) for some preliminary analysis.
- See (11) for two-stage schemes in which a preliminary run specifies $r_l$.

Adaptive Multilevel Splitting 2 — Choosing the Levels

- Let $\tau_1^i = \inf\{t : X_t \in R\}$. Sample $X_{1:\tau_1^i}^1, \ldots, X_{1:\tau_1^i}^{N_1}$ iid from $\mathbb{P}$.
- Compute $\check{X}_1^i = \max\{X_1^i, \ldots, X_{\tau_1^i}^i\}$. Set $V_1 = \check{X}_1^{(\lfloor \alpha N_1 \rfloor)}$ (the $\lfloor \alpha N_1 \rfloor$-th order statistic).
- While $V_l < \hat{V}$: $l \leftarrow l + 1$.
  - Set $N_l = N_1$.
  - Let $(\hat{X}_{\hat{\tau}_{l-1}^i}^i)_{i=1:N_l}$ comprise copies of each $X^i$ which reached $V_{l-1}$, up to the time at which it reached it.
  - For $i = 1, \ldots, N_l$: sample $X_{\hat{\tau}_{l-1}^i:\tau_l^i}^i \sim \mathbb{P}(\cdot \mid \{X_{\tau_{l-1}^i} = \hat{X}_{\hat{\tau}_{l-1}^i}^i\})$, with $\tau_l^i = \inf\{t : X_t \in R\}$.
  - Compute $\check{X}_l^i = \max\{X_{\hat{\tau}_{l-1}^i}^i, \ldots, X_{\tau_l^i}^i\}$. Set $V_l = \check{X}_l^{(\lfloor \alpha N_l \rfloor)}$.
- Compute
  $$\widehat{\mathbb{P}(X_\tau \in T)} = (1 - \alpha)^{l-1} \frac{1}{N} \sum_{i=1}^N I(\tau_T^i < \tau_l^i),$$
  where $\tau_T^i = \inf\{t \in \{\hat{\tau}_{l-1}^i, \ldots, \tau_l^i\} : X_t^i \in T\}$.
- Related adaptive methods can be unbiased (1).
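The simple splitting algorithm is short to implement for the random-walk example, with $T_l = [V_l, \infty)$ and $R = (-\infty, 0]$. A Python sketch with fixed levels and a fixed splitting factor $r$ (the levels, $r$ and particle numbers are illustrative choices):

```python
import random

def run_to_level(x, level, rng):
    """Run the walk X_n = X_{n-1} + N(0,1) until it enters [level, inf) or (-inf, 0].
    Return the state on entry to [level, inf), or None if absorbed at R first."""
    while 0.0 < x < level:
        x += rng.gauss(0.0, 1.0)
    return x if x >= level else None

def multilevel_splitting(x0, levels, r, n1, rng):
    """Estimate prod_l S_l / N_l, replicating each surviving particle r times."""
    particles = [x0] * n1
    prob = 1.0
    for level in levels:
        n_l = len(particles)
        if n_l == 0:
            return 0.0                      # the branching process died out
        survivors = [y for x in particles
                     if (y := run_to_level(x, level, rng)) is not None]
        prob *= len(survivors) / n_l        # factor S_l / N_l
        particles = [x for x in survivors for _ in range(r)]  # split each survivor
    return prob

rng = random.Random(5)
p_hat = multilevel_splitting(1.0, [4.0, 7.0, 10.0], 3, 2000, rng)
print(p_hat)
```

Each factor estimates one conditional probability in the level decomposition, so the product never exceeds one; the sensitivity to the choice of $r$ is the branching-process issue discussed above.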
Sequential Monte Carlo: Interacting Particle Systems

- Sample $X_{1:\tau_1^1}^1, \ldots, X_{1:\tau_1^N}^N$ iid from $\mathbb{P}$.
- Compute $G_1(X_{\tau_1^i}^i) = I_{T_1}(X_{\tau_1^i}^i)$. Let $S_1 = \sum_{i=1}^N G_1(X_{\tau_1^i}^i)$.
- For $l = 2$ to $L$:
  - Sample $(\hat{X}_{\hat{\tau}_{l-1}^i}^i)_{i=1:N}$ iid from $\frac{1}{S_{l-1}} \sum_{j=1}^N G_{l-1}(X_{\tau_{l-1}^j}^j)\, \delta_{X_{\tau_{l-1}^j}^j}$.
  - For $i = 1, \ldots, N$: sample $X_{\hat{\tau}_{l-1}^i:\tau_l^i}^i \sim \mathbb{P}(\cdot \mid \{X_{\tau_{l-1}^i} = \hat{X}_{\hat{\tau}_{l-1}^i}^i\})$.
  - Compute $G_l(X_{\tau_l^i}^i) = I_{T_l}(X_{\tau_l^i}^i)$. Let $S_l = \sum_{i=1}^N G_l(X_{\tau_l^i}^i)$.
- Compute
  $$\widehat{\mathbb{P}(X_\tau \in T)} = \prod_{l=1}^L \frac{S_l}{N}.$$

References

[1] C.-E. Bréhier, M. Gazeau, L. Goudenège, T. Lelièvre, and M. Rousset. Unbiasedness of some generalized adaptive multilevel splitting schemes. Technical Report 1505.02674, ArXiv, 2015.
[2] P. Del Moral. Feynman-Kac formulae: genealogical and interacting particle systems with applications. Probability and Its Applications. Springer Verlag, New York, 2004.
[3] P. Del Moral and J. Garnier. Genealogical particle analysis of rare events. Annals of Applied Probability, 15(4):2496–2534, November 2005.
[4] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan and B. Rozovsky, editors, The Oxford Handbook of Nonlinear Filtering, pages 656–704. Oxford University Press, 2011.
[5] A. Gelman and X.-L. Meng. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163–185, 1998.
[6] P. Glasserman, P. Heidelberger, P. Shahabuddin, and T. Zajic. Multilevel splitting for estimating rare event probabilities. Operations Research, 47(4):585–600, 1999.
[7] A. M. Johansen and A. Doucet. Auxiliary variable sequential Monte Carlo methods. Research Report 07:09, University of Bristol, Department of Mathematics – Statistics Group, University Walk, Bristol, BS8 1TW, UK, July 2007. A substantially shorter version focusing on particle filtering appeared in Statistics and Probability Letters.
[8] A. M. Johansen, P. Del Moral, and A. Doucet. Sequential Monte Carlo samplers for rare events.
In Proceedings of the 6th International Workshop on Rare Event Simulation, pages 256–267, Bamberg, Germany, October 2006.
[9] H. Kahn and T. E. Harris. Estimation of particle transmission by random sampling. National Bureau of Standards Applied Mathematics Series, 12:27–30, 1951.
[10] A. Lagnoux. Rare event simulation. Probability in the Engineering and Informational Sciences, 20(1):43–66, 2006.
[11] A. Lagnoux. A two-step branching splitting model under cost constraint. Journal of Applied Probability, 46(2):429–452, 2009.
[12] Y. Zhou, A. M. Johansen, and J. A. D. Aston. Towards automatic model comparison: An adaptive sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 2015. In press.

Path Sampling Identity

Given a probability density $p(x|\theta) = q(x|\theta)/z(\theta)$:
$$\begin{aligned}
\frac{\partial}{\partial\theta} \log z(\theta)
&= \frac{1}{z(\theta)} \frac{\partial}{\partial\theta} z(\theta)\\
&= \frac{1}{z(\theta)} \frac{\partial}{\partial\theta} \int q(x|\theta)\, dx\\
&= \frac{1}{z(\theta)} \int \frac{\partial}{\partial\theta} q(x|\theta)\, dx \qquad (\star\star)\\
&= \int \frac{p(x|\theta)}{q(x|\theta)} \frac{\partial}{\partial\theta} q(x|\theta)\, dx\\
&= \int p(x|\theta) \frac{\partial}{\partial\theta} \log q(x|\theta)\, dx = \mathbb{E}_{p(\cdot|\theta)}\left[\frac{\partial}{\partial\theta} \log q(\cdot|\theta)\right]
\end{aligned}$$
wherever the interchange of differentiation and integration in $(\star\star)$ is permissible.
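The identity can be checked numerically for a family with known $z(\theta)$, e.g. $q(x|\theta) = \exp(-(1+\theta)x^2/2)$ with $z(\theta) = \sqrt{2\pi/(1+\theta)}$, so that $\log z(1)/z(0) = -\frac{1}{2}\log 2$. A Python sketch (not from the talk) approximating the expectation by Monte Carlo and the $\theta$-integral by the trapezoidal rule, mirroring the construction of the $\hat{Z}_1$ estimator earlier:

```python
import math
import random

# Family q(x|theta) = exp(-(1 + theta) x^2 / 2): p(.|theta) = N(0, 1/(1+theta))
# and z(theta) = sqrt(2*pi/(1+theta)), so log z(1)/z(0) = -log(2)/2.

def mean_dlog_q(theta, n, rng):
    """Monte Carlo estimate of E_theta[d/dtheta log q(X|theta)] = E_theta[-X^2/2]."""
    sd = 1.0 / math.sqrt(1.0 + theta)
    return sum(-0.5 * rng.gauss(0.0, sd) ** 2 for _ in range(n)) / n

rng = random.Random(6)
grid = [t / 20 for t in range(21)]                 # theta = 0, 0.05, ..., 1
vals = [mean_dlog_q(t, 5000, rng) for t in grid]
# Trapezoidal approximation of the path sampling integral over [0, 1]:
log_ratio = sum(0.05 * 0.5 * (vals[i] + vals[i + 1]) for i in range(20))
print(log_ratio, -0.5 * math.log(2.0))
```

The same two approximations, expectation by (weighted) sample average and integral by a trapezoidal sum over the annealing schedule, are what the particle system supplies in the rare event setting.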