Local Sequential Monte Carlo
Adam M. Johansen and Arnaud Doucet
University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/
March 28th, 2011, Lille Workshop on Filtering, MCMC and ABC

Outline
- Background
  - Hidden Markov Models / State Space Models
  - Particle Filters / Sequential Monte Carlo
  - Block Sampling
- Local SMC
  - Motivation
  - Formulation
  - Examples
    - A Toy Linear Gaussian Model
    - Stochastic Volatility

The Structure of the Problem
[Figure: an example tracking scenario; target position, s_y/km against s_x/km.]

Hidden Markov Models / State Space Models
[Figure: graphical model with hidden chain x_1, ..., x_6 and observations y_1, ..., y_6.]
- Unobserved Markov chain {X_n} with transition density f.
- Observed process {Y_n} with conditional density g.
- Joint density:
  p(x_{1:n}, y_{1:n}) = f_1(x_1) g(y_1|x_1) ∏_{i=2}^{n} f(x_i|x_{i-1}) g(y_i|x_i).

Motivating Examples
- Tracking, e.g. the ACV model:
  - States: x_n = [s_n^x  u_n^x  s_n^y  u_n^y]^T (positions and velocities).
  - Dynamics: x_n = A x_{n-1} + ε_n, with
      A = [ 1  Δt  0  0
            0  1   0  0
            0  0   1  Δt
            0  0   0  1 ].
  - Observation: y_n = [r_n^x  r_n^y]^T = B x_n + ν_n, with
      B = [ 1  0  0  0
            0  0  1  0 ],
    so the positions are observed in noise.
- Stochastic volatility, e.g.:
    f(x_i|x_{i-1}) = N(x_i; φ x_{i-1}, σ²),
    g(y_i|x_i) = N(y_i; 0, β² exp(x_i)).

Formal Solutions
- Filtering: prediction and update recursions:
    p(x_n|y_{1:n-1}) = ∫ p(x_{n-1}|y_{1:n-1}) f(x_n|x_{n-1}) dx_{n-1},
    p(x_n|y_{1:n}) = p(x_n|y_{1:n-1}) g(y_n|x_n) / ∫ p(x'_n|y_{1:n-1}) g(y_n|x'_n) dx'_n.
- Smoothing:
    p(x_{1:n}|y_{1:n}) = p(x_{1:n-1}|y_{1:n-1}) f(x_n|x_{n-1}) g(y_n|x_n) / ∫ g(y_n|x'_n) f(x'_n|x'_{n-1}) p(x'_{n-1}|y_{1:n-1}) dx'_{n-1:n}.

The Monte Carlo Method
- Given a probability density p,
    I = ∫ φ(x) p(x) dx = E_p[φ(X)].
- Simple Monte Carlo solution:
  - Sample X_1, ..., X_N iid from p.
  - Estimate Î = (1/N) ∑_{i=1}^{N} φ(X_i).
- Can also be viewed as approximating π(dx) = p(x) dx with
    π̂^N(dx) = (1/N) ∑_{i=1}^{N} δ_{X_i}(dx).

Importance Sampling
- Given q such that p(x) > 0 ⇒ q(x) > 0 and p(x)/q(x) < ∞, define w(x) = p(x)/q(x) and
    I = ∫ φ(x) p(x) dx = ∫ φ(x) w(x) q(x) dx.
- This suggests the estimator:
  - Sample X_1, ..., X_N iid from q.
  - Estimate Î = (1/N) ∑_{i=1}^{N} w(X_i) φ(X_i).
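The importance-sampling estimator above is easy to check numerically. The following minimal sketch (my own illustration, not from the slides) estimates E_p[X²] = 1 for a standard normal target p using samples from a wider normal proposal q; the choice of target, proposal, test function, sample size and seed are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p = N(0, 1); proposal q = N(0, 2^2), so p(x) > 0 => q(x) > 0
# and the weight w(x) = p(x)/q(x) = 2 exp(-3 x^2 / 8) is bounded.
def log_p(x):
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def log_q(x):
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2.0 * np.pi)

N = 100_000
x = rng.normal(0.0, 2.0, size=N)      # X_i ~ q, i.i.d.
w = np.exp(log_p(x) - log_q(x))       # w(X_i) = p(X_i) / q(X_i)

# I = E_p[phi(X)] with phi(x) = x^2; for p = N(0, 1) the true value is 1.
I_hat = np.mean(w * x**2)
```

Since E_q[w(X)] = 1, the average weight is itself a diagnostic: it should be close to one.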
- Equivalently, this approximates π(dx) = p(x) dx with the weighted empirical measure
    π̂^N(dx) = (1/N) ∑_{i=1}^{N} w(X_i) δ_{X_i}(dx).

Self-Normalised Importance Sampling
- Often, p is known only up to a normalising constant.
- As E_q(Cwφ) = C E_p(φ) ...
- If v(x) = C w(x), then
    E_q(vφ) / E_q(v·1) = E_q(Cwφ) / E_q(Cw·1) = C E_p(φ) / (C E_p(1)) = E_p(φ).
- Estimate the numerator and denominator with the same sample:
    Î = ∑_{i=1}^{N} v(X_i) φ(X_i) / ∑_{i=1}^{N} v(X_i).

Importance Sampling in the HMM Setting
- Given p(x_{1:n}|y_{1:n}) for n = 1, 2, ....
- Choose q_n(x_{1:n}) = q_n(x_n|x_{1:n-1}) q_{n-1}(x_{1:n-1}).
- Weight:
    w_n(x_{1:n}) ∝ p(x_{1:n}|y_{1:n}) / [q_n(x_n|x_{1:n-1}) q_{n-1}(x_{1:n-1})]
                 = w_{n-1}(x_{1:n-1}) p(x_{1:n}|y_{1:n}) / [q_n(x_n|x_{1:n-1}) p(x_{1:n-1}|y_{1:n-1})]
                 ∝ w_{n-1}(x_{1:n-1}) f(x_n|x_{n-1}) g(y_n|x_n) / q_n(x_n|x_{n-1}).

Sequential Importance Sampling – Prediction & Update
- A first "particle filter":
  - Simple default: q_n(x_n|x_{n-1}) = f(x_n|x_{n-1}).
  - Importance weighting becomes w_n(x_{1:n}) = w_{n-1}(x_{1:n-1}) × g(y_n|x_n).
- Algorithmically, at iteration n, given {W_{n-1}^i, X_{1:n-1}^i} for i = 1, ..., N:
  - Sample X_n^i ~ f(·|X_{n-1}^i) (prediction).
  - Weight W_n^i ∝ W_{n-1}^i g(y_n|X_n^i) (update).
- Actually: better proposals exist ... but even they aren't good enough.

Resampling
- Stabilisation of importance weights.
- Given {W_n^i, X_n^i}, draw {X̃_n^i} such that
    E[ (1/N) ∑_{i=1}^{N} φ(X̃_n^i) | σ({W_n^j, X_n^j}_{j=1}^{N}) ] = ∑_{i=1}^{N} W_n^i φ(X_n^i) / ∑_{i=1}^{N} W_n^i,
  and replace {W_n^i, X_n^i}_{i=1}^{N} with {1/N, X̃_n^i}_{i=1}^{N}.
- Simplest approach, multinomial resampling:
    X̃_n^i iid ~ ∑_{j=1}^{N} W_n^j δ_{X_n^j} / ∑_{j=1}^{N} W_n^j.
- Lower-variance options are preferable.

Sequential Importance Resampling
- Algorithmically, at iteration n, given {W_{n-1}^i, X_{1:n-1}^i} for i = 1, ..., N:
  - Resample, obtaining {1/N, X̃_{1:n-1}^i}.
  - Sample X_n^i ~ q_n(·|X̃_{n-1}^i).
  - Weight W_n^i ∝ f(X_n^i|X̃_{n-1}^i) g(y_n|X_n^i) / q_n(X_n^i|X̃_{n-1}^i).
- Actually:
  - Resample efficiently.
  - Only resample when necessary.
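The defining property of a resampling scheme displayed above (the conditional expectation of the post-resampling average equals the weighted average) can be verified empirically for the simplest, multinomial, scheme. This sketch is an illustration of mine, not the authors' code; the particle values, weights and number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def multinomial_resample(rng, W, X):
    """Draw N equally weighted particles i.i.d. from the weighted system {W^i, X^i}."""
    idx = rng.choice(len(X), size=len(X), p=W)
    return X[idx]

# A small weighted particle system.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
W = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
W = W / W.sum()                        # guard against floating-point drift

weighted_mean = np.sum(W * X)          # the target of the unbiasedness property (= 2.6)

# Average (1/N) * sum_i phi(Xtilde^i), with phi the identity, over many draws.
reps = 20_000
means = np.array([multinomial_resample(rng, W, X).mean() for _ in range(reps)])
# The Monte Carlo average of `means` should match `weighted_mean`.
```

Multinomial resampling satisfies the property but with the largest variance among the standard schemes, which is why the slide recommends lower-variance alternatives.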
Iterations 2–10
[Figures: evolution of the particle system over iterations 2 to 10 of the toy example.]

Block Sampling: An Idealised Approach
- At time n, given x_{1:n-1}, discard x_{n-L+1:n-1}:
  - Sample from q(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}).
  - Weight with
      W(x_{1:n}) = p(x_{1:n}|y_{1:n}) / [p(x_{1:n-L}|y_{1:n-1}) q(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n})].
- Optimally, q(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}) = p(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}), giving
      W(x_{1:n}) ∝ p(x_{1:n-L}|y_{1:n}) / p(x_{1:n-L}|y_{1:n-1}) ∝ p(y_n|x_{1:n-L}, y_{n-L+1:n-1}).
- Typically intractable; an auxiliary-variable approach is developed in [4].

Particle MCMC
- MCMC algorithms which employ SMC proposals [1].
- The SMC algorithm is viewed as a collection of random variables:
  - values,
  - weights,
  - ancestral lines.
- Construct MCMC algorithms:
  - with many auxiliary variables,
  - exactly invariant for a distribution on an extended space;
  - standard MCMC arguments justify the strategy.
- Does this suggest anything about SMC?
- Can something similar help with smoothing?
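The path degeneracy that motivates block sampling (and the "average number of unique values" diagnostic used in the experiments later) can be reproduced with a minimal bootstrap filter that keeps full trajectories. This is my own sketch, not the authors' code; the toy model matches the linear Gaussian example of the talk, while the initial distribution x_1 ~ N(0, 1), the horizon, particle count and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: f(x_t | x_{t-1}) = N(x_{t-1}, 1), g(y_t | x_t) = N(x_t, 1).
T, N = 100, 100
x_true = np.cumsum(rng.normal(size=T))   # simulated hidden chain
y = x_true + rng.normal(size=T)          # simulated observations

paths = rng.normal(size=(N, 1))          # x_1 ~ N(0, 1) (assumed initialisation)
for t in range(T):
    if t > 0:
        new = paths[:, -1] + rng.normal(size=N)   # propagate with f
        paths = np.column_stack([paths, new])
    logw = -0.5 * (y[t] - paths[:, -1]) ** 2      # bootstrap weight g(y_t | x_t)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ancestors = rng.choice(N, size=N, p=w)        # multinomial resampling, every step
    paths = paths[ancestors]                      # surviving trajectories only

# Distinct surviving values per time step: collapses towards 1 at early times,
# while remaining a sizeable fraction of N at the final time.
unique_per_time = np.array([len(np.unique(paths[:, t])) for t in range(T)])
```

Repeated resampling makes the early coordinates of the surviving paths collapse onto a handful of common ancestors, which is exactly the degeneracy block sampling and local SMC aim to mitigate.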
Ancestral Trees
[Figure: ancestral tree of a particle system over t = 1, 2, 3, with ancestor indices such as a_3^1 = 1, a_3^4 = 3, a_2^1 = 1, a_2^4 = 3, and ancestral lines b_{3,1:3}^2 = (1, 1, 2), b_{3,1:3}^4 = (3, 3, 4), b_{3,1:3}^6 = (4, 5, 6).]

SMC Distributions
We'll need:
    ψ_{n,L}^M(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})
      = [∏_{i=1}^{M} q(x_{n-L+1}^i | x_{n-L})] [∏_{p=n-L+2}^{n} ∏_{i=1}^{M} r(a_p^i | w_{p-1}) q(x_p^i | x_{p-1}^{a_p^i})] r(k | w_n)
and
    ψ̃_{n,L}^M(ã_{n-L+2:n}, x̃_{n-L+1:n}; x_{n-L} | b̃_{n-L+1:n-1}^k, k, x̃_{n-L+1:n}^k)
      = ψ_{n,L}^M(ã_{n-L+2:n}, x̃_{n-L+1:n}, k; x_{n-L})
        / [ q(x̃_{n-L+1}^{b̃_{n,n-L+1}^k} | x_{n-L}) ∏_{p=n-L+2}^{n} r(b̃_{n,p}^k | w̃_{p-1}) q(x̃_p^{b̃_{n,p}^k} | x̃_{p-1}^{b̃_{n,p-1}^k}) r(k | w̃_n) ].

Toy Model: Linear Gaussian HMM
- Linear, Gaussian state transition: f(x_t|x_{t-1}) = N(x_t; x_{t-1}, 1)
- and likelihood: g(y_t|x_t) = N(y_t; x_t, 1).
- Analytically tractable: Kalman filter/smoother/etc.
- Simple bootstrap PF:
  - Proposal: q(x_t|x_{t-1}, y_t) = f(x_t|x_{t-1}).
  - Weighting: W(x_{t-1}, x_t) ∝ g(y_t|x_t).
  - Resample residually every iteration.

More than One SMC Algorithm?
- Standard approach: run an SIR algorithm with N particles and use
    π_n^N(dx_{1:n}) = ∑_{i=1}^{N} W_n^i δ_{X_{1:n}^i}(dx_{1:n}).
- A crude alternative: run L = ⌊N/M⌋ algorithms with M particles each and use
    π_n^{M,l}(dx_{1:n}) = ∑_{i=1}^{M} W_n^{l,i} δ_{X_{1:n}^{l,i}}(dx_{1:n}).
- Guarantees L i.i.d. samples.
- For small M their distribution may be poor.
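Because the toy model is linear and Gaussian, the bootstrap particle filter of the slide can be checked against the exact Kalman filter. The sketch below is my own illustration, not the authors' code: it uses multinomial rather than residual resampling, and assumes x_1 ~ N(0, 1) plus arbitrary horizon, particle count and seed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Gaussian HMM: f = N(x_{t-1}, 1), g = N(x_t, 1); x_1 ~ N(0, 1) assumed.
T, N = 50, 10_000
x_true = np.cumsum(rng.normal(size=T))
y = x_true + rng.normal(size=T)

# Exact scalar Kalman filter for comparison.
m, P = 0.0, 1.0                      # prior mean and variance of x_1
kalman_means = []
for t in range(T):
    if t > 0:
        P = P + 1.0                  # predict: random-walk transition, unit noise
    K = P / (P + 1.0)                # gain for unit observation noise
    m = m + K * (y[t] - m)           # update
    P = (1.0 - K) * P
    kalman_means.append(m)

# Bootstrap particle filter, resampling (multinomially here) every step.
particles = rng.normal(size=N)
pf_means = []
for t in range(T):
    if t > 0:
        particles = particles + rng.normal(size=N)        # propagate with f
    logw = -0.5 * (y[t] - particles) ** 2                 # weight with g
    w = np.exp(logw - logw.max())
    w /= w.sum()
    pf_means.append(np.sum(w * particles))                # filtering mean estimate
    particles = particles[rng.choice(N, size=N, p=w)]     # resample

err = np.max(np.abs(np.array(pf_means) - np.array(kalman_means)))
```

With a reasonably large N, the particle-filter means should track the exact Kalman means closely at every time step.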
Covariance Estimation: 1d Linear Gaussian Model
[Figure: covariance estimates against n (up to 1000) for the 1d linear Gaussian model.]

Local Particle Filtering
[Figures: current trajectories; the first particle; the SMC proposal; the CSMC auxiliary proposal.]

Local SMC
- Propose from:
    U_{1:M}^{⊗(n-1)}(b_{1:n-2}, k) p(x_{1:n-1}|y_{1:n-1}) ψ_{n,L}^M(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})
      ψ̃_{n-1,L-1}^M(ã_{n-L+2:n-1}, x̃_{n-L+1:n-1}; x_{n-L} | b_{n-L+2:n-1}, x_{n-L+1:n-1}).
- Target:
    U_{1:M}^{⊗n}(b_{1:n-L}, b̄_{n,n-L+1:n-1}^{k̄}, k̄) p(x_{1:n-L}, x_{n-L+1:n}^{b̄_{n,n-L+1:n}^{k̄}} | y_{1:n})
      ψ̃_{n,L}^M(a_{n-L+2:n}, x_{n-L+1:n}; x_{n-L} | b̄_{n,n-L+1:n}^{k̄}, x_{n-L+1:n}^{k̄})
      ψ_{n-1,L-1}^M(ã_{n-L+2:n-1}, x̃_{n-L+1:n-1}, k; x_{n-L}).
- Weight: Ẑ_{n-L+1:n} / Z̃_{n-L+1:n-1}.

Key Identity
    ψ_{n,L}^M(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})
      / [ p(x_{n-L+1:n}^{b_n^k} | x_{n-L}, y_{n-L+1:n}) ψ̃_{n,L}^M(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L} | ...) ]
    = [ q(x_{n-L+1}^{b_{n,n-L+1}^k} | x_{n-L}) ∏_{p=n-L+2}^{n} r(b_{n,p}^k | w_{p-1}) q(x_p^{b_{n,p}^k} | x_{p-1}^{b_{n,p-1}^k}) r(k | w_n) ]
      / p(x_{n-L+1:n}^{b_n^k} | x_{n-L}, y_{n-L+1:n})
    = Ẑ_{n-L+1:n} / p(y_{n-L+1:n} | x_{n-L}).

Bootstrap Local SMC
- Top level:
  - Local SMC proposal.
  - Stratified resampling when ESS < N/2.
- Local SMC proposal:
  - Proposal: q(x_t|x_{t-1}, y_t) = f(x_t|x_{t-1}).
  - Weighting: W(x_{t-1}, x_t) ∝ f(x_t|x_{t-1}) g(y_t|x_t) / f(x_t|x_{t-1}) = g(y_t|x_t).
  - Resample multinomially every iteration.
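The top-level recipe above — stratified resampling triggered when the effective sample size drops below N/2 — is easy to spell out. This sketch is my own, not the authors' implementation; the particular weight vector is an arbitrary example chosen so that the trigger fires.

```python
import numpy as np

rng = np.random.default_rng(0)

def ess(W):
    """Effective sample size 1 / sum_i W_i^2 of normalised weights W."""
    return 1.0 / np.sum(W ** 2)

def stratified_resample(rng, W):
    """Stratified resampling: one uniform drawn in each of N equal strata of (0, 1)."""
    N = len(W)
    u = (np.arange(N) + rng.uniform(size=N)) / N      # u_i in ((i-1)/N, i/N)
    return np.searchsorted(np.cumsum(W), u)           # invert the weight CDF

# A weighted system whose ESS has dropped below N / 2, triggering resampling.
W = np.array([0.6, 0.2, 0.1, 0.05, 0.03, 0.02])
N = len(W)
resampled = ess(W) < N / 2            # True here: ESS is roughly 2.4 < 3
idx = stratified_resample(rng, W) if resampled else np.arange(N)
```

Stratified resampling keeps the number of copies of particle i close to N·W_i deterministically (here N·W_1 = 3.6, so particle 1 is copied 3 or 4 times), which is the variance reduction over multinomial resampling that the earlier slide alludes to.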
Bootstrap Local SMC: Results
[Figures: average number of unique values against n (up to 1000), N = 100, for M = 100, 1000 and 10000, with L = 2, 3, 4, 5.]

Tuned Local SMC
- Top level:
  - Local SMC proposal.
  - Stratified resampling when ESS < N/2.
- Local SMC proposal:
  - Proposal: q(x_t|x_{t-1}, y_t) = p(x_t|x_{t-1}, y_t).
  - Weighting: W(x_{t-1}, x_t) ∝ p(y_t|x_{t-1}).
  - Resample residually every iteration.

Tuned Local SMC: Results
[Figures: average number of unique values against n (up to 1000), N = 100, for M = 100, 1000 and 10000, with L = 2, 3, 4, 5.]

Optimal Block Sampling
[Figure: average number of unique values against n (up to 1000), N = 100, exact block sampling, L = 2, 3, 4, 5.]

Stochastic Volatility: Bootstrap Local SMC
- Model:
    f(x_i|x_{i-1}) = N(x_i; φ x_{i-1}, σ²),
    g(y_i|x_i) = N(y_i; 0, β² exp(x_i)).
- Top level:
  - Local SMC proposal.
  - Stratified resampling when ESS < N/2.
- Local SMC proposal:
  - Proposal: q(x_t|x_{t-1}, y_t) = f(x_t|x_{t-1}).
  - Weighting: W(x_{t-1}, x_t) ∝ f(x_t|x_{t-1}) g(y_t|x_t) / f(x_t|x_{t-1}) = g(y_t|x_t).
  - Resample residually every iteration.
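For the linear Gaussian toy model, the tuned proposal p(x_t|x_{t-1}, y_t) and its weight p(y_t|x_{t-1}) used above are available in closed form: p(x_t|x_{t-1}, y_t) = N((x_{t-1} + y_t)/2, 1/2) and p(y_t|x_{t-1}) = N(y_t; x_{t-1}, 2). The sketch below (my own illustration; the fixed observation value, sample size and seed are assumptions) compares the weight variance of the tuned and bootstrap choices for one step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear Gaussian toy model: f = N(x_{t-1}, 1), g = N(y_t; x_t, 1).
N = 100_000
y_t = 1.5                                   # an arbitrary fixed observation
x_prev = rng.normal(size=N)                 # particles at time t-1

# Bootstrap: propose from f, weight with g(y_t | x_t).
x_boot = x_prev + rng.normal(size=N)
w_boot = np.exp(-0.5 * (y_t - x_boot) ** 2) / np.sqrt(2.0 * np.pi)

# Tuned: propose from p(x_t | x_{t-1}, y_t) = N((x_{t-1} + y_t)/2, 1/2),
# weight with p(y_t | x_{t-1}) = N(y_t; x_{t-1}, 2).
x_tuned = 0.5 * (x_prev + y_t) + rng.normal(size=N) * np.sqrt(0.5)
w_tuned = np.exp(-0.25 * (y_t - x_prev) ** 2) / np.sqrt(4.0 * np.pi)

# w_tuned is the conditional expectation of w_boot given x_prev, so by the law
# of total variance its variance can only be smaller; both have the same mean.
```

The weight no longer depending on the proposed x_t is what makes the tuned filter far less prone to degeneracy, and the same logic drives the tuned local SMC proposal.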
SV Simulated Data
[Figure: simulated observed values against n (up to 900).]

SV Bootstrap Local SMC: Results (Simulated Data)
[Figures: average number of unique values against n, N = 100, for M = 100 (L = 2, 4, 6, 10), M = 1000 (L = 2, 4, 6, 10, 14, 18) and M = 10000 (L = 2, 4, 6, 10, 14, 18, 22, 26).]

SV Exchange Rate Data
[Figure: exchange rate observations against n (up to 900).]

SV Bootstrap Local SMC: Results (Exchange Rate Data)
[Figures: average number of unique values against n, N = 100, for M = 100 (L = 2, 4, 6, 10), M = 1000 (L = 2, 4, 6, 10, 14, 18) and M = 10000 (L = 2, 4, 6, 10, 14, 18, 22, 26).]

In Conclusion
- SMC can be used hierarchically.
- Software implementation is not difficult [5].
- Optimal block sampling can be approximated well:
  - little specific tuning is required;
  - it minimises the need for resampling;
  - it is robust to outliers.
- The computational cost of this strategy is rather high.
- Parallel implementations are natural.
- Actually, similar techniques apply elsewhere [3, 2].

References
[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society B, 72(3):269–342, 2010.
[2] N. Chopin, P. Jacob, and O. Papaspiliopoulos. SMC².
arXiv e-prints, 1101.1528, January 2011.
[3] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3):411–436, 2006.
[4] A. Doucet, M. Briers, and S. Sénécal. Efficient block sampling strategies for sequential Monte Carlo methods. Journal of Computational and Graphical Statistics, 15(3):693–711, 2006.
[5] A. M. Johansen. SMCTC: Sequential Monte Carlo in C++. Journal of Statistical Software, 30(6):1–41, April 2009.

Thanks for listening. Any questions?