Exact Approximation of Rao-Blackwellized Particle Filters

Adam M. Johansen*, Nick Whiteley† and Arnaud Doucet‡
* University of Warwick, Department of Statistics
† University of Bristol, School of Mathematics
‡ University of Oxford, Department of Statistics
a.m.johansen@warwick.ac.uk
http://go.warwick.ac.uk/amjohansen/talks
February 21st, 2013, Séminaire BIG'MC, Paris

Context

Filtering in state-space models:
- SIR particle filters [GSS93]
- Rao-Blackwellized particle filters [AD02, CL00]

Exact approximation of Monte Carlo algorithms:
- Particle MCMC [ADH10]
- SMC² [CJP11]
- Local SMC [JD13]

Approximating the RBPF:
- Approximated Rao-Blackwellized particle filters [CSOL11]

This talk: EA-RBPFs [JWD12]

A (Rather Broad) Class of Hidden Markov Models

[Figure: graphical model of the hidden chain (x_1, z_1), (x_2, z_2), (x_3, z_3), ... with observations y_1, y_2, y_3, ...]

- Unobserved Markov chain {(X_n, Z_n)} with transition density f.
- Observed process {Y_n} with conditional density g.
- Joint density:
\[
p(x_{1:n}, z_{1:n}, y_{1:n}) = f_1(x_1, z_1)\, g(y_1 | x_1, z_1) \prod_{i=2}^{n} f(x_i, z_i | x_{i-1}, z_{i-1})\, g(y_i | x_i, z_i).
\]

Formal Solutions

- Filtering and prediction recursions:
\[
p(x_n, z_n | y_{1:n}) = \frac{p(x_n, z_n | y_{1:n-1})\, g(y_n | x_n, z_n)}{\int p(x'_n, z'_n | y_{1:n-1})\, g(y_n | x'_n, z'_n)\, d(x'_n, z'_n)}
\]
\[
p(x_{n+1}, z_{n+1} | y_{1:n}) = \int p(x_n, z_n | y_{1:n})\, f(x_{n+1}, z_{n+1} | x_n, z_n)\, d(x_n, z_n)
\]
- Smoothing:
\[
p((x,z)_{1:n} | y_{1:n}) \propto p((x,z)_{1:n-1} | y_{1:n-1})\, f((x,z)_n | (x,z)_{n-1})\, g(y_n | (x,z)_n)
\]

Importance Sampling

- Given q such that p(x) > 0 ⇒ q(x) > 0 and p(x)/q(x) < ∞, define w(x) = p(x)/q(x) and:
\[
I = \int \varphi(x)\, p(x)\, dx = \int \varphi(x)\, w(x)\, q(x)\, dx.
\]
- This suggests the estimator:
  - Sample X_1, ..., X_N iid from q.
  - Estimate \(\hat{I} = \frac{1}{N} \sum_{i=1}^{N} w(X_i)\, \varphi(X_i)\).
- Can also be viewed as approximating π(dx) = p(x) dx with
\[
\hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} w(X_i)\, \delta_{X_i}(dx).
\]

Self-Normalised Importance Sampling

- Often, p is known only up to a normalising constant.
- As E_q(Cwφ) = C E_p(φ): if v(x) = Cw(x), then
\[
\frac{\mathbb{E}_q(v \varphi)}{\mathbb{E}_q(v)} = \frac{\mathbb{E}_q(C w \varphi)}{\mathbb{E}_q(C w)} = \frac{C\, \mathbb{E}_p(\varphi)}{C\, \mathbb{E}_p(1)} = \mathbb{E}_p(\varphi).
\]
- Estimate the numerator and denominator with the same sample:
\[
\hat{I} = \frac{\sum_{i=1}^{N} v(X_i)\, \varphi(X_i)}{\sum_{i=1}^{N} v(X_i)}.
\]

Importance Sampling in the HMM Setting

- Target p((x,z)_{1:n} | y_{1:n}) for n = 1, 2, ....
- Choose q_n((x,z)_{1:n}) = q_n((x,z)_n | (x,z)_{1:n-1}) q_{n-1}((x,z)_{1:n-1}).
- Weight:
\[
w_n((x,z)_{1:n}) \propto \frac{p((x,z)_{1:n} | y_{1:n})}{q_n((x,z)_n | (x,z)_{1:n-1})\, q_{n-1}((x,z)_{1:n-1})}
= \frac{p((x,z)_{1:n} | y_{1:n})\, w_{n-1}((x,z)_{1:n-1})}{q_n((x,z)_n | (x,z)_{1:n-1})\, p((x,z)_{1:n-1} | y_{1:n-1})}
\]
\[
\propto w_{n-1}((x,z)_{1:n-1})\, \frac{f((x,z)_n | (x,z)_{n-1})\, g(y_n | (x,z)_n)}{q_n((x,z)_n | (x,z)_{n-1})}.
\]

Sequential Importance Sampling – Prediction & Update

A first "particle filter":
- Simple default: q_n((x,z)_n | (x,z)_{n-1}) = f((x,z)_n | (x,z)_{n-1}).
- The importance weighting becomes: w_n((x,z)_{1:n}) = w_{n-1}((x,z)_{1:n-1}) × g(y_n | (x,z)_n).

Algorithmically, at iteration n, given {W^i_{n-1}, (X,Z)^i_{1:n-1}} for i = 1, ..., N:
- Sample (X,Z)^i_n ~ f(· | (X,Z)^i_{n-1}) (prediction).
- Weight W^i_n ∝ W^i_{n-1} g(y_n | (X,Z)^i_n) (update).

Actually: better proposals exist... but even they aren't good enough.

A Simple SIR Filter

Algorithmically, at iteration n, given {W^i_{n-1}, (X,Z)^i_{1:n-1}} for i = 1, ..., N:
- Resample, obtaining {1/N, (X̃,Z̃)^i_{1:n-1}}.
- Sample (X,Z)^i_n ~ q_n(· | (X̃,Z̃)^i_{n-1}).
- Weight
\[
W_n^i \propto \frac{f\big((X,Z)_n^i \,\big|\, (\tilde{X},\tilde{Z})_{n-1}^i\big)\, g\big(y_n \,\big|\, (X,Z)_n^i\big)}{q_n\big((X,Z)_n^i \,\big|\, (\tilde{X},\tilde{Z})_{n-1}^i\big)}.
\]

Actually:
- Resample efficiently.
- Only resample when necessary.
- ...

A minimal code sketch of such a filter follows.
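As a concrete illustration of the SIR scheme above, here is a minimal sketch of a bootstrap-style filter (i.e. with q_n = f) in Python. The interface (sample_init, sample_f, log_g) is an illustrative assumption, not part of any cited software, and multinomial resampling at every step is used purely for simplicity.

```python
import numpy as np

def sir_filter(y, sample_init, sample_f, log_g, N, rng=None):
    """Minimal bootstrap SIR particle filter (illustrative sketch).

    y           : sequence of observations y_1, ..., y_T
    sample_init : (N, rng) -> (N, d) initial particles drawn from f_1
    sample_f    : (particles, rng) -> (N, d) draws from f(. | particles)
    log_g       : (y_n, particles) -> (N,) values of log g(y_n | particle)
    Returns the filtering means E[(X_n, Z_n) | y_{1:n}] for n = 1, ..., T.
    """
    rng = rng if rng is not None else np.random.default_rng()
    particles = sample_init(N, rng)
    means = []
    for y_n in y:
        # Update: the proposal is the transition f, so the incremental
        # weight is just the likelihood g.
        logw = log_g(y_n, particles)
        w = np.exp(logw - logw.max())   # stabilised via log-space shift
        w /= w.sum()
        means.append(w @ particles)
        # Resample (multinomial here; systematic/residual schemes, and
        # resampling only when the ESS is low, are preferable in practice).
        idx = rng.choice(N, size=N, p=w)
        # Predict: propagate the resampled particles through the transition.
        particles = sample_f(particles[idx], rng)
    return np.array(means)
```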
A Rao-Blackwellized SIR Filter

Algorithmically, at iteration n:
- Given {W^{x,i}_{n-1}, X^i_{1:n-1}, p(z_{1:n-1} | X^i_{1:n-1}, y_{1:n-1})} for i = 1, ..., N:
- Resample, obtaining {1/N, X̃^i_{1:n-1}, p(z_{1:n-1} | X̃^i_{1:n-1}, y_{1:n-1})}.
- For i = 1, ..., N:
  - Sample X^i_n ~ q_n(· | X̃^i_{n-1}); set X^i_{1:n} ← (X̃^i_{1:n-1}, X^i_n).
  - Weight W^{x,i}_n ∝ p(X^i_n, y_n | X̃^i_{n-1}) / q_n(X^i_n | X̃^i_{n-1}).
  - Compute p(z_{1:n} | y_{1:n}, X^i_{1:n}).

Requires analytically tractable substructure.

An Approximate Rao-Blackwellized SIR Filter

Algorithmically, at iteration n:
- Given {W^{x,i}_{n-1}, X^i_{1:n-1}, p̂(z_{1:n-1} | X^i_{1:n-1}, y_{1:n-1})} for i = 1, ..., N:
- Resample, obtaining {1/N, X̃^i_{1:n-1}, p̂(z_{1:n-1} | X̃^i_{1:n-1}, y_{1:n-1})}.
- For i = 1, ..., N:
  - Sample X^i_n ~ q_n(· | X̃^i_{n-1}); set X^i_{1:n} ← (X̃^i_{1:n-1}, X^i_n).
  - Weight W^{x,i}_n ∝ p̂(X^i_n, y_n | X̃^i_{n-1}) / q_n(X^i_n | X̃^i_{n-1}).
  - Compute p̂(z_{1:n} | y_{1:n}, X^i_{1:n}).

Is approximate; how does the error accumulate?

Exactly Approximated Rao-Blackwellized SIR Filter

At time n = 1:
- Sample X^i_1 ~ q^x(· | y_1).
- Sample Z^{i,j}_1 ~ q^z(· | X^i_1, y_1).
- Compute and normalise the local weights
\[
w_1^z\big(X_1^i, Z_1^{i,j}\big) := \frac{p\big(X_1^i, y_1, Z_1^{i,j}\big)}{q^z\big(Z_1^{i,j} \,\big|\, X_1^i, y_1\big)},
\qquad
W_1^{z,i,j} := \frac{w_1^z\big(X_1^i, Z_1^{i,j}\big)}{\sum_{k=1}^{M} w_1^z\big(X_1^i, Z_1^{i,k}\big)},
\]
and define
\[
\hat{p}\big(X_1^i, y_1\big) := \frac{1}{M} \sum_{j=1}^{M} w_1^z\big(X_1^i, Z_1^{i,j}\big).
\]
- Compute and normalise the top-level weights
\[
w_1^x\big(X_1^i\big) := \frac{\hat{p}\big(X_1^i, y_1\big)}{q^x\big(X_1^i \,\big|\, y_1\big)},
\qquad
W_1^{x,i} := \frac{w_1^x\big(X_1^i\big)}{\sum_{k=1}^{N} w_1^x\big(X_1^k\big)}.
\]

At times n ≥ 2:
- Resample
\[
\Big\{ W_{n-1}^{x,i}, \big( X_{1:n-1}^i, \{ W_{n-1}^{z,i,j}, Z_{1:n-1}^{i,j} \}_j \big) \Big\}_i
\quad \text{to obtain} \quad
\Big\{ \tfrac{1}{N}, \big( \tilde{X}_{1:n-1}^i, \{ W_{n-1}^{z,i,j}, Z_{1:n-1}^{i,j} \}_j \big) \Big\}_i.
\]
- Resample \(\{ W_{n-1}^{z,i,j}, Z_{1:n-1}^{i,j} \}_j\) to obtain \(\{ \tfrac{1}{M}, \tilde{Z}_{1:n-1}^{i,j} \}_j\).
- Sample X^i_n ~ q^x(· | X̃^i_{1:n-1}, y_{1:n}); set X^i_{1:n} := (X̃^i_{1:n-1}, X^i_n).
- Sample Z^{i,j}_n ~ q^z(· | X^i_{1:n}, y_{1:n}, Z̃^{i,j}_{1:n-1}); set Z^{i,j}_{1:n} := (Z̃^{i,j}_{1:n-1}, Z^{i,j}_n).
- Compute and normalise the local weights
\[
w_n^z\big(X_{1:n}^i, Z_{1:n}^{i,j}\big) := \frac{p\big(X_n^i, y_n, Z_n^{i,j} \,\big|\, \tilde{X}_{n-1}^i, \tilde{Z}_{n-1}^{i,j}\big)}{q^z\big(Z_n^{i,j} \,\big|\, X_{1:n}^i, y_{1:n}, \tilde{Z}_{1:n-1}^{i,j}\big)},
\qquad
W_n^{z,i,j} := \frac{w_n^z\big(X_{1:n}^i, Z_{1:n}^{i,j}\big)}{\sum_{k=1}^{M} w_n^z\big(X_{1:n}^i, Z_{1:n}^{i,k}\big)},
\]
and define
\[
\hat{p}\big(X_n^i, y_n \,\big|\, \tilde{X}_{1:n-1}^i, y_{1:n-1}\big) := \frac{1}{M} \sum_{j=1}^{M} w_n^z\big(X_{1:n}^i, Z_{1:n}^{i,j}\big).
\]
- Compute and normalise the top-level weights
\[
w_n^x\big(X_{1:n}^i\big) := \frac{\hat{p}\big(X_n^i, y_n \,\big|\, \tilde{X}_{1:n-1}^i, y_{1:n-1}\big)}{q^x\big(X_n^i \,\big|\, \tilde{X}_{1:n-1}^i, y_{1:n}\big)},
\qquad
W_n^{x,i} := \frac{w_n^x\big(X_{1:n}^i\big)}{\sum_{k=1}^{N} w_n^x\big(X_{1:n}^k\big)}.
\]

How can this be justified?
- As an extended-space SIR algorithm.
- Via unbiased estimation arguments.
Note also the M = 1 and M → ∞ cases.

How does this differ from [CSOL11]? Principally in the local weights, with benefits including:
- Valid (N-consistent) for all M ≥ 1, rather than (M, N)-consistent.
- Computational cost O(MN) rather than O(M²N).
- Only requires knowledge of the joint behaviour of x and z; doesn't require, say, p(x_n | x_{n-1}, z_{n-1}).

A schematic code sketch of one complete update appears after the figure below.

Ancestral Lines

[Figure: ancestral lines of a local particle system over n = 1, 2, 3, indicating ancestor indices (e.g. a_2^1 = 1, a_2^4 = 3, a_3^1 = 1, a_3^4 = 3) and the resulting ancestral lines b_{3,1:3}^2 = (1, 1, 2), b_{3,1:3}^4 = (3, 3, 4), b_{3,1:3}^6 = (4, 5, 6).]
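To make the two-level structure concrete, the following is a schematic sketch of a single EA-RBPF update for n ≥ 2, in the same spirit as the SIR sketch above and not the authors' code. It assumes Markov proposals and tracks only current states rather than full paths; the callables sample_qx, pdf_qx, sample_qz, pdf_qz and p_local are hypothetical stand-ins for q^x, q^z and p(x_n, y_n, z_n | x_{n-1}, z_{n-1}); weights are kept in linear space for readability (log-space is preferable in practice).

```python
import numpy as np

def ea_rbpf_step(Wx, X, Wz, Z, y_n,
                 sample_qx, pdf_qx, sample_qz, pdf_qz, p_local, rng):
    """One EA-RBPF update for n >= 2 -- a schematic sketch.

    Wx : (N,) normalised top-level weights;  X : (N, dx) x-particles
    Wz : (N, M) normalised local weights;    Z : (N, M, dz) z-particles
    p_local(x_n, y_n, z_n, x_prev, z_prev) evaluates
    p(x_n, y_n, z_n | x_{n-1}, z_{n-1}); all five callables are
    hypothetical stand-ins for the model and proposal densities.
    """
    N, M = Wz.shape
    # Top-level resampling: each offspring inherits an entire local system.
    top = rng.choice(N, size=N, p=Wx)
    X, Wz, Z = X[top], Wz[top], Z[top]
    wx = np.empty(N)
    for i in range(N):
        # Local (within-system) resampling of the M z-particles.
        anc = rng.choice(M, size=M, p=Wz[i])
        Z[i] = Z[i][anc]
        # Extend the x-particle, then each local z-particle.
        x_new = sample_qx(X[i], rng)
        wz = np.empty(M)
        for j in range(M):
            z_new = sample_qz(x_new, Z[i, j], rng)
            wz[j] = (p_local(x_new, y_n, z_new, X[i], Z[i, j])
                     / pdf_qz(z_new, x_new, Z[i, j]))      # w_n^{z,i,j}
            Z[i, j] = z_new
        Wz[i] = wz / wz.sum()                 # normalised local weights
        p_hat = wz.mean()                     # \hat{p}(X_n^i, y_n | past)
        wx[i] = p_hat / pdf_qx(x_new, X[i])   # top-level weight w_n^x
        X[i] = x_new
    return wx / wx.sum(), X, Wz, Z
```

Note that the i-th top-level weight uses the average of the unnormalised local weights as the estimate p̂ of the predictive density; this is precisely what the extended-space argument below exploits.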
Importance Sampling Interpretation – Proposal

Neglecting resampling at the top level, the time-n proposal is:
\[
q_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} \,\big|\, y_{1:n}\big) = q_n\big(x_{1:n} \,\big|\, y_{1:n}\big)\, q_n\big(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} \,\big|\, x_{1:n}, y_{1:n}\big),
\]
where
\[
q_n\big(x_{1:n} \,\big|\, y_{1:n}\big) = q^x(x_1 | y_1) \prod_{m=2}^{n} q^x(x_m | x_{1:m-1}, y_{1:m}),
\]
and, assuming multinomial resampling locally,
\[
q_n\big(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} \,\big|\, x_{1:n}, y_{1:n}\big) = \prod_{j=1}^{M} \left\{ q^z\big(z_1^j \,\big|\, x_1, y_1\big) \prod_{m=2}^{n} W_{m-1}^{z,a_{m-1}^{z,j}}\, q^z\Big(z_m^j \,\Big|\, x_{1:m}, y_{1:m}, z_{1:m-1}^{a_{m-1}^{z,j}}\Big) \right\}.
\]

Importance Sampling Interpretation – Weights and Target

The importance weights are:
\[
w_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}\big) = \frac{\hat{p}(x_1, y_1)}{q^x(x_1 | y_1)} \prod_{m=2}^{n} \frac{\hat{p}(x_m, y_m | x_{1:m-1}, y_{1:m-1})}{q^x(x_m | x_{1:m-1}, y_{1:m})},
\]
implying the target distribution
\[
\pi_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}\big) \propto w_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}\big)\, q_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} \,\big|\, y_{1:n}\big) \propto \hat{p}(x_{1:n}, y_{1:n})\, q_n\big(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} \,\big|\, x_{1:n}, y_{1:n}\big),
\]
where
\[
\hat{p}(x_{1:n}, y_{1:n}) := \hat{p}(x_1, y_1) \prod_{m=2}^{n} \hat{p}(x_m, y_m | x_{1:m-1}, y_{1:m-1}).
\]

Importance Sampling Interpretation – Marginal Target

It is now well known that the marginal likelihood estimate provided by a particle filter is unbiased, i.e.
\[
\sum_{a_{1:n-1}^{z,1:M}} \int \hat{p}(x_{1:n}, y_{1:n})\, q_n\big(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} \,\big|\, y_{1:n}, x_{1:n}\big)\, dz_{1:n}^{1:M} = p(x_{1:n}, y_{1:n}).
\]
It follows straightforwardly that the marginal target is:
\[
\pi_n(x_{1:n}) = \sum_{a_{1:n-1}^{z,1:M}} \int \pi_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}\big)\, dz_{1:n}^{1:M} = p(x_{1:n} | y_{1:n}).
\]

A Joint Target Distribution I

A little rearrangement yields:
\[
\pi_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}\big) = p(x_{1:n} | y_{1:n}) \times \frac{1}{M^{n-1}} \times \frac{1}{M} \sum_{j=1}^{M} \left\{ p\big(z_{1:n}^j \,\big|\, x_{1:n}, y_{1:n}\big) \prod_{\substack{k=1 \\ k \neq b_1^j}}^{M} q^z\big(z_1^k \,\big|\, x_1, y_1\big) \prod_{m=2}^{n} \prod_{\substack{k=1 \\ k \neq b_m^j}}^{M} W_{m-1}^{z,a_{m-1}^{z,k}}\, q^z\Big(z_m^k \,\Big|\, x_{1:m}, y_{1:m}, z_{1:m-1}^{a_{m-1}^{z,k}}\Big) \right\}.
\]
[Easily verified by division by the joint proposal.]

A Joint Target Distribution II

Introducing an additional discrete random variable L ∈ {1, ..., M} such that
\[
\pi_n\big(x_{1:n}, a_{1:n-1}^{z,1:M}, l, z_{1:n}^{1:M}\big) = \frac{p\big(x_{1:n}, z_{1:n}^l \,\big|\, y_{1:n}\big)}{M^n} \prod_{\substack{k=1 \\ k \neq b_1^l}}^{M} q^z\big(z_1^k \,\big|\, x_1, y_1\big) \times \prod_{m=2}^{n} \prod_{\substack{k=1 \\ k \neq b_m^l}}^{M} W_{m-1}^{z,a_{m-1}^{z,k}}\, q^z\Big(z_m^k \,\Big|\, x_{1:m}, y_{1:m}, z_{1:m-1}^{a_{m-1}^{z,k}}\Big).
\]
This provides us with an estimate of the joint distribution of interest p(x_{1:n}, z_{1:n} | y_{1:n}).

Toy Example: Model

We use a simulated sequence of 100 observations from the model defined by the densities:
\[
\mu(x_1, z_1) = \mathcal{N}\!\left( \begin{pmatrix} x_1 \\ z_1 \end{pmatrix}; \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)
\]
\[
f(x_n, z_n | x_{n-1}, z_{n-1}) = \mathcal{N}\!\left( \begin{pmatrix} x_n \\ z_n \end{pmatrix}; \begin{pmatrix} x_{n-1} \\ z_{n-1} \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)
\]
\[
g(y_n | x_n, z_n) = \mathcal{N}\!\left( y_n; \begin{pmatrix} x_n \\ z_n \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_z^2 \end{pmatrix} \right)
\]
We consider the IMSE (relative to the optimal filter) of the filtering estimate of the first-coordinate marginals. (A code sketch of the model and its exact optimal filter follows the results below.)

Approximation of the RBPF

[Figure: mean squared filtering error against the number of lower-level particles M, for N = 10, 20, 40, 80 and 160, with σ_x² = σ_z² = 1.]

Computational Performance

[Figure: mean squared filtering error against computational cost N(M+1), for N = 10, 20, 40, 80 and 160, with σ_x² = σ_z² = 1.]

[Figure: mean squared filtering error against computational cost N(M+1), for N = 10, 20, 40, 80 and 160, with σ_x² = 10² and σ_z² = 0.1².]
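Since the toy model is linear and Gaussian, the optimal filter used as the reference above is available exactly via the Kalman recursions. The following is a minimal sketch (not the authors' code) that simulates the model and computes the exact filtering means; everything is diagonal because the two coordinates are independent given the observations.

```python
import numpy as np

def simulate(T, sx2, sz2, rng):
    """Simulate the toy model: (x, z) is a bivariate Gaussian random walk
    started from N(0, I), observed with independent noise of variances
    sx2 (first coordinate) and sz2 (second coordinate)."""
    u = np.cumsum(rng.standard_normal((T, 2)), axis=0)
    y = u + rng.standard_normal((T, 2)) * np.sqrt([sx2, sz2])
    return u, y

def kalman_filter(y, sx2, sz2):
    """Exact (optimal) filter for the toy model; returns filtering means.
    Here F = H = Q = I, so the recursion is purely diagonal."""
    R = np.array([sx2, sz2])
    m, P = np.zeros(2), np.ones(2)   # predictive law at n = 1 is N(0, I)
    means = []
    for y_n in y:
        K = P / (P + R)              # Kalman gain (diagonal)
        m = m + K * (y_n - m)        # update with observation y_n
        P = (1.0 - K) * P
        means.append(m.copy())
        P = P + 1.0                  # predict: add transition noise Q = I
    return np.array(means)

rng = np.random.default_rng(0)
truth, y = simulate(100, 1.0, 1.0, rng)
opt = kalman_filter(y, 1.0, 1.0)     # reference for the MSE comparisons
```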
In Conclusion

- SMC can be used hierarchically.
- Software implementation is not difficult [Joh09].
- The Rao-Blackwellized particle filter can be approximated exactly:
  - Can reduce estimator variance at fixed cost.
  - Attractive for distributed/parallel implementation.
  - Allows combination of different sorts of particle filter.
  - Can be combined with other techniques for parameter estimation, etc.

References

[AD02] C. Andrieu and A. Doucet. Particle filtering for partially observed Gaussian state space models. Journal of the Royal Statistical Society B, 64(4):827–836, 2002.

[ADH10] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society B, 72(3):269–342, 2010.

[CJP11] N. Chopin, P. Jacob, and O. Papaspiliopoulos. SMC². arXiv preprint arXiv:1101.1528, January 2011.

[CL00] R. Chen and J. S. Liu. Mixture Kalman filters. Journal of the Royal Statistical Society B, 62(3):493–508, 2000.

[CSOL11] T. Chen, T. Schön, H. Ohlsson, and L. Ljung. Decentralized particle filter with arbitrary state decomposition. IEEE Transactions on Signal Processing, 59(2):465–478, February 2011.

[GSS93] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F, 140(2):107–113, April 1993.

[JD13] A. M. Johansen and A. Doucet. Hierarchical particle sampling for intractable state-space models. CRiSM working paper, University of Warwick, 2013. In preparation.

[Joh09] A. M. Johansen. SMCTC: Sequential Monte Carlo in C++. Journal of Statistical Software, 30(6):1–41, April 2009.

[JWD12] A. M. Johansen, N. Whiteley, and A. Doucet. Exact approximation of Rao-Blackwellised particle filters. In Proceedings of the 16th IFAC Symposium on System Identification, Brussels, Belgium, July 2012.