Sequential Monte Carlo and Particle Filtering
Frank Wood, Gatsby, November 2007

Importance Sampling
• Recall: say we want to compute some expectation (integral)
  E_p[f] = \int p(x) f(x) \, dx
  and we remember from Monte Carlo integration theory that, with samples from p(·), we can approximate this integral as
  E_p[f] \approx \frac{1}{L} \sum_{\ell=1}^{L} f(x_\ell)

What if p(·) is hard to sample from?
• One solution: use importance sampling
– Choose another, easier-to-sample distribution q(·) that is similar to p(·) and write
  E_p[f] = \int p(x) f(x) \, dx = \int \frac{p(x)}{q(x)} f(x) q(x) \, dx \approx \frac{1}{L} \sum_{\ell=1}^{L} \frac{p(x_\ell)}{q(x_\ell)} f(x_\ell), \quad x_\ell \sim q(\cdot)

I.S. with distributions known only up to a proportionality
• Importance sampling using distributions known only up to a proportionality is easy and common; the algebra yields
  E_p[f] \approx \frac{Z_q}{Z_p} \frac{1}{L} \sum_{\ell=1}^{L} \frac{\tilde p(x_\ell)}{\tilde q(x_\ell)} f(x_\ell) \approx \sum_{\ell=1}^{L} w_\ell f(x_\ell)
  where \tilde r_\ell = \tilde p(x_\ell) / \tilde q(x_\ell) and w_\ell = \tilde r_\ell / \sum_{\ell'=1}^{L} \tilde r_{\ell'}

A model requiring sampling techniques
• Non-linear, non-Gaussian first-order Markov model: hidden states x_{t-1}, x_t, x_{t+1} (hidden and of interest) emit noisy observations y_{t-1}, y_t, y_{t+1}, with joint distribution
  p(x_{1:i}, y_{1:i}) = \prod_{j=1}^{i} p(y_j | x_j) \, p(x_j | x_{j-1})

Filtering distribution hard to obtain
• Often the filtering distribution is of interest:
  p(x_i | y_{1:i}) \propto \int \cdots \int p(x_{1:i}, y_{1:i}) \, dx_1 \cdots dx_{i-1}
• It may not be possible to compute these integrals analytically, to sample from this distribution directly, or even to design a good proposal distribution for importance sampling.

A solution: sequential Monte Carlo
• Sample from a sequence of distributions that "converge" to the distribution of interest
• This is a very general technique that can be applied to a very large number of models and in a wide variety of settings.
• Today: particle filtering for a first-order Markov model

Concrete example: target tracking
• A ballistic projectile has been launched in our direction.
• We have orders to intercept the projectile with a missile and thus need to infer the projectile's current position given noisy measurements.
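The self-normalized importance-sampling estimator above can be sketched in a few lines. The target and proposal here are illustrative choices, not from the slides: an unnormalized standard normal \tilde p and a wider normal proposal q, used to estimate E_p[x^2] = 1.

```python
# Self-normalized importance sampling with densities known only up to a
# constant, as in the slides. Target p is N(0, 1) (unnormalized), proposal
# q is N(0, 2^2); both choices are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # unnormalized target density: exp(-x^2 / 2)
    return np.exp(-0.5 * x**2)

def q_tilde(x):
    # unnormalized proposal density: exp(-x^2 / 8), i.e. N(0, 4) up to a constant
    return np.exp(-x**2 / 8.0)

L = 100_000
x = rng.normal(0.0, 2.0, size=L)     # x_l ~ q(.)
r = p_tilde(x) / q_tilde(x)          # unnormalized weights r_l
w = r / r.sum()                      # normalized weights w_l
estimate = np.sum(w * x**2)          # estimates E_p[x^2]; true value is 1
```

Note that the unknown ratio Z_q/Z_p cancels in the normalization step, which is exactly the point of the algebra on the slide.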
Problem Schematic
[Figure: projectile at Cartesian position (x_t, y_t), observed from the origin (0,0) in polar coordinates (r_t, \theta_t).]

Probabilistic approach
• Treat the true trajectory as a sequence of latent random variables
• Specify a model and do inference to recover the position of the projectile at time t

First order Markov model
• Hidden true trajectory x_{t-1}, x_t, x_{t+1}; noisy measurements y_{t-1}, y_t, y_{t+1}:
  p(x_{t+1} | x_t) = a(\cdot; \theta_a, x_t)
  p(y_t | x_t) = h(\cdot; \theta_h, x_t)

Posterior expectations
• We want filtering distribution samples
  p(x_i | y_{1:i}) \propto p(y_i | x_i) \int p(x_i | x_{i-1}) p(x_{i-1} | y_{1:i-1}) \, dx_{i-1} \propto p(y_i | x_i) p(x_i | y_{1:i-1})
  (write this down!) so that we can compute expectations
  E_p[f] = \int f(x_i) p(x_i | y_{1:i}) \, dx_i

Importance sampling
• Distribution from which we want samples: \tilde p(x_i) \propto p(y_i | x_i) p(x_i | y_{1:i-1})
• Proposal distribution: q(x_i) = p(x_i | y_{1:i-1})
• By identity the posterior predictive distribution can be written as
  q(x_i) = p(x_i | y_{1:i-1}) = \int p(x_i | x_{i-1}) p(x_{i-1} | y_{1:i-1}) \, dx_{i-1}

Basis of sequential recursion
• If we start with samples
  \{ w_\ell^{i-1}, x_\ell^{i-1} \}_{\ell=1}^{L} \sim p(x_{i-1} | y_{1:i-1})
  then we can write the proposal distribution as a finite mixture model
  q(x_i) = p(x_i | y_{1:i-1}) \approx \sum_{\ell=1}^{L} w_\ell^{i-1} p(x_i | x_\ell^{i-1})
  and draw samples accordingly:
  \{ \hat w_m^i, \hat x_m^i \}_{m=1}^{M} \sim q(\cdot)

Samples from the proposal distribution
• We now have samples from the proposal, \{ \hat w_m^i, \hat x_m^i \} \sim q(x_i)
• And if we recall
  \tilde p(x_i) = p(y_i | x_i) p(x_i | y_{1:i-1})   (distribution from which we want samples)
  q(x_i) = p(x_i | y_{1:i-1})   (proposal distribution)
  updating the weights completes importance sampling:
  \hat r_m^i = \tilde p(\hat x_m^i) / q(\hat x_m^i) = p(y_i | \hat x_m^i) \, \hat w_m^i
• We are left with M weighted samples from the posterior up to observation i:
  w_m^i = \hat r_m^i / \sum_{m'=1}^{M} \hat r_{m'}^i, \quad \{ w_m^i, x_m^i \}_{m=1}^{M} \sim p(x_i | y_{1:i})

Intuition
• The "particle filter" name comes from a physical interpretation of the samples
• Start with samples representing the hidden state:
  \{ w_\ell^{i-1}, x_\ell^{i-1} \}_{\ell=1}^{L} \sim p(x_{i-1} | y_{1:i-1})
• Evolve them according to the state model
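One propagate / re-weight / normalize cycle of the filter just described might look like the sketch below. The Gaussian random-walk state model and Gaussian likelihood are hypothetical stand-ins for the slide's generic a(·; \theta_a, x_t) and h(·; \theta_h, x_t).

```python
# One sequential-importance-sampling step: evolve particles through the
# state model, re-weight by the likelihood, normalize. The random-walk
# state model and Gaussian likelihood are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)

L = 500
x_prev = rng.normal(0.0, 1.0, size=L)   # particles from p(x_{i-1} | y_{1:i-1})
w_prev = np.full(L, 1.0 / L)            # their (here uniform) weights

y_i = 0.7                               # hypothetical new observation

# Evolve: x_hat_m ~ p(. | x_{i-1}), carrying the old weights forward
x_hat = x_prev + rng.normal(0.0, 0.5, size=L)

# Re-weight by the likelihood p(y_i | x_hat) and normalize
lik = np.exp(-0.5 * (y_i - x_hat) ** 2)
r = w_prev * lik                        # unnormalized weights r_m
w = r / r.sum()                         # normalized weights w_m
```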
• Evolve them:
  \hat x_m^i \sim p(\cdot | x_\ell^{i-1}), \quad \hat w_m^i \propto w_\ell^{i-1}
• Re-weight them by the likelihood:
  w_m^i \propto \hat w_m^i \, p(y_i | \hat x_m^i)
• This results in samples one step forward:
  \{ w_\ell^i, x_\ell^i \}_{\ell=1}^{L} \approx \{ w_m^i, x_m^i \}_{m=1}^{M}

SIS Particle Filter
• The Sequential Importance Sampler (SIS) particle filter multiplicatively updates weights at every iteration; as a result, most weights eventually become very small
• Computationally this makes little sense, as low-weighted particles eventually contribute nothing to any expectation.
• A measure of particle degeneracy is the "effective sample size"
  \hat N_{\text{eff}} = 1 / \sum_{\ell=1}^{L} (w_\ell^i)^2
• When this quantity is small, particle degeneracy is severe.

Solutions
• The Sequential Importance Re-sampling (SIR) particle filter avoids many of the problems associated with SIS filtering by re-sampling the posterior particle set to increase the effective sample size.
• Choosing the best possible importance density is also important, because it minimizes the variance of the weights and thereby maximizes \hat N_{\text{eff}}.

Other tricks to improve pf'ing
• Integrate out everything you can
• Replicate each particle some number of times
• In discrete systems, enumerate instead of sample
• Use fancy re-sampling schemes like stratified sampling, etc.

[Figures: the four stages of one filtering step]
• Initial particles: \{ w_\ell^{i-1}, x_\ell^{i-1} \}_{\ell=1}^{L} \sim p(x_{i-1} | y_{1:i-1})
• Particle evolution step: \{ \hat w_m^i, \hat x_m^i \}_{m=1}^{M} \sim p(x_i | y_{1:i-1})
• Weighting step: \{ w_m^i, x_m^i \}_{m=1}^{M} \sim p(x_i | y_{1:i})
• Resampling step: \{ w_\ell^i, x_\ell^i \}_{\ell=1}^{L} \sim p(x_i | y_{1:i})

Wrap-up: Pros vs.
Cons
• Pros:
– Sometimes it is easier to build a "good" particle filter sampler than an MCMC sampler
– No need to specify a convergence measure
• Cons:
– Really filtering, not smoothing
• Issues:
– Computational trade-off with MCMC

Thank You

Tricks and Variants
• Reduce the dimensionality of the integrand through analytic integration
– Rao-Blackwellization
• Reduce the variance of the Monte Carlo estimator through:
– Maintaining a weighted particle set
– Stratified sampling
– Over-sampling
– Optimal re-sampling

Particle filtering
• Consists of two basic elements:
– Monte Carlo integration:
  p(x) \approx \sum_{\ell=1}^{L} w_\ell \delta_{x_\ell}, \quad \lim_{L \to \infty} \sum_{\ell=1}^{L} w_\ell f(x_\ell) = \int f(x) p(x) \, dx
– Importance sampling

PF'ing: Forming the posterior predictive
• Posterior up to observation i-1:
  \{ w_\ell^{i-1}, x_\ell^{i-1} \}_{\ell=1}^{L} \sim p(x_{i-1} | y_{1:i-1})
  p(x_i | y_{1:i-1}) = \int p(x_i | x_{i-1}) p(x_{i-1} | y_{1:i-1}) \, dx_{i-1} \approx \sum_{\ell=1}^{L} w_\ell^{i-1} p(x_i | x_\ell^{i-1})
• The proposal distribution for importance sampling of the posterior up to observation i is this approximate posterior predictive distribution

Sampling the posterior predictive
• Generating samples from the posterior predictive distribution is the first place where we can introduce variance reduction techniques
  \{ \hat w_m^i, \hat x_m^i \}_{m=1}^{M} \sim p(x_i | y_{1:i-1}), \quad p(x_i | y_{1:i-1}) \approx \sum_{\ell=1}^{L} w_\ell^{i-1} p(x_i | x_\ell^{i-1})
• For instance, sample from each mixture component exactly twice, so that M, the number of samples drawn, is two times L, the number of densities in the mixture model, and assign weights
  \hat w_m^i = w_\ell^{i-1} / 2

Not the best
• The most efficient Monte Carlo estimator of a function \Gamma(x):
– From survey sampling theory: Neyman allocation
– The number drawn from each mixture density is proportional to the weight of the mixture density times the std. dev.
of the function \Gamma over the mixture density
• Take-home: smarter sampling is possible [Cochran 1963]

[Figure: over-sampling from the posterior predictive distribution, \{ \hat w_m^i, \hat x_m^i \}_{m=1}^{M} \sim p(x_i | y_{1:i-1})]

Importance sampling the posterior
• Recall that we want samples from
  p(x_i | y_{1:i}) \propto p(y_i | x_i) \int p(x_i | x_{i-1}) p(x_{i-1} | y_{1:i-1}) \, dx_{i-1} \propto p(y_i | x_i) p(x_i | y_{1:i-1})
• and make the following importance sampling identifications:
– Distribution from which we want to sample: \tilde p(x_i) = p(y_i | x_i) p(x_i | y_{1:i-1})
– Proposal distribution: q(x_i) = p(x_i | y_{1:i-1}) \approx \sum_{\ell=1}^{L} w_\ell^{i-1} p(x_i | x_\ell^{i-1})

Sequential importance sampling
• Weighted posterior samples arise as \{ \hat w_m^i, \hat x_m^i \} \sim q(\cdot)
• Normalization of the weights takes place as before:
  \hat r_m^i = \tilde p(\hat x_m^i) / q(\hat x_m^i) = p(y_i | \hat x_m^i) \, \hat w_m^i, \quad w_m^i = \hat r_m^i / \sum_{m'=1}^{M} \hat r_{m'}^i
• We are left with M weighted samples from the posterior up to observation i:
  \{ w_m^i, x_m^i \}_{m=1}^{M} \sim p(x_i | y_{1:i})

An alternative view
  \{ \hat w_m^i, \hat x_m^i \} \sim \sum_{\ell=1}^{L} w_\ell^{i-1} p(x_i | x_\ell^{i-1})
  p(x_i | y_{1:i-1}) \approx \sum_{m=1}^{M} \hat w_m^i \delta_{\hat x_m^i}
  p(x_i | y_{1:i}) \approx \sum_{m=1}^{M} p(y_i | \hat x_m^i) \, \hat w_m^i \delta_{\hat x_m^i}

[Figure: importance sampling from the posterior distribution, \{ w_m^i, x_m^i \}_{m=1}^{M} \sim p(x_i | y_{1:i})]

Sequential importance re-sampling
• Down-sample L particles and weights from the collection of M particles and weights:
  \{ w_\ell^i, x_\ell^i \}_{\ell=1}^{L} \approx \{ w_m^i, x_m^i \}_{m=1}^{M}
• This can be done via multinomial sampling or in a way that provably minimizes estimator variance [Fearnhead 04]

[Figure: down-sampling the particle set, \{ w_\ell^i, x_\ell^i \}_{\ell=1}^{L} \sim p(x_i | y_{1:i})]

Recap
• Starting with (weighted) samples from the posterior up to observation i-1
• Monte Carlo integration was used to form a mixture model representation of the posterior predictive distribution
• The posterior predictive distribution was used as a proposal distribution for importance sampling of the posterior up to observation i
• M > L samples were drawn and re-weighted according to the likelihood (the importance
weight), then the collection of particles was down-sampled to L weighted samples

LSSM: Not alone
• Various other models are amenable to sequential inference; Dirichlet process mixture modelling is one example, dynamic Bayes' nets are another

Rao-Blackwellization
• In models where parameters can be analytically marginalized out, or the particle state space can otherwise be collapsed, the efficiency of the particle filter can be improved by doing so

Stratified Sampling
• When sampling \{ w_n, x_n \}_{n=1}^{K} from a mixture density \sum_k \pi_k f_k(\cdot), the second algorithm below produces a more efficient Monte Carlo estimator
• Multinomial sampling:
– for n = 1:K, choose k according to \pi_k; sample x_n from f_k; set w_n equal to 1/K
• Stratified sampling:
– for n = 1:K, sample x_n from f_n; set w_n equal to \pi_n

Intuition: weighted particle set
• What is the difference between these two discrete distributions over the set {a, b, c}?
– (a), (a), (b), (b), (c)
– (.4, a), (.4, b), (.2, c)
• Weighted particle representations are equally or more efficient for the same number of particles

Optimal particle re-weighting [Fearnhead 2004]
• Next step: when down-sampling, pass all particles above a threshold through without modifying their weights, where c is the unique solution to
  \sum_{m=1}^{M} \min\{ c \, w_m^i, 1 \} = L
• Resample all remaining particles using stratified sampling and give the selected particles weight 1/c

Result is provably optimal [Fearnhead 2004]
• In the down-sampling step
  \{ w_\ell^i, x_\ell^i \}_{\ell=1}^{L} \approx \{ w_m^i, x_m^i \}_{m=1}^{M}
  imagine instead a "sparse" set of weights of which some are zero:
  \{ \tilde w_m^i, x_m^i \}_{m=1}^{M} \approx \{ w_m^i, x_m^i \}_{m=1}^{M}
• Then this down-sampling algorithm is optimal w.r.t.
  E\left[ \sum_{m=1}^{M} (\tilde w_m^i - w_m^i)^2 \right]

Problem Details
• Time and position are given in seconds and meters respectively
• Initial launch velocity and position are both unknown
• The maximum muzzle velocity of the projectile is 1000 m/s
• The measurement error in the Cartesian coordinate system is N(0, 10000) and N(0, 500) for x and y position respectively
• The measurement error in the polar coordinate system is N(0, .001) for \theta and Gamma(1, 100) for r
• The kill radius of the projectile is 100 m

Data and Support Code
http://www.gatsby.ucl.ac.uk/~fwood/pf_tutorial/

Laws of Motion
• In case you've forgotten:
  \mathbf{r} = (v_0 \cos\alpha) t \, \mathbf{i} + \left( (v_0 \sin\alpha) t - \tfrac{1}{2} g t^2 \right) \mathbf{j}
  where v_0 is the initial speed and \alpha is the initial angle
• Good luck!

Monte Carlo Integration
• Compute integrals for which analytical solutions are unknown:
  \int f(x) p(x) \, dx, \quad p(x) \approx \sum_{\ell=1}^{L} w_\ell \delta_{x_\ell}
• The integral is approximated as the weighted sum of function evaluations at L points:
  \int f(x) p(x) \, dx \approx \int f(x) \sum_{\ell=1}^{L} w_\ell \delta_{x_\ell} \, dx = \sum_{\ell=1}^{L} w_\ell f(x_\ell)

Sampling
• To use MC integration one must be able to sample from p(x):
  \{ w_\ell, x_\ell \}_{\ell=1}^{L} \sim p(\cdot), \quad \lim_{L \to \infty} \sum_{\ell=1}^{L} w_\ell \delta_{x_\ell} \to p(\cdot)

Theory (Convergence)
• Quality of the approximation is independent of the dimensionality of the integrand
• Convergence of the integral result to the "truth" is O(1/L^{1/2}), from the L.L.N.
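The O(1/L^{1/2}) convergence can be checked numerically. A toy sketch (the integrand E[x^2] = 1 under a standard normal is an illustrative choice):

```python
# Monte Carlo estimate of E[x^2] under N(0, 1) (true value 1). The error
# shrinks roughly like 1/sqrt(L), independent of the dimension of x.
import numpy as np

rng = np.random.default_rng(2)

def mc_estimate(L):
    x = rng.normal(size=L)
    return np.mean(x**2)

err_small = abs(mc_estimate(100) - 1.0)        # ~ sqrt(2/100)  in magnitude
err_big = abs(mc_estimate(1_000_000) - 1.0)    # ~ sqrt(2/1e6) in magnitude
```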
• Error is independent of the dimensionality of x

Bayesian Modelling
• Formal framework for the expression of modelling assumptions
• Two common definitions:
– using Bayes' rule
– marginalizing over models
  p(\theta | x) = \frac{p(x | \theta) p(\theta)}{p(x)} = \frac{p(x | \theta) p(\theta)}{\int p(x | \theta) p(\theta) \, d\theta}
  (posterior = likelihood × prior / evidence)

Posterior Estimation
• Often the distribution over latent random variables (parameters) is of interest
• Sometimes this is easy (conjugacy)
• Usually it is hard, because computing the evidence is intractable

Conjugacy Example
• x successes in N trials, \theta the probability of success:
  \theta \sim \text{Beta}(\alpha, \beta), \quad x | \theta \sim \text{Binomial}(N, \theta)
  p(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \quad p(x | \theta) = \binom{N}{x} \theta^x (1 - \theta)^{N - x}

Conjugacy Continued
  p(\theta | x) = \frac{p(x | \theta) p(\theta)}{\int p(x | \theta) p(\theta) \, d\theta} = \frac{1}{Z(x)} \theta^{\alpha - 1 + x} (1 - \theta)^{\beta - 1 + N - x}
  \theta | x \sim \text{Beta}(\alpha + x, \beta + N - x), \quad Z(x) = \frac{\Gamma(\alpha + x) \Gamma(\beta + N - x)}{\Gamma(\alpha + \beta + N)}

Non-Conjugate Models
• Easy to come up with examples:
  \sigma^2 \sim \mathcal{N}(0, \alpha), \quad x | \sigma^2 \sim \mathcal{N}(0, \sigma^2)

Posterior Inference
• Posterior averages are frequently important in problem domains
– posterior predictive distribution:
  p(x_{i+1} | x_{1:i}) = \int p(x_{i+1} | \theta, x_{1:i}) p(\theta | x_{1:i}) \, d\theta
– evidence (as seen) for model comparison, etc.
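The Beta-Binomial conjugate update above is a closed-form computation; a minimal sketch (function names are mine, not from the tutorial):

```python
# Beta-Binomial conjugacy from the slides:
# theta ~ Beta(alpha, beta), x | theta ~ Binomial(N, theta)
# => theta | x ~ Beta(alpha + x, beta + N - x).
from math import lgamma, exp

def beta_binomial_posterior(alpha, beta, N, x):
    """Parameters of the Beta posterior after x successes in N trials."""
    return alpha + x, beta + N - x

def beta_binomial_evidence(alpha, beta, N, x):
    # Evidence p(x) = C(N, x) * B(alpha + x, beta + N - x) / B(alpha, beta),
    # computed in log space for numerical stability.
    def log_beta(a, b):
        return lgamma(a) + lgamma(b) - lgamma(a + b)
    log_choose = lgamma(N + 1) - lgamma(x + 1) - lgamma(N - x + 1)
    return exp(log_choose + log_beta(alpha + x, beta + N - x) - log_beta(alpha, beta))

a_post, b_post = beta_binomial_posterior(1.0, 1.0, 10, 7)
```

With the uniform Beta(1, 1) prior the evidence is 1/(N + 1) for every x, a quick sanity check on the formula.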
Relating the Posterior to the Posterior Predictive
  p(x_i | y_{1:i}) = \int p(x_i, x_{1:i-1} | y_{1:i}) \, dx_{1:i-1}
  \propto \int p(y_i | x_i, x_{1:i-1}, y_{1:i-1}) \, p(x_i, x_{1:i-1} | y_{1:i-1}) \, dx_{1:i-1}
  \propto p(y_i | x_i) \int p(x_i | x_{1:i-1}, y_{1:i-1}) \, p(x_{1:i-1} | y_{1:i-1}) \, dx_{1:i-1}
  \propto p(y_i | x_i) \int p(x_i | x_{i-1}) \, p(x_{1:i-1} | y_{1:i-1}) \, dx_{1:i-1}
  \propto p(y_i | x_i) \int p(x_i | x_{i-1}) \, p(x_{i-1} | y_{1:i-1}) \, dx_{i-1}
  \propto p(y_i | x_i) \, p(x_i | y_{1:i-1})

Importance sampling
  E_x[f(x)] = \int p(x) f(x) \, dx = \int \frac{p(x)}{q(x)} f(x) q(x) \, dx \approx \frac{1}{L} \sum_{\ell=1}^{L} \frac{p(x_\ell)}{q(x_\ell)} f(x_\ell)
• Original distribution p: hard to sample from, easy to evaluate
• Proposal distribution q: easy to sample from; x_\ell \sim q(\cdot)
• Importance weights: r_\ell = p(x_\ell) / q(x_\ell)

Importance sampling un-normalized
distributions
  p(x) = \frac{\tilde p(x)}{Z_p}, \quad q(x) = \frac{\tilde q(x)}{Z_q}
• Un-normalized distribution \tilde p: still hard to sample from, easy to evaluate
• Un-normalized proposal distribution \tilde q: still easy to sample from; x_\ell \sim \tilde q(\cdot)
  E_x[f(x)] \approx \frac{1}{L} \sum_{\ell=1}^{L} \frac{p(x_\ell)}{q(x_\ell)} f(x_\ell) = \frac{Z_q}{Z_p} \frac{1}{L} \sum_{\ell=1}^{L} \frac{\tilde p(x_\ell)}{\tilde q(x_\ell)} f(x_\ell)
• New term: the ratio of normalizing constants Z_q / Z_p

Normalizing the importance weights
• Un-normalized importance weights: \tilde r_\ell = \tilde p(x_\ell) / \tilde q(x_\ell)
• A little algebra gives \frac{Z_q}{Z_p} \approx \frac{L}{\sum_{\ell=1}^{L} \tilde r_\ell}, so with normalized importance weights w_\ell = \tilde r_\ell / \sum_{\ell'=1}^{L} \tilde r_{\ell'}:
  E_x[f(x)] \approx \frac{Z_q}{Z_p} \frac{1}{L} \sum_{\ell=1}^{L} \frac{\tilde p(x_\ell)}{\tilde q(x_\ell)} f(x_\ell) \approx \sum_{\ell=1}^{L} w_\ell f(x_\ell)

Linear State Space Model (LSSM)
• Discrete time; first-order Markov chain:
  x_{t+1} = a x_t + \epsilon, \quad \epsilon \sim \mathcal{N}(\mu_\epsilon, \sigma_\epsilon^2)
  y_t = b x_t + \eta, \quad \eta \sim \mathcal{N}(\mu_\eta, \sigma_\eta^2)

Inferring the distributions of interest
• Many methods exist to infer these distributions:
– Markov chain Monte Carlo (MCMC)
– Variational inference
– Belief propagation
– etc.
• In this setting sequential inference is possible because of characteristics of the model structure, and preferable due to the problem requirements

Exploiting LSSM model structure for sequential inference
• Use Bayes' rule and the conditional independence structure dictated by the first-order Markov hidden variable model:
  p(x_i | y_{1:i}) = \int p(x_i, x_{1:i-1} | y_{1:i}) \, dx_{1:i-1}
  \propto \int p(y_i | x_i, x_{1:i-1}, y_{1:i-1}) \, p(x_i, x_{1:i-1} | y_{1:i-1}) \, dx_{1:i-1}
  \propto p(y_i | x_i) \int p(x_i | x_{1:i-1}, y_{1:i-1}) \, p(x_{1:i-1} | y_{1:i-1}) \, dx_{1:i-1}
  \propto p(y_i | x_i) \int p(x_i | x_{i-1}) \, p(x_{1:i-1} | y_{1:i-1}) \, dx_{1:i-1}
  \propto p(y_i | x_i) \int p(x_i | x_{i-1}) \, p(x_{i-1} | y_{1:i-1}) \, dx_{i-1}
  \propto p(y_i | x_i) \, p(x_i | y_{1:i-1})

Particle filtering
  p(x_i | y_{1:i}) \propto p(y_i | x_i) \int p(x_i | x_{i-1}) \, p(x_{i-1} | y_{1:i-1}) \, dx_{i-1}

An alternative view
  \{ \hat w_m^i, \hat x_m^i \} \sim \sum_{\ell=1}^{L} w_\ell^{i-1} p(x_i | x_\ell^{i-1})
  p(x_i | y_{1:i-1}) \approx \sum_{m=1}^{M} \hat w_m^i \delta_{\hat x_m^i}
  p(x_i | y_{1:i}) \approx \sum_{m=1}^{M} p(y_i | \hat x_m^i) \, \hat w_m^i \delta_{\hat x_m^i}

Sequential importance sampling inference
• Start with a discrete representation of the posterior up to observation i-1
• Use Monte Carlo integration to represent the posterior predictive distribution as a finite mixture model
• Use importance sampling with the posterior predictive distribution as the proposal distribution to sample the posterior distribution up to observation i

What?
• Posterior predictive distribution; state model; likelihood:
  p(x_i | y_{1:i}) \propto p(y_i | x_i) \int p(x_i | x_{i-1}) \, p(x_{i-1} | y_{1:i-1}) \, dx_{i-1} \propto p(y_i | x_i) \, p(x_i | y_{1:i-1})
• Start with a discrete representation of this distribution

Monte Carlo integration
• General setup:
  p(x) \approx \sum_{\ell=1}^{L} w_\ell \delta_{x_\ell}, \quad \lim_{L \to \infty} \sum_{\ell=1}^{L} w_\ell f(x_\ell) = \int f(x) p(x) \, dx
• As applied in this stage of the particle filter (samples from a finite mixture model):
  p(x_i | y_{1:i-1}) = \int p(x_i | x_{i-1}) \, p(x_{i-1} | y_{1:i-1}) \, dx_{i-1} \approx \sum_{\ell=1}^{L} w_\ell^{i-1} p(x_i | x_\ell^{i-1})
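Putting the pieces together, here is a minimal bootstrap (SIR) particle filter for the linear state space model above. The parameter values (a = 0.95, b = 1, zero-mean noise) and the multinomial resampling at every step are illustrative assumptions, not the tutorial's reference implementation.

```python
# Bootstrap SIR particle filter for the LSSM:
#   x_{t+1} = a x_t + eps,  eps ~ N(0, sig_eps^2)
#   y_t     = b x_t + eta,  eta ~ N(0, sig_eta^2)
# All parameter values below are hypothetical, chosen for illustration.
import numpy as np

rng = np.random.default_rng(3)

a, b = 0.95, 1.0
sig_eps, sig_eta = 0.5, 0.3
T, L = 50, 1000

# Simulate a trajectory and its noisy observations
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + rng.normal(0.0, sig_eps)
y = b * x_true + rng.normal(0.0, sig_eta, size=T)

# Filter: propagate through the state model, weight by the likelihood, resample
particles = rng.normal(0.0, 1.0, size=L)
means = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = a * particles + rng.normal(0.0, sig_eps, size=L)  # evolve
    logw = -0.5 * ((y[t] - b * particles) / sig_eta) ** 2             # log-likelihood
    w = np.exp(logw - logw.max())                                     # stabilize, then
    w /= w.sum()                                                      # normalize weights
    means[t] = np.sum(w * particles)                                  # E[x_t | y_{1:t}]
    particles = rng.choice(particles, size=L, p=w)                    # multinomial resample
```

Resampling every step keeps the effective sample size up at the cost of extra Monte Carlo noise; in practice one would resample only when \hat N_eff falls below a threshold, as the slides suggest.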