Sequential Monte Carlo

advertisement
Sequential Monte Carlo and
Particle Filtering
Frank Wood
Gatsby, November 2007
Importance Sampling
• Recall:
– Let’s say that we want to compute some expectation (integral)
Z
E p [f ] =
p(x)f (x)dx
and we remember from Monte Carlo integration theory that with
samples from p() we can approximate this integral thusly
E p [f ] ¼
L
X
1
f (x ` )
L
`= 1
What if p() is hard to sample from?
• One solution: use importance sampling
– Choose another easier to sample distribution
q() that is similar to p() and do the following:
Z
E p [f ] =
p(x)f (x)dx
Z
=
¼
p(x)
f (x)q(x)dx
q(x)
L
X
1
p(x ` )
f (x ` )
L
q(x ` )
`= 1
x ` » q(¢)
I.S. with distributions known only to
a proportionality
• Importance sampling using distributions only
known up to a proportionality is easy and
common, algebra yields
E p [f ] ¼
L
X
Zq 1
p~(x ` )
f (x ` )
Zp L
q~(x ` )
r~` =
XL
w` = P
p(
~ x` )
q~( x ` )
`= 1
¼
w` f (x ` )
`= 1
r~`
L
`= 1
r~`
A model requiring sampling
techniques
• Non-linear non-Gaussian first order
Markov model
Hidden and of
interest
xt ¡
1
xt
\\
xt + 1
yt ¡
1
yt
yt + 1
p(x 1:i ; y 1:i ) =
QN
i = 1 p(yi jx i )p(x i jx i ¡ 1 )
Filtering distribution hard to obtain
• Often the filtering distribution is of interest
p(x i jy 1:i ¡ 1 ) /
R
R
: : : p(x 1:i ; y 1:i )dx 1 : : : dx i ¡
• It may not be possible to compute these integrals
analytically, be easy to sample from this directly, nor
even to design a good proposal distribution for
importance sampling.
1
A solution: sequential Monte Carlo
• Sample from sequence of distributions that
“converge” to the distribution of interest
• This is a very general technique that can
be applied to a very large number of
models and in a wide variety of settings.
• Today: particle filtering for a first order
Markov model
Concrete example: target tracking
• A ballistic projectile has been launched in
our direction.
• We have orders to intercept the projectile
with a missile and thus need to infer the
projectiles current position given noisy
measurements.
Problem Schematic
(xt, yt)
rt
µt
(0,0)
Probabilistic approach
• Treat true trajectory as a sequence of
latent random variables
• Specify a model and do inference to
recover the position of the projectile at
time t
First order Markov model
Hidden true
trajectory
xt ¡
1
xt
xt + 1
yt ¡
1
yt
yt + 1
Noisy
measurements
p(x t + 1 jx t )
=
a(¢; µa ; x t )
p(yt jx t )
=
h(¢; µh ; x t )
Posterior expectations
• We want filtering distribution samples
Z
p(x i jy 1:i )
/
p(yi jx i )
p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
Write this down!
so that we can compute expectations
Z
E p [f ] =
f (x i )p(x i jy 1:i )dx i
1
Importance sampling
p
~(x i ) / p(yi jx i )p(x i jy 1:i ¡ 1 )
Distribution from which we want
samples
q(x i )
=
p(x i jy 1:i ¡ 1 )
Proposal distribution
• By identity the posterior predictive
distribution can be written as
Z
q(x i ) = p(x i jy 1:i ¡ 1 ) =
p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡
1
Basis of sequential recursion
• If we start with samples from
f
i¡ 1
i¡ 1 L
w` ; x ` g` = 1
» p(x i ¡ 1 jy 1:i ¡ 1 )
then we can write the proposal distribution as a
finite mixture model
XL
q(x i ) = p(x i jy 1:i ¡ 1 ) ¼
w`i ¡ 1 p(x i jx i` ¡ 1 )
`= 1
and draw samples accordingly
i
fw
^m
; x^im gM
m = 1 » q(¢)
Samples from the proposal
distribution
• We now have samples from the proposal
i
fw
^m
; x^im g » q(x i )
• And if we recall
p
~(x i ) = p(yi jx i )p(x i jy 1:i ¡ 1 )
Distribution from which we want
samples
q(x i )
=
p(x i jy 1:i ¡ 1 )
Proposal distribution
Updating the weights completes
importance sampling
i
r^m
=
p(
~ x^ im )
q( x^ im )
i
= p(yi j x^im ) w
^m
• We are left with M weighted samples from the
posterior up to observation i
i
wm
= P
i
r^m
M
m= 1
i
r^m
i
i
M
f wm ; x m gm = 1
» p(x i jy 1:i )
Intuition
• Particle filter name comes from physical
interpretation of samples
Start with samples representing the
hidden state
f w`i ¡ 1 ; x i` ¡ 1 gL`= 1 » p(x i ¡ 1 jy 1:i ¡ 1 )
(0,0)
Evolve them according to the state model
x^im
»
i¡ 1
p(¢jx ` )
i
i¡
w
^m
/ wm
1
(0,0)
Re-weight them by the likelihood
i
i
wm
/ w
^m
p(yi j x^im )
(0,0)
Results in samples one step
forward
i
f w`i ; x i` gL`= 1 ¼ f wm
; x im gM
m= 1
(0,0)
SIS Particle Filter
• The Sequential Importance Sampler (SIS) particle filter
multiplicatively updates weights at every iteration and
thus often most weights get very small
• Computationally this makes little sense as eventually
low-weighted particles do not contribute to any
expectations.
• A measure of particle degeneracy is “effective sample
size”
N^ef f = P
L
1
( w `i ) 2
`= 1
• When this quantity is small, particle degeneracy is
severe.
Solutions
• Sequential Importance Re-sampling (SIR)
particle filter avoids many of the problems
associated with SIS pf’ing by re-sampling
the posterior particle set to increase the
effective sample size.
• Choosing the best possible importance
density is also important because it
minimizes the variance of the weights
which maximizes N^ ef f
Other tricks to improve pf’ing
• Integrate out everything you can
• Replicate each particle some number of
times
• In discrete systems, enumerate instead of
sample
• Use fancy re-sampling schemes like
stratified sampling, etc.
Initial particles
f w`i ¡ 1 ; x i` ¡ 1 gL`= 1 » p(x i ¡ 1 jy 1:i ¡ 1 )
(0,0)
Particle evolution step
i
fw
^m
; x^im gM
m = 1 » p(x i jy 1:i ¡ 1 )
(0,0)
Weighting step
i
f wm
; x im gM
m = 1 » p(x i jy 1:i )
(0,0)
Resampling step
f w`i ; x i` gL`= 1 » p(x i jy 1:i )
(0,0)
Wrap-up: Pros vs. Cons
• Pros:
– Sometimes it is easier to build a “good”
particle filter sampler than an MCMC sampler
– No need to specify a convergence measure
• Cons:
– Really filtering not smoothing
• Issues
– Computational trade-off with MCMC
Thank You
Tricks and Variants
• Reduce the dimensionality of the integrand
through analytic integration
– Rao-Blackwellization
• Reduce the variance of the Monte Carlo
estimator through
– Maintaining a weighted particle set
– Stratified sampling
– Over-sampling
– Optimal re-sampling
Particle filtering
• Consists of two basic elements:
– Monte Carlo integration
XL
p(x) ¼
w` ±x `
`= 1
Z
XL
lim
L! 1
w` f (x ` ) =
`= 1
– Importance sampling
f (x)p(x)dx
PF’ing: Forming the posterior predictive
f
w`i ¡ 1 ; x i` ¡ 1 gL`= 1
» p(x i ¡ 1 jy1:i ¡ 1 )
Posterior up to observation
i¡ 1
Z
p(x i jy1:i ¡ 1 )
=
p(x i jx i ¡ 1 )p(x i ¡ 1 jy1:i ¡ 1 )dx i ¡
XL
¼
1
w`i ¡ 1 p(x i jx i` ¡ 1 )
`= 1
The proposal distribution for
importance sampling of the
posterior up to observation i
is this approximate posterior
predictive distribution
Sampling the posterior predictive
• Generating samples from the posterior
predictive distribution is the first place where we
can introduce variance reduction techniques
i
fw
^m
; x^im gM
m = 1 » p(x i jy 1:i ¡ 1 );
XL
p(x i jy 1:i ¡ 1 )
¼
`= 1
w`i ¡ 1 p(x i jx i` ¡ 1 )
• For instance sample from each mixture
component several twice such that M, the
number of samples drawn, is two times L, the
number of densities in the mixture model, and
assign weights
i
w
^m
w`i ¡ 1
=
2
Not the best
• Most efficient Monte Carlo estimator of a
function ¡(x)
– From survey sampling theory: Neyman
allocation
– Number drawn from each mixture density is
proportional to the weight of the mixture
density times the std. dev. of the function ¡
over the mixture density
• Take home: smarter sampling possible
[Cochran 1963]
Over-sampling from the posterior
predictive distribution
i
fw
^m
; x^im gM
m = 1 » p(x i jy 1:i ¡ 1 )
(0,0)
Importance sampling the posterior
• Recall that we want samples from
Z
p(x i jy 1:i )
/
p(yi jx i )
p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
1
• and make the following importance
Proposal distribution
sampling identifications
p~(x i ) = p(yi jx i )p(x i jy 1:i ¡ 1 )
Distribution from which we
want to sample
q(x i )
=
p(x i jy 1:i ¡ 1 )
XL
¼
`= 1
w`i ¡ 1 p(x i jx i` ¡ 1 )
Sequential importance sampling
• Weighted posterior samples arise as
i
fw
^m
; x^im g » q(¢)
• Normalization of the weights takes place as before
i
wm
= P
r^`i
L
r^`i
`= 1
i
r^m
=
p(
~ x^ im )
q( x^ im )
i
= p(yi j x^im ) w
^m
• We are left with M weighted samples from the posterior
up to observation i
i
f wm
; x im gM
m = 1 » p(x i jy 1:i )
An alternative view
XL
i
fw
^m
; x^ im g »
w`i ¡ 1 p(x i jx i` ¡ 1 )
`= 1
p(x i jy 1:i ¡ 1 ) ¼
p(x i jy 1:i ) ¼
P
P
M
m= 1
M
m= 1
i
w
^m
±x^ im
i
p(yi j x^im ) w
^m
±x^ im
Importance sampling from the
posterior distribution
i
f wm
; x im gM
m = 1 » p(x i jy 1:i )
(0,0)
Sequential importance re-sampling
• Down-sample L particles and weights from the collection
of M particles and weights
i
f w`i ; x i` gL`= 1 ¼ f wm
; x im gM
m= 1
this can be done via multinomial sampling or in a way
that provably minimizes estimator variance
[Fearnhead 04]
Down-sampling the particle set
f w`i ; x i` gL`= 1 » p(x i jy 1:i )
(0,0)
Recap
• Starting with (weighted) samples from the
posterior up to observation i-1
• Monte Carlo integration was used to form a
mixture model representation of the posterior
predictive distribution
• The posterior predictive distribution was used as
a proposal distribution for importance sampling
of the posterior up to observation i
• M > L samples were drawn and re-weighted
according to the likelihood (the importance
weight), then the collection of particles was
down-sampled to L weighted samples
LSSM Not alone
• Various other models are amenable to
sequential inference, Dirichlet process
mixture modelling is another example,
dynamic Bayes’ nets are another
Rao-Blackwellization
• In models where parameters can be
analytically marginalized out, or the
particle state space can otherwise be
collapsed, the efficiency of the particle
filter can be improved by doing so
Stratified Sampling
• Sampling from a mixture density using the algorithm on
the right produces a more efficient Monte Carlo estimator
f wn ; x n gKn= 1
X
»
¼k f k (¢)
k
• for n=1:K
– choose k according to ¼k
– sample xn from fk
– set wn equal to 1/N
• for n=1:K
– choose fn
– sample xn from fn
– set wn equal to ¼n
Intuition: weighted particle set
• What is the difference between these two
discrete distributions over the set {a,b,c}?
– (a), (a), (b), (b), (c)
– (.4, a), (.4, b), (.2, c)
• Weighted particle representations are
equally or more efficient for the same
number of particles
Optimal particle re-weighting
• Next step: when down-sampling, pass all
particles above a threshold c through
without modifying their weights where c is
the unique solution to
P
M
i
m = 1 minf cwm ; 1g =
L
• Resample all particles with weights below
c using stratified sampling and give the
selected particles weight 1/ c
Fearnhead 2004
Result is provably optimal
• In the down-sampling step
i
f w`i ; x i` gL`= 1 ¼ f wm
; x im gM
m= 1
• Imagine instead a “sparse” set of weights of which some
are zero
i
i
M
fw
~`i ; x i` gM
¼
f
w
;
x
g
m
m m= 1
`= 1
• Then this down-sampling algorithm is optimal w.r.t.
P
M
i
E
[(
w
~
w
m
m= 1
i 2
¡ wm
) ]
Fearnhead 2004
Problem Details
• Time and position are given in seconds and meters
respectively
• Initial launch velocity and position are both unknown
• The maximum muzzle velocity of the projectile is
1000m/s
• The measurement error in the Cartesian coordinate
system is N(0,10000) and N(0,500) for x and y position
respectively
• The measurement error in the polar coordinate system is
N(0,.001) for µ and Gamma(1,100) for r
• The kill radius of the projectile is 100m
Data and Support Code
http://www.gatsby.ucl.ac.uk/~fwood/pf_tutorial/
Laws of Motion
• In case you’ve forgotten:
r = (v0 cos(®))ti + ((v0 sin(®)t ¡
1 2
2 gt )j
• where v0 is the initial speed and ® is the
initial angle
Good luck!
Monte Carlo Integration
• Compute integrals for which analytical
solutions are unknown
Z
f (x)p(x)dx
XL
p(x) ¼
w` ±x `
`= 1
Monte Carlo Integration
• Integral approximated as the weighted
sum of function evaluations at L points
Z
Z
f (x)p(x)dx
¼
XL
f (x)
w` ±x ` dx
`= 1
XL
XL
p(x) ¼
w` ±x `
`= 1
=
w` f (x ` )
`= 1
Sampling
• To use MC integration one must be able to
sample from p(x)
f w` ; x ` gL`= 1
»
p(¢)
!
p(¢)
XL
lim
L! 1
w` ±x `
`= 1
Theory (Convergence)
• Quality of the approximation independent
of the dimensionality of the integrand
• Convergence of integral result to the
“truth” is O(1/n1/2) from L.L.N.’s.
• Error is independent of dimensionality of x
Bayesian Modelling
• Formal framework for the expression of
modelling assumptions
• Two common definitions:
– using Bayes’ rule
– marginalizing over models
Posterior
Prior
p(xjµ)p(µ)
p(xjµ)p(µ)
p(µjx) =
= R
p(x)
p(xjµ)p(µ)dµ
Likelihood
Evidence
Posterior Estimation
• Often the distribution over latent random
variables (parameters) is of interest
• Sometimes this is easy (conjugacy)
• Usually it is hard because computing the
evidence is intractable
Conjugacy Example
µ »
xjµ »
p(µ)
=
p(xjµ)
=
Bet a(®; ¯)
Binomial(N ; µ)
¡ (® + ¯) ®¡ 1
µ
(1 ¡ µ) ¯ ¡
¡ (®)¡ (¯)
µ ¶
N x
µ (1 ¡ µ) N ¡ x
x
1
x successes in N trials, µ probability of success
Conjugacy Continued
p(µjx)
=
=
=
µjx
Z (x)
»
=
p(xjµ)p(µ)
R
p(xjµ)p(µ)dµ
1
p(xjµ)p(µ)
Z (x)
1 ®¡ 1+ x
µ
(1 ¡ µ) ¯ ¡
Z (x)
1+ N ¡ x
Bet a(® + x; ¯ + N ¡ x)
µ
¶¡
¡ (® + ¯ + N )
¡ (® + x)¡ (¯ + N ¡ x)
1
Non-Conjugate Models
• Easy to come up with examples
¾2
»
N (0; ®)
xj¾2
»
N (0; ¾2 )
Posterior Inference
• Posterior averages are frequently
important in problem domains
– posterior predictive distribution
p(x i + 1jx 1:i ) =
R
p(x i + 1 jµ; x 1:i )p(µjx 1:i )dµ
– evidence (as seen) for model comparison,
etc.
Relating the Posterior to the
Posterior Predictive
Z
p(x i jy 1:i )
=
p(x i ; x 1:i ¡ 1 jy 1:i )dx 1:i ¡
1
Z
/
/
p(yi jx i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )p(x i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )dx 1:i ¡
Z
p(yi jx i ) p(x i jx 1:i ¡ 1 ; y 1:i )p(x 1:i ¡ 1 jy 1:i )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x 1:i ¡ 1 jy 1:i ¡ 1 )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡ 1
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
/
/
1
Relating the Posterior to the
Posterior Predictive
Z
p(x i jy 1:i )
=
p(x i ; x 1:i ¡ 1 jy 1:i )dx 1:i ¡
1
Z
/
/
p(yi jx i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )p(x i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )dx 1:i ¡
Z
p(yi jx i ) p(x i jx 1:i ¡ 1 ; y 1:i )p(x 1:i ¡ 1 jy 1:i )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x 1:i ¡ 1 jy 1:i ¡ 1 )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡ 1
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
/
/
1
Relating the Posterior to the
Posterior Predictive
Z
p(x i jy 1:i )
=
p(x i ; x 1:i ¡ 1 jy 1:i )dx 1:i ¡
1
Z
/
/
p(yi jx i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )p(x i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )dx 1:i ¡
Z
p(yi jx i ) p(x i jx 1:i ¡ 1 ; y 1:i )p(x 1:i ¡ 1 jy 1:i )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x 1:i ¡ 1 jy 1:i ¡ 1 )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡ 1
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
/
/
1
Relating the Posterior to the
Posterior Predictive
Z
p(x i jy 1:i )
=
p(x i ; x 1:i ¡ 1 jy 1:i )dx 1:i ¡
1
Z
/
/
p(yi jx i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )p(x i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )dx 1:i ¡
Z
p(yi jx i ) p(x i jx 1:i ¡ 1 ; y 1:i )p(x 1:i ¡ 1 jy 1:i )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x 1:i ¡ 1 jy 1:i ¡ 1 )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡ 1
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
/
/
1
Importance sampling
Z
E x [f (x)] =
p(x)f (x)dx
Z
Original
distribution:
hard to
sample from,
easy to
evaluate
=
¼
p(x)
f (x)q(x)dx
q(x)
L
X
1
p(x ` )
f (x ` )
L
q(x ` )
`= 1
Importance
weights
r` =
p( x ` )
q( x ` )
Proposal
distribution:
easy to
sample from
x ` » q(¢)
Importance sampling
un-normalized distributions
p(x) =
q(x) =
p(
~ x)
Zp
q~( x )
Zq
Un-normalized distribution to sample from, still hard
to sample from and easy to evaluate
Un-normalized proposal distribution:
still easy to sample from
x` » q
~(¢)
E x [f (x)] ¼
L
X
1
p(x ` )
f (x ` )
L
q(x ` )
`= 1
New term:
ratio of
normalizing
constants
¼
L
X
Zq 1
p~(x ` )
f (x ` )
Zp L
q~(x ` )
`= 1
Normalizing the importance weights
Un-normalized importance weights
r~` =
p(
~ x` )
q~( x ` )
Zq
Zp
¼P
Takes a little algebra
L
L
`= 1
r~`
w` = P
Normalized importance weights
E x [f (x)] ¼
L
X
Zq 1
p~(x ` )
f (x ` )
Zp L
q~(x ` )
`= 1
XL
¼
w` f (x ` )
`= 1
r~`
L
`= 1
r~`
Linear State Space Model (LSSM)
• Discrete time
• First-order Markov chain
xt + 1
=
ax t + ²; ² » N (¹ ² ; ¾²2 )
yt
=
bx t + ´ ; ´ » N (¹ ´ ; ¾´2 )
xt ¡
1
xt
xt + 1
yt ¡
1
yt
yt + 1
Inferring the distributions of interest
• Many methods exist to infer these distributions
–
–
–
–
Markov Chain Monte Carlo (MCMC)
Variational inference
Belief propagation
etc.
• In this setting sequential inference is possible
because of characteristics of the model structure
and preferable due to the problem requirements
Exploiting LSSM model structure…
Z
p(x i jy 1:i )
=
p(x i ; x 1:i ¡ 1 jy 1:i )dx 1:i ¡
1
Z
/
/
p(yi jx i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )p(x i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )dx 1:i ¡
Z
p(yi jx i ) p(x i jx 1:i ¡ 1 ; y 1:i )p(x 1:i ¡ 1 jy 1:i )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x 1:i ¡ 1 jy 1:i ¡ 1 )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡ 1
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
/
/
1
Particle filtering
p(x i jy 1:i ) / p(yi jx i )
R
p(x i jx i ¡ 1 )p(x i ¡ 1jy 1:i ¡ 1)dx i ¡
1
Exploiting Markov structure…
Z
p(x i jy 1:i )
=
/
/
p(xBayes’
)dxconditional
i ; x 1:i rule
¡ 1 jy
1:i the
1:i ¡ 1
Use
and
independence
structure dictated by the first
Z
order Markov hidden variable model
p(yi jx i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )p(x i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )dx 1:i ¡ 1
xt ¡
Z
1
p(yi jx i )
xt
xt + 1
p(x i jx 1:i ¡ 1 ; y 1:i )p(x 1:i ¡ 1 jy 1:i )dx 1:i ¡
Z
/
p(yi jx
yt ¡i )1
p(x i yjxt i ¡ 1 )p(x
jy 1:i ¡ 1 )dx 1:i ¡
yt +1:i
1 ¡ 1
Z
/
p(yi jx i )
p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
1
1
1
for sequential inference
Z
p(x i jy 1:i )
=
p(x i ; x 1:i ¡ 1 jy 1:i )dx 1:i ¡
1
Z
/
/
p(yi jx i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )p(x i ; x 1:i ¡ 1 ; y 1:i ¡ 1 )dx 1:i ¡
Z
p(yi jx i ) p(x i jx 1:i ¡ 1 ; y 1:i )p(x 1:i ¡ 1 jy 1:i )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x 1:i ¡ 1 jy 1:i ¡ 1 )dx 1:i ¡ 1
Z
p(yi jx i ) p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡ 1
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
/
/
1
An alternative view
XL
i
fw
^m
; x^ im g »
w`i ¡ 1 p(x i jx i` ¡ 1 )
`= 1
p(x i jy 1:i ¡ 1 ) ¼
p(x i jy 1:i ) ¼
P
P
M
m= 1
M
m= 1
i
w
^m
±x^ im
i
p(yi j x^im ) w
^m
±x^ im
Sequential importance sampling
inference
• Start with a discrete representation of the
posterior up to observation i-1
• Use Monte Carlo integration to represent
the posterior predictive distribution as a
finite mixture model
• Use importance sampling with the
posterior predictive distribution as the
proposal distribution to sample the
posterior distribution up to observation i
What?
Posterior predictive distribution
State model
Likelihood
Z
p(x i jy 1:i )
/
p(yi jx i )
p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡
/
p(yi jx i )p(x i jy 1:i ¡ 1 )
Start with a discrete
representation of this
distribution
1
Monte Carlo integration
General setup
p(x) ¼
Z
XL
XL
w` ±x `
`= 1
lim
L! 1
w` f (x ` ) =
f (x)p(x)dx
`= 1
As applied in this stage of the particle filter
Z
p(x i jy 1:i ¡ 1 ) =
p(x i jx i ¡ 1 )p(x i ¡ 1 jy 1:i ¡ 1 )dx i ¡
XL
¼
`= 1
w`i ¡ 1 p(x i jx i` ¡ 1 )
1
Samples from
Finite mixture model
Download