Sai Ravela
M.I.T.
Last Updated: Spring 2013
• Applied to sequential filtering problems
• Can also be applied to smoothing problems
• Solution via recursive Bayesian estimation
• Approximate solution
• Can work with non-Gaussian distributions / non-linear dynamics
• Applicable to many other problems, e.g., spatial inference
Notation:

• $x_t$, $X_k$: model states in continuous and discrete space-time, respectively.
• $x^T_t$: true system state.
• $y_t$, $Y_k$: continuous and discrete measurements, respectively.
• $X^n_k$: $n$-th sample of the discrete state vector at step $k$.
• $M$: model; $P$: probability mass function.
• $Q$: proposal distribution; $\delta$: Kronecker or Dirac delta function.

We follow Arulampalam et al.'s paper.

Topics: non-Gaussianity, sampling, SIS (sequential importance sampling), SIR (sampling importance resampling), kernel, RPF (regularized particle filter).
Recall: Ensemble Kalman filter & Smoother

[Figure: graphical model with observations $y_1, y_2$ and model states $x_0, x_1, x_2$.]

We are interested in studying the evolution of a system, observed through $y_t \in f(x^T_t)$, using a model with state $x_t$.
This means that (in discrete time, discretized space) we seek $P(X_k \mid Y_{1:k})$ at each step, which can be solved recursively:

$$P(X_k \mid Y_{1:k}) = \frac{P(X_k, Y_{1:k})}{P(Y_{1:k})}$$
Here $Y_{1:k}$ is the collection of variables $Y_1, \ldots, Y_k$. So:

$$P(X_k \mid Y_{1:k}) = \frac{P(X_k, Y_{1:k})}{P(Y_{1:k})}
= \frac{P(Y_k \mid X_k)\, P(X_k \mid Y_{1:k-1})\, P(Y_{1:k-1})}{P(Y_k \mid Y_{1:k-1})\, P(Y_{1:k-1})}
= \frac{P(Y_k \mid X_k)\, P(X_k \mid Y_{1:k-1})}{P(Y_k \mid Y_{1:k-1})}$$
$$P(X_k \mid Y_{1:k}) = \frac{\overbrace{P(Y_k \mid X_k)}^{2}\;\overbrace{\int_{X_{k-1}} P(X_k \mid X_{k-1})\, P(X_{k-1} \mid Y_{1:k-1})\, dX_{k-1}}^{1}}{\underbrace{\int_{X_k} P(Y_k \mid X_k) \int_{X_{k-1}} P(X_k \mid X_{k-1})\, P(X_{k-1} \mid Y_{1:k-1})\, dX_{k-1}\, dX_k}_{3}}$$

1. From the Chapman-Kolmogorov equation
2. The measurement model / observation equation
3. Normalization constant

When can this recursive master equation be solved?
Let's say

$$X_k = F_k X_{k-1} + V_k, \qquad Z_k = H_k X_k + \eta_k,$$

with $V_k \sim \mathcal{N}(\cdot, P_{k|k})$ and $\eta_k \sim \mathcal{N}(0, R)$.

Linear Gaussian → Kalman Filter
• Extended Kalman Filter, via linearization
• Ensemble Kalman filter:
  • No linearization
  • Gaussian assumption
  • Ensemble members are "particles" that move around in state space
  • They represent the moments of uncertainty
How may we relax the Gaussian assumption?

If $P(X_k \mid X_{k-1})$ and $P(Y_k \mid X_k)$ are non-Gaussian, how do we represent them, let alone perform the integrations in (2) & (3)?
Generically, a pmf/pdf can be defined as a weighted sum:

$$P(X) = \sum_{i=1}^{N} w_i\, \delta(X - X_i)$$

→ Recall the Sampling lecture
→ Recall the Response Surface Modeling lecture
[Figure: particles $X_1, X_2, \ldots, X_{10}$ with weights $w_1, w_2, \ldots, w_{10}$.]

Even so: while $P(X)$ can be evaluated, sampling from it may be difficult.
Suppose we wish to evaluate $\int f(x)\, P(x)\, dx$ (e.g., a moment calculation):

$$\int f(x)\, \frac{P(x)}{Q(x)}\, Q(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} f(x = X_i)\, w_i, \qquad X_i \sim Q(x), \quad w_i = \frac{P(x = X_i)}{Q(x = X_i)}$$
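As a minimal sketch (not from the lecture), this importance-sampling estimate might look like the following in Python; the target $P$, proposal $Q$, and integrand $f$ below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Illustrative choices: target P = N(1, 0.5^2), proposal Q = N(0, 2^2),
# f(x) = x^2, so the true value is E_P[x^2] = 1^2 + 0.5^2 = 1.25.
def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

X = rng.normal(0.0, 2.0, size=N)                       # X_i ~ Q(x)
w = gauss_pdf(X, 1.0, 0.5) / gauss_pdf(X, 0.0, 2.0)    # w_i = P(X_i) / Q(X_i)
estimate = np.sum((X**2) * w) / N                      # (1/N) sum_i f(X_i) w_i

print(estimate)   # close to 1.25
```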
So:

• Sample from $Q$ ≡ the proposal distribution
• Evaluate $P$ ≡ the density
• Apply the importance weight $w_i = \dfrac{P(X_i)}{Q(X_i)}$

Now let's consider densities known only up to normalization:

$$P(x) = \frac{\tilde{P}(x)}{\int \tilde{P}(x)\, dx} = \frac{\tilde{P}(x)}{Z_p}, \qquad Q(x) = \frac{\tilde{Q}(x)}{\int \tilde{Q}(x)\, dx} = \frac{\tilde{Q}(x)}{Z_q}$$
So:

$$\int f(x)\, P(x)\, dx \approx \frac{1}{N} \frac{Z_q}{Z_p} \sum_{i=1}^{N} f(x = X_i)\, \tilde{w}_i, \qquad \text{where } \tilde{w}_i = \frac{\tilde{P}(x = X_i)}{\tilde{Q}(x = X_i)}$$

These are un-normalized "mere potentials." It turns out that

$$N \frac{Z_p}{Z_q} \approx \sum_i \tilde{w}_i$$

$$\therefore \int f(x)\, P(x)\, dx \approx \frac{\sum_{i=1}^{N} f(x = X_i)\, \tilde{w}_i}{\sum_j \tilde{w}_j}$$
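When only such unnormalized potentials are available, a sketch of the self-normalized estimator (with illustrative $\tilde{P}$, $\tilde{Q}$, and $f$, continuing the example above):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Unnormalized "mere potentials" (illustrative): P_tilde ∝ N(1, 0.5^2), Q_tilde ∝ N(0, 2^2).
P_tilde = lambda x: np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)
Q_tilde = lambda x: np.exp(-0.5 * (x / 2.0) ** 2)

X = rng.normal(0.0, 2.0, size=N)            # samples from Q
w = P_tilde(X) / Q_tilde(X)                 # unnormalized weights w~_i
estimate = np.sum((X**2) * w) / np.sum(w)   # weighted sum; the weights handle the normalization

print(estimate)   # again close to 1.25
```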
$\sum_i f(X_i)\, \tilde{w}_i \big/ \sum_j \tilde{w}_j$ is just a "weighted sum," where a proposal distribution was used to get around the sampling difficulties and the importance weights manage all the normalization.

⇒ It is important to select a good proposal distribution: not one that focuses on only a small part of the state space, and ideally one better than an uninformative prior.
Application of importance sampling to Bayesian recursive estimation: the Particle Filter.

$$P(X) = \frac{\sum_i \tilde{w}_i\, \delta(X - X_i)}{\sum_j \tilde{w}_j} = \sum_i w_i\, \delta(X - X_i),$$

where $w_i$ is normalized.
Let's consider again

$$X_k = f(X_{k-1}) + V_k, \qquad Y_k = h(X_k) + \eta_k,$$

where the second equation is the relationship between the observation and the state (the measurement equation).

⇒ Additive noise, but this can be generalized.
Let's consider the joint distribution $P(X_{0:k} \mid Y_{1:k})$.

[Figure: graphical model with the initial condition (IC), states $X_0, X_1, \ldots, X_k$, and observations $Y_1, \ldots, Y_k$.]

We may factor this distribution using particles.
$$P(X_{0:k} \mid Y_{1:k}) = \sum_{i=1}^{N} w_i\, \delta(X_{0:k} - X^i_{0:k}), \qquad w_i \equiv \frac{P(X^i_{0:k} \mid Y_{1:k})}{Q(X^i_{0:k} \mid Y_{1:k})}$$

And let's factor $P(X_{0:k} \mid Y_{1:k})$ as

$$P(X_{0:k} \mid Y_{1:k}) = \frac{P(Y_k \mid X_{0:k}, Y_{1:k-1})\, P(X_{0:k} \mid Y_{1:k-1})}{P(Y_k \mid Y_{1:k-1})} = \frac{P(Y_k \mid X_k)\, P(X_k \mid X_{k-1})\, P(X_{0:k-1} \mid Y_{1:k-1})}{P(Y_k \mid Y_{1:k-1})}$$
Suppose we pick

$$Q(X_{0:k} \mid Y_{1:k}) = Q(X_k \mid X_{0:k-1}, Y_{1:k})\, Q(X_{0:k-1} \mid Y_{1:k-1}),$$

i.e., there is some kind of recursion on the proposal distribution.

Further, suppose we approximate

$$Q(X_k \mid X_{0:k-1}, Y_{1:k}) = Q(X_k \mid X_{k-1}, Y_k),$$

i.e., there is a Markov property.
Then we may have found an update equation for the weights:

$$\frac{P(X_{0:k} \mid Y_{1:k})}{Q(X_{0:k} \mid Y_{1:k})}
= \frac{P(Y_k \mid X_k)\, P(X_k \mid X_{k-1})\, P(X_{0:k-1} \mid Y_{1:k-1})}{P(Y_k \mid Y_{1:k-1})\, Q(X_k \mid X_{k-1}, Y_k)\, Q(X_{0:k-1} \mid Y_{1:k-1})}$$

$$w^i_k = \frac{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1})\, P(X^i_{0:k-1} \mid Y_{1:k-1})}{Q(X^i_k \mid X^i_{k-1}, Y_k)\, P(Y_k \mid Y_{1:k-1})\, Q(X^i_{0:k-1} \mid Y_{1:k-1})}
= \frac{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1})}{Q(X^i_k \mid X^i_{k-1}, Y_k)\, P(Y_k \mid Y_{1:k-1})}\, w^i_{k-1}
\propto \frac{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1})}{Q(X^i_k \mid X^i_{k-1}, Y_k)}\, w^i_{k-1}$$
In the filtering problem we only need $P(X_k \mid Y_{1:k})$:

$$w^i_k \propto w^i_{k-1}\, \frac{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1})}{Q(X^i_k \mid X^i_{k-1}, Y_k)}$$

so that

$$P(X_k \mid Y_{1:k}) = \sum_{i=1}^{N} w^i_k\, \delta(X_k - X^i_k),$$

where $X^i_k \sim Q(X_k \mid X^i_{k-1}, Y_k)$.

The method essentially draws particles from a proposal distribution and recursively updates their weights.

⇒ No Gaussian assumption
⇒ Neat
The SIS (sequential importance sampling) step:

Input: $\{X^i_{k-1}, w^i_{k-1}\}_{i=1}^{N}$, $Y_k$

for $i = 1 \ldots N$:
  Draw $X^i_k \sim Q(X_k \mid X^i_{k-1}, Y_k)$
  $w^i_k \propto w^i_{k-1}\, \dfrac{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1})}{Q(X^i_k \mid X^i_{k-1}, Y_k)}$
end
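A minimal sketch of this SIS step in Python, assuming user-supplied functions `sample_q`, `lik`, `trans`, and `prop_pdf` (all hypothetical names, standing in for $Q$, $P(Y_k \mid X_k)$, $P(X_k \mid X_{k-1})$, and the proposal density):

```python
import numpy as np

def sis_step(particles, weights, y, sample_q, lik, trans, prop_pdf):
    """One sequential-importance-sampling update.

    particles : (N, ...) array of X^i_{k-1}
    weights   : (N,) array of w^i_{k-1}
    y         : current observation Y_k
    sample_q  : draws X^i_k ~ Q(. | X^i_{k-1}, Y_k)
    lik       : evaluates P(Y_k | X^i_k)
    trans     : evaluates P(X^i_k | X^i_{k-1})
    prop_pdf  : evaluates Q(X^i_k | X^i_{k-1}, Y_k)
    """
    new_particles = sample_q(particles, y)
    w = weights * lik(y, new_particles) * trans(new_particles, particles) \
        / prop_pdf(new_particles, particles, y)
    w = w / np.sum(w)      # normalize so the weights sum to one
    return new_particles, w
```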
BUT:

[Figure: particles at $X_{k-1}$ pushed through the proposal $Q$ to $X_k$.]

Within a few steps one particle will have a non-negligible weight, while all the others have negligible weights!

$$N_{\text{eff}} = \frac{1}{\sum_{i=1}^{N} (w^i_k)^2}$$
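A one-line check of this quantity in Python, with illustrative weights:

```python
import numpy as np

w = np.array([0.70, 0.15, 0.05, 0.05, 0.03, 0.02])   # normalized weights (illustrative)
n_eff = 1.0 / np.sum(w**2)                           # N_eff = 1 / sum_i (w^i_k)^2
print(n_eff)   # ≈ 1.9, far below N = 6: degeneracy is setting in
```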
$N_{\text{eff}}$ ≡ effective sample size. When $N_{\text{eff}} \ll N$, degeneracy sets in.

Resampling is a way to address this problem.

Main idea: you can sample uniformly and set weights to obtain a representation, or you can sample the pdf to get particles and reset their weights; after resampling, the weights are reset.
[Figure: resampling via the cdf of the weights $w_1, \ldots, w_7$ over particles $X_1, \ldots, X_7$: more probable states are sampled more often ("many points"), and the resampled particles receive uniform weights.]
The resampling algorithm. Input: $\{X^i_k, w^i_k\}$

1. Construct the cdf: for $i = 2 : N$, $C_i \leftarrow C_{i-1} + w^i_k$ (sorted).
2. Seed $u_1 \sim U[0, N^{-1}]$.
3. for $j = 1 : N$:
   $u_j = u_1 + \frac{1}{N}(j - 1)$
   $i \leftarrow \text{find}(C_i \geq u_j)$
   $X^j_k = X^i_k$, $\quad w^j_k = \frac{1}{N}$
   Set parent of $j \rightarrow i$
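A sketch of this resampling step in Python (the function name and array layout are illustrative); it follows the cdf-based scheme above:

```python
import numpy as np

def systematic_resample(particles, weights, rng=None):
    """Resample particles in proportion to their weights; reset weights to 1/N."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(weights)
    cdf = np.cumsum(weights)                 # step 1: construct the cdf
    cdf[-1] = 1.0                            # guard against round-off
    u1 = rng.uniform(0.0, 1.0 / N)           # step 2: seed u_1 ~ U[0, 1/N]
    u = u1 + np.arange(N) / N                # step 3: u_j = u_1 + (j - 1)/N
    parents = np.searchsorted(cdf, u)        # i <- find(C_i >= u_j)
    new_particles = particles[parents]       # X^j_k = X^i_k
    new_weights = np.full(N, 1.0 / N)        # w^j_k = 1/N
    return new_particles, new_weights, parents
```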
So the resampling method can avoid degeneracy because it produces more samples at higher-probability points.

But sample impoverishment may result: too many samples too close together → impoverishment, or loss of diversity.

⇒ MCMC may be a way out
The generic particle-filter step with resampling:

Input: $\{X^i_{k-1}, w^i_{k-1}\}$, $Y_k$

for $i = 1 : N$:
  $X^i_k \sim Q(X_k \mid X^i_{k-1}, Y_k)$
  $w^i_k \leftarrow w^i_{k-1}\, \dfrac{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1})}{Q(X^i_k \mid X^i_{k-1}, Y_k)}$
end

$\eta = \sum_i w^i_k$, $\quad w^i_k \leftarrow w^i_k / \eta$

$$N_{\text{eff}} = \frac{1}{\sum_{i=1}^{N} (w^i_k)^2}$$

If $N_{\text{eff}} < N_T$: $\{X^i_k, w^i_k\} \leftarrow \text{Resample}\{X^i_k, w^i_k\}$
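Putting the pieces together, a sketch of one such step in Python, reusing the hypothetical `sis_step` and `systematic_resample` sketches above:

```python
import numpy as np

def pf_step(particles, weights, y, sample_q, lik, trans, prop_pdf,
            n_threshold, rng=None):
    """One particle-filter update; resample when N_eff drops below n_threshold (N_T)."""
    particles, weights = sis_step(particles, weights, y, sample_q, lik, trans, prop_pdf)
    n_eff = 1.0 / np.sum(weights**2)
    if n_eff < n_threshold:
        particles, weights, _ = systematic_resample(particles, weights, rng)
    return particles, weights
```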
What is the optimal $Q$ function? We try to minimize $\sum_{i=1}^{N} (w^{*i}_k)^2$. Then:

$$Q^*(X_k \mid X^i_{k-1}, Y_k) = P(X_k \mid X^i_{k-1}, Y_k) = \frac{P(Y_k \mid X_k, X^i_{k-1})\, P(X_k \mid X^i_{k-1})}{P(Y_k \mid X^i_{k-1})}$$

$$w^i_k \propto w^i_{k-1}\, \frac{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1})}{P(Y_k \mid X^i_k)\, P(X^i_k \mid X^i_{k-1}) \,/\, P(Y_k \mid X^i_{k-1})}
\propto w^i_{k-1} \int_{X_k} P(Y_k \mid X_k)\, P(X_k \mid X^i_{k-1})\, dX_k$$

Not easy to do!
Asymptotically, $Q \sim P(X_k \mid X^i_{k-1})$; a common choice is $Q \equiv P(X_k \mid X^i_{k-1})$, i.e., it is sometimes feasible to use a proposal built from the process noise.

Then:

$$w^i_k \propto w^i_{k-1}\, P(Y_k \mid X^i_k)$$

If resampling is done at every step:

$$w^i_k \propto P(Y_k \mid X^i_k) \qquad \left(w^i_{k-1} \propto \tfrac{1}{N}\right)$$
SIR - Sampling Importance Resampling

Input: $\{X^i_{k-1}, w^i_{k-1}\}$, $Y_k$

for $i = 1 : N$:
  $X^i_k \sim P(X_k \mid X^i_{k-1})$
  $w^i_k = P(Y_k \mid X^i_k)$
end

$\eta = \sum_i w^i_k$, $\quad w^i_k = w^i_k / \eta$

$\{X^i_k, w^i_k\} \leftarrow \text{Resample}[\{X^i_k, w^i_k\}]$
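A sketch of this SIR step in Python, assuming user-supplied functions `sample_transition` (draws from $P(X_k \mid X_{k-1})$) and `likelihood` (evaluates $P(Y_k \mid X_k)$); both names are illustrative, and the resampler reuses the earlier sketch:

```python
import numpy as np

def sir_step(particles, weights, y, sample_transition, likelihood, rng=None):
    """One SIR update: propagate through the model, weight by the likelihood, resample."""
    particles = sample_transition(particles, rng)   # X^i_k ~ P(X_k | X^i_{k-1})
    weights = likelihood(y, particles)              # w^i_k = P(Y_k | X^i_k)
    weights = weights / np.sum(weights)             # normalize by eta
    particles, weights, _ = systematic_resample(particles, weights, rng)
    return particles, weights
```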
Example (the benchmark nonlinear model of Arulampalam et al.):

$$X_k = \frac{X_{k-1}}{2} + \frac{25\, X_{k-1}}{1 + X^2_{k-1}} + 8 \cos(1.2\, k) + v_{k-1}$$

$$Y_k = \frac{X^2_k}{20} + \eta_k$$

$$\eta_k \sim \mathcal{N}(0, R), \qquad v_{k-1} \sim \mathcal{N}(0, Q_{k-1})$$
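A sketch of running a bootstrap (SIR-style) particle filter on this benchmark; the noise variances, particle count, and horizon below are illustrative choices, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 500, 50                 # number of particles, time steps (illustrative)
Q_var, R_var = 10.0, 1.0       # process / measurement noise variances (illustrative)

def f(x, k):
    """Deterministic part of the benchmark dynamics."""
    return x / 2.0 + 25.0 * x / (1.0 + x**2) + 8.0 * np.cos(1.2 * k)

# Simulate a true trajectory and observations.
x_true, y_obs = np.zeros(T), np.zeros(T)
for k in range(1, T):
    x_true[k] = f(x_true[k - 1], k) + rng.normal(0.0, np.sqrt(Q_var))
    y_obs[k] = x_true[k]**2 / 20.0 + rng.normal(0.0, np.sqrt(R_var))

# Bootstrap filter: propose from the transition, weight by the likelihood, resample.
particles = rng.normal(0.0, np.sqrt(Q_var), size=N)
estimates = np.zeros(T)
for k in range(1, T):
    particles = f(particles, k) + rng.normal(0.0, np.sqrt(Q_var), size=N)  # X^i_k ~ P(X_k | X^i_{k-1})
    logw = -0.5 * (y_obs[k] - particles**2 / 20.0)**2 / R_var              # log P(Y_k | X^i_k) + const
    w = np.exp(logw - logw.max())
    w = w / np.sum(w)
    estimates[k] = np.sum(w * particles)                                   # posterior-mean estimate
    particles = particles[rng.choice(N, size=N, p=w)]                      # resample

print("RMSE of filtered estimate:", np.sqrt(np.mean((estimates - x_true)**2)))
```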
12.S990 Quantifying Uncertainty
Fall 2012
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.