Statistical Techniques in Robotics (STR, S16) Lecture#04 (Wednesday, January 27)

Lecturer: Byron Boots
Importance Sampling, Particle Filters

1 Introduction
Last time, we discussed the Monte Carlo Method. In this lecture we will first discuss Importance
Sampling and then we will discuss Particle Filters. In particular, we will explain how they work,
the good, the bad, and the ugly of Particle Filters. We will also talk about how to fix the bad.
1.1 Return to Robots
In the case of a robot, we would like to compute the expected value of a function f given the previous location x_{t-1} and the measurement z_t:

E_{p(x_t | z_t, x_{t-1})}[f(x_t)].    (1)
We will likely never know a closed form solution for this distribution. So, we pursue the previously described sample-based Monte Carlo Method. This would require samples drawn from the distribution

p(x_t | z_t, x_{t-1}),    (2)

which can be rewritten as

(1/Z) p(z_t | x_t) p(x_t | x_{t-1}).    (3)
The distribution is the product of the measurement model, p(z_t | x_t), and the robot motion model, p(x_t | x_{t-1}). It is not obvious how to sample from the product of these distributions. It is, however, straightforward to sample from the robot motion model, a classic example of which is shown in Fig. 1.
Figure 1: The motion model: Posterior distributions of the robot’s pose upon executing the motion
command illustrated by the dashed line. Isocontours are shown for the probability of the particle.
This plot has been projected into 2-D. The original density is three-dimensional, taking the robot’s
heading direction, θ, into account.
2 Importance sampling (IS)
Although we are unable to sample from the required distribution, p(x), we are able to evaluate p(x) at any desired x_i. Thus, we can sample from a known distribution q(x) and evaluate p(x) to correctly reweight the samples.
The trick: sample from the available distribution and reweight the samples to correct for the mismatch. This trick is referred to as Importance Sampling. Manipulating the expectation over p(x) yields the following approximation:
E_p[f(x)] = Σ_x p(x) f(x)    (4a)
          = Σ_x p(x) f(x) (q(x)/q(x))    (4b)
          = Σ_x q(x) (p(x)/q(x)) f(x)    (4c)
          = E_q[(p(x)/q(x)) f(x)]    (4d)
          ≈ (1/N) Σ_{i=1}^N (p(x_i)/q(x_i)) f(x_i).    (4e)
The ratio p(x)/q(x) is called the importance weight. Where q undersamples a region relative to p, the weight is large and those samples count more; where q oversamples, the weight is small and those samples count less.
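The estimator in (4e) is easy to try numerically. The sketch below assumes, purely for illustration, that the target p is a standard normal we can evaluate but pretend we cannot sample, and the proposal q is a wider normal; f(x) = x^2, whose expectation under p is 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target density p (standard normal): we can evaluate it, but pretend we
# cannot sample from it. Proposal q is a wider normal we CAN sample from.
def p(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

N = 100_000
xs = rng.normal(0.0, 2.0, size=N)   # samples x_i drawn from q
w = p(xs) / q_pdf(xs)               # importance weights p(x_i)/q(x_i)

# Estimate E_p[f(x)] for f(x) = x^2; the true value is 1.
f = xs**2
estimate = np.mean(w * f)
```

With a proposal wider than the target, the weights stay bounded and the estimate converges quickly; a proposal narrower than the target would make the weights blow up in the tails.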
2.1 Example
Figure 2: Distribution we wish to sample from, p, and distribution we can sample from, q.
Consider the above (somewhat contrived) pair of distributions. Our importance-sampling estimate converges to the desired expectation under p:

(1/N) Σ_{i=1}^N (p(x_i)/q(x_i)) f(x_i) →_{N→∞} E_p[f(x)].    (5)
2.2 Example
Applying importance sampling to the problem of Section 1.1, we can set q = p(x_t | x_{t-1}), the motion model, and p = p(x_t | z_t, x_{t-1}) = (1/Z) p(z_t | x_t) p(x_t | x_{t-1}), the posterior.

p = η p(z_t | x_t) p(x_t | x_{t-1})    (6)

x^i ∼ p(x_t | x_{t-1})    (7)

w^i = p/q = p(z_t | x_t^i)    (8)

X̄ = E[X] ≈ (1/N) Σ_{i=1}^N w^i X^i    (9)

But notice that we dropped the normalizer from Bayes' rule (we may not know Z)!
2.3 Normalized IS
Proposed fix: we can normalize the estimate using

(1/N) Σ_{i=1}^N w^i →_{N→∞} Σ_{x_t} p(z_t | x_t) p(x_t | x_{t-1}) = Z.    (10)

The normalized estimate is then

X̄ = Σ_i (w^i / Σ_j w^j) X^i.

If Σ_i w^i is low, we haven't drawn enough samples or the probability of the observation is low.
Algorithm 1 NIS Filter Algorithm
for all i do
    sample x_0^i from p(x_0); set w^i = 1
end for
for all t do
    for all i do
        sample x_t^i from p(x_t | x_{t-1}^i)
        w^i ← w^i · p(z_t | x_t^i)
    end for
    w̃^i = w^i / Σ_j w^j
end for
E[f(x)] = Σ_i w̃^i f(x^i)
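A minimal Python sketch of Algorithm 1. The 1-D random-walk motion model and Gaussian observation model below are hypothetical stand-ins, since the algorithm leaves the models abstract; the measurement sequence is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical stand-in models: 1-D random-walk motion, noisy position sensor.
def sample_motion(x_prev):
    return x_prev + rng.normal(0.0, 0.5, size=x_prev.shape)

def obs_likelihood(z, x):
    return gauss(x, z, 1.0)

N = 1000
x = rng.normal(0.0, 1.0, size=N)   # sample x_0^i from p(x_0)
w = np.ones(N)                     # initial weights w^i = 1

observations = [0.3, 0.5, 0.9]     # made-up measurements z_t
for z in observations:
    x = sample_motion(x)           # sample x_t^i from p(x_t | x_{t-1}^i)
    w *= obs_likelihood(z, x)      # w^i <- w^i * p(z_t | x_t^i)

w_tilde = w / w.sum()              # normalize
estimate = np.sum(w_tilde * x)     # E[f(x)] with f(x) = x
```

Note that the particles themselves are never touched after sampling; only the weights change. This is exactly the weakness the next section addresses.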
Figure 3: Resampling
3 SIR Filter (Particle filter, Condensation)

3.1 Problem - recursive importance sampling
Unfortunately, if we implement Importance Sampling recursively and keep the samples fixed, then
as time progresses most samples will become useless since they do not match the observations. The
weights then go towards either 0 (most particles) or 1 (the few particles that match the observation).
Essentially, we are using particles to keep track of parts of the state space that we know are not
likely. The true path will be a sample with vanishing probability, so we have to throw out low
probability samples and resample in high probability regions. This is known as a Particle or SIR
filter.
3.2 Solution - Particle Filter (Sampling with Importance Resampling)
The solution is to do NIS, but to resample particles instead of changing the weights. This way, we
do not waste particles on unlikely portions of the state space, but instead use our particles more
intelligently.
Given the previous set of particles χ_{t-1}, the control u_t, and the measurement z_t, we perform each of the following steps.
1. Apply the forward motion model to all particles. This means that, for each i, we sample x_t^i from p(x_t | x_{t-1}^i, u_t).
2. Update the weights for each particle, which means we set w̃^i = p(z_t | x_t^i) for each i.
3. Normalize the weights. Set w^i = w̃^i / Σ_j w̃^j for each i.
4. Perform the resampling step. We resample each particle according to the weights w^i. Resampling proceeds as shown in Fig. 3: we select the particles with replacement according to their weight. Intuitively, we can think of this as lining the particles up on a dartboard that is 1 unit long, with widths according to weight, and then throwing darts to select which ones to keep (possibly taking two or more copies of one particle). Having selected the particles, we set all their weights to 1/N, since otherwise we would be double counting the weights.
5. Reset each particle's weight to 1/N and return the set of particles.
This approach has a variety of names: SIR, Particle Filter, CONDENSATION, etc., and can be
applied to a variety of problems. All that a particle filter entails is sampling particles, reweighting
them according to a belief, normalizing the weights, and then resampling. Thus, while this lecture
primarily deals with localization, particle filters can be applied to a great deal of other problems.
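The steps above can be sketched as a single SIR update in Python. The 1-D motion and observation models here (additive control plus Gaussian noise, noisy position measurement) are hypothetical stand-ins, as are the control/measurement values.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

def particle_filter_step(particles, u, z, motion_noise=0.5, obs_noise=1.0):
    """One SIR update: motion, weight, normalize, resample, reset weights."""
    N = len(particles)
    # 1. Forward motion model: sample x_t^i ~ p(x_t | x_{t-1}^i, u_t).
    particles = particles + u + rng.normal(0.0, motion_noise, size=N)
    # 2. Weight by the observation model: w~^i = p(z_t | x_t^i).
    w = gauss(particles, z, obs_noise)
    # 3. Normalize the weights.
    w = w / w.sum()
    # 4. Resample with replacement according to the weights.
    idx = rng.choice(N, size=N, p=w)
    particles = particles[idx]
    # 5. The returned set is unweighted: every weight is implicitly 1/N.
    return particles

particles = rng.normal(0.0, 2.0, size=500)          # initial belief
for u, z in [(1.0, 1.1), (1.0, 2.0), (1.0, 3.2)]:   # made-up controls/measurements
    particles = particle_filter_step(particles, u, z)
```

After each call the particle set is unweighted, so the recursion never accumulates weight degeneracy the way plain NIS does.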
4 The Good
Particle filters have a number of advantages which have led to them being very popular in modern
robotics.
1. Since the particles approximate the posterior belief, we can answer any question we can pose
in terms of an expectation.
2. They work with any observation model and any motion model we can design.
3. Particle filters scale well; they are "embarrassingly parallel".
4. They work for high-dimensional systems, as the algorithm is independent of the size of the system.
5. The algorithm is relatively easy to implement.
5 The Bad

5.1 Lack of diversity
Over time, especially with uninformative sensor readings, samples tend to congregate. One metric
to benchmark a particle filter is how long it takes for particles to congregate. Take the following
example.
(a) Two rooms with equal probability. (b) The same filter, after repeated resampling.
Figure 4: A problem with assuming the independence of the observations and using naïve sampling. Without gaining any new information, the particle filter converges on one of the rooms.
Example. There are two rooms, and the robot is unsure about which room it is in. The sensor
readings show equal probability of being in either room. We start out with N particles equally
distributed between the two rooms. Over time, the particles will converge to one room, without
getting any new sensor data.
Why does this happen? Particle filters are non-deterministic. At the resampling step, one room
is likely to gain particles. Once this happens, at the next resampling step, that room is more likely
to gain particles. Eventually, all the particles converge to one state. Once a state loses particles, it
cannot regain them without motion.
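This collapse is easy to reproduce: even with perfectly uniform weights (i.e., no information at all), repeated multinomial resampling drives every particle into one room. A minimal simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 50
rooms = np.array([0] * 25 + [1] * 25)   # particles split evenly between rooms

steps = 0
# Resample with UNIFORM weights: no new information, yet diversity dies.
while 0 < rooms.sum() < N and steps < 5000:
    rooms = rooms[rng.choice(N, size=N)]   # equal-weight resampling
    steps += 1

# rooms.sum() is now 0 or N: every particle ended up in the same room.
```

The room counts perform an unbiased random walk with two absorbing states, so collapse is guaranteed; it typically takes on the order of N resampling steps.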
Practical Fixes:
1. Skip the resampling step if you detect you are not receiving new information. You might detect this by looking at how max({w^i})/min({w^i}) or the entropy of the weights changes.
2. Use low variance resampling (see Fig. 5) (AKA "Stochastic Universal Sampling" in the genetic algorithms domain). In one dimension (AKA the "dartboard" example), this method works as follows. A single random number r is drawn, and particles are selected every 1/N units, offset by r. This has the desirable property that any particle with normalized weight greater than 1/N is guaranteed to be drawn, which eliminates our "Lack of Diversity" problem.
Figure 5: Low Variance Resampling; in practice, this approach is preferred over the naïve approach.
3. Add more particles. This will increase the number of resampling steps the particles need to congregate to one state. Adding more particles usually helps with most problems.
4. Inject uniform samples. This is an unprincipled hack, and should be used as a last resort.
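The low-variance resampler from fix 2 can be sketched in a few lines. The particle values and weights below are made up for illustration.

```python
import numpy as np

def low_variance_resample(particles, weights, rng):
    """Systematic / low-variance resampling: draw ONE random offset r, then
    select a particle every 1/N units along the cumulative weights. Any
    particle with normalized weight > 1/N is guaranteed to survive."""
    N = len(particles)
    positions = (rng.random() + np.arange(N)) / N   # r, r + 1/N, r + 2/N, ...
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                            # guard against round-off
    idx = np.searchsorted(cumulative, positions)    # dart -> particle index
    return particles[idx]

rng = np.random.default_rng(0)
particles = np.array([10.0, 20.0, 30.0, 40.0])      # hypothetical particle set
weights = np.array([0.1, 0.1, 0.1, 0.7])            # normalized weights
resampled = low_variance_resample(particles, weights, rng)
```

With these weights, the particle at 40.0 (weight 0.7 > 1/4) must appear at least twice in the output, whereas naïve multinomial resampling could, with some probability, drop it entirely.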
5.2 Precise Observation Models may make the PF fail
Consider a very accurate sensor with an observation model similar to that in Fig. 6. If we sample uniformly along x_t (i.e., after we push particles through the motion model), it is unlikely that any of the samples will fall under the slim peak of the model. In fact, it may be more likely for a particle to fall under the second, wider peak.
Fixes: There are a number of fixes for this problem, including more principled approaches and
unprincipled and less effective hacks.
1. Sample from a better model than the motion model: the underlying problem is that the proposal distribution (motion model) does not match the distribution that we want to sample from (observation model). The best, and most principled, solution is to change the proposal distribution. For instance, if one can sample directly from the observation model, one should do so. This is generally known as a dual particle filter.
2. Rao-Blackwellization: Solve as much of the problem analytically to reduce the dimensionality of your resampling step.
3. Squash the observation model: take all of the probabilities to some power 1/p less than
1. When this is done, p observations count as 1 observation.
4. More particles: Sample more particles to ensure that the peak is sampled (although this is unsatisfactory, since it doesn't guarantee a particle will land under the desired peak).
Figure 6: A very good observation model. Counterintuitively, this reliability may pose problems for a vanilla implementation of a particle filter.
5. Pretend your observation model is worse: convolve the observation model with a Gaussian, and pretend that the resulting much less confident observation model is the actual model.
This is a last resort, as it throws away a lot of valuable data.
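Fix 3 (squashing) is worth a tiny numerical illustration. Below, a hypothetical narrow Gaussian observation model is evaluated at a particle that landed 0.5 away from the peak; the raw likelihood ratio effectively kills the particle, while the squashed ratio keeps it alive.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

# A very confident (narrow) observation model, sigma chosen for illustration.
sigma = 0.05
near = gauss(0.0, 0.0, sigma)   # particle right under the peak
far = gauss(0.5, 0.0, sigma)    # particle 0.5 away from the peak

p_power = 10.0                  # squash: p observations count as 1 observation
near_sq = near ** (1.0 / p_power)
far_sq = far ** (1.0 / p_power)

ratio_raw = far / near          # essentially zero: the far particle is "dead"
ratio_squashed = far_sq / near_sq   # far larger: the particle survives resampling
```

Squashing is equivalent to scaling the exponent of the Gaussian, i.e., inflating its variance by a factor of p, which is why it behaves like a deliberately less confident sensor.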
5.3 Measuring Particle Filter Performance is Difficult
There is no convenient way of relating accuracy to number of particles. Similarly, particle filters
offer no measure of confidence in their readings.
5.4 Particle Filters are Computationally Expensive
Despite being scalable (parallelizable), a good particle filter still requires a LOT of particles.
Fix: If your distribution is unimodal, it is a good idea to use a Kalman filter instead.
5.5 Particle Filters are Non-Deterministic
For the same input, a particle filter can produce different outputs, which makes them difficult to
predict and debug.
Fix: Use a fixed random seed, which will produce consistent results.
6 The Ugly (Common Misconceptions)
1. "PFs sample from the motion model and weight by the observation model" is wrong. You don't need to sample from the motion model, and in practice you often don't. Sometimes, for example, you might sample directly from the observation model.
2. "PFs are for localization only" is wrong. You can apply particle filters to any state estimation problem.
3. "PFs are anything that samples" is wrong. NIS uses samples, but does not resample. Another counter-example is a Gibbs filter.