Statistical Techniques in Robotics (STR, S16)    Lecture #04 (Wednesday, January 27)
Lecturer: Byron Boots

1 Importance Sampling, Particle Filters

Introduction

Last time, we discussed the Monte Carlo method. In this lecture we will first discuss importance sampling, and then particle filters. In particular, we will explain how they work, the good, the bad, and the ugly of particle filters, and how to fix the bad.

1.1 Return to Robots

In the case of a robot, we would like to compute the expected value of a function f of the state x_t, knowing the previous location x_{t-1} and the measurement z_t:

    E_{p(x_t | z_t, x_{t-1})}[f(x_t)]                                (1)

We will likely never know a closed-form solution for this distribution, so we pursue the previously described sample-based Monte Carlo method. This requires samples drawn from the distribution

    p(x_t | z_t, x_{t-1}),                                           (2)

which, by Bayes' rule, can be rewritten as

    (1/Z) p(z_t | x_t) p(x_t | x_{t-1}).                             (3)

The distribution is the product of the measurement model, p(z_t | x_t), and the robot motion model, p(x_t | x_{t-1}). It is not obvious how to sample from the product of these distributions. It is, however, straightforward to sample from the robot motion model alone, a classic example of which is shown in Fig. 1.

Figure 1: The motion model: posterior distributions of the robot's pose upon executing the motion command illustrated by the dashed line. Isocontours are shown for the probability of each pose. The plot has been projected into 2-D; the original density is three-dimensional, taking the robot's heading direction θ into account.

2 Importance Sampling (IS)

Although we are unable to sample from the required distribution p(x), we are able to evaluate p(x) at any desired x_i. Thus, we can sample from a known distribution q(x) and evaluate p(x) to correctly reweight the samples. The trick: sample from the available distribution and reweight the samples to correct for the mismatch. This trick is referred to as importance sampling.
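Sampling from a motion model is straightforward in practice. The following is a minimal sketch, assuming a simple 2-D "rotate, then translate" odometry model with Gaussian noise on the commands; the function name and noise parameters are illustrative assumptions, not the exact model behind Fig. 1.

```python
import math
import random

def sample_motion_model(x, y, theta, d, dtheta, sigma_d=0.1, sigma_theta=0.05):
    """Draw one sample from p(x_t | x_{t-1}, u_t) for a simple
    'rotate, then translate' odometry model with Gaussian command noise."""
    theta_new = theta + random.gauss(dtheta, sigma_theta)   # noisy rotation
    d_noisy = random.gauss(d, sigma_d)                      # noisy translation
    return (x + d_noisy * math.cos(theta_new),
            y + d_noisy * math.sin(theta_new),
            theta_new)

# Drawing many samples approximates the posterior over poses.
samples = [sample_motion_model(0.0, 0.0, 0.0, d=1.0, dtheta=0.5) for _ in range(1000)]
```

Plotting a cloud of such samples produces the characteristic banana-shaped density of Fig. 1, since heading noise and translation noise compound.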
Manipulating the expectation over p(x) yields the following approximation:

    E_p[f(x)] = Σ_x p(x) f(x)                                            (4a)
              = Σ_x q(x) (p(x)/q(x)) f(x)                                (4b)
              = E_q[(p(x)/q(x)) f(x)]                                    (4c)
              ≈ (1/N) Σ_{i=1}^N (p(x_i)/q(x_i)) f(x_i),   x_i ∼ q(x)     (4d)

The ratio p(x)/q(x) is called the importance weight. When q undersamples a region relative to p, the samples drawn there are weighted more strongly; likewise, oversampled regions are weighted more weakly.

2.1 Example

Figure 2: The distribution we wish to sample from, p, and the distribution we can sample from, q.

Consider the above (somewhat contrived) pair of distributions. Our estimate of the expectation under p becomes:

    (1/N) Σ_{i=1}^N (p(x_i)/q(x_i)) f(x_i)  →  E_p[f(x)]   as N → ∞      (5)

2.2 Example

Applying importance sampling to Section 1.1, we can set q = p(x_t | x_{t-1}), the motion model, and p = p(x_t | z_t, x_{t-1}) = (1/Z) p(z_t | x_t) p(x_t | x_{t-1}), the posterior:

    p = η p(z_t | x_t) p(x_t | x_{t-1})                                  (6)
    x^i ∼ p(x_t | x_{t-1})                                               (7)
    w^i = p/q = p(z_t | x_t)                                             (8)
    X̄ = E[X] ≈ (1/N) Σ_{i=1}^N w^i x^i                                   (9)

But notice that in (8) we dropped the normalizer η = 1/Z from Bayes' rule! This is a problem if we do not know Z.

2.3 Normalized IS

Proposed fix: we can normalize the estimate, using the fact that

    (1/N) Σ_{i=1}^N w^i  →  Σ_{x_t} p(z_t | x_t) p(x_t | x_{t-1}) = Z   as N → ∞   (10)

so that

    X̄ = [(1/N) Σ_i w^i x^i] / [(1/N) Σ_i w^i] = Σ_i w̃^i x^i,   where w̃^i = w^i / Σ_j w^j.

If Σ_i w^i is low, either we have not drawn enough samples or the probability of the observation is low.

Algorithm 1 NIS Filter
    for all i do
        sample x_0^i from p(x_0)
    end for
    for all t do
        for all i do
            sample x_t^i from p(x_t | x_{t-1}^i)
            w^i *= p(z_t | x_t^i)
        end for
        w̃^i = w^i / Σ_j w^j
    end for
    E[f(x)] ≈ Σ_i w̃^i f(x^i)

Figure 3: Resampling

3 SIR Filter (Particle Filter, Condensation)

3.1 Problem: recursive importance sampling

Unfortunately, if we implement importance sampling recursively and keep the samples fixed, then as time progresses most samples become useless, since they do not match the observations. The weights then go toward either 0 (most particles) or 1 (the few particles that match the observations).
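Before turning to the fix, here is a minimal sketch of the normalized importance sampler of Section 2.3 on a toy problem. As in Eq. (6), the target p is only available up to an unknown normalizing constant; the particular triangular target and uniform proposal here are illustrative assumptions.

```python
import random

def normalized_is(p, q_sample, q_pdf, f, n=100000):
    """Normalized importance sampling: estimate E_p[f] using samples from q.
    Only unnormalized evaluations of p are needed, because the unknown
    constant cancels when the weights are renormalized."""
    xs = [q_sample() for _ in range(n)]
    ws = [p(x) / q_pdf(x) for x in xs]        # importance weights p(x)/q(x)
    total = sum(ws)                            # normalizer, as in Eq. (10)
    return sum(w * f(x) for w, x in zip(ws, xs)) / total

# Toy example: unnormalized target p(x) ∝ x on [0, 1] (true density 2x),
# uniform proposal q on [0, 1]. The true value of E_p[x] is 2/3.
est = normalized_is(p=lambda x: x,             # unnormalized target
                    q_sample=random.random,    # proposal sampler
                    q_pdf=lambda x: 1.0,       # proposal density
                    f=lambda x: x)
```

Note that multiplying p by any constant leaves `est` unchanged, which is exactly why normalized IS works when Z is unknown.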
Essentially, we are then using particles to keep track of parts of the state space that we know are unlikely. Even the true path will eventually be a sample with vanishing probability, so we have to throw out low-probability samples and resample in high-probability regions. This is the idea behind the particle (SIR) filter.

3.2 Solution: Particle Filter (Sampling with Importance Resampling)

The solution is to run NIS, but to resample particles instead of merely updating the weights. This way, we do not waste particles on unlikely portions of the state space, but instead use our particles more intelligently. Given the previous set of particles χ_{t-1}, the control u_t, and the measurement z_t, we perform each of the following steps.

1. Apply the forward motion model to all particles: for each i, sample x_t^i from p(x_t | x_{t-1}^i, u_t).

2. Update the weight of each particle: set w̃^i = p(z_t | x_t^i) for each i.

3. Normalize the weights: set w^i = w̃^i / Σ_j w̃^j for each i.

4. Perform the resampling step: select N particles with replacement according to the weights w^i, as shown in Fig. 3. Intuitively, we can think of this as lining the particles up on a dartboard that is one unit long, with widths proportional to their weights, and then throwing darts to select which particles to keep (possibly taking multiple copies of one particle).

5. Reset each particle's weight to 1/N, since otherwise we would double-count the weights, and return the set of particles.

This approach has a variety of names: SIR, particle filter, CONDENSATION, etc., and can be applied to a variety of problems. All a particle filter entails is sampling particles, reweighting them according to a belief, normalizing the weights, and then resampling. Thus, while this lecture primarily deals with localization, particle filters can be applied to a great many other problems.
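The five steps above can be sketched as a single update function. This is a minimal 1-D sketch under assumed Gaussian motion and sensor models and the naive "dartboard" (multinomial) resampler; a real implementation would substitute the models of Section 1.1.

```python
import math
import random

def pf_update(particles, u, z, sigma_motion=0.5, sigma_sensor=1.0):
    """One SIR update in 1-D: motion, weighting, normalization, resampling."""
    # 1. Forward motion model: x_t ~ N(x_{t-1} + u, sigma_motion^2).
    moved = [x + u + random.gauss(0.0, sigma_motion) for x in particles]
    # 2. Weight each particle by the observation model, w^i = p(z | x^i)
    #    (assumed Gaussian; the constant factor cancels in step 3).
    w = [math.exp(-0.5 * ((z - x) / sigma_sensor) ** 2) for x in moved]
    # 3. Normalize the weights.
    total = sum(w)
    w = [wi / total for wi in w]
    # 4./5. Resample with replacement; the surviving particles then
    #       implicitly carry equal weight 1/N.
    return random.choices(moved, weights=w, k=len(moved))

# Track a target that moves +1 per step and is observed at z = 1, 2, ..., 10.
particles = [random.uniform(-10.0, 10.0) for _ in range(500)]
for step in range(10):
    particles = pf_update(particles, u=1.0, z=1.0 * (step + 1))
# The particle cloud should now be concentrated near the true position, 10.
```

Steps 4 and 5 collapse into the final `random.choices` call: sampling with replacement in proportion to the weights is exactly the dartboard picture above.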
4 The Good

Particle filters have a number of advantages that have made them very popular in modern robotics.

1. Since the particles approximate the posterior belief, we can answer any question we can pose in terms of an expectation.
2. They work for any observation model and any motion model we can design.
3. Particle filters scale well; they are "embarrassingly parallelizable".
4. They work for high-dimensional systems, as the algorithm itself is independent of the size of the system.
5. The algorithm is relatively easy to implement.

5 The Bad

5.1 Lack of diversity

Over time, especially with uninformative sensor readings, samples tend to congregate. One metric for benchmarking a particle filter is how long it takes for the particles to congregate. Take the following example.

Figure 4: A problem with assuming the independence of the observations and using naive sampling. (a) Two rooms with equal probability. (b) The same filter after repeated resampling. Without gaining any new information, the particle filter converges on one of the rooms.

Example. There are two rooms, and the robot is unsure which room it is in. The sensor readings show equal probability of being in either room. We start out with N particles equally distributed between the two rooms. Over time, the particles will converge to one room, without any new sensor data. Why does this happen? Particle filters are non-deterministic: at the resampling step, one room is likely to gain particles. Once this happens, at the next resampling step that room is even more likely to gain particles. Eventually, all the particles converge to one state. Once a state loses its particles, it cannot regain them without motion.

Practical fixes:

1. Skip the resampling step if you detect that you are not receiving new information. You might detect this by looking at how max({w^i})/min({w^i}) or the entropy of the weights changes.
2. Use low-variance resampling (see Fig. 5), also known as the "stochastic universal sampler" in the genetic algorithms literature.
In one dimension (the "dartboard" picture again), this method works as follows: a single random number r is drawn, and particles are then selected at every 1/N units, offset by r. This has the desirable property that any particle with normalized weight greater than 1/N is guaranteed to be drawn, which eliminates our lack-of-diversity problem.

Figure 5: Low-variance resampling; in practice, this approach is preferred over the naive approach.

3. Add more particles. This increases the number of resampling steps needed for the particles to congregate to one state. Adding more particles helps with most problems.
4. Inject uniform samples. This is an unprincipled hack, and should be used as a last resort.

5.2 Precise observation models may make the PF fail

Consider a very accurate sensor with an observation model similar to that in Fig. 6. If the particles are spread out along x_t (i.e., after we push them through the motion model), it is unlikely that any of the samples will fall under the slim peak of the model. In fact, it may be more likely for a particle to fall under the second, wider peak.

Fixes: there are a number of fixes for this problem, ranging from principled approaches to unprincipled and less effective hacks.

1. Sample from a better model than the motion model. The underlying problem is that the proposal distribution (the motion model) does not match the distribution that we want to sample from. The best, and most principled, solution is to change the proposal distribution: if one can sample directly from the observation model, one should do so. This is generally known as a dual particle filter.
2. Rao-Blackwellization: solve as much of the problem analytically as possible, to reduce the dimensionality of the resampling step.
3. Squash the observation model: raise all of the probabilities to some power 1/p less than 1. When this is done, p observations count as one observation.
4.
More particles: sample more particles to increase the chance that the peak is sampled (although this is unsatisfactory, since it still does not guarantee that a particle will land under the desired peak).

Figure 6: A very good observation model. Counterintuitively, this reliability may pose problems for a vanilla implementation of a particle filter.

5. Pretend your observation model is worse than it is: convolve the observation model with a Gaussian, and pretend that the resulting, much less confident observation model is the actual model. This is a last resort, as it throws away a lot of valuable information.

5.3 Measuring particle filter performance is difficult

There is no convenient way to relate accuracy to the number of particles. Similarly, particle filters offer no measure of confidence in their estimates.

5.4 Particle filters are computationally expensive

Despite being scalable (parallelizable), a good particle filter still requires a LOT of particles. Fix: if your distribution is unimodal, it is a good idea to use a Kalman filter instead.

5.5 Particle filters are non-deterministic

For the same input, a particle filter can produce different outputs, which makes particle filters difficult to predict and debug. Fix: use a fixed random seed, which will produce consistent results.

6 The Ugly (Common Misconceptions)

1. "Particle filters sample from the motion model and weight by the observation model" is wrong as a definition. You do not need to sample from the motion model, and in practice you often don't; sometimes, for example, you might sample directly from the observation model.
2. "PFs are for localization only" is wrong. You can apply particle filters to any state estimation problem.
3. "PFs are anything that samples" is wrong. NIS uses samples, but does not resample. Another counter-example is a Gibbs sampler.
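Finally, for reference, the low-variance (systematic) resampler of Fig. 5 in Section 5.1 can be sketched as follows; this is a minimal sketch assuming the weights have already been normalized.

```python
import random

def low_variance_resample(particles, weights):
    """Systematic resampling: draw a single offset r in [0, 1/N) and select
    the particle whose cumulative-weight interval contains each of the N
    equally spaced points r + i/N. Any particle with normalized weight
    greater than 1/N is guaranteed to be selected at least once."""
    n = len(particles)
    r = random.uniform(0.0, 1.0 / n)
    out = []
    c = weights[0]          # running cumulative weight
    j = 0
    for i in range(n):
        u = r + i / n       # the i-th equally spaced "dart"
        while u > c and j < n - 1:   # guard j against float round-off at the end
            j += 1
            c += weights[j]
        out.append(particles[j])
    return out

# Example: particle 0 carries half the total weight, so with N = 10 it
# receives about 0.5 * 10 = 5 copies, and is certain to survive.
particles = list(range(10))
weights = [0.5] + [0.5 / 9] * 9
res = low_variance_resample(particles, weights)
```

Compared with naive multinomial resampling, only one random number is drawn per resampling step, and a particle with normalized weight w receives either ⌊wN⌋ or ⌈wN⌉ copies, which is why uniform weights leave the particle set unchanged instead of slowly collapsing as in the two-room example.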