Statistical Techniques in Robotics (STR, S16) Lecture #15 (Monday, March 7) Lecturer: Byron Boots

1 Kalman Filtering

Gauss-Markov Model

Consider $X_1, X_2, \ldots, X_t, X_{t+1}$ to be the state variables and $Y_1, Y_2, \ldots, Y_t, Y_{t+1}$ the sequence of corresponding observations. As in hidden Markov models, the conditional independencies (see Figure 1) dictate that past and future states are independent given the current state $X_t$ at time $t$. For example, if we know what $X_2$ is, then no information about $X_1$ can help us reason about what $X_3$ should be. We are interested in the special case where the $X_i$ and $Y_i$ are jointly Gaussian distributed.

[Figure 1: The independence diagram of a Gauss-Markov model: a chain $X_1 \to X_2 \to \cdots \to X_t \to X_{t+1}$, with each state $X_i$ emitting an observation $Y_i$.]

The update for the state variable $X_{t+1}$ is given by
$$X_{t+1} = A X_t + \epsilon$$
where $X_0 \sim N(\mu_0, \Sigma_0)$ and $\epsilon \sim N(0, Q)$, so that $X_{t+1} \mid X_t \sim N(A X_t, Q)$.

The corresponding observation $Y_{t+1}$ is given by
$$Y_{t+1} = C X_{t+1} + \delta$$
where $\delta \sim N(0, R)$, so that $Y_{t+1} \mid X_{t+1} \sim N(C X_{t+1}, R)$.

Each component is defined as follows:
• $A_t$: $n \times n$ matrix that describes how the state evolves from time $t$ to $t+1$ in the absence of controls or noise.
• $C_t$: $k \times n$ matrix that describes how to map the state $X_t$ to an observation $Y_t$, where $k$ is the dimension of the observation.
• $\epsilon_t, \delta_t$: random variables representing the process and measurement noise, assumed to be independent of each other and normally distributed with covariances $Q_t$ ($n \times n$) and $R_t$ ($k \times k$), respectively.

We want to find $X_t \mid Y_{1,\ldots,t}$, so we need to calculate $\mu_{X_t}$ and $\Sigma_{X_t}$. In a real application we would first try to remove outliers to obtain a more stable result.

2 Lazy Gauss Markov Filter

Let us take the example of a velocity-controlled mobile robot equipped with a GPS sensor. The state stacks position and velocity:
$$X_{t+1} = \begin{bmatrix} \mathrm{pos} \\ \mathrm{vel} \end{bmatrix}_{t+1} = \begin{bmatrix} I & (\Delta t) I \\ 0 & I \end{bmatrix} \begin{bmatrix} \mathrm{pos} \\ \mathrm{vel} \end{bmatrix}_t + \epsilon = A X_t + \epsilon$$
$$Y_{t+1} = \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} \mathrm{pos} \\ \mathrm{vel} \end{bmatrix}_{t+1} + \delta = C X_{t+1} + \delta$$
where
$$\epsilon \sim N(0, Q) = N\!\left(0, \begin{bmatrix} \gamma^2 I & 0 \\ 0 & \sigma^2 I \end{bmatrix}\right), \qquad \delta \sim N(0, R) = N(0, \xi^2 I)$$

The lazy Gauss-Markov filter has two steps: prediction (motion update) and correction (observation update). We denote Gaussian parameters after the prediction step but before the correction step by a superscript $(\cdot)^-$.

2.1 Prediction Step (Motion Update)

For $X_t \mid Y_{1,\ldots,t} \sim N(\mu_t, \Sigma_t)$, applying the motion model gives
$$X_{t+1} \mid Y_{1,\ldots,t} \sim N(\mu^-_{t+1}, \Sigma^-_{t+1})$$
where
$$\mu^-_{t+1} = A \mu_t, \qquad \Sigma^-_{t+1} = A \Sigma_t A^T + Q$$

Proof.
Mean:
$$\mu^-_{t+1} = E[X_{t+1}] = E[A X_t + \epsilon] = E[A X_t] + E[\epsilon] = A E[X_t] = A \mu_t$$
(since the mean of $\epsilon$ is 0).
Variance:
$$\begin{aligned}
\Sigma^-_{t+1} &= E[(X_{t+1} - \mu^-_{t+1})(X_{t+1} - \mu^-_{t+1})^T] \\
&= E[(A X_t + \epsilon - A \mu_t)(A X_t + \epsilon - A \mu_t)^T] \\
&= E[(A X_t - A \mu_t)(A X_t - A \mu_t)^T] + \mathrm{Var}(\epsilon) \\
&= A\, E[(X_t - \mu_t)(X_t - \mu_t)^T]\, A^T + Q \\
&= A \Sigma_t A^T + Q
\end{aligned}$$

2.2 Correction Step (Observation Update)

For the observation model the natural parameterization is more convenient, since the update involves a product of Gaussian terms. So we first convert $X_{t+1} \mid Y_{1,\ldots,t} \sim N(\mu^-_{t+1}, \Sigma^-_{t+1})$ to the natural (information) form $X_{t+1} \mid Y_{1,\ldots,t} \sim \tilde{N}(J^-_{t+1}, P^-_{t+1})$, and then apply the observation update. Note that $P(X_{t+1} \mid Y_{1,\ldots,t+1}) \propto P(Y_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid Y_{1,\ldots,t})$ by the Markov assumption.
$$P(Y_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid Y_{1,\ldots,t}) \propto \exp\!\left( (J^-_{t+1})^T X_{t+1} - \tfrac{1}{2} X_{t+1}^T P^-_{t+1} X_{t+1} \right) \exp\!\left( -\tfrac{1}{2} (Y_{t+1} - C X_{t+1})^T R^{-1} (Y_{t+1} - C X_{t+1}) \right)$$
Expanding the likelihood factor,
$$\begin{aligned}
\exp\!\left( -\tfrac{1}{2} (Y_{t+1} - C X_{t+1})^T R^{-1} (Y_{t+1} - C X_{t+1}) \right)
&= \exp\!\left( -\tfrac{1}{2} \left( -2 Y_{t+1}^T R^{-1} C X_{t+1} + X_{t+1}^T C^T R^{-1} C X_{t+1} + Y_{t+1}^T R^{-1} Y_{t+1} \right) \right) \\
&\propto \exp\!\left( -\tfrac{1}{2} \left( -2 Y_{t+1}^T R^{-1} C X_{t+1} + X_{t+1}^T C^T R^{-1} C X_{t+1} \right) \right)
\end{aligned}$$
The term $Y_{t+1}^T R^{-1} Y_{t+1}$ is constant with respect to $X_{t+1}$, so it can be absorbed into the normalization constant. Collecting the linear and quadratic terms in $X_{t+1}$, the observation update is
$$J^+_{t+1} = J^-_{t+1} + C^T R^{-1} Y_{t+1}, \qquad P^+_{t+1} = P^-_{t+1} + C^T R^{-1} C$$
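To make the two-step structure concrete, here is a minimal numpy sketch of the lazy Gauss-Markov filter for the robot example above (not part of the original notes; the time step and noise variances are assumed values chosen for illustration). Prediction is done in the moment parameterization, correction in the natural parameterization, with an explicit conversion in between.

```python
# Lazy Gauss-Markov filter sketch: predict on (mu, Sigma), correct on (J, P).
import numpy as np

def predict(mu, Sigma, A, Q):
    """Motion update in the moment parameterization."""
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    return mu_pred, Sigma_pred

def correct(mu_pred, Sigma_pred, y, C, R):
    """Observation update in the natural parameterization."""
    # Convert to natural parameters: P = Sigma^{-1}, J = P mu.
    P = np.linalg.inv(Sigma_pred)
    J = P @ mu_pred
    # Information-form update: J+ = J- + C^T R^{-1} y, P+ = P- + C^T R^{-1} C.
    Rinv = np.linalg.inv(R)
    J_new = J + C.T @ Rinv @ y
    P_new = P + C.T @ Rinv @ C
    # Convert back to moment parameters for the next prediction step.
    Sigma_new = np.linalg.inv(P_new)
    mu_new = Sigma_new @ J_new
    return mu_new, Sigma_new

# Example: scalar position/velocity state with a GPS-like position measurement.
dt, gamma2, sigma2, xi2 = 0.1, 0.01, 0.05, 0.5    # assumed noise values
A = np.array([[1.0, dt], [0.0, 1.0]])             # block form [[I, dt*I], [0, I]]
C = np.array([[1.0, 0.0]])                        # block form [I, 0]
Q = np.diag([gamma2, sigma2])
R = np.array([[xi2]])

mu, Sigma = np.zeros(2), np.eye(2)
mu_pred, Sigma_pred = predict(mu, Sigma, A, Q)
mu, Sigma = correct(mu_pred, Sigma_pred, np.array([0.12]), C, R)
```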
2.3 KF and BLR

Note that this parameterization is directly related to Bayesian linear regression (BLR) when the following conditions hold:
• $X$ is equivalent to $\theta$ in BLR.
• $\delta$ is equivalent to the measurement noise on $Y$ in BLR.
• $\epsilon$ is zero, because $\theta$, the state in BLR, is the same for all $t$.
• $Q$ is therefore 0; it would be nonzero if the underlying parameters were allowed to change as a function of time.
• $A$ is the identity matrix.
• $Y$ is equivalent to $Y$ in BLR.
• $C$ is equivalent to $x_t^T$ in BLR, which may be different at every timestep.

In short, the KF-to-BLR correspondence is (BLR quantity on the left, KF quantity on the right):
$$\Theta^T = X, \qquad \sigma^2 = R, \qquad 0 = Q, \qquad I = A, \qquad Y = Y, \qquad X^T = C_t$$
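As a sanity check on this mapping, the following sketch (an assumed example, not part of the original notes) runs the information-form observation update of Section 2.2 with $A = I$, $Q = 0$, $C_t = x_t^T$, and $R = \sigma^2$, and recovers the Bayesian linear regression posterior mean for $\theta$.

```python
# BLR recovered from the information-filter observation update.
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 3, 0.1
theta_true = np.array([1.0, -2.0, 0.5])

# Prior theta ~ N(0, I) in natural parameters: P = Sigma0^{-1}, J = P mu0 = 0.
P = np.eye(n)
J = np.zeros(n)

for _ in range(200):
    x = rng.normal(size=n)                      # plays the role of C_t
    y = x @ theta_true + rng.normal(scale=np.sqrt(sigma2))
    # Prediction step is a no-op (A = I, Q = 0); observation update:
    J += x * y / sigma2                         # J+ = J- + C^T R^{-1} y
    P += np.outer(x, x) / sigma2                # P+ = P- + C^T R^{-1} C

theta_posterior_mean = np.linalg.solve(P, J)    # the BLR posterior mean (prior N(0, I))
print(theta_posterior_mean)                     # close to theta_true
```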
2.4 KF Performance

The lazy Gauss-Markov filter can be expressed in two forms:
• When expressed in terms of the moment parameters $\mu$ and $\Sigma$, it acts as the Kalman filter.
• When expressed in terms of the natural parameters $J$ and $P$, it acts as the information filter.

Kalman filters, as we will see, require only matrix multiplications, approximately $O(n^2)$ time, to perform the prediction step, yet require a matrix inversion, approximately $O(n^{2.8})$ time, to perform the observation update. Information filters are the exact opposite: they require a matrix inversion for the prediction step and only matrix multiplications for the observation update. As mentioned above, converting between the moment and natural parameterizations requires inverting the covariance matrix, so switching between the two can be costly. Depending on the ratio of motion model updates to observation model updates, one filter may be faster than the other.

3 The Kalman Filter

3.1 Introduction

The Kalman filter was invented as a technique for filtering and prediction in linear Gaussian systems. It implements belief computations (Bayesian filtering) for continuous state spaces and is not applicable to discrete or hybrid state spaces. Kalman filters represent beliefs in the moment parameterization, where the belief at time $t$ is represented by a mean $\mu_t$ and covariance $\Sigma_t$. Contrast this with the information filter, which represents beliefs in the natural parameterization. In addition to the Markov assumption, the following three properties ensure that all computed posteriors are Gaussian for any time $t$:

1. The state transition probability $p(x_t \mid u_t, x_{t-1})$ must be linear in its arguments with added Gaussian noise. This assumes linear dynamics, and a model of this form is called linear Gaussian. It is represented by the equation
$$x_{t+1} = A x_t + \epsilon, \qquad \epsilon \sim N(0, Q) \qquad (1)$$

2. The measurement probability $p(y_t \mid x_t)$ must also be linear in its arguments with added Gaussian noise. It is represented by the equation
$$y_{t+1} = C x_{t+1} + \delta, \qquad \delta \sim N(0, R) \qquad (2)$$

3. The initial belief must be normally distributed with some mean $\mu_0$ and covariance $\Sigma_0$.

Observation update in terms of the moment parameters $\mu$ and $\Sigma$: Above, we presented the lazy Gauss-Markov filter, which performed the motion step in the moment parameterization and the observation step in the natural parameterization. We now derive both steps in the moment parameterization, resulting in the Kalman filter.

Combining (1) and (2), the state and measurement are jointly normally distributed with mean vector $\mu$ and covariance matrix $\Sigma$:
$$\begin{bmatrix} X_t \\ Y_t \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix} \right)$$

The observation update (Gaussian conditioning) is:
$$\mu_{X \mid Y} = \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y)$$
$$\Sigma_{X \mid Y} = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}$$

We know that
$$\mu_Y = C \mu_X, \qquad \Sigma_{YY} = R + C \Sigma_{XX} C^T$$

Therefore, we only need $\Sigma_{XY}$ to evaluate the two conditioning equations above. By definition,
$$\Sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)^T] = E[(X - \mu_X)(Y - C \mu_X)^T]$$
However, $Y = C X + \delta$ with $\delta$ having zero mean and independent of $X$, so
$$\Sigma_{XY} = E[(X - \mu_X)(X - \mu_X)^T] C^T = \Sigma_{XX} C^T$$

Plugging these values into the conditional mean and covariance above results in
$$\mu_{X \mid Y} = \mu_X + \Sigma_{XX} C^T (R + C \Sigma_{XX} C^T)^{-1} (Y - C \mu_X)$$
$$\Sigma_{X \mid Y} = \Sigma_{XX} - \Sigma_{XX} C^T (R + C \Sigma_{XX} C^T)^{-1} C \Sigma_{XX}$$

3.2 The Kalman Filter Algorithm

The complete Kalman filter algorithm is outlined below. The input to the filter is the belief at the previous timestep, represented by the mean $\mu_t$ and covariance $\Sigma_t$.
$$\mu^-_{t+1} = A \mu_t$$
$$\Sigma^-_{t+1} = A \Sigma_t A^T + Q$$
$$\mu_{t+1} = \mu^-_{t+1} + \Sigma^-_{t+1} C^T (C \Sigma^-_{t+1} C^T + R)^{-1} (y_{t+1} - C \mu^-_{t+1})$$
$$\Sigma_{t+1} = \Sigma^-_{t+1} - \Sigma^-_{t+1} C^T (C \Sigma^-_{t+1} C^T + R)^{-1} C \Sigma^-_{t+1}$$

The first two lines calculate the predicted belief, represented by $\mu^-_{t+1}$ and $\Sigma^-_{t+1}$ at the new time $t+1$, without incorporating the measurement $y_{t+1}$: the mean is propagated through the deterministic part of the state transition function (1), and the covariance is propagated and inflated by the process noise. In the final two lines, the predicted belief is updated to the posterior belief by incorporating the measurement $y_{t+1}$. The quantity $\Sigma^-_{t+1} C^T (C \Sigma^-_{t+1} C^T + R)^{-1}$ is known as the Kalman gain; it determines the degree to which the measurement is incorporated into the new state estimate. The third line adjusts the predicted mean in proportion to the Kalman gain times the quantity $(y_{t+1} - C \mu^-_{t+1})$, known as the innovation.

3.3 Kalman Filter Illustration

[Figure 2: Kalman filtering example for 1D localization, panels (a)-(d).]

Figure 2 illustrates the Kalman filter algorithm for simple 1D localization. Suppose the robot moves in the horizontal direction. The prior over the robot's position is shown as the red line in (a). Polling its sensor gives the distribution shown by the green line in (a), and the blue line is the result of the final two lines of the Kalman filter algorithm above. In (b) the robot has moved to a new position, illustrated by the magenta line. It polls its sensor, which reports back the position given by the cyan line in (c), and the update from the algorithm gives the posterior shown by the yellow line in (d). The Kalman filter thus alternates the sensor update step (lines 3-4), which incorporates sensor data into the current belief, with the motion update step (lines 1-2), in which the belief is propagated forward according to the action taken by the robot.
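To close the section, here is a minimal numpy sketch of the algorithm in Section 3.2 (not from the original notes; the system matrices, noise levels, and measurements are assumed values for illustration), run on a 1D localization problem in the spirit of Figure 2, where the state is [position, velocity] and the sensor reports a noisy position.

```python
# Kalman filter sketch in the moment parameterization.
import numpy as np

def kalman_step(mu, Sigma, y, A, C, Q, R):
    # Prediction (motion update): lines 1-2 of the algorithm.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Correction (observation update): lines 3-4 of the algorithm.
    S = C @ Sigma_pred @ C.T + R                 # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    innovation = y - C @ mu_pred
    mu_new = mu_pred + K @ innovation
    Sigma_new = Sigma_pred - K @ C @ Sigma_pred
    return mu_new, Sigma_new

dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])            # constant-velocity motion model
C = np.array([[1.0, 0.0]])                       # sensor observes position only
Q = np.diag([0.01, 0.01])                        # assumed process noise
R = np.array([[0.25]])                           # assumed measurement noise

mu, Sigma = np.array([0.0, 1.0]), np.eye(2)      # prior: at 0, moving right
for y in [1.1, 1.9, 3.2, 3.9]:                   # noisy position readings
    mu, Sigma = kalman_step(mu, Sigma, np.array([y]), A, C, Q, R)
print(mu)                                        # estimated position and velocity
```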