C280, Computer Vision
Prof. Trevor Darrell, trevor@eecs.berkeley.edu
Lecture 19: Tracking

Tracking scenarios
• Follow a point
• Follow a template
• Follow a changing template
• Follow all the elements of a moving person, and fit a model to them.

Things to consider in tracking
• What are the dynamics of the thing being tracked?
• How is it observed?

Three main issues in tracking

Simplifying Assumptions

Kalman filter graphical model and corresponding factorized joint probability
[Graphical model: hidden states x_1 → x_2 → x_3, each emitting an observation y_1, y_2, y_3]
P(x_1, x_2, x_3, y_1, y_2, y_3) = P(x_1) P(y_1 | x_1) P(x_2 | x_1) P(y_2 | x_2) P(x_3 | x_2) P(y_3 | x_3)

Tracking as induction
• Make a measurement starting in the 0th frame.
• Then: assume you have an estimate at the ith frame, after the measurement step.
• Show that you can do prediction for the (i+1)th frame, and measurement for the (i+1)th frame.

Base case: form an estimate for the 0th frame from the prior and the first measurement.

Prediction step: given the corrected estimate at frame i-1, predict the state at frame i.

Update step: given the prediction for frame i and the new measurement y_i, correct the estimate.

The Kalman Filter
• Key ideas:
  – Linear models interact uniquely well with Gaussian noise: make the prior Gaussian, and everything else stays Gaussian, so the calculations are easy.
  – Gaussians are really easy to represent: once you know the mean and covariance, you're done.

Recall the three main issues in tracking (ignore data association for now).

The Kalman Filter
[figure from http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html]

The Kalman Filter in 1D
• Dynamic model: x_i \sim N(d_i x_{i-1}, \sigma_{d_i}^2), y_i \sim N(m_i x_i, \sigma_{m_i}^2)
• Notation: predicted mean \bar{x}_i^-, corrected mean \bar{x}_i^+

Prediction for the 1D Kalman filter
• The new state is obtained by
  – multiplying the old state by a known constant
  – adding zero-mean noise
• Therefore, the predicted mean for the new state is the constant times the mean of the old state: \bar{x}_i^- = d_i \bar{x}_{i-1}^+
• The old state is a normal random variable, so
  – its variance is multiplied by the square of the constant
  – and the variance of the noise is added: (\sigma_i^-)^2 = \sigma_{d_i}^2 + (d_i \sigma_{i-1}^+)^2

Measurement update for the 1D Kalman filter
  \bar{x}_i^+ = \frac{\bar{x}_i^- \sigma_{m_i}^2 + m_i y_i (\sigma_i^-)^2}{\sigma_{m_i}^2 + m_i^2 (\sigma_i^-)^2},  (\sigma_i^+)^2 = \frac{\sigma_{m_i}^2 (\sigma_i^-)^2}{\sigma_{m_i}^2 + m_i^2 (\sigma_i^-)^2}
Notice:
  – if the measurement noise is small, we rely mainly on the measurement;
  – if it is large, mainly on the prediction;
  – \sigma does not depend on y.

Kalman filter for computing an on-line average
• What Kalman filter parameters and initial conditions should we pick so that the optimal estimate for x at each iteration is just the average of all the observations seen so far?

Kalman filter model: d_i = 1, m_i = 1, \sigma_{d_i} = 0, \sigma_{m_i} = 1
Initial conditions: \bar{x}_0^- = 0, \sigma_0^- = \infty

Iteration | \bar{x}_i^-       | \bar{x}_i^+           | \sigma_i^- | \sigma_i^+
0         | 0                 | y_0                   | \infty     | 1
1         | y_0               | (y_0 + y_1)/2         | 1          | 1/2
2         | (y_0 + y_1)/2     | (y_0 + y_1 + y_2)/3   | 1/2        | 1/3

What happens if the x dynamics are given a non-zero variance? (A small numerical sketch below checks both cases; the resulting table follows.)
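To make the on-line averaging example concrete, here is a minimal 1-D Kalman filter sketch in Python. The function name, the "effectively infinite" prior variance, and the sample measurements are illustrative assumptions, not values from the lecture. With var_d = 0 it reproduces the running-average table above; with var_d = 1 it reproduces the non-zero-variance table below.

```python
# A minimal 1-D Kalman filter sketch (hypothetical helper, not from the slides)
# illustrating the on-line averaging question: with d = m = 1, var_d = 0,
# var_m = 1 and an (effectively) infinite prior variance, the corrected
# estimate after each measurement is exactly the running mean of y_0..y_i.

def kalman_1d(ys, d=1.0, m=1.0, var_d=0.0, var_m=1.0,
              x0=0.0, var0=1e12):          # var0 stands in for an infinite prior variance
    x, var = x0, var0
    history = []
    for y in ys:
        # Prediction: scale the old state and add process noise.
        x_pred = d * x
        var_pred = var_d + (d ** 2) * var
        # Correction: blend prediction and measurement according to their variances.
        gain = m * var_pred / (var_m + m ** 2 * var_pred)
        x = x_pred + gain * (y - m * x_pred)
        var = var_m * var_pred / (var_m + m ** 2 * var_pred)
        history.append((x_pred, x, var_pred, var))
    return history

ys = [2.0, 4.0, 9.0]                        # made-up measurements
for i, (xp, xc, vp, vc) in enumerate(kalman_1d(ys)):
    print(i, round(xc, 3), round(vc, 3))    # corrected means 2, 3, 5: the running averages
# With var_d=1.0 instead, the corrected means become (y0 + 2*y1)/3 and
# (y0 + 2*y1 + 5*y2)/8, matching the table on the next slide.
```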
Kalman filter model: d_i = 1, m_i = 1, \sigma_{d_i} = 1, \sigma_{m_i} = 1
Initial conditions: \bar{x}_0^- = 0, \sigma_0^- = \infty

Iteration | \bar{x}_i^-       | \bar{x}_i^+              | \sigma_i^- | \sigma_i^+
0         | 0                 | y_0                      | \infty     | 1
1         | y_0               | (y_0 + 2y_1)/3           | 2          | 2/3
2         | (y_0 + 2y_1)/3    | (y_0 + 2y_1 + 5y_2)/8    | 5/3        | 5/8

Linear dynamic models
• A linear dynamic model has the form
  x_i \sim N(D_{i-1} x_{i-1}, \Sigma_{d_i}),  y_i \sim N(M_i x_i, \Sigma_{m_i})
• This is much, much more general than it looks, and extremely powerful.

Examples of linear state-space models
• Drifting points
  – Assume that the new position of the point is the old one, plus noise: D = Identity.
[figures: cic.nist.gov/lipman/sciviz/images/random3.gif, http://www.grunch.net/synergetics/images/random3.jpg]

Constant velocity
• We have
  u_i = u_{i-1} + \Delta t \, v_{i-1} + \varepsilon_i
  v_i = v_{i-1} + \zeta_i
  – (the Greek letters denote noise terms)
• Stack (u, v) into a single state vector:
  [u; v]_i = [[1, \Delta t], [0, 1]] [u; v]_{i-1} + noise
  – which is the form we had above, x_i = D_{i-1} x_{i-1} + noise.
[Plot: constant velocity model, showing the position and velocity components of the state and the position measurements against time]

Constant acceleration
• We have
  u_i = u_{i-1} + \Delta t \, v_{i-1} + \varepsilon_i
  v_i = v_{i-1} + \Delta t \, a_{i-1} + \zeta_i
  a_i = a_{i-1} + \xi_i
  – (the Greek letters denote noise terms)
• Stack (u, v, a) into a single state vector:
  [u; v; a]_i = [[1, \Delta t, 0], [0, 1, \Delta t], [0, 0, 1]] [u; v; a]_{i-1} + noise
  – which is again the form we had above.
[Plot: constant acceleration model, showing position and velocity of the state against time]

Periodic motion
• Assume we have a point moving on a line with a periodic movement defined by a differential equation (simple harmonic motion):
  dp/dt = v,  dv/dt = -p,  i.e.  du/dt = S u with S = [[0, 1], [-1, 0]],
  with the state defined as stacked position and velocity, u = (p, v).

Periodic motion
• Take a discrete approximation (e.g., forward Euler integration with stepsize \Delta t):
  u_i = u_{i-1} + \Delta t \, S u_{i-1} = (I + \Delta t \, S) u_{i-1} = D_{i-1} u_{i-1}

n-D
Generalization to n-D is straightforward but more complex.

n-D prediction
• Multiply the estimate at the prior time by the forward model:
  \bar{x}_i^- = D_i \bar{x}_{i-1}^+
• Propagate the covariance through the model and add new noise:
  \Sigma_i^- = D_i \Sigma_{i-1}^+ D_i^T + \Sigma_{d_i}

n-D correction
• Update the a priori estimate with the measurement to form the a posteriori estimate.
• Find the linear filter on the innovations that minimizes the a posteriori error covariance
  E[(x - \bar{x})^T (x - \bar{x})].
  K is the Kalman gain matrix. A solution is
  K_i = \Sigma_i^- M_i^T (M_i \Sigma_i^- M_i^T + \Sigma_{m_i})^{-1},
  \bar{x}_i^+ = \bar{x}_i^- + K_i (y_i - M_i \bar{x}_i^-),  \Sigma_i^+ = (I - K_i M_i) \Sigma_i^-

Kalman gain matrix
• As the measurement becomes more reliable, K weights the residual more heavily:
  \lim_{\Sigma_m \to 0} K_i = M^{-1}
• As the prior covariance approaches 0, measurements are ignored:
  \lim_{\Sigma_i^- \to 0} K_i = 0

[Plot: constant velocity model, position against time]

This is figure 17.3 of Forsyth and Ponce. The notation is a bit involved, but is logical. We plot the true state as open circles, measurements as x's, predicted means as *'s with three-standard-deviation bars, and corrected means as +'s with three-standard-deviation bars.

[Plot: position against time; the o's give the state, the x's the measurements]

Smoothing
• Idea:
  – We don't have the best estimate of state; what about using the future measurements too?
  – Run two filters, one moving forward and the other backward in time.
  – Then combine the state estimates.
• The crucial point is that we can obtain a smoothed estimate by viewing the backward filter's prediction as yet another measurement for the forward filter.

[Plot: forward estimates; the o's give the state, the x's the measurements]
[Plot: backward estimates; the o's give the state, the x's the measurements]
[Plot: combined forward-backward estimates; the o's give the state, the x's the measurements]
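The n-D prediction and correction equations above translate almost line for line into code. The numpy sketch below instantiates them for the constant-velocity model; the specific noise covariances, initial state, and three measurements are illustrative assumptions, not values from the lecture.

```python
# A sketch of the n-D prediction/correction equations, applied to the
# constant-velocity model x = (u, v).  Matrix names (D, M, Sigma_d, Sigma_m)
# follow the lecture notation; the numbers are made up for illustration.

import numpy as np

dt = 1.0
D = np.array([[1.0, dt],
              [0.0, 1.0]])          # constant-velocity dynamics
M = np.array([[1.0, 0.0]])          # we only measure the position u
Sigma_d = 0.01 * np.eye(2)          # process noise covariance
Sigma_m = np.array([[0.25]])        # measurement noise covariance

x = np.array([0.0, 1.0])            # initial state estimate (u, v)
Sigma = np.eye(2)                   # initial state covariance

def kalman_step(x, Sigma, y):
    # Prediction: push the estimate through the forward model,
    # propagate the covariance, and add process noise.
    x_pred = D @ x
    Sigma_pred = D @ Sigma @ D.T + Sigma_d
    # Correction: the Kalman gain weights the innovation y - M x_pred.
    S = M @ Sigma_pred @ M.T + Sigma_m
    K = Sigma_pred @ M.T @ np.linalg.inv(S)
    x_new = x_pred + (K @ (y - M @ x_pred)).ravel()
    Sigma_new = (np.eye(2) - K @ M) @ Sigma_pred
    return x_new, Sigma_new

for y in [np.array([1.1]), np.array([1.9]), np.array([3.2])]:
    x, Sigma = kalman_step(x, Sigma, y)
    print(np.round(x, 2))
```

The same kalman_step routine works for any D, M, and covariances of compatible sizes, which is the sense in which the linear model is "much more general than it looks".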
2-D constant velocity example from Kevin Murphy's Matlab toolbox
[figure from http://www.ai.mit.edu/~murphyk/Software/Kalman/kalman.html]

2-D constant velocity example from Kevin Murphy's Matlab toolbox
• MSE of the filtered estimate is 4.9; of the smoothed estimate, 3.2.
• Not only is the smoothed estimate better, but we know that it is better, as illustrated by the smaller uncertainty ellipses.
• Note how the smoothed ellipses are larger at the ends, because those points have seen less data.
• Also note how rapidly the filtered ellipses reach their steady-state ("Riccati") values.
[figure from http://www.ai.mit.edu/~murphyk/Software/Kalman/kalman.html]

Resources
• Kalman filter homepage: http://www.cs.unc.edu/~welch/kalman/ (Kalman filter demo applet)
• Kevin Murphy's Matlab toolbox: http://www.ai.mit.edu/~murphyk/Software/Kalman/kalman.html

Embellishments for tracking
• Richer models of P(x_n | x_{n-1})
• Richer models of P(y_n | x_n)
• Richer models of probability distributions

Abrupt changes
• What if the environment is sometimes unpredictable? Do people move with constant velocity?
• Test several models of assumed dynamics, and use the best.

Multiple model filters
• Test several models of assumed dynamics.
[figure from Welch and Bishop 2001]

MM estimate
• Two models: Position (P), Position+Velocity (PV)
[figure from Welch and Bishop 2001]

P likelihood
[figure from Welch and Bishop 2001]

No lag
[figure from Welch and Bishop 2001]

Smooth when still
[figure from Welch and Bishop 2001]

Embellishments for tracking
• Richer models of P(y_n | x_n)
• Richer models of probability distributions

Jepson, Fleet, and El-Maraghi tracker

Wandering, Stable, and Lost appearance model
• Introduce 3 competing models to explain the appearance of the tracked region:
  – A stable model: a Gaussian with some mean and covariance.
  – A 2-frame motion-tracker appearance model (wandering), to rebuild the stable model when it gets lost.
  – An outlier (lost) model: uniform probability over all appearances.
• Use an on-line EM algorithm to fit the (changing) model parameters to the recent appearance data.

Jepson, Fleet, and El-Maraghi tracker for a toy 1-pixel image model
• Red line: observations; blue line: true appearance state; black line: mean of the stable process.
• Mixing probabilities for the Stable (black), Wandering (red), and Lost (green) processes.

Non-toy image representation
• Phase of a steerable quadrature pair (G2, H2), steered to 4 different orientations, at 2 scales.

The motion tracker
• The motion prior, P(x_n | x_{n-1}), prefers slow velocities and small accelerations.
• The WSL appearance model gives a likelihood for each possible new position, orientation, and scale of the tracked region.
• They combine that likelihood with the motion prior to find the most probable position, orientation, and scale of the tracked region in the next frame.
• This gives state-of-the-art tracking results.

Jepson, Fleet, and El-Maraghi tracker
• Far right column: the stable component's mixing probability. Note its behavior at occlusions.
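For intuition only, here is a very rough, hypothetical sketch of the Wandering/Stable/Lost competition for a single pixel value, in the spirit of the toy 1-pixel example above: three components compete to explain each new observation, and an exponentially weighted on-line EM step adapts the mixing probabilities and the stable component's statistics. Every number and the exact update rules are schematic assumptions, not Jepson, Fleet, and El-Maraghi's implementation.

```python
# Schematic WSL-style on-line mixture update for a single pixel value.
# Components: Stable (slowly adapting Gaussian), Wandering (predicts the
# previous frame's value), Lost (uniform over appearances).  Not the
# authors' algorithm; a rough illustration only.

import numpy as np

def gauss(d, mu, var):
    return np.exp(-0.5 * (d - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

m = np.array([1/3, 1/3, 1/3])     # mixing probabilities (Stable, Wandering, Lost)
mu_s, var_s = 0.5, 0.05           # stable component parameters
alpha = 0.05                      # exponential forgetting factor
prev = 0.5                        # previous frame's appearance

for d in [0.52, 0.48, 0.50, 0.95, 0.97, 0.96]:   # made-up appearance data
    # E-step: ownership (responsibility) of each component for observation d.
    lik = np.array([gauss(d, mu_s, var_s),       # Stable
                    gauss(d, prev, 0.1),         # Wandering
                    1.0])                        # Lost: uniform over [0, 1]
    o = m * lik
    o /= o.sum()
    # M-step (on-line): exponentially weighted parameter updates.
    m = (1 - alpha) * m + alpha * o
    mu_s = (1 - alpha * o[0]) * mu_s + alpha * o[0] * d
    var_s = (1 - alpha * o[0]) * var_s + alpha * o[0] * (d - mu_s) ** 2
    prev = d
    print(np.round(m, 2))         # watch Stable lose ownership after the jump
```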
Embellishments for tracking
• Richer models of P(y_n | x_n)
• Richer models of probability distributions
  – Particle filter models, applied to tracking humans

(KF) Distribution propagation
• Prediction from the previous time frame.
• Noise added to that prediction.
• Make a new measurement at the next time frame.
[Isard 1998]

Distribution propagation
[Isard 1998]

Representing non-linear distributions

Representing non-linear distributions
• Unimodal parametric models fail to capture real-world densities…

Discretize by evenly sampling over the entire state space
• Tractable for 1-D problems like stereo, but not for high-dimensional problems.

Representing distributions using weighted samples
• Rather than a parametric form, use a set of samples to represent a density.

Representing distributions using weighted samples
• Sample positions, and the probability mass at each sample.
• This gives us two knobs to adjust when representing a probability density by samples: the locations of the samples, and the probability weight on each sample.

Representing distributions using weighted samples, another picture
[Isard 1998]

Sampled representation of a probability distribution
• You can also think of this as a sum of Dirac delta functions, each of weight w_i:
  p_f(x) = \sum_i w_i \delta(x - u_i)

Tracking, in particle filter representation
[Graphical model: hidden states x_1 → x_2 → x_3, each emitting an observation y_1, y_2, y_3]
  p_f(x) = \sum_i w_i \delta(x - u_i)
  P(x_n | y_1 \ldots y_n) = k \, P(y_n | x_n) \int dx_{n-1} \, P(x_n | x_{n-1}) P(x_{n-1} | y_1 \ldots y_{n-1})
  (prediction step: the integral; update step: multiplication by the likelihood)

Particle filter
• Let's apply this sampled probability-density machinery to generalize the Kalman filtering framework.
• It is a more general probability-density representation than a uni-modal Gaussian.
• It allows general state dynamics, f(x) + noise.

Sampled prediction
• Push the samples through the dynamics, then drop elements to marginalize and obtain samples of the predicted state.

Sampled correction (Bayes rule)
• Prior to posterior: reweight every sample by the likelihood of the observations given that sample, yielding a set of samples describing the probability distribution after the correction (update) step.

Naïve PF tracking
• Start with samples from something simple (a Gaussian).
• Repeat:
  – Correct: take each particle from the prediction step and modify its old weight by multiplying by the new likelihood.
  – Predict: run every particle through the dynamics function and add noise.
• But this doesn't work that well, because of sample impoverishment…

Sample impoverishment
• 10 of the 100 particles, along with the true Kalman filter track, with variance.
[Plot: particle positions against time]

Resample the prior
• In a sampled density representation, the frequency of samples can be traded off against weight: make N draws with replacement from the original set of samples, using the weights as the probability of drawing a sample, and give every new sample equal weight.
• These new samples are a representation of the same density.

Resampling concentrates samples

A practical particle filter with resampling (see the sketch below)

Pictorial view
[Isard 1998]

Animation of the condensation algorithm
[Isard 1998]

Applications
• Tracking
  – hands
  – bodies
  – leaves
• What might we expect? Reliable, robust, slow.
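In the spirit of the "practical particle filter with resampling" above, here is a minimal predict/reweight/resample loop. The dynamics function, likelihood, noise levels, and measurement sequence are made-up stand-ins rather than anything from the lecture or from Isard's CONDENSATION work.

```python
# A minimal particle filter with resampling (schematic 1-D example).

import numpy as np
rng = np.random.default_rng(0)

N = 500
particles = rng.normal(0.0, 1.0, N)     # samples from a simple Gaussian prior
weights = np.full(N, 1.0 / N)

def dynamics(x):                        # assumed state dynamics f(x) + noise
    return x + 0.5 * np.sin(x) + rng.normal(0.0, 0.2, x.shape)

def likelihood(y, x):                   # assumed p(y | x): noisy observation of the state
    return np.exp(-0.5 * (y - x) ** 2 / 0.3 ** 2)

for y in [0.4, 0.9, 1.5, 1.9]:          # made-up measurement sequence
    # Predict: run every particle through the dynamics and add noise.
    particles = dynamics(particles)
    # Correct: reweight each particle by the likelihood of the observation.
    weights *= likelihood(y, particles)
    weights /= weights.sum()
    # Resample: draw N particles with replacement, probability = weight.
    idx = rng.choice(N, size=N, p=weights)
    particles, weights = particles[idx], np.full(N, 1.0 / N)
    print(round(float(np.mean(particles)), 2))   # posterior mean estimate
```

Resampling after each update keeps the particle set from collapsing onto a few heavily weighted samples, which is exactly the sample-impoverishment problem noted above.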
Contour tracking
[Isard 1998]

Head tracking
• Picture of the states represented by the top weighted particles.
• The mean state.
[Isard 1998]

Leaf tracking
[Isard 1998]

Hand tracking
[Isard 1998]

Desired operations with a probability density
• Expected value
• Marginalization
• Bayes rule

Computing an expectation using the sampled representation
• Using p_f(x) = \sum_i w_i \delta(x - u_i):
  E[g(x)] \approx \sum_i w_i \, g(u_i)

Marginalizing a sampled density
• If we have a sampled representation of a joint density and we wish to marginalize over one variable, we can simply ignore the corresponding components of the samples (!).

Sampled Bayes rule
• The posterior is k \, p(V = v_0 | U) \, p(U): reweight each sample u_i by the likelihood p(v_0 | u_i) and renormalize the weights.
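The three operations above are one-liners on a weighted-sample representation. In the short sketch below, the sample values and the likelihood vector are arbitrary illustrations, not data from the lecture.

```python
# Expectation, marginalization, and Bayes reweighting on a weighted-sample
# representation p_f(x) = sum_i w_i * delta(x - u_i).

import numpy as np

u = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5]])   # samples of a joint (x1, x2)
w = np.array([0.2, 0.5, 0.3])                         # probability mass per sample

# Expectation: E[g(x)] ~= sum_i w_i g(u_i)  (weights already sum to 1).
g = lambda x: x[:, 0] ** 2
print(np.sum(w * g(u)))

# Marginalization over x2: keep the same weights, ignore the x2 component.
u_marg, w_marg = u[:, 0], w

# Bayes rule: multiply each weight by an assumed likelihood p(v0 | u_i), renormalize.
lik = np.array([0.1, 0.6, 0.3])
w_post = w * lik / np.sum(w * lik)
print(w_post)
```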