EECS 274 Computer Vision
Tracking

Tracking
• Motivation: obtain a compact representation from an image/motion sequence/set of tokens
• Should support the application
• Broad theory is absent at present
• Some slides from R. Collins' lecture notes
• Reading: FP Chapter 17

Tracking
• Very general model:
– We assume there are moving objects, which have an underlying state X
– There are measurements Y, some of which are functions of this state
– There is a clock
• at each tick, the state changes
• at each tick, we get a new observation
• Examples
– object is a ball, state is 3D position + velocity, measurements are stereo pairs
– object is a person, state is body configuration, measurements are frames, clock is the camera (30 fps)

Three main steps

Simplifying assumptions

Tracking as induction
• Assume data association is done
– we'll talk about this later; a dangerous assumption
• Do correction for the 0'th frame
• Assume we have a corrected estimate for the i'th frame
– show we can do prediction for i+1, then correction for i+1

General model for tracking
• The moving object of interest is characterized by an underlying state X
• State X gives rise to measurements or observations Y
• At each time t, the state changes to X_t and we get a new observation Y_t
• Can be better explained with a graphical model: a chain X_1 → X_2 → … → X_t, with each state X_i emitting its own observation Y_i

Base case

Induction step
• Given the corrected estimate for frame i, do prediction for frame i+1, then correction with the new measurement

Explanation with graphical model

Linear dynamic models
• Use the notation ~ to mean "has the pdf of"; N(a, b) is a normal distribution with mean a and covariance b
• Then a linear dynamic model has the form
x_i ~ N(D_{i-1} x_{i-1}; Σ_{d_i})
y_i ~ N(M_i x_i; Σ_{m_i})
D: transition matrix, M: observation (measurement) matrix
• This is much, much more general than it looks, and extremely powerful

Propagation of Gaussian densities

Examples
• Drifting points
– we assume that the new position of the point is the old one, plus noise
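As a quick illustration of sampling from such a linear dynamic model, here is a minimal sketch (the function and parameter names are my own, not from the slides); setting D = M = I gives exactly the drifting-point example above:

```python
import numpy as np

def simulate_lds(D, M, Sigma_d, Sigma_m, x0, steps, rng):
    """Sample a trajectory from the linear dynamic model
    x_i ~ N(D x_{i-1}, Sigma_d),  y_i ~ N(M x_i, Sigma_m)."""
    xs, ys = [], []
    x = x0
    for _ in range(steps):
        x = rng.multivariate_normal(D @ x, Sigma_d)  # state transition plus noise
        y = rng.multivariate_normal(M @ x, Sigma_m)  # noisy observation of the state
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# Drifting point in 2D: new position = old position + noise (D = M = I)
rng = np.random.default_rng(0)
I2 = np.eye(2)
xs, ys = simulate_lds(I2, I2, 0.01 * I2, 0.05 * I2, np.zeros(2), 100, rng)
print(xs.shape, ys.shape)  # (100, 2) (100, 2)
```

The constant-velocity and constant-acceleration models below are obtained from the same function just by changing D and M.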
– For the measurement model, we may not need to observe the whole state of the object
• e.g. for a point moving in 3D, at the 3k'th tick we see x, at the (3k+1)'th tick we see y, at the (3k+2)'th tick we see z
• in this case, we can still make decent estimates of all three coordinates at each tick
– This property, which does not apply to every model, is called observability

Examples
• Points moving with constant velocity
• Points moving with constant acceleration
• Periodic motion
• etc.

Moving with constant velocity
• We have
p_i = p_{i-1} + Δt v_{i-1} + ε_i
v_i = v_{i-1} + ς_i
– (the Greek letters denote noise terms)
• Stack position and velocity into a single state vector:
[p; v]_i = [[I_d, Δt I_d]; [0, I_d]] [p; v]_{i-1} + noise
– which is the form we had above, x_i ~ N(D_{i-1} x_{i-1}; Σ_{d_i}), y_i ~ N(M_i x_i; Σ_{m_i}), with
D_i = [[I_d, Δt I_d]; [0, I_d]],  M_i = [I_d  0]

Moving with constant acceleration
• We have
p_i = p_{i-1} + Δt v_{i-1} + ε_i
v_i = v_{i-1} + Δt a_{i-1} + ς_i
a_i = a_{i-1} + ξ_i
– (the Greek letters denote noise terms)
• Stack position, velocity, and acceleration into a single state vector:
[p; v; a]_i = [[I_d, Δt I_d, 0]; [0, I_d, Δt I_d]; [0, 0, I_d]] [p; v; a]_{i-1} + noise
– which is the form we had above, with
D_i = [[I_d, Δt I_d, 0]; [0, I_d, Δt I_d]; [0, 0, I_d]],  M_i = [I_d  0  0]

Constant velocity dynamic model
[figure: position and velocity (1st component of state) plotted against time; measurements overlaid on the 1st component of state]

Constant acceleration dynamic model
[figure: position and velocity plotted against time]

Kalman filter
• Key ideas:
– Linear models interact uniquely well with Gaussian noise
• make the prior Gaussian, everything else is Gaussian, and the calculations are easy
– Gaussians are really easy to represent
• once you know the mean and covariance, you're done

Kalman filter in 1D
• Dynamic model:
x_i ~ N(d_i x_{i-1}; σ_{d_i}²)
y_i ~ N(m_i x_i; σ_{m_i}²)
• Notation:
x̄_i⁻ : predicted mean (before the i-th measurement)
x̄_i⁺ : corrected mean (after the i-th measurement)

Prediction for 1D Kalman filter
• Because the new state is obtained by
– multiplying the old state by a known constant
– adding zero-mean noise
• Therefore, the predicted mean for the new state is
– the constant times the mean for the old state:  x̄_i⁻ = d_i x̄_{i-1}⁺
• The predicted variance is
– the noise variance plus the constant squared times the old state variance:  (σ_i⁻)² = σ_{d_i}² + (d_i σ_{i-1}⁺)²
– the old state is a normal random
variable; multiplying a normal RV by a constant multiplies the mean by that constant and the variance by its square, adding zero-mean noise adds zero to the mean, and adding independent RVs adds their variances

Correction for 1D Kalman filter
• Pattern match to the identities given in the book
– basically, guess the integrals, and get:
x̄_i⁺ = ( x̄_i⁻ σ_{m_i}² + m_i y_i (σ_i⁻)² ) / ( σ_{m_i}² + m_i² (σ_i⁻)² )
(σ_i⁺)² = ( σ_{m_i}² (σ_i⁻)² ) / ( σ_{m_i}² + m_i² (σ_i⁻)² )
• Notice:
– if the measurement noise is small (σ_{m_i} → 0), we rely mainly on the measurement: x̄_i⁺ → y_i / m_i
– if it is large (σ_{m_i} → ∞), we rely mainly on the prediction: x̄_i⁺ → x̄_i⁻

Example: Kalman filtering
[figure legend: + state; × measurement; * x̄_i⁻ (prediction before measurement); ◦ x̄_i⁺ (correction after measurement)]

Kalman filter: general case
• In higher dimensions, the derivation follows the same lines, but isn't as easy; the expressions are given here
• In the example, the measurement standard deviation is small, so the state estimates are rather good

Smoothing
• Idea
– We don't have the best estimate of state: what about using the future measurements as well?
– Run two filters, one moving forward, the other backward in time
– Now combine the state estimates
• The crucial point is that we can obtain a smoothed estimate by viewing the backward filter's prediction as yet another measurement for the forward filter
– so we've already done the equations

Forward filter
Backward filter
Forward-backward filter
[figures use the same legend as above]

Data association
• Also known as the correspondence problem
• Given that feature detectors (measurements) are not perfect, how can one find the correspondence between measurements and features in the model?
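The 1D Kalman prediction and correction equations above can be sketched as follows (a minimal sketch; the variable names and the random-walk test setup are my own):

```python
import numpy as np

def kalman_1d(ys, d, m, var_d, var_m, x0, var0):
    """1D Kalman filter: predict with the dynamics, correct with each measurement."""
    x, var = x0, var0
    estimates = []
    for y in ys:
        # Prediction: x_i^- = d * x_{i-1}^+,  (sigma_i^-)^2 = var_d + d^2 (sigma_{i-1}^+)^2
        x_pred = d * x
        var_pred = var_d + d**2 * var
        # Correction: blend prediction and measurement, weighted by their variances
        x = (x_pred * var_m + m * y * var_pred) / (var_m + m**2 * var_pred)
        var = (var_m * var_pred) / (var_m + m**2 * var_pred)
        estimates.append(x)
    return np.array(estimates)

# Track a constant true state x = 5 observed with noise; the estimate approaches 5
rng = np.random.default_rng(1)
ys = 5.0 + 0.5 * rng.standard_normal(200)
est = kalman_1d(ys, d=1.0, m=1.0, var_d=1e-4, var_m=0.25, x0=0.0, var0=1.0)
print(est[-1])
```

Note how a small var_m would make the estimate follow the measurements closely, while a large var_m makes it lean on the prediction, exactly as the limits above suggest.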
Data association
• Determine which measurements are informative (as not every measurement conveys the same amount of information)
• Nearest neighbors
– choose the measurement with the highest probability given the predicted state
– popular, but can lead to catastrophe
• Probabilistic Data Association (PDA)
– combine measurements, weighting by probability given the predicted state
– gate using the predicted state

Data association
• Constant velocity model; measurements are plotted with standard deviation (dashed line)
[figure legend: + state; × measurement; * x̄_i⁻ (prediction before measurement); ◦ x̄_i⁺ (correction after measurement)]

Data association with NN
• The blue track represents the actual measurements of state, and the red tracks are noise
• The noise is all over the place, and the gate around each prediction typically contains only one measurement (the right one)
• This means that we can track rather well (the blue track is very largely obscured by the overlaid measurements of state)
• Nearest neighbors (NN) pick the best measurements (consistent with the predictions) and work well in this case
• Occasional misidentification may not cause problems

Data association with NN
• If the dynamic model is not sufficiently constrained, choosing the measurement that best fits the prediction may lead to failure

Data association with NN
• But actually the tracker loses track
• These problems occur because error can accumulate
• it is now relatively easy to continue tracking the wrong point for a long time
• and the longer we do this, the less chance there is of recovering the right point

Probabilistic data association
• Instead of choosing the region most like the predicted measurement,
• we can exclude all regions that are too different and then use the others,
• weighting them according to their similarity to the prediction
• Instead of using P(y_i | y_0, …, y_{i-1}), take the expectation over the association hypotheses h_j:
E_h[y_i] = Σ_j P(h_j | y_0, …, y_{i-1}) y_i^j
(the superscript j indicates the region)

Nonlinear dynamics
x_{i+1} = x_i + sin(x_i)
• As one repeats iterations of this function (there isn't any noise), points move towards or away from points where sin(x) = 0
• This means that if one starts with a quite simple probability distribution on x_0, one can end up with a very complex distribution (many tiny peaks) quite quickly
• Most easily demonstrated by running particles through these dynamics, as on the next slide

Nonlinear dynamics
• The top figure shows tracks of particle position against iteration for the dynamics of the previous slide, where the particle position was normal in the initial configuration (first graph at the bottom)
• As the succeeding graphs show, the number of particles near a point (= histogram = p(x_n)) very quickly gets complex, with many tiny peaks

Propagation of general densities

Factored sampling
• Represent the state distribution non-parametrically
– Prediction: sample points from the prior density for the state, p(x)
– Correction: weight the samples according to p(y|x), since p(x|y) = k p(y|x) p(x):
π^(n) = p_z(s^(n)) / Σ_{j=1}^N p_z(s^(j)),  where p_z(x) = p(y|x)
• Over time:
p(x_t | y_0, …, y_t) = p(y_t | x_t) p(x_t | y_0, …, y_{t-1}) / ∫ p(y_t | x_t) p(x_t | y_0, …, y_{t-1}) dx_t
• Representing p(y|x) in terms of {s_{t-1}^(n), π_{t-1}^(n)}

Particle filtering
• We want to use sampling to propagate densities over time (i.e., across frames in a video sequence)
• At each time step, represent the posterior p(x_t|y_t) with a weighted sample set
• The previous time step's sample set, representing p(x_{t-1}|y_{t-1}), is passed to the next time step as the effective prior

Particle filtering
• Start with weighted samples from the previous time step
• Sample and shift according to the dynamics model
• Spread due to randomness; this is the predicted density p(x_t|y_{t-1})
• Weight the samples according to the observation density
• Arrive at the corrected density estimate p(x_t|y_t)

M. Isard and A.
Blake, “CONDENSATION -- conditional density propagation for visual tracking”, IJCV 29(1):5-28, 1998

Parameterization of splines

Dynamic model p(x_t|x_{t-1})
• Can be learned from examples using ARMA or LDS
• Can be modeled as a random walk
• Often factorize x_t

Observation model p(y_t|x_t)

Observation model: 2D case

Particle filtering results

Issues
• Initialization
• Obtaining the observation and dynamics models
– Generative observation model: "render" the state on top of the image and compare
– Discriminative observation model: classifier or detector score
– Dynamics model: learn (very difficult) or specify using domain knowledge
• Prediction vs. correction
– If the dynamics model is too strong, we will end up ignoring the data
– If the observation model is too strong, tracking is reduced to repeated detection
• Data association
– What if we don't know which measurements to associate with which tracks?
• Drift
– Errors caused by the dynamics model, the observation model, and data association tend to accumulate over time
• Failure detection

Mean-Shift Object Tracking
• Start from the position of the model in the current frame
• Search in the model's neighborhood in the next frame
• Find the best candidate by maximizing a similarity function
• Repeat the same process in the next pair of frames
[figure: model in the current frame; candidates in the next frame]

Mean-Shift Object Tracking: Target Representation
• Choose a feature space (e.g., quantized color space)
• Choose a reference target model
• Represent the model by its PDF in the feature space
[figure: histogram of probability over m quantized color bins]
• Kernel-Based Object Tracking, by Comaniciu, Ramesh, and Meer

Mean-Shift Object Tracking: PDF Representation
• Target model (centered at 0):
q = {q_u}_{u=1..m},  Σ_{u=1}^m q_u = 1
• Target candidate (centered at y):
p(y) = {p_u(y)}_{u=1..m},  Σ_{u=1}^m p_u(y) = 1
• Similarity function: f(y) = f[p(y), q]

Mean-Shift Object Tracking: Finding the PDF of the target model
• Target pixel locations: {x_i}_{i=1..n}
• k(x): a differentiable, isotropic, convex, monotonically decreasing kernel
– peripheral pixels are affected by occlusion and background interference
• b(x): the color bin index (1..m) of pixel x
• Probability of feature u in the model:
q_u = C Σ_i k(‖x_i‖²) δ[b(x_i) − u]
• Probability of feature u in the candidate:
p_u(y) = C_h Σ_i k(‖(y − x_i)/h‖²) δ[b(x_i) − u]
(C and C_h are normalization factors; k gives each pixel a weight)

Mean-Shift Object Tracking: Similarity Function
• Target model: q = (q_1, …, q_m)
• Target candidate: p(y) = (p_1(y), …, p_m(y))
• Similarity function: f(y) = f[p(y), q] = ?
• The Bhattacharyya coefficient:
f(y) = Σ_{u=1}^m sqrt(p_u(y) q_u) = p′(y)ᵀ q′ = cos θ_y
where q′ = (sqrt(q_1), …, sqrt(q_m)) and p′(y) = (sqrt(p_1(y)), …, sqrt(p_m(y))) are unit vectors, since the histograms sum to 1

Mean-Shift Object Tracking: Target Localization Algorithm
• Start from the position of the model in the current frame (q′)
• Search in the model's neighborhood in the next frame (p′(y))
• Find the best candidate by maximizing the similarity function f[p′(y), q′]

Mean-Shift Object Tracking: Approximating the Similarity Function
• Model location: y_0; candidate location: y
f(y) = Σ_{u=1}^m sqrt(p_u(y) q_u)
• Linear approximation around y_0:
f(y) ≈ (1/2) Σ_{u=1}^m sqrt(p_u(y_0) q_u) + (1/2) Σ_{u=1}^m p_u(y) sqrt(q_u / p_u(y_0))
• Substituting p_u(y) = C_h Σ_i k(‖(y − x_i)/h‖²) δ[b(x_i) − u]: the first term is independent of y, and the second becomes
(C_h / 2) Σ_{i=1}^n w_i k(‖(y − x_i)/h‖²),  with w_i = Σ_{u=1}^m sqrt(q_u / p_u(y_0)) δ[b(x_i) − u]
– a density estimate!
(as a function of y)

Mean-Shift Object Tracking: Maximizing the Similarity Function
• The mode of (C_h / 2) Σ_{i=1}^n w_i k(‖(y − x_i)/h‖²) is the sought maximum
• Important assumption: the target representation provides sufficient discrimination, so there is only one mode in the searched neighborhood

Mean-Shift Object Tracking: Applying Mean-Shift
• Original mean-shift: find the mode of Σ_i c k(‖(y − x_i)/h‖²) using
y_1 = Σ_i x_i g(‖(y_0 − x_i)/h‖²) / Σ_i g(‖(y_0 − x_i)/h‖²)
• Extended mean-shift: find the mode of Σ_i c w_i k(‖(y − x_i)/h‖²) using
y_1 = Σ_i x_i w_i g(‖(y_0 − x_i)/h‖²) / Σ_i w_i g(‖(y_0 − x_i)/h‖²)

Mean-Shift Object Tracking: About Kernels and Profiles
• A special class of radially symmetric kernels: K(x) = c k(‖x‖²)
• k is called the profile of the kernel K, and g(x) = −k′(x)

Mean-Shift Object Tracking: Choosing the Kernel
• Epanechnikov kernel profile: k(x) = 1 − x if x ≤ 1, 0 otherwise
• Its negative derivative is the uniform kernel profile: g(x) = −k′(x) = 1 if x ≤ 1, 0 otherwise
• With the uniform kernel, the update simplifies to
y_1 = Σ_{i=1}^n x_i w_i / Σ_{i=1}^n w_i

Mean-Shift Object Tracking: Adaptive Scale
• Problem: the scale of the target changes over time, so the scale h of the kernel must be adapted
• Solution: run the localization 3 times with different h, and choose the h that achieves maximum similarity

Mean-Shift Object Tracking: Results
• Feature space: 16×16×16 quantized RGB
• Target: manually selected in the 1st frame
• Average number of mean-shift iterations: 4
• Robust to partial occlusion, distraction, and motion blur
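The uniform-kernel mean-shift update and the Bhattacharyya similarity above can be sketched as follows (a simplified stand-in: the pixel set, the two-bin "color" space, and the toy scene are my own illustration, not the paper's implementation):

```python
import numpy as np

def bhattacharyya(p, q):
    # f(y) = sum_u sqrt(p_u * q_u); equals 1.0 when the two histograms match
    return float(np.sum(np.sqrt(p * q)))

def mean_shift_step(xs, bins, q, y0, h, m):
    """One extended mean-shift update with the uniform kernel:
    y1 = sum_i x_i w_i / sum_i w_i over pixels inside the window of radius h."""
    inside = np.sum(((xs - y0) / h) ** 2, axis=1) <= 1.0
    xin, bin_in = xs[inside], bins[inside]
    # candidate histogram p_u(y0) from the pixels inside the window
    p0 = np.bincount(bin_in, minlength=m).astype(float)
    p0 /= p0.sum()
    # pixel weights w_i = sqrt(q_u / p_u(y0)) for the pixel's own color bin u
    w = np.sqrt(q[bin_in] / np.maximum(p0[bin_in], 1e-12))
    y1 = (xin * w[:, None]).sum(axis=0) / w.sum()  # assumes some target pixels in window
    return y1, bhattacharyya(p0, q)

# Toy scene: target-colored pixels (bin 1) clustered near (5, 5),
# background pixels (bin 0) spread around; the model q is pure bin 1
rng = np.random.default_rng(2)
target = np.array([5.0, 5.0]) + 0.5 * rng.standard_normal((50, 2))
backgr = 10.0 * rng.random((200, 2))
xs = np.vstack([target, backgr])
bins = np.array([1] * 50 + [0] * 200)
q = np.array([0.0, 1.0])

y0 = np.array([4.0, 4.0])
y1, sim = mean_shift_step(xs, bins, q, y0, h=3.0, m=2)
print(y1, sim)  # y1 moves from (4, 4) toward the target cluster at (5, 5)
```

Iterating this step until y stops moving, and repeating over the 3 candidate scales, gives the localization loop the slides describe.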