Human-Computer Interaction
Tracking
Hanyang University
Jong-Il Park

Tracking in computer vision
  - Definition
      - The problem of generating an inference about the motion of an object given a sequence of images
  - Applications
      - Motion capture
      - Recognition from motion
      - Surveillance: who is doing what?
      - e.g. security, HCI (Kinect…)

Tracking
  - Establish the state of an object using a time sequence
      - state could be: position; position + velocity; position + velocity + acceleration
      - or more complex, e.g. all joint angles for a person
  - Biggest problem: data association
      - which image pixels are informative, and which are not?
  - Key ideas
      - Tracking by detection: if we know what an object looks like, that selects the pixels to use
      - Tracking through flow: if we know how an object moves, that selects the pixels to use
  - Appearance vs. flow

Tracking by Detection
  - Assume
      - a very reliable detector (e.g. faces; backs of heads)
      - detections that are well spaced in images (or have distinctive properties)
          - e.g. news anchors; heads in public
  - Link detections across time
      - only one: easy
      - multiple: weighted bipartite matching
      - but what if one is missing?
  - Better: create abstract tracks
      - link detections to tracks
      - create tracks and reap tracks as required
      - clean up spacetime paths

Tracking by Known Appearance
  - Even if we don't have a detector
      - Know the rectangle in image (n)
      - Want to find the corresponding rectangle in image (n+1)
      - Search over nearby rectangles to find the one that minimizes the SSD (sum of squared differences) error
            $\sum_{i,j} \left( R^{(n)}_{ij} - R^{(n+1)}_{ij} \right)^2$
        where the sum is over pixels in the rectangle
  - Application
      - stabilize players in TV sport

Building Tracks
  - Start at scattered points in image 1
      - perhaps corner detector responses
  - For each point in image (n), compute its position in image (n+1)
      - as in the previous slide
  - Now check tracks
      - the patch in image (n+1) should look like an affine transform of the patch in image 1
  - Prune bad tracks

What if the patch deforms?
  - e.g. a football player's jersey
      - colors are "similar", but SSD won't work
  - Idea: the patch histogram is stable
  - To track, repeat:
      - predict the location of the new patch
      - search nearby for the patch whose histogram best matches the original

When are motions easy?
  - Current procedure
      - predict the state
      - obtain a measurement by searching around the prediction
      - correct the state
  - Easy when the object is close to where you expect it to be
      - e.g. the object is guaranteed to move only a little
  - Large motions can be easy when they are "predictable"
      - e.g. ballistic motion
      - e.g. constant velocity
  - Need a theory to fuse this procedure with a motion model

General Model
  - We assume there are moving objects, which have an underlying state X
  - There are measurements Y, some of which are functions of this state
  - There is a clock
      - at each tick, the state changes
      - at each tick, we get a new observation
  - Examples
      - object is a ball, state is 3D position + velocity, measurements are stereo pairs
      - object is a person, state is body configuration, measurements are frames, clock is in the camera (30 fps)

Tracking as an Abstract Inference Problem
  - Internal state: X
  - Measurement: Y
  - Major steps
      1. Prediction
      2. Data association: determining which data are informative
      3. Correction

Independence Assumption
  - Only the immediate past matters (X is a Markov process)
        $P(X_i \mid X_1, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})$
  - Measurements depend only on the current state
        $P(Y_i \mid X_1, \ldots, X_i, Y_1, \ldots, Y_{i-1}) = P(Y_i \mid X_i)$

Bayes Rule
  - Prediction
        $P(X_i \mid y_0, \ldots, y_{i-1}) = \int P(X_i \mid X_{i-1}) \, P(X_{i-1} \mid y_0, \ldots, y_{i-1}) \, dX_{i-1}$
  - Correction
        $P(X_i \mid y_0, \ldots, y_i) = \frac{P(y_i \mid X_i) \, P(X_i \mid y_0, \ldots, y_{i-1})}{\int P(y_i \mid X_i) \, P(X_i \mid y_0, \ldots, y_{i-1}) \, dX_i}$

Linear Dynamic Model
  - Model
        $x_i \sim N(D_i x_{i-1}, \Sigma_d)$
        $y_i \sim N(M_i x_i, \Sigma_m)$
      - D: transition matrix
      - M: measurement matrix
      - $\Sigma_d$, $\Sigma_m$: covariances of the process and measurement noise
  - The Kalman filter is well suited to this type of tracking
  - Read Ch. 11.3, Forsyth & Ponce, Computer Vision, 2012.

Kalman filter
  - e.g. Kalman filter (figure legend)
      - *: predicted
      - x: measured
      - +: estimated
      - o: true
      - bar: 3σ

Nonlinear Dynamics
  - Problems
      - distributions tend not to be normal
      - may not be Gaussian (quite common in vision problems)
      - multiple, well-separated modes
  - e.g. Nonlinear model (figures)
  - Time evolution & pdf (figure)

Difficulties in Complicated Models
  - Maintaining the pdf over time requires
      - handling multiple peaks
      - handling a multi-dimensional state vector
  - How? Particle filtering is a useful approach.
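
Returning to the linear dynamic model: the predict/correct cycle of a Kalman filter can be sketched for the scalar case, where D and M reduce to numbers d and m. This is a minimal illustration, not the slides' own derivation; the function name `kalman_1d`, the default noise variances, and the initial state are assumptions chosen for the example.

```python
def kalman_1d(measurements, d=1.0, m=1.0, var_d=0.01, var_m=1.0,
              x0=0.0, var0=100.0):
    """Scalar Kalman filter (illustrative sketch).

    Assumed model: x_i = d * x_{i-1} + process noise (variance var_d)
                   y_i = m * x_i     + measurement noise (variance var_m)
    Returns a list of (predicted mean, corrected mean, corrected variance).
    """
    x_est, var_est = x0, var0          # assumed initial belief
    history = []
    for y in measurements:
        # Prediction: push the previous estimate through the dynamics.
        x_pred = d * x_est
        var_pred = d * d * var_est + var_d

        # Correction: blend prediction and measurement using the Kalman gain.
        gain = var_pred * m / (m * m * var_pred + var_m)
        x_est = x_pred + gain * (y - m * x_pred)
        var_est = (1.0 - gain * m) * var_pred

        history.append((x_pred, x_est, var_est))
    return history
```

Calling `kalman_1d([5.0] * 50)` starts from a vague belief centred at 0 and pulls the estimate toward 5 as measurements arrive, while the variance shrinks. A small gain means the filter trusts the prediction; a gain near 1/m means it trusts the measurement.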

Sampled Representation
  - A sampled representation of a pdf is NOT for representing the pdf itself BUT for computing some expectation
  - Computing an expectation using a sampled representation: Monte Carlo integration
        $\int f(u) \, p(u) \, du \approx \frac{\sum_{i=1}^{N} f(u^i) \, w^i}{\sum_{i=1}^{N} w^i}$
    where $u^i \sim s(u)$
      - $s(u)$: sampling distribution
      - $w^i = p(u^i) / s(u^i)$: weight
  - Obtaining a sampled representation of a probability distribution
  - Computing an expectation using a set of samples

Transformation
  - Transforming a sampled representation of a prior into that of a posterior

Naive particle filter
  - e.g. Naive particle filter (figure)
      - Poor results due to sample impoverishment!
      - Most of the weights get small very fast
  - The way to get accurate estimates is to have samples lie where p is likely to be large

Overcoming Sample Impoverishment
  - Equivalent to maintaining a set of good particles
  - Then, how? Resampling the prior
      1. Expand the sample set non-uniformly using the weights: form a new set of samples consisting of the union of $N_k$ copies of $(s_k, 1)$ for each k
      2. Subsample the sample set uniformly

Practical particle filter
  - Initialization
  - Prediction (as in the naive particle filter)
  - Correction (as in the naive particle filter)
  - Resampling
      1. Normalise the weights so that $\sum_i w^i = 1$
      2. Compute the variance of the normalised weights. If var > Th, construct a new set of samples by drawing, with replacement, N samples from the old set, using the weights as the probability that a sample will be drawn. The weight of each sample is now 1/N.
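
The practical particle filter steps can be sketched for a one-dimensional state as follows. This is a minimal sketch under stated assumptions, not the slides' implementation: the motion model is taken to be a Gaussian drift (random walk), the likelihood is taken to be Gaussian around the measurement, and the names `particle_filter_step`, `estimate`, `drift_std`, `meas_std`, and `var_threshold` are hypothetical. The default threshold of 1/N² is also an assumption; it fires roughly when the effective number of samples drops below N/2.

```python
import math
import random


def particle_filter_step(particles, weights, y, rng,
                         drift_std=0.5, meas_std=1.0, var_threshold=None):
    """One predict / correct / resample cycle for a 1D state (sketch)."""
    n = len(particles)
    if var_threshold is None:
        var_threshold = 1.0 / (n * n)   # assumed default for Th

    # Prediction: move each particle with the assumed drift model.
    predicted = [s + rng.gauss(0.0, drift_std) for s in particles]

    # Correction: reweight each particle by the assumed Gaussian likelihood.
    new_w = [w * math.exp(-0.5 * ((y - s) / meas_std) ** 2)
             for s, w in zip(predicted, weights)]

    # Resampling step 1: normalise the weights so that they sum to 1.
    total = sum(new_w)
    if total == 0.0:
        new_w = [1.0 / n] * n           # all weights underflowed: fall back to uniform
    else:
        new_w = [w / total for w in new_w]

    # Resampling step 2: if the weight variance is too large, draw N samples
    # with replacement, using the weights as selection probabilities.
    mean_w = 1.0 / n
    var_w = sum((w - mean_w) ** 2 for w in new_w) / n
    if var_w > var_threshold:
        predicted = rng.choices(predicted, weights=new_w, k=n)
        new_w = [1.0 / n] * n           # each sample now has weight 1/N

    return predicted, new_w


def estimate(particles, weights):
    """Expected state under the sampled representation (weights sum to 1)."""
    return sum(s * w for s, w in zip(particles, weights))
```

A typical use is to draw the initial particles from a broad distribution, give each weight 1/N, and call `particle_filter_step` once per measurement; `estimate` then gives the posterior mean as a weighted sum over the samples.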

Consequence of resampling
  - Particles that tend to reflect the state rather well usually reappear in the resampled set
  - Many particles lie within one standard deviation of the mean of the posterior

Particle filters
  - Different community -> different name
      - statistics: particle filter
      - AI: survival of the fittest
      - computer vision: condensation

Tracking people
  - Essential components
      - Motion model
      - Likelihood model: P(image features | person present at given configuration)
  - Motion model
      - Strong motion model: markers, angles,…
      - Weak motion model: drift model
  - Likelihood model
      - SSD, edges, other features,…

Likelihood computation
  - Boundary information
  - Non-background information
  - Sample points

Annealed particle filter
  - To overcome the problems of
      - high dimensionality of the state
      - many local peaks in the likelihood
  - Annealing
      - Starts from smooth approximations to the likelihood and moves to less smooth approximations
      - Repeats weighting and resampling
  - [Deutscher, Blake, Reid, CVPR 2000]

Finding people
  - As an initialization of a people tracker
      - There is no person tracker that represents the configuration of the body and can start automatically
  - Known approaches
      1. Template matching
      2. Finding faces
      3. Search over correspondence
  - * Challenging topic: learning models from data