Coupling Detection and Data Association for Multiple Object Tracking
Zheng Wu, Ashwin Thangali, Stan Sclaroff, Margrit Betke
Department of Computer Science

Abstract
We present a novel framework for multiple object tracking in which the problems of object detection and data association are expressed by a single objective function. The framework follows the Lagrange dual decomposition strategy, taking advantage of the often complementary nature of the two subproblems. The advantages of our coupling framework are:
- No error propagation, from which traditional "detection-tracking" approaches to multiple object tracking suffer.
- No need to apply non-maximum suppression during the detection stage.

Bayesian Formulation
X is the joint state vector of all objects in the scene. Y is the observation vector for the entire image, which depends on the states of all objects; an example of Y is the binary image obtained after background subtraction.

MAP estimation:
\max_X p(X \mid Y) = \max_X p(Y \mid X)\, p(X)
= \max_X \prod_t p(Y_t \mid X_t)\, \Big[ p(X_1) \prod_t p(X_t \mid X_{t-1}) \Big]
\approx \max_X \prod_t p(Y_t \mid X_t) \prod_i p(x_{i,1}) \prod_{i,t} p(x_{i,t} \mid x_{i,t-1})

[Diagram: Traditional Tracking System (Detection-Tracking) vs. Our System (Coupling); both consist of per-frame object detection and data association stages, but in our system the two stages are solved jointly within the MAP estimation rather than sequentially.]

General Form of the Objective Function
\min_{X_1, X_2} g(X_1, Y) + h(X_2) \quad \text{s.t.} \quad X_1 = q(X_2)

g(.) is the objective function of the detection problem (image likelihood term); h(.) is the objective function of the data association problem (temporal smoothness term); q(.) is the coupling constraint that enforces agreement between the solutions of the two sub-problems. The choices of g, h, q and their combination are flexible; typically, we want g and h to be relatively easy to optimize. For the traditional detection-tracking scheme with an independent-likelihood assumption (h(.) does not model the joint image likelihood), there is no need to introduce the coupling constraint, and the objective function is equivalent to that of a classic tracker such as Multiple Hypothesis Tracking or the network-flow tracker. The overall optimization can be solved through dual decomposition.

Concrete Example

Sparsity-constrained Detection
Given a dictionary D that encodes the shape and spatial information of objects in the image, instantiate binary templates at selected positions through a selector X such that the generated image DX looks similar to the observation Y.
Single template: g: \min_X \| Y - D X \|_0, \quad X \in \{0,1\}^N
Multiple templates: g: \min_X \| Y - \sum_i D_i X_i \|_0, \quad \text{s.t.} \; \sum_i X_i \le 1, \; X_i \in \{0,1\}
[Figure: object localization in 2D on the ground plane and in 3D by triangulating detections d1, d2, d3 across camera views.]

Min-cost Flow Data Association
Each edge of the network is associated with a flow variable f and a cost c that measures how likely an object is to move from one location to another. A certain amount of flow is pushed through the network at minimum cost, so that each flow path connects a set of detections across time into a unique track.
h: \min_f \sum_t \sum_i \sum_j c^{(t)}_{i,j} f^{(t)}_{i,j} \quad \text{s.t.} \; \sum_i f^{(t)}_{i,n} = \sum_j f^{(t)}_{n,j} \;\; \forall t, \forall n
[Figure: flow network over frames t-1, t, t+1 with source S and sink T; each path from source to sink passes through candidate object locations and forms one track.]

Coupling Detection and Data Association
\min_{X, f} \; \sum_t \| Y_t - D_t X_t \|_0 + \sum_t \sum_i \sum_j c^{(t)}_{i,j} f^{(t)}_{i,j}
\text{s.t.} \; X_{t,n} = \sum_j f^{(t)}_{n,j} \;\; \forall t, \forall n \quad (variable agreement: if there is a detection at a node, there is a flow going through it)
\phantom{\text{s.t.}} \; \sum_i f^{(t)}_{i,n} = \sum_j f^{(t)}_{n,j} \;\; \forall t, \forall n \quad (flow conservation: if there is a flow coming into a node, there is a flow coming out)

Dual Decomposition
Relaxing the variable-agreement constraint with Lagrange multipliers \lambda splits the problem into the two sub-problems
g(\lambda) = \min_X \sum_t \big( \| Y_t - D_t X_t \|_0 + \lambda_t^\top X_t \big)
h(\lambda) = \min_f \sum_t \sum_i \sum_j \big( c^{(t)}_{i,j} - \lambda_{t,i} \big) f^{(t)}_{i,j},
which are solved independently and coordinated by subgradient updates of \lambda; a minimal sketch of this loop is given below.
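To make the dual decomposition concrete, here is a minimal, self-contained sketch, written by us rather than taken from the authors' implementation. The toy image size, random binary dictionary, transition costs, step size, and the single-object assumption (under which the min-cost flow sub-problem reduces to a shortest path through the trellis of candidate locations) are all illustrative choices; the detection sub-problem g(λ) is solved by brute force, which is only feasible for the tiny grid used here.

```python
# Sketch of Lagrangian dual decomposition for coupled detection / data association.
# Toy setting (our assumptions): one object, T frames, N candidate locations,
# 12-pixel binary "images", random binary templates.
import itertools
import numpy as np

T, N = 4, 5                      # frames, candidate locations per frame
rng = np.random.default_rng(0)

D = rng.integers(0, 2, size=(T, 12, N))                 # per-frame template dictionary
true_loc = [1, 2, 2, 3]                                  # ground-truth location per frame
Y = np.array([D[t, :, true_loc[t]] for t in range(T)])   # noiseless observations

# Pairwise association cost: penalize large jumps between consecutive locations.
c = np.array([[abs(i - j) for j in range(N)] for i in range(N)], dtype=float)

def detect(lmbda):
    """Detection sub-problem g(lambda): per frame, minimize
    ||Y_t - D_t X_t||_0 + lambda_t . X_t by brute force over binary selectors."""
    X = np.zeros((T, N))
    for t in range(T):
        best, best_x = None, None
        for bits in itertools.product([0, 1], repeat=N):
            x = np.array(bits)
            rendered = np.clip(D[t] @ x, 0, 1)           # superposition of selected templates
            cost = np.count_nonzero(Y[t] - rendered) + lmbda[t] @ x
            if best is None or cost < best:
                best, best_x = cost, x
        X[t] = best_x
    return X

def associate(lmbda):
    """Association sub-problem h(lambda) for a single track: shortest path through
    the location trellis with transition costs c and node rewards lambda."""
    dp = -lmbda[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = dp[:, None] + c - lmbda[t][None, :]
        back[t] = np.argmin(cand, axis=0)
        dp = cand.min(axis=0)
    path = [int(np.argmin(dp))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    U = np.zeros((T, N))
    U[np.arange(T), path] = 1                            # node-usage indicator of the track
    return U, path

lmbda = np.zeros((T, N))
for it in range(50):                                     # subgradient ascent on the dual
    X = detect(lmbda)
    U, path = associate(lmbda)
    subgrad = X - U                                      # violation of X_{t,n} = sum_j f_{n,j}
    if not subgrad.any():                                # detections and track agree
        break
    lmbda += (1.0 / (it + 1)) * subgrad

print("track found:", path, "| ground truth:", true_loc)
```

On this toy example the multipliers push the per-frame detections and the track toward agreement within a few iterations. With multiple objects the same update applies, but the association sub-problem must be solved as a true min-cost flow with unit node capacities so that tracks remain disjoint.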
Experiment
Datasets: PETS2009 (single view; sequences S2L1 and S1L12) and infrared video (three camera views; sequences S1, S2, and S3), covering low, medium, and high object densities.
[Results table: our coupling approach ("Our CP") is compared with RT [1] on the three infrared sequences (19, 75, and 127 objects) and with OM [2] and ILP [3] on the two PETS sequences (23 and 36 objects). Performance is reported as the number of mostly tracked and mostly lost objects, MOTA, and MOTP; MOTP is given in cm for the 3D infrared experiments.]

Reference
[1] Z. Wu, N. I. Hristov, T. H. Kunz, and M. Betke. Tracking-reconstruction or reconstruction-tracking? Comparison of two multiple hypothesis tracking approaches to interpret 3D object motion from several camera views. In IEEE Workshop on Motion and Video Computing (WMVC), 2009.
[2] A. Andriyenko, S. Roth, and K. Schindler. An analytical formulation of global occlusion reasoning for multi-target tracking. In 11th IEEE International Workshop on Visual Surveillance, 2011.
[3] A. Andriyenko and K. Schindler. Globally optimal multi-target tracking on a hexagonal lattice. In 11th European Conference on Computer Vision, 2010.