Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

Seunghoon Hong (1), Tackgeun You (1), Suha Kwak (2), Bohyung Han (1)
(1) Dept. of Computer Science and Engineering, POSTECH
(2) INRIA - WILLOW Project team

Problem

Using a pre-trained CNN to represent a general object in visual tracking.
• Advantage: the CNN provides a strong target representation that is robust to various appearance changes.
• Limitation: the CNN feature is not appropriate for precise localization due to spatial abstraction.
• Our approach: compute a target-specific saliency map as the observation for tracking.

1. Pre-trained CNN for feature descriptor

• Base network: R-CNN [Girshick14] pre-trained with PASCAL VOC images
• Input: sub-image $z_i$ extracted from each target candidate proposal $x_i$
• Output: outputs from the first fully-connected layer, $\phi(x_i)$

2. Target-specific saliency map estimation

Class-specific saliency map [Simonyan14]
Identify the relevance of pixels w.r.t. a specific class by

$$ g_c(I) = \frac{\partial S_c(I)}{\partial I} $$

where $I$ is the input image and $S_c(I)$ is the score of class $c$.
Problem: there is no predefined class for the target in tracking.

Target-specific saliency map
Online SVM as the last fully-connected layer of the network, with weight vector $w$.

① Computing the target-specific feature

$$ \phi_k^{+}(x_i) = \begin{cases} w_k \, \phi_k(x_i), & \text{if } w_k > 0 \\ 0, & \text{otherwise} \end{cases} $$

② Computing the gradient map $g_{FG}(z_i)$ by back-propagating $\phi_k^{+}(x_i)$

$$ g_{FG}(z_i) = \frac{\partial S_{FG}(z_i)}{\partial z_i} = \left(w^{+}\right)^{T} \frac{\partial \phi(x_i)}{\partial z_i} $$

③ Aggregating sample gradient maps
Then compute the target-specific saliency map $M$ by aggregating the gradient maps of the samples.

[Figure: Examples of obtained target-specific saliency map]

3. Target localization with saliency map

Localization by sequential Bayesian filtering

$$ x_t^{*} = \arg\max_{x_t} p(x_t \mid M_{1:t}) = \arg\max_{x_t} p(M_t \mid x_t) \, p(x_t \mid M_{1:t-1}) $$

• Construct the generative model $H_t$ by accumulating the $m$ most recent tracking results on the saliency map
• Compute the likelihood by convolution: $p(M_t \mid x_t) \propto H_t \otimes M_t(x_t)$

Target segmentation
Employing GrabCut [Rother04] on the saliency map
• Given the tracking result, select FG/BG seeds based on saliency value

4. Model update

• Generative model: temporal sliding of target filters

$$ H_t = H_{t-1} - M_{t-m} + M_t $$

• Discriminative model: update the incremental SVM with new examples $\{(x_i', y_i')\}$

$$ y_i' = \begin{cases} +1, & \text{if } x_i' = x_t^{*} \\ -1, & \text{if } \dfrac{\mathrm{BB}(x_t^{*}) \cap \mathrm{BB}(x_i')}{\mathrm{BB}(x_t^{*}) \cup \mathrm{BB}(x_i')} < \delta \end{cases} $$

Quantitative results

Evaluation based on bounding box (1, 2) and segmentation (3) ground-truth.
[Figure: result panels 1, 2, 3]

Qualitative results
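
The two model-update rules of section 4 (temporal sliding of target filters and labeling of new SVM examples by bounding-box overlap) can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the box format `(x1, y1, x2, y2)`, the names `iou`, `label_examples` and `SlidingFilter`, the default `delta=0.3`, and the representation of saliency maps as equally sized 2D lists are all hypothetical choices made here. Candidates that overlap the tracked box by at least `delta` are left unlabeled, since the poster defines no label for them.

```python
from collections import deque


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_examples(best_box, candidate_boxes, delta=0.3):
    """Label the tracked box +1 and every candidate with IoU < delta as -1.

    delta=0.3 is an illustrative value, not taken from the poster.
    """
    examples = [(best_box, +1)]
    for box in candidate_boxes:
        if box != best_box and iou(best_box, box) < delta:
            examples.append((box, -1))
    return examples


class SlidingFilter:
    """Keeps H_t = H_{t-1} - M_{t-m} + M_t over the m most recent maps."""

    def __init__(self, m):
        self.m = m
        self.history = deque()  # the maps currently contributing to H
        self.H = None

    def update(self, M_t):
        if self.H is None:
            # First frame: H is just a copy of the first map.
            self.H = [row[:] for row in M_t]
        else:
            # Add the newest map element-wise.
            self.H = [[h + v for h, v in zip(h_row, m_row)]
                      for h_row, m_row in zip(self.H, M_t)]
        self.history.append(M_t)
        if len(self.history) > self.m:
            # Subtract the map that just left the window (M_{t-m}).
            old = self.history.popleft()
            self.H = [[h - v for h, v in zip(h_row, o_row)]
                      for h_row, o_row in zip(self.H, old)]
        return self.H
```

A typical use would call `SlidingFilter.update` once per frame with the saliency patch at the tracked location, then pass the output of `label_examples` to whatever incremental SVM is in use. Whether $H_t$ should additionally be normalized by $m$ is not stated on the poster, so the sketch keeps the plain running sum.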