Pascal Grand Challenge
Felix Vilensky
19/6/2011

Outline
• Pascal VOC challenge framework.
• Successful detection methods:
o Object Detection with Discriminatively Trained Part Based Models (P. F. Felzenszwalb et al.) – the "UoC/TTI" method.
o Multiple Kernels for Object Detection (A. Vedaldi et al.) – the "Oxford/MSR India" method.
• A successful classification method:
o Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.) – the "NEC/UIUC" method.
• Discussion of bias in datasets.
• Overview of the 2010 winners.

Pascal VOC Challenge Framework
Based on: "The PASCAL Visual Object Classes (VOC) Challenge", Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman.

Pascal VOC Challenge – The Tasks
• Classification task.
• Detection task.
• Pixel-level segmentation.
• "Person layout" detection.
• Action classification in still images.

Classification Task
[Example image: "at least one bus", predicted with 100% confidence.]

Detection Task
The predicted bounding box must overlap the ground-truth box by at least 50% (a code sketch of this overlap test appears at the end of this part).
[Example image: a bus detection with 100% confidence.]

Detection "Near Misses"
[Example detections that did not fulfill the bounding-box overlap criterion.]

Pascal VOC Challenge – The Object Classes
[Example images of the 20 object classes.]
Images retrieved from the Flickr website.

Pixel-Level Segmentation
[Examples: input image, object segmentation, class segmentation.]

Person Layout
[Example images for the person-layout task.]

Action Classification
• Classification among 9 action classes.
[Examples: "speaking on the phone" and "playing the guitar", each predicted with 100% confidence.]

Annotation
• Class.
• Bounding box.
• Viewpoint.
• Truncation.
• "Difficult" flag (for classification/detection).

Annotation Example
[Annotated example image.]

Evaluation
A way to compare different methods.
$\text{Recall} = \dfrac{\#\text{True Positives}}{\#\text{True Positives} + \#\text{False Negatives}}$
$\text{Precision} = \dfrac{\#\text{True Positives}}{\#\text{True Positives} + \#\text{False Positives}}$
• Precision/recall curves.
• Interpolated precision.
• AP (average precision).

Evaluation – Precision/Recall Curves (1)
• There is a practical trade-off between precision and recall.

Rank | Ground truth | Precision | Recall
1  | Yes | 1/1  | 0.2
2  | No  | 1/2  | 0.2
3  | Yes | 2/3  | 0.4
4  | No  | 2/4  | 0.4
5  | Yes | 3/5  | 0.6
6  | No  | 3/6  | 0.6
7  | No  | 3/7  | 0.6
8  | No  | 3/8  | 0.6
9  | No  | 3/9  | 0.6
10 | No  | 3/10 | 0.6

• Interpolated precision: $P_{\text{interp}}(r) = \max_{\tilde{r}:\,\tilde{r} \ge r} p(\tilde{r})$

Evaluation – Precision/Recall Curves (2)
[Example precision/recall curves.]

Evaluation – Average Precision (AP)
$AP = \frac{1}{11} \sum_{r \in \{0,\,0.1,\,\ldots,\,1\}} P_{\text{interp}}(r)$
• AP is the single number used to rank the competing methods.
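To make the 50%-overlap detection criterion concrete, here is a minimal Python sketch (not part of the original slides); the (x1, y1, x2, y2) box layout and the function names are illustrative choices.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(predicted, ground_truth):
    # VOC counts a detection as correct when the overlap is at least 50%
    return iou(predicted, ground_truth) >= 0.5
```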
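Similarly, the 11-point interpolated AP from the evaluation slides above can be computed as follows. This is a sketch of the protocol as described on the slides, with illustrative names; it reproduces the precision/recall table for the ranked list shown earlier.

```python
import numpy as np

def voc_ap_11point(is_tp, num_gt):
    """11-point interpolated AP, following the VOC protocol above.

    is_tp  : detections sorted by descending confidence; 1 if the detection
             matches an unclaimed ground-truth object, else 0.
    num_gt : total number of ground-truth objects (#TP + #FN).
    """
    is_tp = np.asarray(is_tp, dtype=float)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(1.0 - is_tp)
    recall = tp / num_gt
    precision = tp / (tp + fp)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recall >= r
        # interpolated precision: best precision at any recall >= r
        p_interp = precision[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap

# The ranked list from the table above (3 true positives, 5 ground truths):
print(voc_ap_11point([1, 0, 1, 0, 1, 0, 0, 0, 0, 0], num_gt=5))
```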
Successful Detection Methods

UoC/TTI Method Overview (P. Felzenszwalb et al.)
• Joint winner of the 2009 Pascal VOC detection challenge, together with the Oxford method.
• Received a "lifetime achievement" award in 2010.
• A mixture of deformable part models.
• Each component has a global template plus deformable parts:
o HOG feature templates.
• Fully trained from bounding boxes alone.

UoC/TTI Method – HOG Features (1)
• Gradients are computed with the filter [-1 0 1] and its transpose.
• The gradient orientation $\theta(x,y)$ is discretized into one of $p$ values (a binning sketch in code appears at the end of this section):
o Contrast sensitive: $B_1(x,y) = \operatorname{round}\!\left(\frac{p\,\theta(x,y)}{2\pi}\right) \bmod p$
o Contrast insensitive: $B_2(x,y) = \operatorname{round}\!\left(\frac{p\,\theta(x,y)}{\pi}\right) \bmod p$
• Pixel-level features: $F(x,y)_b = r(x,y)$ if $b = B(x,y)$, and $0$ otherwise, where $r(x,y)$ is the gradient magnitude.
• Pixels are aggregated into cells of size $k$ (here 8-pixel cells, $k = 8$), with soft binning.
• 18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total.

UoC/TTI Method – HOG Features (2)
[Illustration of the 27 orientation bins.]

UoC/TTI Method – HOG Features (3)
• Normalization, using the four neighbourhood normalization factors
$N_{\delta,\gamma}(i,j) = \left( \lVert C(i,j)\rVert^2 + \lVert C(i+\delta,j)\rVert^2 + \lVert C(i,j+\gamma)\rVert^2 + \lVert C(i+\delta,j+\gamma)\rVert^2 \right)^{1/2}, \quad \delta,\gamma \in \{-1,1\}$
• Truncation (normalized values are clipped).
• 27 bins x 4 normalization factors = a 4x27 matrix per cell.
• Dimensionality reduction to 31 features (see the projection sketch at the end of this section):
o $V_1,\ldots,V_4$: sums over the 27 bins, one per normalization factor $N_1,\ldots,N_4$.
o $V_5,\ldots,V_{31}$: sums over the 4 normalization factors, one per bin $B_1,\ldots,B_{27}$.

UoC/TTI Method – Deformable Part Models
• A coarse root filter.
• High-resolution deformable parts.
• Part = (anchor position, deformation cost, resolution level).

UoC/TTI Method – Mixture Models
• They capture the diversity of a rich object category.
• Different views of the same object.
• A mixture of deformable part models is trained for each class.
• Each deformable part model in the mixture is called a component.

UoC/TTI Method – Object Hypothesis
[Slide taken from the method's presentation.]

UoC/TTI Method – Models (1)
[A 6-component person model.]

UoC/TTI Method – Models (2)
[A 6-component bicycle model.]

UoC/TTI Method – Score of a Hypothesis
[Slide taken from the method's presentation.]

UoC/TTI Method – Matching (1)
• The score of a root location is the best score over all part placements (a brute-force sketch appears at the end of this section):
$\text{score}(p_0) = \max_{p_1,\ldots,p_n} \text{score}(p_0, \ldots, p_n)$
• A "sliding window" approach.
• High-scoring root locations define detections.
• Matching is done for each component separately.

UoC/TTI Method – Matching (2)
[Illustration of the matching pipeline.]

UoC/TTI Method – Post-Processing & Context Rescoring
[Slide taken from the method's presentation.]

UoC/TTI Method – Training & Data Mining
• The training data is only weakly labeled (bounding boxes, no part annotations).
• Latent SVM (LSVM) training, with $z = (c, p_0, \ldots, p_n)$ (component choice and part placements) as the latent value.
• Training and data mining alternate over 4 stages (a simplified mining loop appears at the end of this section):
o Optimize the latent values $z$ (keeping $\beta$ fixed).
o Optimize the model parameters $\beta$ (keeping $z$ fixed).
o Add hard negative examples.
o Remove easy negative examples.

UoC/TTI Method – Results (1)
[Detection results.]

UoC/TTI Method – Results (2)
[Detection results.]
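The orientation binning on the HOG slides above can be sketched directly from the formulas for $B_1$ and $B_2$. A minimal numpy sketch under the slide's definitions; it is not the released implementation (which also spreads votes softly across neighbouring bins and cells).

```python
import numpy as np

def orientation_bins(dx, dy, p_sensitive=18, p_insensitive=9):
    """Per-pixel HOG orientation binning, following B1 and B2 above.

    dx, dy : gradient images from the [-1 0 1] filter and its transpose.
    Returns the contrast-sensitive bins (0..17), the contrast-insensitive
    bins (0..8) and the gradient magnitude r(x, y) used as the vote weight.
    """
    theta = np.arctan2(dy, dx)  # gradient orientation in (-pi, pi]
    b1 = np.round(p_sensitive * theta / (2 * np.pi)).astype(int) % p_sensitive
    b2 = np.round(p_insensitive * theta / np.pi).astype(int) % p_insensitive
    r = np.hypot(dx, dy)
    return b1, b2, r
```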
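The 4x27-to-31 reduction on the HOG Features (3) slide amounts to summing the normalized matrix along each axis. A sketch following the slide's description (the released code also applies fixed scaling constants, which are omitted here):

```python
import numpy as np

def project_to_31(h):
    """Reduce the 4x27 matrix of normalized cell histograms to 31 features.

    h : array of shape (4, 27); rows = the four normalization factors
        N1..N4, columns = the 27 orientation bins B1..B27.
    Returns [V1..V4, V5..V31] as described on the slide.
    """
    v_bins = h.sum(axis=1)  # V1..V4: sum over the 27 bins per normalization
    v_nfs = h.sum(axis=0)   # V5..V31: sum over the 4 normalizations per bin
    return np.concatenate([v_bins, v_nfs])
```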
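The matching score from the Matching (1) slide can be made concrete with a brute-force sketch. Two simplifications are mine: the real method computes the inner maximization in linear time with a generalized distance transform and evaluates parts at twice the root resolution, and its deformation penalty has one coefficient per term rather than the shared (a, b) pair used below.

```python
import numpy as np

def match_component(root_score, part_scores, anchors, deform_costs):
    """Brute-force score(p0) = max over part placements p1..pn.

    root_score  : (H, W) root-filter response map.
    part_scores : list of (H, W) part-filter response maps (same grid here
                  for simplicity).
    anchors     : list of (dy, dx) anchor offsets relative to the root.
    deform_costs: list of (a, b) quadratic/linear deformation weights.
    """
    H, W = root_score.shape
    ys, xs = np.mgrid[0:H, 0:W]
    total = root_score.astype(float).copy()
    for score, (ay, ax), (a, b) in zip(part_scores, anchors, deform_costs):
        best = np.full((H, W), -np.inf)
        # Max over all part placements; the real method computes this in
        # linear time with a generalized distance transform.
        for py in range(H):
            for px in range(W):
                dy = ys + ay - py
                dx = xs + ax - px
                penalty = a * (dx**2 + dy**2) + b * (np.abs(dx) + np.abs(dy))
                best = np.maximum(best, score[py, px] - penalty)
        total += best
    return total
```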
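The alternating loop on the Training & Data Mining slide can also be sketched. The latent step (choosing $z$ for the positives) is omitted; what remains is the grow/shrink negative-cache loop around a linear SVM, written as plain hinge-loss SGD so the sketch stays self-contained. All names and constants are illustrative, not the released training code.

```python
import numpy as np

def train_with_mining(pos, neg_pool, rounds=4, epochs=50, lr=1e-3, C=1.0):
    """Grow the negative cache with hard examples, refit, shrink it again.

    pos, neg_pool : arrays of feature vectors, shape (n, dim).
    """
    w = np.zeros(pos.shape[1])
    cache = neg_pool[:100].copy()                 # initial negative cache
    for _ in range(rounds):
        hard = neg_pool[neg_pool @ w > -1.0]      # add hard negatives
        cache = np.unique(np.vstack([cache, hard]), axis=0)
        X = np.vstack([pos, cache])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(cache))])
        for _ in range(epochs):                   # optimize beta (here: w)
            margins = y * (X @ w)
            viol = margins < 1                    # hinge-loss violations
            grad = w - C * (y[viol, None] * X[viol]).sum(axis=0)
            w -= lr * grad
        cache = cache[cache @ w >= -1.0]          # remove easy negatives
    return w
```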
Oxford Method Overview (A. Vedaldi et al.)
Pipeline: regions with different scales and aspect ratios → 6 feature channels → a 3-level spatial pyramid → a cascade of 3 SVM classifiers with 3 different kernels → post-processing.

Oxford Method – Feature Channels
• Bag of visual words: SIFT descriptors are extracted and quantized into a vocabulary of 64 words.
• Dense words (PhowGray, PhowColor): another set of SIFT descriptors, quantized into 300 visual words.
• Histograms of oriented edges (Phog180, Phog360): similar to the HOG descriptor used by the UoC/TTI method, with 8 orientation bins.
• Self-similarity features (SSIM).

Oxford Method – Spatial Pyramids
[Illustration of the spatial pyramid levels.]

Oxford Method – Feature Vector
[Chart taken from the method's presentation.]

Oxford Method – Discriminant Function (1)
$C(h^R) = \sum_{i=1}^{M} y_i \alpha_i K(h^R, h_i)$
• $h_i$, $i = 1, \ldots, M$, are the histogram collections acting as support vectors for an SVM, with labels $y_i \in \{-1, 1\}$.
• $K$ is a positive definite kernel.
• $h^R$ is the collection of normalized feature histograms $\{h^R_{fl}\}$, where $f$ is the feature channel and $l$ is the level of the spatial pyramid.

Oxford Method – Discriminant Function (2)
• The kernel of the discriminant function $C(h^R)$ is a linear combination of histogram kernels:
$K(h^R, h_i) = \sum_{fl} d_{fl}\, K(h^R_{fl}, h^i_{fl})$
• The parameters $\alpha_i$ and the weights $d_{fl} \ge 0$ (18 in total) are learned using MKL (Multiple Kernel Learning); a combined-kernel sketch in code appears further below.
• The discriminant function $C(h^R)$ is used to rank candidate regions $R$ by the likelihood of containing an instance of the object of interest.

Oxford Method – Cascade Solution (1)
• An exhaustive search over the best candidate regions $R$ requires $O(MBN)$ operations:
o $N$ – the number of regions.
o $M$ – the number of support vectors in $C(h^R)$.
o $B$ – the dimensionality of the histograms.
o Typically $N \approx 10^5$, $B \approx 10^4$, $M \approx 10^3$.
• To reduce this complexity, a cascade is applied:
o The first stage uses a "cheap" linear kernel to evaluate $C(h^R)$.
o The second uses a more expensive and more powerful quasi-linear kernel.
o The third uses the most powerful, fully non-linear kernel.
• Each stage evaluates the discriminant function on a smaller number of candidate regions.

Oxford Method – Cascade Solution (2)
Kernel type  | Evaluation complexity
Linear       | O(N)
Quasi-linear | O(BN)
Non-linear   | O(MBN)
Stage 1: linear. Stage 2: quasi-linear. Stage 3: non-linear.

Oxford Method – Cascade Solution (3)
[Chart taken from the method's presentation.]

Oxford Method – The Kernels
• All the aforementioned kernels have the form
$K(h, h') = f\!\left( \sum_{b=1}^{B} g(h_b, h'_b) \right)$, with $f : \mathbb{R} \to \mathbb{R}$, $g : \mathbb{R}^2 \to \mathbb{R}$, and $b$ a histogram bin index.
• For linear kernels both $f$ and $g$ are linear; for quasi-linear kernels only $f$ is linear. (Concrete examples in code appear further below.)

Oxford Method – Post-Processing
• The output of the last stage is a ranked list of 100 candidate regions per image.
• Many of these regions correspond to multiple detections of the same object.
• Non-maxima suppression is therefore used (a sketch appears further below).
• At most 10 regions per image remain.

Oxford Method – Training/Retraining
• Jittered/flipped instances are used as additional positive samples.
• Training images are partitioned into two subsets.
• The classifiers are tested on each subset in turn, adding new hard negative samples (error cases with overlap < 20%) for retraining.
[Flowchart: training data → training → classifier → testing → addition of error cases.]

Oxford Method – Results (1), (2), (3)
[Result tables and curves for four settings: training and testing on VOC2007; training and testing on VOC2008; training and testing on VOC2009; training on VOC2008 and testing on VOC2007.]

Oxford Method – Summary
[Summary slide.]

A Successful Classification Method

NEC/UIUC Method Overview (Xi Zhou, Kai Yu et al.)
• A winner of the 2009 Pascal VOC classification challenge.
• A general framework for classification is proposed:
o Descriptor coding: super-vector coding (the important part!).
o Spatial pyramid pooling.
o Classification: linear SVM.

NEC/UIUC Method – Notation
• $X$ – a descriptor vector.
• $\phi(X)$ – the coding function.
• $f(X)$ – an unknown function on local features.
• $\hat{f}(X)$ – the approximating function.
• $Y$ – a set of descriptor vectors.

NEC/UIUC Method – Descriptor Coding (1)
Vector quantization coding:
$\hat{f}(X) = W^T \phi(X)$, with $W = [W_1, W_2, \ldots, W_K]^T$.
$\phi(X)$ is the code of $X$.

NEC/UIUC Method – Descriptor Coding (2)
Super-vector coding (a coding/pooling sketch in code appears after the results):
$W = [W_1^T, W_2^T, \ldots, W_K^T]^T$
$\phi(X) = [C_1(X) X^T, C_2(X) X^T, \ldots, C_K(X) X^T]^T$
$C_k(X) = 1$ if $X$ belongs to cluster $k$, otherwise $C_k(X) = 0$.
$\hat{f}(X) = W^T \phi(X) = \sum_k C_k(X)\, W_k^T X$

NEC/UIUC Method – Spatial Pooling
• Pooling over a spatial pyramid with 1x1, 2x2 and 3x1 grids.
• For each (sub-)region with descriptor set $Y$ of size $N$:
$\Psi(Y) = \frac{1}{N} \sum_{k=1}^{|C|} \frac{1}{\sqrt{p_k}} \sum_{X \in Y_k} \phi(X)$
where $Y_k$ is the subset of $Y$ assigned to cluster $k$ and $p_k$ its relative frequency.
• The region vectors are concatenated:
$\Psi_s(Y) = [\Psi(Y^1_{11}), \Psi(Y^2_{11}), \Psi(Y^2_{12}), \Psi(Y^2_{21}), \Psi(Y^2_{22}), \Psi(Y^3_{11}), \Psi(Y^3_{12}), \Psi(Y^3_{13})]$
• $\Psi_s(Y)$ is fed to a linear SVM classifier.

NEC/UIUC Method – Results (1)
• SIFT: 128-dimensional descriptors over a grid with a spacing of 4 pixels, on three patch levels (16x16, 25x25 and 31x31).
• PCA: reduction of the dimensionality to 80.
• Experiments include: a comparison of non-linear coding methods, a comparison with other methods, the impact of the codebook size (tested on the validation set), and images with a visualization of the patch-level score (using $g(X)$).

NEC/UIUC Method – Results (2)
[Results table, |C| = 512.]

NEC/UIUC Method – Results (3)
[Results table, |C| = 2048.]

NEC/UIUC Method – Results (4)
[Example images with patch-level score visualizations.]
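Returning to the Oxford discriminant function above: the MKL combination $K = \sum_{fl} d_{fl} K_{fl}$ and the scoring $C(h^R) = \sum_i y_i \alpha_i K(h^R, h_i)$ can be sketched as follows. The per-channel chi-square kernel is an illustrative choice, and all names are mine.

```python
import numpy as np

def combined_kernel(h_a, h_b, weights):
    """K(h, h') = sum_fl d_fl * K_fl(h_fl, h'_fl) over (channel, level) pairs.

    h_a, h_b : dicts mapping (feature, level) -> normalized histogram.
    weights  : the learned d_fl >= 0 (18 of them in the method).
    """
    def chi2(h, g, eps=1e-10):
        return 1.0 - 0.5 * np.sum((h - g) ** 2 / (h + g + eps))
    return sum(w * chi2(h_a[fl], h_b[fl]) for fl, w in weights.items())

def discriminant(candidate, support_vectors, alphas, labels, weights):
    """C(h^R) = sum_i y_i * alpha_i * K(h^R, h_i)."""
    return sum(y * a * combined_kernel(candidate, sv, weights)
               for sv, a, y in zip(support_vectors, alphas, labels))
```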
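The kernel family on "The Kernels" slide, $K(h, h') = f(\sum_b g(h_b, h'_b))$, splits into the three cascade stages by the choice of $f$ and $g$. A sketch with one standard example per stage; the slides do not spell out the exact kernels used, so these are assumptions.

```python
import numpy as np

def make_kernel(f, g):
    """Kernels of the form K(h, h') = f(sum_b g(h_b, h'_b))."""
    return lambda h, hp: f(np.sum(g(h, hp)))

# Stage 1 - linear: f and g both linear (the ordinary dot product).
linear = make_kernel(lambda s: s, lambda h, hp: h * hp)

# Stage 2 - quasi-linear: f linear, g non-linear,
# e.g. the additive chi-square kernel.
chi2 = make_kernel(lambda s: s, lambda h, hp: 2 * h * hp / (h + hp + 1e-10))

# Stage 3 - non-linear: f non-linear too,
# e.g. the exponentiated chi-square distance.
chi2_rbf = make_kernel(lambda s: np.exp(-s),
                       lambda h, hp: (h - hp) ** 2 / (h + hp + 1e-10))
```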
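The post-processing step of the Oxford method (non-maxima suppression, keeping at most 10 regions per image) is the standard greedy procedure; a numpy sketch with illustrative names:

```python
import numpy as np

def non_maxima_suppression(boxes, scores, overlap=0.5, max_keep=10):
    """Greedy NMS over ranked candidate regions.

    boxes  : (n, 4) array of (x1, y1, x2, y2) boxes.
    scores : (n,) array of classifier scores.
    Returns the indices of the kept regions, at most max_keep of them.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size and len(keep) < max_keep:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Overlap of the best remaining box with all the others.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < overlap]   # drop near-duplicate detections
    return keep
```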
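And for the NEC/UIUC slides just above: hard-assignment super-vector coding and the $1/\sqrt{p_k}$-weighted pooling can be sketched as below. This uses the slides' simplified form $\phi(X) = [C_k(X)X^T]_k$; the full method codes descriptors relative to the cluster means and appends a per-block scalar, which is omitted here.

```python
import numpy as np

def super_vector_code(x, centroids):
    """phi(X) = [C_1(X) X^T, ..., C_K(X) X^T]^T with hard assignment:
    only the block of the nearest centroid is non-zero."""
    K, D = centroids.shape
    k = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    phi = np.zeros(K * D)
    phi[k * D:(k + 1) * D] = x
    return k, phi

def pooled_super_vector(Y, centroids):
    """Psi(Y) = (1/N) sum_k (1/sqrt(p_k)) sum_{X in Y_k} phi(X), where p_k
    is the fraction of descriptors in Y assigned to cluster k."""
    K, D = centroids.shape
    N = len(Y)
    acc = np.zeros(K * D)
    counts = np.zeros(K)
    for x in Y:
        k, phi = super_vector_code(x, centroids)
        acc += phi
        counts[k] += 1
    for k in range(K):
        if counts[k]:
            acc[k * D:(k + 1) * D] /= np.sqrt(counts[k] / N)
    return acc / N
```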
Bias in Datasets
Based on: "Unbiased Look at Dataset Bias", Antonio Torralba (Massachusetts Institute of Technology) and Alexei A. Efros (Carnegie Mellon University).

Name The Dataset
• People were asked to guess, from three images, which dataset the images were taken from.
• People who work in the field got more than 75% correct.

Name The Dataset – The Dataset Classifier
• 4 classifiers were trained to play the "Name The Dataset" game.
• Each classifier used a different image descriptor:
o 32x32 thumbnails, grayscale and color.
o GIST.
o Bag of HOG visual words.
• 1000 images were randomly sampled from the training portions of 12 datasets.
• Each classifier was tested on 300 random images from each of the test sets, repeated 20 times.
• The best classifier performs at 39% (chance is about 8%)!
[Confusion table; plot of recognition performance vs. number of training examples per class.]
• Performance is 61% on car images from 5 different datasets (chance is 20%).
[Car images from the different datasets.]

Cross-Dataset Generalization (1)
• Training on one dataset while testing on another.
• The Dalal & Triggs detector (HOG + linear SVM) is used for the detection task.
• A bag-of-words approach with a Gaussian-kernel SVM is used for the classification task.
• The "car" and "person" classes are used.
• Each classifier (per dataset) was trained with 500 positive and 2000 negative images.
• Each detector (per dataset) was trained with 100 positive and 1000 negative images.
• Classification is tested with 50 positive and 1000 negative examples.
• Detection is tested with 10 positive and 20,000 negative examples.
• Each classifier/detector was run 20 times and the results were averaged.

Cross-Dataset Generalization (2)
[Results table.]

Cross-Dataset Generalization (3)
• Performance depends logarithmically on the amount of training data.
[Plots of performance vs. number of training samples.]

Types of Dataset Bias
• Selection bias.
• Capture bias.
• Label bias.
• Negative set bias: what the dataset considers to be "the rest of the world".

Negative Set Bias – Experiment (1)
• Evaluation of the relative bias in the negative sets of different datasets.
• Detectors are trained on the positives and negatives of a single dataset.
• They are tested on positives from the same dataset and on negatives from all 6 datasets combined.
• Each detector was trained with 100 positives and 1000 negatives.
• For testing, multiple runs of 10 positive examples against 20,000 negatives were performed.

Negative Set Bias – Experiment (2)
[Results table.]

Negative Set Bias – Experiment (3)
• A large negative training set is important for discriminating objects with similar contexts in images.

Dataset's Market Value (1)
• A measure of the improvement in performance when adding training data from another dataset.
• $\alpha$ is the shift in the number of training samples between datasets needed to reach the same average precision:
$AP_j^j(n) = AP_i^j(n/\alpha)$
where $AP_i^j(n)$ is the AP obtained when training on dataset $i$ and testing on dataset $j$ with $n$ samples. (An estimation sketch appears at the end of this part.)

Dataset's Market Value (2)
[Table of the sample "market value" for a "car" sample across datasets.]
A sample from the original dataset is worth more than a sample from another dataset!

Bias in Datasets – Summary
• Datasets, though gathered from the internet, have distinguishable characteristics of their own.
• Methods performing well on one dataset can perform much worse on another.
• The negative set is at least as important as the positive samples in a dataset.
• Every dataset has its own "market value".
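The "market value" $\alpha$ above can be estimated from two AP-versus-n curves, using the logarithmic dependency observed in the cross-dataset experiments. A sketch under that assumption; the paper's exact fitting procedure may differ, and the sign convention here follows the definition $AP_j^j(n) = AP_i^j(n/\alpha)$.

```python
import numpy as np

def market_value(n, ap_self, ap_other):
    """Estimate alpha from two AP-vs-n curves, assuming AP ~ a*log(n) + b.

    n        : array of training-set sizes.
    ap_self  : AP when training and testing on dataset j.
    ap_other : AP when training on dataset i, testing on dataset j.
    """
    a1, b1 = np.polyfit(np.log(n), ap_self, 1)
    a2, b2 = np.polyfit(np.log(n), ap_other, 1)
    # With a shared slope a, the two log-curves are shifted horizontally
    # by log(alpha); equating a*log(n) + b1 = a*log(n/alpha) + b2 gives:
    a = (a1 + a2) / 2.0
    alpha = np.exp((b2 - b1) / a)
    # One foreign sample is then worth roughly alpha native samples
    # (alpha < 1 when foreign data is less useful).
    return alpha
```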
2010 Winners Overview

Pascal VOC 2010 – Winners

Classification winner: NUSPSL_KERNELREGFUSING
Qiang Chen(1), Zheng Song(1), Si Liu(1), Xiangyu Chen(1), Xiaotong Yuan(1), Tat-Seng Chua(1), Shuicheng Yan(1), Yang Hua(2), Zhongyang Huang(2), Shengmei Shen(2)
(1) National University of Singapore; (2) Panasonic Singapore Laboratories

Detection winner: NLPR_HOGLBP_MC_LCEGCHLC
Yinan Yu, Junge Zhang, Yongzhen Huang, Shuai Zheng, Weiqiang Ren, Chong Wang, Kaiqi Huang, Tieniu Tan
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Honourable mentions:
• MITUCLA_HIERARCHY – Long Zhu, Yuanhao Chen, William Freeman, Alan Yuille, Antonio Torralba (MIT, UCLA)
• NUS_HOGLBP_CTX_CLS_RESCORE_V2 – Zheng Song, Qiang Chen, Shuicheng Yan (National University of Singapore)
• UVA_GROUPLOC / UVA_DETMONKEY – Jasper Uijlings, Koen van de Sande, Theo Gevers, Arnold Smeulders, Remko Scha (University of Amsterdam)

NUS-PSL Classification Method
[Overview slide.]

NLPR Detection Method
[Overview slide.]

Thank You…