Data Driven Attributes for Action Detection
Week 2
Presented by Christina Peterson

Background

- Liu et al. [1] propose a unified framework for action recognition in which manually specified attributes are:
  - Selected discriminatively to account for intra-class variability
  - Integrated with data-driven attributes to make the attribute set more descriptive
- Yu et al. [2] propose a framework for attribute-based queries that uses a large pool of weak attributes composed of automatic classifier scores acquired with no human labor.
  - Query attributes are acquired through a human labeling process
  - Weak attributes are generated automatically by machine
  - Query attributes are mapped to weak attributes
- Malisiewicz et al. [3] propose an object detection method that combines a discriminative object detector with a nearest-neighbor approach.
  - A separate linear SVM classifier is trained for each exemplar in the dataset
  - Each exemplar is represented by a rigid HOG template
  - The result is a large collection of simple, individual Exemplar-SVM detectors rather than a single complex category detector
- Farhadi et al. [4] propose an attribute-based approach to object detection.
  - Semantic and discriminative attributes
  - A feature selection method for learning attributes that generalize across categories
  - Base feature definition
- Tian et al. [5] propose a spatiotemporal deformable part model (SDPM) that stays true to the structure of the original deformable part model (DPM).
  - SDPM has volumetric parts that displace in both time and space
  - A root filter captures the overall information of the action cycle and is obtained by applying an SVM to the HOG3D features of the action cycle

Low Level Features

- STIP (space-time interest point) descriptors
  - Histogram of Oriented Gradients (HOG): 72-element descriptor
  - Histogram of Optical Flow (HOF): 90-element descriptor
- Color
- Texture

Bag of Words

- Concatenate the low level features for each video
- Cluster each feature type separately with k-means: 128 centers for color, 256 for texture, and 1000 for STIP
- Divide the bounding box into 3 x 3 x 3 + 1 = 28 cells and collect the features falling in each cell
- Create a histogram of cluster-center assignments per feature type for each cell in the bounding box: (128 + 256 + 1000) x 28 dimensions
- Normalize by the size of the bounding box (a MATLAB sketch of this step follows the Goals section)

Exemplar SVM

- Train a separate linear SVM classifier for each exemplar in the dataset, using a single positive example and many negative examples
- The result is a large collection of simple, individual Exemplar-SVM detectors rather than a single complex category detector
- Example: the action Diving-side will have multiple linear SVM classifiers, each based on one positive example from this action class
- At test time, all Exemplar-SVM detectors for the respective action class must be run to compute label prediction accuracy (a training sketch also follows the Goals section)

Goals

- Implement the Exemplar-SVM classifiers in MATLAB
- Label propagation
  - Find the relationship between labels and prediction results
  - Conditional probability (see the sketch below)
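The bag-of-words construction above can be prototyped in MATLAB roughly as follows. This is a minimal sketch for a single feature type (the 1000-center STIP codebook); the variable names trainDesc, videoDesc, cellIdx, and bboxVolume are assumptions rather than names from the slides, and kmeans/knnsearch come from the Statistics and Machine Learning Toolbox.

```matlab
% Minimal bag-of-words sketch for one feature type (STIP HOG/HOF, 1000 centers).
% trainDesc (N x D), videoDesc (M x D), cellIdx (M x 1), and bboxVolume are
% assumed inputs, not names taken from the slides.
K = 1000;
[~, centers] = kmeans(trainDesc, K, 'MaxIter', 200, 'Replicates', 3);

% Quantize one video's descriptors against the learned codebook.
nearest = knnsearch(centers, videoDesc);    % index of the nearest center per descriptor

% One K-bin histogram per spatiotemporal cell (3 x 3 x 3 grid + 1 global = 28).
numCells = 28;
cellHist = zeros(numCells, K);
for c = 1:numCells
    inCell = (cellIdx == c);
    cellHist(c, :) = histcounts(nearest(inCell), 0.5:1:(K + 0.5));
end

% Concatenate all cells and normalize by the size of the bounding box.
bowSTIP = cellHist(:)' / bboxVolume;
```

The color (128 centers) and texture (256 centers) codebooks would be handled the same way, and the three per-cell histograms concatenated to obtain the (128 + 256 + 1000) x 28 dimensional video representation.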
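For the Exemplar-SVM classifiers, one way to prototype the training and testing loops in MATLAB is sketched below using fitcsvm with observation weights. Malisiewicz et al. [3] train their exemplar SVMs with separate regularization costs for the positive and the negatives plus hard-negative mining, so this is only an approximation; posFeats, negFeats, and testFeat are assumed variable names.

```matlab
% Sketch: one linear SVM per positive exemplar of an action class (e.g. Diving-side).
% posFeats (P x D) holds one bag-of-words vector per exemplar video,
% negFeats (N x D) holds vectors from other classes; both are assumed inputs.
numExemplars = size(posFeats, 1);
numNeg = size(negFeats, 1);
models = cell(numExemplars, 1);
for e = 1:numExemplars
    X = [posFeats(e, :); negFeats];              % a single positive, many negatives
    y = [1; -ones(numNeg, 1)];
    w = [numNeg; ones(numNeg, 1)];               % up-weight the lone positive
    models{e} = fitcsvm(X, y, 'KernelFunction', 'linear', 'Weights', w);
end

% At test time, run every exemplar detector and keep the best score for the class.
scores = zeros(numExemplars, 1);
for e = 1:numExemplars
    [~, s] = predict(models{e}, testFeat);       % testFeat: 1 x D
    scores(e) = s(2);                            % column 2 = score for class +1
end
classScore = max(scores);
```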
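For the label propagation goal, one simple starting point is to estimate the conditional probability of each true label given a predicted label from validation-set predictions. This is only one interpretation of that goal; predLabels, trueLabels, and numClasses are assumed variables.

```matlab
% Sketch: estimate P(true label | predicted label) from validation-set predictions.
% predLabels and trueLabels are assumed to be vectors of class indices 1..numClasses.
counts = accumarray([predLabels(:), trueLabels(:)], 1, [numClasses, numClasses]);
rowSums = max(sum(counts, 2), 1);          % avoid division by zero for unused rows
condProb = counts ./ rowSums;              % condProb(p, t) ~ P(true = t | predicted = p)
```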
References

[1] J. Liu, B. Kuipers, and S. Savarese. Recognizing Human Actions by Attributes. In CVPR, 2011.
[2] F. Yu, R. Ji, M.-H. Tsai, G. Ye, and S.-F. Chang. Weak Attributes for Large-Scale Image Retrieval. In CVPR, 2012.
[3] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011.
[4] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing Objects by Their Attributes. In CVPR, 2009.
[5] Y. Tian, R. Sukthankar, and M. Shah. Spatiotemporal Deformable Part Models for Action Detection. In CVPR, 2013.
[6] Y. Wang and G. Mori. Hidden Part Models for Human Action Recognition: Probabilistic vs. Max-Margin. In PAMI, 2011.