Kapitel 14 Recognition Scene understanding / visual object categorization Pose clustering Object recognition by local features Image categorization Bag-of-features models Large-scale image search Kapitel 14 “Recognition” – p. 1 Scene understanding (1) Scene categorization • outdoor/indoor • city/forest/factory/etc. Kapitel 14 “Recognition” – p. 2 Scene understanding (2) Object detection • find pedestrians Kapitel 14 “Recognition” – p. 3 Scene understanding (3) sky mountain building tree building banner street lamp people market Kapitel 14 “Recognition” – p. 4 Visual object categorization (1) Kapitel 14 “Recognition” – p. 5 Visual object categorization (2) Kapitel 14 “Recognition” – p. 6 Visual object categorization (3) Kapitel 14 “Recognition” – p. 7 Visual object categorization (4) Recognition is all about modeling variability camera position illumination shape variations within-class variations Kapitel 14 “Recognition” – p. 8 Visual object categorization (5) Within-class variations (why are they chairs?) Kapitel 14 “Recognition” – p. 9 Pose clustering (1) Working in transformation space - main ideas: Generate many hypotheses of transformation image vs. model, each built by a tuple of image and model features Correct transformation hypotheses appear many times Main steps: Quantize the space of possible transformations For each tuple of image and model features, solve for the optimal transformation that aligns the matched features Record a “vote” in the corresponding transformation space bin Find "peak" in transformation space Kapitel 14 “Recognition” – p. 10 Pose clustering (2) Example: Rotation only. A pair of one scene segment and one model segment suffices to generate a transformation hypothesis. Kapitel 14 “Recognition” – p. 11 Pose clustering (3) C.F. Olson: Efficient pose clustering using a randomized algorithm. IJCV, 23(2): 131-147, 1997. 2-tuples of image and model corner points are used to generate hypotheses. (a) The corners found in an image. (b) The four best hypotheses found with the edges drawn in. The nose of the plane and the head of the person do not appear because they were not in the models. Kapitel 14 “Recognition” – p. 12 Object recognition by local features (1) D.G. Lowe: Distinctive image features from scale-invariant keypoints. IJCV, 60(2): 91-110, 2004 Input Image Stored Image The SIFT features of training images are extracted and stored For a query image Extract SIFT features Efficient nearest neighbor indexing Pose clustering by Hough transform For clusters with >2 keypoints (object hypotheses): determine the optimal affine transformation parameters by least squares method; geometry verification Kapitel 14 “Recognition” – p. 13 Object recognition by local features (2) Cluster of 3 corresponding feature pairs Kapitel 14 “Recognition” – p. 14 Object recognition by local features (3) Kapitel 14 “Recognition” – p. 15 Object recognition by local features (4) Kapitel 14 “Recognition” – p. 16 Image categorization (1) Training Labels Training Images Training Image Features Classifier Training Trained Classifier Testing Prediction Image Features Trained Classifier Outdoor Test Image Kapitel 14 “Recognition” – p. 17 Image categorization (2) Global histogram (distribution): color, texture, motion, … histogram matching distance Kapitel 14 “Recognition” – p. 18 Image categorization (3) Cars found by color histogram matching See Chapter “Inhaltsbasierte Suche in Bilddatenbanken Kapitel 14 “Recognition” – p. 19 Bag-of-features models (1) Object Bag of ‘words’ Kapitel 14 “Recognition” – p. 20 Bag-of-features models (2) Origin 1: Text recognition by orderless document representation (frequencies of words from a dictionary), Salton & McGill (1983) US Presidential Speeches Tag Cloud Kapitel 14 “Recognition” – p. 21 Bag-of-features models (3) Origin 2: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Kapitel 14 “Recognition” – p. 22 Bag-of-features models (4) histogram Universal texton dictionary Kapitel 14 “Recognition” – p. 23 Bag-of-features models (5) We need to build a “visual” dictionary! G. Cruska et al.: Visual categorization with bags of keypoints. Proc. of ECCV, 2004. Kapitel 14 “Recognition” – p. 24 Bag-of-features models (6) Main step 1: Extract features (e.g. SIFT) … Kapitel 14 “Recognition” – p. 25 Bag-of-features models (7) Main step 2: Learn visual vocabulary (e.g. using clustering) clustering Kapitel 14 “Recognition” – p. 26 Bag-of-features models (8) visual vocabulary (cluster centers) Kapitel 14 “Recognition” – p. 27 Bag-of-features models (9) Example: learned codebook … Appearance codebook Source: B. Leibe Kapitel 14 “Recognition” – p. 28 Bag-of-features models (10) Main step 3: Quantize features using “visual vocabulary” Kapitel 14 “Recognition” – p. 29 Bag-of-features models (11) Main step 4: Represent images by frequencies of “visual words” (i.e., bags of features) Kapitel 14 “Recognition” – p. 30 Bag-of-features models (12) Example: representation based on learned codebook Kapitel 14 “Recognition” – p. 31 Bag-of-features models (13) Main step 5: Apply any classifier to the histogram feature vectors Kapitel 14 “Recognition” – p. 32 Bag-of-features models (14) Example: Kapitel 14 “Recognition” – p. 33 Bag-of-features models (15) Caltech6 dataset Dictionary quality and size are very important parameters! Kapitel 14 “Recognition” – p. 34 Bag-of-features models (16) Caltech6 dataset Action recognition J.C. Niebles, H. Wang, L. Fei-Fei: Unsupervised learning of human action categories using spatial-tempotal words. IJCV, 79(3): 299-318, 2008. Kapitel 14 “Recognition” – p. 35 Large-scale image search (1) J. Philbin, et al.: Object retrieval with large vocabularies and fast spatial matching. Proc. of CVPR, 2007 Query Results from 5k Flickr images Kapitel 14 “Recognition” – p. 36 Large-scale image search (2) [Quack, Leibe, Van Gool, CIVR’08] Mobile tourist guide self-localization object/building recognition photo/video augmentation Kapitel 14 “Recognition” – p. 37 Large-scale image search (3) Kapitel 14 “Recognition” – p. 38 Large-scale image search (4) Moulin Rouge Old Town Square (Prague) Tour Montparnasse Colosseum Viktualienmarkt Maypole Application: Image auto-annotation Left: Wikipedia image Right: closest match from Flickr Kapitel 14 “Recognition” – p. 39 Sources K. Grauman, B. Leibe: Visual Object Recognition. Morgen & Claypool Publishers, 2011 R. Szeliski: Computer Vision: Algorithms and Applications. Springer, 2010. (Chapter 14 “Recognition”) Course materials from others (G. Bebis, J. Hays, S. Lazebnik, …) Kapitel 14 “Recognition” – p. 40