Group Sparse Coding
Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Google, Mountain View, CA (NIPS 2009)
Presented by Miao Liu, July 23, 2010
*Figures and formulae are directly copied from the original paper

Outline
• Introduction
• Group Coding
• Dictionary Learning
• Results and Discussion

Introduction
• Bag-of-words document representations
  – Encode a document by a vector of the counts of descriptors (words)
  – Widely used in text, image, and video processing
• It is easy to determine a suitable word dictionary for text documents.
• For images and videos
  – No simple mapping from the raw document to descriptor counts
  – Require extraction of visual descriptors (color, texture, angles, and shapes)
  – Descriptors must be measured at appropriate locations (regular grids, special interest points, multiple scales)
  – More careful design of the dictionary is needed

Dictionary Construction
• Unsupervised vector quantization (VQ), often k-means clustering
  – Pro: maximally sparse per descriptor occurrence
  – Cons:
    • Does not guarantee a sparse coding of the whole image
    • Not robust w.r.t. descriptor variability
• ℓ1-regularized optimization
  – Encode each visual descriptor as a weighted sum of dictionary elements
• Mixed-norm regularizers
  – Take into account the structure of bags of visual descriptors in images
  – Presenting sets of images from a given category

Problem Statement
• The main goal: encode groups of instances (e.g. image patches) in terms of dictionary code words (some kind of average patches)
• Notations
  – The m’th group [formula not preserved]
  – The subscript m is dropped when operating on a single group.
• Sub-goals
  – Encoding ([formula not preserved])
  – Learning a good dictionary from a set of training groups

Group Coding
• Given a group and a dictionary, group coding is achieved by solving
  [formula not preserved]
  where
  – [two definitions not preserved]
  – the regularization weight balances fidelity and reconstruction complexity.
• Coordinate descent is applied to solve the above problem.
• Finally, compress the resulting codes into a single vector by taking the p-norm of each dictionary word's weight vector.
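
The encoding step above can be sketched in Python with NumPy. This is a minimal illustration, not the paper's implementation. It assumes the objective is the squared reconstruction error plus a weight `lam` times the sum of 2-norms of each dictionary word's weight vector across the group (the p=2 case), and it solves each block exactly by group soft-thresholding. The names `group_code`, `objective`, `compress`, and the optional `nonneg` switch are hypothetical and chosen for this example only.

```python
import numpy as np


def group_code(X, D, lam, n_iters=50, nonneg=False):
    """Block coordinate descent for the ASSUMED objective
        0.5 * sum_i ||x_i - sum_j A[j, i] * d_j||^2 + lam * sum_j ||A[j, :]||_2

    X: (n, dim) array, one row per instance in the group.
    D: (k, dim) array, one row per dictionary word.
    Returns A of shape (k, n); row j holds word j's weights across the group.
    """
    n = X.shape[0]
    k = D.shape[0]
    A = np.zeros((k, n))
    R = X.astype(float).copy()        # residual X - A.T @ D (A starts at zero)
    sq_norms = np.sum(D * D, axis=1)  # ||d_j||^2 for every word
    for _ in range(n_iters):
        for r in range(k):
            if sq_norms[r] == 0.0:
                continue              # a zero word cannot reduce the error
            # Add word r's current contribution back into the residual.
            R += np.outer(A[r], D[r])
            # Correlation of d_r with each instance's partial residual.
            mu = R @ D[r]
            if nonneg:
                # Optional assumption: weights constrained to be >= 0.
                mu = np.maximum(mu, 0.0)
            norm_mu = np.linalg.norm(mu)
            if norm_mu <= lam:
                A[r] = 0.0            # the whole word is switched off
            else:
                A[r] = (1.0 - lam / norm_mu) * mu / sq_norms[r]
            # Remove the updated contribution again.
            R -= np.outer(A[r], D[r])
    return A


def objective(X, D, A, lam):
    """Value of the assumed p=2 group coding objective."""
    resid = X - A.T @ D
    return 0.5 * np.sum(resid ** 2) + lam * np.sum(np.linalg.norm(A, axis=1))


def compress(A, p=2):
    """One number per dictionary word: the p-norm of its weight vector."""
    return np.linalg.norm(A, ord=p, axis=1)
```

For example, `compress(group_code(X, D, lam=0.5))` would give one value per dictionary word for a group `X`. Because the threshold acts on a word's whole weight vector, a word is either used by the group or dropped for every instance at once, which is the group-level sparsity the mixed norm is meant to encourage.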

Group Coding
• Define [formula not preserved]
• Optimum for p=1: [formula not preserved]
• Optimum for p=2: [formula not preserved]

Dictionary Learning
• A good dictionary should balance
  – Reconstruction error
  – Reconstruction complexity
  – Overall complexity relative to the given training set
• Seek a learning method that facilitates both
  – induction of new dictionary words
  – removal of dictionary words that have low predictive power
• Applying [text not preserved]
• Let [formula not preserved]
• Objective: [formula not preserved]

Dictionary Learning
• In this paper p=2
• Define auxiliary variables [formula not preserved]
• Define vector [formula not preserved] (appearing in the gradient of the objective function)
• Similar to the argument in group coding, one can obtain [formula not preserved]

Experimental Setting
• Compare with a previous sparse coding method by measuring the impact on classification on the PASCAL VOC (Visual Object Classes) 2007 dataset
  – Images from 20 classes, including people, animals, vehicles, and indoor objects
  – Around 2500 images each for training and validation; 5000 images for testing
• Extract local descriptors based on Gabor wavelet responses at
  – Four orientations ([values not preserved])
  – Spatial scales and offsets (27 combinations)
• The 27 (scale, offset) pairs were chosen by optimizing a previous image recognition task, unrelated to this paper.

Results and Discussion
[Figures not preserved]