Group Sparse Coding

Samy Bengio, Fernando Pereira,
Yoram Singer, Dennis Strelow
Google
Mountain View, CA
(NIPS2009)
Presented by Miao Liu
July-23-2010
*Figures and formulae are directly copied from the original paper
Outline
• Introduction
• Group Coding
• Dictionary Learning
• Results and Discussion
Introduction
• Bag-of-words document representations
– Encode document by a vector of the counts of descriptors (words)
– Widely used in text, image, and video processing
• Easy to determine a suitable word dictionary for text
documents.
• For images and videos
– No simple mapping from the raw document to descriptor counts
– Requires extraction of visual descriptors (color, texture, angles, and shapes)
– Descriptors must be measured at appropriate locations (regular grids,
special interest points, multiple scales)
– A more careful design of the dictionary is needed
Dictionary Construction
• Unsupervised vector quantization (VQ), often k-means clustering
– Pro: maximally sparse per descriptor occurrence
– Cons:
• Does not guarantee a sparse coding of the whole image
• Not robust w.r.t. descriptor variability
• ℓ1-regularized optimization
– Encode each visual descriptor as a weighted sum of
dictionary elements
• Mixed-norm regularizers
– Take into account the structure of bags of visual
descriptors in images
– Presenting sets of images from a given category
Problem Statement
• Main goal: encode groups of instances (e.g. image patches) in terms of
dictionary code words (some kind of average patches)
• Notations
– The m’th group
– The subscript m is dropped for single-group operations.
• Sub goals
– Encoding a group with a given dictionary
– Learning a good dictionary from a set of training groups
Group Coding
• Given a group and a dictionary, group coding is achieved by solving a
mixed-norm regularized reconstruction problem, where
– a regularization weight balances fidelity and reconstruction complexity.
• Coordinate descent is applied to solve the above problem.
• Finally, the encoding is compressed into a single vector by taking the
p-norm of each dictionary word's coefficient vector.
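The slide's formulas are not available in this text. One hedged way to write the group coding problem is sketched below, assuming the usual mixed-norm form. All symbols are assumptions introduced for illustration: $G$ is the group with descriptors $x_i$, $D$ is the dictionary with code words $d_j$, $\alpha^j \in \mathbb{R}^{|G|}$ collects the weights of $d_j$ across the whole group (taken to be non-negative here), and $\lambda$ is the regularization weight.

```latex
\min_{\alpha^1,\dots,\alpha^{|D|} \,\ge\, 0}\;
  \frac{1}{2} \sum_{i \in G} \Big\| x_i - \sum_{j=1}^{|D|} \alpha^j_i \, d_j \Big\|_2^2
  \;+\; \lambda \sum_{j=1}^{|D|} \big\| \alpha^j \big\|_p ,
\qquad
z_j = \big\| \alpha^j \big\|_p \quad (j = 1,\dots,|D|).
```

Under this reading, the first term is the fidelity, the second is the reconstruction complexity, and $z = (z_1,\dots,z_{|D|})$ is the single compressed vector for the group. Because the penalty acts on each $\alpha^j$ as a block, a code word tends to be switched off for the whole group at once.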
Group Coding
• Define
• Optimum for p=1
• Optimum for p=2
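The per-word optima are not reproduced in this text, so the following is a minimal sketch of what such a coordinate descent could look like. It assumes the objective is a squared reconstruction error plus λ times the sum of p-norms of per-word weight vectors, with non-negative weights. The updates are the standard soft-threshold (p=1) and block soft-threshold (p=2) solutions derived under that assumption, not formulas copied from the slide. The names `group_code`, `pool`, and `mu` are hypothetical.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def group_code(group, dictionary, lam, p=2, sweeps=50):
    """Coordinate descent over dictionary words for one group.

    group: list of descriptors x_i; dictionary: list of code words d_j.
    Returns alpha, where alpha[j][i] >= 0 is the weight of d_j for x_i.
    """
    n, k = len(group), len(dictionary)
    alpha = [[0.0] * n for _ in range(k)]
    for _ in range(sweeps):
        for r, d_r in enumerate(dictionary):
            norm_sq = dot(d_r, d_r)
            if norm_sq == 0.0:
                continue  # an all-zero code word cannot explain anything
            # mu[i] = <x_i - sum_{j != r} alpha[j][i] * d_j, d_r>
            mu = []
            for i, x in enumerate(group):
                resid = list(x)
                for j, d_j in enumerate(dictionary):
                    if j != r and alpha[j][i] != 0.0:
                        resid = [ri - alpha[j][i] * dj
                                 for ri, dj in zip(resid, d_j)]
                mu.append(dot(resid, d_r))
            if p == 1:
                # Element-wise soft threshold, clipped at zero.
                alpha[r] = [max(0.0, m - lam) / norm_sq for m in mu]
            else:
                # p == 2: shrink the positive part of mu as one block.
                mu_pos = [max(0.0, m) for m in mu]
                mu_norm = math.sqrt(dot(mu_pos, mu_pos))
                scale = max(0.0, 1.0 - lam / mu_norm) if mu_norm > 0.0 else 0.0
                alpha[r] = [scale * m / norm_sq for m in mu_pos]
    return alpha


def pool(alpha, p=2):
    """Compress the group's encoding: one p-norm per dictionary word."""
    return [sum(a ** p for a in row) ** (1.0 / p) for row in alpha]


# Toy example (hypothetical data): two orthonormal code words, three patches.
dictionary = [[1.0, 0.0], [0.0, 1.0]]
group = [[1.0, 0.05], [0.8, 0.0], [1.2, 0.1]]
alpha = group_code(group, dictionary, lam=0.5, p=2)
code = pool(alpha)
```

In this toy run the second code word explains very little of any patch, so the p=2 block threshold removes it for the entire group, while the first code word keeps a shrunken weight on every patch.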
Dictionary Learning
• A good dictionary should balance
– Reconstruction error
– Reconstruction complexity
– Overall complexity relative to the given training set
• Seek a learning method that facilitates both
– induction of new dictionary words
– removal of dictionary words that have low predictive power
• Applying
• Let
• Objective
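The objective itself is not reproduced in this text. A hedged sketch of a three-term objective matching the three balance criteria listed on this slide is given below. The symbols are assumptions for illustration: $G_m$ is the m'th training group, $A_m = \{\alpha_m^j\}$ its encoding, $d_k$ the k'th dictionary word, and $\lambda$, $\gamma$ the two regularization weights.

```latex
\min_{D,\,\{A_m\}}\;
  \sum_{m} \Big[
    \underbrace{\frac{1}{2} \sum_{i \in G_m}
      \Big\| x_{m,i} - \sum_{j=1}^{|D|} \alpha^j_{m,i} \, d_j \Big\|_2^2}_{\text{reconstruction error}}
    \;+\;
    \underbrace{\lambda \sum_{j=1}^{|D|} \big\| \alpha_m^j \big\|_p}_{\text{reconstruction complexity}}
  \Big]
  \;+\;
  \underbrace{\gamma \sum_{k=1}^{|D|} \big\| d_k \big\|_p}_{\text{overall dictionary complexity}}
```

With a block penalty on each $d_k$, a word with low predictive power can be driven exactly to zero, which is one way the removal of dictionary words could arise from the optimization rather than from a separate pruning step.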
Dictionary Learning
• In this paper, p=2
• Define auxiliary variables
• Define a vector (appearing in the gradient of the objective function)
• Similar to the argument in group coding, one can obtain
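The resulting update is not reproduced in this text. The sketch below shows what a closed-form update of a single dictionary word could look like for p=2. It assumes a squared reconstruction error plus γ times the 2-norm of each word, with all encodings and all other words held fixed. The names `update_word`, `nu`, and `weight` are hypothetical, and the formula is the standard block soft-threshold derived under that assumption.

```python
import math


def update_word(r, groups, codes, dictionary, gamma):
    """Closed-form update of code word r with everything else fixed.

    groups[m][i] is descriptor x_{m,i}; codes[m][j][i] is the weight of
    code word j for that descriptor. Returns the new d_r, which is all
    zeros when the word is removed by the threshold.
    """
    dim = len(dictionary[r])
    nu = [0.0] * dim   # sum over patches of weight * residual without word r
    weight = 0.0       # sum over patches of squared weights of word r
    for group, alpha in zip(groups, codes):
        for i, x in enumerate(group):
            a = alpha[r][i]
            if a == 0.0:
                continue  # this patch does not use word r
            resid = list(x)
            for j, d_j in enumerate(dictionary):
                if j != r:
                    resid = [ri - alpha[j][i] * dj
                             for ri, dj in zip(resid, d_j)]
            nu = [v + a * ri for v, ri in zip(nu, resid)]
            weight += a * a
    nu_norm = math.sqrt(sum(v * v for v in nu))
    if weight == 0.0 or nu_norm <= gamma:
        return [0.0] * dim  # low predictive power: the word is dropped
    scale = (1.0 - gamma / nu_norm) / weight
    return [scale * v for v in nu]


# Toy example (hypothetical data): one group of two patches, one code word.
groups = [[[2.0, 0.0], [4.0, 0.0]]]
codes = [[[1.0, 2.0]]]            # weights of word 0 for the two patches
dictionary = [[1.0, 0.0]]
kept = update_word(0, groups, codes, dictionary, gamma=1.0)
dropped = update_word(0, groups, codes, dictionary, gamma=20.0)
```

With γ=0 the update reduces to an ordinary least-squares fit of the word. A larger γ shrinks the word and, past the threshold, zeroes it out entirely.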
Experimental Setting
• Compare with previous sparse coding methods by measuring the impact
on classification on the PASCAL VOC (Visual Object Classes) 2007 dataset
– Images from 20 classes, including people, animals, vehicles,
and indoor objects
– Around 2500 images each for training and validation;
5000 images for testing
• Extract local descriptors based on Gabor wavelet responses at
– Four orientations
– Spatial scales and offsets (27 combinations)
• The 27 (scale, offset) pairs were chosen by optimizing a
previous image recognition task, unrelated to this paper.
Results and Discussion