Object Recognition with Informative Features and Linear Classification
Authors: Vidal-Naquet & Ullman

Presenter: David Bradley
Fragments vs. Wavelets
• Image fragments make good features
– especially when training data is limited
• Image fragments contain more information than wavelets
– allowing for simpler classifiers
• An information-theoretic framework for feature selection
(Figure: image fragments of intermediate complexity)
What’s in a feature?
• You and your favorite learning algorithm settle down for
a nice game of 20 questions
• Except since it is a learning algorithm it can’t talk, and
the game really becomes 20 answers:
10110010110000111001
• Have you asked the right questions?
• What information are you really giving it?
• How easy will it be for it to say “Aha, you are thinking of
the side view of a car!”?
“Pseudo-Inverse”
• In general, reconstructing an image from its features gives
good intuition for what information they provide.
Wavelet coefficients
• Each wavelet coefficient asks the question “how much is the
current block of pixels like my wavelet pattern?”
• A set of four wavelets can completely represent a 2x2
pixel block.
• So if you give your learning algorithm all of the
wavelet coefficients then you have given it all of
the information it could possibly need, right?
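As a concrete illustration of this completeness claim, here is a sketch of a four-pattern Haar-style decomposition of a 2x2 block with an exact inverse; the pattern ordering and scaling below are my own choices, not from the slides:

```python
import numpy as np

# Hypothetical 2x2 pixel block (values are illustrative).
block = np.array([[10.0, 20.0],
                  [30.0, 40.0]])

def haar_2x2(b):
    """Project a 2x2 block onto four (unnormalized) Haar-style patterns:
    average, horizontal contrast, vertical contrast, diagonal contrast."""
    a, h = b[0, 0], b[0, 1]
    v, d = b[1, 0], b[1, 1]
    return np.array([a + h + v + d,       # DC / average
                     (a + v) - (h + d),   # left half vs right half
                     (a + h) - (v + d),   # top half vs bottom half
                     (a + d) - (h + v)])  # diagonal pairing

def inverse_haar_2x2(c):
    """Reconstruct the block exactly from its four coefficients,
    showing the four coefficients carry all the block's information."""
    dc, hc, vc, dg = c / 4.0
    return np.array([[dc + hc + vc + dg, dc - hc + vc - dg],
                     [dc + hc - vc - dg, dc - hc - vc + dg]])

coeffs = haar_2x2(block)
print(np.allclose(inverse_haar_2x2(coeffs), block))  # -> True
```

Because the four patterns form a basis of the 2x2 block, no information is lost: the inverse recovers every pixel exactly.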
Sometimes wavelets work well
• Viola and Jones Face Detector
• Trained on 24x24 pixel windows
• Cascade structure (32 classifiers total):
– Initial 2-feature classifier rejects 60% of non-faces
– Second, 5-feature classifier rejects 80% of non-faces
(Figure: the initial 2-feature classifier)
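The cascade's reject-early control flow can be sketched as follows; the stage functions and thresholds below are toy stand-ins, not Viola and Jones' actual classifiers:

```python
# Sketch of the cascade idea: each stage either rejects a window outright
# or passes it on to the next, more expensive stage.
def run_cascade(window_score, stages):
    """stages: list of functions returning True (keep) or False (reject).
    A window is accepted only if every stage passes it."""
    for stage in stages:
        if not stage(window_score):
            return False  # rejected early: cheap stages kill most non-faces
    return True

# Toy stages with made-up thresholds on a scalar "score" standing in for
# a weighted sum of Haar-feature responses.
stages = [lambda s: s > 0.2,   # cheap 2-feature stage
          lambda s: s > 0.5]   # pricier 5-feature stage

print(run_cascade(0.7, stages))  # -> True  (passes both stages)
print(run_cascade(0.3, stages))  # -> False (rejected at stage 2)
```

The design point: since most windows are non-faces, nearly all of the work is done by the first, cheapest stages.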
But they can require a lot of training
data to use correctly
• Rest of the Viola and Jones Face Detector
– 3 20-feature classifiers
– 2 50-feature classifiers
– 20 200-feature classifiers
• In the later stages it is tough to learn which
combinations of wavelet questions to ask.
• Surely there must be an easier way…
Image fragments
• Fragments represent the opposite extreme: wavelets are generic
image building blocks, while fragments are highly specific to the
patterns they come from.
• A fragment counts as present in an image if its cross-correlation
with some image window exceeds a threshold.
• Ideally if one could label all possible images (and search
them quickly):
– Use whole images as fragments
– All vision problems become easy
– Just look for the match
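A minimal sketch of the fragment-presence test described above, assuming grayscale image arrays and normalized cross-correlation; the threshold value here is illustrative, not the paper's:

```python
import numpy as np

def ncc(patch, fragment):
    """Normalized cross-correlation between two equal-sized gray patches."""
    p = patch - patch.mean()
    f = fragment - fragment.mean()
    denom = np.sqrt((p * p).sum() * (f * f).sum())
    return 0.0 if denom == 0 else float((p * f).sum() / denom)

def fragment_present(image, fragment, threshold=0.8):
    """Slide the fragment over every window; the binary feature fires
    if the best match exceeds the threshold."""
    fh, fw = fragment.shape
    ih, iw = image.shape
    best = -1.0
    for y in range(ih - fh + 1):
        for x in range(iw - fw + 1):
            best = max(best, ncc(image[y:y+fh, x:x+fw], fragment))
    return best > threshold
```

Usage: `fragment_present(window, frag)` yields one bit per fragment per window, which is exactly the binary feature vector handed to the classifier.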
Dealing with the non-ideal world
• Want to find fragments that:
– Generalize well
– Are specific to the class
– Add information that other fragments haven’t
already given us.
• What metric should we use to find the best
fragments?
Information Theory Review
• Entropy: the minimum number of bits required to
encode a signal
• Shannon entropy: H(C) = -Σ_c p(c) log2 p(c)
• Conditional entropy: H(C|F) = -Σ_f p(f) Σ_c p(c|f) log2 p(c|f)
• Mutual information between class C and feature F:
I(C; F) = H(C) - H(C|F)
• High mutual information means that
knowing the feature value reduces the
number of bits needed to encode the class
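A small sketch of these quantities for binary classes and binary features; the function names are my own:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(c, f):
    """I(C;F) = H(C) - H(C|F) for binary class and feature vectors."""
    c, f = np.asarray(c), np.asarray(f)
    n = len(c)
    h_c = entropy(np.bincount(c, minlength=2) / n)
    h_c_given_f = 0.0
    for v in (0, 1):
        mask = f == v
        if mask.any():
            p_c_given_v = np.bincount(c[mask], minlength=2) / mask.sum()
            h_c_given_f += mask.mean() * entropy(p_c_given_v)
    return h_c - h_c_given_f

# A feature that copies the class carries a full bit; a constant carries none.
c = np.array([0, 0, 1, 1])
print(mutual_information(c, c))                 # -> 1.0
print(mutual_information(c, np.zeros(4, int)))  # -> 0.0
```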
Picking features with Mutual
Information
• Not practical to exhaustively search for the
combination of features with the highest mutual
information.
• Instead, do a greedy search: at each step, add the candidate
feature F whose minimum pairwise information gain
I(C; F, F') - I(C; F'), taken over every already-chosen
feature F', is highest.
(Figure: variables X1-X4. Pick the most pairwise-independent
variable; low pairwise information gain indicates the variables
are dependent.)
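The greedy max-min selection can be sketched as below; the gain functions here are toy placeholders standing in for I(C; F, F') - I(C; F') estimated from training data:

```python
# Greedy max-min feature selection sketch. first_gain scores a feature on
# its own; pair_gain(f, g) is the added information of f given chosen g.
def select_features(candidates, n_select, first_gain, pair_gain):
    chosen = [max(candidates, key=first_gain)]  # best single feature seeds the set
    remaining = [c for c in candidates if c != chosen[0]]
    while remaining and len(chosen) < n_select:
        # Score each candidate by its MINIMUM gain against every chosen
        # feature, then take the candidate whose minimum is largest.
        best = max(remaining,
                   key=lambda f: min(pair_gain(f, g) for g in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy example: features are numbers, and the "gain" between two features is
# their absolute difference, so the selection spreads out over the range.
picked = select_features([1, 2, 3, 10], 3,
                         first_gain=lambda f: f,
                         pair_gain=lambda f, g: abs(f - g))
print(picked)  # -> [10, 1, 3]
```

The max-min criterion penalizes a candidate that is redundant with even one already-chosen feature, which is exactly the "add information other fragments haven't already given us" requirement.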
(Figure: features picked for cars)
The Details
• Image Database
– 573 14x21 pixel car side-view images
• Cars occupied approx 10x15 pixels
– 461 14x21 pixel non-car images
• 4 classifiers were trained for 20 cross-validation
iterations to generate results
– 200 car and 200 non-car images in the training set
– 100 car images to extract fragments from
Features
• Extracted 59200 fragments from the first 100
images
– 4x4 to 10x14 pixel image patches
– Taken from the 10x15 pixel region containing the car.
• Location restricted to a 5x5 pixel area around the
original location
• Used 2 scales of wavelets from the 10x15 region
• Selected 168 features total
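For a sense of scale, here is a sketch of exhaustively enumerating patches in that size range from a 10x15 region; the paper's actual sampling of its 59,200 fragments may well differ from this brute-force sweep:

```python
import numpy as np

def extract_fragments(region, min_h=4, min_w=4, max_h=10, max_w=14):
    """Crop every patch from min_h x min_w up to max_h x max_w (the 4x4 to
    10x14 range from the slides) at every position inside the region."""
    h, w = region.shape
    frags = []
    for fh in range(min_h, min(max_h, h) + 1):
        for fw in range(min_w, min(max_w, w) + 1):
            for y in range(h - fh + 1):
                for x in range(w - fw + 1):
                    frags.append(region[y:y+fh, x:x+fw])
    return frags

# A single 10x15 car region already yields thousands of candidate patches.
print(len(extract_fragments(np.zeros((10, 15)))))  # -> 2156
```

Even this toy enumeration shows why feature selection is essential: the candidate pool explodes long before the classifier sees it.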
Classifiers
• Linear SVM
• Tree Augmented Network (TAN)
– Models each feature's dependency on the class, plus its
strongest pairwise dependency on another feature
– Quadratic decision surface in feature space
– Occasional information loss due to overfitting
More Information About Fragments
• Torralba et al. Sharing Visual Features for Multiclass
and Multiview Object Detection. CVPR 2004.
– http://web.mit.edu/torralba/www/extendedCVPR2004.pdf
• ICCV Short Course (great matlab demo)
– http://people.csail.mit.edu/torralba/iccv2005/
Objections
• The wavelet features chosen are very weak
– Images were very low resolution, maybe too low-res for more
complicated wavelets
• Data set is too easy
– Side-views of cars have low intra-class variability
– Cars and faces have very stable and predictable appearances
– not hard enough to stress the fragment + linear SVM classifier, so TAN
shows no improvement.
• Didn’t compare fragments against successful wavelet application
– Schneiderman & Kanade car detector
• Do the fragment-based classifiers effectively get 100 more
training images (the 100 used to extract fragments)?