Recognition of 3D Objects

or, 3D Recognition of Objects
Alec Rivers
• 3D object recognition was dead, now it’s
coming back
– These papers are within the last 2 years
• Doesn’t really work yet, but it’s just a
• The Layout Consistent Random Field
for Recognizing and Segmenting
Partially Occluded Objects
– CVPR 2006
• 3D LayoutCRF for Multi-View Object
Class Recognition and Segmentation
– CVPR 2007
• 3D Generic Object Categorization,
Localization and Pose Estimation
– ICCV 2007
The Layout Consistent Random Field for
Recognizing and Segmenting Partially Occluded Objects
John Winn
Jamie Shotton
Microsoft Research
University of
• Needed to understand next paper
– It’s 2D
• What does it try to solve?
– Recognize one class of object at one pose and one
scale, but with occlusions
• Does it work?
– Yes, really well, especially given occlusions
• What is interesting about it?
– Segments objects
– Interesting methods
• No sliding windows
– Multiple instances for free
• Instead of sparse parts at features, use a
densely covering part grid
[Fischler & Elschlager 73]
[Winn & Shotton 06]
Recognizing a New Image – Overview
• Walk through an example
Recognizing a New Image – Overview
1. Pixels guess their part
Recognizing a New Image – Overview
2. Maximize layout consistency
Layout Consistency
• Defined pairwise between two pixels:
P_I, P_J => Bool
• Means pixels I and J could belong to the same object instance
• Toy example:
Object: 1,2,3,4,5
instance 1
instance 2
instance 3
Layout Consistency
• In 2D, two pixels are consistent iff their relative part
assignments could occur in a deformed regular grid
• Formally:
2. Maximize layout consistency
Layout Consistency
3. Find consistent regions; create instances
Possible due to layout inconsistency at
occluding borders
1. Pixels guess parts
2. Maximize layout consistency
3. Create instances
[Winn & Shotton 06]
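The three steps can be illustrated on the 1D toy object with parts 1 to 5. This is a minimal sketch, not the paper's method. It assumes a single scanline, uses 0 for background, and treats two neighbouring object pixels as layout consistent when the right-hand label equals the left-hand label (a stretched part) or is exactly one greater. The function names are hypothetical.

```python
BACKGROUND = 0  # assumption: label 0 marks background pixels


def layout_consistent(left, right):
    """True if two horizontally adjacent part labels could belong to one instance."""
    if left == BACKGROUND or right == BACKGROUND:
        return False
    # Same part (stretched grid cell) or the next part along the object.
    return right == left or right == left + 1


def split_instances(labels):
    """Cut a 1D part labeling into instances wherever neighbours are inconsistent."""
    instances, current = [], []
    for label in labels:
        if label == BACKGROUND:
            if current:
                instances.append(current)
                current = []
            continue
        if current and not layout_consistent(current[-1], label):
            # Layout inconsistency: treat it as an occluding border.
            instances.append(current)
            current = []
        current.append(label)
    if current:
        instances.append(current)
    return instances


# One complete object, then an object missing parts 1-2 that is
# directly followed by another object missing parts 4-5.
scanline = [0, 1, 2, 3, 4, 5, 0, 3, 4, 5, 1, 2, 3, 0]
instances = split_instances(scanline)
print(instances)
```

The jump from part 5 to part 1 is not layout consistent, so the sketch starts a new instance there even though no background pixel separates the two objects.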
Implementation Details
• Trained on manually segmented data
• Crux of algorithm is conditional distribution
– Like a probability for each possibility, or a score
• Algorithm is just finding maximum
Part Appearance
• Each pixel prefers parts that match
surrounding image data
• Randomized decision trees
– Multiple trees, each trained on a subset of the training data
– Each node applies the binary test with maximal information gain
on the intensities of two nearby pixels
– Each leaf stores a histogram over possible parts
– Actual preference is average over all trees
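The prediction side of such a forest can be sketched as follows. This is an illustration under assumptions, not the authors' code. The trees are hand-built rather than learned (the information-gain training is not shown), each node is a dictionary with hypothetical keys, and pixels outside the image are clamped to the border.

```python
def intensity(image, x, y):
    """Intensity at (x, y), clamped to the image border."""
    height, width = len(image), len(image[0])
    x = max(0, min(width - 1, x))
    y = max(0, min(height - 1, y))
    return image[y][x]


def leaf_histogram(tree, image, x, y):
    """Walk one tree using binary tests on two nearby pixels' intensities."""
    node = tree
    while node["kind"] == "split":
        a = intensity(image, x + node["a"][0], y + node["a"][1])
        b = intensity(image, x + node["b"][0], y + node["b"][1])
        node = node["yes"] if a - b > node["threshold"] else node["no"]
    return node["hist"]


def part_preference(forest, image, x, y, num_parts):
    """Average the leaf histograms of all trees for the pixel at (x, y)."""
    totals = [0.0] * num_parts
    for tree in forest:
        for part, p in enumerate(leaf_histogram(tree, image, x, y)):
            totals[part] += p
    return [t / len(forest) for t in totals]


def leaf(hist):
    return {"kind": "leaf", "hist": hist}


# Two hypothetical trees over three parts; offsets are (dx, dy).
forest = [
    {"kind": "split", "a": (1, 0), "b": (0, 0), "threshold": 50,
     "yes": leaf([0.1, 0.8, 0.1]), "no": leaf([0.7, 0.2, 0.1])},
    {"kind": "split", "a": (0, 0), "b": (-1, 0), "threshold": 50,
     "yes": leaf([0.0, 0.2, 0.8]), "no": leaf([0.5, 0.4, 0.1])},
]

# A dark region with a bright column on the right.
image = [[10, 10, 200],
         [10, 10, 200],
         [10, 10, 200]]

preference = part_preference(forest, image, 1, 1, num_parts=3)
print(preference)
```

At the centre pixel the first tree sees the bright edge to the right and the second does not, so the averaged preference favours the middle part without ruling out the others.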
Deformed Training Part Labelings
• Fits parts tighter
1. Label by grid
2. Learn from data
3. Apply to data
4. Set guesses as
5. Relearn
Part Layout
• Preference for layout consistency plus additional
pairwise costs:
• Helps remove noise
• Align edges along image edges
Part Layout
• Return to toy example
Just appearance:
With layout costs:
instance 1
instance 2
Instance Layout
• Apply weak force trying to keep parts at sane
positions relative to instance data (centroid, L/R flip)
• Toy example: 0,1,1,1,1,1,2,3,4,5 is bad!
• Theoretically, finding global maximum of
• This is “MAP” estimation
– MAP = Maximum A Posteriori
• In reality, using tricks to find a local maximum
– α-expansion, annealed expansion move
Approximating MAP Estimation
• Global maximum is intractable
• α-expansion
– Start with given configuration
– For a given new label, ask each pixel: do you want
to switch?
– Can be solved efficiently with graph cuts
• Repeat over all part labels
• Annealed expansion move
– Relabel grid, but offset to avoid local maxima
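The expansion move can be sketched on a toy problem. The code below is a simplification under stated assumptions. It works on a 1D chain of pixels with a Potts pairwise cost, so each binary "keep or switch to α" subproblem can be solved exactly by dynamic programming instead of the graph cuts used on image grids. The annealed expansion move is not shown, and all names and costs are hypothetical.

```python
def energy(labels, unary, weight):
    """Unary costs plus a Potts penalty for each pair of differing neighbours."""
    total = sum(unary[i][label] for i, label in enumerate(labels))
    total += sum(weight for a, b in zip(labels, labels[1:]) if a != b)
    return total


def expansion_move(labels, alpha, unary, weight):
    """Best labeling in which every pixel keeps its label or switches to alpha."""
    n = len(labels)
    options = [(labels[i], alpha) for i in range(n)]  # choice 0 = keep, 1 = switch
    cost = [unary[0][options[0][c]] for c in (0, 1)]
    back = []
    for i in range(1, n):
        new_cost, pointers = [], []
        for c in (0, 1):
            candidates = [
                cost[p] + (weight if options[i - 1][p] != options[i][c] else 0)
                for p in (0, 1)
            ]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best_prev] + unary[i][options[i][c]])
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheaper final choice.
    choice = 0 if cost[0] <= cost[1] else 1
    result = [None] * n
    for i in range(n - 1, -1, -1):
        result[i] = options[i][choice]
        if i > 0:
            choice = back[i - 1][choice]
    return result


def alpha_expansion(labels, num_labels, unary, weight):
    """Repeat expansion moves over all labels until none lowers the energy."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            candidate = expansion_move(labels, alpha, unary, weight)
            if energy(candidate, unary, weight) < energy(labels, unary, weight):
                labels, improved = candidate, True
    return labels


# unary[i][l] = cost of giving pixel i label l; pixel 1 is noisy.
unary = [[0, 2, 2], [1, 2, 0.5], [0, 2, 2], [2, 0, 2], [2, 0, 2], [2, 0, 2]]
start = [min(range(3), key=lambda l: row[l]) for row in unary]
result = alpha_expansion(start, num_labels=3, unary=unary, weight=1)
print(start, "->", result)
```

Starting from the per-pixel best guesses, the move for label 0 relabels the noisy pixel because the two saved boundary penalties outweigh its slightly higher unary cost. The result is still only a local minimum with respect to expansion moves, which matches the slide's point that the global maximum is not what is actually computed.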
Oh, snap!
• Bottom-up system is great
– No sliding windows
– Multiple instances for free
• Information about segment boundaries:
occlusion vs. completion
– Reason about complete segment boundaries?
3D LayoutCRF for Multi-View Object Class
Recognition and Segmentation
Derek Hoiem
Carnegie Mellon
Carsten Rother
John Winn
Microsoft Research
• What does it try to solve?
– Extend LayoutCRF to be pose and scale invariant
• Does it work?
– Improvements to LayoutCRF work;
3D information does little
• What is interesting about it?
– One method for combining 2D methods with a 3D
– The improvements to 2D are good
• Generate rough 3D model of class
• Parts created over 3D model
• Probability distribution
• Part layout, instance layout take into account
3D position
• New term: Instance cost
Instance Cost
• Eliminates false positives
– LayoutCRF: object-background cost
• Explain multiple groups with one instance
• New term: Instance appearance
Instance appearance
• Learn color distribution for each instance
• Separate groups of pixels: definitely object,
definitely background
• Use these to learn colors
• Apply cost to non-standard-color pixels
This would fail…
Implementation Details
• Parts are learned separately for each 45°
viewing range, and for different scales
• Instance layout is also discretized by viewpoint
Results – Comparison to LCRF
• A little better (+8% recall)
• BUT they actually turn off 3D information for this
• Better
Results – PASCAL 2006
• 61% precision-recall
– Previous best: 45%
– But, reduced test set
• Without 3D: -5%
• Without color: -5%
• Color, instance costs very nice
• Shoehorns LCRF into 3D without much success
• LCRF is already somewhat viewpoint-invariant:
segments can stretch
3D Generic Object Categorization, Localization
and Pose Estimation
Silvio Savarese
Fei-Fei Li
University of Illinois at
Princeton University
• What does it try to solve?
– Multiclass pose-invariant, scale-invariant object
• Does it work?
– Not well. But it may be due to implementation
• Why is it interesting?
– Attempts to learn the actual 3D structure of an object
– Interesting data structure for 3D info
Overview – Data Structure
• Decompose object into large parts; find “canonical view”
• Relate parts by mutual appearance
Related Work – Aspect Graphs
Aspect graph
of a cube:
Image [Khoh & Kovesi, 99]
• Represent stable views rather than parts
Data Structure for Cube
Related Work
• Constellation models
• Similar, but wraps around in 3D
Implementation – Links
• Link from canonical P_I to P_J consists of
• Matrix defines transformation to observe
P_J when P_I is viewed canonically
• A_IJ is skew, t_IJ is translation
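How a link might act on image points can be sketched as below. The matrix itself is missing from the slide, so this assumes the common homogeneous form in which a 2x2 matrix A and a translation t are packed into a 3x3 matrix and applied as x' = A x + t. The numeric values and function names are hypothetical.

```python
def make_link(A, t):
    """Pack a 2x2 matrix A and a translation t into a 3x3 homogeneous matrix."""
    return [
        [A[0][0], A[0][1], t[0]],
        [A[1][0], A[1][1], t[1]],
        [0.0, 0.0, 1.0],
    ]


def apply_link(H, point):
    """Map a 2D point through the link: x' = A x + t."""
    x, y = point
    return (
        H[0][0] * x + H[0][1] * y + H[0][2],
        H[1][0] * x + H[1][1] * y + H[1][2],
    )


# Hypothetical link: part J appears sheared and shifted to the right
# when part I is seen in its canonical view.
A_ij = [[1.0, 0.5],
        [0.0, 1.0]]
t_ij = (4.0, 0.0)
H_ij = make_link(A_ij, t_ij)

# Corners of part J in its own canonical view.
corners_j = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
observed = [apply_link(H_ij, p) for p in corners_j]
print(observed)
```

In this example the square part J becomes a parallelogram offset by four units, which is the kind of distortion and displacement a link is meant to record.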
Implementation – Links
[Figure: Part I and Part J, each shown in its canonical view]
• Learn data structure from images
• Apply to new image by recognizing parts and
selecting model that best accounts for their
Implementation – Learning Parts
• Tricky implementation!
• Part = collection of SIFT features
For each pair of images of the same
1. Find set M of shared SIFT features
2. RANSAC M to find a group of pairs
that transform together
3. Group close-together parts of M
into candidate parts
Background: What is RANSAC?
• Finds subset of data that
is accounted for by some
model; ignores outliers
1. Guess points
2. Fit model
3. Select matching points
4. Calculate error
• In our case: find points for which a
homographic transformation of
the points in image I yields the
points in image J
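The four RANSAC steps can be sketched with a simpler model than a homography. The example below fits a straight line to points that include outliers; the line model, the tolerance, the iteration count and the final least-squares refit are assumptions chosen to keep the sketch short, not details taken from the paper.

```python
import random


def fit_line(p, q):
    """Line through two points, returned as (slope, intercept)."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def least_squares_line(points):
    """Ordinary least-squares line through a set of points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, iterations=200, tolerance=0.1, seed=0):
    rng = random.Random(seed)
    best_inliers, best_error = [], float("inf")
    for _ in range(iterations):
        p, q = rng.sample(points, 2)            # 1. guess points
        if p[0] == q[0]:
            continue                            # vertical pair: no slope
        slope, intercept = fit_line(p, q)       # 2. fit model
        residuals = [(abs(y - (slope * x + intercept)), (x, y)) for x, y in points]
        inliers = [pt for r, pt in residuals if r <= tolerance]  # 3. select matching points
        error = sum(r for r, _ in residuals if r <= tolerance) / len(inliers)  # 4. calculate error
        # Prefer more inliers, then lower mean error among them.
        if (len(inliers), -error) > (len(best_inliers), -best_error):
            best_inliers, best_error = inliers, error
    return least_squares_line(best_inliers), best_inliers


inlier_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outlier_points = [(1.0, 15.0), (3.0, -4.0), (5.0, 30.0),
                  (8.0, 2.0), (12.0, 0.0), (15.0, 50.0)]
(slope, intercept), inliers = ransac_line(inlier_points + outlier_points)
print(slope, intercept, len(inliers))
```

For part learning, the sampled pairs would be matched SIFT features and the fitted model would be the transformation between two images, but the loop has the same shape.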
Implementation – Canonical Views
• Goal: front-facing view of part
• Construct directed graph
– Direction means “more front-facing”
• Traverse to find canonical view
• How to go from pairwise-defined to graph?
• Upshot: a collection of
parts with canonical
views and links
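The graph traversal for canonical views can be sketched as follows. The slide leaves open how the pairwise comparisons become a graph, so this assumes the graph is already given as a dictionary in which an edge from u to v means "v is more front-facing than u". The view names and the rule of following the first outgoing edge are hypothetical.

```python
def canonical_view(more_front_facing, start):
    """Follow 'more front-facing' edges from start until no edge leaves the view."""
    current, visited = start, {start}
    while more_front_facing.get(current):
        candidate = more_front_facing[current][0]
        if candidate in visited:
            break  # guard against cycles caused by noisy pairwise comparisons
        visited.add(candidate)
        current = candidate
    return current


# Hypothetical views of one part; view_c has no outgoing edge.
graph = {
    "view_a": ["view_b"],
    "view_b": ["view_c"],
    "view_c": [],
    "view_d": ["view_c"],
}
print(canonical_view(graph, "view_a"))
print(canonical_view(graph, "view_d"))
```

Every starting view in this example ends at the same sink, which is the property one would want from a canonical view.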
Recognizing a New Image
1. Extract SIFT features
2. Use scanning windows to get 5 best canonical
part matches
3. For every pair of found parts, for each model,
score how well the model accounts for their
relative appearances
4. Select the model with the best score
• Not stellar
• New test set
– Overfit?
– Comparison?
• Low performance may make it useless as a
system, but the data structure is very nice
• Implementation has a lot of tricky parts
– Doesn’t seem to select great canonical parts
– I wonder if there’s a simpler way
– Are SIFT features the right choice?
Extremely Confusing Figure
• “Each dashed box indicates a particular view. A
subset of the canonical parts is presented for each
view. Part relationships are denoted by arrows.”
Overall Conclusions
• 3D is just starting out. Doesn’t work too well
right now, but neither did MV at the
• LayoutCRF:
– Nice method to learn 2D patches
• 3D Object Categorization:
– Nice conceptual model relating 3D parts
• Possible to combine strengths of both?