Nonparametric Scene Parsing via Label Transfer

Authors: Ce Liu, Jenny Yuen, Antonio Torralba
Group 3
Presenter: Hongsheng Yang
Adapted from Ce Liu's CVPR2009 slides
The task of object recognition and scene parsing
[Figure: input image and output parsing. Label legend: window, tree, sky, road, field, car, building, unlabeled]
Training-based object recognition and scene parsing
• Sliding window method
- Train a classifier for a fixed-size window (e.g., car vs. non-car)
- Try all possible scales and locations, run the classifier
- Merge multiple detections
• Texton method
- Extract pixel-wise high-dimensional feature vectors
- Train a multi-class classifier
- Spatial regularity: neighboring pixels should agree
J. Shotton et al. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV 2006.
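The sliding window steps listed above can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the window size, scales, stride, and the greedy non-maximum suppression used for "merge multiple detections" are all assumptions, and the classifier itself is left out (scores are supplied by the caller).

```python
def sliding_windows(img_w, img_h, base=32, scales=(1.0, 2.0), stride_frac=0.5):
    """Yield square (x, y, w, h) windows over all scales and locations."""
    for s in scales:
        size = int(round(base * s))
        step = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield (x, y, size, size)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def merge_detections(scored_boxes, iou_thresh=0.5):
    """Greedy merge: keep the highest-scoring box, drop boxes overlapping it."""
    kept = []
    for score, box in sorted(scored_boxes, reverse=True):
        if all(iou(box, k) <= iou_thresh for _, k in kept):
            kept.append((score, box))
    return kept
```

In use, each window from `sliding_windows` would be scored by a trained classifier (e.g., car vs. non-car) and the windows above a threshold passed to `merge_detections`.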
Label Transfer - Intuition
• I have seen and recognized a few similar pictures before.
• If I could match each pixel in the query image to pixels in the previously seen images,
• then I could infer what the new query image contains from the database images.
Label Transfer - Pipeline
> Given a query image:
• Find another annotated image with a similar scene
• Find dense correspondences between these two images
• Warp the annotation according to the correspondences
> Two key components:
• A large, annotated database
• Good correspondences for label transfer
[Figure: query, image from database, user annotation, warped annotation. Label legend: window, tree, sky, road, field, car, building, unlabeled]
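The "warp the annotation according to the correspondences" step can be sketched as below. This is a minimal sketch under stated assumptions: the flow is an integer displacement per query pixel, labels are small integers with 0 meaning "unlabeled", and the function name `warp_annotation` is made up for illustration.

```python
import numpy as np


def warp_annotation(annotation, flow_u, flow_v, unlabeled=0):
    """Give query pixel p the label found at p + w(p) in the database image.

    annotation: (H_db, W_db) integer label map of the database image.
    flow_u, flow_v: (H, W) x- and y-displacements for each query pixel.
    Pixels whose correspondence falls outside the annotation stay unlabeled.
    """
    h, w = flow_u.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = xs + np.rint(flow_u).astype(int)
    src_y = ys + np.rint(flow_v).astype(int)
    ah, aw = annotation.shape
    valid = (src_x >= 0) & (src_x < aw) & (src_y >= 0) & (src_y < ah)
    warped = np.full((h, w), unlabeled, dtype=annotation.dtype)
    warped[valid] = annotation[src_y[valid], src_x[valid]]
    return warped
```

With a flow of one pixel to the right everywhere, each query pixel takes the label of its right-hand neighbor in the database annotation, and the last column is left unlabeled.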
Large image databases
• A subset of the LabelMe database (outdoor scenes)
• 2,688 images in total: 2,488 for training, 200 for testing
• 33 object categories + “unlabeled”; scenes include streets, beaches, mountains, fields, buildings, etc.
B. Russell et al. LabelMe: a database and web-based tool for image annotation. IJCV 2008.
A good correspondence approach
• SIFT flow (scene level) is analogous to optical flow (image level)
• SIFT flow: dense SIFT, spatial regularization
[Figure: input and support images; optical flow and warping by optical flow; dense SIFT images (RGB = first 3 components of 128-D SIFT); SIFT flow and warping by SIFT flow]
SIFT flow
The objective energy function is similar to that of optical flow, with three terms:
• Data term (reconstruction)
• Small displacement bias
• Smoothness term
• MRF notation: p, q: grid coordinates; w: flow vector; u, v: its x- and y-components; s1, s2: SIFT descriptors of the two images
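The equation itself did not survive on the slide. For reference, the energy as formulated in the SIFT Flow paper cited here is reproduced below, with the three terms in the order listed above. The symbols t and d (truncation thresholds), η and α (weights), and ε (the set of neighboring grid pairs) come from that paper and are not defined on the slide.

```latex
E(\mathbf{w}) = \sum_{\mathbf{p}} \min\!\bigl(\lVert s_1(\mathbf{p}) - s_2(\mathbf{p} + \mathbf{w}(\mathbf{p})) \rVert_1,\; t\bigr)
  + \sum_{\mathbf{p}} \eta \bigl(\lvert u(\mathbf{p}) \rvert + \lvert v(\mathbf{p}) \rvert\bigr)
  + \sum_{(\mathbf{p},\mathbf{q}) \in \varepsilon} \Bigl[ \min\!\bigl(\alpha \lvert u(\mathbf{p}) - u(\mathbf{q}) \rvert,\; d\bigr)
  + \min\!\bigl(\alpha \lvert v(\mathbf{p}) - v(\mathbf{q}) \rvert,\; d\bigr) \Bigr]
```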
C. Liu et al. SIFT Flow: Dense Correspondence across Scenes and its Applications. TPAMI 2011
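To make the three terms concrete, the sketch below evaluates such an energy for a given flow field. It only scores a flow and does not perform the optimization. The assumptions are: integer flow, a 4-connected grid, out-of-range correspondences clamped to the image border, and placeholder parameter defaults (`t`, `eta`, `alpha`, `d`) that are illustrative rather than the authors' settings.

```python
import numpy as np


def sift_flow_energy(s1, s2, u, v, t=np.inf, eta=0.005, alpha=2.0, d=40.0):
    """Evaluate data + small-displacement + smoothness terms for flow (u, v).

    s1, s2: (H, W, D) dense descriptor images; u, v: (H, W) integer flow.
    """
    u = np.asarray(u, dtype=int)
    v = np.asarray(v, dtype=int)
    h, w = u.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Assumption: correspondences leaving the image are clamped to the border.
    ty = np.clip(ys + v, 0, h - 1)
    tx = np.clip(xs + u, 0, w - 1)
    # Data term: truncated L1 distance between matched descriptors.
    data = np.minimum(np.abs(s1 - s2[ty, tx]).sum(axis=-1), t).sum()
    # Small displacement bias: L1 norm of the flow vectors.
    small = eta * (np.abs(u) + np.abs(v)).sum()
    # Smoothness: truncated L1 difference across vertical and horizontal edges.
    smooth = 0.0
    for f in (u, v):
        smooth += np.minimum(alpha * np.abs(np.diff(f, axis=0)), d).sum()
        smooth += np.minimum(alpha * np.abs(np.diff(f, axis=1)), d).sum()
    return float(data + small + smooth)
```

A zero flow between identical descriptor images has zero energy; any displacement or discontinuity in the flow raises it.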
Design of Nonparametric Scene Parsing System
• Scene retrieval: retrieve a set of nearest neighbors in the database for a given query image, using GIST as the matching score. (A single retrieved image is not good enough.)
• Compute the SIFT flow from the query to each nearest neighbor, and use the achieved minimum energy to re-rank the nearest neighbors. Then select the top M re-ranked retrievals to form the voting candidate set.
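The retrieval and re-ranking steps above can be sketched as follows. This is an illustration under assumptions: GIST descriptors are compared with Euclidean distance, `flow_energy` is a hypothetical callback that returns the minimum SIFT flow energy between the query and database image `i`, and the values of `k` and `m` are arbitrary.

```python
import numpy as np


def candidate_set(query_gist, db_gists, flow_energy, k=50, m=5):
    """Return database indices of the top-m neighbors after re-ranking.

    query_gist: (D,) descriptor; db_gists: (N, D) descriptors.
    flow_energy: hypothetical callable i -> minimum SIFT flow energy.
    """
    # Step 1: k nearest neighbors by GIST distance (Euclidean, an assumption).
    dists = np.linalg.norm(db_gists - query_gist, axis=1)
    neighbors = np.argsort(dists)[:k]
    # Step 2: re-rank those neighbors by SIFT flow energy, lowest first.
    energies = np.array([flow_energy(int(i)) for i in neighbors])
    order = np.argsort(energies, kind="stable")
    # Step 3: keep the top m as the voting candidate set.
    return [int(i) for i in neighbors[order[:m]]]
```

Note that an image outside the first k GIST neighbors is never considered, even if its flow energy would have been the lowest.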
[Figure: query (SIFT), candidate set (SIFT, annotation), SIFT flow, warped annotations]
• Another multi-label MRF integrates the warped annotations of the candidate images, combining per-pixel likelihood, a spatial prior, and spatial consistency between neighboring pixels
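As a much simplified stand-in for this integration step, the sketch below takes a per-pixel majority vote over the warped annotations. It is not the MRF described on the slide: the spatial prior and the neighborhood consistency term are omitted, every candidate gets an equal vote, and labels are assumed to be integers in `range(num_labels)`.

```python
import numpy as np


def vote_labels(warped_annotations, num_labels):
    """Per-pixel majority vote over M warped annotations of shape (H, W)."""
    stack = np.asarray(warped_annotations)  # (M, H, W)
    # Count, for each label, how many candidates voted for it at each pixel.
    counts = np.stack(
        [(stack == label).sum(axis=0) for label in range(num_labels)], axis=0
    )
    # Ties resolve to the lowest label index.
    return counts.argmax(axis=0)
```

The vote counts play the role of a crude per-pixel likelihood; the full system additionally smooths the result so that neighboring pixels tend to agree.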
[Figure: query (SIFT), candidate set (SIFT, annotation), SIFT flow, warped annotations, parsing, ground truth]
Scene parsing results (1)
[Figure columns: query, best match, annotation of best match, warped best match to query, parsing result of label transfer, ground truth]
Scene parsing results (2)
[Figure columns: query, best match, annotation of best match, warped best match to query, parsing result, ground truth]
Pixel-wise performance
• Our system with optimized parameters: per-pixel rate 74.75%
[Figure: pixel-wise frequency count of each class, annotated "Stuff" and "Small, discrete objects"]
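A per-pixel rate of this kind can be computed as in the sketch below. The exact evaluation protocol is not given on the slide; this version assumes that pixels marked "unlabeled" in the ground truth (label 0 here) are excluded from the count.

```python
import numpy as np


def per_pixel_rate(pred, truth, unlabeled=0):
    """Fraction of labeled ground-truth pixels whose predicted label matches."""
    mask = truth != unlabeled  # assumption: unlabeled pixels are not scored
    return float((pred[mask] == truth[mask]).mean())
```

Because the rate is averaged over pixels rather than classes, frequent large-area classes weigh more heavily in it than small, discrete objects.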
The relative importance of different components of the parsing system
Conclusion
• Label transfer provides a novel, data-driven way to understand scenes.
• Several follow-up works build on this line, e.g. SuperParsing.
• A more robust correspondence approach is needed: e.g. a scale- and rotation-invariant dense descriptor? Lower complexity? One recent work: Jaechul Kim, Ce Liu, Fei Sha and Kristen Grauman. Deformable Spatial Pyramid Matching for Fast Dense Correspondences. CVPR 2013.