The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects

By John Winn & Jamie Shotton
CVPR 2006
presented by Tomasz Malisiewicz
for CMU’s Misc-Read
April 26, 2006
Talk Overview
Objective
CRF → HRF → LayoutCRF
LayoutCRF Potentials
Learning
Inference
Results
Summary
LayoutCRF Objectives
To detect and segment partially occluded objects of a known category
To detect multiple object instances which may occlude each other
To define a part labeling which densely covers the object of interest
To model various types of occlusion (FG/BG, BG/FG, FG/FG)
Conditional Random Field (Lafferty ‘01)
A random field globally conditioned on the observation X
A discriminative framework: we model P(Y|X) directly and do not explicitly model the marginal P(X)
Hidden Random Field (Szummer ’05)
An extension of the CRF with a hidden layer of variables
The hidden variables represent object ‘parts’ in this work
Deterministic Mapping
LayoutCRF
An HRF with asymmetric pair-wise potentials, extended with a set of discrete-valued instance transformations {T_1, …, T_M}
M foreground object instances
LayoutCRF
*Only one non-background class is considered at a time
M+1 instance labels: y_i ∈ {0, 1, …, M}
Each object instance has a separate set of H part labels: h_i ∈ {0, 1, …, H × M}
LayoutCRF
Each transformation T represents the translation and left/right flip of an object instance, indexing all possible integer pixel translations for each flip orientation
Each T is linked to every h_i
LayoutCRF Potentials
Unary potentials: use local information to infer part labels (randomized decision trees)
Asymmetric pair-wise potentials: measure local part compatibilities
Instance potentials: encourage the correct long-range spatial layout of parts for each object instance
LayoutCRF Potentials: Unary
A set of decision trees, each trained on a random subset of the data (improves generalization and efficiency)
Each decision tree returns a distribution over part labels; the outputs of K trees are averaged
Each non-terminal node in a tree evaluates an intensity difference, or absolute intensity difference, between a learned pair of pixels within a window of size D around pixel i, and compares it to a learned threshold
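The node test above can be sketched in a few lines. The offsets, threshold, and window size are illustrative assumptions, not the learned values from the paper:

```python
import numpy as np

def split_test(img, i, j, offset_a, offset_b, threshold, use_abs=False):
    """One non-terminal decision-tree node: compare the (absolute)
    intensity difference of a learned pixel pair, offset from pixel
    (i, j), against a learned threshold. Offsets are constrained to a
    DxD window around the pixel during learning."""
    a = img[i + offset_a[0], j + offset_a[1]]
    b = img[i + offset_b[0], j + offset_b[1]]
    diff = abs(a - b) if use_abs else a - b
    return diff > threshold  # True -> one child, False -> the other
```

At a leaf, the tree would store an empirical distribution over part labels gathered from training pixels reaching that leaf.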
Layout Consistency (for pair-wise potentials)
Colors represent part labels
A label is layout-consistent with itself and with the labels adjacent to it in the grid ordering above
Neighboring pixels whose labels are not layout-consistent do not belong to the same object
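The grid-ordering rule can be sketched for a pixel and its right-hand neighbour: the neighbour's part must be the same part, or the part one column to the right, with up to one row of diagonal slack to allow deformation. The grid size here is an illustrative assumption:

```python
GRID_ROWS, GRID_COLS = 4, 6  # assumed part-grid layout over the object

def part_pos(h):
    """Map a part label (1 .. GRID_ROWS*GRID_COLS) to its (row, col)."""
    return divmod(h - 1, GRID_COLS)

def layout_consistent_right(hi, hj):
    """Is part hj, at the pixel to the right, layout-consistent with hi?
    Same column or one column right, at most one row up or down."""
    ri, ci = part_pos(hi)
    rj, cj = part_pos(hj)
    return cj in (ci, ci + 1) and abs(rj - ri) <= 1
```

An analogous test (transposed) would apply to vertically adjacent pixels.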
Distinguished Transitions
1. Background: h_i and h_j are both BG labels
2. Consistent FG: h_i and h_j are layout-consistent FG labels
3. Object edge: one label is BG, the other is a part label lying on the object edge
4. Class occlusion: one label is an interior FG label, the other is a BG label
5. Instance occlusion: both are FG labels but not layout-consistent (at least one label lies on the object edge)
6. Inconsistent interior FG: both labels are interior FG labels but not layout-consistent (rare)
LayoutCRF Potentials: Pair-wise
The value of the pair-wise potential varies according to the transition type
e_ij is an image-based edge cost which encourages object edges to align with image boundaries
The contrast term is estimated for each image
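A minimal sketch of a transition-dependent pair-wise cost, with the edge term modulating the edge-type transitions. The cost constants and the contrast parameter beta are illustrative placeholders, not the cross-validated values from the paper:

```python
import math

# Illustrative per-transition costs (assumed values, not the learned ones).
COSTS = {
    "background": 0.0,
    "consistent_fg": 0.0,
    "object_edge": 1.0,          # modulated by the edge cost e_ij
    "class_occlusion": 2.0,
    "instance_occlusion": 1.5,   # also modulated by e_ij
    "inconsistent_interior": 4.0,  # rare, heavily penalized
}

def edge_cost(intensity_i, intensity_j, beta=0.5):
    """Contrast-sensitive term e_ij: an object edge is cheap where the
    image gradient between neighbouring pixels is large. beta would be
    estimated per image."""
    return math.exp(-beta * (intensity_i - intensity_j) ** 2)

def pairwise_potential(transition, intensity_i, intensity_j):
    cost = COSTS[transition]
    if transition in ("object_edge", "instance_occlusion"):
        cost *= edge_cost(intensity_i, intensity_j)
    return cost
```

Note how an object-edge transition across a strong image boundary costs almost nothing, while the same transition in a flat region pays the full penalty.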
LayoutCRF Potentials: Instance
Look-up tables (histograms)
Encourage the correct spatial layout of parts for each object instance by gravitating parts towards their expected positions, given the transformation of the instance
A weight controls the strength of the potential; position i is inverse-transformed by the transformation T_m before the look-up
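The look-up can be sketched as follows, with the transformation reduced to an integer translation plus an optional left/right flip. The table layout `table[y][x][part]` and the weight parameter are illustrative assumptions:

```python
def instance_potential(part_label, pos, translation, flip, table, weight=1.0):
    """Score a part label at image position `pos` against the look-up
    table: undo the instance transformation T_m (translate back, then
    optionally mirror), and read the histogram entry for that part."""
    y = pos[0] - translation[0]
    x = pos[1] - translation[1]
    if flip:
        x = len(table[0]) - 1 - x  # mirror within the table width
    if 0 <= y < len(table) and 0 <= x < len(table[0]):
        return weight * table[y][x][part_label]
    return 0.0  # outside the object's bounding box
```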
LayoutCRF: What comes next?
We have just defined the LayoutCRF and its potentials
First we learn the parameters of the LayoutCRF from labeled training data
Then we apply the model to a new image (inference) to obtain a detection and segmentation
Learning (the model parameters)
The supervised algorithm requires foreground/background segmentations, but not part labels
Unary Potential and Part Labeling
The part labeling for the training images is initialized from a dense regular grid fit to the object bounding box
Unary classifiers are learned, then a new labeling is inferred
*Two iterations are sufficient
The dense grid is spatially quantized so that a unique part covers several pixels (on average 8×8)
Learning Pair-wise Potentials
Parameters are learned via cross-validation, searching over a sensible range of positive values
Gradient-based maximum-likelihood learning is too slow (future work: more efficient means of learning these parameters)
Learning Instance Potentials
The deformed part labelings of all training images are aligned on their segmentation-mask centroids
A bounding box is placed around the part labelings, relative to the centroid
For each pixel within the bounding box, the distribution over part labels is learned by histogramming the deformed training-image labels
This gives the empirical distribution over parts h given position w
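The histogramming step above can be sketched as follows; the centroid alignment, box placement, and array shapes are illustrative assumptions:

```python
import numpy as np

def learn_instance_potential(labelings, masks, num_parts, box_h, box_w):
    """For each position inside a bounding box placed around the
    segmentation-mask centroid, histogram the part labels observed
    across the aligned training labelings. Returns a distribution of
    shape (box_h, box_w, num_parts + 1), with label 0 = background."""
    counts = np.zeros((box_h, box_w, num_parts + 1))
    for labels, mask in zip(labelings, masks):
        ys, xs = np.nonzero(mask)
        cy, cx = int(ys.mean()), int(xs.mean())  # mask centroid
        top, left = cy - box_h // 2, cx - box_w // 2
        for r in range(box_h):
            for c in range(box_w):
                y, x = top + r, left + c
                if 0 <= y < labels.shape[0] and 0 <= x < labels.shape[1]:
                    counts[r, c, labels[y, x]] += 1
    # normalize each position's histogram to a distribution over labels
    return counts / counts.sum(axis=-1, keepdims=True).clip(min=1)
```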
Inference (on a novel image)
Initially, we do not know the number of object instances or their locations
Step 1: collapse part labels across instances, merge instance labels together, and remove transformations; MAP inference is performed to obtain a part labeling h*
Inference (on a novel image)
Step 2: determine the number of layout-consistent regions in h* using connected-component analysis; pixels are connected if they are layout-consistent
This gives an estimate of M (the number of object instances) and an initial instance labeling
Estimate T separately for each instance label
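The connected-component step can be sketched with a simple flood fill, where the usual same-label test is replaced by a layout-consistency test on adjacent part labels (the `consistent` predicate is left abstract here):

```python
def connected_instances(h, consistent):
    """Label the layout-consistent connected components of a part-label
    image h (2-D list, 0 = background). `consistent(a, b)` decides
    whether two adjacent labels can belong to the same object. Returns
    an instance-label image and the component count M."""
    H, W = len(h), len(h[0])
    inst = [[0] * W for _ in range(H)]
    m = 0
    for sy in range(H):
        for sx in range(W):
            if inst[sy][sx] or h[sy][sx] == 0:  # done, or background
                continue
            m += 1
            stack = [(sy, sx)]
            inst[sy][sx] = m
            while stack:  # flood fill through consistent neighbours
                y, x = stack.pop()
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if (0 <= ny < H and 0 <= nx < W and not inst[ny][nx]
                            and h[ny][nx] != 0
                            and consistent(h[y][x], h[ny][nx])):
                        inst[ny][nx] = m
                        stack.append((ny, nx))
    return inst, m
```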
Inference (on a novel image)
Step 3: re-run MAP inference with the full model to get the full h, which now distinguishes between instances
Approximate MAP inference via Annealed
Expansion Move Algorithm
Alternating regular-grid expansions at a random offset and standard alpha expansions (for changing to the BG label)
The annealing schedule weakens the pair-wise potential during early stages by raising it to a power less than one
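The annealing of the pair-wise potential can be sketched in one line; the linear schedule for the exponent is an illustrative choice:

```python
def annealed_pairwise(cost, stage, num_stages):
    """Weaken a pair-wise cost early in the expansion-move schedule by
    raising it to a power below one, reaching the full potential
    (power = 1) at the final stage."""
    power = (stage + 1) / num_stages  # grows from 1/num_stages to 1.0
    return cost ** power
```

Early stages thus flatten the differences between costly transitions, letting the expansion moves escape poor initial labelings before the full potentials take effect.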
Results on Cars
*Training on images that contain only one visible car instance
False Positive
Segmentation Accuracy on Cars
Segmentation accuracy was evaluated on 20 randomly chosen images of cars, containing 34 car instances
Per-instance segmentation accuracy: the ratio of the intersection to the union of the detected and ground-truth segmentations = 0.67
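The per-instance metric is the standard intersection-over-union, which can be computed directly on the two binary masks (here represented as sets of pixel coordinates):

```python
def segmentation_iou(detected, ground_truth):
    """Intersection over union of detected and ground-truth masks,
    each given as a set of (row, col) pixel coordinates."""
    detected, ground_truth = set(detected), set(ground_truth)
    union = detected | ground_truth
    return len(detected & ground_truth) / len(union) if union else 1.0
```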
Results on Faces
Multi-class LayoutCRF (Future Work)
Summary
The LayoutCRF detects multiple instances of an object of a given class
A deformed-grid part labeling densely covers the object
Simultaneous detection and segmentation
Questions?
References
J. Lafferty, A. McCallum, and F. Pereira.
Conditional random fields: Probabilistic models
for segmenting and labeling sequence data. In
International Conference on Machine Learning,
2001.
M. Szummer. Learning diagram parts with hidden
random fields. In International Conference on
Document Analysis and Recognition, 2005.
J. Winn and J. Shotton. The Layout Consistent
Random Field for Recognizing and Segmenting
Partially Occluded Objects. In CVPR 2006.