Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)
The need for mid-level representations
• 6 billion images
• 70 billion images
• 1 billion images served daily
• 10 billion images
• 60 hours uploaded per minute
Almost 90% of web traffic is visual!
Discriminative patches
• Visual words are too simple
• Objects are too difficult
• Something in the middle?
(Felzenszwalb et al. 2008)
(Singh et al. 2012)
Mid-level “Visual Elements”
(Singh et al. 2012)
(Doersch et al. 2012)
• Simple enough to be detected easily
• Complex enough to be meaningful
– “Meaningful” as measured by weak labels
Mid-level “Visual Elements”
(Singh et al. 2012)
(Doersch et al. 2012)
• Doersch et al. 2012
• Singh et al. 2012
• Jain et al. 2013
• Endres et al. 2013
• Juneja et al. 2013
• Li et al. 2013
• Sun et al. 2013
• Wang et al. 2013
• Fouhey et al. 2013
• Lee et al. 2013
Our goal
• Provide a mathematical optimization formulation for visual element discovery
• Improve the performance of mid-level representations
Elements as Patch Classifiers
What if the labels are weak?
• E.g. image has horse/no-horse
• (Or even weaker, like Paris/not-Paris)
• Idea: Label these all as “horse”
• Problem: 10,000 patches per image, most of which are unclassifiable.
The weaker the label, the bigger the problem.
Task: Learn to classify Paris from Not-Paris
Paris
Also Paris
Other approaches
• Latent SVM:
– Assumes we have one instance per positive image
• Multiple instance learning
– Not clear how to define the bags
What if the labels are weak?
(Singh et al. 2012)
(Doersch et al. 2012)
• Negatives are negatives, positives might not be positive
• Most of our data can be ignored
• First: how to cluster without clustering everything
Mean shift
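As a reminder of what plain mean shift does, here is a minimal sketch using a flat (uniform) kernel: a centroid is moved repeatedly to the mean of the points that fall within the bandwidth, until it stops moving. This is the textbook procedure, not the discriminative variant introduced later in the talk; the function name, the flat kernel, and the toy data are assumptions made for illustration.

```python
import math


def mean_shift_flat(points, start, bandwidth, max_iter=100, tol=1e-9):
    """Move `start` toward a local density mode using a flat kernel.

    points: sequence of equal-length coordinate tuples (plain Python,
    so the sketch has no third-party dependencies).
    """
    center = list(start)
    for _ in range(max_iter):
        # Points inside the bandwidth get weight 1, all others weight 0.
        inside = [p for p in points if math.dist(p, center) <= bandwidth]
        if not inside:
            break  # nothing nearby, so the centroid cannot move
        new_center = [sum(coords) / len(inside) for coords in zip(*inside)]
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:
            break
    return center


# Toy data (hypothetical): a tight cluster near the origin and a far one.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (5.0, 5.0), (5.2, 5.0)]
mode = mean_shift_flat(points, start=(0.6, 0.6), bandwidth=1.0)
```

Only the points inside the window are touched at each step, which is the sense in which mean shift can "cluster without clustering everything".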
Patch distances
Input Nearest neighbor
Min distance:
2.59e-4
Max distance: 1.22e-4
Mean shift
Negative Set
Paris
Not Paris
Density Ratios
Paris
Not Paris
Adaptive Bandwidth
[Figure labels: Positive, Negative, Bandwidth]
Discriminative Mode Seeking
• Find local optima of an estimate of the density ratio
• Allow an adaptive bandwidth
• Be extremely fast
– Minimize the number of passes through the data
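To make the first two bullets concrete, here is a rough sketch of a density-ratio estimate with an adaptive bandwidth. It is a deliberately simplified stand-in, not the formulation used in the talk: the bandwidth at each centroid is set to the distance of its `k_neg`-th nearest negative, and the ratio is the count of positives to negatives inside that radius. The function name, `k_neg`, and the toy points are all assumptions.

```python
import math


def adaptive_density_ratio(center, positives, negatives, k_neg=1):
    """Return (ratio, bandwidth) for a flat-kernel density-ratio estimate.

    The bandwidth adapts to the data: it grows until it contains the
    k_neg nearest negatives, so sparse regions get wide windows and
    dense regions get narrow ones.
    """
    neg_dists = sorted(math.dist(center, n) for n in negatives)
    if not 1 <= k_neg <= len(neg_dists):
        raise ValueError("k_neg must be between 1 and the number of negatives")
    bandwidth = neg_dists[k_neg - 1]
    n_pos = sum(1 for p in positives if math.dist(center, p) <= bandwidth)
    # At least k_neg negatives lie inside, so the denominator is never zero.
    n_neg = sum(1 for d in neg_dists if d <= bandwidth)
    return n_pos / n_neg, bandwidth


# Toy data (hypothetical): positives cluster at the origin.
positives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
negatives = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
ratio_a, bw_a = adaptive_density_ratio((0.0, 0.0), positives, negatives)
ratio_b, bw_b = adaptive_density_ratio((3.0, 3.0), positives, negatives)
```

A centroid sitting in a region dense with positives and sparse in negatives gets a higher ratio, which is the quantity a discriminative mode seeker would try to climb.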
Discriminative Mode Seeking
• Mean shift: maximize (w.r.t. w)
[Equation not preserved; its annotated terms: bandwidth (b), patch feature, distance, centroid (w)]
Discriminative Mode Seeking
B(w) is the value of b satisfying:
Discriminative Mode Seeking
[Optimization objective and its constraint (s.t.): equations not preserved]
• Distance metric: Normalized Correlation
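One reason normalized correlation is a convenient metric: for mean-centred, unit-norm vectors, the squared Euclidean distance equals 2 − 2·(dot product), so a large inner product w·x means a small distance. The sketch below checks that identity numerically. The exact normalization applied to patch features in the talk is not stated here, so `normalize` is an assumption.

```python
import math


def normalize(v):
    """Subtract the mean and scale to unit Euclidean norm."""
    mean = sum(v) / len(v)
    centred = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centred))
    if norm == 0:
        raise ValueError("constant vector has no defined normalized correlation")
    return [x / norm for x in centred]


def normalized_correlation(a, b):
    """Inner product of the normalized vectors, a value in [-1, 1]."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def squared_distance_normalized(a, b):
    """Squared Euclidean distance between the normalized vectors."""
    return sum((x - y) ** 2 for x, y in zip(normalize(a), normalize(b)))


# Hypothetical feature vectors, for illustration only.
a, b = [1.0, 2.0, 4.0], [3.0, 1.0, 2.0]
corr = normalized_correlation(a, b)
sq_dist = squared_distance_normalized(a, b)
```

Under this metric, a linear score w·x and a distance to the centroid w carry the same information, which is what lets a cluster centroid double as a linear patch detector.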
[Figure labels: Positive, Negative, w]
Optimization
• Initialization is straightforward
• For each element, keep only the ~500 patches where w^T x − b > 0
• Trivially parallelizable in MapReduce.
• Optimization is piecewise quadratic
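A small sketch of two steps that the bullets above suggest. It assumes the constraint has the hinge form Σ_neg max(w·x − b, 0) = β, which matches the w^T x − b > 0 test on this slide but is not confirmed by the text preserved here. `beta`, the function names, and the choice to keep the highest-scoring active patches are hypothetical. Because the hinge sum is piecewise linear and decreasing in b, it can be solved exactly after one sort.

```python
def hinge_mass(scores, b):
    """Sum of max(s - b, 0) over the given scores s = w·x."""
    return sum(max(s - b, 0.0) for s in scores)


def solve_bandwidth(neg_scores, beta):
    """Find b with hinge_mass(neg_scores, b) == beta (requires beta > 0)."""
    if beta <= 0 or not neg_scores:
        raise ValueError("need beta > 0 and at least one negative score")
    ordered = sorted(neg_scores, reverse=True)
    prefix = 0.0
    for k, score in enumerate(ordered, start=1):
        prefix += score
        # If exactly the top-k scores are active: prefix - k*b = beta.
        b = (prefix - beta) / k
        if k == len(ordered) or b >= ordered[k]:
            return b


def active_patches(scores, b, limit=500):
    """Indices of patches with w·x - b > 0, highest score first, capped."""
    active = [i for i, s in enumerate(scores) if s - b > 0]
    active.sort(key=lambda i: scores[i], reverse=True)
    return active[:limit]


# Hypothetical scores w·x for negative and positive patches.
neg_scores = [3.0, 1.0, 0.0]
b = solve_bandwidth(neg_scores, beta=1.0)
kept = active_patches([2.5, 1.0, 3.0], b)
```

Since only the patches with a positive margin contribute to the hinge terms, storing a few hundred of them per element is enough to evaluate and update that element, which fits the remark about parallelizing across elements.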
Evaluation via Purity-Coverage Plot
• Analogous to Precision-Recall Plot
[Figure: Elements 1 to 5 shown for two cases: low purity; high purity with low coverage]
Purity-Coverage Curve
[Plot: purity (0 to 1) versus coverage (0 to 10, x1e4 pixels); legend: Paris, Not Paris]
Purity-Coverage Curve
• Coverage for multiple elements is simply the union.
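One way such a curve could be computed is sketched below, by analogy with a precision-recall sweep. It assumes purity is the fraction of detections above a score threshold that come from the positive set, and coverage is the number of distinct positive-set pixels covered by those detections (the union, so overlapping detections are not double counted). The exact definitions used in the talk may differ, and the tuple layout and names are illustrative.

```python
def purity_coverage_curve(detections):
    """Sweep a score threshold from high to low.

    detections: list of (score, is_positive, pixels), where `pixels` is a
    set of hashable pixel ids covered by that detection.
    Returns a list of (coverage, purity) points, one per detection.
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    covered = set()   # union of pixels from positive-set detections so far
    n_positive = 0
    curve = []
    for rank, (_score, is_positive, pixels) in enumerate(ordered, start=1):
        if is_positive:
            n_positive += 1
            covered |= pixels
        curve.append((len(covered), n_positive / rank))
    return curve


# Hypothetical detections pooled from several elements.
detections = [
    (0.9, True, {1, 2}),
    (0.8, True, {2, 3}),   # overlaps the first detection on pixel 2
    (0.5, False, {7}),     # fires on the negative set: lowers purity
    (0.4, True, {4}),
]
curve = purity_coverage_curve(detections)
```

Pooling the detections of several elements before the sweep gives the multi-element curve, since the covered set is already a union.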
Purity-Coverage
[Plots: purity (0.8 to 1) versus coverage (fraction of positive dataset); panels: Top 25 Elements (coverage 0 to 0.5), Top 200 Elements (coverage 0 to 0.8)]
Methods compared:
• This work
• This work, no inter-element
• SVM Retrained 5x (Doersch et al. 2012)
• LDA Retrained 5x
• LDA Retrained
• Exemplar LDA (Hariharan et al. 2012)
Results on Indoor 67 Scenes
Kitchen
Elevator
Grocery
Bakery
Bowling
Bathroom
Results on Indoor 67 Scenes
Method                              Accuracy (%)
ROI+Gist (Quattoni et al.)          26.05
MM-Scene (Zhu et al.)               28.00
Scene-DPM (Pandey et al.)           30.40
CENTRIST (Wu et al.)                36.90
Object Bank (Li et al.)             37.60
RBoW (Parizi et al.)                37.93
Discr. Patches (Singh et al.)       38.10
Latent Pyramid. (Sadeghi et al.)    44.84
Bag of Parts (Juneja et al.)        46.10
miSVM (Li et al.)                   46.40
D. Patches (full) (Singh et al.)    49.40
MMDL (Wang et al.)                  50.15
Discr. Parts (Sun et al.)           51.40
IFV (Juneja et al.)                 60.77
Bag of Parts+IFV (Juneja et al.)    63.10
Ours (no inter-element)             63.36
Ours                                64.03
Ours+IFV                            66.87
Qualitative Indoor67 Results
Indoor67: Error Analysis
Ground Truth (GT): deli
Guess: grocery store
GT: corridor
Guess: staircase
GT: museum
Guess: garage
GT: laundromat
Guess: closet
Thank you!
More results at
http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/
• Paris Elements
• Indoor 67 Elements
• Source code (soon)
• Indoor 67 Heatmaps
• Some New Paris Elements