HOGgles: Visualizing Object Detection Features (to be appeared in

advertisement
HOGgles: Visualizing Object
Detection Features (to be appeared in ICCV 2013)
Carl Vondrick Aditya Khosla Tomasz Malisiewicz and Antonio Torralba ,MIT
Presented By: Yonatan Dishon
Nov 2013
Today talk



Motivation
Related Work
HOGgles under the hood.






3 Baselines + main algorithm
Limitations
Quantative + Qualitative evaluation
Human+HOG detector
Paper Conclusion
Future Development
So why HOGgles?
Image from: C. Vondrick, A. Khosla, T. Malisiewicz, A. Torralba. "HOGgles: Visualizing Object Detection Features" 2013
So why HOGgles?
!?
Maybe I should put my HOGgles!!
HOGgles Visualization
So why HOGgles?


This is a visualization of the descriptor space – this is what a
classification/detection algorithm sees!
So which of the following would you classify as a car?
1
2
3
4
5
6
7

Remember Humans are the “perfect” classifier!

Object categories (PASCAL VOC): Airplane, Bicycle, Bird, Boat , Bottle, Bus, Car, Cat,
Chair, Cow, Table, Dog, Horse, Motorbike, Person, Potted Plant, Sheep, Sofa, Train,
TV/Monitor
Motivation

So why did my detector failed?




Training set Maybe not a good one?
Learning Algorithm Maybe should have been different?
Features Maybe there are better features for this kind of
problem?
Visualizing the Feature space can bring us to an
intuitive understanding of our detection system
limitations and failures
HOGgles Contributions




A tool to explain some of the failure of object detection
systems.
Algorithm to present the feature space of object
detectors (general features – not only HOG!).
4 different algorithms to do so are presented.
Public feature visualization toolbox http://web.mit.edu/vondrick/ihog/#code
Today talk



Motivation
Related Work
HOGgles under the hood.






3 Baselines + main algorithm
Limitations
Quantative + Qualitative evaluation
Human+HOG detector
Paper Conclusion
Future Development
Related Work (1)

Reconstruct an image given keypoint of SIFT based
on a huge database
P. Weinzaepfel, H. J´egou, and P. P´erez. Reconstructing an image
from its local descriptors. In CVPR, 2011

Image from: P. Weinzaepfel, H. J´egou, and P. P´erez. Reconstructing an image from its local descriptors. In CVPR, 2011.
Related Work (1)
Related Work (1)
calculate SIFT
elliptic
region of interest
copied patches
blending
affine normalization
to square patch
interpolation
original image
Reconstructing an image from its local descriptors, Philippe Weinzaepfel, Hervé Jégou and Patrick Pérez, Proc. IEEE CVPR’11.
Slide credit: Ezgi Mercan
Related Work (1)


Image from: P. Weinzaepfel, H. J´egou, and P. P´erez. Reconstructing an image from its local
descriptors. In CVPR, 2011.
Website for more examples :
http://www.irisa.fr/texmex/people/jegou/projects/reconstructing/index.html
Related Work (2)

Reconstruct an image given only LBP features
E. d’Angelo, A. Alahi, and P.Vandergheynst. Beyond bits: Reconstructing images from
local binary descriptors. ICPR, 2012. 2





Using LBD descriptors (Local Binary Descriptors)
– BRIEF and FREAK
No external information!
Reconstruction as a regularized inverse problem
A. Alahi, R. Ortiz, and P.Vandergheynst. FREAK: Fast Retina Keypoint. In IEEE Conference on Computer
Vision and Pattern Recognition (To Appear), 2012.
M. Calonder,V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary Robust Independent Elementary Features.
Computer Vision–ECCV 2010, pages 778–792, 2010.
Related Work (2)
Image reconstruction results of
FREAK (top row) and BRIEF (bottom row) descriptors.
E. d’Angelo, A. Alahi, and P.Vandergheynst. Beyond bits: Reconstructing images from local binary descriptors.
ICPR, 2012. 2
Today talk



Motivation
Related Work
HOGgles under the hood.






3 Baselines + main algorithm
Limitations
Quantative + Qualitative evaluation
Human+HOG detector
Paper Conclusion
Future Development
HOGgles – under the hood



The problem of feature visualization as a feature inversion
problem.
Given a feature vector - what was the image/patch
that created it?
D
Let x R be an image and y    x  be the
corresponding HOG feature descriptor.
  is a many to one function.
The inversion problem cannot be solved
analytically!
HOGgles under the hood

The problem is formalized as an optimization problem –
given a descriptor y we seek an image x that minimize:
x 
*
1
 y   arg min   x   y 2
2
xRD


This function isn’t convex
Trying to find a minima with ordinary optimization
algorithms didn’t work (Steepest decent and Newton’s
method).
Today talk



Motivation
Related Work
HOGgles under the hood.






3 Baselines + main algorithm
Limitations
Quantative + Qualitative evaluation
Human+HOG detector
Paper Conclusion
Future Development
HOGgles under the hood

4 algorithms are showed – 3 as a baselines and one is
offered as the main algorithm.
Baseline 1: Exampler LDA

Baseline 2: Ridge Regression

Baseline 3: Direct Optimization

Main Algorithm : Paired Dictionary

HOGgles under the hood (Baseline 1 -ELDA)

Baseline 1: Exampler LDA (B.Hariharan, , J.Malik & D.Ramanan ECCV 2012)

HOG inverse is the average of the top K detections of the ELDA detector in RGB space .
 1  y 
HOGgles under the hood (Baseline 1 -ELDA)
Slide credit: Ezgi Mercan
HOGgles under the hood (Baseline 1 -ELDA)

PROs:



Simple.
Surprisingly accurate results! Even when the database doesn’t
contain the category of HOG template!
CONs:


Computationally expensive! – running an object detector over
a large database
Yields blurred results.
HOGgles under the hood (Baseline 2 –Ridge
Regression)

Baseline 2: Ridge Regression




Statistically most likely image given a HOG descriptor.
Calculating the most probable grayscale image given its
HOG feature. P  X | Y
Modeling P  X, Y  as N   ,  
The HOG inverse (the visualization) is given by
 1B  y   XY 1YY  y  Y   X
are estimated on a large database
Single matrix multiplication!
  and X

HOGgles under the hood (Baseline 2 –Ridge
Regression)

PROs:



Simple – this is a matrix multiplication!
Very fast! – under a second for inversion.
CONs:

Inversion yields blur images.
HOGgles under the hood (Baseline 3 -Direct)

Baseline 3: Direct Optimization
Describing a natural image basis U  R DK
K
D
 Any image x  R can be encoded by coefficients   R
in this basis: x  U 
2
 And we wish to find  *  arg min  U    y 2

R K
HOGgles under the hood (Baseline 3 -Direct)

PROs:


Recover high frequencies
CONs:

adding noise to the image
HOGgles under the hood (the main
algorithm –Pair Dictionary)


Let x  R D be an image and y  Rd be its HOG descriptor.
Suppose we write x and y in terms of bases U  R DK and
V  R d  K respectively where  are shared coefficients
y can be projected to V basis and then to image basis U
HOGgles under the hood (Main Algorithm –
PairDict)
V  R d K
U  R DK
HOGgles under the hood (Baseline 4 –
PairDict)

How the bases U and V are found?

Solving Pair dictionaries learning problem.

The objective is simplified to a standard sparse coding and
* 
dictionary learning problem, optimized with SPAMS
* J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In ICML, 2009. 4
SPAMS – SParse Modeling Software, Code available at http://spams-devel.gforge.inria.fr/
HOGgles under the hood (Baseline 4 –
PairDict)




Dictionaries optimization time is a few hours (offline).
3
U and V are estimated with dictionary size K  10
Training samples from a large database N  106
examples for U & V pairs – notice the correlation of the
dictionaries
HOGgles visual results
Today talk



Motivation
Related Work
HOGgles under the hood.






3 Baselines + main algorithm
Limitations
Quantative + Qualitative evaluation
Human+HOG detector
Paper Conclusion
Future Development
HOGgles – Limitations

There exist a better visualization than paired dictionaries
although it may not be traceable to construct it.

On Recursive iterative solution some high frequencies are
lost.
HOGgles – Limitations – cont.

Inversion is sensitive to the HOG dimensionality
Today talk



Motivation
Related Work
HOGgles under the hood.






3 Baselines + main algorithm
Limitations
Quantative + Qualitative evaluation
Human+HOG detector
Paper Conclusion
Future Development
Evaluation of Inversions



PASCAL VOC 2011 dataset.
Inverting patches correspond to objects.
Quantitative evaluation:


How well each pixel in x is reconstructed from y by each
Algorithm?
Qualitative evaluation:

How well the high level content is saved? (Human research
using MTurk platform).
Evaluation of Inversions
Quantitative

Mean normalized cross correlation of inverse image to
ground truth. (Higher is better , max. is 1)
Evaluation of Performance
ELDA
Ridge
Direct
PairDict
0.80
0.75
0.71
0.70
0.67
0.65
0.64
0.65
0.62
0.64
0.61
0.59
0.60
0.56
0.55
0.50
0.45
0.40
bicycle
bottle
car
cat
chair
table
motorbike
person
Mean
Evaluation of Inversions
Qualitative


MTurk workers where asked to classified inversions to 1
of 20 categories.
MIT PhD. In CV refers as experts. Trying the same task
with HOG Glyphs.
(*) Numbers are percentage classified correctly, chance is 0.05
Evaluation of Inversions
Qualitative
Evaluation of Vizualization Performance
0.70
ELDA
Ridge
Direct
PairDict
Glyph
Expert
0.68
0.59
0.60
0.50
0.45
0.39
0.40
0.38
0.31
0.30
0.24
0.20
0.20
0.22
0.10
0.00
bicycle
bottle
car
-0.10

Graphs credit : Ezgi Mercan
cat
chair
table
motorbike
person
Mean
Evaluation of Inversions
Qualitative

Gliphs vs. HOGgles
Today talk



Motivation
Related Work
HOGgles under the hood.






3 Baselines + main algorithm
Limitations
Quantative + Qualitative evaluation
Human+HOG detector
Paper Conclusion
Future Development
Human+HOG detector

Get Insight on performance of HOG with the perfect
learning algorithm (people).

Large Human Expirement consist of:




MTurk workers
Dataset – Top detections from DPM on PASCAL 2007 VOC.
5000 windows per category. 20% are true positive.
25 votes per window.
Human+HOG detector
Results
Paper Conclusions




DPM is operating very close to the performance limit of
HOG.
HOG may be too lossy of a descriptor for high
performance object detection.
The features we are using are the one to blame in current
novel object detection algorithms.
To advance to the next level recognition – finer details
and higher level information capturing features are
needed to be built.
Written Level of the Paper

Pros:






Well written
Well referenced
Novel solution
Large and detailed Human experiment
Website.
Cons:



Details/examples on baseline algorithms are lacking
The chosen algorithm settings are lacking
Some conclusions on the Human+HOG are questionable
Future development




Applying visualization on other descriptors.
Color reconstruction of HOG features.
Developing new feature that can be more discriminate.
Developing Algorithms that better model the interaction
of the simple atoms of the image.
Links



PASCAL 2 challenge
Deformable Part Model
HOGgles
Download