Image and video descriptors - Weizmann Institute of Science

Image and video descriptors
Advanced Topics in Computer Vision
Spring 2010
Weizmann Institute of Science
Oded Shahar and Gil Levi
Outline
• Overview
• Image Descriptors
– Histograms of Oriented Gradients Descriptors
– Shape Descriptors
– Color Descriptors
• Video Descriptors
Overview - Motivation
• The problem we are trying to solve is
image similarity.
• Given two images (or image regions) – are
they similar or not?
Overview - Motivation
• Solution: Image Descriptors.
• An image descriptor “describes” a region
in an image.
• To compare two such regions we will
compare their descriptors.
Overview - Descriptor
To compare two images, we will compare
their descriptors
[Figure: each image is passed through a descriptor function; are the resulting descriptors similar?]
Overview - Similarity
• But what is “similar” to you?
• It depends on the application!
Overview
• Image (or region) similarity is used in
many CV applications, for example:
– Object recognition
– Scene classification
– Image registration
– Image retrieval
– Robot localization
– Template matching
– Building panorama
– And many more…
Overview
• Example – 3D reconstruction from stereo
images.
[Figure: raw pixel intensity values of two image patches from the stereo pair]
• Comparing the raw pixel values directly will not
work!
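A minimal sketch of why raw pixel comparison is fragile. The patch values and the `ssd` helper are illustrative assumptions, not part of the lecture: the same content under a uniform brightness increase already produces a large sum of squared differences.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))

# Toy 3x3 patch (hypothetical values) and the same patch made 40 levels brighter.
patch = [[3, 75, 12], [80, 15, 30], [39, 80, 102]]
brighter = [[v + 40 for v in row] for row in patch]

same_content_score = ssd(patch, brighter)  # 9 pixels * 40^2 = 14400
identical_score = ssd(patch, patch)        # 0
```

The content did not change, yet the pixel-wise score is far from zero, which is the motivation for descriptors with built-in invariance.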
Overview
• Descriptors provide a means for
comparing images or image regions.
• Descriptors allow certain differences
between the regions – scale, rotation,
illumination changes, noise, shape, etc.
Overview - Motivation
• Again, we cannot take the pixels alone…
[Figure: each image is passed through a descriptor function; are the resulting descriptors similar?]
Overview
Commonly used as follows:
1. Extract features from the image as small regions
2. Describe each region using a feature descriptor
3. Use the descriptors in application (comparison, training
a classifier, etc.)
Overview
• Main problems
– Feature Detection: where to compute the descriptors? (covered briefly)
– Feature Description (Descriptors): how to compute descriptors? (today's topic)
– Feature Comparison: how to compare two descriptors? (covered briefly)
Overview - Feature Detection
Detection Methods
Where to compute the descriptors?
• Grid
• Key-Points
• Global
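For the grid option, a minimal sketch of choosing descriptor locations on a regular grid. The function name and the `step` and `margin` parameters are illustrative assumptions.

```python
def grid_locations(width, height, step, margin):
    """Return (x, y) centres on a regular grid, keeping `margin` pixels from the border."""
    return [(x, y)
            for y in range(margin, height - margin, step)
            for x in range(margin, width - margin, step)]

# Example: a 10x10 image sampled every 4 pixels with a 2-pixel margin.
locations = grid_locations(10, 10, step=4, margin=2)
```

Key-point detection would replace this fixed grid with locations chosen by a detector, and the global option uses a single descriptor for the whole image.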
Overview - Feature Detection
Key-Points as Detector Output
• Can be
– Points
– Regions (of different orientation, scale and affine transformation): squares, ellipses, circles, etc.
Overview – Descriptor Comparison
Given two region descriptions, how do we
compare them?
• Usually a descriptor comes with its own
distance function
• Many descriptors use the L2 distance
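A minimal sketch of L2 comparison and nearest-descriptor matching. The helper names are illustrative assumptions; descriptors are taken to be plain lists of numbers of equal length.

```python
import math

def l2_distance(d1, d2):
    """Euclidean (L2) distance between two descriptors of equal length."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def nearest_descriptor(query, candidates):
    """Index of the candidate descriptor closest to `query` under L2."""
    return min(range(len(candidates)),
               key=lambda i: l2_distance(query, candidates[i]))
```

For example, `l2_distance([0, 0], [3, 4])` evaluates to `5.0`.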
Overview – Descriptor Invariance
• Different descriptors measure different kinds of similarity
• Descriptors can be invariant to visual
effects
– Illumination
– Noise
– Colors
– Texture
• Different applications require different invariances
and therefore different descriptors
Descriptors
Types of descriptors
• Intensity based
• Histogram
• Gradient based
• Color Based
• Frequency
• Shape
• Combination of the above
Descriptors
Why not use patches?
• Very large representation.
• Not invariant to small deformations in the descriptor location.
• Not invariant to changes in illumination.
Descriptors
Intensity Histogram
[Figure: histogram of pixel intensities over the range 0 to 255]
• Not invariant to light intensity change
• Does not capture geometric information
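A minimal sketch of an intensity histogram that illustrates both limitations. The bin count and helper name are illustrative assumptions; pixels are integers in 0..255.

```python
def intensity_histogram(pixels, bins=8, max_value=255):
    """Count pixels per intensity bin; `pixels` is a flat list of ints in 0..max_value."""
    hist = [0] * bins
    for v in pixels:
        idx = min(v * bins // (max_value + 1), bins - 1)
        hist[idx] += 1
    return hist

pixels = [0, 10, 200, 255]
rearranged = list(reversed(pixels))             # same values, different layout
brighter = [min(v + 50, 255) for v in pixels]   # light intensity change

h_original = intensity_histogram(pixels)
h_rearranged = intensity_histogram(rearranged)  # identical: geometry is lost
h_brighter = intensity_histogram(brighter)      # different: not invariant
```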
Descriptors
Histogram of image gradients
• Normalize for light intensity invariance
• Does not capture geometric information
Descriptors
Solution:
• Divide the area into sections
• For each section, compute its own histogram
SIFT - David Lowe 1999
Descriptors - SIFT
How to compute the SIFT descriptor
Input: an image and a location at which to compute the descriptor
Step 1: Warp the image to the correct orientation and scale,
and then extract the feature as a 16x16 pixel patch
Descriptors - SIFT
Step 2: Compute the gradient for each pixel (direction and magnitude)
Step 3: Divide the 16x16 pixels into 16 squares of 4x4 pixels each
Descriptors - SIFT
Step 4: For each square, compute a gradient direction histogram over 8
directions.
The result: a 128-dimensional feature vector (16 squares x 8 directions).
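Steps 2 to 4 can be sketched as follows. This is a simplified illustration, not Lowe's implementation: it assumes the 16x16 patch has already been warped (Step 1), uses clamped central differences for the gradient, and omits Gaussian weighting and bin interpolation. The function name is an assumption.

```python
import math

def sift_like_descriptor(patch):
    """patch: 16x16 list of lists of intensities. Returns a unit-length list of 128 floats."""
    n = 16
    if len(patch) != n or any(len(row) != n for row in patch):
        raise ValueError("expected a 16x16 patch")
    hist = [0.0] * 128
    for y in range(n):
        for x in range(n):
            # Step 2: gradient by central differences (clamped at the border).
            dx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            direction = int(angle / (2 * math.pi) * 8) % 8
            # Step 3: which of the 16 squares (4x4 pixels each) this pixel is in.
            cell = (y // 4) * 4 + (x // 4)
            # Step 4: vote into that square's 8-direction histogram.
            hist[cell * 8 + direction] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist
```

On a patch whose intensity increases from left to right, every gradient points in direction 0, so only the first bin of each square's histogram is non-zero.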
Descriptors - SIFT
• Warp the feature into a 16x16 square.
• Divide it into 16 squares of 4x4 pixels.
• For each square, compute a histogram of the gradient
directions.
=> Feature vector (128 dimensions)
Descriptors - SIFT
• Histogram votes are weighted by gradient magnitude and by a Gaussian window (σ is
half the window size)
• Normalize the feature to a unit vector
• Use the L2 distance to compare features
Other distance functions can also be used:
• χ² (chi-square)
• Earth mover’s distance
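A minimal sketch of the chi-square distance between two histograms. This uses one common definition; conventions differ on the factor 1/2, and the `eps` guard against empty bins is an assumption of this sketch.

```python
def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance: 0.5 * sum((a - b)^2 / (a + b)) over non-empty bins."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same length")
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2)
                     if a + b > eps)
```

Unlike L2, each bin's difference is scaled by the bin's total mass, so a fixed difference counts for more in sparsely populated bins.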
Descriptors - SIFT
Invariance to illumination
• Gradients are invariant to a light intensity shift (i.e. adding
a scalar to all the pixels)
• Normalization to unit length adds invariance to a light
intensity change (i.e. multiplying all the pixels by a scalar)
Invariance to shift and rotation
• A single histogram does not contain any geometric
information
• Using 16 histograms preserves some
geometric information.
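The two illumination arguments can be checked on a toy 1D signal. This is an illustrative sketch with hypothetical values: differences stand in for image gradients.

```python
import math

def gradient(signal):
    """Forward differences of a 1D signal."""
    return [b - a for a, b in zip(signal, signal[1:])]

def normalize(vector):
    """Scale a vector to unit L2 length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

signal = [10, 12, 20, 18, 30]
shifted = [v + 25 for v in signal]  # light intensity shift
scaled = [v * 3 for v in signal]    # light intensity change

# The shift cancels in the differences; the scale cancels after normalization.
same_after_shift = gradient(shifted) == gradient(signal)
max_gap_after_scale = max(abs(a - b) for a, b in
                          zip(normalize(gradient(scaled)), normalize(gradient(signal))))
```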
Descriptors - GLOH
• Similar to SIFT
• Divide the feature into log-polar bins instead of square cells.
– 17 log-polar location bins
– 16 orientation bins
– We get 17x16=272 dimensions.
• Apply PCA to the 272 dimensions and keep 128 components
K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. TPAMI 2005
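A minimal sketch of how 17 log-polar location bins can arise: one undivided central bin plus two rings of 8 angular sectors each. The radii 6, 11 and 15 are taken from the cited evaluation paper as recalled and should be treated as assumptions, as should the function name and bin numbering.

```python
import math

def gloh_location_bin(dx, dy, radii=(6.0, 11.0, 15.0), n_angles=8):
    """Map an offset (dx, dy) from the feature centre to a location bin in 0..16.

    Returns None for offsets outside the outer radius.
    """
    r = math.hypot(dx, dy)
    if r > radii[-1]:
        return None
    if r <= radii[0]:
        return 0  # central bin is not split by angle
    ring = 1 if r <= radii[1] else 2
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi) * n_angles) % n_angles
    return 1 + (ring - 1) * n_angles + sector
```

Each of the 17 location bins then holds a 16-bin orientation histogram, giving the 272 dimensions before PCA.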
SURF
• Uses integral images to detect and describe SIFT-like
features
• SURF describes an image about 3 times faster than
SIFT
• SURF is not as good as SIFT at invariance to
illumination change and viewpoint change
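A minimal sketch of an integral image, the data structure behind SURF's speed: once it is built, the sum over any axis-aligned box costs four lookups regardless of the box size. The function names and the half-open box convention are assumptions of this sketch.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
lower_right = box_sum(ii, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```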
Descriptors
Histograms of Oriented Gradients Descriptors
SIFT: Lowe D., 1999
GLOH: Mikolajczyk K., Schmid C., 2005
SURF: Bay H., Ess A., Tuytelaars T., Van Gool L., 2008
Descriptors - Shape Context
[Figure: two shapes; are they the same?]
Assume we have a good edge detector
Take a patch of edges?
Not invariant to small deformations in the shape
Descriptors - Shape Context
• Quantize the edge map around a point using log-polar binning
• In each bin, count the number of edge points
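A minimal sketch of a shape context histogram for one reference point. The bin counts (5 radial, 12 angular) and the radius range 0.125 to 2 are commonly quoted settings used here as assumptions; normalizing by the mean distance to the reference point is a simplification of this sketch.

```python
import math

def shape_context(points, center, n_radial=5, n_angular=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of edge points around `center`.

    points: list of (x, y) edge points; returns hist[radial_bin][angular_bin].
    """
    cx, cy = center
    offsets = [(x - cx, y - cy) for x, y in points if (x, y) != center]
    hist = [[0] * n_angular for _ in range(n_radial)]
    if not offsets:
        return hist
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_dist = sum(dists) / len(dists)  # normalizing by it gives scale invariance
    log_min, log_max = math.log(r_min), math.log(r_max)
    for (dx, dy), d in zip(offsets, dists):
        r = d / mean_dist
        if r < r_min or r >= r_max:
            continue  # outside the log-polar grid
        rbin = int((math.log(r) - log_min) / (log_max - log_min) * n_radial)
        rbin = min(rbin, n_radial - 1)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        abin = int(angle / (2 * math.pi) * n_angular) % n_angular
        hist[rbin][abin] += 1
    return hist
```

Because the radial bins are logarithmic, nearby edge points are located precisely while distant ones are only coarsely binned, which tolerates small deformations of the shape.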
Complex Notion of Similarity
The Local Self-Similarity Descriptor
[Figure: input image → correlation surface, e^(−SSD) → image descriptor]
The Local Self-Similarity Descriptor
[Figure: input image → correlation surface → MAX → image descriptor]
Properties & Benefits:
1. A unified treatment of repetitive patterns, color, texture, edges
2. Captures the shape of a local region
3. Invariant to appearance
4. Accounts for small local affine & non-rigid deformations
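A minimal sketch of the self-similarity idea: compare a small central patch with every patch in its surrounding region by SSD, turn the SSD into a similarity with exp(−SSD / variance), and keep the maximum similarity in each log-polar bin. The patch and region sizes, the bin counts, and the constant `variance` are illustrative assumptions, not the authors' settings.

```python
import math

def patch_ssd(img, y0, x0, y1, x1, half):
    """SSD between the patches centred at (y0, x0) and (y1, x1)."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = img[y0 + dy][x0 + dx] - img[y1 + dy][x1 + dx]
            total += diff * diff
    return total

def correlation_surface(img, cy, cx, patch_half=1, region_half=4, variance=1000.0):
    """Similarity of the central patch to each offset patch in the region."""
    surface = {}
    for oy in range(-region_half, region_half + 1):
        for ox in range(-region_half, region_half + 1):
            d = patch_ssd(img, cy, cx, cy + oy, cx + ox, patch_half)
            surface[(oy, ox)] = math.exp(-d / variance)
    return surface

def self_similarity_descriptor(surface, region_half=4, n_radial=3, n_angular=8):
    """Maximum similarity in each log-polar bin of the correlation surface."""
    r_max = region_half * math.sqrt(2.0)
    desc = [0.0] * (n_radial * n_angular)
    for (oy, ox), s in surface.items():
        r = math.hypot(oy, ox)
        if r == 0:
            continue  # skip the patch compared with itself
        rbin = min(int(math.log(r) / math.log(r_max) * n_radial), n_radial - 1)
        angle = math.atan2(oy, ox) % (2 * math.pi)
        abin = int(angle / (2 * math.pi) * n_angular) % n_angular
        idx = rbin * n_angular + abin
        desc[idx] = max(desc[idx], s)
    return desc
```

Taking the maximum per bin is what gives tolerance to small local deformations: a similar patch may move within its bin without changing the descriptor. The centre must lie at least `patch_half + region_half` pixels from the image border.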
[Figure: a template image matched against images that differ in color, texture and edges]
Descriptors
Shape Descriptors
Allows measuring of shape similarity
Shape Context
Belongie S., Malik J., Puzicha J. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 2002.
Local Self-Similarity
Shechtman E., Irani M. Matching Local Self-Similarities across Images and Videos. CVPR, 2007.
Geometric Blur
Berg A. C., Malik J. Geometric Blur for Template Matching. CVPR, 2001.
Shown to outperform the commonly used SIFT in an object classification task:
Horster E., Greif T., Lienhart R., Slaney M. Comparing local feature descriptors in pLSA-based image models.
Color Descriptors
Color spaces
• RGB
• HSV
• Opponent
Color Descriptors
Opponent color space
• Intensity information is represented by channel O3
• Color information is represented by channels O1 and O2
• O1 and O2 are invariant to an intensity offset (the same value added to R, G and B)
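A minimal sketch of the RGB to opponent conversion, using the commonly quoted definition O1 = (R − G)/√2, O2 = (R + G − 2B)/√6, O3 = (R + G + B)/√3. The slides do not spell the formula out, so treat it as an assumption here.

```python
import math

def rgb_to_opponent(r, g, b):
    """Convert one RGB pixel to opponent channels (O1, O2, O3)."""
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return o1, o2, o3

pixel = rgb_to_opponent(10, 20, 30)
offset_pixel = rgb_to_opponent(60, 70, 80)  # same pixel with +50 on every channel
```

The offset cancels in the differences that define O1 and O2, so only O3 changes between `pixel` and `offset_pixel`.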
Color Descriptors
• RGB color histogram
• Opponent O1, O2
• Color moments
– Use all generalized color moments up to the second
degree and the first order.
– Give information on the distribution of the colors.
Color Descriptors
• RGB-SIFT: SIFT descriptors are computed for every RGB channel
independently
– Normalize each channel separately
– Invariant to light color change
• rg-SIFT: SIFT descriptors over the r and g channels of the
normalized-RGB space (2x128 dimensions per descriptor)
• OpponentSIFT: describes all the channels in the opponent color
space
• C-SIFT: uses O1/O3 and O2/O3 of the opponent color space
(2x128 dimensions per descriptor)
– Scale-invariant with respect to light intensity.
– Due to the definition of the color space, the offset does not cancel
out when taking the derivative
G. J. Burghouts and J. M. Geusebroek
Performance evaluation of local color invariants 2009
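A minimal sketch of the normalized-RGB (rg) channels used by rg-SIFT, assuming the usual definition r = R/(R+G+B), g = G/(R+G+B). The function name and the handling of black pixels are assumptions of this sketch.

```python
def rgb_to_rg(r, g, b):
    """Normalized-RGB chromaticity (r, g); black pixels map to (0.0, 0.0)."""
    total = r + g + b
    if total == 0:
        return 0.0, 0.0
    return r / total, g / total

dim = rgb_to_rg(10, 20, 30)
bright = rgb_to_rg(20, 40, 60)  # same color under twice the light intensity
```

Multiplying all channels by the same factor leaves r and g unchanged, which is the scale invariance with respect to light intensity. Adding an offset, by contrast, does change them.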
Color Descriptors
Studies the invariance properties and the distinctiveness of color descriptors under:
• Light intensity change
• Light color change
• Light intensity shift
• Light color change and shift
• Light intensity shift and change
Color Descriptors
Increased invariance can reduce discriminative power
Color Descriptors
Descriptor performance on image benchmark
Descriptors
How to choose your descriptor?
What kind of similarity do you need for your
application?
Descriptors
Name | Description | Capture
SIFT | Gradient histograms | Texture, gradients
GLOH | Variant of SIFT, log-polar descriptor | Texture, gradients
SURF | Faster variant of SIFT with lower performance | Texture, gradients
Shape Context | Histogram of edges, good for shape description | Shape, edges
Self-Similarity | Higher-level shape description, invariant to appearance | Shape
RGB-SIFT | SIFT descriptors are computed for every RGB channel independently | Texture, gradients
C-SIFT | SIFT based on the opponent color space, shown to be better than SIFT for object and scene recognition | Texture, gradients, color
Video Descriptors
Application: action recognition
Video is more than just a sequence of images
We want to capture temporal information
Video Descriptors
• Space-Time SIFT
– 64-direction histogram
P. Scovanner, S. Ali, M. Shah. A 3-dimensional SIFT descriptor and its
application to action recognition. 2007
Video Descriptors
Actions as Space-Time Shapes
3D Shape Context
Represent an action in a video sequence by a 3D point
cloud extracted by sampling 2D silhouettes over time
M. Grundmann, F. Meier, and I. Essa (2008) “3D Shape Context and Distance Transform for Action
Recognition”
The Local Self-Similarity Descriptor in Video
[Figure: a space-time patch of the input video is correlated with its surrounding space-time region, giving a correlation volume and then a video descriptor; application: action detection]
Video Descriptors
• “On Space-Time Interest Points”, Ivan Laptev
– Local image features provide compact and
abstract representations of images, e.g.
corners
– Extend the concept of a spatial corner
detector to a spatio-temporal corner detector
Space-Time Interest Points
• Consider a synthetic
sequence of a ball
moving towards a wall
and colliding with it
• An interest point is
detected at the collision
point
Space-Time Interest Points
• Consider a synthetic sequence of 2 balls moving towards
each other
[Figure: interest points detected at a coarser scale]
• Different interest points are detected at different spatial and
temporal scales
Conclusion
• The problem we are trying to solve is
similarity between images and videos.
• Descriptors provide a solution
Conclusion
• Tradeoff between keeping the geometric
structure and obtaining invariance
properties (perturbations & rotations).
• Tradeoff between preserving information
and obtaining invariance.
Thank You