Measures of Proximity
Mei-Chen Yeh
03/20/2012
Last Week
• SIFT features
Today
• Matching two images
$f(X, Y) = ?$
Strategy 1
1. Convert the feature set into a fixed-length feature vector
– The bag-of-words representation
2. Apply global proximity measurements

$f(\vec{x}_i, \vec{x}_j)$
Image
Slide credit: Prof. Fei-Fei Li
Bag of ‘words’
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

Extracted "words": sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical, nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

Extracted "words": China, trade, surplus, commerce, US, exports, imports, yuan, bank, domestic, foreign, increase, value
Bags of visual words
• Summarize entire image
based on its distribution
(histogram) of word
occurrences.
• Analogous to bag of words
representation commonly
used for documents.
[Pipeline figure]
Offline: feature detection & representation → codewords dictionary
Online: feature detection & representation → image representation → search the database → relevant images
Bag of Words Representation
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
1. Feature detection and representation
• Regular grid
– Vogel et al. 2003
– Fei-Fei et al. 2005
• Interest point detector
– Csurka et al. 2004
– Fei-Fei et al. 2005
– Sivic et al. 2005
• Other methods
– Random sampling (Ullman et al. 2002)
– Segmentation-based patches (Barnard et al. 2003)
1. Feature detection and representation
Detect patches
[Mikolajczyk and Schmid '02]
[Matas et al. '02]
[Sivic et al. '03]
Compute SIFT descriptor
[Lowe '99]
Slide credit: Josef Sivic
2. Codewords dictionary formation
Vector quantization / clustering in the 128-d descriptor space
Slide credit: Josef Sivic
Clustering and vector quantization
• Clustering is a common method for learning a visual vocabulary or codebook
– An unsupervised learning process
– Each cluster center produced by the clustering becomes a codevector
– The codebook can be learned on a separate training set
– Provided the training set is sufficiently representative, the codebook will be "universal"
• The codebook is used for quantizing features
– A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook
– Codebook = visual vocabulary
– Codevector = visual word
Slide credit: Prof. Lazebnik
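To make the quantization step concrete, here is a minimal NumPy sketch of a vector quantizer; the function name `quantize` and the array shapes are illustrative, not from any particular library:

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of the nearest codevector.

    features: (m, d) array of local descriptors (e.g., 128-d SIFT)
    codebook: (k, d) array of cluster centers learned offline
    returns:  (m,) array of visual-word indices
    """
    # Pairwise Euclidean distances between features and codevectors
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)
```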
Dictionary formation
Input: Number of codewords
Example: 2-D samples
2 codewords
4 codewords
6 codewords
Dictionary formation
Example: categorize 2-D data into 3 clusters
[Figure: two scatter plots of the same 2-D samples (x vs. y); the left panel shows a good clustering, the right a sub-optimal clustering]
Dictionary formation
• The Linde-Buzo-Gray (LBG) algorithm
– also known as the generalized Lloyd algorithm
– also known as the k-means algorithm
k-means Clustering
Each point is a SIFT feature in the database
k-means Clustering
• Find k reference vectors (codewords) which best represent the data
• Reference vectors: $\mathbf{m}_j$, $j = 1, \ldots, k$
• Assign each sample $\mathbf{x}^t$ to the nearest (most similar) reference:

$\|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\|$

• Compute the reconstruction error:

$E\left(\{\mathbf{m}_i\}_{i=1}^{k} \mid X\right) = \sum_t \sum_i b_i^t \, \|\mathbf{x}^t - \mathbf{m}_i\|^2$

$b_i^t = \begin{cases} 1 & \text{if } \|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\| \\ 0 & \text{otherwise} \end{cases}$

Lecture Notes for E. Alpaydın 2010, Introduction to Machine Learning 2e © The MIT Press (V1.0)
k-means Clustering
[Figure: k-means iterations]
Lecture Notes for E. Alpaydın 2010, Introduction to Machine Learning 2e © The MIT Press (V1.0)
k-means Clustering
• Disadvantage:
– A local search procedure
– The final m_i depend heavily on the initial m_i
• Methods to initialize m_i:
– Randomly select k instances
– Calculate the mean of all data and add small random vectors
– Calculate the principal component, partition the data into k groups, and then take the means of these groups
Lecture Notes for E. Alpaydın 2010, Introduction to Machine Learning 2e © The MIT Press (V1.0)
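A minimal NumPy sketch of the k-means loop described above, using random instances for initialization; the variable names mirror the slides (m for reference vectors, b for the assignment indicator) but the implementation details are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """X: (n, d) data matrix; returns codewords m, assignments b, error E."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)]  # init: k random instances
    for _ in range(n_iters):
        # Assignment step: b[t] = index of the nearest reference vector
        dists = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
        b = dists.argmin(axis=1)
        # Update step: each m_j becomes the mean of its assigned samples
        new_m = np.array([X[b == j].mean(axis=0) if np.any(b == j) else m[j]
                          for j in range(k)])
        if np.allclose(new_m, m):  # converged
            break
        m = new_m
    E = ((X - m[b]) ** 2).sum()    # reconstruction error from the slides
    return m, b, E
```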
Applying k-means on SIFT
[Figure: appearance codebook of clustered image patches (source: B. Leibe)]
Another codebook
[Figure: a second appearance codebook (source: B. Leibe)]
Image patch examples of codewords
Sivic et al. 2005
3. Image representation
[Figure: histogram over the codewords; x-axis: codewords, y-axis: frequency]
Bags-of-words for
content-based image
retrieval: Video Google
Slide from Andrew Zisserman
Sivic & Zisserman, ICCV 2003
Video Google (cont.)
Sivic & Zisserman, ICCV 2003
1. Collect all words within the query region
2. Use an inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
[Figure: query region and retrieved frames]
• Demo online at:
http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
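The inverted-file step can be illustrated in a few lines of Python; the word IDs and frame contents below are toy data made up for this example:

```python
from collections import defaultdict, Counter

# Toy data: visual words observed in each video frame (IDs are made up)
frame_words = {0: [3, 7, 7, 9], 1: [7, 12], 2: [3, 9, 9]}

# Build the inverted file: visual word -> set of frames containing it
index = defaultdict(set)
for frame, words in frame_words.items():
    for w in words:
        index[w].add(frame)

# Query: the words found inside the user-selected region
query = [3, 9]
candidates = set.union(*(index[w] for w in query if w in index))

# Rank candidates by overlapping word counts (spatial verification would follow)
scores = {f: sum((Counter(frame_words[f]) & Counter(query)).values())
          for f in candidates}
print(sorted(scores, key=scores.get, reverse=True))  # frames 0 and 2
```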
Summary of bag-of-words
• Convert a set of local features to a fixed-length feature vector
• Procedure
1. Detect local features
2. Build a visual vocabulary from a collection of
images
3. Generate the bag-of-words representation (see the sketch below)
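To make the whole procedure concrete, here is a hedged end-to-end sketch, assuming descriptors have already been detected (e.g., 128-d SIFT vectors) and that SciPy is available for the clustering step; all names are illustrative:

```python
import numpy as np

def build_vocabulary(all_descriptors, k=200):
    """Step 2: cluster descriptors pooled from many training images."""
    from scipy.cluster.vq import kmeans2   # any k-means implementation works
    codebook, _ = kmeans2(all_descriptors, k, minit='points')
    return codebook

def bow_histogram(descriptors, codebook):
    """Step 3: quantize one image's descriptors and histogram the word IDs."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # normalize: images have different feature counts
```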
Bags of words: pros and cons
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides a vector representation for sets
+ very good results in practice
− the basic model ignores geometry; must verify afterwards, or encode it via features
− background and foreground are mixed when the bag covers the whole image
− optimal vocabulary formation remains unclear
Slide credit: Prof. Grauman
Visual vocabularies: Issues
• How to choose vocabulary size?
– Too small: visual words not representative of all
patches
– Too large: quantization artifacts, overfitting
• Computational efficiency
– Vocabulary trees
(Nister & Stewenius, CVPR 2006)
Resources
• http://people.csail.mit.edu/fergus/iccv2005/bagwords.html (MATLAB code)
• OpenCV
– BasicBOWTrainer
– BOWGenerator
Strategy 1
1. Convert the feature set into a fixed-length feature vector
– The bag-of-words representation
2. Apply global proximity measurements

$f(\vec{x}_i, \vec{x}_j)$
Pang-Ning Tan, Michael Steinbach, and Vipin
Kumar, Introduction to Data Mining, Addison
Wesley, 2005. (Chapter 2)
Measurement
• Proximity is used to refer to either similarity or dissimilarity

$f(\vec{x}_i, \vec{x}_j) = ?$

• Properties of the representation to consider:
– Sparse or dense
– Dimensionality (# of attributes)
– Distribution, data range, and more…
• Examples: bag-of-words, pixel intensities, color histograms
Definitions
• Similarity
– The degree to which two images are alike
• Dissimilarity
– The degree to which two images are different
– Distance is used to refer to a special class of
dissimilarities
• Non-negative and often fall in [0, 1]
Transformations
• Similarities ↔ dissimilarities. Example: d = 0, 1, 10, 100

$s = \frac{1}{d + 1}$ → 1, 0.5, 0.09, 0.01

$s = e^{-d}$ → 1, 0.37, 0, 0

$s = 1 - \frac{d - \min d}{\max d - \min d}$ → 1, 0.99, 0.9, 0
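These mappings are easy to verify numerically; a quick NumPy check of the three rows above:

```python
import numpy as np

d = np.array([0.0, 1.0, 10.0, 100.0])
print(np.round(1.0 / (d + 1), 2))                            # [1.   0.5  0.09 0.01]
print(np.round(np.exp(-d), 2))                               # [1.   0.37 0.   0.  ]
print(np.round(1 - (d - d.min()) / (d.max() - d.min()), 2))  # [1.   0.99 0.9  0.  ]
```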
Popular Metrics
• The Minkowski distance

$d(\mathbf{x}, \mathbf{y}) = \left( \sum_{k=1}^{n} |x_k - y_k|^r \right)^{1/r}$

– r = 1: city block (Manhattan, taxicab, L1 norm) distance
– r = 2: Euclidean distance (L2 norm)
– r = ∞: supremum (Lmax or L∞ norm) distance
• The chi-squared (χ²) statistic

$\chi^2(\mathbf{x}, \mathbf{y}) = \frac{1}{2} \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{x_i + y_i}$
Metric Properties
• Positivity
– d(x, y) ≥ 0 for all x and y,
– d(x, y) = 0 only if x = y.
• Symmetry
– d(x, y) = d(y, x) for all x and y.
• Triangle Inequality
– d(x, z) ≤ d(x, y) + d(y, z) for all x, y, and z.
Measures that satisfy all three properties are known as metrics.
Non-metric measures?
Non-metric Dissimilarities (1)
• Example: set differences
– Given two sets A and B, define d(A, B) = size(A − B)
– A = {1, 2, 3, 4}, B = {2, 3, 4}
– d(A, B) = 1, but d(B, A) = 0
– Violates symmetry and the triangle inequality
• How can it be modified to satisfy the properties?
d(A, B) = size(A − B) + size(B − A)
Non-metric Dissimilarities (2)
• Example: time of day

$d(t_1, t_2) = \begin{cases} t_2 - t_1 & \text{if } t_1 \le t_2 \\ 24 + (t_2 - t_1) & \text{if } t_1 > t_2 \end{cases}$

– d(1pm, 2pm) = 1 hour
– d(2pm, 1pm) = 23 hours
Similarities (1)
• Properties
– Positivity
– Symmetry
– The triangle inequality typically does not hold.
Similarities (2)
• Cosine similarity

$\cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}}$

[Figure: the angle between the unit vectors x/||x|| and y/||y||]

The cosine similarity does not take the magnitudes of the two data objects into account when computing similarity.
Similarities (3)
• The Tanimoto coefficient (to handle asymmetric attributes)

$EJ(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - \mathbf{x} \cdot \mathbf{y}}$

• The histogram intersection

$K_{\text{int}}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} \min(x_i, y_i)$
Similarities (4)
• Correlation

$\text{corr}(\mathbf{x}, \mathbf{y}) = \frac{\text{cov}(\mathbf{x}, \mathbf{y})}{\text{std}(\mathbf{x}) \cdot \text{std}(\mathbf{y})} = \frac{s_{xy}}{s_x s_y}$

$s_{xy} = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})$

$s_x = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})^2}, \quad s_y = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} (y_k - \bar{y})^2}$
Issues in Proximity Calculation
• Attributes have different scales.
• Attributes are correlated.
• Different types of attributes exist (e.g.,
quantitative and qualitative).
• Attributes have different weights.
A Generalization
• The Mahalanobis distance

$\text{mahalanobis}(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y}) \, \Sigma^{-1} (\mathbf{x} - \mathbf{y})^T$

$\Sigma^{-1}$: the inverse of the covariance matrix of the data
Similarities of heterogeneous objects
1. For the kth attribute, compute s_k(x, y) in the range [0, 1].
2. Define an indicator variable δ_k as follows:

$\delta_k = \begin{cases} 0 & \text{if one of the objects has a missing value for the } k\text{th attribute} \\ 1 & \text{otherwise} \end{cases}$

3. Compute the overall similarity:

$\text{similarity}(\mathbf{x}, \mathbf{y}) = \frac{\sum_{k=1}^{n} \delta_k \, w_k \, s_k(\mathbf{x}, \mathbf{y})}{\sum_{k=1}^{n} \delta_k}$
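A direct transcription of the three steps; the array names and example values are illustrative:

```python
import numpy as np

def overall_similarity(s, w, missing):
    """s: per-attribute similarities s_k in [0, 1]
    w: attribute weights w_k
    missing: boolean mask, True where either object lacks attribute k"""
    delta = (~missing).astype(float)            # indicator delta_k
    return (delta * w * s).sum() / delta.sum()

s = np.array([0.8, 0.5, 0.9])
w = np.ones(3)                                  # unit weights: plain mean
missing = np.array([False, True, False])        # attribute 2 is missing
print(overall_similarity(s, w, missing))        # (0.8 + 0.9) / 2 = 0.85
```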
Weighted Distances
• The extended Minkowski distance

$d(\mathbf{x}, \mathbf{y}) = \left( \sum_{k=1}^{n} w_k \, |x_k - y_k|^r \right)^{1/r}$
Summary
• The proximity measure should fit the data
representation
– Example: Use Euclidean distance for dense,
continuous data
– Example: Ignore 0-0 matches for sparse data
• Data normalization is important for obtaining
a proper proximity measure
Strategy 2
1. Build feature point correspondence
2. Compute a score from the correspondence
K. Grauman and T. Darrell. The Pyramid Match
Kernel: Discriminative Classification with Sets of
Image Features. IEEE ICCV, 2005.
Problem
• How to measure the proximity between two
sets of features?
– Each instance is an unordered set of vectors
– Varying number of vectors per instance
Slide credits: Prof. Grauman
Existing method (1)
• Fit (parametric) model to each set, compare
with distance over models
GMM1
GMM2
Restrictive assumptions!
High complexity!
Existing method (2)
• Compute pair-wise similarity between all
vectors in each set
[Figure: m × n pairwise comparisons]
Ignoring set statistics!
High complexity!
Partial matching for sets of features
Compare sets by computing
a partial matching between
their features.
Robust to clutter, occlusion…
Formulation
• Node ~ Local feature
• Edge ~ Patch similarity
• Image matching →
bipartite graph matching
Optimal match
• Find the maximum-weight matching in a bipartite graph (between point sets X and Y)
Effective but slow!
Pyramid match
Optimal match: O(m³)
Pyramid match: O(mL)
[Figure: the pyramid match approximates the optimal partial matching]
Pyramid match overview
• Pyramid match kernel measures similarity of a
partial matching between two sets:
– Place multi-dimensional, multi-resolution grid
over point sets
– Consider points matched at finest resolution
where they fall into same grid cell
No explicit search for matches!
Pyramid match kernel
Approximate partial-match similarity:

$K_{\Delta} = \sum_{i} w_i N_i$

– $N_i$: number of newly matched pairs at level i
– $w_i$: measure of difficulty of a match at level i
Feature extraction
Histogram pyramid: level i has bins of size 2^i
Counting matches
Histogram intersection:

$I\left(H_i(X), H_i(Y)\right) = \sum_{j} \min\left(H_i(X)_j, \, H_i(Y)_j\right)$

Counting new matches
The difference in histogram intersections across levels counts the number of newly matched pairs:

$N_i = I_i - I_{i-1}$ (matches at this level minus matches at the previous level)
Pyramid match kernel

$K_{\Delta}\left(\Psi(X), \Psi(Y)\right) = \sum_{i=0}^{L} w_i N_i$

– $\Psi(X), \Psi(Y)$: histogram pyramids
– $N_i$: number of newly matched pairs at level i
– $w_i$: measure of difficulty of a match at level i
Weights are inversely proportional to bin size!
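An illustrative Python sketch of the kernel, not Grauman & Darrell's actual implementation: dictionary-based histograms with bins of side 2^i, intersections I_i, new matches N_i = I_i − I_{i−1}, and weights w_i = 1/2^i:

```python
import numpy as np
from collections import Counter

def grid_histogram(points, level):
    """Bin points into a grid whose cells have side length 2**level."""
    side = 2 ** level
    return Counter(tuple((p // side).astype(int)) for p in points)

def intersection(h1, h2):
    return sum(min(c, h2[b]) for b, c in h1.items())   # missing keys count 0

def pyramid_match(X, Y, L):
    """X, Y: (m, d) arrays of non-negative feature coordinates."""
    K, prev = 0.0, 0
    for i in range(L + 1):
        I = intersection(grid_histogram(X, i), grid_histogram(Y, i))
        K += (I - prev) / 2 ** i   # w_i = 1/2**i: inversely prop. to bin size
        prev = I
    return K
```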
Efficiency
For sets with m features of dimension d, and pyramids with L levels, the computational complexity of:
Pyramid match kernel: O(dmL)
Existing set kernel approaches: O(dm²) or O(m³)
Example pyramid match
Level 0 (fine level)
Example pyramid match
Level 1 (coarser level)
Example pyramid match
Level 2 (coarser level)
Pyramid match (FAST!) ≈ optimal match (SLOW!)
Object recognition results
• ETH-80 database: 8 object classes
• Features:
– Harris detector
– PCA-SIFT descriptor, d = 10

Kernel                                   | Complexity | Recognition rate
Match [Wallraven et al.]                 | O(dm²)     | 84%
Bhattacharyya affinity [Kondor & Jebara] | O(dm³)     | 85%
Pyramid match                            | O(dmL)     | 84%

Eichhorn and Chapelle 2004
Summary: Pyramid match kernel

$K_{\Delta}\left(\Psi(X), \Psi(Y)\right) = \sum_{i=0}^{L} w_i N_i$

approximates the optimal partial matching between sets of features; $w_i$ reflects the difficulty of a match at level i, and $N_i$ is the number of new matches at level i.
Summary: Pyramid match kernel
• A similarity measure based on implicit
correspondences that approximates the
optimal partial matching
– linear time complexity
– model-free
– insensitive to clutter
– fast, effective for object retrieval and recognition
Disadvantages?
• Places a grid to quantize the feature space: not effective in high-dimensional spaces
• The spatial arrangement of features is still ignored: may produce geometrically unfaithful matchings
S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of
Features: Spatial Pyramid Matching for Recognizing
Natural Scene Categories. IEEE CVPR, 2006.
BoW issue: No spatial layout preserved!
Slide credits: Prof. S. Lazebnik
Spatial pyramid match
• Extension of bag-of-words: make a pyramid of bag-of-words histograms
• Locally orderless representation at several levels of resolution
Spatial pyramid representation
[Figure: grids at level 0 (1×1), level 1 (2×2), and level 2 (4×4)]
Lazebnik, Schmid & Ponce (CVPR 2006)
Spatial pyramid match
• Based on pyramid match kernels
– PM: build pyramid in feature space, discard spatial
information
– SPM: build pyramid in image space
Sum over PMKs computed
in image coordinate space
Spatial pyramid match
Example with 200 visual words:
– level 0 (1×1 grid): d = 200
– level 1 (2×2 grid): d = 4 × 200 = 800
– level 2 (4×4 grid): d = 16 × 200 = 3200
– total: 200 + 800 + 3200 = 4200 features
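A hedged sketch of building the concatenated representation (grid geometry as in Lazebnik et al., but without the per-level weights of the full kernel; the function and argument names are illustrative):

```python
import numpy as np

def spatial_pyramid(xy, words, width, height, vocab=200, L=2):
    """xy: (m, 2) keypoint positions; words: (m,) visual-word indices.
    Returns concatenated per-cell histograms: 200 + 800 + 3200 = 4200 dims."""
    hists = []
    for level in range(L + 1):
        cells = 2 ** level                       # 1x1, 2x2, 4x4 grids
        cx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                hists.append(np.bincount(words[in_cell], minlength=vocab))
    return np.concatenate(hists)
```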
Spatial pyramid match
• Can capture scene categories well: texture-like patterns, but with some variability in the positions of the local pieces.
Spatial pyramid match
• Sensitive to global shifts of the view
[Figure: example difficult categories and easy categories]
Resources
• A Pyramid Match Toolkit
– http://people.csail.mit.edu/jjl/libpmk/
• Spatial Pyramid Match
– http://www.cs.unc.edu/~lazebnik/research/SpatialPyramid.zip (updated 2/29/2012)