INF 386, V-2003
Selected Themes
from Digital Image Analysis
Lecture 7
09.04.2003
Color and Texture based
Image Retrieval
Fritz Albregtsen
Department of Informatics
University of Oslo
Why “CBIR”?
• Large databases of digital images are
accessible.
— high volumes produced by scanners and
digital cameras
— larger storage capacities at lower cost
— easy access to enormous image
volumes via the internet
• Manual indexing by keywords is
— very time consuming
— unrewarding
— unlikely to specify all aspects of an image
INF 386, 2003, Lecture 1, page 1 of 24
Some systems - so far
• QBIC (Niblack et al. (IBM))
Querying images by content, using color, texture
and shape
SPIE Conf. on Storage and Retrieval for Image and
Video Databases, 1993.
• Chabot
Retrieval from a relational database of images
Computer, 1995.
• VIRAGE (Bach et al.)
Virage image search engine: an open framework for
image management
SPIE Conf. on Storage and Retrieval for Image and
Video Databases, 1996.
• Photobook (Pentland et al.)
Photobook: content-based manipulation of image
databases
Int. Journ. of Computer Vision, 1996.
• Excalibur (Feder)
Image recognition and content-based retrieval for
the world wide web
Advanced Imaging, 1996.
• VisualSEEK (Smith)
Integrated Spatial and Feature Image Systems:
Retrieval, Analysis and Compression
PhD Thesis, Columbia University, 1997.
Query characteristics
• Queries formulated by combinations of
low-level image features such as color,
texture and shape.
• Specified explicitly by feature values
or by feature range.
• Implicit specification by example.
• Spatial organization of features,
giving absolute or relative location.
• Relevance feedback:
allow user to refine search by
indicating relevance of returned images.
Problems
• What features will generally describe the
content of an image well?
• How to summarize the distribution of
these features over an image?
• How to measure the dissimilarity between
distributions of features?
• How to effectively display the results of a
search?
• How to browse images of a database in an
intuitive and efficient way?
Selecting features
• The focus is often on color.
• The distribution of colors within the image is often a
useful clue to the content of the image.
• Absolute or relative locations of different color
distributions improve the result.
• One has to select some color representation
— color space (e.g. RGB, IHS, Lab, ...)
— representation of distribution
• While color is a single-pixel property, texture
describes the appearance of bigger regions.
— Statistical methods
— Structural methods
— MRF methods
— Filter-based methods
• For both color and texture, one has to select features
that relate to perceptual similarity.
Describing distributions
• Describing CBIR features has to do with
— Perceptual significance
— Invariance (several aspects)
— Efficiency
• Distributions should be described in a way
that reflects a human’s appreciation of
similarities and differences.
• At the same time, the distributions should
be represented by a small data set, for
efficiency, but rich enough to reproduce
the essential information.
• Histograms may use regular binning, or
adaptive binning with bin prototypes
determined by e.g. vector quantization.
• Signatures represent a set of feature
clusters, each represented by mean (or
mode) and the fraction of pixels that
belong to the cluster. Clusters in signatures
are defined for each image.
• A histogram can be seen as a signature,
where each bin is a cluster.
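The last point can be made concrete with a short sketch. It is only an illustration, assuming a 1-D feature, bin centers as cluster representatives, and weights given as pixel fractions; the function name `histogram_to_signature` is hypothetical.

```python
def histogram_to_signature(counts, bin_edges):
    """Turn a 1-D histogram into a signature: a list of (representative, weight).

    Assumptions: each bin is represented by its center, and the weight is the
    fraction of pixels in the bin. Empty bins are dropped, since a signature
    only lists clusters that actually occur in the image.
    """
    total = sum(counts)
    signature = []
    for i, count in enumerate(counts):
        if count == 0:
            continue
        center = 0.5 * (bin_edges[i] + bin_edges[i + 1])
        signature.append((center, count / total))
    return signature


counts = [6, 0, 2, 2]              # pixels per bin
edges = [0, 64, 128, 192, 256]     # regular binning of an 8-bit channel
sig = histogram_to_signature(counts, edges)
print(sig)
```

The empty bin disappears from the signature, which is one reason signatures can be more compact than histograms with fixed binning.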
Distances and metrics
• A space is called a metric space if, for any
two of its elements x and y, there is a number
ρ(x, y), called the distance, that satisfies the
following properties
— ρ(x, y) ≥ 0 (non-negativity)
— ρ(x, y) = 0 if and only if x = y (identity)
— ρ(x, y) = ρ(y, x) (symmetry)
— ρ(x, z) ≤ ρ(x, y) + ρ(y, z) (∆ inequality)
• Distances between two points x and µ
in n-dimensional space
1) Euclidean
$D_E(x, \mu) = \| x - \mu \| = \left[ \sum_{k=1}^{n} (x_k - \mu_k)^2 \right]^{1/2}$
2) “City block” / “Taxi” / “Absolute value”
$D_4(x, \mu) = \sum_{k=1}^{n} |x_k - \mu_k|$
3) “Chessboard” / “Maximum value”
$D_8(x, \mu) = \max_k |x_k - \mu_k|$
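A minimal sketch of the three point distances, assuming points are given as equal-length sequences of numbers. The function names are illustrative and not part of the lecture.

```python
import math


def euclidean(x, mu):
    """D_E: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def city_block(x, mu):
    """D_4: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, mu))


def chessboard(x, mu):
    """D_8: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, mu))


x, mu = (1.0, 2.0), (4.0, 6.0)
print(euclidean(x, mu), city_block(x, mu), chessboard(x, mu))  # 5.0 7.0 4.0
```

For any pair of points the three values are ordered as D8 ≤ DE ≤ D4, as the example shows.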
Bin-by-bin dissimilarity
The distance between two distributions.
Useful when comparing e.g. histograms
in image search and retrieval.
• Minkowski distance:
$d_{L_p}(H, K) = \left( \sum_i |h_i - k_i|^p \right)^{1/p}$
L1 often used to compute dissimilarity
between color images.
L2 and L∞ often used for texture
dissimilarity.
L1-based retrieval may give many false
negatives, as neighboring bins are not
considered.
• Histogram intersection:
$d_\cap(H, K) = 1 - \frac{\sum_i \min(h_i, k_i)}{\sum_i k_i}$
Attractive because it handles partial
matches when the area of one histogram is
smaller than that of the other.
When the areas are equal, it is equivalent to
the normalized L1 distance.
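The two measures on this slide can be sketched as follows, assuming histograms are plain lists of non-negative bin counts of equal length. The function names are illustrative. The example also checks the stated equivalence: for equal areas, the intersection distance equals the L1 distance divided by twice the histogram area.

```python
def minkowski(h, k, p):
    """Minkowski (L_p) distance between two histograms."""
    return sum(abs(a - b) ** p for a, b in zip(h, k)) ** (1.0 / p)


def histogram_intersection(h, k):
    """Intersection distance, normalized by the area of k as on the slide."""
    return 1.0 - sum(min(a, b) for a, b in zip(h, k)) / sum(k)


# Equal areas (both sum to 10): intersection equals normalized L1.
h = [4, 3, 2, 1]
k = [1, 2, 3, 4]
l1 = minkowski(h, k, 1)
d_int = histogram_intersection(h, k)
print(l1, d_int, l1 / (2 * sum(k)))

# Partial match: the small histogram is fully contained in the large one.
large = [5, 5, 3, 1]
small = [2, 1, 0, 0]
d_partial = histogram_intersection(large, small)
print(d_partial)
```

In the partial-match case the distance is zero, because every entry of the smaller histogram is covered by the larger one.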
Bin-by-bin dissimilarity -II
• Kullback-Leibler divergence:
$d_{KL}(H, K) = \sum_i h_i \log \frac{h_i}{k_i}$
Measures how inefficient it would be to
code one histogram using the other as
code-book.
Non-symmetric, and sensitive to binning.
• Jeffrey divergence:
$d_J(H, K) = \sum_i \left( h_i \log \frac{h_i}{m_i} + k_i \log \frac{k_i}{m_i} \right)$
where $m_i = (h_i + k_i)/2$.
Is a modification of K-L; symmetric and
more robust to noise and binning.
• χ2 statistics:
$d_{\chi^2}(H, K) = \sum_i \frac{(h_i - m_i)^2}{m_i}$
with $m_i = (h_i + k_i)/2$ as above.
Measures how unlikely it is that one
distribution was drawn from the
population represented by the other.
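The three measures can be sketched as below. This is an illustration under stated assumptions: histograms are normalized lists of bin fractions, terms with a zero numerator contribute zero, and the K-L divergence is only evaluated where k_i > 0 whenever h_i > 0. The function names are not from the lecture.

```python
import math


def kl_divergence(h, k):
    """Kullback-Leibler divergence; bins with h_i = 0 contribute nothing."""
    return sum(a * math.log(a / b) for a, b in zip(h, k) if a > 0)


def jeffrey_divergence(h, k):
    """Jeffrey divergence: both histograms are compared to their mean m."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total


def chi_square(h, k):
    """Chi-square statistic against the mean histogram m."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if m > 0:
            total += (a - m) ** 2 / m
    return total


h = [0.7, 0.2, 0.1]
k = [0.4, 0.4, 0.2]
print(kl_divergence(h, k), kl_divergence(k, h))            # differ: non-symmetric
print(jeffrey_divergence(h, k), jeffrey_divergence(k, h))  # equal: symmetric
print(chi_square(h, k))
```

Swapping the arguments changes the K-L value but not the Jeffrey or χ2 values, which matches the symmetry remarks on the slide.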
Drawbacks of bin-by-bin
• Compares contents of corresponding
histogram bins hi and ki for all i, but not hi
and kj for i ≠ j.
• K-L is justified by information theory, and
χ2 by statistics, but they do not necessarily
match perceptual similarity well.
• This can be fixed by using
correspondences between bins, and the
cross-bin distance.
• Bin-by-bin is sensitive to bin size.
Coarse binning may not give sufficient
discrimination. Too fine binning may
place similar features in different bins.
• Cross-bin dissimilarity measures always
yield better results when bins get smaller.
• We need a cross-bin distance.
Cross-bin measures
Cross-bin distances use the ground
distance dij, defined as the distance between
the representative features for bin i and
bin j.
• Quadratic-form distance
$d_A(H, K) = \sqrt{(h - k)^T A (h - k)}$
where h and k are vectors listing all the
entries in H and K. This is used for color in
QBIC.
• Cross-bin information comes in via a
similarity matrix
$A = [a_{ij}]$, where $a_{ij} = 1 - \frac{d_{ij}}{d_{\max}}$
and $d_{\max}$ is the largest ground distance.
With this choice, it can be shown that dA is a
metric.
• Quadratic-form distance may give false
positives, as it will overestimate similarity
of (color) distributions without a
pronounced mode.
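A small sketch of the quadratic-form distance, assuming a 1-D feature so that the ground distance between bins i and j is simply the absolute difference of their bin centers. The helper names are illustrative; this is not the QBIC implementation.

```python
import math


def similarity_matrix(bin_centers):
    """a_ij = 1 - d_ij / d_max, with d_ij = |c_i - c_j| (1-D assumption)."""
    n = len(bin_centers)
    d = [[abs(bin_centers[i] - bin_centers[j]) for j in range(n)]
         for i in range(n)]
    d_max = max(max(row) for row in d)
    return [[1.0 - d[i][j] / d_max for j in range(n)] for i in range(n)]


def quadratic_form_distance(h, k, a):
    """sqrt((h - k)^T A (h - k)) for histograms h, k and similarity matrix a."""
    diff = [x - y for x, y in zip(h, k)]
    n = len(diff)
    q = sum(diff[i] * a[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding


A = similarity_matrix([0.0, 1.0, 2.0, 3.0])
h = [1, 0, 0, 0]
k_near = [0, 1, 0, 0]   # all mass in the neighboring bin
k_far = [0, 0, 0, 1]    # all mass in the most distant bin
d_near = quadratic_form_distance(h, k_near, A)
d_far = quadratic_form_distance(h, k_far, A)
print(d_near, d_far)
```

A bin-by-bin L1 distance would give the same value (2) for both comparisons, while the quadratic form rates the neighboring-bin histogram as closer.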
Cross-bin measures - II
• 1-D match distance
$d_M(H, K) = \sum_i |\hat{h}_i - \hat{k}_i|$
where $\hat{h}_i = \sum_{j \le i} h_j$ is the cumulative
histogram of {hi}, and similarly for {ki}.
• The match distance is the L1 distance
between the cumulative histograms.
• For histograms having equal areas, this is a
special case of the EMD (later).
• The 1-D match distance does not extend to
higher dimensions, because the relation
j ≤ i is not a total ordering in more than
one dimension.
• Match distance may be extended to
multi-dimensional histograms by graph
matching.
Cross-bin measures - III
• Kolmogorov-Smirnov statistics
$d_{KS}(H, K) = \max_i |\hat{h}_i - \hat{k}_i|$
where $\hat{h}_i$ and $\hat{k}_i$ are cumulative
histograms.
• K-S statistics is defined on the cumulative
distributions, so that no binning is actually
required.
• Under the null hypothesis (data drawn
from same distribution), the distribution
of the statistics can be calculated, giving
the significance of the result.
• Similar to match distance, it is defined only
for one dimension.
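Both the 1-D match distance and the Kolmogorov-Smirnov statistic operate on cumulative histograms, so they can be sketched together. The function names are illustrative, and histograms are assumed to be equal-length lists with equal total area.

```python
from itertools import accumulate


def cumulative(h):
    """Cumulative histogram: running sum of the bin values."""
    return list(accumulate(h))


def match_distance(h, k):
    """L1 distance between the cumulative histograms."""
    return sum(abs(a - b) for a, b in zip(cumulative(h), cumulative(k)))


def kolmogorov_smirnov(h, k):
    """Largest absolute difference between the cumulative histograms."""
    return max(abs(a - b) for a, b in zip(cumulative(h), cumulative(k)))


h = [1, 0, 0, 0]
k_near = [0, 1, 0, 0]
k_far = [0, 0, 0, 1]
print(match_distance(h, k_near), match_distance(h, k_far))          # 1 3
print(kolmogorov_smirnov(h, k_near), kolmogorov_smirnov(h, k_far))  # 1 1
```

The match distance grows with how far the mass has to be shifted along the axis, whereas the K-S statistic only records the largest cumulative gap.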
Earth Mover’s Distance (EMD)
• One of several measures of the minimum cost of
matching elements between two histograms.
• Given two distributions, one seen as piles of earth in
feature space, the other as a collection of holes in
the same space, we need to solve the transportation
problem, finding the least amount of work needed
to fill the holes with earth.
• This is the Monge-Kantorovich mass transfer problem
(1781). This distance was first used in computer vision
by Werman, Peleg and Rosenfeld 1985.
• EMD applies to histograms and signatures of any
dimension.
• It allows for partial matches.
• Generally: Solve linear optimization problem.
• If the ground distance is a metric and the total weights
of the signatures are equal, the EMD is a true metric.
Special case of EMD
• Minimum cost distance between two
one-dimensional distributions f (t) and g(t) is the L1
distance between the cumulative distribution
functions
$\int_{-\infty}^{\infty} \left| \int_{-\infty}^{x} f(t)\,dt - \int_{-\infty}^{x} g(t)\,dt \right| dx$
• If feature space is one-dimensional, ground distance
is d(pi, qj) = |pi − qj|, and the total weights of the two
signatures are equal:
$\psi(P, Q) = \sum_{k=1}^{m+n-1} |\hat{p}_k - \hat{q}_k| \, (r_{k+1} - r_k)$
where $r_1, r_2, ..., r_{m+n}$ is the sorted list of
$p_1, p_2, ..., p_m, q_1, q_2, ..., q_n$, and
$\hat{p}_k = \sum_{i=1}^{m} [p_i \le r_k] \, w_{p_i}$, $\hat{q}_k = \sum_{j=1}^{n} [q_j \le r_k] \, w_{q_j}$
where [·] is 1 when its argument is true, and 0
otherwise.
Here P = {(p1, wp1), ..., (pm, wpm)} is the first signature
with m clusters, where pi is the cluster representative
and wpi is the weight of the cluster;
Q = {(q1, wq1), ..., (qn, wqn)} is the second signature
with n clusters;
and D = [di,j] is a ground distance matrix where di,j
is the ground distance between clusters pi and qj.
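The closed-form 1-D case can be sketched directly from the formula for ψ(P, Q). The sketch assumes signatures are lists of (position, weight) pairs with equal total weight; the function name `emd_1d` is illustrative. It does not solve the general transportation problem.

```python
def emd_1d(p_sig, q_sig):
    """psi(P, Q) for 1-D signatures given as lists of (position, weight).

    Assumes the ground distance |p_i - q_j| and equal total weights.
    """
    # r: sorted list of all cluster positions from both signatures
    r = sorted([pos for pos, _ in p_sig] + [pos for pos, _ in q_sig])
    work = 0.0
    for idx in range(len(r) - 1):
        # cumulative weights of each signature up to position r[idx]
        p_hat = sum(w for pos, w in p_sig if pos <= r[idx])
        q_hat = sum(w for pos, w in q_sig if pos <= r[idx])
        work += abs(p_hat - q_hat) * (r[idx + 1] - r[idx])
    return work


P = [(0.0, 0.5), (2.0, 0.5)]
Q = [(1.0, 1.0)]
print(emd_1d(P, Q))  # 1.0
```

In the example, half of the mass moves one unit from each side to the middle, so the total work is 0.5·1 + 0.5·1 = 1.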
Color Spaces
• RGB
Based on adding the three primaries.
Commonly used in CRT displays.
• CMY(K)
Subtractive system commonly used in
printing.
Black is redundant, and used only for
technical reasons.
• HSL and HSB
Intuitive spaces that allow users to specify
colors easily.
They separate luminance, which is good for
coding and transmission.
• YIQ, YUV, YCbCr
Used for different standards of TV
transmission
(NTSC, PAL, digital TV).
• Opponent Colors
Used for modelling color perception.
Based on the fact that some pairs of hues
cannot coexist in a single color sensation.
Perceptually Uniform Spaces
• For image retrieval, it is important to measure
differences between colors so that it matches
perceptual similarity.
• This is simplified by perceptually uniform color
spaces.
• In 1976, CIE standardized two such color spaces,
Luv and Lab.
• L defines luminance, a, b and u, v define the
chrominance.
• Both spaces are defined with respect to the CIE
XYZ color space, using a reference white [Xn Yn Zn].
• XYZ is related to RGB by linear conversions
$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} =
\begin{pmatrix}
0.412453 & 0.357580 & 0.180423 \\
0.212671 & 0.715160 & 0.072169 \\
0.019334 & 0.119193 & 0.950227
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}$
and
$\begin{pmatrix} R \\ G \\ B \end{pmatrix} =
\begin{pmatrix}
3.240479 & -1.537150 & -0.498535 \\
-0.969256 & 1.875992 & 0.041556 \\
0.055648 & -0.204043 & 1.057311
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$
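A minimal sketch of the conversion chain RGB → XYZ → CIE Lab, using the two matrices above and the standard CIE 1976 L*a*b* definition. It assumes linear RGB values in [0, 1] (no gamma correction) and takes the reference white to be the XYZ value of RGB = (1, 1, 1); both are assumptions of this example, as are the function names.

```python
RGB_TO_XYZ = [
    [0.412453, 0.357580, 0.180423],
    [0.212671, 0.715160, 0.072169],
    [0.019334, 0.119193, 0.950227],
]
XYZ_TO_RGB = [
    [3.240479, -1.537150, -0.498535],
    [-0.969256, 1.875992, 0.041556],
    [0.055648, -0.204043, 1.057311],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def f(t):
    """Cube root above the threshold, linear segment below it."""
    return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0


def xyz_to_lab(xyz, white):
    """CIE 1976 L*a*b* from XYZ, relative to the reference white."""
    x, y, z = (c / n for c, n in zip(xyz, white))
    lum = 116.0 * y ** (1.0 / 3.0) - 16.0 if y > 0.008856 else 903.3 * y
    return [lum, 500.0 * (f(x) - f(y)), 200.0 * (f(y) - f(z))]


white = mat_vec(RGB_TO_XYZ, [1.0, 1.0, 1.0])  # assumed reference white
lab_white = xyz_to_lab(white, white)
lab_red = xyz_to_lab(mat_vec(RGB_TO_XYZ, [1.0, 0.0, 0.0]), white)
rgb_back = mat_vec(XYZ_TO_RGB, mat_vec(RGB_TO_XYZ, [0.2, 0.5, 0.8]))
print(lab_white)
print(lab_red)
print(rgb_back)
```

The reference white maps to L* = 100 with zero chrominance, and converting to XYZ and back recovers the RGB triple up to the rounding of the matrix entries.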
Luminance and Chrominance
• Luminance is the same for both spaces:
$L^* = \begin{cases} 116\,(Y/Y_n)^{1/3} - 16 & \text{if } Y/Y_n > 0.008856 \\ 903.3\,(Y/Y_n) & \text{otherwise} \end{cases}$
• The chrominances are different:
— CIE Luv:
$u^* = 13 L^* (u' - u'_n)$
$v^* = 13 L^* (v' - v'_n)$
where
$u' = \frac{4X}{X + 15Y + 3Z}$, $v' = \frac{9Y}{X + 15Y + 3Z}$
and $u'_n$, $v'_n$ are the corresponding values for the reference white.
— CIE Lab:
$a^* = 500\,(f(X/X_n) - f(Y/Y_n))$
$b^* = 200\,(f(Y/Y_n) - f(Z/Z_n))$
where
$f(t) = \begin{cases} t^{1/3} & \text{if } t > 0.008856 \\ 7.787\,t + 16/116 & \text{otherwise} \end{cases}$
Texture-Based Image Similarity
• Any set of texture features may be used.
• A texture may be represented by a vector of
values, each corresponding to the energy
in a specific scale and orientation
subband.
• Spectral decomposition methods include
— Quadrature filters (Knutsson and
Granlund 1983)
— Gabor filters (Farrokhnia and Jain 1991)
— Oriented pyramids of derivatives of a
Gaussian (Perona 1991)
— Various wavelets (Daubechies 1991)
• The size of the texel may be larger than the
support of the filter.
• Texture regions may be inhomogeneous
because of foreshortening and variations
in illumination.
• Natural textures are regular only in a
statistical sense.
• Images often contain several textures.
Texture Image Preprocessing
• The size of the support of the filters may be a
problem:
— They may be large enough to straddle boundaries
between different textures.
This will give a mixed description.
— They may be too small to see enough of the
texture to give a reliable description.
This gives descriptions of components of the
texture.
• Textures exhibit variations.
• It is preferable to sift and summarize before using
texture features to compute texture signatures
— eliminate vectors that describe mixtures
— average away variations between adjacent
descriptors of similar patches.
• This should be done in texture vector space, not in
the intensity image.
• Requires generalization of the gradient to a vector
function, using texture contrast to define significant
regions as regions where the contrast is low after a
few iterations of smoothing.
• Only texture features that are in significant regions
are used to compute the texture signature.
Comparing Dissimilarity
• A meaningful quality measure must be
defined.
• Image retrieval is measured by precision,
which is the number of relevant images
retrieved relative to the number of
retrieved images, and recall, which is the
number of relevant images retrieved,
relative to the total number of relevant
images in the database.
• The relative importance of good recall vs.
good precision differs according to the task
at hand.
• Performance comparisons should account
for the variety of parameters that can affect
the behaviour of each measure used.
• Difference between feature-by-feature
approach, and a systems approach.
• Processing steps that affect performance
independently should be evaluated
separately, to lower complexity and
heighten insight.
• Ground truth should be available.
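The definitions of precision and recall given above can be sketched as follows, assuming images are identified by unique names and that the set of relevant images (the ground truth) is known. The function name and the image identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant in the database
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["img03", "img07", "img12", "img20"]
relevant = ["img03", "img12", "img31", "img44", "img58"]
print(precision_recall(retrieved, relevant))  # (0.5, 0.4)
```

Here two of the four returned images are relevant (precision 0.5), and two of the five relevant images were found (recall 0.4).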
Some general results
• Bin-by-bin dissimilarity measures improve
by increasing number of bins up to a point,
then performance degrades.
• Cross-bin dissimilarity measures perform
better.
• Signatures carry less information than
histograms, but perform better.
• Jeffrey divergence and χ2 statistics give
almost identical results.
• In color space, the L2 distance, by
construction, matches the perceptual
similarity between colors.
• In histogram space, L1 is better than L2,
which is better than L∞.
Multidimensional Scaling
• Given a set of n images and the
dissimilarities δij between them, the MDS
technique computes a configuration of
points {pi} in a low-dimensional Euclidean
space R^d so that the Euclidean distances
dij = ||pi − pj|| between the points in R^d
match the original dissimilarities δij
between the corresponding images as well
as possible.
• Kruskal (1964) formulated a minimization
of
$\Psi = \left[ \frac{\sum_{i,j} (f(\delta_{ij}) - d_{ij})^2}{\sum_{i,j} d_{ij}^2} \right]^{1/2}$
• There are two types of MDS:
— In the metric MDS, f (δij ) is a monotonic,
metric-preserving function.
— In non-metric MDS, f (δij ) is a weakly
monotonic transformation that only
preserves the rank-ordering of the δij ’s.
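The stress Ψ can be evaluated for a given configuration with a few lines of code. This sketch only computes the objective; it does not perform the minimization that MDS requires. It assumes a symmetric dissimilarity matrix with zero diagonal, so summing over pairs i < j gives the same ratio as summing over all i, j. The function name and the default identity transform `f` are assumptions of the example.

```python
import math


def kruskal_stress(delta, points, f=lambda d: d):
    """Stress of a point configuration against dissimilarities delta.

    delta:  symmetric n x n matrix of dissimilarities
    points: n coordinate tuples in the low-dimensional space
    f:      transformation applied to delta (identity by default)
    """
    n = len(points)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(points[i], points[j])
            num += (f(delta[i][j]) - d_ij) ** 2
            den += d_ij ** 2
    return math.sqrt(num / den)


delta = [[0, 1, 2],
         [1, 0, 1],
         [2, 1, 0]]
exact = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]      # reproduces delta exactly
distorted = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]  # last point moved
stress_exact = kruskal_stress(delta, exact)
stress_distorted = kruskal_stress(delta, distorted)
print(stress_exact, stress_distorted)
```

A configuration that reproduces all dissimilarities has zero stress; moving one point away from its ideal position raises it.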