Texture and Shape for Image Retrieval
– Multimedia Analysis and Indexing
Winston H. Hsu
National Taiwan University, Taipei
Office: R512, CSIE Building
Communication and Multimedia Lab (通訊與多媒體實驗室)
http://www.csie.ntu.edu.tw/~winston
October 23, 2007
Outline
- Texture
  - Statistical features
  - Spectral features
  - Edge
- Shape
MMAI, Fall 07 - Winston Hsu, NTU
Reminder
- Homework #2
  - Due: TA@501 (noon, Tuesday, November 13)
  - Rule – “deliver quality work on time with integrity!!”
- Midterm
  - A short recap of what we have covered (major literature)
  - High-level concepts mentioned in the course
  - Open book (no computers), but no print-outs are required
- Mailing list
  - http://cmlmail.csie.ntu.edu.tw/mailman/listinfo/mmai
Syllabus (tentative)
Week  Date      Topic
1     9/25/07   holiday
2     10/02/07  introduction
3     10/09/07  mpeg; shot detection
4     10/16/07  cbr overview; color
5     10/23/07  texture+shape; relevance feedback
6     10/30/07  multidimensional indexing; feature reduction
7     11/06/07  midterm
8     11/13/07  gmm+cbir; svm+cbir (graphical/discriminative models)
9     11/20/07  structure discovery (sports; story)
10    11/27/07  TRECVID; concept detection; image annotation
11    12/04/07  concept detection; image annotation
12    12/11/07  un-/supervised clustering (clustering)
13    12/18/07  video retrieval
14    12/25/07  intro audio/music
15    01/01/08  holiday
16    01/08/08  project presentation #1, #2
17    01/15/08  final (no course)
18    01/22/08  project report due
Scenario of Content-Based Image Retrieval
[Figure: the query image and the images in the image database go through feature extraction into a feature (vector) space; a distance metric in that space determines the retrieved images]
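The retrieval step of this scenario can be sketched as nearest-neighbor ranking in feature space. This is a minimal sketch assuming every image has already been reduced to a fixed-length feature vector and using Euclidean distance as the metric; the function name `retrieve` and the toy feature values are hypothetical.

```python
import numpy as np

def retrieve(query_feat, db_feats, k=3):
    """Rank database images by their distance to the query in feature space.

    query_feat: (d,) feature vector of the query image
    db_feats:   (n, d) feature vectors of the database images
    Returns the indices of the k closest images and their distances.
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)   # distance metric (Euclidean here)
    order = np.argsort(dists, kind="stable")[:k]            # smallest distance first
    return order, dists[order]

# Toy database: 5 images, each already reduced to a 3-D feature vector (made-up values)
db = np.array([[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.8, 0.2, 0.0],
               [0.0, 0.1, 0.9],
               [0.5, 0.5, 0.0]])
query = np.array([1.0, 0.0, 0.0])
idx, d = retrieve(query, db, k=3)
print(idx)   # [0 2 4]
```

Any other metric (L1, weighted, learned) can replace the Euclidean norm without changing the ranking logic.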
Fusion of Multimodal Features
- How to weigh the feature significance?
  - Cross-validation approach
  - User-selected
  - Automatic weighting by relevance feedback
- Fusion approaches such as:
  - Sum (Borda fuse)
  - WtSum (weighted Borda fuse)
  - Max (round-robin)
[Figure: retrieval results by different features are turned into normalised results (scores or rankings over N images) and then fused]
* From Kieran Mc Donald
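The three fusion rules can be sketched on normalised results as below. Assumptions: each feature's scores are min-max normalised to [0, 1] (rank-based Borda points, where the best of N images gets N-1, are shown as an alternative normalisation), and the scores for four images are made up; `minmax`, `borda_points`, and `fuse` are hypothetical helper names.

```python
import numpy as np

def minmax(scores):
    """Normalise one feature's similarity scores to [0, 1] (1 = best match)."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span

def borda_points(scores):
    """Rank-based normalisation: among N images the best gets N-1 points, the worst 0."""
    s = np.asarray(scores, dtype=float)
    ranks = np.argsort(np.argsort(-s, kind="stable"), kind="stable")   # 0 = best
    return (len(s) - 1 - ranks).astype(float)

def fuse(results, mode="sum", weights=None):
    """Fuse the normalised results of several features (one row per feature)."""
    R = np.vstack(results)
    if mode == "sum":        # Sum (Borda fuse)
        return R.sum(axis=0)
    if mode == "wtsum":      # WtSum (weighted Borda fuse)
        return (np.asarray(weights, dtype=float)[:, None] * R).sum(axis=0)
    if mode == "max":        # Max: keep each image's best per-feature result
        return R.max(axis=0)
    raise ValueError(f"unknown mode: {mode}")

color = minmax([0.9, 0.2, 0.5, 0.1])     # made-up color similarities of 4 images
texture = minmax([0.3, 0.8, 0.6, 0.0])   # made-up texture similarities
for mode, w in [("sum", None), ("wtsum", [0.2, 0.8]), ("max", None)]:
    fused = fuse([color, texture], mode, w)
    print(mode, np.argsort(-fused, kind="stable"))   # final ranking, best first
```

With these numbers, the plain sum ranks image 0 first, while weighting texture at 0.8 promotes image 1; the feature weights are exactly the knob the slide asks how to set.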
Texture
- What is texture?
  - Has structures or repetitive patterns, e.g., a checkerboard
  - Has statistical patterns, e.g., grass, sand, rock
- Why texture?
  - Applications to satellite images, medical images
  - Describes the contents of real-world images, e.g., clouds, fabrics, surfaces, wood, stone
- Data set
  - e.g., Brodatz: famous texture photographs for image texture analysis
  - Man-made textures & natural objects
Mosaic of Brodatz Texture
Types of Computational Texture Features
- Structural – describing the arrangement of texture elements
- Statistical – characterizing texture in terms of statistical features
  - Co-occurrence matrix
  - Tamura (coarseness, directionality, contrast)
  - Multiresolution simultaneous autoregressive model (MRSAR)
  - Edge histogram
- Spectral – based on analysis in the spatial-frequency domain
  - Fourier domain energy distribution
  - Gabor
  - Pyramid-structured wavelet transform (PWT)
  - Tree-structured wavelet transform (TWT)
  - Laws filters
Co-occurrence Matrix
- Co-occurrence matrix C_d
  - Specified by a displacement vector d = (row, column)
  - Entry C_d(i, j) counts how many times a pixel with gray level i is separated from a pixel with gray level j by the displacement vector d
- Usually the normalized version of C_d is used: $N_d(i, j) = C_d(i, j) / \sum_{i,j} C_d(i, j)$
  - Physical meaning?
- Sometimes the symmetric version of C_d is used
[Figure: example with d = (1, 1)]
Co-occurrence Matrix (cont.)
- Examples
* From Prof. Leow Wee Kheng, NUS
Co-occurrence Matrix (cont.)
- Consider the following example (black = 1, white = 0)
- For d = (1, 1), the only non-zero entries are at (0, 0) and (1, 1) → captures diagonal structure
- For d = (0, 1), the only non-zero entries are at (0, 1) and (1, 0) → captures horizontal structure
Co-occurrence Matrix (cont.)
- Measures computed from $N_d$, e.g., entropy
  - Entropy takes its largest value when all $N_d(i, j)$ are equal. What does that mean?
- An almost obsolete feature
  - Not effective for classification and retrieval
  - Expensive to compute
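A minimal NumPy sketch of C_d, its normalised and symmetric versions, and a few measures computed from N_d. The slide's own list of measures was a figure that did not survive, so energy, entropy, contrast, and homogeneity here are commonly used choices, not necessarily the slide's; entropy in bits (log base 2) is also an assumption. A checkerboard stands in for the example image above and reproduces the non-zero entries it lists.

```python
import numpy as np

def cooccurrence(img, d, levels):
    """C_d(i, j): number of pixel pairs (p, p + d) with gray level i at p and j at p + d."""
    img = np.asarray(img)
    dr, dc = d                                    # displacement (row, column)
    h, w = img.shape
    r0, r1 = max(0, -dr), h - max(0, dr)          # rows p such that p + d stays inside
    c0, c1 = max(0, -dc), w - max(0, dc)
    a = img[r0:r1, c0:c1]                         # gray levels at p
    b = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]     # gray levels at p + d
    C = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(C, (a.ravel(), b.ravel()), 1)
    return C

def normalise(C, symmetric=False):
    """N_d = C_d / sum(C_d); the symmetric version also counts each pair as (j, i)."""
    if symmetric:
        C = C + C.T
    return C / C.sum()

def measures(N):
    i, j = np.indices(N.shape)
    nz = N[N > 0]                                 # 0 * log 0 is taken as 0
    return {
        "energy": float((N ** 2).sum()),
        "entropy": float(-(nz * np.log2(nz)).sum()),
        "contrast": float(((i - j) ** 2 * N).sum()),
        "homogeneity": float((N / (1.0 + np.abs(i - j))).sum()),
    }

board = np.indices((8, 8)).sum(axis=0) % 2        # checkerboard, black = 1, white = 0
print(cooccurrence(board, (1, 1), 2))             # non-zero only at (0, 0) and (1, 1)
print(cooccurrence(board, (0, 1), 2))             # non-zero only at (0, 1) and (1, 0)
print(measures(np.full((4, 4), 1 / 16))["entropy"])   # all entries equal: 4.0 = log2(16), the maximum
```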
Tamura – Selected Textural Properties
- fine / coarse
- high contrast / low contrast
- rough / smooth
- directional / non-directional
- line-like / blob-like
- regular / irregular
Usefulness in Describing Texture
- Psychophysical experiments: high correlation between some groups of properties
  - Orientation, line-likeness
  - Regularity
  - Coarseness, contrast, roughness (similar correlations)
- Computational measures
  - Coarseness
  - Contrast
  - Orientation
Tamura – Coarseness
- Goal
  - Pick a large size as best when a coarse texture is present, or a small size when only a fine texture is present
- Step 1: compute averages at different scales (window sizes) at every point
Tamura – Coarseness (cont.)
- Step 2: at each scale, compute the difference between neighborhood averages on opposite sides of the point, in different directions
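Tamura – Coarseness (cont.)
- Step 3: at each point, select the scale with the largest variation, $S_{best}(x, y) = 2^k$
- Step 4: compute the coarseness as the average of $S_{best}$ over the m × n image:
$$F_{crs} = \frac{1}{m \times n} \sum_{i=1}^{m} \sum_{j=1}^{n} S_{best}(i, j)$$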
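The four coarseness steps can be sketched in NumPy as follows. Assumptions not fixed by the slides: window sizes 2^k × 2^k for k = 1..5, differences between windows 2^(k-1) pixels to either side horizontally and vertically (keeping the larger), and a periodic (wrap-around) image border to keep the code short.

```python
import numpy as np

def box_mean(img, size):
    """Step 1 helper: mean of the size x size window at every pixel (image treated as periodic)."""
    r0 = size // 2
    p = np.pad(img, ((r0, size - r0), (r0, size - r0)), mode="wrap")
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)             # integral image
    h, w = img.shape
    total = (ii[size:size + h, size:size + w] - ii[:h, size:size + w]
             - ii[size:size + h, :w] + ii[:h, :w])
    return total / size ** 2

def tamura_coarseness(img, kmax=5):
    img = np.asarray(img, dtype=float)
    diffs = []
    for k in range(1, kmax + 1):
        A = box_mean(img, 2 ** k)                             # Step 1: averages at scale k
        d = 2 ** (k - 1)
        # Step 2: differences between windows on opposite sides, horizontally and vertically
        e_h = np.abs(np.roll(A, -d, axis=1) - np.roll(A, d, axis=1))
        e_v = np.abs(np.roll(A, -d, axis=0) - np.roll(A, d, axis=0))
        diffs.append(np.maximum(e_h, e_v))
    # Step 3: per pixel, the scale k with the largest difference gives S_best = 2^k
    s_best = 2.0 ** (np.argmax(np.stack(diffs), axis=0) + 1)
    # Step 4: coarseness = average of S_best over the image
    return float(s_best.mean())

def checkerboard(n, block):
    yy, xx = np.mgrid[0:n, 0:n]
    return ((xx // block + yy // block) % 2).astype(float)

print(tamura_coarseness(checkerboard(64, 1)))    # 2.0: every scale averages out, smallest size wins
print(tamura_coarseness(checkerboard(64, 16)))   # larger: coarse blocks favour large windows
```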
Tamura – Contrast
- Gaussian-like histogram distribution → low contrast
- Histogram polarization: Is it Gaussian? How many peaks does it have? Where are they?
- Polarization can be estimated by the kurtosis
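Tamura – Contrast (cont.)
[Figure: a distribution with two separate peaks vs. a unimodal distribution]
- The contrast estimate is given by:
$$F_{con} = \frac{\sigma}{(\alpha_4)^{1/4}}, \qquad \alpha_4 = \frac{\mu_4}{\sigma^4}$$
  where σ is the standard deviation of the gray levels, μ_4 their fourth moment about the mean, and α_4 the kurtosis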
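The contrast estimate in a few lines of NumPy, using moments over all pixels of the image (an assumption; a region could be used instead). A two-valued image and a Gaussian-noise image with the same standard deviation show the effect of polarization: the two-peak histogram scores higher.

```python
import numpy as np

def tamura_contrast(img):
    """F_con = sigma / alpha_4 ** (1/4), with alpha_4 = mu_4 / sigma ** 4 (kurtosis)."""
    x = np.asarray(img, dtype=float).ravel()
    dev = x - x.mean()
    var = np.mean(dev ** 2)
    if var == 0:
        return 0.0                       # flat image: no contrast
    alpha4 = np.mean(dev ** 4) / var ** 2
    return float(np.sqrt(var) / alpha4 ** 0.25)

two_peaks = np.tile([0.0, 2.0], (8, 4))                              # sigma = 1, alpha_4 = 1
unimodal = np.random.default_rng(0).normal(1.0, 1.0, (256, 256))     # sigma ~ 1, alpha_4 ~ 3
print(tamura_contrast(two_peaks))   # 1.0
print(tamura_contrast(unimodal))    # roughly 3 ** -0.25, i.e. about 0.76
```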
Tamura – Orientation
- Build the histogram of local edges at different orientations
  - By deriving the edge magnitude in the X and Y directions
Tamura – Orientation (cont.)
- Compute the estimate from the sharpness of the peaks
  - By summing the second moments around each peak
  - e.g., flat histogram → large second moment (variance) → low directionality
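A simplified sketch of the orientation (directionality) estimate. Assumptions: central differences with wrap-around borders for the X/Y edge components, edge magnitude (|ΔH| + |ΔV|)/2, a 16-bin orientation histogram over [0, π), only the single highest peak, and a normalising factor r = 12/π² chosen so that a flat histogram gives roughly 0 and one sharp peak gives 1. Tamura's original measure detects several peaks and uses its own operators and normalisation.

```python
import numpy as np

def tamura_directionality(img, nbins=16, threshold=0.0):
    g = np.asarray(img, dtype=float)
    # Edge components in the X and Y directions (central differences, wrap-around borders)
    dh = np.roll(g, -1, axis=1) - np.roll(g, 1, axis=1)
    dv = np.roll(g, -1, axis=0) - np.roll(g, 1, axis=0)
    magnitude = (np.abs(dh) + np.abs(dv)) / 2
    theta = np.mod(np.arctan2(dv, dh), np.pi)           # local edge orientation in [0, pi)
    keep = magnitude > threshold                          # ignore flat pixels
    # Histogram of local edge orientations, normalised to sum to 1
    hist, edges = np.histogram(theta[keep], bins=nbins, range=(0.0, np.pi))
    hist = hist / max(hist.sum(), 1)
    centers = (edges[:-1] + edges[1:]) / 2
    # Sharpness of the dominant peak: second moment of the circular distance to it
    peak = centers[np.argmax(hist)]
    dist = np.mod(centers - peak + np.pi / 2, np.pi) - np.pi / 2
    second_moment = np.sum(dist ** 2 * hist)
    return float(1.0 - (12.0 / np.pi ** 2) * second_moment)

stripes = np.tile(np.sin(2 * np.pi * np.arange(64) / 8), (64, 1))   # one orientation only
noise = np.random.default_rng(0).normal(size=(128, 128))            # no preferred orientation
print(tamura_directionality(stripes))   # 1.0
print(tamura_directionality(noise))     # close to 0
```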
(MR)SAR
- Each pixel is a random variable whose value is estimated from its neighboring pixels plus noise
  - A kind of Markov random field model
- SAR model (simultaneous autoregressive) [Mao’92]
  - Describes each pixel in terms of its neighboring pixels
- MRSAR model (multiresolution SAR)
  - Describes granularities by representing textures at a variety of resolutions
  - SAR applied at various image levels
  - Metric → differences between the model parameters
[Figure: input image → image pyramid → SAR applied at each level → SAR model parameters]
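A simplified MRSAR sketch: a least-squares SAR fit with a small symmetric neighbourhood, repeated on a 2 × 2-averaging image pyramid, with a Euclidean distance between the parameter vectors. The neighbourhood `OFFSETS`, the least-squares estimator, the pyramid, and the distance are all assumptions for illustration; the original [Mao’92] formulation may differ in each of them.

```python
import numpy as np

OFFSETS = ((0, 1), (1, 0), (1, 1), (1, -1))   # assumed symmetric neighbour pairs (dy, dx)

def sar_params(img):
    """Least-squares SAR fit: g(s) = sum_r theta_r [g(s + r) + g(s - r)] + noise."""
    g = np.asarray(img, dtype=float)
    g = g - g.mean()
    X = np.stack([(np.roll(g, (-dy, -dx), axis=(0, 1)) + np.roll(g, (dy, dx), axis=(0, 1))).ravel()
                  for dy, dx in OFFSETS], axis=1)
    y = g.ravel()
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma = np.std(y - X @ theta)                 # noise standard deviation
    return np.append(theta, sigma)

def downsample(g):
    """Next pyramid level: average of non-overlapping 2 x 2 blocks."""
    h, w = g.shape[0] // 2 * 2, g.shape[1] // 2 * 2
    g = g[:h, :w]
    return (g[0::2, 0::2] + g[0::2, 1::2] + g[1::2, 0::2] + g[1::2, 1::2]) / 4

def mrsar_features(img, levels=3):
    g, feats = np.asarray(img, dtype=float), []
    for _ in range(levels):
        feats.append(sar_params(g))               # SAR applied at this image level
        g = downsample(g)
    return np.concatenate(feats)                  # levels * (len(OFFSETS) + 1) values

def mrsar_distance(f1, f2):
    return float(np.linalg.norm(f1 - f2))         # parameter differences (Euclidean, assumed)

noise = np.random.default_rng(0).normal(size=(64, 64))
streaky = noise + np.roll(noise, 1, axis=1) + np.roll(noise, -1, axis=1)   # horizontally correlated
f = mrsar_features(streaky)
print(f.shape)                  # (15,)
print(f[0], f[1])               # horizontal coefficient clearly above the vertical one
```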
Edge Histogram
- Edge histogram descriptor (EHD)
  - Captures the spatial distribution of edges in six states: 0°, 45°, 90°, 135°, non-directional, and no edge
- Utilizing the filters
[Figure: filters for the 0°, 45°, 90°, 135°, and non-directional edges]
- Global EHD of an image: concatenating 16 sub-EHDs into 96 bins
- Local EHD of a segment
  - Grouping the edge histograms of the image-blocks that fall into the segment
[Figure: image partitioned into macro-blocks and image-blocks]
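A minimal global-EHD sketch following the slide's layout: 4 × 4 sub-images, six states per image-block, 16 × 6 = 96 bins. The 2 × 2 filter coefficients are the ones commonly given for the MPEG-7 edge histogram descriptor; the 8 × 8 image-block size, the edge threshold of 11, and the mapping of 0° to horizontal edges are assumptions.

```python
import numpy as np

S2 = np.sqrt(2.0)
FILTERS = [                                   # applied to the 2 x 2 sub-block means
    np.array([[1.0, 1.0], [-1.0, -1.0]]),      # 0 deg edge (assumed horizontal)
    np.array([[S2, 0.0], [0.0, -S2]]),         # 45 deg edge
    np.array([[1.0, -1.0], [1.0, -1.0]]),      # 90 deg edge (assumed vertical)
    np.array([[0.0, S2], [-S2, 0.0]]),         # 135 deg edge
    np.array([[2.0, -2.0], [-2.0, 2.0]]),      # non-directional edge
]
NO_EDGE = len(FILTERS)                        # sixth state

def block_state(block, threshold=11.0):
    """Classify one image-block into one of the six states."""
    h2, w2 = block.shape[0] // 2, block.shape[1] // 2
    sub = np.array([[block[:h2, :w2].mean(), block[:h2, w2:].mean()],
                    [block[h2:, :w2].mean(), block[h2:, w2:].mean()]])
    strengths = [abs(float((sub * f).sum())) for f in FILTERS]
    best = int(np.argmax(strengths))
    return best if strengths[best] >= threshold else NO_EDGE

def edge_histogram(img, grid=4, block=8, threshold=11.0):
    """Global EHD: a 6-bin histogram per sub-image, 4 x 4 sub-images -> 96 bins."""
    img = np.asarray(img, dtype=float)
    sh, sw = img.shape[0] // grid, img.shape[1] // grid        # sub-image size
    hist = np.zeros((grid, grid, NO_EDGE + 1))
    for gy in range(grid):
        for gx in range(grid):
            sub_img = img[gy * sh:(gy + 1) * sh, gx * sw:(gx + 1) * sw]
            for by in range(0, sh - block + 1, block):
                for bx in range(0, sw - block + 1, block):
                    hist[gy, gx, block_state(sub_img[by:by + block, bx:bx + block], threshold)] += 1
            hist[gy, gx] /= max(hist[gy, gx].sum(), 1)          # normalise per sub-image
    return hist.ravel()

x = np.arange(64)
stripes = np.tile(np.where((x // 4) % 2 == 1, 255.0, 0.0), (64, 1))   # vertical bars, width 4
h = edge_histogram(stripes)
print(h.shape)                    # (96,)
print(h.reshape(16, 6)[:, 2])     # every sub-image: all blocks classified as 90-degree edges
```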
Vector Space Concept
- Orthonormal bases (d-dim. vectors)
  - Any vector in a vector space can be expanded in a set of orthonormal signals $\{b_k\}$: $x = \sum_k a_k b_k$
  - Response for basis k: $a_k = \langle x, b_k \rangle$
  - Transform to the new bases
- (1D/2D) Fourier bases are sets of orthonormal signals
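A small numerical check of the expansion for the 1D Fourier basis, assuming the unitary scaling b_k[m] = e^{i2πkm/n}/√n so that the basis is orthonormal. The responses a_k are then NumPy's unnormalised FFT divided by √n, and summing a_k b_k recovers x exactly.

```python
import numpy as np

def fourier_basis(n):
    """Row k is b_k[m] = exp(i 2 pi k m / n) / sqrt(n); the rows are orthonormal."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.exp(2j * np.pi * k * m / n) / np.sqrt(n)

B = fourier_basis(8)
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
a = B.conj() @ x            # response for basis k: a_k = <x, b_k> = sum_m x[m] conj(b_k[m])
x_back = B.T @ a            # expansion: x = sum_k a_k b_k
print(np.allclose(B @ B.conj().T, np.eye(8)))       # True: orthonormal
print(np.allclose(x_back, x))                       # True: perfect reconstruction
print(np.allclose(a, np.fft.fft(x) / np.sqrt(8)))   # True: same as the scaled FFT
```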
The Fourier Transform
- Represent a function on a new basis
  - Think of functions as vectors with many components
  - We now apply a linear transformation to transform the basis: a dot product with each basis element
- In the expression, u and v select the basis element, so a function of x and y becomes a function of u and v
- Basis elements have the form $e^{-i 2\pi (ux + vy)}$

$$F(g(x, y))(u, v) = \iint_{\mathbb{R}^2} g(x, y)\, e^{-i 2\pi (ux + vy)}\, dx\, dy$$
Visual Sinusoidal Pattern*
*The following 5 slides are from Jaap van de Loosdrecht, Noordelijke Hogeschool Leeuwarden
Visual Sinusoidal Pattern with Low Frequency
Sinusoidal Pattern Rotated 45 Degrees
2D Sinusoidal Pattern
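What these sinusoidal-pattern slides show can be checked numerically: the power spectrum of a 2D sinusoid is two points placed symmetrically about DC, and turning the stripes turns those points. Image size and frequencies below are arbitrary choices for the sketch.

```python
import numpy as np

N = 64
y, x = np.mgrid[0:N, 0:N]

def spectrum_peaks(img, k=2):
    """(row, col) offsets from DC of the k strongest power-spectrum entries."""
    P = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2     # DC moved to (N // 2, N // 2)
    idx = np.argsort(P.ravel())[::-1][:k]
    rows, cols = np.unravel_index(idx, P.shape)
    return sorted((int(r) - N // 2, int(c) - N // 2) for r, c in zip(rows, cols))

bars = np.sin(2 * np.pi * 4 * x / N)              # vertical bars, 4 cycles across the image
diagonal = np.sin(2 * np.pi * 4 * (x + y) / N)    # stripes along the 45-degree diagonal
print(spectrum_peaks(bars))       # [(0, -4), (0, 4)]
print(spectrum_peaks(diagonal))   # [(-4, -4), (4, 4)]
```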
2D Rectangle
- Difference between the spatial and frequency domains
- 1D sinc function at different scales
Interpreting the Power Spectrum
- Explain the structures in the power spectrum
[Figure: power spectrum annotated with DC, low frequency, high frequency, and the labels 1, 2, "3 dark", and "3 bright"]
Phase and Magnitude
- The Fourier transform of a real function is complex
  - Difficult to plot and visualize
  - Instead, we can think of the phase and magnitude of the transform
  - Phase is the phase of the complex transform
  - Magnitude is the magnitude of the complex transform
- Curious fact
  - All natural images have roughly similar magnitude transforms
  - Hence, phase seems to matter, but magnitude largely doesn't
  - Same for audio?
- Demonstration
  - Take two pictures, swap the phase transforms, and compute the inverse. What does the result look like?
This is the magnitude transform of the zebra picture.
This is the phase transform of the zebra picture.
This is the magnitude transform of the cheetah picture.
This is the phase transform of the cheetah picture.
Reconstruction with zebra phase, cheetah magnitude
Reconstruction with cheetah phase, zebra magnitude
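The phase/magnitude swap from the demonstration, sketched with NumPy's FFT. Random arrays stand in for the zebra and cheetah pictures (an assumption), so the printed correlations only illustrate the mechanics: the hybrid follows the image that supplied the phase.

```python
import numpy as np

def swap_phase(phase_img, mag_img):
    """Inverse FFT of |FFT(mag_img)| * exp(i * angle(FFT(phase_img)))."""
    F_phase = np.fft.fft2(phase_img)
    F_mag = np.fft.fft2(mag_img)
    hybrid = np.abs(F_mag) * np.exp(1j * np.angle(F_phase))
    # For real inputs the hybrid spectrum is Hermitian, so the inverse is real up to round-off
    return np.real(np.fft.ifft2(hybrid))

rng = np.random.default_rng(0)
zebra = rng.random((64, 64))       # stand-ins for the two pictures
cheetah = rng.random((64, 64))
h = swap_phase(zebra, cheetah)     # zebra phase, cheetah magnitude
print(np.corrcoef(h.ravel(), zebra.ravel())[0, 1])     # high: follows the phase image
print(np.corrcoef(h.ravel(), cheetah.ravel())[0, 1])   # near 0
```

With real photographs, loading two grayscale images of the same size in place of the random arrays reproduces the two reconstructions above.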
Natural Images and Their FT
- What happens to the FT patterns when the texture scale and orientation change?
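Frequency Domain Features
- Fourier domain energy distribution
  - Angular features (directionality):
$$A_{\theta_1, \theta_2} = \sum_{\theta_1 \le \tan^{-1}(v/u) < \theta_2} |F(u, v)|^2$$
  - Radial features (coarseness):
$$R_{r_1, r_2} = \sum_{r_1^2 \le u^2 + v^2 < r_2^2} |F(u, v)|^2$$
  - where F(u, v) is the FT of the image
- Uniform division may not be the best!!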
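A sketch of the radial (ring) and angular (wedge) energy features with uniform divisions, the very choice the slide warns may not be the best. The numbers of rings and wedges, normalisation by total energy, and mean removal (so the DC term does not dominate) are assumptions.

```python
import numpy as np

def ring_wedge_features(img, n_rings=4, n_wedges=6):
    """Fraction of spectral energy in uniform radial rings and angular wedges."""
    g = np.asarray(img, dtype=float)
    P = np.abs(np.fft.fftshift(np.fft.fft2(g - g.mean()))) ** 2   # |F(u, v)|^2, DC at centre
    h, w = g.shape
    v, u = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]   # frequency coordinates
    r = np.hypot(u, v)
    theta = np.mod(np.arctan2(v, u), np.pi)     # |F| is symmetric, so [0, pi) suffices
    valid = r > 0                                # leave out the DC term
    r_max = min(h, w) / 2
    radial = np.array([P[valid & (r >= r_max * i / n_rings) & (r < r_max * (i + 1) / n_rings)].sum()
                       for i in range(n_rings)])                   # coarseness
    angular = np.array([P[valid & (theta >= np.pi * j / n_wedges)
                          & (theta < np.pi * (j + 1) / n_wedges)].sum()
                        for j in range(n_wedges)])                 # directionality
    total = P[valid].sum()
    return radial / total, angular / total

x = np.arange(64)
bars = np.tile(np.sin(2 * np.pi * 8 * x / 64), (64, 1))   # vertical bars, 8 cycles across
radial, angular = ring_wedge_features(bars)
print(np.round(radial, 3))    # all energy in ring 1 (radius 8 to 16)
print(np.round(angular, 3))   # all energy in wedge 0 (around the u axis)
```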
Gabor Texture
- Fourier coefficients depend on the entire image (global) → we lose spatial information
- Objective: local spatial-frequency analysis
- Gabor kernels look like a Fourier basis element multiplied by a Gaussian
  - The product of a symmetric (even) Gaussian with an oriented sinusoid
  - Gabor filters come in pairs: symmetric (even) and anti-symmetric (odd)
$$G_{even}(x, y) = \cos(k_x x + k_y y)\, e^{-\frac{x^2 + y^2}{2\sigma^2}}, \qquad G_{odd}(x, y) = \sin(k_x x + k_y y)\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
  - Each pair recovers the symmetric and anti-symmetric components in a particular direction
  - (k_x, k_y): the spatial frequency to which the filter responds most strongly
  - σ: the scale of the filter; as σ → ∞, the filter approaches a Fourier basis element (FT)
- We need to apply a number of Gabor filters at different scales, orientations, and spatial frequencies
Example – Gabor Kernel
- Zebra stripes at different scales and orientations, convolved with the Gabor kernel
  - The response falls off when the stripes are larger or smaller
  - The response is large when the spatial frequency of the bars roughly matches that of the sinusoid windowed by the Gaussian in the Gabor kernel
- Local spatial-frequency analysis
[Figure: zebra image, Gabor kernel, magnitude of the filtered image]
Gabor Texture (cont.)
- Image I(x, y) convolved with Gabor filters h_mn (M × N filters in total)
- Using the first and second moments (mean and standard deviation) for each scale and orientation
- Features: e.g., 4 scales, 6 orientations → 48 dimensions
[Figure: odd and even Gabor kernels]
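A sketch of both Gabor slides: an even/odd kernel pair, its frequency selectivity on zebra-like stripes, and a 4-scale × 6-orientation bank giving 48 mean/std features. Assumptions: circular convolution via the FFT (image larger than the largest kernel), kernels truncated at 3σ, centre frequencies starting at 0.25 cycles/pixel and halving per scale, σ = 0.5/f, and orientations spread evenly over [0, π). This is not the exact filter-bank design of any particular paper.

```python
import numpy as np

def gabor_pair(kx, ky, sigma):
    """Even (cos) and odd (sin) Gabor kernels, truncated at 3 sigma."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    phase = kx * x + ky * y
    return envelope * np.cos(phase), envelope * np.sin(phase)

def convolve_fft(img, kernel):
    """Circular 2-D convolution via the FFT (kernel must be smaller than the image)."""
    K = np.zeros(img.shape)
    kh, kw = kernel.shape
    K[:kh, :kw] = kernel
    K = np.roll(K, (-(kh // 2), -(kw // 2)), axis=(0, 1))     # kernel centre to (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(K)))

def gabor_magnitude(img, kx, ky, sigma):
    even, odd = gabor_pair(kx, ky, sigma)
    return np.hypot(convolve_fft(img, even), convolve_fft(img, odd))

def gabor_features(img, n_scales=4, n_orient=6, f_max=0.25):
    """Mean and std of the response magnitude for every (scale, orientation)."""
    img = np.asarray(img, dtype=float)
    feats = []
    for s in range(n_scales):
        f = f_max / 2 ** s              # centre frequency in cycles/pixel, halved per scale
        sigma = 0.5 / f                 # Gaussian width grows with the wavelength
        for o in range(n_orient):
            t = np.pi * o / n_orient    # orientations evenly spread over [0, pi)
            m = gabor_magnitude(img, 2 * np.pi * f * np.cos(t), 2 * np.pi * f * np.sin(t), sigma)
            feats += [m.mean(), m.std()]
    return np.array(feats)              # n_scales * n_orient * 2 values

def stripes(period, n=128):
    return np.tile(np.cos(2 * np.pi * np.arange(n) / period), (n, 1))   # zebra-like vertical bars

# One kernel tuned to a wavelength of 8 pixels, applied to bars of different widths
resp = {p: gabor_magnitude(stripes(p), 2 * np.pi / 8, 0.0, 4.0).mean() for p in (4, 8, 32)}
print(resp)                                                                  # peaks at period 8
print(gabor_features(np.random.default_rng(0).random((128, 128))).shape)   # (48,)
```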
Gabor Texture (cont.)
- Arranging the mean energy in a 2D form (orientation × scale, in the frequency domain)
  - structured: localized pattern
  - oriented (or directional): column pattern
  - granular: row pattern
  - random: random pattern
Laws Texture Energy Features
- Non-Fourier-type bases
- Match intuitive texture features better
- The filter algorithm
  - Filter the input image using texture filters
  - Compute texture energy by summing the absolute values of the filtered results in local neighborhoods around each pixel
  - Combine features to achieve rotational invariance
Laws’ Texture Masks (1)
- Basic 1D masks, which can be extended to create 2D masks
  - L5 (Level) = [ 1 4 6 4 1 ] (Gaussian): gives a center-weighted local average
  - E5 (Edge) = [ -1 -2 0 2 1 ] (gradient): responds to row or column step edges
  - S5 (Spot) = [ -1 0 2 0 -1 ] (LoG): detects spots
  - R5 (Ripple) = [ 1 -4 6 -4 1 ] (Gabor): detects ripples
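Laws’ Texture Masks (2)
- Create a 2D mask as the outer product of a column mask and a row mask, e.g., E5 (column) × L5 (row) = E5L5:
$$E5L5 = \begin{bmatrix} -1 \\ -2 \\ 0 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \end{bmatrix} = \begin{bmatrix} -1 & -4 & -6 & -4 & -1 \\ -2 & -8 & -12 & -8 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ 2 & 8 & 12 & 8 & 2 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix}$$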
Laws Filters (2D)
Laws Process
Wavelet Features (PWT, TWT)
- Wavelet
  - Decomposition of a signal with a family of basis functions, via recursive filtering and sub-sampling
  - At each level, decomposes a 2D signal into 4 subbands, LL, LH, HL, HH (L = low, H = high)
- PWT: pyramid-structured wavelet transform
  - Recursively decomposes the LL band
  - Feature dimension (3x3x1+1)x2 = 20
- TWT: tree-structured wavelet transform
  - Some information lies in the middle-frequency channels, so these are decomposed further as well
  - Feature dimension 40x2 = 80
[Figure: original image and its PWT and TWT decompositions]
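A PWT sketch. The slides do not name a wavelet, so the Haar wavelet is used here only because it needs no library; the two features per subband (mean absolute value and standard deviation) are likewise an assumption, chosen so that three levels give the (3x3x1+1)x2 = 20 count above. A TWT would additionally decompose middle-frequency subbands, which this sketch does not do. Image sides must be divisible by 2^levels.

```python
import numpy as np

def haar_level(x):
    """One 2-D Haar analysis step: LL plus three detail subbands, each half the size."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2       # detail subbands (LH/HL naming conventions vary)
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def pwt_features(img, levels=3):
    """PWT: recurse on LL only; mean |coef| and std per subband -> (3 * levels + 1) * 2 values."""
    x = np.asarray(img, dtype=float)
    feats = []
    for _ in range(levels):
        x, lh, hl, hh = haar_level(x)
        for band in (lh, hl, hh):
            feats += [np.abs(band).mean(), band.std()]
    feats += [np.abs(x).mean(), x.std()]     # the remaining LL band
    return np.array(feats)

img = np.random.default_rng(0).random((64, 64))
print(pwt_features(img).shape)    # (20,)
```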
Texture Comparisons
- [Ma’98] Retrieval performance of different texture features, measured by the number of relevant images retrieved at various scopes, using Corel photo galleries
[Figure: number of relevant images vs. number of top matches considered, for MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, and Tamura]
Texture Comparisons (cont.)
- [Ma’98] Retrieval performance of texture features in terms of the number of top matches considered, using the Brodatz album
[Figure: running recall vs. number of top matches considered, for Gabor, MRSAR (M), TWT, PWT, MRSAR, Tamura (improved), Tamura, coarseness histogram, directionality, and edge histogram]
Texture Comparisons (cont.)
- Images of rock samples in applications related to oil exploitation [Li’00]
- Gabor descriptors outperform the others [Li’00]
Learned Similarity
- [Ma’96] Distance metrics DO matter
  - All based on Gabor features
  - Euclidean vs. learned (supervised) distance metric
  - The latter was maintained with a texture thesaurus
[Figure: retrieval results with Euclidean distance vs. learned (supervised) distance]
Shape
- Region-based descriptor
- Contour-based shape descriptor
- 2D/3D shape descriptor
- Some relevant ones are included in MPEG-7
- Not easy to derive automatically
[Bober’01]
Region-based vs. Contour-based Descriptor
- Columns indicate contour similarity
  - Outline of contours
- Rows indicate region similarity
  - Distribution of pixels
Region-based Descriptor
- Expresses the pixel distribution within a 2D object region
- Employs a complex 2D Angular Radial Transformation (ART)
  - 35 fields of 4 bits each
  - Rotation and scale invariance
  - Robust to some non-rigid transformations
  - L1 metric on the transformed coefficients
- Advantages
  - Describes complex shapes with disconnected regions
  - Robust to segmentation noise
  - Small size
  - Fast extraction and matching
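A sketch of the ART coefficients behind the region-based descriptor. The basis V_nm(ρ, θ) = A_m(θ) R_n(ρ), with A_m(θ) = e^{jmθ}/2π, R_0 = 1 and R_n(ρ) = 2cos(πnρ), for n < 3 and m < 12, is the form commonly given for MPEG-7; its 36 magnitudes normalised by |F_00|, with F_00 itself dropped, give the 35 fields above. The object is assumed to be already centred and scaled to the unit disk, and the 4-bit quantisation is omitted. The L-shaped test region is hypothetical.

```python
import numpy as np

def art_descriptor(mask, n_radial=3, n_angular=12):
    """|F_nm| / |F_00| for n < n_radial, m < n_angular, without F_00 itself (35 values)."""
    f = np.asarray(mask, dtype=float)
    size = f.shape[0]                                         # square image, object centred
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    u, v = (xx - c) / (size / 2.0), (yy - c) / (size / 2.0)   # map the image onto the unit disk
    rho, theta = np.hypot(u, v), np.arctan2(v, u)
    inside = rho <= 1.0
    coeffs = []
    for n in range(n_radial):
        R = np.ones_like(rho) if n == 0 else 2 * np.cos(np.pi * n * rho)
        for m in range(n_angular):
            V = R * np.exp(1j * m * theta) / (2 * np.pi)          # V_nm = A_m(theta) R_n(rho)
            coeffs.append(np.sum(np.conj(V[inside]) * f[inside]))  # F_nm, one term per pixel
    mags = np.abs(np.array(coeffs))
    return mags[1:] / mags[0]

def art_distance(d1, d2):
    return float(np.abs(d1 - d2).sum())       # L1 metric on the coefficients

shape = np.zeros((32, 32))
shape[8:24, 8:14] = 1                         # hypothetical L-shaped region
shape[18:24, 8:24] = 1
d = art_descriptor(shape)
print(d.shape)                                            # (35,)
print(art_distance(d, art_descriptor(np.rot90(shape))))   # ~0: magnitudes are rotation invariant
```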
Contour-based Descriptor
- Based on the Curvature Scale-Space (CSS) representation
- Found to be superior to
  - Zernike moments
  - ART
  - Fourier-based
  - Turning angles
  - Wavelets
- Rotation and scale invariance
- Robust to some non-rigid transformations
- For example
  - Applicable to (a)
  - Discriminating differences in (b)
  - Finding similarities in (c)-(e)
[Figure: example shapes (a)-(e)]
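The idea behind the CSS representation can be sketched as follows: smooth a closed contour with Gaussians of increasing width and track where its curvature changes sign. This sketch only counts the curvature zero crossings at each scale (one row of the CSS image per σ); the MPEG-7 descriptor goes further and uses the peaks of the CSS image. Smoothing and differentiation in the Fourier domain of the periodic contour are assumptions made to keep the code short.

```python
import numpy as np

def smoothed_curvature(x, y, sigma):
    """Curvature of the closed contour (x(t), y(t)) after Gaussian smoothing (sigma in samples)."""
    n = len(x)
    f = np.fft.fftfreq(n) * n                              # integer frequencies
    gauss = np.exp(-2.0 * (np.pi * f * sigma / n) ** 2)   # Gaussian kernel in the frequency domain
    X, Y = np.fft.fft(x) * gauss, np.fft.fft(y) * gauss

    def deriv(Z, order):                                   # spectral derivative with respect to t
        return np.real(np.fft.ifft(Z * (2j * np.pi * f / n) ** order))

    x1, y1, x2, y2 = deriv(X, 1), deriv(Y, 1), deriv(X, 2), deriv(Y, 2)
    return (x1 * y2 - y1 * x2) / (x1 ** 2 + y1 ** 2) ** 1.5

def css_row_counts(x, y, sigmas):
    """Number of curvature zero crossings at each smoothing scale."""
    counts = []
    for s in sigmas:
        k = smoothed_curvature(x, y, s)
        counts.append(int(np.sum(np.sign(k) != np.sign(np.roll(k, 1)))))
    return counts

t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
r = 1 + 0.3 * np.cos(5 * t)                                        # five-lobed shape, concave parts
print(css_row_counts(r * np.cos(t), r * np.sin(t), [1, 5, 40]))   # crossings vanish at large sigma
print(css_row_counts(2 * np.cos(t), np.sin(t), [1, 5, 40]))       # convex ellipse: [0, 0, 0]
```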
Problems in Shape-based Indexing
- Many existing approaches assume
  - Segmentation is given
  - A human operator circles the object of interest
  - Lack of clutter and shadows
  - Objects are rigid
  - Planar (2-D) shape models
  - Models are known in advance
Summary
- Texture features
  - Statistical
  - Spectral
  - Texture computation is time-consuming; compressed-domain features?
- Shape features
- Multimodal fusion is quite helpful
- Next week
  - Efficient indexing of high-dimensional data
  - Feature reduction