Overview
•Importance of Features
•Mathematical Notation & Background
•Fourier Transform
•Windowed Fourier Transform
•Wavelets
•Feature Anecdote
•Literature & Homework
Fall 2004
Pattern Recognition for Vision
General Remarks
Classifier
Feature vector (x1, x2 ,…, xn)
Feature Extraction
Photograph by MIT OCW.
“The choice of features is more important
than the choice of the classifier.”
General Remarks—Application specific
Traffic sign recognition
Color, shape (Hough transform)
Texture Recognition
DFT, WT
Action Recognition
Motion-based features
General Remarks
Is the choice of features really more important
than the choice of the classifier?
We know that a fly uses optical
flow features for navigation.
Still, technical systems using optical flow
for navigation are (far) behind the
capabilities of a fly.
General Remarks—why talk about FT, WFT, WT?
Fourier Transform (FT), Windowed FT (WFT) and Wavelet
Transform (WT)
•used in many computer vision applications
•derivation from signal processing
•basic tools for engineers
Other features:
color, motion features (optical flow),
gradient features, SIFT (orientation histograms),
affine invariant features, steerable filters (overcomplete wavelets),
…
Background—Notation
$\mathbb{Z}, \mathbb{R}, \mathbb{C}$: integers, real, complex
$H, L, L^2(\mathbb{R})$: function spaces
$\|f\|$: Norm
$|a + jb| = \sqrt{a^2 + b^2}$: Absolute value
$\langle f, g \rangle$: Inner product
$\overline{f(t)}$: Complex conjugate
$\hat{f}(\omega)$: Fourier transform
$\cos 2\pi\omega t + j \sin 2\pi\omega t = e^{j 2\pi\omega t}$: Euler formula
Background—Vector and Function Spaces
Vectors and Functions
$u = [u_1, \dots, u_N]^T$
$f(n),\ n \in \mathbb{Z}$
$f(t),\ t \in \mathbb{R}$
Background—Inner Product&Norm
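Inner Product & Norm
$\langle u, v \rangle \equiv \sum_{n=1}^{N} \overline{u_n}\, v_n$, $\|u\|^2 \equiv \langle u, u \rangle$
$L^2$ norm: $\|u\|_{L^2}^2 = \sum_n |u_n|^2$
$\langle f(n), g(n) \rangle \equiv \sum_{n=-\infty}^{\infty} \overline{f(n)}\, g(n)$, $\|f(n)\|^2 \equiv \langle f(n), f(n) \rangle$
$\langle f(t), g(t) \rangle \equiv \int_{-\infty}^{\infty} \overline{f(t)}\, g(t)\, dt$, $\|f(t)\|^2 \equiv \langle f(t), f(t) \rangle$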
Background—Function Spaces
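Inner Product cont.
Positivity: $\|f\| > 0$ for all $f \in H$, $f \neq 0$
Hermiticity: $\langle f, g \rangle = \overline{\langle g, f \rangle}$
Linearity: $\langle f, cg + h \rangle = c \langle f, g \rangle + \langle f, h \rangle$ for $f, g, h \in H$, $c \in \mathbb{C}$
Triangle & Schwarz inequality: $\|f + g\| \le \|f\| + \|g\|$, $|\langle f, g \rangle| \le \|f\|\, \|g\|$ for all $f, g \in H$
$L^2(\mathbb{R})$ Function space, finite energy:
$H \equiv \{ f : \mathbb{R} \to \mathbb{C},\ \|f\|^2 \equiv \int |f(t)|^2\, dt < \infty \}$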
Background—Basis
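Basis of a Vector and Function Space
$\{b_1, \dots, b_N\}$ is a basis of $\mathbb{C}^N$ if $\forall u \in \mathbb{C}^N$:
$u = \sum_{n=1}^{N} u_n b_n$, $\{u_1, \dots, u_N\}$ is unique, $u_n = \langle b^n, u \rangle$
$\{f_1, \dots, f_N\}$ is a basis of $H$ if $\forall g \in H$:
$g(t) = \sum_{n=1}^{N} c_n f_n(t)$, $\{c_1, \dots, c_N\}$ is unique, $c_n = \langle f^n(t), g(t) \rangle$
$g(t) = \int \tilde{g}(\omega) f_\omega(t)\, d\omega$, $\tilde{g}(\omega)$ is unique, $\tilde{g}(\omega) = \langle f^\omega(t), g(t) \rangle$
(superscripts denote the reciprocal basis)
Orthonormal Basis
$\langle f_\omega, f_{\omega'} \rangle = \begin{cases} 0 & \omega \neq \omega' \\ 1 & \omega = \omega' \end{cases}$
$f^\omega = f_\omega$, $\tilde{g}(\omega) = \langle f_\omega, g \rangle$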
Fourier Series—Continuous Signal
Continuous, periodic signal
[Figure: periodic signal $f(t)$ with period $T$]
$f(t)$ periodic with period $T$, $f(t) \in L^2([-T/2, T/2])$
$c_k = \int_{-T/2}^{T/2} f(t)\, e^{-j 2\pi k t / T}\, dt$
$s_k = e^{j 2\pi k t / T}$, $c_k = \langle s_k, f \rangle_T$, $\langle s_k, s_l \rangle_T = T \delta_{kl}$
$f(t) = \frac{1}{T} \sum_{k=-\infty}^{\infty} c_k\, e^{j 2\pi k t / T}$
$\|f(t)\|_T^2 = \frac{1}{T} \sum_{k=-\infty}^{\infty} |c_k|^2$
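As a rough numerical illustration of the coefficient formula for $c_k$, the sketch below integrates one period of a rectangular pulse train with the midpoint rule. The period `T`, the pulse width `TAU` and the helper names are assumptions chosen for the example, not values from the lecture.

```python
import cmath
import math

def fourier_coefficient(f, k, T, steps=20000):
    """Midpoint-rule estimate of c_k = integral over [-T/2, T/2] of f(t) exp(-j 2 pi k t / T) dt."""
    dt = T / steps
    total = 0j
    for i in range(steps):
        t = -T / 2 + (i + 0.5) * dt
        total += f(t) * cmath.exp(-2j * math.pi * k * t / T)
    return total * dt

T = 4.0     # period (illustrative value)
TAU = 1.0   # pulse width (illustrative value)

def pulse(t):
    """One period of a rectangular pulse train: 1 for |t| < TAU/2, else 0."""
    return 1.0 if abs(t) < TAU / 2 else 0.0

coeffs = {k: fourier_coefficient(pulse, k, T) for k in range(-4, 5)}
for k in sorted(coeffs):
    print(k, round(coeffs[k].real, 6))
```

For this pulse the coefficients should follow $\sin(\pi k \tau / T) / (\pi k / T)$, so they vanish where $k/T$ is a multiple of $1/\tau$.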
Fourier Series—Examples
[Figure: example periodic signals with period $T$ and pulse width $\tau$, together with their Fourier series coefficients; frequency-axis labels $1/T$, $-1/T$ and $1/\tau$]
Fourier Transform
Continuous signals, $f(t) \in L^2(\mathbb{R})$
$T \to \infty$: $L^2([-T/2, T/2]) \to L^2(\mathbb{R})$
[Figure: $f(x)$ plotted over $x$]
$\omega_k = k/T$, $\Delta\omega = \omega_{k+1} - \omega_k = \frac{1}{T}$
$c_k = \int_{-T/2}^{T/2} f(t)\, e^{-j 2\pi \omega_k t}\, dt \;\to\; \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j 2\pi \omega t}\, dt$
$f(t) = \frac{1}{T} \sum_{k=-\infty}^{\infty} c_k\, e^{j 2\pi k t / T} = \sum_{k=-\infty}^{\infty} c_k\, e^{j 2\pi \omega_k t}\, \Delta\omega$
$T \to \infty$: $f(t) = \int_{-\infty}^{\infty} \hat{f}(\omega)\, e^{j 2\pi \omega t}\, d\omega$
Fourier Transform
Another look at the inverse Fourier transform
$f(t) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f(t')\, e^{-j 2\pi \omega t'}\, dt' \right] e^{j 2\pi \omega t}\, d\omega$
$= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t')\, e^{-j 2\pi \omega (t' - t)}\, d\omega\, dt' = \int_{-\infty}^{\infty} f(t')\, \delta(t - t')\, dt'$
Fourier Transform—Properties
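Linearity: $A f(t) + B g(t) \leftrightarrow A \hat{f}(\omega) + B \hat{g}(\omega)$
Shift: $f(t - t_0) \leftrightarrow e^{-j 2\pi \omega t_0} \hat{f}(\omega)$
Convolution: $\int_{-\infty}^{\infty} f(t')\, g(t - t')\, dt' \leftrightarrow \hat{f}(\omega)\, \hat{g}(\omega)$
Derivative: $\frac{d f(t)}{dt} \leftrightarrow j 2\pi \omega \hat{f}(\omega)$
Scaling: $f(At) \leftrightarrow \frac{1}{|A|} \hat{f}\left(\frac{\omega}{A}\right)$
Correlation: $\int_{-\infty}^{\infty} f(t')\, g(t + t')\, dt' \leftrightarrow \overline{\hat{f}(\omega)}\, \hat{g}(\omega)$ (for real-valued $f$)
Autocorrelation: $\int_{-\infty}^{\infty} f(t')\, f(t + t')\, dt' \leftrightarrow |\hat{f}(\omega)|^2$ (for real-valued $f$)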
Fourier Transform—Plancherel&Parseval
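Plancherel's Theorem
$\int_{-\infty}^{\infty} |f|^2\, dt = \int_{-\infty}^{\infty} |\hat{f}|^2\, d\omega$, i.e. $\|f\|^2 = \|\hat{f}\|^2$
Parseval Identity
$\int_{-\infty}^{\infty} \overline{f}\, g\, dt = \int_{-\infty}^{\infty} \overline{\hat{f}}\, \hat{g}\, d\omega$, i.e. $\langle f, g \rangle = \langle \hat{f}, \hat{g} \rangle$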
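Plancherel's theorem can be checked numerically for a Gaussian, whose transform under the $e^{-j 2\pi\omega t}$ convention is $\sigma\sqrt{2\pi}\, e^{-2\pi^2\sigma^2\omega^2}$. The sketch below is only a sanity check: the value of `sigma`, the integration limits and the helper `integrate` are assumptions made for the example.

```python
import math

def integrate(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h

sigma = 0.7  # illustrative width

def f(t):
    """Gaussian in time: exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2 * sigma ** 2))

def f_hat(w):
    """Its Fourier transform: sigma sqrt(2 pi) exp(-2 pi^2 sigma^2 w^2)."""
    return sigma * math.sqrt(2 * math.pi) * math.exp(-2 * math.pi ** 2 * sigma ** 2 * w * w)

energy_time = integrate(lambda t: abs(f(t)) ** 2, -10.0, 10.0)
energy_freq = integrate(lambda w: abs(f_hat(w)) ** 2, -10.0, 10.0)
print(energy_time, energy_freq)  # both should be close to sigma * sqrt(pi)
```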
Fourier Transform—Examples
[Figure: two transform pairs, each plotted in time and in frequency]
Time: $\mathrm{rect}(t/T)$; Frequency: $T\, \frac{\sin(\pi \omega T)}{\pi \omega T} = T\, \mathrm{sinc}(\omega T)$
Time: $e^{-t^2 / (2\sigma^2)}$; Frequency: $\sigma \sqrt{2\pi}\, e^{-2\pi^2 \sigma^2 \omega^2}$
Fourier Transform—Examples
[Figure: transform pair plotted in time and in frequency, with cutoff $\Omega$ marked]
Time: $2\Omega\, \frac{\sin(2\pi \Omega t)}{2\pi \Omega t} = 2\Omega\, \mathrm{sinc}(2\Omega t)$; Frequency: $\mathrm{rect}(\omega / (2\Omega))$
Fourier Transform—Sampling Theorem
[Figure: band-limited spectrum with cutoff $\Omega$; sample spacing $1/(2\Omega)$]
$\hat{f}(\omega) = 0$ for $|\omega| > \Omega$
$f(t) = \sum_{n=-\infty}^{\infty} \mathrm{sinc}(2\Omega t - n)\, f\left(\frac{n}{2\Omega}\right)$
Fourier Transform—Sampling Theorem
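$\hat{f}(\omega) = 0$ for $|\omega| > \Omega$, $f(t) = \sum_{n=-\infty}^{\infty} \mathrm{sinc}(2\Omega t - n)\, f\left(\frac{n}{2\Omega}\right)$
$t_n = \frac{n}{2\Omega}$, $f(t) = \sum_{n=-\infty}^{\infty} \mathrm{sinc}(2\Omega (t - t_n))\, f(t_n)$
Expand $\hat{f}(\omega)$ into a Fourier series:
$\hat{f}(\omega) = \frac{1}{2\Omega} \sum_n c_n\, e^{-j 2\pi \omega t_n}$, $c_n = \int_{-\Omega}^{\Omega} \hat{f}(\omega)\, e^{j 2\pi \omega t_n}\, d\omega = \int_{-\infty}^{\infty} \hat{f}(\omega)\, e^{j 2\pi \omega t_n}\, d\omega = f(t_n)$ (inverse FT)
$f(t) = \int_{-\infty}^{\infty} \hat{f}(\omega)\, e^{j 2\pi \omega t}\, d\omega = \int_{-\Omega}^{\Omega} \hat{f}(\omega)\, e^{j 2\pi \omega t}\, d\omega$
$= \int_{-\Omega}^{\Omega} \frac{1}{2\Omega} \sum_n f(t_n)\, e^{j 2\pi \omega (t - t_n)}\, d\omega = \sum_n f(t_n)\, \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} e^{j 2\pi \omega (t - t_n)}\, d\omega$ (inverse FT of rect function)
$= \sum_{n=-\infty}^{\infty} \mathrm{sinc}(2\Omega (t - t_n))\, f(t_n)$ (basis of bandlimited functions)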
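The reconstruction formula can be tried on a simple band-limited signal. The sketch below assumes $f(t) = \mathrm{sinc}^2(t)$, whose spectrum is a triangle that vanishes for $|\omega| > 1$, and truncates the infinite sum to a finite range of $n$; the test signal, the truncation range and the function names are illustrative choices.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, omega_max, t):
    """Evaluate sum_n sinc(2*Omega*t - n) * f(n / (2*Omega)) over the stored samples."""
    return sum(sinc(2 * omega_max * t - n) * fn for n, fn in samples.items())

def f(t):
    """Test signal with spectrum limited to |omega| <= 1 (assumed for this example)."""
    return sinc(t) ** 2

OMEGA = 1.0
# Samples at t_n = n / (2*Omega); the infinite sum is truncated to |n| <= 500.
samples = {n: f(n / (2 * OMEGA)) for n in range(-500, 501)}

t_test = 0.3  # a point between two sample positions
approx = reconstruct(samples, OMEGA, t_test)
print(approx, f(t_test))
```

The two printed values should agree closely; the small remaining difference comes from truncating the sum.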
Fourier Series—Discrete Signals
Discrete Fourier Transform (DFT)
Discrete, periodic signal
[Figure: discrete periodic signal and its discrete periodic spectrum]
$c_k = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi k n / N}$
$f(n) = \frac{1}{N} \sum_{k=0}^{N-1} c_k\, e^{j 2\pi k n / N}$
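A direct transcription of the two DFT sums is sketched below. It needs $O(N^2)$ operations and is only meant to make the definitions concrete; the function names and the sample signal are assumptions for the example.

```python
import cmath
import math

def dft(f):
    """c_k = sum_{n=0}^{N-1} f(n) exp(-j 2 pi k n / N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(c):
    """f(n) = (1/N) sum_{k=0}^{N-1} c_k exp(j 2 pi k n / N)."""
    N = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

signal = [1.0, 2.0, 0.0, -1.0]
coeffs = dft(signal)
restored = idft(coeffs)
print([complex(round(c.real, 6), round(c.imag, 6)) for c in coeffs])
```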
Fourier Transform—Summary
Time -> Transform -> Frequency
Continuous, periodic -> FS -> Discrete
Continuous -> FT -> Continuous
Discrete, periodic -> DFT -> Discrete, periodic
Discrete Fourier Transform—Fast Fourier Transform
Decimation in Space
$c_k = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi k n / N}$ ($O(N^2)$)
$= \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi k (2n) / N} + \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi k (2n+1) / N}$
$= \underbrace{\sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi k n / (N/2)}}_{c_k^e} + \underbrace{e^{-j 2\pi k / N}}_{W_k} \underbrace{\sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi k n / (N/2)}}_{c_k^o}$
$e^{-j 2\pi k / N} = -e^{-j 2\pi (k + N/2) / N} \Leftrightarrow W_k = -W_{k+N/2}$
$\sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi k n / (N/2)} = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi (k + N/2) n / (N/2)} \Leftrightarrow c_k^e = c_{k+N/2}^e$
$\sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi k n / (N/2)} = \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi (k + N/2) n / (N/2)} \Leftrightarrow c_k^o = c_{k+N/2}^o$
Discrete Fourier Transform— Fast Fourier Transform
$W_k = -W_{k+N/2}$, $c_k^e = c_{k+N/2}^e$, $c_k^o = c_{k+N/2}^o$
$c_k = c_k^e + W_k c_k^o$, $c_{k+N/2} = c_k^e - W_k c_k^o$, for $0 \le k < N/2$
Recursion:
$c_k = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi k n / N}$
$c_k^e = \sum_{n=0}^{N'-1} f(2n)\, e^{-j 2\pi k n / N'}$
$c_k^o = \sum_{n=0}^{N'-1} f(2n+1)\, e^{-j 2\pi k n / N'}$
$N' = N/2$, $O((N/2)^2)$
$\hat{c}_0 = \hat{c}_0^e + \hat{c}_0^o$, $\hat{c}_1 = \hat{c}_0^e - \hat{c}_0^o$
Complexity:
$\frac{M}{2}\, \mathrm{ld}(M)$ multiplications
$M\, \mathrm{ld}(M)$ summations
Discrete Fourier Transform—Image Analysis
[Figure: image with axes $x$ and $y$. Courtesy of Professors Tomaso Poggio and Sayan Mukherjee. Used with permission.]
$c(k_x, k_y) = \frac{1}{MN} \sum_{y=0}^{M-1} \sum_{x=0}^{N-1} f(x, y)\, e^{-j 2\pi k_x x / N}\, e^{-j 2\pi k_y y / M}$
$= \frac{1}{MN} \sum_{y=0}^{M-1} \left[ \sum_{x=0}^{N-1} f(x, y)\, e^{-j 2\pi k_x x / N} \right] e^{-j 2\pi k_y y / M}$
M 1-D DFTs (rows): $c(k_x, y) = \sum_{x=0}^{N-1} f(x, y)\, e^{-j 2\pi k_x x / N}$
N 1-D DFTs (columns): $c(k_x, k_y) = \frac{1}{MN} \sum_{y=0}^{M-1} c(k_x, y)\, e^{-j 2\pi k_y y / M}$
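The row-column decomposition can be sketched with plain lists: first one 1-D DFT per row, then one 1-D DFT per column of the intermediate result, with the $1/(MN)$ factor applied at the end. The small test image and the function names are assumptions for illustration; the direct double sum is included only to check the result.

```python
import cmath
import math

def dft1d(v):
    """Unnormalized 1-D DFT of a list."""
    n = len(v)
    return [sum(v[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def dft2d_separable(img):
    """2-D DFT of img[y][x] via M row transforms followed by N column transforms."""
    M, N = len(img), len(img[0])
    rows = [dft1d(row) for row in img]  # c(k_x, y) for each row y
    cols = [dft1d([rows[y][kx] for y in range(M)]) for kx in range(N)]
    return [[cols[kx][ky] / (M * N) for kx in range(N)] for ky in range(M)]

def dft2d_direct(img):
    """Reference: the double sum evaluated directly."""
    M, N = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (kx * x / N + ky * y / M))
                 for y in range(M) for x in range(N)) / (M * N)
             for kx in range(N)] for ky in range(M)]

image = [[1.0, 2.0, 0.0, 4.0],
         [0.0, 3.0, 1.0, 1.0],
         [2.0, 2.0, 5.0, 0.0]]  # M = 3 rows, N = 4 columns
fast = dft2d_separable(image)
slow = dft2d_direct(image)
print(fast[0][0])  # the (0, 0) coefficient is the mean gray value
```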
Discrete Fourier Transform—Image Analysis
Example
[Figure: image over $(x, y)$ and its spectrum over $(k_x, k_y)$]
Discrete Fourier Transform—Image Analysis
[Figure: images (A) and (B) over $(x, y)$ and their spectra over $(\omega_x, \omega_y)$, with the contributions of vertical and horizontal edges marked in the spectrum; a further panel combines the amplitude of (A) with the phase of (B). Courtesy of Professors Tomaso Poggio and Sayan Mukherjee. Used with permission.]
Discrete Fourier Transform—Image Analysis
Template Matching
$\mathrm{corr}(x) = \int_{-\infty}^{\infty} f(x')\, g(x' - x)\, dx'$, $g'(x) = g(-x)$
$\mathrm{corr}(x) = f * g' = \int_{-\infty}^{\infty} f(x')\, g'(x - x')\, dx'$, with Fourier transform $\hat{f}(\omega)\, \hat{g}'(\omega)$
[Figure: $f(x)$, $g(x)$ and $\mathrm{corr}(x)$]
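A discrete, circular version of this template matching can be sketched with the DFT: for a real template $g$, the transform of $g'(x) = g(-x)$ is the complex conjugate of $\hat{g}$, so the correlation is the inverse DFT of $\hat{f}\, \overline{\hat{g}}$. The signal, the template and the helper names below are made up for the example, and a naive DFT stands in for an FFT.

```python
import cmath
import math

def dft(v, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(v)
    return [sum(v[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(c):
    return [value / len(c) for value in dft(c, sign=+1)]

def correlate_via_dft(f, g):
    """Circular correlation corr(x) = sum_x' f(x') g(x' - x) for real f and g."""
    F, G = dft(f), dft(g)
    product = [Fk * Gk.conjugate() for Fk, Gk in zip(F, G)]
    return [value.real for value in idft(product)]

def correlate_direct(f, g):
    """Reference: the correlation sum evaluated directly (indices taken modulo n)."""
    n = len(f)
    return [sum(f[xp] * g[(xp - x) % n] for xp in range(n)) for x in range(n)]

template = [1.0, 3.0, 2.0]
f = [0.0] * 16
f[5:8] = template                 # the template is hidden at offset 5
g = template + [0.0] * 13         # zero-padded template

corr = correlate_via_dft(f, g)
best = max(range(len(corr)), key=lambda x: corr[x])
print(best, round(corr[best], 6))
```

The peak of the correlation marks the offset at which the template matches best.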
Discrete Fourier Transform—Image Analysis
Shape description with Fourier descriptors
[Figure: closed contour in the $(x, y)$ plane, described by radius $r$ and angle $\varphi$]
Invariance:
•Scale
•Rotation
•Translation
Discrete Fourier Transform—Image Analysis
Shape description with Fourier descriptors
[Figure: contours drawn in the complex plane (axes Re, Im) and parameterized by $t$]
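One common way to obtain the three invariances from a contour written as a complex sequence $z(n) = x(n) + j y(n)$ is sketched below: drop $c_0$ (translation), divide by $|c_1|$ (scale) and keep only magnitudes (rotation). This particular normalization is an assumption for illustration and not necessarily the one used in the lecture; the test contour is made up.

```python
import cmath
import math

def dft(z):
    """Naive DFT of a complex sequence."""
    n = len(z)
    return [sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def fourier_descriptors(contour):
    """Magnitudes |c_k| / |c_1| for k = 2 .. N-1 (c_0 is discarded)."""
    c = dft(contour)
    scale = abs(c[1])
    return [abs(c[k]) / scale for k in range(2, len(c))]

N = 32
angles = [2 * math.pi * n / N for n in range(N)]
# Closed test contour, traversed counter-clockwise.
shape = [complex(math.cos(t) + 0.2 * math.cos(3 * t), 0.5 * math.sin(t)) for t in angles]
# The same contour scaled by 2.5, rotated by 0.7 rad and shifted by 3 - 2j.
moved = [2.5 * cmath.exp(0.7j) * z + (3 - 2j) for z in shape]

d_shape = fourier_descriptors(shape)
d_moved = fourier_descriptors(moved)
print(max(abs(a - b) for a, b in zip(d_shape, d_moved)))  # should be close to zero
```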
Windowed Fourier Transform—Motivation
Fourier transform of a chirp signal
$\cos(\pi t^2)$, $\omega_{inst} = t$
Windowed Fourier Transform
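Windowed Fourier Transform (WFT)
$\tilde{f}(\omega, t) = \int_{-\infty}^{\infty} f(u)\, \overline{g(u - t)}\, e^{-j 2\pi \omega u}\, du$, $\mathrm{supp}\, g \subset [-T, 0]$
$f_t(u) = f(u)\, \overline{g(u - t)}$, $\mathrm{supp}\, f_t \subset [t - T, t]$
Fourier transform: $\tilde{f}(\omega, t) = \int_{-\infty}^{\infty} f_t(u)\, e^{-j 2\pi \omega u}\, du$
[Figure: signal $f(u)$ and window $g(u - t)$ covering the interval from $t - T$ to $t$ on the $u$ axis]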
Windowed Fourier Transform—Time Frequency Symmetry
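$\tilde{f}(\omega, t) = \int_{-\infty}^{\infty} f(u)\, \overline{g(u - t)}\, e^{-j 2\pi \omega u}\, du$
Write $g_{\omega,t}(u) = g(u - t)\, e^{j 2\pi \omega u}$:
$\tilde{f}(\omega, t) = \int_{-\infty}^{\infty} \overline{g_{\omega,t}(u)}\, f(u)\, du = \langle g_{\omega,t}, f \rangle = \langle \hat{g}_{\omega,t}, \hat{f} \rangle$ (Parseval's identity)
$\hat{g}_{\omega,t}(\nu) = \int_{-\infty}^{\infty} g(u - t)\, e^{j 2\pi \omega u}\, e^{-j 2\pi \nu u}\, du = \int_{-\infty}^{\infty} g(u - t)\, e^{-j 2\pi u (\nu - \omega)}\, du$
Substitute $u' = u - t$:
$\int_{-\infty}^{\infty} g(u')\, e^{-j 2\pi (u' + t)(\nu - \omega)}\, du' = e^{-j 2\pi t (\nu - \omega)}\, \hat{g}(\nu - \omega)$
$\tilde{f}(\omega, t) = e^{-j 2\pi \omega t} \int_{-\infty}^{\infty} \overline{\hat{g}(\nu - \omega)}\, \hat{f}(\nu)\, e^{j 2\pi \nu t}\, d\nu$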
Windowed Fourier Transform—Time Frequency Localization
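Time Frequency Symmetry
$\tilde{f}(\omega, t) = \int_{-\infty}^{\infty} f(u)\, \overline{g(u - t)}\, e^{-j 2\pi \omega u}\, du$
$\tilde{f}(\omega, t) = e^{-j 2\pi \omega t} \int_{-\infty}^{\infty} \overline{\hat{g}(\nu - \omega)}\, \hat{f}(\nu)\, e^{j 2\pi \nu t}\, d\nu$
$\mathrm{supp}\, g \subset [-T/2, T/2]$
[Figure: in time, $f(u)$ and the window $g(u - t_0)$ give $\tilde{f}(\omega, t_0)$ as a function of $\omega$; in frequency, $\hat{f}(\nu)$ and the window $\hat{g}(\nu - \omega_0)$ give $\tilde{f}(\omega_0, t)$ as a function of $t$]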
Windowed Fourier Transform—Time Frequency Localization
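Time Frequency Localization
$\|g(t)\|^2 = 1$, $\|\hat{g}(\omega)\|^2 = 1$
$t_m = \int_{-\infty}^{\infty} t\, |g(t)|^2\, dt$, $\omega_m = \int_{-\infty}^{\infty} \omega\, |\hat{g}(\omega)|^2\, d\omega$
$\sigma_t^2 = \int_{-\infty}^{\infty} (t - t_m)^2\, |g(t)|^2\, dt$, $\sigma_\omega^2 = \int_{-\infty}^{\infty} (\omega - \omega_m)^2\, |\hat{g}(\omega)|^2\, d\omega$
Heisenberg's uncertainty principle: $4\pi \sigma_\omega \sigma_t \ge 1$
$g(t) = (2a)^{1/4}\, e^{-\pi a t^2}$, $\hat{g}(\omega) = (2/a)^{1/4}\, e^{-\pi \omega^2 / a}$
$t_m = \omega_m = 0$, $\sigma_t = \frac{1}{\sqrt{4\pi a}}$, $\sigma_\omega = \sqrt{\frac{a}{4\pi}}$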
Windowed Fourier Transform—Time Frequency Localization
Uncertainty Principle
[Figure: two signals shown in time and in frequency; axis labels $T$, $-T$ in time and $1/T$, $-1/T$ in frequency]
Windowed Fourier Transform—Time Frequency Localization
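Time Frequency Localization
$\tilde{f}(\omega_0, t_0)$ describes $f$ within a resolution cell: $[t_0 \pm \sigma_t][\omega_0 \pm \sigma_\omega]$
Uncertainty principle: $\sigma_t \sigma_\omega \ge \frac{1}{4\pi}$
[Figure: resolution cells of size $\sigma_t \times \sigma_\omega$ in the $(t, \omega)$ plane, one of them centered at $(t_0, \omega_0)$]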
Windowed Fourier—Reconstruction
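Reconstruction Formula
$f(u) = \frac{1}{C} \iint g_{\omega,t}(u)\, \tilde{f}(\omega, t)\, d\omega\, dt$, $C = \|g\|^2$
$f_t(u) = f(u)\, \overline{g(u - t)}$
$\tilde{f}(\omega, t) = \int_{-\infty}^{\infty} f_t(u)\, e^{-j 2\pi \omega u}\, du$
$\overline{g(u - t)}\, f(u) = \int_{-\infty}^{\infty} \tilde{f}(\omega, t)\, e^{j 2\pi \omega u}\, d\omega$
$f(u) \underbrace{\int_{-\infty}^{\infty} |g(u - t)|^2\, dt}_{C} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \tilde{f}(\omega, t)\, \underbrace{g(u - t)\, e^{j 2\pi \omega u}}_{g_{\omega,t}(u)}\, d\omega\, dt$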
Windowed Fourier—Reconstruction
Reconstruction Formula
[Figure: signal $f(u)$ over $u$ and its reconstruction from the $(t, \omega)$ plane]
Windowed Fourier—Redundancy
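Redundancy
$f(u) = \frac{1}{C} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \tilde{f}(\omega, t)\, \underbrace{g(u - t)\, e^{j 2\pi \omega u}}_{g_{\omega,t}(u)}\, d\omega\, dt$
Parseval for WFT
$\tilde{f}(\omega, t) = \langle g_{\omega,t}(u), f(u) \rangle_{L^2(\mathbb{R})} = \frac{1}{C} \langle \tilde{g}_{\omega,t}(\omega', t'), \tilde{f}(\omega', t') \rangle_{L^2(\mathbb{R}^2)}$
$\overline{\tilde{g}_{\omega,t}(\omega', t')} = \langle g_{\omega,t}(u), g_{\omega',t'}(u) \rangle = K_g(\omega, t \,|\, \omega', t') = \int_{-\infty}^{\infty} \overline{g_{\omega,t}(u)}\, g_{\omega',t'}(u)\, du$
$\tilde{f}(\omega, t) = \frac{1}{C} \iint K_g(\omega, t \,|\, \omega', t')\, \tilde{f}(\omega', t')\, dt'\, d\omega'$, with $K_g(\omega, t \,|\, \omega', t') = \overline{K_g(\omega', t' \,|\, \omega, t)}$
-Values of $\tilde{f}$ are correlated
-Not every function $\tilde{f}(\omega, t) \in L^2$ can be a WFT
-$\tilde{f}(\omega, t) \in F$, $F \subset L^2$ is a reproducing kernel Hilbert space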
Windowed Fourier Transform—Sampling Theorem
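Sampling Theorem
$\tilde{f}(n, m) = \int_{-\infty}^{\infty} f(u)\, \overline{g(u - n t_s)}\, e^{-j 2\pi m \omega_s u}\, du$
Reconstruction if $\omega_s t_s < 1$
[Figure: sampling grid in the $(t, \omega)$ plane with spacing $t_s$ in time and $\omega_s$ in frequency]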
Windowed Fourier Transform—Examples
$\cos(\pi u^2)$, $\omega_{inst} = u$
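The chirp example can be reproduced numerically: for $\cos(\pi u^2)$ the magnitude of the windowed Fourier transform at time $t$ should peak near $\omega = t$. The sketch below assumes a real, symmetric Gaussian window (rather than a window supported on $[-T, 0]$) and evaluates the WFT integral by a plain Riemann sum; step sizes and function names are illustrative.

```python
import cmath
import math

def wft(f, g, omega, t, half_width=3.0, du=0.01):
    """Riemann sum for the integral of f(u) g(u - t) exp(-j 2 pi omega u) du (real window g)."""
    n = int(round(2 * half_width / du))
    total = 0j
    for i in range(n + 1):
        u = t - half_width + i * du
        total += f(u) * g(u - t) * cmath.exp(-2j * math.pi * omega * u)
    return total * du

def chirp(u):
    return math.cos(math.pi * u * u)

def window(u):
    """Gaussian window, an assumption for this example."""
    return math.exp(-math.pi * u * u)

def ridge_frequency(t, omegas):
    """Frequency on the grid where |WFT| is largest at time t."""
    return max(omegas, key=lambda w: abs(wft(chirp, window, w, t)))

omegas = [0.05 * i for i in range(401)]  # 0 .. 20
ridges = {t0: ridge_frequency(t0, omegas) for t0 in (5.0, 10.0, 15.0)}
print(ridges)  # each ridge frequency should lie close to its time t0
```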
Time Scale Analysis—Motivation
[Figure: signal $f(t)$ and its reconstruction from the $(t, \omega)$ plane with fixed window width $\sigma_t$]
Features with time scales much shorter or longer than $\sigma_t$
have to be synthesized with many notes.
Time Scale Analysis—Continuous Wavelet Transform
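$\psi_{s,t}(u) = \frac{1}{|s|^p}\, \psi\left(\frac{u - t}{s}\right)$, $\psi \in L^2(\mathbb{R})$, $s \neq 0$, $p > 0$
$\tilde{f}(s, t) = \int_{-\infty}^{\infty} \overline{\psi_{s,t}(u)}\, f(u)\, du = \langle \psi_{s,t}, f \rangle$, $f \in L^2$
Example: $\psi(u) = u\, e^{-u^2}$
[Figure: $\psi_{-0.5,-10}$ (red), $\psi_{1,0}$ (green), $\psi_{2,15}$ (blue)]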
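The definition can be evaluated numerically for the example wavelet $\psi(u) = u\, e^{-u^2}$. The sketch below approximates the inner product by a midpoint sum over an interval around $t$; the choice $p = 0.5$, the integration range and the function names are assumptions for illustration.

```python
import math

P = 0.5  # normalization exponent p (assumed value)

def psi(u):
    """Mother wavelet from the example: u * exp(-u^2)."""
    return u * math.exp(-u * u)

def psi_st(u, s, t, p=P):
    """Dilated and translated wavelet |s|^(-p) psi((u - t) / s)."""
    return abs(s) ** (-p) * psi((u - t) / s)

def cwt(f, s, t, steps=4000, reach=6.0):
    """Midpoint sum for the integral of psi_st(u) f(u) du (psi is real, so no conjugate is needed)."""
    lo, hi = t - reach * abs(s), t + reach * abs(s)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        total += psi_st(u, s, t) * f(u)
    return total * h

constant_response = cwt(lambda u: 1.0, 2.0, 15.0)  # zero-mean wavelet: should vanish
self_response = cwt(psi, 1.0, 0.0)                 # equals the squared norm of psi
flipped_response = cwt(psi, -1.0, 0.0)             # negative scale mirrors the odd wavelet
print(constant_response, self_response, flipped_response)
```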
Wavelet Transform—Reconstruction
Reconstruction Formula
Admissibility condition: $0 < C_\pm = \int_0^\infty \frac{|\hat{\psi}(\pm\omega)|^2}{\omega}\, d\omega < \infty$
$f(u) = \int_0^\infty \int_{-\infty}^{\infty} \psi^{s,t}(u)\, \tilde{f}(s, t)\, s^{2p-3}\, dt\, ds$
$\{\psi^{s,t}\}$ is the reciprocal wavelet family of $\{\psi_{s,t}\}$
If $C_- = C_+ = \frac{C}{2}$ then $\psi^{s,t} = \frac{2}{C}\, \psi_{s,t}$, and $\{\psi_{s,t}\}$ is self reciprocal:
$f(u) = \frac{2}{C} \int_0^\infty \int_{-\infty}^{\infty} \psi_{s,t}(u)\, \tilde{f}(s, t)\, s^{2p-3}\, dt\, ds$
Wavelet Transform—Reconstruction
Admissibility condition:
$0 < C_\pm = \int_0^\infty \frac{|\hat{\psi}(\pm\omega)|^2}{\omega}\, d\omega < \infty$
requires $\hat{\psi}(0) = 0$, i.e. $\int_{-\infty}^{\infty} \psi(u)\, du = 0$
Symmetry condition: $C_- = C_+$
if $\psi$ is symmetric, $\hat{\psi}$ is symmetric
if $\psi$ is real-valued, $\hat{\psi}(-\omega) = \overline{\hat{\psi}(\omega)}$
Wavelet Transform—Plancherel&Parseval
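Plancherel Formula
$\langle \tilde{f}, \tilde{f} \rangle_L = \frac{1}{C_+ + C_-} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |s|^{2p-3}\, |\tilde{f}(s, t)|^2\, ds\, dt$
Parseval Identity
$\langle f, g \rangle_{L^2(\mathbb{R})} = \langle \tilde{f}, \tilde{g} \rangle_L$ for all $f, g \in L^2(\mathbb{R})$
$\langle \tilde{f}, \tilde{g} \rangle_L = \frac{1}{C_+ + C_-} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |s|^{2p-3}\, \overline{\tilde{f}(s, t)}\, \tilde{g}(s, t)\, ds\, dt$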
Wavelet Transform—Time Frequency Symmetry
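$p = 0.5$, $s > 0$:
$\psi_{s,t}(u) = \frac{1}{\sqrt{s}}\, \psi\left(\frac{u - t}{s}\right)$
$\hat{\psi}_{s,t}(\omega) = \sqrt{s}\, e^{-j 2\pi \omega t}\, \hat{\psi}(s\omega)$
$\tilde{f}(s, t) = \langle \psi_{s,t}, f \rangle = \langle \hat{\psi}_{s,t}, \hat{f} \rangle = \int_{-\infty}^{\infty} \overline{\hat{\psi}_{s,t}(\omega)}\, \hat{f}(\omega)\, d\omega$
$\tilde{f}(s, t) = \sqrt{s} \int_{-\infty}^{\infty} \overline{\hat{\psi}(s\omega)}\, \hat{f}(\omega)\, e^{j 2\pi \omega t}\, d\omega$
$\tilde{f}(s, t) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} \overline{\psi\left(\frac{u - t}{s}\right)}\, f(u)\, du$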
Wavelet Transform—Time Frequency Localization
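Time Frequency Localization
$\psi(u)$ is centered at $t_0$ with $\sigma_t$, $\hat{\psi}(\omega)$ is centered at $\omega_0$ with $\sigma_\omega$
$\tilde{f}(s, t') = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} \overline{\psi\left(\frac{u - t'}{s}\right)}\, f(u)\, du$
time window centered at $s t_0 + t'$ with $s \sigma_t$
$\tilde{f}(s, t') = \sqrt{s} \int_{-\infty}^{\infty} \overline{\hat{\psi}(s\omega)}\, \hat{f}(\omega)\, e^{j 2\pi \omega t'}\, d\omega$
frequency window centered at $\omega_0 / s$ with $\sigma_\omega / s$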
Wavelet Transform—Time Frequency Localization
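Time Frequency Localization
[Figure: resolution cells in the $(t, \omega)$ plane for $s = 1/2, 1, 2$, centered at $2\omega_0$, $\omega_0$, $\omega_0/2$, with sizes $0.5\sigma_t \times 2\sigma_\omega$, $\sigma_t \times \sigma_\omega$ and $2\sigma_t \times 0.5\sigma_\omega$]
The size of the resolution cells is constant: $\sigma_t \sigma_\omega$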
Wavelet Transform—Redundancy
Redundancy
$\langle f_1, f_2 \rangle_{L^2} = \langle \tilde{f}_1, \tilde{f}_2 \rangle_L$
$\tilde{f}(s, t) = \langle \psi_{s,t}, f \rangle_{L^2} = \langle \tilde{\psi}_{s,t}, \tilde{f} \rangle_L$
$\tilde{\psi}_{s',t'} = \langle \psi_{s,t}, \psi_{s',t'} \rangle_{L^2} = K_\psi(s', t' \,|\, s, t)$
$\tilde{f}(s, t) = \iint s'^{\,2p-3}\, K_\psi(s', t' \,|\, s, t)\, \tilde{f}(s', t')\, ds'\, dt'$
-Values of $\tilde{f}$ are correlated
-Not every function $\tilde{f}(s, t) \in L^2$ can be a WT
-$\tilde{f}(s, t) \in L$, $L \subset L^2$ is a reproducing kernel Hilbert space
Wavelet Transform—Frames
Notion of Frames
$\tilde{f}(s, t) = \langle \psi_{s,t}, f \rangle$
[Figure: $T$ maps $x \in X$, shown with the vectors $u_1, u_2, u_3$, to $Y$ with basis vectors $e_1, e_2, e_3$]
Hilbert space $X$, $\dim X = N$
$\{u_1, \dots, u_M\} \in X$, $M > N$
Hilbert space $Y$, $\dim Y = M$
$y = Tx$, $Tx = \sum_j \langle x, u_j \rangle\, e_j$, $T : X \to Y$
$\langle x, T^* y \rangle = \langle Tx, y \rangle$, $T^* : Y \to X$
Unlike a basis, the frame vectors $\{u_1, \dots, u_M\}$ can be linearly dependent
Wavelet Transform—Frames
If $x \in X$ is uniquely determined by $Tx \in Y$,
then $\{u_1, \dots, u_M\} \in X$ is a frame of $X$ ($T$ is injective)
$\{u_1, \dots, u_M\} \in X$ is a frame of $X$ if:
$A \|x\|^2 \le \|Tx\|^2 \le B \|x\|^2$ $\forall x \in X$, $B \ge A > 0$
Reconstruction of $x$ from $y = Tx$:
$x = Sy = (T^* T)^{-1} T^* y$
$G = T^* T$ is self-adjoint → eigenvalues are real
Frames are the framework for continuous and discrete wavelets
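The reconstruction formula $x = (T^* T)^{-1} T^* y$ can be illustrated with a small real example: three unit vectors at 120 degree spacing in the plane, so $N = 2$ and $M = 3$. The choice of frame and the function names are assumptions for this sketch.

```python
import math

# Three unit vectors in R^2, 120 degrees apart (linearly dependent, M = 3 > N = 2).
ANGLES = (math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3)
FRAME = [(math.cos(a), math.sin(a)) for a in ANGLES]

def analysis(x):
    """y = T x with components y_j = <x, u_j>."""
    return [x[0] * u[0] + x[1] * u[1] for u in FRAME]

def synthesis(y):
    """x = (T* T)^-1 T* y, using the explicit inverse of the 2x2 matrix G = T* T."""
    g00 = sum(u[0] * u[0] for u in FRAME)
    g01 = sum(u[0] * u[1] for u in FRAME)
    g11 = sum(u[1] * u[1] for u in FRAME)
    det = g00 * g11 - g01 * g01
    b0 = sum(yj * u[0] for yj, u in zip(y, FRAME))  # first component of T* y
    b1 = sum(yj * u[1] for yj, u in zip(y, FRAME))  # second component of T* y
    return ((g11 * b0 - g01 * b1) / det, (g00 * b1 - g01 * b0) / det)

x = (0.3, -1.2)
y = analysis(x)
x_rec = synthesis(y)
energy_ratio = sum(v * v for v in y) / (x[0] ** 2 + x[1] ** 2)
print(x_rec, energy_ratio)  # for this frame the ratio is 1.5, so A = B = 3/2
```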
Wavelet Transform—Discrete Wavelets
$\tilde{f}(s, t) = \iint s'^{\,2p-3}\, K_\psi(s', t' \,|\, s, t)\, \tilde{f}(s', t')\, ds'\, dt'$
Redundancy: Discrete wavelets
$\psi(s, t)$, $s, t \in \mathbb{R}$ to $\psi(a, b)$, $a, b \in \mathbb{Z}$
Exponential sampling
$s(a) = \sigma^a$, $\sigma > 1$, elementary dilation step
$t(a, b) = b\, \sigma^a \beta$, $\beta > 0$
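Wavelet Transform—Discrete Wavelets
Sampling of the s, t plane
[Figure: sampling grid in the $(t, s)$ plane; rows at $s = \sigma^0, \sigma^1, \dots, \sigma^x$ for $a = 0, 1, \dots, x$, with samples $b = 1, 2, \dots$ along $t$]
$s(a) = \sigma^a$, $\sigma > 1$
$t(a, b) = b\, \sigma^a \beta$, $\beta > 0$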
Wavelet Transform—Multiresolution Analysis
Multiresolution analysis of discrete signals $f(n)$, $n \in \mathbb{Z}$
Sampling $\tilde{f}(s, t)$ on a dyadic grid, orthogonal wavelets: $\sigma = 2$, $\beta = 1$: $s = 2^a$, $t = b\, 2^a$, $a, b \in \mathbb{Z}$
Two half band filters with impulse responses $h(n)$, $g(n)$
$f_H(n) = f(n) * h(n) = \sum_k f(k)\, h(n - k)$
$f_L(n) = f(n) * g(n) = \sum_k f(k)\, g(n - k)$
Downsampling: $f_{H^*}(n) = f_H(2n)$, $f_{L^*}(n) = f_L(2n)$
Wavelet Transform—Multiresolution Analysis
[Figure: two-channel filter bank. Analysis: $f(n)$ is filtered with $g(n)$ and $h(n)$, each followed by downsampling by 2, giving $f_{L^*,l}(n)$ and the wavelet coefficients $f_{H^*,l}(n)$. Reconstruction: upsampling by 2, filtering with $g^*(n)$ and $h^*(n)$, and summation give $f_{L,l}(n)$.]
$g(n)$, $h(n)$ form a quadrature mirror filter pair
Wavelet Transform—Multiresolution Analysis
Haar Wavelets
[Figure: Haar filters $h$ and $g$ plotted over $t$]
The Haar wavelets $h$ are orthonormal and easy to compute, but have poor frequency resolution
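One analysis and reconstruction step of a Haar filter bank can be sketched as below. Following the notation of the multiresolution slides, the low-pass branch corresponds to $g$ and the high-pass branch to $h$. The sketch pairs samples $(f(2n), f(2n+1))$, which differs from the convolution form by a one-sample shift, and it assumes an even signal length; the function names are illustrative.

```python
import math

ROOT2 = math.sqrt(2.0)

def haar_analysis(f):
    """Split f (even length) into downsampled low-pass and high-pass coefficients."""
    half = len(f) // 2
    low = [(f[2 * n] + f[2 * n + 1]) / ROOT2 for n in range(half)]   # averages (g branch)
    high = [(f[2 * n] - f[2 * n + 1]) / ROOT2 for n in range(half)]  # differences (h branch)
    return low, high

def haar_synthesis(low, high):
    """Invert haar_analysis: upsample and recombine the two branches."""
    out = []
    for a, d in zip(low, high):
        out.append((a + d) / ROOT2)
        out.append((a - d) / ROOT2)
    return out

signal = [4.0, 4.0, 2.0, 0.0, 1.0, 3.0, 5.0, 5.0]
low, high = haar_analysis(signal)
low2, high2 = haar_analysis(low)  # second level: repeat on the low-pass branch
restored = haar_synthesis(haar_synthesis(low2, high2), high)
print(low, high)
```

Because the Haar filters are orthonormal, the coefficients carry the same energy as the signal and the reconstruction is exact up to rounding.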
Wavelet Transform—Multiresolution Analysis
Daubechies Wavelets
[Figure: filters $g(t)$ and $h(t)$ with signals $f(t)$, $f(n)$ and $f^*(n)$. Source: Matlab Documentation]
Wavelet Transform—Multiresolution Analysis
Haar Wavelets (Matlab Toolbox)
Screenshot from Matlab Toolbox removed due to copyright reasons.
Wavelet Transform—Applications
•Image Compression
•Texture Analysis
•De-noising
•Features for Object Detection and Recognition
My Face Detector in 2000
Off-line training: face examples and non-face examples are used to train the classifier (non-linear SVM)
Detection: search for faces at different resolutions and locations
Pixel pattern 19x19 → Feature Extraction → Feature vector (x1, x2 ,…, xn) → Classifier → Classification Result
5 min / image
Photographs by MIT OCW.
My Face Detector in 2000
My experiments with different types of features
Photographs courtesy of CMU/VASC Image Database at http://vasc.ri.cmu.edu/idb/html/face/frontal_images/
Gradient
AI Memo 2000
Speeding-up Face Detection
19x19
detection
Final polynomial classifier
17x17 polynomial classifier
Original image
3x3 classifier
9x9 classifier
17x17 linear classifier
30%
1%
7%
8%
22%
Propagated
70%
100%
Removed
Photograph courtesy of CMU/VASC Image Database at http://vasc.ri.cmu.edu/idb/html/face/frontal_images/
Hierarchical Face Detection—Heisele, CVPR 2001
Viola&Jones Detector—Viola&Jones, CVPR 2001
Speed is proportional to the average number of features
computed per sub-window.
On the MIT+CMU test set, an average of 9 features out
of a total of 6061 are computed per sub-window.
On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).
Roughly 15 times faster than Rowley-Baluja-Kanade
and 600 times faster than Schneiderman-Kanade.
Viola&Jones Detector—Image Features
Photo removed due to
copyright considerations.
“Rectangle filters”
Similar to Haar wavelets
Differences between sums
of pixels in adjacent
rectangles
$h_t(x) = \begin{cases} +1 & \text{if } f_t(x) > \theta_t \\ -1 & \text{otherwise} \end{cases}$
160,000 × 100 = 16,000,000 unique features
Series of photos and figures from Rapid object detection using a boosted cascade of simple features. Viola, P., and M. Jones.
Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference
1 (2001): I-511 - I-518. Graphs courtesy of IEEE. Copyright 2001 IEEE. Used with Permission.
Viola&Jones Detector
How exactly does it work??
Read the CVPR 2001 paper…….
or wait until the lecture on object detection by Mike Jones
Integral Image—Viola&Jones CVPR 2001
Define the Integral Image:
$I'(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y')$
Any rectangular sum can be computed in constant time:
$D = 1 + 4 - (2 + 3) = A + (A + B + C + D) - (A + C + A + B) = D$
Rectangle features can be computed as differences between rectangles
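The integral image and the four-lookup rectangle sum can be sketched as follows. The zero-padded layout, the two-rectangle feature and the threshold helper are illustrative assumptions and not the exact implementation from the CVPR 2001 paper.

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img[y'][x'] for y' < y and x' < x (one row/column of zero padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of img[y][x] for x0 <= x < x1 and y0 <= y < y1, using four lookups: 4 - 2 - 3 + 1."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def two_rect_feature(ii, x, y, w, h):
    """Difference between the sums of two horizontally adjacent w-by-h rectangles."""
    return rect_sum(ii, x, y, x + w, y + h) - rect_sum(ii, x + w, y, x + 2 * w, y + h)

def weak_classifier(feature_value, theta):
    """Thresholded feature: +1 if the feature exceeds theta, otherwise -1."""
    return 1 if feature_value > theta else -1

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ii = integral_image(image)
inner = rect_sum(ii, 1, 1, 3, 3)           # 6 + 7 + 10 + 11
feature = two_rect_feature(ii, 0, 0, 2, 4)  # left half minus right half
print(inner, feature, weak_classifier(feature, 0))
```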
Example Classifier for Face Detection—Viola&Jones CVPR 2001
A classifier with 200 rectangle features was learned using AdaBoost
95% correct detection on test set with 1 in 14084
false positives.
Not quite competitive...
Photographs removed due to
copyright considerations.
ROC curve for 200 feature classifier
Literature
Ingrid Daubechies “Ten Lectures on Wavelets”
For mathematicians only, many proofs
Gerald Kaiser “A Friendly Guide to Wavelets”
Not friendly, but easier to understand than above
Christian Blatter “Wavelets—A Primer”
more friendly than Kaiser…
Stephane Mallat “Multifrequency Channel Decomposition”
IEEE Acoustics Speech & Signal Proc. 1989
One of the early papers on multiresolution analysis
Homework
•Some proofs (optional)
•Template matching with Fourier transform
Shape representation with Fourier descriptors
(stick to instructions)
•Playing with wavelets