ch9 (Pattern Comparison ).ppt

PATTERN COMPARISON TECHNIQUES
Test Pattern:
$T = \{t_1, t_2, t_3, \ldots, t_I\}$,
Reference Pattern:
$R^j = \{r_1^j, r_2^j, \ldots, r_J^j\}$.
4.2 SPEECH (ENDPOINT) DETECTION
4.3 DISTORTION MEASURES: MATHEMATICAL CONSIDERATIONS
x and y: two feature vectors defined on a vector space X
The properties of a metric or distance function d:
(a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$;
(b) $d(x, y) = d(y, x)$ for $x, y \in X$;
(c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$.
A distance function is called invariant if
(d) $d(x + z, y + z) = d(x, y)$.
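Properties (a) to (d) can be checked numerically on a finite set of vectors. The sketch below is illustrative only: the helper names `euclidean` and `check_metric` are made up for this example, and the Euclidean distance is simply a convenient test case. The squared Euclidean distance is included to show a measure that fails the triangle inequality (c).

```python
import itertools
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def check_metric(d, points, tol=1e-12):
    """Check properties (a)-(c) of a distance function on a finite set of points."""
    for x, y in itertools.product(points, repeat=2):
        dxy = d(x, y)
        if dxy < 0 or not np.isfinite(dxy):
            return False                      # (a) 0 <= d < infinity
        if (dxy <= tol) != np.array_equal(x, y):
            return False                      # (a) d = 0 iff x = y
        if abs(dxy - d(y, x)) > tol:
            return False                      # (b) symmetry
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(y, z) + tol:
            return False                      # (c) triangle inequality
    return True

rng = np.random.default_rng(0)
pts = [rng.normal(size=3) for _ in range(6)]
z = rng.normal(size=3)

is_metric = check_metric(euclidean, pts)
# (d) invariance: translating both vectors leaves the distance unchanged
is_invariant = all(abs(euclidean(x + z, y + z) - euclidean(x, y)) < 1e-12
                   for x in pts for y in pts)

# Squared Euclidean distance satisfies (a) and (b) but not (c)
sq = lambda x, y: euclidean(x, y) ** 2
line = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
squared_is_metric = check_metric(sq, line)
```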
PERCEPTUAL CONSIDERATIONS
Spectral changes that do not fundamentally
change the perceived sound include:
Spectral changes that lead to
phonetically different sounds include:
Just-discriminable change:
known as JND (just-noticeable difference),
DL (difference limen), or differential threshold
4.4 DISTORTION MEASURES: PERCEPTUAL CONSIDERATIONS
Spectral Distortion Measures
Spectral Density
Fourier Coefficients of
Spectral Density
Autocorrelation Function
Short-term autocorrelation
Then $S(\omega)$ is an energy spectral density.
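The link between the short-term autocorrelation and the energy spectral density can be checked on a synthetic frame. The sketch below assumes the usual finite-frame definitions, $r(n) = \sum_m x(m)x(m+n)$ and $S(\omega) = \sum_n r(n)e^{-jn\omega}$, and compares the result with $|X(e^{j\omega})|^2$ computed by an FFT. The frame is random data and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=32)          # one short-time frame (synthetic, for illustration)
N = len(x)
nfft = 128                       # >= 2N - 1 so the autocorrelation does not alias

# Short-term autocorrelation r(n) = sum_m x(m) x(m+n), lags -(N-1)..(N-1)
r = np.correlate(x, x, mode="full")
lags = np.arange(-(N - 1), N)

# S(w) = sum_n r(n) exp(-j n w), evaluated on the FFT frequency grid
w = 2 * np.pi * np.arange(nfft) / nfft
S_from_r = (r[None, :] * np.exp(-1j * np.outer(w, lags))).sum(axis=1).real

# Energy spectral density computed directly as |X(e^{jw})|^2
S_direct = np.abs(np.fft.fft(x, nfft)) ** 2

# r(0) is the frame energy, which also equals the average of S over the grid
energy = float(np.sum(x ** 2))
```

Both routes give the same spectrum, and `r[N - 1]` (lag zero) equals the frame energy.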
Autocorrelation matrices
If σ/A(z) is the all-pole model for the speech spectrum,
the residual energy resulting from “inverse filtering”
the input signal with an all-zero filter A(z) is:
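As a numerical illustration of inverse filtering, the sketch below filters a synthetic frame with an all-zero filter and compares the energy of the output with the quadratic form $a^t R a$, where $R$ is the autocorrelation matrix of the frame. This quadratic form is one common way to write the residual energy (the notation $a^t R_p a$ appears later in these slides). The filter coefficients and the signal are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)                      # synthetic frame
p = 4
a = np.array([1.0, -0.9, 0.5, -0.2, 0.1])     # hypothetical A(z) coefficients, a_0 = 1

# Autocorrelation r(0..p) of the frame and the (p+1) x (p+1) Toeplitz matrix R
r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])
R = np.array([[r[abs(i - k)] for k in range(p + 1)] for i in range(p + 1)])

# Inverse filtering: e(n) = sum_k a_k x(n - k), using the full convolution
e = np.convolve(x, a)
alpha_time = float(np.sum(e ** 2))            # residual energy in the time domain
alpha_quad = float(a @ R @ a)                 # residual energy as a quadratic form
```

The two values agree because the full convolution treats the frame as zero outside its support, which matches the autocorrelation computed above.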
Important properties of all-pole modeling:
The recursive minimization relationship:
LOG SPECTRAL DISTANCE
CEPSTRAL DISTANCES
The complex cepstrum of a signal is defined as the Fourier transform of the log of the signal spectrum.
The Fourier series representation of $\log S(\omega)$ can be expressed as:
$\log S(\omega) = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}$
where $c_n = c_{-n}$ are real and referred to as the cepstral coefficients.
Note that: $c_0 = \int_{-\pi}^{\pi} \log S(\omega) \frac{d\omega}{2\pi}$
For a pair of spectra $S(\omega)$ and $S'(\omega)$, by applying Parseval's theorem, we can relate the $L_2$ cepstral distance of the spectra to the rms log spectral distance:
$d_2^2 = \int_{-\pi}^{\pi} |\log S(\omega) - \log S'(\omega)|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (c_n - c_n')^2$
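The Parseval relation between the log spectral distance and the cepstral distance has an exact discrete counterpart, which gives a quick sanity check. The sketch below assumes the spectra are sampled on a uniform FFT grid, so the integral becomes a mean and the cepstrum becomes an inverse DFT. The two spectra are synthetic and the function name `cepstrum` is illustrative.

```python
import numpy as np

def cepstrum(log_spectrum):
    """Cepstral coefficients as the inverse DFT of a sampled log power spectrum."""
    return np.fft.ifft(log_spectrum).real

rng = np.random.default_rng(3)
nfft = 512
# Two synthetic power spectra sampled on the full FFT grid (symmetric in w)
S1 = np.abs(np.fft.fft(rng.normal(size=16), nfft)) ** 2 + 1e-3
S2 = np.abs(np.fft.fft(rng.normal(size=16), nfft)) ** 2 + 1e-3

log_diff = np.log(S1) - np.log(S2)
d2_spectral = float(np.mean(log_diff ** 2))     # grid approximation of the integral

c1, c2 = cepstrum(np.log(S1)), cepstrum(np.log(S2))
d2_cepstral = float(np.sum((c1 - c2) ** 2))     # sum over all DFT cepstral indices
```

Here `c1[0]` equals the mean of `np.log(S1)`, the discrete analogue of the expression for $c_0$.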
Laurent expansion: $\log[\sigma / A(z)] = \log\sigma + \sum_{n=1}^{\infty} c_n z^{-n}$
Differentiating both sides of the equation with respect to $z^{-1}$ and equating the coefficients of like powers of $z^{-1}$, we derive:
$c_n = -a_n - \frac{1}{n}\sum_{k=1}^{n-1} k\, c_k\, a_{n-k}$ for $n > 0$, where $a_0 = 1$ and $a_k = 0$ for $k > p$.
In terms of the log power spectrum, the Taylor series expansion becomes:
$\log[\sigma^2 / |A(e^{j\omega})|^2] = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}$, where $c_0 = \log\sigma^2$ and $c_{-n} = c_n$.
Truncated cepstral distance:
$d_c^2(L) = \sum_{n=1}^{L} (c_n - c_n')^2$
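The recursion from LPC coefficients to cepstral coefficients is short to implement. The sketch below follows the recursion for $c_n$ given above and compares the result with a cepstrum obtained from a long FFT of $\log(1/|A(e^{j\omega})|^2)$. The polynomial $A(z)$ is a hypothetical example built from roots inside the unit circle, and the gain is taken as $\sigma = 1$ so that $c_0 = 0$.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """c_n = -a_n - (1/n) sum_{k=1}^{n-1} k c_k a_{n-k}, with a[0] = 1 and a_k = 0 for k > p."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)          # c[0] stays 0 here (it would hold log sigma^2)
    for n in range(1, n_ceps + 1):
        a_n = a[n] if n <= p else 0.0
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += k * c[k] * a[n - k]
        c[n] = -a_n - acc / n
    return c

# Hypothetical stable A(z) built from roots inside the unit circle
roots = np.array([0.8 * np.exp(1j * 0.6), 0.8 * np.exp(-1j * 0.6), 0.5])
a = np.poly(roots).real               # [1, a_1, a_2, a_3]
L = 12
c_rec = lpc_to_cepstrum(a, L)

# Reference: cepstrum of log(1/|A|^2) from a long FFT (aliasing is negligible here)
nfft = 4096
log_spec = -np.log(np.abs(np.fft.fft(a, nfft)) ** 2)
c_fft = np.fft.ifft(log_spec).real[: L + 1]
```

The recursion needs only the $p$ predictor coefficients, so it avoids the FFT and the logarithm entirely.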
Weighted Cepstral Distances and Liftering
It can be shown that under certain regularity conditions, the cepstral coefficients, except $c_0$, have:
1) Zero means
2) Variances essentially inversely proportional to the square of the coefficient index:
$E\{c_n^2\} \sim \frac{1}{n^2}$
If we normalize the cepstral distance by the inverse of the variance:
$d_{2W}^2 = \sum_{n=-\infty}^{\infty} n^2 (c_n - c_n')^2$
Differentiating both sides of the Fourier series equation of the spectrum:
This is an $L_2$ distance based upon the differences between the spectral slopes.
Cepstral Weighting or Liftering Procedure
h is usually chosen as L/2, and L is typically 10 to 16.
A useful form of weighted cepstral distance:
Likelihood Distortions
Previously defined:
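Itakura-Saito distortion measure:
$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log\frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$
where $\sigma_\infty^2$ and $\sigma_\infty'^2$ are the one-step prediction errors of $S(\omega)$ and $S'(\omega)$, respectively.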
The residual energy can be easily evaluated by:
By replacing $S(\omega)$ by its optimal p-th order LPC model spectrum:
If we set $\sigma^2$ to match the residual energy $\alpha$:
which is often referred to as the Itakura distortion measure.
Another way to write the Itakura distortion measure is:
Another gain-independent distortion measure is called the likelihood ratio distortion:
4.5.4 Likelihood Distortions
$d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = d_{IS}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \frac{a^t R_p a}{\sigma_p^2} - 1.$
Since
$u = \exp(\log u) = 1 + \log u + \frac{1}{2!}(\log u)^2 + \frac{1}{3!}(\log u)^3 + \cdots,$
and $\alpha / \sigma_p^2 \approx 1$,
$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log\frac{\alpha}{\sigma_p^2} \approx \frac{\alpha}{\sigma_p^2} - 1 = d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right)$ for $\frac{\alpha}{\sigma_p^2} \approx 1$.
That is, when the distortion is small, the Itakura distortion measure is not very different from the LR distortion measure.
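The closeness of the Itakura and likelihood ratio distortions for small distortion can be seen numerically. The sketch below is a minimal example under stated assumptions: the two frames are synthetic first-order autoregressive signals, the LPC models are obtained by solving the autocorrelation normal equations with `numpy.linalg.solve`, and the helper names (`autocorr`, `toeplitz`, `lpc`, `frame`) are invented for this illustration.

```python
import numpy as np

def autocorr(x, p):
    """Autocorrelation r(0..p) of a frame."""
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])

def toeplitz(r):
    """Symmetric Toeplitz matrix built from r(0), r(1), ..."""
    n = len(r)
    return np.array([[r[abs(i - k)] for k in range(n)] for i in range(n)])

def lpc(r):
    """Solve the normal equations; return a = [1, a_1..a_p] and the minimum residual energy."""
    p = len(r) - 1
    coeffs = np.linalg.solve(toeplitz(r[:p]), -r[1:])
    a = np.concatenate(([1.0], coeffs))
    return a, float(a @ toeplitz(r) @ a)

rng = np.random.default_rng(4)

def frame(pole, n=400):
    """White noise passed through a one-pole recursive filter (synthetic test signal)."""
    e = rng.normal(size=n)
    x = np.zeros(n)
    for i in range(n):
        x[i] = e[i] + (pole * x[i - 1] if i > 0 else 0.0)
    return x

p = 6
r_ref = autocorr(frame(0.80), p)
a_p, sigma_p2 = lpc(r_ref)               # reference model A_p and its residual sigma_p^2
a, _ = lpc(autocorr(frame(0.78), p))     # test model A from a slightly different frame

alpha = float(a @ toeplitz(r_ref) @ a)   # a^t R_p a: reference frame filtered by A(z)
d_lr = alpha / sigma_p2 - 1.0            # likelihood ratio distortion
d_i = float(np.log(alpha / sigma_p2))    # Itakura distortion
```

Because $A_p$ minimizes the residual energy, `alpha >= sigma_p2`, so both distortions are non-negative and `d_i` never exceeds `d_lr`.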
$d_{IS}(S, S') \neq d_{IS}(S', S)$
Consider the Itakura-Saito distortion between
the input and output of a linear system H(z).
[Figure: input $X(n)$ with spectrum $S(\omega)$ passes through $H(z) = A(z)/B(z)$ to give output $X'(n)$ with spectrum $S'(\omega)$.]
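$S'(\omega) = |H(e^{j\omega})|^2 S(\omega).$
$V(\omega) = -\log |H(e^{j\omega})|^2$
$d_{IS}(S, S') = \int_{-\pi}^{\pi} \left[\frac{1}{|H(e^{j\omega})|^2} + \log |H(e^{j\omega})|^2 - 1\right] \frac{d\omega}{2\pi}.$
With $H(z) = \frac{A(z)}{B(z)}$,
$A(z) = 1 + \sum_{i=1}^{p_1} a_i z^{-i}$
$B(z) = 1 + \sum_{i=1}^{p_2} b_i z^{-i}$
For monic $A(z)$ and $B(z)$ with all zeros inside the unit circle, the integral of $\log |H(e^{j\omega})|^2$ is zero, so
$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{1}{|H(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \int_{-\pi}^{\pi} \frac{|B(e^{j\omega})|^2}{|A(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = d_{IS}\left(\frac{1}{|A|^2}, \frac{1}{|B|^2}\right)$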
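The simplification for a rational $H(z) = A(z)/B(z)$ can be verified on a frequency grid. The sketch below assumes monic, minimum-phase polynomials (hypothetical examples built from chosen roots) and approximates each integral by a mean over FFT bins. It evaluates the general expression, the simplified expression, and the log term that is expected to vanish.

```python
import numpy as np

nfft = 8192

def monic_from_roots(roots):
    """Monic polynomial in z^{-1} with the given zeros (all inside the unit circle)."""
    return np.poly(roots).real

# Hypothetical minimum-phase A(z) and B(z)
A = monic_from_roots([0.7 * np.exp(1j * 1.0), 0.7 * np.exp(-1j * 1.0)])
B = monic_from_roots([0.5, -0.6j, 0.6j])

Aw = np.abs(np.fft.fft(A, nfft)) ** 2      # |A(e^{jw})|^2 on the grid
Bw = np.abs(np.fft.fft(B, nfft)) ** 2      # |B(e^{jw})|^2 on the grid
H2 = Aw / Bw                               # |H(e^{jw})|^2 with H = A/B

# General form: integral of 1/|H|^2 + log|H|^2 - 1
d_general = float(np.mean(1.0 / H2 + np.log(H2) - 1.0))
# Simplified form: integral of |B|^2/|A|^2, minus 1
d_simplified = float(np.mean(Bw / Aw) - 1.0)
# The log term on its own, which is what makes the two forms agree
log_term = float(np.mean(np.log(H2)))
```

If either polynomial had a zero outside the unit circle, `log_term` would no longer be zero and only the general form would apply.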
4.5.5 Variations of Likelihood Distortions
Symmetric distortion measures:
$d_x^{(m)}(S, S') = \left\{\frac{1}{2}\left[d_{IS}(S, S')^m + d_{IS}(S', S)^m\right]\right\}^{1/m}.$
COSH distortion
For $m = 1$,
$d_x^{(1)}(S, S') = \frac{1}{2}\left[d_{IS}(S, S') + d_{IS}(S', S)\right]$
$= \int_{-\pi}^{\pi} \frac{1}{2}\left[e^{V(\omega)} - V(\omega) - 1 + e^{-V(\omega)} + V(\omega) - 1\right] \frac{d\omega}{2\pi}$
$= \int_{-\pi}^{\pi} \left\{\cosh[V(\omega)] - 1\right\} \frac{d\omega}{2\pi} = d_{COSH}(S, S').$
Since $\cosh V - 1 = \frac{V^2}{2!} + \frac{V^4}{4!} + \cdots,$
so: $d_x^{(1)}(S, S') = d_{COSH}(S, S') \approx \frac{1}{2} d_2^2(S, S').$
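The COSH identity and its small-distortion approximation can be checked on sampled spectra. The sketch below writes the Itakura-Saito distortion in the form $\int [e^{V} - V - 1]\,d\omega/2\pi$ with $V = \log S - \log S'$ and replaces each integral by a mean over a uniform frequency grid. The spectra are synthetic, and the function names are illustrative.

```python
import numpy as np

def d_is(S, S2):
    """Itakura-Saito distortion on a uniform grid: mean of e^V - V - 1, V = log S - log S'."""
    V = np.log(S) - np.log(S2)
    return float(np.mean(np.exp(V) - V - 1.0))

def d_cosh(S, S2):
    """COSH distortion on a uniform grid: mean of cosh(V) - 1."""
    V = np.log(S) - np.log(S2)
    return float(np.mean(np.cosh(V) - 1.0))

def d2_sq(S, S2):
    """Squared L2 log spectral distance on a uniform grid: mean of V^2."""
    V = np.log(S) - np.log(S2)
    return float(np.mean(V ** 2))

rng = np.random.default_rng(5)
nfft = 256
w = 2 * np.pi * np.arange(nfft) / nfft
base = np.abs(np.fft.fft(rng.normal(size=12), nfft)) ** 2 + 0.1
near = base * np.exp(0.05 * np.cos(3 * w))   # small smooth change of the log spectrum

sym = 0.5 * (d_is(base, near) + d_is(near, base))   # symmetrized IS with m = 1
cosh_val = d_cosh(base, near)
half_l2 = 0.5 * d2_sq(base, near)
```

`sym` and `cosh_val` agree to rounding error, while `half_l2` is only an approximation that improves as the log spectral difference shrinks.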
4.5.6 Spectral Distortion Using a
Warped Frequency Scale
Psychophysical studies have shown that human perception of the
frequency content of sounds does not follow a linear scale.
This research has led to the idea of defining subjective pitch
of pure tones.
For each tone with an actual frequency, f, measured in Hz,
a subjective pitch is measured on a scale called the “mel” scale.
As a reference point, the pitch of a 1 kHz tone, 40 dB above the
perceptual hearing threshold, is defined as 1000 mels.
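A conversion between Hz and mels is often needed in practice. The slides do not give a formula, so the sketch below assumes a widely used analytic approximation, $m = 2595 \log_{10}(1 + f/700)$. This is one of several approximations in the literature, and it is chosen here only because it maps 1 kHz to approximately 1000 mels, in line with the reference point above.

```python
import math

def hz_to_mel(f_hz):
    """Hz to mel using the 2595*log10(1 + f/700) approximation (an assumption, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel under the same approximation."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

mel_1k = hz_to_mel(1000.0)        # close to 1000 mels at the 1 kHz reference point
mel_2k = hz_to_mel(2000.0)        # doubling the frequency adds fewer than 1000 mels
```

The scale is roughly linear at low frequencies and compressive at high frequencies, which is the nonlinearity the psychophysical studies describe.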
Examples of critical bandwidth

Critical Band   Center Frequency   Critical Band   Lower Cutoff       Upper Cutoff
Number          (Hz)               (Hz)            Frequency (Hz)     Frequency (Hz)
 1                  50                 -                -                  100
 2                 150               100              100                  200
 3                 250               100              200                  300
 4                 350               100              300                  400
 5                 450               110              400                  510
 6                 570               120              510                  630
 7                 700               140              630                  770
 8                 840               150              770                  920
 9               1,000               160              920                1,080
10               1,170               190            1,080                1,270
11               1,370               210            1,270                1,480
12               1,600               240            1,480                1,720
13               1,850               280            1,720                2,000
14               2,150               320            2,000                2,320
15               2,500               380            2,320                2,700
16               2,900               450            2,700                3,150
17               3,400               550            3,150                3,700
18               4,000               700            3,700                4,400
19               4,800               900            4,400                5,300
20               5,800             1,100            5,300                6,400
21               7,000             1,300            6,400                7,700
22               8,500             1,800            7,700                9,500
23              10,500             2,500            9,500               12,000
24              13,500             3,500           12,000               15,500
Warped cepstral distance
$\tilde{d}_2^2(S, S') = \int_{-B}^{B} |\log S(\theta(b)) - \log S'(\theta(b))|^2 \frac{db}{2B},$
where b is the frequency in Barks, $S(\theta(b))$ is the spectrum on a Bark scale, and B is the Nyquist frequency in Barks.
$\tilde{d}_2^2 = \sum_{i=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} (c_i - c_i')(c_k - c_k') \int_{-B}^{B} e^{j\theta(b)(i-k)} \frac{db}{2B} = \sum_{i=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} (c_i - c_i')(c_k - c_k')\, w_{ik},$
where the warping function is defined by
$w_{ik} = \int_{-B}^{B} e^{j\theta(b)(i-k)} \frac{db}{2B}.$
$\tilde{d}_c^2(L) = \sum_{i=-L}^{L} \sum_{k=-L}^{L} (c_i - c_i')(c_k - c_k')\, w_{ik}.$
$\theta(b) = \theta_1(b) = \left(\frac{\pi}{3333}\right)\left(\frac{1000}{0.76}\right)\tan\left(\frac{b}{13}\right)$ for $|b| \le 6$
$\theta(b) = \theta_2(b) = \left(\frac{\pi}{3333}\right)(1000)\, 10^{(b - 8.776)/10}$ for $|b| \ge 13$
$\theta(b) = \frac{1}{2}[\theta_1(b) + \theta_2(b)]$ for $6 < |b| < 13$.
Mel-frequency cepstrum:
$\tilde{c}_n = \sum_{k=1}^{K} (\log \tilde{S}_k) \cos\left[n\left(k - \frac{1}{2}\right)\frac{\pi}{K}\right], \quad n = 1, 2, \ldots, L,$
where $\tilde{S}_k$, $k = 1, \ldots, K$, is the output power of the triangular filters.
Mel-frequency cepstral distance:
$d_{\tilde{c}}^2(L) = \sum_{n=1}^{L} (\tilde{c}_n - \tilde{c}_n')^2$
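The mel-frequency cepstrum formula is a cosine transform of the log filter-bank outputs, which the sketch below implements directly. The filter-bank powers are random placeholder values rather than the output of real triangular filters, and the function names are illustrative. The example also shows that scaling every filter output by a common gain leaves $\tilde{c}_n$ unchanged for $n \ge 1$, so the distance is zero.

```python
import numpy as np

def mel_cepstrum(filter_powers, L):
    """c_n = sum_{k=1}^{K} log(S_k) cos[n (k - 1/2) pi / K], for n = 1..L."""
    log_s = np.log(np.asarray(filter_powers, dtype=float))
    K = len(log_s)
    k = np.arange(1, K + 1)
    n = np.arange(1, L + 1)
    basis = np.cos(np.outer(n, k - 0.5) * np.pi / K)   # shape (L, K)
    return basis @ log_s

def mel_cepstral_distance_sq(c1, c2):
    """Squared mel-frequency cepstral distance: sum over n of (c_n - c_n')^2."""
    return float(np.sum((np.asarray(c1) - np.asarray(c2)) ** 2))

rng = np.random.default_rng(6)
K, L = 20, 12
S1 = rng.uniform(0.5, 2.0, size=K)      # placeholder filter-bank output powers
S2 = 3.0 * S1                           # same spectral shape, different overall gain
c1, c2 = mel_cepstrum(S1, L), mel_cepstrum(S2, L)
dist = mel_cepstral_distance_sq(c1, c2)
```

The gain invariance follows because the cosine basis sums to zero over $k$ for each $n$ in the range used.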
4.5.7 Alternative Spectral Representations
and Distortion Measures
$g_i = \frac{A_{i+1}}{A_i} = \frac{1 - k_i}{1 + k_i}, \quad i = 1, 2, \ldots, p$
$\log g_i = \log\frac{A_{i+1}}{A_i} = \log\frac{1 - k_i}{1 + k_i},$
$P(z) = A(z) + z^{-(p+1)} A(z^{-1})$
$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}).$
Summary of Spectral Distortion Measures
Distortion Measure | Notation | Expression | Computation
$L_p$ metric | $d_p$ | $(d_p)^p = \int_{-\pi}^{\pi} |\log S(\omega) - \log S'(\omega)|^p \frac{d\omega}{2\pi}$ | 2 FFTs, logs, integral
Truncated cepstral distance | $d_c^2(L)$ | $\sum_{n=1}^{L} (c_n - c_n')^2$ | L *, +
Weighted (liftered) cepstral distance | $d_{cW}^2$ | $\sum_{n=1}^{L} w(n)(c_n - c_n')^2$ | L *, +
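Itakura-Saito distortion | $d_{IS}$ | $\int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log\frac{\sigma^2}{\sigma'^2} - 1$; for LPC model spectra, $\frac{\sigma_p^2}{\sigma^2}\int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} - \log\frac{\sigma_p^2}{\sigma^2} - 1$ | p *, +
Itakura distortion | $d_I$ | $\log\left[\int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi}\right] = \log\frac{a^t R_p a}{\sigma_p^2}$ | p *, +
Likelihood ratio distortion | $d_{LR}$ | $\int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} - 1 = \frac{a^t R_p a}{\sigma_p^2} - 1$ | p *, +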
COSH distance | $d_{COSH}$ | $\int_{-\pi}^{\pi} \cosh\left[\log\frac{S(\omega)}{S'(\omega)}\right] \frac{d\omega}{2\pi} - 1 = \frac{1}{2}\left[d_{IS}(S, S') + d_{IS}(S', S)\right]$ | 2p *, +
Weighted likelihood ratio distortion | $d_{WLR}$ | $\sum_{n=1}^{L} \left[\frac{r(n)}{\sigma^2} - \frac{r'(n)}{\sigma'^2}\right](c_n - c_n')$ | L *, +
Weighted slope metric | $d_{WSM}$ | $u_E |E_S - E_{S'}| + \sum_{i=1}^{K} u(i)[\Delta(i) - \Delta'(i)]^2$ | K *, +
4.6 INCORPORATION OF SPECTRAL DYNAMIC
FEATURES INTO THE DISTORTION MEASURE
$\frac{\partial \log S(\omega, t)}{\partial t} = \sum_{n=-\infty}^{\infty} \frac{\partial c_n(t)}{\partial t} e^{-jn\omega},$
Fitting the cepstral trajectory by a second order polynomial, choose $h_1$, $h_2$, $h_3$ such that E is minimized:
$E = \sum_{t=-M}^{M} \left[c(t) - (h_1 + h_2 t + h_3 t^2)\right]^2$
Differentiating E with respect to $h_1$, $h_2$, and $h_3$ and setting to zero results in 3 equations:
$\sum_{t=-M}^{M} \left[c(t) - h_1 - h_2 t - h_3 t^2\right] = 0$
$\sum_{t=-M}^{M} \left[c(t)\,t - h_1 t - h_2 t^2 - h_3 t^3\right] = 0$
$\sum_{t=-M}^{M} \left[c(t)\,t^2 - h_1 t^2 - h_2 t^3 - h_3 t^4\right] = 0$
The solutions to these equations are:
$h_2 = \frac{\sum_{t=-M}^{M} t\, c(t)}{T_M}$
$h_3 = \frac{T_M \sum_{t=-M}^{M} c(t) - (2M + 1) \sum_{t=-M}^{M} t^2 c(t)}{T_M^2 - (2M + 1) \sum_{t=-M}^{M} t^4}$
$h_1 = \frac{1}{2M + 1}\left[\sum_{t=-M}^{M} c(t) - h_3 T_M\right],$
where $T_M = \sum_{t=-M}^{M} t^2.$
The first and second time derivatives of the cepstral trajectory are then approximated by:
$\left.\frac{\partial c_n(\tau + t)}{\partial t}\right|_{t=0} \approx h_2 = \frac{\sum_{t=-M}^{M} t\, c_n(\tau + t)}{T_M}$
$\left.\frac{\partial^2 c_n(\tau + t)}{\partial t^2}\right|_{t=0} \approx 2h_3 = \frac{2\left[T_M \sum_{t=-M}^{M} c_n(\tau + t) - (2M + 1) \sum_{t=-M}^{M} t^2 c_n(\tau + t)\right]}{T_M^2 - (2M + 1) \sum_{t=-M}^{M} t^4}$
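The closed-form solutions for $h_1$, $h_2$, and $h_3$ can be checked against a general least-squares fit. The sketch below implements the three expressions as written and compares them with `numpy.polyfit` on a synthetic noisy trajectory. The function name `fit_quadratic`, the window half-width, and the test data are assumptions made for the example.

```python
import numpy as np

def fit_quadratic(c, M):
    """Closed-form least-squares fit of h1 + h2 t + h3 t^2 to c(t), t = -M..M."""
    t = np.arange(-M, M + 1)
    c = np.asarray(c, dtype=float)
    T_M = np.sum(t ** 2)
    h2 = np.sum(t * c) / T_M
    h3 = (T_M * np.sum(c) - (2 * M + 1) * np.sum(t ** 2 * c)) / (
        T_M ** 2 - (2 * M + 1) * np.sum(t ** 4))
    h1 = (np.sum(c) - h3 * T_M) / (2 * M + 1)
    return h1, h2, h3

M = 3
t = np.arange(-M, M + 1)
rng = np.random.default_rng(7)
# Synthetic cepstral trajectory: a quadratic trend plus a little noise
c = 2.0 + 3.0 * t - 0.5 * t ** 2 + 0.1 * rng.normal(size=t.size)

h1, h2, h3 = fit_quadratic(c, M)
ref = np.polyfit(t, c, 2)            # highest power first: [h3, h2, h1]

# First and second derivative estimates at the window center
delta1, delta2 = h2, 2.0 * h3
```

Because the window is symmetric about $t = 0$, the odd moments of $t$ vanish, which is why $h_2$ decouples from $h_1$ and $h_3$.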
A differential spectral distance:
$d_{2\delta^{(1)}}^2 = \int_{-\pi}^{\pi} \left|\frac{\partial \log S(\omega, t)}{\partial t} - \frac{\partial \log S'(\omega, t)}{\partial t}\right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} \left(\delta_n^{(1)} - \delta_n'^{(1)}\right)^2,$
A second differential spectral distance:
$d_{2\delta^{(2)}}^2 = \int_{-\pi}^{\pi} \left|\frac{\partial^2 \log S(\omega, t)}{\partial t^2} - \frac{\partial^2 \log S'(\omega, t)}{\partial t^2}\right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} \left(\delta_n^{(2)} - \delta_n'^{(2)}\right)^2.$
$d^2 = \gamma_1 d_2^2 + \gamma_2 d_{2\delta^{(1)}}^2 + \gamma_3 d_{2\delta^{(2)}}^2,$
Cepstral weighting or liftering by differentiating:
$\frac{\partial^2}{\partial\omega\, \partial t}[\log S(\omega, t)] = \sum_{n=-\infty}^{\infty} (-jn) \frac{\partial c_n(t)}{\partial t} e^{-jn\omega} = \sum_{n=-\infty}^{\infty} (-jn)\, \delta_n^{(1)}(t)\, e^{-jn\omega}.$
A weighted differential cepstral distance:
$d_{2w\delta^{(1)}}^2 = \int_{-\pi}^{\pi} \left|\frac{\partial^2 \log S(\omega, t)}{\partial\omega\, \partial t} - \frac{\partial^2 \log S'(\omega, t)}{\partial\omega\, \partial t}\right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} n^2 \left(\delta_n^{(1)} - \delta_n'^{(1)}\right)^2.$
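Taking the $L_2$ distance as an example:
$d_{2w\partial}^2 = \int_{-\pi}^{\pi} \left|\frac{\partial}{\partial\omega}\left(\gamma_1 + \gamma_2 \frac{\partial}{\partial t}\right)\left[\log S(\omega, t) - \log S'(\omega, t)\right]\right|^2 \frac{d\omega}{2\pi}$
$= \sum_{n=-\infty}^{\infty} n^2 \left|\gamma_1 [c_n(t) - c_n'(t)] + \gamma_2 [\delta_n^{(1)}(t) - \delta_n'^{(1)}(t)]\right|^2$
$= \gamma_1^2 \sum_{n=-\infty}^{\infty} n^2 [c_n(t) - c_n'(t)]^2 + \gamma_2^2 \sum_{n=-\infty}^{\infty} n^2 [\delta_n^{(1)}(t) - \delta_n'^{(1)}(t)]^2 + 2\gamma_1\gamma_2 \sum_{n=-\infty}^{\infty} n^2 [c_n(t) - c_n'(t)][\delta_n^{(1)}(t) - \delta_n'^{(1)}(t)]$
$= \gamma_1^2 d_{2W}^2 + \gamma_2^2 d_{2w\delta^{(1)}}^2 + 2\gamma_1\gamma_2 \sum_{n=-\infty}^{\infty} n^2 [c_n(t) - c_n'(t)][\delta_n^{(1)}(t) - \delta_n'^{(1)}(t)].$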