
PATTERN COMPARISON TECHNIQUES
Test pattern: $T = \{t_1, t_2, t_3, \ldots, t_I\}$
Reference pattern: $R^j = \{t_1^j, t_2^j, \ldots, t_J^j\}$
4.2 SPEECH (ENDPOINT) DETECTION
4.3 DISTORTION MEASURES: MATHEMATICAL CONSIDERATIONS
x and y: two feature vectors defined on a vector space $X$
The properties of a metric or distance function d:
(a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$;
(b) $d(x, y) = d(y, x)$ for $x, y \in X$;
(c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$.
A distance function is called invariant if
(d) $d(x + z, y + z) = d(x, y)$.
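The metric properties (a) to (d) can be checked numerically on a finite set of vectors. The following is a minimal sketch, assuming the Euclidean distance as the candidate function; the helper names `check_metric` and `is_invariant` are illustrative and do not come from the slides.

```python
import itertools
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def check_metric(d, points, tol=1e-12):
    """Return True if d satisfies properties (a)-(c) on the given points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0 or not math.isfinite(d(x, y)):
            return False                      # (a) nonnegative and finite
        if (d(x, y) <= tol) != (x == y):
            return False                      # (a) zero only for identical vectors
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # (b) symmetry
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(y, z) + tol:
            return False                      # (c) triangle inequality
    return True

def is_invariant(d, points, shift, tol=1e-12):
    """Check property (d), d(x + z, y + z) == d(x, y), for one shift z."""
    def move(v):
        return tuple(a + b for a, b in zip(v, shift))
    return all(abs(d(move(x), move(y)) - d(x, y)) <= tol
               for x, y in itertools.product(points, repeat=2))

points = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5), (2.0, 2.0)]
print(check_metric(euclidean, points), is_invariant(euclidean, points, (5.0, -1.0)))
```

A check like this can only refute a candidate on the sampled points, not prove the properties in general. The squared Euclidean distance, for instance, fails the triangle inequality on the points 0, 1, 2.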
PERCEPTUAL CONSIDERATIONS
Spectral changes that do not fundamentally change the perceived sound include:
Spectral changes that lead to phonetically different sounds include:
Just-discriminable change: known as JND (just-noticeable difference), DL (difference limen), or differential threshold
4.4 DISTORTION MEASURES: PERCEPTUAL CONSIDERATIONS
Spectral Distortion Measures
Spectral density
Fourier coefficients of the spectral density
Autocorrelation function
Short-term autocorrelation
Then $S(\omega)$ is an energy spectral density
Autocorrelation matrices
If σ/A(z) is the all-pole model for the speech spectrum,
the residual energy resulting from “inverse filtering” the input signal with an all-zero filter A(z) is:
Important properties of all-pole modeling:
The recursive minimization relationship:
LOG SPECTRAL DISTANCE
CEPSTRAL DISTANCES
The complex cepstrum of a signal is defined as the Fourier transform of the log of the signal spectrum.
The Fourier series representation of $\log S(\omega)$ can be expressed as:
$\log S(\omega) = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}$
where $c_n = c_{-n}$ are real and referred to as the cepstral coefficients.
Note that: $c_0 = \int_{-\pi}^{\pi} \log S(\omega) \, \frac{d\omega}{2\pi}$
For a pair of spectra $S(\omega)$ and $S'(\omega)$, by applying Parseval's theorem, we can relate the $L_2$ cepstral distance of the spectra to the rms log spectral distance:
$d_2^2 = \int_{-\pi}^{\pi} |\log S(\omega) - \log S'(\omega)|^2 \, \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (c_n - c_n')^2$
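The Parseval relation between the rms log spectral distance and the cepstral distance can be checked numerically. The sketch below assumes two short, symmetric cepstra with made-up values and rebuilds each log spectrum from its Fourier series; the function names are illustrative.

```python
import math

def log_spectrum(c, omega):
    """log S(omega) from a real symmetric cepstrum c[0..N], using c_{-n} = c_n."""
    return c[0] + 2.0 * sum(c[n] * math.cos(n * omega) for n in range(1, len(c)))

def rms_log_spectral_distance_sq(c1, c2, num_points=4096):
    """Approximate the integral of |log S - log S'|^2 d(omega)/(2*pi) over [-pi, pi)."""
    total = 0.0
    for i in range(num_points):
        omega = -math.pi + 2.0 * math.pi * i / num_points
        total += (log_spectrum(c1, omega) - log_spectrum(c2, omega)) ** 2
    return total / num_points

def cepstral_distance_sq(c1, c2):
    """Sum over n = -N..N of (c_n - c_n')^2, counting each n >= 1 term twice."""
    d0 = (c1[0] - c2[0]) ** 2
    return d0 + 2.0 * sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:]))

# Hypothetical cepstra c_0..c_3 for two spectra.
c_a = [0.5, 0.3, -0.2, 0.1]
c_b = [0.4, 0.1, 0.0, 0.05]
print(rms_log_spectral_distance_sq(c_a, c_b), cepstral_distance_sq(c_a, c_b))
```

The two printed values should agree to within rounding error, because the uniform sampling integrates a finite trigonometric polynomial exactly.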
Laurent expansion: $\log[\sigma / A(z)] = \log \sigma + \sum_{n=1}^{\infty} c_n z^{-n}$
Differentiating both sides of the equation with respect to $z^{-1}$ and equating the coefficients of like powers of $z^{-1}$, we derive:
$c_n = -a_n - \frac{1}{n} \sum_{k=1}^{n-1} k \, c_k \, a_{n-k}$ for $n > 0$, where $a_0 = 1$ and $a_k = 0$ for $k > p$.
In terms of the log power spectrum, the Taylor series expansion becomes:
$\log[\sigma^2 / |A(e^{j\omega})|^2] = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}$, where $c_0 = \log \sigma^2$ and $c_{-n} = c_n$.
Truncated cepstral distance:
$d_c^2(L) = \sum_{n=1}^{L} (c_n - c_n')^2$
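The recursion from LPC coefficients to cepstral coefficients, followed by the truncated distance, might be sketched as below. It assumes the sign convention $A(z) = 1 + \sum_k a_k z^{-k}$ (so $a_0 = 1$); the function names are illustrative.

```python
def lpc_to_cepstrum(a, num_coeffs):
    """Cepstral coefficients c_1..c_L of 1/A(z), with A(z) = 1 + sum_k a_k z^-k.

    `a` holds [a_1, ..., a_p]; a_0 = 1 and a_k = 0 for k > p are implied.
    """
    p = len(a)

    def a_at(k):
        return a[k - 1] if 1 <= k <= p else 0.0

    c = [0.0] * (num_coeffs + 1)  # c[0] is unused here (it equals log sigma^2)
    for n in range(1, num_coeffs + 1):
        acc = sum(k * c[k] * a_at(n - k) for k in range(1, n))
        c[n] = -a_at(n) - acc / n
    return c[1:]

def truncated_cepstral_distance_sq(c1, c2):
    """d_c^2(L): sum of squared differences over the first L coefficients."""
    return sum((x - y) ** 2 for x, y in zip(c1, c2))

# Single-pole example: A(z) = 1 - 0.5 z^-1, whose cepstrum is c_n = 0.5**n / n.
cep = lpc_to_cepstrum([-0.5], 6)
print(cep)
```

The single-pole case is a convenient sanity check because its cepstrum has the closed form $c_n = r^n / n$ for a pole at $r$.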
Weighted Cepstral Distances and Liftering
It can be shown that, under certain regularity conditions, the cepstral coefficients, except $c_0$, have:
1) Zero means
2) Variances essentially inversely proportional to the square of the coefficient index:
$E\{c_n^2\} \sim \frac{1}{n^2}$
If we normalize the cepstral distance by the inverse of the variance:
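Normalizing each squared cepstral difference by the inverse of a variance proportional to $1/n^2$ amounts to weighting it by $n^2$. A minimal sketch under that reading follows; the function name is illustrative and the proportionality constant is taken as 1.

```python
def index_weighted_cepstral_distance_sq(c1, c2):
    """Sum over n = 1..L of n^2 (c_n - c_n')^2, with c1[0] holding c_1."""
    return sum((n * (x - y)) ** 2
               for n, (x, y) in enumerate(zip(c1, c2), start=1))

# Equal raw differences contribute more at higher cepstral indices.
print(index_weighted_cepstral_distance_sq([0.1, 0.1, 0.1], [0.0, 0.0, 0.0]))
```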
Differentiating both sides of the Fourier series equation of the log spectrum:
This is an $L_2$ distance based upon the differences between the spectral slopes.
Cepstral Weighting or Liftering Procedure
h is usually chosen as L/2, and L is typically 10 to 16.
A useful form of weighted cepstral distance:
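The liftering window itself did not survive in the extracted text. The sketch below assumes the commonly used raised-sine lifter $w(n) = 1 + h \sin(n\pi/L)$ for $n = 1, \ldots, L$, with $h = L/2$ by default as stated above; both the window shape and the function names are assumptions made for illustration.

```python
import math

def raised_sine_lifter(L, h=None):
    """Assumed lifter w(n) = 1 + h*sin(n*pi/L), n = 1..L; h defaults to L/2."""
    h = L / 2.0 if h is None else h
    return [1.0 + h * math.sin(n * math.pi / L) for n in range(1, L + 1)]

def liftered_cepstral_distance_sq(c1, c2, w):
    """Sum over n of [w(n) (c_n - c_n')]^2 for cepstra c_1..c_L."""
    return sum((wn * (x - y)) ** 2 for wn, x, y in zip(w, c1, c2))

w = raised_sine_lifter(12)
print(w)
```

With L = 12 the weight peaks at n = 6 and tapers toward both ends, de-emphasizing the lowest and highest cepstral indices.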
Likelihood Distortions
Previously defined:
Itakura-Saito distortion measure
where $\sigma^2$ and $\sigma'^2$ are the one-step prediction errors of $S(\omega)$ and $S'(\omega)$, as defined:
The residual energy can be easily evaluated by:
By replacing $S(\omega)$ by its optimal p-th order LPC model spectrum:
If we set $\sigma^2$ to match the residual energy $\alpha$:
which is often referred to as the Itakura distortion measure.
Another way to write the Itakura distortion measure is:
Another gain-independent distortion measure is called the Likelihood Ratio distortion:
4.5.4 Likelihood Distortions
$d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = d_{IS}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2} \, \frac{d\omega}{2\pi} - 1 = \frac{a^T R_p a}{\sigma_p^2} - 1.$
Since
$u = \exp(\log u) = 1 + \log u + \frac{1}{2!}(\log u)^2 + \frac{1}{3!}(\log u)^3 + \ldots,$
and, with $\alpha = a^T R_p a$ the residual energy, for $\alpha / \sigma_p^2 \approx 1$,
$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log\frac{\alpha}{\sigma_p^2} \approx \frac{\alpha}{\sigma_p^2} - 1 = d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right).$
That is, when the distortion is small, the Itakura distortion measure is not very different from the LR distortion measure.
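The closeness of the Itakura and LR measures for small distortions can be seen with a small numeric sketch. It assumes a made-up autocorrelation sequence and the convention $A(z) = 1 + \sum_k a_k z^{-k}$; the Levinson-Durbin routine and all names are illustrative helpers, not part of the slides.

```python
import math

def levinson_durbin(r, p):
    """Return (a, err): a = [1, a_1..a_p] of the optimal predictor, err = sigma_p^2."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        k_i = -acc / err                      # reflection coefficient at order i
        new_a = a[:]
        for k in range(1, i):
            new_a[k] = a[k] + k_i * a[i - k]
        new_a[i] = k_i
        a = new_a
        err *= 1.0 - k_i * k_i
    return a, err

def residual_energy(a, r):
    """a^T R a, with R the Toeplitz autocorrelation matrix built from r."""
    n = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(n) for j in range(n))

def likelihood_ratio(a, r, sigma_p_sq):
    return residual_energy(a, r) / sigma_p_sq - 1.0

def itakura(a, r, sigma_p_sq):
    return math.log(residual_energy(a, r) / sigma_p_sq)

r = [1.0, 0.5, 0.25]                 # hypothetical autocorrelation of the test frame
a_opt, sigma_p_sq = levinson_durbin(r, 2)
a_ref = [1.0, -0.45, 0.0]            # hypothetical reference inverse filter
print(likelihood_ratio(a_ref, r, sigma_p_sq), itakura(a_ref, r, sigma_p_sq))
```

Both measures are zero when the reference filter equals the optimal one, and the Itakura value stays below the LR value because $\log u \le u - 1$.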
The Itakura-Saito distortion is not symmetric in general:
$d_{IS}(s, s') \neq d_{IS}(s', s)$
Consider the Itakura-Saito distortion between the input and output of a linear system $H(z) = A(z) / B(z)$, where the input $X(n)$ has spectrum $S(\omega)$ and the output $X'(n)$ has spectrum $S'(\omega)$.
$S'(\omega) = |H(e^{j\omega})|^2 S(\omega).$
$V(\omega) = -\log |H(e^{j\omega})|^2$
$d_{IS}(S, S') = \int_{-\pi}^{\pi} \left[ \frac{1}{|H(e^{j\omega})|^2} + \log |H(e^{j\omega})|^2 - 1 \right] \frac{d\omega}{2\pi}.$
With $H(z) = \frac{A(z)}{B(z)}$,
$A(z) = 1 + \sum_{i=1}^{p_1} a_i z^{-i}$
$B(z) = 1 + \sum_{i=1}^{p_2} b_i z^{-i}$
$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{|B(e^{j\omega})|^2}{|A(e^{j\omega})|^2} \, \frac{d\omega}{2\pi} - 1 = \int_{-\pi}^{\pi} \frac{1}{|H(e^{j\omega})|^2} \, \frac{d\omega}{2\pi} - 1 = d_{IS}\left(\frac{1}{|A|^2}, \frac{1}{|B|^2}\right)$
4.5.5 Variations of Likelihood Distortions
Symmetric distortion measures:
$d_x^{(m)}(s, s') = \left\{ \frac{1}{2} \left[ d_{IS}(s, s')^m + d_{IS}(s', s)^m \right] \right\}^{1/m}.$
COSH distortion
For $m = 1$,
$d_x^{(1)}(s, s') = \frac{1}{2} \left[ d_{IS}(s, s') + d_{IS}(s', s) \right]$
$= \int_{-\pi}^{\pi} \frac{1}{2} \left[ e^{V(\omega)} - V(\omega) - 1 + e^{-V(\omega)} + V(\omega) - 1 \right] \frac{d\omega}{2\pi}$
$= \int_{-\pi}^{\pi} \left\{ \cosh[V(\omega)] - 1 \right\} \frac{d\omega}{2\pi} = d_{COSH}(s, s').$
Since
$\cosh V = 1 + \frac{V^2}{2!} + \frac{V^4}{4!} + \ldots,$
for small distortions:
$d_{COSH}(s, s') \approx \frac{1}{2} d_2^2(s, s').$
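The identities above can be illustrated on sampled log spectra. This sketch assumes two made-up log spectra on a uniform frequency grid and replaces each integral by an average over the grid; the names are illustrative.

```python
import math

def itakura_saito(log_s1, log_s2):
    """Grid average of exp(V) - V - 1, with V = log S - log S'."""
    total = 0.0
    for u, v in zip(log_s1, log_s2):
        diff = u - v
        total += math.exp(diff) - diff - 1.0
    return total / len(log_s1)

def cosh_distortion(log_s1, log_s2):
    """Grid average of cosh(V) - 1."""
    return sum(math.cosh(u - v) - 1.0 for u, v in zip(log_s1, log_s2)) / len(log_s1)

def log_spectral_distance_sq(log_s1, log_s2):
    """Grid average of V^2 (the squared L2 log spectral distance)."""
    return sum((u - v) ** 2 for u, v in zip(log_s1, log_s2)) / len(log_s1)

N = 64
log_s = [0.05 * math.cos(2.0 * math.pi * i / N) for i in range(N)]       # hypothetical
log_s_ref = [0.02 * math.sin(4.0 * math.pi * i / N) for i in range(N)]   # hypothetical

d_cosh = cosh_distortion(log_s, log_s_ref)
d_sym = 0.5 * (itakura_saito(log_s, log_s_ref) + itakura_saito(log_s_ref, log_s))
print(d_cosh, d_sym, 0.5 * log_spectral_distance_sq(log_s, log_s_ref))
```

The first two values agree up to rounding, and the third is close because the log spectral differences here are small.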
4.5.6 Spectral Distortion Using a Warped Frequency Scale
Psychophysical studies have shown that human perception of the frequency content of sounds does not follow a linear scale.
This research has led to the idea of defining the subjective pitch of pure tones.
For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the “mel” scale.
As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.
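The slides give only the 1 kHz reference point, not a closed-form mapping. As an illustration, the sketch below assumes one widely used analytic approximation, mel = 2595 log10(1 + f/700); this formula is an assumption brought in from outside the slides, and other approximations exist.

```python
import math

def hz_to_mel(f_hz):
    """Assumed approximation of the mel scale: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of the assumed approximation."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

for f in (100.0, 1000.0, 4000.0):
    print(f, hz_to_mel(f))
```

Under this approximation a 1 kHz tone maps to very nearly 1000 mels, matching the reference point, and the mapping is close to linear at low frequencies and logarithmic at high frequencies.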
Examples of critical bandwidth

Critical band   Center freq.   Critical band   Lower cutoff   Upper cutoff
number          (Hz)           (Hz)            freq. (Hz)     freq. (Hz)
 1                  50             -               -             100
 2                 150           100             100             200
 3                 250           100             200             300
 4                 350           100             300             400
 5                 450           110             400             510
 6                 570           120             510             630
 7                 700           140             630             770
 8                 840           150             770             920
 9               1,000           160             920           1,080
10               1,170           190           1,080           1,270
11               1,370           210           1,270           1,480
12               1,600           240           1,480           1,720
13               1,850           280           1,720           2,000
14               2,150           320           2,000           2,320
15               2,500           380           2,320           2,700
16               2,900           450           2,700           3,150
17               3,400           550           3,150           3,700
18               4,000           700           3,700           4,400
19               4,800           900           4,400           5,300
20               5,800         1,100           5,300           6,400
21               7,000         1,300           6,400           7,700
22               8,500         1,800           7,700           9,500
23              10,500         2,500           9,500          12,000
24              13,500         3,500          12,000          15,500
Warped cepstral distance
$\tilde{d}_2^2(s, s') = \int_{-B}^{B} |\log S(\theta(b)) - \log S'(\theta(b))|^2 \, \frac{db}{2B},$
where b is the frequency in Barks, $S(\theta(b))$ is the spectrum on a Bark scale, and B is the Nyquist frequency in Barks.
$\tilde{d}_2^2 = \sum_{i=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} (c_i - c_i')(c_k - c_k') \int_{-B}^{B} e^{j\theta(b)(i-k)} \, \frac{db}{2B} = \sum_{i=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} (c_i - c_i')(c_k - c_k') \, w_{ik},$
where the weights induced by the warping function are defined by
$w_{ik} = \int_{-B}^{B} e^{j\theta(b)(i-k)} \, \frac{db}{2B}.$
Truncated warped cepstral distance:
$\tilde{d}_c^2(L) = \sum_{i=-L}^{L} \sum_{k=-L}^{L} (c_i - c_i')(c_k - c_k') \, w_{ik}.$
$\theta(b) = \theta_1(b) = \frac{\pi}{3333} \cdot \frac{1000}{0.76} \tan\left(\frac{b}{13}\right)$ for $|b| \le 6$
$\theta(b) = \theta_2(b) = \frac{\pi}{3333} \, (1000) \, 10^{(b - 8.776)/10}$ for $|b| \ge 13$
$\theta(b) = \frac{1}{2} [\theta_1(b) + \theta_2(b)]$ for $6 < |b| < 13$.
Mel-frequency cepstrum:
$\tilde{c}_n = \sum_{k=1}^{K} (\log \tilde{S}_k) \cos\left[ n \left( k - \frac{1}{2} \right) \frac{\pi}{K} \right], \quad n = 1, 2, \ldots, L$
where $\tilde{S}_k$, $k = 1, \ldots, K$, is the output power of the triangular filters.
Mel-frequency cepstral distance:
$\tilde{d}_c^2(L) = \sum_{n=1}^{L} (\tilde{c}_n - \tilde{c}_n')^2$
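Given the filter-bank output powers, the mel-frequency cepstrum and its distance reduce to a cosine transform and a sum of squares. The sketch below assumes the filter outputs are already available as positive numbers (the values shown are made up) and does not build the triangular filters themselves; the names are illustrative.

```python
import math

def mel_cepstrum(filter_powers, num_coeffs):
    """c~_n = sum_k log(S~_k) cos[n (k - 1/2) pi / K] for n = 1..num_coeffs."""
    K = len(filter_powers)
    logs = [math.log(s) for s in filter_powers]
    return [sum(logs[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
                for k in range(1, K + 1))
            for n in range(1, num_coeffs + 1)]

def mel_cepstral_distance_sq(c1, c2):
    """Sum over n = 1..L of (c~_n - c~_n')^2."""
    return sum((x - y) ** 2 for x, y in zip(c1, c2))

powers = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 0.5]   # hypothetical filter outputs
print(mel_cepstrum(powers, 4))
```

Because the sum starts at n = 1, scaling every filter output by the same gain leaves the coefficients unchanged, so the distance ignores overall level.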
4.5.7 Alternative Spectral Representations and Distortion Measures
Wave reflection occurs at each sectional boundary, with reflection coefficients denoted by $k_i$, $i = 1, 2, \ldots, p$.
$g_i = \frac{A_{i+1}}{A_i} = \frac{1 + k_i}{1 - k_i}, \quad i = 1, 2, \ldots, p$
$\log g_i = \log \frac{A_{i+1}}{A_i} = \log \frac{1 + k_i}{1 - k_i}$
Another possible parametric representation of the all-pole spectrum is the set of line spectral frequencies (LSFs), defined as the roots of the following two polynomials based upon the inverse filter A(z):
$P(z) = A(z) + z^{-(p+1)} A(z^{-1})$
$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}).$
These two polynomials are equivalent to artificially augmenting the p-section nonuniform acoustic tube with an extra section that is either completely closed (area = 0) or completely open (area = ∞). LSF parameters, due to their particular structure, possess properties similar to those of the formant frequencies and bandwidths.
Log area ratio distance:
$d_g^2 = \sum_{i=1}^{p} (\log g_i - \log g_i')^2$
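A log area ratio distance can be computed directly from two sets of reflection coefficients. The sketch below assumes the convention $g_i = (1 + k_i)/(1 - k_i)$ with $|k_i| < 1$; the opposite sign convention also appears in the literature and yields the same squared distance. The names are illustrative.

```python
import math

def log_area_ratios(k):
    """log g_i = log[(1 + k_i) / (1 - k_i)] for reflection coefficients |k_i| < 1."""
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]

def log_area_ratio_distance_sq(k1, k2):
    """Sum over i of (log g_i - log g_i')^2."""
    g1, g2 = log_area_ratios(k1), log_area_ratios(k2)
    return sum((x - y) ** 2 for x, y in zip(g1, g2))

k_test = [0.5, -0.2, 0.1]    # hypothetical reflection coefficients
k_ref = [0.45, -0.25, 0.0]
print(log_area_ratio_distance_sq(k_test, k_ref))
```

The logarithm stretches the scale near $|k_i| = 1$, where small changes in a reflection coefficient correspond to large spectral changes.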
Weighted slope metric proposed by Klatt:
$d_{WSM}(S, S') = u_E \, |E_S - E_{S'}| + \sum_{i=1}^{K} u(i) \, [\Delta(i) - \Delta'(i)]^2$
$u_E$: the weighting constant for the absolute energy difference
$u(i)$: the weighting coefficient for the critical band spectral slope difference
$K$: the total number of critical bands
In one implementation:
$u(i) = [u_s(i) + u_{s'}(i)] / 2$
where $u_s(i) = \left[ \frac{u_{LM}}{u_{LM} + V_{LM}(i)} \right] \left[ \frac{u_{GM}}{u_{GM} + V_{GM}(i)} \right].$
$V_{LM}(i)$ and $V_{GM}(i)$ are the log spectral differences (in dB) between the spectral magnitude at the ith critical band and its nearest local maximum (LM) and the global maximum (GM) spectral peaks, respectively.
The coefficients $u_{LM}$ and $u_{GM}$ are used to balance the contributions due to the local and the global spectral characteristics and to prevent singularity in $u_s(i)$ and $u_{s'}(i)$.
Summary of Spectral Distortion Measures

$L_p$ metric
  Notation: $d_p$
  Expression: $(d_p)^p = \int_{-\pi}^{\pi} |\log S(\omega) - \log S'(\omega)|^p \, \frac{d\omega}{2\pi}$
  Computation: 2 FFTs, logs, integral

Truncated cepstral distance
  Notation: $d_c^2(L)$
  Expression: $\sum_{n=1}^{L} (c_n - c_n')^2$
  Computation: L *, +

Weighted (liftered) cepstral distance
  Notation: $d_{cW}^2$
  Expression: $\sum_{n=1}^{L} [w(n)(c_n - c_n')]^2$
  Computation: 2L *, +
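Itakura-Saito distortion
  Notation: $d_{IS}$
  Expression: $\int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \, \frac{d\omega}{2\pi} - \log \frac{\sigma^2}{\sigma'^2} - 1$; for all-pole spectra $S = \sigma_p^2 / |A_p|^2$ and $S' = \sigma'^2 / |A|^2$: $\frac{\sigma_p^2}{\sigma'^2} \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \, \frac{d\omega}{2\pi} - \log \frac{\sigma_p^2}{\sigma'^2} - 1$
  Computation: p *, +

Itakura distortion
  Notation: $d_I$
  Expression: $\log \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \, \frac{d\omega}{2\pi} = \log \frac{a^T R_p a}{\sigma_p^2}$
  Computation: p *, +

Likelihood ratio distortion
  Notation: $d_{LR}$
  Expression: $\int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \, \frac{d\omega}{2\pi} - 1 = \frac{a^T R_p a}{\sigma_p^2} - 1$
  Computation: p *, +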
COSH distance
  Notation: $d_{COSH}$
  Expression: $\int_{-\pi}^{\pi} \cosh\left[ \log \frac{S(\omega)}{S'(\omega)} \right] \frac{d\omega}{2\pi} - 1 = \frac{1}{2} [d_{IS}(s, s') + d_{IS}(s', s)]$
  Computation: 2p *, +

Weighted likelihood ratio distortion
  Notation: $d_{WLR}$
  Expression: $\sum_{n=1}^{L} \left[ \frac{r(n)}{\sigma^2} - \frac{r'(n)}{\sigma'^2} \right] (c_n - c_n')$
  Computation: L *, +

Weighted slope metric
  Notation: $d_{WSM}$
  Expression: $u_E \, |E_S - E_{S'}| + \sum_{i=1}^{K} u(i) \, [\Delta(i) - \Delta'(i)]^2$
  Computation: K *, +
4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
A first-order differential (log) spectrum is defined by:
$\frac{\partial \log S(\omega, t)}{\partial t} = \sum_{n=-\infty}^{\infty} \frac{\partial c_n(t)}{\partial t} \, e^{-jn\omega},$
Fitting the cepstral trajectory by a second-order polynomial, choose $h_1$, $h_2$, $h_3$ such that E is minimized:
$E = \sum_{t=-M}^{M} [c(t) - (h_1 + h_2 t + h_3 t^2)]^2$
Differentiating E with respect to $h_1$, $h_2$, and $h_3$ and setting to zero results in 3 equations:
$\sum_{t=-M}^{M} [c(t) - h_1 - h_2 t - h_3 t^2] = 0$
$\sum_{t=-M}^{M} [c(t) \, t - h_1 t - h_2 t^2 - h_3 t^3] = 0$
$\sum_{t=-M}^{M} [c(t) \, t^2 - h_1 t^2 - h_2 t^3 - h_3 t^4] = 0$
The solutions to these equations are:
$h_2 = \frac{\sum_{t=-M}^{M} t \, c(t)}{T_M}$
$h_3 = \frac{T_M \sum_{t=-M}^{M} c(t) - (2M + 1) \sum_{t=-M}^{M} t^2 c(t)}{T_M^2 - (2M + 1) \sum_{t=-M}^{M} t^4}$
$h_1 = \frac{1}{2M + 1} \left[ \sum_{t=-M}^{M} c(t) - h_3 T_M \right],$
where $T_M = \sum_{t=-M}^{M} t^2.$
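The closed-form solutions translate directly into code. The sketch below assumes the trajectory of one cepstral coefficient is given as a list of 2M + 1 samples for t = -M, ..., M; the function name is illustrative.

```python
def fit_quadratic(c, M):
    """Least-squares fit c(t) ~ h1 + h2*t + h3*t^2 over t = -M..M.

    `c` is a list of length 2M + 1 with c[t + M] holding c(t).
    """
    ts = range(-M, M + 1)
    T_M = sum(t * t for t in ts)
    sum_t4 = sum(t ** 4 for t in ts)
    sum_c = sum(c)
    sum_tc = sum(t * c[t + M] for t in ts)
    sum_t2c = sum(t * t * c[t + M] for t in ts)
    h2 = sum_tc / T_M
    h3 = (T_M * sum_c - (2 * M + 1) * sum_t2c) / (T_M ** 2 - (2 * M + 1) * sum_t4)
    h1 = (sum_c - h3 * T_M) / (2 * M + 1)
    return h1, h2, h3

# Samples of an exact quadratic, so the fit should recover its coefficients.
M = 2
trajectory = [1.0 + 2.0 * t + 3.0 * t * t for t in range(-M, M + 1)]
print(fit_quadratic(trajectory, M))
```

Differentiating the fitted polynomial at t = 0 gives $h_2$ as the first time derivative and $2 h_3$ as the second, which is how the fit is typically used for dynamic features.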
The first and second time derivatives of $c_n$ can be obtained by differentiating the fitting curve, giving
$\left. \frac{\partial c_n(t)}{\partial t} \right|_{t=0} \approx h_2 = \frac{\sum_{t=-M}^{M} t \, c_n(t)}{T_M}$
$\left. \frac{\partial^2 c_n(t)}{\partial t^2} \right|_{t=0} \approx 2 h_3 = \frac{2 \left[ T_M \sum_{t=-M}^{M} c_n(t) - (2M + 1) \sum_{t=-M}^{M} t^2 c_n(t) \right]}{T_M^2 - (2M + 1) \sum_{t=-M}^{M} t^4}$
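A differential spectral distance:
$d_{2\delta^{(1)}}^2 = \int_{-\pi}^{\pi} \left| \frac{\partial \log S(\omega, t)}{\partial t} - \frac{\partial \log S'(\omega, t)}{\partial t} \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (\delta_n^{(1)} - \delta_n'^{(1)})^2,$
A second differential spectral distance:
$d_{2\delta^{(2)}}^2 = \int_{-\pi}^{\pi} \left| \frac{\partial^2 \log S(\omega, t)}{\partial t^2} - \frac{\partial^2 \log S'(\omega, t)}{\partial t^2} \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (\delta_n^{(2)} - \delta_n'^{(2)})^2$
where $\delta_n^{(1)} = \left. \frac{\partial c_n(t)}{\partial t} \right|_{t=0}$ and $\delta_n^{(2)} = \left. \frac{\partial^2 c_n(t)}{\partial t^2} \right|_{t=0}$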
Combining the first and second differential spectral distances with the cepstral distance results in:
$d^2 = \gamma_1 d_2^2 + \gamma_2 d_{2\delta^{(1)}}^2 + \gamma_3 d_{2\delta^{(2)}}^2,$
usually $\gamma_1 + \gamma_2 + \gamma_3 = 1$
Cepstral weighting or liftering by differentiating:
$\frac{\partial^2}{\partial t \, \partial \omega} [\log S(\omega, t)] = \sum_{n=-\infty}^{\infty} (-jn) \frac{\partial c_n(t)}{\partial t} e^{-jn\omega} = \sum_{n=-\infty}^{\infty} (-jn) \, \delta_n^{(1)}(t) \, e^{-jn\omega}.$
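A combined distance of the form $\gamma_1 d_2^2 + \gamma_2 d_{2\delta^{(1)}}^2 + \gamma_3 d_{2\delta^{(2)}}^2$ can be sketched with truncated coefficient vectors. The dictionary layout, the weight values, and the function names below are all hypothetical choices made for illustration.

```python
def squared_l2(u, v):
    """Sum of squared differences between two equal-length coefficient lists."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def combined_distance_sq(frame, ref, gammas=(0.6, 0.3, 0.1)):
    """Weighted sum of cepstral, delta, and delta-delta squared distances.

    `frame` and `ref` are dicts with keys "c", "d1", "d2" (hypothetical layout)
    holding cepstra and their first and second time derivatives.
    """
    g1, g2, g3 = gammas
    return (g1 * squared_l2(frame["c"], ref["c"])
            + g2 * squared_l2(frame["d1"], ref["d1"])
            + g3 * squared_l2(frame["d2"], ref["d2"]))

test_frame = {"c": [1.0, 0.5], "d1": [0.1, 0.0], "d2": [0.0, 0.02]}
ref_frame = {"c": [0.8, 0.5], "d1": [0.0, 0.0], "d2": [0.0, 0.0]}
print(combined_distance_sq(test_frame, ref_frame))
```

Setting the weights to (1, 0, 0) reduces this to the plain truncated cepstral distance, which makes the contribution of the dynamic terms easy to isolate.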
A weighted differential cepstral distance:
$d_{2\delta W}^2 = \int_{-\pi}^{\pi} \left| \frac{\partial^2 \log S(\omega, t)}{\partial t \, \partial \omega} - \frac{\partial^2 \log S'(\omega, t)}{\partial t \, \partial \omega} \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} n^2 (\delta_n^{(1)} - \delta_n'^{(1)})^2.$
Other operators can be added to produce a combined representation of the spectrum and the differential spectra. As an example:
$\left( \gamma_1 \frac{\partial}{\partial \omega} + \gamma_2 \frac{\partial^2}{\partial \omega \, \partial t} \right) \log S(\omega, t) = \sum_{n=-\infty}^{\infty} (-jn) [\gamma_1 c_n(t) + \gamma_2 \delta_n^{(1)}(t)] \, e^{-jn\omega}$
Taking the $L_2$ distance:
$\int_{-\pi}^{\pi} \left| \left( \gamma_1 \frac{\partial}{\partial \omega} + \gamma_2 \frac{\partial^2}{\partial \omega \, \partial t} \right) [\log S(\omega, t) - \log S'(\omega, t)] \right|^2 \frac{d\omega}{2\pi}$
$= \sum_{n=-\infty}^{\infty} n^2 [\gamma_1 c_n(t) - \gamma_1 c_n'(t) + \gamma_2 \delta_n^{(1)}(t) - \gamma_2 \delta_n'^{(1)}(t)]^2$
$= \gamma_1^2 \sum_{n=-\infty}^{\infty} n^2 [c_n(t) - c_n'(t)]^2 + \gamma_2^2 \sum_{n=-\infty}^{\infty} n^2 [\delta_n^{(1)}(t) - \delta_n'^{(1)}(t)]^2 + 2 \gamma_1 \gamma_2 \sum_{n=-\infty}^{\infty} n^2 [c_n(t) - c_n'(t)] [\delta_n^{(1)}(t) - \delta_n'^{(1)}(t)]$
$= \gamma_1^2 d_{2W}^2 + \gamma_2^2 d_{2\delta W}^2 + 2 \gamma_1 \gamma_2 \sum_{n=-\infty}^{\infty} n^2 [c_n(t) - c_n'(t)] [\delta_n^{(1)}(t) - \delta_n'^{(1)}(t)].$