Distortion Measures

Topics covered in this chapter
– Three basic problems in pattern comparison
• How to detect the speech signal in a recording
interval (i.e. separate speech from background)
• How to locally compare spectra from two speech
utterances (local spectral distortion measure),
and
• How to globally align and normalize the distance
between two speech patterns (sequences of
spectral vectors) which may or may not
represent the same linguistic sequence of
sounds (word, phrase, sentence, etc.)
Distortion Measures
• Mathematical considerations for measuring the dissimilarity between two feature vectors.
• Let x and y be two vectors defined on a vector space X.
• A metric or distance function d on the vector space X is a real-valued function on the Cartesian product X × X that satisfies the following properties:
Distortion Measures
a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ iff $x = y$ (positive definiteness property)
b) $d(x, y) = d(y, x)$ for $x, y \in X$ (symmetry)
c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$ (triangle inequality condition)
In addition, the distortion function is called invariant if
d) $d(x + z, y + z) = d(x, y)$
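Properties (a) to (d) can be spot-checked numerically for a candidate distance. The sketch below is only an illustration: the helper `check_metric`, the sample points, and the shift vector are hypothetical, and passing on a finite sample does not prove that a function is a metric.

```python
import itertools
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def check_metric(d, points, shift, tol=1e-12):
    """Spot-check properties (a)-(d) of a candidate distance d on sample points."""
    results = {"positive_definite": True, "symmetric": True,
               "triangle": True, "invariant": True}
    for x, y in itertools.product(points, repeat=2):
        dxy = d(x, y)
        # (a) non-negative, and zero exactly when x == y
        if dxy < 0 or (dxy <= tol) != (x == y):
            results["positive_definite"] = False
        # (b) symmetry
        if abs(dxy - d(y, x)) > tol:
            results["symmetric"] = False
        # (d) invariance to a common translation z = shift
        xs = tuple(a + s for a, s in zip(x, shift))
        ys = tuple(b + s for b, s in zip(y, shift))
        if abs(d(xs, ys) - dxy) > 1e-9:
            results["invariant"] = False
        # (c) triangle inequality through every third point z
        for z in points:
            if dxy > d(x, z) + d(y, z) + tol:
                results["triangle"] = False
    return results

# Hypothetical sample vectors, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5), (2.0, 2.0)]
report = check_metric(euclidean, points, shift=(0.7, -1.3))
```

Passing a different function as `d` gives a quick way to see which of the four properties a given distortion measure violates on a sample.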
Distortion Measures
• If a measure of distance d satisfies only the positive definiteness property, it is called a distortion measure when the vectors are representations of speech spectra.
• Distance in speech recognition means a measure of dissimilarity.
• For speech processing, an important consideration in choosing a measure of distance is its subjective meaningfulness.
• To be useful in speech processing, a mathematical measure of distance should reflect the linguistic characteristics of speech.
Distortion Measures
For example, a large waveform error does not always imply a large subjective difference.
Distortion Measures
• Perceptual considerations: central to the choice of an appropriate measure of spectral dissimilarity is the concept of subjective judgment of sound difference, or phonetic relevance.
• Spectral changes that keep the sound perceptually the same should be associated with small distances.
• Spectral changes that make the sound perceptually different should be associated with large distances.
Distortion Measures
• Consider comparing two spectral representations, $S(\omega)$ and $S'(\omega)$, using a distance measure $d(S, S')$.
• If the spectral contents of the two signals are phonetically the same (same sound), then the distance measure d is ideally very small.
Distortion Measures
• Spectral changes that correspond to a large phonetic distance include
– Significant differences in formant locations, i.e. the spectral resonances of $S(\omega)$ and $S'(\omega)$ occur at very different frequencies.
– Significant differences in formant bandwidths, i.e. the frequency widths of the spectral resonances of $S(\omega)$ and $S'(\omega)$ are very different.
In each of these cases the sounds are different, so the spectral distance measure $d(S, S')$ is ideally very large.
Distortion Measures
To relate a physical measure of difference to a subjectively perceived measure of difference, it is important to understand auditory sensitivity to changes in the frequencies and bandwidths of the speech spectrum, signal intensity, and fundamental frequency.
Distortion Measures
• This sensitivity is presented in the form of the just discriminable change: the change in a physical parameter such that the auditory system can reliably detect the change, as measured in a standard listening test.
Spectral-distortion measures
• Measuring the difference between two speech patterns in terms of average spectral distortion is a reasonable approach, both in terms of its mathematical tractability and its computational efficiency.
• Perceived sound differences can be interpreted in terms of differences of spectral features.
Log spectral distance
• Consider two spectra $S(\omega)$ and $S'(\omega)$. The difference between the two spectra on a log magnitude versus frequency scale is defined by
$$V(\omega) = \log S(\omega) - \log S'(\omega) \qquad (1)$$
• A distance or distortion measure between S and S' can be defined by
$$d(S, S')^p = (d_p)^p = \int_{-\pi}^{\pi} |V(\omega)|^p \, \frac{d\omega}{2\pi}$$
This is related to how humans perceive sound differences.
Log spectral distance
• For p = 1, the above equation defines the mean absolute log spectral distortion.
• For p = 2, the equation defines the rms log spectral distortion, which has applications in many speech processing systems.
• As p tends to infinity, the equation reduces to the peak log spectral distortion.
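The three cases can be evaluated on sampled spectra as a rough sketch. The assumptions here are that both power spectra are sampled on the same uniform frequency grid over one full period, so a plain average approximates the integral with weight dω/2π, and that the two toy spectra are made up purely for illustration.

```python
import math

def log_spectral_distance(S, S_prime, p):
    """L_p log spectral distance between two sampled power spectra.

    Averaging over a uniform grid covering one period approximates
    the integral of |V(omega)|^p with weight d(omega)/(2*pi).
    """
    V = [math.log(a) - math.log(b) for a, b in zip(S, S_prime)]
    if math.isinf(p):
        return max(abs(v) for v in V)      # peak log spectral distortion
    mean_p = sum(abs(v) ** p for v in V) / len(V)
    return mean_p ** (1.0 / p)

N = 512
omega = [2 * math.pi * k / N - math.pi for k in range(N)]
# Two toy one-pole power spectra (assumed shapes, for illustration only).
S = [1.0 / (1.25 - math.cos(w)) for w in omega]
S_prime = [1.0 / (1.64 - 1.6 * math.cos(w)) for w in omega]

d1 = log_spectral_distance(S, S_prime, 1)             # mean absolute
d2 = log_spectral_distance(S, S_prime, 2)             # rms
d_inf = log_spectral_distance(S, S_prime, math.inf)   # peak
```

Because the integral is normalized by 2π, the values are ordered d1 ≤ d2 ≤ d_inf for any pair of spectra.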
Cepstral distances
• For the cepstral coefficients we use the rms log spectral distance.
$s(t) = h(t) * x(t)$ (vocal tract component and excitation)
$S(\omega) = H(\omega)\,X(\omega)$ in the frequency domain
Taking the log of $S(\omega)$:
$$\log S(\omega) = \log H(\omega) + \log X(\omega)$$
The power spectrum:
$$\log |S(\omega)|^2 = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega} \qquad (1)$$
Cepstral distances
The cepstral coefficients can be obtained from the LPC coefficients.
$$d_{ceps}^2(S, S') = \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (c_n - c_n')^2$$
where $c_n$ and $c_n'$ are the cepstral coefficients of $S(\omega)$ and $S'(\omega)$ respectively.
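The equality between the cepstral sum and the rms log spectral distance can be checked numerically. The sketch below assumes unit-gain all-pole spectra 1/|A|² with the sign convention A(z) = 1 + a₁z⁻¹ + ... + a_p z⁻ᵖ, uses the standard LPC-to-cepstrum recursion, and truncates the sum at L terms. The function names and the example coefficients are hypothetical.

```python
import cmath
import math

def lpc_to_cepstrum(a, L):
    """Cepstral coefficients c_1..c_L of 1/|A|^2, where
    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p (assumed sign convention)."""
    p = len(a)
    c = [0.0] * (L + 1)                # c[0] is the log gain, zero here
    for n in range(1, L + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def cepstral_distance_sq(c, c_prime):
    """Two-sided sum over n != 0, using c_{-n} = c_n for real signals."""
    return 2.0 * sum((x - y) ** 2 for x, y in zip(c, c_prime))

def rms_log_spectral_sq(a, a_prime, N=1024):
    """Numerical integral of |log S - log S'|^2 d(omega)/(2*pi)."""
    total = 0.0
    for k in range(N):
        z = cmath.exp(-1j * 2 * math.pi * k / N)
        A = 1 + sum(ak * z ** (i + 1) for i, ak in enumerate(a))
        A_prime = 1 + sum(ak * z ** (i + 1) for i, ak in enumerate(a_prime))
        V = -math.log(abs(A) ** 2) + math.log(abs(A_prime) ** 2)
        total += V * V
    return total / N

# Hypothetical stable LPC polynomials, for illustration only.
a = [-0.9, 0.2]
a_prime = [-0.5, 0.1]
d_ceps = cepstral_distance_sq(lpc_to_cepstrum(a, 40), lpc_to_cepstrum(a_prime, 40))
d_spec = rms_log_spectral_sq(a, a_prime)
```

The two values agree closely because the cepstral coefficients of a stable all-pole model decay quickly, which is also why a truncated sum with a small L works well in practice.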
Weighted cepstral distances and
liftering
• Liftering makes the system more robust to noise.
• Liftering is done to equalize the variance of the cepstral coefficients.
• Liftering significantly improves recognition performance.
• If we incorporate an $n^2$ factor into the cepstral distance to normalize the contribution from each cepstral term, the distance becomes
$$d_{2W}^2 = \sum_{n=-\infty}^{\infty} n^2 (c_n - c_n')^2 = \sum_{n=-\infty}^{\infty} (n c_n - n c_n')^2 \qquad (2)$$
Weighted cepstral distances and
liftering
• The original sharp spectral peaks are highly
sensitive to the LPC analysis condition and the
resulting peakiness creates unnecessary
sensitivity in spectral comparison
• The liftering process tends to reduce the
sensitivity without altering the fundamental
“formant” structure.
• That is, the undesirable (noise-like) components of the LPC spectrum are reduced or removed, while the essential characteristics of the "formant" structure are retained.
Weighted cepstral distances and
liftering
• A useful form of weighted cepstral distance is
$$d_{cw}^2 = \sum_{n=1}^{L} \left( w(n)\,c_n - w(n)\,c_n' \right)^2$$
• where $w(n)$ is any lifter function.
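A minimal sketch of the weighted cepstral distance with two lifter choices follows. The index lifter w(n) = n reproduces the n² weighting (summed over n = 1..L only). The raised-sine lifter w(n) = 1 + (L/2) sin(πn/L) is a common band-pass lifter that is brought in here as an assumption and does not come from the slides. The cepstral vectors are made-up values.

```python
import math

def index_lifter(L):
    """w(n) = n, which yields the n^2-weighted cepstral distance."""
    return [float(n) for n in range(1, L + 1)]

def raised_sine_lifter(L):
    """w(n) = 1 + (L/2) sin(pi n / L): a band-pass lifter (assumed choice)."""
    return [1.0 + 0.5 * L * math.sin(math.pi * n / L) for n in range(1, L + 1)]

def weighted_cepstral_distance_sq(c, c_prime, w):
    """Sum over n = 1..L of (w(n) c_n - w(n) c'_n)^2."""
    return sum((wn * x - wn * y) ** 2 for wn, x, y in zip(w, c, c_prime))

# Hypothetical cepstral vectors c_1..c_L (illustrative values only).
c = [0.90, -0.35, 0.20, -0.10, 0.05, -0.02]
c_prime = [0.80, -0.30, 0.25, -0.12, 0.02, -0.01]
L = len(c)

d_plain = weighted_cepstral_distance_sq(c, c_prime, [1.0] * L)
d_index = weighted_cepstral_distance_sq(c, c_prime, index_lifter(L))
d_sine = weighted_cepstral_distance_sq(c, c_prime, raised_sine_lifter(L))
```

Keeping the lifter as a separate list makes it easy to swap weighting functions without touching the distance computation.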
Itakura and Saito
• The log spectral difference $V(\omega) = \log S(\omega) - \log S'(\omega)$ is the basis of many distortion measures.
• The distortion measure proposed by Itakura and Saito in their formulation of linear prediction as an approximate maximum likelihood estimation is
Itakura and Saito

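$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \left[ e^{V(\omega)} - V(\omega) - 1 \right] \frac{d\omega}{2\pi} = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log\frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$$
where $\sigma_\infty^2$ and $\sigma_\infty'^2$ are the prediction errors of $S(\omega)$ and $S'(\omega)$ respectively, with
$$\sigma_\infty^2 = \exp\left\{ \int_{-\pi}^{\pi} \log S(\omega) \, \frac{d\omega}{2\pi} \right\}$$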
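The two forms of the Itakura-Saito distortion can be compared numerically. This is a sketch under stated assumptions: both spectra are sampled on a uniform grid over one period, the prediction error is estimated as the exponential of the mean log spectrum, and the toy spectra and function names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def itakura_saito(S, S_prime):
    """Mean of e^V - V - 1 with V = log S - log S'."""
    V = [math.log(a) - math.log(b) for a, b in zip(S, S_prime)]
    return mean([math.exp(v) - v - 1.0 for v in V])

def prediction_error(S):
    """exp of the integral of log S with weight d(omega)/(2*pi)."""
    return math.exp(mean([math.log(s) for s in S]))

def itakura_saito_ratio_form(S, S_prime):
    """Integral of S/S' minus the log ratio of prediction errors, minus 1."""
    ratio = mean([a / b for a, b in zip(S, S_prime)])
    return ratio - math.log(prediction_error(S) / prediction_error(S_prime)) - 1.0

N = 512
omega = [2 * math.pi * k / N - math.pi for k in range(N)]
# Toy spectra: gain 2 with a pole at 0.5, and gain 1 with a pole at 0.8.
S = [2.0 / (1.25 - math.cos(w)) for w in omega]
S_prime = [1.0 / (1.64 - 1.6 * math.cos(w)) for w in omega]

d_fwd = itakura_saito(S, S_prime)
d_bwd = itakura_saito(S_prime, S)
d_fwd_ratio = itakura_saito_ratio_form(S, S_prime)
```

Swapping the arguments changes the value, which illustrates that this measure is not symmetric.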
Itakura and Saito
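• The Itakura-Saito distortion measure can be used to illustrate the spectral matching properties by replacing $S'(\omega)$ with the pth-order all-pole spectrum $\sigma^2 / |A(e^{j\omega})|^2$:
$$d_{IS}\left(S, \frac{\sigma^2}{|A(e^{j\omega})|^2}\right) = \frac{1}{\sigma^2} \int_{-\pi}^{\pi} S(\omega)\,|A(e^{j\omega})|^2 \, \frac{d\omega}{2\pi} + \log \sigma^2 - \log \sigma_\infty^2 - 1$$
where $\sigma^2$ is the gain, $\sigma_\infty^2$ is the prediction error of $S(\omega)$, and
$$\alpha = \int_{-\pi}^{\pi} S(\omega)\,|A(e^{j\omega})|^2 \, \frac{d\omega}{2\pi}$$
where $\alpha$ is the residual energy.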
Itakura
Let $\sigma^2 = \alpha$; then the Itakura distortion measure is
$$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2} \, \frac{d\omega}{2\pi}$$
Likelihood Distortions
• The role of the gain terms is not explicit in the Itakura distortion, because the signal level makes essentially no difference to the human understanding of speech so long as it is unambiguously heard.
• A gain-independent distortion measure, called the likelihood ratio distortion, can be derived directly from the IS distortion measure:
$$d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = d_{IS}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2} \, \frac{d\omega}{2\pi} - 1$$
Likelihood Distortions
When the distortion is very small, the Itakura distortion measure is not very different from the likelihood ratio distortion measure, since $d_I = \log(1 + d_{LR}) \approx d_{LR}$ for small $d_{LR}$.
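The relationship between the two measures can be illustrated with a short numerical sketch. It assumes the sign convention A(z) = 1 + a₁z⁻¹ + ... + a_p z⁻ᵖ and approximates the integral by an average over a uniform frequency grid. The function names and the one-pole examples are hypothetical.

```python
import cmath
import math

def inverse_filter_sq(a, omega):
    """|A(e^{j omega})|^2 for A(z) = 1 + a[0] z^-1 + ... (assumed convention)."""
    z = cmath.exp(-1j * omega)
    return abs(1 + sum(ak * z ** (k + 1) for k, ak in enumerate(a))) ** 2

def likelihood_ratio(a_p, a, N=1024):
    """d_LR(1/|A_p|^2, 1/|A|^2): mean of |A|^2 / |A_p|^2, minus 1."""
    total = 0.0
    for k in range(N):
        w = 2 * math.pi * k / N
        total += inverse_filter_sq(a, w) / inverse_filter_sq(a_p, w)
    return total / N - 1.0

def itakura(a_p, a, N=1024):
    """d_I(1/|A_p|^2, 1/|A|^2) = log(1 + d_LR)."""
    return math.log(1.0 + likelihood_ratio(a_p, a, N))

# Clearly different models: the two measures differ noticeably.
d_lr_far = likelihood_ratio([-0.8], [-0.5])
d_i_far = itakura([-0.8], [-0.5])

# Nearly identical models: the two measures almost coincide.
d_lr_near = likelihood_ratio([-0.8], [-0.78])
d_i_near = itakura([-0.8], [-0.78])
```

For the nearly identical pair the relative gap between the two values is roughly half of d_LR itself, which is the small-distortion behaviour described above.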
Variations of likelihood
distortions
• Compared with the cepstral distance, the likelihood distortions are asymmetric.
• There are two methods to symmetrize the distortion measure:
– COSH distortion
– Weighted likelihood distortion
COSH distortion
• The COSH distortion is given by
$$d_{COSH}(S, S') = \int_{-\pi}^{\pi} \left[ \cosh\left( \log \frac{S(\omega)}{S'(\omega)} \right) - 1 \right] \frac{d\omega}{2\pi}$$
• For small distortions, $\cosh V - 1 \approx V^2/2$, so the COSH distortion is approximately half the squared rms log spectral distance.
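One way to see why the COSH distortion is symmetric is that it equals the average of the Itakura-Saito distortion taken in both directions, because (e^V - V - 1) + (e^{-V} + V - 1) = 2(cosh V - 1). The sketch below checks this on sampled toy spectra. The grid averaging and the spectra themselves are assumptions made for illustration.

```python
import math

def log_ratio(S, S_prime):
    return [math.log(a) - math.log(b) for a, b in zip(S, S_prime)]

def itakura_saito(S, S_prime):
    """Mean of e^V - V - 1 over a uniform frequency grid."""
    V = log_ratio(S, S_prime)
    return sum(math.exp(v) - v - 1.0 for v in V) / len(V)

def cosh_distortion(S, S_prime):
    """Mean of cosh(V) - 1 over a uniform frequency grid."""
    V = log_ratio(S, S_prime)
    return sum(math.cosh(v) - 1.0 for v in V) / len(V)

N = 512
omega = [2 * math.pi * k / N - math.pi for k in range(N)]
# Toy spectra, assumed for illustration only.
S = [2.0 / (1.25 - math.cos(w)) for w in omega]
S_prime = [1.0 / (1.64 - 1.6 * math.cos(w)) for w in omega]

d_cosh = cosh_distortion(S, S_prime)
d_cosh_swapped = cosh_distortion(S_prime, S)
d_is_average = 0.5 * (itakura_saito(S, S_prime) + itakura_saito(S_prime, S))
```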
Weighted likelihood ratio
distortion
The purpose of weighting is to take the spectral shape into account as a weighting function, so that different spectral components along the frequency axis can be emphasized or de-emphasized to reflect some of the observed perceptual effects.
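Weighted likelihood ratio
distortion
$$d_{WLR} = \sum_{n=1}^{L} \left( \frac{\hat{r}(n)}{\sigma^2} - \frac{\hat{r}'(n)}{\sigma'^2} \right) (c_n - c_n')$$
where $c_n$ and $c_n'$ are the cepstral coefficients corresponding to $\log \frac{1}{|A|^2}$ and $\log \frac{1}{|A'|^2}$, and $\hat{r}(n)$ and $\hat{r}'(n)$ are the autocorrelation sequences for $\frac{\sigma^2}{|A|^2}$ and $\frac{\sigma'^2}{|A'|^2}$ respectively.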
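A rough numerical sketch of the weighted likelihood ratio distortion follows. It assumes unit-gain all-pole spectra 1/|A|², so the gain-normalized autocorrelations are simply the autocorrelations of 1/|A|², and it obtains both the autocorrelation and cepstral coefficients as cosine-series coefficients on a uniform frequency grid. The function names, the sign convention for A(z), and the one-pole examples are hypothetical.

```python
import cmath
import math

def inverse_filter_sq(a, w):
    """|A(e^{jw})|^2 for A(z) = 1 + a[0] z^-1 + ... (assumed convention)."""
    z = cmath.exp(-1j * w)
    return abs(1 + sum(ak * z ** (k + 1) for k, ak in enumerate(a))) ** 2

def cosine_coeffs(values, omega, L):
    """Coefficients n = 1..L of the cosine series of an even function."""
    N = len(values)
    return [sum(v * math.cos(n * w) for v, w in zip(values, omega)) / N
            for n in range(1, L + 1)]

def wlr_distortion(a, a_prime, L=12, N=512):
    """Sum over n = 1..L of (r(n) - r'(n)) (c_n - c'_n) for unit-gain models."""
    omega = [2 * math.pi * k / N for k in range(N)]
    P = [1.0 / inverse_filter_sq(a, w) for w in omega]
    P_prime = [1.0 / inverse_filter_sq(a_prime, w) for w in omega]
    r = cosine_coeffs(P, omega, L)                    # autocorrelation
    r_prime = cosine_coeffs(P_prime, omega, L)
    c = cosine_coeffs([math.log(p) for p in P], omega, L)   # cepstrum
    c_prime = cosine_coeffs([math.log(p) for p in P_prime], omega, L)
    return sum((x - y) * (u - v) for x, y, u, v in zip(r, r_prime, c, c_prime))

d_ab = wlr_distortion([-0.5], [-0.8])
d_ba = wlr_distortion([-0.8], [-0.5])
d_aa = wlr_distortion([-0.5], [-0.5])
```

Each term is a product of two differences, so swapping the two models leaves the value unchanged, unlike the likelihood ratio distortion.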
Comparison of $d_{WLR}$ and $d_2^2$
In $d_2^2$ the spectral deviation enters as
$$\left( \log \frac{1}{|A|^2} - \log \frac{1}{|A'|^2} \right)$$
In $d_{WLR}$ this is replaced by the linear deviation
$$\left( \frac{1}{|A|^2} - \frac{1}{|A'|^2} \right)$$
which puts heavier emphasis on spectral peak areas than the compressed deviation $\left( \log \frac{1}{|A|^2} - \log \frac{1}{|A'|^2} \right)$.
This property is required in applications where extraordinary emphasis of spectral peaks is necessary, such as speech recognition in noisy environments.
Weighted slope metric distortion
measure
A series of experiments was designed to measure the subjective "phonetic" distance between pairs of synthetic vowels and fricatives, using controlled variation of several acoustic parameters and spectral distortions, including formant frequency, formant amplitude, spectral tilt, and highpass, lowpass, and notch filtering. Of these, only formant frequency deviation was found to be phonetically relevant.
Weighted slope metric distortion
measure
WSM attaches a weight to the spectral slope difference near spectral peaks, rather than to the spectral amplitude difference, and takes the overall energy difference explicitly into consideration:
$$d_{WSM}(S, S') = u_E \left| E_S - E_{S'} \right| + \sum_{i=1}^{K} u(i) \left[ \Delta_S(i) - \Delta_{S'}(i) \right]^2$$
where $u_E$ is the weighting constant for the absolute energy difference $|E_S - E_{S'}|$ between S and S', $u(i)$ is the weighting coefficient for the critical-band spectral slope difference $\Delta_S(i) - \Delta_{S'}(i)$ between S and S', and K is the total number of critical bands considered.
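The structure of the weighted slope metric can be sketched as follows. Several simplifications here are assumptions and not part of the original formulation: the slope is taken as the first difference between adjacent band levels in dB, the energy is taken as the mean band level, and the weights u(i) are supplied by the caller as constants, without adapting them to the proximity of spectral peaks. The band levels are made-up values.

```python
def band_slopes(levels_db):
    """First differences between adjacent critical-band levels (in dB)."""
    return [hi - lo for lo, hi in zip(levels_db, levels_db[1:])]

def wsm_distance(levels_db, levels_db_prime, u, u_E):
    """u_E |E_S - E_S'| + sum_i u(i) (slope_S(i) - slope_S'(i))^2.

    Energy is approximated by the mean band level in dB (an assumption).
    """
    E = sum(levels_db) / len(levels_db)
    E_prime = sum(levels_db_prime) / len(levels_db_prime)
    slopes = band_slopes(levels_db)
    slopes_prime = band_slopes(levels_db_prime)
    slope_term = sum(ui * (s - sp) ** 2
                     for ui, s, sp in zip(u, slopes, slopes_prime))
    return u_E * abs(E - E_prime) + slope_term

# Hypothetical critical-band levels in dB, for illustration only.
S_db = [40.0, 52.0, 47.0, 38.0, 45.0, 33.0]
S_louder = [x + 6.0 for x in S_db]                    # same shape, +6 dB
S_shifted_peak = [40.0, 47.0, 52.0, 38.0, 45.0, 33.0]  # peak moved one band
u = [1.0] * (len(S_db) - 1)                            # constant weights

d_gain = wsm_distance(S_db, S_louder, u, u_E=0.1)
d_peak = wsm_distance(S_db, S_shifted_peak, u, u_E=0.1)
```

A pure level change affects only the energy term, while moving a spectral peak changes the slopes and produces a much larger distance. This matches the intent of emphasizing formant location over amplitude.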
Summary
• The spectral distortion measures are designed to measure the dissimilarity or distance between two (power) spectra of speech.
• Many of these dissimilarity measures are not metrics because they do not satisfy the symmetry property.
• If an objective speech distortion measure needs to reflect the subjective reality of human perception of sound differences, or even phonetic disparity, the asymmetry seems to be actually desirable.
Summary
• All distortion measures are important, because some may perform better in a less noisy environment while others may be more robust when the background is noisier.
Summary
• Log spectral: the $L_p$ metric requires a large amount of computation, because we need two FFTs to obtain $S(\omega)$ and $S'(\omega)$, logarithms of all values of S and S', and an integral:
$$d_p = \left[ \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^p \frac{d\omega}{2\pi} \right]^{1/p}$$
Summary
• Truncated and weighted cepstral: requires only L operations, where L is of the order of 12-16; hence fewer calculations are required than for the $L_p$ metric.
$$d_c^2(L) = \sum_{n=1}^{L} (c_n - c_n')^2$$
$$d_{CW}^2 = \sum_{n=1}^{L} W(n)\,(c_n - c_n')^2$$
Summary
• The likelihood, Itakura-Saito, Itakura, and COSH measures: all require on the order of p operations, where p is the LPC order of the all-pole polynomial (8-12). Hence the computation is about the same as for the cepstral measures.
Summary

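$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log\frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$$
$$d_I = \log \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi}$$
$$d_{LR} = \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} - 1$$
$$d_{COSH} = \int_{-\pi}^{\pi} \left[ \cosh\left( \log \frac{S(\omega)}{S'(\omega)} \right) - 1 \right] \frac{d\omega}{2\pi}$$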
Summary
• Weighted likelihood ratio distortion: requires L operations, similar to the cepstral measures.
$$d_{WLR} = \sum_{n=1}^{L} \left( \frac{\hat{r}(n)}{\sigma^2} - \frac{\hat{r}'(n)}{\sigma'^2} \right) (c_n - c_n')$$
Summary
• Weighted slope metric (WSM): requires K operations, where K is the number of frequency bands used in the computation (32-64).
$$d_{WSM}(S, S') = u_E \left| E_S - E_{S'} \right| + \sum_{i=1}^{K} u(i) \left[ \Delta_S(i) - \Delta_{S'}(i) \right]^2$$
Summary
• From all these points we can say that all the
measures are both physically reasonable and
computationally tractable for speech
recognition except for the Lp metrics.
• Hence, practically we are going to use all the
measures to study the speech recognition
system