PATTERN COMPARISON TECHNIQUES

Test pattern: $T = \{t_1, t_2, t_3, \ldots, t_I\}$
Reference pattern: $R_j = \{t_{j1}, t_{j2}, \ldots, t_{jJ}\}$

4.2 SPEECH (ENDPOINT) DETECTION

4.3 DISTORTION MEASURES: MATHEMATICAL CONSIDERATIONS

Let $x$ and $y$ be two feature vectors defined on a vector space $X$. A metric, or distance function, $d$ has the following properties:

(a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$;
(b) $d(x, y) = d(y, x)$ for $x, y \in X$;
(c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$.

A distance function is called invariant if

(d) $d(x + z, y + z) = d(x, y)$.

PERCEPTUAL CONSIDERATIONS

Spectral changes that do not fundamentally change the perceived sound include:

Spectral changes that lead to phonetically different sounds include:

Just-discriminable change: known as JND (just-noticeable difference), DL (difference limen), or differential threshold.

4.4 DISTORTION MEASURES: PERCEPTUAL CONSIDERATIONS

Spectral Distortion Measures

Quantities used: the spectral density, the Fourier coefficients of the spectral density, and the autocorrelation function.

With the short-term autocorrelation, $S(\omega)$ is an energy spectral density.

Autocorrelation matrices.

If $\sigma/A(z)$ is the all-pole model for the speech spectrum, the residual energy results from "inverse filtering" the input signal with the all-zero filter $A(z)$.

Important properties of all-pole modeling: the recursive minimization relationship.

LOG SPECTRAL DISTANCE

CEPSTRAL DISTANCES

The complex cepstrum of a signal is defined as the Fourier transform of the log of the signal spectrum. The Fourier series representation of $\log S(\omega)$ can be expressed as

$$\log S(\omega) = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega},$$

where the $c_n = c_{-n}$ are real and are referred to as the cepstral coefficients.
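The Fourier series definition above can be checked numerically. The following is a minimal sketch, not the method used in the slides: it approximates $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log S(\omega)\, e^{jn\omega}\, d\omega$ by a uniform sum over a frequency grid. The function name `cepstral_coefficients`, the grid size, and the one-pole test spectrum $S(\omega) = 1/|1 - a e^{-j\omega}|^2$ (for which $c_0 = 0$ and $c_n = a^n/n$ for $n \ge 1$) are illustrative assumptions.

```python
import cmath
import math


def cepstral_coefficients(power_spectrum, n_max, n_grid=1024):
    """Approximate c_0..c_n_max of log S(w) = sum_n c_n exp(-j n w).

    power_spectrum: callable returning S(w) > 0 for w in [-pi, pi).
    The integral over one period is replaced by a uniform Riemann sum
    (hypothetical helper, for illustration only).
    """
    omegas = [-math.pi + 2.0 * math.pi * k / n_grid for k in range(n_grid)]
    log_s = [math.log(power_spectrum(w)) for w in omegas]
    coeffs = []
    for n in range(n_max + 1):
        acc = sum(ls * cmath.exp(1j * n * w) for ls, w in zip(log_s, omegas))
        # log S is real and even here, so the imaginary part is rounding noise.
        coeffs.append((acc / n_grid).real)
    return coeffs


def one_pole_spectrum(a):
    """Power spectrum 1/|1 - a e^{-jw}|^2 of a one-pole model (test signal)."""
    return lambda w: 1.0 / abs(1.0 - a * cmath.exp(-1j * w)) ** 2


c = cepstral_coefficients(one_pole_spectrum(0.5), n_max=3)
print([round(x, 4) for x in c])
```

Because $\log S(\omega)$ is real and even for a power spectrum, evaluating the same sum at $-n$ gives the same value, which is the symmetry $c_n = c_{-n}$ stated above.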
Note that

$$c_0 = \int_{-\pi}^{\pi} \log S(\omega)\, \frac{d\omega}{2\pi}.$$

For a pair of spectra $S(\omega)$ and $S'(\omega)$, applying Parseval's theorem relates the $L_2$ cepstral distance of the spectra to the rms log spectral distance:

$$d_2^2 = \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (c_n - c'_n)^2.$$

CEPSTRAL DISTANCES

Laurent expansion:

$$\log[\sigma / A(z)] = \log \sigma + \sum_{n=1}^{\infty} c_n z^{-n}.$$

Differentiating both sides of the equation with respect to $z^{-1}$ and equating the coefficients of like powers of $z^{-1}$, we derive

$$c_n = -a_n - \frac{1}{n} \sum_{k=1}^{n-1} k\, c_k\, a_{n-k} \quad \text{for } n > 0,$$

where $a_0 = 1$ and $a_k = 0$ for $k > p$.

In terms of the log power spectrum, the Taylor series expansion becomes

$$\log\left[\sigma^2 / |A(e^{j\omega})|^2\right] = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega},$$

where $c_0 = \log \sigma^2$ and $c_{-n} = c_n$.

Truncated cepstral distance:

$$d_c^2(L) = \sum_{n=1}^{L} (c_n - c'_n)^2.$$

Weighted Cepstral Distances and Liftering

It can be shown that, under certain regularity conditions, the cepstral coefficients, except $c_0$, have:

1) zero means;
2) variances essentially inversely proportional to the square of the coefficient index:

$$E\{c_n^2\} \sim \frac{1}{n^2}.$$

Normalizing the cepstral distance by the inverse of the variance therefore weights each squared difference $(c_n - c'_n)^2$ by $n^2$.

Differentiating both sides of the Fourier series equation of the spectrum with respect to $\omega$ produces the same weighting by $n$, so the result is an $L_2$ distance based upon the differences between the spectral slopes.

Cepstral Weighting or Liftering Procedure

$h$ is usually chosen as $L/2$, and $L$ is typically 10 to 16.

A useful form of weighted cepstral distance:

Likelihood Distortions

Previously defined: the Itakura-Saito distortion measure, in which $\sigma^2$ and $\sigma'^2$ are the one-step prediction errors of $S(\omega)$ and $S'(\omega)$.

The residual energy can be easily evaluated by:

Replacing $S(\omega)$ by its optimal $p$-th order LPC model spectrum, and setting $\sigma^2$ to match the residual energy $\alpha$, gives what is often referred to as the Itakura distortion measure.

Another way to write the Itakura distortion measure is:

Another gain-independent distortion measure is called the likelihood ratio distortion:
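4.5.4 Likelihood Distortions

$$d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = d_{IS}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \frac{\mathbf{a}^t R_p \mathbf{a}}{\sigma_p^2} - 1.$$

Since

$$u = \exp(\log u) = 1 + \log u + \frac{1}{2!}(\log u)^2 + \frac{1}{3!}(\log u)^3 + \cdots,$$

we have, for $\mathbf{a}^t R_p \mathbf{a} / \sigma_p^2 \approx 1$,

$$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log \frac{\mathbf{a}^t R_p \mathbf{a}}{\sigma_p^2} \approx \frac{\mathbf{a}^t R_p \mathbf{a}}{\sigma_p^2} - 1 = d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right).$$

That is, when the distortion is small, the Itakura distortion measure is not very different from the LR distortion measure.

The Itakura-Saito distortion is not symmetric: in general, $d_{IS}(S, S') \ne d_{IS}(S', S)$.

Consider the Itakura-Saito distortion between the input and output of a linear system $H(z) = A(z)/B(z)$, where the input $X(n)$ has spectrum $S(\omega)$ and the output $X'(n)$ has spectrum $S'(\omega)$:

$$S'(\omega) = |H(e^{j\omega})|^2 S(\omega),$$

$$V(\omega) = \log S(\omega) - \log S'(\omega) = -\log |H(e^{j\omega})|^2,$$

$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \left[ \frac{1}{|H(e^{j\omega})|^2} + \log |H(e^{j\omega})|^2 - 1 \right] \frac{d\omega}{2\pi}.$$

With

$$H(z) = \frac{A(z)}{B(z)}, \qquad A(z) = 1 + \sum_{i=1}^{p_1} a_i z^{-i}, \qquad B(z) = 1 + \sum_{i=1}^{p_2} b_i z^{-i},$$

the log term integrates to zero and

$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{1}{|H(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = \int_{-\pi}^{\pi} \frac{|B(e^{j\omega})|^2}{|A(e^{j\omega})|^2} \frac{d\omega}{2\pi} - 1 = d_{IS}\left(\frac{1}{|A|^2}, \frac{1}{|B|^2}\right).$$

4.5.5 Variations of Likelihood Distortions

Symmetric distortion measures:

$$d_x^{(m)}(S, S') = \left\{ \frac{1}{2} \left[ \left(d_{IS}(S, S')\right)^m + \left(d_{IS}(S', S)\right)^m \right] \right\}^{1/m}.$$

COSH distortion: for $m = 1$,

$$d_x^{(1)}(S, S') = \frac{1}{2} \left[ d_{IS}(S, S') + d_{IS}(S', S) \right] = \int_{-\pi}^{\pi} \frac{1}{2} \left[ e^{V(\omega)} - V(\omega) - 1 + e^{-V(\omega)} + V(\omega) - 1 \right] \frac{d\omega}{2\pi}$$

$$= \int_{-\pi}^{\pi} \left\{ \cosh[V(\omega)] - 1 \right\} \frac{d\omega}{2\pi} = d_{COSH}(S, S').$$

Since

$$\cosh V = 1 + \frac{V^2}{2!} + \frac{V^4}{4!} + \cdots,$$

it follows that, for small $V(\omega)$,

$$d_x^{(1)}(S, S') = d_{COSH}(S, S') \approx \frac{1}{2} d_2^2(S, S').$$

4.5.6 Spectral Distortion Using a Warped Frequency Scale

Psychophysical studies have shown that human perception of the frequency content of sounds does not follow a linear scale. This research has led to the idea of defining the subjective pitch of pure tones. For each tone with an actual frequency $f$, measured in Hz, a subjective pitch is measured on a scale called the "mel" scale. As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.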
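The mel scale described above is defined by listening experiments, not by a formula, and the slides do not give one. As a rough sketch only, the code below uses a commonly quoted analytic approximation, $m = 2595 \log_{10}(1 + f/700)$, which is an assumption brought in for illustration and not taken from this material. It maps 1 kHz to approximately 1000 mels, consistent with the reference point stated above. The function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Approximate subjective pitch in mels for a frequency in Hz.

    Uses the widely quoted 2595 * log10(1 + f / 700) approximation,
    which is an assumption here and not a formula from the slides.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed approximation."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


for f in (250.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1))
```

Under this approximation, equal steps in Hz correspond to progressively smaller steps in mels as frequency rises, which is the non-linear behaviour the paragraph above describes.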
Examples of critical bandwidth:

Critical band   Center frequency   Critical band   Lower cutoff      Upper cutoff
number          (Hz)               (Hz)            frequency (Hz)    frequency (Hz)
 1                  50               -                 -                100
 2                 150              100               100               200
 3                 250              100               200               300
 4                 350              100               300               400
 5                 450              110               400               510
 6                 570              120               510               630
 7                 700              140               630               770
 8                 840              150               770               920
 9               1,000              160               920             1,080
10               1,170              190             1,080             1,270
11               1,370              210             1,270             1,480
12               1,600              240             1,480             1,720
13               1,850              280             1,720             2,000
14               2,150              320             2,000             2,320
15               2,500              380             2,320             2,700
16               2,900              450             2,700             3,150
17               3,400              550             3,150             3,700
18               4,000              700             3,700             4,400
19               4,800              900             4,400             5,300
20               5,800            1,100             5,300             6,400
21               7,000            1,300             6,400             7,700
22               8,500            1,800             7,700             9,500
23              10,500            2,500             9,500            12,000
24              13,500            3,500            12,000            15,500

Warped cepstral distance:

$$\tilde{d}_2^2(S, S') = \int_{-B}^{B} \left| \log S(\theta(b)) - \log S'(\theta(b)) \right|^2 \frac{db}{2B},$$

where $b$ is the frequency in Barks, $S(\theta(b))$ is the spectrum on a Bark scale, and $B$ is the Nyquist frequency in Barks. In terms of the cepstral coefficients,

$$\tilde{d}_2^2 = \sum_{i} \sum_{k} (c_i - c'_i)(c_k - c'_k) \int_{-B}^{B} e^{j\theta(b)(i-k)} \frac{db}{2B} = \sum_{i} \sum_{k} (c_i - c'_i)(c_k - c'_k)\, w_{ik},$$

where the warping weights are defined by

$$w_{ik} = \int_{-B}^{B} e^{j\theta(b)(i-k)} \frac{db}{2B}.$$

Truncated to $L$ terms:

$$\tilde{d}_c^2(L) = \sum_{i=-L}^{L} \sum_{k=-L}^{L} (c_i - c'_i)(c_k - c'_k)\, w_{ik}.$$

The warping function $\theta(b)$ is built from two pieces:

$$\theta_1(b) = \frac{1000\pi}{3333} \cdot \frac{1}{0.76} \tan\frac{b}{13}, \qquad \theta_2(b) = \frac{1000\pi}{3333} \cdot 10^{(b - 8.776)/10},$$

with $\theta(b) = \theta_1(b)$ for $|b| \le 6$, $\theta(b) = \theta_2(b)$ for $|b| \ge 13$, and

$$\theta(b) = \frac{1}{2}\left[\theta_1(b) + \theta_2(b)\right] \quad \text{for } 6 < |b| < 13.$$

Mel-frequency cepstrum:

$$\tilde{c}_n = \sum_{k=1}^{K} (\log \tilde{S}_k) \cos\left[ n \left( k - \frac{1}{2} \right) \frac{\pi}{K} \right], \qquad n = 1, 2, \ldots, L,$$

where $\tilde{S}_k$, $k = 1, \ldots, K$, is the output power of the triangular filters.

Mel-frequency cepstral distance:

$$\tilde{d}_c^2(L) = \sum_{n=1}^{L} (\tilde{c}_n - \tilde{c}'_n)^2.$$

4.5.7 Alternative Spectral Representations and Distortion Measures

Area ratios and log area ratios, where $A_i$ are the tube-model section areas and $k_i$ the reflection (PARCOR) coefficients (the placement of the signs depends on the sign convention adopted for $k_i$):

$$g_i = \frac{A_{i+1}}{A_i} = \frac{1 + k_i}{1 - k_i}, \qquad i = 1, 2, \ldots, p,$$

$$\log g_i = \log \frac{A_{i+1}}{A_i} = \log \frac{1 + k_i}{1 - k_i}.$$

Line spectrum polynomials:

$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}), \qquad Q(z) = A(z) - z^{-(p+1)} A(z^{-1}).$$

Summary of Spectral Distortion Measures

Each entry lists the distortion measure, its notation, its expression, and the computation required; "(*, +)" denotes multiply-add operations.

$L_p$ metric, $d_p$:
$$(d_p)^p = \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^p \frac{d\omega}{2\pi}$$
Computation: 2 FFTs, logs, integral.

Truncated cepstral distance, $d_c^2(L)$:
$$d_c^2(L) = \sum_{n=1}^{L} (c_n - c'_n)^2$$
Computation: $L$ (*, +).

Weighted (liftered) cepstral distance, $d_{cW}^2$:
$$d_{cW}^2 = \sum_{n=1}^{L} \left[ w(n)(c_n - c'_n) \right]^2$$
Computation: $L$ (*, +).

Itakura-Saito distortion, $d_{IS}$:
$$d_{IS} = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log \frac{\sigma^2}{\sigma'^2} - 1 = \frac{\sigma_p^2}{\sigma^2} \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} - \log \frac{\sigma_p^2}{\sigma^2} - 1,$$
where the second form applies to the model spectra $\sigma_p^2/|A_p|^2$ and $\sigma^2/|A|^2$.
Computation: $p$ (*, +).

Itakura distortion, $d_I$:
$$d_I = \log \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} = \log \frac{\mathbf{a}^t R_p \mathbf{a}}{\sigma_p^2}$$
Computation: $p$ (*, +).

Likelihood ratio distortion, $d_{LR}$:
$$d_{LR} = \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} - 1 = \frac{\mathbf{a}^t R_p \mathbf{a}}{\sigma_p^2} - 1$$
Computation: $p$ (*, +).

COSH distance, $d_{COSH}$:
$$d_{COSH} = \int_{-\pi}^{\pi} \cosh\left[ \log \frac{S(\omega)}{S'(\omega)} \right] \frac{d\omega}{2\pi} - 1 = \frac{1}{2} \left[ d_{IS}(S, S') + d_{IS}(S', S) \right]$$
Computation: $2p$ (*, +).

Weighted likelihood ratio distortion, $d_{WLR}$:
$$d_{WLR} = \sum_{n=1}^{L} \left[ \frac{r(n)}{\sigma^2} - \frac{r'(n)}{\sigma'^2} \right] (c_n - c'_n)$$
Computation: $L$ (*, +).

Weighted slope metric, $d_{WSM}$:
$$d_{WSM} = u_E \left| E_S - E_{S'} \right| + \sum_{i=1}^{K} u(i) \left[ \Delta_S(i) - \Delta_{S'}(i) \right]^2$$
Computation: $K$ (*, +).

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

$$\frac{\partial \log S(\omega, t)}{\partial t} = \sum_{n=-\infty}^{\infty} \frac{\partial c_n(t)}{\partial t} e^{-jn\omega}.$$

Fit the cepstral trajectory by a second-order polynomial $h_1 + h_2 t + h_3 t^2$, choosing $h_1$, $h_2$, $h_3$ such that the fitting error $E$ is minimized.
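$$E = \sum_{t=-M}^{M} \left[ c(t) - (h_1 + h_2 t + h_3 t^2) \right]^2.$$

Differentiating $E$ with respect to $h_1$, $h_2$, and $h_3$ and setting the results to zero gives three equations:

$$\sum_{t=-M}^{M} \left[ c(t) - h_1 - h_2 t - h_3 t^2 \right] = 0,$$

$$\sum_{t=-M}^{M} \left[ c(t)\, t - h_1 t - h_2 t^2 - h_3 t^3 \right] = 0,$$

$$\sum_{t=-M}^{M} \left[ c(t)\, t^2 - h_1 t^2 - h_2 t^3 - h_3 t^4 \right] = 0.$$

The solutions to these equations are

$$h_2 = \frac{\sum_{t=-M}^{M} t\, c(t)}{T_M},$$

$$h_3 = \frac{T_M \sum_{t=-M}^{M} c(t) - (2M+1) \sum_{t=-M}^{M} t^2 c(t)}{T_M^2 - (2M+1) \sum_{t=-M}^{M} t^4},$$

$$h_1 = \frac{1}{2M+1} \left[ \sum_{t=-M}^{M} c(t) - h_3 T_M \right],$$

where

$$T_M = \sum_{t=-M}^{M} t^2.$$

The time derivatives of the cepstral coefficients are then approximated by

$$\left. \frac{\partial c_n(t)}{\partial t} \right|_{t=0} \approx h_2 = \frac{\sum_{t=-M}^{M} t\, c_n(t)}{T_M},$$

$$\left. \frac{\partial^2 c_n(t)}{\partial t^2} \right|_{t=0} \approx 2 h_3 = \frac{2 \left[ T_M \sum_{t=-M}^{M} c_n(t) - (2M+1) \sum_{t=-M}^{M} t^2 c_n(t) \right]}{T_M^2 - (2M+1) \sum_{t=-M}^{M} t^4}.$$

A differential spectral distance, with $\delta_n^{(1)}$ denoting the first time derivative of $c_n$:

$$d_{2\delta^{(1)}}^2 = \int_{-\pi}^{\pi} \left| \frac{\partial \log S(\omega, t)}{\partial t} - \frac{\partial \log S'(\omega, t)}{\partial t} \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} \left( \delta_n^{(1)} - \delta_n'^{(1)} \right)^2.$$

A second differential spectral distance, with $\delta_n^{(2)}$ denoting the second time derivative of $c_n$:

$$d_{2\delta^{(2)}}^2 = \int_{-\pi}^{\pi} \left| \frac{\partial^2 \log S(\omega, t)}{\partial t^2} - \frac{\partial^2 \log S'(\omega, t)}{\partial t^2} \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} \left( \delta_n^{(2)} - \delta_n'^{(2)} \right)^2.$$

These can be combined with the cepstral distance:

$$d^2 = \gamma_1 d_2^2 + \gamma_2 d_{2\delta^{(1)}}^2 + \gamma_3 d_{2\delta^{(2)}}^2.$$

Cepstral weighting, or liftering, by differentiating with respect to frequency:

$$\frac{\partial}{\partial \omega} \left[ \frac{\partial \log S(\omega, t)}{\partial t} \right] = \sum_{n=-\infty}^{\infty} (-jn) \frac{\partial c_n(t)}{\partial t} e^{-jn\omega} = \sum_{n=-\infty}^{\infty} (-jn)\, \delta_n^{(1)}(t)\, e^{-jn\omega}.$$

A weighted differential cepstral distance:

$$d_{2W\delta^{(1)}}^2 = \int_{-\pi}^{\pi} \left| \frac{\partial^2 \log S(\omega, t)}{\partial \omega\, \partial t} - \frac{\partial^2 \log S'(\omega, t)}{\partial \omega\, \partial t} \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} n^2 \left( \delta_n^{(1)} - \delta_n'^{(1)} \right)^2.$$

Taking the $L_2$ distance as an example, the static and dynamic terms can be combined before weighting:

$$d_{2W}^2 = \int_{-\pi}^{\pi} \left| \frac{\partial}{\partial \omega} \left( \gamma_1 + \gamma_2 \frac{\partial}{\partial t} \right) \left[ \log S(\omega, t) - \log S'(\omega, t) \right] \right|^2 \frac{d\omega}{2\pi}$$

$$= \sum_{n} n^2 \left\{ \gamma_1 \left[ c_n(t) - c'_n(t) \right] + \gamma_2 \left[ \delta_n^{(1)}(t) - \delta_n'^{(1)}(t) \right] \right\}^2$$

$$= \gamma_1^2 \sum_{n} n^2 \left[ c_n(t) - c'_n(t) \right]^2 + \gamma_2^2 \sum_{n} n^2 \left[ \delta_n^{(1)}(t) - \delta_n'^{(1)}(t) \right]^2 + 2 \gamma_1 \gamma_2 \sum_{n} n^2 \left[ c_n(t) - c'_n(t) \right] \left[ \delta_n^{(1)}(t) - \delta_n'^{(1)}(t) \right]$$

$$= \gamma_1^2\, d_{2W}^2\big|_{\text{static}} + \gamma_2^2\, d_{2W\delta^{(1)}}^2 + 2 \gamma_1 \gamma_2 \sum_{n} n^2 \left[ c_n(t) - c'_n(t) \right] \left[ \delta_n^{(1)}(t) - \delta_n'^{(1)}(t) \right],$$

where $d_{2W}^2\big|_{\text{static}} = \sum_n n^2 [c_n(t) - c'_n(t)]^2$ is the weighted cepstral distance without dynamic terms.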
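The polynomial fit and the combined weighted distance above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not an implementation from the slides: the function names `fit_quadratic` and `weighted_dynamic_distance` are hypothetical, the trajectory is passed as the $2M+1$ samples $c(-M), \ldots, c(M)$ of one cepstral coefficient, and the distance is truncated to the positive indices $n = 1, \ldots, L$ instead of summing over all $n$.

```python
def fit_quadratic(c, M):
    """Least-squares fit of h1 + h2*t + h3*t^2 to c(t), t = -M..M.

    c: sequence of 2M+1 samples of one cepstral coefficient over time.
    Returns (h1, h2, h3) using the closed-form solutions given above.
    """
    ts = list(range(-M, M + 1))
    if len(c) != len(ts):
        raise ValueError("expected 2M+1 samples")
    T_M = sum(t * t for t in ts)          # sum of t^2
    T_4 = sum(t ** 4 for t in ts)         # sum of t^4
    s0 = sum(c)                           # sum of c(t)
    s1 = sum(t * x for t, x in zip(ts, c))       # sum of t c(t)
    s2 = sum(t * t * x for t, x in zip(ts, c))   # sum of t^2 c(t)
    h2 = s1 / T_M
    h3 = (T_M * s0 - (2 * M + 1) * s2) / (T_M ** 2 - (2 * M + 1) * T_4)
    h1 = (s0 - h3 * T_M) / (2 * M + 1)
    return h1, h2, h3


def weighted_dynamic_distance(c, c_ref, d, d_ref, gamma1, gamma2):
    """Sum over n = 1..L of n^2 * (gamma1*dc_n + gamma2*dd_n)^2.

    c, c_ref: static cepstra c_1..c_L of the two frames.
    d, d_ref: first-order delta cepstra (h2 estimates) of the two frames.
    Truncation to n = 1..L is an assumption made for this sketch.
    """
    total = 0.0
    for n, (cn, cr, dn, dr) in enumerate(zip(c, c_ref, d, d_ref), start=1):
        total += (n * (gamma1 * (cn - cr) + gamma2 * (dn - dr))) ** 2
    return total


# A trajectory that is exactly quadratic is recovered exactly.
M = 2
samples = [1.0 + 2.0 * t + 3.0 * t * t for t in range(-M, M + 1)]
print(fit_quadratic(samples, M))
```

The first-order delta cepstrum of a frame is the `h2` value returned for each coefficient index $n$, and twice `h3` gives the second-order term; both can be passed to the distance function alongside the static cepstra.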