Speech Recognition
Chapter 3: Signal Processing Front End

Signal Processing Front End
Converts the speech waveform into some type of parametric representation.
$s_k$ → Signal Processing Front End → $O = o(1)\,o(2) \ldots o(T)$
Filterbank
Linear Prediction Analysis
Parametric representation: zero-crossing rate, short-time energy, short-time spectral envelope, etc.

Filterbank
Introduction
Filterbank Front End
Uniform Filterbank Design
Nonuniform Filterbank Design
Implementation

Introduction
$$x_i(n) = s(n) * h_i(n) = \sum_{m=0}^{L-1} h_i(m)\, s(n-m)$$
$$X_i(z) = S(z)\, H_i(z)$$

Filterbank Front End
Bandpass filter:
$$s_i(n) = s(n) * h_i(n) = \sum_{m=0}^{L-1} h_i(m)\, s(n-m)$$
Nonlinearity (half-wave or full-wave rectifier): shifts the band signal spectrum to the low-frequency band and creates high-frequency images.
Lowpass filter (20-30 Hz): retains the DC component and eliminates the high-frequency images created by the nonlinearity.
Sampling rate reduction (40-60 Hz): reduces the amount of data.
Amplitude compression: log, μ-law.
Each channel of the filterbank gives a measurement of the energy of the speech in its band.
[Figure: original signal (concentration at 500 Hz); peaks at 500, 1000, 1500, ... Hz (images); concentration at DC.]

Uniform FB Design
Central frequencies:
$$f_i = \frac{F_s}{N}\, i, \qquad 1 \le i \le Q$$
$F_s$ is the sampling frequency and $F_s/N$ is the filter bandwidth (no overlapping).
N is the number of uniformly spaced filters required to span the frequency range of speech.
Number of filters: $Q \le N/2$.

Nonuniform Filterbank Design
Logarithmic frequency scale
Critical band scale (Fig. 3.9)
– Mel scale
– Bark scale

Logarithmic Frequency Scale FB Design
For Q bandpass filters with central frequencies $f_i$ and bandwidths $b_i$:
$$b_1 = C$$
$$b_i = \alpha\, b_{i-1}, \qquad 2 \le i \le Q$$
$$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$$
$\alpha$: logarithmic growth factor, usually 2.
$C$: arbitrary bandwidth of the first filter.
$f_1$: arbitrary central frequency of the first filter.
Examples:
– $C = 200$ Hz, $f_1 = 300$ Hz, $\alpha = 2$, $Q = 4$
– $C = 50$ Hz, $f_1 = 225$ Hz, $\alpha = 1.33$, $Q = 12$

Implementation of FBs
Basics on Filter Implementation
Filterbank Implementation
Spectral Analysis Basics
Filterbank Implementation using STFT

Basics on Filter Implementation
Infinite Impulse Response (IIR) Filters
Finite Impulse Response (FIR) Filters
Filterbanks in Speech Recognition
– $8 < Q < 32$
– Practical systems use nonuniformly spaced filterbanks to characterise the speech spectrum in a manner considered more consistent with human perception.

Infinite Impulse Response (IIR) Filters
IIR Filter Design
IIR Filter Implementation

IIR Filter Design
From Analog Filters
– Impulse Invariance
– Analog-to-Digital Transformation
– Filter Design
Computer Aided Design

Impulse Invariance
Chooses the unit-sample (impulse) response of the digital filter as equally spaced samples of the impulse response of the analog filter:
$$h(n) = h_a(nT)$$
where T is the sampling period. In this case the design procedure is:
– Calculate the partial-fraction expansion of $H_a(s)$:
$$H_a(s) = \sum_{k=1}^{N} \frac{A_k}{s - s_k}$$
– Calculate
$$H(z) = \sum_{k=1}^{N} \frac{A_k}{1 - e^{s_k T} z^{-1}}$$

Filter Design
Butterworth Filter
Chebyshev Filter
Elliptic Filter

Butterworth Filter
Introduction
Design Example

Introduction
Properties:
– The magnitude is maximally flat in the passband.
– The approximation is monotonic in the passband and the stopband.
The squared magnitude of the filter is of the form:
$$|H_a(j\Omega)|^2 = \frac{1}{1 + \left(\dfrac{j\Omega}{j\Omega_c}\right)^{2N}}$$
The roots of the denominator polynomial are then
$$s_p = (-1)^{\frac{1}{2N}}\,(j\Omega_c)$$
Thus there are 2N poles equally spaced in angle on a circle of radius $\Omega_c$ in the s-plane.
The poles are symmetrically located with respect to the imaginary axis.
The angular spacing between poles is $\pi/N$ radians.
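
The pole pattern described above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `butterworth_poles` and `stable_poles` are hypothetical, and the values N = 6 and Ωc = 0.7032 are borrowed from the design example that follows. It evaluates the 2N roots of the denominator and keeps the left-half-plane ones, which are the poles a stable analog filter would use.

```python
import cmath
import math


def butterworth_poles(order, cutoff):
    """Return all 2N roots of 1 + (s / (j*cutoff))**(2N) = 0."""
    poles = []
    for k in range(2 * order):
        # (-1)**(1/2N) takes the 2N values exp(j*pi*(2k+1)/(2N));
        # multiplying by j*cutoff adds pi/2 to the angle.
        angle = math.pi * (2 * k + 1) / (2 * order) + math.pi / 2
        poles.append(cutoff * cmath.exp(1j * angle))
    return poles


def stable_poles(order, cutoff):
    """Keep only the N poles in the left half of the s-plane."""
    return [p for p in butterworth_poles(order, cutoff) if p.real < 0]


# Assumed example values, taken from the design example in these notes.
poles = stable_poles(6, 0.7032)
for p in poles:
    print(f"{p.real:+.4f} {p.imag:+.4f}j")
```

Under these assumptions the printed poles should agree, to about three decimal places, with the pole values quoted in the design example, and consecutive poles are separated by π/6 radians.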
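[Figure: pole locations in the s-plane.]

Design Example
Let us assume that we require a filter such that the passband magnitude is constant within 1 dB for frequencies below $0.2\pi$ and the stopband attenuation is greater than 15 dB for frequencies between $0.3\pi$ and $\pi$. If the passband is normalised to 1 at $\omega = 0$, then we require:
$$20 \log_{10} |H(e^{j0.2\pi})| \ge -1$$
$$20 \log_{10} |H(e^{j0.3\pi})| \le -15$$
Analog Butterworth filter (taking $T = 1$):
$$20 \log_{10} |H_a(j0.2\pi)| \ge -1$$
$$20 \log_{10} |H_a(j0.3\pi)| \le -15$$
Calculating N and $\Omega_c$:
$$20 \log_{10} \left[ \frac{1}{1 + \left(\dfrac{j0.2\pi}{j\Omega_c}\right)^{2N}} \right]^{1/2} \ge -1$$
$$20 \log_{10} \left[ \frac{1}{1 + \left(\dfrac{j0.3\pi}{j\Omega_c}\right)^{2N}} \right]^{1/2} \le -15$$
Taking both conditions with equality:
$$1 + \left(\frac{j0.2\pi}{j\Omega_c}\right)^{2N} = 10^{0.1}$$
$$1 + \left(\frac{j0.3\pi}{j\Omega_c}\right)^{2N} = 10^{1.5}$$
Solving gives $N = 5.8858$, which is rounded up to $N = 6$. Meeting the passband condition exactly then gives $\Omega_c = 0.7032$.
For these values the poles of the Butterworth filter are:
$$-0.1820 \pm j0.6792, \qquad -0.4972 \pm j0.4972, \qquad -0.6792 \pm j0.1820$$
Thus the Butterworth filter is as follows:
$$H_a(s) = \frac{0.12093}{(s^2 + 0.3640s + 0.4945)(s^2 + 0.9945s + 0.4945)(s^2 + 1.3585s + 0.4945)}$$
Expressing this equation as a partial-fraction expansion and performing the transformation:
$$H(z) = \frac{0.2871 - 0.4466z^{-1}}{1 - 1.2971z^{-1} + 0.6949z^{-2}} + \frac{-2.1428 + 1.1454z^{-1}}{1 - 1.0691z^{-1} + 0.3699z^{-2}} + \frac{1.8558 - 0.6304z^{-1}}{1 - 0.9972z^{-1} + 0.2570z^{-2}}$$

IIR Filter Implementation
$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}}, \qquad N \approx 8,\; M \approx 8$$
Direct form (4.3.1, Oppenheim & Schafer)
Cascade (4.3.3, Oppenheim & Schafer)
Parallel (4.3.2, Oppenheim & Schafer)

Direct form I (first the zeros)
$$y(k) = \sum_{n=0}^{M} b_n\, x(k-n) + \sum_{n=1}^{N} a_n\, y(k-n)$$
[Figure: direct form I signal flow graph, zeros first, with delays $z^{-1}$ and coefficients $b_0, \ldots, b_M$ and $a_1, \ldots, a_N$.]

Direct form I (first the poles)
$$y(k) = \sum_{n=1}^{N} a_n\, y(k-n) + \sum_{n=0}^{M} b_n\, x(k-n)$$
[Figure: direct form I signal flow graph, poles first, with delays $z^{-1}$ and coefficients $a_1, \ldots, a_N$ and $b_0, \ldots, b_M$.]

Direct form II (saves memory)
$$c(k) = x(k) + \sum_{n=1}^{N} a_n\, c(k-n)$$
$$y(k) = \sum_{n=0}^{M} b_n\, c(k-n)$$
[Figure: direct form II signal flow graph with a single delay line shared by the coefficients $a_1, \ldots, a_N$ and $b_0, \ldots, b_M$.]

Cascade Form
$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}} = A \prod_{n=1}^{\lfloor (N+1)/2 \rfloor} \frac{1 + \beta_{1n} z^{-1} + \beta_{2n} z^{-2}}{1 - \alpha_{1n} z^{-1} - \alpha_{2n} z^{-2}}$$
Every second-order subsystem is implemented in direct form II.
[Figure: cascade of second-order direct form II sections from $x(n)$ to $y(n)$, with gain $A$ and coefficients $\alpha_{1n}, \alpha_{2n}, \beta_{1n}, \beta_{2n}$.]

Parallel Form
$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}} = \sum_{k=0}^{M-N} C_k z^{-k} + \sum_{n=1}^{\lfloor (N+1)/2 \rfloor} \frac{\gamma_{0n} + \gamma_{1n} z^{-1}}{1 - \alpha_{1n} z^{-1} - \alpha_{2n} z^{-2}}$$
Assuming $M = N$, the first sum reduces to the constant $C_0$.
Every second-order subsystem is implemented in direct form II.
[Figure: parallel connection of second-order direct form II sections from $x(n)$ to $y(n)$, with constant branch $C_0$ and coefficients $\gamma_{0n}, \gamma_{1n}, \alpha_{1n}, \alpha_{2n}$.]

Finite Impulse Response
$$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n}, \qquad N \ge 64$$
Direct form (4.5.1, Oppenheim & Schafer)
Cascade (4.5.2, Oppenheim & Schafer)

Direct Form
$$y(k) = \sum_{n=0}^{N-1} h(n)\, x(k-n)$$
[Figure: tapped delay line from $x(n)$ to $y(n)$ with delays $z^{-1}$ and coefficients $h_0, h_1, h_2, \ldots, h_{N-2}, h_{N-1}$.]

Cascade Form
$$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n} = \prod_{n=1}^{\lfloor N/2 \rfloor} \left( \beta_{0n} + \beta_{1n} z^{-1} + \beta_{2n} z^{-2} \right)$$
Every second-order subsystem is implemented in direct form II.
[Figure: cascade of second-order FIR sections from $x(n)$ to $y(n)$, with coefficients $\beta_{0n}, \beta_{1n}, \beta_{2n}$.]

Filterbank Implementation
For FIR filters, for $i = 1, 2, \ldots, Q$:
$$s_i(k) = \sum_{n=0}^{N-1} h_i(n)\, s(k-n)$$
Advantages: it is simple. Linear phase when carefully designed.
Disadvantage: since N is at least 64, high computational requirements: $NQ$ multiplications and $NQ$ additions.
For IIR filters, for $i = 1, 2, \ldots, Q$:
$$x_i(k) = \sum_{n=1}^{N} a_n^{i}\, x_i(k-n) + \sum_{n=0}^{M} b_n^{i}\, s(k-n)$$
Advantages: it is simple. Since N and M are around 8, low computational requirements.
Disadvantage: phase distortion.

Spectral Analysis
Discrete-Time Fourier Transform (DTFT)
Short-Time Fourier Transform (STFT)
Windowing Effects
Windows
Discrete Fourier Transform (DFT)

Discrete-Time Fourier Transform
Analysis (forward transform):
$$S(e^{j\omega T}) = \sum_{n=-\infty}^{\infty} s_n\, e^{-j\omega nT}$$
If $s_n$ is a periodic waveform, the sum is taken over one period of N samples:
$$S(e^{j\omega T}) = \sum_{n=0}^{N-1} s_n\, e^{-j\omega nT}$$
Synthesis (inverse transform):
$$s_n = \frac{T}{2\pi} \int_{0}^{2\pi/T} S(e^{j\omega T})\, e^{j\omega nT}\, d\omega \tag{1}$$

Short-Time Fourier Transform
Since the speech signal changes with time, we use short-time analysis. Hence the Short-Time Fourier Transform (STFT) is defined as:
$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n\, s_n\, e^{-j\omega nT}$$
That is, the speech signal is multiplied in time by a window $w_n$.

Windowing Effects
$S'(e^{j\omega T})$ is only an approximation to $S(e^{j\omega T})$:
$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n \left[ \frac{T}{2\pi} \int_{0}^{2\pi/T} S(e^{j\theta T})\, e^{j\theta nT}\, d\theta \right] e^{-j\omega nT}$$
(Juang, Ex. 3.2, pp. 85 and 86)
$$S'(e^{j\omega T}) = \frac{T}{2\pi} \int_{0}^{2\pi/T} S(e^{j\theta T})\, W(e^{j(\omega - \theta)T})\, d\theta$$

Graphical Interpretation
[Figure: $S(e^{j\theta T})$; the products $S(e^{j\theta T})\, W(e^{j(\omega_1 - \theta)T})$ and $S(e^{j\theta T})\, W(e^{j(\omega_2 - \theta)T})$; and the resulting $S'(e^{j\omega T})$ at $\omega_1$ and $\omega_2$.]

Windows
There is no ideal window.
Side lobes only contribute to spectral distortion.
Therefore, we wish (Fig. 1.4, Deller):
– Main lobe: narrow bandwidth.
– Side lobes: low amplitude.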
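
The two window requirements listed above can be measured directly. The sketch below is an illustration rather than material from the slides: the helper names, the 64-sample window length and the 512-point frequency grid are all assumptions chosen for the example. It evaluates the window spectrum on a grid over $[0, \pi]$, walks down the main lobe to its first minimum, and reports the main-lobe half-width together with the highest side lobe for a rectangular and a Hamming window.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window (0.54 - 0.46 cos form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def window_spectrum_db(window, num_points=512):
    """|W(e^{jw})| in dB relative to its value at w = 0, for w in [0, pi]."""
    mags = []
    for k in range(num_points + 1):
        omega = math.pi * k / num_points
        value = sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(window))
        mags.append(abs(value))
    peak = mags[0]
    # Clamp at -240 dB so exact spectral nulls do not break the logarithm.
    return [20 * math.log10(max(m / peak, 1e-12)) for m in mags]


def main_lobe_and_side_lobe(window, num_points=512):
    """Return (main-lobe half-width in rad, peak side-lobe level in dB)."""
    spectrum = window_spectrum_db(window, num_points)
    k = 0
    # Walk down the main lobe until the magnitude starts rising again.
    while k < num_points - 1 and spectrum[k + 1] < spectrum[k]:
        k += 1
    return math.pi * k / num_points, max(spectrum[k + 1:])


length = 64  # assumed window length for this illustration
rect_width, rect_side = main_lobe_and_side_lobe([1.0] * length)
hamm_width, hamm_side = main_lobe_and_side_lobe(hamming(length))
print(f"rectangular: half-width {rect_width:.3f} rad, side lobe {rect_side:.1f} dB")
print(f"hamming:     half-width {hamm_width:.3f} rad, side lobe {hamm_side:.1f} dB")
```

The rectangular window should show the narrower main lobe but side lobes only about 13 dB down, while the Hamming window trades roughly twice the main-lobe width for much lower side lobes.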
The Hamming window is a good choice (Sec. 1.1.5, Deller).
[Figure: main lobe and side lobes of 101-point FIR filters.]
For large N (narrow-bandwidth analysis) (Figs. 3.11-3.14):
– Good spectral resolution
– Bad time resolution
For small N (wide-bandwidth analysis):
– Bad spectral resolution (very smooth)
– Good time resolution (quasi-stationary segments)

Discrete Fourier Transform (DFT)
The spectrum is sampled as follows:
$$\omega_p = \frac{2\pi p}{TN}, \qquad p = 0, 1, 2, \ldots, N-1$$
Sampling the spectrum in this way, the DFT is defined as follows:
$$S_p = S(e^{j\omega_p T}) = \sum_{n=0}^{N-1} s_n\, e^{-j\omega_p nT}, \qquad 0 \le p \le N-1$$
Applying the same sampling to the integral in (1) gives the inverse DFT:
$$s_n = \frac{1}{N} \sum_{p=0}^{N-1} S_p\, e^{j\omega_p nT}, \qquad 0 \le n \le N-1$$
Time resolution depends on N.
N samples in time correspond to N samples in frequency.

Filterbank Implementation using STFT
Uniform Filterbanks
Nonuniform Filterbanks

Uniform Filterbanks
N is the number of filters.
– Window the signal to obtain $s_n(m)$.
– Break the windowed signal into segments of N samples.
– Add the segments:
$$u_n(k) = \sum_{r} s_n(Nr + k), \qquad 0 \le k \le N-1$$
– Take the DFT:
$$U_n(i) = \sum_{k=0}^{N-1} u_n(k)\, e^{-j\frac{2\pi}{N} ik}$$
– Modulate the DFT:
$$x_i(n) = e^{j\frac{2\pi}{N} in}\, U_n(i)$$
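
The window, break, add, DFT and modulate steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation behind the slides: it assumes the windowed signal is $s_n(m) = s(m)\,w(n-m)$, as in the usual short-time analysis, and the function names, the 8 kHz sampling rate, the 500 Hz test tone, the 64-sample Hamming window and the choice of 16 filters are all hypothetical. A direct evaluation of each channel is included only to check that folding the windowed signal modulo N gives the same outputs.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window (0.54 - 0.46 cos form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]


def stft_filterbank(signal, n, window, num_filters):
    """Outputs x_i(n), i = 0..N-1, of an N-channel uniform filterbank."""
    N = num_filters
    u = [0.0] * N
    first = max(0, n - len(window) + 1)
    for m in range(first, n + 1):
        # Windowed signal s_n(m) = s(m) w(n - m); writing m = N r + k,
        # every segment is added onto the same index k = m mod N.
        u[m % N] += signal[m] * window[n - m]
    outputs = []
    for i in range(N):
        # N-point DFT of the folded sequence, then modulation by exp(j w_i n).
        U = sum(u[k] * cmath.exp(-2j * math.pi * i * k / N) for k in range(N))
        outputs.append(cmath.exp(2j * math.pi * i * n / N) * U)
    return outputs


def direct_channel(signal, n, window, num_filters, i):
    """Reference: x_i(n) = sum_m s(m) w(n - m) exp(j w_i (n - m))."""
    omega_i = 2 * math.pi * i / num_filters
    first = max(0, n - len(window) + 1)
    return sum(signal[m] * window[n - m] * cmath.exp(1j * omega_i * (n - m))
               for m in range(first, n + 1))


fs = 8000.0  # assumed sampling rate in Hz
signal = [math.cos(2 * math.pi * 500.0 * m / fs) for m in range(256)]
channels = stft_filterbank(signal, n=200, window=hamming(64), num_filters=16)
magnitudes = [abs(x) for x in channels]
strongest = max(range(9), key=lambda i: magnitudes[i])
print("strongest channel:", strongest, "centre (Hz):", strongest * fs / 16)
```

With 16 filters at 8 kHz the channel centres are 500 Hz apart, so the test tone falls on the centre of channel 1. The window here is longer than N, which is the case where the add step actually folds several segments together.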