Part3

Speech Recognition
Chapter 3: Signal Processing Front End

Signal Processing Front End

The front end converts the speech waveform into some type of parametric representation.

[Diagram: $s_k$ → Signal Processing Front End → $O = o(1)\,o(2) \ldots o(T)$]

• Front-end analysis methods: filterbank, linear prediction analysis.
• Parametric representations: zero-crossing rate, short-time energy, short-time spectral envelope, etc.

Filterbank

• Introduction
• Filterbank Front End
• Uniform Filterbank Design
• Non-uniform Filterbank Design
• Implementation

Channel $i$ filters the speech signal $s(n)$ with a bandpass filter of impulse response $h_i(n)$ and length $L$:

$$x_i(n) = s(n) * h_i(n) = \sum_{m=0}^{L-1} h_i(m)\, s(n-m)$$

$$X_i(z) = S(z)\, H_i(z)$$

Filter-Bank Front End

Each channel consists of the following stages:

• Bandpass filter: $s_i(n) = s(n) * h_i(n) = \sum_{m=0}^{L-1} h_i(m)\, s(n-m)$
• Nonlinearity (half-wave or full-wave rectifier): shifts the band signal spectrum to the low-frequency band and creates high-frequency images.
• Lowpass filter (20-30 Hz): retains the DC component and eliminates the high-frequency images created by the nonlinearity.
• Sampling rate reduction (40-60 Hz): reduces the amount of data.
• Amplitude compression: log or μ-law.

Each filterbank channel gives a measurement of the energy of the speech in its band.

[Figure: spectra for an original signal concentrated at 500 Hz. The output of the nonlinearity shows a concentration at DC and image peaks at 500, 1000, 1500, ... Hz.]

Uniform Filterbank Design

• Centre frequencies:

$$f_i = \frac{F_s}{N}\, i, \qquad 1 \le i \le Q$$

• Number of filters: $Q \le N/2$
• Filter bandwidth: chosen so that adjacent filters do not overlap.

Here $F_s$ is the sampling frequency and $N$ is the number of uniformly spaced filters required to span the frequency range of speech.

Non-uniform Filterbank Design

• Logarithmic frequency scale
• Critical band scale (Fig. 3.9)
  – Mel scale
  – Bark scale

Logarithmic Frequency Scale Filterbank Design

• For $Q$ bandpass filters with centre frequencies $f_i$ and bandwidths $b_i$:

$$b_1 = C$$

$$b_i = \alpha\, b_{i-1}, \qquad 2 \le i \le Q$$

$$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$$

where $C$ is the arbitrary bandwidth of the first filter, $f_1$ is the arbitrary centre frequency of the first filter, and $\alpha$ is the logarithmic growth factor (usually 2).

Examples:

• $C = 200$ Hz, $f_1 = 300$ Hz, $\alpha = 2$, $Q = 4$
• $C = 50$ Hz, $f_1 = 225$ Hz, $\alpha = 1.33$, $Q = 12$

Implementation of Filterbanks

• Basics on filter implementation
• Filterbank implementation
• Spectral analysis basics
• Filterbank implementation using the STFT

Basics on Filter Implementation

• Infinite Impulse Response (IIR) filters
• Finite Impulse Response (FIR) filters
• Filterbanks in speech recognition
  – $8 < Q < 32$
  – Practical systems use non-uniformly spaced filterbanks to characterise the speech spectrum in a manner considered more consistent with human perception.

Infinite Impulse Response (IIR) Filters

• IIR filter design
• IIR filter implementation

IIR Filter Design

• From analog filters
  – Impulse invariance
  – Analog-to-digital transformation
  – Filter design
• Computer-aided design

Impulse Invariance

• The unit-sample (impulse) response of the digital filter is chosen as equally spaced samples of the impulse response of the analog filter:

$$h(n) = h_a(nT)$$

where $T$ is the sampling period.

• In this case the design procedure is:
  – Calculate the partial-fraction expansion of $H_a(s)$:

$$H_a(s) = \sum_{k=1}^{N} \frac{A_k}{s - s_k}$$

  – Calculate

$$H(z) = \sum_{k=1}^{N} \frac{A_k}{1 - e^{s_k T} z^{-1}}$$

Filter Design

• Butterworth filter
• Chebyshev filter
• Elliptic filter

Butterworth Filter

• Introduction
• Design example

Introduction

• Properties:
  – The magnitude is maximally flat in the passband.
  – The approximation is monotonic in the passband and the stopband.
• The squared magnitude of the filter is of the form:

$$|H_a(j\Omega)|^2 = \frac{1}{1 + (j\Omega / j\Omega_c)^{2N}}$$

• The roots of the denominator polynomial are then

$$s_p = (-1)^{1/2N}\,(j\Omega_c)$$

• Thus there are $2N$ poles equally spaced in angle on a circle of radius $\Omega_c$ in the s-plane.
• The poles are symmetrically located with respect to the imaginary axis.
• The angular spacing between poles is $\pi/N$ radians.
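The pole formula above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `butterworth_poles` and `stable_poles` are hypothetical, and the values $N = 6$ and $\Omega_c = 0.7032$ are borrowed from the design example in these notes purely for illustration.

```python
import cmath
import math


def butterworth_poles(order, cutoff):
    """Return the 2N roots of 1 + (s / (j*cutoff))**(2N) = 0.

    The roots are s_k = j * cutoff * exp(j*pi*(2k+1)/(2N)), k = 0..2N-1,
    i.e. 2N points spaced pi/N apart on a circle of radius `cutoff`.
    """
    poles = []
    for k in range(2 * order):
        angle = math.pi / 2 + math.pi * (2 * k + 1) / (2 * order)
        poles.append(cutoff * cmath.exp(1j * angle))
    return poles


def stable_poles(order, cutoff):
    """Keep only the left-half-plane poles, which form the stable H_a(s)."""
    return [p for p in butterworth_poles(order, cutoff) if p.real < 0]


# Illustrative values (assumed here): N = 6, cutoff = 0.7032 rad/s.
poles = stable_poles(6, 0.7032)
for p in poles:
    print(f"{p.real:+.4f} {p.imag:+.4f}j")
```

Only the left-half-plane poles are assigned to $H_a(s)$; the mirror-image poles belong to $H_a(-s)$.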
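[Figure: pole locations in the s-plane.]

Design Example

• Assume that we require a filter whose passband magnitude is constant to within 1 dB for frequencies below $0.2\pi$ and whose stopband attenuation is greater than 15 dB for frequencies between $0.3\pi$ and $\pi$.
• If the passband is normalised to 1 at $\omega = 0$, we require:

$$20 \log_{10} |H(e^{j0.2\pi})| \ge -1$$

$$20 \log_{10} |H(e^{j0.3\pi})| \le -15$$

Analog Butterworth filter:

$$20 \log_{10} |H_a(j0.2\pi)| \ge -1$$

$$20 \log_{10} |H_a(j0.3\pi)| \le -15$$

Calculating $N$ and $\Omega_c$ (taking the constraints with equality):

$$20 \log_{10} \left( \frac{1}{1 + (j0.2\pi / j\Omega_c)^{2N}} \right)^{1/2} = -1$$

$$20 \log_{10} \left( \frac{1}{1 + (j0.3\pi / j\Omega_c)^{2N}} \right)^{1/2} = -15$$

$$1 + (j0.2\pi / j\Omega_c)^{2N} = 10^{0.1}$$

$$1 + (j0.3\pi / j\Omega_c)^{2N} = 10^{1.5}$$

Solving the two equations gives $N = 5.8858$, which is rounded up to the next integer. Substituting $N = 6$ into the passband equation gives:

$$N = 6, \qquad \Omega_c = 0.7032$$

For these values the poles of the Butterworth filter are:

$$-0.1820 \pm j0.6792, \qquad -0.4972 \pm j0.4972, \qquad -0.6792 \pm j0.1820$$

Thus the Butterworth filter is as follows:

$$H_a(s) = \frac{0.12093}{(s^2 + 0.3640s + 0.4945)(s^2 + 0.9945s + 0.4945)(s^2 + 1.3585s + 0.4945)}$$

Expressing this equation as a partial-fraction expansion and performing the transformation:

$$H(z) = \frac{0.2871 - 0.4466z^{-1}}{1 - 1.2971z^{-1} + 0.6949z^{-2}} + \frac{-2.1428 + 1.1454z^{-1}}{1 - 1.0691z^{-1} + 0.3699z^{-2}} + \frac{1.8558 - 0.6304z^{-1}}{1 - 0.9972z^{-1} + 0.2570z^{-2}}$$

IIR Filter Implementation

$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}}, \qquad N \approx 8,\; M \approx 8$$

• Direct form (4.3.1, Oppenheim & Schafer)
• Cascade (4.3.3, Oppenheim & Schafer)
• Parallel (4.3.2, Oppenheim & Schafer)

Direct Form I (first the zeros)

$$y(k) = \sum_{n=0}^{M} b_n\, x(k-n) + \sum_{n=1}^{N} a_n\, y(k-n)$$

[Figure: signal-flow graph with a chain of $z^{-1}$ delays feeding the coefficients $b_0, b_1, \ldots, b_M$, followed by a chain of $z^{-1}$ delays feeding $a_1, \ldots, a_{N-1}, a_N$.]

Direct Form I (first the poles)

$$y(k) = \sum_{n=1}^{N} a_n\, y(k-n) + \sum_{n=0}^{M} b_n\, x(k-n)$$

[Figure: signal-flow graph with the feedback section $a_1, \ldots, a_{N-1}, a_N$ placed before the feed-forward section $b_0, b_1, \ldots$]

Direct Form II (saves memory)

$$c(k) = x(k) + \sum_{n=1}^{N} a_n\, c(k-n)$$

$$y(k) = \sum_{n=0}^{M} b_n\, c(k-n)$$

[Figure: signal-flow graph with a single chain of $z^{-1}$ delays shared by the coefficients $a_1, \ldots, a_{N-1}, a_N$ and $b_0, b_1, \ldots, b_M$.]

Cascade Form

$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}} = A \prod_{n=1}^{\lfloor (N+1)/2 \rfloor} \frac{1 + \beta_{1n} z^{-1} + \beta_{2n} z^{-2}}{1 - \alpha_{1n} z^{-1} - \alpha_{2n} z^{-2}}$$

Every second-order subsystem is implemented as direct form II.

[Figure: cascade of second-order direct form II sections from $x(n)$ to $y(n)$, with gain $A$ and coefficients $\alpha_{1n}$, $\alpha_{2n}$, $\beta_{1n}$, $\beta_{2n}$.]

Parallel Form

$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}} = \sum_{n=0}^{M-N} C_n z^{-n} + \sum_{n=1}^{\lfloor (N+1)/2 \rfloor} \frac{\gamma_{0n} + \gamma_{1n} z^{-1}}{1 - \alpha_{1n} z^{-1} - \alpha_{2n} z^{-2}}$$

Assuming $M = N$, the first sum reduces to the constant $C_0$. Every second-order subsystem is implemented as direct form II.

[Figure: parallel connection of second-order direct form II sections from $x(n)$ to $y(n)$, with constant branch $C_0$ and coefficients $\gamma_{0n}$, $\gamma_{1n}$, $\alpha_{1n}$, $\alpha_{2n}$.]

Finite Impulse Response (FIR) Filters

$$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n}, \qquad N \ge 64$$

• Direct form (4.5.1, Oppenheim & Schafer)
• Cascade (4.5.2, Oppenheim & Schafer)

Direct Form

$$y(k) = \sum_{n=0}^{N-1} h(n)\, x(k-n)$$

[Figure: tapped delay line from $x(n)$ to $y(n)$ with $z^{-1}$ delays and coefficients $h_0, h_1, h_2, \ldots, h_{N-2}, h_{N-1}$.]

Cascade Form

$$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n} = \prod_{n=1}^{\lfloor N/2 \rfloor} \left( \beta_{0n} + \beta_{1n} z^{-1} + \beta_{2n} z^{-2} \right)$$

Every second-order subsystem is implemented as direct form II.

[Figure: cascade of second-order FIR sections from $x(n)$ to $y(n)$ with coefficients $\beta_{0n}$, $\beta_{1n}$, $\beta_{2n}$.]

Filterbank Implementation

For FIR filters, for $i = 1, 2, \ldots, Q$:

$$s_i(k) = \sum_{n=0}^{N-1} h_i(n)\, s(k-n)$$

• Advantages: it is simple, and the filters have linear phase when carefully designed.
• Disadvantage: since $N$ is at least 64, the computational requirements are high.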
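The FIR filterbank equation above amounts to one convolution per channel. The sketch below illustrates it with NumPy. The windowed-sinc design in `bandpass_fir`, the 65-tap length, the 8 kHz sampling rate and the band edges are all assumptions chosen for the example, not values taken from the slides.

```python
import numpy as np


def bandpass_fir(low_hz, high_hz, fs, num_taps=65):
    """Linear-phase bandpass FIR: difference of two ideal lowpass
    responses, tapered by a Hamming window (windowed-sinc design)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2  # symmetric time index

    def lowpass(fc):
        # Ideal lowpass impulse response with cutoff fc (Hz).
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)


def fir_filterbank(s, filters):
    """Compute s_i(k) = sum_n h_i(n) s(k - n) for every channel i."""
    return np.stack([np.convolve(s, h)[:len(s)] for h in filters])


# Hypothetical setup: four uniform bands and a 1300 Hz test tone.
fs = 8000
bands = [(100, 900), (900, 1700), (1700, 2500), (2500, 3300)]
filters = [bandpass_fir(lo, hi, fs) for lo, hi in bands]
tone = np.sin(2 * np.pi * 1300 * np.arange(4000) / fs)
channels = fir_filterbank(tone, filters)

# Per-channel energy, skipping the start-up transient of the filters.
energy = np.sum(channels[:, 65:] ** 2, axis=1)
print(energy)
```

Each output sample costs one multiplication and one addition per tap in every channel, which is why long FIR filters make this direct approach expensive.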
Computational cost of the FIR filterbank: $NQ$ multiplications and $NQ$ additions per output sample.

For IIR filters, for $i = 1, 2, \ldots, Q$:

$$x_i(k) = \sum_{n=1}^{N} a_n^i\, x_i(k-n) + \sum_{n=0}^{M} b_n^i\, s(k-n)$$

• Advantages: it is simple, and since $N$ and $M$ are around 8, the computational requirements are low.
• Disadvantage: phase distortion.

Spectral Analysis

• Discrete-Time Fourier Transform (DTFT)
• Short-Time Fourier Transform (STFT)
• Windowing effects
• Windows
• Discrete Fourier Transform (DFT)

Discrete-Time Fourier Transform

• DTFT analysis (forward transform):

$$S(e^{j\omega T}) = \sum_{n=-\infty}^{\infty} s_n\, e^{-j\omega nT}$$

If $s_n$ is a periodic waveform with period $N$, the sum runs over one period:

$$S(e^{j\omega T}) = \sum_{n=0}^{N-1} s_n\, e^{-j\omega nT}$$

• DTFT synthesis (inverse transform):

$$s_n = \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\omega T})\, e^{j\omega nT}\, d\omega \qquad (1)$$

Short-Time Fourier Transform

• Since the speech signal changes with time, we use short-time analysis.
• Hence the Short-Time Fourier Transform (STFT) is defined as:

$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n\, s_n\, e^{-j\omega nT}$$

• That is, the speech signal is multiplied in time by a window $w_n$.

Windowing Effects

• $S'(e^{j\omega T})$ is only an approximation to $S(e^{j\omega T})$. Substituting (1) for $s_n$:

$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n \left[ \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\theta T})\, e^{j\theta nT}\, d\theta \right] e^{-j\omega nT}$$

Summing over $n$ turns the window into its transform $W$ (Juang, Ex. 3.2, pp. 85 and 86):

$$S'(e^{j\omega T}) = \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\theta T})\, W(e^{j(\omega - \theta)T})\, d\theta$$

Graphical Interpretation

[Figure: $S(e^{j\theta T})$, the products $S(e^{j\theta T})\, W(e^{j(\omega_1 - \theta)T})$ and $S(e^{j\theta T})\, W(e^{j(\omega_2 - \theta)T})$ at two frequencies $\omega_1$ and $\omega_2$, and the resulting $S'(e^{j\omega T})$.]

Windows

• There are no ideal windows.
• Side lobes only contribute spectral distortion.
• Therefore we want (Fig. 1.4, Deller):
  – a main lobe with narrow bandwidth;
  – side lobes with low amplitude.
• The Hamming window is a good choice (Sec. 1.1.5, Deller).

[Figure: window spectra showing the main lobe and side lobes, 101-point FIR filters.]

• For large $N$ (narrow-bandwidth analysis) (Figs. 3.11-3.14):
  – Good spectral resolution
  – Bad time resolution
• For small $N$ (wide-bandwidth analysis):
  – Bad spectral resolution (very smooth)
  – Good time resolution (quasi-stationary segments)

Discrete Fourier Transform (DFT)

• The spectrum is sampled as follows:

$$\omega_p = \frac{2\pi p}{TN}, \qquad p = 0, 1, 2, \ldots, N-1$$

• Substituting $\omega_p$ into $S(e^{j\omega T})$, the DFT is defined as follows:

$$S_p = S(e^{j\omega_p T}) = \sum_{n=0}^{N-1} s_n\, e^{-j\omega_p nT}, \qquad 0 \le p \le N-1$$

• The inverse DFT is as follows:

$$s_n = \frac{1}{N} \sum_{p=0}^{N-1} S_p\, e^{j\omega_p nT}, \qquad 0 \le n \le N-1$$

• Time resolution depends on $N$: $N$ samples in time correspond to $N$ samples in frequency.

Filterbank Implementation using STFT

• Uniform filterbanks
• Non-uniform filterbanks

For a uniform filterbank with channel frequencies $\omega_i = 2\pi i/(TN)$, where $N$ is the number of filters:

• Window the signal to obtain the windowed segment $s_n(m)$ at time $n$.
• Break the windowed signal into segments of $N$ samples and add the segments:

$$u_n(k) = \sum_r s_n(Nr + k), \qquad 0 \le k \le N-1$$

• Take the DFT:

$$U_n(i) = \sum_{k=0}^{N-1} u_n(k)\, e^{-j\omega_i kT}$$

• Modulate the DFT:

$$x_i(n) = e^{j\omega_i nT}\, U_n(i)$$
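The window, break, add, DFT and modulate steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the author's implementation: the windowed segment is taken to be $s_n(m) = s(m)\, w(n-m)$, each channel is assumed to have impulse response $h_i(m) = w(m)\, e^{j 2\pi i m / N}$ with $T = 1$, and the function name `stft_filterbank_sample` is hypothetical.

```python
import numpy as np


def stft_filterbank_sample(s, w, n, N):
    """Outputs x_i(n), i = 0..N-1, of a uniform filterbank at time n.

    Assumed model: channel i has impulse response
    h_i(m) = w(m) * exp(j*2*pi*i*m/N), so x_i(n) = sum_m s(m) h_i(n-m).
    """
    L = len(w)
    # Window the signal: s_n(m) = s(m) * w(n - m), nonzero for n-L < m <= n.
    m = np.arange(n - L + 1, n + 1)
    valid = (m >= 0) & (m < len(s))
    s_n = np.zeros(L)
    s_n[valid] = s[m[valid]] * w[n - m[valid]]
    # Break into length-N segments and add them (time aliasing):
    # u_n(k) = sum_r s_n(N*r + k), 0 <= k <= N-1.
    u = np.zeros(N)
    np.add.at(u, m % N, s_n)
    # Take the N-point DFT, then modulate by exp(j*2*pi*i*n/N).
    U = np.fft.fft(u)
    i = np.arange(N)
    return np.exp(2j * np.pi * i * n / N) * U


# Hypothetical usage: 16 channels, 40-sample Hamming window.
rng = np.random.default_rng(0)
signal = rng.standard_normal(200)
window = np.hamming(40)
x = stft_filterbank_sample(signal, window, n=120, N=16)
print(np.abs(x))
```

A single $N$-point FFT yields all $N$ channel outputs at time $n$, even when the window is longer than $N$, which is the saving over running $N$ separate convolutions.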
