Linear Prediction

Linear Prediction (Introduction):

The object of linear prediction is to estimate the output sequence from a linear combination of input samples, past output samples, or both:

$$\hat{y}(n) = \sum_{j=0}^{q} b(j)\,x(n-j) - \sum_{i=1}^{p} a(i)\,y(n-i)$$

The factors a(i) and b(j) are called predictor coefficients.

Many systems of interest to us are describable by a linear, constant-coefficient difference equation:

$$\sum_{i=0}^{p} a(i)\,y(n-i) = \sum_{j=0}^{q} b(j)\,x(n-j)$$

If Y(z)/X(z) = H(z), where H(z) is a ratio of polynomials N(z)/D(z), then

$$N(z) = \sum_{j=0}^{q} b(j)\,z^{-j} \quad\text{and}\quad D(z) = \sum_{i=0}^{p} a(i)\,z^{-i}$$

Thus the predictor coefficients give us immediate access to the poles and zeros of H(z).

Linear Prediction (Types of System Model):

There are two important variants:

All-pole model (in statistics, autoregressive (AR) model): the numerator N(z) is a constant.

All-zero model (in statistics, moving-average (MA) model): the denominator D(z) is equal to unity.

The mixed pole-zero model is called the autoregressive moving-average (ARMA) model.

Linear Prediction (Derivation of LP equations):

Given a zero-mean signal y(n), in the AR model:

$$\hat{y}(n) = -\sum_{i=1}^{p} a(i)\,y(n-i)$$

The error is:

$$e(n) = y(n) - \hat{y}(n) = \sum_{i=0}^{p} a(i)\,y(n-i), \qquad a(0) = 1$$

To derive the predictor we use the orthogonality principle, which states that the desired coefficients are those which make the error orthogonal to the samples y(n-1), y(n-2), …, y(n-p).

Thus we require that

$$\langle y(n-j)\,e(n) \rangle = 0 \quad \text{for } j = 1, 2, \ldots, p$$

or

$$\Big\langle y(n-j) \sum_{i=0}^{p} a(i)\,y(n-i) \Big\rangle = 0$$

Interchanging the operations of averaging and summing, and representing ⟨ ⟩ by summing over n, we have

$$\sum_{i=0}^{p} a(i) \sum_{n} y(n-i)\,y(n-j) = 0, \qquad j = 1, \ldots, p$$

The required predictor coefficients are found by solving these equations.
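The orthogonality equations above can be solved numerically. The following is a minimal sketch, assuming the convention a(0) = 1 with error e(n) = Σ a(i) y(n-i) for i = 0..p, and summing only over samples that have a full p-sample history. The function name `solve_normal_equations` and the test signal are hypothetical, chosen only for illustration.

```python
import numpy as np

def solve_normal_equations(y, p):
    """Return [a(0), ..., a(p)] with a(0) = 1 such that the error
    e(n) = sum_{i=0}^{p} a(i) y(n-i) is orthogonal to y(n-1), ..., y(n-p)."""
    y = np.asarray(y, dtype=float)
    n = np.arange(p, len(y))                              # samples with full history
    lagged = np.stack([y[n - i] for i in range(p + 1)])   # row i holds y(n-i)
    c = lagged @ lagged.T                                 # c[i, j] = sum_n y(n-i) y(n-j)
    # Move the i = 0 term to the right-hand side:
    #   sum_{i=1}^{p} a(i) c[i, j] = -c[0, j],  j = 1..p
    a_tail = np.linalg.solve(c[1:, 1:], -c[1:, 0])
    return np.concatenate(([1.0], a_tail))

# Hypothetical test signal: impulse response of y(n) = 1.5 y(n-1) - 0.7 y(n-2)
y = np.zeros(200)
y[0] = 1.0
y[1] = 1.5 * y[0]
for t in range(2, len(y)):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2]

a = solve_normal_equations(y, p=2)
print(a)  # close to [1, -1.5, 0.7] under this sign convention
```

Because the predictor here is written with a minus sign, the recovered a(1) and a(2) are the negatives of the recursion coefficients used to generate the signal.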
The orthogonality principle also states that the resulting minimum error is given by

$$E = \langle e^{2}(n) \rangle = \langle y(n)\,e(n) \rangle$$

or

$$\sum_{i=0}^{p} a(i) \sum_{n} y(n-i)\,y(n) = E$$

We can minimize the error over all time:

$$\sum_{i=0}^{p} a(i)\,r_{|i-j|} = 0, \qquad j = 1, 2, \ldots, p$$

$$\sum_{i=0}^{p} a(i)\,r_{i} = E$$

where

$$r_{i} = \sum_{n=-\infty}^{\infty} y(n)\,y(n-i)$$

Linear Prediction (Applications):

Autocorrelation matching: we have a signal y(n) with known autocorrelation $r_{yy}(n)$. We model this with the AR system shown below:

[Figure: AR system with input e(n), gain σ, block 1 - A(z), and output z(n)]

$$H(z) = \frac{\sigma}{A(z)}, \qquad A(z) = 1 + \sum_{i=1}^{p} a_i\,z^{-i}$$

Linear Prediction (Order of Linear Prediction):

The choice of predictor order depends on the analysis bandwidth. The rule of thumb is: for a normal vocal tract, there is an average of about one formant per kilohertz of bandwidth. One formant requires a pair of complex-conjugate poles. Hence for every formant we require two predictor coefficients, or two coefficients per kilohertz of bandwidth.

Linear Prediction (AR Modeling of Speech Signal):

True model:

[Figure: a voiced branch (impulse generator controlled by the pitch period, followed by the glottal filter G(z)) and an unvoiced branch (uncorrelated noise generator), each with its own gain, are selected by a voiced/unvoiced (V/U) switch to give the volume velocity u(n), which passes through the vocal tract filter H(z) and the filter R(z) (labelled LP filter) to produce the speech signal s(n)]

Using LP analysis:

[Figure: a voiced impulse generator (driven by the pitch estimate) or an unvoiced white-noise generator, selected by a V/U switch and scaled by a gain, drives a single all-pole (AR) filter H(z) to produce the speech signal s(n)]

3.3 LINEAR PREDICTIVE CODING MODEL FOR SPEECH RECOGNITION

[Figure: excitation u(n), scaled by gain G, drives the filter built from A(z) to produce s(n)]

3.3.1 The LPC Model

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$

Convert this to an equality by including an excitation term:

$$s(n) = \sum_{i=1}^{p} a_i\,s(n-i) + G\,u(n)$$

$$S(z) = \sum_{i=1}^{p} a_i\,z^{-i}\,S(z) + G\,U(z)$$

$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i\,z^{-i}} = \frac{1}{A(z)}$$

Note that in Section 3.3 the coefficients $a_i$ carry the opposite sign to the a(i) of the introductory slides: here the predictor is $+\sum a_i s(n-i)$ and $A(z) = 1 - \sum a_i z^{-i}$.

3.3.2 LPC Analysis Equations

$$s(n) = \sum_{k=1}^{p} a_k\,s(n-k) + G\,u(n)$$

$$\tilde{s}(n) = \sum_{k=1}^{p} a_k\,s(n-k)$$

The prediction error:

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\,s(n-k)$$

Error transfer function:

$$A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k\,z^{-k}$$

Define the short-term speech and error segments at time n:

$$s_n(m) = s(n+m), \qquad e_n(m) = e(n+m)$$

We seek to minimize the mean squared error signal:

$$E_n = \sum_{m} e_n^{2}(m) = \sum_{m} \Big[ s_n(m) - \sum_{k=1}^{p} a_k\,s_n(m-k) \Big]^{2}$$

Setting

$$\frac{\partial E_n}{\partial a_k} = 0, \qquad k = 1, 2, \ldots, p$$

gives

$$\sum_{m} s_n(m-i)\,s_n(m) = \sum_{k=1}^{p} a_k \sum_{m} s_n(m-i)\,s_n(m-k) \qquad (*)$$

In terms of the short-term covariance

$$\phi_n(i,k) = \sum_{m} s_n(m-i)\,s_n(m-k)$$

we can write (*) as:

$$\phi_n(i,0) = \sum_{k=1}^{p} a_k\,\phi_n(i,k), \qquad i = 1, 2, \ldots, p$$

This is a set of p equations in p unknowns.

The minimum mean-squared error can be expressed as:

$$E_n = \sum_{m} s_n^{2}(m) - \sum_{k=1}^{p} a_k \sum_{m} s_n(m)\,s_n(m-k) = \phi_n(0,0) - \sum_{k=1}^{p} a_k\,\phi_n(0,k)$$

3.3.3 The Autocorrelation Method

$$s_n(m) = \begin{cases} s(m+n)\,w(m), & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

where w(m) is a window that is zero outside 0 ≤ m ≤ N-1.

The mean squared error is:

$$E_n = \sum_{m=0}^{N-1+p} e_n^{2}(m)$$

and

$$\phi_n(i,k) = \sum_{m=0}^{N-1+p} s_n(m-i)\,s_n(m-k), \qquad 1 \le i \le p,\; 0 \le k \le p$$

$$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\,s_n(m+i-k), \qquad 1 \le i \le p,\; 0 \le k \le p$$

Since $\phi_n(i,k)$ is only a function of i - k, the covariance function reduces to the simple autocorrelation function:

$$\phi_n(i,k) = r_n(i-k)$$

Since the autocorrelation function is symmetric, i.e. $r_n(-k) = r_n(k)$:

$$\sum_{k=1}^{p} r_n(|i-k|)\,a_k = r_n(i), \qquad 1 \le i \le p$$

which can be expressed in matrix form as:

$$\begin{bmatrix} r_n(0) & r_n(1) & r_n(2) & \cdots & r_n(p-1) \\ r_n(1) & r_n(0) & r_n(1) & \cdots & r_n(p-2) \\ r_n(2) & r_n(1) & r_n(0) & \cdots & r_n(p-3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_n(p-1) & r_n(p-2) & r_n(p-3) & \cdots & r_n(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} r_n(1) \\ r_n(2) \\ r_n(3) \\ \vdots \\ r_n(p) \end{bmatrix}$$

3.3.4 The Covariance Method

Change the interval over which the error is computed to 0 ≤ m ≤ N-1 and use the unweighted speech directly:

$$E_n = \sum_{m=0}^{N-1} e_n^{2}(m)$$

with $\phi_n(i,k)$ defined as:

$$\phi_n(i,k) = \sum_{m=0}^{N-1} s_n(m-i)\,s_n(m-k), \qquad 1 \le i \le p,\; 0 \le k \le p$$

or, by a change of variables,

$$\phi_n(i,k) = \sum_{m=-i}^{N-i-1} s_n(m)\,s_n(m+i-k), \qquad 1 \le i \le p,\; 0 \le k \le p$$
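The covariance values defined above can be computed directly from the unweighted samples and used to solve for the predictor coefficients. This is a minimal sketch, assuming the Section 3.3 convention s(n) = Σ a_k s(n-k) + G u(n) and a frame that has p samples of history before it. The function name `covariance_lpc`, the use of a Cholesky factorization for the solve (which assumes the matrix is positive definite), and the test signal are illustrative assumptions.

```python
import numpy as np

def covariance_lpc(s, n, N, p):
    """Covariance-method LPC for the frame starting at sample n.

    Uses s(n-p) .. s(n+N-1), i.e. s_n(m) = s(n+m) for -p <= m <= N-1.
    Returns (a, E_min) with a = [a_1, ..., a_p]."""
    s = np.asarray(s, dtype=float)
    if n < p or n + N > len(s):
        raise ValueError("frame needs p samples of history and N samples of data")
    m = np.arange(N)
    lagged = np.stack([s[n + m - i] for i in range(p + 1)])  # row i holds s_n(m-i)
    phi = lagged @ lagged.T          # phi[i, k] = sum_{m=0}^{N-1} s_n(m-i) s_n(m-k)
    # Solve sum_k a_k phi(i, k) = phi(i, 0), i = 1..p, via Cholesky (phi = L L^T).
    L = np.linalg.cholesky(phi[1:, 1:])
    z = np.linalg.solve(L, phi[1:, 0])
    a = np.linalg.solve(L.T, z)
    E_min = phi[0, 0] - a @ phi[0, 1:]   # minimum mean-squared error
    return a, E_min

# Hypothetical test signal: impulse response of s(n) = 1.5 s(n-1) - 0.7 s(n-2)
s = np.zeros(100)
s[0] = 1.0
s[1] = 1.5 * s[0]
for t in range(2, len(s)):
    s[t] = 1.5 * s[t - 1] - 0.7 * s[t - 2]

a, E_min = covariance_lpc(s, n=2, N=50, p=2)
print(a, E_min)  # a is close to [1.5, -0.7]; E_min is close to 0
```

Because no window is applied and the frame obeys the recursion exactly, the coefficients are recovered essentially without error, which is the usual argument for the covariance method on short frames.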
In matrix form, the covariance-method equations are:

$$\begin{bmatrix} \phi_n(1,1) & \phi_n(1,2) & \phi_n(1,3) & \cdots & \phi_n(1,p) \\ \phi_n(2,1) & \phi_n(2,2) & \phi_n(2,3) & \cdots & \phi_n(2,p) \\ \phi_n(3,1) & \phi_n(3,2) & \phi_n(3,3) & \cdots & \phi_n(3,p) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi_n(p,1) & \phi_n(p,2) & \phi_n(p,3) & \cdots & \phi_n(p,p) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \phi_n(3,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}$$

The resulting covariance matrix is symmetric, but not Toeplitz, and the system can be solved efficiently by Cholesky decomposition.

3.3.6 Examples of LPC Analysis

[Figures: examples of LPC analysis]

3.3.7 LPC Processor for Speech Recognition

Preemphasis: typically a first-order FIR filter, used to spectrally flatten the signal. The most widely used filter is:

$$H(z) = 1 - \tilde{a}\,z^{-1}, \qquad 0.9 \le \tilde{a} \le 1.0$$

$$\tilde{s}(n) = s(n) - \tilde{a}\,s(n-1)$$

Frame blocking:

$$x_{\ell}(n) = \tilde{s}(M\ell + n), \qquad n = 0, 1, \ldots, N-1, \quad \ell = 0, 1, \ldots, L-1$$

Windowing:

$$\tilde{x}_{\ell}(n) = x_{\ell}(n)\,w(n), \qquad 0 \le n \le N-1$$

Hamming window:

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$

Autocorrelation analysis:

$$r_{\ell}(m) = \sum_{n=0}^{N-1-m} \tilde{x}_{\ell}(n)\,\tilde{x}_{\ell}(n+m), \qquad m = 0, 1, \ldots, p$$

LPC analysis: to find the LPC coefficients, reflection coefficients (PARCOR), the log area ratio coefficients, the cepstral coefficients, …

Durbin's method:

$$E^{(0)} = r(0)$$

$$k_i = \frac{r(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\,r(|i-j|)}{E^{(i-1)}}, \qquad 1 \le i \le p \qquad (*)$$

$$\alpha_i^{(i)} = k_i$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\,\alpha_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

$$E^{(i)} = (1 - k_i^{2})\,E^{(i-1)}$$

Note: the summation in (*) is omitted for i = 1.

The final solution is:

$$a_m = \text{LPC coefficients} = \alpha_m^{(p)}, \qquad 1 \le m \le p$$

$$k_m = \text{PARCOR coefficients}$$

$$g_m = \text{log area ratio coefficients} = \log\!\left(\frac{1 - k_m}{1 + k_m}\right)$$
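The front-end steps and Durbin's recursion above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function names `autocorr` and `durbin`, the synthetic input signal, and the preemphasis value 0.95 (chosen from within the stated range 0.9 to 1.0) are assumptions made for the example.

```python
import numpy as np

def autocorr(x, p):
    """r(m) = sum_{n=0}^{N-1-m} x(n) x(n+m) for m = 0..p."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[:N - m], x[m:]) for m in range(p + 1)])

def durbin(r, p):
    """Durbin's recursion. Returns (a, k, g, E): LPC coefficients a_1..a_p,
    PARCOR coefficients k_1..k_p, log area ratios g_1..g_p, final error E."""
    alpha = np.zeros(p + 1)            # alpha[j] holds alpha_j^{(i)}; index 0 unused
    k = np.zeros(p + 1)
    E = r[0]
    for i in range(1, p + 1):
        # r(i) - sum_{j=1}^{i-1} alpha_j^{(i-1)} r(i-j); the sum is empty for i = 1
        acc = r[i] - np.dot(alpha[1:i], r[i - 1:0:-1])
        k[i] = acc / E
        new = alpha.copy()
        new[i] = k[i]
        for j in range(1, i):
            new[j] = alpha[j] - k[i] * alpha[i - j]
        alpha = new
        E = (1.0 - k[i] ** 2) * E
    g = np.log((1.0 - k[1:]) / (1.0 + k[1:]))
    return alpha[1:], k[1:], g, E

# Hypothetical frame: two sinusoids plus a little seeded noise
rng = np.random.default_rng(0)
N, p = 240, 10
n = np.arange(N + 1)
s = (np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.12 * n)
     + 0.1 * rng.standard_normal(N + 1))

s_pre = s[1:] - 0.95 * s[:-1]          # preemphasis with a~ = 0.95
x = s_pre * np.hamming(N)              # Hamming window, 0.54 - 0.46 cos(2 pi n/(N-1))
r = autocorr(x, p)
a, k, g, E = durbin(r, p)
print(a)
```

The recursion needs on the order of p² operations, whereas a general solve of the same Toeplitz system needs on the order of p³, and the PARCOR and log area ratio parameters fall out as by-products.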
LPC parameter conversion to cepstral coefficients:

$$c_0 = \ln \sigma^{2}$$

where $\sigma^{2}$ is the gain term in the LPC model, and

$$c_m = a_m + \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\,a_{m-k}, \qquad 1 \le m \le p$$

$$c_m = \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\,a_{m-k}, \qquad m > p$$

with $a_j = 0$ for j > p in the second sum.

Parameter weighting:

Low-order cepstral coefficients are sensitive to overall spectral slope.
High-order cepstral coefficients are sensitive to noise.
The weighting is done to minimize these sensitivities.

$$\log |S(e^{j\omega})| = \sum_{m=-\infty}^{\infty} c_m\,e^{-j\omega m}$$

$$\frac{\partial}{\partial \omega} \log |S(e^{j\omega})| = \sum_{m=-\infty}^{\infty} (-jm)\,c_m\,e^{-j\omega m}$$

so that

$$\frac{\partial}{\partial \omega} \log |S(e^{j\omega})| = \sum_{m=-\infty}^{\infty} \hat{c}_m\,e^{-j\omega m}, \qquad \hat{c}_m = c_m\,(-jm)$$

In practice the weighted cepstrum is formed as

$$\hat{c}_m = w_m\,c_m, \qquad 1 \le m \le Q$$

$$w_m = 1 + \frac{Q}{2} \sin\!\left(\frac{\pi m}{Q}\right), \qquad 1 \le m \le Q$$

Temporal cepstral derivative:

Fourier series representation of the time derivative:

$$\frac{\partial}{\partial t} \log |S(e^{j\omega}, t)| = \sum_{m=-\infty}^{\infty} \frac{\partial c_m(t)}{\partial t}\,e^{-j\omega m}$$

Approximate the derivative by an orthogonal polynomial fit over a finite-length window:

$$\frac{\partial c_m(t)}{\partial t} = \Delta c_m(t) \approx \mu \sum_{k=-K}^{K} k\,c_m(t+k)$$

where μ is a normalization constant, or optionally:

$$\Delta c_m(t) = c_m(t+K) - c_m(t-K)$$

Finally, the observation vector is

$$o_t = \big(\hat{c}_1(t), \hat{c}_2(t), \ldots, \hat{c}_Q(t), \Delta c_1(t), \Delta c_2(t), \ldots, \Delta c_Q(t)\big)$$

3.3.9 Typical LPC Analysis Parameters

N   number of samples in the analysis frame
M   number of samples shift between frames
p   LPC analysis order
Q   dimension of LPC-derived cepstral vector
K   number of frames over which cepstral time derivatives are computed

Typical Values of LPC Analysis Parameters for Speech-Recognition Systems

parameter   Fs = 6.67 kHz    Fs = 8 kHz       Fs = 10 kHz
N           300 (45 msec)    240 (30 msec)    300 (30 msec)
M           100 (15 msec)    80 (10 msec)     100 (10 msec)
p           8                10               10
Q           12               12               12
K           3                3                3
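The cepstral conversion, weighting, and temporal derivative described in Section 3.3.7 can be sketched as below. This is a sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, μ defaults to 1/Σk² (one common choice of normalization), edge frames are handled by repeating the first and last frame, and the derivative is taken on whatever cepstral matrix is passed in.

```python
import numpy as np

def lpc_to_cepstrum(a, Q, sigma2=1.0):
    """Convert LPC coefficients a = [a_1..a_p] to cepstra c_0..c_Q."""
    p = len(a)
    a_ext = np.concatenate(([0.0], np.asarray(a, dtype=float)))  # a_ext[m] = a_m
    c = np.zeros(Q + 1)
    c[0] = np.log(sigma2)
    for m in range(1, Q + 1):
        acc = a_ext[m] if m <= p else 0.0
        # a_{m-k} = 0 for m-k > p, so the sum effectively starts at k = m-p
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a_ext[m - k]
        c[m] = acc
    return c

def cepstral_weights(Q):
    """w_m = 1 + (Q/2) sin(pi m / Q) for m = 1..Q."""
    m = np.arange(1, Q + 1)
    return 1.0 + (Q / 2.0) * np.sin(np.pi * m / Q)

def delta_cepstrum(C, K, mu=None):
    """Delta c_m(t) ~ mu * sum_{k=-K}^{K} k c_m(t+k); C has shape (T, Q)."""
    C = np.asarray(C, dtype=float)
    if mu is None:
        mu = 1.0 / sum(k * k for k in range(-K, K + 1))   # assumed normalization
    T = C.shape[0]
    padded = np.pad(C, ((K, K), (0, 0)), mode="edge")      # assumed edge handling
    delta = np.zeros_like(C)
    for k in range(-K, K + 1):
        delta += k * padded[K + k:K + k + T]               # rows are c(t+k)
    return mu * delta

# Hypothetical usage with Q = 12, K = 3 and a first-order predictor a_1 = 0.5
Q, K = 12, 3
c = lpc_to_cepstrum([0.5], Q)
c_hat = cepstral_weights(Q) * c[1:]        # weighted cepstrum, m = 1..Q

C = np.tile(c[1:], (10, 1))                # ten identical frames for illustration
o = np.hstack([C * cepstral_weights(Q), delta_cepstrum(C, K)])
print(o.shape)  # (10, 24): Q weighted cepstra followed by Q derivatives per frame
```

For a single-pole model the recursion reproduces the known closed form c_m = a_1^m / m, which gives a quick sanity check on the index handling for m > p.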