ECE 8443 – Pattern Recognition
ECE 8423 – Adaptive Signal Processing

LECTURE 04: LINEAR PREDICTION

• Objectives: The Linear Prediction Model; The Autocorrelation Method; Levinson and Durbin Recursions; Spectral Modeling; Inverse Filtering and Deconvolution

• Resources: ECE 4773: Intro to DSP; ECE 8463: Fund. of Speech; WIKI: Minimum Phase; Markel and Gray: Linear Prediction; Deller: DT Processing of Speech; AJR: LP Modeling of Speech; MC: MATLAB Demo

• URL: .../publications/courses/ece_8423/lectures/current/lecture_04.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_04.mp3

The Linear Prediction (LP) Model

• Consider a pth-order linear prediction model:

  \hat{x}(n) = \sum_{i=1}^{p} a_i \, x(n - n_0 - i)

  Without loss of generality, assume n_0 = 0.

• The prediction error is defined as:

  e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i \, x(n-i)

  [Block diagram: x(n) enters a predictor with coefficients {a_i}; the prediction \hat{x}(n) is subtracted from x(n) to form e(n).]

• We can define an objective function:

  J = E\{e^2(n)\} = E\{[x(n) - \hat{x}(n)]^2\} = E\left\{\left[x(n) - \sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}
    = E\{x^2(n)\} - 2\sum_{i=1}^{p} a_i E\{x(n) x(n-i)\} + E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}

ECE 8423: Lecture 04, Slide 1

Minimization of the Objective Function

• Differentiate w.r.t. a_l:

  \frac{\partial J}{\partial a_l} = -2 E\{x(n) x(n-l)\} + 2 E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right] x(n-l)\right\} = 0

• Rearranging terms:

  E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right] x(n-l)\right\} = E\{x(n) x(n-l)\}

• Interchanging the order of summation and expectation on the left (why? both are linear operations):

  \sum_{i=1}^{p} a_i E\{x(n-i) x(n-l)\} = E\{x(n) x(n-l)\}

• Define a covariance function:

  c(i,j) = E\{x(n-i) x(n-j)\}

ECE 8423: Lecture 04, Slide 2

The Yule-Walker Equations (aka Normal Equations)

• We can rewrite our prediction equation as:

  \sum_{i=1}^{p} a_i \, c(i,l) = c(0,l), \quad l = 1, 2, \ldots, p

• This is known as the Yule-Walker equation. Its solution produces what we refer to as the Covariance Method for linear prediction:

  a_1 c(1,1) + a_2 c(2,1) + \ldots + a_p c(p,1) = c(0,1)
  a_1 c(1,2) + a_2 c(2,2) + \ldots + a_p c(p,2) = c(0,2)
  ...
  a_1 c(1,p) + a_2 c(2,p) + \ldots + a_p c(p,p) = c(0,p)

• We can write this set of p equations in matrix form, Ca = c, and can easily solve for the prediction coefficients: a = C^{-1} c, where:

  a = [a_1, a_2, \ldots, a_p]^T
  C = [c(i,j)], \; i,j = 1, \ldots, p \quad (a \; p \times p \; matrix)
  c = [c(0,1), c(0,2), \ldots, c(0,p)]^T

• Note that the covariance matrix is symmetric: c(1,2) = c(2,1).

ECE 8423: Lecture 04, Slide 3

Autocorrelation Method

• C is a covariance matrix, which means it has some special properties:
  - Symmetric: under what conditions does its inverse exist?
  - Fast inversion: we can factor this matrix into upper and lower triangular matrices and derive a fast algorithm for inversion known as the Cholesky decomposition.

• If we assume stationary inputs, we can convert covariances to correlations, giving the system Ra = r:

  R = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix}, \quad r = [r(1), r(2), \ldots, r(p)]^T

• This is known as the Autocorrelation Method. This matrix is symmetric, but is also Toeplitz (constant along each diagonal), which means the inverse can be performed efficiently using an iterative algorithm we will introduce shortly.

• Note that the Covariance Method requires p(p+1)/2 unique values for the matrix, and p values for the associated vector. A fast algorithm, known as the Factored Covariance Algorithm, exists to compute C.

• The Autocorrelation Method requires only p+1 values, r(0), ..., r(p), to produce p LP coefficients.

ECE 8423: Lecture 04, Slide 4

Linear Prediction Error

• Recall our expression for J, the prediction error energy:

  J = E\{e^2(n)\} = E\{[x(n) - \hat{x}(n)]^2\} = E\left\{\left[x(n) - \sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}

• We can substitute our expression for the predictor coefficients, and show:

  J = r(0) - \sum_{i=1}^{p} a_i \, r(i) \quad (Autocorrelation Method)
  J = c(0,0) - \sum_{i=1}^{p} a_i \, c(0,i) \quad (Covariance Method)

• These relations are significant because they show the error obeys the same linear prediction equation that we applied to the signal.
• This result has two interesting implications:
  - Missing values of the autocorrelation function can be calculated using this relation under certain assumptions (e.g., maximum entropy).
  - The autocorrelation function shares many properties with the linear prediction model (e.g., minimum phase). In fact, the two representations are interchangeable.

ECE 8423: Lecture 04, Slide 5

Linear Filter Interpretation of Linear Prediction

• Recall our expression for the error signal:

  e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i \, x(n-i)

• We can rewrite this using the z-transform:

  E(z) = Z\{e(n)\} = X(z) - X(z) \sum_{i=1}^{p} a_i z^{-i} = X(z) \left[1 - \sum_{i=1}^{p} a_i z^{-i}\right]

• This implies we can view the computation of the error as a filtering process:

  E(z) = X(z) A(z), \quad where \; A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}

  [Block diagram: x(n) → H(z) = A(z) → e(n)]

• This, of course, implies we can invert the process and generate the original signal from the error signal:

  [Block diagram: e(n) → H(z) = 1/A(z) → x(n)]

• This rather remarkable view of the process exposes some important questions about the nature of this filter:
  - A(z) is an FIR filter. Under what conditions is it minimum phase?
  - Under what conditions is the inverse, 1/A(z), stable?

ECE 8423: Lecture 04, Slide 6

Residual Error

• To the right are some examples of the linear prediction error for voiced speech signals. [Figure: speech waveforms and their LP residuals]

• The points where the prediction error peaks are points in the signal where the signal is least predictable by a linear prediction model. In the case of voiced speech, this relates to the manner in which the signal is produced.

• Speech compression and synthesis systems exploit the linear prediction model as a first-order attempt to remove redundancy from the signal.

• The LP model, a = C^{-1} c, is independent of the energy of the input signal. It is also independent of the phase of the input signal because the LP filter is a minimum phase filter.
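The analysis/synthesis view above can be sketched numerically. The following Python sketch is illustrative and not part of the original slides; the function names and the AR(2) test signal are invented for the demo. It fits LP coefficients with the Autocorrelation Method, computes the residual e(n) by filtering through A(z), and then reconstructs x(n) exactly through 1/A(z):

```python
import numpy as np

def autocorr_lp(x, p):
    """Autocorrelation Method: solve the Toeplitz normal equations R a = r."""
    N = len(x)
    # Autocorrelation estimates r(0), ..., r(p)
    r = np.array([np.dot(x[:N - k], x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def lp_residual(x, a):
    """Analysis filter A(z): e(n) = x(n) - sum_i a_i x(n-i)."""
    e = x.copy()
    for i, ai in enumerate(a, start=1):
        e[i:] -= ai * x[:-i]
    return e

def lp_synthesize(e, a):
    """Synthesis filter 1/A(z): x(n) = e(n) + sum_i a_i x(n-i)."""
    x = np.zeros_like(e)
    for n in range(len(e)):
        x[n] = e[n] + sum(ai * x[n - i]
                          for i, ai in enumerate(a, start=1) if n >= i)
    return x

# Demo on a synthetic AR(2) signal (illustrative parameters):
rng = np.random.default_rng(0)
w = rng.standard_normal(4000)
x = np.zeros_like(w)
for n in range(2, len(w)):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + w[n]

a = autocorr_lp(x, 2)        # close to the true [0.75, -0.5]
e = lp_residual(x, a)
x_rec = lp_synthesize(e, a)  # analysis followed by synthesis is lossless
```

Because A(z) from the Autocorrelation Method is minimum phase, the synthesis filter 1/A(z) is stable here, and the residual has substantially less energy than the input signal.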
ECE 8423: Lecture 04, Slide 7

Durbin Recursion

• There are several efficient algorithms to compute the LP coefficients without doing a matrix inverse. One of the most popular and insightful is known as the Durbin recursion:

  E^{(0)} = r(0)
  k_i = \left[ r(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} r(i-j) \right] / E^{(i-1)}
  a_i^{(i)} = k_i
  a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1
  E^{(i)} = (1 - k_i^2) E^{(i-1)}, \quad 1 \le i \le p

• The intermediate coefficients, {k_i}, are referred to as reflection coefficients. To compute a pth-order model, all orders from 1 to p are computed.

• This recursion is significant for several reasons:
  - The error energy decreases as the LP order increases, indicating the model continually improves.
  - There is a one-to-one mapping between {r_i}, {k_i}, and {a_i}.
  - For the LP filter to be stable, |k_i| < 1. Note that the Autocorrelation Method guarantees the filter to be stable; the Covariance Method does not.

ECE 8423: Lecture 04, Slide 8

The Burg Algorithm

• Digital filters can be implemented using many different forms. One very important and popular form is a lattice filter, shown to the right. [Figure: lattice filter with forward error e^{(i)}(m) and backward error b^{(i)}(m)]

• Itakura showed the {k_i}'s can be computed directly:

  k_i = \frac{\sum_{m=0}^{N-1} e^{(i-1)}(m) \, b^{(i-1)}(m-1)}
             {\left[ \sum_{m=0}^{N-1} (e^{(i-1)}(m))^2 \; \sum_{m=0}^{N-1} (b^{(i-1)}(m-1))^2 \right]^{1/2}}

• Burg demonstrated that the LP approach can be viewed as a maximum entropy spectral estimate, and derived an expression for the reflection coefficients that guarantees -1 \le k_i \le 1:

  k_i = \frac{2 \sum_{m=0}^{N-1} e^{(i-1)}(m) \, b^{(i-1)}(m-1)}
             {\sum_{m=0}^{N-1} (e^{(i-1)}(m))^2 + \sum_{m=0}^{N-1} (b^{(i-1)}(m-1))^2}

• Makhoul showed that a family of lattice-based formulations exists.

• Most importantly, the filter coefficients can be updated in real time in O(n).
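The Durbin recursion translates almost line for line into code. A minimal illustrative sketch (the function name and test values are invented for the demo, not taken from the slides):

```python
import numpy as np

def durbin(r, p):
    """Durbin recursion: solve the Toeplitz normal equations in O(p^2).

    r : autocorrelation values r(0), ..., r(p)
    Returns the predictor coefficients {a_i}, the reflection
    coefficients {k_i}, and the final error energy E^(p).
    """
    a = np.zeros(p)
    k = np.zeros(p)
    E = r[0]                                   # E^(0) = r(0)
    for i in range(1, p + 1):
        # k_i = [r(i) - sum_{j=1}^{i-1} a_j^(i-1) r(i-j)] / E^(i-1)
        ki = (r[i] - np.dot(a[:i - 1], r[i - 1:0:-1])) / E
        k[i - 1] = ki
        a_prev = a.copy()
        a[i - 1] = ki                          # a_i^(i) = k_i
        for j in range(1, i):                  # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1)
            a[j - 1] = a_prev[j - 1] - ki * a_prev[i - j - 1]
        E = (1.0 - ki * ki) * E                # E^(i) = (1 - k_i^2) E^(i-1)
    return a, k, E

# r(k) = 0.5**k is the autocorrelation of an AR(1) process with a_1 = 0.5,
# so a third-order fit should return a = [0.5, 0, 0]:
r = np.array([1.0, 0.5, 0.25, 0.125])
a, k, E = durbin(r, 3)
```

Note how the one-to-one mapping shows up directly: the recursion consumes {r_i} and produces both {k_i} and {a_i}, and every |k_i| < 1 here, consistent with a stable filter from the Autocorrelation Method.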
ECE 8423: Lecture 04, Slide 9

The Autoregressive Model

• Suppose we model our signal as the output of a linear filter with a white noise input:

  [Block diagram: w(n) → H(z) = 1/A(z) → x(n)]

• The inverse LP filter can be thought of as an all-pole (IIR) filter:

  H(z) = \frac{1}{A(z)} = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2} - \ldots - a_p z^{-p}}

• This is referred to as an autoregressive (AR) model.

• The system might actually be a mixed model, referred to as an autoregressive moving average (ARMA) model:

  H(z) = \frac{B(z)}{A(z)} = \frac{1 + b_1 z^{-1} + b_2 z^{-2} + \ldots + b_q z^{-q}}{1 - a_1 z^{-1} - a_2 z^{-2} - \ldots - a_p z^{-p}}

• The LP model can still approximate such a system because a single zero can be expanded into an infinite series of poles (for |a_1 z^{-1}| < 1):

  \frac{1}{1 + a_1 z^{-1}} = 1 + (-a_1) z^{-1} + (-a_1)^2 z^{-2} + \ldots

  Hence, even if the system has poles and zeroes, the LP model is capable of approximating the system's overall impulse or frequency response.

ECE 8423: Lecture 04, Slide 10

Spectral Matching and Blind Deconvolution

• Recall our expression for the error energy:

  E^{(i)} = (1 - k_i^2) E^{(i-1)}

• The LP filter becomes increasingly more accurate as you increase the order of the model.

• We can interpret this as a spectral matching process. [Figure: signal spectrum overlaid with LP model spectra of increasing order] As the order increases, the LP model better models the envelope of the spectrum of the original signal.

• The LP model attempts to minimize the error equally across the entire spectrum.

• If the spectrum of the input signal has a systematic variation, such as a bandpass filter shape or a spectral tilt, the LP model will attempt to model this. Therefore, we typically pre-whiten the signal before LP analysis.

• The process by which the LP filter learns the spectrum of the input signal is often referred to as blind deconvolution.

ECE 8423: Lecture 04, Slide 11

Summary

• There are many interpretations and motivations for linear prediction, ranging from minimum mean-square error estimation to maximum entropy spectral estimation.

• There are many implementations of the filter, including the direct form and the lattice representation.
• There are many representations for the coefficients, including predictor and reflection coefficients.

• The LP approach can be extended to estimate the parameters of most digital filters, and can also be applied to the problem of digital filter design.

• The filter can be estimated in batch mode using a frame-based analysis, or it can be updated on a sample-by-sample basis using a sequential or iterative estimator. Hence, the LP model is our first adaptive filter. Such a filter can be viewed as a time-varying digital filter that tracks a signal in real time.

• Under appropriate Gaussian assumptions, LP analysis can be shown to be a maximum likelihood estimate of the model parameters.

• Further, two models can be compared using a metric called the log likelihood ratio. Many other metrics exist to compare such models, including cepstral and principal components approaches.

ECE 8423: Lecture 04, Slide 12
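As a closing numerical illustration of the relation E^(i) = (1 − k_i²) E^(i−1) (an invented example, not from the slides): fitting increasing orders to a synthetic AR(2) signal shows the error energy never increases with order, and most of the drop is captured once the model order reaches the true order of the signal.

```python
import numpy as np

# Synthetic stable AR(2) test signal driven by unit-variance white noise
rng = np.random.default_rng(1)
w = rng.standard_normal(8000)
x = np.zeros_like(w)
for n in range(2, len(w)):
    x[n] = 1.2 * x[n - 1] - 0.6 * x[n - 2] + w[n]

p = 6
N = len(x)
r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])

# Durbin recursion, keeping the error energy E^(i) at every order
a = np.zeros(p)
energies = [r[0]]          # E^(0) = r(0)
E = r[0]
for i in range(1, p + 1):
    ki = (r[i] - np.dot(a[:i - 1], r[i - 1:0:-1])) / E
    a_new = a.copy()
    a_new[i - 1] = ki
    for j in range(1, i):
        a_new[j - 1] = a[j - 1] - ki * a[i - j - 1]
    a = a_new
    E = (1.0 - ki * ki) * E
    energies.append(E)

# energies is non-increasing; by order 2 (the true model order) the error
# energy is close to the driving-noise variance, and higher orders add little.
```

This is the adaptive-filtering story in miniature: each added stage whitens the residual a little more, and once the model captures the signal's structure, further stages contribute essentially nothing.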