Chapter 10 Estimation

10.1 Introduction

(1) As mentioned in the previous chapter, the problem to be solved in estimation is to recover the "true" signal from a measured signal. A typical example is to estimate the signal, $\hat{S}(t)$, from a measured noisy signal $X(t) = S(t) + N(t)$, where $N(t)$ is white noise.

(2) The problem of estimation can also be viewed as filtering, since it filters the noise out of the signal.

(3) In this chapter, we first discuss linear regression, which correlates two measurements. Then we discuss how to relate the measurement to the signal. This is done using filters. Various filters will be discussed, such as minimum mean square error (MSE) filters, followed by Kalman filters and Wiener filters. However, we start the discussion with linear regression.

10.2 Linear Regression

(1) The idea of estimation originates from linear regression, so we discuss linear regression first. Linear regression is also one of the most commonly used techniques in engineering.

(2) An example of linear regression: in an electrical circuit, the current is determined by the voltage. From a series of experiments, the following data set is obtained:

voltage (X): -2, -1, 0, 1, 2
current (Y): -4, -1, 0, 1, 4

Find the correlation between the voltage and the current.
- First, we assume there is a linear relationship between X and Y; hence, Y can be estimated by:

$$\hat{Y} = b_0 + b_1 X$$

Now we want to find $b_0$ and $b_1$. The error between the true value of Y and the estimate $\hat{Y}$ is:

$$e = Y - \hat{Y}$$

Thus, we have the linear regression model:

$$Y = b_0 + b_1 X + e$$

- Suppose we have n observations, $(X_i, Y_i)$, $i = 1, 2, \ldots, n$ (n = 5 in the above example). Then the parameters $b_0$ and $b_1$ can be obtained by minimizing the mean square error (MSE):

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$

- The solution to this minimization problem can be found using simple calculus:

$$\frac{\partial \sum e_i^2}{\partial b_0} = -2\sum_{i} (Y_i - b_0 - b_1 X_i) = 0$$

$$\frac{\partial \sum e_i^2}{\partial b_1} = -2\sum_{i} X_i (Y_i - b_0 - b_1 X_i) = 0$$

Let:

$$\bar{X} = \frac{1}{n}\sum X_i, \qquad \bar{Y} = \frac{1}{n}\sum Y_i$$

It can be shown that:

$$\hat{b}_1 = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}, \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$$

- In the above example, $\bar{X} = 0$, $\bar{Y} = 0$, $\hat{b}_1 = 18/10 = 1.8$, and $\hat{b}_0 = 0$.

(3) The estimation error
- Given a set of data, one can always fit a linear regression model. However, it may not be a good fit.
- Whether the model fits the data well can be determined from the residual error. From the linear regression model:

$$e_i = Y_i - \hat{Y}_i = (Y_i - \bar{Y}) - (\hat{Y}_i - \bar{Y})$$

Squaring and summing both sides:

$$\sum (Y_i - \hat{Y}_i)^2 = \sum \left[(Y_i - \bar{Y}) - (\hat{Y}_i - \bar{Y})\right]^2 = \sum (Y_i - \bar{Y})^2 - 2\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y}) + \sum (\hat{Y}_i - \bar{Y})^2$$

Through a series of calculus steps it can be shown that:

$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2$$

or:

$$S = S_1 + S_2$$

- This indicates that the sum of squared errors about the mean consists of two parts: the sum of squared errors about the regression line, $S_1$, which has n - 2 degrees of freedom (there are two parameters in the linear regression model), and the sum of squared differences between the regression and the mean, $S_2$, which has 1 degree of freedom. If the model is a good fit, $S_1$ will be small; on the other hand, if the fit is poor, $S_1$ will be large.
- It can be shown that:

$$F = \frac{S_2 / 1}{S_1 / (n-2)} \sim F(1, n-2)$$

This can be used to check whether the model is a good fit. In the above example,

$$F = \frac{32.4 / 1}{1.6 / 3} = 60.75$$

Since $F_{1,3}(0.1) = 5.54 < F$, the model is a good fit.
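To make these formulas concrete, here is a minimal Python sketch (assuming NumPy is available; all variable names are chosen for illustration only) that reproduces the regression coefficients and the F statistic for the voltage-current example above.

```python
import numpy as np

# Voltage-current data from the example above
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
Y = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
n = len(X)

# Least-squares estimates of the regression coefficients
Xbar, Ybar = X.mean(), Y.mean()
b1 = (np.sum(X * Y) - n * Xbar * Ybar) / (np.sum(X**2) - n * Xbar**2)
b0 = Ybar - b1 * Xbar            # b1 = 1.8, b0 = 0.0

# Decompose the total sum of squares: S = S1 + S2
Yhat = b0 + b1 * X
S1 = np.sum((Y - Yhat)**2)       # residual sum of squares, n - 2 d.o.f.
S2 = np.sum((Yhat - Ybar)**2)    # regression sum of squares, 1 d.o.f.

# F statistic for goodness of fit
F = (S2 / 1) / (S1 / (n - 2))    # 60.75 for this data set
print(b1, b0, S1, S2, F)
```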
10.3 Linear Minimum Mean Squared Error Estimators

(1) Now let us consider a simple example of signal estimation. Suppose that X and S are random variables (not random processes yet):

$$X = S + N$$

Having observed X, we seek a linear estimator of S,

$$\hat{S} = a + hX$$

such that $E\{(S - \hat{S})^2\}$ is minimized by the choice of a and h.
- Using the same method as in linear regression (mean square error, or MSE, estimation):

$$\frac{\partial}{\partial a} E\{(S - a - hX)^2\} = -2E\{S - a - hX\} = 0$$

$$\frac{\partial}{\partial h} E\{(S - a - hX)^2\} = -2E\{(S - a - hX)X\} = 0$$

Hence:

$$\mu_S - a - h\mu_X = 0$$

$$E\{XS\} - a\mu_X - hE\{X^2\} = 0$$

Solving these equations, it follows that:

$$h = \frac{\sigma_{XS}}{\sigma_{XX}}, \qquad a = \mu_S - h\mu_X$$

- This is the linear MSE estimator. The error of the estimation is:

$$E\{(S - a - hX)^2\} = \sigma_{SS} - \frac{\sigma_{XS}^2}{\sigma_{XX}}$$

(2) Note:
- This derivation holds regardless of the distribution of the noise N (hence, we cannot simply use the mean as the estimator).
- In theory, we can find a and h with just one observation X, provided the statistics $\mu_X$, $\mu_S$, $\sigma_{XX}$, $\sigma_{XS}$ and $\sigma_{SS}$ are known. However, this is not very accurate. To improve the accuracy, we need multiple observations X(1), X(2), ..., X(n).

(3) MSE estimation using multiple observations
- Assume we have n observations, X(1), X(2), ..., X(n), and let:

$$\hat{S} = h_0 + \sum_{i=1}^{n} h_i X(i)$$

- Our goal is to find $h_i$, i = 0, 1, ..., n, such that $E\{(S - \hat{S})^2\}$ is minimized. Using the same method (mean square error estimation) again, we have:

$$E\left\{S - h_0 - \sum_{i=1}^{n} h_i X(i)\right\} = 0$$

$$E\left\{\left[S - h_0 - \sum_{i=1}^{n} h_i X(i)\right] X(j)\right\} = 0, \quad j = 1, 2, \ldots, n$$

Solving these equations results in:

$$h_0 = \mu_S - \sum_{i=1}^{n} h_i \mu_{X(i)}$$

$$\sum_{i=1}^{n} h_i C_{XX}(i, j) = \sigma_{SX}(j), \quad j = 1, 2, \ldots, n$$

where

$$C_{XX}(i, j) = E\{[X(i) - \mu_{X(i)}][X(j) - \mu_{X(j)}]\}, \qquad \sigma_{SX}(j) = E\{[S - \mu_S][X(j) - \mu_{X(j)}]\}$$

(4) The estimation error is the residual:

$$P = E\{(S - \hat{S})^2\}$$

Using the orthogonality conditions, it can be shown that the estimation error is:

$$P(n) = \sigma_{SS} - \sum_{i=1}^{n} h_i \sigma_{SX}(i)$$

Adding an observation can only reduce (or leave unchanged) the minimum achievable error, since setting the corresponding coefficient to zero recovers the previous estimator; hence, the more observations, the smaller the estimation error.

(5) The matrix form
- The MSE estimation can be written in vector form. Let:

$$\mathbf{X}^T = [X(1), X(2), \ldots, X(n)], \qquad \mathbf{h}^T = [h_1, h_2, \ldots, h_n]$$

$$\mathbf{C}_{XX} = [C_{XX}(i, j)], \qquad \boldsymbol{\sigma}_{SX}^T = [\sigma_{SX}(1), \sigma_{SX}(2), \ldots, \sigma_{SX}(n)]$$

Then the solution becomes:

$$\mathbf{h} = \mathbf{C}_{XX}^{-1} \boldsymbol{\sigma}_{SX}$$

(6) An example (the numbers are reproduced in the short sketch at the end of this section): suppose we have two observations, X(1) and X(2), and it is known that:

$$\mu_{X(1)} = 0.02, \quad C_{XX}(1, 1) = (0.005)^2$$

$$\mu_{X(2)} = 0.006, \quad C_{XX}(2, 2) = (0.0009)^2$$

Also, $C_{XX}(1, 2) = 0$. Suppose furthermore that the linear MSE estimator is

$$\hat{S} = h_0 + h_1 X(1) + h_2 X(2)$$

where

$$\mu_S = 1, \quad \sigma_S = 0.01, \quad \sigma_{SX}(1) = 0.00003, \quad \sigma_{SX}(2) = 0.000004$$

- Using the matrix form:

$$\begin{bmatrix} (0.005)^2 & 0 \\ 0 & (0.0009)^2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} 0.00003 \\ 0.000004 \end{bmatrix}$$

Hence, $h_1 = 1.2$, $h_2 = 4.94$, and $h_0 = 1 - (0.02)(1.2) - (0.006)(4.94) = 0.946$.

Finally, the estimation error is:

$$P(2) = E\{(S - \hat{S})^2\} = (0.01)^2 - (1.2)(0.00003) - (4.94)(0.000004) = 0.000044$$

(7) Note:
- In the above discussion, it is assumed that the signal is a linear function of the measurement. Otherwise, the estimator may have a large error.
- In the estimation, it is necessary to know the statistics of the signal (mean and covariance).
- It is assumed that the signal S is a constant. If the signal is a sequence, we have to use the filtering techniques presented below.
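To make the matrix-form solution concrete, the following is a minimal Python sketch (assuming NumPy is available; all variable names are illustrative) that reproduces the two-observation example above using $\mathbf{h} = \mathbf{C}_{XX}^{-1}\boldsymbol{\sigma}_{SX}$.

```python
import numpy as np

# Known statistics from the example (means, covariances)
mu_X = np.array([0.02, 0.006])            # E{X(1)}, E{X(2)}
C_XX = np.diag([0.005**2, 0.0009**2])     # observation covariance matrix
mu_S, var_S = 1.0, 0.01**2                # mean and variance of S
sigma_SX = np.array([0.00003, 0.000004])  # cross-covariances of S with X(1), X(2)

# Linear MMSE coefficients: h = C_XX^{-1} sigma_SX, h0 = mu_S - h . mu_X
h = np.linalg.solve(C_XX, sigma_SX)       # approximately [1.2, 4.94]
h0 = mu_S - h @ mu_X                      # approximately 0.946

# Minimum mean square error: P = var_S - h . sigma_SX
P = var_S - h @ sigma_SX                  # approximately 4.4e-5
print(h, h0, P)
```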
10.4 Filters

(1) Introduction
- In this section we consider the problem of estimating a random sequence, S(n), from observations of a signal sequence, X(m).
- Note that we have two sequences, S(n) and X(m):
  - If m > n, we are filtering the signal.
  - If m < n, we are predicting the signal.

(2) Digital filter
- Suppose the relationship between the signal S and the measurement X is as follows:

$$X(n) = S(n) + v(n)$$

where S(n) is a zero-mean signal random sequence, v(n) is a zero-mean white noise sequence, and S(n) and v(n) are uncorrelated.
- Note that we assume S(n) is a zero-mean random sequence. This is in fact a rather general form of signal. For example, if a signal has a linear trend, we can remove the trend by curve fitting. As another example, a deterministic signal can be viewed as a special random signal with just one member function.
- Again, we assume the estimator, $\hat{S}(n)$, is a linear combination of the observations:

$$\hat{S}(n) = \sum_{k} h(k) X(n - k)$$

- Similar to the study above, the linear minimum mean square error (MSE) estimator is obtained by minimizing

$$E\{[S(n) - \hat{S}(n)]^2\}$$

and can be determined by applying the orthogonality conditions:

$$E\left\{\left[S(n) - \sum_{k} h(k) X(n - k)\right] X(n - i)\right\} = 0, \quad \text{for all } i$$

or:

$$R_{SX}(i) = \sum_{k} h(k) R_{XX}(i - k), \quad \text{for all } i$$

- Furthermore, the error of the estimation is given by:

$$P = E\{[S(n) - \hat{S}(n)]^2\} = R_{SS}(0) - \sum_{k} h(k) R_{SX}(k)$$

(3) The frequency response of the estimator
- Applying the Fourier transform to both sides of the above equation, we have:

$$S_{SX}(f) = H(f)\, S_{XX}(f), \quad |f| < \tfrac{1}{2}$$

or:

$$H(f) = \frac{S_{SX}(f)}{S_{XX}(f)}, \quad |f| < \tfrac{1}{2}$$

- This represents the correlation between the signal and the measurement. In particular, H(f) is called a filter, as it removes the noise effect from the measured signal X(f), leaving the signal S(f).
- Consider the estimation error:

$$P = R_{SS}(0) - \sum_{k} h(k) R_{SX}(k)$$

Taking the Fourier transform gives the error spectral density:

$$P(f) = S_{SS}(f) - H(f)\, S_{XS}(f), \quad |f| < \tfrac{1}{2}$$

and the estimation error is recovered by the inverse Fourier transform (evaluated at lag zero):

$$P = \int_{-1/2}^{1/2} P(f)\, df$$

- Problem: the solution involves an infinite sum, and the sum includes terms with future observations. Hence, it cannot be used in practice.
- Solution: filtering. In particular, we will discuss two types of filters:
  - Kalman filters
  - Wiener filters

10.5 Kalman Filters

(1) Let us start from the model:
- The model of the signal (a Markov process driven by Gaussian noise):

$$S(n) = a(n) S(n-1) + w(n), \qquad w(n) \sim N(0, \sigma_w)$$

- The measured signal:

$$X(n) = S(n) + v(n), \qquad v(n) \sim N(0, \sigma_v)$$

- Objective: find $\hat{S}(n+1)$ based on the observations X(1), X(2), ..., X(n).

(2) The basic idea:
- If S(n) = S is a constant and v(n) is zero-mean noise, then

$$\hat{S}(n) = \frac{X(1) + X(2) + \cdots + X(n-1)}{n-1}$$

and:

$$\hat{S}(n+1) = \frac{X(1) + X(2) + \cdots + X(n)}{n}$$

- Thus, we have the recursive estimation formula:

$$\hat{S}(n+1) = \frac{n-1}{n}\hat{S}(n) + \frac{1}{n}X(n)$$

- Note that with the recursive form we do not have to worry about the infinite sum. However, we are interested in the case where S(n) is not a constant. The recursive form then becomes:

$$\hat{S}(n+1) = a(n)\hat{S}(n) + b(n)X(n)$$

or, with a redefined gain b(n):

$$\hat{S}(n+1) = a(n)\hat{S}(n) + b(n)V_X(n)$$

where $V_X(n) = X(n) - \hat{S}(n)$. This implies that the next estimate is a linear combination of the previous estimate and an error term. Now, all we have to do is determine a(n) and b(n). Using the MSE criterion, Kalman derived a set of recursive equations as follows.

(3) The Kalman filter procedure
- Initialization:
  n = 1; $p(1) = \sigma^2$ (with $\sigma$ larger than $\sigma_w$ and $\sigma_v$); $\hat{S}(1) = \bar{S}$ (a constant initial guess).
- Iteration:
  1. Get the data: $\sigma_w(n)$, $\sigma_v(n)$, $a(n+1)$, $X(n)$

$$k(n) = \frac{p(n)}{p(n) + \sigma_v^2(n)}$$

$$\hat{S}(n+1) = a(n+1)\left\{\hat{S}(n) + k(n)[X(n) - \hat{S}(n)]\right\}$$

$$p(n+1) = a^2(n+1)[1 - k(n)]\,p(n) + \sigma_w^2(n)$$

  n = n + 1; go to 1.
- There is a five-page proof in the textbook. The procedure can be extended to matrix form.

(4) An example:
- The model of the signal:

$$S(n) = 0.6\,S(n-1) + w(n), \qquad w(n) \sim N(0, \tfrac{1}{2})$$

$$X(n) = S(n) + v(n), \qquad v(n) \sim N(0, \tfrac{1}{\sqrt{2}})$$

(so $\sigma_w^2 = \tfrac{1}{4}$ and $\sigma_v^2 = \tfrac{1}{2}$)

- The Kalman filter:
  Initialization: n = 1; p(1) = 1; $\hat{S}(1) = 0$.
  Iteration 1:

$$k(1) = \frac{p(1)}{p(1) + \sigma_v^2(1)} = \frac{1}{1 + \tfrac{1}{2}} = 0.6667$$

$$\hat{S}(2) = a(2)\{\hat{S}(1) + k(1)[X(1) - \hat{S}(1)]\} = (0.6)\{0 + (0.6667)[X(1) - 0]\} = 0.4X(1)$$

$$p(2) = a^2(2)[1 - k(1)]p(1) + \sigma_w^2(1) = (0.6)^2[1 - 0.6667](1) + \tfrac{1}{4} = 0.37$$

  n = 1 + 1 = 2
  Iteration 2:

$$k(2) = \frac{p(2)}{p(2) + \sigma_v^2(2)} = \frac{0.37}{0.37 + \tfrac{1}{2}} = 0.425$$

$$\hat{S}(3) = a(3)\{\hat{S}(2) + k(2)[X(2) - \hat{S}(2)]\} = 0.138X(1) + 0.255X(2)$$

$$p(3) = a^2(3)[1 - k(2)]p(2) + \sigma_w^2(2) = 0.326$$

  n = 2 + 1 = 3
  Iteration 3: ...

- Note:
  - The Kalman filter is a time-dependent function of the measured signals.
  - The model of the signal (a Markov process with time-dependent coefficient a(n)) must be known.
  - The statistics of the noises ($\sigma_w$ and $\sigma_v$) must be known.
  - Kalman filter calculations can also be carried out in MATLAB or with a few lines of code, as sketched below.
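The following Python sketch (assuming NumPy; the function name and arguments are illustrative, and a(n) is taken as a constant) implements the scalar recursion above and reproduces the hand calculation of the example for given observations.

```python
import numpy as np

def kalman_scalar(x, a=0.6, var_w=0.25, var_v=0.5, p1=1.0, s1=0.0):
    """Scalar Kalman filter for S(n) = a*S(n-1) + w(n), X(n) = S(n) + v(n).

    x : observed sequence X(1), X(2), ...
    Returns the predictions S_hat(2), S_hat(3), ... and the error
    variances p(2), p(3), ...
    """
    p, s_hat = p1, s1
    preds, variances = [], []
    for x_n in x:
        k = p / (p + var_v)                      # Kalman gain k(n)
        s_hat = a * (s_hat + k * (x_n - s_hat))  # S_hat(n+1)
        p = a**2 * (1 - k) * p + var_w           # p(n+1)
        preds.append(s_hat)
        variances.append(p)
    return np.array(preds), np.array(variances)

# Reproduce the hand calculation: with X(1) = 1 and X(2) = 1 we expect
# S_hat(2) = 0.4*X(1) = 0.4 and S_hat(3) = 0.138*X(1) + 0.255*X(2) = 0.393,
# with p(2) = 0.37 and p(3) = 0.326.
preds, variances = kalman_scalar(np.array([1.0, 1.0]))
print(preds, variances)
```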
(5) The limit of the filter
- If a(n) does not vary with n, and w(n) and v(n) are both stationary (i.e., $\sigma_w$ and $\sigma_v$ are constants), then both k(n) and p(n) approach limits as n approaches infinity.
- The limiting k(n) and p(n) can be found by setting p(n + 1) = p(n) = p, which gives:

$$\lim_{n \to \infty} p(n) = p = \frac{\sigma_w^2 + \sigma_v^2(a^2 - 1) + \sqrt{\left[\sigma_w^2 + \sigma_v^2(a^2 - 1)\right]^2 + 4\sigma_v^2\sigma_w^2}}{2}$$

- Following the example above:

$$p = \frac{0.25 + (0.5)(0.36 - 1) + \sqrt{\left[0.25 + (0.5)(0.36 - 1)\right]^2 + (4)(0.5)(0.25)}}{2} \approx 0.32$$

$$k = \frac{p}{p + \sigma_v^2} = \frac{0.32}{0.32 + 0.5} \approx 0.39$$

10.6 Wiener Filters

(1) Introduction
- The Wiener filter is also based on minimizing the MSE and on the use of the orthogonality conditions.
-