Chapter 10 Estimation

10.1 Introduction
(1) As mentioned in the previous section, the problem to be solved in estimation is to
estimate the “true” signal from a measured signal. A typical example is to form an
estimate, Ŝ(t), of the signal S(t) from a measured noisy signal X(t) = S(t) + N(t), where
N(t) is white noise.
(2) The problem of estimation can also be viewed as filtering, since it filters the noise out
of the measured signal.
(3) In this chapter, we first discuss linear regression, which correlates two
measurements. Then, we discuss how to relate the measurement to the signal; this is
done with filters. Various filters will be discussed, such as minimum mean square
error (MSE) filters, followed by Kalman filters and Wiener filters. We start, however,
with linear regression.
10.2 Linear Regression
(1) The idea of estimation originated from linear regression, so we discuss linear
regression first. Linear regression is also one of the most commonly used techniques
in engineering.
(2) An example of linear regression: In an electrical circuit, the current is determined by
the voltage. From a series of experiments, the following data sets are obtained:
voltage (X): -2  -1  0  1  2
current (Y): -4  -1  0  1  4
Find the correlation between the voltage and the current.
First, we assume there is a linear relationship between X and Y, and hence, Y can be
estimated by:
Ŷ = b0 + b1X
Now, we want to find b0 and b1.
The error between the true value of Y and the estimate Ŷ is:
e = Y - Ŷ
Thus, we have the linear regression model:
Y = b0 + b1X + e
Suppose we have n observations, (Xi, Yi), i = 1, 2, …, n (n = 5 in the above example).
Then the parameters b0 and b1 can be obtained by minimizing the mean square error
(MSE):
$$ e^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right)^2 $$
The solution to this minimization problem can be found using simple calculus:
$$ \frac{\partial e^2}{\partial b_0} = -2 \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right) = 0 $$
$$ \frac{\partial e^2}{\partial b_1} = -2 \sum_{i=1}^{n} X_i \left( Y_i - b_0 - b_1 X_i \right) = 0 $$
Let:
$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i $$
It can be shown that:
$$ \hat{b}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}, \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} $$
In the above example:
X̄ = 0, Ȳ = 0, b̂1 = 18/10 = 1.8, b̂0 = 0
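A minimal Python sketch of this calculation for the voltage/current data above (illustrative only, not part of the original notes):
```python
# Least-squares fit of Y = b0 + b1*X for the voltage/current example.
X = [-2, -1, 0, 1, 2]   # voltage
Y = [-4, -1, 0, 1, 4]   # current
n = len(X)

X_bar = sum(X) / n
Y_bar = sum(Y) / n

# b1_hat = (sum(Xi*Yi) - n*X_bar*Y_bar) / (sum(Xi^2) - n*X_bar^2)
b1 = (sum(x * y for x, y in zip(X, Y)) - n * X_bar * Y_bar) / \
     (sum(x * x for x in X) - n * X_bar ** 2)
b0 = Y_bar - b1 * X_bar

print(b0, b1)   # 0.0, 1.8 -- matching the values above
```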
(3) The estimation error
- Given a set of data, one can always fit a linear regression model. However, it may not
be a good fit.
- Whether the model is a good fit to the data can be determined based on the residual
error. From the linear regression model:
$$ e_i = Y_i - \hat{Y}_i = (Y_i - \bar{Y}) - (\hat{Y}_i - \bar{Y}) $$
Squaring and summing both sides:
$$ \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i \left[ (Y_i - \bar{Y}) - (\hat{Y}_i - \bar{Y}) \right]^2 = \sum_i (Y_i - \bar{Y})^2 + \sum_i (\hat{Y}_i - \bar{Y})^2 - 2 \sum_i (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y}) $$
Through a series of calculations, it can be shown that:
$$ \sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - \hat{Y}_i)^2 + \sum_i (\hat{Y}_i - \bar{Y})^2 $$
or:
S = S1 + S2
- This indicates that the sum of the squared error about the mean, S, consists of two parts:
the sum of the squared error about the linear regression, S1, which has n - 2 degrees of
freedom (there are two parameters in the linear regression model), and the sum of the
squared difference between the regression and the mean, S2, which has 1 degree of
freedom. If the model is a good fit, S1 will be small relative to S2; otherwise, S1 will be
large.
It can be shown that:
$$ F = \frac{S_2 / 1}{S_1 / (n-2)} \sim F(1, n-2) $$
This can be used to check if the model is a good fit.
In the above example:
F = (32.4/1) / (1.6/3) = 60.75
Since F_{1,3}(0.1) = 5.54 < F, the model is a good fit.
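The F check can be sketched the same way; S1 and S2 follow the definitions above, and the critical value F_{1,3}(0.1) = 5.54 is taken from a table:
```python
# Goodness-of-fit F statistic for the regression above.
X = [-2, -1, 0, 1, 2]
Y = [-4, -1, 0, 1, 4]
n = len(X)
b0, b1 = 0.0, 1.8                      # from the previous sketch

Y_hat = [b0 + b1 * x for x in X]
Y_bar = sum(Y) / n

S1 = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))   # error about the regression, n-2 d.o.f.
S2 = sum((yh - Y_bar) ** 2 for yh in Y_hat)          # regression about the mean, 1 d.o.f.

F = (S2 / 1) / (S1 / (n - 2))
print(S1, S2, F)   # 1.6, 32.4, 60.75
# Compare F with the tabulated critical value F_{1,3}(0.1) = 5.54.
```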
10.3 Linear Minimum Mean Squared Error Estimators
(1) Now let us consider a simple example of signal estimation. Suppose that X and S are
random variables (not random processes yet):
X=S+N
Having observed X, we seek a linear estimator
Ŝ = a + hX
of S such that E{(S - Ŝ)²} is minimized by choosing a and h.
Using the same method as in linear regression (mean squared error, or MSE,
estimation):
$$ \frac{\partial E\{(S - a - hX)^2\}}{\partial a} = -2\, E\{S - a - hX\} = 0 $$
$$ \frac{\partial E\{(S - a - hX)^2\}}{\partial h} = -2\, E\{(S - a - hX)X\} = 0 $$
Hence,
$$ \mu_S - a - h\mu_X = 0, \qquad E\{XS\} - a\mu_X - h\,E\{X^2\} = 0 $$
Solving these equations, it follows that:
$$ h = \frac{\sigma_{XS}}{\sigma_{XX}}, \qquad a = \mu_S - h\mu_X $$
- This is the linear MSE estimator. The error of the estimation is:
$$ E\{(S - a - hX)^2\} = \sigma_{SS} - \frac{\sigma_{XS}^2}{\sigma_{XX}} $$
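A minimal simulation sketch of this estimator (the Gaussian distributions and variances below are assumptions chosen only for illustration):
```python
import numpy as np

# Simulated data for S and N (distributions assumed purely for illustration).
rng = np.random.default_rng(0)
S = rng.normal(1.0, 0.5, size=100_000)   # signal samples
N = rng.normal(0.0, 1.0, size=100_000)   # noise, independent of S
X = S + N                                # observation X = S + N

# h = sigma_XS / sigma_XX,  a = mu_S - h * mu_X  (sample estimates)
C = np.cov(X, S)                         # 2x2 sample covariance matrix
h = C[0, 1] / C[0, 0]
a = S.mean() - h * X.mean()

S_hat = a + h * X
mse = np.mean((S - S_hat) ** 2)
# Theoretical error: sigma_SS - sigma_XS^2 / sigma_XX = 0.25 - 0.25**2 / 1.25 = 0.2
print(h, a, mse)
```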
- Note:
- This derivation holds regardless of the distribution of the noise N (hence, we cannot
simply use the mean as the estimator).
- In theory, we can find a and h with just one observation X, provided the variances and
covariance σ_XX, σ_XS and σ_SS are known. However, this is not very accurate. To
improve the accuracy, we need multiple observations X(1), X(2), …, X(n).
(2) MSE estimation using multiple observations
- Assuming we have n observations, X(1), X(2), …, X(n), and let:
$$ \hat{S} = h_0 + \sum_{i=1}^{n} h_i X(i) $$
- Our goal is to find h0, h1, …, hn such that E{(S - Ŝ)²} is minimized.
Using the same method (mean squared error estimation) again, we have:
$$ E\left\{ S - h_0 - \sum_{i=1}^{n} h_i X(i) \right\} = 0 $$
$$ E\left\{ \left[ S - h_0 - \sum_{i=1}^{n} h_i X(i) \right] X(j) \right\} = 0, \quad j = 1, 2, \ldots, n $$
Solving these equations results in:
$$ h_0 = \mu_S - \sum_{i=1}^{n} h_i \mu_X(i) $$
$$ \sum_{i=1}^{n} h_i C_{XX}(i, j) = \sigma_{SX}(j), \quad j = 1, 2, \ldots, n $$
where,
$$ C_{XX}(i, j) = E\{[X(i) - \mu_X(i)][X(j) - \mu_X(j)]\}, \qquad \sigma_{SX}(j) = E\{[S - \mu_S][X(j) - \mu_X(j)]\} $$
(3) The estimation error is the residual:
P = E{(S - Ŝ)²}
Using orthogonality, it can be shown that the estimation error is:
$$ P(n) = E\{(S - \hat{S})^2\} = \sigma_{SS} - \sum_{i=1}^{n} h_i \sigma_{SX}(i) $$
Since the subtracted sum Σ h_i σ_SX(i) ≥ 0 and can only grow as observations are added,
the more observations we use, the smaller the estimation error.
(4) The matrix form
- MSE estimation can be represented in a vector form. Let:
X^T = [X(1), X(2), …, X(n)]
h^T = [h1, h2, …, hn]
Σ_XX = [C_XX(i, j)]
σ_SX^T = [σ_SX(1), σ_SX(2), …, σ_SX(n)]
Then the solution becomes:
h = Σ_XX⁻¹ σ_SX
(5) An example: Suppose we have two observations, X(1) and X(2), and it is known that:
μ_X(1) = 0.02, C_XX(1, 1) = (0.005)²
μ_X(2) = 0.006, C_XX(2, 2) = (0.0009)²
Also,
C_XX(1, 2) = 0
Suppose furthermore that the linear MSE estimator is
Ŝ = h0 + h1 X(1) + h2 X(2)
where,
μ_S = 1, σ_S = 0.01, σ_SX(1) = 0.00003, σ_SX(2) = 0.000004
- Using the matrix form:
$$ \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} (0.005)^2 & 0 \\ 0 & (0.0009)^2 \end{bmatrix}^{-1} \begin{bmatrix} 0.00003 \\ 0.000004 \end{bmatrix} $$
Hence:
h1 = 1.2, h2 = 4.94
and
h0 = 1 – (0.02)(1.2) – (4.94)(0.006) = 0.946
Finally, the estimation error is:
P(2) = E{(S - Ŝ)²} = (0.01)² - (1.2)(0.00003) - (4.94)(0.000004) = 0.000044
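The same numbers can be reproduced with the matrix form h = Σ_XX⁻¹ σ_SX; a minimal NumPy sketch:
```python
import numpy as np

# Givens from the example above.
mu_X = np.array([0.02, 0.006])
C_XX = np.array([[0.005**2, 0.0],
                 [0.0, 0.0009**2]])
sigma_SX = np.array([3e-5, 4e-6])
mu_S, sigma_S = 1.0, 0.01

h = np.linalg.solve(C_XX, sigma_SX)     # [h1, h2] = C_XX^{-1} sigma_SX
h0 = mu_S - h @ mu_X
P2 = sigma_S**2 - h @ sigma_SX          # estimation error P(2)

print(h)    # [1.2, 4.938...]
print(h0)   # ~0.946
print(P2)   # ~4.4e-5
```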
(6) Note:
- In the above discussion, it is assumed that the signal is a linear function of the
measurement. Otherwise, the estimate may have a large error.
- In the estimation, it is necessary to know the statistics of the signal (mean and
covariance).
- It is assumed that the signal S is a constant. If the signal is a sequence, we have to use
the filtering techniques presented below.
10.4 Filters
(1) Introduction
- In this section, we consider the problem of estimating a random sequence, S(n), while
observing the measurement sequence, X(m).
- Note that we have two sequences, S(n) and X(m):
- If m > n, we are filtering the signal.
- If m < n, we are predicting the signal.
(2) Digital filter
- Suppose the relationship between the signal S and the measurement X is as follows:
X(n) = S(n) + v(n)
where S(n) is a zero-mean random signal sequence, v(n) is a zero-mean white noise
sequence, and S(n) and v(n) are uncorrelated.
Note that we assume that S(n) is a zero-mean random sequence. This is in fact a
rather general form of signal. For example, if a signal has a linear trend, we can
remove it by curve fitting. As another example, a deterministic signal can be viewed
as a special random signal with just one member function.
Again, we assume the estimator, Ŝ (n), is a linear combination of the observations:
$$ \hat{S}(n) = \sum_{k=-\infty}^{\infty} h(k) X(n-k) $$
- Similar to the study above, the linear minimum mean square error (MSE) estimator is
obtained by minimizing
$$ E\{ [S(n) - \hat{S}(n)]^2 \} $$
and can be determined by applying the orthogonality conditions:
$$ E\left\{ \left[ S(n) - \sum_{k=-\infty}^{\infty} h(k) X(n-k) \right] X(n-i) \right\} = 0, \quad \text{for all } i $$
or:
$$ R_{XS}(i) = \sum_{k=-\infty}^{\infty} h(k) R_{XX}(i-k), \quad \text{for all } i $$
- Furthermore, the error of the estimation is given by:
$$ P = E\{ [S(n) - \hat{S}(n)]^2 \} = R_{SS}(0) - \sum_{i=-\infty}^{\infty} h(i) R_{XS}(i) $$
(3) The frequency response of the estimator
- Applying the Fourier transform to both sides of the above equation, we have:
S_XS(f) = H(f) S_XX(f), |f| < ½
or:
H(f) = S_XS(f) / S_XX(f), |f| < ½
- This represents the correlation between the signal and the measurement. In particular,
H(f) is called a filter, as it removes the noise effect from the measured signal X(f),
resulting in an estimate of the signal S(f).
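As an illustration: under the measurement model X(n) = S(n) + v(n) above, with S(n) and v(n) uncorrelated, the cross-spectrum reduces to S_XS(f) = S_SS(f) and S_XX(f) = S_SS(f) + S_vv(f), so the filter becomes
$$ H(f) = \frac{S_{SS}(f)}{S_{SS}(f) + S_{vv}(f)}, \quad |f| < \tfrac{1}{2} $$
That is, the filter passes frequencies where the signal power dominates and attenuates those where the noise power dominates.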
- Consider the estimation error:
$$ P = E\{ [S(n) - \hat{S}(n)]^2 \} = R_{SS}(0) - \sum_{i=-\infty}^{\infty} h(i) R_{XS}(i) $$
Taking the Fourier transform results in the following:
P(f) = S_SS(f) - H(f) S_XS(f), |f| < ½
Thus, by the inverse Fourier transform, we can find the estimation error:
$$ P(m) = \int_{-1/2}^{1/2} P(f) \exp(-j 2\pi f m) \, df $$
- Problem: the estimator involves an infinite sum, and the summation includes terms
from future observations. Hence, it cannot be used in practice.
- Solution: filtering. In particular, we will discuss two types of filters:
- Kalman filters
- Wiener filters
10.5 Kalman Filters
(1) Let us start from the model:
- the model of the signal (a Markov process with Gaussian noise):
S(n) = a(n)S(n-1) + w(n), w(n) ~ N(0, σ_w)
- the measured signal:
X(n) = S(n) + v(n), v(n) ~ N(0, σ_v)
- Objective: find Ŝ(n+1) based on the observations X(1), X(2), …, X(n).
(2) The basic idea:
- If S(n) = S is a constant and v(n) is zero-mean noise, then
$$ \hat{S}(n) = \frac{X(1) + X(2) + \cdots + X(n-1)}{n-1} $$
and:
$$ \hat{S}(n+1) = \frac{X(1) + X(2) + \cdots + X(n)}{n} $$
- Thus, we have the recursive estimation formula:
$$ \hat{S}(n+1) = \frac{n-1}{n} \hat{S}(n) + \frac{1}{n} X(n) $$
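A quick sketch confirming that the recursive form reproduces the batch average (the observation values below are hypothetical, for illustration only):
```python
# Recursive running-average estimate of a constant signal observed in noise.
X = [1.1, 0.9, 1.3, 0.8, 1.0]          # hypothetical observations X(1)..X(5)

S_hat = X[0]                            # Ŝ(2) = X(1)
for n in range(2, len(X) + 1):
    # Ŝ(n+1) = ((n-1)/n) * Ŝ(n) + (1/n) * X(n)
    S_hat = (n - 1) / n * S_hat + X[n - 1] / n

print(S_hat, sum(X) / len(X))           # both give the batch average 1.02
```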
- Note that using the recursive form, we do not have to worry about the infinite sum.
However, we are interested in the case where S(n) ≠ S. Therefore, the recursive form
would be:
Ŝ(n+1) = a(n)Ŝ(n) + b(n)X(n)
or, equivalently:
Ŝ(n+1) = a(n)Ŝ(n) + b(n)V_X(n)
where V_X(n) = X(n) - Ŝ(n). This implies that the current estimate is a linear
combination of the previous estimate and an error (innovation) term.
- Now, all we have to do is to determine a(n) and b(n). Using the MSE criterion, Kalman
derived a set of recursive equations as follows.
(3) The Kalman filter procedure
Initialization:
n = 1;
p(1) = σ² (with σ > σ_w and σ_v)
Ŝ(1) = S̄ (a constant)
Iteration:
1 Get the data: σ_W(n), σ_V(n), a(n+1), X(n)
k(n) = p(n) / [p(n) + σ_V²(n)]
Ŝ(n+1) = a(n+1){Ŝ(n) + k(n)[X(n) - Ŝ(n)]}
p(n+1) = a²(n+1)[1 - k(n)]p(n) + σ_W²(n)
n = n + 1
go to 1
- There is a five-page proof in the textbook.
- The procedure can be extended to matrix form.
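A minimal Python sketch of this scalar recursion (the function name and the simplification to constant a, σ_W, σ_V are assumptions made here for illustration):
```python
def kalman_scalar(x, a, sigma_w2, sigma_v2, p1, s1):
    """Scalar Kalman filter following the recursion above.

    x        : measurements X(1), X(2), ...
    a        : coefficient a (taken as constant here for simplicity)
    sigma_w2 : process-noise variance sigma_w^2
    sigma_v2 : measurement-noise variance sigma_v^2
    p1, s1   : initial p(1) and Ŝ(1)
    Returns the predictions Ŝ(2), Ŝ(3), ... and the p(n) sequence.
    """
    p, s_hat = p1, s1
    s_pred, p_seq = [], [p]
    for x_n in x:
        k = p / (p + sigma_v2)                     # gain k(n)
        s_hat = a * (s_hat + k * (x_n - s_hat))    # Ŝ(n+1)
        p = a**2 * (1 - k) * p + sigma_w2          # p(n+1)
        s_pred.append(s_hat)
        p_seq.append(p)
    return s_pred, p_seq
```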
(4) An example:
- The model of the signal
S(n) = 0.6S(n-1) + w(n), w(n) ~ N(0, 1/2)   (i.e., σ_w² = 1/4)
X(n) = S(n) + v(n), v(n) ~ N(0, 1/√2)   (i.e., σ_v² = 1/2)
- The Kalman filter:
Initialization:
n = 1;
p(1) = 1;
Ŝ (1) = 0;
Iteration 1:
k(1) = p(1) / [p(1) + σ_v²(1)] = 1 / (1 + 1/2) = 0.6667
Ŝ(2) = a(2){Ŝ(1) + k(1)[X(1) - Ŝ(1)]} = (0.6){0 + (0.6667)[X(1) - 0]} = 0.4X(1)
p(2) = a²(2)[1 - k(1)]p(1) + σ_W²(1) = (0.6)²[1 - 0.6667](1) + (1/4) = 0.37
n = 1 + 1 = 2
Iteration 2:
k(2) = p(2) / [p(2) + σ_v²(2)] = 0.37 / (0.37 + 1/2) = 0.425
Ŝ(3) = a(3){Ŝ(2) + k(2)[X(2) - Ŝ(2)]} = 0.138X(1) + 0.255X(2)
p(3) = a²(3)[1 - k(2)]p(2) + σ_W²(2) = 0.326
n = 2 + 1 = 3
Iteration 3: ……
- Note:
- The Kalman filter is a time-dependent function of the measured signals.
- The model of the signal (a Markov process with time-dependent coefficient a(n))
must be known.
- The statistics of the noises (σ_w and σ_v) must be known.
- Kalman filter calculations can be done using MATLAB as well.
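For instance, running the kalman_scalar sketch from the previous subsection with this example's parameters reproduces p(2) = 0.37 and p(3) ≈ 0.326 (the measurement values below are hypothetical, since p(n) and k(n) do not depend on them):
```python
# Example parameters: a = 0.6, sigma_w^2 = 1/4, sigma_v^2 = 1/2, p(1) = 1, Ŝ(1) = 0.
x = [0.5, -0.2, 0.8]    # hypothetical measurements X(1), X(2), X(3)
s_pred, p_seq = kalman_scalar(x, a=0.6, sigma_w2=0.25, sigma_v2=0.5, p1=1.0, s1=0.0)

print(p_seq)            # [1, 0.37, 0.326..., ...] -- matches p(2) and p(3) above
```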
(5) The limit of the filter
- If a(n) does not vary with n, and w(n) and v(n) are both stationary (i.e., σ_w and σ_v are
constants), then both k(n) and p(n) approach limits as n approaches infinity.
- The limiting k(n) and p(n) can be found by setting p(n + 1) = p(n) = p, which gives:
$$ \lim_{n \to \infty} p(n) = p = \frac{\sigma_w^2 + \sigma_v^2 (a^2 - 1) + \sqrt{\left[ \sigma_w^2 + \sigma_v^2 (a^2 - 1) \right]^2 + 4 \sigma_v^2 \sigma_w^2}}{2} $$
- Following the example above:
$$ p = \frac{0.25 + (0.5)(0.36 - 1) + \sqrt{\left[ 0.25 + (0.5)(0.36 - 1) \right]^2 + (4)(0.25)(0.5)}}{2} \approx 0.32 $$
$$ k = \frac{0.32}{0.32 + 0.5} \approx 0.39 $$
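A quick numerical check of these limits, using the same example parameters:
```python
import math

# Steady-state p and k for a = 0.6, sigma_w^2 = 0.25, sigma_v^2 = 0.5.
a, sw2, sv2 = 0.6, 0.25, 0.5
b = sw2 + sv2 * (a**2 - 1)
p = (b + math.sqrt(b**2 + 4 * sv2 * sw2)) / 2
k = p / (p + sv2)
print(p, k)   # approximately 0.32 and 0.39
```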
10.6 Wiener Filters
(1) Introduction
- The Wiener filter is also based on the minimization of the MSE and the use of the
orthogonality conditions.