2.160 Identification, Estimation, and Learning
Lecture Notes No. 4
February 17, 2006
3. Random Variables and Random Processes
Deterministic System:
[Block diagram: Input → Deterministic System → Output]
In reality, the observed output is noisy and does not fit the model perfectly. In the deterministic approach we treat such discrepancies as "error".
Random Process:
An alternative approach is to explicitly model the process describing how the
“error” is generated.
[Block diagram: Exogenous Input → Process → Observed Output, with Process Noise entering the process and Measurement Noise added at the output — a Random Process]
Objective of modeling random processes:
• Use stochastic properties of the process to better estimate the parameters and state of the process, and
• Better understand, analyze, and evaluate the performance of the estimator.
3.1 Random Variables
You may have already learned probability and stochastic processes in some subjects. The
following are fundamentals that will be used regularly for the rest of this course. Check if
you feel comfortable with each of the following definitions and terminology. (Check the
box of each item below.) If not, consult standard textbooks on the subject. See the
references at the end of this chapter.
1) Random Variable
X: A random variable is a function that maps every point in the sample space to the real line.
2) Probability distribution F_X(x) and probability density function f_X(x); PDF
F_X(x) = Prob(X ≤ x)
f_X(x) = d F_X(x) / dx
[Figure: a typical probability density function f_X(x) plotted against x]
In the statistics and probability literature, the convention is that capital X represents
a random variable while lower-case x is used for an instantiation/realization of the random
variable.
3) Joint probability densities
Let X and Y be two random variables
f_XY(x, y): the joint probability density that X = x and Y = y simultaneously
4) Statistically independent (or simply Independent) random variables
f_XY(x, y) = f_X(x) f_Y(y)
5) Conditional probability density
f_{X|Y}(x | y) = density of X given Y = y = f_XY(x, y) / f_Y(y)
(Joint probability density divided by the probability density of Y = y)
If X and Y are independent,
f_{X|Y}(x | y) = f_XY(x, y) / f_Y(y) = f_X(x) f_Y(y) / f_Y(y) = f_X(x)
Occurrence of Y = y does not influence the occurrence of X = x.
6) Bayes Rule
f_{X|Y}(x | y) = f_{Y|X}(y | x) f_X(x) / f_Y(y)
since f_{X|Y}(x | y) f_Y(y) = f_XY(x, y) = f_{Y|X}(y | x) f_X(x)
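As a quick numerical illustration (an assumed example, not from the notes), Bayes' rule can be evaluated on a grid: with a Gaussian prior X ~ N(0, 1) and a measurement Y = X + V, V ~ N(0, 0.5²), the posterior f_{X|Y}(x | y) is the likelihood times the prior divided by the evidence f_Y(y).

import numpy as np

# A minimal sketch of Bayes' rule on a grid (assumed example: prior X ~ N(0,1),
# measurement Y = X + V with V ~ N(0, 0.5^2), observed y = 1).
x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]

def gauss(u, m, s):
    return np.exp(-0.5 * ((u - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

prior = gauss(x, 0.0, 1.0)            # f_X(x)
y_obs = 1.0
likelihood = gauss(y_obs, x, 0.5)     # f_{Y|X}(y_obs | x)

evidence = np.sum(likelihood * prior) * dx      # f_Y(y_obs)
posterior = likelihood * prior / evidence       # f_{X|Y}(x | y_obs)

print(np.sum(posterior) * dx)         # ~1: the posterior is a valid density
print(x[np.argmax(posterior)])        # ~0.8: the most probable x given y = 1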
7) Expectation
Expected value of X (mean, average):
E[X] = ∫_{-∞}^{∞} x f_X(x) dx
For discrete random variables,
E[X] = Σ_i p_i x_i
8) Variance
Var[X] = E[(X − E(X))²] = E[X² − 2X E(X) + (E(X))²] = E[X²] − (E(X))²
9) Moment
k-th moment of X:
E[X^k] = ∫_{-∞}^{∞} x^k f_X(x) dx
k = 1: mean; k = 2, 3, …: higher moments
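As a quick illustration (an assumed example, not part of the notes), the mean, variance, and k-th moments can be estimated from samples and compared with the expressions above.

import numpy as np

# A minimal sketch: estimate E[X], Var[X], and the k-th moments E[X^k]
# from samples (assumed example: X ~ N(2, 3^2), so E[X] = 2, Var[X] = 9).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

mean = x.mean()                         # sample estimate of E[X]
var = np.mean((x - mean) ** 2)          # E[(X - E[X])^2]
var_alt = np.mean(x ** 2) - mean ** 2   # E[X^2] - (E[X])^2, the same identity as above

moments = {k: np.mean(x ** k) for k in (1, 2, 3)}   # k-th raw moments E[X^k]

print(mean, var, var_alt)   # ~2, ~9, ~9
print(moments)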
10) Normal (Gaussian) Random Variable
X ~ N(m_x, σ²): Random variable X has a normal distribution with mean m_x and variance σ².
The first and second moments completely characterize the distribution.
f_X(x) = (1 / (√(2π) σ)) exp[ −(x − m_x)² / (2σ²) ]
σ² = ∫_{-∞}^{∞} (x − m_x)² f_X(x) dx
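The following minimal sketch (with assumed values m_x = 1 and σ = 2) evaluates the Gaussian density on a grid and checks numerically that it integrates to one and has the stated first and second central moments.

import numpy as np

# A minimal sketch (assumed example values m_x = 1, sigma = 2): evaluate the
# Gaussian density on a grid and check that it integrates to 1, has mean m_x,
# and second central moment sigma^2.
m_x, sigma = 1.0, 2.0
x = np.linspace(m_x - 10 * sigma, m_x + 10 * sigma, 20_001)
dx = x[1] - x[0]

f = np.exp(-(x - m_x) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

print(np.sum(f) * dx)                      # ~1.0 (total probability)
print(np.sum(x * f) * dx)                  # ~m_x (first moment)
print(np.sum((x - m_x) ** 2 * f) * dx)     # ~sigma^2 (variance)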
11) Correlation
The expectation of the product of two random variables X and Y:
E[XY] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x y f_XY(x, y) dx dy
where f_XY(x, y) is the joint probability density.
If X and Y are independent,
E[XY] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x y f_X(x) f_Y(y) dx dy
      = ∫_{-∞}^{∞} x f_X(x) dx · ∫_{-∞}^{∞} y f_Y(y) dy
      = E[X] E[Y]
Note that the converse does not hold: even if E[XY] = E[X] E[Y] (i.e., the two variables are uncorrelated), they are not necessarily independent.
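Here is a small numerical sketch of this point (an assumed example, not from the notes): with X ~ N(0, 1) and Y = X², the quantity E[XY] − E[X]E[Y] is zero although Y is a deterministic function of X.

import numpy as np

# Uncorrelated but not independent (assumed example): X ~ N(0, 1), Y = X^2.
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
y = x ** 2

cov = np.mean(x * y) - np.mean(x) * np.mean(y)   # E[XY] - E[X]E[Y]
print(cov)                                       # ~0: X and Y are uncorrelated

# Yet they are dependent: conditioning on X changes the distribution of Y.
print(y.mean(), y[np.abs(x) > 2].mean())         # ~1 vs >4: Y clearly depends on X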
12) Orthogonality
X and Y are said to be orthogonal if
E[XY] = 0
13) Covariance
Covariance of X and Y: Cov[X, Y] = E[(X − m_X)(Y − m_Y)]
3.2 Random Processes
[Figure: a random variable X maps each sample point S_i in the sample space to a number, while a random process X(t) maps each sample point S_i to a waveform X(t; S_i); the collection of waveforms forms the waveform space (waveform ensemble), and fixing a time such as t_1 or t_2 yields a random variable]
A Random Process = a family (ensemble) of time functions having a probability measure
Characterization of a random process
• First-order densities
[Figure: first-order densities f_X1, f_X2, f_X3 of the random variables X(t_1), X(t_2), X(t_3) obtained by sampling the process at times t_1, t_2, t_3]
• Second-order densities
Consider a joint probability density f_X1X2(x_1, x_2).
If the random process has some correlation between the two random variables X(t_1) and X(t_2), it can be captured by the following autocorrelation ("auto" means correlation within one and the same random process):
R_XX(t_1, t_2) = E[X(t_1) X(t_2)] = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} x_1 x_2 f_X1X2(x_1, x_2) dx_1 dx_2
If the autocorrelation function depends only on the time difference τ = t_1 − t_2, it reduces to
R_XX(τ) = E[X(t + τ) X(t)]
R_XX(τ) = R_XX(−τ)   (even function)
Then the process is called "Wide Sense Stationary" (WSS).
Auto-covariance:
C_XX(t_1, t_2) = E[(X(t_1) − m_X(t_1)) · (X(t_2) − m_X(t_2))] = R_XX(t_1, t_2) − m_X(t_1) m_X(t_2)
(A numerical sketch of estimating R_XX(τ) from data follows after this list.)
• Higher-order densities
Joint density f_X1X2…Xn(x_1, x_2, …, x_n)
E[X(t_1) X(t_2) … X(t_n)]   (n-th order)
Total characterization: If the joint densities of X(t_1), X(t_2), …, X(t_n) for all n are known, the random process is said to be totally (completely) characterized. This is unrealistic in practice.
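As a quick illustration (an assumed example, not from the notes), the autocorrelation R_XX(τ) of a wide-sense stationary process can be estimated by time averaging a single long realization; the sketch below uses an AR(1) process whose theoretical autocorrelation is known, so the estimate can be checked.

import numpy as np

# A minimal sketch: estimate the autocorrelation R_XX(tau) of a wide-sense
# stationary process by time averaging one long realization of the AR(1) process
#   x(t) = a*x(t-1) + e(t),  e(t) ~ N(0, 1),
# whose theoretical autocorrelation is R_XX(tau) = a^|tau| / (1 - a^2).
rng = np.random.default_rng(2)
a, N = 0.8, 200_000
e = rng.standard_normal(N)

x = np.zeros(N)
for t in range(1, N):
    x[t] = a * x[t - 1] + e[t]

def autocorr(x, tau):
    # Time-average estimate of E[X(t + tau) X(t)]
    return np.mean(x[tau:] * x[:len(x) - tau]) if tau > 0 else np.mean(x * x)

for tau in range(4):
    print(tau, autocorr(x, tau), a**tau / (1 - a**2))   # estimate vs. theory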
Reference Textbooks on Random Processes
Lonnie Ludeman, "Random Processes: Filtering, Estimation, and Detection", Wiley, 2003, ISBN 0-471-25975-6.
Robert Brown and Patrick Hwang, "Introduction to Random Signals and Applied Kalman Filtering, Third Edition", Wiley, 1997, ISBN 0-471-12839-2, TK5102.9.B75.
Application: Adaptive Noise Cancellation
Let us consider a simple example of the above definitions and properties of random processes. Adaptive noise cancellation is a technique for canceling unwanted noise by measuring the noise source as well as the signal source. It originated in Widrow's research in the 1960s, and more recently the technique has been used in advanced headsets such as those made by Bose, so people use it daily without knowing it. Here is how it works.
[Figure: Live concert recording. Microphone 1 (player recording) measures y(t) = x(t) + w(t), where x(t) is the true signal from the player and w(t) is the interference produced by passing the audience noise v(t) through the interference dynamics. Microphone 2 measures the audience noise v(t). An adaptive filter driven by v(t) produces an estimate ŵ(t), tuned by a parameter estimator; the recovered signal is z(t) = y(t) − ŵ(t).]
Process:
The recorded signal:
y(t) = x(t) + w(t)
Assumed interference dynamics (FIR):
w(t) = b_1 v(t − 1) + b_2 v(t − 2) + … + b_m v(t − m)
Interference dynamics model (FIR):
ŵ(t; θ̂) = b̂_1 v(t − 1) + b̂_2 v(t − 2) + … + b̂_m v(t − m) = φᵀ(t) · θ̂
where φ(t) = [v(t − 1), …, v(t − m)]ᵀ and θ̂ = [b̂_1, …, b̂_m]ᵀ.
Noise cancellation:
z(t; θ̂) = y(t) − ŵ(t; θ̂)
Problem: Tune the FIR parameters θ̂ so that the recovered signal z(t; θ̂) is as close to the original true signal x(t) as possible. Assume that the interference w(t) is strongly correlated with the noise v(t) and that the true signal x(t) is uncorrelated with the noise v(t). Consider the expectation of the squared output z(t; θ̂), i.e., the average power of the signal z(t; θ̂):
E[z(t; θ̂)²] = E[{x(t) + w(t) − φᵀ(t) · θ̂}²]
            = E[x²(t)] + 2 E[x(t){w(t) − φᵀ(t) · θ̂}] + E[{w(t) − φᵀ(t) · θ̂}²]
Our objective is to find the parameter vector θ̂ that minimizes the mean squared error E[{w(t) − φᵀ(t) · θ̂}²]. Examining the second (cross) term:
E[x(t) w(t)] = E[x(t) b_1 v(t − 1)] + E[x(t) b_2 v(t − 2)] + … + E[x(t) b_m v(t − m)] = 0
E[x(t) · φᵀ(t) θ̂] = 0
Therefore, minimizing the average power of z(t; θ̂) with respect to the parameter vector θ̂ is equivalent to minimizing E[{w(t) − φᵀ(t) · θ̂}²]:
θ̂ = arg min_θ E[{w(t) − φᵀ(t) · θ}²] = arg min_θ E[z(t; θ)²]
since E[x(t)²] is not a function of the parameter vector and is not relevant to the minimization of the squared error.
We can use the Recursive Least Squares algorithm with forgetting factor α (0 < α ≤ 1):
θ̂(t) = θ̂(t − 1) + [ P_{t−1} φ(t) / (α + φᵀ(t) P_{t−1} φ(t)) ] · { y(t) − φᵀ(t) θ̂(t − 1) }
P_t = (1/α) [ P_{t−1} − P_{t−1} φ(t) φᵀ(t) P_{t−1} / (α + φᵀ(t) P_{t−1} φ(t)) ]
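A minimal simulation sketch of this adaptive noise canceller follows; the sinusoidal signal, white audience noise, and FIR interference coefficients are assumptions made for illustration, while the parameter and covariance updates follow the RLS recursion with forgetting factor given above.

import numpy as np

# Minimal simulation sketch of adaptive noise cancellation with RLS.
# Assumed for illustration: x(t) is a sinusoid, v(t) is white noise, and the
# true interference is the FIR model w(t) = b1*v(t-1) + b2*v(t-2) + b3*v(t-3).
rng = np.random.default_rng(3)
N, m, alpha = 5000, 3, 0.99
b_true = np.array([0.8, -0.5, 0.3])

t = np.arange(N)
x = np.sin(2 * np.pi * 0.01 * t)            # true signal from the player
v = rng.standard_normal(N)                  # audience noise (microphone 2)
w = np.convolve(v, np.r_[0.0, b_true])[:N]  # interference w(t), delayed FIR of v
y = x + w                                   # recorded signal (microphone 1)

theta = np.zeros(m)                         # parameter estimate (theta hat)
P = 1e3 * np.eye(m)                         # matrix P_t, initialized large
z = np.zeros(N)                             # recovered signal z(t)

for k in range(m, N):
    phi = v[k - m:k][::-1]                  # phi(t) = [v(t-1), ..., v(t-m)]
    err = y[k] - phi @ theta                # prediction error = z(t; theta hat)
    gain = P @ phi / (alpha + phi @ P @ phi)
    theta = theta + gain * err              # RLS parameter update
    P = (P - np.outer(gain, phi @ P)) / alpha   # RLS covariance update
    z[k] = err

print(theta)                                # ~[0.8, -0.5, 0.3]
print(np.mean((z[N // 2:] - x[N // 2:]) ** 2))  # small: z(t) ~ x(t) after convergence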