2.160 Identification, Estimation, and Learning
Lecture Notes No. 4
February 17, 2006

3. Random Variables and Random Processes

Deterministic System:

[Figure: a deterministic system mapping an input to an output.]

In reality, the observed output is noisy and does not fit the model perfectly. In the deterministic approach we treat such discrepancies as "error".

Random Process: An alternative approach is to explicitly model the process describing how the "error" is generated.

[Figure: a random process driven by an exogenous input and process noise; measurement noise corrupts the observed output.]

Objectives of modeling random processes:
• Use stochastic properties of the process for better estimating the parameters and state of the process, and
• Better understand, analyze, and evaluate the performance of the estimator.

3.1 Random Variables

You may have already learned probability and stochastic processes in other subjects. The following are fundamentals that will be used regularly for the rest of this course. Check whether you feel comfortable with each of the following definitions and terms. (Check the box of each item below.) If not, consult standard textbooks on the subject. See the references at the end of this chapter.

1) Random Variable X
A random variable is a function that maps every point in the sample space to the real line.

2) Probability distribution $F_X(x)$ and probability density function (PDF) $f_X(x)$

$$F_X(x) = \mathrm{Prob}(X \le x), \qquad f_X(x) = \frac{d}{dx}F_X(x)$$

In the statistics and probability literature, the convention is that capital $X$ represents a random variable while lower-case $x$ is used for an instantiation/realization of the random variable.

3) Joint probability density
Let X and Y be two random variables; $f_{XY}(x, y)$ is the joint density of X taking the value x and Y taking the value y simultaneously.

4) Statistically independent (or simply independent) random variables

$$f_{XY}(x, y) = f_X(x)\, f_Y(y)$$

5) Conditional probability density
The density of X given $Y = y$ is the joint probability density divided by the probability density of $Y = y$:

$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}$$

If X and Y are independent,

$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{f_X(x)\, f_Y(y)}{f_Y(y)} = f_X(x)$$

i.e., the occurrence of $Y = y$ does not influence the occurrence of $X = x$.

6) Bayes Rule
Since $f_{X|Y}(x \mid y)\, f_Y(y) = f_{XY}(x, y) = f_{Y|X}(y \mid x)\, f_X(x)$,

$$f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)}$$

7) Expectation
Expected value of X (mean, average):

$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$

For discrete random variables, $E[X] = \sum_i p_i x_i$.

8) Variance

$$\mathrm{Var}\,X = E[(X - E(X))^2] = E[X^2 - 2X E(X) + (E(X))^2] = E[X^2] - (E(X))^2$$

9) Moment
The k-th moment of X:

$$E[X^k] = \int_{-\infty}^{\infty} x^k f_X(x)\, dx$$

$k = 1$ gives the mean; $k = 2, 3, \ldots$ give the higher moments.

10) Normal (Gaussian) Random Variable
$X \sim N(m_x, \sigma^2)$: the random variable X has a normal distribution with mean $m_x$ and variance $\sigma^2$. The first and second moments completely characterize the distribution:

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x - m_x)^2\right], \qquad \sigma^2 = \int_{-\infty}^{\infty} (x - m_x)^2 f_X(x)\, dx$$

11) Correlation
The expectation of the product of two random variables X and Y, taken over their joint probability density:

$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{XY}(x, y)\, dx\, dy$$

If X and Y are independent,

$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_X(x)\, f_Y(y)\, dx\, dy = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \cdot \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = E[X]\, E[Y]$$

Note that the converse does not hold: even when the correlation is zero, the two random variables are not necessarily independent. A numerical illustration follows.
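To make the closing remark of item 11 concrete, here is a short numerical sketch in Python (an added illustration, not part of the original notes; the choice of NumPy, the sample size, and the pair $X \sim N(0,1)$, $Y = X^2$ are all assumptions of the example). The pair is uncorrelated, since $E[XY] = E[X^3] = 0$, yet Y is a deterministic function of X and hence clearly dependent on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X ~ N(0, 1); Y = X^2 is a deterministic function of X, hence dependent on X.
x = rng.standard_normal(n)
y = x**2

# Covariance E[XY] - E[X]E[Y] is (near) zero because E[X^3] = 0.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(f"Cov(X, Y)  ~ {cov_xy:7.4f}")                        # ~ 0: uncorrelated

# Independence would require E[g(X)h(Y)] = E[g(X)]E[h(Y)] for all g, h.
# With g(X) = X^2 and h(Y) = Y: E[X^2 Y] = E[X^4] = 3, but E[X^2]E[Y] = 1.
print(f"E[X^2 Y]   ~ {np.mean(x**2 * y):7.4f}")             # ~ 3
print(f"E[X^2]E[Y] ~ {np.mean(x**2) * np.mean(y):7.4f}")    # ~ 1
```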
12) Orthogonality
X and Y are said to be orthogonal if $E[XY] = 0$.

13) Covariance
Covariance of X and Y: $E[(X - m_X)(Y - m_Y)]$.

3.2 Random Processes

[Figure: a random variable X assigns a number to each sample point $S_1, S_2, \ldots, S_n$ of the sample space, whereas a random process $X(t)$ assigns a whole waveform $X(t; S_i)$ to each sample point; the collection of waveforms, observed at times $t_1, t_2, \ldots$, forms the waveform space (waveform ensemble).]

A Random Process = a family (ensemble) of time functions having a probability measure.

Characterization of a random process:

• First-order densities
The densities $f_{X_1}, f_{X_2}, f_{X_3}, \ldots$ of the individual random variables $X(t_1), X(t_2), X(t_3), \ldots$

• Second-order densities
Consider a joint probability density $f_{X_1 X_2}(x_1, x_2)$. If the random process has some correlation between the two random variables $X(t_1)$ and $X(t_2)$, it can be captured by the following autocorrelation ("auto" means correlation within the same single random process):

$$R_{XX}(t_1, t_2) = E[X(t_1) X(t_2)] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x_1 x_2\, f_{X_1 X_2}(x_1, x_2)\, dx_1\, dx_2$$

If the autocorrelation function depends only on the time difference $\tau = t_1 - t_2$, it reduces to

$$R_{XX}(\tau) = E[X(t + \tau) X(t)], \qquad R_{XX}(\tau) = R_{XX}(-\tau) \quad \text{(even function)}$$

Then the process is called "Wide-Sense Stationary".

Auto-covariance:

$$C_{XX}(t_1, t_2) = E[(X(t_1) - m_X(t_1)) \cdot (X(t_2) - m_X(t_2))] = R_{XX}(t_1, t_2) - m_X(t_1)\, m_X(t_2)$$

• Higher-order densities
The joint density $f_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n)$ and the n-th order moment $E[X(t_1) X(t_2) \cdots X(t_n)]$.

Total characterization: If the joint densities of $X(t_1), X(t_2), \ldots, X(t_n)$ for all n are known, the random process is said to be totally (completely) characterized. … Unrealistic. A numerical sketch of estimating an autocorrelation function follows.
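As an added illustration (not in the original notes), the following Python sketch estimates $R_{XX}(\tau)$ by ensemble averaging over many realizations. The AR(1) process, its coefficient $a = 0.9$, and all sample sizes are assumptions chosen for the example; for this process, once transients die out, $R_{XX}(\tau) = a^{|\tau|}/(1 - a^2)$, and the estimate also exhibits the evenness $R_{XX}(\tau) = R_{XX}(-\tau)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, a = 5000, 200, 0.9

# Ensemble of realizations of a zero-mean AR(1) process
#   X(t) = a X(t-1) + e(t),  e(t) ~ N(0, 1),
# which is wide-sense stationary (after transients) with
#   R_XX(tau) = a^|tau| / (1 - a^2).
x = np.zeros((n_paths, n_steps))
for t in range(1, n_steps):
    x[:, t] = a * x[:, t - 1] + rng.standard_normal(n_paths)

t0 = 150  # reference time, late enough for the transient to die out
for tau in range(4):
    r_pos = np.mean(x[:, t0 + tau] * x[:, t0])   # estimate of R_XX(+tau)
    r_neg = np.mean(x[:, t0 - tau] * x[:, t0])   # estimate of R_XX(-tau)
    r_theory = a**tau / (1 - a**2)
    print(f"tau={tau}: R(+tau)={r_pos:6.3f}  R(-tau)={r_neg:6.3f}  theory={r_theory:6.3f}")
```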
Reference
Textbooks on random processes:
Lonnie Ludeman, "Random Processes: Filtering, Estimation, and Detection", Wiley, 2003, ISBN 0-471-25975-6.
Robert Brown and Patrick Hwang, "Introduction to Random Signals and Applied Kalman Filtering", Third Edition, Wiley, 1997, ISBN 0-471-12839-2, TK5102.9.B7.

Application: Adaptive Noise Cancellation

Let us consider a simple example that uses the above definitions and properties of random processes. Adaptive noise cancellation is a technique for canceling unwanted noise by measuring the noise source as well as the signal source. It originated in Widrow's research in the 1960s, but recently the technique has been used in advanced headsets, such as those by Bose. So people are using it daily without knowing it. Here is how it works.

[Figure: a live concert recording. Microphone 1 records the player: the true signal x(t) plus the interference w(t), which is the audience noise v(t) passed through the interference dynamics. Microphone 2 measures the audience noise v(t). An adaptive filter driven by v(t) produces $\hat{w}(t)$, which is subtracted from the Microphone 1 recording y(t) to yield the recovered signal z(t); a parameter estimator tunes the adaptive filter.]

Process:
The recorded signal: $y(t) = x(t) + w(t)$
Assumed interference dynamics (FIR): $w(t) = b_1 v(t-1) + b_2 v(t-2) + \cdots + b_m v(t-m)$
Interference dynamics model (FIR): $\hat{w}(t; \hat{\theta}) = \hat{b}_1 v(t-1) + \hat{b}_2 v(t-2) + \cdots + \hat{b}_m v(t-m) = \varphi^T(t) \cdot \hat{\theta}$
Noise cancellation: $z(t; \hat{\theta}) = y(t) - \hat{w}(t; \hat{\theta})$

Problem: Tune the FIR parameters $\hat{\theta}$ so that the recovered signal $z(t; \hat{\theta})$ is as close to the original true signal x(t) as possible. Assume that the interference w(t) is strongly correlated with the noise v(t) and that the true signal x(t) is uncorrelated with the noise v(t).

Consider the expectation of the squared output $z(t; \hat{\theta})$, i.e. the average power of the signal $z(t; \hat{\theta})$:

$$E[z(t; \hat{\theta})^2] = E[\{x(t) + w(t) - \varphi^T(t) \cdot \hat{\theta}\}^2] = E[x^2(t)] + 2E[x(t)\{w(t) - \varphi^T(t) \cdot \hat{\theta}\}] + E[\{w(t) - \varphi^T(t) \cdot \hat{\theta}\}^2]$$

Our objective is to find the parameter vector $\hat{\theta}$ that minimizes the mean squared error $E[\{w(t) - \varphi^T(t) \cdot \hat{\theta}\}^2]$.

Examining the second term, since x(t) is uncorrelated with v(t),

$$E[x(t) w(t)] = E[x(t)\, b_1 v(t-1)] + E[x(t)\, b_2 v(t-2)] + \cdots + E[x(t)\, b_m v(t-m)] = 0$$

and likewise $E[x(t) \cdot \varphi^T(t) \hat{\theta}] = 0$, so the second term vanishes. Furthermore, $E[x^2(t)]$ is not a function of the parameter vector and is not relevant to the minimization of the squared error. Therefore, minimizing the average power of $z(t; \hat{\theta})$ with respect to the parameter vector $\hat{\theta}$ is equivalent to minimizing $E[\{w(t) - \varphi^T(t) \cdot \hat{\theta}\}^2]$:

$$\hat{\theta} = \arg\min_{\theta} E[\{w(t) - \varphi^T(t) \cdot \theta\}^2] = \arg\min_{\theta} E[z(t; \theta)^2]$$

We can use the Recursive Least Squares algorithm with forgetting factor $\alpha$ ($0 < \alpha \le 1$):

$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{P_{t-1}\, \varphi(t)}{\alpha + \varphi^T(t)\, P_{t-1}\, \varphi(t)}\, \{y(t) - \varphi^T(t)\, \hat{\theta}(t-1)\}$$

$$P_t = \frac{1}{\alpha}\left(P_{t-1} - \frac{P_{t-1}\, \varphi(t)\, \varphi^T(t)\, P_{t-1}}{\alpha + \varphi^T(t)\, P_{t-1}\, \varphi(t)}\right)$$

A simulation sketch of this noise canceller follows.
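The following Python sketch (an added illustration, not from the original notes) simulates the noise canceller end to end using the RLS update above. The player signal, the audience-noise statistics, the true FIR coefficients, the forgetting factor $\alpha = 0.995$, and the initialization $P_0 = 10^3 I$ are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
T, m, alpha = 20000, 3, 0.995

b_true = np.array([0.8, -0.4, 0.2])                 # unknown interference FIR b_1..b_m
x = 0.5 * np.sin(2 * np.pi * 0.01 * np.arange(T))   # true player signal
v = rng.standard_normal(T)                          # audience noise (Microphone 2)

# Interference w(t) = b_1 v(t-1) + ... + b_m v(t-m); recording y(t) = x(t) + w(t)
w = np.zeros(T)
for k in range(1, m + 1):
    w[k:] += b_true[k - 1] * v[:-k]
y = x + w                                           # Microphone 1

# Recursive Least Squares with forgetting factor alpha
theta = np.zeros(m)        # theta_hat(0)
P = 1e3 * np.eye(m)        # P_0: large => weak prior confidence
z = np.zeros(T)            # recovered signal z(t; theta_hat)
for t in range(m, T):
    phi = v[t - m:t][::-1]                     # phi(t) = [v(t-1), ..., v(t-m)]^T
    e = y[t] - phi @ theta                     # innovation y(t) - phi^T theta_hat(t-1)
    K = P @ phi / (alpha + phi @ P @ phi)      # RLS gain
    theta = theta + K * e                      # parameter update
    P = (P - np.outer(K, phi @ P)) / alpha     # covariance update
    z[t] = y[t] - phi @ theta                  # noise-cancelled output

print("estimated theta:", np.round(theta, 3))  # should approach b_true
print(f"noise power before cancellation: {np.mean(w**2):.3f}")
print(f"noise power after  cancellation: {np.mean((z[m:] - x[m:])**2):.5f}")
```

The forgetting factor lets the canceller track slowly varying interference dynamics; with $\alpha = 1$ the recursion reduces to ordinary recursive least squares.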