MOMENTS OF MORE THAN ONE RANDOM VARIABLE
Lecture IX

Covariance and Correlation

Definition 4.3.1:
$$\mathrm{Cov}(X,Y) = E\big[(X - E[X])(Y - E[Y])\big] = E\big[XY - X\,E[Y] - E[X]\,Y + E[X]E[Y]\big] = E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y].$$

Note that this is simply a generalization of the standard variance formulation. Specifically, letting $Y \to X$ yields:
$$\mathrm{Cov}(X,X) = E[XX] - E[X]E[X] = E[X^2] - \big(E[X]\big)^2 = V(X).$$

From a sample perspective (with $x_t$ and $y_t$ measured as deviations from their sample means), we have:
$$V(X) = \frac{1}{n}\sum_{t=1}^{n} x_t^2, \qquad \mathrm{Cov}(X,Y) = \frac{1}{n}\sum_{t=1}^{n} x_t\,y_t.$$

Together, the variances and covariances are typically written as a variance matrix:
$$\Sigma = \begin{bmatrix} V(X) & \mathrm{Cov}(X,Y) \\ \mathrm{Cov}(Y,X) & V(Y) \end{bmatrix} = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix}, \qquad \sigma_{xy} = \sigma_{yx} \text{ since } \mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X).$$

Sample Variance Matrix

Substituting the sample measures into the variance matrix yields:
$$S = \begin{bmatrix} s_{xx} & s_{xy} \\ s_{yx} & s_{yy} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{n}\sum_{t=1}^{n} x_t x_t & \dfrac{1}{n}\sum_{t=1}^{n} x_t y_t \\ \dfrac{1}{n}\sum_{t=1}^{n} y_t x_t & \dfrac{1}{n}\sum_{t=1}^{n} y_t y_t \end{bmatrix}.$$

Matrix Form of Sample Variance

The sample covariance matrix can then be written as:
$$S = \frac{1}{n}\begin{bmatrix} x_1 & \cdots & x_n \\ y_1 & \cdots & y_n \end{bmatrix}\begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}.$$

Theoretical Variance Matrix

In terms of the theoretical distribution (again for variables expressed as deviations from their means), the variance matrix can be written as:
$$\Sigma = \begin{bmatrix} \int\!\!\int x^2\, f(x,y)\,dx\,dy & \int\!\!\int x\,y\, f(x,y)\,dx\,dy \\ \int\!\!\int x\,y\, f(x,y)\,dx\,dy & \int\!\!\int y^2\, f(x,y)\,dx\,dy \end{bmatrix}.$$

Example 4.3.2

The joint distribution $f(x,y)$, with the marginal distributions in the last row and column, is:

X\Y     -1      0      1     f(x)
  1    0.167  0.083  0.167  0.417
  0    0.083  0.000  0.083  0.167
 -1    0.167  0.083  0.167  0.417
f(y)   0.417  0.167  0.417

The individual terms $x\,y\,f(x,y)$ are

-0.167   0.000   0.167
 0.000   0.000   0.000
 0.167   0.000  -0.167

so $\mathrm{Cov}(X,Y) = \sum_x \sum_y x\,y\,f(x,y) = 0$.

The individual terms $x^2 f(x,y)$ are

 0.167   0.083   0.167
 0.000   0.000   0.000
 0.167   0.083   0.167

so $V(X) = \sum_x \sum_y x^2 f(x,y) = 0.833$.

Theorem 4.3.2. $V(X \pm Y) = V(X) + V(Y) \pm 2\,\mathrm{Cov}(X,Y)$:
$$V(X+Y) = E\Big[\big((X - E[X]) + (Y - E[Y])\big)^2\Big] = E\big[(X - E[X])^2\big] + E\big[(Y - E[Y])^2\big] + 2\,E\big[(X - E[X])(Y - E[Y])\big] = V(X) + V(Y) + 2\,\mathrm{Cov}(X,Y).$$

Note that this result can be obtained from the variance matrix. Specifically, X + Y can be written as a vector operation:
$$X + Y = \begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix}.$$

Given this vectorization of the problem, we can define the variance of the sum as:
$$\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \sigma_{xx} + \sigma_{xy} & \sigma_{xy} + \sigma_{yy} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \sigma_{xx} + 2\,\sigma_{xy} + \sigma_{yy}.$$

Theorem 4.3.3. Let $X_i$, $i = 1, 2, \ldots$ be pairwise independent. Then
$$V\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i).$$

The simplest proof of this theorem uses the variance matrix. Note that, in the preceding example, if X and Y are independent then $\sigma_{xy} = 0$, so we have:
$$\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \sigma_{xx} + 2\,\sigma_{xy} + \sigma_{yy} = \sigma_{xx} + \sigma_{yy}.$$

Extending this result to three variables implies:
$$\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \sigma_{11} + \sigma_{22} + \sigma_{33} + 2\,\sigma_{12} + 2\,\sigma_{13} + 2\,\sigma_{23}.$$

Correlation

Definition 4.3.2. The correlation coefficient for two variables is defined as:
$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\sigma_x^2\,\sigma_y^2}}.$$

Note that the covariance between any random variable and a constant is equal to zero. Letting Y equal a constant, so that $Y - E[Y] = 0$, we have:
$$\mathrm{Cov}(X,Y) = E\big[(X - E[X])(Y - E[Y])\big] = E\big[(X - E[X])\cdot 0\big] = E[0] = 0.$$
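The moment calculations above can be checked numerically. The following Python sketch is not part of the original lecture; it assumes NumPy is available and uses the rounded probabilities 0.167 and 0.083 from the table, so the last decimal of V(X) may differ slightly from 0.833. It rebuilds the variance matrix of Example 4.3.2, evaluates Theorem 4.3.2 through the quadratic form $[1\;1]\,\Sigma\,[1\;1]'$, and computes the correlation coefficient of Definition 4.3.2.

```python
import numpy as np

# Joint distribution from Example 4.3.2: rows index x in (1, 0, -1),
# columns index y in (-1, 0, 1); probabilities are the rounded slide values.
x_vals = np.array([1.0, 0.0, -1.0])
y_vals = np.array([-1.0, 0.0, 1.0])
f = np.array([[0.167, 0.083, 0.167],
              [0.083, 0.000, 0.083],
              [0.167, 0.083, 0.167]])

# Marginal means E[X] and E[Y] (both zero by the symmetry of the table).
EX = (x_vals * f.sum(axis=1)).sum()
EY = (y_vals * f.sum(axis=0)).sum()

# Elements of the theoretical variance matrix:
# sigma_xx = sum (x-E[X])^2 f(x,y), sigma_xy = sum (x-E[X])(y-E[Y]) f(x,y), etc.
X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")
sigma_xx = ((X - EX) ** 2 * f).sum()
sigma_yy = ((Y - EY) ** 2 * f).sum()
sigma_xy = ((X - EX) * (Y - EY) * f).sum()

Sigma = np.array([[sigma_xx, sigma_xy],
                  [sigma_xy, sigma_yy]])
print("Cov(X,Y) =", round(sigma_xy, 3))   # 0.0, as on the slide
print("V(X)     =", round(sigma_xx, 3))   # approximately 0.833

# Theorem 4.3.2 via the variance matrix: V(X+Y) = [1 1] Sigma [1 1]'.
ones = np.array([1.0, 1.0])
print("V(X+Y)   =", round(ones @ Sigma @ ones, 3))  # sigma_xx + 2*sigma_xy + sigma_yy

# Correlation coefficient (Definition 4.3.2).
print("Corr(X,Y) =", sigma_xy / np.sqrt(sigma_xx * sigma_yy))
```

Running the sketch reproduces Cov(X,Y) = 0 and Corr(X,Y) = 0, even though X and Y in this example are not independent (for instance, f(0,0) = 0 while the product of the marginals is about 0.028), which is why Theorem 4.3.3 requires independence rather than zero covariance as its premise.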
Least Squares Regression

We define the ordinary least squares estimator as the set of parameters $(\alpha, \beta)$ that minimizes the expected squared error of the estimate:
$$\min_{\alpha,\beta} S = \min_{\alpha,\beta} E\big[(Y - \alpha - \beta X)^2\big] = \min_{\alpha,\beta} E\big[Y^2 - 2\,\alpha Y - 2\,\beta XY + \alpha^2 + 2\,\alpha\beta X + \beta^2 X^2\big] = \min_{\alpha,\beta}\Big\{E[Y^2] - 2\,\alpha E[Y] - 2\,\beta E[XY] + \alpha^2 + 2\,\alpha\beta E[X] + \beta^2 E[X^2]\Big\}.$$

The first-order conditions for this minimization problem then become:
$$\frac{\partial S}{\partial \alpha} = -2\,E[Y] + 2\,\alpha + 2\,\beta E[X] = 0,$$
$$\frac{\partial S}{\partial \beta} = -2\,E[XY] + 2\,\alpha E[X] + 2\,\beta E[X^2] = 0.$$

Solving the first equation for α yields:
$$\alpha = E[Y] - \beta E[X].$$

Substituting this expression into the second first-order condition yields:
$$-E[XY] + \big(E[Y] - \beta E[X]\big)E[X] + \beta E[X^2] = 0$$
$$-\big(E[XY] - E[X]E[Y]\big) + \beta\big(E[X^2] - (E[X])^2\big) = 0$$
$$-\mathrm{Cov}(X,Y) + \beta\,V(X) = 0$$
$$\beta = \frac{\mathrm{Cov}(X,Y)}{V(X)}.$$

General Matrix Forms

With Y the $n \times 1$ vector of observations on the dependent variable, X the $n \times k$ matrix of explanatory variables, and β the $k \times 1$ coefficient vector:
$$\min_{\beta} S = \min_{\beta}\,(Y - X\beta)'(Y - X\beta) = \min_{\beta}\big\{Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta\big\}$$
$$\frac{\partial S}{\partial \beta} = -2\,X'Y + 2\,X'X\beta = 0$$
$$\beta = (X'X)^{-1}X'Y.$$

Theorem 4.3.6. The best linear predictor (or more exactly, the minimum mean-squared-error linear predictor) of Y based on X is given by $\alpha^* + \beta^* X$, where $\alpha^*$ and $\beta^*$ are the least squares estimates.
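As a numerical cross-check of the two derivations above, here is a short Python sketch (again assuming NumPy; the simulated data, the seed, and the coefficients 1.5 and 2.0 are purely illustrative and not from the lecture). It computes $\beta = \mathrm{Cov}(X,Y)/V(X)$ and $\alpha = E[Y] - \beta E[X]$ from sample moments and compares them with the general matrix form $\beta = (X'X)^{-1}X'Y$, where the design matrix carries a column of ones for the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data only: x is arbitrary, y is a noisy linear function of x.
n = 200
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Moment form of the least squares solution:
# beta = Cov(X,Y) / V(X), alpha = E[Y] - beta * E[X].
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()
var_x = ((x - x.mean()) ** 2).mean()
beta_hat = cov_xy / var_x
alpha_hat = y.mean() - beta_hat * x.mean()

# General matrix form: beta = (X'X)^{-1} X'Y with an intercept column.
X = np.column_stack([np.ones(n), x])
coef = np.linalg.solve(X.T @ X, X.T @ y)

print("moment form:  alpha =", alpha_hat, " beta =", beta_hat)
print("matrix form:  alpha =", coef[0], " beta =", coef[1])
```

Both routes return the same intercept and slope up to floating-point error, which is what the simple-regression derivation and the matrix derivation together imply for the case of one regressor plus an intercept (Theorem 4.3.6).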