MOMENTS OF MORE THAN ONE RANDOM VARIABLE Lecture IX

Covariance and Correlation

Definition 4.3.1: The covariance between two random variables X and Y is defined as:
Cov X , Y   E  X  E X Y  E Y 
 E XY  XEY   E  X Y  E X E Y 
 E XY   E X E Y   E X E Y   E X E Y 
 E XY   E X E Y 
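As a quick numerical check of the shortcut result Cov(X, Y) = E[XY] - E[X]E[Y], here is a minimal sketch assuming NumPy and a made-up joint probability table (not taken from the lecture):

```python
import numpy as np

# Hypothetical joint probability table (illustration only):
# rows index the support of X, columns the support of Y.
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([-1.0, 1.0])
p_xy = np.array([[0.2, 0.1],
                 [0.2, 0.2],
                 [0.1, 0.2]])             # probabilities sum to 1

E_x = np.sum(x_vals * p_xy.sum(axis=1))   # E[X] from the X marginal
E_y = np.sum(y_vals * p_xy.sum(axis=0))   # E[Y] from the Y marginal
E_xy = np.sum(np.outer(x_vals, y_vals) * p_xy)

# Definition: E[(X - E[X])(Y - E[Y])]
cov_def = np.sum(np.outer(x_vals - E_x, y_vals - E_y) * p_xy)

# Shortcut: E[XY] - E[X]E[Y]
cov_short = E_xy - E_x * E_y

print(cov_def, cov_short)                 # both equal 0.2
```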

Note that this is simply a generalization of the
standard variance formulation. Specifically, letting
Y = X yields:
Cov XX   EXX   EX EX 
 
 E X  EX 
2
2

From a sample perspective, with x_t and y_t measured as deviations from their respective sample means, we have:

V(X) = \frac{1}{n} \sum_{t=1}^{n} x_t^2

Cov(X, Y) = \frac{1}{n} \sum_{t=1}^{n} x_t y_t
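A minimal sketch of these sample moments, again assuming NumPy and hypothetical data vectors:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])         # hypothetical observations
y = np.array([1.0, 3.0, 2.0, 6.0])
n = len(x)

# Express the data as deviations from the sample means.
xd = x - x.mean()
yd = y - y.mean()

V_x = np.sum(xd ** 2) / n                  # (1/n) * sum of x_t^2
cov_xy = np.sum(xd * yd) / n               # (1/n) * sum of x_t * y_t

print(V_x, cov_xy)                         # 5.0 and 3.5
print(np.cov(x, y, bias=True))             # np.cov with bias=True uses the same 1/n divisor
```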

Together, the variances and covariances are typically collected into a variance matrix:
Cov X , Y   xx  xy 
 V X 










Cov
X
,
Y
V
Y
yy 

  yx
Cov X , Y    xy   yx  CovY , X 
Sample Variance Matrix

Substituting the sample measures into the variance
matrix yields
S = \begin{bmatrix} s_{xx} & s_{xy} \\ s_{yx} & s_{yy} \end{bmatrix}
  = \begin{bmatrix} \frac{1}{n} \sum_{t=1}^{n} x_t x_t & \frac{1}{n} \sum_{t=1}^{n} x_t y_t \\ \frac{1}{n} \sum_{t=1}^{n} y_t x_t & \frac{1}{n} \sum_{t=1}^{n} y_t y_t \end{bmatrix}
  = \frac{1}{n} \begin{bmatrix} \sum_{t=1}^{n} x_t x_t & \sum_{t=1}^{n} x_t y_t \\ \sum_{t=1}^{n} y_t x_t & \sum_{t=1}^{n} y_t y_t \end{bmatrix}
Matrix Form of Sample Variance

The sample covariance matrix can then be written
as:
S = \frac{1}{n}
    \begin{bmatrix} x_1 & \cdots & x_n \\ y_1 & \cdots & y_n \end{bmatrix}
    \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}
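This outer-product form is easy to verify numerically; a short sketch, assuming NumPy and the same kind of made-up deviation data as above:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])          # hypothetical data, as before
y = np.array([1.0, 3.0, 2.0, 6.0])
n = len(x)

# Stack the deviation vectors into a 2 x n data matrix Z.
Z = np.vstack([x - x.mean(), y - y.mean()])

S = Z @ Z.T / n                              # (1/n) * Z Z'
print(S)
print(np.cov(x, y, bias=True))               # identical 2 x 2 matrix
```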
Theoretical Variance Matrix

In terms of the theoretical distribution (with x and y again measured as deviations from their means), the variance matrix can be written as:

\begin{bmatrix}
  \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^2 f(x,y) \, dx \, dy &
  \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y \, f(x,y) \, dx \, dy \\
  \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y \, f(x,y) \, dx \, dy &
  \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y^2 f(x,y) \, dx \, dy
\end{bmatrix}
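Purely as an illustration of these integrals (not part of the lecture), the sketch below assumes SciPy and a zero-mean bivariate normal density with unit variances and correlation 0.5, for which the variance matrix should come out as [[1, 0.5], [0.5, 1]]:

```python
import numpy as np
from scipy.integrate import dblquad

rho = 0.5                                    # assumed correlation

def f(x, y):
    """Zero-mean bivariate normal density with unit variances."""
    q = (x ** 2 - 2 * rho * x * y + y ** 2) / (1 - rho ** 2)
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(1 - rho ** 2))

# dblquad expects the integrand as g(y, x); +/- 8 standard deviations
# is effectively the whole plane for this density.
lim = 8.0
sig_xx, _ = dblquad(lambda y, x: x * x * f(x, y), -lim, lim, -lim, lim)
sig_xy, _ = dblquad(lambda y, x: x * y * f(x, y), -lim, lim, -lim, lim)
sig_yy, _ = dblquad(lambda y, x: y * y * f(x, y), -lim, lim, -lim, lim)

print(np.array([[sig_xx, sig_xy], [sig_xy, sig_yy]]))   # approx [[1.0, 0.5], [0.5, 1.0]]
```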
Example 4.3.2
X \ Y      -1       0       1     P(X=x)
  1      0.167   0.083   0.167    0.417
  0      0.083   0.000   0.083    0.167
 -1      0.167   0.083   0.167    0.417
P(Y=y)   0.417   0.167   0.417
Since E[X] = E[Y] = 0, each cell contributes x y f(x,y) to the covariance:

X \ Y      -1       0       1
  1     -0.167   0.000   0.167
  0      0.000   0.000   0.000
 -1      0.167   0.000  -0.167

Cov(X, Y) = sum of all cells = 0.000
Similarly, each cell contributes x^2 f(x,y) to the variance of X:

X \ Y      -1       0       1
  1      0.167   0.083   0.167
  0      0.000   0.000   0.000
 -1      0.167   0.083   0.167

V(X) = sum of all cells = 0.833
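The example can be reproduced directly from the joint table; a small NumPy sketch, with the probabilities rounded as in the slides:

```python
import numpy as np

x_vals = np.array([1.0, 0.0, -1.0])          # row values of X
y_vals = np.array([-1.0, 0.0, 1.0])          # column values of Y
p = np.array([[0.167, 0.083, 0.167],
              [0.083, 0.000, 0.083],
              [0.167, 0.083, 0.167]])

E_x = np.sum(x_vals * p.sum(axis=1))         # = 0
E_y = np.sum(y_vals * p.sum(axis=0))         # = 0

cov_xy = np.sum(np.outer(x_vals - E_x, y_vals - E_y) * p)
V_x = np.sum((x_vals - E_x) ** 2 * p.sum(axis=1))

print(cov_xy)                                # 0.0
print(V_x)                                   # about 0.83
```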

Theorem 4.3.2. V(X ± Y) = V(X) + V(Y) ± 2 Cov(X, Y)
V  X  Y   E  X  Y  X  Y 
 E  XX  2 XY  YY 
 E XX   E YY   2 E XY 
 V ( X )  V (Y )  2Cov( X , Y )
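A quick numerical check of Theorem 4.3.2, assuming NumPy and arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)        # deliberately correlated with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)                              # equal up to floating-point rounding
```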

Note that this result can be obtained from the
variance matrix. Specifically, X+Y can be written
as a vector operation:
\begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = X + Y

Given this vectorization of the problem, we can write the variance of the sum as:
1  xx
1 ' 
   xy
 xy  1
1
  xx   xy  xy   yy  



 yy  1
1
  xx  2 xy   yy
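The same quadratic form gives the variance of any weighted sum, and the weight vector can be extended to cover the three-variable case shown later. A sketch assuming NumPy and an arbitrary variance matrix:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])               # assumed variance matrix
a = np.array([1.0, 1.0])                     # weights for X + Y

var_sum = a @ Sigma @ a                      # sigma_xx + 2*sigma_xy + sigma_yy
print(var_sum)                               # 2.0 + 2*0.5 + 1.0 = 4.0
```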

Theorem 4.3.3. Let X_i, i = 1, 2, …, be pairwise independent. Then
V\!\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} V(X_i)

The simplest proof of this theorem is to use the variance matrix. Note that in the preceding example, if X and Y are independent (so that the covariance term is zero), we have:
1  xx  xy  1
  xx   xy  xy   yy

1 ' 


   xy  yy  1
  xx  2 xy   yy

  xx   yy
Lecture IX
Fall 2007
1
1



Extending this result to three variables implies:
1  11  12  13  1
1 ' 





1

12
22
23
 
 
1  13  23  33  1
 11  2 12  2 13   22  2 23   33
Correlation

Definition 4.3.2. The correlation coefficient for two
variables is defined as:
Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{\sigma_x^2 \sigma_y^2}}
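A sketch of the correlation coefficient built from these pieces, assuming NumPy and simulated data; np.corrcoef is shown only as a cross-check:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = 2.0 * x + rng.normal(size=5_000)

cov_xy = np.cov(x, y, bias=True)[0, 1]
corr = cov_xy / np.sqrt(np.var(x) * np.var(y))

print(corr)
print(np.corrcoef(x, y)[0, 1])               # same value straight from NumPy
```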

Note that the covariance between any random variable and a constant is equal to zero. Letting Y be a constant, so that Y - E[Y] = 0, we have:
E[(X - E[X])(Y - E[Y])] = E[(X - E[X]) \cdot 0] = E[0] = 0
Least Squares Regression

We define the ordinary least squares estimator as
that set of parameters that minimizes the squared
error of the estimate:

S = E[(Y - \alpha - \beta X)^2]

\min_{\alpha, \beta} S = \min_{\alpha, \beta} E[Y^2 - 2\alpha Y - 2\beta XY + \alpha^2 + 2\alpha\beta X + \beta^2 X^2]
                       = \min_{\alpha, \beta} \left( E[Y^2] - 2\alpha E[Y] - 2\beta E[XY] + \alpha^2 + 2\alpha\beta E[X] + \beta^2 E[X^2] \right)

The first-order conditions for this minimization problem then become:
\frac{\partial S}{\partial \alpha} = -2E[Y] + 2\alpha + 2\beta E[X] = 0

\frac{\partial S}{\partial \beta} = -2E[XY] + 2\alpha E[X] + 2\beta E[X^2] = 0

Solving the first equation for α yields:

\alpha = E[Y] - \beta E[X]
Substituting this expression into the second first
order condition yields:
 
 EXY   EY   EX EX   E X 2  0
 

 EXY   EY EX    E X  EX   0
 Cov( X , Y )  V  X   0
Cov( X , Y )

V (X )
2
2
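A sketch of these moment formulas on simulated data, assuming NumPy; np.polyfit is used only as a cross-check:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)
y = 1.5 + 0.8 * x + rng.normal(scale=0.3, size=1_000)

beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # Cov(X, Y) / V(X)
alpha = y.mean() - beta * x.mean()                 # E[Y] - beta * E[X]

print(alpha, beta)                                 # close to 1.5 and 0.8
print(np.polyfit(x, y, 1))                         # [slope, intercept] cross-check
```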
General Matrix Forms
In general matrix form, the least squares problem and its solution become:

\min_{\beta} S^2 = \min_{\beta} (Y - X\beta)'(Y - X\beta)
               = \min_{\beta} \left( Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta \right)

\frac{\partial S^2}{\partial \beta} = -2X'Y + 2X'X\beta = 0

\beta = (X'X)^{-1} X'Y
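A sketch of the matrix solution, assuming NumPy, a made-up design matrix X with an intercept column, and np.linalg.lstsq as a cross-check:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one regressor
beta_true = np.array([1.5, 0.8])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: beta = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_hat)
print(np.linalg.lstsq(X, Y, rcond=None)[0])              # same estimates
```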

Theorem 4.3.6. The best linear predictor (or more exactly, the minimum mean-squared-error linear predictor) of Y based on X is given by α* + β*X, where α* and β* are the least squares estimates.