Review Of Probability And Statistics

Expectation

Definition: Expectation is the probability-weighted average: $E(X) = \sum_{i=1}^{n} x_i \Pr(X = x_i)$.

Definition: Conditional expectation is $E(X \mid Y = y_j) = \sum_{i=1}^{n} x_i \Pr(X = x_i \mid Y = y_j)$.
Properties:
- $E(a) = a$
- $E(aX) = a E(X)$
- $E\left(\sum_{j=1}^{m} X_j\right) = \sum_{j=1}^{m} E(X_j)$ (Expectation is linear)
- $E_Y\left[E_X(X \mid Y)\right] = E(X)$ (Law of iterated expectations)
Variance

Definition: Variance is the expected squared deviation from the mean: $\mathrm{Var}(X) = E\left[(X - E(X))^2\right]$.

Definition: Conditional variance is $\mathrm{Var}(X \mid Y) = E\left[\left(X - E(X \mid Y)\right)^2 \mid Y\right]$.
Properties:
- $\mathrm{Var}(X) = E(X^2) - \left[E(X)\right]^2$
- $\mathrm{Var}(a + X) = \mathrm{Var}(X)$
- $\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)$
- $\mathrm{Var}_X(X) = E_Y\left[\mathrm{Var}_X(X \mid Y)\right] + \mathrm{Var}_Y\left[E_X(X \mid Y)\right]$ (Variance decomposition)
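The variance decomposition can also be checked on a small joint pmf. The example below (made-up numbers) compares $\mathrm{Var}(X)$ against the within-group plus between-group terms:

```python
# Made-up joint pmf; the decomposition is an identity, so any pmf works.
pmf = {(-1, 0): 0.25, (1, 0): 0.25, (2, 1): 0.3, (4, 1): 0.2}  # (x, y): prob
ys = sorted({y for (_, y) in pmf})

def p_y(y0):
    return sum(p for (x, y), p in pmf.items() if y == y0)

def cond_moment(y0, k):
    """E(X^k | Y = y0)."""
    return sum(p * x**k for (x, y), p in pmf.items() if y == y0) / p_y(y0)

EX = sum(p * x for (x, _), p in pmf.items())
EX2 = sum(p * x * x for (x, _), p in pmf.items())
var_X = EX2 - EX**2                       # Var(X) via the shortcut formula

# E_Y[Var(X | Y)]: average of the conditional variances
within = sum(p_y(y) * (cond_moment(y, 2) - cond_moment(y, 1) ** 2) for y in ys)
# Var_Y[E(X | Y)]: variance of the conditional means
m = sum(p_y(y) * cond_moment(y, 1) for y in ys)
between = sum(p_y(y) * (cond_moment(y, 1) - m) ** 2 for y in ys)

assert abs(var_X - (within + between)) < 1e-12
```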
Covariance

Definition: Covariance is $\mathrm{Cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))\right]$.

Properties:
- $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
- $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
- $\mathrm{Cov}(X, a) = 0$
- $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$
- $\mathrm{Var}\left(\sum_{j=1}^{m} X_j\right) = \sum_{j=1}^{m} \mathrm{Var}(X_j) + 2 \sum_{j=1}^{m-1} \sum_{k=j+1}^{m} \mathrm{Cov}(X_j, X_k)$
- $\mathrm{Cov}\left(\sum_{j=1}^{m} X_j, \sum_{k=1}^{l} Y_k\right) = \sum_{j=1}^{m} \sum_{k=1}^{l} \mathrm{Cov}(X_j, Y_k)$ (Covariance is bilinear)
Correlation

Definition: Correlation is $\mathrm{Corr}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$.

Geometric intuition: $\mathrm{Corr}(X, Y) = \cos(\theta)$, the cosine of the angle between $\vec{x}$ and $\vec{y}$,
where $\vec{x} = X - E(X)$ and $\vec{y} = Y - E(Y)$.
Properties:
- $\mathrm{Corr}(X, Y) = \cos(\theta) \in [-1, 1]$
- If $\mathrm{Corr}(X, Y) = 0$, the variables are orthogonal, i.e. $\theta = 90^\circ$.
- If $\mathrm{Corr}(X, Y) = 1$, the vectors point in the same direction, i.e. $\theta = 0^\circ$.
- If $\mathrm{Corr}(X, Y) = -1$, the vectors point in diametrically opposite directions, i.e. $\theta = 180^\circ$.
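The cosine interpretation can be checked directly on sample data: demean each variable and compare the correlation with the cosine of the angle between the demeaned vectors. The data below are made up for illustration:

```python
import math

X = [1.0, 2.0, 4.0, 5.0]  # made-up sample
Y = [2.0, 1.0, 3.0, 6.0]
n = len(X)
x = [xi - sum(X) / n for xi in X]  # x = X - mean(X)
y = [yi - sum(Y) / n for yi in Y]  # y = Y - mean(Y)

dot = sum(a * b for a, b in zip(x, y))
norm_x = math.sqrt(sum(a * a for a in x))
norm_y = math.sqrt(sum(b * b for b in y))
cos_theta = dot / (norm_x * norm_y)        # cos of the angle between x and y

# Corr from Cov and the standard deviations (the 1/n factors cancel)
cov = dot / n
corr = cov / (norm_x / math.sqrt(n) * (norm_y / math.sqrt(n)))

assert abs(corr - cos_theta) < 1e-12       # Corr(X, Y) = cos(theta)
assert -1.0 <= corr <= 1.0                 # and it lies in [-1, 1]
theta = math.degrees(math.acos(cos_theta)) # the angle itself, in degrees
```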
Relationship between Conditional Expectation, Independence, and Covariance

What is the relationship between the following statements?
(1) $X$ and $Y$ are independent, i.e. $\Pr(Y = y_j \mid X = x_i) = \Pr(Y = y_j)$ for any $x_i$ and $y_j$
(2) $E_Y(Y \mid X) = E(Y)$
(3) $\mathrm{Cov}(X, Y) = 0$
(1) implies (2)

$E_Y(Y \mid X) \overset{\text{def}}{=} \sum_{j=1}^{n} y_j \Pr(Y = y_j \mid X = x_i) \overset{\text{(1) indep}}{=} \sum_{j=1}^{n} y_j \Pr(Y = y_j) \overset{\text{def}}{=} E(Y)$
(2) implies (3)

$E(XY) \overset{\text{LIE}}{=} E_X\left[E_Y(XY \mid X)\right] \overset{X \text{ const}}{=} E_X\left[X \, E_Y(Y \mid X)\right] \overset{(2)}{=} E_X\left[X \, E(Y)\right] \overset{E(Y) \text{ const}}{=} E(X) E(Y)$

So $\mathrm{Cov}(X, Y) = E(XY) - E(X) E(Y) = 0$.
(3) does not necessarily imply (2)

Example: Let's say the following outcomes can happen with equal probability:

X:  -2  -1   0   1   2
Y:   4   1   0   1   4

You can check that $E(X) = 0$, $E(Y) = 2$, and $E(XY) = 0$,
so $\mathrm{Cov}(X, Y) = E(XY) - E(X) E(Y) = 0$.
However, $E(Y \mid X) = X^2$, which is different from $E(Y) = 2$, so (2) fails even if (3) holds.
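This counterexample is easy to verify exactly with rational arithmetic:

```python
from fractions import Fraction

# The notes' counterexample: five equally likely outcomes with Y = X^2.
outcomes = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]
p = Fraction(1, 5)

EX = sum(p * x for x, _ in outcomes)
EY = sum(p * y for _, y in outcomes)
EXY = sum(p * x * y for x, y in outcomes)

assert EX == 0 and EY == 2
assert EXY - EX * EY == 0          # (3) holds: Cov(X, Y) = 0
# But E(Y | X = x) = x^2, which is not constant at E(Y) = 2, so (2) fails:
assert any(y != EY for _, y in outcomes)
```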
(2) does not necessarily imply (1)

Example: Let's say $\Pr(Y = -1) = \Pr(Y = 0) = \Pr(Y = 1) = \frac{1}{3}$, $\Pr(Y = 0 \mid X = 1) = 1$, and
$\Pr(Y = -1 \mid X = 1) = \Pr(Y = 1 \mid X = 1) = 0$.
Then it follows that $E(Y) = 0$ and $E_Y(Y \mid X) = 0$, but $\Pr(Y = 0) \neq \Pr(Y = 0 \mid X = 1)$.

So (1) implies (2) implies (3), but not the other way around.
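One way to make this example fully concrete: the notes pin down only the marginal of $Y$ and the conditionals given $X = 1$, so the distribution of $X$ and the conditionals given $X = -1$ in the sketch below are assumptions that complete the joint pmf.

```python
from fractions import Fraction as F

# Assumed completion: X takes 1 with prob 1/3 (forcing Y = 0) and -1 with
# prob 2/3 (then Y = -1 or 1 with equal probability).
pmf = {(1, 0): F(1, 3), (-1, -1): F(1, 3), (-1, 1): F(1, 3)}  # (x, y): prob

def pr(pred):
    return sum(p for (x, y), p in pmf.items() if pred(x, y))

EY = sum(p * y for (x, y), p in pmf.items())
assert EY == 0

# (2) holds: E(Y | X = x0) = E(Y) = 0 for every value of X
for x0 in (1, -1):
    px = pr(lambda x, y: x == x0)
    e = sum(p * y for (x, y), p in pmf.items() if x == x0) / px
    assert e == EY

# (1) fails: Pr(Y = 0 | X = 1) = 1 while Pr(Y = 0) = 1/3
assert pr(lambda x, y: x == 1 and y == 0) / pr(lambda x, y: x == 1) == 1
assert pr(lambda x, y: y == 0) == F(1, 3)
```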
TF Laura Serban
ECON 1123 - Fall 2006
Derivations of Properties
(Optional)
Expectation

Property: $E(a) = a$

Proof: $E(a) \overset{\text{def}}{=} a \cdot 1 = a$ (a constant takes the value $a$ with probability 1)
Property: $E(aX) = a E(X)$

Proof: $E(aX) \overset{\text{def}}{=} \sum_{i=1}^{n} a x_i \Pr(X = x_i) = a \sum_{i=1}^{n} x_i \Pr(X = x_i) \overset{\text{def}}{=} a E(X)$
Property: $E\left(\sum_{j=1}^{m} X_j\right) = \sum_{j=1}^{m} E(X_j)$ (Expectation is linear)

Proof:
$E\left(\sum_{j=1}^{m} X_j\right) \overset{\text{def}}{=} \sum_{i_1=1}^{n_1} \cdots \sum_{i_m=1}^{n_m} \left(\sum_{j=1}^{m} x_{j i_j}\right) \Pr\left(X_1 = x_{1 i_1}, \ldots, X_m = x_{m i_m}\right)$
$= \sum_{j=1}^{m} \sum_{i_1=1}^{n_1} \cdots \sum_{i_m=1}^{n_m} x_{j i_j} \Pr\left(X_1 = x_{1 i_1}, \ldots, X_m = x_{m i_m}\right) = \sum_{j=1}^{m} \sum_{i_j=1}^{n_j} x_{j i_j} \Pr\left(X_j = x_{j i_j}\right) \overset{\text{def}}{=} \sum_{j=1}^{m} E(X_j)$,
where the second-to-last step sums the joint probability over all the other indices to get the marginal of $X_j$.
Property: $E_Y\left[E_X(X \mid Y)\right] = E(X)$ (Law of iterated expectations)

Proof: $E_Y\left[E_X(X \mid Y)\right] = \sum_j \left(\sum_i x_i P(X = x_i \mid Y = y_j)\right) P(Y = y_j)$
$= \sum_i x_i \sum_j P(X = x_i, Y = y_j) = \sum_i x_i P(X = x_i) \overset{\text{def}}{=} E(X)$
Variance

Property: $\mathrm{Var}(X) = E(X^2) - \left[E(X)\right]^2$

Proof: $\mathrm{Var}(X) \overset{\text{def}}{=} E\left[\left(X - E(X)\right)^2\right] = E\left[X^2 - 2 X E(X) + \left(E(X)\right)^2\right]$
$= E(X^2) - 2\left[E(X)\right]^2 + \left[E(X)\right]^2 = E(X^2) - \left[E(X)\right]^2$
Property: $\mathrm{Var}(a + X) = \mathrm{Var}(X)$

Proof: $\mathrm{Var}(a + X) \overset{\text{def}}{=} E\left[\left(a + X - E(a + X)\right)^2\right] = E\left[\left(a + X - a - E(X)\right)^2\right] \overset{\text{def}}{=} \mathrm{Var}(X)$
Property: $\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)$

Proof: $\mathrm{Var}(aX) \overset{\text{def}}{=} E\left[\left(aX - E(aX)\right)^2\right] = E\left[a^2 \left(X - E(X)\right)^2\right] = a^2 E\left[\left(X - E(X)\right)^2\right] \overset{\text{def}}{=} a^2 \mathrm{Var}(X)$
Property: $\mathrm{Var}_X(X) = E_Y\left[\mathrm{Var}_X(X \mid Y)\right] + \mathrm{Var}_Y\left[E_X(X \mid Y)\right]$ (Variance decomposition)

Proof: $E_Y\left[\mathrm{Var}_X(X \mid Y)\right] + \mathrm{Var}_Y\left[E_X(X \mid Y)\right]$
$= E_Y\left[E_X(X^2 \mid Y) - \left(E_X(X \mid Y)\right)^2\right] + E_Y\left[\left(E_X(X \mid Y)\right)^2\right] - \left(E_Y\left[E_X(X \mid Y)\right]\right)^2$
$= E_Y\left[E_X(X^2 \mid Y)\right] - \left(E_Y\left[E_X(X \mid Y)\right]\right)^2 = E_X(X^2) - \left[E_X(X)\right]^2 = \mathrm{Var}_X(X)$
Covariance

Property: $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$

Proof: $\mathrm{Cov}(X, X) \overset{\text{def}}{=} E\left[(X - E(X))(X - E(X))\right] = E\left[\left(X - E(X)\right)^2\right] \overset{\text{def}}{=} \mathrm{Var}(X)$
Property: $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$

Proof: $\mathrm{Cov}(X, Y) \overset{\text{def}}{=} E\left[(X - E(X))(Y - E(Y))\right] = E\left[(Y - E(Y))(X - E(X))\right] \overset{\text{def}}{=} \mathrm{Cov}(Y, X)$
Property: $\mathrm{Cov}(X, a) = 0$

Proof: $\mathrm{Cov}(X, a) \overset{\text{def}}{=} E\left[(X - E(X))(a - E(a))\right] = E\left[(X - E(X)) \cdot 0\right] = 0$
Property: $\mathrm{Cov}(X, Y) = E(XY) - E(X) E(Y)$

Proof: $\mathrm{Cov}(X, Y) \overset{\text{def}}{=} E\left[(X - E(X))(Y - E(Y))\right] = E\left[XY - X E(Y) - E(X) Y + E(X) E(Y)\right]$
$= E(XY) - 2 E(X) E(Y) + E(X) E(Y) = E(XY) - E(X) E(Y)$
m 1 m
 m
 m
Property: Var   X j   Var ( X j )  2 Cov( X j , X k )
j 1 j 1
 j 1  j 1
2
2
 m
 def 
 m
  
 
 m
  m
Proof: Var   X j   E   X j  E   X j     E    X j  E ( X j )    
 j 1 
 j 1   
 

  j 1
  j 1
m 1 m
m 1 m
m
 m
2
 E   X j  E ( X j )   2  X j  E ( X j )   X k  E ( X k )   Var ( X j )  2 Cov( X j , X k )
j 1 j 1
j 1 j 1
 j 1
 j 1
Property: $\mathrm{Cov}\left(\sum_{j=1}^{m} X_j, \sum_{k=1}^{l} Y_k\right) = \sum_{j=1}^{m} \sum_{k=1}^{l} \mathrm{Cov}(X_j, Y_k)$ (Covariance is bilinear)

Proof: $\mathrm{Cov}\left(\sum_{j=1}^{m} X_j, \sum_{k=1}^{l} Y_k\right) \overset{\text{def}}{=} E\left[\left(\sum_{j=1}^{m} X_j - E\left(\sum_{j=1}^{m} X_j\right)\right)\left(\sum_{k=1}^{l} Y_k - E\left(\sum_{k=1}^{l} Y_k\right)\right)\right]$
$= E\left[\left(\sum_{j=1}^{m} \left(X_j - E(X_j)\right)\right)\left(\sum_{k=1}^{l} \left(Y_k - E(Y_k)\right)\right)\right]$
$= \sum_{j=1}^{m} \sum_{k=1}^{l} E\left[\left(X_j - E(X_j)\right)\left(Y_k - E(Y_k)\right)\right] \overset{\text{def}}{=} \sum_{j=1}^{m} \sum_{k=1}^{l} \mathrm{Cov}(X_j, Y_k)$
Correlation

Property: $\mathrm{Corr}(X, Y) = \cos(\theta) \in [-1, 1]$

Proof: Set $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.

The sample correlation is
$\mathrm{Corr}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} y_i^2}}$

Remember the dot product from vector algebra: $\vec{x} \cdot \vec{y} \overset{\text{def}}{=} \|\vec{x}\| \, \|\vec{y}\| \cos(\theta)$.

This means that the unit vectors satisfy $\vec{1}_i \cdot \vec{1}_j = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases}$

Another way to write the dot product is $\vec{x} \cdot \vec{y} = \left(\sum_{i=1}^{n} x_i \vec{1}_i\right) \cdot \left(\sum_{j=1}^{n} y_j \vec{1}_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j \left(\vec{1}_i \cdot \vec{1}_j\right) = \sum_{i=1}^{n} x_i y_i$.

The lengths of the vectors are $\|\vec{x}\| = \sqrt{\sum_{i=1}^{n} x_i^2}$ and $\|\vec{y}\| = \sqrt{\sum_{i=1}^{n} y_i^2}$.

This shows that $\mathrm{Corr}(X, Y) = \dfrac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|} = \dfrac{\|\vec{x}\| \, \|\vec{y}\| \cos(\theta)}{\|\vec{x}\| \, \|\vec{y}\|} = \cos(\theta)$.

The result also indicates precisely how correlation is a normalization of covariance.