Probability and Statistics Review

Expectation

Definition: Expectation is the probability-weighted average: $E(X) = \sum_{i=1}^{n} x_i \Pr(X = x_i)$.

Definition: Conditional expectation is $E(X \mid Y = y_j) = \sum_{i=1}^{n} x_i \Pr(X = x_i \mid Y = y_j)$.

Properties:
$E(a) = a$
$E(aX) = a E(X)$
$E\left(\sum_{j=1}^{m} X_j\right) = \sum_{j=1}^{m} E(X_j)$  (Expectation is linear)
$E_Y[E_X(X \mid Y)] = E(X)$  (Law of iterated expectations)

Variance

Definition: Variance is the expected squared deviation from the mean: $Var(X) = E[(X - E(X))^2]$.

Definition: Conditional variance is $Var(X \mid Y) = E\left[(X - E(X \mid Y))^2 \mid Y\right]$.

Properties:
$Var(X) = E(X^2) - [E(X)]^2$
$Var(a + X) = Var(X)$
$Var(aX) = a^2 Var(X)$
$Var_X(X) = E_Y[Var_X(X \mid Y)] + Var_Y[E_X(X \mid Y)]$  (Variance decomposition)

Covariance

Definition: Covariance is $Cov(X, Y) = E[(X - E(X))(Y - E(Y))]$.

Properties:
$Cov(X, X) = Var(X)$
$Cov(X, Y) = Cov(Y, X)$
$Cov(X, a) = 0$
$Cov(X, Y) = E(XY) - E(X) E(Y)$
$Var\left(\sum_{j=1}^{m} X_j\right) = \sum_{j=1}^{m} Var(X_j) + 2 \sum_{j=1}^{m-1} \sum_{k=j+1}^{m} Cov(X_j, X_k)$
$Cov\left(\sum_{j=1}^{m} X_j, \sum_{k=1}^{l} Y_k\right) = \sum_{j=1}^{m} \sum_{k=1}^{l} Cov(X_j, Y_k)$  (Covariance is bilinear)

Correlation

Definition: Correlation is $Corr(X, Y) = \dfrac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}$.

Geometric intuition: $Corr(X, Y) = \cos(\theta)$, where $\theta$ is the angle between $\tilde{x}$ and $\tilde{y}$, with $\tilde{x} = X - E(X)$ and $\tilde{y} = Y - E(Y)$.

Properties:
$Corr(X, Y) = \cos(\theta) \in [-1, 1]$
If $Corr(X, Y) = 0$, the variables are orthogonal, i.e. $\theta = 90^\circ$.
If $Corr(X, Y) = 1$, the vectors point in the same direction, i.e. $\theta = 0^\circ$.
If $Corr(X, Y) = -1$, the vectors point in diametrically opposite directions, i.e. $\theta = 180^\circ$.

Relationship between Conditional Expectation, Independence, and Covariance

What is the relationship between the following statements?
(1) $X$ and $Y$ are independent, i.e.
$\Pr(Y = y_j \mid X = x_i) = \Pr(Y = y_j)$ for any $x_i$ and $y_j$
(2) $E_Y(Y \mid X) = E(Y)$
(3) $Cov(X, Y) = 0$

(1) implies (2):
$E_Y(Y \mid X = x_i) \overset{\text{def}}{=} \sum_{j=1}^{n} y_j \Pr(Y = y_j \mid X = x_i) \overset{\text{indep}}{=} \sum_{j=1}^{n} y_j \Pr(Y = y_j) \overset{\text{def}}{=} E(Y)$

(2) implies (3):
$E(XY) \overset{\text{LIE}}{=} E_X[E_Y(XY \mid X)] \overset{X \text{ const}}{=} E_X[X\, E_Y(Y \mid X)] \overset{(2)}{=} E_X[X\, E(Y)] \overset{E(Y) \text{ const}}{=} E(X) E(Y)$
So $Cov(X, Y) = E(XY) - E(X) E(Y) = 0$.

(3) does not necessarily imply (2)
Example: Suppose the following outcomes occur with equal probability:

X: -2  -1  0  1  2
Y:  4   1  0  1  4

You can check that $E(X) = 0$, $E(Y) = 2$, and $E(XY) = 0$, so $Cov(X, Y) = E(XY) - E(X) E(Y) = 0$. However, $E(Y \mid X) = X^2$, which differs from $E(Y) = 2$, so (2) fails even though (3) holds.

(2) does not necessarily imply (1)
Example: Suppose $\Pr(Y = -1) = \Pr(Y = 0) = \Pr(Y = 1) = \frac{1}{3}$, $\Pr(Y = 0 \mid X = 1) = 1$, and $\Pr(Y = -1 \mid X = 1) = \Pr(Y = 1 \mid X = 1) = 0$, with the remaining conditional probabilities chosen so that $E(Y \mid X = x) = 0$ for every value of $X$. Then $E(Y) = 0$ and $E_Y(Y \mid X) = 0$, but $\Pr(Y = 0) = \frac{1}{3} \ne 1 = \Pr(Y = 0 \mid X = 1)$, so $X$ and $Y$ are not independent.

So (1) implies (2), and (2) implies (3), but neither implication can be reversed.

TF Laura Serban, ECON 1123 - Fall 2006

Derivations of Properties (Optional)

Expectation

Property: $E(a) = a$
Proof: $E(a) \overset{\text{def}}{=} a \cdot 1 = a$

Property: $E(aX) = a E(X)$
Proof: $E(aX) \overset{\text{def}}{=} \sum_{i=1}^{n} a x_i \Pr(X = x_i) = a \sum_{i=1}^{n} x_i \Pr(X = x_i) \overset{\text{def}}{=} a E(X)$

Property: $E\left(\sum_{j=1}^{m} X_j\right) = \sum_{j=1}^{m} E(X_j)$  (Expectation is linear)
Proof: $E\left(\sum_{j=1}^{m} X_j\right) \overset{\text{def}}{=} \sum_{i_1=1}^{n_1} \cdots \sum_{i_m=1}^{n_m} \left(\sum_{j=1}^{m} x_{j i_j}\right) \Pr(X_1 = x_{1 i_1}, \ldots, X_m = x_{m i_m})$
$= \sum_{j=1}^{m} \sum_{i_1=1}^{n_1} \cdots \sum_{i_m=1}^{n_m} x_{j i_j} \Pr(X_1 = x_{1 i_1}, \ldots, X_m = x_{m i_m}) = \sum_{j=1}^{m} \sum_{i_j=1}^{n_j} x_{j i_j} \Pr(X_j = x_{j i_j}) \overset{\text{def}}{=} \sum_{j=1}^{m} E(X_j)$

Property: $E_Y[E_X(X \mid Y)] = E(X)$  (Law of iterated expectations)
Proof: $E_Y[E_X(X \mid Y)] = \sum_j \left(\sum_i x_i \Pr(X = x_i \mid Y = y_j)\right) \Pr(Y = y_j) = \sum_i x_i \sum_j \Pr(X = x_i \,\&\, Y = y_j) = \sum_i x_i \Pr(X = x_i) \overset{\text{def}}{=} E(X)$

Variance

Property: $Var(X) = E(X^2) - [E(X)]^2$
Proof: $Var(X) \overset{\text{def}}{=} E[(X - E(X))^2] = E[X^2 - 2 X E(X) + E(X)^2] = E(X^2) - 2 E(X) E(X) + E(X)^2 = E(X^2) - [E(X)]^2$

Property: $Var(a + X) = Var(X)$
Proof: $Var(a + X) \overset{\text{def}}{=} E[(a + X - E(a + X))^2] = E[(a + X - a - E(X))^2] = E[(X - E(X))^2] \overset{\text{def}}{=} Var(X)$

Property: $Var(aX) = a^2 Var(X)$
Proof: $Var(aX) \overset{\text{def}}{=} E[(aX - E(aX))^2] = E[a^2 (X - E(X))^2] = a^2 E[(X - E(X))^2] \overset{\text{def}}{=} a^2 Var(X)$

Property: $Var_X(X) = E_Y[Var_X(X \mid Y)] + Var_Y[E_X(X \mid Y)]$
(Variance decomposition)
Proof:
$E_Y[Var_X(X \mid Y)] + Var_Y[E_X(X \mid Y)] = E_Y\left[E_X(X^2 \mid Y) - E_X(X \mid Y)^2\right] + E_Y\left[E_X(X \mid Y)^2\right] - \left(E_Y[E_X(X \mid Y)]\right)^2$
$= E_Y[E_X(X^2 \mid Y)] - \left(E_Y[E_X(X \mid Y)]\right)^2 = E_X(X^2) - [E_X(X)]^2 = Var_X(X)$

Covariance

Property: $Cov(X, X) = Var(X)$
Proof: $Cov(X, X) \overset{\text{def}}{=} E[(X - E(X))(X - E(X))] = E[(X - E(X))^2] \overset{\text{def}}{=} Var(X)$

Property: $Cov(X, Y) = Cov(Y, X)$
Proof: $Cov(X, Y) \overset{\text{def}}{=} E[(X - E(X))(Y - E(Y))] = E[(Y - E(Y))(X - E(X))] \overset{\text{def}}{=} Cov(Y, X)$

Property: $Cov(X, a) = 0$
Proof: $Cov(X, a) \overset{\text{def}}{=} E[(X - E(X))(a - E(a))] = E[(X - E(X)) \cdot 0] = 0$

Property: $Cov(X, Y) = E(XY) - E(X) E(Y)$
Proof: $Cov(X, Y) \overset{\text{def}}{=} E[(X - E(X))(Y - E(Y))] = E[XY - X E(Y) - E(X) Y + E(X) E(Y)] = E(XY) - 2 E(X) E(Y) + E(X) E(Y) = E(XY) - E(X) E(Y)$

Property: $Var\left(\sum_{j=1}^{m} X_j\right) = \sum_{j=1}^{m} Var(X_j) + 2 \sum_{j=1}^{m-1} \sum_{k=j+1}^{m} Cov(X_j, X_k)$
Proof: $Var\left(\sum_{j=1}^{m} X_j\right) \overset{\text{def}}{=} E\left[\left(\sum_{j=1}^{m} X_j - E\left(\sum_{j=1}^{m} X_j\right)\right)^2\right] = E\left[\left(\sum_{j=1}^{m} (X_j - E(X_j))\right)^2\right]$
$= E\left[\sum_{j=1}^{m} (X_j - E(X_j))^2 + 2 \sum_{j=1}^{m-1} \sum_{k=j+1}^{m} (X_j - E(X_j))(X_k - E(X_k))\right] = \sum_{j=1}^{m} Var(X_j) + 2 \sum_{j=1}^{m-1} \sum_{k=j+1}^{m} Cov(X_j, X_k)$

Property: $Cov\left(\sum_{j=1}^{m} X_j, \sum_{k=1}^{l} Y_k\right) = \sum_{j=1}^{m} \sum_{k=1}^{l} Cov(X_j, Y_k)$  (Covariance is bilinear)
Proof: $Cov\left(\sum_{j=1}^{m} X_j, \sum_{k=1}^{l} Y_k\right) \overset{\text{def}}{=} E\left[\left(\sum_{j=1}^{m} X_j - E\left(\sum_{j=1}^{m} X_j\right)\right)\left(\sum_{k=1}^{l} Y_k - E\left(\sum_{k=1}^{l} Y_k\right)\right)\right]$
$= E\left[\sum_{j=1}^{m} (X_j - E(X_j)) \sum_{k=1}^{l} (Y_k - E(Y_k))\right] = \sum_{j=1}^{m} \sum_{k=1}^{l} E[(X_j - E(X_j))(Y_k - E(Y_k))] \overset{\text{def}}{=} \sum_{j=1}^{m} \sum_{k=1}^{l} Cov(X_j, Y_k)$

Correlation

Property: $Corr(X, Y) = \cos(\theta) \in [-1, 1]$
Proof: Set $\tilde{x}_i = X_i - \bar{X}$ and $\tilde{y}_i = Y_i - \bar{Y}$. The sample correlation is
$Corr(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \dfrac{\sum_{i=1}^{n} \tilde{x}_i \tilde{y}_i}{\sqrt{\sum_{i=1}^{n} \tilde{x}_i^2} \sqrt{\sum_{i=1}^{n} \tilde{y}_i^2}}$
Remember the dot product from vector algebra: $\tilde{x} \cdot \tilde{y} = \|\tilde{x}\| \|\tilde{y}\| \cos(\theta)$.
The unit vectors satisfy $\mathbf{1}_i \cdot \mathbf{1}_j = 1$ if $i = j$ and $\mathbf{1}_i \cdot \mathbf{1}_j = 0$ if $i \ne j$.
Another way to write the dot product is $\tilde{x} \cdot \tilde{y} = \left(\sum_{i=1}^{n} \tilde{x}_i \mathbf{1}_i\right) \cdot \left(\sum_{j=1}^{n} \tilde{y}_j \mathbf{1}_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{x}_i \tilde{y}_j \, (\mathbf{1}_i \cdot \mathbf{1}_j) = \sum_{i=1}^{n} \tilde{x}_i \tilde{y}_i$.
The lengths of the vectors are $\|\tilde{x}\| = \sqrt{\sum_{i=1}^{n} \tilde{x}_i^2}$ and $\|\tilde{y}\| = \sqrt{\sum_{i=1}^{n} \tilde{y}_i^2}$.
This shows that $Corr(X, Y) = \dfrac{\tilde{x} \cdot \tilde{y}}{\|\tilde{x}\| \|\tilde{y}\|} = \cos(\theta)$, and since $\cos(\theta)$ always lies in $[-1, 1]$, so does $Corr(X, Y)$.
The result also indicates precisely how correlation is a normalization of covariance.
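As a numeric sanity check (a sketch added here, not part of the original handout), the five-point counterexample where $Cov(X, Y) = 0$ but $E(Y \mid X) = X^2 \ne E(Y)$ can be verified in a few lines of Python, along with the variance decomposition identity on the same distribution:

```python
from collections import defaultdict
from fractions import Fraction

# The counterexample from the notes: five equally likely outcomes with
# Y = X^2, so Cov(X, Y) = 0 yet E(Y | X) is not the constant E(Y).
outcomes = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]
p = Fraction(1, len(outcomes))  # each outcome has probability 1/5

E_X = sum(p * x for x, _ in outcomes)       # expect 0
E_Y = sum(p * y for _, y in outcomes)       # expect (4+1+0+1+4)/5 = 2
E_XY = sum(p * x * y for x, y in outcomes)  # expect (-8-1+0+1+8)/5 = 0
cov_XY = E_XY - E_X * E_Y                   # statement (3): Cov(X, Y) = 0

# E(Y | X = x) = x^2 here (Y is a deterministic function of X), so the
# conditional mean varies with x: statement (2) fails even though (3) holds.
cond_mean_Y = {x: y for x, y in outcomes}

# Variance decomposition: Var(X) = E_Y[Var(X | Y)] + Var_Y[E(X | Y)].
Var_X = sum(p * x**2 for x, _ in outcomes) - E_X**2

groups = defaultdict(list)  # group the X values by the realized Y
for x, y in outcomes:
    groups[y].append(x)

within = Fraction(0)   # E_Y[Var(X | Y)]
between = Fraction(0)  # Var_Y[E(X | Y)]
for y, xs in groups.items():
    p_y = Fraction(len(xs), len(outcomes))               # Pr(Y = y)
    m = Fraction(sum(xs), len(xs))                       # E(X | Y = y)
    v = Fraction(sum(x**2 for x in xs), len(xs)) - m**2  # Var(X | Y = y)
    within += p_y * v
    between += p_y * (m - E_X) ** 2

print(E_X, E_Y, cov_XY)        # 0 2 0
print(cond_mean_Y)             # {-2: 4, -1: 1, 0: 0, 1: 1, 2: 4}
print(Var_X, within + between) # 2 2
```

Using `Fraction` keeps every quantity exact, so the identities hold with equality rather than up to floating-point error. The two sides of the variance decomposition both come out to 2: all the variance in $X$ is "within" groups, because $E(X \mid Y = y) = 0$ for every $y$.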