Notes 21 - Wharton Statistics Department

Statistics 510: Notes 21
Reading: Sections 7.4-7.5
Schedule: I will e-mail homework 9 by Friday. It will not
be due in two weeks (that would be the Friday after
Thanksgiving).
I. Covariance, Variance of Sums and Correlations (Chapter
7.4)
The covariance between two random variables is a measure
of how they are related.
The covariance between X and Y, denoted by Cov(X, Y), is
defined by
Cov(X, Y) = E[(X - E[X])(Y - E[Y])].
Interpretation of Covariance: When Cov(X, Y) > 0, higher
than expected values of X tend to occur together with
higher than expected values of Y.
When Cov(X, Y) < 0, higher than expected values of X
tend to occur together with lower than expected values of
Y.
Example 1: From the Excel file stockdata.xls, we find that
the covariances of the monthly log stock returns, in
percentages, of Merck & Company, Johnson & Johnson,
General Electric, General Motors and Ford Motor
Company from January 1990 to December 1999 are

Pair           Covariance
Merck, J&J     36.17205
Merck, GE      26.85792
Merck, GM      15.92461
Merck, Ford    16.19535
J&J, GE        48.45063
J&J, GM        37.28136
J&J, Ford      40.37623
GE, GM         27.72861
GE, Ford       26.63151
GM, Ford       84.76252
Correlation: The magnitude of the covariance depends on
the variance of X and the variance of Y. A dimensionless
measure of the relationship between X and Y is the
correlation ρ(X, Y):
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
The correlation is always between -1 and 1. If X and Y are
independent, then ρ(X, Y) = 0, but the converse is not true.
Generally, the correlation is a measure of the degree of
linear dependence between X and Y.
Note that for a > 0, b > 0,
ρ(aX, bY) = Cov(aX, bY) / √(Var(aX) Var(bY))
= ab Cov(X, Y) / (a√Var(X) · b√Var(Y))
= ρ(X, Y)
(this is what is meant by saying that the correlation is
dimensionless – if X and Y are measured in certain units,
and the units are changed so that X becomes aX and Y
becomes bY, the correlation is not changed).
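As a quick numerical illustration of this scale invariance, here is a minimal sketch (assuming Python with numpy; the simulated returns are arbitrary, not the stockdata.xls series) that estimates a correlation before and after a change of units:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)  # y constructed to be positively related to x

def corr(u, v):
    # sample version of Cov(u, v) / sqrt(Var(u) Var(v))
    return np.cov(u, v)[0, 1] / np.sqrt(np.var(u, ddof=1) * np.var(v, ddof=1))

print(corr(x, y))           # a value strictly between -1 and 1
print(corr(3 * x, 10 * y))  # identical value: correlation is dimensionless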
Example 1 continued: From the Excel file stockdata.xls, we
find that the correlations of the monthly log stock returns,
in percentages, of Merck & Company, Johnson & Johnson,
General Electric, General Motors and Ford Motor
Company from January 1990 to December 1999 are

Pair           Correlation
Merck, J&J     0.419128
Merck, GE      0.296728
Merck, GM      0.182804
Merck, Ford    0.182194
J&J, GE        0.449641
J&J, GM        0.359493
J&J, Ford      0.381549
GE, GM         0.254941
GE, Ford       0.239957
GM, Ford       0.793547
Properties of covariance:
By expanding the right side of the definition of the
covariance, we see that
Cov(X, Y) = E[XY - E[X]Y - XE[Y] + E[X]E[Y]]
= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y]
= E[XY] - E[X]E[Y]
Note that if X and Y are independent, then
E[XY] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f(x, y) dx dy
= ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f_X(x) f_Y(y) dx dy
= ∫_{-∞}^{∞} x f_X(x) ( ∫_{-∞}^{∞} y f_Y(y) dy ) dx
= ∫_{-∞}^{∞} x f_X(x) E[Y] dx
= E[X] E[Y]
Thus, if X and Y are independent,
Cov(X, Y) = E[XY] - E[X]E[Y] = 0.    (1.1)
The converse of (1.1) is not true. Consider the sample
space S = {(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)} with each point
having equal probability. Define the random variable X to
be the first component of the sample point chosen, and Y
the second. Therefore, X(-2, 4) = -2, Y(-2, 4) = 4, and so
on. X and Y are dependent, yet their covariance is 0. The
former is true because
1/5 = P(X = 1, Y = 1) ≠ P(X = 1) · P(Y = 1) = (1/5)(2/5) = 2/25.
To verify the latter, note that
E(XY) = (1/5)[(-8) + (-1) + 0 + 1 + 8] = 0,
E(X) = (1/5)[(-2) + (-1) + 0 + 1 + 2] = 0,
and E(Y) = (1/5)[4 + 1 + 0 + 1 + 4] = 2.
Thus,
Cov(X, Y) = E(XY) - E(X)E(Y) = 0 - 0 · 2 = 0.
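This counterexample is easy to check numerically; a minimal sketch (again assuming numpy):

import numpy as np

# the five equally likely sample points (x, y); note that y = x**2 throughout
pts = np.array([(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)])
x, y = pts[:, 0], pts[:, 1]

# exact moments under the uniform distribution on the five points
e_xy = np.mean(x * y)             # E(XY) = 0
cov = e_xy - x.mean() * y.mean()  # E(XY) - E(X)E(Y)
print(cov)  # 0.0, even though Y is completely determined by X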
Proposition 4.2 lists some properties of covariance.
Proposition 4.2:
(i) Cov(X, Y) = Cov(Y, X)
(ii) Cov(X, X) = Var(X)
(iii) Cov(aX, Y) = a Cov(X, Y)
(iv) Cov( Σ_{i=1}^n X_i , Σ_{j=1}^m Y_j ) = Σ_{i=1}^n Σ_{j=1}^m Cov(X_i, Y_j).
Combining properties (ii) and (iv), we have that
Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j)    (1.2)
If X_1, ..., X_n are pairwise independent (meaning that X_i and
X_j are independent for i ≠ j), then Equation (1.2) reduces
to
Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i).
Example 2: Let X be a hypergeometric
(n, m, N - m) random variable – i.e., X is the number of
white balls drawn in n random draws from an urn without
replacement that originally consists of m white balls and
N - m black balls. Find the variance of X.
Let I_i = 1 if the ith ball selected is white, and I_i = 0 if the
ith ball selected is black.
Then X = I_1 + ... + I_n and
Var(X) = Var(I_1 + ... + I_n)
= Σ_{i=1}^n Var(I_i) + 2 Σ_{i<j} Cov(I_i, I_j).
To calculate E(I_i) and Var(I_i), note that I_i² = I_i, so that
E(I_i) = m/N, Var(I_i) = E(I_i²) - [E(I_i)]² = m/N - m²/N².
To calculate Cov(I_i, I_j), we use the formula
Cov(I_i, I_j) = E(I_i I_j) - E(I_i) E(I_j). Note that I_i I_j = 1 if
both the ith and jth balls are white, and 0 otherwise. Thus,
E(I_i I_j) = P(ith and jth balls are white). By considering the
sequence of experiments in the order look at the ith ball,
look at the jth ball, then look at the 1st ball, look at the 2nd
ball, ..., look at the (i-1)th ball, look at the (i+1)th ball, ...,
look at the (j-1)th ball, look at the (j+1)th ball, ..., look at
the nth ball, we see that
P(ith and jth balls are white)
= [m(m-1)(N-2)(N-3)···(N-n+1)] / [N(N-1)(N-2)···(N-n+1)]
= m(m-1) / (N(N-1)).
Thus,
E(I_i I_j) = m(m-1) / (N(N-1))
and
Cov(I_i, I_j) = E(I_i I_j) - E(I_i) E(I_j)
= m(m-1)/(N(N-1)) - (m/N)(m/N)
= (m/N)[(m-1)/(N-1) - m/N]
= (m/N) · (m-N)/((N-1)N)
and
Var(X) = Var(I_1 + ... + I_n)
= Σ_{i=1}^n Var(I_i) + 2 Σ_{i<j} Cov(I_i, I_j)
= n[m/N - m²/N²] + n(n-1) · (m/N) · (m-N)/((N-1)N)
= n (m/N)(1 - m/N)[1 - (n-1)/(N-1)].
Note that the variance of a binomial random variable with n
trials and probability p = m/N of success for each trial is
n (m/N)(1 - m/N), so the variance for the hypergeometric is
smaller by a factor of [1 - (n-1)/(N-1)]; this is due to the
negative covariance between I_i and I_j for the
hypergeometric.
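A simulation sketch (assuming numpy; the urn parameters are arbitrary) comparing the empirical variance of hypergeometric draws with this formula and with the corresponding binomial variance:

import numpy as np

rng = np.random.default_rng(2)
N, m, n = 50, 20, 10  # urn size, white balls, number of draws

# number of white balls in n draws without replacement, many replications
draws = rng.hypergeometric(m, N - m, n, size=200_000)

p = m / N
hyper_var = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
binom_var = n * p * (1 - p)
print(draws.var(), hyper_var, binom_var)  # empirical ≈ formula < binomial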
II. Conditional Expectation (Chapter 7.5)
Recall that if X and Y are jointly discrete random variables,
the conditional probability mass function of X, given that
Y = y, is defined for all y such that P(Y = y) > 0 by
p_{X|Y}(x | y) = P(X = x | Y = y) = p(x, y) / p_Y(y).
It is natural to define, in this case, the conditional
expectation of X, given that Y = y, for all values of y such
that p_Y(y) > 0, by
E[X | Y = y] = Σ_x x P{X = x | Y = y} = Σ_x x p_{X|Y}(x | y).
The conditional expectation of X, given that Y = y,
represents the long run mean value of X in many
independent repetitions of experiments in which Y = y.
For continuous random variables, the conditional
expectation of X, given that Y = y, is defined by
E[X | Y = y] = ∫_{-∞}^{∞} x f_{X|Y}(x | y) dx,
provided that f_Y(y) > 0.
Example 3: Suppose that events occur according to a
Poisson process with rate λ. Let N be the number of
events occurring in the time period [0,1]. For p < 1, let X
be the number of events occurring in the time period [0, p].
Find the conditional probability mass function and the
conditional expectation of X given that N = n.
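A worked sketch of the standard solution (the original notes leave the work for these pages): for k ≤ n,

P(X = k | N = n) = P(X = k, N - X = n - k) / P(N = n),

and since the counts in [0, p] and (p, 1] are independent Poisson random variables with means λp and λ(1 - p),

P(X = k | N = n) = [e^{-λp}(λp)^k / k!] · [e^{-λ(1-p)}(λ(1-p))^{n-k} / (n-k)!] / [e^{-λ} λ^n / n!]
= (n choose k) p^k (1 - p)^{n-k}.

That is, given N = n, X is binomial(n, p), so E[X | N = n] = np; notably the answer does not depend on λ.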
III. Computing Expectations and Probabilities by
Conditioning (Section 7.5.2-7.5.3)
Let us denote by E[X | Y] that function of the random
variable Y whose value at Y = y is E[X | Y = y]. Note that
E[X | Y] is itself a random variable. An important property
of conditional expectations is the following proposition:
Proposition 7.5.1:
E[X] = E[E[X | Y]]    (1.3)
If Y is a discrete random variable, then equation (1.3) states
that
E[X] = Σ_y E[X | Y = y] P{Y = y}    (1.4)
If Y is a continuous random variable, then equation (1.3)
states that
E[X] = ∫_{-∞}^{∞} E[X | Y = y] f_Y(y) dy.
One way to understand equation (1.4) is to interpret it as
follows: To calculate E[X], we may take a weighted
average of the conditional expected value of X, given that
Y = y, each of the terms E[X | Y = y] being weighted by
the probability of the event on which it is conditioned.
Equation (1.4) is a "law of total expectation" that is
analogous to the law of total probability (Section 3.3, notes
6).
Example 4: A miner is trapped in a mine containing 3
doors. The first door leads to a tunnel that will take him to
safety after 3 hours of travel. The second door leads to a
tunnel that will return him to the mine after 5 hours of
travel. The third door leads to a tunnel that will return him
to the mine after 7 hours. If we assume that the miner is at
all times equally likely to choose any one of the doors,
what is the expected length of time until he reaches safety?
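A worked sketch of the classic conditioning argument (the solution is left blank in the original notes): let X be the time to reach safety and let Y be the door chosen first. Doors 2 and 3 return the miner to his starting point, where the problem begins afresh, so by Equation (1.4),

E[X] = (1/3)(3) + (1/3)(5 + E[X]) + (1/3)(7 + E[X]) = 5 + (2/3)E[X].

Solving gives E[X] = 15 hours.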
Example 5: A random rectangle is formed in the following
way: The base, X, is a uniform [0,1] random variable and,
after having generated the base, the height is chosen to be
uniform on [0, X]. Find the expected area of the rectangle.
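A worked sketch by conditioning on the base (the solution is left blank in the original notes): the area is A = XY, where Y | X = x is uniform on [0, x], so E[A | X = x] = x E[Y | X = x] = x · x/2 = x²/2. By Proposition 7.5.1,

E[A] = E[E[A | X]] = E[X²/2] = (1/2) ∫_0^1 x² dx = 1/6.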
IV. Conditional Variance (Section 7.5.4)
The conditional variance of X, given that Y = y, is the
expected squared difference between the random variable X
and its conditional mean, conditioning on the event that
Y = y:
Var(X | Y = y) = E[(X - E[X | Y = y])² | Y = y].
There is a very useful formula for the variance of a random
variable X in terms of the conditional mean and conditional
variance of X | Y :
Proposition 7.5.2:
Var(X) = E[Var(X | Y)] + Var(E[X | Y]).
Proof:
By the same reasoning that yields
Var(X) = E[X²] - (E[X])², we have that
Var(X | Y) = E[X² | Y] - (E[X | Y])². Thus,
E[Var(X | Y)] = E[E[X² | Y]] - E[(E[X | Y])²]
= E[X²] - E[(E[X | Y])²]    (1.5)
Also, as E[E[X | Y]] = E[X], we have that
Var(E[X | Y]) = E[(E[X | Y])²] - (E[X])²    (1.6)
Hence, by adding Equations (1.5) and (1.6), we have that
Var(X) = E[Var(X | Y)] + Var(E[X | Y]).
Example 6: Suppose that by any time t, the number of
people that have arrived at a train depot is a Poisson
random variable with mean λt. If the initial train arrives at
the depot at a time (independent of when the passengers
arrive) that is uniformly distributed over (0, T), what is the
mean and variance of the number of passengers that enter
the train?
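A worked sketch using Propositions 7.5.1 and 7.5.2 (the solution is left blank in the original notes): let N(t) be the number of arrivals by time t and let Y be the train's arrival time, uniform on (0, T). Given Y = t, the number of passengers is Poisson with mean λt, so

E[N(Y) | Y] = λY and Var(N(Y) | Y) = λY.

Hence E[N(Y)] = E[λY] = λT/2, and

Var(N(Y)) = E[Var(N(Y) | Y)] + Var(E[N(Y) | Y]) = E[λY] + Var(λY)
= λT/2 + λ²T²/12.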