251v2outl 4/19/2006 (Open this document in 'Outline' view!)

K. Two Random Variables.

1. Regression (Summary).

2. Covariance ($\sigma_{xy}$ and $s_{xy}$)

a. Population Covariance

The population covariance is defined, using probability, as

$$\mathrm{Cov}(x, y) = \sigma_{xy} = E[(x - \mu_x)(y - \mu_y)] = E(xy) - \mu_x \mu_y .$$

This can be used to describe the relationship between $x$ and $y$. If the covariance is positive we can say that $x$ and $y$ tend to move together, while if it is negative we can say that they tend to move in opposite directions. In order to use this formula we must realize that $E(xy) = \sum\sum xy\,P(x, y)$. This means that for each possible pair of values of $x$ and $y$ we multiply $x$ by $y$ and by their joint probability, and then add the results together. For example, assume that $x$ and $y$ are related by the following joint probability table:

                  x
      y      400    600    800
     400     .12    .15    .18
     600     .10    .05    .08
     800     .16    .07    .09

We begin by taking the upper left hand probability, .12, which is the probability that both $x$ and $y$ are 400, and multiplying it by 400 twice. Then we take the next probability in the same row, .15, which is the probability that $x$ is 600 and $y$ is 400, and multiply it by both 600 and 400. If we continue in this way we get

$$\begin{aligned}
E(xy) = \sum\sum xy\,P(x, y) &= .12(400)(400) + .15(600)(400) + .18(800)(400) \\
&\quad + .10(400)(600) + .05(600)(600) + .08(800)(600) \\
&\quad + .16(400)(800) + .07(600)(800) + .09(800)(800) \\
&= 19200 + 36000 + 57600 \\
&\quad + 24000 + 18000 + 38400 \\
&\quad + 51200 + 33600 + 57600 = 335600 .
\end{aligned}$$

We can now use the following tableau to compute the means and variances of $x$ and $y$.

      y \ x        400     600      800      P(y)    yP(y)   y^2 P(y)
      400          .12     .15      .18       .45      180      72000
      600          .10     .05      .08       .23      138      82800
      800          .16     .07      .09       .32      256     204800
      P(x)         .38     .27      .35      1.00      574     359600
      xP(x)        152     162      280       594
      x^2 P(x)   60800   97200   224000    382000

To summarize, $\sum P(x) = 1$ (a check), $\mu_x = E(x) = \sum xP(x) = 594$, $E(x^2) = \sum x^2 P(x) = 382000$, $\sum P(y) = 1$, $\mu_y = E(y) = \sum yP(y) = 574$ and $E(y^2) = \sum y^2 P(y) = 359600$. We will need the variances below. To complete what we have done, write

$$\sigma_{xy} = \mathrm{Cov}(x, y) = E(xy) - \mu_x \mu_y = 335600 - (594)(574) = -5356 .$$

The negative sign indicates that $x$ and $y$ tend to move in opposite directions.

b. The Sample Covariance

The sample covariance is much easier to compute, the formula being

$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} = \frac{\sum xy - n\bar{x}\bar{y}}{n - 1} .$$

For example, assume that we have data on income ($x$) and savings ($y$), both in thousands, for 5 families.

      Family       x       y       x^2      y^2       xy
        1         1.9     0.0      3.61     0.00     0.00
        2        12.4     0.9    153.76     0.81    11.16
        3         6.4     0.4     40.96     0.16     2.56
        4         7.0     1.2     49.00     1.44     8.40
        5         7.0     0.3     49.00     0.09     2.10
       Sum       34.7     2.8    296.33     2.50    24.22

So $\sum x = 34.7$, $\sum y = 2.8$, $\sum x^2 = 296.33$, $\sum y^2 = 2.50$ and $\sum xy = 24.22$. Then $\bar{x} = \frac{34.7}{5} = 6.94$ and $\bar{y} = \frac{2.8}{5} = 0.56$. The sample variances are

$$s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{296.33 - 5(6.94)^2}{4} = 13.878, \qquad s_y^2 = \frac{\sum y^2 - n\bar{y}^2}{n - 1} = \frac{2.50 - 5(0.56)^2}{4} = 0.2330,$$

and since $\sum xy = 24.22$,

$$s_{xy} = \frac{24.22 - 5(6.94)(0.56)}{5 - 1} = 1.197 .$$

The positive sign of $s_{xy}$, the sample covariance, indicates that $x$ and $y$ tend to move together.

3. The Correlation Coefficient ($\rho_{xy}$ and $r_{xy}$)

The size of a covariance is relatively meaningless; to judge the strength of the relationship between $x$ and $y$ we need to compute the correlation, which is found by dividing the covariance by the standard deviations of $x$ and $y$.

a. Population Correlation.

For the population example, recall from above that $\sigma_x^2 = E(x^2) - \mu_x^2 = 382000 - 594^2 = 29164$ and $\sigma_y^2 = E(y^2) - \mu_y^2 = 359600 - 574^2 = 30124$, so that

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{-5356}{\sqrt{29164}\,\sqrt{30124}} = \frac{-5356}{(170.77)(173.56)} = -0.181 .$$

The correlation must always be between positive 1 and negative 1 ($-1.0 \le \rho_{xy} \le 1.0$). A correlation close to zero is called weak. A correlation that is close to one in absolute value is called strong. (Actually statisticians prefer to look at the value of the correlation squared.) A strong positive correlation indicates that $x$ and $y$ have a relationship that is close to a straight line with a positive slope. A strong negative correlation means that the relationship approximates a straight line with a negative slope. Unfortunately, the correlation only indicates linear relationships; a nonlinear relationship that is obvious on a graph may give a zero correlation.

b. Sample Correlation.

Recall that $s_{xy} = 1.197$, $s_x^2 = 13.878$, and $s_y^2 = 0.2330$. If we divide the covariance by the two standard deviations, we find that

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{1.197}{\sqrt{13.878}\,\sqrt{0.2330}} = 0.6657 .$$
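The covariance and correlation calculations in sections 2 and 3 can be checked with a short script. What follows is only a minimal sketch in Python: the names `population_moments` and `sample_cov` are illustrative choices and are not part of the original notes. It uses the joint probability table and the income and savings data given above.

```python
import math

# Joint probability table from the population example.
# Rows are y = 400, 600, 800; columns are x = 400, 600, 800.
X_VALUES = [400, 600, 800]
Y_VALUES = [400, 600, 800]
JOINT = [
    [0.12, 0.15, 0.18],
    [0.10, 0.05, 0.08],
    [0.16, 0.07, 0.09],
]


def population_moments(x_values, y_values, joint):
    """Return (mu_x, mu_y, var_x, var_y, cov_xy) from a joint probability table."""
    e_x = e_y = e_x2 = e_y2 = e_xy = 0.0
    for y, row in zip(y_values, joint):
        for x, p in zip(x_values, row):
            e_x += x * p
            e_y += y * p
            e_x2 += x * x * p
            e_y2 += y * y * p
            e_xy += x * y * p
    return e_x, e_y, e_x2 - e_x**2, e_y2 - e_y**2, e_xy - e_x * e_y


def sample_cov(xs, ys):
    """Sample covariance via (sum of xy - n * xbar * ybar) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (n - 1)


# Population example: covariance and correlation from the joint table.
mu_x, mu_y, var_x, var_y, cov_xy = population_moments(X_VALUES, Y_VALUES, JOINT)
rho_xy = cov_xy / math.sqrt(var_x * var_y)

# Sample example: income (x) and savings (y), in thousands, for 5 families.
income = [1.9, 12.4, 6.4, 7.0, 7.0]
savings = [0.0, 0.9, 0.4, 1.2, 0.3]
s_xy = sample_cov(income, savings)
s_x2 = sample_cov(income, income)    # sample variance = covariance of x with itself
s_y2 = sample_cov(savings, savings)
r_xy = s_xy / math.sqrt(s_x2 * s_y2)

print(f"population: cov = {cov_xy:.0f}, rho = {rho_xy:.3f}")
print(f"sample:     s_xy = {s_xy:.3f}, r = {r_xy:.4f}")
```

Computing a sample variance as the covariance of a variable with itself is a convenience of this sketch; it gives the same result as the formula $\frac{\sum x^2 - n\bar{x}^2}{n - 1}$.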
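4. Functions of Two Random Variables.

$\mathrm{Cov}(ax + b, cy + d) = ac\,\mathrm{Cov}(x, y)$, and if $w = ax + b$ and $v = cy + d$, then $\rho_{wv} = \mathrm{sign}(ac)\,\rho_{xy}$, or $\mathrm{Corr}(ax + b, cy + d) = \mathrm{sign}(ac)\,\mathrm{Corr}(x, y)$, where $\mathrm{sign}(ac)$ has the value $+1$ or $-1$ depending on whether the product of $a$ and $c$ is positive or negative.

5. Sums of Random Variables.

a. $E(x + y) = E(x) + E(y)$ and $\mathrm{Var}(x + y) = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy} = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x, y)$.

b. Independence.

(i) Definition. $P(x, y) = P(x)P(y)$.

(ii) Consequences. If $x$ and $y$ are independent, $E(xy) = E(x)E(y)$, $\mathrm{Cov}(x, y) = 0$, $\rho_{xy} = 0$ and $\mathrm{Var}(x + y) = \mathrm{Var}(x) + \mathrm{Var}(y)$.

c. If $a$, $c$ and $d$ are constants, $\mathrm{Var}(ax + cy) = a^2\mathrm{Var}(x) + c^2\mathrm{Var}(y) + 2ac\,\mathrm{Cov}(x, y)$. This and a. imply that $E(ax + cy + d) = aE(x) + cE(y) + d$ and $\mathrm{Var}(ax + cy + d) = a^2\mathrm{Var}(x) + c^2\mathrm{Var}(y) + 2ac\,\mathrm{Cov}(x, y)$.

d. Application to portfolio analysis. Most of this is from the document 251var2 in the supplement.

If $R = P_1 R_1 + P_2 R_2$ and $P_1 + P_2 = 1$, then $E(R) = P_1 E(R_1) + P_2 E(R_2)$ and $\mathrm{Var}(R) = P_1^2\mathrm{Var}(R_1) + P_2^2\mathrm{Var}(R_2) + 2P_1 P_2\,\mathrm{Cov}(R_1, R_2)$ is the variance of the return. Thus if $P_1$ and $P_2$ are both $.50$, we can say $\mathrm{Var}(R) = .25\,\mathrm{Var}(R_1) + .25\,\mathrm{Var}(R_2) + .50\,\mathrm{Cov}(R_1, R_2)$.

For example, assume that $\sigma_{R_1} = 0.20$ and $\sigma_{R_2} = 0.30$, but $\rho_{R_1R_2}$ is unknown. Then $\mathrm{Cov}(R_1, R_2) = \rho_{R_1R_2}\sigma_{R_1}\sigma_{R_2} = \rho_{R_1R_2}(.20)(.30) = .06\,\rho_{R_1R_2}$. If we use the formula for $\mathrm{Var}(R)$ immediately above,

$$\mathrm{Var}(R) = .25(.20)^2 + .25(.30)^2 + .50(.06)\rho_{R_1R_2} = .0100 + .0225 + .0300\,\rho_{R_1R_2} = .0325 + .0300\,\rho_{R_1R_2} .$$

Now we can see the effect various values of $\rho_{R_1R_2}$ will have on $\mathrm{Var}(R)$ and $\sigma_R = \sqrt{\mathrm{Var}(R)}$.

If $\rho_{R_1R_2} = 1$, $\mathrm{Var}(R) = .0325 + .0300(1) = .0625$ and $\sigma_R = .2500$.
If $\rho_{R_1R_2} = 0$, $\mathrm{Var}(R) = .0325 + .0300(0) = .0325$ and $\sigma_R = .1803$.
If $\rho_{R_1R_2} = -1$, $\mathrm{Var}(R) = .0325 + .0300(-1) = .0025$ and $\sigma_R = .0500$.

The purpose of this section is to show how to find the minimum value for $\mathrm{Var}(R)$. Since variance is a measure of risk, minimizing variance minimizes risk, though actually the best measure of risk is probably the coefficient of variation, the standard deviation divided by the mean, in this case $C = \frac{\sigma_R}{E(R)}$.

Remember that $\mathrm{Var}(R) = \sigma_R^2 = P_1^2\mathrm{Var}(R_1) + P_2^2\mathrm{Var}(R_2) + 2P_1 P_2\,\mathrm{Cov}(R_1, R_2)$. Also recall that, since $P_1$ and $P_2$ are shares of \$1.00, $P_1 + P_2 = 1$, so that $P_2 = 1 - P_1$. Remember too that $\mathrm{Cov}(R_1, R_2) = \rho_{R_1R_2}\sigma_{R_1}\sigma_{R_2}$.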
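The equal-share example above, with standard deviations 0.20 and 0.30, can be reproduced numerically. The following is a minimal sketch in Python; the function name `portfolio_variance` and its parameter names are illustrative and do not come from the notes.

```python
import math


def portfolio_variance(p1, sd1, sd2, rho):
    """Var(R) for R = P1*R1 + P2*R2 with P2 = 1 - P1.

    Uses Cov(R1, R2) = rho * sd1 * sd2, as in the text.
    """
    p2 = 1.0 - p1
    cov = rho * sd1 * sd2
    return p1**2 * sd1**2 + p2**2 * sd2**2 + 2 * p1 * p2 * cov


# Equal shares, standard deviations 0.20 and 0.30, three correlation values.
for rho in (1.0, 0.0, -1.0):
    var_r = portfolio_variance(0.5, 0.20, 0.30, rho)
    print(f"rho = {rho:+.0f}: Var(R) = {var_r:.4f}, sd(R) = {math.sqrt(var_r):.4f}")
```

The loop covers the same three correlation values as the worked example, so its printed variances and standard deviations can be compared with the figures in the text.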
If we put all this together,

$$\mathrm{Var}(R) = P_1^2\mathrm{Var}(R_1) + (1 - P_1)^2\mathrm{Var}(R_2) + 2P_1(1 - P_1)\rho_{R_1R_2}\sigma_{R_1}\sigma_{R_2} .$$

Now let us assume some values for the standard deviations and the correlation. Let $\sigma_{R_1} = 0.4$ so $\mathrm{Var}(R_1) = 0.16$, $\sigma_{R_2} = 0.3$ so $\mathrm{Var}(R_2) = 0.09$, and $\mathrm{Corr}(R_1, R_2) = \rho_{R_1R_2} = 0.5$. Then

$$\begin{aligned}
\mathrm{Var}(R) &= P_1^2(0.16) + (1 - P_1)^2(0.09) + 2P_1(1 - P_1)(0.5)(0.4)(0.3) \\
&= 0.16P_1^2 + 0.09(1 - P_1)^2 + 2(0.06)P_1(1 - P_1) \\
&= 0.16P_1^2 + 0.09(1 - 2P_1 + P_1^2) + 0.12(P_1 - P_1^2) \\
&= 0.16P_1^2 + 0.09 - 0.18P_1 + 0.09P_1^2 + 0.12P_1 - 0.12P_1^2 .
\end{aligned}$$

If we collect terms in $P_1$ and $P_1^2$, we get $\mathrm{Var}(R) = (0.16 + 0.09 - 0.12)P_1^2 + (-0.18 + 0.12)P_1 + 0.09$, or

$$\mathrm{Var}(R) = 0.13P_1^2 - 0.06P_1 + 0.09 .$$

In order to minimize risk we pick the value of $P_1$ that gives us the minimum variance. If we know calculus, the way that we find this minimum variance is by taking the first derivative of $\mathrm{Var}(R)$ with respect to $P_1$ and setting it equal to zero. Since $\frac{d\,\mathrm{Var}(R)}{dP_1} = 0.26P_1 - 0.06$, if we set the derivative equal to zero we get $0.26P_1 - 0.06 = 0$, which implies that $P_1 = \frac{0.06}{0.26} = 0.2308$. Now since $P_1 + P_2 = 1$, we set $P_2 = 1 - P_1 = 0.7692$. That is, to minimize risk, we put about 23% of our money in stock 1 and 77% in stock 2.

If we do not know calculus, we can still minimize $\mathrm{Var}(R) = 0.13P_1^2 - 0.06P_1 + 0.09$. Try values of $P_1$ at intervals of 0.1 between zero and one. We will find that the smallest values of $\mathrm{Var}(R)$ occur at $P_1 = 0.2$ and $P_1 = 0.3$. Now we can try values of $P_1$ at intervals of 0.01 between 0.2 and 0.3. We will find that the smallest value of $\mathrm{Var}(R)$ occurs at $P_1 = 0.23$.
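Both ways of minimizing the variance described above can be sketched in a few lines of Python. This is only an illustration under the example's assumptions (standard deviations 0.4 and 0.3, correlation 0.5); the names `variance_coefficients` and `var_r` are hypothetical helpers, not part of the notes.

```python
def variance_coefficients(sd1, sd2, rho):
    """Coefficients (a, b, c) of Var(R) = a*P1**2 + b*P1 + c, with P2 = 1 - P1."""
    var1, var2 = sd1**2, sd2**2
    cov = rho * sd1 * sd2
    a = var1 + var2 - 2 * cov   # 0.16 + 0.09 - 0.12 in the example
    b = 2 * cov - 2 * var2      # 0.12 - 0.18 in the example
    c = var2                    # 0.09 in the example
    return a, b, c


a, b, c = variance_coefficients(0.4, 0.3, 0.5)


def var_r(p1):
    """Portfolio variance as a quadratic in the share P1."""
    return a * p1**2 + b * p1 + c


# Calculus route: the derivative 2*a*P1 + b is zero at P1 = -b / (2*a).
p1_star = -b / (2 * a)
p2_star = 1 - p1_star

# Trial-and-error route: a coarse grid with step 0.1, then a fine grid with step 0.01.
coarse_two_smallest = sorted((i / 10 for i in range(11)), key=var_r)[:2]
fine_best = min((i / 100 for i in range(20, 31)), key=var_r)

print(f"calculus:    P1 = {p1_star:.4f}, P2 = {p2_star:.4f}")
print(f"coarse grid: two smallest variances at P1 = {coarse_two_smallest}")
print(f"fine grid:   smallest variance at P1 = {fine_best}")
```

The grids are built from integers divided by 10 or 100 rather than by repeatedly adding a step, which avoids accumulated floating-point drift in the trial values.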