251var2 08/14/03
Roger Even Bove

FORMULAS FOR FUNCTIONS OF RANDOM VARIABLES

I. Basic Computational Formulas for Descriptive Statistics

Consider the following set of observations:

    Observation number     x     y
            1              7     3
            2             15     6
            3              2     9

You can easily verify that $\bar{x} = 8$ and $\bar{y} = 6$. The formula for the sample variance is

\[ s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} \]

so that

\[ s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{7^2 + 15^2 + 2^2 - 3(8)^2}{3-1} = \frac{278 - 192}{2} = 43 \]

and $s_x = \sqrt{43}$. Similarly,

\[ s_y^2 = \frac{\sum y^2 - n\bar{y}^2}{n-1} = \frac{3^2 + 6^2 + 9^2 - 3(6)^2}{3-1} = \frac{126 - 108}{2} = 9 \]

and $s_y = 3$.

The formula for the sample covariance is

\[ s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n-1} = \frac{1}{n-1}\left(\sum xy - n\bar{x}\bar{y}\right) \]

so that, for the numbers above,

\[ s_{xy} = \frac{1}{3-1}\left[(7)(3) + (15)(6) + (2)(9) - 3(8)(6)\right] = \frac{1}{2}(129 - 144) = -7.5 \]

The only thing that we can usually learn from a covariance is whether the variables $x$ and $y$ move together or in opposite directions. If the covariance is positive, the two variables tend to move together. If the covariance is negative, the two variables tend to move in opposite directions. To find out about the strength of the relationship we compute the correlation.

The correlation can only take values between $-1.0$ and $1.0$. The sign of the correlation means the same thing as the sign of the covariance. A correlation close to $1.0$ is referred to as a strong positive correlation. A correlation close to $-1.0$ is a strong negative correlation. If the correlation is $1.0$, $x$ and $y$ will tend to move proportionally, that is, when $x$ rises, $y$ will rise and when $x$ falls, $y$ will fall. When $x$ takes a big jump, $y$ will take a big jump. When $x$ takes a small jump, $y$ takes a small jump. If the correlation is $-1.0$ we have the same proportionality, but now if $x$ jumps, $y$ will jump in the opposite direction. If the correlation is zero or close to zero, it is weak, which means there is not much tendency of $y$ to do anything in particular if $x$ moves.

The formula for the sample correlation is

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

and we know that $s_{xy} = -7.5$, $s_x = \sqrt{43}$ and $s_y = 3$, so that, for the numbers above,

\[ r_{xy} = \frac{-7.5}{\sqrt{43}\,(3)} = -0.381 \]

The negative covariance tells us that $x$ and $y$ have a tendency to move in opposite directions. The negative correlation tells us the same thing, but the fact that it is closer to zero than to $-1$ leads us to feel that the correlation is weak. Actually, statisticians tend to measure strength on a zero to one scale by squaring the correlation. In this case $r_{xy}^2 = (-0.381)^2 = 0.145$, which appears quite weak, though far from nonexistent.

The sample covariance is regarded as an estimate of the true or population covariance, just as the sample correlation is regarded as an estimate of the population correlation. Formulas for computing these from a population all of whose points are known are not given here. The next section will deal with computing population covariances and correlations when probabilities are known.

II. FORMULAS FROM PROBABILITY

Let the following table describe the joint probabilities of $x$ and $y$:

                 x = 3    x = 6    x = 9    p(y)    y p(y)    y^2 p(y)
    y = 7         0.1      0.1      0.0     0.2      1.4        9.8
    y = 15        0.2      0.0      0.1     0.3      4.5       67.5
    y = 2         0.0      0.3      0.2     0.5      1.0        2.0
    sum = p(x)    0.3      0.4      0.3     1.0      6.9       79.3

The last two column totals are $E(y) = 6.9$ and $E(y^2) = 79.3$. The corresponding calculation for $x$ is:

    x             3        6        9       sum
    p(x)         0.3      0.4      0.3      1.0
    x p(x)       0.9      2.4      2.7      6.0  = E(x)
    x^2 p(x)     2.7     14.4     24.3     41.4  = E(x^2)

Note that $\mu_x = E(x) = \sum x\,p(x) = 6.0$ and similarly $\mu_y = E(y) = \sum y\,p(y) = 6.9$, and that

\[ \sigma_x^2 = E(x^2) - \mu_x^2 = 41.4 - 6^2 = 5.4 \quad\text{and}\quad \sigma_y^2 = E(y^2) - \mu_y^2 = 79.3 - 6.9^2 = 31.69 \]

This implies that $\sigma_x = \sqrt{5.4} = 2.32$ and that $\sigma_y = \sqrt{31.69} = 5.63$.

The formula for the covariance is

\[ \sigma_{xy} = E(xy) - \mu_x\mu_y = \sum xy\,p(x,y) - \mu_x\mu_y \]

We call this a population covariance, since the probabilities presumably refer to all values of $x$ and $y$. The most difficult part of this formula is the evaluation of the expected value of $xy$, $E(xy)$. The idea here is to multiply each possible pair of values of $x$ and $y$ by the joint probability of the pair. One way to do this is to take the joint probability table and to add the values of $x$ and $y$ to it. Notice that in the table below the probabilities (like 0.1) are in exactly the same place as in the joint probability table above and are followed by the corresponding $x$ and $y$.
\[
\begin{aligned}
E(xy) = \sum xy\,p(x,y) = {} & 0.1(3)(7) + 0.1(6)(7) + 0.0(9)(7) \\
& + 0.2(3)(15) + 0.0(6)(15) + 0.1(9)(15) \\
& + 0.0(3)(2) + 0.3(6)(2) + 0.2(9)(2) = 36.0
\end{aligned}
\]

So $\sigma_{xy} = E(xy) - \mu_x\mu_y = 36.0 - (6.0)(6.9) = -5.4$. Once again we find a negative covariance, indicating a tendency of $x$ and $y$ to move in opposite directions. To measure the strength of the relationship, we must compute the correlation. As with the sample correlation, the population correlation is computed by dividing the covariance by the standard deviations of $x$ and $y$. This time the formula for the correlation reads:

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} \]

From above, we know that $\sigma_{xy} = -5.4$, $\sigma_x = 2.32$ and $\sigma_y = 5.63$. Thus

\[ \rho_{xy} = \frac{-5.4}{(2.32)(5.63)} = -0.41 \]

As with the sample correlation, this can only take values between negative and positive one. Since, if we square $-0.41$, we get $0.17$, this too is a weak correlation.

In many situations, especially with population correlations, we are likely to need the covariance but know only the correlation. The formula for population correlation can be rewritten as $\sigma_{xy} = \rho_{xy}\sigma_x\sigma_y$. Thus, if we know $\rho_{xy} = -0.41$, $\sigma_x = 2.32$ and $\sigma_y = 5.63$, we can compute the covariance:

\[ \sigma_{xy} = -0.41(2.32)(5.63) \approx -5.4 \]

The corresponding formula for the sample covariance is $s_{xy} = r_{xy}s_x s_y$.

III. FUNCTIONS OF RANDOM VARIABLES

A. Functions of a Single Random Variable.

1. The Mean.

If we know the mean of the distribution of a random variable, we can easily find the mean of a linear function of the same random variable. For example, if we know the mean of $x$ we can find the mean of $5x + 7$. In the following let $a$ and $b$ be constants that either multiply $x$ or are added to $x$. Of course, $\mu_x = E(x) = \sum x\,p(x)$, but these formulas apply to $\bar{x}$, the sample mean, as well.

a) If $b$ is a constant, then $E(b) = b$. For example, $E(7) = 7$.

b) If $a$ is a constant, then $E(ax) = aE(x)$. For example, $E(5x) = 5E(x) = 5\mu_x$, so that if the mean of $x$ is $\mu_x = 3$, then the mean of $5x$ will be $E(5x) = 5E(x) = 5\mu_x = 5(3) = 15$.

c) If $b$ is a constant, then $E(x + b) = E(x) + b$.
For example, $E(x + 7) = E(x) + 7$, so that if the mean of $x$ is $\mu_x = 3$, then the mean of $x + 7$ will be $E(x + 7) = E(x) + 7 = \mu_x + 7 = 3 + 7 = 10$.

d) If $a$ and $b$ are both constants, then $E(ax + b) = aE(x) + b$. For example, $E(5x + 7) = 5E(x) + 7$, so that if the mean of $x$ is $\mu_x = 3$, then the mean of $5x + 7$ will be $E(5x + 7) = 5E(x) + 7 = 5\mu_x + 7 = 5(3) + 7 = 22$.

Note that rules a), b), and c) are really special cases of rule d). Rule a) is rule d) with $a$ set equal to zero. Rule b) is rule d) with $b$ set equal to zero. Rule c) is rule d) with $a$ set equal to one.

2. The Variance.

If we know the variance of a random variable, we can find the variance of a linear function of the same variable. For example, if we know the variance of $x$, we can find the variance of $5x + 7$. These formulas are stated in terms of the population variance,

\[ \sigma_x^2 = \mathrm{Var}(x) = E(x^2) - [E(x)]^2 = \sum x^2 p(x) - \mu_x^2, \]

but can also be used for the sample variance

\[ s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1}. \]

a) If $b$ is a constant, then $\mathrm{Var}(b) = 0$. For example, $\mathrm{Var}(7) = 0$. This makes perfect sense. A constant does not vary, so its variance is zero.

b) If $a$ is a constant, then $\mathrm{Var}(ax) = a^2\mathrm{Var}(x)$. For example, $\mathrm{Var}(5x) = 5^2\mathrm{Var}(x) = 25\,\mathrm{Var}(x)$, so that if the variance is $\mathrm{Var}(x) = \sigma_x^2 = 20$, then the variance of $5x$ will be $\mathrm{Var}(5x) = 25\,\mathrm{Var}(x) = 25(20) = 500$.

c) If $b$ is a constant, then $\mathrm{Var}(x + b) = \mathrm{Var}(x)$. For example, $\mathrm{Var}(x + 7) = \mathrm{Var}(x)$, so that if the variance of $x$ is $\sigma_x^2 = 20$, then the variance of $x + 7$ is $\mathrm{Var}(x + 7) = \mathrm{Var}(x) = \sigma_x^2 = 20$. Again this is something like common sense. Adding a constant to $x$ doesn't affect how much it varies, so it doesn't affect its variance.

d) If $a$ and $b$ are both constants, then $\mathrm{Var}(ax + b) = a^2\mathrm{Var}(x)$. For example, $\mathrm{Var}(5x + 7) = 5^2\mathrm{Var}(x)$, so that if the variance of $x$ is $\sigma_x^2 = 20$, then $\mathrm{Var}(5x + 7) = 5^2\mathrm{Var}(x) = 25(20) = 500$.

We can summarize this in the table below:

    If y =       b       ax            x + b        ax + b
    E(y) =       b       a E(x)        E(x) + b     a E(x) + b
    Var(y) =     0       a^2 Var(x)    Var(x)       a^2 Var(x)

B. The Mean and Variance of Sums of Random Variables.

There are two important rules about sums of random variables. The first one seems to be intuitively obvious, the second one much less so.

1. The Mean.
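If $x_1, x_2, x_3, \ldots, x_n$ are random variables, then

\[ E(x_1 + x_2 + x_3 + \cdots + x_n) = E(x_1) + E(x_2) + E(x_3) + \cdots + E(x_n). \]

For example, if $E(x_1) = 4$, $E(x_2) = 7$ and $E(x_3) = 9$, then $E(x_1 + x_2 + x_3) = 4 + 7 + 9 = 20$.

2. The Variance.

If $x_1, x_2, x_3, \ldots, x_n$ are independent random variables, then

\[ \mathrm{Var}(x_1 + x_2 + x_3 + \cdots + x_n) = \mathrm{Var}(x_1) + \mathrm{Var}(x_2) + \mathrm{Var}(x_3) + \cdots + \mathrm{Var}(x_n). \]

For example, if $\mathrm{Var}(x_1) = 4$, $\mathrm{Var}(x_2) = 9$ and $\mathrm{Var}(x_3) = 16$ and these variables are independent, then $\mathrm{Var}(x_1 + x_2 + x_3) = 4 + 9 + 16 = 29$. This means, of course, that $\sigma_{x_1 + x_2 + x_3} = \sqrt{29}$, and that you cannot add standard deviations: the individual standard deviations 2, 3 and 4 add up to 9, not to $\sqrt{29} \approx 5.39$.

C. Functions of Two Random Variables

Since these rules work for sample variances and covariances as well as population variances and covariances, we will use $\mathrm{Cov}(x, y)$ in place of $\sigma_{xy}$ or $s_{xy}$, and $\mathrm{Corr}(x, y)$ in place of $\rho_{xy}$ or $r_{xy}$. Let us assume that we have two variables, $x$ and $y$, with the following properties: $E(x) = 1.2$, $E(y) = 3.7$, $\mathrm{Var}(x) = 4.0$, $\mathrm{Var}(y) = 9.0$, $\mathrm{Cov}(x, y) = 3.0$, so that

\[ \mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)}\sqrt{\mathrm{Var}(y)}} = \frac{3.0}{\sqrt{4}\sqrt{9}} = \frac{3}{2(3)} = 0.5. \]

1. Linear Functions of Two Random Variables.

Let us introduce two new variables, $w$ and $v$, so that $w = ax + b$ and $v = cy + d$, where $a$, $b$, $c$, and $d$ are constants. From the earlier part of this section we know the following:

\[ \mu_w = E(w) = E(ax + b) = aE(x) + b, \qquad \sigma_w^2 = \mathrm{Var}(w) = a^2\mathrm{Var}(x) = a^2\sigma_x^2 \]
\[ \mu_v = E(v) = E(cy + d) = cE(y) + d, \qquad \sigma_v^2 = \mathrm{Var}(v) = c^2\mathrm{Var}(y) = c^2\sigma_y^2 \]

To this we now add a new rule:

\[ \mathrm{Cov}(w, v) = \sigma_{wv} = ac\,\mathrm{Cov}(x, y) = ac\,\sigma_{xy} \]

To find the correlation between $w$ and $v$, recall that $\rho_{wv} = \dfrac{\sigma_{wv}}{\sigma_w\sigma_v}$. But since $\sigma_{wv} = ac\,\sigma_{xy}$, $\sigma_w^2 = a^2\sigma_x^2$ and $\sigma_v^2 = c^2\sigma_y^2$, then

\[ \rho_{wv} = \frac{ac\,\sigma_{xy}}{\sqrt{a^2\sigma_x^2}\sqrt{c^2\sigma_y^2}} = \frac{ac\,\sigma_{xy}}{|a||c|\,\sigma_x\sigma_y} = \frac{ac}{|ac|}\,\rho_{xy} = \operatorname{sign}(ac)\,\rho_{xy}. \]

Note that, because the magnitude of $ac$ in the numerator cancels the $|ac|$ in the denominator, the only thing that $ac$ contributes to the result is its sign. If the product of $a$ and $c$ is negative we reverse the sign of $\rho_{xy}$. $\operatorname{sign}(ac)$ thus takes the values $1$ or $-1$.

For example, let $w = 5x + 1$ and $v = 3y + 2$ so that $a = 5$, $b = 1$, $c = 3$ and $d = 2$. Then $\sigma_{wv} = ac\,\sigma_{xy} = 5(3)\sigma_{xy}$. But we already know that $\sigma_{xy} = 3$, so that $\sigma_{wv} = 5(3)(3) = 45$. Now remember that $\sigma_x^2 = 4.0$ and $\sigma_y^2 = 9.0$, so that $\sigma_w^2 = a^2\sigma_x^2 = 5^2(4.0) = 100$ and $\sigma_v^2 = c^2\sigma_y^2 = 3^2(9.0) = 81$. Thus, the correlation between $w$ and $v$ can be found in two ways.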
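\[ \rho_{wv} = \frac{\sigma_{wv}}{\sigma_w\sigma_v} = \frac{45}{\sqrt{100}\sqrt{81}} = 0.5 \quad\text{or}\quad \rho_{wv} = \operatorname{sign}(ac)\,\rho_{xy} = (1)(0.5) = 0.5. \]

Again, remember that these rules hold for sample data too. That is, $s_w^2 = a^2 s_x^2$, $s_v^2 = c^2 s_y^2$, $s_{wv} = ac\,s_{xy}$, and $r_{wv} = \dfrac{s_{wv}}{s_w s_v} = \operatorname{sign}(ac)\,r_{xy}$.

2. Sums of Random Variables.

The question now is what happens if we add together two random variables. We learned in Section B above that you can add means. This means that if $u = x + y$, the mean of $u$ will be the sum of the mean of $x$ and the mean of $y$. Formally, $E(u) = E(x + y) = E(x) + E(y) = \mu_x + \mu_y$, or, since this also applies to samples, $\bar{u} = \bar{x} + \bar{y}$. For example, if $\mu_x = 1.2$ and $\mu_y = 3.7$, then $\mu_u = 1.2 + 3.7 = 4.9$.

But the situation with variances is not so simple. We learned in Section B that we can add variances only if the variables are independent. If the variables are independent, their covariance and correlation will both be zero. But if the covariance and correlation are not zero, we must take the value of the covariance into account. Often the covariance will not be available directly and we must compute it from the correlation as $\sigma_{xy} = \rho_{xy}\sigma_x\sigma_y$. Then, if $u = x + y$, we can use the formula:

\[ \sigma_u^2 = \mathrm{Var}(u) = \mathrm{Var}(x + y) = \mathrm{Var}(x) + 2\,\mathrm{Cov}(x, y) + \mathrm{Var}(y) = \sigma_x^2 + 2\sigma_{xy} + \sigma_y^2 \]

Or, if we are working with sample data: $s_u^2 = s_x^2 + 2s_{xy} + s_y^2$.

For example, let us assume that $\sigma_x^2 = 4$, $\sigma_y^2 = 9$ and $\rho_{xy} = 0.5$. Then $\sigma_x = 2$, $\sigma_y = 3$ and $\sigma_{xy} = \rho_{xy}\sigma_x\sigma_y = 0.5(2)(3) = 3$. Thus $\mathrm{Var}(x + y) = \sigma_x^2 + 2\sigma_{xy} + \sigma_y^2 = 4 + 2(3) + 9 = 19$.

3. Sums of Functions of Random Variables

By combining the information from the last two sections, we can look at the situation that occurs when we deal with a sum of functions of two random variables. To keep things simple, let $w = ax$, $u = cy$ and $v = w + u$, so that $v = ax + cy$.

As far as the mean is concerned, we can say that $E(v) = E(w + u) = E(ax + cy) = E(w) + E(u)$. But $E(w) = E(ax) = aE(x)$ and $E(u) = E(cy) = cE(y)$, so that $E(v) = aE(x) + cE(y)$. Alternately, $\mu_v = a\mu_x + c\mu_y$. For example, if $w = 3x$ and $u = 5y$ so that $v = 3x + 5y$, then $E(v) = 3E(x) + 5E(y)$, and if $E(x) = 1.2$ and $E(y) = 3.7$, $E(v) = 3(1.2) + 5(3.7) = 22.1$.

Also, $\mathrm{Var}(v) = \mathrm{Var}(w + u) = \mathrm{Var}(w) + 2\,\mathrm{Cov}(w, u) + \mathrm{Var}(u)$. But since $w = ax$ and $u = cy$, $\mathrm{Var}(w) = a^2\mathrm{Var}(x)$, $\mathrm{Var}(u) = c^2\mathrm{Var}(y)$ and $\mathrm{Cov}(w, u) = ac\,\mathrm{Cov}(x, y)$, we can write

\[ \mathrm{Var}(v) = a^2\mathrm{Var}(x) + 2ac\,\mathrm{Cov}(x, y) + c^2\mathrm{Var}(y) \quad\text{or}\quad \sigma_v^2 = a^2\sigma_x^2 + 2ac\,\sigma_{xy} + c^2\sigma_y^2. \]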
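The rules for $E(ax + cy)$ and $\mathrm{Var}(ax + cy)$ can be checked numerically. The sketch below is only an illustration: it assumes the joint probability table of Section II (not the values $\mathrm{Var}(x) = 4$ and $\mathrm{Var}(y) = 9$ used in this section), takes $a = 3$ and $c = 5$, and the names `joint` and `expect` are made up for the example. It computes the mean and variance of $v = 3x + 5y$ once from the rules and once directly from the distribution of $v$.

```python
# Joint probabilities p(x, y) copied from the table in Section II.
joint = {
    (3, 7): 0.1, (3, 15): 0.2, (3, 2): 0.0,
    (6, 7): 0.1, (6, 15): 0.0, (6, 2): 0.3,
    (9, 7): 0.0, (9, 15): 0.1, (9, 2): 0.2,
}


def expect(g):
    """Return E[g(x, y)]: each value of g weighted by its joint probability."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


a, c = 3, 5  # constants of the example v = 3x + 5y

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x ** 2) - mean_x ** 2
var_y = expect(lambda x, y: y ** 2) - mean_y ** 2
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y

# Route 1: apply the rules for E(ax + cy) and Var(ax + cy).
mean_v_rule = a * mean_x + c * mean_y
var_v_rule = a ** 2 * var_x + 2 * a * c * cov_xy + c ** 2 * var_y

# Route 2: treat v = ax + cy as a random variable in its own right.
mean_v_direct = expect(lambda x, y: a * x + c * y)
var_v_direct = expect(lambda x, y: (a * x + c * y) ** 2) - mean_v_direct ** 2

print(round(mean_v_rule, 4), round(mean_v_direct, 4))
print(round(var_v_rule, 4), round(var_v_direct, 4))
```

Up to floating-point rounding the two routes should agree, giving a mean of about 52.5 and a variance of about 678.85 for this table. Because the covariance in the table is negative, the middle term $2ac\,\sigma_{xy}$ reduces the variance here.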
To summarize then, $\mathrm{Var}(ax + cy) = a^2\mathrm{Var}(x) + 2ac\,\mathrm{Cov}(x, y) + c^2\mathrm{Var}(y)$.

For example, let $v = 3x + 5y$. Then $\mathrm{Var}(3x + 5y) = 3^2\sigma_x^2 + 2(3)(5)\sigma_{xy} + 5^2\sigma_y^2$. So if $\sigma_x = 2$, $\sigma_y = 3$, and $\sigma_{xy} = 3$,

\[ \mathrm{Var}(3x + 5y) = 3^2(2^2) + 2(3)(5)(3) + 5^2(3^2) = 36 + 90 + 225 = 351. \]

IV. APPLICATION TO PORTFOLIO ANALYSIS

We can now use the formulas above to find the mean and variance of a portfolio of stocks. To keep things simple, assume that we are offered only two stocks, and that the return of the first stock is $R_1$, while the return of the second stock is $R_2$. Let us assume that each stock sells for \$1.00 a share, and that we have exactly \$1 to invest. Since we can buy fractional shares, we divide our dollar into two parts, $P_1$ and $P_2$, where $P_1 + P_2 = 1$. Our total return is $R = P_1R_1 + P_2R_2$. For example, if $P_1 = .60$, $P_2 = .40$, $R_1 = .08$ and $R_2 = .06$, our total return is $R = .60(.08) + .40(.06) = .072$.

A. Mean Return

We know that $E(ax + cy) = aE(x) + cE(y)$, so that $E(P_1R_1 + P_2R_2) = P_1E(R_1) + P_2E(R_2)$. For example, if we split our money equally between two stocks, $P_1$ and $P_2$ both equal $.50$. Then the expected return is $E(R) = E(.50R_1 + .50R_2) = .50E(R_1) + .50E(R_2)$. In particular, if $E(R_1) = 0.20$ and $E(R_2) = 0.24$, $E(R) = .50(.20) + .50(.24) = 0.22$.

B. Variance of the Return

We know that $\mathrm{Var}(ax + cy) = a^2\mathrm{Var}(x) + c^2\mathrm{Var}(y) + 2ac\,\mathrm{Cov}(x, y)$, so that

\[ \mathrm{Var}(R) = \mathrm{Var}(P_1R_1 + P_2R_2) = P_1^2\mathrm{Var}(R_1) + P_2^2\mathrm{Var}(R_2) + 2P_1P_2\,\mathrm{Cov}(R_1, R_2) \]

is the variance of the return. Thus if $P_1$ and $P_2$ are both $.50$, we can say

\[ \mathrm{Var}(R) = .25\,\mathrm{Var}(R_1) + .25\,\mathrm{Var}(R_2) + .50\,\mathrm{Cov}(R_1, R_2). \]

For example, assume that $\sigma_{R_1} = 0.20$ and $\sigma_{R_2} = 0.30$, but $\rho_{R_1R_2}$ is unknown. Then

\[ \mathrm{Cov}(R_1, R_2) = \rho_{R_1R_2}\sigma_{R_1}\sigma_{R_2} = \rho_{R_1R_2}(.20)(.30) = .06\,\rho_{R_1R_2}. \]

If we use the formula for $\mathrm{Var}(R)$ immediately above,

\[ \mathrm{Var}(R) = .25(.20)^2 + .25(.30)^2 + .50(.06)\rho_{R_1R_2} = .0100 + .0225 + .0300\,\rho_{R_1R_2} = .0325 + .0300\,\rho_{R_1R_2}. \]

Now we can see the effect various values of $\rho_{R_1R_2}$ will have on $\mathrm{Var}(R)$ and $\sigma_R = \sqrt{\mathrm{Var}(R)}$.

If $\rho_{R_1R_2} = 1$, $\mathrm{Var}(R) = .0325 + .0300(1) = .0625$ and $\sigma_R = .2500$.
If $\rho_{R_1R_2} = 0$, $\mathrm{Var}(R) = .0325 + .0300(0) = .0325$ and $\sigma_R = .1803$.
If $\rho_{R_1R_2} = -1$, $\mathrm{Var}(R) = .0325 + .0300(-1) = .0025$ and $\sigma_R = .0500$.

C. Variance Minimization.
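The purpose of this section is to show how to find the minimum value for $\mathrm{Var}(R)$. Since variance is a measure of risk, minimizing variance minimizes risk. Remember that

\[ \mathrm{Var}(R) = \sigma_R^2 = P_1^2\mathrm{Var}(R_1) + P_2^2\mathrm{Var}(R_2) + 2P_1P_2\,\mathrm{Cov}(R_1, R_2). \]

Also recall that, since $P_1$ and $P_2$ are shares of \$1.00, $P_1 + P_2 = 1$, so that $P_2 = 1 - P_1$. Remember too that $\mathrm{Cov}(R_1, R_2) = \rho_{R_1R_2}\sigma_{R_1}\sigma_{R_2}$. If we put all this together,

\[ \mathrm{Var}(R) = P_1^2\mathrm{Var}(R_1) + (1 - P_1)^2\mathrm{Var}(R_2) + 2P_1(1 - P_1)\rho_{R_1R_2}\sigma_{R_1}\sigma_{R_2}. \]

Now let us assume some values for the standard deviations and the correlation. Let $\sigma_{R_1} = 0.4$ so $\mathrm{Var}(R_1) = 0.16$, $\sigma_{R_2} = 0.3$ so $\mathrm{Var}(R_2) = 0.09$, and $\mathrm{Corr}(R_1, R_2) = \rho_{R_1R_2} = 0.5$. Then

\[
\begin{aligned}
\mathrm{Var}(R) &= P_1^2(0.16) + (1 - P_1)^2(0.09) + 2P_1(1 - P_1)(0.5)(0.4)(0.3) \\
&= 0.16P_1^2 + 0.09(1 - P_1)^2 + 2(0.06)P_1(1 - P_1) \\
&= 0.16P_1^2 + 0.09(1 - 2P_1 + P_1^2) + 0.12(P_1 - P_1^2) \\
&= 0.16P_1^2 + 0.09 - 0.18P_1 + 0.09P_1^2 + 0.12P_1 - 0.12P_1^2
\end{aligned}
\]

If we collect terms in $P_1$ and $P_1^2$, we get

\[ \mathrm{Var}(R) = (0.16 + 0.09 - 0.12)P_1^2 + (-0.18 + 0.12)P_1 + 0.09 \quad\text{or}\quad \mathrm{Var}(R) = 0.13P_1^2 - 0.06P_1 + 0.09. \]

In order to minimize risk we pick our value of $P_1$ to give us a minimum variance. The way that we find this minimum variance is by taking the first derivative of $\mathrm{Var}(R)$ with respect to $P_1$ and setting it equal to zero. Since

\[ \frac{d\,\mathrm{Var}(R)}{dP_1} = 0.26P_1 - 0.06, \]

if we set the derivative equal to zero we get $0.26P_1 - 0.06 = 0$, which implies that $P_1 = \dfrac{0.06}{0.26} = 0.2308$. Since the coefficient of $P_1^2$ is positive, this point is a minimum. Now since $P_1 + P_2 = 1$, we set $P_2 = 1 - P_1 = 0.7692$. That is, to minimize risk, we put about 23% of our money in stock 1 and 77% in stock 2.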
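The same minimization can be repeated for other standard deviations and correlations. The sketch below is a hedged generalization rather than part of the original derivation: it assumes $P_2 = 1 - P_1$, writes $\mathrm{Var}(R) = AP_1^2 + BP_1 + C$ by collecting terms exactly as above, and solves the first-order condition $2AP_1 + B = 0$. The function names `variance_coefficients` and `min_variance_split` are illustrative only.

```python
def variance_coefficients(sd1, sd2, rho):
    """Coefficients (A, B, C) of Var(R) = A*P1**2 + B*P1 + C when P2 = 1 - P1."""
    var1, var2 = sd1 ** 2, sd2 ** 2
    cov = rho * sd1 * sd2  # Cov(R1, R2) = rho * sigma_R1 * sigma_R2
    A = var1 + var2 - 2 * cov
    B = 2 * cov - 2 * var2
    C = var2
    return A, B, C


def min_variance_split(sd1, sd2, rho):
    """Return (P1, P2, minimum Var(R)) from the condition 2*A*P1 + B = 0."""
    A, B, C = variance_coefficients(sd1, sd2, rho)
    if A <= 0:
        # A equals Var(R1 - R2); it is zero only when rho = 1 and sd1 = sd2,
        # and then every split has the same variance.
        raise ValueError("no unique minimum-variance split")
    p1 = -B / (2 * A)
    # p1 is not clamped to [0, 1]; a value outside that range would mean
    # selling one stock short, which the text does not consider.
    return p1, 1 - p1, A * p1 ** 2 + B * p1 + C


# Values from the text: sigma_R1 = 0.4, sigma_R2 = 0.3, rho = 0.5.
p1, p2, min_var = min_variance_split(0.4, 0.3, 0.5)
print(round(p1, 4), round(p2, 4), round(min_var, 4))
```

With the values from the text this should reproduce the split of about 0.2308 and 0.7692. The corresponding minimum variance, about 0.0831, is not stated in the text; it follows from substituting $P_1$ back into $0.13P_1^2 - 0.06P_1 + 0.09$.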