MATH 2560 C F03 Elementary Statistics I
LECTURE 18: Means and Variances of Random Variables

1 Outline

⇒ the mean of a random variable;
⇒ the law of large numbers;
⇒ rules for means;
⇒ the variance of a random variable;
⇒ rules for variances.

2 The Mean of a Random Variable

⇒ Probability is the mathematical language that describes the long-run regular behavior of random phenomena. The probability distribution of a random variable is an idealized relative frequency distribution.
⇒ The mean x̄ of a set of observations is their ordinary average.
⇒ The mean of a random variable X is also an average of the possible values of X, but a weighted one: not all outcomes need be equally likely.

Example 4.19. Simple lottery wager: the state chooses a three-digit winning number at random and pays you $500 if your number is chosen. Let X be the amount your ticket pays you. The probability distribution of X is:

Payoff X       $0      $500
Probability    0.999   0.001

There are 1000 three-digit numbers, so you have probability 1/1000 of winning. What is your average payoff from many tickets? The ordinary average of the two possible payoffs is (500 + 0)/2 = 250. This makes no sense as the average payoff, because a payoff of $500 is much less likely than a payoff of $0. The long-run average payoff is

500 × (1/1000) + 0 × (999/1000) = 0.50 dollars.

This is the mean of the random variable X. (Tickets cost $1, so in the long run the state keeps half the money you wager.)

⇒ The common symbol for the mean of a probability distribution is µ, the Greek letter mu.
⇒ The mean of a random variable X is often called the expected value of X.

Below is the general definition of the mean of a discrete random variable.

Mean of a Discrete Random Variable
Suppose that X is a discrete random variable whose distribution is:

Value of X:    x_1   x_2   x_3   ...   x_k
Probability:   p_1   p_2   p_3   ...   p_k

To find the mean of X, multiply each possible value by its probability, then add all the products:

µ_X = x_1 p_1 + x_2 p_2 + ... + x_k p_k = Σ x_i p_i.

Example 4.20. First nine digits "at random" and Benford's Law.
First nine digits "at random".

First digit X   1     2     3     4     ...   9
Probability     1/9   1/9   1/9   1/9   ...   1/9

The mean of this distribution is:

µ_X = 1(1/9) + 2(1/9) + ... + 9(1/9) = 45(1/9) = 5.

Benford's Law.

First digit V   1       2       3       4       ...   9
Probability     0.301   0.176   0.125   0.097   ...   0.046

The mean of V is:

µ_V = 1(0.301) + 2(0.176) + 3(0.125) + 4(0.097) + 5(0.079) + 6(0.067) + 7(0.058) + 8(0.051) + 9(0.046) = 3.441.

The means reflect the greater probability of smaller first digits under Benford's Law.

2.1 Statistical Estimation and the Law of Large Numbers

⇒ To estimate µ, we choose an SRS from the population and use the sample mean x̄ to estimate the unknown population mean µ.
⇒ µ is a parameter and x̄ is a statistic.
⇒ If we keep on adding observations to our random sample, the statistic x̄ is guaranteed to get as close as we wish to the parameter µ and then stay that close.
⇒ This remarkable fact is called the law of large numbers.

Law of Large Numbers
Draw independent observations at random from any population with finite mean µ. Decide how accurately you would like to estimate µ. As the number of observations drawn increases, the mean x̄ of the observed values eventually approaches the mean µ of the population as closely as you specified and then stays that close.

⇒ The behavior of x̄ parallels the idea of probability: in the long run, the proportion of outcomes taking any value approaches the probability of that value, and the average outcome approaches the mean of the distribution.

Figure 4.14 shows the behavior of the mean height x̄ of n women chosen at random from a population whose heights follow the N(64.5, 2.5) distribution.

3 Rules for Means

Rules for Means
Rule 1. If X is a random variable and a and b are fixed numbers, then

µ_{a+bX} = a + b µ_X.

Rule 2. If X and Y are random variables, then

µ_{X+Y} = µ_X + µ_Y.

Example 4.23. The military and the civilian market. Let X and Y be the number of military units and the number of civilian units sold, respectively. Gain makes a profit of $2000 on each military unit sold and $3500 on each civilian unit.

The military market.
Units sold     1000   3000   5000   10,000
Probability    0.1    0.3    0.4    0.2

µ_X = 1000(0.1) + 3000(0.3) + 5000(0.4) + 10,000(0.2) = 100 + 900 + 2000 + 2000 = 5000 units.

Using Rule 1, the mean profit from military sales is:

µ_{2000X} = 2000 µ_X = 2000(5000) = $10,000,000.

The civilian market.

Units sold     300   500   750
Probability    0.4   0.5   0.1

µ_Y = 300(0.4) + 500(0.5) + 750(0.1) = 120 + 250 + 75 = 445 units.

Using Rule 1, the mean profit from civilian sales is:

µ_{3500Y} = 3500 µ_Y = 3500(445) = $1,557,500.

The total profit (military and civilian) is Z = 2000X + 3500Y. Using Rule 2 we obtain:

µ_Z = µ_{2000X} + µ_{3500Y} = 10,000,000 + 1,557,500 = 11,557,500 dollars.

Combining Rules 1 and 2 gives the result more quickly:

µ_Z = µ_{2000X+3500Y} = 2000 µ_X + 3500 µ_Y = 2000(5000) + 3500(445) = 11,557,500 dollars.

4 The Variance of a Random Variable

⇒ We write the variance of a random variable X as σ_X^2.
⇒ The variance is an average of the squared deviation (X − µ_X)^2 of the variable X from its mean µ_X. This is similar to the definition of the sample variance s^2 given in Chapter 1.

Below is the definition of the variance of a discrete random variable.

Variance of a Discrete Random Variable
Suppose that X is a discrete random variable whose distribution is:

Value of X:    x_1   x_2   ...   x_k
Probability:   p_1   p_2   ...   p_k

Let µ_X be the mean of X. The variance of X is:

σ_X^2 = (x_1 − µ_X)^2 p_1 + (x_2 − µ_X)^2 p_2 + ... + (x_k − µ_X)^2 p_k = Σ (x_i − µ_X)^2 p_i.

The standard deviation σ_X of X is the positive square root of the variance:

σ_X = √(σ_X^2).

Example 4.24. The military market (see Example 4.23). Let us find the mean and variance of X by arranging the calculation in the form of a table. Both µ_X and σ_X^2 are sums of columns in this table.

The military market.
x_i       p_i    x_i p_i        (x_i − µ_X)^2 p_i
1,000     0.1    100            (1,000 − 5,000)^2 (0.1) = 1,600,000
3,000     0.3    900            (3,000 − 5,000)^2 (0.3) = 1,200,000
5,000     0.4    2,000          (5,000 − 5,000)^2 (0.4) = 0
10,000    0.2    2,000          (10,000 − 5,000)^2 (0.2) = 5,000,000
                 µ_X = 5,000    σ_X^2 = 7,800,000

The standard deviation is: σ_X = √7,800,000 = 2792.8.

5 Rules for Variance

Rules for Variance
Rule 1. If X is a random variable and a and b are fixed numbers, then

σ_{a+bX}^2 = b^2 σ_X^2.

Rule 2. If X and Y are independent random variables, then

σ_{X+Y}^2 = σ_X^2 + σ_Y^2,
σ_{X−Y}^2 = σ_X^2 + σ_Y^2.

This is the addition rule for variances of independent random variables.

Rule 3. If X and Y have correlation ρ, then

σ_{X+Y}^2 = σ_X^2 + σ_Y^2 + 2ρ σ_X σ_Y,
σ_{X−Y}^2 = σ_X^2 + σ_Y^2 − 2ρ σ_X σ_Y.

This is the general addition rule for variances of random variables.

⇒ When random variables are not independent, the variance of their sum depends on the correlation between them as well as on their individual variances.
⇒ The correlation between two independent random variables is zero.
⇒ Rule 2 says that the variances of independent random variables add; their standard deviations do not.

Example 4.25. Simple lottery wager (see Example 4.19).

x_i    p_i      x_i p_i       (x_i − µ_X)^2 p_i
0      0.999    0             (0 − 0.5)^2 (0.999) = 0.24975
500    0.001    0.5           (500 − 0.5)^2 (0.001) = 249.50025
                µ_X = 0.5     σ_X^2 = 249.75

The standard deviation is: σ_X = √249.75 = 15.80 dollars.

On average you lose 50 cents per ticket:

µ_W = µ_X − 1 = 0.5 − 1 = −0.5 dollars,

where W = X − 1 is your net winnings.

Now buy a ticket on each of two different days, so that the payoffs X and Y are independent. The total payoff X + Y has mean:

µ_{X+Y} = µ_X + µ_Y = 0.50 + 0.50 = 1.00 dollars.

The variance of X + Y is:

σ_{X+Y}^2 = σ_X^2 + σ_Y^2 = 249.75 + 249.75 = 499.5.

The standard deviation is: σ_{X+Y} = √499.5 = 22.35 dollars.

This is not the same as the sum of the individual standard deviations: 15.80 + 15.80 = 31.60.

Example 4.26. SAT scores.
SAT math score X      µ_X = 625   σ_X = 90
SAT verbal score Y    µ_Y = 590   σ_Y = 100

The mean overall SAT score is:

µ_{X+Y} = µ_X + µ_Y = 625 + 590 = 1215.

The variance and standard deviation of the total cannot be computed from the information given, because X and Y are not independent. We need to know the correlation between X and Y to apply Rule 3. Let ρ = 0.7. Then:

σ_{X+Y}^2 = σ_X^2 + σ_Y^2 + 2ρ σ_X σ_Y = 90^2 + 100^2 + 2(0.7)(90)(100) = 30,700.

The standard deviation of X + Y is:

σ_{X+Y} = √30,700 ≈ 175.

Example 4.27. Investment portfolio and diversification. Someone invested 20% in Treasury bills and 80% in an "index fund" that represents all US common stocks. Let X and Y be the annual returns on T-bills and on stocks. The portfolio rate of return is:

R = 0.2X + 0.8Y.

Based on historical data, we have:

X = annual return on T-bills    µ_X = 5.2%     σ_X = 2.9%
Y = annual return on stocks     µ_Y = 13.3%    σ_Y = 17.0%
Correlation between X and Y:    ρ = −0.1

The mean value of R is:

µ_R = 0.2 µ_X + 0.8 µ_Y = (0.2 × 5.2) + (0.8 × 13.3) = 11.68%.

Applying Rules 1 and 3 we obtain the variance of the portfolio return:

σ_R^2 = σ_{0.2X}^2 + σ_{0.8Y}^2 + 2ρ σ_{0.2X} σ_{0.8Y}
      = (0.2)^2 σ_X^2 + (0.8)^2 σ_Y^2 + 2ρ (0.2 σ_X)(0.8 σ_Y)
      = (0.2)^2 (2.9)^2 + (0.8)^2 (17.0)^2 + 2(−0.1)(0.2 × 2.9)(0.8 × 17.0)
      = 183.719.

The standard deviation is: σ_R = √183.719 = 13.55%.

6 Summary

1. The probability distribution of a random variable X, like a distribution of data, has a mean µ_X and a standard deviation σ_X.
2. The law of large numbers says that the average of the values of X observed in many trials must approach µ_X.
3. The mean µ_X is the balance point of the probability histogram or density curve. If X is discrete with possible values x_i having probabilities p_i, the mean is the average of the values of X, each weighted by its probability: µ_X = x_1 p_1 + x_2 p_2 + ... + x_k p_k.
4. The variance σ_X^2 is the average squared deviation of the values of the variable from their mean. For a discrete random variable, σ_X^2 = (x_1 − µ_X)^2 p_1 + (x_2 − µ_X)^2 p_2 + ... + (x_k − µ_X)^2 p_k.
5. The standard deviation σ_X is the square root of the variance.
The standard deviation measures the variability of the distribution about the mean. It is easiest to interpret for normal distributions.
6. The mean and variance of a continuous random variable can be computed from the density curve, but doing so requires more advanced mathematics.
7. The means and variances of random variables obey the following rules. If a and b are fixed numbers, then µ_{a+bX} = a + b µ_X and σ_{a+bX}^2 = b^2 σ_X^2.
8. If X and Y are any two random variables, then µ_{X+Y} = µ_X + µ_Y, and if X and Y are independent, then σ_{X+Y}^2 = σ_X^2 + σ_Y^2 and σ_{X−Y}^2 = σ_X^2 + σ_Y^2.
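
The definitions and rules summarized above can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names (mean, variance, variance_of_combination, sample_mean_of_heights) are illustrative choices, and the simulation assumes that the 2.5 in N(64.5, 2.5) is the standard deviation. It recomputes the worked numbers from Examples 4.19, 4.24, 4.26, and 4.27, and draws a large sample of heights to illustrate the law of large numbers.

```python
import math
import random


def mean(values, probs):
    """Mean of a discrete random variable: the sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Variance of a discrete random variable: the sum of (x_i - mu)^2 * p_i."""
    mu = mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))


def variance_of_combination(sd_x, sd_y, rho=0.0, a=1.0, b=1.0):
    """Variance of aX + bY, combining Rule 1 and Rule 3 for variances.

    sd_x and sd_y are standard deviations and rho is the correlation.
    With rho = 0 this is the addition rule for independent variables;
    with b = -1 it gives the variance of X - Y.
    """
    return (a * sd_x) ** 2 + (b * sd_y) ** 2 + 2 * rho * (a * sd_x) * (b * sd_y)


def sample_mean_of_heights(n, seed=0):
    """Mean of n simulated heights (assumption: normal, mean 64.5, sd 2.5)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.gauss(64.5, 2.5) for _ in range(n)) / n


def main():
    # Example 4.19 / 4.25: lottery payoff
    lottery_values, lottery_probs = [0, 500], [0.999, 0.001]
    print("lottery mean:", mean(lottery_values, lottery_probs))
    print("lottery sd:", math.sqrt(variance(lottery_values, lottery_probs)))

    # Example 4.24: military market
    units, unit_probs = [1000, 3000, 5000, 10000], [0.1, 0.3, 0.4, 0.2]
    print("military mean:", mean(units, unit_probs))
    print("military variance:", variance(units, unit_probs))

    # Example 4.26: SAT total with correlation 0.7
    print("SAT total variance:", variance_of_combination(90, 100, rho=0.7))

    # Example 4.27: portfolio R = 0.2X + 0.8Y with correlation -0.1
    print("portfolio variance:",
          variance_of_combination(2.9, 17.0, rho=-0.1, a=0.2, b=0.8))

    # Law of large numbers: the sample mean settles near 64.5 as n grows
    for n in (10, 1000, 100000):
        print("n =", n, "sample mean:", sample_mean_of_heights(n))


if __name__ == "__main__":
    main()
```

Writing the variance of aX + bY as one function keeps the independent case (rho = 0) and the correlated case in a single formula, which mirrors how Rule 2 is the special case of Rule 3 with zero correlation.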