Statistics Review Problems -- Stat 1040 -- Dr. McGahagan
Chapters 16 and 17

Chapter 16 (pp. 285-86 of 3rd edition)
Be sure to review 1-8 carefully.

1. Box with 4000 zeros and 6000 ones. 10,000 draws with replacement. Which best describes the situation?
   a. Exactly 6000 ones will be drawn.
   b. Almost certainly, exactly 6000 ones will be drawn.
   c. Most likely, the number of ones will be slightly different than 6000.
   Option c is the best description.
   The computation: Let the random variable X = number of ones drawn.
   Prob (X = 6000) = C(10,000 6000) (pow .6 6000) (pow .4 4000) by the binomial formula.
   (binomial-pmf 6000 10000 0.6) = 0.0081 = less than one percent chance of exactly 6000 ones.
   [The binomial probability mass function is the binomial parallel to the probability density function for other distributions.]
   To get a sense that the exact expected outcome is rare, scale the problem down to 10 draws:
   Prob (X = 6 ones) = C(10 6) (pow 0.6 6) (pow 0.4 4) = 210 * 0.0467 * 0.0256 = 0.2508
   Six ones is the most likely single outcome, but it has a chance of only about 25 percent. Getting some other outcome is more likely than getting the exact expected value.

2. If the 10,000 draws are without replacement, you will get the entire population, so exactly 6000 ones will be drawn.

3. If you've lost 10 times in a row, the "law of averages" predicts nothing about what will happen on the next spin of a fair roulette wheel. Both the gambler who thinks he is "due for a win" and the bystander who thinks that his "luck is cold" are wrong.

4. Die to be rolled N times; you win $ 1 if one of the following conditions is met. For each case, would you prefer N to be 60 rolls or 600 rolls?
   Note that the probability of an ace on any roll is 1/6 = 0.1667 or 16.67 %.
   a. win if ace shows more than 20 percent of the time. N = 60 is better; the law of averages says that the percentage of aces is likely to deviate less from its EV of 16.67 % as N grows, so a result above 20 percent is more likely with fewer rolls.
   b. win if ace shows more than 15 percent of the time. N = 600 is better; 15 percent is below the EV, and with more rolls the percentage is more likely to stay close to 16.67 %.
   c.
      win if ace shows between 15 and 20 percent of the time. N = 600 is better.
   d. win if the percentage of aces is exactly 16.67 %. Chances are poor either way, but N = 60 offers some hope. See prob. 1; also compute:
      (binomial-pmf 10 60 (/ 1 6)) = 0.1370 = 13.7 percent.
      (binomial-pmf 100 600 (/ 1 6)) = 0.0437 = 4.37 percent.
      As the number of possible outcomes becomes greater, the chance of exactly the expected outcome declines.

5. If a coin is tossed 100 times, the only way to get a percentage of 50 percent heads is to get exactly 50 heads! The text problem is a garbled statement of the law of averages: the deviation from 50 percent heads is likely to be smaller with 100 tosses than with 10 tosses.

6. Assuming 2-child families and even odds of male and female children, the chance of both being of the same sex will be 50 percent.
   Calculate all the binomial probabilities, letting X = number of male children:
   Pr (X = 0) = Pr (F and F) = 0.5 * 0.5 = .25
   Pr (X = 1) = Pr [ (F and M) OR (M and F) ] = 0.5 * 0.5 + 0.5 * 0.5 = .25 + .25 = 0.5
   Pr (X = 2) = Pr (M and M) = 0.5 * 0.5 = .25
   Note that X = 0 OR X = 2 meets the condition that both are of the same sex, so the chance is .25 + .25 = 0.5.
   A deviation from this percentage is more likely if you look at a smaller number of families.

7. Multiple guess test (25 questions, 5 options each, 0.2 chance of a correct answer, 4 points per question, and a one point deduction for a wrong answer).
   There should be 5 tickets in the box, with 25 draws. Only one ticket has the winning score of 4; the remainder have a score of -1 on them.
   The mean of the box is 0, and the EV of sheer guessing is 25 * 0 = 0.
   The SD of the box is 2, so the variance is 4.
   For the sum of 25 draws, the variance is 25 * 4 = 100, and hence the standard error of the exam score will be 10.
   Note that negative exam scores are possible due to the penalty, if there are more than 20 mistakes.
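   The box arithmetic for the guessing test can be checked with a short sketch. This is an illustration in Python, not the course software; the helper names box_stats and sum_ev_se are made up for this example, and the SD is the population SD (divide by the number of tickets), which is what the box-model formulas assume.

```python
import math

def box_stats(box):
    """Return (mean, SD) of a box of tickets, using the population SD."""
    mean = sum(box) / len(box)
    variance = sum((ticket - mean) ** 2 for ticket in box) / len(box)
    return mean, math.sqrt(variance)

def sum_ev_se(box, n_draws):
    """EV and SE of the sum of n_draws draws made with replacement."""
    mean, sd = box_stats(box)
    return n_draws * mean, math.sqrt(n_draws) * sd

# One ticket worth +4 (right answer), four tickets worth -1 (wrong answer).
guess_box = [4, -1, -1, -1, -1]
box_mean, box_sd = box_stats(guess_box)
ev, se = sum_ev_se(guess_box, 25)
print("box mean:", box_mean, "box SD:", box_sd)
print("EV of score:", ev, "SE of score:", se)
```

   The same two helpers work for any box in these problems; only the list of tickets and the number of draws change.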
   Use (binomial-dist 25 .2) to get the probability of any specific number of questions right; you should find there is a 42.07 percent chance of getting a negative score (X <= 4) by guessing.

8. Gambler plays 50 turns of the roulette wheel, betting on 4 numbers, giving a 4/38 chance of winning on any one turn. A win means a win of $ 8; a loss means a loss of $ 1.
   The box model should model the payoffs with 38 tickets, of which just 4 are labeled $ 8, and the other 34 labeled minus $ 1.
   The net gain in 50 plays will be the sum of 50 draws from that box.

9. Box with more red marbles than blue ones. You win $ 10,000 if at the end of N draws (with replacement), a red marble has been drawn more often than a blue one. Would you prefer 100 draws or 200 draws?
   This is much harder to show definitively: see note 6 on page A 17 for the proof.
   But it is easy to simulate for a given probability of drawing a red marble, which must be at least a bit more than .5 (say .51).
   The probability of getting AT MOST half red (and so not winning) will be given by (binomial-cdf N/2 N 0.51)
   If N = 100 draws, (binomial-cdf 50 100 0.51) = .4599, so there is a .5401 chance of more red marbles.
   If N = 200 draws, (binomial-cdf 100 200 0.51) = .4158, so there is a .5842 chance of more red marbles.
   Repeat this with a probability of 0.6 (say) to see that the larger number of draws is even more to be preferred.

10. Two hundred draws made with replacement from a box given by [ -3 -2 -1 0 1 2 3 ]
   a. If the sum of the 200 draws is 30, what is the average? Answer: 30 / 200 = + 0.15
   b. If the sum of the 200 draws is -20, the average is -20 / 200 = - 0.10
   c. In general, average = Sum of draws / Number of draws.
   d. Consider the alternatives:
      i. Win $ 1 million if the sum of 200 draws is between -5 and +5
      ii. Win $ 1 million if the average is between -0.025 and +0.025
      The two alternatives are the same, since 5 / 200 = 0.025.

Review Questions - Stat 1040 - Dr. McGahagan
Chapter 17 -- The Expected Value and the Standard Error (pp.
304-306)

1. One hundred draws with replacement from the box [ 1 6 7 9 9 10 ]
   a. The sum may be as small as 100 (all ones) or as large as 1000 (all tens). Probability of either extreme = (pow 1/6 100)
   b. The box has a mean of 7 and an SD of 3; hence the sum of 100 draws will have an EV of 700 and SE = 30.
      [square root law: standard error of the sum of N draws = (sqrt N) * SD of box]
      This means that the interval from 650 to 750 is the EV +/- 50/30 SE = EV +/- 1.67 SE.
      Using the tables for the normal distribution, the chance of the total falling within that interval is about 90 percent -- 90.11 if you use 1.65 and 91.09 if you use 1.70 as the z-score, and 90.4419 if you use (normal-area (- (/ 50 30)) (/ 50 30))
   To simulate this draw:
   (bind b (list 1 6 7 9 9 10))    sets up the box model.
   (stats b)    gets the key statistics; all you want are (mean b) and (sd b)
   (bind sums nil)    defines a placeholder for the sums
   (dotimes (i 1000) (push (sum (draw 100 b)) sums))
   (stats sums)
   Note the mean and SD of the 1000 simulated sums of 100 draws. Results will vary: one trial gave (mean sums) = 700.097 and (sd sums) = 29.7893.
   To drive home the point that the distributions of the BOX and of STATISTICS ON DRAWS FROM THE BOX are quite different, draw histograms:
   (hist b bounds (seq 0 11)) and (hist sums)
   Note how much more "normal" the histogram of the sums looks than the histogram of the box. This is an important step to seeing the point of the Central Limit Theorem, which we will soon meet.

2. Gambler plays roulette 100 times, betting $ 1 on a column of 12 (of the 38) numbers; a $ 1 bet will net you $ 2 if you win; of course, if you lose, the house takes the $ 1.
   Model with a box of 38 tickets, of which 12 are labeled +2 and 26 labeled -1.
   The mean of the box will be 2 * 12 / 38 + (-1) * 26 / 38 = 24 / 38 - 26 / 38 = - 2 / 38 = - 0.05263; the average loss is a bit over a nickel on a dollar bet.
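   The mean of the column-bet box can be confirmed with exact fractions. The sketch below is in Python rather than the course software, and the helper name box_mean and the (value, count) layout are assumptions made only for this illustration.

```python
from fractions import Fraction

def box_mean(tickets):
    """Exact mean of a box given as (ticket value, number of tickets) pairs."""
    total_tickets = sum(count for _, count in tickets)
    return sum(Fraction(value * count, total_tickets) for value, count in tickets)

# Column bet: 12 winning tickets worth +2, 26 losing tickets worth -1.
column_bet = [(2, 12), (-1, 26)]
mean = box_mean(column_bet)
print("exact mean:", mean)
print("decimal mean:", round(float(mean), 5))
```

   Fraction reduces -2/38 to -1/19, which is the same number; the decimal form matches the loss of a bit over a nickel per dollar bet.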
   The SD can be found by the shortcut formula:
   (big - small) * sqrt [ (fraction big) * (fraction small) ] = [2 - (-1)] * sqrt [ 12/38 * 26/38 ] = 3 * sqrt (312 / 1444) = 3 * 0.4648 = 1.3945
   For the gambler's winnings (the sum of the individual wins or losses), the text formulas give:
   EV = 100 * -0.0526 = - 5.26
   SE = (sqrt 100) * 1.3945 = 13.95
   So the expected result is a loss of $ 5.26, "give or take" $ 13.95.
   Note that it will not be at all unusual for the gambler to be ahead after 100 plays. The breakeven point is only 5.26 / 13.95 = 0.3770 standard units above the mean, and the tail area for 0.38 is about 35 percent. The gambler comes out ahead about 35 percent of the time: enough to keep him coming back.
   Confirm the results with:
   (bind b (combine (makelist 12 2) (makelist 26 -1)))
   (stats b)    should confirm the shortcut formula.
   (bind payoff nil)
   (dotimes (i 10000) (push (sum (draw 100 b)) payoff))
   (stats payoff)
   The simulation gives (on one typical run): Mean of payoff = - 5.29; SD of payoff = 13.80. Not exact, but close.
   The histograms (histogram b bounds (seq -2 4)) and (histogram payoff) will again show sharply contrasting distributions.
   The expected number of wins in 100 plays will be 100 * 12/38 = 1200 / 38 = 31.5789; the SE of the number of wins will be simply (sqrt 100) * sqrt [(12/38) * (26/38)] = 4.6483

3. Shortcut formula for SD of box: SD = (big - little) * sqrt (fraction big * fraction little)
   a. Box = [ 1, -2, -2 ]     SD = (1 - (-2)) * sqrt (1/3 * 2/3) = 3 * sqrt (2/9) = 1.4142
      Note that -2 is the little value: signs count!
   b. Box = [ 15, 15, 16 ]    SD = (16 - 15) * sqrt (1/3 * 2/3) = sqrt (2/9) = 0.4714
   c. Box = [ -1, -1, -1, 1 ] SD = (1 - (-1)) * sqrt (1/4 * 3/4) = 2 * sqrt (3/16) = 0.8660
   d. Box = [ 0, 0, 0, 1 ]    SD = (1 - 0) * sqrt (1/4 * 3/4) = sqrt (3/16) = 0.4330
   e. Box = [ 0, 0, 2 ]       SD = (2 - 0) * sqrt (1/3 * 2/3) = 2 * sqrt (2/9) = 0.9428

4. Roll a die 180 times and count aces.
   Box = [1,0,0,0,0,0] has mean = 1/6 or 0.1667 and SD = sqrt (1/6 * 5/6) = .3727
   After 180 rolls, we would expect the count of aces to be 180 / 6 = 30 and SE of sum = (sqrt 180) * .3727 = 13.4164 * .3727 = 5.00.
   Hence the range from 15 to 45 is the EV +/- 3 * SE of sum; from the normal table (or Rule of 1,2,3) we expect almost all (more than 99 percent) of the group to get sums in that range.

5. Guess the total number of spots on a die thrown N times, with a one dollar penalty for each spot the guess is off. Would you prefer 50 throws or 100 throws?
   The box this time will be [1,2,3,4,5,6] (contrast with the last problem), which has mean 3.5 and SD = 1.7078.
   EV of sum for 50 throws is 175 and EV of sum for 100 throws is 350, so your best guesses are simple to make.
   But the SE for the sum of 50 throws will be (sqrt 50) * 1.7078 = 12.0761 and the SE for the sum of 100 throws will be (sqrt 100) * 1.7078 = 17.078.
   You are likely to make more of a mistake with 100 throws, so 50 throws is the better choice.

6. Consider 100 draws with replacement from the box [1, 1, 2, 3]
   We are given the results of one experiment: 45 ones, 23 twos and 32 threes.
   Since the mean of the box is 1.75 and its SD is 0.8292, we would have expected a total of 175 in 100 draws, and would expect a SE of 8.292.
   We would also expect 50 ones, 25 twos and 25 threes.
   We actually got a sum of 1 * 45 + 2 * 23 + 3 * 32 = 45 + 46 + 96 = 187.
   This means there was a chance error of 12 (187 - 175) in the sum of the draws.
   Rounding the standard error to 8, this is 1.5 standard errors above the expected value, and a sum at least that far above the EV should be expected to happen 6.68 percent of the time [1.0 - (normal-cdf 1.5)]. With the unrounded SE, 12 / 8.292 = 1.45 standard errors, and the chance is about 7.4 percent.
   The standard error for the number of ones would require a box model [ 1, 1, 0, 0 ], with mean 0.5 and SD = 0.5.
   In 100 draws, we expect 50 ones, and hence the chance error for the number of ones is 45 - 50 = -5.
   The SE for the number of ones in 100 draws = sqrt (100) * 0.5 = 5, so the observed count is one SE below its expected value.

7. Consider 100 draws with replacement from the box [1,2,3,4,5,6]
   a.
      Sum of draws = 321, so average is 3.21 [note that 3.5 is the expected value]
   b. Average of draws = 3.78, and sum will be 378.
   c. Chance that AVERAGE is between 3 and 4 = Chance that SUM is between 300 and 400.
      The expected value of the sum is 350, and SE of the sum = (sqrt 100) * SD box = 17.0783
      The range from 300 to 400 is EV +/- 50 / 17.0783 SE units = EV +/- 2.92 SE units.
      Normal table indicates the chance is over 99 percent; (normal-area -2.92 2.92) gives 0.9965.

8. EV and SE of DIFFERENCE of number of heads and tails in 100 tosses
   Appropriate box model is [ -1, 1 ]. If you draw one head and one tail, the difference will be zero, as it should.
   A box model must be numeric, so [heads, tails] is not possible; counting heads as 1 and tails as 0 would mean that the sum of numbers is positive (with EV 50 in 100 draws); counting heads as 0 and tails as -1 gets you an EV of -50.
   Finally, the box [-1, 0, 1] would only be appropriate with a very thick coin which could land on its side.
   With the box [-1, 1] the mean is 0 and the SD is 1.
   Hence the EV of the difference in 100 draws is 100 * 0 = 0, and the SE of the difference is (sqrt 100) * 1 = 10.

9. Bet on column or on number.
   Box for column bet as in problem 2: mean is 2 * 12/38 - 1 * 26/38 = -2/38, SD = (big - little) * sqrt (12/38 * 26/38) = 3 * .4648 = 1.3945
   Box for single number bet = [ 35, -1 ... -1 ] with 37 minus ones; hence mean = -2/38 = -0.0526, and SD = (big - little) * sqrt (1/38 * 37/38) = 36 * 0.1601 = 5.7626
   Same EV (a loss of about $ 52.60 on 1000 bets), but betting on a single number has much greater variance for an individual bet or for the sum of 1000 bets. Hence you have a much better chance of winning something by betting on a single number.
   SE for sum of 1000 column bets = (sqrt 1000) * 1.3945 = 44; SE for 1000 number bets = (sqrt 1000) * 5.7626 = 182.
   P(Sum > 0 for 1000 column bets) = P (Z > 52.60 / 44) = P (Z > 1.20) = about 12 percent.
   P(Sum > 0 for 1000 number bets) = P (Z > 52.60 / 182) = P (Z > .289) = about 38 percent.
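   The comparison of the two bets can be reproduced with the normal approximation. The sketch below is in Python rather than the course software; the function names shortcut_sd and chance_of_profit are made up for this illustration, and it uses the unrounded mean and SD, so its answers may differ slightly from figures worked out with rounded values and the printed normal table.

```python
import math
from statistics import NormalDist

def shortcut_sd(big, little, n_big, n_little):
    """Shortcut SD for a box holding only two kinds of tickets."""
    total = n_big + n_little
    return (big - little) * math.sqrt((n_big / total) * (n_little / total))

def chance_of_profit(big, little, n_big, n_little, n_bets):
    """Normal-approximation chance that the sum of n_bets draws is above 0."""
    total = n_big + n_little
    mean = (big * n_big + little * n_little) / total
    ev = n_bets * mean
    se = math.sqrt(n_bets) * shortcut_sd(big, little, n_big, n_little)
    z = (0 - ev) / se  # breakeven point in standard units
    return 1 - NormalDist().cdf(z)

p_column = chance_of_profit(2, -1, 12, 26, 1000)   # column bet
p_number = chance_of_profit(35, -1, 1, 37, 1000)   # single-number bet
print("column bet:", round(p_column, 3))
print("number bet:", round(p_number, 3))
```

   No continuity correction is applied here; that is a simplification, in keeping with the table-based calculation in the text.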
   You also have a greater chance of either winning or losing more than $ 100 with the number bet. A loss of $ 100 is $ 47.40 below the EV, so:
   P (sum < -100) = P (Z < -47.40 / 44) = P (Z < -1.077) = 14 percent for the column bet;
   P (sum < -100) = P (Z < -47.40 / 182) = P (Z < -0.26) = about 40 percent for the single number bet.

10. Quantiles and box models.
   The key to this problem is that we are given that EV +/- 50 = 75 percent.
   Hence we find 75 percent (or 74.99, the closest value) in the area column, and see that 50 will be 1.15 SE of the sum of draws.
   A single SE must be 50 / 1.15 = 43.48, so we can calculate Z-scores.
   For twice the number of draws, the EV will in fact be 800, but the SE will scale up by a factor of the square root of 2, not 2.
   So the SE is not 86.96 but 61.49 for twice the number of draws, and the Z-score of 100 will be not 1.15 but 100 / 61.49 = 1.626.
   The probability of being within +/- 1.65 standard errors is not 75 percent, but about 90 percent.

11. Sum of positive numbers: the box to address this problem should ignore the negative numbers, and hence should be [0, 0, 0, 1, 3].
   It has a mean of 4/5, and a SD of 1.1662.
   So the sum of 100 draws from this box has an EV of 80 and a SE of 11.662.

12. Box model and actual outcomes.
   The box model is [1, 2, 3, 4, 5, 6, 7], which has a mean of 4 and a SD of 2.
   Hence the sum of 100 draws has an expected value of 400 and a SE of 20.
   a. If the sum of the draws is actually 431 and the EV is 400, the chance error is 31.
   b. If the sum of the draws is actually 386 and the EV is 400, the chance error is -14.
   c. If the sum of the draws is actually 417, the chance error is 17.
   Note that the EV and the SE will not change just because the actual sum differs from the EV.
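
The chance errors in problem 12 follow one rule: chance error = actual sum - EV. A minimal sketch in Python (not the course software; the function name chance_error is an assumption for this illustration) also expresses each chance error in standard units, which is an extra step not asked for in the problem:

```python
import math

box = [1, 2, 3, 4, 5, 6, 7]
n_draws = 100

# Box mean and population SD, then EV and SE for the sum of the draws.
box_mean = sum(box) / len(box)
box_sd = math.sqrt(sum((t - box_mean) ** 2 for t in box) / len(box))
ev = n_draws * box_mean
se = math.sqrt(n_draws) * box_sd

def chance_error(actual_sum):
    """Chance error is the observed sum minus the expected value."""
    return actual_sum - ev

for actual in (431, 386, 417):
    err = chance_error(actual)
    print(actual, "chance error:", err, "in standard units:", round(err / se, 2))
```

The EV and SE are computed once, before any outcome is looked at, which is the point of the closing note: they describe the box and the number of draws, not the particular result.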