TOPIC 15 Central Limit Theorem

In-Class Activities

Activity 15-1: Smoking Rates
15-1, 15-2, 15-9
a. π
b. No – the sample result will not, in general, equal .209 exactly because of sampling variability.
c. The CLT predicts the sampling distribution of $\hat{p}$ will be approximately normal, centered at .209 with standard deviation equal to $\sqrt{(.209)(.791)/100} = .04066$.
d. Need to draw and shade graph. [Graph to be included here]
e. z = (.25 - .209) / .04066 = 1.01
f. P(Z > 1.01) = .1562 (Table II); .1566 (applet)
g. When the sample size increases to 400, the standard deviation of the sampling distribution will decrease to .0203, which means fewer sample proportions will fall far from the center of .209, so it will be less likely that we obtain a sample proportion greater than .25.
[Figure (smoking.pdf): Sample Proportion of Smokers (n=400)]
h. z = (.25 - .209) / .0203 = 2.02, so P(Z > 2.02) = .0217. Yes – this probability has decreased as predicted.
i. No – the size of the population of the U.S. did not enter into the calculations.
j. The previous calculations would not change in any way.

Rossman/Chance, Workshop Statistics, 3/e Solutions, Unit 3, Topic 15

Activity 15-2: Smoking Rates
15-1, 15-2, 15-9
a. N(.105, .0307)
[Figure (utahsmokers.pdf): Sample Proportion of Smokers (n=100)]
b. z = (.25 - .105)/.0307 = 4.72, P(Z > 4.72) ≈ 0.000
c. Yes – you would have strong reason to doubt that this state was Utah, because the probability of finding a random sample of 100 people from Utah with at least 25 smokers is essentially zero; this never happens by chance alone. So if you find a random sample with 25/100 smokers, you have very strong evidence the sample is from some other state where the proportion of smokers is greater than 10.5%.

Activity 15-3: Body Temperatures
12-1, 12-19, 15-3, 15-18, 15-19, 19-3, 19-7, 20-11, 22-10, 23-3
a.
These numbers are parameters: 98.6°F and 0.7°F.
b. Yes – our sample size is greater than 30 and we have a simple random sample, so the CLT applies.
c. The CLT says the sampling distribution of the sample means will be approximately normal with mean 98.6°F and standard deviation $.7/\sqrt{130} = .061$.
[Figure (bodytempsnormal.pdf): Average Body Temperatures (degrees Fahrenheit)]
d. $P(98.5 < \bar{X} < 98.7) = P(-1.64 < Z < 1.64) = .9495 - .0505 = .8990$ (Table II); .8989 (applet)
e. $P(98.2 < \bar{X} < 98.4) = P(-1.64 < Z < 1.64) = .9495 - .0505 = .8990$ (Table II); .8989 (applet)
f. These answers are the same – we have simply shifted the center of the plot, but the area within ±0.1 degrees of the center has not changed.
g. $P\left(\frac{-0.1}{.061} < Z < \frac{0.1}{.061}\right) = P(-1.64 < Z < 1.64) = .9495 - .0505 = .8990$ (Table II); .8989 (applet)
So there is about a 90% chance that a random sample of 130 will result in a sample mean body temperature that is within ±0.1 degrees of the actual population mean μ, if we assume the population standard deviation is σ = 0.7°F.

Activity 15-4: Solitaire
11-22, 11-23, 15-4, 15-14, 21-20, 27-18
a. The CLT says the sampling distribution will be approximately N(.1111, .0994). z = (.1 - .1111) / .0994 = -.11, so P(Z < -.11) = .4562 (Table II); .4562 (applet)
b. P(X ≤ 1) = .308 + .385 = .693
c. No – the probabilities in a and b are not close.
d. The technical conditions for the CLT are not satisfied: nπ = 10×(1/9) = 1.111 < 10 and n(1-π) = 10×(8/9) = 8.889 < 10.

Activity 15-5: Capsized Tour Boat
First, weight is a quantitative variable, so the relevant statistic is the sample mean weight of the 47 passengers. Because the question is phrased in terms of the total weight in a sample of 47 adults, you must rephrase it in terms of the sample mean weight. If total weight exceeds 7500 pounds, then the sample mean weight must exceed 7500/47 or 159.574 pounds.
So, you want to find the probability that $\bar{x} > 159.574$ (with n = 47 and σ = 35). The CLT applies because the sample size (n = 47) is fairly large, greater than 30. The sampling distribution of $\bar{x}$ is, therefore, approximately normal with mean 167 pounds and standard deviation equal to $\sigma/\sqrt{n} = 35/\sqrt{47} = 5.105$ pounds. A sketch of this sampling distribution is shown here:
[Figure (art WS3_CSE_3_15_42): sketch of the sampling distribution of the sample mean weight]
Now you can use the Normal Probability Calculator applet or the Standard Normal Probability Table to find the probability of interest. The z-score corresponding to a sample mean weight of 159.574 pounds is (159.574 – 167)/5.105 = –1.45. The probability of the sample mean weight being less than 159.574 pounds is found from the table to be .0736, so the probability of exceeding this weight is 1 – .0736 = .9264. It’s not surprising the boat capsized with 47 passengers!

Homework Activities

Activity 15-6: Means or Proportions
a. sample mean
b. sample proportion
c. sample mean
d. sample mean
e. sample proportion

Activity 15-7: Candy Colors
1-13, 2-19, 13-1, 13-2, 15-7, 15-8, 16-4, 16-22, 24-15, 24-16
a. The CLT says the sample proportions will be approximately normally distributed with mean = .45 and standard deviation $\sqrt{(.45)(.55)/75} = .0568$.
b. [Figure (reeseshaded.pdf): Proportion of Orange Reese's Pieces (n = 75)] Student guesses.
c. $P(\hat{p} < .4) = P(Z < -.88) = .1894$
d. $P(.35 < \hat{p} < .55) = P(-1.76 < Z < 1.76) = .9608 - .0392 = .9216$ (Table II); .9217 (applet)
e. Yes – these probabilities are very close, virtually identical. The simulated probability from the applet was 92% and the normal model predicts a probability of 92.1%.

Activity 15-8: Candy Colors
1-13, 2-19, 13-1, 13-2, 15-7, 15-8, 16-4, 16-22, 24-15, 25-16
a. The CLT says the sample proportions will be approximately normally distributed with mean = .45 and standard deviation $\sqrt{(.45)(.55)/175} = .0372$. The only change from when the sample size was 75 is the standard deviation, which is smaller now.
b. [Figure (reeseshaded1.pdf): Proportion of Orange Reese's Pieces (n = 175)] Student guesses.
c. $P(\hat{p} < .4) = P(Z < -1.34) = .0901$ (Table II); .0895 (applet)
d. This probability is smaller than when the sample size is 75. This makes sense because the standard deviation (spread) has decreased and thus fewer sample proportions fall as far from the center of .45.
e. $P(.35 < \hat{p} < .55) = P(-2.69 < Z < 2.69) = .9964 - .0036 = .9928$
f. This probability is larger than when the sample size is 75. This makes sense because the standard deviation has decreased, which will concentrate more of the area under the curve near .45 (the mean).

Activity 15-9: Smoking Rates
15-1, 15-2, 15-9
a. The CLT predicts this distribution will be approximately N(.276, .02235).
[Figure (kentucky.pdf): Sample Proportion of Smokers in Kentucky (n = 400)]
b. $P(\hat{p} < .251) = P\left(Z < \frac{.251 - .276}{.02235}\right) = P(Z < -1.12) = .1314$. Since the normal distribution is symmetric, .301 will have a z-score of +1.12 and the area to the right of z = 1.12 will also be .1314. Thus we can double this probability to find that the probability of obtaining a sample proportion of Kentucky smokers more than .025 away from .276 is 2×.1314 or .2628.
c. $P(\hat{p} < .226) = P\left(Z < \frac{.226 - .276}{.02235}\right) = P(Z < -2.24) = .0125$. Thus the probability of obtaining a sample proportion of Kentucky smokers more than .05 away from .276 is 2×.0125 or .025.
d. You would have no reason to doubt that the state is Kentucky because part (b) shows that if the state is Kentucky, you have more than a 26% chance of finding a sample result at least as extreme as 25% smokers.
e.
Now you would have reason to doubt that the state is Kentucky because part (c) shows that if the state is Kentucky, you have less than a 2.5% chance of finding a sample result at least as extreme as 22.5% smokers.

Activity 15-10: Candy Bar Weights
12-10, 14-10, 15-10
a. $P(2.18 < X < 2.22) = P\left(\frac{2.18 - 2.20}{0.04} < Z < \frac{2.22 - 2.20}{0.04}\right) = P(-.50 < Z < .50) = .6915 - .3085 = .3830$ (Table II); .3829 (applet)
b. Yes – as long as your sampling method behaves like a simple random sample, the CLT applies because the population has a normal distribution.
c. The CLT says the sample means will be normally distributed with mean 2.20 ounces and standard deviation $.04/\sqrt{5} = .01789$.
[Figure (candybar.pdf): Average Candy Bar Weight (n = 5), with 2.18 and 2.22 marked]
d. Student guess. This value should be greater than the answer to part a.
e. $P(2.18 < \bar{X} < 2.22) = P\left(\frac{2.18 - 2.20}{.01789} < Z < \frac{2.22 - 2.20}{.01789}\right) = P(-1.12 < Z < 1.12) = .8686 - .1314 = .7372$ (Table II); .7364 (applet)
This probability is indeed larger than the probability we found in part a.
f. Student guess – they should guess that the probability will increase if the sample size is increased to 40 because this will decrease the standard deviation, which will concentrate more area under the curve near the middle (mean).
g. Now the standard deviation of the sample means is .006325, so the curve is N(2.2, .006325).
$P(2.18 < \bar{X} < 2.22) = P\left(\frac{2.18 - 2.20}{.006325} < Z < \frac{2.22 - 2.20}{.006325}\right) = P(-3.16 < Z < 3.16) = .9992 - .0008 = .9984$ (Table II); .9984 (applet)
This is larger than our answer in part e.
h. The calculations in part g remain approximately correct regardless of the distribution of candy bar weights because the sample size was large (40 > 30).

Activity 15-11: Christmas Shopping
14-3, 14-7, 15-11, 19-1
a. No – it is not valid to use the CLT because the sample size is too small and we do not know that the population is normally distributed.
b.
Yes – with a sample size of 500, the CLT tells us about the sampling distribution of the sample means. It would be N($850, $11.18).
[Figure (christmasshopping.pdf): Average Expected Christmas Expenditures ($)]
c. $P(\$831.61 < \bar{X} < \$868.39) = P\left(\frac{831.61 - 850}{11.18} < Z < \frac{868.39 - 850}{11.18}\right) = P(-1.64 < Z < 1.64) = .9495 - .0505 = .8990$ (Table II); .8989 (applet)
d. $P(\$828.09 < \bar{X} < \$871.91) = P\left(\frac{828.09 - 850}{11.18} < Z < \frac{871.91 - 850}{11.18}\right) = P(-1.96 < Z < 1.96) = .9750 - .0250 = .9500$ (Table II); .9499 (applet)
e. $P(\$821.20 < \bar{X} < \$878.80) = P\left(\frac{821.20 - 850}{11.18} < Z < \frac{878.80 - 850}{11.18}\right) = P(-2.58 < Z < 2.58) = .9951 - .0049 = .9902$ (Table II); .9900 (applet)
f. First find the z-scores that mark 80% of the area in the middle of the standard normal curve: P(-z* < Z < z*) ≈ .8000 → P(-1.28 < Z < 1.28) ≈ .8000. As $z = (\bar{x} - 850)/11.18$ and z = 1.28, $\bar{x} = (1.28)(11.18) + 850 = 864.31$. Therefore, k = 864.31 - 850 = 14.31.
g. $P(\$981.61 < \bar{X} < \$1018.39) = P\left(\frac{981.61 - 1000}{11.18} < Z < \frac{1018.39 - 1000}{11.18}\right) = P(-1.64 < Z < 1.64) = .9495 - .0505 = .8990$ (Table II); .8989 (applet)
This is exactly what we found in part c. The probability of falling within ±$18.39 of μ is the same, regardless of what value we use for μ.

Activity 15-12: Jury Selection
11-4, 12-1, 15-12
a. The CLT applies to the jury pool with a sample size of 75 because it is large (75×.20 = 15 > 10). It would not apply for the jury (sample size 12).
b. The CLT predicts the sampling distribution would be approximately N(.2, .046188).
[Figure (jurypool.pdf): Sample Proportion of Senior Citizens (n = 75)]
c. $P(\hat{p} > .333) = P\left(Z > \frac{.333 - .2}{.046188}\right) = P(Z > 2.88) = .0020$
d. This is the same as the empirical probability I found in Activity 11-4e. (Answers will vary.)

Activity 15-13: Non-English Speakers
a. The CLT says this sampling distribution will be approximately N(.315, .046452).
[Figure (nonenglish.pdf): Sample Proportion of Non-English Speakers]
b. $P(\hat{p} > .5) = P\left(Z > \frac{.5 - .315}{.046452}\right) = P(Z > 3.98) = 0.00$
c. $P(\hat{p} < .25) = P\left(Z < \frac{.25 - .315}{.046452}\right) = P(Z < -1.40) = .0808$ (Table II); .0809 (applet)
d. $P(.2 < \hat{p} < .5) = P\left(\frac{.2 - .315}{.046452} < Z < \frac{.5 - .315}{.046452}\right) = P(-2.48 < Z < 3.98) = 1.000 - .0066 = .9934$
e. [Figure (nonenglishohio.pdf): Sample Proportion of Non-English Speakers, with curves labeled Ohio and California]
f. Judging from the plot, in Ohio, $P(\hat{p} > .5)$ is zero, $P(\hat{p} < .25)$ should be near 1, and $P(.2 < \hat{p} < .5)$ should be near zero.

Activity 15-14: Solitaire
11-22, 11-23, 15-4, 15-14, 21-20, 27-18
a. Author A would need to play at least 90 games in order for nπ = n(1/9) ≥ 10, which would let us use the CLT.
b. Author B would need to play at least 60 games in order for nπ = n(1/6) ≥ 10, which would let us use the CLT.
c. If π = .8, then the authors would need to play at least 10/.8 = 12.5 or 13 games in order to use the CLT to approximate the sampling distribution of the sample proportion of wins for author B.

Activity 15-15: Birth Weights
12-2, 14-9, 15-15, 21-17
a. $P(X < 2500) = P\left(Z < \frac{2500 - 3300}{570}\right) = P(Z < -1.40) = .0808$ (Table II); .0802 (applet)
b. $P(\bar{X} < 2500) = P\left(Z < \frac{2500 - 3300}{403.051}\right) = P(Z < -1.99) = .0233$ (Table II); .0236 (applet)
c. This probability is less than the probability in part a. This makes sense because we are looking at an average – it is harder for any pair of babies to have an average birth weight below 2500 grams than for a single baby to weigh below this amount.
d. $P(\bar{X} < 2500) = P\left(Z < \frac{2500 - 3300}{285}\right) = P(Z < -2.81) = .0025$ (Table II); .0025 (applet)
This probability is much less than the probability in part a. This makes sense because we are looking at an average of 4 babies – it is harder for four babies to have an average birth weight below 2500 grams than for a single baby to weigh below this amount.
e.
$P(3000 < X < 3600) = P\left(\frac{3000 - 3300}{570} < Z < \frac{3600 - 3300}{570}\right) = P(-.53 < Z < .53) = .4038$ (Table II); .4013 (applet)
f. Student expectation.
g. $P(3000 < \bar{X} < 3600) = P\left(\frac{3000 - 3300}{127.46} < Z < \frac{3600 - 3300}{127.46}\right) = P(-2.35 < Z < 2.35) = .9812$ (Table II); .9814 (applet)

Activity 15-16: Volunteerism
13-16, 15-16
a. $P(\hat{p} > .282) = P\left(Z > \frac{.282 - .25}{\sqrt{(.25)(.75)/60000}}\right) = P(Z > 18.1)$; Prob = 0
b. $P(\hat{p} > .282) = P\left(Z > \frac{.282 - .25}{\sqrt{(.25)(.75)/500}}\right) = P(Z > 1.65)$; Prob = .0495
c. The first scenario (with a sample of 60,000) provides stronger evidence against the claim that 25% of the population served as volunteers. If 25% of the population had indeed served as volunteers, we would never expect to see a sample result as extreme as this with a sample of size 60,000.

Activity 15-17: Tip Percentages
14-12, 15-17
a. The CLT says the sampling distribution will be approximately N(15, .566).
$P(\bar{X} > 16.4) = P\left(Z > \frac{16.4 - 15}{.566}\right) = P(Z > 2.47) = .0068$ (Table II); .0067 (applet)
b. Yes – this provides strong evidence that the mean tip percentage is actually greater than 15%, because if it were 15% or less, the chance that we would find a random sample of 50 tables with an average tip percentage of at least 16.4% is less than 1%, so it is extremely unlikely.
c. $P(\bar{X} < 14.4) = P\left(Z < \frac{14.4 - 15}{.566}\right) = P(Z < -1.06) = .1446$
This does not provide strong evidence that the population mean tip percentage is less than 15% because if the population mean percentage is 15% (or more), we would expect to see a random sample of 50 tables with an average percentage tip of 14.4% or less almost 15% of the time, which is not rare.

Activity 15-18: Body Temperatures
12-1, 12-19, 15-3, 15-18, 15-19, 19-3, 19-17, 20-11, 22-10, 23-3
The CLT would still apply with a sample of size 40 because the sample size is still large (> 30). The standard deviation of the sampling distribution of the sample mean would increase because of the smaller sample size.
This would decrease the probability that the sample mean body temperature would fall between 98.5 and 98.7 (or between 98.2 and 98.4 if the population mean were 98.3), and the probability that the random sample results in a sample mean body temperature within ±0.1 degrees of the actual population mean μ. This makes sense because the increased standard deviation means the average body temperatures are more spread out, less concentrated around the population mean μ.

Activity 15-19: Body Temperatures
12-1, 12-19, 15-3, 15-18, 15-19, 19-3, 19-17, 20-11, 22-10, 23-3
a. [Figure (bodytempshistogram.pdf): histogram of Body Temperatures (degrees Fahrenheit), with Frequency on the vertical axis]
These body temperatures are fairly normally distributed with a couple of high outliers above 100°F.
b. sample mean = 98.249°F, standard deviation = 0.733°F.
c. The CLT says the sampling distribution would be N(98.6, .061394).
$P(\bar{X} < 98.249) = P\left(Z < \frac{98.249 - 98.6}{.061394}\right) = P(Z < -5.72) = 0.00$
d. Yes – the probability found in part c is low enough to provide compelling evidence that the population mean body temperature is not 98.6 degrees. If it were, we would never (probability zero) find a sample of 130 healthy adults with an average body temperature as low as 98.249°F. Since we did find such a sample, we do not believe the population mean temperature is as high as 98.6°F.

Activity 15-20: IQ Scores
12-9, 14-13, 15-20
a. $P(X > 110) = P\left(Z > \frac{110 - 105}{12}\right) = P(Z > .42) = .3372$ (Table II); .3385 (applet)
b. $P(\bar{X} > 110) = P\left(Z > \frac{110 - 105}{3.795}\right) = P(Z > 1.32) = .0934$ (Table II); .0935 (applet)
c. $P(\bar{X} > 110) = P\left(Z > \frac{110 - 105}{1.897}\right) = P(Z > 2.63) = .0043$ (Table II); .0042 (applet)
d. Yes – the calculation in part c would be valid even if the distribution of IQs in the population were skewed because the sample size is large (greater than 30).
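
The solutions in this topic repeat one recipe: find the CLT standard deviation, convert the cutoff to a z-score, and read off a normal probability. The following is a minimal Python sketch of that recipe, offered as an illustration rather than as the method used to produce the answers above. The function names (`normal_cdf`, `sd_sample_proportion`, `sd_sample_mean`, `prob_above`) are hypothetical helpers introduced here, and the sketch assumes the CLT conditions have already been checked by hand.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sd_sample_proportion(pi: float, n: int) -> float:
    """CLT standard deviation of a sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1.0 - pi) / n)


def sd_sample_mean(sigma: float, n: int) -> float:
    """CLT standard deviation of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def prob_above(value: float, center: float, sd: float) -> float:
    """P(statistic > value) under a normal model N(center, sd)."""
    z = (value - center) / sd  # z is NOT rounded to two decimals here
    return 1.0 - normal_cdf(z)


# Activity 15-1: pi = .209, n = 100, probability a sample proportion exceeds .25
sd_smoke = sd_sample_proportion(0.209, 100)
p_smoke = prob_above(0.25, 0.209, sd_smoke)

# Activity 15-5: mu = 167, sigma = 35, n = 47, total weight above 7500 pounds
sd_boat = sd_sample_mean(35, 47)
p_boat = prob_above(7500 / 47, 167, sd_boat)

print(f"Activity 15-1: sd = {sd_smoke:.5f}, P(p-hat > .25) = {p_smoke:.4f}")
print(f"Activity 15-5: sd = {sd_boat:.3f}, P(x-bar > 7500/47) = {p_boat:.4f}")
```

Because the sketch keeps the z-score unrounded, its results should sit closer to the applet values than to the Table II values, which round z to two decimal places before the lookup.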