TOPIC 13
Sampling Distributions: Proportions
Rossman/Chance, Workshop Statistics, 3/e Solutions, Unit 3, Topic 13

In-Class Activities

Activity 13-1: Candy Colors 1-13, 2-19, 13-1, 13-2, 15-7, 15-8, 16-4, 16-22, 24-15, 24-16
Answers will vary. Here is one representative set of answers.
a.
   Color    Count   Proportion (Count/25)
   Orange   13      .52
   Yellow   7       .28
   Brown    5       .20
b. This is a statistic. The symbol used to denote the proportion is p̂.
c. This is a parameter. The symbol used to denote the proportion is π.
d. No – we do not know the proportion of orange candies manufactured by Hershey.
e. Yes – we know the proportion of orange candies among the 25 candies that we individually selected.
f. It is very unlikely that every student in the class obtained the same proportion of orange candies in her sample.
g. Answers will vary. Here is one representative set of answers.
   [Figure: dotplot of the class's sample proportions (reeses.pdf); horizontal axis "Sample Proportion of Orange Candies"]
h. observational units = samples of 25 candies; variable = proportion of the sample that is colored orange
i. This dotplot of the sample proportions is symmetric and mound-shaped (roughly normal), with a center of about .6 and a spread from about .4 to about .84. The standard deviation is .09.
j. Based on the sample results from this class, a reasonable guess for π would be .6.
k. Most estimates would be reasonably close to π, but a very few estimates would be way off. We see this from the dotplot: most of the class results are similar (near .6), but a few are quite extreme (far from .6).
l. If each student had taken samples of size 10 instead of size 25, we would expect more variability (greater horizontal spread) in our dotplot.
m. If each student had taken samples of size 75 instead of size 25, we would expect less variability (less horizontal spread) in our dotplot.

Activity 13-2: Candy Colors 1-13, 2-19, 13-1, 13-2, 15-7, 15-8, 16-4, 16-22, 24-15, 24-16
Answers will vary.
The following results are from one particular running of the applet.
a. p̂ = .44
b. p̂ = .54, .507, .47, .56, .432. No – I did not get the same sample proportion each time.
c. [Figure: applet dotplot of the simulated sample proportions (reeseapplet1.pdf)]
d. Yes – the distribution appears roughly normal, centered at about .45, with a standard deviation of about .1.
e. A normal curve seems to model the simulated sample proportions very well.
f. mean of p̂ values = .449; standard deviation of p̂ values = .100
g. Roughly speaking, more sample proportions are close to .45 than are far away from it.
h.
                      Number of 500        Percentage of 500
                      Sample Proportions   Sample Proportions
   Within .10 of .45  354                  71.5%
   Within .20 of .45  473                  95.6%
   Within .30 of .45  491                  99.2%
i. About 95% would capture the actual population proportion.
j. No – you would not have any definite way of knowing whether or not your sample proportion was within .20 of the population proportion. But you could be reasonably confident that it was, because about 95% of the sample proportions would be within .2 of π.
k. mean of p̂ values = .446; standard deviation of p̂ values = .057
l. The shape is still roughly normal and the center is still about .45. The spread, however, has decreased significantly (from .1 to about .057).
m. The applet reports that 460/500 = 92% of the sample proportions are within .1 of .45.
n. This is a much greater percentage (92% versus 71.5%) than it was when our sample size was n = 25.
o. The sample proportion is more likely to be close to the population proportion with a larger sample size.
p. .057 × 2 = .114, so the interval is .45 ± .114 = [.336, .564].
q. The applet reports that 95% of the sample proportions are within .114 of .45.
r. about 95%
s. theoretical mean of p̂ values = .45; theoretical standard deviation of p̂ values = $\sqrt{(.45)(.55)/25} = .0995 \approx .1$
t. theoretical mean of p̂ values = .45; theoretical standard deviation of p̂ values = $\sqrt{(.45)(.55)/75} \approx .057$
u. No – the normal model does not summarize this distribution well. This does not contradict the Central Limit Theorem because nπ = 25(.1) = 2.5 < 10.

Activity 13-3: Kissing Couples 13-3, 13-4, 16-6, 17-12, 24-4, 24-14
a. This is a parameter. π = .5
b. nπ = 124(.5) = 62 > 10 and n(1-π) = 124(.5) = 62 > 10, so the CLT does apply.
   Shape: approximately normal
   Center: π = .5
   Spread: $\sqrt{(.5)(.5)/124} = .0449$
c. Yes – the histogram does appear to be consistent with what the CLT predicts. It is bell-shaped, centered at about .5, and extends from about .5 - 3(.0449) = .3653 to about .5 + 3(.0449) = .6347.
d. p̂ = 80/124 = .645
e. Yes – it would be very surprising to observe such a sample proportion (.645) if ½ of all kissing couples lean their heads to the right. This sample proportion never occurred in 1000 simulations.
f. z = (.645 - .5) / .0449 = 3.23
g. Yes – this is a very surprising z-score. P(Z > 3.23) ≈ .0006. If ½ of all kissing couples lean their heads to the right, we would see a sample result as extreme or more extreme in well under 1% of random samples.

Activity 13-4: Kissing Couples 13-3, 13-4, 24-4, 24-14
a. Recall that the observed sample proportion of kissing couples who lean their heads to the right is p̂ = 80/124 = .645. This value is not at all uncommon in the first histogram.
b. The CLT says that the sample proportion in this case would vary approximately normally with mean equal to .667 and standard deviation equal to $\sqrt{(.667)(.333)/124} = .042$. The z-score for the observed sample proportion of .645 is, therefore, $z = (.645 - .667)/.042 = -0.52$, so the observed sample proportion .645 lies only about half of a standard deviation below the population proportion when π = .667.
c.
The observed sample proportion is barely one-half of a standard deviation away from what you would expect if the population proportion were equal to 2/3, which is not a surprising result at all. Therefore, the sample data provide no reason to doubt that the population proportion of kissing couples who lean their heads to the right equals 2/3.
d. The value .645 is pretty far along the lower tail of the second histogram. This indicates that the observed sample proportion would rarely occur if the population proportion were equal to 3/4. Further evidence of this result is provided by the rather large negative z-score:
   $z = \dfrac{.645 - .750}{\sqrt{(.750)(.250)/124}} = \dfrac{.645 - .750}{.039} = -2.69$
   Therefore, the sample data provide fairly strong evidence that the population proportion of kissing couples who lean their heads to the right is not 3/4 (because it would be rather surprising to find a sample proportion so far from this population proportion by chance alone).
e. A reasonable estimate of the population proportion is the sample proportion .645. An estimate of the standard deviation of p̂ would then be $\sqrt{(.645)(.355)/124} = .043$. Doubling this standard deviation gives .086. The interval is, therefore, .645 ± .086, which runs from .559 to .731. Notice that 1/2 and 3/4 are not in this interval, but 2/3 is. The interval is consistent with the earlier analysis of the plausibility of the values 1/2, 2/3, and 3/4 for the population proportion of kissing couples who lean their heads to the right.

Homework Activities

Activity 13-5: Parameters vs. Statistics
a. statistic p̂
b. parameter π
c. statistic x̄
d. parameter μ
e. parameter σ
f. statistic s
g. parameter π (population is all voters)
h. statistic p̂
i. statistic p̂
j. parameter μ
k. statistic p̂
l. statistic x̄ (population is all American households)
m. parameter μ
n. statistic p̂
o. statistic x̄

Activity 13-6: Generation M 3-8, 4-14, 13-6, 16-1, 16-3, 16-7, 18-1, 21-11, 21-12
a. parameter π
b. statistic p̂
c. statistic p̂
d. statistic x̄
e. parameter μ

Activity 13-7: Presidential Approval 13-7, 13-8
a.
   π     standard deviation
   0     0
   .2    .01265
   .4    .01549
   .5    .01581
   .6    .01549
   .8    .01265
   1     0
b. π = .5 produces the most variability.
c. π = 0 or π = 1 produces the least variability.
d. If none (or all) of a population has a particular characteristic, then none (or all) of a sample must have this characteristic as well, leaving no variability in the sample proportion. Similarly, if the population proportion is close to zero or one, there is not much "room" for the sample proportion to vary away from the population value. But if exactly half of a population has the characteristic, this should produce the most varied sample proportions.
e. Using a different sample size (500 rather than 1000) would not change the conclusions in parts b and c. The amount of variability in part a would change, but not the fact that the variability is largest at π = .5, since the sample size is a constant in the denominator of the standard deviation for each of these values.

Activity 13-8: Presidential Approval 13-7, 13-8
a.
   n      standard deviation
   100    0.0489898
   200    0.034641
   400    0.0244949
   800    0.0173205
   1600   0.0122474
b. As the sample size increases, the standard deviation decreases.
c. The sample size must increase by a factor of 4 in order to cut the standard deviation in half.
d. No – the answer to part c would not change if we used a proportion other than .4 to calculate the standard deviations. (This time the numerator, π(1-π), is a constant in the calculations.)

Activity 13-9: Pet Ownership 13-9, 13-14, 13-15, 18-2, 20-21
a. No – because of sampling variability, you cannot be certain that the sample proportion of cat households in your sample will be closer to π than your competitor's, but it is much more likely.
b.
Yes – you have a better chance than your competitor of obtaining a sample proportion of cat households that falls within ±.05 of π because you are using a larger sample size.
c.
   n     standard deviation
   50    0.061
   200   0.031
   The sample size 200 produces the smaller standard deviation. It is half the standard deviation obtained with a sample size of 50.
d. The applet reports that 431/500 = 86.2% of the sample proportions are within .05 of .25.
e. The applet reports that 250/500 = 50% of the sample proportions are within .05 of .25.
f. Both distributions are, as expected, approximately normal and centered at .25, but the distribution with samples of size 200 has a much smaller spread than the distribution using samples of size 50. With samples of size 200, the distribution extends from a minimum of only about .18 to a maximum of about .33, and more than 85% of the sample proportions fall within .05 of the mean (.25). In contrast, when the sample size is 50, the sampling distribution extends from 0 to above .4, and only about 50% of sample proportions are within .05 of the mean.

Activity 13-10: Calling Heads or Tails 13-10, 17-14, 17-15, 24-19
Answers will vary. Here is one representative set of answers.
a. p̂ = 16/20 = .8 of the students said they would call heads. This is a statistic.
b. [Figure: simulated sampling distribution with π = .5 (heads.pdf)]
   This distribution is approximately normal, centered at about .5, with standard deviation = .117.
c. Based on this simulation, it would be extremely surprising to obtain our class result if, in fact, 50% of the population of students call heads. A value of .8 is in the far right tail of this distribution. Values as extreme as .8 happened in only about .2% of samples in the simulation.
d. [Figure: simulated sampling distribution centered at about .7 (heads2.pdf)]
   This distribution is also approximately normal, but it is centered at about .7, with standard deviation = .104.
This time our class result is not uncommon, falling very close to the center of the distribution (within one standard deviation). According to the applet, a result of at least 16/20 students calling heads happened 117/1000 times, or 11.7% of the time.

Activity 13-11: Racquet Spinning 11-9, 13-11, 17-3, 17-18, 18-3, 18-12, 18-13
a. .5 is a parameter because it describes the long-run result of the spinning process.
b. .46 is a statistic because it is the result of a sample.
c. Answers will vary. The answers given here are from one particular running of the applet.
d. [Figure: applet histogram of simulated sample proportions (racquetapplet.pdf)]
   This distribution is approximately normal, with mean .502 and standard deviation .049.
e. The Central Limit Theorem predicts this distribution will be approximately normal with mean .5 and standard deviation .05. The sampling distribution displayed by the applet simulation is very close to this.
f. The applet reports that 190/1000 = 19% of the samples had a sample proportion of at least .54 and 183/1000 = 18.3% of the samples had a sample proportion of .46 or less. Together this is 37.3% of the samples.
g. This answer suggests that .46 is not very unlikely to occur by chance alone if the results are 50/50 in the long run. Such an outcome will happen about 37% of the time by chance alone, so it is certainly not rare.

Activity 13-12: Halloween Practices
a. .69 is a statistic because it is a number that summarizes a sample.
b. No – this finding does not prove that π = .69. If we were to take another random sample of 1005 adults, we would most likely find a different (although similar) proportion of adults who planned to give out Halloween treats.
c. If π = .7, then the standard deviation = .0144, so .7 - 2×.0144 = .671. Since .69 is above .671, yes, .69 would fall within 2 standard deviations of .7 in the sampling distribution.
d. If π = .6, then the standard deviation = .0155, so .6 + 2×.0155 = .631.
So no, .69 would not fall within 2 standard deviations of .6 in the sampling distribution.
e. Using a common standard deviation of s = .015:
   π      π - 2s   π + 2s
   0.61   0.58     0.64
   0.62   0.59     0.65
   0.63   0.60     0.66
   0.64   0.61     0.67
   0.65   0.62     0.68
   0.66   0.63     0.69
   0.67   0.64     0.70
   0.68   0.65     0.71
   0.69   0.66     0.72
   0.70   0.67     0.73
   0.71   0.68     0.74
   0.72   0.69     0.75
   0.73   0.70     0.76
   So the plausible values of π are .66 through .72 (the values for which .69 lies between π - 2s and π + 2s).
f. Based on our work in part e, the plausible values for the percentage of the population who planned to give out Halloween treats from the doors of their homes in 1999 are between 66% and 72%, inclusive.

Activity 13-13: Distinguishing Between Colas 13-13, 17-24, 18-9
a. π = 1/3
b. Roll the die 30 times to represent the 30 trials. If you roll a 5 or 6, consider this a success (i.e., you successfully identified the odd cola). Otherwise (if you roll a 1, 2, 3, or 4), you failed to identify the odd cola.
c. Below are example results from the applet:
   [Figure: applet histogram of simulated sample proportions (colaapplet.pdf)]
d. Yes – the shape of this sampling distribution is approximately normal.
e. Empirical sampling distribution: mean = .336, standard deviation = .086. CLT predicts: mean = .333, standard deviation = .086. The simulated sampling distribution and CLT values are very close.
f. The applet reports that in 169/1000 = 16.9% of the samples, the subject guessed correctly 40% or more of the time.
g. If a subject was correct 40% of the time in this experiment, I would not be convinced he/she was doing better than guessing would allow. A subject who was just guessing would get 40% or more correct about 17% of the time, so this is not a surprising outcome for a guesser.
h. If a subject was correct 60% of the time in this experiment, I would be convinced he/she was doing better than guessing would allow, since in this simulation the subjects never got 60% or more correct by just guessing.
i. Applet.
j.
The rough shapes of the histograms should look like:
   [Figure: two overlaid histograms (colaoverlap.pdf); horizontal axis "Cola successes" from 0.0 to 0.9]
   Both distributions are approximately normal and have the same spread. However, they have different centers. There is little overlap in the distributions.
k. The applet reports that in 746/1000 = 74.6% of the samples, the subject guessed correctly 60% or more of the time.

Activity 13-14: Pet Ownership 13-9, 13-14, 13-15, 18-2, 20-21
a. π = ⅓ is a parameter because it describes all American households.
b. [Figure: sketch of the sampling distribution (catowners.pdf); horizontal axis "Proportion of Cat Owners" from 0.328 to 0.338]
   The CLT says this sampling distribution will be approximately normal, centered at ⅓, with a standard deviation of .001667.
c. By the empirical rule, 95% of all sample proportions should fall between .3297 and .3363 (within 2 standard deviations of the mean).
d. This interval is so narrow because the sample size (80,000) is extremely large.
e. p̂ = .316 is a statistic because it is a number obtained from a sample.
f. z = (.316 - .333) / .001667 = -10.2
g. This is an extremely unusual z-score. P(Z < -10.2) ≈ 0, so the sample data do provide evidence that the population proportion who own a pet cat is not one-third (we observed a sample result that pretty much never happens when π = 1/3, so we are convinced π ≠ 1/3).

Activity 13-15: Pet Ownership 13-9, 13-14, 13-15, 18-2, 20-21
a. [Figure: sketch of the sampling distribution (birdowners.pdf); horizontal axis "Proportion of Bird Owners" from 0.047 to 0.053]
   The CLT says this sampling distribution will be approximately normal, centered at .05, with a standard deviation of .000771.
b. This standard deviation is much smaller because the assumed π is smaller (.05 rather than .333) and therefore farther from .5 (see Activity 13-7).
c. z = (.046 - .05) / .000771 = -5.19. This is a very unusual z-score.
P(Z < -5.19) ≈ 0, so the survey does provide evidence that the population proportion who own a pet bird is not 5% (we observed a sample result that pretty much never happens when π = .05, so we are convinced π ≠ .05).

Activity 13-16: Volunteerism 13-16, 15-16, 21-17
a. 28.2% is a statistic. p̂ = .282
b. [Figure: sketch of the sampling distribution (volunteerism.pdf); horizontal axis "Proportion of Volunteers" from 0.2450 to 0.2550]
   The CLT says this sampling distribution will be approximately normal, centered at .25, with a standard deviation of .001768.
c. z = (.282 - .25) / .001768 = 18.1
d. Yes – this z-score is extreme enough to cast doubt on the assertion that 25% of the population participated in a volunteer activity. P(Z > 18.1) ≈ 0, so if π really is .25, we would never expect to see such a sample result. Yet we did see this sample result, so we have very strong evidence that π is not .25.
e. [Figure: sketch of the sampling distribution (volunteerism2.pdf); horizontal axis "Proportion of Volunteers" from 0.20 to 0.32]
   The CLT says this sampling distribution will be approximately normal, centered at .25, with a standard deviation of .0194.
   z = (.282 - .25) / .0194 = 1.65
   This is not a particularly extreme z-score. P(Z > 1.65) = .0495, which means we have some evidence that would make us doubt that π really is .25, but the evidence is not overwhelming.
f. It makes sense that our answers differ so much because the sample sizes are so different. A small difference between the sample and population proportions would be very surprising with a sample of size 60,000 (the size implied by the standard deviation of .001768 in part b), but would not be as surprising with only 500 people.

Activity 13-17: Pursuit of Happiness 2-16, 3-25, 13-17, 25-1, 25-2, 25-4
a. z = (.84 - .8) / .007286 = 5.49
b. Yes – this z-score is extreme enough to cast doubt on the assertion that 80% of the population felt happy. P(Z > 5.49) ≈ 0, so if π really is .80, we would never expect to see such a sample result.
Yet we did see this sample result, so we have strong evidence that π is not .80.
c.
   π = .82: $z = \dfrac{.84 - .82}{\sqrt{(.82)(.18)/3014}} = 2.858$
   π = .83: $z = \dfrac{.84 - .83}{\sqrt{(.83)(.17)/3014}} = 1.462$
   π = .84: $z = \dfrac{.84 - .84}{\sqrt{(.84)(.16)/3014}} = 0.00$
   π = .85: $z = \dfrac{.84 - .85}{\sqrt{(.85)(.15)/3014}} = -1.538$
   π = .86: $z = \dfrac{.84 - .86}{\sqrt{(.86)(.14)/3014}} = -3.164$
   π = .87: $z = \dfrac{.84 - .87}{\sqrt{(.87)(.13)/3014}} = -4.897$
   π = .88: $z = \dfrac{.84 - .88}{\sqrt{(.88)(.12)/3014}} = -6.758$
   Plausible values of the population proportion include .83 through .85, since the observed sample proportion lies within 2 standard deviations of each of these values.

Activity 13-18: Cursive Writing 13-18, 16-8
a. standard deviation = $\sqrt{(.15)(.85)/n}$
b. We need $2\sqrt{(.15)(.85)/n} = .05$, thus $n = \dfrac{(.15)(.85)}{(.025)^2} = 204$.
c. We need $2\sqrt{(.15)(.85)/n} = .02$, thus $n = \dfrac{(.15)(.85)}{(.01)^2} = 1275$. We need a much larger sample size to be as "confident" that our sample proportion will fall within this smaller range of the population proportion.
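
Three calculations recur throughout these solutions: the CLT standard deviation $\sqrt{\pi(1-\pi)/n}$, the z-score of an observed sample proportion, and the sample size needed so that two standard deviations equal a target margin (Activity 13-18). The Python sketch below is an illustration for checking that arithmetic and is not part of the original solutions. The function names are hypothetical and were chosen only for this example.

```python
import math


def sd_of_sample_proportion(pi: float, n: int) -> float:
    """Standard deviation of the sample proportion predicted by the CLT."""
    return math.sqrt(pi * (1 - pi) / n)


def z_score(p_hat: float, pi: float, n: int) -> float:
    """Standard deviations between the observed p-hat and a hypothesized pi."""
    return (p_hat - pi) / sd_of_sample_proportion(pi, n)


def required_sample_size(pi: float, margin: float) -> int:
    """Smallest n for which 2 standard deviations is at most `margin`."""
    # Solve 2 * sqrt(pi * (1 - pi) / n) = margin for n, as in Activity 13-18.
    exact = pi * (1 - pi) / (margin / 2) ** 2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(exact, 9))


if __name__ == "__main__":
    # Activity 13-2, part s: pi = .45, n = 25 (the solutions report about .0995).
    print(round(sd_of_sample_proportion(0.45, 25), 4))
    # Activity 13-3, part f: p-hat = .645, pi = .5, n = 124 (reported as 3.23).
    print(round(z_score(0.645, 0.5, 124), 2))
    # Activity 13-18, parts b and c: margins of .05 and .02 with pi = .15.
    print(required_sample_size(0.15, 0.05))
    print(required_sample_size(0.15, 0.02))
```

The rounding step in `required_sample_size` is a design choice for this sketch: without it, a tiny floating-point excess could push the ceiling one unit above the hand-computed answer.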