Sections 2C and 2D - Stats 10

Problem 7.52
In a simple random sample of 1200 Americans age 20 and over, the proportion with diabetes was found to be 0.115 (or 11.5%).

1. What is the standard error for the estimate of the proportion of all Americans age 20 and over with diabetes?

\[ SE = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.115 \times (1 - 0.115)}{1200}} = 0.0092 \]

That is, the standard error is about 0.92 percentage points.

2. Find the margin of error, using a 95% confidence level, for estimating this proportion.

The margin of error for a 95% CI is m = 1.96 × SE = 1.96 × 0.0092 = 0.018, or about 1.8 percentage points.

3. Report the 95% confidence interval for the proportion of all Americans age 20 and over with diabetes.

The lower boundary for the 95% CI is 0.115 − 0.018 = 0.097 and the upper boundary is 0.115 + 0.018 = 0.133. Therefore, we are 95% confident that the true proportion of persons aged 20 or more with diabetes lies between 0.097 and 0.133.

4. According to the Centers for Disease Control, nationally, 10.7% of all Americans age 20 or over have diabetes. Does the confidence interval you found in part 3 support or refute this claim? Explain.

The confidence interval found in part 3 supports this claim, since 10.7% (0.107) falls within the 95% CI.

Problem 7.54
In a 2008 survey, the National Highway Traffic Safety Administration (NHTSA) reported that 83% of people used seat belts. The margin of error is 3 percentage points.

1. Assuming that the confidence level is 95% and the survey was random, find a 95% confidence interval for the percentage of people who used seat belts in 2008.

The statement of the problem provides a sample proportion (0.83) and a margin of error (0.03). The lower boundary for the 95% CI is 0.83 − 0.03 = 0.80 and the upper boundary is 0.83 + 0.03 = 0.86. Therefore, we are 95% confident that the true proportion of persons who wore their seat belts in 2008 is between 0.80 and 0.86.

2. The NHTSA said the percentage of people using seat belts in 2000 was 71%.
If 71% were suggested as the percentage for 2008, would you reject that as implausible? Why or why not? Does this suggest a change in seat belt use between 2000 and 2008? Explain.

Yes, 71% would be rejected as implausible for 2008, since the corresponding proportion of 0.71 lies well outside (in this case below) the 95% CI. This provides evidence that seat belt use increased between 2000 and 2008.

Problem 7.58
In the 1960 presidential election, 34,226,731 people voted for Kennedy; 34,108,157 for Nixon; and 197,029 for third-party candidates (www.uselectionatlas.org).

1. What percentage of voters chose Kennedy?

The proportion of voters who voted for Kennedy was

\[ \frac{34{,}226{,}731}{34{,}226{,}731 + 34{,}108{,}157 + 197{,}029} = \frac{34{,}226{,}731}{68{,}531{,}917} = 0.4994 \]

or about 49.94%.

2. Would it be appropriate to find a confidence interval for the proportion of voters choosing Kennedy? Why or why not?

It does not make sense to find a confidence interval (at any confidence level) for this proportion, because it is the population proportion, not a sample proportion. If instead we had been given a sample proportion, it would make sense to use statistical methods to make an inference about the population proportion.

Problem 7.64
In the 2008 General Social Survey, people were asked whether they thought the sun went around the earth or vice versa. Of 1381 people, 310 thought the sun went around the earth.

1. What proportion of people in the survey believed the sun went around the earth?

The proportion of respondents who believe the sun goes around the earth is

\[ \frac{310}{1381} = 0.224 \]

2. Find a 95% confidence interval for the proportion of all people with this belief.

For the 95% CI we first calculate the standard error, and from it the margin of error, m.
\[ SE = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.224 \times (1 - 0.224)}{1381}} = 0.0112 \]

Then we have m = 1.96 × SE = 1.96 × 0.0112 = 0.022. Finally, the lower boundary for the 95% CI is 0.224 − 0.022 = 0.202 and the upper boundary is 0.224 + 0.022 = 0.246. Therefore, we are 95% confident that the true proportion of people who believe the sun goes around the earth is between 0.202 and 0.246.

3. Suppose a scientist said that 30% of people in the general population believe the sun goes around the earth. Using the confidence interval, would you say that was plausible? Explain your answer.

A claim that 30% of the general public believes the sun goes around the earth is not plausible, since 0.30 lies outside (above) the 95% CI.

Problem 8.37
In a Rasmussen poll of 1000 adults in July 2010, 520 of those polled said that schools should ban sugary snacks and soft drinks.

1. Do a majority of adults (more than 50%) support a ban on sugary snacks and soft drinks? Perform a hypothesis test using a significance level of 0.05.

Step 1: We begin by defining the hypotheses:
H0: p = 0.50 (the proportion supporting a ban is 50%)
H1: p > 0.50 (a majority supports banning sugary snacks and soft drinks)

Step 2: Choose the one-proportion z-test and check that the conditions are satisfied so that the normal distribution provides an appropriate model for the distribution of sample proportions. Observe that np = 1000 × 0.50 = 500 and n(1 − p) = 1000 × 0.50 = 500, so both are greater than 10; sampling is random; and the population is sufficiently large. The conditions are satisfied.

Step 3: To test the alternative hypothesis, we find the z-score and from it the p-value. The SE, computed under the null value p = 0.50, is

\[ SE = \sqrt{\frac{0.50 \times (1 - 0.50)}{1000}} = \sqrt{\frac{0.25}{1000}} = 0.0158 \]

The z-score is

\[ z = \frac{\hat{p} - p}{SE} = \frac{520/1000 - 0.50}{0.0158} = \frac{0.52 - 0.50}{0.0158} = 1.26 \]

Because we have a one-tailed alternative hypothesis, the p-value corresponds to the area to the right of z = 1.26. From the Normal table, the p-value is 1 − 0.8962 = 0.1038.
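The Step 3 arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the method used in the original solution: the function name `one_proportion_z_test` and its parameters are illustrative, and the Normal area comes from `statistics.NormalDist` in the standard library (Python 3.8 or later) rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="greater"):
    """Return (z, p_value) for a one-proportion z-test of H0: p = p0.

    Illustrative helper (not from the source). The standard error is
    computed under the null value p0, as in Step 3.
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    normal = NormalDist()  # standard Normal: mean 0, sd 1
    if alternative == "greater":
        p_value = 1 - normal.cdf(z)            # area to the right of z
    elif alternative == "less":
        p_value = normal.cdf(z)                # area to the left of z
    else:
        p_value = 2 * (1 - normal.cdf(abs(z)))  # both tails
    return z, p_value


# Problem 8.37: 520 of 1000 adults, H0: p = 0.50, H1: p > 0.50
z, p_value = one_proportion_z_test(520, 1000, 0.50)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Because the code does not round z to 1.26 before looking up the area, its p-value (about 0.103) differs slightly from the table-based 0.1038; the comparison with α = 0.05 is unaffected.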
Step 4: At α = 0.05 we fail to reject H0. There is insufficient evidence to conclude that a majority of adults supports banning sugary snacks and soft drinks.

2. Choose the best interpretation of the results you obtained in part 1:
a) The percentage of all adults who favor banning is significantly more than 50%.
b) The percentage of all adults who favor banning is not significantly more than 50%.

Statement (b) is correct: the percentage of all adults who favor banning is not significantly more than 50%.

Problem 8.40
According to one source, 50% of plane crashes are due at least in part to pilot error (http://www.planecrashinfo.com). Suppose that in a random sample of 100 separate airplane accidents, 62 of them were due to pilot error (at least in part).

1. Test the hypothesis that the proportion of airplane accidents due to pilot error is not 0.50. Use a significance level of 0.05.

Step 1: We begin by defining the hypotheses:
H0: p = 0.50 (the proportion of airplane accidents due at least in part to pilot error is 0.50)
H1: p ≠ 0.50 (the proportion of airplane accidents due at least in part to pilot error is not 0.50)

Step 2: We use a one-proportion z-test and check that the conditions are satisfied so that the normal distribution provides an appropriate model for the distribution of sample proportions. Observe that np = 100 × 0.50 = 50 and n(1 − p) = 100 × 0.50 = 50, so both are greater than 10; sampling is random; and the population is sufficiently large (there are likely more than 1000 accidents in total). The conditions are satisfied.

Step 3: To test the alternative hypothesis, we find the z-score and from it the p-value. The SE can be found by

\[ SE = \sqrt{\frac{0.50 \times (1 - 0.50)}{100}} = \sqrt{\frac{0.25}{100}} = 0.05 \]

The z-score is

\[ z = \frac{\hat{p} - p}{SE} = \frac{62/100 - 0.50}{0.05} = \frac{0.62 - 0.50}{0.05} = 2.4 \]

The area corresponding to the right tail defined by z = 2.4 is 1 − 0.9918 = 0.0082, as found in the Normal table.
Because we are using a two-tailed alternative hypothesis, we double this value to get a p-value of 2 × 0.0082 = 0.0164.

Step 4: At α = 0.05 we reject H0 and conclude that the proportion of airplane accidents due at least in part to pilot error is not 0.50.

2. Choose the correct interpretation:
a) The percentage of plane crashes due to pilot error is not significantly different from 50%.
b) The percentage of plane crashes due to pilot error is significantly different from 50%.

Statement (b) is correct: the percentage of plane crashes due to pilot error is significantly different from 50%.

Problem 8.49
A study used nicotine gum to help people quit smoking. The study was placebo-controlled, randomized, and double-blind. Each participant was interviewed after 28 days, and success was defined as being abstinent from cigarettes for 28 days. The results showed that 174 out of 1649 people using the nicotine gum succeeded, and 66 out of 1648 using the placebo succeeded. Although the sample was not random, the assignment to groups was randomized.

1. Find the proportion of people using nicotine gum that stopped smoking and the proportion of people using the placebo that stopped smoking, and compare them. Is this what the researchers had expected?

The proportion of people using nicotine gum who stopped smoking is 174/1649 = 0.1055, while the proportion of people on placebo who stopped smoking is 66/1648 = 0.0400. The proportion who stopped smoking was higher for those using nicotine gum than for those taking the placebo, as the researchers had hoped.

2. Find the observed value of the test statistic, assuming that the conditions for a two-proportion z-test hold.

The value of the z-statistic can be found most easily using technology.
For example, from R, the output is

    Sample     X     N   Sample p
    Gum      174  1649   0.105518
    Placebo   66  1648   0.040049

    Difference = p (Gum) - p (Placebo)
    Estimate for difference: 0.0654700
    95% lower bound for difference: 0.0507060
    Test for difference = 0 (vs > 0): Z = 7.23  P-Value = 0.000

Thus, the z-statistic is z = 7.23.

Problem 8.61
The Gallup organization frequently conducts polls in which they ask the following question: “In general, do you feel that the laws covering the sale of firearms should be made more strict, less strict, or kept as they are now?” In February 1999, 60% of those surveyed said ‘more strict,’ and on April 26, 1999, shortly after the Columbine High School shootings, 66% of those surveyed said ‘more strict.’

1. Assume that both polls used samples of 560 people. Determine the number of people in the sample that said ‘more strict’ in February 1999, before the school shootings, and the number that said ‘more strict’ in late April 1999, after the school shootings.

The number of persons in the February sample who responded with ‘more strict’ was 336, since 560 × 0.60 = 336. The number of persons in the late April sample who responded with ‘more strict’ was 370, since 560 × 0.66 = 369.6, which rounds to 370.

2. Do a test to see whether the proportion that said ‘more strict’ is statistically significantly different in the two different surveys, using a significance level of 0.01.

Step 1: The hypotheses needed to test whether the proportion who said ‘more strict’ is statistically significantly different in the two time periods (before and after the Columbine massacre) are
H0: p_B = p_A (the proportions responding ‘more strict’ before and after are the same)
H1: p_B ≠ p_A (the proportions responding ‘more strict’ before and after are different)
where p_B represents the proportion from the earlier sample (Before) who responded ‘more strict’ and p_A represents the proportion from the later sample (After) who responded ‘more strict’.

Step 2: We use the two-proportion z-test.
This is valid since the participants were chosen at random. Also, since the pooled proportion is

\[ \hat{p} = \frac{336 + 370}{560 + 560} = 0.6304 \]

we have
n_1 × p̂ = n_2 × p̂ = 560 × 0.6304 = 353
n_1 × (1 − p̂) = n_2 × (1 − p̂) = 560 × 0.3696 = 207
and each of these is larger than 10. The conditions required for us to use the Normal distribution to represent the distribution of differences of sample proportions have been met.

Step 3: The value of the z-statistic can be found most easily using technology. For example, from R, the output is

    Sample    X    N   Sample p
    B       336  560   0.600000
    A       370  560   0.660714

    Difference = p (B) - p (A)
    Estimate for difference: -0.0607143
    99% CI for difference: (-0.134873, 0.0134444)
    Test for difference = 0 (vs not = 0): Z = -2.10  P-Value = 0.035

The z-statistic is z = −2.10 and the p-value is 0.035, which is not statistically significant at α = 0.01.

Step 4: At α = 0.01 we fail to reject H0 and conclude that there is insufficient evidence that the proportions of people who responded ‘more strict’ to the question differ before and after the Columbine shootings.

3. Repeat the problem, assuming that the sample sizes were both 1120.

Step 1 from part (2) remains the same. The conditions described in Step 2 are still met with the increased sample sizes. We recalculate the z-statistic and corresponding p-value and form a new conclusion.

Step 3: The value of the z-statistic can be found most easily using technology. For example, from R, the output is

    Sample    X     N   Sample p
    B       672  1120   0.600000
    A       740  1120   0.660714

    Difference = p (B) - p (A)
    Estimate for difference: -0.0607143
    99% CI for difference: (-0.113152, -0.00827618)
    Test for difference = 0 (vs not = 0): Z = -2.98  P-Value = 0.003

The z-statistic is z = −2.98 and the p-value is 0.003, which is significant at α = 0.01.
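The z-statistics and p-values in the two outputs above can also be reproduced by hand. The following is a minimal sketch, assuming the pooled two-proportion z-test described in Step 2; the function name `two_proportion_z_test` is illustrative and does not come from the original solution or from R.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two_sided_p) for H0: p1 = p2 using the pooled proportion.

    Illustrative helper (not from the source).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference under H0 (common proportion)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value: area in both tails beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Problem 8.61: counts saying 'more strict' Before (B) and After (A),
# taken from the outputs in parts (2) and (3)
for n, before, after in [(560, 336, 370), (1120, 672, 740)]:
    z, p = two_proportion_z_test(before, n, after, n)
    print(f"n = {n}: z = {z:.2f}, p-value = {p:.3f}")
```

To two and three decimal places respectively, the z-values and p-values printed by this sketch agree with those reported in the outputs for parts (2) and (3).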
Step 4: At α = 0.01 we reject H0 and conclude that the difference between the proportions of people who responded ‘more strict’ to the question before and after the Columbine shootings is statistically significant.

4. Comment on the effect of different sample sizes on the p-value and on the conclusion.

The larger sample size in part (3) resulted in a z-value that was farther from 0 and a lower p-value than what was observed in part (2). As a result, we were able to conclude in part (3) that the difference was statistically significant at α = 0.01.

Problems from Gould and Ryan, Introductory Statistics
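The sample-size effect described in part 4 of Problem 8.61 can be made explicit with a short derivation. This is a sketch under the assumption, as in parts (2) and (3), that both polls have the same sample size n and that the sample proportions stay fixed as n changes:

```latex
\[
z \;=\; \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n} + \frac{1}{n}\right)}}
  \;=\; \left(\hat{p}_B - \hat{p}_A\right)\sqrt{\frac{n}{2\,\hat{p}(1-\hat{p})}}
\]
```

With the proportions held fixed, z grows in proportion to the square root of n. Doubling n from 560 to 1120 multiplies z by √2 ≈ 1.414, and −2.10 × 1.414 ≈ −2.97, which matches the reported z = −2.98 up to rounding.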