Confidence Intervals for Proportions

You've already seen the formulas
\[ \bar{x} - z_c \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_c \frac{\sigma}{\sqrt{n}} \]
and
\[ \bar{x} - t_c \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_c \frac{s}{\sqrt{n}}. \]

Confidence intervals for proportions are relevant for polls. For example, prior to the last presidential election you may have seen it reported that the level of support among voters for Candidate G.W. Bush was 47%, with a note that the poll was accurate to within ±3 percent 19 times out of 20. In the language of confidence intervals, this says that the pollsters were 95% confident that the true proportion of voters who supported Candidate Bush was between .44 and .50 (where did these numbers come from?).

Here's what is needed for confidence intervals for proportions using large samples.

• Sample size $n$ with $r$ successes. The sample needs to have more than 5 successes and more than 5 failures. In the notation from the next line, this means $n\hat{p} > 5$ and $n\hat{q} > 5$.

• The sample proportion is $\hat{p} = \frac{r}{n}$, and then $\hat{q} = 1 - \hat{p}$. The confidence interval at confidence level $c$ is then
\[ \hat{p} - E < p < \hat{p} + E \quad \text{where} \quad E = z_c \sqrt{\frac{pq}{n}} \approx z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}. \]

Example. (Section 8.3 #2) In a sample of 519 judges it was found that 285 were introverts.
(a) Let $p$ represent the proportion of all judges who are introverts. Find a point estimate for $p$.
(b) Find a 99% confidence interval for $p$, and explain what it means.
(c) Do you think $np > 5$ and $nq > 5$ for this problem? Why, and why is it important?

Answer.
(a) $\hat{p} = \frac{285}{519} \approx .5491$.
(b) $z_c = 2.58$, $n = 519$, $\hat{p} = .5491$, $\hat{q} = .4509$, and so
\[ E = 2.58 \sqrt{\frac{(.5491)(.4509)}{519}} \approx .05635, \]
and so $.4928 < p < .6055$ with 99% confidence. In other words, we are 99% confident that the proportion of all judges who are introverts is between about 49.3% and 60.5%.
(c) Yes, because there are far more than 5 successes (285) and far more than 5 failures (234) in the sample. This is important because the confidence interval in (b) relies on the distribution of $\hat{p}$ being approximately normal.

Example.
(Section 8.3 #17) In a survey of 1000 large corporations, 250 said that, given a choice between a job candidate who smokes and an equally qualified nonsmoker, the nonsmoker would get the job (USA Today).
(a) Let $p$ represent the proportion of all corporations preferring a nonsmoking candidate. Find a point estimate for $p$.
(b) Find a 95% confidence interval for $p$.
(c) How would a news writer report these results? What was the margin of error for the 95% confidence interval?

Answer.
(a) $\hat{p} = \frac{250}{1000} = .250$.
(b) $z_c = 1.96$, $n = 1000$, $\hat{p} = .250$, and $\hat{q} = .750$. Therefore we compute $E$, and then $\hat{p} \pm E$ will be the endpoints of the interval:
\[ E = 1.96 \sqrt{\frac{(.25)(.75)}{1000}} \approx .02684, \]
and so $.2232 < p < .2768$ with 95% confidence.
(c) A survey of 1000 large corporations has shown that 25% would intentionally hire a nonsmoker over an equally qualified smoker. The poll is accurate to within ±2.7 percent 19 times out of 20.

Choosing a sample size

For a proportion. Determine how accurate you want your proportion to be. For example, a poll accurate to within ±3% with 95% confidence means $E = .03$ and $z_c = 1.96$. Then solve for $n$:
\[ E = z_c \sqrt{\frac{pq}{n}} \quad \text{implies} \quad n = pq \left( \frac{z_c}{E} \right)^2. \]
If you do not have a good guess to start with for $p$, the following calculation will provide a sufficiently large sample size no matter what $p$ is (because $pq = p(1 - p)$ is never larger than $\frac{1}{4}$):
\[ n = \frac{1}{4} \left( \frac{z_c}{E} \right)^2. \]

For a mean. You need to know $\sigma$, or have a good guess using $s$ from a sample of size $n \ge 30$. Determine your desired margin of error $E$ and confidence level $c$. Then solve
\[ E = z_c \frac{\sigma}{\sqrt{n}} \quad \text{for } n, \text{ that is,} \quad n = \left( \frac{z_c \sigma}{E} \right)^2. \]

Example.
(a) (8.4 #10) A sample of 37 adult male desert bighorn sheep indicated the standard deviation of the sheep weights to be 15.8 lb (Source: The Bighorn of Death Valley...). How many adult male sheep should be included in a sample to be 90% sure that the sample mean weight $\bar{x}$ is within 2.5 lb of the population mean weight $\mu$ for all such bighorn in the region?
(b) Repeat the question in (a), but suppose you found an error in the original calculation and the standard deviation was actually 25.8 lb.
(c) Repeat the question in (a), but suppose you double-checked and the standard deviation of 15.8 lb was correct, but you decided you need a 99% confidence interval.

Answer.
(a) For this question $z_c = 1.645$, $\sigma = 15.8$, and the desired $E = 2.5$. Therefore,
\[ n = \left( \frac{z_c \sigma}{E} \right)^2 = \left( \frac{(1.645)(15.8)}{2.5} \right)^2 \approx 108.09, \]
thus choose a sample size of $n = 109$.
(b) Same as (a), except $\sigma = 25.8$, and so $n \approx 288.2$, and we would use a sample size of $n = 289$.
(c) Same as (a), except $z_c = 2.58$, and so
\[ n = \left( \frac{(2.58)(15.8)}{2.5} \right)^2 \approx 265.9, \]
and we would use a sample size of $n = 266$.

Example. Suppose President Bush decides to conduct a survey concerning the support for sending additional troops to Iraq.
(a) What sample size would he need to estimate the percentage within ±3% with 95% confidence? (Assume he doesn't have an initial estimate for the true proportion.)
(b) Same question as (a), but within ±1% with 95% confidence.
(c) Same question as (a), but within ±3% with 99% confidence.
(d) Repeat questions (a) through (c), but assume to start with that a news poll estimated that 27% of Americans support sending additional troops, and that the president used this as a good starting estimate for $p$.
(e) Approximately what sample size do you think the Gallup Organization uses to create polls that are accurate to ±3 percent 19 times out of 20?

Answer.
(a) For this part, $z_c = 1.96$, the desired $E = .03$, and there is no initial estimate for $p$, so
\[ n = \frac{1}{4} \left( \frac{z_c}{E} \right)^2 = \frac{1}{4} \left( \frac{1.96}{.03} \right)^2 \approx 1067.1. \]
This means he should use a sample size of 1068.
(b) In this case $E = .01$, and so
\[ n = \frac{1}{4} \left( \frac{z_c}{E} \right)^2 = \frac{1}{4} \left( \frac{1.96}{.01} \right)^2 = 9604. \]
Notice the huge increase in sample size, from 1068 in (a) to 9604, needed to get within 1 percent.
(c) Same as (a), except $z_c = 2.58$, and so $n = 1849$ using the same formula.
(d) Now use the formula $n = pq \left( \frac{z_c}{E} \right)^2$, where we use .27 as an estimate for $p$ and .73 as an estimate for $q$.
To get an accuracy of .03 with 95 percent confidence,
\[ n = pq \left( \frac{z_c}{E} \right)^2 \approx (.27)(.73) \left( \frac{1.96}{.03} \right)^2 \approx 841.3, \]
and so a sample size of $n = 842$ would be sufficient. Use the same formula for the other parts, but with $E = .01$ in the next calculation and $z_c = 2.58$ in the final calculation.
(e) They will never need a random sample larger than 1068 (see (a)). If there is a good estimate for $p$, they may be able to use a smaller sample (see (d)). If you look at the fine print of some polls, you will see that a sample size of approximately 1000 is often used.
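
The calculations in these notes are easy to check by computer. The following is a minimal Python sketch of the three formulas used above (the large-sample interval for a proportion, and the sample sizes for a proportion and for a mean). The function names and the small rounding guard are illustrative choices made for this sketch; they do not come from the textbook.

```python
import math


def _round_up(x):
    """Round up to the next whole number.

    Rounding to 8 decimals first is an assumption of this sketch: it keeps
    floating-point noise (e.g. 9604.000000000002) from adding an extra unit.
    """
    return math.ceil(round(x, 8))


def proportion_ci(r, n, z_c):
    """Large-sample interval p_hat - E < p < p_hat + E for r successes in n trials."""
    if r <= 5 or n - r <= 5:
        raise ValueError("need more than 5 successes and more than 5 failures")
    p_hat = r / n
    q_hat = 1 - p_hat
    E = z_c * math.sqrt(p_hat * q_hat / n)  # margin of error
    return p_hat - E, p_hat + E


def sample_size_proportion(E, z_c, p=None):
    """n = p*q*(z_c/E)^2; with no estimate for p, use the worst case p*q = 1/4."""
    pq = 0.25 if p is None else p * (1 - p)
    return _round_up(pq * (z_c / E) ** 2)


def sample_size_mean(sigma, E, z_c):
    """n = (z_c * sigma / E)^2, rounded up to a whole number."""
    return _round_up((z_c * sigma / E) ** 2)


if __name__ == "__main__":
    low, high = proportion_ci(285, 519, 2.58)        # judges example
    print(f"{low:.4f} < p < {high:.4f}")
    print(sample_size_mean(15.8, 2.5, 1.645))        # bighorn sheep, part (a)
    print(sample_size_proportion(0.03, 1.96))        # poll, no estimate for p
    print(sample_size_proportion(0.03, 1.96, 0.27))  # poll, p estimated as .27
```

Run as a script, this should reproduce the sample sizes found above (109, 1068, and 842), and the interval for the judges example should agree with the worked answer up to rounding in the last digit. The critical values $z_c$ are passed in by hand, exactly as they would be read from a table.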