"How many statisticians does it take to change a light bulb? 1 3" Author Unknown Chapter 19: Confidence Intervals for Proportions (Pages 432 - 450) Overview: Last chapter we looked at how proportions varied from sample to sample. When we use our sample to estimate the parameters, there will be some difference in what the next Joe Schmo says about the population based off his stats and what we said. Since we expect sampling variability, we do our best by creating confidence intervals. Confidence Intervals are formed by taking our estimate and adding or subtracting a margin of error. The methods we use in this chapter will allow us to make statements such as “We have 95% confidence that our interval contains the true proportion.” Statistical Inference allows us to draw a conclusion about the population from a sample. *Inference is most reliable if the sample was chosen by a random sampling design! 2 Types of Formal Statistical Inference: 1) Confidence Intervals – 2) Test of Significance – **both are based on what would happen if we repeated the sample or experiment many times! Confidence Interval – an interval of values computed from sample data that is likely to include the true population value. A confidence interval has 2 parts: 1) 2) As a user of statistics, you (or I) will often choose the confidence level. Most often we choose a high confidence level (90% or above) because we want to be quite sure of our conclusion C = confidence level in decimal form A level C confidence interval for a parameter is an interval computed from sample data by a method that has a probability, C, of producing an interval containing the true value of the parameter. EXAMPLE: C = .95 means 95% confidence Interpreted as: In the long run, about 95% of all confidence intervals computed in this way will capture the true population parameter of the proportion, and about 5% of them will miss the true population parameter of the proportion. 
When the following conditions are met, we are ready to find the confidence interval for the population proportion, p.

Assumptions:
1. The sample values must be independent of each other.
2. The sample size, n, must be large enough.

Since it is hard to check assumptions, we verify the following conditions:
1. Randomization Condition: The data values must be sampled randomly.
2. 10% Condition: The sample size, n, is less than 10% of the population.
3. Success/Failure Condition: The sample has to be big enough to have at least 10 successes and 10 failures, so $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$.

Confidence Interval for a Population Proportion

$\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

where $\hat{p}$ is the estimate and $z^* \sqrt{\hat{p}\hat{q}/n}$ is the margin of error (MOE).

$\hat{p}$ = sample proportion
$SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
$n$ = sample size

How do we get z* (critical value)? Let’s say we are finding a 90% confidence interval (C = .90). Draw a Normal curve and mark the middle 90% of the area. That leaves 5% in each tail, so find the z-score that cuts off the lower 5% (use the body of the z table or the invNorm(.05) function on the calculator). This gives z = -1.645, and by symmetry z = 1.645 on the other side. z* is the positive z-score, known as the upper critical value. In this example z* = 1.645.

Most common confidence levels and their corresponding z*:

CONFIDENCE LEVEL                  80%     90%     95%     99%
TAIL AREA (to the right of z*)    .10     .05     .025    .005
Z*                                1.282   1.645   1.960   2.576

Steps to Construct a Confidence Interval:
1) Identify the population of interest and the parameter you want to draw conclusions about.
2) Choose the appropriate inference procedure. Verify the conditions/assumptions for using the selected procedure.
3) If the conditions are met, carry out the inference procedure. CI = Sample estimate ± Margin of error
4) Interpret your results in the context of the problem.

Let’s try one…

1. Your local newspaper polls a random sample of 330 voters, finding 144 who say they will vote “yes” on the upcoming school budget. Create a confidence interval for the actual sentiment of all voters.
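A minimal sketch of the z* lookup and the interval formula from this section, using Python's standard library in place of the z table or calculator. The function names `critical_value` and `one_prop_z_interval` are illustrative and not from the text. The poll figures (144 of 330) come from the problem above, and 95% confidence is an assumption here because the problem itself does not name a level.

```python
from statistics import NormalDist


def critical_value(conf_level):
    """Upper critical value z*, leaving (1 - C)/2 in each tail of the standard Normal."""
    tail_area = (1 - conf_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


def one_prop_z_interval(successes, n, conf_level):
    """Return (p_hat, margin_of_error, lower, upper) for a one-proportion z-interval."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    se = (p_hat * q_hat / n) ** 0.5          # SE(p_hat) = sqrt(p_hat * q_hat / n)
    moe = critical_value(conf_level) * se    # MOE = z* x SE(p_hat)
    return p_hat, moe, p_hat - moe, p_hat + moe


# Critical values for the most common confidence levels
for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")

# Newspaper poll: 144 "yes" out of 330, with an assumed 95% confidence level
p_hat, moe, lower, upper = one_prop_z_interval(144, 330, 0.95)
print(f"p_hat = {p_hat:.4f}, MOE = {moe:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

The code assumes the three conditions have already been verified by hand; it does not check them.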
Solution:

Think: We want a confidence interval for the proportion of all voters who will vote “yes” on the upcoming school budget, based on a random sample of 330 voters. In our sample, 144 respondents stated that they would vote “yes”. We want to be reasonably confident in our results, so we will construct a 95% confidence interval.

Plausible Independence Assumption: It is reasonable to think that the responses were independent, provided good surveying techniques were used.
Random Sampling Condition: The voters were sampled randomly.
10% Condition: Provided there are more than 3300 eligible voters, 330 is less than 10% of the population of voters.
Big Enough Sample Assumption: We must have at least 10 successes and 10 failures.
Success/Failure Condition: $n\hat{p} = 144$ and $n\hat{q} = 186$, which are both at least 10, so the sample is large enough.

The conditions are satisfied, so I can use the Normal model to find a one-proportion z-interval.

Show: $\hat{p} = 144/330 \approx 0.4364$ and $\hat{q} = 186/330 \approx 0.5636$.
The margin of error is $MOE = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\dfrac{(0.4364)(0.5636)}{330}} \approx 0.0535$.
The confidence interval is $0.4364 \pm 0.0535$, or (0.3829, 0.4899).

Tell: We are 95% confident that between 38.3% and 49.0% of all voters will vote “yes” on the upcoming school budget.

Choosing the Sample Size
To determine the sample size n that will yield a confidence interval for a population proportion with a specific margin of error, MOE: set the MOE greater than or equal to the expression for the margin of error and solve for n.

$MOE \ge z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, so $n \ge \dfrac{(z^*)^2\,\hat{p}\hat{q}}{MOE^2}$. Always round n up to the next whole number.

2. An experiment finds that 27% of 53 subjects report improvement after using a new medicine. Create a 95% confidence interval for the actual cure rate. Why is this interval so wide? Make it narrower by using 90% confidence. What are the advantages and disadvantages? What sample size would we need in a follow-up study if we want a margin of error of 5% with 98% confidence?

Solution:

Think: We want a confidence interval for the proportion of all people who will improve after using a new medication, based on an experiment involving 53 subjects. In our sample, 27% of the subjects improved.
Plausible Independence Assumption: One patient responding to the medication shouldn’t have an influence on other patients responding to the medication.
Random Sampling Condition: The patients were part of an experiment, which hopefully included random assignment of volunteers to treatment groups.
10% Condition: The 10% condition doesn’t apply. We are testing medication, not patients.
Big Enough Sample Assumption: We must have at least 10 successes and 10 failures.
Success/Failure Condition: $n\hat{p} = 53(0.27) \approx 14$ and $n\hat{q} = 53(0.73) \approx 39$, which are both at least 10, so the sample is large enough.

The conditions are satisfied, so I can use the Normal model to find a one-proportion z-interval.

Show: The margin of error is $MOE = 1.96 \sqrt{\dfrac{(0.27)(0.73)}{53}} \approx 0.120$.
The confidence interval is $0.27 \pm 0.120$, or (0.150, 0.390).

Tell: We are 95% confident that between 15.0% and 39.0% of people will improve after using the new medication.

This interval is quite wide for a couple of reasons. The sample size of 53 people doesn’t provide a great deal of precision in our estimate, so the standard error is still quite large. Also, the more confident we are that we have succeeded in capturing the true proportion, the less precise our interval becomes.

90% Confidence Interval
The margin of error is $MOE = 1.645 \sqrt{\dfrac{(0.27)(0.73)}{53}} \approx 0.100$.
The confidence interval is $0.27 \pm 0.100$, or (0.170, 0.370).
We are 90% confident that between 17.0% and 37.0% of people will improve after using the new medication.

This interval has the advantage of being more precise, but we are less confident in our ability to capture the true proportion within our interval.

Finding the sample size required for ± 5%, with 98% confidence.
Consider the formula for margin of error. We believe the improvement rate to be 0.27 from our preliminary study. The value of z* for 98% confidence is 2.326.

$0.05 \ge 2.326 \sqrt{\dfrac{(0.27)(0.73)}{n}}$, so $n \ge \dfrac{(2.326)^2 (0.27)(0.73)}{(0.05)^2} \approx 426.5$.

A sample size of at least 427 subjects is required.

3. What sample size does it take to estimate the outcome of an election with a margin of error of 3% (with 95% confidence)?

Solution: We’ll do what polling organizations usually do, using 95% confidence and choosing the most cautious proportion, 50%, since we have no estimate of the population proportion.

$0.03 \ge 1.96 \sqrt{\dfrac{(0.5)(0.5)}{n}}$, so $n \ge \dfrac{(1.96)^2 (0.5)(0.5)}{(0.03)^2} \approx 1067.1$.

A sample size of at least 1068 likely voters is required.
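Both sample-size questions come from solving $MOE \ge z^* \sqrt{\hat{p}\hat{q}/n}$ for n. The sketch below is one way to automate that step. The function name `required_sample_size` and its parameter names are illustrative, not from the text, and z* is taken from the Normal model directly rather than from a rounded table value.

```python
import math
from statistics import NormalDist


def required_sample_size(moe, conf_level, p_guess=0.5):
    """Smallest n with z* * sqrt(p*q/n) <= moe, using p_guess as the planning proportion.

    p_guess defaults to 0.5, the most cautious choice when no estimate is available.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_exact = z_star ** 2 * p_guess * (1 - p_guess) / moe ** 2
    return math.ceil(n_exact)  # always round up: rounding down would miss the target MOE


# Follow-up medication study: MOE of 5% at 98% confidence, preliminary rate 0.27
print(required_sample_size(0.05, 0.98, p_guess=0.27))

# Election poll: MOE of 3% at 95% confidence, no prior estimate (p = 0.5)
print(required_sample_size(0.03, 0.95))
```

Because p(1 - p) is largest at p = 0.5, the default gives the largest n required at a given margin of error and confidence level, so a study planned this way is big enough whatever the true proportion turns out to be.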