Chapter 8: Inference for Proportions

8.1 Estimating a Proportion with Confidence

Objectives
To always check the three conditions before constructing a confidence interval.
To construct and use confidence intervals for proportions.
To interpret a confidence interval and the meaning of “confidence”.
To identify ways to make the margin of error smaller when constructing a confidence interval.
To find the required sample size for a given margin of error.

Inference is the process of drawing conclusions about a population from the information in a sample. Statistical inference is based on the laws of probability and relies on the data being collected from a random sample or from a randomized experiment. Statistical inference includes both confidence intervals and significance (hypothesis) tests. This section is particularly important because it explains confidence intervals in detail.

Confidence Interval for a Proportion

To estimate an unknown population proportion, a sample is collected and the proportion of “successes” is calculated (this proportion is a point estimate). To know how accurately this sample proportion estimates the true population proportion, you need a probability statement, and for that you need the sampling distribution of the sample proportion. This distribution is used to construct the confidence interval.

A confidence interval is a range of values, constructed from sample data, by a method that captures the population parameter a specified percentage of the time. That percentage is called the level of confidence.

E.g. A researcher wishes to estimate the proportion of x-ray machines that malfunction and produce excess radiation. A random sample of 40 machines is taken and 12 of the machines malfunction. The 95% confidence interval for p, the proportion of machines in the population that malfunction, is therefore 0.16 ≤ p ≤ 0.44.
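The interval quoted for the x-ray machines can be checked numerically. The sketch below assumes the one-proportion z-interval used in this section (sample proportion plus or minus z* times the standard error) with z* = 1.96 for 95% confidence. The function name proportion_ci is only an illustrative helper, not part of the notes.

```python
import math

def proportion_ci(successes, n, z_star):
    """Return (p_hat, margin, lower, upper) for a one-proportion z-interval.

    Hypothetical helper for illustration; z_star is the critical value,
    e.g. 1.96 for 95% confidence.
    """
    p_hat = successes / n                                # point estimate
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated SD of p_hat
    margin = z_star * standard_error                     # margin of error
    return p_hat, margin, p_hat - margin, p_hat + margin

# X-ray example: 12 malfunctioning machines in a random sample of 40
p_hat, margin, lower, upper = proportion_ci(12, 40, 1.96)
print(f"p_hat = {p_hat:.2f}, margin of error = {margin:.3f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Rounded to two decimal places, the endpoints agree with the 0.16 and 0.44 stated in the example.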
[Figure: number line showing the x-ray interval for p, from 0.16 to 0.44]

In general the form of a confidence interval is:

$$\text{CI} = \text{statistic} \pm \text{margin of error} = \text{statistic} \pm (\text{critical value}) \times (\text{standard deviation of statistic})$$

Note: the margin of error is half the width of the confidence interval.

To Construct a Confidence Interval

The following four-step process can be used throughout our study of statistical inference:
1. Parameter: Identify the population of interest and the parameter you want to draw conclusions about.
2. Conditions: Choose the appropriate inference procedure. Verify the conditions for using it.
3. Calculations: If the conditions are met, carry out the inference procedure.
4. Interpretation: Interpret your results in the context of the problem.

Alternatively you may use the PANIC acronym to help you remember the steps in constructing a confidence interval.

A Confidence Interval for a Population Proportion

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where
$n$ = sample size
$\hat{p}$ = proportion of successes in the sample
$z^*$ = 1.645 for a 90% confidence interval
$z^*$ = 1.96 for a 95% confidence interval
$z^*$ = 2.576 for a 99% confidence interval

Other values of $z^*$ can be used (the level of confidence is usually 90% or higher). These other values can be found using your tables or the invNorm(area) command on your calculator.

Assumptions

Assumptions are things that must be true for the inference method to work. It is usually impossible for us to know whether an assumption is true. Instead we check conditions (testable criteria) that support or override an assumption.

1. Independent Trials Assumption: Sometimes we'll simply accept this. If we're flipping a coin or taking foul shots, we can assume the trials are independent. However, if we hope to make inferences about a population proportion based on a random sample drawn without replacement, then this assumption is clearly false.
Fortunately we can still proceed, because the trials are close enough to independent, if the Random Condition (condition 1) and the 10 Percent Condition (condition 2) are met.

2. Normal Distribution Assumption: Strictly speaking this is false, since the model is binomial, but checking the number of successes and failures can confirm that the sample is large enough for the sampling distribution to be approximately Normal.

In summary, these are the conditions that must be checked each time you construct a confidence interval for a proportion:

Bernoulli Trials (experiment)
1. The trials are independent.
2. Both $n\hat{p}$ and $n(1-\hat{p})$ are at least 10 (these calculations must be shown).

Sampling without Replacement
1. The sample is random.
2. The sample is less than 10% of the population.
3. Both $n\hat{p}$ and $n(1-\hat{p})$ are at least 10 (these calculations must be shown).

Example 1: The union representing the Bottle Blowers of America (BBA) is considering a proposal to merge with the Teamsters Union. According to BBA union bylaws, at least three-fourths of the union membership must approve any merger. A random sample of 2,000 current BBA members reveals 1,600 plan to vote for the merger proposal. What is the estimate of the population proportion? Develop a 95 percent confidence interval for the population proportion. Basing your decision on this sample information, can you conclude that the necessary proportion of BBA members favor the merger? Why?

Interpretations

E.g. A 90% confidence interval for the proportion of all buses that have a traffic violation is 0.47 to 0.73.

Interpreting a Confidence Interval: What is the meaning (or interpretation) of the confidence interval of 0.47 to 0.73?
We are 90% confident that if we could check all of the buses in the population, between 47% and 73% of them would have traffic violations.
Note: interpreting a confidence interval should be done every time you calculate a confidence interval.
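As one possible way to work through Example 1 above, the sketch below checks the success/failure condition and computes the 95% interval for the BBA sample. It assumes z* = 1.96 and takes the randomness of the sample from the problem statement. The 10 Percent Condition cannot be verified from the information given (it would require at least 20,000 BBA members), so it is treated here as an assumption. The variable names are illustrative only.

```python
import math

n = 2000          # sample size
successes = 1600  # members planning to vote for the merger
z_star = 1.96     # critical value for 95% confidence
required = 0.75   # bylaws: at least three-fourths must approve

p_hat = successes / n  # point estimate of the population proportion

# Success/failure condition: n*p_hat and n*(1 - p_hat) are simply the
# counts of successes and failures, and both must be at least 10.
failures = n - successes
conditions_ok = successes >= 10 and failures >= 10

# ASSUMPTION: the sample is less than 10% of the membership (not stated).
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.2f}; successes = {successes}, failures = {failures}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
print("Entire interval above 0.75:", lower > required)
```

Under these assumptions the whole interval lies above 0.75, so the sample supports the conclusion that at least three-fourths of BBA members favor the merger. In context: we are 95% confident that between about 78% and 82% of all BBA members plan to vote for the merger.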
Interpreting a Confidence Level: What is the meaning of 90% confidence?
The confidence level, 90%, means that if the sampling process were repeated many times, about 90% of the confidence intervals generated would capture the true population proportion of all buses that have a traffic violation.
This confidence level refers to the method used to construct the interval rather than to any particular interval, such as the one obtained. Never make a statement like: There is a probability of 90% that p is between 0.47 and 0.73. After the sample has been taken and the confidence interval computed, there is no randomness left. Either p is in the confidence interval or it isn’t.
Note: interpreting a confidence level should only be done if you are specifically asked to.

What Affects the Margin of Error (E)?

$$E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

1. The amount of variation in the population being sampled. The more variable the population, the larger the margin of error.
2. The size of the sample. The larger the sample, the smaller the margin of error. Note: the margin of error is inversely proportional to the square root of the sample size. For example, to halve the margin of error the sample size must be quadrupled.
3. The chosen level of confidence. A greater confidence level means $z^*$ is larger, resulting in an increase in the margin of error.

Finding a Sample Size

To estimate the sample size, n, needed for a given margin of error E, use the formula:

$$n = \frac{(z^*)^2 \, p(1-p)}{E^2}$$

This formula is not given, so you need to be able to derive it from the margin of error formula: square both sides of $E = z^* \sqrt{p(1-p)/n}$ to get $E^2 = (z^*)^2 \, p(1-p)/n$, then solve for n. Round the result up to the next whole number. If you don’t have a rough estimate of p, use p = 0.5, which maximizes p(1 − p) and therefore the margin of error, giving the most conservative (largest) sample size.

Example 2: The American Kennel Club wanted to estimate the proportion of children that have a dog as a pet. If the club wanted the estimate to be within 3% of the population proportion, how many children would they need to contact?
Assume a 95% level of confidence and that the club estimated that 30% of the children have a dog as a pet.

Example 3: A study needs to estimate the proportion of cities that have private refuse collectors. The investigator wants the estimate to be within 0.10 of the population proportion, the desired level of confidence is 90%, and no estimate is available for the population proportion. What is the required sample size?
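The sample-size formula from this section can be applied to Examples 2 and 3 as follows. This is a minimal sketch: the function name required_sample_size is hypothetical, the critical values are the z* values listed earlier in the notes (1.96 for 95%, 1.645 for 90%), and p = 0.5 is used as the default when no estimate is available, as the notes recommend.

```python
import math

def required_sample_size(z_star, margin_of_error, p_estimate=0.5):
    """Smallest whole n with z* * sqrt(p(1-p)/n) <= margin_of_error.

    Hypothetical helper; p_estimate defaults to 0.5, the conservative
    choice when no prior estimate of p exists.
    """
    n_exact = (z_star ** 2) * p_estimate * (1 - p_estimate) / margin_of_error ** 2
    return math.ceil(n_exact)  # always round up to a whole observation

# Example 2: within 3%, 95% confidence, prior estimate p = 0.30
n_dogs = required_sample_size(1.96, 0.03, 0.30)

# Example 3: within 0.10, 90% confidence, no prior estimate (p = 0.5)
n_cities = required_sample_size(1.645, 0.10)

print("Example 2 sample size:", n_dogs)
print("Example 3 sample size:", n_cities)
```

The unrounded values are about 896.4 and 67.7, so rounding up gives 897 children and 68 cities. Rounding up rather than to the nearest integer ensures the margin of error does not exceed the target.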