Chapter 9
Estimating a Population Proportion
Created by Kathy Fritz

Selecting an Estimator

What makes a statistic a good estimator of a population characteristic?

1. Choose a statistic that is unbiased.

A statistic with a sampling distribution centered at the actual value of a population characteristic is an unbiased estimator of that characteristic. In other words, a statistic that does not consistently tend to underestimate or overestimate the value of a population characteristic is an unbiased estimator of that characteristic.

[Figure: three sampling distributions. Two are labeled "Unbiased, since the distribution is centered at the true value." One is labeled "Biased, since the distribution is NOT centered at the true value."]

2. Choose a statistic with a small standard error.

The standard deviation of a sampling distribution is called the standard error. If a sampling distribution is centered very close to the actual value of the population characteristic, a small standard error ensures that values of the statistic will cluster tightly around the actual value of the population characteristic.

[Figure: two sampling distributions. One is labeled "Unbiased, but has a smaller standard error so it is more precise." The other is labeled "Unbiased, but has a larger standard error so it is not as precise."]

A statistic that is unbiased and has a small standard error is likely to result in an estimate that is close to the actual value of that population characteristic.

In a review of ALL criminal cases heard by the Supreme Courts of 11 states from 2000 to 2004, 391 of the 1488 cases were decided in favor of the defendant. Let p be the proportion of all cases reviewed that were decided in favor of the defendant.

$p = \frac{391}{1488} = 0.263$

Suppose that the proportion p = 0.263 was not known. To estimate this proportion, you plan to select a sample and compute $\hat{p}$, the sample proportion of cases that were decided in favor of the defendant.
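The two criteria above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original slides: it assumes each sampled case is decided in favor of the defendant independently with probability p = 0.263, and the function name `simulate_p_hat`, the seed, and the number of repetitions are illustrative choices. The mean of the simulated sample proportions should land close to p (unbiasedness), and their standard deviation (the standard error) should shrink as the sample size grows.

```python
import random
import statistics


def simulate_p_hat(p, n, reps, rng):
    """Draw `reps` samples of size n and return the sample proportion of each."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


rng = random.Random(9)  # fixed seed so the sketch is repeatable (arbitrary choice)
p = 0.263               # population proportion from the Supreme Court example

results = {}
for n in (25, 100):
    p_hats = simulate_p_hat(p, n, reps=5000, rng=rng)
    center = statistics.mean(p_hats)   # close to p if the estimator is unbiased
    spread = statistics.stdev(p_hats)  # simulated standard error
    results[n] = (center, spread)
    print(f"n={n}: mean of p-hat = {center:.3f}, standard deviation = {spread:.3f}")
```

The larger sample size should give a visibly smaller spread while the center stays near 0.263.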
If n = 25, then the standard error of $\hat{p}$ is

standard error of $\hat{p}$ = $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{25}} = 0.088$

Supreme Court Cases Continued . . .

How does the sample size affect the standard error of $\hat{p}$?

If n = 100, then the standard error of $\hat{p}$ is

standard error of $\hat{p}$ = $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{100}} = 0.044$

Quadrupling the sample size from 25 to 100 cuts the standard error in half.

Suppose that p = 0.40. How does this affect the standard error of $\hat{p}$?

If n = 25 and p = 0.263, then the standard error of $\hat{p}$ is

standard error of $\hat{p}$ = $\sqrt{\frac{0.263(1-0.263)}{25}} = 0.088$

If n = 25 and p = 0.40, then the standard error of $\hat{p}$ is

standard error of $\hat{p}$ = $\sqrt{\frac{0.40(1-0.40)}{25}} = 0.098$

Suppose that p = 0.04. How does this affect the standard error of $\hat{p}$?

If n = 25 and p = 0.04, then the standard error of $\hat{p}$ is

standard error of $\hat{p}$ = $\sqrt{\frac{0.04(1-0.04)}{25}} = 0.039$

Does it surprise you that $\hat{p}$ tends to produce more precise estimates the farther the population proportion is from 0.5?

For a fixed sample size, the standard error of $\hat{p}$ is greatest when p = 0.5.

Estimating a Population Proportion: Margin of Error

The value of the sample proportion $\hat{p}$ provides an estimate of the population proportion p.

Let p = 0.484. If $\hat{p}$ = 0.426, then the estimate is "off" by 0.058. This difference represents the error in the estimate. A different sample might produce an estimate of $\hat{p}$ = 0.498, resulting in an estimation error of 0.014.

Notice that different samples will produce different $\hat{p}$ estimates that will have different estimation errors.

The margin of error of a statistic is the maximum likely estimation error. It is unusual for an estimate to differ from the actual value of the population characteristic by more than the margin of error.

Recall the General Properties for Sampling Distributions of $\hat{p}$

1. The mean of the $\hat{p}$ sampling distribution is p: $\mu_{\hat{p}} = p$.
2. The standard error (deviation) of the $\hat{p}$ sampling distribution is $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.
3. If n is large, the $\hat{p}$ sampling distribution is approximately normal.

When these properties hold, we can use what we know about normal distributions to tell us about how $\hat{p}$ behaves as an estimator of p.

If a variable has a standard normal distribution, about 95% of the time the value of the variable will be between -1.96 and 1.96.

[Figure: standard normal curve with central area = 0.95 between -1.96 and 1.96, lower tail area = 0.025, and upper tail area = 0.025.]

If n is large, the $\hat{p}$ sampling distribution is approximately normal with mean p and standard error $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.

For any normal distribution, about 95% of the observed values will be within 1.96 standard deviations of the mean. So about 95% of the possible $\hat{p}$ values will fall within $1.96\sqrt{\frac{p(1-p)}{n}}$ of the population proportion p. This is the margin of error for estimating a population proportion.

[Figure: normal curve centered at p with central area = 0.95 between $p - 1.96\sigma_{\hat{p}}$ and $p + 1.96\sigma_{\hat{p}}$, lower tail area = 0.025, and upper tail area = 0.025.]

Margin of Error for Estimating a Population Proportion p

Appropriate when the following conditions are met:

1. The sample is a random sample from the population of interest OR the sample is selected in a way that makes it reasonable to think the sample is representative of the population.
2. The sample size is large enough. This condition is met when either both $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ OR (equivalently) the sample includes at least 10 successes and at least 10 failures.
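The standard error calculations and the sample-size condition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `standard_error`, `margin_of_error`, and `sample_size_ok` are hypothetical, and the values of p and n are the ones used in the Supreme Court example.

```python
import math


def standard_error(p, n):
    """Standard error of p-hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p, n):
    """Margin of error at the 95% level: 1.96 standard errors."""
    return 1.96 * standard_error(p, n)


def sample_size_ok(successes, failures, minimum=10):
    """Large-sample condition: at least 10 successes and at least 10 failures."""
    return successes >= minimum and failures >= minimum


# Values of p and n taken from the Supreme Court example, plus p = 0.5.
for p, n in [(0.263, 25), (0.263, 100), (0.40, 25), (0.04, 25), (0.5, 25)]:
    print(f"p={p}, n={n}: standard error = {standard_error(p, n):.3f}")
```

Running the loop should show the pattern described in the text: the standard error falls as n grows and, for fixed n, is largest at p = 0.5.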
When these conditions are met,

margin of error = $1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

The formula given for the margin of error is actually the estimated margin of error, but it is common to refer to it without the "estimated". Any time a margin of error is reported, it is an estimated margin of error.

Interpretation of margin of error

It would be unusual for the sample proportion to differ from the actual value of the population proportion by more than the margin of error. For 95% of all random samples, the estimation error will be less than the margin of error.

Based on a representative sample of 511 U.S. teenagers ages 12 to 17, International Communications Research estimated that the proportion of teens who support keeping the legal drinking age at 21 is $\hat{p}$ = 0.64 with a margin of error of 0.04. Let's see how this margin of error was computed.

Check conditions:

1. It is given that the sample was representative of the population.
2. The sample size is large enough because $n\hat{p} = 511(0.64) = 327 \ge 10$ and $n(1-\hat{p}) = 511(0.36) = 184 \ge 10$.

Legal Drinking Age Continued . . .

$\hat{p}$ = 0.64 with a margin of error of 0.04

Compute margin of error:

margin of error = $1.96\sqrt{\frac{0.64(0.36)}{511}} = 0.04$

Interpretation

An estimate of the proportion of U.S. teens who favor keeping the legal drinking age at 21 is 0.64. It is unlikely that this estimate differs from the actual population proportion by more than 0.04.

A Large Sample Confidence Interval for a Population Proportion

Confidence Interval
Confidence Level

Developing a Confidence Interval

[Figure: approximate sampling distribution of $\hat{p}$, centered at p. One line represents 1.96 standard deviations below the mean and another represents 1.96 standard deviations above the mean. Each line equals $1.96\sqrt{\frac{p(1-p)}{n}}$. We will use this to create an interval of values to estimate p.]

- Suppose we get this $\hat{p}$. This $\hat{p}$ fell within 1.96 standard deviations of the value of p AND its interval "captures" p.
- Suppose we get this $\hat{p}$. This $\hat{p}$ also fell within 1.96 standard deviations of the value of p AND its interval "captures" p.
- Suppose we get this $\hat{p}$. This $\hat{p}$ did not fall within 1.96 standard deviations of the value of p AND its interval does NOT "capture" p.

Notice that the length of each half of the interval equals 1.96 standard errors. Using this method of calculation, the confidence interval will not capture p 5% of the time.

When n is large, a 95% confidence interval for p is

$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$

Confidence Intervals

A confidence interval (CI) for a population characteristic specifies an interval of plausible values for the characteristic. The interval is constructed in such a way that the resulting interval will be successful in capturing the actual value of the population characteristic a specified percentage of the time.

The primary goal of a confidence interval is to estimate an unknown population characteristic.

Confidence level

The confidence level associated with a confidence interval is the success rate of the method used to construct the interval. If this method was used to generate an interval estimate over and over again from different random samples, in the long run 95% of the resulting intervals would include the actual value of the characteristic being estimated.

Our confidence is in the method, NOT in any one particular interval!

[Figure: 100 95% confidence intervals for p computed from 100 different random samples. The intervals marked with asterisks do not capture p.]

Note that 7 out of the 100 confidence intervals do not contain p. Why not? If we were to compute 100 more confidence intervals for p from 100 different random samples, would we get the same results?

Other Confidence Levels

Suppose we wanted to create confidence intervals with a 90% confidence level . . .

Suppose we wanted to create confidence intervals with a 99% confidence level . . .

Notice that these critical values differ for different confidence levels. Notice also that the larger the confidence level, the larger the critical value will be AND the wider the interval will be.

The Large-Sample Confidence Interval for p

Now let's look at the general formula.

Appropriate when the following conditions are met:

1. The sample is a random sample from the population of interest or the sample is selected in a way that makes it reasonable to think the sample is representative of the population.
2. The sample size is large enough. This condition is met when either both $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ or (equivalently) the sample includes at least 10 successes and at least 10 failures.

The normal distribution is only an approximation of the sampling distribution of $\hat{p}$, and the true confidence level may differ somewhat from the reported level. If $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, the approximation is reasonable and the actual confidence level is usually quite close to the reported level. This is why it is important to verify this condition.

When these conditions are met, a confidence interval for the population proportion is

$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

The desired confidence level determines which z critical value is used. The three most common confidence levels use the following z critical values:

Confidence Level    z Critical Value
90%                 1.645
95%                 1.96
99%                 2.58

This is a generic formula for a confidence interval:

statistic ± (critical value)(standard error of the statistic)

Here the statistic is $\hat{p}$ and $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is the estimated standard error of $\hat{p}$.

Interpretation of Confidence Interval

You can be confident that the actual value of the population proportion is included in the computed interval. In any given problem, this statement should be worded in context.

Interpretation of Confidence Level

The confidence level specifies the approximate percentage of time that this method is expected to be successful in capturing the actual population proportion.

Recall from Chapter 7 . . .
Four Key Questions:

Q: Estimate or hypothesis testing?
S: Sample data or experimental data?
T: One variable or two? Categorical or numerical?
N: How many samples or treatments?

5 Steps:

E (Estimate): Explain what population characteristic you plan to estimate.
M (Method): Select a method using QSTN.
C (Check): Verify that the conditions are met.
C (Calculate): Perform the necessary calculations.
C (Communicate): Interpret the confidence interval.

Of 1100 drivers surveyed, 990 admitted to careless or aggressive driving during the previous 6 months. Assuming that it is reasonable to regard this sample of 1100 as representative of the population of drivers, compute a 90% confidence interval to estimate p, the proportion of all drivers who have engaged in careless or aggressive driving in the last 6 months.

Step 1 (E): The proportion of drivers who have engaged in careless or aggressive driving during the last 6 months, p, will be estimated.

Step 2 (M): Because the answers to the four key questions are Q: estimation, S: sample data, T: one categorical variable, N: one sample, a confidence interval for a population proportion will be considered.

Careless or Aggressive Driving Continued . . .

Step 3 (C): There are two conditions that need to be met for the confidence interval of this section to be appropriate.

1. You do not know how the sample was selected. In order to proceed, you MUST assume that the sample was representative of the population.
2. The sample size is large enough because $n\hat{p} = 1100\left(\frac{990}{1100}\right) = 990 \ge 10$ and $n(1-\hat{p}) = 1100(0.10) = 110 \ge 10$.

Step 4 (C): Calculate the interval.

$0.9 \pm 1.645\sqrt{\frac{0.9(0.1)}{1100}} = (0.885, 0.915)$

Step 5 (C): Communicate results.

Interpret Confidence Interval: Assuming that the sample was representative of the population, you can be about 90% confident that the actual proportion of drivers who engaged in careless or aggressive driving in the past 6 months is somewhere between 0.885 and 0.915.
Interpret Confidence Level: The method used to construct this interval estimate is successful in capturing the actual value of the population proportion about 90% of the time.

Three Things that Affect the Width of a Confidence Interval

1. The higher the confidence level, the wider the interval.
2. The larger the sample size, the narrower the interval.
3. The closer $\hat{p}$ is to 0.5, the wider the interval.

An Alternative to the Large-Sample z Interval

Even when the sample size conditions are met, sometimes the actual confidence level associated with the method may be noticeably different from the reported confidence level.

One way to correct this is to use a modified sample proportion, $\hat{p}_{mod}$, the proportion of successes after adding two successes and two failures to the sample.

$\hat{p}_{mod} = \frac{\text{number of successes} + 2}{n + 4}$

Use this modified sample proportion in place of $\hat{p}$ in the usual confidence interval formula.

Choosing a Sample Size to Achieve a Desired Margin of Error

Choosing a Sample Size

Before collecting any data, you might wish to determine a sample size that ensures a certain margin of error. The margin of error M is

$M = 1.96\sqrt{\frac{p(1-p)}{n}}$

If we solve this for n . . .

Using a 95% confidence interval, the sample size required to estimate a population proportion p with a margin of error M is

$n = p(1-p)\left(\frac{1.96}{M}\right)^2$

The value of p may be estimated using prior information. If there is no prior knowledge available, then the conservative estimate for p is 0.5.

Why is the conservative estimate for p = 0.5?

The formula for the margin of error is

$M = 1.96\sqrt{\frac{p(1-p)}{n}}$

Since we are looking for the sample size that produces a certain margin of error, we need to focus on the possible values of p(1 - p):

0.1(0.9) = 0.09
0.2(0.8) = 0.16
0.3(0.7) = 0.21
0.4(0.6) = 0.24
0.5(0.5) = 0.25

By using 0.5 for p, we are using the largest possible value for p(1 - p) in our calculations, so the resulting sample size is large enough no matter what the actual value of p turns out to be.
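The sample size formula above can be sketched as a small Python function. This is an illustrative sketch rather than part of the original slides; the function name `required_sample_size` and its default arguments are assumptions. It uses the conservative value p = 0.5 when no prior estimate is supplied and rounds up to the next whole number.

```python
import math


def required_sample_size(margin, p=0.5, z=1.96):
    """Sample size needed to estimate p within `margin` at the confidence
    level implied by z, using n = p(1 - p)(z / margin)^2.

    p defaults to the conservative value 0.5, which maximizes p(1 - p).
    """
    n = p * (1 - p) * (z / margin) ** 2
    return math.ceil(n)  # always round up to the next whole number


# Conservative sample size for a margin of error of 0.10.
print(required_sample_size(0.10))

# With prior information suggesting p is near 0.2, fewer observations are needed.
print(required_sample_size(0.10, p=0.2))
```

Because p(1 - p) is largest at p = 0.5, any other prior estimate of p gives a required sample size that is no larger than the conservative one.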
Researchers have found biochemical markers of cancers in the exhaled breath of cancer patients, but chemical analysis of breath specimens has not yet proven effective in diagnosing cancer. A study is to be performed to investigate whether a dog can be trained to identify the presence or absence of cancer by sniffing breath specimens. How many different breath specimens should be used if you want to estimate the long-run proportion of correct identifications for this dog with a margin of error of 0.10?

$n = p(1-p)\left(\frac{1.96}{M}\right)^2 = 0.25\left(\frac{1.96}{0.10}\right)^2 = 96.04$

Always round the sample size up to the next whole number. A sample of at least 97 breath specimens should be used.

Avoid These Common Mistakes

If a 90% confidence interval for p, the proportion of students at a particular college who own a computer, is (0.56, 0.78), you might give two different statements: an interpretation of the interval and an interpretation of the confidence level. Don't get these two statements confused!

1. In order for an estimate to be useful, you must know something about its accuracy. You should beware of a single number estimate that is not accompanied by a margin of error or some other measure of accuracy.

2. A confidence interval estimate that is wide indicates that you don't have very precise information about the population characteristic being estimated. Don't be fooled by a high confidence level. High confidence is not the same thing as saying you have precise information about the value of a population characteristic. The best strategy for decreasing the width of a confidence interval is to take a larger sample!

3. The accuracy of an estimate depends on the sample size, not the population size. Notice that the margin of error involves the sample size n, and decreases as n increases.
The size of the population, N, does need to be considered if sampling is done without replacement and the sample size is more than 10% of the population size. In this case, the margin of error is adjusted by multiplying it by a finite population correction factor

$\sqrt{\frac{N-n}{N-1}}$

4. CONDITIONS ARE IMPORTANT! If conditions are met, the large sample confidence interval provides a method for using sample data to estimate the population proportion with confidence, and the confidence level is a good approximation of the success rate for the method.

5. When reading published reports, don't fall into the trap of thinking confidence interval every time you see a ± in an expression. In addition to confidence intervals, it is common to see both estimate ± margin of error and estimate ± standard error reported.
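To see how different the three kinds of ± expressions can be, the following sketch computes all of them for the same data. It is an illustration only, not part of the original slides: the function name `summarize` and the dictionary `Z_CRITICAL` are hypothetical, and the inputs are the sample proportion 0.64 and sample size 511 from the legal drinking age example.

```python
import math

# z critical values for the three most common confidence levels.
Z_CRITICAL = {90: 1.645, 95: 1.96, 99: 2.58}


def summarize(p_hat, n, level=95):
    """Return the standard error, the margin of error (1.96 standard errors),
    and the large-sample confidence interval at the requested level."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    half_width = Z_CRITICAL[level] * se
    return {
        "standard_error": se,
        "margin_of_error": 1.96 * se,
        "interval": (p_hat - half_width, p_hat + half_width),
    }


report = summarize(0.64, 511, level=90)
low, high = report["interval"]
print(f"estimate ± standard error:  0.64 ± {report['standard_error']:.3f}")
print(f"estimate ± margin of error: 0.64 ± {report['margin_of_error']:.3f}")
print(f"90% confidence interval:    ({low:.3f}, {high:.3f})")
```

The same estimate is followed by three different numbers after the ±, which is why a published report should be read carefully to see which quantity is being reported.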