Confidence Intervals

On January 26, 2009, based on a nightly survey of 1,500 likely voters, Rasmussen Reports estimated that President Obama’s approval rating among adult Americans who are likely to vote was 47%. Reading this, or hearing it on the nightly news, we might ask how certain Rasmussen is that Obama’s approval rating has fallen below 50%. We might also ask how certain Rasmussen is that Obama’s approval rating has changed in the last month, in the last three months, or in the last six months. Finally, we might read another poll that puts Obama’s approval rating at 53%, still above 50%.

Someone who is not knowledgeable about inferential statistics in general, and sampling error in particular, might just conclude that “Polls can’t be trusted.” A better-informed person might first investigate sources of non-sampling error. First, that person would want to make sure that the two polls asked substantially the same question, since question wording is a source of non-sampling error. Second, the person would want to make sure that the two samples were not drawn from different populations, or using different definitions of the population, which is another source of non-sampling error. Rasmussen samples only “likely voters,” not all adults. As Rasmussen notes, in a self-congratulatory tone, “Some other firms base their approval ratings on samples of all adults. President Obama's numbers are always several points higher in a poll of adults rather than likely voters. That's because some of the President's most enthusiastic supporters, such as young adults, are less likely to turn out to vote.”

Finally, after addressing issues of non-sampling error, we might ask whether the difference between the two polls is just the predictable difference that arises from drawing two different random samples from the same population, that is, sampling error.
There are two inferential statistical approaches to answering these questions: parameter estimation and hypothesis testing. In contingency table analysis, the chi-squared statistic is an example of hypothesis testing. For a proportion such as the presidential approval rating, a difference of means test would be an alternative technique in the hypothesis testing approach. The specific technique in the parameter estimation approach is the confidence interval.

We might, unknowingly, encounter confidence intervals in the reporting of the Rasmussen poll, even in the nightly television news report. The news anchor might inform us that, “…with 95% confidence, these results are plus or minus one half of one percent.” That “plus or minus” is called the margin of error and is one of the preliminary statistics used in the confidence interval approach. It is also probably the most commonly encountered inferential statistic, due to its ubiquity on television newscasts.

To calculate the standard deviation of a proportion, all that is needed is the proportion itself, and the only additional piece of information needed to calculate the standard error is the sample size. It is therefore possible (and easy) to assess the above questions.

Standard Error

$$se = \frac{sd}{\sqrt{n}}$$

where sd is the standard deviation and n is the sample size.

Confidence Interval –
a) a range estimate of a population parameter.
b) a range of values around the sample estimate (for example, a mean or proportion) that is likely to include the population parameter, qualified by a given confidence level.
c) an indicator of the reliability of a sample statistic.
d) an interval estimate, as opposed to a single point estimate.

Confidence Level –
a) the likelihood, usually expressed as a percentage, that a confidence interval contains the population parameter.
b) increasing the confidence level widens the confidence interval and lowering it narrows the confidence interval.
c) 95% is the most common confidence level and corresponds to a situation in which the confidence interval does not include the population parameter one time in 20.

Margin of Error –
a) the amount of sampling error in an estimate at a certain level of confidence.
b) the product of the standard error and the critical value of the confidence level.
c) one half the width of the confidence interval.
d) perhaps the most commonly encountered inferential statistic, because it is the basis of the “plus or minus” attached to the reporting of public opinion surveys, like presidential approval ratings or voting intentions, on television newscasts.

For a known population variance/standard deviation and a sample size greater than 50, a z-based confidence interval can be calculated as follows:

$$\bar{X} - z_{cl}(se) \le \mu \le \bar{X} + z_{cl}(se)$$

For an unknown population standard deviation and/or a sample size less than or equal to 50, which is almost always the case, a t-based confidence interval can be calculated in exactly the same way as the z-based interval, but with the degrees of freedom used to look up the t-value:

$$\bar{X} - t_{cl,df}(se) \le \mu \le \bar{X} + t_{cl,df}(se), \qquad df = n - 1$$

In both the z-based and t-based formulas, the population mean (μ) is estimated by the sample mean (X-bar). The z-values and t-values correspond to a given level of confidence and can be looked up in a table of z-score or t-score values. The z-value for a 95% confidence interval is 1.96, which can be used as an approximation of the t-value for larger sample sizes (for example, with a sample size of 50 the t-value is about 2.01, so the difference between the two values is roughly 2.5%). The degrees of freedom is one less than the sample size.

Calculating the 95% Confidence Interval for the Rasmussen Poll

1. n = 1,500, p = 0.47
2. $se = \sqrt{\dfrac{p(q)}{n}} = \sqrt{\dfrac{0.47(0.53)}{1{,}500}} = \sqrt{0.000166} = 0.01289$, where q = 1 − p
3. α = .05
4. $Z_{\alpha/2} = Z_{.05/2} = Z_{.025} = 1.96$
5. $me = z_{.025}(se) = 1.96(0.01289) = 0.025258$ (computed with the unrounded standard error)
6. $lb = p - me = 0.47 - 0.025258 = 0.444742$
   $ub = p + me = 0.47 + 0.025258 = 0.495258$

So, based on the Rasmussen Poll and our own calculations, with 95% confidence the percentage of likely voters approving of President Obama’s performance is between 44.5% and 49.5%.

The same calculation for smaller samples, the last using the t-value for df = 49 in place of the z-value:

n      p     Standard Error  Critical value (alpha = 0.05)  me        lb        ub
1,000  0.47  0.015783        z = 1.96                       0.030934  0.439066  0.500934
50     0.47  0.070583        z = 1.96                       0.138343  0.331657  0.608343
50     0.47  0.070583        t = 2.01 (df = 49)             0.141872  0.328128  0.611872
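The steps of the worked example can be sketched in a few lines of Python. This is an illustration only and not part of the original notes: the function names `z_critical` and `proportion_ci` are hypothetical, the z-value is obtained from the standard library's `NormalDist` rather than a printed table, and the t-value for 49 degrees of freedom (approximately 2.01) is typed in by hand as an assumption rather than looked up.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence=0.95):
    """Two-sided z critical value for a given confidence level (about 1.96 for 95%)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(p, n, critical=1.96):
    """Return (se, me, lb, ub) for a sample proportion p based on n respondents.

    critical is the z- or t-value for the chosen confidence level;
    the default of 1.96 is the rounded 95% z-value used in the text.
    """
    q = 1 - p
    se = sqrt(p * q / n)      # standard error of a proportion
    me = critical * se        # margin of error
    return se, me, p - me, p + me


# Sample sizes and critical values taken from the worked example;
# 2.01 is the assumed t-value for df = 49.
cases = [(1500, 1.96), (1000, 1.96), (50, 1.96), (50, 2.01)]
for n, critical in cases:
    se, me, lb, ub = proportion_ci(0.47, n, critical)
    print(f"n={n:>5}  critical={critical:.2f}  se={se:.6f}  "
          f"me={me:.6f}  lb={lb:.6f}  ub={ub:.6f}")
```

Run this way, the interval for n = 1,500 stays entirely below 0.50, while the interval for n = 1,000 extends just above it, which shows how the answer to "has approval fallen below 50%?" depends on sample size as well as on the point estimate.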