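Sections 7.1 and 7.2

This chapter begins the study of inferential statistics. Inferential statistics has two major applications:
- Estimating a population parameter, such as a proportion or a mean.
- Testing a claim (or hypothesis) about a population.

A point estimate is a single number. An interval estimate is an interval of numbers.

Confidence Interval

Why use an interval? A point estimate alone is not reliable under re-sampling, because a different sample would give a different value. A confidence interval (CI) is an interval of values used to estimate the true value of a population parameter.

Point Estimate

$p$ = population proportion
$\hat{p} = x/n$ (pronounced "p-hat") = sample proportion of $x$ successes in a sample of size $n$
$\hat{q} = 1 - \hat{p}$ = sample proportion of failures in a sample of size $n$

The sample proportion $\hat{p}$ is an unbiased estimate of $p$ and is the best point estimate of $p$.

Example: Photo-Cop Survey Responses

In a survey of 829 adult Minnesotans, 51% were opposed to the use of the photo-cop for issuing traffic tickets. Using these survey results, find the best estimate of the proportion of all adult Minnesotans opposed to photo-cop use.

The best point estimate is the sample proportion: $\hat{p} = 0.51$ (51%).

Confidence Level

Let $\alpha$ be a number between 0 and 1. The confidence level is $1 - \alpha$, often written as the percentage $100(1 - \alpha)\%$, e.g., 95%. It is the proportion of times that the confidence interval actually contains the population parameter, assuming the estimation process is repeated a large number of times. The confidence level is also called the degree of confidence or the confidence coefficient.

The Critical Value (z-score)

Given $\alpha$, the critical value $z_{\alpha/2}$ for a $100(1 - \alpha)\%$ confidence level is the z-score that cuts off an area of $\alpha/2$ in the right tail of the standard normal distribution. For a 95% confidence level, $\alpha = 5\%$ and $\alpha/2 = 2.5\% = 0.025$, so $z_{\alpha/2} = z_{0.025} = 1.96$.

Sampling Distribution of $\hat{p}$

The sampling distribution of the sample proportion can be approximated by a normal distribution if $np \ge 15$ and $nq \ge 15$, where $q = 1 - p$. In that case $\hat{p}$ is approximately normal with mean $p$ and variance $pq/n$:

$$\hat{p} \approx N\!\left(p, \frac{pq}{n}\right)$$

Replacing the unknown $p$ and $q$ in the standard deviation by $\hat{p}$ and $\hat{q}$ gives the approximately standard normal statistic

$$z = \frac{\hat{p} - p}{\sqrt{\hat{p}\hat{q}/n}}$$

Margin of Error of $\hat{p}$

The margin of error $E$ is the maximum likely difference (with probability $1 - \alpha$) between the observed sample proportion $\hat{p}$ and the true population proportion $p$.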
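The critical value and the normal-approximation condition above can be checked numerically. The sketch below is a minimal illustration, assuming Python 3.8+ so that the standard-library `statistics.NormalDist` can stand in for a printed z table. The names `critical_value` and `normal_approx_ok` are hypothetical helpers, and plugging $\hat{p}$ in for the unknown $p$ when checking $np \ge 15$ and $nq \ge 15$ is an assumption made for illustration.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Return z_{alpha/2} for a 100(1 - alpha)% confidence level."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence
    # z_{alpha/2} leaves an area of alpha/2 in the right tail of N(0, 1)
    return NormalDist().inv_cdf(1 - alpha / 2)


def normal_approx_ok(n: int, p: float, threshold: float = 15) -> bool:
    """Check n*p >= threshold and n*q >= threshold, with q = 1 - p."""
    return n * p >= threshold and n * (1 - p) >= threshold


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {critical_value(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576

# Photo-cop survey: n = 829, with p-hat = 0.51 used in place of the unknown p
print(normal_approx_ok(829, 0.51))  # True
```

Computing $z_{\alpha/2}$ from the inverse normal CDF avoids table lookups and works for any confidence level, not only the common 90%, 95%, and 99%.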
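The margin of error is

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Standard Error of $\hat{p}$

The quantity $\sqrt{\hat{p}\hat{q}/n}$ is the standard error of $\hat{p}$, written se, so that $E = z_{\alpha/2}(\text{se})$.

Finding the 95% Confidence Interval for a Population Proportion

A 95% confidence interval for a population proportion $p$ is

$$\hat{p} \pm 1.96(\text{se}), \qquad \text{se} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

More generally, a $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2}(\text{se}), \qquad \text{se} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Example: Would You Pay Higher Prices to Protect the Environment?

In 2000, the GSS asked: "Are you willing to pay much higher prices in order to protect the environment?" Of $n = 1154$ respondents, 518 were willing to do so. Find and interpret a 95% confidence interval for the population proportion of adult Americans willing to do so at the time of the survey.

$$\hat{p} = \frac{518}{1154} \approx 0.45$$

$$\text{se} = \sqrt{\frac{(0.45)(0.55)}{1154}} \approx 0.015$$

$$E = 1.96(\text{se}) = 1.96(0.015) \approx 0.03$$

$$\hat{p} \pm E = 0.45 \pm 0.03 = (0.42,\ 0.48)$$

Interpretation: we are 95% confident that, at the time of the survey, the proportion of all adult Americans willing to pay much higher prices to protect the environment was between 0.42 and 0.48.

What is the Error Probability for the Confidence Interval Method?

The error probability is $\alpha = 1 - (\text{confidence level})$: the probability that the method produces an interval that does not contain $p$. For a 95% confidence interval, about 5% of the intervals constructed this way in repeated sampling would miss $p$.

Summary: Effects of Confidence Level and Sample Size on Margin of Error

The margin of error for a confidence interval:
- increases as the confidence level increases;
- decreases as the sample size increases.

Determining Sample Size

Recall that $E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$. Solving for $n$ algebraically gives

$$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{E^2}$$

Sample Size for Estimating Proportion $p$

When an estimate $\hat{p}$ of $p$ is known:

$$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{E^2}$$

When no estimate of $p$ is known:

$$n = \frac{(z_{\alpha/2})^2 \cdot 0.25}{E^2}$$

The value 0.25 is the largest possible value of $\hat{p}\hat{q}$ (reached at $\hat{p} = 0.5$), so this choice gives a sample size large enough for any $p$. In both cases, round $n$ up to the next whole number.

Example: Suppose a sociologist wants to determine the current percentage of U.S. households using e-mail. How many households must be surveyed in order to be 95% confident that the sample percentage is in error by no more than four percentage points?

a) Use this result from an earlier study: In 1997, 16.9% of U.S. households used e-mail (based on data from The World Almanac and Book of Facts).

b) Assume that we have no prior information suggesting a possible value of $p$.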
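The confidence-interval and sample-size formulas in this part can be collected into a short Python sketch. It is a minimal illustration under stated assumptions: the helper names `z_crit`, `proportion_ci`, and `sample_size` are hypothetical, and the critical value comes from the standard-library `statistics.NormalDist` (Python 3.8+) instead of the rounded table value 1.96.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple


def z_crit(confidence: float) -> float:
    """Critical value z_{alpha/2}, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(x: int, n: int, confidence: float = 0.95) -> Tuple[float, float, float, float, float]:
    """Return (p_hat, se, E, lower, upper) for the interval p_hat +/- z_{alpha/2}(se)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    e = z_crit(confidence) * se              # margin of error E
    return p_hat, se, e, p_hat - e, p_hat + e


def sample_size(e: float, confidence: float = 0.95, p_hat: Optional[float] = None) -> int:
    """Smallest n whose margin of error is at most e.

    Uses p_hat * q_hat when a prior estimate is given, otherwise 0.25.
    """
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    n = z_crit(confidence) ** 2 * pq / e ** 2
    return math.ceil(n)  # round up so the margin of error is not exceeded


# GSS environment example: 518 of 1154 respondents
p_hat, se, e, lo, hi = proportion_ci(518, 1154)
print(f"p_hat={p_hat:.2f}, se={se:.3f}, E={e:.3f}, CI=({lo:.2f}, {hi:.2f})")
# p_hat=0.45, se=0.015, E=0.029, CI=(0.42, 0.48)

# E-mail example: E = 0.04 at 95% confidence
print(sample_size(0.04, p_hat=0.169), sample_size(0.04))  # 338 601
```

Because the code keeps the unrounded $\hat{p}$, se, and $z_{\alpha/2} \approx 1.95996$, it reports $E \approx 0.029$ rather than the hand-rounded 0.03; the interval limits still round to (0.42, 0.48), and the sample sizes agree with the hand calculation.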
Solution (a): From the earlier study, $\hat{p} = 0.169$ and $\hat{q} = 0.831$, with $E = 0.04$:

$$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{E^2} = \frac{(1.96)^2(0.169)(0.831)}{0.04^2} = 337.194$$

Rounding up gives 338 households. To be 95% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 338 households.

Solution (b): With no prior information, use $\hat{p}\hat{q} = 0.25$:

$$n = \frac{(z_{\alpha/2})^2 \cdot 0.25}{E^2} = \frac{(1.96)^2(0.25)}{0.04^2} = 600.25$$

Rounding up gives 601 households. With no prior information, we need a larger sample to achieve the same 95% confidence and a margin of error of no more than four percentage points.

Finding the Point Estimate and E from a Confidence Interval

Point estimate of $p$:

$$\hat{p} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$$

Margin of error:

$$E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$$
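The two formulas above can be sketched as a tiny Python helper. This is an illustration only: the name `estimate_from_interval` is hypothetical, and the formulas assume the interval is symmetric about $\hat{p}$, as the intervals $\hat{p} \pm E$ in this section are.

```python
def estimate_from_interval(lower: float, upper: float) -> tuple:
    """Recover (p_hat, E) from the limits of a symmetric interval p_hat +/- E."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    p_hat = (upper + lower) / 2  # midpoint of the interval
    e = (upper - lower) / 2      # half the width of the interval
    return p_hat, e


# The GSS interval (0.42, 0.48) from the environment example
p_hat, e = estimate_from_interval(0.42, 0.48)
print(f"p_hat = {p_hat:.2f}, E = {e:.2f}")  # p_hat = 0.45, E = 0.03
```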