ch_07

Chapter 7 Statistical Inference: Confidence Intervals

Learn: How to Estimate a Population Parameter Using Sample Data

Agresti/Franklin Statistics, 1 of 87

Section 7.1 What Are Point and Interval Estimates of Population Parameters?

Point Estimate
• A point estimate is a single number that is our “best guess” for the parameter

Interval Estimate
• An interval estimate is an interval of numbers within which the parameter value is believed to fall

Point Estimate vs Interval Estimate
• A point estimate doesn’t tell us how close the estimate is likely to be to the parameter
• An interval estimate is more useful
  • It incorporates a margin of error which helps us to gauge the accuracy of the point estimate

Point Estimation: How Do We Make a Best Guess for a Population Parameter?
• Use an appropriate sample statistic:
  • For the population mean, use the sample mean
  • For the population proportion, use the sample proportion
• Point estimates are the most common form of inference reported by the mass media

Properties of Point Estimators
• Property 1: A good estimator has a sampling distribution that is centered at the parameter
  • An estimator with this property is unbiased
  • The sample mean is an unbiased estimator of the population mean
  • The sample proportion is an unbiased estimator of the population proportion
• Property 2: A good estimator has a small standard error compared to other estimators
  • This means it tends to fall closer than other estimates to the parameter

Interval Estimation: Constructing an Interval that Contains the Parameter (We Hope!)
• Inference about a parameter should provide not only a point estimate but should also indicate its likely precision

Confidence Interval
• A confidence interval is an interval containing the most believable values for a parameter
• The probability that this method produces an interval that contains the parameter is called the confidence level
  • This is a number chosen to be close to 1, most commonly 0.95

What is the Logic Behind Constructing a Confidence Interval?
• To construct a confidence interval for a population proportion, start with the sampling distribution of a sample proportion

The Sampling Distribution of the Sample Proportion
• Gives the possible values for the sample proportion and their probabilities
• Is approximately a normal distribution for large random samples
• Has a mean equal to the population proportion
• Has a standard deviation called the standard error

A 95% Confidence Interval for a Population Proportion
• Fact: Approximately 95% of a normal distribution falls within 1.96 standard deviations of the mean
  • That means: With probability 0.95, the sample proportion falls within about 1.96 standard errors of the population proportion

Margin of Error
• The margin of error measures how accurate the point estimate is likely to be in estimating a parameter
• The distance of 1.96 standard errors is the margin of error for a 95% confidence interval

Confidence Interval
• A confidence interval is constructed by adding and subtracting a margin of error from a given point estimate
• When the sampling distribution is approximately normal, a 95% confidence interval has margin of error equal to 1.96 standard errors

Section 7.2 How Can We Construct a Confidence Interval to Estimate a Population Proportion?
Finding the 95% Confidence Interval for a Population Proportion
• We symbolize a population proportion by p
• The point estimate of the population proportion is the sample proportion
• We symbolize the sample proportion by p̂
• A 95% confidence interval uses a margin of error = 1.96(standard errors)
• [point estimate ± margin of error] = p̂ ± 1.96(standard errors)
• The exact standard error of a sample proportion equals:
  \(\sqrt{p(1 - p)/n}\)
• This formula depends on the unknown population proportion, p
• In practice, we don’t know p, and we need to estimate the standard error
• In practice, we use an estimated standard error:
  \(se = \sqrt{\hat{p}(1 - \hat{p})/n}\)
• A 95% confidence interval for a population proportion p is:
  \(\hat{p} \pm 1.96(se)\), with \(se = \sqrt{\hat{p}(1 - \hat{p})/n}\)

Example: Would You Pay Higher Prices to Protect the Environment?
• In 2000, the GSS asked: “Are you willing to pay much higher prices in order to protect the environment?”
  • Of n = 1154 respondents, 518 were willing to do so
• Find and interpret a 95% confidence interval for the population proportion of adult Americans willing to do so at the time of the survey
• \(\hat{p} = 518/1154 = 0.45\)
• \(se = \sqrt{(0.45)(0.55)/1154} = 0.015\)
• \(\hat{p} \pm 1.96(se) = 0.45 \pm 1.96(0.015) = 0.45 \pm 0.03 = (0.42, 0.48)\)

Sample Size Needed for Large-Sample Confidence Interval for a Proportion
• For the 95% confidence interval for a proportion p to be valid, you should have at least 15 successes and 15 failures:
  \(n\hat{p} \ge 15\) and \(n(1 - \hat{p}) \ge 15\)

“95% Confidence”
• With probability 0.95, a sample proportion value occurs such that the confidence interval contains the population proportion, p
• With probability 0.05, the method produces a confidence interval that misses p

How Can We Use Confidence Levels Other than 95%?
• In practice, the confidence level 0.95 is the most common choice
• But some applications require greater confidence
• To increase the chance of a correct inference, we use a larger confidence level, such as 0.99

A 99% Confidence Interval for p
  \(\hat{p} \pm 2.58(se)\)

Different Confidence Levels
• In using confidence intervals, we must compromise between the desired margin of error and the desired confidence of a correct inference
  • As the desired confidence level increases, the margin of error gets larger

What is the Error Probability for the Confidence Interval Method?
• The general formula for the confidence interval for a population proportion is:
  Sample proportion ± (z-score)(std. error)
  which in symbols is \(\hat{p} \pm z(se)\)
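The general formula p̂ ± z(se) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `proportion_ci` and its parameters are hypothetical, and the z-score is computed from the standard normal distribution rather than read from a table, so it uses slightly more digits than 1.96 or 2.58.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion.

    Hypothetical helper for illustration; assumes at least 15 successes
    and 15 failures so the normal approximation is reasonable.
    """
    p_hat = successes / n                      # point estimate
    se = sqrt(p_hat * (1 - p_hat) / n)         # estimated standard error
    # z-score leaving (1 - confidence)/2 probability in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se                            # margin of error
    return p_hat - margin, p_hat + margin


# GSS example from the slides: 518 of 1154 respondents said yes
low95, high95 = proportion_ci(518, 1154, confidence=0.95)
low99, high99 = proportion_ci(518, 1154, confidence=0.99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")  # 95% CI: (0.42, 0.48)
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

Running both confidence levels on the same data shows the trade-off described above: the 99% interval is wider than the 95% interval.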
Summary: Confidence Interval for a Population Proportion, p
• A confidence interval for a population proportion p is:
  \(\hat{p} \pm z\sqrt{\hat{p}(1 - \hat{p})/n}\)

Summary: Effects of Confidence Level and Sample Size on Margin of Error
• The margin of error for a confidence interval:
  • Increases as the confidence level increases
  • Decreases as the sample size increases

What Does It Mean to Say that We Have “95% Confidence”?
• If we used the 95% confidence interval method to estimate many population proportions, then in the long run about 95% of those intervals would give correct results, containing the population proportion

A recent survey asked: “During the last year, did anyone take something from you by force?”
• Of 987 subjects, 17 answered “yes”
• Find the point estimate of the proportion of the population who were victims
  a. .17
  b. .017
  c. .0017

Section 7.3 How Can We Construct a Confidence Interval To Estimate a Population Mean?
How to Construct a Confidence Interval for a Population Mean
• Point estimate ± margin of error
• The sample mean is the point estimate of the population mean
• The exact standard error of the sample mean is \(\sigma/\sqrt{n}\)
• In practice, we estimate σ by the sample standard deviation, s
• For large n, and also for small n from an underlying population that is normal, the confidence interval for the population mean is:
  \(\bar{x} \pm z(\sigma/\sqrt{n})\)
• In practice, we don’t know the population standard deviation
• Substituting the sample standard deviation s for σ to get \(se = s/\sqrt{n}\) introduces extra error
• To account for this increased error, we replace the z-score by a slightly larger score, the t-score
• In practice, we estimate the standard error of the sample mean by \(se = s/\sqrt{n}\)
• Then, we multiply se by a t-score from the t-distribution to get the margin of error for a confidence interval for the population mean

Properties of the t-distribution
• The t-distribution is bell shaped and symmetric about 0
• The probabilities depend on the degrees of freedom, df
• The t-distribution has thicker tails and is more spread out than the standard normal distribution

[Figure: t-Distribution]

Summary: 95% Confidence Interval for a Population Mean
• A 95% confidence interval for the population mean µ is:
  \(\bar{x} \pm t_{.025}(s/\sqrt{n})\); df = n - 1
• To use this method, you need:
  • Data obtained by randomization
  • An approximately normal population distribution

Example: eBay Auctions of Palm Handheld Computers
• Do you tend to get a higher, or a lower, price if you give bidders the “buy-it-now” option?
• Consider some data from sales of the Palm M515 PDA (personal digital assistant)
• During the first week of May 2003, 25 of these handheld computers were auctioned off, 7 of which had the “buy-it-now” option
• “Buy-it-now” option: 235 225 225 240 250 250 210
• Bidding only: 250 249 255 200 199 240 228 255 232 246 210 178 246 240 245 225 246 225
• Summary of selling prices for the two types of auctions:

  buy_now   N    Mean     StDev   Minimum   Q1       Median   Q3       Maximum
  no        18   231.61   21.94   178.00    221.25   240.00   246.75   255.00
  yes        7   233.57   14.64   210.00    225.00   235.00   250.00   250.00

• To construct a confidence interval using the t-distribution, we must assume a random sample from an approximately normal population of selling prices
• Let µ denote the population mean for the “buy-it-now” option
• The estimate of µ is the sample mean: \(\bar{x}\) = $233.57
• The sample standard deviation is: s = $14.64
• The 95% confidence interval for the “buy-it-now” option, using \(t_{.025} = 2.447\) for df = 6, is:
  \(\bar{x} \pm t_{.025}(s/\sqrt{n}) = 233.57 \pm 2.447(14.64/\sqrt{7})\)
  which is 233.57 ± 13.54 or (220.03, 247.11)
• The 95% confidence interval for the mean sales price for the bidding only option is: (220.70, 242.52)
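The two eBay intervals can be reproduced from the raw selling prices with a short Python sketch. The function name `t_interval` is hypothetical, and the t-scores are assumptions taken from a standard t table (about 2.447 for df = 6 and about 2.110 for df = 17) rather than computed, since the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_score):
    """Confidence interval for a mean: x-bar ± t * s / sqrt(n).

    Hypothetical helper; the caller supplies the t-score for df = n - 1.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                    # sample standard deviation (divisor n - 1)
    margin = t_score * s / sqrt(n)     # margin of error
    return x_bar - margin, x_bar + margin


buy_now = [235, 225, 225, 240, 250, 250, 210]
bidding_only = [250, 249, 255, 200, 199, 240, 228, 255, 232,
                246, 210, 178, 246, 240, 245, 225, 246, 225]

# t-scores from a t table (assumed values): df = 6 and df = 17
buy_low, buy_high = t_interval(buy_now, t_score=2.447)
bid_low, bid_high = t_interval(bidding_only, t_score=2.110)

print(f"Buy-it-now:   ({buy_low:.2f}, {buy_high:.2f})")
print(f"Bidding only: ({bid_low:.2f}, {bid_high:.2f})")
```

The buy-it-now interval is wider even though its standard deviation is smaller, because n = 7 gives both a larger t-score and a larger standard error.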
• Notice that the two intervals overlap a great deal:
  • “Buy-it-now”: (220.03, 247.11)
  • Bidding only: (220.70, 242.52)
• There is not enough information for us to conclude that one probability distribution clearly has a higher mean than the other

How Do We Find a t- Confidence Interval for Other Confidence Levels?
• The 95% confidence interval uses \(t_{.025}\) since 95% of the probability falls between \(-t_{.025}\) and \(t_{.025}\)
• For 99% confidence, the error probability is 0.01 with 0.005 in each tail and the appropriate t-score is \(t_{.005}\)

If the Population is Not Normal, is the Method “Robust”?
• A basic assumption of the confidence interval using the t-distribution is that the population distribution is normal
• Many variables have distributions that are far from normal
• How problematic is it if we use the t- confidence interval even if the population distribution is not normal?
• For large random samples, it’s not problematic
• The Central Limit Theorem applies: for large n, the sampling distribution is bell-shaped even when the population is not
• What about a confidence interval using the t-distribution when n is small?
• Even if the population distribution is not normal, confidence intervals using t-scores usually work quite well
• We say the t- confidence interval is a robust method in terms of the normality assumption

Cases Where the t- Confidence Interval Does Not Work
• With binary data
• With data that contain extreme outliers

The Standard Normal Distribution is the t-Distribution with df = ∞

The 2002 GSS asked: “What do you think is the ideal number of children in a family?”
• The 497 females who responded had a median of 2, mean of 3.02, and standard deviation of 1.81
• What is the point estimate of the population mean?
  a. 497
  b. 2
  c. 3.02
  d. 1.81

Section 7.4 How Do We Choose the Sample Size for a Study?

How are the Sample Sizes Determined in Polls?
• It depends on how much precision is needed as measured by the margin of error
• The smaller the margin of error, the larger the sample size must be

Choosing the Sample Size for Estimating a Population Proportion
• First, we must decide on the desired margin of error
• Second, we must choose the confidence level for achieving that margin of error
• In practice, 95% confidence intervals are most common

Example: What Sample Size Do You Need For An Exit Poll?
• A television network plans to predict the outcome of an election between two candidates: Levin and Sanchez
• They will do this with an exit poll that randomly samples voters on election day
• The final poll a week before election day estimated Levin to be well ahead, 58% to 42%
  • So the outcome is not expected to be close
• The researchers decide to use a sample size for which the margin of error is 0.04
• What is the sample size for which a 95% confidence interval for the population proportion has margin of error equal to 0.04?
• The 95% confidence interval for a population proportion p is: \(\hat{p} \pm 1.96(se)\)
• If the sample size is such that 1.96(se) = 0.04, then the margin of error will be 0.04
• Find the value of the sample size n for which 0.04 = 1.96(se):
  \(0.04 = 1.96\sqrt{\hat{p}(1 - \hat{p})/n}\)
  Solve algebraically for n:
  \(n = (1.96)^2\,\hat{p}(1 - \hat{p})/(0.04)^2\)
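The algebra above can be checked numerically. The sketch below is an illustration only: the function name `sample_size_for_proportion` is hypothetical, the guess p̂ = 0.58 is taken from the earlier poll described in this example, and rounding up to the next whole respondent is an assumption about how to turn the formula into a sample size.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, p_guess, confidence=0.95):
    """Sample size n solving margin = z * sqrt(p(1 - p)/n) for n.

    Hypothetical helper; p_guess is an educated guess for the
    sample proportion, made before any data are collected.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = z**2 * p_guess * (1 - p_guess) / margin**2
    return ceil(n_exact)               # round up to a whole respondent


# Exit poll: margin of error 0.04, guess of 0.58 for Levin's share
n_poll = sample_size_for_proportion(margin=0.04, p_guess=0.58)
print(n_poll)
```

Because p(1 - p) is symmetric, using 0.42 (Sanchez's share) as the guess gives the same sample size.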
• Using \(\hat{p} = 0.58\) from the earlier poll: \(n = (1.96)^2(0.58)(0.42)/(0.04)^2 \approx 585\)
• A random sample of size n = 585 should give a margin of error of about 0.04 for a 95% confidence interval for the population proportion

How Can We Select a Sample Size Without Guessing a Value for the Sample Proportion?
• In the formula for determining n, setting p̂ = 0.50 gives the largest value for n out of all the possible values to substitute for p̂
• Doing this is the “safe” approach that guarantees we’ll have enough data

Sample Size for Estimating a Population Parameter
• The random sample size n for which a confidence interval for a population proportion p has margin of error m (such as m = 0.04) is
  \(n = \hat{p}(1 - \hat{p})z^2/m^2\)
• The z-score is based on the confidence level, such as z = 1.96 for 95% confidence
• You either guess the value you’d get for the sample proportion based on other information or take the safe approach of setting p̂ = 0.50

Sample Size for Estimating a Population Mean
• The random sample size n for which a 95% confidence interval for a population mean has margin of error approximately equal to m is
  \(n = 4s^2/m^2\)
  • The factor 4 comes from \(z^2\) with z = 1.96 rounded to 2
• To use this formula, you guess the value you’ll get for the sample standard deviation, s
• In practice, since you don’t yet have the data, you don’t know the value of the sample standard deviation, s
• You must substitute an educated guess for s
  • You can use the sample standard deviation from a similar study

Example: Finding n to Estimate Mean Education in South Africa
• A social scientist plans a study of adult South Africans to investigate educational attainment in the black community
• How large a sample size is needed so that a 95% confidence interval for the mean number of years of education has margin of error equal to 1 year?
• No prior information about the standard deviation of educational attainment is available
• We might guess that the sample education values fall within a range of about 18 years
• If the data distribution is bell-shaped, the range from \(\bar{x} - 3s\) to \(\bar{x} + 3s\) will contain nearly all the distribution
• The distance from \(\bar{x} - 3s\) to \(\bar{x} + 3s\) equals 6s
• Solving 18 = 6s for s yields s = 3
• So ‘3’ is a crude estimate of s
• The desired margin of error is m = 1 year
• The required sample size is:
  \(n = 4s^2/m^2 = 4(3)^2/1^2 = 36\)

What Factors Affect the Choice of the Sample Size?
• The first is the desired precision, as measured by the margin of error, m
• The second is the confidence level

What Other Factors Affect the Choice of the Sample Size?
• A third factor is the variability in the data
  • If subjects have little variation (that is, s is small), we need fewer data than if they have substantial variation
• A fourth factor is financial
  • Cost is often a major constraint

What if You Have to Use a Small n?
• The t- methods for a mean are valid for any n
  • However, you need to be extra cautious to look for extreme outliers or great departures from the normal population assumption
• In the case of the confidence interval for a population proportion, the method works poorly for small samples

Constructing a Small-Sample Confidence Interval for a Proportion
• Suppose a random sample does not have at least 15 successes and 15 failures
• The confidence interval formula
  \(\hat{p} \pm z\sqrt{\hat{p}(1 - \hat{p})/n}\)
  is still valid if we use it after adding ‘2’ to the original number of successes and ‘2’ to the original number of failures
• This results in adding ‘4’ to the sample size n
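The small-sample adjustment can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `small_sample_proportion_ci` is hypothetical, and the data (3 successes in 10 trials) are made up purely to show the mechanics.

```python
from math import sqrt
from statistics import NormalDist


def small_sample_proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for a proportion after adding 2 successes
    and 2 failures (so the sample size grows by 4).

    Hypothetical helper for illustration.
    """
    adj_successes = successes + 2      # add 2 to the successes
    adj_n = n + 4                      # 2 successes + 2 failures added
    p_adj = adj_successes / adj_n      # adjusted sample proportion
    se = sqrt(p_adj * (1 - p_adj) / adj_n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_adj - z * se, p_adj + z * se


# Hypothetical data: 3 successes in 10 trials (fewer than 15 of each)
low, high = small_sample_proportion_ci(3, 10)
print(f"({low:.3f}, {high:.3f})")
```

Note that the interval is centered at the adjusted proportion 5/14, not at the original sample proportion 3/10.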
