CHAPTER 9 – Estimating the Value of a Parameter

Section 9.1 – Estimating a Population Parameter

A point estimate is the value of a statistic that estimates the value of a parameter. For example, the point estimate for the population proportion is $\hat{p} = \frac{x}{n}$, where x is the number of individuals in the sample with a specified characteristic and n is the sample size.

Example: A Point Estimate of a Population Proportion

The Gallup Organization conducted a poll in which a simple random sample of 1015 Americans 18 and older were asked, “Do you consider the amount of federal income tax you have to pay is too high?” Of the 1015 adult Americans surveyed, 458 said yes. Obtain a point estimate for the proportion of Americans 18 and older who believe the amount of federal income tax they pay is too high.

$\hat{p} = \frac{x}{n} =$ ____________

Interpretation __________________________________________________________________

Construct and Interpret a Confidence Interval for the Population Proportion

In statistics, we construct an interval for a population parameter based on a guess along with a level of confidence. The guess is the point estimate of the population parameter, and the level of confidence plays a role in the width of the interval.

A confidence interval for an unknown parameter consists of an interval of numbers based on a point estimate.

The level of confidence represents the expected proportion of intervals that will contain the parameter if a large number of different samples is obtained. The level of confidence is denoted (1 – α)·100%.

For example, a 95% level of confidence (α = ___________) implies that if 100 different confidence intervals are constructed, each based on a different sample from the same population, we will expect ____________ of the intervals to contain the parameter and 5 not to include the parameter.

Review what we know about the model that describes the sampling distribution of $\hat{p}$, the sample proportion.
• The shape of the distribution of all possible sample proportions is approximately normal provided $np(1-p) \geq$ _______, the sample size is no more than _____% of the population size, and the data are obtained randomly.
• The mean of the distribution of the sample proportion equals the population proportion. That is, $\mu_{\hat{p}} =$ _______
• The standard deviation of the distribution of the sample proportion (the standard error) is $\sigma_{\hat{p}} =$ ________

Because the distribution of the sample proportion is approximately normal, we know that 95% of all sample proportions will lie within 1.96 standard deviations of the population proportion, p, and 2.5% of the sample proportions will lie in each tail.

It is common to write confidence interval estimates for the population proportion as

point estimate ± margin of error

The margin of error for a 95% confidence interval for the population proportion is $1.96 \cdot \sigma_{\hat{p}}$. This determines the width of the interval:

$\hat{p} \pm 1.96\,\sigma_{\hat{p}}$

A 95% level of confidence means that 95% of all possible samples result in confidence intervals that include the parameter (and 5% of all possible samples result in confidence intervals that do not include the parameter).

(1 − α)·100% of all sample proportions will result in confidence intervals that contain the population proportion. The sample proportions that are in the tails of the distribution in the figure will not result in confidence intervals that contain the population proportion.

The value $z_{\alpha/2}$ is called the critical value of the distribution. It represents the number of standard deviations the sample statistic can be from the parameter and still result in an interval that includes the parameter.

Table 1 shows some of the common critical values used in the construction of confidence intervals. Notice that higher levels of confidence correspond to higher critical values.
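As a rough illustration of where such critical values come from, the following Python sketch computes $z_{\alpha/2}$ for a few common confidence levels using the standard library's `statistics.NormalDist`. The function name `critical_value` and the three levels shown are illustrative assumptions; they are not taken from Table 1.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z_{alpha/2}: the z-score with area alpha/2 to its right
    under the standard normal curve."""
    alpha = 1 - confidence_level
    # Area to the LEFT of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

# Illustrative confidence levels (an assumption, not the table from the notes).
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

The printed values increase with the confidence level, which matches the observation that higher levels of confidence correspond to higher critical values.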
Interpretation of a Confidence Interval

A (1 − α)·100% confidence interval indicates that (1 − α)·100% of all simple random samples of size n from the population whose parameter is unknown will result in an interval that contains the parameter.

Example: Interpreting a Confidence Interval

45.1% of national adults aged 18 and older who were surveyed considered the amount of federal income tax they have to pay too high. Results are based on telephone interviews with a random sample of 1015. The poll reports 95% confidence with a sampling error of 4 percentage points.

point estimate ± margin of error

The point estimate is __________, and the margin of error is ___________.

The confidence interval is _____________________________

Interpretation: We are 95% confident that___________________________________________
______________________________________________________________________________

Constructing a (1 − α)·100% Confidence Interval for a Population Proportion

Suppose that a simple random sample of size n is taken from a population or the data are the result of a randomized experiment. A (1 − α)·100% confidence interval for p is given by the following quantities:

Lower bound: $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Upper bound: $\hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

The margin of error, E, in a (1 − α)·100% confidence interval for a population proportion is given by

$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Example: Constructing a Confidence Interval for a Population Proportion

In the Parent–Teen Cell Phone Survey conducted by Princeton Survey Research Associates International, 800 randomly sampled 16- to 17-year-olds living in the United States were asked whether they have ever used their cell phone to text while driving. Of the 800 teenagers surveyed, 272 indicated that they text while driving. Obtain a 95% confidence interval for the proportion of 16- to 17-year-olds who text while driving.
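The computation requested in this example can be sketched in Python as a check on hand or StatCrunch work. This is a minimal sketch that applies the lower-bound and upper-bound formulas directly; the function name `proportion_confidence_interval` is a hypothetical helper, and the code does not verify the normality or sample-size conditions, which still need to be checked separately.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence_level=0.95):
    """Return (lower, upper, margin) for a z-interval about a proportion.

    x: number of individuals with the characteristic
    n: sample size
    """
    p_hat = x / n                                   # point estimate
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)         # critical value z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)      # margin of error E
    return p_hat - margin, p_hat + margin, margin

# Survey counts from the example: 272 of 800 teenagers.
lower, upper, margin = proportion_confidence_interval(272, 800)
print(f"p-hat = {272 / 800:.3f}, E = {margin:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

Passing `confidence_level=0.99` to the same function shows how the margin of error changes when only the level of confidence changes.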
$\hat{p} = \frac{x}{n} =$ ____________

Check for normality:
• Sample size is less than 5% of the population size
• $np(1-p) =$ ____________________ ≥ _______

Because we want a 95% confidence interval, we have α = 1 − ________ = _______, so

$z_{\alpha/2} = z_{0.05/2} = z_{0.025} =$ ______________

The margin of error is _______________

We are 95% confident that the proportion of ________________________________________ is between ___________ and ____________.

StatCrunch Command____________________________________________________________

The Effect of Level of Confidence on the Margin of Error

We stated earlier that logic suggests that a higher level of confidence leads to a _____________ interval.

Example: Role of the Level of Confidence on the Margin of Error

For the problem of estimating the proportion of 16- to 17-year-old teenagers who text while driving, determine the effect on the margin of error by increasing the level of confidence from 95% to 99%.

α = 1 − ________ = _______, so

$z_{\alpha/2} = z_{0.01/2} = z_{0.005} =$ ______________

The margin of error is _______________

We are 99% confident that the proportion of ________________________________________ is between ___________ and ____________.

The Effect of Sample Size on the Margin of Error

We know that larger sample sizes produce more precise estimates (the Law of Large Numbers). Given that the margin of error is

$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

we can see that increasing the sample size n ___________________ the standard error; so the margin of error decreases. This means that larger sample sizes will result in _____________________ confidence intervals.

Determine the Sample Size Necessary for Estimating a Population Proportion within a Specified Margin of Error

Sample size needed for a specified margin of error, E, and level of confidence (1 – α):

$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$

Problem: The formula uses $\hat{p}$, which depends on n, the quantity we are trying to determine!

Two possible solutions:
1. Use an estimate of $\hat{p}$ based on a pilot study or an earlier study.
2. Let $\hat{p} = 0.5$, which gives the largest possible value of n for a given level of confidence and a given margin of error.

The sample size required to obtain a (1 – α)·100% confidence interval for p with a margin of error E is given by

$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$

(rounded up to the next integer), where $\hat{p}$ is a prior estimate of p. If a prior estimate of p is unavailable, the sample size required is

$n = 0.25\left(\frac{z_{\alpha/2}}{E}\right)^2$

rounded up to the next integer.

Example: Determining Sample Size

An economist wants to know if the proportion of the U.S. population who commutes to work via carpooling is on the rise. What size sample should be obtained if the economist wants an estimate within 2 percentage points of the true proportion with 90% confidence

1. if the economist uses the 2009 estimate of 10% obtained from the American Community Survey?

$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2 =$ ____________________________________________

2. if the economist does not use any prior estimates?

$n = 0.25\left(\frac{z_{\alpha/2}}{E}\right)^2 =$ ______________________________________________

The effect of not having a prior estimate of p is that the sample size more than doubled!

Section 9.2 – Estimating a Population Mean

A point estimate is the value of a statistic that estimates the value of a parameter. The sample mean, $\bar{x}$, is a point estimate of the population mean μ.

Example: Computing a Point Estimate of the Population Mean

The website fueleconomy.gov allows drivers to report the miles per gallon of their vehicle. The data in Table 2 show the reported miles per gallon of 2014 Toyota Camry automobiles for 16 different owners. Obtain a point estimate of the population mean miles per gallon of a 2014 Toyota Camry.

The point estimate of μ is 28.1 miles per gallon.

Student’s t-Distribution

Suppose a simple random sample of size n is taken from a population that follows a normal distribution.
The distribution of

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

follows Student’s t-distribution with n – 1 degrees of freedom, where $\bar{x}$ is the sample mean and s is the sample standard deviation.

Properties of the t-Distribution

The t-distribution is different for different degrees of freedom.
1. The t-distribution is centered at 0 and is symmetric about 0.
2. The area under the curve is 1. The area under the curve to the right of 0 equals the area under the curve to the left of 0, which equals ½.
3. As t increases or decreases without bound, the graph approaches, but never equals, zero.
4. The area in the tails of the t-distribution is a little greater than the area in the tails of the standard normal distribution, because we are using s as an estimate of σ, thereby introducing further variability into the t-statistic.
5. As the sample size n increases, the density curve of t gets closer to the standard normal density curve. This result occurs because, as the sample size n increases, the values of s get closer to the values of σ, by the Law of Large Numbers.

The notation $z_{\alpha}$ is used to represent the z-score whose area under the normal curve to the right of $z_{\alpha}$ is α. Similarly, let $t_{\alpha}$ represent the t-value whose area under the t-distribution to the right of $t_{\alpha}$ is α.

Example: Finding t-Values

Find the t-value such that the area under the t-distribution to the right of the t-value is 0.10, assuming 15 degrees of freedom (df). That is, find $t_{0.10}$ with 15 degrees of freedom.

The value of $t_{0.10}$ with 15 df is_____________________________

StatCrunch Command ___________________________________________________________

Constructing a (1 – α)·100% Confidence Interval for μ

A confidence interval for μ can be constructed provided that
• sample data come from a simple random sample or randomized experiment,
• sample size is small relative to the population size (n ≤ 0.05N), and
• the data come from a population that is normally distributed, or the sample size is large.
A (1 – α)·100% confidence interval for μ is given by

Lower bound: $\bar{x} - t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Upper bound: $\bar{x} + t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

where $t_{\alpha/2}$ is the critical value with n – 1 degrees of freedom.

Because this confidence interval uses the t-distribution, it is often referred to as a t-interval.

The margin of error for constructing confidence intervals about a population mean is

$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

The Normality Condition

When Gosset developed the t-distribution, he assumed the sample data came from a population that is normally distributed. Most distributions are not exactly normal, so we need to verify that the sample data appear “normal enough” before using the t-distribution to construct confidence intervals for the population mean.

Small Sample Size

Option 1 (The Better Option)
• n < 30: Draw a normal probability plot to check the normality condition and a boxplot to check for outliers. If the data appear to come from a population that is approximately normal with no outliers, then it is reasonable to use Student’s t-distribution to construct a confidence interval for a population mean.

Option 2
This option relies on the robustness of constructing confidence intervals using Student’s t-distribution. An inferential method is robust if it is accurate despite minor departures from its underlying assumptions (such as the assumption of normality in Student’s t-distribution).
• n < 15: Use Student’s t-distribution to construct confidence intervals about a population mean if the sample data are symmetric with no outliers. The data should not be skewed left or right. This condition may be verified with a boxplot. The median should be in the middle of the box and the whiskers should be of equal length.
• 15 ≤ n < 30: Use Student’s t-distribution to construct confidence intervals about a population mean provided the sample data show no “extreme” skewness and no outliers.
Large Sample Size

If n ≥ 30, use Student’s t-distribution to construct confidence intervals about a population mean even for skewed distributions. This relies on the Central Limit Theorem (for the distribution of the sample mean to be approximately normal) and the Law of Large Numbers (for the sample standard deviation to be close to the population standard deviation).

Example: Constructing a Confidence Interval about a Population Mean

The website fueleconomy.gov allows drivers to report the miles per gallon of their vehicle. The data in Table 3 show the reported miles per gallon of 2014 Toyota Camry automobiles for 16 different owners. Treat the sample as a simple random sample of all 2014 Toyota Camry automobiles. Construct a 95% confidence interval for the mean miles per gallon of a 2014 Toyota Camry. Interpret the interval.

StatCrunch Commands___________________________________________________________

The correlation between MPG and the expected z-scores is _______________

We are 95% confident that the mean ______________________________________________ is between ___________ and _____________ mpg.

Determining the Sample Size n

The sample size required to estimate the population mean, µ, with a level of confidence (1 – α)·100% with a specified margin of error, E, is given by

$n = \left(\frac{z_{\alpha/2} \cdot s}{E}\right)^2$

where n is rounded up to the nearest whole number.

Example: Determining Sample Size

We again consider the problem of estimating the miles per gallon of a 2014 Toyota Camry. How large a sample is required to estimate the mean miles per gallon within 0.5 mile per gallon with 95% confidence?

Substitute the values of z, s, and E into the formula and obtain

n = 

Round up to ________. This sample size results in an interval estimate of the population mean miles per gallon of a 2014 Toyota Camry with a margin of error of 0.5 mile per gallon with 95% confidence.
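The sample-size formula for a mean can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sample_size_for_mean` is hypothetical, and the standard deviation `hypothetical_s = 2.0` is a made-up placeholder, not the value computed from the Camry data, so the printed result is not the answer to the example above.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(s, margin_of_error, confidence_level=0.95):
    """Return n = (z_{alpha/2} * s / E)^2, rounded up to a whole number."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    return ceil((z * s / margin_of_error) ** 2)

# ASSUMPTION: placeholder standard deviation in mpg, NOT the sample's actual s.
hypothetical_s = 2.0
n = sample_size_for_mean(hypothetical_s, margin_of_error=0.5)
print(f"Required sample size (with the placeholder s): {n}")
```

Replacing `hypothetical_s` with the sample standard deviation from the data gives the sample size for the example. Because E appears squared in the denominator, halving the margin of error roughly quadruples the required n.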