Statistical Estimation of a Population Parameter

Statistical estimation involves both point estimation and interval estimation. Point estimation uses a single number computed from the sample data, e.g., the sample mean, as an estimate of the population parameter. It is called a "point" estimate because one point on the real number line is used to estimate the population parameter. Interval estimation uses a range of values that may contain the population parameter. Associated with this interval is a percentage, the confidence level, reflecting how confident we are that the interval contains the actual population parameter.

Confidence Interval (Interval Estimates) for the Population Mean µ

Background: Review of the Sampling Distribution of x̄ and the Margin of Sampling Error

From a population with mean µ = $190 and standard deviation σ = $64, samples of size n = 100 are selected. The sampling distribution of x̄, the sample means computed from these samples, will be (approximately) a normal curve with mean \(\mu_{\bar{x}} = \mu = 190\) and standard error

\[ se(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{64}{\sqrt{100}} = 6.4 \]

Determine the boundaries of the interval that contains 95% of all x̄ values computed from such samples:

\[ P(\bar{x}_1 < \bar{x} < \bar{x}_2) = 0.95 \]

Because \( z = \dfrac{\bar{x} - \mu}{se(\bar{x})} \), the two boundaries are

\[ \bar{x}_1 = \mu - z_{0.025}\, se(\bar{x}) \qquad \bar{x}_2 = \mu + z_{0.025}\, se(\bar{x}) \]

so that

\[ P\left(\mu - z_{0.025}\, se(\bar{x}) < \bar{x} < \mu + z_{0.025}\, se(\bar{x})\right) = 0.95 \]

The margin of sampling error is

\[ MOE = z_{0.025}\, se(\bar{x}) = z_{0.025} \frac{\sigma}{\sqrt{n}} = 1.96(6.4) = 12.5 \]

\[ P(\mu - 12.5 < \bar{x} < \mu + 12.5) = P(190 - 12.5 < \bar{x} < 190 + 12.5) = 0.95 \]

Thus, 95% of all sample means from random samples of n = 100 would fall within MOE = ±$12.5 of the population mean µ = $190. If 95% of the x̄'s fall within the MOE, then the error probability, the probability that a sample mean falls outside the MOE, is α = 5%.
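As a quick check on the arithmetic above, here is a minimal Python sketch using only the standard library. It takes µ = 190, σ = 64 and n = 100 from the text and obtains \(z_{0.025}\) from `statistics.NormalDist().inv_cdf` instead of a z-table, so the unrounded MOE comes out as about 12.54 rather than 12.5.

```python
from math import sqrt
from statistics import NormalDist

# Population values from the text.
mu, sigma, n = 190.0, 64.0, 100

se = sigma / sqrt(n)                     # standard error of x-bar: 64 / 10 = 6.4
z = NormalDist().inv_cdf(0.975)          # z_0.025, about 1.95996
moe = z * se                             # margin of sampling error
lower, upper = mu - moe, mu + moe        # middle 95% of the sampling distribution

print(f"se = {se:.1f}, z = {z:.4f}, MOE = {moe:.2f}")
print(f"95% of sample means fall between {lower:.2f} and {upper:.2f}")
```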
Example

One of the infinitely many random samples of n = 100 selected from the above population yields the following data:

205 282 142 115 230 106 108 129 123 223
81 243 195 142 160 167 292 153 215 277
149 265 132 254 175 279 93 121 179 292
215 171 105 266 211 112 246 167 175 156
139 160 273 90 262 190 273 205 98 240
225 125 136 160 199 147 199 248 132 118
200 299 210 189 182 123 168 149 207 220
297 92 131 101 120 250 269 290 157 287
207 221 194 144 182 194 106 145 91 161
149 94 293 117 170 202 227 81 247 227

The mean of this sample is

\[ \bar{x} = \frac{\sum x}{n} = \frac{18363}{100} = 183.63 \]

This sample mean is one of the 95% of all sample means that fall within the interval µ ± MOE = 190 ± 12.5.

Building Intervals Around x̄

Now, using MOE = $12.5, build an interval around the sample mean x̄ = 183.6:

L = 183.6 − 12.5 = 171.1
U = 183.6 + 12.5 = 196.1

This interval contains, or includes, the population mean µ = 190.

Another sample (data not shown) yields a sample mean x̄ = 196.7. Using the same MOE = $12.5, build an interval around this new sample mean: L, U = (184.2, 209.2). This interval also contains µ = 190.

Let's select a third sample, compute its mean, and build an interval around this x̄. The new mean is x̄ = 181.0 and the new interval is L, U = (168.5, 193.5). This interval also contains µ.

Now consider an interval that does not contain the population mean. This time x̄ = 204.2 is 14.2 above µ, so it falls outside the MOE. Therefore, the interval L, U = (191.7, 216.7) does not contain, or include, µ = 190. A situation like this is expected 5% of the time; that is, 5% of sample means would generate intervals that do not contain the population mean.

Now you can see why, in 95% of cases, the intervals built around different sample means will contain µ: 95% of x̄ values deviate from the mean by no more than MOE = ±$12.5.
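The repeated-sampling argument can also be simulated. This Python sketch first checks the four sample means quoted in the text against µ = 190 using MOE = 12.5, then draws many samples of n = 100 and counts how often x̄ ± 12.5 contains µ. The text does not state the shape of the population, so the simulation assumes a normal population with µ = 190 and σ = 64; the seed and the 5,000 repetitions are arbitrary choices.

```python
import random
from statistics import fmean

MU, SIGMA, N, MOE = 190.0, 64.0, 100, 12.5

def interval(xbar, moe=MOE):
    """Interval (L, U) = x-bar -/+ MOE."""
    return xbar - moe, xbar + moe

def contains(bounds, value):
    """True if value lies strictly inside the interval."""
    lower, upper = bounds
    return lower < value < upper

# The four sample means discussed in the text.
for xbar in (183.6, 196.7, 181.0, 204.2):
    lower, upper = interval(xbar)
    print(f"x-bar = {xbar}: ({lower:.1f}, {upper:.1f}), contains mu? {contains((lower, upper), MU)}")

# Simulation: ASSUMES a normal population with mean MU and SD SIGMA.
rng = random.Random(2024)   # arbitrary seed, for reproducibility
reps = 5000
hits = 0
for _ in range(reps):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    if contains(interval(fmean(sample)), MU):
        hits += 1
coverage = hits / reps
print(f"Share of intervals that contain mu: {coverage:.3f}")
```

With these settings the simulated share should land close to 0.95; a different seed gives a slightly different figure.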
Each time you build an interval around x̄ using MOE = 12.5, on average 95 out of every 100 such intervals would contain the population mean. Thus, when you select a sample of size n from this population and build an interval around the sample mean, you are 95% confident that this interval contains the population mean µ.

Building a Confidence Interval for µ When σ Is Not Known

Example 1

To build a 95% confidence interval for the mean vehicle speed on a freeway, a random sample of n = 100 vehicles is clocked and the following sample data are obtained. The data are in miles per hour.

71 68 71 78 76 87 66 83 89 83
73 81 86 84 82 79 79 72 89 82
72 67 66 86 89 74 86 89 76 79
86 82 70 90 78 66 80 73 75 65
87 65 76 66 71 75 73 69 90 74
70 80 70 78 66 71 68 82 86 89
72 86 85 79 70 71 73 79 76 83
71 90 77 80 77 81 78 69 79 86
75 71 78 73 71 72 80 76 76 82
78 87 90 65 90 88 88 87 89 73

Using Excel, compute the sample mean and standard deviation:

\[ \bar{x} = \frac{\sum x}{n} = 77.85 \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = 7.41 \]

The interval is built around x̄ = 77.85. The specified confidence level is 95%: we want to be 95% confident that the interval we build contains the population mean. This means that the error probability α (the probability that the interval does not contain the mean) is 5%. The formula for the confidence interval is:

\[ L, U = \bar{x} \pm MOE \qquad MOE = z_{\alpha/2}\, se(\bar{x}) \]

Given α = 0.05, \(z_{\alpha/2} = z_{0.025} = 1.96\).

To compute se(x̄), we previously used the formula \(se(\bar{x}) = \sigma/\sqrt{n}\). But this formula is no longer applicable because σ, the population standard deviation, is not known. Instead of σ we must use s, the standard deviation computed from the sample. The standard error formula then becomes:

\[ se(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{7.41}{\sqrt{100}} = 0.741 \]

\[ MOE = (1.96)(0.741) = 1.45 \text{ mph} \]

\[ L, U = 77.85 \pm 1.45 = (76.4, 79.3) \]

We can state, with a 95% level of confidence, that the average speed of all vehicles on the freeway is between 76.4 and 79.3 mph.
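Instead of Excel, the Example 1 computation can be sketched in Python with the standard `statistics` module. The helper name `z_interval` is just an illustrative choice; it follows the large-sample formula above, with s (from `statistics.stdev`, which divides by n − 1) standing in for σ and \(z_{0.025}\) taken from `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

# Example 1: vehicle speeds in mph (n = 100).
speeds = [
    71, 68, 71, 78, 76, 87, 66, 83, 89, 83,
    73, 81, 86, 84, 82, 79, 79, 72, 89, 82,
    72, 67, 66, 86, 89, 74, 86, 89, 76, 79,
    86, 82, 70, 90, 78, 66, 80, 73, 75, 65,
    87, 65, 76, 66, 71, 75, 73, 69, 90, 74,
    70, 80, 70, 78, 66, 71, 68, 82, 86, 89,
    72, 86, 85, 79, 70, 71, 73, 79, 76, 83,
    71, 90, 77, 80, 77, 81, 78, 69, 79, 86,
    75, 71, 78, 73, 71, 72, 80, 76, 76, 82,
    78, 87, 90, 65, 90, 88, 88, 87, 89, 73,
]

def z_interval(data, confidence=0.95):
    """Large-sample CI for the mean: x-bar +/- z_{alpha/2} * s / sqrt(n)."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)                                     # sample SD, divides by n - 1
    se = s / sqrt(n)                                    # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    moe = z * se
    return xbar, s, se, moe, (xbar - moe, xbar + moe)

xbar, s, se, moe, (lo, hi) = z_interval(speeds)
print(f"x-bar = {xbar:.2f}, s = {s:.2f}, se = {se:.3f}, MOE = {moe:.2f}")
print(f"95% CI: ({lo:.1f}, {hi:.1f}) mph")
```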
Example 2

To build a 95% confidence interval for the average commuting distance traveled by IUPUI students from their residence to the campus, a random sample of n = 120 students provided the following data (in miles):

15 8 6 8 10 23 22 19 16 6
7 8 10 8 21 7 15 14 13 22
24 2 12 14 13 4 4 25 17 25
7 6 25 9 10 23 19 6 9 12
10 7 1 13 19 20 5 14 22 23
8 6 11 18 12 21 13 18 24 22
14 25 12 5 16 13 23 7 16 7
3 24 20 7 13 1 5 2 12 17
1 1 23 15 13 5 5 8 6 23
12 21 13 9 5 11 12 22 5 15
16 4 9 17 2 2 20 10 17 8
13 4 13 9 15 24 2 17 5 9

Using Excel, compute the sample mean and standard deviation:

x̄ = 12.45 miles    s = 6.896 miles

The confidence interval for µ:

\[ L, U = \bar{x} \pm MOE \qquad MOE = z_{\alpha/2}\, se(\bar{x}) \qquad z_{\alpha/2} = z_{0.025} = 1.96 \]

\[ se(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{6.896}{\sqrt{120}} = 0.63 \]

\[ MOE = 1.96(0.63) = 1.23 \text{ miles} \]

\[ L, U = 12.45 \pm 1.23 = (11.22, 13.68) \]

Example 3

To develop a 95% confidence interval for the mean amount spent per customer at a famous downtown Indianapolis steakhouse, data were collected for a sample of n = 108 customers, shown below. The data are rounded to the nearest dollar.

33 34 44 38 54 33 28 45 31 42
46 32 29 56 37 51 44 53 39 30
41 31 43 49 41 32 58 43 32 32
32 54 46 57 39 53 49 34 28 36
56 46 52 45 42 39 53 35 32 52
32 52 31 58 44 32 34 36 47 45
53 28 51 32 54 47 48 40 58 58
47 42 38 54 42 47 53 40 31 44
53 45 48 38 55 57 29 38 45 42
45 57 45 34 57 42 50 33 33 35
54 31 34 33 57 30 41 56

Using Excel, compute the sample mean and standard deviation:

x̄ = $42.74    s = $9.15

The confidence interval for µ:

\[ L, U = \bar{x} \pm MOE \qquad MOE = z_{\alpha/2}\, se(\bar{x}) \qquad z_{\alpha/2} = z_{0.025} = 1.96 \]

\[ se(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{9.15}{\sqrt{108}} = 0.88 \]

\[ MOE = 1.96(0.88) = \$1.72 \]

\[ L, U = 42.74 \pm 1.72 = (\$41.02, \$44.46) \]

Confidence Intervals for µ Using "Small" Samples

When building a confidence interval for an unknown population parameter, we determine the MOE with a standard error computed from the sample standard deviation s, which serves as an estimate of the population standard deviation σ.
Thus, we are using an estimated value, s, to build an interval around another estimated value, x̄. This adds an extra level of uncertainty to our interval estimate, and the consequence of the added uncertainty is a wider margin of error. When the sample size is large, the impact of the added uncertainty on the MOE is inconsequential and can be ignored. But with small samples we cannot ignore it.

Margin of Error with Small Samples

Generally, when the sample size n is 100 or more (n ≥ 100), the sample is considered a large sample, and the MOE formula is:

\[ MOE = z_{\alpha/2} \frac{s}{\sqrt{n}} \]

But if the sample size is less than 100 (n < 100), the sample is considered a small sample, and an alternative MOE formula is used:

\[ MOE = t_{\alpha/2,\,df} \frac{s}{\sqrt{n}} \]

The term \(t_{\alpha/2,\,df}\) comes from the t-distribution, which is explained next.

The z-distribution and t-distribution compared

The standard normal z-distribution is a bell-shaped distribution with a mean of zero (0) and a standard deviation of one (1). For a given error probability α, say α = 0.05, \(z_{\alpha/2} = z_{0.025} = 1.96\).

The t-distribution is also a bell-shaped distribution with a mean of zero. However, the standard deviation of t is not 1; it varies with the degrees of freedom (df) of the distribution, which is why df appears in the subscript of \(t_{\alpha/2,\,df}\). The degrees of freedom are defined as df = n − 1 (sample size minus 1). When df = 9, the standard deviation is 1.13. (Don't worry about the formula for the standard deviation of t!) With a bigger standard deviation, t-scores are more widely dispersed around the mean of 0 than z-scores. Therefore, the t-score that bounds a tail area of 0.025 when df = 9 (t = 2.262) is located farther from zero than the z-score bounding the same tail area (z = 1.96).
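Python's standard library has no t-distribution, so the sketch below (an illustration, not a production routine) integrates the t density numerically with Simpson's rule and inverts it by bisection to get \(t_{\alpha/2,\,df}\). In practice you would use a t-table, Excel's =TINV, or a library function such as SciPy's `scipy.stats.t.ppf`. For the curious, it also uses the standard deviation of t, \(\sqrt{df/(df-2)}\) for df > 2, which gives 1.13 when df = 9. The hypothetical helper `t_interval` applies the small-sample MOE formula to the data of Examples 4 and 5 later in these notes.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist, fmean, stdev

def t_area_0_to(t, df, steps=1000):
    """Area under the t density from 0 to t, by Simpson's rule (steps must be even)."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3

def t_critical(tail, df):
    """t-score whose upper-tail area is `tail`, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    target = 0.5 - tail                  # area between 0 and the critical value
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_area_0_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_sd(df):
    """Standard deviation of the t-distribution (defined only for df > 2)."""
    return sqrt(df / (df - 2))

def t_interval(data, confidence=0.95):
    """Small-sample CI: x-bar +/- t_{alpha/2, n-1} * s / sqrt(n)."""
    n = len(data)
    xbar, s = fmean(data), stdev(data)
    moe = t_critical((1 - confidence) / 2, n - 1) * s / sqrt(n)
    return xbar, moe, (xbar - moe, xbar + moe)

z = NormalDist().inv_cdf(0.975)
print(f"z_0.025 = {z:.3f}; df = 9: sd(t) = {t_sd(9):.2f}, t_0.025 = {t_critical(0.025, 9):.3f}")
for df in (10, 20, 30, 50, 100, 200, 1000):
    print(f"df = {df:4d}: t_0.025 = {t_critical(0.025, df):.3f}")

bulbs = [7010, 8120, 6670, 9300, 9450, 9980, 8600, 6820, 9750, 8300]    # Example 4 (hours)
commute = [42, 55, 58, 63, 58, 43, 62, 56, 54, 48, 62, 61, 53, 40, 55]  # Example 5 (minutes)
for label, data in (("bulbs", bulbs), ("commute", commute)):
    xbar, moe, (lo, hi) = t_interval(data)
    print(f"{label}: x-bar = {xbar:.1f}, MOE = {moe:.2f}, CI = ({lo:.2f}, {hi:.2f})")
```

The printed t-scores should agree with the tail-area 0.025 table to three decimals. For the light bulbs the MOE comes out slightly above 885.15 hours because the code carries the unrounded t-score rather than 2.262.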
The following table shows that as the sample size (and hence the degrees of freedom) increases, the t-score converges to the z-score.

Tail area 0.025
Degrees of freedom:   10      20      30      50      100     200     1000
t-score:              2.228   2.086   2.042   2.009   1.984   1.972   1.962

You can find the t-score for other tail areas and degrees of freedom using a t-table or Excel's =TINV function (for example, =TINV(0.05, 9), where 0.05 is the total area in both tails, returns about 2.262).

Example 4

To build a 95% confidence interval for the average lifespan of compact fluorescent light bulbs (CFLs), a sample of n = 10 light bulbs was tested and the following data (in hours) were obtained:

7010 8120 6670 9300 9450 9980 8600 6820 9750 8300

Using Excel or a calculator, the mean and standard deviation of the sample are:

x̄ = 8400    s = 1237.435

\[ L, U = \bar{x} \pm MOE \qquad MOE = t_{\alpha/2,\,df} \frac{s}{\sqrt{n}} \]

n = 10, df = 10 − 1 = 9, α = 0.05, \(t_{0.025,\,9} = 2.262\)

\[ MOE = (2.262)\frac{1237.435}{\sqrt{10}} = 885.15 \text{ hours} \]

\[ L, U = 8400 \pm 885.15 = (7514.85, 9285.15) \text{ hours} \]

Example 5

To build a 95% confidence interval for the average time he spends commuting from his residence in Fishers to his office in downtown Indianapolis, Bob recorded his commuting time on 15 randomly selected days in a three-month period and obtained the following data (in minutes):

42 55 58 63 58 43 62 56 54 48 62 61 53 40 55

x̄ = 54    s = 7.531

\[ se(\bar{x}) = \frac{7.531}{\sqrt{15}} = 1.944 \qquad t_{0.025,\,14} = 2.145 \]

\[ MOE = 2.145(1.944) = 4.2 \text{ minutes} \]

\[ L, U = 54 \pm 4.2 = (49.8, 58.2) \text{ minutes} \]

Minimum Sample Size for a Desired Margin of Error (Desired Confidence Interval Width)

When building a confidence interval, the narrower the interval, the more precise, and hence the more meaningful or useful, the interval estimate. To make the interval more precise, we must make the MOE smaller, and for a given confidence level that means increasing the sample size. The proper sample size therefore depends on the margin of error desired for the statistical analysis.

Example 6

In Example 4, a sample of n = 10 light bulbs yielded a margin of error of MOE = 885.15 hours.
Suppose we want a narrower margin, say MOE = 200 hours, for a 95% confidence interval. What is the required minimum sample size for this margin of error? The formula for the minimum sample size is:

\[ n = \left( \frac{z_{\alpha/2}\, \hat{\sigma}}{MOE} \right)^2 \]

The new term in the formula is σ̂ (sigma hat), called the "planning value" for the standard deviation (see the Chapter 5 notes on how to obtain a planning value). For the light bulb example, use σ̂ = 1200 hours.

\[ n = \left( \frac{1.96 \times 1200}{200} \right)^2 = 138.3 \]

n = 139 light bulbs (always round the result UP)

Example 7

To build a 95% CI for Bob's average commuting time in Example 5 with a margin of error of ±2 minutes, what is the minimum number of days Bob should keep track of his commuting time? Use σ̂ = 8 minutes as the planning value.

\[ n = \left( \frac{1.96 \times 8}{2} \right)^2 = 61.47 \]

n = 62 days

Confidence Interval for the Population Proportion π

To build a confidence interval for the population proportion, first compute the sample proportion p̄ from the sample and then determine the margin of error using p̄:

\[ L, U = \bar{p} \pm MOE \qquad MOE = z_{\alpha/2}\, se(\bar{p}) \]

To compute se(p̄), use p̄ as an estimate of π in the standard error formula:

\[ se(\bar{p}) = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]

Example 7

To build a 95% confidence interval for the proportion of Indiana residents who smoke cigarettes, a sample of 750 Hoosiers was surveyed, and 195 said they smoked cigarettes regularly. Compute the sample proportion and the margin of error to build the interval.

\[ \bar{p} = \frac{195}{750} = 0.26 \]

\[ se(\bar{p}) = \sqrt{\frac{0.26(1 - 0.26)}{750}} = 0.0160 \]

α = 1 − 0.95 = 0.05, \(z_{\alpha/2} = z_{0.025} = 1.96\)

\[ MOE = 1.96(0.016) = 0.031 \approx 0.03 \]

\[ L, U = 0.26 \pm 0.03 = (0.23, 0.29) \]

We are 95% confident that the proportion of Hoosiers who smoke cigarettes is between 0.23 and 0.29.

Example 8

In a recent poll, of the 1,100 Americans surveyed, 737 said high gas prices have caused them financial hardship. Build a 95% confidence interval for the proportion of all Americans who feel high gas prices have caused them financial hardship.
\[ \bar{p} = \frac{737}{1100} = 0.67 \]

\[ se(\bar{p}) = \sqrt{\frac{0.67(1 - 0.67)}{1100}} = 0.0142 \]

α = 1 − 0.95 = 0.05, \(z_{\alpha/2} = z_{0.025} = 1.96\)

\[ MOE = 1.96(0.0142) = 0.028 \approx 0.03 \]

\[ L, U = 0.67 \pm 0.03 = (0.64, 0.70) \]

Minimum Sample Size for a Desired Margin of Error

Depending on the context or nature of the statistical investigation, the researcher may require a specific width, or precision, for the interval estimate of the population proportion. For a given confidence level, the narrower the interval estimate, the more precise the estimate of the population parameter. The precision of the interval estimate depends on the margin of error, which varies inversely with the square root of the sample size:

\[ MOE = z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]

Thus, we can specify the MOE in advance of the study and determine the sample size that would yield that margin of error. Rearranging the MOE formula to solve for n:

\[ n = \left( \frac{z_{\alpha/2}}{MOE} \right)^2 \hat{\pi}(1 - \hat{\pi}) \]

Note that in the n formula we have replaced p̄ with π̂ (pi-hat). We cannot use p̄ because the sample proportion can only be computed from a sample that has already been selected, whereas this formula is used to decide the sample size before sampling. The symbol π̂ denotes the "planning value". In many cases a planning value of π̂ = 0.50 is used, because π̂(1 − π̂) is largest at 0.50; this gives the largest minimum sample size for a desired margin of error.

Example 9

To build a 95% confidence interval with a margin of error of ±0.03 (3 percentage points) for the proportion of likely voters who prefer the candidate Ima Loozer in a statewide election, how many likely voters should be randomly contacted? Since this is a two-way race, a planning value of 0.50 is used.

α = 1 − 0.95 = 0.05, \(z_{\alpha/2} = z_{0.025} = 1.96\), π̂ = 0.50, MOE = 0.03

\[ n = 0.5(0.5)\left( \frac{1.96}{0.03} \right)^2 = 1067.11 \]

n = 1068 (round UP)

Example 10

Suppose the election in Example 9 is a three-way race with a strong third-party candidate, Iwanna Winn. What is the minimum sample size for a 95% confidence interval with a margin of error of ±0.03 for the proportion of likely voters who prefer Ima Loozer? This time use a planning value of π̂ = 0.35.
\[ n = 0.35(0.65)\left( \frac{1.96}{0.03} \right)^2 = 971.07 \]

n = 972 (round UP)
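The sample-size formulas (Examples 6, 7, 9 and 10) and the proportion interval (the smoking and gas-price examples) are short enough to sketch as Python helpers. The function names are illustrative, z = 1.96 is hard-coded as in the text, and `math.ceil` implements the round-UP rule.

```python
from math import ceil, sqrt

Z = 1.96  # z_0.025 for 95% confidence, as used in the text

def n_for_mean(sigma_hat, moe, z=Z):
    """Minimum n to estimate a mean: (z * sigma_hat / MOE)^2, rounded UP."""
    return ceil((z * sigma_hat / moe) ** 2)

def n_for_proportion(pi_hat, moe, z=Z):
    """Minimum n to estimate a proportion: (z / MOE)^2 * pi_hat * (1 - pi_hat), rounded UP."""
    return ceil((z / moe) ** 2 * pi_hat * (1 - pi_hat))

def proportion_interval(successes, n, z=Z):
    """CI for a proportion, using p-bar in place of pi in the standard error."""
    p_bar = successes / n
    se = sqrt(p_bar * (1 - p_bar) / n)
    moe = z * se
    return p_bar, se, moe, (p_bar - moe, p_bar + moe)

print(n_for_mean(1200, 200), n_for_mean(8, 2))                     # light bulbs, Bob's commute
print(n_for_proportion(0.50, 0.03), n_for_proportion(0.35, 0.03))  # two-way, three-way race
for successes, n in ((195, 750), (737, 1100)):                     # smokers, gas prices
    p_bar, se, moe, (lower, upper) = proportion_interval(successes, n)
    print(f"p-bar = {p_bar:.2f}, se = {se:.4f}, MOE = {moe:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```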