155S7.5_3 Estimating a Population Variance November 01, 2010 Key Concept MAT 155 Statistical Analysis Dr. Claude Moore Cape Fear Community College This section we introduce the chi­square probability distribution so that we can construct confidence interval estimates of a population standard deviation or variance. We also present a method for determining the sample size required to estimate a population standard deviation or variance. Chapter 7 Estimates and Sample Sizes 7­1 Review and Preview 7­2 Estimating a Population Proportion 7­3 Estimating a Population Mean: σ Known 7­4 Estimating a Population Mean: σ Not Known 7­5 Estimating a Population Variance Chi­Square Distribution Chi­Square Distribution 2 In a normally distributed population with variance σ , assume that we randomly select independent samples of size n and, for each sample, compute the sample variance s2 (which is the square of the sample standard deviation s). The sample statistic (pronounced chi­ square) has a sampling distribution called the chi­ square distribution. where n = sample size s 2 = sample variance σ2 = population variance degrees of freedom = n – 1 1 155S7.5_3 Estimating a Population Variance Properties of the Distribution of the Chi­Square Statistic 1. The chi­square distribution is not symmetric, unlike the normal and Student t distributions. As the number of degrees of freedom increases, the distribution becomes more symmetric. November 01, 2010 Properties of the Distribution of the Chi­Square Statistic – cont. 2. The values of chi­square can be zero or positive, but they cannot be negative. 3. The chi­square distribution is different for each number of degrees of freedom, which is df = n – 1. As the number of degrees of freedom increases, the chi­square distribution approaches a normal distribution. In Table A­4, each critical value of corresponds to an area given in the top row of the table, and that area represents the cumulative area located to the right of the critical value. Chi­Square Distribution Chi­Square Distribution for df = 10 and df = 20 Example Example Critical Values of the Chi­Square A simple random sample of ten voltage levels is obtained. Construction of a confidence interval for the population standard deviation σ requires the left and right critical values of corresponding to a confidence level of 95% and a sample size of n = 10. Find the critical value of separating an area of 0.025 in the left tail, and find the critical value of separating an area of 0.025 in the right tail. 2 155S7.5_3 Estimating a Population Variance Example For a sample of 10 values taken from a normally distributed population, the chi­square statistic = (n – 1)s2/σ2 has a 0.95 probability of falling between the chi­square critical values of 2.700 and 19.023. Instead of using Table A­4, technology (such as STATDISK, Excel, and Minitab) can be used to find critical values of . A major advantage of technology is that it can be used for any number of degrees of freedom and any confidence level, not just the limited choices included in Table A­4. November 01, 2010 Estimators of σ 2 The sample variance s2 is the best point estimate of the population variance σ2. Estimators of σ The sample standard deviation s is a commonly used point estimate of σ (even though it is a biased estimate). Confidence Interval for Estimating a Population Standard Deviation or Variance σ = population standard deviation s = sample standard deviation n = number of sample values Confidence Interval for Estimating a Population Standard Deviation or Variance Requirements: 1. The sample is a simple random sample. 2. The population must have normally distributed values (even if the sample is large). = left­tailed critical value of σ 2 = population variance s 2 = sample variance E = margin of error = right­tailed critical value of 3 155S7.5_3 Estimating a Population Variance Confidence Interval for Estimating a Population Standard Deviation or Variance Confidence Interval for the Population Variance σ 2 Procedure for Constructing a Confidence Interval for σ or σ 2 1. Verify that the required assumptions are satisfied. 2. Using n – 1 degrees of freedom, refer to Table A­4 or use technology to find the critical values and that correspond to the desired confidence level. 3. Evaluate the upper and lower confidence interval limits using this format of the confidence interval: November 01, 2010 Confidence Interval for Estimating a Population Standard Deviation or Variance Confidence Interval for the Population Standard Deviation σ Procedure for Constructing a Confidence Interval for σ or σ 2 ­ cont 4. If a confidence interval estimate of σ is desired, take the square root of the upper and lower confidence interval limits and change σ 2 to σ. 5. Round the resulting confidence level limits. If using the original set of data to construct a confidence interval, round the confidence interval limits to one more decimal place than is used for the original set of data. If using the sample standard deviation or variance, round the confidence interval limits to the same number of decimals places. 4 155S7.5_3 Estimating a Population Variance Confidence Intervals for Comparing Data Caution Confidence intervals can be used informally to compare the variation in different data sets, but the overlapping of confidence intervals should not be used for making formal and final conclusions about equality of variances or standard deviations. November 01, 2010 Example: The proper operation of typical home appliances requires voltage levels that do not vary much. Listed below are ten voltage levels (in volts) recorded in the author’s home on ten different days. These ten values have a standard deviation of s = 0.15 volt. Use the sample data to construct a 95% confidence interval estimate of the standard deviation of all voltage levels. 123.3 123.5 123.7 123.4 123.6 123.5 123.5 123.4 123.6 123.8 Example: Example: Requirements are satisfied: simple random sample and normality n = 10 so df = 10 – 1 = 9. Use table A­4 to find: Construct the confidence interval: n = 10, s = 0.15 5 155S7.5_3 Estimating a Population Variance Example: Evaluation the preceding expression yields: Finding the square root of each part (before rounding), then rounding to two decimal places, yields this 95% confidence interval estimate of the population standard deviation: Based on this result, we have 95% confidence that the limits of 0.10 volt and 0.27 volt contain the true value of σ. Determining Sample Sizes November 01, 2010 Determining Sample Sizes The procedures for finding the sample size necessary to estimate s2 are much more complex than the procedures given earlier for means and proportions. Instead of using very complicated procedures, we will use Table 7­2. STATDISK also provides sample sizes. With STATDISK, select Analysis, Sample Size Determination, and then Estimate St Dev. Minitab, Excel, and the TI­83/84 Plus calculator do not provide such sample sizes. Example: We want to estimate the standard deviation s of all voltage levels in a home. We want to be 95% confident that our estimate is within 20% of the true value of σ. How large should the sample be? Assume that the population is normally distributed. From Table 7­2, we can see that 95% confidence and an error of 20% for σ correspond to a sample of size 48. We should obtain a simple random sample of 48 voltage levels form the population of voltage levels. 6 155S7.5_3 Estimating a Population Variance Recap November 01, 2010 384/6. Find the critical values and that correspond to the given confidence level and sample size: 95%; n = 20 In this section we have discussed: • The chi­square distribution. • Using Table A­4. • Confidence intervals for the population variance and standard deviation. • Determining sample sizes. 384/8. Find the critical values and that correspond to the given confidence level and sample size: 90%; n = 51 Use the given confidence level and sample data to find a confidence interval for the population standard deviation In each case, assume that a simple random sample has been selected from a population that has a normal distribution. 384/10. Speeds of Drivers Ticketed in a 65 mi/h Zone on the Massachusetts Turnpike 95% confidence; n = 25, x = 81.0 mi/h, s = 2.3 mi/h. 7 155S7.5_3 Estimating a Population Variance Use the given confidence level and sample data to find a confidence interval for the population standard deviation In each case, assume that a simple random sample has been selected from a population that has a normal distribution. 384/12. Reaction Times of NASCAR Drivers 99% confidence; n = 8, x = 1.24 sec, s = 0.12 sec. Assume that each sample is a simple random sample obtained from a normally distributed population. Use Table 7­2 on page 382 to find the indicated sample size. 384/16. Find the minimum sample size needed to be 95% confident that the sample variance is within 20% of the population variance. November 01, 2010 Assume that each sample is a simple random sample obtained from a normally distributed population. Use Table 7­2 on page 382 to find the indicated sample size. 384/14. Find the minimum sample size needed to be 95% confident that the sample standard deviation s is within 30% of σ. Is this sample size practical in most applications? 385/20. Pulse Rates of Men and Women Data Set 1 in Appendix B includes 40 pulse rates of men, and those pulse rates have a mean of 69.4 beats per minute and a standard deviation of 11.3 beats per minute. That data set also includes 40 pulse rates of women, and those pulse rates have a mean of 76.3 beats per minute and a standard deviation of 12.5 beats per minute. a. Construct a 99% confidence interval estimate of the standard deviation of the pulse rates of men. b. Construct a 99% confidence interval estimate of the standard deviation of the pulse rates of women. c. Compare the variation of the pulse rates of men and women. Does there appear to be a difference? It appears that there is no difference between the standard deviation of the pulse rate for men and women because the men's interval is contained in the women's interval. 8 155S7.5_3 Estimating a Population Variance November 01, 2010 386/24. Comparing Waiting Lines 385/22. Designing Theater Seats In the course of designing theater seats, the sitting heights (in mm) of a simple random sample of adult women is obtained, and the results are listed below (based on anthropometric survey data from Gordon, Churchill, et al.). Use the sample data to construct a 95% confidence interval estimate of σ, the standard deviation of sitting heights of all women. Does the confidence interval contain the value of 35 mm, which is believed to be the standard deviation of sitting heights of women? 849 807 821 859 864 877 772 848 802 807 887 815 a. The listed values are waiting times (in minutes) of customers at the Jefferson Valley Bank, where customers enter a single waiting line that feeds three teller windows. Construct a 95% confidence interval for the population standard deviation σ. 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7 b. The listed values are waiting times (in minutes) of customers at the Bank of Providence, where customers may enter any one of three different lines that have formed at three teller windows. Construct a 95% confidence interval for the population standard deviation σ. 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0 c. Interpret the results found in parts (a) and (b). Do the confidence intervals suggest a difference in the variation among waiting times? Which arrangement seems better: the single­line system or the multiple­line system? 95% confidence interval for population standard deviation is (24.78, 59.39). 386/24. Comparing Waiting Lines a. The listed values are waiting times (in minutes) of customers at the Jefferson Valley Bank, where customers enter a single waiting line that feeds three teller windows. Construct a 95% confidence interval for the population standard deviation σ. 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7 b. The listed values are waiting times (in minutes) of customers at the Bank of Providence, where customers may enter any one of three different lines that have formed at three teller windows. Construct a 95% confidence interval for the population standard deviation σ. 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0 c. Interpret the results found in parts (a) and (b). Do the confidence intervals suggest a difference in the variation among waiting times? Which arrangement seems better: the single­line system or the multiple­line system? 9