ADMS 2320: Business Statistics

Chapter 10: Introduction To Estimation

We use sample data to estimate a population parameter by using estimators.

Confidence interval estimator of $\mu$

This estimator is used to estimate the population mean when the population standard deviation is known:

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

The probability $1 - \alpha$ is called the confidence level; these notes write it as CI $= 1 - \alpha$.
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the lower confidence limit (LCL).
$\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the upper confidence limit (UCL).

The two sides of the confidence interval formula are based on the standardized sample mean

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

which lies between $-z_{\alpha/2}$ and $z_{\alpha/2}$ with probability $1 - \alpha$. Rearranging for $\mu$, the confidence interval can also be written as

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where, for the symbol $\pm$, the "+" gives the upper confidence limit (UCL) and the "-" gives the lower confidence limit (LCL).

Examples

Q10.27: How many rounds of golf do physicians (who play golf) play per year? A survey of 12 physicians revealed the following numbers:

3 41 17 1 33 37 18 15 17 12 29 51

Estimate with 95% confidence the mean number of rounds per year played by physicians, assuming that the number of rounds is normally distributed with a standard deviation of 12.

S10.27: This question asks us to estimate the population mean ($\mu$) when the population SD ($\sigma$) is known, so use the confidence interval estimator of $\mu$: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

CI $= 1 - \alpha$, so $.95 = 1 - \alpha$ and $\alpha = .05$
$\sigma = 12$, $n = 12$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{3 + 41 + 17 + 1 + 33 + 37 + 18 + 15 + 17 + 12 + 29 + 51}{12} = \frac{274}{12} = 22.83$$

$z_{\alpha/2} = z_{.05/2} = z_{.025}$. The area to the left of $z_{.025}$ is $1 - .025 = .975$, so from the z table $z_{.025} = 1.96$.

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 22.83 \pm 1.96\,\frac{12}{\sqrt{12}} = 22.83 \pm 6.79$$

LCL = 16.04 and UCL = 29.62, that is, $16.04 < \mu < 29.62$.

We estimate with 95% confidence that the mean number of rounds of golf physicians play per year is between 16.04 and 29.62.

Q10.37: A statistics professor is in the process of investigating how many classes university students miss each semester. To help answer this question, she took a random sample of 100 university students and asked each to report how many classes he or she had missed in the previous semester.
Estimate the mean number of classes missed by all students at the university. Use a 99% confidence level and assume that the population standard deviation is known to be 2.2 classes.

S10.37: This question asks us to estimate the population mean ($\mu$) when the population SD ($\sigma$) is known, so use the confidence interval estimator of $\mu$: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

$\sigma = 2.2$, $n = 100$, CI = 99%
CI $= 1 - \alpha$, so $.99 = 1 - \alpha$ and $\alpha = .01$
$\bar{x} = 10.21$ (from Appendix A of the textbook)

$z_{\alpha/2} = z_{.01/2} = z_{.005}$. The area to the left of $z_{.005}$ is $1 - .005 = .995$, so from the z table $z_{.005} \approx 2.57$.

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.21 \pm (2.57)\,\frac{2.2}{\sqrt{100}} = 10.21 \pm 0.5654$$

LCL = 9.64 and UCL = 10.78, that is, $9.64 < \mu < 10.78$.

We estimate with 99% confidence that the mean number of classes missed by all students at the university is between 9.64 and 10.78.

Q10.13:
a. A statistics practitioner took a random sample of 50 observations from a population with a standard deviation of 25 and computed the sample mean to be 100. Estimate the population mean with 90% confidence.
b. Repeat part (a) using a 95% confidence level.
c. Repeat part (a) using a 99% confidence level.
d. Describe the effect on the confidence interval estimate of increasing the confidence level.

S10.13:

Determining sample size

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

Solve for $n$ when the population standard deviation $\sigma$, the confidence level $1 - \alpha$, and the bound on the error of estimation $B$ are known. Any non-integer value must be rounded up.

E.g. if $n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = 84.41$, round up to 85; if $n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = 389.67$, round up to 390.

Examples

Q10.47:
a. Determine the sample size required to estimate a population mean to within 10 units given that the population standard deviation is 50. A confidence level of 90% is judged to be appropriate.
b. Repeat part (a) changing the standard deviation to 100.
c. Re-do part (a) using a 95% confidence level.
d. Repeat part (a) wherein we wish to estimate the population mean to within 20 units.

S10.47: Determining the sample size to estimate a mean: $n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$, where $B$, the margin of error $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, is given.

a.
$\sigma = 50$, $B = 10$, CI = 90% or 0.90 (CI $= 1 - \alpha$), thus $\alpha = .10$

$z_{\alpha/2} = z_{.10/2} = z_{.05}$. The area to the left of $z_{.05}$ is $1 - .05 = .95$, so from the z table $z_{.05} \approx 1.64$ (the more precise value is 1.645).

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.64 \times 50}{10}\right)^2 = 67.2400 \text{ (round up)} \approx 68$$

b. $\sigma = 100$

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.64 \times 100}{10}\right)^2 = 268.9600 \text{ (round up)} \approx 269$$

(With $z_{.05} = 1.645$ the result is 270.60, which rounds up to 271.)

c. $.95 = 1 - \alpha$, thus $\alpha = .05$

$z_{\alpha/2} = z_{.05/2} = z_{.025}$. The area to the left of $z_{.025}$ is $1 - .025 = .975$, so $z_{.025} = 1.96$.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.96 \times 50}{10}\right)^2 = 96.0400 \text{ (round up)} \approx 97$$

d. $B = 20$

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.64 \times 50}{20}\right)^2 = 16.8100 \text{ (round up)} \approx 17$$

Q10.59: A statistics professor wants to compare today's students with those 25 years ago. All his current students' marks are stored on a computer so that he can easily determine the population mean. However, the marks 25 years ago reside only in his musty files. He does not want to retrieve all the marks and will be satisfied with a 95% confidence interval estimate of the mean mark 25 years ago. If he assumes that the population standard deviation is 12, how large a sample should he take to estimate the mean to within 2 marks?

S10.59: We want to find the sample size needed to estimate the mean with a margin of error of no more than 2.

$B = 2$, $\sigma = 12$, CI = 95%
$.95 = 1 - \alpha$, so $\alpha = .05$

$z_{\alpha/2} = z_{.05/2} = z_{.025}$. The area to the left of $z_{.025}$ is $1 - .025 = .975$, so $z_{.025} = 1.96$.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \frac{(1.96 \times 12)^2}{2^2} = 138.2976 \text{ (round up)} \approx 139$$

He should take a sample of 139 marks to estimate the mean to within 2 marks.

In Chapters 10 and 11, we did inference about a population mean $\mu$ when the standard deviation $\sigma$ is known, using the z-statistic and the z-estimator of $\mu$. In this chapter, we look at a more realistic approach and use the t-statistic and the t-estimator instead.

Inference about a population mean $\mu$ when the standard deviation $\sigma$ is unknown

When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about $\mu$ is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

which is Student t-distributed with $\nu = n - 1$ degrees of freedom.
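
As a quick numerical check of the t statistic defined above, here is a minimal Python sketch. The function name `t_statistic` is an illustrative choice, not something from the course, and the inputs are the summary values given in Q12.19 ($\bar{x} = 74.5$, $s = 9$, $n = 11$, hypothesized $\mu = 70$).

```python
import math


def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (t, degrees of freedom) for a test about a mean when sigma is unknown."""
    # Standard error of the sample mean, estimated with the sample SD s.
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return t, n - 1


# Summary values from Q12.19 in these notes.
t, df = t_statistic(sample_mean=74.5, hypothesized_mean=70, sample_sd=9, n=11)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

The critical value still has to be looked up in the t table for the resulting degrees of freedom; the sketch only computes the statistic.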
The t-statistic has the same form as the z-statistic, but $\sigma$ is replaced by $s$, the sample standard deviation.

The confidence interval estimator of the population mean $\mu$ when the standard deviation $\sigma$ is unknown is

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}, \qquad \nu = n - 1$$

where
$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}}$ is the lower confidence limit (LCL)
$\bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$ is the upper confidence limit (UCL)

Examples

Q12.15: A random sample of 8 observations was drawn from a normal population. The sample mean and sample standard deviation are $\bar{x} = 40$ and $s = 10$.
a. Estimate the population mean with 95% confidence.
b. Repeat part (a) assuming that you know that the population standard deviation is $\sigma = 10$.
c. Explain why the interval estimate produced in part (b) is narrower than that in part (a).

S12.15:
a. CI $= 1 - \alpha$, so $.95 = 1 - \alpha$ and $\alpha = .05$

Since the sample standard deviation is given instead of the population standard deviation, use the confidence interval estimator of $\mu$ when $\sigma$ is unknown: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, $\nu = n - 1$

$t_{\alpha/2,\,\nu} = t_{\alpha/2,\,n-1} = t_{.025,\,7} = 2.365$ (from the t table)

$$\bar{x} \pm t_{.025,\,7}\frac{s}{\sqrt{n}} = 40 \pm 2.365\,\frac{10}{\sqrt{8}} = 40 \pm 2.365\,(3.536) = 40 \pm 8.36$$

LCL = 31.64 and UCL = 48.36

b. $\sigma = 10$

Since $\sigma$ is given, use the confidence interval estimator of $\mu$ when $\sigma$ is known (from Chapter 10): $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

$z_{\alpha/2} = z_{.05/2} = z_{.025} = 1.96$

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 40 \pm 1.96\,\frac{10}{\sqrt{8}} = 40 \pm 6.93$$

LCL = 33.07 and UCL = 46.93

c. The t-distribution is more widely spread out than the standard normal distribution. Therefore $z_{\alpha/2}$ is smaller than $t_{\alpha/2}$, and the interval in part (b) is narrower.

Q12.19:
a. A random sample of 11 observations was taken from a normal population. The sample mean and standard deviation are $\bar{x} = 74.5$ and $s = 9$. Can we infer at the 5% significance level that the population mean is greater than 70?
b. Repeat part (a) assuming that you know that the population standard deviation is 9.
c. Explain why the conclusions produced in parts (a) and (b) differ.

S12.19:
a. $n = 11$, $\bar{x} = 74.5$, $s = 9$, and $\sigma$ is unknown.
$H_0: \mu = 70$
$H_1: \mu > 70$
$\alpha = 0.05$

Rejection region (right-tail test): $t_{\alpha} = t_{\alpha,\,\nu} = t_{.05,\,11-1} = t_{.05,\,10} = 1.812$ (from the t table). Therefore reject $H_0$ if $t > t_{\alpha} = 1.812$.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{74.5 - 70}{9/\sqrt{11}} = 1.6583$$

Since $t = 1.6583 < t_{\alpha} = 1.812$, do not reject $H_0$.
There is not enough evidence to infer that the population mean is greater than 70.

b. $H_0: \mu = 70$
$H_1: \mu > 70$
The population standard deviation $\sigma = 9$ is given, so use the z distribution.

Rejection region (right-tail test): $z_{\alpha} = z_{.05} = 1.645$ (from the z table). Therefore reject $H_0$ if $z > z_{\alpha} = 1.645$.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{74.5 - 70}{9/\sqrt{11}} = 1.6583 \approx 1.66$$

Since $z = 1.6583 > z_{\alpha} = 1.645$, reject $H_0$.
(Optional) p-value $= P(z > 1.66) = 1 - .9515 = .0485$, which is less than 0.05, so $H_0$ is rejected.
There is enough evidence to infer that the population mean is greater than 70.

c. The Student t-distribution is more widely spread out than the standard normal distribution, so the critical value $t_{.05,\,10} = 1.812$ is larger than $z_{.05} = 1.645$. The same test statistic, 1.6583, falls in the rejection region only under the z test.

Q12.30: University bookstores order books that instructors adopt for their courses. The number of copies ordered matches the projected demands. However, at the end of the semester, the bookstore has too many copies on hand and must return them to the publisher. A bookstore has a policy that the proportion of books returned should be kept as small as possible. The average is supposed to be less than 10%. To see whether the policy is working, a random sample of book titles was drawn, and the fraction of the total originally ordered that are returned is recorded and listed here. Can we infer at the 10% significance level that the mean proportion of returns is less than 10%?

4 15 11 7 5 9 4 3 5 8

S12.30: $n = 10$, $\alpha = 0.10$, $\sigma$ is unknown
$H_0: \mu = 10$
$H_1: \mu < 10$ (the mean proportion of returns is less than 10%)

Sample mean and standard deviation:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{4 + 15 + 11 + 7 + 5 + 9 + 4 + 3 + 5 + 8}{10} = \frac{71}{10} = 7.1$$

$$s^2 = \frac{1}{10 - 1}\left[631 - \frac{71^2}{10}\right] = \frac{126.9}{9} = 14.1$$

where $631 = \sum x_i^2$. Therefore $s = \sqrt{14.1} = 3.7550$.

Rejection region (left-tail test): reject $H_0$ if $t < -t_{\alpha} = -t_{\alpha,\,n-1} = -t_{.10,\,9} = -1.383$ (from the t table).

Test statistic:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{7.1 - 10}{3.7550/\sqrt{10}} = -2.4422$$

Since $t = -2.4422 < -t_{\alpha} = -1.383$, reject $H_0$.
There is enough evidence to infer that the mean proportion of returns is less than 10%.

Inference about a population variance

We are interested in drawing inferences about a population's variability, so the parameter we need to investigate is the population variance $\sigma^2$, e.g. $H_1: \sigma^2 > \#$, where # stands for a specified value.

Test statistic for $\sigma^2$: the test statistic used to test hypotheses about $\sigma^2$ is

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$

which is chi-squared distributed with $\nu = n - 1$ degrees of freedom when the population random variable is normally distributed with variance equal to $\sigma^2$.

Confidence interval estimator of $\sigma^2$:

Lower confidence limit (LCL) $= \dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}}$
Upper confidence limit (UCL) $= \dfrac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$

12.71:
a. The sample variance of a random sample of 50 observations from a normal population was found to be $s^2 = 80$. Can we infer at the 1% significance level that $\sigma^2$ is less than 100?

$n = 50$, $s^2 = 80$, $\alpha = 0.01$
$H_0: \sigma^2 = 100$
$H_1: \sigma^2 < 100$

Rejection region: $\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.99,\,49} \approx 29.7$ (from the $\chi^2$ table). Reject $H_0$ if $\chi^2 < 29.7$.

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(50 - 1)\,80}{100} = 39.2; \quad \text{p-value} = .1596$$

Since $\chi^2 = 39.2 > \chi^2_{1-\alpha} = 29.7$, do not reject $H_0$.
There is not enough evidence to infer that the population variance ($\sigma^2$) is less than 100.

b. Repeat part (a) increasing the sample size to 100.

$n = 100$

Rejection region: reject $H_0$ if $\chi^2 < \chi^2_{1-.01,\,100-1} = \chi^2_{.99,\,99} \approx 70.1$ (from the $\chi^2$ table).

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(100 - 1)\,80}{100} = 79.2; \quad \text{p-value} = .0714$$

Since $\chi^2 = 79.2 > \chi^2_{1-\alpha} = 70.1$, do not reject $H_0$.
There is not enough evidence to infer that the population variance ($\sigma^2$) is less than 100.
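
The two parts of 12.71 can be re-checked with a short Python sketch. It is only an illustration under stated assumptions: the function names `chi_squared_statistic` and `left_tail_decision` are made up for this example, and the critical values 29.7 and 70.1 are typed in from the chi-squared table as quoted in the notes rather than computed.

```python
def chi_squared_statistic(n, sample_variance, hypothesized_variance):
    """Test statistic (n - 1) * s^2 / sigma_0^2, which has n - 1 degrees of freedom."""
    return (n - 1) * sample_variance / hypothesized_variance


def left_tail_decision(statistic, critical_value):
    """Left-tail test: reject H0 only when the statistic is below the critical value."""
    return "Reject H0" if statistic < critical_value else "Do not reject H0"


# (sample size, table critical value) pairs for parts (a) and (b).
# The critical values are table look-ups from the notes, not computed here.
cases = [(50, 29.7), (100, 70.1)]

results = []
for n, critical_value in cases:
    stat = chi_squared_statistic(n, sample_variance=80, hypothesized_variance=100)
    decision = left_tail_decision(stat, critical_value)
    results.append((n, stat, decision))
    print(f"n = {n}: chi-squared = {stat:.1f}, critical value = {critical_value}, {decision}")
```

Keeping the critical value as an input mirrors the table-based workflow used throughout these notes; a statistics library could supply it instead if one is available.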