Chapter 8 - Comparing Two Population Means 8.1 In this chapter, we will compare the means of two populations. For example, a. Does Advil fight pain better than Tylenol? b. Does Longhorn Steakhouse have a longer wait than Outback Steakhouse? c. Are men older than women when they graduate from college? d. Are SAT scores higher after taking a preparation course than before? e. Are the starting salaries for computer science majors higher than those of marketing majors? The method we first consider requires not only that the samples selected from the two populations be random but that they be independent samples. Independent Samples: Two samples are independent if the values in one sample are not related to values in the other sample. THEOREM. If independent and random samples of sizes n1 and n2 (n1 and n2 30) are drawn from two populations with means 1 and 2 and with standard deviations of 1 and 2, respectively, then the sampling distribution of X 1 - X 2 has the following properties: 1. E( X 1 - X 2 ) = 1 - 2 2. S.D( X 1 - X 2 ) = 1 / n1 2 / n2 3. X 1 - X 2 is approximately distributed. That is, X 1 - X 2 ~ N(1 - 2, 1 / n1 2 / n 2 ) 2 2 2 normally 2 Notes: 1. A CI for 1 - 2 has the form ( X 1 - X 2 ) Z 1 / n1 2 / n2 /2 2 2 2. If 1 and 2 are unknown, we will use s1 and s2 to estimate 1 and 2, respectively. Example 1.The same standardized test is given to students selected at random from two schools. The following table summarizes the results: School A: sample size = 50, mean = 68.7, sd = 6.0 School B: sample size = 40, mean = 65.1, sd = 6.4 (a) Construct a 95% C.I. estimate for 1 - 2. (b) Construct an 84% C.I. estimate for 1 - 2. Solution (a) [1.012, 6.188] (b) [1.738, 5.462]; z = 1.41 Example 2. A small amount of selenium, from 50 to 200 micrograms per day, is considered essential to good health. Suppose that two random samples of 30 adults were selected, one from region A and another from region B of the United States. A day's intake of selenium from both, liquids and solids, was recorded for the 30 adults from region A and region B. The following was observed: Region A: mean = 167.1, sd = 24.3 micrograms. Region B: mean = 140.9, sd = 17.6 micrograms. (a) Construct a 90% C.I. estimate for 1 - 2. (b) Construct a 68% C.I. estimate for 1 - 2. Solution (a) [17.16, 35.24] (b) [20.78, 31.62] z = 0.99 Hypothesis Testing for 1 - 2 Example 3. A sampling of 50 women and 40 men produced a mean age at death for women of 75.5 years and for men of 67.9 years with a sd of 16.2 and 18.3 respectively. At the 0.05 level of significance, do women live longer than men? Solution. Let 1 = mean age for women at death. 2 = mean age for men at death. Step1: State the null and alternative hypotheses. H : 1 - 2 = 0 H : 1 - 2 > 0 0 a Step 2: Write the given information using proper symbols. x1 = 75.5, n1 = 50, s1 = 16.2 x2 = 67.9, n2 = 40, s2 = 18.3, = 0.05 S.D( X 1 - X 2 )= 1 / n1 2 / n2 = 3.6907 2 2 Step 3: Find the test statistic. Z = (VariableSD Mean) = [(75.5 – 67.9) – 0]/3.6907 = 2.06 Step 4: Find the p-value p-value = P[Z > 2.06] = 0.0197 Step 5: Conclusion Since the p-value < , reject the null hypothesis. Yes, women live longer than men. Example 4. Last year a sample of 100 new cars manufactured by a Japanese car-maker averaged 33.8 mpg. This year a sample of 100 new cars of the same car-maker averaged 35.1 mpg. If each sample had a sd of 10 mpg, can the carmaker conclude that they have significantly increased the performance their cars? Use 0.01 as the level of significance. Solution. Let 1 = average mpg of last year’s cars. 2 = average mpg of this year’s cars. Step1: State the null and alternative hypotheses. H : 1 - 2 = 0 H : 1 - 2 < 0 0 a Step 2: Write the given information using proper symbols. x1 = 33.8, n1 = 100, s1 = 10 x2 = 35.1, n2 = 100, s2 = 10, = 0.01 S.D( X 1 - X 2 )= 1 / n1 2 / n2 = 1.414 2 2 Step 3: Find the test statistic. Z = (VariableSD Mean) = [(33.8 – 35.1) – 0]/1.414 = -0.92 Step 4: Find the p-value p-value = P[Z <-0.92] = 0.1788 Step 5: Conclusion Since the p-value >= , do not reject the null hypothesis. No, they have not significantly increased the performance of their cars. Example 5. A reading test is given to an elementary school class that consists of 42 Anglo-American children and 40 Mexican-American children. The means for the Anglo-American and Mexican-American children were 74 and 70, respectively whereas the sds for the Anglo-American and the Mexican-American children were 8 and 10, respectively. Is the difference between the means of the two groups significant at the 0.05 level? Solution. Let 1 = mean score for Anglo-Americans 2 = mean score for Mexican-Americans. Step1: State the null and alternative hypotheses. H : 1 - 2 = 0 H : 1 - 2 0 0 a Step 2: Write the given information using proper symbols. x1 = 74, n1 = 42, s1 = 8 x2 = 70, n2 = 40, s2 = 10, = 0.05 S.D( X 1 - X 2 )= 1 / n1 2 / n2 = 2.006 2 2 Step 3: Find the test statistic. Z = (VariableSD Mean) = [(74. – 70) – 0]/2.006 = 1.994 Step 4: Find the p-value p-value = P[|Z| > 1.994] =2(0.0233) = 0.0466 Step 5: Conclusion Since the p-value < , reject the null hypothesis. Yes, there is a significant difference between the means of the two groups. HW: 9-19(odd), 29, 30 (pp. 396-400) 8.2 Inferences For Two Populations Means Using Independent Samples (Standard Deviations Assumed Equal) Assumptions: 1. Two independent samples of data, 2. Both populations are normally distributed, 3. Variances of both populations are equal, but unknown, 4. Both samples are small (n1, n2 < 30). Because we are assuming that the variances of two populations are equal, we say that there is a = common variance, . In other words, = . The way that we estimate is with a value called the pooled estimate of common variance, s . 2 2 2 1 2 2 2 2 p s 2 p = (n1 1) s12 (n2 1) s 2 2 n1 n2 2 Distribution of the Pooled t-Statistic Under the assumptions above, the variable t= ( x1 x 2) ( 1 2) sp 1 1 n1 n2 has the t-distribution with df = n1 + n2 – 2 (See page 403) Under the assumptions 1-4 above, A CI for 1 - 2 is ( x1 x2 ) t s /2 p 1 1 n1 n2 where 1 - is the confidence level. Example 6. A survey of recent graduates with degrees in Marketing and Accounting revealed the following starting salaries in thousands of dollars. Marketing: 24.6, 27.8, 25.8, 20.8, 23.1, 25.2 24.7, 26.1, 22.6, 23.8 Accounting: 26.7, 24.9, 27.1, 23.1, 27.5, 27.4, 24.9, 26.3, 28.9, 26.4, 28.1, 29.7, 25.5, 24.9, 28.5 Is there an evidence at the 5% significance level that Accounting majors have higher starting salaries than Marketing majors? Solution 1 = mean salary of Marketing majors 2 = mean salary of Accounting majors. Step1: State the null and alternative hypotheses. H : 1 - 2 = 0 H : 1 - 2 < 0 0 a Step 2: Write the given information using proper symbols. x1 = 24.45, n1 = 10, s1 = 1.98 x2 = 26.66, n2 = 15, s2 = 1.79, = 0.05, df = 10+15 – 2 = 23 The pooled estimate of common variance, s = (n1 1)ns11 n2(n22 1)s2 = 3.482 2 2 p 2 S.D( X 1 - X2 )= s p 1 1 n1 n2 = 0.762 Step 3: Find the test statistic. t= ( x1 x 2) ( 1 2) sp 1 1 n1 n2 = -2.90 Step 4: Find the p-value P-value = P[t < -2.90] = .004 Step 5: Conclusion Since the p-value < 0.05, reject the null hypothesis. Yes, there is strong evidence that Accounting majors have higher starting salaries than Marketing majors. Example 7. A company assembles large electronic devices. The assembly method they use is rather simple and has been used for a long time. Recently, the company has become aware of a new assembly technique. The company is basically satisfied with their current method of assembly. However, if the company finds that the new method significantly reduces assembly time, they would adopt this new method. The company decides to do a test to compare the two methods over the course of a day. One production line will use the current method and the other production line will try the new method. The data (in minutes) are shown below: Current: 37.3, 55.9, 45.4, 52.2, 39.1, 34.4 New : 47.4, 57.3, 39.8, 60.4, 46.7, 36.4 Should the company adopt the new method? Use a significance level of 10%. Solution 1 = mean assembly time with the current method. 2 = mean assembly time with the new method. Step1: State the null and alternative hypotheses. H : 1 - 2 = 0 H : 1 - 2 >0 0 a Step 2: Write the given information using proper symbols. x1 = 44.05, n1 = 6, s1 = 8.62 x2 = 48.00, n2 = 6, s2 = 9.42, = 0.10, df = 6+6 – 2 = 10 The pooled estimate of the common variance, s = (n1 1)ns11 n2(n22 1)s2 = 81.52 2 2 2 p S.D( X 1 - X2 )= s p 1 1 n1 n2 = 5.21 Step 3: Find the test statistic. t= ( x1 x 2) ( 1 2) sp 1 1 n1 n2 = -0.758 Step 4: Find the p-value p-value = P(t > -0.76) = 0.77 Step 5: Conclusion Since the p-value >= 0.10, do not reject the null hypothesis. The company should not adopt the new method. Example 8. Find a 90% CI for 1 - 2. Assume that the variance of both populations to be the same. You are given the following information: = 42.13 , n1 = 10, s1 = 0.685 x2 = 43.23, n2 = 10, s2 = 0.750, df = 10+10 – 2 = 18 s = (n1 1)ns11 n2(n22 1)s2 = 0.516 x1 2 2 2 p S.D( X 1 - X2 )= s p 1 1 n1 n2 = 0.321 A CI for 1 - 2 is ( x1 x2 ) t s /2 p 1 1 n1 n2 = (42.13 – 43.23) 1.734(0.321) = -1.10 0.56 = [-1.66, -0.54] HW: 11, 17, 18 and 21 (pp. 406-408)