Chapter 5 – Decision Making for Two Samples

Inference about Two Population Means

We want to compare the means of two populations to see whether they differ. There are two situations to consider, as shown in the following examples:

1) In an experiment designed to study the effects of illumination level on task performance (“Performance of Complex Tasks Under Different Levels of Illumination,” J. Illuminating Engineering, 1976: 235-242), subjects were required to insert a fine-tipped probe into the eyeholes of ten needles in rapid succession, both for a low light level with a black background and for a higher level with a white background. It is of interest to compare the mean times for completion of the task under the two different conditions.

2) Compare the mean lifetime, $\mu_1$, for transistors produced by production line 1 to the mean lifetime, $\mu_2$, for transistors produced by production line 2. We want to know whether these two means differ.

In the first case, we are comparing related means, using dependent samples. For each member of one sample, there is a matched member of the other sample. In the second case, we are comparing unrelated means, using independent samples. There is no natural way to match each member of one sample with a member of the other sample. We will use somewhat different procedures for hypothesis tests, depending on whether our samples are dependent or independent.

There is another issue to be considered: are the variances of the two populations equal or unequal? This issue, of course, did not arise with inference about a single population. We will see that the procedure for inference about the difference between the means depends on the comparison of the variances.

Comparing Two Means, Independent Samples

We will assume the following:

1) We have selected a random sample (r.s.) from each of the two populations. The r.s., of size $n_1$, from population 1 will be denoted by $X_{11}, X_{12}, \ldots, X_{1n_1}$. These r.v.’s are assumed to be i.i.d. with mean $\mu_1$ and variance $\sigma_1^2$. The r.s., of size $n_2$, from population 2 will be denoted by $X_{21}, X_{22}, \ldots, X_{2n_2}$. These r.v.’s are assumed to be i.i.d. with mean $\mu_2$ and variance $\sigma_2^2$.

2) The two populations are independent. This implies that all of the $n_1 + n_2$ r.v.’s listed above are independent of each other.

3) Either both populations are normal, or the conditions of the Central Limit Theorem apply. (We may also check for normality of each population using normal probability plots with the samples of data.)

We want to estimate the difference, $\mu_1 - \mu_2$, between the population means. A logical point estimator of this parameter is $\bar{X}_1 - \bar{X}_2$. It is easily shown that this statistic is an unbiased estimator of the parameter. It is also easily shown that the variance of the estimator is

$$V(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

Given these results and the assumptions listed above, it is clear that the random variable

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has an approximate standard normal distribution. We want to use this fact to do inference about the difference between the two population means. However, the random variable given above depends on two other unknown parameters. We need to estimate the two population variances.

If we can assume equal population variances, then the following statistic:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_P^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$

may be used to construct a confidence interval for $\mu_1 - \mu_2$. Here the quantity

$$S_P^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$

is the pooled variance estimate.

Confidence Intervals for Differences Between Population Means

We can find confidence interval estimates for the differences between two population means (independent samples), where the two population variances are equal, using the following formula:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, d.f.} \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}.$$

In this case, d.f. = $n_1 + n_2 - 2$.
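The two "easily shown" facts stated earlier in this section (unbiasedness of the estimator and its variance) can be sketched in a few lines. This is only an outline, relying on the i.i.d. and independence assumptions listed above and on the standard result $V(\bar{X}_i) = \sigma_i^2 / n_i$ for the mean of an i.i.d. sample:

```latex
\begin{align*}
E(\bar{X}_1 - \bar{X}_2)
  &= E(\bar{X}_1) - E(\bar{X}_2)
   = \mu_1 - \mu_2, \\[4pt]
V(\bar{X}_1 - \bar{X}_2)
  &= V(\bar{X}_1) + (-1)^2 \, V(\bar{X}_2)
     && \text{(no covariance term, by independence)} \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.
\end{align*}
% Under the equal-variance assumption, \sigma_1^2 = \sigma_2^2 = \sigma^2, so
\[
V(\bar{X}_1 - \bar{X}_2) = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right),
\]
% and replacing the common \sigma^2 by the pooled estimate gives
% the factor S_P^2 (1/n_1 + 1/n_2) under the square root.
```

The last display shows why the pooled variance estimate appears multiplied by $1/n_1 + 1/n_2$ in both the $t$ statistic and the confidence interval formula.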
Example: The accompanying table gives summary data on cube compressive strength (N/mm²) for concrete specimens made with a pulverized fuel-ash mix (“A study of twenty-five-year-old pulverized fuel ash concrete used in foundation structures,” Proceedings of the Institute of Civil Engineers, Mar. 1985, 149-165). We want to estimate the difference, $\mu_7 - \mu_{28}$, in mean compressive strengths, with 95% confidence, and interpret this interval estimate.

Age (days)    Sample Size    Sample Mean    Sample SD
 7            68             26.99          4.89
28            74             35.76          6.43

Since the sample standard deviations do not differ considerably, we assume that the population variances are equal. The pooled variance estimate is

$$s_P^2 = \frac{(n_7 - 1) s_7^2 + (n_{28} - 1) s_{28}^2}{n_7 + n_{28} - 2} = \frac{67 (4.89)^2 + 73 (6.43)^2}{140} = 33.00206.$$

The critical value is $t_{0.025,\,140} = 1.9771$. Then the endpoints of the interval are

$$(\bar{x}_7 - \bar{x}_{28}) - t_{0.025,\,140} \sqrt{s_P^2 \left( \frac{1}{n_7} + \frac{1}{n_{28}} \right)} = (26.99 - 35.76) - 1.9771 \sqrt{33.00206 \left( \frac{1}{68} + \frac{1}{74} \right)} = -10.6780,$$

and

$$(\bar{x}_7 - \bar{x}_{28}) + t_{0.025,\,140} \sqrt{s_P^2 \left( \frac{1}{n_7} + \frac{1}{n_{28}} \right)} = (26.99 - 35.76) + 1.9771 \sqrt{33.00206 \left( \frac{1}{68} + \frac{1}{74} \right)} = -6.8620.$$

We are 95% confident that the difference between the mean compressive strength after 7 days of curing and the mean compressive strength after 28 days of curing is between -10.6780 N/mm² and -6.8620 N/mm². In particular, we have a high level of confidence that the mean 28-day compressive strength is higher than the mean 7-day compressive strength.
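The arithmetic in this example can be checked with a short script. The following is a minimal sketch using only the Python standard library; the function names `pooled_variance` and `pooled_ci` are hypothetical helpers introduced here for illustration, and the critical value 1.9771 is taken from the example rather than computed.

```python
import math


def pooled_variance(n1, sd1, n2, sd2):
    """Pooled variance estimate: weighted average of the two sample variances."""
    return ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)


def pooled_ci(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Confidence interval for mu1 - mu2, assuming equal population variances.

    t_crit is the t critical value with n1 + n2 - 2 degrees of freedom,
    supplied by the caller (e.g. from a t table).
    """
    sp2 = pooled_variance(n1, sd1, n2, sd2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Summary data from the concrete strength example (7-day vs 28-day specimens).
sp2 = pooled_variance(68, 4.89, 74, 6.43)
lower, upper = pooled_ci(68, 26.99, 4.89, 74, 35.76, 6.43, t_crit=1.9771)

print(f"pooled variance = {sp2:.5f}")
print(f"95% CI for mu_7 - mu_28 = ({lower:.4f}, {upper:.4f})")
```

Passing the critical value in as an argument keeps the sketch free of external dependencies; with a statistics library available, the value could instead be computed from the degrees of freedom $n_7 + n_{28} - 2 = 140$.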