Module 27: Two Sample t-tests With Unequal Variances

This module shows how to test the hypothesis that two population means are equal when there is evidence that the two populations do not have the same variance.

Reviewed 19 July 05

The General Situation

Earlier we tested hypotheses about two population means, based on data from two independent samples, under the assumption that the two populations had the same variance. For this situation, we calculated the pooled estimate of the common variance with:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$

We then tested the null hypothesis $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$ with the test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

What if $\sigma_1^2 \ne \sigma_2^2$?

If the two population variances are not the same, then they are estimated separately and

$$\mathrm{Var}(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

is estimated by:

$$\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$

When we have two independent random samples from two different populations and there is evidence that the two population variances differ, i.e. $\sigma_1^2 \ne \sigma_2^2$, then we can base a test about the difference between the population means $\mu_1$ and $\mu_2$ on the following:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \sim t(f)$$

To deal with $\sigma_1^2 \ne \sigma_2^2$, we adjust the degrees of freedom in order to obtain a better approximation than would otherwise be the case. This adjustment requires that we calculate:

$$f = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 + 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 + 1}} - 2$$

Example

Independent random samples were taken from populations of histidine levels for males and females. The measurements were approximately normally distributed. The statistics from the two samples were:

             n     x̄        s²
Males        5     300.8    15,291.7
Females     10     153.2     2,484.62

Test the hypothesis $H_0: \mu_M = \mu_F$ vs. $H_1: \mu_M \ne \mu_F$.

1. The hypothesis: $H_0: \mu_M = \mu_F$ vs. $H_1: \mu_M \ne \mu_F$

2. The assumptions: Independent samples, normal distributions

3. The $\alpha$-level: $\alpha = 0.05$

4. The test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \sim t(f)$$

5. The critical region: Reject if $t$ is not between $\pm t_{0.975}(f)$, where

$$f = \frac{\left(\dfrac{15{,}291.7}{5} + \dfrac{2{,}484.6}{10}\right)^2}{\dfrac{\left(15{,}291.7/5\right)^2}{5 + 1} + \dfrac{\left(2{,}484.6/10\right)^2}{10 + 1}} - 2$$

so that $f = 6.99 - 2 = 4.99$. We will use $f = 5$, and $t_{0.975}(5) = 2.57058$.

6. The result:

$$t = \frac{300.8 - 153.2}{\sqrt{\dfrac{15{,}291.7}{5} + \dfrac{2{,}484.6}{10}}} = \frac{147.6}{57.50} = 2.57$$

7. The conclusion: The value calculated for $t$ above is very close to the cut point, so that $P \approx 0.05$.

99% Confidence Interval

$$C\left[(\bar{x}_1 - \bar{x}_2) - t_{1-\alpha/2}(f)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{1-\alpha/2}(f)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right] = 0.99$$

for $n_1 = 5$, $n_2 = 10$, for which

$\bar{x}_1 = 300.8$, $s_1^2 = 15{,}291.70$
$\bar{x}_2 = 153.2$, $s_2^2 = 2{,}484.62$
$\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} = 3{,}306.8$
$f = 5$, $t_{0.995}(5) = 4.0321$

We have

$$C\left[(300.8 - 153.2) - 4.0321\sqrt{3{,}306.8} \le \mu_1 - \mu_2 \le (300.8 - 153.2) + 4.0321\sqrt{3{,}306.8}\right] = 0.99$$

$$C\left[147.6 - 231.87 \le \mu_1 - \mu_2 \le 147.6 + 231.87\right] = 0.99$$

$$C\left[-84.27 \le \mu_1 - \mu_2 \le 379.47\right] = 0.99$$

Example: Am J Public Health, 1996;Oct:1436

Example: Am J Public Health, 1999;Dec:1692

Table 1 in the article by Sargent JD et al provides data on blood lead levels for samples from two communities.

                   n      x̄      SD
Providence (1)    136     6.7    2.06
Worcester (2)     153     5.4    1.10

Test the hypothesis that $\sigma_1^2 = 6.0$
Test the hypothesis that $\mu_1 = \mu_2$
Test the hypothesis that $\sigma_1^2 = \sigma_2^2$
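The arithmetic in the histidine example can be checked with a short script. The following is a minimal sketch, not part of the original module: the function name `unequal_variance_t` is hypothetical, and the critical value is copied from the worked example rather than computed. The degrees of freedom follow the formula used in this module (divisors n + 1, then subtract 2), which is not the same as the n − 1 form of the Welch-Satterthwaite approximation that many statistical packages apply, so software output may show a slightly different f.

```python
import math


def unequal_variance_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, f, se) for two independent samples with unequal variances.

    t  : test statistic for H0: mu1 = mu2
    f  : adjusted degrees of freedom, using this module's formula
         (divisors n + 1, then subtract 2)
    se : estimated standard error of the difference in sample means
    """
    v1 = var1 / n1  # s1^2 / n1
    v2 = var2 / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    f = (v1 + v2) ** 2 / (v1 ** 2 / (n1 + 1) + v2 ** 2 / (n2 + 1)) - 2
    return t, f, se


# Histidine example: males (n = 5) vs. females (n = 10).
t, f, se = unequal_variance_t(300.8, 15291.7, 5, 153.2, 2484.62, 10)

# Critical value t_0.995(5) taken from the worked example (assumed, not computed).
T_995_5DF = 4.0321

# 99% confidence interval for mu1 - mu2.
diff = 300.8 - 153.2
half_width = T_995_5DF * se
ci = (diff - half_width, diff + half_width)

print(f"t = {t:.2f}, f = {f:.2f}")
print(f"99% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

Under these assumptions the script gives values that agree with the worked example: t of about 2.57, f of about 4.99 (rounded to 5), and a 99% interval of roughly −84.27 to 379.47.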