Chapter 24
Comparing Means: Confidence Intervals and Hypothesis Tests for the Difference between Two Population Means µ1 − µ2

Confidence Intervals for the Difference between Two Population Means µ1 − µ2: Independent Samples
• Two independent random samples are drawn from the two populations of interest.
• Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.

Population 1: parameters $\mu_1$ and $\sigma_1^2$ (values unknown); sample size $n_1$; statistics $\bar{x}_1$ and $s_1^2$.
Population 2: parameters $\mu_2$ and $\sigma_2^2$ (values unknown); sample size $n_2$; statistics $\bar{x}_2$ and $s_2^2$.
Estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$.

Sampling distribution model for $\bar{x}_1 - \bar{x}_2$?
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad SD(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Estimate the SD with
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Shape? Approximately a t model, with degrees of freedom
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
A sometimes-used (but not always very good) estimate of the degrees of freedom is $\min(n_1 - 1, n_2 - 1)$.
[Figure: t model for $\bar{x}_1 - \bar{x}_2$, centered at $\mu_1 - \mu_2$]

Confidence Interval for µ1 − µ2
$$(\bar{x}_1 - \bar{x}_2) \pm t_{df}^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t_{df}^*$ is the value from the t-table that corresponds to the confidence level, and df is computed with the formula above.

Example: "Cameron Crazies". Confidence interval for µ1 − µ2
Do the "Cameron Crazies" at Duke home games help the Blue Devils play better defense? Below are the points allowed by Duke (men) at home and on the road for the conference games from a recent season.

Points allowed at home:      44  56  44  54  75  101  91  81
Points allowed on the road:  56  70  74  80  67   65  79  58

Home: $\bar{x}_1 = 68.25$, $s_1 = 21.8$, $n_1 = 8$
Road: $\bar{x}_2 = 68.63$, $s_2 = 8.9$, $n_2 = 8$

Calculate a 95% CI for $\mu_1 - \mu_2$, where
$\mu_1$ = mean points per game allowed by Duke at home;
$\mu_2$ = mean points per game allowed by Duke on the road.
• $n_1 = 8$, $n_2 = 8$; $s_1^2 = 475.36$ and $s_2^2 = 79.41$ (the exact sample variances; squaring the rounded values 21.8 and 8.9 would give 475.24 and 79.21).
$$df = \frac{\left(\frac{475.36}{8} + \frac{79.41}{8}\right)^2}{\frac{1}{7}\left(\frac{475.36}{8}\right)^2 + \frac{1}{7}\left(\frac{79.41}{8}\right)^2} \approx 9.27$$
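The standard-error and degrees-of-freedom arithmetic above is easy to get wrong by hand. The following is a minimal Python sketch (standard library only) of the unequal-variance formulas from these slides; `welch_se_df` and `conservative_df` are hypothetical helper names introduced for illustration. The sketch does not look up $t^*$; that value still comes from the t-table.

```python
import math


def welch_se_df(s1, n1, s2, n2):
    """Unequal-variance SE and df for x̄1 − x̄2.

    s1, s2: sample standard deviations; n1, n2: sample sizes.
    Hypothetical helper written for this example.
    """
    v1 = s1 ** 2 / n1  # estimated variance of x̄1
    v2 = s2 ** 2 / n2  # estimated variance of x̄2
    se = math.sqrt(v1 + v2)
    # Degrees-of-freedom formula from the "Shape?" slide
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


def conservative_df(n1, n2):
    """The simpler, often too conservative, choice min(n1 − 1, n2 − 1)."""
    return min(n1 - 1, n2 - 1)


# Duke example: pass the square roots of the variances 475.36 and 79.41
se, df = welch_se_df(math.sqrt(475.36), 8, math.sqrt(79.41), 8)
print(f"SE = {se:.3f}, df = {df:.3f}, conservative df = {conservative_df(8, 8)}")
```

For the Duke data this gives SE ≈ 8.33 and df ≈ 9.275 (the slide truncates to 9.27); either way the t-table row used is df = 9, while the conservative rule would use df = 7.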
• To use the t-table, we round df down to 9: $t_9^* = 2.2622$.
• The confidence interval estimator for the difference between two means is
$$(\bar{x}_1 - \bar{x}_2) \pm t_9^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (68.25 - 68.63) \pm 2.2622\sqrt{\frac{475.36}{8} + \frac{79.41}{8}} = -0.38 \pm 18.84 = (-19.22,\ 18.46)$$

Interpretation
• The 95% CI for $\mu_1 - \mu_2$ is $(-19.22, 18.46)$.
• Since the interval contains 0, there appears to be no significant difference between $\mu_1$ (mean points per game allowed by Duke at home) and $\mu_2$ (mean points per game allowed by Duke on the road).
• The Cameron Crazies appear to have no effect on the ABILITY of the Duke men to play defense. How can this be?

Beware!! Common Mistake!!!
A common mistake is to calculate a one-sample confidence interval for $\mu_1$, a one-sample confidence interval for $\mu_2$, and then to conclude that $\mu_1$ and $\mu_2$ are equal if the confidence intervals overlap. This is WRONG because the variability in the sampling distribution for $\bar{x}_1 - \bar{x}_2$ from two independent samples is more complex and must take into account variability coming from both samples. Hence the more complex formula for the standard error:
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

INCORRECT: two single-sample 95% confidence intervals

            Male    Female
mean        19.4    17.9
st. dev. s  2.52    3.39
n           50      50

Male interval: (18.68, 20.12)
Female interval: (16.94, 18.86)
The confidence interval for the male mean and the confidence interval for the female mean overlap, suggesting no significant difference between the true mean for males and the true mean for females.

CORRECT: the two-sample 95% confidence interval, of the form
$$(\bar{x}_1 - \bar{x}_2) \pm t^*_{.025,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$
for the difference $\mu_{male} - \mu_{female}$ between the means is (0.313, 2.69). The interval is entirely positive, suggesting a significant difference between the true mean for males and the true mean for females (evidence that the true male mean is larger than the true female mean).
[Figure: number line marking 0, 0.313, 1.5 and 2.69]

Reason for Contradictory Result
It's always true that $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for $a, b \ge 0$.
Specifically,
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \frac{s_1}{\sqrt{n_1}} + \frac{s_2}{\sqrt{n_2}} = SE(\bar{x}_1) + SE(\bar{x}_2)$$
Two one-sample intervals overlap whenever $|\bar{x}_1 - \bar{x}_2|$ is less than the sum of their margins of error, but the two-sample interval excludes 0 as soon as $|\bar{x}_1 - \bar{x}_2|$ exceeds its own, smaller, margin of error. In the example above, the one-sample margins are 0.72 and 0.96, which sum to 1.68 > 1.5, so the intervals overlap; the two-sample margin is only 1.19 < 1.5, so the two-sample interval excludes 0.

Does smoking damage the lungs of children exposed to parental smoking?
Forced vital capacity (FVC) is the volume (in milliliters) of air that an individual can exhale in 6 seconds. FVC was obtained for a sample of children not exposed to parental smoking and a group of children exposed to parental smoking.

Parental smoking   FVC x̄   s      n
Yes                75.5    9.3    30
No                 88.2    15.1   30

We want to know whether parental smoking decreases children's lung capacity as measured by the FVC test. Is the mean FVC lower in the population of children exposed to parental smoking?

$\mu_1$ = mean FVC of children with a smoking parent; $\mu_2$ = mean FVC of children without a smoking parent.
The df formula gives df = 48.23, so $t^* = 2.0104$ for a 95% confidence interval for $\mu_1 - \mu_2$:
$$(\bar{x}_1 - \bar{x}_2) \pm t^*\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (75.5 - 88.2) \pm 2.0104\sqrt{\frac{9.3^2}{30} + \frac{15.1^2}{30}} = -12.7 \pm 2.0104(3.24) = -12.7 \pm 6.51 = (-19.21,\ -6.19)$$
We are 95% confident that mean lung capacity is between 6.19 and 19.21 milliliters LESS in children of smoking parents.

Do left-handed people have a shorter life expectancy than right-handed people?
Some psychologists believe that the stress of being left-handed in a right-handed world leads to earlier deaths among left-handers. Several studies have compared the life expectancies of left-handers and right-handers. One such study resulted in the data shown in the table.

Handedness   Mean age at death x̄   s      n
Left         66.8                   25.3   99
Right        75.2                   15.1   888

[Photos: star left-handed quarterback Steve Young; left-handed presidents]

We will use the data to construct a confidence interval for the difference in mean life expectancies for left-handers and right-handers. Is the mean life expectancy of left-handers less than the mean life expectancy of right-handers?
$\mu_1$ = mean life expectancy of left-handers; $\mu_2$ = mean life expectancy of right-handers.
95% confidence interval for $\mu_1 - \mu_2$, with df = 105.92 and $t^* = 1.9826$:
$$(\bar{x}_1 - \bar{x}_2) \pm t^*\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (66.8 - 75.2) \pm 1.9826\sqrt{\frac{25.3^2}{99} + \frac{15.1^2}{888}} = -8.4 \pm 1.9826(2.59) = -8.4 \pm 5.13 = (-13.53,\ -3.27)$$
[Photo: the "Bambino," left-handed Babe Ruth, baseball's all-time best player]
We are 95% confident that the mean life expectancy for left-handers is between 3.27 and 13.53 years LESS than the mean life expectancy for right-handers.

Two-sample t-test
The null hypothesis $H_0$ is that the two population means $\mu_1$ and $\mu_2$ are equal, so their difference is zero:
$H_0: \mu_1 - \mu_2 = 0$
Alternatives and P-values ($t_0$ = observed value of the test statistic):
• $H_A: \mu_1 - \mu_2 < 0$ (one-tailed): P-value = $P(t < t_0)$
• $H_A: \mu_1 - \mu_2 > 0$ (one-tailed): P-value = $P(t > t_0)$
• $H_A: \mu_1 - \mu_2 \neq 0$ (two-tailed): P-value = $2P(t > |t_0|)$
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Because in a two-sample test $H_0$ says $\mu_1 - \mu_2 = 0$, the test statistic is
$$t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
with df from the same formula used for the confidence interval.

Does smoking damage the lungs of children exposed to parental smoking?
Is the mean FVC lower in the population of children exposed to parental smoking? Using the FVC data ($\mu_1$ = mean FVC of children with a smoking parent; $\mu_2$ = mean FVC of children without a smoking parent):
$H_0: \mu_1 - \mu_2 = 0$, $H_A: \mu_1 - \mu_2 < 0$, df = 48.23
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{75.5 - 88.2}{\sqrt{\frac{9.3^2}{30} + \frac{15.1^2}{30}}} = \frac{-12.7}{\sqrt{2.9 + 7.6}} \approx -3.9$$
P-value = $P(t < -3.9) \approx 0.0001$
Conclusion: Reject $H_0$. Lung capacity is significantly impaired in children of smoking parents.
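The test-statistic arithmetic on these slides can be checked with a short Python sketch. This minimal version uses a hypothetical helper, `welch_t`, which returns $t_0$ and the unequal-variance df for a hypothesized difference `delta0` (0 under $H_0$). It does not compute the P-value, which still comes from the t-table.

```python
import math


def welch_t(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Two-sample (unequal-variance) t statistic and df.

    delta0 is the hypothesized value of µ1 − µ2 (0 under H0).
    Hypothetical helper written for this example.
    """
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t0 = ((x1 - x2) - delta0) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, df


# FVC example: group 1 = smoking parent, group 2 = no smoking parent
t0, df = welch_t(75.5, 9.3, 30, 88.2, 15.1, 30)
print(f"t0 = {t0:.2f}, df = {df:.2f}")
```

For the FVC data this gives $t_0 \approx -3.92$ on about 48.23 df. If SciPy is available, the lower-tail P-value for $H_A: \mu_1 - \mu_2 < 0$ could be obtained with `scipy.stats.t.cdf(t0, df)`, which accepts non-integer df.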
Recall that the 95% CI for $\mu_1 - \mu_2$ in the FVC example was $(-19.21, -6.19)$, which does not contain 0.

Can directed reading activities in the classroom help improve reading ability?
A class of 21 third-graders participates in these activities for 8 weeks, while a control classroom of 23 third-graders follows the same curriculum without the activities. After 8 weeks, all children take a reading test (scores in table). Summary: activities $\bar{x}_1 = 51.48$, $s_1 = 11.01$, $n_1 = 21$; controls $\bar{x}_2 = 41.52$, $s_2 = 17.15$, $n_2 = 23$.
$\mu_1$ = mean test score of activities participants; $\mu_2$ = mean test score of controls.
$H_0: \mu_1 - \mu_2 = 0$, $H_A: \mu_1 - \mu_2 > 0$
$$t = \frac{51.48 - 41.52}{\sqrt{\frac{11.01^2}{21} + \frac{17.15^2}{23}}} \approx 2.31, \qquad df = 37.86$$
P-value = $P(t_{37.86} > 2.31) = 0.013$
There is evidence that reading activities improve reading ability.

Robustness
The two-sample t procedures are more robust than the one-sample t procedures. They are most robust when both sample sizes are equal and both sample distributions are similar in shape. But even when we deviate from this, two-sample tests tend to remain quite robust.
When planning a two-sample study, choose equal sample sizes if you can. As a guideline, a combined sample size ($n_1 + n_2$) of 40 or more will allow you to work even with the most skewed distributions.

Pooled two-sample procedures
There are two versions of the two-sample t-test: one assuming equal variances for the two populations (the "pooled" two-sample test) and one not assuming equal variances (the "unequal variance" test, which we have studied). They have slightly different formulas and degrees of freedom.
[Figure: two normally distributed populations with unequal variances]
The pooled (equal-variance) two-sample t-test was often used before computers because, for normal populations with equal variances, its statistic has exactly the t distribution with $n_1 + n_2 - 2$ degrees of freedom. However, the assumption of equal variances is hard to check, and thus the unequal-variance test is safer.

Pooled two-sample procedures (cont.)
When both populations have the same standard deviation, the pooled estimator of $\sigma^2$ is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
For normal populations, the standardized statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
has exactly the t distribution with $n_1 + n_2 - 2$ degrees of freedom.
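For comparison with the unequal-variance test, here is a minimal Python sketch of the pooled statistic, using a hypothetical helper `pooled_t`. It assumes equal population standard deviations, which is questionable for the reading data (sample SDs 11.01 and 17.15), so it is shown only to contrast the two versions, not as the recommended analysis.

```python
import math


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled (equal-variance) two-sample t statistic for H0: µ1 − µ2 = 0.

    Returns (t, df, sp2), where sp2 is the pooled variance estimate.
    Hypothetical helper written for this example.
    """
    df = n1 + n2 - 2
    # Pooled estimator of the common variance σ²
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (x1 - x2) / se
    return t, df, sp2


# Reading example: group 1 = activities, group 2 = controls
t, df, sp2 = pooled_t(51.48, 11.01, 21, 41.52, 17.15, 23)
print(f"pooled t = {t:.2f} on {df} df (sp^2 = {sp2:.2f})")
```

On these data the pooled statistic is t ≈ 2.27 on 42 df, close to the unequal-variance result t ≈ 2.31 on 37.86 df. The two versions agree closely here, but they can diverge when both the sample sizes and the standard deviations differ.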
A level C confidence interval for $\mu_1 - \mu_2$ is (with area C between $-t^*$ and $t^*$)
$$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
To test the hypothesis $H_0: \mu_1 - \mu_2 = 0$ against a one-sided or a two-sided alternative, compute the pooled two-sample t statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
and use the $t(n_1 + n_2 - 2)$ distribution.

Which type of test? One sample, paired samples, or two samples?
• Comparing vitamin content of bread immediately after baking vs. 3 days later (the same loaves are used on day one and 3 days later). → Paired
• Comparing vitamin content of bread immediately after baking vs. 3 days later (tests made on independent loaves). → Two samples
• Average fuel efficiency for 2005 vehicles is 21 miles per gallon. Is average fuel efficiency higher in the new generation "green vehicles"? → One sample
• Is blood pressure altered by use of an oral contraceptive? Comparing a group of women not using an oral contraceptive with a group taking it. → Two samples
• Review insurance records for dollar amount paid after fire damage in houses equipped with a fire extinguisher vs. houses without one. Was there a difference in the average dollar amount paid? → Two samples