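12.2 Hypothesis Test about the Difference

I. Large Sample Case ($n_1 \geq 30$, $n_2 \geq 30$):

Motivating Example:

Objective: we want to test whether the mean scores in two training centers are different.

$\mu_1$: the mean score in the first training center.
$\mu_2$: the mean score in the second training center.

We want to test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$ (equivalently, $H_0: \mu_1 - \mu_2 = 0$ vs. $H_a: \mu_1 - \mu_2 \neq 0$) with $\alpha = 0.05$. In addition,

\[ n_1 = 30,\quad n_2 = 40,\quad \bar{x}_1 = 82.5,\quad \bar{x}_2 = 78,\quad \sigma_1^2 = 8^2,\quad \sigma_2^2 = 10^2. \]

A sensible statistical procedure would be

reject $H_0$: $|\bar{x}_1 - \bar{x}_2| \geq c$
not reject $H_0$: $|\bar{x}_1 - \bar{x}_2| < c$

where $c$ is some constant.

Next question: how do we determine the value of $c$?
Answer: control $\alpha$ (the probability of making a type I error) to determine the value of $c$.

When $H_0$ is true, $\mu_1 = \mu_2$ and

\[ \bar{X}_1 - \bar{X}_2 \sim N\left(0, \sigma_{\bar{X}_1 - \bar{X}_2}^2\right), \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

Then,

\[ \alpha = \text{the probability of wrongly rejecting } H_0 = P(H_0 \text{ is true but is rejected}) = P\left(|\bar{X}_1 - \bar{X}_2| \geq c;\ \mu_1 = \mu_2\right) \]
\[ = P\left(\frac{|\bar{X}_1 - \bar{X}_2|}{\sigma_{\bar{X}_1 - \bar{X}_2}} \geq \frac{c}{\sigma_{\bar{X}_1 - \bar{X}_2}}\right) = P\left(|Z| \geq \frac{c}{\sigma_{\bar{X}_1 - \bar{X}_2}}\right) \]
\[ \Rightarrow \frac{c}{\sigma_{\bar{X}_1 - \bar{X}_2}} = z_{\alpha/2} \quad\Rightarrow\quad c = z_{\alpha/2}\, \sigma_{\bar{X}_1 - \bar{X}_2} = z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

Thus,

reject $H_0$: $|\bar{x}_1 - \bar{x}_2| \geq z_{\alpha/2}\, \sigma_{\bar{X}_1 - \bar{X}_2}$
not reject $H_0$: $|\bar{x}_1 - \bar{x}_2| < z_{\alpha/2}\, \sigma_{\bar{X}_1 - \bar{X}_2}$

is a sensible statistical procedure. Furthermore, denote

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}. \]

Thus, by dividing both sides by $\sigma_{\bar{X}_1 - \bar{X}_2}$, the above procedure can be simplified to

reject $H_0$: $|z| \geq z_{\alpha/2}$
not reject $H_0$: $|z| < z_{\alpha/2}$

In addition,

\[ \text{p-value} \equiv \text{the probability of making a type I error by rejecting } H_0 \text{ at } \bar{x}_1 - \bar{x}_2 \text{ when } \mu_1 = \mu_2 \]
\[ = P\left(|\bar{X}_1 - \bar{X}_2| \geq |\bar{x}_1 - \bar{x}_2|;\ \mu_1 = \mu_2\right) = P\left(\frac{|\bar{X}_1 - \bar{X}_2|}{\sigma_{\bar{X}_1 - \bar{X}_2}} \geq \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma_{\bar{X}_1 - \bar{X}_2}};\ \mu_1 = \mu_2\right) = P(|Z| \geq |z|). \]

Therefore, in this example,

\[ |z| = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{82.5 - 78}{\sqrt{\dfrac{8^2}{30} + \dfrac{10^2}{40}}} = 2.09 \geq 1.96 = z_{0.025} = z_{\alpha/2}. \]

Thus, we reject $H_0$. Also,

\[ \text{p-value} = P(|Z| \geq |z|) = P(|Z| \geq 2.09) = 0.0366 < 0.05 = \alpha, \]

so we also reject $H_0$ based on the p-value.

General Case: $n_1 \geq 30$, $n_2 \geq 30$ and significance level $\alpha$.

As $\sigma_1, \sigma_2$ are known, let

\[ z = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}. \]

(I): $H_0: \mu_1 - \mu_2 \leq \mu_0$ vs. $H_a: \mu_1 - \mu_2 > \mu_0$
Then,
reject $H_0$: $z \geq z_{\alpha}$
not reject $H_0$: $z < z_{\alpha}$
In addition, p-value $= P(Z \geq z)$.

(II): $H_0: \mu_1 - \mu_2 \geq \mu_0$ vs. $H_a: \mu_1 - \mu_2 < \mu_0$
Then,
reject $H_0$: $z \leq -z_{\alpha}$
not reject $H_0$: $z > -z_{\alpha}$
In addition, p-value $= P(Z \leq z)$.

(III): $H_0: \mu_1 - \mu_2 = \mu_0$ vs. $H_a: \mu_1 - \mu_2 \neq \mu_0$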
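Then,
reject $H_0$: $|z| \geq z_{\alpha/2}$
not reject $H_0$: $|z| < z_{\alpha/2}$
In addition, p-value $= P(|Z| \geq |z|)$.

As $\sigma_1, \sigma_2$ are unknown, let

\[ z = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}. \]

(I): $H_0: \mu_1 - \mu_2 \leq \mu_0$ vs. $H_a: \mu_1 - \mu_2 > \mu_0$
Then,
reject $H_0$: $z \geq z_{\alpha}$
not reject $H_0$: $z < z_{\alpha}$
In addition, p-value $= P(Z \geq z)$.

(II): $H_0: \mu_1 - \mu_2 \geq \mu_0$ vs. $H_a: \mu_1 - \mu_2 < \mu_0$
Then,
reject $H_0$: $z \leq -z_{\alpha}$
not reject $H_0$: $z > -z_{\alpha}$
In addition, p-value $= P(Z \leq z)$.

(III): $H_0: \mu_1 - \mu_2 = \mu_0$ vs. $H_a: \mu_1 - \mu_2 \neq \mu_0$
Then,
reject $H_0$: $|z| \geq z_{\alpha/2}$
not reject $H_0$: $|z| < z_{\alpha/2}$
In addition, p-value $= P(|Z| \geq |z|)$.

Example:

Consider the following results for two samples randomly taken from two populations.

                      Sample 1    Sample 2
Sample size              64          49
Mean                   1150         921
Standard deviation       90          65

Let $\mu_1$ and $\mu_2$ be the population means.
(a) For $\alpha = 0.05$, test $H_0: \mu_1 - \mu_2 \leq 200$ using the classical hypothesis test.
(b) For $\alpha = 0.01$, use the p-value to test $H_0: \mu_1 - \mu_2 \leq 200$.
(c) For $\alpha = 0.05$, use the confidence interval method to test the hypothesis $H_0: \mu_1 - \mu_2 = 200$.

[solution:]

(a) $\bar{x}_1 = 1150,\ \bar{x}_2 = 921,\ s_1 = 90,\ s_2 = 65,\ n_1 = 64,\ n_2 = 49,\ \mu_0 = 200,\ \alpha = 0.05$. Then,

\[ z = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1150 - 921 - 200}{\sqrt{\dfrac{90^2}{64} + \dfrac{65^2}{49}}} = 1.99 \geq z_{\alpha} = z_{0.05} = 1.645. \]

Therefore, we reject $H_0$.

(b) p-value $= P(Z \geq z) = P(Z \geq 1.99) = 0.0233 > 0.01 = \alpha$. Therefore, we do not reject $H_0$.

(c) A 95% confidence interval for $\mu_1 - \mu_2$ is

\[ \bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 1150 - 921 \pm z_{0.025} \sqrt{\frac{90^2}{64} + \frac{65^2}{49}} = 229 \pm 1.96 \times 14.587 = [200.41,\ 257.59]. \]

Since $200 \notin [200.41,\ 257.59]$, we reject $H_0$.

II. Small Sample Case ($n_1 < 30$, $n_2 < 30$):

Similar to 11.1, two assumptions are made:
1. Both populations have normal distributions.
2. The variances of the populations are equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$).

Motivating Example:

Objective: we want to test whether the mean project-completion time using the new software package is shorter than the mean using the current technology.

$\mu_1$: the mean project-completion time using the current technology.
$\mu_2$: the mean project-completion time using the new software package.

We want to test $H_0: \mu_1 \leq \mu_2$ vs. $H_a: \mu_1 > \mu_2$ (equivalently, $H_0: \mu_1 - \mu_2 \leq 0$ vs. $H_a: \mu_1 - \mu_2 > 0$) with $\alpha = 0.05$. In addition,

\[ n_1 = 12,\quad n_2 = 12,\quad \bar{x}_1 = 325,\quad \bar{x}_2 = 288,\quad s_1 = 40,\quad s_2 = 44. \]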
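Thus, the pooled sample variance is

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11 \times 40^2 + 11 \times 44^2}{12 + 12 - 2} = 1768. \]

A sensible statistical procedure would be

reject $H_0$: $\bar{x}_1 - \bar{x}_2 \geq c$
not reject $H_0$: $\bar{x}_1 - \bar{x}_2 < c$

where $c$ is some constant. The above procedure is equivalent to the following statistical test:

reject $H_0$: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{\bar{X}_1 - \bar{X}_2}} \geq \dfrac{c}{s_{\bar{X}_1 - \bar{X}_2}} = c^*$
not reject $H_0$: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{\bar{X}_1 - \bar{X}_2}} < \dfrac{c}{s_{\bar{X}_1 - \bar{X}_2}} = c^*$

where $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$.

When $H_0$ is true (with $\mu_1 = \mu_2$),

\[ \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim T(n_1 + n_2 - 2), \]

where $S_p^2$ is the sample statistic with possible values $s_p^2$. Then,

\[ \alpha = 0.05 = \text{the probability of wrongly rejecting } H_0 = P(H_0 \text{ is true but is rejected}) \]
\[ = P\left(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \geq c^*;\ \mu_1 = \mu_2\right) = P\left(T(n_1 + n_2 - 2) \geq c^*\right) \]
\[ \Rightarrow c^* = t_{n_1 + n_2 - 2,\ \alpha} = t_{n_1 + n_2 - 2,\ 0.05}. \]

Thus,

reject $H_0$: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \geq t_{n_1 + n_2 - 2,\ \alpha} = t_{n_1 + n_2 - 2,\ 0.05}$
not reject $H_0$: $t < t_{n_1 + n_2 - 2,\ \alpha} = t_{n_1 + n_2 - 2,\ 0.05}$

is a sensible statistical procedure. In addition,

\[ \text{p-value} \equiv \text{the probability of making a type I error by rejecting } H_0 \text{ at } \bar{x}_1 - \bar{x}_2 \text{ when } \mu_1 = \mu_2 \]
\[ = P\left(\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \geq \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}};\ \mu_1 = \mu_2\right) = P\left(T(n_1 + n_2 - 2) \geq t\right). \]

Therefore, in this example,

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{325 - 288}{\sqrt{1768 \left(\dfrac{1}{12} + \dfrac{1}{12}\right)}} = 2.16 \geq 1.717 = t_{22,\ 0.05} = t_{n_1 + n_2 - 2,\ \alpha}. \]

Thus, we reject $H_0$. Also,

\[ \text{p-value} = P\left(T(n_1 + n_2 - 2) \geq t\right) = P\left(T(22) \geq 2.16\right) = 0.0209 < 0.05 = \alpha, \]

so we also reject $H_0$ based on the p-value.

General Case: $n_1 < 30$, $n_2 < 30$ and significance level $\alpha$.

Let

\[ t = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}. \]

(I): $H_0: \mu_1 - \mu_2 \leq \mu_0$ vs. $H_a: \mu_1 - \mu_2 > \mu_0$
Then,
reject $H_0$: $t \geq t_{n_1 + n_2 - 2,\ \alpha}$
not reject $H_0$: $t < t_{n_1 + n_2 - 2,\ \alpha}$
In addition, p-value $= P\left(T(n_1 + n_2 - 2) \geq t\right)$.

(II): $H_0: \mu_1 - \mu_2 \geq \mu_0$ vs. $H_a: \mu_1 - \mu_2 < \mu_0$
Then,
reject $H_0$: $t \leq -t_{n_1 + n_2 - 2,\ \alpha}$
not reject $H_0$: $t > -t_{n_1 + n_2 - 2,\ \alpha}$
In addition, p-value $= P\left(T(n_1 + n_2 - 2) \leq t\right)$.

(III): $H_0: \mu_1 - \mu_2 = \mu_0$ vs. $H_a: \mu_1 - \mu_2 \neq \mu_0$
Then,
reject $H_0$: $|t| \geq t_{n_1 + n_2 - 2,\ \alpha/2}$
not reject $H_0$: $|t| < t_{n_1 + n_2 - 2,\ \alpha/2}$
In addition, p-value $= P\left(|T(n_1 + n_2 - 2)| \geq |t|\right)$.

Example:

Consider the following results for two samples randomly taken from two normal populations with equal variance.

                      Sample 1    Sample 2
Sample size              10          12
Mean                     48          44
Standard deviation        9           8

(a) Test $H_0: \mu_1 - \mu_2 = -3$ vs. $H_a: \mu_1 - \mu_2 \neq -3$ at $\alpha = 0.1$ using the classical hypothesis test.
(b) Test $H_0: \mu_1 - \mu_2 = -4$ vs.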
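$H_a: \mu_1 - \mu_2 \neq -4$ at $\alpha = 0.05$ using the p-value.
(c) Test $H_0: \mu_1 - \mu_2 = -3$ vs. $H_a: \mu_1 - \mu_2 \neq -3$ at $\alpha = 0.05$ using the confidence interval method.
(d) At 95% confidence, how many observations would have to be taken to provide an interval of length 6, given equal sample sizes in the two populations?

[solution:]

(a) $n_1 = 10,\ n_2 = 12,\ \bar{x}_1 = 48,\ \bar{x}_2 = 44,\ s_1 = 9,\ s_2 = 8,\ \mu_0 = -3$. Then,

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9 \times 9^2 + 11 \times 8^2}{10 + 12 - 2} = 71.65. \]

Thus,

\[ |t| = \frac{|\bar{x}_1 - \bar{x}_2 - \mu_0|}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{48 - 44 - (-3)}{\sqrt{71.65 \left(\dfrac{1}{10} + \dfrac{1}{12}\right)}} = 1.93 \geq t_{n_1 + n_2 - 2,\ \alpha/2} = t_{20,\ 0.05} = 1.7247. \]

Therefore, we reject $H_0$.

(b) $\mu_0 = -4$.

\[ \text{p-value} = P\left(|T(n_1 + n_2 - 2)| \geq |t|\right) = P\left(|T(20)| \geq \frac{|\bar{x}_1 - \bar{x}_2 - \mu_0|}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}\right) = P\left(|T(20)| \geq \frac{48 - 44 - (-4)}{\sqrt{71.65 \left(\dfrac{1}{10} + \dfrac{1}{12}\right)}}\right) \]
\[ = P\left(|T(20)| \geq 2.207\right) < P\left(|T(20)| \geq 2.086\right) = 0.05 = \alpha. \]

Therefore, we reject $H_0$.

(c) A 95% confidence interval for $\mu_1 - \mu_2$ is

\[ \bar{x}_1 - \bar{x}_2 \pm t_{n_1 + n_2 - 2,\ \alpha/2} \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 48 - 44 \pm t_{20,\ 0.025} \sqrt{71.65 \left(\frac{1}{10} + \frac{1}{12}\right)} = 4 \pm 2.086 \times 3.6243 = [-3.56,\ 11.56]. \]

Since $-3 \in [-3.56,\ 11.56]$, we do not reject $H_0$.

(d) Assuming the sample sizes are large and equal ($n_1 = n_2 = n$), the $(1 - \alpha) \times 100\%$ confidence interval for $\mu_1 - \mu_2$ is

\[ \bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}}. \]

The length of the confidence interval is $2 z_{\alpha/2} \sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{n}}$. Therefore,

\[ 2 z_{0.025} \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}} = 2 \times 1.96 \sqrt{\frac{9^2}{n} + \frac{8^2}{n}} = 6 \quad\Leftrightarrow\quad 1.96 \sqrt{\frac{9^2 + 8^2}{n}} = 3 \quad\Leftrightarrow\quad n = \frac{1.96^2 \left(9^2 + 8^2\right)}{3^2} = 61.89. \]

In general,

\[ n = \frac{z_{\alpha/2}^2 \left(s_1^2 + s_2^2\right)}{E^2}, \qquad E: \text{the margin of error (half the interval length).} \]

Therefore, $n = 62$ and a total of 124 observations need to be taken.

Online Exercise:
Exercise 12.2.1
Exercise 12.2.2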
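The worked examples in this section can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names (`z_statistic`, `z_p_value`, `pooled_t_statistic`, `equal_n_for_length`) are hypothetical helpers introduced here for illustration and are not part of the notes. The standard library has no t distribution, so the t statistics are only computed and would still be compared against table values such as $t_{22,\ 0.05} = 1.717$.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal Z ~ N(0, 1)


def z_statistic(x1, x2, sd1, sd2, n1, n2, mu0=0.0):
    """Large-sample z for a hypothesis about mu1 - mu2.

    sd1, sd2 are sigma_1, sigma_2 if known, otherwise s_1, s_2.
    """
    return (x1 - x2 - mu0) / sqrt(sd1**2 / n1 + sd2**2 / n2)


def z_p_value(z, tail):
    """p-value for tail in {'upper', 'lower', 'two-sided'}."""
    if tail == "upper":       # H_a: mu1 - mu2 > mu0
        return 1.0 - STD_NORMAL.cdf(z)
    if tail == "lower":       # H_a: mu1 - mu2 < mu0
        return STD_NORMAL.cdf(z)
    if tail == "two-sided":   # H_a: mu1 - mu2 != mu0
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two-sided'")


def pooled_t_statistic(x1, x2, s1, s2, n1, n2, mu0=0.0):
    """Pooled variance s_p^2 and t statistic (equal-variance assumption)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2 - mu0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t


def equal_n_for_length(s1, s2, length, confidence=0.95):
    """Smallest equal sample size n giving a z-interval of the given length."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = length / 2  # E, the margin of error
    return ceil(z**2 * (s1**2 + s2**2) / margin**2)


if __name__ == "__main__":
    # Training-center example: two-sided test with known variances.
    z = z_statistic(82.5, 78, 8, 10, 30, 40)
    print(round(z, 2), round(z_p_value(z, "two-sided"), 4))

    # Software-package example: pooled t, compare with t_{22, 0.05} = 1.717.
    sp2, t = pooled_t_statistic(325, 288, 40, 44, 12, 12)
    print(sp2, round(t, 2))

    # Part (d): equal sample sizes for an interval of length 6.
    print(equal_n_for_length(9, 8, 6))
```

With the data from the examples, the sketch gives $z \approx 2.09$, $s_p^2 = 1768$, $t \approx 2.16$ and $n = 62$, in agreement with the hand calculations.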