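AMS 572 Lecture Notes 8
Oct. 8th, 2010

Ch8 Inference on two population means (and two population variances).

1. The samples are paired → paired sample t-test
2. The samples are independent → independent sample t-test
   a) $\sigma_1^2 = \sigma_2^2$ → pooled-variance t-test
   b) $\sigma_1^2 \neq \sigma_2^2$ → unpooled-variance t-test
   (Note: to check whether $\sigma_1^2 = \sigma_2^2$, use the F-test.)

2a. Inference on 2 population means, when both populations are normal. We have 2 independent samples; the population variances are unknown but equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$) → pooled-variance t-test.

Data: $X_1, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and $Y_1, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$
Goal: Compare $\mu_1$ and $\mu_2$

1) Point estimator:

$$\widehat{\mu_1 - \mu_2} = \bar X - \bar Y \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) = N\left(\mu_1 - \mu_2,\ \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\sigma^2\right)$$

2) Pivotal quantity:

$$Z = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$$

is not the PQ, since we don't know $\sigma^2$.

$$\frac{(n_1 - 1)S_1^2}{\sigma^2} \sim \chi^2_{n_1 - 1}, \qquad \frac{(n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_2 - 1},$$

and they are independent ($S_1^2$ and $S_2^2$ are independent because the two samples are independent of each other). Hence

$$W = \frac{(n_1 - 1)S_1^2}{\sigma^2} + \frac{(n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}.$$

Definition: if $W = Z_1^2 + Z_2^2 + \cdots + Z_k^2$, where $Z_i \overset{iid}{\sim} N(0,1)$, then $W \sim \chi^2_k$.

Definition (t-distribution): if $Z \sim N(0,1)$, $W \sim \chi^2_k$, and $Z$ and $W$ are independent, then

$$T = \frac{Z}{\sqrt{W/k}} \sim t_k.$$

Here $Z$ depends on the data only through the sample means and $W$ only through the sample variances, so $Z$ and $W$ are independent under normality. Therefore

$$T = \frac{Z}{\sqrt{\dfrac{W}{n_1 + n_2 - 2}}} = \frac{\dfrac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}}{\sqrt{\dfrac{\frac{(n_1 - 1)S_1^2}{\sigma^2} + \frac{(n_2 - 1)S_2^2}{\sigma^2}}{n_1 + n_2 - 2}}} = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2},$$

where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

is the pooled variance.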
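The pooled variance and the resulting t statistic can be computed directly from summary statistics. The following is a minimal SAS sketch, not part of the original program set; it plugs in the summary statistics of Example 2A (the sheep data worked later in these notes), and the data set name `pooled_t` and all variable names are illustrative assumptions.

```sas
/* Sketch: pooled-variance t statistic and 95% CI from summary statistics. */
/* Data set and variable names are illustrative, not from the notes.       */
data pooled_t;
   /* Summary statistics of Example 2A (sheep data) */
   n1 = 7;  xbar = 28.57;  s1sq = 198.62;
   n2 = 7;  ybar = 40.0;   s2sq = 215.33;
   alpha = 0.05;

   df    = n1 + n2 - 2;                                /* degrees of freedom   */
   spsq  = ((n1 - 1)*s1sq + (n2 - 1)*s2sq) / df;       /* pooled variance      */
   se    = sqrt(spsq * (1/n1 + 1/n2));                 /* std. error of X - Y  */
   t0    = (xbar - ybar - 0) / se;                     /* test statistic, c0=0 */
   p_low = probt(t0, df);                              /* P(T <= t0), lower-tailed p-value */

   tcrit = tinv(1 - alpha/2, df);                      /* upper alpha/2 point  */
   ci_lo = (xbar - ybar) - tcrit*se;                   /* CI for mu1 - mu2     */
   ci_hi = (xbar - ybar) + tcrit*se;
run;

proc print data=pooled_t noobs;
run;
```

With these inputs the statistic should agree with the hand calculation in Example 2A ($t_0 \approx -1.49$ on 12 degrees of freedom). `TINV` and `PROBT` are the SAS quantile and CDF functions for the t distribution.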
This is the PQ for inference on the parameter of interest $(\mu_1 - \mu_2)$.

3) Confidence Interval for $(\mu_1 - \mu_2)$

$$1 - \alpha = P\left(-t_{n_1+n_2-2,\,\alpha/2} \le T \le t_{n_1+n_2-2,\,\alpha/2}\right)$$

$$= P\left(-t_{n_1+n_2-2,\,\alpha/2} \le \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \le t_{n_1+n_2-2,\,\alpha/2}\right)$$

$$= P\left(\bar X - \bar Y - t_{n_1+n_2-2,\,\alpha/2}\, S_p\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar X - \bar Y + t_{n_1+n_2-2,\,\alpha/2}\, S_p\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}\right)$$

So the $100(1-\alpha)\%$ C.I. for $(\mu_1 - \mu_2)$ is

$$\bar X - \bar Y \pm t_{n_1+n_2-2,\,\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

4) Test:

Test statistic:

$$T_0 = \frac{(\bar X - \bar Y) - c_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \overset{H_0}{\sim} t_{n_1+n_2-2}$$

a) $H_0: \mu_1 - \mu_2 = c_0$ versus $H_a: \mu_1 - \mu_2 > c_0$
   (The most common situation is $c_0 = 0$, i.e. $H_0: \mu_1 = \mu_2$.)
   At the significance level α, we reject $H_0$ in favor of $H_a$ iff $T_0 \ge t_{n_1+n_2-2,\,\alpha}$.
   If P-value $= P(T_0 \ge t_0 \mid H_0) < \alpha$, reject $H_0$.

b) $H_0: \mu_1 - \mu_2 = c_0$ versus $H_a: \mu_1 - \mu_2 < c_0$
   At the significance level α, reject $H_0$ in favor of $H_a$ iff $T_0 \le -t_{n_1+n_2-2,\,\alpha}$.
   If P-value $= P(T_0 \le t_0 \mid H_0) < \alpha$, reject $H_0$.

c) $H_0: \mu_1 - \mu_2 = c_0$ versus $H_a: \mu_1 - \mu_2 \neq c_0$
   At the significance level α, reject $H_0$ in favor of $H_a$ iff $|T_0| \ge t_{n_1+n_2-2,\,\alpha/2}$.
   If P-value $= 2P(T_0 \ge |t_0| \mid H_0) < \alpha$, reject $H_0$.

Example 2A. An experiment was conducted to compare the mean number of tapeworms in the stomachs of sheep that had been treated for worms against the mean number in those that were untreated. A sample of 14 worm-infected lambs was randomly divided into 2 groups. Seven were injected with the drug and the remainder were left untreated. After a 6-month period, the lambs were slaughtered and the following worm counts were recorded:

    Drug-treated sheep:  18  43  28  50  16  32  13
    Untreated sheep:     40  54  26  63  21  37  39

(a) Test at α = 0.05 whether the treatment is effective or not.
(b) What assumptions do you need for the inference in part (a)?
(c) Please write up the entire SAS program necessary to answer the questions raised in (a) and (b).

SOLUTION: Inference on two population means. Two small and independent samples.

Drug-treated sheep: $\bar X_1 = 28.57$, $s_1^2 = 198.62$, $n_1 = 7$
Untreated sheep: $\bar X_2 = 40.0$, $s_2^2 = 215.33$, $n_2 = 7$

(a) Under the normality assumption, we first test whether the two population variances are equal. That is, $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$.
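The test statistic is

$$F_0 = \frac{s_1^2}{s_2^2} = \frac{198.62}{215.33} = 0.92, \qquad F_{6,6,0.05,U} = 4.28 \quad\text{and}\quad F_{6,6,0.05,L} = 1/4.28 = 0.23.$$

Since $F_0$ is between 0.23 and 4.28, we cannot reject $H_0$. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$. (These critical values are the 5% points, which give a two-sided test at level 0.10. For a two-sided test at level 0.05 the critical values are $F_{6,6,0.025,U} = 5.82$ and $1/5.82 = 0.17$, and the conclusion is the same.)

Next we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$ (the treatment is effective if it lowers the mean worm count).

$$s_p = \sqrt{\frac{6(198.62) + 6(215.33)}{12}} = 14.39, \qquad t_0 = \frac{\bar X_1 - \bar X_2 - 0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{28.57 - 40.0 - 0}{14.39\sqrt{\frac{1}{7} + \frac{1}{7}}} = -1.49$$

Since $t_0 = -1.49$ is greater than $-t_{12,0.05} = -1.782$, we cannot reject $H_0$. At α = 0.05 there is insufficient evidence that the treatment reduces the mean number of worms; we cannot reject the hypothesis that treated and untreated lambs have the same mean worm count.

(b) (1) Both populations are normally distributed. (2) $\sigma_1^2 = \sigma_2^2$.

(c)

    /* Example 2A */
    data sheep;
       input group worms;
       datalines;
    1 18
    1 43
    1 28
    1 50
    1 16
    1 32
    1 13
    2 40
    2 54
    2 26
    2 63
    2 21
    2 37
    2 39
    ;
    run;

    /* Normality check within each group */
    proc univariate data=sheep normal;
       class group;
       var worms;
       title 'Check for normality';
    run;

    /* Pooled and unpooled t-tests, plus the F-test for equal variances */
    proc ttest data=sheep;
       class group;
       var worms;
       title 'Independent samples t-test';
    run;

    /* Wilcoxon rank sum test, in case normality fails */
    proc npar1way data=sheep wilcoxon;
       class group;
       var worms;
       title 'Nonparametric test for two-mean comparisons';
    run;

Oct. 11th, 2010

2b. Inference on 2 population means, when both populations are normal and the population variances are not equal ($\sigma_1^2 \neq \sigma_2^2$). We have two independent samples → unpooled-variance t-test.

1) P.Q.:

$$T = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \ \dot\sim\ t_{df} \quad \text{(approximately)}$$

Formula for the df, Welch-Satterthwaite method:

$$df = \frac{(W_1 + W_2)^2}{\dfrac{W_1^2}{n_1 - 1} + \dfrac{W_2^2}{n_2 - 1}}, \qquad \text{where } W_1 = \frac{S_1^2}{n_1},\ W_2 = \frac{S_2^2}{n_2},$$

or another way to find the df (less accurate and more conservative):

$$df = \min(n_1 - 1,\ n_2 - 1)$$

2) 100(1-α)% C.I.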
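for $(\mu_1 - \mu_2)$:

$$\bar X - \bar Y \pm t_{df,\,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

3) Test:

Test statistic:

$$T_0 = \frac{(\bar X - \bar Y) - c_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \overset{H_0}{\sim} t_{df} \quad \text{(approximately)}$$

a) $H_0: \mu_1 - \mu_2 = c_0$ versus $H_a: \mu_1 - \mu_2 > c_0$
   At the significance level α, reject $H_0$ in favor of $H_a$ iff $T_0 \ge t_{df,\,\alpha}$.
   If P-value $= P(T_0 \ge t_0 \mid H_0) < \alpha$, reject $H_0$.

b) $H_0: \mu_1 - \mu_2 = c_0$ versus $H_a: \mu_1 - \mu_2 < c_0$
   At the significance level α, reject $H_0$ in favor of $H_a$ iff $T_0 \le -t_{df,\,\alpha}$.
   If P-value $= P(T_0 \le t_0 \mid H_0) < \alpha$, reject $H_0$.

c) $H_0: \mu_1 - \mu_2 = c_0$ versus $H_a: \mu_1 - \mu_2 \neq c_0$
   At the significance level α, reject $H_0$ in favor of $H_a$ iff $|T_0| \ge t_{df,\,\alpha/2}$.
   If P-value $= 2P(T_0 \ge |t_0| \mid H_0) < \alpha$, reject $H_0$.

4) F-test:

Data: $X_1, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and $Y_1, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$

$$H_0: \sigma_1^2 = \sigma_2^2 \text{ versus } H_a: \sigma_1^2 \neq \sigma_2^2 \quad\Longleftrightarrow\quad H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \text{ versus } H_a: \frac{\sigma_1^2}{\sigma_2^2} \neq 1$$

① Point estimator:

$$\widehat{\left(\frac{\sigma_1^2}{\sigma_2^2}\right)} = \frac{S_1^2}{S_2^2}$$

② P.Q.:

Definition (F-distribution): let $W_1 \sim \chi^2_{k_1}$, $W_2 \sim \chi^2_{k_2}$, and let $W_1$, $W_2$ be independent. Then

$$F = \frac{W_1/k_1}{W_2/k_2} \sim F_{k_1,k_2}.$$

Since

$$\frac{(n_1 - 1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1 - 1}, \qquad \frac{(n_2 - 1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2 - 1},$$

and the two are independent, we get the pivotal quantity

$$F = \frac{\dfrac{(n_1 - 1)S_1^2}{\sigma_1^2}\Big/(n_1 - 1)}{\dfrac{(n_2 - 1)S_2^2}{\sigma_2^2}\Big/(n_2 - 1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1,\,n_2 - 1}.$$

③ Confidence Interval for $\sigma_1^2/\sigma_2^2$

Write $F_L = F_{n_1-1,\,n_2-1,\,\alpha/2,\,L}$ and $F_U = F_{n_1-1,\,n_2-1,\,\alpha/2,\,U}$ for the lower and upper α/2 points. Then

$$1 - \alpha = P(F_L \le F \le F_U) = P\left(F_L \le \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \le F_U\right) = P\left(\frac{S_1^2/S_2^2}{F_U} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2/S_2^2}{F_L}\right).$$

④ F-test:

Test statistic:

$$F_0 = \frac{S_1^2}{S_2^2} \overset{H_0}{\sim} F_{n_1 - 1,\,n_2 - 1}$$

a) $H_0: \sigma_1^2/\sigma_2^2 = 1$ versus $H_a: \sigma_1^2/\sigma_2^2 \neq 1$
   At significance level α, we reject $H_0$ in favor of $H_a$ iff $F_0 \ge F_{n_1-1,\,n_2-1,\,\alpha/2,\,upper}$ or $F_0 \le F_{n_1-1,\,n_2-1,\,\alpha/2,\,lower}$.

b) $H_0: \sigma_1^2/\sigma_2^2 = 1$ versus $H_a: \sigma_1^2/\sigma_2^2 > 1$
   Reject $H_0$ iff $F_0 \ge F_{n_1-1,\,n_2-1,\,\alpha,\,upper}$.

c) $H_0: \sigma_1^2/\sigma_2^2 = 1$ versus $H_a: \sigma_1^2/\sigma_2^2 < 1$
   Reject $H_0$ iff $F_0 \le F_{n_1-1,\,n_2-1,\,\alpha,\,lower}$.

5) A trick for the F distribution

If $F \sim F_{k_1,k_2}$, then $\dfrac{1}{F} \sim F_{k_2,k_1}$.

Some F-tables only give the upper bound. If we know $F_{k_2,k_1,\alpha,U}$, how do we find $F_{k_1,k_2,\alpha,L}$?

$$\alpha = P\left(F \le F_{k_1,k_2,\alpha,L}\right) = P\left(\frac{1}{F} \ge \frac{1}{F_{k_1,k_2,\alpha,L}}\right), \quad \text{so} \quad \frac{1}{F_{k_1,k_2,\alpha,L}} = F_{k_2,k_1,\alpha,U}, \quad \text{i.e.} \quad F_{k_1,k_2,\alpha,L} = \frac{1}{F_{k_2,k_1,\alpha,U}}.$$

6) Example: A new drug for reducing blood pressure (BP) is compared to an old drug. 20 patients with comparable high BP were recruited and randomized evenly to the 2 drugs.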
Reductions in BP after 1 month of taking the drugs are as follows:

    New drug:  0, 10, -3, 15, 2, 27, 19, 21, 18, 10
    Old drug:  8, -4, 7, 5, 10, 11, 9, 12, 7, 8

① Assume both populations are normal. Please test at α = 0.05 whether the new drug is better than the old one.
② Run the SAS program to do the above tests (including the normality test).

Solution:

① $n_1 = 10$, $\bar X = 11.9$, $S_1^2 = 97.43$; $n_2 = 10$, $\bar Y = 7.3$, $S_2^2 = 20.01$

F-test: $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$

Test statistic:

$$F_0 = \frac{S_1^2}{S_2^2} = \frac{97.43}{20.01} = 4.87 > F_{9,9,0.025,U} = 4.03$$

At α = 0.05, reject $H_0$ and use the unpooled-variance t-test.

T-test: $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 > 0$

Test statistic:

$$T_0 = \frac{\bar X - \bar Y}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{11.9 - 7.3}{\sqrt{\frac{97.43}{10} + \frac{20.01}{10}}} = 1.34 < t_{df,\,0.05}$$

Here the Welch-Satterthwaite formula gives $df \approx 12.5$, and $t_{12,0.05} = 1.782$. At α = 0.05, we fail to reject $H_0$ and cannot say that the new drug is better than the old one.

②

    data BP;
       input drug BPR;
       datalines;
    1 0
    1 10
    1 -3
    1 15
    1 2
    1 27
    1 19
    1 21
    1 18
    1 10
    2 8
    2 -4
    2 7
    2 5
    2 10
    2 11
    2 9
    2 12
    2 7
    2 8
    ;
    run;

    PROC UNIVARIATE DATA=BP NORMAL;
       CLASS DRUG;
       VAR BPR;
    RUN;

    PROC TTEST DATA=BP;
       CLASS DRUG;
       VAR BPR;
    RUN;

Selected result:

    Tests for Normality (first group, DRUG = 1, new drug)

    Test                  --Statistic---     -----p Value------
    Shapiro-Wilk          W      0.955526    Pr < W       0.7339
    Kolmogorov-Smirnov    D      0.142059    Pr > D      >0.1500
    Cramer-von Mises      W-Sq   0.037679    Pr > W-Sq   >0.2500
    Anderson-Darling      A-Sq   0.238555    Pr > A-Sq   >0.2500

    Tests for Normality (second group, DRUG = 2, old drug)

    Test                  --Statistic---     -----p Value------
    Shapiro-Wilk          W      0.805147    Pr < W       0.0167
    Kolmogorov-Smirnov    D      0.273266    Pr > D       0.0334
    Cramer-von Mises      W-Sq   0.124461    Pr > W-Sq    0.0447
    Anderson-Darling      A-Sq   0.776159    Pr > A-Sq    0.0294

*All four tests reject normality for the old-drug data at the 0.05 level, so a nonparametric test should be used.
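    Method           Variances    DF        t Value    Pr > |t|
    Pooled           Equal        18        1.34       0.1962
    Satterthwaite    Unequal      12.547    1.34       0.2033

    Equality of Variances

    Method      Num DF    Den DF    F Value    Pr > F
    Folded F    9         9         4.87       0.0274

The p-values in the t-test table are two-sided. Since $t_0 > 0$, the one-sided p-value for $H_a: \mu_1 - \mu_2 > 0$ is half the Satterthwaite value, about 0.10, which agrees with the manual calculation.

If the populations are not normal, use a nonparametric test for 2 means from 2 independent samples (Wilcoxon Rank Sum test). The analysis variable must be the response BPR:

    PROC NPAR1WAY DATA=BP WILCOXON;
       CLASS DRUG;
       VAR BPR;
    RUN;

Selected result, as recorded from a run in which the VAR statement named DRUG instead of BPR:

    Wilcoxon Two-Sample Test

    Statistic                55.0000

    Normal Approximation
    Z                        -4.3153
    One-Sided Pr < Z          <.0001
    Two-Sided Pr > |Z|        <.0001

    t Approximation
    One-Sided Pr < Z          0.0002
    Two-Sided Pr > |Z|        0.0004

*Caution: this output ranks the group label, not the BP reductions. The statistic 55.0 is exactly 10 × 5.5, the rank sum of ten tied observations with DRUG = 1, so the tiny p-values only say that the two group labels differ. The output therefore does not support the conclusion that the new drug is better than the old one, and the apparent conflict with the manual t-test is not due to the normality assumption. The test has to be rerun with VAR BPR before any conclusion is drawn.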
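For the comparison in this example, the response variable BPR has to be the analysis variable in both procedures. The following SAS sketch reruns the analysis that way. It is an illustration under stated assumptions rather than the original course program: the `@@` line-hold input and the `SIDES=U` option on PROC TTEST are additions, and `SIDES=U` assumes the difference is taken as group 1 (new drug) minus group 2 (old drug), which is the default ordering for CLASS levels 1 and 2.

```sas
/* Sketch: one-sided Welch t-test and Wilcoxon rank sum test on BPR. */
data BP;
   input drug BPR @@;   /* drug: 1 = new, 2 = old; BPR = reduction in BP */
   datalines;
1 0  1 10  1 -3  1 15  1 2  1 27  1 19  1 21  1 18  1 10
2 8  2 -4  2 7   2 5   2 10 2 11  2 9   2 12  2 7   2 8
;
run;

/* SIDES=U requests the upper-tailed test, H_a: mu1 - mu2 > 0. */
/* Read the Satterthwaite row, since the F-test rejects equal variances. */
proc ttest data=BP sides=U;
   class drug;
   var BPR;
run;

/* Wilcoxon rank sum test; the analysis variable is BPR, not DRUG. */
proc npar1way data=BP wilcoxon;
   class drug;
   var BPR;
run;
```

As a check on what to expect, ranking the 20 BP reductions by hand (with average ranks for ties) gives a rank sum of 123 for the new drug and 87 for the old drug, against an expected 105 under $H_0$. With the tie-corrected variance and a continuity correction this corresponds to $Z \approx 1.33$, a one-sided p-value of roughly 0.09, so the hand calculation does not reject $H_0$ at α = 0.05 either.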