AMS 572 Lecture Notes                                                    Oct. 1, 2013

Review: Inference on two population means

1. Two normal populations, σ₁² & σ₂² known, independent samples → exact Z.
   P.Q.:  Z = [X̄ − Ȳ − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂) ~ N(0,1)

2. At least one population is not normal, independent samples, both samples large
   (n₁ ≥ 30, n₂ ≥ 30) → approximate Z.
   (1) P.Q. (if the variances are known, by the central limit theorem):
       Z = [X̄ − Ȳ − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂) ~ N(0,1) approximately
   (2) P.Q. (if the variances are unknown, by the central limit theorem and Slutsky's theorem:
       http://en.wikipedia.org/wiki/Slutsky's_theorem):
       Z = [X̄ − Ȳ − (μ₁ − μ₂)] / √(S₁²/n₁ + S₂²/n₂) ~ N(0,1) approximately

3. Two normal populations, σ₁² & σ₂² unknown but equal (σ₁² = σ₂² = σ²), independent
   samples → pooled-variance t (exact).
   P.Q.:  T = [X̄ − Ȳ − (μ₁ − μ₂)] / [Sₚ √(1/n₁ + 1/n₂)] ~ t_{n₁+n₂−2}
   Exact d.f. = n₁ + n₂ − 2   (SAS)

4. Two normal populations, σ₁² & σ₂² unknown, σ₁² ≠ σ₂² → approximate t.
   P.Q.:  T = [X̄ − Ȳ − (μ₁ − μ₂)] / √(S₁²/n₁ + S₂²/n₂) ~ t_df approximately
   More accurate d.f. ⇒ Satterthwaite method (SAS)
   Quick & less accurate ⇒ d.f. = min(n₁ − 1, n₂ − 1)   (in-class exam)

5. Other situations ⇒ nonparametric method: Mann-Whitney U-test = Wilcoxon Rank-Sum Test (SAS)

6. Modern nonparametric method: bootstrap (resampling) method.

7. Transformation to the normal distribution: Box-Cox transformation.
   E.g., X & Y are not normal, but ln(X) & ln(Y) are normal.

Inference on two population variances

* Both populations are normal, two independent samples:
  Sample 1: X₁, X₂, …, X_{n₁} i.i.d. ~ N(μ₁, σ₁²)
  Sample 2: Y₁, Y₂, …, Y_{n₂} i.i.d. ~ N(μ₂, σ₂²)
  ⇒  (n₁ − 1)S₁²/σ₁² ~ χ²_{n₁−1}   and   (n₂ − 1)S₂²/σ₂² ~ χ²_{n₂−1}
  (*** Parameter of interest: σ₁²/σ₂²)

1. Point estimators:  σ̂₁² = S₁²,  σ̂₂² = S₂²

Def. (F-distribution) Let W₁ ~ χ²_{k₁} and W₂ ~ χ²_{k₂}, with W₁, W₂ independent. Then
   F = (W₁/k₁) / (W₂/k₂) ~ F_{k₁,k₂}

Note that 1/F ~ F_{k₂,k₁}, and from
   α = P(F ≤ F_{k₁,k₂,α,lower}) = P(1/F ≥ 1/F_{k₁,k₂,α,lower})
we obtain
   1/F_{k₁,k₂,α,lower} = F_{k₂,k₁,α,upper}

Pivotal quantity:
   F = { [(n₁−1)S₁²/σ₁²] / (n₁−1) } / { [(n₂−1)S₂²/σ₂²] / (n₂−1) }
     = (S₁²/S₂²) · (σ₂²/σ₁²) ~ F_{n₁−1,n₂−1}
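As an illustrative sketch (not part of the original notes), the large-sample approximate Z statistic of case 2(2) can be computed with the Python standard library; the function names here are my own:

```python
import math
import statistics

def two_sample_z(x, y, delta0=0.0):
    """Large-sample Z statistic for H0: mu1 - mu2 = delta0 (case 2(2) above:
    variances unknown, both samples large; justified by the CLT + Slutsky)."""
    n1, n2 = len(x), len(y)
    # Plug sample variances S1^2, S2^2 into the standard error.
    se = math.sqrt(statistics.variance(x) / n1 + statistics.variance(y) / n2)
    return (statistics.mean(x) - statistics.mean(y) - delta0) / se

def two_sided_p_value(z):
    """Two-sided p-value from N(0,1), using the error function:
    Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

With real data one would also check the sample-size condition (n₁ ≥ 30, n₂ ≥ 30) before relying on the normal approximation.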
2. 100(1 − α)% CI for σ₁²/σ₂²:
   P( F_{α/2,lower} ≤ (S₁²/S₂²) · (σ₂²/σ₁²) ≤ F_{α/2,upper} ) = 1 − α
   ⇒ P( (S₁²/S₂²) / F_{α/2,upper} ≤ σ₁²/σ₂² ≤ (S₁²/S₂²) / F_{α/2,lower} ) = 1 − α
   where the quantiles are from the F_{n₁−1,n₂−1} distribution.

3. Test  H₀: σ₁² = σ₂²  vs  Hₐ: σ₁² ≠ σ₂²
   Test statistic:  F₀ = S₁²/S₂² ~ F_{n₁−1,n₂−1} under H₀
   At the significance level α, we reject H₀ if F₀ is too large or too small:
      F₀ ≤ c₁  or  F₀ ≥ c₂
   * Conventional boundaries / thresholds:
      c₁ = F_{n₁−1,n₂−1,α/2,lower},   c₂ = F_{n₁−1,n₂−1,α/2,upper}

SAS program for tests on 2 population means

1. Paired samples
   sample 1: 10 23 16 18 … 33
   sample 2: 15 28 21 29 … 58

data paired;
input IQ1 IQ2;
diff=IQ1-IQ2;
datalines;
10 15
23 28
…
33 58
;
run;
proc univariate data=paired normal;
var diff;
run;

2. Independent samples
   sample 1 (group 1): 10 23 16 18 … 33
   sample 2 (group 2): 15 28 21 29 … 58

data indept;
input group IQ;
datalines;
1 10
1 23
1 16
…
1 33
2 15
2 28
…
2 58
;
run;
proc sort data=indept;
by group;
run;
proc univariate data=indept normal;
var IQ;
by group;
run;
/* use "by group" if we have already sorted by group -- otherwise use class */

/* If both populations are normal */
proc ttest data=indept;
var IQ;
class group;
run;

/* If at least 1 population is NOT normal: Wilcoxon Rank-Sum test */
proc npar1way data=indept wilcoxon;
var IQ;
class group;
run;

Inference on two population means, independent samples: Power and Sample Size
calculation -- Exact or Large-Sample Z-test

1. Based on the maximum error / the length of the CI.
   Suppose we are using the exact or the large-sample approximate z-test, with n₁ = n₂ = n,
   and we want the maximum error to be E with probability 1 − α:
      P( |(X̄ − Ȳ) − (μ₁ − μ₂)| ≤ E ) = 1 − α
   Since  [X̄ − Ȳ − (μ₁ − μ₂)] / √(σ₁²/n + σ₂²/n) ~ N(0,1),
      P( −E/√(σ₁²/n + σ₂²/n) ≤ Z ≤ E/√(σ₁²/n + σ₂²/n) ) = 1 − α
   ⇒  E / √((σ₁² + σ₂²)/n) = z_{α/2}
   ⇒  n = (z_{α/2})² (σ₁² + σ₂²) / E²

   Equivalently, the 100(1 − α)% CI for (μ₁ − μ₂) is (X̄ − Ȳ) ± z_{α/2} √(σ₁²/n + σ₂²/n),
   with length  L = 2E = 2 z_{α/2} √((σ₁² + σ₂²)/n), so
      n = 4 (z_{α/2})² (σ₁² + σ₂²) / L²
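As a hedged sketch (not from the notes; the function names are my own), the two z-based sample-size formulas above translate directly into Python, with `NormalDist().inv_cdf` supplying z_{α/2}:

```python
import math
from statistics import NormalDist

def n_per_group_max_error(sigma1_sq, sigma2_sq, E, alpha=0.05):
    """n = z_{alpha/2}^2 (sigma1^2 + sigma2^2) / E^2, rounded up:
    common sample size per group so that the estimation error of
    mu1 - mu2 is at most E with probability 1 - alpha (known variances)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z**2 * (sigma1_sq + sigma2_sq) / E**2)

def n_per_group_ci_length(sigma1_sq, sigma2_sq, L, alpha=0.05):
    """Equivalent formula in terms of the CI length L = 2E:
    n = 4 z_{alpha/2}^2 (sigma1^2 + sigma2^2) / L^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(4 * z**2 * (sigma1_sq + sigma2_sq) / L**2)
```

Because L = 2E, the two functions always agree when called with matching arguments; rounding up guarantees the error bound is met.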
2. Based on the power of the test.
   H₀: μ₁ − μ₂ = δ₁  vs  Hₐ: μ₁ − μ₂ = δ₂  (δ₂ > δ₁; or <, ≠)

   Power = 1 − β = P(reject H₀ | Hₐ)
         = P( Z₀ ≥ z_α | μ₁ − μ₂ = δ₂ ),  where  Z₀ = (X̄ − Ȳ − δ₁) / √((σ₁² + σ₂²)/n)
         = P( (X̄ − Ȳ − δ₂)/√((σ₁² + σ₂²)/n) ≥ z_α − (δ₂ − δ₁)/√((σ₁² + σ₂²)/n) | μ₁ − μ₂ = δ₂ )
   ⇒  z_α − (δ₂ − δ₁)/√((σ₁² + σ₂²)/n) = −z_β
   ⇒  n = (z_α + z_β)² (σ₁² + σ₂²) / (δ₂ − δ₁)²      (1-sided test, exact Z)
       n = (z_α + z_β)² (σ₁² + σ₂²) / (δ₂ − δ₁)²      (1-sided test, approximate Z)
       n = (z_{α/2} + z_β)² (σ₁² + σ₂²) / (δ₂ − δ₁)²  (2-sided test, exact or approximate Z)

Inference on two population means, independent samples: Power and Sample Size
Determination -- Pooled-Variance T-test

1. Sample size calculation in a C.I. scenario (maximum error).
   X₁, …, Xₙ i.i.d. ~ N(μ₁, σ²),   Y₁, …, Yₙ i.i.d. ~ N(μ₂, σ²)
   P.Q.:  T = [(X̄ − Ȳ) − (μ₁ − μ₂)] / [Sₚ √(2/n)] ~ t_{2n−2}
   The 100(1 − α)% CI for μ₁ − μ₂ is  (X̄ − Ȳ) ± t_{2n−2,α/2} Sₚ √(2/n).
   The length of the CI:  L = 2 t_{2n−2,α/2} Sₚ √(2/n)
   ⇒  n = 8 (t_{2n−2,α/2})² Sₚ² / L²
   (Both sides depend on n through the d.f., so this is solved iteratively.)

2. Inference in the test situation.
   H₀: μ₁ − μ₂ = 0  vs  Hₐ: μ₁ − μ₂ = δ > 0
   Data: two independent samples
   X₁, …, X_{n₁} i.i.d. ~ N(μ₁, σ²),   Y₁, …, Y_{n₂} i.i.d. ~ N(μ₂, σ²)
   Here σ₁² = σ₂² = σ² and n₁ = n₂ = n. For a given α (e.g. 0.05 or 0.01) and a
   power 1 − β (e.g. 85%), calculate the sample size.
   Def.:  Effect size = δ/σ  (e.g. Eff = 1)
   T.S.:  T₀ = (X̄ − Ȳ − 0) / [Sₚ √(1/n₁ + 1/n₂)] = (X̄ − Ȳ) / [Sₚ √(2/n)] ~ t_{2n−2} under H₀
   At α = 0.05, reject H₀ in favor of Hₐ iff T₀ ≥ t_{2n−2,α}.
   Power = 1 − β = P(reject H₀ | Hₐ)
         = P( T₀ ≥ t_{2n−2,α} | Hₐ: μ₁ − μ₂ = δ )
         = P( (X̄ − Ȳ − δ)/[Sₚ √(2/n)] ≥ t_{2n−2,α} − δ/[Sₚ √(2/n)] | Hₐ: μ₁ − μ₂ = δ )
         ≈ P( T ≥ t_{2n−2,α} − Eff·√(n/2) ),  T ~ t_{2n−2}   (effect size Eff = δ/Sₚ)
   ⇒  t_{2n−2,α} − Eff·√(n/2) ≈ −t_{2n−2,β}
   ⇒  n ≈ 2 [ (t_{2n−2,α} + t_{2n−2,β}) / Eff ]²

Example. A new method of making concrete blocks has been proposed. To test whether or
not the new method increases the compressive strength, 5 sample blocks are made by each
method.

   New Method: 14 15 13 15 16
   Old Method: 13 15 13 12 14

a. Construct a 95% CI for the mean difference between the 2 methods.
b. At α = 0.05, can you conclude the new method is better? Provide the p-value.
   Write the SAS program for part (b).

Solution
a. Assume both populations are normal.
   First, we check whether σ₁² = σ₂²:
      H₀: σ₁² = σ₂²  vs  Hₐ: σ₁² ≠ σ₂²
   Test statistic:  F₀ = S₁²/S₂²   (recall S² = Σᵢ (xᵢ − x̄)² / (n − 1))
   Here F₀ = 1 < F_{4,4,0.1,upper} = 4.11, so it is reasonable to assume σ₁² = σ₂².

   Pooled-variance statistic (P.Q.):
      t = [X̄₁ − X̄₂ − (μ₁ − μ₂)] / [Sₚ √(1/n₁ + 1/n₂)] ~ t_{n₁+n₂−2},
   where
      Sₚ² = [(n₁ − 1)S₁² + (n₂ − 1)S₂²] / (n₁ + n₂ − 2)
   The 95% CI for (μ₁ − μ₂) is
      (X̄₁ − X̄₂) ± t_{n₁+n₂−2, 0.025} Sₚ √(1/n₁ + 1/n₂)

b. Assume both populations are normal. By part (a), we found that it is reasonable to
   assume σ₁² = σ₂².
      H₀: μ₁ − μ₂ = 0  vs  Hₐ: μ₁ − μ₂ > 0
   Test statistic:
      T₀ = (X̄₁ − X̄₂ − 0) / [Sₚ √(1/n₁ + 1/n₂)] ~ t_{n₁+n₂−2} under H₀
   At α = 0.05, we reject H₀ if T₀ ≥ t_{n₁+n₂−2, 0.05} = t_{8, 0.05}.
   But T₀ (≈ 1.66) < t_{8, 0.05} (≈ 1.86), so we cannot reject H₀ at α = 0.05.

SAS

data block;
input method $ strength;
datalines;
new 14
new 15
new 13
new 15
new 16
old 13
old 15
old 13
old 12
old 14
;
run;
proc univariate data=block normal plot;
class method;
var strength;
run;
proc ttest data=block;
class method;
var strength;
run;
proc npar1way data=block;
class method;
var strength;
run;

Example. An experiment was done to determine the effect on dairy cattle of a diet
supplement with liquid whey. While no differences were noted in milk production between
the group with a standard diet (hay + grain + water) and the experimental group with a
whey supplement (hay + grain + whey), a considerable difference was noted in the amount
of hay ingested. For a 2-tailed test with α = 0.05, determine the approximate number of
cattle that should be included in each group if we want β ≤ 0.1 for |μ₁ − μ₂| ≥ 0.5.
A previous study has shown σ ≈ 0.8.

Solution
   H₀: μ₁ − μ₂ = 0  vs  Hₐ: μ₁ − μ₂ ≠ 0
   Assumptions: (1) σ₁² = σ₂² = σ²; (2) either both populations are normal or both
   sample sizes are large.
      n = (z_{α/2} + z_β)² · 2σ² / (μ₁ − μ₂)² = (1.96 + 1.28)² · 2(0.8)² / (0.5)² ≈ 53.75
   ⇒  n = 54

Example. Do fraternities help or hurt your academic progress at college? To investigate
this question, 5 students who joined fraternities in 1998 were randomly selected. Their
GPAs before and after they joined the fraternities are as follows.
   Student:  1  2  3  4  5
   Before:   3  4  3  3  2
   After:    2  3  3  2  1
   Diff.:    1  1  0  1  1

Please test the hypothesis at α = 0.05.

Solution
   H₀: μ_d = 0  vs  Hₐ: μ_d ≠ 0
   Assumption: the difference follows a normal distribution.
   X̄_d = 0.8,  S_d = 0.447,  n = 5,  α = 0.05
   Test statistic:  T₀ = (X̄_d − 0) / (S_d/√n) ~ t_{n−1} under H₀
   T₀ ≈ 4.02 > t_{4, 0.025} = 2.776
   We reject H₀ at α = 0.05 and conclude that fraternities do hurt academic progress.

SAS

data frat;
input before after;
diff = before - after;
datalines;
3 2
4 3
3 3
3 2
2 1
;
run;
proc univariate data=frat normal;
var diff;
run;

G*Power: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/

Dear students: Please install G*Power on your own computer. It is a very effective
power-calculation tool. For two-sided exact t tests, for example, it gives the accurate
power calculation -- better than the approximate hand-calculation formula in which we
dropped a smaller probability. You can follow these websites to learn how to use G*Power:

1. One population mean: http://www.ats.ucla.edu/stat/gpower/one_sample.htm
2. Two population means (paired samples): http://www.ats.ucla.edu/stat/gpower/pairedsample.htm
3. Two population means (independent samples): http://www.ats.ucla.edu/stat/gpower/indepsamps.htm

You can also find many other helpful sites by searching Google for: g power, t test
(or the key words: g*power, t test).
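As a sanity check (a sketch added here, not part of the original notes), the worked examples above can be reproduced with the Python standard library; the t critical values t_{8,0.05} ≈ 1.860 and t_{4,0.025} = 2.776 are standard t-table values:

```python
import math
import statistics

# Concrete-block example: F check, then pooled-variance t test (independent samples).
new = [14, 15, 13, 15, 16]
old = [13, 15, 13, 12, 14]
n1, n2 = len(new), len(old)
s1_sq, s2_sq = statistics.variance(new), statistics.variance(old)
F0 = s1_sq / s2_sq                     # = 1, so assuming sigma1^2 = sigma2^2 is reasonable
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
T0 = (statistics.mean(new) - statistics.mean(old)) / math.sqrt(sp_sq * (1/n1 + 1/n2))
# T0 ~ 1.66 < t_{8,0.05} ~ 1.860 (table value): do not reject H0.

# Whey example: z-based sample size per group, with z_{0.025} = 1.96, z_{0.1} = 1.28.
n_whey = math.ceil((1.96 + 1.28) ** 2 * 2 * 0.8 ** 2 / 0.5 ** 2)

# Fraternity example: paired t test on the before-minus-after differences.
before = [3, 4, 3, 3, 2]
after = [2, 3, 3, 2, 1]
diff = [b - a for b, a in zip(before, after)]
T0_paired = statistics.mean(diff) / (statistics.stdev(diff) / math.sqrt(len(diff)))
# T0_paired = 4.0 > t_{4,0.025} = 2.776 (table value): reject H0.
```

The paired statistic comes out as exactly 4.0 here (the notes' 4.02 reflects rounding S_d to 0.447 before dividing), and the whey calculation reproduces n = 54.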