AMS572.01 Final Exam, Fall 2010

Name ___________________________ ID ________________ Signature ____________________ AMS Major? ______

Instructions: This is a closed-book exam. Anyone who cheats on the exam shall receive a grade of F. Please enter “Yes” or “No” for “AMS Major”. Please provide complete solutions for full credit. The exam runs from 2:15 to 4:45 pm. Good luck!

1. (for all) The following data set, from a study by the well-known chemist and Nobel Laureate Linus Pauling, gives the incidence of colds among 279 French skiers who were randomized to the Vitamin C and Placebo groups.

Group        Cold: Yes   Cold: No
Vitamin C        17         122
Placebo          31         109

(a) Construct a 95% confidence interval for the difference between the two incidence rates.
(b) Please test whether the incidence rate for the Placebo group is significantly higher than that of the Vitamin C group at the 5% level of significance. Please report the p-value of your test.
(c) Please write up the entire SAS program necessary to answer the question raised in (b), including the data step.

Answer:
(a) Vitamin C: $\hat p_1 = \frac{17}{17+122} = 0.122$, $n_1 = 139$; Placebo: $\hat p_2 = \frac{31}{31+109} = 0.221$, $n_2 = 140$.
The $100(1-\alpha)\%$ confidence interval for $p_1 - p_2$ is
$$\left[(\hat p_1-\hat p_2) - z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-\hat p_2)}{n_2}},\ (\hat p_1-\hat p_2) + z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-\hat p_2)}{n_2}}\right]$$
After plugging in $z_{0.025} = 1.96$ and the estimates above, we find the 95% CI to be $[-0.187, -0.011]$.
(b) This is Problem 9.12 in our textbook. (*It is also OK, in fact better, to use the pooled proportion in the denominator. *It is also OK to do a 2-sided test.)
(c) SAS code:

    data cold;
      input group $ outcome $ count;
      datalines;
    VC yes 17
    VC no 122
    Placebo yes 31
    Placebo no 109
    ;
    run;

    proc freq data=cold;
      /* Pearson chi-square = z^2 of the pooled two-sample z-test (2-sided);  */
      /* for the 1-sided test in (b), halve its p-value, since the Placebo    */
      /* sample rate is the higher one                                        */
      tables group*outcome / chisq;
      weight count;
    run;

2. (for all) People at high risk of sudden cardiac death can be identified using the change in a signal-averaged electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The method was modified in the hope of improving its accuracy.
The new method is tested on 50 people and gives correct results for 46 of them.
(a) Is this convincing evidence that the new method is more accurate? Please test at α = 0.05.
(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the new method is better at α = 0.05?
(c) How many patients should be tested in order for this power to be at least 0.75?

Answer: These are Problems 9.7 and 9.8 in our textbook.

3. (for all) A classic tale involves four car-pooling students who missed a test and gave a flat tire as their excuse. On the make-up test, the professor asked the students to identify the particular tire that went flat. If they really did not have a flat tire, would they be able to identify the same tire? To mimic this situation, 40 other students were asked to identify the tire they would select. The data are:

Tire        Left front   Right front   Left rear   Right rear
Frequency       11           15            8           6

(a) At α = 0.05, please test whether each tire has the same chance of being selected.
(b) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step.

Answer: This is a problem from our Lecture Notes 12.
(a) $H_0: p_1 = p_2 = p_3 = p_4 = \frac{1}{4}$ versus $H_a$: $H_0$ is not true.
With $n = 40$, the expected counts are $e_i = n p_i = 10$, so with $k = 4$ categories
$$W_0 = \sum_{i=1}^{k}\frac{(x_i - e_i)^2}{e_i} = \frac{1^2 + 5^2 + 2^2 + 4^2}{10} = 4.6 < \chi^2_{3,\,0.05,\,\text{upper}} = 7.81.$$
Fail to reject $H_0$.
(b)

    DATA TIRE;
      INPUT location $ NUMBER;
      DATALINES;
    LF 11
    RF 15
    LR 8
    RR 6
    ;
    * HYPOTHESIZING A 1:1:1:1 RATIO;
    PROC FREQ DATA=TIRE ORDER=DATA;
      WEIGHT NUMBER;
      TITLE3 'GOODNESS OF FIT ANALYSIS';
      TABLES location / CHISQ NOCUM TESTP=(0.25 0.25 0.25 0.25);
    RUN;

4. (for all) The effect of caffeine levels on performing a simple finger-tapping task was investigated in a double-blind study. Thirty male college students were trained in finger tapping and randomly assigned to receive one of three doses of caffeine (0, 100, or 200 mg), with 10 students per dose group.
Two hours after the caffeine treatment, students were asked to finger tap, and the number of taps per minute was counted. The data are tabulated below.

Caffeine Dose   Finger Taps per Minute
0 mg            242 245 244 248 247 248 242 244 246 242
100 mg          248 246 245 247 248 250 247 246 243 244
200 mg          246 248 250 252 248 250 246 248 245 250

(a) Construct an ANOVA table and test whether there are significant differences in finger tapping between the groups at α = 0.05.
(b) Compare the finger-tapping speed between the 0 mg and the 200 mg groups at α = 0.05. List the necessary assumptions, and please perform tests for the assumptions that you can test in an exam setting.
(c) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step.
(d) Please write up the entire SAS program necessary to answer the question raised in (b), including the data step and the tests for all necessary assumptions.

Answer:
(a) This is Problem 12.2(b) in our textbook: a one-way ANOVA. We are testing whether the mean tapping speeds in the three groups are equal, that is,
$H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_a$: the above is not true.
(b) This is inference on two population means with independent samples. The first assumption is that both populations are normal. The second is the equal-variance assumption, which we can test in the exam setting as follows.
Group 1 (dose 0 mg): $\bar X_1 = 244.8$, $s_1^2 = 5.73$, $n_1 = 10$
Group 2 (dose 200 mg): $\bar X_2 = 248.3$, $s_2^2 = 4.9$, $n_2 = 10$
Under the normality assumption, we first test whether the two population variances are equal, that is, $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \ne \sigma_2^2$. The test statistic is
$$F_0 = \frac{s_1^2}{s_2^2} = \frac{5.73}{4.9} = 1.17,$$
and for this 2-sided test at α = 0.05 the upper critical value is $F_{9,9,0.025,U} = 4.03$. Since $F_0 < 4.03$, we cannot reject $H_0$. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
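The answer to 4(a) states the hypotheses but never writes out the ANOVA table. The variance ratio in 4(b) is also worth double-checking. The script below is a minimal Python sketch that does both, using only the standard library; the course itself uses SAS for this, as in parts (c) and (d). The dictionary `taps` and the helper `one_way_anova` are hypothetical names. The data are transcribed from the table above.

```python
from statistics import mean, variance

# Finger taps per minute, transcribed from the Problem 4 data table.
taps = {
    "0 mg": [242, 245, 244, 248, 247, 248, 242, 244, 246, 242],
    "100 mg": [248, 246, 245, 247, 248, 250, 247, 246, 243, 244],
    "200 mg": [246, 248, 250, 252, 248, 250, 246, 248, 245, 250],
}


def one_way_anova(groups):
    """Return (SS_treatment, SS_error, df_treatment, df_error, F) for a one-way ANOVA."""
    group_means = [mean(g) for g in groups]
    all_obs = [v for g in groups for v in g]
    grand_mean = mean(all_obs)
    # Between-group (treatment) and within-group (error) sums of squares.
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_e = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_tr = len(groups) - 1
    df_e = len(all_obs) - len(groups)
    f_stat = (ss_tr / df_tr) / (ss_e / df_e)
    return ss_tr, ss_e, df_tr, df_e, f_stat


# Part (a): ANOVA table; compare F with the table value F_{2,27,0.05} (about 3.35).
ss_tr, ss_e, df_tr, df_e, f_anova = one_way_anova(list(taps.values()))
print(f"Treatment  SS={ss_tr:7.2f}  df={df_tr:2d}  MS={ss_tr / df_tr:6.3f}  F={f_anova:.2f}")
print(f"Error      SS={ss_e:7.2f}  df={df_e:2d}  MS={ss_e / df_e:6.3f}")
print(f"Total      SS={ss_tr + ss_e:7.2f}  df={df_tr + df_e:2d}")

# Part (b): variance-ratio statistic F0 = s1^2 / s2^2 for 0 mg vs 200 mg.
s1_sq = variance(taps["0 mg"])
s2_sq = variance(taps["200 mg"])
f0 = s1_sq / s2_sq
print(f"s1^2={s1_sq:.3f}  s2^2={s2_sq:.3f}  F0={f0:.2f}")
```

On these data the sketch gives the following results:

- SS_Tr = 61.4 on 2 df and SS_E = 134.1 on 27 df.
- F ≈ 6.18, which exceeds F_{2,27,0.05} ≈ 3.35, so the three dose means would be judged different at α = 0.05.
- F0 ≈ 1.17, which matches the hand calculation in (b).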
Next we perform the pooled-variance t-test of $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \ne 0$. The pooled variance is $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{5.73 + 4.9}{2} = 5.315$, so
$$t_0 = \frac{\bar X_1 - \bar X_2 - 0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{244.8 - 248.3 - 0}{\sqrt{5.315}\sqrt{\frac{1}{10}+\frac{1}{10}}} = -3.39.$$
Since $t_0 = -3.39 < -t_{18,0.025} = -2.101$ (equivalently, $|t_0| > 2.101$), we reject $H_0$ and conclude that the finger-tapping speeds are significantly different between the two groups at the 0.05 significance level.
(c)

    data finger;
      input group taps @@;
      datalines;
    0 242 0 245 0 244 0 248 0 247 0 248 0 242 0 244 0 246 0 242
    1 248 1 246 1 245 1 247 1 248 1 250 1 247 1 246 1 243 1 244
    2 246 2 248 2 250 2 252 2 248 2 250 2 246 2 248 2 245 2 250
    ;
    run;

    proc anova data=finger;
      class group;
      model taps = group;
      means group / tukey;
    run;
    /* The MEANS statement (Tukey comparisons) is not necessary for the given problem. */

(d)

    data finger2;
      set finger;
      where group ne 1;   /* keep only the 0 mg (group 0) and 200 mg (group 2) doses */
    run;

    /* Normality tests for each dose group */
    proc univariate data=finger2 normal;
      class group;
      var taps;
    run;

    /* Pooled t-test, with the folded F test for equal variances */
    proc ttest data=finger2;
      class group;
      var taps;
    run;

    /* Nonparametric alternative (e.g., Wilcoxon rank-sum) in case normality is doubtful */
    proc npar1way data=finger2;
      class group;
      var taps;
    run;
    /* The data step in part (d) must run immediately after the one in part (c), in the same SAS session. */
    /* Alternatively, one can save finger as a permanent SAS data set and use it later. */

5A. (for AMS majors) Suppose we have two independent random samples from two normal populations: $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$.
(a) At the significance level α, please construct a test using the pivotal quantity approach to test whether $\mu_1 = 2\mu_2$ or not. (*Please include the derivation of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.)
(b) At the significance level α, please derive the likelihood ratio test for testing whether $\mu_1 = 2\mu_2$ or not. Subsequently, please show whether this test is equivalent to the one derived in part (a).

Answer:
(a) Here is a simple outline of the derivation of the test of $H_0: \mu_1 - 2\mu_2 = 0$ versus $H_a: \mu_1 - 2\mu_2 \ne 0$ using the pivotal quantity approach.
[1]. We start with the point estimator for the parameter of interest, $\mu_1 - 2\mu_2$: $\bar X - 2\bar Y$. Its distribution is $N\left(\mu_1 - 2\mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{4}{n_2}\right)\right)$. This follows from the mgf of $N(\mu, \sigma^2)$, which is $M(t) = \exp(\mu t + \sigma^2 t^2/2)$, and the independence properties of the random samples. From this we have
$$Z = \frac{\bar X - 2\bar Y - (\mu_1 - 2\mu_2)}{\sigma\sqrt{1/n_1 + 4/n_2}} \sim N(0,1).$$
Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.
[2]. We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the pooled-variance t-statistic. We find that
$$W = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}.$$
This follows from the mgf of $\chi^2_k$, which is $M(t) = (1-2t)^{-k/2}$, and the independence properties of the random samples.
[3]. Next, the theorem on sampling from a normal population and the independence properties of the random samples show that Z and W are independent. Therefore, by the definition of the t-distribution, we obtain our pivotal quantity:
$$T = \frac{Z}{\sqrt{W/(n_1+n_2-2)}} = \frac{\bar X - 2\bar Y - (\mu_1 - 2\mu_2)}{S_p\sqrt{1/n_1 + 4/n_2}} \sim t_{n_1+n_2-2},$$
where $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ is the pooled sample variance.
[4]. The rejection region is derived from $P(|T_0| \ge c \mid H_0) = \alpha$, where
$$T_0 = \frac{\bar X - 2\bar Y - 0}{S_p\sqrt{1/n_1 + 4/n_2}} \overset{H_0}{\sim} t_{n_1+n_2-2}.$$
Thus $c = t_{n_1+n_2-2,\,\alpha/2}$. Therefore, at the significance level α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \ge t_{n_1+n_2-2,\,\alpha/2}$.
(b) We have two independent random samples from two normal populations with equal but unknown variances. Now we derive the likelihood ratio test for $H_0: \mu_1 = 2\mu_2$ vs $H_a: \mu_1 \ne 2\mu_2$. Let $\mu_2 = \mu$. The restricted and full parameter spaces are
$$\omega = \{-\infty < \mu_1 = 2\mu,\ \mu_2 = \mu < +\infty,\ 0 < \sigma^2 < +\infty\},\qquad \Omega = \{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}.$$
Under ω,
$$L(\omega) = L(\mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - 2\mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right],$$
which contains two parameters, and
$$\ln L(\omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - 2\mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right).$$
Since it contains two parameters, we take the partial derivatives with respect to μ and σ² and set both equal to 0.
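Then we have:
$$\hat\mu = \frac{2n_1\bar x + n_2\bar y}{4n_1 + n_2}, \qquad \hat\sigma^2_\omega = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - 2\hat\mu)^2 + \sum_{j=1}^{n_2}(y_j - \hat\mu)^2\right].$$
Similarly, under Ω,
$$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right],$$
which contains three parameters, and
$$\ln L(\Omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right).$$
We take the partial derivatives with respect to $\mu_1$, $\mu_2$ and $\sigma^2$ and set them all equal to 0. Then we have:
$$\hat\mu_1 = \bar x, \qquad \hat\mu_2 = \bar y, \qquad \hat\sigma^2_\Omega = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar x)^2 + \sum_{j=1}^{n_2}(y_j - \bar y)^2\right].$$
This completes the estimation of the parameters. After some cancellations and simplifications, the likelihood ratio is
$$\lambda = \frac{L(\hat\omega)}{L(\hat\Omega)} = \frac{\left(\frac{1}{2\pi\hat\sigma^2_\omega}\right)^{\frac{n_1+n_2}{2}}}{\left(\frac{1}{2\pi\hat\sigma^2_\Omega}\right)^{\frac{n_1+n_2}{2}}} = \left[\frac{\hat\sigma^2_\Omega}{\hat\sigma^2_\omega}\right]^{\frac{n_1+n_2}{2}}.$$
Substituting the estimates gives
$$\lambda = \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar x)^2 + \sum_{j=1}^{n_2}(y_j - \bar y)^2}{\sum_{i=1}^{n_1}\left(x_i - 2\cdot\frac{2n_1\bar x + n_2\bar y}{4n_1+n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{2n_1\bar x + n_2\bar y}{4n_1+n_2}\right)^2}\right]^{\frac{n_1+n_2}{2}} = \left[1 + \frac{t_0^2}{n_1+n_2-2}\right]^{-\frac{n_1+n_2}{2}},$$
where $t_0$ is the observed value of the pooled-variance test statistic $T_0$ from part (a). Since λ is a decreasing function of $|t_0|$, $\lambda \le \lambda^*$ is equivalent to $|t_0| \ge c$. Thus, at the significance level α, we reject the null hypothesis in favor of the alternative when $|t_0| \ge c = t_{n_1+n_2-2,\,\alpha/2}$. This shows that the pivotal quantity approach and the likelihood ratio test approach are equivalent in this case.

5B. (for non-AMS majors) We have two independent samples $X_1, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and $Y_1, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$, where $\sigma_1^2 = \sigma_2^2 = \sigma^2$ and $n_1 = n_2 = n$. Consider the hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 > 0$.
(a) Please derive the general formula for the power of the pooled-variance t-test based on an effect size of EFF at the significance level α. Recall the definition: effect size $= \text{EFF} = \frac{|\mu_1 - \mu_2|}{\sigma}$ (e.g., EFF = 1).
(b) With a sample size of 20 per group, α = 0.05, and an estimated effect size ranging from 0.8 to 1.2, please calculate the power of your pooled-variance t-test.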
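Answer:
(a) Test statistic:
$$T_0 = \frac{(\bar X - \bar Y) - 0}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{\bar X - \bar Y}{S_p\sqrt{2/n}} \overset{H_0}{\sim} t_{2n-2}.$$
At significance level α, reject $H_0$ in favor of $H_a$ iff $T_0 \ge t_{2n-2,\,\alpha}$.
$$\begin{aligned}
\text{Power} = 1-\beta &= P(\text{reject } H_0 \mid H_a) = P\left(T_0 \ge t_{2n-2,\,\alpha} \mid H_a: \mu_1 - \mu_2 = \delta > 0\right)\\
&= P\left(\frac{\bar X - \bar Y}{S_p\sqrt{2/n}} \ge t_{2n-2,\,\alpha} \,\middle|\, H_a: \mu_1 - \mu_2 = \delta\right)\\
&= P\left(\frac{\bar X - \bar Y - \delta}{S_p\sqrt{2/n}} \ge t_{2n-2,\,\alpha} - \frac{\delta}{S_p\sqrt{2/n}} \,\middle|\, H_a: \mu_1 - \mu_2 = \delta\right)\\
&\approx P\left(T \ge t_{2n-2,\,\alpha} - \text{Eff}\cdot\sqrt{n/2}\right), \quad \text{where Eff} = \delta/S_p \approx \delta/\sigma.
\end{aligned}$$
(b) With n = 20, α = 0.05, and Eff = 0.8 to 1.2, the power is calculated as follows:
$$\text{Power}(\text{Eff} = 0.8) = P\left(T \ge t_{38,\,0.05} - 0.8\sqrt{20/2}\right) = P(T \ge 1.686 - 2.530) = P(T \ge -0.844) \approx 0.80$$
$$\text{Power}(\text{Eff} = 1.2) = P\left(T \ge t_{38,\,0.05} - 1.2\sqrt{20/2}\right) = P(T \ge 1.686 - 3.795) = P(T \ge -2.109) \approx 0.98$$
Note: the T statistic above follows a t-distribution with 38 (= 20 + 20 - 2) degrees of freedom. Therefore we conclude that the power ranges from 80% to 98% for effect sizes from 0.8 to 1.2.

6. (extra credit for all students) Suppose we have two independent random samples from two normal populations, i.e., $Y_{11}, Y_{12}, \ldots, Y_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_{21}, Y_{22}, \ldots, Y_{2,n_2} \sim N(\mu_2, \sigma_2^2)$. Furthermore, suppose $\sigma_1^2 = \sigma_2^2$, although their values are unknown. Please prove whether the one-way ANOVA F-test is equivalent to the (2-sided) pooled-variance t-test or not.

Answer:

That's all, class; I wish you a very happy holiday season and winter vacation!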
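The script below numerically cross-checks the powers in 5B(b) and illustrates Problem 6. It is a minimal Python sketch using only the standard library and is not part of the original solutions.

- All function names are hypothetical.
- Python's standard library has no t-distribution, so `t_cdf` integrates the t density with Simpson's rule and `t_upper` finds critical values by bisection. These are stand-ins, not a library routine.
- `power_pooled_t` evaluates the approximate formula from 5B(a).
- `pooled_t_and_anova_f` shows, on the 0 mg and 200 mg groups from Problem 4, that the one-way ANOVA F statistic equals the square of the pooled t statistic. A numerical check is not the proof that Problem 6 asks for.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): composite Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_upper(alpha, df):
    """Upper-alpha critical value t_{df, alpha}, found by bisection on t_cdf."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_pooled_t(eff, n, alpha):
    """Problem 5B(a): P(T >= t_{2n-2, alpha} - Eff * sqrt(n / 2)) with T ~ t_{2n-2}."""
    df = 2 * n - 2
    return 1 - t_cdf(t_upper(alpha, df) - eff * math.sqrt(n / 2), df)


def pooled_t_and_anova_f(x, y):
    """Problem 6: pooled-variance t statistic and one-way ANOVA F for two samples."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    sse = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    mse = sse / (n1 + n2 - 2)  # this is S_p^2
    t0 = (m1 - m2) / math.sqrt(mse * (1 / n1 + 1 / n2))
    grand = (sum(x) + sum(y)) / (n1 + n2)
    ss_tr = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    f0 = (ss_tr / 1) / mse  # treatment df = k - 1 = 1 for two groups
    return t0, f0


for eff in (0.8, 1.0, 1.2):
    print(f"Eff={eff}: power = {power_pooled_t(eff, 20, 0.05):.3f}")

# Illustration on the 0 mg and 200 mg groups from Problem 4.
dose0 = [242, 245, 244, 248, 247, 248, 242, 244, 246, 242]
dose200 = [246, 248, 250, 252, 248, 250, 246, 248, 245, 250]
t0, f0 = pooled_t_and_anova_f(dose0, dose200)
print(f"t0 = {t0:.3f}, t0^2 = {t0 ** 2:.3f}, F = {f0:.3f}")
```

The sketch reproduces the hand calculations:

- With n = 20, the computed powers are roughly 0.80 at Eff = 0.8 and 0.98 at Eff = 1.2, in line with 5B(b).
- On the finger-tapping data, t0 ≈ -3.39 (as in Problem 4(b)) and F = t0² ≈ 11.52.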