AMS572.01 Practice Midterm Exam, Fall 2013

Name _______________________________ ID _________________________ Signature _________________________

Instruction: This is a closed-book exam. Anyone who cheats in the exam shall receive a grade of F. Please provide complete solutions for full credit.

***For the real midterm, we will have 3 problems. More problems are provided here so that you can see more types of problems.***

1. The effect of caffeine levels on performing a simple finger tapping task was investigated in a double-blind study. Twenty male college students were trained in finger tapping and randomly assigned to receive one of two doses of caffeine (0 or 100 mg), with 10 students per dose group. Two hours after the caffeine treatment, students were asked to finger tap and the numbers of taps per minute were counted. The data are tabulated below.

Caffeine Dose    Finger Taps per Minute
0 mg             242 245 244 248 247 248 242 244 246 242
100 mg           248 246 245 247 248 250 247 246 243 244

(a) Compare the finger tapping speed between the two groups at α = .05. List the necessary assumptions, and please perform tests for the assumptions that you can test in an exam setting.
(b) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step and the tests for all necessary assumptions.

Answer:
(a) This is inference on two population means with independent samples. The first assumption is that both populations are normal. The second is the equal-variance assumption, which we can test in the exam setting as follows.

Group 1 (dose 0 mg): $\bar{X}_1 = 244.8$, $s_1^2 = 5.73$, $n_1 = 10$
Group 2 (dose 100 mg): $\bar{X}_2 = 246.4$, $s_2^2 = 4.27$, $n_2 = 10$

Under the normality assumption, we first test whether the two population variances are equal. That is, $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$. The test statistic is
$$F_0 = \frac{s_1^2}{s_2^2} = \frac{5.73}{4.27} \approx 1.34, \qquad F_{9,9,0.05,U} = 3.18.$$
Since $F_0 < 3.18$, we cannot reject $H_0$. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
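The summary statistics and the F ratio above can be double-checked numerically. The following is a minimal Python sketch using only the standard library. The variable and function names are illustrative assumptions, and the critical value 3.18 is copied from the F table quoted above rather than computed.

```python
from statistics import mean, variance

# Finger-tapping data from Problem 1 (taps per minute)
dose_0 = [242, 245, 244, 248, 247, 248, 242, 244, 246, 242]
dose_100 = [248, 246, 245, 247, 248, 250, 247, 246, 243, 244]


def f_ratio(x, y):
    """Ratio of the two sample variances, s_x^2 / s_y^2."""
    return variance(x) / variance(y)


xbar1, xbar2 = mean(dose_0), mean(dose_100)
s1_sq, s2_sq = variance(dose_0), variance(dose_100)  # n - 1 in the denominator
f0 = f_ratio(dose_0, dose_100)

# Upper critical value taken from the text (table lookup), not computed here
F_CRIT = 3.18
reject_equal_variances = f0 > F_CRIT

print(f"group 1: mean = {xbar1:.1f}, variance = {s1_sq:.2f}")
print(f"group 2: mean = {xbar2:.1f}, variance = {s2_sq:.2f}")
print(f"F0 = {f0:.2f}, reject equal variances: {reject_equal_variances}")
```

The sketch only reproduces the hand calculation. It does not test normality, which in this exam is left to the SAS program.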
Next we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$. With the pooled variance $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = 5$, the test statistic is
$$t_0 = \frac{\bar{X}_1 - \bar{X}_2 - 0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{244.8 - 246.4 - 0}{\sqrt{5}\,\sqrt{\frac{1}{10}+\frac{1}{10}}} = -1.6.$$
Since $t_0 = -1.6$ is NOT smaller than $-t_{18,0.025} = -2.10092$ (equivalently, $|t_0| = 1.6 < 2.10092$), we can NOT reject $H_0$. Thus we conclude that the finger tapping speeds are NOT significantly different between the two groups at the significance level of 0.05.

(b) SAS program:

    data finger;
    input group taps @@;
    datalines;
    0 242 0 245 0 244 0 248 0 247 0 248 0 242 0 244 0 246 0 242
    1 248 1 246 1 245 1 247 1 248 1 250 1 247 1 246 1 243 1 244
    ;
    run;
    proc univariate data = finger normal;
    class group;
    var taps;
    run;
    proc ttest data = finger;
    class group;
    var taps;
    run;
    proc npar1way data = finger;
    class group;
    var taps;
    run;

2. Suppose we have two independent random samples from two normal populations: $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$.
(a) At the significance level α, please construct a test of the hypothesis $H_0: \sigma_1^2 - 3\sigma_2^2 = 0$ versus $H_a: \sigma_1^2 - 3\sigma_2^2 \neq 0$.
(b) Suppose we have confirmed that $\sigma_1^2 - 3\sigma_2^2 = 0$. At the significance level α, please construct a test of $H_0: 3\mu_1 - 2\mu_2 - 4 = 0$ versus $H_a: 3\mu_1 - 2\mu_2 - 4 \neq 0$ using the pivotal quantity method. Please include the derivation of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.

Answer: Recall that we had a more general setting of this problem; see below. All we need to do is to plug in a = 1, b = 3 for part (a), and c = 3, d = 4, e = 2 for part (b).

General setting of the problem: Suppose we have two independent random samples from two normal populations, i.e., $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$.
(a) At the significance level α, please construct a test of the hypothesis $H_0: a\sigma_1^2 = b\sigma_2^2$ vs. $H_1: a\sigma_1^2 \neq b\sigma_2^2$. Here a, b are known constants.
(b) Suppose we have confirmed that $a\sigma_1^2 = b\sigma_2^2$. At the significance level α, please construct a test of whether $c\mu_1 - d = e\mu_2$ or not using the pivotal quantity method. Here c, d, e are known constants.
Please include the derivation of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.

SOLUTION: This is inference on two normal populations with independent samples.

(a) This is the usual F-test on two normal population variances: $H_0: \sigma_1^2/\sigma_2^2 = b/a$ versus $H_a: \sigma_1^2/\sigma_2^2 \neq b/a$. The test statistic is
$$F_0 = \frac{S_1^2/S_2^2}{\sigma_{1,0}^2/\sigma_{2,0}^2} = \frac{S_1^2/S_2^2}{b/a} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1}.$$
At the significance level α, we will reject $H_0$ if $F_0$ is smaller than $F_{n_1-1,\,n_2-1,\,\alpha/2,\,L}$ or $F_0$ is greater than $F_{n_1-1,\,n_2-1,\,\alpha/2,\,U}$.

(b) Given that $a\sigma_1^2 = b\sigma_2^2$, we set $\sigma_2^2 = \sigma^2$ and thus $\sigma_1^2 = \frac{b}{a}\sigma^2$. Here is a simple outline of the derivation of the test of $H_0: c\mu_1 - d = e\mu_2$ versus $H_a: c\mu_1 - d \neq e\mu_2$, which are equivalent to $H_0: c\mu_1 - e\mu_2 = d$ versus $H_a: c\mu_1 - e\mu_2 \neq d$.

(1) We start with the point estimator for the parameter of interest $c\mu_1 - e\mu_2$, namely $c\bar{X} - e\bar{Y}$. Its distribution is
$$c\bar{X} - e\bar{Y} \sim N\!\left(c\mu_1 - e\mu_2,\ \sigma^2\left(\frac{c^2 b}{a n_1} + \frac{e^2}{n_2}\right)\right),$$
using the mgf of $N(\mu, \sigma^2)$, which is $M(t) = \exp(\mu t + \sigma^2 t^2/2)$, and the independence properties of the random samples. From this we have
$$Z = \frac{(c\bar{X} - e\bar{Y}) - (c\mu_1 - e\mu_2)}{\sigma\sqrt{\frac{c^2 b}{a n_1} + \frac{e^2}{n_2}}} \sim N(0,1).$$
Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.

(2) We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the pooled-variance t-statistic. We found that
$$W = \frac{\frac{a}{b}(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2},$$
using the mgf of $\chi^2_k$, which is $M(t) = \left(\frac{1}{1-2t}\right)^{k/2}$, and the independence properties of the random samples.

(3) Then we found, from the theorem of sampling from the normal population and the independence properties of the random samples, that Z and W are independent. Therefore, by the definition of the t-distribution, we have obtained our pivotal quantity:
$$T = \frac{Z}{\sqrt{W/(n_1+n_2-2)}} = \frac{(c\bar{X} - e\bar{Y}) - (c\mu_1 - e\mu_2)}{\sqrt{\frac{\frac{a}{b}(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\left(\frac{c^2 b}{a n_1} + \frac{e^2}{n_2}\right)}} \sim t_{n_1+n_2-2}.$$

(4) The rejection region is derived from $P(|T_0| \geq c \mid H_0) = \alpha$, where
$$T_0 = \frac{(c\bar{X} - e\bar{Y}) - d}{\sqrt{\frac{\frac{a}{b}(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\left(\frac{c^2 b}{a n_1} + \frac{e^2}{n_2}\right)}} \overset{H_0}{\sim} t_{n_1+n_2-2}.$$
Thus $c = t_{n_1+n_2-2,\,\alpha/2}$.
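Therefore at the significance level of α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{n_1+n_2-2,\,\alpha/2}$.

3. We have two independent samples $X_1, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and $Y_1, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$, where $\sigma_1^2 = \sigma_2^2 = \sigma^2$ and $n_1 = 2n_2$. Consider the hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 = \Delta > 0$.
(a) Please derive the general formula for power calculation for the pooled-variance t-test based on an effect size of EFF at the significance level of α. Recall the definition: Effect size = EFF = $|\Delta|/\sigma$ (e.g. Eff = 1).
(b) With a sample size of 40 in group 1 and 20 in group 2, α = 0.05, and an estimated effect size ranging from 0.8 to 1.2, please calculate the power of your pooled-variance t-test.

Answer:
(a) Let $n_2 = n$, thus $n_1 = 2n_2 = 2n$. The test statistic is
$$T_0 = \frac{(\bar{X} - \bar{Y}) - 0}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{3}{2n}}} \overset{H_0}{\sim} t_{3n-2}.$$
At the significance level α, reject $H_0$ in favor of $H_a$ iff $T_0 \geq t_{3n-2,\,\alpha}$.
$$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_a) = P\left(T_0 \geq t_{3n-2,\alpha} \mid H_a: \mu_1 - \mu_2 = \Delta > 0\right)$$
$$= P\left(\frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{3}{2n}}} \geq t_{3n-2,\alpha} \,\Big|\, H_a: \mu_1 - \mu_2 = \Delta\right)$$
$$= P\left(\frac{\bar{X} - \bar{Y} - \Delta}{S_p\sqrt{\frac{3}{2n}}} \geq t_{3n-2,\alpha} - \frac{\Delta}{S_p\sqrt{\frac{3}{2n}}} \,\Big|\, H_a: \mu_1 - \mu_2 = \Delta\right)$$
$$\approx P\left(T \geq t_{3n-2,\alpha} - \text{Eff}\cdot\sqrt{\frac{2n}{3}} \,\Big|\, H_a: \mu_1 - \mu_2 = \Delta\right),$$
where the effect size is Eff $= \Delta/\sigma \approx \Delta/S_p$ and $T \sim t_{3n-2}$.

(b) With n = 20, α = 0.05, and Eff = 0.8 to 1.2, the power is calculated as follows:
$$\text{Power (Eff = 0.8)} = P\left(T \geq t_{58,0.05} - 0.8\sqrt{\tfrac{40}{3}} \,\Big|\, H_a: \mu_1 - \mu_2 = \Delta\right) = P(T \geq 1.67 - 2.92) = P(T \geq -1.25) = 0.8918$$
$$\text{Power (Eff = 1.2)} = P\left(T \geq t_{58,0.05} - 1.2\sqrt{\tfrac{40}{3}} \,\Big|\, H_a: \mu_1 - \mu_2 = \Delta\right) = P(T \geq 1.67 - 4.38) = P(T \geq -2.71) = 0.9956$$
Note: the T statistic above follows a t-distribution with 58 (= 40 + 20 - 2) degrees of freedom. Therefore we conclude that the power will range from 89.18% to 99.56% for the given effect size of 0.8 to 1.2.

Note: In the exam situation, you have no access to R, and thus you can simply provide a rough estimate of the power based on your t-table. For the given problem, the degrees of freedom are larger than what is given in the t-table, and thus we use the Z-table to approximate. The power is thereby estimated to be from 89.44% to 99.66% for the given effect size of 0.8 to 1.2.

4. How to become an art sleuth? Like all creative artists, composers of music develop certain personal characteristics in their works.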
One such characteristic is the number of melody notes in each bar of music. Now suppose you buy an old unsigned manuscript of a waltz which you suspect is an unknown work by Johann Strauss and, if so, very valuable. You count the number of melody notes per bar of several genuine Strauss waltzes and compare the frequency distribution with a similar count of the unknown work. Would the following results support your high hopes? Use α = 0.05.

No. of melody notes per bar    Strauss waltzes    Unknown waltz
0                              5                  6
1                              32                 60
2                              133                62
3                              114                96
4                              67                 33
5                              22                 7
≥6                             15                 18
Total                          388                282

SOLUTION: This is inference on several population proportions following a multinomial distribution. If the unknown work was from Johann Strauss, then we would expect the following frequency distribution of melody notes per bar:

No. of melody notes per bar    Expected relative frequency ($p_{i0}$)    Expected frequency (count) ($E_i$)    Observed frequency ($O_i$)
0                              5/388                                     282*5/388 ≈ 3.63                      6
1                              32/388                                    282*32/388 ≈ 23.26                    60
2                              133/388                                   282*133/388 ≈ 96.66                   62
3                              114/388                                   282*114/388 ≈ 82.86                   96
4                              67/388                                    282*67/388 ≈ 48.70                    33
5                              22/388                                    282*22/388 ≈ 15.99                    7
≥6                             15/388                                    282*15/388 ≈ 10.90                    18

The large-sample chi-square test can be applied to test $H_0: p_i = p_{i0},\ i = 1, \ldots, 7$ versus $H_a$: $H_0$ is not true. The chi-square test statistic is
$$\chi_0^2 = \sum_{i=1}^{7} \frac{(O_i - E_i)^2}{E_i} = \frac{(6 - 3.63)^2}{3.63} + \frac{(60 - 23.26)^2}{23.26} + \cdots + \frac{(18 - 10.90)^2}{10.90} = 88.83.$$
Since $\chi_0^2 = 88.83 > \chi^2_{6,\,0.05,\,\text{upper}} = 12.59$, we reject the null hypothesis at the significance level of α = 0.05 and conclude that it is not likely that the unknown waltz was written by Strauss.

5. The following data set from a study by the well-known chemist and Nobel Laureate Linus Pauling gives the incidence of cold among 279 French skiers who were randomized to the Vitamin C and Placebo groups.
                 Cold
Group            Yes    No
Vitamin C        17     122
Placebo          31     109

(a) Construct a 95% confidence interval for the difference between the two incidence rates.
(b) Please test whether the incidence rate for the Placebo group is significantly higher than that of the Vitamin C group at the 5% level of significance. Please report the p-value of your test.
(c) Please write up the entire SAS program necessary to answer the question raised in (b), including the data step.

Answer:
(a) VC: $\hat{p}_1 = \frac{17}{17+122} = 0.122$, $n_1 = 139$; Placebo: $\hat{p}_2 = \frac{31}{31+109} = 0.221$, $n_2 = 140$.
The 100(1-α)% confidence interval for $(p_1 - p_2)$ is
$$\left[\hat{p}_1 - \hat{p}_2 - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},\ \ \hat{p}_1 - \hat{p}_2 + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\right].$$
After plugging in $Z_{0.025} = 1.96$ etc., we found the 95% CI to be [-0.187, -0.011].

(b) This is problem 9.12 in our text book. (*It is also OK, in fact better, if we use the pooled proportion in the denominator.)
The hypotheses are $H_0: p_1 = p_2$ vs. $H_1: p_1 < p_2$.
For the vitamin C group, the proportion catching cold is $\hat{p}_1 = 17/139 = 0.122$. For the placebo group, the proportion catching cold is $\hat{p}_2 = 31/140 = 0.221$. Then the test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}} = \frac{0.122 - 0.221}{\sqrt{\frac{(0.122)(0.878)}{139} + \frac{(0.221)(0.779)}{140}}} = -2.212.$$
The P-value is $P = 1 - \Phi(2.212) = 0.0136$. Since $P < 0.05$, we reject $H_0$ and conclude that taking vitamin C reduces the incidence rate of colds compared to a placebo.

(c) SAS code:

    Data cold;
    Input group $ outcome $ count;
    Datalines;
    VC yes 17
    VC no 122
    Placebo yes 31
    Placebo no 109
    ;
    Run;
    Proc freq data=cold;
    Tables group*outcome/chisq;
    Weight count;
    Run;

6. In a study of hypnotic suggestion, 10 male volunteers were randomly allocated to an experimental group and a control group. Each subject participated in a two-phase experimental session. In the first phase, respiration was measured while the subject was awake and at rest. In the second phase, the subject was told to imagine that he was performing muscular work, and respiration was measured again.
For subjects in the experimental group, hypnosis was induced between the first and second phases; thus, the suggestion to imagine muscular work was “hypnotic suggestion” for experimental subjects and “waking suggestion” for control subjects. The accompanying table shows the measurements of total ventilation (liters of air per minute per square meter of body area) for all 10 subjects.

Experimental Group              Control Group
Subject    Rest    Work         Subject    Rest    Work
1          6       6            6          6       5
2          7       9            7          5       5
3          5       8            8          5       5
4          7       12           9          6       6
5          6       7            10         5       4

(a) Use suitable tests to investigate the following. (Use α = .05 for each test. Please report the p-value for each test and state the assumption(s) of the test.)
(i) the response of the experimental group to suggestion;
(ii) the response of the control group to suggestion;
(iii) the differences between the responses of the experimental and control groups.
(b) Please write up the entire SAS program necessary to answer the questions raised in (a). Please include the data step as well as the tests for the various assumptions.

Answer:
(a) Response = Work - Rest

(i) Inference on one population mean. Small sample.
$\bar{x}_1 = 2.2$, $s_1 = 1.9$, $n_1 = 5$
$H_0: \mu_1 = 0$ vs. $H_a: \mu_1 > 0$
$$t_0 = \frac{\bar{x}_1 - 0}{s_1/\sqrt{n_1}} = \frac{2.2 - 0}{1.9/\sqrt{5}} = 2.56$$
Since $t_0 = 2.56 > t_{4,0.05} = 2.132$, we reject $H_0$ at the significance level 0.05.
Since $t_{4,0.025} = 2.776 > t_0 = 2.56 > t_{4,0.05} = 2.132$, we can infer that 0.025 < p-value < 0.05.
The assumption is that the response from the experimental group is normally distributed.
Note: if the normality assumption is not true, we will perform a nonparametric test, either the sign test or the signed-rank test.

(ii) Inference on one population mean. Small sample.
$\bar{x}_2 = -0.4$, $s_2 = 0.55$, $n_2 = 5$
$H_0: \mu_2 = 0$ vs. $H_a: \mu_2 > 0$
$$t_0 = \frac{\bar{x}_2 - 0}{s_2/\sqrt{n_2}} = \frac{-0.4 - 0}{0.55/\sqrt{5}} = -1.63$$
Since $t_0 = -1.63 < t_{4,0.05} = 2.132$, we cannot reject $H_0$ at the significance level 0.05.
Since $-t_{4,0.05} = -2.132 < -1.63 < -t_{4,0.1} = -1.533$, we can infer that 0.9 = 1 - 0.1 < p-value < 1 - 0.05 = 0.95.
The assumption is that the response from the control group is normally distributed.
Note: if the normality assumption is not true, we will perform a nonparametric test, either the sign test or the Wilcoxon signed-rank test.

(iii) Inference on two population means. Two small, independent samples.
Sample 1: responses from the experimental group. Sample 2: responses from the control group.
$\bar{X}_1 = 2.2$, $\bar{X}_2 = -0.4$, $n_1 = n_2 = 5$, $s_1 = 1.9$, $s_2 = 0.55$
Under the normality assumption, we first test whether the two population variances are equal: $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \neq \sigma_2^2$. The test statistic is
$$F_0 = \frac{s_1^2}{s_2^2} = 12.33, \qquad F_{4,4,0.025} = 9.60 \ \text{ and } \ F_{4,4,0.975} = 1/F_{4,4,0.025} = 1/9.6 = 0.104.$$
Since $F_0$ is larger than 9.60, we reject $H_0$. Therefore it is not reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
If both populations are normal, we can test the equality of the two population means using the unequal-variance t-test. If at least one population is not normal, we will perform a nonparametric test, the Wilcoxon rank sum test (also referred to as the Mann-Whitney U test).
Here, assuming both populations are normal, we will perform the unequal-variance t-test to check whether the responses from the two groups are different or not. We will use the simple (and less accurate) formula for the degrees of freedom: d.f. = min($n_1 - 1$, $n_2 - 1$).
$H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$
The test statistic is
$$T_0 = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \overset{H_0}{\sim} t_4 \ \text{(approximately)}.$$
At α = 0.05, reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{4,0.025} = 2.776$. Here
$$t_0 = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{2.2 - (-0.4)}{\sqrt{\frac{1.9^2}{5} + \frac{0.55^2}{5}}} = 2.939.$$
Since 2.939 > 2.776, we conclude that the responses from the two groups are different at the significance level of 0.05.
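The three test statistics in part (a) can be recomputed directly from the raw table. The following is a minimal Python sketch using only the standard library. The helper names (`responses`, `one_sample_t`, `unequal_variance_t`) are hypothetical, and the critical value 2.776 is copied from the t-table value quoted above. Because the sketch uses unrounded standard deviations, the unequal-variance statistic comes out near 2.91 rather than the 2.939 obtained from the rounded values 1.9 and 0.55. The conclusion is the same.

```python
from math import sqrt
from statistics import mean, variance

# (rest, work) ventilation measurements from the Problem 6 table
experimental = [(6, 6), (7, 9), (5, 8), (7, 12), (6, 7)]
control = [(6, 5), (5, 5), (5, 5), (6, 6), (5, 4)]


def responses(pairs):
    """Response = Work - Rest for each subject."""
    return [work - rest for rest, work in pairs]


def one_sample_t(x, mu0=0.0):
    """One-sample t statistic for H0: mu = mu0."""
    return (mean(x) - mu0) / sqrt(variance(x) / len(x))


def unequal_variance_t(x, y):
    """Two-sample t statistic without pooling the variances."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


d_exp = responses(experimental)
d_ctl = responses(control)

t_exp = one_sample_t(d_exp)                 # part (i)
t_ctl = one_sample_t(d_ctl)                 # part (ii)
f0 = variance(d_exp) / variance(d_ctl)      # equal-variance check in part (iii)
t_two = unequal_variance_t(d_exp, d_ctl)    # part (iii)

T_CRIT = 2.776  # t_{4,0.025}, table value quoted in the text
print(f"(i) t0 = {t_exp:.2f}, (ii) t0 = {t_ctl:.2f}")
print(f"(iii) F0 = {f0:.2f}, t0 = {t_two:.2f}, reject H0: {abs(t_two) >= T_CRIT}")
```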
(b)

    /*Problem #1*/
    data one;
    input ID group rest work;
    diff=work-rest;
    datalines;
    1 1 6 6
    2 1 7 9
    3 1 5 8
    4 1 7 12
    5 1 6 7
    6 2 6 5
    7 2 5 5
    8 2 5 5
    9 2 6 6
    10 2 5 4
    ;
    run;
    proc univariate data=one normal;
    class group;
    var diff;
    title 'Check for normality and test for one population mean, Q1';
    run;
    proc ttest data=one;
    class group;
    var diff;
    title 'Independent samples t-test, Q1';
    run;
    proc npar1way data=one wilcoxon;
    class group;
    var diff;
    title 'Nonparametric test for two-mean comparisons, Q1';
    run;

7. Let $X_i$, i = 1, …, n, denote the outcome of a series of n independent trials, where $X_i = 1$ with probability p, and $X_i = 0$ with probability (1 - p). Let $X = \sum_{i=1}^{n} X_i$.
(a) Please derive the 100(1-α)% large-sample confidence interval for p using the pivotal quantity method.
(b) At the significance level α, please derive the large-sample test for $H_0: p = p_0$ versus $H_a: p \neq p_0$, using the pivotal quantity method.
(* Please include the derivation of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.)

Solution:
(a) The population distribution is Bernoulli(p), i.e. $X_i \sim$ Bernoulli(p). Therefore the population mean is p and the population variance is p(1-p). When the sample size n is large, by the central limit theorem, we know that the sample mean follows approximately the normal distribution with its mean being the population mean and its variance being the population variance divided by n, as follows:
$$\hat{p} = \frac{X}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \ \dot\sim\ N\!\left(p, \frac{p(1-p)}{n}\right).$$
Thus it is easily shown that
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \ \dot\sim\ N(0,1)$$
is a pivotal quantity for the inference on p. We can use this pivotal quantity to construct the large-sample confidence interval for p. Alternatively, we can also use the following pivotal quantity
$$Z^* = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} \ \dot\sim\ N(0,1)$$
to construct the large-sample confidence interval as follows.
$$1 - \alpha = P\left(-Z_{\alpha/2} \leq Z^* \leq Z_{\alpha/2}\right) = P\left(-Z_{\alpha/2} \leq \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} \leq Z_{\alpha/2}\right)$$
$$= P\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \leq p \leq \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
Therefore the 100(1-α)% large-sample confidence interval for p is:
$$\left[\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$

(b) From part (a) above, we have shown that
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \ \dot\sim\ N(0,1)$$
is a pivotal quantity for the inference on p. For a 2-sided test of $H_0: p = p_0$ versus $H_a: p \neq p_0$, the test statistic is the pivotal quantity at $p = p_0$, that is,
$$Z_0 = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}.$$
Intuitively, we would reject $H_0$ in favor of $H_a$ if $|Z_0| \geq c$. The problem is how to determine c. By the definition of the significance level, we have
$$\alpha = P(\text{reject } H_0 \mid H_0) = P(|Z_0| \geq c \mid H_0) = 2P(Z_0 \geq c \mid H_0).$$
Thus $\alpha/2 = P(Z_0 \geq c \mid H_0)$ and subsequently we have $c = Z_{\alpha/2}$. That is, at the significance level α, we reject $H_0$ in favor of $H_a$ if $|Z_0| \geq Z_{\alpha/2}$.

8. People at high risk of sudden cardiac death can be identified using the change in a signal-averaged electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The method was modified, hoping to improve its accuracy. The new method was tested on 50 people and gave correct results on 46 patients.
(a) Is this convincing evidence that the new method is more accurate? Please test at α = .05.
(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the new method is better at α = .05?
(c) How many patients should be tested in order for this power to be at least 0.75?

Answer: These are problems 9.7 & 9.8 in our text book.
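As a rough numerical check for part (a) of Problem 8, the large-sample statistic $Z_0$ derived in Problem 7 can be evaluated on these counts. The following is a minimal Python sketch, not the textbook's solution. It assumes the one-sided alternative $H_a: p > 0.8$ and that the normal approximation is the intended method. The function name `proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Large-sample test statistic Z0 = (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.05
z0 = proportion_z(46, 50, 0.80)

# Assumption: one-sided alternative Ha: p > 0.80, so the whole alpha sits in the upper tail
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z0)
reject_h0 = z0 >= z_crit

print(f"Z0 = {z0:.3f}, critical value = {z_crit:.3f}")
print(f"one-sided p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

For the two-sided test derived in Problem 7(b), the same statistic would instead be compared with $Z_{\alpha/2}$ in absolute value.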