AMS572.01 Final Exam Fall, 2010

Name ___________________________ID ________________Signature____________________ AMS Major? ______
Instruction: This is a closed-book exam. Anyone who cheats on the exam shall receive a grade of F. Please enter "Yes" or
"No" for "AMS Major". Please provide complete solutions for full credit. The exam runs from 2:15 to 4:45 pm. Good luck!
1. (for all) The following data set, from a study by the well-known chemist and Nobel Laureate Linus Pauling,
gives the incidence of colds among 279 French skiers who were randomized to the Vitamin C and Placebo
groups.
Group        Cold: Yes   Cold: No
Vitamin C       17          122
Placebo         31          109
(a) Construct a 95% confidence interval for the difference between the two incidence rates.
(b) Please test whether the incidence rate for the Placebo group is significantly higher than that of the Vitamin
C group at the 5% level of significance. Please report the p-value of your test.
(c) Please write up the entire SAS program necessary to answer the question raised in (b), including the data step.
Answer:
(a) Vitamin C: $\hat p_1 = \frac{17}{17+122} = 0.122$, $n_1 = 139$; Placebo: $\hat p_2 = \frac{31}{31+109} = 0.221$, $n_2 = 140$.
The $100(1-\alpha)\%$ confidence interval for $(p_1 - p_2)$ is
$$\left[(\hat p_1 - \hat p_2) - Z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}},\ (\hat p_1 - \hat p_2) + Z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}\right]$$
After plugging in $Z_{0.025} = 1.96$ etc., we find the 95% CI to be [-0.187, -0.011].
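The following minimal Python sketch cross-checks the arithmetic in (a). Python is used only for illustration, since the course itself uses SAS. It evaluates the interval formula above with the counts from the table, using the unpooled standard error and the rounded critical value 1.96, as in the hand calculation.

```python
import math

# Counts from the table above: (cold, no cold)
vc_yes, vc_no = 17, 122      # Vitamin C
pl_yes, pl_no = 31, 109      # Placebo

n1, n2 = vc_yes + vc_no, pl_yes + pl_no     # 139 and 140
p1, p2 = vc_yes / n1, pl_yes / n2           # sample incidence rates

z = 1.96                                    # Z_{0.025} for a 95% interval
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled standard error
lower = (p1 - p2) - z * se
upper = (p1 - p2) + z * se
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Rounded to three decimals, the printed interval should match the one reported above.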
(b) This is Problem 9.12 in our textbook. (*It is also acceptable, in fact better, to use the pooled proportion in the
denominator. *It is also acceptable to perform a 2-sided test.)
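The key only cites the textbook for (b). Below is a minimal sketch of the one-sided two-proportion z-test, using the pooled proportion in the denominator (the variant the note calls better). It assumes $H_0: p_1 = p_2$ versus $H_a: p_1 < p_2$, where $p_1$ is the Vitamin C rate. `statistics.NormalDist` from the Python standard library supplies $\Phi$ and its inverse.

```python
import math
from statistics import NormalDist

x1, n1 = 17, 139   # Vitamin C: colds, group size
x2, n2 = 31, 140   # Placebo: colds, group size
p1, p2 = x1 / n1, x2 / n2

# Pooled proportion under H0: p1 = p2
p_pool = (x1 + x2) / (n1 + n2)
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Ha: p2 > p1 (Placebo incidence higher), so a large z0 favors Ha
z0 = (p2 - p1) / se0
p_value = 1 - NormalDist().cdf(z0)           # one-sided p-value
reject = z0 > NormalDist().inv_cdf(0.95)     # Z_{0.05} is about 1.645
print(f"z0 = {z0:.3f}, one-sided p-value = {p_value:.4f}, reject H0: {reject}")
```

Under these assumptions, $z_0 \approx 2.19$ and the one-sided p-value is about 0.014, so $H_0$ is rejected at the 5% level. With the pooled standard error, $z_0^2$ equals the Pearson chi-square statistic that PROC FREQ reports in (c).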
(c) SAS code:
Data cold;
Input group $ outcome $ count;
Datalines;
VC yes 17
VC no 122
Placebo yes 31
Placebo no 109
;
Run;
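Proc freq data=cold;
Tables group*outcome/chisq;
Weight count;
Run;
/* CHISQ prints the 2-sided Pearson chi-square test. Since the Placebo incidence is the
   higher one, the 1-sided p-value for (b) is half of the reported p-value. */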
2. (for all) People at high risk of sudden cardiac death can be identified using the change in a signal-averaged
electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The
method was modified in the hope of improving its accuracy. The new method was tested on 50 people and gave
correct results for 46 of them.
(a) Is this convincing evidence that the new method is more accurate? Please test at α = 0.05.
(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the
new method is better at α = 0.05?
(c) How many patients should be tested in order for this power to be at least 0.75?
Answer: These are Problems 9.7 and 9.8 in our textbook.
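The key only cites the textbook here. The sketch below covers all three parts and assumes the usual large-sample z-test for a single proportion. That choice is ours; the textbook's worked solution is not reproduced here. The sample-size formula in (c) is the standard normal-approximation formula for a one-sided test.

```python
import math
from statistics import NormalDist

Z = NormalDist()               # standard normal: cdf() and inv_cdf()
p0, n, x = 0.80, 50, 46        # current accuracy, people tested, correct results
p1 = 0.90                      # accuracy assumed in (b) and (c)
z_alpha = Z.inv_cdf(0.95)      # one-sided critical value at alpha = 0.05 (about 1.645)

# (a) H0: p = 0.8 vs Ha: p > 0.8, large-sample z statistic
p_hat = x / n
z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - Z.cdf(z0)

# (b) Reject when p_hat exceeds the cutoff; power = P(p_hat > cutoff | p = 0.9)
cutoff = p0 + z_alpha * math.sqrt(p0 * (1 - p0) / n)
power = 1 - Z.cdf((cutoff - p1) / math.sqrt(p1 * (1 - p1) / n))

# (c) Smallest n with power >= 0.75, from the normal-approximation formula
z_beta = Z.inv_cdf(0.75)
n_req = math.ceil(((z_alpha * math.sqrt(p0 * (1 - p0))
                    + z_beta * math.sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2)

print(f"(a) z0 = {z0:.3f}, p-value = {p_value:.4f}")
print(f"(b) power at p = 0.9: {power:.3f}")
print(f"(c) required n: {n_req}")
```

Under this approximation, (a) rejects $H_0$ ($z_0 \approx 2.12$, p-value about 0.017), the power in (b) is about 0.57, and (c) requires n = 75.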
3. (for all) A classic tale involves four car-pooling students who missed a test and gave a flat tire as their
excuse. On the make-up test, the professor asked the students to identify the particular tire that went flat. If they
really did not have a flat tire, would they be able to identify the same tire? To mimic this situation, 40 other
students were asked to identify the tire they would select. The data are:
Tire         Left front   Right front   Left rear   Right rear
Frequency        11           15            8           6
(a) At α = 0.05, please test whether each tire has the same chance of being selected.
(b) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step.
Answer: This problem is from our Lecture Notes 12.
(a) $H_0: p_1 = p_2 = p_3 = p_4 = \frac{1}{4}$ versus $H_a$: $H_0$ is not true.
n = 40, $e_i = n p_i = 10$.
$W_0 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i} = \frac{1 + 25 + 4 + 16}{10} = 4.6 < \chi^2_{3,0.05,\text{upper}} = 7.81$
$\Rightarrow$ Fail to reject $H_0$.
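The goodness-of-fit statistic can be recomputed in a few lines. As an addition to the key, the sketch below also gives the p-value. For 3 degrees of freedom the chi-square upper tail has the closed form $P(\chi^2_3 > w) = 2[1 - \Phi(\sqrt{w})] + \sqrt{2w/\pi}\,e^{-w/2}$, so only the standard library is needed.

```python
import math
from statistics import NormalDist

observed = [11, 15, 8, 6]          # LF, RF, LR, RR
n = sum(observed)                  # 40 students
expected = [n / 4] * 4             # e_i = n * p_i = 10 under H0: p_i = 1/4

w0 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
df = len(observed) - 1             # k - 1 = 3

# Upper-tail chi-square probability for df = 3 (closed form):
# P(X > w) = 2 * (1 - Phi(sqrt(w))) + sqrt(2 * w / pi) * exp(-w / 2)
p_value = (2 * (1 - NormalDist().cdf(math.sqrt(w0)))
           + math.sqrt(2 * w0 / math.pi) * math.exp(-w0 / 2))

critical = 7.81                    # chi-square_{3, 0.05, upper}, as quoted above
reject = w0 > critical
print(f"W0 = {w0:.2f} on {df} df, p-value = {p_value:.3f}, reject H0: {reject}")
```

The p-value comes out near 0.20, which is consistent with failing to reject $H_0$.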
(b)
DATA TIRE;
INPUT location $ NUMBER;
DATALINES;
LF 11
RF 15
LR 8
RR 6
;
* HYPOTHESIZING A 1:1:1:1 RATIO;
PROC FREQ DATA=TIRE ORDER=DATA; WEIGHT NUMBER;
TITLE3 'GOODNESS OF FIT ANALYSIS';
TABLES location / CHISQ NOCUM TESTP=(0.25 0.25 0.25 0.25);
RUN;
4. (for all) The effect of caffeine levels on performing a simple finger tapping task was investigated in a
double-blind study. Thirty male college students were trained in finger tapping and randomly assigned to
receive one of three caffeine doses (0, 100, or 200 mg), with 10 students per dose group. Two hours
after the caffeine treatment, the students were asked to finger tap, and the number of taps per minute was
counted. The data are tabulated below.
Caffeine Dose   Finger Taps per Minute
0 mg            242 245 244 248 247 248 242 244 246 242
100 mg          248 246 245 247 248 250 247 246 243 244
200 mg          246 248 250 252 248 250 246 248 245 250
(a) Construct an ANOVA table and test whether there are significant differences in finger tapping among the groups
at α = 0.05.
(b) Compare the finger tapping speed between the 0 mg and the 200 mg groups at α = 0.05. List the necessary
assumptions, and perform tests of those assumptions that can be tested in an exam setting.
(c) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step.
(d) Please write up the entire SAS program necessary to answer the question raised in (b), including the data step
and the tests of all necessary assumptions.
Answer:
(a) This is Problem 12.2(b) in our textbook: a one-way ANOVA. We test whether the mean tapping speeds
in the three groups are equal, that is: $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_a$: not all three means are equal.
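The key stops at the hypotheses for (a). Below is a minimal Python sketch that builds the one-way ANOVA table by hand from the data above. The critical value 3.35 for $F_{2,27,0.05}$ comes from a standard F table, not from the key.

```python
from statistics import mean

taps = {
    "0 mg":   [242, 245, 244, 248, 247, 248, 242, 244, 246, 242],
    "100 mg": [248, 246, 245, 247, 248, 250, 247, 246, 243, 244],
    "200 mg": [246, 248, 250, 252, 248, 250, 246, 248, 245, 250],
}
groups = list(taps.values())
k = len(groups)                                   # 3 dose groups
N = sum(len(g) for g in groups)                   # 30 students
grand = mean(x for g in groups for x in g)        # grand mean

ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)    # between-group SS
ssw = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)  # within-group SS
msb, msw = ssb / (k - 1), ssw / (N - k)
f0 = msb / msw
critical = 3.35                                   # F_{2,27,0.05} from an F table

print("Source   df      SS      MS      F")
print(f"Dose     {k - 1:2d} {ssb:7.2f} {msb:7.2f} {f0:6.2f}")
print(f"Error    {N - k:2d} {ssw:7.2f} {msw:7.2f}")
print(f"Total    {N - 1:2d} {ssb + ssw:7.2f}")
print(f"Reject H0 at alpha = 0.05: {f0 > critical}")
```

The F value should agree with the one PROC ANOVA prints for the program in (c).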
(b) This is inference on two population means with independent samples. The first assumption is that both
populations are normal. The second is the equal-variance assumption, which we can test in the exam setting
as follows.
Group 1 (dose 0 mg): $\bar X_1 = 244.8$, $s_1^2 = 5.73$, $n_1 = 10$
Group 2 (dose 200 mg): $\bar X_2 = 248.3$, $s_2^2 = 4.9$, $n_2 = 10$
Under the normality assumption, we first test whether the two population variances are equal, that is, $H_0: \sigma_1^2 = \sigma_2^2$ versus
$H_a: \sigma_1^2 \ne \sigma_2^2$. The test statistic is
$F_0 = \frac{s_1^2}{s_2^2} = \frac{5.73}{4.9} = 1.17$, and $F_{9,9,0.025,U} = 4.03$.
Since $1/4.03 = 0.25 < F_0 < 4.03$, we cannot reject $H_0$. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
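Next we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \ne 0$:
$$t_0 = \frac{\bar X_1 - \bar X_2 - 0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{244.8 - 248.3 - 0}{\sqrt{5.315}\sqrt{\frac{1}{10} + \frac{1}{10}}} = -3.39,$$
where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = 5.315$ is the pooled sample variance.
Since $t_0 = -3.39$ is smaller than $-t_{18,0.025} = -2.10092$, we reject $H_0$ and conclude that the finger tapping speeds are
significantly different between the two groups at the significance level of 0.05.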
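The hand calculations in (b) can be reproduced from the raw data with the standard library's `statistics.mean` and `statistics.variance` (sample variance with divisor n - 1). The critical value 2.10092 is the one quoted above. This is an illustrative sketch, not part of the original key. With unrounded variances, $s_p^2$ is about 5.317 rather than 5.315, and $t_0$ still rounds to -3.39.

```python
import math
from statistics import mean, variance

dose0 = [242, 245, 244, 248, 247, 248, 242, 244, 246, 242]
dose200 = [246, 248, 250, 252, 248, 250, 246, 248, 245, 250]
n1, n2 = len(dose0), len(dose200)
s1sq, s2sq = variance(dose0), variance(dose200)   # sample variances (n - 1 divisor)

# Equal-variance F statistic, larger variance in the numerator
f0 = max(s1sq, s2sq) / min(s1sq, s2sq)

# Pooled-variance t statistic for H0: mu1 - mu2 = 0
sp2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
t0 = (mean(dose0) - mean(dose200)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = 2.10092                                  # t_{18, 0.025}, as quoted above
print(f"s1^2 = {s1sq:.2f}, s2^2 = {s2sq:.2f}, F0 = {f0:.2f}")
print(f"sp^2 = {sp2:.3f}, t0 = {t0:.2f}, reject H0: {abs(t0) > t_crit}")
```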
(c)
data finger;
input group taps @@;  /* group: 0 = 0 mg, 1 = 100 mg, 2 = 200 mg */
datalines;
0 242 0 245 0 244 0 248 0 247 0 248 0 242 0 244 0 246 0 242
1 248 1 246 1 245 1 247 1 248 1 250 1 247 1 246 1 243 1 244
2 246 2 248 2 250 2 252 2 248 2 250 2 246 2 248 2 245 2 250
;
run;
proc anova data = finger;
class group;
model taps = group;
means group/tukey;
run;
/* The MEANS statement (Tukey comparisons) is not necessary for the given problem. */
(d)
data finger2;
set finger;
where group ne 1;
run;
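/* Normality check: the NORMAL option prints tests for normality (e.g., Shapiro-Wilk) for each group */
proc univariate data = finger2 normal;
class group;
var taps;
run;
/* Pooled and Satterthwaite t-tests, plus the folded F test for equal variances */
proc ttest data = finger2;
class group;
var taps;
run;
/* Wilcoxon rank-sum test: a nonparametric alternative if normality is doubtful */
proc npar1way data = finger2 wilcoxon;
class group;
var taps;
run;
/* The data step in part (d) must run after the one in part (c), since it reads data set FINGER. */
/* Alternatively, save FINGER as a permanent SAS data set and use it later. */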
5A. (for AMS majors) Suppose we have two independent random samples from two normal populations:
$X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$, and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$.
(a) At the significance level α, please construct a test using the pivotal quantity approach to test whether
$\mu_1 = 2\mu_2$ or not. (*Please include the derivation of the pivotal quantity, the proof of its distribution, and the
derivation of the rejection region for full credit.)
(b) At the significance level α, please derive the likelihood ratio test for testing whether $\mu_1 = 2\mu_2$ or not.
Subsequently, please show whether this test is equivalent to the one derived in part (a).
Answer:
(a) Here is a brief outline of the derivation of the test of $H_0: \mu_1 - 2\mu_2 = 0$ versus $H_a: \mu_1 - 2\mu_2 \ne 0$ using the pivotal
quantity approach.
[1]. We start with the point estimator for the parameter of interest $(\mu_1 - 2\mu_2)$: $\bar X - 2\bar Y$. Its distribution is
$N\left(\mu_1 - 2\mu_2,\ \sigma^2(1/n_1 + 4/n_2)\right)$, which follows from the mgf of $N(\mu, \sigma^2)$, $M(t) = \exp(\mu t + \sigma^2 t^2/2)$, and the independence
of the random samples. From this we have $Z = \frac{(\bar X - 2\bar Y) - (\mu_1 - 2\mu_2)}{\sigma\sqrt{1/n_1 + 4/n_2}} \sim N(0,1)$. Unfortunately, Z cannot
serve as the pivotal quantity because σ is unknown.
[2]. We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the pooled-variance
t-statistic. We find that $W = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}$, using the mgf of $\chi^2_k$, $M(t) = \left(\frac{1}{1-2t}\right)^{k/2}$,
and the independence of the random samples.
[3]. Then, from the theorem on sampling from a normal population and the independence of the
random samples, Z and W are independent. Therefore, by the definition of the t-distribution, $T = Z/\sqrt{W/(n_1+n_2-2)}$ gives our
pivotal quantity:
$$T = \frac{(\bar X - 2\bar Y) - (\mu_1 - 2\mu_2)}{S_p\sqrt{1/n_1 + 4/n_2}} \sim t_{n_1+n_2-2}, \quad \text{where } S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$$
is the pooled sample variance.
[4]. The rejection region is derived from $P(|T_0| \ge c \mid H_0) = \alpha$, where $T_0 = \frac{(\bar X - 2\bar Y) - 0}{S_p\sqrt{1/n_1 + 4/n_2}} \overset{H_0}{\sim} t_{n_1+n_2-2}$. Thus
$c = t_{n_1+n_2-2,\alpha/2}$. Therefore, at the significance level α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \ge t_{n_1+n_2-2,\alpha/2}$.
(b) We have two independent random samples from two normal populations with equal but unknown
variances. We now derive the likelihood ratio test for:
$H_0: \mu_1 = 2\mu_2$ versus $H_a: \mu_1 \ne 2\mu_2$.
Let $\mu_2 = \mu$; then
$\omega = \{-\infty < \mu_1 = 2\mu,\ \mu_2 = \mu < +\infty,\ 0 < \sigma^2 < +\infty\}$, $\Omega = \{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}$.
Under ω there are two parameters:
$$L(\omega) = L(\mu, \sigma^2) = (2\pi\sigma^2)^{-\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - 2\mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right]$$
$$\ln L(\omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - 2\mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)$$
Taking the partial derivatives with respect to μ and σ² and setting them equal to 0, we have:
$$\hat\mu = \frac{2n_1\bar x + n_2\bar y}{4n_1 + n_2}, \qquad \hat\sigma^2_\omega = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - 2\hat\mu)^2 + \sum_{j=1}^{n_2}(y_j - \hat\mu)^2\right]$$
Under Ω there are three parameters:
$$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = (2\pi\sigma^2)^{-\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right]$$
$$\ln L(\Omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)$$
Taking the partial derivatives with respect to μ1, μ2 and σ² and setting them all equal to 0, we have:
$$\hat\mu_1 = \bar x, \qquad \hat\mu_2 = \bar y, \qquad \hat\sigma^2_\Omega = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar x)^2 + \sum_{j=1}^{n_2}(y_j - \bar y)^2\right]$$
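At this point all of the parameters have been estimated. At the MLEs, the exponent in both $L(\hat\omega)$ and $L(\hat\Omega)$ reduces to $-\frac{n_1+n_2}{2}$.
After these cancellations and simplifications, we have:
$$\lambda = \frac{L(\hat\omega)}{L(\hat\Omega)} = \left(\frac{\hat\sigma^2_\Omega}{\hat\sigma^2_\omega}\right)^{\frac{n_1+n_2}{2}} = \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar x)^2 + \sum_{j=1}^{n_2}(y_j - \bar y)^2}{\sum_{i=1}^{n_1}\left(x_i - 2\frac{2n_1\bar x + n_2\bar y}{4n_1 + n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{2n_1\bar x + n_2\bar y}{4n_1 + n_2}\right)^2}\right]^{\frac{n_1+n_2}{2}} = \left[1 + \frac{t_0^2}{n_1+n_2-2}\right]^{-\frac{n_1+n_2}{2}}$$
where $t_0 = \frac{\bar x - 2\bar y}{s_p\sqrt{1/n_1 + 4/n_2}}$ is the observed value of the pooled-variance t statistic $T_0$ from part (a). The last equality holds because the
denominator equals the numerator plus $\frac{n_1 n_2(\bar x - 2\bar y)^2}{4n_1 + n_2} = t_0^2 s_p^2$. Since λ is a decreasing function of $|t_0|$, $\lambda \le \lambda^*$ is equivalent to $|t_0| \ge c$.
Thus at the significance level α, we reject the null hypothesis in favor of the alternative when $|t_0| \ge c = t_{n_1+n_2-2,\alpha/2}$. This
shows that the pivotal quantity approach and the likelihood ratio test approach are equivalent in this case.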
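The final identity can be checked numerically. The sketch below uses two small hypothetical samples, made up for illustration and not taken from the exam. It computes λ directly from the maximized log-likelihoods under ω and Ω, then compares the result with $[1 + t_0^2/(n_1+n_2-2)]^{-(n_1+n_2)/2}$.

```python
import math
from statistics import mean

# Hypothetical samples for illustration only (not data from the exam)
x = [10.2, 9.1, 11.4, 10.8, 9.7]
y = [4.3, 3.6, 4.9, 3.8]
n1, n2 = len(x), len(y)
N = n1 + n2
xbar, ybar = mean(x), mean(y)

def max_loglik(rss):
    """Normal log-likelihood maximized over sigma^2, given the residual sum of squares."""
    s2 = rss / N  # MLE of sigma^2
    return -N / 2 * math.log(2 * math.pi * s2) - rss / (2 * s2)

# Restricted fit under omega: mu1 = 2*mu, mu2 = mu
mu_hat = (2 * n1 * xbar + n2 * ybar) / (4 * n1 + n2)
rss_omega = sum((xi - 2 * mu_hat) ** 2 for xi in x) + sum((yj - mu_hat) ** 2 for yj in y)

# Unrestricted fit under Omega: mu1 = xbar, mu2 = ybar
rss_Omega = sum((xi - xbar) ** 2 for xi in x) + sum((yj - ybar) ** 2 for yj in y)

lam = math.exp(max_loglik(rss_omega) - max_loglik(rss_Omega))

# Pivotal statistic T0 from part (a), evaluated at the observed data
sp = math.sqrt(rss_Omega / (N - 2))
t0 = (xbar - 2 * ybar) / (sp * math.sqrt(1 / n1 + 4 / n2))
lam_closed = (1 + t0 ** 2 / (N - 2)) ** (-N / 2)

print(f"t0 = {t0:.4f}, lambda = {lam:.6g}, closed form = {lam_closed:.6g}")
```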
5B. (for non-AMS majors) We have two independent samples $X_1, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and
$Y_1, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$, where $\sigma_1^2 = \sigma_2^2 = \sigma^2$ and $n_1 = n_2 = n$. Consider the hypotheses
$H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 = \delta > 0$.
(a) Please derive the general formula for power calculation for the pooled variance t-test based on an effect
size of EFF at the significance level of α.

Recall (definition): Effect size = $\text{Eff} = |\delta/\sigma|$ (e.g., Eff = 1).

(b) With a sample size of 20 per group, α = 0.05, and an estimated effect size ranging from 0.8 to 1.2, please
calculate the power of your pooled variance t-test.
Answer:
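(a) Test statistic: $T_0 = \frac{(\bar X - \bar Y) - 0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\bar X - \bar Y}{S_p\sqrt{\frac{2}{n}}} \overset{H_0}{\sim} t_{2n-2}$
At the significance level α, reject $H_0$ in favor of $H_a$ iff $T_0 \ge t_{2n-2,\alpha}$.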
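$$\begin{aligned}
\text{Power} = 1 - \beta &= P(\text{reject } H_0 \mid H_a) = P\left(T_0 \ge t_{2n-2,\alpha} \mid H_a: \mu_1 - \mu_2 = \delta > 0\right) \\
&= P\left(\frac{\bar X - \bar Y}{S_p\sqrt{2/n}} \ge t_{2n-2,\alpha} \,\middle|\, H_a: \mu_1 - \mu_2 = \delta\right) \\
&= P\left(\frac{\bar X - \bar Y - \delta}{S_p\sqrt{2/n}} \ge t_{2n-2,\alpha} - \frac{\delta}{S_p\sqrt{2/n}} \,\middle|\, H_a: \mu_1 - \mu_2 = \delta\right) \\
&\approx P\left(T \ge t_{2n-2,\alpha} - \text{Eff}\cdot\sqrt{n/2}\right), \qquad \text{Effect size } \text{Eff} \approx \delta/S_p
\end{aligned}$$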
(b) With n = 20, α = 0.05, and Eff = 0.8 to 1.2, the power is calculated as follows:
Power(Eff = 0.8) = $P\left(T \ge t_{38,0.05} - 0.8\sqrt{20/2} \mid H_a: \mu_1 - \mu_2 = \delta\right) = P(T \ge 1.686 - 2.530) = P(T \ge -0.844) \approx 0.80$
Power(Eff = 1.2) = $P\left(T \ge t_{38,0.05} - 1.2\sqrt{20/2} \mid H_a: \mu_1 - \mu_2 = \delta\right) = P(T \ge 1.686 - 3.795) = P(T \ge -2.109) \approx 0.98$
Note: the T statistic above follows a t-distribution with 38 (= 20 + 20 - 2) degrees of freedom.
Therefore we conclude that the power ranges from about 80% to 98% for effect sizes between 0.8 and 1.2.
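The two power values can be checked with the same approximation, $P(T \ge t_{2n-2,\alpha} - \text{Eff}\sqrt{n/2})$ with $T \sim t_{2n-2}$. Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule and inverts the CDF by bisection. The helpers `t_cdf`, `t_ppf` and `approx_power` are our own and are not a library API. Like the key, the sketch uses this central-t approximation rather than the exact noncentral-t power.

```python
import math

def t_cdf(x, df, steps=1000):
    """P(T <= x) for T ~ t_df: Simpson's rule on [0, |x|] plus symmetry about 0."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    area = s * h / 3                   # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Quantile x with P(T <= x) = p, for 0.5 < p < 1, by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def approx_power(eff, n, alpha=0.05):
    """Power ~ P(T >= t_{2n-2, alpha} - Eff * sqrt(n / 2)) with T ~ t_{2n-2}."""
    df = 2 * n - 2
    shift = t_ppf(1 - alpha, df) - eff * math.sqrt(n / 2)
    return 1 - t_cdf(shift, df)

for eff in (0.8, 1.0, 1.2):
    print(f"Eff = {eff}: power ~ {approx_power(eff, 20):.3f}")
```

The same helpers reproduce the critical values used elsewhere in the key, e.g. `t_ppf(0.975, 18)` is close to 2.101.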
6. (extra credit for all students) Suppose we have two independent random samples from two normal
populations, i.e., $Y_{11}, Y_{12}, \ldots, Y_{1,n_1} \sim N(\mu_1, \sigma_1^2)$, and $Y_{21}, Y_{22}, \ldots, Y_{2,n_2} \sim N(\mu_2, \sigma_2^2)$. Furthermore, suppose
$\sigma_1^2 = \sigma_2^2$, although their values are unknown. Please prove whether the one-way ANOVA F-test is
equivalent to the pooled-variance t-test (2-sided) or not.
Answer:
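The key leaves this answer blank. What follows is a sketch of the standard argument; it is our reconstruction, not the instructor's solution. With two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic $t_0 = (\bar Y_1 - \bar Y_2)/(S_p\sqrt{1/n_1 + 1/n_2})$. The critical values also correspond, so the F-test and the two-sided t-test reject for exactly the same samples.

```latex
% Two groups, N = n_1 + n_2, grand mean \bar Y = (n_1 \bar Y_1 + n_2 \bar Y_2)/N
\begin{align*}
\bar Y_1 - \bar Y &= \frac{n_2(\bar Y_1 - \bar Y_2)}{N}, \qquad
\bar Y_2 - \bar Y = -\frac{n_1(\bar Y_1 - \bar Y_2)}{N} \\
\mathrm{SSB} &= n_1(\bar Y_1 - \bar Y)^2 + n_2(\bar Y_2 - \bar Y)^2
  = \frac{n_1 n_2}{N}(\bar Y_1 - \bar Y_2)^2
  = \frac{(\bar Y_1 - \bar Y_2)^2}{1/n_1 + 1/n_2} \\
\mathrm{MSB} &= \frac{\mathrm{SSB}}{2 - 1}, \qquad
\mathrm{MSW} = \frac{\mathrm{SSW}}{N - 2}
  = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = S_p^2 \\
F_0 &= \frac{\mathrm{MSB}}{\mathrm{MSW}}
  = \frac{(\bar Y_1 - \bar Y_2)^2}{S_p^2\,(1/n_1 + 1/n_2)} = t_0^2 \\
T \sim t_{N-2} &\implies T^2 \sim F_{1,N-2}
  \implies F_{1,N-2,\alpha} = t_{N-2,\alpha/2}^2 \\
F_0 \ge F_{1,N-2,\alpha} &\iff |t_0| \ge t_{N-2,\alpha/2}
\end{align*}
```

For example, for the 0 mg versus 200 mg data in Problem 4, a two-group ANOVA gives $F_0 \approx 11.52$, which equals $t_0^2 = (-3.39)^2$ up to rounding.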
That’s all, class; I wish you a very happy holiday season and winter vacation!