AMS572.01 Practice Midterm Exam Fall, 2013

Name _______________________________ID _________________________Signature_________________________
Instructions: This is a closed-book exam. Anyone who cheats on the exam will receive a grade of F. Please provide
complete solutions for full credit. ***The real midterm will have 3 problems. More problems are provided here so that
you can see more types of problems.***
1. The effect of caffeine levels on performing a simple finger tapping task was investigated in a double blind
study. Twenty male college students were trained in finger tapping and randomly assigned to receive two
different doses of caffeine (0 or 100 mg) with 10 students per dose group. Two hours following the caffeine
treatment, students were asked to finger tap and the numbers of taps per minute were counted. The data are
tabulated below.
Caffeine dose   Finger taps per minute
0 mg            242 245 244 248 247 248 242 244 246 242
100 mg          248 246 245 247 248 250 247 246 243 244
(a) Compare the finger tapping speed between the two groups at α =.05. List assumptions necessary – and,
please perform tests for the assumptions that you can test in an exam setting.
(b) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step
and the tests for all necessary assumptions.
Answer:
(a) This is inference on two population means with independent samples. The first assumption is that both
populations are normal. The second is the equal-variance assumption, which we can test in the exam setting
as follows.
Group 1 (dose 0 mg): $\bar{X}_1 = 244.8$, $s_1^2 = 5.73$, $n_1 = 10$
Group 2 (dose 100 mg): $\bar{X}_2 = 246.4$, $s_2^2 = 4.27$, $n_2 = 10$
Under the normality assumption, we first test whether the two population variances are equal, that is, $H_0: \sigma_1^2 = \sigma_2^2$ versus
$H_a: \sigma_1^2 \neq \sigma_2^2$. The test statistic is
$$F_0 = \frac{s_1^2}{s_2^2} = \frac{5.73}{4.27} \approx 1.34, \qquad F_{9,9,0.05,U} = 3.18.$$
Since $F_0 < 3.18$, we cannot reject $H_0$. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
Next we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$. The pooled variance is $s_p^2 = \frac{9(5.73) + 9(4.27)}{18} = 5$, so
$$t_0 = \frac{\bar{X}_1 - \bar{X}_2 - 0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(244.8 - 246.4) - 0}{\sqrt{5}\sqrt{\frac{1}{10} + \frac{1}{10}}} = -1.6.$$
Since $t_0 = -1.6$ is NOT smaller than $-t_{18,0.025} = -2.10092$ (equivalently, $|t_0| < 2.10092$), we can NOT reject $H_0$, and thus we conclude that the
finger tapping speeds are NOT significantly different between the two groups at the significance level of 0.05.
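The hand calculation for Problem 1(a) can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical values still have to be read from the F- and t-tables.

```python
from math import sqrt
from statistics import mean, variance

# Data from the table in Problem 1
taps_0mg = [242, 245, 244, 248, 247, 248, 242, 244, 246, 242]
taps_100mg = [248, 246, 245, 247, 248, 250, 247, 246, 243, 244]

n1, n2 = len(taps_0mg), len(taps_100mg)
xbar1, xbar2 = mean(taps_0mg), mean(taps_100mg)
var1, var2 = variance(taps_0mg), variance(taps_100mg)  # sample variances (n - 1 divisor)

# F statistic for the equal-variance check
f0 = var1 / var2

# Pooled variance and pooled-variance t statistic
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t0 = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"means: {xbar1:.1f}, {xbar2:.1f}; variances: {var1:.2f}, {var2:.2f}")
print(f"F0 = {f0:.2f}, sp^2 = {sp2:.2f}, t0 = {t0:.2f}")
```

The printed statistics should agree with the values used in the solution; compare them with $F_{9,9,0.05,U} = 3.18$ and $t_{18,0.025} = 2.10092$ as above.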
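(b) SAS program:
data finger;
input group taps @@;   /* group: 0 = 0 mg dose, 1 = 100 mg dose */
datalines;
0 242 0 245 0 244 0 248 0 247 0 248 0 242 0 244 0 246 0 242
1 248 1 246 1 245 1 247 1 248 1 250 1 247 1 246 1 243 1 244
;
run;
/* Normality check within each dose group */
proc univariate data = finger normal;
class group;
var taps;
run;
/* Two-sample t-tests and the test for equality of variances */
proc ttest data = finger;
class group;
var taps;
run;
/* Nonparametric alternative in case normality is doubtful */
proc npar1way data = finger;
class group;
var taps;
run;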
2. Suppose we have two independent random samples from two normal populations:
$X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$.
(a) At the significance level α, please construct a test of the hypothesis $H_0: \sigma_1^2 - 3\sigma_2^2 = 0$ versus $H_a: \sigma_1^2 - 3\sigma_2^2 \neq 0$.
(b) Suppose we have confirmed that $\sigma_1^2 - 3\sigma_2^2 = 0$. At the significance level α, please construct a test of
$H_0: 3\mu_1 - 2\mu_2 - 4 = 0$ versus $H_a: 3\mu_1 - 2\mu_2 - 4 \neq 0$ using the pivotal quantity method. Please include the
derivation of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.
Answer: Recall we had a more general setting of this problem, see below.
All we need to do is to plug in a = 1, b = 3 for part (a), and c = 3, d = 4, e =2 for part (b).
General setting of the problem: Suppose we have two independent random samples from two normal populations, i.e.,
$X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$.
(a). At the significance level α, please construct a test of the hypothesis $H_0: a\sigma_1^2 = b\sigma_2^2$ vs. $H_1: a\sigma_1^2 \neq b\sigma_2^2$. Here $a, b$
are known constants.
(b). Suppose we have confirmed that $a\sigma_1^2 = b\sigma_2^2$. At the significance level α, please construct a test of whether
$c\mu_1 = d + e\mu_2$ or not using the pivotal quantity method. Here $c, d, e$ are known constants. Please include the derivation
of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.
SOLUTION:
This is inference on two normal population means, independent samples.
(a) This is the usual F-test on two normal population variances: $H_0: \sigma_1^2/\sigma_2^2 = b/a$ versus $H_a: \sigma_1^2/\sigma_2^2 \neq b/a$.
The test statistic is
$$F_0 = \frac{S_1^2/S_2^2}{\sigma_{1,0}^2/\sigma_{2,0}^2} = \frac{S_1^2/S_2^2}{b/a} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1}.$$
At the significance level α, we will reject $H_0$ if $F_0$ is smaller than $F_{n_1-1,\,n_2-1,\,\alpha/2,\,L}$ or $F_0$ is greater than $F_{n_1-1,\,n_2-1,\,\alpha/2,\,U}$.
(b) Given that $a\sigma_1^2 = b\sigma_2^2$, we set $\sigma_2^2 = \sigma^2$ and thus $\sigma_1^2 = \frac{b}{a}\sigma^2$. Here is a simple outline of the derivation of the test:
$H_0: c\mu_1 = d + e\mu_2$ versus $H_a: c\mu_1 \neq d + e\mu_2$, which are equivalent to $H_0: c\mu_1 - e\mu_2 = d$ versus
$H_a: c\mu_1 - e\mu_2 \neq d$.
(1) We start with the point estimator for the parameter of interest $(c\mu_1 - e\mu_2)$: $c\bar{X} - e\bar{Y}$. Its distribution is
$N\left(c\mu_1 - e\mu_2,\ \sigma^2\left[\frac{c^2 b}{a n_1} + \frac{e^2}{n_2}\right]\right)$, using the mgf for $N(\mu, \sigma^2)$, which is $M(t) = \exp\left(\mu t + \sigma^2 t^2/2\right)$, and
the independence properties of the random samples. From this we have
$$Z = \frac{(c\bar{X} - e\bar{Y}) - (c\mu_1 - e\mu_2)}{\sigma\sqrt{c^2 b/(a n_1) + e^2/n_2}} \sim N(0, 1).$$
Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.
(2) We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the pooled-variance
t-statistic. We find that
$$W = \left[\frac{a}{b}(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\right] \Big/ \sigma^2 \sim \chi^2_{n_1+n_2-2},$$
using the mgf for $\chi^2_k$, which is $M(t) = \left(\frac{1}{1 - 2t}\right)^{k/2}$, and the independence properties of the random samples.
(3) Then we find, from the theorem of sampling from the normal population and the independence properties of the
random samples, that Z and W are independent. Therefore, by the definition of the t-distribution, we have
obtained our pivotal quantity:
$$T = \frac{Z}{\sqrt{W/(n_1 + n_2 - 2)}} = \frac{(c\bar{X} - e\bar{Y}) - (c\mu_1 - e\mu_2)}{\sqrt{\dfrac{\frac{a}{b}(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{c^2 b/(a n_1) + e^2/n_2}} \sim t_{n_1+n_2-2}.$$
(4) The rejection region is derived from $P(|T_0| \geq k \mid H_0) = \alpha$, where $k$ denotes the critical value (written $k$ here to avoid confusion with the constant $c$) and
$$T_0 = \frac{(c\bar{X} - e\bar{Y}) - d}{\sqrt{\dfrac{\frac{a}{b}(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{c^2 b/(a n_1) + e^2/n_2}} \overset{H_0}{\sim} t_{n_1+n_2-2}.$$
Thus $k = t_{n_1+n_2-2,\,\alpha/2}$. Therefore, at the
significance level of α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{n_1+n_2-2,\,\alpha/2}$.
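To make the test statistic concrete, here is a minimal Python sketch of $T_0$ from the general setting. The function name `pivotal_t0` and its argument names are illustrative assumptions, not part of the course material. As a sanity check, setting a = b = 1, c = e = 1, d = 0 should reduce it to the ordinary pooled-variance t-statistic, so the summary statistics of Problem 1 are reused for that check.

```python
from math import sqrt

def pivotal_t0(xbar, ybar, s1_sq, s2_sq, n1, n2, a, b, c, d, e):
    """T0 for H0: c*mu1 = d + e*mu2, assuming a*sigma1^2 = b*sigma2^2.

    Returns the statistic and its degrees of freedom (n1 + n2 - 2).
    """
    df = n1 + n2 - 2
    # Estimate of sigma^2 (= sigma2^2), with S1^2 rescaled by a/b
    sigma_sq_hat = ((a / b) * (n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    # Var(c*Xbar - e*Ybar) / sigma^2
    scale = c**2 * b / (a * n1) + e**2 / n2
    t0 = ((c * xbar - e * ybar) - d) / sqrt(sigma_sq_hat * scale)
    return t0, df

# Sanity check: a = b = 1, c = e = 1, d = 0 is the usual pooled t-test.
# Summary statistics are those of Problem 1 (variances 51.6/9 and 38.4/9).
t0, df = pivotal_t0(244.8, 246.4, 51.6 / 9, 38.4 / 9, 10, 10,
                    a=1, b=1, c=1, d=0, e=1)
print(f"T0 = {t0:.2f} on {df} degrees of freedom")
```

For Problem 2 itself one would call the function with a=1, b=3, c=3, d=4, e=2 and the observed sample summaries, then compare $|T_0|$ with $t_{n_1+n_2-2,\,\alpha/2}$ from the t-table.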
3. We have two independent samples $X_1, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and $Y_1, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$, where
$\sigma_1^2 = \sigma_2^2 = \sigma^2$ and $n_1 = 2n_2$. Consider the hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 = \Delta > 0$.
(a) Please derive the general formula for power calculation for the pooled variance t-test based on an effect
size of EFF at the significance level of α.

Recall the definition: Effect size = EFF = $|\Delta|/\sigma$ (e.g., EFF = 1).

(b) With a sample size of 40 in group 1 and 20 in group 2, α = 0.05, and an estimated effect size ranging from
0.8 to 1.2, please calculate the power of your pooled variance t-test.
Answer:
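(a) Let $n_2 = n$, thus $n_1 = 2n_2 = 2n$.
The test statistic is
$$T_0 = \frac{(\bar{X} - \bar{Y}) - 0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{3}{2n}}} \overset{H_0}{\sim} t_{3n-2}.$$
At the significance level α (for example, α = 0.05), reject $H_0$ in favor of $H_a$ iff $T_0 \geq t_{3n-2,\,\alpha}$.
Power = 1 − β = P(reject $H_0$ | $H_a$) = $P(T_0 \geq t_{3n-2,\,\alpha} \mid H_a: \mu_1 - \mu_2 = \Delta > 0)$
$$= P\left(\frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{3}{2n}}} \geq t_{3n-2,\,\alpha} \,\middle|\, H_a: \mu_1 - \mu_2 = \Delta\right)$$
$$= P\left(\frac{(\bar{X} - \bar{Y}) - \Delta}{S_p\sqrt{\frac{3}{2n}}} \geq t_{3n-2,\,\alpha} - \frac{\Delta}{S_p\sqrt{\frac{3}{2n}}} \,\middle|\, H_a: \mu_1 - \mu_2 = \Delta\right)$$
$$\approx P\left(T \geq t_{3n-2,\,\alpha} - \mathrm{EFF}\cdot\sqrt{\frac{2n}{3}} \,\middle|\, H_a: \mu_1 - \mu_2 = \Delta\right),$$
where the effect size is EFF $= \Delta/\sigma \approx \Delta/S_p$ and $T \sim t_{3n-2}$.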
(b) With n = 20, α = 0.05, Eff = 0.8 to 1.2, the power is calculated as follows:
$$\text{Power (Eff = 0.8)} = P\left(T \geq t_{58,0.05} - 0.8\cdot\sqrt{\frac{40}{3}} \,\middle|\, H_a: \mu_1 - \mu_2 = \Delta\right) = P(T \geq 1.67 - 2.92) = P(T \geq -1.25) = 0.8918$$
$$\text{Power (Eff = 1.2)} = P\left(T \geq t_{58,0.05} - 1.2\cdot\sqrt{\frac{40}{3}} \,\middle|\, H_a: \mu_1 - \mu_2 = \Delta\right) = P(T \geq 1.67 - 4.38) = P(T \geq -2.71) = 0.9956$$
Note: the T statistic above follows a t-distribution with 58 (= 40 + 20 − 2) degrees of freedom.
Therefore we conclude that the power will range from 89.18% to 99.56% for the given effect size of 0.8 to 1.2.
Note: In the exam situation, you have no access to R, and thus you can simply provide a rough estimate of the
power based on your T-table. For the given problem, the degrees of freedom are larger than what is given in the
T-table, and thus we use the Z-table to approximate. The power is thereby estimated to be from 89.44% to 99.66%
for the given effect size of 0.8 to 1.2.
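The Z-table approximation in the note can be reproduced with a few lines of Python. This is only a sketch: the function name `approx_power` is an illustrative assumption, the critical value 1.67 is the $t_{58,0.05}$ value quoted above, and the standard normal CDF stands in for the $t_{58}$ CDF, exactly as the note suggests for the exam setting.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(eff, n, t_crit):
    """Approximate P(T >= t_crit - eff*sqrt(2n/3)) with the standard normal.

    n is the size of group 2; group 1 has 2n observations.
    """
    shift = eff * sqrt(2 * n / 3)
    return 1 - NormalDist().cdf(t_crit - shift)

power_low = approx_power(0.8, 20, 1.67)
power_high = approx_power(1.2, 20, 1.67)
print(f"Eff = 0.8: power is about {power_low:.4f}")
print(f"Eff = 1.2: power is about {power_high:.4f}")
```

The results should land close to the Z-table estimates of 89.44% and 99.66%; small differences come from the table rounding −1.25 and −2.71 to two decimals.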
4. How to become an art sleuth? Like all creative artists, composers of music develop certain personal characteristics in
their works. One such characteristic is the number of melody notes in each bar of music. Now suppose you buy an old
unsigned manuscript of a waltz which you suspect is an unknown work by Johann Strauss, and if so, very valuable.
You count the number of melody notes per bar of several genuine Strauss waltzes and compare the frequency distribution
with a similar count for the unknown work. Would the following results support your high hopes? Use α = 0.05.
No. of melody notes per bar   Strauss waltzes   Unknown waltz
0                             5                 6
1                             32                60
2                             133               62
3                             114               96
4                             67                33
5                             22                7
≥6                            15                18
Total                         388               282
SOLUTION: This is inference on several population proportions following a multinomial distribution. If the unknown
work was from Johann Strauss, then we will expect the following frequency distribution of melody notes per bar:
No. of melody notes per bar   Expected relative frequency ($p_{i0}$)   Expected frequency (count) ($E_i$)   Observed frequency ($O_i$)
0                             5/388                                    282*5/388 ≈ 3.63                     6
1                             32/388                                   282*32/388 ≈ 23.26                   60
2                             133/388                                  282*133/388 ≈ 96.66                  62
3                             114/388                                  282*114/388 ≈ 82.86                  96
4                             67/388                                   282*67/388 ≈ 48.70                   33
5                             22/388                                   282*22/388 ≈ 15.99                   7
≥6                            15/388                                   282*15/388 ≈ 10.90                   18
The large sample chi-square test can be applied to test $H_0: p_i = p_{i0},\ i = 1, \ldots, 7$ versus $H_a$: $H_0$ is not true.
The chi-square test statistic is:
$$\chi_0^2 = \sum_{i=1}^{7} \frac{(O_i - E_i)^2}{E_i} = \frac{(6 - 3.63)^2}{3.63} + \frac{(60 - 23.26)^2}{23.26} + \cdots + \frac{(18 - 10.90)^2}{10.90} = 88.83$$
Since $\chi_0^2 = 88.83 > \chi^2_{6,\,0.05,\,\text{upper}} = 12.59$, we reject the null hypothesis at the significance level of α = 0.05 and conclude
that it is not likely that the unknown waltz was written by Strauss.
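The expected counts and the chi-square statistic can be verified with a short Python sketch (variable names are illustrative). It follows the solution's approach of treating the Strauss relative frequencies as the hypothesized proportions $p_{i0}$.

```python
# Counts for 0, 1, 2, 3, 4, 5, >=6 melody notes per bar
strauss = [5, 32, 133, 114, 67, 22, 15]
unknown = [6, 60, 62, 96, 33, 7, 18]

n_strauss, n_unknown = sum(strauss), sum(unknown)

# Expected counts for the unknown waltz under H0: E_i = n * p_i0
expected = [n_unknown * s / n_strauss for s in strauss]

# Pearson chi-square statistic with (number of categories - 1) degrees of freedom
chi_sq = sum((o - e) ** 2 / e for o, e in zip(unknown, expected))
df = len(unknown) - 1

print("expected:", [round(e, 2) for e in expected])
print(f"chi-square = {chi_sq:.2f} on {df} degrees of freedom")
```

Because the sketch keeps the expected counts unrounded, the statistic may differ from 88.83 in the second decimal place; the comparison with the critical value 12.59 is unaffected.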
5. The following data set from a study by the well-known chemist and Nobel Laureate Linus Pauling gives the
incidence of cold among 279 French skiers who were randomized to the Vitamin C and Placebo groups.
Group       Cold: Yes   Cold: No
Vitamin C   17          122
Placebo     31          109
(a) Construct a 95% confidence interval for the difference between the two incidence rates;
(b) Please test whether the incidence rate for the Placebo group is significantly higher than that of the Vitamin
C group at the 5% level of significance. Please report the p-value of your test.
(c) Please write up the entire SAS program necessary to answer the question raised in (b), including the data step.
Answer:
(a) VC: $\hat{p}_1 = \frac{17}{17 + 122} = 0.122$, $n_1 = 139$; Placebo: $\hat{p}_2 = \frac{31}{31 + 109} = 0.221$, $n_2 = 140$.
The 100(1-α)% confidence interval for $(p_1 - p_2)$ is
$$\left[\hat{p}_1 - \hat{p}_2 - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},\ \ \hat{p}_1 - \hat{p}_2 + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right]$$
After plugging in $Z_{0.025} = 1.96$ etc., we find the 95% CI to be [-0.187, -0.011].
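As a check on the interval in part (a), here is a minimal Python sketch of the large-sample confidence interval for $p_1 - p_2$, using the counts from the table (variable names are illustrative).

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 17, 139   # Vitamin C: colds, group size
x2, n2 = 31, 140   # Placebo: colds, group size

p1_hat, p2_hat = x1 / n1, x2 / n2
z = NormalDist().inv_cdf(0.975)  # Z_{0.025}, about 1.96

# Unpooled standard error of the difference in sample proportions
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
lower = (p1_hat - p2_hat) - z * se
upper = (p1_hat - p2_hat) + z * se

print(f"95% CI for p1 - p2: [{lower:.3f}, {upper:.3f}]")
```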
(b) This is Problem 9.12 in our textbook. (*It is also OK, in fact better, to use the pooled proportion in the
denominator.)
The hypotheses are
$H_0: p_1 = p_2$ vs $H_1: p_1 < p_2$
For the vitamin C group, the proportion catching cold is $\hat{p}_1 = 17/139 = 0.122$. For the placebo group, the
proportion catching cold is $\hat{p}_2 = 31/140 = 0.221$. Then the test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}} = \frac{0.122 - 0.221}{\sqrt{\dfrac{(0.122)(0.878)}{139} + \dfrac{(0.221)(0.779)}{140}}} = -2.212$$
The P-value is
$P = \Phi(-2.212) = 1 - \Phi(2.212) = 0.0136$
Since $P < \alpha = 0.05$, we reject $H_0$ and conclude that taking vitamin C reduces the incidence rate of colds compared
to a placebo.
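The test in part (b) can be sketched in Python as follows. Both the unpooled statistic used above and the pooled version mentioned in the remark are computed; the variable names are illustrative, and unrounded proportions are used, so the last digits may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 17, 139   # Vitamin C
x2, n2 = 31, 140   # Placebo
p1_hat, p2_hat = x1 / n1, x2 / n2

# Unpooled standard error (as in the solution)
se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_unpooled = (p1_hat - p2_hat) / se_unpooled

# Pooled standard error (alternative under H0: p1 = p2)
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_pooled = (p1_hat - p2_hat) / se_pooled

# Lower-tailed p-values for Ha: p1 < p2
p_unpooled = NormalDist().cdf(z_unpooled)
p_pooled = NormalDist().cdf(z_pooled)

print(f"unpooled: z = {z_unpooled:.3f}, p-value = {p_unpooled:.4f}")
print(f"pooled:   z = {z_pooled:.3f}, p-value = {p_pooled:.4f}")
```

Either version leads to the same decision at the 5% level for these data.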
(c) SAS code:
Data cold;
Input group $ outcome $ count;
Datalines;
VC yes 17
VC no 122
Placebo yes 31
Placebo no 109
;
Run;
Proc freq data=cold;
Tables group*outcome/chisq;
Weight count;
Run;
6. In a study of hypnotic suggestion, 10 male volunteers were randomly allocated to an experimental group
and a control group. Each subject participated in a two-phase experimental session. In the first phase,
respiration was measured while the subject was awake and at rest. In the second phase, the subject was told
to imagine that he was performing muscular work, and respiration was measured again. For subjects in the
experimental group, hypnosis was induced between the first and second phases; thus, the suggestion to
imagine muscular work was “hypnotic suggestion” for experimental subjects and “waking suggestion” for
control subjects. The accompanying table shows the measurements of total ventilation (liters of air per
minute per square meter of body area) for all 10 subjects.
Experimental Group            Control Group
Subject   Rest   Work         Subject   Rest   Work
1         6      6            6         6      5
2         7      9            7         5      5
3         5      8            8         5      5
4         7      12           9         6      6
5         6      7            10        5      4
(a) Use suitable tests to investigate (Use α =.05 for each test. Please report the p-value for each test and state the
assumption(s) of the test.)
(i) the response of the experimental group to suggestion;
(ii) the response of the control group to suggestion;
(iii) the differences between the responses of the experimental and control groups.
(b) Please write up the entire SAS program necessary to answer questions raised in (a). Please include the data
step as well as tests for testing for various assumptions.
Answer:
(a) Response = Work - Rest
(i) Inference on one population mean. Small sample.
$\bar{x}_1 = 2.2$, $s_1 = 1.9$, $n_1 = 5$
$H_0: \mu_1 = 0$ vs $H_a: \mu_1 > 0$
$$t_0 = \frac{\bar{x}_1 - 0}{s_1/\sqrt{n_1}} = \frac{2.2 - 0}{1.9/\sqrt{5}} = 2.56$$
Since $t_0 = 2.56 > t_{4,0.05} = 2.132$, we reject $H_0$ at the significance level α = 0.05.
Since $t_{4,0.025} = 2.776 > t_0 = 2.56 > t_{4,0.05} = 2.132$, we can infer that 0.025 < p-value < 0.05. The assumption
is that the response from the experimental group is normally distributed.
Note: if the normality assumption is not true, we will perform a nonparametric test: either the sign test or the
Wilcoxon signed-rank test.
(ii) Inference on one population mean. Small sample.
$\bar{x}_2 = -0.4$, $s_2 = 0.55$, $n_2 = 5$
$H_0: \mu_2 = 0$ vs $H_a: \mu_2 > 0$
$$t_0 = \frac{\bar{x}_2 - 0}{s_2/\sqrt{n_2}} = \frac{-0.4 - 0}{0.55/\sqrt{5}} = -1.63$$
Since $t_0 = -1.63 < t_{4,0.05} = 2.132$, we cannot reject $H_0$ at the significance level α = 0.05.
Since $-t_{4,0.05} = -2.132 < -1.63 < -t_{4,0.1} = -1.533$, we can infer that 0.9 = 1 − 0.1 < p-value < 1 − 0.05 = 0.95. The
assumption is that the response from the control group is normally distributed.
Note: if the normality assumption is not true, we will perform a nonparametric test: either the sign test or the
Wilcoxon signed-rank test.
(iii) Inference on two population means. Two small, independent samples.
Sample 1: responses from the experimental group.
Sample 2: responses from the control group.
$\bar{X}_1 = 2.2$, $\bar{X}_2 = -0.4$, $n_1 = n_2 = 5$, $s_1 = 1.9$, $s_2 = 0.55$
Under the normality assumption, we first test whether the two population variances are equal: $H_0: \sigma_1^2 = \sigma_2^2$ vs
$H_a: \sigma_1^2 \neq \sigma_2^2$.
Test statistic:
$$F_0 = \frac{s_1^2}{s_2^2} = 12.33, \qquad F_{4,4,0.025} = 9.60 \quad\text{and}\quad F_{4,4,0.975} = 1/F_{4,4,0.025} = 1/9.6 = 0.104.$$
Since $F_0$ is larger than 9.60, we reject $H_0$. Therefore it is not reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
If both populations are normal, we can test the equality of the two population means using the unequal-variance
t-test. If at least one population is not normal, we will perform a nonparametric test, the Wilcoxon rank
sum test (also referred to as the Mann-Whitney U test).
Here, assuming both populations are normal, we will perform the unequal-variance t-test to check whether the
responses from the two groups are different or not. We will use the simple (and less accurate) formula for
the degrees of freedom: d.f. = min($n_1 - 1$, $n_2 - 1$) = 4.
$H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$
$$\text{T.S.: } T_0 = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \overset{H_0}{\sim} t_4 \text{ (approximately)}$$
At α = 0.05, reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{4,0.025} = 2.776$.
$$\text{Here } t_0 = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{2.2 - (-0.4)}{\sqrt{\dfrac{1.9^2}{5} + \dfrac{0.55^2}{5}}} = 2.939$$
Since 2.939 > 2.776, we conclude that the responses from the two groups are different at the significance level of 0.05.
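The statistics in (i) to (iii) can be recomputed from the raw data with the following Python sketch (names are illustrative). It uses unrounded standard deviations, so the unequal-variance t statistic comes out slightly below the hand value of 2.939, which was computed from the rounded $s_1 = 1.9$ and $s_2 = 0.55$; the conclusion is the same.

```python
from math import sqrt
from statistics import mean, variance

# (rest, work) pairs from the table
experimental = [(6, 6), (7, 9), (5, 8), (7, 12), (6, 7)]
control = [(6, 5), (5, 5), (5, 5), (6, 6), (5, 4)]

# Response = Work - Rest
d_exp = [work - rest for rest, work in experimental]
d_ctl = [work - rest for rest, work in control]

def one_sample_t(diffs):
    """t statistic for H0: mean difference = 0."""
    n = len(diffs)
    return mean(diffs) / sqrt(variance(diffs) / n)

t_exp = one_sample_t(d_exp)   # part (i)
t_ctl = one_sample_t(d_ctl)   # part (ii)

# Part (iii): variance ratio and unequal-variance t statistic
f0 = variance(d_exp) / variance(d_ctl)
se = sqrt(variance(d_exp) / len(d_exp) + variance(d_ctl) / len(d_ctl))
t_unequal = (mean(d_exp) - mean(d_ctl)) / se

print(f"(i) t0 = {t_exp:.2f}, (ii) t0 = {t_ctl:.2f}")
print(f"(iii) F0 = {f0:.2f}, t0 = {t_unequal:.3f}")
```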
(b)
/*Problem #1*/
data one;
input ID group rest work;
diff=work-rest;
datalines;
1 1 6 6
2 1 7 9
3 1 5 8
4 1 7 12
5 1 6 7
6 2 6 5
7 2 5 5
8 2 5 5
9 2 6 6
10 2 5 4
;
run;
proc univariate data=one normal;
class group;
var diff;
title 'Check for normality and test for one population mean, Q1';
run;
proc ttest data=one;
class group;
var diff;
title 'Independent samples t-test, Q1';
run;
proc npar1way data=one wilcoxon;
class group;
var diff;
title 'Nonparametric test for two-mean comparisons, Q1';
run;
7. Let $X_i$, i = 1, …, n, denote the outcome of a series of n independent trials, where $X_i = 1$ with probability p,
and $X_i = 0$ with probability (1 − p). Let $X = \sum_{i=1}^{n} X_i$.
(a). Please derive the 100(1-α)% large sample confidence interval for p using the pivotal quantity method.
(b). At the significance level α, please derive the large sample test for H0: p = p0 versus Ha: p ≠ p0, using the
pivotal quantity method. (* Please include the derivation of the pivotal quantity, the proof of its distribution, and
the derivation of the rejection region for full credit.)
Solution:
(a). The population distribution is Bernoulli(p), i.e. $X_i \sim$ Bernoulli(p). Therefore the population mean is p and
the population variance is p(1 − p). When the sample size n is large, by the central limit theorem, we know that
the sample mean follows approximately the normal distribution with its mean being the population mean and its
variance being the population variance divided by n, as follows:
$$\hat{p} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X}{n} \ \dot\sim\ N\left(p, \frac{p(1 - p)}{n}\right).$$
Thus it is easily shown that
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \ \dot\sim\ N(0, 1)$$
is a pivotal quantity for the inference on p.
We can use this pivotal quantity to construct the large sample confidence interval for p. Alternatively, we can
also use the following pivotal quantity
$$Z^* = \frac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}} \ \dot\sim\ N(0, 1)$$
to construct the large sample confidence interval as follows.
$$1 - \alpha = P\left(-Z_{\alpha/2} \leq Z^* \leq Z_{\alpha/2}\right) = P\left(-Z_{\alpha/2} \leq \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} \leq Z_{\alpha/2}\right)$$
$$= P\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \leq p \leq \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$$
Therefore the 100(1-α)% large sample confidence interval for p is:
$$\left[\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$$
(b). From part (a) above, we have shown that
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \ \dot\sim\ N(0, 1)$$
is a pivotal quantity for the inference on
p. For a 2-sided test of H0: p = p0 versus Ha: p ≠ p0, the test statistic is the pivotal quantity at p = p0, that is,
$$Z_0 = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}.$$
Intuitively, we would reject H0 in favor of Ha if $|Z_0| \geq c$. The problem is how to determine c.
By the definition of the significance level, we have
$$\alpha = P(\text{reject } H_0 \mid H_0) = P(|Z_0| \geq c \mid H_0) = 2P(Z_0 \geq c \mid H_0).$$
Thus $\alpha/2 = P(Z_0 \geq c \mid H_0)$ and subsequently we have $c = Z_{\alpha/2}$.
That is, at the significance level α, we reject H0 in favor of Ha if $|Z_0| \geq Z_{\alpha/2}$.
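The interval from part (a) and the two-sided test from part (b) translate directly into code. Below is a minimal Python sketch; the function names `proportion_ci` and `proportion_z_test` are illustrative assumptions, and the counts 46 out of 50 with $p_0 = 0.8$ are borrowed from Problem 8 purely to have numbers to run.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, alpha=0.05):
    """Large-sample 100(1 - alpha)% confidence interval for p based on Z*."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def proportion_z_test(x, n, p0, alpha=0.05):
    """Two-sided large-sample test of H0: p = p0; returns (Z0, reject H0?)."""
    z0 = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z0, abs(z0) >= z_crit

lower, upper = proportion_ci(46, 50)
z0, reject = proportion_z_test(46, 50, p0=0.8)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
print(f"Z0 = {z0:.3f}, reject H0: {reject}")
```

Note that the interval uses $\hat{p}$ in the standard error while the test uses $p_0$, mirroring the two pivotal quantities $Z^*$ and $Z$ above.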
8. People at high risk of sudden cardiac death can be identified using the change in a signal averaged
electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The
method was modified, hoping to improve its accuracy. The new method is tested on 50 people and gave
correct results on 46 patients.
(a) Is this convincing evidence that the new method is more accurate? Please test at α =.05.
(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the
new method is better at α =.05?
(c) How many patients should be tested in order for this power to be at least 0.75?
Answer: These are Problems 9.7 and 9.8 in our textbook.
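The textbook solution is not reproduced here. As a rough guide only, the following Python sketch applies the large-sample one-sided z-test for a proportion, together with the usual normal-approximation formulas for power and sample size; the textbook's own solution may differ in details (for example, rounding of table values), and all variable names are illustrative.

```python
from math import sqrt, ceil
from statistics import NormalDist

std_normal = NormalDist()
p0, p1, n, alpha = 0.80, 0.90, 50, 0.05
p_hat = 46 / n
z_alpha = std_normal.inv_cdf(1 - alpha)

# (a) Upper-tailed test of H0: p = 0.8 versus Ha: p > 0.8
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - std_normal.cdf(z0)

# (b) Power at p = 0.9: probability that p_hat exceeds the rejection cutoff
cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
power = 1 - std_normal.cdf((cutoff - p1) / sqrt(p1 * (1 - p1) / n))

# (c) Smallest n giving power of at least 0.75 under the same approximation
z_beta = std_normal.inv_cdf(0.75)
n_needed = ceil(((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1)))
                 / (p1 - p0)) ** 2)

print(f"(a) Z0 = {z0:.3f}, p-value = {p_value:.4f}")
print(f"(b) power = {power:.3f}")
print(f"(c) n needed = {n_needed}")
```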