Stat 557
Midterm Exam Solutions
Fall 2002
Some problems may have more than one reasonable solution, although better solutions receive higher scores. Not
all of the reasonable solutions may be mentioned below. Also, you should only give the solution you think is best.
If you gave more than one solution, you were awarded the score for the weakest solution you gave. You should have a
score written on your paper for each part of each problem. If not, check with the instructor. Also check if your
score was correctly tabulated. A stem-leaf display of scores is given on the last page.
1.
(A) (10 points) Since the observed numbers of tumors are small, the normal approximation to the null
distribution of
\[ Z = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \]
may not provide an accurate p-value.
Also, the chi-squared approximation to the null distribution of the Pearson chi-square test statistic, or the
likelihood ratio test statistic, may not provide an accurate p-value, and those tests are for two-sided alternatives.
Use Fisher’s exact randomization test to compute the one-sided p-value as
\[ \frac{\binom{5}{4}\binom{27}{12} + \binom{5}{5}\binom{27}{11}}{\binom{32}{16}} = \frac{86919300 + 13037895}{601080390} = .166 \]
The null hypothesis is not rejected. These data do not provide strong evidence in support of the claim that
exposure to Avadex increases the incidence of lung cancer.
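The Fisher's exact p-value in part (A) can be checked with a few lines of Python. This is a minimal sketch, not part of the original solution. It assumes the table margins implied by the binomial coefficients above: 5 mice with tumors and 27 without, 16 mice in the exposed group, and 4 tumors observed in that group. The function name `fisher_upper_tail` is made up for illustration.

```python
from math import comb

def fisher_upper_tail(k_obs, n_tumor, n_no_tumor, n_exposed):
    """One-sided Fisher p-value: P(X >= k_obs) under the hypergeometric null.

    X is the number of tumor-bearing mice that fall in the exposed group
    when all margins of the 2x2 table are held fixed.
    """
    total = comb(n_tumor + n_no_tumor, n_exposed)
    k_max = min(n_tumor, n_exposed)
    tail = sum(
        comb(n_tumor, k) * comb(n_no_tumor, n_exposed - k)
        for k in range(k_obs, k_max + 1)
    )
    return tail / total

# Margins assumed from the binomial coefficients in the solution above.
p_value = fisher_upper_tail(k_obs=4, n_tumor=5, n_no_tumor=27, n_exposed=16)
print(round(p_value, 3))
```

The sum has only two terms here (4 or 5 tumors in the exposed group), matching the two products in the numerator of the solution.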
(B) (10 points) In a larger study, the normal approximation to the Z-test could provide accurate p-values.
We will base our power calculation on the normal approximation to that test statistic for obtaining power
.90 of detecting a difference of (.25 − .0625) = .1875 using a .05 level test against a one-sided alternative.
Using
\[ \bar{\pi} = \frac{.25 + .0625}{2} = .15625 \quad \text{and} \quad R = \sqrt{\frac{2(.15625)(.84375)}{(.25)(.75) + (.0625)(.9375)}} = 1.035 \]
we have
\[ n = \frac{(1.282 + (1.035)(1.645))^2 \, [(.25)(.75) + (.0625)(.9375)]}{(.25 - .0625)^2} = 62.4, \]
which is rounded up to 63 mice in each treatment group.
2.
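The sample size arithmetic in problem 1(B) is easy to get wrong by hand, so here is a short sketch that reproduces it. It assumes the same rounded normal quantiles used in the solution (1.645 for the one-sided .05 level, 1.282 for power .90); the variable names are illustrative only.

```python
from math import ceil, sqrt

p1, p2 = 0.25, 0.0625   # tumor proportions under the alternative
z_alpha = 1.645         # upper .05 standard normal quantile (one-sided test)
z_beta = 1.282          # upper .10 standard normal quantile (power .90)

p_bar = (p1 + p2) / 2
var_sum = p1 * (1 - p1) + p2 * (1 - p2)

# R rescales the null-hypothesis standard error to the alternative one.
R = sqrt(2 * p_bar * (1 - p_bar) / var_sum)

n_raw = (z_beta + R * z_alpha) ** 2 * var_sum / (p1 - p2) ** 2
n = ceil(n_raw)         # round up so the power is at least .90

print(round(R, 3), round(n_raw, 1), n)
```

Note that the denominator uses the difference being detected, .25 − .0625 = .1875.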
(A) (8 points) The five counts $Y_1, Y_2, Y_3, Y_4, Y_5$ have a multinomial distribution with probabilities
\[ \pi_i = \frac{m^{i-1} e^{-m}}{(i-1)!} \]
for i = 1, 2, 3, 4 and $\pi_5 = 1 - \pi_1 - \pi_2 - \pi_3 - \pi_4$. Then, the log-likelihood
function for these counts is
\[ \ell(m) = \log(n!) - \sum_{i=1}^{5} \log(Y_i!) + \sum_{i=1}^{5} Y_i \log(\pi_i), \]
where n = 400.
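To make the structure of this log-likelihood concrete, the sketch below codes the cell probabilities and the log-likelihood for problem 2(A). The observed counts are not reproduced in this solution key, so `hypothetical_counts` is a made-up vector that sums to 400 and is used only to exercise the functions; it is not the exam data. Function names are illustrative.

```python
from math import exp, factorial, lgamma, log

def cell_probs(m):
    """Poisson probabilities for 0, 1, 2, 3 cells, plus a pooled '4 or more' cell."""
    first_four = [m ** (i - 1) * exp(-m) / factorial(i - 1) for i in range(1, 5)]
    return first_four + [1.0 - sum(first_four)]

def log_likelihood(m, counts):
    """Multinomial log-likelihood: log(n!) - sum log(Y_i!) + sum Y_i log(pi_i)."""
    n = sum(counts)
    probs = cell_probs(m)
    constant = lgamma(n + 1) - sum(lgamma(y + 1) for y in counts)
    return constant + sum(y * log(p) for y, p in zip(counts, probs) if y > 0)

# Made-up counts for illustration only (NOT the exam data); they sum to n = 400.
hypothetical_counts = [150, 140, 70, 30, 10]

for m in (0.5, 1.0, 3.0):
    print(m, round(log_likelihood(m, hypothetical_counts), 2))
```

Maximizing `log_likelihood` over m (numerically, or by solving the score equation) would give the mle used to compute the expected counts in part (B).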
(B) (8 points) df = (5 − 1) − 1 = 3. Since $9.02 > \chi^2_{3,.05} = 7.81$, the Poisson model is rejected at the
.05 level of significance.
(C)
(i) (8 points) Since
\[ X^2 = \sum_{i=1}^{5} \frac{(Y_i - \hat{m}_i)^2}{\hat{m}_i} = 3.634 \]
is smaller than $\chi^2_{2,.10} = 4.61$, the data are consistent with the proposed model. Also,
\[ G^2 = 2 \sum_{i=1}^{5} Y_i \log\left(\frac{Y_i}{\hat{m}_i}\right) = 3.558 \]
is smaller than $\chi^2_{2,.10} = 4.61$.
(ii)
(8 points) Assuming this model is correct, the mle for the expected number of
compartments that contain exactly one yeast cell is
$\hat{m}_1 = 400\,\hat{\theta}(1 - \hat{\lambda})e^{-\hat{\theta}} = 119.62$. To
apply the delta method, compute a vector of estimates of first partial derivatives
\[ \hat{G} = \left[ \frac{\partial m_1}{\partial \theta} \;\; \frac{\partial m_1}{\partial \lambda} \right] = \left[ 400(1 - \hat{\theta})(1 - \hat{\lambda})e^{-\hat{\theta}} \;\; -400\,\hat{\theta}e^{-\hat{\theta}} \right] = [39.027 \;\; -141.897] \]
and estimate the variance of $\hat{m}_1$ as $\widehat{Var}(\hat{m}_1) = \hat{G}\hat{V}\hat{G}^T = 75.133$. Then, an approximate 95%
confidence interval is $119.62 \pm (1.96)\sqrt{75.133} = 119.62 \pm (1.96)(8.668)$
⇒ 119.62 ± 16.99 ⇒ (102.6, 136.6)
Alternatively, you could construct an approximate 95% confidence interval for
$\log(m_1) = \log(n) + \log(\theta) + \log(1 - \lambda) - \theta$ using the delta method to obtain the large sample
normal approximation to the distribution of $\log(\hat{m}_1) = \log(n) + \log(\hat{\theta}) + \log(1 - \hat{\lambda}) - \hat{\theta}$.
Compute
\[ \hat{G} = \left[ \frac{\partial \log(m_1)}{\partial \theta} \;\; \frac{\partial \log(m_1)}{\partial \lambda} \right] = \left[ \hat{\theta}^{-1} - 1 \;\; -(1 - \hat{\lambda})^{-1} \right] = [0.326 \;\; -1.186]. \]
Then, $\widehat{Var}(\log(\hat{m}_1)) = \hat{G}\hat{V}\hat{G}^T = .00525$ and an approximate 95% confidence interval for $\log(m_1)$ is
$\log(119.62) \pm (1.96)\sqrt{.00525}$ ⇒ (4.642, 4.926). Applying the exponential function to the
endpoints of this interval yields (103.7, 137.8) as an approximate 95% confidence interval for
$m_1$.
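The two delta-method intervals for $m_1$ can be reproduced from the reported point estimate and variance estimates alone. The sketch below takes 119.62, 75.133 and .00525 from the solution as given; the covariance matrix $\hat{V}$ itself is not reproduced here, so the variances are not recomputed. Variable names are illustrative.

```python
from math import exp, log, sqrt

m1_hat = 119.62       # mle of the expected count, from the solution
var_m1 = 75.133       # delta-method variance of m1_hat, from the solution
var_log_m1 = 0.00525  # delta-method variance of log(m1_hat), from the solution
z = 1.96              # standard normal quantile for a 95% interval

# Interval built directly on the scale of m1.
half_width = z * sqrt(var_m1)
direct_ci = (m1_hat - half_width, m1_hat + half_width)

# Interval built on the log scale, then transformed back.
half_width_log = z * sqrt(var_log_m1)
log_ci = (log(m1_hat) - half_width_log, log(m1_hat) + half_width_log)
back_transformed_ci = (exp(log_ci[0]), exp(log_ci[1]))

print([round(x, 1) for x in direct_ci])
print([round(x, 3) for x in log_ci])
print([round(x, 1) for x in back_transformed_ci])
```

The back-transformed endpoints can differ from the reported (103.7, 137.8) in the last digit, because the solution exponentiates the rounded log-scale endpoints. As a consistency check, `var_m1 / m1_hat**2` is close to `var_log_m1`, which is what the delta method predicts for a log transformation.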
3.
(A) (4 points) $\hat{\alpha} = \frac{(51)(10)}{(77)(13)} = 0.5095$
(B) (8 points) Compute $\log(\hat{\alpha}) = -0.6743$ and
\[ S_{\log(\hat{\alpha})} = \sqrt{\frac{1}{51} + \frac{1}{77} + \frac{1}{13} + \frac{1}{10}} = .4577. \]
Then, an approximate 95% confidence interval for $\log(\alpha)$ is
$\log(\hat{\alpha}) \pm (1.96) S_{\log(\hat{\alpha})}$ ⇒ −.6743 ± (1.96)(.4577)
⇒ (−1.5714, 0.2229). Then, an approximate 95% confidence interval for $\alpha$ is
$(e^{-1.5714}, e^{0.2229})$ ⇒ (0.2078, 1.2497). For women under 50 in Tokyo, the odds of three year
survival for women diagnosed with malignant tumors are not significantly different from the odds of
three year survival for women diagnosed with benign tumors.
A confidence interval constructed as $\hat{\alpha} \pm (1.96) S_{\hat{\alpha}}$ would not come as close to achieving a 95%
coverage probability as the confidence interval based on the normal approximation to $\log(\hat{\alpha})$.
Alternatively, one could have described a bootstrap procedure for constructing a confidence interval.
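The log-scale interval for the odds ratio in problem 3(B) can be verified as follows. This is a sketch using only the four cell counts that appear in the formula for $\hat{\alpha}$ (51, 10, 77, 13); which count belongs to which cell of the table is taken from that formula and not checked against the exam itself.

```python
from math import exp, log, sqrt

# Cell counts as they appear in alpha_hat = (51)(10) / ((77)(13)).
n11, n22, n12, n21 = 51, 10, 77, 13

alpha_hat = (n11 * n22) / (n12 * n21)
se_log_alpha = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

z = 1.96  # standard normal quantile for a 95% interval
log_ci = (log(alpha_hat) - z * se_log_alpha, log(alpha_hat) + z * se_log_alpha)
ci = (exp(log_ci[0]), exp(log_ci[1]))

print(round(alpha_hat, 4), round(se_log_alpha, 4))
print([round(x, 4) for x in log_ci])
print([round(x, 4) for x in ci])
```

Because the interval for $\alpha$ contains 1, the conclusion of no significant difference in odds follows directly.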
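(C) (6 points) The largest log-linear model that satisfies the null hypothesis that 3-year survival is
conditionally independent of treatment center given the age of the women and tumor status at time of
diagnosis is
\[ \log(m_{ijk\ell}) = \lambda + \lambda^{C}_{i} + \lambda^{A}_{j} + \lambda^{S}_{k} + \lambda^{T}_{\ell} + \lambda^{CA}_{ij} + \lambda^{CT}_{i\ell} + \lambda^{AS}_{jk} + \lambda^{AT}_{j\ell} + \lambda^{ST}_{k\ell} + \lambda^{CAT}_{ij\ell} + \lambda^{AST}_{jk\ell} \]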
(D) (4 points) There are 12 degrees of freedom for testing the fit of this model.
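The degrees of freedom can be checked by counting free parameters. The sketch below assumes a 3 × 3 × 2 × 2 table (three centers, three age groups, two survival outcomes, two tumor types) and the terms of the model in part (C): all main effects, the two-factor terms CA, CT, AS, AT, ST, and the three-factor terms CAT and AST. Each term contributes the product of (levels − 1) over its factors.

```python
from math import prod

levels = {"C": 3, "A": 3, "S": 2, "T": 2}

# Terms in the conditional independence model; "" stands for the intercept.
terms = ["", "C", "A", "S", "T", "CA", "CT", "AS", "AT", "ST", "CAT", "AST"]

def free_parameters(term):
    """Number of non-redundant parameters contributed by one lambda term."""
    return prod(levels[factor] - 1 for factor in term)

n_cells = prod(levels.values())
n_params = sum(free_parameters(term) for term in terms)
df = n_cells - n_params

print(n_cells, n_params, df)
```

The same count with CAT removed gives the degrees of freedom for the smaller model fitted in part (E).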
(E) (14 points) The log-linear model
\[ \log(m_{ijk\ell}) = \lambda + \lambda^{C}_{i} + \lambda^{A}_{j} + \lambda^{S}_{k} + \lambda^{T}_{\ell} + \lambda^{CA}_{ij} + \lambda^{CT}_{i\ell} + \lambda^{AS}_{jk} + \lambda^{AT}_{j\ell} + \lambda^{ST}_{k\ell} + \lambda^{AST}_{jk\ell} \]
was fit to these data, where
C = treatment center (Tokyo, Boston, London)
A = age group
S = survival for 3 years (yes, no)
T = tumor status at time of diagnosis (malignant, benign)
Maximum likelihood estimates of the parameters in this model, along with their standard errors, were
displayed on page 8 of this exam. These estimates were obtained using the restrictions that any λ term
is zero when any factor involved in that λ term is at its highest level.
Since $\lambda^{CS}_{ik} = 0$ for all (i,k) in this model, it implies that within each of the subgroups formed by the six
possible combinations of the age and tumor status categories the probability of surviving three years is the
same at all three locations.
The presence of a significant three factor interaction involving age, tumor status and survival rates
suggests that at each treatment center the association between 3-year survival and tumor status changes
across age groups. To examine this in more detail, compute the values of $\lambda^{ST}_{k\ell} + \lambda^{AST}_{jk\ell}$ for each age
group
3-year          < 50 years            50 − 69 years         > 69 years
survival (S)    Malignant   Benign    Malignant   Benign    Malignant   Benign
Yes             -.9103      0         -.4786      0         .1341       0
No              0           0         0           0         0           0
Note that the log of a conditional odds ratio, the odds of 3-year survival when diagnosed with a malignant
tumor divided by the odds of 3-year survival when diagnosed with a benign tumor for a particular age
group and any center, is obtained as
\[ (\lambda^{ST}_{11} + \lambda^{AST}_{j11}) - (\lambda^{ST}_{12} + \lambda^{AST}_{j12}) - (\lambda^{ST}_{21} + \lambda^{AST}_{j21}) + (\lambda^{ST}_{22} + \lambda^{AST}_{j22}) = \lambda^{ST}_{11} + \lambda^{AST}_{j11} \]
under the imposed parameter constraints. Then, the estimated conditional odds ratios are
Under 50:      $\hat{\alpha}_{ST \cdot A=1} = \exp(\hat{\lambda}^{ST}_{11} + \hat{\lambda}^{AST}_{111}) = \exp(-.9103) = .4024$
50-69:         $\hat{\alpha}_{ST \cdot A=2} = \exp(\hat{\lambda}^{ST}_{11} + \hat{\lambda}^{AST}_{211}) = \exp(-.4786) = .6197$
70 or older:   $\hat{\alpha}_{ST \cdot A=3} = \exp(\hat{\lambda}^{ST}_{11}) = \exp(.1341) = 1.143$
Since $\hat{\lambda}^{AST}_{111} = -1.044$ is significantly different from zero (p-value = .0287), we can conclude that $\alpha_{ST \cdot A=1}$
is smaller than $\alpha_{ST \cdot A=3}$. However, the data do not show that $\alpha_{ST \cdot A=2}$ is significantly different from
$\alpha_{ST \cdot A=3}$. An approximate 95% confidence interval for $\alpha_{ST \cdot A=3}$ is (.543, 2.41). Confidence intervals
for $\alpha_{ST \cdot A=1}$ and $\alpha_{ST \cdot A=2}$ cannot be constructed because covariance estimates were not reported.
For younger women, the odds of 3-year survival for women diagnosed with a malignant tumor are only
about 40% of the odds of survival for women diagnosed with a benign tumor. For women 50-69, the odds
of 3-year survival for women diagnosed with a malignant tumor are estimated to be about 60% of the odds
of survival for women diagnosed with a benign tumor. For women over 69, the odds of 3-year survival are
about the same for women diagnosed with either malignant or benign tumors.
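The conditional odds ratios quoted in this interpretation follow from exponentiating the tabulated log odds ratios. The short sketch below uses the three values reported in the solution for the (Yes, Malignant) cell in each age group; the dictionary keys are illustrative labels.

```python
from math import exp

# Estimated log conditional odds ratios (malignant vs benign) by age group,
# i.e. lambda_ST_11 + lambda_AST_j11 as reported in the solution.
log_odds_ratio = {"under 50": -0.9103, "50-69": -0.4786, "over 69": 0.1341}

odds_ratio = {age: exp(value) for age, value in log_odds_ratio.items()}
for age, value in odds_ratio.items():
    print(age, round(value, 4))

# Under the corner-point constraints the oldest group carries only lambda_ST_11,
# so the three-factor term for the youngest group is the difference below.
lambda_ast_111 = log_odds_ratio["under 50"] - log_odds_ratio["over 69"]
print(round(lambda_ast_111, 3))
```

The recovered three-factor term agrees with the reported estimate of −1.044 up to rounding.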
Conditional associations between 3-year survival and age, conditional on tumor status and center, are
consistent across treatment centers, but are different for malignant and benign tumors. Estimated values of
$\lambda^{AS}_{jk} + \lambda^{AST}_{jk\ell}$ are shown below.
Age        Malignant Tumor        Benign Tumor
Group      Survived    Died       Survived    Died
<50        0.1384      0          1.1828      0
50-69      0.3887      0          1.0014      0
>69        0           0          0           0
The log of a conditional odds ratio, the odds of 3-year survival for a particular age group divided by the
odds of 3-year survival for the oldest age group, for the $\ell$-th tumor status and any center, is obtained as
\[ (\lambda^{AS}_{j1} + \lambda^{AST}_{j1\ell}) - (\lambda^{AS}_{j2} + \lambda^{AST}_{j2\ell}) - (\lambda^{AS}_{31} + \lambda^{AST}_{31\ell}) + (\lambda^{AS}_{32} + \lambda^{AST}_{32\ell}) = \lambda^{AS}_{j1} + \lambda^{AST}_{j1\ell} \]
under the imposed parameter constraints. Then, the estimated conditional odds ratios are
Malignant tumor: under 50 vs over 69:  $\exp(\hat{\lambda}^{AS}_{11} + \hat{\lambda}^{AST}_{111}) = \exp(.1384) = 1.148$
Malignant tumor: 50-69 vs over 69:     $\exp(\hat{\lambda}^{AS}_{21} + \hat{\lambda}^{AST}_{211}) = \exp(.3887) = 1.475$
Malignant tumor: under 50 vs 50-69:    $\exp(.1384 - .3887) = 0.779$
Benign tumor: under 50 vs over 69:     $\exp(\hat{\lambda}^{AS}_{11}) = \exp(1.1828) = 3.263$
Benign tumor: 50-69 vs over 69:        $\exp(\hat{\lambda}^{AS}_{21}) = \exp(1.0014) = 2.722$
Benign tumor: under 50 vs 50-69:       $\exp(1.1828 - 1.0014) = 1.199$
For women diagnosed with malignant tumors, the odds of 3-year survival are estimated to be 15% greater for
women under 50 than for women over 69. We cannot construct a confidence interval or perform a test of
significance because the required covariance estimate was not reported. For women diagnosed with benign
tumors, the odds of 3-year survival are about 3 times as large for women under 50 as for women over 69.
An approximate 95% confidence interval for this odds ratio is (1.68, 6.32). Since $\hat{\lambda}^{AST}_{111} = -1.044$ is
significantly different from zero (p-value = .0287), we conclude that the conditional odds ratio (age effect
on 3-year survival) for women with benign tumors is significantly greater than the conditional odds ratio
for women with malignant tumors.
For women diagnosed with malignant tumors, the odds of 3-year survival are estimated to be 47% greater
for women 50-69 than for women over 69. We cannot construct a confidence interval or perform a test of
significance because the required covariance estimate was not reported. For women diagnosed with benign
tumors, the odds of 3-year survival are about 2.7 times as large for women 50-69 as for women over 69.
An approximate 95% confidence interval for this odds ratio is (1.50, 4.94). Since
$\hat{\lambda}^{AST}_{211} = .3887 - 1.0014 = -0.6127$ is not significantly different from zero (p-value = .1737), we
cannot conclude that the conditional odds ratio (age effect on 3-year survival) for benign tumors is
significantly greater than the conditional odds ratio for malignant tumors.
Differences in odds of 3-year survival between the youngest and middle age groups are not large, if they
exist at all. We cannot perform tests of significance or construct confidence intervals because needed
covariance estimates were not reported.
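The six age-effect odds ratios discussed here can be regenerated from the tabulated estimates of $\lambda^{AS}_{j1} + \lambda^{AST}_{j1\ell}$. The sketch below stores the "Survived" column for each tumor status as reported in the solution; the helper `odds_ratio` is a made-up name, and the last two lines recover the three-factor terms as the malignant minus benign entries, which is an inference from the corner-point constraints described above.

```python
from math import exp

# Estimated lambda_AS_j1 + lambda_AST_j1l, by tumor status and age group.
log_odds = {
    "malignant": {"under 50": 0.1384, "50-69": 0.3887, "over 69": 0.0},
    "benign": {"under 50": 1.1828, "50-69": 1.0014, "over 69": 0.0},
}

def odds_ratio(tumor, age_numerator, age_denominator):
    """Odds of 3-year survival in one age group relative to another."""
    values = log_odds[tumor]
    return exp(values[age_numerator] - values[age_denominator])

for tumor in ("malignant", "benign"):
    for pair in (("under 50", "over 69"), ("50-69", "over 69"), ("under 50", "50-69")):
        print(tumor, pair, round(odds_ratio(tumor, *pair), 3))

# Three-factor terms: malignant entry minus benign entry in each age row.
lambda_ast_111 = log_odds["malignant"]["under 50"] - log_odds["benign"]["under 50"]
lambda_ast_211 = log_odds["malignant"]["50-69"] - log_odds["benign"]["50-69"]
print(round(lambda_ast_111, 4), round(lambda_ast_211, 4))
```

Standard errors for the malignant-tumor comparisons would need the covariance between the AS and AST estimates, which is why only the benign-tumor intervals are reported.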
You could also comment on the associations between treatment centers and age and treatment centers and
rates of malignant and benign tumors implied by the significant $\lambda^{CA}_{ij}$ and $\lambda^{CT}_{i\ell}$ terms in the model,
although you were not asked for this information and it does not affect the score you received for your
answer. $\hat{\lambda}^{CT}_{11} = -.4051$ and $\hat{\lambda}^{CT}_{21} = -.6740$ indicate that there were lower proportions of malignant
tumors in the samples taken from the Tokyo and Boston treatment centers than in the sample taken from
the London treatment center. $\hat{\lambda}^{AC}_{11} = 1.4390$ and $\hat{\lambda}^{AC}_{12} = -0.7642$ indicate that the samples taken from
the Tokyo treatment center had the highest proportion of women under 50 and the samples taken from the
Boston treatment center had the lowest proportion of women under 50, within each combination of the
survival and tumor status categories. $\hat{\lambda}^{AC}_{21} = 0.6049$ and $\hat{\lambda}^{AC}_{22} = -0.4831$ indicate a similar pattern for
50-69 year old women, but not as pronounced.
(F) (4 points) Since the model in part (E) indicates that both 3-year survival (S) and treatment centers (C)
have significant associations with age, inference about the conditional association between 3-year survival
(S) and treatment centers (C) could change if the table is collapsed across the age levels.
4.
(8 points) A confidence interval based on the large sample normal approximation to the distribution of a sample
proportion would not be appropriate in this case. It would produce 0 ± (1.96)(0) ⇒ (0, 0). An “exact”
confidence interval can be obtained by finding the values of the proportion $\pi_0$ that are consistent with the
observed data in the sense that the null hypothesis $H_0: \pi = \pi_0$ would not be rejected at the .05 level of
significance. Then, the upper bound of the interval corresponds to the largest value of $\pi_0$ for which
\[ .025 \le \Pr\{Y \le 0\} = \Pr\{Y = 0\} = \binom{10}{0} \pi_0^{0} (1 - \pi_0)^{10 - 0} = (1 - \pi_0)^{10}. \]
This yields $\pi_0 = 1 - (.025)^{1/10} = .308$ as the upper
bound of the confidence interval. The lower bound is the smallest value of $\pi_0$ for which $.025 \le \Pr\{Y \ge 0\}$.
This is taken to be $\pi_0 = 0$ in this case because $\Pr\{Y \ge 0\} = 1$ for any $\pi_0$. The confidence interval is
(0, .308).
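The upper bound can be computed in closed form when no events are observed, since the tail probability reduces to $(1 - \pi_0)^{10}$. The sketch below solves $(1 - \pi_0)^{n} = .025$ for $\pi_0$ with n = 10 and then checks the result against the binomial probability of zero events; variable names are illustrative.

```python
from math import comb

n = 10        # number of cases
y = 0         # observed number of events
alpha = 0.05  # 1 - confidence level, split equally between the two tails

# With y = 0, Pr{Y <= 0} = (1 - pi0)^n, so the boundary solves (1 - pi0)^n = alpha / 2.
upper = 1 - (alpha / 2) ** (1 / n)
lower = 0.0   # Pr{Y >= 0} = 1 for every pi0, so no value is rejected from below

# Check: binomial probability of zero events at the upper bound.
prob_zero_at_upper = comb(n, 0) * upper ** 0 * (1 - upper) ** n

print(round(lower, 3), round(upper, 3), round(prob_zero_at_upper, 3))
```

For y > 0 there is no closed form and the bounds would have to be found numerically, for example by bisection on the binomial tail probabilities.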
Bootstrap confidence intervals were suggested by some students, but these are no more accurate than the large
sample normal approximation in this case. Since you observed no failures in 10 cases, resampling from the
observed sample would always produce no failures in 10 cases and a confidence interval of (0, 0).
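A small simulation illustrates why the nonparametric bootstrap is degenerate here. The sketch below assumes the observed sample is coded as ten zeros (no failures) and draws 1000 bootstrap resamples with replacement; the number of resamples and the seed are arbitrary choices.

```python
import random

random.seed(0)  # arbitrary seed, only to make the run repeatable

observed = [0] * 10  # ten cases, no failures (coded 0)

# Each bootstrap resample draws 10 values with replacement from the observed data.
bootstrap_proportions = [
    sum(random.choices(observed, k=len(observed))) / len(observed)
    for _ in range(1000)
]

print(min(bootstrap_proportions), max(bootstrap_proportions))
```

Every resample contains only zeros, so any percentile of the bootstrap distribution is 0 and the resulting interval collapses to (0, 0).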
The total number of points for this exam is 100. A stem-leaf display of the scores is shown below.
9|02
8|68
8|0001113344
7|5678
7|011113444
6|5555667778
6|001344
5|
5|
4|
4|3