Stat 557
Fall 2000
Assignment 2 Solutions
Problem 1
(a) The table of counts would have a multinomial distribution if simple random sampling
with replacement was used. Since the population is large, a multinomial distribution
would provide a good approximation to the distribution of the counts if simple random
sampling without replacement was used. The log-likelihood function is
$$\ell(\pi; Y) = \log(n!) - \sum_{i=1}^{2}\sum_{j=1}^{2} \log(Y_{ij}!) + \sum_{i=1}^{2}\sum_{j=1}^{2} Y_{ij}\log(\pi_{ij}).$$
Some students suggested that the respondents would have to be "identical". This is
nonsense. The population may contain individuals that are quite different. By selecting
respondents from the population using simple random sampling with replacement, the
probability distribution of the possible responses is the same for each selection and the
outcome for any single selection would be stochastically independent of the outcome
for any other selection. This is what is needed to obtain a multinomial distribution for
the counts.
(b) Under the independence model $\pi_{ij} = \pi_{i+}\pi_{+j}$. Substituting this into the log-likelihood
function shown above, we have
$$\ell(\pi; Y) = \log(n!) - \sum_{i=1}^{2}\sum_{j=1}^{2} \log(Y_{ij}!) + \sum_{i=1}^{2} Y_{i+}\log(\pi_{i+}) + \sum_{j=1}^{2} Y_{+j}\log(\pi_{+j}).$$
Maximize this log-likelihood with respect to the constraints $\pi_{1+} + \pi_{2+} = 1$ and
$\pi_{+1} + \pi_{+2} = 1$. The formula for the maximum likelihood estimates of the expected
counts is
$$\hat{m}_{ij} = \frac{Y_{i+}\,Y_{+j}}{Y_{++}}, \quad i = 1, 2 \text{ and } j = 1, 2,$$
and $(\hat{m}_{11}, \hat{m}_{12}, \hat{m}_{21}, \hat{m}_{22}) = (799.5,\ 220.5,\ 295.2,\ 81.5)$.
(c) $G^2 = 2\sum_{i=1}^{2}\sum_{j=1}^{2} Y_{ij}\log(Y_{ij}/\hat{m}_{ij}) = 5.321$ on 1 d.f. with p-value = .021.
$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} (Y_{ij} - \hat{m}_{ij})^2/\hat{m}_{ij} = 5.15$ on 1 d.f. with p-value = .023.
The data suggest that opinions on gun registration are not held independently of
opinions on the death penalty. In particular, people who oppose the death penalty are
more likely to favor gun registration than people who favor the death penalty.
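As a numerical check, the independence fit in part (b) and both test statistics in part (c) can be sketched in a few lines of Python. The counts below are a hypothetical stand-in, since the observed survey table is not reproduced in these solutions:

```python
import math

def independence_fit(y):
    """Fit the independence model to a 2x2 table y = [[y11, y12], [y21, y22]].
    Returns (fitted counts, G^2, X^2); each statistic has 1 d.f."""
    n = sum(sum(row) for row in y)
    row = [sum(r) for r in y]
    col = [y[0][j] + y[1][j] for j in range(2)]
    # MLEs of expected counts: m_ij = Y_i+ * Y_+j / Y_++
    m = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
    # Deviance (G^2) and Pearson (X^2) statistics
    G2 = 2 * sum(y[i][j] * math.log(y[i][j] / m[i][j])
                 for i in range(2) for j in range(2))
    X2 = sum((y[i][j] - m[i][j]) ** 2 / m[i][j]
             for i in range(2) for j in range(2))
    return m, G2, X2

# Hypothetical 2x2 table of counts (not the assignment's data):
m, G2, X2 = independence_fit([[30, 10], [20, 40]])
```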
(d) This is reflected in the estimated odds ratio $\hat{\alpha} = \frac{Y_{11}Y_{22}}{Y_{12}Y_{21}} = 0.705$, which indicates that
the odds of favoring gun registration among those who favor the death penalty are only
about 70% of the odds of favoring gun registration among those who oppose the death
penalty. To obtain an approximate 95% confidence interval for $\alpha$, first compute an
approximate 95% confidence interval for $\log(\alpha)$:
$$\log(\hat{\alpha}) \pm (Z_{.025})\sqrt{\frac{1}{Y_{11}} + \frac{1}{Y_{12}} + \frac{1}{Y_{21}} + \frac{1}{Y_{22}}}
\;\Rightarrow\; (-0.34956) \pm (1.96)(0.1545)
\;\Rightarrow\; (-0.65244,\ -0.04668).$$
Then convert to a 95% confidence interval for $\alpha$:
$$(\exp(-0.65244),\ \exp(-0.04668)) \;\Rightarrow\; (0.521,\ 0.954).$$
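The log-scale construction above is easy to script. Again, the counts passed in are hypothetical placeholders, not the survey data:

```python
import math

def odds_ratio_ci(y11, y12, y21, y22, z=1.96):
    """Approximate 95% CI for the odds ratio, built on the log
    scale and then exponentiated, as in part (d)."""
    alpha_hat = (y11 * y22) / (y12 * y21)
    se_log = math.sqrt(1 / y11 + 1 / y12 + 1 / y21 + 1 / y22)
    lo = math.log(alpha_hat) - z * se_log
    hi = math.log(alpha_hat) + z * se_log
    return alpha_hat, (math.exp(lo), math.exp(hi))

# Hypothetical counts:
alpha_hat, ci = odds_ratio_ci(30, 10, 20, 40)
```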
Problem 2
(a) The log-likelihood function is
$$\begin{aligned}
\ell(\pi; Y) &= \log(n!) - \sum_{i=1}^{2}\sum_{j=1}^{2} \log(Y_{ij}!) + \sum_{i=1}^{2}\sum_{j=1}^{2} Y_{ij}\log(\pi_{ij}) \\
&= \log(n!) - \sum_{i=1}^{2}\sum_{j=1}^{2} \log(Y_{ij}!) + Y_{11}\log(\pi^2) + (Y_{12} + Y_{21})\log(\pi(1-\pi)) + Y_{22}\log((1-\pi)^2) \\
&= \log(n!) - \sum_{i=1}^{2}\sum_{j=1}^{2} \log(Y_{ij}!) + (2Y_{11} + Y_{12} + Y_{21})\log(\pi) + (Y_{12} + Y_{21} + 2Y_{22})\log(1-\pi)
\end{aligned}$$
(b) Solve the likelihood equation
$$0 = \frac{\partial \ell(\pi; Y)}{\partial \pi} = \frac{2Y_{11} + Y_{12} + Y_{21}}{\pi} - \frac{Y_{12} + Y_{21} + 2Y_{22}}{1 - \pi}.$$
The maximum likelihood estimate is
$$\hat{\pi} = \frac{2Y_{11} + Y_{12} + Y_{21}}{2Y_{++}} = \frac{Y_{1+} + Y_{+1}}{2Y_{++}} = 0.757.$$
(c) Compute m.l.e.'s for the expected counts:
$$\hat{m}_{11} = Y_{++}\hat{\pi}^2 = 800.5055$$
$$\hat{m}_{12} = \hat{m}_{21} = Y_{++}\hat{\pi}(1 - \hat{\pi}) = 256.9944$$
$$\hat{m}_{22} = Y_{++}(1 - \hat{\pi})^2 = 82.5055$$
The deviance statistic is
$$G^2 = 2\sum_{i=1}^{2}\sum_{j=1}^{2} Y_{ij}\log(Y_{ij}/\hat{m}_{ij}) = 16.2823$$
with 2 d.f. and p-value = .0003. It is not surprising that the data do not support this
model because the independence model was rejected in Problem 1.
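The fit of this one-parameter model can be sketched in Python using the closed-form MLE from part (b). The table passed in below is hypothetical (it is chosen so the model fits exactly, giving a deviance of zero):

```python
import math

def fit_binomial_model(y11, y12, y21, y22):
    """MLE and fitted counts for the Problem 2 model
    pi11 = p^2, pi12 = pi21 = p(1-p), pi22 = (1-p)^2."""
    n = y11 + y12 + y21 + y22
    p = (2 * y11 + y12 + y21) / (2 * n)          # closed-form MLE
    m = {'11': n * p * p, '12': n * p * (1 - p),
         '21': n * p * (1 - p), '22': n * (1 - p) ** 2}
    obs = {'11': y11, '12': y12, '21': y21, '22': y22}
    # Deviance statistic, on 2 d.f. for this model
    G2 = 2 * sum(obs[k] * math.log(obs[k] / m[k]) for k in obs if obs[k] > 0)
    return p, m, G2

# Hypothetical table that satisfies the model exactly:
p, m, G2 = fit_binomial_model(9, 3, 3, 1)
```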
(d) An analysis of deviance table:

Comparison             d.f.   Deviance   p-value
Model A vs. Model B     1     10.962     .0009
Model B vs. Model C     1      5.321     .021
Model A vs. Model C     2     16.283     .0003

Although Model B is a significant improvement over Model A, neither Model A nor
Model B is appropriate in this case.
Problem 3
(a) The m.l.e. for the mean number of corn borers per location in the Poisson model is
$\hat{m} = 3.16667$. This estimate was computed without combining any categories.
The Pearson statistic for testing the fit of the i.i.d. Poisson model is $X^2 = 103.90$ with
6 d.f. and p-value < .0001. In computing this test, categories with at least 7 corn
borers are combined into a single category. The Poisson model does not appear to be
appropriate.
(b) In this case, $\bar{Y} = 3.16667$ and $S^2 = 7.70555$. The deviance statistic is $nS^2/\bar{Y} = 292$
with 119 d.f. and p-value < .0001. This test shows that the variance of the distribution
of numbers of corn borers across locations is larger than the mean. Hence, the i.i.d.
Poisson model is not appropriate.
(c) Maximum likelihood estimates of the expected counts for the Poisson and negative binomial models are shown in the following table. Blank entries indicate that categories
were combined to keep estimates of expected counts larger than 5, as requested in the
statement of the problem (the Poisson entry for 7 pools categories 7 and above, and the
negative binomial entry for 8 pools categories 8 and above). Maximum likelihood estimates
of the parameters in the negative binomial probability function are $\hat{\pi} = 0.3573$ and
$\hat{K} = 1.7605$.

Number of     Number of   Poisson   Neg. Binomial
Corn Borers   Locations   Model     Model
 0            24           5.057    19.602
 1            16          16.015    22.179
 2            16          25.357    19.675
 3            18          26.765    15.850
 4            15          21.189    12.124
 5             9          13.420     8.977
 6             6           7.083     6.501
 7             5           5.111     7.892
 8             3                     7.200
 9             4
10             3
11             0
12 or more     1

The value of the Pearson goodness-of-fit statistic is $X^2 = 4.497$ with 6 d.f. and
p-value = 0.610. The negative binomial model is consistent with the observed data.
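The Pearson goodness-of-fit computation for an i.i.d. Poisson model, with the tail pooled as in part (a), can be sketched as follows. The category counts below are hypothetical, not the corn borer data:

```python
import math

def poisson_gof(counts, combine_at):
    """Pearson goodness-of-fit test of an i.i.d. Poisson model.
    counts[k] = number of locations with k insects; categories
    k >= combine_at are pooled into a single tail category."""
    n = sum(counts)
    mean = sum(k * c for k, c in enumerate(counts)) / n    # MLE of lambda
    pmf = [math.exp(-mean) * mean ** k / math.factorial(k)
           for k in range(combine_at)]
    expected = [n * p for p in pmf] + [n * (1 - sum(pmf))]  # pooled tail
    observed = counts[:combine_at] + [sum(counts[combine_at:])]
    X2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1   # categories - 1 - one estimated parameter
    return mean, X2, df

# Hypothetical counts: 10 locations each with 0, 1, and 2 insects
mean, X2, df = poisson_gof([10, 10, 10], combine_at=2)
```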
(d) Do not use the i.i.d. Poisson model to construct the confidence interval because it was
shown that the Poisson model is inappropriate. There are several ways, however, to
obtain an approximate 95% confidence interval for the mean number of corn borers at
a location.
(i) You could use the central limit theorem to show that $\bar{Y}$ has a limiting normal distribution.
You must also show that $S^2$ is a consistent estimator for $\sigma^2 = Var(Y_i)$. Then an
approximate 95% confidence interval is
$$\bar{Y} \pm (1.96)\sqrt{S^2/n} \;\Rightarrow\; (2.67,\ 3.67)$$
This method does not assume any particular distribution for the counts, but it does
assume that counts at the n locations are independent and identically distributed
random variables.
(ii) Assuming the negative binomial model is appropriate, $Var(Y_i) = K(1-\pi)/\pi^2$ and
$Var(\bar{Y}) = K(1-\pi)/(n\pi^2)$. Then an approximate 95% confidence interval is
$$\bar{Y} \pm (1.96)\sqrt{\frac{\hat{K}(1-\hat{\pi})}{n\hat{\pi}^2}} \;\Rightarrow\; (2.63,\ 3.70)$$
(iii) You could use the delta method to find the limiting normal distribution for the
m.l.e. of the mean
$$\hat{m}_{nb} = g(\hat{\pi}, \hat{K}) = \frac{\hat{K}(1-\hat{\pi})}{\hat{\pi}} = \frac{(1.7605)(1 - 0.3573)}{0.3573} = 3.1667 = \bar{Y}.$$
The computer output inverts the estimated Fisher information matrix to estimate the
covariance matrix of $(\hat{\pi}, \hat{K})'$ as
$$\hat{V} = \begin{bmatrix} 0.0031364 & 0.0210552 \\ 0.0210552 & 0.1613241 \end{bmatrix}.$$
Compute first partial derivatives of $g(\pi, K)$,
$$G = \left(\frac{\partial g}{\partial \pi},\ \frac{\partial g}{\partial K}\right) = \left(\frac{-K}{\pi^2},\ \frac{1-\pi}{\pi}\right).$$
Then
$$\hat{G} = \left(\frac{-\hat{K}}{\hat{\pi}^2},\ \frac{1-\hat{\pi}}{\hat{\pi}}\right) = (-13.790,\ 1.79875),$$
and $Var(\hat{m}_{nb})$ is estimated as $\hat{G}\hat{V}\hat{G}' = 0.073855$. Using the large sample normal approximation to the distribution of the m.l.e. $\hat{m}_{nb}$, an approximate 95% confidence interval
for the mean number of corn borers at a location is
$$3.1667 \pm (1.96)\sqrt{0.073855} \;\Rightarrow\; (2.63,\ 3.70).$$
This method is also based on the belief that the negative binomial model is appropriate.
Note that the confidence interval based on the incorrect Poisson model
$$\hat{m} \pm (1.96)\sqrt{\hat{m}/n} \;\Rightarrow\; (2.85,\ 3.49)$$
is too short because the Poisson model does not allow for enough dispersion in the
counts.
(iv) You could use the delta method and large sample normality of m.l.e.'s to first construct
a confidence interval for
$$\log(m_{nb}) = \log(K) + \log(1-\pi) - \log(\pi).$$
Then apply the exponential function to the endpoints of that interval to obtain an approximate 95% confidence interval for $m_{nb}$. When would this be better than approach (iii)?
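The delta-method variance in approach (iii) is a short calculation. This sketch plugs in the parameter estimates and covariance matrix quoted in the solution:

```python
import math

# Parameter estimates and estimated covariance matrix of (pi_hat, K_hat),
# as quoted in the solution above
pi_hat, K_hat = 0.3573, 1.7605
V = [[0.0031364, 0.0210552],
     [0.0210552, 0.1613241]]

# Gradient of g(pi, K) = K(1 - pi)/pi
g1 = -K_hat / pi_hat ** 2        # dg/dpi
g2 = (1 - pi_hat) / pi_hat       # dg/dK

# Delta-method variance: G V G'
var_m = g1 * g1 * V[0][0] + 2 * g1 * g2 * V[0][1] + g2 * g2 * V[1][1]

m_hat = K_hat * (1 - pi_hat) / pi_hat
half = 1.96 * math.sqrt(var_m)
ci = (m_hat - half, m_hat + half)
```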
Problem 4
(a) $\hat{\alpha} = 2.933$ with approximate 95% confidence interval $(1.67,\ 5.16)$. This suggests that
the odds of contracting Hodgkin's Disease are between 67% and 500% greater among
people who have had their tonsils removed.
This confidence interval was constructed by first constructing an approximate 95%
confidence interval for $\log(\alpha)$ and then applying the exponential function to the endpoints of the interval to obtain an approximate 95% confidence interval for the odds
ratio. Directly using the large sample normal distribution for the odds ratio, we have
$$\hat{\alpha} \pm (1.96)\,\hat{\alpha}\sqrt{\sum_{i=1}^{2}\sum_{j=1}^{2} Y_{ij}^{-1}} \;\Rightarrow\; (1.27,\ 4.59).$$
This does not adequately account for the right skewness in the distribution of the odds
ratio in this case. Notice how much this interval is shifted to the left of the previous
interval.
(b) Some people questioned the use of the odds ratio as a good approximation to relative
risk in the Vianna study. Since this is a retrospective study, you cannot directly
estimate relative risk, but the odds ratio does provide a good approximation because
Hodgkin's disease is a rare disease. Some people worried about the restrictions put on
the control group in the Vianna study, and others noted that the sibling controls used
by Johnson and Johnson may differ substantially from the controls used in the Vianna
study.
(c) While there are advantages in using siblings as controls, Johnson and Johnson did
not do an appropriate analysis, because they did not allow for the effects of matched sibling
pairs. The "controls" are not an independent sample of persons without Hodgkin's
disease. Each sibling pair, not each individual, should be thought of as an independent
experimental unit. Siblings are likely to provide correlated responses. Johnson and
Johnson should have reported the data in the following table, with one count for each
sibling pair.

                               Control Sibling
                              Had T.     No T.
Sibling with      Had T.    $Y_{11}$   $Y_{12}$
Hodgkin's
disease           No T.     $Y_{21}$   $Y_{22}$

There are $n = Y_{11} + Y_{12} + Y_{21} + Y_{22} = 85$ sibling pairs. Unfortunately, this table cannot
be obtained from the table reported by Johnson and Johnson (1972).
The Pearson chi-square test for independence performed by Johnson and Johnson is
incorrect. They should have performed a test of marginal homogeneity using the counts
in the table shown above. This is McNemar's test (or the sign test). Reject the null
hypothesis of marginal homogeneity if
$$X^2 = \frac{(Y_{12} - Y_{21})^2}{Y_{12} + Y_{21}} > \chi^2_{(1),\alpha}$$
(this is a 2-sided test).
Using the above table, Johnson and Johnson could have estimated an odds ratio as
$$\hat{\alpha} = \frac{(Y_{11} + Y_{12})(Y_{12} + Y_{22})}{(Y_{21} + Y_{22})(Y_{11} + Y_{21})}.$$
To construct an approximate confidence interval you would have to derive the formula
for the variance of this statistic, or the natural logarithm of this statistic. We might
attempt this on the next assignment.
Finally, note that Vianna, et al. sampled from a different population than Johnson
and Johnson. What are the consequences of this?
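McNemar's statistic for the sibling-pair table is a one-line computation on the two discordant counts. The counts used below are hypothetical, since the pair-level table cannot be recovered from the published data:

```python
def mcnemar(y12, y21):
    """McNemar statistic for marginal homogeneity in a table of
    matched pairs; compare to a chi-square quantile with 1 d.f.
    Only the discordant counts Y12 and Y21 enter the statistic."""
    return (y12 - y21) ** 2 / (y12 + y21)

# Hypothetical discordant counts:
x2 = mcnemar(15, 5)   # compare to 3.84 for a .05-level test
```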
Problem 5
There are only six tables with the same row and column totals as the observed table. These
tables can be distinguished by the $Y_{11}$ value. The exact probabilities are presented in the
following table.

Table                 Exact
Number   $Y_{11}$    Probability
1        21          0.2755   (observed table)
2        22          0.0939
3        23          0.0114
4        19          0.2127
5        18          0.0449
6        20          0.3616

Looking at the proportions of cases where the cancer is controlled for the surgery and radiation treatments, only table 6 is more consistent with the null hypothesis of equal proportions
than the observed table. Consequently, the p-value, 0.6384, is the sum of the probabilities
for the other five tables. These data are consistent with the null hypothesis that the success
rates are the same for the surgery and radiation treatments.
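Exact probabilities of this kind come from the hypergeometric distribution over all 2x2 tables with the observed margins. The margins in the example call below are hypothetical, since the observed table itself is not reproduced in these solutions:

```python
from math import comb

def table_probs(row1, row2, col1):
    """Exact (hypergeometric) probability of every 2x2 table with
    row totals (row1, row2) and first column total col1, indexed
    by the Y11 cell count."""
    n = row1 + row2
    lo = max(0, col1 - row2)      # smallest feasible Y11
    hi = min(row1, col1)          # largest feasible Y11
    denom = comb(n, col1)
    return {y11: comb(row1, y11) * comb(row2, col1 - y11) / denom
            for y11 in range(lo, hi + 1)}

# Tiny hypothetical margins: two rows of 2, first column total 2
p = table_probs(2, 2, 2)
```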
Problem 6
Since the objective is to show that the IFN-B treatment is better, we should test the null
hypothesis that the IFN-B treatment has the same effect as the control treatment against
a directional alternative where the IFN-B treatment is "better". Different definitions of
what it means for the IFN-B treatment to be "better" result in different answers. The row
totals in this table are fixed by the randomization procedure that places 10 subjects in each
treatment group. Under the null hypothesis that the IFN-B and control treatments are
equally effective, the column totals are also fixed. There are 43 possible tables with these
row and column totals. Each table is distinguished by the values of $Y_{11}$ and $Y_{12}$, the first two
counts in the first row of the table. The values of these two counts and the corresponding
probability that each table occurs by random assignment of subjects to treatment groups
are shown below.
Table                        Pearson   Exact
Number   $Y_{11}$  $Y_{12}$   $X^2$    Probability
 1       6         0         14.67        15/184756
 2       6         4         12.00        70/184756
 3       6         1         10.50       160/184756
 4       6         3          9.17       336/184756
 5       6         2          8.67       420/184756
 6       5         5          9.17       336/184756
 7       5         0         13.33        25/184756
 8       5         1          7.83       600/184756
 9       5         4          5.33      2520/184756  (observed table)
10       5         2          4.67      2800/184756
11       5         3          3.83      4200/184756
12       4         6          8.67       420/184756
13       4         0         14.67        15/184756
14       4         1          7.83       600/184756
15       4         2          3.33      6300/184756
16       4         3          1.17     16800/184756
17       4         4          1.33     15750/184756
Table                        Pearson   Exact
Number   $Y_{11}$  $Y_{12}$   $X^2$    Probability
18       4         5          3.83      4200/184756
19       3         7         10.50       160/184756
20       3         6          4.67      2800/184756
21       3         5          1.17     16800/184756
22       3         4          0.00     28000/184756
23       3         3          1.17     16800/184756
24       3         2          4.67      2800/184756
25       3         1         10.50       160/184756
26       2         3          3.83      4200/184756
27       2         4          1.33     15750/184756
28       2         5          1.67     16800/184756
29       2         6          3.33      6300/184756
30       2         7          7.83       600/184756
31       2         8         14.67        15/184756
32       2         2          8.67       420/184756
33       1         5          4.67      2800/184756
34       1         4          5.33      2520/184756
35       1         8         13.33        25/184756
36       1         7          7.83       600/184756
37       1         6          4.67      2800/184756
38       1         3          9.17       336/184756
39       0         6          8.67       420/184756
40       0         5          9.17       336/184756
41       0         7         10.50       160/184756
42       0         4         12.00        70/184756
43       0         8         14.67        15/184756
Note that in this case the exact probabilities provide the same ordering of the tables as the
Pearson $X^2$ values. Table 9 is the observed table, and the EXACT option in PROC FREQ
in SAS computes a two-sided p-value of 0.0642 by adding the probabilities for tables 1,
2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 19, 25, 30, 31, 32, 34, 35, 36, 38, 39, 40, 41, 42, 43. The
fisher.test() function in S-PLUS yields the same p-value by summing the probabilities for
all tables that occur with probability no larger than the probability of the observed table.
This is not necessarily a good way to define a critical region or define a p-value. This set of
tables includes many tables that have fewer treated patients that either improve or stay the
same than in the observed table, so it does not provide an appropriate p-value with respect
to the objective of this study.
One reasonable criterion is that tables provide more evidence than the observed table that
the IFN-B treatment is better if at least 5 of the treated patients show improvement and at
least 9 of the treated patients either improve or stay the same (or no more than one of the
treated patients becomes worse). This includes tables 2, 4, 6, 9, and the p-value is 0.01766.
Another criterion is that the difference between the number of treated patients that improve
and the number of treated patients that get worse should be at least 5 - 1 = 4. This results in a
p-value of 0.0222. Many students failed to clearly describe the criterion used to identify the
possible tables that were included in the evaluation of the p-value.
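The 43 tables can be enumerated directly. The sketch below assumes row totals of 10 each and column totals (6, 8, 6), which is the set of column totals consistent with 43 possible tables and the listed ranges of $Y_{11}$ and $Y_{12}$; it returns each table's first-row counts, Pearson statistic, and exact randomization probability:

```python
from math import comb

def exact_dist(row_tot, col_tot):
    """Enumerate all 2x3 tables with equal row totals (row_tot each)
    and the given column totals. Returns (y11, y12, X^2, probability)
    for each table, where probability is multivariate hypergeometric."""
    r = row_tot
    c1, c2, c3 = col_tot
    denom = comb(2 * r, r)
    exp = [c / 2 for c in col_tot]   # expected cell counts (equal rows)
    out = []
    for y11 in range(min(r, c1) + 1):
        for y12 in range(min(r - y11, c2) + 1):
            y13 = r - y11 - y12
            if y13 > c3:
                continue
            prob = comb(c1, y11) * comb(c2, y12) * comb(c3, y13) / denom
            row1 = [y11, y12, y13]
            row2 = [c1 - y11, c2 - y12, c3 - y13]
            x2 = sum((o - e) ** 2 / e
                     for o, e in zip(row1 + row2, exp + exp))
            out.append((y11, y12, x2, prob))
    return out

tables = exact_dist(10, (6, 8, 6))
```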
Problem 7
(a) Suppose skin cancer patients were randomly sampled with replacement from a population of skin cancer patients (or simple random sampling without replacement could be
used if the population of skin cancer patients is large). Each patient in the sample is
classified according to the stage of their disease and their reaction to DCNB allergen. Then,
the counts in the table would have a multinomial distribution with six categories and
sample size n = 173.
(b) The value of the Pearson $X^2$ test is 15.365 with 2 d.f. and p-value = 0.00046. The
value of the deviance test is $G^2 = 15.445$ with 2 d.f. and p-value = 0.00044.
(c) Both tests in part (b) show strong evidence that the allergic reaction to DCNB is related
to the stages of skin cancer. Stage III skin cancer patients are more likely to have
negative reactions to DCNB exposure than the other skin cancer patients.
(d) The results are the same as in part (b). As shown in class, for a two-way contingency
table, both the multinomial distribution and independent binomial distributions yield
the same m.l.e.'s for the expected counts and the same degrees of freedom for testing the
fit of the independence model. Consequently, both procedures yield the same value of
the test statistic and the same p-value. The columns of the table would correspond to
independent binomial distributions if separate simple random samples of patients were
taken from patients at the three different stages of skin cancer.
Problem 8-11
The objective of this exercise was to show you how easy it is to use simulation methods to
investigate small sample properties of test statistics. Here we concentrated on the Type I
error levels of the Pearson $X^2$ test and the $G^2$ test. The table in Problem 8 satisfies
Cochran's rule. Only one expected count is smaller than 5 and none are smaller than 1.
Cochran's rule is more severely violated as we move through Problems 9, 10, and 11. Results
in Problems 8, 9, and 10 illustrate that the presence of expected counts in the range from
0.5 through 5 tends to inflate the Type I error level of the $G^2$ test, while the Pearson $X^2$
test maintains a Type I error level closer to the nominal .05 level. This is seen in the Q-Q
plots, where quantiles of the null distribution of the $G^2$ statistic tend to be above the 45
degree line. For the table in Problem 11, however, points on the Q-Q plot for the $G^2$
statistic tend to be below the 45-degree line and the Type I error level is smaller than the
nominal .05 level. This illustrates that using the large sample chi-square approximation to
the null distribution of the $G^2$ statistic can produce a conservative test with very little
power when the table contains many very small expected counts. In such cases, the table
will contain many observed zero counts, and cells with zero counts contribute zero to the
value of the $G^2$ statistic. In such cases the null distribution of the Pearson statistic places
more probability on smaller values than indicated by the large sample chi-squared
approximation, but there are also sizeable probabilities of some large positive values of
$X^2$. These patterns would have been more severe in a table with more cells. Overall, the
large sample chi-square approximation to the null distribution of the test statistics provides
more accurate Type I error levels for the Pearson $X^2$ statistic than for the $G^2$ statistic
when there are moderate violations of Cochran's rule. For more severe violations of
Cochran's rule, the large sample chi-squared approximation may not provide the desired
Type I error level for either test statistic.
The code you used also provided "exact" p-values for the $X^2$ and $G^2$ statistics. This was
done by simulating 10,000 tables for which the null hypothesis of independence is true, using
the overall success rate from the observed table to generate four independent binomial counts
with sample sizes corresponding to the column totals in the original table of counts. Is this
a reasonable thing to do? It does not provide the same p-value as Fisher's exact test, where
both the row and column totals are fixed. The Fisher exact test, which is appropriate for
randomized experiments, considers only a small subset of the possible tables we could have
seen in simulating 10,000 tables. Which approach is better in this situation? Most students
failed to address these issues.
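The simulation just described can be sketched as follows: draw independent binomial counts for each column under a common success rate, compute the Pearson statistic for each simulated table, and compare the rejection rate to the nominal .05 level. The column sizes and success probability below are illustrative, not the assignment's values:

```python
import random

def simulate_null(col_totals, p, reps=2000, seed=1):
    """Simulate 2xC tables under independence by drawing an independent
    binomial count for each column (success probability p), and return
    the simulated Pearson X^2 values (C - 1 d.f. under the null)."""
    random.seed(seed)
    n_tot = sum(col_totals)
    stats = []
    for _ in range(reps):
        top = [sum(random.random() < p for _ in range(n)) for n in col_totals]
        p_hat = sum(top) / n_tot            # pooled success rate
        x2 = 0.0
        for n, t in zip(col_totals, top):
            e1, e2 = n * p_hat, n * (1 - p_hat)
            if e1 > 0 and e2 > 0:           # guard against degenerate tables
                x2 += (t - e1) ** 2 / e1 + ((n - t) - e2) ** 2 / e2
        stats.append(x2)
    return stats

stats = simulate_null([25, 25, 25, 25], 0.5)
# Estimated Type I error at the chi-square(3) .05 critical value:
rate = sum(s > 7.815 for s in stats) / len(stats)
```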