STAT 557 ASSIGNMENT 5 Name ________________
Reading Assignment: Finish reading Chapter 6 in Lloyd. Next we will read Chapter 4.
Written Assignment: Due Monday, November 11.
1. The following table shows numbers of colonies of TA98 Salmonella observed on plates treated with five levels of quinoline. Three plates were prepared for each level of quinoline.

log10(dose) of quinoline (mg per plate)      0     1.0    1.5    2.0    2.5
Numbers of colonies of TA98 Salmonella      12     26     25     28     45
                                            21     26     28     33     50
                                            17     26     35     34     47

Let $Y_{ij}$ denote the count for the j-th plate prepared with the i-th level of quinoline. Suppose the $Y_{ij}$’s are independent Poisson random variables with $E(Y_{ij}) = m_i$, where $\log(m_i) = \beta_0 + \beta_1 X$ and X is the log(dose) value for the corresponding level of quinoline.
(a) Write down the formula for the log-likelihood function for this model.
(b) Find the maximum likelihood estimates for $\beta_0$ and $\beta_1$. Give an interpretation of these estimated coefficients.
(c) Construct a 95% confidence interval for the mean number of colonies when the log-dose of quinoline is 2.2.
(d) Test the fit of the model against the alternative of independent Poisson counts with a different mean at each level of quinoline. Report a p-value and state your conclusion.
(e) Now fit the model $\log(m_i) = \beta_0 + \beta_1 X$ with $m_i = E(Y_i)$ and where $Y_i$ has a negative binomial distribution. Report estimates of $\beta_0$, $\beta_1$ and the dispersion parameter, and report their standard errors. Does the estimate of the dispersion parameter indicate that there is more variation in the counts than can be accounted for by a Poisson regression model? Explain.
(f) Use the model in Part (e) to construct a 95% confidence interval for the mean number of colonies when the log-dose of quinoline is 2.2. Compare this with your answer to Part (c).
(g) Explore adding higher order polynomial terms to the original Poisson regression model and the model in Part (e). What is the “best” model? Use your estimate of the “best” model to construct a 95% confidence interval for the mean number of colonies when the log-dose of quinoline is 2.2.
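
As a rough illustration of how the Poisson regression in Parts (b) and (c) could be set up by hand, the sketch below runs Newton-Raphson on the log-likelihood and then forms a Wald-type interval on the log scale. It assumes the colony counts are read as three plates per log-dose level (12, 21, 17 at log-dose 0, and so on); the function names are hypothetical, and a statistical package would normally be used instead.

```python
import math

# Assumed reading of the table: three plates per log10(dose) level.
log_dose = [0.0, 1.0, 1.5, 2.0, 2.5]
counts = [[12, 21, 17], [26, 26, 26], [25, 28, 35], [28, 33, 34], [45, 50, 47]]
x = [d for d, plates in zip(log_dose, counts) for _ in plates]
y = [c for plates in counts for c in plates]


def fit_poisson_loglinear(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for log(m) = b0 + b1*x. Returns (b0, b1, cov)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    cov = None
    for _ in range(max_iter):
        m = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivatives of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, m))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, m))
        # Fisher information matrix (2 x 2) and its inverse.
        i00 = sum(m)
        i01 = sum(xi * mi for xi, mi in zip(x, m))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, m))
        det = i00 * i11 - i01 * i01
        cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
        d0 = cov[0][0] * u0 + cov[0][1] * u1
        d1 = cov[1][0] * u0 + cov[1][1] * u1
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # cov is the inverse information from the last iteration, which is
    # essentially the value at the converged estimates.
    return b0, b1, cov


def mean_ci(b0, b1, cov, x0, z=1.959964):
    """Estimated mean at x0 with a Wald interval built on the log scale."""
    eta = b0 + b1 * x0
    var = cov[0][0] + 2.0 * x0 * cov[0][1] + x0 * x0 * cov[1][1]
    half = z * math.sqrt(var)
    return math.exp(eta), math.exp(eta - half), math.exp(eta + half)


b0, b1, cov = fit_poisson_loglinear(x, y)
est, lower, upper = mean_ci(b0, b1, cov, 2.2)
print(b0, b1, est, lower, upper)
```

Building the interval on the log scale and exponentiating keeps both limits positive, which is one common choice; other interval constructions are possible.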
2. Vidmar (1972) collected the following data to examine the possible effects of limiting the number of alternatives on decisions made by a jury panel. The following table shows the number of times each alternative was selected in mock murder trials. The same set of alternatives was not given to each jury. Consequently, structural zeros appear in the table for those alternatives which were not available (these are indicated by dashes). In a real trial only two alternatives are usually available: not guilty and the verdict sought by the prosecutor. The alternatives are listed in decreasing order of severity of the corresponding penalty. Assume that decisions were made independently, so the columns represent seven independent multinomials, each with sample size 24.

                              Set of Alternatives Offered
Alternative                A     B     C     D     E     F     G
First degree murder       11     -     -     2     7     -     2
Second degree murder       -    20     -    22     -    11    15
Manslaughter               -     -    22     -    16    13     5
Not guilty                13     4     2     0     1     0     2

(a) Fit the quasi-independence model to this two-dimensional table. Report values for
    $X^2$ = __________    $G^2$ = __________    d.f. = __________
Does this model fit well? Examine results. State your conclusion.
(b) Vidmar concluded that “under conditions of restricted decision alternatives, the more severe the degree of guilt associated with the least severe guilt alternative the greater were the chances of obtaining a not guilty verdict.”
(i) Do you agree with this conclusion? Explain.
(ii) Is this conclusion necessarily incompatible with the quasi-independence model? Explain.
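
One way the quasi-independence fit in Part (a) might be computed is iterative proportional fitting restricted to the cells that are not structural zeros. The sketch below is only an outline under that assumption: the function names are hypothetical, and the degrees of freedom are counted as (non-structural cells) minus (rows + columns - 1), which presumes the incomplete table cannot be split into separate sub-tables.

```python
import math

# Rows: first degree, second degree, manslaughter, not guilty.
# Columns: sets A..G. None marks a structural zero (alternative not offered).
table = [
    [11, None, None, 2, 7, None, 2],
    [None, 20, None, 22, None, 11, 15],
    [None, None, 22, None, 16, 13, 5],
    [13, 4, 2, 0, 1, 0, 2],
]


def fit_quasi_independence(table, tol=1e-8, max_iter=10000):
    """Iterative proportional fitting; structural zeros stay at zero."""
    n_rows, n_cols = len(table), len(table[0])
    obs = [[0 if v is None else v for v in row] for row in table]
    fit = [[0.0 if v is None else 1.0 for v in row] for row in table]
    row_tot = [sum(row) for row in obs]
    col_tot = [sum(obs[i][j] for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(max_iter):
        for i in range(n_rows):          # match row totals
            scale = row_tot[i] / sum(fit[i])
            fit[i] = [v * scale for v in fit[i]]
        for j in range(n_cols):          # match column totals
            scale = col_tot[j] / sum(fit[i][j] for i in range(n_rows))
            for i in range(n_rows):
                fit[i][j] *= scale
        gap = max(abs(sum(fit[i]) - row_tot[i]) for i in range(n_rows))
        if gap < tol:
            break
    return fit


def fit_statistics(table, fit):
    """Pearson X^2, likelihood ratio G^2, and d.f. over non-structural cells."""
    x2 = g2 = 0.0
    n_cells = 0
    for obs_row, fit_row in zip(table, fit):
        for o, e in zip(obs_row, fit_row):
            if o is None:
                continue
            n_cells += 1
            x2 += (o - e) ** 2 / e
            if o > 0:                    # a sampling zero contributes 0 to G^2
                g2 += 2.0 * o * math.log(o / e)
    df = n_cells - (len(table) + len(table[0]) - 1)
    return x2, g2, df


fit = fit_quasi_independence(table)
x2, g2, df = fit_statistics(table, fit)
print(x2, g2, df)
```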
3. The table below gives a cross-classification of husband’s father’s occupational class versus wife’s father’s occupational class for a sample of 15,084 married couples (from the 1962 Current Population Survey). The data were collected in this form to determine if American women tend to marry into families of lower, similar, or higher occupational status.

                                     Wife’s Father’s Class
Husband’s Father’s Class         I      II     III      IV       V    TOTALS
(Highest)  I                   341     396     303     317     158      1515
           II                  410     600     568     625     306      2509
           III                 317     458     882     972     435      3064
           IV                  323     548    1004    1777     595      4247
(Lowest)   V                   202     307     523     860    1857      3749
TOTALS                        1593    2309    3280    4551    3351     15084

(a) Fit the symmetry model to these data. Report values of the likelihood ratio and Wald goodness-of-fit statistics and the d.f. Does the model fit well? Examine the residuals. State your conclusions.
(b) Perform a test of the hypothesis that as many women marry up (into a higher class) as marry down.
(c) Perform both the Wald and likelihood ratio tests for marginal homogeneity. Report values of the statistics and the d.f. State your conclusions.
(d) Ignore the diagonal counts and fit the quasi-independence model. Report the values of the goodness-of-fit statistics and the d.f. Examine the residuals. State your conclusions.
(e) If none of the above models are suitable, find a model that explains the relationships between occupational classes for the husband’s father and wife’s father.
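
For Parts (a) and (b), the symmetry model has closed-form fitted values, (n_ij + n_ji)/2, so the fit statistics can be computed directly. The sketch below computes the likelihood ratio statistic and the Pearson (Bowker) statistic, not the Wald statistic, together with a normal-approximation test comparing the counts above and below the diagonal. The function names are hypothetical, and reading cells above the diagonal as "wife marries up" is an interpretation of the table layout (rows are husband’s father’s class).

```python
import math

# Rows: husband's father's class I..V; columns: wife's father's class I..V.
n = [
    [341, 396, 303, 317, 158],
    [410, 600, 568, 625, 306],
    [317, 458, 882, 972, 435],
    [323, 548, 1004, 1777, 595],
    [202, 307, 523, 860, 1857],
]


def symmetry_statistics(n):
    """Pearson X^2, likelihood ratio G^2, and d.f. for the symmetry model."""
    k = len(n)
    x2 = g2 = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            a, b = n[i][j], n[j][i]
            fitted = (a + b) / 2.0       # fitted count under symmetry
            x2 += (a - b) ** 2 / (a + b)
            g2 += 2.0 * (a * math.log(a / fitted) + b * math.log(b / fitted))
    return x2, g2, k * (k - 1) // 2


def up_down_test(n):
    """Compare total counts above and below the diagonal (two-sided z test)."""
    k = len(n)
    up = sum(n[i][j] for i in range(k) for j in range(i + 1, k))
    down = sum(n[j][i] for i in range(k) for j in range(i + 1, k))
    # Conditional on up + down, up is Binomial(up + down, 1/2) under H0.
    z = (up - down) / math.sqrt(up + down)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return up, down, z, p_value


x2, g2, df = symmetry_statistics(n)
up, down, z, p_value = up_down_test(n)
print(x2, g2, df, up, down, z, p_value)
```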
4. Information on the prevalence of the birth defect known as spina bifida in upstate New York between 1969 and 1974, inclusive, was obtained from three different sources: birth certificates (B), death certificates (D), and Medical Rehabilitation Program files (M). For each source, listings from computer tapes of cases were inspected to confirm that the case had actually been diagnosed as spina bifida. To be included in this study, a case from either the D or M source had to have been reported within three years after the year of birth (Hook, et al., 1980).
The observed counts are shown below:

    Reporting Source
 B       D       M       Number of Cases
Yes     Yes     Yes            12
Yes     Yes     No            142
Yes     No      Yes           112
Yes     No      No            247
No      Yes     Yes             4
No      Yes     No             49
No      No      Yes            60
No      No      No              ?

These data are posted in the file spina.bifida.txt. Our goal is to estimate $m_{222}$, the number of spina bifida cases in 1969-1974 in upstate New York that were not reported by any of these sources.
(a) Fit the model
    $\log(m_{ijk}) = \lambda + \lambda^B_i + \lambda^D_j + \lambda^M_k$
This would be an appropriate model if the probability that a case is reported by one source is independent of the probability that the case is reported by either of the other two sources.
(i) Obtain an estimate of $m_{222}$ and its standard error.
(ii) Is this model appropriate for these data? Support your answer with a goodness of fit test.
(b) Fit the model
    $\log(m_{ijk}) = \lambda + \lambda^B_i + \lambda^D_j + \lambda^M_k + \lambda^{BD}_{ij} + \lambda^{BM}_{ik} + \lambda^{DM}_{jk}$
For this model the conditional odds that one source reports a case may depend on whether or not the other sources report the case. For example, the conditional log-odds that a case is reported on a birth certificate is
    $\log(m_{1jk} / m_{2jk}) = (\lambda^B_1 - \lambda^B_2) + (\lambda^{BD}_{1j} - \lambda^{BD}_{2j}) + (\lambda^{BM}_{1k} - \lambda^{BM}_{2k})$
(i) Obtain an estimate of $m_{222}$ from this model and report the standard error of the estimate.
(ii) Is this model appropriate for these data? Explain.
(c) Find the most parsimonious model that provides an adequate fit to these data.
(i) Report a formula for the model you select.
(ii) Using your model in (i), estimate $m_{222}$ and report its standard error. Compare the standard error of this estimator to the standard errors of the estimators in Parts (a) and (b).
(d) Given what you learned in Parts (a) – (c), estimate the rate of spina bifida cases per 1,000 live births, and report a 95% confidence interval. There were 863,143 live births in upstate New York from 1969 through 1974.
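
As an illustration for the model in Part (b) only: that model has seven parameters for the seven observed cells, so it reproduces the observed counts, and setting the three-factor interaction to zero determines the missing cell in closed form. The sketch below uses that identity, with a delta-method standard error that treats the seven counts as independent Poisson variables (an assumption). The function names are hypothetical, and the rate calculation simply plugs in whichever estimate of $m_{222}$ is supplied; it does not settle which model should be used in Part (d).

```python
import math

# Keys are (B, D, M) with 1 = reported ("Yes") and 2 = not reported ("No").
n = {
    (1, 1, 1): 12, (1, 1, 2): 142, (1, 2, 1): 112, (1, 2, 2): 247,
    (2, 1, 1): 4, (2, 1, 2): 49, (2, 2, 1): 60,
}


def estimate_m222_no_three_way(n):
    """Missing-cell estimate when the three-factor interaction is zero.

    With no three-factor interaction,
    m111*m122*m212*m221 = m112*m121*m211*m222, so m222 follows from the
    seven observed cells.
    """
    num = n[1, 1, 1] * n[1, 2, 2] * n[2, 1, 2] * n[2, 2, 1]
    den = n[1, 1, 2] * n[1, 2, 1] * n[2, 1, 1]
    m222 = num / den
    # Delta method on the log scale: Var(log m222) is about sum(1 / n_ijk),
    # assuming independent Poisson counts.
    se = m222 * math.sqrt(sum(1.0 / v for v in n.values()))
    return m222, se


def rate_per_1000(n, m222, live_births=863143):
    """Estimated cases per 1,000 live births, given an estimate of m222."""
    return 1000.0 * (sum(n.values()) + m222) / live_births


m222, se = estimate_m222_no_three_way(n)
print(m222, se, rate_per_1000(n, m222))
```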