STAT 557 ASSIGNMENT 5 Name ________________

Reading Assignment: Finish reading Chapter 6 in Lloyd. Next we will read Chapter 4.

Written Assignment: Due Monday, November 11.

1. The following table shows numbers of colonies of TA98 Salmonella observed on plates treated with five levels of quinoline. Three plates were prepared for each level of quinoline.

log10(dose) of quinoline      Numbers of colonies
(mg per plate)                of TA98 Salmonella
        0                       12    21    17
       1.0                      26    26    26
       1.5                      25    28    35
       2.0                      28    33    34
       2.5                      45    50    47

Let $Y_{ij}$ denote the count for the j-th plate prepared with the i-th level of quinoline. Suppose the $Y_{ij}$’s are independent Poisson random variables with $E(Y_{ij}) = m_i$, where $\log(m_i) = \beta_0 + \beta_1 X$ and X is the log(dose) value for the corresponding level of quinoline.
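The Poisson log-linear model above can be fitted by Newton-Raphson on the log-likelihood $\sum_{i,j} \{ y_{ij} \log(m_i) - m_i - \log(y_{ij}!) \}$. The following is a minimal sketch in plain Python, not the course's prescribed software. It assumes the 15 counts are grouped three per dose level in the order they appear in the table; the names `counts_by_level`, `fit_poisson`, and `mean_ci` are illustrative only, and the interval is a Wald interval built on the log scale.

```python
import math

# ASSUMPTION: the 15 counts are grouped three per dose level, in table order.
counts_by_level = {0.0: [12, 21, 17], 1.0: [26, 26, 26], 1.5: [25, 28, 35],
                   2.0: [28, 33, 34], 2.5: [45, 50, 47]}
x = [level for level, plates in counts_by_level.items() for _ in plates]
y = [count for plates in counts_by_level.values() for count in plates]


def score_and_information(b0, b1):
    """Score vector and Fisher information for log(m) = b0 + b1 * x."""
    m = [math.exp(b0 + b1 * xi) for xi in x]
    score = (sum(yi - mi for yi, mi in zip(y, m)),
             sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, m)))
    info = (sum(m),
            sum(xi * mi for xi, mi in zip(x, m)),
            sum(xi * xi * mi for xi, mi in zip(x, m)))
    return score, info


def fit_poisson(max_iter=100, tol=1e-12):
    """Newton-Raphson for the two-parameter model; returns (b0, b1, covariance)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = score_and_information(b0, b1)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det  # first row of info^{-1} * score
        step1 = (i00 * u1 - i01 * u0) / det  # second row of info^{-1} * score
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    # Estimated covariance matrix: inverse information at the final estimates.
    _, (i00, i01, i11) = score_and_information(b0, b1)
    det = i00 * i11 - i01 * i01
    cov = ((i11 / det, -i01 / det), (-i01 / det, i00 / det))
    return b0, b1, cov


def mean_ci(x0, z=1.96):
    """Approximate 95% Wald interval for the mean at log-dose x0."""
    b0, b1, cov = fit_poisson()
    eta = b0 + b1 * x0
    se = math.sqrt(cov[0][0] + 2 * x0 * cov[0][1] + x0 * x0 * cov[1][1])
    return math.exp(eta - z * se), math.exp(eta + z * se)


if __name__ == "__main__":
    print(fit_poisson())
    print(mean_ci(2.2))
```

Building the interval on the log scale and then exponentiating keeps both limits positive, which a symmetric interval on the mean scale would not guarantee.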

(a) Write down the formula for the log-likelihood function for this model.

(b) Find the maximum likelihood estimates for $\beta_0$ and $\beta_1$. Give an interpretation of these estimated coefficients.

(c) Construct a 95% confidence interval for the mean number of colonies when the log-dose of quinoline is 2.2.

(d) Test the fit of the model against the alternative of independent Poisson counts with a different mean at each level of quinoline. Report a p-value and state your conclusion.

(e) Now fit the model $\log(m_i) = \beta_0 + \beta_1 X$ with $m_i = E(Y_i)$ and where $Y_i$ has a negative binomial distribution. Report estimates of $\beta_0$, $\beta_1$ and the dispersion parameter, and report their standard errors. Does the estimate of the dispersion parameter indicate that there is more variation in the counts than can be accounted for by a Poisson regression model? Explain.

(f) Use the model in Part (e) to construct a 95% confidence interval for the mean number of colonies when the log-dose of quinoline is 2.2. Compare this with your answer to Part (c).

(g) Explore adding higher order polynomial terms to the original Poisson regression model and the model in Part (e). What is the “best” model? Use your estimate of the “best” model to construct a 95% confidence interval for the mean number of colonies when the log-dose of quinoline is 2.2.

2. Vidmar (1972) collected the following data to examine the possible effects of limiting the number of alternatives on decisions made by a jury panel. The following table shows the number of times each alternative was selected in mock murder trials. The same set of alternatives was not given to each jury. Consequently, structural zeros appear in the table for those alternatives that were not available (these are indicated by dashes). In a real trial only two alternatives are usually available: not guilty and the verdict sought by the prosecutor. The alternatives are listed in decreasing order of severity of the corresponding penalty. Assume that decisions were made independently, so the columns represent seven independent multinomials, each with sample size 24.

                              Set of Alternatives Offered
Alternative                 A     B     C     D     E     F     G
First degree murder        11     -     -     2     7     -     2
Second degree murder        -    20     -    22     -    11    15
Manslaughter                -     -    22     -    16    13     5
Not guilty                 13     4     2     0     1     0     2

(a) Fit the quasi-independence model to this two-dimensional table. Report values for

$X^2$ = __________     $G^2$ = __________     d.f. = __________

Does this model fit well? Examine results. State your conclusion.

(b) Vidmar concluded that “under conditions of restricted decision alternatives, the more severe the degree of guilt associated with the least severe guilt alternative the greater were the chances of obtaining a not guilty verdict.”

(i) Do you agree with this conclusion? Explain.

(ii) Is this conclusion necessarily incompatible with the quasi-independence model? Explain.
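One way to fit the quasi-independence model in Problem 2 is iterative proportional fitting restricted to the cells that are not structural zeros. The sketch below is illustrative rather than the course's prescribed method: it assumes plain Python, marks structural zeros with `None`, and uses the hypothetical helper names `fit_quasi_independence` and `fit_statistics`. The degrees of freedom are computed as the number of non-structural cells minus (rows + columns - 1), which assumes the incomplete table cannot be split into separate blocks.

```python
import math

# None marks a structural zero (alternative not offered to that set of juries).
# Rows: first degree, second degree, manslaughter, not guilty. Columns: sets A to G.
table = [
    [11, None, None, 2, 7, None, 2],
    [None, 20, None, 22, None, 11, 15],
    [None, None, 22, None, 16, 13, 5],
    [13, 4, 2, 0, 1, 0, 2],
]


def fit_quasi_independence(table, n_iter=2000):
    """Iterative proportional fitting over the non-structural cells."""
    rows, cols = len(table), len(table[0])
    obs = [[0 if v is None else v for v in row] for row in table]
    # Structural zeros start at 0 and stay 0; all other cells start at 1.
    fit = [[0.0 if v is None else 1.0 for v in row] for row in table]
    for _ in range(n_iter):
        for i in range(rows):  # match the observed row totals
            scale = sum(obs[i]) / sum(fit[i])
            fit[i] = [v * scale for v in fit[i]]
        for j in range(cols):  # match the observed column totals
            target = sum(obs[i][j] for i in range(rows))
            current = sum(fit[i][j] for i in range(rows))
            for i in range(rows):
                fit[i][j] *= target / current
    return fit


def fit_statistics(table, fit):
    """Pearson X^2, likelihood ratio G^2, and degrees of freedom."""
    x2 = g2 = 0.0
    n_cells = 0
    for obs_row, fit_row in zip(table, fit):
        for o, e in zip(obs_row, fit_row):
            if o is None:
                continue  # structural zeros contribute nothing
            n_cells += 1
            x2 += (o - e) ** 2 / e
            if o > 0:  # a sampling zero contributes 0 to G^2
                g2 += 2 * o * math.log(o / e)
    df = n_cells - (len(table) + len(table[0]) - 1)
    return x2, g2, df


if __name__ == "__main__":
    fitted = fit_quasi_independence(table)
    print(fit_statistics(table, fitted))
```

Keeping the structural zeros at exactly 0 throughout the iterations is what distinguishes this from an ordinary independence fit: only the available alternatives share in each row and column total.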

3. The data below give a cross-classification of husband’s father’s occupational class versus wife’s father’s occupational class for a sample of 15,084 married couples (from the 1962 Current Population Survey). The data were collected in this form to determine whether American women tend to marry into families of lower, similar, or higher occupational status.

                            Wife’s Father’s Class
Husband’s          Highest                        Lowest
Father’s Class        I      II     III     IV      V     TOTALS
I                    341    410    317    323    202       1593
II                   396    600    458    548    307       2309
III                  303    568    882   1004    523       3280
IV                   317    625    972   1777    860       4551
V                    158    306    435    595   1857       3351
TOTALS              1515   2509   3064   4247   3749      15084

(a) Fit the symmetry model to these data. Report values of the likelihood ratio and Wald goodness-of-fit statistics and the d.f. Does the model fit well? Examine the residuals. State your conclusions.

(b) Perform a test of the hypothesis that as many women marry up (into a higher class) as marry down.

(c) Perform both the Wald and likelihood ratio tests for marginal homogeneity. Report values of the statistics and the d.f. State your conclusions.

(d) Ignore the diagonal counts and fit the quasi-independence model. Report the values of the goodness-of-fit statistics and the d.f. Examine the residuals. State your conclusions.

(e) If none of the above models are suitable, find a model that explains the relationships between occupational classes for the husband’s father and wife’s father.
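For the symmetry model in Problem 3, the fitted count for each off-diagonal pair is the average $(n_{ij} + n_{ji})/2$, so the goodness-of-fit statistics have closed forms. The sketch below is illustrative only and uses plain Python. Note that it computes the Pearson and likelihood ratio statistics; the Wald statistic requested in the assignment is not implemented here. The function names are hypothetical, and the marry-up comparison uses a simple normal approximation that conditions on the number of off-diagonal couples.

```python
import math

# Rows: husband's father's class I to V; columns: wife's father's class I to V.
counts = [
    [341, 410, 317, 323, 202],
    [396, 600, 458, 548, 307],
    [303, 568, 882, 1004, 523],
    [317, 625, 972, 1777, 860],
    [158, 306, 435, 595, 1857],
]


def symmetry_statistics(n):
    """Pearson X^2, likelihood ratio G^2, and d.f. for the symmetry model."""
    k = len(n)
    x2 = g2 = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pair = n[i][j] + n[j][i]  # fitted value in each cell is pair / 2
            x2 += (n[i][j] - n[j][i]) ** 2 / pair
            for obs in (n[i][j], n[j][i]):
                g2 += 2 * obs * math.log(2 * obs / pair)
    return x2, g2, k * (k - 1) // 2


def up_versus_down(n):
    """Counts above and below the diagonal and a z statistic for equality."""
    k = len(n)
    # Above the diagonal the husband's father is in a higher class (wife marries up).
    up = sum(n[i][j] for i in range(k) for j in range(i + 1, k))
    down = sum(n[i][j] for i in range(k) for j in range(i))
    z = (up - down) / math.sqrt(up + down)
    return up, down, z


if __name__ == "__main__":
    print(symmetry_statistics(counts))
    print(up_versus_down(counts))
```

The diagonal cells drop out of both statistics because the symmetry model fits them exactly.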

4. Information on the prevalence of the birth defect known as spina bifida in upstate New York between 1969 and 1974, inclusive, was obtained from three different sources: birth certificates (B), death certificates (D), and Medical Rehabilitation Program files (M). For each source, listings from computer tapes of cases were inspected to confirm that the case had actually been diagnosed as spina bifida. To be included in this study, a case from either the D or M source had to have been reported within three years after the year of birth (Hook, et al., 1980).

The observed counts are shown below:

      Reporting Source         Number of
   B        D        M           Cases
  Yes      Yes      Yes            12
  Yes      Yes      No            142
  Yes      No       Yes           112
  Yes      No       No            247
  No       Yes      Yes             4
  No       Yes      No             49
  No       No       Yes            60
  No       No       No              ?

These data are posted in the file spina.bifida.txt. Our goal is to estimate $m_{222}$, the number of spina bifida cases in 1969-1974 in upstate New York that were not reported by any of these sources.

(a) Fit the model

$\log(m_{ijk}) = \lambda + \lambda^{B}_{i} + \lambda^{D}_{j} + \lambda^{M}_{k}$

This would be an appropriate model if the probability that a case is reported by one source is independent of the probability that the case is reported by either of the other two sources.

(i) Obtain an estimate of $m_{222}$ and its standard error.

(ii) Is this model appropriate for these data? Support your answer with a goodness-of-fit test.

(b) Fit the model

$\log(m_{ijk}) = \lambda + \lambda^{B}_{i} + \lambda^{D}_{j} + \lambda^{M}_{k} + \lambda^{BD}_{ij} + \lambda^{BM}_{ik} + \lambda^{DM}_{jk}$

For this model the conditional odds that one source reports a case may depend on whether or not the other sources report the case. For example, the conditional log-odds that a case is reported on a birth certificate is

$\log\left(\frac{m_{1jk}}{m_{2jk}}\right) = (\lambda^{B}_{1} - \lambda^{B}_{2}) + (\lambda^{BD}_{1j} - \lambda^{BD}_{2j}) + (\lambda^{BM}_{1k} - \lambda^{BM}_{2k})$

(i) Obtain an estimate of $m_{222}$ from this model and report the standard error of the estimate.

(ii) Is this model appropriate for these data? Explain.

(c) Find the most parsimonious model that provides an adequate fit to these data.

(i) Report a formula for the model you select.

(ii) Using your model in (i), estimate $m_{222}$ and report its standard error. Compare the standard error of this estimator to the standard errors of the estimators in Parts (a) and (b).

(d) Given what you learned in Parts (a) through (c), estimate the rate of spina bifida cases per 1,000 live births, and report a 95% confidence interval. There were 863,143 live births in upstate New York from 1969 through 1974.
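For the two models written out in Parts (a) and (b), point estimates of the unobserved cell can be sketched without a model-fitting package. Under the independence model, the likelihood equations reduce to $1 - n/N = \prod_s (1 - n_s/N)$, where $n$ is the number of distinct observed cases, $n_s$ is the total reported by source $s$, and $\hat{m}_{222} = \hat{N} - n$. The model with all two-factor terms fits the seven observed cells exactly and gives $\hat{m}_{222} = m_{111} m_{122} m_{212} m_{221} / (m_{112} m_{121} m_{211})$. The Python below is a rough illustration under those assumptions; the function names are hypothetical, and standard errors and confidence intervals are not computed.

```python
# Observed counts keyed by (B, D, M), with 1 = reported and 2 = not reported.
counts = {
    (1, 1, 1): 12, (1, 1, 2): 142, (1, 2, 1): 112, (1, 2, 2): 247,
    (2, 1, 1): 4, (2, 1, 2): 49, (2, 2, 1): 60,
}
LIVE_BIRTHS = 863143  # live births in upstate New York, 1969 through 1974


def missing_cell_independence(counts):
    """Solve 1 - n/N = prod(1 - n_s/N) for N by bisection; return N - n."""
    n = sum(counts.values())
    source_totals = [sum(c for key, c in counts.items() if key[s] == 1)
                     for s in range(3)]

    def gap(total):
        product = 1.0
        for t in source_totals:
            product *= 1 - t / total
        return product - (1 - n / total)

    lo, hi = float(n), 1e7  # gap(lo) > 0 and gap(hi) < 0 for these data
    for _ in range(200):
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2 - n


def missing_cell_no_three_factor(c):
    """Closed-form estimate under the model with all two-factor terms."""
    numerator = c[1, 1, 1] * c[1, 2, 2] * c[2, 1, 2] * c[2, 2, 1]
    denominator = c[1, 1, 2] * c[1, 2, 1] * c[2, 1, 1]
    return numerator / denominator


def rate_per_thousand(m222):
    """Estimated cases per 1,000 live births, given an estimate of m_222."""
    return 1000 * (sum(counts.values()) + m222) / LIVE_BIRTHS


if __name__ == "__main__":
    for estimate in (missing_cell_independence(counts),
                     missing_cell_no_three_factor(counts)):
        print(round(estimate, 2), round(rate_per_thousand(estimate), 4))
```

Bisection is used for the independence model only because it is easy to check by hand; any root finder applied to the same equation should give the same answer.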
