HW7F02.doc
STAT 557
ASSIGNMENT 7
FALL 2002
Reading Assignment: Lloyd, chapter 5, section 6.2
Written Assignment: Due Wednesday, December 11, in class
Final Exam: Monday, December 16, 9:45-11:45 am, Carver 101

1.
Between 1977 and 1980 the National Toxicology Program (NTP) of the US Department of Health
and Human Services conducted a toxicology and carcinogenesis bioassay of a polybrominated
biphenyl mixture (PBB), a flame retardant known as Firemaster F-1. As part of the study, Fischer
344 rats were given 125 oral doses of PBB over a period of six months beginning at about seven
weeks after birth. Both male and female rats were used. Occurrence or non-occurrence of a
nonlethal lesion, bile duct hyperplasia, was recorded for each of 319 rats for which pathology
reports were available. The data are posted in the file
bile.dat
with one line for each rat. There are seven numbers on each line. The first is the rat
identification number. This is followed by values for sex, dose level of PBB, initial weight of the
rat (in grams), cage tier, age (in weeks) when the rat died or was sacrificed and examined, and the
response variable. Sex is coded as 1 = female and 0 = male. The response is coded as 1 =
hyperplasia present and 0 = hyperplasia absent. The six dose levels were 0, 0.1, 0.3, 1.0, 3.0, and
10.0. The cage tier variable has values 1, 2, 3, 4, 5 corresponding to the tier in which the rat’s
cage was housed during the six-month experiment, with 1 = top, …, 5 = bottom. The SAS code
stored in the file bile.sas can be used to read the data file and answer some of the following
questions. S-PLUS code for constructing a binary tree is posted in the file bile.ssc.
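
For readers working outside SAS or S-PLUS, the record layout described above can be sketched in Python. This is only an illustration: the names `RatRecord` and `parse_bile_line` are hypothetical, whitespace-separated fields are an assumption about bile.dat, and the example line is made up rather than taken from the data file.

```python
from typing import NamedTuple


class RatRecord(NamedTuple):
    """One line of bile.dat, in the field order given in the problem statement."""
    rat_id: int
    sex: int            # 1 = female, 0 = male
    dose: float         # one of 0, 0.1, 0.3, 1.0, 3.0, 10.0
    init_weight: float  # grams
    tier: int           # 1 = top, ..., 5 = bottom
    age: float          # weeks at death or sacrifice
    response: int       # 1 = hyperplasia present, 0 = absent


def parse_bile_line(line: str) -> RatRecord:
    """Parse one whitespace-separated record (the separator is an assumption)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    values = [float(f) for f in fields]
    return RatRecord(int(values[0]), int(values[1]), values[2], values[3],
                     int(values[4]), values[5], int(values[6]))


# Hypothetical line for illustration only; it is not a record from bile.dat.
example = parse_bile_line("101 1 0.3 182 2 33 0")
```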
A.
First fit the logistic regression model

$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 Z_{1i} + \beta_2 Z_{2i} + \beta_3 Z_{3i} + \beta_4 Z_{4i} + \beta_5 Z_{5i} + \beta_6 Z_{6i} + \beta_7 Z_{7i} + \beta_8 Z_{8i}$$

where Z1, Z2, Z3, Z4 are dummy variables that distinguish the five cage tier levels, Z5
denotes sex of the rat, Z6 is initial weight, Z7 is age at examination, Z8 is the dose, and π_i
denotes the probability of bile duct hyperplasia. According to this model, which of the
variables appear to be associated with prevalence of bile duct hyperplasia?
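
The structure of the linear predictor in part (A) can be sketched as follows. This is a minimal illustration, not the posted SAS code: the function names are hypothetical, and treating tier 5 (bottom) as the reference level for Z1 to Z4 is an assumption, since the assignment does not say which tier is the baseline.

```python
import math
from typing import List, Sequence


def tier_dummies(tier: int) -> List[int]:
    """Z1..Z4 for tiers 1..4; tier 5 is the ASSUMED reference level."""
    if tier not in (1, 2, 3, 4, 5):
        raise ValueError("cage tier must be 1, 2, 3, 4 or 5")
    return [1 if tier == k else 0 for k in (1, 2, 3, 4)]


def design_row(tier: int, sex: int, init_weight: float,
               age: float, dose: float) -> List[float]:
    """Intercept, Z1..Z4 (tier), Z5 (sex), Z6 (weight), Z7 (age), Z8 (dose)."""
    return [1.0, *tier_dummies(tier), sex, init_weight, age, dose]


def fitted_probability(beta: Sequence[float], row: Sequence[float]) -> float:
    """Invert the logit: pi = 1 / (1 + exp(-(beta0 + beta1*Z1 + ... + beta8*Z8)))."""
    eta = sum(b * z for b, z in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))
```

With nine coefficients (β0 through β8) supplied from a fitted model, `fitted_probability` returns the estimated prevalence for a rat with the given covariates.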
B.
Use the centroid clustering method to form 20 clusters and compute a local mean deviance
plot. Submit this plot. What does this plot suggest?
C.
Construct partial residual plots for Z6, Z7 and Z8. Pass smooth curves through these plots
(you do not have to submit these plots, just look at them on your computer screen). What do
these plots suggest?
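
One common definition of a partial residual for a logistic regression adds the working residual to the fitted contribution of the covariate of interest. The sketch below assumes that definition; the posted SAS and S-PLUS code may compute the quantity differently, and the function name is hypothetical.

```python
from typing import List, Sequence


def partial_residuals(y: Sequence[int], p_hat: Sequence[float],
                      x_j: Sequence[float], beta_j: float) -> List[float]:
    """Partial residuals for covariate j:
    beta_j * x_ij + (y_i - p_i) / (p_i * (1 - p_i)),
    where p_i is the fitted probability for case i."""
    return [beta_j * x + (yi - pi) / (pi * (1.0 - pi))
            for yi, pi, x in zip(y, p_hat, x_j)]
```

Plotting these values against the covariate and smoothing the scatter gives a rough check on whether the covariate enters the logit linearly.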
D.
How would you interpret the estimate for β5 in the model in part (A)?
E.
How would you interpret the estimate of β8 in the model in part (A)?
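
For parts (D) and (E), it may help to recall that exponentiating a logistic regression coefficient gives an odds ratio. The sketch below uses hypothetical function names and does not use any estimate from these data.

```python
import math
from typing import Tuple


def odds_ratio(beta: float, delta: float = 1.0) -> float:
    """Multiplicative change in the odds when the covariate increases by delta,
    with all other covariates in the model held fixed."""
    return math.exp(beta * delta)


def wald_interval(beta: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval for the odds ratio, from a Wald interval for beta."""
    return (math.exp(beta - z * se), math.exp(beta + z * se))
```

Because sex is coded 1 = female and 0 = male, `odds_ratio(b5)` compares the odds of hyperplasia for females to the odds for males with the same tier, weight, age, and dose, while `odds_ratio(b8, delta)` describes an increase of `delta` dose units. This reading applies to the main-effects model in part (A); once interactions involving Z5 or Z8 are added in part (F), a single coefficient no longer summarizes the effect.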
F.
Try adding interaction terms Z5Z6, Z5Z7, Z5Z8, Z6Z7, Z6Z8, Z7Z8 to the model in part (A).
Search for a good model.
G.
For the model you selected in part (F), examine diagnostics on leverage and influence of
individual cases on estimates of model parameters to determine if there are highly influential
cases or outliers. State your conclusions.
H. Do whatever additional analyses of these data you think would be useful to select a final
model (or models). Report a formula for your model with estimates of parameter values
substituted into the formula. Report standard errors for parameter estimates in parentheses
beneath the parameter estimates. Write a short paragraph describing your conclusions about
the effects of cage tiers, sex, age, initial weight, and level of exposure to PBB on prevalence
of bile duct hyperplasia.
2.
Milk production can be enhanced by administering a recombinantly derived version of a naturally
occurring hormone called bST to cows. Before use of bST could be approved by regulatory
agencies in the US and Europe, studies had to be done to examine the effectiveness and safety of
the treatment. One study examined the possible effects of the use of bST on the incidence of
mastitis in dairy cows. Mastitis is an inflammation of the mammary gland that is commonly
treated with antibiotics. Milk from cows treated with antibiotics cannot be sold for human
consumption.
This study was performed with cows at 11 different locations, in six different regions of the US
and in five different countries in Europe. Cows in the US tend to have higher milk production
and higher mastitis rates than cows in Europe. Cows that produce more milk tend to have higher
rates of mastitis. Farming practices and environmental factors also vary across sites.
The parity of the cow may also affect the chance that mastitis occurs. To begin a period of milk
production, a cow must conceive and give birth to a calf. A cow that has given birth to only one
calf and is in her first lactation period has a parity value of one. A cow that has given birth five
times and is in her fifth lactation period has a parity value of five. Mastitis incidence tends to be
higher for cows in their first lactation period. Mastitis incidence tends to be lower for cows in
later lactation periods and it may be nearly the same for cows of parity two or higher. Treatment
with bST could have a more dramatic effect on mastitis rates in parity-one cows than in older cows,
because of the increased stress resulting from increased milk production.
The time at which bST is administered could also affect mastitis rates. In this study, lactation
periods were divided into trimesters of 50-99 days, 100-140 days and later than 140 days after the
cow gives birth. When it was used, the bST treatment was given in either the first or the second
trimester.
The data are posted in the file mastitis.dat. There is one line for each cow in the study. There
are seven numbers on each line corresponding to the following variables:
Variable                              Coding
Mastitis occurrence                   0 = no
                                      1 = yes
Treatment                             0 = control, no bST treatment
                                      1 = bST treatment
Peak milk production                  kg per day
Parity                                1 = first lactation period
                                      2 = second lactation period or beyond
Trimester when bST is administered    1 = 0-99 days
                                      2 = 100-140 days
                                      3 = later than 140 days
                                      0 = control cow, no bST used
Site                                  coded 1 through 11, with sites
                                      5, 6, 7, 8, 9, 10 in the US
Analyze these data to determine how the incidence of mastitis is affected by treatment with bST
and whether the trimester in which bST is administered has any effect. You will want to make
allowance for parity and site. SAS code for logistic regression is posted in the file mastitis.sas
and SAS code for constructing a CHAID analysis is posted in the file mastchaid.sas. S-PLUS
code for logistic regression is posted in the file mastitis.ssc and S-PLUS code for constructing
binary trees is posted in the file masttree.ssc.
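
Because control cows are coded with trimester 0, the treatment and trimester variables overlap: a cow is treated exactly when its trimester code is nonzero. One possible way to set up indicator variables that respects this is sketched below. The function and variable names are hypothetical, and the choice of trimester 1 as the reference level among treated cows is an assumption, not part of the posted code.

```python
from typing import Dict

US_SITES = {5, 6, 7, 8, 9, 10}  # sites in the US, per the variable list


def mastitis_covariates(treatment: int, parity: int,
                        trimester: int, site: int) -> Dict[str, int]:
    """Indicator coding for one cow; raises if treatment and trimester disagree."""
    if trimester not in (0, 1, 2, 3):
        raise ValueError("trimester must be 0, 1, 2 or 3")
    if (treatment == 0) != (trimester == 0):
        raise ValueError("controls should have trimester 0, treated cows 1-3")
    return {
        "bst": treatment,                          # bST effect at trimester 1
        "bst_tri2": 1 if trimester == 2 else 0,    # trimester 2 vs trimester 1
        "bst_tri3": 1 if trimester == 3 else 0,    # trimester 3 vs trimester 1
        "first_parity": 1 if parity == 1 else 0,
        "us_site": 1 if site in US_SITES else 0,
    }
```

Under this coding, asking whether the timing of bST matters amounts to testing whether the coefficients on `bst_tri2` and `bst_tri3` are both zero. Site could instead enter as a set of ten indicators if a US/Europe contrast turns out to be too coarse.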
Write a brief report of your findings. This report should consist of three parts:
i)
Briefly describe the analyses you performed on these data. Explain how you selected a
model to describe the relationships between mastitis incidence and the other variables in this
study.
ii)
Report the formula for the model that you think best describes the relationships between
mastitis incidence and the other variables in this study. Examine diagnostic statistics and
give some justification for your belief in this model.
iii) State your conclusions in words that an animal scientist or regulatory official could
understand.
3. Table 6.1 on Page 301 in Lloyd’s book gives the following data on the relationship of age and
marital status of Danes. An individual is classified as divorced if he or she had been divorced at
any time prior to the survey, regardless of whether or not that individual is currently remarried.
Hence, an individual is classified as single if the individual has never married and an individual is
classified as married if the individual has married once and has never been divorced. Lloyd does
not say how individuals who are widowed and never divorced are classified. Furthermore, Lloyd
treats these data as a simple random sample of 185 individuals from the population of Danes who
are at least 16 years old. At the time of the survey, the legal age of marriage in Denmark was 16.
                            Observed Counts
Age Group     Xi      Single   Married   Divorced   Total
17 – 21       19        17        1         0         18
21 – 25       23        16        8         0         24
25 – 30       27.5       8       17         1         26
30 – 40       35         6       22         4         32
40 – 50       45         5       21         6         32
50 – 60       55         3       17         8         28
60 – 70       65         2        8         6         16
70+           75         1        3         5          9
Lloyd uses the Xi variable to model trends across age. We will do the same, although it would be
better to have the actual age of each respondent. These data have been stored in the file
dmstatus.dat. S-PLUS code for completing this problem has been posted in the file
dmstatus.ssc and SAS code has been posted in the file dmstatus.sas. The graphs produced by
the S-PLUS code are better and they are easier to produce.
A. Using marriage as the baseline category, fit the following polychotomous logistic regression
model.

$$\log\left(\frac{\pi_{\text{single},i}}{\pi_{\text{married},i}}\right) = \alpha_1 + \beta_1 (X_i - 16)$$

$$\log\left(\frac{\pi_{\text{divorced},i}}{\pi_{\text{married},i}}\right) = \alpha_2 + \beta_2 (X_i - 16)$$

Report the value of the likelihood ratio G² statistic, with its degrees of freedom and p-value, for
testing the fit of this model against the general alternative of eight different and independent
multinomial distributions for the eight age groups. Examine the plot of the estimated curves
against age. This plot also shows the observed proportions of respondents in the three response
categories at each level. Does this model appear to be appropriate for the data?
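
The fitted category probabilities implied by the two logits in part A, and the G² statistic against the saturated alternative, can be sketched as follows. The counts are those of Lloyd's Table 6.1 as given in this problem; the function names are hypothetical, and the parameter values passed in at the end are placeholders, not maximum likelihood estimates.

```python
import math

# Age-group values X_i and observed counts (single, married, divorced).
X = [19, 23, 27.5, 35, 45, 55, 65, 75]
COUNTS = [(17, 1, 0), (16, 8, 0), (8, 17, 1), (6, 22, 4),
          (5, 21, 6), (3, 17, 8), (2, 8, 6), (1, 3, 5)]


def category_probs(x, alpha1, beta1, alpha2, beta2):
    """(single, married, divorced) probabilities with married as the baseline."""
    e_single = math.exp(alpha1 + beta1 * (x - 16))
    e_divorced = math.exp(alpha2 + beta2 * (x - 16))
    denom = 1.0 + e_single + e_divorced
    return (e_single / denom, 1.0 / denom, e_divorced / denom)


def g2(counts, probs):
    """G^2 = 2 * sum over cells of observed * log(observed / expected)."""
    total = 0.0
    for row, p_row in zip(counts, probs):
        n = sum(row)
        for obs, p in zip(row, p_row):
            if obs > 0:  # cells with a zero count contribute nothing
                total += obs * math.log(obs / (n * p))
    return 2.0 * total


# Placeholder parameter values for illustration only, NOT fitted estimates.
fitted = [category_probs(x, 1.0, -0.1, -3.0, 0.05) for x in X]
stat = g2(COUNTS, fitted)
```

The saturated alternative has 8 × 2 = 16 free probabilities and the model in part A has 4 parameters, so G² evaluated at the maximum likelihood estimates is referred to a chi-square distribution with 12 degrees of freedom.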
B. Does the model

$$\log\left(\frac{\pi_{\text{single},i}}{\pi_{\text{married},i}}\right) = \alpha_1 + \beta_1 (X_i - 16) + \gamma_1 (X_i - 16)^2$$

$$\log\left(\frac{\pi_{\text{divorced},i}}{\pi_{\text{married},i}}\right) = \alpha_2 + \beta_2 (X_i - 16) + \gamma_2 (X_i - 16)^2$$

appear to be appropriate for these data?
C. Does the model

$$\log\left(\frac{\pi_{\text{single},i}}{\pi_{\text{married},i}}\right) = \alpha_1 + \beta_1 \log(X_i - 16)$$

$$\log\left(\frac{\pi_{\text{divorced},i}}{\pi_{\text{married},i}}\right) = \alpha_2 + \beta_2 \log(X_i - 16)$$

appear to be appropriate for these data?
D. Does the model

$$\log\left(\frac{\pi_{\text{single},i}}{\pi_{\text{married},i}}\right) = \alpha_1 + \beta_1 \log(X_i - 16) + \gamma_1 [\log(X_i - 16)]^2$$

$$\log\left(\frac{\pi_{\text{divorced},i}}{\pi_{\text{married},i}}\right) = \alpha_2 + \beta_2 \log(X_i - 16) + \gamma_2 [\log(X_i - 16)]^2$$

appear to fit these data?
E. Which of the models from parts A-D, if any, provides an adequate description of the
relationships between marital status and age in Denmark? Describe these relationships.
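
When comparing the four models, the residual degrees of freedom and the chi-square p-values can be checked by hand. The sketch below is an illustration with hypothetical names. It relies on the closed form of the chi-square upper tail for even degrees of freedom, which covers every case needed here (12 and 10 for the fit tests, 2 for comparing A with B or C with D).

```python
import math

SATURATED_PARAMS = 8 * 2  # eight age groups, two free probabilities each
MODEL_PARAMS = {"A": 4, "B": 6, "C": 4, "D": 6}


def residual_df(model: str) -> int:
    """Degrees of freedom for the G^2 test of fit against the saturated model."""
    return SATURATED_PARAMS - MODEL_PARAMS[model]


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square with df > x), valid only for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total
```

For example, `chi2_sf_even_df(g2_value, residual_df("A"))` gives the p-value for the fit of model A once `g2_value` has been computed from the fitted model. Since model A is nested in B and model C is nested in D, the difference in G² within each pair can be referred to a chi-square distribution with 2 degrees of freedom.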