HW7F02.doc STAT 557 ASSIGNMENT 7 Reading Assignment: Lloyd, chapter 5, section 6.2 Written Assignment: Due Wednesday, December 11, in class Final Exam: Monday, December 16, 9:45-11:45 am, Carver 101 1. FALL 2002 Between 1977 and 1980 the National Toxicology Program (NTP) of the US Department of Health and Human Services conducted a toxicology and carcinogenesis bioassay of a polybrominated biphenyl mixture (PBB), a flame retardant known as Firemaster F-1. As part of the study, Fisher 334 rats were given 125 oral doses of PBB over a period of six months beginning at about seven weeks after birth. Both male and female rats were used. Occurrence or non-occurrence of a nonlethal lesion, bile duct hyperplasia, was recorded for each of 319 rats for which pathology reports were available. The data are posted in the file bile.dat with one line for each rat. There are seven numbers on each line. The first is the rat identification number. This is followed by values for sex, dose level of PBB, initial weight of the rat (in grams), cage tier, age (in weeks) when the rat died or was sacrificed and examined, the response variable. Sex is coded as 1 = female and 0 = male. The response is coded as 1 = hyperplasis present and 0 = hyperplasia absent. The six dose levels were 0, 0.1, 0.3, 1.0, 3.0, and 10.0. The cage tier variable has values 1, 2, 3, 4, 5 corresponding to the tier in which the rat’s cage was housed during the six-month experiment, with 1 = top, …, 5 = bottom. The SAS code stored in the file bile.sas can be used to read the data file and answer some of the following questions. S-PLUS code for constructing a binary tree is posted in the file bile.ssc. A. First fit the logistic regression model π log i 1− πi = β 0 + β 1 Z 1i + β 2 Z 2i + β 3 Z 3i + β 4 Z 4i + β 5 Z 5i + β 6 Z 6i + β 7 Z 7i + β 8 Z 8i where Z1, Z2, Z3, Z4 are dummy variables corresponding to the five cage tier levels, Z5 denotes sex of the rat, Z6 is initial weight, Z7 is age at examination, Z8 is the dose, and π denotes the probability of bile duct hyperplasia. According to this model, which of the variables appear to be associated with prevalence of bile duct hyperplasia? B. Use the centroid clustering method to form 20 clusters and compute a local mean deviance plot. Submit this plot. What does this plot suggest? C. Construct partial residual plots for Z6, Z7 and Z8. Pass smooth curves through these plots (you do not have to submit these plots, just look at them on your computer screen). What do these plots suggest? HW7F02.doc D. How would you interpret the estimate for β5 in the model in part (A)? E. How would you interpret the estimate of β8 in the model in part (A)? F. Try adding interaction terms Z5Z6, Z5Z7, Z5Z8, Z6Z7, Z6Z8, Z7Z8 to the model in part (A). Search for a good model. G. For the model you selected in part (F), examine diagnostics on leverage and influence of individual cases on estimates of model parameters to determine if there are highly influential cases or outliers. State your conclusions. H. Do whatever additional analyses of these data you think would be useful to select a final model (or models). Report a formula for your model with estimates of parameter values substituted into the formula. Report standard errors for parameter estimates in parentheses beneath the parameter estimates. Write a short paragraph describing your conclusions about the effects of cage tiers, sex, age, initial weight, and level of exposure to PBB on prevalence of bile duct hyperplasia. 2. Milk production can be enhanced by administering a recombinantly derived version of a naturally occurring hormone called bST to cows. Before use of bST could be approved by regulatory agencies in the US and Europe, studies had to be done to examine the effectiveness and safety of the treatment. One study examined the possible effects of the use of bST on the incidence of mastitis in dairy cows. Mastitis is an inflammation of the mammary gland that is commonly treated with antibiotics. Milk from cows treated with antibiotics cannot be sold for human consumption. This study was performed with cows at 11 different locations, in six different regions of the US and in five different countries in Europe. Cows in the US tend to have higher milk production and higher mastitis rates than cows in Europe. Cows that produce more milk tend to have higher rates of mastitis. Farming practices and environmental factors also vary across sites. The parity of the cow may also affect the chance that mastitis occurs. To begin a period of milk production, a cow must conceive and give birth to a calf. A cow that has given birth to only one calf and is in her first lactation period has a parity value of one. A cow that has given birth five times and is in her fifth lactation period has a parity value of five. Mastitis incidence tends to be higher for cows in their first lactation period. Mastitis incidence tends to be lower for cows in later lactation periods and it may be nearly the same for cows of parity two or higher. Treatment with bST could have a more dramatic effect on mastitis rates in parity one cows, than older cows, because of the increased stress resulting from increased milk production. The time at which bST is administered could also affect mastitis rates. In this study, lactation periods were divided into trimesters of 50-99 days, 100-140 days and later than 140 days after the cow gives birth. When it was used, the bST treatment was either given in the first or second trimester. The data are posted in the file mastitis.dat. There is one line for each cow in the study. There are seven numbers on each line corresponding to the following variables: 2 HW7F02.doc Mastitis occurrence 0 = no 1 = yes Treatment 0 = control, no bST treatment 1 = bST treatment Peak milk production kg per day Parity 1 = first lactation period 2 = second lactation period or beyond Trimester when bST is administered 1 = 0-99 days 2 = 100-140 days 3 = later than 140 days 0 = control cow, no bST used Site coded 1 through 11 with sites 5, 6, 7, 8, 9, 10 in the US Analyze these data to determine how the incidence of mastitis is affected by treatment with bST and if the trimester in which bST is administered has any affect. You will want to make allowance for parity and site. SAS code for logistic regression is posted in the file mastitis.sas and SAS code for constructing a CHAID analysis is posted in the file mastchaid.sas. S-PLUS code for logistic regression is posted in the file mastitis.ssc and S-PLUS code for constructing binary trees is posted in the file masttree.ssc. Write a brief report of your findings. This report should consist of three parts: i) Briefly describe the analyses you performed on these data. Explain how you selected a model to describe the relationships between mastitis incidence and the other variables in this study. ii) Report the formula for the model that you think best describes the relationships between mastitis incidence and the other variables in this study. Examine diagnostic statistics and give some justification for your belief in this model. iii) State your conclusions in words that an animal scientist or regulatory official could understand. 3 HW7F02.doc 3. Table 6.1 on Page 301 in Lloyd’s book gives the following data on the relationship of age and marital status of Danes. An individual is classified as divorced if he or she had been divorced at any time prior to the survey, regardless of whether or not that individual is currently remarried. Hence, an individual is classified as single if the individual has never married and an individual is classified as married if the individual has married once and has never been divorced. Lloyd does not say how individuals who are widowed and never divorced are classified. Furthermore, Lloyd treats these data as a simple random sample of 185 individuals from the population of Danes who are at least 16 years old. At the time of the survey, the legal age of marriage in Denmark was 16. Age Group 17 – 21 21 – 25 25 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70+ Xi 19 23 27.5 35 45 55 65 75 Single 17 16 8 6 5 3 2 1 Observed Counts Married Divorced 1 0 8 0 17 1 22 4 21 6 17 8 8 6 3 5 Total 18 24 26 32 32 28 16 9 Lloyd uses the Xi variable to model trends across age. We will do the same, although it would be better to have the actual age of each respondent. These data have been stored in the file dmstatus.dat. S-PLUS code for completing this problem has been posted in the file dmstatus.ssc and SAS code has been posted in the file dmstatus.sas. The graphs produced by the S-PLUS code are better and they are easier to produce. A. Using marriage as the baseline category, fit the following polychotomous logistic regression model. πsin gle, i = α1 + β1 (X i − 16) log π married, i π divorced, i = α 2 + β 2 (X i − 16) log π married, i Report a value of a log-likelihood ratio G2 test, and its degrees of freedom and p-value, for testing the fit of this model against the general alternate of eight different and independent multinomial distributions for the eight age groups. Examine the plot of the estimated curves against age. This plot also shows the observed proportions of respondents in the three response categories at each level. Does this model appear to be appropriate for the data? 4 HW7F02.doc B. Does the model π sin gle, i = α1 + β1 (X i − 16) + γ1 (X i − 16 )2 log π married, i πdivorced, i = α 2 + β 2 (X i − 16) + γ 2 (X i − 16)2 log π married, i appear to be appropriate for these data? C. Does the model π sin gle, i = α1 + β1 log(X i − 16 ) log π married, i π divorced, i = α 2 + β 2 log(X i − 16) log π married, i appear to be appropriate for these data? D. Does the model πsin gle, i = α1 + β1 log(X i − 16) + γ1 [log(X i − 16)]2 log π married, i π divorced, i = α 2 + β 2 log(X i − 16 ) + γ 2 [log(X i − 16)]2 log π married, i appear to fit these data? E. Which of the models from parts A-D, if any, provides an adequate description of the relationships between marital status and age in Denmark? Describe these relationships. 5