Stat 565 Assignment 5 Fall 2005

Reading Assignment: Diggle, Heagerty, Liang, and Zeger (DHLZ), Chapters 1, 3, 4, 5
Written Assignment: Due in class on Tuesday, November 14
Final Exam: Monday, December 12, 7:30-9:30 am
1. Copelan et al. (1991) describe a multicenter study of risk factors on various outcomes of
bone marrow transplants for leukemia patients. Patients were classified into three groups:
acute lymphoblastic leukemia (ALL) patients, low risk acute myelocytic leukemia (AML)
patients (first remission), and high risk AML patients (second or greater relapse and
remission). Intermediate events such as the development of acute graft-versus-host disease
(GVHD) can affect the outcome. Some patients in this study were given a GVHD
prophylactic containing methotrexate (MTX). We will focus on the effects of the
following covariates on disease free survival:
Z1 = (Patient age at time of transplant) - 28 years
Z2 = (Donor age at time of transplant) - 28 years
Z3 = Z1 × Z2 (Patient × Donor age interaction)
Z4 = 1 for low risk acute myelocytic leukemia (AML) patients, 0 otherwise
Z5 = 1 for high risk acute myelocytic leukemia (AML) patients, 0 otherwise
Z6 = 1 for French-American-British (FAB) morphology score of 4 or 5 and AML, 0 otherwise
Z7 = (Waiting time to transplant in months)
Z8 = 1 if treated with MTX as a GVHD prophylactic, 0 otherwise
The data are posted as BMT2.txt on the assignment section of the course webpage. There is
one line of data for each subject in this file. Each line of data contains information on the
variables in the following order:
ID      Subject identification number
group   Disease group (1=ALL, 2=low risk AML, 3=high risk AML)
time    Disease free survival time in days (death, relapse, or end of study)
status  Disease free survival indicator (1=dead or relapsed, 0=censored)
page    Patient age (in years) at time of transplant
dage    Donor age (in years) at time of transplant
wait    Waiting time to transplant in months
fab     FAB indicator (1=FAB score of 4 or 5 and AML, 0=otherwise)
mtx     MTX indicator (1=treatment with MTX, 0=otherwise)
Code for entering these data into SAS is posted as BMT2.sas, and code for entering these
data into an R data frame is posted as BMT2.R. In analyzing these data, you can assume
that observations taken on different patients are independent.
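As an illustration of how the covariates Z1 through Z8 relate to the variables in BMT2.txt, the following is a minimal Python sketch. It assumes each subject has already been read into a dictionary keyed by the variable names listed above. The function name make_covariates and the example record are hypothetical and are not taken from the posted data or code.

```python
# Sketch: derive the covariates Z1..Z8 from one record of BMT2.txt.
# Assumption: the record is a dict keyed by the variable names in the handout.

def make_covariates(rec):
    """Return a dict holding Z1..Z8 for one subject."""
    z1 = rec["page"] - 28   # patient age centered at 28 years
    z2 = rec["dage"] - 28   # donor age centered at 28 years
    return {
        "Z1": z1,
        "Z2": z2,
        "Z3": z1 * z2,                          # patient x donor age interaction
        "Z4": 1 if rec["group"] == 2 else 0,    # low risk AML
        "Z5": 1 if rec["group"] == 3 else 0,    # high risk AML
        "Z6": rec["fab"],                       # FAB score of 4 or 5 and AML
        "Z7": rec["wait"],                      # waiting time in months
        "Z8": rec["mtx"],                       # MTX used as GVHD prophylactic
    }

# Made-up record for illustration only; it is NOT a line from BMT2.txt.
example = {"ID": 1, "group": 2, "time": 100, "status": 1,
           "page": 30, "dage": 25, "wait": 3, "fab": 0, "mtx": 1}

z = make_covariates(example)
print(z)
```

Because Z4 and Z5 are both 0 for ALL patients, ALL serves as the reference disease group in any model that uses these covariates.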
(a) Fit a proportional hazards model using Z1, Z2, Z3, Z4, Z5, Z6, Z7, Z8 as covariates.
Which variables in this model have a significant association with disease free survival
time?
(c) Plot dfbeta residuals for the model in part (a) to determine whether there are any highly
influential subjects.
(d) Plot the scaled Schoenfeld residuals against time. State your conclusions.
(e) Explore how continuous variables should be used in the model by examining martingale
residuals. State your conclusions.
(f) Present the model that you feel best describes the effects of disease group, patient age,
donor age, waiting time, FAB score, and use of MTX on disease free survival time.
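For reference, the proportional hazards model in part (a) and the martingale residuals used in part (e) can be written as below. The notation is standard but is an assumption here, since the assignment does not fix any symbols: h_0(t) is the baseline hazard, \beta_k the regression coefficients, t_i and \delta_i the time and status values for subject i, and \hat{H}_0 an estimate of the cumulative baseline hazard (for example, the Breslow estimate).

```latex
% Proportional hazards model with the eight covariates of part (a)
h(t \mid Z) = h_0(t)\,\exp\!\left( \sum_{k=1}^{8} \beta_k Z_k \right)

% Martingale residual for subject i (time-fixed covariates)
\hat{M}_i = \delta_i - \hat{H}_0(t_i)\,\exp\!\left( \sum_{k=1}^{8} \hat{\beta}_k Z_{ik} \right)
```

A smoothed plot of martingale residuals from a model that omits a continuous covariate, drawn against that covariate, is one common way to judge which functional form the covariate should take.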
2. Recurring infections are a common problem for kidney dialysis patients. McGilchrist and
Aisbett (1991) report data on the recurrence times of infections for 38 kidney patients who
used the same type of portable dialysis machine. Time to recurrence is the time (in days)
between the end of the previous infection and the beginning of the current infection. Two
times to recurrence of infection at the site of catheter insertion
(T1, T2) and the corresponding censoring indicators (δ1, δ2) were recorded for each patient.
Information on age, gender, and type of kidney disease was also recorded for each patient.
The data are posted in the file kidney_infection.dat on the assignment section of the course
web page. There is one line of data for each patient. Values for the variables appear in
the following order:
ID      Patient identification number
AGE     Age of the patient (in years)
SEX     Gender of the patient (0=male, 1=female)
GN      Indicator of disease type GN (0=no, 1=yes)
AN      Indicator of disease type AN (0=no, 1=yes)
PKD     Indicator of disease type PKD (0=no, 1=yes)
Time    Time (in days) to recurrence of infection or censoring
Status  Censoring indicator (0=censored time, 1=infection recurrence time)
Recur   Infection period (1=first infection period, 2=second infection period)
Ages range from 10 to 69 years. Use a new variable AGE10=AGE-10 instead of AGE in
the model, so the baseline hazard corresponds to a 10-year-old male patient with none of
the three types of disease. Code for entering these data into SAS is posted as
kidney_infection.sas, and code for entering these data into an R data frame is posted as
kidney_infection.R. In analyzing these data, you can assume that observations taken on
different patients are independent.
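To see why AGE10 makes the baseline hazard correspond to a 10-year-old male with none of the three diseases, the following minimal Python sketch computes the linear predictor for that reference patient and for one other patient. The helper names add_age10 and linear_predictor are hypothetical, and the coefficient values are made up for illustration. They are not estimates from the kidney infection data.

```python
import math

# Sketch: build AGE10 and check that the linear predictor is zero for the
# reference patient, whatever the coefficient values are.

def add_age10(rec):
    """Return a copy of the record with AGE10 = AGE - 10 added."""
    out = dict(rec)
    out["AGE10"] = rec["AGE"] - 10
    return out

def linear_predictor(rec, beta):
    """Sum of beta_k * covariate_k over the covariates named in beta."""
    return sum(beta[name] * rec[name] for name in beta)

# Hypothetical coefficients, NOT fitted values.
beta = {"AGE10": 0.01, "SEX": -1.0, "GN": 0.2, "AN": 0.4, "PKD": -1.2}

# Reference patient: 10-year-old male with none of the three disease types.
reference = add_age10({"AGE": 10, "SEX": 0, "GN": 0, "AN": 0, "PKD": 0})
# Made-up comparison patient: 40-year-old female with disease type AN.
other = add_age10({"AGE": 40, "SEX": 1, "GN": 0, "AN": 1, "PKD": 0})

lp_reference = linear_predictor(reference, beta)
lp_other = linear_predictor(other, beta)

# Hazard of the comparison patient relative to the reference patient.
hazard_ratio = math.exp(lp_other - lp_reference)
print(lp_reference, lp_other, hazard_ratio)
```

Every covariate is 0 for the reference patient, so the hazard for that patient equals the baseline hazard, and exp(linear predictor) for any other patient is a hazard ratio relative to that reference.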
(a) Assuming that the two times recorded for the same patient are independent, fit a Cox
proportional hazards model to these data using AGE10, SEX, AN, GN, and PKD as the
covariates. Report the values of estimates of the regression parameters and their standard
errors. You may call this the Independence Working Model (IWM), but it will not be the
correct model if within patient recurrence times are correlated. Nevertheless, it will
provide consistent estimates of the regression parameters if the marginal proportional
hazards model is correct, but the standard errors will tend to be too small if within patient
recurrence times have a positive correlation.
(b) Now fit the marginal proportional hazards model and compute a robust estimate of the
covariance matrix of the parameter estimates to account for correlation among within
patient recurrence times. Report estimates of parameters and their standard errors. How
do the robust estimates of the standard errors differ from those based on the assumption
of completely independent failure times? Are any of the differences large enough to
change inferences about the significance of any of the covariates, relative to the
inferences provided by the IWM?
(c) Using the same set of covariates as in parts (a) and (b), fit a gamma frailty model to the
data with a separate random effect for each patient. How do the standard errors for the
regression parameter estimates provided by the frailty model differ from the robust
standard errors obtained in part (b)? Do the parameter estimates differ from those
obtained in part (b)?
(d) Using the results from part (c), interpret the effects of age, gender, and each of the three
types of disease on the risk of recurrence of infection. It is not enough to simply say that
an effect is statistically significant. Describe the direction and magnitude of the effect.
(e) Using the frailty model in part (c), test the null hypothesis of no association among within
patient failure times. State your conclusion.
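As a reference for parts (a) through (e), the three models can be sketched in the notation below. The symbols are assumptions, since the assignment defines none of them: i indexes patients, j = 1, 2 indexes the infection period, Z_{ij} is the covariate vector (AGE10, SEX, AN, GN, PKD), \hat{I} is the observed information matrix from the independence working model, \hat{U}_{ij} is the vector of score residuals for observation (i, j), and w_i is the frailty for patient i. The sandwich formula shown is one common form of the robust estimator.

```latex
% Marginal proportional hazards model, parts (a) and (b)
h_{ij}(t) = h_0(t)\,\exp\!\left( \beta^\top Z_{ij} \right),
  \qquad i = 1, \dots, 38, \quad j = 1, 2

% Robust (sandwich) covariance estimate, clustering on patient, part (b)
\hat{V}_R = \hat{I}^{-1}
  \left( \sum_{i=1}^{38} \hat{U}_{i\cdot}\, \hat{U}_{i\cdot}^\top \right)
  \hat{I}^{-1},
  \qquad \hat{U}_{i\cdot} = \sum_{j=1}^{2} \hat{U}_{ij}

% Gamma frailty model, parts (c) through (e)
h_{ij}(t \mid w_i) = w_i\, h_0(t)\,\exp\!\left( \beta^\top Z_{ij} \right),
  \qquad w_i \sim \text{Gamma with mean } 1 \text{ and variance } \theta

% Null hypothesis of no within-patient association, part (e)
H_0 : \theta = 0
```

In the frailty model the coefficients describe effects conditional on a patient's frailty, whereas in the marginal model they describe population-averaged effects, so the two sets of estimates need not agree.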