Stat 565    Assignment 5    Fall 2005

Reading Assignment: Diggle, Heagerty, Liang, and Zeger (DHLZ), Chapters 1, 3, 4, 5
Written Assignment: Due in class on Tuesday, November 14
Final Exam: Monday, December 12, 7:30-9:30 am

1. Copelan et al (1991) describe a multicenter study of risk factors for various outcomes of bone marrow transplants for leukemia patients. Patients were classified into three groups: acute lymphoblastic leukemia (ALL) patients, low risk acute myelocytic leukemia (AML) patients (first remission), and high risk AML patients (second or greater relapse and remission). Intermediate events such as the development of acute graft-versus-host disease (GVHD) can affect the outcome. Some patients in this study were given a GVHD prophylactic containing methotrexate (MTX). We will focus on the effects of the following covariates on disease free survival:

    Z1 = (Patient age at time of transplant) - 28 years
    Z2 = (Donor age at time of transplant) - 28 years
    Z3 = Z1 × Z2 (Patient × Donor age interaction)
    Z4 = 1 for low risk acute myelocytic leukemia (AML) patients, 0 otherwise
    Z5 = 1 for high risk acute myelocytic leukemia (AML) patients, 0 otherwise
    Z6 = 1 for French-American-British (FAB) morphology score of 4 or 5 and AML, 0 otherwise
    Z7 = (Waiting time to transplant in months)
    Z8 = 1 if treated with MTX as a GVHD prophylactic, 0 otherwise

The data are posted as BMT2.txt on the assignment section of the course webpage. There is one line of data for each subject in this file.
Each line of data contains information on the variables in the following order:

    ID      Subject identification number
    group   Disease group (1=ALL, 2=low risk AML, 3=high risk AML)
    time    Disease free survival time in days (death, relapse, or end of study)
    status  Disease free survival indicator (1=dead or relapsed, 0=censored)
    page    Patient age (in years) at time of transplant
    dage    Donor age (in years) at time of transplant
    wait    Waiting time to transplant in months
    fab     FAB indicator (1=FAB score of 4 or 5 and AML, 0=otherwise)
    mtx     MTX indicator (1=treatment with MTX, 0=otherwise)

Code for entering these data into SAS is posted as BMT2.sas, and code for entering these data into an R data frame is posted as BMT2.R. In analyzing these data, you can assume that observations taken on different patients are independent.

(a) Fit a proportional hazards model using Z1, Z2, Z3, Z4, Z5, Z6, Z7, Z8 as covariates. Which variables in this model have a significant association with disease free survival time?

(c) Plot dfbeta residuals for the model in part (a) to determine if there are any highly influential subjects.

(d) Plot the scaled Schoenfeld residuals against time. State your conclusions.

(e) Explore how continuous variables should be used in the model by examining martingale residuals. State your conclusions.

(f) Present the model that you feel best describes the effects of disease group, patient age, donor age, waiting time, FAB score, and use of MTX on disease free survival time.

2. Recurring infections are a common problem for kidney dialysis patients. McGilchrist and Aisbett (1991) report data on the recurrence times of infections for 38 kidney patients who used the same type of portable dialysis machine. Time to recurrence is the time (in days) between the end of the previous infection and the beginning of the current infection.
Two times to recurrence of infection at the site of catheter insertion (T1, T2) and the corresponding censoring indicators (δ1, δ2) were recorded for each patient. Information on age, gender, and type of kidney disease was also recorded for each patient. The data are posted in the file kidney_infection.dat on the assignment section of the course web page. There is one line of data for each patient. Values for the variables appear in the following order:

    ID      Patient identification number
    AGE     Age of the patient (in years)
    SEX     Gender of the patient (0=male, 1=female)
    GN      Indicator of disease type GN (0=no, 1=yes)
    AN      Indicator of disease type AN (0=no, 1=yes)
    PKD     Indicator of disease type PKD (0=no, 1=yes)
    Time    Time (in days) to recurrence of infection or censoring
    Status  Censoring indicator (0=censored time, 1=infection recurrence time)
    Recur   Infection period (1=first infection period, 2=second infection period)

Ages range from 10 to 69 years. Use a new variable AGE10=AGE-10 instead of AGE in the model, so the baseline hazard corresponds to a 10 year-old male patient with none of the three types of disease. Code for entering these data into SAS is posted as kidney_infection.sas, and code for entering these data into an R data frame is posted as kidney_infection.R. In analyzing these data, you can assume that observations taken on different patients are independent.

(a) Assuming that the two times recorded for the same patient are independent, fit a Cox proportional hazards model to these data using AGE10, SEX, AN, GN, and PKD as the covariates. Report the values of estimates of the regression parameters and their standard errors. You may call this the Independence Working Model (IWM), but it will not be the correct model if within patient recurrence times are correlated.
Nevertheless, it will provide consistent estimates of the regression parameters if the marginal proportional hazards model is correct, but the standard errors will tend to be too small if within patient recurrence times have a positive correlation.

(b) Now fit the marginal proportional hazards model and compute a robust estimate of the covariance matrix of the parameter estimates to account for correlation among within patient recurrence times. Report estimates of parameters and their standard errors. How do the robust estimates of the standard errors differ from those based on the assumption of completely independent failure times? Are any of the differences large enough to change inferences about the significance of any of the covariates, relative to the inferences provided by the IWM?

(c) Using the same set of covariates as in parts (a) and (b), fit a gamma frailty model to the data with a separate random effect for each patient. How do the standard errors for the regression parameter estimates provided by the frailty model differ from the robust standard errors obtained in part (b)? Do the parameter estimates differ from those obtained in part (b)?

(d) Using the results from part (c), interpret the effects of age, gender, and each of the three types of disease on the risk of recurrence of infection. It is not enough to simply say that an effect is statistically significant. Describe the direction and magnitude of the effect.

(e) Using the frailty model in part (c), test the null hypothesis of no association among within patient failure times. State your conclusion.
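
Note on covariate coding: the definitions of Z1 through Z8 in Problem 1 and of AGE10 in Problem 2 can be written out directly in terms of the variables in the data files. The sketch below is in Python purely for illustration (the posted course code is in SAS and R). The function names bmt_covariates and age10 and the example values are hypothetical and are not taken from BMT2.txt or kidney_infection.dat.

```python
def bmt_covariates(group, page, dage, wait, fab, mtx):
    """Return Z1..Z8 for one subject, following the Problem 1 definitions.

    Arguments use the variable names listed for BMT2.txt:
    group (1=ALL, 2=low risk AML, 3=high risk AML), page, dage (years),
    wait (months), fab and mtx (0/1 indicators).
    """
    z1 = page - 28          # patient age centered at 28 years
    z2 = dage - 28          # donor age centered at 28 years
    return {
        "Z1": z1,
        "Z2": z2,
        "Z3": z1 * z2,                    # patient x donor age interaction
        "Z4": 1 if group == 2 else 0,     # low risk AML indicator
        "Z5": 1 if group == 3 else 0,     # high risk AML indicator
        "Z6": fab,                        # FAB score of 4 or 5 and AML
        "Z7": wait,                       # waiting time to transplant
        "Z8": mtx,                        # treated with MTX
    }


def age10(age):
    """Return AGE10 = AGE - 10, the shifted age used in Problem 2."""
    return age - 10


# Hypothetical subject (made-up values, for illustration only).
example = bmt_covariates(group=2, page=30, dage=25, wait=3.0, fab=1, mtx=0)
```

With this coding, ALL patients (group 1) have Z4 = Z5 = 0 and so serve as the reference disease group, and a subject with Z1 = Z2 = 0 is a 28 year-old patient with a 28 year-old donor.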