STA 695 Final
Angela Schoergendorfer
May 5, 2009

1) Weibull regression model

I fit a Weibull regression model with the effects age, age^2 and t5. The results show that the effect of t5 is not significant (p=.209), while both the linear and the quadratic effect of age are significant (p=.0066 and p=.0008, respectively). I therefore refit the model without the effect of t5 (model 1.2) and will use this model for the comparison in part 3. Note that model 1.2 is fit to all 178 observations, whereas the model with t5 uses only the 153 observations with a non-missing t5 score.

    > model1 = survreg(Surv(time,status)~age + I(age^2) + t5, data=stan)
    > summary(model1)

    Call:
    survreg(formula = Surv(time, status) ~ age + I(age^2) + t5, data = stan)
                   Value Std. Error     z        p
    (Intercept)  4.66230    1.66855  2.79 5.20e-03
    age          0.24207    0.08915  2.72 6.62e-03
    I(age^2)    -0.00388    0.00116 -3.34 8.26e-04
    t5          -0.37258    0.29670 -1.26 2.09e-01
    Log(scale)   0.47114    0.08434  5.59 2.32e-08

    Scale= 1.60

    Weibull distribution
    Loglik(model)= -760.4   Loglik(intercept only)= -770.7
            Chisq= 20.62 on 3 degrees of freedom, p= 0.00013
    Number of Newton-Raphson Iterations: 5
    n=153 (25 observations deleted due to missingness)

    > model1.2 = survreg(Surv(time,status)~age + I(age^2) , data=stan)
    > summary(model1.2)

    Call:
    survreg(formula = Surv(time, status) ~ age + I(age^2), data = stan)
                   Value Std. Error     z        p
    (Intercept)  4.76093    1.59528  2.98 2.84e-03
    age          0.20988    0.08645  2.43 1.52e-02
    I(age^2)    -0.00346    0.00113 -3.06 2.22e-03
    Log(scale)   0.47509    0.07934  5.99 2.12e-09

    Scale= 1.61

    Weibull distribution
    Loglik(model)= -843.9   Loglik(intercept only)= -852.9
            Chisq= 18.09 on 2 degrees of freedom, p= 0.00012
    Number of Newton-Raphson Iterations: 5
    n= 178

2) Cox Proportional Hazard Regression

Fitting a Cox proportional hazards regression model with the covariates age, age^2 and t5, we find that, just as with the Weibull regression model, the linear and quadratic effects of age are significant (p=.0079 and p=.0012, respectively), while the effect of t5 is not significant (p=.2300).

    > model2 = coxph(Surv(time,status)~ age + I(age^2) + t5 , data=stan)
    > summary(model2)
    Call:
    coxph(formula = Surv(time, status) ~ age + I(age^2) + t5, data = stan)

      n=153 (25 observations deleted due to missingness)

                 coef exp(coef) se(coef)     z      p
    age      -0.14747     0.863 0.055493 -2.66 0.0079
    I(age^2)  0.00234     1.002 0.000721  3.25 0.0012
    t5        0.22466     1.252 0.186847  1.20 0.2300

             exp(coef) exp(-coef) lower .95 upper .95
    age          0.863      1.159     0.774     0.962
    I(age^2)     1.002      0.998     1.001     1.004
    t5           1.252      0.799     0.868     1.806

    Rsquare= 0.116   (max possible= 0.996 )
    Likelihood ratio test= 18.9  on 3 df,   p=0.000288
    Wald test            = 21.3  on 3 df,   p=9.07e-05
    Score (logrank) test = 21.5  on 3 df,   p=8.24e-05

Prediction

The estimated survival curve and 95% (pointwise) confidence interval for a patient who is 45 years old and has a t5 mismatch score of 1.3 are plotted in Figure 1. To estimate the cumulative hazard for this patient, I centered age, age^2 and t5 by subtracting their means over all observations in the dataset. The patient has a slightly lower hazard of dying than the baseline (see Figure 2), which corresponds to a patient of mean age and mean t5 mismatch score.

    # predicted survival curve for patient with age=45, t5 = 1.3
    newdata = data.frame(age=c(45),t5=c(1.3))
    fitpred2 = survfit(model2,newdata)
    # predicted cumulative hazard
    baseH = basehaz(model2)
    # covariate means used for centering (age, age^2 and t5)
    meanage = mean(stan$age, na.rm=T)
    meanage2 = mean(stan$age^2, na.rm=T)
    meant5 = mean(stan$t5, na.rm=T)
    predH = baseH$hazard * exp(sum(model2$coef*c(45-meanage,(45)^2-meanage2,1.3-meant5)))

Figure 1

Figure 2

3) Cox Proportional Hazard Regression without t5

Model3 refits the Cox PH regression without the covariate t5.

    > model3 = coxph(Surv(time,status)~ age + I(age^2), data=stan)
    > summary(model3)
    Call:
    coxph(formula = Surv(time, status) ~ age + I(age^2), data = stan)

      n= 178

                 coef exp(coef) se(coef)     z      p
    age      -0.12488     0.883 0.054043 -2.31 0.0210
    I(age^2)  0.00205     1.002 0.000705  2.91 0.0037

             exp(coef) exp(-coef) lower .95 upper .95
    age          0.883      1.133     0.794     0.981
    I(age^2)     1.002      0.998     1.001     1.003

    Rsquare= 0.088   (max possible= 0.996 )
    Likelihood ratio test= 16.5  on 2 df,   p=0.000267
    Wald test            = 18.6  on 2 df,   p=9e-05
    Score (logrank) test = 18.6  on 2 df,   p=9.22e-05

Prediction

Figure 3 shows the predicted median survival time over age; Figure 4 shows the observed (or censored) survival time over age. All plots that follow use the same y-axis scale to make comparisons easier. As could be expected, there is a lot of scatter in the observed data compared to the predicted curve. In the plot of predicted median survival time we can see that the median survival time increases until about age 30 (from 431 for age 12 to 1634 for ages 30-32) and steadily decreases after that.

    # predict each patient's survival time
    fitpred3=survfit(model3,newdata=stan)
    # number of event times at which the predicted survival is still >= 0.5
    find.median = function(x) return(sum(x>=.5))
    median_loc = apply(fitpred3$surv,2,find.median)
    # first event time at which the predicted survival drops below 0.5
    median_pred = fitpred3$time[median_loc+1]

Figure 3

Figure 4

Comparison to Weibull model without t5

Figure 5 plots the predicted median survival time from the Weibull regression model without t5 (model 1.2) as a solid line, with the pointwise predicted median survival times from the Cox PH model overlaid as points. The general shape of the two predicted curves is very similar, in that median survival time increases until age 30 and then steadily decreases. There are slight deviations around age 20 and age 40, where the median survival time predicted by the Cox PH model is slightly higher than that predicted by the Weibull regression.
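
As a rough check on the shared turning point near age 30, the sketch below plugs the coefficients printed in the summaries above into the vertex formula for a quadratic, age* = -b1/(2*b2), for both model 1.2 and model3. It also evaluates the Weibull median by hand, assuming the usual survreg parameterization in which the median equals exp(linear predictor) * log(2)^scale. The variable and function names are my own, and the coefficients are rounded as printed, so the results are only approximate.

```r
# Coefficients copied (rounded) from the printed summaries; names are illustrative.
weib_int   <- 4.76093       # (Intercept), Weibull model 1.2
weib_age   <- 0.20988       # age
weib_age2  <- -0.00346      # I(age^2)
weib_scale <- exp(0.47509)  # Scale = exp(Log(scale))

cox_age  <- -0.12488        # age, Cox model3
cox_age2 <- 0.00205         # I(age^2)

# Vertex of the quadratic b1*age + b2*age^2: the age at which the effect turns.
vertex <- function(b1, b2) -b1 / (2 * b2)

age_peak_weibull <- vertex(weib_age, weib_age2)  # maximum of log survival time
age_min_cox      <- vertex(cox_age, cox_age2)    # minimum of log hazard

# Weibull median at a given age: exp(linear predictor) * log(2)^scale
weibull_median <- function(age) {
  lp <- weib_int + weib_age * age + weib_age2 * age^2
  exp(lp) * log(2)^weib_scale
}

print(round(c(weibull = age_peak_weibull, cox = age_min_cox), 1))
print(round(weibull_median(c(20, age_peak_weibull, 45))))
```

Both turning points come out just above age 30, which matches the shape seen in the plots: the Weibull model places the longest median survival there, and the Cox model places the lowest hazard there.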

Overall, the results of the two models agree, both in detecting age and age^2 as significant and t5 as nonsignificant, and in the overall prediction of the median survival time.

Figure 5

4) SAS Analysis (103 patients)

The first model contains the variables ‘plant’ and ‘surg’, along with the covariates age and age^2 and their interactions with ‘plant’. Neither of the interactions is significant, so I removed them one at a time. Even after removing the interactions, age^2 was not significant, so I ended up with the same model as on p. 139 of the book.

    The PHREG Procedure

    Model Information
    Data Set              WORK.STAN
    Dependent Variable    surv1
    Censoring Variable    dead
    Censoring Value(s)    0
    Ties Handling         EFRON

    Number of Observations Read    103
    Number of Observations Used    103

    Summary of the Number of Event and Censored Values
    Total    Event    Censored    Percent Censored
      103       75          28               27.18

    Analysis of Maximum Likelihood Estimates
                             Parameter    Standard                              Hazard
    Parameter           DF    Estimate       Error  Chi-Square  Pr > ChiSq       Ratio  Label
    plant                1    -1.15579     3.36472      0.1180      0.7312           .
    surg                 1    -0.70131     0.36712      3.6492      0.0561       0.496
    ageaccpt             1    -0.04130     0.07779      0.2819      0.5954           .
    ageaccpt_sq          1   0.0007543     0.00103      0.5408      0.4621           .
    plant*ageaccpt       1     0.02603     0.15631      0.0277      0.8677           .  plant * ageaccpt
    plant*ageaccpt_sq    1  -0.0000396     0.00181      0.0005      0.9826           .  plant * ageaccpt_sq

Model after the interactions have been removed:

    Analysis of Maximum Likelihood Estimates
                             Parameter    Standard                              Hazard
    Parameter           DF    Estimate       Error  Chi-Square  Pr > ChiSq       Ratio
    plant                1    -0.04785     0.30526      0.0246      0.8754       0.953
    surg                 1    -0.70043     0.36492      3.6842      0.0549       0.496
    ageaccpt             1    -0.05362     0.06364      0.7098      0.3995       0.948
    ageaccpt_sq          1     0.00103   0.0007718      1.7815      0.1820       1.001

Final model with only one significant covariate (apart from the time-dependent covariate ‘plant’ and the variable of interest ‘surg’):

    Analysis of Maximum Likelihood Estimates
                             Parameter    Standard                              Hazard
    Parameter           DF    Estimate       Error  Chi-Square  Pr > ChiSq       Ratio
    plant                1    -0.04614     0.30276      0.0232      0.8789       0.955
    surg                 1    -0.77143     0.35961      4.6019      0.0319       0.462
    ageaccpt             1     0.03109     0.01391      4.9948      0.0254       1.032

This dataset is not the same as in the previous analysis (parts 1-3), and the variables considered are different, so the results cannot be expected to be the same. Still, one might expect the quadratic effect of age to be significant. In this dataset, however, there are only 5 observations with age less than 25 (compared to 19 in the previous dataset). It seems reasonable that 5 observations are too few to detect a significantly lower expected survival time in this age range compared to older ages. This might explain why the covariate age^2 is not significant in this model.

5) R code

    library(survival)
    data(stanford2)
    # delete observations where survival time <5
    stan = stanford2[(stanford2$time>=5),]

    ## 1 ##
    ## Weibull regression model (t5 ... mismatch score)
    model1 = survreg(Surv(time,status)~age + I(age^2) + t5, data=stan)
    summary(model1)
    # leave out t5
    model1.2 = survreg(Surv(time,status)~age + I(age^2) , data=stan)
    summary(model1.2)

    ## 2 ##
    ## Cox PH regression
    model2 = coxph(Surv(time,status)~ age + I(age^2) + t5 , data=stan)
    summary(model2)
    plot(survfit(model2))
    # predicted survival curve for patient with age=45, t5 = 1.3
    newdata = data.frame(age=c(45),t5=c(1.3))
    fitpred2 = survfit(model2,newdata)
    par(new=T)
    plot(fitpred2, main="Predicted Survival for Patient Age = 45, t5 = 1.3", col=2)
    legend(2000,1, legend=c("baseline","predicted"),col=c(1,2), lty=1)
    # predicted cumulative hazard
    baseH = basehaz(model2, centered=T)
    meanage = mean(stan$age, na.rm=T)
    meanage2 = mean(stan$age^2,na.rm=T)
    meant5 = mean(stan$t5, na.rm=T)
    predH = baseH$hazard * exp(sum(model2$coef*c(45-meanage,(45)^2-meanage2,1.3-meant5)))
    plot(baseH$time, baseH$hazard, type="l",
         main="Estimated Baseline Cumulative Hazard \n Predicted Cumulative Hazard for Patient Age = 45, t5 = 1.3",
         xlab = "Time", ylab="Cumulative Hazard")
    lines(baseH$time,predH, type="l", lty=2)
    legend(0,1.8, legend=c("baseline","predicted"),lty=c(1,2))

    ## 3 ##
    ## 2nd Cox PH model
    model3 = coxph(Surv(time,status)~ age + I(age^2), data=stan)
    summary(model3)
    fit3 = survfit(model3)
    plot(fit3)
    # predict each patient's survival time
    fitpred3=survfit(model3,newdata=stan)
    find.median = function(x) return(sum(x>=.5))
    median_loc = apply(fitpred3$surv,2,find.median)
    median_pred = fitpred3$time[median_loc+1]
    cbind(stan$id,median_pred)
    # predicted median survival over age
    plot(stan$age, median_pred, xlab="Age", ylab="Predicted median survival",
         main="Predicted median survival time over age", ylim=c(0,3700))
    # observed survival over age
    plot(stan$age[stan$status==1], stan$time[stan$status==1], pch=20, ylim=c(0,3700),
         main="Observed survival over Age", xlab="Age", ylab="observed time")
    points(stan$age[stan$status==0], stan$time[stan$status==0])
    legend(11,3650,legend=c("observed death", "censored"), pch=c(20,1))
    # predicted vs. observed
    plot(median_pred[stan$status==1], stan$time[stan$status==1], pch=20, ylim=c(0,3700),
         main="Observed survival over predicted median",
         xlab="Predicted Median survival", ylab="observed time")
    points(median_pred[stan$status==0], stan$time[stan$status==0])
    legend(100,3650,legend=c("observed death", "censored"), pch=c(20,1))
    abline(0,1)   # identity line (intercept 0, slope 1)

    # comparison to model 1.2
    median_weibull = predict(model1.2, newdata=stan, type="quantile", p=.5)
    plot(stan$age, median_weibull, xlab="Age", ylab="Predicted median survival",
         main="Predicted median survival time over age \n Weibull model and Cox PH (no t5)",
         type="l", ylim=c(0,3700))
    points(stan$age, median_pred)
    legend(50,3000,legend=c("Weibull","Cox PH"), pch=c(NA,1), lty=c(1,NA))
    plot(median_weibull, median_pred, xlab="Predicted Median Weibull",
         ylab="Predicted Median Cox PH", main="Cox PH vs. Weibull - predicted medians")
    abline(0,1)   # identity line

    # comparison to model 1
    median_weibull = predict(model1, newdata=stan, type="quantile", p=.5)
    plot(stan$age[!is.na(stan$t5)], median_weibull, xlab="Age", ylab="Predicted median survival",
         main="Predicted median survival time over age \n Weibull model (with t5) and Cox PH (no t5)",
         ylim=c(0,3700))
    points(stan$age[!is.na(stan$t5)], median_pred[!is.na(stan$t5)], pch=20)
    legend(50,3000,legend=c("Weibull","Cox PH"), pch=c(1,20))
    plot(median_weibull, median_pred[!is.na(stan$t5)], xlab="Predicted Median Weibull",
         ylab="Predicted Median Cox PH", main="Cox PH vs. Weibull - predicted medians")
    abline(0,1)   # identity line

6) SAS Code

    data stan;
      set stan;
      ageaccpt_sq = ageaccpt*ageaccpt;
    run;

    proc phreg data=stan;
      model surv1*dead(0) = plant surg ageaccpt ageaccpt_sq plant*ageaccpt plant*ageaccpt_sq / ties=efron;
      if wait>surv1 or wait=. then plant=0; else plant=1;
    run;

    proc phreg data=stan;
      model surv1*dead(0) = plant surg ageaccpt plant*ageaccpt / ties=efron;
      if wait>surv1 or wait=. then plant=0; else plant=1;
    run;

    proc phreg data=stan;
      model surv1*dead(0) = plant surg ageaccpt ageaccpt_sq / ties=efron;
      if wait>surv1 or wait=. then plant=0; else plant=1;
    run;

    proc phreg data=stan;
      model surv1*dead(0) = plant surg ageaccpt / ties=efron;
      if wait>surv1 or wait=. then plant=0; else plant=1;
    run;

    proc gplot data=stan;
      plot surv1*ageaccpt=dead;
    run;