Diagnostics of Cox Proportional Hazards Models

In the Cox model, a baseline hazard function $h_0(t)$ is modified multiplicatively by the covariates (including group indicators), so the hazard function for any individual case is $h(t) = h_0(t)\exp(\beta^T x)$.

Log-Log Survival Plot

Plotting the log-log transformation of the K-M survival curve, namely $\log(-\log S(t))$, versus time $t$ provides a way of assessing the PH model assumption. Since $-\log S(t \mid X) = H(t \mid X)$ is the cumulative hazard, and the PH assumption gives $\log H(t \mid X^*) - \log H(t \mid X) = \beta^T (X^* - X)$, the difference between the two log-log K-M curves should be constant over $t$. This method works when the covariates involved are categorical. Remember that you can always categorize a continuous variable using cut-points. If PH holds, we should get parallel curves for the different groups.

We make this plot first for a data set generated from a PH model and then for a data set that does not satisfy the PH assumption. With fun="cloglog", plot() draws $\log(-\log S(t))$ against time on a log scale.

    library(survival)
    par(mfrow=c(2,1), oma=rep(4,4))
    # "data1" is generated from rtime2(). Note that the default setting for
    # rtime2() generates data from an exp(1) model.
    # z2 is ordinal with 2 categories.
    data1 <- rtime2(N=300, p.censor=0.3)
    data1.surv <- survfit(Surv(time, status) ~ factor(z2), data=data1)
    plot(data1.surv, lty=c(3,2), fun="cloglog", xlab="log(time)",
         ylab="log-log S(t)", main="Exponential (PH) Model")
    legend(10, -3, c("z2=0", "z2=1"), lty=c(3,2))

    # Here "data2" is simulated using rtime3: the bimodal log-normal model
    rtime3 <- function(N=100, beta=c(2, log(2)), mu=1, p.censor=0.3) {
      # w: error term, drawn with equal probability from a normal with
      # mean 1 or mean 3 (sd 0.15 in both cases)
      w <- numeric(N)
      for (i in 1:N) {
        w[i] <- sample(c(rnorm(1, 1, 0.15), rnorm(1, 3, 0.15)), 1, replace=FALSE)
      }
      z1 <- sample(c(0,1), N, replace=TRUE)
      z2 <- runif(N, 0, 1)
      if (p.censor == 0) {
        status <- rep(1, N)
      } else if (0 < p.censor && p.censor < 1) {
        status <- sample(c(0,1), size=N, replace=TRUE, prob=c(p.censor, 1-p.censor))
      } else {
        stop("Wrong argument in p.censor.")
      }
      data.frame(time=exp(mu + beta[1]*z1 + beta[2]*z2 + w),
                 status=status, z1=z1, z2=z2)
    }

    # z1 is binary
    data2 <- rtime3(N=100, beta=c(2, log(2)), mu=1, p.censor=0.3)
    data2.surv <- survfit(Surv(time, status) ~ factor(z1), data=data2)
    # The title is set once here; a separate title(main=...) call would overprint it
    plot(data2.surv, lty=c(3,2), fun="cloglog", xlab="log(time)",
         ylab="log-log S(t)", main="Bimodal Log-Normal (NON-PH) Model")
    legend(400, -3, c("z1=0", "z1=1"), lty=c(3,2))

Residuals

Most diagnostic tools in survival analysis are based on various residuals.

1. The martingale residual, which is the default type returned by resid() for a coxph fit, is used for discovering the correct functional form for a predictor. Each fit below leaves one covariate out, and the martingale residuals are then smoothed against the omitted covariate; a roughly linear trend suggests that the covariate can enter the model linearly. For example, consider the lung cancer data.
    nlung <- na.omit(lung[, c("time", "status", "sex",
                              "ph.ecog", "ph.karno", "pat.karno", "wt.loss")])
    # oma reserves a bottom outer margin for the mtext() caption
    par(mfrow=c(2,2), oma=c(2,0,0,0))
    fit.1 <- coxph(Surv(time, status) ~ strata(sex) +
                   ph.karno + pat.karno + wt.loss, data=nlung)
    scatter.smooth(nlung$ph.ecog, resid(fit.1))
    fit.2 <- coxph(Surv(time, status) ~ strata(sex) +
                   ph.ecog + pat.karno + wt.loss, data=nlung)
    scatter.smooth(nlung$ph.karno, resid(fit.2))
    fit.3 <- coxph(Surv(time, status) ~ strata(sex) +
                   ph.ecog + ph.karno + wt.loss, data=nlung)
    scatter.smooth(nlung$pat.karno, resid(fit.3))
    fit.4 <- coxph(Surv(time, status) ~ strata(sex) +
                   ph.ecog + ph.karno + pat.karno, data=nlung)
    scatter.smooth(nlung$wt.loss, resid(fit.4))
    mtext("Detecting Functional Form with Martingale Residuals", outer=TRUE, side=1)

2. The deviance residual, which is a normalized transform of the martingale residual, can be used for identifying poorly predicted subjects. However, it has been shown that deviance residuals do not work well, and they cannot be recommended.

    fit <- coxph(Surv(time, status) ~ z1 + z2, data=data1)
    plot(resid(fit, type="deviance"), ylab="Deviance Residuals")
    # the exponential model behind data1 is a Weibull model with shape 1
    title(sub="(Data from Weibull Model)")
    abline(h=0)   # horizontal reference line at zero

3. The Cox-Snell residual. As in parametric models, plotting the cumulative hazard function of the Cox-Snell residuals provides a way of checking goodness of fit. If the model fits, the residuals behave like a censored sample from a unit exponential distribution, so the estimated cumulative hazard should follow the 45-degree line.

    # Cox-Snell residual = event indicator - martingale residual
    cox.snell <- data1$status - resid(fit)
    sv <- survfit(Surv(cox.snell, data1$status) ~ 1)
    plot(sv$time, -log(sv$surv), ylab="H.hat(ri)", xlab="ri")
    abline(0, 1)
    title(main="Cumulative Hazards of Cox-Snell Residuals",
          sub="A way of checking goodness of fit for Cox models")

The cox.zph() function tests the PH assumption for each covariate and globally. Here none of the p-values indicates a departure from PH.

    > cox.zph(fit)
               rho chisq     p
    z1      0.0729 1.143 0.285
    z2     -0.0374 0.318 0.573
    GLOBAL      NA 1.479 0.477
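The cox.zph() check can also be run on data that ships with the survival package, which makes it reproducible without the simulation helpers. The following is a minimal sketch, not part of the original analysis: the covariates age, sex and ph.karno and the object names lung_cc, fit_lung and zph are chosen here purely for illustration. Recent releases of the survival package appear to print chisq, df and p columns, so the rho column shown in the output above may not appear.

```r
# Sketch: testing the PH assumption with scaled Schoenfeld residuals.
library(survival)

# Assumption: these three covariates are picked only for illustration.
lung_cc <- na.omit(lung[, c("time", "status", "age", "sex", "ph.karno")])
fit_lung <- coxph(Surv(time, status) ~ age + sex + ph.karno, data = lung_cc)

# Per-covariate and global tests; a small p-value suggests that the
# corresponding coefficient changes over time, i.e. PH is doubtful.
zph <- cox.zph(fit_lung)
print(zph)

# One panel per covariate: smoothed scaled Schoenfeld residuals against
# (transformed) time. A roughly flat curve is consistent with PH.
par(mfrow = c(1, 3))
plot(zph)
```

A systematic trend in one of the panels points to the covariate whose effect varies with time, which is more informative than the global test alone.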