STA635 Project by Benjamin Hall

Cox proportional hazards model

Model: $h_{y_i}(t) = h_0(t)\, e^{\beta_1 X_1 + \dots + \beta_k X_k}$ is the hazard function of the ith individual.

Assumption: The hazard function for each individual is proportional to the baseline hazard, $h_0(t)$. This assumption implies that the hazard function is fully determined by the covariate vector.

Problem: There may be unobserved covariates that cause this assumption to be violated.

Simulation Example: Unobserved covariate

Consider a situation with the following population:

    Group   Proportion of    Hazard rate     Hazard rate
            population       with placebo    with Drug A
    1       40%              1               0.5
    2       40%              2               1
    3       20%              10              8

Drug A lowers the hazard rate in every group, so it is effective for the entire population. But what happens in the Cox model if the group is unobservable?

Simulation Example, continued

Let's simulate this example and apply the Cox model:

Have R simulate 100 people according to the previous table's probabilities and randomly assign them to treatment or placebo.

For each person, have R simulate the incidents that occur within a period of length 1. Right-censoring occurs at the end of the period.

Simulation Example, continued

Here is some of the data generated by R (see final slide for code):

          id group treat time status
    [1,]   1     3     0 0.38      1
    [2,]   1     3     0 0.07      1
    [3,]   1     3     0 0.23      1
    [4,]   1     3     0 0.15      1
    [5,]   1     3     0 0.11      1
    [6,]   1     3     0 0.89      0
    [7,]   2     3     1 0.22      1
    [8,]   2     3     1 0.04      1
     ...  ...   ...   ...  ...    ...

Simulation Example, continued

Now we run coxph on our data:

    > myfit1
    Call:
    coxph(formula = Surv(time, status) ~ treat)

            coef exp(coef) se(coef)     z    p
    treat -0.128      0.88     0.11 -1.17 0.24

    Likelihood ratio test=1.37  on 1 df, p=0.242  n= 445

Notice that the LRT has a p-value of 0.242, which is not significant. But we know that treatment is effective for everyone. What is happening?

Simulation Example, continued

The problem is that we have heterogeneity in the data due to the unobservable groups. Since we cannot include group in our model, the assumption of proportional hazards is violated.

What can we do to solve this problem?
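As a quick check on the population table in the simulation example, the hazard ratio of Drug A against placebo can be computed within each group. This is only a small sketch in R; the data frame `rates` and its column names are illustrative and do not appear in the original code.

```r
# Hazard rates copied from the population table in the simulation example.
rates <- data.frame(
  group      = 1:3,
  proportion = c(0.4, 0.4, 0.2),
  placebo    = c(1, 2, 10),
  drug       = c(0.5, 1, 8)
)

# Hazard ratio (Drug A vs. placebo) within each group, and its log,
# which is the scale on which coxph reports `coef`.
rates$hazard_ratio <- rates$drug / rates$placebo
rates$log_hr <- log(rates$hazard_ratio)

print(rates)
```

The within-group ratios are 0.5, 0.5 and 0.8: all below 1, but not equal across groups, so no single coefficient on treat describes every individual. The pooled fit reported exp(coef) = 0.88, which is closer to 1 than any of the three.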
To address the heterogeneity left by the unobservable groups, use a frailty model.

Frailty Model

Frailty models can help explain the unaccounted-for heterogeneity.

Frailty model: $h_{y_i}(t) = z\, h_0(t)\, e^{\beta_1 X_1 + \dots + \beta_k X_k}$ is the hazard function of the ith individual.

The distribution of z is specified to be, say, Gamma. (Note: z must be non-negative since the hazard is non-negative.)

In this situation the shared frailty model is appropriate; that is, multiple observations of the same individual always have the same value of z.

Frailty Model in R

Let's apply the frailty model to our simulated data:

    > myfit2
    Call:
    coxph(formula = Surv(time, status) ~ treat + frailty(id))

                  coef se(coef)   se2 Chisq   DF       p
    treat       -0.147    0.160 0.111  0.85  1.0 3.6e-01
    frailty(id)                       93.89 43.5 1.4e-05

    Iterations: 5 outer, 17 Newton-Raphson
         Variance of random effect= 0.294   I-likelihood = -1887.9
    Degrees of freedom for terms=  0.5 43.5
    Likelihood ratio test=117  on 44.0 df, p=1.37e-08  n= 445

Notice that the LRT now has a highly significant p-value. This test is on 44 df and therefore includes the frailty term; the p-value for treat on its own is 0.36.

Frailty Model in R

Now let's try applying the frailty model to a real data set, the kidney data set. Here are the results for the regular Cox model:

    > kfit1
    Call:
    coxph(formula = Surv(time, status) ~ age + sex, data = kidney)

            coef exp(coef) se(coef)      z      p
    age  0.00203     1.002  0.00925  0.220 0.8300
    sex -0.82931     0.436  0.29895 -2.774 0.0055

    Likelihood ratio test=7.12  on 2 df, p=0.0285  n= 76

Here the LRT is significant with a p-value of 0.0285 even without considering frailty.

Frailty Model in R

However, a frailty model seems applicable in this situation since there are multiple observations (two infection recurrence times) per person.
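The clustering that motivates the frailty term can be checked directly. Below is a small sketch, assuming the `kidney` data frame that ships with the survival package (the same one named in the coxph calls on these slides); `obs_per_id` is an illustrative name.

```r
library(survival)

# Number of rows recorded for each patient id in the kidney data.
obs_per_id <- table(kidney$id)

# How many patients contribute 1, 2, ... rows.
print(table(obs_per_id))

# Total number of rows; this should match the n reported by coxph.
print(nrow(kidney))
```

An id that appears on more than one row is what allows frailty(id) to give all rows of the same patient one shared value of z.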
The fit below accounts for frailty:

    > kfit2
    Call:
    coxph(formula = Surv(time, status) ~ age + sex + frailty(id), data = kidney)

                    coef se(coef)    se2 Chisq DF       p
    age          0.00525   0.0119 0.0088   0.2  1 0.66000
    sex         -1.58749   0.4606 0.3520  11.9  1 0.00057
    frailty(id)                            23.1 13 0.04000

    Iterations: 7 outer, 49 Newton-Raphson
         Variance of random effect= 0.412   I-likelihood = -181.6
    Degrees of freedom for terms=  0.5  0.6 13.0
    Likelihood ratio test=46.8  on 14.1 df, p=2.31e-05  n= 76

Now the LRT is even more significant.

Resources

Therneau and Grambsch, Modeling Survival Data, Chapter 9
Wienke, Andreas, "Frailty Models", http://www.demogr.mpg.de/papers/working/wp2003-032.pdf
Govindarajulu, "Frailty Models and Other Survival Models", www.ms.uky.edu/~statinfo/nonparconf/govindarajulu.ppt

R Code

    library(survival)

    # GEN_TIME: draw one exponential gap time (rounded to 2 decimals)
    # at the hazard rate for the given group and treatment arm.
    gen_time <- function(group, treat) {
      if (group == 1) { return(round(rexp(1, 1 - 0.5 * treat), 2)) }
      if (group == 2) { return(round(rexp(1, 2 - treat), 2)) }
      if (group == 3) { return(round(rexp(1, 10 - 2 * treat), 2)) }
    }

    # PERSON DATA: returns one row holding group, treat, the number of
    # intervals, and then the gap times (room for 22 intervals).
    person_data <- function() {
      treat <- rbinom(1, 1, 0.5)
      x <- runif(1)
      t1 <- matrix(NA, nrow = 1, ncol = 25)
      # Group membership with probabilities 0.4, 0.4, 0.2.
      if (x < 0.4) { group <- 1 } else if (x < 0.8) { group <- 2 } else { group <- 3 }
      elapse <- 0
      count <- 1
      # Draw gap times until their total passes the end of the period.
      while (elapse < 1) {
        t1[count + 3] <- gen_time(group, treat)
        elapse <- elapse + t1[count + 3]
        count <- count + 1
      }
      count <- count - 1
      t1[1] <- group
      t1[2] <- treat
      t1[3] <- count
      # Zero out the unused slots (guarded so the index never runs backwards).
      if (count + 4 <= 25) { t1[(count + 4):25] <- 0 }
      # Overwrite the last gap with its censored length: 1 if no event
      # occurred, otherwise 1 minus the previous gap time. This is what
      # gives 0.89 = 1 - 0.11 for id 1 in the data excerpt; it is not
      # 1 minus the total time elapsed before the last event.
      if (count == 1) { t1[count + 3] <- 1 }
      if (count > 1) { t1[count + 3] <- 1 - t1[count + 2] }
      return(t1)
    }

    # One row per person.
    m1 <- matrix(NA, nrow = 100, ncol = 25)
    for (i in 1:100) { m1[i, ] <- person_data() }

    # Reshape to one row per interval.
    samp_size <- sum(m1[, 3])
    samp <- matrix(NA, nrow = samp_size, ncol = 5)
    colnames(samp) <- c("id", "group", "treat", "time", "status")
    count2 <- 1
    for (i in 1:100) {
      for (j in 1:m1[i, 3]) {
        samp[count2, 1] <- i
        samp[count2, 2] <- m1[i, 1]
        samp[count2, 3] <- m1[i, 2]
        samp[count2, 4] <- m1[i, j + 3]
        samp[count2, 5] <- 1
        # The last interval of each person is right-censored.
        if (j == m1[i, 3]) { samp[count2, 5] <- 0 }
        count2 <- count2 + 1
      }
    }
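    # Fit the plain Cox model and the shared frailty model, using the
    # column names so the printed calls read Surv(time, status) ~ treat.
    samp_df <- as.data.frame(samp)
    myfit1 <- coxph(Surv(time, status) ~ treat, data = samp_df)
    myfit2 <- coxph(Surv(time, status) ~ treat + frailty(id), data = samp_df)
    myfit1
    myfit2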
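For comparison, here is a more compact sketch of the same simulation design. It is not the code that produced the output on these slides and it differs from it in several ways, all of which are assumptions of this sketch: the seed is arbitrary, groups are drawn with sample(), gap times are not rounded, and the censored final interval is the time remaining in the period (1 minus the sum of the earlier gaps). The names `event_rate`, `simulate_person`, `sim`, `fit_cox` and `fit_frailty` are illustrative. The gamma distribution is written out explicitly in frailty(), although it is the default in the survival package.

```r
library(survival)

set.seed(635)  # arbitrary seed, added here only for reproducibility

# Hazard rate for one person, taken from the population table.
event_rate <- function(group, treat) {
  placebo <- c(1, 2, 10)
  drug    <- c(0.5, 1, 8)
  if (treat == 1) drug[group] else placebo[group]
}

# Simulate one person's gap times over a period of length 1.
simulate_person <- function(id) {
  group <- sample(1:3, size = 1, prob = c(0.4, 0.4, 0.2))
  treat <- rbinom(1, 1, 0.5)
  rate  <- event_rate(group, treat)

  gaps <- numeric(0)
  elapsed <- 0
  repeat {
    gap <- rexp(1, rate)
    if (elapsed + gap >= 1) break   # next event would fall after the period
    gaps <- c(gaps, gap)
    elapsed <- elapsed + gap
  }

  # One row per observed event, plus a final right-censored row whose
  # length is the time remaining in the period.
  data.frame(
    id     = id,
    group  = group,
    treat  = treat,
    time   = c(gaps, 1 - elapsed),
    status = c(rep(1, length(gaps)), 0)
  )
}

sim <- do.call(rbind, lapply(1:100, simulate_person))

# Plain Cox model, then a shared gamma frailty per person.
fit_cox     <- coxph(Surv(time, status) ~ treat, data = sim)
fit_frailty <- coxph(Surv(time, status) ~ treat +
                       frailty(id, distribution = "gamma"), data = sim)

print(fit_cox)
print(fit_frailty)
```

With this construction each person's interval lengths sum to exactly 1, so every row of `sim` lies inside the observation period. The fitted coefficients will differ from those shown on the slides because the random draws and the censoring rule are different.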