Introduction to Frailty Models

STA635 Project by Benjamin Hall
Cox proportional hazards model
 Model: hyi (t) = h0(t)e1X1+...+ kXk is the hazard function
of the ith individual
 Assumption: The hazard function for each individual
is proportional to the basine hazard, h0(t). This
assumption implies that the hazard function is fully
determined by the covariate vector.
 Problem: There may be unobserved covariates that
cause this assumption to be violated.
Simluation Ex: Unobserved covariate
 Consider a situation with the following population:
Group
Proportion of
Population
Hazard Rate Hazard Rate
with placebo with Drug A
1
40%
1
.5
2
40%
2
1
3
20%
10
8
 Obviously Drug A is effective for the entire population.
 But what happens in the Cox Model if the group is
unobservable?
Simulation Example, continued
Let’s simulate this example and apply the Cox model:
 Have R simulate 100 people according to the previous
table’s probabilities and randomly assign them to
treatment or placebo
 For each person, have R simulate # of incidents within
period of length 1. At the end of period of length 1
right-censoring occurs.
Simulation Example, continued
 Here is some of the data generated by R (see final slide
for code):
id
group
treat
time
status
1
3
0
.38
1
[2,] 1
3
0
.07
1
[3,] 1
3
0
.23
1
[4,] 1
3
0
.15
1
[5,] 1
3
0
.11
1
[6,] 1
3
0
.89
0
[7,] 2
3
1
.22
1
[8,] 2
3
1
.04
1
...
...
...
...
...
[1,]
...
Simulation Example, continued
 Now we run coxph on our data:
> myfit1
Call:
coxph(formula = Surv(time, status) ~ treat)
coef
exp(coef) se(coef) z
p
treat
-0.128
0.88
0.11
-1.17 0.24
Likelihood ratio test=1.37 on 1 df, p=0.242 n= 445
 Notice that the LRT has a p-value of .242 which is not
significant. But we know that treatment is effective for
everyone. What is happening?
Simulation Example, continued
 The problem is that we have heterogeneity in the data
due to the unobservable groups.
 Since we cannot include group in our model, the
assumption of proportional hazards is violated.
 What can we do to solve this problem? Use a frailty
model.
Frailty Model
 Frailty models can help explain the unaccounted for
heterogeneity.
 Frailty Model: hyi (t) = z  h0(t)e1X1+...+ kXk is the
hazard function of the ith individual
 The distribution of z is specified to be, say, Gamma.
(Note: z must be non-negative since the hazard is nonnegative.)
 In this situation, the shared frailty model is
appropriate, that is multiple observations of the same
individual always has the same value of z.
Frailty Model in R
 Let’s apply the frailty model to our simulated data:
> myfit2
Call:
coxph(formula = Surv(time, status) ~ treat + frailty(id))
coef se(coef) se2 Chisq DF
p
treat
-0.147 0.160
0.111 0.85
1.0
3.6e-01
frailty(id)
93.89 43.5 1.4e-05
Iterations: 5 outer, 17 Newton-Raphson
Variance of random effect= 0.294 I-likelihood = -1887.9
Degrees of freedom for terms= 0.5 43.5
Likelihood ratio test=117 on 44.0 df, p=1.37e-08 n= 445
 Notice that the LRT now has a highly significant p-
value.
Frailty Model in R
 Now let’s try implementing the frailty model to a real
data set, the kidnet data set.
 Here are the results for the regular Cox Model:
> kfit1
Call:
coxph(formula = Surv(time, status) ~ age + sex, data = kidney)
coef exp(coef) se(coef)
z
p
age 0.00203 1.002
0.00925 0.220 0.8300
sex -0.82931 0.436
0.29895 -2.774 0.0055
Likelihood ratio test=7.12 on 2 df, p=0.0285 n= 76
 Here the LRT is significant with a p-value of .0285 even
without considering frailty.
Frailty Model in R
 However, a frailty model seems applicable in this
situation since their are multiple oberservations (i.e. 2
kidneys) per person. Below considers frailty:
> kfit2
Call:
coxph(formula = Surv(time, status) ~ age + sex + frailty(id), data = kidney)
coef se(coef)
se2 Chisq DF p
age
0.00525 0.0119 0.0088 0.2 1 0.66000
sex
-1.58749 0.4606 0.3520 11.9 1 0.00057
frailty(id)
23.1 13 0.04000
Iterations: 7 outer, 49 Newton-Raphson
Variance of random effect= 0.412 I-likelihood = -181.6
Degrees of freedom for terms= 0.5 0.6 13.0
Likelihood ratio test=46.8 on 14.1 df, p=2.31e-05 n= 76
 Now the LRT is even more significant.
Resources
 Therneau and Grambsch, Modeling Survival Data,
Chapter 9
 Wienke, Andreas, “Frailty Models”,
http://www.demogr.mpg.de/papers/working/wp2003-032.pdf
 Govindarajulu, “Frailty Models and Other Survival
Models”,
www.ms.uky.edu/~statinfo/nonparconf/govindarajulu.
ppt
R Code
library(survival)
#GEN_TIME
gen_time <- function(group, treat) {
if (group == 1) {
return (round(rexp(1, 1-(.5*treat)),2))}
if (group == 2) {
return (round(rexp(1, 2-treat),2))}
if (group == 3) {
return (round(rexp(1, 10-2*treat),2))}}
# PERSON DATA
person_data <- function() {
treat <- rbinom(1,1,.5)
x <- runif(1)
t1 <- matrix(NA, nrow=1, ncol=25)
if (x < .4) { group <- 1 }
if (x > .4 & x < .8) { group <- 2}
if (x > .8) { group <- 3}
elapse <- 0
count <- 1
while (elapse < 1) {
t1[(count+3)] <- gen_time(group, treat)
elapse <- elapse + t1[(count+3)]
count <- count + 1}
count <- count - 1
t1[1] <- group
t1[2] <- treat
t1[3] <- count
for (i in (count+4):25) { t1[i] <- 0 }
if (count == 1) { t1[count+3] <- 1 }
if (count > 1) { t1[count+3] <- 1-t1[count+2] }
return (t1)}
m1 <- matrix (NA, nrow=100, ncol=25)
for (i in 1:100) {
m1[i,] <- person_data() }
samp_size <- sum(m1[,3])
samp <- matrix(NA, nrow = samp_size, ncol= 5)
colnames(samp) <- c("id", "group", "treat", "time", "status")
count2 <- 1
for (i in 1:100) {
for (j in 1:m1[i,3]) {
samp[count2, 1] <- i
samp[count2, 2] <- m1[i,1]
samp[count2, 3] <- m1[i,2]
samp[count2, 4] <- m1[i,j+3]
samp[count2, 5] <- 1
if(j==m1[i,3]) { samp[count2, 5] <- 0 }
count2 <- count2 + 1}}
myfit1 <- coxph(Surv(samp[,4], samp[,5]) ~ samp[,3])
myfit2 <- coxph(Surv(samp[,4], samp[,5]) ~ samp[,3] + frailty(samp[,1]))
myfit1
myfit2