Regression with Frailty in Survival analysis
Kyuson Lim
November 26, 2021
STATS 756
Contents
1 Acknowledgement
2 Introduction
  2.1 Understanding the concept of cox-proportional hazard model
    2.1.1 Parametric cox model with frailty term
3 Partial Likelihood
  3.1 Breslow Partial likelihood
    3.1.1 Example for computing Partial likelihood
    3.1.2 Penalized Partial Likelihood (PPL)
  3.2 Newton-Raphson Method
    3.2.1 Newton-Raphson algorithm - example
4 Simulation for data
  4.1 Simulation study: Infection in Kidney patients
  4.2 Simulation study: testing the model fit
  4.3 Extensions
    4.3.1 Generalized gamma frailty model
5 Appendix: R codes
Chapter 1
Acknowledgement
The purpose of this report is solely the interpretation and analysis of ‘Regression with Frailty in Survival Analysis’ by McGilchrist and Aisbett (1991). Note that the concepts of the penalized partial likelihood and the Newton-Raphson algorithm, used to obtain the maximizing coefficients in the Cox model, extend to the paper ‘Estimation in generalized mixed models’ by McGilchrist (1994).
Moreover, the original dataset used for the analysis is included in the R package ‘survival’, from which it is imported to be re-assessed. Further details of the original dataset, Infections in kidney patients, can be found in the R documentation of ‘survival’ (page 55). This report also restates the specification of the dataset, including the outlier, the model specification, and the code used with the ‘survival’ package.
The examples and code are adapted from the textbook ‘Applied Survival Analysis Using R’ by Dirk F. Moore, which provides the graph visualization of the Newton-Raphson optimization and guidance for the estimation process. Combined with the textbook ‘Frailty Models in Survival Analysis’ by Andreas Wienke, the frailty terms are defined, and the log-likelihood and the penalized partial likelihood are derived. The interpretation of the original paper and of the optimization method follows the definitions used in these textbooks.
Finally, the report briefly extends to the paper ‘Generalized gamma frailty model’ by Professor Dr. Balakrishnan, which builds on the ideas and methods of the original paper. This paper is highly regarded for specifying a parametric distribution for the frailty term, covering the lognormal and Weibull frailty models as well as the generalized gamma distribution. The comparison of model performance across these distributions, together with the Newton-Raphson algorithm, is used as an extension of the original paper.
I am grateful for all the textbooks and guidelines used in writing this report on the Cox model with frailty terms for the course STATS 756. I would also like to thank Professor Dr. Balakrishnan for supporting me in learning the ideas of survival analysis and censoring, and in writing this report.
Chapter 2
Introduction
To begin with, some concepts and the relationship between the survival function and the hazard function are explained. The Cox proportional-hazards model (Cox, 1972) is a regression model commonly used in medical research for investigating the association between the survival time of patients and one or more predictor variables.
A graph of survival analysis
The empirical hazard function is a step function evaluated at each event time. The survival probability $S(t)$ is the probability that an individual survives from the time origin to a specified future time $t$. The hazard $h(t)$ is the instantaneous rate at which an individual who is under observation at time $t$ experiences the event at that time.
These two functions are modeled as follows. For a continuous survival time $T$, the survival function decreases as $t$ goes from $0$ to $\infty$, and the distribution function is $F(t) = P(T \le t)$. Hence,
$$S(t) = P(T > t) = \int_t^{\infty} f(u)\,du = 1 - F(t).$$
For survival time $T$, the hazard function $h(t) = f(t)/S(t)$ describes the instantaneous rate of the event just after time $t$ (as $\Delta t \to 0$), so that
$$h(t) = \frac{f(t)}{1 - F(t)} = -\frac{d}{dt}\log S(t), \qquad H(t) = \int_0^t h(u)\,du,$$
where $H(t)$ is the cumulative hazard.
Therefore,
$$H(t) = -\log S(t) \qquad \text{and} \qquad S(t) = \exp(-H(t)).$$
If any one of $h(t)$, $H(t)$ and $S(t)$ is known, the other two are easily determined.
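As a quick numerical check of these identities, the R sketch below uses a Weibull distribution with a hypothetical shape of 1.5 and scale of 10 (values chosen only for illustration, not taken from the kidney data). It compares $H(t)$ obtained by integrating $h(u)$ with $-\log S(t)$, and $h(t)$ with $f(t)/S(t)$.

```r
# Hypothetical Weibull survival time: shape k = 1.5, scale lambda = 10
k <- 1.5
lambda <- 10
h <- function(t) (k / lambda) * (t / lambda)^(k - 1)                        # hazard h(t)
S <- function(t) pweibull(t, shape = k, scale = lambda, lower.tail = FALSE)  # S(t) = 1 - F(t)
f <- function(t) dweibull(t, shape = k, scale = lambda)                      # density f(t)

t0 <- 7
H_int <- integrate(h, lower = 0, upper = t0)$value   # H(t) = integral of h(u) from 0 to t
H_log <- -log(S(t0))                                 # H(t) = -log S(t)
print(c(H_int = H_int, H_log = H_log))
print(c(S = S(t0), exp_minus_H = exp(-H_int)))       # S(t) = exp(-H(t))
print(c(h = h(t0), f_over_S = f(t0) / S(t0)))        # h(t) = f(t) / S(t)
```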
2.1
Understanding the concept of cox-proportional hazard model
The paper ‘Regression with frailty in survival analysis’ analyzes data on infections in kidney patients. Figure 2.1 shows the survival curve, computed in R, for the 76 infection times recorded for the 38 kidney patients in the paper (two per patient).
Figure 2.1: A graph of survival analysis for Kidney data
As noted above, the survival function is the probability of surviving up to a time $t$, $S(t) = P(T > t)$. The hazard function is the instantaneous failure rate: given that a subject has survived up to time $t$, it describes failure in the next small interval of time,
$$h(t) = \lim_{\delta \to 0} \frac{P(t < T < t + \delta \mid T > t)}{\delta}.$$
In this report, the goals for understanding the Cox proportional hazards model with frailty terms include the following.
• The Cox proportional hazards model analyzes several influential covariates simultaneously, where $h_0(t)$ is the non-parametric part and $\exp(\boldsymbol\beta'\mathbf x_i)$ is the parametric part.
• A regression model is fit to censored survival data, and the partial likelihood allows us to compare two groups of survival data.
• The estimates are derived by maximizing the log partial likelihood in $\boldsymbol\beta$ with the Newton-Raphson algorithm.
Standard methods assume a single endpoint, such as a single cause of death, and survival times that are independent across cases. Such methods are not sufficient if cases are not independent or if the event can occur repeatedly. The proportional hazards model is
$$h(t \mid \mathbf x_i) = h_0(t)\exp(\boldsymbol\beta'\mathbf x_i(t)), \qquad \boldsymbol\beta' = (\beta_1, \ldots, \beta_p).$$
The baseline hazard function $h_0(t)$ reflects the underlying hazard for subjects with all covariates $x_1, \ldots, x_p$ equal to 0, and a cluster-specific random effect acts on this baseline hazard as a relative effect. The baseline hazard is left unspecified (the non-parametric part), while $\exp(\boldsymbol\beta'\mathbf x_i)$ is the parametric part, which makes the Cox proportional hazards model semi-parametric.
For a single binary covariate with $x_i = 1$ for the treated group and $x_i = 0$ for the control group, the hazard rate for the treated group is $h_1(t) = h(t \mid x_i = 1) = h_0(t)\exp(\beta)$. The ratio of the two hazards is the constant $\exp(\beta)$, which does not depend on time $t$, so the hazards of the two groups remain proportional over time:
$$\frac{h_1(t)}{h_0(t)} = \exp(\beta).$$
Moreover, consider two patients whose covariates are identical except for the $k$th, with $x_{1k} = 1$ and $x_{2k} = 0$. As explained in the lecture, their relative hazard is
$$\frac{h_1(t \mid \mathbf x_1)}{h_2(t \mid \mathbf x_2)} = \frac{h_0(t)\exp(\boldsymbol\beta'\mathbf x_1)}{h_0(t)\exp(\boldsymbol\beta'\mathbf x_2)} = \frac{\exp(\beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k \cdot 1 + \cdots + \beta_p x_{1p})}{\exp(\beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k \cdot 0 + \cdots + \beta_p x_{2p})} = \frac{\exp(\beta_k)}{\exp(0)} = \exp(\beta_k).$$
Hence, the Cox proportional hazards model is a linear model for the log of the hazard ratio.
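A small numerical sketch of this cancellation in R, with hypothetical coefficients and covariate values (not estimates from the paper): the two patients differ only in the third covariate, so the baseline hazard cancels and the ratio equals $\exp(\beta_3)$.

```r
beta <- c(0.02, -0.5, 0.8)   # hypothetical beta_1, beta_2, beta_3
x1 <- c(60, 1, 1)            # patient 1: x_13 = 1
x2 <- c(60, 1, 0)            # patient 2: same covariates except x_23 = 0
h0 <- 0.1                    # arbitrary baseline hazard value h_0(t); it cancels
hr <- (h0 * exp(sum(beta * x1))) / (h0 * exp(sum(beta * x2)))
print(c(hazard_ratio = hr, exp_beta_3 = exp(beta[3])))
print(log(hr))               # the log hazard ratio is linear in beta: here beta_3
```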
A graph of relative hazard ratio for each coefficient
The relative hazard ratio of each predictor in the paper is computed as follows. The type of disease is found to influence the relative hazard ratio, and sex is also a predictor to be considered. The estimated hazard ratios are about 1 for age, 0.23 for sex, 1.09 for the GN disease type, 1.42 for AN and 0.24 for PKD.
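These hazard ratios are the exponentiated Cox coefficients. As a check, the R lines below exponentiate the rounded coefficients of the model `kfit` as printed in Section 4.1; with the fitted object available, `exp(coef(kfit))` would give the same values up to rounding.

```r
# Rounded coefficients of kfit, copied from the output in Section 4.1
coefs <- c(age = 0.003, sex = -1.483, diseaseGN = 0.088,
           diseaseAN = 0.351, diseasePKD = -1.431)
hr <- round(exp(coefs), 2)   # hazard ratio = exp(beta)
print(hr)
```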
2.1.1
Parametric cox model with frailty term
In clustered data, the survival times of individuals in the same unit or family tend to be more similar to each other than to those from other clusters, so the independence assumption no longer holds.
One way to accommodate such structure is to assign each individual in a cluster a common factor known as a frailty, or random effect. The random effect is incorporated to capture within-cluster homogeneity in outcomes.¹
Shared frailty model: $h(t \mid \mathbf x_{ij}) = h_0(t)\exp(\boldsymbol\beta'\mathbf x_{ij}(t) + w_i)$
Note that $w_i$ is the random effect for the $i$th cluster; it is common to all individuals in that cluster and varies across clusters, so subjects in the same cluster all share the same frailty factor.
A frailty model refers to a survival model with only a random intercept. The frailty term is the random effect, and it follows some distribution such as the log-normal, gamma, or Weibull distribution. In the paper, the log-frailties are assumed to be normally distributed with $E(\log w_i) = 0$ and $\operatorname{Var}(\log \mathbf w) = \sigma^2(\mathbf I - M^{-1}\mathbf 1\mathbf 1')$.²
¹ The random effect increases or decreases the hazard for a distinct cluster.
² $w_i = 1$, $\mathbf 1'\mathbf u = 0$.
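To make the shared frailty model concrete, the R sketch below simulates clustered survival times under $h(t \mid x_{ij}) = h_0(t)\exp(\beta x_{ij} + w_i)$, assuming a constant baseline hazard, normally distributed $w_i$, and hypothetical values for the number of clusters, the cluster size, $\beta$, $\sigma^2$ and the censoring rate. Every member of cluster $i$ shares the same $w_i$.

```r
set.seed(1)
n_clusters <- 38                  # hypothetical number of clusters
m <- 2                            # hypothetical cluster size
beta <- -1                        # hypothetical covariate effect
sigma2 <- 0.5                     # hypothetical variance of the random effect w_i
lambda0 <- 0.01                   # hypothetical constant baseline hazard h_0(t)

id <- rep(seq_len(n_clusters), each = m)                 # cluster label of each subject
w <- rnorm(n_clusters, mean = 0, sd = sqrt(sigma2))      # one w_i per cluster
x <- rbinom(n_clusters * m, size = 1, prob = 0.5)        # binary covariate x_ij
rate <- lambda0 * exp(beta * x + w[id])                  # h_0 * exp(beta x_ij + w_i)
event_time <- rexp(length(rate), rate = rate)            # exponential times with that hazard
cens_time <- rexp(length(rate), rate = 0.005)            # hypothetical independent censoring
time <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)
sim <- data.frame(id, x, time, status, frailty = w[id])
head(sim)
```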
Chapter 3
Partial Likelihood
The goal is to estimate $\boldsymbol\beta$ without depending on $h_0(t)$. Let $t_{(1)} < \cdots < t_{(r)}$ be the ordered death times of $r$ individuals. First, we define the risk set $R(t_{(j)})$ to be the group of individuals who are alive and uncensored just prior to $t_{(j)}$.
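Hence, the probability that individual $i$ dies at $t_{(j)}$, given that one death from the risk set $R(t_{(j)})$ occurs at $t_{(j)}$, is
$$\begin{aligned}
P(\text{individual } i \text{ dies at } t_{(j)} \mid \text{one death from } R(t_{(j)}) \text{ at } t_{(j)}) &= \frac{P(\text{individual } i \text{ dies at } t_{(j)})}{P(\text{one death at } t_{(j)})} \\
&= \frac{h_i(t_{(j)} \mid \mathbf x_i)}{\sum_{k \in R(t_{(j)})} h_k(t_{(j)} \mid \mathbf x_k)} = \frac{h_0(t_{(j)})\exp(\boldsymbol\beta'\mathbf x_i)}{\sum_{k \in R(t_{(j)})} h_0(t_{(j)})\exp(\boldsymbol\beta'\mathbf x_k)} = \frac{\exp(\boldsymbol\beta'\mathbf x_i)}{\sum_{k \in R(t_{(j)})} \exp(\boldsymbol\beta'\mathbf x_k)}.
\end{aligned}$$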
The partial likelihood differs from the full likelihood in that its factors are conditional probabilities; in a frailty model, the frailties are in addition latent variables that are unobserved.
The partial likelihood is then the product of these conditional probabilities over the $r$ observed deaths among the $n$ subjects:
$$L(\boldsymbol\beta) = \prod_{j=1}^{r} \frac{\exp(\boldsymbol\beta'\mathbf x_i)}{\sum_{k \in R(t_{(j)})} \exp(\boldsymbol\beta'\mathbf x_k)}.$$
Notice that $\mathbf x_i$ is the vector of covariates for the individual $i$ who dies at $t_{(j)}$. As the equation shows, individuals who have already died or been censored before $t_{(j)}$ are not counted in the denominator.
The partial likelihood allows the baseline survival distribution to remain unspecified while still describing how the survival distributions of subjects depend on their covariates. The maximizing $\boldsymbol\beta$ can be determined by taking the log of the equation.
3.1
Breslow Partial likelihood
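The likelihood can also be expressed with the hazard function $h(t) = f(t)/S(t)$. Note that the density $f$ enters only for uncensored individuals; censored individuals contribute only through the survival function $S$:
$$L(\boldsymbol\beta) = \prod_{i=1}^{n} f(t_i, \boldsymbol\beta)^{\delta_i} S(t_i, \boldsymbol\beta)^{1-\delta_i} = \prod_{i=1}^{n} h(t_i, \boldsymbol\beta)^{\delta_i} S(t_i, \boldsymbol\beta), \qquad h(t, \boldsymbol\beta) = h_0(t)\exp(\boldsymbol\beta'\mathbf x_i).$$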
As one of the simplest methods, the Breslow approximation adjusts both terms of the marginal method so that they have the same denominator, corresponding to all subjects at risk:
$$L(\boldsymbol\beta) = \prod_{i=1}^{n} \left\{\frac{\exp(\boldsymbol\beta'\mathbf x_i)}{\sum_{k \in R(t_i)} \exp(\boldsymbol\beta'\mathbf x_k)}\right\}^{\delta_i}.$$
Note that $t_1, \ldots, t_n$ are the observed survival times of the $n$ individuals, and $\delta_i$ is the event indicator:
$$\delta_i = \begin{cases} 0 & \text{if patient } i \text{ is censored,} \\ 1 & \text{if patient } i \text{ dies.} \end{cases}$$
Thus, the partial likelihood is written as a product of terms for each individual, as opposed to each failure time.
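$$\begin{aligned}
\log L(\boldsymbol\beta) &= \log \prod_{i=1}^{n} \left\{\frac{\exp(\boldsymbol\beta'\mathbf x_i)}{\sum_{k \in R(t_i)} \exp(\boldsymbol\beta'\mathbf x_k)}\right\}^{\delta_i} \\
&= \sum_{i=1}^{n} \delta_i \log \frac{\exp(\boldsymbol\beta'\mathbf x_i)}{\sum_{k \in R(t_i)} \exp(\boldsymbol\beta'\mathbf x_k)} \\
&= \sum_{i=1}^{n} \delta_i \log\bigl(\exp(\boldsymbol\beta'\mathbf x_i)\bigr) - \sum_{i=1}^{n} \delta_i \log \sum_{k \in R(t_i)} \exp(\boldsymbol\beta'\mathbf x_k) \\
&= \sum_{i=1}^{n} \delta_i \boldsymbol\beta'\mathbf x_i - \sum_{i=1}^{n} \delta_i \log \sum_{k \in R(t_i)} \exp(\boldsymbol\beta'\mathbf x_k)
\end{aligned} \tag{e1}$$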
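Equation (e1) translates directly into code. The R sketch below (one covariate, with a hypothetical function name `log_partial_lik`) sums $\delta_i \beta x_i - \delta_i \log \sum_{k \in R(t_i)} \exp(\beta x_k)$ over the individuals; because each event uses the full risk set $\{k : t_k \ge t_i\}$, tied event times would be handled in Breslow's way. It is checked on the six-patient data of Section 3.1.1, coding patients 1, 2 and 4 as control ($x = 0$) and patients 3, 5 and 6 as treated ($x = 1$), as in step 2 of that example.

```r
# Log partial likelihood (e1) for a single covariate x
log_partial_lik <- function(beta, time, status, x) {
  eta <- beta * x                          # linear predictor beta * x_i
  ll <- 0
  for (i in which(status == 1)) {          # only delta_i = 1 contributes
    at_risk <- time >= time[i]             # risk set R(t_i)
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

time   <- c(6, 7, 10, 15, 19, 25)
status <- c(1, 0, 1, 1, 0, 1)
x      <- c(0, 0, 1, 0, 1, 1)              # C = 0, T = 1

# Closed form l(beta) from step 6 of the example
closed_form <- function(b) b - log(3 * exp(b) + 3) - log(3 * exp(b) + 1) - log(2 * exp(b) + 1)

print(log_partial_lik(0, time, status, x))      # equals -log(72), about -4.277
print(c(general = log_partial_lik(-1.3, time, status, x), closed = closed_form(-1.3)))
```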
The partial likelihood above is valid only when no two subjects have the same event time; with tied event times, Breslow's form is an approximation. Variation in the hazard rate can be attributed to dependence on the risk variables or to frailty terms, and the frailty is the random component. The computation of the partial likelihood in a specific case is illustrated in the following example.
3.1.1
Example for computing Partial likelihood
Now, consider a simple example of 6 patients in two groups, treatment (T) and control (C). At time 0, all 6 patients are at risk of experiencing an event; this group of patients forms the initial risk set $R_1$.

Patient    1   2   3   4   5   6
Survtime   6   7  10  15  19  25
Censor     1   0   1   1   0   1
Group      C   C   T   C   T   T
1. Before the first failure at time $t = 6$, all 6 patients are at risk and any of them could experience the event.
2. By group, for the control and treatment groups we have $\exp(x_1\beta) = \exp(x_2\beta) = \exp(x_4\beta) = 1$ and $\exp(x_3\beta) = \exp(x_5\beta) = \exp(x_6\beta) = \exp(\beta)$.
3. Substituting into $p_1 = \dfrac{h_0(t_1)\exp(x_i\beta)}{\sum_{k \in R_1} h_0(t_1)\exp(x_k\beta)}$, where $h_0(t_1)$ is the hazard for a subject from the control group, yields $p_1 = \dfrac{1 \cdot h_0(t_1)}{3h_0(t_1)\exp(\beta) + 3h_0(t_1)} = \dfrac{1}{3\exp(\beta) + 3}$.
4. At time 7, a control patient drops out (is censored). At $t = 10$, $p_2 = \dfrac{\exp(\beta)}{3\exp(\beta) + 1}$, and at time $t = 15$, three patients are at risk, giving $p_3 = \dfrac{1}{2\exp(\beta) + 1}$.
5. At the last event, $t = 25$, one subject is at risk, so $p_4 = 1$, and the partial likelihood is the product of all the terms, $L(\beta) = \dfrac{\exp(\beta)}{(3\exp(\beta) + 3)(3\exp(\beta) + 1)(2\exp(\beta) + 1)}$.
6. Taking the log transformation, $l(\beta) = \beta - \log(3\exp(\beta) + 3) - \log(3\exp(\beta) + 1) - \log(2\exp(\beta) + 1)$.
We can put the example into R code to estimate $\beta$ easily. Note that we maximize the partial likelihood to obtain the estimate of $\beta$.
> plsimple <- function(beta) {
+   psi <- exp(beta)
+   result <- log(psi) - log(3*psi + 3) - log(3*psi + 1) - log(2*psi + 1)
+   result }
> result <- optim(par=0, fn = plsimple, method = "L-BFGS-B",
+                 control=list(fnscale = -1), lower = -3, upper = 1)
> result$par
[1] -1.326129
The maximum partial likelihood estimate is $\hat\beta = -1.326$, which can also be found from the plot of $l(\beta)$ versus $\beta$.
Maximum partial likelihood estimate by Newton-Raphson algorithm
The solid black curve is a plot of the log partial likelihood over a range of values of $\beta$. The maximum is indicated by the vertical dashed blue line, and the value of the log partial likelihood at that point is $-3.672$. The value of the log partial likelihood at the null hypothesis value, $\beta = 0$, is $-4.277$.
The tangent to the 𝑙 (𝛽) curve at 𝛽 = 0 is shown by the straight red line. Its slope is the
derivative of the log-likelihood evaluated at 𝛽 = 0.
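The two values read off the plot and the slope of the tangent can be reproduced from the closed form $l(\beta)$ of the example. In the R sketch below, `score` is the analytic derivative $l'(\beta) = 1 - \frac{3e^\beta}{3e^\beta + 3} - \frac{3e^\beta}{3e^\beta + 1} - \frac{2e^\beta}{2e^\beta + 1}$ (derived here for illustration, not given in the source). At $\beta = 0$ it equals $-11/12 \approx -0.917$, a negative slope consistent with the maximum lying at a negative $\beta$.

```r
l <- function(b) b - log(3 * exp(b) + 3) - log(3 * exp(b) + 1) - log(2 * exp(b) + 1)
score <- function(b) {                    # dl/dbeta, slope of the tangent
  p <- exp(b)
  1 - 3 * p / (3 * p + 3) - 3 * p / (3 * p + 1) - 2 * p / (2 * p + 1)
}
print(round(c(l_null = l(0), l_max = l(-1.326129), slope_at_0 = score(0)), 3))
```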
3.1.2
Penalized Partial Likelihood (PPL)
Taking the log, the random effects are treated as a penalty term, as in the Best Linear Unbiased Prediction (BLUP) approach for generalized linear models. Previously, we defined the log partial likelihood as two terms in the unknown $\boldsymbol\beta$ (equation (e1)); this extends to the joint likelihood of the parameters $\theta$, $\boldsymbol\beta$ and $\mathbf w$, again in two separate parts. The full log-likelihood is
$$l_{full}(h_0(\cdot), \theta, \boldsymbol\beta) = \log f(\mathbf x, \mathbf u \mid h_0(\cdot), \theta, \boldsymbol\beta) = \log f(\mathbf x \mid h_0(\cdot), \boldsymbol\beta, \mathbf u) + \log f(\mathbf u \mid \theta) = l_{full,1}(h_0(\cdot), \boldsymbol\beta) + l_{full,2}(\theta).$$
Maximization of the PPL is a double iterative process that alternates between an inner loop, which maximizes the penalized partial likelihood in $(\boldsymbol\beta, \mathbf w)$ for a fixed $\theta$, and an outer loop, which updates $\theta$, until convergence. The penalty term penalizes random effects that are far from their mean value 0 by reducing the penalized partial likelihood. If $\log w_i$ is $N(0, \sigma^2 D)$, where $D$ is a known matrix, then BLUP consists of maximizing the combination of two log-likelihoods
$$l_{PPL}(\theta, \boldsymbol\beta, \mathbf w) = l_{part}(\boldsymbol\beta, \mathbf w) - l_{penalize}(\theta, \mathbf w).$$
Here $l_{part}(\boldsymbol\beta, \mathbf w)$ is the conditional log-likelihood of the data given the frailties, and $l_{penalize}(\theta, \mathbf w)$ comes from the distribution of the frailties. The combination is termed a penalized likelihood in the sense that $l_{penalize}$ is a penalty function for the conditional log-likelihood $l_{part}$. Note that this procedure is also specified in the paper ‘Estimation in generalized mixed models’.
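A minimal R sketch of $l_{PPL}$ for a Gaussian frailty, assuming (as an illustration, not as the paper's exact formulation) that $l_{penalize}(\theta, \mathbf w) = \sum_i w_i^2 / (2\theta)$, i.e. the normal log-density of $\mathbf w$ with variance $\theta$ up to an additive constant, and that $w_i$ enters the linear predictor of every member of cluster $i$. The data, cluster labels and values of $\mathbf w$ are hypothetical.

```r
# Conditional log partial likelihood l_part(beta, w) with a cluster effect w[cluster]
l_part <- function(beta, w, time, status, x, cluster) {
  eta <- beta * x + w[cluster]
  ll <- 0
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]               # risk set R(t_i)
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

# Penalized partial log-likelihood: l_PPL = l_part - l_penalize
l_ppl <- function(beta, w, theta, time, status, x, cluster) {
  l_part(beta, w, time, status, x, cluster) - sum(w^2) / (2 * theta)
}

# Hypothetical toy data: 6 subjects in 3 clusters of 2
time    <- c(6, 7, 10, 15, 19, 25)
status  <- c(1, 0, 1, 1, 0, 1)
x       <- c(0, 0, 1, 0, 1, 1)
cluster <- c(1, 1, 2, 2, 3, 3)
w       <- c(0.3, -0.1, -0.2)

print(c(l_part = l_part(-1, w, time, status, x, cluster),
        l_ppl  = l_ppl(-1, w, theta = 0.5, time, status, x, cluster)))
```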
When a Cox model with random shared frailty terms is fit, one can use the median hazard ratio as a measure of the magnitude of the effect of clustering on the hazard of the outcome. For $l_{part}$, the Newton-Raphson method uses local quadratic approximations together with the penalty term: it iterates to estimate $\boldsymbol\beta$ and $w_i$, using the derivatives of the likelihood and the variance matrix $\mathbf V$.¹
3.2
Newton-Raphson Method
The Newton-Raphson algorithm originates from the Taylor series of $f$ about some point $x_k$,
$$f(x) \approx f(x_k) + (x - x_k) f'(x_k) + \frac{1}{2!}(x - x_k)^2 f''(x_k) + \cdots + \frac{1}{n!}(x - x_k)^n f^{(n)}(x_k),$$
and systems of non-linear equations are solved by this procedure. The Newton-Raphson method keeps only the first two terms of the expansion,
$$f(x) \approx f(x_k) + (x - x_k) f'(x_k),$$
and assumes that $x = x_{k+1}$ is the solution of $f(x) = 0$, so that $0 = f(x_k) + (x_{k+1} - x_k) f'(x_k)$, which rearranges to $x_{k+1} = x_k - f(x_k)/f'(x_k)$. In its basic form, the method solves a single non-linear equation in one variable; here we approximate a root of $f(\beta) = 0$ as follows.
¹ If $\mathbf V$ is replaced by $E(\mathbf V)$, then the iterative procedure becomes the method of scoring.
1. Start with an initial value $\beta^{(0)}$ of $\beta$.
2. Take the first-order linear approximation of $f$ at $\beta^{(0)} + h$: $f(\beta^{(0)} + h) \approx f(\beta^{(0)}) + h f'(\beta^{(0)})$.
3. Solve for the updated value $\beta^{(1)} = \beta^{(0)} + h$ such that $f(\beta^{(1)}) = 0$: $h = -\{f'(\beta^{(0)})\}^{-1} f(\beta^{(0)})$, and thus $\beta^{(1)} = \beta^{(0)} - \{f'(\beta^{(0)})\}^{-1} f(\beta^{(0)})$.
4. Iterate until the process converges, $\beta^{(k+1)} \approx \beta^{(k)}$.
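The four steps can be applied to the six-patient example of Section 3.1.1 by taking $f$ to be the score $l'(\beta)$, whose root is the maximum partial likelihood estimate. In the R sketch below, `newton_raphson` is a hypothetical helper, and the first and second derivatives of $l(\beta)$ were worked out by hand for this illustration. Starting from $\beta^{(0)} = 0$, it should agree with the value $-1.326$ found with `optim`.

```r
score <- function(b) {                        # f(beta) = l'(beta)
  p <- exp(b)
  1 - 3 * p / (3 * p + 3) - 3 * p / (3 * p + 1) - 2 * p / (2 * p + 1)
}
score_deriv <- function(b) {                  # f'(beta) = l''(beta)
  p <- exp(b)
  -(9 * p / (3 * p + 3)^2 + 3 * p / (3 * p + 1)^2 + 2 * p / (2 * p + 1)^2)
}

newton_raphson <- function(f, fprime, x0, tol = 1e-10, max_iter = 50) {
  x <- x0                                     # step 1: initial value
  for (k in seq_len(max_iter)) {
    x_new <- x - f(x) / fprime(x)             # step 3: x - {f'(x)}^{-1} f(x)
    if (abs(x_new - x) < tol)                 # step 4: stop when beta^(k+1) ~ beta^(k)
      return(list(root = x_new, iterations = k))
    x <- x_new
  }
  stop("Newton-Raphson did not converge")
}

fit <- newton_raphson(score, score_deriv, x0 = 0)
print(fit)
```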
Generalized linear models (e.g. Poisson and logistic regression) use the same iterative method to estimate their coefficients. When there is dependence on frailty terms, the distribution of the frailties is also obtained. The Newton-Raphson procedure converges if there is sufficient variation in the measured risk variables within each patient.
3.2.1
Newton-Raphson algorithm - example
The goal is to produce successively better approximations to the roots of a real-valued function:
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}, \quad \ldots, \quad x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
For example, let $f(x) = x^2 - a$ with $a = 612$, so that $f'(x) = 2x$ and the root is $\sqrt{612}$. The initial guess is $x_0 = 10$, and we iterate until the difference between successive values is small:
$$\begin{aligned}
x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)} = 10 - \frac{10^2 - 612}{2 \times 10} = 35.6 \\
x_2 &= x_1 - \frac{f(x_1)}{f'(x_1)} = 35.6 - \frac{35.6^2 - 612}{2 \times 35.6} = 26.3955 \\
x_3 &= \cdots = 24.7906 \\
x_4 &= \cdots = 24.7387 \\
x_5 &= \cdots = 24.738633753
\end{aligned}$$
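The same idea extends to a system of non-linear equations in the vector $\boldsymbol\beta$. Using the Jacobian matrix $J$ of the vector of equations $\mathbf V(\boldsymbol\beta)$,
$$\boldsymbol\beta_k = \boldsymbol\beta_{k-1} - J(\boldsymbol\beta_{k-1})^{-1}\mathbf V(\boldsymbol\beta_{k-1}).$$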
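With initial estimates $\boldsymbol\beta_0$ and $\mathbf w_0$, the goal is to estimate $\boldsymbol\beta$ iteratively with the PPL. The log-likelihood is approximately quadratic in the region of the true values. As found above, maximizing $l_{part} - l_{penalize}$ gives the estimators, with joint log-likelihood $l_{PPL}$; one Newton-Raphson update is
$$\begin{pmatrix} \hat{\boldsymbol\beta} \\ \hat{\mathbf w} \end{pmatrix} = \begin{pmatrix} \boldsymbol\beta_0 \\ \mathbf w_0 \end{pmatrix} + \mathbf V \begin{pmatrix} \partial l_{part}/\partial\boldsymbol\beta_0 \\ \partial l_{part}/\partial\mathbf w_0 \end{pmatrix} - \mathbf V \begin{pmatrix} \mathbf 0 \\ \sigma^{-2}\mathbf w_0 \end{pmatrix}, \qquad \mathbf V^{-1} = \begin{pmatrix} -\partial^2 l_{part}/\partial\boldsymbol\beta\,\partial\boldsymbol\beta' & -\partial^2 l_{part}/\partial\boldsymbol\beta\,\partial\mathbf w' \\ -\partial^2 l_{part}/\partial\mathbf w\,\partial\boldsymbol\beta' & -\partial^2 l_{part}/\partial\mathbf w\,\partial\mathbf w' + \sigma^{-2}\mathbf I \end{pmatrix}.$$
The variance matrix $\mathbf V$ is taken to be $\sigma^2(\mathbf I - M^{-1}\mathbf 1\mathbf 1')$, with $\mathbf 1'\mathbf V = \mathbf 0$. So $(\hat{\boldsymbol\beta}, \hat{\mathbf w})$ has approximately a joint normal distribution with mean $(\boldsymbol\beta, \mathbf w)$ and variance matrix $\mathbf V$.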
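A compact R sketch of this joint update for a single covariate and normally distributed cluster effects, with $\sigma^2$ held fixed (only the inner loop; the outer update of $\sigma^2$ is not shown). The matrix $\mathbf V^{-1}$ is built from the negative Hessian of $l_{part}$ with respect to $(\boldsymbol\beta, \mathbf w)$ plus $\sigma^{-2}\mathbf I$ on the $\mathbf w$ block, as in the display above; the simulated data, $\sigma^2 = 0.5$ and the function name `ppl_newton` are hypothetical. The estimated $\hat w_i$ sum to zero, consistent with the sum-to-zero constraint in the text, because the gradient of $l_{part}$ summed over all clusters is always zero.

```r
# Inner-loop Newton-Raphson for the PPL in (beta, w) with sigma^2 fixed
ppl_newton <- function(time, status, X, cluster, sigma2, tol = 1e-10, max_iter = 50) {
  Z <- outer(cluster, sort(unique(cluster)), "==") * 1   # cluster indicator matrix
  D <- cbind(X, Z)                                        # columns for beta, then w
  p <- ncol(X); q <- ncol(Z)
  theta <- rep(0, p + q)                                  # starting values (beta_0, w_0)
  pen <- c(rep(0, p), rep(1 / sigma2, q))                 # sigma^{-2} on the w block only
  for (k in seq_len(max_iter)) {
    r <- exp(drop(D %*% theta))
    grad <- numeric(p + q)                                # d l_part / d(beta, w)
    H <- matrix(0, p + q, p + q)                          # -d^2 l_part / d(beta, w)^2
    for (i in which(status == 1)) {
      R <- time >= time[i]                                # risk set R(t_i)
      wts <- r[R] / sum(r[R])
      dbar <- colSums(D[R, , drop = FALSE] * wts)
      grad <- grad + D[i, ] - dbar
      DR <- sweep(D[R, , drop = FALSE], 2, dbar)
      H <- H + crossprod(DR * wts, DR)
    }
    V_inv <- H + diag(pen)                                # V^{-1} from the display above
    step <- solve(V_inv, grad - pen * theta)              # V {dl_part - (0, sigma^{-2} w)}
    theta <- theta + step
    if (max(abs(step)) < tol) break
  }
  list(beta = theta[seq_len(p)], w = theta[-seq_len(p)], iterations = k)
}

# Hypothetical simulated clustered data: 30 clusters of size 3, no censoring
set.seed(3)
n_clusters <- 30; m <- 3
cluster <- rep(seq_len(n_clusters), each = m)
X <- cbind(x = rbinom(n_clusters * m, 1, 0.5))
w_true <- rnorm(n_clusters, 0, sqrt(0.5))
time <- rexp(n_clusters * m, rate = 0.1 * exp(-0.8 * X[, 1] + w_true[cluster]))
status <- rep(1, length(time))

fit <- ppl_newton(time, status, X, cluster, sigma2 = 0.5)
print(c(beta_hat = fit$beta, sum_w_hat = sum(fit$w), iterations = fit$iterations))
```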
Chapter 4
Simulation for data
4.1
Simulation study: Infection in Kidney patients
The original dataset is analyzed below with the R package ‘survival’. The data contain 76 infection times from 38 kidney patients (two per patient).
> library(survival)
> data(kidney)
> kfitm1 <- coxph(Surv(time, status) ~ age + sex + disease + frailty(id, dist = "gauss"), data = kidney)
> kfitm1
Call:
coxph(formula = Surv(time, status) ~ age + sex + disease + frailty(id,
    dist = "gauss"), data = kidney)

                          coef     se(coef) se2      Chisq    DF   p
age                        0.00489 0.01497  0.01059   0.10678  1.0 0.74384
sex                       -1.69728 0.46101  0.36170  13.55454  1.0 0.00023
diseaseGN                  0.17986 0.54485  0.39273   0.10897  1.0 0.74131
diseaseAN                  0.39294 0.54482  0.39816   0.52016  1.0 0.47077
diseasePKD                -1.13631 0.82519  0.61728   1.89621  1.0 0.16850
frailty(id, dist = "gauss                            17.89195 12.1 0.12376

Iterations: 7 outer, 42 Newton-Raphson
     Variance of random effect= 0.493
Degrees of freedom for terms= 0.5 0.6 1.7 12.1
Likelihood ratio test=47.5 on 14.9 df, p=3e-05
n= 76, number of events= 58
The only regression coefficient that is large compared to its standard error is that of the sex variable ($-1.697$, with standard error 0.461), indicating a lower infection rate for female patients (sex is coded 1 = male, 2 = female).
The estimate of $\sigma^2$ is 0.3821. In general, the effect of the prior distribution on the frailty terms is to shrink the estimates toward the origin, which biases the estimates.
> kfit <- coxph(Surv(time, status)~ age + sex + disease + frailty(id), kidney)
> kfit
Iterations: 6 outer, 35 Newton-Raphson
Variance of random effect= 5e-07
I-likelihood = -179.1
Degrees of freedom for terms= 1 1 3 0
Likelihood ratio test=17.6 on 5 df, p=0.003
n= 76, number of events= 58
> round(kfit$coefficients, 3)
       age        sex  diseaseGN  diseaseAN diseasePKD
     0.003     -1.483      0.088      0.351     -1.431
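Compared with the previous fit, here the frailty distribution is left at its default (the gamma distribution in ‘survival’). The estimated variance of the random effect is essentially zero (5e-07), and the algorithm needs only 6 outer and 35 Newton-Raphson iterations to find the approximate value of $\boldsymbol\beta$, compared with 7 outer and 42 Newton-Raphson iterations for the Gaussian frailty.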
4.2
Simulation study: testing the model fit
We also test the proportional hazards assumption for the Cox regression model fit with ‘cox.zph’; note that the model itself is fitted with ‘coxph’.
> cox.zph(kfit)
        chisq df     p
age     0.105  1 0.746
sex     5.953  1 0.015
disease 1.985  3 0.576
GLOBAL  7.869  5 0.164
Figure 4.1: A graph of coefficient vs. time
The plot gives an estimate of the time-dependent coefficient $\boldsymbol\beta(t)$. If the proportional hazards assumption holds, then the true $\boldsymbol\beta(t)$ function would be a horizontal line with slope 0.
The linearity of the regression model can be checked with a plot of martingale residuals. Martingale residuals are the discrepancy between the observed value of a subject's failure indicator and its expected value, integrated over the time for which that patient was at risk. The martingale residuals are plotted against covariates to detect non-linearity. Plots of martingale residuals and partial residuals are examined against two of the covariates, age and sex. Smooths are produced by local linear regression (using the lowess function). No non-linearity is observed.
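Martingale residuals can be computed by hand as $\hat M_i = \delta_i - \hat H_0(t_i)\exp(\hat\beta x_i)$, using the Breslow estimate of the cumulative baseline hazard. The R sketch below does this for the six-patient example of Section 3.1.1 (patients 1, 2 and 4 coded as control) and compares the result with `residuals(fit, type = "martingale")` from a `coxph` fit with Breslow ties. The residuals sum to zero, as martingale residuals from a Cox fit do.

```r
library(survival)

time   <- c(6, 7, 10, 15, 19, 25)
status <- c(1, 0, 1, 1, 0, 1)
x      <- c(0, 0, 1, 0, 1, 1)                   # C = 0, T = 1
fit <- coxph(Surv(time, status) ~ x, ties = "breslow")

r <- exp(unname(coef(fit)) * x)                 # exp(beta_hat * x_i)
# Breslow cumulative baseline hazard H_0(t): sum over event times t_j <= t
H0 <- sapply(time, function(t) {
  events <- which(status == 1 & time <= t)
  sum(sapply(events, function(j) 1 / sum(r[time >= time[j]])))
})
mart_manual <- status - H0 * r                  # observed minus expected events
mart_coxph <- unname(residuals(fit, type = "martingale"))
print(round(rbind(manual = mart_manual, coxph = mart_coxph), 4))
```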
Figure 4.2: A graph of residuals
Comparing the magnitudes of the largest dfbeta values with the regression coefficients suggests that one observation is individually influential.
Figure 4.3: A graph of coefficient vs. time
One of the males (id 21) is a large outlier, with much longer survival than his peers.
If this observation is removed, then no evidence remains for a random subject effect.
4.3
Extensions
The predicted survival profiles for patients 5 and 12 are modeled.
Predicted survival curves for three patients using the penalization
Survival analysis is not restricted to modeling the survival time of patients; it can also model the survival time of small and medium-sized corporations (businesses) in the market. This application is currently being studied by data scientists in South Korea as an extension of time-to-event data analysis. By applying the Cox model in the financial market, we may look forward to modeling corporations for various improvements.
4.3.1
Generalized gamma frailty model
The paper presents a frailty model that uses the generalized gamma distribution as the frailty distribution, with the lognormal and Weibull frailty models as special cases. Written by Dr. N. Balakrishnan and Y. Peng, the paper addresses the BLUP method in modeling a new frailty model with the generalized gamma distribution, which has more parameters and is therefore more flexible.
Instead of the EM algorithm, the Newton-Raphson algorithm is applied to obtain the MLEs of the parameters. Using the generalized gamma distribution as the frailty distribution substantially improved the goodness-of-fit of the frailty model. The model is particularly useful in reducing errors in the estimation of the frailty variance. Also, the performance of the likelihood ratio test depends on the cluster size.
Chapter 5
Appendix: R codes
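library(survminer); library(lubridate)
library(penalized); library(survival)
# models used throughout (kidney data from the survival package)
data(kidney)
kfitm1 <- coxph(Surv(time, status) ~ age + sex + disease +
                  frailty(id, dist = "gauss"), data = kidney)
kfit <- coxph(Surv(time, status) ~ age + sex + disease + frailty(id),
              data = kidney)
# MPLE for the six-patient example
plsimple <- function(beta) {
  psi <- exp(beta)
  result <- log(psi) - log(3*psi + 3) - log(3*psi + 1) - log(2*psi + 1)
  result }
result <- optim(par=0, fn = plsimple, method = "L-BFGS-B",
                control=list(fnscale = -1),
                lower = -3, upper = 1)
result$par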
# survival analysis, plot
ggsurvplot(survfit(kfit), data = kidney)
# used in report/presentation
ggsurvplot(survfit(kfit), pval = F, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
fun = "pct",
data = kidney,legend = "none",
ggtheme = theme_bw())
# hazard ratio, confidence interval
ggforest(kfitm1, data = kidney, fontsize=1.25)
# model diagnostics for events
plot(cox.zph(kfit))
plot(survfit(kfitm1)[1], lty=2, lwd=2, fun="event")
# model validation
temp <- cox.zph(kfit)
print(temp)
plot(temp)
# display the results
# plot curves
# model validation
par(mfrow=c(2,2))
res <- residuals(kfit, type="martingale")
X <- as.matrix(kidney[,c("age", "sex")]) # matrix of covariates
for (j in 1:2) { # residual plots
plot(X[,j], res, xlab=c("age", "sex")[j], ylab="residuals")
abline(h=0, lty=2)
lines(lowess(X[,j], res, iter=0))}
b <- coef(kfit)[c(1,2)] # regression coefficients
for (j in 1:2) { # partial-residual plots
plot(X[,j], b[j]*X[,j] + res, xlab=c("age", "sex")[j], ylab="component+residual")
abline(lm(b[j]*X[,j] + res ~ X[,j]), lty=2)
lines(lowess(X[,j], b[j]*X[,j] + res, iter=0))
}
# influential point
dfbeta <- residuals(kfit, type="dfbeta")
par(mfrow=c(1,3))
for (j in 1:3) {
plot(dfbeta[,j], ylab=names(coef(kfit))[j])
abline(h=0, lty=2)
}
# prediction
attach(kidney)
# penalization
hepato.opt <- optL1(Surv(time, status),
penalized=as.data.frame(kidney[,4:5]), standardize=T,
fold=10)
set.seed(34)
hepato.prof <- profL1(Surv(time, status),
penalized=kidney[,4:5],
standardize=T, fold=10)
hepato.pen <- penalized(Surv(time, status),
penalized=kidney[,4:5], standardize=T,
lambda1=hepato.opt$lambda)
round(coef(hepato.pen, standardize=T), 3)
hepato.predict.5 <- predict(hepato.pen,
kidney[5,4:5])
hepato.predict.12 <- predict(hepato.pen,
kidney[12,4:5])
par(mfrow=c(1,1))
plot(stepfun(hepato.predict.5@time[-1], hepato.predict.5@curves),
do.points=F, col="blue", lwd=2, ylim=c(0,1),
xlab="Time in months", ylab="Predicted survival probability")
plot(stepfun(hepato.predict.12@time[-1], hepato.predict.12@curves),
do.points=F, add=T, col="red")
legend("bottomleft", legend=c( "Patient 5","Patient 12"), pch=1,
col=c("blue", "red"))
Bibliography
[1] McGilchrist, C. A., & Aisbett, C. W. (1991). Regression with frailty in survival analysis. Biometrics, 461-466. https://www.jstor.org/stable/2532138
[2] Balakrishnan, N., & Peng, Y. (2006). Generalized gamma frailty model. Statistics in Medicine, 25(16), 2797-2816. https://pubmed.ncbi.nlm.nih.gov/16220516/
[3] Therneau, T. M., et al. R package ‘survival’. https://cran.r-project.org/web/packages/survival/survival.pdf
[4] Moore, D. F. (2016). Applied survival analysis using R. New York, NY: Springer. https://link.springer.com/book/10.1007/978-3-319-31245-3
[5] McGilchrist, C. A. (1994). Estimation in generalized mixed models. Journal of the Royal Statistical Society: Series B (Methodological), 56(1), 61-69. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1994.tb01959.x
[6] Wienke, A. (2010). Frailty models in survival analysis. CRC Press. https://www.routledge.com/Frailty-Models-in-Survival-Analysis/Wienke/p/book/9781420073881