Cox Proportional Hazards Model
• Incorporate the effects of covariates
• Parametric survival distributions are not specified
• Semi-parametric models
• Proportional hazards assumption
$$h(t; x, \beta) = h_0(t)\, g(x; \beta)$$
for some unspecified baseline hazard $h_0(t)$
• Hazard ratio:
$$\frac{h(t; x_1, \beta)}{h(t; x_0, \beta)} = \frac{g(x_1; \beta)}{g(x_0; \beta)}$$
• Since the hazard function is strictly positive, consider the log link,
$$\log(g(x; \beta)) = x^T\beta = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \;\Rightarrow\; g(x; \beta) = \exp(x^T\beta)$$
Then, the hazard ratio is
$$\exp((x_1 - x_0)^T\beta)$$
• We have just defined the Cox Proportional Hazards Model
$$h(t; x, \beta) = h_0(t)\exp(x^T\beta)$$
and
$$S(t; x, \beta) = [S_0(t)]^{\exp(x^T\beta)}$$
(Cox, 1972, JRSS B, 34, 187-220)
• Estimation is difficult since $h_0(t)$ is an infinite dimensional nuisance parameter. What can we do?
The proportional hazards assumption is a
rather strong assumption that may not
always be reasonable
• Suppose there is a single covariate
$$x = \begin{cases} 0 & \text{non-surgical intervention} \\ 1 & \text{surgical intervention} \end{cases}$$
• Surgical intervention has high early
risk, but very good long-term survival
prospects
• Non-surgical intervention has low early
risk, but poor long-term survival
prospects
• The hazard ratio (surgical relative to
non-surgical) would not be constant; it
would initially be high and then decrease
over time.
Estimation in Cox Model
• Instead of the full likelihood, use
the partial likelihood proposed by Cox
(1972, JRSS B and 1975, Biometrika )
• Consider the conditional probability
that an individual dies at time $t_{(j)}$ given
that $t_{(j)}$ is one of the $r$ observed death
times $\{t_{(1)}, t_{(2)}, \ldots, t_{(r)}\}$:
$$\frac{P(\text{individual with values } x_{(j)} \text{ dies at } t_{(j)})}{P(\text{one death at } t_{(j)})}$$
Since the baseline hazard has an arbitrary form, intervals between successive death times provide no information
about the effects of covariates on the
hazard function.
• Individuals are assumed to respond
independently
$$\frac{P(\text{individual with values } x_{(j)} \text{ dies at } t_{(j)})}{\sum_{k \in R(t_{(j)})} P(\text{individual } k \text{ dies at } t_{(j)})}$$
• Replace probability of death at time $t_{(j)}$
with probability of death in the interval
$[t_{(j)}, t_{(j)} + \Delta)$.
$$\frac{P(\text{individual with } x_{(j)} \text{ dies in } [t_{(j)}, t_{(j)}+\Delta)) / \Delta}{\sum_{k \in R(t_{(j)})} P(\text{individual } k \text{ dies in } [t_{(j)}, t_{(j)}+\Delta)) / \Delta}$$
• Limit as $\Delta \to 0$
$$\frac{\text{hazard at } t_{(j)} \text{ for individual with } x_{(j)}}{\sum_{k \in R(t_{(j)})} \text{hazard at } t_{(j)} \text{ for individual } k}$$
• Contribution to the likelihood is
$$\frac{h_j(t_{(j)})}{\sum_{k \in R(t_{(j)})} h_k(t_{(j)})} = \frac{h_0(t_{(j)})\exp(x_j^T\beta)}{\sum_{k \in R(t_{(j)})} h_0(t_{(j)})\exp(x_k^T\beta)}$$
• Cancel out the baseline hazard function
to obtain
$$\frac{\exp(x_j^T\beta)}{\sum_{k \in R(t_{(j)})} \exp(x_k^T\beta)}$$
• The joint partial likelihood is
$$L_p(\beta) = \prod_{j=1}^{n} \left[ \frac{\exp(x_j^T\beta)}{\sum_{k \in R(t_j)} \exp(x_k^T\beta)} \right]^{\delta_j}$$
where $\delta_j = 0$ if $t_j$ is a censoring
time, 1 otherwise.
Alternative explanation:
• By definition $h_i(t) = f_i(t) / S_i(t)$
• The contribution to the likelihood for
an observed failure at time $t$ is
$$f_i(t) = h_i(t) S_i(t) = h_0(t)\exp(x_i^T\beta)\,[S_0(t)]^{\exp(x_i^T\beta)}$$
• The contribution to the likelihood for a
right censored observation at time $t$ is
$$S_i(t) = [S_0(t)]^{\exp(x_i^T\beta)}$$
• Let $\delta_i = 1$ if $t_i$ is a failure time and
$\delta_i = 0$ if $t_i$ is a censoring time.
• Then, the joint likelihood is
$$\begin{aligned}
\prod_{i=1}^{n} [f_i(t_{(i)})]^{\delta_i} [S_i(t_{(i)})]^{1-\delta_i}
&= \prod_{i=1}^{n} [h_i(t_{(i)}) S_i(t_{(i)})]^{\delta_i} [S_i(t_{(i)})]^{1-\delta_i} \\
&= \prod_{i=1}^{n} [h_i(t_{(i)})]^{\delta_i} S_i(t_{(i)}) \\
&= \prod_{i=1}^{n} \left[ \frac{h_i(t_{(i)})}{\sum_{k \in R(t_{(i)})} h_k(t_{(i)})} \right]^{\delta_i} \times \Big[ \sum_{k \in R(t_{(i)})} h_k(t_{(i)}) \Big]^{\delta_i} S_i(t_{(i)}) \\
&= \prod_{i=1}^{n} \left[ \frac{\exp(x_i^T\beta)}{\sum_{k \in R(t_{(i)})} \exp(x_k^T\beta)} \right]^{\delta_i} \times \prod_{i=1}^{n} \Big[ \sum_{k \in R(t_{(i)})} h_k(t_{(i)}) \Big]^{\delta_i} S_i(t_{(i)})
\end{aligned}$$
• The partial likelihood is
$$L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp(x_i^T\beta)}{\sum_{k \in R(t_{(i)})} \exp(x_k^T\beta)} \right]^{\delta_i}$$
More Details on Estimation in Cox
Proportional Hazards Model
• Assuming no ties, the log-partial
likelihood is
$$\log(L_p(\beta)) = \sum_{i=1}^{n} \delta_i \Big[ x_i^T\beta - \log \sum_{j \in R(t_i)} \exp(x_j^T\beta) \Big]$$
• The partial likelihood depends only on
the ordering of the survival times, not
the actual values; so it is invariant to
monotone transformation of time
• The gradient vector (score function)
has elements
$$U_k(\beta) = \frac{\partial \log(L_p(\beta))}{\partial \beta_k} = \sum_{i=1}^{n} \delta_i (x_{k,i} - \bar{x}_{k,i})$$
where
$$\bar{x}_{k,i} = \sum_{j \in R(t_i)} w_{ij}\, x_{k,j}$$
and
$$w_{ij} = \frac{\exp(x_j^T\beta)}{\sum_{\ell \in R(t_i)} \exp(x_\ell^T\beta)}$$
• The local information matrix $I(\beta)$ has
elements
$$I_{k\ell} = -\frac{\partial^2 \log(L_p(\beta))}{\partial \beta_k \partial \beta_\ell} = \sum_{i=1}^{n} \delta_i \sum_{j \in R(t_i)} w_{ij} (x_{k,j} - \bar{x}_{k,i})(x_{\ell,j} - \bar{x}_{\ell,i})$$
• Large sample distribution
$$\hat\beta \;\dot\sim\; N(\beta, I(\beta)^{-1})$$
• Hypothesis testing
– Partial likelihood ratio tests
– Score tests
– Wald test
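The log-partial likelihood and score formulas can be checked numerically on a small data set. The sketch below is illustrative only: it assumes a single covariate, no tied times, and a made-up five-subject data set (the function name and the data are hypothetical, not part of the course code). It also illustrates the invariance to monotone transformations of time by refitting on squared times.

```python
import math

def cox_partial_loglik_and_score(times, events, x, beta):
    """Log partial likelihood and score for one covariate, assuming no ties."""
    loglik, score = 0.0, 0.0
    for i in range(len(times)):
        if events[i] == 0:
            continue  # censored subjects enter only through the risk sets
        # Risk set R(t_i): subjects still under observation at time t_i
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(beta * x[j]) for j in risk]
        denom = sum(weights)
        # Weighted mean of the covariate over the risk set (x-bar in the notes)
        xbar = sum(w * x[j] for w, j in zip(weights, risk)) / denom
        loglik += beta * x[i] - math.log(denom)
        score += x[i] - xbar
    return loglik, score

# Hypothetical toy data: events = 1 for a failure, 0 for a censored time
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 0, 1, 1, 0]
x = [1.0, 0.0, 1.0, 0.0, 1.0]

loglik0, score0 = cox_partial_loglik_and_score(times, events, x, 0.0)

# Only the ordering of the times matters, so squaring them changes nothing
squared_times = [t ** 2 for t in times]
loglik_sq, score_sq = cox_partial_loglik_and_score(squared_times, events, x, 0.0)
```

At beta = 0 the three failures have risk sets of size 5, 3 and 2, so the log partial likelihood is -log(5 * 3 * 2).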
Tied Survival Times
Ties arise because subjects cannot be monitored continuously.
• Suppose 5 individuals are at risk and
two fail at time $t$ with risk scores
$r_1 = \exp(x_1^T\beta)$ and $r_2 = \exp(x_2^T\beta)$
• We cannot order these two subjects
with respect to their actual failure times.
• The Breslow approximation: The
contribution to the partial likelihood is
$$\left( \frac{r_1}{r_1+r_2+r_3+r_4+r_5} \right) \times \left( \frac{r_2}{r_1+r_2+r_3+r_4+r_5} \right)$$
– This is a poor approximation when
β is not close to zero
– β̂ is biased toward zero (failed
individuals are used too often in
the denominator)
– Default in SAS
• Efron approximation: The contribution to the partial likelihood is
$$\left( \frac{r_1}{r_1+r_2+r_3+r_4+r_5} \right) \times \left( \frac{r_2}{.5r_1+.5r_2+r_3+r_4+r_5} \right)$$
– If subjects with tied death times
have identical risk scores, this
method is exact
– Biased toward zero for “larger”
values of β (e.g. $|\beta_i| > 2.5$)
– Smaller bias than the Breslow
estimator
– Default in S-Plus and R
Contribution if three subjects fail at
the same time
$$\left( \frac{r_1}{r_1+r_2+r_3+r_4+r_5} \right) \times \left( \frac{r_2}{\frac{2}{3}r_1+\frac{2}{3}r_2+\frac{2}{3}r_3+r_4+r_5} \right) \times \left( \frac{r_3}{\frac{1}{3}r_1+\frac{1}{3}r_2+\frac{1}{3}r_3+r_4+r_5} \right)$$
• Average likelihood method:
– “exact” partial likelihood
– Computationally intensive
(TIES=exact option in the MODEL
statement in the SAS PHREG
procedure)
– Least amount of bias
– Contribution to the likelihood for a
pair of tied failure times is
$$\frac{1}{2} \left( \frac{r_1}{r_1+r_2+r_3+r_4+r_5} \right) \left( \frac{r_2}{r_2+r_3+r_4+r_5} \right) + \frac{1}{2} \left( \frac{r_2}{r_1+r_2+r_3+r_4+r_5} \right) \left( \frac{r_1}{r_1+r_3+r_4+r_5} \right)$$
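• If there are three individuals with tied
failure times, the contribution to the
partial likelihood is
$$\begin{aligned}
\frac{1}{6} \Big[ \;
&\left( \tfrac{r_1}{r_1+r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_2}{r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_3}{r_3+r_4+r_5} \right) \\
+ &\left( \tfrac{r_1}{r_1+r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_3}{r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_2}{r_2+r_4+r_5} \right) \\
+ &\left( \tfrac{r_2}{r_1+r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_1}{r_1+r_3+r_4+r_5} \right) \left( \tfrac{r_3}{r_3+r_4+r_5} \right) \\
+ &\left( \tfrac{r_2}{r_1+r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_3}{r_1+r_3+r_4+r_5} \right) \left( \tfrac{r_1}{r_1+r_4+r_5} \right) \\
+ &\left( \tfrac{r_3}{r_1+r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_1}{r_1+r_2+r_4+r_5} \right) \left( \tfrac{r_2}{r_2+r_4+r_5} \right) \\
+ &\left( \tfrac{r_3}{r_1+r_2+r_3+r_4+r_5} \right) \left( \tfrac{r_2}{r_1+r_2+r_4+r_5} \right) \left( \tfrac{r_1}{r_1+r_4+r_5} \right) \Big]
\end{aligned}$$
• If $k$ individuals had tied failure times,
there would be $k! = k(k-1) \cdots (2)(1)$
terms in this sum
• The sum of the $k!$ terms can be expressed as an integral (use integration by
parts)
$$\int_0^\infty \prod_{j=1}^{k} \left[ 1 - \exp\left( \frac{-r_j t}{r_{k+1} + r_{k+2} + \cdots + r_n} \right) \right] e^{-t}\, dt$$
(DeLong, et al., 1994, Biometrika, 81,
607-611)
• Alternatively, you could use a sum of a
random sample of the $k!$ terms.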
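The three ways of handling ties can be compared directly on the five-subject example. The following sketch is not taken from the course code: the function names and the numerical risk scores are made up for illustration, and it simply evaluates the Breslow, Efron and average-likelihood contributions as written on the slides.

```python
from itertools import permutations

def breslow_contribution(tied, others):
    """Every tied failure uses the full risk-set total as its denominator."""
    total = sum(tied) + sum(others)
    out = 1.0
    for r in tied:
        out *= r / total
    return out

def efron_contribution(tied, others):
    """The m-th tied failure removes a fraction m/d of the tied risk scores."""
    d = len(tied)
    total = sum(tied) + sum(others)
    out = 1.0
    for m, r in enumerate(tied):
        out *= r / (total - (m / d) * sum(tied))
    return out

def average_contribution(tied, others):
    """Average of the sequential contributions over all d! possible orderings."""
    total = sum(tied) + sum(others)
    orderings = list(permutations(tied))
    acc = 0.0
    for order in orderings:
        remaining = total
        term = 1.0
        for r in order:
            term *= r / remaining
            remaining -= r  # a subject leaves the risk set once it has failed
        acc += term
    return acc / len(orderings)

# Hypothetical risk scores: r1 = 2 and r2 = 3 fail together, r3 = r4 = r5 = 1
tied, others = [2.0, 3.0], [1.0, 1.0, 1.0]
b = breslow_contribution(tied, others)
e = efron_contribution(tied, others)
a = average_contribution(tied, others)

# With identical tied risk scores, Efron agrees with the average likelihood
e_same = efron_contribution([2.0, 2.0], others)
a_same = average_contribution([2.0, 2.0], others)
```

For these scores the Breslow value is the smallest and the Efron value lies close to the average-likelihood value, in line with the bias comments on the slides.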
Interpretation in the Cox Model
• Coefficients are log hazard ratios. For
a continuous covariate
$$e^{\beta_j} = \frac{\exp(\beta_j (x_j + 1))}{\exp(\beta_j x_j)}$$
for any value of $x_j$
• Also,
$$e^{r\beta_j} = \frac{\exp(\beta_j (x_j + r))}{\exp(\beta_j x_j)}$$
for any value of $x_j$
• $r\beta_j$ is a log-hazard ratio associated with
an $r$ unit increase in $X_j$, while the other
covariates are held constant.
• $r\beta_j$ is estimated as $r\hat\beta_j$ with standard
error $r S_{\hat\beta_j}$, where $S^2_{\hat\beta_j}$ is the $(j, j)$
element of the inverse of the local
estimate of the information matrix for
the partial likelihood
• To construct confidence intervals for
hazard ratios, exponentiate endpoints
of confidence intervals for log hazard
ratios
Compute $\hat{L} = \hat\beta_j - z_{\alpha/2} S_{\hat\beta_j}$
and $\hat{U} = \hat\beta_j + z_{\alpha/2} S_{\hat\beta_j}$
Then, compute $(\exp(\hat{L}), \exp(\hat{U}))$
as an approximate $(1 - \alpha) \times 100\%$
confidence interval for $\exp(\beta_j)$
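As a worked instance of the interval recipe, the sketch below exponentiates the endpoints of a Wald interval for a log hazard ratio. The helper name is hypothetical; the inputs are the rx coefficient (0.294605) and its robust standard error (0.18867) from the R summary output for the VA data shown later in these notes, so the result can be compared with the "lower .95" and "upper .95" columns printed there.

```python
import math
from statistics import NormalDist

def hazard_ratio_ci(beta_hat, se, r=1.0, alpha=0.05):
    """Hazard ratio for an r-unit increase, with a (1 - alpha) Wald interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = r * beta_hat - z * r * se   # L-hat on the log hazard ratio scale
    upper = r * beta_hat + z * r * se   # U-hat on the log hazard ratio scale
    return math.exp(r * beta_hat), math.exp(lower), math.exp(upper)

# rx coefficient and robust standard error from the VA lung cancer fit
hr, lo, hi = hazard_ratio_ci(0.294605, 0.18867)
```

Rounded to three decimals this gives a hazard ratio of 1.343 with interval (0.928, 1.943). The interval is not symmetric about the hazard ratio because it is built on the log scale.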
• To construct a confidence interval for
$\exp(x^T\beta)$
Compute $S_{x^T\hat\beta} = \sqrt{x^T V x}$, where
$V$ is the inverse of the local
estimate of the information
matrix for the partial
likelihood.
Compute $\hat{L} = x^T\hat\beta - z_{\alpha/2} S_{x^T\hat\beta}$
and $\hat{U} = x^T\hat\beta + z_{\alpha/2} S_{x^T\hat\beta}$
Then, compute $(\exp(\hat{L}), \exp(\hat{U}))$
as an approximate $(1 - \alpha) \times 100\%$
confidence interval for $\exp(x^T\beta)$
• Estimate risk scores (use the XBETA
option in the OUTPUT statement for
the PHREG procedure in SAS)
$$\hat{r}(x, \hat\beta) = \sum_k \hat\beta_k x_k = x^T\hat\beta$$
• Estimate of the covariate adjusted survivor function
$$\hat{S}(t; x, \hat\beta) = [\hat{S}_0(t)]^{\exp(x^T\hat\beta)}$$
• If the x’s are centered, the baseline
survivor function S0(t) corresponds
to an ’average’ subject
• We need an estimate of the baseline
survivor function S0(t)
Estimating the hazard and survivor
functions
The Kalbfleisch-Prentice (1973, Biometrika
60, 267-278) method:
For observed failure times
$$t_{(1)} < t_{(2)} < \cdots < t_{(r)}$$
suppose there are $d_j$ deaths and
$n_j$ individuals at risk at time $t_{(j)}$.
The probability that a subject with covariate values $x_j$ survives beyond time $t$ is
$$S(t; x_j) = (S_0(t))^{\exp(x_j^T\hat\beta)}$$
Let $\eta_j$ denote the conditional probability of
surviving through $[t_{(j)}, t_{(j+1)})$ given that a subject
from the baseline population survives to
time $t_{(j)}$. Defining $\eta_0 = 1$,
$$S_0(t_{(j)}) = \prod_{i=0}^{j-1} \eta_i$$
and the conditional probability that a
subject with covariate values $x_j$ fails in
$[t_{(j)}, t_{(j+1)})$ given survival to the beginning
of the interval is approximately
$$\frac{S_0(t_{(j)})^{\exp(x_j^T\beta)} - S_0(t_{(j+1)})^{\exp(x_j^T\beta)}}{S_0(t_{(j)})^{\exp(x_j^T\beta)}} = 1 - \eta_j^{\exp(x_j^T\beta)}$$
The conditional probability that a subject
with covariate values $x_j$ survives through
the interval given survival to the beginning of the
interval is approximately
$$\frac{S_0(t_{(j+1)})^{\exp(x_j^T\beta)}}{S_0(t_{(j)})^{\exp(x_j^T\beta)}} = \eta_j^{\exp(x_j^T\beta)}$$
An approximate joint likelihood is
$$\prod_{j=0}^{r} \left[ \prod_{k \in D_j} \left( 1 - \eta_j^{\exp(x_k^T\beta)} \right) \prod_{k \in R_j - D_j} \eta_j^{\exp(x_k^T\beta)} \right]$$
Estimate $\eta_j$ by maximizing the log of this
approximate likelihood. Setting first
partial derivatives equal to zero we obtain
$\hat\eta_j$ as the solution to
$$\sum_{k \in D(t_{(j)})} \frac{\exp(x_k^T\hat\beta)}{1 - \hat\eta_j^{\exp(x_k^T\hat\beta)}} = \sum_{k \in R(t_{(j)})} \exp(x_k^T\hat\beta)$$
If there are no ties ($d_j = 1$ for all
$j = 1, 2, \ldots, r$), then
$$\hat\eta_j = \left[ 1 - \frac{\exp(x_{(j)}^T\hat\beta)}{\sum_{i \in R(t_{(j)})} \exp(x_i^T\hat\beta)} \right]^{\exp(-x_{(j)}^T\hat\beta)}$$
Otherwise, an iterative procedure is
required to obtain a solution
• $S_0(t)$ is estimated as
$$\hat{S}_0(t) = \prod_{j=1}^{k} \hat\eta_j = \prod_{j=1}^{k} \left[ 1 - \frac{\exp(x_{(j)}^T\hat\beta)}{\sum_{i \in R(t_{(j)})} \exp(x_i^T\hat\beta)} \right]^{\exp(-x_{(j)}^T\hat\beta)}$$
for $t_{(k)} \le t < t_{(k+1)}$, $k = 1, 2, \ldots, r - 1$
with $\hat{S}_0(t) = 1$ for $t < t_{(1)}$ and $\hat{S}_0(t) = 0$
for $t > t_{(r)}$ unless there are censored
times after $t_{(r)}$. In that case, $\hat{S}_0(t) =
\hat{S}_0(t_{(r)})$ up to the largest censored time
and it is undefined after that time.
• Assume that the baseline hazard is
constant between adjacent failure
times. Then,
$$\hat{h}_0(t) = \frac{1 - \hat\eta_j}{t_{(j+1)} - t_{(j)}}$$
for $t_{(j)} \le t < t_{(j+1)}$ and $j = 1, 2, \ldots, r-1$
with $\hat{h}_0(t) = 0$ for $t < t_{(1)}$
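In the no-ties case the estimate of each η and of the baseline survivor function has a closed form, which the sketch below evaluates. It is only an illustration: the function name and the risk scores are hypothetical, and the risk scores exp(x'β̂) are assumed to have been computed already from a fitted model.

```python
def kp_baseline_survival(event_scores, risk_set_scores):
    """Baseline survivor estimates at successive failure times (no ties).

    event_scores[j]    : exp(x'beta_hat) for the subject failing at t_(j)
    risk_set_scores[j] : exp(x'beta_hat) for every subject in R(t_(j))
    """
    survival = []
    s = 1.0
    for r_j, risk in zip(event_scores, risk_set_scores):
        # eta_hat_j = [1 - r_j / sum over the risk set] ** exp(-x_(j)'beta_hat)
        eta = (1.0 - r_j / sum(risk)) ** (1.0 / r_j)
        s *= eta
        survival.append(s)
    return survival

# With beta_hat = 0 every risk score is 1; risk sets of size 5, 3 and 2
flat = kp_baseline_survival([1.0, 1.0, 1.0],
                            [[1.0] * 5, [1.0] * 3, [1.0] * 2])

# Hypothetical non-trivial risk scores for the same three failure times
adjusted = kp_baseline_survival([2.0, 0.5, 1.0],
                                [[2.0, 1.0, 0.5, 1.0, 1.5],
                                 [0.5, 1.0, 1.5],
                                 [1.0, 1.5]])
```

With all risk scores equal to 1, each factor is (n_j - 1)/n_j, so the first call returns the Kaplan-Meier values 4/5, 8/15 and 4/15.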
Breslow (Nelson-Aalen) method:
• When there are no covariates,
$$\hat\eta_j = \frac{n_j - d_j}{n_j}$$
and
$$\hat{S}_0(t) = \prod_{j=1}^{k} \hat\eta_j$$
is the Kaplan-Meier estimator
• The baseline cumulative hazard function is estimated as
$$\hat{H}_0(t) = -\log(\hat{S}_0(t)) = -\sum_{j=1}^{k} \log(\hat\eta_j)$$
for $t_{(k)} \le t < t_{(k+1)}$, $k = 1, 2, \ldots, r-1$.
• For others
$$\hat{S}_i(t) = \hat{S}_0(t)^{\exp(x_i^T\hat\beta)}$$
and
$$\hat{H}_i(t) = \hat{H}_0(t)\exp(x_i^T\hat\beta)$$
• Approximate
$$\hat\eta_j^{\exp(x_i^T\hat\beta)} = \exp\left( e^{x_i^T\hat\beta} \log(\hat\eta_j) \right)$$
with $1 + e^{x_i^T\hat\beta} \log(\hat\eta_j)$ to obtain
$$\tilde\eta_j = \exp\left( \frac{-d_j}{\sum_{i \in R(t_{(j)})} \exp(x_i^T\hat\beta)} \right)$$
• Then,
$$\tilde{S}_0(t) = \prod_{j=1}^{k} \exp\left( \frac{-d_j}{\sum_{i \in R(t_{(j)})} \exp(x_i^T\hat\beta)} \right)$$
• This estimator is not necessarily zero
at the longest survival time, when it is
uncensored
• Also,
$$\tilde{H}_0(t) = -\log(\tilde{S}_0(t)) = \sum_{j=1}^{k} \left( \frac{d_j}{\sum_{i \in R(t_{(j)})} \exp(x_i^T\hat\beta)} \right)$$
• $\tilde{S}_0(t)$ is cheaper to compute than $\hat{S}_0(t)$
• When there are few ties, $\tilde{S}_0(t)$ will be
similar to $\hat{S}_0(t)$
• Estimates of medians and other
percentiles can be obtained from
the estimated survival function

SAS Example: VA data
/* SAS code for fitting a proportional hazards model
to data from the VA lung cancer trial of 137 male
patients with inoperable lung cancer. This code
is posted as vaphreg1.sas */

/* Variables
   Treatment: 1=standard, 2=test (chemotherapy)
   Celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
   Survival in days
   Status: 1=dead, 0=censored
   Karnofsky score: Measures patient
      performance of activities of daily
      living.
      SCORE   FUNCTION
      100     Normal, no evidence of disease
       90     Able to perform normal activity with
              only minor symptoms
       80     Able to perform normal activity with
              effort, some symptoms
       70     Able to care for self but unable to do
              normal activities
       60     Requires occasional assistance
       50     Requires considerable assistance
       40     Disabled, requires special assistance
       30     Severely disabled
       20     Very sick, requires supportive treatment
       10     Moribund
   Months from Diagnosis
   Age in years
   Prior therapy: 0=no, 10=yes
*/
data va;
  infile 'c:\st565\va.dat';
  input rx cellt time status karno months
        age prior_rx;
  prior_rx = prior_rx/10;
  ct2=0; if(cellt=2) then ct2=1;
  ct3=0; if(cellt=3) then ct3=1;
  ct4=0; if(cellt=4) then ct4=1;
run;

proc format;
  value celltype 1 = 'squamous'
                 2 = 'smallcell'
                 3 = 'adeno'
                 4 = 'large';
run;

/* Create a new data file with covariate
   values at which you want to estimate
   survival probabilities */
data vanew;
  input rx ct2 ct3 ct4 karno months age prior_rx;
  datalines;
1 0 0 1 80 64 70 1
0 0 0 1 80 64 70 1
;
run;

proc phreg data=va;
  model time*status(0)= rx ct2 ct3 ct4 karno
        months age prior_rx/ties=efron;
  baseline out=va2 covariates=vanew survival=s
        stderr=se lower=l95 upper=u95;
run;

proc print data=va2; run;

/* Plot the estimated survival probabilities
   for the standard and chemotherapy treatments
   on the same plot when cell type is large,
   Karnofsky score is 80, time from diagnosis
   until entry into the study is 64 months,
   age is 70 years, and there was previous
   treatment */
data set1; set va2;
  if(rx=1);
  s1 = s;
  keep time s1;
run;

data set0; set va2;
  if(rx=0);
  s0=s;
  keep time s0;
run;

data va3; merge set1 set0; by time;
run;

proc gplot data=va3;
  plot s1*time s0*time /overlay vaxis=axis1 haxis=axis2;
  symbol1 v=none interpol=step color=black line=1 w=4;
  symbol2 v=none interpol=step color=black line=3 w=4;
  axis1 label=(f=swiss h=1.9 a=90 r=0 "Survival Probability")
        value=(f=swiss h=1.5);
  axis2 label=(f=swiss h=1.9 "Time(days)")
        value=(f=swiss h=1.5);
  legend frame;
  title ls=0.5 h=2.0 f=swiss "Comparison of Survivor Curves";
run;

The PHREG Procedure

Model Information
Data Set              WORK.VA
Dependent Variable    time
Censoring Variable    status
Censoring Value(s)    0
Ties Handling         EFRON

Summary of the Number of Event and Censored Values
                               Percent
Total    Event    Censored    Censored
  137      128           9        6.57

Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       948.794
AIC           1010.898       964.794
SBC           1010.898       987.610

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       62.1039     8        <.0001
Score                  66.7375     8        <.0001
Wald                   62.3675     8        <.0001

Analysis of Maximum Likelihood Estimates
                 Parameter   Standard                            Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr>ChiSq     Ratio
rx          1      0.29461    0.20755       2.0148     0.1558     1.343
ct2         1     -0.86156    0.27528       9.7950     0.0017     0.423
ct3         1      1.19607    0.30092      15.7986     <.0001     3.307
ct4         1      0.40129    0.28269       2.0151     0.1557     1.494
karno       1     -0.03282    0.00551      35.4979     <.0001     0.968
months      1     0.000082    0.00914       0.0001     0.9929     1.000
age         1     -0.00871    0.00930       0.8763     0.3492     0.991
prior_rx    1      0.07159    0.23231       0.0950     0.7580     1.074

Obs    rx   ct2   ct3   ct4  karno  months   age  prior_rx  time        s        se      l95     u95
  1     1     0     0     1     80      64    70         1     0   1.0000         .        .       .
  2     1     0     0     1     80      64    70         1     1   0.9920    0.0073   0.9777       1
  3     1     0     0     1     80      64    70         1     2   0.9880    0.0099   0.9688       1
  4     1     0     0     1     80      64    70         1     3   0.9839    0.0124   0.9598       1
  .
  .
  .
 95     1     0     0     1     80      64    70         1   553   0.0092    0.0258  0.00004       1
 96     1     0     0     1     80      64    70         1   587   0.0042    0.0137  0.00001       1
 97     1     0     0     1     80      64    70         1   991   0.0007    0.0030  0.00000       1
 98     1     0     0     1     80      64    70         1   999   0.0000    0.0000        .       .
 99     0     0     0     1     80      64    70         1     0   1.0000         .        .       .
100     0     0     0     1     80      64    70         1     1   0.9941    0.0058   0.9828       1
101     0     0     0     1     80      64    70         1     2   0.9910    0.0079   0.9757       1
  .
  .
  .
195     0     0     0     1     80      64    70         1   991   0.0046    0.0153  0.00001       1
196     0     0     0     1     80      64    70         1   999   0.0000    0.0000        .       .
197  1.49   .65   .20   .20   58.6     8.8  58.3       .29     0   1.0000         .        .       .
  .
  .
  .
292  1.49   .65   .20   .20   58.6     8.8  58.3       .29   587   .00089   .001216   .00006   0.013
293  1.49   .65   .20   .20   58.6     8.8  58.3       .29   991   .00009   .000188   .00000  0.0048
294  1.49   .65   .20   .20   58.6     8.8  58.3       .29   999   .00000   .000000        .       .
Fitting the Cox Model in R or Splus
coxph(formula, data=sys.parent(), weights, subset,
na.action, eps = 1e-07, init, iter.max=10,
method=c("efron","breslow","exact"),
singular.ok=T, robust=F,
model=F, x=F, y=T)
# R code for fitting the Cox model to data
# from the VA lung cancer trial of 137 male
# patients with inoperable lung cancer.
# This code is posted as
# vacoxph.ssc

# Variables
#   Treatment: 1=standard, 2=test (chemotherapy)
#   Celltype: 1=squamous, 2=smallcell,
#             3=adeno, 4=large
#   Survival in days
#   Status: 1=dead, 0=censored
#   Karnofsky score
#   Months from Diagnosis
#   Age in years
#   Prior therapy: 0=no, 10=yes

# Enter the data into a data frame.
va <- read.table("c:/stat565/va.dat",
        header=F, col.names=c("rx", "cellt", "time",
        "status", "karno", "months", "age", "priorrx"))
va$rx <- va$rx-1
va$priorrx <- va$priorrx/10
va$celltf <- as.factor(va$cellt)
va

library(survival)
options(contrasts=c("contr.treatment", "contr.poly"))
vafit <- coxph(Surv(time, status) ~ rx+celltf+karno
               +months + age + priorrx, data = va,
               x=T, y=T, robust=T,
               method="efron", singular.ok=T)
summary(vafit)
Call:
coxph(formula = Surv(time, status) ~ rx + celltf + karno
    + months + age + priorrx, data = va, method = "efron",
    singular.ok = T, robust = T, x = T, y = T)

n= 137
              coef exp(coef) se(coef) robust se      z        p
rx        0.294605     1.343  0.20755   0.18867  1.562 1.2e-001
celltf2   0.861559     2.367  0.27528   0.31424  2.742 6.1e-003
celltf3   1.196067     3.307  0.30092   0.27596  4.334 1.5e-005
celltf4   0.401290     1.494  0.28269   0.25242  1.590 1.1e-001
karno    -0.032815     0.968  0.00551   0.00522 -6.286 3.3e-010
months    0.000082     1.000  0.00914   0.00795  0.010 9.9e-001
age      -0.008706     0.991  0.00930   0.01030 -0.845 4.0e-001
priorrx   0.071589     1.074  0.23231   0.22068  0.324 7.5e-001

         exp(coef) exp(-coef) lower .95 upper .95
rx           1.343      0.745     0.928     1.943
celltf2      2.367      0.423     1.278     4.382
celltf3      3.307      0.302     1.926     5.680
celltf4      1.494      0.669     0.911     2.450
karno        0.968      1.033     0.958     0.978
months       1.000      1.000     0.985     1.016
age          0.991      1.009     0.972     1.012
priorrx      1.074      0.931     0.697     1.656

Rsquare= 0.364   (max possible= 0.999 )
Likelihood ratio test= 62.1  on 8 df,   p=1.8e-010
Wald test            = 68.7  on 8 df,   p=8.96e-012
Score (logrank) test = 66.7  on 8 df,   p=2.19e-011,   Robust = 52.3  p=1.46e-008

Note: The likelihood ratio and score tests assume
independence among observations within a cluster,
the Wald and robust score tests do not.
vafit$coef
        rx   celltf2  celltf3   celltf4
 0.2946054 0.8615587 1.196067 0.4012899
       karno        months
 -0.03281529 0.00008177293
          age    priorrx
 -0.008706219 0.07158904

> names(vafit)
 [1] "coefficients" "var"    "loglik"
 [4] "score"        "iter"   "linear.predictors"
 [7] "residuals"    "means"  "method"
[10] "n"            "terms"  "assign"
[13] "naive.var"    "rscore" "wald.test"
[16] "x"            "y"      "formula"
[19] "call"

vafit$x
   rx celltf2 celltf3 celltf4 karno months age priorrx
 1  0       0       0       0    60      7  69       0
 2  0       0       0       0    70      5  64       1
 3  0       0       0       0    60      3  38       0
 4  0       0       0       0    60      9  63       1
 5  0       0       0       0    70     11  65       1
 6  0       0       0       0    20      5  49       0
 7  0       0       0       0    40     10  69       1
 8  0       0       0       0    80     29  68       0
 9  0       0       0       0    50     18  43       0
10  0       0       0       0    70      6  70       0
vafit$var
               [,1]        [,2]          [,3]           [,4]
[1,]  3.559674e-002  0.00258373 -0.0031549717  0.00028655499
[2,]  2.583736e-003  0.09874914  0.0564663712  0.04576688179
[3,] -3.154972e-003  0.05646637  0.0761517918  0.04834877156
[4,]  2.865550e-004  0.04576688  0.0483487716  0.06371456224
[5,]  2.117459e-006 -0.00009915 -0.0001843566 -0.00005868008
[6,] -2.567700e-004 -0.00071643 -0.0001088384 -0.00003811918
[7,] -8.354853e-004 -0.00028035  0.0001403169  0.00008960564
[8,]  2.024385e-003 -0.00179947  0.0001863414 -0.01106331757
               [,5]        [,6]           [,7]          [,8]
[1,]  2.117459e-006 -0.00025677 -8.354853e-004  0.0020243849
[2,] -9.915142e-005 -0.00071643 -2.803497e-004 -0.0017994743
[3,] -1.843566e-004 -0.00010884  1.403169e-004  0.0001863414
[4,] -5.868008e-005 -0.00003812  8.960564e-005 -0.0110633176
[5,]  2.724976e-005  0.00001471  4.789953e-006 -0.0001094614
[6,]  1.470556e-005  0.00006318  1.210810e-005 -0.0006854524
[7,]  4.789953e-006  0.00001211  1.060784e-004 -0.0001006315
[8,] -1.094614e-004 -0.00068545 -1.006315e-004  0.0486998221
# Model selection
step(vafit, direction="both", trace=1,
     steps=1000, k=2)

Start: AIC= 964.79
 Surv(time, status) ~ rx + celltf + karno + months
     + age + priorrx
           Df    AIC
- months    1 962.79
- priorrx   1 962.89
- age       1 963.65
<none>        964.79
- rx        1 964.81
- celltf    3 977.63
- karno     1 997.78

Step: AIC= 962.79
 Surv(time, status) ~ rx + celltf + karno
     + age + priorrx
           Df    AIC
- priorrx   1 960.92
- age       1 961.67
<none>        962.79
- rx        1 962.84
+ months    1 964.79
- celltf    3 975.67
- karno     1 996.82

Step: AIC= 960.92
 Surv(time, status) ~ rx + celltf + karno + age
           Df    AIC
- age       1 959.83
<none>        960.92
- rx        1 961.09
+ priorrx   1 962.79
+ months    1 962.89
- celltf    3 973.76
- karno     1 994.86

Step: AIC= 959.83
 Surv(time, status) ~ rx + celltf + karno
           Df    AIC
- rx        1 959.53
<none>        959.83
+ age       1 960.92
+ priorrx   1 961.67
+ months    1 961.75
- celltf    3 971.93
- karno     1 993.04

Step: AIC= 959.53
 Surv(time, status) ~ celltf + karno
           Df    AIC
<none>        959.53
+ rx        1 959.83
+ age       1 961.09
+ priorrx   1 961.27
+ months    1 961.35
- celltf    3 970.87
- karno     1 992.05

Call:
coxph(formula=Surv(time, status) ~ celltf + karno,
    data = va, method = "efron", singular.ok = T,
    robust = T, x = T, y = T)

           coef exp(coef) se(coef) robust se     z       p
celltf2  0.7153      2.04  0.25269   0.31847  2.25 2.5e-02
celltf3  1.1577      3.18  0.29294   0.27402  4.22 2.4e-05
celltf4  0.3256      1.38  0.27668   0.25356  1.28 2.0e-01
karno   -0.0311      0.97  0.00518   0.00541 -5.74 9.6e-09

Likelihood ratio test=59.4  on 4 df, p=3.93e-12   n= 137
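The AIC, SBC and likelihood ratio values in the SAS and R output for the full eight-parameter model can be reproduced from the two -2 LOG L values (1010.898 without covariates, 948.794 with covariates). The sketch below is plain arithmetic with hypothetical helper names. One point is an inference from the numbers rather than something stated in the slides: the printed SBC is matched by using the number of events (128) in the penalty, not the number of subjects (137).

```python
import math

def aic(neg2_loglik, n_params, k=2.0):
    """AIC as used by step(): -2 log L plus k times the number of parameters."""
    return neg2_loglik + k * n_params

def sbc(neg2_loglik, n_params, n_events):
    """Schwarz criterion; the penalty uses log(number of events) (assumption)."""
    return neg2_loglik + n_params * math.log(n_events)

null_neg2_loglik = 1010.898   # -2 LOG L without covariates
full_neg2_loglik = 948.794    # -2 LOG L with all 8 covariates

full_aic = aic(full_neg2_loglik, 8)
full_sbc = sbc(full_neg2_loglik, 8, 128)
lr_statistic = null_neg2_loglik - full_neg2_loglik
```

These agree with the printed values 964.794 (AIC), 987.610 (SBC) and 62.104 (likelihood ratio chi-square on 8 df), and the AIC matches the starting value 964.79 reported by step().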
Model Selection with PHREG in SAS

/* This code is posted as
   vaphreg2.sas */
data va;
  infile 'c:\st565\va.dat';
  input rx cellt time status karno
        months age prior_rx;
  prior_rx = prior_rx/10;
  ct2=0; if(cellt=2) then ct2=1;
  ct3=0; if(cellt=3) then ct3=1;
  ct4=0; if(cellt=4) then ct4=1;
run;

proc phreg data=va;
  model time*status(0)= rx ct2 ct3 ct4 karno
        months age prior_rx/ties=efron
        selection=stepwise sle=.10 sls=.05;
  title "Results from Stepwise Selection";
run;

proc phreg data=va;
  model time*status(0)= rx ct2 ct3 ct4 karno
        months age prior_rx/ties=efron
        selection=backward sls=.10;
  title "Results from Backward Elimination";
run;

Step 1. Variable karno is entered. The model
contains the following explanatory variables:
karno

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       968.867
AIC           1010.898       970.867
SBC           1010.898       973.719

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       42.0312     1        <.0001
Score                  45.3191     1        <.0001
Wald                   43.3810     1        <.0001

Step 2. Variable ct3 is entered. The model
contains the following explanatory variables:
ct3  karno

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       959.867
AIC           1010.898       963.867
SBC           1010.898       969.571

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       51.0313     2        <.0001
Score                  57.0119     2        <.0001
Wald                   55.4497     2        <.0001

Step 3. Variable ct2 is entered. The model
contains the following explanatory variables:
ct2  ct3  karno

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       952.902
AIC           1010.898       958.902
SBC           1010.898       967.458

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       57.9965     3        <.0001
Score                  62.9854     3        <.0001
Wald                   60.1330     3        <.0001
NOTE: No (additional) variables met the
0.1 level for entry into the model.

Analysis of Maximum Likelihood Estimates
                 Parameter   Standard                              Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr > ChiSq     Ratio
ct2         1     -0.57401    0.21562       7.0866       0.0078     0.563
ct3         1      1.00816    0.25830      15.2333       <.0001     2.741
karno       1     -0.03047    0.00512      35.4470       <.0001     0.970

Summary of Stepwise Selection
         Variable             Number        Score         Wald
Step   Entered   Removed          In   Chi-Square   Chi-Square   Pr>ChiSq
   1   karno                       1      45.3191            .     <.0001
   2   ct3                         2      10.5638            .     0.0012
   3   ct2                         3       7.2377            .     0.0071

proc phreg data=va;
  model time*status(0)= rx ct2 ct3 ct4 karno
        months age prior_rx/ties=efron
        selection=backward sls=.10;
  title "Results from Backward Elimination";
run;

Step 0. The model contains the following variables:
rx  ct2  ct3  ct4  karno  months  age  prior_rx

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       948.794
AIC           1010.898       964.794
SBC           1010.898       987.610

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       62.1039     8        <.0001
Score                  66.7375     8        <.0001
Wald                   62.3675     8        <.0001

Step 1. Variable months is removed. The model
contains the following explanatory variables:
rx  ct2  ct3  ct4  karno  age  prior_rx

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       948.794
AIC           1010.898       962.794
SBC           1010.898       982.759

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       62.1038     7        <.0001
Score                  66.6268     7        <.0001
Wald                   62.3647     7        <.0001

Step 5. Variable ct4 is removed. The model
contains the following explanatory variables:
ct2  ct3  karno

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       952.902
AIC           1010.898       958.902
SBC           1010.898       967.458

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       57.9965     3        <.0001
Score                  62.9854     3        <.0001
Wald                   60.1330     3        <.0001

NOTE: No (additional) variables met the 0.1
level for removal from the model.

Analysis of Maximum Likelihood Estimates
                 Parameter   Standard         Chi-                 Hazard
Variable   DF     Estimate      Error       Square    Pr>ChiSq      Ratio
ct2         1     -0.57401    0.21562        7.086      0.0078      0.563
ct3         1      1.00816    0.25830       15.233      <.0001      2.741
karno       1     -0.03047    0.00512       35.447      <.0001      0.970

Summary of Backward Elimination
        Variable    Number         Wald
Step    Removed         In   Chi-Square   Pr > ChiSq
   1    months           7       0.0001       0.9929
   2    prior_rx         6       0.1224       0.7265
   3    age              5       0.9315       0.3345
   4    rx               4       1.6971       0.1927
   5    ct4              3       1.3853       0.2392

proc phreg data=va;
  model time*status(0)= rx ct2 ct3 ct4 karno
        months age prior_rx/ties=efron
        selection=score best=5 start=2 stop=6;
  title "Searching Through All Possible Models";
run;
Searching Through All Possible Models
Regression Models Selected by Score Criterion

Number of        Score
Variables   Chi-Square   Variables Included in Model
    2          57.0119   ct3 karno
    2          47.5991   ct2 karno
    2          46.7842   rx karno
    2          46.1829   ct4 karno
    2          45.6533   karno prior_rx

    3          62.9854   ct2 ct3 karno
    3          57.6836   rx ct3 karno
    3          57.2573   ct3 karno months
    3          57.0362   ct3 karno age
    3          57.0236   ct3 karno prior_rx

    4          64.7222   rx ct2 ct3 karno
    4          63.9385   ct2 ct3 ct4 karno
    4          63.1407   ct2 ct3 karno months
    4          63.0157   ct2 ct3 karno prior_rx
    4          62.9861   ct2 ct3 karno age

    5          66.5486   rx ct2 ct3 ct4 karno
    5          64.8535   rx ct2 ct3 karno months
    5          64.7811   rx ct2 ct3 karno age
    5          64.7318   rx ct2 ct3 karno prior_rx
    5          64.1015   ct2 ct3 ct4 karno months

    6          66.6838   rx ct2 ct3 ct4 karno months
    6          66.6154   rx ct2 ct3 ct4 karno age
    6          66.5636   rx ct2 ct3 ct4 karno prior_rx
    6          64.8984   rx ct2 ct3 karno months age
    6          64.8576   rx ct2 ct3 karno months prior_rx

Time-Dependent Covariates
• Covariates whose values change over
the course of the study
• Can be used to create models where
the proportional hazards property of a
constant hazard ratio across time does
not hold
• Internal variables: are subject-specific and require periodic measurements on individual subjects (e.g.
measure the concentration of a specific substance in blood, or the presence/absence of a certain condition)
• External variables: Factors that apply to all subjects (e.g., average hourly
pollen count in an allergy study, or predetermined changes in dosages of a
drug)
• Let $x_{ji}(t)$ denote the value for the $j$-th
covariate for the $i$-th subject at time
$t$.
The hazard for the $i$-th subject is
$$h_i(t) = h_0(t) \exp\left( \sum_{j=1}^{k} \beta_j x_{ji}(t) \right)$$
In this model,
$h_0(t)$ is the hazard for an individual
for whom all covariates are zero at
the time origin, and remain zero
through time
The hazard ratio $h_i(t)/h_0(t)$ can
change through time
• The hazard ratio for two subjects is
$$\frac{h_i(t)}{h_m(t)} = \exp\left( \beta_1 [x_{1i}(t) - x_{1m}(t)] + \cdots + \beta_k [x_{ki}(t) - x_{km}(t)] \right)$$
Then, $\beta_j$ can be interpreted as the
log(hazard ratio) for two individuals
whose values of the $j$-th explanatory
variable differ by one unit at some time
point $t$ and who have the same values for all other covariates at that time
point.
• Assuming no ties, the log-partial
likelihood is
$$\log(L_p(\beta)) = \sum_{i=1}^{n} \delta_i \left[ \sum_{j=1}^{k} \beta_j x_{ji}(t_i) - \log\left( \sum_{\ell \in R(t_i)} \exp\Big( \sum_{j=1}^{k} \beta_j x_{j\ell}(t_i) \Big) \right) \right]$$
where $\delta_i = 0$ for a censored time and is
unity otherwise.
• At each observed failure time you must
know the values of the time dependent
covariates for all individuals still at risk
– Generally no problem for external
variables
– No problem when the covariate is a
direct function of time
– You could measure internal time
dependent covariates using the same
schedule of inspection times for
each individual
(e.g., each subject reports to the
clinic every six months)
– Otherwise, you may have a huge
missing data problem.
• Treatment effects can be confounded
with effects of time dependent covariates
Suppose you randomly assign
leukaemia patients to either a
standard or a new cytotoxic drug.
– White blood cell count is used
as a time dependent covariate
– The actual difference in the
drugs corresponds to how they
affect white blood cell count.
– The difference in the drugs is
not significant after adjusting
for white blood cell count
– There is a significant difference
in the drugs if white blood cell
counts at different times are
not included in the model.
Checking the proportional hazards
assumption
• Include an interaction with time in the
model as a time dependent covariate
x(t) = var ∗ time
x(t) = var ∗ log(time)
• To check the proportional hazards
assumption for the treatment variable
rx in the VA lung cancer example, use
the following SAS code posted as
vacoxph3.sas

data va;
  infile 'c:\st565\va.dat';
  input rx cellt time status karno
        months age prior_rx;
  prior_rx = prior_rx/10;
  rx=rx-1; /* Recode rx=0 for standard */
  ct2=0; if(cellt=2) then ct2=1;
  ct3=0; if(cellt=3) then ct3=1;
  ct4=0; if(cellt=4) then ct4=1;
  /* Computed once per subject from that subject's
     own survival time; not a time dependent covariate */
  rxt=rx*log(time);
  karnot=karno*time;
run;

proc phreg data=va;
  model time*status(0)= rx rxt ct2 ct3 ct4
        karno months age prior_rx /
        ties=efron;
  title "Incorrect Check for Proportional Hazards";
run;

proc phreg data=va;
  model time*status(0)= rx rxtime ct2 ct3 ct4
        karno months age prior_rx /
        ties=efron;
  /* Programming statements inside PHREG are evaluated
     at each event time for every subject in the risk set */
  rxtime=rx*log(time);
  title "Correct Check for proportional hazards";
run;
Incorrect Check for Proportional Hazards

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       867.172
AIC           1010.898       885.172
SBC           1010.898       910.840

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      143.7263     9        <.0001
Score                 149.4959     9        <.0001
Wald                  109.6200     9        <.0001

Analysis of Maximum Likelihood Estimates
                 Parameter   Standard                            Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr>ChiSq     Ratio
rx          1      7.29916    0.87249      69.9883     <.0001  1479.054
rxt         1     -1.51682    0.18476      67.4019     <.0001     0.219
ct2         1     -0.26993    0.25739       1.0998     0.2943     0.763
ct3         1      0.44024    0.30201       2.1249     0.1449     1.553
ct4         1     -0.19917    0.28478       0.4891     0.4843     0.819
karno       1     -0.01192    0.00610       3.8185     0.0507     0.988
months      1     -0.00204    0.00966       0.0446     0.8328     0.998
age         1      0.00275    0.00928       0.0879     0.7669     1.003
prior_rx    1      0.15749    0.22722       0.4804     0.4882     1.171
Correct Check for proportional hazards

Model Fit Statistics
               Without          With
Criterion   Covariates    Covariates
-2 LOG L      1010.898       948.534
AIC           1010.898       966.534
SBC           1010.898       992.202

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr>ChiSq
Likelihood Ratio       62.3642     9      <.0001
Score                  66.7829     9      <.0001
Wald                   62.4007     9      <.0001

Analysis of Maximum Likelihood Estimates
                 Parameter   Standard                            Hazard
Variable   DF     Estimate      Error   Chi-Square   Pr>ChiSq     Ratio
rx          1     -0.00272    0.61715       0.0000     0.9965     0.997
rxtime      1      0.07706    0.15080       0.2611     0.6093     1.080
ct2         1     -0.89068    0.28283       9.9169     0.0016     0.410
ct3         1      1.23279    0.31045      15.7684     <.0001     3.431
ct4         1      0.43529    0.29125       2.2338     0.1350     1.545
karno       1     -0.03342    0.00564      35.1649     <.0001     0.967
months      1      0.00044    0.00916       0.0023     0.9620     1.000
age         1     -0.00903    0.00934       0.9344     0.3337     0.991
prior_rx    1      0.06030    0.23357       0.0667     0.7963     1.062

Suppose you wanted to analyze survival
times for patients with severe heart disease
where some patients receive heart transplants, and you want to allow the hazard
to change when a patient receives a transplant.
T = time to death or censoring
x1 = 1 if patient had previous surgery
   = 0 otherwise
x2 = age (in years) when accepted into
     the study
wait = time (in days) from acceptance
       until transplant
x3(t) = 1 if patient had transplant by day t
      = 0 otherwise

Hazard function for the i-th patient at time
t is
$$h_i(t) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3(t))$$

data set1;
  infile 'c:\stat565\tranplant.dat';
  input Subject T status x1 x2 wait;
run;

proc phreg data=set1;
  model T*status(0) = x1 x2 x3 / ties=exact;
  if wait>T or wait=. then x3=0;
  else x3=1;
run;

                Parameter     Std     Wald       Pr>     Risk
Variable   df    Estimate   Error   Chi-Sq    Chi-sq    Ratio
x1          1      -0.771   0.360    4.602     0.032    0.462
x2          1       0.031   0.014    4.995     0.025    1.032
x3          1      -0.046   0.303    0.023     0.878    0.955
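The transplant indicator x3(t) has to be evaluated at each event time for every patient still at risk, which is what the programming statements inside PROC PHREG do. The sketch below mirrors that logic; the function name and the three waiting times are made up for illustration, and a missing waiting time (SAS `.`) is represented by `None`.

```python
def transplant_indicator(wait, t):
    """x3(t): 1 if the patient received a transplant by day t, 0 otherwise.

    wait is the waiting time in days until transplant, or None if the
    patient never received one (a missing value in the SAS data set).
    """
    if wait is None or wait > t:
        return 0
    return 1

# Hypothetical waiting times for three patients in a risk set
waits = [None, 10, 45]

# The same patients can have different covariate values at different event times
x3_at_day_30 = [transplant_indicator(w, 30) for w in waits]
x3_at_day_50 = [transplant_indicator(w, 50) for w in waits]
```

The third patient contributes x3 = 0 to the risk set at day 30 and x3 = 1 at day 50, which is how the hazard is allowed to change at the time of transplant.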
Time Dependent Covariates
Measured at Regular Intervals
Example: Recidivism Study of Released
Convicts
• 52 weeks of followup
• week = time to arrest (in weeks)
• arrest = 1 if week is an arrest time, otherwise 0
• age = age (in years) at release
• prior = number of prior arrests
• emp1-emp52: coded 1 if employed in that week, 0
otherwise

data set1;
  infile 'c:\stat565\recid.dat';
  input subject arrest week age prior emp1-emp52;
run;

proc phreg data=set1;
  model week*arrest(0) = age prior employed /
        ties=efron;
  array emp(*) emp1-emp52;
  employed=emp(week);
run;

                Parameter     Std     Wald       Pr>     Risk
Variable   df    Estimate   Error   Chi-Sq    Chi-sq    Ratio
age         1      -0.046   0.022    4.545     0.033    0.955
prior       1       0.085   0.029    8.644     0.003    1.089
employed    1      -1.328   0.251   28.070     0.000    0.265
• You may be interested in effects of employment on the probability of being arrested
• If someone is arrested, that may terminate their employment
• Potential reverse causation is a common problem with time-dependent covariates
• Consider lagged covariate values

proc phreg data=set1;
  where week>2;
  model week*arrest(0) = age prior employ1
        employ2 / ties=efron;
  array emp(*) emp1-emp52;
  employ1=emp(week-1);
  employ2=emp(week-2);
run;
• The hazard of arrest may depend on
cumulative employment rather than
employment status in the previous two
weeks

data set2; set set1;
  array emp(*) emp1-emp52;
  array cum(*) cum1-cum52;
  cum1=emp1;
  do i=2 to 52;
    cum(i)=cum(i-1) + emp(i);
  end;
  /* Convert running totals to the proportion of weeks employed */
  do i = 1 to 52;
    cum(i) = cum(i)/i;
  end;
run;

proc phreg data=set2;
  where week>1;
  model week*arrest(0) = age prior cumemp
        / ties=efron;
  array cemp(*) cum1-cum52;
  cumemp=cemp(week-1);
run;
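The two data-step loops turn the weekly employment indicators into a running proportion of weeks employed, and the model then uses the value from the previous week. The sketch below reproduces that calculation outside SAS; the function names and the five-week employment history are hypothetical.

```python
def cumulative_employment_proportion(emp):
    """Proportion of weeks employed up to and including each week."""
    proportions = []
    total = 0
    for week, status in enumerate(emp, start=1):
        total += status
        proportions.append(total / week)
    return proportions

def lagged_value(series, week, lag=1):
    """Value of a weekly series `lag` weeks before `week` (weeks start at 1)."""
    return series[week - lag - 1]

# Hypothetical employment history for the first five weeks after release
emp = [1, 1, 0, 0, 1]
cum = cumulative_employment_proportion(emp)

# Covariate used at an event in week 4: cumulative proportion through week 3
cumemp_week4 = lagged_value(cum, week=4, lag=1)
```

Lagging by one week means the covariate used at an arrest time depends only on employment before that week, which is the guard against reverse causation described above.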
Fitting the Cox Model in R
with Time-Dependent Covariates
Suppose subject 84 receives a standard
treatment and survives 102 days, but
subject 94 switches from the new treatment
to the standard treatment at day 57 and
survives 320 days. Put the data for
subject 94 on two lines

Subject  Start  Stop  Status  rx  age  conc
     84      0   102       1   0   43    31
     94      0    57       0   1   57    22
     94     57   320       1   0   57    22

Subject  Start  Stop  Status  rx  age  conc
     43      0    60       0   1   34    45
     43     60   120       0   1   34    36
     43    120   240       0   1   34    28

vafit3 <- coxph(Surv(Start, Stop, Status) ~ rx+
                age + conc, data = va,
                method="efron", singular.ok=T)
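Building the (Start, Stop] records by hand is error-prone, so it can help to generate them from a list of covariate change points. The sketch below is a hypothetical helper, not part of the survival package: it splits one subject's follow-up at each change and assigns the event status only to the last interval, reproducing the two rows for subject 94.

```python
def split_follow_up(subject, stop, status, changes):
    """Split one subject's follow-up into (Start, Stop] rows.

    changes is a list of (day, covariates) pairs sorted by day, the first
    at day 0; each covariate dict applies from that day until the next change.
    """
    rows = []
    for idx, (start, covariates) in enumerate(changes):
        is_last = idx + 1 == len(changes)
        end = stop if is_last else changes[idx + 1][0]
        row = {"Subject": subject, "Start": start, "Stop": end,
               # Only the final interval can end in the observed event
               "Status": status if is_last else 0}
        row.update(covariates)
        rows.append(row)
    return rows

# Subject 94: new treatment (rx=1) until day 57, then standard (rx=0); died at day 320
rows_94 = split_follow_up(94, 320, 1, [(0, {"rx": 1}), (57, {"rx": 0})])

# Subject 84: no change in treatment, so a single row
rows_84 = split_follow_up(84, 102, 1, [(0, {"rx": 0})])
```

Each returned row corresponds to one line of the data frame passed to `coxph` with a `Surv(Start, Stop, Status)` response.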