Cox Proportional Hazards Model • Incorporate the effects of covariates • Since the hazard function is strictly positive, consider the log link, log(g(x; β)) = xT β = β1x1 +β2x2+· · ·+βk xk ⇒ • Parametric survival distributions are not specified • Semi-parametric models Then, the hazard ratio is T exp((xT 1 − x0 )β) • We have just defined the Cox Proportional Hazards Model • Proportional hazards assumption h(t; x, β) = h0(t)exp(xT β) h(t; x, β) = h0(t)g(x, β) and for some unspecified baseline hazard h0(t) • hazard ratio: h(t; x1 , β) h(t; x0 , β) = g(x; β) = exp(xT β) T S(t; x, β) = [So(t)]exp(x β) (Cox, 1972 JRSS B, 74, 187-220) • Estimation is difficult since ho(t) is an infinite dimensional nuisance parameter. What can we do? g(x1 ; β) g(x0 ; β) 300 The proportional hazards assumption is a rather strong assumption that may not always be reasonable • Suppose there is a single covariate x= ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 0 non-surgical intervention 1 surgical intervention • Surgical intervention has high early risk, but very good long-term survival prospects • Non-surgical intervention as low early risk, but poor long-term survival prospects • The hazard ratio would not be constant; it would initially be high and then decrease over time. 302 301 Estimation in Cox Model • Instead of the full likelihood, use the partial likelihood proposed by Cox (1972, JRSS B and 1975, Biometrika ) • Consider the conditional probability that an individual dies at time t(j) given that t(j) is one of the r observed death times {t(1), t(2), ..., t(r)}. P (individual with values x(j) dies at t(j)) P (one death at t(j)) Since the baseline hazard has an arbitrary form, intervals between successive death times provide no information about the effects of covariates on the hazard function. 303 • Individuals are assumed to respond independently P (individual with values x(j) dies at t(j)) k∈R(t(j)) P (individual k dies at t(j)) • Replace probability of death at time t(j) with probability of death in the interval [t(j), t(j) + Δ). P (individual with x(j) dies in [t(j) , t(j) +Δ)) Δ P (individual k dies in [t(j) , t(j) +Δ)) k∈R(t(j)) Δ • Contribution to the likelihood is hj (t(j)) k∈R(t(j)) hk (t(j)) k∈R(t(j)) hazard at t(j)for individual k h0(t(j)) exp(xT j β) T k∈R(t(j)) h0(t(j))exp(xk β) • Cancel out the baseline hazard function to obtain exp(xT j β) T k∈R(t(j)) exp(xk β) • The joint partial likelihood is ⎡ Lp(β) = • Limit as Δ → 0 hazard at t(j) for individual with x(j) = n j=1 ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ⎤ exp(xT j β) T k∈R(tj ) exp(xk β) ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ δj where δj = 0 if tj is a censoring time, 1 otherwise. 304 305 • Then, the joint likelihood is Alternative explanation: n [f (t )]δi [S (t )]1−δi i (i) i=1 i (i) δi 1−δi = n i=1[hi(t(i))Si (t(i))] [Si (t(i))] fi (t) • By definition hi(t) = S i (t) δi = n i=1[hi(t(i))] Si (t(i)) • The contribution to the likelihood for an observed failure at time t is ⎡ = n i=1 ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ fi(t) = hi(t)Si(t) = k∈R(t(i))hk (t(i)) ⎤ δi ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ×[ k∈R(t(i))hk (t(i))]δi Si(t(i)) exp(xT i β) h0(t)exp(xT i β)[S0(t)] ⎡ = • The contribution to the likelihood for a right censored observation at time t is Si(t) = hi(t(i)) ⎤ ⎢ n ⎢⎢⎢ i=1 ⎢⎢⎣ exp(xT i β) T k∈R(t(i))exp(xk β) δi ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ δi × n i=1[ k∈R(t(i))hk (t(i))] Si(t(i)) T [S0(t)]exp(xi β) • The partial likelihood is • Let δi = 1 if ti is a failure time and δi = 0 if ti is a censoring time. 306 ⎡ L(β) = ⎢ ⎢ n ⎢⎢ i=1 ⎢⎢⎣ ⎤ exp(xT i β) T k∈R(t(i))exp(xk β) δi ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 307 • The local information matrix I(β) has elements More Details on Estimation in Cox Proportional Hazards Model Ik = −∂ 2 log(Lp(β))/∂βk ∂β • Assuming no ties, the log-partial likelihood is = n i=1 log(Lp(β)) = n T β − log T δ x j∈R(ti) exp(xj β) i=1 i i δi j∈R(ti) wij (xk,j − x̄k,i)(x,j − x̄,i) • Large sample distribution . β̂ ∼ N (β, I(β)−1 ) • The partial likelihood depends only on the ordering of the survival times, not the actual values; so it is invariant to monotone transformation of time • Hypothesis testing – Partial likelihood ratio tests • The gradient vector (score function) has elements ⎡ U (β) = where and ⎢ ⎢ ⎣ ⎤ ∂ log(Lp(β)) ⎥⎥ ∂βk ⎦ ⎡ = n ⎢ ⎣ i=1 ⎤ δi(xk,i − x̄k,i)⎥⎦ – Score tests x̄k,i = j∈R(ti) wij xk,j T wij = exp(xT j β)/ ∈R(ti) exp(x β) – Wald test 308 Tied Survival Times Due to the inability to continuously monitor subjects. 309 • The Breslow approximation: The contribution to the partial likelihood is ( r +r +rr1+r +r ) × ( r +r +rr2+r +r ) 1 2 3 4 5 1 2 3 4 5 • Suppose 5 individuals are at risk and two fail at time t with risk scores r1 = T exp(xT 1 β) and r2 = exp(x2 β) – This is a poor approximation when β is not close to zero • We cannot order these two subjects with respect their actual failure times. – β̂ is biased toward zero ( failed individuals are used too often in the denominator) – Default in SAS 310 311 • Efron approximation: The contribution to the partial likelihood is 2 ( r +r +rr1+r +r ) × ( .5r +.5r r+r ) 1 2 3 4 5 1 2 3 +r4+r5 • Average likelihood method: – If subjects with tied death times have identical risk scores, this method is exact – “exact” partila likelihood – Computationally intensive (TIES=exact option in the MODEL statement in the SAS PHREG procedure) – Biased toward zero for “larger” values of β , (eg |βi| > 2.5) – Smaller bias than the Breslow estimator – Least amount of bias – Default in S-Plus and R – Contribution to the likelihood for a pair of tied failure times is 1 2 Contribution if three subjects fail at the same time ⎛ ⎞ r1 r1+r2+r3+r4+r5 × ⎛ ⎜ 2 ( r +r +rr1+r +r )( r +r r+r ) 1 2 3 4 5 2 3 4 +r5 1 ) +( r +r +rr2+r +r )( r +r r+r 1 2 3 4 5 1 3 4 +r5 ⎟ r2 ⎟ ⎠ 2 2 r + r + r +r +r 1 2 3 4 5 3 3 3 ⎜ ⎜ ⎝2 ⎞ ⎟ r3 ⎟ ⎠ 1 1 3 r1 + 3 r2 + 3 r3 +r4 +r5 × ⎜⎝ 1 312 313 • If there are three individuals with tied failure times, the contribution to the partial likelihood is 1 6 • The sum of the k! terms can be expressed as intergral (use intergration by parts) r3 2 ( r +r +rr1+r +r )( r +r r+r )( ) 1 2 3 4 5 2 3 4 +r5 r3 +r4 +r5 r1 r3 r2 +( r +r +r +r +r )( r +r +r +r )( r +r +r ) 1 2 3 4 5 2 3 4 5 2 4 ∞ 5 0 r3 1 )( ) +( r +r +rr2+r +r )( r +r r+r 1 2 3 4 5 1 3 4 +r5 r3 +r4 +r5 r2 r3 r1 +( r +r +r +r +r )( r +r +r +r )( r +r +r ) 1 2 3 4 5 1 3 4 5 1 4 k j=1 ⎡ ⎛ ⎢ ⎢ ⎢ ⎣ ⎜ 1 − exp ⎜⎜⎝ ⎞⎤ rj t ⎟⎥ −t dt ⎟⎥ ⎟⎥ e ⎠⎦ rk+1 + rk+2 + · · · + rn (Delong, et al., 1994, Biometrika, 81, 607-611) 5 r2 1 )( ) +( r +r +rr3+r +r )( r +r r+r 1 2 3 4 5 1 2 4 +r5 r2 +r4 +r5 r1 2 )( ) +( r +r +rr3+r +r )( r +r r+r 1 2 3 4 5 1 2 4 +r5 r1 +r4 +r5 • Alternatively, you could use a sum of a random sample of the k! terms. • If k individuals had tied failure times, there would be k! = k(k − 1) · · · (2)(1) terms in this sum 314 315 Interpretation in the Cox Model • Coefficients are log hazard ratios. For a continuous covariate exp(βj (xj + 1)) eβj = exp(βj xj ) for any value of xj • Also, rβj e = • To construct confidence intervals for hazard ratios, exponentiate endpoints of confidence intervals for log hazard ratios Compute L̂ = β̂j − zα/2Sβ̂ j exp(βj (xj + r)) and Û = β̂j + zα/2Sβ̂ j exp(βj xj ) for any value of xj • rβj is a log-hazard ratio associated with an r unit increase in Xj , while the other covariates are held constant. Then, compute (exp(L̂), exp(Û )) as an approximate (1 − α) × 100% confidence interval for exp(β) • rβj is estimated as r β̂j with standard error rSβ̂ , where S 2 is the (j, j) β̂ j j element of the inverse of the local estimate of the information matrix for the partial likelihood 316 • To construct a confidence interval for exp(xT β) √ Compute SxT β̂ = xT V x, where V is the inverse of the local estimate of the information matrix for the partial likelihood. Compute L̂ = xT β̂j − zα/2SxT β̂ j and Û = xT β̂j + zα/2SxT β̂ j Then, compute (exp(L̂), exp(Û )) as an approximate (1 − α) × 100% confidence interval for exp(xT β) 317 • Estimate risk scores (use the XBETA option in the OUTPUT statement for the PHREG procedure in SAS) r̂(x, β̂) = β̂k xk = xT β̂ • Estimate of the covariate adjusted survivor function T Ŝ(t; x, β̂) = [Ŝ0(t)]exp(x β̂) • If the x’s are centered, the baseline survivor function S0(t) corresponds to an ’average’ subject • We need an estimate of the baseline survivor function S0(t) 318 319 Estimating the hazard and survivor functions The Kaflbeisch-Prentice (1973, Biometrika 60, 267-278) method: The probability that a subject with covariates values xi survives beyond time t is S(t; xj ) = (S0(t)) Let ηj denote the conditional probability of failing in [t(j), t(j+1)) given that a subject from the baseline population survives to time t(j). Defining η0 = 1, S0(t(j)) = j−1 i=0 ηj and the conditional probability that a subject with covariate values xj fails in [t(j), t(j+1)) given survival to the beginning of the interval is approximately T T S0(t(j))exp(xj β) − S0 (t(j+1))exp(xj β) exp(xT j β̂) S0 (t(j)) exp(xTj β) = 1 − ηj exp(xTj β) The conditional probability that a subject with covariate values xj survives beyond t(j) given survival to the beginning of the interval is approximately For observed failure times t(1) < t(2) < ... < t(r) suppose there are dj deaths and nj individuals at risk at time t(j). exp(xT j β) S0(t(j+1)) exp(xT j β) S0(t(j)) exp(xT j β) = ηj 320 321 • S0 (t) is estimated as An approximate joint likelihood is ⎡ r ⎢ ⎢ ⎢ ⎣ ⎛ j=0 k∈Dj ⎤ ⎞ ⎜ ⎜ ⎝ 1− exp(xT k β)⎟ ⎟ ηj ⎠ k∈Rj −Dj exp(xT k β) ⎥⎥⎥ ηj ⎦ Ŝ0 (t) = k j=1 η̂j ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ = 1− Estimate ηj by maximizing the log of this approximate likelihood. Setting first partial derviatives equal to zero we obtain ηˆj as the solution to exp(xT i β̂) k∈D(t(j)) 1 exp(xT i β̂) − η̂j = k∈R(t(j)) exp(xT i β̂) If there are no ties (dj = 1 for all j=1,2,...,r), then ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ η̂j = 1 − exp(xT (j) β̂) T i∈R(t(j)) exp(xi β̂) exp(−xT (j) β̂) ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ exp(xT (j) β̂) T i∈R(t(j) ) exp(xi β̂) ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ exp(−xT (j) β̂) for t(k) ≤ t < t(k + 1), k = 1, 2, ..., r − 1 with Ŝ0(t) = 1 for t < t(1) and Ŝ0(t) = 0 for t > t(r) unless there are censored times after t(r). In that case, Ŝ0(t) = Ŝ0 (t(r)) up to the largest censored time and it is undefined after that time. • Assume that the baseline hazard is constant between adjacent failure times. Then, ĥ0(t) = 1 − η̂j t(j+1) − t(j) for t(j) ≤ t < t(j + 1) and j=1,2,...,r-1 Otherwise, an iterative procedure is required to obtain a solution with ĥ0(t) = 0 for t < t(1) 322 323 Breslow (Nelson-Aalen) method: • When there are no covariates, η̂j = nj − dj • Approximate nj exp(xT i β̂) and η̂j Ŝ0 (t) = k j=1 η̂j ⎛ η̃ = exp • The baseline cumulative hazard function is estimated as k j=1 S̃0(t) = ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ T ⎠ i∈R(t(j) ) exp(xi β̂) k j=1 exp ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ ⎞ −dj ⎟ ⎟ ⎟ ⎟ ⎟ T ⎠ i∈R(t(j) ) exp(xi β̂) for t(k) ≤ t < t(k + 1), k=1,2,...,r-1. and −dj ⎛ Ĥi(t) = Ĥ0(t)exp(xT i β̂) ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ • Then, log(η̂j ) • For others Ŝi(t) = Ŝ0(t) ⎞ T T with 1 + exi β̂ log(η̂j ) to obtain is the Kaplan-Meier estimator Ĥ0(t) = −log(Ŝ0(t)) = − ⎛ = exp ⎝exi β̂ log(η̂j )⎠ • This estimator is not necessarily zero at the longest survival time,when it is uncensored exp(xT i β̂) 324 325 SAS Example: VA data /* SAS code for fitting a proportional hazards model to data from the VA lung cancer trial of 137 male patients with inoperable lung cancer. This code is posted as vaphreg1.sas */ • Also, H̃(t) = −log(S̃0(t)) ⎛ = k j=1 ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ ⎞ dj ⎟ ⎟ ⎟ ⎟ ⎟ T ⎠ i∈R(t(j) ) exp(xi β̂) /* Variables Treatment: 1=standard, 2=test (chemotherapy) Celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large Survival in days Status: 1=dead, 0=censored Karnofsky score: Measures patient performance of activities of daily living. • S̃0 (t) is cheaper to compute than Ŝ0(t) SCORE 100 90 • When there are few ties, S̃0(t) will be similar to Ŝ0(t) 80 70 60 50 40 30 20 10 • Estimates of medians and other percentiles can be obtained from the estimated survival function FUNCTION Normal, no evidence of disease Able to perform normal activity with only minor symptoms Able to perform normal activity with effort, some symptoms Able to care for self but unable to do normal activities Requires occasional assistance Requires considerable assistance Disabled, requires special assistance Severely disabled Very sick, requires supportive treatment Moribund Months from Diagnosis Age in years Prior therapy: 0=no, 10=yes */ 326 327 data va; infile ’c:\st565\va.dat’; input rx cellt time status karno months age prior_rx; prior_rx = prior_rx/10; ct2=0; if(cellt=2) then ct2=1; ct3=0; if(cellt=3) then ct3=1; ct4=0; if(cellt=4) then ct4=1; run; proc format; value celltype 1 2 3 4 = = = = ’squamous’ ’smallcell’ ’adeno’ ’large’; /* Create a new data file with covariate values at which you want to estimate survival probabilities */ data vanew; input rx ct2 ct3 ct4 karno months age prior_rx; datalines; 1 0 0 1 80 64 70 1 0 0 0 1 80 64 70 1 RUN; proc phreg data=va; model time*status(0)= rx ct2 ct3 ct4 karno months age prior_rx/ties=efron; baseline out=va2 covariates=vanew survival=s stderr=se lower=l95 upper=u95; run; proc print data=va2; run; /* Plot the estimated survival probabilities for the standard and chemotherapy treatments on the same plot when cell type is large, Karnofsky score is 80, time from diagnosis until entry into the study is 64 months, age is 70 years, and there was previous treatment */ data set1; set va2; if(rx=1); s1 = s; keep time s1; run; 328 329 The PHREG Procedure Model Information data set0; set va2; if(rx=0); s0=s; keep time s0; run; Data Set Dependent Variable Censoring Variable Censoring Value(s) Ties Handling data va3; merge set1 set0; by time; run; proc gplot data=va3; plot s1*time s0*time /overlay vaxis=axis1 haxis=axis2; symbol1 v=none interpol=step color=black line=1 w=4; symbol2 v=none interpol=step color=black line=3 w=4; axis1 label=(f=swiss h=1.9 a=90 r=0 "Survival Probability") value=(f=swiss h=1.5); axis2 label=(f=swiss h=1.9 "Time(days)") value=(f=swiss h=1.5); legend frame; title ls=0.5 h=2.0 f=swiss "Comparison of Survivor Curves"; run; WORK.VA time status 0 EFRON Summary of the Number of Event and Censored Values Total Event Censored Percent Censored 137 128 9 6.57 Convergence Status Convergence criterion (GCONV=1E-8) satisfied. 330 331 Model Fit Statistics Criterion Without Covariates With Covariates 1010.898 1010.898 1010.898 948.794 964.794 987.610 Obs -2 LOG L AIC SBC Testing Global Null Hypothesis: BETA=0 Test Likelihood Ratio Score Wald Chi-Square DF 62.1039 66.7375 62.3675 8 8 8 Pr > ChiSq <.0001 <.0001 <.0001 Analysis of Maximum Likelihood Estimates Variable rx ct2 ct3 ct4 karno months age prior_rx DF Parameter Estimate Standard Error Chi-Square 1 1 1 1 1 1 1 1 0.29461 -0.86156 1.19607 0.40129 -0.03282 0.000082 -0.00871 0.07159 0.20755 0.27528 0.30092 0.28269 0.00551 0.00914 0.00930 0.23231 2.0148 9.7950 15.7986 2.0151 35.4979 0.0001 0.8763 0.0950 Hazard Pr>ChiSq Ratio 0.1558 0.0017 <.0001 0.1557 <.0001 0.9929 0.3492 0.7580 1.343 0.423 3.307 1.494 0.968 1.000 0.991 1.074 1 2 3 4 . . . 95 96 97 98 99 100 101 . . . 195 196 197 . . . 292 293 294 rx ct2 ct3 ct4 karno months age prior_rx time s se l95 u95 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 80 80 80 80 64 64 64 64 70 70 70 70 1 1 1 1 0 1 2 3 1.0000 0.9920 0.9880 0.9839 . 0.0073 0.0099 0.0124 . 0.9777 0.9688 0.9598 . 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 80 80 80 80 80 80 80 64 64 64 64 64 64 64 70 70 70 70 70 70 70 1 1 1 1 1 1 1 553 587 991 999 0 1 2 0.0092 0.0042 0.0007 0.0000 1.0000 0.9941 0.9910 0.0258 0.0137 0.0030 0.0000 . 0.0058 0.0079 0.00004 0.00001 0.00000 . . 0.9828 0.9757 1 1 1 . . 1 1 0 0 0 1 0 0 0 1 1.49 .65 .20 .20 80 64 70 1 991 80 64 70 1 999 58.6 8.8 58.3 .29 0 0.0046 0.0000 1.0000 0.0153 0.0000 . 0.00001 . . 1 . . 1.49 .65 .20 .20 1.49 .65 .20 .20 1.49 .65 .20 .20 58.6 58.6 58.6 8.8 58.3 .29 587 .00089 8.8 58.3 .29 991 .00009 8.8 58.3 .29 999 .00000 332 .001216 .00006 .000188 .00000 .000000 . 0.013 0.0048 . 333 Fitting the Cox Model in R or Splus coxph(formula, data=sys.parent(), weights, subset, na.action, eps = 1e-07, init, iter.max=10, method=c("efron","breslow","exact"), singular.ok=T, robust=F, model=F, x=F, y=T) 334 335 # # # # # library(survival) options(contrasts=c("contr.treatment", "contr.poly")) vafit <- coxph(Surv(time, status) ~ rx+celltf+karno +months + age + priorrx, data = va, x=T, y=T, robust=T, method="efron", singular.ok=T) R code for fitting the Cox model to data from the VA lung cancer trial of 137 male patients with inoperable lung cancer. This code is posted as vacoxph.ssc # Variables # Treatment: 1=standard, 2=test (chemotherapy) # Celltype: 1=squamous, 2=smallcell, # 3=adeno, 4=large # Survival in days # Status: 1=dead, 0=censored # Karnofsky score # Months from Diagnosis # Age in years # Prior therapy: 0=no, 10=yes # Enter the data into a data frame. va <- read.table("c:/stat565/va.dat", header=F, col.names=c("rx", "cellt", "time", "status", "karno", "months", "age", "priorrx")) va$rx <- va$rx-1 va$priorrx <- va$priorrx/10; va$celltf<-as.factor(va$cellt) va summary(vafit) Call: coxph(formula = Surv(time, status) ~ rx + celltf + karno + months + age + priorrx, data = va, method = "efron", singular.ok = T, robust = T, x = T, y = T) n= 137 coef exp(coef) rx 0.294605 1.343 celltf2 0.861559 2.367 celltf3 1.196067 3.307 celltf4 0.401290 1.494 karno -0.032815 0.968 months 0.000082 1.000 age -0.008706 0.991 priorrx 0.071589 1.074 rx celltf2 celltf3 celltf4 karno months age priorrx se(coef) robust se z 0.20755 0.18867 1.562 0.27528 0.31424 2.742 0.30092 0.27596 4.334 0.28269 0.25242 1.590 0.00551 0.00522 -6.286 0.00914 0.00795 0.010 0.00930 0.01030 -0.845 0.23231 0.22068 0.324 exp(coef) exp(-coef) lower .95 upper .95 1.343 0.745 0.928 1.943 2.367 0.423 1.278 4.382 3.307 0.302 1.926 5.680 1.494 0.669 0.911 2.450 0.968 1.033 0.958 0.978 1.000 1.000 0.985 1.016 0.991 1.009 0.972 1.012 1.074 0.931 0.697 1.656 Rsquare= 0.364 (max possible= 0.999 ) 336 Likelihood ratio test= Wald test = Score (logrank) test = Robust = 62.1 68.7 66.7 52.3 on 8 df, p=1.8e-010 on 8 df, p=8.96e-012 on 8 df, p=2.19e-011, p=1.46e-008 Note: The likelihood ratio and score tests assume independence among observations within a cluster, the Wald and robust score tests do not. p 1.2e-001 6.1e-003 1.5e-005 1.1e-001 3.3e-010 9.9e-001 4.0e-001 7.5e-001 337 vafit$coef rx celltf2 celltf3 celltf4 0.2946054 0.8615587 1.196067 0.4012899 karno months -0.03281529 0.00008177293 age priorrx -0.008706219 0.07158904 > names(vafit) [1] [4] [7] [10] [13] [16] [19] "coefficients" "score" "residuals" "n" "naive.var" "x" "call" "var" "iter" "means" "terms" "rscore" "y" "loglik" "linear.predictors" "method" "assign" "wald.test" "formula" vafit$x 1 2 3 4 5 6 7 8 9 10 rx celltf2 celltf3 celltf4 karno months age priorrx 0 0 0 0 60 7 69 0 0 0 0 0 70 5 64 1 0 0 0 0 60 3 38 0 0 0 0 0 60 9 63 1 0 0 0 0 70 11 65 1 0 0 0 0 20 5 49 0 0 0 0 0 40 10 69 1 0 0 0 0 80 29 68 0 0 0 0 0 50 18 43 0 0 0 0 0 70 6 70 0 338 vafit$var [,1] [,2] [,3] [,4] [1,] 3.559674e-002 0.00258373 -0.0031549717 0.00028655499 [2,] 2.583736e-003 0.09874914 0.0564663712 0.04576688179 [3,] -3.154972e-003 0.05646637 0.0761517918 0.04834877156 [4,] 2.865550e-004 0.04576688 0.0483487716 0.06371456224 [5,] 2.117459e-006 -0.00009915 -0.0001843566 -0.00005868008 [6,] -2.567700e-004 -0.00071643 -0.0001088384 -0.00003811918 [7,] -8.354853e-004 -0.00028035 0.0001403169 0.00008960564 [8,] 2.024385e-003 -0.00179947 0.0001863414 -0.01106331757 [1,] [2,] [3,] [4,] [5,] [6,] [7,] [8,] [,5] 2.117459e-006 -9.915142e-005 -1.843566e-004 -5.868008e-005 2.724976e-005 1.470556e-005 4.789953e-006 -1.094614e-004 [,6] [,7] [,8] -0.00025677 -8.354853e-004 0.0020243849 -0.00071643 -2.803497e-004 -0.0017994743 -0.00010884 1.403169e-004 0.0001863414 -0.00003812 8.960564e-005 -0.0110633176 0.00001471 4.789953e-006 -0.0001094614 0.00006318 1.210810e-005 -0.0006854524 0.00001211 1.060784e-004 -0.0001006315 -0.00068545 -1.006315e-004 0.0486998221 339 # Model selection step(vafit, direction="both", trace=1, steps=1000, k=2) Start: AIC= 964.79 Surv(time, status) ~ rx + celltf + karno + months + age + priorrx - months - priorrx - age <none> - rx - celltf - karno Df AIC 1 962.79 1 962.89 1 963.65 964.79 1 964.81 3 977.63 1 997.78 Step: AIC= 962.79 Surv(time, status) ~ rx + celltf + karno + age + priorrx Df AIC - priorrx 1 960.92 - age 1 961.67 <none> 962.79 - rx 1 962.84 + months 1 964.79 - celltf 3 975.67 - karno 1 996.82 Step: AIC= 960.92 Surv(time, status) ~ rx + celltf + karno + age - age <none> - rx + priorrx + months - celltf - karno Df AIC 1 959.83 960.92 1 961.09 1 962.79 1 962.89 3 973.76 1 994.86 340 341 Step: AIC= 959.83 Surv(time, status) ~ rx + celltf + karno - rx <none> + age + priorrx + months - celltf - karno Df AIC 1 959.53 959.83 1 960.92 1 961.67 1 961.75 3 971.93 1 993.04 Call: coxph(formula=Surv(time, status) ~ celltf + karno, data = va, method = "efron", singular.ok = T, robust = T, x = T, y = T) coef exp(coef) se(coef) robust se z celltf2 0.7153 2.04 0.25269 0.31847 2.25 celltf3 1.1577 3.18 0.29294 0.27402 4.22 celltf4 0.3256 1.38 0.27668 0.25356 1.28 karno -0.0311 0.97 0.00518 0.00541 -5.74 Step: AIC= 959.53 Surv(time, status) ~ celltf + karno Df <none> + rx + age + priorrx + months - celltf - karno 1 1 1 1 3 1 Likelihood ratio test=59.4 on 4 df, p=3.93e-12 n= 137 AIC 959.53 959.83 961.09 961.27 961.35 970.87 992.05 342 p 2.5e-02 2.4e-05 2.0e-01 9.6e-09 343 Model Selection with PHREG in SAS \* This code is posted as vaphreg2.sas Step 1. Variable karno is entered. The model contains the following explanatory variables: *\ data va; infile ’c:\st565\va.dat’; input rx cellt time status karno months age prior_rx; prior_rx = prior_rx/10; ct2=0; if(cellt-2) then ct2=1; ct3=0; if(cellt=3) then ct3=1; ct4=0; if(cellt=4) then ct4=1; run; karno Model Fit Statistics proc phreg data=va; model time*status(0)= rx ct2 ct3 ct4 karno months age prior_rx/ties=efron selection=stepwise sle=.10 sls=.05; title "Results from Stepwise Selection"; run; Criterion Without Covariates With Covariates -2 LOG L AIC SBC 1010.898 1010.898 1010.898 968.867 970.867 973.719 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq 42.0312 45.3191 43.3810 1 1 1 <.0001 <.0001 <.0001 Likelihood Ratio Score Wald proc phreg data=va; model time*status(0)= rx ct2 ct3 ct4 karno months age prior_rx/ties=efron selection=backward sls=.10; title "Results from Backward Elimination"; run; 344 345 Step 3. Variable ct2 is entered. The model contains the following explanatory variables: Step 2. Variable ct3 is entered. The model contains the following explanatory variables: ct3 ct2 karno Model Fit Statistics Without Criterion With Covariates -2 LOG L AIC SBC Likelihood Ratio Score Wald karno Model Fit Statistics Without With Criterion Covariates Covariates Covariates 1010.898 1010.898 1010.898 959.867 963.867 969.571 -2 LOG L AIC SBC Testing Global Null Hypothesis: BETA=0 Test ct3 1010.898 1010.898 1010.898 952.902 958.902 967.458 Testing Global Null Hypothesis: BETA=0 Chi-Square DF Pr > ChiSq 51.0313 57.0119 55.4497 2 2 2 <.0001 <.0001 <.0001 346 Test Likelihood Ratio Score Wald Chi-Square DF Pr > ChiSq 57.9965 62.9854 60.1330 3 3 3 <.0001 <.0001 <.0001 347 proc phreg data=va; model time*status(0)= rx ct2 ct3 ct4 karno months age prior_rx/ties=efron selection=backward sls=.10; title "Results from Backward Elimination"; run; NOTE: No (additional) variables met the 0.1 level for entry into the model. Analysis of Maximum Likelihood Estimates Variable Parameter Standard DF Estimate Error Chi-Square ct2 ct3 karno 1 1 1 -0.57401 1.00816 -0.03047 0.21562 0.25830 0.00512 Step Hazard Pr > ChiSq Ratio 7.0866 15.2333 35.4470 0.0078 <.0001 <.0001 0.563 2.741 0.970 Variable Entered 1 2 3 karno ct3 ct2 Removed Number Score In Chi-Square 1 2 3 Wald Chi-Square 45.3191 10.5638 7.2377 . . . rx ct2 ct3 ct4 karno months age Criterion Without Covariates With Covariates 1010.898 1010.898 1010.898 948.794 964.794 987.610 Pr>ChiSq <.0001 0.0012 0.0071 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq 62.1039 66.7375 62.3675 8 8 8 <.0001 <.0001 <.0001 Likelihood Ratio Score Wald 348 349 Step 1. Variable months is removed. The model contains the following explanatory variables: rx ct2 ct3 ct4 karno age Step 5. Variable ct4 is removed. The model contains the following explanatory variables: prior_rx ct2 ct3 karno Model Fit Statistics Model Fit Statistics Criterion Without Covariates With Covariates -2 LOG L AIC SBC 1010.898 1010.898 1010.898 948.794 962.794 982.759 Without Criterion Likelihood Ratio Score Wald With Covariates -2 LOG L AIC SBC Chi-Square DF Pr > ChiSq 62.1038 66.6268 62.3647 7 7 7 <.0001 <.0001 <.0001 350 Covariates 1010.898 1010.898 1010.898 952.902 958.902 967.458 Testing Global Null Hypothesis: BETA=0 Testing Global Null Hypothesis: BETA=0 Test prior_rx Model Fit Statistics -2 LOG L AIC SBC Summary of Stepwise Selection Step 0. The model contains the following variables: Test Likelihood Ratio Score Wald Chi-Square DF Pr > ChiSq 57.9965 62.9854 60.1330 3 3 3 <.0001 <.0001 <.0001 351 NOTE: No (additional) variables met the 0.1 level for removal from the model. Analysis of Maximum Likelihood Estimates Variable DF ct2 ct3 karno 1 1 1 Parameter Estimate Standard ChiError Square -0.57401 1.00816 -0.03047 0.21562 7.086 0.25830 15.233 0.00512 35.447 Pr>ChiSq Hazard Ratio 0.0078 <.0001 <.0001 0.563 2.741 0.970 proc phreg data=va; model time*status(0)= rx ct2 ct3 ct4 karno months age prior_rx/ties=efron selection=score best=5 start=2 stop=6; title "Searching Through All Possible Models"; run; Summary of Backward Elimination Step Variable Removed Number In Wald Chi-Square Pr > ChiSq 1 2 3 4 5 months prior_rx age rx ct4 7 6 5 4 3 0.0001 0.1224 0.9315 1.6971 1.3853 0.9929 0.7265 0.3345 0.1927 0.2392 352 Time-Dependent Covariates Searching Through All Possible Models Regression Models Selected by Score Criterion Number of Variables Score Chi-Square 2 2 2 2 2 57.0119 47.5991 46.7842 46.1829 45.6533 ct3 karno ct2 karno rx karno ct4 karno karno prior_rx 3 3 3 3 3 62.9854 57.6836 57.2573 57.0362 57.0236 ct2 ct3 karno rx ct3 karno ct3 karno months ct3 karno age ct3 karno prior_rx 4 4 4 4 4 64.7222 63.9385 63.1407 63.0157 62.9861 rx ct2 ct3 karno ct2 ct3 ct4 karno ct2 ct3 karno months ct2 ct3 karno prior_rx ct2 ct3 karno age 5 5 5 5 5 66.5486 64.8535 64.7811 64.7318 64.1015 rx ct2 ct3 ct4 karno rx ct2 ct3 karno months rx ct2 ct3 karno age rx ct2 ct3 karno prior_rx ct2 ct3 ct4 karno months 6 6 6 6 6 66.6838 66.6154 66.5636 64.8984 64.8576 rx rx rx rx rx Variables Included in Model ct2 ct2 ct2 ct2 ct2 ct3 ct3 ct3 ct3 ct3 353 • Covariates whose values change over the course of the study • Can be used to create models where the proportional hazards property of a constant hazard ratio across time does not hold ct4 karno months ct4 karno age ct4 karno prior_rx karno months age karno months prior_rx 354 • Internal variables: are subjectspecific and require periodic measurements on individual subjects (e.g. measure the concentration of a specific substance in blood, or the presence/absence of a certain condition) • External variables: Factors that apply to all subjects (e.g., average hourly pollen count in an allergy study, or predetermined changes in dosages of a drug) 355 • The hazard ratio for two subjects is • Let xji(t) denote the value for j − th covariate for the i − th subject at time t. The hazard for the i − th subject is ⎛ k ⎜ hi(t) = h0(t)exp ⎜⎝ j=1 ⎞ ⎟ βj xji(t)⎟⎠ In this model, h0(t) is the hazard for an individual for whom all covariates are zero at the time origin, and remain zero through time The hazard ratio hi(t)/h0(t) can change through time hi(t) hm(t) = exp (β1[x1i(t) − x1m(t)] + · · · + βk [xki(t) − xkm(t)]) Then, βj can be interpretted as the log(hazard ratio) for two individuals whose values of the j − th explanatory variable differ by one unit at some time point t and who have the same values for all other covariates at that time point. • Assuming no ties, the log-partial likelihood is log(Lp(β)) = ⎡ n k ⎢ i=1 δi ⎢⎣ j=1 βj xji(ti) ⎛ ⎜ −log ⎜⎜⎝ ⎞⎤ ∈R(ti) exp( k j=1 ⎟⎥ βj xji(ti))⎟⎟⎠⎥⎥⎦ where δi = 0 for a censored time and is unity otherwise. 356 • At each observed failure time you must know the values of the time dependent covariates for all individuals still at risk – Generally no problem for external variables – No problem when the covariate is a direct function of time – You could measure internal time dependent covariates using the same schedule of inspection times for each individual (eg., each subject reports to the clinic every six months) – Otherwise, you may a have huge missing data problem. 358 357 • Treatment effects can be confounded with effects of time dependent covariates Suppose you randomly assign leukaemia patients to either a standard or a new cytotoxic drug. – White blood cell count is used as a time dependent covariate – The actual difference in the drugs corresponds to how they affect white blood cell count. – The difference in the drugs is not significant after adjusting for white blood cell count – There is a significant difference in the drugs if white blood cell counts at different times are not included in the model. 359 Checking the proportional hazards assumption • Include an iteraction with time in the model as a time dependent covariate x(t) = var ∗ time x(t) = var ∗ log(time) • To check the proportional hazards assumption for the treatment variable rx in the VA lung cancer example, use the follwing SAS code posted as vacoxph3.sas data va; infile ’c:\st565\va.dat’; input rx cellt time status karno months age prior_rx; prior_rx = prior_rx/10; rx=rx-1; /* Recode rx=0 for standard */ ct2=0; if(cellt=2) then ct2=1; ct3=0; if(cellt=3) then ct3=1; ct4=0; if(cellt=4) then ct4=1; rxt=rx*log(time); karnot=karno*time; run; proc phreg data=va; model time*status(0)= rx rxt ct2 ct3 ct4 karno months age prior_rx / ties=efron; title "Incorrect Check for Proportional Hazards"; run; proc phreg data=va; model time*status(0)= rx rxtime ct2 ct3 ct4 karno months age prior_rx / ties=efron; rxtime=rx*log(time); title "Correct Check for proportional hazards"; run; 360 361 Incorrect Check for Proportional Hazards Incorrect Check for Proportional Hazards Model Fit Statistics Criterion -2 LOG L AIC SBC Without Covariates With Covariates 1010.898 1010.898 1010.898 867.172 885.172 910.840 Testing Global Null Hypothesis: BETA=0 Test Likelihood Ratio Score Wald Chi-Square DF 143.7263 149.4959 109.6200 9 9 9 Pr > ChiSq <.0001 <.0001 <.0001 362 Analysis of Maximum Likelihood Estimates Variable DF Parameter Standard Estimate Error Chi-Square Pr>ChiSq rx rxt ct2 ct3 ct4 karno months age prior_rx 7.29916 -1.51682 -0.26993 0.44024 -0.19917 -0.01192 -0.00204 0.00275 0.15749 <.0001 <.0001 0.2943 0.1449 0.4843 0.0507 0.8328 0.7669 0.4882 1 1 1 1 1 1 1 1 1 0.87249 0.18476 0.25739 0.30201 0.28478 0.00610 0.00966 0.00928 0.22722 69.9883 67.4019 1.0998 2.1249 0.4891 3.8185 0.0446 0.0879 0.4804 363 Hazard Ratio 1479.054 0.219 0.763 1.553 0.819 0.988 0.998 1.003 1.171 Correct Check for proportional hazards 7 Correct Check for proportional hazards Model Fit Statistics Analysis of Maximum Likelihood Estimates Criterion Without Covariates With Covariates -2 LOG L AIC SBC 1010.898 1010.898 1010.898 948.534 966.534 992.202 Variable rx rxtime ct2 ct3 ct4 karno months age prior_rx Testing Global Null Hypothesis: BETA=0 Test Likelihood Ratio Score Wald Chi-Square DF Pr>ChiSq 62.3642 66.7829 62.4007 9 9 9 <.0001 <.0001 <.0001 Parameter Standard Hazard Estimate Error Chi-Square Pr>ChiSq Ratio DF 1 1 1 1 1 1 1 1 1 -0.00272 0.07706 -0.89068 1.23279 0.43529 -0.03342 0.00044 -0.00903 0.06030 0.61715 0.15080 0.28283 0.31045 0.29125 0.00564 0.00916 0.00934 0.23357 0.0000 0.2611 9.9169 15.7684 2.2338 35.1649 0.0023 0.9344 0.0667 0.9965 0.6093 0.0016 <.0001 0.1350 <.0001 0.9620 0.3337 0.7963 364 Suppose you wanted to analyze survival times for patients with severe heart disease where some patients receive heart transplants, and you want to allow the hazard to change when a patient receives a transplant. T = time to death or censoring x1 = 1 if patient had previous surgery = 0 otherwise x2 = age (in years) when accepted into 0.997 1.080 0.410 3.431 1.545 0.967 1.000 0.991 1.062 365 Hazard function for the i-th patient at time t is hi(t) = h0(t)exp(β1 x1 + β2x2 + β3x3 (t)) data set1; infile ‘‘c:\stat565\tranplant.dat’’; input Subject T status x1 x2 wait; run; proc phreg data=set1; model T*status(0) = x1 x2 x3 / ties=exact; if wait>T or wait=. then x3=0; else x3=1; run; the study wait = time (in days) from acceptence Parameter Std Variable df Estimate Error Wald Chi-Sq Pr> Chi-sq Risk Ratio 4.602 4.995 0.023 0.032 0.025 0.878 0.462 1.032 0.955 until transplant x3(t) = 1 if patient had transplant by day t = 0 otherwise 366 x1 x2 x3 1 1 1 -0.771 0.031 -0.046 0.360 0.014 0.303 367 Time Dependent Covariates Measured at Regular Intervals Example: Convicts Recidivism Study of Released • 52 weeks of followup • week = time to arrest (in weeks) • arrest = 1 if arrest time, otherwise 0 • age = age (in years) at release data set1; infile ’c:\stat565\recid.dat’; input subject arrest week age prior emp1-emp52; run; proc phreg data=set1; model week*arrest(0) = age prior employed / ties=efron; array emp(*) emp1-emp52; employed=emp(week); run; Parameter Std Variable df Estimate Error Wald Chi-Sq Pr> Chi-sq Risk Ratio 4.545 8.644 28.070 0.033 0.003 0.000 0.955 1.089 0.265 • prior = number of prior arrests • emp1-emp52: coded 1 if employed, 0 otherwise age prior employed 1 1 1 -0.046 0.085 -1.328 0.022 0.029 0.251 368 • You may be interested in effects of employment on the probability of being arrested • If someone is arrested, that may terminate their employment • Potential reverse causation is a common problem with time-dependnet covariates • Consider lagged covariate values proc phreg data=set1; where week>2; model week*arrest(0) = age prior employ1 employ2 / ties=efron; array emp(*) emp1-emp52; employ1=emp(week-1); employ2=emp(week-2); run; 370 369 • The hazard of arrest may depend on cummulative employment rather than employment status in the previous two weeks data set2; set set1; array emp(*) emp1-emp52; array cum(*) cum1-cum52; cum1=emp1; do i=2 to 52; cum(i)=cum(i-1) + emp(i); end; do i = 1 to 52; cum(i) = cum(i)/i: end; run; proc phreg data=set2; where week>1; model week*arrest(0) = age prior cumemp / ties=efron; array cemp(*) cum1-cum52; cumemp=cemp(week-1); run; 371 Fitting the Cox Model in R with Time-Dependent Covariates Suppose subject 84 receives a standard treatment and survives 102 days, but subject 94 switches from the new treamnet to the standard treatment at day 57 and survives 320 days. Put the data for subject 94 on two lines Subject 84 94 94 Subject 43 43 43 Start 0 0 57 Start 0 60 120 Stop Status rx 102 1 0 57 0 1 320 1 0 age conc 43 31 57 22 57 22 Stop Status rx age conc 60 0 1 34 45 120 0 1 34 36 240 0 1 34 28 vafit3 <- coxph(Surv(Start, Stop, Status) ~ rx+ age + conc, data = va, method="efron", singular.ok=T) 372