STAT 405 - BIOSTATISTICS
Handout 21 – Survival Analysis: Cox Proportional Hazards Regression

In this section, we will examine regression models for the hazard function $h(t)$. Before examining the form of these models, we discuss the role of the hazard function in a general sense.

As noted earlier, the log rank test carried out in PROC LIFETEST is useful when time to event is important (i.e., not just the occurrence of an event). Though not discussed here, this test can be extended to control for other variables by adding covariates to the model. However, another approach to analyzing survival data is much more convenient: proportional hazards regression. As with other regression models, identifying significant covariates and interpreting the estimated model coefficients are of primary concern.

Cox Proportional Hazards Model

Under a proportional hazards model, the hazard function $h(t)$ is modeled as

$$h(t) = h_0(t)\exp(\eta_1 u_1 + \eta_2 u_2 + \cdots + \eta_k u_k)$$

Comments:

1. The terms $u_1, u_2, \ldots, u_k$ represent a collection of independent variables, created from the covariates $x_1, x_2, \ldots, x_p$.
2. The function $h_0(t)$ represents the baseline hazard at time $t$; this is the hazard for a person with $u_1 = u_2 = \cdots = u_k = 0$. Note that this function cannot be negative.
3. The exponential function is used so the hazard function is positive for all $t$.

Cox proportional hazards regression can be carried out in SAS using PROC PHREG. Consider our Leukemia data:

    proc phreg data=leukemia;
      model duration*censor(0)=group / ties=efron;
    run;

Interpretation of the Hazard Ratio

For this example with a single predictor variable, we have $h(t) = h_0(t)\exp(\eta_1 u_1)$. Let

$$u_1 = \begin{cases} 0 & \text{for group 1} \\ 1 & \text{for group 2.} \end{cases}$$

Then the hazard ratio (HR) is given as follows:

$$HR = \frac{h(t \mid u_1 = 1)}{h(t \mid u_1 = 0)} = \frac{h_0(t)\exp(\eta_1)}{h_0(t)} = \exp(\eta_1)$$

The estimated HR is $e^{\hat{\eta}_1} = 4.817$ for this example.
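As a quick numerical check, the hazard ratio is simply the exponential of the estimated coefficient. The short R sketch below is illustrative only: the value 1.5721 is the coefficient for group taken from the R output shown later in this handout, and the names `eta_hat`, `hr`, and `hr_reversed` are our own choices rather than anything produced by SAS or R.

```r
# Estimated coefficient for group, copied from the fitted model output.
# The variable names here are illustrative, not part of any package.
eta_hat <- 1.5721

# Hazard ratio for group 2 relative to group 1
hr <- exp(eta_hat)

# Reversing the comparison (group 1 relative to group 2) inverts the ratio
hr_reversed <- exp(-eta_hat)

print(round(hr, 3))
print(round(hr_reversed, 4))
```

Up to rounding of the coefficient, the first value agrees with the estimated hazard ratio of 4.817, and the second is its reciprocal, about 0.2076.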
A hazard ratio of 4.82 means that at any given time, a patient who is still alive in treatment group 2 is 4.82 times more likely to die in the next very small time interval than a patient who is still alive in treatment group 1. That is, the instantaneous relative risk of death for a person in treatment group 2 compared to a person in treatment group 1 is 4.82. Note that this does NOT imply that the probability of surviving for a given time $t$ is 4.82 times larger for group 1!

Discussion of Hypothesis Tests

PROC PHREG automatically tests the following hypotheses:

$H_0: \eta_j = 0$
$H_a: \eta_j \neq 0$

SAS provides three different test statistics, two of which are discussed below:

Wald test statistic:

$$z = \frac{\hat{\eta}_j}{se(\hat{\eta}_j)}$$

Under $H_0$, $z$ is compared to a standard normal distribution; SAS reports $z^2$ as a chi-square statistic with 1 degree of freedom.

Likelihood ratio test statistic:

$$G = -2\left[\ln L_{\text{reduced}} - \ln L_{\text{full}}\right]$$

Here the reduced model omits predictor $j$ and the full model includes it. Under $H_0$, $G$ is compared to a chi-square distribution with 1 degree of freedom.

Confidence Limits for the Hazard Ratio

For a binary independent variable, a $100(1-\alpha)\%$ confidence interval is given by

$$\exp\left(\hat{\eta}_j \pm z_{1-\alpha/2}\, se(\hat{\eta}_j)\right).$$

Use this formula to find the 95% confidence interval for the hazard ratio:

You can also request these endpoints by using the ‘risk limits’ option in PROC PHREG:

    proc phreg data=leukemia;
      model duration*censor(0)=group / ties=efron rl;
    run;

Cox Proportional Hazards Model in R

To fit the Cox proportional hazards model in R, you will need to load the preinstalled library survival. This library contains the survfit and survdiff functions used in the previous handout.

    > library(survival)
    > leuk.cph = coxph(Surv(Time,Censor)~Group,data=leukemia)
    > summary(leuk.cph)
    Call:
    coxph(formula = Surv(Time, Censor) ~ Group, data = leukemia)

      n= 42, number of events= 30

             coef exp(coef) se(coef)     z Pr(>|z|)
    Group2 1.5721    4.8169   0.4124 3.812 0.000138 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

           exp(coef) exp(-coef) lower .95 upper .95
    Group2     4.817     0.2076     2.147     10.81

    Concordance= 0.69  (se = 0.053 )
    Rsquare= 0.322   (max possible= 0.988 )
    Likelihood ratio test= 16.35  on 1 df,   p=5.261e-05
    Wald test            = 14.53  on 1 df,   p=0.0001378
    Score (logrank) test = 17.25  on 1 df,   p=3.283e-05

Another Example: Adding a Continuous Predictor

Suppose that we also have information regarding the white blood cell count of the leukemia patients. We therefore consider adding the patient's white blood cell count, on the log base 2 scale, to our model as a continuous predictor. The data can be found in the file Leukemia2.sas.

    proc phreg data=leukemia;
      model duration*censor(0)=group logwbc / ties=efron rl;
    run;

Interpretation of the Hazard Ratios

To interpret the hazard ratio for a continuous predictor, we must choose an increment, much like we did in logistic regression. Here, the only logical choice is an increment of 1 on the log base 2 scale, which corresponds to doubling the actual white blood cell count. Thus, after removing the effects of treatment group, a patient with twice as many white blood cells is 5.42 times more likely to die during the next very small interval in time. This implies that an increased white blood cell count has a negative effect on the survival function.

Also, after removing the effects of white blood cell count, at any given time a patient who is still alive in treatment group 2 is 4 times more likely to die in the next very small time interval than a patient who is still alive in treatment group 1.

Confidence Limits for the Hazard Ratio for a Continuous Predictor

In general, holding all other factors constant, the multiplicative risk effect associated with an increment of $c$ units for a continuous predictor $x_j$ is $\exp(c\,\hat{\eta}_j)$. A $100(1-\alpha)\%$ confidence interval for the hazard ratio is given by

$$\exp\left(c\,\hat{\eta}_j \pm c\, z_{1-\alpha/2}\, se(\hat{\eta}_j)\right).$$
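The interval formulas above are easy to evaluate directly in R. The helper below is a minimal sketch, not a function from the survival package: the name `hr_ci` and its arguments are hypothetical, and the inputs 1.5721 and 0.4124 are the coefficient and standard error for group from the single-predictor R output. With an increment of 1 it should reproduce the limits that R reports for the binary predictor, and the same function handles other increments for a continuous predictor.

```r
# Hazard ratio and confidence limits for a chosen increment of a predictor.
# hr_ci is an illustrative helper, not part of any package.
hr_ci <- function(eta_hat, se, increment = 1, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # critical value z_{1 - alpha/2}
  c(hr    = exp(increment * eta_hat),
    lower = exp(increment * eta_hat - increment * z * se),
    upper = exp(increment * eta_hat + increment * z * se))
}

# Binary predictor (group): coefficient and standard error taken from the
# single-predictor model output
group_ci <- hr_ci(eta_hat = 1.5721, se = 0.4124)
print(round(group_ci, 3))

# Continuous predictor: point estimate only, because the text reports the
# hazard ratio for logwbc (5.42) but not its standard error
eta_logwbc <- log(5.42)
hr_quadruple <- exp(2 * eta_logwbc)  # increment of 2 on the log2 scale
print(round(hr_quadruple, 2))
```

With these inputs the limits for group come out to roughly 2.15 and 10.81, in line with the lower .95 and upper .95 columns of the R summary. The last line shows that an increment of 2 on the log base 2 scale (a fourfold increase in white blood cell count) squares the hazard ratio for doubling.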
Discussion of Hypothesis Tests

Note that SAS first tests the overall usefulness of the model, that is, the global null hypothesis that all of the coefficients equal zero.

PROC PHREG also tests the following for each predictor variable:

$H_0: \eta_j = 0$
$H_a: \eta_j \neq 0$

Including Interaction Terms in the Model

It is possible that the effect of a continuous predictor is not the same across the levels of a binary predictor; for example, the effect of white blood cell count on survival may differ between patients in treatment groups 1 and 2. To allow for this possible interaction, we should include an interaction term in the model:

    proc phreg data=leukemia2;
      model duration*censor(0)= group logwbc interaction / ties=efron rl;
      interaction = logwbc*group;
    run;

What are the conclusions of this test?
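For readers working in R, a comparable interaction model can be sketched with coxph. The example below is a hedged illustration, not a reanalysis of the course data: the data frame `sim` is simulated, and its coefficients, sample size, and variable names are assumptions made only so the code runs on its own. It says nothing about the conclusions for the leukemia data.

```r
library(survival)

# Simulated stand-in data (NOT the leukemia data from this handout)
set.seed(405)
n <- 200
sim <- data.frame(
  group  = rbinom(n, 1, 0.5),            # 0 = group 1, 1 = group 2
  logwbc = rnorm(n, mean = 3, sd = 1)    # hypothetical log2 WBC values
)

# Event times from an exponential model with an assumed interaction effect
rate        <- 0.02 * exp(1.0 * sim$group + 0.5 * sim$logwbc +
                          0.3 * sim$group * sim$logwbc)
event_time  <- rexp(n, rate = rate)
censor_time <- rexp(n, rate = 0.05)      # independent random censoring
sim$duration <- pmin(event_time, censor_time)
sim$status   <- as.integer(event_time <= censor_time)  # 1 = event, 0 = censored

# Main-effects model and model with the group-by-logwbc interaction
fit_main <- coxph(Surv(duration, status) ~ group + logwbc,
                  data = sim, ties = "efron")
fit_int  <- coxph(Surv(duration, status) ~ group * logwbc,
                  data = sim, ties = "efron")

# Wald test for the interaction appears in the row "group:logwbc"
print(summary(fit_int)$coefficients)

# Likelihood ratio test comparing the two nested models
lr_stat <- 2 * (fit_int$loglik[2] - fit_main$loglik[2])
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(statistic = lr_stat, p_value = lr_p))
```

The formula `group * logwbc` expands to both main effects plus their product, which plays the same role as the `interaction = logwbc*group` statement in the SAS step. A small p-value for the interaction row, or for the likelihood ratio comparison, would suggest that the effect of white blood cell count differs between the treatment groups.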