STAT 405 - BIOSTATISTICS
Handout 19 – Survival Analysis: Cox Proportional Hazards Regression

In this section, we will examine regression models for the hazard function $h(t)$. Before examining the form of these models, we will discuss the role of the hazard function in a general sense.

As noted earlier, the log rank or Wilcoxon tests are carried out in situations where the time to event is important (i.e., not just the occurrence of an event, which we would typically analyze using logistic regression). Though not discussed here, the log rank test can be extended to control for other variables by adding covariates to the model. However, another approach exists for analyzing survival data which is much more convenient: proportional hazards regression. As with other regression models, the identification of significant covariates and the interpretation of the estimated model coefficients are of primary concern.

Cox Proportional Hazards Model

Under a proportional hazards model, the hazard function $h(t)$ is modeled as

$h(t) = h_0(t)\exp(\eta_1 u_1 + \eta_2 u_2 + \cdots + \eta_k u_k)$

Comments:
1. The terms $u_1, u_2, \ldots, u_k$ represent a collection of independent variables, created from the potential covariates/predictors $x_1, x_2, \ldots, x_p$.
2. The function $h_0(t)$ represents the baseline hazard at time $t$; this is the hazard for a person with $u_1 = u_2 = \cdots = u_k = 0$. Note that this function cannot be negative.
3. The exponential function is used so that the hazard function cannot become negative for any $t$, whatever the values of the coefficients.

Cox proportional hazards regression can be carried out in JMP using the Reliability > Survival > Fit Proportional Hazards option. Here the dialog box has been set up to fit a Cox proportional hazards model (Cox PH model) to the leukemia data using treatment group as the effect of interest. Again we need to include information about censoring as well as the code (0 or 1) used to indicate censoring.

Consider again the leukemia data, where the interest is in comparing the survival times of the two groups of patients.
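For readers who want to follow the leukemia example outside JMP, here is a minimal R sketch of the same single-predictor fit. It assumes that the handout's leukemia data are the classic Gehan 6-MP trial data, a copy of which ships with R's MASS package as `gehan` (columns `time`, `cens`, `treat`); that correspondence is an assumption, not something stated in the handout. The two fits differ only in how tied event times are handled, which is one plausible reason why different packages can report slightly different hazard ratios for the same data.

```r
library(survival)  # coxph(), Surv()
library(MASS)      # gehan data (assumed to match the handout's leukemia data)

# treat is a factor with levels "6-MP" (reference) and "control"
fit_efron   <- coxph(Surv(time, cens) ~ treat, data = gehan)                    # R default: Efron ties
fit_breslow <- coxph(Surv(time, cens) ~ treat, data = gehan, ties = "breslow")  # Breslow ties

# Hazard ratio for control vs. 6-MP under each tie-handling method
hr_efron   <- unname(exp(coef(fit_efron)))
hr_breslow <- unname(exp(coef(fit_breslow)))
print(c(efron = hr_efron, breslow = hr_breslow))
```

If the data do match, the Efron fit should agree with the R output shown later in this handout, and the Breslow fit would be expected to land near the hazard ratio that JMP reports.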
Interpretation of the Hazard Ratio

For this example with a single predictor variable, we have $h(t) = h_0(t)\exp(\eta_1 u_1)$. Let

$u_1 = -1$ for treatment group 1
$u_1 = +1$ for treatment group 2.

Then the hazard ratio (HR) is given as follows:

$HR = \dfrac{h_0(t)\exp(\eta_1 \cdot 1)}{h_0(t)\exp(\eta_1 \cdot (-1))} = \exp(2\eta_1)$

The estimated HR is $e^{2\hat{\eta}_1} = 4.52$ for this example. This means that at any given time, a patient who is still alive in treatment group 2 is 4.52 times more likely to die in the next very small time interval than a patient who is still alive in treatment group 1. That is, the instantaneous relative risk of death for a person in treatment group 2 compared to a person in treatment group 1 is 4.52. Note that this does NOT imply that the probability of surviving for a given time $t$ is 4.52 times larger for group 1!

Discussion of Hypothesis Tests

For testing the significance of a term in our Cox PH model, we test the following hypotheses:

$H_0: \eta_j = 0$
$H_a: \eta_j \neq 0$

Wald test statistic: $z = \hat{\eta}_j / SE(\hat{\eta}_j)$, which is approximately $N(0,1)$ when $H_0$ is true. If we square this statistic, we obtain a chi-square statistic with 1 degree of freedom, which can then be converted to a p-value.

Likelihood ratio test statistic: JMP uses likelihood ratio tests ($\chi^2$) for individual model effects. The details of this test are not important here, but the results will be similar to the Wald tests above. JMP reports the chi-square test statistic and p-value for each effect in the model.

Confidence Limits for the Hazard Ratio

For a binary independent variable in JMP, a $100(1-\alpha)\%$ confidence interval is given by

$\exp\left(2\left(\hat{\eta}_j \pm z_{1-\alpha/2}\, se(\hat{\eta}_j)\right)\right)$

Multiplying by 2 is necessary because of the +1/-1 coding.

Use this formula to find the 95% confidence interval for the hazard ratio:

If you select Risk Ratios from the Proportional Hazards Fit drop-down menu, JMP will compute RR/HR estimates and CIs for you. The risk ratios/hazard ratios and CIs are reported using both levels as the reference group. This is analogous to what was done in logistic regression as well.
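The Wald statistic and the +1/-1 coding interval above can be checked numerically. The sketch below uses base R only. Because JMP's standard error is not reproduced in this text, it borrows the coefficient and standard error from the R output shown later in this handout (which uses 0/1 coding) and halves both to mimic a +1/-1 coded fit; that conversion and the helper name `hr_ci_pm1` are assumptions made for illustration.

```r
# Values on the 0/1 scale, copied from the R output later in this handout
beta_01 <- 1.5721
se_01   <- 0.4124

# Equivalent values under +1/-1 coding (assumed conversion: both are halved)
eta_pm1 <- beta_01 / 2
se_pm1  <- se_01 / 2

# Wald z statistic, its chi-square version (1 df), and the p-value
z       <- eta_pm1 / se_pm1
chisq   <- z^2
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

# Hypothetical helper: CI for the hazard ratio under +1/-1 coding,
# exp(2 * (eta_hat -/+ z_crit * se))
hr_ci_pm1 <- function(eta_hat, se, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  exp(2 * (eta_hat + c(lower = -1, upper = 1) * z_crit * se))
}

hr    <- exp(2 * eta_pm1)
hr_ci <- hr_ci_pm1(eta_pm1, se_pm1)
print(round(c(z = z, chisq = chisq, p = p_value, HR = hr, hr_ci), 4))
```

The interval should agree with the `lower .95` and `upper .95` columns of that R output up to rounding, which illustrates that the factor of 2 only undoes the +1/-1 coding.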
Cox Proportional Hazards Model in R

To fit the Cox proportional hazards model in R, you will need to load the preinstalled library survival. This library contains the survfit and survdiff functions used in the previous handout.

> library(survival)
> leuk.cph = coxph(Surv(Time,Censor)~Group,data=leukemia)
> summary(leuk.cph)
Call:
coxph(formula = Surv(Time, Censor) ~ Group, data = leukemia)

  n= 42, number of events= 30

         coef exp(coef) se(coef)     z Pr(>|z|)
Group2 1.5721    4.8169   0.4124 3.812 0.000138 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

       exp(coef) exp(-coef) lower .95 upper .95
Group2     4.817     0.2076     2.147     10.81

Concordance= 0.69  (se = 0.053 )
Rsquare= 0.322   (max possible= 0.988 )
Likelihood ratio test= 16.35  on 1 df,   p=5.261e-05
Wald test            = 14.53  on 1 df,   p=0.0001378
Score (logrank) test = 17.25  on 1 df,   p=3.283e-05

Another Example: Adding a Continuous Predictor

Suppose that we also have information regarding the white blood cell count of the leukemia patients. We consider adding the patient's white blood cell count (on the log base 2 scale) to our model as a continuous predictor. The data can be found in the file Leukemia.JMP. The model dialog below has been set up to add the log base 2 white blood cell term to our model for treatment group. The resulting output is shown below:

Interpretation of the Hazard Ratios

To interpret the hazard ratio for a continuous predictor, we must choose an increment, much like we did in logistic regression. Here, the most natural choice is an increment of 1 on the log base 2 scale, which corresponds to doubling the actual white blood cell count. Thus, after removing the effects of treatment group, at any given time a patient with twice as many white blood cells is 4.97 times more likely to die during the next very small interval of time. This implies that an increased white blood cell count is associated with poorer survival, i.e., a lower survival function.
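To make the choice of increment concrete, the sketch below converts the reported hazard ratio for a one-unit increase on the log base 2 scale into hazard ratios for other fold changes in the raw white blood cell count. It uses base R and only the 4.97 quoted above; the function name `hr_for_fold_change` is hypothetical.

```r
hr_per_doubling <- 4.97                  # HR for +1 on the log2(WBC) scale, from the text
eta_hat         <- log(hr_per_doubling)  # implied coefficient on the log2 scale

# Hypothetical helper: HR when the raw WBC count is multiplied by `fold`,
# i.e. an increment of c = log2(fold) units, giving exp(c * eta_hat)
hr_for_fold_change <- function(fold, eta = eta_hat) {
  exp(log2(fold) * eta)
}

folds <- c(1.5, 2, 4)
print(data.frame(fold = folds, hazard_ratio = hr_for_fold_change(folds)))
```

Doubling returns 4.97 itself and quadrupling squares it, since hazard ratios multiply across equal increments on the log scale.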
Also, after removing the effects of white blood cell count, at any given time a patient who is still alive in treatment group 2 is 3.65 times more likely to die in the next very small time interval than a patient who is still alive in treatment group 1.

Confidence Limits for the Hazard Ratio for a Continuous Predictor

In general, holding all other factors constant, the multiplicative risk effect associated with an increment of $c$ units for a continuous predictor $x_j$ is $\exp(c\,\hat{\eta}_j)$. A $100(1-\alpha)\%$ confidence interval for the hazard ratio is given by

$\exp\left(c\,\hat{\eta}_j \pm c\, z_{1-\alpha/2}\, se(\hat{\eta}_j)\right)$

Discussion of Hypothesis Tests

The following hypotheses are tested for each term in the model.

$H_0: \eta_j = 0$
$H_a: \eta_j \neq 0$

We can again use JMP to provide estimates of the risk ratio/hazard ratio for each variable.

Including Interaction Terms in the Model

It is possible that the effect of a continuous predictor may not be the same at each level of a binary predictor; for example, the effect of white blood cell count on survival may differ between patients in treatment groups 1 and 2. To allow for this possible interaction, we should include an interaction term in the model, which is done in JMP as shown below:

What are the conclusions of this test?
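A rough R analogue of the interaction model, for readers working outside JMP. The handout's white blood cell variable is not available in this text, so the data below are simulated purely to make the sketch runnable; the variable names, sample size, and effect sizes are all invented for illustration and say nothing about the leukemia study. The sketch fits the model with and without the interaction and compares the two with a likelihood ratio test, in the spirit of the effect tests JMP reports.

```r
library(survival)

set.seed(405)  # arbitrary seed so the simulation is repeatable
n     <- 200
group <- factor(rep(c(1, 2), each = n / 2))  # hypothetical treatment groups
lwbc  <- rnorm(n, mean = 13, sd = 1.5)       # hypothetical log2 WBC values

# Simulated hazard: the WBC effect is built to be stronger in group 2
lin_pred <- 0.5 * (lwbc - 13) + 1.0 * (group == 2) +
  0.8 * (lwbc - 13) * (group == 2)
event_time  <- rexp(n, rate = 0.05 * exp(lin_pred))
censor_time <- runif(n, min = 0, max = 60)
sim <- data.frame(
  time   = pmin(event_time, censor_time),
  status = as.numeric(event_time <= censor_time),  # 1 = event, 0 = censored
  group  = group,
  lwbc   = lwbc
)

fit_main <- coxph(Surv(time, status) ~ group + lwbc, data = sim)  # main effects only
fit_int  <- coxph(Surv(time, status) ~ group * lwbc, data = sim)  # adds group:lwbc

# Likelihood ratio test of the interaction term (1 df)
lr_stat <- 2 * (fit_int$loglik[2] - fit_main$loglik[2])
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(c(LR_chisq = lr_stat, p_value = lr_p))
```

A small p-value would suggest that the effect of white blood cell count differs between the groups. With the real data, the same comparison would use the handout's own variables in place of the simulated ones.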