Section 19

STAT 405 - BIOSTATISTICS
Handout 19 – Survival Analysis: Cox Proportional Hazards Regression
In this section, we will examine regression models for the hazard function h(t). Before
examining the form of these models, we will discuss the role of the hazard function in a
general sense.
As noted earlier, the log rank and Wilcoxon tests are carried out in situations where the
time to event is important (i.e., not just the occurrence of an event, which we would
typically analyze using logistic regression). Though not discussed here, the log rank test
can be extended to control for other variables (for example, through a stratified log rank test).
However, another approach for analyzing survival data is much more convenient:
proportional hazards regression. As with other regression models, the identification of
significant covariates and the interpretation of the estimated model coefficients are of
primary concern.
Cox Proportional Hazards Model
Under a proportional hazards model, the hazard function h(t) is modeled as
$$h(t) = h_0(t)\exp(\eta_1 u_1 + \eta_2 u_2 + \cdots + \eta_k u_k)$$
Comments:
1. The terms $u_1, u_2, \ldots, u_k$ represent a collection of independent variables, created
from the potential covariates/predictors $x_1, x_2, \ldots, x_p$.
2. The function $h_0(t)$ represents the baseline hazard at time t; this is the hazard for a
person with $u_1 = u_2 = \cdots = u_k = 0$. Note that this function cannot be negative.
3. The exponential function is used so that the multiplier $\exp(\eta_1 u_1 + \cdots + \eta_k u_k)$ is always positive; combined with a nonnegative $h_0(t)$, this keeps the hazard function nonnegative for all t.
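The proportional hazards property can be checked numerically. The base R sketch below (an illustration, not part of the JMP workflow) evaluates h(t) at several times for two covariate values, u = 1 and u = −1. The baseline hazard h0(t) = 0.05t and the coefficient 0.75 are made-up values chosen only for illustration. Because h0(t) appears in both hazards, it cancels, and the ratio is the same at every t.

```r
# Hypothetical baseline hazard h0(t); any nonnegative function would do here
h0 <- function(t) 0.05 * t

# Cox PH hazard: h(t) = h0(t) * exp(eta_1 u_1 + ... + eta_k u_k)
cox_hazard <- function(t, eta, u) {
  h0(t) * exp(sum(eta * u))
}

eta <- c(0.75)            # hypothetical coefficient
times <- c(1, 5, 10, 20)  # avoid t = 0, where this h0(t) is zero

# Ratio of hazards for u = 1 versus u = -1 at each time point
hr <- sapply(times, function(t) cox_hazard(t, eta, 1) / cox_hazard(t, eta, -1))
print(hr)               # the same value at every t
print(exp(2 * eta))     # equals the ratio: the baseline hazard has cancelled
```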
Cox proportional hazards regression can be carried out in JMP using the Reliability >
Survival > Fit Proportional Hazards option:
Here the dialog box has been set up to fit a Cox proportional hazards model (Cox PH
model) to the leukemia data using treatment group as the effect of interest. Again we
need to include information about censoring as well as the code (0 or 1) used to indicate
censoring.
Consider again the leukemia data, where the interest is in comparing the survival
times of the two groups of patients.
Interpretation of the Hazard Ratio
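For this example, with a single predictor variable, we have $h(t) = h_0(t)\exp(\eta_1 u_1)$, where JMP codes treatment group as

$$u_1 = \begin{cases} -1 & \text{for treatment group 1} \\ \phantom{-}1 & \text{for treatment group 2.} \end{cases}$$

Then the hazard ratio (HR) comparing treatment group 2 to treatment group 1 is

$$HR = \frac{h(t \mid u_1 = 1)}{h(t \mid u_1 = -1)} = \frac{h_0(t)\, e^{\eta_1}}{h_0(t)\, e^{-\eta_1}} = e^{2\eta_1}.$$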
The estimated HR is $e^{2\hat{\eta}_1} = 4.52$ for this example. This means that at any given time, a
patient who is still alive in treatment group 2 is 4.52 times more likely to die in the next
very small time interval than a patient who is still alive in treatment group 1. That is, the
instantaneous relative risk of death for a person in treatment group 2 compared to a
person in treatment group 1 is 4.52. Note that this does NOT imply that the probability
of surviving past a given time t is 4.52 times larger for group 1!
Discussion of Hypothesis Tests
For testing the significance of a term in our Cox PH model, we test the following
hypotheses:
$H_0: \eta_j = 0$
$H_a: \eta_j \neq 0$
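Wald test statistic:

$$z = \frac{\hat{\eta}_j}{SE(\hat{\eta}_j)} \sim N(0,1) \quad \text{(approximately, under } H_0\text{)}$$

However, if we square this statistic, we obtain a chi-square statistic with 1 degree of
freedom, which can then be converted to a p-value.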
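As a quick check of this arithmetic, the base R sketch below uses the Group2 coefficient (1.5721) and standard error (0.4124) from the R output shown later in this handout. Those estimates use R's 0/1 coding rather than JMP's ±1 coding, but the Wald z is unaffected by the coding, since both the estimate and its standard error are halved under ±1 coding. The normal-based and chi-square-based p-values agree.

```r
# Group2 estimates from the R output for the leukemia model
eta_hat <- 1.5721   # estimated coefficient
se_hat  <- 0.4124   # its standard error

z <- eta_hat / se_hat                                  # Wald z statistic
chisq <- z^2                                           # chi-square with 1 df
p_normal <- 2 * pnorm(-abs(z))                         # two-sided p-value from N(0, 1)
p_chisq  <- pchisq(chisq, df = 1, lower.tail = FALSE)  # same p-value from chi-square

round(c(z = z, chisq = chisq, p_normal = p_normal, p_chisq = p_chisq), 6)
```

The values reproduce the `z`, `Pr(>|z|)`, and `Wald test` entries of the R summary up to rounding of the inputs.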
Likelihood ratio test statistic:
JMP uses likelihood ratio tests ($\chi^2$) for individual model effects. The details of this test
are not important, but the results will be similar to the Wald tests above. JMP reports the
chi-square test statistic and p-value for each effect in the model.
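Each of these chi-square statistics (1 df here, since the model has a single term) is converted to a p-value using the upper tail of the chi-square distribution. The base R sketch below applies that conversion to the three statistics printed by R's `summary()` for the leukemia model shown later in this handout; for a single binary covariate, R labels the score test as the log rank test.

```r
# Chi-square statistics (1 df) reported by R's summary() for the leukemia model
stats <- c(likelihood_ratio = 16.35, wald = 14.53, score_logrank = 17.25)

# Upper-tail chi-square probabilities give the p-values
p_values <- pchisq(stats, df = 1, lower.tail = FALSE)
signif(p_values, 4)
```

Because the printed statistics are rounded, the p-values agree with R's printed values only to about two or three significant digits.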
Confidence Limits for the Hazard Ratio
For a binary independent variable in JMP, a 100(1−α)% confidence interval for the hazard ratio is given by

$$\exp\left(2\left(\hat{\eta}_j \pm z_{1-\alpha/2}\, SE(\hat{\eta}_j)\right)\right),$$

where multiplying by 2 is necessary because of the +1/−1 coding.
Use this formula to find the 95% confidence interval for the hazard ratio in this example.
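The ±1 formula can be sketched in base R as below. The JMP estimates are not reproduced in this text, so the example borrows the Group2 estimates from the R output shown later in this handout. R uses 0/1 coding, and recoding a binary variable from 0/1 to −1/+1 halves both the coefficient and its standard error (β = 2η). Halving the R estimates and applying the ±1 formula should therefore reproduce R's own interval. The R estimates need not match the JMP output exactly.

```r
# 100(1 - alpha)% CI for the hazard ratio under +1/-1 coding:
#   exp(2 * (eta_hat +/- z_{1 - alpha/2} * SE(eta_hat)))
hr_ci_pm1 <- function(eta_hat, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  exp(2 * (eta_hat + c(-1, 1) * z * se))
}

# Group2 estimates from R (0/1 coding); halve them to get the +1/-1 scale
beta_hat <- 1.5721
se_beta  <- 0.4124
ci <- hr_ci_pm1(beta_hat / 2, se_beta / 2)
ci   # close to R's "lower .95" and "upper .95" columns
```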
If you select Risk Ratios from the Proportional Hazards Fit drop-down menu, JMP
will compute the RR/HR estimates and CIs for you.
The risk ratios/hazard ratios and CIs are reported using each level in turn as the reference
group. This is analogous to what was done in logistic regression as well.
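Switching the reference group simply inverts the hazard ratio and swaps (and inverts) the confidence limits. The base R sketch below applies this to the group 2 versus group 1 estimates from the R output shown later in this handout. The inverted HR matches R's `exp(-coef)` column.

```r
# Hazard ratio and 95% CI for group 2 vs group 1 (from the R output)
hr <- 4.817
ci <- c(lower = 2.147, upper = 10.81)

# Using group 2 as the reference: invert the ratio and swap the limits
hr_rev <- 1 / hr
ci_rev <- c(lower = 1 / ci[["upper"]], upper = 1 / ci[["lower"]])
round(c(hr_rev = hr_rev, ci_rev), 4)
```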
Cox Proportional Hazards Model in R
To fit the Cox proportional hazards model in R, you will need to load the survival library, which is preinstalled with R. This library also contains the survfit and survdiff
functions used in the previous handout.
> library(survival)
> leuk.cph = coxph(Surv(Time,Censor)~Group,data=leukemia)
> summary(leuk.cph)
Call:
coxph(formula = Surv(Time, Censor) ~ Group, data = leukemia)

  n= 42, number of events= 30

         coef exp(coef) se(coef)     z Pr(>|z|)
Group2 1.5721    4.8169   0.4124 3.812 0.000138 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

       exp(coef) exp(-coef) lower .95 upper .95
Group2     4.817     0.2076     2.147     10.81

Concordance= 0.69  (se = 0.053 )
Rsquare= 0.322   (max possible= 0.988 )
Likelihood ratio test= 16.35  on 1 df,   p=5.261e-05
Wald test            = 14.53  on 1 df,   p=0.0001378
Score (logrank) test = 17.25  on 1 df,   p=3.283e-05
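The R hazard ratio (4.817) is not the same as the JMP value reported earlier (4.52). One likely reason is the handling of tied death times. `coxph()` uses Efron's approximation by default, whereas the JMP value appears consistent with Breslow's approximation. The sketch below refits the model both ways. It assumes that the Freireich leukemia remission data shipped with the MASS package as `gehan` (42 patients, 30 events) are the same data as the handout's `leukemia` data set. That assumption is not stated in the handout.

```r
library(survival)
library(MASS)   # provides the gehan data set (Freireich leukemia remission times)

# Assumption: gehan corresponds to the handout's leukemia data
data(gehan, package = "MASS")
fit_efron   <- coxph(Surv(time, cens) ~ treat, data = gehan)                  # R default
fit_breslow <- coxph(Surv(time, cens) ~ treat, data = gehan, ties = "breslow")

# Estimated hazard ratios under the two tie-handling approximations
exp(c(efron = unname(coef(fit_efron)), breslow = unname(coef(fit_breslow))))
```

If the assumption about the data holds, the Efron fit should reproduce the 4.817 above, and the Breslow fit should land near the JMP value.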
Another Example: Adding a Continuous Predictor
Suppose that we also have information regarding the white blood cell count of the
leukemia patients. So, we consider adding the patient's white blood cell count, on the log
base 2 scale, as a continuous predictor in our model. The data can be found in the
file Leukemia.JMP.
The model dialog below has been set up to add the log base 2 white blood cell count term
to our model for treatment group.
The resulting output is shown below:
Interpretation of the Hazard Ratios
To interpret the hazard ratio for a continuous predictor, we must choose an increment,
much like we did in logistic regression. Here, the most natural choice is an increment of 1
on the log base 2 scale, which corresponds to doubling the actual white blood cell count.
Thus, after removing the effects of treatment group, a patient with twice as many white
blood cells is 4.97 times more likely to die during the next very small interval in time.
This implies that a higher white blood cell count is associated with a lower survival
function.
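Because the predictor is on the log base 2 scale, a k-fold increase in the actual white blood cell count is an increment of c = log2(k). The hazard ratio is then exp(c η̂) = 4.97^log2(k). The base R sketch below applies this to the rounded JMP value of 4.97 per doubling, so its results are approximate. The function name is a hypothetical helper, not a JMP or R feature.

```r
# Hazard ratio per doubling of WBC (1 unit on the log2 scale), from the JMP output
hr_per_doubling <- 4.97
beta_hat <- log(hr_per_doubling)   # implied coefficient on log2(WBC)

# Hypothetical helper: a k-fold increase in WBC is an increment of c = log2(k),
# so the hazard ratio is exp(c * beta_hat) = hr_per_doubling^log2(k)
hr_for_fold_change <- function(k) exp(log2(k) * beta_hat)

hr_for_fold_change(c(2, 4, 10))   # doubling, quadrupling, tenfold increase
```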
Also, after removing the effects of white blood cell count, at any given time a patient who
is still alive in treatment group 2 is 3.65 times more likely to die in the next very small
time interval than a patient who is still alive in treatment group 1.
Confidence Limits for the Hazard Ratio for a Continuous Predictor
In general, holding all other factors constant, the multiplicative risk effect associated
with an increment of c units for a continuous predictor $x_j$ is $\exp(c\,\hat{\eta}_j)$. A 100(1−α)%
confidence interval for the hazard ratio is given by

$$\exp\left(c\,\hat{\eta}_j \pm c\, z_{1-\alpha/2}\, SE(\hat{\eta}_j)\right).$$
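The interval formula can be sketched in base R as below. The helper name and its `incr` argument (the increment c) are hypothetical. The handout does not report the standard error for the white blood cell term, so the usage line reuses the Group2 estimates from the earlier R output. For a 0/1 predictor, an increment of 1 is simply the switch between groups, and the formula reproduces R's interval. Doubling the increment squares the estimate and both limits.

```r
# Hypothetical helper: HR and 100(1 - alpha)% CI for an increment of `incr` units
#   exp(incr * eta_hat +/- incr * z_{1 - alpha/2} * SE(eta_hat))
hr_ci_increment <- function(eta_hat, se, incr = 1, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  limits <- exp(incr * eta_hat + c(-1, 1) * incr * z * se)
  c(estimate = exp(incr * eta_hat), lower = limits[1], upper = limits[2])
}

# Check with the Group2 estimates from the R output (increment of 1)
hr_ci_increment(1.5721, 0.4124, incr = 1)
```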
Discussion of Hypothesis Tests
The following hypotheses are tested for each term in the model.
$H_0: \eta_j = 0$
$H_a: \eta_j \neq 0$
We can again use JMP to provide estimates of the risk ratio/hazard ratio for each
variable.
Including Interaction Terms in the Model
It is possible that the effect of a continuous predictor may not be the same across the levels of a
binary predictor; for example, the effect of white blood cell count on survival may differ
between patients in treatment groups 1 and 2. To allow for this possible interaction, we
should include an interaction term in the model, which is done in JMP as shown below:
What are the conclusions of this test?
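In terms of the u's in the model definition, an interaction term is just one more constructed variable: the product of the treatment variable and the log2 white blood cell variable. The base R sketch below makes that concrete with `model.matrix()` on a small hypothetical toy data set (not the leukemia data). Note that R's default coding for the treatment factor is 0/1, not JMP's +1/−1.

```r
# Hypothetical toy data: two treatment groups and log2 white blood cell counts
toy <- data.frame(
  Group   = factor(c(1, 1, 2, 2)),
  log2WBC = c(1.5, 3.0, 2.0, 4.5)
)

# Group * log2WBC expands to Group + log2WBC + Group:log2WBC
mm <- model.matrix(~ Group * log2WBC, data = toy)
mm   # the Group2:log2WBC column is the product of the Group2 and log2WBC columns
```

With real data, the same formula could be passed to `coxph()` as `Surv(Time, Censor) ~ Group * log2WBC`, assuming a column named `log2WBC` (a hypothetical name). `coxph()` drops the intercept column, since the baseline hazard plays that role.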