Section 21

STAT 405 - BIOSTATISTICS
Handout 21 – Survival Analysis: Cox Proportional Hazards Regression
In this section, we will examine regression models for the hazard function h(t). Before examining the form of these models, we discuss the role of the hazard function in a general sense.
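As a reminder of the general role of the hazard function, one standard way to write it for a continuous survival time $T$, together with its link to the survival function $S(t) = P(T > t)$, is sketched below:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
S(t) = \exp\left(-\int_0^t h(s)\,ds\right)
```

Informally, $h(t)\,\Delta t$ approximates the probability that a subject who has survived to time $t$ experiences the event in the short interval $[t, t + \Delta t)$.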
1
As noted earlier, the log rank test carried out in PROC LIFETEST is useful when time to event is important (i.e., not just the occurrence of an event). Though not discussed here, this test can be extended to control for other variables, for example by stratifying on them.
However, another approach to analyzing survival data is much more convenient: proportional hazards regression. As with other regression models, identifying significant covariates and interpreting the estimated model coefficients are of primary concern.
Cox Proportional Hazards Model
Under a proportional hazards model, the hazard function h(t) is modeled as
$$h(t) = h_0(t)\exp(\eta_1 u_1 + \eta_2 u_2 + \cdots + \eta_k u_k)$$
Comments:
1. The terms u1, u2, …, uk represent a collection of independent variables, created from the covariates x1, x2, …, xp.
2. The function h0(t) represents the baseline hazard at time t; this is the hazard for a person with u1 = u2 = … = uk = 0. Note that this function cannot be negative.
3. The exponential function is used so that the factor multiplying h0(t) is always positive; hence the hazard function h(t) is never negative for any t.
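To see why the model is called "proportional" hazards, the short Python sketch below evaluates h(t) for two hypothetical subjects at several times. The Weibull-shaped baseline hazard, the coefficients, and the covariate values are all illustrative assumptions (not estimates from the leukemia data); the point is only that the ratio of the two hazards is the same at every t because h0(t) cancels.

```python
import math

# Hypothetical Weibull-shaped baseline hazard h0(t) = lam * k * t**(k - 1);
# lam and k are illustrative assumptions, not estimates from the leukemia data.
def baseline_hazard(t: float, lam: float = 0.05, k: float = 1.5) -> float:
    return lam * k * t ** (k - 1)

def cox_hazard(t: float, etas: list[float], us: list[float]) -> float:
    """h(t) = h0(t) * exp(eta_1*u_1 + ... + eta_k*u_k)."""
    linear_predictor = sum(eta * u for eta, u in zip(etas, us))
    return baseline_hazard(t) * math.exp(linear_predictor)

etas = [0.7, -0.3]       # hypothetical coefficients eta_1, eta_2
person_a = [1.0, 2.0]    # hypothetical (u_1, u_2) for subject A
person_b = [0.0, 1.0]    # hypothetical (u_1, u_2) for subject B

for t in (0.5, 2.0, 10.0):
    ratio = cox_hazard(t, etas, person_a) / cox_hazard(t, etas, person_b)
    print(f"t = {t:4.1f}: h_A(t) / h_B(t) = {ratio:.4f}")
```

Each hazard changes with t, but their ratio is exp(0.7(1 - 0) - 0.3(2 - 1)) = exp(0.4), about 1.492, at every time point; swapping in a different baseline hazard would not change it.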
Cox proportional hazards regression can be carried out in SAS using PROC PHREG.
Consider our Leukemia data:
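proc phreg data=leukemia;
   /* censor(0): observations with censor = 0 are censored                */
   /* ties=efron: use Efron's approximation for tied event times          */
   model duration*censor(0)=group / ties=efron;
run;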
Interpretation of the Hazard Ratio
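For this example with a single predictor variable, we have $h(t) = h_0(t)\exp(\eta_1 u_1)$. Let

$$u_1 = \begin{cases} 0 & \text{for group 1} \\ 1 & \text{for group 2.} \end{cases}$$

Then the hazard ratio (HR) is given as follows:

$$HR = \frac{h(t \mid u_1 = 1)}{h(t \mid u_1 = 0)} = \frac{h_0(t)\exp(\eta_1)}{h_0(t)\exp(0)} = \exp(\eta_1).$$

The baseline hazard cancels, so the HR does not depend on t; this is what "proportional hazards" means.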
The estimated HR is $e^{\hat\eta_1} = 4.817$ for this example. This means that at any given time, a patient in treatment group 2 who is still alive is 4.82 times as likely to die in the next very small time interval as a patient in treatment group 1 who is still alive. That is, the instantaneous relative risk of death for a person in treatment group 2 compared to a person in treatment group 1 is 4.82. Note that this does NOT imply that the probability of surviving to a given time t is 4.82 times larger for group 1!
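The arithmetic behind the HR, and the reason it is not a ratio of survival probabilities, can be sketched in Python. The coefficient 1.5721 is taken from the R output later in this handout; the group 1 survival probability of 0.80 is a hypothetical value. Under proportional hazards with group 1 as the baseline, S2(t) = S1(t)^HR, so the two survival curves are related through a power, not a multiple.

```python
import math

# Estimated coefficient for group 2 vs group 1 (from the R output in this handout)
eta_hat = 1.5721
hr = math.exp(eta_hat)
print(f"estimated HR = {hr:.3f}")  # about 4.817

# Under proportional hazards with group 1 as the baseline, S2(t) = S1(t) ** HR.
s1 = 0.80  # hypothetical survival probability for group 1 at some time t
s2 = s1 ** hr
print(f"if S1(t) = {s1:.2f}, then S2(t) = {s2:.3f}, not S1(t) / {hr:.2f}")
```

With these inputs S2(t) is roughly 0.34, about twice as large as the 0.17 that a naive "4.82 times" reading of the survival probabilities would suggest.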
Discussion of Hypothesis Tests
PROC PHREG automatically tests the following hypotheses:
$H_0: \eta_j = 0$
$H_a: \eta_j \ne 0$
SAS provides three different test statistics (likelihood ratio, score, and Wald), two of which are discussed below:
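Wald test statistic:
$$Z = \frac{\hat\eta_j}{se(\hat\eta_j)},$$
which is compared to a standard normal distribution; SAS reports the equivalent chi-square form $Z^2$, compared to a chi-square distribution with 1 df.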
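Likelihood ratio test statistic:
$$G = -2\left[\ln L(\text{reduced}) - \ln L(\text{full})\right],$$
where L is the maximized partial likelihood of each model; G is compared to a chi-square distribution with degrees of freedom equal to the number of coefficients set to zero under H0 (here, 1).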
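Both statistics can be checked by hand from the numbers in the R output later in this handout (coef = 1.5721, se = 0.4124, likelihood ratio statistic = 16.35). The Python sketch below uses only the standard library: for 1 df, a chi-square upper-tail probability equals a two-sided standard normal tail probability, which math.erfc gives directly. The log-likelihoods themselves are not shown in the handout, so the LR statistic is taken as reported rather than recomputed.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_chisq_1df(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic on 1 df, using P(X > s) = P(|Z| > sqrt(s))."""
    return two_sided_p_from_z(math.sqrt(stat))

# Group 2 vs group 1: coefficient and standard error from the R output in this handout
eta_hat, se_eta = 1.5721, 0.4124
z = eta_hat / se_eta
wald_chisq = z ** 2
print(f"Wald: z = {z:.3f}, chi-square = {wald_chisq:.2f}, p = {two_sided_p_from_z(z):.3g}")

# Likelihood ratio statistic -2[ln L(reduced) - ln L(full)] as reported by coxph
lr_stat = 16.35
print(f"LR: chi-square = {lr_stat:.2f} on 1 df, p = {p_chisq_1df(lr_stat):.3g}")
```

Up to rounding of the inputs, the results should agree with the Wald and likelihood ratio lines of the R summary.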
Confidence Limits for the Hazard Ratio
For a binary independent variable, a 100(1-α)% confidence interval for the hazard ratio is given by
$$\exp\left(\hat\eta_j \pm z_{1-\alpha/2}\, se(\hat\eta_j)\right).$$
Use this formula to find the 95% confidence interval for the hazard ratio:
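Applying this formula to the leukemia example, the Python sketch below plugs in the coefficient (1.5721) and standard error (0.4124) reported in the R output later in this handout; $z_{1-\alpha/2}$ comes from statistics.NormalDist. Because those inputs are rounded, the limits agree with the R output (2.147, 10.81) only to about the third significant digit.

```python
import math
from statistics import NormalDist

def hr_confidence_interval(eta_hat: float, se: float, alpha: float = 0.05) -> tuple[float, float]:
    """Wald-type 100(1-alpha)% CI for a hazard ratio: exp(eta_hat -/+ z * se)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}, about 1.96 when alpha = 0.05
    return math.exp(eta_hat - z * se), math.exp(eta_hat + z * se)

# Coefficient and standard error for group 2 vs group 1 from the R output in this handout
lower, upper = hr_confidence_interval(1.5721, 0.4124)
print(f"95% CI for the hazard ratio: ({lower:.3f}, {upper:.2f})")
```

Note that the interval is not symmetric around 4.817: it is symmetric on the log scale and then exponentiated.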
You can also request these endpoints by using the RL (risk limits) option on the MODEL statement in PROC PHREG:
proc phreg data=leukemia;
model duration*censor(0)=group / ties=efron rl;
run;
Cox Proportional Hazards Model in R
To fit the Cox proportional hazards model in R, you will need to load the survival package, which comes preinstalled with R. This package contains the survfit and survdiff functions used in the previous handout. (coxph uses Efron's method for ties by default, matching ties=efron in the SAS code above.)
> library(survival)
> leuk.cph = coxph(Surv(Time,Censor)~Group,data=leukemia)
> summary(leuk.cph)
Call:
coxph(formula = Surv(Time, Censor) ~ Group, data = leukemia)

  n= 42, number of events= 30

         coef exp(coef) se(coef)     z Pr(>|z|)
Group2 1.5721    4.8169   0.4124 3.812 0.000138 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

       exp(coef) exp(-coef) lower .95 upper .95
Group2     4.817     0.2076     2.147     10.81

Concordance= 0.69  (se = 0.053 )
Rsquare= 0.322   (max possible= 0.988 )
Likelihood ratio test= 16.35  on 1 df,   p=5.261e-05
Wald test            = 14.53  on 1 df,   p=0.0001378
Score (logrank) test = 17.25  on 1 df,   p=3.283e-05
Another Example: Adding a Continuous Predictor
Suppose that we also have information on the white blood cell count of the leukemia patients. We therefore consider adding the patient's white blood cell count, on the log base 2 scale, to the model as a continuous predictor. The data can be found in the file Leukemia2.sas.
proc phreg data=leukemia2;
model duration*censor(0)=group logwbc / ties=efron rl;
run;
Interpretation of the Hazard Ratios
To interpret the hazard ratio for a continuous predictor, we must choose an increment, much as we did in logistic regression. Here, the natural choice is an increment of 1 on the log base 2 scale, which corresponds to doubling the actual white blood cell count. Thus, after adjusting for treatment group, a patient with twice as many white blood cells is 5.42 times as likely to die during the next very small interval of time. This implies that a higher white blood cell count is associated with poorer survival (a lower survival function).
Also, after adjusting for white blood cell count, at any given time a patient in treatment group 2 who is still alive is 4 times as likely to die in the next very small time interval as a patient in treatment group 1 who is still alive.
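Other increments follow the same rule: a c-unit increase multiplies the hazard by exp(c·η̂). The Python sketch below backs the coefficient out of the rounded HR of 5.42 quoted in the text (so the derived values are approximate) and uses the fact that a k-fold change in WBC is an increment of log2(k) on the log2 scale. The fold changes chosen are illustrative.

```python
import math

def hr_for_increment(eta_hat: float, c: float) -> float:
    """Hazard ratio for a c-unit increase in a continuous predictor: exp(c * eta_hat)."""
    return math.exp(c * eta_hat)

# The text reports an HR of about 5.42 per 1-unit increase in log2(WBC);
# back out the (rounded) coefficient from it.
eta_logwbc = math.log(5.42)

for fold in (1.5, 2.0, 4.0):
    c = math.log2(fold)  # a fold-change in WBC is an increment of log2(fold) on the log2 scale
    print(f"{fold}-fold higher WBC: c = {c:.3f}, HR = {hr_for_increment(eta_logwbc, c):.2f}")
```

A four-fold increase (c = 2) gives 5.42 squared, about 29.4, which shows how quickly hazard ratios compound over larger increments.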
Confidence Limits for the Hazard Ratio for a Continuous Predictor
In general, holding all other factors constant, the multiplicative risk effect associated with an increment of c units for a continuous predictor xj is $\exp(c\,\hat\eta_j)$. A 100(1-α)% confidence interval for the hazard ratio is given by
$$\exp\left(c\,\hat\eta_j \pm c\, z_{1-\alpha/2}\, se(\hat\eta_j)\right).$$
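A minimal Python sketch of this interval follows. The handout quotes only the HR of about 5.42 for logwbc, so the standard error below is a hypothetical placeholder; substitute the estimate and standard error from the PROC PHREG output. For c > 0 the c-unit limits are simply the one-unit limits raised to the power c.

```python
import math
from statistics import NormalDist

def hr_ci_for_increment(eta_hat: float, se: float, c: float, alpha: float = 0.05) -> tuple[float, float]:
    """CI for the HR of a c-unit increase (c > 0): exp(c*eta_hat -/+ c*z_{1-alpha/2}*se)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.exp(c * eta_hat - c * z * se), math.exp(c * eta_hat + c * z * se)

# Placeholders: eta_hat is backed out of the rounded HR of 5.42 quoted in the text,
# and se is a hypothetical standard error (not taken from the handout).
eta_hat = math.log(5.42)
se = 0.30

for c in (1, 2):
    lo, hi = hr_ci_for_increment(eta_hat, se, c)
    print(f"c = {c}: HR = {math.exp(c * eta_hat):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```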
Discussion of Hypothesis Tests
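Note that SAS first tests the overall usefulness of the model, that is, the global null hypothesis
$H_0: \eta_1 = \eta_2 = \cdots = \eta_k = 0$
$H_a:$ at least one $\eta_j \ne 0$
using the likelihood ratio, score, and Wald statistics, each compared to a chi-square distribution with k degrees of freedom.
PROC PHREG also tests the following for each predictor variable:
$H_0: \eta_j = 0$
$H_a: \eta_j \ne 0$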
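For the overall test, each statistic is compared to a chi-square distribution with k degrees of freedom, where k is the number of coefficients in the model (k = 2 for group and logwbc). The Python sketch below computes chi-square upper-tail p-values with the standard library only; it covers df = 1 and even df, where closed forms exist, and the global statistic in the usage line is a hypothetical value, not output from the leukemia2 model.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Handles df = 1 (via the standard normal) and any even df (closed form);
    other odd df are not covered by this sketch.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):  # sum of (x/2)^i / i! for i = 0 .. df/2 - 1
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch only handles df = 1 or even df")

# Hypothetical global statistic for a model with k = 2 predictors (group, logwbc)
global_stat = 20.0
print(f"global test: chi-square = {global_stat} on 2 df, p = {chi2_sf(global_stat, 2):.3g}")
```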
Including Interaction Terms in the Model
The effect of a continuous predictor may not be the same at each level of a binary predictor; for example, the effect of white blood cell count on survival may differ between patients in treatment groups 1 and 2. To allow for this possible interaction, we include an interaction term in the model:
proc phreg data=leukemia2;
model duration*censor(0)= group logwbc interaction /
ties=efron rl;
interaction = logwbc*group;
run;
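Once the interaction is in the model, the hazard ratio for group is no longer a single number. Writing the linear predictor as η1·group + η2·logwbc + η3·interaction, the HR comparing group 2 with group 1 at a fixed logwbc is exp(η1 + η3·logwbc), and the test of interest is H0: η3 = 0. The Python sketch below shows how the group HR then changes with logwbc; the coefficient values are hypothetical placeholders, not estimates from PROC PHREG.

```python
import math

def group_hr_at_logwbc(eta_group: float, eta_inter: float, logwbc: float) -> float:
    """HR for group 2 vs group 1 at a fixed log2(WBC) value.

    With linear predictor eta1*group + eta2*logwbc + eta3*(logwbc*group), a one-unit
    difference in group (e.g., 2 vs 1) at the same logwbc changes it by eta1 + eta3*logwbc;
    the eta2*logwbc term cancels.
    """
    return math.exp(eta_group + eta_inter * logwbc)

# Hypothetical coefficients for illustration only; use the estimates from PROC PHREG.
eta_group, eta_inter = 0.5, 0.2

for logwbc in (1.0, 2.0, 3.0):
    hr = group_hr_at_logwbc(eta_group, eta_inter, logwbc)
    print(f"logwbc = {logwbc}: HR(group 2 vs group 1) = {hr:.3f}")
```

If η3 were 0, every line would print the same HR, exp(η1), which is the no-interaction model fitted earlier.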
What are the conclusions of this test?