Cox Proportional Hazards

Proportional Hazard Regression
Cox Proportional Hazards Modeling (PROC PHREG)
Consider the following data:
Drug addicts are enrolled in two different
residential treatment programs that differ in
length (treat = 0 is short, treat = 1 is long).
The patients are assigned to two different sites
(site = 0 is site A, site = 1 is site B).
herco indicates heroin and cocaine use in the
past three months (1 = heroin and cocaine use,
2 = heroin or cocaine use, 3 = neither heroin
nor cocaine use).
Other variables recorded were age at time of
enrollment, ndrugtx (number of previous drug
treatments), time until return to drug use, and
censor (1=return to drug use, 0 = censored).
Reading a SAS Data Set into SAS
You will need to save the data set uis_small to
your computer. It is a SAS data set, and it can
be read into a SAS program using the following
code (making the appropriate adjustment to the
file location):
DATA uis;
SET 'C:\uis_small';
RUN;
To make sure the data set was read in
properly, print out the first 10 observations:
PROC PRINT DATA=uis (obs=10);
RUN;
The SAS System

Obs  ID  age  ndrugtx  treat  site  time  censor  herco
  1   1   39        1      1     0   188       1      3
  2   2   33        8      1     0    26       1      3
  3   3   33        3      1     0   207       1      2
  4   4   32        1      0     0   144       1      3
  5   5   24        5      1     0   551       0      2
  6   6   30        1      1     0    32       1      1
  7   7   39       34      1     0   459       1      3
  8   8   27        2      1     0    22       1      3
  9   9   40        3      1     0   210       1      2
 10  10   36        7      1     0   184       1      2
First compare survival rates for the three
categorical variables of treat, site and herco:
PROC LIFETEST DATA=uis PLOTS=(s);
TITLE 'Survival by Treatment';
TIME time*censor(0);
STRATA treat;
RUN;
PROC LIFETEST DATA=uis PLOTS=(s);
TITLE 'Survival by Site';
TIME time*censor(0);
STRATA site;
RUN;
PROC LIFETEST DATA=uis PLOTS=(s);
TITLE 'Survival by herco';
TIME time*censor(0);
STRATA herco;
RUN;
Treatment: the Wilcoxon and log-rank tests (output not shown) are
statistically significant (p = 0.0021 and p = 0.0091, respectively).
Treatment affects the risk of returning to drug use.
Site: the Wilcoxon and log-rank tests (output not shown) are not
statistically significant (p = 0.0779 and p = 0.1240, respectively).
There is no significant evidence that site affects the risk of returning to drug use.
herco: the Wilcoxon and log-rank tests (output not shown) are not
statistically significant (p = 0.2919 and p = 0.1473, respectively).
There is no significant evidence that herco affects the risk of returning
to drug use, although the curves do cross initially, and crossing curves
can reduce the power of these tests.
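To make concrete what the survival curves requested with PLOTS=(s) estimate, here is a minimal Kaplan-Meier sketch in Python. It uses only the ten observations printed above, pooled rather than split by stratum, so it is an illustration of the estimator and not a reproduction of the PROC LIFETEST output. The function name kaplan_meier is a hypothetical helper, not part of SAS.

```python
# Kaplan-Meier sketch on the ten printed observations only.
# time   = time until return to drug use
# events = the censor variable: 1 = returned to drug use, 0 = censored
times = [188, 26, 207, 144, 551, 32, 459, 22, 210, 184]
events = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred."""
    curve = []
    surv = 1.0
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Subjects who relapsed exactly at time t (censored ones do not count).
        relapsed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if relapsed > 0:
            surv *= 1.0 - relapsed / at_risk
            curve.append((t, surv))
    return curve


km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"t = {t:3d}  S(t) = {s:.3f}")
```

The censored subject (time = 551) never lowers the curve; it only counts in the risk set at earlier times. The Wilcoxon and log-rank tests compare curves of this kind across the strata.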
Now examine if ndrugtx and age affect the risk of
returning to drug use. Because these are
continuous variables, we will use proportional
hazard regression (PROC PHREG):
PROC PHREG DATA=uis;
MODEL time*censor(0) = ndrugtx;
RUN;
PROC PHREG DATA=uis;
MODEL time*censor(0) = age;
RUN;
Output from PHREG: ndrugtx
Interpreting the Output
• The proportional hazards regression model for these
data with ndrugtx as the predictor is:
λ(t) = λ₀(t)exp(0.02937*ndrugtx)
• The relative risk of a 1 unit increase in the number of
previous drug treatments is:
λ₀(t)exp(0.02937*1) / λ₀(t)exp(0.02937*0)
= exp(0.02937 - 0) = exp(0.02937) = 1.03
• With each additional prior drug treatment, the risk of
relapsing increases by 3% (1.03 - 1.00).
• Notice that the SAS output also gives you this relative
risk under “Hazard Ratio.”
• This term is significant (p<0.0001), which indicates that
prior drug treatments affect risk of relapse.
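The arithmetic in the bullets above can be checked with a short Python sketch. The coefficient 0.02937 is the one reported for ndrugtx. The helper name hazard_ratio and the 5-unit comparison are illustrative additions, not part of the SAS output.

```python
import math


def hazard_ratio(beta, units=1.0):
    """Hazard ratio for a `units`-sized increase in a covariate
    whose Cox regression coefficient is `beta`."""
    return math.exp(beta * units)


beta_ndrugtx = 0.02937  # coefficient for ndrugtx reported above

hr_1 = hazard_ratio(beta_ndrugtx)      # one additional prior treatment
hr_5 = hazard_ratio(beta_ndrugtx, 5)   # five additional (illustrative choice)

print(f"HR for +1 treatment:  {hr_1:.3f} ({(hr_1 - 1) * 100:.1f}% higher hazard)")
print(f"HR for +5 treatments: {hr_5:.3f} ({(hr_5 - 1) * 100:.1f}% higher hazard)")
```

Because the covariate sits inside the exponent, the effect of several units multiplies rather than adds: the 5-unit hazard ratio equals the 1-unit hazard ratio raised to the fifth power.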
Output from PHREG: age
Interpreting the Output: Age
• The proportional hazards regression model for
these data with age as the predictor is:
λ(t) = λ₀(t)exp(-0.01286*age)
• The relative risk of a 1 year increase in age at
enrollment is:
λ₀(t)exp(-0.01286*1) / λ₀(t)exp(-0.01286*0)
= exp(-0.01286 - 0) = exp(-0.01286) = 0.987
• With each additional year of age at enrollment,
the risk of relapsing decreases by 1.3% (1.000 - 0.987).
• Notice that the SAS output also gives you this
relative risk under “Hazard Ratio.”
• Age is not significantly related to risk, however
(p=0.735).
The Full Model
First consider the full model with all of the predictor
variables. Within the PHREG procedure, we will
create 2 new indicator variables, herco2 and herco3, each
equal to 1 when herco takes that value and 0 otherwise
(herco = 1 is the reference level). In addition,
we will conduct a test labeled “herco” to determine
whether these two variables are jointly significant.
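PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx treat
site herco2 herco3;
/* Indicator variables: each comparison evaluates to 1 (true) or 0 (false) */
herco2 = herco=2;
herco3 = herco=3;
/* Joint test that both herco coefficients are 0 */
herco: TEST herco2, herco3;
RUN;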
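The statements herco2 = herco=2 and herco3 = herco=3 rely on a comparison evaluating to 1 or 0. A small Python sketch of the same dummy coding, with the hypothetical helper name herco_dummies, shows the value each level of herco receives:

```python
def herco_dummies(herco):
    """Mirror the two PHREG programming statements:
    each comparison becomes 1 when true and 0 when false."""
    herco2 = int(herco == 2)
    herco3 = int(herco == 3)
    return herco2, herco3


for level in (1, 2, 3):
    herco2, herco3 = herco_dummies(level)
    print(f"herco = {level}: herco2 = {herco2}, herco3 = {herco3}")
```

Level 1 (heroin and cocaine use) gets 0 on both indicators, so it serves as the reference group against which the other two levels are compared.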
Results from “herco” test
The test of our two new variables, herco2 and
herco3, is non-significant (p = 0.1130), so we will
drop herco from our model and run the refitted
model.
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx treat site;
RUN;
Output from Model w/o herco
All of the terms in the model are significant,
except for site, which is approaching
significance. Because we know from
previous research that site is important,
we will leave it in our model.
We will now check six different interactions
in our model, to see if any significant ones
exist: ndrugtx*age, ndrugtx*treat,
ndrugtx*site, age*treat, age*site, treat*site
Adding ndrugtx*age to the model (notice you
can create the interaction term within the
PHREG procedure):
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx
treat site drugage;
drugage = ndrugtx*age;
RUN;
ndrugtx*age interaction not significant
Adding ndrugtx*treat to the model
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx
treat site drugtreat;
drugtreat = ndrugtx*treat;
RUN;
ndrugtx*treat not significant
Adding ndrugtx*site to the model
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx
treat site drugsite;
drugsite = ndrugtx*site;
RUN;
ndrugtx*site not significant
Adding age*treat to the model
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx
treat site agetreat;
agetreat = age*treat;
RUN;
age*treat not significant
Adding age*site to the model
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx
treat site agesite;
agesite = age*site;
RUN;
age*site interaction IS significant
Adding treat*site to the model
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx
treat site treatsite;
treatsite = treat*site;
RUN;
treat*site not significant
Final Model Selection
Not only was the age*site interaction
significant, but once we included it in our
model, the site term also became
statistically significant.
The final proportional hazard model is:
λ(t) = λ₀(t)exp(β1*age + β2*ndrugtx + β3*treat
+ β4*site + β5*agesite)
λ(t) = λ₀(t)exp(-0.034*age + 0.036*ndrugtx -
0.267*treat - 1.246*site + 0.034*agesite)
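With an interaction in the model, the effect of age depends on site and the effect of site depends on age. The Python sketch below works this out from the coefficients reported above, taking the fifth term to be the age*site interaction that was found significant. The function names and the example ages are illustrative assumptions, not part of the SAS output.

```python
import math

# Coefficients as reported for the final model; the last one is
# taken to belong to the age*site interaction term.
B_AGE, B_NDRUGTX, B_TREAT, B_SITE, B_AGESITE = -0.034, 0.036, -0.267, -1.246, 0.034


def linear_predictor(age, ndrugtx, treat, site):
    """Exponent of the final model for one subject."""
    return (B_AGE * age + B_NDRUGTX * ndrugtx + B_TREAT * treat
            + B_SITE * site + B_AGESITE * age * site)


def age_hazard_ratio(site, years=1):
    """Hazard ratio for `years` additional years of age at a given site."""
    return math.exp((B_AGE + B_AGESITE * site) * years)


def site_hazard_ratio(age):
    """Hazard ratio of site B (site = 1) versus site A (site = 0) at a given age."""
    return math.exp(B_SITE + B_AGESITE * age)


print(f"age HR at site A: {age_hazard_ratio(0):.3f}")
print(f"age HR at site B: {age_hazard_ratio(1):.3f}")
for age in (25, 35, 45):  # illustrative ages, not taken from the output
    print(f"site B vs site A at age {age}: HR = {site_hazard_ratio(age):.3f}")
```

With these rounded coefficients the age effect at site B cancels (-0.034 + 0.034 = 0), so age lowers the hazard only at site A, and the advantage of site B shrinks as age increases.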
Testing Proportionality
The Cox proportional hazard regression we
have just conducted assumes that the
hazards are proportional, that is, that the
hazard ratio between any two subjects is constant over time.
To test this assumption of proportionality, we
use time-dependent variables and test
whether they are significant. If they are
not significant, there is no evidence that time
affects the relative risk, and we can
treat the risks in our model as
proportional.
Creating and testing time-dependent
variables (on the log scale):
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx treat
site agesite aget drugt treatt sitet;
agesite = age*site;
/* Time-dependent terms: each main effect multiplied by log(time) */
aget = age*log(time);
drugt = ndrugtx*log(time);
treatt = treat*log(time);
sitet = site*log(time);
/* Joint test that all four time-dependent coefficients are 0 */
test_proportionality: TEST aget, drugt,
treatt, sitet;
RUN;
Testing Proportionality Output
The test we labeled “test_proportionality” is
not significant (p = 0.7309), which means
that our time-dependent variables are not
jointly significant.
We can assume proportionality over time.
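To see why a significant x*log(time) term would signal non-proportional hazards, note that a model containing both x and x*log(time) implies a hazard ratio of exp(β + γ*log(t)), which is constant only when γ = 0. The Python sketch below illustrates this. The treat coefficient is reused from the final model purely for illustration, and the value of γ is made up; neither number comes from the proportionality output.

```python
import math


def hazard_ratio_at(t, beta, gamma):
    """Hazard ratio for a 1-unit covariate difference at time t when the
    model holds both x (coefficient beta) and x*log(t) (coefficient gamma)."""
    return math.exp(beta + gamma * math.log(t))


BETA_TREAT = -0.267         # treat coefficient from the final model, reused for illustration
GAMMA_HYPOTHETICAL = 0.05   # made-up value, NOT taken from any SAS output

for t in (10, 100, 500):
    constant = hazard_ratio_at(t, BETA_TREAT, 0.0)
    drifting = hazard_ratio_at(t, BETA_TREAT, GAMMA_HYPOTHETICAL)
    print(f"t = {t:3d}: HR with gamma = 0 is {constant:.3f}, "
          f"HR with gamma = {GAMMA_HYPOTHETICAL} is {drifting:.3f}")
```

With γ = 0 the hazard ratio is the same at every time, which is the proportional hazards assumption. A nonzero γ makes it drift with time, and that drift is what the joint TEST statement looks for.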
If we cannot assume proportionality…
If the assumption of proportionality were not met,
we could stratify on the variable whose
hazards are not proportional.
For example, if we found the variable treat to be
not proportional, we could stratify on that
variable:
PROC PHREG DATA=uis;
MODEL time*censor(0) = age ndrugtx site
agesite;
agesite = age*site;
STRATA treat;
RUN;
Output stratifying on treat