Chapter 5-23. Cox Regression Proportional Hazards (PH) Assumption

In this chapter, we will discuss the c-statistic for Cox regression, as a way to assess goodness-of-fit, or more correctly, the discriminatory ability of the model. We will discuss the proportional
hazards assumption of Cox regression and how to test the assumption. We will discuss the
“stratification” approach for dealing with violations of the proportional hazards assumption, in a
detailed Cox regression example.
We will begin with the same dataset we used to introduce Cox regression in Chapter 5-7, the
LeeLife dataset (see box).
LeeLife dataset
This dataset came from Lee (1980, Table 3.5, p.31), which originally came from Myers (1969).
The data concern male patients with localized cancer of the rectum diagnosed in Connecticut
from 1935 to 1954. The research question is whether survival improved for the 1945-1954
cohort of patients (cohort = 1) relative to the earlier 1935-1944 cohort (cohort = 0).
Data Codebook
________________________________
id          study ID number
cohort      1 = 1945-1954 patient cohort
            0 = 1935-1944 patient cohort
interval    1 to 10, time interval (year) following cancer diagnosis
            11 = still alive and being followed at end of year 10
died        1 = died
            0 = withdrawn alive or lost to follow-up during year interval
_________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Reading the data in,
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on LeeLife.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\LeeLife.dta",
clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use LeeLife.dta, clear
In preparation for using survival time commands, including Cox regression, which begin with st,
we use the stset command to inform Stata which variable is the death, or event, variable, and
which is the time variable.
Statistics
Survival analysis
Setup & utilities
Declare data to be survival time data
Main tab: Time variable: interval
Failure variable: died
OK
stset interval, failure(died)
Assessing Goodness of Fit with the c Statistic
Fitting a Cox regression model using,
Statistics
Survival analysis
Regression models
Cox proportional hazards model
Model tab: Independent variables: cohort
OK
stcox cohort
Cox regression -- Breslow method for ties

No. of subjects =         1137                     Number of obs   =      1137
No. of failures =          798
Time at risk    =         4835
                                                   LR chi2(1)      =     39.74
Log likelihood  =   -5245.5703                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cohort |   .6282795   .0454925    -6.42   0.000     .5451539    .7240802
------------------------------------------------------------------------------
Using the hazard ratio as an estimate of relative risk, we see that the later cohort had a smaller
risk of dying from cancer than the earlier cohort (HR = 0.63, p < 0.001).
In the logistic regression part of the course, we assessed goodness of fit using the c-statistic. The
same statistic is used to assess goodness of fit, or discriminatory value, in Cox regression.
The original paper for the c-statistic, as it has become known, was published for use with Cox
regression (Harrell et al., 1982). Harrell presented it as,
“Draw a pair of patients and determine which patient lived longer from his baseline
evaluation. Survival times can be validly compared either when both patients have died,
or when one has died and the other’s follow-up time has exceeded the survival time of the
first. If both patients are still alive, which will live longer is not known, and that pair of
patients is not used in the analysis. Otherwise, it can be determined whether the patient
with the higher prognostic score (i.e., the weighted combination of baseline and test
variables used to predict survival) also had the longer survival time. The process is
repeated until all possible pairs of patients have been examined. Of the pairs of patients
for which the ordering of survival times could be inferred, the fraction of pairs such that
the patient with the higher score had the longer survival time will be denoted by c.
The index c estimates the probability that, of two randomly chosen patients, the patient
with the higher prognostic score will outlive the patient with the lower prognostic score.
Values of c near .5 indicate that the prognostic score is no better than a coin-flip in
determining which patient will live longer. Values of c near 0 or 1 indicate the baseline
data virtually always determine which patient has a better prognosis.”
Computing the c-statistic,
Statistics
Survival analysis
Regression models
Test proportional hazards assumption
Main tab: Reports and statistics: Harrell’s C index (concordance)
OK
estat concordance
Harrell's C concordance statistic

         failure _d:  died
   analysis time _t:  interval2

  Number of subjects (N)              =    1137
  Number of comparison pairs (P)      =  488830
  Number of orderings as expected (E) =  150665
  Number of tied predictions (T)      =  262446

                 Harrell's C = (E + T/2) / P =     .5767
                 Somers' D =                       .1533
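As a quick arithmetic check of the formula in the output, using the counts P, E, and T reported above, we can reproduce Harrell's C and its companion Somers' D, which is just D = 2(C - 0.5):

display (150665 + 262446/2)/488830
display 2*((150665 + 262446/2)/488830 - .5)

The first display returns .5767 and the second .1533, matching the output.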
We see that c = 0.58 does not achieve the 0.70 mark for acceptable discrimination (see box)
for a prognostic model, but that is fine, since we are not attempting to derive one.
Rule of Thumb for Interpreting ROC and c-statistic
Hosmer and Lemeshow (2000, p. 162) suggest the following general rule for interpreting the
area under the ROC curve:
ROC = 0.5 suggests no discrimination (i.e., no better than flipping a coin)
0.7 ≤ ROC < 0.8 is considered acceptable discrimination
0.8 ≤ ROC < 0.9 is considered excellent discrimination
ROC ≥ 0.9 is considered outstanding discrimination (extremely unusual to observe this in
practice)
The same rule of thumb holds for the c-statistic, since the c-statistic is identically the area under
the ROC curve (Hosmer and Lemeshow, 2000, p.163; Harrell, 2001, p.248).
Testing the Proportional Hazards Assumption of Cox Regression
In epidemiology courses, when the topic of a stratified analysis is presented, it is pointed out that
Mantel-Haenszel pooled estimate requires an assumption of homogeneity of stratum-specific
effect estimates (Chapter 3-11, p.7). This is required for the pooled estimate, or summary
estimate, to be a good representation of what happens in the individual strata.
Since we verified in Chapter 5-7 that the Cox regression HR closely resembles the individual
time-strata RR estimates, it makes sense that Cox regression has a similar assumption. In Cox
regression, this is called the proportional hazards assumption.
The assumption actually arises in how the log likelihood is specified, so that the assumption is an
inherent part of the way the regression coefficients are estimated (Harrell, 2001, pp.466-468).
To test the proportional hazards assumption using a significance test approach, we use
estat phtest , detail
Test of proportional-hazards assumption

Time:  Time
----------------------------------------------------------------
            |       rho            chi2       df       Prob>chi2
------------+---------------------------------------------------
     cohort |      0.04554         1.64        1         0.1999
------------+---------------------------------------------------
global test |                      1.64        1         0.1999
----------------------------------------------------------------
We see that the proportional hazards assumption is justified, since the test is not significant (p =
0.1999).
The first section of the table lists each predictor separately, testing the proportional hazards
assumption for that predictor specifically. The second section provides an overall test, the global
test, which is a test that the model, overall, meets the proportional hazards assumption. If you
leave off the “detail” option, you just get the global test.
The test for the individual predictors uses the scaled Schoenfeld residuals, while the global test
uses the unscaled Schoenfeld residuals (Grambsch and Therneau, 1994). If significant, then the PH
assumption is rejected. See the Kleinbaum quote on the next page, which justifies using a small
alpha here (p < 0.01 or maybe even p < 0.001).
___________
Note: In Stata version 9, this test was done using:
capture drop sc*
stcox cohort , schoenfeld(sch*) scal(sc*)
stphtest, detail
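A graphical companion to this significance test is the plot( ) option of estat phtest, which graphs the scaled Schoenfeld residuals for a covariate against time along with a smoothed trend line; a roughly horizontal smooth is consistent with proportional hazards. For example,

estat phtest, plot(cohort)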
Protocol Suggestion
Suggested wording for describing using the PH test based on Schoenfeld residuals is:
A test of the proportional hazards, a required assumption of Cox regression, will be
performed for each covariate and globally using a formal significance test based on the
unscaled and scaled Schoenfeld residuals. (Grambsch and Therneau, 1994)
Example: Sjöström et al (N Engl J Med, 2007) reported in their Statistical Analysis section using
Schoenfeld residuals to test the proportional hazards assumption,
“Schoenfeld residuals from the models were examined to assess possible departures from
model assumptions.32”
____________
32. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on
weighted residuals. Biometrika 1994;81:515-526.
To test the proportional hazards assumption with a graphical approach, we use
Statistics
Survival analysis
Regression models
Graphically assess proportional hazards assumption
Main tab: Independent variable: cohort
OK
stphplot, by(cohort)

[Graph: log-log survival plot, -ln(-ln(survival)) versus ln(analysis time), with separate curves for cohort = 0 and cohort = 1]
The advantage of the graphical approach over the significance test is that it permits us to be more
liberal in our assessment. If these lines look approximately parallel, then the PH assumption is
met. If the lines cross, then clearly there is a problem. Short of the lines crossing, it might be
okay to assume the assumption is met. Kleinbaum (1996, p.141) recommends,
“We recommend that one should use a conservative strategy for this decision of assuming
the PH assumption is satisfied unless there is strong evidence of nonparallelism of the
log-log curves.”
Revised Protocol Suggestion
There is no reason not to use both methods, graphical and significance test, and consider the
evidence from both approaches. Here is another way to state how the PH assumption will be
tested:
A test of the proportional hazards, a required assumption of Cox regression, will be
performed for each covariate and globally using a formal significance test based on the
unscaled and scaled Schoenfeld residuals (Grambsch and Therneau, 1994). In addition, a
graphical assessment of proportional hazards will be made using log-log survival curves.
Testing Proportional Hazards With a Time-Dependent Covariate
The proportional hazards (PH) assumption is just another way of saying that the hazard ratio
(HR) does not change over time for some predictor X. If the HR does change over time for X,
then X is said to interact with time, so adding an X × time interaction term provides a better-fitting model and fixes the problem of non-PH. To test the PH assumption, then, you can add an
X × time interaction term and test it for significance. If not significant, then the PH assumption
is satisfied and the interaction term can be dropped from the model. If significant, then the
interaction term can be kept in the model as a fix for the PH assumption violation.
Adding an X × time interaction term is done using the time-dependent covariate option, tvc( ), and
the time expression, or function of analysis time, option, texp( ), which uses as an argument _t, the
time variable created by the stset command.
Although it is mostly a matter of taste, some researchers like to use the log of time, ln(t), instead
of the original time, t, which would be specified as texp(ln(_t)) instead of texp(_t).
There is no clear guideline, so a more complete approach would be to try both (Cleves et al.,
2004, p.177).
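To see what tvc( ) is doing behind the scenes, here is a hand-built sketch of the same interaction (cohortXt is our own made-up variable name): split the records at the observed failure times with stsplit, generate the interaction explicitly, and fit it as an ordinary covariate. This should closely mirror stcox cohort, tvc(cohort) texp(_t).

preserve
stsplit, at(failures)          // split each subject's record at every observed failure time
gen cohortXt = cohort * _t     // explicit cohort × time interaction
stcox cohort cohortXt
restore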
Testing for a cohort × time interaction,
stcox cohort, tvc(cohort) texp(_t)
Cox regression -- Breslow method for ties

No. of subjects =         1137                     Number of obs   =      1137
No. of failures =          798
Time at risk    =         4835
                                                   LR chi2(2)      =     41.39
Log likelihood  =   -5244.7466                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
rh           |
      cohort |   .5613034    .063886    -5.07   0.000     .4490724     .701583
-------------+----------------------------------------------------------------
t            |
      cohort |   1.043777   .0350215     1.28   0.202     .9773441    1.114725
------------------------------------------------------------------------------
Note: Second equation contains variables that continuously vary with respect to
      time; variables are interacted with current values of _t.
The output is now split into two panels. The first contains the predictors that are constant over
time, or fixed covariates (rh). The second contains the time-varying covariates (t), with a
footnote to remind you which is which, and also to remind you what function of time was used.
Testing for a cohort × time interaction, but this time using the log of time,
stcox cohort, tvc(cohort) texp(ln(_t))
Cox regression -- Breslow method for ties

No. of subjects =         1137                     Number of obs   =      1137
No. of failures =          798
Time at risk    =         4835
                                                   LR chi2(2)      =     41.28
Log likelihood  =   -5244.7997                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
rh           |
      cohort |   .5773083   .0573695    -5.53   0.000     .4751388    .7014475
-------------+----------------------------------------------------------------
t            |
      cohort |   1.131921   .1133381     1.24   0.216     .9302206    1.377355
------------------------------------------------------------------------------
Note: Second equation contains variables that continuously vary with respect to
      time; variables are interacted with current values of ln(_t).
Usually you would not include the variable twice like this in Cox regression. If you really
wanted to model cohort as a time-dependent covariate, you would just include it in the tvc( )
option. Separating it as we did, however, allows us to split the predictor into a “main effect”
and an “interaction” effect.
We see the p = 0.202 with untransformed time and p = 0.216 with log-transformed time are
very similar to the Schoenfeld residuals test for proportional hazards computed above, p = 0.1999.
We should conclude that the proportional hazards assumption is met and drop the interaction term.
Our final model is
stcox cohort
Cox regression -- Breslow method for ties

No. of subjects =         1137                     Number of obs   =      1137
No. of failures =          803
Time at risk    =         4841
                                                   LR chi2(1)      =     42.31
Log likelihood  =   -5271.7697                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cohort |   .6204562   .0447145    -6.62   0.000     .5387255    .7145865
------------------------------------------------------------------------------
For completeness, if we had chosen to model cohort as a time-dependent covariate, we would
just include it in the tvc( ) option. Trying this,
stcox , tvc(cohort) texp(_t)
option tvc() not allowed
r(198);
we get an error message. It turns out that Stata needs at least one variable in front of the comma.
Since we don’t have any other predictor in this dataset, we can include a variable that contains all
ones, which represents the baseline hazard. Ordinarily you would not do this, because it is done
anyway behind the scenes.
gen ones = 1
stcox ones , tvc(cohort) texp(_t)
Cox regression -- Breslow method for ties

No. of subjects =         1137                     Number of obs   =      1137
No. of failures =          798
Time at risk    =         4835
                                                   LR chi2(1)      =     15.96
Log likelihood  =   -5257.4616                     Prob > chi2     =    0.0001

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
t            |
      cohort |   .9173471   .0195219    -4.05   0.000     .8798719    .9564184
------------------------------------------------------------------------------
Note: Second equation contains variables that continuously vary with respect to
      time; variables are interacted with current values of _t.
This is not as impressive as our fixed predictor approach (HR=0.62) on the previous page.
Trying log time,
stcox ones , tvc(cohort) texp(ln(_t))
Cox regression -- Breslow method for ties

No. of subjects =         1137                     Number of obs   =      1137
No. of failures =          798
Time at risk    =         4835
                                                   LR chi2(1)      =     11.36
Log likelihood  =   -5259.7624                     Prob > chi2     =    0.0008

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
t            |
      cohort |   .7780817   .0568277    -3.44   0.001     .6743061    .8978282
------------------------------------------------------------------------------
Note: Second equation contains variables that continuously vary with respect to
      time; variables are interacted with current values of ln(_t).
This time we get a slightly more impressive result, but still not as good as the fixed approach.
The time-dependent predictor variables are a type of combined main effect and interaction term
with time, so you don’t need to have the same variable listed before the comma to model the
variable.
A Detailed Worked Example
In this detailed example, we will go into much more analysis detail than you would probably care
to do in practice. It is a Rolls Royce approach.
We will analyze the Reyes dataset, reyeswithdates.dta (see box).
Reyeswithdates dataset
This dataset came from Cleves et al. (2004, p.185). Cleves’ file reyes.dta was modified slightly,
by replacing the days variable with beginning and ending dates, representing the same number of
days of follow-up.
This is a randomized clinical trial involving N=150 children diagnosed with Reye’s syndrome.
Study subjects were randomized to a new medication or to a standard medication. The study
hypothesis is that the new treatment will be effective in preventing death from Reye’s syndrome.
Cleves et al. describe the dataset,
“Reye’s syndrome is a rare disease, usually affecting children under the age of
fifteen who are recovering from an upper respiratory illness, chicken pox, or flu. The
condition causes severe brain swelling and inflammation of the liver. This acute illness
requires immediate and aggressive medical attention. The earlier the disease is
diagnosed, the better the chances of a successful recovery. Treatment protocols include
drugs to control the brain swelling and intravenous fluids to restore normal blood
chemistry.
For this study of a new medication to control the brain swelling, and thus to
prevent death, 150 Reye’s syndrome patients were randomly allocated at time of hospital
presentation to either the standard high-dose barbiturate treatment protocol or to a
treatment protocol that included the new experimental drug. The time from treatment
allocation to death or end of follow-up was recorded in days.”
Data Codebook
id          study ID number (one observation per subject)
begindate   date of treatment allocation
enddate     date of death or end of follow-up
dead        1 = death, 0 = censored (survived)
treat       treatment, 1 = patient on experimental protocol
                       0 = patient on standard protocol
age         patient age (years)
sex         gender, 0 = ?, 1 = ? (Cleves et al. did not say)
ftliver     fatty liver disease, 1 = present, 0 = absent (from liver biopsy
            within 24 hours of treatment allocation)
ammonia     baseline blood ammonia level (mg/dl)
sgot        baseline serum level of aspartate transaminase (SGOT) (I.U.)
Reading in the data,
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\reyeswithdates.dta",
clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use reyeswithdates, clear
Checking the variable types,
describe
Contains data from reyeswithdates.dta
  obs:           150
 vars:            10                           27 Apr 2006 11:50
 size:         4,950 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              int    %8.0g
begindate       str8   %9s
enddate         str8   %9s
dead            byte   %8.0g
treat           byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g
ftliver         byte   %8.0g
ammonia         float  %9.0g
sgot            int    %8.0g
-------------------------------------------------------------------------------
Sorted by:
We discover that the two date variables have storage type “str8”, meaning they are strings of up
to 8 text characters, or string variables. The other variables have storage types of int, byte, and
float, all of which are numeric variables.
Before dates can be used in Stata, such as computing the number of days between them, they
have to be converted to “date” variables.
If we look at the dates using Stata’s browser, we will see dates of the form “7/2/01”. To create
two “date” variables, we use,
capture drop begindate2 enddate2
gen begindate2 = date(begindate,"md20y")     // Stata 10
gen enddate2 = date(enddate,"md20y")         // Stata 10
*
capture drop begindate2 enddate2
gen begindate2 = date(begindate,"MD20Y")     // Stata 11
gen enddate2 = date(enddate,"MD20Y")         // Stata 11
The second argument of the date function, the “md20y” part, informs Stata that the dates are
month/day/year. If the dates were of the form “7/2/2001”, or “7/2/1991”, it would be sufficient
to use “mdy” as the second argument. Since the years are only two digits, Stata requires us to
inform it whether the dates are from the 1900s or 2000s. The “20y” informs Stata to put “20” in
front of the two-digit year. (Note that in Stata version 10, the “mdy” characters are lowercase,
but in Stata version 11, they must be uppercase, “MDY”.)
If we look at the data now, using the data browser, we will see the new date variables are of the
form “15158”. Statistical software uses what are called “elapsed dates”, which in Stata is the
number of days since January 1, 1960.
To make the dates appear in the format “02jul2001”, use
format begindate2 enddate2 %d
The “%” indicates that a format specification follows, which in this case is “d” for date.
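As a quick check of the conversion on a single value, using the first begindate in the data, we can display both the elapsed and the formatted forms (Stata 11 mask shown; use “md20y” in Stata 10):

display date("7/2/01","MD20Y")
display %d date("7/2/01","MD20Y")

The first display returns 15158 and the second 02jul2001.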
Arithmetic can be done on date variables, so we can finally compute the days of follow-up. Do
this using,
capture drop days
gen days = enddate2 - begindate2
To check our work, we look at a few dates, using
list begindate enddate begindate2 enddate2 days in 1/5 , abbrev(15)
     +-----------------------------------------------------+
     | begindate   enddate   begindate2    enddate2   days |
     |-----------------------------------------------------|
  1. |    7/2/01    7/10/01   02jul2001   10jul2001      8 |
  2. |    7/2/01    9/22/01   02jul2001   22sep2001     82 |
  3. |    7/3/01    7/22/01   03jul2001   22jul2001     19 |
  4. |    7/4/01    8/19/01   04jul2001   19aug2001     46 |
  5. |    7/5/01     8/7/01   05jul2001   07aug2001     33 |
     +-----------------------------------------------------+
To get the means, standard deviations, percents, etc., for the potential covariates for the Cox
regression model, which we might use for a “Table 1. Patient Characteristics” table in our
manuscript, use,
ttest age , by(treat)
ttest ammonia , by(treat) unequal
ttest sgot , by(treat)
tab sex treat, expect
tab sex treat, col exact
tab ftliver treat, expect
tab ftliver treat, col chi2
Notice we asked for an “unequal” variance t test for ammonia, because we noticed the standard
deviation was twice as large in one group (pretending we ran the command a first time), so the
equal variances assumption was suspect. Actually, our sample size is large enough that this
assumption is not critical, so the ordinary equal variance t test would probably be just fine. We
also used the “expect” option to get the expected frequencies for the crosstabulations, and then
commented it out so that we don’t get confused later thinking these are column frequencies.
When at least one expected frequency, for a 2 × 2 table, was < 5, we used Fisher’s exact test;
otherwise we used the chi-square test (this minimum expected frequency rule is in Chapter 2-3,
p.18).
. ttest age , by(treat)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      71    12.49296    .2268437    1.911419    12.04053    12.94538
       1 |      79    12.29114    .2125195    1.888914    11.86805    12.71423
---------+--------------------------------------------------------------------
combined |     150    12.38667    .1547999    1.895904    12.08078    12.69255
---------+--------------------------------------------------------------------
    diff |            .2018185    .3106441               -.4120523    .8156893
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   0.6497
Ho: diff = 0                                     degrees of freedom =      148

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7415         Pr(|T| > |t|) = 0.5169          Pr(T > t) = 0.2585
. ttest ammonia , by(treat) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      71    2.683099    .3921769    3.304542    1.900926    3.465271
       1 |      79    5.143038    .7920279    7.039698    3.566231    6.719844
---------+--------------------------------------------------------------------
combined |     150    3.978667    .4661303    5.708907    3.057587    4.899746
---------+--------------------------------------------------------------------
    diff |           -2.459939    .8838049               -4.210859   -.7090201
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.7834
Ho: diff = 0                     Satterthwaite's degrees of freedom =  113.345

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0032         Pr(|T| > |t|) = 0.0063          Pr(T > t) = 0.9968
. ttest sgot , by(treat)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      71    279.9155     .962422    8.109512     277.996     281.835
       1 |      79    283.2911     1.04432    9.282119    281.2121    285.3702
---------+--------------------------------------------------------------------
combined |     150    281.6933    .7250671    8.880222    280.2606    283.1261
---------+--------------------------------------------------------------------
    diff |           -3.375646    1.430435               -6.202361   -.5489318
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.3599
Ho: diff = 0                                     degrees of freedom =      148

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0098         Pr(|T| > |t|) = 0.0196          Pr(T > t) = 0.9902
. tab sex treat, col exact

           |         treat
       sex |         0          1 |     Total
-----------+----------------------+----------
         0 |         3          0 |         3
           |      4.23       0.00 |      2.00
-----------+----------------------+----------
         1 |        68         79 |       147
           |     95.77     100.00 |     98.00
-----------+----------------------+----------
     Total |        71         79 |       150
           |    100.00     100.00 |    100.00

           Fisher's exact =                 0.104
   1-sided Fisher's exact =                 0.104
. tab ftliver treat, col chi2

           |         treat
   ftliver |         0          1 |     Total
-----------+----------------------+----------
         0 |        62         59 |       121
           |     87.32      74.68 |     80.67
-----------+----------------------+----------
         1 |         9         20 |        29
           |     12.68      25.32 |     19.33
-----------+----------------------+----------
     Total |        71         79 |       150
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   3.8310   Pr = 0.050
These data are from a randomized trial. Notice randomization achieved balance on age and sex,
but it did not achieve balance on ammonia, sgot, and ftliver. With these variables reported in a
Table 1, the reader will question the randomization procedure, so that should be discussed. The
reader will also expect to see ammonia, sgot, and ftliver included in the final Cox model, or at
least a statement that each was included and was dropped due to lack of significance, or was
dropped after determining it was not a confounder.
To analyze these data using the survival analysis procedures, including Cox regression, we first
“stset” the data, informing Stata which variable is the follow-up time and which variable is the
event outcome. Use,
stset days , failure(dead==1)
Next, try looking at the hazard function using life-table (actuarial) estimates,
ltable days dead, by(treat) hazard
This will give a table that is too long to be useful. Next, collapse days into two-week intervals,
ltable days dead, by(treat) hazard intervals(14)
                  Beg.     Cum.     Std.               Std.
     Interval    Total   Failure   Error    Hazard    Error    [95% Conf. Int.]
-------------------------------------------------------------------------------
treat 0
     0    14        71    0.0851   0.0332    0.0063   0.0026    0.0013   0.0114
    14    28        64    0.2833   0.0552    0.0174   0.0048    0.0080   0.0267
    28    42        43    0.4303   0.0638    0.0163   0.0057    0.0051   0.0276
    42    56        27    0.5363   0.0706    0.0147   0.0073    0.0004   0.0289
    56    70        12    0.6754   0.0834    0.0252   0.0143    0.0000   0.0533
    70    84         5    0.7836   0.1044    0.0286   0.0280    0.0000   0.0834
treat 1
     0    14        79    0.1282   0.0379    0.0098   0.0031    0.0037   0.0158
    14    28        67    0.2683   0.0533    0.0125   0.0041    0.0044   0.0206
    28    42        36    0.2931   0.0570    0.0025   0.0025    0.0000   0.0073
    42    56        22    0.3360   0.0677    0.0045   0.0045    0.0000   0.0132
    56    70        10    0.4245   0.1012    0.0102   0.0102    0.0000   0.0302
    70    84         4    0.6547   0.1884    0.0357   0.0346    0.0000   0.1035
-------------------------------------------------------------------------------
Looking at either the “Cumulative Failure” or the “Hazard” columns, the study treatment does
not appear to have any effect until after the first month.
Looking at a Kaplan-Meier cumulative survival graph,
sts graph , by(treat)
[Graph: Kaplan-Meier survival estimates, by treat; survival probability (0.00 to 1.00) versus analysis time (0 to 80 days), with curves for treat = 0 and treat = 1]
Looking at a Kaplan-Meier cumulative hazard graph,
sts graph , by(treat) failure
[Graph: Kaplan-Meier failure estimates, by treat; cumulative failure probability (0.00 to 1.00) versus analysis time (0 to 80 days), with curves for treat = 0 and treat = 1]
Which of these two graphs we would choose to publish depends on what is a more natural
presentation. Do we want to make statements about the treatment improving survival (survival
graph) or reducing mortality (failure graph)?
The failure graph is more aligned with the hazard ratio from Cox regression, which gives it a
particular intuitive appeal. If the study treatment is effective, the HR < 1, which corresponds to
the cumulative hazard line for study treatment being drawn below the cumulative hazard line for
the standard treatment.
This particular graph will make some readers uncomfortable, since the curves are not
proportionally separated along the range of the follow-up time. It appears the proportional
hazards assumption is not met. We see no treatment effect for the first month, after which the
drug provides a protective effect against death. Perhaps this is due, at least in part, to the
sicker patients ending up in the treatment group, which may offset any early protective effect,
or perhaps it just takes a few weeks for the treatment effect to become discernible.
Although not sufficient to test the study hypothesis, given that we suspect confounding
due to the imbalance of baseline covariates, we can compare the treatments using the log-rank
survival test,
sts test treat
         failure _d:  dead == 1
   analysis time _t:  days

Log-rank test for equality of survivor functions

      |   Events         Events
treat |  observed       expected
------+-------------------------
0     |        35          29.56
1     |        23          28.44
------+-------------------------
Total |        58          58.00

               chi2(1) =      2.07
               Pr>chi2 =    0.1504
The univariable Cox regression gives a p value very similar to the log-rank test. A Cox
regression without covariates is also called the Cox-Mantel test (another of the many survival
analysis tests like the log-rank test). Computing the univariable Cox regression,
stcox treat
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(1)      =      2.06
Log likelihood  =    -253.1617                     Prob > chi2     =    0.1508

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |    .682312   .1833923    -1.42   0.155     .4028993    1.155499
------------------------------------------------------------------------------
At this stage of our analysis, it appears that the treatment effect is not going to be significant.
Let’s throw all of the covariates into a multivariable model and see what happens.
stcox treat age sex ftliver ammonia sgot
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(6)      =     66.00
Log likelihood  =    -221.1916                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .3380707   .1104805    -3.32   0.001     .1781712    .6414721
         age |   1.101588   .0834979     1.28   0.202     .9495113    1.278022
         sex |   1.002736   .7487444     0.00   0.997     .2320559    4.332915
     ftliver |   1.778003   .6144496     1.67   0.096     .9031728    3.500211
     ammonia |   1.145347   .0263403     5.90   0.000     1.094867    1.198154
        sgot |   1.056477   .0183852     3.16   0.002     1.021051    1.093133
------------------------------------------------------------------------------
Apparently, there was some confounding, since the HR for treatment changed by more than 10%
and became significant once covariates were included. Confounding can either detract from or
enhance significance.
“drop only if p > 0.20” variable selection rule
A variable might confound a result even if statistical significance is not obtained. It is easy to
imagine this could be the case for a potential confounder with a p = 0.06, for example, so where
should we draw the line? Vittinghoff et al (2005, p.146) support this idea,
“...we do not recommend ‘parsimonious’ models that only include predictors that are
statistically significant at P < 0.05 or even stricter criteria, because the potential for
residual confounding in such models is substantial.”
To protect against residual confounding, it has been suggested that potential confounders be
eliminated only if p > 0.20. (Maldonado and Greenland, 1993).
For now, let’s retain all predictors but sex (p = 0.997, using the “drop only if p > 0.20” variable
selection rule).
stcox treat age ftliver ammonia sgot
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(5)      =     66.00
Log likelihood  =    -221.1916                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .3381223   .1095964    -3.35   0.001     .1791315    .6382281
         age |   1.101635   .0825134     1.29   0.196     .9512224    1.275832
     ftliver |   1.778163   .6129628     1.67   0.095     .9047903    3.494581
     ammonia |   1.145348   .0263389     5.90   0.000     1.094871    1.198152
        sgot |   1.056471   .0183002     3.17   0.002     1.021205    1.092955
------------------------------------------------------------------------------
In linear regression, the completeness, or goodness of fit, of the model can be assessed with the
multiple R, or multiple R-squared, statistic. In logistic regression, this is popularly done with the
c-statistic. In Cox regression, we use the c-statistic as well. Let’s compute it now, using the
estat command, so we can compare the fit of this model with other models we will derive.
stcox treat age ftliver ammonia sgot
estat concordance
Harrell's C concordance statistic

         failure _d:  dead == 1
   analysis time _t:  days

  Number of subjects (N)              =     150
  Number of comparison pairs (P)      =    5545
  Number of orderings as expected (E) =    4358
  Number of tied predictions (T)      =       0

                 Harrell's C = (E + T/2) / P =     .7859
                 Somers' D =                       .5719
We see that the c-statistic = 0.79 is close to the upper end of the 0.7 ≤ c-statistic < 0.8 range for
“acceptable discrimination”, using the Hosmer and Lemeshow (2000, p. 162) rule of thumb.
Let’s assess if keeping age provides any benefit, by dropping it and comparing the results to the
previous model.
stcox treat ftliver ammonia sgot
estat concordance
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(4)      =     64.34
Log likelihood  =   -222.02514                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .3522512   .1123675    -3.27   0.001      .188504    .6582401
     ftliver |   1.747151   .6099943     1.60   0.110     .8813431    3.463507
     ammonia |   1.139031   .0258365     5.74   0.000     1.089502    1.190812
        sgot |   1.056876   .0183906     3.18   0.001     1.021438    1.093542
------------------------------------------------------------------------------

. estat concordance

Harrell's C concordance statistic

         failure _d:  dead == 1
   analysis time _t:  days

  Number of subjects (N)              =     150
  Number of comparison pairs (P)      =    5545
  Number of orderings as expected (E) =    4351
  Number of tied predictions (T)      =       1

                 Harrell's C = (E + T/2) / P =     .7848
                 Somers' D =                       .5695
The c-statistic changed from 0.7859 to 0.7848, so age did not help much with the overall
discriminatory ability of the model.
“10% change in estimate” variable selection rule
Confounding is said to be present if the unadjusted effect differs from the effect adjusted for
putative confounders. [Rothman, 1998].
A variable selection rule consistent with this definition of confounding is the change-in-estimate
method of variable selection. In this method, a potential confounder is included in the model if it
changes the coefficient, or effect estimate, of the primary exposure variable (treat in our
example) by 10%. This method has been shown to produce more reliable models than variable
selection methods based on statistical significance [Greenland, 1989].
By dropping age, the HR = 0.338 for treat changed to HR = 0.352 in the reduced model (a
(0.352 - 0.338)/0.338 = 0.04, or 4%, relative change). The c-statistic changed from 0.7859 to 0.7848,
hardly at all, suggesting age contributed nothing to the overall goodness of fit. (The c-statistic is
actually only important for models developed for prediction. For the purpose of this example,
which is to test the treatment effect while controlling for confounders, the c-statistic does not
apply. We are considering it only for our own illustration.)
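As a quick check of this arithmetic, using the two treat hazard ratios reported above:

display (.3522512 - .3380707)/.3380707

which returns about .042, a 4% relative change.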
Both the statistical significance and the 10% change-in-estimate variable selection methods
suggest dropping age from the model. We do not have to be concerned here with the “drop only
if p > 0.20” variable selection rule, since we have already identified that confounding is not a
problem using the “10% change in estimate” rule.
Let’s assess if keeping ftliver provides any benefit, by dropping it and comparing the results to
the previous model.
stcox treat ammonia sgot
estat concordance
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(3)      =     61.92
Log likelihood  =   -223.23418                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .3555073   .1127858    -3.26   0.001     .1908983    .6620564
     ammonia |   1.155364     .02421     6.89   0.000     1.108874    1.203803
        sgot |   1.058267   .0185142     3.24   0.001     1.022595    1.095183
------------------------------------------------------------------------------

. estat concordance

Harrell's C concordance statistic

         failure _d:  dead == 1
   analysis time _t:  days

  Number of subjects (N)              =     150
  Number of comparison pairs (P)      =    5545
  Number of orderings as expected (E) =    4340
  Number of tied predictions (T)      =       1

                 Harrell's C = (E + T/2) / P =     .7828
                 Somers' D =                       .5656
We see the HR = 0.352 in the previous model with ftliver increased to HR = 0.356, a change of
(0.356 - 0.352)/0.352 = 0.01, or 1%. The c-statistic changed from c = 0.7848 to c = 0.7828,
essentially no change at all.
Notice we have used a combination of the statistical significance and change-in-effect rules for
variable selection. Using one or the other, or the combination, are three possible variable
selection strategies.
Backwards Variable Selection
To arrive at this “final” model, we have used a variable selection approach known as backward
selection, where one begins with all of the predictor variables of interest and removes them in
order of least significant (or a combination of least significant and least clinically relevant).
Backward selection is considered superior to forward selection (forward selection adds one
variable at a time), because negatively confounded sets of variables are less likely to be omitted
from the model (Sun et al, 1999), since the complete set is included in the initial model. In
contrast, forward and stepwise (stepwise is where variables can be added and subsequently
removed) selection procedures will only include such sets if at least one member meets the
inclusion criterion in the absence of the others (Vittinghoff et al, 2005, p.151).
By “negatively confounded sets”, we are referring to the situation where two or more variables
must be included in the model as a set to control for confounding. When one of the variables is
dropped, confounding increases.
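For comparison only, Stata's stepwise prefix can automate a purely significance-based backward elimination. This is a sketch, not a substitute for the approach above: it applies the p > 0.20 removal criterion but knows nothing of the change-in-estimate rule, and the parentheses together with lockterm1 force treat to remain in the model.

stepwise, pr(0.20) lockterm1: stcox (treat) age sex ftliver ammonia sgot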
Should we check the model assumptions?
It is tempting to publish without bothering to check the model assumptions, because most papers
reporting Cox regression models do not mention having checked the model assumptions in their
Statistical Methods section. The same can be said for all types of regression models.
Devereaux et al (2006), for example, published a Cox regression analysis in JAMA without any
mention of checking the model assumptions. In fact, most authors do not mention checking the
assumptions, including the proportional hazards assumption.
In this tutorial, we will compare our “final model” to models where violated assumptions are
recognized and adjusted for, so that we can see what effect violated assumptions have on the
model results.
Although this is something we should have done at the very beginning, let’s now see how many
deaths occurred in this sample, so we can determine how many variables we can test for model
inclusion, without introducing “overfitting”.
tab dead
       dead |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         92       61.33       61.33
          1 |         58       38.67      100.00
------------+-----------------------------------
      Total |        150      100.00
We see that there are 58 death events. Therefore, we can model 58/10 = 5.8 variables, let’s say 6,
without introducing overfitting. Strictly applying this rule, we should only consider a total of 6
variables in our variable selection exercise, not just limit the final model to 6 variables.
We have 6 predictor variables in our dataset. If we take any one of these variables and convert it
to quartiles, which would require 3 dummy variables to model, this would count as 3 variables
rather than one.
Actually, 58/5 = 11.6 variables is okay, since it has lately been shown that 5 events per
predictor are as good as 10 events per predictor when the goal is to test the effect of a primary
predictor while controlling for confounding (see “overfitting” in the sample size chapter, Chapter
2-5, p.30).
In logistic regression, it is assumed that the odds increase exponentially across the range of the
predictor. In Cox regression, it is assumed that the hazard increases exponentially across the
range of the predictor. One way to check this is to first convert a continuous predictor to
quartiles, or some other quantile. Let’s do this for our three continuous predictors.
xtile age4 = age , nq(4)
tabstat age, stat ( count min max ) by(age4) nototal col(stat)
xtile ammonia4 = ammonia , nq(4)
tabstat ammonia, stat ( count min max ) by(ammonia4) nototal col(stat)
xtile sgot4 = sgot , nq(4)
tabstat sgot, stat ( count min max ) by(sgot4) nototal col(stat)
. tabstat age, stat ( count min max ) by(age4) nototal col(stat)

Summary for variables: age
     by categories of: age4 (4 quantiles of age)

    age4 |         N       min       max
---------+------------------------------
       1 |        48         9        11
       2 |        63        12        13
       3 |        14        14        14
       4 |        25        15        17
-----------------------------------------

. xtile ammonia4 = ammonia , nq(4)
. tabstat ammonia, stat ( count min max ) by(ammonia4) nototal col(stat)

Summary for variables: ammonia
     by categories of: ammonia4 (4 quantiles of ammonia)

ammonia4 |         N       min       max
---------+------------------------------
       1 |        42        .2        .6
       2 |        34        .7       1.2
       3 |        37       1.3       5.3
       4 |        37       5.4      27.8
-----------------------------------------

. xtile sgot4 = sgot , nq(4)
. tabstat sgot, stat ( count min max ) by(sgot4) nototal col(stat)

Summary for variables: sgot
     by categories of: sgot4 (4 quantiles of sgot)

   sgot4 |         N       min       max
---------+------------------------------
       1 |        41       263       276
       2 |        37       277       281
       3 |        36       282       288
       4 |        36       289       306
-----------------------------------------
The xtile command created a new variable with 4 categories, each category representing 25% of
the variable, as best it could. The tabstat command verified the variable was set up correctly, and
also tells us the range of the variable composing each quartile.
Next we model the quartiles very easily using the “xi” (generate indicator variables) facility. We
will include treat in each model as well, since we know we want treat in our final model.
xi: stcox treat i.age4       // Stata version 10
xi: stcox treat i.ammonia4   // Stata version 10
xi: stcox treat i.sgot4      // Stata version 10
In Stata version 11, we can use “ib1”, which means generate indicator variables behind the
scenes, using the first category as the baseline, or referent category,
stcox treat ib1.age4 // Stata version 11
stcox treat ib1.ammonia4 // Stata version 11
stcox treat ib1.sgot4 // Stata version 11
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(4)      =      2.93
Log likelihood  =    -252.7307                     Prob > chi2     =    0.5702

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .6782666   .1823941    -1.44   0.149     .4004074    1.148944
             |
        age4 |
          2  |   1.224904   .3872601     0.64   0.521     .6591589    2.276219
          3  |    1.01796   .5181894     0.03   0.972      .375344    2.760783
          4  |   1.384599   .5396833     0.83   0.404     .6449801    2.972363
------------------------------------------------------------------------------

Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(4)      =     56.49
Log likelihood  =    -225.9478                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .7238032   .2030522    -1.15   0.249     .4176655    1.254332
             |
    ammonia4 |
          2  |   2.065668   1.097025     1.37   0.172     .7294713    5.849419
          3  |   5.955138   2.902689     3.66   0.000     2.290836    15.48067
          4  |   18.29635   8.854006     6.01   0.000     7.086789    47.23671
------------------------------------------------------------------------------

. stcox treat ib1.sgot4 // Stata version 11

Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(4)      =     17.33
Log likelihood  =   -245.52822                     Prob > chi2     =    0.0017

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .5737291   .1582266    -2.01   0.044     .3341621    .9850462
             |
       sgot4 |
          2  |   2.367308    1.13604     1.80   0.073     .9242179    6.063664
          3  |   2.887154   1.423618     2.15   0.032     1.098382     7.58903
          4  |   5.128798   2.404224     3.49   0.000     2.046436    12.85384
------------------------------------------------------------------------------
Looking at age, we see that no quartile is significant, just as the continuous variable was not
significant. The hazard ratios for the quartiles are 1.0 (the 1st quartile, which is the referent), 1.2,
1.0, and 1.4. This is not an exponential increase, but we don’t care because we will drop age out
of the model as being not significant.
Looking at ammonia, we see hazard ratios of 1.0, 2.1, 6.0, and 18.3. This is probably close
enough to an exponential increase that modeling the variable as a continuous variable would not
be a problem.
Looking at SGOT, we see hazard ratios of 1.0, 2.4, 2.9, and 5.1. This is probably close enough to
an exponential increase that modeling the variable as a continuous variable would not be a
problem.
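One informal way to judge “exponential increase”, sketched here with the quartile hazard ratios just listed, is to check whether the ratio between successive quartile HRs is roughly constant, since a constant multiplicative step is linear on the log-hazard scale:

display 2.1/1.0 "  " 6.0/2.1 "  " 18.3/6.0
display 2.4/1.0 "  " 2.9/2.4 "  " 5.1/2.9

For ammonia the successive ratios are about 2.1, 2.9, and 3.1, roughly constant; for SGOT they are about 2.4, 1.2, and 1.8, less even but workable.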
Another way to make the determination made above is to compare the effect estimate of treat
when modeling the predictor either as a continuous variable or as quantiles. Obtaining the models
with these predictors included as continuous variables,
stcox treat age
stcox treat ammonia
stcox treat sgot
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(2)      =      2.26
Log likelihood  =   -253.06596                     Prob > chi2     =    0.3237

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .6828319   .1835354    -1.42   0.156     .4032024    1.156391
         age |   1.031866   .0739947     0.44   0.662     .8965695    1.187579
------------------------------------------------------------------------------

Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(2)      =     51.28
Log likelihood  =   -228.55595                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |    .433673   .1309922    -2.77   0.006     .2399134    .7839172
     ammonia |   1.168163   .0230618     7.87   0.000     1.123826    1.214249
------------------------------------------------------------------------------

Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(2)      =     22.56
Log likelihood  =   -242.91484                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .5262863    .147063    -2.30   0.022      .304345    .9100765
        sgot |   1.083774   .0199286     4.38   0.000      1.04541    1.123545
------------------------------------------------------------------------------
We get treat HR = 0.68 with age included as quartiles, and treat HR = 0.68 with age included as
continuous.
We get treat HR = 0.72 with ammonia included as quartiles, and treat HR = 0.43 with ammonia
included as continuous.
We get treat HR = 0.57 with sgot included as quartiles, and treat HR = 0.53 with sgot included as
continuous.
It appears that the HR for treat depends on how we choose to model ammonia, since 0.72 is very
different from 0.43.
We could consider smaller quantiles, such as quintiles (5 categories), but we would run into an
overfitting problem.
The big difference is probably due to the range of the quartile. Above, we found the ranges of
ammonia for the four quartiles to be 0.2-0.6, 0.7-1.2, 1.3-5.3, and 5.4-27.8. The fourth quartile is
just too wide.
There is another way to determine the best functional form of a predictor variable.
We can use martingale residuals, obtained by specifying the mgale( ) option when fitting the Cox
model. These residuals can be interpreted simply as the difference between the observed number
of failures in the data and the number of failures predicted by the model (Cleves et al, 2004,
pp.186-187). We begin by fitting a Cox model without predictor variables, which models the
baseline hazard, to create a variable containing the martingale residuals. The “estimate” option is
required when fitting a model without covariates (called the null model).
capture drop mgresid
stcox , mgale(mgresid) estimate
Next, we separately plot each predictor, whose functional form we are interested in determining,
against mgresid. We use the lowess (locally weighted regression) smoother to get a more easily
interpreted graph. Run each of the lowess graphs separately.
lowess mgresid age
lowess mgresid ammonia
lowess mgresid sgot
[Graphs: lowess smoother (bandwidth = .8) of martingale residuals versus ammonia, and of martingale residuals versus sgot]
If the graph for a predictor variable appears approximately linear, as it did with age and sgot, the
variable can be included as a continuous variable using its original scale.
For the ammonia variable, we see that the graph deviates from a linear relationship at the lower
values of ammonia. Since this graph has the shape of a logarithm function, a log transform
should make it linear.
Creating a log-transformed variable of ammonia, and regraphing,
gen lnammonia = ln(ammonia)
lowess mgresid lnammonia
[Graph: lowess smoother (bandwidth = .8) of martingale residuals versus lnammonia]
This is a more linear relationship.
Fitting a model with the transformed variable
stcox treat lnammonia
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(2)      =     63.22
Log likelihood  =   -222.58415                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .5957854   .1659649    -1.86   0.063     .3451238    1.028501
   lnammonia |   2.518588   .3149692     7.39   0.000     1.971096    3.218152
------------------------------------------------------------------------------
Our three different models gave us

   Ammonia             treat effect
   continuous          HR = 0.43 , p = 0.006
   quartiles           HR = 0.72 , p = 0.249
   log transformed     HR = 0.60 , p = 0.063
which are very different results. With ammonia modeled as continuous in its original scale, we
get the biggest and most significant effect of treat, but we might be getting fooled. Quartiles
were no fun at all, since the effect went away, but this might be because the fourth quartile had
too wide a range. The log-transformed variable seems like a good compromise.
Let’s see if using the c-statistic helps determine which is the best fit.
stcox treat ammonia
estat concordance
xi: stcox treat i.ammonia4
estat concordance
stcox treat lnammonia
estat concordance
Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |    .433673   .1309922    -2.77   0.006     .2399134    .7839172
     ammonia |   1.168163   .0230618     7.87   0.000     1.123826    1.214249
------------------------------------------------------------------------------

Harrell's C concordance statistic

Harrell's C = (E + T/2) / P = .7737
Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .7238032   .2030522    -1.15   0.249     .4176655    1.254332
_Iammonia4_2 |   2.065668   1.097025     1.37   0.172     .7294713    5.849419
_Iammonia4_3 |   5.955138   2.902689     3.66   0.000     2.290836    15.48067
_Iammonia4_4 |   18.29635   8.854006     6.01   0.000     7.086789    47.23671
------------------------------------------------------------------------------

Harrell's C concordance statistic

Harrell's C = (E + T/2) / P = .7788
Somers' D = .5576
Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .5957854   .1659649    -1.86   0.063     .3451238    1.028501
   lnammonia |   2.518588   .3149692     7.39   0.000     1.971096    3.218152
------------------------------------------------------------------------------

Harrell's C concordance statistic

Harrell's C = (E + T/2) / P = .7959
The three different models give us:

    Ammonia            treat effect             c statistic
    -----------------  ---------------------    -----------
    continuous         HR = 0.43 , p = 0.006       0.77
    quartiles          HR = 0.72 , p = 0.249       0.78
    log transformed    HR = 0.60 , p = 0.063       0.80
so the log-transformed model provides the best goodness of fit.
Let’s do our backwards elimination variable selection again, this time with the log transformed
ammonia variable, along with a c-statistic for the final model
stcox treat age ftliver lnammonia sgot
stcox treat ftliver lnammonia sgot
estat concordance
Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .4875879    .142898    -2.45   0.014     .2745303    .8659954
         age |   1.047432    .075271     0.64   0.519     .9098217    1.205855
     ftliver |   2.106065    .677367     2.32   0.021     1.121251    3.955861
   lnammonia |   2.187477   .2828102     6.05   0.000     1.697833    2.818331
        sgot |   1.047659   .0182932     2.67   0.008     1.012412    1.084134
------------------------------------------------------------------------------

Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .4924155   .1444691    -2.41   0.016     .2770759    .8751139
     ftliver |   2.078928   .6711683     2.27   0.023     1.104166    3.914212
   lnammonia |   2.174083   .2798651     6.03   0.000     1.689284    2.798012
        sgot |   1.048324    .018281     2.71   0.007     1.013099    1.084774
------------------------------------------------------------------------------

Harrell's C concordance statistic

Harrell's C = (E + T/2) / P = .8183
Now that we have correctly specified ammonia, by log transforming it, the ftliver predictor
remains significant, whereas it was dropped from the model when ammonia was not log
transformed.
Let’s see if ftliver is a confounder, using the 10% rule.
stcox treat ftliver lnammonia sgot
stcox treat lnammonia sgot
Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .4924155   .1444691    -2.41   0.016     .2770759    .8751139
     ftliver |   2.078928   .6711683     2.27   0.023     1.104166    3.914212
   lnammonia |   2.174083   .2798651     6.03   0.000     1.689284    2.798012
        sgot |   1.048324    .018281     2.71   0.007     1.013099    1.084774
------------------------------------------------------------------------------

Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .5152024   .1481634    -2.31   0.021     .2932154    .9052509
   lnammonia |   2.321445   .2980431     6.56   0.000     1.804992    2.985668
        sgot |   1.050645   .0187338     2.77   0.006     1.014561    1.088012
------------------------------------------------------------------------------
Dropping ftliver, the effect for treat changes by |0.492 - 0.515|/0.492 = 0.047, or 4.7%. Since
this is well under 10%, we could conclude that ftliver is not a confounder by the 10% rule, and
drop it from the model.
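This arithmetic is quick to reproduce in Stata, using the display command as a calculator, with
the two treat hazard ratios copied from the output above:

* 10% rule: relative change in the treat hazard ratio after dropping
* ftliver (hazard ratios copied from the two models above)
display abs(0.4924155 - 0.5152024)/0.4924155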
However, the three putative confounders, ftliver, ammonia, and sgot, have been shown to be
important predictors of the final outcome of children with Reye’s syndrome (Cleves et al, 2004,
p.186). Since the treated and untreated groups are also imbalanced on each of these three
variables, the reader will feel much more comfortable with our final model if all three variables
are included. That is, by the “association definition” of a confounder, where each of these
putative confounders is associated with both the treatment exposure and the death outcome, these
three variables can be considered confounders and should be in the final model.
In other words, it is useful to retain the variables to provide “face validity”.
Proportional Hazards Assumption
Next we will check the proportional hazards assumption.
The Cox proportional hazards (Cox PH) model has the form:
$$ h(t, X_1, \ldots, X_k) = h_0(t)\exp\left(\sum_{i=1}^{k} \beta_i X_i\right) $$

where $h_0(t)$ is the baseline hazard, and $X_1, \ldots, X_k$ are the predictor variables.
The model predicts an individual’s hazard for the event by multiplying the baseline hazard at
time t (t being the individual’s follow-up time) by the exponential of the individual’s linear
predictor. The linear predictor is the sum of the regression weights, or betas, multiplied by the
individual’s values for the predictor variables.
It was demonstrated in Chapter 5-7 of this course manual that the hazard ratio represents the
time-specific risk ratios, being a type of pooled estimate (or weighted average) across the
individual time strata. As taught in epidemiology courses, the Mantel-Haenszel pooled estimate
of risk ratios, odds ratios, or rate ratios across strata assumes homogeneity of the stratum-specific
estimates in order for a single pooled estimate to represent what is happening in the individual
strata. In an analogous fashion, in Cox regression, it is assumed that the hazard ratio at each time
point is homogeneous in order for the single HR provided by the Cox model to be a good
estimate of what is happening at each follow-up time. This assumption is called the proportional
hazards assumption.
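To see where the assumption comes from, consider the hazard ratio the model implies for two
individuals with covariate values $X_i$ and $X_i^*$ (a standard derivation using the model
formula above):

$$ HR = \frac{h_0(t)\exp\left(\sum_{i=1}^{k}\beta_i X_i^*\right)}{h_0(t)\exp\left(\sum_{i=1}^{k}\beta_i X_i\right)} = \exp\left(\sum_{i=1}^{k}\beta_i (X_i^* - X_i)\right) $$

The baseline hazard $h_0(t)$ cancels, so the hazard ratio does not involve t at all: the model
forces the two hazards to be proportional at every follow-up time. The same constant ratio
carries over to the cumulative hazards, since the time-dependent part cancels there as well.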
Since the hazard ratio (HR) is a single number that summarizes the hazard comparison for all
follow-up times, it can only be a good estimate if the ratio remains constant across the range of
follow-up times. When looking at a cumulative hazard graph, such as the one computed above, at
any point of follow-up time (the X axis), the ratio of the values of the cumulative hazard (the Y
axis) for the two curves should be the same (said to be “proportional hazards”). This ratio is the
HR.
Recall that the graph of the univariable analysis, duplicated here, looks like it might not meet the
proportional hazards assumption, because the curves do not separate during the first 30 days of
follow-up, whereas there is wide separation at 50 days.
[Graph: Kaplan-Meier failure estimates, by treat; analysis time 0 to 80, curves for treat = 0 and treat = 1]
Let’s test the proportional hazards assumption for this univariable model, using
stcox treat
estat phtest, detail
Cox regression -- Breslow method for ties

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |    .682312   .1833923    -1.42   0.155     .4028993    1.155499
------------------------------------------------------------------------------

. estat phtest, detail

Test of proportional-hazards assumption

Time: Time
----------------------------------------------------------------
            |       rho            chi2       df       Prob>chi2
------------+---------------------------------------------------
      treat |      -0.22228        2.78        1         0.0956
------------+---------------------------------------------------
global test |                      2.78        1         0.0956
----------------------------------------------------------------
The significance test based on the Schoenfeld residuals (Grambsch and Therneau, 1994) is not
significant at the 0.05 level, suggesting the assumption is sufficiently met. However, the test was
marginally significant (p = 0.096), so we should verify the assumption graphically as well.
We do this using the log-log graph, by

stphplot, by(treat)

[Graph: log-log survival curves plotted against ln(analysis time), for treat = 0 and treat = 1]
In this graph, the proportional hazards assumption is met if the curves are approximately parallel.
Since the curves clearly cross at about ln(time) = 3, or exp(3) ≈ 20 days, the proportional hazards
assumption is not met. This is the same place the curves cross, although slightly, on the
cumulative hazard graph shown above.
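To check the back-transformation from the log time scale, Stata’s display command works as a
calculator:

* back-transform ln(time) = 3 to the original time scale
display exp(3)

which returns 20.085537, or about 20 days.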
In the univariable model, then, the HR = 0.68 is not a good estimate of the effect, since it does
not hold over the whole range of follow-up time.
Since the univariable model is not the one we will use to test the study hypothesis, we can ignore
that the proportional hazards (PH) assumption does not hold for it. Testing the PH assumption
for our final multivariable model,
stcox treat ftliver lnammonia sgot
estat phtest, detail
Cox regression -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(4)      =     75.88
Log likelihood  =   -216.25577                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .4924155   .1444691    -2.41   0.016     .2770759    .8751139
     ftliver |   2.078928   .6711683     2.27   0.023     1.104166    3.914212
   lnammonia |   2.174083   .2798651     6.03   0.000     1.689284    2.798012
        sgot |   1.048324    .018281     2.71   0.007     1.013099    1.084774
------------------------------------------------------------------------------

. estat phtest, detail

Test of proportional-hazards assumption

Time: Time
----------------------------------------------------------------
            |       rho            chi2       df       Prob>chi2
------------+---------------------------------------------------
      treat |       0.00580        0.00        1         0.9657
    ftliver |      -0.34388        5.13        1         0.0236
  lnammonia |       0.08187        0.38        1         0.5394
       sgot |      -0.02179        0.03        1         0.8667
------------+---------------------------------------------------
global test |                      5.25        4         0.2628
----------------------------------------------------------------
We see that the PH assumption was met for the model overall (p = 0.263), but not for the
predictor ftliver. Interestingly, it is now met very nicely for treat.
The simplest approach to dealing with a violation of the PH assumption, as long as the variable
in violation is not our primary exposure variable, is to stratify the Cox model on that variable. If
the variable is continuous, we must first convert it to an ordered categorical variable, with, say, 5
categories (quintiles), as sketched below. Since ftliver is already categorical, we do not need to
categorize it.
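For illustration only, suppose the continuous variable sgot had been the one violating the
assumption (a hypothetical, since sgot met it); a minimal sketch of the categorize-then-stratify
approach would be:

* hypothetical sketch: cut a continuous covariate into quintiles with
* xtile, then stratify the Cox model on the resulting 5-level variable
xtile sgot5 = sgot, nq(5)
stcox treat ftliver lnammonia, strata(sgot5)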
Stratifying the Cox model by ftliver, and again checking the PH assumption,
capture drop sch* sca*
stcox treat lnammonia sgot, strata(ftliver) schoenfeld(sch*) scal(sca*)
estat phtest, detail
Stratified Cox regr. -- Breslow method for ties

No. of subjects =          150                     Number of obs   =       150
No. of failures =           58
Time at risk    =         4971
                                                   LR chi2(3)      =     51.91
Log likelihood  =   -191.27316                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .4668395   .1374445    -2.59   0.010     .2621568    .8313312
   lnammonia |   2.063892   .2678633     5.58   0.000     1.600344    2.661709
        sgot |   1.041826   .0181285     2.35   0.019     1.006894     1.07797
------------------------------------------------------------------------------
                                                         Stratified by ftliver
. estat phtest, detail

Test of proportional-hazards assumption

Time: Time
----------------------------------------------------------------
            |       rho            chi2       df       Prob>chi2
------------+---------------------------------------------------
      treat |      -0.02878        0.05        1         0.8308
  lnammonia |       0.05933        0.19        1         0.6594
       sgot |      -0.04538        0.12        1         0.7277
------------+---------------------------------------------------
global test |                      0.34        3         0.9522
----------------------------------------------------------------
We see that the PH assumption is now met both globally (the overall model) and individually for
each predictor variable.
This model, with three predictor variables and one stratification variable, is the final model we
would report in our article.
A popular way to report a final model is to show only the HR, confidence interval, and p value
for the primary exposure variable (treat in our case) and then report what was controlled for in a
footnote. See Table 2 of the Devereux et al (2004) paper for an example of this. In our
footnote, we would state, “Adjusted for log-transformed ammonia and SGOT as predictor
variables, and stratified by fatty liver disease since this variable did not meet the proportional
hazards assumption.”
Finally, we want to be able to report a Kaplan-Meier graph, but if we report the graph produced
above, our paper will lose credibility. That is, many readers will be nervous about the lack of
proportional hazards in the univariable graph, even if we make efforts to correct for it in the
multivariable model. This occurs because most readers familiar with Cox regression are not
trained in how to deal with violations of proportional hazards, although they are usually trained
to look for such violations in the Kaplan-Meier graph.
To get a Kaplan-Meier graph that is adjusted for covariates, we might try
sts graph , by(treat) failure adjust(lnammonia sgot ftliver)
[Graph: Failure functions, by treat, adjusted for lnammonia sgot ftliver; analysis time 0 to 80, curves for treat = 0 and treat = 1]
This graph is a disaster. If we tried adjusting for each of the three covariates one at a time, as
sketched below, we would discover that the problem comes from the variable sgot. This is most
likely because sgot has a range of 263 to 306: the graph, by default, holds the covariates at a
value of zero, an extreme extrapolation from the actual values of sgot.
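A minimal sketch of that one-covariate-at-a-time diagnosis (same command and options as
above):

* adjust for one covariate at a time to isolate which one distorts
* the adjusted failure curves
sts graph , by(treat) failure adjust(lnammonia)
sts graph , by(treat) failure adjust(sgot)
sts graph , by(treat) failure adjust(ftliver)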
To resolve this problem, we can use mean centering. That is, we first subtract the mean from
each of the continuous covariates, and then adjust for these mean-centered variables. The zero
value then represents the mean of such a variable. For dichotomous variables, such as ftliver, the
variable is held at 0, or “no exposure”. [It turns out that if we used the mean-centered values in
the Cox regression model itself, the effect estimates would be unchanged; a quick check of this
claim appears after the graph below.]
Computing mean-centered variables, by using “r(mean)” after the “sum” command, which is
where Stata returns the mean, and then including these new variables in the graph,

sum sgot
gen sgotcen = sgot - r(mean)
sum lnammonia
gen lnammcen = lnammonia - r(mean)
sts graph , by(treat) failure adjustfor(sgotcen lnammcen ftliver)

[Graph: Failure functions, by treat, adjusted for sgotcen lnammcen ftliver; analysis time 0 to 80, curves for treat = 0 and treat = 1]
This is the Kaplan-Meier graph we would report in our article. In the figure legend, we could
state, “This graph displays the Kaplan-Meier failure curves, after adjusting for the same
covariates used in the final multivariable model, with ammonia and SGOT held constant at their
mean values, and fatty liver disease held constant at absence of disease.”
Although we stratified on ftliver in our Cox model, we can still include it in the “adjustfor()”
option, since only one stratum is used in the adjustment anyway (which is the same thing as
stratifying on it).
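As a quick check of the bracketed claim above that mean centering leaves the Cox estimates
unchanged, we could refit the final model with the centered covariates (a sketch, assuming the
sgotcen and lnammcen variables created above); the hazard ratios should match the uncentered
model, because centering only shifts the baseline hazard:

* refit the final model with mean-centered covariates; the hazard
* ratios should match those from the uncentered model
stcox treat lnammcen sgotcen, strata(ftliver)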
References

Blossfeld H-P, Hamerle A, Mayer KU. (1989). Event History Analysis: Statistical Theory and
Application in the Social Sciences. Hillsdale NJ, Lawrence Erlbaum Associates.

Choudhury JB. (2002). Non-parametric confidence interval estimation for competing risks
analysis: application to contraceptive data. Statist Med 21:1129-1144.

Cleves MA, Gould WW, Gutierrez RG. (2004). An Introduction to Survival Analysis Using
Stata. Revised edition. College Station TX, Stata Press.

Coviello V, Boggess M. (2004). Cumulative incidence estimation in the presence of competing
risks. The Stata Journal 4(2):103-112.

Devereux RB, Wachtell K, Gerdts E, et al. (2004). Prognostic significance of left ventricular
mass change during treatment of hypertension. JAMA 292(19):2350-2356.

Freireich EJ, et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced
remission in acute leukemia. Blood 21:699-716.

Grambsch PM, Therneau TM. (1994). Proportional hazards tests and diagnostics based on
weighted residuals. Biometrika 81:515-526.

Greenland S (editor). (1987). Evolution of Epidemiologic Ideas: Annotated Readings on
Concepts and Methods. Chestnut Hill, Massachusetts.

Harrell FE, Califf RM, Pryor DB, et al. (1982). Evaluating the yield of medical tests. JAMA
247(18):2543-2546.

Harrell FE Jr. (2001). Regression Modeling Strategies: With Applications to Linear Models,
Logistic Regression, and Survival Analysis. New York, Springer-Verlag.

Hosmer DW, Lemeshow S. (2000). Applied Logistic Regression. 2nd ed. New York, John Wiley
& Sons.

Kalbfleisch JD, Prentice RL. (1980). The Statistical Analysis of Failure Time Data. New York,
John Wiley & Sons.

Kleinbaum DG. (1996). Survival Analysis: A Self-Learning Text. New York, Springer-Verlag.

Lee ET. (1980). Statistical Methods for Survival Data Analysis. Belmont CA, Lifetime Learning
Publications.

Maldonado G, Greenland S. (1993). Simulation study of confounder-selection strategies.
Am J Epidemiol 138:923-936.
Mantel N. (1966). Evaluation of survival data and two new rank order statistics arising in its
consideration. Cancer Chemother Rep 50:163-170.

Myers MH. (1969). A Computing Procedure for a Significance Test of the Difference Between
Two Survival Curves. Methodological Note No. 18 in Methodological Notes compiled by
the End Results Sections, National Cancer Institute, National Institutes of Health,
Bethesda, Maryland.

Sun GW, Shook TL, Kay GL. (1996). Inappropriate use of bivariable analysis to screen risk
factors for use in multivariable analysis. Journal of Clinical Epidemiology 49:907-916.

Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE. (2005). Regression Methods in
Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. New York,
Springer.