Lab Session 2 - Mathematics and Statistics

Survival and Event History Analysis (MATH463)
Lab Session 2: Proportional hazards regression analysis
The Multicenter Diltiazem Postinfarction Trial (MDPIT) was a clinical evaluation of
long-term administration of the drug diltiazem in patients who had already suffered
from a myocardial infarction (heart attack). A total of 2466 patients from 38 hospitals
in the United States and Canada were randomised between diltiazem and placebo and
followed up for between 12 and 52 months. Both mortality and reinfarction (having
another heart attack) were of interest, but here we will focus on mortality. Full details
of the trial were published by the MDPIT Research Group (1988). This Lab Session
involves various analyses of data from this trial.
1. Reading in the data
mdpitsas.dat is a data file containing an extract of the data from this study. The 12
data columns are as follows.
Column 1: patno    unique patient number
Column 2: treat    treatment group, coded 0 = placebo, 1 = diltiazem
Column 3: hosp     a code number identifying the treating hospital
Column 4: region   a character variable identifying the region of North America where the hospital is located
Column 5: survt    time from randomisation to death or until last time seen alive (days)
Column 6: cens     censoring code, 0 = alive, 1 = dead
Column 7: pc       pulmonary congestion, coded 0 = none, 1 = mild, 2 = moderate, 3 = severe, 9 = missing
Column 8: ef       ejection fraction at baseline, 999 = missing
Create formats for your dataset by running the program format_mdpit.sas:
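libname library 'C:\Home\MSc Lancaster\Survival and event history analysis\Datasets';

/* Permanent formats, stored in the catalog LIBRARY.FORMATS */
proc format library = library;
  value trtfmt
    0 = 'placebo'
    1 = 'diltiazem';
  value pcfmt
    0 = 'none'
    1 = 'mild'
    2 = 'moderate'
    3 = 'severe';
run;

/* Attach the formats to the dataset. This step requires the libref mdpit
   and the dataset mdpit.mdpitsas to exist already. */
data mdpit.mdpitsas;
  set mdpit.mdpitsas;
  format treat trtfmt.
         pc    pcfmt.;
run;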
MPS/MSc in Statistics
Survival Analysis - Lab Session 2
1

Convert the data into a SAS dataset, using the program read_mdpit.sas:
options ps=54 ls=72 nodate nonumber;
libname mdpit 'C:\Home\MSc Lancaster\Survival and event history analysis\Datasets';

/* Read the raw file into mdpit.mdpitsas, the dataset used by all later programs */
data mdpit.mdpitsas;
  infile 'C:\Home\MSc Lancaster\Survival and event history analysis\Datasets\mdpitsas.dat';
  label patno  = 'Patient Study Number'
        treat  = 'Treatment Group'
        hosp   = 'Hospital'
        region = 'Region'
        survt  = 'Survival Time'
        cens   = 'Censoring Indicator'
        pc     = 'Pulmonary Congestion'
        ef     = 'Ejection Fraction';
  input @8 patno f5.0 @17 treat @20 hosp @25 region $
        @40 survt f4.0 @50 cens @55 pc @60 ef;
  /* recode the missing-value codes to SAS numeric missing (.) */
  if pc = 9 then pc = .;
  if ef = 999 then ef = .;
run;
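Before going further it may be worth checking that the file was read correctly and that the missing-value codes were recoded. This is a minimal sketch, assuming the dataset is called mdpit.mdpitsas (the name used by all later programs); it uses only standard PROC FREQ and PROC MEANS options.

```sas
/* Frequency tables; the MISSING option lists the recoded . as its own level */
proc freq data = mdpit.mdpitsas;
  tables treat pc cens / missing;
run;

/* NMISS counts the ejection fractions recoded from 999 to . */
proc means data = mdpit.mdpitsas n nmiss min max;
  var survt ef;
run;
```

No pc value of 9 and no ef value of 999 should appear in this output.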
2. Plotting the data
The program below will plot the Kaplan-Meier estimate of the survival function for
the MDPIT data. Other output will be given, but for the moment focus on the graphs
produced.
proc lifetest data = mdpit.mdpitsas notable plot = (s);
time survt*cens(0);
run;

Run the program and look at the resulting Kaplan-Meier curve. It is obscured by the
circle plotting symbols, each of which marks a censored observation. This dataset
contains so many censored observations that the default curve is difficult to
see.

To see what the curve looks like with a smaller dataset, obtain the curve for
patients with pulmonary congestion in hospital 1. In the SAS code below, a data step
is first used to extract the subset of the data required.
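data hosp1_pc;
  set mdpit.mdpitsas;
  /* keep hospital 1 patients with some pulmonary congestion (grades 1 to 3);
     this also drops patients whose pc grade is missing */
  if hosp = 1 and pc in (1, 2, 3);
run;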
proc lifetest data = hosp1_pc notable plot = (s);
time survt*cens(0);
run;
Now it is clear at what time each of the censored patients was censored.

Returning to the complete dataset, it is of interest to compare the patients on
diltiazem with those on placebo. This is achieved using the following program. The
problem of having too many censoring symbols has been resolved by adding the
option “censoredsymbol = none” in the proc line, and separate plots for the two
treatments have been requested using the line “strata treat;”.
proc lifetest data = mdpit.mdpitsas notable plot = (s)
censoredsymbol = none;
time survt*cens(0);
strata treat;
run;
It can be seen that there is very little difference between the two curves.

Now plot separate curves on the same figure for patients in the four different
pulmonary congestion groups. Which group has the best survival?
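One way to obtain the four curves on a single figure is to stratify on pc itself. This is a minimal sketch, assuming the formats created earlier are attached to pc so that the legend shows the grade names; patients with a missing grade are excluded explicitly with a WHERE statement.

```sas
/* Kaplan-Meier curves by pulmonary congestion grade */
proc lifetest data = mdpit.mdpitsas notable plot = (s)
              censoredsymbol = none;
  where pc ne .;          /* exclude patients whose pc grade is missing */
  time survt*cens(0);
  strata pc;              /* one curve per grade: none, mild, moderate, severe */
run;
```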

Next prepare a dataset containing only patients who do not have pulmonary
congestion (for all hospitals), and on a single figure plot the survival curves for the
two treatment groups. Take care to exclude patients for whom the pc grade is
missing. Repeat for a dataset containing only patients who do have pulmonary
congestion, at any grade of severity. Comment on the pattern observed.
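A compact alternative to building two separate datasets is to create a 0/1 indicator of pulmonary congestion and use a BY statement, which produces one figure per group. In this sketch the dataset name pcsplit is hypothetical; the indicator pcg is defined in the same way as in Section 4.

```sas
/* pcsplit is a hypothetical dataset name */
data pcsplit;
  set mdpit.mdpitsas;
  if pc = . then delete;    /* exclude patients with a missing pc grade */
  pcg = (pc > 0);           /* 1 = mild, moderate or severe; 0 = none */
run;

proc sort data = pcsplit;   /* BY processing needs the data sorted by pcg */
  by pcg;
run;

/* one figure for pcg = 0 and one for pcg = 1, each with a curve per treatment */
proc lifetest data = pcsplit notable plot = (s) censoredsymbol = none;
  by pcg;
  time survt*cens(0);
  strata treat;
run;
```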
3. Simple comparisons of groups
PROC LIFETEST can also be used to conduct a logrank test comparing two groups of
patients.

Run the code below, and look at the output window (no plots have been
requested, and so there will be no graph window this time).
proc lifetest data = mdpit.mdpitsas notable;
time survt*cens(0);
strata treat;
run;
The output is given below:
                          The SAS System
                      The LIFETEST Procedure

       Summary of the Number of Censored and Uncensored Values

                                                        Percent
   Stratum   treat         Total    Failed   Censored   Censored
         1   diltiazem      1232       165       1067      86.61
         2   placebo        1234       166       1068      86.55
   -------------------------------------------------------------
     Total                  2466       331       2135      86.58

     Testing Homogeneity of Survival Curves for survt over Strata

                          Rank Statistics
              treat          Log-Rank    Wilcoxon
              diltiazem      -0.47441      2038.0
              placebo         0.47441     -2038.0

           Covariance Matrix for the Log-Rank Statistics
              treat          diltiazem     placebo
              diltiazem        82.7270    -82.7270
              placebo         -82.7270     82.7270

           Covariance Matrix for the Wilcoxon Statistics
              treat          diltiazem     placebo
              diltiazem       3.5384E8    -3.538E8
              placebo         -3.538E8    3.5384E8

                   Test of Equality over Strata
                                                Pr >
              Test         Chi-Square   DF   Chi-Square
              Log-Rank         0.0027    1       0.9584
              Wilcoxon         0.0117    1       0.9137
              -2Log(LR)        0.0020    1       0.9640
Some of the output is self-explanatory. Of the rest, we are interested here only in the
logrank test.
From the row that reads

              placebo         0.47441     -2038.0

we extract Z = 0.47441. This is the logrank statistic defined in Lecture 3.3, and its
positive sign indicates a slight advantage of diltiazem. The “covariance matrix for the
log-rank statistics” repeats the value 82.7270 four times, sometimes with a negative
sign. From this, we deduce that V = 82.7270. To test whether there is a treatment
effect, we calculate
$Z^2/V = 0.47441^2/82.7270 = 0.00272$
and this is given in the output under “Test of Equality over Strata”, with an
accompanying p-value (against the two-sided alternative that the treatments are not
the same).
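The arithmetic can be checked with a short data step. This sketch simply copies Z and V from the output above (the dataset name logrank is hypothetical) and uses the PROBCHI function for the chi-square distribution on 1 degree of freedom.

```sas
/* logrank is a hypothetical dataset name */
data logrank;
  z = 0.47441;                /* logrank statistic Z for placebo (strata run) */
  v = 82.7270;                /* its variance V */
  chisq = z**2 / v;           /* Z squared over V */
  p = 1 - probchi(chisq, 1);  /* upper tail of chi-square on 1 df */
  put chisq= 8.5 p= 6.4;      /* write the results to the log */
run;
```

The values written to the log should agree with the Log-Rank row of "Test of Equality over Strata".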

In the third row of the code just run, change “strata” to “test” and rerun.
The output looks quite different, but it includes the following lines:
            Univariate Chi-Squares for the Log-Rank Test

                 Test    Standard                    Pr >
   Variable  Statistic   Deviation   Chi-Square   Chi-Square   Label
   treat        0.4744      9.0965      0.00272       0.9584   Treatment Group

          Covariance Matrix for the Log-Rank Statistics
   Variable        treat
   treat         82.7466
The test statistic Z = 0.47441 can be found here, although only to four decimal places.
The value of V is reported under the heading “Covariance Matrix” as V = 82.7466.
This is slightly different from the value obtained with the strata statement. The reason is that:
strata uses Cox’s treatment of ties
test uses Breslow’s treatment of ties.
Where there are many tied death times, the difference can noticeably affect V and hence the test statistic.
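The difference between the two values of V can be seen from the usual textbook forms of the logrank variance. The notation below is assumed for this sketch and may differ from Lecture 3.3: at the j-th distinct death time, $n_j$ patients are at risk ($n_{1j}$ on diltiazem and $n_{0j}$ on placebo) and $d_j$ deaths occur.

```latex
$$
V_{\text{Cox}} = \sum_j \frac{d_j\,(n_j - d_j)\,n_{1j}\,n_{0j}}{n_j^2\,(n_j - 1)},
\qquad
V_{\text{Breslow}} = \sum_j \frac{d_j\,n_{1j}\,n_{0j}}{n_j^2}.
$$
```

The factor $(n_j - d_j)/(n_j - 1)$ equals 1 when a single death occurs at that time, so the two versions agree when there are no tied death times, and the first is smaller otherwise. This is consistent with 82.7270 (strata) being slightly smaller than 82.7466 (test).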

Use the logrank test to compare patients with pulmonary congestion to those
without. Be sure to exclude patients with missing grades for pulmonary congestion
first.

Prepare a dataset comprising only patients with pulmonary congestion, and
within that dataset use the logrank test to compare the two treatment groups. Repeat
for patients without pulmonary congestion.
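Both logrank comparisons can be sketched with the same 0/1 indicator of pulmonary congestion. The dataset name pcsplit is hypothetical, and a BY statement is used here in place of building two separate datasets; either approach should give the same logrank tests.

```sas
/* pcsplit is a hypothetical dataset name */
data pcsplit;
  set mdpit.mdpitsas;
  if pc = . then delete;    /* exclude patients with a missing pc grade */
  pcg = (pc > 0);           /* 1 = mild, moderate or severe; 0 = none */
run;

/* (i) logrank test: pulmonary congestion present versus absent */
proc lifetest data = pcsplit notable;
  time survt*cens(0);
  strata pcg;
run;

/* (ii) logrank test of treatment within each pulmonary congestion group */
proc sort data = pcsplit;
  by pcg;
run;

proc lifetest data = pcsplit notable;
  by pcg;
  time survt*cens(0);
  strata treat;
run;
```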
4. Analysis using Cox’s proportional hazards regression model
In SAS, PROC PHREG can be used to fit Cox’s proportional hazards regression
model.

Run the following SAS program which fits treatment to the data:
proc phreg data = mdpit.mdpitsas;
model survt*cens(0) = treat/ ties = discrete;
run;
The resulting output is:
                          The SAS System
                       The PHREG Procedure

                        Model Information
   Data Set                 MDPIT.MDPITSAS
   Dependent Variable       survt            Survival Time
   Censoring Variable       cens             Censoring Indicator
   Censoring Value(s)       0
   Ties Handling            DISCRETE

   Number of Observations Read    2466
   Number of Observations Used    2466

       Summary of the Number of Event and Censored Values
                                               Percent
        Total       Event    Censored         Censored
         2466         331        2135            86.58

                       Convergence Status
         Convergence criterion (GCONV=1E-8) satisfied.

                      Model Fit Statistics
                         Without           With
   Criterion          Covariates     Covariates
   -2 LOG L             4890.264       4890.261
   AIC                  4890.264       4892.261
   SBC                  4890.264       4896.063

             Testing Global Null Hypothesis: BETA=0
   Test                 Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio         0.0027     1        0.9584
   Score                    0.0027     1        0.9584
   Wald                     0.0027     1        0.9584

               Analysis of Maximum Likelihood Estimates
                 Parameter   Standard
   Variable  DF   Estimate      Error  Chi-Square  Pr > ChiSq
   treat      1   -0.00573    0.10995      0.0027      0.9584

               Analysis of Maximum Likelihood Estimates
                Hazard
   Variable      Ratio   Variable Label
   treat         0.994   Treatment Group
Results relating to testing the null hypothesis are:
                      Model Fit Statistics
                         Without           With
   Criterion          Covariates     Covariates
   -2 LOG L             4890.264       4890.261

             Testing Global Null Hypothesis: BETA=0
   Test                 Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio         0.0027     1        0.9584
   Score                    0.0027     1        0.9584
   Wald                     0.0027     1        0.9584
“-2 LOG L” is minus twice the log-likelihood, also known as the (raw) deviance. The
value without covariates is the deviance when no factors are fitted in the model, while
the value with covariates is the deviance when treatment is fitted. The difference
between the two values, 4890.264 – 4890.261 = 0.003 is the change in deviance due
to fitting treatment. This is reported below, to an additional decimal place, as
Likelihood Ratio = 0.0027, and the resulting p-value of 0.9584 is given.
“Score” is the logrank test statistic $Z^2/V$, as given in Lecture 3.3. The SAS option
“ties = discrete” in the model statement has ensured that Cox’s treatment of ties
has been used. This test should always be consistent with the logrank test drawn from
PROC LIFETEST, provided that ties are treated in the same way. The SAS option
“ties = Breslow” can also be used (but is not recommended!).
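For comparison only, the same model can be refitted with Breslow's treatment of ties. This is a sketch, not a recommended analysis: if the account of tie handling given in Section 3 is right, the Score chi-square from this run should correspond to the PROC LIFETEST run using the test statement (based on V = 82.7466) rather than the one using strata.

```sas
/* Comparison run only; ties = discrete remains the recommended choice */
proc phreg data = mdpit.mdpitsas;
  model survt*cens(0) = treat / ties = breslow;
run;
```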
“Wald” is the Wald test statistic $\{\hat\theta/\mathrm{se}(\hat\theta)\}^2$, where $\hat\theta$ is the maximum
likelihood estimate of the treatment effect on the log-hazard ratio scale. In this
example, all three tests give the same answer to all decimal places reported.
Asymptotically, for large sample sizes and small values of $\theta$, the three statistics
should be approximately equal. In smaller samples, or when $\theta$ is far from zero, the
statistics will differ. Experience suggests that the likelihood ratio test is to be
preferred, with the score test second best and the Wald test the least accurate.
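For reference, the three statistics for a single parameter can be written side by side. The notation is assumed for this sketch: $L(\theta)$ is the partial likelihood, $\hat\theta$ its maximiser, and $Z$ and $V$ are the logrank statistic and its variance from the strata run in Section 3. Each statistic is referred to a chi-square distribution on 1 degree of freedom.

```latex
$$
\begin{aligned}
\text{LR}    &= \{-2\log L(0)\} - \{-2\log L(\hat\theta)\} = 4890.264 - 4890.261 \approx 0.003,\\
\text{Score} &= Z^2/V = 0.47441^2/82.7270 \approx 0.00272,\\
\text{Wald}  &= \{\hat\theta/\mathrm{se}(\hat\theta)\}^2 = (0.00573/0.10995)^2 \approx 0.00272.
\end{aligned}
$$
```

The likelihood ratio value of 0.003 is a difference of rounded figures; SAS computes 0.0027 from the unrounded log-likelihoods.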
Estimation results are given in the last part of the output.
               Analysis of Maximum Likelihood Estimates
                 Parameter   Standard
   Variable  DF   Estimate      Error  Chi-Square  Pr > ChiSq
   treat      1   -0.00573    0.10995      0.0027      0.9584

               Analysis of Maximum Likelihood Estimates
                Hazard
   Variable      Ratio   Variable Label
   treat         0.994   Treatment Group
The maximum likelihood estimate of the log-hazard ratio (which is $-\theta$ in our
notation) is given for treat as -0.00573. Hence $\hat\theta = 0.00573$. The positive sign
indicates a (very small) advantage of diltiazem. Also given is the standard error of
$\hat\theta$: $\mathrm{se}(\hat\theta) = 0.10995$. The Wald test is then repeated here:

$\{\hat\theta/\mathrm{se}(\hat\theta)\}^2 = (0.00573/0.10995)^2 = 0.05211^2 = 0.002716$,

together with its related p-value. In view of the relative accuracy of the three tests
available, do not rely on this test; judge significance by the likelihood ratio test. The
hazard ratio of diltiazem relative to placebo is estimated by $\exp(-\hat\theta) = 0.9943$,
reported as 0.994 in the output. Note that the approximate maximum likelihood estimate
of $\theta$ is $Z/V$, which from the PROC LIFETEST output is
0.47441/82.7466 = 0.005733. In this large sample situation the approximation is very
good. Note that the true maximum likelihood estimate from PROC PHREG should
always be quoted. The approximation Z/V is useful for understanding and for
developing theory, and it can be helpful for “back-of-envelope calculations”, but it
should not be used in serious data analysis reports.
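These back-of-envelope figures can be reproduced in a data step. The sketch below copies the numbers from the PROC LIFETEST and PROC PHREG output above; the dataset name approx is hypothetical.

```sas
/* approx is a hypothetical dataset name; all inputs are copied from the output */
data approx;
  z    = 0.47441;    /* logrank statistic Z (PROC LIFETEST) */
  v    = 82.7466;    /* its variance V from the test-statement run */
  beta = -0.00573;   /* PROC PHREG estimate for treat (diltiazem vs placebo) */
  se   = 0.10995;    /* its standard error */

  theta_approx = z / v;            /* approximate MLE of theta */
  wald         = (beta / se)**2;   /* Wald chi-square */
  hr           = exp(beta);        /* hazard ratio, diltiazem vs placebo */
  put theta_approx= 9.6 wald= 9.6 hr= 6.4;
run;
```

The log should show values agreeing with 0.005733, 0.002716 and 0.9943 quoted in the text.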

Next, we wish to explore the joint effects of pulmonary congestion and
treatment. The following code accomplishes this.
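data pcg;
  set mdpit.mdpitsas;
  if pc = . then delete;   /* drop patients with a missing pc grade */
  pcg = (pc > 0);          /* 1 = any pulmonary congestion, 0 = none */
  int = treat*pcg;         /* treatment by pulmonary congestion interaction */
run;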
proc phreg data = pcg;
model survt*cens(0) = pcg/ ties = discrete;
run;
proc phreg data = pcg;
model survt*cens(0) = pcg treat/ ties = discrete;
run;
proc phreg data = pcg;
model survt*cens(0) = pcg treat int/ ties = discrete;
run;
First, patients with unknown grades of pulmonary congestion are deleted, and an
indicator, pcg, of pulmonary congestion YES (1) or NO (0) is created. Then a series
of nested models is fitted, first entering pcg, then adding treat and finally adding int,
the treatment by pulmonary congestion interaction. The models are fitted in sequence
so that the additional effect of each new term can be assessed by a likelihood ratio
test rather than a Wald test. Separate models for treatment within each pulmonary
congestion group are considered at the end of this section.
Extracts from the output are given below.
From proc phreg data = pcg;
model survt*cens(0) = pcg/ ties = discrete;
run;
                          The SAS System
                       The PHREG Procedure

   Number of Observations Used    2399

                      Model Fit Statistics
                         Without           With
   Criterion          Covariates     Covariates
   -2 LOG L             4768.105       4672.136

             Testing Global Null Hypothesis: BETA=0
   Test                 Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio        95.9691     1        <.0001
   Score                  119.3062     1        <.0001
   Wald                   106.7609     1        <.0001

               Analysis of Maximum Likelihood Estimates
                 Parameter   Standard                           Hazard
   Variable  DF   Estimate      Error  Chi-Square  Pr > ChiSq    Ratio
   pcg        1    1.16280    0.11254    106.7609      <.0001    3.199
All three tests show a highly significant effect of pulmonary congestion, with the chi-square values now quite distinct. Notice that the score chi-square is precisely
the same as that obtained from the logrank test in Section 3: the two are
mathematically equivalent. The positive parameter estimate indicates that having
pulmonary congestion (at any grade) increases mortality. The hazard ratio is 3.199:
pulmonary congestion roughly triples the hazard of death.
From proc phreg data = pcg;
model survt*cens(0) = pcg treat/ ties = discrete;
run;
                      Model Fit Statistics
                         Without           With
   Criterion          Covariates     Covariates
   -2 LOG L             4768.105       4671.996

             Testing Global Null Hypothesis: BETA=0
   Test                 Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio        96.1089     2        <.0001
   Score                  119.4447     2        <.0001
   Wald                   106.8988     2        <.0001

               Analysis of Maximum Likelihood Estimates
                 Parameter   Standard
   Variable  DF   Estimate      Error  Chi-Square  Pr > ChiSq
   pcg        1    1.16402    0.11259    106.8932      <.0001
   treat      1    0.04158    0.11118      0.1398      0.7084
The tests of the global null hypothesis investigate whether the model with pcg and
treat is better than the model with no factors, so the tests are on 2 degrees of
freedom. This test is of little interest and can usually be ignored. Of greater interest is
whether treatment reduces the deviance further once pulmonary congestion has been
fitted. The reduction in $-2\log L$ is 4672.136 - 4671.996 = 0.140, taking the first
value from the output for the model with pcg only and the second from the output
with both pcg and treat. After adjusting for pulmonary congestion, the treatment
estimate becomes positive, indicating a disadvantage of diltiazem, but remember that
this treatment effect is so small in magnitude as to be essentially zero. The Wald test
statistic is 0.1398, which agrees with the likelihood ratio statistic to three decimal
places in this case. Both of these tests consider the effect of treatment having taken
into account the effect of pulmonary congestion, which is what we are interested in.
The Wald test reported for pcg, giving chi-square = 106.8932, tests whether pulmonary
congestion is significant, having allowed for treatment effect. Such an analysis is less
appropriate.
From proc phreg data = pcg;
model survt*cens(0) = pcg treat int/ ties = discrete;
run;
                      Model Fit Statistics
                         Without           With
   Criterion          Covariates     Covariates
   -2 LOG L             4768.105       4666.082

               Analysis of Maximum Likelihood Estimates
                 Parameter   Standard
   Variable  DF   Estimate      Error  Chi-Square  Pr > ChiSq
   pcg        1    0.88647    0.16275     29.6678      <.0001
   treat      1   -0.19005    0.14688      1.6743      0.1957
   int        1    0.54866    0.22641      5.8723      0.0154
The further reduction in $-2\log L$ due to fitting the treatment by pulmonary congestion
interaction is 4671.996 - 4666.082 = 5.914, and the corresponding p-value is
0.015021. The Wald test in this case is not quite the same, with a statistic of 5.8723
and a p-value of 0.0154. The other Wald tests reported are quite inappropriate. The
first tests for the effect of pulmonary congestion, allowing for treatment and for
treatment by pulmonary congestion interaction. The second tests for the effect of
treatment, allowing for pulmonary congestion and for treatment by pulmonary
congestion interaction. These are what SAS calls “Type III analyses”, but they have
little meaning when interaction terms are included.
We conclude that there is a significant treatment by pulmonary congestion interaction.
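The likelihood ratio tests for the whole nested sequence can be collected in one data step. This sketch copies the -2 LOG L values from the three PROC PHREG runs above; the dataset name nested and the step labels are hypothetical.

```sas
/* nested is a hypothetical dataset name; -2 LOG L values copied from the output */
data nested;
  length step $ 20;
  input step $ m2ll_smaller m2ll_larger df;
  lr = m2ll_smaller - m2ll_larger;   /* reduction in -2 log L */
  p  = 1 - probchi(lr, df);          /* upper-tail chi-square p-value */
  datalines;
pcg_vs_null     4768.105 4672.136 1
treat_after_pcg 4672.136 4671.996 1
int_after_both  4671.996 4666.082 1
;
run;

proc print data = nested noobs;
  var step lr df p;
  format lr 8.3 p 8.6;
run;
```

The three rows correspond to adding pcg, then treat, then int, and the last row should reproduce the statistic 5.914 and p-value 0.015021 quoted above.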

Because of the interaction just found, it is not possible to reach a single
conclusion about the effect of treatment for all patients. Instead, different analyses
are required for each pulmonary congestion group. Run PROC PHREG programs to
investigate (a) the effect of treatment for patients without pulmonary congestion and
(b) the effect of treatment for patients with pulmonary congestion.
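A minimal sketch of these two analyses uses a BY statement to fit the treatment-only model separately within each pulmonary congestion group. The dataset name pcsplit is hypothetical; the indicator pcg is defined as in the data step above.

```sas
/* pcsplit is a hypothetical dataset name */
data pcsplit;
  set mdpit.mdpitsas;
  if pc = . then delete;    /* exclude patients with a missing pc grade */
  pcg = (pc > 0);           /* 1 = any pulmonary congestion, 0 = none */
run;

proc sort data = pcsplit;
  by pcg;
run;

/* (a) pcg = 0 and (b) pcg = 1 appear as separate BY groups in the output */
proc phreg data = pcsplit;
  by pcg;
  model survt*cens(0) = treat / ties = discrete;
run;
```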