Chapter 4 Supplement

Statistical Methods in Epi II (171:242)
Chapter 4 Supplement – Comments on the
Log-Rank Test and Alternative Methods
Brian J. Smith, Ph.D.
February 12, 2003
4S Comments on the Log-Rank Test and
Alternative Methods
4S.1 Proportional hazards assumption
The log-rank (Mantel-Haenszel) test is most powerful in the presence
of proportional hazards. One method for determining if the hazards
are proportional is to plot smooth estimates of the hazard functions.


Another graphical check is to plot $\log(-\log \hat{S}(t))$, the log-log
transformed Kaplan-Meier estimates (see Figure 1).
[Figure 1: plot of log(-log S(t)) against time in weeks.]
Figure 1. Log-log transformed Kaplan-Meier curves for the
leukemia trial.
Notes:
1. Parallel log-log survival curves are indicative of proportional
hazards.
2. The advantage of this method over the use of smoothed
hazard plots is that the Kaplan-Meier estimates are
deterministic and do not depend on subjective choices of
smoothing functions.
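The log-log check can also be carried out numerically. The following is a minimal Python sketch, not part of the original notes: it computes Kaplan-Meier estimates for the leukemia trial data listed in 4S.2 and prints the vertical gap between the two log-log curves at a few weeks. Under proportional hazards that gap should be roughly constant. The function names, the choice of weeks, and the reading of an asterisk as a censored time are assumptions made for illustration.

```python
import math

# Leukemia trial remission times in weeks (from the data listing in 4S.2).
# Assumption: an asterisk in the listing marks a censored time (event flag 0).
times_6mp = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23,
             25, 32, 32, 34, 35]
events_6mp = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
              0, 0, 0, 0, 0]
times_placebo = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12,
                 12, 15, 17, 22, 23]
events_placebo = [1] * 21


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def step_value(curve, t):
    """Value of the right-continuous step function at t (1.0 before the first failure)."""
    value = 1.0
    for u, s in curve:
        if u <= t:
            value = s
    return value


def log_log(s):
    """log(-log s); defined only for 0 < s < 1."""
    return math.log(-math.log(s))


km_6mp = kaplan_meier(times_6mp, events_6mp)
km_placebo = kaplan_meier(times_placebo, events_placebo)

# Gap between the log-log curves at a few arbitrarily chosen weeks
# where both estimates lie strictly between 0 and 1.
gaps = {}
for week in (6, 10, 16, 22):
    gaps[week] = (log_log(step_value(km_placebo, week))
                  - log_log(step_value(km_6mp, week)))
    print(week, round(gaps[week], 2))
```

Gaps of similar size across the weeks correspond to the roughly parallel curves one looks for in a plot such as Figure 1.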
4S.2 The effect of outliers
In forming the weighted log-rank statistic, a 2x2 table is constructed at
each failure time. Results of those tables are summed and used to
construct the test statistic. The actual failure times are not used in
the calculation of the statistic; only the ordering of the observed
times, which determines the risk sets, matters. This has both
advantages and disadvantages. The advantage is that the log-rank
statistics are robust to outliers among the failure times.
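As a reminder of how the 2x2 tables enter the statistic, the unweighted two-sample case can be written as follows. The notation is an assumption introduced here and does not appear in these notes: at the j-th distinct failure time, $n_{1j}$ and $n_{2j}$ are the numbers at risk in the two groups, $d_{1j}$ and $d_{2j}$ the numbers of failures, $n_j = n_{1j} + n_{2j}$, and $d_j = d_{1j} + d_{2j}$.

```latex
E_{1j} = \frac{n_{1j}\, d_j}{n_j}, \qquad
V_j = \frac{n_{1j}\, n_{2j}\, d_j\, (n_j - d_j)}{n_j^2\, (n_j - 1)}, \qquad
\chi^2 = \frac{\left[ \sum_j \left( d_{1j} - E_{1j} \right) \right]^2}{\sum_j V_j}
```

The failure times themselves appear nowhere in these expressions. They only decide which subjects are counted in each table.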
Leukemia Example
Recall the data from the leukemia clinical trial of children treated
with 6-mercaptopurine versus placebo:
6-MP (21 patients): 6, 6, 6, 6*, 7, 9*, 10, 10*, 11*, 13, 16, 17*, 19*,
20*, 22, 23, 25*, 32*, 32*, 34*, 35*
Placebo (21 patients): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12,
12, 15, 17, 22, 23
Suppose that the time of the last failure in the Placebo group was
changed from 23 to 50. A before and after comparison of the
survival curves is given in Figure 2. Although the curves decrease
at different rates at the beginning of the study, the outlier leads to
curves that seem to behave more similarly at the end of the study.
What impact does this have on the results from the log-rank test?
[Figure 2: two panels of Kaplan-Meier curves for the 6-MP and Placebo groups, one spanning 0 to 30 weeks and one spanning 0 to 50 weeks; vertical axis labels: Survival, Cumulative Probability of Remission.]
Figure 2. Effects of an outlier on Kaplan-Meier curves for the
leukemia trial data.
The test statistic for the original data was 16.8 (p = 0.0000417).
With the outlier, the resulting statistic is 14.3 (p = 0.000157). The
outlier has very little effect on the test results.
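The before and after comparison can be checked with a short calculation. The sketch below is an illustration in Python rather than the software used for these notes. It computes the unweighted log-rank chi-square from the 2x2 tables for the original data and for the data with the last Placebo failure moved from 23 to 50. The function name and the reading of an asterisk as a censored time are assumptions.

```python
import math

# Assumption: an asterisk in the data listing marks a censored time (flag 0).
times_6mp = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23,
             25, 32, 32, 34, 35]
events_6mp = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
              0, 0, 0, 0, 0]
times_placebo = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12,
                 12, 15, 17, 22, 23]
events_placebo = [1] * 21


def logrank_chisq(times1, events1, times2, events2):
    """Unweighted two-sample log-rank chi-square from a 2x2 table per failure time."""
    pooled = list(zip(times1, events1)) + list(zip(times2, events2))
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in pooled if e == 1}):
        n1 = sum(1 for u in times1 if u >= t)   # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)   # at risk in group 2
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e == 1)
        d2 = sum(1 for u, e in zip(times2, events2) if u == t and e == 1)
        n, d = n1 + n2, d1 + d2
        obs_minus_exp += d1 - n1 * d / n
        if n > 1:  # a table with a single subject at risk has zero variance
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    return obs_minus_exp ** 2 / variance


def p_value(chisq):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chisq / 2.0))


# Replace the last Placebo failure time (23) with the outlier (50).
times_placebo_outlier = times_placebo[:-1] + [50]

chisq_original = logrank_chisq(times_6mp, events_6mp,
                               times_placebo, events_placebo)
chisq_outlier = logrank_chisq(times_6mp, events_6mp,
                              times_placebo_outlier, events_placebo)
print(round(chisq_original, 1), p_value(chisq_original))
print(round(chisq_outlier, 1), p_value(chisq_outlier))
```

Moving the failure from 23 to 50 changes only the table at week 23 and adds a table at week 50 in which a single subject is at risk, which is why the statistic moves so little.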
4S.3 Limitations of the log-rank tests
The actual failure times are not used in the calculation of the
statistics. The disadvantage is that these times may contain
valuable information about the survival experience. For example,
the following three configurations
[Three plots of Kaplan-Meier curves for the 6-MP and Placebo groups: Survival (cumulative probability of remission) against Weeks.]
all yield the same log-rank statistic, even though the survival
experience in the 6-MP group gets progressively worse in the plots.
The test statistics do not reflect the changing situations.
Furthermore, the log-rank tests do not distinguish between the next
two configurations because the actual failure times are not used.
[Two plots of survival curves for Group 1 and Group 2: Survival against Time.]
Notes:
1. At each failure time, the log-rank tests compare only those
groups which still contain subjects at risk.
2. Curves cannot be compared beyond the follow-up period for
the risk set.
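One way to see why different configurations can share a log-rank statistic is to list the 2x2 table counts directly. The Python sketch below uses hypothetical toy data (not from these notes) and a hypothetical helper, risk_tables. It stretches the time scale with a strictly increasing transformation and confirms that the sequence of tables is unchanged.

```python
def risk_tables(times1, events1, times2, events2):
    """List (n1, d1, n2, d2) at each distinct failure time, in time order."""
    pooled = list(zip(times1, events1)) + list(zip(times2, events2))
    tables = []
    for t in sorted({t for t, e in pooled if e == 1}):
        n1 = sum(1 for u in times1 if u >= t)
        n2 = sum(1 for u in times2 if u >= t)
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e == 1)
        d2 = sum(1 for u, e in zip(times2, events2) if u == t and e == 1)
        tables.append((n1, d1, n2, d2))
    return tables


# Hypothetical toy data; event flag 1 = failure, 0 = censored.
g1_times, g1_events = [2, 4, 5, 7], [1, 1, 0, 1]
g2_times, g2_events = [1, 3, 6, 8], [1, 1, 1, 0]


def stretch(times):
    """Strictly increasing change of time scale: order is kept, spacing is not."""
    return [t ** 3 for t in times]


original = risk_tables(g1_times, g1_events, g2_times, g2_events)
stretched = risk_tables(stretch(g1_times), g1_events,
                        stretch(g2_times), g2_events)
print(original == stretched)
```

Because the tables are identical, any statistic built only from them, including the log-rank statistics, is identical as well, however different the two sets of survival curves look.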
4S.4 An alternative two-sample testing procedure
One natural way to compare the survival curves would be to
accumulate the area between them over the length of the study
period, i.e.
$$\int_t \left[ \hat{S}_1(t) - \hat{S}_2(t) \right] dt ,$$
where $\hat{S}_i(t)$ is the Kaplan-Meier estimator for group i. Under H0: S1 = S2 the
area between the survival curves ought to be close to zero.
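Since Kaplan-Meier curves are step functions, the area between them can be accumulated exactly as a sum of rectangles. The Python sketch below does this for the leukemia data. It is an illustration only: the function names are hypothetical, an asterisk in the data listing is assumed to mark a censored time, and the upper limit of 23 weeks is an assumed choice (the largest time at which both groups still have a subject under observation).

```python
# Assumption: an asterisk in the data listing marks a censored time (flag 0).
times_6mp = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23,
             25, 32, 32, 34, 35]
events_6mp = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
              0, 0, 0, 0, 0]
times_placebo = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12,
                 12, 15, 17, 22, 23]
events_placebo = [1] * 21


def km_curve(times, events):
    """Kaplan-Meier steps [(t, S(t))] at each distinct failure time."""
    surv, steps = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    return steps


def km_value(steps, t):
    """Right-continuous step-function value at t."""
    value = 1.0
    for u, s in steps:
        if u <= t:
            value = s
    return value


def area_between(steps1, steps2, tau):
    """Exact integral of S1(t) - S2(t) over [0, tau] for two step functions."""
    knots = sorted({0, tau} | {u for u, _ in steps1 + steps2 if u < tau})
    area = 0.0
    for left, right in zip(knots, knots[1:]):
        height = km_value(steps1, left) - km_value(steps2, left)
        area += height * (right - left)
    return area


km_6mp = km_curve(times_6mp, events_6mp)
km_placebo = km_curve(times_placebo, events_placebo)
area = area_between(km_6mp, km_placebo, 23)  # tau = 23 weeks is an assumption
print(round(area, 2))
```

Unlike the log-rank statistic, this quantity changes whenever a failure time is moved, because the widths of the rectangles are differences between actual times.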
Pepe and Fleming proposed a weighted version of the above
statistic. Let $\hat{C}_i(t)$ be the estimated probability that censoring does
not occur before time t. $\hat{C}_i(t)$ is computed the same way that
$\hat{S}_i(t)$ is except that failure and censoring are interchanged. For
example, the following plot displays the $\hat{C}_i(t)$ in the leukemia trial.
[Figure: estimated censoring curves for the 6-MP and Placebo groups; vertical axis: Censoring, horizontal axis: Weeks.]
The Pepe-Fleming statistic, called the weighted Kaplan-Meier
statistic, is
$$\mathrm{WKM} = \sqrt{\frac{n_1 n_2}{n}} \sum_j \hat{w}(t_j) \left[ \hat{S}_1(t_j) - \hat{S}_2(t_j) \right] (t_{j+1} - t_j)$$
where
$$\hat{w}(t_j) = \frac{\hat{C}_1(t_j)\, \hat{C}_2(t_j)}{\frac{n_1}{n} \hat{C}_1(t_j) + \frac{n_2}{n} \hat{C}_2(t_j)} .$$
Notes:
1. WKM is a weighted sum of the area between the survival
functions, making use of the actual failure times.
2. The test statistic $\mathrm{WKM} / \sqrt{V_{WKM}}$, where $V_{WKM}$ is the variance
estimate, has an approximate N(0,1) distribution under the
null hypothesis.
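The weighted sum can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, n is taken to be n1 + n2, an asterisk in the data listing is assumed to mark a censored time, the t_j are taken to be the distinct pooled observation times, the censoring curve is evaluated just before each t_j (one reading of "censoring does not occur before time t"), and the sum stops at an assumed upper limit of 23 weeks. The variance estimate is not given in these notes, so the sketch stops at WKM itself.

```python
import math

# Assumption: an asterisk in the data listing marks a censored time (flag 0).
times_6mp = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23,
             25, 32, 32, 34, 35]
events_6mp = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
              0, 0, 0, 0, 0]
times_placebo = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12,
                 12, 15, 17, 22, 23]
events_placebo = [1] * 21


def km_steps(times, flags):
    """Product-limit steps [(t, estimate)], treating flags == 1 as the event."""
    est, steps = 1.0, []
    for t in sorted({t for t, f in zip(times, flags) if f == 1}):
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, f in zip(times, flags) if u == t and f == 1)
        est *= 1.0 - events / at_risk
        steps.append((t, est))
    return steps


def value_at(steps, t, left_limit=False):
    """Step-function value at t, or just before t when left_limit is True."""
    value = 1.0
    for u, s in steps:
        if u < t or (u == t and not left_limit):
            value = s
    return value


def weight(c1, c2, n1, n2):
    """Pepe-Fleming weight from the two censoring curve values."""
    n = n1 + n2
    denom = (n1 / n) * c1 + (n2 / n) * c2
    return c1 * c2 / denom if denom > 0 else 0.0


def wkm(times1, events1, times2, events2, tau):
    """Weighted sum of the area between two Kaplan-Meier curves up to tau."""
    n1, n2 = len(times1), len(times2)
    surv1, surv2 = km_steps(times1, events1), km_steps(times2, events2)
    # Censoring curves: same estimator with failure and censoring interchanged.
    cens1 = km_steps(times1, [1 - e for e in events1])
    cens2 = km_steps(times2, [1 - e for e in events2])
    knots = sorted({t for t in times1 + times2 if t < tau} | {tau})
    total = 0.0
    for t, t_next in zip(knots, knots[1:]):
        w = weight(value_at(cens1, t, left_limit=True),
                   value_at(cens2, t, left_limit=True), n1, n2)
        total += w * (value_at(surv1, t) - value_at(surv2, t)) * (t_next - t)
    return math.sqrt(n1 * n2 / (n1 + n2)) * total


wkm_leukemia = wkm(times_6mp, events_6mp, times_placebo, events_placebo, 23)
print(round(wkm_leukemia, 2))
```

When neither group has any censoring, both censoring curves equal 1 and every weight equals 1, so the sum reduces to the unweighted area between the curves scaled by the square-root factor.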