Statistical Methods in Epi II (171:242)
Chapter 4 Supplement – Comments on the Log-Rank Test and Alternative Methods
Brian J. Smith, Ph.D.
February 12, 2003

4S Comments on the Log-Rank Test and Alternative Methods

4S.1 Proportional hazards assumption

The log-rank (Mantel-Haenszel) test is most powerful in the presence of proportional hazards. One method for determining if the hazards are proportional is to plot smooth estimates of the hazard functions.

Another graphical check is to plot $\log(-\log \hat{S}(t))$, the log-log transformed Kaplan-Meier estimates (see Figure 1).

[Figure 1: log(-log S(t)) plotted against Weeks]
Figure 1. Log-log transformed Kaplan-Meier curves for the leukemia trial.

Notes:
1. Parallel log-log survival curves are indicative of proportional hazards.
2. The advantage of this method over the use of smoothed hazard plots is that the Kaplan-Meier estimates are deterministic and do not depend on subjective choices of smoothing functions.

4S.2 The effect of outliers

In forming the weighted log-rank statistic, a 2x2 table is formed at each failure time. Results of those tables are summed and used to construct the test statistic. The actual failure times are not used in the calculation of the statistic; only their ordering matters. This has both advantages and disadvantages. The advantage is that the log-rank statistics are robust to outliers among the failure times.

Leukemia Example

Recall the data from the leukemia clinical trial of children treated with 6-mercaptopurine versus placebo (* denotes a censored observation):

6-MP (21 patients): 6, 6, 6, 6*, 7, 9*, 10, 10*, 11*, 13, 16, 17*, 19*, 20*, 22, 23, 25*, 32*, 32*, 34*, 35*
Placebo (21 patients): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

Suppose that the time of the last failure in the Placebo group was changed from 23 to 50. A before and after comparison of the survival curves is given in Figure 2.
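The table-by-table construction described at the start of this section can be sketched in Python. This is a minimal illustration, not necessarily the exact computation used in these notes: it assumes the unweighted log-rank statistic (all weights equal to 1) with the hypergeometric variance at each 2x2 table, and it counts a subject censored at time t as still at risk at t. The function name `logrank_statistic` and the variable names are hypothetical.

```python
def logrank_statistic(times1, events1, times2, events2):
    """Unweighted log-rank chi-square statistic for two groups.

    events[i] is 1 for an observed failure and 0 for a censored time.
    """
    pooled = list(zip(times1, events1)) + list(zip(times2, events2))
    failure_times = sorted({t for t, e in pooled if e})
    observed1 = expected1 = variance = 0.0
    for t in failure_times:
        # Margins of the 2x2 table at failure time t.
        n1 = sum(1 for u in times1 if u >= t)  # at risk, group 1
        n2 = sum(1 for u in times2 if u >= t)  # at risk, group 2
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e)
        d2 = sum(1 for u, e in zip(times2, events2) if u == t and e)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:  # a table with one subject contributes no variance
            variance += n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))
    return (observed1 - expected1) ** 2 / variance


# Leukemia trial data from the notes (event = 0 marks a starred, censored time).
six_mp_times = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16,
                17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
six_mp_events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1,
                 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
placebo_times = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8,
                 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
placebo_events = [1] * 21

# Same data with the last placebo failure moved from 23 to 50 weeks.
placebo_outlier = placebo_times[:-1] + [50]

stat_original = logrank_statistic(six_mp_times, six_mp_events,
                                  placebo_times, placebo_events)
stat_outlier = logrank_statistic(six_mp_times, six_mp_events,
                                 placebo_outlier, placebo_events)
print(round(stat_original, 1), round(stat_outlier, 1))
```

Only the risk-set counts at each failure time enter the loop, so moving a failure time changes the statistic only through the tables whose margins it alters.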
Although the curves decrease at different rates at the beginning of the study, the outlier leads to curves that seem to behave more similarly at the end of the study. What impact does this have on the results from the log-rank test?

The test statistic for the original data was 16.8 (p = 0.0000417). With the outlier, the resulting statistic is 14.3 (p = 0.000157). The outlier has very little effect on the test results.

[Figure 2: two Kaplan-Meier panels, 6-MP versus Placebo; x-axis Weeks (0 to 30 and 0 to 50), y-axis Survival (Cumulative Probability of Remission)]
Figure 2. Effects of an outlier on Kaplan-Meier curves for the leukemia trial data.

4S.3 Limitations of the log-rank tests

The actual failure times are not used in the calculation of the statistics. The disadvantage is that these times may contain valuable information about the survival experience. For example, the following three configurations

[Three Kaplan-Meier plots, 6-MP versus Placebo; x-axis Weeks, y-axis Survival (Cumulative Probability of Remission)]

all yield the same log-rank statistic, even though the survival experience in the 6-MP group gets progressively worse in the plots. The test statistics do not reflect the changing situations.

Furthermore, the log-rank tests do not distinguish between the next two configurations because the actual failure times are not used.

[Two survival plots, Group 1 versus Group 2; x-axis Time, y-axis Survival]

Notes:
1. At each failure time, the log-rank tests only compare those groups which still contain subjects at risk.
2. Curves cannot be compared beyond the follow-up period for the risk set.

4S.4 An alternative two-sample testing procedure

One natural way to compare the survival curves would be to accumulate the area between them over the length of the study period, i.e.
\[ \int_t \left[ \hat{S}_1(t) - \hat{S}_2(t) \right] dt, \]
where $\hat{S}_i(t)$ is the Kaplan-Meier estimator. Under $H_0: S_1 = S_2$ the area between the survival curves ought to be close to zero.

Pepe and Fleming proposed a weighted version of the above statistic. Let $\hat{C}_i(t)$ be the estimated probability that censoring does not occur before time $t$. $\hat{C}_i(t)$ is computed the same way that $\hat{S}_i(t)$ is, except that failure and censoring are interchanged. For example, the following plot displays the $\hat{C}_i(t)$ in the leukemia trial.

[Plot: Censoring versus Weeks, 6-MP and Placebo]

The Pepe-Fleming statistic, called the weighted Kaplan-Meier statistic, is
\[ WKM = \sqrt{\frac{n_1 n_2}{n}} \sum_j \hat{w}(t_j) \left[ \hat{S}_1(t_j) - \hat{S}_2(t_j) \right] (t_j - t_{j-1}), \]
where
\[ \hat{w}(t_j) = \frac{\hat{C}_1(t_j)\, \hat{C}_2(t_j)}{\frac{n_1}{n} \hat{C}_1(t_j) + \frac{n_2}{n} \hat{C}_2(t_j)}. \]

Notes:
1. WKM is a weighted sum of the area between the survival functions, making use of the actual failure times.
2. The test statistic $WKM / \sqrt{V_{WKM}}$, where $V_{WKM}$ is the variance estimate, has an approximate N(0,1) distribution under the null hypothesis.
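As a rough illustration of how the pieces of the weighted Kaplan-Meier statistic fit together, the following Python sketch computes the unstandardized WKM for the leukemia data. Several choices here are assumptions rather than part of the notes: the grid $t_j$ is taken to be the pooled distinct failure times with $t_0 = 0$, all curves are evaluated at $t_j$ itself, the weight is set to zero when its denominator is zero, and a subject censored at time t is counted as at risk at t. The variance estimate $V_{WKM}$ is not given in the notes, so it is not computed. The function names are hypothetical.

```python
from math import sqrt


def km_step_function(times, events):
    """Product-limit estimate as a right-continuous step function of t."""
    steps = []  # (event time, estimate just after that time)
    estimate = 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        estimate *= 1 - d / at_risk
        steps.append((t, estimate))

    def value_at(t):
        current = 1.0
        for u, s in steps:
            if u > t:
                break
            current = s
        return current

    return value_at


def wkm_unstandardized(times1, events1, times2, events2):
    """Weighted area between two Kaplan-Meier curves (no variance scaling)."""
    n1, n2 = len(times1), len(times2)
    n = n1 + n2
    S1 = km_step_function(times1, events1)
    S2 = km_step_function(times2, events2)
    # Censoring curves: same estimator with failure and censoring interchanged.
    C1 = km_step_function(times1, [1 - e for e in events1])
    C2 = km_step_function(times2, [1 - e for e in events2])
    # ASSUMPTION: grid of pooled distinct failure times, with t_0 = 0.
    pooled = list(zip(times1, events1)) + list(zip(times2, events2))
    grid = sorted({t for t, e in pooled if e})
    total, previous = 0.0, 0.0
    for t in grid:
        denom = (n1 / n) * C1(t) + (n2 / n) * C2(t)
        weight = C1(t) * C2(t) / denom if denom > 0 else 0.0  # assumed guard
        total += weight * (S1(t) - S2(t)) * (t - previous)
        previous = t
    return sqrt(n1 * n2 / n) * total


# Leukemia trial data from the notes (event = 0 marks a starred, censored time).
six_mp_times = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16,
                17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
six_mp_events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1,
                 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
placebo_times = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8,
                 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
placebo_events = [1] * 21

wkm = wkm_unstandardized(six_mp_times, six_mp_events,
                         placebo_times, placebo_events)
print(wkm)
```

Unlike the log-rank statistic, each term is multiplied by the interval length $t_j - t_{j-1}$, so stretching or compressing the failure times changes the result. A positive value here indicates that the 6-MP curve lies above the placebo curve over the grid.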