Lecture 23

1. Survival time
2. Censored observations
3. Proc Lifetest: Kaplan-Meier estimate of the survival distribution
4. Comparing survival distributions
5. Proportional hazards regression: Proc PHreg

References:
Collett (2003) Modelling Survival Data in Medical Research, 2nd ed.
Allison (1995) Survival Analysis Using the SAS System.
Cantor (2003) SAS Survival Analysis Techniques for Medical Research.
Singer & Willett (2003) Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence.

Time-to-event or survival data

In many situations, the time until an event occurs is important:
• New treatment for brain cancer: do patients survive longer than after standard treatment?
• In the AHC, are men awarded tenure earlier and more often than women?
• Are young adults getting married later than 10 years ago? Are women delaying the birth of their first child?

In theory, each individual has their own time $T_i$ to the event. In reality, some do not have the event during the observation period (censoring).

The objective is not a point estimate (mean, slope, odds ratio) but an estimate of the distribution of these times $\{T_i\}$.

US Census Bureau cross-section (“synthetic cohort”) for 2002

Histogram of survival times for the 2002 US population, truncated at 101.

[Figure: Percent of Deaths by age, 2002. Horizontal axis: US Population, Age in Years in 2002 (0 to 100). Vertical axis: percent of deaths (0.0 to 3.5).]

E. Arias (2004) United States Life Tables, 2002. National Vital Statistics Reports, vol 53 no 6. Hyattsville, Maryland: National Center for Health Statistics.
Survivor function

$S(t)$ = chance of surviving to age $t$
       = percent still alive (free of the event) at age $t$

[Figure: Percent Surviving (0 to 100) versus US Population, Age in Years in 2002.]

Hazard function

$h(t)$ = age-specific death rate
       = percent dying at age $t$ of those alive at age $\ge t$

[Figure: Age-Specific Death Rate per 100,000 versus US Population, Age in Years in 2002.]

Probability theory defines a distribution by:
• the histogram of lifetimes, called the probability density function $f(t)$
• the cumulative distribution function, which is the cumulative area under the histogram, starting from the left:

$$F(t) = \int_{-\infty}^{t} f(u)\,du$$

Survivor function: $S(t) = 1 - F(t)$, the percent without the event (still alive) at time $t$.

Hazard function:

$$h(t) = \frac{f(t)}{S(t)} = \frac{\text{chance of event at time } t}{\text{percent at risk at time } t}$$

The hazard $h(t)$ gives the chance of the event during a short interval after time $t$, for those who are at risk (alive) at time $t$.

Outline: two main analyses for survival data

1. Estimate the survivor function; compare survivor functions between groups.
   Proc Lifetest gives the nonparametric product-limit (Kaplan-Meier) or life-table estimate, draws graphs, and tests for differences. Nice pictures, but no adjustments, only strata.
   Proc LifeReg gives regression adjustment, but a parametric formula for the survivor function must be specified; rarely used in health sciences.
2. Estimate the ratio of hazard functions between groups; compare the ratio to 1.
   Proc PHreg does proportional hazards regression to estimate the ratio. No pictures (almost), but regression adjustment for fixed and time-varying predictors.
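The relationships among $f(t)$, $F(t)$, $S(t)$ and $h(t)$ defined earlier can be checked on a toy example. The sketch below is plain Python, not SAS, and uses a made-up discrete lifetime distribution over four ages (the numbers are hypothetical, not from the life tables). For discrete ages it takes "at risk" to mean alive just before the age in question, matching "alive at age $\ge t$".

```python
# Hypothetical discrete lifetime distribution: f[k] = P(T = ages[k]).
ages = [1, 2, 3, 4]
f = [0.1, 0.2, 0.3, 0.4]

def survival_summary(f):
    """Return (F, S, h) for a discrete density f that sums to 1."""
    F, S, h = [], [], []
    cum = 0.0
    for p in f:
        at_risk = 1.0 - cum      # P(T >= t): alive just before this age
        h.append(p / at_risk)    # chance of the event among those at risk
        cum += p
        F.append(cum)            # cumulative distribution, P(T <= t)
        S.append(1.0 - cum)      # survivor function, P(T > t)
    return F, S, h

F, S, h = survival_summary(f)
for age, Fk, Sk, hk in zip(ages, F, S, h):
    print(f"age {age}: F={Fk:.3f}  S={Sk:.3f}  h={hk:.3f}")
```

The hazard rises with age even where the density is modest, because the denominator (the group still at risk) keeps shrinking.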
Censored observation times

A common problem in survival data is that we don’t observe all event times:
• we stop the study and analyze the data before everyone has had the event
• a person leaves the study and we cannot find out whether they had the event

In these cases, all we have is the final time $t'$ at which the subject was known to be alive; we know only that $T > t'$.

The final time $t'$ is called a censored observation, and it is a lower bound for the unknown event time $T$.

Clinical study example: eligible participants were enrolled as soon as they volunteered, and recruitment lasted 2.5 years. The study ended on 1/1/2008. Subjects died (open circle), dropped out (triangle), or were still alive at study end (gray dot).

[Figure: one horizontal line per subject against Calendar Time, 1/1/2005 to 1/1/2008, with study start and end marked.]

Analysis of clinical study example: each subject’s time is aligned to start at “study time” = 0.

[Figure: the same subjects plotted against Time from Enrollment (years), 0.0 to 3.0.]

* marks study enrollment, a horizontal line indicates the time the participant was alive, deaths are indicated by an open circle, and censoring by a gray dot.

No histogram of survival times with censored data

We can draw a histogram of all the observed times $t_i$. For a censored time, however, we know only that $t_i <$ the actual survival time.

There is no correct place in the histogram for censored observations, because they are lower bounds, not observed event times. However, excluding them gives a biased histogram.

Kaplan and Meier (1958) proposed a breakthrough method to estimate the survivor function $S(t)$ from partially censored data.

Kaplan-Meier estimate of the survivor function

Order the event times from earliest to latest: $t_0$ (baseline), $t_1, t_2, \ldots, t_v$.
Within each interval $[t_i, t_{i+1})$ (left end included, right end excluded), let
  $n_i$ = number at risk of the event at the start of the interval
  $d_i$ = number of events within the interval
Then $d_i/n_i$ = event rate in the interval, and $(1 - d_i/n_i)$ = proportion with no event (surviving).

The survivor function gives the chance of surviving to time $t$. Estimate this by the product of the chances of surviving each interval up to $t$:

$$\hat{S}(t) = \prod_{t_i < t} \left(1 - \frac{d_i}{n_i}\right)$$

Notice that the length of the time intervals is ignored.

Stomach cancer example

Survival times after treatments A or B for 89 patients with stomach cancer (source: Chapter 12, Der and Everitt).
• 45 received treatment A: 38 died, 7 still alive at end of study (= censored)
• 44 received treatment B: 41 died, 3 still alive

Obs   days   trt     years   died
  1     17    A    0.04654      1
  2    185    A    0.50650      1
  3    542    A    1.48392      1
  4      1    B    0.00274      1
  5    383    B    1.04860      1

days and years give the time $t$ at which the patient was last known alive. died = 1 if an event happened at $t$; died = 0 if censored.

Proc Lifetest: Kaplan-Meier (Product-Limit) estimate of survivor function

    ODS graphics on;
    Proc Lifetest data = stomach_cancer
                  plots = (survival(atrisk=0 to 4 by 1));
      TIME years * died(0);
      STRATA trt ;
    run;
    ODS graphics off;

The TIME statement is like a model statement; it specifies the response:

    TIME length-of-time * event-status ( censored-value ) ;

STRATA names the variable identifying the treatment groups to be compared by the tests.

    plots=(survival(atrisk=0 to 4 by 1)) censoredsymbol="|" ;

Sample sizes are given at the bottom of the plot. Need at least 10–15 in each group.

    plots=( survival( CL atrisk=0 to 4 by 1)) ;

CL adds pointwise confidence limits to the plot.

The LIFETEST Procedure
Stratum 1: trt = A

                                 Survival
                                 Standard    Number    Number
   years    Survival   Failure      Error    Failed      Left
 0.00000      1.0000    0          0              0        45
 0.04654      0.9778    0.0222     0.0220         1        44
 0.11499      0.9556    0.0444     0.0307         2        43
 0.12047      0.9333    0.0667     0.0372         3        42
 0.13142      0.9111    0.0889     0.0424         4        41
 0.16427      0.8889    0.1111     0.0468         5        40
   . . .
 3.32375      0.1750    0.8250     0.0572        37         7
 3.37303*      .         .          .            37         6
 3.73990      0.1458    0.8542     0.0546        38         5
 3.98357*      .         .          .            38         4
 4.33949*      .         .          .            38         3
 4.44079*      .         .          .            38         2
 4.45175*      .         .          .            38         1
 4.75291*      .         .          .            38         0

NOTE: The marked survival times are censored observations.

years: time $t$ when the survivor function starts a new value
Survival: Kaplan-Meier (product-limit) estimate $\hat{S}(t)$ of the survivor function
Failure: Kaplan-Meier estimate of cumulative mortality, $1 - \hat{S}(t) = \hat{F}(t)$
Survival Standard Error: the pointwise standard error of the estimate $\hat{S}(t)$
Number Failed: the cumulative number of events up to time $t$
Number Left: the number still under observation and at risk for the event

A 95% confidence interval for the estimated survivor function comes from the usual formula with the standard error from the output:

$$\hat{S}(t) \pm 1.96 \times SE\{\hat{S}(t)\}$$

Stratum 1: trt = A

Quartile Estimates

             Point      95% Confidence Interval
Percent   Estimate      [Lower        Upper)
     75    1.58795     1.27036         .
     50    0.69541     0.52841       1.32512
     25    0.39425     0.20260       0.53388

   Mean    Standard Error
1.34660           0.19441

Median survival time is the time $t$ when $\hat{S}(t) = 0.5$, that is, when the survivor function equals 50%. If $\hat{S}(t) = 0.5$ over an interval, the median is the midpoint of the interval.

Mean survival time is the area under the Kaplan-Meier survival curve. If the largest observed time in the data is censored, the curve never reaches zero and this area is not well defined. Don’t report mean survival time if there is any censoring.

Summary of censoring in each group.

Summary of the Number of Censored and Uncensored Values

                                              Percent
Stratum   group   Total   Failed   Censored  Censored
      1   A          45       38          7     15.56
      2   B          44       41          3      6.82
-----------------------------------------------------
  Total              89       79         10     11.24

Precision of estimates depends on the number of events (“Failed”), not the number of observations.

Tests to compare population survivor functions

Lifetest compares population survivor functions $S(t)$ between the groups listed in the STRATA statement.

Null hypothesis: all groups have the same population survivor function; here, $S_A(t) = S_B(t)$.
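Before comparing groups, the product-limit listing for treatment A shown earlier can be checked by hand. The sketch below is plain Python rather than SAS. It takes the first five deaths from the listing (one death at each time, 45 patients at risk at the start, no censoring in between) and multiplies the interval survival proportions. For the standard error it assumes Greenwood's formula, which the lecture does not name; the function name `kaplan_meier` is illustrative only.

```python
import math

def kaplan_meier(steps):
    """steps: (time, number at risk, number of events), ordered by time.
    Returns (time, S_hat, SE) with SE from Greenwood's formula (assumed)."""
    s_hat = 1.0
    var_sum = 0.0
    rows = []
    for t, n, d in steps:
        s_hat *= 1.0 - d / n             # survive this interval too
        var_sum += d / (n * (n - d))     # Greenwood accumulation
        rows.append((t, s_hat, s_hat * math.sqrt(var_sum)))
    return rows

# First five event times for treatment A, read off the listing.
steps_a = [(0.04654, 45, 1), (0.11499, 44, 1), (0.12047, 43, 1),
           (0.13142, 42, 1), (0.16427, 41, 1)]

km = kaplan_meier(steps_a)
for t, s_hat, se in km:
    lo, hi = s_hat - 1.96 * se, s_hat + 1.96 * se   # pointwise 95% interval
    print(f"{t:.5f}  S={s_hat:.4f}  SE={se:.4f}  CI=({lo:.3f}, {hi:.3f})")
```

Rounded to four decimals, the Survival and Survival Standard Error columns agree with the listing, from 0.9778 and 0.0220 in the first row to 0.8889 and 0.0468 in the fifth.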
Lifetest reports three tests:
• Log-rank
• Wilcoxon
• Likelihood ratio test

Ignore the likelihood ratio test: it depends on a strong assumption (exponential density) that is usually wrong.

Rank Statistics

trt    Log-Rank    Wilcoxon
A        3.3043      502.00
B       -3.3043     -502.00

Test of Equality over Strata

Test          Chi-Square    DF    Pr > Chi-Square
Log-Rank          0.5654     1             0.4521
Wilcoxon          4.3162     1             0.0378
-2Log(LR)         0.3574     1             0.5500

The two usable tests disagree here.

All three tests are based on $H_0: S_A(t) = S_B(t)$:
• combine all groups to get a common event rate on each time interval
• for each group in each interval, multiply the event rate by the sample size to get the expected number of events

$e_{jk}$ = expected number of events in group $j$ during time period $k$
$d_{jk}$ = observed number of events in group $j$ at time $k$

The log-rank test statistic is the cumulative difference between observed and expected:

$$d_L = \sum_k \left(d_{1k} - e_{1k}\right)$$

The test statistic for A was +3.3043, indicating more deaths than expected.
The test statistic for B was −3.3043, indicating fewer deaths than expected.

The log-rank test is usually the more sensitive test. It is the best test when the estimated survivor functions do not cross each other, and it is often the basis for sample size calculations.

Wilcoxon test. Sum of differences between observed and expected events, weighted by sample size (the number at risk $n_k$):

$$d_W = \sum_k n_k \left(d_{1k} - e_{1k}\right)$$

The Wilcoxon test gives more weight to the early part of the estimated survivor functions, where there is more information. It is less sensitive to late differences in survivor functions. Use Wilcoxon when the estimated survivor functions cross each other.

Which test should we report? Think about sample size as well as whether the survivor curves cross.
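The two rank statistics can be traced through on a small example. The sketch below is plain Python rather than SAS, and the eight observations are invented for illustration (they are not the stomach cancer data). At each distinct event time it forms the expected number of events in group 1 under $H_0$ and accumulates $d_L$ and $d_W$. The variance needed to turn either statistic into a chi-square is left out.

```python
def rank_statistics(group1, group2):
    """Each group is a list of (time, died) pairs; died = 0 means censored.
    Returns (d_L, d_W) computed for group1."""
    everyone = group1 + group2
    event_times = sorted({t for t, died in everyone if died == 1})
    d_L = 0.0
    d_W = 0.0
    for t in event_times:
        n_1k = sum(1 for u, _ in group1 if u >= t)    # at risk, group 1
        n_k = sum(1 for u, _ in everyone if u >= t)   # at risk, all groups
        d_1k = sum(1 for u, died in group1 if u == t and died == 1)
        d_k = sum(1 for u, died in everyone if u == t and died == 1)
        e_1k = n_1k * d_k / n_k          # expected events under H0
        d_L += d_1k - e_1k               # log-rank: unweighted
        d_W += n_k * (d_1k - e_1k)       # Wilcoxon: weighted by number at risk
    return d_L, d_W

# Hypothetical data: (time, died)
group_a = [(1, 1), (3, 1), (5, 0), (7, 1)]
group_b = [(2, 1), (4, 1), (6, 1), (8, 0)]

stat_a = rank_statistics(group_a, group_b)
stat_b = rank_statistics(group_b, group_a)
print("A: d_L = %.4f, d_W = %.2f" % stat_a)
print("B: d_L = %.4f, d_W = %.2f" % stat_b)
```

With two groups the statistics for A and B are equal and opposite, which is the same pattern as the ±3.3043 and ±502.00 in the Rank Statistics table.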
Usually display K-M curves only where the sample size is at least 10. To truncate the plot, set maxtime.

    ODS graphics on;
    Proc Lifetest data=two_years maxtime=3.0
                  plots=survival(atrisk=0 to 4 by 1) ;
      time years * censor(1);
      strata trt ;
    run;
    ODS graphics off;

This has no effect on the tests, which still use all the data.

Proc Lifetest TEST statement

Proc Lifetest also compares groups identified in the TEST statement. This is intended to test the effect of a continuous explanatory variable. When it is used with a categorical variable, such as treatment, the results are not the same as those from STRATA. Use STRATA, not TEST.

Summarizing tests comparing survivor functions

When a test finds a significant difference between survivor curves, it does not tell us when they differ. Three common approaches to summarizing the comparison of two groups A and B:

1. If the curves are consistently separated, report the overall result. If $\hat{S}_A$ is consistently lower than $\hat{S}_B$, then group A suffered more early events and had a smaller percent surviving at any given time. Report the results of the log-rank test.

   Example: During the ten years after treatment, the proportion surviving was significantly higher in group B (log-rank test, p = .016).

2. Select a percent surviving, and report the times when each group reached it. A common choice is 50% surviving, the median survival time.

   Quartile Estimates

                         Point      95% Confidence Interval
              Percent   Estimate      [Lower        Upper)
   Group A         75    1.58795     1.27036         .
                   50    0.69541     0.52841       1.32512
                   25    0.39425     0.20260       0.53388
   Group B         75    2.39836     1.55784       3.47981
                   50    1.38535     1.04860       1.85079
                   25    0.95277     0.68446       1.06229

   Median survival in group A was 0.7 years (95% CI: 0.5–1.3 years), while in group B median survival was 1.4 years, twice as long (95% CI: 1.0–1.9 years).

3. Select a time and report the percent in each group surviving at that time. The percent surviving (and confidence interval) beyond a particular time comes from the Proc Lifetest listing of $\hat{S}(t)$.

   Stratum 1: group = A

                                    Survival
                                    Standard    Number    Number
      years    Survival   Failure      Error    Failed      Left
    0.00000      1.0000    0          0              0        45
      . . .
    0.86242      0.4444    0.5556     0.0741        25        20
    1.09788      0.4222    0.5778     0.0736        26        19

   After one year of treatment, the estimated proportion surviving (± standard error) in group A was 44% ± 7%, but in group B it was 68% ± 7%.

Hazard function

$$h(t) = \frac{f(t)}{S(t)} = \frac{\text{chance of an event at time } t}{\text{percent alive at time } t}$$

The hazard $h(t)$ is the time-specific event rate.

Survivor function $S(t)$ = percent still at risk (alive) at time $t$.

Hazard function $h(t)$ = chance of the event at time $t$ for the subset of people at risk.

Proportional Hazards Regression

Proportional hazards regression (D.R. Cox, 1972) assumes that different groups have proportional hazard functions.

With two groups A and B, there is a common hazard function $h(t)$, which applies to group A. Being in group B multiplies the hazard by $r$:

$$h_B(t) = r \cdot h_A(t)$$

Proportional hazards regression estimates $r$ without estimating $h(t)$.

Since hazards are chances, the ratio of the hazard functions

$$r = \frac{h_B(t)}{h_A(t)}$$

can be interpreted as a relative risk or relative rate.

Proportional hazards regression makes several assumptions:
1. There is a baseline hazard function $h_0(t)$ common to all individuals in all the study groups.
2. Study group $j$ has a hazard function $h_j(t)$ that is a positive multiple of the baseline hazard: $h_j(t) = r_j h_0(t)$. Each group has its own hazard ratio $r_j$. For the reference group, $r_j = 1$.
3. Explanatory variables act only on the $r_j$, not on the baseline hazard.

Proportional Hazards Regression Model

Model the hazard ratios (relative risks) on the log scale as a function of the predictors:

$$\log\left(\text{hazard ratio } r_j\right) = \beta_1 \text{Group} + \beta_2 X + \beta_3 Z + \ldots$$

There is no intercept; it is part of the baseline hazard.

What makes proportional hazards regression work is that we can fit the model without needing to estimate the baseline hazard $h_0(t)$.
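One way to see why the baseline hazard drops out (a standard argument, sketched here rather than taken from the lecture): at an event time $t$, compare the hazard of the subject $i$ who had the event with the total hazard of everyone still at risk. The risk-set notation $R(t)$ is introduced here for illustration. Under the assumption $h_j(t) = r_j h_0(t)$:

```latex
\[
\frac{h_i(t)}{\sum_{j \in R(t)} h_j(t)}
  = \frac{r_i \, h_0(t)}{\sum_{j \in R(t)} r_j \, h_0(t)}
  = \frac{r_i}{\sum_{j \in R(t)} r_j}
\]
```

The factor $h_0(t)$ cancels, so each term depends only on the hazard ratios. Cox's partial likelihood is the product of such terms over the observed event times, and the coefficients are chosen to maximize it.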
Proportional hazards regression is about the hazard ratio or relative risk, not the hazard.

Interpretation of the regression coefficients is very similar to logistic regression:
• Class variable $A$: $\exp(\hat{\beta}_i)$ is the hazard ratio or relative risk comparing the $i$-th level of $A$ to the reference level.
• Continuous variable $X$: $\exp(\hat{\beta}_j)$ is the relative risk corresponding to a 1-unit increase in $X$, comparing those with $X = x + 1$ to those with $X = x$.

Breast cancer example

A 1987 study compared survival times of women diagnosed with breast cancer, divided into two groups: staining test of biopsy tissue positive or negative. Data from Collett (2003) Example 1.2.

    Proc Lifetest data=breast_cancer ;
      TIME surv_months * died(0);
      STRATA positive_stain;
    run;

          positive_                               Percent
Stratum   stain       Total   Failed   Censored  Censored
      1   0              13        5          8     61.54
      2   1              32       21         11     34.38

Test          Chi-Square    DF    Pr > Chi-Square
Log-Rank          3.5150     1             0.0608
Wilcoxon          4.1800     1             0.0409
-2Log(LR)         4.3563     1             0.0369

Which test is appropriate here?

The test gives no estimate of the difference between groups.

PHreg: proportional hazards regression

    Proc PHreg ;
      MODEL time * event-status ( censored_value ) = predictors
            / risklimits ties=efron ;

risklimits: CI for hazard ratios
ties=efron: fitting method option

    Proc PHreg data = breast_cancer;
      model surv_months * died(0) = positive_stain
            / risklimits ties=efron;
    run;

The response is specified in the same way as for Proc Lifetest.

Analysis of Maximum Likelihood Estimates

                        Parameter   Standard
Parameter         DF     Estimate      Error   Chi-Square   Pr > ChiSq
positive_stain     1      0.90933    0.50089       3.2957       0.0695

                   Hazard    95% Hazard Ratio
Parameter           Ratio    Confidence Limits
positive_stain      2.483      0.930     6.626

The hazard rate in the positive-stain group was estimated to be 2.5 times the rate in the negative-stain group, although this did not reach significance (p = .0695).
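The hazard ratio and its confidence limits in the PHreg output follow directly from the parameter estimate and its standard error. The sketch below is plain Python rather than SAS. It assumes the limits are Wald limits computed on the log scale with the normal quantile 1.96 and then exponentiated; the variable names are illustrative.

```python
import math

beta_hat = 0.90933   # parameter estimate for positive_stain (from the output)
se = 0.50089         # its standard error (from the output)
z = 1.96             # assumed normal quantile for a 95% Wald interval

hazard_ratio = math.exp(beta_hat)
lower = math.exp(beta_hat - z * se)    # limits are formed on the log scale,
upper = math.exp(beta_hat + z * se)    # then transformed back
wald_chi_square = (beta_hat / se) ** 2

print(f"hazard ratio    = {hazard_ratio:.3f}")
print(f"95% limits      = ({lower:.3f}, {upper:.3f})")
print(f"Wald chi-square = {wald_chi_square:.4f}")
```

Up to rounding of the inputs, these agree with the printed hazard ratio 2.483, limits 0.930 to 6.626, and chi-square 3.2957. The interval includes 1, which matches the nonsignificant p-value.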