SAS LAB TWO, April 27, 2004

Lab Objectives

After today’s lab you should be able to:
1. Use PROC LIFETEST to generate Kaplan-Meier product-limit survival estimates, and understand the output of the LIFETEST procedure.
2. Generate confidence limits for the Kaplan-Meier curve. Understand upgrades in SAS 9 for obtaining confidence bands for the KM survivor function.
3. Use the LIFETEST procedure to compare survival times of two or more groups.
4. Generate log-survival and log-log survival curves.
5. Run a simple SAS macro.

We will produce the following plots today: (plots not reproduced)

LAB EXERCISE STEPS:
Follow along with the computer in front…

1. Save to the desktop, if it’s not already there, the Excel dataset “hmohiv.xls” from the hrp262 website: http://www-stat.stanford.edu/~jtaylo//courses/stats262/spring.2004/index.html

2. Open SAS: from the desktop, double-click “Applications” → double-click the SAS icon.

3. Import data using point-and-click options. Select from the menu:
   File → Import Data… → Next> → Browse and find the file on the desktop → Next> → name in work library as member “HmoHiv” → Finish

4. Fix the datetime variables, enddate and startdate, via the following code:

   Dealing with date-time variables

    /**REMINDER: YOU MUST CLOSE THE DATASET BEFORE TRYING TO MODIFY IT**/
    data hmohiv;
       set hmohiv;
       format enddate date.;
       format startdate date.;
       enddate=datepart(enddate);
       startdate=datepart(startdate);
       Time=12*(enddate-startdate)/365.25; *gives time in months;
       Time=round(time); *to match Time variable in textbook;
    run;

5. Generate the Kaplan-Meier product-limit survival estimates for the hmohiv data:

    /**Kaplan-Meier estimates of survivorship function**/
    proc lifetest data=hmohiv;
       time time*censor(0);
       title 'Kaplan-Meier Estimates for HMO HIV data';
    run;

6. Examine the “product limit survival estimates” output from the LIFETEST procedure. Notice that several events share the same failure time.
Confirm this fact by examining the distribution of the Time variable using point-and-click as follows:
   1. From the menu select: Solutions → Analysis → Interactive Data Analysis
   2. Double-click to open: library “Work”, dataset “HmoHiv”
   3. Highlight the “Time” variable and from the menu select: Analyze → Distribution(Y)
   4. From the menu select: Tables → Frequency Counts
   5. Scroll down the open analysis window to examine the frequency counts for Time. Notice that there are many repeats.

Explanation of output from Lifetest procedure:

    Kaplan-Meier Estimates for HMO HIV data
    The LIFETEST Procedure
    Product-Limit Survival Estimates

                                     Survival
                                     Standard    Number    Number
    Time       Survival   Failure    Error       Failed    Left
    0.0000     1.0000     0          0            0        100
    1.0000     .          .          .            1         99
    1.0000     .          .          .            2         98
    1.0000     .          .          .            3         97
    1.0000     .          .          .            4         96
    1.0000     .          .          .            5         95
    1.0000     .          .          .            6         94
    1.0000     .          .          .            7         93
    1.0000     .          .          .            8         92
    1.0000     .          .          .            9         91
    1.0000     .          .          .           10         90
    1.0000     .          .          .           11         89
    1.0000     .          .          .           12         88
    1.0000     .          .          .           13         87
    1.0000     .          .          .           14         86
    1.0000     0.8500     0.1500     0.0357      15         85
    1.0000*    .          .          .           15         84
    1.0000*    .          .          .           15         83
    2.0000     .          .          .           16         82
    2.0000     .          .          .           17         81
    2.0000     .          .          .           18         80
    2.0000     .          .          .           19         79
    2.0000     0.7988     0.2012     0.0402      20         78
    2.0000*    .          .          .           20         77
    2.0000*    .          .          .           20         76
    2.0000*    .          .          .           20         75
    2.0000*    .          .          .           20         74
    2.0000*    .          .          .           20         73
    3.0000     .          .          .           21         72
    3.0000     .          .          .           22         71
    3.0000     .          .          .           23         70
    . . .

    NOTE: The marked survival times are censored observations.

How to read the columns:
   Time: observed time for each subject. Censored observations are starred.
   Survival: the KM estimate at each failure/event time. When ties exist, it is reported for the last of the tied cases. The KM estimate does not change until the next failure/event time, so it is not written in between.
   Failure: 1 - Survival, the estimated probability of death by the specified time.
   Survival Standard Error: the (pointwise) standard error of the KM estimate, calculated with Greenwood’s formula.
   Number Failed: cumulative number of failures.
   Number Left: size of the risk set remaining at each time point.
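As a check on the output above, the first two Survival and Survival Standard Error entries can be reproduced by hand. This is a sketch that assumes the standard product-limit and Greenwood formulas, with $d_j$ deaths and $n_j$ subjects at risk at event time $t_j$ (notation introduced here for illustration, not taken from the handout). The counts are read off the Number Failed and Number Left columns.

```latex
\hat S(t) = \prod_{t_j \le t} \left(1 - \frac{d_j}{n_j}\right), \qquad
\widehat{\mathrm{SE}}\,[\hat S(t)] = \hat S(t)\,
  \sqrt{\sum_{t_j \le t} \frac{d_j}{n_j\,(n_j - d_j)}}

% t = 1: 15 deaths among 100 at risk
\hat S(1) = 1 - \frac{15}{100} = 0.8500, \qquad
\widehat{\mathrm{SE}}\,[\hat S(1)] = 0.85\,\sqrt{\frac{15}{100 \cdot 85}} \approx 0.0357

% t = 2: 5 deaths among 83 at risk (100 - 15 deaths - 2 censored at t = 1)
\hat S(2) = 0.85 \left(1 - \frac{5}{83}\right) \approx 0.7988, \qquad
\widehat{\mathrm{SE}}\,[\hat S(2)] = 0.7988\,
  \sqrt{\frac{15}{100 \cdot 85} + \frac{5}{83 \cdot 78}} \approx 0.0402
```

Both pairs agree with the printed rows for the last tied event at times 1 and 2. Censored subjects leave the risk set without contributing a factor to the product, which is why the estimate is unchanged on the starred rows.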
    Summary Statistics for Time Variable Time

    Quartile Estimates

               Point       95% Confidence Interval
    Percent    Estimate    [Lower       Upper)
    75         15.0000     10.0000      34.0000
    50          7.0000      5.0000       9.0000
    25          3.0000      2.0000       4.0000

    Mean       Standard Error
    14.5912    1.9598

    NOTE: The mean survival time and its standard error were underestimated because the largest observation was censored and the estimation was restricted to the largest event time.

    Summary of the Number of Censored and Uncensored Values

                                    Percent
    Total    Failed    Censored     Censored
    100      80        20           20.00

How to read the summary statistics:
   75 percent point estimate: the smallest event time such that the estimated probability of dying by that time is at least 0.75.
   50 percent row: the estimated median death time and its 95% confidence interval.
   Mean: the estimated mean survival time. Note: the median is usually the preferred measure of central tendency for survival data.
   Remark: the confidence interval for the 75th percentile is not symmetric about the point estimate; the method used is worth looking into.

7. Plot the Kaplan-Meier curve, as in figure 2.2, p.34:

    /*Plot KM curve*/
    goptions reset=all;
    proc lifetest data=hmohiv plots=(s) graphics censoredsymbol=X;
       time time*censor(0);
       title 'Figure 2.2, p. 34';
       symbol v=none;
    run;

   Notes on the code:
   plots=(s): “s” asks for the survival plot; see the reference page for the full list of plotting options.
   graphics: requests high-resolution graphics.
   censoredsymbol=X: marks censored observations with an X. Or try “none” here to eliminate censoring marks. If you don’t specify, SAS uses circles as the default.
   symbol v=none: tells SAS to omit the symbol for each event. You may also specify this in the PROC LIFETEST statement with the option “eventsymbol=none”.

8. To get pointwise confidence intervals for the survival curve, use the outsurv option:

    /*get confidence limits*/
    proc lifetest data=hmohiv outsurv=outdata;
       time time*censor(0);
       title 'outputs pointwise confidence limits';
    run;

9. Open the new outdata dataset using point-and-click to view the new variables.

10. Plot the survival curve with pointwise confidence intervals:

    /*plot confidence limits*/
    goptions reset=all;
    axis1 label=(angle=90);
    proc gplot data=outdata;
       title 'Figure 2.5, p.46';
       label survival='Survival Probability';
       label time='Survival Time (Months)';
       plot survival*time SDF_UCL*time SDF_LCL*time / overlay vaxis=axis1;
       symbol1 v=none i=join c=black line=1;
       symbol2 v=none i=join c=black line=2;
       symbol3 v=none i=join c=black line=2;
    run; quit;

   The symbol statements ask for lines that differ in line type (e.g., dashed, solid) rather than in color (which is the SAS default).

Note: In SAS 8, there is no easy way (other than programming a macro) to obtain simultaneous 95% confidence bands (Hall and Wellner) for the survivor function, or to calculate confidence intervals based on transformations of the survivor function such as log-log. SAS 9 has these features.

Useful features in SAS 9 that are not available in SAS 8:

The new SURVIVAL statement enables you to create confidence bands (also known as simultaneous confidence intervals) for the survivor function S(t) and to specify a transformation for computing the confidence bands and the pointwise confidence intervals. It contains the following options:
- The OUT= option names the output SAS data set that contains survival estimates, as in the OUTSURV= option in the PROC LIFETEST statement.
- The CONFTYPE= option specifies the transformation applied to S(t) to obtain the pointwise confidence intervals and the confidence bands. Four transforms are available: the arcsine-square root transform, the complementary log-log transform, the logarithmic transform, and the logit transform.
- The CONFBAND= option specifies the confidence bands to add to the OUT= data set. You can choose the equal precision confidence bands (Nair, 1984), the Hall-Wellner bands (Hall and Wellner, 1980), or both.
- The BANDMAX= option specifies the maximum time for the confidence bands.
- The BANDMIN= option specifies the minimum time for the confidence bands.
- The STDERR option adds a column with the standard error of the estimated survivor function to the OUT= data set.
- The ALPHA= option sets the confidence level for the pointwise confidence intervals as well as the confidence bands.

11. We could also write a “macro” (like a function) to give us a plot of the survivor function with confidence limits. If we were going to be plotting many survival curves, this would save time.

   The macro parameters are the variables that will be entered into the function; here: dataset, time variable, censoring variable. They are referenced as &variable. in the macro body.

    %macro cl(data, time, censor);
       goptions reset=all;
       axis1 label=(angle=90);
       proc lifetest data=&data. outsurv=outdata;
          time &time.*&censor.(0);
       run;
       proc gplot data=outdata;
          title 'Plot of survivor function with pointwise confidence intervals';
          label survival='Survival Probability';
          label &time.='Survival Time';
          plot survival*&time. SDF_UCL*&time. SDF_LCL*&time. / overlay vaxis=axis1;
          symbol1 v=none i=join c=black line=1;
          symbol2 v=none i=join c=black line=2;
          symbol3 v=none i=join c=black line=2;
       run; quit;
    %mend cl;

    %cl(hmohiv, time, censor); *invoke macro;

11. Compare drug groups.

    /**Figure 2.7 , p. 58**/
    proc lifetest data=hmohiv plots=(s) graphics censoredsymbol=none;
       time time*censor(0);
       title 'Figure 2.7, p. 58';
       strata drug; *requests comparison by drug group;
       symbol1 v=none color=black line=1;
       symbol2 v=none color=black line=2;
    run;

Explanation of output from Lifetest procedure:

Tests of the null hypothesis S1(t)=S2(t), using:
- the log-rank test
- the Wilcoxon test
- the likelihood ratio test (assumes event times have an exponential distribution)

    Test of Equality over Strata

                                        Pr >
    Test         Chi-Square    DF       Chi-Square
    Log-Rank     11.8556       1        0.0006
    Wilcoxon     10.9104       1        0.0010
    -2Log(LR)    20.9264       1        <.0001

If the median is a better measure of the center, why not test the equality of medians? A method for doing so, based on empirical likelihood, has only recently become available.
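For reference, the log-rank chi-square reported above has the following standard two-group form. The notation is introduced here for illustration and does not appear in the handout: at each distinct event time $t_j$, $d_j$ is the total number of deaths, $n_j$ the total number at risk, and $d_{1j}$, $n_{1j}$ the same counts within group 1. This is a sketch of the usual textbook statistic, not a reproduction of SAS internals.

```latex
% Expected deaths in group 1 and hypergeometric variance at event time t_j
E_{1j} = \frac{d_j\, n_{1j}}{n_j}, \qquad
V_j = \frac{n_{1j}\,(n_j - n_{1j})\; d_j\,(n_j - d_j)}{n_j^{2}\,(n_j - 1)}

% Log-rank statistic, approximately chi-square with 1 degree of freedom under H_0
\chi^2_{\text{log-rank}}
  = \frac{\left[\sum_j \left(d_{1j} - E_{1j}\right)\right]^{2}}{\sum_j V_j},
\qquad H_0 : S_1(t) = S_2(t)
```

The Wilcoxon version weights each term by the size of the risk set, replacing $d_{1j} - E_{1j}$ with $n_j (d_{1j} - E_{1j})$ and $V_j$ with $n_j^2 V_j$, so it gives more weight to early differences between the curves.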
12. Plot the Kaplan-Meier survival curve for the hmohiv data by age group (as in figure 2.8, p.69) by changing the strata statement (and title) as below:

    /*by age group*/
    proc lifetest data=hmohiv plots=(s) graphics censoredsymbol=none;
       time time*censor(0);
       title 'Figure 2.8, p. 69';
       strata age(30 35 40 45); *look at survival by age groups;
    run;

   The strata statement asks SAS to divide subjects into the age groups (-∞, 30), [30, 35), [35, 40), [40, 45), [45, ∞).

13. Change the plot from “s” (survival) to “ls” (log-survival), which plots -log S(t) versus t.

    proc lifetest data=hmohiv plots=(ls) graphics censoredsymbol=none;
       time time*censor(0);
       title '-log survival plot';
       strata drug;
    run;

   This is equivalent to the cumulative hazard function:

   $-\log \hat{S}(t) = \int_0^t h(u)\,du$

   The plot tells us how the hazard changes with time. For example, if the hazard is constant (no change over time), the plot should be a straight line through the origin.

14. Change the plot from “s” (survival) to “lls” (log-log survival), which plots log(-log S(t)) versus log t.

    proc lifetest data=hmohiv plots=(lls) graphics;
       time time*censor(0);
       title 'log log survival plot';
       strata drug;
    run;

   This asks for a plot of log(-log S(t)) vs. log(time):

   $\log[-\log \hat{S}(t)] = \log \int_0^t h(u)\,du$

   If the survival times follow a Weibull distribution, for which log h(t) is linear in log t, then the log-log survival plot should be a straight line with slope β
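The straight-line claims in steps 13 and 14 can be checked directly. The sketch below assumes the Weibull parameterization $S(t) = \exp[-(\lambda t)^{\beta}]$, with scale $\lambda$ and shape $\beta$. This parameterization is an assumption chosen here so that $\beta$ is the slope of the log-log plot; other texts parameterize the hazard directly.

```latex
% Constant hazard (exponential): h(u) = \lambda
-\log S(t) = \int_0^t \lambda \, du = \lambda t
\quad \text{(straight line through the origin with slope } \lambda \text{)}

% Weibull with shape \beta and scale \lambda
-\log S(t) = (\lambda t)^{\beta}
\;\Longrightarrow\;
\log\left[-\log S(t)\right] = \beta \log \lambda + \beta \log t

% Corresponding hazard
h(t) = \beta \lambda^{\beta} t^{\beta - 1}
\;\Longrightarrow\;
\log h(t) = \log\left(\beta \lambda^{\beta}\right) + (\beta - 1) \log t
```

Under this parameterization the log-log survival plot has slope $\beta$ in $\log t$, while $\log h(t)$ has slope $\beta - 1$. If a text instead writes $\log h(t) = a + b \log t$, the log-log plot has slope $b + 1$, so it is worth checking which convention is in use before reading a slope off the plot.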