SAS LAB ONE, April 20, 2004

Lab Objectives

After today's lab you should be able to:

1. Import data from Excel into SAS via point-and-click features as well as PROC IMPORT.
2. Manipulate and format date and time variables in SAS.
3. Put data into the correct structure for survival analysis: create time and censor variables.
4. Quickly examine univariate distributions and identify outliers via point-and-click features.
5. Produce enhanced graphs using PROC GPLOT.
   a. Use the TITLE, FOOTNOTE, SYMBOL, LABEL, AXIS, and LEGEND statements (all except LABEL are global statements).
   b. Use different symbols for different values of a classification variable (such as censored/failed).
   c. Overlay two plots.
   d. Directly edit and enhance graphs using point-and-click features.
   e. Export graphs as image files.
   f. Add an exponential regression line to a plot (using PROC LIFEREG).
   g. Know where to go for help on SAS/GRAPH: http://v8doc.sas.com/sashtml/
6. Produce the plots in Chapter 1 of your textbook (Applied Survival Analysis, by Hosmer and Lemeshow).
7. Produce a simple Kaplan-Meier curve (we will continue this in lab next week).
8. If time, complete the computer-based exercises at the end of Chapter 1 of your textbook.

We will produce the following plots today:

[Figures not reproduced.]

LAB EXERCISE STEPS: Follow along with the computer in front…

1. Save (to your desktop) the two Excel datasets "hmohiv.xls" and "whas.xls" from the HRP 262 website:
   http://www-stat.stanford.edu/~jtaylo//courses/stats262/spring.2004/index.html
2. Open SAS: from the desktop, double-click "Applications", then double-click the SAS icon.
3. Use point-and-click features to find the path of the folder for the library (where your datasets sit). Copy it.
4. Create a library and import the data via the following code (note: SAS ignores comments, which appear between /* and */ or between * and ;):

   Importing Data

    /*Create library hrp262*/
    libname hrp262 'PASTE YOUR FOLDER PATH HERE';

    /**Import data**/
    proc import out = work.hmohiv
        datafile = "M:\HRP 262\SAS datasets\hmohiv.xls"   /*replace with the location where you saved hmohiv.xls*/
        dbms = Excel2000 replace;
    run;

5. Import the data via point-and-click.
6. Fix the datetime variables enddate and startdate via the following code:

   Dealing with date-time variables

    /**Dates are automatically imported from Excel as datetime variables. We want to remove the time part.**/
    /**Values of a SAS date variable represent the number of days before or after Jan. 1, 1960.**/
    data hrp262.hmohiv;
        set hmohiv;
        format enddate date.;
        format startdate date.;
        enddate=datepart(enddate);
        startdate=datepart(startdate);
        Time=12*(enddate-startdate)/365.25;   *gives time in months;
        Time=round(Time);                      *to match Time variable in textbook;
    run;

   Reference: alternate date formats (shown for April 20, 2004):

    Format       Displays as
    date.        20APR04
    date9.       20APR2004
    day.         20
    dowName.     Tuesday
    dowName3.    Tue
    monName.     April
    monName3.    Apr
    month.       4
    year2.       04
    mmddyy6.     042004
    mmddyy8.     04/20/04
    mmddyy10.    04/20/2004
    weekdate.    Tuesday, April 20, 2004
    worddate.    April 20, 2004
    year.        2004

7. Use point-and-click features to check univariate distributions and plot age vs. time.
8. Recreate Figure 1.1 of Hosmer and Lemeshow's Applied Survival Analysis, p. 6. We'll start with the simplest version and add features as we go along. Use the following sets of code:

    /** Figure 1.1, Hosmer and Lemeshow's Applied Survival Analysis, p. 6, version 1: simple.**/
    /**Note specification of vertical and horizontal axis scales and use of the title statement**/
    goptions reset=all;   *resets graphing options;
    proc gplot data=hrp262.hmohiv;
        title1 'Figure 1.1, page 6: version 1';
        plot time*age / vaxis = 0 10 20 30 40 50 60
                        haxis = 15 20 25 30 35 40 45 50 55;
    run;
    quit;

    /** Figure 1.1, Hosmer and Lemeshow's Applied Survival Analysis, p.
    6, version 2: change symbol color, shape, and size; reduce minor tick marks to match textbook figure**/
    symbol1 value=circle color=red w=2 h=2;
    proc gplot data=hrp262.hmohiv;
        title1 'Figure 1.1, page 6: version 2';
        plot time*age / vaxis = 0 10 20 30 40 50 60 vminor=1
                        haxis = 15 20 25 30 35 40 45 50 55 hminor=1;
    run;
    quit;

   NOTE: Titles stay in effect until they are replaced by new ones or removed by entering a blank title: title1 ' ';

    /** Figure 1.1, Hosmer and Lemeshow's Applied Survival Analysis, p. 6, finished version: make axis labels match figure in textbook; divide symbols by classification variable, censor; add a legend**/
    goptions reset=all;
    axis1 order=(0 to 60 by 10) label=(height=4pct font='Times New Roman' angle=90);
    axis2 order=(15 to 55 by 10) label=(height=4pct font='Times New Roman');
    symbol1 v=circle c=blue h=1 w=1;
    symbol2 value=X color=red h=1 w=1;
    legend1 label=('Censor') shape=symbol(1,2) position=(bottom center outside);
    proc gplot data=hrp262.hmohiv;
        title1 'Figure 1.1, page 6';
        label time='Survival Time (Months)';
        label Age='Age';
        plot time*age=censor / vaxis=axis1 haxis=axis2 vminor=1 hminor=1 legend=legend1;
    run;
    quit;

   NOTE: LABEL statements used within a PROC step are in effect only for the duration of that PROC.

9. Recreate Figure 1.2 of Hosmer and Lemeshow's Applied Survival Analysis, p. 7, via the following SAS code (it reuses the axis1 and axis2 definitions from the finished version of Figure 1.1):

    /** Figure 1.2 in H&L, p. 7**/
    /*create temporary dataset new with inverse age variable*/
    data new;
        set hrp262.hmohiv;
        ageinv=1000/age;
    run;

    proc gplot data=new;
        title1 'Figure 1.2, page 7';
        label time='Survival Time (Months)';
        label ageinv='1000/Age';
        plot time*ageinv=censor / vaxis=axis1 haxis=axis2 vminor=1 hminor=1;
    run;
    quit;

10. Recreate Figure 1.4 of Hosmer and Lemeshow's Applied Survival Analysis, p. 19, via the following SAS code:

    /**Generate graph Figure 1.4, p.19.
    Note the use of the overlay option to graph two X values for each individual**/
    goptions reset=all;
    axis1 order=(0 to 4 by 1) label=(height=4pct font='Times New Roman' angle=90) minor=none;
    axis2 label=(height=4pct font='Times New Roman') minor=none;
    symbol1 v=circle c=blue h=1 w=1;
    symbol2 value=X color=red h=1 w=1;
    proc gplot data=hrp262.hmohiv;
        title1 'Figure 1.4, page 19';
        label ID='Subject';
        label StartDate='Calendar Time';
        plot ID*StartDate ID*EndDate / overlay vaxis=axis1 haxis=axis2;
        where id le 4;   *we only want 4 individuals;
    run;
    quit;

11. Right-click on the graph. Select the edit graph feature. Add horizontal connecting lines to complete the graph.
12. Exit the Edit screen and save changes to the graph. Right-click again to save the graph as a .jpg or .gif file. Open a new Word document and insert the graph into it as a picture.
13. Recreate Figure 1.5, p. 20, via the following SAS code (axis1 is carried over from step 10):

    axis2 order=(0 to 12 by 3) label=(height=4pct font='Times New Roman');
    symbol1 v=circle c=blue h=1 w=1;
    symbol2 value=X color=red h=1 w=1;
    proc gplot data=hrp262.hmohiv;
        title1 'Figure 1.5, page 20';
        label ID='Subject';
        label Time='Time in Months';
        plot ID*Time / haxis=axis2 vaxis=axis1;
        where id le 4;
    run;
    quit;

14. Right-click on the graph. Select the edit graph feature. Add horizontal connecting lines to complete the graph.
15. Exit the Edit screen and save changes to the graph. Right-click again to save the graph as a .jpg or .gif file. Add the graph to your open Word file.
16. Fit an exponential regression model as in Table 1.2, p. 14 of Hosmer and Lemeshow, using PROC LIFEREG:

    /*Fits the exponential regression model as on p. 14 of H&L*/
    proc lifereg data=hrp262.hmohiv outest=a;
        title 'Table 1.2, p.14';
        model time*censor(0) = age / dist=exponential;   *censor(0): observations with censor=0 are censored;
        output out=OutData p=median;                      *gives predicted median survival times (P= defaults to the 0.5 quantile);
    run;

17. Plot predicted survival times as a function of age on your scatterplot of Time vs.
Age:

    goptions reset=all;
    axis1 order=(0 to 60 by 10) label=(height=4pct font='Times New Roman' angle=90);
    axis2 order=(15 to 55 by 10) label=(height=4pct font='Times New Roman');
    axis3 order=(0 to 60 by 60) minor=none major=none label=(' ' angle=90);   *right-hand axis for PLOT2, on the same 0-60 scale as axis1;
    symbol1 v=circle c=blue h=1 w=1;
    symbol2 value=X color=red h=1 w=1;
    symbol3 v=none c=black w=2 i=join line=1;   *joins the predicted medians with a line;
    legend1 frame cframe=ligr cborder=black position=center;

    /*Sorting data by the independent variable is critical here*/
    proc sort data=OutData;
        by age;
    run;

    /*Overlays the fitted exponential medians on the scatterplot: Figure 1.3, p. 16 of H&L*/
    proc gplot data=OutData;
        title1 'Figure 1.3, page 16';
        label time='Survival Time (Months)';
        label Age='Age';
        plot time*age=censor / vaxis=axis1 haxis=axis2 vminor=1 hminor=1 legend=legend1;
        plot2 median*age / overlay vaxis=axis3;
    run;
    quit;

18. Remove the PROC SORT step from the above SAS code and rerun. What happens?
19. Change the symbol statements in the above SAS code as follows and rerun. What happens?

    symbol1 v=star c=orange h=1 w=1;
    symbol2 value=& color=green h=2 w=2;
    symbol3 v=none c=black w=2 i=join line=5;

20. Plot the Kaplan-Meier survival curve for the hmohiv data: Figure 2.2, p. 34.

    /**Kaplan-Meier estimate of survivorship function**/
    proc lifetest data=hrp262.hmohiv;
        time time*censor(0);
        title 'Kaplan-Meier Estimates for HMO HIV data';
    run;

    /*Plot this curve*/
    goptions reset=all;
    proc lifetest data=hrp262.hmohiv plots=(s) graphics censoredsymbol=none;
        time time*censor(0);
        title 'Figure 2.2, p. 34';
        symbol v=none;
    run;

21. Obtain a cross-tabulation of DRUG and the censoring variable CENSOR, and compute the percent dead and percent censored in each drug group (the row percentages).

    /**Proc freq by drug**/
    proc freq data=hrp262.hmohiv;
        tables drug*censor / chisq;
    run;

22. Run a logistic regression comparing the outcome (censor) in the two DRUG groups.

    proc logistic data=hrp262.hmohiv descending;   *descending models the probability that censor=1;
        model censor=drug;
    run;

23. Plot the Kaplan-Meier survival curve for the hmohiv data by drug group.

    /**Figure 2.7, p. 58**/
    proc lifetest data=hrp262.hmohiv plots=(s) graphics censoredsymbol=none;
        time time*censor(0);
        title 'Figure 2.7, p. 58';
        label s='survival probability';
        strata drug;
    run;

24. If there's time (or as optional homework), import the whas data into SAS and complete exercises 1(a), 1(d), and 1(e) using the coding strategies we learned today.
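For step 24, a possible starting point is to repeat the import pattern from step 4 and then take a first look at the data before recoding anything. The sketch below assumes whas.xls was saved in the same folder as hmohiv.xls (the path is a placeholder to replace with your own) and that the hrp262 library from step 4 is still assigned. It deliberately does not assume any whas variable names.

```sas
/* Sketch for step 24. Assumption: whas.xls sits in the same folder as
   hmohiv.xls; the path below is a placeholder, so edit it for your machine. */
proc import out=work.whas
            datafile="M:\HRP 262\SAS datasets\whas.xls"
            dbms=excel2000 replace;
run;

/* List variable names, types, and formats. Dates imported from Excel come in
   as datetime values, so look for DATETIME formats that need DATEPART() as in step 6. */
proc contents data=work.whas;
run;

/* Code alternative to the point-and-click checks in step 7: with no VAR
   statement, PROC MEANS summarizes every numeric variable, and unusual
   MIN or MAX values point to possible outliers. */
proc means data=work.whas n nmiss min max mean;
run;

/* Save a permanent copy in the hrp262 library created in step 4. */
data hrp262.whas;
    set work.whas;
run;
```

Once you know which variables hold the follow-up time and the vital status, the data step from step 6 and the PROC GPLOT code from step 8 should carry over with the variable names changed.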