Lesson 8 - Topics

• Creating SAS datasets from procedures
• Using ODS and data steps to make reports
• Using PROC RANK
• Programs 14-15 in course notes
• LSB 4:11; 5:3

Making SAS Datasets From Procedures

Output from SAS PROCs can be put into SAS datasets:

1. To do further processing of the information from the output
2. To reformat output to make a report
3. To restructure the original SAS dataset or create new variables

Ways to Put Output into SAS Datasets

• Using the OUTPUT statement, available in many procedures
• Using the ODS OUTPUT statement: any output table can be put into a SAS dataset

Report We Want to Generate

Quartiles of Weight by Gender and Center

    sex       clinic     N      P25      P50      P75
    Male      A          9    180.0    190.0    208.0
    Male      B         16    158.3    174.8    218.3
    Male      C         29    178.0    199.5    220.5
    Male      D         11    172.0    184.5    194.0
    Female    A          6    125.0    143.5    160.5
    Female    B          9    150.0    164.5    184.0
    Female    C          6    132.5    134.3    138.5
    Female    D          6    131.0    137.5    148.5

Program 14

    LIBNAME class 'C:\SAS_Files';

    * Will use SAS dataset version of TOMHS data;
    DATA wt;
      SET class.tomhsp (KEEP=ptid age sex clinic wtbl wt12);
      wtchg = wt12 - wtbl;
    RUN;

    PROC FORMAT;
      VALUE sexF 1 = 'Male' 2 = 'Female';
    RUN;

    * Create report by sex and clinic of univariate info;
    PROC SORT DATA = wt;
      BY sex clinic;
    RUN;

    PROC UNIVARIATE DATA = wt NOPRINT;
      BY sex clinic;
      VAR wt12;
      OUTPUT OUT = univinfo      /* name of new dataset            */
             N      = n          /* statistic name = variable name */
             Q1     = p25
             MEDIAN = p50
             Q3     = p75;
    RUN;

Dataset univinfo will have one observation for each combination of sex and clinic.
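The OUTPUT statement is not specific to PROC UNIVARIATE. As a minimal sketch, the same quartiles could be requested from PROC MEANS, assuming the dataset wt from Program 14. The dataset name meansinfo is hypothetical and is not part of the course programs.

```sas
* Sketch only: meansinfo is a hypothetical dataset name;
* NWAY keeps only the rows for each sex-by-clinic combination;
PROC MEANS DATA = wt NWAY NOPRINT;
  CLASS sex clinic;
  VAR wt12;
  OUTPUT OUT = meansinfo
         N      = n
         Q1     = p25
         MEDIAN = p50
         Q3     = p75;
RUN;

* The output dataset also carries the automatic variables _TYPE_ and _FREQ_;
PROC PRINT DATA = meansinfo;
  FORMAT sex sexF.;
RUN;
```

With a CLASS statement the data do not need to be sorted first, which is one reason this form may be convenient.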
    PROC PRINT DATA = univinfo;
      FORMAT sex sexF.;
    RUN;

    Obs    sex       clinic     n       p75       p50       p25
      1    Male      A          9    208.00    190.00    180.00
      2    Male      B         16    218.25    174.75    158.25
      3    Male      C         29    220.50    199.50    178.00
      4    Male      D         11    194.00    184.50    172.00
      5    Female    A          6    160.50    143.50    125.00
      6    Female    B          9    184.00    164.50    150.00
      7    Female    C          6    138.50    134.25    132.50
      8    Female    D          6    148.50    137.50    131.00

    PROC PRINT DATA = univinfo NOOBS;
      VAR sex clinic n p25 p50 p75;
      FORMAT sex sexF. p25 p50 p75 6.1;
      TITLE 'Quartiles of Weight by Gender/Center';
    RUN;

    Quartiles of Weight by Gender/Center

    sex       clinic     n      p25      p50      p75
    Male      A          9    180.0    190.0    208.0
    Male      B         16    158.3    174.8    218.3
    Male      C         29    178.0    199.5    220.5
    Male      D         11    172.0    184.5    194.0
    Female    A          6    125.0    143.5    160.5
    Female    B          9    150.0    164.5    184.0
    Female    C          6    132.5    134.3    138.5
    Female    D          6    131.0    137.5    148.5

Using ODS to Send Output to a SAS Dataset

Syntax:

    ODS OUTPUT output-table = new-data-set;

    * Output quantile table to a dataset;
    ODS OUTPUT quantiles = qwt;
    PROC UNIVARIATE DATA = wt;
      VAR wtbl wt12;
    RUN;
    ODS OUTPUT CLOSE;

    PROC PRINT DATA = qwt;
    RUN;

Display of Output Dataset

    Obs    VarName    Quantile      Estimate
      1    wtbl       100% Max        279.30
      2    wtbl       99%             274.15
      3    wtbl       95%             246.40
      4    wtbl       90%             237.40
      5    wtbl       75% Q3          215.15
      6    wtbl       50% Median      192.65
      7    wtbl       25% Q1          165.90
      8    wtbl       10%             141.50
      9    wtbl       5%              137.40
     10    wtbl       1%              130.25
     11    wtbl       0% Min          128.50
     12    wt12       100% Max        271.50
     13    wt12       99%             271.50
     14    wt12       95%             239.00
     15    wt12       90%             227.00
     16    wt12       75% Q3          202.50
     17    wt12       50% Median      180.00
     18    wt12       25% Q1          153.50
     19    wt12       10%             133.00
     20    wt12       5%              130.00
     21    wt12       1%              123.00
     22    wt12       0% Min          123.00

We would like to put the wtbl and wt12 quantiles side by side.

    * Separate the data into 2 datasets;
    DATA wtbl wt12;
      SET qwt;
      IF varname = 'wtbl' THEN OUTPUT wtbl;
      ELSE IF varname = 'wt12' THEN OUTPUT wt12;
    RUN;

    * PROC DATASETS used for changing variable names;
    PROC DATASETS;
      MODIFY wtbl;
        RENAME estimate = wtbl;
      MODIFY wt12;
        RENAME estimate = wt12;
    RUN;
    QUIT;

    * Put 2 datasets side-by-side (one-to-one merge, no BY statement);
    DATA all;
      MERGE wtbl wt12;
      DROP varname;
    RUN;

    PROC PRINT DATA = all;
    RUN;

    Obs    Quantile        wtbl      wt12
      1    100% Max      279.30    271.50
      2    99%           274.15    271.50
      3    95%           246.40    239.00
      4    90%           237.40    227.00
      5    75% Q3        215.15    202.50
      6    50% Median    192.65    180.00
      7    25% Q1        165.90    153.50
      8    10%           141.50    133.00
      9    5%            137.40    130.00
     10    1%            130.25    123.00
     11    0% Min        128.50    123.00

Collecting Output from Several PROC Steps

    * PERSIST=PROC keeps the ODS OUTPUT request active across PROC steps,
      so the estimates from both regressions end up in dataset betas;
    * dbpchg and sbpchg are not created in the Program 14 DATA step shown
      above, so they must already have been added to dataset wt;
    ODS OUTPUT ParameterEstimates (PERSIST=PROC) = betas;

    PROC REG DATA = wt;
      MODEL dbpchg = wtchg age sex;
    RUN;

    PROC REG DATA = wt;
      MODEL sbpchg = wtchg age sex;
    RUN;

    ODS OUTPUT CLOSE;

    PROC PRINT DATA = betas;
    RUN;

Display of Output Dataset - Report

    Obs    Dependent    Variable     Estimate    StdErr    tValue    Probt
      1    dbpchg       Intercept      -0.059     6.431     -0.01     0.99
      2    dbpchg       wtchg           0.175     0.073      2.38     0.02
      3    dbpchg       age            -0.101     0.112     -0.91     0.37
      4    dbpchg       sex            -2.622     1.572     -1.67     0.10
      5    sbpchg       Intercept      -3.849    13.304     -0.29     0.77
      6    sbpchg       wtchg           0.364     0.152      2.40     0.02
      7    sbpchg       age            -0.042     0.231     -0.18     0.86
      8    sbpchg       sex            -4.118     3.253     -1.27     0.21

Display of Output Dataset Using BY Statement

    PROC PRINT DATA = betas;
      VAR variable estimate stderr tvalue probt;
      BY dependent NOTSORTED;
      FORMAT estimate 7.3 stderr 7.3 probt pvalue5.2;
    RUN;

    Dependent=dbpchg

    Obs    Variable     Estimate    StdErr    tValue    Probt
      1    Intercept      -0.059     6.431     -0.01     0.99
      2    wtchg           0.175     0.073      2.38     0.02
      3    age            -0.101     0.112     -0.91     0.37
      4    sex            -2.622     1.572     -1.67     0.10

    Dependent=sbpchg

    Obs    Variable     Estimate    StdErr    tValue    Probt
      5    Intercept      -3.849    13.304     -0.29     0.77
      6    wtchg           0.364     0.152      2.40     0.02
      7    age            -0.042     0.231     -0.18     0.86
      8    sex            -4.118     3.253     -1.27     0.21

PROC RANK

• Used to divide observations into equal-size categories based on the values of a variable
• Creates a new variable containing the categories
• New variable is added to the dataset or to a new dataset
• Example: divide weight change into 5 equal categories (quintiles)

PROC RANK SYNTAX

    PROC RANK DATA = dataset OUT = outdataset GROUPS = number-of-categories;
      VAR varname;
      RANKS newvarname;
    RUN;

Most of the time you can set OUT to be the same dataset specified in DATA.
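The quintile example mentioned in the bullets above could look like the following sketch. It assumes the dataset wt from Program 14, which contains wtchg. The names wtq and qwtchg are hypothetical and chosen only for illustration.

```sas
* Sketch only: wtq and qwtchg are hypothetical names;
* GROUPS=5 assigns the values 0, 1, 2, 3, 4 to the new variable;
PROC RANK DATA = wt OUT = wtq GROUPS = 5;
  VAR wtchg;
  RANKS qwtchg;
RUN;

* Check that the five groups are roughly equal in size;
* observations with missing wtchg get a missing qwtchg;
PROC FREQ DATA = wtq;
  TABLES qwtchg;
RUN;
```

Writing to a separate dataset (wtq) rather than back to wt is a cautious choice here, since it leaves the original dataset unchanged if the step needs to be rerun with a different number of groups.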
PROC RANK writes no printed output; it only creates the output dataset.

PROGRAM 15

    LIBNAME class 'C:\SAS_Files';

    DATA wtchol;
      SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
      wtchg   = wt12 - wtbl;
      cholchg = chol12 - cholbl;
    RUN;

    * This PROC adds a new variable to the dataset: the tertile of
      weight change. The new variable takes the values 0, 1, or 2;
    PROC RANK DATA = wtchol GROUPS = 3 OUT = wtchol;
      VAR wtchg;
      RANKS twtchg;     /* name of new variable */
    RUN;

PARTIAL LOG

    DATA wtchol;
      SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
      wtchg = wt12 - wtbl;
      cholchg = chol12 - cholbl;
    RUN;

    NOTE: There were 100 observations read from the data set CLASS.TOMHSP.
    NOTE: The data set WORK.WTCHOL has 100 observations and 9 variables.

    PROC RANK DATA = wtchol GROUPS=3 OUT = wtchol;
      VAR wtchg; RANKS twtchg;
    RUN;

    NOTE: The data set WORK.WTCHOL has 100 observations and 10 variables.

    PROC FREQ DATA = wtchol;
      TABLES twtchg;
    RUN;

OUTPUT:

                      Rank for Variable wtchg

                                       Cumulative    Cumulative
    twtchg    Frequency     Percent     Frequency      Percent
    -----------------------------------------------------------
         0          31       33.70            31        33.70
         1          30       32.61            61        66.30
         2          31       33.70            92       100.00

                      Frequency Missing = 8

    PROC PRINT DATA = wtchol (OBS=20);
      VAR ptid wtchg twtchg;
      TITLE 'Partial Listing of Dataset wtchol with new variable added';
    RUN;

    Partial Listing of Dataset wtchol with new variable added

    Obs    PTID        wtchg    twtchg
      1    A00083     -12.00       1
      2    A00301        .         .
      3    A00312      -9.50       1
      4    A00354     -21.00       0
      5    A00400        .         .
      6    A00504      -9.25       1
      7    A00608        .         .
      8    A00720     -18.50       0
      9    A00762      -5.25       2
     10    A00811      -6.75       1

    PROC MEANS DATA = wtchol N MEAN MIN MAX MAXDEC=2;
      VAR cholchg wtchg;
      CLASS twtchg;
      TITLE 'Mean Cholesterol Change by Tertile of Weight Change';
    RUN;

    Mean Cholesterol Change by Tertile of Weight Change

    The MEANS Procedure

    Rank for
    Variable      N
    wtchg       Obs    Variable     N       Mean    Minimum    Maximum
    -------------------------------------------------------------------
           0     31    cholchg     30     -13.43     -55.00      47.00
                       wtchg       31     -22.51     -36.50     -14.30

           1     30    cholchg     30      -4.70     -37.00      26.00
                       wtchg       30     -10.21     -14.00      -6.80

           2     31    cholchg     31      -0.74     -52.00      45.00
                       wtchg       31      -1.82      -6.50      13.00
    -------------------------------------------------------------------

The Minimum and Maximum of wtchg within each group show the cutpoints for the tertiles.

Could graph this data in an x-y plot (3 points): mean cholesterol change against mean weight change for each tertile.
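One way to draw the three-point plot suggested above is to save the tertile means with an OUTPUT statement and then plot them. This is a sketch under assumptions: it uses the dataset wtchol from Program 15 after PROC RANK has added twtchg, the names tmeans, mwtchg, and mcholchg are hypothetical, and PROC SGPLOT requires SAS 9.2 or later.

```sas
* Sketch only: tmeans, mwtchg, and mcholchg are hypothetical names;
* Save the mean of each variable within each tertile of weight change;
PROC MEANS DATA = wtchol NWAY NOPRINT;
  CLASS twtchg;                        /* missing tertiles are excluded */
  VAR wtchg cholchg;
  OUTPUT OUT = tmeans MEAN = mwtchg mcholchg;
RUN;

* Plot the three (mean weight change, mean cholesterol change) points;
PROC SGPLOT DATA = tmeans;
  SERIES X = mwtchg Y = mcholchg / MARKERS;
  XAXIS LABEL = 'Mean weight change';
  YAXIS LABEL = 'Mean cholesterol change';
  TITLE 'Mean Cholesterol Change by Tertile of Weight Change';
RUN;
```

This reuses the lesson's first technique (an OUTPUT statement creating a dataset from a procedure) to feed a second procedure.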