SAS Example A10 data sales; infile “U:\Documents\...\sales.txt”; input Region : $8. State $2. +1 Month monyy5. Headcnt Revenue Expenses; format Month monyy5. Revenue dollar12.2; run; proc sort; by Region State Month; run; proc print; run; Mervyn Marasinghe proc print label; by Region State ; format Expenses dollar10.2 ; label State= State Month= Month Revenue=“Sales Revenue” Expenses=“Overhead Expenses”; id Region State; sum Revenue Expenses; sumby Region; title “ Sales report by State and Region”; run; 2 OutputDeliverySystem(ODS) Sample Data Set sales.txt SOUTHERN FL JAN78 10 10000 8000 SOUTHERN FL FEB78 10 11000 8500 SOUTHERN FL MAR78 9 13500 9800 SOUTHERN GA JAN78 5 8000 2000 SOUTHERN GA FEB78 7 6000 1200 PLAINS NM MAR78 2 500 1350 NORTHERN MA MAR78 3 1000 1500 NORTHERN NY FEB78 4 2000 4000 NORTHERN NY MAR78 5 5000 6000 EASTERN NC JAN78 12 20000 9000 EASTERN NC FEB78 12 21000 8990 EASTERN NC MAR78 12 20500 9750 EASTERN VA JAN78 10 15000 7500 EASTERN VA FEB78 10 15500 7800 EASTERN VA MAR78 11 16600 8200 CENTRAL OH JAN78 13 21000 12000 CENTRAL OH FEB78 14 22000 13000 CENTRAL OH MAR78 14 22500 13200 CENTRAL MI JAN78 10 10000 8000 CENTRAL MI FEB78 9 11000 8200 CENTRAL MI MAR78 10 12000 8900 CENTRAL IL JAN78 4 6000 2000 CENTRAL IL FEB78 4 6100 2000 CENTRAL IL MAR78 4 6050 2100 ODS allows flexibility in presenting output from SAS procedures ODS can arrange output in more presentable ways It also can create output in a variety of formats (called destinations) Examples of currently available ODS destinations: LISTING: produces traditional monospace output PS or PDF: output that is formatted for a high- resolution printer such as PostScript or PDF HTML: output that is formatted in various markup languages such as HTML RTF: output that is formatted for use with Microsoft Word Traditionally, results of a SAS procedure were displayed in a the output window in the LISTING format. Use the Results tab in the Preferences window to change HTML to LISTING format 3 SAS ODS Example 1 Current Default Settings in SAS 9.3 The default destination in the SAS windowing environment is HTML (changed from traditional LISTING destination) data oranges; The default style for the HTML destination is HTMLBlue datalines; input Variety $ Flavor texture Looks; Total=Flavor+Texture+Looks; navel 9 8 6 This new style is an all-color style that is designed to integrate tables and modern statistical graphics temple 7 7 7 valencia 8 9 9 In the past, SAS users used SAS/Graph to produce high quality graphics. We’ll call these traditional SAS graphics. Template-based graphics produced by ODS Graphics (different from traditional SAS graphics) is now used for producing graphs is most SAS procs Previously, in the SAS Windowing environment, ODS GRAPHICS ON statement was used to enable (and ODS GRAPHICS OFF statement to disable) ODS Graphics mandarin 5 7 8 ; proc sort data=oranges; by descending Total; run; ods rtf file=“U:\Documents\...\oranges.rtf"; proc print data=oranges; title 'Taste Test Results for Oranges'; run; ods rtf close; 6 PROC MEANS (continued) PROC MEANS Statistics: PROC MEANS options ; N, NMISS, MEAN, STD, MIN, MAX, RANGE, SUM, VAR, USS, CSS, CV, STDERR, T, PRT, SUMWGT CLASS variables ; VAR variables ; TYPES request(s); WAYS list; BY variables ; FREQ variable ; WEIGHT variable ; ID variables ; OUTPUT OUT=SAS_dataset statistics ; Example of OUTPUT statement ; output out=stats mean=Av_Ht Av_Wt stderr=SE_Ht SE_Wt; Options: DATA= , PRINT, MAXDEC= , FW= , MISSING, NWAY, IDMIN, DESCENDING, ORDER= , VARDEF= , statistics MAXDEC = n no. of decimal places (2) FW = n field width (12) For printing computed statistics ORDER=FREQ, DATA, INTERNAL, FORMATTED VARDEF=DF, N, WGT, WDF 7 8 PROC UNIVARIATE SAS Example B5 data biology; input Id Sex $ Age Year Height Weight; datalines; 7389 M 24 4 69.2 132.5 3945 F 19 2 58.5 112.0 4721 F 20 2 65.3 98.6 …………………………….. …………………………….. 6327 M 20 1 70.2 135.4 8472 F 20 4 66.8 142.6 4875 M 20 1 74.2 160.4 ; PROC UNIVARIATE options ; proc means data=biology fw=8 maxdec=3; class Year Sex; var Height Weight; output out=stats mean=Av_Ht Av_Wt stderr=SE_Ht SE_Wt; run; Some Options: DATA = , NOPRINT, PLOT, FREQ, NORMAL, ALPHA=, CIBASIC (TYPE= ALPHA=), MU0= , TRIM= (TYPE= ALPHA=) , PCTLDEF=, VARDEF=, VAR variables ; CLASS variable-1 <(v-options)> < variable-2 <(v- options)> > ...< / KEYLEVEL= value1 | ( value1 value2 ) >; BY variables ; ID variable ; OUTPUT < OUT=SAS-data-set > < keyword_1=names...keyword_k=names > < percentile-options >; PCTLDEF=1,2,3,4 OR 5 (5 methods of computing percentiles) VARDEF=DF, N, WEIGHT, WDF (divisor for computing variance) proc print data=stats; title “Biology Class Data Set: Output Statement”; run; NORMAL computes the Shapiro-Wilk statistic W if n 2000 or the Kolmogorov-Smirnov statistic D if n > 2000 10 PROC UNIVARIATE (continued) SAS ODS Example 2 TYPE= LOWER, UPPER, TWOSIDED data biology; input Id Sex $ Age Year Height Weight; datalines; 7389 M 24 4 69.2 132.5 3945 F 19 2 58.5 112.0 4721 F 20 2 65.3 98.6 …………………………….. …………………………….. 6327 M 20 1 70.2 135.4 8472 F 20 4 66.8 142.6 4875 M 20 1 74.2 160.4 ; (specify type of confidence intervals) TRIM= list of integers or fractions specifying amount of trimming ALPHA=.05 (for both CIBASIC and TRIM specifications) Class Options: The v-options are MISSING or ORDER= ORDER = FREQ, DATA, INTERNAL, FORMATTED specifies how levels of the variable are ordered in the output Some Keywords for use with OUTPUT statement: ods rtf file=“U:\Documents\...\progB2_out.rtf"; ods select BasicMeasures Quantiles TestsForNormality; N, NMISS, NOBS, MEAN, SUM, SD, VAR, SKEWNESS, KURTOSIS, SUMWGT, MAX, MIN, RANGE, Q3, MEDIAN, Q1, ORANGE, P1, P5, P10, P90, P95, P99, MODE, SIGNRANK, NORMAL, PCTLNAME= , PCTLPTS= , PCTLPRE= proc univariate data=biology Normal; var Height; title “Biology class: Analysis of Height Distribution”; run; Example: proc univariate data=survey; class county; var acreage rainfall; output out=new mean=ave1 ave2 var= var1 var2; ods rtf close; 11 PROC TABULATE SAS Example C12 (simplified) data rainfall; PROC TABULATE options ; input Control Seeded @@; Logseed = log10(seeded); label Logseed="Log of Seeded Rainfall"; datalines; 1202.6 2745.6 830.1 1697.8 372.4 1656.0 345.5 978.0 321.2 703.4 244.3 489.1 163.0 430.0 147.8 334.1 95.0 302.8 87.0 274.7 81.2 274.7 68.5 255.0 47.3 242.5 41.1 200.7 36.6 198.6 29.0 129.6 28.6 119.0 26.3 118.3 26.1 115.3 24.4 92.4 21.7 40.6 17.3 32.7 11.5 31.4 4.9 17.5 4.9 7.7 1.0 4.1 CLASS variable ; VAR variable ; FREQ variable ; WEIGHT variable ; BY variable ; TABLE expression, expression, … / options ; KEYLABEL keyword=‘text’ … ; Options: DATA= , MISSING, FORMAT= , ORDER= , FORMCHAR( ) = ‘ ’, NOSEPS, DEPTH= ; proc univariate data=rainfall; var Logseed; probplot Logseed/normal(mu=est sigma=est); FORMAT = any valid SAS or user defined format (BEST12.2) run; ORDER = FREQ, DATA, INTERNAL,FORMATTED FORMCHAR=‘|- - - -| + | - - -’ DEPTH = 10 14 PROC TABULATE (continued) PROC TABULATE (continued) TABLE statement Examples of TABLE statements. TABLE expression 1, expression 2, expression 3 /options; class variable defines defines pages rows defines analysis variables TABLE PRODUCT, REGION, SALETYPE*(QUANTITY AMOUNT); columns ‘Expression’ consists of: page expression Classification variables (CLASS variables) Analysis variables (variables in VAR) Statistics (N, MEAN, STD, MIN, MAX, etc.) Format specifications Other expressions row expression column expression In the examples below, the required table format is specified with a TABLE statement and the output produced by different TABLE statements are sketched. Example Expressions: Assume we have the variables A,B,C,D,X and Y defined in SAS statements as follows: CLASS A B C D ; VAR X Y ; The TABLE statement used appears on top of the table produced. A*B A*B*C A*MEAN*X A B A B*C (A B)*C 15 16 PROC TABULATE Examples Examples of TABLE statement Example 1: TABLE REGION,POP ; First we will consider only three variables: REGION, CITYSIZE, and POP where REGION and CITYSIZE are CLASS variables and POP is an analysis variable (numerical, continuous-valued). Each observation in the data contains data for a single city for many cities in several regions. Variable Description REGION code for region of the country CITYSIZE code for relative population size (S=small, M=medium, L=large) POP urban population POP SUM REGION NC 4650000.00 NE 6666000.00 SO 6864000.00 WE 8376000.00 Example 2: TABLE REGION,CITYSIZE*POP *SUM ; CITYSIZE Most applications like this need both CLASS and VAR statements in the PROC TABULATE step in addition to the TABLE statement: L M S POP POP POP SUM SUM SUM PROC TABULATE; REGION TITLE “ Population by Region”; NC 3750000.00 750000.00 15000.00 CLASS REGION ; NE 5022000.00 1422000.00 222000.00 VAR POP ; SO 4488000.00 2088000.00 288000.00 WE 5592000.00 2592000.00 192000.00 17 18 Example 4: TABLE REGION*CITYSIZE*(SUM MEAN),POP ; Example 3: TABLE REGION*CITYSIZE,POP*SUM ; POP REGION CITYSIZE NC L SUM M SUM 750000.00 MEAN 125000.00 SUM 150000.00 MEAN SUM REGION CITYSIZE NC L 3750000.00 M 750000.00 S 150000.00 L 5022000.00 M 1422000.00 S 222000.00 L 4488000.00 M 2088000.00 NE SO WE S 288000.00 L 5592000.00 M 2592000.00 S 192000.00 POP S MEAN NE L SUM M SUM MEAN S 25000.00 5022000.00 837000.00 1422000.00 237000.00 SUM 222000.00 L SUM M SUM MEAN S 625000.00 MEAN MEAN SO 3750000.00 37000.00 4488000.00 748000.00 2088000.00 MEAN 348000.00 SUM 288000.00 MEAN 48000.00 WE 19 20 Next, add four additional variables to the example: Example 5: TABLE PRODUCT, REGION CITYSIZE,SALETYPE*(QUANTITY AMOUNT); PRODUCT A100 PRODUCT and SALETYPE are class variables: SALETYPE R QUANTITY and AMOUNT are analysis variables. W QUANTITY AMOUNT QUANTITY AMOUNT SUM SUM SUM SUM Modified proc step: REGION NC 1250.00 31250.00 1250.00 25000.00 PROC TABULATE; NE 1600.00 40000.00 1600.00 32000.00 SO 1880.00 47000.00 1880.00 37600.00 WE 1840.00 46000.00 1840.00 36800.00 63800.00 TITLE “ Sales by Region and City Size”; CLASS REGION PRODUCT SALETYPE ; CITYSIZE L 3190.00 79750.00 3190.00 VAR POP QUANTITY AMOUNT; M 2440.00 61000.00 2440.00 48800.00 S 940.00 23500.00 940.00 18800.00 The report produced by the following TABLE statement contains a separate table for each PRODUCT and provides an analysis of wholesale and retail sales by REGION and by CITYSIZE. PRODUCT A200 SALETYPE R W QUANTITY AMOUNT QUANTITY AMOUNT SUM SUM SUM SUM REGION NC 21 1295.00 32375.00 1295.00 25900.00 22