SAS Global Forum 2008 Applications Development Paper 079-2008 A Step-by-Step Introduction to PROC REPORT David Lewandowski, Thomson Healthcare, Evanston, IL ABSTRACT Have you read the description of PROC REPORT in the SAS® manual and been left scratching your head wondering where to start? Then, here’s a step by step introduction. It walks through the basics, one feature at a time, so you can see each one’s impact on the report. When you’re finished, you will already know how to create ninety percent of the reports you need. And more importantly, you will have a context so you can go back to the manual and learn the advanced features. INTRODUCTION The syntax of PROC REPORT is different from all the other procedures and many of us found the manual less than helpful in learning its unique statements. Most of us turned to a colleague and asked him to explain PROC REPORT. We got a brief introduction and a set of sample code that we’ve been modifying ever since. This paper will give you the same thing—a simple introduction and lots of sample code to enhance as needed. STEP 1: CREATE A SMALL DATASET FOR REPORTING To start off, let’s create a dataset to use for reporting—say monthly wine sales by zip code and county. Here’s the program and the resulting listing: data mnthly_sales; length zip $ 5 cty $ 8 var $ 10; input zip $ cty $ var $ sales; label zip="Zip Code" cty="County" var="Variety" sales="Monthly Sales"; datalines; 52423 Scott Merlot 186. 52423 Scott Chardonnay 156.61 52423 Scott Zinfandel 35.5 52423 Scott Merlot 55.3 52388 Scott Merlot 122.89 52388 Scott Chardonnay 78.22 52388 Scott Zinfandel 15.4 52200 Adams Merlot 385.51 52200 Adams Chardonnay 246 52200 Adams Zinfandel 151.1 52200 Adams Chardonnay 76.24 52199 Adams Merlot 233.03 52199 Adams Chardonnay 185.22 52199 Adams Zinfandel 95.84 ; Raw Data Obs ZIP CTY 1 2 3 4 5 6 7 8 9 10 11 12 13 14 52423 52423 52423 52423 52388 52388 52388 52200 52200 52200 52200 52199 52199 52199 Scott Scott Scott Scott Scott Scott Scott Adams Adams Adams Adams Adams Adams Adams VAR Merlot Chardonnay Zinfandel Merlot Merlot Chardonnay Zinfandel Merlot Chardonnay Zinfandel Chardonnay Merlot Chardonnay Zinfandel SALES 186.00 156.61 35.50 55.30 122.89 78.22 15.40 385.51 246.00 151.10 76.24 233.03 185.22 95.84 proc print data=mnthly_sales; title "Raw Data"; run; SYNTAX Next, let me describe PROC REPORT’s syntax. The COLUMN statement is used to list each report column. Each column, in turn, has a DEFINE statement that describes how that column is created and formatted. You use the TITLE statement to specify the title at the top of each page. PROC REPORT DATA=datasetname <options>; TITLE title text; COLUMN variable list and column specifications; DEFINE column / define type and column attributes; DEFINE column / define type and column attributes; ... RUN; 1 SAS Global Forum 2008 Applications Development STEP 2: A SIMPLE REPORT Putting the data and a basic PROC REPORT together… proc report data=mnthly_sales nofs; title1 "Simple Report"; column cty zip var sales; define cty / display; define zip / display; define var / display; define sales / display; run; Simple Report County Scott Scott Scott Scott Scott Scott Scott Adams Adams Adams Adams Adams Adams Adams Zip Code 52423 52423 52423 52423 52388 52388 52388 52200 52200 52200 52200 52199 52199 52199 Variety Merlot Chardonnay Zinfandel Merlot Merlot Chardonnay Zinfandel Merlot Chardonnay Zinfandel Chardonnay Merlot Chardonnay Zinfandel Monthly Sales 186 156.61 35.5 55.3 122.89 78.22 15.4 385.51 246 151.1 76.24 233.03 185.22 95.84 If you compare “Simple Report” to “Raw Data” created in Step 1, you will notice a few differences. • Simple Report doesn’t have an OBS column • the variables are listed in their order in the column statement • the column headers are the labels not the variable names • the column headers are adjusted to the column width, not the other way around By the way, nofs is used to turn off the procedure’s interactive features. STEP 3: SOME FORMATTING OPTIONS On the define statement, you can specify the formatting options for that column. “format” applies the standard SAS formats to the column, “width” sets the column width, “flow” wraps the text within the width you specified, and “noprint” suppresses printing that column. You can replace the label as the column heading by specifying the new heading in quotes. The slash (i.e. “/”) is the line break symbol used to force the heading to wrap lines. The PROC REPORT option “headline” adds a line after the column headings and the “headskip” option adds the blank line. Let’s add formatting to Simple Report. proc report data=mnthly_sales nofs headline headskip; title1 "Simple Formatted Report"; column cty zip var sales; define cty / display width=6 ‘County/Name’; define zip / display; define var / display; define sales / display format=6.2 width=10; run; 2 Simple Formatted Report County Zip Monthly Name Code Variety Sales ------------------------------------Scott Scott Scott Scott Scott Scott Scott Adams Adams Adams Adams Adams Adams Adams 52423 52423 52423 52423 52388 52388 52388 52200 52200 52200 52200 52199 52199 52199 Merlot Chardonnay Zinfandel Merlot Merlot Chardonnay Zinfandel Merlot Chardonnay Zinfandel Chardonnay Merlot Chardonnay Zinfandel 186.00 156.61 35.50 55.30 122.89 78.22 15.40 385.51 246.00 151.10 76.24 233.03 185.22 95.84 SAS Global Forum 2008 Applications Development STEP 4: THE ORDER DEFINE TYPE In a DEFINE statement, the word after the slash specifies the define type for that column. The valid define types are DISPLAY, ORDER, GROUP, ANALYSIS, ACROSS and COMPUTED. Up to now, we’ve been using the DISPLAY define type. Now, let’s use each of the other five types in turn. The ORDER define type specifies the column used to sort the report. proc report data=mnthly_sales nofs headline headskip; title1 "Ordered Report (Order Type)"; column cty zip var sales; define cty / order width=6 ‘County/Name’; define zip / display; define var / display; define sales / display format=6.2 width=10; run; Ordered Report (Order Type) County Zip Monthly Name Code Variety Sales ------------------------------------Adams Scott 52200 52200 52200 52200 52199 52199 52199 52423 52423 52423 52423 52388 52388 52388 Merlot Chardonnay Zinfandel Chardonnay Merlot Chardonnay Zinfandel Merlot Chardonnay Zinfandel Merlot Merlot Chardonnay Zinfandel 385.51 246.00 151.10 76.24 233.03 185.22 95.84 186.00 156.61 35.50 55.30 122.89 78.22 15.40 Notice how cty, the ordered column, doesn’t repeat in each row, only when it changes. STEP 5: THE GROUP DEFINE TYPE The GROUP define type consolidates all the observations with the same unique combination of grouped variables. You can specify the order of the rows within the group by using the ORDER= option of the DEFINE statement. In this case they are ordered by descending frequency of var. proc report data=mnthly_sales nofs headline headskip; title1 "Grouped Report (Group Type)"; column cty zip var sales; define cty / group width=6 ‘County/Name’; define zip / group; define var / group order=freq descending; define sales / display format=6.2 width=10; run; Grouped Report (Group Type) County Zip Monthly Name Code Variety Sales ------------------------------------Adams 52199 52200 Scott 52388 52423 Merlot Chardonnay Zinfandel Merlot Chardonnay Zinfandel Merlot Chardonnay Zinfandel Merlot Chardonnay Zinfandel 233.03 185.22 95.84 385.51 246.00 76.24 151.10 122.89 78.22 15.40 186.00 55.30 156.61 35.50 You probably noticed that the GROUP define type isn’t very helpful—unless it is used with the ANALYSIS define type. 3 SAS Global Forum 2008 Applications Development STEP 6: THE ANALYSIS DEFINE TYPE The ANALYSIS define type lets you specify for that column any of the statistics used in PROC MEANS, SUMMARY and UNIVARIATE. The statistics are calculated for the group you defined. proc report data=mnthly_sales nofs headline headskip; title1 "Summed Groups Rept (Analysis Type)"; column cty zip sales; define cty / group width=6 ‘County/Name’; define zip / group; define sales / analysis sum format=6.2 width=10; run; Summed Groups Rept (Analysis Type) County Zip Monthly Name Code Sales ------------------------Adams Scott 52199 52200 52388 52423 514.09 858.85 216.51 433.41 STEP 7: MULTIPLE STATISTICS ON THE SAME COLUMN If you want to calculate more than one statistic on the same column, you create an alias in the COLUMN statement. proc report data=mnthly_sales nofs headline headskip; title1 "Report with Multiple Statistics"; column cty zip sales sales=mean_sales; define cty / group width=6 ‘County/Name’; define zip / group; define sales / analysis sum format=6.2 width=10 ‘Sum’; define mean_sales / analysis mean format=6.2 width=10 ‘Mean of/Sales’; run; Report with Multiple Statistics County Zip Mean of Name Code Sum Sales ------------------------------------Adams Scott 52199 52200 52388 52423 514.09 858.85 216.51 433.41 171.36 214.71 72.17 108.35 STEP 8: THE ACROSS DEFINE TYPE If you want a report where all the unique values of a variable have their own column, you use the ACROSS define type. It’s the easiest way to create a cross-tab. In this case, we’re going to make the Merlot, Chardonnay and Zinfandel values of the var variable their own columns. Besides setting the define type of var to ACROSS, we also specify sales as the variable nested under var. This is done in the COLUMN statement by separating the ACROSS variable from the nested variable with a comma. In this case var,sales. If you use dashes as the first and last characters in the ACROSS column header, they span all the columns (it works for : = \_ .* + too). proc report data=mnthly_sales nofs headline headskip; title1 "Cross Tab Report (Across Type)"; column cty zip var,sales; define cty / group width=6 ‘County/Name’; define zip / group; define var / across order=freq descending '- Grape Variety -'; define sales / analysis sum format=6.2 width=10 'Revenue'; run; Cross Tab Report (Across Type) --------- Grape Variety ---------County Zip Merlot Chardonnay Zinfandel Name Code Revenue Revenue Revenue ------------------------------------------------Adams Scott 52199 52200 52388 52423 233.03 385.51 122.89 241.30 4 185.22 322.24 78.22 156.61 95.84 151.10 15.40 35.50 SAS Global Forum 2008 Applications Development STEP 9: BREAK AND RBREAK STATEMENTS The BREAK statement adds summaries (subtotals in this case) every time the group column(s) change. You can specify whether the break occurs before or after the group. The RBREAK statement gives you grand totals. proc report data=mnthly_sales nofs headline headskip; title1 "Report with Breaks"; column cty zip var,sales; define cty / group width=6 ‘County/Name’; define zip / group; define var / across order=freq descending '- Grape Variety -'; define sales / analysis sum format=6.2 width=10 'Revenue'; break after cty / ol skip summarize suppress; rbreak after / dol skip summarize; run; Report with Breaks --------- Grape Variety ---------County Zip Merlot Chardonnay Zinfandel Name Code Revenue Revenue Revenue ------------------------------------------------Adams 52199 52200 233.03 385.51 ---------618.54 185.22 322.24 ---------507.46 95.84 151.10 ---------246.94 Scott 52388 52423 122.89 241.30 ---------364.19 78.22 156.61 ---------234.83 15.40 35.50 ---------50.90 ========== 982.73 ========== 742.29 ========== 297.84 There are many options for the BREAK and RBREAK statements: OL overline DOL UL underline DUL summarize summarize each group skip suppress don’t repeat the break variable on the summary line double overline double underline skip a line after the break STEP 10: THE COMPUTED DEFINE TYPE AND COMPUTE BLOCK You can also compute your own values from the other data in your report. These computed columns have a COMPUTED define type. Besides the DEFINE statement for each computed column, you need to write a COMPUTE block which starts with a COMPUTE statement and ends with ENDCOMPUTE. Let’s add a row total variable, called row_sum, to the report. row_sum is the sum of all the analytic variables in the row. You can use the automatically defined _C#_ variables to make it easier. For example, _C3_ is the value in third column. Here’s how it works. proc report data=mnthly_sales nofs headline headskip; title1 "Report with Row Sums (Computed Type)"; column cty zip var,sales row_sum; define cty / group width=6 ‘County/Name’; define zip / group; define var / across order=freq descending '- Grape Variety -'; define sales / analysis sum format=6.2 width=10 'Revenue'; define row_sum / computed format=comma10.2 'Total'; break after cty / ol skip summarize suppress; rbreak after / dol skip summarize; compute row_sum; row_sum = sum(_C3_,_C4_,_C5_,_C6_,_C7_,_C8_); endcompute; run; 5 SAS Global Forum 2008 Applications Development Report with Row Sums (Computed Type) --------- Grape Variety ---------County Zip Merlot Chardonnay Zinfandel Name Code Revenue Revenue Revenue Total ------------------------------------------------------------Adams 52199 52200 233.03 385.51 ---------618.54 185.22 322.24 ---------507.46 95.84 151.10 ---------246.94 514.09 858.85 ---------1,372.94 Scott 52388 52423 122.89 241.30 ---------364.19 78.22 156.61 ---------234.83 15.40 35.50 ---------50.90 216.51 433.41 ---------649.92 ========== 982.73 ========== 742.29 ========== 297.84 ========== 2,022.86 If you don't know how many columns your ACROSS define type will create, just use the maximum possible _C#_ variables—the extras don't hurt. A COMPUTE block can contain, more or less, everything allowed in a DATA step including macro variables and %include. And remember, you can also reference any report item from within the block. CONCLUSION See how far you’ve come—just one step at a time. Now you can build most of the reports you need using just these few elements. PROC REPORT does have more advanced features--use the manual to fill in those finer points. I suggest looking at the COLUMN statement. It has several capabilities that I haven’t mentioned. The COMPUTE block is very powerful. But use it with caution. Often your programs will be easier to understand and maintain if you make your calculations in a DATA step before PROC REPORT rather than getting fancy in a COMPUTE block. RECOMMENDED READING Carpenter, Arthur. “PROC REPORT Basics: Getting Started with the Primary Statements" http://www.caloxy.com/papers/65_HOW07.pdf. ® Carpenter, Arthur. 2007. Carpenter’s Complete Guide to the SAS REPORT Procedure. Cary, NC: SAS Institute Inc. CONTACT INFORMATION Your comments and questions are valued and encouraged. I can be reached at: David Lewandowski Thomson Healthcare 1007 Church Street Suite 700 Evanston, IL 60201 dave.lewandowski@thomson.com www.thomsonhealthcare.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. 6