079-2008: A Step-by-Step Introduction to PROC REPORT

advertisement
SAS Global Forum 2008
Applications Development
Paper 079-2008
A Step-by-Step Introduction to PROC REPORT
David Lewandowski, Thomson Healthcare, Evanston, IL
ABSTRACT
Have you read the description of PROC REPORT in the SAS® manual and been left scratching your head wondering
where to start? Then, here’s a step by step introduction. It walks through the basics, one feature at a time, so you
can see each one’s impact on the report. When you’re finished, you will already know how to create ninety percent of
the reports you need. And more importantly, you will have a context so you can go back to the manual and learn the
advanced features.
INTRODUCTION
The syntax of PROC REPORT is different from all the other procedures and many of us found the manual less than
helpful in learning its unique statements. Most of us turned to a colleague and asked him to explain PROC REPORT.
We got a brief introduction and a set of sample code that we’ve been modifying ever since. This paper will give you
the same thing—a simple introduction and lots of sample code to enhance as needed.
STEP 1: CREATE A SMALL DATASET FOR REPORTING
To start off, let’s create a dataset to use for reporting—say monthly wine sales by zip code and county. Here’s the
program and the resulting listing:
data mnthly_sales;
length zip $ 5 cty $ 8 var $ 10;
input zip $ cty $ var $ sales;
label zip="Zip Code"
cty="County"
var="Variety"
sales="Monthly Sales";
datalines;
52423 Scott Merlot 186.
52423 Scott Chardonnay 156.61
52423 Scott Zinfandel 35.5
52423 Scott Merlot 55.3
52388 Scott Merlot 122.89
52388 Scott Chardonnay 78.22
52388 Scott Zinfandel 15.4
52200 Adams Merlot 385.51
52200 Adams Chardonnay 246
52200 Adams Zinfandel 151.1
52200 Adams Chardonnay 76.24
52199 Adams Merlot 233.03
52199 Adams Chardonnay 185.22
52199 Adams Zinfandel 95.84
;
Raw Data
Obs
ZIP
CTY
1
2
3
4
5
6
7
8
9
10
11
12
13
14
52423
52423
52423
52423
52388
52388
52388
52200
52200
52200
52200
52199
52199
52199
Scott
Scott
Scott
Scott
Scott
Scott
Scott
Adams
Adams
Adams
Adams
Adams
Adams
Adams
VAR
Merlot
Chardonnay
Zinfandel
Merlot
Merlot
Chardonnay
Zinfandel
Merlot
Chardonnay
Zinfandel
Chardonnay
Merlot
Chardonnay
Zinfandel
SALES
186.00
156.61
35.50
55.30
122.89
78.22
15.40
385.51
246.00
151.10
76.24
233.03
185.22
95.84
proc print data=mnthly_sales;
title "Raw Data";
run;
SYNTAX
Next, let me describe PROC REPORT’s syntax. The COLUMN statement is used to list each report column. Each
column, in turn, has a DEFINE statement that describes how that column is created and formatted. You use the
TITLE statement to specify the title at the top of each page.
PROC REPORT DATA=datasetname <options>;
TITLE title text;
COLUMN variable list and column specifications;
DEFINE column / define type and column attributes;
DEFINE column / define type and column attributes;
...
RUN;
1
SAS Global Forum 2008
Applications Development
STEP 2: A SIMPLE REPORT
Putting the data and a basic PROC REPORT together…
proc report data=mnthly_sales nofs;
title1 "Simple Report";
column cty zip var sales;
define cty / display;
define zip / display;
define var / display;
define sales / display;
run;
Simple Report
County
Scott
Scott
Scott
Scott
Scott
Scott
Scott
Adams
Adams
Adams
Adams
Adams
Adams
Adams
Zip
Code
52423
52423
52423
52423
52388
52388
52388
52200
52200
52200
52200
52199
52199
52199
Variety
Merlot
Chardonnay
Zinfandel
Merlot
Merlot
Chardonnay
Zinfandel
Merlot
Chardonnay
Zinfandel
Chardonnay
Merlot
Chardonnay
Zinfandel
Monthly
Sales
186
156.61
35.5
55.3
122.89
78.22
15.4
385.51
246
151.1
76.24
233.03
185.22
95.84
If you compare “Simple Report” to “Raw Data” created in Step 1, you will notice a few differences.
•
Simple Report doesn’t have an OBS column
•
the variables are listed in their order in the column statement
•
the column headers are the labels not the variable names
•
the column headers are adjusted to the column width, not the other way around
By the way, nofs is used to turn off the procedure’s interactive features.
STEP 3: SOME FORMATTING OPTIONS
On the define statement, you can specify the formatting options for that column. “format” applies the standard SAS
formats to the column, “width” sets the column width, “flow” wraps the text within the width you specified, and
“noprint” suppresses printing that column. You can replace the label as the column heading by specifying the new
heading in quotes. The slash (i.e. “/”) is the line break symbol used to force the heading to wrap lines. The PROC
REPORT option “headline” adds a line after the column headings and the “headskip” option adds the blank line. Let’s
add formatting to Simple Report.
proc report data=mnthly_sales nofs
headline headskip;
title1 "Simple Formatted Report";
column cty zip var sales;
define cty / display width=6 ‘County/Name’;
define zip / display;
define var / display;
define sales / display format=6.2 width=10;
run;
2
Simple Formatted Report
County Zip
Monthly
Name
Code
Variety
Sales
------------------------------------Scott
Scott
Scott
Scott
Scott
Scott
Scott
Adams
Adams
Adams
Adams
Adams
Adams
Adams
52423
52423
52423
52423
52388
52388
52388
52200
52200
52200
52200
52199
52199
52199
Merlot
Chardonnay
Zinfandel
Merlot
Merlot
Chardonnay
Zinfandel
Merlot
Chardonnay
Zinfandel
Chardonnay
Merlot
Chardonnay
Zinfandel
186.00
156.61
35.50
55.30
122.89
78.22
15.40
385.51
246.00
151.10
76.24
233.03
185.22
95.84
SAS Global Forum 2008
Applications Development
STEP 4: THE ORDER DEFINE TYPE
In a DEFINE statement, the word after the slash specifies the define type for that column. The valid define types are
DISPLAY, ORDER, GROUP, ANALYSIS, ACROSS and COMPUTED. Up to now, we’ve been using the DISPLAY
define type. Now, let’s use each of the other five types in turn. The ORDER define type specifies the column used to
sort the report.
proc report data=mnthly_sales nofs
headline headskip;
title1 "Ordered Report (Order Type)";
column cty zip var sales;
define cty / order width=6 ‘County/Name’;
define zip / display;
define var / display;
define sales / display format=6.2 width=10;
run;
Ordered Report (Order Type)
County Zip
Monthly
Name
Code
Variety
Sales
------------------------------------Adams
Scott
52200
52200
52200
52200
52199
52199
52199
52423
52423
52423
52423
52388
52388
52388
Merlot
Chardonnay
Zinfandel
Chardonnay
Merlot
Chardonnay
Zinfandel
Merlot
Chardonnay
Zinfandel
Merlot
Merlot
Chardonnay
Zinfandel
385.51
246.00
151.10
76.24
233.03
185.22
95.84
186.00
156.61
35.50
55.30
122.89
78.22
15.40
Notice how cty, the ordered column, doesn’t repeat in each row, only when it changes.
STEP 5: THE GROUP DEFINE TYPE
The GROUP define type consolidates all the observations with the same unique combination of grouped variables.
You can specify the order of the rows within the group by using the ORDER= option of the DEFINE statement. In this
case they are ordered by descending frequency of var.
proc report data=mnthly_sales nofs
headline headskip;
title1 "Grouped Report (Group Type)";
column cty zip var sales;
define cty / group width=6 ‘County/Name’;
define zip / group;
define var / group order=freq descending;
define sales / display format=6.2 width=10;
run;
Grouped Report (Group Type)
County Zip
Monthly
Name
Code
Variety
Sales
------------------------------------Adams
52199
52200
Scott
52388
52423
Merlot
Chardonnay
Zinfandel
Merlot
Chardonnay
Zinfandel
Merlot
Chardonnay
Zinfandel
Merlot
Chardonnay
Zinfandel
233.03
185.22
95.84
385.51
246.00
76.24
151.10
122.89
78.22
15.40
186.00
55.30
156.61
35.50
You probably noticed that the GROUP define type isn’t very helpful—unless it is used with the ANALYSIS define
type.
3
SAS Global Forum 2008
Applications Development
STEP 6: THE ANALYSIS DEFINE TYPE
The ANALYSIS define type lets you specify for that column any of the statistics used in PROC MEANS, SUMMARY
and UNIVARIATE. The statistics are calculated for the group you defined.
proc report data=mnthly_sales nofs
headline headskip;
title1 "Summed Groups Rept (Analysis Type)";
column cty zip sales;
define cty / group width=6 ‘County/Name’;
define zip / group;
define sales / analysis sum
format=6.2 width=10;
run;
Summed Groups Rept (Analysis Type)
County Zip
Monthly
Name
Code
Sales
------------------------Adams
Scott
52199
52200
52388
52423
514.09
858.85
216.51
433.41
STEP 7: MULTIPLE STATISTICS ON THE SAME COLUMN
If you want to calculate more than one statistic on the same column, you create an alias in the COLUMN statement.
proc report data=mnthly_sales nofs
headline headskip;
title1 "Report with Multiple Statistics";
column cty zip sales sales=mean_sales;
define cty / group width=6 ‘County/Name’;
define zip / group;
define sales / analysis sum
format=6.2 width=10 ‘Sum’;
define mean_sales / analysis mean
format=6.2 width=10 ‘Mean of/Sales’;
run;
Report with Multiple Statistics
County Zip
Mean of
Name
Code
Sum
Sales
------------------------------------Adams
Scott
52199
52200
52388
52423
514.09
858.85
216.51
433.41
171.36
214.71
72.17
108.35
STEP 8: THE ACROSS DEFINE TYPE
If you want a report where all the unique values of a variable have their own column, you use the ACROSS define
type. It’s the easiest way to create a cross-tab. In this case, we’re going to make the Merlot, Chardonnay and
Zinfandel values of the var variable their own columns. Besides setting the define type of var to ACROSS, we also
specify sales as the variable nested under var. This is done in the COLUMN statement by separating the ACROSS
variable from the nested variable with a comma. In this case var,sales. If you use dashes as the first and last
characters in the ACROSS column header, they span all the columns (it works for : = \_ .* + too).
proc report data=mnthly_sales nofs headline headskip;
title1 "Cross Tab Report (Across Type)";
column cty zip var,sales;
define cty / group width=6 ‘County/Name’;
define zip / group;
define var / across order=freq descending '- Grape Variety -';
define sales / analysis sum format=6.2 width=10 'Revenue';
run;
Cross Tab Report (Across Type)
--------- Grape Variety ---------County Zip
Merlot Chardonnay
Zinfandel
Name
Code
Revenue
Revenue
Revenue
------------------------------------------------Adams
Scott
52199
52200
52388
52423
233.03
385.51
122.89
241.30
4
185.22
322.24
78.22
156.61
95.84
151.10
15.40
35.50
SAS Global Forum 2008
Applications Development
STEP 9: BREAK AND RBREAK STATEMENTS
The BREAK statement adds summaries (subtotals in this case) every time the group column(s) change. You can
specify whether the break occurs before or after the group. The RBREAK statement gives you grand totals.
proc report data=mnthly_sales nofs headline headskip;
title1 "Report with Breaks";
column cty zip var,sales;
define cty / group width=6 ‘County/Name’;
define zip / group;
define var / across order=freq descending '- Grape Variety -';
define sales / analysis sum format=6.2 width=10 'Revenue';
break after cty / ol skip summarize suppress;
rbreak after / dol skip summarize;
run;
Report with Breaks
--------- Grape Variety ---------County Zip
Merlot Chardonnay
Zinfandel
Name
Code
Revenue
Revenue
Revenue
------------------------------------------------Adams
52199
52200
233.03
385.51
---------618.54
185.22
322.24
---------507.46
95.84
151.10
---------246.94
Scott
52388
52423
122.89
241.30
---------364.19
78.22
156.61
---------234.83
15.40
35.50
---------50.90
==========
982.73
==========
742.29
==========
297.84
There are many options for the BREAK and RBREAK statements:
OL
overline
DOL
UL
underline
DUL
summarize summarize each group
skip
suppress
don’t repeat the break variable on the summary line
double overline
double underline
skip a line after the break
STEP 10: THE COMPUTED DEFINE TYPE AND COMPUTE BLOCK
You can also compute your own values from the other data in your report. These computed columns have a
COMPUTED define type. Besides the DEFINE statement for each computed column, you need to write a COMPUTE
block which starts with a COMPUTE statement and ends with ENDCOMPUTE. Let’s add a row total variable, called
row_sum, to the report. row_sum is the sum of all the analytic variables in the row. You can use the automatically
defined _C#_ variables to make it easier. For example, _C3_ is the value in third column. Here’s how it works.
proc report data=mnthly_sales nofs headline headskip;
title1 "Report with Row Sums (Computed Type)";
column cty zip var,sales row_sum;
define cty / group width=6 ‘County/Name’;
define zip / group;
define var / across order=freq descending '- Grape Variety -';
define sales / analysis sum format=6.2 width=10 'Revenue';
define row_sum / computed format=comma10.2 'Total';
break after cty / ol skip summarize suppress;
rbreak after / dol skip summarize;
compute row_sum;
row_sum = sum(_C3_,_C4_,_C5_,_C6_,_C7_,_C8_);
endcompute;
run;
5
SAS Global Forum 2008
Applications Development
Report with Row Sums (Computed Type)
--------- Grape Variety ---------County Zip
Merlot Chardonnay
Zinfandel
Name
Code
Revenue
Revenue
Revenue
Total
------------------------------------------------------------Adams
52199
52200
233.03
385.51
---------618.54
185.22
322.24
---------507.46
95.84
151.10
---------246.94
514.09
858.85
---------1,372.94
Scott
52388
52423
122.89
241.30
---------364.19
78.22
156.61
---------234.83
15.40
35.50
---------50.90
216.51
433.41
---------649.92
==========
982.73
==========
742.29
==========
297.84
==========
2,022.86
If you don't know how many columns your ACROSS define type will create, just use the maximum possible _C#_
variables—the extras don't hurt. A COMPUTE block can contain, more or less, everything allowed in a DATA step
including macro variables and %include. And remember, you can also reference any report item from within the
block.
CONCLUSION
See how far you’ve come—just one step at a time. Now you can build most of the reports you need using just these
few elements.
PROC REPORT does have more advanced features--use the manual to fill in those finer points. I suggest looking at
the COLUMN statement. It has several capabilities that I haven’t mentioned. The COMPUTE block is very powerful.
But use it with caution. Often your programs will be easier to understand and maintain if you make your calculations
in a DATA step before PROC REPORT rather than getting fancy in a COMPUTE block.
RECOMMENDED READING
Carpenter, Arthur. “PROC REPORT Basics: Getting Started with the Primary Statements"
http://www.caloxy.com/papers/65_HOW07.pdf.
®
Carpenter, Arthur. 2007. Carpenter’s Complete Guide to the SAS REPORT Procedure. Cary, NC: SAS Institute Inc.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. I can be reached at:
David Lewandowski
Thomson Healthcare
1007 Church Street
Suite 700
Evanston, IL 60201
dave.lewandowski@thomson.com
www.thomsonhealthcare.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
6
Download