Revising Output from the TABULATE Procedure

NESUG 2007
And Now, Presenting...
Revising Output from the TABULATE Procedure
Michael Tuchman, Surveillance Data Inc., Plymouth Meeting, PA
ABSTRACT
Many changes required in reports produced by the TABULATE procedure can be made without re-running the
underlying analysis. This can be helpful in shortening the reviewing cycle on a reporting project. The first method is to
re-run the procedure on the summary data set produced by the TABULATE procedure. The TABLE statement
required is virtually identical, and the minor adjustments in nomenclature are reviewed. You will also learn how to
interpret the structure of the summary table to make several common changes to the shape of the tables produced
without re-running the original data.
Once the table is perfected, version 9.1 offers new options for saving and managing the final report. In
particular, the DOCUMENT destination and the DOCUMENT procedure were introduced. The commands for
managing an ODS DOCUMENT have a steep learning curve: there is a new set of commands to master and a new
data structure. To make the concepts concrete, we'll develop the tools for the DOCUMENT procedure using the
output from the first section's TABULATE procedures. As a result, when you
finish this paper, you'll have a sample document as a practical basis for further exploration.
PREREQUISITES
This paper assumes that you are familiar with the basics of the terminology that SAS uses to summarize data. In
particular, the reader should be familiar with the usage of CLASS and VAR statements. The reader should also be
familiar with the FORMAT statement, some common formats for numbers and dates, and making custom formats to
regroup data. Some knowledge of the TABULATE procedure is also assumed, but the use of ‘fancy’ tabulate magic
will be kept to a minimum. For the most part, the features used will be similar to those already available in the MEANS
procedure.
INTRODUCTION
Despite arduous work and careful design, changes often need to be made to a report. Perhaps your audience prefers a
different grouping level. You may have ages broken down by five-year bands, but your audience decides on ten-year
bands. Often, a report for business users requires fewer decimal places, or you need to experiment with adding or removing
levels of subtotals. It is desirable to do this without re-running a possibly time-consuming analysis. You will
learn two ways to do this. The first uses the output data set produced from the TABULATE procedure. Running
TABULATE again on this data set, slightly modified, can produce the needed changes.
Additionally, you may have competition for your attention. With Microsoft Excel, you can reshape, and redefine fields
in a pivot table by clicking and dragging. Of course, the TABULATE procedure can deliver a much richer functionality,
including row and column percentages with any denominator we wish. Still, as SAS users, we need to show our
clients that we can be just as nimble when a change in reporting is required.
The strategy recommended here for fine tuning a table is to output the intermediate totals into another table. The
TABULATE procedure has such an output. We’ll begin by understanding how the output data set from the
TABULATE procedure is laid out, then go into further detail. By using this secondary table, a great deal can be done
to re-cast totals, means, sums, and standard deviations without recomposing from scratch.
The first step is to see that with minor modifications, you can use the same TABULATE code on the summarized data
that you can on the original data and produce the same table. The modifications are simple. First: Make sure you
save the frequency of each cell combination. Then you can feed the output of the TABULATE procedure back into the
same table statements provided you make the following modifications.
For each example, I will show how the TABLE statement works on the original data, and on the SUMMARIZED data.
Another solution for fine-tuning tabular reports is to design your work with a random subset of the data, or the first
rows of a dataset. The only disadvantage to this approach is that you may not realize there is a problem with your
output until it is too late to do anything about it. For example, you may find that the formats you choose are not wide
enough to accommodate the width of a subtotal field until you run it on the final data, and then the boss says he
needs the final report in 5 minutes!
Having a nice output is only half the battle. With PROC DOCUMENT, we can actually assemble a group of related
tables, as well as text, and print them out in any of the standard ODS formats. This means, for example, that it is
possible to add a block of text to your report that specifically references a particular cell in a table. No more hand
editing text every time a table has to be updated!
DATA
A dataset that provides enough ground for exploration is the example in the SAS 9.1.3 Online Documentation™. I
added an income field, in order to have a numeric variable to analyze.
data Jobclass;
   input Gender Occupation Income @@;
   datalines;
1 1 42300 1 1 41900 1 1 42300 1 1 42200 1 1 42200
1 1 41800 1 1 41200 1 2 46100 1 2 44900 1 2 44700
1 2 45300 1 2 44400 1 2 45000 1 2 45400 1 3 41200
1 3 40900 1 3 41200 1 3 41700 1 3 41000 1 3 40700
1 3 40200 1 1 41600 1 1 42400 1 1 41400 1 2 44500
1 2 46100 1 2 45700 1 2 46500 1 2 45200 1 2 46300
1 3 41100 1 3 41000 1 4 41600 1 4 41400 1 4 41400
1 4 41200 1 4 40300 1 4 41000 1 1 41600 1 1 42700
1 1 42300 1 1 41700 1 1 41900 1 2 45600 1 2 46000
1 2 46000 1 2 45200 1 2 44900 1 2 45100 1 2 47000
1 3 41300 1 3 41100 1 3 41200 1 3 41100 1 4 41200
1 4 42500 1 4 41800 1 4 40600 1 4 41700 1 1 42100
1 3 40900 2 1 42100 2 1 43100 2 1 42200 2 1 42600
2 1 41700 2 1 40700 2 1 42200 2 2 45600 2 2 44900
2 2 45800 2 2 45300 2 2 44900 2 3 40900 2 3 41500
2 3 41200 2 4 40100 2 4 41000 2 4 41300 2 4 40800
2 4 41200 2 4 41000 2 1 42500 2 3 41600 2 3 41100
2 3 41400 2 3 41700 2 3 40700 2 4 40700 2 4 41700
2 4 42100 2 4 41600 2 4 41000 2 1 42200 2 1 41300
2 1 42300 2 1 42200 2 1 41700 2 2 45300 2 2 45100
2 2 45300 2 2 45700 2 2 45800 2 2 46200 2 2 44600
2 3 40800 2 3 41200 2 4 42400 2 4 41200 2 4 42300
2 1 41900 2 1 42100 2 1 42100 2 1 42600 2 1 41500
2 2 45200 2 2 45800 2 2 45900 2 3 40900 2 3 41100
2 3 41300 2 3 41000 2 4 41200
;
Let’s begin our exploration of this data by computing the mean and standard deviation by gender and occupation
class. Table 1 shows the first tabulation and the results:
proc tabulate data=jobclass;
   class gender occupation;
   var income;
   table gender * occupation, n income * (mean std);
run;
/* This is directly from the documentation */
proc format;
   value gend 1='Female'
              2='Male'
              other='*** Data Entry Error ***';
   value occup 1='Technical'
               2='Manager/Supervisor'
               3='Clerical'
               4='Administrative'
               other='*** Data Entry Error ***';
run;
The results are as follows:
                                income
Gender  Occupation    N      Mean     Std
1       1            16  41975.00  405.79
        2            20  45495.00  702.23
        3            14  41042.86  334.47
        4            11  41336.36  595.44
2       1            18  42055.56  543.65
        2            15  45426.67  446.36
        3            14  41171.43  302.37
        4            15  41306.67  623.89
Table 1 - The first attempt might not have all the formatting you want.
Of course, this is not terribly helpful. Gender and occupation should be more human readable, income should be in
the comma or currency format, and let’s say we’d also like the gender means directly compared. We could re-run the
tabulation on the original data, and with a few bells and whistles, the more human-readable tabulation would look as
follows:
proc tabulate data=jobclass;
label income = 'Annual Income';
keylabel std = 'Std. Dev';
format gender gend. occupation occup.;
class gender occupation;
var income;
table gender * occupation ,n income * (mean*f=comma9. std);
run;
                                       Annual Income
Gender  Occupation            N       Mean  Std. Dev
Female  Technical            16     41,975    405.79
        Manager/Supervisor   20     45,495    702.23
        Clerical             14     41,043    334.47
        Administrative       11     41,336    595.44
Male    Technical            18     42,056    543.65
        Manager/Supervisor   15     45,427    446.36
        Clerical             14     41,171    302.37
        Administrative       15     41,307    623.89
Table 2 - Formatting can be applied after summarization
Figure 1 - A nicer version, but still some work to do.
The goal of this exercise is to see how each table is produced first from the original data, then from the summary data.
With this in mind, we’ll add an OUT= option to the PROC TABULATE statement. The purpose of OUT= is to save the
underlying data used to create the final table. It is desirable to work with this instead of the original data so that
changes can be made quickly.
Before we get into this, let’s look at the summary data set and its nomenclature. Use the same TABULATE statement as
before, and this time produce the summary data set JOBSUMM.
To better illustrate the effect of the _TYPE_ variable, the following TABULATE step requests all combinations of
subtotals. Only the summary data set is printed, as Table 3.
proc tabulate data=nesug.jobclass out=nesug.jobsumm;
format gender gend. occupation occup.;
class gender occupation;
var income;
table (gender all) * (occupation all) ,n*f=3. income * (mean*f=comma9. std);
run;
Gender  Occupation          _TYPE_  Mean Income     Std.
Female  Technical           11         41975.00   405.79
Female  Manager/Supervisor  11         45495.00   702.23
Female  Clerical            11         41042.86   334.47
Female  Administrative      11         41336.36   595.44
Male    Technical           11         42055.56   543.65
Male    Manager/Supervisor  11         45426.67   446.36
Male    Clerical            11         41171.43   302.37
Male    Administrative      11         41306.67   623.89
Female  All Occupations     10         42800.00  1999.08
Male    All Occupations     10         42490.32  1776.69
Both    Technical           01         42017.65   478.30
Both    Manager/Supervisor  01         45465.71   598.99
Both    Clerical            01         41107.14   319.64
Both    Administrative      01         41319.23   600.01
Both    All Occupations     00         42643.90  1888.89
Table 3 - Summary Data from the TABULATE procedure
There are a couple of things to notice about this table. First, the occupation and gender are right justified. It may look
better left justified. This will be fixed towards the very end of the revision process, once the final table format is set.
For the time being, let’s fix our attention on the variables in the summary data set that do not come directly from the
table in Figure 1. They are _TYPE_, _PAGE_, and _TABLE_. Since this is a one-table, one-page report,
_PAGE_ and _TABLE_ were both 1 for this output and were omitted from the display.
While _PAGE_ and _TABLE_ are straightforward, the _TYPE_ variable requires a little
explanation. _TYPE_ is a character variable indicating which class variables were held fixed during a summation. Each
1 corresponds to a class variable that is restricted, whereas each 0 corresponds to one that is unrestricted. Thus ‘00’ refers to an
unrestricted mean in this instance, whereas ‘11’ refers to a statistic where gender and occupation are both fixed. In Table 3,
‘10’ marks the rows where gender is fixed and occupation is not.
_PAGE_ refers to the page number if more than two table dimensions were used in building the table. If the TABULATE procedure had more
than one TABLE statement, _TABLE_ would record which statement produced the row.
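A quick way to get a feel for these variables is to print part of the summary data set. The step below is a minimal sketch, not part of the original paper. It assumes NESUG.JOBSUMM was created by the step above and that TABULATE used its default output names (N, Income_Mean, Income_Std); PROC CONTENTS will show the actual names if yours differ.

```sas
/* Sketch: list only the fully classified cells (_TYPE_ = '11').     */
/* Variable names are the assumed TABULATE defaults for this table.  */
proc print data=nesug.jobsumm noobs;
   where _type_ = '11';
   var gender occupation _type_ _page_ _table_ N Income_Mean Income_Std;
run;
```

Changing the WHERE value to '10', '01', or '00' shows the subtotal rows instead.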
FIRST RETABULATION
Now that we understand a simple example of the summary data set, let’s use it to move the gender column to the top.
Here’s how it looks as a statement on the original data.
proc tabulate data=nesug.jobclass ;
var income;
class occupation gender;
table occupation,gender*(n income*(mean std));
run;
                                        Sex
                     Female                         Male
                         Income                         Income
Occupation            N      Mean     Std            N      Mean     Std
Technical            16  41975.00  405.79           18  42055.56  543.65
Manager/Supervisor   20  45495.00  702.23           15  45426.67  446.36
Clerical             14  41042.86  334.47           14  41171.43  302.37
Administrative       11  41336.36  595.44           15  41306.67  623.89
Table 4 - Summary Statistics for the Occupational Data Set
The advantage of the individual-level table is that any statistic, including variance statistics, can be recomputed. The
disadvantage is that in a rush situation, you may not have even one minute per TABULATE revision.
The TABULATE step on the summary data is virtually identical. To make it truly identical, make the following
adjustments:
Rename the N variable to something more descriptive. I used ‘employee_count’ here.
Use this variable as the new frequency weight.
The difference with the summary data is that our cell counts must now be summed. Furthermore, each salary is now
replaced by its cell mean, so to obtain the same results, we now need frequency counts. Also, to promote
clarity, we rename N to something more descriptive, since N has special meaning in the TABULATE procedure.
The other difference is that the TABULATE procedure has created a new variable called income_mean to replace
the original income variable. Although it is possible to rename the variable income_mean to income, we will not do
that here so that we are reminded that we are working with summary values rather than the original data.
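The paper does not show the rename itself. One possible way to do it, sketched here under the assumption that the summary data set is NESUG.JOBSUMM and that its count variable has the default name N, is PROC DATASETS, which changes the name in place without rewriting the data:

```sas
/* Sketch: rename N so later steps can say FREQ employee_count. */
proc datasets library=nesug nolist;
   modify jobsumm;
   rename N = employee_count;
quit;
```

A RENAME= data set option on a DATA step copy would work equally well if you prefer to leave the original summary untouched.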
proc tabulate data=nesug.jobsumm;
/* N has been renamed to employee_count */
freq employee_count;
var income_mean;
class occupation gender;
table occupation,gender*(n income_mean * mean);
run;
The new element here is the FREQ statement. As with the FREQ statement in other SAS procedures, its purpose is that each
record in the input table is counted multiple times. The multiple is determined by the value of the FREQ variable, here employee_count.
Where did the standard deviation go? Since we are working with summary data, we lose the information on the
variability of income within each occupation. However, we have saved the information on the prior table and will splice
it back on shortly.
PERCENTAGES BY COUNT
Percentages also behave nicely after passing to summary-level data.
The apparent difference in means by gender arises because, in this company, men and women choose different jobs.
Whether this is by choice or by social conditioning is beyond the scope of our paper. First, here is the code for
summarization on the non-aggregated data.
proc tabulate data=nesug.jobclass;
class gender occupation;
format gender gend. occupation occup.;
table occupation,(gender all)*colpctn;
run;
and on the aggregated data, where the FREQ statement is the only substantive difference:
proc tabulate data=nesug.jobsumm out=job2;
freq employee_count;
class gender occupation;
format gender gend. occupation occup.;
table occupation,(gender all)*colpctn;
run;
                          Gender
                     Female     Male      All
Occupation          ColPctN  ColPctN  ColPctN
Technical             26.23    29.03    27.64
Manager/Supervisor    32.79    24.19    28.46
Clerical              22.95    22.58    22.76
Administrative        18.03    24.19    21.14
Table 5 - Percentage of people in occupation classes, by gender
MAKING CHANGES TO TABULATE OUTPUT – REVIEW
The two examples are typical of the types of work you can do with summary data. The only difference is the insertion
of the FREQ statement to make sure all the observations are counted. If the original data had several million
observations, re-printing this table could take up to one minute per revision. While this does not sound like much, it is
many times more than the few seconds it should take. After all, when you’re rushed, there’s rarely just one thing to
do. Shortening the development cycle of a report should reduce errors by giving the developer more time for
proofreading.
In this final example, we’ll highlight the cells corresponding to the most popular choice by making the font size larger.
Recall that in ODS, any style attribute can be chosen by means of a format. It may take a little while to fiddle with
format settings to get exactly the results that please you. Focusing on summary data makes it possible to try more
things in less time.
Code:
proc format;
value maxf (fuzz=.1)
&f_pct. = '6'
&m_pct. = '6'
other='2';
run;
The code here makes sure that the maximum value of percentage is put into a larger font (size 6) than other values
(size 2).
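The format above depends on the macro variables &f_pct. and &m_pct., which this paper only creates in a later section. To experiment with the format on its own, one option is to set them by hand first. The values below are placeholders rounded from Table 5, not output of the paper's code.

```sas
/* Placeholder values rounded from Table 5; the paper derives them later with CALL SYMPUT. */
%let f_pct = 32.8;   /* Female: Manager/Supervisor */
%let m_pct = 29.0;   /* Male: Technical            */

proc format;
   value maxf (fuzz=.1)
      &f_pct. = '6'
      &m_pct. = '6'
      other   = '2';
run;
```

Because FUZZ=.1 is wider than the rounding error, the unrounded cell values 32.79 and 29.03 still match the ranges 32.8 and 29.0.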
Explore other options of the tabulate procedure’s ODS formatting capabilities. Doing so with a summary
data set will shorten the learning curve.
title "Fiddling with formats";
ods html file='c:\documents and settings\michael\final.html';
proc tabulate data=nesug.jobsumm style={background=yellow font_size=1};
class occupation gender;
classlev occupation / style=[just=l background=darkblue foreground=white
font_weight=bold];
classlev gender / style=[background=darkred foreground=white font_weight=bold];
freq employee_count;
table occupation='',
gender=''*pctn<occupation>=''
*[style=[background=white foreground=black font_size=maxf.]]
/box=[label='Occupation Class'
style=[font_size=2 background=darkred foreground=white]];
run;
ods html close;
And the pièce de résistance (the blue and red had to be changed to black in order to appear correctly in PDF format):
Occupation Class      Female     Male
Technical              26.23    29.03
Manager/Supervisor     32.79    24.19
Clerical               22.95    22.58
Administrative         18.03    24.19
Table 6 – Font differentiated exhibit
THE DOCUMENT PROCEDURE
Now that we have finished our work, it’s time to manage the
finished product within SAS using the DOCUMENT procedure. Typically the next step is to include tables in a final report and provide
some discussion suitable for your audience. Often this can be a tedious process, rife with errors, hand-editing, and the
corresponding inconsistencies.
It is important to understand the advantages of learning this way of managing reports before attempting the myriad
and powerful commands for managing document stores. The ODS Document facility enables you to manage related
exhibits as a group. Since related figures stay together, revisions are easier to keep consistent, and since the results
are stored in SAS catalogs, they are persistent. Other SAS users can replay your reports at a later date. Of course,
there are other ways to manage report output outside of SAS. But there, there is no enforcement of keeping related
documents together.
One particularly nice feature is the ability to store data in one format, and present it in many others, including some
not yet invented! Both SAS and independent developers continue to add styles and tagsets to ODS. By
storing your work in document format, you enable yourself to take advantage not only of the styles of today, but also of
those of the future.
For our simple example, our business problem is to analyze the apparent discrepancy in average male and female
income for the JOBCLASS data set, and gather related text, tabular, and graphic demonstrations together.
We will do this by building a simple document consisting of the tabular output produced earlier, then create some text
that discusses it. Once you develop grounding in the fundamentals using practical examples, you may wish to try your
hand with longer documents, and also use equations.
WHAT IS A DOCUMENT
According to the SAS® documentation, a Document is a collection of
Graphs,
Tables,
Equations,
Notes
According to the SAS Online Documentation, the REPORT procedure is not yet supported by the DOCUMENT
procedure, nor are all features of the PRINT procedure. To take full advantage of printing in ODS, the PUT _ODS_
statement in the DATA step will be the ‘understudy’ until the PRINT procedure is fully supported by the ODS
DOCUMENT destination. Fortunately, the output from the TABULATE procedure is fully supported as a DOCUMENT type.
The miniature business analysis will be to compare the distribution of job classes taken by male and female
employees, and report the most common choice for male and female.
CREATING THE DOCUMENT
ods document name=nesug.letstry;
This creates the document. You may, as with all ODS destinations, print as much stuff out to it as you want. Pick a
permanent library for this, since the advantage of having your reports always ready would be lost if you use the
WORK library.
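As a sketch of what "pick a permanent library" can look like in practice, the statements below assign the NESUG libref and open the document with the WRITE access option, so that each run starts from an empty document instead of appending to the previous one. The folder path is hypothetical; substitute any directory you can write to.

```sas
/* Hypothetical path: point the libref at any permanent folder. */
libname nesug 'c:\nesug';

/* WRITE replaces an existing document of the same name;        */
/* leaving it off appends new output to what is already stored. */
ods document name=nesug.letstry(write);

/* ... reporting steps go here ... */

ods document close;
```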
TITLE 'Occupation Choices By Gender';
proc tabulate data=nesug.jobsumm out=job2;
freq employee_count;
class gender occupation;
format gender gend. occupation occup.;
table occupation,(gender all)*colpctn;
run;
If you have an instance of BASE SAS running on your machine, you can use the DOCUMENT window. In fact, if
you’re already familiar with the RESULTS window of BASE SAS, the DOCUMENT window works very similarly, but
with a few improvements.
We won’t close the document destination just yet. Our next move is to write a small paragraph based on the results of
the tabulation. The first step is to extract the interesting numbers to macro variables, then include these macro
variables in text.
The next step is optional, but to get the percentage of occupation by gender (PCTN_10) and unrestricted percentage
(PCTN_00) into a single column, use a data step.
data job3;
set job2;
file print ods;
genderC = put(gender,gend1.);
if _type_ = '11' then do;
occ_percent = pctn_10;
end;
else do;
occ_percent = pctn_00;
end;
put _ods_;
keep genderC occupation occ_percent;
run;
genderc  Occupation           occ_percent
F        Technical           26.229508197
M        Technical           29.032258065
F        Manager/Supervisor  32.786885246
M        Manager/Supervisor  24.193548387
F        Clerical            22.950819672
M        Clerical            22.580645161
F        Administrative      18.032786885
M        Administrative      24.193548387
B        Technical           27.642276423
B        Manager/Supervisor  28.455284553
B        Clerical            22.764227642
B        Administrative      21.138211382
Table 7 - Summary Data. B refers to both genders.
This data step also illustrates a new way of thinking about printing and reporting that will become more prevalent, I
predict, as the ODS evolves. This is implemented by the FILE PRINT ODS; statement and the PUT _ODS_. By
default, the PUT _ODS_ is similar to a PUT _ALL_ statement, unless restricted by listing the desired variables. The
_ODS_ output can be customized with styles and conditional formatting. There is far more to the ODS in the Data
Step than can be covered here. The interested reader is referred to the SAS ODS documentation or Lauren
Haworth’s text ‘ODS by Example’.
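To illustrate restricting what PUT _ODS_ prints, here is a minimal sketch that is not taken from the paper. It assumes the JOB3 data set created above and uses the VARIABLES= suboption of FILE PRINT ODS to name the columns to print.

```sas
/* Sketch: print only the listed variables through ODS. */
data _null_;
   set job3;
   file print ods=(variables=(genderC occupation occ_percent));
   put _ods_;
run;
```

With only three variables in JOB3 the result matches the unrestricted version; the suboption matters more when the data set carries columns you do not want in the report.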
The next step is to load the values we are most interested in discussing into macro variables. These variables can
then be used in document notes. This code takes the most popular occupations from each gender, then puts the most
popular for women into the macro variable &f_occ and the corresponding percentage into the macro variable &f_pct.
The corresponding variables for the men were populated as well.
/* make sure the most popular is the first one we see in each gender */
/* PRINT THE FILE AND SAVE THE MOST POPULAR OCCUPATION CLASSES
   FOR EACH GENDER */
proc sort data=job3;
   by genderc descending occ_percent;
run;
data _null_;
   set job3;
   by genderc;
   if first.genderc then do;
      macro_variable_1 = compress(cats(genderc||'_occ'));
      macro_variable_2 = compress(cats(genderc||'_pct'));
      call symput(macro_variable_1,put(occupation,occup.));
      call symput(macro_variable_2,put(occ_percent,5.1));
   end;
run;
The effect of this code is to create macro variables that can be used to create document notes that refer directly to
figures in the document. These comments will be used to add context-sensitive notes to our graphs and tables.
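Before using the macro variables in notes, it can help to confirm that they exist and hold what you expect. The %PUT statements below are a small sketch. They assume the step above ran with GENDERC values of F and M, so that F_OCC, F_PCT, M_OCC, and M_PCT were created.

```sas
/* Write the macro variables to the log as a sanity check. */
%put Most popular for women: &F_occ (&F_pct percent);
%put Most popular for men:   &M_occ (&M_pct percent);
```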
With the macro variables loaded, the final step is to create a small document consisting of the text block just created, a
table, and a pie chart. When you are done, you will have text, a table, and a graph illustrating the same point. This
combination should appeal to a variety of readers with different styles.
The following code creates your document, and populates it with a table and a graph. In the next section, you will
learn how to navigate it.
ods document name=nesug.letstry;
title "Mean Income by Occupation";
proc tabulate data=nesug.jobsumm;
var income_mean;
class gender occupation;
freq employee_count;
table occupation,income_mean='Mean Income'*mean=''*f=comma9.;
table gender,income_mean='Mean Income'*mean=''*f=comma9.;
run;
proc format;
value mw
1 = 'Female' 2='Male' .='Combined';
run;
title "Occupational Choices by Gender";
/* for brevity, some graphics options were deferred to the appendix */
proc gchart data=nesug.jobsumm;
where _type_='11';
format gender mw. occupation occup.;
pie occupation / freq = employee_count
discrete across=2
group=gender
type=percent
legend=legend1
value=arrow
explode=2; /* emphasize the manager/supervisor */
run;
quit;
ods document close;
GUI MODE
Now that the document has been created, you will learn how to ‘replay’ your document, which means print it out to all
open ODS destinations.
Oddly, the SAS Log does not print out any notification of successful writing of an ODS document, as it does with the
HTML or RTF destination. Therefore, to see whether your documents have been successfully created, enter
the odsdocuments command in the main window. Do not get wrapped up in how the entries are ordered in the
document because they can be replayed in any order you want. During the writing of this paper, the author used the
DOCUMENT GUI extensively to create RTF and HTML illustrations for this paper. Simply learning this
navigation and the Open As menu command will greatly enhance the convenience of ODS. Some page breaks were
manually removed, but this can also be done within the DOCUMENT procedure.
A screenshot of the GUI, with our document open is shown below. The screen shot shows one document with the
output of various SAS procedures, each in its own ‘directory’. I put ‘directory’ in quotes because these are not physical
directories on your operating system. SAS actually stores them in something called an ITEMSTORE in your SAS
library. However, you can work with your document as though it were stored in a directory tree. There are commands
for cut and paste, creating directories, and linking. Many of these are beyond the scope of this paper, but it is hoped that
after gaining basic competence, you will be curious and seek out the additional information as needed.
Figure 2 - The view of SAS documents from a GUI interface. The window has been un-docked and
maximized.
The GUI shows which documents are open. In Windows, choosing Open As from the pop-up menu that appears when you right-click
will replay the document. For other operating systems, review the environment-specific documentation.
If you right-click on the section you wish to print and select ‘replay’, the document will print out
to the ODS destination, in this case HTML.
The figure below shows our document. We’d like to make a few changes:
Change the titles on the exhibits
Reorder the exhibits
Change the font.
Add some context sensitive notes.
Mean Income by Occupation
Occupation           Mean Income
Technical                 42,018
Manager/Supervisor        45,466
Clerical                  41,107
Administrative            41,319
Figure 3 – Our First Document. There is no discussion explaining these exhibits or the connection between
them.
Right now, the charts and figures appear to have no relationship to each other. This is intentional, and you will learn to
remedy situations like this by connecting diagrams and figures with explanatory notes.
BATCH MODE
Once tables from various SAS procedures are laid out in a document, the document can be printed out to other ODS destinations. Thus, for
many purposes, it may not be necessary at all to use the DOCUMENT procedure commands. Nonetheless, this section will give the “oil
change” version of what we can do when we get under the hood. For the “Engine Overhaul”, the SAS documentation
will be more accessible after we finish this lesson.
The batch mode also allows finer control over the document hierarchy than you can get from the GUI alone.
Additionally, automation of tasks may require macros, which requires mastering the commands of the DOCUMENT
procedure.
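As a first taste of batch mode, the sketch below replays the stored document to an RTF file without touching the GUI. It assumes the document NESUG.LETSTRY already exists and has been closed; the output file name is hypothetical.

```sas
/* Sketch: replay the stored document to an RTF file. */
ods rtf file='letstry.rtf';        /* hypothetical output file name */
proc document name=nesug.letstry;
   replay;                         /* no path given: replay from the top of the document */
run;
quit;
ods rtf close;
```

Because the document stores the output objects rather than rendered pages, the same step with a different ODS destination produces HTML or PDF without re-running any analysis.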
STRUCTURE OF A DOCUMENT
You may rightly ask why storing a document as a series of nested objects makes sense. After all, why create
something that is more complex to learn and manage than a simple text document? If you stop to think of it, are final
business-ready documents really that simple? Most office projects are typically stored in a folder, with some
specialized subdirectories for spreadsheets, graphs, and text. This is a hierarchical organization, just as the
DOCUMENT procedure output is. All of this must be integrated, and often something winds up inconsistent or broken.
Typically, this is only noticed thirty minutes after the document is reviewed by the CEO.
Fortunately, on a first reading, we can still get some useful work done without understanding the entire procedure.
Now that we have explained the reason for managing our document as a nested structure, despite its apparent
complications, let’s get some work done.
SEEING YOUR DOCUMENT
In our earlier example, we saved a document as NESUG.LETSTRY. (I hope you remembered to close your
destination with an ODS close statement, or the document will be empty). Let’s see what’s in it.
proc document name=nesug.letstry;
list;
quit;
The DOCUMENT procedure is interactive; commands are executed immediately. This means we terminate it with a
QUIT rather than a RUN statement. Although the LIST statement takes several arguments, as a novice, you can run it
without any arguments to see an overview of the entire document.
Listing of: \Nesug.Letstry\
Order by: Insertion
Number of levels: 1
Obs  Path        Type
1    Datastep#1  Dir
2    Tabulate#1  Dir
3    Gchart#1    Dir
Figure 4 - Our first look at a SAS ODS Document
A few preliminaries are worth mentioning. The DOCUMENT destination, unlike destinations that write to a file, is cumulative. Each
time you write a new report, the DOCUMENT destination will add a directory for the new report. If the report shares its
name with an existing entry, the new entry gets the suffix ‘#2’, and the number increases each time. For example, if I were to re-run
Gchart, an entry for Gchart#2 would be created. The names are not case sensitive, and entries can be deleted.
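A minimal sketch of removing an unwanted entry follows. The entry name Gchart#2 is hypothetical and stands for a duplicate created by re-running a step; list the document first so that you delete the entry you intend to.

```sas
/* Sketch: remove a duplicate entry from the document. */
proc document name=nesug.letstry;
   list;               /* check the entry names first            */
   delete Gchart#2;    /* hypothetical duplicate from a re-run   */
   list;               /* confirm that the entry is gone         */
run;
quit;
```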
To see the actual output, use the REPLAY command, as in the following example:
proc document name=nesug.letstry;
replay Tabulate#1;
run;
and behold, you will have your original table. The output is exactly the same as
Figure 1. A complete list can be found by listing all possible sublevels. The DOCUMENT procedure is still running. If
you are running SAS Enterprise Guide, however, the default mode of operation may be to close destinations and
terminate all procedures after any code is executed. In that case, you may wish to execute the following example
as one single block of code.
Warning: Although SAS has tried to make working with these documents similar to navigating a file system, there are
a couple of bits of nomenclature that can mislead. Note that LIST here has nothing to do with the ODS LISTING destination. It
merely lists the entries of the document. Similarly DIR, which displays files in DOS/Windows environments, merely
changes the current directory in the DOCUMENT procedure.
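The difference between DIR and LIST is easiest to see side by side. The sketch below assumes the document contains an entry named Tabulate#1; your entry numbers may differ.

```sas
/* Sketch: DIR moves around the document, LIST shows what is there. */
proc document name=nesug.letstry;
   dir \Tabulate#1;   /* change the current directory; nothing is listed */
   dir;               /* with no path, report the current directory      */
   list;              /* list the entries under the current directory    */
run;
quit;
```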
It’s also worth noting that the job of the DOCUMENT procedure is to let you drill down to the level of a particular
report, but not to dig into the actual cells to change any values.
list / levels=all;
run;
Listing of: \Nesug.Letstry\
Order by: Insertion
Number of levels: All
Obs  Path                          Type
1    \Tabulate#1                   Dir
2    \Tabulate#1\Report#1          Dir
3    \Tabulate#1\Report#1\Table#1  Table
4    \Gchart#1                     Dir
5    \Gchart#1\Gchart#1            Graph
Table 8 - A complete view of your document
What does this tell us? The document consists of one directory for each procedure. There are two tables
(\Tabulate#5\Report#1\Table#1 and \Tabulate#5\Report#1\Table#2). Each directory contains output objects, which
can be, as mentioned before, Tables, Graphs, Notes, Equations, or more directories.
This seems rather intimidating at first. The numbers after the TABULATE, REPORT, and GCHART objects may
change as your session progresses, so if you are following along, your numbers may not be the same as the ones
here. However, printing out a listing such as this will help you navigate the document.
What do you do with that information? That is the subject of the next section.
MAKING CHANGES
The top level directory is fine for replaying, but if you want to make changes, it will be necessary to access the actual
output object. To illustrate, say you wish to add a note to the table noting the difference in salary between males and
females. Each time the report is run, the text should change to reflect the numbers. To do this, first write code that
produces your message. Once it is satisfactory, store it in a macro variable.
The first block of code stores the percentage of men who choose management, and the corresponding percentage for
women, in macro variables.
data messages2;
set job3;
if occupation=2 then call symput(compress(genderc)||'_mgmt',put(occ_percent,4.1));
run;
This code creates macro variables for comparing the mean salaries.
proc summary data=nesug.jobsumm nway;
freq employee_count;
class gender;
var income_mean;
output out=job4 mean(income_mean)=;
run;
proc transpose data=job4 out=job5;
run;
/* col1 = female, col2 = male */
data _null_;
   file print;
   set job5;
   if _name_ = 'Income_Mean' then do;
      if col1 > col2 then do;
         call symput('higher','Women');
         /* scale the relative difference to a percentage, e.g. 0.73 */
         call symputx('ratio',put(100*(col1/col2 - 1),5.2));
      end;
      else do;
         call symput('higher','Men');
         call symputx('ratio',put(100*(col2/col1 - 1),5.2));
      end;
   end;
run;
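Before the macro variables are used in a note, it can help to confirm what they hold. The following is a minimal check, assuming the steps above ran without error. The name m_mgmt is an assumption: it supposes that the gender codes in job3 are F and M, as the later reference to &f_mgmt. suggests.

```sas
/* Write the macro variables created above to the SAS log.          */
/* m_mgmt is an assumed name; it exists only if genderc is coded M. */
%put higher=&higher ratio=&ratio;
%put f_mgmt=&f_mgmt m_mgmt=&m_mgmt;
```

If one of the names does not resolve, the log shows a warning, which is easier to diagnose here than inside a note.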
First, let’s add a note after the TABULATE output. Of course, this example is rather contrived, but it is intended to
show you what you can do, not to be brilliant.
It may be wise to replay your table so you can be sure you’re annotating the right thing.
proc document name=nesug.letstry;
replay Tabulate#5\report#1\table#2;
run;
quit;
Gender     Mean Income
Female          42,800
Male            42,490
Overall         42,644
Table 9 - Replay of Mean Table by Gender
proc document name=nesug.letstry;
obanote Tabulate#5\report#1\table#2 "&higher. are higher by &ratio. percent";
run;
replay Tabulate#5\report#1\table#2 ;
run;
quit;
The OBANOTE statement is new. OB stands for object: we have to drill down to the object level of a document to
apply it. ANOTE stands for ‘After Note’. We can also have before notes (OBBNOTE), and can have up to ten of them,
as with footnotes and titles. Notice that notes are not the same as titles and footnotes, which can be changed with
the OBTITLE and OBFOOTN statements. After notes come before footnotes. See the SAS ODS documentation,
Chapter 4, for more details on the order of notes.
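A before note is attached in the same way. The sketch below assumes the same object path as the OBANOTE example; the note text is only a placeholder.

```sas
proc document name=nesug.letstry;
   /* OBBNOTE places a note before the object; the text is a placeholder */
   obbnote Tabulate#5\report#1\table#2 "Mean income by gender";
   replay Tabulate#5\report#1\table#2;
run;
quit;
```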
The after note added above reads: “Women are higher by 0.73 percent”. Otherwise, the output is exactly the same as
Table 9, and does not need to be repeated. Typically the note text would then be easily integrated into the rest of the
document. This text would automatically adapt if the ratio changes, or if men were higher during a different period
for which the report is run.
Now that this text is included in the document, you can continue writing after pasting your output into Word. Although
this division example was simple, more complex applications, such as computing p-values, can be done more easily in
the context of the SAS process than with Microsoft Office. Of course, that’s a matter of opinion.
FINAL CHANGES
Once all the changes are made, it is easy to replay the final document to an ODS destination. During the writing of
this document, I found it most helpful to use the DOCUMENT procedure for managing the document’s structure, and
use the ODSDOCUMENT GUI to perform the replay. Replaying an entire document is straightforward. The final bit
of code manages page breaks and inserts a text note in between exhibits.
/* -------------------------------------------------------------- */
/* Let's replay and explore the document we just created          */
/* -------------------------------------------------------------- */
title "Some views of this document";
* Fix secondary titles;
* Change fonts,styles?;
options nocenter;
%let current_graph = \Gchart#4\Gchart3#1;
proc document name=nesug.letstry label='Our First Try';
   /* remove the page breaks before and after the graph */
   obpage &current_graph. / delete;
run;
   obpage &current_graph. / delete after;
   obpage \Tabulate#2\Report#1\Table#1 / delete;
   /* keep the note text on one line so no line break ends up inside the string */
   obanote \Tabulate#2\Report#1\Table#1 "Notice that &f_mgmt. percent of women choose management";
run;
   obstitle1 &current_graph. 'Practice';
   obstitle2 &current_graph. 'Subtitles';
run;
quit;
And here is the final document, without page breaks and with a relevant note inserted between the exhibits. The
original is presented to contrast it with the final table. Speaking of notes, it is noteworthy that the note is inserted as
text. This makes it easy to create and maintain a group of figures and text together.
                            Gender
                       Female     Male       All
Occupation             ColPctN    ColPctN    ColPctN
Technical                26.23      29.03      27.64
Manager/Supervisor       32.79      24.19      28.46
Clerical                 22.95      22.58      22.76
Administrative           18.03      24.19      21.14
Occupation Class       Female     Male
Technical               26.23    29.03
Manager/Supervisor      32.79    24.19
Clerical                22.95    22.58
Administrative          18.03    24.19
Notice that 32.8 percent of women choose management. The following graph makes
this same point.
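To send the finished document to a destination such as PDF, as described at the start of this section, the whole document can be replayed from its root. This is a minimal sketch; the output file name is a placeholder.

```sas
/* Route the replayed output to a PDF file; the file name is a placeholder */
ods pdf file="final_report.pdf";
proc document name=nesug.letstry;
   replay;   /* no path given: replay everything under the current (root) directory */
run;
quit;
ods pdf close;
```

The same REPLAY step can be wrapped in a different ODS destination without touching the procedures that produced the output.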
NEXT STEPS
This paper focused on the structure of the table to produce different reports using summary data. It did not deal with
issues such as font style, justification, or patterns. However these aspects can dramatically improve the
professionalism of a report. Once you have the hang of the structure of tables, you may want to accentuate the
appearance of various cells beyond the number format. After reading this, you are encouraged to read the ODS
documentation on the TEMPLATE procedure to make additional cosmetic improvements to your tables.
Also, it is not necessary to write out the complete path each time a document is referenced. There are shortcuts.
It would also be helpful to make sure that there is only one document with a given name on a given level, so that it is
not necessary to remember which ‘#’ number document you are working with.
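One such shortcut is the DIR statement, which sets the current directory so that later statements can use relative names. The sketch below assumes the document still has the structure shown in Table 8; your '#' numbers may differ.

```sas
proc document name=nesug.letstry;
   dir \Tabulate#1\Report#1;   /* make this the current directory */
   list;                       /* list the entries of the current directory */
   replay Table#1;             /* relative name, resolved from the current directory */
run;
quit;
```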
CONCLUSION
Final number formats and alignments should be done on summary data, so that small changes are quick to
implement.
By understanding how to use the summary output provided by the TABULATE procedure, you can, with minor
modification, re-run the same TABLE statements on the summarized data to produce revised tables with a shorter
revision cycle. This makes it more productive to produce output that communicates effectively to your
clients.
Using our ODS objects, we can build upon this by creating a document using the new DOCUMENT procedure. We
performed a simple example using a table and text that describes the table and is dynamically linked to it. Documents
can be managed by either a GUI or the DOCUMENT procedure.
Think of the document as working around the output objects. The document serves as a container, and lets you
manage the order and some surrounding material. But it does not let you change even the style of the output
objects. The main advantage of using the DOCUMENT procedure is the enormous flexibility inherent in being able
to replay a finished product to other formats and tagsets.
Although the example was simple, it shows the potential of this tool. Hopefully you’ll take this example and use the
DOCUMENT procedure to make the reporting procedure consistent and creative!
APPENDIX
The full SAS/GRAPH code for the pie charts is listed here.
ods document name=nesug.letstry;
title "Occupational Choices by Gender";
legend1 position=(bottom center) cborder=black;
pattern1 color=black value=empty;
pattern2 color=black value=p2;
pattern3 color=black value=p2x45;
pattern4 color=black value=solid;
proc gchart data=nesug.jobsumm;
   where _type_='11';
   format gender mw. occupation occup.;
   pie occupation / freq=employee_count
                    discrete
                    across=2
                    group=gender
                    type=percent
                    legend=legend1
                    value=arrow
                    explode=2;
run;
quit;
ods document close;
REFERENCES
SAS OnlineDoc for the Web, The TABULATE Procedure, example 13, SAS Institute, Cary NC
SAS Output Delivery System User Guide, Version 9.1.3, SAS Institute, Cary NC
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Name: Michael Tuchman
Enterprise: Surveillance Data Incorporated
Address: 220 West Germantown Pike, Suite 140
City, State ZIP: Plymouth Meeting, PA 19462-1423
Work Phone: (610) 834-0800 x1054
E-mail: mtuchman@survdata.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.