NESUG 2007 And Now, Presenting... Revising Output from the TABULATE Procedure Michael Tuchman, Surveillance Data Inc., Plymouth Meeting, PA ABSTRACT Many changes required in reports produced by the TABULATE procedure can be made without re-running the underlying analysis. This can be helpful in shortening the reviewing cycle on a reporting project. The first method is to re-run the procedure on the summary data set produced by the TABULATE procedure. The TABLE statement required is virtually identical, and the minor adjustments in nomenclature are reviewed. You will also learn how to interpret the structure of the summary table to make several common changes to the shape of the tables produced without re-running the original data. Once the table is perfected, additional options for saving and managing the final report are new in version 9.1. In particular, the DOCUMENT destination and the DOCUMENT procedure were introduced. The commands for managing an ODS DOCUMENT require a large learning curve. There is a new set of commands to master and a new data structure. In order to make the concepts tangible, we'll develop the tools for the document procedure using the output from the first section's TABULATE procedures to make the concepts more concrete. As a result, when you finish this paper, you'll have a sample document as a practical basis for further exploration. PREREQUISITES This paper assumes that you are familiar with the basics of the terminology that SAS uses to summarize data. In particular, the reader should be familiar with the usage of CLASS and VAR statements. The reader should also be familiar with the FORMAT statement, some common formats for numbers and dates, and making custom formats to regroup data. Some knowledge of the TABULATE procedure is also assumed, but the use of ‘fancy’ tabulate magic will be kept to a minimum. For the most part, the features used will be similar to those already available in the MEANS procedure. INTRODUCTION Despite arduous work and careful design, changes need to be made to a report. Perhaps your audience prefers a different grouping level. You may have ages broken down by five year bands, but your audience decides on ten-year bands. Often, a report for business users requires fewer decimal places, or experimenting with putting or removing additional level of subtotals. It is desirable to do this without re-running a possibly time consuming analysis. You will learn two ways to do this. Firstly, there is the output data set produced form the TABULATE procedure. Running TABULATE again on this data set, slightly modified, can produce the needed changes. Additionally, you may have competition for your attention. With Microsoft Excel, you can reshape, and redefine fields in a pivot table by clicking and dragging. Of course, the TABULATE procedure can deliver a much richer functionality, including row and column percentages with any denominator we wish. Still, as SAS users, we need to show our clients that we can be just as nimble when a change in reporting is required. The strategy recommended here for fine tuning a table is to output the intermediate totals into another table. The TABULATE procedure has such an output. We’ll begin by understanding how the output data set from the TABULATE procedure is laid out, then go into further detail. By using this secondary table, a great deal can be done to re-cast totals, means, sums, and standard deviations without recomposing from scratch. The first step is to see that with minor modifications, you can use the same TABULATE code on the summarized data that you can on the original data and produce the same table. The modifications are simple. First: Make sure you save the frequency of each cell combination. Then you can feed the output of the TABULATE procedure back into the same table statements provided you make the following modifications. For each example, I will show how the TABLE statement works on the original data, and on the SUMMARIZED data. Another solution for fine-tuning tabular reports is to design your work with a random subset of the data, or the first rows of a dataset. The only disadvantage to this approach is that you may not realize there is a problem with your output until it is too late to do anything about it. For example, you may find that the formats you choose are not wide enough to accommodate the width of a subtotal field until you run it on the final data, and then the boss says he needs the final report in 5 minutes! Having a nice output is only half the battle. With PROC DOCUMENT, we can actually assemble a group of related tables, as well as text, and print them out in any of the standard ODS formats. This means, for example, that it is possible to add a block of text to your report that specifically references a particular cell in a table. No more hand editing text every time a table has to be updated! -1- NESUG 2007 And Now, Presenting... DATA A dataset that provides enough ground for exploration is the example in the SAS 9.1.3 Online Documentation™. I added an income field, in order to have a numeric variable to analyze. Data Jobclass; input Gender Occupation datalines; 1 1 42300 1 1 41900 1 1 1 1 41800 1 1 41200 1 2 1 2 45300 1 2 44400 1 2 1 3 40900 1 3 41200 1 3 1 3 40200 1 1 41600 1 1 1 2 46100 1 2 45700 1 2 1 3 41100 1 3 41000 1 4 1 4 41200 1 4 40300 1 4 1 1 42300 1 1 41700 1 1 1 2 46000 1 2 45200 1 2 1 3 41300 1 3 41100 1 3 1 4 42500 1 4 41800 1 4 1 3 40900 2 1 42100 2 1 2 1 41700 2 1 40700 2 1 2 2 45800 2 2 45300 2 2 2 3 41200 2 4 40100 2 4 2 4 41200 2 4 41000 2 1 2 3 41400 2 3 41700 2 3 2 4 42100 2 4 41600 2 4 2 1 42300 2 1 42200 2 1 2 2 45300 2 2 45700 2 2 2 3 40800 2 3 41200 2 4 2 1 41900 2 1 42100 2 1 2 2 45200 2 2 45800 2 2 2 3 41300 2 3 41000 2 4 Income @@; 42300 46100 45000 41700 42400 46500 41600 41000 41900 44900 41200 40600 43100 42200 44900 41000 42500 40700 41000 41700 45800 42400 42100 45900 41200 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 3 1 2 4 1 2 2 3 4 1 2 3 4 3 4 1 2 2 4 1 3 42200 44900 45400 41000 41400 45200 41400 41600 45600 45100 41100 41700 42200 45600 40900 41300 41600 40700 42200 45300 46200 41200 42600 40900 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 3 3 2 2 4 1 2 2 4 1 1 2 3 4 3 4 1 2 2 4 1 3 42200 44700 41200 40700 44500 46300 41400 42700 46000 47000 41200 42100 42600 44900 41500 40800 41100 41700 41300 45100 44600 42300 41500 41100 Let’s begin our exploration of this data by computing the mean and standard deviation by gender and occupation class. Table 1 shows the first tabulation and the results: proc tabulate data=jobclass; class gender occupation; var income; table gender * occupation ,n run; /* This is directly from the proc format; value gend 1=s'Female' 2='Male' other='*** Data Entry Error value occup 1='Technical' 2='Manager/Supervisor' 3='Clerical' 4='Administrative' other='*** Data Entry Error run; income * (mean std); documentation */ ***'; ***'; 2 NESUG 2007 And Now, Presenting... The results are as follows: income N Mean Std Gender Occupation 1 1 16 41975.00 405.79 2 20 45495.00 702.23 3 14 41042.86 334.47 4 11 41336.36 595.44 1 18 42055.56 543.65 2 15 45426.67 446.36 3 14 41171.43 302.37 4 15 41306.67 623.89 2 Table 1 - The first attempt might not have all the formatting you want. Of course, this is not terribly helpful. Gender and occupation should be more human readable, income should be in the comma or currency format, and let’s say we’d also like the gender means directly compared. We could re-run the tabulation on the original data, and it with a few bells and whistles, the more human readable tabulation would look as follows: proc tabulate data=jobclass; label income = 'Annual Income'; keylabel std = 'Std. Dev'; format gender gend. occupation occup.;class gender occupation; var income; table gender * occupation ,n income * (mean*f=comma9. std); run; Annual Income N Mean Std. Dev Gender Occupation Female Technical 16 41,975 405.79 Manager/Supervisor 20 45,495 702.23 Clerical 14 41,043 334.47 Administrative 11 41,336 595.44 Technical 18 42,056 543.65 Manager/Supervisor 15 45,427 446.36 Clerical 14 41,171 302.37 Administrative 15 41,307 623.89 Male Table 2 - Formatting can be applied after summarization Figure 1 - A nicer version, but still some work to do. The goal of this exercise is to see how each table is produced first from the original, then from the summary data. With this in mind, we’ll add an OUT= option to the tabulate statement. The purpose of the OUT= is to show the underlying data used to create the final table. It is desirable to work with this instead of the original table to make quick changes happen quickly. Before we get into this, let’s look at the summary data set and its nomenclature. Use the same tabulate statement as before,and this time produce the summary data set JOBSUMM: To better illustrate the effect of the _TYPE_ variable, the following TABULATE output shows all combinations of subtotals. Only the summary output will be printed as table Table 3. 3 NESUG 2007 And Now, Presenting... proc tabulate data=nesug.jobclass out=nesug.jobsumm; format gender gend. occupation occup.; class gender occupation; var income; table (gender all) * (occupation all) ,n*f=3. income * (mean*f=comma9. std); run; Gender Occupation _TYPE_ Mean Income Std. Female Technical 11 41975.00 405.79 Female Manager/Supervisor 11 45495.00 702.23 Female Clerical 11 41042.86 334.47 Female Administrative 11 41336.36 595.44 Male Technical 11 42055.56 543.65 Male Manager/Supervisor 11 45426.67 446.36 Male Clerical 11 41171.43 302.37 Male Administrative 11 41306.67 623.89 Female All Occupations 10 42800.00 1999.08 Male All Occupations 10 42490.32 1776.69 Both Technical 01 42017.65 478.30 Both Manager/Supervisor 01 45465.71 598.99 Both Clerical 01 41107.14 319.64 Both Administrative 01 41319.23 600.01 Both All Occupations 00 42643.90 1888.89 Table 3 - Summary Data from the TABULATE procedure There are a couple of things to notice about this table. First, the occupation and gender are right justified. It may look better left justified. This will be fixed towards the very end of the revision process, once the final table format is set. For the time being, let’s fix our attention on the cells in the above printout that do not come directly from the table in Figure 1Table 3, They are the _TYPE_, _PAGE_, and _TABLE_. Since this is a one table, one page report, the _PAGE_ and _TABLE_ were1 for this output and were omitted from the display. While _PAGE_ and _TABLE_ are straightforward in their explanation, The _TYPE_ variable requires a little bit of explanation. _TYPE_ is a character variable, indicating which variables were held constant during a summation. Each 1 corresponds to a cell that is restricted, whereas 0 refers to a statistic that is unrestricted. Thus ‘00’ refers to an unrestricted mean in this instance, whereas ‘11’ refers to a total where gender and occupation are both fixed. _TYPE_ required the most detailed explanation. The others are more straightforward. _PAGE_ refers to the page number if more than two table dimensions were created in building the table.. If the TABULATE procedure had more than one table statement , the _TABLE_ would reflect this number. FIRST RETABULATION Now that we understand a simple example of the summary data set, let’s use it to move the gender column to the top. Here’s how it looks as a statement on the original data. proc tabulate data=nesug.jobclass ; var income; class occupation gender; table occupation,gender*(n income*(mean std)); run; 4 NESUG 2007 And Now, Presenting... Sex Female Male Income N Mean Income Std N Mean Std Occupation Technical 16 41975.00 405.79 18 42055.56 543.65 Manager/Supervisor 20 45495.00 702.23 15 45426.67 446.36 Clerical 14 41042.86 334.47 14 41171.43 302.37 Administrative 11 41336.36 595.44 15 41306.67 623.89 Table 4 - Summary Statistics for the Occupational Data Set The advantage to the individual level table, is that any statistic, including variance statistics can be recomputed. The disadvantage is that in a rush situation, you may not have even one minute per tabulate revision. The tabulate statement on the summary data is virtually identical. adjustments: Rename the N variable to something more descriptive. To make it truly identical, make the following I used ‘employee_count’ here. Use this variable as the new frequency weights. The differences with the summary data is that our cell counts must now be summed. Furthermore, each salary is now replaced by its mean salary, so to obtain the same addresses, we now need frequency counts. Also, to promote clarity, we will rename N to something more descriptivfe, since N has special meaning in the TABULATE procedure. The other difference is that the TABULATE procedure has created a new variable called income_mean to replace the original income variable. Although it is possible to rename the variable income_mean to income, we will not do that here so that we are reminded that we are working with summary values rather than the original data. proc tabulate data=jobsumm; /* N has been renamed to employee_count */ freq employee_count; var income_mean; class occupation gender; table occupation,gender*(n income_mean * mean); run; The new element here is FREQ count. As with the FREQ option in other SAS procedures, the purpose is that each record in the input table is counted multiple times. The multiple is determined by the value of the COUNT variable. Where did the standard deviation go? Since we are working with summary data, we lose the information on the variability of income within each occupation. However, we have saved the information on the prior table and will splice it back on shortly. PERCENTAGES BY COUNT Percentages also behave nicely after passing to summary level data. The apparent difference, in means by gender, is that in this company, men and women choose different jobs. Whether this is by choice, or by social conditioning, is beyond the scope of our paper. First, here is the code for summarization on the non-aggregated data. proc tabulate data=nesug.jobclass; class gender occupation; format gender gend. occupation occup.; table occupation,(gender all)*colpctn; run; and on the aggregated data, with the bold code showing the only difference. proc tabulate data=nesug.jobsumm out=job2; freq employee_count; class gender occupation; format gender gend. occupation occup.; table occupation,(gender all)*colpctn; run; 5 NESUG 2007 And Now, Presenting... Gender All Female Male ColPctN ColPctN ColPctN 26.23 29.03 27.64 Manager/Supervisor 32.79 24.19 28.46 Clerical 22.95 22.58 22.76 Administrative 18.03 24.19 21.14 Occupation Technical Table 5 - Percentage of people in occupation classes, by gender MAKING CHANGES TO TABULATE OUTPUT – REVIEW The two examples are typical of the types of work you can do with summary data. The only difference is the insertion of the FREQ statement to make sure all the observations are counted. If the original data had several million observations, re-printing this table could take up to one minute per revision. While this does not sound like much, it is many times more than the few seconds it should take. After all, when you’re rushed, there’s rarely just one thing to do. Shortening the development cycle of a report should reduce errors by giving the developer more time for proofreading. In this final example, we’ll highlight the cells corresponding to the most popular choice by making the font size larger. Recall that in ODS, any style attribute can be chosen by means of a format. It may take a little while to fiddle with format settings to get exactly the results that please you. Focusing on summary data makes it possible to try more things in less time. Code: proc format; value maxf (fuzz=.1) &f_pct. = '6' &m_pct. = '6' other='2'; run; The code here makes sure that the maximum value of percentage is put into a larger font (size 6) than other values (size 2). Explore other options of the tabulate procedure’s ODS formatting capabilities. Doing so with a summary data set will shorten the learning curve. title "Fiddling with formats"; ods html file='c:\documents and settings\michael\final.html'; proc tabulate data=nesug.jobsumm style={background=yellow font_size=1}; class occupation gender; classlev occupation / style=[just=l background=darkblue foreground=white font_weight=bold]; classlev gender / style=[background=darkred foreground=white font_weight=bold]; freq employee_count; table occupation='', gender=''*pctn<occupation>='' *[style=[background=white foreground=black font_size=maxf.]] /box=[label='Occupation Class' style=[font_size=2 background=darkred foreground=white]]; run; ods html close; And the pièce de résistance : format. ) (The Blue and Red had to be changed to black in order to appear correctly in PDF 6 NESUG 2007 And Now, Presenting... Occupation Class Female Male 26.23 29.03 Manager/Supervisor 32.79 24.19 Clerical 22.95 22.58 Administrative 18.03 24.19 Technical Table 6 –Font differentiated exhibit THE DOCUMENT PROCEDURE Now that we have finished our work and wish to manage our printed reports within SAS, it’s time to manage your finished product with DOCUMENT procedure. Typically the next step is to include tables in a final report and provide some discussion suitable for your audience. Often this can be a tedious process, rife with errors, hand-editing, and its corresponding inconsistencies. It is important to understand the advantages of learning this way of managing reports before attempting the myriad and powerful commands for managing document stores. The ODS Document facility enables you to manage related exhibits as a group. Since related figures stay together, revisions are easier to keep consistent, and since the results are stored in SAS catalogs, they are persistent. Other SAS users can replay your reports at a later date. Of course, there are other ways to manage report output outside of SAS. But there, there is no enforcement of keeping related documents together. One particularly nice feature is the ability to store data in one format, and present it in many others, including some not yet invented! Both SAS and independent developers are adding styles and tagsets to ODS every day. By storing your work in document format, you enable yourself to take advantage not only the styles of today, but also of the future. For our simple example, our business problem is to analyze the apparent discrepancy in average male and female income for the JOBCLASS data set., and gather related text, tabular, and graphic demonstrations together. We will do this by building a simple document consisting of the tabular output produced earlier, then create some text that discusses it. Once you develop grounding in the fundamentals using practical examples, you may wish to try your hand with longer documents, and also use equations.. WHAT IS A DOCUMENT According to the SAS® documentation, a Document is a collection of Graphs, Tables, Equations, Notes According to the SAS Online Documentation, the REPORT procedure is not yet supported with the DOCUMENT procedure, nor is all features of the PRINT procedure. To take full advantage of printing in ODS, the PUT _ODS_ statement in the ODS in the data step will be the ‘understudy’ until the PRINT procedure is fully supported by ODS document. . Fortunately, the output from the TABULATE procedure is fully supported as a DOCUMENT type. The miniature business analysis will be to compare the distribution of job classes taken by male and female employees, and report the most common choice for male and female. CREATING THE DOCUMENT ods document name=nesug.letstry; 7 NESUG 2007 And Now, Presenting... This creates the document. You may, as with all ODS destinations, print as much stuff out to it as you want. Pick a permanent library for this, since the advantage of having your reports always ready would be lost if you use the WORK library. TITLE 'Occupation Choices By Gender'; proc tabulate data=nesug.jobsumm out=job2; freq count; class gender occupation; format gender gend. occupation occup.; table occupation,(gender all)*colpctn; run; If you have an instance of BASE SAS running on your machine, you can use the DOCUMENT window. In fact, if you’re already familiar with the RESULTS window of BASE SAS, the DOCUMENT window works very similarly, but with a few improvements. We won’t close the document destination just yet. Our next move is to write a small paragraph based on the results of the tabulation. The first step is to extract the interesting numbers to macro variables, then include these macro variables in text. The next step is optional, but to get the percentage of occupation by gender (PCTN_10) and unrestricted percentage (PCTN_00) into a single column, use a data step. data job3; set job2; file print ods; genderC = put(gender,gend1.); if _type_ = '11' then do; occ_percent = pctn_10; end; else do; occ_percent = pctn_00; end; put _ods_; keep genderC occupation occ_percent; run; genderc Occupation occ_percent F Technical 26.229508197 M Technical 29.032258065 F Manager/Supervisor 32.786885246 M Manager/Supervisor 24.193548387 F Clerical 22.950819672 M Clerical 22.580645161 F Administrative 18.032786885 M Administrative 24.193548387 B Technical 27.642276423 B Manager/Supervisor 28.455284553 B Clerical 22.764227642 B Administrative 21.138211382 Table 7 - Summary Data. B refers to both genders. 8 NESUG 2007 And Now, Presenting... This data step also illustrates a new way of thinking about printing and reporting that will become more prevalent, I predict, as the ODS evolves. This is implemented by the FILE PRINT ODS; statement and the PUT _ODS_. By default, the PUT _ODS_ is similar to a PUT _ALL_ statement, unless restricted by listing the desired variables. The _ODS_ output can be customized with styles and conditional formatting. There is far more to the ODS in the Data Step than can be covered here. The interested reader is referred to the SAS ODS documentation or Lauren Haworth’s text ‘ODS by Example’. The next step is to load the values we are most interested in discussing into macro variables. These variables can then be used in document notes. This code takes the most popular occupations from each gender, then puts the most popular for women into the macro variable &f_occ and the corresponding percentage into the macro variable &f_pct. The corresponding variables for the men were populated as well. /* make sure the most popular is the first one we see in each gender */ /* PRINT THE FILE AND SAVE THE MOST POPULAR OCCUPATION CLASSES FOR EACH GENDER */ proc sort data=job3; by genderc descending occ_percent; data _null_; set job3; by genderc; if first.genderc then do; macro_variable_1 = compress(cats(genderc||'_occ')); macro_variable_2 = compress(cats(genderc||'_pct')); call symput(macro_variable_1,put(occupation,occup.)); call symput(macro_variable_2,put(occ_percent,5.1)); end; The effect of this code is to create macro variables that can be used to create document notes that refer directly to figures in the document. These comments will be used to add context-sensitive notes to our graphs and tables. With the macro variables loaded, the final step is to create a small document consisting of the text block just created a table, and a pie chart. When you are done, you will have a text, table, and graph illustrating the same point. This combination should appeal to a variety of readers with different styles. The following code creates your document, and populates it with a table and a graph. In the next section, you will learn how to navigate it. ods document name=nesug.letstry; title "Mean Income by Occupation"; proc tabulate data=nesug.jobsumm; var income_mean; class gender occupation; freq employee_count; table occupation,income_mean='Mean Income'*mean=''*f=comma9.; table gender,income_mean='Mean Income'*mean=''*f=comma9.; run; proc format; value mw 1 = 'Female' 2='Male' .='Combined'; run; title "Occupational Choices by Gender"; /* for brevity, some graphics options were deferred to the appendix */ proc gchart data=nesug.jobsumm; where _type_='11'; format gender mw. occupation occup.; pie occupation / freq = employee_count discrete across=2 group=gender type=percent legend=legend1 value=arrow explode=2; /* emphasize the manager/supervisor */ run; quit; ods document close; 9 NESUG 2007 And Now, Presenting... GUI MODE Now that the document has been created, you will learn how to ‘replay’ your document, which means print it out to all open ODS destinations. Oddly, the SAS Log does not print out any notification of successful writing of an ODS document, as it does with the HMTL or RTF destination. Therefore, to see whether your documents have been successfully created, platform, enter the odsdocuments command in the main window. Do not get wrapped up in how the documents are ordered in the document because they can be replayed in any order you want. During the writing of this paper, the author used the GUI DOCUMENT procedure extensively to create RTF and HTML illustrations for this paper. Simply learning this navigation and the OPEN As menu command will greatly enhance the convenience of ODS. Some page breaks were manually removed, but this can also be done within the DOCUMENT procedure. A screenshot of the GUI, with our document open is shown below. The screen shot shows one document with the output of various SAS procedures, each in its own ‘directory’. I put ‘directory’ in quotes because these are not physical directories on your operating system. SAS actually stores them in something called an ITEMSTORE in your SAS library. However, you can work with your document as though it were stored in a directory tree. There are commands for cut and paste, creating directories, linking. Many of these are beyond the scope of this paper, but it is hoped that after gaining basic competence, that you will be curious and seek out the additional information as needed. Figure 2 - The view of SAS documents from a GUI interface. The window has been un-docked and maximized. The GUI shows which documents are open. Choosing Open As from the POP-up menu you see when right-clicking your mouse in windows. For other operating systems, review the environment specific documentation. This will replay the document. If you right click on the section you wish to print, and select ‘replay’, the document will print out to the ODS destination – in this case HTML. The figure below shows our document. We’d like to make a few changes: Change the titles on the exhibits Reorder the exhibits Change the font. Add some context sensitive notes. 10 NESUG 2007 And Now, Presenting... Mean Income Occupation 42,018 Technical Manager/Supervisor 45,466 Clerical 41,107 Administrative 41,319 Mean Income by Occupation Figure 3 – Our First Document. There is no discussion explaining these exhibits or the connection between them. Right now, the charts and figures appear to have no relationship to each other This is intentional, and you will learn to remedy situations like this by connecting diagrams and figures with explanatory notes. BATCH MODE Tables from various SAS procedures are laid out, then this can be printed out to other ODS destinations. Thus, for many purposes, it may not be necessary at all to use the Document commands. Nonetheless, this will give the “oil change” version of what we can do when we get under the hood. For the “Engine Overhaul”, the SAS documentation will be more accessible after we finish this lesson. 11 NESUG 2007 And Now, Presenting... The batch mode also allows finer control over the document hierarchy than you can get from the GUI alone. Additionally, automation of tasks may require macros, which requires mastering the commands of the DOCUMENT procedure. STRUCTURE OF A DOCUMENT You may rightly ask why storing a document as a series of nested objects makes sense. After all, why create something that is more complex to learn and manage than a simple text document? If you stop to think of it, yare final business-ready documents really that simple? Most office projects, which typically are all stored in a folder, with some specialized subdirectories for spreadsheets, graphs, and text. This is a hierarchical organization, just as the DOCUMENT procedure output is. All of this must be integrated, and often something winds up inconsistent or broken. Typically, this can only be noticed thirty minutes after the document is reviewed by the CEO. Fortunately, on a first reading, we can still get some useful work done without understanding the entire procedure. Now that we explained the reason for managing our document as a nested structure, despite its apparent complications, lets get some work done. SEEING YOUR DOCUMENT In our earlier example, we saved a document as NESUG.LETSTRY. (I hope you remembered to close your destination with an ODS close statement, or the document will be empty). Let’s see what’s in it. proc document name=nesug.letstry; list; quit; The DOCUMENT procedure is interactive; commands are executed immediately. This means, we terminate it with a QUIT rather than a RUN: statement. Although the list statement takes several arguments, as a novice, you can run it without arguments at all to see an overview of the entire document. Listing of: \Nesug.Letstry\ Order by: Insertion Number of levels: 1 Obs Path Type 1 Datastep#1 Dir 2 Tabulate#1 Dir 3 Gchart#1 Dir Figure 4 - Our first look at a SAS ODS Document A few preliminaries are worth mentioning. The DOCUMENT destination, unlike file with file output, is cumulative. Each time you write a new report, the DOCUMENT procedure will add a directory for the new report. If the report shares the name with another procedure, it will create ‘#2’, and increasing the number each time. For example, if I were to re-run Gchart, an entry for Gchart#2 would be created. The names are not case sensitive, and entries can be deleted. To see the actual output, use the REPLAY command, as in the following example: proc document name=nesug.letstry; replay Tabulate#1; run; and behold, you will have your original document. The output is exactly the same as Figure 1. A complete list can be found by listing all possible sublevels. The DOCUMENT procedure is still running. If you are running SAS Enterprise Guide, however, the default method operation may be to close destinations and terminate all procedures after any code is executed. In that case, in the following example, you may wish to execute as one single block of code. 12 NESUG 2007 And Now, Presenting... Warning: Although SAS has tried to make working with these documents similar to navigating a file system, there are a couple of bits of nomenclature. Note that the LIST here has nothing to do with the ODS LISTING destination. It merely lists entries of the document. Similarly DIR, which displays files in DOS/Windows environments merely changes the current directory in the DOCUMENT procedure. It’s also worth noting that the job of the DOCUMENT procedure is to let you drill down to the level of a particular report, but not to dig into the actual cells to change any values list / levels=all; run; Listing of: \Nesug.Letstry\ Order by: Insertion Number of levels: All Obs Path Type 1 \Tabulate#1 Dir 2 \Tabulate#1\Report#1 Dir 3 \Tabulate#1\Report#1\Table#1 Table 4 \Gchart#1 Dir 5 \Gchart#1\Gchart#1 Graph Table 8 - A complete view of your document What does this tell us? The document consists of one directory for each procedure. There are two tables (\Tabulate#5\Report#1\Table#1 and \Tabulate#5\Report#1\Table#2) Each directory contains output objects, which can be, as mentioned before, Tables, Graphs, Notes, Equations, or more directories. This seems rather intimidating at first. The numbers after the TABULATE, REPORT, and GCHART objects may change as your session progresses, so if you are following along, your numbers may not be the same as the ones here. However, printing out a listing such as this will help you navigate the document. What do you do with that information? That is the result of the next section. MAKING CHANGES The top level directory is fine for replaying, but if you want to make changes, it will be necessary to access the actual output object. To illustrate, say you wish to add a note to the table noting the difference in salary between males and females. Each time the report is run, the text should change to reflect the numbers. To do this, first write code that produces your message. Once it is satisfactory, store it in a macro variable. The first set of code stores the percentages of men who choose management and respectively for women. data messages2; set job3; if occupation=2 then call symput(compress(genderc)||'_mgmt',put(occ_percent,4.1)); run; This code creates macro variables for comparing the mean salaries. proc summary data=nesug.jobsumm nway; freq employee_count; class gender; var income_mean; output out=job4 mean(income_mean)=; run; proc transpose data=job4 out=job5; run; /* col1 = female, col2 = male */ 13 NESUG 2007 And Now, Presenting... data _null_; file print; set job5; if _name_ = 'Income_Mean' then do; if col1 > col2 then do; call symput('higher','Women'); call symput('ratio',put(col1/col2 - 1,5.1)); end; else do; call symput('higher','Men'); call symput('ratio',put(col2/col1 - 1,5.1)); end; end; run; First, let’s add a note after the TABULATE output. Of course, this example is rather contrived, but it is intended to show you what you can do, not to be brilliant. It may be wise to replay your table so you can be sure you’re annotating the right thing. proc document name=nesug.letstry; replay Tabulate#5\report#1\table#2; run; quit; Gender Mean Income Female 42,800 Male 42,490 Overall 42,644 Table 9 - Replay of Mean Table by Gender proc document name=nesug.letstry; obanote Tabulate#5\report#1\table#2 "&higher. are higher by &ratio. percent"; run; replay Tabulate#5\report#1\table#2 ; run; quit; The OBANOTE command is new. OB stands for object. We have to drill down to the object level of a document to apply it. ANOTE stands for ‘After Note’. We can also have before notes, and can have up to ten of them, as with footnotes and titles. Notice that this is not the same as the title and footnote commands. These can be changed with the OBTITLE and OBFOOTN. AFTER NOTES come before FOOTNOTES. See the SAS ODS documentation, Chapter 4, for more details on the order of notes. The note reads: “Women are higher by 0.73 percent”. Otherwise, the output is exactly the same as Table 9,a nd does not need to be repeated. Typically the note text would then easily be integrated into the rest of the document. This text would automatically adapt if the ratio changes, or if men were higher during a different period the report is run. Now that this text is included in the document, you can continue writing after pasting your output into word. Although this division example was simple, more complex applications, such as computing p-values can be done more easily in the context of the SAS process than with Microsoft Office. Of course, that’s a matter of opinion. FINAL CHANGES Once all the changes are made, it is easy to replay the final document to an ODS destination. During the writing of this document, I found it most helpful to use the DOCUMENT procedure for managing the document’s structure, and use the ODSDOCUMENT GUI to perform the replay. Replaying an entire document is straightforward. The final bit of code manages page breaks and inserts a text note in between exhibits. 14 NESUG 2007 And Now, Presenting... * --------------------------------------------------------------; * Let's replay and explore the document we just created * --------------------------------------------------------------; title "Some views of this document"; * Fix secondary titles; * Change fonts,styles?; options nocenter; %let current_graph = \Gchart#4\Gchart3#1; proc document name=nesug.letstry label='Our First Try'; obpage &current_graph. / delete; run; obpage &current_graph / delete after; obpage \Tabulate#2\Report#1\Table#1 / delete; obanote \Tabulate#2\Report#1\Table#1 "Notice that &f_mgmt. percent of women choose management"; run; obstitle1 &current_graph. 'Practice'; obstitle2 &current_graph 'Subtitles'; run; quit; And here is the final document without page breaks and a relevant note inserted between. The original is presented to contrast it with the final table. Speaking of notes, it is noteworthy that the note is inserted as text. This makes it easy to create and maintain a group of figures and text together. Gender All Female Male ColPctN ColPctN ColPctN 26.23 29.03 27.64 Manager/Supervisor 32.79 24.19 28.46 Clerical 22.95 22.58 22.76 Administrative 18.03 24.19 21.14 Occupation Technical 15 NESUG 2007 And Now, Presenting... Occupation Class Female Male 26.23 29.03 32.79 24.19 Clerical 22.95 22.58 Administrative 18.03 24.19 Technical Manager/Supervisor Notice that 32.8 percent of women choose management. The following graph makes this same point. 16 NESUG 2007 And Now, Presenting... NEXT STEPS This paper focused on the structure of the table to produce different reports using summary data. It did not deal with issues such as font style, justification, or patterns. However these aspects can dramatically improve the professionalism of a report. Once you have the hang of the structure of tables, you may want to accentuate the appearance of various cells beyond the number format. After reading this, you are encouraged read the ODS documentation on the TEMPLATE procedure to make additional cosmetic improvements to your tables. Also, it is not necessary to print out the complete path each time a document is referenced. There are shortcuts. It would also be helpful t make sure that there is only one document with a given name on a given level, so that it is not necessary to remember which ‘#’ number document you are working with. CONCLUSION Final number formats and alignments should be done on summary data, so that small changes are quick to implement. By understanding how to use the summary output provided by the TABULATE Procedure, with minor modification you can re-run the same table statements on the summarized data to produce revised tables using a shorter cycle of revisions between tables. This makes it more productive to produce output that communicates effectively to your clients. Using our ODS objects, we can build upon this by creating a document using the new DOCUMENT procedure. We performed a simple example using a table and text that describes the table and is dynamically linked to it. Documents can be managed by either a GUI interface, or the DOCUMENT procedure Think of the document as working around the output objects. The document serves as a container, and lets you manage the order, and some surrounding material. But it does not let you change even the style of the output objects. T/he main advantage to using the DOCUMENT procedure is the enormous flexibility inherent in being able to replay a finished product to other formats and tagsets. . Although the example was simple, it shows the potential of this tool. Hopefully you’ll take this example and use the DOCUMENT procedure to make the reporting procedure consistent and creative! APPENDIX The full SAS/GRAPH code for the pie charts is listed here. ods document name=nesug.letstry; title "Occupational Choices by Gender"; legend1 position=(bottom center ) cborder=black; pattern1 color=black value=empty; pattern2 color=black value=p2; pattern3 color=black value=p2x45; pattern4 color=black value=solid; proc gchart data=nesug.jobsumm; where _type_='11'; format gender mw. occupation occup.; pie occupation / freq = employee_count discrete across=2 group=gender type=percent legend=legend1 value=arrow explode=2; run; quit; ods document close; 17 NESUG 2007 And Now, Presenting... REFERENCES SAS OnlineDoc for the Web, The TABULATE Procedure, example 13, SAS Institute, Cary NC SAS Output Delivery System User Guide, Version 9.1.3, SAS Institute, Cary NC CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Michael Tuchman Enterprise: Surveillance Data Incorporated Address: 220 West Germantown Pike, Suite 140 City, State ZIP: Plymouth Meeting, PA 19462-1423 Work Phone: (610) 834-0800 x1054 E-mail: mtuchman@survdata.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. 18