NESUG 18 Applications

The Survey Data Book an Easy Read via Proc Tabulate and the ODS RTF Destination

Electra Small, MDRC

Abstract

The intake period for a lengthy survey was over. The next step was to create a nicely formatted “preliminary results book.” The concept of the survey data book was that it would mirror the initial questionnaire in design. It was also to report the total number who responded to each question and the percentage who gave each response. This paper steps through the use of SAS® Proc Tabulate, Proc Format, Proc Template, the ODS RTF destination, and the macro language to accomplish this project.

Introduction to the Project and Purpose

Can an employment program located in public housing developments help residents earn more money and improve their quality of life? The Jobs-Plus program, set up in six cities, sought to achieve these goals. Surveys were conducted with the residents during the first year and again about 2 ½ years later. To communicate the results of the survey, we decided to create a nicely formatted “preliminary results book” for each city. The data book would display the exact wording of each question and the possible responses. It would report the total number who responded to each question and the percentage who gave each response.

To create the books for the first wave of the survey, Proc Freq tables were run in SAS®, printed out, and the column percentages were typed by hand into table shells created in Excel®. This process was labor intensive and error prone. By the time the survey’s second wave was completed, our company had upgraded to SAS 8.2, which included many features added to Proc Tabulate and Proc Report, and ODS was now in production. Our company had also purchased two new books, Output Delivery System: The Basics and PROC TABULATE by Example, both by Lauren E. Haworth. After reading these books and testing the features of this newer version of SAS, we decided to automate the production of the survey data books as much as possible.
We knew we wanted to closely match the formatting used when creating the first wave books. Below is an example of one table from the first wave books.

89. When you were growing up, did you ever live in a public housing development?

                   Rainier Vista (%)   Yesler Terrace (%)
    Yes                   28.5                43.2
    No                    71.5                56.8
    Refused                0.0                 0.0
    Don't know             0.0                 0.0

This paper will step through how SAS Proc Format, Proc Tabulate, Proc Template, and ODS were used to accomplish this project. Once the initial coding was completed, it was placed in a parameterized macro.

The Programming

In reading the documentation about ODS destinations, we quickly realized that the ODS RTF destination could create nicely formatted tables in an RTF document, and that RTF documents become MS Word® documents in just a few clicks. On the other hand, ODS and Excel don’t work as well together on things like page breaks and table headings. So we decided to go with the ODS RTF destination.

We discovered that some of the Proc Tabulate formatting syntax affects how standard output looks but has no effect on the look of RTF/MS Word documents. For this reason we decided to start by producing tables for just three of the survey questions using a small dummy dataset.

There were six main steps involved in the programming of the survey data books. First, we used formats to give each response its proper wording. Next, we used the LABEL statement to give each question its exact wording. We used the ODS statement to direct our output to an RTF document and to select a general style for our tables. A Proc Tabulate step was used to produce the tables. This Proc Tabulate step was placed inside a simple looping macro to produce a table for each of the questions in the survey. Finally, we used Proc Template to tweak the style of the tables. Each of these steps is described in detail below.
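
The dummy dataset itself is not reproduced here. A minimal sketch of what such a test dataset might look like follows, assuming the variable names used in the rest of the paper (SITE, AGE, QF4, QF5) and the site codes 'a', 'c', and 'd' that appear in the $SITE format. The dataset name NESUG_DUMMY and every value in it are invented for illustration only.

```sas
/* Hypothetical test data: the dataset name and all values are made up. */
/* SITE: 'a' = program site, 'c' and 'd' = comparison sites.            */
/* AGE carries decimals, as in the survey file (4.2 format).            */
/* QF4/QF5 codes: 1 = Yes, 2 = No, 98 = Don't know, 99 = Refused.       */
data work.nesug_dummy;
   input site $ age qf4 qf5;
   format age 4.2;
   datalines;
a 23.50 1 1
a 41.25 2 2
a 67.00 98 1
c 35.75 . 1
c 58.00 . 2
d 29.10 . 2
d 80.00 . 1
;
run;
```

QF4 is left missing for the comparison sites because that question was only asked in program developments, which is the situation that motivates the one-table-per-step design discussed later in the paper.
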
Formats to give each response its proper wording

We used Proc Format and the FORMAT statement to add the exact wording of each response to each question.

    proc format;
       value $site 'a'='\ul PROGRAM (%)'
                   'c'='\ul COMPARISON 1 (%)'
                   'd'='\ul COMPARISON 2 (%)';
       VALUE YESNO 1="Yes"
                   2="No"
                   98="Don't Know"
                   99="Refuse to answer";
       VALUE AGE 15-24='15-24'
                 25-34='25-34'
                 35-44='35-44'
                 45-54='45-54'
                 55-64='55-64'
                 other=[3.0];   /* nested format for all other values */
    run;

Notice the \ul in each label of the $SITE format. This is an RTF control word that tells RTF to underline the text that follows. Later we will add a style option to our Proc Tabulate step so that SAS passes these characters through to the RTF destination unchanged.

The survey dataset had age reported in the 4.2 format, and we preferred not to show the decimal values on age. Notice that the VALUE statement for AGE includes other=[3.0]. This is a nested format. It lets SAS know that all values falling outside the listed ranges should be displayed with the 3.0 format instead of one of the groupings of the AGE. format.

Labels to display the exact wording of each question

SAS’ LABEL statement was used to label each variable with the exact wording of the question. The limit for variable labels is 256 characters, and a handful of questions had wording that exceeded this limit. In these cases, we abbreviated the wording. Continuing to build our basic program, we added the following:

    DATA NESUG;
       SET MYDATA.NESUG;
       LABEL AGE="Age at the time of interview"
             QF4="Did a program called Jobs-Plus help you get in any ABE classes, that is, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?"
             QF5="Did the Housing Authority provide or help you get in any ABE classes, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?";
       format SITE $SITE. QF4 QF5 YESNO. AGE age.;
    run;

ODS used to direct output to RTF file and to select a table style

    ods listing close;
    ods rtf file="surveybook.rtf" startpage=no style=Theme;

The ODS LISTING CLOSE statement above tells SAS to stop sending output to the standard output file. This statement is optional. The ODS RTF statement tells SAS to send ODS output to a file called surveybook.rtf, and startpage=no tells SAS not to put each table on a separate page by itself. The style=Theme option tells SAS which style template to use as the basis for formatting our RTF tables. SAS ships with several style templates, others are available for download from the SAS website, and still others are available for purchase from SAS’ publications department. For this project we chose the Theme style as a starting point because it was very simple, with no shading or coloring on the tables, a basic font, and no lines around the cells of the tables. To find out what styles are available at your site, you can submit the following SAS code:

    PROC TEMPLATE;
       LIST STYLES;
    run;

See either of Ms. Haworth’s books for additional details on ODS style templates.

Proc Tabulate used to produce the tables

We submitted three Proc Tabulate steps, one for each of the three test tables we wanted to produce.
Each procedure differs only in the information provided about the row variable in the table:

    PROC TABULATE DATA=nesug format=5.1;
       CLASS AGE;
       CLASS site /PRELOADFMT;
       classlev site /s=[protectspecialchars=off];
       table AGE*pctn<AGE> all*N=' '*f=5.0,
             site=' '
             /PRINTMISS misstext='0.0' BOX="AGE";
       keylabel pctn=' ' all='-Sample Size-';

    PROC TABULATE DATA=nesug format=5.1;
       CLASS QF4;
       CLASS site /PRELOADFMT;
       classlev site /s=[protectspecialchars=off];
       table QF4*pctn<QF4> all*N=' '*f=5.0,
             site=' '
             /PRINTMISS misstext='0.0' BOX="QF4";
       keylabel pctn=' ' all='-Sample Size-';

    PROC TABULATE DATA=nesug format=5.1;
       CLASS QF5;
       CLASS site /PRELOADFMT;
       classlev site /s=[protectspecialchars=off];
       table QF5*pctn<QF5> all*N=' '*f=5.0,
             site=' '
             /PRINTMISS misstext='0.0' BOX="QF5";
       keylabel pctn=' ' all='-Sample Size-';
    run;

    ods rtf close;

The proc tabulate statement

The first statement in each procedure is the PROC TABULATE statement itself, which simply specifies the dataset to use. Adding format=5.1 to the PROC TABULATE statement limits percentages to one decimal place.

The class statements

You may have noticed that each Proc Tabulate step includes two CLASS statements. Proc Tabulate treats class variables and analysis variables differently; for example, the default statistic for a class variable is a count, but the default statistic for an analysis variable is a sum. In the newer versions of SAS you can handle each class variable differently, but to do this you must list each on its own CLASS statement. Therefore, the first CLASS statement simply declares the row variable (QF5 in the last step) as a class variable, and the second CLASS statement declares SITE as a class variable and alerts SAS that missing data on this variable should be treated differently than missing data on the other class variables.

The missing data instructions

You have to tell Proc Tabulate how to deal with missing data.
Notice that PRELOADFMT was added to the CLASS statement and PRINTMISS was added to the TABLE statement. If you have assigned user-defined formats to your class variables, these options together tell SAS to create a column (or row) for every possible value of your class variable, whether or not that value appears in your dataset. So, for example, if your class variable/column header were gender and some questions were only asked of and answered by females, the male column would still show up in every table as long as you had a user-defined format that specifies both values. We only wanted this behavior for the column variable/header. For the row variables, if some possible responses were never used by residents, we didn’t want those responses in the rows of the table. MISSTEXT='0.0' on the TABLE statement tells SAS to print 0.0 instead of “.” any time a cell has no responses at all. For an example of how these options affected our project, look at QF4 in the output shown below. The whole second column (the one now filled with 0.0) would have been left out of the table completely were it not for these options.

The classlev statement

The CLASSLEV statement is used with Proc Tabulate only when using ODS and has only one option, /style (which can be abbreviated /s). The CLASSLEV statement is used to give ODS special instructions on how to format the values of your class variables. There are many style attributes (font face and size, background and foreground colors, cell width, and so on) that can be changed for just one or for many variables in the table. In this case we wanted to underline the values of our column class variable, so we used a style attribute to tell SAS not to filter out the special \ul characters we are sending to our RTF destination (protectspecialchars=off).

The table statement

In the TABLE statement of Proc Tabulate, the first variable you specify is the variable that you want in the row section of the table.
In this case, for the rows we want the percent of residents who gave each response to the particular survey question. So in Proc Tabulate, we first list the row variable (the question) and “multiply” it by the desired statistic (i.e. age*pctn<AGE>). The <AGE> after PCTN names the denominator: each count is divided by the total across all AGE categories within the same column, which is what produces column percentages. We also wanted to add a sample size row to the bottom of each table. This is a summary line, so we used the ALL keyword to request a row summary line and “multiplied” it by N because the count is the desired statistic. For the ALL row we changed the format to f=5.0, for that line only, because we didn’t need decimals reported on the sample size line.

On the TABLE statement the comma tells SAS that we are done specifying the table’s row dimension, and site=' ' declares that SITE will be the variable used in the column dimension of the table and that we don’t want the SITE variable label to appear in the table. Instead, only the value labels (program, control1, control2) will be used as headers for the columns.

Notice the BOX= option on the TABLE statement. Here we are requesting that the box above the row header be used to display the variable name. For this project we decided to use that box in each table to display the variable’s name, but literally whatever is put in quotes after BOX= will be displayed in the upper left-hand box of the table.

The RUN statement just before the ODS RTF CLOSE statement is required. You must make sure that all of the procedures whose output you want in the RTF file have been executed before you close the RTF file.

After executing the above steps, the SURVEYBOOK.RTF file looked as follows. (Note that the last column was cut off of the last 2 tables because it extended beyond the margin allowed for this paper.)

    AGE                               Program (%)   Control 1 (%)   Control 2 (%)
    Age at the time of interview
      15-24                                2.6            4.8             7.5
      25-34                               28.9           28.6            32.5
      35-44                               21.1           19.0            27.5
      45-54                               18.4           19.0            15.0
      55-64                               10.5            9.5             7.5
      65                                   5.3            4.8             2.5
      66                                   2.6            9.5             0.0
      67                                   5.3            0.0             0.0
      68                                   5.3            4.8             0.0
      70                                   0.0            0.0             2.5
      80                                   0.0            0.0             5.0
    -Sample Size-                           38             21              40

    QF4                               Program (%)   Control 1 (%)
    Did a program called Jobs-Plus help you get in any ABE classes, that is,
    classes for improving your basic reading and math skills, or GED classes
    to help you prepare for the GED test, or classes to prepare for a regular
    high school diploma?
      Yes                                 41.0            0.0
      No                                  51.3            0.0
      Don't Know                           7.7            0.0
    -Sample Size-                           39            0.0

    QF5                               Program (%)   Control 1 (%)
    Did the Housing Authority provide or help you get in any ABE classes,
    classes for improving your basic reading and math skills, or GED classes
    to help you prepare for the GED test, or classes to prepare for a regular
    high school diploma?
      Yes                                 56.4           76.2
      No                                  43.6           23.8
    -Sample Size-                           39             21

The Macro

You may be wondering why we used three separate Proc Tabulate steps. With Proc Tabulate, if a case is missing on any one class variable, the case is dropped from every table produced by that Proc Tabulate step. Question F4 was only asked of residents living in program developments; therefore, all cases in the comparison (control) developments were missing for that class variable. To keep the missing responses to any one question from affecting the tables for the other questions, we ran a separate Proc Tabulate step for each question. As you can see, the syntax for each of these Proc Tabulate steps varied only in the places where the row variable name is declared.
This makes enclosing the Proc Tabulate step in the do loop of a macro very easy to implement:

    %macro sbook(ds=&syslast,vl=);
       ods listing close;
       ods rtf file="surveybook.rtf" startpage=no style=Theme;
       %Let i=1;
       /* loop until %scan returns nothing, i.e. the variable list is used up */
       %do %while(%scan(&vl,&i,' ') ne );
          %let rv=%scan(&vl,&i,' ');
          PROC TABULATE DATA=&ds format=5.1;
             CLASS &RV;
             CLASS site /PRELOADFMT;
             classlev site /s=[protectspecialchars=off];
             table &RV*pctn<&RV> all*N=' '*f=5.0,
                   site=' '
                   /PRINTMISS misstext='0.0' BOX="&RV";
             keylabel pctn=' ' all='-Sample Size-';
          %let i=%eval(&i +1);
       %end;
       run;
       ods rtf close;
    %mend sbook;

    %sbook(ds=nesug,vl=age qf4 qf5)

The name of the macro is declared as sbook. The sbook macro will use the last dataset created (&syslast) by default unless another dataset is named when the macro is called. In this example, when the sbook macro is called, the default &syslast dataset is replaced with the NESUG dataset (using the DS= macro parameter). All the variables/questions that are to be included in the RTF document are listed after the VL= macro parameter when the macro is called. The %SCAN macro function is used in a do loop to set the RV (row variable) macro variable, one at a time, to each of the row variables specified in the VL= list. The RV macro variable is used in the Proc Tabulate step everywhere that the row variable for the table had previously been specified.

Tweaking the style

As we reviewed our first draft of the RTF document, there were a few things we wanted to change. We wanted the variable name in the box to be in all capital letters and left justified at the top of the box. We wanted to use a smaller font and standard column widths in every table. We wanted a 1 inch margin at the top, bottom, left, and right of the document. ODS provides a lot of controls and options for changing most elements of the table style. Because you can change so many things, writing a program to do so can be quite complicated.
For example, you can change the font for the numbers in the tables, for the row headers, for the column headers, or for the summary (ALL) line. And not only can you change the font face for any or all of these, you can also change the font weight, size, and style. Because ODS allows you to customize so many areas of the table, you often have to specify something that you think of as a global change in many places. For example, if you want to change the font face for the whole table from Times Roman to Arial, you must specify that for many areas of the table. Therefore, there are some style changes that are easier to make as part of a Proc Tabulate step and others (mainly global style changes) that are easier to make using Proc Template.

First we will display the style changes we decided to make in the Proc Tabulate step.

    PROC TABULATE DATA=&ds format=5.1;
       CLASS &RV;
       CLASS site /PRELOADFMT;
       classlev site /s=[protectspecialchars=off cellwidth=130];
       classlev &RV /s=[cellwidth=600];
       table &RV*pctn<&RV> all*N=' '*f=5.0,
             site=' '
             /PRINTMISS misstext='0.0'
              BOX=[label=%upcase("&RV") s=[just=L vjust=T]];
       keylabel pctn=' ' all='-Sample Size-';

As you may remember, we had already added one style option using the CLASSLEV statement so that our column headers would be underlined. To this style we added cellwidth (measured in pixels) to standardize the width of the last 3 columns of the table. We added another CLASSLEV statement to specify the cellwidth for the first column of the table. Next we wanted to upcase the variable name used in the first “box” of the table, so we used the %UPCASE macro function to do that. To move the variable name to the top left of the box, we specified these style changes (just=L vjust=T) right on the TABLE statement after requesting the box.

We determined that the rest of the changes we wanted to make would be easier to make using Proc Template.
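
Before relying on the macro, it can help to try the Proc Tabulate style overrides described above on a single table. The following is a minimal sketch of what the step looks like once &RV has resolved to AGE, assuming the NESUG dataset and the $SITE and AGE formats defined earlier. The output file name style_test.rtf is hypothetical, and the braces around the BOX= specification are an assumption based on the documented syntax for that option; the paper's own code uses square brackets.

```sas
/* Sketch only: AGE is hard-coded where the macro uses &RV.             */
/* Assumes the NESUG dataset and the $SITE and AGE formats shown above. */
ods rtf file="style_test.rtf" startpage=no style=Theme; /* hypothetical file name */

proc tabulate data=nesug format=5.1;
   class age;
   class site / preloadfmt;
   /* column headers: let \ul reach RTF unfiltered and fix the column width */
   classlev site / style=[protectspecialchars=off cellwidth=130];
   /* row headers: one wide first column */
   classlev age / style=[cellwidth=600];
   table age*pctn<age> all*n=' '*f=5.0,
         site=' '
         / printmiss misstext='0.0'
           /* what %upcase("&RV") resolves to when RV=age */
           box={label="AGE" style=[just=L vjust=T]};
   keylabel pctn=' ' all='-Sample Size-';
run;

ods rtf close;
```

Checking one table this way keeps style problems separate from macro problems, since any unexpected formatting can only come from the style options.
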
The easiest way to create a style using Proc Template is to find one that is close to what you want and modify its source code. We had decided that the Theme style was the style closest to what we wanted. Next we printed the source code for the Theme style. The syntax for this was easy:

    PROC TEMPLATE;
       SOURCE STYLES.Theme;
    run;

Running the code above will send the source code for the Theme style to your log. You will find that the source code for most styles is hundreds of lines long, but you only need to copy into your program the sections you want to modify. We wanted to change the fonts section, the body section (controls the page margins), and the output section (controls the frame around the tables). Below are the sections we copied and modified:

    proc template;
       define style Styles.mystyle;
          parent = styles.Theme;
          replace fonts /
             'NoFont' = (,2)
             'TitleFont2' = ("verdana",2,Bold Italic)
             'TitleFont' = ("verdana",2,Bold Italic)
             'StrongFont' = ("verdana",2,Bold)
             'EmphasisFont' = ("verdana",2,Italic)
             'FixedEmphasisFont' = ("courier new",2,Italic)
             'FixedStrongFont' = ("courier new",2,Bold)
             'FixedHeadingFont' = ("courier new",2)
             'FixedFont' = ("courier new",2)
             'headingEmphasisFont' = ("verdana",2,Bold Italic)
             'headingFont' = ("verdana",2,Bold)
             'docFont' = ("verdana",2);
          replace body /
             leftmargin=1in
             rightmargin=1in
             topmargin=1in
             bottommargin=1in;
          replace Output /
             background = colors('tablebg')
             rules = NONE
             frame = BOX
             cellpadding = 7
             cellspacing = 1
             /* bordercolor = colors('tableborder') */
             borderwidth = 1;
       end;
    run;

The DEFINE statement tells SAS that you want to make your own style and call it “MYSTYLE.” The PARENT statement tells SAS that you want to copy the Theme style and modify it. The REPLACE statements tell SAS that you only want to replace these three sections of the Theme style. In the fonts section we changed all of the font “scales” to 2 because we wanted all of the fonts in the table to be the same size.
In the body section we changed all of our margins to 1 inch. The frame change in the output section was trickier. The source code we found in our log already specified frame = box, as we wanted, but we were not getting a box frame around our tables. We had to call SAS, and we discovered that there was a bug in the Theme style source code (for SAS version 8.2). The bordercolor line (commented out above) was the problem.

Running the code above created our new style, named MYSTYLE. To use MYSTYLE to format our tables, we changed the style= option on the ODS RTF statement:

    ods rtf file="surveybook.rtf" startpage=no style=mystyle;

With the Proc Tabulate and Proc Template style changes in place, our run yielded tables we were very happy with:

    AGE                               Program (%)   Control1 (%)   Control2 (%)
    Age at the time of interview
      0                                    2.6            0.0            0.0
      15-24                                2.6            4.8            7.5
      25-34                               28.2           28.6           32.5
      35-44                               20.5           19.0           27.5
      45-54                               17.9           19.0           15.0
      55-64                               10.3            9.5            7.5
      65                                   5.1            4.8            2.5
      66                                   2.6            9.5            0.0
      67                                   5.1            0.0            0.0
      68                                   5.1            4.8            0.0
      70                                   0.0            0.0            2.5
      80                                   0.0            0.0            5.0
    -Sample Size-                           39             21             40

    QF4                               Program (%)   Control1 (%)   Control2 (%)
    Did a program called Jobs-Plus help you get in any ABE classes, that is,
    classes for improving your basic reading and math skills, or GED classes
    to help you prepare for the GED test, or classes to prepare for a regular
    high school diploma?
      Yes                                 41.0            0.0            0.0
      No                                  51.3            0.0            0.0
      Don't Know                           7.7            0.0            0.0
    -Sample Size-                           39            0.0            0.0

    QF5                               Program (%)   Control1 (%)   Control2 (%)
    Did the Housing Authority provide or help you get in any ABE classes,
    classes for improving your basic reading and math skills, or GED classes
    to help you prepare for the GED test, or classes to prepare for a regular
    high school diploma?
      Yes                                 56.4           76.2           52.5
      No                                  43.6           23.8           47.5
    -Sample Size-                           39             21             40

Conclusion

Using Proc Format, the LABEL statement, Proc Tabulate, ODS, the macro language, and Proc Template allowed us to create six nicely formatted survey books.
For each of the six cities, these books conveyed residents’ responses to nearly 200 survey questions. Because the books were generated by SAS code in a macro, making modifications (e.g. adding a sample size row, re-categorizing the age breakdown, or requesting cell counts or row percentages instead of column percentages) became no big deal.

Complete Programming Code

    libname mydata '.';

    proc template;
       list styles;
    run;

    proc template;
       source Styles.Theme;
    run;

    proc format;
       value $site 'a'='\ul Program (%)'
                   'c'='\ul Control1 (%)'
                   'd'='\ul Control2 (%)';
       VALUE YESNO 1="Yes"
                   2="No"
                   98="Don't Know"
                   99="Refuse to answer";
       VALUE AGE 15-24='15-24'
                 25-34='25-34'
                 35-44='35-44'
                 45-54='45-54'
                 55-64='55-64'
                 other=[3.0];
    run;

    DATA NESUG;
       SET MYDATA.NESUG;
       LABEL age="Age at the time of interview"
             QF4="Did a program called Jobs-Plus help you get in any ABE classes, that is, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?"
             QF5="Did the Housing Authority provide or help you get in any ABE classes, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?";
       format SITE $SITE. QF4 QF5 YESNO. age age.;
    run;

    /* store the new style in a template store in the current directory */
    libname sasstyle ".";
    ods path(prepend) sasstyle.tmplmst(update);

    proc template;
       define style Styles.mystyle;
          parent = styles.Theme;
          replace fonts /
             'NoFont' = (,2)
             'TitleFont2' = ("verdana",2,Bold Italic)
             'TitleFont' = ("verdana",2,Bold Italic)
             'StrongFont' = ("verdana",2,Bold)
             'EmphasisFont' = ("verdana",2,Italic)
             'FixedEmphasisFont' = ("courier new",2,Italic)
             'FixedStrongFont' = ("courier new",2,Bold)
             'FixedHeadingFont' = ("courier new",2)
             'FixedFont' = ("courier new",2)
             'headingEmphasisFont' = ("verdana",2,Bold Italic)
             'headingFont' = ("verdana",2,Bold)
             'docFont' = ("verdana",2);
          replace body /
             leftmargin=1in
             rightmargin=1in
             topmargin=1in
             bottommargin=1in;
          replace Output /
             background = colors('tablebg')
             rules = NONE
             frame = BOX
             cellpadding = 7
             cellspacing = 1
             /* bordercolor = colors('tableborder') */
             borderwidth = 1;
       end;
    run;

    %macro sbook(ds=&syslast,vl=);
       ods listing close;
       ods rtf file="surveybook.rtf" startpage=no style=mystyle;
       %Let i=1;
       %do %while(%scan(&vl,&i,' ') ne );
          %let rv=%scan(&vl,&i,' ');
          PROC TABULATE DATA=&ds format=5.1;
             CLASS &RV;
             CLASS site /PRELOADFMT;
             classlev site /s=[protectspecialchars=off cellwidth=130];
             classlev &RV /s=[cellwidth=600];
             table &RV*pctn<&RV> all*N=' '*f=5.0,
                   site=' '
                   /PRINTMISS misstext='0.0'
                    BOX=[label=%upcase("&RV") s=[just=L vjust=T]];
             keylabel pctn=' ' all='-Sample Size-';
          %let i=%eval(&i +1);
       %end;
       run;
       ods rtf close;
    %mend sbook;

    %sbook(ds=nesug,vl=age qf4 qf5)

References

PROC TABULATE by Example, Lauren Haworth, SAS Press, February 1999.
Output Delivery System: The Basics, Lauren Haworth, SAS Press, March 2001.

Contact Information

Electra Small
MDRC
16 East 34th Street, New York, NY 10016
Electra.Small@mdrc.org