NESUG 18
Applications
The Survey Data Book an Easy Read via Proc Tabulate and the ODS RTF Destination
Electra Small, MDRC
Abstract
The intake period for a lengthy survey was over. The next step was to create a nicely formatted “preliminary
results book.” The concept of the survey data book was that it would mirror the initial questionnaire in design. It
was also to report the total number who responded to each question and the percentage who gave each response.
This paper steps through the use of SAS® Proc Tabulate, Proc Format, Proc Template, the ODS RTF destination,
and the macro language to accomplish this project.
Introduction to the Project and Purpose
Can an employment program located in public housing developments help residents earn more money and
improve quality of life? The Jobs-Plus program, set up in six cities, sought to achieve these goals. Surveys were
conducted with the residents during the first year and about 2 ½ years later. To communicate the results of the
survey we decided that we would create a nicely formatted “preliminary results book” for each city. The data
book would display the exact wording of each question and the possible responses. It would report the total
number who responded to each question and the percentage who gave each response. To create the books for the
first wave of the survey, we ran Proc Freq in SAS®, printed the output, and typed the column percentages by hand into
table shells created in Excel®. This process was labor intensive and error prone.
By the time the survey’s second wave was completed, our company had upgraded to SAS 8.2, which added
many features to Proc Tabulate and Proc Report and in which ODS was a production feature. Our company had also
purchased two books by Lauren E. Haworth: Output Delivery System: The Basics and PROC TABULATE by Example. After
reading these books and testing the features of the newer version of SAS, we decided to automate the production of
the survey data books as much as possible. We wanted to closely match the
formatting used in the first wave books. Below is an example of one table from the first wave books.
89. When you were growing up, did you ever live in a public housing development?

                 Rainier Vista (%)   Yesler Terrace (%)
Yes                     28.5                43.2
No                      71.5                56.8
Refused                  0.0                 0.0
Don't know               0.0                 0.0

This paper will step through how SAS Proc Format, Proc Tabulate, Proc Template, and ODS were used to
accomplish this project. Once initial coding was completed, it was placed in a parameterized macro.
The Programming
In reading the documentation about ODS destinations, we quickly realized that the ODS RTF destination could
create nicely formatted tables in an RTF document, and that RTF documents become MS Word® documents in
just a few clicks. ODS and Excel, on the other hand, did not work as well together on things like page breaks and
table headings, so we decided to go with the ODS RTF destination.
We discovered that some of the Proc Tabulate formatting syntax affects how standard output looks but has no
effect on the look of RTF/MS Word documents. For this reason we decided to start by producing tables for just
three of the survey questions using a small dummy dataset.
There were six main steps involved in programming the survey data books. First, we used formats to give
each response its proper wording. Next, we used the LABEL statement to give each question its exact wording. We
used ODS statements to direct our output to an RTF document and to select a general style for our tables. A
Proc Tabulate step produced the tables. This step was placed inside a simple
looping macro to produce a table for each question in the survey. Finally, we used Proc Template to tweak
the style of the tables. Each of these steps is described in detail below.
Formats to give each response its proper wording
We used Proc Format and the FORMAT statement to add the exact wording of each response to each question.
proc format;
   /* \ul is an RTF control word: underline the text that follows */
   value $site 'a'='\ul PROGRAM (%)'
               'c'='\ul COMPARISON 1 (%)'
               'd'='\ul COMPARISON 2 (%)';
   VALUE YESNO
      1="Yes"
      2="No"
      98="Don't Know"
      99="Refuse to answer";
   VALUE AGE
      15-24='15-24'
      25-34='25-34'
      35-44='35-44'
      45-54='45-54'
      55-64='55-64'
      other=[3.0];   /* nested format: all other ages use the 3.0 format */
run;
Notice the \ul in the labels of the $SITE format. \ul is an RTF control word that tells the RTF reader to
underline the text that follows. Later we will add an option to our Proc
Tabulate step so that SAS passes these characters through to the RTF destination unchanged. The survey dataset stored age
with decimal places (the 4.2 format), and we preferred not to show decimals for age. Notice that the AGE
format ends with other=[3.0]. This is a nested format: it tells SAS that all
values not covered by the listed ranges should be displayed with the 3.0 format instead of one of the AGE. groupings.
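To see the nested format in isolation, the following minimal sketch applies a cut-down version of the AGE format to a few made-up ages and writes the results to the log. The format name AGEGRP, the variable names, and the test values are assumptions chosen for illustration; they are not taken from the survey data.

```sas
/* Illustrative only: AGEGRP and the test ages below are hypothetical */
proc format;
   value agegrp
      15-24 = '15-24'
      25-34 = '25-34'
      other = [3.0];      /* any value outside the ranges is passed to the 3.0 format */
run;

data _null_;
   length shown $5;
   do age = 18.25, 30.5, 65.4, 80;
      shown = put(age, agegrp.);   /* apply the format the way a procedure would */
      put age= shown=;             /* write the raw and formatted values to the log */
   end;
run;
```

Ages inside a listed range print as the range label, and other ages print as whole numbers. One thing worth checking in your own data: a value such as 24.5 lies between the 15-24 and 25-34 ranges, so it is also handled by OTHER rather than by either grouping.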
Labels to display the exact wording of each question
The LABEL statement was used to label each variable with the exact wording of the question. Variable labels
are limited to 256 characters, and a handful of questions had wording that exceeded this limit. In these
cases, we abbreviated the wording. Continuing to build our basic program, we added the following:
DATA NESUG;
   SET MYDATA.NESUG;
   /* each label is kept on one line so that no line break falls inside a quoted string */
   LABEL
      AGE="Age at the time of interview"
      QF4="Did a program called Jobs-Plus help you get in any ABE classes, that is, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?"
      QF5="Did the Housing Authority provide or help you get in any ABE classes, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?"
   ;
   format SITE $SITE. QF4 QF5 YESNO. AGE age.;
run;
ODS used to direct output to RTF file and to select a table style
ods listing close;
ods rtf file="surveybook.rtf" startpage=no
style=Theme;
The ODS LISTING CLOSE statement above tells SAS to stop sending output to the standard listing destination. This is
an optional statement. The ODS RTF statement tells SAS to send ODS output to a file called surveybook.rtf, and
startpage=no tells SAS not to start a new page for each table. The style=Theme option tells SAS which style
template to use for formatting our RTF tables. SAS ships with several style templates,
others are available for download on the SAS website, and still others are available for purchase from the SAS
publications department. For this project we started with the Theme style because it was very simple, with no
shading or coloring on the tables, a basic font, and no lines around the cells of the tables. To find out what styles
are available at your site, you can submit the following SAS code:
PROC TEMPLATE; LIST STYLES; RUN;
See either of Ms. Haworth’s books for additional details on ODS style templates.
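The open/run/close pattern described above can be tried on its own before any survey tables exist. The sketch below is only an illustration: the file name example.rtf and the use of the SASHELP.CLASS sample dataset are assumptions, and no style is named so the default RTF style applies.

```sas
/* Illustrative only: example.rtf and SASHELP.CLASS are stand-ins, not project files */
ods listing close;                          /* optional: stop sending output to the listing */
ods rtf file="example.rtf" startpage=no;    /* open the RTF destination */

proc print data=sashelp.class noobs;
run;                                        /* the step must finish before the file is closed */

ods rtf close;                              /* finish writing example.rtf */
ods listing;                                /* reopen the listing destination for later steps */
```

Reopening the listing destination at the end is a convenience for interactive sessions, so that later procedures still produce visible output.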
Proc Tabulate used to produce the tables
We submitted three Proc Tabulate steps, one for each of the three test tables we wanted to produce. The
steps differ only in the row variable of the table:
PROC TABULATE DATA=nesug format=5.1;
   CLASS AGE;
   CLASS site /PRELOADFMT;
   classlev site /s=[protectspecialchars=off];
   table AGE*pctn<AGE> all*N=' '*f=5.0, site=' ' /PRINTMISS misstext='0.0' BOX="AGE";
   keylabel pctn=' '
            all='-Sample Size-';
run;
PROC TABULATE DATA=nesug format=5.1;
   CLASS QF4;
   CLASS site /PRELOADFMT;
   classlev site /s=[protectspecialchars=off];
   table QF4*pctn<QF4> all*N=' '*f=5.0, site=' ' /PRINTMISS misstext='0.0' BOX="QF4";
   keylabel pctn=' '
            all='-Sample Size-';
run;
PROC TABULATE DATA=nesug format=5.1;
   CLASS QF5;
   CLASS site /PRELOADFMT;
   classlev site /s=[protectspecialchars=off];
   table QF5*pctn<QF5> all*N=' '*f=5.0, site=' ' /PRINTMISS misstext='0.0' BOX="QF5";
   keylabel pctn=' '
            all='-Sample Size-';
run;
ods rtf close;
The proc tabulate statement
The first statement in each step is the PROC TABULATE statement, which specifies the dataset to
use. Adding format=5.1 sets the default format for every table cell, so percentages are shown with one decimal place.
The class statements
You may have noticed that each Proc Tabulate step includes two CLASS statements. Proc Tabulate treats class
variables and analysis variables differently: for example, the default statistic for a class variable is a count, but for
an analysis variable the default statistic is a sum. In the newer versions of SAS you can apply different options to each
class variable, but to do this you must list each on its own CLASS statement. Therefore, the first CLASS statement simply
declares the row variable (for example, QF5) as a class variable, and the second CLASS statement declares SITE as a class
variable and alerts SAS that missing data on this variable should be treated differently than missing data on the other class variables.
The missing data instructions
You have to tell Proc Tabulate how to deal with missing data. Notice that PRELOADFMT was added to the
CLASS statement for SITE and PRINTMISS was added to the TABLE statement. If you have assigned user-defined formats to
your class variables, these options together tell SAS to create a column (or row) for every possible value of your
class variable, whether or not this value appears in your dataset. For example, if your column
variable were gender and some questions were asked only of females, the male column would
still show up in every table as long as you had a user-defined format that specifies both values. We wanted
this behavior only for the column variable. For the row variables, if some possible responses were never chosen
by residents, we did not want those responses in the rows of the table. MISSTEXT='0.0' on the TABLE statement
tells SAS to print 0.0 instead of “.” whenever a cell has no responses at all. For an example of how these options
affected our project, look at QF4 in the output below. The whole second column (the one filled
with 0.0) would have been left out of the table completely without these options.
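The effect of PRELOADFMT and PRINTMISS is easiest to see on a tiny dataset in which one site never occurs. The following is a minimal sketch under stated assumptions: the dataset DEMO, the formats $GRP and YN, and the four records are all invented for illustration.

```sas
/* Illustrative only: DEMO, $GRP, YN, and the data values are hypothetical */
proc format;
   value $grp 'a'='Program' 'c'='Comparison 1' 'd'='Comparison 2';
   value yn   1='Yes' 2='No';
run;

data demo;
   input site $ q;
   datalines;
a 1
a 2
a 1
c 2
;
run;

proc tabulate data=demo format=5.1;
   class q;
   class site / preloadfmt;           /* load every value defined in $GRP */
   table q*pctn<q> all*n=' '*f=5.0,
         site=' ' / printmiss misstext='0.0';
   format site $grp. q yn.;           /* PRELOADFMT needs a format assigned to SITE */
   keylabel pctn=' ' all='Sample Size';
run;
```

No record has site 'd', yet a Comparison 2 column should still appear because its value is defined in the format. Removing PRELOADFMT and PRINTMISS and rerunning the step is a quick way to confirm that the column then drops out.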
The classlev statement
The CLASSLEV statement is used with Proc Tabulate only when using ODS and has only one option, /style (which
can be abbreviated to /s). The CLASSLEV statement gives ODS special instructions on how to
format the values of your class variables. Many style attributes (font face and size, background and foreground
colors, cell width, and so on) can be changed for one or many variables in the table. In this case we wanted to
underline the values of our column class variable, so we used a style override to tell SAS not to filter out the special \ul
characters we are sending to the RTF destination (protectspecialchars=off).
The table statement
In the TABLE statement of Proc Tabulate, the first dimension you specify is the row
dimension of the table. For the rows, we want the percentage of residents who gave each response to the
particular survey question. So we first list the row variable (the question) and “multiply”
it by the desired statistic (e.g., age*pctn<AGE>). The variable named in angle brackets defines the denominator, so
the percentages in each site column sum to 100 across the responses. We also wanted to add a sample size row to the bottom of
each table. This is a summary line, so we used the ALL keyword to request it and “multiplied” it
by N because the count is the desired statistic. For the ALL row we changed the
format to f=5.0, for that line only, because we did not need decimals on the sample size line.
On the TABLE statement, the comma tells SAS that we are done specifying the table's row dimension, and site=' '
declares that SITE is the variable used in the column dimension of the table and that we do not want
the SITE variable label to appear in the table. Instead, only the value labels (Program, Control 1, Control 2) are
used as headers for the columns.
Notice the BOX= option on the TABLE statement. Here we are requesting that the box above the row header be used to
display the variable name. For this project we decided to use that box in each table to display the variable's name,
but whatever is put in quotes after BOX= will be displayed in the upper left-hand box of the table.
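The role of the denominator can be explored with a sample dataset before touching the survey data. The sketch below is illustrative only: it uses SASHELP.CLASS with AGE as the row variable and SEX as the column variable, which are stand-ins for a survey question and SITE. It also uses the COLPCTN and ROWPCTN statistics, which were not part of our program; we believe COLPCTN gives the same column percentages as the explicit denominator, but compare the first two tables to confirm this at your site.

```sas
/* Illustrative only: SASHELP.CLASS, AGE, and SEX stand in for the survey variables */
proc tabulate data=sashelp.class format=5.1;
   class age sex;

   /* explicit denominator: percentages sum to 100 within each SEX column */
   table age*pctn<age> all*n=' '*f=5.0, sex;

   /* COLPCTN: column percentages without naming a denominator */
   table age*colpctn all*n=' '*f=5.0, sex;

   /* ROWPCTN: percentages sum to 100 across the SEX columns of each row */
   table age*rowpctn all*n=' '*f=5.0, sex;

   keylabel all='Sample Size';
run;
```

Listing several TABLE statements in one step is convenient here because no class variable has missing values; with survey data the one-question-per-step approach described later avoids dropping cases.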
The RUN statement just before the ODS RTF CLOSE statement is required. You must make sure that all of the
procedures whose output you want in the RTF file have executed before you close the RTF file.
After executing the above steps, the SURVEYBOOK.RTF file looked as follows. (Note that the last column was
cut off of the last two tables because it extended beyond the margin allowed for this paper.)

AGE                              Program (%)   Control 1 (%)   Control 2 (%)
Age at the time of interview
   15-24                              2.6            4.8             7.5
   25-34                             28.9           28.6            32.5
   35-44                             21.1           19.0            27.5
   45-54                             18.4           19.0            15.0
   55-64                             10.5            9.5             7.5
   65                                 5.3            4.8             2.5
   66                                 2.6            9.5             0.0
   67                                 5.3            0.0             0.0
   68                                 5.3            4.8             0.0
   70                                 0.0            0.0             2.5
   80                                 0.0            0.0             5.0
-Sample Size-                          38             21              40

QF4                              Program (%)   Control 1 (%)
Did a program called Jobs-Plus help you get in any ABE classes, that is, classes for improving your
basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to
prepare for a regular high school diploma?
   Yes                               41.0            0.0
   No                                51.3            0.0
   Don't Know                         7.7            0.0
-Sample Size-                          39            0.0

QF5                              Program (%)   Control 1 (%)
Did the Housing Authority provide or help you get in any ABE classes, classes for improving your
basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to
prepare for a regular high school diploma?
   Yes                               56.4           76.2
   No                                43.6           23.8
-Sample Size-                          39             21

The Macro
You may be wondering why we used three separate Proc Tabulate steps. With Proc Tabulate, if a case is missing on
any one class variable, the case is dropped from every table produced by that step. Question
F4 was asked only of residents living in program developments; therefore, all cases in the comparison (control)
developments were missing for that class variable. To keep the missing responses to any one question from
affecting the tables for the other questions, we ran a separate Proc Tabulate step for each question. As you
can see, the syntax of these steps varies only where the row variable name is
specified. This makes it very easy to enclose the Proc Tabulate step in the %DO loop of a macro:
%macro sbook(ds=&syslast,vl=);
   %local i rv;
   ods listing close;
   ods rtf file="surveybook.rtf" startpage=no
           style=Theme;
   %let i=1;
   /* %str( ) makes a blank the only delimiter used by %scan */
   %do %while(%scan(&vl,&i,%str( )) ne );
      %let rv=%scan(&vl,&i,%str( ));
      PROC TABULATE DATA=&ds format=5.1;
         CLASS &RV;
         CLASS site /PRELOADFMT;
         classlev site /s=[protectspecialchars=off];
         table &RV*pctn<&RV> all*N=' '*f=5.0, site=' ' /PRINTMISS misstext='0.0' BOX="&RV";
         keylabel pctn=' ' all='-Sample Size-';
      run;   /* finish this table before moving to the next variable */
      %let i=%eval(&i+1);
   %end;
   ods rtf close;
%mend sbook;
%sbook(ds=nesug,vl=age qf4 qf5)
The macro is named sbook. It uses the most recently created dataset (&syslast) by
default unless another dataset is named when the macro is called. In this example, when the sbook macro is
called, the default &syslast dataset is replaced with the NESUG dataset (using the DS= macro parameter). All the
variables/questions to be included in the RTF document are listed in the VL= macro parameter
when the macro is called. The %SCAN macro function is used in a %DO %WHILE loop to set the RV (row variable)
macro variable, one at a time, to each of the row variables in the VL= list. The RV
macro variable is used in the Proc Tabulate step everywhere that the row variable for the table had
previously been specified.
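Before wrapping a procedure in the loop, it can help to check that the list is being split as expected. The following minimal sketch runs the same kind of %SCAN loop but only writes each variable name to the log. The macro name LISTVARS is hypothetical, and testing the word with %LENGTH is a variation on the comparison used in sbook, not the method we used in production.

```sas
/* Illustrative only: LISTVARS is a hypothetical helper for checking the loop */
%macro listvars(vl=);
   %local i rv;
   %let i = 1;
   %let rv = %scan(&vl, &i, %str( ));        /* first word of the list */
   %do %while(%length(&rv) > 0);             /* stop when %scan returns nothing */
      %put NOTE: table &i uses row variable &rv;
      %let i = %eval(&i + 1);
      %let rv = %scan(&vl, &i, %str( ));     /* next word, or empty at the end */
   %end;
%mend listvars;

%listvars(vl=age qf4 qf5)
```

Each word in the VL= list should produce one line in the log, in the order given. Once the list is confirmed, the %PUT statement can be replaced by the Proc Tabulate step.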
Tweaking the style
As we reviewed our first draft of the RTF document, there were a few things we wanted to change. We wanted
the variable name in the box to be in all capital letters and left justified at the top of the box. We wanted to use a
smaller font and standard column widths in every table. We wanted a 1 inch margin at the top, bottom, left, and
right of the document. ODS provides many controls and options for changing most elements of the table style.
Because you can change so many things, writing a program to do so can be quite complicated. For example, you
can change the font for the numbers in the tables, for the row header, for the column header, or for the summary (ALL)
line. For each of these you can change not only the font face but also the
font weight, size, and style. Because ODS allows you to customize so many areas of the table,
you often have to specify something that you think of as a global change in many places. For
example, if you want to change the font face for the whole table from Times Roman to Arial, you must specify that for
many areas of the table. Therefore, some style changes are easier to make as part of a Proc Tabulate
step and others (mainly global style changes) are easier to make using Proc Template.
First we will display the style changes we decided to make on the Proc Tabulate statement.
PROC TABULATE DATA=&ds format=5.1 ;
CLASS &RV;
CLASS site /PRELOADFMT;
classlev site /s=[protectspecialchars=off cellwidth=130];
classlev &RV /s=[cellwidth=600];
table &RV*pctn<&RV> all*N=' '*f=5.0,site= ' '
/PRINTMISS misstext='0.0' BOX=[label=%upcase("&RV") s=[just=L vjust=T]];
keylabel pctn=' ' all='-Sample Size-';
As you may recall, we had already added one style override using the CLASSLEV statement to underline our column
headers. To this we added cellwidth (measured in pixels) to standardize the width of the last three columns
of the table. We added another CLASSLEV statement to specify the cellwidth for the first column of the table.
Next, we wanted to uppercase the variable name used in the first “box” of the table, so we used the %UPCASE macro
function. To move the variable name to the top left of the box, we specified these style changes right on
the TABLE statement, inside the BOX= option.
We determined that the rest of the changes would be easier to make using Proc Template.
The easiest way to create a style using Proc Template is to find one that is close to what you want and modify the
source code for it. We had decided that the Theme style was the closest to what we wanted. Next we
printed the source code for the Theme style. The syntax for this was easy:
PROC TEMPLATE; SOURCE STYLES.Theme; RUN;
Running the code above sends the source code for the Theme style to your log. You will find that the source code
for most styles is hundreds of lines long, but you only need to copy into your program the sections you want to
modify. We wanted to change the fonts section, the body section (controls the document margins), and the output section
(controls the frame around the tables). Below are the sections we copied and modified:
proc template;
   define style Styles.mystyle;
      parent = styles.Theme;
      replace fonts /
         'NoFont' = (,2)
         'TitleFont2' = ("verdana",2,Bold Italic)
         'TitleFont' = ("verdana",2,Bold Italic)
         'StrongFont' = ("verdana",2,Bold)
         'EmphasisFont' = ("verdana",2,Italic)
         'FixedEmphasisFont' = ("courier new",2,Italic)
         'FixedStrongFont' = ("courier new",2,Bold)
         'FixedHeadingFont' = ("courier new",2)
         'FixedFont' = ("courier new",2)
         'headingEmphasisFont' = ("verdana",2,Bold Italic)
         'headingFont' = ("verdana",2,Bold)
         'docFont' = ("verdana",2);
      replace body /
         leftmargin=1in
         rightmargin=1in
         topmargin=1in
         bottommargin=1in;
      replace Output/
         background = colors('tablebg')
         rules = NONE
         frame = BOX
         cellpadding = 7
         cellspacing = 1
         /* bordercolor = colors('tableborder') */
         borderwidth = 1;
   end;
run;
The DEFINE statement tells SAS that you want to create your own style and call it MYSTYLE. The PARENT
statement tells SAS that you want to copy the Theme style and modify that. The REPLACE statements tell SAS
that you want to replace only these three sections of the Theme style. In the fonts section we changed all of the
font “scales” to 2 because we wanted all of the fonts in the table to be the same size. In the body section we
changed all of our margins to 1 inch. The frame change in the output section was trickier. The source code we
found in our log already specified frame = box as we wanted, but we were not getting a box frame around our
tables. We called SAS and learned that there was a bug in the Theme style source code (for SAS
version 8.2). The bordercolor line (commented out above) was the problem. Running the code above created our
new style, MYSTYLE. To use MYSTYLE to format our tables, we changed the
style= option on the ODS RTF statement:
ods rtf file="surveybook.rtf" startpage=no
style=mystyle;
With the Proc Tabulate and Proc Template style changes in place, our run produced tables we were very happy with:

AGE                              Program (%)   Control1 (%)   Control2 (%)
Age at the time of interview
   0                                  2.6            0.0            0.0
   15-24                              2.6            4.8            7.5
   25-34                             28.2           28.6           32.5
   35-44                             20.5           19.0           27.5
   45-54                             17.9           19.0           15.0
   55-64                             10.3            9.5            7.5
   65                                 5.1            4.8            2.5
   66                                 2.6            9.5            0.0
   67                                 5.1            0.0            0.0
   68                                 5.1            4.8            0.0
   70                                 0.0            0.0            2.5
   80                                 0.0            0.0            5.0
-Sample Size-                          39             21             40

QF4                              Program (%)   Control1 (%)   Control2 (%)
Did a program called Jobs-Plus help you get in any ABE classes, that is, classes for improving
your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to
prepare for a regular high school diploma?
   Yes                               41.0            0.0            0.0
   No                                51.3            0.0            0.0
   Don't Know                         7.7            0.0            0.0
-Sample Size-                          39            0.0            0.0

QF5                              Program (%)   Control1 (%)   Control2 (%)
Did the Housing Authority provide or help you get in any ABE classes, classes for improving your
basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to
prepare for a regular high school diploma?
   Yes                               56.4           76.2           52.5
   No                                43.6           23.8           47.5
-Sample Size-                          39             21             40

Conclusion
Using Proc Format, the LABEL statement, Proc Tabulate, ODS, the macro language, and Proc Template allowed
us to create six nicely formatted survey books. For each of the six cities, these books conveyed residents’
responses to nearly 200 survey questions. Because the books were generated by SAS code in a macro, making
modifications (e.g., adding a sample size row, re-categorizing the age breakdown, or requesting cell counts or row
percentages instead of column percentages) was straightforward.
Complete Programming Code
libname mydata '.';
proc template; list styles; run;
proc template; source Styles.Theme; run;
proc format;
   value $site 'a'='\ul Program (%)'
               'c'='\ul Control1 (%)'
               'd'='\ul Control2 (%)';
   VALUE YESNO
      1="Yes"
      2="No"
      98="Don't Know"
      99="Refuse to answer";
   VALUE AGE
      15-24='15-24'
      25-34='25-34'
      35-44='35-44'
      45-54='45-54'
      55-64='55-64'
      other=[3.0];
run;
DATA NESUG;
   SET MYDATA.NESUG;
   /* each label is kept on one line so that no line break falls inside a quoted string */
   LABEL
      age="Age at the time of interview"
      QF4="Did a program called Jobs-Plus help you get in any ABE classes, that is, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?"
      QF5="Did the Housing Authority provide or help you get in any ABE classes, classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?"
   ;
   format
      SITE $SITE.
      QF4
      QF5 YESNO.
      age age.;
run;
libname sasstyle ".";
ods path(prepend) sasstyle.tmplmst(update);
proc template;
define style Styles.mystyle;
parent = styles.Theme;
replace fonts /
'NoFont' = (,2)
'TitleFont2' = ("verdana",2,Bold Italic)
'TitleFont' = ("verdana",2,Bold Italic)
'StrongFont' = ("verdana",2,Bold)
'EmphasisFont' = ("verdana",2,Italic)
'FixedEmphasisFont' = ("courier new",2,Italic)
'FixedStrongFont' = ("courier new",2,Bold)
'FixedHeadingFont' = ("courier new",2)
'FixedFont' = ("courier new",2)
'headingEmphasisFont' = ("verdana",2,Bold Italic)
'headingFont' = ("verdana",2,Bold)
'docFont' = ("verdana",2);
replace body /
leftmargin=1in
rightmargin=1in
topmargin=1in
bottommargin=1in;
replace Output/
background = colors('tablebg')
rules = NONE
frame = BOX
cellpadding = 7
cellspacing = 1
/* bordercolor = colors('tableborder') */
borderwidth = 1;
end;
run;
%macro sbook(ds=&syslast,vl=);
   %local i rv;
   ods listing close;
   ods rtf file="surveybook.rtf" startpage=no
           style=mystyle;
   %let i=1;
   /* %str( ) makes a blank the only delimiter used by %scan */
   %do %while(%scan(&vl,&i,%str( )) ne );
      %let rv=%scan(&vl,&i,%str( ));
      PROC TABULATE DATA=&ds format=5.1;
         CLASS &RV;
         CLASS site /PRELOADFMT;
         classlev site /s=[protectspecialchars=off cellwidth=130];
         classlev &RV /s=[cellwidth=600];
         table &RV*pctn<&RV> all*N=' '*f=5.0, site=' '
               /PRINTMISS misstext='0.0' BOX=[label=%upcase("&RV") s=[just=L vjust=T]];
         keylabel pctn=' '
                  all='-Sample Size-';
      run;   /* finish this table before moving to the next variable */
      %let i=%eval(&i+1);
   %end;
   ods rtf close;
%mend sbook;
%sbook(ds=nesug,vl=age qf4 qf5)
References
PROC TABULATE by Example, Lauren Haworth, SAS Press, February 1999.
Output Delivery System: The Basics, Lauren Haworth, SAS Press, March 2001.
Contact Information
Electra Small
MDRC
16 East 34th Street, New York, NY 10016
Electra.Small@mdrc.org