Creating Output Data Sets with the FREQUENCY and MEANS Procedures
Transcript
Creating Output Data Sets with the FREQUENCY and MEANS Procedures Transcript was developed by
Marty Hultgren. Additional contributions were made by Ted Meleky, Linda Mitterling, Christine
Riddiough, and Cynthia Zender. Editing and production support was provided by the Curriculum
Development and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
Creating Output Data Sets with the FREQUENCY and MEANS Procedures Transcript
Copyright © 2010 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E1645, course code RLSPFPM, prepared date 29Jan2010.
RLSPFPM_001
ISBN 8-1-60764-419-4
For Your Information
Table of Contents
Lecture Description
Prerequisites
Accessibility Tips
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System
Lecture Description
This SAS e-lecture shows how to create output data sets with the FREQUENCY and MEANS procedures, and
how to create them with the Output Delivery System (ODS) method.
To learn more…
For information on other courses in the curriculum, contact the SAS Education
Division at 1-800-333-7660, or send e-mail to training@sas.com. You can also
find this information on the Web at support.sas.com/training/ as well as in the
Training Course Catalog.
For a list of other SAS books that relate to the topics covered in these
course notes, USA customers can contact our SAS Publishing Department at
1-800-727-3228 or send e-mail to sasbook@sas.com. Customers outside the
USA should contact their local SAS office.
Also, see the Publications Catalog on the Web at support.sas.com/pubs for a
complete list of books and a convenient order form.
Prerequisites
Before listening to this SAS e-lecture, you should be comfortable with the basic syntax of the FREQUENCY
and MEANS procedures. Specifically, you should be able to generate one- and two-dimensional tables
using the FREQUENCY procedure, and you should be able to use the MEANS procedure to generate
descriptive statistics. These topics can be learned by completing the SAS Programming 1: Essentials
course.
Accessibility Tips
If you are using a screen reader, such as Freedom Scientific’s JAWS, you may want to configure your
punctuation settings so that characters used in code samples (comma, ampersand, semicolon, percent) are
announced. Typically, the screen reader default for the character & is to read “and.” For clarity in code
samples, you may want to configure your screen reader to read & as “ampersand.” In addition, depending
on your verbosity options, the character & might be omitted. The same is true for some commas before a
code variable. To confirm code lines, you may choose to read some lines character by character. When
testing this scenario with Adobe Acrobat Reader 9.1 and JAWS 10, ampersands before SAS macro names
were announced only when in character-reading mode.
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
Welcome to our e-lecture on Creating Output Data Sets with the FREQUENCY and MEANS Procedures.
My name is Marty, and I’m an instructor for SAS.
Before we begin the lecture, let me take just a minute to point out a helpful reference that’s available to
you. We’ve included a transcript so that you can print all of the information provided in this lecture. To
access the transcript, select Reference and then Transcript in the table of contents on the left side of the
viewer. You can print this transcript now for use when viewing the lecture, or print it later to keep as a
reference. We hope you find the transcript to be a useful tool! And now, let’s get started…
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System
Both the FREQUENCY and MEANS procedures have internal methods for creating output data sets. The
FREQUENCY procedure gives us two ways to create output data sets, and the MEANS procedure gives
us one way. We’ll look at these methods in this lecture, and also at the Output Delivery System, or ODS,
method. The ODS method uses the same syntax for all procedures, and with some procedures, ODS
captures different statistics than the internal procedure method captures. I think you’ll find both the
procedure-specific methods and the Output Delivery System method to be useful.
1. The FREQUENCY Procedure
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System
The first procedure we’ll look at is the FREQUENCY procedure, also called PROC FREQ.
Objectives
• Use the OUT= option in PROC FREQ to create output tables containing frequency and percentage information.
• Use the OUTPUT statement in PROC FREQ to create output tables containing statistics.
As I mentioned, PROC FREQ gives us two methods for creating output data sets – the TABLES
statement method and the OUTPUT statement method. Both use the OUT= option. We’ll compare the
two methods briefly, and then look at how to use each method.
The FREQUENCY Procedure
PROC FREQ produces output data sets using the following two methods:
• a TABLES statement with an OUT= option
TABLES variables / OUT=SAS-data-set <options>;
• an OUTPUT statement with an OUT= option
OUTPUT OUT=SAS-data-set <options>;
Both methods have similar syntax, but as you’ll see in the coming examples, the output from each differs
quite a bit. When using the TABLES statement, the OUT= option appears after a slash. When using the
OUTPUT statement, no slash is necessary. In either case, the OUT= option is used to name the output
data set that will contain the statistics.
The FREQUENCY Procedure
Each method can produce different results.
• The OUT= option in the TABLES statement names an output data set containing frequency counts and percentages:
Country   COUNT   PERCENT
AU            8   10.3896
CA           15   19.4805
DE           10   12.9870
IL            5    6.4935
TR            7    9.0909
US           28   36.3636
TABLES variables / OUT=SAS-data-set <options>;
The two methods can produce different results. When OUT= is used in the TABLES statement, the data
set will contain frequency counts and percentages.
The FREQUENCY Procedure
Each method can produce different results:
• The OUT= option in the OUTPUT statement names an output data set containing statistics:
N    _PCHI_    DF_PCHI   P_PCHI
77   37.8182   6         .000001219
OUTPUT OUT=SAS-data-set <options>;
In contrast, when OUT= is used on the OUTPUT statement, the data set will contain other statistics, such
as the PCHI statistics seen here. Now let’s walk through the syntax of each method, starting with the
TABLES statement method.
The FREQ Procedure: TABLES Statement
General form of the TABLES statement:
TABLES variables / OUT=SAS-data-set <options>;
When you use the OUT= option with the TABLES statement in PROC FREQ, the output data set contains the following:
• BY variables
• table request variables
• the automatic variables COUNT and PERCENT
• variables for other statistics requested in the TABLES statement
If multiple table requests appear in the TABLES statement, the contents of the data set correspond to the last table request.
Here’s the general form of the TABLES statement. Note again that the OUT= option appears after the
slash. Other options can be specified after the slash, as well. When the OUT= option is used in the
TABLES statement, the output data set contains several items, including the following:
• BY variables – if the data set is sorted and a BY statement is used in PROC FREQ.
• table request variables – in other words, the variables listed in the TABLES statement that are used to
build the table. So if I build a table by crossing JOBCODE with STATE, those variables are my table
request variables, and they’ll appear in the output data set.
• the automatic variables COUNT and PERCENT, which are always written to the output data set, as you’ll
see shortly.
• variables for other statistics requested in the TABLES statement, such as the row and column
percentages requested with the OUTPCT option.
If more than one table request appears in the TABLES statement, the contents of the output data set will
correspond to only the last table request.
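The last-request rule can be seen with a small sketch, assuming the ORION library is assigned and orion.customer contains the Country and Gender variables used in the slides. Because Gender is the last table request, the output data set should contain Gender counts and percentages only. The data set name work.LastOnly is hypothetical.

```sas
/* Two table requests in one TABLES statement.            */
/* Only the last request (gender) is written to OUT=.     */
/* work.LastOnly is a hypothetical data set name.         */
proc freq data=orion.customer noprint;
   tables country gender / out=work.LastOnly;
run;

/* Expect Gender, COUNT, and PERCENT columns; no Country. */
proc print data=work.LastOnly;
run;
```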
The FREQ Procedure: TABLES Statement
The default output when you use the OUT= option with the TABLES statement contains only the frequency count and percentages:
proc freq data=orion.customer;
tables country / out=work.countrycounts;
run;
Partial PROC PRINT Output:
Country   COUNT   PERCENT
AU            8   10.3896
CA           15   19.4805
DE           10   12.9870
IL            5    6.4935
By default, when you use the OUT= option in the TABLES statement, the output data set contains a
frequency count column and a percentage column. Here’s an example where only a single variable,
Country, is requested in the TABLES statement, and no options other than OUT= are used. It’s easy to
request that more information be written to the data set. For example…
The FREQ Procedure: TABLES Statement
Adding the OUTCUM option generates two cumulative columns:
proc freq data=orion.customer;
tables country / out=work.countrycounts outcum;
run;
Partial PROC PRINT Output:
Country   COUNT   PERCENT   CUM_FREQ   CUM_PCT
AU            8   10.3896          8    10.390
CA           15   19.4805         23    29.870
DE           10   12.9870         33    42.857
IL            5    6.4935         38    49.351
…By adding the OUTCUM option to the TABLES statement with the OUT= option, two new columns of
data are generated – the cumulative frequency and cumulative percentage. The OUTCUM option is valid
only for one-way tables.
Both one-way and two-way tables can be built. This example shows a one-way table…
The FREQ Procedure: TABLES Statement
The OUT= option works equally well with two-way tables.
proc freq data=orion.customer;
tables country*gender / out=work.countrygender;
run;
Partial PROC PRINT Output:
Country   Gender   COUNT   PERCENT
AU        F            3    3.8961
AU        M            5    6.4935
CA        F            8   10.3896
CA        M            7    9.0909
DE        F            3    3.8961
DE        M            7    9.0909
IL        M            5    6.4935
… but a two-way table is being generated in this example: Country and Gender are used to build the
table. The output shows the same COUNT and PERCENT columns that we saw with the default one-way
table. The PERCENT column shows percentages based on the entire data set, so the first value in the
PERCENT column tells us that 3.8961 percent of all the rows of data represent females living in AU, or
Australia.
The FREQ Procedure: TABLES Statement
Using OUTPCT and OUTEXPECT
proc freq data=orion.customer;
tables country*gender / out=work.CountryGender
   outexpect outpct;
run;
Partial PROC PRINT Output:
Country   Gender   COUNT   EXPECTED   PERCENT   PCT_ROW   PCT_COL
AU        F            3     3.1169    3.8961    37.500   10.0000
AU        M            5     4.8831    6.4935    62.500   10.6383
CA        F            8     5.8442   10.3896    53.333   26.6667
CA        M            7     9.1558    9.0909    46.667   14.8936
DE        F            3     3.8961    3.8961    30.000   10.0000
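When creating two-way tables, two options work nicely with the OUT= option: OUTEXPECT and
OUTPCT. The OUTEXPECT option adds the EXPECTED column, which contains the expected frequency
for each cell under the hypothesis of independence (the row total times the column total, divided by the
total frequency).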
The OUTPCT option adds two columns, PCT_ROW and PCT_COL, which contain row and column
percentages, respectively.
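As a check on how these columns relate, here is the arithmetic for the first row (AU, F). It uses N = 77 total rows, the AU row total of 8 (3 + 5), and a female column total of 30, which is implied by PCT_COL (3 is 10 percent of 30). The cell count is written n_ij, the row total n_i·, and the column total n_·j.

```latex
\begin{aligned}
\text{PERCENT}  &= 100\,\frac{n_{ij}}{N}           = 100\cdot\frac{3}{77} \approx 3.8961\\
\text{PCT\_ROW} &= 100\,\frac{n_{ij}}{n_{i\cdot}}  = 100\cdot\frac{3}{8}  = 37.500\\
\text{PCT\_COL} &= 100\,\frac{n_{ij}}{n_{\cdot j}} = 100\cdot\frac{3}{30} = 10.0000\\
\text{EXPECTED} &= \frac{n_{i\cdot}\,n_{\cdot j}}{N} = \frac{8\cdot 30}{77} \approx 3.1169
\end{aligned}
```

The same formulas reproduce the other rows; for example, the expected count for CA/M is 15 × 47 / 77 ≈ 9.1558, where 47 is the male total implied by PCT_COL.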
The FREQ Procedure: OUTPUT Statement
When you use the OUT= option with the OUTPUT statement in PROC FREQ, the output data set contains the following:
• BY variables
• variables identifying the stratum, such as A and B in the table request A*B*C*D
• variables containing the specified statistics
If multiple TABLES statements are used, the contents of the output data set correspond to the last table request in the last TABLES statement.
Now that we’ve looked at using the OUT= option in the TABLES statement, let’s look at using the OUT=
option in the OUTPUT statement.
• A data set created using the OUTPUT statement will contain BY variables, but only if the data set is
sorted and if a BY statement is used in PROC FREQ.
• The output data set will also contain variables identifying the stratum: for example, in the table
request A*B*C*D, each combination of A and B values defines a stratum, so A and B appear in the data set.
• Finally, the output data set will contain variables corresponding to statistics requested in the TABLES
and OUTPUT statements.
If a TABLES statement has multiple table requests, only the last table request will be used to build the
output data set, and if there are several TABLES statements, only the last table request in the last
TABLES statement is used. If you need several output data sets to be built, you’ll need to run PROC
FREQ once for each data set you want to create.
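Following that advice, here is a minimal sketch that runs PROC FREQ once per output data set, under the same assumptions as the slides (ORION library assigned, Country and Gender in orion.customer). The output data set names are hypothetical. For a one-way table, CHISQ requests a chi-square goodness-of-fit test, so PCHI can be written out for that table as well.

```sas
/* Step 1: statistics for the Country by Gender crosstab  */
proc freq data=orion.customer noprint;
   tables country*gender / chisq;
   output out=work.CountryGenderStats pchi;
run;

/* Step 2: a separate run for a second output data set;   */
/* for a one-way table, CHISQ is a goodness-of-fit test    */
proc freq data=orion.customer noprint;
   tables gender / chisq;
   output out=work.GenderStats pchi n;
run;
```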
The FREQ Procedure: OUTPUT Statement
Using an OUTPUT statement without requesting statistics results in a warning message in the log.
proc freq data=orion.customer;
tables country*gender;
output out=work.CountryStats;
run;
Partial Log:
When the OUTPUT statement is used, statistics must be requested; otherwise, no output data set is created.
This example shows a crossing of variables, with no statistics requested in either the TABLES statement
or the OUTPUT statement. A WARNING message appears in the log telling us that no output data set
was created. This is very easy to fix, and the fix requires two things.
The FREQ Procedure: OUTPUT Statement
Using an OUTPUT statement requires that statistics be requested in the TABLES statement and in the OUTPUT statement.
proc freq data=orion.customer;
tables country*gender / chisq;
output out=work.CountryStats pchi;
run;
The CHISQ option in the TABLES statement requests chi-square tests and measures of association based on chi-square.
First, add a request for statistics to the TABLES statement, in this case CHISQ. This tells PROC FREQ
which statistics to generate. CHISQ will generate chi-square tests and measures of association based on
chi-square.
The FREQ Procedure: OUTPUT Statement
Using an OUTPUT statement requires that statistics be requested in the TABLES statement and in the OUTPUT statement.
proc freq data=orion.customer;
tables country*gender / chisq;
output out=work.CountryStats pchi;
run;
The PCHI option in the OUTPUT statement requests that a subset of those statistics be written to the output data set.
_PCHI_    DF_PCHI   P_PCHI
12.1484   6         0.058738
The second thing we need to do is specify in the OUTPUT statement which of the statistics generated by
the CHISQ option should be written to the output data set. In this case we’re asking that only the PCHI
statistics be selected.
The FREQ Procedure: OUTPUT Statement
proc freq data=orion.customer;
tables country*gender / chisq;
output out=TWOWAY pchi lrchi n nmiss;
run;
Partial Output:
N    NMISS   _PCHI_    DF_PCHI   P_PCHI     _LRCHI_   DF_LRCHI   P_LRCHI
77   0       12.1484   6         0.058738   16.2584   6          0.012432
Consult the FREQ procedure documentation for statistical options.
This example code shows a crossing of the Country and Gender variables in the TABLES statement, and
one statistics request, CHISQ, listed after the slash. In the OUTPUT statement, we see the options
PCHI, LRCHI, N, and NMISS. The PCHI option writes the Pearson chi-square statistics: _PCHI_ (the
statistic), DF_PCHI (its degrees of freedom), and P_PCHI (its p-value). The LRCHI option writes the
corresponding likelihood-ratio chi-square statistics: _LRCHI_, DF_LRCHI, and P_LRCHI. N and NMISS
write the numbers of nonmissing and missing observations.
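To see how the three PCHI columns fit together, the p-value can be recomputed from the statistic and its degrees of freedom with the PROBCHI function. This sketch assumes the TWOWAY data set created by the slide code above exists in the WORK library; the names CheckP and P_check are hypothetical.

```sas
/* P_check is the upper-tail chi-square probability; it */
/* should agree with P_PCHI up to display rounding.     */
data work.CheckP;
   set work.TWOWAY;
   P_check = 1 - probchi(_PCHI_, DF_PCHI);
run;

proc print data=work.CheckP noobs;
   var _PCHI_ DF_PCHI P_PCHI P_check;
run;
```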
PROC FREQ – the TABLES Statement and the OUTPUT Statement
This demonstration illustrates how to create output data sets using the OUT= option in both the TABLES and the OUTPUT statements in PROC FREQ.
Now I’ll demonstrate some of the concepts I’ve just discussed.
First I’ll submit a LIBNAME statement, a null TITLE statement, and a simple OPTIONS statement which
turns off the date and page number.
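The setup statements might look like the following sketch. The path in the LIBNAME statement is a placeholder, because the transcript does not give the location of the Orion data.

```sas
/* Placeholder path: replace with the folder that holds the Orion data */
libname orion "path-to-orion-data";

title;                     /* null TITLE statement clears all titles */
options nodate nonumber;   /* turn off the date and page numbers     */
```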
All of the PROC FREQ examples use the NOPRINT option to turn off the automatic printing to the
OUTPUT window. This PROC FREQ step uses the OUT= option in the TABLES statement to create the
output data set and then a PROC PRINT step to view the data. Let me submit the two steps and see what
we get. You can see the results in the OUTPUT window – just the frequency counts and percentages,
nothing more.
The next step adds the OUTCUM option to the TABLES statement. When I submit this step and the
following PROC PRINT, we'll see that OUTCUM adds the cumulative frequency column, CUM_FREQ,
and the cumulative percent column, CUM_PCT.
Finally, before looking at the OUTPUT statement in PROC FREQ, let me show you what we can get
when working with cross tabulations. Here we’re using both Country and Gender, and the default output
I get when I submit this is just a simple count and percent. The OUTCUM option isn’t valid for two-way
tables, but the OUTPCT (outpercent) option is, and when it’s added, as in this example, we get columns
showing us both row and column percentages.
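The demo program itself is not reproduced in this transcript, so the following is only a sketch of the PROC FREQ steps just described, each with NOPRINT on the PROC FREQ statement so that only the output data set is produced. The WORK data set names are hypothetical.

```sas
/* One-way table: COUNT and PERCENT only */
proc freq data=orion.customer noprint;
   tables country / out=work.CountryCounts;
run;
proc print data=work.CountryCounts;
run;

/* OUTCUM adds CUM_FREQ and CUM_PCT (one-way tables only) */
proc freq data=orion.customer noprint;
   tables country / out=work.CountryCum outcum;
run;
proc print data=work.CountryCum;
run;

/* Two-way table: OUTPCT adds PCT_ROW and PCT_COL */
proc freq data=orion.customer noprint;
   tables country*gender / out=work.CountryGender outpct;
run;
proc print data=work.CountryGender;
run;
```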
Next you’ll see some examples using the OUTPUT statement.
This example has an OUTPUT statement but no requested statistics. Now let me bring up the log, so that
you can see what happens when this step is submitted. Notice that I have a WARNING in the log saying
that no output data set is produced.
However, when I add options to the TABLES statement and OUTPUT statement, in this case CHISQ and
LRCHI, a data set is created. The CHISQ option specifies the statistics I want SAS to create, while
LRCHI tells SAS which statistics to write to the output data set.
At this point we’ve seen the two internal methods for creating output data sets using PROC FREQ: the
TABLES statement method and the OUTPUT statement method. We’ve seen that the syntax is similar,
though not the same, and we’ve seen the results each method generates. In the third section of this
e-lecture, we’ll see how the Output Delivery System can also create output data sets from PROC FREQ.
Next we’ll look at PROC MEANS.
2. The MEANS Procedure
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System
The second section of this e-lecture looks at PROC MEANS. The MEANS procedure also uses the OUT=
option to create output data sets containing statistics.
Objectives
• Use the OUTPUT statement in PROC MEANS to create output tables containing statistics.
• Rename statistics being output.
In this section, you’ll see how to use the OUTPUT statement with PROC MEANS, and we’ll also look at
how to rename statistics for output.
The MEANS Procedure
PROC MEANS produces one or more output data sets by specifying an OUTPUT statement with options.
OUTPUT OUT=SAS-data-set <options>;
The OUTPUT statement in the MEANS procedure is similar to the OUTPUT statement in the
FREQUENCY procedure. But unlike PROC FREQ, where only one OUTPUT statement is allowed,
PROC MEANS allows the use of multiple OUTPUT statements, with each OUTPUT statement creating a
unique data set.
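For example, a single PROC MEANS step could write the mean to one data set and the range-related statistics to another. This sketch reuses the orion.employees variables from the slides; the output data set and variable names are hypothetical.

```sas
proc means data=orion.employees noprint;
   var salary;
   class job_title country;
   /* each OUTPUT statement creates its own data set */
   output out=work.SalaryMeans  mean=AvgSalary;
   output out=work.SalaryRanges min=MinSalary max=MaxSalary range=RangeSalary;
run;
```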
The MEANS Procedure
OUTPUT OUT=SAS-data-set <options>;
The output data set contains these variables:
• BY variables
• ID variables
• class variables
• the automatic variables _TYPE_ and _FREQ_
• variables requested in the OUTPUT statement
• the automatic variables _STAT_, _LEVEL_, and _WAY_ (dependent on other syntax within PROC MEANS)
To create multiple data sets, use multiple OUTPUT statements.
The output data set will contain BY variables and ID variables, the class variables, the automatic
variables _TYPE_ and _FREQ_, any variables requested in the OUTPUT statement, and, possibly,
depending on the syntax written, the automatic variables _STAT_, _LEVEL_, and _WAY_.
The MEANS Procedure
Listing statistics in the PROC MEANS statement will impact only the MEANS report, not the data set.
proc means data=Orion.Employee_PayInfo;
var salary;
class job_title country;
output out=EmpMeans;
run;
Here’s an example of default PROC MEANS output using an OUTPUT statement. The report shows two
class variables, Job_Title and Country, followed by the _TYPE_ column, which we’ll look at in
more detail shortly. Following _TYPE_ is _FREQ_, and then _STAT_ and the analysis variable,
Salary. The default statistics written to the data set are the same as the default statistics PROC MEANS
writes to the OUTPUT window, and changing statistics on the PROC MEANS statement will NOT
change what’s written to the data set. Next, we’ll look at the OUTPUT statement syntax and see how to
request specific statistics for a data set.
The MEANS Procedure
The OUTPUT statement can also be used to do the following:
• specify the statistics for the output data set
• select and name variables
The online documentation contains an extended discussion of creating output data sets with the MEANS procedure.
The OUTPUT statement can be used not only to specify the statistics we want in the output data set, but
also to name them.
The MEANS Procedure
proc means data=Orion.Employees;
var salary;
class job_title country;
output out=work.EmpMeans2
   mean=AvgSalary range=RangeSalary;
run;
Partial Output:
As shown in this sample code, the OUTPUT statement syntax includes the OUT= option, which names the
output data set, and any requested statistics. If we don’t want a statistic to take its default name (the name
of the analysis variable, here Salary), we can rename it by following the statistic keyword with an equals
sign and any valid SAS name. In the example, both the MEAN and RANGE statistics are renamed.
The MEANS Procedure: _TYPE_
_TYPE_ is a numeric variable by default showing which combination of class variables produced the summary statistics in that observation.
Partial Output:
As mentioned earlier, one of the variables written to an output data set by the MEANS procedure is
_TYPE_, which is a numeric variable (by default) showing which combination of class variables
produced the statistic listed in that observation.
The “0” in the top row tells me that the statistics in this row are based on the entire data set.
The 1 in the _TYPE_ column refers to statistics calculated for the Country group.
The 2 in the _TYPE_ column refers to statistics calculated for the Job_Title group.
And finally, the 3 near the bottom of this output refers to statistics calculated for the crossing of
Country within Job_Title.
After PROC MEANS has created this output data set, a WHERE statement might be used with PROC
PRINT to report on a subset of _TYPE_ values.
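Here is a sketch of that subsetting. _TYPE_ is built from one binary digit per CLASS variable, with the rightmost CLASS variable as the lowest digit, so with CLASS job_title country the value 1 means Country only, 2 means Job_Title only, and 3 means both. The data set and statistic names follow the EmpMeans2 slide; NOPRINT is added so that only the data set is created.

```sas
proc means data=orion.employees noprint;
   var salary;
   class job_title country;
   output out=work.EmpMeans2 mean=AvgSalary range=RangeSalary;
run;

/* _TYPE_ = 1: summaries by Country only */
proc print data=work.EmpMeans2;
   where _type_ = 1;
run;
```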
PROC MEANS Statement Options
The following are a few of the options that can be added to a PROC MEANS statement when you create an output data set:
Option         Description
NOPRINT        suppresses the display of the statistical report. Use NOPRINT when you want to create only an output data set.
NWAY           specifies that the output data set contain only statistics for the observations with the highest _TYPE_ value.
DESCENDTYPES   orders the output data set by descending _TYPE_ value. ASCENDING is the default. Aliases: DESCENDING, DESCEND
CHARTYPE       specifies that the _TYPE_ variable in the output data set is a character representation of the binary value of _TYPE_.
When using PROC MEANS to create output data sets, there are several useful options that can be added
to the PROC MEANS statement.
The NOPRINT option turns off the creation of the usual report in the Output window.
NWAY limits the statistics written to the output data set to just those with the highest _TYPE_ value –
in other words, the crossing of all classification variables. There is an example of NWAY on the next
slide.
In the example we saw on the previous slide, the _TYPE_ values began at 0 and then proceeded to 1, 2,
and 3. The DESCENDTYPES option simply reverses this order.
Finally, _TYPE_ is a numeric variable by default. The CHARTYPE option changes this to character.
Please refer to your documentation for more information about these options.
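The options can be combined. In this sketch, which reuses the variables from the earlier slides, CHARTYPE turns _TYPE_ into a character string with one digit per CLASS variable ('00', '01', '10', '11'), so the WHERE clause compares against a quoted string; DESCENDTYPES puts the '11' rows first; and the second step uses NWAY to keep only the full crossing. The output data set names are hypothetical.

```sas
/* CHARTYPE: _TYPE_ is character; DESCENDTYPES: '11' first */
proc means data=orion.employees noprint chartype descendtypes;
   var salary;
   class job_title country;
   output out=work.EmpTypes mean=AvgSalary;
run;

proc print data=work.EmpTypes;
   where _type_ = '01';   /* Country-only summaries */
run;

/* NWAY: keep only the highest _TYPE_ (full crossing) */
proc means data=orion.employees noprint nway;
   var salary;
   class job_title country;
   output out=work.EmpNway mean=AvgSalary;
run;
```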
NWAY Option
Without the NWAY option:
proc means data=orion.employees;
...
With the NWAY option:
proc means data=orion.employees nway;
...
Here is an example showing the NWAY option. The first report was generated without the NWAY
option, and it shows statistics calculated for both the entire table and those calculated for each of the
classification variables. So we see _TYPE_ values of 0, 1, 2, and if we were to look farther down in the
report we would see _TYPE_ values of 3, as well.
In the second report, we get only the nway crossing – in other words, we get only the crossing of the two
classification variables, Job_Title and Country, so we see only _TYPE_ values of 3.
PROC MEANS and the OUTPUT Statement
This demonstration illustrates how to create output data sets using the OUT= option in the OUTPUT statement in PROC MEANS.
Now I’ll demonstrate some of the syntax discussed in the previous slides.
I don’t need to re-submit the LIBNAME, TITLE, and OPTIONS statements, so I’ll start with the first
PROC MEANS step. Before I submit the first step, notice that it has an OUTPUT statement and an OUT=
option, but that no statistics are being requested. Let me scroll to the top of the Output window.
Notice in the output that we have the five default statistics, with one row for each statistic for various
groupings of data. The 0 tells me that this row refers to the entire data set, 1 to Customer_Gender
statistics, 2 to Customer_Country statistics, and so on.
In the next example some statistic options have been added, each one building a column, and to avoid
getting default names, each is being renamed. For example, “MIN” is being renamed to
“MinimumSalary.”
When I submit this code and the following PROC PRINT, you’ll see that the data has columns
representing each of the four statistics listed – MIN, MAX, the MEAN and the RANGE, with their
corresponding names. In addition to these, I also see an _FREQ_ column and one called _TYPE_. Let
me go back to the code to show you how we can use _TYPE_.
I can use a WHERE statement in PROC PRINT to specify a particular crossing by referencing values in
the _TYPE_ column. For example, this PROC PRINT has a WHERE statement requesting only rows
where _TYPE_ = 3, which in this case gives me only the rows for the crossing of the
Customer_Country and Customer_Gender columns. I’ll submit this and show you the output.
You can see that we have only the Customer_Country and Customer_Gender crossing.
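The demo program is not reproduced in the transcript, so the following is only a sketch of the same idea. It uses the orion.employees variables from the slides rather than the demo's data set. MinimumSalary comes from the demo; the other statistic names and the data set name are hypothetical.

```sas
proc means data=orion.employees noprint;
   var salary;
   class job_title country;
   output out=work.SalaryStats
          min=MinimumSalary max=MaximumSalary
          mean=MeanSalary range=SalaryRange;
run;

/* _TYPE_ = 3: the crossing of both CLASS variables */
proc print data=work.SalaryStats;
   where _type_ = 3;
run;
```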
Now that we’ve seen how the OUTPUT statement works in PROC MEANS, it’s time to take a look at the
Output Delivery System.
3. The Output Delivery System
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System
Not all procedures can create output data sets, and, as you have seen, those that do might require slightly
different syntax. The Output Delivery System, or ODS, gives us a single method for producing output
data sets, regardless of procedure. ODS also does some things differently than PROC FREQ and PROC
MEANS, which makes it a very good choice in certain circumstances.
Objectives
• Identify the advantages and features of the ODS method for creating output data sets.
• Identify properties of output objects.
• See how the Results window or the ODS TRACE statement can be used to output object information.
• Use the ODS OUTPUT statement to create SAS data sets.
In the next slides, we’ll see some advantages of using ODS to create output data sets. We’ll also talk
about something called output objects, see how the ODS TRACE statement can help us, and see examples
of the ODS OUTPUT statement in action.
How to Create Output Data Sets
There are two ways to create summary output data sets containing PROC FREQ and PROC MEANS statistics:
• Procedure Syntax Method
  – Using the OUTPUT statement in PROC FREQ and PROC MEANS
  – Using the OUT= option on the TABLES statement in PROC FREQ
• Output Delivery System (ODS) Method
  – For this method to work, the procedure must support the OUTPUT destination.
  – Use the ODS OUTPUT statement to direct procedure output (output objects) to output data sets.
As you may remember, there are two ways to create output data sets from PROC FREQ and PROC MEANS: the procedure syntax method and the Output Delivery System method. To use the procedure syntax method, we can use the OUTPUT statement, and additionally, in PROC FREQ, we can use the OUT= option on the TABLES statement.
To use the Output Delivery System method, the procedure must support the ODS OUTPUT destination. We then use the ODS OUTPUT statement, whose syntax stays the same for all procedures.
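Here is a sketch of the two methods side by side. It assumes SASHELP.CLASS instead of the Orion data, and all WORK data set names are hypothetical. The first two steps use the procedure syntax (OUT= on the TABLES statement, then OUT= on the OUTPUT statement), and the third uses the ODS OUTPUT statement with the OneWayFreqs output object.

```sas
/* Assumption: SASHELP.CLASS replaces the Orion data; names are hypothetical. */

/* Procedure syntax, part 1: OUT= on the TABLES statement
   (frequency counts and percentages) */
proc freq data=sashelp.class noprint;
   tables Sex / out=work.sex_counts;
run;

/* Procedure syntax, part 2: OUT= on the OUTPUT statement
   (statistics requested by keyword, here the Pearson chi-square) */
proc freq data=sashelp.class noprint;
   tables Sex*Age / chisq;
   output out=work.sex_age_chisq pchi;
run;

/* ODS method: ODS OUTPUT before the step, same form for any procedure */
ods output OneWayFreqs=work.sex_freqs;
proc freq data=sashelp.class;
   tables Sex;
run;
```

The two frequency data sets name their columns differently. The TABLES OUT= data set stores COUNT and PERCENT, while the ODS data set stores Frequency, Percent, CumFrequency, and CumPercent.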
What Is the Output Delivery System?
The Output Delivery System (ODS) is an easy, flexible method
for delivering SAS procedure output to a variety of formats.
[Diagram: SAS procedure (MEANS, FREQ, UNIVARIATE, etc.) + ODS statements (ODS OUTPUT, ODS HTML, ODS PDF, etc.)]
The Output Delivery System, or ODS, is the way that SAS output from many different procedures is
directed to many different destinations, such as HTML, XML, RTF, PDF, and more, including output
data sets.
Why Use the ODS Method?
• The syntax model for ODS OUTPUT is the same for all procedures, whereas the procedure syntax for data set creation varies from procedure to procedure.
• Some procedures do not have internal support for data set creation (for example, with an OUTPUT statement or an OUT= option).
• Your procedure of choice might not output the statistic that you want using the procedure syntax method.
There are three reasons why ODS is such a nice choice. First, the syntax is essentially the same for all
procedures. Second, as mentioned earlier, some procedures do not have internal support for data set
creation (in other words, not all procedures have an OUTPUT statement or an OUT= option). Third,
sometimes the statistic you want is not one that can be captured using the internal method, but can be
captured using ODS.
Why Use the ODS Method?
Different results can be obtained from ODS and the procedure code:
ods output chisq=work.ChiSqData;
proc freq data=Orion.Customer;
tables country*gender / chisq;
output out=work.CountryStats lrchi;
run;
[Figures: the output data set created by PROC FREQ, and the output data set created using ODS]
Notice here that we have both an ODS OUTPUT statement in our code before the PROC FREQ step and an OUTPUT statement inside the PROC FREQ step. So we're using two methods to create output data sets: the procedure method and the ODS method. The CHISQ option is used in the TABLES statement, and the LRCHI option appears in the OUTPUT statement. Now notice that the two methods produce different output. The internal method gives the first result, containing just the three LRCHI statistics, while ODS gives the second result, which captures additional information, like the Mantel-Haenszel chi-square. So one difference between using the internal procedure method and using ODS is that they can sometimes generate different results.
Other Features of the ODS Method
Generally, you can create only one data set for every run of the procedure with the procedure syntax method. With ODS OUTPUT and options, you can
• create multiple data sets from one run of a procedure
• select only certain output objects or BY groups for variables' analyses
• create a single data set from multiple runs of the same procedure
Another difference between the PROC method and ODS is highlighted here: generally, only one data set can be output for every run of the procedure when using the procedure syntax. However, with ODS you can create multiple data sets from one run of a procedure; you can be more specific, selecting only the output objects, or pieces of output, that you're interested in; and you can create a single data set from multiple runs of the same procedure.
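One of those options is MATCH_ALL, which asks ODS to write each output object that matches the specification to its own data set instead of combining them in one. This is a minimal sketch, assuming SASHELP.CLASS and a hypothetical base name WORK.FREQS. ODS names the first data set WORK.FREQS and numbers the rest (WORK.FREQS1, and so on) in the order the objects are produced.

```sas
/* Assumption: SASHELP.CLASS data; WORK.FREQS is a hypothetical base name. */
/* MATCH_ALL: each OneWayFreqs object (one per TABLES variable)
   goes to its own data set instead of being combined */
ods output OneWayFreqs(match_all)=work.freqs;
proc freq data=sashelp.class;
   tables Sex Age;
run;
```

Here WORK.FREQS should hold the table for Sex and WORK.FREQS1 the table for Age.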
ODS OUTPUT Syntax
The ODS OUTPUT statement produces a SAS data set from
an output object.
General form of the ODS OUTPUT statement:
ODS OUTPUT output-object = SAS-data-set ;
Must know the name of the output object.
By default, the OUTPUT destination is closed at a step boundary.
The ODS OUTPUT statement is written before a procedure step, and it produces a data set from an output object. The general syntax is the keyword ODS, followed by the keyword OUTPUT, then a reference to the output object, an equal sign, and the name of the data set we want to create to hold the statistics. The OUTPUT destination is automatically closed at a step boundary unless you use an option to make the destination remain open, or persist. So by default, once a procedure like PROC MEANS finishes running, no more data is written to the data set being created.
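The option that keeps the selection open is PERSIST=. Here is a minimal sketch of building a single data set from two runs of PROC FREQ, assuming SASHELP.CLASS and a hypothetical data set name. With PERSIST=PROC the specification stays in effect across procedure steps, and ODS OUTPUT CLOSE ends it.

```sas
/* Assumption: SASHELP.CLASS data; WORK.ALL_FREQS is a hypothetical name. */
/* PERSIST=PROC keeps this specification active across procedure steps */
ods output OneWayFreqs(persist=proc)=work.all_freqs;

proc freq data=sashelp.class;
   tables Sex;
run;

proc freq data=sashelp.class;
   tables Age;
run;

/* Close the OUTPUT destination to end the persistent selection */
ods output close;
```

Under these assumptions, WORK.ALL_FREQS should hold the 2 rows for Sex followed by the 6 rows for Age.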
ODS OUTPUT Syntax
The ODS OUTPUT statement produces a SAS data set from
an output object.
ods output onewayfreqs=work.customerfreqs;
proc freq data=Orion.Customer;
tables country;
run;
Here is an example of the ODS OUTPUT statement being used to create an output data set from PROC FREQ. Notice that the ODS OUTPUT statement precedes the PROC FREQ step. The ODS OUTPUT keywords are followed by a reference to the output object, in this case “onewayfreqs”, an equal sign, and then the name of the data set to which we would like to write the output object. So, in this case, the onewayfreqs output object, generated by PROC FREQ, will be written to a data set called work.customerfreqs.
This is an example of how simple it can be to use the ODS OUTPUT statement. In the next slides, we'll look in more detail at output objects and how we find out their names.
Understanding Output Objects
• Some procedures have a single output object, and others have multiple output objects.
• Output objects have attributes, which include a name and a label.
• ODS stores a link to each output object in the Results window.
[Figure: Results window showing multiple output objects, generated by PROC FREQ with an OUTPUT statement, and a single output object, generated by PROC FREQ with a TABLES statement.]
So, what is an output object? Simply put, an output object is data produced by a procedure, like the statistics produced by PROC MEANS. But how do we determine what an output object is called, so that we can specify it in our ODS OUTPUT statement?
The first place to look for output object information is the Results window, which shows the label, or description, of each output object. Some procedures might have only a single output object, whereas others might have multiple output objects. Here we can see that the first PROC FREQ result shows two output objects, labeled Cross-Tabular Freq Table and Chi-Square Tests, while the second PROC FREQ result shows only one output object. So it's important to know which output object you need, in order to refer to it properly in your ODS OUTPUT statement. The label in the Results window is not the only way to refer to an output object, and that matters because sometimes the label isn't specific enough, as can be the case when working with BY groups.
Properties of the Output Objects
The Results window shows the label of the
output object.
The Properties window gives the name of the
output object in addition to other information.
We can get further information about an output object by right-clicking on the label and choosing
Properties from the pop-up menu. The Properties window shows us more information about an output
object, including other ways to reference the output object, like the Name and the Path.
Properties of the Output Objects
[Figure: Properties window for the Chi-Square Tests output object]
In the Properties window shown here, we can see that the value for Name is the abbreviation “ChiSq,” while the label shown in the Results window is “Chi-Square Tests,” which contains a hyphen and a space. If a label contains characters that might cause problems in your code, you can refer to the label inside quotes. Or, you can look in the Properties window. In this case we can use the name value “ChiSq,” which is valid as is and can be typed in the ODS OUTPUT statement without quotes.
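To compare the two ways of referring to this object, here is a minimal sketch, assuming SASHELP.CLASS and hypothetical data set names. The first step refers to the object by its quoted label, and the second by its unquoted name. Both should capture the same Chi-Square Tests table.

```sas
/* Assumption: SASHELP.CLASS data; data set names are hypothetical. */

/* By label: quoted because it contains a hyphen and a space */
ods output 'Chi-Square Tests'=work.chisq_by_label;
proc freq data=sashelp.class;
   tables Sex*Age / chisq;
run;

/* By name: a valid SAS name, so no quotes are needed */
ods output ChiSq=work.chisq_by_name;
proc freq data=sashelp.class;
   tables Sex*Age / chisq;
run;
```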
ODS TRACE Statement
Use the ODS TRACE statement to determine names and other details
about output objects.
General form of the ODS TRACE statement:
ODS TRACE ON </option(s)> ;
SAS code that generates output objects
ODS TRACE OFF ;
Instead of using the Results or Properties windows to find the names of output objects, the ODS TRACE
statement can be used to find not only names of output objects, but other details as well. If we submit the
ODS TRACE ON statement before a procedure is run, the log shows additional information, including the
name and label of the output object. ODS TRACE is best turned off when no longer needed.
ODS TRACE Statement
ods trace on;
proc freq data=orion.Customer;
tables country;
run;
ods trace off;
Because ODS TRACE is being used in this example, there is information in the log, shown on the right, that is similar to what we saw in the Properties window. You can see the words “Output Added,” and beneath that you can see the headers “Name,” “Label,” “Template,” and “Path.” Of these, Name, Label, and Path can all be used to specify that output object. However, there are times, like when working with BY groups, when the Name, Label, and Path won't give us unique names. In that case...
ODS TRACE Statement
ods trace on;
Output Added:
-------------
Name:       OneWayFreqs
Label:      One-Way Frequencies
Template:   Base.Freq.OneWayFreqs
Path:       Freq.Table1.OneWayFreqs
-------------
ods trace on / label;
Output Added:
-------------
Name:       OneWayFreqs
Label:      One-Way Frequencies
Template:   Base.Freq.OneWayFreqs
Path:       Freq.Table1.OneWayFreqs
Label Path: 'The Freq Procedure'.'Table Country'.'One-Way Frequencies'
-------------
...we can add the LABEL option to the ODS TRACE statement, after the slash. This provides one extra
piece of information in the log – the Label Path. In this example the extra information is not necessary,
but as mentioned previously, there can be times, like when using BY groups, when the Label Path is very
useful because it gives us a unique identifier for that output object.
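The BY-group case can be sketched as follows, assuming SASHELP.CLASS sorted by Sex; the data set names are hypothetical. With a BY statement, PROC FREQ produces one OneWayFreqs object per BY group, so the log should show one Output Added entry for each group, and the label paths are what tell them apart. The ODS OUTPUT statement here refers to the object by name only, so the tables for both BY groups are written to one data set.

```sas
/* Assumption: SASHELP.CLASS data; WORK data set names are hypothetical. */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

ods trace on / label;                     /* adds Label Path to the log */
ods output OneWayFreqs=work.age_by_sex;   /* name only: all BY groups in one data set */
proc freq data=work.class_sorted;
   by Sex;
   tables Age;
run;
ods trace off;
```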
Specifying the Output Object
You can specify any combination of name, label, path, or label path in the ODS OUTPUT statement.
• name: OneWayFreqs
• label: 'One-Way Frequencies'
• path: Freq.Table1.OneWayFreqs
• partial path: Table1.OneWayFreqs
• label path: 'The Freq Procedure'.'Table Country'.'One-Way Frequencies'
• partial label path: 'Table Country'.'One-Way Frequencies'
As we have seen, there are several places to find information we can use when specifying output objects. With ODS TRACE ON, the log shows us a name, a label, and a path. If we add the LABEL option to ODS TRACE, we also have the label path to use.
Any of these values can be used to specify the output object. Of the examples shown on this slide, the name is probably the simplest choice. But the label could be used instead, or the path, or even a partial path. And if you need something more specific, you can use the label path. Note that an output object specification containing characters that are not valid in a SAS name, such as spaces and hyphens, must be quoted.
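Here is a minimal sketch, assuming SASHELP.CLASS and hypothetical data set names, in which one ODS OUTPUT statement refers to the same output object by its name, its quoted label, its full path, and a partial path. Each data set should receive the same one-way frequency table for Sex.

```sas
/* Assumption: SASHELP.CLASS data; data set names are hypothetical. */
/* Four specifications that all match the same OneWayFreqs object */
ods output OneWayFreqs=work.by_name
           'One-Way Frequencies'=work.by_label
           Freq.Table1.OneWayFreqs=work.by_path
           Table1.OneWayFreqs=work.by_partial_path;
proc freq data=sashelp.class;
   tables Sex;
run;
```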
Specifying the Output Object
ods output onewayfreqs=work.country_job_title
freq.table1.onewayfreqs=work.country
freq.table2.onewayfreqs=work.job_title;
proc freq data=orion.employees;
tables country job_title;
run;
The data set WORK.JOB_TITLE has 132 observations and 7 variables.
The data set WORK.COUNTRY has 4 observations and 7 variables.
The data set WORK.COUNTRY_JOB_TITLE has 136 observations and 9 variables.
Here is an example in which the ODS OUTPUT statement is creating three data sets from one run of
PROC FREQ.
Note that the TABLES statement in the PROC FREQ step has two variables listed: Country and
Job_Title. This means that a table of statistics is being created for Country and another for
Job_Title. So how is the ODS OUTPUT statement able to create three data sets from this simple
TABLES statement?
Specifying the Output Object
ods output onewayfreqs=work.country_job_title
freq.table1.onewayfreqs=work.country
freq.table2.onewayfreqs=work.job_title;
work.country_job_title
work.country
work.job_title
In this example, the first output object referenced is OneWayFreqs. There are two OneWayFreqs output objects: one for the Country variable and one for the Job_Title variable. The one-way frequency statistics from both are written to the work.country_job_title data set.
Specifying the Output Object
ods output onewayfreqs=work.country_job_title
freq.table1.onewayfreqs=work.country
freq.table2.onewayfreqs=work.job_title;
work.country_job_title
work.country
work.job_title
The other two output objects specified are freq.table1.onewayfreqs and freq.table2.onewayfreqs. These output objects capture the statistics for Country and Job_Title separately and store them in their respective data sets.
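The same pattern can be reproduced with data that ships with SAS. This sketch assumes SASHELP.CLASS in place of orion.employees, with Sex and Age in place of Country and Job_Title; the data set names are hypothetical. Sex has 2 values and Age has 6, so WORK.SEX should have 2 rows, WORK.AGE 6, and WORK.SEX_AGE 8.

```sas
/* Assumption: SASHELP.CLASS replaces orion.employees; names are hypothetical. */
ods output OneWayFreqs=work.sex_age             /* both tables combined */
           freq.table1.onewayfreqs=work.sex     /* first TABLES variable only */
           freq.table2.onewayfreqs=work.age;    /* second TABLES variable only */
proc freq data=sashelp.class;
   tables Sex Age;
run;

proc print data=work.sex_age;
run;
```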
Specifying the Output Object
ods listing close;
ods output summary=work.empsums;
proc means data=orion.employees;
class job_title;
var salary;
run;
ods listing;
Here is one more example of the ODS OUTPUT statement, this time with PROC MEANS. Notice the
ODS LISTING CLOSE statement. In this case the programmer wanted to create an output data set and
did not want to see anything written to the Output window. We saw the NOPRINT option used in the
PROC MEANS statement in an earlier example. The NOPRINT option is only valid when an OUTPUT
statement is also used in the PROC MEANS step. In this case, we’re not using an OUTPUT statement in
PROC MEANS, so the alternative is to use the ODS LISTING CLOSE statement.
Let’s look at what happens if the NOPRINT option is specified in the PROC MEANS statement and the
OUTPUT statement is not used.
The NOPRINT Option
If you specify the NOPRINT option, the procedure does not send any
output to ODS.
ods output summary=work.empsums;
proc means data=orion.employees noprint;
class job_title;
var salary;
run;
ERROR: Neither the PRINT option nor a valid output statement has been given.
NOTE: The SAS System stopped processing this step because of errors.
To create an output table from procedures such as the MEANS
procedure, do not use the NOPRINT option. Instead, use the following:
ODS LISTING CLOSE;
If the NOPRINT option is used without an OUTPUT statement in the PROC MEANS step, the result is
an ERROR message like the one seen here. So, again, if ODS OUTPUT is being used, and you want to
prevent output from also being written to the Output window, you should use an ODS LISTING CLOSE
statement before your step.
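For contrast, here is a minimal sketch of the combination PROC MEANS does accept: NOPRINT together with an OUTPUT statement. It assumes SASHELP.CLASS and a hypothetical data set name. This is the procedure syntax method only; because NOPRINT means the procedure sends nothing to ODS, an ODS OUTPUT statement placed before this step would have no output object to capture.

```sas
/* Assumption: SASHELP.CLASS data; WORK.HEIGHT_MEANS is a hypothetical name. */
/* NOPRINT is valid here because an OUTPUT statement is present */
proc means data=sashelp.class noprint;
   class Sex;
   var Height;
   output out=work.height_means mean=MeanHeight;
run;
```

Without the NWAY option, the data set holds one overall row (_TYPE_ = 0) plus one row per value of Sex (_TYPE_ = 1), three rows in all.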
ODS LISTING CLOSE
ods output summary=work.empsums;   /* Specify the OUTPUT destination */
ods listing close;                 /* Close the LISTING destination */
proc means data=orion.employees;
class job_title;
var salary;
run;
ods listing;                       /* Re-open the LISTING destination for any following reporting steps */
Here we have an ODS OUTPUT statement, an ODS LISTING CLOSE statement to close the listing
destination, and finally, an ODS LISTING statement after the PROC step, to make sure that the listing
destination is available for any following reporting steps.
Using the Output Delivery System
This demonstration illustrates how to create
output data sets using the Output Delivery System.
Now I’ll show you some examples using ODS statements.
Remember that the ODS OUTPUT statement produces a SAS data set from an output object, and in order
for this to happen, you need to know how to refer to the output object. Also remember that you can see
the name of an output object using several methods, one of which is to look in the Results window.
Let me run this PROC FREQ step and expand the results in the Results window on the left. Notice the two icons, one for Cross-Tabular Freq Table and one for Chi-Square Tests. These are the output object labels, and they could be used, within quotes, to specify the output object.
However, we can also right-click on an object label, like Chi-Square Tests, and choose Properties, and there we might find a better choice. You can see that the Name value is just ChiSq, or CHISQ, a simple-to-type, valid SAS name, which does NOT need to be quoted. I could use either form to refer to the output object.
However, we can get the most information by using the ODS TRACE ON statement, as you see here. I'm also using the ODS TRACE OFF statement after the PROC FREQ step, because the extra information only needs to be generated once.
When this code is run, I’ll see ODS output object information in the log. I can use the Name, Label, or
Path to refer to the output object. Now let me move back to the editor window, where we’ll look at some
examples.
Notice first this ODS statement that's commented out. It refers to the output object by its label, and because the label contains a space and a hyphen, it must be quoted. Instead of this, I've used just the name in my code, in this case CHISQ. Because the name contains no invalid characters, no quotes are necessary. When I run this, I get just the chi-square data.
Now, you may not have noticed, but I have not used the NOPRINT option in the PROC statement. With NOPRINT, the procedure sends no output to ODS, so NOPRINT is meant for steps that create data sets with the procedure syntax, such as an OUTPUT statement.
To turn off output to the Output window when using ODS, use the ODS LISTING CLOSE statement, as you see here. And be sure to follow your code with an ODS LISTING statement to reopen the LISTING destination so that later procedure steps can write to the Output window.
One last piece of code to show you is an example of how to create multiple output data sets from one procedure. Here we're directing three different output objects to three different output data sets.
Summary
• The OUT= option used in the TABLES statement in PROC FREQ generates a table with frequency counts and percentages.
• The OUT= option in the OUTPUT statement in PROC FREQ generates a table with statistics, such as chi-square statistics.
• In PROC MEANS, the OUT= option is used only in the OUTPUT statement, to create output tables containing statistics.
• The Output Delivery System (ODS) provides a method that is syntactically the same for all procedures and requires the use of output objects.
• There are several ways to refer to output objects. Their names can be obtained from the Results window or by using ODS TRACE, and they make it easy to create one or more output data sets.
To summarize the information we've covered in this e-lecture:
• Remember that the OUT= option can be used in two places in PROC FREQ, in both the TABLES and the OUTPUT statements. OUT= in the TABLES statement gives us frequency counts and percentages, while in the OUTPUT statement it gives us statistics.
• OUT= is also used in PROC MEANS, but only in the OUTPUT statement.
• Also, remember that ODS gives us another way to create data sets from procedure output, and its method is the same regardless of procedure.
• When using ODS, you must be able to specify which output object you want, and there are several ways to determine this, including the Results window and using ODS TRACE and then reading the log.
Now that you have seen several methods for creating output data sets using PROC FREQ and PROC
MEANS, I hope you’ll be able to put them to good use!
Related Training
The following courses provide related information:
• SAS Programming 2: Data Manipulation Techniques
• SAS Report Writing 1: Using Procedures and ODS
For a complete list of available e-lectures and other SAS training
products, visit:
support.sas.com/training
Listed here are other lectures that might also interest you. For a complete list of available e-lectures and
other SAS training products, please visit the SAS Web site at support.sas.com/training.
Credits
Creating Output Data Sets with the FREQUENCY and MEANS
Procedures was developed by Marty Hultgren. Additional contributions
were made by Ted Meleky, Linda Mitterling, Christine Riddiough and
Cynthia Zender.
This concludes the e-lecture Creating Output Data Sets with the FREQUENCY and MEANS Procedures.
I hope you found the material in this lecture to be helpful for your work. Thank you to everyone who
contributed to the creation of this e-lecture.
Comments?
We would like to hear what you think.
• Do you have any comments about this lecture?
• Did you find the information in this lecture useful?
• What other e-lectures would you like SAS to develop in the future?
Please e-mail your comments to
EDULectures@sas.com
Or you can fill out the short evaluation form at the end of this lecture.
If you have any comments about this lecture or e-lectures in general, we would appreciate receiving your
input. You can use the e-mail address listed here to provide that feedback, or you can complete the short
evaluation form available at the end of this lecture.
Copyright
SAS and all other SAS Institute Inc. product or service names are
registered trademarks or trademarks of SAS Institute Inc. in the
USA and other countries.
® indicates USA registration. Other brand and product names
are trademarks of their respective companies.
Copyright © 2009 by SAS Institute Inc., Cary, NC 27513, USA.
All rights reserved.
Thank you for your time.