Creating Output Data Sets with the FREQUENCY and MEANS Procedures Transcript

Creating Output Data Sets with the FREQUENCY and MEANS Procedures Transcript was developed by Marty Hultgren. Additional contributions were made by Ted Meleky, Linda Mitterling, Christine Riddiough, and Cynthia Zender. Editing and production support was provided by the Curriculum Development and Support Department.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

Creating Output Data Sets with the FREQUENCY and MEANS Procedures Transcript
Copyright © 2010 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

Book code E1645, course code RLSPFPM, prepared date 29Jan2010. RLSPFPM_001
ISBN 8-1-60764-419-4

For Your Information

Table of Contents
Lecture Description
Prerequisites
Accessibility Tips
Creating Output Data Sets with the FREQUENCY and MEANS Procedures
    1. The FREQUENCY Procedure
    2. The MEANS Procedure
    3. The Output Delivery System

Lecture Description
This SAS e-lecture shows how to use the FREQUENCY and MEANS procedures to create output data sets, as well as the Output Delivery System method.

To learn more…
For information on other courses in the curriculum, contact the SAS Education Division at 1-800-333-7660, or send e-mail to training@sas.com. You can also find this information on the Web at support.sas.com/training/ as well as in the Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this Course Notes, USA customers can contact our SAS Publishing Department at 1-800-727-3228 or send e-mail to sasbook@sas.com. Customers outside the USA, please contact your local SAS office. Also, see the Publications Catalog on the Web at support.sas.com/pubs for a complete list of books and a convenient order form.

Prerequisites
Before listening to this SAS e-lecture, you should be comfortable with basic syntax of the FREQUENCY and MEANS procedures. Specifically you should be able to generate one and two dimensional tables using the FREQUENCY procedure, and you should be able to use the MEANS procedure to generate descriptive statistics. These topics can be learned by completing the SAS Programming 1: Essentials course.

Accessibility Tips
If you are using a screen reader, such as Freedom Scientific’s JAWS, you may want to configure your punctuation settings so that characters used in code samples (comma, ampersand, semicolon, percent) are announced. Typically, the screen reader default for the character & is to read “and.” For clarity in code samples, you may want to configure your screen reader to read & as “ampersand.” In addition, depending on your verbosity options, the character & might be omitted. The same is true for some commas before a code variable.
To confirm code lines, you may choose to read some lines character by character. When testing this scenario with Adobe Acrobat Reader 9.1 and JAWS 10, ampersands before SAS macro names were announced only when in character-reading mode.

Creating Output Data Sets with the FREQUENCY and MEANS Procedures

Welcome to our e-lecture on Creating Output Data Sets with the FREQUENCY and MEANS Procedures. My name is Marty and I’m an instructor for SAS. Before we begin the lecture, let me take just a minute to point out a helpful reference that’s available to you. We’ve included a transcript so that you can print all of the information provided in this lecture. To access the transcript, select Reference and then Transcript in the table of contents on the left side of the viewer. You can print this transcript now for use when viewing the lecture, or print it later to keep as a reference. We hope you find the transcript to be a useful tool!

And now, let’s get started…

Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System

Both the FREQUENCY and MEANS procedures have internal methods for creating output data sets. The FREQUENCY procedure gives us two ways to create output data sets, and the MEANS procedure gives us one way.
We’ll look at these methods in this lecture, along with the Output Delivery System, or ODS, method. The ODS method uses the same syntax for all procedures, and with some procedures ODS captures different statistics than the internal procedure method captures. I think you’ll find both the procedure-specific methods and the Output Delivery System method to be useful.

1. The FREQUENCY Procedure

Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System

The first procedure we’ll look at is the FREQUENCY procedure, also called PROC FREQ.

Objectives
• Use the OUT= option in PROC FREQ to create output tables containing frequency and percentage information.
• Use the OUTPUT statement in PROC FREQ to create output tables containing statistics.

As I mentioned, PROC FREQ gives us two methods for creating output data sets: the TABLES statement method and the OUTPUT statement method. Both use the OUT= option. We’ll compare the two methods briefly, and then look at how to use each method.

The FREQUENCY Procedure
PROC FREQ produces output data sets using the following two methods:
• a TABLES statement with an OUT= option
    TABLES variables / OUT=SAS-data-set <options>;
• an OUTPUT statement with an OUT= option
    OUTPUT OUT=SAS-data-set <options>;

Both methods have similar syntax, but as you’ll see in the coming examples, the output from each differs quite a bit. When using the TABLES statement, the OUT= option appears after a slash. When using the OUTPUT statement, no slash is necessary. In either case, the OUT= option is used to name the output data set that will contain the statistics.

The FREQUENCY Procedure
Each method can produce different results.
The OUT= option in the TABLES statement names an output data set containing frequency counts and percentages:

    Country    COUNT    PERCENT
    AU             8    10.3896
    CA            15    19.4805
    DE            10    12.9870
    IL             5     6.4935
    TR             7     9.0909
    US            28    36.3636

    TABLES variables / OUT=SAS-data-set <options>;

The two methods can produce different results. When OUT= is used in the TABLES statement, the data set will contain frequency counts and percentages.

The FREQUENCY Procedure
Each method can produce different results:
The OUT= option in the OUTPUT statement names an output data set containing statistics:

    N     _PCHI_     DF_PCHI    P_PCHI
    77    37.8182    6          .000001219

    OUTPUT OUT=SAS-data-set <options>;

In contrast, when OUT= is used in the OUTPUT statement, the data set will contain other statistics, such as the Pearson chi-square (PCHI) statistics seen here. Now let’s walk through the syntax of each method, starting with the TABLES statement method.

The FREQ Procedure: TABLES Statement
General form of the TABLES statement:
    TABLES variables / OUT=SAS-data-set <options>;
When you use the OUT= option with the TABLES statement in PROC FREQ, the output data set contains the following:
• BY variables
• table request variables
• the automatic variables COUNT and PERCENT
• variables for other statistics requested in the TABLES statement
If multiple table requests appear in the TABLES statement, the contents of the data set correspond to the last table request.

Here’s the general form of the TABLES statement. Note again that the OUT= option appears after the slash. Other options can be specified after the slash, as well. When the OUT= option is used in the TABLES statement, the output data set contains several items, including the following:
• BY variables – if the data set is sorted and a BY statement is used in PROC FREQ.
• table request variables – in other words, the variables listed in the TABLES statement that are used to build the table.
So if I build a table by crossing JOBCODE with STATE, those variables are my table request variables, and they’ll appear in the output data set.
• the automatic variables COUNT and PERCENT, which are written to the output data set, as you’ll see shortly.
• variables for other statistics requested in the TABLES statement, such as those requested with OUTPCT.
If more than one table request appears in the TABLES statement, the contents of the output data set will correspond to only the last table request.

The FREQ Procedure: TABLES Statement
The default output when you use the OUT= option with the TABLES statement contains only the frequency count and percentages:

    proc freq data=orion.customer;
       tables country / out=work.countrycounts;
    run;

Partial PROC PRINT Output:

    Country    COUNT    PERCENT
    AU             8    10.3896
    CA            15    19.4805
    DE            10    12.9870
    IL             5     6.4935

The default when using the OUT= option in the TABLES statement is to generate frequency count and percentage columns. Here’s an example where only a single variable, Country, is being requested in the TABLES statement, and no other options besides OUT= are being used. It’s easy to request that more information be written to the data set. For example…

The FREQ Procedure: TABLES Statement
Adding the OUTCUM option generates two cumulative columns:

    proc freq data=orion.customer;
       tables country / out=work.countrycounts outcum;
    run;

Partial PROC PRINT Output:

    Country    COUNT    PERCENT    CUM_FREQ    CUM_PCT
    AU             8    10.3896           8     10.390
    CA            15    19.4805          23     29.870
    DE            10    12.9870          33     42.857
    IL             5     6.4935          38     49.351

…By adding the OUTCUM option to the TABLES statement with the OUT= option, two new columns of data are generated: the cumulative frequency and cumulative percentage. The OUTCUM option is valid only for one-way tables. Both one-way and two-way tables can be built. This example shows a one-way table…
The FREQ Procedure: TABLES Statement
The OUT= option works equally well with two-way tables.

    proc freq data=orion.customer;
       tables country*gender / out=work.countrygender;
    run;

Partial PROC PRINT Output:

    Country    Gender    COUNT    PERCENT
    AU         F             3     3.8961
    AU         M             5     6.4935
    CA         F             8    10.3896
    CA         M             7     9.0909
    DE         F             3     3.8961
    DE         M             7     9.0909
    IL         M             5     6.4935

… but a two-way table is being generated in this example. Country and Gender are being used to build the table. The output shows the same COUNT and PERCENT columns that we saw with the default one-way table. The PERCENT column shows percentages based on the entire data set. So the first value in the PERCENT column tells us that 3.8961 percent of all the rows of data represent females living in AU, or Australia.

The FREQ Procedure: TABLES Statement
Using OUTPCT and OUTEXPECT

    proc freq data=orion.customer;
       tables country*gender / out=work.CountryGender outexpect outpct;
    run;

Partial PROC PRINT Output:

    Country    Gender    COUNT    EXPECTED    PERCENT    PCT_ROW    PCT_COL
    AU         F             3      3.1169     3.8961     37.500    10.0000
    AU         M             5      4.8831     6.4935     62.500    10.6383
    CA         F             8      5.8442    10.3896     53.333    26.6667
    CA         M             7      9.1558     9.0909     46.667    14.8936
    DE         F             3      3.8961     3.8961     30.000    10.0000

When creating two-way tables, there are two options that work nicely with the OUT= option: OUTEXPECT and OUTPCT. The OUTEXPECT option adds the column of EXPECTED statistics.
The FREQ Procedure: TABLES Statement
Using OUTPCT and OUTEXPECT

The OUTPCT option adds two columns, PCT_ROW and PCT_COL, which contain row and column percentages, respectively.

The FREQ Procedure: OUTPUT Statement
When you use the OUT= option with the OUTPUT statement in PROC FREQ, the output data set contains the following:
• BY variables
• variables identifying the stratum, such as A and B in the table request A*B*C*D
• variables containing the specified statistics
If multiple TABLES statements are used, the contents of the output data set correspond to the last table request in the last TABLES statement.

Now that we’ve looked at using the OUT= option in the TABLES statement, let’s look at using the OUT= option in the OUTPUT statement.
• A data set created using the OUTPUT statement will contain BY variables, but only if the data set is sorted and if a BY statement is used in PROC FREQ.
• The output data set will also contain variables identifying the stratum. For example, A and B represent a stratum in this table request.
• Finally, the output data set will contain variables corresponding to statistics requested in the TABLES and OUTPUT statements.
If a TABLES statement has multiple table requests, only the last table request will be used to build the output data set, and if there are several TABLES statements, only the last table request in the last TABLES statement is used. If you need several output data sets to be built, you’ll need to run PROC FREQ once for each data set you want to create.
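The point about running PROC FREQ once per output data set can be sketched as follows. This is a minimal illustration, not code from the lecture: it assumes the SASHELP.CARS sample table that ships with SAS as a stand-in for orion.customer, and the output data set names work.OriginType and work.OriginDrive are made up for the example.

```sas
/* Sketch only: SASHELP.CARS stands in for orion.customer,      */
/* and the output data set names are hypothetical.              */

/* First run: Pearson chi-square statistics for Origin by Type  */
proc freq data=sashelp.cars noprint;
   tables origin*type / chisq;          /* compute the statistics   */
   output out=work.OriginType pchi;     /* write the Pearson subset */
run;

/* Second run: a different table request needs its own step     */
proc freq data=sashelp.cars noprint;
   tables origin*drivetrain / chisq;
   output out=work.OriginDrive pchi;
run;

/* Inspect both results */
proc print data=work.OriginType;
run;
proc print data=work.OriginDrive;
run;
```

Each step has one TABLES request and one OUTPUT statement, so there is no ambiguity about which table request feeds which data set.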
The FREQ Procedure: TABLES Statement
Using an OUTPUT statement without requesting statistics results in a warning message in the log.

    proc freq data=orion.customer;
       tables country*gender;
       output out=work.CountryStats;
    run;

Partial Log: [not reproduced in this transcript]

When the OUTPUT statement is used, statistics must be requested. Otherwise, no output data set is created. This example shows a crossing of variables, with no requested statistics in either the TABLES or the OUTPUT statement. A WARNING message appears in the log telling us that no output data set was created. This is very easy to fix, and it requires two things.

The FREQ Procedure: TABLES Statement
Using an OUTPUT statement requires that statistics be requested in the TABLES statement and in the OUTPUT statement.

    proc freq data=orion.customer;
       tables country*gender / chisq;
       output out=work.CountryStats pchi;
    run;

The CHISQ option in the TABLES statement requests chi-square tests and measures of association based on chi-square.

First, add a request for statistics to the TABLES statement, in this case CHISQ. This tells PROC FREQ which statistics to generate. CHISQ will generate chi-square tests and measures of association based on chi-square.

The FREQ Procedure: TABLES Statement
Using an OUTPUT statement requires that statistics be requested in the TABLES statement and in the OUTPUT statement.

    proc freq data=orion.customer;
       tables country*gender / chisq;
       output out=work.CountryStats pchi;
    run;

The PCHI option in the OUTPUT statement requests that a subset of those statistics be written to the output data set.

    _PCHI_     DF_PCHI    P_PCHI
    12.1484    6          0.058738

The second thing we need to do is specify in the OUTPUT statement which of the statistics generated by the CHISQ option should be written to the output data set. In this case we’re asking that only the PCHI statistics be selected.
The FREQ Procedure: OUTPUT Statement

    proc freq data=orion.customer;
       tables country*gender / chisq;
       output out=TWOWAY pchi lrchi n nmiss;
    run;

Partial Output:

    N           77
    NMISS       0
    _PCHI_      12.1484
    DF_PCHI     6
    P_PCHI      0.058738
    _LRCHI_     16.2584
    DF_LRCHI    6
    P_LRCHI     0.012432

Consult the FREQ procedure documentation for statistical options.

This example code shows a crossing of the Country and Gender variables in the TABLES statement, and one statistics option, CHISQ, listed after the slash. In the OUTPUT statement, we see the options PCHI, LRCHI, N, and NMISS. The PCHI option gives us the Pearson chi-square statistics, such as _PCHI_ and DF_PCHI. The LRCHI option gives us the likelihood ratio chi-square statistics, such as _LRCHI_ and DF_LRCHI.

PROC FREQ – the TABLES Statement and the OUTPUT Statement
This demonstration illustrates how to create output data sets using the OUT= option in both the TABLES and the OUTPUT statements in PROC FREQ.

Now I’ll demonstrate some of the concepts I’ve just discussed. First I’ll submit a LIBNAME statement, a null TITLE statement, and a simple OPTIONS statement which turns off the date and page number. All of the PROC FREQ examples use the NOPRINT option to turn off the automatic printing to the OUTPUT window.

This PROC FREQ step uses the OUT= option in the TABLES statement to create the output data set and then a PROC PRINT step to view the data. Let me submit the two steps and see what we get. You can see the results in the OUTPUT window: just the frequency counts and percentages, nothing more.

The next step adds the OUTCUM option to the TABLES statement. When I submit this step and the following PROC PRINT, we'll see that OUTCUM adds the cumulative frequency column, CUM_FREQ, and the cumulative percent column, CUM_PCT.

Finally, before looking at the OUTPUT statement in PROC FREQ, let me show you what we can get when working with cross tabulations.
Here we’re using both Country and Gender, and the default output I get when I submit this is just a simple count and percent. The OUTCUM option isn’t valid for two-way tables, but the OUTPCT (outpercent) option is, and when it’s added, as in this example, we get columns showing us both row and column percentages.

Next you’ll see some examples using the OUTPUT statement. This example has an OUTPUT statement but no requested statistics. Now let me bring up the log, so that you can see what happens when this step is submitted. Notice that I have a WARNING in the log saying that no output data set is produced. However, when I add options to the TABLES statement and OUTPUT statement, in this case CHISQ and LRCHI, a data set is created. The CHISQ option specifies the statistics I want SAS to create, while LRCHI tells SAS which statistics to write to the output data set.

At this point we’ve seen the two internal methods for creating output data sets using PROC FREQ, the TABLES statement method and the OUTPUT statement method. We’ve seen that the syntax is similar, though not the same, and we’ve seen the results each method generates. In the third section of this e-lecture we’ll see how the Output Delivery System can also create output data sets from PROC FREQ. Next we’ll look at PROC MEANS.

2. The MEANS Procedure

Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System

The second section of this e-lecture looks at PROC MEANS. The MEANS procedure also uses the OUT= option to create output data sets containing statistics.

Objectives
• Use the OUTPUT statement in PROC MEANS to create output tables containing statistics.
• Rename statistics being output.
In this section, you’ll see how to use the OUTPUT statement with PROC MEANS, and we’ll also look at how to rename statistics for output.

The MEANS Procedure
PROC MEANS produces one or more output data sets when you specify an OUTPUT statement with options.
    OUTPUT OUT=SAS-data-set <options>;

The OUTPUT statement in the MEANS procedure is similar to the OUTPUT statement in the FREQUENCY procedure. But unlike PROC FREQ, where only one OUTPUT statement is allowed, PROC MEANS allows the use of multiple OUTPUT statements, with each OUTPUT statement creating a unique data set.

The MEANS Procedure
    OUTPUT OUT=SAS-data-set <options>;
The output data set contains these variables:
• BY variables
• ID variables
• class variables
• the automatic variables _TYPE_ and _FREQ_
• variables requested in the OUTPUT statement
• the automatic variables _STAT_, _LEVEL_, and _WAY_ (dependent on other syntax within PROC MEANS)
To create multiple data sets, use multiple OUTPUT statements.

The output data set will contain BY variables and ID variables, the class variables, the automatic variables _TYPE_ and _FREQ_, any variables requested in the OUTPUT statement, and, possibly, depending on the syntax written, the automatic variables _STAT_, _LEVEL_, and _WAY_.

The MEANS Procedure
default statistics
Listing statistics in the PROC MEANS statement will impact only the MEANS report, not the data set.

    proc means data=Orion.Employee_PayInfo;
       var salary;
       class job_title country;
       output out=EmpMeans;
    run;

Here’s an example of default PROC MEANS output using an OUTPUT statement. The report shows two class variables, Job_Title and Country, followed by the _TYPE_ column, which we’ll look at in more detail shortly. Following _TYPE_ is _FREQ_, and then _STAT_ and the analysis variable, Salary.
The default statistics written to the data set are the same as the default statistics PROC MEANS writes to the OUTPUT window, and changing statistics on the PROC MEANS statement will NOT change what’s written to the data set. Next, we’ll look at the OUTPUT statement syntax and see how to request specific statistics for a data set.

The MEANS Procedure
The OUTPUT statement can also be used to do the following:
• specify the statistics for the output data set
• select and name variables
The online documentation contains an extended discussion of creating output data sets with the MEANS procedure.

The OUTPUT statement can be used not only to specify the statistics we want in the output data set, but also to name them.

The MEANS Procedure

    proc means data=Orion.Employees;
       var salary;
       class job_title country;
       output out=work.EmpMeans2
              mean=AvgSalary range=RangeSalary;
    run;

Partial Output: [not reproduced in this transcript]

As shown in this sample code, the OUTPUT statement syntax includes the OUT= option, to name the output data set, and any requested statistics. If we don’t want to use the default statistic name, we can rename each statistic, by following it with an equals sign and any valid SAS name. In the example, both the MEAN and RANGE statistics are being renamed.

The MEANS Procedure: _TYPE_
_TYPE_ is a numeric variable by default showing which combination of class variables produced the summary statistics in that observation.

Partial Output: [not reproduced in this transcript]

As mentioned earlier, one of the variables written to an output data set by the MEANS procedure is _TYPE_, which is a numeric variable (by default) showing which combination of class variables produced the statistic listed in that observation.
The MEANS Procedure: _TYPE_
_TYPE_ is a numeric variable by default showing which combination of class variables produced the summary statistics in that observation.

Partial Output, with these callouts added one at a time over the next four slides:
• overall summary
• summary by country only
• summary by job_title only
• summary by job_title and country

The “0” in the top row tells me that the statistics in this row are based on the entire data set.

The 1 in the _TYPE_ column refers to statistics calculated for the Country group.

The 2 in the _TYPE_ column refers to statistics calculated for the Job_Title group.

And finally, the 3 near the bottom of this output refers to statistics calculated for the crossing of Country within Job_Title. After PROC MEANS has created this output data set, a WHERE statement might be used with PROC PRINT to report on a subset of _TYPE_ values.

PROC MEANS Statement Options
The following are a few of the options that can be added to a PROC MEANS statement when you create an output data set:

Option          Description
NOPRINT         suppresses the display of the statistical report. Use NOPRINT when you want to create only an output data set.
NWAY            specifies that the output data set contain only statistics for the observations with the highest _TYPE_ value.
DESCENDTYPES    orders the output data set by descending _TYPE_ value. ASCENDING is the default. Aliases: DESCENDING, DESCEND
CHARTYPE        specifies that the _TYPE_ variable in the output data set is a character representation of the binary value of _TYPE_.

When using PROC MEANS to create output data sets, there are several useful options that can be added to the PROC MEANS statement. The NOPRINT option turns off the creation of the usual report in the Output window. NWAY limits the statistics written to the output data set to just those with the highest _TYPE_ value, in other words, the crossing of all classification variables. There is an example of NWAY on the next slide. In the example we saw on the previous slide, the _TYPE_ values began at 0 and then proceeded to 1, 2, and 3. The DESCENDTYPES option simply reverses this order. Finally, _TYPE_ is a numeric variable by default. The CHARTYPE option changes this to character. Please refer to your documentation for more information about these options.

NWAY Option
Without the NWAY option:
    proc means data=orion.employees; ...
With the NWAY option:
    proc means data=orion.employees nway; ...

Here is an example showing the NWAY option. The first report was generated without the NWAY option, and it shows statistics calculated for both the entire table and those calculated for each of the classification variables. So we see _TYPE_ values of 0, 1, 2, and if we were to look farther down in the report we would see _TYPE_ values of 3, as well. In the second report, we get only the nway crossing. In other words, we get only the crossing of the two classification variables, Job_Title and Country, so we see only _TYPE_ values of 3.
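The slide code for NWAY is abbreviated, so here is a complete sketch of the same idea. It is an illustration under stated assumptions rather than the lecture's program: SASHELP.CARS (a sample table shipped with SAS) stands in for orion.employees, Origin and DriveTrain play the role of the two class variables, and the output data set and statistic names are hypothetical.

```sas
/* Sketch only: SASHELP.CARS replaces orion.employees, and all  */
/* output names (CarAll, CarNway, CarChar, AvgPrice) are made up. */

/* Without NWAY: rows for _TYPE_ 0, 1, 2, and 3 */
proc means data=sashelp.cars noprint;
   class origin drivetrain;
   var msrp;
   output out=work.CarAll mean=AvgPrice;
run;

/* With NWAY: only the full crossing of both class variables,   */
/* which is _TYPE_ = 3 when there are two class variables       */
proc means data=sashelp.cars noprint nway;
   class origin drivetrain;
   var msrp;
   output out=work.CarNway mean=AvgPrice;
run;

/* With CHARTYPE: _TYPE_ is stored as a character string of     */
/* binary digits, one digit per class variable                  */
proc means data=sashelp.cars noprint chartype;
   class origin drivetrain;
   var msrp;
   output out=work.CarChar mean=AvgPrice;
run;

proc print data=work.CarNway;
run;
```

Comparing work.CarAll with work.CarNway in a viewer or with PROC PRINT shows which summary levels NWAY drops.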
PROC MEANS and the OUTPUT Statement
This demonstration illustrates how to create output data sets using the OUT= option in the OUTPUT statement in PROC MEANS.

Now I’ll demonstrate some of the syntax discussed in the previous slides. I don’t need to re-submit the LIBNAME, TITLE, and OPTIONS statements, so I’ll start with the first PROC MEANS step. Before I submit the first step, notice that it has an OUTPUT statement and an OUT= option, but that no statistics are being requested. Let me scroll to the top of the Output window. Notice in the output that we have the five default statistics, with one row for each statistic for various groupings of data. The 0 tells me that this row refers to the entire data set, 1 to Customer_Gender statistics, 2 to Customer_Country statistics, and so on.

In the next example some statistic options have been added, each one building a column, and to avoid getting default names, each is being renamed. For example, “MIN” is being renamed to “MinimumSalary.” When I submit this code and the following PROC PRINT, you’ll see that the data has columns representing each of the four statistics listed, MIN, MAX, MEAN, and RANGE, with their corresponding names. In addition to these, I also see an _FREQ_ column and one called _TYPE_.

Let me go back to the code to show you how we can use _TYPE_. I can use a WHERE statement in PROC PRINT to specify a particular crossing by referencing values in the _TYPE_ column. For example, this PROC PRINT has a WHERE statement requesting only rows where _TYPE_ = 3, which in this case would give me only the rows for the crossing of the Customer_Country and Customer_Gender columns. I’ll submit this and show you the output. You can see that we have only the Customer_Country and Customer_Gender crossing.
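The demonstration program itself is not reproduced in this transcript, so the following is only a sketch of the pattern just described: request renamed statistics in the OUTPUT statement, then use a WHERE statement on _TYPE_ in PROC PRINT. It assumes SASHELP.CARS as a stand-in for the demonstration data, with Origin and DriveTrain as the class variables; the data set name and the renamed statistic names are hypothetical.

```sas
/* Sketch only: SASHELP.CARS and all output names are stand-ins */
/* for the demonstration data, which is not shown.              */
proc means data=sashelp.cars noprint;
   class origin drivetrain;
   var msrp;
   /* statistic=name renames each output column */
   output out=work.CarStats
          min=MinimumPrice
          max=MaximumPrice
          mean=AveragePrice
          range=RangePrice;
run;

/* With two class variables, _TYPE_ = 3 selects only the rows   */
/* for the crossing of both class variables                     */
proc print data=work.CarStats;
   where _type_ = 3;
run;
```

Changing the WHERE condition to `_type_ = 0` would instead print the single overall-summary row.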
Now that we’ve seen how the OUTPUT statement works in PROC MEANS, it’s time to take a look at the Output Delivery System.

3. The Output Delivery System

Creating Output Data Sets with the FREQUENCY and MEANS Procedures
1. The FREQUENCY Procedure
2. The MEANS Procedure
3. The Output Delivery System

Not all procedures can create output data sets, and as you have seen, those that do might require slightly different syntax. The Output Delivery System, or ODS, will give us a single method for producing output data sets, regardless of procedure. The Output Delivery System does some things differently than PROC FREQ and PROC MEANS, making it a very good choice in certain circumstances.

Objectives
• Identify the advantages and features of the ODS method for creating output data sets.
• Identify properties of output objects.
• See how the Results window or the ODS TRACE statement can be used to display output object information.
• Use the ODS OUTPUT statement to create SAS data sets.

In the next slides, we’ll see some advantages of using ODS to create output data sets. We’ll also talk about something called output objects, see how the ODS TRACE statement can help us, and see examples of the ODS OUTPUT statement in action.

How to Create Output Data Sets
There are two ways to create summary output data sets containing PROC FREQ and PROC MEANS statistics:
• Procedure Syntax Method
  – Using the OUTPUT statement in PROC FREQ and PROC MEANS
  – Using the OUT= option on the TABLES statement in PROC FREQ
• Output Delivery System (ODS) Method
  – For this method to work, the procedure must support the OUTPUT destination.
  – Use the ODS OUTPUT statement to direct procedure output (output objects) to output data sets.
As you may remember, there are two ways to create output data sets from PROC FREQ and PROC MEANS: the procedure syntax method and the Output Delivery System method. To use the procedure syntax method we can use the OUTPUT statement, and additionally, in PROC FREQ, we can use the OUT= option in the TABLES statement. To use the Output Delivery System method, the procedure must support the ODS OUTPUT destination. The ODS OUTPUT statement is then used, with the syntax staying the same for all procedures.

What Is the Output Delivery System?

The Output Delivery System (ODS) is an easy, flexible method for delivering SAS procedure output to a variety of formats.

[Diagram: SAS procedure (MEANS, FREQ, UNIVARIATE, etc.) + ODS statements (ODS OUTPUT, ODS HTML, ODS PDF, etc.)]

The Output Delivery System, or ODS, is the way that SAS output from many different procedures is directed to many different destinations, such as HTML, XML, RTF, PDF, and more, including output data sets.

Why Use the ODS Method?
• The syntax model for ODS OUTPUT is the same for all procedures, whereas the procedure syntax for data set creation varies from procedure to procedure.
• Some procedures do not have internal support for data set creation (for example, with an OUTPUT statement or an OUT= option).
• Your procedure of choice might not output the statistic that you want using the procedure syntax method.

There are three reasons why ODS is such a nice choice. First, the syntax is essentially the same for all procedures. Second, as mentioned earlier, some procedures do not have internal support for data set creation (in other words, not all procedures have an OUTPUT statement or an OUT= option). Third, sometimes the statistic you want is not one that can be captured using the internal method, but can be captured using ODS.

Why Use the ODS Method?
Different results can be obtained from ODS and the procedure code:

   ods output chisq=work.ChiSqData;
   proc freq data=Orion.Customer;
      tables country*gender / chisq;
      output out=work.CountryStats lrchi;
   run;

[Slide shows the output data set created by PROC FREQ, followed by the output data set created using ODS.]

Notice here that we have both an ODS OUTPUT statement in our code before the PROC FREQ step, and an OUTPUT statement inside the PROC FREQ step. So we’re using two methods to create the output data set: the procedure method and the ODS method. The CHISQ option is being used in the TABLES statement, and the LRCHI option appears in the OUTPUT statement. Now notice that the output generated by the two methods differs, with the internal method giving the first result, with just the three LRCHI statistics, and ODS giving us the second result, with the additional information captured, like the Mantel-Haenszel chi-square. So one difference between using the internal procedure method and using ODS is that we can sometimes generate different results.

Other Features of the ODS Method

Generally, you can create only one data set for every run of the procedure with the procedure syntax method. With ODS OUTPUT and options, you can
• create multiple data sets from one run of a procedure
• select only certain output objects or BY groups for variables’ analyses
• create a single data set from multiple runs of the same procedure

Another difference between the PROC method and ODS is highlighted here: generally, only one data set can be output for every run of the procedure when using the procedure syntax. However, with ODS you can create multiple data sets from one run of a procedure; you can also be more specific in selecting only the output objects, or pieces, of output you’re interested in; and you can create a single data set from multiple runs of the same procedure.
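
To illustrate the first of these features, here is a minimal sketch that writes two output objects from a single PROC FREQ run to two separate data sets. It is an illustration only: it uses the SASHELP.CARS sample data set rather than the lecture's Orion data, and the output data set names are made up. CrossTabFreqs and ChiSq are the output object names that PROC FREQ uses for the crosstabulation table and the chi-square tests.

```sas
/* One ODS OUTPUT statement can name several output objects */
ods output CrossTabFreqs=work.origin_by_drive
           ChiSq=work.origin_drive_chisq;

/* A two-way table with the CHISQ option produces both objects */
proc freq data=sashelp.cars;
   tables Origin*DriveTrain / chisq;
run;

/* Each output object is now a separate data set */
proc print data=work.origin_by_drive;
run;

proc print data=work.origin_drive_chisq;
run;
```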
ODS OUTPUT Syntax

The ODS OUTPUT statement produces a SAS data set from an output object.

General form of the ODS OUTPUT statement:

   ODS OUTPUT output-object = SAS-data-set ;

• Must know the name of the output object.
• By default, the OUTPUT destination is closed at a step boundary.

The ODS OUTPUT statement is written before a procedure step, and it produces a data set from an output object. The general syntax is the keyword ODS followed by the OUTPUT keyword, and then the output object is referenced, followed by an equals sign and the name of the data set we want to create to hold the statistics. The OUTPUT destination is automatically closed at a step boundary unless you use an option to make the destination remain open, or persist. So after a procedure like PROC MEANS runs, no more data will be written by default to the data set being created.

ODS OUTPUT Syntax

The ODS OUTPUT statement produces a SAS data set from an output object.

   ods output onewayfreqs=work.customerfreqs;
   proc freq data=Orion.Customer;
      tables country;
   run;

Here is an example of the ODS OUTPUT statement being used to create an output data set from PROC FREQ. Notice that the ODS OUTPUT statement precedes the PROC FREQ step. The ODS OUTPUT keywords are followed by a reference to the output object, in this case, "onewayfreqs", an equals sign, and then the name of the data set to which we would like to write the output object. So, in this case, the onewayfreqs output object, generated by PROC FREQ, will be written to a data set called work.customerfreqs. This is an example of how simple it can be to use the ODS OUTPUT statement. In the next slides, we’ll look in more detail at the output object and how we find out its name.
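
The option for keeping the destination open is not shown on the slides. As a hedged sketch, assuming the PERSIST=PROC suboption of the ODS OUTPUT statement is the option meant here, the following collects the OneWayFreqs objects from two separate PROC FREQ runs into one data set. The SASHELP.CARS sample data set and the name work.all_freqs are choices made for this illustration.

```sas
/* PERSIST=PROC keeps the request active across procedure boundaries */
ods output OneWayFreqs(persist=proc)=work.all_freqs;

proc freq data=sashelp.cars;
   tables Origin;
run;

proc freq data=sashelp.cars;
   tables DriveTrain;
run;

/* Close the OUTPUT destination explicitly so the data set is completed */
ods output close;

proc print data=work.all_freqs;
run;
```

Without the PERSIST= suboption, only the first PROC FREQ step would contribute rows, because the destination closes at the first step boundary.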
Understanding Output Objects

• Some procedures have a single output object, and others have multiple output objects.
• Output objects have attributes, which include a name and a label.
• ODS stores a link to each output object in the Results window.

[Results window callouts: Multiple output objects. Generated by PROC FREQ with an OUTPUT statement. A single output object. Generated by PROC FREQ with a TABLES statement.]

So, what is an output object? The simple definition of output object is that it is data produced by a procedure, like the statistics produced by PROC MEANS. But how do we determine what an output object is called, so that we can specify it in our ODS OUTPUT statement? The first place to look for output object information is the Results window, which shows the label, or description, of the output object. Some procedures might have only a single output object, whereas others might have multiple output objects. Here we can see that the first PROC FREQ result shows two output objects, one called Cross-Tabular Freq Table, and one called Chi-Square Tests, while the second PROC FREQ result shows only one output object. So, it’s important to know which output object you need, in order to refer to it properly in your ODS OUTPUT statement. The output object label in the Results window is not the only way to refer to output objects. This is important because sometimes the label isn’t specific enough, which can be the case when working with BY groups.

Properties of the Output Objects

• The Results window shows the label of the output object.
• The Properties window gives the name of the output object in addition to other information.

We can get further information about an output object by right-clicking on the label and choosing Properties from the pop-up menu.
The Properties window shows us more information about an output object, including other ways to reference the output object, like the Name and the Path.

Properties of the Output Objects

[Properties window for the Chi-Square Tests output object]

In the Properties window shown here, we can see that the value for Name is the abbreviation "ChiSq," while the label shown in the Results window is “Chi-Square Tests,” which contains a dash and a space. If a label contains characters that might cause problems in your code, you can refer to the label inside quotes. Or, you can look in the Properties window. In this case we can use the name value "ChiSq," which is valid as is, and can be typed in the ODS OUTPUT statement without needing quotes.

ODS TRACE Statement

Use the ODS TRACE statement to determine names and other details about output objects.

General form of the ODS TRACE statement:

   ODS TRACE ON </option(s)> ;
   SAS code that generates output objects
   ODS TRACE OFF ;

Instead of using the Results or Properties windows to find the names of output objects, the ODS TRACE statement can be used to find not only names of output objects, but other details as well. If we submit the ODS TRACE ON statement before a procedure is run, the log shows additional information, including the name and label of the output object. ODS TRACE is best turned off when no longer needed.

ODS TRACE Statement

   ods trace on;
   proc freq data=orion.Customer;
      tables country;
   run;
   ods trace off;

Because ODS TRACE is being used in this example, there is information in the log, shown on the right, that is similar to that seen in the Properties window. You can see the words "Output Added," and beneath that you can see the headers "Name," "Label," "Template," and "Path". Of these, Name, Label, and Path can all be used to specify that output object.
However, there are times, like when working with BY groups, for example, when the Name, Label, and Path won’t give us unique names. In that case...

ODS TRACE Statement

   ods trace on;

   Output Added:
   -------------
   Name:       OneWayFreqs
   Label:      One-Way Frequencies
   Template:   Base.Freq.OneWayFreqs
   Path:       Freq.Table1.OneWayFreqs
   -------------

   ods trace on / label;

   Output Added:
   -------------
   Name:       OneWayFreqs
   Label:      One-Way Frequencies
   Template:   Base.Freq.OneWayFreqs
   Path:       Freq.Table1.OneWayFreqs
   Label Path: 'The Freq Procedure'.'Table Country'.'One-Way Frequencies'
   -------------

...we can add the LABEL option to the ODS TRACE statement, after the slash. This provides one extra piece of information in the log: the Label Path. In this example the extra information is not necessary, but as mentioned previously, there can be times, like when using BY groups, when the Label Path is very useful because it gives us a unique identifier for that output object.

Specifying the Output Object

You can specify any combination of name, label, path, or label path in the ODS OUTPUT statement.

   name:               OneWayFreqs
   label:              'One-Way Frequencies'
   path:               Freq.Table1.OneWayFreqs
   partial path:       Table1.OneWayFreqs
   label path:         'The Freq Procedure'.'Table Country'.'One-Way Frequencies'
   partial label path: 'Table Country'.'One-Way Frequencies'

As we have seen, there are several places to find information we can use when specifying output objects. With ODS TRACE ON, the log will show us a name, a label, and a path. If we add the LABEL option to ODS TRACE, we also have the label path to use. Any of these values can be used to specify the output object. From the examples shown on this slide, name would probably be the simplest method. But the label could be used instead, or the path, or even a partial path. And if you need something more specific, you can use the label path.
Note that an output object specification containing invalid characters like the space and the hyphen should be quoted.

Specifying the Output Object

   ods output onewayfreqs=work.country_job_title
              freq.table1.onewayfreqs=work.country
              freq.table2.onewayfreqs=work.job_title;
   proc freq data=orion.employees;
      tables country job_title;
   run;

   The data set WORK.JOB_TITLE has 132 observations and 7 variables.
   The data set WORK.COUNTRY has 4 observations and 7 variables.
   The data set WORK.COUNTRY_JOB_TITLE has 136 observations and 9 variables.

Here is an example in which the ODS OUTPUT statement is creating three data sets from one run of PROC FREQ. Note that the TABLES statement in the PROC FREQ step has two variables listed: Country and Job_Title. This means that a table of statistics is being created for Country and another for Job_Title. So how is the ODS OUTPUT statement able to create three data sets from this simple TABLES statement?

In this example, the first output object referenced is OneWayFreqs. There are two OneWayFreqs output objects: one for the Country variable and one for the Job_Title variable. The one-way frequency statistics from both are written to the work.country_job_title data set.

The other two output objects specified are freq.table1.onewayfreqs and freq.table2.onewayfreqs. These output objects capture the statistics for Country and Job_Title separately and store them in their respective tables.
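
The quoted-label form and the LABEL option of ODS TRACE can be tried together in a small sketch like the one below. It is only an illustration under stated assumptions: it uses the SASHELP.CARS sample data set with BY-group processing, and the data set names work.cars_sorted and work.type_freqs are made up. The label paths written to the log are not reproduced here; check your own log for the exact values.

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=sashelp.cars out=work.cars_sorted;
   by Origin;
run;

/* The LABEL option adds the Label Path to the trace information in the log */
ods trace on / label;

/* The label contains a hyphen and a space, so it is quoted */
ods output 'One-Way Frequencies'=work.type_freqs;

proc freq data=work.cars_sorted;
   by Origin;
   tables Type;
run;

ods trace off;

proc print data=work.type_freqs;
run;
```

To capture a single BY group instead, the path or label path reported in the log for that group can replace the quoted label in the ODS OUTPUT statement.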
Specifying the Output Object

   ods listing close;
   ods output summary=work.empsums;
   proc means data=orion.employees;
      class job_title;
      var salary;
   run;
   ods listing;

Here is one more example of the ODS OUTPUT statement, this time with PROC MEANS. Notice the ODS LISTING CLOSE statement. In this case the programmer wanted to create an output data set and did not want to see anything written to the Output window. We saw the NOPRINT option used in the PROC MEANS statement in an earlier example. The NOPRINT option is only valid when an OUTPUT statement is also used in the PROC MEANS step. In this case, we’re not using an OUTPUT statement in PROC MEANS, so the alternative is to use the ODS LISTING CLOSE statement. Let’s look at what happens if the NOPRINT option is specified in the PROC MEANS statement and the OUTPUT statement is not used.

The NOPRINT Option

If you specify the NOPRINT option, the procedure does not send any output to ODS.

   ods output summary=work.empsums;
   proc means data=orion.employees noprint;
      class job_title;
      var salary;
   run;

   ERROR: Neither the PRINT option nor a valid output statement has been given.
   NOTE: The SAS System stopped processing this step because of errors.

To create an output table from procedures such as the MEANS procedure, do not use the NOPRINT option. Instead, use the following:

   ODS LISTING CLOSE;

If the NOPRINT option is used without an OUTPUT statement in the PROC MEANS step, the result is an ERROR message like the one seen here. So, again, if ODS OUTPUT is being used, and you want to prevent output from also being written to the Output window, you should use an ODS LISTING CLOSE statement before your step.
ODS LISTING CLOSE

   ods output summary=work.empsums;   /* Specify the OUTPUT destination */
   ods listing close;                 /* Close the LISTING destination */
   proc means data=orion.employees;
      class job_title;
      var salary;
   run;
   ods listing;                       /* Re-open the LISTING destination for any following reporting steps */

Here we have an ODS OUTPUT statement, an ODS LISTING CLOSE statement to close the listing destination, and finally, an ODS LISTING statement after the PROC step, to make sure that the listing destination is available for any following reporting steps.

Using the Output Delivery System

This demonstration illustrates how to create output data sets using the Output Delivery System.

Now I’ll show you some examples using ODS statements. Remember that the ODS OUTPUT statement produces a SAS data set from an output object, and in order for this to happen, you need to know how to refer to the output object. Also remember that you can see the name of an output object using several methods, one of which is to look in the Results window. Let me run this PROC FREQ step and expand the results in the Results window on the left. Notice the two icons, one for Cross-Tabular Freq Table and one for Chi-Square Tests. These are output object labels and could be used, within quotes, to specify the output object. However, we can also right-click on a label, like Chi-Square Tests, and choose Properties, and there we might find a better choice. You can see that the Name value is just ChiSq, or CHISQ, a simple-to-type, valid SAS name, which does NOT need to be quoted. I might use either one to refer to the output object.

However, we can get the most information by using the ODS TRACE ON statement as you see here. I’m also using the ODS TRACE OFF statement after the PROC FREQ step, because the extra information needs to be generated only once.
When this code is run, I’ll see ODS output object information in the log. I can use the Name, Label, or Path to refer to the output object. Now let me move back to the editor window, where we’ll look at some examples.

Notice first this ODS statement that’s commented out. It’s one valid way to refer to the output object, and because this value contains a space and a dash, the value must be quoted. Instead of this, I’ve used just the name in my code, in this case CHISQ. Because no invalid characters exist here, no quotes are necessary. When I run this, I get just the chi-square data.

Now, you may not have noticed, but I have not yet used the NOPRINT option in the PROC statement. Using NOPRINT in the PROC FREQ statement requires that an OUTPUT statement also be used. To turn off output to the Output window when using ODS, use the ODS LISTING CLOSE statement, as you see here. And be sure to follow your code with an ODS LISTING statement to re-open the listing destination so that further procedure steps can write to the Output window.

One last piece of code to show you is an example of how to create multiple output data sets from one procedure. Here we’re directing three different output objects to three different output data sets.

Summary

• The OUT= option used in the TABLES statement in PROC FREQ generates a table with frequency counts and percentages.
• The OUT= option in the OUTPUT statement in PROC FREQ generates a table with statistics, like chi-squares.
• The OUT= option is used only in the OUTPUT statement in PROC MEANS to create output tables containing statistics.
• The Output Delivery System (ODS) provides a method that is syntactically the same for all procedures and requires the use of output objects.
• There are several ways to refer to output objects.
These can be obtained via the Results window or by using ODS TRACE, and can be used to easily create one or more output data sets.

To summarize the information we’ve covered in this e-lecture:
• Remember that the OUT= option can be used in two places in PROC FREQ, both in the TABLES and the OUTPUT statements, and remember that OUT= in the TABLES statement gives us frequency counts and percentages, while in the OUTPUT statement it gives us statistics.
• OUT= is also used in PROC MEANS, but only in the OUTPUT statement.
• Also, remember that ODS gives us another way to create data sets from procedure output, and its method is the same regardless of procedure.
• When using ODS you must be able to specify which output object you want, and there are several ways to determine this, including looking in the Results window and using ODS TRACE and then reading the log.

Now that you have seen several methods for creating output data sets using PROC FREQ and PROC MEANS, I hope you’ll be able to put them to good use!

Related Training

The following courses provide related information:
• SAS Programming 2: Data Manipulation Techniques
• SAS Report Writing 1: Using Procedures and ODS

For a complete list of available e-lectures and other SAS training products, visit: support.sas.com/training

Listed here are other lectures that might also interest you. For a complete list of available e-lectures and other SAS training products, please visit the SAS Web site at support.sas.com/training.

Credits

Creating Output Data Sets with the FREQUENCY and MEANS Procedures was developed by Marty Hultgren. Additional contributions were made by Ted Meleky, Linda Mitterling, Christine Riddiough and Cynthia Zender.

This concludes the e-lecture Creating Output Data Sets with the FREQUENCY and MEANS Procedures. I hope you found the material in this lecture to be helpful for your work.
Thank you to everyone who contributed to the creation of this e-lecture.

Comments?

We would like to hear what you think.
• Do you have any comments about this lecture?
• Did you find the information in this lecture useful?
• What other e-lectures would you like SAS to develop in the future?

Please e-mail your comments to EDULectures@sas.com. Or you can fill out the short evaluation form at the end of this lecture.

If you have any comments about this lecture or e-lectures in general, we would appreciate receiving your input. You can use the e-mail address listed here to provide that feedback, or you can complete the short evaluation form available at the end of this lecture.

Copyright

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

Copyright © 2009 by SAS Institute Inc., Cary, NC 27513, USA. All rights reserved.

Thank you for your time.