USING SAS PC WITH WINDOWS

Statistics 511
Professor Naomi Altman
revised from previous editions by McShane and Altman, and Nshinyabakobeje and Altman

TABLE OF CONTENTS

A. OVERVIEW OF THE SAS SYSTEM
B. TWO STEPS NEEDED IN THE SAS PROGRAMMING LANGUAGE
C. SYNTAX
D. CHARACTERISTICS OF A SAS DATA SET
D. CREATING A SAS DATA SET
E. CREATING A SAS PROGRAM FOR DESCRIPTIVE STATISTICS
F. PRODUCING A REPORT FROM A SAS OUTPUT
G. HELPFUL HINTS

A. OVERVIEW OF THE SAS SYSTEM

SAS (Statistical Analysis System) is a software system/package for data analysis. SAS provides tools for:

  * information storage and file handling;
  * data modification and management;
  * statistical analysis;
  * report writing.

The SAS system is a powerful programming language plus a collection of ready-to-use programs called procedures or PROCs, which can perform a large variety of applications. We will use primarily the Basic and Statistical tools, a small fraction of the capabilities of SAS.

On-line documentation is available at www.sas.psu.edu. There is also on-line help when you run SAS PC, but this is difficult to use.

B. TWO STEPS NEEDED IN THE SAS PROGRAMMING LANGUAGE

The SAS language has its own vocabulary and syntax: words and the rules for putting them together. A SAS statement is a string of SAS keywords, SAS names, and special characters and operators ending in a semicolon that instructs SAS to perform an operation or gives SAS information. A sequence of SAS statements is called a SAS program.

A SAS program consists of two kinds of steps: DATA steps and PROC steps. DATA and PROC steps can appear in any order, and any number of DATA and PROC steps can be used in a SAS program. Usually, DATA steps create SAS data sets, and PROC steps do analysis of SAS data sets. A PROC may also create variables, such as residuals and fitted values, which can be placed in a new data set or appended to an existing data set.
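
As a rough illustration of the two kinds of steps, the short program below has one DATA step followed by one PROC step. It is only a sketch: the data set name DEMO, the variables X, Y and RATIO, and the three data lines are made up for this example, and the data are typed inside the program (with DATALINES) only so that the example runs without an external file.

```sas
/* DATA step: creates the data set DEMO (hypothetical name and values) */
DATA DEMO;
  INPUT X Y;          /* read two numeric variables from each data line */
  RATIO = Y / X;      /* compute a new variable and add it to DEMO */
  DATALINES;
1 2
2 5
4 6
;
RUN;

/* PROC step: analyzes the data set created above */
PROC MEANS DATA=DEMO;
  VAR X Y RATIO;      /* summary statistics for these three variables */
RUN;
```

Submitting this program should create DEMO with three observations and then print the count, mean, standard deviation, minimum and maximum of each variable.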
A DATA step is a group of SAS statements that begins with a DATA statement. Example:

    DATA ONE;                  Creates a data set named "ONE".
    INFILE 'A:YIELD.TXT';      Reads the data from the file A:YIELD.TXT.
    INPUT TREAT REP YIELD;     The file has 3 variables named TREAT, REP and YIELD.
    LOGY=LOG(YIELD);           A new variable LOGY is created and added to "ONE".

The DATA step begins with a DATA statement and can include any number of program statements. You can use the DATA step for these purposes:

  * retrieval: getting input data from a file;
  * editing: checking for errors in the data and correcting them; computing new variables;
  * outputting: writing data sets to disk;
  * creating: producing new SAS data sets from existing ones by subsetting, merging, and updating.

Every SAS data set has a name. If you do not name a data set in a PROC statement, SAS uses the most recently created data set. If your program uses several data sets, it is best to call the required data set using DATA=datasetname whenever you need it. That will avoid problems as you change your program.

The DATA step can include statements telling SAS to create one or more new SAS data sets and programming statements that perform the manipulations necessary to build the data sets. Note that a new data set created by a DATA step or by a PROC (for example with an OUTPUT statement) becomes the default input for the steps that follow, which is another reason to name the data set explicitly.

A PROC is a group of SAS statements that begins with a PROC statement. Example:

    PROC REG DATA=HOUSING;         Calls the regression procedure with data set "HOUSING".
    MODEL PRICE=SQFT NOBEDRM;      Tells SAS which are the dependent and independent variables.
    OUTPUT OUT=NEWDATA R=R P=P;    Stores the created variables, R and P, in a new data set called "NEWDATA".

The PROC step (or PROCEDURE step) instructs SAS to call a procedure from its library and to execute that procedure, usually with a SAS data set as input. The PROC step begins with a PROC statement.
Other statements in the PROC step give the program more information about the results that you want.

C. SYNTAX

There are 4 main syntax rules:

1. Every SAS statement ends with a semicolon ";". Failure to include the semicolon is the most common error, and unfortunately leads to error messages that are difficult to decipher.
2. Variable and data set names should contain 8 or fewer letters, digits, or underscores, and must begin with a letter or underscore. (Version 8 accepts names of up to 32 characters, but names of 8 or fewer characters work in every version.)
3. SAS is case insensitive. DOG, Dog and dog all mean the same thing to SAS.
4. SAS ignores "end of line" and multiple spaces.

D. CHARACTERISTICS OF A SAS DATA SET

The SAS system reads data (letters or numbers) in various forms and organizes them into a SAS data set, which is similar to a spreadsheet. Once the data have been organized into a SAS data set, you can access, analyze, revise, and display the data. You can also store data sets; however, for small data sets, it is most convenient to store them as text files.

A data set consists of the following components: data values, variables, and observations.

  * A data value is a single unit of information (a single cell).
  * A variable is a set of data values describing a characteristic (a single column).
  * An observation is a set of data values for the same item (a single row).

The following data set contains 5 variables, 18 observations, and 90 data values, one of which is missing.

    OBS  NAME        SEX  AGE  HEIGHT  WEIGHT
      1  Aubrey      M     41      74     170
      2  Ron         M     42      68     166
      3  Carl        M     32      70     155
      4  Antonio     M     39      72     167
      5  Deborah     F     30      66     124
      6  Jacqueline  F     33      66     115
      7  Helen       F     26      64     121
      8  David       M     30      71     158
      9  James       M     53      72     175
     10  Michael     M     32      69     143
     11  Ruth        F     47      69     139
     12  Joel        M     34      72     163
     13  Donna       F     23      62      98
     14  Roger       M     36      75     160
     15  Yao         M      .      70     145
     16  Elizabeth   F     31      67     135
     17  Tim         M     29      71     176
     18  Susan       F     28      65     131

Each column (NAME, SEX, AGE, HEIGHT, WEIGHT) is a variable, each numbered row is an observation, and each cell is a data value. The period "." in the AGE column of observation 15 marks the missing value.

Now you are ready to create a data set file and a SAS program file.

E. CREATING A SAS DATA SET

The initial step in most SAS programs will involve reading data from a text file. Any text processor or spreadsheet can be used to create the file. We demonstrate using Notepad. Usually I use my favorite text editor and save in txt format. (In the SAS manuals you will occasionally see data embedded in a DATA step. However, this is awkward and means that a new program file needs to be written every time the data are modified.)

The data set to be created consists of two variables measured on a random sample of 9 steers. The first variable is the live weight (in hundreds of pounds) and the second variable is the dressed weight (in hundreds of pounds). This sample data set will be used to obtain simple summary statistics and a Normal Probability Plot.

    Live weight      4.2  3.8  4.8  3.4  4.5  4.6  4.3  3.7  3.9
    Dressed weight   2.8  2.5  3.1  2.1  2.9  2.8  2.6  2.4  2.5

Data from Lyman Ott, An Introduction to Statistical Methods and Data Analysis, p. 143.

Start Notepad as follows: Start > Programs > Accessories > Notepad. Now, type the following data set, one steer per line, with the live weight first and the dressed weight second.

    4.2 2.8
    3.8 2.5
    4.8 3.1
    3.4 2.1
    4.5 2.9
    4.6 2.8
    4.3 2.6
    3.7 2.4
    3.9 2.5

Save the file on a diskette in drive A: as follows: File > Save > A:\STEERS.TXT > Save. Then close Notepad: File > Exit.

Now you are going to create the SAS program.

E. CREATING A SAS PROGRAM FOR DESCRIPTIVE STATISTICS

Although SAS has some interactive features, it is basically a batch program. This means that it is convenient to create and save programs as text files. Usually, I create my program in my favorite text editor and save it as a text file with the extension ".sas" instead of ".txt". Then clicking on the file opens the program and places the text in the SAS text editor, from where it can be run. You can also create and save your program in the SAS text editor. Instructions are below.
Start SAS as follows: Start > Programs > The SAS System > The SAS System for Windows v8

After SAS opens, you see a screen with two windows. The upper window is the Log window, showing the SAS statements which have already been processed, along with comments. The bottom window is the Program Editor window. You will enter and edit your SAS program in the Program Editor window.

Now, create the following SAS program in the Program Editor window. Use upper or lower case letters as you choose.

    /* THIS PROGRAM IS USED TO CREATE A SMALL SAS PROGRAM
       WRITTEN BY: LAST NAME, FIRST NAME OF STUDENT.
       DATE: MONTH/DAY/YEAR */

The text above, which is delimited by /* */, is a comment and is ignored by SAS. It is helpful to use comments as a way to document your program and data.

    OPTIONS LS=79 NOCENTER;
    TITLE 'SUMMARY STATISTICS';
    DATA MARY;
    INFILE 'A:STEERS.TXT';
    INPUT LIVEWT DRESSWT;
    TITLE2 'PRINTING LIVEWT';
    PROC PRINT DATA=MARY;
    VAR LIVEWT;
    RUN;

The statements do the following:

  * OPTIONS picks options for the output. LS= selects the number of characters per line.
  * TITLE provides a title that appears on each page of the output.
  * DATA MARY; creates the data set named "MARY".
  * INFILE reads the data from A:STEERS.TXT.
  * INPUT says that there are two variables, named LIVEWT and DRESSWT. The variable names are separated by blanks. You need to name all variables in the data set, even if you do not want to use them all.
  * TITLE2 provides a subtitle. It can be used, e.g., if several analyses are performed in the same program.
  * PROC PRINT is our first PROC. It prints some or all of the data in data set MARY.
  * VAR LIVEWT; tells SAS to print LIVEWT only. Otherwise it prints all of the data.
  * RUN can be used to terminate a PROC or DATA step. The last step you submit will not execute until a RUN statement is added.

Try to run the SAS program now by clicking the SUBMIT icon (the running figure), or by pressing the F3 function key. Look at your SAS output. You should see a list of the data. If you do not, you have made an error.
However, you can continue reading this tutorial, as error correction is the next topic.

Whether or not the output appears, open the Log window as follows: Window > Log. You should see the SAS commands you entered, with comments about how they executed, including error messages (in red) if any. Warnings are printed in green.

We now want to see what happens if you make an error. To do this, we will start over. You can clear any window by clicking on it to make it active and then choosing Edit > Clear Text. Clear the Output and Log windows (Window > Output, Edit > Clear Text; Window > Log, Edit > Clear Text).

Recall the current SAS file in the Program Editor window as follows: Window > Program Editor to open the window and Locals > Recall Text to bring back the most recently submitted text. (Repeating Recall Text brings back the second most recently submitted text, etc.)

To see how SAS handles errors in a program file, change the statement PROC PRINT DATA=MARY; to PROC PRINT DATA=MARIAM; in the program above, then run SAS. You will get the following error message (written in red) in the Log window.

    ERROR: File WORK.MARIAM.DATA does not exist.

Once a SAS program has been submitted for processing, error messages are written in the Log window. They can be accessed as follows: Window > Log. Now open the Log window and scroll down to read the error message. This assumes that you cleared the Log window of its previous contents, as described above. If you did not, clear the window and run the SAS program again.

Reopen your SAS program file as follows: Window > Program Editor > Locals > Recall Text. Go to the statement PROC PRINT DATA=MARIAM; and change MARIAM back to MARY. Add the remaining statements below to your SAS program.

    PROC UNIVARIATE DATA=MARY;
    VAR LIVEWT DRESSWT;
    QQPLOT;
    RUN;

The statements do the following:

  * PROC UNIVARIATE prints summary statistics. It is part of Base SAS, rather than SAS/STAT.
  * VAR LIVEWT DRESSWT; requests summary statistics for both variables.
  * QQPLOT; requests a Normal Probability Plot for both variables.

Save the SAS program file on the A:\ drive as follows: File > Save > A:\steers.sas > Save

To run the SAS program, click on the SUBMIT icon or simply press the F3 key. Other function keys are defined under Help > Key.

By selecting the Window menu, you can open the Output, Program Editor, and Log windows whenever necessary. The Output window can be selected and opened the same way the other two windows are opened.

You can save the Output window's contents as follows: File > Save > A:\steers.lst > Save. I usually cut and paste the entire window into a text editor as described below.

F. PRODUCING A REPORT FROM A SAS OUTPUT

There are many different ways to produce a report using SAS output. We will go through one way, which assumes that you have a text editor such as Word on your computer and that your computer is powerful enough to run the editor and SAS at the same time.

Start the text editor. In the editor, write your report. For example, type the heading and introductory material describing the problem you analyzed. Discuss your analysis of the data.

Suppose you would like to include SAS analysis output in your report. Copy the output to the Clipboard as follows. Open the SAS output: Window > Output. Then choose Edit > Select All and Edit > Copy. Note that copying only a portion of the output by highlighting does not always work. I copy the entire output to a text editor, and edit there.

Open the text editor and paste your SAS output: Edit > Paste.

You will notice that the pasted SAS output uses the SAS Monospace font at size 10 by default. Changing the font size usually gives a better-looking result. For example, in Microsoft Word, choose Edit > Select All and change the SAS Monospace font to size 8.

Now you can edit your Word document by adding text and/or removing parts of the output you judge unimportant.
You can also copy and paste graph sheets directly into your document.

Save your report as follows: File > Save > A:\report.doc > Save

Now you can print your report in Microsoft Word as follows: File > Print > Click OK

G. HELPFUL HINTS

  * Every SAS statement ends with a semicolon (;). You may continue statements on two or more lines. Forgetting the semicolon leads to error messages which are hard to decipher. This is always the first thing to check if your program does not run.
  * The next most common error is misspelling a SAS command or variable name. Keep variable names to 8 characters or fewer; older versions of SAS reject longer names (Version 8 allows up to 32 characters).
  * The third most common error is trying to use a variable that is not available. For example, this error can occur if you try to draw a residual plot but forgot to store the residuals from the regression, or if you are in the wrong data set.
  * SAS ignores extra blanks, including blank lines, so you can space your program so that it is neat and easy to read. You can put more than one statement on a line, but this usually makes your program hard to read. SAS program files can get quite long, so it is useful to keep them readable.
  * Statements added in the SAS editor are not saved in your SAS program file. If you want to have them available for future use, you must explicitly save them using File > Save.
  * Most PROCs can use only one data set. If variables from multiple data sets are needed, the data sets can be merged in a DATA step.
  * The online SAS help can be difficult to use because statements with the same name appear in different PROCs. When seeking help for a PROC, try searching on PROC. This gives a list of all the PROCs. You can then click on the PROC you want, which gives a list of the statements valid for that PROC. You will likely find the online manual more useful.
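
The hint about merging data sets in a DATA step can be sketched as follows. This is a minimal example under stated assumptions: the data set names HEIGHTS, WEIGHTS and BOTH and the key variable ID are hypothetical, the three data lines are illustrative only, and the data are typed into the program only so that the sketch is self-contained.

```sas
/* Two small data sets that share the key variable ID (all names hypothetical) */
DATA HEIGHTS;
  INPUT ID HEIGHT;
  DATALINES;
1 74
2 68
3 70
;
RUN;

DATA WEIGHTS;
  INPUT ID WEIGHT;
  DATALINES;
1 170
2 166
3 155
;
RUN;

/* MERGE with a BY statement expects both data sets sorted by the key */
PROC SORT DATA=HEIGHTS; BY ID; RUN;
PROC SORT DATA=WEIGHTS; BY ID; RUN;

/* Combine the two data sets, matching observations on ID */
DATA BOTH;
  MERGE HEIGHTS WEIGHTS;
  BY ID;
RUN;

/* BOTH now holds ID, HEIGHT and WEIGHT, so a single PROC can use all three */
PROC PRINT DATA=BOTH;
RUN;
```

Naming the data set with DATA=BOTH in the last step, rather than relying on the default, keeps the program correct if more steps are added later.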