Using SAS PC with Windows
USING SAS PC WITH WINDOWS
Statistics 511
Professor Naomi Altman
revised from previous editions by McShane and Altman,
and Nshinyabakobeje and Altman
TABLE OF CONTENTS
A. OVERVIEW OF THE SAS SYSTEM
B. TWO STEPS NEEDED IN THE SAS PROGRAMMING LANGUAGE
C. SYNTAX
D. CHARACTERISTICS OF A SAS DATA SET
E. CREATING A SAS DATA SET
E. CREATING A SAS PROGRAM FOR DESCRIPTIVE STATISTICS
F. PRODUCING A REPORT FROM A SAS OUTPUT
G. HELPFUL HINTS
A. OVERVIEW OF THE SAS SYSTEM

SAS (Statistical Analysis System) is a software system/package for data analysis. SAS provides
tools for: information storage and file handling; data modification and management; statistical
analysis; and report writing.

The SAS system is a powerful programming language plus a collection of ready-to-use programs
called procedures, or PROCs, which handle a wide variety of tasks.

We will use primarily the Basic and Statistical tools – a small fraction of the capabilities of SAS.

On-line documentation is available at www.sas.psu.edu. There is also on-line help when you run
SAS PC, but this is difficult to use.
B. TWO STEPS NEEDED IN THE SAS PROGRAMMING LANGUAGE

The SAS language has its own vocabulary and syntax - words and the rules for putting them
together.

A SAS statement is a string of SAS keywords, SAS names, and special characters and operators
ending in a semicolon that instructs SAS to perform an operation or gives SAS information.

A sequence of SAS statements is called a SAS program.

A SAS program consists of two kinds of steps: DATA steps and PROC steps. DATA and PROC
steps can appear in any order, and any number of DATA and PROC steps can be used in a SAS
program.

Usually, DATA steps create SAS data sets, and PROC steps do analysis of SAS data sets. A
PROC may also create variables, such as residuals and fitted values, which can be placed in a new
data set or appended to an existing data set.
A DATA step is a group of SAS statements that begins with a DATA statement. Example:
DATA ONE;                    Creates a data set named "ONE".
INFILE 'A:YIELD.TXT';        Reads the data from the file A:YIELD.TXT.
INPUT TREAT REP YIELD;       The file has 3 variables named TREAT, REP and YIELD.
LOGY=LOG(YIELD);             A new variable LOGY is created and added to "ONE".

The DATA step begins with a DATA statement and can include any number of program
statements.

You can use the DATA step for these purposes:
* retrieval: getting input data from a file;
* editing: checking for errors in the data and correcting them; computing new variables;
* outputting: writing data sets to disk;
* creating: producing new SAS data sets from existing ones by subsetting, merging, and
  updating.

Every SAS data set has a name. If a PROC step does not name its input, SAS uses the most
recently created data set. If your program uses several data sets, it is best to name the
required data set explicitly with DATA=datasetname wherever you use it. That will avoid
problems as you change your program.

The DATA step can include statements telling SAS to create one or more new SAS data sets and
programming statements that perform the manipulations necessary to build the data sets. Because
each newly created data set becomes the default input for later PROC steps, naming the input
explicitly matters most in programs that create more than one data set.
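As a small illustration of naming the input data set explicitly, the sketch below creates two data sets and then prints the first one. The data set names FIRST and SECOND, the variables X, Y and Z, and the data values are all hypothetical, chosen only for this example.

```sas
/* Hypothetical example: FIRST and SECOND are invented names. */
DATA FIRST;
  INPUT X Y;
  DATALINES;
1 2
3 4
;
RUN;

/* SECOND is built from FIRST and adds a new variable Z. */
DATA SECOND;
  SET FIRST;
  Z = X + Y;
RUN;

/* Naming the input with DATA= makes it unambiguous which data set is printed. */
PROC PRINT DATA=FIRST;
RUN;
```

The printed listing should contain only X and Y, because Z exists only in SECOND.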
A PROC is a group of SAS statements that begins with a PROC statement. Example:
PROC REG DATA=HOUSING;        Calls the regression procedure with data set “HOUSING”.
MODEL PRICE=SQFT NOBEDRM;     Tells SAS which are the dependent and independent
                              variables.
OUTPUT OUT=NEWDATA R=R P=P;   Stores the created variables, R and P, in a new data set
                              called “NEWDATA”.
 The PROC step (or PROCEDURE step) instructs SAS to call a procedure from its library and to
execute that procedure, usually with a SAS data set as input.
 The PROC step begins with a PROC statement. Other statements in the PROC step give the
program more information about the results that you want.
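To show the PROC step above running end to end, here is a minimal sketch that first builds a small HOUSING data set and then prints the data set created by the OUTPUT statement. The six observations are invented purely for illustration and are not real housing data.

```sas
/* Hypothetical data: these values are made up for illustration only. */
DATA HOUSING;
  INPUT PRICE SQFT NOBEDRM;
  DATALINES;
120 1400 3
150 1800 3
 95 1100 2
210 2600 4
175 2100 4
135 1500 2
;
RUN;

PROC REG DATA=HOUSING;
  MODEL PRICE=SQFT NOBEDRM;
  OUTPUT OUT=NEWDATA R=R P=P;   /* R = residuals, P = fitted values */
RUN;
QUIT;                           /* PROC REG stays active until QUIT */

/* NEWDATA holds the original variables plus R and P. */
PROC PRINT DATA=NEWDATA;
RUN;
```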
C. SYNTAX
There are 4 main syntax rules:
1. Every SAS statement ends with a semi-colon “;”. Failure to include the semi-colon is the
most common error, and unfortunately leads to error messages that are difficult to decipher.
2. Variable and data set names should contain 8 or fewer letters or digits. (Versions of SAS
before Version 7 enforce this limit; later versions allow up to 32 characters.)
3. SAS is case insensitive. DOG, Dog and dog all mean the same thing to SAS.
4. SAS ignores “end of line” and multiple spaces.
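The rules on case, line ends and spacing can be seen in a short sketch. The two DATA steps below should be read by SAS in the same way; the data set names DEMO1 and DEMO2 and the variables are hypothetical.

```sas
/* Written one statement per line, in upper case. */
DATA DEMO1;
  INPUT X Y;
  TOTAL = X + Y;
  DATALINES;
1 2
3 4
;
RUN;

/* Same logic in lower case, with two statements on one line, */
/* extra blanks, and one statement split across two lines.    */
data demo2; input x y;
  total   =   x
            + y;
  datalines;
1 2
3 4
;
run;
```

The second form is legal but harder to read, which is why one statement per line is the usual style.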
D. CHARACTERISTICS OF A SAS DATA SET
The SAS system reads data (letters or numbers) in various forms and organizes them into a
SAS data set which is similar to a spreadsheet. Once the data have been organized into a SAS data
set, you can access, analyze, revise, and display the data. You can also store datasets – however, for
small datasets, it is most convenient to store them as text files.
The data consist of the following components: data value, variable, and observation.

Data value is a single unit of information (a single cell)

Variable is a set of data values describing a characteristic (a single column)

Observation is a set of data values for the same item (a single row)
The following data set contains 5 variables, 18 observations, and 90 data values, one of which
is missing.
      NAME        SEX  AGE  HEIGHT  WEIGHT
  1   Aubrey      M    41   74      170
  2   Ron         M    42   68      166
  3   Carl        M    32   70      155
  4   Antonio     M    39   72      167
  5   Deborah     F    30   66      124
  6   Jacqueline  F    33   66      115
  7   Helen       F    26   64      121
  8   David       M    30   71      158
  9   James       M    53   72      175
 10   Michael     M    32   69      143
 11   Ruth        F    47   69      139
 12   Joel        M    34   72      163
 13   Donna       F    23   62       98
 14   Roger       M    36   75      160
 15   Yao         M     .   70      145
 16   Elizabeth   F    31   67      135
 17   Tim         M    29   71      176
 18   Susan       F    28   65      131

The column headings are the variables, each numbered row is an observation, and each cell is a
data value. The "." in the AGE column of observation 15 is the missing value.
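A minimal sketch of how a few rows of this table could be read into a SAS data set, including the missing value, is given below. It uses observations 13 to 16 only. The data set name PEOPLE is hypothetical, and the data are embedded with DATALINES just to keep the example self-contained.

```sas
/* Observations 13 to 16 of the table; PEOPLE is a hypothetical name. */
DATA PEOPLE;
  LENGTH NAME $ 10;   /* character values default to 8 characters with list input */
  INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;   /* $ marks a character variable */
  DATALINES;
Donna     F 23 62  98
Roger     M 36 75 160
Yao       M  . 70 145
Elizabeth F 31 67 135
;
RUN;

/* N counts non-missing values and NMISS counts missing values per variable. */
PROC MEANS DATA=PEOPLE N NMISS MEAN;
  VAR AGE HEIGHT WEIGHT;
RUN;
```

For AGE the output should report 3 non-missing values and 1 missing value, since the period is read as a missing number rather than as zero.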
Now you are ready to create a data set file and a SAS program file.
E. CREATING A SAS DATA SET
The initial step in most SAS programs will involve reading data from a text file. Any text processor
or spreadsheet can be used to create the file. We demonstrate using Notepad. Usually I use my
favorite text editor and save in txt format.
(In the SAS manuals you will occasionally see data embedded in a DATA step. However, this is
awkward and means that the program file must be edited every time the data are modified.)
The data set to be created consists of two variables measured on a random sample of 9 steers. The
first variable is the live weight (in hundreds of pounds) and the second variable is the dressed weight
(in hundreds of pounds). This sample data set will be used to obtain simple summary statistics and a
Normal Probability Plot.
Live weight    Dressed weight
4.2            2.8
3.8            2.5
4.8            3.1
3.4            2.1
4.5            2.9
4.6            2.8
4.3            2.6
3.7            2.4
3.9            2.5
Data from Lyman Ott, An Introduction to Statistical Methods and Data Analysis, p. 143.
Start notepad as follows:
Start > Programs > Accessories > Notepad.
Now, type the following data set.
4.2 2.8
3.8 2.5
4.8 3.1
3.4 2.1
4.5 2.9
4.6 2.8
4.3 2.6
3.7 2.4
3.9 2.5

Save the file on a diskette in drive A: as follows. File > Save > A:\STEERS.TXT > Save.

File > Exit.
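For comparison, the embedded form mentioned at the start of this section would look roughly like the sketch below, with the nine observations typed inside the DATA step instead of being read from A:\STEERS.TXT. The data set name STEERS and the variable names LIVEWT and DRESSWT are choices made for this illustration.

```sas
/* Same nine observations, embedded in the program with DATALINES. */
DATA STEERS;
  INPUT LIVEWT DRESSWT;
  DATALINES;
4.2 2.8
3.8 2.5
4.8 3.1
3.4 2.1
4.5 2.9
4.6 2.8
4.3 2.6
3.7 2.4
3.9 2.5
;
RUN;

PROC PRINT DATA=STEERS;
RUN;
```

With this form, any correction to the data means editing the program itself, which is the drawback noted above.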
Now you are going to create the SAS Program.
E. CREATING A SAS PROGRAM FOR DESCRIPTIVE STATISTICS
Although SAS has some interactive features, it is basically a batch program. This means that it is
convenient to create and save programs as text files. Usually, I create my program in my favorite text
editor and save it as a plain text file with the extension “.sas” instead of “.txt”. Clicking on the file
then opens it in the SAS text editor, from where it can be run.
You can also create and save your program in the SAS text editor. Instructions are below.
Start SAS as follows:
Start > Programs > The SAS System > The SAS System for Windows v8
After SAS opens, you will see a screen with two windows. The upper window is the Log window,
which shows the SAS statements that have already been processed, along with comments. The
lower window is the Program Editor window, where you will enter and edit your SAS program.
Now, create the following SAS program in the Program Editor window. Use upper or lower case
letters as you choose.
/*
THIS PROGRAM IS USED TO CREATE A SMALL SAS PROGRAM
WRITTEN BY: LAST NAME, FIRST NAME OF STUDENT.
DATE: MONTH/DAY/YEAR
*/
The text above which is delimited by /* */ is a comment and is ignored by SAS. It is helpful to use
comments as a way to document your data.
OPTIONS LS=79 NOCENTER;      OPTIONS picks options for the output. LS=
                             selects the number of characters per line.
TITLE 'SUMMARY STATISTICS';  TITLE provides a title that
                             appears on each page of the output.
DATA MARY;                   We now create the data set named "MARY".
INFILE 'A:STEERS.TXT';       We read the data from A:STEERS.TXT.
INPUT LIVEWT DRESSWT;        There are two variables named LIVEWT and
                             DRESSWT. The variable names are separated by
                             blanks. You need to name all variables in
                             the data set, even if you do not want to use
                             them all.
TITLE2 'PRINTING LIVEWT';    TITLE2 provides a subtitle. It can be used
                             e.g. if several analyses are performed in the
                             same program.
PROC PRINT DATA=MARY;        We now run our first PROC. It prints some or
                             all of the data in data set MARY.
VAR LIVEWT;                  Tells SAS to print LIVEWT only. Otherwise it
                             prints all of the data.
RUN;                         The RUN command can be used to terminate a
                             PROC or DATA step. Commands you submit will
                             not run until a RUN command is added.
Try to run the SAS program now by clicking the SUBMIT icon (the running figure) or by pressing
the F3 function key. Look at your SAS output. You should see a list of the data. If you do not, you
have made an error. However, you can continue reading this tutorial, as error correction is the next
topic. Whether or not the output appears, open the Log window as follows: Window > Log. You
should see the SAS commands you entered, with comments about how they executed, including any
error messages (in red). Warnings are printed in green.
We now want to see what happens if you make an error. To do this, we will start over. To clear a
window, click on it to make it active and then choose Edit > Clear Text. Clear the OUTPUT and LOG
windows in this way. (Window > Output, then Edit > Clear Text; Window > Log, then Edit > Clear
Text.)
Recall the current SAS file in the Program Editor window as follows: Window > Program Editor to
open the window and Locals > Recall Text to bring back the most recently submitted text.
(Repeating Recall Text brings back the second most recently submitted text, etc.) To see how SAS
handles errors in a program file, change the statement PROC PRINT DATA=MARY; to PROC PRINT
DATA=MARIAM; in the program above, then run SAS. You will get the following error message
(written in red) in the LOG window: ERROR: File WORK.MARIAM.DATA does not exist. Once a
SAS program has been submitted for processing, error messages are written in the Log window. They
can be accessed as follows: Window > Log. Now open the Log window and scroll down to read the
error message. This assumes that you cleared the Log window of its previous contents, as described
above. If you did not, clear the window and run the SAS program again.
Reopen your SAS program file as follows: Window > Program Editor > Locals > Recall Text. Go
to the statement PROC PRINT DATA=MARIAM; and change MARIAM back to MARY.
Add the remaining statements below to your SAS program.
PROC UNIVARIATE DATA=MARY;   PROC UNIVARIATE prints summary statistics. It is part of
                             Base SAS, rather than SAS/STAT.
VAR LIVEWT DRESSWT;          We will obtain summary statistics for both variables.
QQPLOT;                      We request a Normal Probability Plot for both variables.
RUN;
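As a possible variation, the QQPLOT statement can name the variables to plot and request a reference line. The sketch below assumes that your SAS release supports the NORMAL option of QQPLOT with MU=EST and SIGMA=EST, which draws a line based on the sample mean and standard deviation. It also assumes that A:STEERS.TXT exists as created earlier.

```sas
/* Re-create the data set so the example can be submitted on its own. */
DATA MARY;
  INFILE 'A:STEERS.TXT';
  INPUT LIVEWT DRESSWT;
RUN;

PROC UNIVARIATE DATA=MARY;
  VAR LIVEWT DRESSWT;
  /* Assumed option: reference line from the estimated mean and std. deviation. */
  QQPLOT LIVEWT DRESSWT / NORMAL(MU=EST SIGMA=EST);
RUN;
```

Points that fall close to the reference line suggest that the variable is approximately normally distributed.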

Save the SAS program file on A:\ drive as follows: File > Save > A:\steers.sas > Save

To run the SAS program, click on the SUBMIT icon or simply press the F3 key. The other
function keys are listed under Help > Key.

By selecting the Window menu, you can open the Output, Program Editor, and Log windows
whenever necessary. The Output window can be selected and opened the same way the other two
windows are opened.

You can save the Output window’s contents as follows: File > Save > A:\steers.lst > Save. I
usually cut and paste the entire window into a text editor as described below.
F. PRODUCING A REPORT FROM A SAS OUTPUT
There are many different ways to produce a report using SAS output. We will go through one way,
which assumes that you have a text editor such as Word on your computer and that your computer is
powerful enough to run the editor and SAS at the same time.
Start the Text Editor.
In the editor, write your report. For example, type the heading and introductory material describing the
problem you analyzed. Discuss your analysis of the data. Suppose you would like to include a SAS
analysis output in your report.
Copy analysis output to the Clipboard:

Open the SAS output: Window > output.

Edit > Select All; Edit > Copy. Note that copying only a portion of the output by highlighting
does not always work. I copy the entire output to a text editor, and edit there.

Open the text editor and paste your SAS output: Edit > Paste.
The pasted SAS output appears in the SAS Monospace font at size 10 by default. A smaller font
size usually gives a better layout. Proceed as follows to modify the font size of your pasted
output.

e.g. In Microsoft Word, choose Edit > Select All and change the font size to 8.
Now you can edit your Word document by adding text and/or removing parts of the output you judge
unimportant.

You can also copy and paste graph sheets directly into your document.

Save your report as follows: File > Save > A:\report.doc> Save

Now you can print your report in Microsoft Word as follows: File > Print > Click OK
G. HELPFUL HINTS

Every SAS statement ends with a semicolon ;. You may continue statements on two or more lines.
Forgetting the semicolon (;) leads to error messages which are hard to decipher. This is always the
first thing to check if your program does not run.

The next most common error is misspelling a SAS command or variable name. SAS variable
names can be no more than 8 characters long in versions before Version 7; later versions allow
up to 32 characters.

The third most common error is trying to use a variable that is not available. For example, this
error can occur if you try to draw a residual plot but forgot to store the residuals from the
regression, or if you are using the wrong data set.

SAS ignores extra blanks, including blank lines, so you can space your program so that it is neat
and easy to read.

You can put more than one statement on a line, but this usually makes your program hard to read.
SAS program files can get quite long, so it is useful to keep them readable.

Statements added in the SAS editor are not saved in your SAS program file. If you want to have
them available for future use, you must explicitly save them using File > Save.

Most PROCs can use only one data set. If variables from multiple data sets are needed, the data
sets can be merged in a DATA step.

The online SAS help can be difficult to use due to statements with the same name in different
PROCs. When seeking help for a PROC, try searching on PROC. This gives a list of all the
PROCs. You can then click on the PROC you want, which gives a list of the statements valid for
that PROC. You will likely find the online manual more useful.
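The hint about merging data sets in a DATA step can be sketched as follows. The data sets SIZES and WEIGHTS, the key variable ID, and the combined data set BOTH are hypothetical names. The few values are borrowed from the first three observations of the example table in Section D.

```sas
/* Two hypothetical data sets that share the key variable ID. */
DATA SIZES;
  INPUT ID HEIGHT;
  DATALINES;
1 74
2 68
3 70
;
RUN;

DATA WEIGHTS;
  INPUT ID WEIGHT;
  DATALINES;
3 155
1 170
2 166
;
RUN;

/* MERGE with BY requires each data set to be sorted by the BY variable. */
PROC SORT DATA=SIZES;   BY ID; RUN;
PROC SORT DATA=WEIGHTS; BY ID; RUN;

/* BOTH has one observation per ID, with HEIGHT and WEIGHT side by side. */
DATA BOTH;
  MERGE SIZES WEIGHTS;
  BY ID;
RUN;

PROC PRINT DATA=BOTH;
RUN;
```

A PROC that needs both HEIGHT and WEIGHT can then be run with DATA=BOTH.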