Notes on SAS - Bloomsburg University

Introduction to SAS
1. General
The Statistical Analysis System (SAS) is a comprehensive set of facilities
for data management, reporting, and user interface design. It contains tools
for a wide range of applications, from the simplest report to the most complex
statistical analysis. The SAS system, which consists of several modules, is
available in its entirety on the Bloomsburg University computer network. In
the Windows operating system, click Start and select Programs.
You will see SAS as one of the choices. Select this option and you will see
the three SAS windows: the program window, the output window, and the log window.
The program window is the SAS editor for writing and editing programs. The
output window shows the output of your program after it is run, and the log
window shows the notes, warnings, and errors that let you trace the program.
2. Structure of a SAS Program
A SAS program consists of a series of steps. Each step can be described
by statements using the SAS language. There are generally two types of
steps: DATA steps and PROC steps. A DATA step contains statements for
inputting, modifying and transforming data. You can also output the data into
one or more data files in a data step. A PROC step uses one or more standard
SAS procedures to carry out a specific task on the input data, e.g. hypothesis
testing.
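The two kinds of steps can be sketched as follows. This is only a minimal illustration; the data set name SCORES and its variables ID and SCORE are made up for the example.

```sas
* DATA step: read three inline records into the data set WORK.SCORES;
DATA SCORES;
  INPUT ID SCORE;
  CARDS;
1 82
2 75
3 91
;
RUN;

* PROC step: summarize the data set created above;
PROC MEANS DATA=SCORES;
  VAR SCORE;
RUN;
```

The DATA step builds the data set; the PROC step then applies a standard procedure to it.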
3. Running a SAS Program
Once you have created a SAS program either by a text editor or
interactively, press the RUN key and your program will be submitted for
processing.
4. Basic Syntax Rules
The SAS data step consists of a series of statements. Rules for writing these
statements follow.
Words:

Words in statements must be separated by one or more blanks.

A word may not be split between lines.

Words may be in upper, lower, or mixed case.
Variable names:

Variable names may be 1 to 32 characters in length (releases before
SAS Version 7 allowed at most eight characters).

All variable names must begin with an alphabetic character (A-Z, a-z)
or an underscore (_). Subsequent characters may include digits.

A variable list such as V1-V5 means V1, V2, V3, V4, and V5.

SAS matches variable names character by character but ignores case.
That is, V1 is not the same as V01, but V1 is the same as v1.

Variable names may not contain embedded blanks. V1 and V_1 are
acceptable; V 1 is not.

Certain names are reserved for use by SAS, e.g., _N_, _TYPE_, and
_NAME_. Similarly, logical operators such as GE, LT, AND, OR, and
EQ should not be used as variable names.
Statements:

A statement may begin anywhere on a line and may be continued on
additional lines as necessary.

Statements end with a semicolon (;).

Statements which begin with an asterisk (*) are treated as comments
and are not interpreted. A comment is concluded with a semicolon.

A group of statements preceded by /* is ignored until */ is read
(block comment). Semicolons between /*…*/ have no effect.

Multiple statements may appear on a line; they must be separated by
semicolons.
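The rules above can be seen together in a short sketch. The data set name DEMO and the variable names are invented for illustration.

```sas
/* Block comment: everything up to the closing marker is ignored,
   including semicolons; like this one. */

* Statement comment: it ends at the first semicolon;

data demo; x1 = 1; X2 = 2;     * several statements on one line;
  Total = sum(of x1-x2);       * x1-x2 is a variable list and case is ignored;
  Label Total
        = 'sum of x1 and x2';  * one statement continued over two lines;
run;

proc print data=demo; run;
```

Because SAS ignores case in names, x1 and X1 refer to the same variable, so the variable list x1-x2 picks up both x1 and X2.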
5. The Data Step

The data step begins with the word DATA followed by a name for the
temporary or permanent data set to be output by the data step. See
the sample programs which create and use temporary SAS data sets.

The data step includes instructions about where to find the data and
how to read the values from the data file.

The data step may contain instructions to create new variables or
transform existing variables, label variables, and select cases or
variables. The following statements are examples of valid statements
for the SAS data step:
y = sum(of x1-x15);
label y = 'total score';
if y > 10 then group = 1;
else group = 2;
keep group y;

To refer to a missing value for a numeric variable, use a period (.).
For example, the statement: if a = 99 then a = .; forces SAS to treat a
value of 99 as if it were missing.

All data step commands must be contained within the data step itself;
additional data step commands may be inserted after a PROC only
after beginning a new data step and reading in the default data set.
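As a hedged sketch of the last two points, the program below recodes a missing-value code in a new data step that follows a PROC. The data set names SURVEY and SURVEY2 and the variables are hypothetical. The SET statement is what reads an existing SAS data set into the new data step.

```sas
* First data step: create WORK.SURVEY, where 99 is a missing-value code;
DATA SURVEY;
  INPUT ID A;
  CARDS;
1 5
2 99
3 7
;
RUN;

* At this point 99 is still treated as a real value;
PROC MEANS DATA=SURVEY;
  VAR A;
RUN;

* New data step after the PROC: read SURVEY back in and recode 99;
DATA SURVEY2;
  SET SURVEY;
  IF A = 99 THEN A = .;
RUN;

* The statistics for A now exclude the missing value;
PROC MEANS DATA=SURVEY2;
  VAR A;
RUN;
```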
6. SAS PROCS

SAS PROCs (procedures) are used for many purposes including
carrying out statistical analysis (e.g., PROC REG, PROC MEANS),
displaying information about a SAS data set (e.g., PROC
CONTENTS, PROC PRINT), and creating graphs (PROC PLOT).

Most PROCs produce output of some kind. The output of statistical
PROCs usually appears in the listing file.

The PROC(s) must appear after a data step which creates the SAS data
set used in the procedure.

The word PROC automatically terminates a SAS data step.

Data step commands may not appear after a PROC unless a new data
step is initiated with the word DATA.

A SAS PROC begins with the word PROC followed by the name of
the specific procedure (e.g., PROC REG).

Some PROCs have options or subcommands which allow the user to
output information into a SAS data set (e.g., PROC UNIVARIATE,
PROC REG).

The default data set used by a PROC is the data set most recently
created by a DATA step or PROC before the current PROC. To change the
data set used by a PROC, use the DATA= option on the PROC line.
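The difference between the default data set and the DATA= option can be sketched like this, assuming two small made-up data sets named FIRST and SECOND:

```sas
DATA FIRST;
  INPUT X;
  CARDS;
1
2
;
RUN;

DATA SECOND;
  INPUT X;
  CARDS;
10
20
;
RUN;

* No DATA= option: the most recently created data set (SECOND) is used;
PROC PRINT;
RUN;

* DATA= option: FIRST is selected explicitly;
PROC PRINT DATA=FIRST;
RUN;
```

Naming the data set on every PROC line is a common habit, since it keeps a program from depending on the order in which data sets were created.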
7. Miscellaneous Commands

The OPTIONS statement allows the programmer to set options for the
current session. For example, OPTIONS NOCENTER LINESIZE=80;
left-aligns the output on the page and sets the line size in the
listing file to 80 columns.

INFILE is used to access a specific external file. An example of an
INFILE statement appears in the last sample program.
8. Sample Program 1
In this example the SAS program reads data organized in fixed columns,
from an inline source and uses two PROCs.
Program
DATA CLASS;
INPUT NAME $ 1-8 SEX $ 10 AGE 12-13 HEIGHT 15-16 WEIGHT 18-22;
CARDS;
JOHN     M 12 59 99.5
JAMES    M 12 57 83.0
ALFRED   M 14 69 112.5
ALICE    F 13 56 84.0
;
PROC MEANS;
VAR AGE HEIGHT WEIGHT;
PROC PLOT;
PLOT WEIGHT*HEIGHT;
RUN;
Explanation
The Data Step

The DATA statement begins the data step and names the data set to be
created. Here SAS creates a temporary data set called WORK.CLASS.

The INPUT statement tells SAS which columns hold each variable:
1. NAME: this is an alphanumeric variable, as indicated by the $.
The variable NAME has been assigned columns 1-8.
2. SEX: this is also an alphanumeric variable, and has been assigned
column 10.
3. AGE: this is a numeric variable, and has been assigned columns
12-13.
4. HEIGHT: numeric variable, columns 15-16.
5. WEIGHT: numeric variable, columns 18-22.

The CARDS statement informs the computer that the data are located
in the next lines.
The PROCS
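1. The first procedure, PROC MEANS, calculates the mean and other
summary statistics for each variable listed in the VAR statement
(AGE, HEIGHT, and WEIGHT).
2. The second, PROC PLOT, plots the values of WEIGHT (vertical axis)
against HEIGHT (horizontal axis).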
9. Sample Program 2
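In this example, we set up 2x2 tables of bronchitis by level of
organic particulates, one for each age group. Each data line gives the
count N for one combination of values, and the WEIGHT statement tells
PROC FREQ to treat N as the cell frequency: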
DATA BRONCHITIS;
INPUT AGEGRP LEVEL $ BRONCH $ N;
CARDS;
1 H Y 20
1 H N 382
1 L Y 9
1 L N 214
2 H Y 10
2 H N 172
2 L Y 7
2 L N 120
3 H Y 12
3 H N 327
3 L Y 6
3 L N 183
;
PROC FREQ DATA=BRONCHITIS ORDER=DATA;
TABLES AGEGRP*LEVEL LEVEL*BRONCH
AGEGRP*LEVEL*BRONCH;
WEIGHT N;
RUN;
10. Sample Program 3
In this example, the program reads data organized in columns
separated by spaces, from an external file and uses three PROCs.
Program
DATA auto;
INFILE 'a:\car.dat';
* The data set CAR.DAT can be retrieved from the data link on this web site;
* y = cost, x1 = price, x2 = miles;
input id y x1 x2;
options pagesize=35 linesize=75;
Proc univariate plot normal;
var x1 x2;
Proc means min max;
var x1 x2;
Proc chart;
vbar x1; vbar x2;
hbar y;
run;
Explanation
The Data Step
1. INFILE identifies the external file, car.dat, from which the data
are read.
2. The variables (id, y, x1, x2) are read with "list input," because the
data (all numerical values, in this example) are stored in the file
separated by spaces.
The PROCS
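1. PROC UNIVARIATE: computes detailed summary statistics for the
metric variables x1 and x2. The PLOT option adds plots such as a
stem-and-leaf plot, a box plot, and a normal probability plot; the
NORMAL option adds tests for normality.
2. PROC MEANS: because the MIN and MAX options are specified, reports
only the minimum and maximum of x1 and x2 rather than the default
summary statistics.
3. PROC CHART: produces vertical bar charts for the variables x1 and
x2 and a horizontal bar chart for y.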