A. Schweder
SAS Basics Workshop
03/28/03
SAS BASICS WORKSHOP
Amanda E. Schweder, Ph.D.
Department of Psychology
Yale University
StatLab: Main Classroom
140 Prospect Street
March 28th 2003
1:00-3:00pm
This workshop is designed to introduce new users to some of the basic concepts necessary for using the
SAS System by covering the following topics:
1. The SAS window system
2. How to construct a SAS program, review the SAS log and SAS output
3. How to input and read raw data files and SAS data files
4. Difference between a DATA step and a PROC step
5. Difference between a temporary SAS dataset and a permanent SAS dataset
6. The LIBNAME statement and librefs
7. Simple ways to manipulate the data and the output
8. Simple procedures available for analysis
9. Importance of the semicolon in SAS
INTRODUCTION to the SAS SYSTEM
The Statistical Analysis System (SAS) was originally developed at North Carolina State University, beginning in
the late 1960s, to analyze agricultural research data; SAS Institute was founded in 1976. SAS is
essentially a series of computer programs used for data management, analysis, and presentation. It is also
considered a 4th generation programming language because it requires only the names of operations in
order to perform them. The programs have already been written, and the code (syntax) you write simply invokes
those programs. 3rd generation languages like BASIC, FORTRAN, C, and Pascal require much more
involved code to perform simple operations.
Generally, SAS entails writing a SAS program, submitting it for analysis, and reviewing the log and
output files to help you understand your data and the results of your analyses.
Click on the SAS icon to start the SAS System. Three main windows appear. You can also use the “View” pull-down menu to open the different types of windows in the SAS environment.
1. ENHANCED EDITOR window (in v. 8.0 and above; provides color codes for syntax)
SAS Program – <filename>.sas
2. LOG window
SAS Log – <filename>.log
3. OUTPUT window
SAS Output – <filename>.lst
Consider each of these to be a file. Differentiate between them based on the extension (sas, log, lst) after
the dot. The SAS environment looks similar to the Windows environment. There are pull-down menus
and icon buttons for controlling some of the features of SAS. Much of SAS, however, requires writing
code using the SAS programming language.
OVERVIEW for WRITING the SAS PROGRAM
General syntax rules when writing SAS code:
1. SAS statements can be written in upper or lower case (or mixed).
2. SAS statements can begin and end in any column, but a word may not be split between any two
lines.
3. More than one SAS statement can be entered per line.
4. Blank lines can be used and are recommended to aid readability.
5. SAS processes statements in steps so make sure that the data steps and procedure steps are in the
proper order.
6. Every SAS statement ends with a semicolon (;).
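A small sketch of these rules in action. The data set DEMO and the variables X, Y and TOTAL are made up for illustration; keywords are deliberately written in mixed case, two statements share one line, and a blank line separates the two steps.

```sas
/* Keywords can be in any case: data, DATA and Data mean the same thing */
data demo;
  input x y;  total = x + y;   /* two statements on one line */
  cards;
1 2
3 4
;

/* A blank line separates the steps; every statement ends with ; */
PROC PRINT DATA=Demo;
RUN;
```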
Building a SAS Program:
1. Practice writing a program in the Editor window. Save the file with a name like TEST.SAS.
2. Format the output with some OPTIONS listed at the top of the program.
OPTIONS PS=66 LS=165 NOCENTER NOFMTERR;
PS= tells SAS to fit up to 66 lines of output per page; ranges from 15 - 32,767 lines
LS= tells SAS to fit up to 165 columns of output per line; ranges from 64 – 256 columns
NOCENTER tells SAS to print output flush left instead of the default centered.
NOFMTERR tells SAS to keep processing when it cannot find a format that has been assigned to a variable
(for example, when the format library is not available); SAS substitutes a default format instead of stopping with an error.
3. Document as much as possible directly in the program with comments – it helps you keep track
of what you are doing in your program and why you are doing it. Comments can go anywhere in
your program.
A SAS comment can start with an asterisk and end with a semicolon:
* PROGRAMMER: AES
DATE: 11/15/02
PROJECT: SAS BASICS WORKSHOP;
You can also use the following, which helps avoid semicolon mishaps:
/* LEARN HOW TO INPUT A RAW DATA FILE */
Within an asterisk comment, do not use semicolons (the first semicolon ends the comment), and avoid unmatched quotation marks (single or double) in any comment.
4. Handling data and invoking procedures always occurs in one of 2 steps in SAS.
DATA step: builds a SAS data set (e.g., adds variables, merges datasets)
OR
PROC step: processes a SAS data set (e.g., produce means, frequencies)
WORKING with RAW DATA
There are a number of different ways to input and read raw data in SAS, i.e., different ways of giving SAS
instructions about the location and format of the variables.
1. Characteristics of a raw data file:
A. Each row represents an observation, containing data values for one subject.
B. Each column represents a variable across all subjects: e.g., sex, birth date, test scores
C. Values assigned to variables can be:
Numeric – includes only numbers
Character – includes letters, sometimes letters and numbers (alphanumeric)
D. The kind of values that are assigned to variables can influence the way in which SAS reads
the data and performs certain analyses. It's important to gain familiarity with your raw data.
2. To create the raw data file, key in the lines of data using Notepad or any other text editor (if you use Word,
save the file as plain text) and save the file as <filename>.dat or .txt. We will use a raw data file called
testdata.txt that is saved in ‘c:\temp\sasbasics’.
3. Avoid errors in keying the data.
A. In the raw data file, data must be entered starting on Line 1.
B. Leave no blank lines at the top or bottom of the file. If a subject is missing a value, do not skip it:
key in blanks with the space bar for as many columns as the value would have occupied, so the
remaining values stay in their correct columns.
C. Make sure variables are keyed into the correct columns.
D. Right-justify numeric data
E. Left-justify character data
F. Use blank columns between variables to aid in readability
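To see why keying blanks matters, here is a minimal sketch with made-up values (the data set KEYED and its variables are hypothetical). Subject 2's AGE was keyed as two spaces in columns 3-4, so SAS reads it as a missing value while IQ is still found in columns 6-8.

```sas
data keyed;
  /* fixed columns: SUBJECT in col 1, AGE in cols 3-4, IQ in cols 6-8 */
  input @1 subject 1. @3 age 2. @6 iq 3.;
  cards;
1 22  98
2    105
3 32  90
;
proc print data=keyed;
run;
```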
Two of the main ways to input a raw dataset with a SAS program include:
1. Using the INPUT and CARDS (or DATALINES) statements to input the actual raw data within
the SAS program. (Note: Use when you have a small set of data.)
2. Using an INFILE statement to point SAS to an external raw data file saved somewhere (e.g.,
floppy disk, hard drive, network). (Note: Better to use when you have a large set of data.)
I. INPUT and CARDS Examples:
Example 1. General Template:
DATA <data-set-name>;
INPUT <variable-name1> <variable-name2> <variable-name3>;
CARDS;
<keyed-in lines of data go here: each row an observation, each column a variable>
;
PROC <name-of-desired-statistical-procedure> DATA=<data-set-name>;
VAR <name of variables to be processed>;
RUN;
Example 1. Sample Program:
* PROGRAMMER: AES
DATE: 11/15/02
PROJECT: SAS BASICS WORKSHOP;
/* LEARN HOW TO INPUT A RAW DATA FILE */
/* Example 1 using INPUT and CARDS commands */
DATA TEMP;
INPUT SUBJECT SATV SATM;
CARDS;
1 520 490
2 610 590
3 470 450
4 410 390
5 510 460
6 580 350
;
* COMMENT: Below is a PROC step, which allows you to manipulate and
analyze your SAS data set. This produces means for SATV and SATM;
PROC MEANS DATA=TEMP;
VAR SATV SATM;
RUN;
Example 2. General Template:
DATA <data-set-name>;
INPUT #line-number @ column-number (variable-name) (column-width.)
@ column-number (variable-name) (column-width.)
@ column-number (variable-name) (column-width.) ;
CARDS;
<keyed-in lines of data go here: each row an observation, each column a variable>
;
PROC <name-of-desired-statistical-procedure> DATA=<data-set-name>;
VAR <name of variables to be processed>;
RUN;
Example 2. Sample Program:
/* Example 2 using INPUT and CARDS commands */
DATA TEMP;
INPUT #1 @ 1 (V1) (1.)
@ 2 (V2) (1.)
@ 3 (V3) (1.)
@ 4 (V4) (1.)
@ 5 (V5) (1.)
@ 6 (V6) (1.)
@ 7 (V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.) ;
CARDS;
2234243 22  98 1
3424325 20 105 2
3242424 32  90 3
3242323 19 119 4
3232143 18 101 5
;
* COMMENT: Below is a PROC step, which allows you to manipulate and
analyze your SAS data set. This produces means for V1, V2, AGE, and IQ;
PROC MEANS;
VAR V1 V2 AGE IQ;
Run;
The parts of the example programs above are described below:
1. DATA statement:
General form: DATA <data-set-name> ;
Data-set-name: TEMP
2. INPUT statement (as in Example 2):
INPUT
#line-number
@ column-number (variable-name) (column-width.)
@ column-number (variable-name) (column-width.)
@ column-number (variable-name) (column-width.) ;
3. Line number directions:
#line-number – tells SAS which line of each subject's data to read the variables that follow from
INPUT #1 – in this example, the variables are read from line 1
4. Column location, variable name, and column width directions:
@ column-number – the column in which each variable begins
(variable-name) – the name given to each variable
(column-width.) – the number of columns occupied by each variable
Note: The column width must be followed by a period. The period marks the number as an informat (w.),
and a number after the period (w.d) tells SAS how many implied decimal places to insert, as in the GPA
example later on. Also, IQ was given 3 columns above even though some IQ values have only 2 digits; those
values are right-justified in the 3-column field.
Don’t forget the semicolon at the end of the INPUT statement!
INPUT #1 @ 1 (V1) (1.) ; – on line 1, starting in column 1, read a variable called V1 that is 1 column wide
5. CARDS (or DATALINES) statement:
The CARDS statement goes at the end of the DATA step (in these examples, right after the INPUT statement) to tell SAS that the raw data lines follow.
There must be a semicolon after the word CARDS and again after the raw data.
6. Data lines:
Data lines are the values for each row/observation/subject. Again, leave no blank lines (otherwise
SAS will think that a subject has missing data) and very carefully check the columns of the
variables to make sure they are aligned correctly. Make sure you have a semicolon on the line
right below your last line of data.
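As a sketch, the first two subjects from Example 2 can be read with DATALINES in place of CARDS. The INPUT statement reads the same columns as in Example 2, written on one line with V1-V7 entered as a string of variables; the lone semicolon after the last data line ends the data.

```sas
/* DATALINES works exactly like CARDS */
data temp;
  input #1 @1 (v1-v7) (1.) @9 (age) (2.) @12 (iq) (3.) @16 (number) (1.);
  datalines;
2234243 22  98 1
3424325 20 105 2
;
proc means data=temp;
  var v1 v2 age iq;
run;
```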
7. PROC and RUN Statements:
PROC tells SAS to perform a given procedure or statistical analysis: e.g., CONTENTS, MEANS,
TTEST, UNIVARIATE, FREQ, GLM, ANOVA, or PRINT.
RUN tells SAS to execute the PROC.
8. General rules for data set names and variable names:
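A. Must begin with a letter or an underscore (not a number)
B. May be no more than 8 characters long (SAS version 7 and later allow up to 32 characters for data set and
variable names)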
C. May contain no special characters such as “*” or “#”
D. May contain no blank spaces
E. Example data set names: MYDATA, survey2, Dissert, TEMP
II. INFILE Example:
General Template:
DATA <data-set-name>;
INFILE <directory-path-and-name-of-data-file>;
INPUT #line-number @ column-number (variable-name) (column-width.)
@ column-number (variable-name) (column-width.)
@ column-number (variable-name) (column-width.) ;
PROC <name-of-desired-statistical-procedure> DATA=<data-set-name>;
VAR <name of variables to be processed>;
RUN;
Example Program:
/* COMMENT: Example using INFILE command */
/* COMMENT: INFILE indicates the name of the data file in which the raw data
exists and where it can be found (need to specify the directory path if the
file is not located in the same place as the program). The INPUT statement
indicates the structure of the data file (as referred to by the INFILE
command). */
DATA TEMP;
INFILE 'c:\temp\sasbasics\testdata.txt';
INPUT #1 @ 1 (V1) (1.)
@ 2 (V2) (1.)
@ 3 (V3) (1.)
@ 4 (V4) (1.)
@ 5 (V5) (1.)
@ 6 (V6) (1.)
@ 7 (V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.)
;
/* This will produce a mean for each variable in the data set because vars
were not specified */
PROC MEANS DATA=TEMP;
RUN;
The code is identical to using CARDS, but the INFILE statement is added and the CARDS statement and
data lines are deleted. Instead of including the raw data in the program, the INFILE statement indicates
where to find the raw data. The INPUT statement is still needed to tell SAS the structure of the raw data.
Additional tips for handling variables when inputting data:

Input a string of variables with the same prefix and different numeric suffixes. Think about the
variables V1-V7 from above. The prefix (V) is the same, but the suffix is a different number. This is
useful when you have a survey or questionnaire with many items. If you have multiple surveys, the
prefix could be some abbreviated form of what the particular survey is.
INPUT #1 @1 (V1-V7) (1.)
@9 (AGE) (2.)
@12 (IQ) (3.) ;
Reading V1-V7 as a string of variables with a single format saves lines of code.
Inputting character variables requires that you indicate in the INPUT statement that it is a character
variable. A $ before the column width tells SAS that it is a character variable.
For example, if we added a variable called SEX, it could be input with values of M or F instead of
values of 1 or 2.
INPUT #1 @1 (V1-V7) (1.)
@9 (AGE) (2.)
….
@18 (SEX) ($1.) ; – the $ indicates character values for SEX
Sometimes multiple lines of data are needed for each subject.
INPUT #1
@ 1 (V1-V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.)
@ 18 (SEX) ($1.)
#2 @ 1 (SATV) (3.)
@ 5 (SATM) (3.) ;
Raw data for this INPUT statement would look like this for 3 subjects:
2234243 22  98 1 M
520 490            – Subject 1 has data for SATV and SATM
3424325 20 105 2 M
                   – Subject 2 is missing data for SATV and SATM, so this line is left blank
3242424 32  90 3 F
390 420
If data are missing for an observation, leave the space there as if the data were present so SAS doesn’t
misalign the rows.
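Putting the pieces together, a complete DATA step for the three subjects above might look like the following sketch (the data set name TWOLINE is made up). The empty second line for subject 2 keeps subject 3 from being misread, and SATV and SATM for subject 2 come out missing.

```sas
/* #1 and #2 tell SAS that each subject's data take up two lines */
data twoline;
  input #1 @1 (v1-v7) (1.) @9 (age) (2.) @12 (iq) (3.)
           @16 (number) (1.) @18 (sex) ($1.)
        #2 @1 (satv) (3.) @5 (satm) (3.);
  cards;
2234243 22  98 1 M
520 490
3424325 20 105 2 M

3242424 32  90 3 F
390 420
;
proc print data=twoline;
run;
```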

Create decimal places on input for numeric variables so you don’t have to key in the decimal point.
If you had a variable called GPA, key it in without the decimal point:
3.56 is keyed as 356
2.20 is keyed as 220
INPUT #1 @ 1 (GPA) (3.2) ; – tells SAS to read 3 columns and place 2 digits after the decimal point
CARDS;
356
220
;
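A minimal sketch of the w.d informat (the data set name GPA is hypothetical). The third value is keyed with its own decimal point; when the data contain a decimal point, SAS uses it and ignores the implied decimal places, so 3.5 stays 3.5.

```sas
data gpa;
  input @1 gpa 3.2;   /* 3 columns, 2 implied decimal places */
  cards;
356
220
3.5
;
proc print data=gpa;
run;
```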

Inputting “check all that apply” questions as multiple variables:
Treat a single question with multiple parts as a set of questions. Each part can have a value of either
0 (not checked) or 1 (checked), making each part a dichotomous variable using dummy coding.
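For example, suppose question 5 offers four options, each keyed in its own column as 1 if checked and 0 if not. In this sketch the variable names ID and Q5_1-Q5_4 and the data are all hypothetical; names ending in a number are used so the options can be written as the string Q5_1-Q5_4. Each option becomes its own 0/1 variable, and SUM counts the options checked.

```sas
data survey;
  input @1 id 1. @3 (q5_1-q5_4) (1.);
  nchecked = sum(of q5_1-q5_4);   /* number of options checked */
  cards;
1 1010
2 0001
3 1111
;
proc freq data=survey;
  tables q5_1-q5_4;   /* one 0/1 frequency table per option */
run;
```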
WORKING with TEMPORARY and PERMANENT DATASETS
The DATA statement tells SAS to build a SAS data set.
1. Building a Temporary SAS Data Set
The syntax for building a temporary SAS data set is:
DATA <data-set-name> ;
INFILE 'drive:\path\filename.dat' ;
INPUT variable information ;
Here, the DATA statement refers to the data-set-name as the name of a temporary SAS data set. TEMP
was used in the previous programs as a data set name.
Example Program:
DATA TEMP;
INFILE 'c:\temp\sasbasics\testdata.txt';
INPUT #1 @ 1 (V1) (1.)
@ 2 (V2) (1.)
@ 3 (V3) (1.)
@ 4 (V4) (1.)
@ 5 (V5) (1.)
@ 6 (V6) (1.)
@ 7 (V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.)
;
This code does not create a physical SAS data set file called testdata. Instead, it reads the physical
raw data file called testdata.txt and creates a temporary SAS data set called TEMP. TEMP is kept in the
WORK library only until the end of the current SAS session; then SAS deletes it. A permanent data set,
however, is kept even after the SAS session ends.
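One way to see where a temporary data set lives: a one-level name such as TEMP is shorthand for the two-level name WORK.TEMP, and SAS empties the WORK library when the session ends. A minimal sketch with made-up data:

```sas
/* A one-level name like TEMP is short for WORK.TEMP */
data temp;
  input subject satv;
  cards;
1 520
2 610
;
/* Both steps print the same temporary data set */
proc print data=temp;
run;
proc print data=work.temp;
run;
```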
2. Building a Permanent SAS Data Set
The syntax that creates a permanent SAS data set is:
LIBNAME libref 'drive:\path';
DATA <libref.filename>; – two-level name
The LIBNAME statement defines a libref, or a nickname, for the drive and the directory path in which to
save or to find the permanent SAS data set.
A libref is 1-8 characters long, contains no spaces, and can start with an underscore (_) or a letter, but not a
number (i.e., it must be a valid SAS name). It works by giving a nickname to the ‘drive:\path’ (enclosed in
quotation marks) for the rest of the current SAS session.
Define all librefs at the beginning of a SAS program to document where permanent SAS data sets are
saved (or used) by the SAS program.
The DATA step tells SAS to create a permanent SAS data set by using a two-level name, i.e.,
<libref.filename>.
The 1st level of the name is the libref, or the previously defined nickname in the LIBNAME
statement to represent the ‘drive:\path’ where the permanent SAS data set is stored. The libref
name is followed by a period.
The 2nd level of the name is the filename of the permanent SAS data set stored in the libref. SAS
automatically appends a file extension to permanent SAS data sets: .SD2 in SAS version 6 for Windows,
and .sas7bdat in version 7 and later.
Example Program:
/* COMMENT: Example of saving a permanent SAS dataset from the raw dataset */
LIBNAME sasdata 'c:\temp\sasbasics' ;
* Step below creates a permanent SAS dataset called testdata.sd2 ;
DATA sasdata.testdata ;
* Step below uses the raw dataset to create testdata.sd2 ;
INFILE 'c:\temp\sasbasics\testdata.txt';
INPUT #1 @ 1 (V1) (1.)
@ 2 (V2) (1.)
@ 3 (V3) (1.)
@ 4 (V4) (1.)
@ 5 (V5) (1.)
@ 6 (V6) (1.)
@ 7 (V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.)
;
PROC CONTENTS DATA=sasdata.testdata ;
RUN;
The LIBNAME statement above uses sasdata as the libref to refer to ‘c:\temp\sasbasics’.
The DATA step (using the two-level name) tells SAS to create a permanent SAS data set called testdata
and to save it in sasdata (a.k.a. c:\temp\sasbasics).
The INFILE statement tells SAS where the raw data set file exists in order to create testdata.sd2, the SAS
dataset.
After this program is run, check that the permanent SAS data set called testdata.sd2 exists in
c:\temp\sasbasics. Also, check the output to see the contents of testdata.sd2.
3. Processing a Temporary SAS Data Set
Now that we have created a permanent SAS data set, we can work with a temporary copy of it. This is helpful
when you are testing out some code and don’t necessarily want to save the changes you are making. Note
that we no longer need to use the INFILE statement to indicate where to find the file; instead, we use a
SET statement.
LIBNAME <libref> 'drive:\path';
DATA <data-set-name> ; – a temporary SAS data set used as the working file for the code that follows
SET <libref.filename> ; – a permanent data set (in some cases it can be a temporary SAS data set) must be
named here using the SET statement so a temporary data set can be created from it
Example Program:
/* COMMENT: Setting a permanent SAS dataset to process temporarily */
LIBNAME sasdata 'c:\temp\sasbasics' ;
* Step below creates a temporary SAS dataset called TEMP ;
DATA TEMP ;
SET sasdata.testdata ;
PROC CONTENTS DATA=TEMP ;
RUN;
Note that no file called temp.sd2 is saved in c:\temp\sasbasics. In the output, the
contents will indicate that this data set is called TEMP.
4. Processing a Permanent SAS Data Set
One way to process a permanent SAS data set to perform a procedure is illustrated in this syntax:
LIBNAME libref 'drive:\path';
PROC <name-of-statistical-procedure> DATA = <libref.filename> ; – two-level name
Note that the INPUT and INFILE statements are not needed now.
The LIBNAME statement defines the libref so that it refers to the ‘drive:\path’ where the permanent
SAS data set is stored.
The PROC statement tells SAS to perform a procedure on the SAS data set. Follow PROC with the name
of the procedure you want SAS to perform (e.g., MEANS, PRINT).
After the procedure name, but before the semicolon, comes the DATA= option, which uses the libref to tell SAS
where the permanent SAS data set is stored (the directory), followed by a period and the filename
of the permanent SAS data set.
Example Program:
/* COMMENT: Setting a permanent SAS dataset on a PROC step */
LIBNAME sasdata 'c:\temp\sasbasics' ;
* Step below prints the data out for the permanent dataset called testdata;
PROC PRINT DATA=sasdata.testdata ;
RUN;
The PROC statement tells SAS to perform the PRINT procedure on the permanent SAS data set
testdata.sd2 stored in c:\temp\sasbasics (as referred to by the libref we created using the LIBNAME
statement, “sasdata”).
Another way to work with permanent data sets is to SET an existing permanent SAS data set in order to
make a new permanent data set with a different name and with changes to the data.
Example Program:
/* COMMENT: Creating a new permanent SAS dataset by setting a permanent SAS
dataset */
LIBNAME sasdata 'c:\temp\sasbasics' ;
* Step below saves a new data set called testdat2.sd2 that is identical to the
data set called testdata.sd2 but with a new variable called sex;
DATA sasdata.testdat2 ;
SET sasdata.testdata ;
* Create a variable called sex based on ID number ;
if number in (1,2,3) then sex = 1;
if number in (4,5) then sex = 0;
PROC PRINT DATA=sasdata.testdat2;
RUN;
The DATA statement tells SAS to save a new permanent SAS data set called testdat2.sd2 in
‘c:\temp\sasbasics’, built by reading the data set testdata.sd2 with the SET statement. Check in
‘c:\temp\sasbasics’ to make sure that it was created. Also, check the output to see that the new variable is
included in the dataset.
Note that an INFILE statement tells SAS which raw data file to use, whereas a SET statement tells SAS
which existing SAS data set (temporary or permanent) to use.
WAYS to MANIPULATE the DATA
Data-manipulation will transform the data set in some way, e.g., add new variables or change existing
variables. Data-manipulation code goes in a DATA step, usually in one of two places:
1) Immediately after the INPUT statement (whether you use CARDS or INFILE)
Example Program:
DATA TEMP;
INFILE 'c:\temp\sasbasics\testdata.txt';
INPUT #1 @ 1 (V1-V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.) ;
if number in (1,2,3) then sex = 1; – data-manipulation & data-subsetting statements go here
if number in (4,5) then sex = 0;
PROC PRINT DATA = TEMP;
RUN;
2) Immediately after the creation of a new data set:
Example Program:
DATA TEMP;
INFILE 'c:\temp\sasbasics\testdata.txt';
INPUT #1 @ 1 (V1-V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.) ;
DATA TEMP2; – name of new data set to create
SET TEMP; – name of existing data set
if number in (1,2,3) then sex = 1; – data-manipulation & data-subsetting statements go here
if number in (4,5) then sex = 0;
PROC PRINT DATA = TEMP; – the variable SEX will not be in this dataset
RUN;
PROC PRINT DATA = TEMP2; – the variable SEX will be in this dataset
RUN;
Ways to manipulate the data can include creating variables in a DATA step with an assignment statement
(see syntax below). Variables can be created or recoded in a DATA step, but not in a PROC step.
1. Create duplicate variables with new variable names:
General syntax:
<new-variable-name> = <existing-variable-name> ;
Examples:
V1 = BDI1;
GENDER = SEX;
2. Duplicating variables vs. renaming variables:
In the previous examples, the variables were not renamed; instead, duplicate variables were
created with new names. Both original and duplicate variables remain in the data set. There is
also a RENAME statement (and a RENAME= data set option) to rename variables without duplicating them.
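Renaming without duplicating is done with the RENAME statement in a DATA step. In this sketch the data sets ORIG and RENAMED and their values are made up (BDI1 and SEX echo the examples above); SEX appears as GENDER in RENAMED, and no duplicate variable is created.

```sas
data orig;
  input bdi1 sex;
  cards;
3 1
1 0
;
/* RENAME changes the name in the new data set; no copy is made */
data renamed;
  set orig;
  rename sex = gender;
run;
proc contents data=renamed;
run;
```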
3. Create new variables from existing variables:
Use these symbols in SAS to perform operations on variables: ( +, - , * , / , = )
Use parentheses and follow rules for order of operations.
Use SAS functions such as SUM, MEAN, or ROUND in an assignment statement.
Always check created variables to verify that they were created correctly.
General syntax:
<new-variable-name> = <formula-including-existing-variable-name> ;
Examples:
VTOTAL = V1 + V2 + V3 + V4 ; – if any of V1-V4 is missing, VTOTAL is set to missing for that observation
VTOTAL = SUM(V1,V2,V3,V4) ; – SUM ignores missing values and adds up the values that are present
Summing variables V1 through V4 creates a new variable called VTOTAL.
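A minimal sketch contrasting the two forms on made-up data (the data set TOTALS and the variables PLUS and SUMFN are hypothetical). The second observation has V2 missing: the + version is missing, while SUM adds the three values that are present.

```sas
data totals;
  input v1-v4;
  plus  = v1 + v2 + v3 + v4;     /* missing if any item is missing */
  sumfn = sum(v1, v2, v3, v4);   /* adds only the nonmissing items */
  cards;
1 2 3 4
1 . 3 4
;
proc print data=totals;
run;
```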
4. Recode variables to have a different value:
SAS can overwrite existing variables or create a new variable to store recoded values.
Variable values can be recoded upon INPUT or recoded after they are saved in a SAS data set.
SAS can recode variable values or ranges into user-specified values with IF-THEN statements.
Example:
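IF SEX = 1 THEN SEXCODE = ‘M’ ;
Because SEX is numeric, it cannot hold the character value ‘M’ (SAS would set it to missing), so the
character recode is stored in a new variable (SEXCODE is just an illustrative name).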
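Ranges of values are recoded the same way with IF-THEN/ELSE statements. In this sketch the cutoffs, the data set AGEGROUPS and the new variable AGEGRP are made up. SAS treats a missing value as smaller than any number, so missing ages are handled first; otherwise they would fall into the lowest group.

```sas
data agegroups;
  input age;
  if age = . then agegrp = .;        /* keep missing ages missing */
  else if age < 20 then agegrp = 1;
  else if age <= 25 then agegrp = 2;
  else agegrp = 3;
  cards;
18
22
32
.
;
proc print data=agegroups;
run;
```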
5. Recode reversed variables:
Sometimes questionnaires have reversed items – a question is stated so that the meaning is the
opposite of the meaning of the other items on the questionnaire.
In general, perform the reversal before other data manipulations are performed on those items. It
is good practice to store recoded variable values as a new variable and leave the existing variable
intact.
<new-variable> = <constant> - <existing-variable> ;
For a scale coded from 1 to k, the constant is k + 1 (the number of response options plus 1).
V1R = 6 - V1 ; (for a scale with 5 response options coded 1-5)
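A minimal sketch, assuming V2 is the reversed item on a 1-5 scale (the data set REVERSED, the new variable V2R and the data are hypothetical). The original V2 is left intact, and a missing V2 gives a missing V2R.

```sas
data reversed;
  input v1 v2;
  v2r = 6 - v2;   /* reverse a 1-5 item: 1<->5, 2<->4, 3 stays 3 */
  cards;
4 1
2 5
3 .
;
proc print data=reversed;
run;
```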
SUBSETTING DATA
Data-subsetting will eliminate unwanted observations from a sample so only a specified subgroup is in
the data set. For example, you only want to look at males and not females, or a particular age range.
Use what is called a sub-setting IF statement to perform analyses on only a subset of observations
included in the data set.
General syntax:
DATA <new-data-set-name> ;
SET <existing-data-set-name> ;
IF <condition> ;
Example:
To obtain the mean for each variable only for ages greater than 20 in the data set:
DATA TEMP2;
SET TEMP;
IF AGE > 20 ;
PROC MEANS DATA = TEMP2; – displays means only for those subjects older than 20
RUN;
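A subsetting IF can also combine conditions, e.g., to keep one sex within an age range. In this sketch the AGE values and the SEX coding follow the workshop examples, but the data sets PEOPLE and YOUNGMEN are made up; only subjects 1 and 2 are kept.

```sas
data people;
  input number age sex $;
  cards;
1 22 M
2 20 M
3 32 M
4 19 F
5 18 F
;
/* Keep only males aged 18 to 25; everyone else is dropped */
data youngmen;
  set people;
  if sex = 'M' and 18 <= age <= 25;
run;
proc means data=youngmen;
  var age;
run;
```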
LABELS for VARIABLES
Use the LABEL statement to associate a label with any or all of the variables. Many SAS procedures
print a variable name followed by its label to help document what is in the output.
General syntax:
LABEL var1 = 'label for var1' – the label can be up to 40 characters (including blanks) in SAS version 6, and up to 256 in version 7 and later
var2 = 'label for var2'
…
var[n] = 'label for var[n]' ;
The LABEL statement tells SAS to associate the label “label for var1” with the variable var1, the label
“label for var2” with variable var2, and so on.

Use the LABEL statement within a DATA step to associate the label(s) permanently with the
variable(s). These labels will be used in subsequent PROCs.
Use the LABEL statement within a PROC step to associate the label(s) temporarily with the
variable(s). Labels associated with variables in a PROC step will be used in that PROC only.
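A minimal sketch of both placements (the data set LABELED and its values are made up). The labels assigned in the DATA step are stored with LABELED, while the label in the PROC MEANS step applies only to that PROC.

```sas
data labeled;
  input age iq;
  label age = 'Age of Subject'     /* stored with the data set */
        iq  = 'IQ of Subject';
  cards;
22 98
20 105
;
proc means data=labeled;
  label iq = 'IQ (this PROC only)';   /* temporary: this PROC only */
run;
```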
FORMATS for VARIABLES
A format is a set of instructions that tells SAS how to print variable values in the output. A format can be
associated with one or more variables temporarily in a PROC step or permanently in a DATA step.
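You need to provide a place for SAS to keep the format library that you create. You use the LIBNAME
statement to do this, and it must come before the PROC FORMAT step that uses it. SAS automatically looks for
user-created formats in the library with the libref LIBRARY, so that libref is used for the format library. PROC FORMAT
stores the formats in a separate catalog file (formats.sc2 in SAS version 6 for Windows; formats.sas7bcat in
version 7 and later). This file must always be available along with the SAS data set that uses the formats, or else
you will encounter errors.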
1. To associate a format temporarily, use the FORMAT statement in a PROC step.
Example:
LIBNAME sasdata 'c:\temp\sasbasics' ;
LIBNAME library 'c:\temp\sasbasics' ;
PROC FORMAT LIBRARY=LIBRARY;
VALUE $sex 'M' = 'Male'
'F' = 'Female' ;
VALUE affinity 1 = 'not at all'
2 = 'a little'
3 = 'in the middle'
4 = 'a lot'
5 = 'I LOVE IT' ;
PROC MEANS DATA=sasdata.testdata;
VAR v1 v2 v3 v4 v5 v6 v7 ;
FORMAT v1-v7 affinity. ;
RUN;
2. To associate a format permanently, use the FORMAT statement in a DATA step.
Example:
LIBNAME sasdata 'c:\temp\sasbasics' ;
LIBNAME library 'c:\temp\sasbasics' ;
PROC FORMAT LIBRARY=LIBRARY;
VALUE $sex 'M' = 'Male'
'F' = 'Female' ;
VALUE affinity 1 = 'not at all'
2 = 'a little'
3 = 'in the middle'
4 = 'a lot'
5 = 'I LOVE IT' ;
DATA TEMP;
SET sasdata.testdata;
FORMAT v1-v7 affinity. ;
PROC MEANS DATA=TEMP;
RUN;
PROCEDURES
1. Examining the variables in a SAS data set
To print descriptor information about a SAS data set, use PROC CONTENTS.
General syntax:
PROC CONTENTS DATA = <libref.filename> or <filename>;
For example, PROC CONTENTS DATA = TEMP; tells SAS to run the CONTENTS procedure on the temporary SAS data set called TEMP.
PROC CONTENTS will list the name, type (numeric or character), length in bytes, and ordinal
position in the SAS data set, for each variable in alphabetical order.
General syntax with options:
PROC CONTENTS DATA = <libref.filename> or <filename> POSITION;
You can use statement options to change the defaults for PROC CONTENTS:
POSITION – will list variables in the order of their position in the SAS data set
SHORT – will print only a list of the variable names in the SAS data set
2. Examining the values in a SAS data set
To print the actual data (the actual observations) in a SAS data set, use PROC PRINT.
General syntax:
PROC PRINT DATA = <libref.filename> or <filename>;
For example, PROC PRINT DATA = TEMP; tells SAS to run the PRINT procedure on the temporary SAS data set TEMP. PRINT numbers
each observation and lists variable values in columns under the variable name.
General syntax with options:
PROC PRINT DATA = <libref.filename> or <filename> DOUBLE NOOBS ;
You can use statement options to change the defaults for PROC PRINT:
DOUBLE – double-spaces output
NOOBS – suppresses printing of the observation number
UNIFORM – formats all pages uniformly (by default, SAS fits as much per page as
possible)
3. Producing frequency tables and crosstabulations
To produce frequency tables and/or crosstabulations and any relevant statistics use PROC FREQ.
General syntax:
PROC FREQ DATA = <libref.filename> or <filename>;
TABLES var
var * var
var * var * var / options ;
var = simple (one-way) frequency table
var * var = crosstabulation (two-way table) where values of the variable before the asterisk (*)
will occupy the rows of the table and the values of the variable after the asterisk will occupy the
columns of the table (row * column).
var * var * var = crosstabulations of the second variable by the third variable for each level of
the first (control) variable (control * row * column).
The slash (/) tells SAS to compute optional statistics for the tables (e.g., / CHISQ ; )
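A sketch using the workshop's SEX and V1 values entered directly (the data sets FREQDEMO and SEXCOUNTS are hypothetical). The first TABLES statement requests two one-way tables and one crosstabulation with chi-square statistics; the second, as an extra, saves the SEX counts to a data set with the OUT= option instead of printing them.

```sas
data freqdemo;
  input number sex $ v1;
  cards;
1 M 2
2 M 3
3 M 3
4 F 3
5 F 3
;
proc freq data=freqdemo;
  tables sex v1 sex*v1 / chisq;         /* two one-way tables + one crosstab */
  tables sex / noprint out=sexcounts;   /* save SEX counts as a data set */
run;
```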
4. Producing univariate descriptive statistics
To calculate univariate descriptive statistics (e.g., mean, standard deviation, maximum, minimum,
median, percentiles) for one or more numeric variables use PROC UNIVARIATE.
General syntax:
PROC UNIVARIATE DATA = <libref.filename> or <filename>;
VAR var1 var2 … var[n] ;
PROC UNIVARIATE can provide additional detail on the distribution of a variable, including
plots, frequency tables, and a test of whether the data are normally distributed. Add the
PLOT, FREQ, and/or NORMAL options to the PROC UNIVARIATE statement to include this
information in the output.
General syntax with options:
PROC UNIVARIATE DATA = <libref.filename> or <filename> PLOT FREQ NORMAL ;
VAR var1 var2 … var[n] ;
PROC UNIVARIATE will print a separate page of output for each variable. It is useful for
examining percentiles and outliers. Use PROC MEANS to print univariate descriptive statistics
for more than one variable on the same page.
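A sketch using the five IQ values from Example 2 (the data sets SCORES and USTATS and the variables IQMEAN and IQMED are hypothetical). The OUTPUT statement, not discussed above, additionally saves the mean and median to a data set.

```sas
data scores;
  input iq;
  cards;
98
105
90
119
101
;
proc univariate data=scores plot freq normal;
  var iq;
  output out=ustats mean=iqmean median=iqmed;   /* save two statistics */
run;
```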
Note that there are many, many more procedures that SAS uses to perform analyses.
TITLES
Document your output with the use of titles. Titles can be used anywhere in the program.
General syntax:
TITLE '<Insert your title here: This is a title to be printed on line 1 of each page of output>' ;
Note that SAS processes a program in steps. A step begins with either a DATA or a PROC statement. A
step ends with another DATA or PROC statement (or the end of the program). All TITLEs encountered
from the beginning of the step until the beginning of the next step are used for the current step. Use
optional RUN; statements to end a step at a specific point. A title stays in effect for later steps until you
replace it with another TITLE statement; cancel it by writing the TITLE; statement with no text following it.
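A short sketch of how titles carry over between steps (the data set T is made up): the PROC MEANS step reuses the title set for the first PROC PRINT, and the empty TITLE; statement cancels it for the last step.

```sas
data t;
  input x;
  cards;
1
2
;
title 'Listing of T';
proc print data=t;
run;

/* No new TITLE statement, so this step reuses 'Listing of T' */
proc means data=t;
run;

/* TITLE with no text cancels the title for the steps that follow */
title;
proc print data=t;
run;
```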
PUTTING A PROGRAM TOGETHER
/* PUT A PROGRAM TOGETHER */
OPTIONS PS = 66 LS = 165 NOCENTER NOFMTERR;
/* Define the librefs first. LIBRARY must be assigned before PROC FORMAT
can store the formats in it. */
LIBNAME sasdata 'c:\temp\sasbasics' ;
LIBNAME library 'c:\temp\sasbasics' ;
/* Assign formats to the variables. Numbers generally don't require formats
unless you categorize them. This step just lays out the formats, but does not
permanently assign them. */
PROC FORMAT LIBRARY=LIBRARY;
VALUE $sex 'M' = 'Male'
'F' = 'Female' ;
VALUE affinity 1 = 'not at all'
2 = 'a little'
3 = 'in the middle'
4 = 'a lot'
5 = 'I LOVE IT' ;
DATA sasdata.testdata ;
INFILE 'c:\temp\sasbasics\testdata.txt';
INPUT #1 @ 1 (V1) (1.)
@ 2 (V2) (1.)
@ 3 (V3) (1.)
@ 4 (V4) (1.)
@ 5 (V5) (1.)
@ 6 (V6) (1.)
@ 7 (V7) (1.)
@ 9 (AGE) (2.)
@ 12 (IQ) (3.)
@ 16 (NUMBER) (1.)
;
/* Create a permanent data set called TESTDAT2. We need a new data set
because we are about to change the data by adding labels and formats to the
variables and creating new variables. We need to SET the data set we want to
work from (called TESTDATA) in order to create the new version (called
TESTDAT2). */
DATA sasdata.testdat2 ;
SET sasdata.testdata ;
/* Create some new variables */
if number in (1,2,3) then sex = 'M';
if number in (4,5) then sex = 'F';
GENDER = SEX;
VTOTAL = V1 + V2 + V3 + V4;
/* Assign labels to the variables. */
LABEL
V1 = 'Variable 1'
V2 = 'Variable 2'
V3 = 'Variable 3'
V4 = 'Variable 4'
V5 = 'Variable 5'
V6 = 'Variable 6'
V7 = 'Variable 7'
age = 'Age of Subject'
IQ = 'IQ of Subject'
number = 'ID Number'
gender = 'Gender of Subject'
vtotal = 'Total sum of V1-V4'
;
/* Permanently assign the formats to the variables. V1-V7 use the same
format. */
FORMAT gender $sex.
V1-V7 affinity. ;
/* When you want SAS to use the data set that you most recently created for a
procedure, you do not need to identify it in the PROC statement. SAS defaults
to the most recently created data set - in this case, TESTDAT2. */
/* Print the variables in the data set for each person. */
PROC PRINT DOUBLE;
TITLE 'Print of data in TESTDAT2.SD2';
RUN;
/* Produce means of the variables in the data set */
PROC MEANS;
TITLE 'Means of numeric variables in TESTDAT2.SD2';
RUN;
/* Correlate age and IQ */
PROC CORR;
VAR AGE IQ;
TITLE 'Correlation b/t age and IQ';
RUN;
AFTER RUNNING YOUR SAS PROGRAM
Always check the log file that is produced when you run a SAS program. Check the number of
observations read. The log will indicate if there are any errors in the program that must be fixed. The log
also provides comments about what SAS did with your program. When an error is found, return to the
program and, starting from the beginning of the program, edit one thing at a time and re-run the program
(this helps isolate where the problem is located because the log doesn’t always specify exactly where the
problem occurred).
MISCELLANEOUS NOTES
• An excellent collection of searchable SAS resources: http://www.ats.ucla.edu/stat/sas/
• SAS can be fairly abstract, but it is also very powerful.
• SAS is great for large data sets with hundreds or thousands of observations and variables.
• SAS relies heavily on programming code as opposed to using icons and pull-down menus to execute
commands.
• SAS is a very logical language and is useful for planning out the steps necessary to do complicated
data work. Also, note that certain statements must go before other statements.
• One of the hardest concepts to grasp is the distinction between a temporary data set and a permanent
data set.
• Know your data well. Know what kind of file you will be working from. Think about whether you
need to build a data file from scratch or utilize an existing data file.
• There are MANY ways to accomplish the same goal in SAS. Go with what feels most comfortable.
• You can always look up how to do things in SAS if you can’t remember!