Article I. Introduction to SAS

Statistics Outreach Center
Short Course
Beginning instructions:
Open Word document
Save text file in H:\
Save excel file in H:\
SAS Short Course
COURSE TOPICS
1. How to open SAS
2. Overview: the 5 “main windows” of SAS
3. Components of a SAS program
a. Data step
b. Procedures “step”
c. Other features of all SAS programs
4. SAS datasets
a. Nature of the dataset
b. Data embedded in Editor window
i. List input
ii. Column input
iii. Informats
c. Data from an external file
i. Text file
1. Using filename statement
2. Not using filename statement
ii. Excel file
d. Types of datasets
i. Temporary datasets
ii. Permanent datasets
5. Data analysis & common procedures
a. Contents
b. Print
c. Frequency
d. Means
e. Univariate
f. Sort
6. Miscellaneous
Getting Started
Students at the University of Iowa can use SAS on their “Virtual Desktop.” This site can
be found at: https://virtualdesktop.uiowa.edu/Citrix/VirtualDesktop/auth/login.aspx
To open SAS:
1. Go to the Virtual Desktop website shown above
2. Log in using your HawkID username and password
3. You will see a main menu with several folders.
4. Go to Statistical analysis and you will see SAS 9_3 32 bit and SAS 9_3 64 bit.
Click on “SAS 9_3 64 bit”
5. You will see a pop-up window titled “Getting Started with SAS.” Click
“Close” for the time being.
6. You are ready to begin using SAS
Note:
These instructions can be used on computers that do not have SAS. If you are
using a computer that has SAS installed, you can use SAS directly from the installed
program. Click the “Start” menu in the lower left hand corner of the screen, click on “All
Programs,” and then click on “SAS (English).”
SAS Basics
There are 5 main “windows” you can view when using SAS: Explorer, Results,
Editor, Log, and Output. Explorer and Results are at the bottom of the left-hand panel,
and Editor, Log, and Output are at the bottom of the main panel. A brief description of
what each window does appears below:
Explorer:
This contains the folders “Libraries,” “File Shortcuts,” “Favorite Folders,”
and “My Computer.” The two folders you will use most often are “My Computer” and
“Libraries.” “My Computer” gives you access to
all files on your computer. “Libraries” gives you access to SAS datasets that you create.
Results:
Results from the SAS procedures you have run during
your work session are stored here.
Editor:
This window is where you type in (and edit) your SAS code. Your SAS
program runs from this window.
Log:
After you “run” a program, the Log contains notes concerning your code.
This window in SAS keeps track of how procedures were performed, and gives
indications of any errors in your SAS code.
Output:
Output from the requested procedures will be displayed in the output
window.
Results Output Options:
Tools > Options > Preferences… > Results
Sample SAS Program
List Input
DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/* PROJECT: SAS Short Course */
/*     FOR: COE Students */
/*      BY: Sheila Barron */
/*    DATE: February 05, 2015 */
/*   NOTES: Entering data and checking it */
/*****************************************************************/
DATA CLASSDAT;
INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;
PROC CONTENTS DATA=CLASSDAT VARNUM;
RUN;
PROC PRINT DATA=CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;
/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
Components of a SAS program
In the SAS editor you can type in the commands you want SAS to execute.
A simple SAS program can be thought of as having two important parts (although it is
not necessary that every program have both parts).

SAS data step: The word DATA tells SAS that you want to work with your
dataset – either inputting the data or manipulating the data.

SAS procedures step: The word PROC tells SAS you want to do something
with the data (e.g., print it out, calculate statistics).
o If no dataset is specified with DATA=, SAS uses the most recently created dataset.
A few things to know about SAS:

Each SAS statement must end with a semicolon “;”

At the end of your program you must have a run statement, “RUN;”. Otherwise
the last SAS data step or SAS procedure will not get executed.

SAS comments: Anything written between “/*” and “*/” is treated as
documentation that the person writing the program did not intend SAS to
execute. In other words, SAS will pass over anything that is written between “/*”
and “*/”. A statement that begins with “*” and ends with “;” is also treated as a
comment. It is a good idea to use comments to document what you are doing in
your program. If you come back to the program later, the comments will help you
understand its purpose.
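The two comment styles can be sketched as follows. This is only an illustration: the dataset name DEMO and the variable X are made up for the example and are not part of the course data.

```sas
/* Block comment: SAS skips everything between the delimiters,
   even when the comment runs over several lines. */

* Statement comment: starts with an asterisk and ends at the semicolon;

DATA DEMO;          /* DEMO is a hypothetical dataset name */
  X = 1;            * X exists only for this illustration;
RUN;

PROC PRINT DATA=DEMO;
RUN;
```

A statement comment ends at the first semicolon, so it is safest to keep semicolons out of the comment text itself.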
Running your program:

When you want SAS to execute the statements you have written, click the
“running man” icon on the toolbar. Or click on the Run pull-down menu and
select “submit.”

To run the entire program, make sure nothing is highlighted and click Run.

If you only want to run part of the program, highlight the part you want to run and
then click Run. SAS will only process the part of the program that you have
highlighted.
SAS Datasets
Before SAS can do anything else, it must
know what dataset it is going to use. SAS datasets contain columns corresponding to
specific variables (e.g., height, weight) and rows corresponding to specific
observations (e.g., persons, clinic sites). SAS can read data in two different
ways:
1. Data can be embedded directly in the Editor window
2. Data can be imported from a file (e.g., a text file or an Excel file)
SAS Variables

SAS variables can be one of two types:
1. Character: typically letters or strings of letters and numbers; mathematical
operations cannot be performed on them. (ID, Name, Gender, Grade)
2. Numeric: typically numbers, and mathematical operations can be performed on
them. (Exam1)

Some rules about variable names:
1. Start with a letter or _ (underscore)
2. Can CoNtAin UpPer and LoWer Case
3. Contain only letters, numerals or underscores (_)
4. No Spaces
5. Are not case sensitive
6. 32 characters or fewer
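A minimal sketch of the difference between the two variable types, assuming a made-up dataset name (TYPES) and a made-up derived variable (EXAM1_PLUS5). The "$" after NAME marks it as character; EXAM1 has no "$" and is therefore numeric, so it can be used in arithmetic.

```sas
DATA TYPES;                    /* TYPES is a hypothetical dataset name */
  INPUT NAME $ EXAM1;          /* $ marks NAME as a character variable */
  EXAM1_PLUS5 = EXAM1 + 5;     /* arithmetic works on numeric variables */
  DATALINES;
Max 84
Rosa 94
;

PROC CONTENTS DATA=TYPES;      /* the Type column shows Char or Num */
RUN;

PROC PRINT DATA=TYPES;
RUN;
```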
Data Embedded in Editor Window
Suppose we want to use the following dataset in SAS. Note that each row corresponds to
a specific observation (person), and each column corresponds to a specific variable (ID,
Name, Gender, Exam1, and Grade).
S01   Max     M   84   A
S02   John    M   89   A
S03   Sarah   F   86   B
S04   Lee     M   85   B
S05   Rosa    F   94   A
S06   Ming    F   84   C
Option 1: List Input
Notice that the dataset does not have any missing data and there is always at least 1 blank
space between variables. When your data are set up like this it is OK to list the variables
in the INPUT statement without telling SAS where to find each variable. This is called
“list input” – SAS will read the input statement and expect the variables to be in the order
they are listed and separated by at least one space. If you have missing data that are
represented by blanks, variables whose values include blanks, or variables that have no
spaces between them, “list input” won’t work as written (at a minimum, each missing
value must be entered as a “.”).
INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
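As a small illustration of the missing-data rule, the sketch below uses the first three rows of the class data with one exam score deliberately replaced by a period. The dataset name CLASSMISS and the missing score are assumptions made for the example.

```sas
DATA CLASSMISS;                /* hypothetical dataset name */
  INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
  DATALINES;
S01 Max M 84 A
S02 John M . A
S03 Sarah F 86 B
;

PROC PRINT DATA=CLASSMISS;     /* EXAM1 prints as . for S02 */
RUN;
```

Without the period, SAS would keep reading and take "A" as the exam score for John, which produces an invalid-data note in the Log.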
Option 2: Column Input
Another option is “Column input.” In order to use “column input,” values for each
variable must line up – that is they must always be in the same columns. Then in the
input statement you add column numbers to tell SAS in which column or columns to find
each variable.
INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
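Because column input reads fixed positions, a value that is left blank is simply read as missing. The sketch below shows this with three rows of the class data; the dataset name CLASSCOL and the blank exam score for S02 are assumptions made for the illustration.

```sas
DATA CLASSCOL;                 /* hypothetical dataset name */
  INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
  DATALINES;
S01 Max   M 84 A
S02 John  M    A
S03 Sarah F 86 B
;

PROC PRINT DATA=CLASSCOL;      /* EXAM1 is missing for S02 */
RUN;
```

The data lines must start in column 1 and keep their spacing exactly, since SAS counts columns rather than looking for separators.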
Option 3: Informats
A third way of reading in data is to use SAS informats. SAS informats tell the computer
the format of the data that is to be read in. The most commonly used informats are date
informats. Dates are a little tricky to deal with in computer programs if you want to use
them in calculations. A numeric informat consists of the following pieces:
1. Name
2. Width
3. A period
4. Number of places after the decimal
For example, an informat for a date that is written month, day, year, separated by slashes
(e.g., 11/10/2007) is “MMDDYY10.” The name of this informat is MMDDYY, the
width is 10, next is the period. This is not a number with a decimal so the number of
places after the decimal is omitted.
Another note: Character informats start with a dollar sign ‘$’.
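A short sketch of reading a date with the MMDDYY10. informat. The dataset name VISITS, the variable names, and the second date are made up for the example. The colon before the informat is an input modifier that lets list input use an informat while still stopping at the next blank.

```sas
DATA VISITS;                          /* hypothetical dataset name */
  INPUT ID $ VISIT :MMDDYY10.;        /* read VISIT as a SAS date */
  DAYS_LEFT = '31DEC2007'D - VISIT;   /* dates can be used in arithmetic */
  FORMAT VISIT MMDDYY10.;             /* display the date in readable form */
  DATALINES;
S01 11/10/2007
S02 12/01/2007
;

PROC PRINT DATA=VISITS;
RUN;
```

Without the FORMAT statement, VISIT would print as the number of days since January 1, 1960, which is how SAS stores dates internally.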
We will not be discussing informats in great detail. However, to look up other SAS
informats, go to the HELP menu, select SAS Help and Documentation.
Then go to:
SAS products
> Base SAS
> SAS 9.3 Formats and Informats: Reference
> SAS Informats
> Dictionary of Informats
> Informats by Category
Data from an external file
We will discuss how to import data from two common sources: an EXCEL file and a
TEXT file. For the most part, the input statement will follow all the same rules as if the
data were in the program but you need to tell SAS where to find the data. When
specifying the pathname, you must pay attention to the capitalization used in the
filename.
Data from a text file using FILENAME statement:
DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/* PROJECT: SAS Short Course */
/*     FOR: COE Students */
/*      BY: Sheila Barron */
/*    DATE: February 05, 2015 */
/*   NOTES: Entering data and checking it */
/*****************************************************************/
FILENAME IN1 'H:\SOC_SAS_Short_Course_INTRO_TXT.TXT';
DATA CLASSDAT;
INFILE IN1;
INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
PROC CONTENTS DATA=CLASSDAT VARNUM;
RUN;
PROC PRINT DATA=CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;
/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
Data from a text file without using FILENAME statement (simpler):
DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/* PROJECT: SAS Short Course */
/*     FOR: COE Students */
/*      BY: Sheila Barron */
/*    DATE: February 05, 2015 */
/*   NOTES: Entering data and checking it */
/*****************************************************************/
DATA CLASSDAT;
INFILE 'H:\SOC_SAS_Short_Course_INTRO_TXT.TXT';
INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
PROC CONTENTS DATA=CLASSDAT VARNUM;
RUN;
PROC PRINT DATA=CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;
/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
Importing Data from an Excel file:
Pulldown: File
> Import Data
> Next
> Browse for workbook (select appropriate EXCEL file) [OK]
> Sheet name [Next]
> Data name (Under Member: CLASSDAT) [Next]
> SAS File Name (H:\Intro_Data) [Finish]
[Open new SAS program]
Notice that when you read the data in from EXCEL, SAS tries to assign informats that
seem the most logical. This can be a big help – for example, SAS will often correctly
read in dates. But it can also be a pain when the informat SAS picks is not the correct
one. Thus, after you import data, check carefully that the data were
read in correctly.
Also, the wizard creates SAS code that can serve as a “template.” The wizard does not
have to be used; you can write your own SAS code to import an Excel file. The wizard
would create the following code. Usually it is easiest to save it to your desktop at the end
of the wizard, then copy and paste it into the editor window you are using.
PROC IMPORT OUT=CLASSDAT
DATAFILE= "H:\RA_SAS_Short_Course_INTRO_XLS.xls"
DBMS=EXCEL REPLACE;
RANGE="Sheet1$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
Data in SAS
There are two types of datasets that can be used in SAS: temporary datasets and
permanent datasets.
Temporary datasets:
“Work” datasets are temporary datasets. SAS remembers them during the particular
session that you are working in, but will forget them for subsequent sessions. Up until
this point in time, we’ve only been working with work datasets—hence the
“WORK.______” format for all specified datasets.
Permanent datasets:
Permanent datasets can be created (and stored) using the “<library name>.______”
format for specified datasets. The library name can be almost anything, as long as it is
8 characters or fewer and starts with a letter or underscore. To do this you need to
start your program with a library reference (LIBREF). Then use that reference as the first
part of the dataset name you assign. For example, I like to call my library SAVE so I use
the following libref.
LIBNAME SAVE 'H:\';
LIBNAME lets SAS know that the permanent directory is going to be specified. SAVE
(can be anything) is the name used to refer to the external data library specified by 'H:\',
which is the full pathname.
When specifying the data, use the “<library name>._______” format. For example, using
the LIBNAME statement above, SAVE.CLASSDAT would create a permanent file,
in contrast to the temporary WORK.CLASSDAT which we have been using.
You can then view the dataset by opening the appropriate library in the “Libraries” folder
in the Explorer window.
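The sketch below shows one way to copy a temporary dataset into a permanent library and read it back. It assumes the H:\ drive used throughout this course is available; only two rows of the class data are entered to keep the example short.

```sas
LIBNAME SAVE 'H:\';            /* assumes H:\ exists and is writable */

DATA WORK.CLASSDAT;            /* temporary dataset */
  INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
  DATALINES;
S01 Max M 84 A
S02 John M 89 A
;

DATA SAVE.CLASSDAT;            /* permanent copy stored in H:\ */
  SET WORK.CLASSDAT;
RUN;

/* In a later session, only the LIBNAME statement has to be run again
   before the permanent dataset can be used: */
PROC PRINT DATA=SAVE.CLASSDAT;
RUN;
```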
Sample SAS Program
Creating a Permanent Dataset With List Input
DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/* PROJECT: SAS Short Course */
/*     FOR: COE Students */
/*      BY: Sheila Barron */
/*    DATE: November 13, 2007 */
/*   NOTES: Entering data and checking it */
/*****************************************************************/
LIBNAME SAVE 'H:\DATA';
DATA SAVE.CLASSDAT;
INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;
PROC CONTENTS DATA=SAVE.CLASSDAT VARNUM;
RUN;
PROC PRINT DATA=SAVE.CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;
/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
Ready to Begin Data Analysis
Now that your data is in SAS, you are ready to conduct statistical procedures. SAS has
literally hundreds of procedures that will do just about any quantitative analysis you
want. To get an overview of the procedures go to the HELP menu, select SAS Help and
Documentation and Contents. Then go to:
SAS products
> SAS/STAT 9.3 User’s Guide
In the user guide you will find overviews for different types of analyses as well as details
on specific procedures.
Sample SAS Program
Code for Common Procedures
DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/* PROJECT: SAS Short Course */
/*     FOR: COE Students */
/*      BY: Sheila Barron */
/*    DATE: Feb 05, 2015 */
/*   NOTES: Entering data and checking it */
/*****************************************************************/
DATA WORK.CLASSDAT;
INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;
PROC CONTENTS DATA=WORK.CLASSDAT VARNUM; RUN;
PROC PRINT DATA=WORK.CLASSDAT;
RUN;
PROC PRINT DATA=WORK.CLASSDAT (OBS=3);
VAR NAME GRADE; RUN;
TITLE 'SAS SHORT COURSE';
PROC FREQ;
TABLES EXAM1; RUN;
PROC FREQ;
TABLES EXAM1*GRADE; RUN;
PROC FREQ;
TABLES EXAM1*GRADE /LIST; RUN;
PROC MEANS;
VAR EXAM1; RUN;
PROC UNIVARIATE;
VAR EXAM1; RUN;
PROC SORT;
BY SEX; RUN;
/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
PROC CONTENTS DATA=WORK.CLASSDAT VARNUM;
Produces a listing of the variables in a dataset along with other information about the
dataset. The VARNUM option lists the variables in the order they appear in the dataset
rather than alphabetically.
PROC PRINT;
To print out a dataset (often good to check the data using PROC PRINT before running
any analyses).
PROC PRINT DATA=WORK.CLASSDAT (OBS=3);
VAR NAME GRADE;
If the dataset is small you can print out the whole thing. If it is large you may want to
select particular variables to print using a VAR statement or select particular observations
to print using an OBS= option.
PROC FREQ;
TABLES EXAM1;
To produce a frequency distribution for a variable (specify the variable using the
“TABLES” statement).
PROC FREQ;
TABLES EXAM1*GRADE;
PROC FREQ will also produce two-way (or higher) cross-tabulations of the data.
PROC FREQ;
TABLES EXAM1*GRADE /LIST;
If there are lots of unique values for the variables, you may want to try a LIST option to
produce more concise output.
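Cross-tabulations are usually easier to read when both variables are categorical. As a sketch under that assumption, the example below crosses SEX with GRADE and uses the NOROW, NOCOL, and NOPERCENT options of the TABLES statement to show only the cell counts. The data step repeats the class data so the example can be run on its own.

```sas
DATA WORK.CLASSDAT;
  INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
  DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;

PROC FREQ DATA=WORK.CLASSDAT;
  TABLES SEX*GRADE / NOROW NOCOL NOPERCENT;   /* counts only */
RUN;
```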
PROC MEANS;
VAR EXAM1;
PROC UNIVARIATE;
VAR EXAM1;
To produce means and other descriptive statistics use PROC MEANS or PROC
UNIVARIATE. PROC UNIVARIATE will produce more extensive output. (Note that
the variable to analyze is specified by the VAR statement. If no VAR statement is included,
by default SAS will produce output for all numeric variables.) Sometimes you may want
to save the output to a dataset. This can be accomplished with:
PROC MEANS DATA=SAVE.CLASSDAT;
VAR EXAM1;
OUTPUT OUT=STATS;
RUN;
PROC PRINT DATA=WORK.STATS;
RUN;
This is useful if you want to keep the results as a dataset or if you will use the results
in other analyses. Note that the OUTPUT OUT= statement can be used in other
procedures as well. Often the output dataset will have variables you don’t want.
To get rid of these use the DROP statement. For example:
DATA STATS; SET STATS;
DROP _TYPE_ _FREQ_;
RUN;
PROC PRINT DATA=WORK.STATS;
RUN;
The DROP statement removes the automatically created variables _TYPE_ and _FREQ_ from
the dataset. The print procedure confirms this. For more advanced procedures or for selecting
specific tables created by a procedure, look up the ODS TRACE statement.
PROC SORT;
BY SEX; RUN;
Sometimes you will want to get descriptive statistics for subgroups based on a categorical
variable. When a BY statement is used for this, the data must first be sorted by that
variable (see below). Sorting your data is also helpful if you want to print the data to
examine it.
PROC MEANS DATA=SAVE.CLASSDAT;
VAR EXAM1;
BY SEX; RUN;
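The sort-then-analyze sequence can be sketched with DATA= written out in every step, so there is no doubt about which dataset is sorted and which is analyzed. The name CLASS_SORTED is an assumption for the example. The last step shows the CLASS statement, which also gives subgroup statistics and does not require sorted data.

```sas
DATA WORK.CLASSDAT;
  INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
  DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;

/* OUT= writes the sorted copy to a new dataset (hypothetical name) */
PROC SORT DATA=WORK.CLASSDAT OUT=WORK.CLASS_SORTED;
  BY SEX;
RUN;

PROC MEANS DATA=WORK.CLASS_SORTED;
  VAR EXAM1;
  BY SEX;                      /* needs data sorted by SEX */
RUN;

PROC MEANS DATA=WORK.CLASSDAT;
  CLASS SEX;                   /* no sorting needed */
  VAR EXAM1;
RUN;
```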
NOTE:
In some PROC statements the “DATA=” option is specified; in others it is omitted.
If “DATA=” is omitted, SAS automatically uses the most recently created dataset,
which is not necessarily the dataset used in the last procedure. It is therefore safest to
include “DATA=” whenever more than one dataset exists in your session, for example
after an OUTPUT OUT= statement or PROC IMPORT has created a new one.
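A small sketch of this default behavior, using two made-up datasets (FIRST and SECOND) that each hold a single value:

```sas
DATA FIRST;                    /* hypothetical dataset names */
  X = 1;
RUN;

DATA SECOND;
  Y = 2;
RUN;

PROC PRINT;                    /* no DATA=: prints SECOND, the newest dataset */
RUN;

PROC PRINT DATA=FIRST;         /* explicitly prints FIRST */
RUN;

PROC PRINT;                    /* still prints SECOND: using FIRST did not change the default */
RUN;
```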
Miscellaneous
Note that there are some lines in this program that we have not talked about.

The top line (DM 'LOG; CLEAR; OUT; CLEAR; ';) tells SAS to clear out the log
and output windows. Without this line, each time you run the program, SAS will
add the log and output to the end of the old log and output. This can sometimes
be useful, but it can be confusing after several runs of a program.

The two lines that start with “FILENAME” tell SAS where the log and output are
to be saved (not included in this program).
FILENAME LOG "H:\LOG_SAS.TXT";
FILENAME OUT "H:\OUT_SAS.txt";

The last two lines tell SAS to save the log and output and, if those files already
exist, to replace the old versions with these versions (not used in this program:
these lines are commented out).
When you have written a program it is a good idea to save it. Go to the FILE menu and
click SAVE AS. It will prompt you for a name. After that, you can save your revisions
by selecting SAVE or clicking the save icon. When you come back later, you can open
the program and continue working.
It is always a good habit to give a title to what you are doing. For example:
TITLE "Short Course Example";
After you have completed your procedures, end with the statement TITLE; so that
your title is not carried through the remainder of your session.
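A brief sketch of setting and then clearing a title. The dataset DEMO is made up for the illustration.

```sas
DATA DEMO;                     /* hypothetical dataset name */
  X = 1;
RUN;

TITLE "Short Course Example";
PROC PRINT DATA=DEMO;          /* printed with the title */
RUN;

TITLE;                         /* clears the title */
PROC PRINT DATA=DEMO;          /* printed without the title */
RUN;
```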