The SAS® System

Quantify the Example Data




First, code and quantify the data (assign column locations and variable names):
• Use the sample data to create a data set from the first 10 counties
• Include: ID, County, Number of reporting units (v1), Number of employees (v2), Payroll (v3)
• Save to your flash drive as ‘countydata’
The SAS® System
Statistical Analysis Programming
Introduction to SAS®

• Arguably the most popular computer software for conducting statistical data analysis
• Does both data management and statistical analysis
• Useful for managing even the most complex data sets
• Uses its own programming language
Introduction to SAS®

Open the SAS® Window
Introduction to SAS®

You essentially have 4 windows within SAS:
• The Explorer Sidebar Window
• The Log Window
• The Editor Window
• The Output Window
You can resize and reconfigure these windows, and minimize and maximize them as you would in any Windows-based program
Introduction to SAS®

• The Editor Window is for constructing and running programs
• “Programming” in SAS involves writing out step-by-step instructions in the correct order, in a format the SAS System can understand
• The program you write must be syntactically correct; when it is not, SAS reports error messages in the Log Window
SAS® Programming

Three major components to most SAS programs:
• Input
• Manipulation
• Output
SAS® Programming

Input

• Most of the time, data are placed in a data file and read into the program
• The program tells the system which variables are located in which columns
SAS® Programming: Input
[Screenshot: program with the input data and column locations highlighted]
SAS® Programming

Manipulation

Data are then manipulated to
accomplish the tasks for which the
program was written: transforming or
combining variables or conducting
statistical or other analyses
SAS® Programming: Manipulation
[Screenshot: program with the data manipulation statements highlighted]
SAS® Programming

Output
• The results of the program are then written to the Output Window
• You must save these results to keep them
[Screenshots: SAS® Programming: Output (program output) and SAS® Programming: Log]
SAS® Programming

Basic Input Statement = “DATA Step”
• Begins with an “options” statement that formats what the output page will look like
• Names the temporary data set: “data1,” “data2,” etc., or a text name (up to 32 characters in current SAS releases; 8 in SAS Version 6 and earlier)
• Tells SAS where to find your actual data set (the file location)
• Gives the “input”: the column locations for your variables
SAS® Programming: Input
[Screenshot: program with the options, temporary data set, data location, and input column locations labeled]
SAS® Programming

Basic Input Statement



• After your input statement, you add statements to transform or manipulate the data
• Add statements to perform analysis procedures
• Ends with a RUN statement
[Screenshot: program with the data manipulations and transformations and the analysis procedure labeled]
SAS® Programming: Syntax

SAS Statements
• Commands or instructions that can be interpreted by the SAS system
• These commands appear as blue text in the Enhanced Editor window
• Examples: DATA, PROC, PUT, INPUT, RUN, etc.
SAS® Programming: Syntax

Every SAS statement must end in a semicolon (;)
• This is how the system knows the statement is complete
• One of the most common errors is omitting semicolons
• Comments begin with an asterisk (*) and, like any other statement, end with a semicolon
SAS® Programming: Syntax

In the Enhanced Editor:
• Plain text is black
• Numerical values are teal
• SAS statements are blue
• Errors are red
Basic arithmetic operators can be used (+, -, *, /)
SAS® Programming: Logical Operators

Symbol      Abbreviation   Operation
=           eq             equal to
^= or ~=    ne             not equal to
>           gt             greater than
<           lt             less than
>= or =>    ge             greater than or equal to
<= or =<    le             less than or equal to
&           and            and
|           or             or
Building a SAS® Program
1. Open the SAS program and click inside the Editor Window
2. Add your “options” statement:
options nocenter nonumber nodate linesize=88 pagesize=72;
3. Add the “data” statement, then the name of your first temporary data set (data1)
4. Add the “infile” statement, then the file location where your data is stored
5. Add the “input” statement, then each variable name followed by its column location
• A dollar sign $ after a variable name signifies that the variable is character (text) data
• It is recommended that you enter data in 80-column lines; #2 in the INPUT statement signifies the start of a second line for the same case
6. Add statements for data management or statistical analysis
• SAS statements vary based on the task to be accomplished
• Data management: create new variables, change values, etc.
• Statistical procedures: frequencies, correlations, crosstabulations, regression, etc.
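The six steps above could be assembled roughly as follows. This is only a sketch: the file path, the column ranges, and the variable newvar are assumptions made for illustration and must be changed to match your own data file.

```sas
* Step 2: page layout options for the output;
options nocenter nonumber nodate linesize=88 pagesize=72;

* Step 3: name the temporary data set;
data data1;
  * Step 4: hypothetical file location, replace with your own path;
  infile 'E:\countydata.txt';
  * Step 5: variable names and assumed column locations;
  input id 1-2 county $ 4-15 v1 17-20 v2 22-27 v3 29-36;
  * Step 6: data management, here one new variable;
  newvar = v1 + v2;
run;

* Step 6: an analysis procedure;
proc print data=data1;
  var county v1 v2 v3 newvar;
run;
```

After submitting the program, check the Log Window first: any error in the path or the column ranges is reported there.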
Hands-On Exercise 1: Build a Basic SAS Program
• Using SAS, write a basic program for the county data set you created
• For your analysis, run a “print” command:
Proc Print; var county v1 v2 v3;
SAS® Procedures
PROC Commands

• SAS procedures that perform different operations use “PROC” commands
• There are many different PROC commands; we’ll touch on a few of the most used
• Some are for data management
• Some are for statistical analysis
SAS® Procedures

PROC PRINT



• Prints the data you have in your temporary SAS data set
• Will print the variables you designate (either those from your initial INPUT statement, or variables you create)
• Helps you better understand your data set and spot errors
SAS® Procedures
Proc Print; var v1 v2 v3;
• This statement tells SAS to print the data for v1, v2, and v3
• If you run “PROC PRINT” without any variables designated, it will print ALL of your variables
SAS® Procedures

PROC PRINT


• You should run a PROC PRINT when you transform variables or create new variables to ensure that the transformations were done correctly
• Example: create a new variable by adding two others:
newvar = v1+v2;
Proc print; var v1 v2 newvar;
• Check the output to ensure that the operation is correct
Variable Manipulations

SAS will permit you to perform many different types of variable manipulations
• Add Variables
newvar1 = v1+v2+v3;
• Subtract Variables
newvar2 = v3 - v2;
• Multiply Variables
newvar3 = v2 * v3;
• Divide Variables
newvar4 = v2/v1;
• More complex transformations can be done following basic rules for arithmetic operations
newvar5 = (v1+v2/v3)*4;
Variable Manipulations

• You can also use your new variables in other transformations
newvar6 = newvar4*newvar5;
• Create categorical variables
• You can reformat your data into new variables: if you have a survey question with responses showing ‘year of birth’, you can convert it to ‘age’
Variable Manipulations

For example, if you have a series of data for a variable:
• Variable name: “vexample”
• Values: 1 2 3 4 5 6 7 8 9 10
We want to create a categorical variable with the categories and corresponding values of:
• Low = 1
• Medium = 2
• High = 3
Variable Manipulations


Give your new variable a name like “newvexample”
or “vexamplecat”
Your new categorical variable would be created with
this if/then syntax:
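One possible version of that if/then syntax is sketched below. The slides do not state where the category boundaries fall, so the cut points used here (1 to 3 low, 4 to 7 medium, 8 to 10 high) are an assumption for illustration only.

```sas
data data1;
  * Read the ten example values from the lines below;
  input vexample @@;
  * Assumed cut points, 1 to 3 low, 4 to 7 medium, 8 to 10 high;
  if vexample le 3 then vexamplecat = 1;
  else if vexample le 7 then vexamplecat = 2;
  else vexamplecat = 3;
  datalines;
1 2 3 4 5 6 7 8 9 10
;
run;

* Check the new variable against the original;
proc print data=data1;
  var vexample vexamplecat;
run;
```

Using ELSE IF means each case is assigned by the first condition it meets, so the ranges cannot overlap.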
Variable Manipulations

If your data is not as simple as 1 2 3 4 5 6 and so
on, you can use the “PROC SORT” command to
help you sort your data set
Variable Manipulations

Run a PROC SORT for v2, and then run a PROC
PRINT to show the variable rearranged in ascending
order
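A minimal sketch of that sort-then-print sequence follows. The county labels and v2 values are made up for illustration and are not the course data.

```sas
data data1;
  * Hypothetical county labels and employee counts;
  input county $ v2;
  datalines;
CountyA 450
CountyB 35
CountyC 610
CountyD 80
;
run;

* Sort the data set by v2, ascending order is the default;
proc sort data=data1;
  by v2;
run;

* Print the sorted data to see where natural cut points fall;
proc print data=data1;
  var county v2;
run;
```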
Variable Manipulations

Now, create a new variable “newv2” with the following categories:
• Low = 1 (values less than 100)
• Medium = 2 (values 100 to 500)
• High = 3 (values more than 500)
Run a PROC PRINT and PROC FREQ to check your transformations
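The recode and the two checks could look like the sketch below. The data values are hypothetical; the cut points are the ones listed above. The first IF is an added safeguard: SAS treats a missing value as smaller than any number, so without it a missing v2 would be coded as low.

```sas
data data1;
  * Hypothetical county labels and employee counts;
  input county $ v2;
  * Keep missing values missing instead of coding them as low;
  if v2 = . then newv2 = .;
  else if v2 lt 100 then newv2 = 1;
  else if v2 le 500 then newv2 = 2;
  else newv2 = 3;
  datalines;
CountyA 450
CountyB 35
CountyC 610
CountyD 80
CountyE 100
CountyF 500
;
run;

* Case-by-case check of the recode;
proc print data=data1;
  var county v2 newv2;
run;

* Count of cases in each category;
proc freq data=data1;
  tables newv2;
run;
```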
IF/THEN Statements

In the previous exercise, you saw how if /
then statements can be used to create new
variables

If / then statements are very powerful and
can be used in a number of ways to help you
manage your data
IF/THEN Statements
Segmenting Data Sets – the IF statement

Simple IF Statements
• The SAS “IF” command can be used to segment or partition your data set
• For example, suppose you only want to examine certain cases in your data set: only females, only people over age 55, only Florida counties with populations greater than 500,000, etc.
IF/THEN Statements

You can segment in this way, using the IF statement:
• Suppose we only want to examine the number of reporting units in our sample for counties with a “low” number of employees
• “If newv2 is low” looks like this in SAS language:
IF newv2=1;
IF/THEN Statements
Combining IF statements to segment data sets with the DATA command
• It is very useful to combine the IF command to segment data with the DATA command we learned earlier
• Recall that your initial data step started with the command:
data data1;
• This created the initial temporary SAS data set
IF/THEN Statements


• The temporary data set “data1” contained all of the cases that you entered into your data set
• If you now want to examine only a subset of those cases, you can do that in a second data set:
data data2; set data1;
• This creates a second temporary data set called “data2” (remember SAS allows a large number of data sets)
IF/THEN Statements

We can now use an IF statement to segment the data in our set “data2”
• Let’s create a second data set that includes only counties with a “medium” number of employees
• Run a PROC PRINT to check the output
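A sketch of this two-data-set approach is shown below. The values in data1 are hypothetical and newv2 is entered directly to keep the example short; in your own program it would come from the earlier if/then recode.

```sas
data data1;
  * Hypothetical cases, newv2 coded 1 low, 2 medium, 3 high;
  input county $ v1 newv2;
  datalines;
CountyA 12 2
CountyB 3 1
CountyC 15 3
CountyD 5 2
;
run;

* Second data set, keeps only the medium cases;
data data2;
  set data1;
  if newv2 = 2;
run;

proc print data=data2;
run;
```

Because the subset lives in data2, the full set of cases in data1 is still available for other analyses.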
IF/THEN Statements

The PROC PRINT shows us that the temporary data set we’re now dealing with has only the 5 counties with a “medium” number of employees
IF/THEN Statements
Hands-On Exercise
Use the commands we’ve just learned to:
1. Create a new variable for high, medium, and low payroll amounts (newv3)
2. Use the DATA and IF statements to create a new data set that contains only those counties with the highest payroll for gasoline service stations, then run a PROC PRINT to check your results
IF/THEN Statements

The IF and THEN commands are most often
used together with the operators we talked
about before
IF/THEN Statements
More Complex IF statements

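Multiple IF conditions can be connected using “and” or “or” to make more complex statements:
if v1 eq 2 or v2 gt 5 and v3 ne 2 then newvar = 1;
SAS evaluates “and” before “or”, so this reads as v1 eq 2 or (v2 gt 5 and v3 ne 2); add parentheses to make the intended grouping explicit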
IF/THEN Statements
Using IF and THEN statements:
• The general form of this command (for creating new variables, separating data sets, etc.) is:
IF variable meets a condition (written with a comparison abbreviation: eq, ne, gt, lt, ge, le) THEN new variable = value (numeric)
IF v2 eq 5 then newv2 = 1;
• Again, you can combine conditions for more complex statements
Add Variables & Cases

Two other important data management
functions that SAS can perform are adding
additional cases or observations and adding
new variables
Add Variables & Cases

Adding Cases
• The term for adding cases or observations is “concatenation”
• This allows you to add new cases to the bottom of your existing data set
• You simply create a second data set and add it to your initial data set
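Concatenation can be sketched with a SET statement that names both data sets. The cases below are hypothetical, and the name data3 for the combined set is an assumption chosen to follow the data1, data2 naming pattern.

```sas
data data1;
  * Hypothetical initial cases;
  input id county $ v1;
  datalines;
1 CountyA 12
2 CountyB 7
;
run;

data data2;
  * Hypothetical additional cases, same variables as data1;
  input id county $ v1;
  datalines;
3 CountyC 20
4 CountyD 5
;
run;

* Stack data2 underneath data1;
data data3;
  set data1 data2;
run;

proc print data=data3;
run;
```

Both data sets should use the same variable names; a variable present in only one of them is set to missing for the cases from the other.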
Add Variables & Cases
[Diagram: the initial data set and the additional cases are stacked to form the merged set]
Add Variables & Cases
Hands-On Exercise
You have already created one data set of 10 counties
1. Create a new data set containing information for the next 4 counties (Collier, Columbia, De Soto, and Dixie)
2. Add these cases to your existing data set
3. Do a PROC PRINT for data3 to verify
Add Variables & Cases

Adding Variables
• Adding variables to your existing data is simple as well
• Again, you will need to create a second data set that will essentially add a column or columns to your initial data set
• The second data set will contain the new variable you are adding and one variable that exactly matches a variable in your initial data set, usually the sequential ID number (similar to Access)
Add Variables & Cases

To make sure that the data sets are properly
combined, you must SORT the initial and
second data set by the matching variable

The syntax looks like this:
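The original slide showed this syntax as a screenshot that is not reproduced here; the sketch below is one way to write it. The data values, the new variable v4, and the output name data3 are assumptions for illustration.

```sas
data data1;
  * Hypothetical initial data set;
  input id county $ v1;
  datalines;
2 CountyB 7
1 CountyA 12
3 CountyC 20
;
run;

data data2;
  * Hypothetical second data set, the matching id plus a new variable;
  input id v4;
  datalines;
3 310
1 150
2 95
;
run;

* Sort both data sets by the matching variable;
proc sort data=data1;
  by id;
run;

proc sort data=data2;
  by id;
run;

* Combine them side by side, matching on id;
data data3;
  merge data1 data2;
  by id;
run;

proc print data=data3;
run;
```

The BY statement is what makes SAS match rows on id; without it, rows are paired simply by position.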
Add Variables & Cases
[Diagram: the initial data set and the added variables are combined by the merge]
SAS® Statistical Procedures
Descriptive Procedure for Continuous Data
• PROC UNIVARIATE;
• Proc Univariate will provide basic descriptive information for continuous variables
• The syntax looks like this:
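The slide's screenshot of this syntax is not reproduced here; a minimal sketch, using hypothetical values for v1, v2, and v3, might be:

```sas
data data1;
  * Hypothetical units, employees, and payroll;
  input v1 v2 v3;
  datalines;
12 450 9800
3 35 610
15 610 15200
5 80 1500
9 240 5100
20 900 21000
;
run;

* Mean, standard deviation, quantiles, and more for each variable;
proc univariate data=data1;
  var v1 v2 v3;
run;
```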
SAS® Statistical Procedures
Descriptive Procedure for Categorical Data
• PROC FREQ;
• Proc Freq will provide basic descriptive information for categorical or ordinal variables
• The syntax looks like this:
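The slide's screenshot of this syntax is not reproduced here; a minimal sketch, using hypothetical values of the categorical variable newv2, might be:

```sas
data data1;
  * Hypothetical categories, 1 low, 2 medium, 3 high;
  input newv2 @@;
  datalines;
1 2 2 3 1 2 3 2
;
run;

* Frequency, percent, and cumulative totals for each category;
proc freq data=data1;
  tables newv2;
run;
```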
SAS® Statistical Procedures
Analytical Procedures for Continuous Data
• PROC CORR;
• Proc Corr provides an analysis of the association between two continuous variables
• Computes a correlation coefficient that demonstrates the level of association, as well as a p-value showing the significance of that association
• The syntax looks like this:
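The slide's screenshot of this syntax is not reproduced here; a minimal sketch, using hypothetical values for v2 and v3, might be:

```sas
data data1;
  * Hypothetical employees and payroll;
  input v2 v3;
  datalines;
450 9800
35 610
610 15200
80 1500
240 5100
900 21000
;
run;

* Pearson correlation between v2 and v3 with its p-value;
proc corr data=data1;
  var v2 v3;
run;
```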
SAS® Statistical Procedures
[Screenshot: PROC CORR output with the correlation coefficient and p-value labeled]
SAS® Statistical Procedures
Analytical Procedures for Categorical Data
• PROC FREQ;
• Proc Freq can also be used to calculate the level of association between two categorical or nominal variables
• A chi-square (χ²) test can be added to assess the significance level of that association
• The syntax looks like this:
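The slide's screenshot of this syntax is not reproduced here; the sketch below is one way to write it. The data are hypothetical, and treating newv3 as the dependent variable (rows) and newv2 as the independent variable (columns) is an assumption based on the DV, IV labels on the slide.

```sas
data data1;
  * Hypothetical pairs of categories, newv3 then newv2;
  input newv3 newv2 @@;
  datalines;
1 1  1 1  2 1  2 2  2 2  3 2  3 3  3 3
;
run;

* Crosstab of the two variables with a chi-square test;
proc freq data=data1;
  tables newv3*newv2 / chisq;
run;
```

With a sample this small, SAS warns that the chi-square test may not be valid; the example only shows where the CHISQ option goes.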
[Screenshot: PROC FREQ crosstab syntax with the DV and IV labeled, and output showing the crosstab table and chi-square analysis]
SAS® Statistical Procedures


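• PROC FREQ can also be used with the DEVIATION option on the TABLES statement to display how far each cell frequency deviates from its expected value; for a standard deviation, use a descriptive procedure such as PROC UNIVARIATE
• Many SAS procedures like this have additional analyses that can be added in this way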
SAS® Statistical Procedures

Multivariate Analysis:
• PROC REG; computes the association between a continuous dependent variable and numerous independent variables
• PROC LOGISTIC; computes the association between a categorical dependent variable and numerous independent variables
SAS® Statistical Procedures
Regression analysis:
• PROC REG;
• Uses the “model” command
• Construct your model with your dependent variable first, then your independent variables
• The syntax looks like this:
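The slide's screenshot of this syntax is not reproduced here; a minimal sketch follows. The data are hypothetical, and using payroll (v3) as the dependent variable with v1 and v2 as independent variables is an assumption for illustration.

```sas
data data1;
  * Hypothetical units, employees, and payroll;
  input v1 v2 v3;
  datalines;
12 450 9800
3 35 610
15 610 15200
5 80 1500
9 240 5100
20 900 21000
;
run;

* Dependent variable first, then the independent variables;
proc reg data=data1;
  model v3 = v1 v2;
run;
quit;
```

PROC REG stays active after RUN so that more statements can be added; QUIT ends the procedure.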
SAS® Statistical Procedures


• These are only a few examples of the analyses you can do with SAS
• SAS can also do:
  • Time series analysis
  • Factor analysis
  • ANOVA
  • T-tests
  • …and more!