SAS Chapter 3 - University of South Carolina

Chapter 3: Working With Your Data
**SAS Alert**
In this chapter, we will see some examples of SAS at its worst. The “implicit” looping of the DATA step makes several operations that would be easy in R difficult or clumsy in SAS.
© Fall 2011 John Grego and the University of South Carolina
Working With Your Data
Creating variables based on other variables is easily done within the DATA step
Assignment is carried out with the = sign

input var1 var2 var3;
mysum=var1+var2+var3;
mycube=var3**3;
always6=6;
newname=var2;
Working With Your Data
Order of operations is followed, but use parentheses when necessary and for clarity
Previously defined variables can be overwritten

var1=var1-15;
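
The assignment statements above can be put together into one complete DATA step. This is a minimal sketch: the data set name demo, the variable ratio, and the data values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical data set and values, for illustration only */
data demo;
    input var1 var2 var3;
    mysum  = var1 + var2 + var3;     /* sum of three variables within one observation */
    mycube = var3**3;                /* ** is the exponentiation operator */
    ratio  = (var1 + var2) / var3;   /* parentheses force the addition to happen first */
    var1   = var1 - 15;              /* overwrite a previously defined variable */
    datalines;
10 20 2
30 40 4
;
run;

proc print data=demo;
run;
```

Because mysum is computed before var1 is overwritten, it uses the original value of var1.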
Working with your data


In addition to simple math expressions, you can use built-in SAS functions to create variables
Section 3.3 (pages 78-81) lists many built-in functions
Working with your data


Some of the most useful: LOG, MEAN, ROUND, SUM, TRIM, UPCASE, SUBSTR, CAT, COMPRESS, DAY, MONTH
Note: MEAN takes the mean of several variables, not the mean of all values of one variable. Similarly with SUM, etc. They are row operators, not column operators.
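
The row-versus-column distinction can be sketched as follows. The data set scores and its variables are hypothetical. The MEAN and SUM functions work across variables within each observation, while a procedure such as PROC MEANS summarizes one variable across all observations.

```sas
/* Hypothetical test scores; the second observation has a missing value */
data scores;
    input test1 test2 test3;
    row_mean = mean(test1, test2, test3);   /* mean across variables, one value per observation */
    row_sum  = sum(test1, test2, test3);    /* missing arguments are ignored */
    datalines;
80 90 100
70 . 90
;
run;

/* For the mean of all values of one variable (a column), use a procedure */
proc means data=scores mean;
    var test1;
run;
```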
Using IF-THEN Statements

Conditional statements in SAS rely on several important keywords like IF, THEN and ELSE and logical keywords like EQ, NE, GT, LT, GE, LE, IN, AND, OR

All of these except IN have symbolic equivalents (see page 82)
IN: Checks whether a variable value occurs in a specified list

Using IF-THEN Statements

An IF-THEN statement is a simple conditional statement, usually resulting in only one action, unless the keywords DO and END are specified (like braces in R)

IF X>0 AND X<2 THEN Y=X;
ELSE Y=2-X;
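
When more than one action is needed, DO and END group the statements. The following is a minimal sketch; the data set demo, the variable region, and the data values are hypothetical.

```sas
data demo;
    input x;
    length region $ 7;            /* set the length so 'outside' is not truncated */
    if x > 0 and x < 2 then do;   /* DO ... END groups several actions */
        y = x;
        region = 'inside';
    end;
    else do;
        y = 2 - x;
        region = 'outside';
    end;
    datalines;
0.5
1.5
3
;
run;
```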
Using IF-THEN Statements
Several conditions may be checked using ELSE IF or ELSE statements
The last action is carried out if none of the previous conditions are true

IF .. THEN ..;
ELSE IF .. THEN ..;
ELSE ..;
Using IF-THEN Statements


Using several ELSE statements is more efficient than using several IF-THEN statements (though errors in logic are more likely)
Note: Parentheses may be useful with AND/OR statements.
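
As an illustration of parentheses with AND/OR and of the IN keyword, here is a short sketch. The data set trees, its variables, and the cutoffs are hypothetical. SAS evaluates AND before OR, so the parentheses around the OR condition change the meaning.

```sas
data trees;
    input species $ height age;
    /* Without the inner parentheses, AND would be evaluated before OR */
    if species in ('Pine', 'Oak') and (height > 50 or age > 100) then big = 1;
    else big = 0;
    datalines;
Pine 60 40
Oak 30 20
Elm 70 150
;
run;
```

Only the first observation gets big=1: the Oak fails both size conditions, and the Elm is not in the list.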
Using IF-THEN Statements

Be careful with missing values when making comparisons!
SAS considers missing values to be “less than” practically any value, so if the data contain missing values, handle them separately

IF weight=. THEN size='unknown';
ELSE IF weight<25 THEN size='small';
ELSE IF ..
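
A complete version of this pattern might look like the sketch below. The data set sized, the final category 'large', and the data values are assumptions added for illustration; the slide leaves the remaining conditions unspecified.

```sas
data sized;
    input weight;
    length size $ 7;                           /* long enough for 'unknown' */
    if weight = . then size = 'unknown';       /* handle missing values first */
    else if weight < 25 then size = 'small';
    else size = 'large';                       /* hypothetical final category */
    datalines;
12
.
40
;
run;
```

If the missing-value check were omitted, the second observation would be labeled 'small', because a missing value compares as less than 25.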
Using IF to select a subset of data
We can retain cases from the data using logical operators with a subsetting IF.
The syntax is unusual; it seems as though part of the statement is missing

DATA B; SET A; IF type='Pine';
Using IF to select a subset of data


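Data set B will then include only the cases that match the condition
Most people get used to this odd syntax, but you can write the action explicitly with IF type='Pine' THEN OUTPUT; (or drop the other cases with IF .. THEN DELETE;) if this makes you really uncomfortable. Note that the KEEP statement selects variables, not cases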
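
The subsetting IF and two explicit ways of writing the same selection can be compared side by side. This is a sketch with a hypothetical data set a; the names b2 and b3 are also made up for the comparison.

```sas
/* Hypothetical input data */
data a;
    input type $ height;
    datalines;
Pine 40
Oak 55
Pine 62
;
run;

data b;
    set a;
    if type = 'Pine';                 /* subsetting IF: keep only matching cases */
run;

data b2;
    set a;
    if type = 'Pine' then output;     /* same result, with the action written out */
run;

data b3;
    set a;
    if type ne 'Pine' then delete;    /* same result, by deleting non-matching cases */
run;
```

All three data sets contain the two Pine observations.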
SAS Dates



SAS stores dates internally as the number of days since January 1, 1960
Special informats for reading dates (pp. 44-45)
When a year is specified by two digits ('03, '45, etc.), you can use YEARCUTOFF to specify the century
SAS Dates
The default is 1920; SAS assumes dates range from 1920 to 2019.
It’s better to simply avoid this ambiguity

options yearcutoff=1930;
options yearcutoff=1800;
SAS Dates
Handy function: TODAY() is set to the current date
Special date form (a date literal) for logical operators

if birthdate>'01JAN1988'd then age='Under 21';
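
The pieces above (YEARCUTOFF, a date informat, a date literal, and TODAY()) can be combined in one sketch. The data set people, the variables agegroup and days_alive, the labels, and the data values are hypothetical. With yearcutoff=1930, two-digit years are read as falling between 1930 and 2029, so 45 becomes 1945 and 03 becomes 2003.

```sas
options yearcutoff=1930;      /* two-digit years fall in 1930-2029 */

data people;
    input name $ birthdate :mmddyy8.;        /* read dates such as 03/15/45 */
    length agegroup $ 12;
    if birthdate > '01JAN1988'd then agegroup = 'born later';   /* date literal */
    else agegroup = 'born earlier';
    days_alive = today() - birthdate;        /* dates are stored as day counts */
    format birthdate date9.;
    datalines;
Ann 03/15/45
Bob 07/04/03
;
run;
```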
SAS Dates
Printing dates in a conventional format: Use a FORMAT statement in PROC PRINT
We can also output data using date formats (pp. 90-91)
Other useful functions: MONTH(), DAY(), YEAR(), MDY(), QTR()
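
A short sketch of the date functions and of a FORMAT statement in PROC PRINT follows. The data set visits and its variables are hypothetical.

```sas
data visits;
    input visit_date :date9.;                 /* read dates such as 15MAR2011 */
    visit_month = month(visit_date);
    visit_year  = year(visit_date);
    visit_qtr   = qtr(visit_date);
    first_of_month = mdy(visit_month, 1, visit_year);   /* build a date from month, day, year */
    datalines;
15MAR2011
02NOV2010
;
run;

proc print data=visits;
    format visit_date first_of_month mmddyy10.;   /* print as mm/dd/yyyy, not as day counts */
run;
```

Without the FORMAT statement, the two date variables would print as the number of days since January 1, 1960.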

RETAIN statement




The RETAIN statement tells SAS to retain the value of a variable as SAS moves from observation to observation
A clumsy solution to a common SAS problem
Can be useful for cumulative analyses
A sum statement creates a cumulative sum:
cumul_sum+value_added;
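
RETAIN and the sum statement can be seen together in a small sketch. The data set running, the variable max_so_far, and the data values are hypothetical.

```sas
data running;
    input value_added;
    retain max_so_far;                              /* keep the value from the previous observation */
    max_so_far = max(max_so_far, value_added);      /* MAX ignores the initial missing value */
    cumul_sum + value_added;                        /* sum statement: retained, starts at 0 */
    datalines;
5
3
8
;
run;
```

For these values cumul_sum is 5, 8, 16 and max_so_far is 5, 5, 8. Without RETAIN, max_so_far would be reset to missing at the start of every observation.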
Using arrays



We have seen how to alter variables that have been read into a SAS data set
Sometimes we want to operate on more than one variable in the same way
This can be accomplished quickly by creating an array (another example of an awkward solution to a common problem in SAS)
Using arrays


An array is a group of variables (either all numeric or all character)
These could be already-existing variables or new ones
Defining an array
Once an array is defined, you can refer to its variables using “subscripts”. Ex: array_name(2)
Using a DO statement is clunky; I like to use DO OVER instead

ARRAY array_name (n) $ .. .. ..;
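
As a sketch of the general idea, the following uses an explicitly subscripted array with an iterative DO loop. The data set cleaned, the array name answers, and the rule that 9 codes a missing answer are all hypothetical. An indexed DO loop is shown here because it works with an array declared with a dimension.

```sas
data cleaned;
    input q1 q2 q3;
    array answers(3) q1-q3;                  /* group three existing numeric variables */
    do i = 1 to dim(answers);                /* DIM returns the number of elements */
        if answers(i) = 9 then answers(i) = .;   /* hypothetical rule: 9 means missing */
    end;
    drop i;                                  /* the loop index is not needed in the output */
    datalines;
1 9 3
9 2 2
;
run;
```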
Shortcuts for lists of variables

Suppose variable names begin with a
common character string, and end with a
number sequence:
var1, var2, var3, var4

You can refer to them in shortcut fashion:
var1-var4
Shortcuts for lists of variables

When specifying abbreviated lists in functions, you must use the keyword OF

sum(of var1-var4);
mean(of var1-var7);
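
A minimal sketch of the OF keyword with a numbered range list follows; the data set totals and the data values are hypothetical. Without OF, an expression such as var1-var4 is read as a subtraction rather than as a list, which is why the keyword is required.

```sas
data totals;
    input var1-var4;                   /* numbered range list in the INPUT statement */
    total   = sum(of var1-var4);       /* sum of var1, var2, var3, var4 */
    average = mean(of var1-var4);      /* mean of the same four variables */
    datalines;
1 2 3 4
5 6 7 8
;
run;
```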
Shortcuts for lists of variables

You can abbreviate lists of named variables using a double hyphen:
firstvar--lastvar

These must follow the creation order of the variables as defined in the SAS data set. Check this either in the worksheet or by entering:
proc contents data=dsname position;
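
The double-hyphen list depends on variable position, as this sketch illustrates. The data set survey and its variables are hypothetical.

```sas
data survey;
    input id age height weight score;
    total = sum(of age--weight);     /* age, height, weight: every variable between them by position */
    datalines;
1 30 170 65 88
;
run;

/* List the variables in creation order to check what a -- list covers */
proc contents data=survey position;
run;
```

Here age--weight covers age, height, and weight, but not id or score, because of where those variables sit in the creation order.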
Special variables




_ALL_ is short for “all variables in the data set”
_NUMERIC_ is short for “all numeric variables in the data set”
_CHARACTER_ is short for “all character variables in the data set”
_N_ is short for “current observation index in the data set”
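
These special names can be tried out in a short sketch; the data set indexed and its variables are hypothetical. _N_ is an automatic variable that is not saved in the output data set, so it is copied into an ordinary variable here.

```sas
data indexed;
    input name $ x y;
    row = _n_;              /* copy _N_ so the index is kept in the data set */
    datalines;
a 1 2
b 3 4
;
run;

proc print data=indexed;
    var _character_;        /* all character variables: name */
run;

proc means data=indexed;
    var _numeric_;          /* all numeric variables: x, y, row */
run;
```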