Introduction to SAS

Introducing
SAS® software
Acknowledgements to
David Williams
Caroline Brophy
Statistics
in
Science

Need to know
– SAS environment
– SAS files (datasets, catalogs etc) & libraries
– SAS programs
How to:
 Get data in
 Manipulate data
 Get results out

SAS software environment

SAS Windows (SAS 9)

Some (!) SAS windows
– Editor
Where code is written or imported, and submitted
– Log
What happened, including what went wrong
– Output
Results of program procedures that produce output
– Explorer
Shows libraries (SAS & Windows), their files, and where you can see
data, graphs
– Results
Shows how the output is made up of tables, graphs, datasets etc
– Notepad
A useful place to keep bits of code

SAS software programs

SAS Programs
data one;
input x y;
datalines;
-3.2 0.0024
-3.1 0.0033
. . .
;
run;
proc print data = one (obs = 5);
run;
proc means data = one;
run;

DATA step: creates a SAS data set
PROC steps: process data in a data set
Step Boundaries
SAS steps begin with
– a DATA statement, or
– a PROC statement.
SAS detects the end of a step when it encounters
– a RUN statement (for most steps)
– a QUIT statement (for some procedures)
– the beginning of another step (DATA statement or PROC statement).
Recommendation: use RUN; at end of each step
Step Boundaries
data seedwt;
input oz $ rad wt;
datalines;
Low 118.4 0.7
High 109.1 1.3
Low 215.2 2.9
;
run;
/* no RUN; after PROC PRINT: the step ends where the next PROC begins */
proc print data = seedwt;
proc means data = seedwt;
class oz;
var rad wt;
run;
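As a rough illustration of the QUIT boundary listed earlier, the sketch below uses PROC DATASETS, one of the procedures that stays active across RUN statements until it meets QUIT. It assumes the seedwt data set created above already exists in the work library.

```sas
/* PROC DATASETS stays active after RUN; and ends only at QUIT; */
proc datasets library = work nolist;
  contents data = seedwt;   /* executed when the RUN below is reached */
run;
  /* further DATASETS statements could be submitted here */
quit;
```

For ordinary procedures such as PRINT and MEANS, RUN; alone is enough.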

Submitting a SAS Program
When you execute a SAS program, the output generated
by SAS is divided into two major parts:
– SAS log: contains information about the processing of
the SAS program, including any warning and error messages.
– SAS output: contains reports generated by SAS
procedures and DATA steps.

Recommended steps!
1) Submit all (or selected) code by
 F8 (the default Submit key; F4 recalls previously submitted code)
 Click on the runner in the toolbar
2) Read log
3) Look in output window
if you expect code to produce output
4) Problems
 Bad syntax
 Missing ; at end of line
 Missing quote ’ at end of title (nasty!)

Improved output - HTML
Tools → Options → Preferences → Results
Do this & resubmit code
Check HTML output in Results Window

SAS data sets

SAS data sets
• SAS procedures (PROC … ) process data from SAS
data sets
• Need to know (briefly!)
– What a SAS data set looks like
– How to get our data into a SAS data set

SAS data sets
• live in libraries
• have a descriptor part (with useful info)
• have a data part which is a rectangular table
of character and/or numeric data values
(rows called observations)
• have names with syntax
<libname.>datasetname
libname defaults to work if omitted

work library
SAS data sets with a single part name like
oz, wp or mybestdata99
1) are stored in the work library
2) can be referenced e.g. as
mybestdata99 or work.mybestdata99
3) are deleted at end of SAS session!
Don’t lose your data!
Keep the SAS program that read the data from its
original source
. . . More later!
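One way to keep a copy beyond the session is to write the data set to a permanent library. The sketch below is only an illustration: the library name mylib and the folder path are hypothetical, and the folder must already exist.

```sas
/* mylib and the path are hypothetical; point the path at an existing folder */
libname mylib 'C:\myproject\sasdata';

data mylib.mybestdata99;      /* two-level name: stored permanently in mylib */
  set work.mybestdata99;      /* copy from the temporary work library */
run;
```

In a later session the same LIBNAME statement makes mylib.mybestdata99 available again.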

Viewing descriptor & data
/* view descriptor part */
proc contents data = wp;
run;
/* view data part */
proc print data = work.wp;
run;
Alternatively:
Use SAS Explorer: Open (for data) Properties (for descriptor)
Properties is not as clear as CONTENTS

SAS variables
There are two types of variables:
• character
contain any value: letters, numbers, special
characters, and blanks.
Character values are stored with a length of 1 to 32,767
bytes (default is 8).
One byte equals one character.
• numeric
stored as floating point numbers in 8 bytes
of storage by default.
Eight bytes of floating point storage provide space for 16 or
17 significant digits.
You are not restricted to 8 digits.
Don’t change the 8 byte length!
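To illustrate the default character length of 8, here is a minimal sketch in which a LENGTH statement makes room for longer text before it is read. The data set name, variable names and values are invented for the example.

```sas
data trees;                 /* hypothetical data set */
  length species $ 15;      /* without this, list input keeps only 8 characters */
  input species $ height;
  datalines;
Sycamore 14.2
Horsechestnut 21.5
;
run;

proc contents data = trees; /* lists the type and length of each variable */
run;
```

The numeric variable height is left at its default length of 8 bytes.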

SAS variables
OUTPUT
The CONTENTS Procedure
Alphabetic List of Variables and Attributes
#    Variable    Type    Len
1    oz          Char      8
2    rad         Num       8
3    wt          Num       8
SAS names
– for data sets & variables
• can be up to 32 characters long.
• can be uppercase, lowercase, or mixed-case
but are not case sensitive!
• must start with a letter or underscore. Subsequent characters can
be letters, underscores, or numeric digits
- no %$!*&#@ or spaces.
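A short sketch of names that follow these rules. All data set and variable names here are made up for illustration.

```sas
data Seed_Weights_2024;     /* letters, digits and underscores; starts with a letter */
  _rad = 118.4;             /* a name may start with an underscore */
  Wt1 = 0.7;
  total = _RAD + wt1;       /* not case sensitive: refers to _rad and Wt1 */
run;
```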

Missing Data Values
A value must exist for every variable for each observation.
Missing values are valid values.
LastName    FirstName    JobTitle    Salary
TORRES      JAN          Pilot        50000
LANGKAMM    SARAH        Mechanic     80000
SMITH       MICHAEL      Mechanic         .
WAGSCHAL    NADJA        Pilot        77500
TOERMOEN    JOCHEN                    65000
A character missing value is displayed as a blank.
A numeric missing value is displayed as a period.
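A small sketch of how missing values can be entered and detected, assuming a made-up data set staff with illustrative values. With list input, a period in the data lines stands for a missing value, and the MISSING function tests for one.

```sas
data staff;                          /* hypothetical data set */
  length LastName $ 12 JobTitle $ 10;
  input LastName $ JobTitle $ Salary;
  no_salary = missing(Salary);       /* 1 if Salary is missing, otherwise 0 */
  datalines;
TORRES Pilot 50000
SMITH Mechanic .
TOERMOEN . 65000
;
run;

proc print data = staff;
run;
```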
SAS syntax
• Not case sensitive
• Each statement usually begins with a keyword
and ends with a semicolon (;)
• Common Errors:
– Forgetting the ;
– Misspelt or wrong keyword
– Missing final quote in title
title 'Woodpecker Habitat; /* quote mark missing */
title 'Woodpecker Habitat';

Comments
1. Type /* to begin a comment.
2. Type your comment text.
3. Type */ to end the comment.
• To comment out selected text, remember: Ctrl+/
• Alternative:
* comment ;
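Both comment styles in use, as a minimal sketch built around the wp data set mentioned earlier:

```sas
/* Block comment: can span
   several lines */
proc print data = work.wp;   /* can also follow a statement on the same line */
run;

* Statement comment: starts with an asterisk and ends at the semicolon;
```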

SAS
Creating a SAS data set

Getting data in!
Consider 2 methods
1) Data in program (briefly!)
2) Data in Excel workbook
Getting data in!
Data in program file:
data oz;
input oz $ rad wt;
datalines;
Low 118.4 0.7
High 109.1 1.3
Low 215.2 2.9
. . .
;
run;
Note:
1. oz is a text variable so requires $
2. No missing values
3. Values of oz
• don’t contain spaces
• are at most 8 characters long
Getting data in!
from Excel
• Use the IMPORT wizard,
saving the program it generates to reduce future clicking!
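The program saved by the wizard is a PROC IMPORT step. A sketch of what such a step typically looks like is given below. The file path, sheet name and output data set name are hypothetical, and the DBMS value that works depends on the SAS version and on whether SAS/ACCESS Interface to PC Files is installed.

```sas
/* Path, sheet and data set names are hypothetical */
proc import datafile = 'C:\myproject\seedweights.xlsx'
            out = work.seedwt
            dbms = xlsx
            replace;
  sheet = 'Sheet1';
  getnames = yes;      /* first row holds the variable names */
run;
```

Resubmitting the saved step re-reads the workbook without going through the wizard again.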

Creating new variables
Adding a new variable to an existing SAS data
set (say work.old)
1. Use set
2. Give definition of new variable
data new;
/* read data from work.old */
set old;
ysquared = y**2;
logy = log(y);           /* natural log; missing if y <= 0 */
logy_base10 = log10(y);
t1 = (treat = 'B');      /* 1 if treat is B, otherwise 0 */
run;

Data set: work.new
Obs    treat        y    ysquared        logy    logy_base10    t1
  1    A         10.0      100.00     2.30259              1     0
  2    A        100.0    10000.00     4.60517              2     0
  3    B        -10.0      100.00           .              .     1
  4    B          0.0        0.00           .              .     1
  5    B          0.1        0.01    -2.30259             -1     1
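The missing values of logy and logy_base10 for observations 3 and 4 arise because the logarithm is undefined for y ≤ 0; SAS sets the result to missing and writes a note to the log. One possible way to make that explicit, sketched here with the same variable names, is to compute the logs only when y is positive.

```sas
data new;
  set old;
  if y > 0 then do;
    logy = log(y);
    logy_base10 = log10(y);
  end;
  /* for y <= 0 both variables are simply left missing (.) */
run;
```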
Data Screening

Data Screening
checking input data for gross errors
• Use PRINT procedure to scan for obvious anomalies
• Use MEANS procedure & examine summary table
– MAXIMUM, MINIMUM – reasonable?
– MEAN - near middle of range?
– MISSING VALUES - input or calculation error e.g.
log(0)?
– CV (= 100*std.dev/mean) - < 10% for plant growth,
between 12 & 30% for animal production variables, >
50% implies skewness for any positive variable
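A minimal sketch of such a screening run, using the seedwt data set from earlier as a stand-in for your own data. The statistics requested are standard PROC MEANS keywords; the choice of 20 rows and 2 decimal places is arbitrary.

```sas
proc print data = seedwt (obs = 20);   /* scan the first rows for anomalies */
run;

proc means data = seedwt n nmiss min max mean std cv maxdec = 2;
  class oz;
  var rad wt;
run;
```

NMISS counts the missing values of each variable, and CV is reported as a percentage.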

SAS syntax
MEANS syntax
What else should go here?

Dealing with data errors
• Check original records
• Change mistakes in recording where the correct
value is beyond question
• Regenerate observations where possible – e.g.
reweigh sample, redo chemical analysis
• With a large body of data in an unbalanced
design err on the side of omitting questionable
data
Do not proceed until data has been
properly cleaned – if necessary
perform a number of screening runs
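One possible way to record corrections and omissions in a program, so that they can be rerun and checked, is sketched below. The corrected value and the cut-off are purely hypothetical, and seedwt stands in for your own data set.

```sas
data seedwt_clean;                     /* leave the raw data set unchanged */
  set seedwt;
  /* correct a recording mistake where the true value is beyond question */
  if oz = 'Low' and rad = 215.2 then rad = 115.2;   /* hypothetical correction */
  /* omit a questionable observation */
  if wt > 50 then delete;                            /* hypothetical cut-off */
run;
```

Rerunning the screening steps on seedwt_clean then shows whether further cleaning is needed.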
