Introduction to SAS
Available at
http://brac.umd.edu/~raulcruz
1
What is SAS?


SAS = “Statistical Analysis System”:
developed for both data manipulation
and data analysis; SAS Institute was founded in 1976
Visit the SAS website:
http://www.sas.com
2
Basics of SAS

5 Windows



EDITOR – file where you write code and comments for
execution by SAS (save as .sas)
LOG – file where notes about the execution of the program
are written, as well as errors (save as .log)
OUTPUT – file where results from the program are written
(save as .lst)

Explorer Window

Results Window
3
The SAS interface consists of multiple windows designed for specific functions.
The following windows are open by default:
Enhanced Editor Window: Type SAS programs here. The "enhanced" editor has
more advanced features than the traditional "program
editor" used in SAS 6.12.
Output Window: View the results of SAS procedures, including
tables and line charts. Graphs are displayed
in a separate Graph window.
Log Window: View SAS programs as they
execute, including error
messages and warnings.
Explorer Window: Browse your SAS tables
(datasets) and libraries. Create
new files and file shortcuts.
Results Window: Displays a hierarchical
outline of SAS results to
simplify output navigation.
4
SAS Menus







File: file input/output
Edit: editing the contents of any window
• Contents of the LOG and OUTPUT windows cannot be
edited, but they can be cleared
View: view programs, log files, outputs, and data sets
Tools: editors for graphics, reports, tables, etc.
Solutions: analysis without writing code
Window: navigating among windows
Help: help information for SAS
5
SAS toolbar


The toolbar gives you quick access to
commands that are already accessible
through the pull down menus
Not all operating environments have a
toolbar
6
SAS command bar


The command bar is where you can
type SAS commands.
Most commands you can type in the
command bar are also accessible through
the SAS menus or the toolbar
7
Controlling your windows



The window pull-down menu
Type the name of the window in the
command bar
Click on the window
8
Basic Rules of SAS Code







Every SAS statement ends with a semicolon ;
Lines of data are NOT separated by semicolons
SAS statements can extend over multiple lines
provided you do not split a word of the statement
across lines
More than one statement can appear on a single line
You can start a statement anywhere within a line (not
recommended)
SAS keywords and names are case insensitive (values of character data are case sensitive)
Words in a SAS statement are separated by blanks or special characters
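The rules above can be seen together in one small program. This is only an illustrative sketch: the data set name rules_demo and the variables id, score, twice, and half are made up for the example.

```sas
data rules_demo;                 /* every statement ends with a semicolon */
   input id
         score;                  /* one statement spread over two lines */
   twice = score * 2; half = score / 2;   /* two statements on one line */
   cards;
1 10
2 20
;
run;

/* Keywords and names are case insensitive: this is the same data set */
PROC PRINT DATA=Rules_Demo;
RUN;
```

Note that the two lines of data between CARDS and the closing semicolon do not end with semicolons.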
9
SAS Steps

Two main types of SAS steps:



Data Step: read in data, manipulate datasets etc.
PROC Step: perform statistical analyses etc.
DATA and PROC steps execute when

A RUN, QUIT, or CARDS statement is entered
Another DATA or PROC statement is entered
The ENDSAS statement is entered
10
SAS Comments

Two ways to comment:

/* …..comments…..*/

good for long documentation
 good for commenting out sections of code
*……comments……;




good for commenting out one line of code
everything up to the first ‘;’ is commented out
SAS comments are shown in green in the editor (SAS step keywords are shown in blue)
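A short sketch of both comment styles inside one step may help. The data set comment_demo and its variables are hypothetical names used only for illustration.

```sas
/* Block comment: can span several lines
   and can contain semicolons; like this one. */
data comment_demo;
   x = 1;
   * Statement comment: it ends at the first semicolon;
   /* y = 2;   a statement commented out with the block style */
   z = x + 1;
run;
```

Because the statement-style comment stops at the first semicolon, the block style is the safer choice when the text being commented out contains semicolons of its own.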
11
Example 1
/*Data instructor contains information of
several teachers*/
data instructor;
input name $ gender $ age;
cards;
Jane F 30
Mary F 29
Mike M 28
;
run;
proc means data=instructor;
var age;
run;
SAS Dataset

Basic structure: a rectangular matrix
               Name   Sex   Age
Observation 1  Jane   F     30
Observation 2  Mary   F     29
Observation 3  Mike   M     28
Columns are variables
Rows are observations
SAS data type
(1) Numeric data: numbers
• Can be added and subtracted
• Can have decimal places
• Can be positive or negative
(2) Character data: contains letters,
numerals or special characters
14
SAS Dataset and variable names

Dataset name






Start with a letter (A-Z, a-z) or the underscore character _
Can contain only letters, numbers, or underscores
Can contain upper- and lowercase letters
Choose names that are easy to remember
Can be longer than 8 characters (up to 32) in SAS 8.0+
Variable name: same rules as dataset name
Examples: valid SAS names






Parts
LastName
First_Name
_Null1_
X12
X1Y1
16
Examples: invalid SAS names





3Parts
Last Name
First-Name
_Null1$
Num%
17
Submitting a program in SAS
First, get your program into the editor
• Type your program in the editor
• Open an existing SAS program: use
Open from the File pull-down menu,
use the Open icon, or just double-click your SAS
program file directly
18
Submitting a program in SAS
Make your editor window active, and
submit your code by
 Submit Icon
 Enter submit in the command bar
 Select “submit” from the Run pull-down
menu
19
Submitting a program in SAS
Reading the SAS log window
• It starts with notes about the version of
SAS and your SAS site number
• The original SAS code follows, with line numbers
added on the left
• Notes contain information about the SAS
data set and the computer resources used
20
Assessing errors in .log file




Non-error SAS messages begin with NOTE:
SAS error messages begin with ERROR: or possibly
WARNING:
When creating a data set, the NOTEs are important to read
because they indicate whether the data set was created
correctly. Often there are no errors, yet the
data set is not correct.
ERROR messages sometimes give you hints about
options or keywords in DATA/PROC steps
21
The output window
Viewing results from the output window
 You can save and print contents in the
output window
 When you have a lot of output, one
easy way to find the specific output is
to use the list in the “results” window
22
Creating HTML output






Tools --- Options --Preferences
Click on the “Results” tab
Click the box next to “Create HTML”
Once turned on, results will be show in
the “Results Viewer” window
Results viewer window just show one
piece of output at a time
To turn off, just uncheck it
23
SAS Data Libraries




A SAS library is simply a location where SAS
data sets are stored
In the Explorer window, click on “Libraries”; there
are at least three libraries: Sashelp, Sasuser,
and Work.
Sashelp and Sasuser contain information
that controls your SAS session.
Work is the default library. It is a temporary
storage location for SAS data sets.
24
Creating a new library


Make the “Active libraries” window
active (click Explorer, then click libraries)
Choose “New” from the File menu or
right click in the active libraries window
and choose “New” from the pop-up
menu
25
Creating a new library



Type the name of the library in the Name
box.
This name must be eight characters or fewer
and contain only letters, numbers, and
underscores.
In the Path field, type the complete path to
the folder or directory where you want to
save your data (or use Browse)
26
Creating a new library

Another way to create a new library is
to use the LIBNAME statement to
associate the library with a directory accessible
from your computer.

LIBNAME mylib 'E:/';
associates the directory E:/ with
the name mylib. Mylib is known as a libref (a
library reference).
27
Temporary/permanent SAS datasets



Every SAS dataset is stored in a SAS data library.
By default, data sets created during a SAS session are stored in the
WORK library; they are temporary and are deleted when you close SAS.
• All data sets associated with the library WORK are
deleted at the end of the SAS session (they are
temporary).
A permanent data set is a data set that is not deleted
when you exit SAS.
• To create a permanent data set, use a library name
other than WORK when you create the data set.
To create Permanent SAS datasets

Code to create permanent SAS datasets
libname yourlib 'E:/';
data yourlib.instructor;
input name $ sex $ age;
cards;
Mike M 30
Wendy F 29
Jane F 28
;
run;
To access Permanent SAS
datasets


When you start a new SAS session, the
permanent datasets can be accessed directly
using libref.
The name of the libref can be different from
the name you used when creating the
permanent data set.
libname mylib 'E:\';
proc print data=mylib.instructor;
run;
30
Viewing SAS data with SAS
Explorer




Click the libraries icon in the Explorer
window
Click the library you want to see
Click the data set name to open the SAS data set
To go back to the previous window
within Explorer, choose “up one level”
from the view menu, or click the up one
level button on the toolbar
31
Listing the properties of a SAS
data set



Right click the SAS data icon
Select “Properties” from the pop up
menu
If you choose Columns, SAS displays
information about the columns (or
variables) in the data set.
32
PROC contents



PROC contents prints descriptive information about
the data set and the variables in the data set
• Data set information: name, number of observations,
number of variables, and date created
• Variable information: name, internal order, type,
length, format/informat, and label
Very useful for getting a snapshot of a data set
Syntax:
proc contents data=data_set_name;
run;
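As a minimal sketch, PROC contents can be run on the instructor data set from Example 1. The data step is repeated here only so that the example is self-contained.

```sas
data instructor;
   input name $ gender $ age;
   cards;
Jane F 30
Mary F 29
Mike M 28
;
run;

/* Describe the data set: observations, variables, types, lengths */
proc contents data=instructor;
run;
```

The report should list three observations and three variables, with name and gender as character variables and age as numeric.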
TITLES






Titles are descriptive headers SAS places at the top of
each page of the OUTPUT window.
A title is set with the TITLE statement followed by a
string of characters.
The string must be enclosed in single or double quotes.
The maximum length for a string is 200 characters.
If you want multiple-line titles, use the TITLE
statement where the word title is followed by a number:
title1 'EPIB 698A';
title2 'week1';
To clear the title setting simply execute
title;
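The sketch below shows titles being set and then cleared. The data set scores and its values are hypothetical and exist only to give PROC print something to display.

```sas
data scores;          /* hypothetical data set for the illustration */
   input id score;
   cards;
1 90
2 85
;
run;

title1 'EPIB 698A';
title2 'week1';
proc print data=scores;   /* printed with both title lines */
run;

title;                    /* clears all title lines */
proc print data=scores;   /* printed without a title */
run;
```

Titles stay in effect for every later procedure until they are changed or cleared, which is why the second PROC print needs the null TITLE statement first.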
PROC print



The PRINT procedure prints the observations in a SAS data set
to the output window.
Features:

Autoformatting

columns labeled with variable names or labels

automatic accumulation and printing of subtotals and totals
Syntax:
proc print data=data_set_name options;
var var1 var2 var3 var4;
run;
35
PROC print (cont.)

The var statement




The var statement is used to specify the variables
to process in a proc step. Not unique to proc
print.
Variables are usually processed in the order listed
in the var statement.
Applies only to the proc step in which it appears (it is not global)
If no var statement is used, generally the
procedure will process all the variables (or all the
numeric variables if a calculation is performed).
36
PROC print (cont.)


Useful options with PROC print:

double: double spaces the output

noobs: suppresses observation numbers

label: uses variable labels as column headings
Additional statement for use in PROC print:

SUM: sums the listed variables at the bottom of the output:
sum variable_list;
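The options and the SUM statement above can be combined as in the following sketch. It reuses the instructor data from Example 1; the label text 'Age (years)' is an assumption added for illustration.

```sas
data instructor;
   input name $ gender $ age;
   label age = 'Age (years)';   /* label text is illustrative */
   cards;
Jane F 30
Mary F 29
Mike M 28
;
run;

/* noobs: no observation numbers; label: use labels as headings;
   double: double-space the listing */
proc print data=instructor noobs label double;
   var name age;
   sum age;                     /* total of age at the bottom */
run;
```

Here the total printed under the age column should be 87 (30 + 29 + 28).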
37
Import/Export Data


To Export SAS datasets
• Go to the File menu and select “Export Data”
• Choose the data set (from the library Work)
• Select the file type, then locate the destination using the Browse button
• Save the data set and finish
• Check the log to make sure the file was created
• This method does not require a data step, but any
modification may require a data step
• Convenient for Excel files
Importing data into a SAS data set follows similar steps
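The menu steps above also have a code counterpart, PROC EXPORT. The sketch below is an assumption about how one might use it, not part of the original slides; the output path E:\instructor.csv is hypothetical.

```sas
data instructor;
   input name $ gender $ age;
   cards;
Jane F 30
Mary F 29
Mike M 28
;
run;

/* Write the WORK data set to a comma-separated file.
   The output path is hypothetical; change it to a folder you can write to. */
proc export data=work.instructor
            outfile='E:\instructor.csv'
            dbms=csv
            replace;
run;
```

As with the menu route, check the log afterwards to confirm that the file was written.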
38
Home gardener's data
DATA homegarden;
INFILE 'E:\Garden.txt';
INPUT Name $ 1-7 Tomato Zucchini Peas grapes;
group = 14;
Type = 'home';
Zucchini_1= Zucchini * 10;
Total=tomato + zucchini_1 + peas + grapes;
PerTom = (Tomato / Total) * 100;
Run;
39
Modifying a data set with the SET
statement
The SET statement
• The SET statement in the data step allows you to
read a SAS data set so that you can add new
variables, create a subset, or modify the data set
• The SET statement brings a SAS data set, one
observation at a time, into a data step for processing
• Syntax:
DATA new-data-set;
SET old-data-set;
40
Modifying a data set with the SET
statement
Data new;
input x y ;
cards;
1 2
3 4
;
run;
Data new1;
set new;
z=x+y;
run;
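The SET statement can also be used to create a subset, as mentioned earlier. The sketch below combines it with a subsetting IF; the data set name subset and the condition x > 1 are chosen only for illustration.

```sas
data new;
   input x y;
   cards;
1 2
3 4
;
run;

data subset;
   set new;        /* read data set new one observation at a time */
   if x > 1;       /* subsetting IF: keep only observations with x > 1 */
   z = x + y;      /* add a new variable */
run;

proc print data=subset;
run;
```

Only the second observation passes the condition, so subset contains a single row with x=3, y=4, and z=7.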
41
GPLOT


The GPLOT procedure plots the values of two or more variables
on a set of coordinate axes (X and Y).
The procedure produces a variety of two-dimensional graphs
including
 simple scatter plots
 overlay plots in which multiple sets of data points display on
one set of axes
Procedure Syntax: PROC GPLOT
PROC GPLOT;
PLOT y*x </option(s)>;
run;

Example: plot of systolic blood pressure (SBP) by
diastolic blood pressure (DBP)
title "Scatter Plot of SBP by DBP";
proc gplot data=d.clinic;
plot SBP * DBP;
run;
Plots Exponential Survival Function
data exponential_survival_function;
lambda1 = .2;
lambda2 = .002;
lambda3 = .1;
x=0;
do x = 0.01 to 15 by .01;
S_x_1 = exp(-lambda1*x);
S_x_2 = exp(-lambda2*x);
S_x_3 = exp(-lambda3*x);
output;
end;
run;
data linetext(drop=S_x_1 S_x_2 S_x_3 lambda1 lambda2 lambda3);
/*Creating variables "function"=label, xsys='2', ysys='2', hsys='3', position='6' and size=1.9 (size of text) */
retain function 'label' xsys ysys '2' hsys '3' position '6' size 1.9;
set exponential_survival_function end=last;
style = "'Albany AMT/bold'";
/*Setting the variable y equal to the last values for each S_x_lambda and the text corresponding to each lambda */
if last then do;
y=S_x_1-.01; text='lambda=.2 '; output;
y=S_x_2-.01; text='lambda=.002'; output;
y=S_x_3-.01; text='lambda=.1'; output;
end;
x=x-.5;
run;
/* Add a title to the graph */
title1 'Exponential Survival Function ';
/* Create axis definitions */
/*offset = extra space for the labels of the curves*/
axis1 offset=(1,10) label=('X (time)');
axis2 label=('Survival Function');
/* Produce the plot */
proc gplot data=exponential_survival_function;
plot (S_x_1 S_x_2 S_x_3)*x / overlay annotate=linetext haxis=axis1 vaxis=axis2;
run;
quit;
44
45
Plots Weibull Survival Function
data weibull_survival_function;
lambda1 = .2;
alpha1= .5;
lambda2 = .1;
alpha2=1.0;
lambda3 = .002;
alpha3=3.0;
x=0;
do x = 0.01 to 15 by .01;
S_x_1 = exp(-lambda1*x**alpha1);
S_x_2 = exp(-lambda2*x**alpha2);
S_x_3 = exp(-lambda3*x**alpha3);
output;
end;
run;
data linetext(drop=S_x_1 S_x_2 S_x_3 lambda1 lambda2 lambda3);
/*Creating variables "function"=label,xsys ='2', ysys ='2', hsys= '3', position='6' and size='2'(size of text) */
retain function 'label' xsys ysys '2' hsys '3' position '6' size 2;
set weibull_survival_function end=last;
style = "'Albany AMT/bold'";
/*Setting the variable y equal to the last values for each S_x_lambda and the text corresponding to each lambda */
if last then do;
y=S_x_1; text='lambda=.2, alpha=.50 '; output;
y=S_x_2; text='lambda=.1, alpha=1.0 '; output;
y=S_x_3; text='lambda=.002, alpha=3.0 '; output;
end;
run;
/* Add a title to the graph */
title1 'Weibull Survival Function ';
/* Create axis definitions */
/*offset = extra space for the labels of the curves*/
axis1 offset=(1,20) label=('X (time)');
axis2 label=('Survival Function');
/* Produce the plot */
proc gplot data=weibull_survival_function;
plot (S_x_1 S_x_2 S_x_3)*x / overlay annotate=linetext haxis=axis1 vaxis=axis2;
run;
quit;
46
47
Plots Exponential Hazard Function
The hazard function is constant when the survival time
is exponentially distributed
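The hazard is constant because, for S(x) = exp(-lambda*x), the density is f(x) = lambda*exp(-lambda*x), so h(x) = f(x)/S(x) = lambda. The slides give no code for this plot. The following is a minimal sketch modelled on the other programs in this deck, reusing the lambda values from the exponential survival example; the data set name exponential_hazard_function is an assumption, and the curve labels (annotate data set) are left out for brevity.

```sas
data exponential_hazard_function;
   lambda1 = .2;
   lambda2 = .002;
   lambda3 = .1;
   do x = 0.01 to 15 by .01;
      h_x_1 = lambda1;     /* h(x) = lambda: does not depend on x */
      h_x_2 = lambda2;
      h_x_3 = lambda3;
      output;
   end;
run;

title1 'Exponential Hazard Function';
axis1 label=('X (time)');
axis2 label=('Hazard Function');

proc gplot data=exponential_hazard_function;
   plot (h_x_1 h_x_2 h_x_3)*x / overlay haxis=axis1 vaxis=axis2;
run;
quit;
```

Each curve should appear as a horizontal line at the height of its lambda.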
48
Plots Weibull Hazard Function
data weibull_hazard_function;
lambda1 = .2;
alpha1= .5;
lambda2 = .1;
alpha2=1.0;
lambda3 = .002;
alpha3=3.0;
x=0;
do x = 0.01 to 15 by .01;
h_x_1 = lambda1*alpha1*(x**(alpha1-1));
h_x_2 = lambda2*alpha2*(x**(alpha2-1));
h_x_3 = lambda3*alpha3*(x**(alpha3-1));
output;
end;
run;
data linetext(drop=h_x_1 h_x_2 h_x_3 lambda1 lambda2 lambda3);
/*Creating variables "function"=label,xsys ='2', ysys ='2', hsys= '3', position='6' and size='2'(size of text) */
retain function 'label' xsys ysys '2' hsys '3' position '6' size 2;
set weibull_hazard_function end=last;
style = "'Albany AMT/bold'";
/*Setting the variable y equal to the last values for each h_x and the text corresponding to each lambda and alpha */
if last then do;
y=h_x_1; text='lambda=.2, alpha=.50 '; output;
y=h_x_2; text='lambda=.1, alpha=1.0 '; output;
y=h_x_3; text='lambda=.002, alpha=3.0 '; output;
end;
run;
/* Add a title to the graph */
title1 'Weibull hazard Function ';
/* Create axis definitions */
/*offset = extra space for the labels of the curves*/
axis1 offset=(1,20) label=('X (time)');
axis2 label=('Hazard Function');
/* Produce the plot */
proc gplot data=weibull_hazard_function;
plot (h_x_1 h_x_2 h_x_3)*x / overlay annotate=linetext haxis=axis1 vaxis=axis2;
run;
quit;
49
50
Checking Exponential Distribution
Add this code to the program in slide 44.
If the survival time is exponentially distributed, ln[S(x)] = -lambda*x,
so the plot of ln[S(x)] against x should be a straight line through
the origin with slope -lambda.
data exponential_survival_function;
set exponential_survival_function;
log_S_x=log(S_x_1);
run;
/* Add a title to the graph */
title1 'Check for Exponential Distribution Function ';
/* Create axis definitions */
axis1 label=('X (time)');
axis2 label=('ln[S(x)]');
proc gplot data=exponential_survival_function;
plot log_S_x *x /haxis=axis1 vaxis=axis2;
run;
quit;
51
52
Checking Weibull Distribution
Add this code to the program in slide 46.
If the survival time follows a Weibull distribution, ln[-ln[S(x)]] = ln(lambda) + alpha*ln(x),
so the plot of ln[-ln[S(x)]] against ln(x) should be a straight line
with slope alpha.
data weibull_survival_function;
set weibull_survival_function;
/* log_S_x holds ln[-ln[S(x)]] for the first curve */
log_S_x=log(-1*(log(S_x_1)));
log_x=log(x);
run;
/* Add a title to the graph */
title1 'Check for Weibull Distribution Function ';
/* Create axis definitions */
axis1 label=('log(X) (time)');
axis2 label=('ln[-ln[S(x)]]');
proc gplot data=weibull_survival_function;
plot log_S_x *log_x /haxis=axis1 vaxis=axis2;
run;
quit;
53
54
Lab with SAS


The lab is in the Regents Drive Garage (Building #202),
Room 0504. It is open 24 hours a day, 7 days a
week: http://www.oit.umd.edu/as/cl/
Securing SAS outside the classroom



Labs (http://www.oit.umd.edu/as/cl/)
Desktop version from departments
Room 1304 SPH Building
55