SAS Workshop Iowa State University Introduction to SAS Programming

SAS Workshop
Introduction to SAS Programming
Day 1 Session II
Iowa State University
May 9, 2016
Missing Values
A data value that is not present for a variable in a particular
observation is considered missing.
 A missing character value is stored/displayed as a blank.
 A missing numeric value is stored/displayed as a period (.).
Example A3
data oranges;
input Variety $ Flavor Texture Looks;
/* Arithmetic on a missing value gives a missing result */
Rating=(Flavor+Texture+Looks)/3;
/* MEAN ignores missing arguments */
Average=mean(Flavor,Texture,Looks);
datalines;
navel 9 8 6
temple 7 . 7
valencia 8 9 9
mandarin 5 7 8
;
/* Sort on a variable that exists in this data set */
proc sort data=oranges;
by descending Average;
run;
proc print data=oranges;
title 'Taste Test Results for Oranges';
run;
SAS Data Step
 Begins with the statement
DATA name ;
 Followed by one of these statements:
INPUT ... ;
SET ... ;
 A SAS data step is (usually) used to create a new SAS data set from
 external data (using an INPUT statement)
 another SAS data set (using a SET statement)
 SAS program statements are used in a SAS data step to modify input
data, if necessary
SAS Data Step: Flow of Operations
1. Start: SAS reads a line of data.
2. SAS carries out program statements for values in this data line and
creates a new observation.
3. SAS adds this observation to the SAS data set.
4. SAS returns for a new line of data.
5. If there are no more lines of data to input, SAS closes the data set
and goes on to the next DATA or PROC statement.
SAS Data Step: Flow of Operations
data oranges;
input Variety $ Flavor Texture Looks;
Total=Flavor+Texture+Looks;
datalines;
navel 9 8 6
temple 7 7 7
valencia 8 9 9
mandarin 5 7 8
;

Observations created after the first two iterations:

Variety   Flavor  Texture  Looks  Total
navel     9       8        6      23
temple    7       7        7      21
Some Additional Details
 The data step provides a wide range of capabilities, in addition to
accessing data from external sources.
 In a data step, you may transform or create new variables, create
subsets of observations, or combine data from several other SAS
data sets.
 As you saw, the data step actually functions as a loop, whose
statements will be executed for each line of data.
 In each iteration of the loop, the data step starts with a vector of
missing values for all the variables that will be in the new
observation.
 It then replaces the missing value for each variable by either an
input data value or a value created by a data step statement.
 Finally, it writes the new observation to the SAS data set (as a new
record in a data file on disk).
 The SAS data set is usually written as a temporary file (unless you
specifically ask for it to be saved permanently in one of your folders).
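
The loop behavior described above can be observed in the SAS log. The following is a minimal sketch, not part of the original workshop examples: it reuses the oranges data and adds two PUT statements (an addition made here for illustration) that write the current values of all variables at the start and at the end of each iteration.

```sas
data oranges;
  /* _ALL_ lists every variable in the current observation, */
  /* plus the automatic variables _N_ and _ERROR_           */
  put 'Start of iteration: ' _all_;
  input Variety $ Flavor Texture Looks;
  Total = Flavor + Texture + Looks;
  put 'End of iteration:   ' _all_;
  datalines;
navel 9 8 6
temple 7 7 7
;
run;
```

In the log, each 'Start of iteration' line should show missing values for all five variables, and each 'End of iteration' line should show the values about to be written. The 'Start of iteration' line appears one extra time, because the step ends only when INPUT finds no more data.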
SAS Program Statements
Y1 = X1+X2**2;
Y2 = ABS(X3);
Y3 = SQRT(X4 + 4.0*X5**2) - X6;
X7 = 3.14156*log(X7);
IF INCOME = . THEN DELETE;
IF STATE = 'CA' | STATE = 'OR' THEN
REGION = 'PACIFIC COAST';
IF SCORE < 0 THEN SCORE = 0;
IF SCORE < 80 THEN WEIGHT=.67;
ELSE WEIGHT=.75;
WEIGHT = (SCORE < 80) * .67 + (SCORE >= 80) * .75;
SAS Program Statements
IF SCORE < 80 THEN DO;
WEIGHT =0.67;
RATE=5.70;
END;
ELSE DO;
WEIGHT =0.75;
RATE=6.50;
END;
DATA ;
INPUT X1-X5 ;
X6 = (X4+X5) / 2 ;
DROP X4 X5 ;
DATALINES ;
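
The IF/ELSE pair and the single arithmetic assignment for WEIGHT shown on the earlier slide are meant to give the same result. A small sketch to check this, assuming a hypothetical data set name (weights), made-up scores, and two illustrative variable names (Weight1, Weight2):

```sas
data weights;
  input Score @@;
  /* Version 1: IF-THEN/ELSE */
  if Score < 80 then Weight1 = .67;
  else Weight1 = .75;
  /* Version 2: a true comparison evaluates to 1, a false one to 0 */
  Weight2 = (Score < 80)*.67 + (Score >= 80)*.75;
  datalines;
55 79 80 95
;
proc print data=weights;
run;
```

Weight1 and Weight2 should agree on every observation: .67 for the scores below 80 and .75 for the others.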

Order of Evaluating Expressions
 Rule 1: Expressions within parentheses are evaluated first
 Rule 2: Higher priority operators are performed first
Group I **, + (prefix), - (prefix), ^ (NOT), >< (MIN), <> (MAX)
Group II *, /
Group III + (infix), - (infix)
Group IV || (concatenation)
Group V <, <=, =, ^=, >=, >, ^>, ^<
Group VI & (AND)
Group VII | (OR)
 Rule 3: For operators with the same priority, the operations
take place from left to right of the expression (except for
Group I operators, which are executed right to left.)
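
A short sketch of how these rules play out; the variable names and numbers are chosen here purely for illustration and do not come from the workshop examples.

```sas
data _null_;
  a = -2**2;      /* Group I, right to left: -(2**2)          */
  b = 2**3**2;    /* Group I, right to left: 2**(3**2)        */
  c = 2 + 3*4;    /* Group II (*) before Group III (+)        */
  d = (2 + 3)*4;  /* Rule 1: parentheses are evaluated first  */
  put a= b= c= d=;
run;
```

The log line should read a=-4 b=512 c=14 d=20. When in doubt, adding parentheses makes the intended order explicit.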
Example A4
data two;
input X1-X3;
X3= 3*X3-X1**2;
X4=sqrt(X2)+1;
drop X1 X2;
datalines;
3 4 5
-2 9 3
. 16 8
-3 1 4
;
proc print data=two;
title "SAS Data Step Programming";
run;
Example A5
data group1;
input Age @@;
datalines;
1 3 7 9 12 17 21 26 30 32 36 42 45 51
;
data group2;
set group1;
if 0<=Age<10 then Agegroup=0;
else if 10<=Age<20 then Agegroup=10;
else if 20<=Age<30 then Agegroup=20;
else if 30<=Age<40 then Agegroup=30;
else if 40<=Age<50 then Agegroup=40;
else if Age >=50 then Agegroup=50;
run;
proc print;run;
data group3;
set group1;
Agegroup=int(Age/10)*10;
run;
proc print; run;
Some Additional Details
 If you do not use a name on the data statement, SAS will
create default data set names of the form data1, data2, and so on.
 The input statement is used to access data from lines
contained in your SAS program or from an external source.
 The datalines; statement is used to precede the data
inserted in your program (called in-stream data).
 The infile statement names an external file (or a fileref that
refers to an external file) from which to access the data.
 Most commonly, an external source is just a text file that
contains the data lines as they would appear in-stream.
 The simplest form of an infile statement is:
infile "C:\kevinw\stat401\mydata.txt";
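
As a sketch of how the infile statement fits into a complete data step, assume the file named above holds lines laid out like the in-stream oranges data (a variety name followed by three scores). That layout, and the fileref name taste, are assumptions made for illustration.

```sas
/* Option 1: name the file directly on the INFILE statement */
data oranges;
  infile "C:\kevinw\stat401\mydata.txt";
  input Variety $ Flavor Texture Looks;
run;

/* Option 2: define a fileref first, then refer to it */
filename taste "C:\kevinw\stat401\mydata.txt";
data oranges2;
  infile taste;
  input Variety $ Flavor Texture Looks;
run;
```

In both cases the infile statement comes before the input statement, and no datalines; statement is needed because the data are not in-stream.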
SAS Functions
 A SAS function is internal code that returns a value
that is determined from specified arguments.
 Usage: function-name(argument1,argument2, . . .)
 Examples:
date=mdy(month,day,year)
ave=mean(flavor,texture,looks)
id=substr(item,1,2)
 SAS functions can do the following:
• perform arithmetic operations
• compute sample statistics (for example: sum, mean,
and standard deviation)
• manipulate SAS dates
• process character values
• perform many other tasks
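
The three example functions can be tried in a data step that creates no data set. This is a minimal sketch with argument values made up for illustration.

```sas
data _null_;
  date = mdy(5, 9, 2016);           /* SAS date value for May 9, 2016   */
  ave  = mean(9, ., 6);             /* missing arguments are ignored    */
  id   = substr('R2D2unit', 1, 2);  /* 2 characters starting at col. 1  */
  put date= date9. ave= id=;
run;
```

The log should show date=09MAY2016 ave=7.5 id=R2. Without the date9. format, the date would print as a number (days since January 1, 1960).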
Simple INPUT Statements
 List Input
INPUT ID SEX $ AGE WEIGHT ;
1342 F 27 121.2
INPUT SCORE1-SCORE4 ;
63.1 94 87.5 72
 Formatted Input
(in the data lines below, _ and b each stand for one blank column)
INPUT ID 4. STATE $2. FERT 5.2 PERCENT 3.2 ;
0001IA__504089
INPUT @10 ITEM $4. +5 PRICE 6.2;
xxxxxxxxxR2D2xxxxx_91350
INPUT (ID SEX AGE WT HT) (3. $1. 2. 2*5.1);
123M21_1650__721
The general form of the informats used above:
w.    $w.    w.d
Examples:
4.    $2.    5.2
 Column Input
INPUT ID 1-4 STATE $ 5-6 FERT 7-12 PERCENT 13-15 .2;
0001IAbb5.04b89
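
The formatted-input and column-input statements above can be run against the same record to see that they recover the same values. This is a sketch with hypothetical data set names (formatted, column); the blank placeholders in the slide's data lines are typed as real blanks.

```sas
/* Formatted input: the informat fixes the width of each field */
data formatted;
  input ID 4. STATE $2. FERT 5.2 PERCENT 3.2;
  datalines;
0001IA  504089
;
/* Column input: each field is located by its column range */
data column;
  input ID 1-4 STATE $ 5-6 FERT 7-12 PERCENT 13-15 .2;
  datalines;
0001IA  5.04 89
;
proc print data=formatted; run;
proc print data=column; run;
```

Both listings should show ID 1, STATE IA, FERT 5.04, and PERCENT 0.89. An implied decimal (the .2 part) is applied only when the field itself contains no decimal point, which is why 5.04 is read as written in the second step.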
Example A6
data biology;
input Id Sex $ Age Year Height Weight;
datalines;
7389 M 24 4 69.2 132.5
3945 F 19 2 58.5 112.0
4721 F 20 2 65.3 98.6
1835 F 24 4 62.8 102.5
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
8472 M 21 2 76.5 205.1
6327 M 20 1 70.2 135.4
8472 F 20 4 66.8 142.6
4875 M 20 1 74.2 160.4
;
proc print data=biology;
var Height Weight;
id Id;
title "Biology class: Analysis of Height and Weights";
run;
Example A7
data first ;
input @4 Income 5.2 Tax 5.2 Age 2. St $2.;
State=stnamel(St);
datalines;
123546750346535IA
234765480895645IN
348578650595431NH
. . . . . . . . .
. . . . . . . . .
345678560912728LA
346685960675138IA
546825750562527WV
;
proc print data=first;
format Income Tax dollar8.2;
var Income Tax Age State;
title "SAS Listing of Tax data";
run;
Example A8
data first ;
input (Income Tax Age St)(@4 2*5.2 2. $2.);
State=stnamel(St);
datalines;
123546750346535IA
234765480895645IN
348578650595431NH
. . . . . . . . .
. . . . . . . . .
345678560912728LA
346685960675138IA
546825750562527WV
;
proc print data=first;
format Income Tax dollar8.2;
var Income Tax Age State;
title "SAS Listing of Tax data";
run;
Some Additional Details
 List input style: data fields are separated by at least one
blank. List the names of the variables; follow the name
with a dollar sign ($) for character data.
 Column input style: follow the variable name (and $ for
character) with start_column-end_column.
 Formatted input: data field must be in specific columns.
Follow the variable name with a SAS informat.
 Examples of informats: $10. (to read a 10 column
character string), 6.2 (6 column numeric with 2 decimals)
 For formatted input, the next data value is read from the
column immediately after the previous value.
 For list input, the next data value is read from the next
non-blank column after the previous value.
Modifiers to Input statement
 @column: moves to read data from the named column.
 +number: moves forward this number of columns.
 trailing @@: holds the current data line so more data can
be read from it in the following iterations of the loop.
 /: jumps to the next line of data to access more data.
 #number: jumps to this line number in the data to access
more data.
 trailing @: holds the current line to allow other input
statements to access data from the same line.
 The @, + and # specifications can all be followed by a
variable name instead of a number.
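
Two of these modifiers are sketched below with made-up data; the data set and variable names (survey, twoline, Type, Score1, Score2) are hypothetical. The first step uses a trailing @ so that a second input statement can read the rest of the same line. The second uses / to read one observation from two data lines.

```sas
/* Trailing @: read Type, hold the line, then decide what to read next */
data survey;
  input Type $ @;
  if Type = 'A' then input Score1 Score2;
  else input Score1;
  datalines;
A 10 20
B 5
;
/* Slash: Id is on one line, Sex and Age on the following line */
data twoline;
  input Id / Sex $ Age;
  datalines;
7389
M 24
3945
F 19
;
proc print data=survey; run;
proc print data=twoline; run;
```

Each step should produce two observations. In survey, Score2 is missing for the type B record because it is never read on that line.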