SAS Workshop
Introduction to SAS Programming
Day 1, Session II
Iowa State University
May 9, 2016

Missing Values

A data value that is not present for a variable in a particular observation is considered missing.
A missing character value is stored and displayed as a blank.
A missing numeric value is stored and displayed as a period (.).

Example A3

    data oranges;
       input Variety $ Flavor Texture Looks;
       Rating=(Flavor+Texture+Looks)/3;
       Average=mean(Flavor,Texture,Looks);
       datalines;
    navel 9 8 6
    temple 7 . 7
    valencia 8 9 9
    mandarin 5 7 8
    ;
    proc sort data=oranges;
       by descending Average;
    run;
    proc print data=oranges;
       title 'Taste Test Results for Oranges';
    run;

For temple, Rating is missing because Texture is missing, while Average is 7 because the MEAN function ignores missing arguments.

SAS Data Step

Begins with the statement
    DATA name ;
Followed by one of these statements:
    INPUT ;
    SET ;

A SAS data step is (usually) used to create a new SAS data set from
• external data (using an INPUT statement)
• another SAS data set (using a SET statement)

SAS program statements are used in a SAS data step to modify input data, if necessary.

SAS Data Step: Flow of Operations

Start
1. SAS reads a line of data.
2. SAS carries out program statements for values in this data line and creates a new observation.
3. SAS adds this observation to the SAS data set.
4. SAS returns for a new line of data (back to step 1).
5. If there are no more lines of data to input, SAS closes the data set and goes on to the next DATA or PROC statement.

SAS Data Step: Flow of Operations

    data oranges;
       input Variety $ Flavor Texture Looks;
       Total=Flavor+Texture+Looks;
       datalines;
    navel 9 8 6
    temple 7 7 7
    valencia 8 9 9
    mandarin 5 7 8
    ;

Observations created from the first two data lines:

    Variety   Flavor   Texture   Looks   Total
    navel        9        8        6      23
    temple       7        7        7      21

Some Additional Details

The data step provides a wide range of capabilities in addition to accessing data from external sources. In a data step, you may transform variables or create new ones, create subsets of observations, or combine data from several other SAS data sets.

As you saw, the data step actually functions as a loop whose statements are executed for each line of data.
In each iteration of the loop, the data step starts with a vector of missing values for all the variables that will be in the new observation. It then replaces the missing value for each variable with either an input data value or a value created by a data step statement. Finally, it writes the new observation to the SAS data set (as a new record in a data file on disk). The SAS data set is usually written as a temporary file unless you specifically ask for it to be saved permanently in one of your folders.

SAS Program Statements

    Y1 = X1+X2**2;
    Y2 = ABS(X3);
    Y3 = SQRT(X4 + 4.0*X5**2) - X6;
    X7 = 3.14156*log(X7);

    IF INCOME = . THEN DELETE;
    IF STATE = 'CA' | STATE = 'OR' THEN REGION = 'PACIFIC COAST';
    IF SCORE < 0 THEN SCORE = 0;

    IF SCORE < 80 THEN WEIGHT=.67;
    ELSE WEIGHT=.75;

    WEIGHT = (SCORE < 80) * .67 + (SCORE >= 80) * .75;

SAS Program Statements

    IF SCORE < 80 THEN DO;
       WEIGHT=0.67; RATE=5.70;
    END;
    ELSE DO;
       WEIGHT=0.75; RATE=6.50;
    END;

    DATA;
       INPUT X1-X5;
       X6 = (X4+X5)/2;
       DROP X4 X5;
       DATALINES;

Order of Evaluating Expressions

Rule 1: Expressions within parentheses are evaluated first.
Rule 2: Higher priority operators are performed first.

    Group I     **, + (prefix), - (prefix), ^ (NOT), ><, <>
    Group II    *, /
    Group III   + (infix), - (infix)
    Group IV    ||
    Group V     <, <=, =, ^=, >=, >, ^>, ^<
    Group VI    & (AND)
    Group VII   | (OR)

Rule 3: For operators with the same priority, operations are performed from left to right in the expression (except for Group I operators, which are evaluated from right to left).

Example A4

    data two;
       input X1-X3;
       X3=3*X3-X1**2;
       X4=sqrt(X2)+1;
       drop X1 X2;
       datalines;
    3 4 5
    -2 9 3
    .
    16 8
    -3 1 4
    ;
    proc print data=two;
       title "SAS Data Step Programming";
    run;

Example A5

    data group1;
       input Age @@;
       datalines;
    1 3 7 9 12 17 21 26 30 32 36 42 45 51
    ;
    data group2;
       set group1;
       if 0<=Age<10 then Agegroup=0;
       else if 10<=Age<20 then Agegroup=10;
       else if 20<=Age<30 then Agegroup=20;
       else if 30<=Age<40 then Agegroup=30;
       else if 40<=Age<50 then Agegroup=40;
       else if Age>=50 then Agegroup=50;
    run;
    proc print; run;
    data group3;
       set group1;
       Agegroup=int(Age/10)*10;
    run;
    proc print; run;

Some Additional Details

If you do not use a name on the DATA statement, SAS creates default data set names of the form data1, data2, and so on.
The INPUT statement is used to access data from lines contained in your SAS program or from an external source.
The DATALINES; statement precedes the data inserted in your program (called in-stream data).
The INFILE statement names an external file (or a fileref that refers to an external file) from which to access the data.
Most commonly, an external source is just a text file that contains the data lines as they would appear in-stream.
The simplest form of an INFILE statement is:
    infile "C:\kevinw\stat401\mydata.txt";

SAS Functions

A SAS function is internal code that returns a value determined from specified arguments.
Usage:
    function-name(argument1, argument2, ...)
Examples:
    date=mdy(month,day,year)
    ave=mean(flavor,texture,looks)
    id=substr(item,1,2)
SAS functions can do the following:
• perform arithmetic operations
• compute sample statistics (for example: sum, mean, and standard deviation)
• manipulate SAS dates
• process character values
• perform many other tasks

Simple INPUT Statements

List Input
    INPUT ID SEX $ AGE WEIGHT ;
    1342 F 27 121.2

    INPUT SCORE1-SCORE4 ;
    63.1 94 87.5 72

Formatted Input
    INPUT ID 4. STATE $2. FERT 5.2 PERCENT 3.2 ;
    0001IA__504089

    INPUT @10 ITEM $4. +5 PRICE 6.2;
    xxxxxxxxxR2D2xxxxx_91350

    INPUT (ID SEX AGE WT HT) (3. $1. 2. 2*5.1);
    123M21_1650__721

(In the data lines above, each _ marks a blank column.)

The general form of the informats we used above:
    w.
    $w.    w.d
Examples:
    4.    $2.    5.2

Column Input
    INPUT ID 1-4 STATE $ 5-6 FERT 7-12 PERCENT 13-15 .2;
    0001IAbb5.04b89

(In the data line above, each b marks a blank column.)

Example A6

    data biology;
       input Id Sex $ Age Year Height Weight;
       datalines;
    7389 M 24 4 69.2 132.5
    3945 F 19 2 58.5 112.0
    4721 F 20 2 65.3 98.6
    1835 F 24 4 62.8 102.5
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    8472 M 21 2 76.5 205.1
    6327 M 20 1 70.2 135.4
    8472 F 20 4 66.8 142.6
    4875 M 20 1 74.2 160.4
    ;
    proc print data=biology;
       var Height Weight;
       id Id;
       title "Biology class: Analysis of Height and Weights";
    run;

Example A7

    data first;
       input @4 Income 5.2 Tax 5.2 Age 2. St $2.;
       State=stnamel(St);
       datalines;
    123546750346535IA
    234765480895645IN
    348578650595431NH
    . . . . . . . . . . . . . . . . . .
    345678560912728LA
    346685960675138IA
    546825750562527WV
    ;
    proc print data=first;
       format Income Tax dollar8.2;
       var Income Tax Age State;
       title "SAS Listing of Tax data";
    run;

Example A8

    data first;
       input (Income Tax Age St)(@4 2*5.2 2. $2.);
       State=stnamel(St);
       datalines;
    123546750346535IA
    234765480895645IN
    348578650595431NH
    . . . . . . . . . . . . . . . . . .
    345678560912728LA
    346685960675138IA
    546825750562527WV
    ;
    proc print data=first;
       format Income Tax dollar8.2;
       var Income Tax Age State;
       title "SAS Listing of Tax data";
    run;

Some Additional Details

List input style: data fields are separated by at least one blank. List the names of the variables, and follow the name with a dollar sign ($) for character data.
Column input style: follow the variable name (and $ for character data) with start_column-end_column.
Formatted input style: each data field must be in specific columns. Follow the variable name with a SAS informat. Examples of informats: $10. (reads a 10-column character string) and 6.2 (reads a 6-column numeric value with 2 decimal places).
For formatted input, the next data value is read starting in the column immediately after the previous value.
For list input, the next data value is read starting at the next non-blank column after the previous value.
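
The three input styles can be compared side by side on the same values. The following is a minimal sketch, not part of the workshop examples: the data set names listdemo, coldemo, and fmtdemo and all data values are hypothetical. In the formatted version the scores are typed without a decimal point, so the 4.1 informat supplies one decimal place.

```sas
/* Hypothetical data sets and made-up values, for illustration only. */

/* List input: fields are separated by one or more blanks. */
data listdemo;
   input Name $ Score;
   datalines;
ann 91.5
bob   78
;

/* Column input: Name is in columns 1-3, Score in columns 5-8. */
data coldemo;
   input Name $ 1-3 Score 5-8;
   datalines;
ann 91.5
bob   78
;

/* Formatted input: $3. reads columns 1-3, +1 skips column 4,      */
/* and 4.1 reads columns 5-8, inserting one decimal place because  */
/* the data contain no decimal point.                              */
data fmtdemo;
   input Name $3. +1 Score 4.1;
   datalines;
ann  915
bob  780
;

proc print data=listdemo; run;
proc print data=coldemo;  run;
proc print data=fmtdemo;  run;
```

The data lines start in column 1 here because column and formatted input depend on exact column positions; indenting them would shift every field.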
Modifiers to Input statement

@column: move to the named column and read data from there.
+number: move forward this number of columns.
trailing @@: hold the current data line so that more data can be read from it in the following iterations of the loop.
/: jump to the next line of data to access more data.
#number: jump to this line number in the data to access more data.
trailing @: hold the current line so that other INPUT statements can access data from the same line.

The @, + and # specifications can all be followed by a variable name instead of a number.
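
Two of these modifiers are easier to follow in a small program. The sketch below is an illustration under stated assumptions rather than a workshop example: the data set names held and twolines, the variable names, and the data values are all hypothetical.

```sas
/* Hypothetical data sets and made-up values, for illustration only. */

/* Trailing @: read a code from column 1, hold the line, then      */
/* choose how to read the rest of the same line.                   */
data held;
   input Type $1. @;
   if Type = 'A' then input @3 Amount 5.2;   /* columns 3-7, two decimals */
   else input @3 Count 3.;                   /* columns 3-5, integer      */
   datalines;
A 12345
B 017
;

/* Slash: build one observation from two consecutive data lines.   */
data twolines;
   input Name $ / Height Weight;   /* Name from line 1, rest from line 2 */
   datalines;
ann
64.5 120
bob
70.1 165
;

proc print data=held;     run;
proc print data=twolines; run;
```

In the first data step, the second INPUT statement has no trailing @, so the held line is released at the end of each iteration and the next iteration starts on a new line.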