Chapter 3: Working With Your Data

**SAS Alert**
In this chapter, we will see some examples of SAS at its worst. The “implicit” looping of the DATA step makes several operations that would be easy in R difficult or clumsy in SAS.

© Fall 2011 John Grego and the University of South Carolina

Working With Your Data
Creating variables based on other variables is easily done within the DATA step.
Assignment is carried out with the = sign.

    input var1 var2 var3;
    mysum=var1+var2+var3;
    mycube=var3**3;
    always6=6;
    newname=var2;

Order of operations is followed, but use parentheses when necessary and for clarity.
Previously defined variables can be overwritten:

    var1=var1-15;

In addition to simple math expressions, you can use built-in SAS functions to create variables.
Section 3.3 (pages 78-81) lists many built-in functions.
Some of the most useful: LOG, MEAN, ROUND, SUM, TRIM, UPCASE, SUBSTR, CAT, COMPRESS, DAY, MONTH
Note: MEAN takes the mean of several variables within one observation, not the mean of all values of one variable. The same holds for SUM, etc. They are row operators, not column operators.

Using IF-THEN Statements
Conditional statements in SAS rely on the keywords IF, THEN and ELSE and on the logical keywords EQ, NE, GT, LT, GE, LE, IN, AND, OR.
The comparison and logical keywords have symbolic equivalents (see page 82).
IN checks whether a variable value occurs in a specified list.
An IF-THEN statement is a simple conditional statement that carries out only one action, unless the keywords DO and END are used to group several statements (like braces in R).

    IF X>0 AND X<2 THEN Y=X;
    ELSE Y=2-X;

Several conditions may be checked using ELSE IF or ELSE statements.
The last action is carried out if none of the previous conditions are true.

    IF .. THEN ..;
    ELSE IF ..
    THEN ..;
    ELSE ..;

Using several ELSE statements is more efficient than using several separate IF-THEN statements, because SAS stops checking once a condition is true (though errors in logic are more likely).
Note: Parentheses may be useful with AND/OR conditions.
Be careful with missing values when making comparisons! SAS considers missing values to be “less than” practically any value, so if the data contain missing values, handle them separately.

    IF weight=. THEN size='unknown';
    ELSE IF weight<25 THEN size='small';
    ELSE IF ..

Using IF to select a subset of data
We can retain cases from the data using logical operators with a subsetting IF. The syntax is unusual; it seems as though part of the statement is missing.

    DATA B;
    SET A;
    IF type='Pine';

Data set B will then include only the cases that match the condition.
Most people get used to this odd syntax, but if it makes you really uncomfortable you can spell out the action instead, for example by removing the unwanted cases with IF .. THEN DELETE. (The KEEP statement selects variables, not cases.)

SAS Dates
SAS stores dates internally as the number of days since January 1, 1960.
Special informats are available for reading dates (pp. 44-45).
When a year is specified by two digits ('03, '45, etc.), you can use the YEARCUTOFF option to specify the century.
The default is 1920, so SAS assumes two-digit years fall between 1920 and 2019. It's better to simply avoid this ambiguity by using four-digit years.

    options yearcutoff=1930;
    options yearcutoff=1800;

Handy function: TODAY() returns the current date.
Special date form for logical operators:

    if birthdate>'01JAN1988'd then age='Under 21';

Printing dates in a conventional format: use a FORMAT statement in PROC PRINT. We can also output data using date formats (pp.
90-91).
Other useful functions: MONTH(), DAY(), YEAR(), MDY(), QTR()

RETAIN statement
The RETAIN statement tells SAS to retain the value of a variable as SAS moves from observation to observation.
A clumsy solution to a common SAS problem.
Can be useful for cumulative analyses.
A sum statement creates a cumulative sum:

    cumul_sum+value_added;

Using arrays
We have seen how to alter variables that have been read into a SAS data set.
Sometimes we want to operate on more than one variable in the same way.
This can be accomplished quickly by creating an array (another example of an awkward solution to a common problem in SAS).
An array is a group of variables (either all numeric or all character).
These could be already-existing variables or new ones.

Defining an array
Once an array is defined, you can refer to its variables using “subscripts”. Ex: array_name(2)
Using a DO statement is clunky; I like to use DO OVER instead.

    ARRAY array_name (n) $ .. .. ..;

Shortcuts for lists of variables
Suppose variable names begin with a common character string and end with a number sequence: var1, var2, var3, var4
You can refer to them in shortcut fashion: var1-var4
When specifying abbreviated lists in functions, you must use the keyword OF:

    sum(of var1-var4);
    mean(of var1-var7);

You can abbreviate lists of named variables using a double hyphen: firstvar--lastvar
These must follow the creation order of the variables as defined in the SAS data set. Check this either in the worksheet or by entering:

    proc contents data=dsname position;

Special variables
_ALL_ is short for “all variables in the data set”
_NUMERIC_ is short for “all numeric variables in the data set”
_CHARACTER_ is short for “all character variables in the data set”
_N_ is short for “current observation index in the data set” (strictly, the number of the current DATA step iteration)
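
Putting the pieces together
The following is a minimal sketch, not taken from the original notes, of how the ideas in this chapter might combine in one DATA step. The data set names (trees, pines), the variables type, weight, planted, rowmean, recent, and obs_in, the data values, the 'large' category, and the cutoff year 1950 are all assumptions made up for illustration.

```sas
/* Hypothetical example data; every name and value here is an assumption. */
options yearcutoff=1950;   /* two-digit years are read as 1950-2049 */

data trees;
    /* The colon modifier lets list input use a date informat. */
    input type $ var1 var2 var3 weight planted :mmddyy8.;
    datalines;
Pine 20 3 2 30 04/15/03
Oak 18 5 4 40 06/01/88
Pine 25 1 3 . 09/30/99
Pine 22 2 6 12 11/20/10
;
run;

data pines;
    set trees;

    /* Subsetting IF: only observations with type='Pine' continue. */
    if type = 'Pine';

    /* New variables built from existing ones; ** is exponentiation. */
    mysum  = var1 + var2 + var3;
    mycube = var3**3;

    /* Row-wise function over an abbreviated list requires OF. */
    rowmean = mean(of var1-var3);

    /* Check for missing first: a missing weight would otherwise
       compare as less than 25 and be labelled 'small'. */
    length size $ 7;
    if weight = . then size = 'unknown';
    else if weight < 25 then size = 'small';
    else size = 'large';

    /* Date constant in a comparison; recent is 1 (true) or 0 (false). */
    recent = (planted > '01JAN2000'd);

    /* Sum statement: cumul_sum starts at 0, is retained across
       observations, and skips missing values of weight. */
    cumul_sum + weight;

    /* Array: apply the same operation to several variables. */
    array v(3) var1-var3;
    do i = 1 to 3;
        v(i) = v(i) - 15;
    end;
    drop i;

    /* _N_ counts DATA step iterations, including discarded ones. */
    obs_in = _n_;

    format planted date9.;
run;

proc print data=pines;
run;
```

In this sketch mysum, mycube, and rowmean are computed before the array loop overwrites var1-var3, so they reflect the original values. Because _N_ keeps counting the observations that the subsetting IF discards, obs_in records each row's position in trees rather than in pines.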