SAS Workshop
Introduction to SAS Programming
Day 1 Session III
Iowa State University
May 9, 2016

Repetitive Computation

Repetitive computation is achieved through the use of do loops. In the SAS data step language, several forms of do statements are available. An iterative do loop is generally used to perform the same computation on a sequence of variables. This requires the sequence of variables to be defined as elements of an array. The array statement lets the user refer to each variable through the matching array element and its subscript. The use of iterative do loops and the array statement in the data step is illustrated in Examples A9 - A12.

Processing a Sequence of Variables
Example A9

    data compete;
       input Red Blue Grey Green White;
       array grade8(5) Red Blue Grey Green White;
       Total=0;
       drop Team;
       do Team=1 to 5;
          if grade8(Team)=. then grade8(Team)=0;   /* treat a missing score as zero */
          grade8(Team)=grade8(Team)*10;            /* rescale each score */
          Total + grade8(Team);                    /* sum statement adds to Total */
       end;
    datalines;
    4 6 0 1 .
    3 2 8 9 12
    5 . 4 7 6
    7 5 10 4 5
    ;
    proc print;
    run;

Writing observations into the data set

In the execution of a SAS data step, the statements in the data step are executed and an observation is written to the output SAS data set for every line of input data. However, the user can insert an output statement in the data step at any point where he/she wishes to write a new observation to the SAS data set. When an output statement is encountered, SAS writes a new observation to the SAS data set containing the current values of the variables. In Example A10, we create data values internally in the data step (i.e., no external data are input) by doing a calculation, and use the output statement to write the data as new observations. We use an iterative do loop to repeat the calculation and write each result as an observation into the data set.
Writing Observations into a SAS Dataset
Example A10

    data convert;
       do Celsius=-10 to 40 by 5;
          Fahrenheit=9*Celsius/5+32;
          output;                      /* write one observation per temperature */
       end;
    run;
    proc print data=convert;
       title "Celsius to Fahrenheit Converter";
    run;

More uses of do loops and arrays

In Example A9 we used do loops and arrays to change or transform data values in a data set. By inserting an output statement inside a do loop we can form multiple observations from the data values in a single data line. This gives us a useful method to perform an operation called transposing. Transposing uses the data lines in the input data to form columns (or variables) in a SAS data set. In Example A11, the data values in each data line (quiz scores) become values of the variable called Score in the output data set. The value of the variable Name remains the same for all of the values taken from the same data line.

More Examples of do loop and array
Example A11

    data quizzes;
       input Name $ Quiz1-Quiz5;
       array scores(5) Quiz1-Quiz5;
       drop Quiz1-Quiz5;
       do Test=1 to 5;
          if scores(Test)=. then scores(Test)=0;   /* treat a missing quiz as zero */
          Score=scores(Test);
          output;                                  /* one observation per quiz */
       end;
    datalines;
    Smith 8 7 9 . 3
    Jones 4 5 10 8 4
    ;
    proc print data=quizzes;
    run;

More uses of do loops and arrays

In Example A12 we use the method discussed in the previous example to read in a data set. At the same time, we convert it to a form suitable for input to proc anova, proc glm, etc. Most of the time, this is done in practice by reading in a single value per data line. The method we use is more intuitive because the data appear in the data lines in the same form as they would appear in a data table. Notice that in this example we have two do loops, an inner do loop nested within an outer do loop. As the index values of the two loops we use the actual values of the corresponding variables, Amount and Concentration.
Example A12

    data reaction;
       length Concentration $4;
       do Amount=.9 to .6 by -.1;
          do Concentration='1%', '1.5%', '2%', '2.5%', '3%';
             input Time @@;            /* @@ holds the line so the next value is read from it */
             output;
          end;
       end;
    datalines;
    10.9 11.5 9.8 12.7 10.6
    9.2 10.3 9.0 10.6 9.4
    8.7 9.7 8.2 9.4 8.5
    7.2 8.6 7.5 9.7 7.7
    ;
    proc print;
       title 'Reaction times for biological substrate';
    run;

Additional Notes on Arrays

Arrays are used for repetitive processing of variables

The array statement can be used to perform the same task on a group of variables.

    array array-name (subscript) <variable-list> <(initial-values)>;

You can then use the array name with parentheses and a subscript, as in the examples.

Notes:
1. All the variables in an array must be of the same type.
2. An array cannot have the same name as a variable.
3. The subscript may be a number giving the dimension size or a range of subscripts such as 1:5.
4. If an asterisk (*) is used as the subscript, SAS will determine the dimension size by counting the variables in the list.
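
The notes above can be tried out with a short sketch. It is not one of the workshop examples: the data set name arraydemo, the array names scores and wt, the variable Weighted, and the data values are all made up for illustration. The sketch declares one array with an asterisk subscript so that SAS counts the variables, uses the dim function to get that count as the loop bound, and declares a second array with the range subscript 1:5 and a list of initial values.

```sas
/* Illustrative sketch only: names and data values are hypothetical */
data arraydemo;
   input Quiz1-Quiz5;

   /* Asterisk subscript: SAS counts the five variables in the list */
   array scores(*) Quiz1-Quiz5;

   /* Range subscript 1:5 with initial values; _temporary_ elements */
   /* are not written to the output data set                        */
   array wt(1:5) _temporary_ (1 1 1 2 2);

   Weighted=0;
   drop i;
   do i=1 to dim(scores);              /* dim() returns the number of elements */
      if scores(i)=. then scores(i)=0; /* treat a missing score as zero */
      Weighted + scores(i)*wt(i);      /* sum statement accumulates the total */
   end;
datalines;
8 7 9 . 3
4 5 10 8 4
;
proc print data=arraydemo;
run;
```

With these made-up values, Weighted should come out as 30 for the first observation and 43 for the second. Using dim(scores) rather than a hard-coded 5 means the loop does not need to change if more variables are added to the array.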