Stat 579: Grouping, Loops and Conditional Execution

Ranjan Maitra
2220 Snedecor Hall
Department of Statistics
Iowa State University
Phone: 515-294-7757
maitra@iastate.edu

Grouped expressions

R is an expression language in the sense that its only command type is a function or expression which returns a result. Even an assignment is an expression whose result is the value assigned, and it may be used wherever any expression may be used; in particular, multiple assignments are possible.

Commands may be grouped together in braces, {expr_1; ...; expr_m}, in which case the value of the group is the result of the last expression in the group evaluated. Since such a group is also an expression it may, for example, itself be included in parentheses and used as part of an even larger expression, and so on.

We will now discuss control statements and how to handle them in R.

Conditional execution: if statements

The language has available a conditional construction of the form

> if (expr_1) expr_2 else expr_3

where expr_1 must evaluate to a single logical value. The result of the entire expression is then the value of expr_2 if expr_1 is TRUE, and the value of expr_3 otherwise.

The short-circuit operators && and || are often used as part of the condition in an if statement. Whereas & and | apply element-wise to vectors, && and || apply to vectors of length one, and only evaluate their second argument if necessary.

There is a vectorized version of the if/else construct, the ifelse function. This has the form ifelse(condition, a, b) and returns a vector of the same length as condition, with elements a[i] if condition[i] is true, otherwise b[i] (a and b are recycled if they are shorter than condition).

Repetitive execution: for loops

There is also a for loop construction which has the form

> for (name in expr_1) expr_2

where name is the loop variable, expr_1 is a vector expression (often a sequence like 1:20), and expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy name.
expr_2 is repeatedly evaluated as name ranges through the values in the vector result of expr_1.

As an example, suppose ind is a vector of class indicators and we wish to produce separate plots of y versus x within classes. One possibility here is to use coplot(), which will produce an array of plots corresponding to each level of the factor. Another way to do this, now putting all plots on the one display, is as follows:

> xc <- split(x, ind)
> yc <- split(y, ind)
> for (i in 1:length(yc)) {
+   plot(xc[[i]], yc[[i]])
+   abline(lsfit(xc[[i]], yc[[i]]))
+ }

The braces are needed here: without them only the call to plot() would belong to the loop body.

for loops (continued)

Note the function split(), which produces a list of vectors obtained by splitting a larger vector according to the classes specified by a factor. This is a useful function, mostly used in connection with boxplots. See the help facility for further details.

Warning: for() loops should be used in R code much less often than in compiled languages. Code that takes a whole object view is likely to be both clearer and faster in R.

Other looping facilities include the statements

> repeat expr
> while (condition) expr

The break statement can be used to terminate any loop, possibly abnormally. This is the only way to terminate repeat loops. The next statement can be used to discontinue one particular cycle and skip to the next one.

Conditional computation in R

The basic control structure available in R for conditional computation is of the form

if (cond) expr_1 else expr_2

where cond is an expression that evaluates to a logical value, expr_1 is an R expression that will be executed if the value of cond is TRUE, and expr_2 is an R expression that will be executed if the value of cond is FALSE.

Both expr_1 and expr_2 may be either simple R expressions or compound R expressions. A compound R expression consists of a group of simple R expressions enclosed in braces. Note that several R expressions may be entered on the same line as long as they are separated by semicolons.

Some examples are:

> a <- 25; b <- 50
> if (a > b) c <- a else c <- b
> c
[1] 50

> gpa <- 3.5; sat <- 560
> if (gpa < 3 || sat < 600) {
+   category <- "B"; score <- gpa + .007 * sat
+ } else {
+   category <- "A"; score <- gpa + .006 * sat
+ }
> category
[1] "B"

Conditional computation in R (contd.)

Note that if there is a newline just before the else clause at the top level, the if statement is taken to be complete, because an if statement without an else part is a valid R statement.

The operators && and || are used to combine logical comparisons to make compound logical expressions. As noted before, the operators & and | must be used to combine logical vectors.

The functions all() and any() are also useful for making compound logical expressions by combining the components of their arguments using the && and the || operators, respectively.

> h
[1] 15.1 11.3  7.0  9.0
> h > 10
[1]  TRUE  TRUE FALSE FALSE
> h[1] > 10 && h[2] > 10 && h[3] > 10 && h[4] > 10
[1] FALSE
> all(h > 10)
[1] FALSE
> h[1] > 10 || h[2] > 10 || h[3] > 10 || h[4] > 10
[1] TRUE
> any(h > 10)
[1] TRUE

Conditional computation (continued)

These two functions can be used on the results of more complex logical expressions:

> a <- rnorm(100, 5, 2)
> any(a > 9.5)
[1] TRUE
> all(a > .5)
[1] FALSE
> any(a < .5 | a > 9.5)
[1] TRUE
> a[a < .5 | a > 9.5]
[1] -0.5626276  9.5432217  0.2307503  0.2633115
> (1:100)[a < .5 | a > 9.5]
[1]  5 36 50 99

If the logical expression cond results in a vector of logical values, the ifelse() function is better suited for conditional evaluation. It is of the form ifelse(cond, expr_1, expr_2) and returns a value of the same shape as cond, containing elements from expr_1 or expr_2 depending on whether the corresponding elements of cond are TRUE or FALSE, respectively.

> x <- 6:-4
> x
 [1]  6  5  4  3  2  1  0 -1 -2 -3 -4
> ifelse(x > 0, x, -x)
 [1] 6 5 4 3 2 1 0 1 2 3 4

Looping in R

There are three kinds of looping constructs in R: the for loop, the while loop, and the repeat loop.
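
As a rough illustration of the three constructs, the sketch below adds the integers 1 to 5 in three ways. The variable names (total_for, total_while, total_repeat, k) are chosen here for illustration and are not part of the course notes. In practice the vectorized sum(1:5) would be preferred.

```r
# Sum the integers 1..5 with each looping construct.

# for loop: the values to iterate over are known in advance
total_for <- 0
for (k in 1:5) total_for <- total_for + k

# while loop: the condition is tested before each pass
total_while <- 0
k <- 0
while (k < 5) {
  k <- k + 1
  total_while <- total_while + k
}

# repeat loop: the exit test sits inside the body and uses break
total_repeat <- 0
k <- 0
repeat {
  k <- k + 1
  total_repeat <- total_repeat + k
  if (k >= 5) break
}

# Compare the three results with the vectorized sum
c(total_for, total_while, total_repeat, sum(1:5))
```

The only structural difference is where the stopping rule lives: in the sequence for the for loop, in the condition for the while loop, and inside the body for the repeat loop.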

A loop is typically used in computation for one of three purposes:

  1. for repeating the same transformation (or computation) on every element of a data structure, say, an array or a matrix,
  2. for forming sums, such as we saw in the case of computational formulas for the sample variance, or summation of series expansions, and
  3. for implementing iterative methods.

It is possible in R to avoid using loops in the first two instances because of the vectorizing capability of R. Thus we can take the square root of every element in a vector object, or find the sample variance of the data in a vector object, without using a loop.

In order to understand iterative methods, some background knowledge in numerical computing is useful. This topic will be covered in a later class, where the use of the above looping constructs to program iterative methods will be discussed.

for loops

This type of loop construct is suitable when the sequence of values of a variable through which a loop is repeated is known in advance. The general form of the statement is

for (name in seq) expr

where seq is a sequence of values, usually in a vector or a list, name is the name of an object, usually a scalar, and expr is a simple or compound R expression.

Each value of seq is assigned to name in turn and the expression is evaluated. Thus the expression is evaluated repeatedly for different values of name, whether or not its value is used in the evaluation of the expression. The loop is terminated and control passes to the next statement when the sequence of values is exhausted.

for loops: An example

> x <- 10:20
> for (i in 1:11) x[i] <- x[i]^2

In the above example, the variable i is used for extracting successive elements of a vector using subscripting. The use of a loop is unnecessary in this example because the exponentiation operation is vectorized. The loop is the same as

> x <- x^2

which is always preferred.

Similarly for the following example:

> a <- rnorm(100); b <- rnorm(100)
> diff1 <- rep(0, 100)
> for (i in 1:100) diff1[i] <- a[i] - b[i]

Note that a newly created object must be initialized before subscripts can be used to reference its elements in a for loop.

Example: two nested for loops

Here nested for loops are used for computing the treatment totals of the weights for each of the levels of the factor feed. Note that the sequence in the second for loop here is a vector of character values.

> chickwts
> attach(chickwts)
> levels(feed)
[1] "casein"    "horsebean" "linseed"   "meatmeal"  "soybean"   "sunflower"
> tots <- rep(0, 6)
> names(tots) <- levels(feed)
> tots
> for (i in 1:71) {
+   for (diet in levels(feed))
+     if (feed[i] == diet) tots[diet] <- tots[diet] + weight[i]
+ }

while loops

This kind of looping structure is suitable when the number of times the computations contained within the loop are repeated is not known in advance, and the termination of the loop depends on some other criterion.

The general form of the while loop is:

while (cond) expr

The simple or compound R expression expr is repeatedly executed until the logical expression cond evaluates to FALSE. Then the loop is exited and control transfers to the next statement. The value of the logical expression cond must necessarily depend on computations carried out in expr.

Below, an example previously used to illustrate the for loop is redone using a while loop. Note that the loop counter j is incremented inside the loop.

Example: while loops

> a <- rnorm(100)
> b <- rnorm(100)
> diff3 <- NULL
> j <- 0
> while (j < 100) {
+   j <- j + 1
+   diff3[j] <- a[j] - b[j]
+ }

In practice, a while loop is preferred when the cond expression is used for purposes other than just counting. It can be used, for example, to check a terminating condition, such as whether the error in a computed answer has fallen below a prespecified tolerance level and the answer is therefore acceptable.
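
The for and while versions above produce the same vector as the vectorized expression a - b. The following sketch checks this under a few assumptions: the call to set.seed() and the names diff_for, diff_while and diff_vec are additions made here for illustration only.

```r
set.seed(579)  # assumed seed, only to make the sketch reproducible
a <- rnorm(100)
b <- rnorm(100)

# for loop with a preallocated result vector
diff_for <- numeric(100)
for (i in seq_along(a)) diff_for[i] <- a[i] - b[i]

# while loop that grows the result from NULL, as in the example above
diff_while <- NULL
j <- 0
while (j < 100) {
  j <- j + 1
  diff_while[j] <- a[j] - b[j]
}

# vectorized form: no loop needed
diff_vec <- a - b

# Both loop results can be compared with the vectorized one
identical(diff_for, diff_vec)
identical(diff_while, diff_vec)
```

Preallocating with numeric(100) avoids growing the vector one element at a time, which is what the while version does when it starts from NULL.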

Example: while loops

> i <- 0
> term <- 1
> sum <- 1
> x <- 5
> while (term > 0.0001) {
+   i <- i + 1
+   term <- x^i / factorial(i)
+   sum <- sum + term
+ }

The loop is terminated when the value of the last term added to the series is less than or equal to 0.0001. Note that this loop will not work for negative values of x, because the terms then alternate in sign and the test term > 0.0001 fails as soon as a term is negative.

Of course, if we knew that 21 terms of the series are needed to achieve this accuracy, we could have used the expression:

> 1 + sum(5^(1:20) / factorial(1:20))

repeat loops

The repeat loop is similar to the while loop except that the condition for termination is tested inside the loop. This allows for more than a single condition to be checked and for these conditions to occur at different places in the loop.

The general form of the repeat loop is:

repeat expr

where expr is usually a compound R expression. The expression is evaluated repeatedly, so at least one break statement must be in the loop. The loop will be exited only when a given condition is satisfied.

A break statement is of the form

if (cond) break

and any number of these statements may appear at different places in the loop with different cond logical expressions.

Example: repeat loops

The while loop in the previous example may be rewritten as follows:

> i <- 0
> sum <- 1
> x <- 5
> repeat {
+   i <- i + 1
+   term <- x^i / factorial(i)
+   if (term <= .0001) break
+   sum <- sum + term
+ }

Note that this version tests the term before adding it, so the final term (which is at most 0.0001) is not included in the sum, unlike in the while loop.

Thus there is not much of a difference between a repeat and a while loop, unless there is more than one test and exit point.
Otherwise, the choice depends on how the user wants to organize the computations within the loop, and when the condition for exiting the loop is checked.
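
To illustrate a repeat loop with more than one exit point, here is a minimal sketch that approximates exp(x) by its series and stops either when the latest term is negligible in absolute value or when a cap on the number of terms is reached. The names tol, max_iter and total, and the values assigned to them, are assumptions made for this sketch and do not come from the course notes. Testing abs(term) rather than term lets the loop handle negative x as well.

```r
# Approximate exp(x) by its series, with two separate exit tests.
x <- -5
tol <- 1e-4        # assumed tolerance
max_iter <- 100    # assumed safety cap on the number of terms

i <- 0
total <- 1         # the i = 0 term of the series
repeat {
  i <- i + 1
  term <- x^i / factorial(i)
  total <- total + term
  if (abs(term) <= tol) break   # exit 1: the latest term is negligible
  if (i >= max_iter) break      # exit 2: give up after max_iter terms
}

# Number of terms used, the approximation, and R's own exp(x) for reference
c(terms_used = i, approximation = total, reference = exp(x))
```

With a while loop both tests would have to be combined into a single condition checked at the top of each pass; with repeat each test can sit where it is most natural in the body.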