Stat 579: Grouping, Loops and Conditional Execution Ranjan Maitra

2220 Snedecor Hall
Department of Statistics
Iowa State University.
Phone: 515-294-7757
maitra@iastate.edu
Grouped expressions
R is an expression language in the sense that its only
command type is a function or expression which returns a
result. Even an assignment is an expression whose result
is the value assigned, and it may be used wherever any
expression may be used; in particular multiple
assignments are possible.
Commands may be grouped together in braces, {expr 1;
...; expr m}, in which case the value of the group is the
result of the last expression in the group evaluated. Since
such a group is also an expression it may, for example, be
itself included in parentheses and used as part of an even
larger expression, and so on.
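A minimal sketch of these two points. The names a, b, val and total are made up for illustration: an assignment returns the value assigned, so assignments can be chained, and a braced group evaluates to its last expression.

```r
# Chained assignment: `b <- 3` is itself an expression whose value is 3
a <- b <- 3

# A braced group evaluates to its last expression
val <- {
  tmp <- a + b   # intermediate result
  tmp * 2        # value of the whole group
}

# A group can be placed in parentheses and used inside a larger expression
total <- ({ a + 1 }) + ({ b + 2 })

print(c(a = a, b = b, val = val, total = total))
```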
We will now discuss control statements and how to handle
them in R.
Conditional execution: if statements
The language has available a conditional construction of
the form
> if (expr 1) expr 2 else expr 3
where expr 1 must evaluate to a single logical value and
the result of the entire expression is then evident.
The short-circuit operators && and || are often used as
part of the condition in an if statement. Whereas & and |
apply element-wise to vectors, && and || apply to vectors
of length one, and only evaluate their second argument if
necessary.
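The difference between the two families of operators can be sketched as follows. The vectors x and v are made-up data; the point is that && never evaluates its right-hand side when the left-hand side already decides the result.

```r
x <- c(2, -1, 5)

# Element-wise: one result per element of x
elementwise <- (x > 0) & (x < 4)

# Short-circuit: the right-hand side is skipped because the left side is FALSE
v <- NULL
safe <- !is.null(v) && v[1] > 0

# Typical use inside an if condition, which needs a single logical value
if (length(x) > 0 && x[1] > 0) {
  msg <- "first element is positive"
} else {
  msg <- "first element is not positive"
}

print(elementwise)
print(safe)
print(msg)
```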
There is a vectorized version of the if/else construct, the
ifelse function. This has the form
ifelse(condition, a, b) and returns a vector of the same
length as condition, with elements a[i] if
condition[i] is true, otherwise b[i] (a and b are recycled as needed).
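A small sketch of ifelse() on made-up scores, assuming a hypothetical pass mark of 60. In the first call the scalars "pass" and "fail" are recycled; in the second, b is a full vector and elements are taken position by position.

```r
score <- c(55, 72, 90, 48)

# Scalar a and b are recycled to match the condition
grade <- ifelse(score >= 60, "pass", "fail")

# Here b is a vector: b[i] is used wherever condition[i] is FALSE
capped <- ifelse(score > 80, 80, score)

print(grade)
print(capped)
```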
Repetitive execution: for loops
There is also a for loop construction which has the form
> for (name in expr 1) expr 2
where name is the loop variable. expr 1 is a vector
expression, (often a sequence like 1:20), and expr 2 is
often a grouped expression with its sub-expressions
written in terms of the dummy name. expr 2 is repeatedly
evaluated as name ranges through the values in the vector
result of expr 1.
As an example, suppose ind is a vector of class indicators
and we wish to produce separate plots of y versus x within
classes. One possibility here is to use coplot() which
will produce an array of plots corresponding to each level
of the factor. Another way to do this, now putting all plots
on the one display, is as follows:
> xc <- split(x, ind)
> yc <- split(y, ind)
> for (i in 1:length(yc)) {
+   plot(xc[[i]], yc[[i]])
+   abline(lsfit(xc[[i]], yc[[i]]))
+ }
for loops (continued)
Note the function split() which produces a list of
vectors obtained by splitting a larger vector according to
the classes specified by a factor. This is a useful function,
mostly used in connection with boxplots. See the help
facility for further details.
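To see what split() returns, here is a sketch on made-up data (six measurements in two classes; y and ind are illustrative values only).

```r
# Made-up data: six measurements in two classes
y   <- c(5.1, 4.8, 6.3, 6.0, 5.5, 6.1)
ind <- factor(c("a", "a", "b", "b", "a", "b"))

yc <- split(y, ind)        # list with one numeric vector per level of ind
print(yc)

# Per-class summaries computed from the list
class_means <- sapply(yc, mean)
print(class_means)

# boxplot() accepts such a list directly and draws one box per class:
# boxplot(yc)
```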
Warning: Note that for() loops should be used in R
code much less often than in compiled languages. Code
that takes a whole object view is likely to be both clearer
and faster in R.
Other looping facilities include the statements
> repeat expr
> while (condition) expr
The break statement can be used to terminate any loop,
possibly abnormally. This is the only way to terminate
repeat loops. The next statement can be used to
discontinue one particular cycle and skip to the “next”.
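A short sketch of break and next inside a repeat loop. The task (collecting the squares of the odd numbers up to 10) is made up for illustration.

```r
i <- 0
odd_squares <- c()
repeat {
  i <- i + 1
  if (i > 10) break          # break is the only way out of a repeat loop
  if (i %% 2 == 0) next      # skip even i and go straight to the next cycle
  odd_squares <- c(odd_squares, i^2)
}
print(odd_squares)
```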
Conditional computation in R
The basic control structure available in R for conditional
computation is of the form if (cond) expr 1 else expr 2
where cond is an expression that evaluates to a logical
value, expr 1 is an R expression that will be executed if
the value of cond is TRUE, and expr 2 is an R expression
that will be executed if the value of cond is FALSE.
Both expr 1 and expr 2 may be either simple R
expressions or compound R expressions.
A compound R expression consists of a group of simple R
expressions enclosed in braces. Note that several R
expressions may be entered on the same line as long as
they are separated by semicolons. Some examples are:
> a <- 25; b <- 50
> if (a > b) c <- a else c <- b
> c
[1] 50
> gpa <- 3.5; sat <- 560
> if (gpa < 3 || sat < 600) {
+   category <- "B"
+   score <- gpa + .007 * sat
+ } else {
+   category <- "A"
+   score <- gpa + .006 * sat
+ }
> category
[1] "B"
Conditional computation in R (contd.)
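Note that if there is a newline just before the else
clause at the top level (outside braces), the if statement
is taken to be complete, because an if statement without an
else part is a valid R statement. The else must therefore
start on the same line as the end of the if part.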
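The effect of the newline can be checked without typing at the prompt. In this sketch the offending statement is held in a string (the name bad is made up) and handed to parse(), which reports the same syntax error the console would.

```r
x <- -2

# At top level, `else` follows on the same line as the closing brace
if (x > 0) {
  sgn <- "positive"
} else {
  sgn <- "not positive"
}

# The same statement with `else` on a new line does not parse at top level
bad <- "if (x > 0) sgn <- 'positive'\nelse sgn <- 'not positive'"
res <- try(parse(text = bad), silent = TRUE)
print(inherits(res, "try-error"))

# Inside braces the whole group is read first, so the newline is harmless
{
  if (x > 0) sgn2 <- "positive"
  else sgn2 <- "not positive"
}
print(c(sgn, sgn2))
```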
The operators && and || are used to combine logical
comparisons to make compound logical expressions. As
noted before, the operators & and | must be used to
combine logical vectors. The functions all() and any()
are also useful for making compound logical expressions
by combining the components of their arguments using the
&& and the || operators, respectively.
> h
[1] 15.1 11.3 7.0 9.0
> h>10
[1] TRUE TRUE FALSE FALSE
> h[1]>10&&h[2]>10&&h[3]>10&&h[4]>10
[1] FALSE
> all(h>10)
[1] FALSE
> h[1]>10||h[2]>10||h[3]>10||h[4]>10
[1] TRUE
> any(h>10)
[1] TRUE
Conditional computation (continued)
These two functions can be used on the results of more
complex logical expressions:
> a <- rnorm(100,5,2)
> any( a > 9.5 )
[1] TRUE
> all( a >.5 )
[1] FALSE
> any( a <.5 | a > 9.5)
[1] TRUE
> a[ a < .5 | a > 9.5]
[1] -0.5626276 9.5432217 0.2307503 0.2633115
> (1:100)[a < .5 | a > 9.5]
[1] 5 36 50 99
If the logical expression cond results in a vector of logical
values, the ifelse() function is better suited for
conditional evaluation. It is of the form ifelse(cond,
expr 1, expr 2) and returns a value of the same shape as
cond, containing elements from expr 1 or expr 2
depending on whether the corresponding elements of
cond are TRUE or FALSE, respectively.
> x <- 6 : -4
> x
[1] 6 5 4 3 2 1 0 -1 -2 -3 -4
> ifelse( x > 0, x, -x)
[1] 6 5 4 3 2 1 0 1 2 3 4
Looping in R
There are three kinds of looping constructs in R: the for
loop, the while loop, and the repeat loop. Three common uses
of a loop in computation are:
- repeating the same transformation (or computation) on
  every element of a data structure, say, an array or a matrix,
- forming sums, such as we saw in the case of
  computational formulas for the sample variance, or
  summation of series expansions, and
- implementing iterative methods.
It is possible in R to avoid using loops in the first two
instances because of the vectorizing capability of R. Thus
we can take square roots of every element in a vector
object or find the sample variance of the data in a vector
object, without using a loop.
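Both cases can be sketched without a loop. The data vector x is made up, and the variance line uses the usual computational formula, checked against the built-in var().

```r
x <- c(4, 9, 16, 25, 36)

# Transformation of every element, no loop needed
roots <- sqrt(x)

# Sample variance via the computational formula, using vectorized sums
n  <- length(x)
s2 <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

print(roots)
print(c(formula = s2, builtin = var(x)))
```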
In order to understand iterative methods, some
background knowledge in numerical computing is useful.
This topic will be covered in a later class where the use of
the above looping constructs to program iterative methods
will be discussed.
for loops
This type of loop construct is suitable when the sequence
of values through which the loop variable runs is
known in advance. The general form of the statement is
for(name in seq) expr where seq is a sequence of values,
usually a vector or a list, name is the name of an object,
usually a scalar, and expr is a simple or compound R
expression. Each value of seq is assigned to the name in
turn and the expression is evaluated. Thus the expression
is evaluated repeatedly for different values of name,
whether or not its value is used in the evaluation of the
expression. The loop is terminated and control passed on
to the next statement when the sequence of values is
exhausted.
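A sketch of the three points above, with made-up values: seq may be a character vector, it may be a list, and the body need not use the loop variable at all.

```r
# seq can be any vector; here a character vector
for (nm in c("alpha", "beta")) {
  cat("processing", nm, "\n")
}

# seq can also be a list, whose elements may differ in type and length
items <- list(1:3, "text", c(TRUE, FALSE))
lens <- integer(0)
for (el in items) {
  lens <- c(lens, length(el))
}
print(lens)

# The body is evaluated once per value even if the loop variable is unused
count <- 0
for (k in 1:5) count <- count + 1
print(count)
```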
for loops: An example
> x <- 10 : 20
> for(i in 1:11) x[i] <- x[i]^2
In the above example, the variable i is used for extracting
successive elements of a vector using subscripting. The loop
is unnecessary here because the exponentiation operation is
vectorized: the same result is obtained by replacing the loop with
> x <- x^2
which is preferred.
Similarly for the following example:
> a <- rnorm(100); b <- rnorm(100)
> diff1 <- rep(0,100)
> for(i in 1:100) diff1[i] <- a[i]-b[i]
Note that a newly created object must be initialized
before subscripts can be used to reference its elements
in a for loop.
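The loop and its vectorized equivalent can be compared directly. In this sketch the seed is arbitrary and only makes the run reproducible; diff2 is a made-up name for the vectorized result.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
a <- rnorm(100)
b <- rnorm(100)

diff1 <- numeric(100)       # initialize before filling by subscript
for (i in seq_along(a)) {
  diff1[i] <- a[i] - b[i]
}

diff2 <- a - b              # vectorized equivalent, no loop

print(all.equal(diff1, diff2))
```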
Example: two nested for loops
Here nested for loops are used for computing the
treatment totals of the weights for each of the levels of
the factor feed. Note that the sequence in the second for
loop here is a vector of character values.
> chickwts
> attach(chickwts)
> levels(feed)
[1] "casein" "horsebean" "linseed" "meatmeal" "soybean" "sunflower"
> tots <- rep(0,6)
> names(tots) <- levels(feed)
> tots
> for(i in 1:71){
+   for(diet in levels(feed))
+     if(feed[i]==diet) tots[diet] <- tots[diet]+weight[i]
+ }
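For comparison, here is one possible whole-object alternative, not the approach shown on the slide. It indexes chickwts directly instead of using attach(), replaces the inner loop by subscripting with the feed name, and checks the result against tapply(). The name tots2 is made up.

```r
# Loop version with a single loop: the feed name is used as the subscript
tots <- rep(0, nlevels(chickwts$feed))
names(tots) <- levels(chickwts$feed)
for (i in seq_len(nrow(chickwts))) {
  diet <- as.character(chickwts$feed[i])
  tots[diet] <- tots[diet] + chickwts$weight[i]
}

# Whole-object version: sum of weight within each level of feed
tots2 <- tapply(chickwts$weight, chickwts$feed, sum)

print(tots)
print(tots2)
```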
while loops
This kind of looping structure is suitable when the number
of times the computations within the loop are
repeated is not known in advance, and the termination of
the loop depends on some other criterion.
The general form of the while loop is: while(cond) expr.
The simple or compound R expression, expr, is
repeatedly executed until the logical expression cond
evaluates to a FALSE value. Then the loop is exited and
control transfers to the next statement. The value of the
cond logical expression must necessarily depend on
computations carried out in expr. Below, an example
previously used to illustrate the for loop, is redone using a
while loop. Note that the loop counter j is incremented
inside the loop.
Example: while loops
> a <- rnorm(100)
> b <- rnorm(100)
> diff3 <- NULL
> j <- 0
> while ( j < 100) {
+   j <- j+1
+   diff3[j] <- a[j]-b[j]
+ }
In practice, a while loop is preferred when the cond
expression does more than count iterations. It
can be used, for example, to test a terminating
condition, such as whether the error in a computed answer has
fallen below a prespecified tolerance level
and is therefore acceptable.
Example: while loops
> i <- 0
> term <- 1
> sum <- 1
> x <- 5
> while( term > 0.0001) {
+   i <- i + 1
+   term <- x^i / factorial(i)
+   sum <- sum + term
+ }
The loop terminates once the term just added to the
series is less than or equal to 0.0001.
Note that this loop will not work for negative values of x,
because the terms then alternate in sign and the test
term > 0.0001 fails at the first negative term.
Of course, if we knew that 21 terms of the series are needed
to achieve this accuracy, we could have used
the expression: 1+sum(5^(1:20)/factorial(1:20))
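One way to make the loop usable for negative x is to test the magnitude of the term. The sketch below wraps the loop in a hypothetical helper, exp_series(), which is not part of the original slides, and compares the results with the built-in exp() and with the vectorized expression.

```r
exp_series <- function(x, tol = 1e-4) {
  # Hypothetical helper: sums x^i / i! until the newest term is small in magnitude
  i <- 0
  term <- 1
  total <- 1
  while (abs(term) > tol) {   # abs() lets the test work for negative x
    i <- i + 1
    term <- x^i / factorial(i)
    total <- total + term
  }
  list(value = total, last_i = i)
}

pos <- exp_series(5)
neg <- exp_series(-5)

print(c(series = pos$value, builtin = exp(5), last_i = pos$last_i))
print(c(series = neg$value, builtin = exp(-5)))

# Vectorized form, valid once the number of terms is known
vec <- 1 + sum(5^(1:20) / factorial(1:20))
print(all.equal(pos$value, vec))
```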
repeat loops
The repeat loop is similar to the while loop except that the
condition for termination is tested inside the loop. This
allows for more than a single condition to be checked and
for these conditions to occur at different places in the loop.
The general form of the repeat loop is: repeat expr where
the expr is usually an R compound expression.
The expression is evaluated repeatedly, so the loop must
contain at least one break statement.
The loop is exited only when the condition guarding a
break is satisfied.
A break statement is of the form if(cond) break and any
number of these statements may appear at different places
in the loop with different cond logical expressions.
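A sketch of a repeat loop with two exit points at different places in the body. The task is an illustrative choice, not taken from the slides: a Newton-type iteration for the square root of 2, with an assumed tolerance tol and an assumed cap max_iter on the number of iterations.

```r
target   <- 2       # find sqrt(target)
tol      <- 1e-10   # assumed tolerance on the change between iterates
max_iter <- 50      # assumed safety cap on the number of iterations

est  <- 1           # starting guess
iter <- 0
repeat {
  iter <- iter + 1
  new_est <- (est + target / est) / 2      # Newton update for x^2 = target
  if (abs(new_est - est) < tol) {          # exit point 1: converged
    est <- new_est
    status <- "converged"
    break
  }
  est <- new_est
  if (iter >= max_iter) {                  # exit point 2: give up
    status <- "iteration limit reached"
    break
  }
}

print(status)
print(c(estimate = est, builtin = sqrt(target)))
```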
Example: repeat loops
The while loop in the previous example may be rewritten
as follows:
> i <- 0
> sum <- 1
> x <- 5
> repeat {
+   i <- i + 1
+   term <- x^i / factorial(i)
+   if (term <= .0001) break
+   sum <- sum + term
+ }
Thus there is not much difference between a
repeat and a while loop unless there is more than one
test or exit point. Otherwise, the choice depends
on how the user wants to organize the computations within
the loop, and when the condition for exiting the loop is
checked.
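Where the test is placed does affect the result slightly. The sketch below runs the two series loops side by side (sum_while, sum_repeat, j and term_r are made-up names): the while version adds the final small term before its test fails, whereas the repeat version tests before adding and so leaves that term out.

```r
x <- 5

# while version: the small final term is added before the test fails
i <- 0; term <- 1; sum_while <- 1
while (term > 0.0001) {
  i <- i + 1
  term <- x^i / factorial(i)
  sum_while <- sum_while + term
}

# repeat version: the test comes before the addition, so the final term is dropped
j <- 0; sum_repeat <- 1
repeat {
  j <- j + 1
  term_r <- x^j / factorial(j)
  if (term_r <= 0.0001) break
  sum_repeat <- sum_repeat + term_r
}

print(c(i = i, j = j))
print(sum_while - sum_repeat)   # the term that only the while version added
```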