Lecture 4

Introduction to R Lecture 4: Looping
Andrew Jaffe
9/27/2010
Overview
• Practice Review
• The ‘for’ loop
  – Rationale
  – Syntax
  – Application
  – Getting creative…
Practice overview

Compute the average dog weight, dog
length, and dog food consumption for each
dog type at baseline
Practice Overview
mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "lab"])
mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "husky"])
mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "poodle"])
mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "retriever"])
mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "lab"])
mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "husky"])
mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "poodle"])
mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "retriever"])
mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "lab"])
mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "husky"])
mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "poodle"])
mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "retriever"])
Loop Rationale
• Download “lec4_data.rda” from the website under Lecture 4 data
• Load it into R
  – load(filename)
• Check your workspace with ls()
• Remember
Loop Rationale

What are the dimensions of the dataset?
> dim(dog_dat)
[1] 482 39
Loop Rationale

What are the variable names?
> names(dog_dat)
 [1] "dog_id"        "owner_id"      "dog_type"
 [4] "dog_wt_mo1"    "dog_wt_mo2"    "dog_wt_mo3"
 [7] "dog_wt_mo4"    "dog_wt_mo5"    "dog_wt_mo6"
[10] "dog_wt_mo7"    "dog_wt_mo8"    "dog_wt_mo9"
[13] "dog_wt_mo10"   "dog_wt_mo11"   "dog_wt_mo12"
[16] "dog_len_mo1"   "dog_len_mo2"   "dog_len_mo3"
[19] "dog_len_mo4"   "dog_len_mo5"   "dog_len_mo6"
[22] "dog_len_mo7"   "dog_len_mo8"   "dog_len_mo9"
[25] "dog_len_mo10"  "dog_len_mo11"  "dog_len_mo12"
[28] "dog_food_mo1"  "dog_food_mo2"  "dog_food_mo3"
[31] "dog_food_mo4"  "dog_food_mo5"  "dog_food_mo6"
[34] "dog_food_mo7"  "dog_food_mo8"  "dog_food_mo9"
[37] "dog_food_mo10" "dog_food_mo11" "dog_food_mo12"
Loop Rationale
• dog_wt_mo1-12: the dog’s weight at each of the 12 months
• dog_len_mo1-12: the dog’s length at each of the 12 months
• dog_food_mo1-12: the dog’s food consumption at each of the 12 months
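
For readers without the course file, a stand-in with the same column layout can be simulated. This is only a sketch: the names toy_dat and make_block, the 20-row size, and every simulated value are assumptions for illustration, not the lecture data.

```r
set.seed(1)      # make the simulated values reproducible
n_dogs <- 20     # assumed size; the real dataset has 482 rows
months <- 1:12

# Hypothetical helper: one block of 12 monthly columns named prefix1 ... prefix12
make_block <- function(prefix, center) {
  block <- matrix(rnorm(n_dogs * 12, mean = center, sd = 2), ncol = 12)
  colnames(block) <- paste(prefix, months, sep = "")
  as.data.frame(block)
}

toy_dat <- data.frame(
  dog_id   = 1:n_dogs,
  owner_id = sample(1:n_dogs),
  dog_type = rep(c("lab", "poodle", "husky", "retriever"), length.out = n_dogs),
  make_block("dog_wt_mo", 50),
  make_block("dog_len_mo", 21),
  make_block("dog_food_mo", 30),
  stringsAsFactors = FALSE
)

dim(toy_dat)                    # 20 rows, 39 columns
names(toy_dat)[c(4, 16, 28)]    # first weight, length, and food columns
```

With this layout, columns 4-15 hold weight, 16-27 hold length, and 28-39 hold food, matching the names listed above.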

Loop Rationale
• Now, compute the average dog weight, dog length, and dog food consumption for each dog type at EVERY visit
• That would be 36*4 = 144 nearly identical lines of code (36 monthly columns times 4 dog types)
• Now let’s talk about the ‘for’ loop…
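
The count of 144 can be checked by enumerating every combination that would need its own mean() call. The vector names below are assumptions chosen for this sketch.

```r
measures <- c("wt", "len", "food")
months   <- 1:12
types    <- c("lab", "husky", "poodle", "retriever")

# One row per (measure, month, dog type) combination = one mean() call each
combos <- expand.grid(measure = measures, month = months, type = types)

length(measures) * length(months)   # 36 columns per dog type
nrow(combos)                        # 144 lines of near-identical code
```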

Syntax
for(i in 1:10) {
  print(i)
}
• i is the loop variable; 1:10 is the sequence it steps through
• Curly brackets enclose the body of a loop (or function): the body of the loop is everything between them
Syntax
> for(i in 1:10) {
+ print(i)
+ }
[1] 1     # sets i=1 and runs the loop body to the end
[1] 2     # sets i=2 and runs the loop body to the end
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10    # sets i=10 and runs the loop body to the end
Syntax

Another way to think about it: set i=1, and
then just run the loop body
> i=1
> print(i)
[1] 1
> i=2
> print(i)
[1] 2
Syntax

Some notes/comments:
• ‘i’ is a common variable for loops, but it can be anything: ‘x’, ‘names’, etc.
• That variable gets set to each value of the loop sequence in turn, and an existing variable with the same name gets overwritten
• Run the ‘for’ loop above (with print), and type i – it should equal 10
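
A small sketch of both notes, assuming a made-up starting value for i and a made-up loop variable named dog:

```r
i <- "some earlier value"    # hypothetical pre-existing variable

for (i in 1:10) {
  # the body is left empty here; only the loop variable matters
}
i      # the old value is gone; i holds the last element of the sequence, 10

# The loop variable can have any name, and the sequence need not be numeric
for (dog in c("lab", "husky", "poodle")) {
  print(dog)
}
dog    # "poodle", the last element looped over
```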
Syntax
> b = 0
> for(i in 1:10) {
+ b = b + i
+ }
> b
[1] 55
> b = 0
> for(i in 1:10) {
+ b = b + i
+ print(b)
+ }
[1] 1     # i=1: b = 0 + 1 = 1
[1] 3     # i=2: b = 1 + 2 = 3
[1] 6     # i=3: b = 3 + 3 = 6
[1] 10    # i=4: b = 6 + 4 = 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55    # i=10: b = 45 + 10 = 55
Syntax
• We don’t just want to print stuff usually – we want to manipulate data and save it
• Procedure: create a blank vector, then fill in that vector with a ‘for’ loop

Syntax

Guess what this is doing:
b = 0
b_vec = rep(0, 10)
for(i in 1:10) {
b = b + i
b_vec[i] = b
}
> b_vec
[1] 1 3 6 10 15 21 28 36 45 55
We’re using the looping variable to index!
Syntax

That last loop, step by step:
• Set b=0 and create a blank vector of length 10
• For i from 1 through 10, add i to the running sum b
  – I.e., after the last pass b equals sum(1:10): 1 + 2 + … + 10 = 55
• Store the running sum at position i of the vector b_vec
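
As a check on that reading, the loop can be compared with R's built-in cumsum() and sum(). The comparison is an addition to the slide, not part of the original example.

```r
b <- 0
b_vec <- rep(0, 10)
for (i in 1:10) {
  b <- b + i        # running sum so far
  b_vec[i] <- b     # saved at position i
}

b_vec
cumsum(1:10)                  # built-in running sum, same ten values
all(b_vec == cumsum(1:10))    # TRUE
b == sum(1:10)                # TRUE: the final running sum is 55
```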
Application
• Let’s take a step forward, and calculate the average dog weight, dog length, and dog food consumption for all dogs at every visit
• Instead of looping over a vector, we will loop over a matrix/data.frame

Application
• Let’s try just dog weight first
• We can loop over non-sequential variables (indices that do not run 1, 2, 3, …) in a dataset
• Here, we want columns 4-15 of dog_dat, which correspond to the dog’s weight at each month

Application
• Looping over non-sequential elements is easy to do
• However, you have to be careful when saving outputs of non-sequential elements

Application
> Index = c(1,3,5,7,9)
> out = rep(NA,5)
> mat = matrix(rnorm(100), ncol = 10)
> for(i in Index) {
+ out[i] = mean(mat[,i])
+ }
> out
[1]  0.2230609         NA -0.2862340         NA
[5]  0.3940720         NA -0.1284383         NA
[9]  0.1291539
Wrong – there’s missing data! Note that we want
out[1:5] to correspond to mat[,c(1,3,5,7,9)]
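
One way to get out[1:5] lined up with the five chosen columns is to loop over positions in Index rather than over its values. In this sketch the seed, the counter name k, and the use of seq_along() (equivalent here to 1:length(Index)) are assumptions.

```r
set.seed(42)                        # arbitrary seed so the sketch is reproducible
Index <- c(1, 3, 5, 7, 9)
mat <- matrix(rnorm(100), ncol = 10)
out <- rep(NA, length(Index))

for (k in seq_along(Index)) {       # k runs 1, 2, ..., 5
  out[k] <- mean(mat[, Index[k]])   # Index[k] picks the column, k picks the slot
}

length(out)                 # 5: one slot per selected column
anyNA(out)                  # FALSE: no gaps left behind
out[2] == mean(mat[, 3])    # TRUE: slot 2 holds the mean of column 3
```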
Application
Index = 4:15
mean_wt <- rep(0, length(Index))
for(i in 1:length(Index)) {
ind = Index[i] # column index
mean_wt[i] = mean(dog_dat[,ind])
}
Application



• Here, we are defining our column indices first, and creating a blank vector of that length
• We then loop over the positions 1 to length(Index), look up the column number stored at each position, take the mean of that column, and store it in the matching position of the blank vector
• This allows us to store the mean of the 4th column in the 1st position of our output vector, the 5th column in the 2nd position, the 6th column in the 3rd position, etc.
Application
• So, each time through the loop, we take the item from the i’th position of Index
• The first time through the loop, i=1, and ind = Index[1] = 4
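
Printing both values for the first few passes makes the mapping visible. The loop is cut to three passes here only to keep the output short.

```r
Index <- 4:15
for (i in 1:3) {
  ind <- Index[i]                            # column number stored at position i
  print(paste("i =", i, "-> ind =", ind))
}
# [1] "i = 1 -> ind = 4"
# [1] "i = 2 -> ind = 5"
# [1] "i = 3 -> ind = 6"
```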

Application

Why not just loop using for(i in 4:15), i.e., for(i in Index)?
> Index = 4:15
> mean_wt <- rep(0, length(Index))
> for(i in Index) {
+ mean_wt[i] = mean(dog_dat[,i])
+ }
> mean_wt
[1] 0.00000 0.00000 0.00000 49.69606 48.56680 48.91141
[7] 50.13568 50.05124 49.54793 48.29378 46.41971 44.55975
[13] 45.02490 44.18506 45.75394
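It’s too long: mean_wt now has 15 entries, but length(Index) = 12. The means landed in positions 4-15, and positions 1-3 still hold the initial zeros.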
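
The extra length comes from how R handles assignment past the end of a vector: it extends the vector without warning. A small sketch with made-up vectors x and v:

```r
x <- rep(0, 3)     # starts with 3 slots
x[5] <- 9          # assigning past the end silently extends the vector
x                  # 0 0 0 NA 9: the skipped slot is filled with NA
length(x)          # 5

# Same mechanism as the loop above: 12 slots, but assignments go to positions 4-15
v <- rep(0, 12)
for (i in 4:15) {
  v[i] <- i        # placeholder value standing in for a column mean
}
length(v)          # 15
v[1:3]             # 0 0 0: never overwritten
```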
Application

An equivalent fix: if i = 4 the first time through, and you want the result saved in position 1 of another vector, subtract an offset from i:
Index = 4:15
mean_wt <- rep(0, length(Index))
for(i in Index) {
mean_wt[(i-3)] = mean(dog_dat[,i])
}
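
The hard-coded 3 can also be computed from Index. This sketch assumes Index is a run of consecutive integers; the names offset and pos are made up, and the stored value is a placeholder rather than a column mean.

```r
Index <- 4:15
offset <- min(Index) - 1            # 3 here
pos <- rep(NA, length(Index))

for (i in Index) {
  pos[i - offset] <- i              # column i lands in slot i - offset
}

pos           # 4 5 6 ... 15: slot 1 holds column 4, slot 12 holds column 15
length(pos)   # still 12
```

If Index skipped values, for example c(1, 3, 5), this offset trick would leave gaps, which is one reason to prefer looping over positions.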
Application
• I think it’s easier to define an index first, and then within the loop use each entry of that index (first way of doing it)
• However, feel free to do it any way you want (however it makes the most sense to you)

Application

Note: R has several built-in commands that do what we just did:
• rowSums(), colSums()
• rowMeans(), colMeans()

We basically just did this using a loop:
colMeans(dog_dat[,4:15])
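
The equivalence can be checked on simulated data. Here toy is a hypothetical stand-in for dog_dat[,4:15] with 6 rows and made-up values.

```r
set.seed(1)
toy <- as.data.frame(matrix(rnorm(72, mean = 50, sd = 5), ncol = 12))

loop_means <- rep(0, ncol(toy))
for (i in 1:ncol(toy)) {
  loop_means[i] <- mean(toy[, i])
}

# colMeans() returns a named vector, so drop the names before comparing
all.equal(loop_means, unname(colMeans(toy)))   # TRUE
```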
Creative

We still have two problems to solve:
• #1: Average of food, weight, and length at each visit
• #2: And then those averages for each dog type at each visit
Creative
Index = 16:27
mean_len <- rep(0, length(Index))
for(i in 1:length(Index)) {
ind = Index[i]
mean_len[i] = mean(dog_dat[,ind])
}
Index = 28:39
mean_food <- rep(0, length(Index))
for(i in 1:length(Index)) {
ind = Index[i]
mean_food[i] = mean(dog_dat[,ind])
}
Creative
> dog_means = rbind(mean_wt, mean_len, mean_food)
> colnames(dog_means) = paste("month",1:12,sep="_")
> dog_means
           month_1  month_2  month_3  month_4  month_5
mean_wt   49.69606 48.56680 48.91141 50.13568 50.05124
mean_len  20.32427 20.57220 20.68838 20.89668 20.98050
mean_food 30.01660 29.74834 28.75415 28.18942 29.50207
           month_6  month_7  month_8  month_9 month_10
mean_wt   49.54793 48.29378 46.41971 44.55975 45.02490
mean_len  21.26950 21.37178 21.50705 21.61141 21.80975
mean_food 30.22573 30.88050 29.18942 30.01079 29.87033
          month_11 month_12
mean_wt   44.18506 45.75394
mean_len  21.97842 22.27822
mean_food 29.51784 30.87614
Creative

paste: concatenates vectors after converting them to character. It’s great for creating names within for loops, or for labeling new matrices
> paste("letter",c("a","b","c"), sep=":")
[1] "letter:a" "letter:b" "letter:c"
> x = c("a", "b", "c")
> paste("letter",x, sep=":")
[1] "letter:a" "letter:b" "letter:c"
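
A few more paste() patterns that come up when building names, shown as a short sketch; paste0() and the collapse argument are base R features not covered on the slide.

```r
paste("month", 1:3, sep = "_")             # "month_1" "month_2" "month_3"
paste0("dog_wt_mo", 1:3)                   # paste0() is paste() with sep = ""
paste(c("a", "b", "c"), collapse = "+")    # collapse joins everything into one string: "a+b+c"
```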
Creative
Index = 4:15
mean_wt <- rep(0, length(Index))
lab = rep(0, length(Index))
for(i in 1:length(Index)) {
ind = Index[i]
mean_wt[i] = mean(dog_dat[,ind])
lab[i] = paste("the ", i, "th entry is ",
round(mean_wt[i],2),sep="")
}
> head(lab)
[1] "the 1th entry is 49.7" "the 2th entry is 48.57"
[3] "the 3th entry is 48.91" "the 4th entry is 50.14"
[5] "the 5th entry is 50.05" "the 6th entry is 49.55"
Creative
• Now we get to solve #2 (using ‘for’ loops and colMeans) and store it in 3 matrices
• First, make blank matrices
• Then create for loops over our variables of interest

Creative
dogs = unique(dog_dat$dog_type)
wt = matrix(nrow = length(dogs), ncol = 12)
for(i in 1:length(dogs)) { # 1:4
# for each dog type...
Index = which(dog_dat$dog_type == dogs[i])
# specific weights for each dog type
tmp = dog_dat[Index,4:15]
# each row is for one dog
wt[i,] = colMeans(tmp)
}
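
The same pattern on a tiny hand-made dataset, where the answers can be checked by eye. The data frame toy, its two weight columns, and the name rows are assumptions for this sketch.

```r
# Hypothetical toy data: 2 monthly weight columns instead of 12
toy <- data.frame(
  dog_type   = c("lab", "husky", "lab", "husky"),
  dog_wt_mo1 = c(50, 40, 54, 44),
  dog_wt_mo2 = c(52, 42, 56, 46),
  stringsAsFactors = FALSE
)

dogs <- unique(toy$dog_type)                 # "lab" "husky"
wt <- matrix(nrow = length(dogs), ncol = 2)
rownames(wt) <- dogs
colnames(wt) <- paste("month", 1:2, sep = "_")

for (i in 1:length(dogs)) {
  rows <- which(toy$dog_type == dogs[i])     # rows belonging to this dog type
  wt[i, ] <- colMeans(toy[rows, 2:3])        # one mean per month
}
wt
#       month_1 month_2
# lab        52      54
# husky      42      44
```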
Creative
> rownames(wt) = dogs
> colnames(wt) = paste("month",1:12,sep="_")
> wt
           month_1  month_2  month_3  month_4  month_5  month_6
lab       49.81840 48.69200 49.03360 50.26560 50.17600 49.67280
poodle    49.40090 48.27297 48.61892 49.84414 49.76126 49.25856
husky     49.26372 48.13097 48.48142 49.70088 49.61858 49.11327
retriever 50.19474 49.06466 49.40602 50.62632 50.54361 50.04135
           month_7  month_8  month_9 month_10 month_11 month_12
lab       48.41600 46.54640 44.68640 45.15040 44.30640 45.88240
poodle    47.99820 46.12613 44.26577 44.73243 43.89009 45.46306
husky     47.86195 45.98761 44.12832 44.59469 43.75221 45.31858
retriever 48.79248 46.91278 45.05263 45.51654 44.68496 46.24586
Creative
# same thing for length...
len = matrix(nrow = length(dogs), ncol = 12)
rownames(len) = dogs
colnames(len) = paste("month",1:12,sep="_")
for(i in 1:length(dogs)) {
tmp = dog_dat[dog_dat$dog_type == dogs[i],16:27]
len[i,] = colMeans(tmp)
}
# and for food.
food = matrix(nrow = length(dogs), ncol = 12)
rownames(food) = dogs
colnames(food) = paste("month",1:12,sep="_")
for(i in 1:length(dogs)) {
tmp = dog_dat[dog_dat$dog_type == dogs[i],28:39]
food[i,] = colMeans(tmp)
}
Creative
• Note that the code for each category (weight, length, and food) is still quite similar
• Next week, double ‘for’ loops and lists
