
Introduction to R
Lecture 4: Looping
Andrew Jaffe
9/27/2010

Overview
Practice Review
The ‘for’ loop
    Rationale
    Syntax
    Application
    Getting creative…

Practice overview
Compute the average dog weight, dog length, and dog food consumption for each dog type at baseline.

Practice overview
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "lab"])
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "husky"])
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "poodle"])
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "retriever"])

    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "lab"])
    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "husky"])
    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "poodle"])
    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "retriever"])

    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "lab"])
    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "husky"])
    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "poodle"])
    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "retriever"])

Loop Rationale
Download “lec4_data.rda” from the website under Lecture 4 data.
Load it into R with load(filename).
Remember to check your workspace with ls().

Loop Rationale
What are the dimensions of the dataset?
    > dim(dog_dat)
    [1] 482  39

Loop Rationale
What are the variable names?
    > names(dog_dat)
     [1] "dog_id"        "owner_id"      "dog_type"
     [4] "dog_wt_mo1"    "dog_wt_mo2"    "dog_wt_mo3"
     [7] "dog_wt_mo4"    "dog_wt_mo5"    "dog_wt_mo6"
    [10] "dog_wt_mo7"    "dog_wt_mo8"    "dog_wt_mo9"
    [13] "dog_wt_mo10"   "dog_wt_mo11"   "dog_wt_mo12"
    [16] "dog_len_mo1"   "dog_len_mo2"   "dog_len_mo3"
    [19] "dog_len_mo4"   "dog_len_mo5"   "dog_len_mo6"
    [22] "dog_len_mo7"   "dog_len_mo8"   "dog_len_mo9"
    [25] "dog_len_mo10"  "dog_len_mo11"  "dog_len_mo12"
    [28] "dog_food_mo1"  "dog_food_mo2"  "dog_food_mo3"
    [31] "dog_food_mo4"  "dog_food_mo5"  "dog_food_mo6"
    [34] "dog_food_mo7"  "dog_food_mo8"  "dog_food_mo9"
    [37] "dog_food_mo10" "dog_food_mo11" "dog_food_mo12"

Loop Rationale
dog_wt_mo1-12: the dog’s weight at each of the 12 months
dog_len_mo1-12: the dog’s length at each of the 12 months
dog_food_mo1-12: the dog’s food consumption at each of the 12 months

Loop Rationale
Now, compute the average dog weight, dog length, and dog food consumption for each dog type at EVERY visit.
That would be 36*4 = 144 lines of code (36 measurement columns times 4 dog types), all almost identical.
Now let’s talk about the ‘for’ loop….

Syntax
    for(i in 1:10) {
      print(i)
    }
Here i is the loop variable and 1:10 is the sequence it steps through.
Curly brackets enclose the body of a loop (or of a function). The body is everything between them.
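The sequence in a ‘for’ loop does not have to be numeric. As a minimal sketch (the vector dog_types and the counter n_iter are names made up for this illustration; the four dog types come from the practice slide), the loop below steps through a character vector:

```r
# Hypothetical character vector holding the four dog types from the practice slide.
dog_types <- c("lab", "husky", "poodle", "retriever")

n_iter <- 0                 # counts how many times the loop body runs

for (type in dog_types) {   # 'type' is the loop variable, dog_types the sequence
  print(type)               # the body runs once for each element
  n_iter <- n_iter + 1
}

print(n_iter)               # one iteration per element of dog_types
```

After the loop finishes, the loop variable still holds the last element of the sequence.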
Syntax
    > for(i in 1:10) {
    +   print(i)
    + }
    [1] 1     # sets i=1 and runs the loop body to the end
    [1] 2     # sets i=2 and runs the loop body to the end
    [1] 3
    [1] 4
    [1] 5
    [1] 6
    [1] 7
    [1] 8
    [1] 9
    [1] 10    # sets i=10 and runs the loop body to the end

Syntax
Another way to think about it: set i=1, and then just run the loop body.
    > i=1
    > print(i)
    [1] 1
    > i=2
    > print(i)
    [1] 2

Syntax
Some notes/comments:
‘i’ is a common name for the loop variable, but it can be anything: ‘x’, ‘names’, etc.
The variable takes each value of the loop sequence in turn, and is overwritten if it already exists.
Run the ‘for’ loop above (with print), then type i. It should equal 10.

Syntax
    > b = 0
    > for(i in 1:10) {
    +   b = b + i
    + }
    > b
    [1] 55

    > b = 0
    > for(i in 1:10) {
    +   b = b + i
    +   print(b)
    + }
    [1] 1     # i=1:   0 [b] + 1  = 1  = b
    [1] 3     # i=2:   1 [b] + 2  = 3  = b
    [1] 6     # i=3:   3 [b] + 3  = 6  = b
    [1] 10    # i=4:   6 [b] + 4  = 10 = b
    [1] 15
    [1] 21
    [1] 28
    [1] 36
    [1] 45
    [1] 55    # i=10: 45 [b] + 10 = 55 = b

Syntax
Usually we don’t just want to print things. We want to manipulate data and save the results.
Procedure: create a blank vector, then fill in that vector with a ‘for’ loop.

Syntax
Guess what this is doing:
    b = 0
    b_vec = rep(0, 10)
    for(i in 1:10) {
      b = b + i
      b_vec[i] = b
    }

    > b_vec
     [1]  1  3  6 10 15 21 28 36 45 55
We’re using the looping variable to index!
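One way to convince yourself that the b_vec loop computes a running sum is to compare it with cumsum(), base R's built-in cumulative sum. This is only a sketch: the variable n is introduced here for illustration, and the comparison is a sanity check rather than part of the lecture's procedure.

```r
n <- 10                  # hypothetical name for the loop length used above

b <- 0                   # running sum
b_vec <- rep(0, n)       # blank vector to fill in

for (i in 1:n) {
  b <- b + i             # add the current value of i to the running sum
  b_vec[i] <- b          # store the sum so far in position i
}

print(b_vec)

# cumsum(1:n) returns the running sum of 1, 2, ..., n in one call,
# so every element should match the loop's result.
print(all(b_vec == cumsum(1:n)))
```

The loop and cumsum() should agree element by element, and the last entry of b_vec is sum(1:n).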
Syntax
The b_vec loop, step by step:
Set b=0 and create a blank vector of length 10.
For i from 1 through 10, add i to the running sum b (after the last iteration, b = sum(1:10) = 1 + 2 + … + 10 = 55).
At each iteration, store the current sum in position i of the vector b_vec.

Application
Let’s take a step forward, and calculate the average dog weight, dog length, and dog food consumption for all dogs at every visit.
Instead of looping over a vector, we will loop over the columns of a matrix/data.frame.

Application
Let’s try just dog weight first.
We can loop over non-sequential variables (indices) in a dataset, that is, indices that do not simply run 1, 2, 3, ….
Here, we want columns 4-15 of dog_dat, which correspond to the dog’s weight at each month.

Application
Looping over non-sequential elements is easy to do.
However, you have to be careful when saving the outputs of non-sequential elements.

Application
    > Index = c(1,3,5,7,9)
    > out = rep(NA,5)
    > mat = matrix(rnorm(100), ncol = 10)
    > for(i in Index) {
    +   out[i] = mean(mat[,i])
    + }
    > out
    [1]  0.2230609         NA -0.2862340         NA
    [5]  0.3940720         NA -0.1284383         NA
    [9]  0.1291539
Wrong: there’s missing data! out has grown to 9 entries, with NA in the even positions.
Note that we want out[1:5] to correspond to mat[,c(1,3,5,7,9)].

Application
    Index = 4:15
    mean_wt <- rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i]   # column index
      mean_wt[i] = mean(dog_dat[,ind])
    }

Application
Here, we define our column indices first, and create a blank vector of that length.
We then loop over the positions of that column index, take the mean of the corresponding column, and store it in the blank vector.
This stores the mean of the 4th column in the 1st position of our output vector, the 5th column in the 2nd position, the 6th column in the 3rd position, etc.

Application
Each time through the loop, we take the item in the i’th position of Index.
The first time through the loop, i=1, and ind = Index[1] = 4.

Application
Why not just loop using for(i in 4:15)?
That is, for(i in Index):
    > Index = 4:15
    > mean_wt <- rep(0, length(Index))
    > for(i in Index) {
    +   mean_wt[i] = mean(dog_dat[,i])
    + }
    > mean_wt
     [1]  0.00000  0.00000  0.00000 49.69606 48.56680 48.91141
     [7] 50.13568 50.05124 49.54793 48.29378 46.41971 44.55975
    [13] 45.02490 44.18506 45.75394
The result is too long: mean_wt now has 15 entries, with zeros left in positions 1-3, but length(Index) = 12.

Application
This version gives the same result as the first approach. If i = 4 the first time through, and you want the value saved in position 1 of another vector, shift the position by 3:
    Index = 4:15
    mean_wt <- rep(0, length(Index))
    for(i in Index) {
      mean_wt[(i-3)] = mean(dog_dat[,i])
    }

Application
I think it’s easier to define an index first, and then within the loop use each entry of that index (the first way of doing it).
However, feel free to do it whichever way makes the most sense to you.

Application
Note: R has several built-in commands that do what we just did:
rowSums(), colSums()
rowMeans(), colMeans()
We basically just did this using a loop:
    colMeans(dog_dat[,4:15])

Creative
We still have two problems to solve:
1. The average of food, weight, and length at each visit
2. Those averages for each dog type at each visit

Creative
    Index = 16:27
    mean_len <- rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i]
      mean_len[i] = mean(dog_dat[,ind])
    }

    Index = 28:39
    mean_food <- rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i]
      mean_food[i] = mean(dog_dat[,ind])
    }

Creative
    > dog_means = rbind(mean_wt, mean_len, mean_food)
    > colnames(dog_means) = paste("month",1:12,sep="_")
    > dog_means
               month_1  month_2  month_3  month_4  month_5
    mean_wt   49.69606 48.56680 48.91141 50.13568 50.05124
    mean_len  20.32427 20.57220 20.68838 20.89668 20.98050
    mean_food 30.01660 29.74834 28.75415 28.18942 29.50207
               month_6  month_7  month_8  month_9 month_10
    mean_wt   49.54793 48.29378 46.41971 44.55975 45.02490
    mean_len  21.26950 21.37178 21.50705 21.61141 21.80975
    mean_food 30.22573 30.88050 29.18942 30.01079 29.87033
              month_11 month_12
    mean_wt   44.18506 45.75394
    mean_len  21.97842 22.27822
    mean_food 29.51784 30.87614

Creative
paste: concatenates vectors after converting them to character. It’s great for creating names within ‘for’ loops, or for new matrices.
    > paste("letter",c("a","b","c"), sep=":")
    [1] "letter:a" "letter:b" "letter:c"
    > x = c("a", "b", "c")
    > paste("letter",x, sep=":")
    [1] "letter:a" "letter:b" "letter:c"

Creative
    Index = 4:15
    mean_wt <- rep(0, length(Index))
    lab = rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i]
      mean_wt[i] = mean(dog_dat[,ind])
      lab[i] = paste("the ", i, "th entry is ",
                     round(mean_wt[i],2), sep="")
    }

    > head(lab)
    [1] "the 1th entry is 49.7"  "the 2th entry is 48.57"
    [3] "the 3th entry is 48.91" "the 4th entry is 50.14"
    [5] "the 5th entry is 50.05" "the 6th entry is 49.55"

Creative
Now we get to solve #2 (using ‘for’ loops and colMeans) and store the results in 3 matrices.
First, make blank matrices.
Then create ‘for’ loops over our variables of interest.

Creative
    dogs = unique(dog_dat$dog_type)
    wt = matrix(nrow = length(dogs), ncol = 12)
    for(i in 1:length(dogs)) {   # 1:4
      # for each dog type...
      Index = which(dog_dat$dog_type == dogs[i])
      # specific weights for each dog type
      tmp = dog_dat[Index,4:15]   # each row is for one dog
      wt[i,] = colMeans(tmp)
    }

Creative
    > rownames(wt) = dogs
    > colnames(wt) = paste("month",1:12,sep="_")
    > wt
               month_1  month_2  month_3  month_4  month_5  month_6
    lab       49.81840 48.69200 49.03360 50.26560 50.17600 49.67280
    poodle    49.40090 48.27297 48.61892 49.84414 49.76126 49.25856
    husky     49.26372 48.13097 48.48142 49.70088 49.61858 49.11327
    retriever 50.19474 49.06466 49.40602 50.62632 50.54361 50.04135
               month_7  month_8  month_9 month_10 month_11 month_12
    lab       48.41600 46.54640 44.68640 45.15040 44.30640 45.88240
    poodle    47.99820 46.12613 44.26577 44.73243 43.89009 45.46306
    husky     47.86195 45.98761 44.12832 44.59469 43.75221 45.31858
    retriever 48.79248 46.91278 45.05263 45.51654 44.68496 46.24586

Creative
    # same thing for length...
    len = matrix(nrow = length(dogs), ncol = 12)
    rownames(len) = dogs
    colnames(len) = paste("month",1:12,sep="_")
    for(i in 1:length(dogs)) {
      tmp = dog_dat[dog_dat$dog_type == dogs[i],16:27]
      len[i,] = colMeans(tmp)
    }

    # and for food.
    food = matrix(nrow = length(dogs), ncol = 12)
    rownames(food) = dogs
    colnames(food) = paste("month",1:12,sep="_")
    for(i in 1:length(dogs)) {
      tmp = dog_dat[dog_dat$dog_type == dogs[i],28:39]
      food[i,] = colMeans(tmp)
    }

Creative
Note that the code for each category (weight, length, and food) is still quite similar.
Next week: double ‘for’ loops and lists.
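
To experiment with the per-type loop without loading lec4_data.rda, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data frame toy, its column names, the two dog types, and the use of 3 monthly columns instead of 12. Only the looping pattern mirrors the slides.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical stand-in for dog_dat: 10 dogs, 2 types, 3 monthly weights.
toy <- data.frame(
  dog_type = rep(c("lab", "poodle"), each = 5),
  wt_mo1 = rnorm(10, mean = 50),
  wt_mo2 = rnorm(10, mean = 49),
  wt_mo3 = rnorm(10, mean = 48),
  stringsAsFactors = FALSE
)

types <- unique(toy$dog_type)
cols <- 2:4   # column positions of the weight variables in toy

# Blank matrix: one row per dog type, one column per month.
wt_toy <- matrix(nrow = length(types), ncol = length(cols))
rownames(wt_toy) <- types
colnames(wt_toy) <- paste("month", 1:3, sep = "_")

for (i in 1:length(types)) {
  tmp <- toy[toy$dog_type == types[i], cols]  # rows for one dog type
  wt_toy[i, ] <- colMeans(tmp)                # monthly means for that type
}

print(wt_toy)
```

Each cell of wt_toy should equal the mean computed the long way, for example mean(toy$wt_mo1[toy$dog_type == "lab"]) for the first cell.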