CSSS 508: Intro R  2/10/06
Homework 5 Solutions

1) Download the data hw5prob1.dat from the website and read it into R.

> prob5<-read.table("hw5prob1.dat")
> dim(prob5)
[1] 200 500

Write a function that samples n rows from a matrix and returns the mean of the values between the 10th and 90th percentiles for each sampled row.

We pass in the matrix and the number of rows we want to sample (n). The trim option of the mean function trims off the lowest and highest 10% of the values in each sampled row. Because read.table() returns a data frame, a single row of prob5 is itself a one-row data frame; unlist() turns it into a plain numeric vector that mean() can trim.

mean.of.sampled.rows<-function(n,matrix){
  ##Dimensions of matrix
  m.rows<-nrow(matrix)
  ##Space for the means we're returning
  mean.vec<-rep(NA,n)
  ##Which n rows we're sampling
  sampled.rows<-sample(seq(1,m.rows),n)
  for(i in 1:n){
    ##unlist() turns a one-row data frame into a numeric vector
    mean.vec[i]<-mean(unlist(matrix[sampled.rows[i],]),trim=0.10)
  }
  return(mean.vec)
}

> mean.of.sampled.rows(3,prob5)
[1] 27.87556 27.84723 26.78632
> mean.of.sampled.rows(6,prob5)
[1] 26.90512 26.57977 27.28167 27.62742 26.29790 26.65509

(The rows are sampled at random, so your values will differ.)

We could have done this with an apply function as well. This version is shorter, but it is not necessarily faster: apply() still loops over the rows internally, so the gain is mainly readability.

mean.of.sampled.rows<-function(n,matrix){
  ##Dimensions of matrix
  m.rows<-nrow(matrix)
  ##No need to set aside space for mean.vec
  ##Which n rows we're sampling; then make a smaller submatrix
  ##(drop=FALSE keeps the result a matrix even when n is 1)
  sampled.rows<-sample(seq(1,m.rows),n)
  sampled.matrix<-matrix[sampled.rows,,drop=FALSE]
  ##apply passes the extra argument trim=0.10 on to mean
  mean.vec<-apply(sampled.matrix,1,mean,trim=0.10)
  return(mean.vec)
}

This version also gives us the sampled row numbers back, as the names of the result:

> mean.of.sampled.rows(3,prob5)
      81      160        8
26.99522 27.00366 26.97218
> mean.of.sampled.rows(6,prob5)
     167       70       64       12       27       59
26.98979 27.00424 26.95259 26.96838 26.99951 27.03397

Now write a function that takes in a matrix and a vector of sample sizes. For each sample size n, find the trimmed means (10 percent on each side) of n sampled rows of the matrix. Store the average of the n means.
At the end, you should have a vector the same length as the vector of sample sizes, with one “mean of the trimmed means” for each sample size. Return this vector.

mean.of.the.means<-function(matrix, sample.size.vec){
  ##How many different sample sizes we have
  n<-length(sample.size.vec)
  ##Space for the averages we're returning
  mean.of.means.vec<-rep(NA,n)
  ##Loop over the different sample sizes.
  ##Use the previous function mean.of.sampled.rows
  ##to sample rows and return their trimmed means in a vector,
  ##then take the average of those means.
  for(i in 1:n){
    mean.vec<-mean.of.sampled.rows(sample.size.vec[i],matrix)
    mean.of.means.vec[i]<-mean(mean.vec)
  }
  return(mean.of.means.vec)
}

You could also add a check that the sample size does not exceed the number of rows of the matrix (you can't sample more rows than the matrix has).

> mean.of.the.means(prob5,c(3,5,10,20,50,100))
[1] 27.00236 27.01630 27.01185 27.00672 26.99620 27.00258

2) Create a matrix with 8 rows and 6 columns. Each row is a random sample of size 6 from a uniform distribution on the interval [2,4]. Print out your matrix.

> prob2.matrix<-matrix(0,8,6)
> for(i in 1:8){
+ prob2.matrix[i,]<-runif(6,2,4)
+ }
> prob2.matrix
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
[1,] 2.375181 3.497939 2.769875 2.882981 2.368221 2.181326
[2,] 2.088742 3.562703 2.050232 3.249195 3.061528 3.620945
[3,] 2.896904 2.600610 3.933817 3.195647 3.819058 2.336203
[4,] 3.696312 3.417983 3.704388 3.134369 2.830233 2.126479
[5,] 2.689624 2.272793 2.338112 2.006397 2.845876 2.657805
[6,] 2.254685 2.644029 2.376087 3.155786 3.801183 2.709532
[7,] 3.600982 2.663882 2.903032 2.572278 2.195084 2.627709
[8,] 3.609709 2.710008 2.126889 3.889838 3.818240 2.848927

Do the following with the apply command (no for loops).

a) Return the mean of each row.

> apply(prob2.matrix,1,mean)
[1] 2.679254 2.938891 3.130373 3.151627 2.468435 2.823550 2.760495 3.167268

b) Return the max of each column.
> apply(prob2.matrix,2,max)
[1] 3.696312 3.562703 3.933817 3.889838 3.819058 3.620945

c) Return the first, second, and third quartiles of each column.

We're returning several values for each column here, so we need to write our own function. Remember that this function is applied to each column, one at a time, so it takes a vector as its argument. return() accepts only a single object, so we bundle the three quartiles into a named list.

> first.three.quartiles<-function(vector){
+ sum.vector<-summary(vector)
+ quar.1<-sum.vector[2]
+ quar.2<-sum.vector[3]
+ quar.3<-sum.vector[5]
+
+ return(list(quar.1=quar.1,quar.2=quar.2,quar.3=quar.3))
+ }
> apply(prob2.matrix,2,first.three.quartiles)
[[1]]
[[1]]$quar.1
1st Qu.
  2.345

[[1]]$quar.2
Median
 2.793

[[1]]$quar.3
3rd Qu.
  3.603

[[2]]
[[2]]$quar.1
1st Qu.
  2.633

[[2]]$quar.2
Median
 2.687

[[2]]$quar.3
3rd Qu.
  3.438

[[3]]
[[3]]$quar.1
1st Qu.
  2.285

[[3]]$quar.2
Median
 2.573

[[3]]$quar.3
3rd Qu.
  3.103

[[4]]
[[4]]$quar.1
1st Qu.
  2.805

[[4]]$quar.2
Median
 3.145

[[4]]$quar.3
3rd Qu.
  3.209

[[5]]
[[5]]$quar.1
1st Qu.
  2.715

[[5]]$quar.2
Median
 2.954

[[5]]$quar.3
3rd Qu.
  3.805

[[6]]
[[6]]$quar.1
1st Qu.
  2.297

[[6]]$quar.2
Median
 2.643

[[6]]$quar.3
3rd Qu.
  2.744

3) Sample a random integer between 15 and 30 and call it n. Build a table with n rows and 4 columns. The first column should be your first name. The second column is the row number (1, 2, 3, 4, etc.). The third column is the row number squared (1, 4, 9, 16, etc.). The fourth column is the square root of the row number, rounded to two decimal places. Write the table out to a .dat file in your R working directory. Open this file with your choice of text editor (Word, Notepad, etc.). Turn in a copy of it.
> n<-sample(seq(15,30),1)
> n
[1] 28
#Building the columns
> col.1<-rep("Rebecca",n)
> row.nos<-seq(1,n)
> col.2<-row.nos
> col.3<-row.nos^2
> col.4<-round(sqrt(row.nos),2)
> prob4.table<-cbind(col.1,col.2,col.3,col.4)
> write.table(prob4.table,"prob4.table.dat")
#Getting it back
> read.table("prob4.table.dat")
     col.1 col.2 col.3 col.4
1  Rebecca     1     1  1.00
2  Rebecca     2     4  1.41
3  Rebecca     3     9  1.73
4  Rebecca     4    16  2.00
5  Rebecca     5    25  2.24
6  Rebecca     6    36  2.45
7  Rebecca     7    49  2.65
8  Rebecca     8    64  2.83
9  Rebecca     9    81  3.00
10 Rebecca    10   100  3.16
11 Rebecca    11   121  3.32
12 Rebecca    12   144  3.46
13 Rebecca    13   169  3.61
14 Rebecca    14   196  3.74
15 Rebecca    15   225  3.87
16 Rebecca    16   256  4.00
17 Rebecca    17   289  4.12
18 Rebecca    18   324  4.24
19 Rebecca    19   361  4.36
20 Rebecca    20   400  4.47
21 Rebecca    21   441  4.58
22 Rebecca    22   484  4.69
23 Rebecca    23   529  4.80
24 Rebecca    24   576  4.90
25 Rebecca    25   625  5.00
26 Rebecca    26   676  5.10
27 Rebecca    27   729  5.20
28 Rebecca    28   784  5.29
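One thing to be aware of in problem 3: cbind() on a character column and numeric columns produces a character matrix, so every value in prob4.table (including the numbers) is stored as text, and write.table() puts quotes around each of them in the .dat file. A data frame keeps each column's own type. The sketch below is one possible alternative, not the assigned solution; the helper build.name.table and the column names name, row, squared, and sqrt.row are hypothetical choices for illustration.

```r
## Hypothetical helper (not part of the assignment): build the problem 3
## table as a data frame so the numeric columns stay numeric.
## (cbind() with a character column would coerce every column to character.)
build.name.table <- function(first.name, n) {
  row.nos <- seq_len(n)
  data.frame(name     = rep(first.name, n),
             row      = row.nos,
             squared  = row.nos^2,
             sqrt.row = round(sqrt(row.nos), 2),
             stringsAsFactors = FALSE)
}

## Random integer between 15 and 30
n <- sample(15:30, 1)
name.table <- build.name.table("Rebecca", n)

## row.names = FALSE drops the extra column of row numbers.
## quote = FALSE writes the name without quotation marks; this is only
## safe because the name contains no spaces.
write.table(name.table, "prob4.table.dat", row.names = FALSE, quote = FALSE)

## Read the file back to check it; header = TRUE because write.table()
## wrote the column names on the first line of the file.
name.table.back <- read.table("prob4.table.dat", header = TRUE)
print(name.table.back)
```

Because the columns keep their types, the read-back squared and sqrt.row columns are numeric and can be used in arithmetic directly.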