CSSS 508: Intro R
2/10/06
Homework 5 Solutions
1) Download the data hw5prob1.dat from the website and read it into R.
> prob5<-read.table("hw5prob1.dat")
> dim(prob5)
[1] 200 500
Write a function that samples n rows from a matrix and returns the mean of the values
between the 10th and 90th percentile for each sampled row.
We pass in the matrix and the number of rows we want to sample (n).
The trim option of the mean function trims off the upper and lower 10% of
the values in each sampled row before averaging.
mean.of.sampled.rows<-function(n,matrix){
  ##read.table returns a data frame; convert it so that
  ##mean() receives a numeric vector for each row
  matrix<-as.matrix(matrix)
  ##Number of rows in the matrix
  m.rows<-nrow(matrix)
  ##Space for the means we're returning
  mean.vec<-rep(NA,n)
  ##Which n rows we're sampling
  sampled.rows<-sample(seq(1,m.rows),n)
  for(i in 1:n){
    mean.vec[i]<-mean(matrix[sampled.rows[i],],trim=0.10)
  }
  return(mean.vec)
}
> mean.of.sampled.rows(3,prob5)
[1] 27.87556 27.84723 26.78632
> mean.of.sampled.rows(6,prob5)
[1] 26.90512 26.57977 27.28167 27.62742 26.29790 26.65509
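To see what the trim argument does, here is a small sketch on a made-up vector (the values are illustrative only and do not come from hw5prob1.dat). With 10 values and trim=0.10, mean() sorts the values and drops one from each end before averaging.

```r
# Made-up vector with one large outlier (illustrative values only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

plain.mean <- mean(x)                 # uses all 10 values
trimmed.mean <- mean(x, trim = 0.10)  # drops the smallest and the largest value

# The trimmed mean equals the plain mean of the middle 8 sorted values
middle <- sort(x)[2:9]
print(c(plain.mean, trimmed.mean, mean(middle)))
```

The outlier pulls the plain mean up to 14.5, while the trimmed mean stays at 5.5.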
We could have done this with the apply function as well.
The code is more compact because apply loops over the rows for us; any speed
difference from the explicit for loop is usually small.
mean.of.sampled.rows<-function(n,matrix){
  ##Number of rows in the matrix
  m.rows<-nrow(matrix)
  ##Don’t need to declare any space for the mean.vec
  ##Which n rows we're sampling; then make a smaller submatrix
  ##(drop=FALSE keeps it two-dimensional even when n is 1)
  sampled.rows<-sample(seq(1,m.rows),n)
  sampled.matrix<-matrix[sampled.rows,,drop=FALSE]
  ##Need to pass in the argument trim=0.10 for the mean function
  mean.vec<-apply(sampled.matrix,1,mean,trim=0.10)
  return(mean.vec)
}
This version also gives us the sampled row numbers back, as the names of the result.
> mean.of.sampled.rows(3,prob5)
      81      160        8
26.99522 27.00366 26.97218
> mean.of.sampled.rows(6,prob5)
     167       70       64       12       27       59
26.98979 27.00424 26.95259 26.96838 26.99951 27.03397
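As a quick check that the loop and the apply approach compute the same thing, the sketch below runs both on a synthetic matrix with the same random seed. The names toy, loop.version and apply.version are hypothetical stand-ins, and the data are simulated rather than taken from hw5prob1.dat.

```r
# Synthetic stand-in for the homework data (simulated values, not the real file)
set.seed(1)
toy <- matrix(rnorm(20 * 50, mean = 27), nrow = 20, ncol = 50)

# Loop approach: one trimmed mean per sampled row
loop.version <- function(n, m) {
  rows <- sample(seq_len(nrow(m)), n)
  out <- rep(NA, n)
  for (i in seq_len(n)) {
    out[i] <- mean(m[rows[i], ], trim = 0.10)
  }
  out
}

# apply approach: same computation without an explicit loop
apply.version <- function(n, m) {
  rows <- sample(seq_len(nrow(m)), n)
  apply(m[rows, , drop = FALSE], 1, mean, trim = 0.10)
}

# Resetting the seed makes both calls sample the same rows
set.seed(42)
a <- loop.version(5, toy)
set.seed(42)
b <- apply.version(5, toy)
print(all.equal(a, b))
```

Because toy has no row names, neither result carries names here; with a data frame read by read.table, the apply result is labelled with the row names.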
Now write a function that takes in a matrix and a vector of sample sizes. For each sample
size n, find the trimmed means (10 percent on each side) of n sampled rows of the matrix.
Store the average of the n means. At the end, you should have a vector the same length
as the vector of sample sizes, one “mean of the trimmed means” for each sample size.
Return this vector.
mean.of.the.means<-function(matrix, sample.size.vec){
  ##How many different sample sizes we have
  n<-length(sample.size.vec)
  ##Space for the averages we’re returning
  mean.of.means.vec<-rep(NA,n)
  ##We loop over the different sample sizes;
  ##Use the previous function mean.of.sampled.rows
  ##to sample rows and return their means in a vector.
  ##Then we take the average of those means.
  for(i in 1:n){
    mean.vec<-mean.of.sampled.rows(sample.size.vec[i],matrix)
    mean.of.means.vec[i]<-mean(mean.vec)
  }
  return(mean.of.means.vec)
}
You could also put in a check that the sample size does not exceed the number of rows of
the matrix. (Can’t sample more rows than you have in the matrix.)
> mean.of.the.means(prob5,c(3,5,10,20,50,100))
[1] 27.00236 27.01630 27.01185 27.00672 26.99620 27.00258
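The sample-size check mentioned above could look like the following sketch. The function name mean.of.the.means.checked and the error message are hypothetical, the row sampling is written inline so the block runs on its own, and the data are simulated.

```r
mean.of.the.means.checked <- function(matrix, sample.size.vec) {
  matrix <- as.matrix(matrix)
  # Refuse sample sizes larger than the number of rows
  if (any(sample.size.vec > nrow(matrix))) {
    stop("a sample size exceeds the number of rows in the matrix")
  }
  # One "mean of the trimmed means" per sample size
  sapply(sample.size.vec, function(n) {
    rows <- sample(seq_len(nrow(matrix)), n)
    mean(apply(matrix[rows, , drop = FALSE], 1, mean, trim = 0.10))
  })
}

# Simulated 20 x 50 matrix standing in for the homework data
set.seed(2)
toy <- matrix(rnorm(20 * 50, mean = 27), nrow = 20, ncol = 50)
print(mean.of.the.means.checked(toy, c(3, 5, 10)))
# mean.of.the.means.checked(toy, c(3, 100))  # stops with an error: only 20 rows
```

Stopping early gives a clearer message than the error sample() itself raises when asked for more rows than exist.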
2) Create a matrix with 8 rows and 6 columns. Each row is a random sample of size 6
from a uniform distribution on the interval [2,4]. Print out your matrix.
> prob2.matrix<-matrix(0,8,6)
> for(i in 1:8){
+   prob2.matrix[i,]<-runif(6,2,4)
+ }
> prob2.matrix
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
[1,] 2.375181 3.497939 2.769875 2.882981 2.368221 2.181326
[2,] 2.088742 3.562703 2.050232 3.249195 3.061528 3.620945
[3,] 2.896904 2.600610 3.933817 3.195647 3.819058 2.336203
[4,] 3.696312 3.417983 3.704388 3.134369 2.830233 2.126479
[5,] 2.689624 2.272793 2.338112 2.006397 2.845876 2.657805
[6,] 2.254685 2.644029 2.376087 3.155786 3.801183 2.709532
[7,] 3.600982 2.663882 2.903032 2.572278 2.195084 2.627709
[8,] 3.609709 2.710008 2.126889 3.889838 3.818240 2.848927
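Since every entry is an independent draw from the same uniform distribution, the matrix can also be filled in a single call instead of row by row. This is an alternative sketch, not the approach used above; the name alt.matrix and the seed are arbitrary choices.

```r
set.seed(508)  # arbitrary seed, only so the sketch is reproducible

# 48 uniform draws on [2,4], arranged as 8 rows by 6 columns
alt.matrix <- matrix(runif(8 * 6, min = 2, max = 4), nrow = 8, ncol = 6)

print(dim(alt.matrix))
print(range(alt.matrix))
```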
Do the following with the apply command (no for loops).
a) Return the mean of each row.
> apply(prob2.matrix,1,mean)
[1] 2.679254 2.938891 3.130373 3.151627 2.468435 2.823550 2.760495 3.167268
b) Return the max of each column.
> apply(prob2.matrix,2,max)
[1] 3.696312 3.562703 3.933817 3.889838 3.819058 3.620945
c) Return the first, second, and third quartile of each column.
We’re returning multiple values here, so we’ll need to write our own function.
Remember that this function will be applied to each column, one at a time.
So it will take in a vector as its argument.
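> first.three.quartiles<-function(vector){
+   sum.vector<-summary(vector)
+   quar.1<-sum.vector[2]
+   quar.2<-sum.vector[3]
+   quar.3<-sum.vector[5]
+   ##Current versions of R allow return() only one argument,
+   ##so the three values are wrapped in a named list
+   return(list(quar.1=quar.1,quar.2=quar.2,quar.3=quar.3))
+ }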
> apply(prob2.matrix,2,first.three.quartiles)
[[1]]
[[1]]$quar.1
1st Qu.
2.345
[[1]]$quar.2
Median
2.793
[[1]]$quar.3
3rd Qu.
3.603
[[2]]
[[2]]$quar.1
1st Qu.
2.633
[[2]]$quar.2
Median
2.687
[[2]]$quar.3
3rd Qu.
3.438
[[3]]
[[3]]$quar.1
1st Qu.
2.285
[[3]]$quar.2
Median
2.573
[[3]]$quar.3
3rd Qu.
3.103
[[4]]
[[4]]$quar.1
1st Qu.
2.805
[[4]]$quar.2
Median
3.145
[[4]]$quar.3
3rd Qu.
3.209
[[5]]
[[5]]$quar.1
1st Qu.
2.715
[[5]]$quar.2
Median
2.954
[[5]]$quar.3
3rd Qu.
3.805
[[6]]
[[6]]$quar.1
1st Qu.
2.297
[[6]]$quar.2
Median
2.643
[[6]]$quar.3
3rd Qu.
2.744
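An alternative sketch that avoids the nested list: passing quantile with a probs argument to apply returns one 3-by-6 matrix, with a row per quartile and a column per matrix column. The matrix below re-enters the values printed in the transcript above; the name quartile.matrix is hypothetical.

```r
# Values of prob2.matrix as printed above, entered row by row
prob2.matrix <- matrix(c(
  2.375181, 3.497939, 2.769875, 2.882981, 2.368221, 2.181326,
  2.088742, 3.562703, 2.050232, 3.249195, 3.061528, 3.620945,
  2.896904, 2.600610, 3.933817, 3.195647, 3.819058, 2.336203,
  3.696312, 3.417983, 3.704388, 3.134369, 2.830233, 2.126479,
  2.689624, 2.272793, 2.338112, 2.006397, 2.845876, 2.657805,
  2.254685, 2.644029, 2.376087, 3.155786, 3.801183, 2.709532,
  3.600982, 2.663882, 2.903032, 2.572278, 2.195084, 2.627709,
  3.609709, 2.710008, 2.126889, 3.889838, 3.818240, 2.848927
), nrow = 8, ncol = 6, byrow = TRUE)

# Extra arguments to apply are passed on to quantile
quartile.matrix <- apply(prob2.matrix, 2, quantile, probs = c(0.25, 0.50, 0.75))
print(round(quartile.matrix, 3))
```

The rows are labelled 25%, 50% and 75%, and the first column agrees with the 2.345, 2.793 and 3.603 shown above.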
3) Sample a random integer between 15 and 30. Call it n. Build a table with n rows and
4 columns. The first column should be your first name. The second column is the
row number (1,2,3,4, etc). The third column is the row number squared
(1,4,9,16, etc). The fourth column is the square root of the row number rounded to
two decimal places.
Write the table out to a .dat file in your R working directory. Open this file up with your
choice of text editor (Word, Notepad, etc). Turn a copy of it in.
> n<-sample(seq(15,30),1)
> n
[1] 28
#Building the columns
> col.1<-rep("Rebecca",n)
> row.nos<-seq(1,n)
> col.2<-row.nos
> col.3<-row.nos^2
> col.4<-round(sqrt(row.nos),2)
> prob4.table<-cbind(col.1,col.2,col.3,col.4)
> write.table(prob4.table,"prob4.table.dat")
> read.table("prob4.table.dat")    (getting it back)
     col.1 col.2 col.3 col.4
1  Rebecca     1     1  1.00
2  Rebecca     2     4  1.41
3  Rebecca     3     9  1.73
4  Rebecca     4    16  2.00
5  Rebecca     5    25  2.24
6  Rebecca     6    36  2.45
7  Rebecca     7    49  2.65
8  Rebecca     8    64  2.83
9  Rebecca     9    81  3.00
10 Rebecca    10   100  3.16
11 Rebecca    11   121  3.32
12 Rebecca    12   144  3.46
13 Rebecca    13   169  3.61
14 Rebecca    14   196  3.74
15 Rebecca    15   225  3.87
16 Rebecca    16   256  4.00
17 Rebecca    17   289  4.12
18 Rebecca    18   324  4.24
19 Rebecca    19   361  4.36
20 Rebecca    20   400  4.47
21 Rebecca    21   441  4.58
22 Rebecca    22   484  4.69
23 Rebecca    23   529  4.80
24 Rebecca    24   576  4.90
25 Rebecca    25   625  5.00
26 Rebecca    26   676  5.10
27 Rebecca    27   729  5.20
28 Rebecca    28   784  5.29
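One thing to be aware of: cbind on a character vector and numeric vectors produces a character matrix, so the numbers are stored as text until read.table converts them back. A sketch of an alternative that keeps the column types uses data.frame instead. The column names and the temporary file are hypothetical choices, and n is fixed at the value drawn above so the result is reproducible.

```r
n <- 28  # the value drawn in the transcript; any integer from 15 to 30 works
row.nos <- seq_len(n)

# data.frame keeps the name column as text and the other columns numeric
tab <- data.frame(
  name   = rep("Rebecca", n),
  row.no = row.nos,
  square = row.nos^2,
  root   = round(sqrt(row.nos), 2)
)

# Written to a temporary file here; use a file name in your working directory instead
out.file <- tempfile(fileext = ".dat")
write.table(tab, out.file, row.names = FALSE, quote = FALSE)

# Read it back, telling read.table that the first line holds the column names
back <- read.table(out.file, header = TRUE)
str(back)
```

Setting row.names=FALSE and quote=FALSE leaves a plain four-column file that is easier to read in a text editor.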