687313972 Example R Session 2 CSSS 508 Spring 2005 The following is color coded: blue for comments/description red for commands typed into R black for response of R Creating vectors : and c() function > x <- 1:5 > x [1] 1 2 3 4 5 > y <- 11:15 > y [1] 11 12 13 14 15 > x+y [1] 12 14 16 18 20 > z <- c(10,6,3,5,1) > z [1] 10 6 3 5 1 seq() function > help(seq) > seq(from=3,to=7,by=2) [1] 3 5 7 > seq(3,17) [1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 > seq(14,2) [1] 14 13 12 11 10 9 8 7 6 5 4 3 2 > seq(1,2,.1) [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 rep() function > help(rep) > rep(1,5) [1] 1 1 1 1 1 > rep(1:5,3) [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 > temp <- c(2,1,5,9) > temp.3 <- rep(temp,3) > temp.3 [1] 2 1 5 9 2 1 5 9 2 1 5 9 > rep(c(seq(2,10,2),seq(1,9,2)),2) [1] 2 4 6 8 10 1 3 5 7 9 2 4 6 8 10 1 3 5 7 9 Indexing vectors > length(temp.3) [1] 12 > temp.3[3] [1] 5 > temp.3[5:7] [1] 2 1 5 1 2/6/2016 687313972 > temp.3[c(2,1,3)] [1] 1 2 5 Comparisons, TRUE and FALSE > temp [1] 2 1 5 9 > temp == 1 [1] FALSE TRUE FALSE FALSE > temp >= 2 [1] TRUE FALSE TRUE TRUE > temp < 5 [1] TRUE TRUE FALSE FALSE > temp != 9 [1] TRUE TRUE TRUE FALSE > temp.3 [1] 2 1 5 9 2 1 5 9 2 1 5 9 > temp.3 == 9 [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE > find9 <- temp.3==9 > find9 [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE > temp.3[find9] [1] 9 9 9 TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE Using sum() function with boolean values > sum(temp.3) [1] 51 > boolean.vector <- c(T,F,T,T,F) > boolean.vector [1] TRUE FALSE TRUE TRUE FALSE > sum(boolean.vector) [1] 3 > sum(temp.3 == 9) [1] 3 sample(), sort() and order() function > help(sample) > test <- sample(1:15,15) > test [1] 2 1 9 10 7 11 14 15 > sort.test <- sort(test) > sort.test [1] 1 2 3 4 5 6 7 8 > order.test <- order(test) > order.test [1] 2 1 13 11 15 14 5 9 > test[order.test] [1] 1 2 3 4 5 6 7 8 > 8 13 4 12 3 6 5 9 10 11 12 13 14 15 3 4 6 12 10 7 8 9 10 11 12 13 14 15 Creating arrays (matrices) by combining vectors > x [1] 1 2 3 4 5 > y [1] 11 12 13 14 15 2 2/6/2016 687313972 > z [1] 10 6 3 5 1 > rbind(x,y,z) [,1] [,2] [,3] [,4] [,5] x 1 2 3 4 5 y 11 12 13 14 15 z 10 6 3 5 1 > m <- rbind(x,y,z) > m [,1] [,2] [,3] [,4] [,5] x 1 2 3 4 5 y 11 12 13 14 15 z 10 6 3 5 1 > dim(m) [1] 3 5 > n <- cbind(x,y,z) > n x y z [1,] 1 11 10 [2,] 2 12 6 [3,] 3 13 3 [4,] 4 14 5 [5,] 5 15 1 Indexing to get particular value in an array > m[2,3] [1] 13 > n[2,3] [1] 6 Different kinds of objects: vectors and arrays Different kinds of attributes: vectors have length but not dimension; arrays have both dimension and length > is.vector(y) [1] TRUE > length(y) [1] 5 > dim(y) NULL > is.array(y) [1] FALSE > is.vector(n) [1] FALSE > is.array(n) [1] TRUE > dim(n) [1] 5 3 > length(n) [1] 15 Using the matrix function > help(matrix) > z <- matrix(1:12,nrow=3) 3 2/6/2016 687313972 > z [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > z <- matrix(1:12,nrow=4) > z [,1] [,2] [,3] [1,] 1 5 9 [2,] 2 6 10 [3,] 3 7 11 [4,] 4 8 12 > z <- matrix(1:12,nrow=3,byrow=T) > z [,1] [,2] [,3] [,4] [1,] 1 2 3 4 [2,] 5 6 7 8 [3,] 9 10 11 12 > temp <- c(5,10,12,24,42,60,63,72) > z <- matrix(temp,ncol=2) > z [,1] [,2] [1,] 5 42 [2,] 10 60 [3,] 12 63 [4,] 24 72 Matrix multiplication, transpose and inverse > temp <- c(1, 3, 7, 4, 0, 6) > x <- matrix(temp,ncol=2) > x [,1] [,2] [1,] 1 4 [2,] 3 0 [3,] 7 6 > y <- matrix(c(3,2,4,5,6),nrow=2,ncol=5) > y [,1] [,2] [,3] [,4] [,5] [1,] 3 4 6 2 5 [2,] 2 5 3 4 6 > x %*% y [,1] [,2] [,3] [,4] [,5] [1,] 11 24 18 18 29 [2,] 9 12 18 6 15 [3,] 33 58 60 38 71 > t(x) [,1] [,2] [,3] [1,] 1 3 7 [2,] 4 0 6 > z <- t(x) %*% x > z [,1] [,2] [1,] 59 46 [2,] 46 52 > z.inverse <- solve(z) 4 2/6/2016 687313972 > z.inverse [,1] [,2] [1,] 0.05462185 -0.04831933 [2,] -0.04831933 0.06197479 > z %*% z.inverse [,1] [,2] [1,] 1.000000e+00 0 [2,] -1.387779e-16 1 Reading in data from a text file using the read.table function. Note I’m reading this as a loca file in the same directory as my R session (.RData file). A full path name will also work, but you need to use front slashes (as in Unix) not back slashes (as in Windows). R can also read web address path names. > help(read.table) > example.data <- read.table(“example.dat",header=TRUE) Note that the variable names, “x1”, “x2”, and “y”, were in the first line of the data file (hence “header=TRUE” above). The following shows the first 10 cases in the data file. “[1:10,]” means rows 1 through 10 and all of the columns. “[4,3]” would refer to the value in the 4th row, 3rd column. “[,1:2]” all of the rows and the first two columns (x1 and x2). > example.data[1:10,] x1 x2 1 20.42405 0.3015210 2 32.74049 1.3469176 3 24.94999 1.1793272 4 27.72610 0.9360980 5 13.54863 2.0794730 6 20.98872 0.9834769 7 15.17568 0.9979689 8 32.34730 1.0945375 9 24.18020 0.2266746 10 17.83226 0.6170786 y 36.172660 28.415784 28.376674 -4.515564 12.169991 27.996572 23.448322 23.722077 17.164225 18.524674 Dataframes > dim(example.data) [1] 100 3 > is.array(example.data) [1] FALSE > is.data.frame(example.data) [1] TRUE Indexing individual values > example.data[1,1] [1] 20.42405 > example.data[8,3] [1] 23.72208 > example.data$x1[1] [1] 20.42405 Vector components of data frames > is.vector(example.data$x1) [1] TRUE 5 2/6/2016 687313972 > length(example.data$x1) [1] 100 > mean(example.data$x1) [1] 19.86305 > mean(example.data[,1]) [1] 19.86305 > sum(example.data[2,]) [1] 62.5032 > example.data[2,1]+example.data[2,2]+example.data[2,3] [1] 62.5032 > example.data$x1[2]+example.data$x2[2]+example.data$y[2] [1] 62.5032 > Using glm to fit a linear regression model to the data. Open the description of glm using help. Note that glm expects the first parameter to be a formula. In this example, the formula is y~x1+x2. This indicates that y is the dependent variable and x1 and x2 are the independent variables (covariates). By default an intercept term is used. To explicitly ask for an intercept term use y~1+x1+x2. To eliminate the intercept term use y~-1+x1+x2. > glm.results <- glm(y~x1+x2,data=example.data) Note that “glm.results” is the name of the object that stores all the information from the glm function. > glm.results Call: glm(formula = y ~ x1 + x2, data = example.data) Coefficients: (Intercept) 5.9103 x1 0.6385 x2 0.8105 Degrees of Freedom: 99 Total (i.e. Null); 97 Residual Null Deviance: 12420 Residual Deviance: 11010 AIC: 761.9 The “summary” function gives more detail. > summary(glm.results) Call: glm(formula = y ~ x1 + x2, data = example.data) Deviance Residuals: Min 1Q -28.8878 -7.0357 Median -0.7815 3Q 7.2315 Coefficients: Estimate Std. Error t value (Intercept) 5.9103 4.1817 1.413 x1 0.6385 0.1874 3.407 x2 0.8105 2.1384 0.379 --Signif. codes: 0 `***' 0.001 `**' 0.01 Max 27.4582 Pr(>|t|) 0.160753 0.000958 *** 0.705508 `*' 0.05 `.' 0.1 ` ' 1 (Dispersion parameter for gaussian family taken to be 113.4992) 6 2/6/2016 687313972 Null deviance: 12425 Residual deviance: 11009 AIC: 761.92 on 99 on 97 degrees of freedom degrees of freedom Number of Fisher Scoring iterations: 2 7 2/6/2016