Department of Statistics, Stockholm University
R - Study Group - Session II
Vectors, Matrices, Listing and Indexing
Nicklas Pettersson and Cletus Kum
November 4 2008

1. Objects: Everything in R is an object. The basic object in R is the vector.

2. Vectors: A vector in R is a variable with one or more values of the same type: logical, integer, real (double), complex, string (character) or raw. A scalar in R is a vector of length 1. Vectors can also have length 0, which is useful in writing functions. To create vectors, we use the concatenation function c.

Examples:
i) X <- c(20, 12, 23, 15, 25)                # numeric vector
   or assign("X", c(20, 12, 23, 15, 25))     # numeric vector
ii) Y <- c("T", "T", "T", "F", "F", "F")     # vector of characters
   or Y <- rep(c("T", "F"), each = 3)        # vector of characters
iii) Y <- c(T, T, F, F, F, T)                # logical vector

Logical vectors are created by stating conditions, using logical operators.

a) Logical Operators

   Equals             ==
   Less than          <
   Greater than       >
   Less or equal      <=
   Greater or equal   >=
   Not equal          !=
   Not                !
   And                &
   Or                 |

Examples:
i) X <- c(20, 12, 23, 15, 25); M <- X > 23   # M is FALSE FALSE FALSE FALSE TRUE

The paste function is an example of a character function. It concatenates vectors after converting them to character.
   paste("X", rep(1:2, each = 2), 1:2, sep = "")   # "X11" "X12" "X21" "X22"

b) Sequence Vector
One can also obtain a vector by generating a sequence, using either the : (colon) operator or the sequence function seq.
Examples:
i) Y <- 1:10
ii) Z <- seq(1, 10)                # same as 1:10
iii) Z <- seq(1, 10, 2)
   or Z <- seq(1, 10, by = 2)      # 1 3 5 7 9
Note that Z <- seq(0:5) is not the same thing as Z <- seq(0, 5): the first returns 1 2 3 4 5 6 (one index for each element of 0:5), the second returns 0 1 2 3 4 5.

c) Sequence Vectors with repetition (replication)
Examples:
i) rep(x, times = 5)   # Five copies of x end-to-end.
ii) rep(x, each = 5)   # Repeats each element of x five times before moving on to the next

d) Manipulating a Data Vector: Suppose X is a vector of length n = length(X)
i) X[1]                   # picks out the first element of X
ii) X[length(X)]          # picks out only the last element of X
iii) X[c(2, 3)]           # 2nd and 3rd entries
iv) X[1] <- 6             # assigns a value of 6 to the first entry
v) X[c(2, 5)] <- c(3, 4)  # assigns values of 3 and 4 respectively to the 2nd and 5th entries

3. Factors
A factor is a classifying variable. By making a variable a factor, R understands that the variable is nominal. The factor stores the nominal values as a vector of integers in the range 1, 2, ..., k (where k is the number of unique values in the nominal variable), together with an internal vector of character strings (the original values) mapped to these integers.
Example:
   gender <- c(rep("male", 10), rep("female", 5))
   gender <- factor(gender)   # levels are sorted alphabetically: 1 = female, 2 = male,
                              # so gender is stored internally as ten 2s followed by five 1s
   summary(gender)

4. Lists: A list is an ordered collection of objects or components. This permits the collection of a variety of related or unrelated objects under one name.
Example:
   List1 <- list(X = sample(1:4, 10, replace = TRUE), Y = rep(letters[1:5], 2), Z = rpois(10, 1.5))
In general, a list is created as:
   mylist <- list(first = x, second = y, third = z)   # Put objects x, y, z in "mylist", give them names first, second and third.

5. Data Frames: A data frame is a list where all components have the same length. Components must be vectors, factors, numeric matrices, lists or other data frames. In R versions before 4.0.0, character vectors are coerced to factors by default; from R 4.0.0 on, this happens only with stringsAsFactors = TRUE. A data frame is indexed in the same fashion as a matrix, by row and column number.
Examples:
i) First create a simple data frame using vectors X and Y and the function data.frame:
   X <- c(20, 12, 23, 15, 25); Y <- letters[1:5]; data.frame(X, Y)

ii) Using the names function:
   m <- c(2, 6, 3, 15)
   c <- c("black", "white", "green", NA)
   r <- c(TRUE, FALSE, TRUE, FALSE)   # logical values must be TRUE or FALSE, not "YES" or "NO"
   mydata <- data.frame(m, c, r)
   names(mydata) <- c("ID", "Colour", "PASSED")   # Giving names to m, c, and r.
   mydata[c("ID", "PASSED")] or mydata[c(1, 3)]   # Will display these two columns only.

iii) One could also use the read.table function:
mystring <- "id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,,f,2,1,4,1
3,1,f,2,2,4,3
4,,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,,m,5,3,4,4
8,2,m,4,5,5,5"
mydata <- read.table(textConnection(mystring), header = TRUE, sep = ",", row.names = "id"); rm(mystring)
attach(mydata)   # Attach list or data frame so that "q1" instead of "mydata$q1" can be used to reach variables.
detach(mydata)   # Detach list or data frame
search()         # Show which data frames, lists and packages are attached

iv) The fix and edit functions
   fix(mydata)                              # Opens the data in a spreadsheet-like editor (data frame or matrix) or a text editor (vectors)
   names(mydata) <- edit(names(mydata))     # Opens a window where names can be changed
   mydata$q5 <- NULL                        # Drop variable q5 from mydata

v) Using the "==" operator
   females <- mydata[mydata$gender == "f", ]   # Put females in a data frame; the comma after the condition selects rows.

6. Matrices
A matrix is a two-dimensional array of numbers. R begins with a vector of elements and translates this into a matrix by filling up columns. All columns in a matrix must have the same mode and the same length.

6.1 Creating matrices in R
One may create a matrix in R by:
a) Using the function dim. The dim assignment function sets or changes the dimension attribute of a vector X, causing R to treat, for example, a vector of 6 numbers as a 2 x 3 matrix.
Examples:
i) X <- 1:6
   dim(X) <- c(2, 3)
ii) vec <- c(2, 3, 4, 5, 5, 4, 3, 2)
   dim(vec) <- c(4, 2)
To verify that vec is now actually a matrix, we just type is.matrix(vec); the response is TRUE.
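As a small check of the column-wise filling described above, the following sketch sets the dim attribute on a vector and inspects where the elements end up. The names v, m and same are illustrative only and are not part of the session material.

```r
# Illustrative sketch: v, m and same are hypothetical names
v <- 1:6
m <- v
dim(m) <- c(2, 3)   # treat the 6 values as a 2 x 3 matrix

is.matrix(m)        # m now carries a dim attribute
m[, 1]              # first column holds the first two values of v
m[1, ]              # first row holds every second value of v

# matrix() with its default column-wise fill should give the same layout
same <- identical(m, matrix(v, nrow = 2))
```

Because the values are laid out column by column, the first row of m contains the 1st, 3rd and 5th elements of v rather than the first three.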
b) Using the matrix function:
Examples:
i) matrix(1:6, nrow = 2, byrow = T)   # the byrow=T argument causes the matrix to be filled by row. The default is column wise.
ii) cells <- c(2, 27, 25, 69)
   rnames <- c("R1", "R2")
   cnames <- c("c1", "c2")
   mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = T, dimnames = list(rnames, cnames))
   The dimnames argument provides optional labels for the rows and columns.
iii) N <- matrix(2, 4, 6) is a matrix of 2s, with nrow=4 and ncol=6
iv) N <- matrix(c(2, 4, 6)) is a column vector (a 3 x 1 matrix).

c) Using the cbind and rbind functions:
Matrices can be produced by "gluing" together vectors column wise or row wise, using the cbind and rbind functions.
Examples:
i) V1 <- c(2, 6, 3, 9, 10)
   V2 <- c(5, 11, 5, 20, 15)
   rbindv1v2 <- rbind(V1, V2)   # 2 x 5 matrix, one row per vector
ii) V1 <- c(2, 6, 3, 9, 10)
   V2 <- c(5, 11, 5, 20, 15)
   cbindv1v2 <- cbind(V1, V2)   # 5 x 2 matrix, one column per vector

6.2 Some manipulations with matrices
Multiplication:
a) X <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2)
   Y <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 3, byrow = TRUE)
   Matrix.prod <- X %*% Y   # (2 x 3) times (3 x 2) gives a 2 x 2 matrix
b) d <- matrix(c(2, 4, 6), nrow = 3)
   dt <- matrix(c(2, 4, 6), nrow = 1)
   i) dt %*% d    # 1 x 1 matrix
   ii) d %*% dt   # 3 x 3 matrix

Transpose of a Matrix:
   transP <- t(X)

Determinant, Diagonal and Inverse of a matrix:
The determinant of P is det(P); the diagonal of P is diag(P). The inverse of a nonsingular square matrix can be obtained with the base function solve. Loading the MASS package and using the ginv function gives the Moore-Penrose generalised inverse, which coincides with the ordinary inverse when the matrix is nonsingular. An inverse can equally be obtained by loading the car package and using the function inv, where the installed version of car provides it.
a) P <- matrix(c(1, 2, 4, 2, 1, 1, 3, 1, 2), 3, 3)
   # Using MASS package
   library(MASS)
   PI <- ginv(P)
b) P <- matrix(c(1, 2, 4, 2, 1, 1, 3, 1, 2), 3, 3)
   # Using car package
   library(car)
   PI <- inv(P)
c) For a nonsingular square matrix A, the inverse is given by solve(A)
   i) M <- matrix(c(3, 2, 9, 16, 25, 49, 9, 25, 36), 3, 3); solve(M)
   ii) M <- matrix(c(3, 2, 9, 16, 25, 49, 9, 25, 36), nrow = 3, ncol = 3, byrow = T); solve(M)
d) Other applications include:
i) Solving a system of equations.
   5x + 3y = -7
   4x + 5y = -3
Solution:
   A <- matrix(c(5, 4, 3, 5), nrow = 2)
   B <- matrix(c(-7, -3), nrow = 2)
   solve(A, B) or solve(A) %*% B   # x = -2, y = 1

ii) To solve the equation $\hat{B} = (X^{T}X)^{-1}X^{T}Y$ we proceed as follows:
   library(MASS)
   ginv(t(X) %*% X) %*% t(X) %*% Y

7. Indexing:
a) Indexing Matrices
   X[i, j]        # element at row i, column j
   X[i, ]         # row i
   X[, j]         # column j
   X[, c(1, 3)]   # columns 1 and 3
   X["name", ]    # row named "name"

b) Indexing data frames (matrix indexing plus the following)
   X[["name"]]    # column named "name"
   X$name         # same as above
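
The indexing forms listed above can be tried on small objects. The sketch below is only an illustration: the matrix X, the data frame df and their row and column names are made up for this purpose and do not come from the session data.

```r
# Illustrative sketch: X, df and all dimnames are hypothetical
X <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

X[2, 3]          # single element at row 2, column 3
X[1, ]           # row 1, returned as a named vector
X[, c(1, 3)]     # columns 1 and 3, still a matrix
X["r2", ]        # row selected by its name

df <- data.frame(id = 1:3, score = c(10, 20, 30))

df[2, "score"]       # matrix-style indexing by row and column
df[["score"]]        # column as a plain vector
df$score             # the same column via the $ operator
df[df$score > 15, ]  # a logical condition selects rows; note the comma
```

Selecting a single row or column of a matrix drops the result to a vector, whereas selecting several columns keeps the matrix structure.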