Department of Statistics

Stockholm University
R - Study Group - Session II
Vectors, Matrices, Listing and Indexing
Nicklas Pettersson and Cletus Kum
November 4 2008
1. Objects:
Everything in R is an object. The basic object in R is the vector.
2. Vectors:
A vector in R is a variable with one or more values of the same type: logical, integer,
double (real), complex, character (string) or raw.
A scalar in R is a vector of length 1. Vectors can also have length 0, which is useful when
writing functions. To create vectors, we use the concatenation function c.
Examples:
i) X <- c(20, 12, 23, 15, 25) # numeric vector
or assign("X", c(20, 12, 23, 15, 25)) # numeric vector
ii) Y <- c("T","T","T","F","F","F") # vector of characters
or Y <- rep(c("T","F"), each=3) # vector of characters
iii) Y <- c(TRUE,TRUE,FALSE,FALSE,FALSE,TRUE) # logical vector
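As a small illustration of the statements above, the following sketch builds one vector of each common type and inspects it with typeof and length. The variable names (num, int, chr, lgl, empty, mixed) are chosen here for illustration and do not appear elsewhere in these notes.

```r
# Each vector holds values of a single type.
num <- c(20, 12, 23, 15, 25)   # double ("numeric")
int <- c(20L, 12L)             # integer (note the L suffix)
chr <- c("T", "F")             # character
lgl <- c(TRUE, FALSE)          # logical

typeof(num)   # "double"
typeof(int)   # "integer"
typeof(chr)   # "character"
typeof(lgl)   # "logical"

# A scalar is a vector of length 1, and empty vectors are allowed.
length(5)            # 1
empty <- numeric(0)
length(empty)        # 0

# Mixing types inside c() coerces all values to one common type.
mixed <- c(1, "a", TRUE)
typeof(mixed)        # "character"
```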
Logical vectors are created by stating conditions, using logical operators.
a) Logical Operators
Equals             ==
Less than          <
Greater than       >
Less or equal      <=
Greater or equal   >=
Not equal          !=
Not                !
And                &
Or                 |
Examples:
i) X <- c(20, 12, 23, 15, 25); M <- X > 23 # M is TRUE only where X exceeds 23
The paste function is an example of a character function.
It concatenates vectors element by element after converting them to character.
paste("X", rep(1:2, each=2), 1:2, sep="") # gives "X11" "X12" "X21" "X22"
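The logical operators work element by element, so conditions can be combined and then used to pick out entries. A minimal sketch, reusing the vector X from the example above; the names both, either and lab are illustrative only.

```r
X <- c(20, 12, 23, 15, 25)

M      <- X > 23             # FALSE FALSE FALSE FALSE TRUE
both   <- X > 12 & X < 25    # TRUE FALSE TRUE TRUE FALSE
either <- X < 15 | X == 25   # FALSE TRUE FALSE FALSE TRUE

X[both]     # a logical vector used as an index keeps the TRUE positions: 20 23 15
sum(both)   # TRUE counts as 1, so this counts the matches: 3
!M          # negation flips every element

# paste() recycles the shorter vectors to the length of the longest one.
lab <- paste("X", rep(1:2, each = 2), 1:2, sep = "")
lab         # "X11" "X12" "X21" "X22"
```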
b) Sequence Vector
One can also obtain a vector by generating a sequence, either with the : (colon)
operator or with the sequence function seq.
Examples:
i) Y <- 1:10
ii) Z <- seq(1,10)
iii) Z <- seq(1,10,by = 2) or simply Z <- seq(1,10,2)
Note that Z<-seq(0:5) is not the same thing as Z<-seq(0,5): the first gives 1, 2, ..., 6
(the positions of the six elements of 0:5), the second gives 0, 1, ..., 5.
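The difference between the colon operator and the various forms of seq can be checked directly. A short sketch; the names Z1 to Z4 are introduced only for this comparison.

```r
Y  <- 1:10
Z1 <- seq(1, 10)           # same values as 1:10
Z2 <- seq(1, 10, by = 2)   # 1 3 5 7 9
Z3 <- seq(0, 5)            # 0 1 2 3 4 5
Z4 <- seq(0:5)             # 1 2 3 4 5 6: one position per element of 0:5

all(Y == Z1)   # TRUE
length(Z3)     # 6
length(Z4)     # 6, the same length as Z3 but with different values
```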
c) Sequence Vectors with repetition (replication)
Examples:
i) rep(x, times=5) # Five copies of x end-to-end.
ii) rep(x, each=5) # Repeats each element of x five times
# before moving on to the next
d) Manipulating a Data Vector:
Suppose X is a vector of length n = length(X).
i) X[1] # picks out the first element of X
ii) X[length(X)] # picks out only the last element of X
iii) X[c(2,3)] # 2nd and 3rd entries
iv) X[1] <- 6 # assigns a value of 6 to the first entry
v) X[c(2,5)] <- c(3,4) # assigns values of 3 and 4 respectively to the 2nd and 5th entries
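The indexing and assignment steps listed above can be run in sequence to see their combined effect. A sketch using the numeric vector from section 2; the negative index X[-1] is an addition that is not covered in the list above.

```r
X <- c(20, 12, 23, 15, 25)

X[1]           # first element: 20
X[length(X)]   # last element: 25
X[c(2, 3)]     # 2nd and 3rd entries: 12 23
X[-1]          # a negative index drops that position: 12 23 15 25

X[1] <- 6                 # replace the first entry
X[c(2, 5)] <- c(3, 4)     # replace the 2nd and 5th entries
X                         # 6 3 23 15 4
```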
3. FACTORS
A factor is a classifying variable. By making a variable a factor, R understands that the
variable is nominal. The factor stores the nominal values as a vector of integers in the
range 1, 2, ..., k (where k is the number of unique values in the nominal variable), together
with an internal vector of character strings (the original values) mapped to these integers.
Example:
gender<-c(rep("male",10),rep("female",5))
gender<-factor(gender) # stores gender as 10 2s and 5 1s, with
# 1=female, 2=male internally (levels are sorted alphabetically).
summary(gender)
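The internal integer coding of a factor can be inspected with levels and as.integer. A minimal sketch based on the gender example; the name codes is illustrative.

```r
gender <- factor(c(rep("male", 10), rep("female", 5)))

levels(gender)             # "female" "male" (sorted alphabetically)
codes <- as.integer(gender)
codes                      # ten 2s followed by five 1s
table(gender)              # counts per level: female 5, male 10
nlevels(gender)            # 2
```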
4. Lists:
A list is an ordered collection of objects or components. This permits the collection of a
variety of related or unrelated objects under one name.
Example:
List1<-list(X=sample(1:4,10,replace=TRUE),Y=rep(letters[1:5],2),
Z=rpois(10,1.5))
In general, a list is built as:
mylist <- list(first=x,second=y,third=z) # Put objects x, y, z in "mylist"
# and give them the names first, second and third.
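Once a list exists, its components can be reached by name or by position. A sketch built on List1 from the example; set.seed is an assumption added here only so that the random components are reproducible.

```r
set.seed(1)  # assumption: fixed seed so the random draws can be reproduced
List1 <- list(X = sample(1:4, 10, replace = TRUE),
              Y = rep(letters[1:5], 2),
              Z = rpois(10, 1.5))

names(List1)     # "X" "Y" "Z"
length(List1)    # 3 components
List1$Y          # the component named Y, as a character vector
List1[["Z"]]     # double brackets also return the component itself
List1[1]         # single brackets return a list of length 1
class(List1[1])  # "list"
```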
5. Data Frames:
A data frame is a list whose components all have the same length. Components must be
vectors, factors, numeric matrices, lists or other data frames. In R versions before 4.0.0,
character vectors are coerced to factors by default; from 4.0.0 onwards they stay character
unless stringsAsFactors=TRUE is given. A data frame is indexed in the same fashion as a
matrix, by row and column number.
Examples:
i) First create a simple data frame using vectors X and Y and function data.frame:
X <- c(20, 12, 23, 15, 25)
Y <- letters[1:5]
data.frame(X, Y)
ii) Using the names function :
m<-c(2,6,3,15)
c<-c("black","white","green",NA)
r<-c(TRUE,FALSE,TRUE,FALSE) # logical values; the strings "YES" or "NO" are not accepted
mydata<-data.frame(m,c,r)
names(mydata)<-c("ID","Colour","PASSED") # Giving names to m, c and r.
mydata[c("ID","PASSED")] or mydata[c(1,3)] # Will display these two columns only.
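The same data frame can be built with the column names supplied directly to data.frame. A sketch under two assumptions: the vector names colour and passed are introduced here for illustration, and stringsAsFactors = TRUE is set explicitly so that the character column becomes a factor regardless of the R version.

```r
m      <- c(2, 6, 3, 15)
colour <- c("black", "white", "green", NA)
passed <- c(TRUE, FALSE, TRUE, FALSE)

mydata <- data.frame(ID = m, Colour = colour, PASSED = passed,
                     stringsAsFactors = TRUE)

str(mydata)                      # one line per column with its type
mydata[c("ID", "PASSED")]        # select columns by name
mydata[c(1, 3)]                  # the same two columns by position
is.factor(mydata$Colour)         # TRUE
nlevels(mydata$Colour)           # 3: the NA is not counted as a level
```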
iii) One could also use the read.table function:
mystring <- ("id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,,f,2,1,4,1
3,1,f,2,2,4,3
4,,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,,m,5,3,4,4
8,2,m,4,5,5,5")
mydata<-read.table(textConnection(mystring),
header=TRUE,sep=",",row.names="id"); rm(mystring)
attach(mydata) # Attach list or data.frame so that "q1" instead
# of "mydata$q1" can be used to reach variables.
detach(mydata) # Detach list or data.frame
search() # Show which data frames, lists and packages are
# attached
iv) The fix and edit functions
fix(mydata) # Edit data in a spreadsheet (if data frame or matrix)
# or in a text editor (vectors, integers)
names(mydata) <- edit( names(mydata) ) # Open a window where
# names can be changed
mydata$q5 <- NULL # Drop variable q5 from mydata
v) Using the "==" operator
females <- mydata[mydata$gender=="f", ] # Put the rows for females in a data frame.
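Row selection with "==" needs a comma after the condition, because the condition picks rows and the empty slot after the comma keeps all columns. A sketch on a shortened, hypothetical version of the survey data (four rows, one question), read with the text argument of read.table; with() is shown as an alternative to attach.

```r
# Hypothetical, shortened data set used only for this illustration.
mydata <- read.table(text = "id,workshop,gender,q1
1,1,f,1
2,,f,2
3,1,m,4
4,2,m,5", header = TRUE, sep = ",", row.names = "id")

females <- mydata[mydata$gender == "f", ]   # rows where gender is "f", all columns
nrow(females)                               # 2

subset(mydata, gender == "f")               # the same rows via subset()
with(mydata, mean(q1))                      # refer to q1 without attach(): 3
is.na(mydata$workshop)                      # empty fields were read as NA
```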
6. Matrices
A matrix is a two-dimensional array of values, usually numbers. R begins with a vector of
elements and translates this into a matrix by filling up columns.
All columns in a matrix must have the same mode and the same length.
6.1 Creating matrices in R
One may create a matrix in R by:
a) Using the function dim.
The dim assignment function sets or changes the dimension attribute of the vector X, causing R
to treat the vector of 6 numbers as a 2 x 3 matrix.
Examples:
i) X<-1:6
dim(X)<-c(2,3)
ii) vec<-c(2,3,4,5,5,4,3,2)
dim(vec)<-c(4,2)
To verify that vec is now a matrix, type is.matrix(vec); the response is TRUE.
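The effect of assigning to dim can be seen by testing the object before and after. A minimal sketch using the first example; nothing beyond base R is assumed.

```r
X <- 1:6
is.matrix(X)      # FALSE: X is still a plain vector

dim(X) <- c(2, 3) # same six values, now arranged as 2 rows by 3 columns
is.matrix(X)      # TRUE
dim(X)            # 2 3

# The values fill the matrix column by column.
X[2, 1]           # 2
X[1, 2]           # 3
```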
b) Using the matrix function:
Examples:
i) matrix(1:6,nrow=2,byrow=TRUE) # byrow=TRUE causes the matrix to be
# filled by row. The default is column-wise.
ii) cells<-c(2,27,25,69)
rnames<-c("R1","R2")
cnames<-c("c1","c2")
mymatrix<-matrix(cells,nrow=2,ncol=2,byrow=TRUE,dimnames=list(rnames,cnames))
The dimnames argument provides optional labels for the rows and columns.
iii) N<-matrix(2,4,6) # a matrix of 2s, with nrow=4 and ncol=6
iv) N<-matrix(c(2,4,6)) # a column vector (a 3 x 1 matrix)
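Comparing the two fill orders side by side makes the role of byrow clearer. A sketch based on the examples above; the names bycol and byrow_m are illustrative.

```r
bycol   <- matrix(1:6, nrow = 2)                 # default: filled column by column
byrow_m <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

bycol[1, ]     # 1 3 5
byrow_m[1, ]   # 1 2 3

mymatrix <- matrix(c(2, 27, 25, 69), nrow = 2, ncol = 2, byrow = TRUE,
                   dimnames = list(c("R1", "R2"), c("c1", "c2")))
mymatrix["R2", "c1"]     # 25: elements can be reached by their labels

dim(matrix(2, 4, 6))     # 4 6
dim(matrix(c(2, 4, 6)))  # 3 1: a single column
```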
c) Using the cbind and rbind functions:
Matrices can be produced by "gluing" together other vectors column wise or row wise,
using the cbind and rbind functions.
Examples:
i) V1<-c(2,6,3,9,10)
V2<-c(5,11,5,20,15)
Rbindv1v2<-rbind(V1,V2)
ii) V1<-c(2,6,3,9,10)
V2<-c(5,11,5,20,15)
cbindv1v2<-cbind(V1,V2)
6.2 Some manipulations with matrices
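• Multiplication:
a) X<-matrix(c(7,8,9,10,11,12),nrow=2)
Y<-matrix(c(7,8,9,10,11,12),nrow=3,byrow=TRUE)
Matrix.prod<-X%*%Y # (2 x 3) times (3 x 2) gives a 2 x 2 matrix
b) d<-matrix(c(2,4,6),nrow=3)
dt<-matrix(c(2,4,6),nrow=1)
i) dt%*%d # 1 x 1 matrix (inner product)
ii) d%*%dt # 3 x 3 matrix (outer product)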
• Transpose of a Matrix:
transP <-t(X)
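The dimensions of the products above follow the usual rule that an (a x b) matrix times a (b x c) matrix is (a x c). A sketch that checks this on the same matrices and contrasts %*% with the element-wise operator *.

```r
X <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2)                 # 2 x 3
Y <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 3, byrow = TRUE)   # 3 x 2

dim(X %*% Y)     # 2 2
dim(Y %*% X)     # 3 3
all(Y == t(X))   # TRUE: in this example Y equals the transpose of X

X * X            # element-wise product, still 2 x 3 (not a matrix product)

d  <- matrix(c(2, 4, 6), nrow = 3)
dt <- t(d)       # the transpose gives the 1 x 3 row vector
dt %*% d         # 1 x 1 matrix holding 2^2 + 4^2 + 6^2 = 56
dim(d %*% dt)    # 3 3
```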
• Determinant, Diagonal and Inverse of a matrix:
A generalised (Moore-Penrose) inverse can be obtained by loading the MASS package and
using the ginv function. For a nonsingular square matrix, the base function solve gives the
ordinary inverse without loading any package.
Example b) uses the function inv from the car package; whether inv is available
depends on the installed version of car.
a) P<-matrix(c(1,2,4,2,1,1,3,1,2),3,3)
#Using MASS package
library(MASS)
PI<-ginv(P)
Determinant of P is det(P); Diagonal of P is diag(P)
b) P<-matrix(c(1,2,4,2,1,1,3,1,2),3,3)
#Using car package
library(car)
PI<-inv(P)
c) For a nonsingular square matrix A, the inverse is given by solve(A)
i) M<-matrix(c(3,2,9,16,25,49,9,25,36),3,3);solve(M)
ii) M<-matrix(c(3,2,9,16,25,49,9,25,36),nrow=3,ncol=3,byrow=TRUE);solve(M)
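An inverse can be checked by multiplying it with the original matrix, which should give the identity up to rounding error. A sketch using the matrix P from example a) and only base R functions; all.equal is used because floating-point results are rarely exactly equal.

```r
P <- matrix(c(1, 2, 4, 2, 1, 1, 3, 1, 2), 3, 3)

det(P)     # -5 (up to floating-point rounding), so P is invertible
diag(P)    # 1 1 2

PI <- solve(P)                  # ordinary inverse from base R
all.equal(P %*% PI, diag(3))    # TRUE: P times its inverse is the identity
all.equal(PI %*% P, diag(3))    # TRUE in the other order as well
```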
d) Other applications include:
i) Solving a system of equations.
5x + 3y = -7
4x + 5y = -3
Solution:
A<-matrix(c(5,4,3,5),nrow=2)
B<-matrix(c(-7,-3),nrow=2)
solve(A,B) or solve(A)%*%B # both give x = -2, y = 1
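The solution can be verified by substituting it back into the system. A short sketch; the name sol is illustrative.

```r
A <- matrix(c(5, 4, 3, 5), nrow = 2)   # coefficients, filled column by column
B <- matrix(c(-7, -3), nrow = 2)       # right-hand side

sol <- solve(A, B)   # solves A %*% sol = B directly
sol                  # x = -2, y = 1

all.equal(A %*% sol, B)               # TRUE: the solution reproduces B
all.equal(solve(A) %*% B, sol)        # TRUE: same answer via the explicit inverse
```

Calling solve(A, B) avoids forming the inverse explicitly, which is generally the more economical of the two forms.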
ii) To solve the equation
$\hat{B} = (X'X)^{-1} X'Y$
We proceed as follows:
library(MASS)
ginv(t(X)%*%X )%*%t(X)%*%Y
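The formula can be tried on simulated data and compared with the coefficients from lm. This is a sketch under stated assumptions: the data (n, x1, the true coefficients 2 and 3, the seed) are made up for illustration, and solve is swapped in for ginv because X'X is nonsingular here.

```r
set.seed(42)                 # assumption: fixed seed for reproducibility
n  <- 50
x1 <- rnorm(n)
X  <- cbind(1, x1)           # design matrix: intercept column plus one predictor
Y  <- 2 + 3 * x1 + rnorm(n, sd = 0.5)

# Direct translation of the formula (X'X)^{-1} X'Y.
B_hat <- solve(t(X) %*% X) %*% t(X) %*% Y

# Equivalent form that solves the normal equations without an explicit inverse.
B_alt <- solve(crossprod(X), crossprod(X, Y))

# Both should agree with the least-squares fit from lm().
fit <- lm(Y ~ x1)
all.equal(as.vector(B_hat), unname(coef(fit)))   # TRUE
all.equal(B_hat, B_alt)                          # TRUE
```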
7. Indexing:
a) Indexing Matrices
X[i,j] # element at row i, column j
X[i,] # row i
X[,j] # column j
X[,c(1,3)] # columns 1 and 3
X["name",] # row named “name”
b) Indexing data frames (matrix indexing plus the following)
X[["name"]] # column named “name”
X$name # same as above
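The indexing forms above can be tried on a small labelled matrix and data frame. A sketch; the objects X and df, their labels, and the drop = FALSE variant are illustrative additions.

```r
X <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

X[2, 3]              # element at row 2, column 3: 6
X[1, ]               # row 1: 1 3 5
X[, "b"]             # column named "b": 3 4
X[, c(1, 3)]         # columns 1 and 3, still a matrix
X["r2", ]            # row named "r2": 2 4 6
X[1, , drop = FALSE] # keeps the result as a 1 x 3 matrix instead of a vector

df <- data.frame(name = c("u", "v"), score = c(1.5, 2.5))
df[["score"]]        # column as a vector: 1.5 2.5
df$score             # the same column
df[2, "score"]       # matrix-style indexing also works: 2.5
```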