R Session 2

advertisement
687313972
Example R Session 2
CSSS 508 Spring 2005
The following is color coded:
blue for comments/description
red for commands typed into R
black for response of R
Creating vectors
: and c() function
> x <- 1:5
> x
[1] 1 2 3 4 5
> y <- 11:15
> y
[1] 11 12 13 14 15
> x+y
[1] 12 14 16 18 20
> z <- c(10,6,3,5,1)
> z
[1] 10 6 3 5 1
seq() function
> help(seq)
> seq(from=3,to=7,by=2)
[1] 3 5 7
> seq(3,17)
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
> seq(14,2)
[1] 14 13 12 11 10 9 8 7 6 5 4 3 2
> seq(1,2,.1)
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
rep() function
> help(rep)
> rep(1,5)
[1] 1 1 1 1 1
> rep(1:5,3)
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> temp <- c(2,1,5,9)
> temp.3 <- rep(temp,3)
> temp.3
[1] 2 1 5 9 2 1 5 9 2 1 5 9
> rep(c(seq(2,10,2),seq(1,9,2)),2)
[1] 2 4 6 8 10 1 3 5 7 9
2
4
6
8 10
1
3
5
7
9
Indexing vectors
> length(temp.3)
[1] 12
> temp.3[3]
[1] 5
> temp.3[5:7]
[1] 2 1 5
1
2/6/2016
687313972
> temp.3[c(2,1,3)]
[1] 1 2 5
Comparisons, TRUE and FALSE
> temp
[1] 2 1 5 9
> temp == 1
[1] FALSE TRUE FALSE FALSE
> temp >= 2
[1] TRUE FALSE TRUE TRUE
> temp < 5
[1] TRUE TRUE FALSE FALSE
> temp != 9
[1] TRUE TRUE TRUE FALSE
> temp.3
[1] 2 1 5 9 2 1 5 9 2 1 5 9
> temp.3 == 9
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE
> find9 <- temp.3==9
> find9
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE
> temp.3[find9]
[1] 9 9 9
TRUE FALSE FALSE FALSE
TRUE
TRUE FALSE FALSE FALSE
TRUE
Using sum() function with boolean values
> sum(temp.3)
[1] 51
> boolean.vector <- c(T,F,T,T,F)
> boolean.vector
[1] TRUE FALSE TRUE TRUE FALSE
> sum(boolean.vector)
[1] 3
> sum(temp.3 == 9)
[1] 3
sample(), sort() and order() function
> help(sample)
> test <- sample(1:15,15)
> test
[1] 2 1 9 10 7 11 14 15
> sort.test <- sort(test)
> sort.test
[1] 1 2 3 4 5 6 7 8
> order.test <- order(test)
> order.test
[1] 2 1 13 11 15 14 5 9
> test[order.test]
[1] 1 2 3 4 5 6 7 8
>
8 13
4 12
3
6
5
9 10 11 12 13 14 15
3
4
6 12 10
7
8
9 10 11 12 13 14 15
Creating arrays (matrices) by combining vectors
> x
[1] 1 2 3 4 5
> y
[1] 11 12 13 14 15
2
2/6/2016
687313972
> z
[1] 10
6
3
5
1
> rbind(x,y,z)
[,1] [,2] [,3] [,4] [,5]
x
1
2
3
4
5
y
11
12
13
14
15
z
10
6
3
5
1
> m <- rbind(x,y,z)
> m
[,1] [,2] [,3] [,4] [,5]
x
1
2
3
4
5
y
11
12
13
14
15
z
10
6
3
5
1
> dim(m)
[1] 3 5
> n <- cbind(x,y,z)
> n
x y z
[1,] 1 11 10
[2,] 2 12 6
[3,] 3 13 3
[4,] 4 14 5
[5,] 5 15 1
Indexing to get particular value in an array
> m[2,3]
[1] 13
> n[2,3]
[1] 6
Different kinds of objects: vectors and arrays
Different kinds of attributes: vectors have length but not dimension; arrays have both
dimension and length
> is.vector(y)
[1] TRUE
> length(y)
[1] 5
> dim(y)
NULL
> is.array(y)
[1] FALSE
> is.vector(n)
[1] FALSE
> is.array(n)
[1] TRUE
> dim(n)
[1] 5 3
> length(n)
[1] 15
Using the matrix function
> help(matrix)
> z <- matrix(1:12,nrow=3)
3
2/6/2016
687313972
> z
[,1] [,2] [,3] [,4]
[1,]
1
4
7
10
[2,]
2
5
8
11
[3,]
3
6
9
12
> z <- matrix(1:12,nrow=4)
> z
[,1] [,2] [,3]
[1,]
1
5
9
[2,]
2
6
10
[3,]
3
7
11
[4,]
4
8
12
> z <- matrix(1:12,nrow=3,byrow=T)
> z
[,1] [,2] [,3] [,4]
[1,]
1
2
3
4
[2,]
5
6
7
8
[3,]
9
10
11
12
> temp <- c(5,10,12,24,42,60,63,72)
> z <- matrix(temp,ncol=2)
> z
[,1] [,2]
[1,]
5
42
[2,]
10
60
[3,]
12
63
[4,]
24
72
Matrix multiplication, transpose and inverse
> temp <- c(1, 3, 7, 4, 0, 6)
> x <- matrix(temp,ncol=2)
> x
[,1] [,2]
[1,]
1
4
[2,]
3
0
[3,]
7
6
> y <- matrix(c(3,2,4,5,6),nrow=2,ncol=5)
> y
[,1] [,2] [,3] [,4] [,5]
[1,]
3
4
6
2
5
[2,]
2
5
3
4
6
> x %*% y
[,1] [,2] [,3] [,4] [,5]
[1,]
11
24
18
18
29
[2,]
9
12
18
6
15
[3,]
33
58
60
38
71
> t(x)
[,1] [,2] [,3]
[1,]
1
3
7
[2,]
4
0
6
> z <- t(x) %*% x
> z
[,1] [,2]
[1,]
59
46
[2,]
46
52
> z.inverse <- solve(z)
4
2/6/2016
687313972
> z.inverse
[,1]
[,2]
[1,] 0.05462185 -0.04831933
[2,] -0.04831933 0.06197479
> z %*% z.inverse
[,1] [,2]
[1,] 1.000000e+00
0
[2,] -1.387779e-16
1
Reading in data from a text file using the read.table function. Note I’m reading this as a loca
file in the same directory as my R session (.RData file). A full path name will also work, but you
need to use front slashes (as in Unix) not back slashes (as in Windows). R can also read web
address path names.
> help(read.table)
> example.data <- read.table(“example.dat",header=TRUE)
Note that the variable names, “x1”, “x2”, and “y”, were in the first line of the data file (hence
“header=TRUE” above).
The following shows the first 10 cases in the data file. “[1:10,]” means rows 1 through 10 and
all of the columns. “[4,3]” would refer to the value in the 4th row, 3rd column. “[,1:2]” all of the
rows and the first two columns (x1 and x2).
> example.data[1:10,]
x1
x2
1 20.42405 0.3015210
2 32.74049 1.3469176
3 24.94999 1.1793272
4 27.72610 0.9360980
5 13.54863 2.0794730
6 20.98872 0.9834769
7 15.17568 0.9979689
8 32.34730 1.0945375
9 24.18020 0.2266746
10 17.83226 0.6170786
y
36.172660
28.415784
28.376674
-4.515564
12.169991
27.996572
23.448322
23.722077
17.164225
18.524674
Dataframes
> dim(example.data)
[1] 100
3
> is.array(example.data)
[1] FALSE
> is.data.frame(example.data)
[1] TRUE
Indexing individual values
> example.data[1,1]
[1] 20.42405
> example.data[8,3]
[1] 23.72208
> example.data$x1[1]
[1] 20.42405
Vector components of data frames
> is.vector(example.data$x1)
[1] TRUE
5
2/6/2016
687313972
> length(example.data$x1)
[1] 100
> mean(example.data$x1)
[1] 19.86305
> mean(example.data[,1])
[1] 19.86305
> sum(example.data[2,])
[1] 62.5032
> example.data[2,1]+example.data[2,2]+example.data[2,3]
[1] 62.5032
> example.data$x1[2]+example.data$x2[2]+example.data$y[2]
[1] 62.5032
>
Using glm to fit a linear regression model to the data. Open the description of glm using help.
Note that glm expects the first parameter to be a formula. In this example, the formula is
y~x1+x2. This indicates that y is the dependent variable and x1 and x2 are the independent
variables (covariates). By default an intercept term is used. To explicitly ask for an intercept
term use y~1+x1+x2. To eliminate the intercept term use y~-1+x1+x2.
> glm.results <- glm(y~x1+x2,data=example.data)
Note that “glm.results” is the name of the object that stores all the information from the glm
function.
> glm.results
Call:
glm(formula = y ~ x1 + x2, data = example.data)
Coefficients:
(Intercept)
5.9103
x1
0.6385
x2
0.8105
Degrees of Freedom: 99 Total (i.e. Null); 97 Residual
Null Deviance:
12420
Residual Deviance: 11010
AIC: 761.9
The “summary” function gives more detail.
> summary(glm.results)
Call:
glm(formula = y ~ x1 + x2, data = example.data)
Deviance Residuals:
Min
1Q
-28.8878
-7.0357
Median
-0.7815
3Q
7.2315
Coefficients:
Estimate Std. Error t value
(Intercept)
5.9103
4.1817
1.413
x1
0.6385
0.1874
3.407
x2
0.8105
2.1384
0.379
--Signif. codes: 0 `***' 0.001 `**' 0.01
Max
27.4582
Pr(>|t|)
0.160753
0.000958 ***
0.705508
`*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for gaussian family taken to be 113.4992)
6
2/6/2016
687313972
Null deviance: 12425
Residual deviance: 11009
AIC: 761.92
on 99
on 97
degrees of freedom
degrees of freedom
Number of Fisher Scoring iterations: 2
7
2/6/2016
Download