Regression Modelling
Introduction to R
Abhinav Mehta
R Introduction
R and RStudio
• R is a programming language and free software environment
for statistical computing and graphics.
• RStudio is an integrated development environment (IDE) for
R and has a friendlier user interface than R alone.
• RStudio cannot be used without R, so to use RStudio you
must install R first.
• See the R learning resources on Wattle.
RStudio Layout
Basic R Programming
5 + 2^3 # Direct way
[1] 13
x <- 1 # Define some objects in R
y <- 2^3
z <- x+y
z
[1] 9
sum(x, y) # Use some functions
[1] 9
mean(x, y) # Caution: y is matched to the 'trim' argument, so this is not the mean of x and y
[1] 1
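The result of mean(x, y) is 1 rather than 4.5 because mean() expects all the data in its first argument; a second positional argument is matched to trim. A minimal sketch of the usual fix, reusing the x and y defined above:

```r
x <- 1
y <- 2^3

# mean() averages the elements of its first argument only,
# so combine the values into one vector with c() first
avg <- mean(c(x, y))
avg                          # 4.5

# sum() accepts several arguments, so both forms agree
sum(x, y) == sum(c(x, y))    # TRUE
```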
Suggestions
• Create an R script file to save your R commands.
• Set your working directory.
  • In RStudio, the easiest way to do this is to create a new project.
• Give meaningful names to the objects you create.
• Add appropriate comments when coding.
• Run the code line by line (or block by block) to check for
errors as you write it.
# Remove the existing objects in the current workspace
rm(list = ls())
# Know your working directory
getwd()
[1] "/Users/abhinav/Library/Mobile Documents/com~apple~CloudDocs
Name <- c("Abhinav", "Lucy", "Eric")
Gender <- c("M", "F", "M")
N_Course <- c(2, 2, 1) # Number of courses they teach
InPerson <- c(FALSE, FALSE, TRUE) # Teaching in person or not
Choosing the working directory
# Setting the working directory through the console
setwd()
• This requires a file path such as ‘~/Documents/Regression/...’
• Alternatively, use the ‘Tools’ menu in RStudio
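A minimal sketch of changing and restoring the working directory from the console. Here tempdir() is only a stand-in for a real project folder (an assumption for this example); replace it with your own path.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# tempdir() stands in for a real project folder (assumption for this sketch)
setwd(tempdir())
getwd()

# list.files() shows the files R can see in the working directory
list.files()

# Restore the original working directory
setwd(old_wd)
```

Restoring the old directory at the end keeps the rest of a script from depending on where the example left it.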
Creating a new project
Get Help with R
• Use ‘help()’ or ? followed by some function name.
help(log)
?log
• Google
• See more information on https://www.r-project.org/help.html
for other options.
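Beyond help() and ?, a few other base R tools can be handy when you only half remember a function. The sketch below is illustrative rather than exhaustive; the search pattern is just an example.

```r
# args() prints a function's arguments and their default values
args(log)

# apropos() lists objects whose names match a pattern,
# useful when you only remember part of a function name
matches <- apropos("^read\\.")
matches

# Interactive options (commented out so the script runs unattended):
# ??regression     # keyword search over installed documentation
# example(mean)    # run the examples from a help page
```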
This Course ≠ R
• The goal of this course is to implement appropriate statistical
modelling for real data.
• R is just a tool for carrying out the computation once a
statistical model has been specified.
• The more important thing in this course is to understand
statistical concepts. That is the only way to perform
statistical analysis properly.
Data Types and Data Structures in R
Basic Data Type
• Numeric
• Character
• Logical (TRUE / FALSE)
# Use class() to get the data type
class(x)
class(Name)
class(InPerson)
# Boolean expression
Female <- Gender == "F"
Female
[1] FALSE  TRUE FALSE
class(Female)
[1] "logical"
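The three basic types can be tested and converted explicitly. A short sketch, reusing the Gender vector from earlier, that also shows why logical vectors are convenient for counting:

```r
Gender <- c("M", "F", "M")
Female <- Gender == "F"

# Type checks return a single TRUE/FALSE
is.character(Gender)          # TRUE
is.logical(Female)            # TRUE

# Logical values are treated as 0/1 in arithmetic,
# so sum() counts the TRUEs and mean() gives their proportion
n_female <- sum(Female)       # 1
prop_female <- mean(Female)   # 1/3

# Explicit conversion between types
as.numeric("3.14") + 1        # 4.14
as.character(2^3)             # "8"
```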
Basic Data Structure
• A vector is a single entity consisting of an ordered collection
of values of the same type (for example, numbers).
• A factor is a special vector used to store categorical data.
• Matrices, or more generally arrays, are multi-dimensional
generalisations of vectors.
• A list is a general form of vector in which the various elements
need not be of the same type.
• A data frame is a matrix-like structure in which the columns can
be of different types. Think of a data frame as a “data
matrix” with one row per observation and one column per
variable, where the variables may be numerical or categorical.
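The difference between "same type" and "need not be of the same type" can be seen directly. A small sketch with made-up values:

```r
# c() builds a vector, so mixed inputs are coerced to one common type
v <- c(1, "abc", TRUE)
class(v)               # "character"

# list() keeps each element's own type
l <- list(1, "abc", TRUE)
sapply(l, class)       # "numeric" "character" "logical"

# A data frame keeps one type per column
df <- data.frame(id = 1:2, label = c("a", "b"))
sapply(df, class)
```

The class of the label column depends on the R version (character from R 4.0 onwards, factor by default before that), so no output is shown for the last line.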
Vector
Topic <- c("SLR", "MLR", "GLM") # By c()
# Generate regular sequences
a1 <- 1:4 # By colon
a2 <- seq(from=1,to=5, by=2) # By seq()
a1 + 4
[1] 5 6 7 8
# Select values
a1[3]
[1] 3
a1[c(1,4)]
[1] 1 4
# Some functions
length(a1) # Get the length of vectors
[1] 4
log(a1)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
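Selection also works with conditions and negative indices. A brief sketch using the a1 and a2 vectors defined above:

```r
a1 <- 1:4
a2 <- seq(from = 1, to = 5, by = 2)   # 1 3 5

# Logical indexing: keep the elements that satisfy a condition
a1[a1 > 2]        # 3 4

# Negative indices drop elements
a1[-1]            # 2 3 4

# Arithmetic is element-wise
a1 * 2            # 2 4 6 8

# which() returns the positions where a condition holds
which(a2 == 5)    # 3
```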
Factor
# Unordered factor
f1 <- factor(Female)
f1
[1] FALSE TRUE FALSE
Levels: FALSE TRUE
class(f1)
[1] "factor"
# Ordered factor
f2 <- factor(c("Disagree","Agree","Medium","Agree"),
             ordered = TRUE, levels = c("Disagree","Medium","Agree"))
f2
[1] Disagree Agree    Medium   Agree
Levels: Disagree < Medium < Agree
class(f2)
[1] "ordered" "factor"
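A short sketch of what the level ordering buys you, using the same f2 as above. Comparisons follow the declared level order, not alphabetical order.

```r
f2 <- factor(c("Disagree", "Agree", "Medium", "Agree"),
             ordered = TRUE, levels = c("Disagree", "Medium", "Agree"))

levels(f2)        # "Disagree" "Medium" "Agree"
table(f2)         # counts per level, in level order

# Ordered factors support comparisons based on the level order
f2 < "Agree"      # TRUE FALSE TRUE FALSE

# Internally each value is stored as an integer code indexing the levels
as.integer(f2)    # 1 3 2 3
```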
Matrix
M1 <- matrix(1:6, ncol=3) # default ordering by column
M1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
M1 <- matrix(1:6, ncol=3, byrow=TRUE)
M1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
M1[2,3] # select the element in row 2 and column 3
[1] 6
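Leaving one index empty selects a whole row or column. A minimal sketch on the row-filled M1 from above, with two common matrix operations added for illustration:

```r
M1 <- matrix(1:6, ncol = 3, byrow = TRUE)

dim(M1)       # 2 3  (rows, columns)
M1[2, ]       # whole second row: 4 5 6
M1[, 3]       # whole third column: 3 6
t(M1)         # transpose: a 3 x 2 matrix

# Element-wise product versus matrix product
M1 * M1       # squares each entry
M1 %*% t(M1)  # 2 x 2 matrix product
```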
Combine Matrices
M2 <- rbind(1:3,2:4) # row combine
M2
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    3    4
# Make sure both matrices have the same number of rows
M <- cbind(M1,M2) # column combine
M
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    1    2    3
[2,]    4    5    6    2    3    4
M[1,c(1,3,5)]
[1] 1 3 2
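The dimension requirement when combining matrices can be checked before calling cbind() or rbind(). In this sketch M3 is a hypothetical extra matrix introduced only to trigger the mismatch.

```r
M1 <- matrix(1:6, ncol = 3, byrow = TRUE)   # 2 x 3
M3 <- matrix(1:9, ncol = 3)                 # 3 x 3, hypothetical extra matrix

# rbind() needs matching column counts: the result is 5 x 3
dim(rbind(M1, M3))

# cbind() needs matching row counts: 2 rows vs 3 rows fails,
# so the error is caught here instead of stopping the script
result <- tryCatch(cbind(M1, M3), error = function(e) "error")
result

# Check before combining
nrow(M1) == nrow(M3)   # FALSE
```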
List
list1 <- list() # null list
list2 <- list(1, "abc", c(2,4,5)) # list with 3 elements
list2
[[1]]
[1] 1
[[2]]
[1] "abc"
[[3]]
[1] 2 4 5
class(list2)
[1] "list"
list2[[2]] # extract the element by ID number
[1] "abc"
# list with elements having a name
list3 <- list(num=1, char="abc", vec=c(2,4,5))
list3
$num
[1] 1
$char
[1] "abc"
$vec
[1] 2 4 5
list3$char # extract the element by name
[1] "abc"
list3[[2]]
[1] "abc"
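Single and double brackets behave differently on lists, which is a common source of confusion. A small sketch using list3 from above (the flag element is added purely for illustration):

```r
list3 <- list(num = 1, char = "abc", vec = c(2, 4, 5))

# Single brackets return a (shorter) list; double brackets return the element
class(list3["vec"])     # "list"
class(list3[["vec"]])   # "numeric"

# Chain indexing to reach inside an element
list3$vec[2]            # 4

# Add a new named element, then list the names
list3$flag <- TRUE
names(list3)            # "num" "char" "vec" "flag"
```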
Data Frame
head(iris) # show the first 6 rows of a built-in R dataset "iris"
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
class(iris)
[1] "data.frame"
class(iris$Species)
[1] "factor"
iris[3,] # the third observation
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
3          4.7         3.2          1.3         0.2  setosa
iris[3,1] # the first variable of the third observation
[1] 4.7
iris$Species[120]
[1] virginica
Levels: setosa versicolor virginica
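Rows can also be selected by a condition and columns by name. A brief sketch on the built-in iris data; the threshold of 7 is arbitrary.

```r
# Rows where a condition holds, and only selected columns
big <- iris[iris$Sepal.Length > 7, c("Sepal.Length", "Species")]
head(big)

# Quick overview of size and column types
dim(iris)     # 150 5
str(iris)

# Group summary: mean sepal length for each species
tapply(iris$Sepal.Length, iris$Species, mean)
```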
mydata <- data.frame(Name, Gender, N_Course, InPerson)
mydata
     Name Gender N_Course InPerson
1 Abhinav      M        2    FALSE
2    Lucy      F        2    FALSE
3    Eric      M        1     TRUE
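Once the data frame exists, the usual next steps are to inspect it, fix column types, and filter rows. A minimal sketch that rebuilds mydata so it runs on its own:

```r
Name <- c("Abhinav", "Lucy", "Eric")
Gender <- c("M", "F", "M")
N_Course <- c(2, 2, 1)
InPerson <- c(FALSE, FALSE, TRUE)
mydata <- data.frame(Name, Gender, N_Course, InPerson)

# Column names and types at a glance
str(mydata)

# Store the categorical variable as a factor
mydata$Gender <- factor(mydata$Gender)

# Rows for staff teaching more than one course
mydata[mydata$N_Course > 1, ]

# Names of those teaching in person
as.character(mydata$Name[mydata$InPerson])   # "Eric"
```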
Import and Export Data
.csv File
# Import
data <- read.csv("assessment.csv", header = TRUE)
head(data,3)
  UniqueNo gender residence Assignment1 Assignment2 Assignment3
1        1   Male     India          16          24          40
2        2   Male Indonesia          18          27          45
3        3   Male     Korea          18          27          45
data$FinalScore <- data$Assignment1 + data$Assignment2 + data$Assignment3
head(data,3)
  UniqueNo gender residence Assignment1 Assignment2 Assignment3 FinalScore
1        1   Male     India          16          24          40         80
2        2   Male Indonesia          18          27          45         90
3        3   Male     Korea          18          27          45         90
# Export
write.csv(data, "new_assessment.csv")
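By default write.csv() also writes the row names as an extra first column, which then reappears as an unnamed column when the file is read back. The sketch below shows a round trip through a temporary file; scores is a small stand-in data frame, not the course data set.

```r
# Small stand-in data frame (not the course data set)
scores <- data.frame(UniqueNo = 1:3,
                     Assignment1 = c(16, 18, 18),
                     Assignment2 = c(24, 27, 27))

# tempfile() gives a throwaway path so nothing in your folders is overwritten
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE stops write.csv() from adding a column of row numbers
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back and confirm the columns survived unchanged
scores_back <- read.csv(csv_path, header = TRUE)
names(scores_back)
```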
.txt File or .RData File
# Import .txt file
cherry <- read.table("Cherry.txt", header=TRUE)
head(cherry)
  diameter height volume
1      8.3     70   10.3
2      8.6     65   10.3
3      8.8     63   10.2
4     10.5     72   16.4
5     10.7     81   18.8
6     10.8     83   19.7
# Export one dataset to one .RData file
save(mydata, file = "mydata.RData")
# Export multiple objects to one .RData file
save(mydata, M1, file = "two_objects.RData")
# Export the entire workspace to one .RData file
save.image("all_objects.RData")
# Import an .RData file (use the same file name it was saved under)
load("all_objects.RData")
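Unlike read.csv(), load() does not return the data; it recreates the saved objects under their original names. A minimal round-trip sketch using a temporary file and a cut-down mydata:

```r
mydata <- data.frame(Name = c("Abhinav", "Lucy", "Eric"),
                     N_Course = c(2, 2, 1))

# Save to a throwaway file, then remove the object from the workspace
rdata_path <- tempfile(fileext = ".RData")
save(mydata, file = rdata_path)
rm(mydata)
exists("mydata")          # FALSE

# load() restores objects under their original names and
# (invisibly) returns those names
restored <- load(rdata_path)
restored                  # "mydata"
exists("mydata")          # TRUE
```

Because load() silently overwrites any existing object with the same name, it is worth checking the returned names before relying on the workspace contents.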
Packages
Install and Load Packages
# Only need to install once
install.packages("ALSM")
# Load before you use the package
library(ALSM)
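To avoid reinstalling a package every time a script runs, the install step can be made conditional. The helper below, ensure_package(), is a hypothetical name for this sketch and not part of R or of the course material.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an installation
ensure_package("stats")
```

Calling ensure_package("ALSM") would follow the same path, installing the package on first use only.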
The best way to learn R is to use it.
Take a look at the exercises in Week 2’s tutorial.
Try to do it by yourself!