Regression Modelling
Introduction to R
Abhinav Mehta

R Introduction

R and RStudio

• R is a programming language and free software environment for statistical computing and graphics.
• RStudio is an integrated development environment (IDE) for R and has a friendlier user interface than R.
• RStudio cannot be used without R, so you must install R before using RStudio.
• See the R learning resources on Wattle.

RStudio Layout

Basic R Programming

    5 + 2^3      # Direct way
    [1] 13
    x <- 1       # Define some objects in R
    y <- 2^3
    z <- x + y
    z
    [1] 9
    sum(x, y)    # Use some functions
    [1] 9
    # Caution: mean() treats its second argument as `trim`, so this is not
    # the average of x and y. Use mean(c(x, y)) to average the two values.
    mean(x, y)
    [1] 1

Suggestions

• Create an R script file to save your R commands.
• Set your working directory.
• In RStudio, the easiest way to do this is to create a new project.
• Give meaningful names to the objects you create.
• Add appropriate comments when coding.
• Run the code line by line (or block by block) to check for errors as you write it.

    # Remove the existing objects in the current workspace
    rm(list = ls())
    # Know your working directory
    getwd()
    [1] "/Users/abhinav/Library/Mobile Documents/com~apple~CloudDocs
    Name <- c("Abhinav", "Lucy", "Eric")
    Gender <- c("M", "F", "M")
    N_Course <- c(2, 2, 1)               # Number of courses they teach
    InPerson <- c(FALSE, FALSE, TRUE)    # Teaching in person or not

Choosing the working directory

    # Setting the working directory through the console
    setwd()

• This requires the file path as an argument, like ‘~/Documents/Regression/. . . ’
• Alternatively, use the ‘Tools’ command in RStudio.

Creating a new project

Get Help with R

• Use ‘help()’ or ? followed by a function name.

    help(log)
    ?log

• Google
• See https://www.r-project.org/help.html for other options.

This Course ≠ R

• The goal of this course is to implement appropriate statistical modelling for real data.
• R is just a tool for carrying out the statistical computing once the statistical models have been built.
• The most important thing in this course is to understand statistical concepts.
That is the only way to perform statistical analysis properly.

Data Types and Data Structures in R

Basic Data Type

• Numeric
• Character
• Logical (TRUE / FALSE)

    # Use class() to get the data type
    class(x)
    class(Name)
    class(InPerson)
    # Boolean expression
    Female <- Gender == "F"
    Female
    [1] FALSE  TRUE FALSE
    class(Female)
    [1] "logical"

Basic Data Structure

• A vector is a single entity consisting of an ordered collection of values of the same type (numbers, character strings, or logical values).
• A factor is a special vector used to store categorical data.
• Matrices, or more generally arrays, are multi-dimensional generalisations of vectors.
• A list is a general form of vector in which the various elements need not be of the same type.
• A data frame is a matrix-like structure in which the columns (variables) can be of different types. Think of a data frame as a “data matrix” with one row per observation but with (possibly) both numerical and categorical variables.

Vector

    Topic <- c("SLR", "MLR", "GLM")      # By c()
    # Generate regular sequences
    a1 <- 1:4                            # By colon
    a2 <- seq(from = 1, to = 5, by = 2)  # By seq()
    a1 + 4
    [1] 5 6 7 8
    # Select values
    a1[3]
    [1] 3
    a1[c(1, 4)]
    [1] 1 4
    # Some functions
    length(a1)                           # Get the length of a vector
    [1] 4
    log(a1)
    [1] 0.0000000 0.6931472 1.0986123 1.3862944

Factor

    # Unordered factor
    f1 <- factor(Female)
    f1
    [1] FALSE TRUE  FALSE
    Levels: FALSE TRUE
    class(f1)
    [1] "factor"
    # Ordered factor
    f2 <- factor(c("Disagree", "Agree", "Medium", "Agree"),
                 ordered = TRUE,
                 levels = c("Disagree", "Medium", "Agree"))
    f2
    [1] Disagree Agree    Medium   Agree
    Levels: Disagree < Medium < Agree
    class(f2)
    [1] "ordered" "factor"

Matrix

    M1 <- matrix(1:6, ncol = 3)                 # default ordering by column
    M1
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    M1 <- matrix(1:6, ncol = 3, byrow = TRUE)   # fill by row instead
    M1
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    M1[2, 3]   # select the element in row 2 and column 3
    [1] 6

Combine Matrices

    M2 <- rbind(1:3, 2:4)   # row combine
    M2
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    2    3    4
    # Make sure the numbers of rows are the same
    M <- cbind(M1, M2)      # column combine
    M
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    2    3    1    2    3
    [2,]    4    5    6    2    3    4
    M[1, c(1, 3, 5)]
    [1] 1 3 2

List

    list1 <- list()                        # empty list
    list2 <- list(1, "abc", c(2, 4, 5))    # list with 3 elements
    list2
    [[1]]
    [1] 1

    [[2]]
    [1] "abc"

    [[3]]
    [1] 2 4 5

    class(list2)
    [1] "list"
    list2[[2]]    # extract the element by position
    [1] "abc"

    # list with named elements
    list3 <- list(num = 1, char = "abc", vec = c(2, 4, 5))
    list3
    $num
    [1] 1

    $char
    [1] "abc"

    $vec
    [1] 2 4 5

    list3$char    # extract the element by name
    [1] "abc"
    list3[[2]]
    [1] "abc"

Data Frame

    head(iris)    # show the first 6 rows of the built-in R dataset "iris"
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    class(iris)
    [1] "data.frame"
    class(iris$Species)
    [1] "factor"

    iris[3, ]     # the third observation
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    3          4.7         3.2          1.3         0.2  setosa
    iris[3, 1]    # the first variable of the third observation
    [1] 4.7
    iris$Species[120]
    [1] virginica
    Levels: setosa versicolor virginica

    mydata <- data.frame(Name, Gender, N_Course, InPerson)
    mydata
         Name Gender N_Course InPerson
    1 Abhinav      M        2    FALSE
    2    Lucy      F        2    FALSE
    3    Eric      M        1     TRUE

Import and Export Data

.csv File

    # Import
    data <- read.csv("assessment.csv", header = TRUE)
    head(data, 3)
      UniqueNo gender residence Assignment1 Assignment2 Assignment3
    1        1   Male     India          16          24          40
    2        2   Male Indonesia          18          27          45
    3        3   Male     Korea          18          27          45
    data$FinalScore <- data$Assignment1 + data$Assignment2 + data$Assignment3
    head(data, 3)
      UniqueNo gender residence Assignment1 Assignment2 Assignment3 FinalScore
    1        1   Male     India          16          24          40         80
    2        2   Male Indonesia          18          27          45         90
    3        3   Male     Korea          18          27          45         90
    # Export
    write.csv(data, "new_assessment.csv")

.txt File or .RData File

    # Import .txt file
    cherry <- read.table("Cherry.txt", header = TRUE)
    head(cherry)
      diameter height volume
    1      8.3     70   10.3
    2      8.6     65   10.3
    3      8.8     63   10.2
    4     10.5     72   16.4
    5     10.7     81   18.8
    6     10.8     83   19.7
    # Export one dataset to one .RData file
    save(mydata, file = "mydata.RData")
    # Export multiple objects to one .RData file
    save(mydata, M1, file = "two_objects.RData")
    # Export the entire workspace to one .RData file
    save.image("all_objects.RData")
    # Import .RData file (here, the file written by save.image() above)
    load("all_objects.RData")

Packages

Install and Load Packages

    # Only need to install once
    install.packages("ALSM")
    # Load before you use the package
    library(ALSM)

The best way to learn R is to use it. Take a look at the exercises in Week 2’s tutorial. Try to do them by yourself!
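
The import and export steps above rely on course files such as "assessment.csv" that may not be on your machine. As a rough, self-contained sketch for practice, the round trip can be tried with a small data frame and temporary files. The object names `csv_path`, `rdata_path`, `mydata_csv`, `original`, `same_csv` and `same_rdata` are made up for this illustration, and `row.names = FALSE` is an optional choice that the examples above do not use.

```r
# Illustrative data only, mirroring the vectors defined earlier in these notes
Name <- c("Abhinav", "Lucy", "Eric")
N_Course <- c(2, 2, 1)
mydata <- data.frame(Name, N_Course)

# Write to a temporary file so the working directory is left untouched
csv_path <- tempfile(fileext = ".csv")
# row.names = FALSE stops write.csv() from adding a column of row labels
write.csv(mydata, csv_path, row.names = FALSE)

# Read the file back and check that the contents survived the round trip
mydata_csv <- read.csv(csv_path, header = TRUE)
same_csv <- identical(as.character(mydata_csv$Name), as.character(mydata$Name)) &&
  all(mydata_csv$N_Course == mydata$N_Course)

# Same idea for .RData: save(), remove the object, then load() restores it
# under its original name
rdata_path <- tempfile(fileext = ".RData")
save(mydata, file = rdata_path)
original <- mydata
rm(mydata)
load(rdata_path)
same_rdata <- identical(mydata, original)

print(c(csv = same_csv, rdata = same_rdata))
```

Note that load() recreates objects under the names they were saved with, so it can silently overwrite an existing object of the same name in your workspace.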