Introduction to R and RStudio

Welcome to R
- The R programming language began in 1992 as an effort to create a special-purpose language for statistical applications.
- R gained traction as a popular language because it is available to everyone as a free, open source language developed by a community of committed developers.
- Packages: open source code distributed through the Comprehensive R Archive Network (CRAN), a worldwide repository of popular R code.
- R is an interpreted language, and R code is typically written and saved as a script.
  - The script is executed by the R interpreter, which processes the code one command at a time.
  - Because R is interpreted, you can also run R commands directly and get an immediate result.

The R Language
- The R language is available as a free download from the R Project website at: https://www.r-project.org

RStudio
- RStudio: an integrated development environment (IDE) that offers a graphical interface to assist in creating R code.
  - It allows users to manage code, monitor progress, and troubleshoot issues.
- The RStudio IDE comes in different versions.
  - For the purposes of this course, the open source version is more than sufficient.

RStudio Desktop
- Download the most recent version at: https://posit.co/download/rstudio-desktop/

RStudio Environment
- Console pane: appears in the lower-left corner and lets you interact directly with the R interpreter; R immediately executes the commands you type there.
- Script pane: where you write R commands in a script file that you can save. An R script is simply a text file containing R commands. RStudio color-codes the different elements of your code to make it easier to read.
- Environment pane: shows the values of variables, datasets, and other objects that are currently stored in memory.
- Plots pane: appears in the lower-right corner and contains any graphics that you generate in your R code.
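The following is a minimal sketch of what a short R script might contain; the variable names and values are hypothetical. Running it line by line in RStudio illustrates the interpreted workflow described in these slides: each result appears in the console right away, and the objects the script creates are listed in the environment pane.

```r
# A minimal example script (contents are illustrative only).
# Each command is evaluated by the R interpreter as soon as it is run,
# so results appear immediately in the console pane.

# Arithmetic is evaluated directly
total <- 2 + 3
print(total)        # [1] 5

# Objects created by the script stay in memory and
# show up in RStudio's environment pane
scores <- c(88, 92, 79)
average <- mean(scores)
print(average)      # [1] 86.33333

# ls() lists the names of objects in the current environment,
# the same objects the environment pane displays
print(ls())
```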

R Packages
- Packages are the secret sauce of R: collections of code created by the community and shared for public use.
- Installing packages:
  - Use the install.packages() function.
  - Ex: installing the RWeka package:
    install.packages("RWeka")
- Loading packages:
  - Use the library() function to load a package into the current session.
  - Ex: loading the RWeka package:
    library(RWeka)

Writing & Running R Script
- Write a script in the script pane.
- To execute it, click the Run button at the top of the script pane. Run executes the current line or the selected lines; the keyboard shortcut Ctrl+Enter (Cmd+Return on macOS) does the same.

Data Types in R
- Logical: a simple binary variable that may have only two values: TRUE or FALSE.
- Numeric: data type that stores decimal numbers.
- Integer: data type that stores whole numbers (integers).
- Character: data type used to store text strings; a single string can hold up to 2^31 - 1 bytes.
- Factor: data type used to store categorical values. Each possible value of a factor is known as a level.
- Ordered factor: a special factor data type where the order of the levels is significant. Ex: Low, Medium, and High.

Vectors
- Vectors: a way to collect elements of the same data type together in a sequence.
  - Each data element in a vector is called a component of that vector.
- Use the c() function to create a new vector:
  - names <- c('Mike', 'Renee', 'Richard', 'Matthew', 'Christopher')
- Once data is stored as a vector, you can access individual components by position (R starts counting at 1):
  - names[1] would output 'Mike'
- Functions such as mean(), median(), min(), and max() work on entire vectors at once.
- All components of a vector must share the same data type. If you combine values of different types with c(), R silently converts (coerces) them to one common type, which can keep numeric functions such as mean() from working.
- Vectors can be combined into data structures that resemble spreadsheets, known in R as data frames.

Testing Data Types
- Use the class() function to return the data type of an object.
  Example:
  > x <- TRUE
  > class(x)
  [1] "logical"
- Use the length() function to return the number of components in a vector.
  Example:
  > x <- TRUE
  > length(x)
  [1] 1

Converting Data Types
- Use the following functions to convert to the corresponding data type:
  - as.logical()
  - as.numeric()
  - as.integer()
  - as.character()
  - as.factor()

Missing Values
- R uses the special constant NA to represent missing values in a dataset.
  - These values are different from blank or zero values.
- Use the is.na() function to test which values in an object are missing; it returns TRUE for each NA.
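The sketch below gathers the data type, vector, conversion, and missing-value ideas from these slides into one script. The variable names and sample values (such as ages and readings) are made up for illustration; the comments show what each expression returns when typed at the console.

```r
# Data types: class() reports the type of each value
flag  <- TRUE        # logical
price <- 19.99       # numeric (decimal)
count <- 5L          # integer (the L suffix marks an integer literal)
label <- "Medium"    # character
class(flag)          # returns "logical"
class(price)         # returns "numeric"
class(count)         # returns "integer"
class(label)         # returns "character"

# Factors store categorical values; each distinct value is a level
sizes <- factor(c("Low", "High", "Medium", "Low"))
levels(sizes)        # returns "High" "Low" "Medium" (alphabetical by default)

# Ordered factors make the order of the levels significant
priority <- factor(c("Low", "High", "Medium", "Low"),
                   levels = c("Low", "Medium", "High"),
                   ordered = TRUE)
priority[2] > priority[1]   # returns TRUE, because High ranks above Low

# Vectors: c() combines components of the same type
names <- c('Mike', 'Renee', 'Richard', 'Matthew', 'Christopher')
names[1]             # returns "Mike" (indexing starts at 1)
length(names)        # returns 5

# mean(), median(), min(), and max() work on whole vectors
ages <- c(34, 28, 45, 51, 39)
mean(ages)           # returns 39.4
median(ages)         # returns 39
min(ages)            # returns 28
max(ages)            # returns 51

# Mixing types with c() coerces every component to one type
mixed <- c(1, "two", 3)
class(mixed)         # returns "character"

# Equal-length vectors can be combined into a data frame
people <- data.frame(name = names, age = ages)
nrow(people)         # returns 5

# Converting between types with the as.*() functions
as.numeric("3.14")   # returns 3.14
as.integer(3.9)      # returns 3 (the decimal part is dropped)
as.character(42)     # returns "42"
as.logical("TRUE")   # returns TRUE
as.factor(c("Low", "High", "Low"))   # a factor with levels "High" "Low"
# A string that is not a number becomes NA; R normally warns
# "NAs introduced by coercion", which suppressWarnings() hides here
suppressWarnings(as.numeric("abc"))  # returns NA

# Missing values: NA is not the same as zero or an empty string
readings <- c(12, NA, 0, 7)
is.na(readings)                # returns FALSE TRUE FALSE FALSE
mean(readings)                 # returns NA, because one value is missing
mean(readings, na.rm = TRUE)   # returns 6.333333 (NA dropped first)
```

Note the na.rm = TRUE argument in the last line: most summary functions return NA when any input is missing, so removing missing values has to be requested explicitly.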