UNIT - III Vectors: Creating and Naming Vectors, Vector Arithmetic, Vector Subsetting, Matrices: Creating and Naming Matrices, Matrix Sub setting, Arrays, Class. Factors and Data Frames: Introduction to Factors: Factor Levels, Summarizing a Factor, Ordered Factors, Comparing Ordered Factors, Introduction to Data Frame, subsetting of Data Frames, Extending Data Frames, Sorting Data Frames. Lists: Introduction, creating a List: Creating a Named List, Accessing List Elements, Manipulating List Elements, Merging Lists, Converting Lists to Vectors Taking Input from User in R Programming For taking user-entered input into an R Program, we use the two methods in R. a. Using readline() method b. Using scan() method a. Using readline(): It takes input as string S Syntax: var = readline(); var = as.integer(var); Convertion of String to other types: By using the as object reference call the respective function to convert • as.integer(n); —> convert to integer • as.numeric(n); —> convert to numeric type (float, double etc) • as.complex(n); —> convert to complex number (i.e 3+2i) • as.Date(n) —> convert to date …, etc Example: var = readline(prompt=“Enter a number”); Or readline(“Enter a number”); var = as.integer(var); # print the value print(var) Example: name = readline(prompt="Input your name: ") age = readline(prompt="Input your age: ") print(paste("My name is",name, "and I am",age ,"years old.")) Output: Input your name: Rama Input your age: 21 [1] My name is Rama and I am 21 years old." b. Using scan() method This method reads data in the form of a vector or list. This method also uses to reads input from a file. Syntax: x = scan() scan() method is taking input continuously, to terminate the input process, need to press Enter key 2 times on the console. Example: x=scan(“filename.txt”) print(x) Taking double, string, character type values using scan() method To take double, string, character types inputs, specify the type of the inputted value in the scan() method. To do this there is an argument called what, by which one can specify the data type of the inputted value. Syntax: x = scan(what = double()) —-for double x = scan(what = ” “) —-for string x = scan(what = character()) —-for character Example: # R program to illustrate # taking input from the user # string file input using scan() 1 s = scan("fileString.txt", what = " ") # double file input using scan() d = scan("fileDouble.txt", what = double()) # character file input using scan() c = scan("fileChar.txt", what = character()) # print the inputted values print(s) # string print(d) # double print(c) # character Output: Read 7 items Read 5 items Read 13 items [1] "geek" "for" "geeks" "gfg" "c++" "java" "python" [1] 123.321 523.458 632.147 741.250 855.360 [1] "g" "e" "e" "k" "s" "f" "o" "r" "g" "e" "e" "k" "s" 1. Vectors and Creating Vector Vectors are the simplest data structures in R. They are sequences of elements of the same basic type. These types can be numeric, integer, complex, character, and logical. The more complicated data structures in R are made with vectors as building-blocks. There are numerous ways to create an R vector: 1. Using c() Function To create a vector, we use the c() function: Code: > vec <- c(1,2,3,4,5) #creates a vector named vec > vec #prints the vector vec 2. Using assign() function Another way to create a vector is the assign() function. Code: > assign("vec2", c(6,7,8,9,10)) #creates a vector named vec2 > vec2 #prints the vector vec2 3. Using : operator An easy way to make integer vectors is to use the: operator. Code: > vec3 <- 1:20 > vec3 Types of vectors A vector can be of different types depending on the elements it contains. These may be: 1. Numeric Vectors: Vectors containing numeric values. Code: > num_vec <- c(1,2,3,4,5) > num_vec 2. Integer Vectors: Vectors containing integer values. Code: > int_vec <- c(6L,7L,8L,9L,10L) > int_vec 3. Logical Vectors: Vectors containing logical values of TRUE or FALSE. Code: > log_vec <- c(TRUE, FALSE,TRUE,FALSE,FALSE) > log_vec 2 4. Character Vectors: Vectors containing text. Code: > char_vec <- c("aa","bb","cc","dd","ee") > char_vec 5. Complex Vectors: Vectors containing complex values. Code: > cv <- c(12+1i, 3i, 5+4i, 4+9i, 6i) > cv How to find the type of R vector? We can use the typeof() function to find the type of a vector. For example: Code: Note: The typeof() function returns “double” for numeric values. This is because of the way numeric-class stores a value. The numeric class stores values as double-precision floating-point numbers. Their type is double while their class is numeric. How to combine R vectors? The c() function can also combine two or more vectors and add elements to vectors. Example 1 Code: Naming Vector You can give the names to the elements of a vector. The following example illustrates how to give names to vector elements Example: We can also assign names to vector members as For example, the following variable v is a character string vector with two members. > v = c("Mary", "Sue") >v [1] "Mary" "Sue" We now name the first member as First, and the second as Last. > names(v) = c("First", "Last") >v First Last "Mary" "Sue" 3 Then we can retrieve the first member by its name. > v["First"] [1] "Mary" Furthermore, we can reverse the order with a character string index vector. > v[c("Last", "First")] Last First "Sue" "Mary" Assign Names to Vector Using names() Function First, we have to create an example vector: my_vec <- 1:5 # Create example vector my_vec # Print example vector #12345 As you can see based on the output of the RStudio console, our example vector (or array) is consisting of five values ranging from 1 to 5. At this point, the vector elements do not have names. If we want to assign a name to each of the elements of our vector, we can use the following R code: names(my_vec) <- letters[1:5] # Assign names to vector my_vec # Print updated vector #abcde #12345 The previous R syntax assigned the alphabetic letters a, b, c, d, and e as names to our vector elements. Now, we could also check the names of our vector using the names() function once more: names(my_vec) # Return names of updated vector # "a" "b" "c" "d" "e" Again, we need to create some example data first: my_list <- list(1:5,3, "bbb") # Create example list my_list # Print example list # [[1]] # [1] 1 2 3 4 5 # # [[2]] # [1] 3 # # [[3]] # [1] "bbb" Our list consists of three list elements. None of these list elements has a name yet. We can use the names function to specify the names of our list elements as shown below: names(my_list) <- LETTERS[1:3] # Assign names to list my_list # Print updated list # $A # [1] 1 2 3 4 5 # # $B # [1] 3 # # $C # [1] "bbb" Vector Arithmetic Arithmetic operations of vectors are performed member-by-member, i.e., member-wise. For example, suppose we have two vectors a and b. > a = c(1, 3, 5, 7) > b = c(1, 2, 4, 8) Then, if we multiply a by 5, we would get a vector with each of its members multiplied by 5. >5*a [1] 5 15 25 35 4 And if we add a and b together, the sum would be a vector whose members are the sum of the corresponding members from a and b. >a+b [1] 2 5 9 15 Similarly, for subtraction, multiplication, and division, we get new vectors via member-wise operations. >a-b [1] 0 1 1 -1 >a*b [1] 1 6 20 56 >a/b [1] 1.000 1.500 1.250 0.875 Recycling Rule If two vectors are of unequal length, the shorter one will be recycled in order to match the longer vector. For example, the following vectors u and v have different lengths, and their sum is computed by recycling values of the shorter vector u. > u = c(10, 20, 30) > v = c(1, 2, 3, 4, 5, 6, 7, 8, 9) >u+v [1] 11 22 33 14 25 36 17 28 39 Vector Functions: R has many functions that can manipulate vectors or get more information about them. Here are some of the commonly used functions: 1. seq()– seq() function generates regular numeric sequences. The function has the following arguments: from: starting value to: ending value by: increment (default is 1) length.out: length of the sequence along.with: the length of this argument can define the length of the sequence. Code: > vec_seq <- seq(from=1,to=10,length=5) > vec_seq [1] 1.00 3.25 5.50 7.75 10.00 2. rep() – The rep() function repeats a given numeric vector. The function has the following arguments: X: x is the numeric vector that is repeated. times: number of repetitions. each: number of repetitions for each element of the vector. length.out: the length of the resultant vector. The function repeats until it reaches the length. Code: > vec_rep <- rep(c(2,3,4), times=3) > vec_rep [1] 2 3 4 2 3 4 2 3 4 3. sum() – The sum() function returns an integer value which is the sum of all the elements in a vector. Code: > sum(vec_rep) 5 4. Type checking and conversion functions – the functions as.numeric() / as.character() / as.logical() / as.integer() can convert a vector into their corresponding type. The functions is.numeric() /is.character() /is.logical() etc. tell whether the vector is of the corresponding type or not. Code: > is.numeric(vec_rep) > as.character(vec_rep) 5. match() function in R Language is used to return the positions of the first match of the elements of the first vector in the second vector. If the element is not found, it returns NA. Syntax: match(x1, x2, nomatch) Parameters: x1: Vector 1 x2: Vector 2 nomatch: value to be returned in case of no match Example 1: # Creating vectors x1 <- c("a", "b", "c", "d", "e") x2 <- c("d", "f", "g", "a", "e", "k") # Calling match function match(x1, x2) Output: [1] 4 NA NA 1 5 Vector Sub-setting: x <- c(5.4, 6.2, 7.1, 4.8, 7.5) We can subset this in many ways i. Using positive integers x[1] [1] 5.4 x[c(3,1)] [1] 7.1 5.4 # We can duplicate indices x[c(1, 1)] [1] 5.4 5.4 # Real numbers are silently truncated to integers x[c(2.1, 2.9)] [1] 6.2 6.2 ii. Using negative integers # skip the first element x[-1] [1] 6.2 7.1 4.8 7.5 # skip the first and the fifth x[-c(1,5)] [1] 6.2 7.1 4.8 iii. Using logical operators x[c(TRUE, TRUE, FALSE, FALSE)] [1] 5.4 6.2 7.5 # or based on a condition x[x > 3] [1] 5.4 6.2 7.1 4.8 7.5 x[which(x >3)] [1] 5.4 6.2 7.1 4.8 7.5 Also see `which.max()` and `which.min()` 6 iv. with nothing x[] [1] 5.4 6.2 7.1 4.8 7.5 # useful when dealing with 2 or higher-dimensional objects v. with zero x[0] numeric(0) # helpful for generating test data or creating empty objects vi. Referencing objects by their names (y <- setNames(x, letters[1:4])) a b c d <NA> 5.4 6.2 7.1 4.8 7.5 y[c("d", "c", "a")] d c a 4.8 7.1 5.4 # We can also repeat indices y[c("a", "a", "a")] a a a 5.4 5.4 5.4 # Names are always matched exactly, not partially z <- c(abc = 1, def = 2) z[c("a", "d")] <NA> <NA> NA NA # Quick Examples of Substing a Vector # Create a Vector v <- c('A','B','C','D','E','F') # Subset by Index v[1] v[-2] # Subset by Range v[2:4] # Subset by list v[c(1,3)] v[c(2.3,4.5)] # Subset Vector by Negative Values v[-c(5,2)] v[c(-1,2)] # Subset by Logical vector v[c(TRUE,FALSE,TRUE,FALSE,TRUE,TRUE)] v[c(TRUE,FALSE,TRUE,FALSE)] # By Nothing v[] # By Zero v[0] # Named Vector v <- c(c1='A', 7 c2='B', c3='C', c4='D') # Subset by Name v[c('c2','c3')] # subset by condition v[v == 'B'] v[v %in% c('B','C')] # Using subset subset(v,subset=c(TRUE,FALSE)) 2. Matrices Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout. They contain elements of the same atomic types. Though we can create a matrix containing only characters or only logical values, they are not of much use. We use matrices containing numeric elements to be used in mathematical calculations. A Matrix is created using the matrix() function. Syntax The basic syntax for creating a matrix in R is − matrix(data, nrow, ncol, byrow, dimnames) Following is the description of the parameters used − data is the input vector which becomes the data elements of the matrix. nrow is the number of rows to be created. ncol is the number of columns to be created. byrow is a logical clue. If TRUE then the input vector elements are arranged by row. dimname is the names assigned to the rows and columns. Example Create a matrix taking a vector of numbers as input. # Elements are arranged sequentially by row. M <- matrix(c(3:14), nrow = 4, byrow = TRUE) print(M) # Elements are arranged sequentially by column. N <- matrix(c(3:14), nrow = 4, byrow = FALSE) print(N) #Renaming Matrix # Define the column and row names. rownames = c("row1", "row2", "row3", "row4") colnames = c("col1", "col2", "col3") P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames)) print(P) When we execute the above code, it produces the following result − [,1] [,2] [,3] [1,] 3 4 5 [2,] 6 7 8 8 [3,] 9 10 11 [4,] 12 13 14 [,1] [,2] [,3] [1,] 3 7 11 [2,] 4 8 12 [3,] 5 9 13 [4,] 6 10 14 col1 col2 col3 row1 3 4 5 row2 6 7 8 row3 9 10 11 row4 12 13 14 Accessing Elements of a Matrix Elements of a matrix can be accessed by using the column and row index of the element. We consider the matrix P above to find the specific elements below. # Define the column and row names. rownames = c("row1", "row2", "row3", "row4") colnames = c("col1", "col2", "col3") # Create the matrix. P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames)) # Access the element at 3rd column and 1st row. print(P[1,3]) # Access the element at 2nd column and 4th row. print(P[4,2]) # Access only the 2nd row. print(P[2,]) # Access only the 3rd column. print(P[,3]) When we execute the above code, it produces the following result − [1] 5 [1] 13 col1 col2 col3 6 7 8 row1 row2 row3 row4 5 8 11 14 Matrix Computations Various mathematical operations are performed on the matrices using the R operators. The result of the operation is also a matrix. The dimensions (number of rows and columns) should be same for the matrices involved in the operation. Matrix Addition & Subtraction # Create two 2x3 matrices. M1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2) print(matrix1) M2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2) print(matrix2) # Add the matrices. 9 result <- M1 + M2 cat("Result of addition","\n") print(result) # Subtract the matrices result <- M1 - M2 cat("Result of subtraction","\n") print(result) When we execute the above code, it produces the following result − [,1] [,2] [,3] [1,] 3 -1 2 [2,] 9 4 6 [,1] [,2] [,3] [1,] 5 0 3 [2,] 2 9 4 Result of addition [,1] [,2] [,3] [1,] 8 -1 5 [2,] 11 13 10 Result of subtraction [,1] [,2] [,3] [1,] -2 -1 -1 [2,] 7 -5 2 Matrix Multiplication & Division # Create two 2x3 matrices. M1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2) print(matrix1) M2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2) print(matrix2) # Multiply the matrices. result <- M1 * M2 cat("Result of multiplication","\n") print(result) # Divide the matrices result <- M1 / M2 cat("Result of division","\n") print(result) When we execute the above code, it produces the following result − [,1] [,2] [,3] [1,] 3 -1 2 [2,] 9 4 6 [,1] [,2] [,3] [1,] 5 0 3 [2,] 2 9 4 Result of multiplication [,1] [,2] [,3] [1,] 15 0 6 [2,] 18 36 24 Result of division [,1] [,2] [,3] [1,] 0.6 -Inf 0.6666667 [2,] 4.5 0.4444444 1.5000000 10 NOTE: What if we want Matrix Multiplication? As an exercise try M1%*%M2 in the R console for matrix multiplication and check the result. Once you run the operation you must have got an error message “Error in M1 %*% M2: non-conformable arguments" Why is it so? As a basic rule for matrix multiplication number of rows of Matrix1 should be equal to the number of columns in Matrix2 when multiplying Matrix1 with Matrix2. What if want to add Matrices of different dimensions? We can check dimension of Matrices using dim() function dim(M1) [1] 2 3 dim(M2) [1] 2 3 Adding Row and Col to a Matrix i. Using the cbind() function to add a col to a Matrix >m<-matrix(c(1:4), nrow=2,ncol=2) >nm1<-cbind(m, c(5,6)) ii. Using the cbind() function to add a col to a Matrix >m<-matrix(c(1:4), nrow=2,ncol=2) >nm2<-rbind(m, c(5,6)) >dim(nm2) >length(nm2) Combining Matrices Looping through the Matrix Output: 11 Subsetting matrices a <- matrix(1:9, nrow = 3) colnames(a) <- c("A", "B", "C") a[1:2, ] # ABC # [1,] 1 4 7 # [2,] 2 5 8 a[c(T, F, T), c("B", "A")] # BA # [1,] 4 1 # [2,] 6 3 a[0, -2] # AC When you extract a single column or row, you get a vector out. This is not always what is wanted, so the drop argument comes in useful a[1,,drop=FALSE] a[1,] 3. Arrays Compared to matrices, arrays can have more than two dimensions. We can use the array() function to create an array, and the dim parameter to specify the dimensions: Example # An array with one dimension with values ranging from 1 to 24 thisarray <- c(1:24) thisarray # An array with more than one dimension multiarray <- array(thisarray, dim = c(4, 3, 2)) multiarray Example Explained In the example above we create an array with the values 1 to 24. 12 How does dim=c(4,3,2) work? The first and second number in the bracket specifies the amount of rows and columns. The last number in the bracket specifies how many dimensions we want. Note: Arrays can only have one data type. Access Array Items You can access the array elements by referring to the index position. You can use the [] brackets to access the desired elements from an array: Example a <- c(1:24) # a is an array of 24 elements multiarray <- array(a, dim = c(4, 3, 2)) # creates two arrays of 4X3 matrice multiarray[2, 3, 2] # prints 2nd row 3rd col from 2nd matric The syntax is as follow: array[row position, column position, matrix level] You can also access the whole row or column from a matrix in an array, by using the c() function: Example thisarray <- c(1:24) # Access all the items from the first row from matrix one multiarray <- array(thisarray, dim = c(4, 3, 2)) multiarray[c(1),,1] # Access all the items from the first column from matrix one multiarray <- array(thisarray, dim = c(4, 3, 2)) multiarray[,c(1),1] A comma (,) before c() means that we want to access the column. A comma (,) after c() means that we want to access the row. Check if an Item Exists To find out if a specified item is present in an array, use the %in% operator: Example Check if the value "2" is present in the array: thisarray <- c(1:24) multiarray <- array(thisarray, dim = c(4, 3, 2)) 2 %in% multiarray Amount of Rows and Columns Use the dim() function to find the amount of rows and columns in an array: Example thisarray <- c(1:24) multiarray <- array(thisarray, dim = c(4, 3, 2)) 13 dim(multiarray) Array Length Use the length() function to find the dimension of an array: Example thisarray <- c(1:24) multiarray <- array(thisarray, dim = c(4, 3, 2)) length(multiarray) Loop Through an Array You can loop through the array items by using a for loop: Example thisarray <- c(1:24) multiarray <- array(thisarray, dim = c(4, 3, 2)) for(x in multiarray){ print(x) } 4. R Objects and Classes R is a functional language that uses concepts of objects and classes. An object is simply a collection of data (variables) and methods (functions). Similarly, a class is a blueprint for that object. Let's take a real life example, We can think of the class as a sketch (prototype) of a house. It contains all the details about the floors, doors, windows, etc. Based on these descriptions we build the house. House is the object. Class System in R While most programming languages have a single class system, R has three class systems: a. S3 Class b. S4 Class c. Reference Class S3 Class in R S3 class is the most popular class in the R programming language. Most of the classes that come predefined in R are of this type. First we create a list with various components then we create a class using the class() function. For example, # create a list with required components student1 <- list(name = "John", age = 21, GPA = 3.5) # name the class appropriately class(student1) <- "Student_Info" # create and call an object student1 Output $name [1] "John" $age [1] 21 $GPA 14 [1] 3.5 attr(,"class") [1] "student" In the above example, we have created a list named student1 with three components. Notice the creation of class, class(student1) <- "Student_Info" Here, Student_Info is the name of the class. And to create an object of this class, we have passed the student1 list inside class(). Finally, we have created an object of the Student_Info class and called the object student1. To learn more in detail about S3 classes, please visit R S3 class. S4 Class in R S4 class is an improvement over the S3 class. They have a formally defined structure which helps in making objects of the same class look more or less similar. In R, we use the setClass() function to define a class. For example, setClass("Student_Info", slots=list(name="character", age="numeric", GPA="numeric")) Here, we have created a class named Student_Info with three slots (member variables): name, age, and GPA. Now to create an object, we use the new() function. For example, student1 <- new("Student_Info", name = "John", age = 21, GPA = 3.5) Here, inside new(), we have provided the name of the class "Student_Info" and value for all three slots. We have successfully created the object named student1. Example: S4 Class in R # create a class "Student_Info" with three member variables setClass("Student_Info", slots=list(name="character", age="numeric", GPA="numeric")) # create an object of class student1 <- new("Student_Info", name = "John", age = 21, GPA = 3.5) # call student1 object student1 Output An object of class "Student_Info" Slot "name": [1] "John" Slot "age": [1] 21 Slot "GPA": [1] 3.5 Here, we have created an S4 class named Student_Info using the setClass() function and an object named student1 using the new() function. To learn more in detail about S4 classes, please visit R S4 class. Reference Class in R Reference classes were introduced later, compared to the other two. It is more similar to the object oriented programming we are used to seeing in other major programming languages. Defining a reference class is similar to defining a S4 class. Instead of setClass() we use the setRefClass() function. For example, # create a class "Student_Info" with three member variables Student_Info <- setRefClass("Student_Info", fields = list(name = "character", age = "numeric", GPA = "numeric")) # Student_Info() is our generator function which can be used to create new objects 15 student1 <- Student_Info(name = "John", age = 21, GPA = 3.5) # call student1 object student1 Output Reference class object of class "Student_Info" Field "name": [1] "John" Field "age": [1] 21 Field "GPA": [1] 3.5 In the above example, we have created a reference class named Student_Info using the setRefClass() function. And we have used our generator function Student_Info() to create a new object student1. Comparison Between S3 vs S4 vs Reference Class S3 Class S4 Class Reference Class Lacks formal definition Class defined using setClass() Class defined using setRefClass() Objects are created by setting the class attribute Objects are created using new() Objects are created using generator functions Attributes are accessed using $ Attributes are accessed using @ Attributes are accessed using $ Methods belong to generic function Methods belong to generic function Methods belong to the class Follows copy-on-modify semantics Follows copy-on-modify semantics Does not follow copy-onmodify semantics 5. Factors Factors in R Programming Language are data structures that are implemented to categorize the data or represent categorical data and store it on multiple levels. They can be stored as integers with a corresponding label to every unique integer. Though factors may look similar to character vectors, they are integers and care must be taken while using them as strings. The factor accepts only a restricted number of distinct values. For example, a data field such as gender may contain values only from female, male, or transgender. In the above example, all the possible cases are known beforehand and are predefined. These distinct values are known as levels. After a factor is created it only consists of levels that are by default sorted alphabetically. Attributes of Factors in R Language x: It is the vector that needs to be converted into a factor. Levels: It is a set of distinct values which are given to the input vector x. Labels: It is a character vector corresponding to the number of labels. Exclude: This will mention all the values you want to exclude. Ordered: This logical attribute decides whether the levels are ordered. nmax: It will decide the upper limit for the maximum number of levels. Creating a Factor in R Programming Language 16 The command used to create or modify a factor in R language is – factor() with a vector as input. The two steps to creating a factor are: Creating a vector Converting the vector created into a factor using function factor() Examples: Let us create a factor gender with levels female, male and transgender. # Creating a vector x < -c("female", "male", "male", "female") print(x) # Converting the vector x into a factor # named gender gender < -factor(x) print(gender) Output: [1] "female" "male" "male" "female" [1] female male male female Levels: female male Levels can also be predefined by the programmer. # Creating a factor with levels defined by programmer gender <- factor(c("female", "male", "male", "female"), levels = c("female", "transgender", "male")); gender Output: [1] female male male female Levels: female transgender male Further one can check the levels of a factor by using function levels(). Checking for a Factor in R The function is.factor() is used to check whether the variable is a factor and returns “TRUE” if it is a factor. gender <- factor(c("female", "male", "male", "female")); print(is.factor(gender)) Output: [1] TRUE Function class() is also used to check whether the variable is a factor and if true returns “factor”. gender <- factor(c("female", "male", "male", "female")); class(gender) Output: [1] "factor" Accessing elements of a Factor in R Like we access elements of a vector, the same way we access the elements of a factor. If gender is a factor then gender[i] would mean accessing ith element in the factor. Example: gender <- factor(c("female", "male", "male", "female")); gender[3] Output: 17 [1] male Levels: female male More than one element can be accessed at a time. Example: gender <- factor(c("female", "male", "male", "female")); gender[c(2, 4)] Output: [1] male female Levels: female male Example: gender <- factor(c("female", "male", "male", "female" )); gender[-3] Output: [1] female male female Levels: female male Modification of a Factor in R After a factor is formed, its components can be modified but the new values which need to be assigned must be at the predefined level. Example: gender <- factor(c("female", "male", "male", "female" )); gender[2]<-"female" gender Output: [1] female female male female Levels: female male For selecting all the elements of the factor gender except ith element, gender[-i] should be used. So if you want to modify a factor and add value out of predefines levels, then first modify levels. Example: gender <- factor(c("female", "male", "male", "female" )); # add new level levels(gender) <- c(levels(gender), "other") gender[3] <- "other" gender Output: [1] female male other female Levels: female male other Factors in Data Frame The Data frame is similar to a 2D array with the columns containing all the values of one variable and the rows having one set of values from every column. There are four things to remember about data frames: column names are compulsory and cannot be empty. Unique names should be assigned to each row. The data frame’s data can be only of three types- factor, numeric, and character type. The same number of data items must be present in each column. In R language when we create a data frame, its column is categorical data and hence a factor is automatically created on it. We can create a data frame and check if its column is a factor. 18 Example: age <- c(40, 49, 48, 40, 67, 52, 53) salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220) gender <- c("male", "male", "transgender", "female", "male", "female", "transgender") employee<- data.frame(age, salary, gender) print(employee) print(is.factor(employee$gender)) Output: age salary gender 1 40 103200 male 2 49 106200 male 3 48 150200 transgender 4 40 10606 female 5 67 10390 male 6 52 14070 female 7 53 10220 transgender [1] TRUE Levels of a factor: To print the levels of a factor we use the level() function Example: music_genre <factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz")) levels(music_genre) Result: [1] "Classic" "Jazz" "Pop" "Rock" Summarizing a Factor: The summary() function will give you a quick overview of a variable Ordered Factors: The levels of factors are stored in alphabetical order, or in the order, they were specified to factor if they were specified explicitly. Sometimes the levels will have a natural ordering that we want to record and want our statistical analysis to make use of. Example1: Low, Medium, High Example2: Pass Class, Second class, First Class, Distinction, Extraordinary Example3: 40%, 50%, 60%, 70%,80% Comparing Ordered Factors You can compare different elements of an ordered factor by using well-known operators. For example, to compare if the element of the first-factor vector is greater than the first element of the second-factor vector, you would use the greater than operator ( > ). factor_vector1[1] > factor_vector2[1] Example: 19 # Create factor_speed_vector speed_vector <- c("medium", "slow", "slow", "medium", "fast") factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast")) # Factor value for second data analyst da2 <- factor_speed_vector[2] # Factor value for fifth data analyst da5 <- factor_speed_vector[5] # Is data analyst 2 faster than data analyst 5? da2 > da5 [1] FALSE 6. Introduction to Data Frame: Data Frames are data displayed in a format as a table. Data frames can also be interpreted as matrices where each column of a matrix can be of different data types. Data Frame is made up of three principal components, the data, rows, and columns. Characteristics of a data frame: Column names should not be empty The row names should be unique The data stored in the data frame may be numeric, factor or character Each column contains same number of data items Data Frames can have different types of data inside it. While the first column can be character, the second and third can be numeric or logical. However, each column should have the same type of data. Use the data.frame() function to create a data frame: Three Rules for Datasets: Example: Data_Frame <- data.frame ( Training = c("Strength", "Stamina", "Other"), Pulse = c(100, 150, 120), Duration = c(60, 30, 45) ) # Print the data frame Data_Frame Output: Training Pulse Duration 1 Strength 100 60 2 Stamina 150 30 3 Other 120 45 Summarize the Data Use the summary () function to summarize the data from a Data Frame: Example: summary (Data_Frame) Output: 20 Training Pulse Duration Other :1 Min. :100.0 Min. :30.0 Stamina :1 1st Qu.:110.0 1st Qu.:37.5 Strength:1 Median:120.0 Median:45.0 Mean:123.3 Mean:45.0 3rd Qu.:135.0 3rd Qu.:52.5 Max. :150.0 Max. :60.0 Subsetting of Data Frames Find out how to access your data frame's data with subsetting. Subsetting in R is a useful indexing feature for accessing object elements. It can be used to select and filter variables and observations. You can use brackets to select rows and columns from your data frame. Selecting Rows debt[3:6, ] name payment 3 Dan 150 4 Rob 50 5 Rob 75 6 Rob 100 Here we selected rows 3 through 6 of debt. One thing to look at is the simplification that happens when you select a single column. Selecting Rows From a Specific Column Selecting the first three rows of just the payment column simplifies the result into a vector. debt[1:3, 2] 100 200 150 Dataframe Formatting To keep it as a dataframe, just add drop=False as shown below: debt[1:3, 2, drop = FALSE] 1 2 3 payment 100 200 150 Selecting a Specific Column [Shortcut] To select a specific column, you can also type in the name of the dataframe, followed by a $, and then the name of the column you are looking to select. In this example, we will be selecting the payment column of the dataframe. When running this script, R will simplify the result as a vector. debt$payment 100 200 150 50 75 100 Using subset() for More Power When looking to create more complex subsets or a subset based on a condition, the next step up is to use the subset() function. For example, what if you wanted to look at debt from someone named Dan. You could just use the brackets to select their debt and total it up, but it isn't a very robust way of doing things, especially with potential changes to the data set. # This works, but is not informative nor robust debt[1:3, ] subset() Function A better way to do this is to use the subset() function to select the rows where the name column is equal to Dan. Notice that their needs to be a double equals sign, known as a relational operator. # This works, but is not informative nor robust debt[1:3, ] 21 # Much more informative! subset(debt, name == "Dan") name payment 1 Dan 100 2 Dan 200 3 Dan 150 Using a subset() Function on a Numeric Column We can also subset on numeric columns. If we wanted to see rows where payments equal $100, you would do the following: subset(debt, payment == 100) name payment 1 Dan 100 6 Rob 100 Extending Data Frames (Adding Rows and Cols to Data Frame) We can append one or more rows to a data frame in R by using one of the following methods: Method 1: Use rbind() to append data frames. rbind(df1, df2) Method 2: Use nrow() to append a row. df[nrow(df) + 1,] = c(value1, value2, ...) The examples of how to use each of these methods in practice. Method 1: Use rbind() to Append Data Frames This first method assumes that you have two data frames with the same column names. By using the rbind() function, we can easily append the rows of the second data frame to the end of the first data frame. For example: #define data frame df1 <- data.frame(var1=c(4, 13, 7, 8), var2=c(15, 9, 9, 13), var3=c(12, 12, 7, 5)) df1 var1 var2 var3 1 4 15 12 2 13 9 12 3 7 9 7 4 8 13 5 #define second data frame df2 <- data.frame(var1=c(4, 13), var2=c(9, 12), var3=c(6, 6)) df2 var1 var2 var3 1 4 9 6 2 13 12 6 #append the rows of the second data frame to end of first data frame df3 <- rbind(df1, df2) df3 22 var1 var2 var3 1 4 15 12 2 13 9 12 3 7 9 7 4 8 13 5 5 4 9 6 6 13 12 6 Method 2: Use nrow() to Append a Row This method uses the nrow() function to append a row to the end of a given data frame. For example: #define first data frame df1 <- data.frame(var1=c(4, 13, 7, 8), var2=c(15, 9, 9, 13), var3=c(12, 12, 7, 5)) df1 var1 var2 var3 1 4 15 12 2 13 9 12 3 7 9 7 4 8 13 5 #append row to end of data frame df1[nrow(df1) + 1,] = c(5, 5, 3) df1 var1 var2 var3 1 4 15 12 2 13 9 12 3 7 9 7 4 8 13 5 5 5 5 3 In order for this method to work, the vector of values that you’re appending needs to be the same length as the number of columns in the data frame. Sorting Data Frames: The following discusses how to sort the data frame in R Programming Language. Methods to sort a dataframe: 1. order() function (increasing and decreasing order) 2. arrange() function from dplyr package 3. setorder() function from data.table package Method 1: Using order() function This function is used to sort the dataframe based on the particular column in the dataframe Syntax: order(dataframe$column_name,decreasing = TRUE)) where dataframe is the input dataframe Column name is the column in the dataframe such that dataframe is sorted based on this column Decreasing parameter specifies the type of sorting order If it is TRUE dataframe is sorted in descending order. Otherwise, in increasing order return type: Index positions of the elements 23 Example 1: R program to create dataframe with 2 columns and order based on particular columns in decreasing order. Displayed the Sorted dataframe based on subjects in decreasing order, displayed the Sorted dataframe based on rollno in decreasing order # create dataframe with roll no and # subjects columns data = data.frame( rollno = c(1, 5, 4, 2, 3), subjects = c("java", "python", "php", "sql", "c")) print(data) print("sort the data in decreasing order based on subjects ") print(data[order(data$subjects, decreasing = TRUE), ] ) print("sort the data in decreasing order based on rollno ") print(data[order(data$rollno, decreasing = TRUE), ] ) Output: rollno subjects 1 1 java 2 5 python 3 4 php 4 2 sql 5 3 c [1] "sort the data in decreasing order based on subjects " rollno subjects 4 2 sql 2 5 python 3 4 php 1 1 java 5 3 c [1] "sort the data in decreasing order based on rollno " rollno subjects 2 5 python 3 4 php 5 3 c 4 2 sql 1 1 java Example 2: R program to create dataframe with 3 columns named rollno, names, and subjects with a vector, displayed the Sorted dataframe based on subjects in increasing order, displayed the Sorted dataframe based on rollno in increasing order, displayed the Sorted dataframe based on names in increasing order # create dataframe with roll no, names # and subjects columns 24 data=data.frame(rollno = c(1, 5, 4, 2, 3), names = c("sravan", "bobby", "pinkey", "rohith", "gnanesh"), subjects = c("java", "python", "php", "sql", "c")) print(data) print("sort the data in increasing order based on subjects") print(data[order(data$subjects, decreasing = FALSE), ] ) print("sort the data in increasing order based on rollno") print(data[order(data$rollno, decreasing = FALSE), ] ) print("sort the data in increasing order based on names") print(data[order(data$names,decreasing = FALSE), ] ) Output: rollno names subjects 1 1 sravan java 2 5 bobby python 3 4 pinkey 4 2 rohith 5 3 gnanesh php sql c [1] "sort the data in increasing order based on subjects" rollno names subjects 5 3 gnanesh c 1 1 sravan java 3 4 pinkey php 2 5 bobby python 4 2 rohith sql [1] "sort the data in increasing order based on rollno" rollno names subjects 1 1 sravan java 4 2 rohith sql 5 3 gnanesh c 3 4 pinkey php 2 5 bobby python [1] "sort the data in increasing order based on names" rollno names subjects 2 5 bobby python 5 3 gnanesh c 3 4 pinkey php 4 2 rohith sql 1 1 sravan java 25 Method 2: Using arrange() Function from dplyr. Arrange() is used to sort the dataframe in increasing order, it will also sort the dataframe based on the column in the dataframe Syntax: arrange(dataframe,column) where dataframe is the dataframe input column is the column name , based on this column dataframe is sorted We need to install dplyr package as it is available in that package Syntax: install.packages(“dplyr”) Example: R program to sort dataframe based on columns In this program, we created three columns using the vector and sorted the dataframe based on the subjects column Code: # load the package library("dplyr") # create dataframe with roll no, names # and subjects columns data = data.frame(rollno = c(1, 5, 4, 2, 3), names = c("sravan", "bobby", "pinkey", "rohith", "gnanesh"), subjects = c("java", "python", "php", "sql", "c")) # sort the data based on subjects print(arrange(data, subjects)) Output: rollno names subjects 1 3 gnanesh c 2 1 sravan java 3 4 pinkey php 4 5 bobby python 5 2 rohith sql Method 3: Using setorder() from data.table package setorder is used to sort a dataframe in the set order format. Syntax: setorder(dataframe, column) Where dataframe is the input dataframe The column is the column name Example: R program to sort dataframe based on columns In this program, we created the dataframe with three columns using vector and sorted the dataframe using setorder function based on subjects column Code: # load the library library("data.table") # create dataframe with roll no, names # and subjects columns data=data.frame(rollno = c(1, 5, 4, 2, 3), names = c("sravan", "bobby", "pinkey", "rohith", "gnanesh"), subjects = c("java", "python", "php", "sql", "c")) 26 # sort the data based on subjects print(setorder(data,subjects)) Output: rollno names subjects 5 3 gnanesh c 1 1 sravan java 3 4 pinkey php 2 5 bobby python 4 2 rohith sql 7. Lists: Introduction Lists are the R objects which contain elements of different types like − numbers, strings, vectors and another list inside it. A list can also contain a matrix or a function as its elements. List is created using list() function. Creating a List Following is an example to create a list containing strings, numbers, vectors and a logical values. # Create a list containing strings, numbers, vectors and a logical values. list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1) print(list_data) When we execute the above code, it produces the following result − [[1]] [1] "Red" [[2]] [1] "Green" [[3]] [1] 21 32 11 [[4]] [1] TRUE [[5]] [1] 51.23 [[6]] [1] 119.1 Naming List Elements The list elements can be given names and they can be accessed using these names. # Create a list containing a vector, a matrix and a list. list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3)) # Give names to the elements in the list. names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list") # Show the list. print(list_data) 27 When we execute the above code, it produces the following result − $`1st_Quarter` [1] "Jan" "Feb" "Mar" $A_Matrix [,1] [,2] [,3] [1,] 3 5 -2 [2,] 9 1 8 $A_Inner_list $A_Inner_list[[1]] [1] "green" $A_Inner_list[[2]] [1] 12.3 Accessing List Elements Elements of the list can be accessed by the index of the element in the list. In case of named lists it can also be accessed using the names. We continue to use the list in the above example − # Create a list containing a vector, a matrix and a list. list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3)) # Give names to the elements in the list. names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list") # Access the first element of the list. print(list_data[1]) # Access the thrid element. As it is also a list, all its elements will be printed. print(list_data[3]) # Access the list element using the name of the element. print(list_data$A_Matrix) When we execute the above code, it produces the following result − $`1st_Quarter` [1] "Jan" "Feb" "Mar" $A_Inner_list $A_Inner_list[[1]] [1] "green" $A_Inner_list[[2]] [1] 12.3 [,1] [,2] [,3] [1,] 3 5 -2 [2,] 9 1 8 Manipulating List Elements We can add, delete and update list elements as shown below. We can add and delete elements only at the end of a list. But we can update any element. # Create a list containing a vector, a matrix and a list. 28 list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3)) # Give names to the elements in the list. names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list") # Add element at the end of the list. list_data[4] <- "New element" print(list_data[4]) # Remove the last element. list_data[4] <- NULL # Print the 4th Element. print(list_data[4]) # Update the 3rd Element. list_data[3] <- "updated element" print(list_data[3]) When we execute the above code, it produces the following result − [[1]] [1] "New element" $<NA> NULL $`A Inner list` [1] "updated element" Merging Lists You can merge many lists into one list by placing all the lists inside one list() function. # Create two lists. list1 <- list(1,2,3) list2 <- list("Sun","Mon","Tue") # Merge the two lists. merged.list <- c(list1,list2) # Print the merged list. print(merged.list) When we execute the above code, it produces the following result − [[1]] [1] 1 [[2]] [1] 2 [[3]] [1] 3 [[4]] [1] "Sun" [[5]] 29 [1] "Mon" [[6]] [1] "Tue" Converting List to Vector A list can be converted to a vector so that the elements of the vector can be used for further manipulation. All the arithmetic operations on vectors can be applied after the list is converted into vectors. To do this conversion, we use the unlist() function. It takes the list as input and produces a vector. # Create lists. list1 <- list(1:5) print(list1) list2 <-list(10:14) print(list2) # Convert the lists to vectors. v1 <- unlist(list1) v2 <- unlist(list2) print(v1) print(v2) # Now add the vectors result <- v1+v2 print(result) When we execute the above code, it produces the following result − [[1]] [1] 1 2 3 4 5 [[1]] [1] 10 11 12 13 14 [1] 1 2 3 4 5 [1] 10 11 12 13 14 [1] 11 13 15 17 19 UNIT-III QUESTIONS 1. What is a Vector? Express creating, naming vectors, vector arithmetic and vector subsetting with example 2. What is a matrix? Discuss Creating and Naming Matrices, Matrix Sub setting with example. 3. What is a Factor? Explain Factor Levels, Summarizing a Factor, Ordered Factors, Comparing Ordered Factors with examples 4. Creating Data Frame, Subsetting of Data Frames, Extending Data Frames and Sorting Data Frames with examples. 5. Creating List and Named List, Accessing List Elements, Manipulating List Elements, Merging Lists, Converting Lists to Vectors with examples. 6. Discuss about Creating Arrays, accessing elements, checking if an element exists, Dimension, length and loop through an array with examples. 7. Describe the Class in R Language. 30