INTRODUCTION TO DATA SCIENCE WITH R INDEX Sno 1 2 3 4 5 6 7 8 Date Program 24/08/2021 Basic programs of R and Sample Programs on Arithmetic Operations on vectors-I 1. Addition 2. Subtraction 3. Multiplication 4. Division 5. Modulous 31/08/2021 Basic programs of R and Sample Programs on Arithmetic Operations on vectors-II 1. Floor Division 2. Exponent 07/09/2021 Operations on Matrix 1. Matrix Addition 2. Matrix Subtraction 3. Matrix Multiplication 4. Matrix Division 14/09/2021 Operations on Matrix 1. Row Concatenation 2. Column Concatenation 2. Deleting a Column 3. Updating a Row 21/09/2021 Graphs 1. Line Graph 2. Bell Curve 28/09/2021 Graphs 1. Bar plot 2. Pie Chart 29/09/2021 Loops in R 1. repeat 2. while 3. for 05/10/2021 Functions in R Page Number 1 2 3-5 6-8 9-11 12-13 14-16 17 Program 1: Basic Operations of R and sample programs on Arithmetic Operations Program: var1=c(4,5) var2=c(2,4) print("addition of two vectors") print(var1+var2) print("subtraction of two vectors") print(var1-var2) print("multiplication of two vectors") print(var1*var2) print("division of two vectors") print(var1/var2) print("modulous of two vectors") print(var1%%var2) Output: [1] "addition of two vectors" [1] 6 9 [1] "subtraction of two vectors" [1] 2 1 [1] "multiplication of two vectors" [1] 8 20 [1] "division of two vectors" [1] 2.00 1.25 [1] "modulous of two vectors" [1] 0 1 1|Page Program 2: Basic programs of R and Sample Programs on Arithmetic Operations on vectors-II 1. Floor Division 2. Exponent Program: print("Floor Division of two vectors") print(var1%/%var2) print("Exponent") print(var1^var2) Output [1] "Floor Division of two vectors" [1] 2 1 [1] "Exponent" [1] 16 625 2|Page Program 3 Operation on Matrix: Addition Program: A=matrix(c(3,5,4,6,7,8,9,3,5),nrow=3, ncol=3, byrow=TRUE) print("First 3x3 matrix") print(A) B=matrix(c(5,6,7,2,4,3,6,9,7),nrow=3, ncol=3, byrow=TRUE) print("Second 3x3 matrix") print(B) print("result of addition of two matrices") print(A+B) Output: [1] "3x3 matrix" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 [1] "3x3 matrix" [,1] [,2] [,3] [1,] 5 6 7 [2,] 2 4 3 [3,] 6 9 7 [1] "result of addition" [,1] [,2] [,3] [1,] 8 11 11 [2,] 8 11 11 [3,] 15 12 12 3|Page Subtraction of Two matrices print("result of subtraction of two matrices") print(A-B) Output: [1] "First 3x3 matrix" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 [1] "Second 3x3 matrix" [,1] [,2] [,3] [1,] 5 6 7 [2,] 2 4 3 6 9 7 [3,] [1] "result of subtraction of two matrices" [,1] [,2] [,3] [1,] -2 -1 -3 [2,] 4 3 5 [3,] 3 -6 -2 Multiplication of Two matrices print("result of multiplication of two matrices") print(A*B) Output: [1] "First 3x3 matrix" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 4|Page [1] "Second 3x3 matrix" [,1] [,2] [,3] [1,] 5 6 7 [2,] 2 4 3 [3,] 6 9 7 [1] "result of multiplication of two matrices" [,1] [,2] [,3] [1,] 15 30 28 [2,] 12 28 24 [3,] 54 27 35 Division of Two matrices print("result of division of two matrices") print(A/B) Output: [1] "First 3x3 matrix" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 [1] "Second 3x3 matrix" [,1] [,2] [,3] [1,] 5 6 7 [2,] 2 4 3 [3,] 6 9 7 [1] "result of division of two matrices" [,1] [,2] [,3] [1,] 0.6 0.8333333 0.5714286 [2,] 3.0 1.7500000 2.6666667 [3,] 1.5 0.3333333 0.7142857 5|Page Program 4: Row Concatenation of two matrices print("result of row concatenation of two matrices") print(rbind(A,B)) Output: [1] "First 3x3 matrix" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 [1] "Second 3x3 matrix" [,1] [,2] [,3] [1,] 5 6 7 [2,] 2 4 3 [3,] 6 9 7 [1] "result of row concatenation of two matrices" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 [4,] 5 6 7 [5,] 2 4 3 [6,] 6 9 7 6|Page Deleting Second Column A=matrix(c(3,5,4,6,7,8,9,3,5),nrow=3, ncol=3, byrow=TRUE) print("Before deleting Second column") print(A) A=A[,-2] print("After deleting Second column") print(A) Output: [1] "Before deleting Second column" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 [1] "After deleting Second column" [,1] [,2] [1,] 3 4 [2,] 6 8 [3,] 9 5 Deleting Second Row A=matrix(c(3,5,4,6,7,8,9,3,5),nrow=3, ncol=3, byrow=TRUE) print("Before deleting Second Row") print(A) A=A[-2,] print("After deleting Second Row") print(A) Output: [1] "Before deleting Second Row" [,1] [,2] [,3] [1,] 3 7|Page 5 4 [2,] 6 7 8 [3,] 9 3 5 [1] "After deleting Second Row" [,1] [,2] [,3] [1,] 3 5 4 [2,] 9 3 5 Updating Second Row A=matrix(c(3,5,4,6,7,8,9,3,5),nrow=3, ncol=3, byrow=TRUE) print("Before updating Second Row") print(A) A[2,]=c(11,12,15) print("After updating Second Row") print(A) Output: [1] "Before updating Second Row" [,1] [,2] [,3] [1,] 3 5 4 [2,] 6 7 8 [3,] 9 3 5 [1] "After updating Second Row" [,1] [,2] [,3] [1,] 3 5 4 [2,] 11 12 15 [3,] 9 8|Page 3 5 Program 5: Line Graph A line chart is a graph that connects a series of points by drawing line segments between them. These points are ordered in one of their coordinate (usually the x-coordinate) value. Line charts are usually used in identifying the trends in data. The plot() function in R is used to create the line graph. Syntax The basic syntax to create a line chart in R is – plot(v,type,col,xlab,ylab) Following is the description of the parameters used − • v is a vector containing the numeric values. • type takes the value "p" to draw only the points, "l" to draw only the lines and "o" to draw both points and lines. • xlab is the label for x axis. • ylab is the label for y axis. • main is the Title of the chart. • col is used to give colors to both the points and lines. Program: marks=c(15,22,35,55,45,65) plot(marks, type="l", col="Blue") Output: Program2: 9|Page marks=c(15,22,35,55,45,65) plot(marks,type = "o", col = "red", xlab = "Roll Number", ylab = "Marks in Statistics", main = "Marks Obtained by Data Science Students") Output 10 | P a g e Bell Curves 1. First we generate normal distributed data using rnorm function Syntax of rnorm function in R: rnorm(n, mean, sd) n: It is the number of observations(sample size). mean: It is the mean value of the sample data. Its default value is zero. sd: It is the standard deviation. Its default value is 1. Program n=floor(rnorm(10000,500,100)) t=table(n) plot(t) Output: 11 | P a g e Program 6: Bar Chart Program: marks=c(92,50,45,73) barplot(marks, main="Comparing marks of 5 subjects", xlab="marks", ylab="subjects", names.arg = c("eng","comp","math"," r program"), col="blue",horiz=FALSE) Output: 12 | P a g e Pie Chart Syntax: The basic syntax for creating a pie-chart using the R is − pie(x, labels, radius, main, col, clockwise) Following is the description of the parameters used − x is a vector containing the numeric values used in the pie chart. labels is used to give description to the slices. radius indicates the radius of the circle of the pie chart.(value between −1 and +1). main indicates the title of the chart. col indicates the color palette. clockwise is a logical value indicating if the slices are drawn clockwise or anti clockwise. Program vtr=c(43,31,64,40,29) names=c("london","paris","los angeles", "mexico","new york") pie(vtr,labels=names,main="Favourable Places", col= rainbow(length(vtr))) Output: 13 | P a g e Program 7: There are three types of loop in R programming: 1. for 2. while 3. repeat 1. for loop Syntax: for (value in sequence) { statement } Flow Chart: Program to print first five natural numbers: for (i in 1:5) { # statement print(i) } Output: [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 Program to display days of week using for loop week = c('Sunday', 'Monday','Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday') for (day in week) 14 | P a g e { print(day) } Output: [1] "Sunday" [1] "Monday" [1] "Tuesday" [1] "Wednesday" [1] "Thursday" [1] "Friday" [1] "Saturday" While loop Syntax: while( condition) { statement } Flow chart: Program to calculate factorial of 5. n=5 factorial = 1 i=1 while (i <= n) { factorial = factorial * i i=i+1 } print(factorial) Output: [1] 120 15 | P a g e Repeat Loop Repeat loop does not have any condition to terminate the loop, a programmer must specifically place a condition within the loop’s body and use the declaration of a break statement to terminate this loop. If no condition is present in the body of the repeat loop then it will iterate infinitely. Syntax: repeat { statement if( condition ) { break } } Flow chart: Program: Program to display numbers from 1 to 5 using repeat loop in R. val = 1 repeat { print(val) val = val + 1 if(val > 5) { break } } Output: [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 16 | P a g e Program 8: Functions in R Function Components The different parts of a function are − 1. Function Name − This is the actual name of the function. It is stored in R environment as an object with this name. 2. Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values. 3. Function Body − The function body contains a collection of statements that defines what the function does. 4. Return Value − The return value of a function is the last expression in the function body to be evaluated. R has many in-built functions which can be directly called in the program without defining them first. We can also create and use our own functions referred as user defined functions. Built-in Function Simple examples of in-built functions are seq(), mean(), max(), sum(x) and paste(...) etc. They are directly called by user written programs. You can refer most widely used R functions. # Create a sequence of numbers from 32 to 44. print(seq(32,44)) # Find mean of numbers from 25 to 82. print(mean(25:82)) # Find sum of numbers frm 41 to 68. print(sum(41:68)) User-defined Function We can create user-defined functions in R. They are specific to what a user wants and once created they can be used like the built-in functions. Below is an example of how a function is created and used. Program to create a function to print squares of first n natural numbers new = function(n) Output: { for(i in 1:n) [1] 1 { b =i^2 [1] 4 print(b) [1] 9 } } [1] 16 new(5) [1] 25 17 | P a g e