Amity University Madhya Pradesh, Gwalior
Amity School of Engineering and Technology
Department of Computer Science and Engineering

CSE 724 Data Analytics Lab

Submitted to:
Dr. Ghanshyam Prasad Dubey
Associate Professor, CSE

Submitted by:
Roshni Singh
Enroll-No: A60205320008
B-Tech IT [C], ASET, VII Semester

Table of Contents

1. Introduction of R along with fundamentals, functionalities, features, advantages and disadvantages.
2. Program to make a simple calculator that can add, subtract, multiply and divide using functions.
3. Write a program for descriptive statistics in R using the Iris dataset.
4. Write a program for data visualization (plots, graphs and charts) using the Iris dataset.
5. Write a program for data analytics operations and data visualization using the mtcars dataset.
6. Write a program for reading and writing datasets such as Excel and CSV files.
7. Write a program for visualizations: plot a histogram, a bar chart and a pie chart on sample data.
8. Write a program for exploratory data analysis using the Iris and mtcars datasets.
9. Write a program to find the correlation and covariance on the Iris dataset and plot the correlation plot.
10. Write a program to apply regression model techniques to predict data on a dataset.
11. Write a program to apply K-means clustering on a dataset.
12. Write a program for a classification model and evaluate the performance of the classifier.

CSE 724 – Data Analytics Lab: Lab File

Lab 1
Aim: Introduction of R along with its fundamentals, functionalities, features, advantages and disadvantages.
Theory: R is a popular programming language and environment designed specifically for statistical computing and graphics.
The sections below break down its introduction, fundamentals, functionalities, features, advantages and disadvantages.

Introduction to R:
Purpose: R was created by statisticians for data analysis and statistical modeling, and it is now also widely used for visualization and machine learning.
Open Source: R is free and open source, maintained by the R Core Team and extended by a large community of developers and statisticians.
Cross-platform: R runs on Windows, macOS and Linux.

Fundamentals of R:
Data Structures: R provides versatile data structures such as vectors, matrices, arrays, data frames and lists.
Functions: Functions are central to R; users can also write their own functions for specific tasks.
Packages: A large repository of packages extends R's functionality for specialized tasks.

Functionalities of R:
Statistical Analysis: R excels at statistical analysis, providing a wide range of functions and packages for descriptive statistics, inferential statistics, regression and more.
Data Visualization: Besides base graphics, packages such as ggplot2 allow detailed, customizable plots and graphs.
Machine Learning: Packages such as caret, randomForest and xgboost implement machine learning algorithms.

Features of R:
Rich Set of Packages: A vast collection of packages is available on CRAN (Comprehensive R Archive Network) and other repositories.
Graphics Capabilities: High-quality graphics for data visualization and exploration.
Community Support: An active and supportive community provides help, packages and learning resources.

Advantages of R:
Statistical Analysis: Designed for statistics, which makes it powerful for data manipulation and modeling.
Graphics and Visualization: Offers sophisticated and customizable visualization capabilities.
Community and Documentation: Strong community support and extensive documentation are available to users.

Disadvantages of R:
Learning Curve: The learning curve is steep, especially for beginners without a programming background.
Speed and Memory Usage: Interpreted R code can be slower than compiled languages for loop-heavy tasks, and because R holds data in memory, very large datasets can exhaust the available RAM.
Package Quality: The quality and documentation of packages vary from one package to another.

Despite these drawbacks, R remains a prominent language in the data science community thanks to its extensive statistical capabilities, its visualization tools, and an active user base that contributes to its development and improvement.

Result: The fundamentals of R have been studied thoroughly.

Lab 2
Aim: Write a program to make a simple calculator that can add, subtract, multiply and divide using functions.
Code:
# Function to add two numbers
add <- function(a, b) {
  return(a + b)
}

# Function to subtract two numbers
subtract <- function(a, b) {
  return(a - b)
}

# Function to multiply two numbers
multiply <- function(a, b) {
  return(a * b)
}

# Function to divide two numbers; guards against division by zero
divide <- function(a, b) {
  if (b != 0) {
    return(a / b)
  } else {
    return("Cannot divide by zero")
  }
}

# Function to perform calculations based on user input
# Note: readline() prompts only in an interactive session (e.g. the RStudio console)
calculate <- function() {
  num1 <- as.numeric(readline("Enter first number: "))
  num2 <- as.numeric(readline("Enter second number: "))
  # as.numeric() returns NA when the input is not a number
  if (is.na(num1) || is.na(num2)) {
    cat("Invalid number entered\n")
    return(invisible(NULL))
  }
  operation <- readline("Enter operation (+, -, *, /): ")
  result <- if (operation == "+") {
    add(num1, num2)
  } else if (operation == "-") {
    subtract(num1, num2)
  } else if (operation == "*") {
    multiply(num1, num2)
  } else if (operation == "/") {
    divide(num1, num2)
  } else {
    "Invalid operation"
  }
  cat("Result:", result, "\n")
}

# Call the calculate function to start the calculator
calculate()
Output:

Lab 3
Aim: Write a program for descriptive statistics in R using the Iris dataset.
Code:
# Load the Iris dataset
data(iris)
# Display the structure of the Iris dataset
str(iris)
# Summary statistics for each numeric variable in the Iris dataset
summary(iris[, 1:4])
# Mean of each numeric variable by species
aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)
# Median of each numeric variable by species
aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = median)
# Standard deviation of each numeric variable by species
aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = sd)
Output:

Lab 4
Aim: Write a program for data visualization (plots, graphs and charts) using the Iris dataset.
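The scatter plot in the program below colors points by species but draws no legend, so the colors cannot be matched to species names. A minimal sketch, assuming base graphics and writing to a PDF in a temporary directory (the file name iris_scatter.pdf is hypothetical), that adds one:

```r
# Load the built-in iris data
data(iris)

# Hypothetical output file, written to a temporary directory
out_file <- file.path(tempdir(), "iris_scatter.pdf")
pdf(out_file, width = 7, height = 5)

species <- levels(iris$Species)
# as.integer() maps setosa/versicolor/virginica to colors 1/2/3
plot(iris$Sepal.Length, iris$Sepal.Width,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Sepal Length", ylab = "Sepal Width",
     main = "Sepal Length vs Sepal Width by Species")
# The legend uses the same 1..3 color codes as the points
legend("topright", legend = species, col = seq_along(species), pch = 19)
invisible(dev.off())
```

Dropping the pdf() and dev.off() calls draws the same plot in the RStudio plot pane instead.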
Code:
# Load the Iris dataset
data(iris)

# Scatter plot of Sepal Length vs Sepal Width, colored by Species
plot(iris$Sepal.Length, iris$Sepal.Width, col = as.numeric(iris$Species), xlab = "Sepal Length", ylab = "Sepal Width", main = "Sepal Length vs Sepal Width by Species")
Output:

# Boxplot of Petal Length by Species
boxplot(Petal.Length ~ Species, data = iris, xlab = "Species", ylab = "Petal Length", main = "Petal Length by Species")
Output:

# Histogram of Sepal Length
hist(iris$Sepal.Length, breaks = 15, col = "skyblue", xlab = "Sepal Length", ylab = "Frequency", main = "Histogram of Sepal Length")
Output:

# Density plot of Petal Width by Species
install.packages("ggplot2")  # needed only once per R installation
# Offline alternative: install.packages("path_to_downloaded_package/ggplot2_3.4.4.zip", repos = NULL)
library(ggplot2)
ggplot(iris, aes(x = Petal.Width, fill = Species)) +
  geom_density(alpha = 0.6) +
  labs(x = "Petal Width", y = "Density", title = "Density Plot of Petal Width by Species")
Output:

# Bar chart of Species counts
counts <- table(iris$Species)
barplot(counts, main = "Counts of Each Species", xlab = "Species", ylab = "Count", col = "lightblue")
Output:

Lab 5
Aim: Write a program for data analytics operations and data visualization using the mtcars dataset.
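As a numeric companion to the boxplot of mpg by number of cylinders in the program below, a short sketch of two further analytics operations using only base R: group means with aggregate() and group sizes with table().

```r
data(mtcars)

# Mean miles per gallon for each cylinder count (4, 6, 8)
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Number of cars in each cylinder group
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```

The group means fall as the number of cylinders rises, which is the same pattern the boxplot shows visually.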
Code:
# Load the mtcars dataset
data(mtcars)
# Display the structure of the mtcars dataset
str(mtcars)
# Summary statistics for the numeric variables in mtcars
summary(mtcars)
# Correlation matrix for the numeric variables in mtcars
cor(mtcars)
Output:

# Boxplot of mpg by number of cylinders
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders", ylab = "Miles per Gallon", main = "Boxplot of MPG by Number of Cylinders")
Output:

# Scatterplot of horsepower vs. weight, colored by number of cylinders
plot(mtcars$wt, mtcars$hp, xlab = "Weight (1000 lbs)", ylab = "Horsepower", main = "Scatterplot of Horsepower vs. Weight", col = ifelse(mtcars$cyl == 4, "red", ifelse(mtcars$cyl == 6, "blue", "green")), pch = 19)
legend("topleft", legend = c("4 cyl", "6 cyl", "8 cyl"), col = c("red", "blue", "green"), pch = 19)
Output:

# Bar chart of count of cars by number of gears
barplot(table(mtcars$gear), main = "Count of Cars by Number of Gears", xlab = "Number of Gears", ylab = "Count", col = "skyblue")
Output:

Lab 6
Aim: Write a program for reading and writing datasets such as Excel and CSV files.
Code:
# Read a CSV file into a data frame
# Replace this path with the location of your own CSV file
data <- read.csv("C:/Users/chara/Documents/Datasets/Salary_Data.csv")
# Display the structure of the data
str(data)
# Perform operations or analysis on the data, for example:
summary(data)
# Write the data frame to a new CSV file; row.names = FALSE avoids an extra index column
write.csv(data, "C:/Users/chara/Documents/Datasets/Housing.csv", row.names = FALSE)
# Read the written file back to confirm it was saved correctly
saved <- read.csv("C:/Users/chara/Documents/Datasets/Housing.csv")
str(saved)
# Excel files need an add-on package such as readxl (placeholder file name):
# library(readxl)
# excel_data <- read_excel("your_file.xlsx")
Output:

Lab 7
Aim: Write a program for visualizations: plot a histogram, a bar chart and a pie chart on sample data.
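Bar and pie charts are normally drawn from category counts rather than from raw per-row values. A minimal sketch, assuming mtcars as the sample data and a hypothetical temporary PDF file for the output, that tabulates the cylinder counts once and reuses them for both charts, with percentage labels on the pie slices:

```r
df <- mtcars

# Frequency of each cylinder count (4, 6, 8)
cyl_counts <- table(df$cyl)
# Share of cars in each group, in percent, rounded to one decimal place
cyl_pct <- round(100 * prop.table(cyl_counts), 1)
pie_labels <- paste0(names(cyl_counts), " cyl (", cyl_pct, "%)")

# Hypothetical output file in a temporary directory
out_file <- file.path(tempdir(), "cyl_charts.pdf")
pdf(out_file)
barplot(cyl_counts, xlab = "Number of cylinders", ylab = "Frequency",
        main = "Bar Chart", col = "blue")
pie(cyl_counts, labels = pie_labels, main = "Cars by Number of Cylinders",
    col = c("brown", "yellow", "blue"))
invisible(dev.off())

print(pie_labels)
```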
Code (sample dataset: mtcars):
df <- mtcars
# Histogram of miles per gallon
hist(df$mpg, col = "steelblue", main = "Histogram", xlab = "mpg", ylab = "Frequency")
Output:

# Bar chart of how many cars have 4, 6 or 8 cylinders
barplot(table(df$cyl), xlab = "cyl", ylab = "Frequency", main = "Bar Chart", col = "blue")
Output:

# Pie chart of the same cylinder counts, one labeled slice per group
pie(table(df$cyl), labels = paste(names(table(df$cyl)), "cyl"), main = "Pie Chart", col = c("brown", "yellow", "blue"), radius = 1)
Output:

Lab 8
Aim: Write a program for exploratory data analysis using the Iris and mtcars datasets.
Code (Iris dataset):
library(ggplot2)
library(dplyr)
data(iris)
str(iris)
Output:
summary(iris)
# Scatterplot matrix of the four measurements, colored by species
pairs(iris[, 1:4], col = iris$Species, pch = 19)
Output:
boxplot(Sepal.Length ~ Species, data = iris)
Output:
boxplot(Sepal.Width ~ Species, data = iris)
Output:

Code (mtcars dataset):
library(ggplot2)
library(dplyr)
data(mtcars)
str(mtcars)
summary(mtcars)
Output:
pairs(mtcars)
Output:

Lab 9
Aim: Write a program to find the correlation and covariance on the Iris dataset and plot the correlation plot.
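Correlation is covariance rescaled by the two standard deviations, r = cov(x, y) / (sd(x) * sd(y)), so the two matrices computed in the program below carry the same information on different scales. A short sketch checking this on the four Iris measurements with base R's cov2cor():

```r
data(iris)
num <- iris[, 1:4]

cov_matrix <- cov(num)
# cov2cor() divides each covariance by the product of the two standard deviations
cor_from_cov <- cov2cor(cov_matrix)
print(all.equal(cor_from_cov, cor(num)))

# The same identity for a single pair of variables
r_manual <- cov(num$Petal.Length, num$Petal.Width) /
  (sd(num$Petal.Length) * sd(num$Petal.Width))
print(r_manual)
```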
Code:
data(iris)
str(iris)
# Correlation and covariance matrices of the four numeric columns
cor_matrix <- cor(iris[, 1:4])
cov_matrix <- cov(iris[, 1:4])
print(cor_matrix)
print(cov_matrix)
Output:

# Correlation plot
install.packages("corrplot")  # needed only once
library(corrplot)
cor_matrix <- cor(iris[, 1:4])
corrplot(cor_matrix, method = "color", tl.col = "black", tl.srt = 45)
Output:

Lab 10
Aim: Write a program to apply regression model techniques to predict the data on a dataset.
Code:
# lm() comes from the stats package, which R loads by default
library(stats)
# Load your dataset or use an existing one; this example uses mtcars
data(mtcars)
# Explore the dataset
head(mtcars)
# Split the dataset into training (80%) and testing (20%) sets
set.seed(123)
split_index <- sample(1:nrow(mtcars), floor(0.8 * nrow(mtcars)))
train_data <- mtcars[split_index, ]
test_data <- mtcars[-split_index, ]
# Build a simple linear regression model: mpg as a function of horsepower
model <- lm(mpg ~ hp, data = train_data)
summary(model)
# Evaluate the model on the held-out test set with the root mean squared error
test_pred <- predict(model, newdata = test_data)
rmse <- sqrt(mean((test_data$mpg - test_pred)^2))
cat("Test RMSE:", rmse, "\n")
# Make predictions on new data (example horsepower values)
new_data <- data.frame(hp = c(100, 150, 200))
predictions <- predict(model, newdata = new_data)
# Display the predictions
cat("Predictions:", predictions, "\n")
Output:

Lab 11
Aim: Write a program to apply K-means clustering on a dataset.
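The program below uses three clusters, one per Iris species. A common way to sanity-check the number of clusters is the elbow method: run kmeans() for several values of k and compare the total within-cluster sum of squares, looking for the point where adding clusters stops helping much. A sketch, with the range 1 to 6 chosen arbitrarily:

```r
data(iris)
iris_num <- iris[, 1:4]  # numeric measurements only; Species is the label

set.seed(240)
ks <- 1:6
# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(ks, function(k) {
  kmeans(iris_num, centers = k, nstart = 20)$tot.withinss
})
print(data.frame(k = ks, wss = round(wss, 2)))
```

Plotting the result with plot(ks, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS") makes the bend easier to see.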
Code:
install.packages("ClusterR")
install.packages("cluster")
library(ClusterR)
library(cluster)  # provides clusplot()
# Keep the four numeric measurements; column 5 (Species) is the label
iris_1 <- iris[, -5]
set.seed(240)
# K-means with 3 clusters and 20 random starts; kmeans() is part of base R (stats)
kmeans.re <- kmeans(iris_1, centers = 3, nstart = 20)
kmeans.re
Output:

# Cluster assigned to each flower
kmeans.re$cluster
# Confusion matrix: actual species vs. assigned cluster
cm <- table(iris$Species, kmeans.re$cluster)
cm
Output:

plot(iris_1[c("Sepal.Length", "Sepal.Width")])
Output:

plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster)
Output:

plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster, main = "K-means with 3 clusters")
Output:

# Cluster centers (all four variables, then only the two sepal variables)
kmeans.re$centers
kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")]
Output:

# Add the centers to the last scatter plot
points(kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 3)
Output:

# Visualize the clusters; y_kmeans holds the cluster labels
y_kmeans <- kmeans.re$cluster
clusplot(iris_1[, c("Sepal.Length", "Sepal.Width")], y_kmeans, lines = 0, shade = TRUE, color = TRUE, labels = 2, plotchar = FALSE, span = TRUE, main = paste("Cluster iris"), xlab = "Sepal.Length", ylab = "Sepal.Width")
Output:

Lab 12
Aim: Write a program for a classification model and evaluate the performance of the classifier.
Code (decision tree):
library(datasets)
library(caTools)  # sample.split()
library(party)    # ctree() and the readingSkills data
library(dplyr)
library(magrittr)
data("readingSkills", package = "party")
head(party::readingSkills)
Output:

# Split 80% / 20%, preserving the ratio of nativeSpeaker labels in both parts
set.seed(123)
sample_data <- sample.split(readingSkills$nativeSpeaker, SplitRatio = 0.8)
sample_data
train_data <- subset(readingSkills, sample_data == TRUE)
train_data
Output:

test_data <- subset(readingSkills, sample_data == FALSE)
test_data
Output:

# Conditional inference tree predicting nativeSpeaker from all other columns
model <- ctree(nativeSpeaker ~ ., train_data)
plot(model)
Output:

predict_model <- predict(model, test_data)
# Confusion matrix: rows = actual class, columns = predicted class
mat <- table(test_data$nativeSpeaker, predict_model)
mat
ac_test <- sum(diag(mat)) / sum(mat)
ac_test
print(paste("Accuracy for test is found to be", ac_test))
Output:
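Accuracy alone can hide poor performance on one class. A hedged sketch of a hypothetical helper, class_metrics(), that computes per-class precision, recall and F1 from a confusion matrix laid out like mat in the program above (rows = actual, columns = predicted). The 2 x 2 counts in the usage example are illustrative toy numbers, not results from this lab.

```r
# Hypothetical helper: per-class precision, recall and F1 from a confusion
# matrix whose rows are actual classes and columns are predicted classes
class_metrics <- function(mat) {
  tp <- diag(mat)                  # correctly classified count for each class
  precision <- tp / colSums(mat)   # share of predictions for a class that are right
  recall <- tp / rowSums(mat)      # share of actual members of a class that are found
  f1 <- 2 * precision * recall / (precision + recall)
  data.frame(precision = precision, recall = recall, f1 = f1)
}

# Illustrative toy counts only (not results from this lab)
toy <- matrix(c(8, 2, 1, 9), nrow = 2,
              dimnames = list(actual = c("no", "yes"),
                              predicted = c("no", "yes")))
metrics <- class_metrics(toy)
print(metrics)
cat("Accuracy:", sum(diag(toy)) / sum(toy), "\n")
```

Applied to the lab's own table, class_metrics(mat) would report these figures for the "no" and "yes" classes of nativeSpeaker.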