Uploaded by Roshni Singh

Data Analytics Lab File: R Programming Experiments

Amity University Madhya Pradesh, Gwalior
Amity School of Engineering and Technology
Department of Computer Science and Engineering
CSE 724
Data Analytics Lab
Submitted to: Dr. Ghanshyam Prasad Dubey, Associate Professor, CSE
Submitted by: Roshni Singh, Enroll-No: A60205320008, B-Tech IT [C], ASET, VII - Semester
Table of Contents
S. No.  Name of Lab Experiment
1.  Introduction of R along with fundamentals, functionalities, features, advantages and disadvantages.
2.  Program to make a simple calculator that can add, subtract, multiply and divide using functions.
3.  Write a program for Descriptive statistics in R using Iris dataset.
4.  Write a program for data visualization (plots, graphs and charts) using Iris dataset.
5.  Write a program for Data analytics operation and data visualization using Mtcars dataset.
6.  Write a program for reading and writing datasets like excel, csv.
7.  Write a program for Visualizations: plot the histogram, bar chart and pie chart on sample data.
8.  Write a program for Exploratory Data Analysis using Iris and Mtcars datasets.
9.  Write a program to find Correlation and Covariance on Iris dataset, plot the correlation plot on dataset.
10. Write a program to apply regression model techniques to predict the data on a dataset.
11. Write a program to apply K-Means clustering on dataset.
12. Write a program for classification model and evaluate the performance of the classifier.
CSE 724 – Data Analytics Lab
Lab File
Lab- 1
Aim: Introduction of R along with fundamentals, functionalities, features, advantages
and disadvantages.
Theory:
R is a popular programming language and environment designed specifically for statistical
computing and graphics. The sections below cover its introduction, fundamentals,
functionalities, features, advantages, and disadvantages.
Introduction to R:
Purpose: R was developed by statisticians to perform data analysis, statistical modeling,
visualization, and machine learning.
Open Source: It's open-source and freely available, maintained by a vibrant community of
developers and statisticians.
Cross-platform: R runs on various platforms, including Windows, macOS, and Linux.
Fundamentals of R:
Data Structures: R has versatile data structures like vectors, matrices, arrays, data frames, and
lists.
Functions: Functions are fundamental in R; users can define their own functions for specific tasks.
Packages: R has a vast repository of packages for specific tasks, extending its functionality.
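The data structures and user-defined functions listed above can be sketched in a few lines of base R. This is a minimal illustration only; the variable names (v, m, a, df, l, square) are chosen for the example and do not come from the lab file.

```r
# Vector: ordered values of a single type
v <- c(2, 4, 6)

# Matrix: two-dimensional structure holding a single type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: generalisation of a matrix to more than two dimensions
a <- array(1:24, dim = c(2, 3, 4))

# Data frame: table whose columns may have different types
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# List: ordered collection of arbitrary objects
l <- list(numbers = v, table = df)

# User-defined function (illustrative name)
square <- function(x) x^2
squares <- square(v)
```

Packages are added with install.packages("name") once and loaded with library(name) in each session.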
Functionalities of R:
Statistical Analysis: R excels in statistical analysis, providing a wide range of functions and
packages for descriptive statistics, inferential statistics, regression, etc.
Data Visualization: It offers powerful visualization tools through packages like ggplot2 for
creating intricate and customizable plots and graphs.
Machine Learning: R provides numerous packages like caret, randomForest, and xgboost for
implementing machine learning algorithms.
Features of R:
Rich Set of Packages: A vast collection of packages available on CRAN (Comprehensive R
Archive Network) and other repositories.
Graphics Capabilities: High-quality graphical capabilities for data visualization and
exploration.
Community Support: Active and supportive community, providing help, packages, and
resources.
Advantages of R:
Statistical Analysis: Tailored for statistical analysis, making it powerful for data manipulation
and modeling.
Graphics and Visualization: Offers sophisticated and customizable visualization capabilities.
Community and Documentation: Strong community support and extensive documentation
available for users.
Disadvantages of R:
Learning Curve: Steeper learning curve, especially for beginners without a programming
background.
Speed and Memory Usage: Execution speed can be slower compared to some other languages
for certain tasks, and it might consume more memory.
Package Quality: Quality and documentation of some packages may vary.
Despite its drawbacks, R remains a prominent language in the data science community due to
its extensive statistical capabilities, visualization tools, and active user base contributing to its
development and improvement.
Result: The fundamentals of R have been thoroughly studied.
Lab-2
Aim: Program to make a simple calculator that can add, subtract, multiply and divide
using functions
Code:
# Function to add two numbers
add <- function(a, b) {
  return(a + b)
}
# Function to subtract two numbers
subtract <- function(a, b) {
  return(a - b)
}
# Function to multiply two numbers
multiply <- function(a, b) {
  return(a * b)
}
# Function to divide two numbers
divide <- function(a, b) {
  if (b != 0) {
    return(a / b)
  } else {
    return("Cannot divide by zero")
  }
}
# Function to perform calculations based on user input
calculate <- function() {
  num1 <- as.numeric(readline("Enter first number: "))
  num2 <- as.numeric(readline("Enter second number: "))
  operation <- readline("Enter operation (+, -, *, /): ")
  result <- if (operation == "+") {
    add(num1, num2)
  } else if (operation == "-") {
    subtract(num1, num2)
  } else if (operation == "*") {
    multiply(num1, num2)
  } else if (operation == "/") {
    divide(num1, num2)
  } else {
    "Invalid operation"
  }
  cat("Result:", result, "\n")
}
# Call the calculate function to start the calculator
calculate()
Output:
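Supplementary sketch: the calculator above reads its operands with readline(), so it only works in an interactive R session. A non-interactive variant, assuming a hypothetical helper named calc that takes the operands and the operator as arguments, could look like this; it is one possible arrangement using switch(), not the submitted program.

```r
# Hypothetical helper (not in the original program): same four operations,
# but the inputs are passed as arguments instead of being typed at the console
calc <- function(num1, num2, operation) {
  switch(operation,
    "+" = num1 + num2,
    "-" = num1 - num2,
    "*" = num1 * num2,
    "/" = if (num2 != 0) num1 / num2 else "Cannot divide by zero",
    "Invalid operation"  # default branch for any other operator
  )
}

sum_result  <- calc(8, 2, "+")
quot_result <- calc(8, 2, "/")
zero_result <- calc(8, 0, "/")
bad_result  <- calc(8, 2, "^")
```

Passing the inputs as arguments also makes the function easy to call from a script or a test.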
Lab- 3
Aim: Write a program for Descriptive statistics in R using Iris dataset.
Code:
# Load the Iris dataset
data(iris)
# Display the structure of the Iris dataset
str(iris)
# Summary statistics for each numeric variable in the Iris dataset
summary(iris[, 1:4])
# Mean of each numeric variable by species
aggregate(iris[, 1:4], by = list(iris$Species), FUN = mean)
# Median of each numeric variable by species
aggregate(iris[, 1:4], by = list(iris$Species), FUN = median)
# Standard deviation of each numeric variable by species
aggregate(iris[, 1:4], by = list(iris$Species), FUN = sd)
Output:
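Supplementary sketch: the same descriptive statistics can be collected into a single table with sapply(), and a per-species statistic can be computed with tapply(). The names stats_table and mean_by_species are illustrative and not part of the program above.

```r
data(iris)

# Several statistics for every numeric column in one pass
stats_table <- sapply(iris[, 1:4], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
})

# Mean Sepal.Length within each species
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
```

Here stats_table has one row per statistic and one column per measurement, which is convenient for printing in a report.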
Lab- 4
Aim: Write a program for data visualization (plots, graphs and charts) using Iris
dataset.
Code:
# Load the Iris dataset
data(iris)
# Scatter plot of Sepal Length vs Sepal Width colored by Species
plot(iris$Sepal.Length, iris$Sepal.Width, col = as.numeric(iris$Species),
     xlab = "Sepal Length", ylab = "Sepal Width",
     main = "Sepal Length vs Sepal Width by Species")
Output:
# Boxplot of Petal Length by Species
boxplot(Petal.Length ~ Species, data = iris,
xlab = "Species", ylab = "Petal Length", main = "Petal Length by Species")
Output:
# Histogram of Sepal Length
hist(iris$Sepal.Length, breaks = 15, col = "skyblue",
xlab = "Sepal Length", ylab = "Frequency", main = "Histogram of Sepal Length")
Output:
# Density plot of Petal Width by Species
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
#install.packages("path_to_downloaded_package/ggplot2_3.4.4.zip", repos = NULL)
library(ggplot2)
ggplot(iris, aes(x = Petal.Width, fill = Species)) +
  geom_density(alpha = 0.6) +
  labs(x = "Petal Width", y = "Density", title = "Density Plot of Petal Width by Species")
Output:
# Bar chart of Species counts
counts <- table(iris$Species)
barplot(counts, main = "Counts of Each Species", xlab = "Species", ylab = "Count",
        col = "lightblue")
Output:
Lab - 5
Aim: Write a program for Data analytics operation and data visualization using Mtcars
dataset.
Code:
# Load the mtcars dataset
data(mtcars)
# Display the structure of the mtcars dataset
str(mtcars)
# Summary statistics for numeric variables in mtcars
summary(mtcars)
# Correlation matrix for numeric variables in mtcars
cor(mtcars)
Output:
# Boxplot of mpg by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
xlab = "Number of Cylinders", ylab = "Miles per Gallon",
main = "Boxplot of MPG by Number of Cylinders")
Output:
# Scatterplot of horsepower vs. weight
plot(mtcars$wt, mtcars$hp,
xlab = "Weight", ylab = "Horsepower",
main = "Scatterplot of Horsepower vs. Weight",
col = ifelse(mtcars$cyl == 4, "red", ifelse(mtcars$cyl == 6, "blue", "green")),
pch = 19)
Output:
# Bar chart of count of cars by number of gears
barplot(table(mtcars$gear),
main = "Count of Cars by Number of Gears",
xlab = "Number of Gears", ylab = "Count",
col = "skyblue")
Output:
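Supplementary sketch: beyond summary() and cor(), typical analytics operations on mtcars include grouped aggregation, a single pairwise correlation, and sorting. The example below is a hedged illustration in base R; the variable names mpg_by_cyl, cor_mpg_wt and heaviest are introduced here and are not part of the program above.

```r
data(mtcars)

# Average fuel economy for each cylinder count
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Correlation between fuel economy and weight
cor_mpg_wt <- cor(mtcars$mpg, mtcars$wt)

# Three heaviest cars with their mpg and cylinder count
heaviest <- head(mtcars[order(-mtcars$wt), c("mpg", "wt", "cyl")], 3)
```

The grouped means make the pattern in the boxplot of mpg by cylinders explicit: average mpg falls as the number of cylinders rises.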
Lab - 6
Aim: Write a program for reading and writing datasets like excel, csv.
Code:
# Read a CSV file into a variable
# Replace the path below with the path to your own CSV file
data <- read.csv("C:/Users/chara/Documents/Datasets/Salary_Data.csv")
# Display the structure of the data
str(data)
# Perform operations or analysis on the data
# For example:
summary(data)
# Write the data to a new CSV file
write.csv(data, "C:/Users/chara/Documents/Datasets/Housing.csv")
str(data)
Output:
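Supplementary sketch: the program above covers CSV only and depends on a local file path. The example below shows a self-contained CSV round trip using a temporary file and a small made-up data frame (the columns years and salary are hypothetical, not taken from Salary_Data.csv). For Excel files, base R has no reader, so the sketch assumes the readxl and writexl packages and runs that part only if both are installed.

```r
# Made-up data for illustration only
original <- data.frame(
  years  = c(1.1, 2.5, 3.2),
  salary = c(39343.5, 43525.25, 54445.75),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(original, csv_path, row.names = FALSE)  # row.names = FALSE avoids an extra index column
restored <- read.csv(csv_path, stringsAsFactors = FALSE)

# Excel round trip (assumption: readxl and writexl are installed)
if (requireNamespace("writexl", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")
  writexl::write_xlsx(original, xlsx_path)
  from_excel <- readxl::read_excel(xlsx_path)
}
```

Without row.names = FALSE, write.csv() stores the row names as an unnamed first column, which then appears as an extra column when the file is read again.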
Lab -7
Aim: Write a program for Visualizations: plot the histogram, bar chart and pie chart on
sample data.
Code:
Sample dataset – mtcars
df<-mtcars
hist(df$mpg, col='steelblue',main='Histogram',xlab='mpg',ylab='Frequency')
Output:
# Bar chart of the number of cars per cylinder count (table() gives the frequencies)
barplot(table(df$cyl), xlab="cyl", ylab="frequency", main="Bar-Chart", col='blue')
Output:
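# Pie chart of the share of cars per cylinder count, one slice per category
pie(table(df$cyl), labels=names(table(df$cyl)), main='Pie chart', col=c("brown","yellow","blue"), radius=1)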
Output:
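Supplementary sketch: bar charts and pie charts of a categorical variable are normally drawn from category counts rather than from the raw column. The example below shows how the counts and percentage labels could be prepared before plotting; cyl_counts, cyl_share and pie_labels are names introduced for this illustration.

```r
df <- mtcars

# Number of cars in each cylinder class
cyl_counts <- table(df$cyl)

# Share of each class; the shares sum to 1
cyl_share <- prop.table(cyl_counts)

# One label per slice, e.g. "<cylinders> cyl (<share>%)"
pie_labels <- paste0(names(cyl_counts), " cyl (", round(100 * cyl_share), "%)")

# These objects can then be passed to barplot(cyl_counts) or
# pie(cyl_counts, labels = pie_labels)
```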
Lab- 8
Aim: Write a program for Exploratory Data Analysis using Iris and Mtcars datasets.
Code:
Iris Dataset
library(ggplot2)
library(dplyr)
data(iris)
str(iris)
Output:
summary(iris)
pairs(iris[, 1:4], col = iris$Species, pch = 19)
Output:
boxplot(Sepal.Length ~ Species,data=iris)
Output:
boxplot(Sepal.Width ~ Species, data=iris)
Output:
Mtcars Dataset
library(ggplot2)
library(dplyr)
data(mtcars)
str(mtcars)
summary(mtcars)
Output:
pairs(mtcars)
Output:
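Supplementary sketch: exploratory analysis commonly starts by checking the size of each dataset, whether any values are missing, and how the classes are balanced. The base R example below is a hedged illustration of those checks; the variable names are introduced here and do not appear in the program above.

```r
data(iris)
data(mtcars)

# Rows and columns of each dataset
iris_dim   <- dim(iris)
mtcars_dim <- dim(mtcars)

# Missing values per column
iris_missing   <- colSums(is.na(iris))
mtcars_missing <- colSums(is.na(mtcars))

# Class balance of the Iris species
species_counts <- table(iris$Species)
```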
Lab -9
Aim: Write a program to find Correlation and Covariance on Iris dataset, plot the
correlation plot on dataset.
Code:
data(iris)
str(iris)
cor_matrix<-cor(iris[,1:4])
cov_matrix<-cov(iris[,1:4])
print(cor_matrix)
print(cov_matrix)
Output:
# Correlation Plot
install.packages("corrplot")
library(corrplot)
cor_matrix<-cor(iris[,1:4])
corrplot(cor_matrix, method = "color", tl.col = "black", tl.srt = 45)
Output:
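Supplementary sketch: correlation is covariance rescaled by the standard deviations of the two variables, r_xy = cov(x, y) / (s_x * s_y). The example below checks this relationship on the Iris measurements in base R; the names num, sds and cor_from_cov are introduced for the illustration.

```r
data(iris)
num <- iris[, 1:4]

cov_matrix <- cov(num)
cor_matrix <- cor(num)

# Standard deviation of each column
sds <- sapply(num, sd)

# Divide each covariance by the product of the two standard deviations
cor_from_cov <- cov_matrix / outer(sds, sds)

# Base R offers the same conversion directly
cor_from_cov2cor <- cov2cor(cov_matrix)
```

Both cor_from_cov and cor_from_cov2cor should match cor_matrix up to floating-point rounding.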
Lab - 10
Aim: Write a program to apply regression model techniques to predict the data on a
dataset.
Code:
# Load the necessary library for linear regression
library(stats)
# Load your dataset or use an existing one
# For this example, let's use the mtcars dataset
data(mtcars)
# Explore the dataset
head(mtcars)
# Split the dataset into training and testing sets
set.seed(123)
split_index <- sample(1:nrow(mtcars), 0.8 * nrow(mtcars))
train_data <- mtcars[split_index, ]
test_data <- mtcars[-split_index, ]
# Build a linear regression model
model <- lm(mpg ~ hp, data = train_data)
# Make predictions on new data
new_data <- data.frame(hp = c(100, 150, 200)) # Example new data with horsepower values
predictions <- predict(model, newdata = new_data)
# Display the predictions
cat("Predictions:", predictions, "\n")
Output:
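Supplementary sketch: the program above creates test_data but predicts only on three hand-picked horsepower values. One common way to use the held-out rows is to compute the root mean squared error (RMSE) on them. The example below is a hedged illustration with the same split proportions and model formula; test_pred, rmse and r_squared are names introduced here.

```r
data(mtcars)

# 80/20 split, as in the program above
set.seed(123)
split_index <- sample(seq_len(nrow(mtcars)), floor(0.8 * nrow(mtcars)))
train_data <- mtcars[split_index, ]
test_data  <- mtcars[-split_index, ]

# Simple linear regression of mpg on horsepower
model <- lm(mpg ~ hp, data = train_data)

# Predict on the held-out rows and measure the error
test_pred <- predict(model, newdata = test_data)
rmse <- sqrt(mean((test_data$mpg - test_pred)^2))

# Proportion of variance explained on the training data
r_squared <- summary(model)$r.squared
```

RMSE is expressed in the units of the response (miles per gallon here), so it can be read as a typical prediction error on unseen cars.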
Lab- 11
Aim: Write a program to apply K-Mean clustering on dataset.
Code:
install.packages("ClusterR")
install.packages("cluster")
library(ClusterR)
library(cluster)
# Drop the Species label (column 5) so that only the numeric measurements are clustered
iris_1 <- iris[, -5]
set.seed(240)
kmeans.re <- kmeans(iris_1, centers = 3, nstart = 20)
Output:
kmeans.re$cluster
# Confusion matrix of true species against assigned cluster
cm <- table(iris$Species, kmeans.re$cluster)
cm
Output:
plot(iris_1[c("Sepal.Length", "Sepal.Width")])
Output:
plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster)
Output:
plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster,
     main = "K-means with 3 clusters")
Output:
kmeans.re$centers
kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")]
Output:
points(kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 3)
Output:
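# Cluster plot; the cluster assignments come from the fitted k-means model
clusplot(iris_1[c("Sepal.Length", "Sepal.Width")], kmeans.re$cluster, lines = 0, shade = TRUE,
         color = TRUE, labels = 2, plotchar = FALSE, span = TRUE, main = paste("Cluster iris"),
         xlab = 'Sepal.Length', ylab = 'Sepal.Width')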
Output:
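Supplementary sketch: k-means itself is part of base R (stats::kmeans), so the clustering step can be reproduced without extra packages. The example below fits three clusters on the four numeric Iris columns, cross-tabulates clusters against species, and computes the total within-cluster sum of squares for several values of k as a rough elbow check. The names features, fit, cm and wss, and the range k = 2 to 6, are choices made for this illustration.

```r
data(iris)

# Numeric measurements only; the Species label is excluded
features <- iris[, -5]

set.seed(240)
fit <- kmeans(features, centers = 3, nstart = 20)

# Rows: true species, columns: assigned cluster
cm <- table(iris$Species, fit$cluster)

# Total within-cluster sum of squares for k = 2..6
wss <- sapply(2:6, function(k) {
  kmeans(features, centers = k, nstart = 20)$tot.withinss
})
```

Cluster numbers are arbitrary labels, so the columns of cm do not line up with species in any fixed order; each row should be read for where its 50 flowers fall.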
Lab- 12
Aim: Write a program for classification model and evaluate the performance of the
classifier
Code:
Decision tree
library(datasets)
library(caTools)
library(party)
library(dplyr)
library("magrittr")
# Load the readingSkills dataset that ships with the party package
data("readingSkills", package = "party")
head(readingSkills)
Output:
# Split on the class label so that sample.split() returns one flag per row
sample_data <- sample.split(readingSkills$nativeSpeaker, SplitRatio = 0.8)
sample_data
train_data <- subset(readingSkills, sample_data == TRUE)
train_data
Output:
test_data<-subset(readingSkills,sample_data==FALSE)
test_data
Output:
model<- ctree(nativeSpeaker ~ . , train_data)
plot(model)
Output:
predict_model<- predict(model, test_data)
mat<-table(test_data$nativeSpeaker, predict_model)
mat
ac_test<- sum(diag(mat))/sum(mat)
ac_test
print(paste('Accuracy for test is found to be', ac_test))
Output:
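Supplementary sketch: accuracy is only one way to evaluate a classifier. Precision, recall and the F1 score can be read off the same confusion matrix. The example below uses a small set of made-up labels purely for illustration (they are not results from the readingSkills experiment) and treats "yes" as the positive class.

```r
# Made-up labels for illustration only
actual    <- factor(c("yes", "yes", "yes", "no", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "yes", "no", "no", "yes", "yes", "no"),
                    levels = c("no", "yes"))

# Rows: actual class, columns: predicted class
mat <- table(actual, predicted)

accuracy  <- sum(diag(mat)) / sum(mat)
precision <- mat["yes", "yes"] / sum(mat[, "yes"])  # correct among predicted "yes"
recall    <- mat["yes", "yes"] / sum(mat["yes", ])  # found among actual "yes"
f1        <- 2 * precision * recall / (precision + recall)
```

The same four lines can be applied to the matrix built from test_data$nativeSpeaker and the model predictions, since it has the same layout of actual classes in rows and predicted classes in columns.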