Amity University Madhya Pradesh, Gwalior
Amity School of Engineering and Technology
Department of Computer Science and Engineering
CSE 724
Data Analytics Lab
Submitted to:
Dr. Ghanshyam Prasad Dubey
Associate Professor, CSE
Submitted by:
Roshni Singh
Enroll-No: A60205320008
B-Tech IT [C]
ASET, VII - Semester
Table of Contents
S. No   Name of Lab Experiment
1.   Introduction of R along with fundamentals, functionalities, features, advantages and disadvantages.
2.   Program to make a simple calculator that can add, subtract, multiply and divide using functions.
3.   Write a program for Descriptive statistics in R using Iris dataset.
4.   Write a program for data visualization (plots, graphs and charts) using Iris dataset.
5.   Write a program for Data analytics operation and data visualization using Mtcars dataset.
6.   Write a program for reading and writing datasets like excel, csv.
7.   Write a program for Visualizations: plot the histogram, bar chart and pie chart on sample data.
8.   Write a program for Exploratory Data Analysis using Iris and Mtcars datasets.
9.   Write a program to find Correlation and Covariance on Iris dataset, plot the correlation plot on dataset.
10.  Write a program to apply regression model techniques to predict the data on a dataset.
11.  Write a program to apply K-Means clustering on dataset.
12.  Write a program for classification model and evaluate the performance of the classifier.
CSE 724 – Data Analytics Lab
Lab File
Lab- 1
Aim: Introduction of R along with fundamentals, functionalities, features, advantages
and disadvantages.
Theory:
R is a programming language and environment designed for statistical computing and
graphics. The sections below cover its introduction, fundamentals, functionalities,
features, advantages, and disadvantages.
Introduction to R:
Purpose: R was developed by statisticians to perform data analysis, statistical modeling,
visualization, and machine learning.
Open Source: It's open-source and freely available, maintained by a vibrant community of
developers and statisticians.
Cross-platform: R runs on various platforms, including Windows, macOS, and Linux.
Fundamentals of R:
Data Structures: R has versatile data structures like vectors, matrices, arrays, data frames, and
lists.
Functions: Functions are fundamental in R; users can define their own functions for specific tasks.
Packages: R has a vast repository of packages for specific tasks, extending its functionality.
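The fundamentals listed above can be sketched in a few lines of base R. The values and the helper name rescale01 are made up for illustration and do not come from the lab file.

```r
# Core R data structures (all values are made up for illustration)
v <- c(2, 4, 6)                          # numeric vector
m <- matrix(1:6, nrow = 2)               # 2 x 3 matrix, filled column-wise
l <- list(name = "demo", values = v)     # list holding mixed types
df <- data.frame(id = 1:3, score = v)    # data frame: columns of equal length

# A user-defined function (hypothetical name): rescale a vector to the 0-1 range
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

scaled <- rescale01(df$score)
print(scaled)
str(l)
```

Packages extend this base with extra functions; they are installed once with install.packages() and attached in each session with library().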
Functionalities of R:
Statistical Analysis: R excels in statistical analysis, providing a wide range of functions and
packages for descriptive statistics, inferential statistics, regression, etc.
Data Visualization: It offers powerful visualization tools through packages like ggplot2 for
creating intricate and customizable plots and graphs.
Machine Learning: R provides numerous packages like caret, randomForest, and xgboost for
implementing machine learning algorithms.
Features of R:
Rich Set of Packages: A vast collection of packages available on CRAN (Comprehensive R
Archive Network) and other repositories.
Graphics Capabilities: High-quality graphical capabilities for data visualization and
exploration.
Community Support: Active and supportive community, providing help, packages, and
resources.
Advantages of R:
Statistical Analysis: Tailored for statistical analysis, making it powerful for data manipulation
and modeling.
Roshni Singh (A60205320008), B.Tech (IT), Section “C”, 7th Sem
Graphics and Visualization: Offers sophisticated and customizable visualization capabilities.
Community and Documentation: Strong community support and extensive documentation
available for users.
Disadvantages of R:
Learning Curve: The learning curve can be steep, especially for beginners without a
programming background.
Speed and Memory Usage: Execution can be slower than in compiled languages for some
tasks, and because R keeps objects in memory by default, large datasets can consume a lot of RAM.
Package Quality: Quality and documentation of some packages may vary.
Despite its drawbacks, R remains a prominent language in the data science community due to
its extensive statistical capabilities, visualization tools, and active user base contributing to its
development and improvement.
Result: The fundamentals of R have been studied.
Lab-2
Aim: Program to make a simple calculator that can add, subtract, multiply and divide
using functions.
Code:
# Function to add two numbers
add <- function(a, b) {
  return(a + b)
}
# Function to subtract two numbers
subtract <- function(a, b) {
  return(a - b)
}
# Function to multiply two numbers
multiply <- function(a, b) {
  return(a * b)
}
# Function to divide two numbers
divide <- function(a, b) {
  if (b != 0) {
    return(a / b)
  } else {
    return("Cannot divide by zero")
  }
}
# Function to perform calculations based on user input
calculate <- function() {
  num1 <- as.numeric(readline("Enter first number: "))
  num2 <- as.numeric(readline("Enter second number: "))
  operation <- readline("Enter operation (+, -, *, /): ")
  result <- if (operation == "+") {
    add(num1, num2)
  } else if (operation == "-") {
    subtract(num1, num2)
  } else if (operation == "*") {
    multiply(num1, num2)
  } else if (operation == "/") {
    divide(num1, num2)
  } else {
    "Invalid operation"
  }
  cat("Result:", result, "\n")
}
# Call the calculate function to start the calculator
calculate()
Output:
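Note: readline() only waits for input in an interactive session; in non-interactive use it returns an empty string, so calculate() cannot be exercised from a script. A minimal sketch of a non-interactive variant follows. The name apply_operation is hypothetical and not part of the lab program; it dispatches on the operator with switch() and raises an error on division by zero instead of returning a string.

```r
# Hypothetical helper (not in the lab code): the same four operations,
# but with inputs passed as arguments so it can be called from a script.
apply_operation <- function(a, b, op) {
  switch(op,
    "+" = a + b,
    "-" = a - b,
    "*" = a * b,
    "/" = if (b == 0) stop("Cannot divide by zero") else a / b,
    stop("Invalid operation: ", op)   # default branch for unknown operators
  )
}

sum_result <- apply_operation(6, 3, "+")
quot_result <- apply_operation(6, 3, "/")
cat("6 + 3 =", sum_result, "\n")
cat("6 / 3 =", quot_result, "\n")
```

Raising an error keeps the return type numeric in every successful case, which makes the function easier to reuse in later calculations.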
Lab- 3
Aim: Write a program for Descriptive statistics in R using Iris dataset.
Code:
# Load the Iris dataset
data(iris)
# Display the structure of the Iris dataset
str(iris)
# Summary statistics for each numeric variable in the Iris dataset
summary(iris[, 1:4])
# Mean of each numeric variable by species
aggregate(iris[, 1:4], by = list(iris$Species), FUN = mean)
# Median of each numeric variable by species
aggregate(iris[, 1:4], by = list(iris$Species), FUN = median)
# Standard deviation of each numeric variable by species
aggregate(iris[, 1:4], by = list(iris$Species), FUN = sd)
Output:
Lab- 4
Aim: Write a program for data visualization (plots, graphs and charts) using Iris
dataset.
Code:
# Load the Iris dataset
data(iris)
# Scatter plot of Sepal Length vs Sepal Width colored by Species
plot(iris$Sepal.Length, iris$Sepal.Width, col = as.numeric(iris$Species),
     xlab = "Sepal Length", ylab = "Sepal Width",
     main = "Sepal Length vs Sepal Width by Species")
Output:
# Boxplot of Petal Length by Species
boxplot(Petal.Length ~ Species, data = iris,
xlab = "Species", ylab = "Petal Length", main = "Petal Length by Species")
Output:
# Histogram of Sepal Length
hist(iris$Sepal.Length, breaks = 15, col = "skyblue",
xlab = "Sepal Length", ylab = "Frequency", main = "Histogram of Sepal Length")
Output:
# Density plot of Petal Width by Species
install.packages("ggplot2")
#install.packages("path_to_downloaded_package/ggplot2_3.4.4.zip", repos = NULL)
library(ggplot2)
ggplot(iris, aes(x = Petal.Width, fill = Species)) +
geom_density(alpha = 0.6) +
labs(x = "Petal Width", y = "Density", title = "Density Plot of Petal Width by Species")
Output:
# Bar chart of Species counts
counts <- table(iris$Species)
barplot(counts, main = "Counts of Each Species", xlab = "Species", ylab = "Count",
        col = "lightblue")
Output:
Lab - 5
Aim: Write a program for Data analytics operation and data visualization using Mtcars
dataset.
Code:
# Load the mtcars dataset
data(mtcars)
# Display the structure of the mtcars dataset
str(mtcars)
# Summary statistics for numeric variables in mtcars
summary(mtcars)
# Correlation matrix for numeric variables in mtcars
cor(mtcars)
Output:
# Boxplot of mpg by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
xlab = "Number of Cylinders", ylab = "Miles per Gallon",
main = "Boxplot of MPG by Number of Cylinders")
Output:
# Scatterplot of horsepower vs. weight
plot(mtcars$wt, mtcars$hp,
xlab = "Weight", ylab = "Horsepower",
main = "Scatterplot of Horsepower vs. Weight",
col = ifelse(mtcars$cyl == 4, "red", ifelse(mtcars$cyl == 6, "blue", "green")),
pch = 19)
Output:
# Bar chart of count of cars by number of gears
barplot(table(mtcars$gear),
main = "Count of Cars by Number of Gears",
xlab = "Number of Gears", ylab = "Count",
col = "skyblue")
Output:
Lab - 6
Aim: Write a program for reading and writing datasets like excel, csv.
Code:
# Read a CSV file into a data frame
# Replace this path with the path to your own CSV file
data <- read.csv("C:/Users/chara/Documents/Datasets/Salary_Data.csv")
# Display the structure of the data
str(data)
# Perform operations or analysis on the data
# For example:
summary(data)
# Write the data frame to a new CSV file
write.csv(data, "C:/Users/chara/Documents/Datasets/Housing.csv")
str(data)
Output:
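Note: the aim also mentions Excel files, which the code above does not touch. The sketch below shows a CSV round trip through a temporary file, followed by an Excel round trip that runs only if the writexl and readxl packages are installed. The data frame values and variable names are made up for illustration, and the choice of these two packages is an assumption, not something the lab file specifies.

```r
# Toy data frame (values are made up for illustration)
salaries <- data.frame(YearsExperience = c(1.1, 2.0, 3.2),
                       Salary = c(39343, 43525, 54445))

# CSV round trip through a temporary file, so no real path is assumed
csv_path <- tempfile(fileext = ".csv")
write.csv(salaries, csv_path, row.names = FALSE)  # omit the row-name column
csv_back <- read.csv(csv_path)
str(csv_back)

# Excel round trip: runs only if writexl and readxl are installed
if (requireNamespace("writexl", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")
  writexl::write_xlsx(salaries, xlsx_path)
  xlsx_back <- as.data.frame(readxl::read_excel(xlsx_path))
  str(xlsx_back)
}
```

Passing row.names = FALSE to write.csv() avoids an extra unnamed index column appearing when the file is read back.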
Lab -7
Aim: Write a program for Visualizations: plot the histogram, bar chart and pie chart on
sample data.
Code:
# Sample dataset: mtcars
df <- mtcars
hist(df$mpg, col = 'steelblue', main = 'Histogram', xlab = 'mpg', ylab = 'Frequency')
Output:
# table() counts the cars per cylinder value, so the bar heights are frequencies
barplot(table(df$cyl), xlab = "cyl", ylab = "frequency", main = "Bar-Chart", col = 'blue')
Output:
# One slice per cylinder count (4, 6, 8), matching the three colours
pie(table(df$cyl), main = 'Pie chart', col = c("brown", "yellow", "blue"), radius = 1)
Output:
Lab- 8
Aim: Write a program for Exploratory Data Analysis using Iris and Mtcars datasets.
Code:
Iris Dataset
library(ggplot2)
library(dplyr)
data(iris)
str(iris)
Output:
summary(iris)
pairs(iris[, 1:4], col = iris$Species, pch = 19)
Output:
boxplot(Sepal.Length ~ Species,data=iris)
Output:
boxplot(Sepal.Width ~ Species, data=iris)
Output:
Mtcars Dataset
library(ggplot2)
library(dplyr)
data(mtcars)
str(mtcars)
summary(mtcars)
Output:
pairs(mtcars)
Output:
Lab -9
Aim: Write a program to find Correlation and Covariance on Iris dataset, plot the
correlation plot on dataset.
Code:
data(iris)
str(iris)
cor_matrix<-cor(iris[,1:4])
cov_matrix<-cov(iris[,1:4])
print(cor_matrix)
print(cov_matrix)
Output:
# Correlation Plot
install.packages("corrplot")
library(corrplot)
cor_matrix<-cor(iris[,1:4])
corrplot(cor_matrix, method = "color", tl.col = "black", tl.srt = 45)
Output:
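Note: correlation and covariance are linked by r(x, y) = cov(x, y) / (sd(x) * sd(y)), so the correlation matrix is the covariance matrix rescaled by the standard deviations. A short check of that relation on two Iris columns is sketched below; the variable names are chosen for illustration only.

```r
# Pearson correlation is covariance rescaled by the two standard deviations
x <- iris$Sepal.Length
y <- iris$Petal.Length

manual_cor <- cov(x, y) / (sd(x) * sd(y))
builtin_cor <- cor(x, y)
print(c(manual = manual_cor, builtin = builtin_cor))

# The same rescaling applied to the whole covariance matrix
cor_from_cov <- cov2cor(cov(iris[, 1:4]))
print(all.equal(cor_from_cov, cor(iris[, 1:4])))
```

Because covariance depends on the units of each variable while correlation does not, the correlation matrix is usually the better input for a plot such as corrplot().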
Lab - 10
Aim: Write a program to apply regression model techniques to predict the data on a
dataset.
Code:
# lm() lives in the stats package, which R attaches by default
library(stats)
# Load your dataset or use an existing one
# For this example, let's use the mtcars dataset
data(mtcars)
# Explore the dataset
head(mtcars)
# Split the dataset into training and testing sets
set.seed(123)
split_index <- sample(1:nrow(mtcars), 0.8 * nrow(mtcars))
train_data <- mtcars[split_index, ]
test_data <- mtcars[-split_index, ]
# Build a linear regression model
model <- lm(mpg ~ hp, data = train_data)
# Make predictions on new data
new_data <- data.frame(hp = c(100, 150, 200)) # Example new data with horsepower values
predictions <- predict(model, newdata = new_data)
# Display the predictions
cat("Predictions:", predictions, "\n")
Output:
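Note: the program above creates test_data but never uses it. A minimal sketch of evaluating the model on the held-out rows follows. It mirrors the split and model from the lab code, with floor() making the sample size an explicit whole number; the names test_pred, rmse and mae are introduced here for illustration.

```r
# Mirror the split and model from the lab code
data(mtcars)
set.seed(123)
n_train <- floor(0.8 * nrow(mtcars))          # 25 of the 32 rows
split_index <- sample(1:nrow(mtcars), n_train)
train_data <- mtcars[split_index, ]
test_data <- mtcars[-split_index, ]
model <- lm(mpg ~ hp, data = train_data)

# Predict mpg for the held-out rows and compare with the true values
test_pred <- predict(model, newdata = test_data)
rmse <- sqrt(mean((test_data$mpg - test_pred)^2))  # root mean squared error
mae <- mean(abs(test_data$mpg - test_pred))        # mean absolute error
cat("Test RMSE:", rmse, "\n")
cat("Test MAE:", mae, "\n")
```

Both errors are in miles per gallon, so they can be read directly against the range of mpg in the dataset. With only 7 test rows the estimates are noisy and will change with the seed.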
Lab- 11
Aim: Write a program to apply K-Means clustering on dataset.
Code:
install.packages ("ClusterR")
install.packages ("cluster")
library(ClusterR)
library(cluster)
# Drop the Species column (column 5): k-means needs numeric input only
iris_1 <- iris[, -5]
set.seed(240)
kmeans.re<- kmeans (iris_1, centers=3, nstart=20)
Output:
kmeans.re$cluster
# Cross-tabulate the true species against the assigned cluster
cm <- table(iris$Species, kmeans.re$cluster)
cm
Output:
plot(iris_1[c("Sepal.Length", "Sepal.Width")])
Output:
plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster)
Output:
plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster,
     main = "K-means with 3 clusters")
Output:
kmeans.re$centers
kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")]
Output:
points(kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 3)
Output:
# The second argument is the cluster assignment returned by kmeans()
clusplot(iris_1[, c("Sepal.Length", "Sepal.Width")], kmeans.re$cluster, lines = 0,
         shade = TRUE, color = TRUE, labels = 2, plotchar = FALSE, span = TRUE,
         main = paste("Cluster iris"), xlab = 'Sepal.Length', ylab = 'Sepal.Width')
Output:
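Note: the program fixes centers = 3 without showing how that number could be chosen. One common heuristic, sketched below, is an elbow plot of the total within-cluster sum of squares against k. This is an addition for illustration, not part of the original experiment; the range 1 to 6 and the name wss are assumptions.

```r
# Numeric columns only: k-means cannot use the Species factor
iris_num <- iris[, 1:4]

# Total within-cluster sum of squares for k = 1..6
set.seed(240)
wss <- sapply(1:6, function(k) {
  kmeans(iris_num, centers = k, nstart = 20)$tot.withinss
})

# The bend ("elbow") in this curve suggests a reasonable number of clusters
plot(1:6, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot for iris")
```

The sum of squares always falls as k grows, so the point of interest is where the drop flattens out rather than where the value is smallest.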
Lab- 12
Aim: Write a program for classification model and evaluate the performance of the
classifier
Code:
Decision tree
library(datasets)
library(caTools)
library(party)
library(dplyr)
library("magrittr")
# Load the readingSkills dataset shipped with the party package
data("readingSkills", package = "party")
head(party::readingSkills)
Output:
# Split on the label vector so each row gets its own TRUE/FALSE flag
sample_data <- sample.split(readingSkills$nativeSpeaker, SplitRatio = 0.8)
sample_data
train_data<-subset(readingSkills,sample_data==TRUE)
train_data
Output:
test_data<-subset(readingSkills,sample_data==FALSE)
test_data
Output:
model<- ctree(nativeSpeaker ~ . , train_data)
plot(model)
Output:
predict_model<- predict(model, test_data)
mat<-table(test_data$nativeSpeaker, predict_model)
mat
ac_test<- sum(diag(mat))/sum(mat)
ac_test
print(paste('Accuracy for test is found to be', ac_test))
Output:
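Note: accuracy alone can hide how a classifier behaves on each class. The sketch below derives precision, recall and F1 from a confusion matrix built the same way as mat above (actual classes in rows, predictions in columns). The label vectors are made up for illustration and are not output from the decision tree in this experiment.

```r
# Toy labels, made up for illustration (not output from the lab's model)
actual <- factor(c("yes", "yes", "yes", "no", "no", "no", "yes", "no"),
                 levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                    levels = c("no", "yes"))

# Rows are actual classes, columns are predicted classes
cm <- table(Actual = actual, Predicted = predicted)
print(cm)

tp <- cm["yes", "yes"]   # actual yes, predicted yes
fp <- cm["no", "yes"]    # actual no, predicted yes
fn <- cm["yes", "no"]    # actual yes, predicted no

accuracy <- sum(diag(cm)) / sum(cm)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)
cat("Accuracy:", accuracy, "Precision:", precision,
    "Recall:", recall, "F1:", f1, "\n")
```

Here "yes" is treated as the positive class; swapping the roles of the two levels gives the metrics for the other class.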