Uploaded by piyush barfa

Experiment 1 - Clustering

Skyline University College Sharjah
United Arab Emirates
Big Data Analytics
Course Code: BIT4118
Student Name
Student ID
Remarks
EXPERIMENT NO:1
Name of experiment: - CLUSTERING
Goal: Understanding the intuition of Clustering using K-means clustering
Theory: Imagine a dataset consisting of several points spread over an n-dimensional space. To
find patterns among data points in an n-dimensional space, we use unsupervised methods.
One of the most popular unsupervised methods is clustering. Clustering is the task of grouping
a set of objects such that objects in the same cluster are more similar to each
other than to objects in different clusters.
Clustering algorithms can be categorized by their cluster model, in other words by how they
form clusters or groups. Some of the prominent families of clustering algorithms are connectivity-based
clustering, centroid-based clustering, distribution-based clustering and density-based clustering.
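To make two of these families concrete, the sketch below runs a connectivity-based method (hierarchical clustering with `hclust`) and a centroid-based method (`kmeans`) on the same data, using only base R. The data is synthetic and the names `pts`, `hc_labels` and `km_labels` are chosen for illustration; they are not part of the experiment.

```r
# Synthetic data (an assumption for illustration): two well-separated groups in 2-D
set.seed(42)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

# Connectivity-based: agglomerative hierarchical clustering, cut into 2 groups
hc_labels <- cutree(hclust(dist(pts), method = "average"), k = 2)

# Centroid-based: k-means with 2 centers, best of 10 random starts
km_labels <- kmeans(pts, centers = 2, nstart = 10)$cluster

# Cross-tabulate the two labelings. Cluster numbers are arbitrary,
# so compare which points are grouped together, not the numbers themselves.
print(table(hierarchical = hc_labels, kmeans = km_labels))
```

On data this cleanly separated both methods should recover the same two groups; on less tidy data the two cluster models can disagree.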
In this exercise, centroid-based clustering is implemented. In this type of clustering, each cluster
is represented by a central vector, or centroid. This centroid is not necessarily a
member of the dataset. K-means is an iterative clustering algorithm in which the notion of similarity
is derived from how close a data point is to the center of its cluster. In this exercise, we will be
working on mall customers data.
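As a small illustration of the centroid idea, the sketch below computes the centroid of a toy cluster and the Euclidean distance from each point to it. The three points are hypothetical and are not taken from the mall customers data.

```r
# Hypothetical toy cluster of three 2-D points (not from the mall customers data)
cluster_points <- rbind(c(1, 1), c(3, 1), c(2, 4))

# The centroid is the column-wise mean of the points in the cluster
centroid <- colMeans(cluster_points)
print(centroid)

# Euclidean distance from each point to the centroid:
# subtract the centroid from every row, square, sum per row, take the root
distances <- sqrt(rowSums(sweep(cluster_points, 2, centroid)^2))
print(distances)

# Check whether the centroid coincides with any of the data points
centroid_is_member <- any(apply(cluster_points, 1, function(p) all(p == centroid)))
print(centroid_is_member)
```

Here the centroid is the point (2, 2), which is not one of the three data points.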
Software Tools: RStudio
Procedure:
1. First, we randomly select k points as the initial cluster centers. These k points are the
initial means
2. We use Euclidean distance to assign each data point to the cluster whose center is closest
to it
3. Then we calculate the mean of all the points in each cluster, which gives the new
centroid
4. We iteratively repeat step 2 and step 3 until the cluster assignments no longer
change
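The procedure above can be sketched from scratch in base R. This is a minimal illustration, not the implementation behind R's built-in `kmeans()`; the function name `simple_kmeans`, its `max_iter` argument and the synthetic `toy_points` data are assumptions made for this example.

```r
# Minimal k-means sketch; simple_kmeans is a hypothetical helper, not a library function
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Step 1: pick k distinct data points at random as the initial means
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Step 2: squared Euclidean distance from every point to every center,
    # then assign each point to its nearest center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_assignment <- max.col(-d, ties.method = "first")
    # Stop when no point changes cluster
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Step 3: move each center to the mean of the points assigned to it
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = assignment, centers = centers)
}

# Usage on synthetic data (an assumption for illustration): two groups in 2-D
set.seed(7)
toy_points <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)
fit <- simple_kmeans(toy_points, k = 2)
print(table(fit$cluster))
print(fit$centers)
```

Because the starting points are random, different seeds can give different cluster numbering, and on harder data a different final clustering. This is why `kmeans()` offers the `nstart` argument to try several random starts.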
CODE
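library(tidyverse)
library(arules)
library(arulesViz)
library(knitr)
library(gridExtra)
library(lubridate)
library(readr)
library(cluster)
library(factoextra)

# Load the mall customers data and preview the first rows
dataset <- read.csv('Mall_Customers.csv')
head(dataset)

# kmeans() needs numeric input: drop incomplete rows and keep only numeric columns
numeric_data <- na.omit(dataset)
numeric_data <- numeric_data[, sapply(numeric_data, is.numeric)]

# Fix the seed so the random initial centers are reproducible (seed value is arbitrary)
set.seed(123)
kmeans2 <- kmeans(numeric_data, centers = 5)
str(kmeans2)

# Plot the clusters, passing the same data that was clustered
fviz_cluster(kmeans2, data = numeric_data)

# Elbow method: total within-cluster sum of squares for a range of k
fviz_nbclust(numeric_data, kmeans, method = "wss")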
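
The `method = "wss"` option plots the total within-cluster sum of squares against the number of clusters. To show what is being computed, the sketch below calculates the same quantity by hand with base R's `kmeans()`. It uses synthetic stand-in data rather than the mall customers file, and the names `toy`, `k_values` and `wss` are chosen for illustration.

```r
# Synthetic stand-in data (an assumption for illustration): three groups in 2-D
set.seed(1)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..6, best of 20 random starts each
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
})

# Tabulate the values; plotting wss against k_values gives the elbow curve
print(data.frame(k = k_values, wss = wss))
```

The value drops sharply up to the true number of groups and flattens afterwards; the bend in that curve is the "elbow" used to choose k.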
OUTPUT
Summary of data