Uploaded by Cyprian. Konyeha

R Project assignment

Introduction to R for Data Science & Data Analytics Project
Perform a k-means clustering analysis in R based on the two features in
Data.csv.
The meaning of each feature is listed below.
• travel_time: the travel time (unit: minutes) for each resident based on a local
travel survey
• travel_cost: the travel cost (unit: $) for each resident based on a local travel survey
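As a starting point, the sketch below loads the two features and draws a scatter plot of them. It assumes Data.csv sits in the working directory and that its columns are named exactly travel_time and travel_cost. When the file is missing, it falls back to a small synthetic data set, invented only so the snippet runs; it is not the survey data.

```r
# Load the survey data; the file location and column names are assumptions.
if (file.exists("Data.csv")) {
  travel <- read.csv("Data.csv")
} else {
  # Synthetic stand-in so the sketch runs without the real file (NOT survey data).
  set.seed(1)
  travel <- data.frame(
    travel_time = c(rnorm(50, mean = 20, sd = 4), rnorm(50, mean = 60, sd = 6)),
    travel_cost = c(rnorm(50, mean = 5, sd = 1), rnorm(50, mean = 15, sd = 2))
  )
}

# Keep only the two features used for clustering and inspect them.
travel <- travel[, c("travel_time", "travel_cost")]
str(travel)
summary(travel)

# Scatter plot: travel_time on the x-axis, travel_cost on the y-axis.
plot(travel$travel_time, travel$travel_cost,
     xlab = "Travel time (minutes)", ylab = "Travel cost ($)",
     main = "Travel cost vs. travel time", pch = 19)
```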
Answer the following questions:
1. Plot travel_cost (y-axis) against travel_time (x-axis) and comment on the pattern
of the points and the number of clusters it suggests by eye. (5 points)
2. If we initially assume the total number of clusters is 2, perform k-means analysis.
Report the coordinates of the cluster centers. Visualize the clusters by plotting
travel_cost (y-axis) against travel_time (x-axis) and coloring points based on the
corresponding clusters. (20 points)
3. Apply the elbow method to determine the optimal number of clusters by testing
the following potential number of clusters: 1, 2, 3, 4, 5, 6, and 7. Generate the scree
plot. Comment on the optimal number of clusters based on the scree plot. (15 points)
4. Redo the k-means analysis based on the identified optimal number of clusters
from Question 3. Visualize the clusters by plotting travel_cost (y-axis) against
travel_time (x-axis) and coloring points based on the corresponding clusters. (15
points)
5. Calculate the Dunn index based on the k-means analysis results in Question 4. (10
points)
6. Determine the optimal number of clusters using the Dunn index by testing the
following potential number of clusters: 2, 3, 4, 5, 6, and 7. Plot the Dunn index (y-axis)
against the number of clusters (x-axis). Comment on the optimal number of clusters
based on the Dunn index. (20 points)
7. Redo the k-means analysis based on the identified optimal number of clusters
from Question 6. Visualize the clusters by plotting travel_cost (y-axis) against
travel_time (x-axis) and coloring points based on the corresponding clusters. (15
points)
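The steps in the questions above can be sketched in base R as follows. This is one possible outline, not a model answer. It runs on synthetic data (an assumption, used so the snippet works without Data.csv), and the helpers run_kmeans and dunn_index are hypothetical names introduced here. The Dunn index is computed as the smallest distance between points in different clusters divided by the largest distance between points in the same cluster; packages that report a Dunn index may use a different variant.

```r
# Synthetic stand-in for Data.csv (NOT the survey data): three loose groups.
set.seed(42)
travel <- data.frame(
  travel_time = c(rnorm(40, 15, 3), rnorm(40, 40, 4), rnorm(40, 70, 5)),
  travel_cost = c(rnorm(40, 4, 1), rnorm(40, 10, 1.5), rnorm(40, 20, 2))
)

# Hypothetical helper: k-means for a given k; nstart = 25 keeps the best
# of 25 random starts to reduce sensitivity to initialization.
run_kmeans <- function(data, k) kmeans(data, centers = k, nstart = 25)

# Fit with k = 2, report the centers, and color points by cluster.
fit <- run_kmeans(travel, 2)
print(fit$centers)
plot(travel$travel_time, travel$travel_cost, col = fit$cluster, pch = 19,
     xlab = "Travel time (minutes)", ylab = "Travel cost ($)")
points(fit$centers, pch = 8, cex = 2)  # mark the cluster centers

# Elbow method: total within-cluster sum of squares for k = 1..7.
ks_elbow <- 1:7
wss <- sapply(ks_elbow, function(k) run_kmeans(travel, k)$tot.withinss)
plot(ks_elbow, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Hypothetical helper: Dunn index from pairwise Euclidean distances.
dunn_index <- function(data, cluster) {
  d <- as.matrix(dist(data))
  same <- outer(cluster, cluster, "==")  # TRUE where both points share a cluster
  min(d[!same]) / max(d[same])
}

# Dunn index for k = 2..7; a larger value indicates better separation.
ks_dunn <- 2:7
dunn <- sapply(ks_dunn, function(k) {
  dunn_index(travel, run_kmeans(travel, k)$cluster)
})
plot(ks_dunn, dunn, type = "b", xlab = "Number of clusters k",
     ylab = "Dunn index")
best_k <- ks_dunn[which.max(dunn)]
print(best_k)
```

To redo the clustering for a chosen k, call run_kmeans(travel, k) again and reuse the plotting lines. The sketch clusters the raw features; whether to standardize them first (for example with scale()) is a separate modeling choice the assignment does not specify.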
Note: Please submit both a Word document with answers to each question (plain
English explanations + screenshots of key steps and results) and an R script.