B-5. K-Means Clustering

Assignment: 5 (Group -B)
Regularity (2) Performance(5) Oral(3)
Total (10)
Dated Sign
Title of Assignment: K-Means Clustering.
Problem Definition: To implement K-Means clustering.
Theory:
Objective:
Student will learn:
1. The basic concepts of clustering: centroid, Euclidean distance.
2. Logic of the K-Means clustering implementation.
5. Introduction:
A cluster is a subset of the set of data points that are close together under some distance
measure. Clustering can be defined as the process of organizing data into groups whose
members are similar in some way. A cluster is therefore a collection of data points which are
"similar" to each other and "dissimilar" to the data points belonging to other clusters. There
are several methods for computing a clustering; one of the most important is the k-means
algorithm.
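The distance measure used throughout this assignment is the Euclidean distance; a one-line helper illustrates it:

```python
import math

def euclidean(p, q):
    """Straight-line (Euclidean) distance between two points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((1.0, 1.0), (5.0, 7.0)))  # sqrt(52) ≈ 7.211
```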
5.1 K-means Clustering:
1. We assume that we have n data points x_1, ..., x_n in R^m, which we organize as
columns in a matrix X = [x_1 x_2 ... x_n] of size m x n.
2. Let Pi = {pi_1, ..., pi_k} denote a partitioning of the data in X into k clusters, where
pi_j is the set of indices of the points assigned to cluster j.
3. Let the mean, or the centroid, of cluster pi_j be m_j = (1/n_j) * sum of x_i over i in pi_j,
where n_j is the number of elements in pi_j.
4. We describe the K-means algorithm based on the Euclidean distance measure.
The tightness or coherence of cluster pi_j can be measured as the sum
q_j = sum over i in pi_j of ||x_i - m_j||^2.
5. The closer the vectors are to the centroid, the smaller the value of q_j. The quality of a
clustering can be measured as the overall coherence Q(Pi) = q_1 + q_2 + ... + q_k.
6. In the k-means algorithm we seek a partitioning that has optimal coherence, in the sense
that it is the solution of the minimization problem: minimize Q(Pi) over all partitionings Pi.
7. The K-means algorithm:

- Initialization: choose k initial centroids.
- Form k clusters by assigning every data point to its closest centroid.
- Recompute the centroid of each cluster.
- Repeat the second and third steps until convergence.
The initial partitioning is often chosen randomly. The algorithm usually converges quickly,
but it is not guaranteed to find the global minimum.
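These steps can be sketched in Python. This is a minimal illustration rather than the required implementation; as an assumption it stores data points as rows of X (the transpose of the column convention above) and initializes the centroids with k randomly chosen data points:

```python
import numpy as np

def coherence(X, labels, centroids):
    """Overall coherence Q(Pi): sum of squared distances of points to their centroids."""
    return float(((X - centroids[labels]) ** 2).sum())

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means. Rows of X are data points; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: choose k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its closest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: recompute each centroid as the mean of its cluster.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated groups on the x-axis.
pts = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
labels, centroids = kmeans(pts, k=2)
print(labels, coherence(np.asarray(pts), labels, centroids))
```

Because the initialization is random, the cluster labels may come out permuted between runs; only the grouping of points is determined.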
5.2 Algorithm:
Now we will see how the K-Means clustering algorithm works.
Fig: K-Means clustering algorithm
5.3 Example:
Now we will see a simple but detailed example showing the working of the k-means
algorithm (using K = 2) on seven data points:
Table: data points for the k-means example (K = 2)
Point:  1      2      3      4      5      6      7
x:      1.0    1.5    3.0    5.0    3.5    4.5    3.5
y:      1.0    2.0    4.0    7.0    5.0    5.0    4.5
Step 1:
Initialization: Randomly we choose following two centroids (k=2) for two clusters.
In this case, the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
Table: Two centroids (k=2) for two clusters.
Step 2:
We compute the Euclidean distance from each point to the two centroids and assign each
point to the closer one. Thus, we obtain two clusters containing:
{1, 2, 3} and {4, 5, 6, 7}
Their new centroids are: m1 = (1.83, 2.33) and m2 = (4.12, 5.38)
Step 3:
Now, using these centroids, we compute the Euclidean distance of each point to them, as
shown in the table.
Therefore, the new clusters are:
{1, 2} and {3, 4, 5, 6, 7}
Next centroids are: m1= (1.25, 1.5) and m2 = (3.9, 5.1)
Step 4:
The clusters obtained are:
{1, 2} and {3, 4, 5, 6, 7}
Therefore, there is no change in the clusters. The algorithm halts here, and the final result
consists of the two clusters {1, 2} and {3, 4, 5, 6, 7}.
Step 5: Plot the resulting clusters and their centroids.
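The walkthrough above can be reproduced numerically. The data points used below are an assumption (the original table is not reproduced here); they are the seven points commonly used with this example and are consistent with the centroids (1.25, 1.5) and (3.9, 5.1) quoted in Step 3:

```python
import numpy as np

# Assumed dataset: chosen so the final centroids match (1.25, 1.5) and (3.9, 5.1).
pts = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
                [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
m = np.array([[1.0, 1.0], [5.0, 7.0]])   # initial centroids m1, m2 from Step 1

for _ in range(10):
    # Assign each point to the nearest centroid (Euclidean distance).
    d = np.linalg.norm(pts[:, None, :] - m[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Recompute each centroid as the mean of its cluster.
    new_m = np.array([pts[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new_m, m):
        break  # no centroid moved: converged
    m = new_m

print(labels)          # [0 0 1 1 1 1 1]  -> clusters {1, 2} and {3, 4, 5, 6, 7}
print(np.round(m, 2))  # [[1.25 1.5 ] [3.9  5.1 ]]
```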
5.4 Weighted K-Means:
The Weighted K-Means algorithm (abbreviated WK-Means) described by Chan, Huang and
their collaborators (Chan et al., 2004; Huang et al., 2005; Huang et al., 2008) is a modification of
the K-Means criterion that includes unknown weights w_v for the features v = 1, ..., M in the
dataset. Their approach relates feature weights to a set of patterns during the clustering
process, in line with the wrapper approach to feature selection.
The weighted k-means criterion is
W(Pi, w) = sum over clusters j, points i in pi_j, and features v of (w_v)^beta * (x_iv - m_jv)^2,
subject to the constraints that each feature weight is non-negative and the weights sum to
one: w_v >= 0 and w_1 + ... + w_M = 1. The criterion has a user-defined parameter beta,
which controls how strongly the weights affect each feature's contribution to the distance.
The WK-Means algorithm follows an iterative optimization similar to K-Means, and by
consequence it shares some of its strengths, such as convergence in a finite number of
iterations.
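As a sketch, the weight-update step can be written as follows. The update rule used here (weights proportional to the inverse within-cluster dispersion raised to 1/(beta - 1)) follows the form given by Huang et al. (2005); treat the exact formula as an assumption, and note it requires beta > 1 and strictly positive per-feature dispersions:

```python
import numpy as np

def update_weights(X, labels, centroids, beta):
    """One WK-Means weight update (rule in the style of Huang et al., 2005).

    D_v is the within-cluster dispersion of feature v; features that are
    tight within clusters (small D_v) receive large weights.
    """
    M = X.shape[1]
    D = np.zeros(M)
    for j in np.unique(labels):
        D += ((X[labels == j] - centroids[j]) ** 2).sum(axis=0)
    w = (1.0 / D) ** (1.0 / (beta - 1))
    return w / w.sum()   # non-negative and summing to one

# Feature 0 separates the clusters cleanly; feature 1 is noisy within clusters.
X = np.array([[0.0, 0.0], [1.0, 4.0], [10.0, 0.0], [11.0, 4.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.5, 2.0], [10.5, 2.0]])
w = update_weights(X, labels, centroids, beta=2.0)
print(np.round(w, 3))  # feature 0 gets the larger weight
```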
5.5 Conclusion: Hence we studied how to implement K-Means clustering.