Clustering
Ms. Rashmi Bhat
What is Clustering??
 Grouping of objects
How will you group these together??
Option 1: By Type
Option 2: By Color
Option 3: By Shape
What is Cluster Analysis??
The process of grouping a set of physical or abstract objects into classes of
similar objects is called clustering.
 A cluster is a collection of data objects that are similar to one another
within the same cluster and dissimilar to the objects in other clusters.
 Cluster analysis has traditionally focused mainly on distance-based
methods.
What is Cluster Analysis??
 How does clustering differ from classification?
What is Cluster Analysis??
 Clustering is also called data segmentation.
 Clustering is finding borders between groups, while segmenting is using
the borders to form groups.
 Clustering is the method of creating segments.
 Clustering can also be used for outlier detection.
What is Cluster Analysis??
 Classification: Supervised Learning
 Classes are predetermined
 Based on training data set
 Used to classify future observations

 Clustering: Unsupervised Learning
 Classes are not known in advance
 No prior knowledge
 Used to explore (understand) the data
 Clustering is a form of learning by observation, rather than learning by
examples.
Applications of Clustering
 Marketing:
 Segmentation of customers based on behavior
 Banking:
 ATM Fraud detection (outlier detection)
 Gene analysis:
 Identifying gene responsible for a disease
 Image processing:
 Identifying objects on an image (face detection)
 Houses:
 Identifying groups of houses according to their house type, value, and geographical location
Requirements of Clustering Analysis
 The following are typical requirements of clustering in data mining:
 Scalability
 Dealing with different types of attributes
 Discovering clusters with arbitrary shapes
 Ability to deal with noisy data
 Minimal requirements for domain knowledge to determine input parameters
 Incremental clustering
 High dimensionality
 Constraint-based clustering
 Interpretability and usability
Distance Measures
 Cluster analysis has traditionally focused mainly on distance-based
methods.
 Distance is defined as a quantitative measure of how far apart two objects are.
 The similarity measure is the measure of how much alike two data objects
are.
 If the distance is small, the two objects have a high degree of similarity,
whereas a large distance indicates a low degree of similarity.
 Generally, similarity is measured in the range 0 to 1, i.e. [0, 1]:
 Similarity = 1 if X = Y
 Similarity = 0 if X ≠ Y
(Where X, Y are two objects)
Distance Measures
Euclidean Distance
Manhattan Distance
Minkowski Distance
Cosine Similarity
Jaccard Similarity
Distance Measures
• The Euclidean distance between two points is the length of
the path connecting them.
• The Pythagorean theorem gives this distance between two
points:
D(X, Y) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
Distance Measures
• Manhattan distance is a metric in which the distance
between two points is calculated as the sum of the
absolute differences of their Cartesian coordinates.
• It is the total sum of the absolute differences between the
x-coordinates and the y-coordinates.
D(A, B) = |x_2 - x_1| + |y_2 - y_1|
Distance Measures
• It is the generalized form of the Euclidean and Manhattan
distance measures.
D(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} = \sqrt[p]{\sum_{i=1}^{n} |x_i - y_i|^p}
Distance Measures
• The cosine similarity metric finds the normalized dot
product of the two attributes.
• By determining the cosine similarity, we would
effectively try to find the cosine of the angle between
the two objects.
• The cosine of 0° is 1, and it is less than 1 for any
other angle.
Distance Measures
• When we consider Jaccard similarity, the objects are
treated as sets.
• Example: |A ∪ B| = 7 and |A ∩ B| = 2, so
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{7} = 0.286
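As a small illustrative sketch (not part of the original slides; the function names here are chosen only for illustration), the five measures above can be computed in Python with NumPy. The Jaccard call reuses the set example with |A ∩ B| = 2 and |A ∪ B| = 7.

import numpy as np

def euclidean(x, y):
    # square root of the sum of squared coordinate differences
    return float(np.sqrt(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)))

def manhattan(x, y):
    # sum of absolute coordinate differences
    return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float))))

def minkowski(x, y, p):
    # generalizes Manhattan (p = 1) and Euclidean (p = 2)
    return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float)) ** p) ** (1.0 / p))

def cosine_similarity(x, y):
    # normalized dot product = cosine of the angle between the two vectors
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def jaccard_similarity(a, b):
    # |A intersection B| / |A union B| for two sets
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(euclidean((2, 5), (2, 1)))                                    # 4.0
print(manhattan((2, 6), (3, 8)))                                    # 3.0
print(round(jaccard_similarity({1, 2, 3, 4, 5}, {4, 5, 6, 7}), 3))  # 0.286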
Clustering Techniques
 Clustering techniques are categorized into the following categories:
Partitioning Methods
Hierarchical Methods
Density-based Methods
Grid-based Methods
Model-based Methods
Partitioning Method
 Construct a partition of a database 𝑫 of 𝒏 objects into 𝒌 clusters
 each cluster contains at least one object
 each object belongs to exactly one cluster
 Given a 𝒌, find a partition of 𝒌 clusters that optimizes the chosen
partitioning criterion (min distance from cluster centers)
 Global optimum: exhaustively enumerate all partitions; there are Stirling(n, k) of them
(S(10,3) = 9,330, S(20,3) = 580,606,446, …)
 Heuristic methods: k-means and k-medoids algorithms
 k-means: Each cluster is represented by the center of the cluster.
 k-medoids or PAM (Partition around medoids): Each cluster is represented by one of
the objects in the cluster.
𝑘-means Clustering
Input:
𝒌, the number of clusters, and a database 𝑫 of 𝒏 objects.
Output:
A set of 𝒌 clusters that minimizes the squared-error function.
Algorithm:
1. Arbitrarily choose 𝒌 objects from 𝑫 as the initial cluster centers;
2. Repeat
1. (Re)assign each object to the cluster to which the object is the most similar, based on
the mean value of the objects in the cluster;
2. Update the cluster means, i.e., calculate the mean value of the objects for each cluster;
3. Until no change;
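A minimal sketch of this algorithm in Python/NumPy is shown below (the function name k_means and its signature are assumptions for illustration, not the lecturer's code). It follows the three steps above and assumes no cluster ever becomes empty.

import numpy as np

def k_means(points, initial_centers, max_iter=100):
    points = np.asarray(points, dtype=float)
    centers = np.asarray(initial_centers, dtype=float)
    for _ in range(max_iter):
        # Step 2.1: (re)assign each object to its nearest cluster center (Euclidean distance)
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 2.2: update the cluster means (assumes no cluster becomes empty)
        new_centers = np.array([points[labels == j].mean(axis=0)
                                for j in range(len(centers))])
        # Step 3: stop when the cluster means no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers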
𝑘-means Clustering
Example: Cluster the following data set into 3 clusters using k-means clustering and Euclidean
distance.
Point   X   Y
P1      2   5
P2      2   1
P3      7   1
P4      3   5
P5      4   4
P6      6   2
P7      1   2
P8      6   1
P9      3   4
P10     2   3
𝑘-means Clustering
1. Arbitrarily choose 3 points as the initial cluster centers:
C1 = (2,1)
C2 = (4,4)
C3 = (2,3)
𝑘-means Clustering
2. Assign each point to its closest cluster center: calculate the distance of the point from each
cluster center and choose the closest one.
C1 = (2,1)
C2 = (4,4)
C3 = (2,3)
D(X, Y) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}   .... Euclidean distance

D(P1, C1) = \sqrt{(2 - 2)^2 + (1 - 5)^2} = \sqrt{16} = 4
D(P1, C2) = \sqrt{(4 - 2)^2 + (4 - 5)^2} = \sqrt{5} = 2.236
D(P1, C3) = \sqrt{(2 - 2)^2 + (3 - 5)^2} = \sqrt{4} = 2
P1 is closest to C3:
Cluster1 = { }
Cluster2 = { }
Cluster3 = {(2,5)}

D(P2, C1) = \sqrt{(2 - 2)^2 + (1 - 1)^2} = 0
D(P2, C2) = \sqrt{(4 - 2)^2 + (4 - 1)^2} = \sqrt{13} = 3.605
D(P2, C3) = \sqrt{(2 - 2)^2 + (3 - 1)^2} = \sqrt{4} = 2
P2 is closest to C1:
Cluster1 = {(2,1)}
Cluster2 = { }
Cluster3 = {(2,5)}
Similarly, assign the other points to their closest cluster centers.
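As a quick numeric check of the distances computed above for P1 = (2, 5) (illustrative only, not from the slides):

import numpy as np

p1 = np.array([2.0, 5.0])
for name, center in [("C1", (2, 1)), ("C2", (4, 4)), ("C3", (2, 3))]:
    # Euclidean distance of P1 to each initial cluster center
    print(name, round(float(np.linalg.norm(p1 - np.array(center, dtype=float))), 3))
# C1 4.0, C2 2.236, C3 2.0  ->  P1 is assigned to Cluster3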
After assigning all ten points, the clusters at the end of the first pass are:
Cluster1 = {(2,1), (1,2)}
Cluster2 = {(4,4), (7,1), (3,5), (6,2), (6,1), (3,4)}
Cluster3 = {(2,3), (2,5)}
𝑘-means Clustering
3. Update the cluster means
Old Cluster Centers:
C1 = (2,1)
C2 = (4,4)
C3 = (2,3)
Clusters:
Cluster1 = {(2,1), (1,2)}
Cluster2 = {(4,4),(7,1), (3,5), (6,2), (6,1), (3,4)}
Cluster3 = {(2,3),(2,5)}
Calculate the mean of the points in each cluster
mean1 = ((2 + 1)/2, (1 + 2)/2) = (1.5, 1.5)
mean2 = ((4 + 7 + 3 + 6 + 6 + 3)/6, (4 + 1 + 5 + 2 + 1 + 4)/6) = (4.83, 2.83)
mean3 = ((2 + 2)/2, (3 + 5)/2) = (2, 4)
New Cluster Centers:
C1 = (1.5, 1.5)
C2 = (4.83, 2.83)
C3 = (2, 4)
𝑘-means Clustering
2. Repeat: reassign each point to its closest cluster center, using the updated cluster centers and
Euclidean distance.
Updated Cluster Centers:
C1 = (1.5, 1.5)
C2 = (4.83, 2.83)
C3 = (2, 4)
D(P1, C1) = \sqrt{(1.5 - 2)^2 + (1.5 - 5)^2} = 3.535
D(P1, C2) = \sqrt{(4.83 - 2)^2 + (2.83 - 5)^2} = 3.566
D(P1, C3) = \sqrt{(2 - 2)^2 + (4 - 5)^2} = 1
P1 is closest to C3:
Cluster1 = { }
Cluster2 = { }
Cluster3 = {(2,5)}

D(P2, C1) = \sqrt{(1.5 - 2)^2 + (1.5 - 1)^2} = 0.707
D(P2, C2) = \sqrt{(4.83 - 2)^2 + (2.83 - 1)^2} = \sqrt{11.358} = 3.370
D(P2, C3) = \sqrt{(2 - 2)^2 + (4 - 1)^2} = 3
P2 is closest to C1:
Cluster1 = {(2,1)}
Cluster2 = { }
Cluster3 = {(2,5)}

Similarly, assign the other points to their closest cluster centers.
Updated Clusters
Cluster1 = {(2,1), (1,2) }
Cluster2 = {(7,1), (4,4), (6,2), (6,1)}
Cluster3 = {(3,5), (2,5), (3,4), (2,3)}
3. Update the cluster centers and repeat the process until there is no change in
the clusters.
Old Cluster Centers:
C1 = (1.5, 1.5)
C2 = (4.83, 2.83)
C3 = (2, 4)
New Cluster Centers (means of the updated clusters):
C1 = (1.5, 1.5)
C2 = (5.75, 2)
C3 = (2.5, 4.25)
2. Reassigning the points to these updated centers moves (4,4) from Cluster2 to
Cluster3:
Cluster1 = {(2,1), (1,2)}
Cluster2 = {(7,1), (6,2), (6,1)}
Cluster3 = {(3,5), (2,5), (4,4), (3,4), (2,3)}
3. Update the cluster centers again:
Old Cluster Centers:
C1 = (1.5, 1.5)
C2 = (5.75, 2)
C3 = (2.5, 4.25)
New Cluster Centers:
C1 = (1.5, 1.5)
C2 = (6.33, 1.33)
C3 = (2.8, 4.2)
Repeating the assignment step with these centers leaves all three clusters
unchanged, so the algorithm terminates with the final clusters:
Cluster1 = {(2,1), (1,2)}
Cluster2 = {(7,1), (6,2), (6,1)}
Cluster3 = {(3,5), (2,5), (4,4), (3,4), (2,3)}
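As an illustrative check (not part of the slides), one more assignment pass with the final centers reproduces the same three clusters, confirming that k-means has converged:

import numpy as np

points = np.array([(2, 5), (2, 1), (7, 1), (3, 5), (4, 4),
                   (6, 2), (1, 2), (6, 1), (3, 4), (2, 3)], dtype=float)
centers = np.array([(1.5, 1.5), (6.33, 1.33), (2.8, 4.2)])

# one more assignment pass with the final centers
labels = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
for j in range(3):
    print("Cluster%d:" % (j + 1), [tuple(int(v) for v in p) for p in points[labels == j]])
# Cluster1: [(2, 1), (1, 2)]
# Cluster2: [(7, 1), (6, 2), (6, 1)]
# Cluster3: [(2, 5), (3, 5), (4, 4), (3, 4), (2, 3)]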
𝑘-means Clustering
Apply k-means algorithm for the following data set with two
clusters.
D={15, 16, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}
𝑘-means Clustering
 Advantages:
 Relatively scalable and efficient in processing large data sets
 The computational complexity of the algorithm is 𝑂 𝑛𝑘𝑡
 where 𝑛 is the total number of objects, 𝑘 is the number of clusters, and 𝑡 is the number of iterations
 This method terminates at a local optimum.
 Disadvantages:
 Can be applied only when the mean of a cluster is defined
 The necessity for users to specify 𝑘, the number of clusters, in advance.
 Sensitive to noise and outlier data points
𝑘-means Clustering
 How to cluster categorical data?
 Variant of 𝑘-means is used for clustering categorical data: 𝑘-modes Method
 Replace mean of cluster with mode of data
 A new dissimilarity measure to deal with categorical objects
 A frequency-based method to update the modes of clusters (sketched below)
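A rough sketch of these two ingredients (the helper names and the toy categorical objects are assumptions for illustration, not the slides' code):

from collections import Counter

def matching_dissimilarity(x, y):
    # number of attributes on which two categorical objects differ
    return sum(a != b for a, b in zip(x, y))

def cluster_mode(objects):
    # per attribute, take the most frequent category among the cluster's objects
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))

print(matching_dissimilarity(("red", "S", "cotton"), ("red", "M", "wool")))  # 2
print(cluster_mode([("red", "S"), ("red", "M"), ("blue", "M")]))             # ('red', 'M')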
𝑘-Medoids Clustering
 Picks actual objects to represent the clusters, using one representative object
per cluster
 Each remaining object is clustered with the representative object to which it is
the most similar.
 The partitioning is then performed based on the principle of minimizing
the sum of the dissimilarities between each object and its corresponding
reference point.
 The absolute-error criterion is used:
E = \sum_{j=1}^{k} \sum_{p \in c_j} |p - O_j|
Where
• 𝑝 is the point in space representing
a given object in cluster 𝑐𝑗
• 𝑂𝑗 is the representative object of
cluster 𝑐𝑗
𝑘-Medoids Clustering
 The iterative process of replacing representative objects by nonrepresentative
objects continues as long as the quality of the resulting clustering is improved.
 A cost function measures the average dissimilarity between an object and the
representative object of its cluster.
 Four cases are examined for each of the nonrepresentative objects, 𝑝, as described below.
Fig. Four cases of the cost function for k-medoids clustering, when a representative
object Oj is considered for replacement by Orandom:
Case 1: p currently belongs to Oj; after the swap, p is closest to another
representative object Oi, so p is reassigned to Oi.
Case 2: p currently belongs to Oj; after the swap, p is closest to Orandom, so p is
reassigned to Orandom.
Case 3: p currently belongs to another representative object Oi and remains closest
to Oi, so its assignment does not change.
Case 4: p currently belongs to another representative object Oi, but after the swap
is closest to Orandom, so p is reassigned to Orandom.
𝑘-Medoids Clustering
 Each time a reassignment occurs, a difference in absolute error, 𝐸, is
contributed to the cost function.
 Therefore, the cost function calculates the difference in absolute-error value if
a current representative object is replaced by a nonrepresentative object.
 The total cost of swapping is the sum of costs incurred by all nonrepresentative
objects.
 If the total cost is negative, then 𝑂𝑗 is replaced or swapped with 𝑂𝑟𝑎𝑛𝑑𝑜𝑚
 If the total cost is positive, the current representative object, 𝑂𝑗 , is considered acceptable, and
nothing is changed.
 PAM (Partitioning Around Medoids) was one of the first k-medoids algorithms.
𝑘-Medoids Clustering
Input: 𝑘, the number of clusters, and a data set 𝐷 of 𝑛 data objects
Output: a set of 𝑘 clusters
Algorithm:
1. Arbitrarily select 𝑘 objects as the representative objects or seeds
2. Repeat
1. Assign each remaining object to the cluster of its nearest representative object
2. Randomly select a non-representative object 𝑂𝑟𝑎𝑛𝑑𝑜𝑚
3. Compute the total cost 𝑆 of swapping a representative object 𝑂𝑗 with 𝑂𝑟𝑎𝑛𝑑𝑜𝑚
4. If 𝑆 < 0, then swap 𝑂𝑗 with 𝑂𝑟𝑎𝑛𝑑𝑜𝑚 to form the new set of k representative objects
3. Until no change
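A compact sketch of the two quantities PAM works with, the absolute error of a medoid set and the cost of a candidate swap (assumed function names, Manhattan distance as in the worked example that follows; not the lecturer's code):

import numpy as np

def manhattan(p, q):
    return float(np.sum(np.abs(np.asarray(p, float) - np.asarray(q, float))))

def absolute_error(objects, medoids):
    # E: each object contributes its distance to the nearest medoid
    return sum(min(manhattan(p, m) for m in medoids) for p in objects)

def swap_cost(objects, medoids, out_medoid, candidate):
    # S = E(after the swap) - E(before the swap); the swap is accepted only if S < 0
    new_medoids = [candidate if m == out_medoid else m for m in medoids]
    return absolute_error(objects, new_medoids) - absolute_error(objects, medoids)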
𝑘-Medoids Clustering
Data Objects
Object   X   Y
O1       2   6
O2       3   4
O3       3   8
O4       4   7
O5       6   2
O6       6   4
O7       7   3
O8       7   4
O9       8   5
O10      7   6
Aim: Create two Clusters
Step 1:
Choose randomly two medoids
(representative objects)
𝑂3 = 3,8
𝑂8 = (7,4)
𝑘-Medoids Clustering
Data Objects
Object   X   Y   Cluster
O1       2   6   C1
O2       3   4   C1
O3       3   8   C1
O4       4   7   C1
O5       6   2   C2
O6       6   4   C2
O7       7   3   C2
O8       7   4   C2
O9       8   5   C2
O10      7   6   C2
Aim: Create two Clusters
Step 2:
Assign each object to the closest
representative object
Using Euclidean distance, we form the following clusters:
C1={O1, O2, O3, O4}
C2={O5, O6, O7, O8, O9, O10}
Step 3:
Compute the absolute error (for the set of representative objects 𝑂3 and 𝑂8 )
E = \sum_{j=1}^{k} \sum_{p \in c_j} |p - O_j|
E = (|O1 - O3| + |O2 - O3| + |O3 - O3| + |O4 - O3|)
  + (|O5 - O8| + |O6 - O8| + |O7 - O8| + |O8 - O8| + |O9 - O8| + |O10 - O8|)
where |O1 - O3| = |x_1 - x_3| + |y_1 - y_3|   .... Manhattan distance
E = (3 + 4 + 0 + 2) + (3 + 1 + 1 + 0 + 2 + 2) = 18
Step 4:
Choose a random non-representative object, O9, as a candidate to replace O8.
Compute the absolute error for the candidate set of representative objects O3 and O9.
E = (|O1 - O3| + |O2 - O3| + |O3 - O3| + |O4 - O3|)
  + (|O5 - O9| + |O6 - O9| + |O7 - O9| + |O8 - O9| + |O9 - O9| + |O10 - O9|)
E = (3 + 4 + 0 + 2) + (5 + 3 + 3 + 2 + 0 + 2)
E = 24
Step 5:
Compute the cost of the swap as the difference in absolute error:
S = (Absolute error for O3, O9) - (Absolute error for O3, O8)
S = 24 - 18 = 6
As S > 0, replacing O8 with O9 would increase the absolute error, so the swap is
rejected and O8 remains a representative object.
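An illustrative check of these numbers (not from the slides), using the Manhattan-distance absolute error for the medoid sets {O3, O8} and {O3, O9}:

objects = {"O1": (2, 6), "O2": (3, 4), "O3": (3, 8), "O4": (4, 7), "O5": (6, 2),
           "O6": (6, 4), "O7": (7, 3), "O8": (7, 4), "O9": (8, 5), "O10": (7, 6)}

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def absolute_error(medoid_names):
    # each object contributes its Manhattan distance to the nearest medoid
    return sum(min(manhattan(p, objects[m]) for m in medoid_names)
               for p in objects.values())

e_before = absolute_error(["O3", "O8"])       # 18
e_after = absolute_error(["O3", "O9"])        # 24
print(e_before, e_after, e_after - e_before)  # 18 24 6 -> positive cost, keep O8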
Step 6:
Since the swap was rejected, the representative objects remain O3 and O8.
The algorithm continues by randomly selecting other non-representative objects,
computing the swap cost, and repeating Steps 2 to 5 until no swap reduces the
absolute error.
𝑘-Medoids Clustering
 Which method is more robust 𝑘-Means or 𝑘-Medoids?
 The k-medoids method is more robust than k-means in the presence of noise and outliers,
because a medoid is less influenced by outliers or other extreme values than a mean.
 However, the processing of the k-medoids method is more costly than that of the k-means method.
Hierarchical Clustering
 Groups data objects into a tree of clusters.
Hierarchical clustering methods: Agglomerative and Divisive
Hierarchical Clustering
 Agglomerative Hierarchical Clustering
 Starts by placing each object in its own cluster
 Merges these atomic clusters into larger and larger clusters
 It will halt when all of the objects are in a single cluster or until certain termination
conditions are satisfied.
 Bottom-Up Strategy.
 The user can specify the desired number of clusters as a termination condition.
Hierarchical Clustering
Application of Agglomerative NESting
(AGNES) Hierarchical Clustering
Fig. AGNES on the objects {A, B, C, D, E, F, G}: starting from singleton clusters at
step 0, successive steps merge A and B into AB, C and D into CD, CD and E into CDE,
AB and F into ABF, CDE and G into CDEG, and finally ABF and CDEG into the single
cluster ABFCDEG at step 4.
Hierarchical Clustering
 Divisive Hierarchical Clustering Method
 Starting with all objects in one cluster.
 Subdivides the cluster into smaller and smaller pieces.
 It will halt when each object forms a cluster on its own or until it satisfies certain termination
conditions
 Top-Down Strategy
 The user can specify the desired number of clusters as a termination condition.
Hierarchical Clustering
Application of DIvisive ANAlysis
(DIANA) Hierarchical Clustering
Fig. DIANA on the objects {A, B, C, D, E, F, G}: starting from the single cluster
ABFCDEG at step 0, successive steps split it into ABF and CDEG, then further into
AB, F, CDE and G, then into CD and E, and finally into the singleton clusters
A, B, C, D, E, F, G.
Hierarchical Clustering
 A tree structure called a dendrogram is used to represent the process of
hierarchical clustering.
Fig. Dendrogram representation for hierarchical clustering of data objects {a, b, c, d, e}
Hierarchical Clustering
 Four widely used measures for the distance between clusters, where
 |p - p'| is the distance between two objects p and p',
 m_i is the mean of cluster C_i, and
 n_i is the number of objects in cluster C_i.
Minimum distance:  d_{min}(C_i, C_j) = \min_{p \in C_i, p' \in C_j} |p - p'|
Maximum distance:  d_{max}(C_i, C_j) = \max_{p \in C_i, p' \in C_j} |p - p'|
Mean distance:     d_{mean}(C_i, C_j) = |m_i - m_j|
Average distance:  d_{avg}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{p \in C_i} \sum_{p' \in C_j} |p - p'|
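An illustrative sketch (not from the slides) that computes all four measures for two small clusters of 2-D points:

import numpy as np

ci = np.array([(2, 1), (1, 2)], dtype=float)            # cluster Ci
cj = np.array([(6, 1), (7, 1), (6, 2)], dtype=float)    # cluster Cj

pairwise = np.linalg.norm(ci[:, None, :] - cj[None, :, :], axis=2)  # all |p - p'|

d_min = pairwise.min()                                       # minimum distance
d_max = pairwise.max()                                       # maximum distance
d_mean = np.linalg.norm(ci.mean(axis=0) - cj.mean(axis=0))   # distance between the means
d_avg = pairwise.mean()                                      # average over all pairs
print(d_min, d_max, d_mean, d_avg)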
Hierarchical Clustering
 If an algorithm uses the minimum distance measure, it is called a
nearest-neighbor clustering algorithm.
 If the clustering process is terminated when the minimum distance between
nearest clusters exceeds an arbitrary threshold, it is called a single-linkage
algorithm.
 If an algorithm uses the maximum distance measure, it is called a
farthest-neighbor clustering algorithm.
 If the clustering process is terminated when the maximum distance between
nearest clusters exceeds an arbitrary threshold, it is called a complete-linkage
algorithm.
 An agglomerative hierarchical clustering algorithm that uses the minimum
distance measure is also called a minimal spanning tree algorithm.
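As a hedged end-to-end example (assuming SciPy is available; not part of the slides), the ten points from the k-means example can be clustered hierarchically with single linkage (minimum distance) and complete linkage (maximum distance), and the resulting merge tree cut into three flat clusters:

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

points = np.array([(2, 5), (2, 1), (7, 1), (3, 5), (4, 4),
                   (6, 2), (1, 2), (6, 1), (3, 4), (2, 3)], dtype=float)

for method in ("single", "complete"):
    Z = linkage(points, method=method)               # bottom-up merge tree (dendrogram data)
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 flat clusters
    print(method, labels)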