Clustering
Ms. Rashmi Bhat

What is Clustering?
• Clustering is the grouping of objects.
• How would you group these objects together?
  - Option 1: by type
  - Option 2: by color
  - Option 3: by shape

What is Cluster Analysis?
• The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.
• A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.
• Cluster analysis has focused mainly on distance-based methods.

How does clustering differ from classification?
• Clustering is also called data segmentation: clustering finds the borders between groups, while segmentation uses those borders to form groups. Clustering is thus the method of creating segments.
• Clustering can also be used for outlier detection.
• Classification (supervised learning):
  - Classes are predetermined.
  - A model is built from a training data set.
  - Used to classify future observations.
• Clustering (unsupervised learning):
  - Classes are not known in advance; there is no prior knowledge.
  - Used to explore and understand the data.
• Clustering is a form of learning by observation rather than learning by examples.

Applications of Clustering
• Marketing: segmenting customers based on behavior
• Banking: ATM fraud detection (outlier detection)
• Gene analysis: identifying genes responsible for a disease
• Image processing: identifying objects in an image (e.g., face detection)
• Housing: grouping houses by house type, value, and geographical location

Requirements of Clustering Analysis
Typical requirements of clustering in data mining:
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shapes
• Ability to deal with noisy data
• Minimal requirements for domain knowledge to determine input parameters
• Incremental clustering
• Ability to handle high dimensionality
• Constraint-based clustering
• Interpretability and usability

Distance Measures
• Distance is a quantitative measure of how far apart two objects are; a similarity measure expresses how alike two data objects are.
• A small distance corresponds to a high degree of similarity, whereas a large distance corresponds to a low degree of similarity.
• Similarity is generally measured in the range [0, 1]:
  - Similarity = 1 if X = Y
  - Similarity = 0 if X ≠ Y (where X and Y are two objects)
• Common measures: Euclidean distance, Manhattan distance, Minkowski distance, cosine similarity, Jaccard similarity.

Euclidean Distance
• The Euclidean distance between two points is the length of the straight-line path connecting them, given by the Pythagorean theorem:
  D(X, Y) = √[(x₂ − x₁)² + (y₂ − y₁)²]

Manhattan Distance
• The Manhattan distance between two points is the sum of the absolute differences of their Cartesian coordinates, i.e., the difference between the x-coordinates plus the difference between the y-coordinates:
  D(A, B) = |x₂ − x₁| + |y₂ − y₁|

Minkowski Distance
• The Minkowski distance is the generalized form of the Euclidean and Manhattan distances:
  D(X, Y) = ( Σᵢ₌₁ⁿ |xᵢ − yᵢ|ᵖ )^(1/p)
  (p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.)

Cosine Similarity
• The cosine similarity metric is the normalized dot product of the two attribute vectors.
• Computing the cosine similarity effectively finds the cosine of the angle between the two objects: the cosine of 0° is 1, and it is less than 1 for any other angle.
  cos(X, Y) = (X · Y) / (‖X‖ ‖Y‖)

Jaccard Similarity
• For Jaccard similarity, the objects are treated as sets.
• Example: if |A ∪ B| = 7 and |A ∩ B| = 2, then
  J(A, B) = |A ∩ B| / |A ∪ B| = 2/7 ≈ 0.286
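These measures are straightforward to compute directly. The following minimal Python sketch (the function names are my own, not from the slides or any library) implements all five and reproduces the Jaccard example of 2/7 ≈ 0.286.

```python
import math

def euclidean(x, y):
    """Straight-line (L2) distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def minkowski(x, y, p):
    """Generalized distance: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

def cosine_similarity(x, y):
    """Normalized dot product of two vectors (cosine of the angle between them)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi ** 2 for xi in x))
    norm_y = math.sqrt(sum(yi ** 2 for yi in y))
    return dot / (norm_x * norm_y)

def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two Python sets."""
    return len(a & b) / len(a | b)

print(euclidean((2, 2), (2, 5)))    # 3.0
print(manhattan((2, 6), (3, 8)))    # 3
print(round(jaccard_similarity({1, 2, 3, 4}, {3, 4, 5, 6, 7}), 3))  # 2/7 -> 0.286
```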
Clustering Techniques
Clustering techniques are categorized as follows:
• Partitioning methods
• Hierarchical methods
• Density-based methods
• Grid-based methods
• Model-based methods

Partitioning Methods
• Construct a partition of a database D of n objects into k clusters such that
  - each cluster contains at least one object, and
  - each object belongs to exactly one cluster.
• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion (e.g., minimum distance from the cluster centers).
• Global optimum: exhaustively enumerating all partitions is infeasible — the number of partitions is the Stirling number S(n, k), e.g., S(10, 3) = 9,330 and S(20, 3) = 580,606,446.
• Heuristic methods: the k-means and k-medoids algorithms.
  - k-means: each cluster is represented by the center (mean) of the cluster.
  - k-medoids or PAM (Partitioning Around Medoids): each cluster is represented by one of the objects in the cluster.

k-means Clustering
Input: k, the number of clusters; a database D of n objects.
Output: a set of k clusters minimizing the squared-error function.
Algorithm:
1. Arbitrarily choose k objects from D as the initial cluster centers.
2. Repeat:
   a. (Re)assign each object to the cluster to which it is most similar, based on the mean value of the objects in the cluster.
   b. Update the cluster means, i.e., recalculate the mean value of the objects in each cluster.
3. Until no change.
A minimal implementation sketch of this loop is given below.
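The sketch below is my own minimal Python rendering of the loop above (the `kmeans` helper name and structure are not from the slides). Run on the ten example points and initial centers used in the worked example that follows, it converges to the same three clusters derived there by hand.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain k-means: assign each point to its nearest center, then move each
    center to the mean of its assigned points; stop when assignments stop changing."""
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:      # no change -> converged
            break
        assignment = new_assignment
        # Update step: each center becomes the mean of its cluster's points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                        # keep the old center if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment

# The ten example points and the initial centers chosen in the worked example below.
pts = [(2, 5), (2, 1), (7, 1), (3, 5), (4, 4), (6, 2), (1, 2), (6, 1), (3, 4), (2, 3)]
cts = [(2, 1), (4, 4), (2, 3)]
centers, assignment = kmeans(pts, cts)
print(centers)      # ~[(1.5, 1.5), (6.33, 1.33), (2.8, 4.2)]
print(assignment)   # cluster index of each point
```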
k-means Clustering: Worked Example
Cluster the following data into 3 clusters using k-means and the Euclidean distance.

Point | X | Y
P1    | 2 | 5
P2    | 2 | 1
P3    | 7 | 1
P4    | 3 | 5
P5    | 4 | 4
P6    | 6 | 2
P7    | 1 | 2
P8    | 6 | 1
P9    | 3 | 4
P10   | 2 | 3

Step 1: Arbitrarily choose 3 points as the initial cluster centers:
  C1 = (2, 1), C2 = (4, 4), C3 = (2, 3)

Step 2: Assign each point to its closest cluster center, computing the Euclidean distance
  D = √[(x₂ − x₁)² + (y₂ − y₁)²]
from the point to each center and choosing the smallest. For example:
  D(P1, C1) = √[(2 − 2)² + (1 − 5)²] = √16 = 4
  D(P1, C2) = √[(4 − 2)² + (4 − 5)²] = √5 ≈ 2.236
  D(P1, C3) = √[(2 − 2)² + (3 − 5)²] = √4 = 2      → P1 joins Cluster 3
  D(P2, C1) = √[(2 − 2)² + (1 − 1)²] = 0           → P2 joins Cluster 1
  D(P2, C2) = √[(4 − 2)² + (4 − 1)²] = √13 ≈ 3.606
  D(P2, C3) = √[(2 − 2)² + (3 − 1)²] = √4 = 2
Assigning the remaining points in the same way gives:
  Cluster 1 = {(2,1), (1,2)}
  Cluster 2 = {(4,4), (7,1), (3,5), (6,2), (6,1), (3,4)}
  Cluster 3 = {(2,5), (2,3)}

Step 3: Update the cluster means, i.e., compute the mean of the points in each cluster:
  mean1 = ((2 + 1)/2, (1 + 2)/2) = (1.5, 1.5)
  mean2 = ((4 + 7 + 3 + 6 + 6 + 3)/6, (4 + 1 + 5 + 2 + 1 + 4)/6) = (4.83, 2.83)
  mean3 = ((2 + 2)/2, (5 + 3)/2) = (2, 4)
New cluster centers: C1 = (1.5, 1.5), C2 = (4.83, 2.83), C3 = (2, 4)

Iteration 2: Reassign each point to its closest updated center, e.g.:
  D(P1, C1) = √[(1.5 − 2)² + (1.5 − 5)²] ≈ 3.536
  D(P1, C2) = √[(4.83 − 2)² + (2.83 − 5)²] ≈ 3.566
  D(P1, C3) = √[(2 − 2)² + (4 − 5)²] = 1           → P1 stays in Cluster 3
  D(P2, C1) = √[(1.5 − 2)² + (1.5 − 1)²] ≈ 0.707   → P2 stays in Cluster 1
  D(P2, C2) = √[(4.83 − 2)² + (2.83 − 1)²] ≈ 3.370
  D(P2, C3) = √[(2 − 2)² + (4 − 1)²] = 3
Assigning all points gives the updated clusters:
  Cluster 1 = {(2,1), (1,2)}
  Cluster 2 = {(7,1), (4,4), (6,2), (6,1)}
  Cluster 3 = {(3,5), (2,5), (3,4), (2,3)}
and updating the means gives the new centers:
  C1 = (1.5, 1.5), C2 = (5.75, 2), C3 = (2.5, 4.25)

Iteration 3: Reassigning the points to these centers gives:
  Cluster 1 = {(2,1), (1,2)}
  Cluster 2 = {(7,1), (6,2), (6,1)}
  Cluster 3 = {(3,5), (2,5), (4,4), (3,4), (2,3)}
with updated centers:
  C1 = (1.5, 1.5), C2 = (6.33, 1.33), C3 = (2.8, 4.2)
A further pass produces no change in the assignments, so the algorithm terminates with these three clusters.

k-means Clustering: Exercise
Apply the k-means algorithm with two clusters to the data set
  D = {15, 16, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}.
One way to check your answer is sketched below.
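For the one-dimensional exercise, the `kmeans` sketch from the earlier code block can be reused by treating each value as a 1-tuple. The initial centers below are an arbitrary choice of mine (the exercise does not fix them), so a different seeding may give a different split.

```python
# Reuses the kmeans() sketch defined in the earlier code block.
data = [15, 16, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65]
pts_1d = [(v,) for v in data]
init = [(15,), (40,)]          # arbitrary initial centers; the exercise leaves this open
centers, assignment = kmeans(pts_1d, init)
for j, c in enumerate(centers):
    members = [v for v, a in zip(data, assignment) if a == j]
    print(f"center {c[0]:.2f}: {members}")
# With these seeds the data splits at the large gap between 28 and 35.
```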
k-means Clustering: Properties
Advantages:
• Relatively scalable and efficient in processing large data sets.
• The computational complexity of the algorithm is O(nkt), where n is the total number of objects, k is the number of clusters, and t is the number of iterations.
• Guaranteed to terminate, although it often converges only to a local optimum.
Disadvantages:
• Can be applied only when the mean of a cluster is defined.
• The user must specify k, the number of clusters, in advance.
• Sensitive to noise and outlier data points.

k-modes: Clustering Categorical Data
How can categorical data be clustered? A variant of k-means, the k-modes method, is used:
• Replace the mean of a cluster with the mode of the data.
• Use a new dissimilarity measure suited to categorical objects.
• Use a frequency-based method to update the modes of the clusters.
A small sketch of this idea follows.
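The following is a minimal sketch of the k-modes idea, assuming the simple-matching dissimilarity (the count of attribute positions where two objects differ) as the categorical dissimilarity measure; the records, helper names, and initial modes are illustrative only, not taken from the slides.

```python
from collections import Counter

def matching_dissimilarity(x, y):
    """Simple matching: number of attribute positions where the two objects differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def update_mode(objects):
    """Mode of a cluster: the most frequent category in each attribute position."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*objects))

# Toy categorical records: (color, size, shape)
records = [("red", "small", "round"), ("red", "small", "square"),
           ("blue", "large", "round"), ("blue", "large", "square"),
           ("red", "large", "round")]
modes = [records[0], records[2]]          # arbitrary initial modes (k = 2)
for _ in range(10):
    clusters = [[], []]
    for r in records:
        nearest = min((0, 1), key=lambda j: matching_dissimilarity(r, modes[j]))
        clusters[nearest].append(r)
    new_modes = [update_mode(c) if c else modes[j] for j, c in enumerate(clusters)]
    if new_modes == modes:                # modes stable -> converged
        break
    modes = new_modes
print(modes)
```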
k-Medoids Clustering
• Picks actual objects to represent the clusters, using one representative object (medoid) per cluster.
• Each remaining object is assigned to the representative object to which it is most similar.
• The partitioning is performed on the principle of minimizing the sum of the dissimilarities between each object and its corresponding reference point, using the absolute-error criterion:
  E = Σⱼ₌₁ᵏ Σ_{p ∈ Cⱼ} |p − Oⱼ|
  where p is the point in space representing a given object in cluster Cⱼ and Oⱼ is the representative object of cluster Cⱼ.
• The iterative process of replacing representative objects by nonrepresentative objects continues as long as the quality of the resulting clustering improves, measured by a cost function based on the dissimilarity between each object and the representative object of its cluster.
• When a representative object Oⱼ is considered for replacement by a random nonrepresentative object O_random, four cases are examined for each of the remaining nonrepresentative objects p:
  - Case 1: p currently belongs to Oⱼ; after the swap, p is closest to some other representative Oᵢ and is reassigned to Oᵢ.
  - Case 2: p currently belongs to Oⱼ; after the swap, p is closest to O_random and is reassigned to O_random.
  - Case 3: p currently belongs to some other representative Oᵢ; after the swap, p is still closest to Oᵢ and does not move.
  - Case 4: p currently belongs to some other representative Oᵢ; after the swap, p is closest to O_random and is reassigned to O_random.
• Each time a reassignment occurs, a difference in absolute error E is contributed to the cost function; the cost function therefore gives the change in absolute error if the current representative object is replaced by the nonrepresentative object. The total cost of swapping is the sum of the costs incurred by all nonrepresentative objects.
  - If the total cost is negative, Oⱼ is replaced (swapped) with O_random.
  - If the total cost is positive, the current representative object Oⱼ is considered acceptable and nothing is changed.
• PAM (Partitioning Around Medoids) was one of the first k-medoids algorithms.

PAM Algorithm
Input: k, the number of clusters; n data objects from data set D.
Output: a set of k clusters.
Algorithm:
1. Arbitrarily select k objects as the representative objects (seeds).
2. Repeat:
   a. Assign each remaining object to the cluster of its nearest representative object.
   b. Randomly select a nonrepresentative object O_random.
   c. Compute the total cost S of swapping a representative object Oⱼ with O_random.
   d. If S < 0, swap Oⱼ with O_random to form the new set of k representative objects.
3. Until no change.

k-Medoids Clustering: Worked Example
Aim: create two clusters from the following data objects.

Object | X | Y
O1     | 2 | 6
O2     | 3 | 4
O3     | 3 | 8
O4     | 4 | 7
O5     | 6 | 2
O6     | 6 | 4
O7     | 7 | 3
O8     | 7 | 4
O9     | 8 | 5
O10    | 7 | 6

Step 1: Randomly choose two medoids (representative objects): O3 = (3, 8) and O8 = (7, 4).

Step 2: Assign each object to the closest representative object. Measuring the distance from each object to the two medoids gives:
  C1 = {O1, O2, O3, O4}
  C2 = {O5, O6, O7, O8, O9, O10}

Step 3: Compute the absolute error for the set of representative objects {O3, O8}, using the Manhattan distance |Oᵢ − Oⱼ| = |xᵢ − xⱼ| + |yᵢ − yⱼ|:
  E = (|O1 − O3| + |O2 − O3| + |O3 − O3| + |O4 − O3|) + (|O5 − O8| + |O6 − O8| + |O7 − O8| + |O8 − O8| + |O9 − O8| + |O10 − O8|)
  E = (3 + 4 + 0 + 2) + (3 + 1 + 1 + 0 + 2 + 2)
  E = 18

Step 4: Randomly choose a nonrepresentative object, O9, as a candidate to replace O8, and compute the absolute error for the set of representative objects {O3, O9}:
  E = (|O1 − O3| + |O2 − O3| + |O3 − O3| + |O4 − O3|) + (|O5 − O9| + |O6 − O9| + |O7 − O9| + |O8 − O9| + |O9 − O9| + |O10 − O9|)
  E = (3 + 4 + 0 + 2) + (5 + 3 + 3 + 2 + 0 + 2)
  E = 24

Step 5: Compute the cost of the swap:
  S = (absolute error for {O3, O9}) − (absolute error for {O3, O8}) = 24 − 18 = 6
Since S > 0, the swap would increase the absolute error, so O8 is not replaced: O3 and O8 remain the representative objects.

Step 6: Repeat from Step 2, trying other nonrepresentative objects as replacement candidates, until no swap reduces the absolute error. The swap decision can be verified with the short computation sketched below.
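The numbers in the example can be checked with a few lines of Python. The helper names are my own, and the Manhattan distance used in the error computation above is assumed throughout.

```python
def manhattan(a, b):
    """City-block distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def absolute_error(objects, medoids):
    """Sum, over all objects, of the distance to the nearest medoid."""
    return sum(min(manhattan(o, m) for m in medoids) for o in objects)

objs = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2),
        (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]

e_current = absolute_error(objs, [(3, 8), (7, 4)])    # medoids O3, O8 -> 18
e_candidate = absolute_error(objs, [(3, 8), (8, 5)])  # medoids O3, O9 -> 24
cost = e_candidate - e_current                        # +6, so the swap is rejected
print(e_current, e_candidate, cost)
```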
k-Means vs. k-Medoids
• Which method is more robust? The k-medoids method is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean.
• However, the processing of k-medoids is more costly than the k-means method.

Hierarchical Clustering
• Groups data objects into a tree of clusters.
• Hierarchical clustering methods are either agglomerative or divisive.

Agglomerative Hierarchical Clustering (e.g., AGNES — AGglomerative NESting)
• Starts by placing each object in its own cluster.
• Merges these atomic clusters into larger and larger clusters.
• Halts when all objects are in a single cluster or when certain termination conditions are satisfied.
• Bottom-up strategy; the user can specify the desired number of clusters as a termination condition.
[Figure: AGNES applied to objects A–G — starting from the singletons {A}, {B}, {C}, {D}, {E}, {F}, {G}, successive steps merge them into {A,B} and {C,D}, then {A,B,F} and {C,D,E}, then {C,D,E,G}, and finally the single cluster {A,B,F,C,D,E,G}.]

Divisive Hierarchical Clustering (e.g., DIANA — DIvisive ANAlysis)
• Starts with all objects in one cluster.
• Subdivides the cluster into smaller and smaller pieces.
• Halts when each object forms a cluster of its own or when certain termination conditions are satisfied.
• Top-down strategy; the user can specify the desired number of clusters as a termination condition.
[Figure: DIANA applied to objects A–G — the same hierarchy as above, traversed in the opposite direction, splitting {A,B,F,C,D,E,G} step by step back into singletons.]

• A tree structure called a dendrogram is used to represent the process of hierarchical clustering.
[Figure: dendrogram representation for the hierarchical clustering of data objects {a, b, c, d, e}.]

Distance Measures Between Clusters
Four widely used measures for the distance between clusters, where |p − p′| is the distance between two objects p and p′, mᵢ is the mean of cluster Cᵢ, and nᵢ is the number of objects in cluster Cᵢ:
• Minimum distance:  d_min(Cᵢ, Cⱼ) = min_{p ∈ Cᵢ, p′ ∈ Cⱼ} |p − p′|
• Maximum distance:  d_max(Cᵢ, Cⱼ) = max_{p ∈ Cᵢ, p′ ∈ Cⱼ} |p − p′|
• Mean distance:     d_mean(Cᵢ, Cⱼ) = |mᵢ − mⱼ|
• Average distance:  d_avg(Cᵢ, Cⱼ) = (1 / (nᵢ nⱼ)) Σ_{p ∈ Cᵢ} Σ_{p′ ∈ Cⱼ} |p − p′|

• An algorithm that uses the minimum distance measure is called a nearest-neighbor clustering algorithm. If the clustering process is terminated when the minimum distance between the nearest clusters exceeds an arbitrary threshold, it is called a single-linkage algorithm.
• An algorithm that uses the maximum distance measure is called a farthest-neighbor clustering algorithm. If the clustering process is terminated when the maximum distance between the nearest clusters exceeds an arbitrary threshold, it is called a complete-linkage algorithm.
• An agglomerative hierarchical clustering algorithm that uses the minimum distance measure is also called a minimal spanning tree algorithm.
A small sketch of the four inter-cluster distance measures is given below.
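The following minimal sketch computes the four inter-cluster distance measures, assuming the Euclidean distance between individual objects; the function names simply mirror the notation above and are not part of any library.

```python
import math
from itertools import product

def euclid(p, q):
    return math.dist(p, q)

def d_min(ci, cj):
    """Single-link / nearest-neighbor distance between two clusters."""
    return min(euclid(p, q) for p, q in product(ci, cj))

def d_max(ci, cj):
    """Complete-link / farthest-neighbor distance between two clusters."""
    return max(euclid(p, q) for p, q in product(ci, cj))

def d_mean(ci, cj):
    """Distance between the two cluster means."""
    mi = tuple(sum(c) / len(ci) for c in zip(*ci))
    mj = tuple(sum(c) / len(cj) for c in zip(*cj))
    return euclid(mi, mj)

def d_avg(ci, cj):
    """Average of all pairwise distances between the two clusters."""
    return sum(euclid(p, q) for p, q in product(ci, cj)) / (len(ci) * len(cj))

a = [(1, 1), (2, 1)]
b = [(5, 4), (6, 5)]
print(d_min(a, b), d_max(a, b), d_mean(a, b), d_avg(a, b))
```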