Discerning Linkage-Based Algorithms Among Hierarchical Clustering Methods
Margareta Ackerman and Shai Ben-David
IJCAI 2011

Clustering is one of the most widely used tools for exploratory data analysis. The social sciences, biology, astronomy, computer science, and many other fields apply clustering to gain a first understanding of the structure of large data sets.

The Theory-Practice Gap
The gap between the theory and practice of clustering persists today.

Bridging the Theory-Practice Gap: Previous Work
• Axioms of clustering [(Kleinberg, NIPS '02), (Ackerman & Ben-David, NIPS '08), (Meila, NIPS '08)]
• Clusterability [(Balcan, Blum & Vempala, STOC '08), (Ackerman & Ben-David, AISTATS '09)]

Bridging the Theory-Practice Gap: Clustering Algorithm Selection
There is a wide variety of clustering algorithms, and they often produce very different clusterings. How should a user decide which algorithm to use for a given application?

Our Approach to Clustering Algorithm Selection
We propose a framework that lets a user utilize prior knowledge to select an algorithm:
• Identify properties that distinguish the input-output behaviour of different clustering algorithms.
• The properties should be:
  1) intuitive and "user-friendly", and
  2) useful for classifying clustering algorithms.

Previous Work in the Property-Based Framework
• A property-based classification of partitional clustering algorithms (Ackerman, Ben-David & Loker, NIPS '10)
• A characterization of single linkage with the k-stopping criterion (Zadeh & Ben-David, UAI '09)
• A characterization of linkage-based clustering with the k-stopping criterion (Ackerman, Ben-David & Loker, COLT '10)

Our Contributions
• Extend the above property-based framework to the hierarchical clustering setting.
• Propose two intuitive properties that uniquely identify hierarchical linkage-based clustering algorithms.
• Show that common hierarchical algorithms, including bisecting k-means, cannot be simulated by any linkage-based algorithm.

Outline
• Define linkage-based clustering
• Introduce two new properties of hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are not linkage-based
• Conclusions

Formal Setup: Dendrograms and Clusterings
Dendrogram: a set Ci is a cluster in a dendrogram D if there exists a node in D such that Ci is the set of its leaf descendants.
C = {C1, …, Ck} is a clustering in a dendrogram D if
– Ci is a cluster in D for all 1 ≤ i ≤ k, and
– the clusters are disjoint: Ci ∩ Cj = Ø for all 1 ≤ i < j ≤ k.

Formal Setup: Hierarchical Clustering Algorithms
A hierarchical clustering algorithm A maps
Input: a data set X with a distance function d, denoted (X, d),
to
Output: a dendrogram of X.

Linkage-Based Algorithms
An algorithm A is linkage-based if there exists a linkage function
l : {(X1, X2, d) : d is a distance function over X1 ∪ X2} → R+
such that for any (X, d), A(X, d) can be constructed as follows:
• Create a single-node tree for every element of X.
• Repeat the following until a single tree remains: merge the pair of trees whose element sets are closest according to l.
Ex. single linkage, average linkage, complete linkage.
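The following is a minimal Python sketch of the meta-algorithm above (an illustration only, not the paper's implementation; the function names linkage_based_clustering, single_linkage, average_linkage, and complete_linkage are introduced here for illustration). It starts from singleton trees and repeatedly merges the pair of trees whose element sets are closest under a supplied linkage function l.

```python
# Minimal sketch of the linkage-based meta-algorithm defined above.
# Illustration only; the function names here are not from the paper.
from itertools import combinations


def single_linkage(A, B, d):
    """l(A, B) = minimum pairwise distance (single linkage)."""
    return min(d(a, b) for a in A for b in B)


def average_linkage(A, B, d):
    """l(A, B) = average pairwise distance (average linkage)."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))


def complete_linkage(A, B, d):
    """l(A, B) = maximum pairwise distance (complete linkage)."""
    return max(d(a, b) for a in A for b in B)


def linkage_based_clustering(X, d, l):
    """Build a dendrogram of (X, d) bottom-up using the linkage function l.

    Trees are nested tuples; a leaf is just an element of X.
    Returns the root of the resulting dendrogram.
    """
    # Create a single-node tree for every element of X.
    trees = [((x,), x) for x in X]  # pairs of (element set, tree)
    # Repeat until a single tree remains: merge the pair of trees whose
    # element sets are closest according to l.
    while len(trees) > 1:
        i, j = min(
            combinations(range(len(trees)), 2),
            key=lambda ij: l(trees[ij[0]][0], trees[ij[1]][0], d),
        )
        (A, ta), (B, tb) = trees[i], trees[j]
        merged = (A + B, (ta, tb))
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0][1]


if __name__ == "__main__":
    # Toy data set on the real line with the usual distance.
    X = [0.0, 1.0, 2.0, 10.0]
    d = lambda a, b: abs(a - b)
    print(linkage_based_clustering(X, d, single_linkage))
    # -> (10.0, (2.0, (0.0, 1.0))): nearby points merge first, the outlier last.
```

On the toy data set, single linkage merges the nearby points first and attaches the outlier last, as expected from the standard single-linkage dendrogram.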
Locality (Informal Definition)
If we select a set of disjoint clusters from a dendrogram and run the algorithm on the union of these clusters, we obtain a result that is consistent with the original dendrogram.

Outer Consistency
Let C be a clustering in A(X, d), and let d' be obtained from d by increasing pairwise between-cluster distances (an outer-consistent change).
• The outer-consistent change makes the clustering C more prominent.
• If A is outer-consistent, then A(X, d') will also include the clustering C.

Our Main Result
Theorem: A hierarchical clustering function is linkage-based if and only if it is local and outer-consistent.

Brief Sketch of Proof
We sketch the direction: if A satisfies outer-consistency and locality, then A is linkage-based.
Goal: define a linkage function l so that the linkage-based clustering based on l outputs A(X, d) (for every X and d).
• Define an operator <A: (X, Y, d1) <A (Z, W, d2) if, when we run A on (X ∪ Y ∪ Z ∪ W, d), where d extends d1 and d2, X and Y are merged before Z and W.
• Prove that <A can be extended to a partial ordering by proving that it is cycle-free.
• This implies that there exists an order-preserving function l that maps pairs of data sets to R+.

Hierarchical but Not Linkage-Based
• P-divisive algorithms construct dendrograms top-down, using a partitional 2-clustering algorithm P to determine how to split nodes.
• Many natural partitional 2-clustering algorithms satisfy the following property:
A partitional 2-clustering algorithm P is context sensitive if there exist d ⊂ d' such that P({x, y, z}, d) = {{x}, {y, z}} and P({x, y, z, w}, d') = {{x, y}, {z, w}}.
Ex. k-means, min-sum, min-diameter, and furthest centroids.

Theorem: If P is context sensitive, then the P-divisive algorithm fails the locality property.
• The input-output behaviour of some natural divisive algorithms is distinct from that of all linkage-based algorithms.
• The bisecting k-means algorithm, and other natural divisive algorithms, cannot be simulated by any linkage-based algorithm.

Conclusions
• We characterize hierarchical linkage-based clustering in terms of two intuitive properties.
• We show that some natural hierarchical algorithms have different input-output behaviour than any linkage-based algorithm.

Locality (Formal Definition)
For any clustering C = {C1, …, Ck} in D = A(X, d):
• C is also a clustering in D' = A(X', d), where X' = ∪ Ci,
• each Ci roots the same sub-dendrogram in both D and D', and
• for all x, y in X', x occurs below y in D iff the same holds in D'.
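To make the formal Locality condition concrete, the following self-contained Python sketch (again an illustration introduced here, not code from the paper; single_linkage_dendrogram and locality_holds are hypothetical helper names) builds single-linkage dendrograms and checks the first two conditions on a toy one-dimensional data set: that each Ci is a cluster in both D = A(X, d) and D' = A(X', d), and that it roots the same sub-dendrogram in both. Since single linkage is linkage-based, and hence local by the main result, the check is expected to return True.

```python
# Toy check of (part of) the formal Locality condition for single linkage.
# Illustration only; single_linkage_dendrogram and locality_holds are
# helper names introduced here, not from the paper.
from itertools import combinations


def single_linkage_dendrogram(X, d):
    """Return a dict mapping every cluster of the dendrogram (as a frozenset)
    to the sub-dendrogram (a nested tuple) that it roots."""
    trees = [frozenset([x]) for x in X]
    nodes = {s: next(iter(s)) for s in trees}  # leaves root themselves
    while len(trees) > 1:
        # Merge the pair of trees with the smallest single-linkage distance.
        A, B = min(
            combinations(trees, 2),
            key=lambda p: min(d(a, b) for a in p[0] for b in p[1]),
        )
        nodes[A | B] = (nodes[A], nodes[B])
        trees = [t for t in trees if t not in (A, B)] + [A | B]
    return nodes


def locality_holds(X, d, clustering):
    """Check that each Ci in `clustering` is a cluster of both D = A(X, d) and
    D' = A(X', d), where X' is the union of the Ci, rooting the same sub-dendrogram."""
    D = single_linkage_dendrogram(X, d)
    X_prime = [x for C in clustering for x in C]
    D_prime = single_linkage_dendrogram(X_prime, d)  # d restricted to X'
    return all(
        frozenset(C) in D
        and frozenset(C) in D_prime
        and D[frozenset(C)] == D_prime[frozenset(C)]
        for C in clustering
    )


if __name__ == "__main__":
    X = [0.0, 1.0, 5.0, 6.0, 20.0]
    d = lambda a, b: abs(a - b)
    # {0, 1} and {5, 6} are disjoint clusters in the single-linkage dendrogram of X.
    print(locality_holds(X, d, [[0.0, 1.0], [5.0, 6.0]]))  # expected: True
```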