Discerning Linkage-Based Algorithms
Among Hierarchical Clustering Methods
Margareta Ackerman
and
Shai Ben-David
IJCAI 2011
Clustering is one of the most widely used tools
for exploratory data analysis.
Social Sciences
Biology
Astronomy
Computer Science
….
All apply clustering to gain a first understanding of
the structure of large data sets.
The Theory-Practice Gap
[The slide quotes two statements on the gap between clustering theory and practice.]
Both statements still apply today.
Bridging the Theory-Practice Gap:
Previous work
• Axioms of clustering [(Kleinberg, NIPS '02), (Ackerman & Ben-David, NIPS '08), (Meila, NIPS '08)]
• Clusterability [(Balcan, Blum, and Vempala, STOC '08), (Ackerman & Ben-David, AISTATS '09)]
Bridging the Theory-Practice Gap:
Clustering algorithm selection
There are a wide variety of clustering
algorithms, which often produce very
different clusterings.
How should a user decide which algorithm to
use for a given application?
Our approach for clustering algorithm selection
We propose a framework that lets a user utilize prior
knowledge to select an algorithm
• Identify properties that distinguish between the
input-output behaviour of different clustering
algorithms
• The properties should be:
1) Intuitive and “user-friendly”
2) Useful for classifying clustering algorithms
Previous Work in Property-Based Framework
• A property-based classification of partitional clustering algorithms (Ackerman, Ben-David, and Loker, NIPS '10)
• A characterization of single linkage with the k-stopping criterion (Zadeh and Ben-David, UAI '09)
• A characterization of linkage-based clustering with the k-stopping criterion (Ackerman, Ben-David, and Loker, COLT '10)
Our contributions
• Extend the above property-based
framework to the hierarchical clustering
setting
• Propose two intuitive properties that uniquely identify hierarchical linkage-based clustering algorithms
• Show that common hierarchical algorithms,
including bisecting k-means, cannot be
simulated by any linkage-based algorithm
Outline
• Define Linkage-Based clustering
• Introduce two new properties of
hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are
not linkage-based
• Conclusions
Formal Setup:
Dendrograms and clusterings
Dendrogram:
A set Ci is a cluster in a dendrogram D if there
exists a node in the dendrogram so that Ci is the
set of its leaf descendants.
Formal Setup:
Dendrograms and clusterings
C = {C1, … , Ck} is a clustering in a dendrogram D if
– Ci is a cluster in D for all 1 ≤ i ≤ k, and
– the clusters are pairwise disjoint: Ci ∩ Cj = Ø for all 1 ≤ i < j ≤ k.
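As a concrete illustration of these definitions, here is a minimal Python sketch (my own naming and representation, not from the paper): a dendrogram node, the cluster it roots, and a disjointness check for a candidate clustering.

class Node:
    """A dendrogram node; leaves carry a data point, internal nodes carry children."""
    def __init__(self, point=None, children=()):
        self.point = point
        self.children = list(children)

def cluster_of(node):
    """The cluster that `node` roots: the set of its leaf descendants."""
    if not node.children:
        return {node.point}
    leaves = set()
    for child in node.children:
        leaves |= cluster_of(child)
    return leaves

def is_clustering(clusters):
    """Given clusters of D, check that they are pairwise disjoint (so they form a clustering)."""
    seen = set()
    for c in clusters:
        if seen & c:
            return False
        seen |= c
    return True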
Formal Setup:
Hierarchical clustering algorithm
A Hierarchical Clustering Algorithm A
maps
Input: A data set X with a distance function d,
denoted (X,d)
to
Output: A dendrogram of X
Linkage-Based Algorithm
An algorithm A is Linkage-Based if there exists a
linkage function l : {(X1, X2, d) : d a distance function over X1 ∪ X2} → R+
such that for any (X,d), A(X,d) can be constructed as follows:
• Create a single-node tree for every element of X
• Repeat the following until a single tree remains:
merge the pair of trees whose element sets are closest according to l.
Examples: single-linkage, average-linkage, complete-linkage.
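The construction above can be written as one short generic procedure. The sketch below is my own illustration of that template (not code from the paper); plugging in the three standard linkage functions gives single-, average-, and complete-linkage.

import itertools

def linkage_based(points, d, linkage):
    """Generic linkage-based construction: bottom-up merging driven by `linkage`.
    `d` is a distance function on pairs of points; trees are (leaf_tuple, children) pairs."""
    # Create a single-node tree for every element of X.
    trees = [((x,), None) for x in points]
    # Repeat until a single tree remains: merge the pair of trees whose
    # element sets are closest according to the linkage function.
    while len(trees) > 1:
        i, j = min(itertools.combinations(range(len(trees)), 2),
                   key=lambda ij: linkage(trees[ij[0]][0], trees[ij[1]][0], d))
        a, b = trees[i], trees[j]
        trees = [t for k, t in enumerate(trees) if k not in (i, j)]
        trees.append((a[0] + b[0], (a, b)))
    return trees[0]   # root of the dendrogram

# Standard linkage functions that fit this template:
def single_linkage(A, B, d):   return min(d(x, y) for x in A for y in B)
def complete_linkage(A, B, d): return max(d(x, y) for x in A for y in B)
def average_linkage(A, B, d):  return sum(d(x, y) for x in A for y in B) / (len(A) * len(B))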
Outline
• Define Linkage-Based clustering
• Introduce two new properties of
hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are
not linkage-based
• Conclusions
Locality
Informal Definition
[Figure: dendrograms D = A(X,d) and D' = A(X',d), where X' = {x1, …, x6}.]
If we select a set of disjoint clusters from a dendrogram,
and run the algorithm on the union of these clusters, we
obtain a result that is consistent with the original
dendrogram.
Outer Consistency
[Figure: a clustering C on dataset (X,d); increasing pairwise between-cluster distances yields dataset (X,d') with the same clustering C.]
• An outer-consistent change increases between-cluster distances (within-cluster distances are unchanged), making the clustering C more prominent.
• If A is outer-consistent, then A(X,d') will also include the clustering C.
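To make the property concrete, here is a small sketch of an outer-consistent change (my own illustration; in this line of work a distance function need not satisfy the triangle inequality, so simply scaling between-cluster distances is a legitimate example):

def outer_consistent_variant(d, cluster_index, stretch=2.0):
    """Return d' with d'(x,y) = d(x,y) inside clusters and d'(x,y) >= d(x,y) across clusters.
    `cluster_index(x)` gives the index of x's cluster in C; `stretch` >= 1."""
    def d_prime(x, y):
        if cluster_index(x) == cluster_index(y):
            return d(x, y)              # within-cluster distances unchanged
        return stretch * d(x, y)        # between-cluster distances only increase
    return d_prime

# Outer consistency of A then requires: if C is a clustering in A(X, d),
# then C is also a clustering in A(X, d_prime).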
Outline
• Define Linkage-Based clustering
• Introduce two new properties of
hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are
not linkage-based
• Conclusions
Our Main Result
Theorem:
A hierarchical clustering function is
Linkage-Based
if and only if
it is Local and Outer-Consistent.
Brief Sketch of Proof
We sketch the direction:
If A satisfies Outer-Consistency and Locality, then A is Linkage-Based.
Goal:
Define a linkage function l so that the linkage-based algorithm that uses l outputs A(X,d) (for every X and d).
Brief Sketch of Proof
• Define an operator <A:
(X,Y,d1) <A (Z,W,d2) if, when we run A on (X∪Y∪Z∪W, d), where d extends d1 and d2, X and Y are merged before Z and W.
[Figure: dendrogram A(X,d) in which the clusters X and Y are merged before the clusters Z and W.]
• Prove that <A can be extended to a partial ordering by proving that it is cycle-free.
• This implies that there exists an order-preserving function l that maps pairs of data sets to R+.
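In symbols, the requirement on the resulting linkage function (a condensed restatement in my own LaTeX notation, writing \ell for l) is:
\[
(X, Y, d_1) <_A (Z, W, d_2) \;\Longrightarrow\; \ell(X, Y, d_1) < \ell(Z, W, d_2),
\]
so any function that respects the extension of <A to a partial order can serve as the linkage function.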
Outline
• Define Linkage-Based clustering
• Introduce two new properties of
hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are
not linkage-based
• Conclusions
Hierarchical but Not Linkage-Based
• P-Divisive algorithms construct dendrograms top-down, using a partitional 2-clustering algorithm P to determine how to split nodes.
• Many natural partitional 2-clustering algorithms satisfy the following property:
A partitional 2-clustering algorithm P is Context Sensitive if there exist distance functions d ⊂ d' (i.e., d' extends d) such that
P({x,y,z}, d) = {{x}, {y,z}} and P({x,y,z,w}, d') = {{x,y}, {z,w}}.
Examples: k-means, min-sum, min-diameter, and furthest-centroids.
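For concreteness, a P-divisive algorithm can be sketched as follows (my own illustration; `two_clustering` stands for an arbitrary partitional 2-clustering algorithm P, e.g. 2-means, in which case this is essentially bisecting k-means):

def p_divisive(points, d, two_clustering):
    """Build a dendrogram top-down by splitting each node with P.
    `two_clustering(points, d)` returns a 2-partition (A, B) of `points`."""
    if len(points) <= 1:
        return (tuple(points), None)                  # leaf
    A, B = two_clustering(points, d)
    return (tuple(points),
            (p_divisive(A, d, two_clustering),
             p_divisive(B, d, two_clustering)))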
Hierarchical but Not Linkage-Based
Theorem:
If P is context-sensitive, then the
P-divisive algorithm fails the
locality property.
• The input-output behaviour of some natural divisive
algorithms is distinct from that of all linkage-based
algorithms.
• The bisecting k-means algorithm, and other natural
divisive algorithms, cannot be simulated by any
linkage-based algorithm.
Outline
• Define Linkage-Based clustering
• Introduce two new properties of
hierarchical clustering algorithms
• Main result
• Hierarchical clustering paradigms that are
not linkage-based
• Conclusions
Conclusions
• We characterize hierarchical Linkage-Based
clustering in terms of two intuitive
properties.
• We show that some natural hierarchical algorithms have input-output behaviour that differs from that of every linkage-based algorithm.
Locality (formal definition)
[Figure: dendrograms D = A(X,d) and D' = A(X',d), where X' = {x1, …, x6}.]
For any clustering C = {C1, … , Ck} in D = A(X,d), let X' = C1 ∪ … ∪ Ck. Then:
• C is also a clustering in D' = A(X', d)
• Each Ci roots the same sub-dendrogram in both D and D'
• For all x, y in X', x occurs below y in D if and only if the same holds in D'.