Clustering on the Simplex
Morten Mørup
DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
EMMDS 2009, July 3rd, 2009

Joint work with Christian Walder and Lars Kai Hansen, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark.

Clustering
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. (Wikipedia)

Clustering approaches
The K-means iterative refinement algorithm (Lloyd, 1982; Hartigan, 1979):
- Assignment step (S): assign each data point to the cluster with the closest mean value.
- Update step (C): calculate the new mean value for each cluster.
- Guarantee of optimality: no single change in assignment is better than the current assignment (1-spin stability).
- The problem is NP-complete (Megiddo and Supowit, 1984).
Relaxations of the hard assignment problem:
- Annealing approaches based on a temperature parameter T (as T→0 the original clustering problem is recovered; see for instance Hofmann and Buhmann, 1997).
- Fuzzy clustering (Hathaway and Bezdek, 1988).
- Expectation Maximization (mixture of Gaussians).
- Spectral clustering.
Drawback: these relaxations are either not exact or depend on a problem-specific annealing parameter in order to recover the original binary combinatorial assignments.

From the K-means objective to Pairwise Clustering
Pairwise clustering (Buhmann and Hofmann, 1994) operates on a similarity matrix K; with K = XᵀX it is equivalent to the K-means objective.

Although clustering is hard, there is room to be simple(x) minded!
The binary combinatorial (BC) assignment problem is relaxed to the simplex, giving the simplicial relaxation (SR).

The simplicial relaxation (SR) admits standard continuous optimization for solving the pairwise clustering problem, for instance by normalization-invariant projected gradient ascent; a sketch of such an update is given below.

Synthetic data example: K-means vs. SR-clustering
The brown and grey clusters each contain 1000 data points in R², whereas the remaining clusters each have 250 data points.
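To make the relaxation concrete, here is a minimal sketch of projected gradient ascent over the simplex for one standard form of the pairwise-clustering objective, f(S) = Σ_k (s_k K s_kᵀ)/(s_k 1), which is equivalent to K-means when K = XᵀX. The function names, the fixed step size, the random initialization and the plain Euclidean simplex projection are my own simplifying assumptions; the normalization-invariant update used in the talk may differ in these details.

```python
import numpy as np

def project_to_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex."""
    K, N = V.shape
    U = np.sort(V, axis=0)[::-1]                 # sort each column in descending order
    css = np.cumsum(U, axis=0) - 1.0             # cumulative sums minus the simplex mass
    idx = np.arange(1, K + 1).reshape(-1, 1)
    cond = U - css / idx > 0                     # condition defining the active set size rho
    rho = cond.cumsum(axis=0).argmax(axis=0)     # last index (0-based) where the condition holds
    theta = css[rho, np.arange(N)] / (rho + 1)   # per-column threshold
    return np.maximum(V - theta, 0.0)

def sr_pairwise_clustering(K_sim, n_clusters, n_iter=500, step=1e-2, seed=0):
    """Projected gradient ascent on the simplex-relaxed pairwise-clustering objective
    f(S) = sum_k (s_k K s_k^T) / (s_k 1), where S is (n_clusters x N) with each
    column constrained to the simplex."""
    rng = np.random.default_rng(seed)
    N = K_sim.shape[0]
    S = project_to_simplex(rng.random((n_clusters, N)))
    for _ in range(n_iter):
        SK = S @ K_sim                              # (n_clusters x N)
        num = np.einsum('kn,kn->k', SK, S)          # s_k K s_k^T per cluster
        den = S.sum(axis=1) + 1e-12                 # s_k 1 per cluster
        grad = 2 * SK / den[:, None] - (num / den**2)[:, None]
        S = project_to_simplex(S + step * grad)     # ascent step followed by projection
    return S
```

After running, for instance, `S = sr_pairwise_clustering(X.T @ X, n_clusters=5)`, the hard assignments can be read off as `S.argmax(axis=0)`; by the exactness result they coincide with the near-binary columns of S at stationarity.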
The SR-clustering algorithm is driven by high-density regions.

Thus, solutions are in general substantially better than those of Lloyd's algorithm while having the same computational complexity (compared: Lloyd's K-means, SR-clustering with init=1, and SR-clustering with init=0.01).

Comparison of K-means, SR-clustering (init=1) and SR-clustering (init=0.01) for 10, 50 and 100 components.

SR-clustering for kernel-based semi-supervised learning
Kernel-based semi-supervised learning based on pairwise clustering (Basu et al., 2004; Kulis et al., 2005; Kulis et al., 2009).

The simplicial relaxation admits solving the problem as a (non-convex) continuous optimization problem.

Class labels can be handled by explicitly fixing the corresponding assignments, while must-links and cannot-links can be absorbed into the kernel. Hence the problem reduces more or less to a standard SR-clustering problem for the estimation of S.

At stationarity, the gradients of the elements in each column of S that are 1 are larger than the gradients of the elements that are 0. Thus, the impact of the supervision can be evaluated by estimating the minimal Lagrange multipliers that guarantee stationarity of the solution obtained by the SR-clustering algorithm; this is a convex optimization problem. The Lagrange multipliers thereby give a measure of conflict between the data and the supervision.

Digit classification with one mislabeled data observation from each class.

Community Detection in Complex Networks
Communities/modules: natural divisions of network nodes into densely connected subgroups (Newman & Girvan, 2003). Given a graph G(V,E) with adjacency matrix A, the community detection algorithm yields a permuted adjacency matrix PAPᵀ, where the permutation P of the graph is derived from the clustering assignment S.

Common community detection objectives
- Hamiltonian (Fu & Anderson, 1986; Reichardt & Bornholdt, 2004)
- Modularity (Newman & Girvan, 2004)
These are generic problems of the same form, and again we can make an exact relaxation to the simplex!
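For reference, the modularity objective of Newman & Girvan (2004) can be written with a binary assignment matrix as below; the Hamiltonian objective fits the same trace pattern with a different null model, although the exact generic parametrization used in the talk may differ.

```latex
% Modularity (Newman & Girvan, 2004) for adjacency matrix A, degree vector k,
% 2m = \sum_{ij} A_{ij}, and binary assignments S \in \{0,1\}^{K \times N} with
% \sum_k S_{kn} = 1 (node n belongs to community c_n):
Q(S) \;=\; \frac{1}{2m}\sum_{ij}\Bigl(A_{ij}-\frac{k_i k_j}{2m}\Bigr)\,\delta(c_i,c_j)
      \;=\; \frac{1}{2m}\,\operatorname{tr}\!\Bigl(S\Bigl(A-\frac{k k^{\top}}{2m}\Bigr)S^{\top}\Bigr).
% The simplicial relaxation keeps the objective but replaces the binary
% constraint by the simplex constraint S_{kn} \ge 0, \sum_k S_{kn} = 1.
```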
SR-clustering of complex networks
The quality of the solutions is comparable to results obtained by extensive Gibbs sampling.

So far we have demonstrated how binary combinatorial constraints are recovered at stationarity when relaxing the problems to the simplex. However, simplex constraints also hold promising data mining properties of their own!

The Convex Hull
Def: The convex hull (convex envelope) of X ∈ R^{M×N} is the minimal convex set containing X. (Informally, it can be described as a rubber band wrapped around the data points.) Finding the convex hull is solvable in linear time, O(N) (McCallum and Avis, 1979). However, the size of the convex hull grows exponentially with the dimensionality of the data, O(log^{M−1} N) (Dwyer, 1988).

The Principal Convex Hull (PCH)
Def: The best convex set of size K according to some measure of distortion D(·|·) (Mørup et al., 2009). (Informally, it can be described as a less flexible rubber band that wraps most of the data points.)

The mathematical formulation of the Principal Convex Hull (PCH) is given by two simplex constraints, with "principal" measured in terms of the Frobenius norm: the data are approximated as X ≈ (XC)S, where
- C gives the fractions in which the observations in X are used to form each feature (the distinct aspects/"freaks"); in general C will be very sparse;
- S gives the fraction with which each observation resembles each distinct aspect XC.
(Note that when K is large enough, the PCH recovers the convex hull.) See the code sketch below.

Relation between the PCH model, low-rank decomposition and clustering approaches
The PCH naturally bridges clustering and low-rank approximations!

Two important properties of the PCH model
- The PCH model is invariant to affine transformation and scaling.
- The PCH model is unique up to permutation of the components.

The PCH gives more contrast in the features than is obtained by clustering approaches. As such, PCH aims for distinct aspects/regions in the data; the PCH model strives to attain Platonic "Ideal Forms".

The data contain 3 components: high-binding regions, low-binding regions and non-binding regions. Each voxel is given as a concentration fraction of these regions (XC and S).

NMF spectroscopy of samples of mixtures of propanol, butanol and pentanol.
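To make the two simplex constraints of the PCH concrete, here is a minimal sketch of fitting the model by alternating projected gradient steps, assuming the formulation min_{C,S} ||X − XCS||_F² with the columns of C and S constrained to the simplex. It reuses the project_to_simplex helper from the pairwise-clustering sketch above; the function name pch, the fixed step size and the alternating updates are my own assumptions and not necessarily the algorithm of Mørup et al. (2009).

```python
import numpy as np

def pch(X, n_components, n_iter=500, step=1e-3, seed=0):
    """Sketch of the Principal Convex Hull model: minimize ||X - X C S||_F^2
    with C (N x K) and S (K x N) column-wise constrained to the simplex.
    Uses the column-wise project_to_simplex defined in the earlier sketch."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    K = n_components
    C = project_to_simplex(rng.random((N, K)))   # aspects as convex combinations of observations
    S = project_to_simplex(rng.random((K, N)))   # observations as convex combinations of aspects
    for _ in range(n_iter):
        XC = X @ C                               # current distinct aspects (M x K)
        R = XC @ S - X                           # residual
        grad_S = XC.T @ R                        # gradient of 0.5*||XCS - X||^2 w.r.t. S
        S = project_to_simplex(S - step * grad_S)
        R = X @ C @ S - X
        grad_C = X.T @ R @ S.T                   # gradient of 0.5*||XCS - X||^2 w.r.t. C
        C = project_to_simplex(C - step * grad_C)
    return C, S
```

After fitting, the columns of X @ C are the K distinct aspects, and the columns of S give each observation's mixing fractions over those aspects.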
Medium-size and large-size MovieLens data (www.grouplens.org)
- Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users.
- Large size: 10,000,054 ratings of 10,677 movies by 71,567 users.

Conclusion
- The simplex offers unique data mining properties.
- Simplicial relaxations (SR) form exact relaxations of common hard-assignment clustering problems, i.e. K-means, pairwise clustering and community detection in graphs.
- SR enables solving binary combinatorial problems using standard solvers from continuous optimization.
- The proposed SR-clustering algorithm outperforms traditional iterative refinement algorithms: there is no need for an annealing parameter, and hard assignments are guaranteed at stationarity (Theorems 1 and 2).
- Semi-supervised learning can be posed as a continuous optimization problem, with the associated Lagrange multipliers giving an evaluation measure for each supervised constraint.

Conclusion cont.
- The Principal Convex Hull (PCH) is formed by two types of simplex constraints.
- It extracts distinct aspects of the data.
- It is relevant for data mining in general, wherever low-rank approximation and clustering approaches have been invoked.

A reformulation of "Lex Parsimoniae"
"The simplest explanation is usually the best." - William of Ockham
→ "The simplex explanation is usually the best."
"Simplicity is the ultimate sophistication." - Leonardo da Vinci
→ "Simplexity is the ultimate sophistication."

The presented work is described in:
M. Mørup and L. K. Hansen, "An Exact Relaxation of Clustering", submitted to JMLR, 2009.
M. Mørup, C. Walder and L. K. Hansen, "Simplicial Semi-supervised Learning", submitted.
M. Mørup and L. K. Hansen, "Platonic Forms Revisited", submitted.