Sparsity-Cognizant Overlapping Co-clustering
March 11, 2010
Hao Zhu, Dept. of ECE, Univ. of Minnesota
http://spincom.ece.umn.edu
Acknowledgements: G. Mateos, Profs. G. B. Giannakis, N. D. Sidiropoulos, A. Banerjee, and G. Leus
NSF grants CCF 0830480 and CON 014658; ARL/CTA grant no. DAAD19-01-2-0011

Outline
- Motivation and context
- Problem statement and plaid models
- Sparsity-cognizant overlapping co-clustering (SOC)
- Uniqueness
- Simulated tests
- Conclusions and future research

Context
- Co-clustering (biclustering) = two-way clustering
  - Clustering: partition the objects (samples, rows) based on a similarity criterion over their attributes (features, columns) [Tan-Steinbach-Kumar '06]
  - Co-clustering: simultaneous clustering of objects and attributes [Busygin et al '08]
    - seeks dense, approximately constant-valued submatrices
    - NP-hard in general; often reduced to ordinary clustering, as in k-means
- Application areas
  - Social networks: cohesive subgroups of actors within a network [Wasserman et al '94]
  - Bioinformatics: interpretable biological structure in gene expression data [Lazzeroni et al '02]
  - Internet traffic: dominant host groups with strong interactions [Jin et al '09]

Related Work and Our Focus
- Matrix factorization based on the SVD
  - bipartite spectral graph partitioning [Dhillon '01]
  - orthogonal nonnegative matrix factorization (tNMF) [Ding et al '06]
  - both search for non-overlapping co-clusters via orthogonality
- Overlapping co-clustering in a Bayesian framework [Fu-Banerjee '09]
  - probabilistic model for the co-cluster membership indicators and parameters
  - EM algorithm for inference and parameter estimation; the E-step uses Gibbs sampling to detect the membership indicators
- Plaid models [Lazzeroni et al '02]
  - superposition of multiple overlapping layers (co-clusters)
  - greedy layer search: one layer at a time

Related Work and Our Focus (cont'd)
- Borrow the plaid-model features
  - overlapping: some objects/attributes may relate to multiple co-clusters
  - partial co-clustering, motivated by the "uninteresting background"
  - the linear model keeps the computational burden low
- Exploit sparsity in the co-cluster membership vectors
  - sparse information hidden in a large data set
  - parsimonious models are more interpretable and informative
- Optimize across layers simultaneously, rather than with the greedy layer-by-layer strategy
- Our focus: the sparsity-cognizant overlapping co-clustering (SOC) algorithm

Modeling
- Matrix Y (n × p) is induced by two groups of interacting nodes; $Y_{ij}$ measures the strength of the relationship between node $i$ and node $j$
- Ex-1, Internet traffic: the traffic activity graph (TAG) tracks the traffic flow between inside host $i$ and outside host $j$
- Ex-2, gene expression microarray data: $Y_{ij}$ measures the level with which gene $i$ is expressed in sample $j$
(The TAG and microarray illustrations on this slide are taken from [Jin et al '09] and [Lazzeroni et al '02], respectively.)
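To make the modeling concrete, here is a minimal NumPy sketch (illustrative only, not from the slides; all sizes, levels, and the overlap pattern are invented) that builds a toy Y with a Unif[0, 0.5] background, plants two overlapping dense blocks, and then hides them with random row/column permutations, as happens in observed data:

```python
# Toy generator for the kind of matrix Y modeled above: two groups of
# interacting nodes, hidden dense submatrices over a noisy background.
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 30
Y = rng.uniform(0.0, 0.5, size=(n, p))   # "uninteresting background"

# Two overlapping co-clusters: rows 0-14 x cols 0-11 and rows 10-24 x
# cols 8-19, so rows 10-14 and columns 8-11 belong to both layers.
Y[0:15, 0:12] += 2.0
Y[10:25, 8:20] += 3.0

# Random permutations hide the block structure, as in real data.
Y_perm = Y[rng.permutation(n)][:, rng.permutation(p)]
```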
Submatrices
- Hidden dense/uniform submatrices
  - capture a subset of rows related to a subset of columns with similar feature values
  - reveal informative behavioral patterns
- Their features
  - distributed "sparsely" in Y relative to the data dimension np
  - may overlap, because some nodes serve multiple functions
- Goal: efficient co-clustering algorithms that extract the underlying submatrices by exploiting sparsity and accounting for possible overlap

Plaid Models
- Matrix Y: superposition of k submatrices (layers) over a background,
  $Y_{ij} = \theta_{ij0} + \sum_{l=1}^{k} \theta_{ijl}\,\rho_{il}\,\kappa_{jl}$, where $\theta_{ijl} = \mu_l + \alpha_{il} + \beta_{jl}$
- $\theta_{ij0}$: level of the background layer ($l = 0$)
- Membership indicators: $\rho_{il} = 1$ if row $i$ is in the $l$-th layer and $0$ otherwise; likewise $\kappa_{jl}$ for column $j$
- Row/column-level effects: $\mu_l$ is common to all the nodes in the layer, while $\alpha_{il}$ and $\beta_{jl}$ express the node-related response

Problem Statement
- Problem: given the plaid model, seek the optimal membership indicators by minimizing the data-fitting error penalized by the L1 norm of the indicators,
  $\min \sum_{i,j} \Big( Z_{ij} - \sum_{l=1}^{k} \theta_{ijl}\,\rho_{il}\,\kappa_{jl} \Big)^{2} + \lambda \sum_{l=1}^{k} \Big( \sum_{i} \rho_{il} + \sum_{j} \kappa_{jl} \Big)$, with $\rho_{il}, \kappa_{jl} \in \{0,1\}$
- $\lambda > 0$ controls the sparsity enforced, and facilitates extraction of the more informative/interpretable submatrices out of Y
- Finding the optimal solution is NP-hard
  - binary constraints on the membership indicators [Turner et al '05]
  - products of different variables
- Sought: an efficient sub-optimal algorithm that identifies the submatrices jointly (recall that [Lazzeroni et al '02] detects them one at a time)

Sparsity-Cognizant Overlapping Co-clustering (SOC)
- Work with the background-layer-free residue matrix Z
- Iterative cycling updates of $\theta$, $\rho$, and $\kappa$ (a toy end-to-end sketch follows the implementation slide below)
  - per iteration $s$, $\theta^{(s)}$ collects all the $\theta_{ijl}^{(s)}$ values; likewise for $\rho^{(s)}$ and $\kappa^{(s)}$
  - update $\theta^{(s)}$ given $(\rho^{(s-1)}, \kappa^{(s-1)})$, then $\rho^{(s)}$ given $(\theta^{(s)}, \kappa^{(s-1)})$, then $\kappa^{(s)}$ given $(\theta^{(s)}, \rho^{(s)})$
- Different from [Lazzeroni et al '02]
  - all k layers are updated jointly, which is less prone to error propagation across layers
  - membership indicators are updated under the binary constraint (combinatorial complexity)

Updating $\theta^{(s)}$
- Given $\rho^{(s-1)}$ and $\kappa^{(s-1)}$: an unconstrained quadratic program with a closed-form solution, but inverting the resulting large matrix leads to numerical instability
- Instead, a coordinate descent algorithm alternates across all the layers: for l = 1, ..., k
  - define the residue matrix $[Z_l]_{ij} = Z_{ij} - \sum_{m \neq l} \theta_{ijm}\,\rho_{im}\,\kappa_{jm}$
  - reduce to the layer-l subproblem by extracting from $Z_l$ the rows with $\rho_{il}^{(s-1)} = 1$ and the columns with $\kappa_{jl}^{(s-1)} = 1$
  - update the layer-l parameters $(\mu_l, \alpha_{il}, \beta_{jl})$
- Run for T cycles (T small)

Updating $\rho^{(s)}$ and $\kappa^{(s)}$
- Given $\theta^{(s)}$ and $\kappa^{(s-1)}$, determine $\rho^{(s)}$
  - obtain the membership indicators for the i-th row jointly across all k layers
  - important for overlapping submatrices, to eliminate cross effects
- The L1-norm penalty reduces to a linear term thanks to the non-negativity of the binary indicators
- Quadratic minimization subject to {0,1} binary constraints: NP-hard
  - similar problems arise in MIMO/multiuser detection with a binary alphabet
  - the (near-)optimal sphere decoding algorithm (SDA) applies, incurring polynomial (cubic) complexity in general
- The same technique detects $\kappa^{(s)}$ given $\theta^{(s)}$ and $\rho^{(s)}$

Convergence and Implementation
- The SOC algorithm converges (at least) to a stationary point: the data-fitting cost is bounded below and non-increasing per iteration
- Pruning steps as in [Lazzeroni et al '02], [Turner et al '05]
- Initialization
  - background-level fitting to obtain the matrix Z (recall the submatrix parameter fitting)
  - membership indicators $\rho^{(0)}$ and $\kappa^{(0)}$: k-means [Turner et al '05]
- Parameter choices
  - number of layers k: chosen to explain a certain percentage of the variation
  - sparsity-regularization parameter $\lambda$: trial-and-error / bi-cross-validation [Witten et al '09]
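A minimal end-to-end sketch of the SOC iteration, under two loudly flagged simplifications: it uses the plain plaid model $\theta_{ijl} = \mu_l$ (the slides also fit the row/column effects $\alpha_{il}$, $\beta_{jl}$), and it swaps the sphere decoder for exhaustive enumeration over the $2^k$ binary patterns per row, which is only viable for small k. Function names are illustrative, not from the authors' code.

```python
# SOC-style iteration sketch under the plain plaid model theta_ijl = mu_l.
import itertools
import numpy as np

def update_levels(Z, R, K, mu, T=1):
    """Coordinate descent over the layer levels mu_l (T cycles)."""
    k = len(mu)
    for _ in range(T):
        for l in range(k):
            # Residue matrix with every layer except l peeled off.
            Zl = Z - sum(mu[m] * np.outer(R[:, m], K[:, m])
                         for m in range(k) if m != l)
            rows, cols = R[:, l] == 1, K[:, l] == 1
            if rows.any() and cols.any():
                mu[l] = Zl[np.ix_(rows, cols)].mean()
    return mu

def detect_indicators(Z, K, mu, lam):
    """Jointly detect each row's k membership bits.

    Exhaustive search over {0,1}^k stands in for the sphere decoder.
    """
    n, k = Z.shape[0], len(mu)
    A = mu[:, None] * K.T                      # k x p layer signatures
    patterns = np.array(list(itertools.product([0, 1], repeat=k)))
    R = np.zeros((n, k), dtype=int)
    for i in range(n):
        # The L1 penalty on binary bits is the linear term lam * sum(r).
        costs = [np.sum((Z[i] - r @ A) ** 2) + lam * r.sum()
                 for r in patterns]
        R[i] = patterns[int(np.argmin(costs))]
    return R

def soc(Z, k=2, lam=3.0, S=20, T=1, seed=0):
    """Cycle the level and indicator updates for S iterations."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    R = rng.integers(0, 2, (n, k))             # slides use k-means init
    K = rng.integers(0, 2, (p, k))
    mu = np.ones(k)
    for _ in range(S):
        mu = update_levels(Z, R, K, mu, T)     # theta given rho, kappa
        R = detect_indicators(Z, K, mu, lam)   # rho given theta, kappa
        K = detect_indicators(Z.T, R, mu, lam) # kappa given theta, rho
    return R, K, mu
```

On the toy Y built earlier (after subtracting a fitted background level to form Z), calling soc with S = 20, T = 1, and lam = 3.0 mirrors the settings of the preliminary simulation on the slides that follow.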
Uniqueness
- Plain plaid models (no row/column effects): decomposition into a product of unknown matrices, $Z = R\,D\,K^{T}$
  - binary-valued matrices $R = [\rho_{il}]$ (n × k) and $K = [\kappa_{jl}]$ (p × k)
  - diagonal matrix $D = \mathrm{diag}(\mu_1, \ldots, \mu_k)$
- Blind source separation (BSS) [Talwar et al '96], [van der Veen et al '96]
  - product of two matrices under a finite-alphabet (FA)/constant-modulus (CM) constraint
  - (generally) uniquely identifiable given a sufficient number of samples
- Three-way arrays (Candecomp/Parafac) [Kruskal '77], [Sidiropoulos et al '00]: unique up to permutation and scaling, but the guarantee fails to hold in the two-way case here (third dimension equal to 1)

Uniqueness (cont'd)
- Sparsity in blind identification
  - sparse component analysis: very sparse representations [Georgiev et al '07]
  - non-negative source separation using local dominance [Chan et al '08]
- Proposition: Consider $Z = R\,D\,K^{T}$, where the diagonal matrix D and the binary-valued matrices R, K are all of full rank, and each column vector $\mathbf{k}_l$ of K is locally sparse $\forall l$: there exists an (unknown) row index $j_l$ such that $\kappa_{j_l l} = 1$ while $\kappa_{j_l m} = 0$ for all $m \neq l$. Then, given Z, the matrices R, D, and K are unique up to permutations.
- The proof relies on convex analysis (a small numerical illustration closes the deck)
  - the affine hull of the column vectors of K coincides with that of Z
  - under local sparseness, its convex hull becomes the intersection of the affine hull and the positive orthant
  - the columns of K are the extreme points of this convex hull
- The result also holds when R is locally sparse (by symmetry)

Preliminary Simulation
- Two uniform blocks + noise ~ Unif[0, 0.5]
- SOC parameters: k = 2, S = 20, T = 1, and $\lambda = 0, 3$
- [Figure: the original matrix, its randomly permuted version, and the plaid layers recovered with $\lambda = 0$ and $\lambda = 3$.]

Real Data To Simulate
- Internet traffic flow data
  - uncover different types of co-clusters: in-star, out-star, bi-mesh, ...
  - examples from the e-mail application: department servers, Gmail
  - overlapping co-clusters may reveal server farms
- Gene expression microarray data
  - co-clusters may exhibit biological patterns
  - need to be checked against the gene enrichment value

Concluding Summary
- Plaid models reveal overlapping co-clusters
  - exploit sparsity for parsimonious recovery
  - decide jointly among multiple layers
- The SOC algorithm iteratively updates the unknown parameters
  - a coordinate descent solver handles the layer-level parameters
  - a sphere decoder detects the membership indicators jointly
- Local sparseness leads to a unique decomposition

Future Directions
- Implementation issues with parameter choices
- Efficient initializations and membership-vector detection
- Comprehensive numerical experiments on real data

Thank You!
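Closing appendix: a tiny numerical illustration (a construction assumed here, not taken from the slides) of why local dominance pins the plain-plaid decomposition down. If column $\mathbf{k}_l$ of K owns a pure row $j_l$, then column $j_l$ of $Z = R\,D\,K^{T}$ equals $\mu_l \mathbf{r}_l$; under the extra assumptions made below (noiseless Z, distinct positive $\mu_l$), such pure columns are recognizable as the ones taking a single nonzero value, so R and D can be read off and K recovered by least squares. This illustrates the proposition's conclusion, not its convex-analytic proof.

```python
# Local dominance on K exposes scaled copies of R's columns as "pure"
# columns of Z, recovering R, D, K up to permutation. Extra assumptions:
# noiseless Z and distinct positive mu_l, so a pure column is exactly one
# with a single distinct nonzero level.
import numpy as np

R = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]])   # n=5, k=2
K = np.array([[1, 0], [1, 1], [0, 1]])                   # rows 0, 2 pure
mu = [2.0, 3.0]
Z = R @ np.diag(mu) @ K.T

pure = {}                                 # level -> binary support pattern
for j in range(Z.shape[1]):
    levels = np.unique(Z[:, j][Z[:, j] > 0])
    if len(levels) == 1:                  # single nonzero level => pure
        pure[float(levels[0])] = (Z[:, j] > 0).astype(int)

mu_hat = sorted(pure)                     # recovered levels (any order)
R_hat = np.stack([pure[m] for m in mu_hat], axis=1)
K_hat = np.rint(np.linalg.pinv(R_hat @ np.diag(mu_hat)) @ Z).T.astype(int)
print(mu_hat)                             # [2.0, 3.0]
print(np.array_equal(R_hat, R), np.array_equal(K_hat, K))  # True True
```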