Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao1, Feng Liang2, Wei Fan3, Yizhou Sun1, Jiawei Han1
1 CS, UIUC   2 Statistics, UIUC   3 IBM T. J. Watson Research Center

A Toy Example

[Figure: seven objects x1-x7 assigned to three classes/clusters (1, 2, 3) by four base models; the models agree on some objects and disagree on others.]

Motivations

• Consensus maximization
  – Combine the outputs of multiple supervised and unsupervised models on a set of objects for better label predictions
  – The predicted labels should agree with the base models as much as possible
• Motivations
  – Unsupervised models provide useful constraints for classification tasks
  – Model diversity improves prediction accuracy and robustness
  – Model combination at the output level is needed in distributed computing or privacy-preserving applications

Related Work (1)

• Single models
  – Supervised: SVM, logistic regression, ...
  – Unsupervised: k-means, spectral clustering, ...
  – Semi-supervised learning, collective inference
• Supervised ensembles
  – Require raw data and labels: bagging, boosting, Bayesian model averaging
  – Require labels: mixture of experts, stacked generalization
  – Majority voting works at the output level and does not require labels

Related Work (2)

• Unsupervised ensembles
  – Find a consensus clustering from multiple partitionings without accessing the features
• Multi-view learning
  – A joint model is learned from both labeled and unlabeled data drawn from multiple sources
  – It can be regarded as a semi-supervised ensemble that requires access to the raw data

Related Work (3)

• Single models
  – Supervised: SVM, logistic regression, ...
  – Semi-supervised: semi-supervised learning, collective inference
  – Unsupervised: k-means, spectral clustering, ...
• Ensembles at the raw-data level
  – Supervised: bagging, boosting, Bayesian model averaging, ...; mixture of experts, stacked generalization
  – Semi-supervised: multi-view learning
• Ensembles at the output level
  – Supervised: majority voting
  – Semi-supervised: consensus maximization (this work)
  – Unsupervised: clustering ensemble

Groups-Objects

[Figure: the outputs of four base models on objects x1-x7 are summarized as twelve group nodes g1-g12, three per model; each object is linked to exactly one group per model.]

Bipartite Graph

• Object nodes and group nodes form a bipartite graph; each node carries a conditional probability vector over the c classes:
  u_i = [u_{i1}, \ldots, u_{ic}] for object i,   q_j = [q_{j1}, \ldots, q_{jc}] for group j
• Adjacency:
  a_{ij} = 1 if object i is connected to group j, and a_{ij} = 0 otherwise
• Initial probability for groups produced by classification models:
  y_j = [1\ 0 \ldots 0] if g_j's label is 1, ..., y_j = [0 \ldots 0\ 1] if g_j's label is c
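To make the summarization step concrete, below is a minimal sketch (not from the paper) of how the group-object bipartite graph could be assembled from base model outputs. The input format, the function name build_bipartite_graph, and the use of NumPy are illustrative assumptions: each model contributes a vector of group assignments, plus a group-to-class label map when the model is a classifier.

```python
import numpy as np

def build_bipartite_graph(model_outputs, n_classes):
    """Summarize base model outputs as a group-object bipartite graph.

    model_outputs : list of (assignments, labels) pairs, one per base model.
        assignments[i] is the index (within that model) of the group that
        object i falls into; labels maps a group index to its class label
        for a classifier, or is None for a clustering model.
    Returns the (n, v) adjacency matrix A, the (v, c) initial group
    probabilities Y, and a boolean mask marking the labeled groups.
    """
    n = len(model_outputs[0][0])
    blocks, y_rows, labeled = [], [], []
    for assignments, labels in model_outputs:
        assignments = np.asarray(assignments)
        k = assignments.max() + 1                  # number of groups in this model
        block = np.zeros((n, k))
        block[np.arange(n), assignments] = 1.0     # a_ij = 1 iff object i is in group j
        blocks.append(block)
        for g in range(k):
            y = np.zeros(n_classes)
            if labels is not None:                 # classifier group: one-hot initial y_j
                y[labels[g]] = 1.0
            y_rows.append(y)
            labeled.append(labels is not None)
    A = np.hstack(blocks)                          # n objects x v groups in total
    return A, np.array(y_rows), np.array(labeled)
```

On the toy example, four base models over seven objects would give a 7 x 12 matrix A, one column per group node g1-g12, matching the Groups-Objects slide.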
Objective

• Minimize disagreement over the bipartite graph:
  \min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \| u_i - q_j \|^2 + \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 \right)
• First term: an object and a group should have similar conditional probabilities if the object is connected to the group
• Second term: the s groups produced by classification models should not deviate much from their initial probabilities y_j

Methodology

• Iterate until convergence:
  – Update the probability of a group:
    q_j = \frac{ \sum_{i=1}^{n} a_{ij} u_i + \alpha y_j }{ \sum_{i=1}^{n} a_{ij} + \alpha }
    (the \alpha terms apply only to the s groups that carry initial labels y_j)
  – Update the probability of an object:
    u_i = \frac{ \sum_{j=1}^{v} a_{ij} q_j }{ \sum_{j=1}^{v} a_{ij} }

Constrained Embedding

• The objective can also be read as embedding groups and objects into a c-dimensional space:
  \min_{Q,U} \sum_{j=1}^{v} \sum_{z=1}^{c} \left( q_{jz} - \frac{ \sum_{i=1}^{n} a_{ij} u_{iz} }{ \sum_{i=1}^{n} a_{ij} } \right)^2   subject to q_{jz} = 1 if g_j's label is z
• The hard constraints come from the classification models; relaxing them into the penalty \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 recovers the objective above

Ranking on Consensus Structure

• Substituting the object update into the group update shows that each column of Q satisfies a personalized-PageRank-style equation; for class 1 as the query:
  q_{\cdot 1} = D^{-1} (A^T D_n^{-1} A) q_{\cdot 1} + \alpha D^{-1} K y_{\cdot 1}
  where A is the adjacency matrix, D_n and D_v are the object and group degree matrices, K marks the s labeled groups, D = D_v + \alpha K, and the entries of D^{-1} act as personalized damping factors

Incorporating Labeled Information

• Objective with l labeled objects (f_i is the observed label vector of object i):
  \min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \| u_i - q_j \|^2 + \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 + \beta \sum_{i=1}^{l} \| u_i - f_i \|^2 \right)
• Update the probability of a group (unchanged):
  q_j = \frac{ \sum_{i=1}^{n} a_{ij} u_i + \alpha y_j }{ \sum_{i=1}^{n} a_{ij} + \alpha }
• Update the probability of an object:
  u_i = \frac{ \sum_{j=1}^{v} a_{ij} q_j + \beta f_i }{ \sum_{j=1}^{v} a_{ij} + \beta } for the l labeled objects,   u_i = \frac{ \sum_{j=1}^{v} a_{ij} q_j }{ \sum_{j=1}^{v} a_{ij} } otherwise

Experiments-Data Sets

• 20 Newsgroups
  – Newsgroup message categorization
  – Only text information available
• Cora
  – Research paper area categorization
  – Paper abstracts and citation information available
• DBLP
  – Researchers' area prediction
  – Publication and co-authorship network, plus publication content
  – The conferences' areas are known

Experiments-Baseline Methods (1)

• Single models
  – 20 Newsgroups: logistic regression, SVM, k-means, min-cut
  – Cora: models built on abstracts and on citations (with or without a labeled set)
  – DBLP: models built on publication titles and on co-authorship links (with or without labels from conferences)
• Proposed method
  – BGCM
  – BGCM-L: semi-supervised version combining the four models
  – 2-L: two models
  – 3-L: three models

Experiments-Baseline Methods (2)

• Ensemble approaches
  – Clustering ensemble on all of the four models: MCLA, HBGF

Accuracy (1)

[Figure: accuracy comparison of the proposed methods against the base models and ensemble baselines.]

Accuracy (2)

[Figure: accuracy comparison, continued.]

Conclusions

• Summary
  – Combines the complementary predictive powers of multiple supervised and unsupervised models
  – Losslessly summarizes the base model outputs in a group-object bipartite graph
  – Propagates labeled information between group and object nodes iteratively
  – Two interpretations: constrained embedding and ranking on a consensus structure
  – Results on various data sets show the benefits
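As a concrete illustration of the propagation scheme summarized above, here is a minimal sketch of the alternating updates from the Methodology slide (without the labeled-object term, i.e. beta = 0). It assumes A, Y, and the labeled-group mask come from a builder like the one sketched earlier; alpha, max_iter, and tol are illustrative choices, not values from the paper.

```python
import numpy as np

def consensus_maximization(A, Y, labeled, alpha=2.0, max_iter=100, tol=1e-6):
    """Iterate the two update rules from the Methodology slide.

    A       : (n, v) object-group adjacency matrix
    Y       : (v, c) initial group probabilities (one-hot for classifier groups)
    labeled : (v,) boolean mask, True for the s groups that carry labels
    Returns the consensus object probabilities U and group probabilities Q.
    """
    n, v = A.shape
    c = Y.shape[1]
    U = np.full((n, c), 1.0 / c)               # objects start uninformative
    w = alpha * labeled.astype(float)          # alpha for labeled groups, 0 otherwise
    col_deg = A.sum(axis=0)                    # sum_i a_ij, one entry per group
    row_deg = A.sum(axis=1)                    # sum_j a_ij, one entry per object
    for _ in range(max_iter):
        # q_j <- (sum_i a_ij u_i + alpha y_j) / (sum_i a_ij + alpha)
        Q = (A.T @ U + w[:, None] * Y) / (col_deg + w)[:, None]
        # u_i <- (sum_j a_ij q_j) / (sum_j a_ij)
        U_new = (A @ Q) / row_deg[:, None]
        if np.abs(U_new - U).max() < tol:      # stop when object probabilities stabilize
            U = U_new
            break
        U = U_new
    return U, Q

# Predicted class of object i: np.argmax(U[i])
```

The prediction for each object is the argmax over its row of U; alpha controls how strongly the classifier groups are anchored to their initial labels y_j.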