Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao1, Feng Liang2, Wei Fan3, Yizhou Sun1, Jiawei Han1
1 CS, UIUC   2 Statistics, UIUC   3 IBM T. J. Watson Research Center

A Toy Example

[Figure: seven objects x1-x7 assigned to three classes/clusters (1, 2, 3) by four base models; the models agree on some objects and disagree on others.]

Motivations

• Consensus maximization
  – Combine the outputs of multiple supervised and unsupervised models on a set of objects for better label predictions
  – The predicted labels should agree with the base models as much as possible
• Motivations
  – Unsupervised models provide useful constraints for classification tasks
  – Model diversity improves prediction accuracy and robustness
  – Model combination at the output level is needed in distributed computing or privacy-preserving applications

Related Work (1)

• Single models
  – Supervised: SVM, logistic regression, ...
  – Unsupervised: k-means, spectral clustering, ...
  – Semi-supervised learning, collective inference
• Supervised ensembles
  – Require raw data and labels: bagging, boosting, Bayesian model averaging
  – Require labels: mixture of experts, stacked generalization
  – Majority voting works at the output level and does not require labels

Related Work (2)

• Unsupervised ensembles
  – Find a consensus clustering from multiple partitionings without accessing the features
• Multi-view learning
  – A joint model is learned from both labeled and unlabeled data drawn from multiple sources
  – It can be regarded as a semi-supervised ensemble that requires access to the raw data

Related Work (3)

• Single models
  – Supervised: SVM, logistic regression, ...
  – Semi-supervised: semi-supervised learning, collective inference
  – Unsupervised: k-means, spectral clustering, ...
• Ensembles at the raw-data level
  – Supervised: bagging, boosting, Bayesian model averaging, ...; mixture of experts, stacked generalization
  – Semi-supervised: multi-view learning
• Ensembles at the output level
  – Supervised: majority voting
  – Semi-supervised: consensus maximization (this work)
  – Unsupervised: clustering ensemble

Groups-Objects

[Figure: the outputs of four base models on objects x1-x7 are summarized as twelve group nodes g1-g12, three per model; each object is linked to exactly one group per model.]

Bipartite Graph

• Object nodes and group nodes form a bipartite graph; each node carries a conditional probability vector over the c classes:
  u_i = [u_{i1}, \ldots, u_{ic}] for object i,   q_j = [q_{j1}, \ldots, q_{jc}] for group j
• Adjacency:
  a_{ij} = 1 if object i is connected to group j, and a_{ij} = 0 otherwise
• Initial probability for groups produced by classification models:
  y_j = [1\ 0 \ldots 0] if g_j's label is 1, ..., y_j = [0 \ldots 0\ 1] if g_j's label is c
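To make the summarization step concrete, below is a minimal sketch (not from the paper) of how the group-object bipartite graph could be assembled from base model outputs. The input format, the function name build_bipartite_graph, and the use of NumPy are illustrative assumptions: each model contributes a vector of group assignments, plus a group-to-class label map when the model is a classifier.

```python
import numpy as np

def build_bipartite_graph(model_outputs, n_classes):
    """Summarize base model outputs as a group-object bipartite graph.

    model_outputs : list of (assignments, labels) pairs, one per base model.
        assignments[i] is the index (within that model) of the group that
        object i falls into; labels maps a group index to its class label
        for a classifier, or is None for a clustering model.
    Returns the (n, v) adjacency matrix A, the (v, c) initial group
    probabilities Y, and a boolean mask marking the labeled groups.
    """
    n = len(model_outputs[0][0])
    blocks, y_rows, labeled = [], [], []
    for assignments, labels in model_outputs:
        assignments = np.asarray(assignments)
        k = assignments.max() + 1                  # number of groups in this model
        block = np.zeros((n, k))
        block[np.arange(n), assignments] = 1.0     # a_ij = 1 iff object i is in group j
        blocks.append(block)
        for g in range(k):
            y = np.zeros(n_classes)
            if labels is not None:                 # classifier group: one-hot initial y_j
                y[labels[g]] = 1.0
            y_rows.append(y)
            labeled.append(labels is not None)
    A = np.hstack(blocks)                          # n objects x v groups in total
    return A, np.array(y_rows), np.array(labeled)
```

On the toy example, four base models over seven objects would give a 7 x 12 matrix A, one column per group node g1-g12, matching the Groups-Objects slide.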
Objective

• Minimize disagreement over the bipartite graph:
  \min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \| u_i - q_j \|^2 + \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 \right)
• First term: an object and a group should have similar conditional probabilities if the object is connected to the group
• Second term: the s groups produced by classification models should not deviate much from their initial probabilities y_j

Methodology

• Iterate until convergence:
  – Update the probability of a group:
    q_j = \frac{ \sum_{i=1}^{n} a_{ij} u_i + \alpha y_j }{ \sum_{i=1}^{n} a_{ij} + \alpha }
    (the \alpha terms apply only to the s groups that carry initial labels y_j)
  – Update the probability of an object:
    u_i = \frac{ \sum_{j=1}^{v} a_{ij} q_j }{ \sum_{j=1}^{v} a_{ij} }

Constrained Embedding

• The objective can also be read as embedding groups and objects into a c-dimensional space:
  \min_{Q,U} \sum_{j=1}^{v} \sum_{z=1}^{c} \left( q_{jz} - \frac{ \sum_{i=1}^{n} a_{ij} u_{iz} }{ \sum_{i=1}^{n} a_{ij} } \right)^2   subject to q_{jz} = 1 if g_j's label is z
• The hard constraints come from the classification models; relaxing them into the penalty \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 recovers the objective above

Ranking on Consensus Structure

• Substituting the object update into the group update shows that each column of Q satisfies a personalized-PageRank-style equation; for class 1 as the query:
  q_{\cdot 1} = D^{-1} (A^T D_n^{-1} A) q_{\cdot 1} + \alpha D^{-1} K y_{\cdot 1}
  where A is the adjacency matrix, D_n and D_v are the object and group degree matrices, K marks the s labeled groups, D = D_v + \alpha K, and the entries of D^{-1} act as personalized damping factors

Incorporating Labeled Information

• Objective with l labeled objects (f_i is the observed label vector of object i):
  \min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \| u_i - q_j \|^2 + \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 + \beta \sum_{i=1}^{l} \| u_i - f_i \|^2 \right)
• Update the probability of a group (unchanged):
  q_j = \frac{ \sum_{i=1}^{n} a_{ij} u_i + \alpha y_j }{ \sum_{i=1}^{n} a_{ij} + \alpha }
• Update the probability of an object:
  u_i = \frac{ \sum_{j=1}^{v} a_{ij} q_j + \beta f_i }{ \sum_{j=1}^{v} a_{ij} + \beta } for the l labeled objects,   u_i = \frac{ \sum_{j=1}^{v} a_{ij} q_j }{ \sum_{j=1}^{v} a_{ij} } otherwise

Experiments-Data Sets

• 20 Newsgroups
  – Newsgroup message categorization
  – Only text information available
• Cora
  – Research paper area categorization
  – Paper abstracts and citation information available
• DBLP
  – Researchers' area prediction
  – Publication and co-authorship network, plus publication content
  – The conferences' areas are known

Experiments-Baseline Methods (1)

• Single models
  – 20 Newsgroups: logistic regression, SVM, k-means, min-cut
  – Cora: models built on abstracts and on citations (with or without a labeled set)
  – DBLP: models built on publication titles and on co-authorship links (with or without labels from conferences)
• Proposed method
  – BGCM
  – BGCM-L: semi-supervised version combining the four models
  – 2-L: two models
  – 3-L: three models

Experiments-Baseline Methods (2)

• Ensemble approaches
  – Clustering ensemble on all of the four models: MCLA, HBGF

Accuracy (1)

[Figure: accuracy comparison of the proposed methods against the base models and ensemble baselines.]

Accuracy (2)

[Figure: accuracy comparison, continued.]

Conclusions

• Summary
  – Combines the complementary predictive powers of multiple supervised and unsupervised models
  – Losslessly summarizes the base model outputs in a group-object bipartite graph
  – Propagates labeled information between group and object nodes iteratively
  – Two interpretations: constrained embedding and ranking on a consensus structure
  – Results on various data sets show the benefits
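As a concrete illustration of the propagation scheme summarized above, here is a minimal sketch of the alternating updates from the Methodology slide (without the labeled-object term, i.e. beta = 0). It assumes A, Y, and the labeled-group mask come from a builder like the one sketched earlier; alpha, max_iter, and tol are illustrative choices, not values from the paper.

```python
import numpy as np

def consensus_maximization(A, Y, labeled, alpha=2.0, max_iter=100, tol=1e-6):
    """Iterate the two update rules from the Methodology slide.

    A       : (n, v) object-group adjacency matrix
    Y       : (v, c) initial group probabilities (one-hot for classifier groups)
    labeled : (v,) boolean mask, True for the s groups that carry labels
    Returns the consensus object probabilities U and group probabilities Q.
    """
    n, v = A.shape
    c = Y.shape[1]
    U = np.full((n, c), 1.0 / c)               # objects start uninformative
    w = alpha * labeled.astype(float)          # alpha for labeled groups, 0 otherwise
    col_deg = A.sum(axis=0)                    # sum_i a_ij, one entry per group
    row_deg = A.sum(axis=1)                    # sum_j a_ij, one entry per object
    for _ in range(max_iter):
        # q_j <- (sum_i a_ij u_i + alpha y_j) / (sum_i a_ij + alpha)
        Q = (A.T @ U + w[:, None] * Y) / (col_deg + w)[:, None]
        # u_i <- (sum_j a_ij q_j) / (sum_j a_ij)
        U_new = (A @ Q) / row_deg[:, None]
        if np.abs(U_new - U).max() < tol:      # stop when object probabilities stabilize
            U = U_new
            break
        U = U_new
    return U, Q

# Predicted class of object i: np.argmax(U[i])
```

The prediction for each object is the argmax over its row of U; alpha controls how strongly the classifier groups are anchored to their initial labels y_j.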