Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

Jing Gao¹, Feng Liang², Wei Fan³, Yizhou Sun¹, Jiawei Han¹
¹ Computer Science, UIUC
² Statistics, UIUC
³ IBM T. J. Watson Research Center
A Toy Example
[Figure: seven objects x1–x7; each of four base models partitions them into three groups labeled 1, 2, and 3, and the models disagree on some of the objects]
2/19
Motivations
• Consensus maximization
– Combine the outputs of multiple supervised and unsupervised models on a set of objects for better label predictions
– The predicted labels should agree with the base models as much as possible
• Motivations
– Unsupervised models provide useful constraints for classification tasks
– Model diversity improves prediction accuracy and robustness
– Model combination at the output level is needed in distributed computing or privacy-preserving applications
3/19
Related Work (1)
• Single models
– Supervised: SVM, logistic regression, …
– Unsupervised: K-means, spectral clustering, …
– Semi-supervised learning, collective inference
• Supervised ensemble
– Require raw data and labels: bagging, boosting, Bayesian model averaging
– Require labels: mixture of experts, stacked generalization
– Majority voting works at the output level and does not require labels
4/19
Related Work (2)
• Unsupervised ensemble
– Finds a consensus clustering from multiple partitionings without accessing the features
• Multi-view learning
– A joint model is learned from labeled and unlabeled data drawn from multiple sources
– Can be regarded as a semi-supervised ensemble that requires access to the raw data
5/19
Related Work (3)
[Figure: taxonomy of related work, reconstructed as a table]

                        Single Models                Ensemble at Raw Data          Ensemble at Output Level
Supervised Learning     SVM, Logistic Regression,    Bagging, Boosting, Bayesian   Majority Voting
                        …                            model averaging, Mixture of
                                                     Experts, Stacked
                                                     Generalization, …
Semi-supervised         Semi-supervised Learning,    Multi-view Learning           Consensus Maximization
Learning                Collective Inference
Unsupervised            K-means, Spectral                                          Clustering Ensemble
Learning                Clustering, …
6/19
Groups-Objects
g1
x1
x2
x1
g4
x2
x1
g7
x2
x1
1
1
x2
g10
g12
x3
x4
x3
x4
x3
x4
2 g5
2
g8
x5
3
x6
g3
g2
x7
x5
x6
x5
x6
x3
x4
g11
x5
x6
3
g6
g9
x7
x7
x7
7/19
Bipartite Graph
[Figure: bipartite graph with group nodes (from models M1–M4) on one side and object nodes on the other]

• Each object $i$ carries a conditional probability vector $\vec{u}_i = [u_{i1}, \ldots, u_{ic}]$ over the $c$ classes; each group $j$ carries $\vec{q}_j = [q_{j1}, \ldots, q_{jc}]$.
• Adjacency:
$$a_{ij} = \begin{cases} 1 & \text{object } i \text{ belongs to group } j \\ 0 & \text{otherwise} \end{cases}$$
• Initial probability of a group (for groups output by classification models):
$$\vec{y}_j = \begin{cases} [1\ 0\ \cdots\ 0] & \text{if } g_j\text{'s label is } 1 \\ \quad\cdots & \\ [0\ \cdots\ 0\ 1] & \text{if } g_j\text{'s label is } c \end{cases}$$
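To make this concrete, here is a minimal NumPy sketch (my illustration, not code from the paper) of how the adjacency matrix might be assembled from the base models' outputs; the function name and input format are assumptions:

    import numpy as np

    def build_bipartite_graph(model_outputs):
        """Stack all base models' groups into one object-group adjacency matrix.

        model_outputs: list of length-n integer arrays; the t-th array gives,
        for each object, the index of the group (class or cluster) that base
        model t assigns it to. Returns the (n objects, v groups) 0/1 matrix A.
        """
        n = len(model_outputs[0])
        blocks = []
        for assign in model_outputs:
            k = assign.max() + 1                # number of groups from this model
            block = np.zeros((n, k))
            block[np.arange(n), assign] = 1.0   # a_ij = 1 iff object i is in group j
            blocks.append(block)
        return np.hstack(blocks)

For a group produced by a classification model, the initial vector $\vec{y}_j$ is the one-hot encoding of its predicted class; clustering groups carry no initial probability.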
8/19
Objective
[Figure: the bipartite graph, annotated with the two terms of the objective]

Minimize disagreement over the bipartite graph:

$$\min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\, \lVert \vec{u}_i - \vec{q}_j \rVert^{2} + \alpha \sum_{j=1}^{s} \lVert \vec{q}_j - \vec{y}_j \rVert^{2} \right)$$

• First term: an object should have a conditional probability vector similar to that of each group it is connected to.
• Second term: the first $s$ groups (those produced by classification models) should not deviate much from their initial probabilities.
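The updates on the next slide follow because the objective is quadratic in each block of variables; setting the gradient with respect to $\vec{q}_j$ to zero (for $j \le s$; for $j > s$ the $\alpha$ terms drop out) gives:

    \frac{\partial}{\partial \vec{q}_j}
        = 2 \sum_{i=1}^{n} a_{ij}\,(\vec{q}_j - \vec{u}_i) + 2\alpha\,(\vec{q}_j - \vec{y}_j) = 0
    \quad\Longrightarrow\quad
    \vec{q}_j = \frac{\sum_{i=1}^{n} a_{ij}\,\vec{u}_i + \alpha\,\vec{y}_j}{\sum_{i=1}^{n} a_{ij} + \alpha}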
9/19
Methodology
[Figure: probabilities propagate between the group nodes of M1–M4 and the object nodes]

Iterate until convergence:

• Update the probability of a group (the $\alpha$ terms apply to the $s$ groups with initial labels $\vec{y}_j$):

$$\vec{q}_j = \frac{\sum_{i=1}^{n} a_{ij}\,\vec{u}_i + \alpha\,\vec{y}_j}{\sum_{i=1}^{n} a_{ij} + \alpha}$$

• Update the probability of an object:

$$\vec{u}_i = \frac{\sum_{j=1}^{v} a_{ij}\,\vec{q}_j}{\sum_{j=1}^{v} a_{ij}}$$
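A minimal NumPy sketch of this propagation loop (my illustration, not the authors' released code), reusing the adjacency matrix built earlier; `is_classifier_group` masks the $s$ groups that carry initial labels, and the defaults are placeholders:

    import numpy as np

    def consensus_maximization(A, Y, is_classifier_group, alpha=2.0, n_iters=100):
        """Alternate the closed-form updates for Q (groups) and U (objects).

        A: (n, v) 0/1 object-group adjacency matrix.
        Y: (v, c) initial group probabilities (one-hot rows for classifier
           groups, zero rows for clustering groups).
        is_classifier_group: (v,) bool mask; the alpha * Y prior applies there.
        Returns U, the (n, c) consensus class-probability estimates.
        """
        n, v = A.shape
        c = Y.shape[1]
        prior = alpha * is_classifier_group.astype(float)   # alpha or 0 per group
        U = np.full((n, c), 1.0 / c)                        # uninformative start
        for _ in range(n_iters):
            # group update: weighted average of member objects plus the prior
            Q = (A.T @ U + prior[:, None] * Y) / (A.sum(axis=0) + prior)[:, None]
            # object update: average of the groups the object belongs to
            U = (A @ Q) / A.sum(axis=1, keepdims=True)
        return U

The consensus prediction for object $i$ is then $\arg\max_z U_{iz}$.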
10/19
Constrained Embedding

The consensus objective can also be read as embedding groups and objects into the $c$-dimensional probability space:

$$\min_{Q,U} \sum_{j=1}^{v} \sum_{z=1}^{c} \left( q_{jz} - \frac{\sum_{i=1}^{n} a_{ij}\, u_{iz}}{\sum_{i=1}^{n} a_{ij}} \right)^{\!2} \quad \text{s.t. } q_{jz} = 1 \text{ if } g_j\text{'s label is } z$$

• The inner fraction averages the embeddings of the objects belonging to group $j$.
• The constraints, for groups from classification models, pin those groups to their predicted classes; they replace the $\alpha$-penalty of the original objective over groups and objects.
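One way to see the link between the two views (my gloss of the slide's annotations): for fixed $U$, the unconstrained minimizer of the first term of the original objective places each group at the centroid of its member objects,

    \arg\min_{\vec{q}_j} \sum_{i=1}^{n} a_{ij}\, \lVert \vec{u}_i - \vec{q}_j \rVert^{2}
        = \frac{\sum_{i=1}^{n} a_{ij}\, \vec{u}_i}{\sum_{i=1}^{n} a_{ij}}

so the embedding objective measures how far each group must move from that centroid in order to satisfy the hard label constraints, which take over the role of the soft $\alpha$-penalty.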
11/19
Ranking on Consensus Structure
[Figure: the bipartite graph viewed as a structure to rank over]

At convergence, each class column of $Q$ (shown here for class 1) satisfies a personalized-ranking-style fixed point on the consensus structure:

$$\vec{q}_{\cdot 1} = D^{-1} \left( D_v^{-1} A^{T} D_n^{-1} A \right) \vec{q}_{\cdot 1} + D^{-1} \vec{y}_{\cdot 1}$$

Here $A$ is the adjacency matrix of the bipartite graph, $\vec{y}_{\cdot 1}$ plays the role of the query, and $D$, $D_n$, $D_v$ act as personalized damping factors, as in personalized PageRank.
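Read this way, each class column can be computed by fixed-point iteration, much like personalized PageRank. The sketch below iterates the fixed point obtained by substituting the object update into the group update (my derivation; the slide's exact normalization matrices may differ):

    import numpy as np

    def rank_class_column(A, y_col, g_prior, n_iters=100):
        """Fixed-point iteration for one class column q_{.z} of Q.

        A: (n, v) object-group adjacency; y_col: (v,) query scores for class z;
        g_prior: (v,) damping weights (alpha for classifier groups, 0 otherwise).
        Converges to the same column as the alternating algorithm.
        """
        d_n = A.sum(axis=1)             # object degrees
        d = A.sum(axis=0) + g_prior     # personalized damping per group
        q = y_col.astype(float)
        for _ in range(n_iters):
            # propagate group scores to objects and back, then damp toward the query
            q = (A.T @ ((A @ q) / d_n) + g_prior * y_col) / d
        return q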
12/19
Incorporating Labeled Information
[Figure: the bipartite graph with observed labels attached to some object nodes]

• Objective, with $l$ labeled objects whose observed label vectors are $\vec{f}_i$:

$$\min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\, \lVert \vec{u}_i - \vec{q}_j \rVert^{2} + \alpha \sum_{j=1}^{s} \lVert \vec{q}_j - \vec{y}_j \rVert^{2} + \beta \sum_{i=1}^{l} \lVert \vec{u}_i - \vec{f}_i \rVert^{2} \right)$$

• Update the probability of a group (unchanged):

$$\vec{q}_j = \frac{\sum_{i=1}^{n} a_{ij}\,\vec{u}_i + \alpha\,\vec{y}_j}{\sum_{i=1}^{n} a_{ij} + \alpha}$$

• Update the probability of an object: for unlabeled objects, as before,

$$\vec{u}_i = \frac{\sum_{j=1}^{v} a_{ij}\,\vec{q}_j}{\sum_{j=1}^{v} a_{ij}};$$

for the $l$ labeled objects, the observed label also pulls the estimate:

$$\vec{u}_i = \frac{\sum_{j=1}^{v} a_{ij}\,\vec{q}_j + \beta\,\vec{f}_i}{\sum_{j=1}^{v} a_{ij} + \beta}$$
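Only the object update changes; a hedged extension of the earlier sketch (`F` and `is_labeled` are assumed inputs, not names from the paper):

    import numpy as np

    def consensus_maximization_semi(A, Y, is_classifier_group, F, is_labeled,
                                    alpha=2.0, beta=2.0, n_iters=100):
        """BGCM updates with a beta-weighted pull toward observed labels.

        F: (n, c) one-hot observed label vectors (rows for unlabeled objects
           are ignored); is_labeled: (n,) bool mask of the l labeled objects.
        """
        n, v = A.shape
        c = Y.shape[1]
        g_prior = alpha * is_classifier_group.astype(float)
        o_prior = beta * is_labeled.astype(float)
        U = np.full((n, c), 1.0 / c)
        for _ in range(n_iters):
            Q = (A.T @ U + g_prior[:, None] * Y) / (A.sum(axis=0) + g_prior)[:, None]
            # labeled objects average their groups together with their observed label
            U = (A @ Q + o_prior[:, None] * F) / (A.sum(axis=1) + o_prior)[:, None]
        return U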
13/19
Experiments-Data Sets
• 20 Newsgroups
– newsgroup message categorization
– only text information available
• Cora
– research paper area categorization
– paper abstracts and citation information available
• DBLP
– researchers' research-area prediction
– publication and co-authorship network, plus publication content
– conferences' areas are known
14/19
Experiments-Baseline Methods (1)
• Single models
– 20 Newsgroups: logistic regression, SVM, K-means, min-cut
– Cora: abstracts, citations (with or without a labeled set)
– DBLP: publication titles, links (with or without labels from conferences)
• Proposed method
– BGCM
– BGCM-L: semi-supervised version combining four models
– 2-L: semi-supervised version combining two models
– 3-L: semi-supervised version combining three models
15/19
Experiments-Baseline Methods (2)
• Ensemble approaches
– clustering ensemble on all of the four models: MCLA, HBGF
16/19
Accuracy (1)
[Figure: accuracy comparison of the proposed methods and baselines]
17/19
Accuracy (2)
[Figure: accuracy comparison, continued]
18/19
Conclusions
• Summary
– Combine the complementary predictive powers of multiple supervised and unsupervised models
– Losslessly summarize the base model outputs in a group-object bipartite graph
– Propagate labeled information between group and object nodes iteratively
– Two interpretations: constrained embedding and ranking on a consensus structure
– Results on various data sets show the benefits
20/19
Download