Multi-label Relational Neighbor Classification
using Social Context Features
Xi Wang and Gita Sukthankar
Department of EECS
University of Central Florida
Motivation
• The conventional relational classification model focuses on the single-label classification problem.
• Real-world relational datasets contain instances associated with multiple labels.
• Connections between instances in multi-label networks are driven by various causal reasons.
Figure: Example of a scientific collaboration network spanning overlapping areas (Artificial Intelligence, Data Mining, Machine Learning).
Problem Formulation
• Node classification in multi-relational networks
• Input:
  – Network structure (i.e., connectivity information)
  – Labels of some actors in the network
• Output:
  – Labels of the other actors
Classification in Networked Data
• Homophily: nodes with similar labels are more likely to be connected.
• Markov assumption:
  – The label of one node depends on the labels of its immediate neighbors in the graph.
• Relational models are built based on the labels of neighbors.
• Predictions are made using collective inference.
Contribution
• A new multi-label iterative relational neighbor classifier (SCRN)
• Social context features extracted via edge clustering to represent a node's potential group memberships
• The use of social features boosts classification performance over benchmarks on several real-world collaborative networked datasets
Relational Neighbor Classifier
• The Relational Neighbor (RN) classifier, proposed by Macskassy and Provost (MRDM'03), is a simple relational probabilistic model that makes predictions for a given node based solely on the class labels of its neighbors.
Figure: RN classification on a training graph over two iterations (Iteration 1, Iteration 2).
Relational Neighbor Classifier
• The weighted-vote relational neighbor classifier (wvRN) estimates the prediction probability as:

$$P(L_i = c \mid N_i) = \frac{1}{z} \sum_{v_j \in N_i} w(v_i, v_j)\, P(L_j = c \mid N_j)$$

Here z is the usual normalization factor, and w(v_i, v_j) is the weight of the link between nodes v_i and v_j.
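As a concrete illustration, here is a minimal Python sketch of the wvRN estimate. The adjacency and probability containers are illustrative assumptions, not taken from the authors' released implementation:

```python
def wvrn_probability(i, c, neighbors, w, prob):
    """Weighted-vote relational neighbor estimate P(L_i = c | N_i).

    neighbors[i] -- iterable of node i's neighbors
    w[(i, j)]    -- weight of the link between nodes i and j
    prob[j][c]   -- current estimate P(L_j = c | N_j)
    """
    z = sum(w[(i, j)] for j in neighbors[i])  # normalization factor
    if z == 0:
        return 0.0  # isolated node: no relational evidence
    return sum(w[(i, j)] * prob[j][c] for j in neighbors[i]) / z
```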
Applying RN in a Multi-relational Network
Figure: Ground truth of a multi-label network. Legend: nodes with both labels (red and green); nodes with the green label only; nodes with the red label only.
Edge-Based Social Feature Extraction
• Connections in human networks are mainly affiliation-driven.
• Since each connection can often be regarded as principally resulting from one affiliation, links possess a strong correlation with a single affiliation class.
• The edge class information is not readily available in most social media datasets, but an unsupervised clustering algorithm can be applied to partition the edges into disjoint sets (KDD'09, CIKM'09).
Cluster edges using K-Means
• A scalable edge-clustering method proposed by Tang and Liu (CIKM'09).
• Each edge is represented in a feature-based format, characterized by its adjacent nodes.
• K-means clustering is used to separate the edges into groups, and the social feature (SF) vector is constructed based on edge cluster IDs, as sketched below.
Figure: Original network; Step 1: edge representations; Step 2: constructed social features.
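A minimal Python sketch of the two steps, using scikit-learn's KMeans. The function name and the sparse edge representation are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import lil_matrix
from sklearn.cluster import KMeans

def edge_cluster_social_features(edges, n_nodes, k):
    """Cluster edges with K-means and build each node's social feature vector.

    edges   -- list of (u, v) node-index pairs
    n_nodes -- number of nodes in the network
    k       -- number of edge clusters (size of the social feature space)
    """
    # Step 1: represent each edge by the indicator vector of its adjacent nodes.
    X = lil_matrix((len(edges), n_nodes))
    for e, (u, v) in enumerate(edges):
        X[e, u] = 1.0
        X[e, v] = 1.0

    # Partition the edges into k disjoint groups.
    cluster_ids = KMeans(n_clusters=k, n_init=10).fit_predict(X.tocsr())

    # Step 2: a node's social feature counts its incident edges per cluster.
    SF = np.zeros((n_nodes, k))
    for (u, v), cid in zip(edges, cluster_ids):
        SF[u, cid] += 1
        SF[v, cid] += 1
    return SF
```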
Edge-Clustering Visualization
Figure: A subset of DBLP with 95 instances. Edges are clustered into 10
groups, with each shown in a different color.
Proposed Method: SCRN
• The initial set of reference features for class c can be defined as the weighted sum of social feature vectors for nodes known to be in class c:

$$RV(c) = \frac{1}{|V_c^K|} \sum_{v_i \in V_c^K} P(l_i^c = 1) \times SF(v_i)$$

• Node v_i's class propagation probability for class c, conditioned on its social features, is then:

$$P_{CP}(l_i^c \mid SF(v_i)) = \mathrm{sim}(SF(v_i), RV(c))$$
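The two definitions translate directly into code; a sketch under the same illustrative conventions as above. The similarity function sim is left as a parameter here (the experiments later use the Generalized Histogram Intersection kernel):

```python
import numpy as np

def class_reference_vector(SF, labeled_in_c, prob, c):
    """RV(c): weighted average of social feature vectors of the labeled
    nodes V_c^K known to be in class c; prob[i][c] = P(l_i^c = 1)."""
    rv = np.zeros(SF.shape[1])
    for i in labeled_in_c:
        rv += prob[i][c] * SF[i]
    return rv / len(labeled_in_c)

def class_propagation_probability(SF_i, rv_c, sim):
    """P_CP(l_i^c | SF(v_i)) = sim(SF(v_i), RV(c))."""
    return sim(SF_i, rv_c)
```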
SCRN
• SCRN estimates the class-membership probability of node v_i belonging to class c using the following equation:

$$P(l_i^c \mid N_i, SF(v_i)) = \frac{1}{z} \sum_{v_j \in N_i} P_{CP}(l_i^c \mid SF(v_i)) \times w(v_i, v_j) \times P(l_j^c \mid N_j)$$

The three factors are the class propagation probability, the similarity between connected nodes (link weight), and the class probability of the node's neighbors.
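Continuing the sketch, the per-node estimate is the wvRN vote scaled by the class propagation probability (note P_CP does not depend on the neighbor j, so it could equally be factored out of the sum):

```python
def scrn_probability(i, c, neighbors, w, prob, pcp):
    """SCRN estimate P(l_i^c | N_i, SF(v_i)).

    pcp[i][c] -- class propagation probability P_CP(l_i^c | SF(v_i));
    the other arguments are as in the wvRN sketch above.
    """
    z = sum(w[(i, j)] for j in neighbors[i])
    if z == 0:
        return 0.0
    vote = sum(pcp[i][c] * w[(i, j)] * prob[j][c] for j in neighbors[i])
    return vote / z
```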
SCRN Overview
Input: {G, V, E, C, L^K}, Max_Iter
Output: L^U for the nodes in V^U
1. Construct the nodes' social feature space
2. Initialize the class reference vectors for each class
3. Calculate the class-propagation probability for each test node
4. Repeat until # of iterations > Max_Iter or the predictions converge:
  – Estimate each test node's class probability
  – Update the test node's class probability in collective inference
  – Update the class reference vectors
  – Re-calculate each node's class-propagation probability
A sketch of this loop follows.
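A sketch of the overall loop, tying together the pieces above. The initialization, the convergence test, and the simultaneous (relaxation-labeling style) update schedule are illustrative assumptions:

```python
def scrn(neighbors, w, SF, known, classes, sim, max_iter=100, tol=1e-4):
    """Collective inference loop mirroring the SCRN overview (illustrative).

    known -- dict mapping each labeled node to the set of its classes (L^K)
    """
    nodes = range(len(SF))
    # Labeled nodes are clamped to 1/0; unlabeled nodes start uniform.
    prob = {i: {c: (float(c in known[i]) if i in known else 1.0 / len(classes))
                for c in classes} for i in nodes}
    for _ in range(max_iter):
        # Update class reference vectors and class-propagation probabilities.
        rv = {c: class_reference_vector(SF, [i for i in known if c in known[i]],
                                        prob, c) for c in classes}
        pcp = {i: {c: sim(SF[i], rv[c]) for c in classes} for i in nodes}
        # Relaxation labeling: all estimates use the previous iteration's values.
        new_prob = {}
        delta = 0.0
        for i in nodes:
            if i in known:
                new_prob[i] = prob[i]
                continue
            new_prob[i] = {c: scrn_probability(i, c, neighbors, w, prob, pcp)
                           for c in classes}
            delta = max(delta, max(abs(new_prob[i][c] - prob[i][c])
                                   for c in classes))
        prob = new_prob
        if delta < tol:  # predictions converged
            break
    return prob
```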
SCRN Visualization
Figure: SCRN on a synthetic multi-label network with 1000 nodes and 32 classes (15 iterations).
Datasets
DBLP
• We construct a weighted collaboration network for authors who have published at least 2 papers during the 2000 to 2010 time frame.
• We selected 15 representative conferences in 6 research areas:
  – Databases: ICDE, VLDB, PODS, EDBT
  – Data Mining: KDD, ICDM, SDM, PAKDD
  – Artificial Intelligence: IJCAI, AAAI
  – Information Retrieval: SIGIR, ECIR
  – Computer Vision: CVPR
  – Machine Learning: ICML, ECML
Datasets
IMDb
• We extract movies and TV shows released between 2000 and 2010; those directed by the same director are linked together.
• We only retain movies and TV programs with more than 5 links.
• Each movie can be assigned to a subset of 27 candidate movie genres in the database, such as "Drama", "Comedy", "Documentary", and "Action".
Datasets
YouTube
• A subset of the original YouTube dataset [1] (15,000 nodes), obtained using snowball sampling.
• Each YouTube user can subscribe to different interest groups and add other users as contacts.
• Class labels are the 47 interest groups.
[1] http://www.public.asu.edu/~ltang9/social_dimension.html
Comparative Methods
• Edge (EdgeCluster)
• wvRN
• Prior
• Random
Experiment Setting
• Size of the social feature space:
  – 1000 for DBLP and YouTube; 10000 for IMDb
• The class propagation probability is calculated with the Generalized Histogram Intersection (GHI) kernel; a sketch follows below.
• Relaxation labeling is used in the collective inference framework for SCRN and wvRN.
• We assume the number of labels for each test node is known.
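The GHI kernel itself is nearly a one-liner; a sketch, where the exponent beta is a free parameter whose value is not specified on the slide:

```python
import numpy as np

def ghi_kernel(x, y, beta=1.0):
    """Generalized Histogram Intersection kernel:
    K(x, y) = sum_i min(|x_i|**beta, |y_i|**beta).
    beta = 1 recovers the classic histogram intersection."""
    return float(np.sum(np.minimum(np.abs(x) ** beta, np.abs(y) ** beta)))
```

In the sketches above, this function can be passed directly as the sim argument.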
Experiment Setting
• We employ the network cross-validation (NCV) method (KAIS'11) to reduce the overlap between test samples.
• Classification performance is evaluated with Micro-F1, Macro-F1, and Hamming Loss, as sketched below.
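All three metrics are available in scikit-learn; a minimal sketch, assuming binary label-indicator matrices for ground truth and predictions:

```python
from sklearn.metrics import f1_score, hamming_loss

def evaluate(Y_true, Y_pred):
    """Y_true, Y_pred: (n_samples, n_labels) binary indicator arrays."""
    return {
        "Micro-F1": f1_score(Y_true, Y_pred, average="micro"),
        "Macro-F1": f1_score(Y_true, Y_pred, average="macro"),
        "Hamming Loss": hamming_loss(Y_true, Y_pred),
    }
```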
Results (Micro-F1)
Figure: Micro-F1 accuracy (%) on DBLP as the training data percentage varies from 5% to 30%, comparing SCRN, Edge, wvRN, Prior, and Random.
Results (Macro-F1)
Figure: Macro-F1 accuracy (%) on DBLP as the training data percentage varies from 5% to 30%, comparing SCRN, Edge, wvRN, Prior, and Random.
Results (Hamming Loss)
Figure: Hamming Loss results on DBLP.
Results (Hamming Loss)
Figure: Hamming Loss results on IMDb.
Results (Hamming Loss)
Figure: Hamming Loss results on YouTube.
Conclusion
• Links in multi-relational networks are heterogeneous.
• SCRN exploits label homophily while simultaneously leveraging social feature similarity through the introduction of class propagation probabilities.
• SCRN significantly boosts classification performance on multi-label collaboration networks.
• Our open-source implementation of SCRN is available at:
http://code.google.com/p/multilabel-classification-on-social-network/
Reference
• MACSKASSY, S. A., AND PROVOST, F. A simple relational classifier. In Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM) at KDD, 2003, pp. 64–76.
• TANG, L., AND LIU, H. Relational learning via latent social dimensions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009, pp. 817–826.
• TANG, L., AND LIU, H. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2009, pp. 1107–1116.
• NEVILLE, J., GALLAGHER, B., ELIASSI-RAD, T., AND WANG, T. Correcting evaluation bias of relational classifiers with network cross validation. Knowledge and Information Systems (KAIS), 2011, pp. 1–25.
Thank you!