Slides

advertisement
Active Learning for Networked Data Based
on Non-progressive Diffusion Model
Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing
Dept. of Computer Science and Technology
Tsinghua University, China
An Example
An Example
Instances
Correlation
An Example
?
?
?
Instances
?
Correlation
?
?
Classify each instance into {+1, -1}
An Example
?
+1
?
Instances
+1
Correlation
-1
?
An Example
?
+1
?
Instances
+1
Correlation
-1
?
Query for label
An Example
?
+1
?
Instances
+1
Correlation
-1
-1
Problem: Active Learning for Networked Data
Challenge+1
?
?
It is expensive to query for labels!
Questions
Instances
+1
Which instances should we select to query? Correalation
-1 instances do we need to query, for an accurate classifier?
How many
?
Challenges
Active Learning for Networked Data
How to leverage network correlation among
instances?
How to query in a batch mode?
Batch Mode Active Learning for Networked Data
Given a graph G
 (VU ,VL , y L , E , X)
Features Matrix
Labels
of labeled
instances
Labeled
instances
Unlabeled
instances
Edges
Our objective is
maxVs VU Q(VS )
Subject to
The utility function
A subset of unlabeled
instances
| VS | k
Labeling budget
Factor Graph Model
?
?
?
Variable Node
?
Factor Node
?
?
Factor Graph Model
The joint probability
Local factor function
Edge factor function


1
T
T
P(y | y L ; θ)  exp   λ f ( yi , xi )   β g ( yi , y j ) 
Z
( vi , v j )E
vi V

Log likelihood of labeled instances
O (θ)  log  y|y L exp θ S - log y L exp θ S
T
T
Factor Graph Model
Learning
Gradient descent
O
 E P ( y | y L ;θ ) S  E P ( y ; θ ) S
θ
Calculate the expectation: Loopy Belief Propagation (LBP)
Message from variable
to factor
 y  f (xi )   f
Message from factor to
variable
 f  y (xi )  ~ y f (x f )  y N ( f )\{ y }  y  f (x j )
i
*
N ( yi )\ f
1
 f 
( xi )
y
*
i

 1
i
i
j
i
j
Question: How to select instances from
Factor graph for active learning?
Basic principle: Maximize the Ripple Effects
?
?
?
?
?
?
Maximize the Ripple Effects
?
?
?
?
?
Labeling information is
propagated
+1
Maximize the Ripple Effects
?
?
?
?
?
Labeling information is
propagated
+1
Maximize the Ripple Effects
?
?
?
?
Labeling information is
propagated
?
Statistical bias is
propagated
+1
How to model the propagation process in a unlabeled network?
Diffusion Model
Linear Threshold Model
Each instance  has a threshold ()
Each instance  at time  has two statuses   = 0 (inactive) or 1 (active)
Each instance  has a set of neighbors ()
Progressive Diffusion Model
  = 1 iff
∈() −1
 ≥ () or −1  = 1
Non-Progressive Diffusion Model
  = 1 iff
∈() −1
 ≥ ()
Linear Threshold
Maximize the Ripple Effects
?
Will it be dominated by labeling
Based ?on non-progressive diffusion model
?
information (active) or statistical
bias (inactive)?
Maximize the number of activated instances in the end
?
Labeling
information is
An instance  has an uncertainty measure
μ()
propagated
?
We aim to activate the most uncertain instances!
Statistical bias is
propagated
+1
Instantiate the Problem
Active Learning Based on Non-Progressive Diffusion Model
maxVS VU {maxVT VU | VT |} ,
With constraints
f 0 (v)  1  v  VS
| VS | k
The number of activated instances
Initially activate all queried instances
 M s.t. v  VT    M , f (v)  1
All instances in  should be active after convergence
v  VU \ VT , u  VT ,  (v)   (u )
We activate the most uncertain instances
f (v)  1  uN ( v ) f 1 (u )  t (v)
Based on the non-progressive diffusion
Reduce the Problem
The original problem
Fix | |, maximize | |
The reduced problem
Fix | |, minimize | |
Constraints are inherited.
Reduction procedure
Enumerate | | by bisection. Solve the reduced
problem.
Algorithm
The reduced problem
Fix | |, minimize | |
The key idea
Find a superset  ( ⊆  )
Such that there exists a subset  ( ⊆  )
If we initially activate  , we can activate  finally
Algorithm
Input:  = ,   for each instance
Output: 
Initialize  to be top  uncertain instances;
For each iteration:
greedily select a set  with minimum thresholds from  \V ,
while satisfying the constraint that each instance  ∈  has at least
() neighbors in  ∪  ;
 ←  ∪  ;
if  = ∅ then converges;
Greedily select a set  with minimum degrees from  , while satisfying
the constraint that each instance  ∈  has at least () neighbors in
 ;
Return  ;
Theoretical Analysis
Convergence
Lemma 1 The algorithm will converge within O (| VU |  | VT |) time.
Correctness
Theorem 1 If the algorithm converges,  is a feasible solution, i.e., if we initially label  ,
we will activate  finally.
Approximation Ratio
Theorem 2 Let , be the solution given by the algorithm, , represent the optimal
solution. Let Δ be the max degree of instances and suppose   ≤ (). Then we have
| Vs , g |
| Vs ,opt
( )2

| (1    ) Avg[2t (v)  d (v)]
Experiments
Datasets
Datasets
#Variable node
#Factor node
Coauthor
6,096
24,468
Slashdot
370
1,686
Mobile
314
513
Enron
100
236
Comparison Methods
Batch Mode Active Learning (BMAL), proposed by Shi et al.
Influence Maximization Selection (IMS), proposed by Zhuang et al.
Maximum Uncertainty (MU)
Random (RAN)
Max Coverage (MaxCo), our method
Experiments
Performance
Related Work
Active Learning for Networked Data
Actively learning to infer social ties
H. Zhuang, J. Tang, W. Tang, T. Lou, A. Chin and X. Wang
Batch mode active learning for networked data
L. Shi, Y. Zhao and J. Tang
Towards active learning on graphs: an error bound minimization approach
Q. Gu and J. Han
Integreation of active learing in a collaborative crf
O. Martinez and G. Tsechpenakis
Diffusion Model
On the non-progressive spread of influence through social networks
M. Fazli, M. Ghodsi, J. Habibi, P. J. Khalilabadi, V. Mirrokni and S. S. Sadeghabad
Maximizing the spread of influence through a social network
D. Kempe, J. Kleinberg and E. Tardos
Conclusion
Connect active learning for networked data to non-progressive
diffusion model, and precisely formulate the problem
Propose an algorithm to solve the problem
Theoretically guarantee the convergence, correctness and
approximation ratio of the algorithm
Empirically evaluate the performance of the algorithm on four
datasets of different genres
Future work
Consider active learning for networked data in a streaming
setting, where data distribution and network structure are
changing over time
About Me
Zhilin Yang
[email protected]
3rd year undergraduate at Tsinghua Univ.
Applying for PhD programs this year
Data Mining & Machine Learning
Thanks!
[email protected]
Download
Related flashcards
Create Flashcards