Active Learning for Networked Data Based
on Non-progressive Diffusion Model
Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing
Dept. of Computer Science and Technology
Tsinghua University, China
An Example
[Figure: a network of instances (nodes) connected by correlation edges]
Classify each instance into {+1, -1}
A few instances are already labeled (+1 or -1); the labels of the rest are unknown (?)
Query for label: select an unlabeled instance, obtain its label (here -1), and continue
Problem: Active Learning for Networked Data
Challenge
It is expensive to query for labels!
Questions
Which instances should we select to query?
How many instances do we need to query for an accurate classifier?
Challenges
Active Learning for Networked Data
How to leverage network correlation among instances?
How to query in a batch mode?
Batch Mode Active Learning for Networked Data
Given a graph G = (V_U, V_L, y^L, E, X)
    V_U: unlabeled instances
    V_L: labeled instances
    y^L: labels of labeled instances
    E: edges
    X: feature matrix
Our objective is
    max_{V_S ⊆ V_U} Q(V_S)
subject to
    |V_S| ≤ k
where Q is the utility function, V_S is a subset of unlabeled instances, and k is the labeling budget.
Factor Graph Model
[Figure: a factor graph; variable nodes carry the unknown labels, factor nodes connect them via local and edge factors]
Factor Graph Model
The joint probability
P(y | y^L; θ) = (1/Z) exp( Σ_{v_i ∈ V} λ^T f(y_i, x_i) + Σ_{(v_i, v_j) ∈ E} β^T g(y_i, y_j) )
where f is the local factor function and g is the edge factor function.
Log-likelihood of labeled instances
O(θ) = log Σ_{y | y^L} exp(θ^T S) − log Σ_y exp(θ^T S)
where θ = (λ, β) and S denotes the aggregated factor functions (the sufficient statistics).
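The joint probability is easier to read as code. Below is a tiny Python sketch of the unnormalized score θ^T S for a single labeling y; the function names and toy features are illustrative assumptions, not the paper's implementation (Z is left out, since it sums over all labelings).

```python
import numpy as np

# Hypothetical sketch: unnormalized log-score theta^T S of one labeling y.
# f and g are the local and edge factor feature functions from the slide;
# they are assumed to return NumPy vectors.
def log_score(y, X, edges, lam, beta, f, g):
    s = sum(lam @ f(y[i], X[i]) for i in range(len(y)))     # local factors
    s += sum(beta @ g(y[i], y[j]) for (i, j) in edges)      # edge factors
    return s  # P(y | y^L; theta) = exp(log_score) / Z

# toy usage: one local feature and one agreement feature on a 3-node chain
f = lambda yi, xi: np.array([yi * xi])
g = lambda yi, yj: np.array([yi * yj])
print(log_score([1, -1, 1], [0.5, -0.2, 0.9], [(0, 1), (1, 2)],
                np.array([1.0]), np.array([0.5]), f, g))
```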
Factor Graph Model
Learning
Gradient descent:
∂O/∂θ = E_{P(y | y^L; θ)}[S] − E_{P(y; θ)}[S]
Calculate the expectations with Loopy Belief Propagation (LBP)
Message from variable to factor:
μ_{y_i → f}(x_i) = ∏_{f* ∈ N(y_i) \ {f}} μ_{f* → y_i}(x_i)
Message from factor to variable:
μ_{f → y_i}(x_i) = Σ_{~{y_i}} f(x_f) ∏_{y_j ∈ N(f) \ {y_i}} μ_{y_j → f}(x_j)
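The slides give only the message equations, so here is a minimal LBP sketch in Python for this model's special case: binary labels and pairwise edge factors, where the variable-to-factor and factor-to-variable messages can be folded into one message per directed edge. All names (run_lbp, local_pot, edge_pot) are assumptions for illustration, and edge_pot is assumed symmetric.

```python
import numpy as np

def run_lbp(n_vars, edges, local_pot, edge_pot, n_iters=20):
    """local_pot: (n_vars, 2) float array of exp(lambda^T f(y_i, x_i)).
    edge_pot: symmetric (2, 2) array of exp(beta^T g(y_i, y_j)).
    Returns (n_vars, 2) approximate marginals."""
    neighbors = {v: [] for v in range(n_vars)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # one message per directed edge, initialized uniformly
    msg = {(i, j): np.ones(2) for i, j in edges}
    msg.update({(j, i): np.ones(2) for i, j in edges})
    for _ in range(n_iters):
        new_msg = {}
        for i, j in msg:
            # local potential times all messages into i, except the one from j
            b = local_pot[i].copy()
            for k in neighbors[i]:
                if k != j:
                    b *= msg[(k, i)]
            m = edge_pot.T @ b             # marginalize y_i out through the edge factor
            new_msg[(i, j)] = m / m.sum()  # normalize for numerical stability
        msg = new_msg                      # synchronous update
    # beliefs: local potential times all incoming messages
    beliefs = local_pot.copy()
    for v in range(n_vars):
        for k in neighbors[v]:
            beliefs[v] *= msg[(k, v)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```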
Question: How to select instances from the factor graph for active learning?
Basic principle: Maximize the Ripple Effects
[Figure: a fully unlabeled network; every instance is marked "?"]
Maximize the Ripple Effects
[Figure: after one instance is queried (+1), its labeling information propagates along the edges step by step; statistical bias from the model's uncertain predictions propagates in the same way]
Labeling information is propagated
Statistical bias is propagated
How to model the propagation process in an unlabeled network?
Diffusion Model
Linear Threshold Model
Each instance v has a threshold t(v)
Each instance v at time τ has one of two statuses: f_τ(v) = 0 (inactive) or f_τ(v) = 1 (active)
Each instance v has a set of neighbors N(v)
Progressive Diffusion Model
f_τ(v) = 1 iff Σ_{u ∈ N(v)} f_{τ−1}(u) ≥ t(v) or f_{τ−1}(v) = 1
Non-Progressive Diffusion Model
f_τ(v) = 1 iff Σ_{u ∈ N(v)} f_{τ−1}(u) ≥ t(v)
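As a sanity check on the two definitions, here is a short simulation sketch (illustrative names, not code from the paper); the progressive flag toggles between the two rules:

```python
def simulate(adj, t, seeds, max_steps=100, progressive=False):
    """adj: dict v -> set of neighbors N(v); t: dict v -> threshold t(v);
    seeds: set of initially active instances (f_0(v) = 1)."""
    active = set(seeds)
    for _ in range(max_steps):
        nxt = set()
        for v in adj:
            enough = sum(1 for u in adj[v] if u in active) >= t[v]
            # progressive: once active, stay active;
            # non-progressive: an instance deactivates if support drops
            if enough or (progressive and v in active):
                nxt.add(v)
        if nxt == active:
            break  # fixed point (non-progressive runs may also oscillate)
        active = nxt
    return active
```

Note that in the non-progressive variant even the initially activated seeds can deactivate when their neighbor support drops, which is what makes seed selection harder than in the progressive case.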
Maximize the Ripple Effects
Based on the non-progressive diffusion model
For each instance: will it be dominated by labeling information (active) or by statistical bias (inactive)?
Each instance v has an uncertainty measure μ(v)
Goal: maximize the number of activated instances in the end
We aim to activate the most uncertain instances!
Instantiate the Problem
Active Learning Based on the Non-Progressive Diffusion Model
max_{V_S ⊆ V_U} { max_{V_T ⊆ V_U} |V_T| }  (the number of activated instances)
with constraints:
f_0(v) = 1 ∀ v ∈ V_S  (initially activate all queried instances)
|V_S| ≤ k  (labeling budget)
∃ M s.t. ∀ v ∈ V_T, ∀ τ ≥ M: f_τ(v) = 1  (all instances in V_T should be active after convergence)
∀ v ∈ V_U \ V_T, ∀ u ∈ V_T: μ(v) ≤ μ(u)  (we activate the most uncertain instances)
f_τ(v) = 1 iff Σ_{u ∈ N(v)} f_{τ−1}(u) ≥ t(v)  (the non-progressive diffusion rule)
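To make the constraints operational, here is a hypothetical feasibility check that reuses the simulate() sketch from the diffusion slide; for simplicity it treats every node of adj as unlabeled and assumes V_T is non-empty.

```python
def feasible(adj, t, mu, V_S, V_T, k, max_steps=100):
    """Check the constraints above for a candidate (V_S, V_T); a sketch."""
    if len(V_S) > k:                    # labeling budget |V_S| <= k
        return False
    # f_0(v) = 1 for v in V_S; run the non-progressive diffusion
    final = simulate(adj, t, seeds=V_S, max_steps=max_steps)
    if not V_T <= final:                # all of V_T must be active in the end
        return False
    # the target set must contain the most uncertain instances
    least = min(mu[u] for u in V_T)
    return all(mu[v] <= least for v in set(adj) - V_T)
```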
Reduce the Problem
The original problem: fix |V_S|, maximize |V_T|
The reduced problem: fix |V_T|, minimize |V_S|
Constraints are inherited.
Reduction procedure: enumerate |V_T| by bisection and solve the reduced problem at each step, as sketched below.
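A hedged sketch of this reduction loop, assuming a black-box solve_reduced(k) that returns a seed set for the top-k uncertain instances (the greedy algorithm on the next slides is one such solver), and assuming |V_S| grows monotonically in k so that bisection applies:

```python
def pick_queries(n_instances, budget, solve_reduced):
    """Bisection over |V_T|: largest k whose reduced solution fits the budget.
    solve_reduced is a hypothetical callable, e.g. max_coverage below."""
    lo, hi, best = 1, n_instances, set()
    while lo <= hi:
        k = (lo + hi) // 2
        V_S = solve_reduced(k)
        if len(V_S) <= budget:   # feasible: try to activate more instances
            best, lo = V_S, k + 1
        else:                    # over budget: shrink the target set
            hi = k - 1
    return best
```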
Algorithm
The reduced problem: fix |V_T|, minimize |V_S|
The key idea
Find a superset V_τ (V_T ⊆ V_τ)
such that there exists a subset V_S (V_S ⊆ V_τ):
if we initially activate V_S, we eventually activate all of V_τ
Algorithm
Input: |V_T| = k, t(v) for each instance
Output: V_S
Initialize V_T to be the top-k most uncertain instances;
For each iteration:
    greedily select a set V_P with minimum thresholds from V_U \ V_T, while satisfying the constraint that each instance v ∈ V_T has at least t(v) neighbors in V_P ∪ V_T;
    V_T ← V_T ∪ V_P;
    if V_P = ∅ then the loop has converged;
Greedily select a set V_S with minimum degrees from V_T, while satisfying the constraint that each instance v ∈ V_T has at least t(v) neighbors in V_S;
Return V_S;
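One possible Python reading of the pseudocode above. It follows the slide's greedy rules (minimum thresholds for V_P, minimum degrees for V_S); the data structures and tie-breaking are assumptions, and the authors' implementation may differ.

```python
def max_coverage(adj, t, mu, k):
    """adj: dict v -> set of neighbors; t: thresholds; mu: uncertainty."""
    V_U = set(adj)
    # initialize V_T to the top-k most uncertain instances
    V_T = set(sorted(V_U, key=lambda v: -mu[v])[:k])

    def deficit(v, cover):
        # how many more neighbors in `cover` v needs to reach t(v)
        return t[v] - len(adj[v] & cover)

    while True:
        need = {v for v in V_T if deficit(v, V_T) > 0}
        # outside candidates adjacent to deficient instances, cheapest first
        pool = sorted({u for v in need for u in adj[v]} - V_T,
                      key=lambda u: t[u])
        V_P = set()
        for u in pool:
            if all(deficit(v, V_T | V_P) <= 0 for v in need):
                break
            V_P.add(u)
        if not V_P:
            break  # converged: each v in V_T has t(v) neighbors inside V_T
        V_T |= V_P

    # greedily pick V_S from V_T, minimum degrees first, until every
    # v in V_T has at least t(v) neighbors in V_S
    V_S = set()
    for u in sorted(V_T, key=lambda u: len(adj[u])):
        if all(deficit(v, V_S) <= 0 for v in V_T):
            break
        V_S.add(u)
    return V_S
```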
Theoretical Analysis
Convergence
Lemma 1 The algorithm will converge within O(|V_U| − |V_T|) time.
Correctness
Theorem 1 If the algorithm converges, V_S is a feasible solution, i.e., if we initially label V_S, we eventually activate V_T.
Approximation Ratio
Theorem 2 Let V_{s,g} be the solution given by the algorithm and V_{s,opt} the optimal solution. Let Δ be the max degree of instances and suppose t(v) ≤ β·d(v). Then we have
|V_{s,g}| / |V_{s,opt}| ≤ Δ² / ((1 − β) · Avg[2t(v) − d(v)])
Experiments
Datasets

Dataset     #Variable nodes    #Factor nodes
Coauthor    6,096              24,468
Slashdot    370                1,686
Mobile      314                513
Enron       100                236
Comparison Methods
Batch Mode Active Learning (BMAL), proposed by Shi et al.
Influence Maximization Selection (IMS), proposed by Zhuang et al.
Maximum Uncertainty (MU)
Random (RAN)
Max Coverage (MaxCo), our method
Experiments
Performance
[Figure: classification performance of the five methods on the four datasets]
Related Work
Active Learning for Networked Data
Actively learning to infer social ties
H. Zhuang, J. Tang, W. Tang, T. Lou, A. Chin and X. Wang
Batch mode active learning for networked data
L. Shi, Y. Zhao and J. Tang
Towards active learning on graphs: an error bound minimization approach
Q. Gu and J. Han
Integration of active learning in a collaborative CRF
O. Martinez and G. Tsechpenakis
Diffusion Model
On the non-progressive spread of influence through social networks
M. Fazli, M. Ghodsi, J. Habibi, P. J. Khalilabadi, V. Mirrokni and S. S. Sadeghabad
Maximizing the spread of influence through a social network
D. Kempe, J. Kleinberg and E. Tardos
Conclusion
Connect active learning for networked data to the non-progressive diffusion model, and precisely formulate the problem
Propose an algorithm to solve the problem
Theoretically guarantee the convergence, correctness and
approximation ratio of the algorithm
Empirically evaluate the performance of the algorithm on four
datasets of different genres
Future work
Consider active learning for networked data in a streaming
setting, where data distribution and network structure are
changing over time
About Me
Zhilin Yang
kimiyoung@yeah.net
3rd year undergraduate at Tsinghua Univ.
Applying for PhD programs this year
Data Mining & Machine Learning
Thanks!
kimiyoung@yeah.net