Active Learning for Networked Data Based on Non-progressive Diffusion Model
Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing
Dept. of Computer Science and Technology, Tsinghua University, China

An Example
- Instances are connected by correlation edges, forming a network.
- Task: classify each instance into {+1, -1}.
- Some instances are already labeled (+1 or -1); the rest are unknown (?).
- We may query an oracle for the label of an unlabeled instance.

Problem: Active Learning for Networked Data
- Challenge: it is expensive to query for labels!
- Questions:
  - Which instances should we select to query?
  - How many instances do we need to query to obtain an accurate classifier?

Challenges
- How to leverage the network correlation among instances?
- How to query in batch mode?

Batch Mode Active Learning for Networked Data
Given a graph G = (V_U, V_L, y^L, E, X), where
- V_U: unlabeled instances
- V_L: labeled instances
- y^L: labels of the labeled instances
- E: edges
- X: feature matrix

our objective is

    max_{V_S ⊆ V_U} Q(V_S)    subject to |V_S| ≤ k

where Q is the utility function, V_S is a subset of unlabeled instances to query, and k is the labeling budget.

Factor Graph Model
- Variable nodes represent instances; factor nodes represent local and edge factors.
- The joint probability:

    P(y | y^L; θ) = (1/Z) exp( Σ_{v_i ∈ V} λ^T f(y_i, x_i) + Σ_{(v_i, v_j) ∈ E} β^T g(y_i, y_j) )

  where f(y_i, x_i) is the local factor function and g(y_i, y_j) is the edge factor function.
- Log-likelihood of the labeled instances (S collects the factor values, θ = (λ, β)):

    O(θ) = log Σ_{y | y^L} exp(θ^T S) − log Σ_y exp(θ^T S)

Factor Graph Model: Learning
- Gradient descent:

    ∂O/∂θ = E_{P(y | y^L; θ)}[S] − E_{P(y; θ)}[S]

- The expectations are approximated with Loopy Belief Propagation (LBP):
  - Message from variable to factor:

      μ_{y_i → f}(x_i) = Π_{f* ∈ N(y_i) \ {f}} μ_{f* → y_i}(x_i)

  - Message from factor to variable:

      μ_{f → y_i}(x_i) = Σ_{y_{N(f) \ {y_i}}} f(x_f) Π_{y_j ∈ N(f) \ {y_i}} μ_{y_j → f}(x_j)

Question: how should we select instances from the factor graph for active learning?
Basic principle: maximize the ripple effects.
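As a concrete illustration of the factor graph model above, the sketch below builds a tiny pairwise factor graph over three binary instances and computes P(y | y^L; θ) by brute-force enumeration. This is a minimal sketch: the graph, the one-dimensional features, and the parameter values `lam` and `beta` are made up for illustration, and real learning would use LBP rather than enumeration.

```python
import itertools
import math

# Toy pairwise factor graph: 3 instances with labels in {+1, -1}.
# Edges encode correlation; lam/beta are illustrative parameters (theta).
edges = [(0, 1), (1, 2)]
lam, beta = 0.8, 0.5
x = [0.3, -0.2, 0.9]          # one feature per instance (illustrative)

def local_f(yi, xi):          # local factor f(y_i, x_i)
    return yi * xi

def edge_g(yi, yj):           # edge factor g(y_i, y_j): reward agreement
    return 1.0 if yi == yj else 0.0

def score(y):                 # theta^T S: weighted sum of all factor values
    s = sum(lam * local_f(y[i], x[i]) for i in range(len(y)))
    s += sum(beta * edge_g(y[i], y[j]) for i, j in edges)
    return s

# Condition on an observed label y_0 = +1 and normalize over the rest,
# giving P(y | y^L; theta) by brute force (feasible only for tiny graphs).
configs = [y for y in itertools.product([1, -1], repeat=3) if y[0] == 1]
Z = sum(math.exp(score(y)) for y in configs)
posterior = {y: math.exp(score(y)) / Z for y in configs}

# Marginal P(y_2 = +1 | y_0 = +1): correlation pulls unlabeled instances
# toward the observed label.
p2 = sum(p for y, p in posterior.items() if y[2] == 1)
```

Even in this toy, the edge factors make the unlabeled instances lean toward the observed +1 label, which is the correlation the model is meant to exploit.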
Maximize the Ripple Effects
- Querying an instance propagates its labeling information to its neighbors.
- At the same time, statistical bias propagates from the unlabeled part of the network.
- Will an instance be dominated by labeling information (active) or by statistical bias (inactive)?
- How can we model this propagation process in an unlabeled network?

Diffusion Model: Linear Threshold
- Each instance v has a threshold t(v).
- Each instance v at time τ has one of two statuses: f_τ(v) = 0 (inactive) or 1 (active).
- Each instance v has a set of neighbors N(v).
- Progressive diffusion model: f_τ(v) = 1 iff Σ_{u ∈ N(v)} f_{τ−1}(u) ≥ t(v) or f_{τ−1}(v) = 1.
- Non-progressive diffusion model: f_τ(v) = 1 iff Σ_{u ∈ N(v)} f_{τ−1}(u) ≥ t(v).

Maximize the Ripple Effects (based on the non-progressive diffusion model)
- Maximize the number of instances that are activated in the end.
- Each instance v has an uncertainty measure μ(v); we aim to activate the most uncertain instances.

Instantiate the Problem: Active Learning Based on the Non-Progressive Diffusion Model

    max_{V_S ⊆ V_U} { max_{V_T ⊆ V_U} |V_T| }    (the number of activated instances)

with constraints
- |V_S| ≤ k
- f_0(v) = 1 for all v ∈ V_S (initially activate all queried instances)
- ∃M s.t. ∀v ∈ V_T, ∀τ ≥ M: f_τ(v) = 1 (all instances in V_T are active after convergence)
- ∀v ∈ V_U \ V_T, ∀u ∈ V_T: μ(v) ≤ μ(u) (we activate the most uncertain instances)
- f_τ(v) = 1 iff Σ_{u ∈ N(v)} f_{τ−1}(u) ≥ t(v) (the non-progressive diffusion rule)

Reduce the Problem
- The original problem: fix |V_S|, maximize |V_T|.
- The reduced problem: fix |V_T|, minimize |V_S|. Constraints are inherited.
- Reduction procedure: enumerate |V_T| by bisection and solve the reduced problem for each value.
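The two linear-threshold update rules above can be simulated directly. The sketch below (graph, thresholds, and function names are illustrative, not from the paper) iterates each rule to a fixed point; note that under the non-progressive rule activity can be lost when a node's active neighbors drop below its threshold.

```python
def step_nonprogressive(neighbors, t, active):
    # f_tau(v) = 1  iff  sum_{u in N(v)} f_{tau-1}(u) >= t(v)
    return {v for v in neighbors
            if sum(u in active for u in neighbors[v]) >= t[v]}

def step_progressive(neighbors, t, active):
    # same condition, but once active a node stays active
    return active | step_nonprogressive(neighbors, t, active)

def run(step, neighbors, t, seeds, max_steps=100):
    """Iterate one update rule from the seed set until a fixed point."""
    active = set(seeds)
    for _ in range(max_steps):
        nxt = step(neighbors, t, active)
        if nxt == active:          # fixed point reached
            return active
        active = nxt
    return active

# Triangle graph, all thresholds 1, single seed {0}: under the
# non-progressive rule the seed momentarily deactivates (its neighbors
# were inactive) while activating 1 and 2, after which all three nodes
# become and stay active.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
t = {0: 1, 1: 1, 2: 1}
```

The oscillation of the seed in this toy run is exactly what distinguishes the non-progressive model from the progressive one, which reaches the same fixed point monotonically.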
Algorithm
- The reduced problem: fix |V_T|, minimize |V_S|.
- Key idea: find a superset V_τ (V_T ⊆ V_τ) such that there exists a subset V_S ⊆ V_τ with the property that initially activating V_S eventually activates all of V_τ.

Algorithm
Input: |V_T| = k, threshold t(v) for each instance
Output: V_S
1. Initialize V_T to the top-k most uncertain instances.
2. For each iteration:
   - Greedily select a set V_P with minimum thresholds from V_U \ V_T, while satisfying the constraint that each instance v ∈ V_T has at least t(v) neighbors in V_P ∪ V_T.
   - V_T ← V_T ∪ V_P.
   - If V_P = ∅, the iteration has converged.
3. Greedily select a set V_S with minimum degrees from V_T, while satisfying the constraint that each instance v ∈ V_T has at least t(v) neighbors in V_S.
4. Return V_S.

Theoretical Analysis
- Convergence (Lemma 1): the algorithm converges within O(|V_U| · |V_T|) time.
- Correctness (Theorem 1): if the algorithm converges, V_S is a feasible solution, i.e., if we initially label V_S, we eventually activate all of V_T.
- Approximation ratio (Theorem 2): let V_{s,g} be the solution given by the algorithm and V_{s,opt} the optimal solution; let Δ be the maximum degree of instances and suppose t(v) ≤ β·d(v). Then

    |V_{s,g}| / |V_{s,opt}| ≤ Δ² / ( (1 − β) · Avg[2t(v) − d(v)] )

Experiments
Datasets:

    Dataset   #Variable nodes  #Factor nodes
    Coauthor  6,096            24,468
    Slashdot  370              1,686
    Mobile    314              513
    Enron     100              236

Comparison methods:
- Batch Mode Active Learning (BMAL), proposed by Shi et al.
- Influence Maximization Selection (IMS), proposed by Zhuang et al.
- Maximum Uncertainty (MU)
- Random (RAN)
- Max Coverage (MaxCo), our method

Performance: (result figures)

Related Work
Active learning for networked data:
- Actively learning to infer social ties. H. Zhuang, J. Tang, W. Tang, T. Lou, A. Chin and X. Wang.
- Batch mode active learning for networked data. L. Shi, Y. Zhao and J. Tang.
- Towards active learning on graphs: an error bound minimization approach. Q. Gu and J. Han.
- Integration of active learning in a collaborative CRF. O. Martinez and G.
Tsechpenakis.

Diffusion model:
- On the non-progressive spread of influence through social networks. M. Fazli, M. Ghodsi, J. Habibi, P. J. Khalilabadi, V. Mirrokni and S. S. Sadeghabad.
- Maximizing the spread of influence through a social network. D. Kempe, J. Kleinberg and E. Tardos.

Conclusion
- Connect active learning for networked data to the non-progressive diffusion model, and precisely formulate the problem.
- Propose an algorithm to solve the problem.
- Theoretically guarantee the convergence, correctness, and approximation ratio of the algorithm.
- Empirically evaluate the performance of the algorithm on four datasets of different genres.
- Future work: consider active learning for networked data in a streaming setting, where the data distribution and network structure change over time.

About Me
Zhilin Yang (kimiyoung@yeah.net)
3rd-year undergraduate at Tsinghua University
Applying for PhD programs this year
Interests: Data Mining & Machine Learning

Thanks!
kimiyoung@yeah.net
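Backup: the greedy expansion in the algorithm slide can be sketched as below. This is a loose sketch, not the paper's implementation: the tie-breaking and the exact greedy rules for choosing V_P and V_S are simplified guesses, and the function name `max_cover` and inputs `mu`, `t`, `k` are illustrative.

```python
def max_cover(neighbors, t, mu, k):
    """Sketch of the reduced problem: grow a superset around the top-k
    uncertain instances, then choose a seed set V_S from it."""
    VU = set(neighbors)
    # Step 1: V_T = top-k most uncertain instances.
    VT = set(sorted(VU, key=lambda v: -mu[v])[:k])
    # Step 2: expand until each v in V_T has >= t(v) neighbors in V_T.
    while True:
        VP = set()
        for v in sorted(VT):
            have = sum(u in VT or u in VP for u in neighbors[v])
            if have < t[v]:
                # add the lowest-threshold outside neighbors as helpers
                cand = sorted((u for u in neighbors[v]
                               if u not in VT and u not in VP),
                              key=lambda u: (t[u], u))
                VP.update(cand[:t[v] - have])
        if not VP:
            break                     # converged: no helpers needed
        VT |= VP
    # Step 3 (simplified): pick V_S within V_T so that each v in V_T
    # has at least t(v) neighbors in V_S.
    VS = set()
    for v in sorted(VT, key=lambda v: (-t[v], v)):
        have = sum(u in VS for u in neighbors[v])
        if have < t[v]:
            cand = sorted((u for u in neighbors[v]
                           if u in VT and u not in VS),
                          key=lambda u: (t[u], u))
            VS.update(cand[:t[v] - have])
    return VS, VT
```

On a small star graph the loop adds just enough low-threshold helpers around the most uncertain instance, then returns a seed set satisfying the neighbor-count constraint.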