Functional Node Detection on Linked Data

Kang Li, Jing Gao, Suxin Guo, Nan Du, Aidong Zhang
University at Buffalo, The State University of New York
{kli22, jing, suxinguo, nandu, azhang}

Abstract
Networks, which characterize object relationships, are ubiquitous in various domains. One very important problem is to detect the nodes that serve a specific function in these networks. For example, is a user normal or anomalous in an email network? Does a protein play a key role in a protein-protein interaction network? In many applications, the information available about a network includes both node characteristics and network structure. Both types of information can contribute to learning functional nodes, and we refer to the collection of node and link information as linked data. However, existing methods use only a few subjectively selected topological features of the network structure to detect functional nodes, and thus fail to capture highly discriminative and meaningful patterns hidden in linked data. To address this problem, we present a novel Feature Integration based Functional Node Detection (FIND) algorithm. Specifically, FIND extracts the most discriminative information from both node characteristics and network structure in the form of a unified latent feature representation, under the guidance of several labeled nodes. Experiments on two real-world data sets validate that the proposed method significantly outperforms the baselines on the detection of three different types of functional nodes.

1 Background and Motivation
During the formation and evolution of a network, nodes usually take on various roles or functionalities. Detecting the nodes that have a specific functionality is essential for understanding the corresponding patterns of the network.
For instance, critical nodes in social networks are used for contagion analysis [7], bridging nodes in protein interaction networks represent the key proteins connecting modules [4], and spammers in email networks need to be filtered out to improve the systems [9]. Existing methods for this problem usually make strong assumptions about the relationships between specific topological properties and the types of functional nodes of interest. For example, nodes with high bridging centrality are categorized as bridging nodes [4], and high PageRank scores indicate high importance [11]. The choice of such topological properties in each task is usually subjective or even arbitrary, so existing methods may miss critical patterns that characterize node functionalities in the networks. Moreover, when the target functional nodes are complex, it can be very difficult to relate them to existing topological properties, which makes such a strategy ineffective in practice.

Information about a real network usually involves both high-dimensional node characteristics and a network structure consisting of links between the nodes. These two types of information are jointly referred to as linked data. For example, on Facebook, each node denotes a person whose characteristics include preferences, posts, number of friends, etc., and links represent interactions between users. Both node and link information are critical for identifying the role that each node plays in the network. For example, zombie users tend to have more friends (link structure) than normal users and publish meaningless posts (node characteristics). Existing approaches that use only link structure information fail to capture the relevant information in node characteristics.
Besides, since the link structure describes connectivity between users and does not directly reflect node functionalities, topological features selected from it (e.g., number of friends) may miss critical information hidden in the link structure. In short, an effective functional node detection approach has to exploit the information in both the network structure and the node characteristics.

In this paper, we propose a novel Feature Integration based Functional Node Detection (FIND) model for detecting nodes with specific functionalities on linked data. Specifically, FIND simultaneously maps the information of both aspects (network structure and node characteristics) to a unified latent feature space that captures their shared characteristics. In addition, several labeled nodes guide the mapping and learning process, so that the extracted latent feature representation of the nodes effectively captures the information in linked data for the purpose of distinguishing the target functional nodes.

Table 1: Notation
$n$              number of nodes
$f$              number of features in the node characteristics
$r$              number of features in the latent representation
$c$              number of classes in the learning task
$G$              the linked data set
$V_{n\times 1}$  the set of vertices in the linked data
$E_{n\times n}$  the set of links among the vertices
$C_{n\times f}$  the node characteristics
$S_{n\times n}$  the node functional similarity matrix
$Y_{n\times c}$  the labels of the nodes in $V_{n\times 1}$

The contributions of this work include:
• We present a novel model for functional node detection on linked data, in which we seek a joint feature representation of node characteristics and link structure that captures the critical features of node functions. We derive an efficient solution to the problem, demonstrate its soundness, and extend the approach to two other scenarios.
• We develop a novel similarity measure to characterize the functional closeness between nodes. Since existing similarity measures on networks capture only connectivity, the proposed measure provides a new way to compute node similarity, which is an important step for detecting functional nodes.
• We evaluate the proposed FIND algorithm and its extension on two real, publicly available data sets for learning three different types of functional nodes. Experiments show that the proposed approaches outperform baselines that use only the network structure or only the node information, and that they can detect various functionalities from linked data.

2 Notation and Problem Definition
In this section, we introduce the notation and formally define the problem of functional node detection on linked data. We use $A_{s\times t}$ to denote a matrix A with s rows and t columns. A linked data set is denoted as $G = \{V_{n\times 1}, E_{n\times n}, C_{n\times f}\}$, as in Table 1. In this notation, the linked data G contains n nodes, which form the node matrix $V_{n\times 1}$, and the link matrix $E_{n\times n}$. The nodes have f features, stored in the node characteristic matrix $C_{n\times f}$. We assume that both the link matrix E and the node characteristic matrix C have been scaled to be non-negative. Suppose there are k labeled nodes in the linked data; we denote the label matrix as $Y_{n\times c} = [\tilde{Y}_{k\times c}; \hat{Y}_{(n-k)\times c}]$, in which $\tilde{Y}_{k\times c}$ is the label matrix of the k labeled nodes and $\hat{Y}_{(n-k)\times c}$ is the target label matrix of the remaining nodes. For simplicity, we treat functional node detection as a binary learning task, in which nodes are classified into two classes according to whether they are the target functional nodes. The problem of functional node detection is formally defined as follows: given a linked data set G and the function labels $\tilde{Y}$ of k nodes, learn a mapping $F: G \to Y$.
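To make the inputs of the problem definition concrete, the following is a minimal NumPy sketch of how $E$, a non-negative $C$ and the label matrix $Y$ might be assembled. The helper `build_linked_data` is hypothetical and not part of the paper; it assumes unweighted undirected links and column-wise min-max scaling, whereas the text only requires that E and C be scaled to be non-negative.

```python
import numpy as np

def build_linked_data(edges, C_raw, labeled, n_classes):
    """Hypothetical helper: assemble E, a non-negative C, and the label matrix Y.

    edges:     list of (u, v) links between node indices
    C_raw:     (n, f) raw node characteristics
    labeled:   dict {node index: class index} for the k labeled nodes
    n_classes: c, the number of classes (2 for the binary task)
    """
    n = C_raw.shape[0]
    E = np.zeros((n, n))
    for u, v in edges:
        E[u, v] = E[v, u] = 1.0  # assumption: unweighted, undirected links

    # Assumption: column-wise min-max scaling, which maps every feature into [0, 1].
    col_min = C_raw.min(axis=0)
    col_range = C_raw.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns would otherwise divide by zero
    C = (C_raw - col_min) / col_range

    # One-hot rows for the labeled nodes; rows of unlabeled nodes stay zero.
    Y = np.zeros((n, n_classes))
    for node, cls in labeled.items():
        Y[node, cls] = 1.0
    labeled_idx = np.array(sorted(labeled))
    return E, C, Y, labeled_idx
```

The zero rows of Y correspond to $\hat{Y}$, the part of the label matrix the model has to infer.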
3 Methods and Technical Solutions
The proposed FIND algorithm contains two major parts: 1) integrating both the links and the node characteristics of a linked data set into a unified latent feature space; and 2) learning a model on the extracted latent feature space to predict the labels of the unlabeled nodes. More precisely, this intuition is formulated as the following objective:
$$\min_{X_{n\times r}\ge 0,\;B_{f\times r}\ge 0,\;H_{r\times r}\ge 0,\;W_{r\times c}\ge 0}\; c_1\|C^T - BX^T\|_F^2 + c_2\|S - XHX^T\|_F^2 + f(XW, Y). \tag{3.1}$$
In the equation, $X_{n\times r}$ is the r-dimensional latent feature matrix we wish to extract. $B_{f\times r}$ is the coefficient matrix that maps the node characteristic matrix $C_{n\times f}$ to the latent feature matrix $X_{n\times r}$. $H_{r\times r}$ is the feature-feature correlation matrix that maps $S_{n\times n}$ to the latent feature matrix $X_{n\times r}$. $W_{r\times c}$ is the weight matrix in the classification loss function $f(XW, Y)$. $c_1$ and $c_2$ are cost parameters that control the trade-off between the classification error and the reconstruction errors. $S_{n\times n}$ is the node functional similarity matrix computed from the network structure matrix $E_{n\times n}$; it captures the closeness of nodes in their functionalities. We define $S_{n\times n}$ in Section 3.1. In Eq. 3.1, $f(XW, Y)$ is the classification loss, defined as:
$$f(XW, Y) = \sum_{i=1}^{n}\Big(1 - X_{i,:}W_{:,y_i} + \max_{j\neq y_i} X_{i,:}W_{:,j}\Big) + \alpha\|W\|_{2,1},$$
where $y_i$ is the index of the largest value in $Y_{i,:}$, i.e., the class index of node i, and $\alpha$ is the weight of the $\ell_{2,1}$-norm penalty on $W_{r\times c}$. Of the two reconstruction losses, $\|C^T - BX^T\|_F^2$ maps the node characteristics into the latent feature matrix X while removing noise, and $\|S - XHX^T\|_F^2$ maps the node functional similarity matrix S into the latent feature matrix X. Together, the two reconstruction losses let us learn a joint latent feature matrix X from both the node characteristics and the node functional similarity matrix.
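The objective in Eq. 3.1 can be evaluated directly, which helps when checking an implementation. The NumPy sketch below is a hypothetical illustration, not the authors' code. It follows the classification loss exactly as printed (a standard Crammer-Singer multiclass hinge would additionally clip each summand at zero), takes the $\ell_{2,1}$ norm as the sum of the Euclidean norms of the rows of W, and, as an assumption, evaluates the classification term only on the labeled rows, whose class indices are known.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum over rows of the Euclidean norm of each row of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def classification_loss(X, W, y, alpha):
    """f(XW, Y) following the printed form of Eq. (3.1).

    X: (m, r) latent features of the scored rows; W: (r, c) weights;
    y: (m,) class index y_i of each scored row.
    """
    scores = X @ W                         # X_{i,:} W for every class
    rows = np.arange(scores.shape[0])
    true_score = scores[rows, y]           # X_{i,:} W_{:, y_i}
    others = scores.copy()
    others[rows, y] = -np.inf              # exclude j = y_i from the max
    best_other = others.max(axis=1)        # max_{j != y_i} X_{i,:} W_{:, j}
    return float(np.sum(1.0 - true_score + best_other)) + alpha * l21_norm(W)

def find_objective(C, S, X, B, H, W, y, c1, c2, alpha, labeled):
    """Value of Eq. (3.1). Reconstruction terms use all n nodes; the
    classification term uses only the rows in `labeled` (an assumption)."""
    recon_c = np.linalg.norm(C.T - B @ X.T, "fro") ** 2    # ||C^T - B X^T||_F^2
    recon_s = np.linalg.norm(S - X @ H @ X.T, "fro") ** 2  # ||S - X H X^T||_F^2
    return c1 * recon_c + c2 * recon_s + classification_loss(X[labeled], W, y, alpha)
```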
Together with f(XW, Y), the objective also makes the extracted latent feature matrix X discriminative for detecting the target functional nodes. We impose non-negativity constraints on the weight matrix W, the coding matrix B, the correlation matrix H and the latent feature matrix X. Therefore, only additive combinations are allowed and no subtraction can occur, which is consistent with the intuition of combining the node characteristics and the network into a unified latent feature matrix. Given the network structure matrix $E_{n\times n}$, the node characteristic matrix $C_{n\times f}$ and the label matrix $\tilde{Y}_{k\times c}$ of the labeled nodes, we learn W, B, H and X through the model in Eq. 3.1. The latent feature matrix $\hat{X}_{(n-k)\times r}$ of the unlabeled nodes is then used in the classifier $f(\hat{X}_{(n-k)\times r}W, \hat{Y}_{(n-k)\times c})$ to infer the labels $\hat{Y}_{(n-k)\times c}$.

3.1 Node Functional Similarity Matrix
The network structure matrix $E_{n\times n}$ only describes the connectivity of nodes and does not directly reflect their functional relationships. Two nodes with the same functionality can be connected, disconnected, or even unreachable from each other. For functional node detection, however, two nodes with similar functionalities should be very close in the latent feature matrix $X_{n\times r}$, regardless of whether they are connected in $E_{n\times n}$. Therefore, $E_{n\times n}$ cannot be used directly in our model. To exploit the link information of the network, we transform $E_{n\times n}$ into the node functional similarity matrix $S_{n\times n}$, which captures the closeness of two nodes in their functionalities rather than in their connectivity.

Unfortunately, existing studies of node functional similarity on linked data cannot be applied to map $E_{n\times n}$ to $S_{n\times n}$, since they evaluate functional similarity on node characteristics instead of on networks. Existing network-based closeness metrics focus on two different aspects: some papers, such as [2, 6], measure the closeness of nodes, while others, such as [5, 13], measure the similarity of two graphs. In this paper, we denote the closeness metric on two nodes a and b as $K_n(a, b)$ and its value as $KN_{ab}$. Similarly, we denote the closeness metric on two graphs $E_a$ and $E_b$ as $K_g(E_a, E_b)$ and its value as $KG_{E_aE_b}$. Without loss of generality, we assume that $KN_{ab}$ and $KG_{E_aE_b}$ lie in [0, 1], and that the higher the score of two instances, the closer they are.

In general, a network $E_{n\times n}$ consists of multiple groups of nodes. Nodes in the same group are highly connected, while nodes from different groups have few or no links between them. Fig. 1 presents such a toy example. The network in Fig. 1 includes two groups of people: the left group $E_1$, which contains Jack, Alice, Jason and Mike, and the right group $E_2$, which contains Dan, Lily, Bob and Eric. The two groups $E_1$ and $E_2$ have very similar topologies. For simplicity, Fig. 2 demonstrates the process of estimating the functional similarities among Jason, Jack, Alice and Bob. Alice and Bob should clearly have a high functional similarity score, since each is the center of its group. Among the existing approaches, $K_g$ measures the similarity of graphs and cannot be applied to measure similarities among nodes. $K_n(\text{Alice}, \text{Bob})$ is 0, as shown in Fig. 2(a), since Alice and Bob are not reachable from each other. Even if there were a few links between Alice and Bob, $K_n(\text{Alice}, \text{Bob})$ would still be low, indicating that Alice and Bob are far apart in the network. Thus $K_n$ does not fit the task of functional node detection.

[Figure 1: A Toy Example]

Figure 2: The Functional Similarity Matrix
(a) KN Matrix
         Jason  Alice  Jack   Bob
Jason    1.00   0.36   0.08   0.00
Alice    0.36   1.00   0.36   0.00
Jack     0.08   0.36   1.00   0.00
Bob      0.00   0.00   0.00   1.00
(b) KF Matrix
         Jason  Alice  Jack   Bob
Jason    1.00   0.36   0.13   0.34
Alice    0.36   1.00   0.36   0.94
Jack     0.13   0.36   1.00   0.34
Bob      0.34   0.94   0.34   1.00
(c) Functional Similarity Matrix
         Jason  Alice  Jack   Bob
Jason    1.00   0.57   1.00   0.57
Alice    0.57   1.00   0.57   0.96
Jack     1.00   0.57   1.00   0.57
Bob      0.57   0.96   0.57   1.00

Intuitively, if two objects have similar impacts on all the other objects, they should have similar functionalities. To formulate this intuition, we define a function $K_f(a, b)$ that measures the impact of node a on node b, and use $KF_{ab}$ to denote its value:
$$KF_{ab} = K_f(a, b) = S(E_a, a)\cdot K_g(E_a, E_b)\cdot S(E_b, b), \tag{3.2}$$
in which $S(E_a, a)$ represents the impact of node a on its group $E_a$. Viewing $E_a$ and $E_b$ as two subgraphs, $K_g(E_a, E_b)$, which captures the closeness between $E_a$ and $E_b$, transmits the impacts between group $E_a$ and group $E_b$. Clearly, $K_f(a, b) = K_f(b, a)$. The $K_f$ function makes it possible to compute the impact of node a on node b and vice versa, regardless of whether their groups $E_a$ and $E_b$ are connected. Fig. 2(b) shows the KF matrix of the toy example. The functional closeness of two nodes is then estimated by the similarity of their impact scores on all the other nodes:
$$S_{ab} = \sum_{i\neq a,b} d\big(K_f(a, i), K_f(b, i)\big), \tag{3.3}$$
in which $S_{ab}$ is the functional similarity between a and b, and $d(u, v)$ is a similarity function: the closer u and v are, the higher $d(u, v)$ is.

Eq. 3.2 and Eq. 3.3 provide a general framework for estimating node functional similarities. We present a specific instance of the framework as follows. We estimate $S(E_a, a)$ as:
$$S(E_a, a) = K_n\Big(a,\ \arg\max_{i\in[1,n_{E_a}]}\sum_{j=1}^{n_{E_a}} K_n(i, j)\Big),$$
where $n_{E_a}$ is the number of nodes in group $E_a$. Within group $E_a$, the arg max locates the node with the highest accumulated closeness to all the other nodes in the group, and we define this node as the group center of $E_a$. In the toy example in Fig. 1, for instance, we identify Alice and Bob as the group centers. $S(E_a, a)$ is thus the closeness between node a and the group center of $E_a$. The functional similarity between nodes a and b is then obtained by:
$$S_{ab} = \frac{1}{\sqrt{\sum_{i\neq a,b}\big(K_f(a, i) - K_f(b, i)\big)^2} + 1}. \tag{3.4}$$
The square-root term measures the difference between a's and b's impacts on all the other nodes; adding 1 and taking the reciprocal scales $S_{ab}$ to the range (0, 1]. The higher $S_{ab}$ is, the closer a and b are.

With this method, we obtain the functional similarity matrix of the toy example shown in Fig. 2(c). In the result, the functional similarity of Jason and Jack is 1, which indicates that Jason and Jack play the same role in the network. Since Alice and Bob are the centers of two very similar groups, they obtain a high functional similarity (0.96). The result also indicates that Jason and Alice are very different in their node functionalities, although they are connected.

3.2 Solution and Analysis
In this section, we first present a solution to Eq. 3.1 and then analyze it theoretically.

3.2.1 Solution
We develop a solution to Eq. 3.1 based on block coordinate descent. We divide the unknown variables into two sets, {W, Y, B, H} and {X}, and iteratively fix one set while updating the other. Specifically:

(1) Fixing X, the optimization problem decomposes into three independent subproblems:
$$\min_{W_{r\times c}\ge 0,\;Y_{n\times c}} f(X_{n\times r}W_{r\times c}, Y_{n\times c}), \tag{3.5}$$
$$\min_{B_{f\times r}\ge 0} \|C_{n\times f}^T - B_{f\times r}X_{n\times r}^T\|_F^2, \tag{3.6}$$
$$\min_{H_{r\times r}\ge 0} \|S_{n\times n} - X_{n\times r}H_{r\times r}X_{n\times r}^T\|_F^2. \tag{3.7}$$
Eq. 3.5 is a classical SVM task with $\ell_{2,1}$-norm regularization (the penalty $\alpha\|W\|_{2,1}$ is part of f). With the fixed latent feature matrix $\tilde{X}_{k\times r}$ and the labels $\tilde{Y}_{k\times c}$ of the labeled nodes, Eq. 3.5 can be solved with many off-the-shelf optimization tools. Eq. 3.6 and Eq. 3.7 can be solved efficiently as:
$$B = C^TX(X^TX)^{-1}, \tag{3.8}$$
$$H = (X^TX)^{-1}X^TSX(X^TX)^{-1}. \tag{3.9}$$

(2) Fixing W, Y, B and H, the objective with respect to X is:
$$J(X) = \mathrm{Tr}\big(O - PX^T + XQX^T + c_2\,XHX^TXHX^T - 2c_2\,SXHX^T\big), \tag{3.10}$$
where $O = YY^T + c_1CC^T + c_2SS^T$, $P = 2YW^T + 2c_1CB$ and $Q = WW^T + c_1B^TB$. To optimize Eq. 3.10, we use the following update:
$$X_{ij} \leftarrow X_{ij}\left(\frac{(P + 4c_2\,SXH)_{ij}}{(4c_2\,XHX^TXH + 2XQ)_{ij}}\right)^{1/4}. \tag{3.11}$$

We summarize the solution in Alg. 1.

Algorithm 1 Feature Integration based Functional Node Detection (FIND)
Input: The linked data G = {V, E, C} and the labels $\tilde{Y}$ of k nodes in G
Output: The labels $\hat{Y}$ of the other nodes
1: Calculate the node functional similarity matrix S by Eq. 3.4
2: Randomly initialize X subject to the non-negativity constraint
3: repeat
4:   Calculate W and Y by Eq. 3.5
5:   Calculate B by Eq. 3.8
6:   Calculate H by Eq. 3.9
7:   Update X by Eq. 3.11
8: until convergence
9: $\hat{Y} = Y - \tilde{Y}$

It is clear that, in each iteration, the solutions for W, Y, B and H do not increase the objective. Thus, in this section, we focus on showing that the update rule of X in Eq. 3.11 does not increase the objective function in Eq. 3.10.

3.2.2 Analysis
We first state two lemmas used in the proof.

Lemma 3.1. (From [1]) For any matrices $C\in\mathbb{R}_+^{n\times n}$, $B\in\mathbb{R}_+^{k\times k}$, $S\in\mathbb{R}_+^{n\times k}$ and $S'\in\mathbb{R}_+^{n\times k}$, where C and B are symmetric, the following inequality holds:
$$\sum_{ip}\frac{(CS'B)_{ip}\,S_{ip}^2}{S'_{ip}} \ge \mathrm{Tr}(S^TCSB). \tag{3.12}$$

Lemma 3.2. (From [16]) For any nonnegative symmetric matrices $C\in\mathbb{R}_+^{k\times k}$ and $B\in\mathbb{R}_+^{k\times k}$, and for $H, H'\in\mathbb{R}_+^{n\times k}$, the following inequality holds:
$$\mathrm{Tr}(HCH^THBH^T) \le \sum_{ik}\left(\frac{H'CH'^TH'B + H'BH'^TH'C}{2}\right)_{ik}\frac{H_{ik}^4}{H'^3_{ik}}. \tag{3.13}$$

We define an auxiliary function $Z(X, X')$ and have:

Proposition 3.1. The objective in Eq. 3.10 is non-increasing under the updating rule
$$X^{t+1} = \arg\min_X Z(X, X^t), \tag{3.14}$$
in which
$$\begin{aligned}Z(X, X') ={}& \mathrm{Tr}(O) - \sum_{ij}P_{ij}X'_{ij}\Big(1+\log\frac{X_{ij}}{X'_{ij}}\Big) + \frac{1}{2}\sum_{ij}(X'Q)_{ij}\frac{X_{ij}^4 + X'^4_{ij}}{X'^3_{ij}} \\ &+ c_2\sum_{ij}(X'HX'^TX'H)_{ij}\frac{X_{ij}^4}{X'^3_{ij}} - 2c_2\sum_{ijkl}S_{ij}X'_{jk}H_{kl}X'_{il}\Big(1+\log\frac{X_{jk}X_{il}}{X'_{jk}X'_{il}}\Big).\end{aligned}$$

Proof. According to Lemmas 3.1 and 3.2, we have:
$$\mathrm{Tr}(XQX^T) \le \sum_{ij}(X'Q)_{ij}\frac{X_{ij}^2}{X'_{ij}} \le \frac{1}{2}\sum_{ij}(X'Q)_{ij}\frac{X_{ij}^4 + X'^4_{ij}}{X'^3_{ij}},$$
$$\mathrm{Tr}(c_2\,XHX^TXHX^T) \le c_2\sum_{ij}(X'HX'^TX'H)_{ij}\frac{X_{ij}^4}{X'^3_{ij}}.$$
Because $u \ge 1 + \log u$ for all $u > 0$, we have:
$$\mathrm{Tr}(PX^T) \ge \sum_{ij}P_{ij}X'_{ij}\Big(1+\log\frac{X_{ij}}{X'_{ij}}\Big),$$
$$\mathrm{Tr}(2c_2\,SXHX^T) \ge 2c_2\sum_{ijkl}S_{ij}X'_{jk}H_{kl}X'_{il}\Big(1+\log\frac{X_{jk}X_{il}}{X'_{jk}X'_{il}}\Big).$$
Summing over all these bounds, we have $Z(X, X') \ge J(X)$. It is also clear that $Z(X, X) = J(X)$. From the updating rule in Eq. 3.14, we infer:
$$J(X^{t+1}) \le Z(X^{t+1}, X^t) \le Z(X^t, X^t) = J(X^t).$$
Notice that $J(X^{t+1}) = J(X^t)$ only when $X^t$ is a local minimum of $Z(X, X')$. Letting $X_{\min}$ denote a local minimum of $Z(X, X')$, we have
$$J(X_{\min}) \le \dots \le J(X^{t+1}) \le J(X^t) \le \dots \le J(X^1) \le J(X^0),$$
thus $J(X)$ in Eq. 3.10 is non-increasing under the updating rule in Eq. 3.14.

To use Proposition 3.1, we need to prove:

Proposition 3.2. The iterative update in Eq. 3.11 satisfies the updating rule in Eq. 3.14.

Proof. To find the minimum of $Z(X, X')$, we fix $X'$ and obtain:
$$\frac{\partial Z(X, X')}{\partial X_{ij}} = -P_{ij}\frac{X'_{ij}}{X_{ij}} + 2(X'Q)_{ij}\frac{X_{ij}^3}{X'^3_{ij}} + 4c_2(X'HX'^TX'H)_{ij}\frac{X_{ij}^3}{X'^3_{ij}} - 4c_2(SX'H)_{ij}\frac{X'_{ij}}{X_{ij}}.$$
The Hessian matrix of $Z(X, X')$ is
$$\frac{\partial^2 Z(X, X')}{\partial X_{ij}\,\partial X_{kl}} = \delta_{ik}\delta_{jl}T_{ij}, \tag{3.15}$$
which is a diagonal matrix with positive entries
$$T_{ij} = \big(P_{ij} + 4c_2(SX'H)_{ij}\big)\frac{X'_{ij}}{X_{ij}^2} + 6\big((X'Q)_{ij} + 2c_2(X'HX'^TX'H)_{ij}\big)\frac{X_{ij}^2}{X'^3_{ij}}.$$
Thus the Hessian matrix of $Z(X, X')$ is positive semidefinite and $Z(X, X')$ is convex. Its minimum is obtained by setting $\partial Z(X, X')/\partial X_{ij} = 0$, which yields Eq. 3.11. Hence Eq. 3.11 satisfies the updating rule in Eq. 3.14.

By Propositions 3.1 and 3.2, the objective in Eq. 3.10 is non-increasing under the update in Eq. 3.11.

3.3 Extension
If no labeled node is available in the target linked data, FIND can still be adapted to the following two cases:
(1) In practice, the sources of linked data are usually dynamic, with evolving links and node characteristics. To detect functional nodes in a target snapshot, well-labeled historical snapshots of the same dynamic source can be used in the learning process, provided that the target node functionality remains the same across the historical and target snapshots.
(2) Labeled functional nodes in a source can help identify functional nodes in a target if the source and the target are of the same type. For instance, spam identified in Gmail can be used to detect spam in Hotmail, and labeled zombie fans on Facebook can be used to filter zombie fans on Google+. In this case, the adapted algorithm takes advantage of labeled nodes in a source linked data set of the same type as the target linked data.

Generally, the adapted algorithm has three benefits: 1) in a time-sensitive learning task, there is no need to re-train the model; 2) it removes the need for labeled nodes in the target linked data; and 3) it enables learning on a large target snapshot with a small history or source linked data set.

Let $G_S$ denote the history or source linked data and $G_T$ the target linked data. We train on $G_S$ and test on $G_T$ as follows. In the training process, we apply FIND (Alg. 1) to $G_S$ to learn the parameter matrices $B_S$, $H_S$ and $W_S$. In the testing process, the target linked data $G_T$ is mapped into the latent feature space X using the same parameter matrices $B_S$ and $H_S$:
$$\min_{X\ge 0}\ c_1\|C^T - B_SX^T\|_F^2 + c_2\|S - XH_SX^T\|_F^2. \tag{3.16}$$
Its solution and theoretical analysis are very similar to those in Section 3.2, so we omit the details here. The update of X that optimizes Eq. 3.16 is:
$$X_{ij} \leftarrow X_{ij}\left(\frac{(2c_1\,CB_S + 4c_2\,SXH_S)_{ij}}{(4c_2\,XH_SX^TXH_S + 2c_1\,XB_S^TB_S)_{ij}}\right)^{1/4}. \tag{3.17}$$
The target label matrix Y is obtained by plugging X into the trained classifier $f(XW_S, Y)$. We denote this adapted algorithm as FINDS (Feature Integration based Functional Node Detection with Source/History Linked Data).

4 Empirical Evaluation
In this section, we conduct various experiments to evaluate and analyze the effectiveness of the proposed methods, FIND (Alg. 1) and FINDS, on two real-world data sets, Advogato [8] and Robot.Net. Both data sets are publicly available¹,².

4.1 Dataset Description
The Advogato and Robot.Net data sets were collected by daily crawls of the online community websites Advogato³ and Robot.Net⁴. The two websites provide collaborative environments for the development of open-source software and robots, respectively. To promote collaboration, both websites allow each member to mark neighbors as Apprentices, Journeyers or Masters according to the neighbors' contributions to the joint projects. For the experiments, in Advogato, we select the first sampled data in each February from 2007 to 2012, and denote the 6 snapshots as A1 to A6; in Robot.Net, we select the first sampled data in each year from 2008 to 2013, and denote the 6 snapshots as R1 to R6. Based on the votes, the nodes in each snapshot are divided into three functional classes: Apprentice Nodes, Journeyer Nodes and Master Nodes. Due to the page limit, the details of the parameter settings are presented in the supplementary file.

4.2 Baselines and Metric
The inputs of the proposed FIND algorithm include three parts: the network structure $E_{n\times n}$, the node characteristics $C_{n\times f}$, and the k labeled nodes. Based on these three parts, we choose four baselines for comparison:
(1) The PageRank algorithm [11] is an unsupervised measure of node importance. Since the concepts of Apprentice, Journeyer and Master are also related to importance, we use PageRank as the first baseline to examine how well a selected topological feature of the link structure can characterize the target node functions;
(2) In each learning task, the number of target functional nodes is much smaller than the number of other nodes. One-class SVM (OCSVM) [12] is an established unsupervised model for such imbalanced data sets. We use OCSVM as the second baseline to test how well the node characteristics can capture the features of the target node functions;
(3) To evaluate the combined impact of the network structure and the k labeled nodes, we create a supervised method that maps the node functional similarity matrix $S_{n\times n}$ to a latent feature space under the guidance of the labeled nodes, and name it Supervised Coding of Functional Similarity Matrix (SCF). This baseline contains the regularized classification loss measuring the prediction errors on the labeled nodes, and the reconstruction loss of the mapping from $S_{n\times n}$ to the latent feature matrix X in Eq. 3.1;
(4) Similarly, to evaluate the combined impact of the node characteristics and the k labeled nodes, we compare the proposed algorithms with supervised sparse coding (SSC) [18], which maps the node characteristics to a latent feature space under the guidance of the labeled nodes.

[Figure 3: Experiments on Apprentice Node Detection in Advogato. ROC curves (true positive rate vs. false positive rate) of FIND, SSC, SCF, OCSVM and PageRank.]

Table 2: Experiments on the Advogato Dataset (AUC)
Apprentice Nodes
       SSC     SCF     OCSVM   PageRank  FIND
A1     0.8593  0.7517  0.8070  0.6826    0.9361
A2     0.7323  0.7111  0.6484  0.7089    0.8937
A3     0.6280  0.7246  0.6570  0.6706    0.8943
A4     0.7555  0.7271  0.6625  0.6888    0.8938
A5     0.7468  0.7356  0.6591  0.6689    0.8774
A6     0.7104  0.7377  0.6600  0.6849    0.8825
Journeyer Nodes
       SSC     SCF     OCSVM   PageRank  FIND
A1     0.8986  0.7423  0.7701  0.7767    0.9754
A2     0.9090  0.7567  0.6340  0.7520    0.9741
A3     0.8968  0.7537  0.6306  0.7031    0.9759
A4     0.8144  0.7581  0.6368  0.7079    0.9771
A5     0.8963  0.7629  0.6382  0.6923    0.9763
A6     0.9144  0.7652  0.6395  0.6769    0.9763
Master Nodes
       SSC     SCF     OCSVM   PageRank  FIND
A1     0.8662  0.8036  0.7821  0.7883    0.9830
A2     0.8865  0.8105  0.6327  0.7208    0.9825
A3     0.8803  0.8139  0.6346  0.6883    0.9832
A4     0.8456  0.8157  0.6378  0.6569    0.9802
A5     0.8833  0.8161  0.6409  0.7123    0.9831
A6     0.8514  0.8177  0.6429  0.7110    0.9842

Due to the class imbalance in the learning tasks, we use the Area Under the ROC Curve (AUC) as the evaluation metric to compare the investigated approaches numerically. AUC values lie in the range [0, 1], and higher values indicate better performance.

4.3 Performance Study on FIND
To evaluate the proposed FIND method (Alg. 1), we test its performance on detecting Apprentice Nodes, Journeyer Nodes and Master Nodes. In each learning task, we randomly pick 3 nodes in each class as training instances and test on all the other nodes. Table 2 and Table 3 present the results of the three learning tasks on Advogato and Robot.Net, respectively. Moreover, to better illustrate the differences among the investigated methods, Fig. 3 shows the ROC curves for detecting Apprentice Nodes on Advogato.

4.3.1 Experiments on the Advogato Dataset
On the Advogato data set, the results of detecting the three types of functional nodes show very consistent trends, as shown in Table 2. OCSVM, which uses node characteristics for unsupervised functional node detection, obtained the lowest AUC scores; in contrast, PageRank, which mines node importance from the network structure, performed better than OCSVM. This indicates that arbitrarily selected node characteristics cannot characterize node functions well. SCF, which maps the node functional similarity matrix to a latent feature space under the supervision of the labeled nodes, significantly outperformed OCSVM and achieved higher AUC scores than PageRank. Among the four baselines, SSC, which encodes the node characteristics into the latent feature space under the guidance of the labeled nodes, achieved the best performance. The proposed FIND algorithm integrates the discriminative information in both the node characteristics and the network structure, and thus achieved significantly better performance than all the baselines. On mean AUC scores, FIND outperformed SSC, SCF, OCSVM and PageRank by 14.4%, 24.1%, 42.6% and 35.0%, respectively.

4.3.2 Experiments on the Robot.Net Dataset
The results of the experiments on the Robot.Net data set are shown in Table 3. SCF performed the worst; PageRank performed better than SCF in all the binary learning tasks; SSC obtained slightly higher AUC scores than SCF and PageRank; and OCSVM performed best among the remaining baselines. Compared to the baselines, the proposed FIND algorithm achieved significantly better performance. On mean AUC scores, FIND improved over SSC, SCF, OCSVM and PageRank by 35.3%, 59.6%, 12.9% and 47.8%, respectively.
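The evaluation protocol of Section 4.3 (3 randomly chosen labeled nodes per class for training, AUC on all remaining nodes) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: `split_train_test` and `rank_auc` are hypothetical helpers, and AUC is computed here through the Mann-Whitney U statistic, which equals the area under the ROC curve, with tied scores receiving average ranks. The paper does not state which AUC implementation was used.

```python
import numpy as np

def split_train_test(labels, per_class=3, seed=0):
    """Pick `per_class` random training nodes from each class; the rest are test nodes."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = np.concatenate([
        rng.choice(np.flatnonzero(labels == cls), size=per_class, replace=False)
        for cls in np.unique(labels)
    ])
    test = np.setdiff1d(np.arange(len(labels)), train)  # sorted remaining indices
    return np.sort(train), test

def rank_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; requires both classes to be present."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank of a tie block
        i = j + 1
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

In a binary task, `scores` would be the classifier's score for the target class on the test nodes, and `labels` would mark which test nodes are the target functional nodes.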
Table 3: Experiments on the Robot.Net Dataset

Apprentice Nodes
AUC   SCF     OCSVM   PageRank
R1    0.5327  0.7996  0.6288
R2    0.5197  0.7398  0.5480
R3    0.5419  0.7466  0.6136
R4    0.5416  0.7460  0.6610
R5    0.5450  0.7506  0.5912
R6    0.5489  0.7513  0.5923

Journeyer Nodes
AUC   SCF     OCSVM   PageRank
R1    0.6140  0.8228  0.7125
R2    0.5806  0.8460  0.5540
R3    0.5772  0.8424  0.6592
R4    0.5770  0.8436  0.6262
R5    0.5734  0.8409  0.6275
R6    0.5699  0.8440  0.5825

Master Nodes
AUC   SCF     OCSVM   PageRank
R1    0.6296  0.8215  0.7075
R2    0.5988  0.8394  0.6189
R3    0.6002  0.8424  0.6333
R4    0.6011  0.8430  0.5967
R5    0.5930  0.8457  0.5834
R6    0.5916  0.8459  0.6211

AUC   R1      R2      R3      R4      R5      R6
SSC   0.7476  0.6352  0.6176  0.5164  0.6149  0.5689
SSC   0.8741  0.7735  0.7322  0.5579  0.5306  0.8646
SSC   0.7673  0.6513  0.6535  0.6956  0.7368  0.6543
FIND  0.9267  0.9146  0.9389  0.9702  0.9593  0.9556
FIND  0.9559  0.9340  0.9435  0.9542  0.9062  0.9302
FIND  0.8887  0.8679  0.8529  0.8714  0.8743  0.8514

4.4 Performance Study on FINDS

As discussed in Section 3.3, the proposed FINDS algorithm can be used in two cases: 1) when the target linked data set is dynamic, a labeled history snapshot of it can serve as input; and 2) a labeled source of the same type of linked data as the target can serve as input. In this section, we focus on detecting Master Nodes on the Robot.Net data set and experimentally evaluate FINDS in both cases.

4.4.1 On History Linked Data

We set R1 in the Robot.Net data set as the labeled history snapshot and use it to supervise the detection of Master Nodes on R1 to R6. The results are presented in Table 4. Among the five investigated approaches, SCF and PageRank achieved very similar performance in all the binary learning tasks; SSC improved the mean AUC score by 0.2115 and 0.1902 over SCF and PageRank, respectively; and OCSVM performed slightly better than SSC. The proposed FINDS algorithm performed significantly better than all the baselines: on mean AUC scores, FINDS improved over SSC, SCF, OCSVM and PageRank by 16.6%, 57.4%, 13.6% and 52.0%, respectively.

Table 4: Experiments on Robot.Net with Labeled R1 (Master Nodes)
AUC   SSC     SCF     OCSVM   PageRank  FINDS
R1    0.8498  0.6324  0.8225  0.7044    0.9603
R2    0.8091  0.6015  0.8372  0.6257    0.9386
R3    0.8049  0.6027  0.8389  0.6233    0.9538
R4    0.7936  0.6036  0.8405  0.5896    0.9571
R5    0.7330  0.5960  0.8429  0.5876    0.9356
R6    0.9078  0.5930  0.8449  0.6267    0.9662

4.4.2 On Source Linked Data

We set A1 in the Advogato data set as the labeled source linked data and use it to supervise the detection of Master Nodes on R1 to R6. The results are presented in Table 5. In these experiments, the proposed FINDS algorithm performed consistently better than every baseline in all the binary learning tasks. Overall, on mean AUC scores, FINDS outperformed SSC, SCF, OCSVM and PageRank by 22.0%, 57.9%, 14.0% and 52.5%, respectively.

Table 5: Experiments on Robot.Net with Labeled A1 (Master Nodes)
AUC   SSC     SCF     OCSVM   PageRank  FINDS
R1    0.8927  0.6324  0.8225  0.7044    0.9505
R2    0.8426  0.6015  0.8372  0.6257    0.9530
R3    0.7032  0.6027  0.8389  0.6233    0.9505
R4    0.6229  0.6036  0.8405  0.5896    0.9395
R5    0.7489  0.5960  0.8429  0.5876    0.9725
R6    0.8874  0.5930  0.8449  0.6267    0.9636

4.5 Impact of Training Sets

In the above experiments, we use 3 labeled instances for training on each data set. To evaluate the impact of the number of training instances k, we also conduct experiments with various settings of k and show the results in Fig. 4. We observe that when k = 2, the AUC scores of the proposed FIND model vary significantly from R1 to R6, because the guidance provided by so few labeled instances is too limited. However, after slightly increasing k to 4, the model achieves AUC scores above 0.9 on average, and this high performance remains stable as the number of labeled training instances increases further. This experiment confirms that the proposed FIND model is effective at detecting node functions when only a few labeled instances are provided.

4.6 Parameter Sensitivity

In FIND and FINDS, the number of latent features r controls the expressive power of the n × r latent feature matrix X, and is therefore closely related to the performance of both algorithms. In this section, we use Master Node detection on Robot.Net to demonstrate how r affects performance and how we set it for FIND; the other parameters are set in the same way. As shown in Fig. 4, on R1 to R6, the AUC scores increase dramatically to over 0.9 as r varies from 1 to 3. When r is larger than 10, the model performs slightly worse due to over-fitting. Based on this observation, we set r = 3 in each binary classification task in the experiments.

Figure 4: Experiments on Varying Number of Training Instances k and Number of Latent Features r. (AUC on data sets R1 to R6 plotted against the parameter value.)

5 Related Work

Although mining linked data has recently attracted much attention, the work proposed in this paper differs significantly from existing work in both its method and its target task. Graph-based semi-supervised learning [17, 19] and collective classification [3, 10] also address classification on linked data. In these approaches, links are treated as important constraints for classification: connected nodes are more likely to belong to the same class, while disconnected nodes are expected to belong to different classes.
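The constraint view described above can be illustrated with a minimal sketch of iterative label propagation, one common form of graph-based semi-supervised learning. This is not the FIND algorithm, and it is not claimed to be the exact formulation used in [17, 19]. The function `propagate_labels`, the 0.5 starting score for unlabeled nodes, and the toy graph are all hypothetical. Because scores move only along links, a node in a component that contains no labeled node never receives any label information, whatever its true function.

```python
import numpy as np


def propagate_labels(adj, seeds, num_iters=100):
    """Minimal iterative label propagation (hypothetical helper, not FIND).

    adj   : (n, n) symmetric 0/1 adjacency matrix.
    seeds : {node index: label score}, e.g. 1.0 = spammer, 0.0 = normal user.
    Returns (scores, reached): propagated scores, and a boolean mask of the
    nodes that label information can reach through links.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1).astype(float)
    scores = np.full(n, 0.5)  # assumed uninformative start for unlabeled nodes
    reached = np.zeros(n, dtype=bool)
    for node, y in seeds.items():
        scores[node] = y
        reached[node] = True
    for _ in range(num_iters):
        # Links act as constraints: each node moves to its neighbours' mean score
        # (isolated nodes, with degree 0, keep their current score).
        scores = np.divide(adj @ scores, deg, out=scores.copy(), where=deg > 0)
        # Label information spreads one hop per iteration, and only along links.
        reached |= (adj @ reached.astype(float)) > 0
        for node, y in seeds.items():  # clamp labeled nodes to their labels
            scores[node] = y
    return scores, reached


# Hypothetical toy graph: the path 0-1-2-3 holds both labeled nodes;
# nodes 4-5 form a separate component with no labeled node.
adj = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3), (4, 5)]:
    adj[u, v] = adj[v, u] = 1

scores, reached = propagate_labels(adj, seeds={0: 1.0, 3: 0.0})
# scores is approximately [1.0, 0.667, 0.333, 0.0, 0.5, 0.5]
# reached is [True, True, True, True, False, False]
```

Nodes 4 and 5 keep the starting score of 0.5 and are never reached, even if one of them were a spammer; this is exactly where treating links as constraints breaks down when nodes sharing a function are not linked.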
In contrast, in the proposed functional node detection task, nodes of the same functionality may not be reachable from each other in the networks. For instance, in Gmail, spammers in different geographical regions may have no links between them at all, yet they still share the same functionality. Therefore, in our task we must extract discriminative information from links instead of treating links as constraints. Several existing studies integrate network structures and node characteristics into a unified latent feature space. The method in [14] summarizes linked data into four types of relationships, such as CoFollowing, in a supervised framework. In [15], the authors used pseudo-class label information to guide the integration of features for community detection. In these methods, two nodes have similar latent feature vectors only if they are highly connected with each other, which differs from the proposed algorithms.

6 Conclusions

In this paper, we proposed a novel algorithm for functional node detection on linked data. The basic idea is to seek a joint latent space that captures the characteristics of both node attributes and network structures while maximizing the discriminative features guided by existing labeled nodes. We presented an iterative solution for the method and provided a theoretical analysis of the solution. We also developed a novel node similarity metric that measures the functional closeness between nodes. Experiments on two real-world data sets showed that the proposed method significantly outperforms the baselines, by 13% to 60%, in detecting three different types of functional nodes.

7 Acknowledgment

The work published in this paper is partially supported by the National Science Foundation under Grants No. 1218393 and No. 1016929.

References

[1] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. Proc. of KDD'06, 19(2), 2006.
[2] L. Ge and A. Zhang. Pseudo cold start link prediction with multiple sources in social networks. Proc. of SDM'12, 2012.
[3] L. Getoor and B. Taskar. Introduction to Statistical Relational Learning, volume L. MIT Press, 2007.
[4] W. Hwang, Y. Cho, A. Zhang, and M. Ramanathan. Bridging centrality: Identifying bridging nodes in scale-free networks. Proc. of KDD'06, 2006.
[5] H. Kashima and A. Inokuchi. Kernels for graph classification. Computer, 2002.
[6] R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. Proc. of ICML'02, 2002.
[7] C. J. Kuhlman, V. S. A. Kumar, M. V. Marathe, S. S. Ravi, and D. J. Rosenkrantz. Finding critical nodes for inhibiting diffusion of complex contagions in social networks. Proc. of ECML'10, 2010.
[8] P. Massa, K. Souren, M. Salvetti, and D. Tomasoni. Trustlet, open research on trust metrics. Scalable Computing: Practice and Experience, 2008.
[9] M. McCord and M. Chuah. Spam detection on Twitter using traditional classifiers. Autonomic and Trusted Computing, 2011.
[10] J. Neville and D. Jensen. Relational dependency networks. Journal of Machine Learning Research, 8(8), 2007.
[11] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. World Wide Web Internet And Web Information Systems, 1998.
[12] B. Schölkopf, J. C. Platt, J. S. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 2001.
[13] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 2011.
[14] J. Tang and H. Liu. Feature selection with linked data in social media. Proc. of ASUCISE'11, 2011.
[15] J. Tang and H. Liu. Unsupervised feature selection for linked social media data. Proc. of KDD'12, 2012.
[16] H. Wang, H. Huang, and C. Ding. Simultaneous clustering of multi-type relational data via symmetric nonnegative matrix tri-factorization. Proc. of CIKM'11, 2011.
[17] W. Wang and Z. Zhou. A new analysis of co-training. Proc. of ICML'10, 2010.
[18] J. Yang, K. Yu, and T. Huang. Supervised translation-invariant sparse coding. Proc. of CVPR'10, 2010.
[19] X. Zhu. Semi-supervised learning with graphs. Science, 27, 2005.