Functional Node Detection on Linked Data

Kang Li
Jing Gao
Suxin Guo
Nan Du
Aidong Zhang
University at Buffalo, The State University of New York
{kli22, jing, suxinguo, nandu, azhang}@buffalo.edu
Abstract
Networks, which characterize object relationships, are ubiquitous in various domains. One very important problem is to detect the nodes of a specific function in these networks. For example, is a user normal or anomalous in an email network? Does a protein play a key role in a protein-protein interaction network? In many applications, the information we have about a network usually includes both node characteristics and network structure. Both types of information can contribute to the task of learning functional nodes, and we refer to the collection of node and link information as linked data. However, existing methods use only a few subjectively selected topological features from network structures to detect functional nodes, and thus fail to exploit highly discriminative and meaningful patterns hidden in linked data. To address this problem, a novel Feature Integration based Functional Node Detection (FIND) algorithm is presented. Specifically, FIND extracts the most discriminative information from both node characteristics and network structures in the form of a unified latent feature representation, with the guidance of several labeled nodes. Experiments on two real world data sets validate that the proposed method significantly outperforms the baselines on the detection of three different types of functional nodes.
1 Background and Motivation
During the formation and evolution of a network, nodes usually take on various types of roles or functionalities. Detecting the nodes having a specific functionality in a network is essential for understanding the corresponding patterns of the network. For instance, critical nodes in social networks are used for contagion analysis [7], bridging nodes in protein interaction networks stand for the key proteins connecting modules [4], and spammers in email networks need to be filtered out to keep the systems healthy [9].
Existing methods for the problem usually make strong assumptions about the relationships between specific topological properties and the types of functional nodes of interest. For example, nodes having high bridging centralities are categorized as bridging nodes [4], and high PageRank scores of nodes indicate high degrees of importance [11]. The selection of such topological properties in each task is usually subjective or sometimes even arbitrary, so existing methods may miss critical patterns characterizing node functionalities in the networks. Moreover, when the target functional nodes are complex, it can be very difficult to determine their relationships w.r.t. existing topological properties, which makes such a strategy ineffective in practice.
As we know, the information about a real network usually involves both high-dimensional node characteristics and a network structure consisting of links between the nodes. These two types of information are jointly referred to as linked data. For example, on Facebook, each node denotes a person whose characteristics include preferences, posts, number of friends, etc., and links represent interactions between the users. In such data, both node and link information are critical for the task of identifying the role that each node plays in the network. For example, zombie users tend to have more friends (link structure) than normal users and publish meaningless posts (node characteristics). Existing approaches that use only link structure information fail to capture the relevant information in node characteristics. Besides, since the link structure describes connectivity between users and does not directly reflect node functionalities, topological features (e.g., number of friends) selected from the link structure may miss critical information hidden in it. In conclusion, an effective functional node detection approach has to successfully utilize the information from both the network structure and the node characteristics.
In this paper, we propose a novel Feature Integration based Functional Node Detection (FIND) model for the task of detecting nodes of specific functionalities on linked data. Specifically, the proposed FIND model seeks to simultaneously map the information of the two aspects (network structures and node characteristics) to a unified latent feature space that captures their shared characteristics. Besides, several labeled nodes are utilized to guide the mapping and learning process, so that the extracted latent
feature representation of the nodes can effectively capture the information in the linked data and distinguish the target functional nodes.

Table 1: Notation
n         number of nodes
f         number of features in the node characteristics
r         number of features in the latent representations
c         number of classes in the learning task
G         the linked data set
V_{n×1}   the set of vertices in the linked data
E_{n×n}   the set of links among the vertices
C_{n×f}   the node characteristics
S_{n×n}   the node functional similarity matrix
Y_{n×c}   the labels for each node in V_{n×1}
The contributions of this work include:
• We present a novel model for functional node detection on linked data, in which we seek to obtain a
joint feature representation of node characteristics
and link structure to capture the critical features of
node functions. We derive an efficient solution to
solve the problem, demonstrate its soundness and
provide an extension of the approach to two other
scenarios.
• We develop a novel similarity measure to characterize the functional closeness between nodes. Since existing similarity measures applied on networks only capture connectivity, the proposed measure provides a novel way to calculate node similarity, which is an important step for detecting functional nodes.
• We evaluate the proposed FIND algorithm and its
extension on two real and publicly available data
sets for the task of learning three different types
of functional nodes. Experiments show that the
proposed approaches outperform baselines which
utilize network structures or node information only
and are capable of detecting various functionalities
from linked data.
2 Notation and Problem Definition
In this section, we first introduce the notation and then formally define the problem of functional node detection on linked data.
In this paper, we use A_{s×t} to represent a matrix A with s rows and t columns. A linked data set is denoted as G = {V_{n×1}, E_{n×n}, C_{n×f}}, as in Table 1. In this notation, the linked data G contains n nodes, which form the node matrix V_{n×1} and the link matrix E_{n×n}. The nodes have f features, collected in the node characteristic matrix C_{n×f}. We assume both the link matrix E and the node characteristic matrix C have been scaled to be non-negative.
Suppose there are k labeled nodes in the linked data; we denote the label matrix as Y_{n×c} = [Ỹ_{k×c}; Ŷ_{(n−k)×c}], in which Ỹ_{k×c} is the label matrix of the k labeled nodes and Ŷ_{(n−k)×c} is the target label matrix of the remaining nodes. For simplicity, we treat functional node detection as a binary learning task in which nodes are classified into two classes according to whether they are the target functional nodes.
The problem of functional node detection is formally defined as: given a linked data set G and the labels Ỹ of the functions of k nodes, learn a mapping F: G → Y.
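In code, the inputs of this learning problem can be grouped into one small container; a hypothetical sketch (the field names are ours, not the paper's) is:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class LinkedData:
        # G = {V, E, C} together with the labels of the k labeled nodes.
        E: np.ndarray        # n x n link matrix, scaled to be non-negative
        C: np.ndarray        # n x f node characteristic matrix, non-negative
        Y_tilde: np.ndarray  # k x c label matrix of the labeled nodes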
3 Methods and Technical Solutions
The proposed FIND algorithm contains two major parts: 1) integrating both the links and the node characteristics of a linked data set into a unified latent feature space; and 2) learning a model on the extracted latent feature space to predict the labels of the unlabeled nodes. More precisely, this intuition can be formulated as the following objective:
(3.1) \min_{X_{n\times r}\ge 0,\; B_{f\times r}\ge 0,\; H_{r\times r}\ge 0,\; W_{r\times c}\ge 0} \; c_1\|C^T - BX^T\|_F^2 + c_2\|S - XHX^T\|_F^2 + f(XW, Y).
In the equation, X_{n×r} is the r-dimensional latent feature matrix we wish to extract. B_{f×r} is the coefficient matrix that maps the node characteristic matrix C_{n×f} to the latent feature matrix X_{n×r}. H_{r×r} is the feature-feature correlation matrix that maps S_{n×n} to the latent feature matrix X_{n×r}. W_{r×c} is the weight matrix in the classification loss function f(XW, Y). c_1 and c_2 are the cost parameters that control the trade-off between the classification error and the reconstruction errors. S_{n×n} is the node functional similarity matrix computed from the network structure matrix E_{n×n}; it captures the closeness of nodes in their functionalities, and we define it in Section 3.1.
In Eq. 3.1, f(XW, Y) is the classification loss, defined as:

f(XW, Y) = \sum_{i=1}^{n} \Big( 1 - X_{i,:} W_{:,y_i} + \max_{j \neq y_i} X_{i,:} W_{:,j} \Big) + \alpha \|W\|_{2,1},

where y_i is the index of the largest value in Y_{i,:}, i.e., the class index of node i, and α is the weight decay for the l_{2,1} norm of W_{r×c}.
In the two reconstruction loss functions, \|C^T - BX^T\|_F^2 favors mapping the node characteristics into the latent feature matrix X while removing noise, and \|S - XHX^T\|_F^2 maps the node functional similarity matrix S to the latent feature matrix X.
Putting the two reconstruction loss functions together, we are able to learn a joint latent feature matrix X from both the node characteristics and the node functional similarity matrix. Together with f(XW, Y), the extracted latent feature matrix X is also discriminative for the detection of the target functional nodes.

Figure 1: A Toy Example (a network with two groups: Jack, Alice, Jason, and Mike on the left; Dan, Lily, Bob, and Eric on the right)

Figure 2: The Functional Similarity Matrix
(a) KN Matrix
        Jason  Alice  Jack   Bob
Jason   1.00   0.36   0.08   0.00
Alice   0.36   1.00   0.36   0.00
Jack    0.08   0.36   1.00   0.00
Bob     0.00   0.00   0.00   1.00
(b) KF Matrix
        Jason  Alice  Jack   Bob
Jason   1.00   0.36   0.13   0.34
Alice   0.36   1.00   0.36   0.94
Jack    0.13   0.36   1.00   0.34
Bob     0.34   0.94   0.34   1.00
(c) Functional Similarity Matrix
        Jason  Alice  Jack   Bob
Jason   1.00   0.57   1.00   0.57
Alice   0.57   1.00   0.57   0.96
Jack    1.00   0.57   1.00   0.57
Bob     0.57   0.96   0.57   1.00
We impose non-negativity constraints on the weight matrix W, the coding matrix B, and the latent feature matrix X. Therefore, in the model, only additive combinations are allowed and no subtraction can occur, which is consistent with the intuition of combining both the node characteristics and the network into a unified latent feature matrix.
On the linked data, with the input of the network structure matrix E_{n×n}, the node characteristic matrix C_{n×f}, and the label matrix Ỹ_{k×c} of the existing labeled nodes, we are able to learn W, B, H and X through the model in Eq. 3.1. The latent feature matrix X̂_{(n−k)×r} of the unlabeled nodes can then be used in the classifier f(X̂_{(n−k)×r} W, Ŷ_{(n−k)×c}) to learn the optimal labels Ŷ_{(n−k)×c}.
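To make the model concrete, the following is a minimal NumPy sketch, written by us for illustration, that evaluates the objective of Eq. 3.1 for given factor matrices. The vector y holds the class index y_i of each node (the argmax of each row of Y), and the function names are illustrative, not part of the original implementation.

    import numpy as np

    def l21_norm(W):
        # l2,1 norm: the sum of the l2 norms of the rows of W.
        return np.sum(np.linalg.norm(W, axis=1))

    def classification_loss(X, W, y, alpha):
        # f(XW, Y): the multiclass hinge loss of Section 3 plus the l2,1 penalty.
        scores = X @ W                       # n x c decision values
        n = scores.shape[0]
        true_cls = scores[np.arange(n), y]   # X_{i,:} W_{:,y_i}
        masked = scores.copy()
        masked[np.arange(n), y] = -np.inf    # exclude the true class
        runner_up = masked.max(axis=1)       # max over j != y_i of X_{i,:} W_{:,j}
        return np.sum(1.0 - true_cls + runner_up) + alpha * l21_norm(W)

    def find_objective(C, S, X, B, H, W, y, c1, c2, alpha):
        # Eq. 3.1: two Frobenius reconstruction errors plus the classification loss.
        rec_nodes = np.linalg.norm(C.T - B @ X.T) ** 2    # ||C^T - B X^T||_F^2
        rec_links = np.linalg.norm(S - X @ H @ X.T) ** 2  # ||S - X H X^T||_F^2
        return c1 * rec_nodes + c2 * rec_links + classification_loss(X, W, y, alpha)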
3.1 Node Functional Similarity Matrix Generally, the network structure matrix E_{n×n} only describes the connectivity of nodes and does not directly reflect the functional relationships between nodes. Two nodes having the same functionality can be connected, disconnected, or even unreachable from each other. However, for functional node detection, two nodes having similar functionalities must be very close in the latent feature matrix X_{n×r}, no matter whether the two nodes are connected in E_{n×n}. Therefore, E_{n×n} cannot be used directly in our model. To utilize the link information of the network, we have to transform E_{n×n} into the node functional similarity matrix S_{n×n}, which captures the closeness of two nodes in their functionalities rather than in their connectivities.
Unfortunately, existing studies that deal with node functional similarities on linked data cannot be applied to map E_{n×n} to S_{n×n}, since they evaluate functional similarities on node characteristics instead of on networks.
Existing network-based closeness metrics focus on two different aspects. Some papers such as [2, 6] seek to measure the closeness of nodes. Others such as [5, 13] focus on measuring the similarity of two graphs. In this paper, we denote the distance metric on two nodes a and b as K_n(a, b) and its result as KN_{ab}. Similarly, we denote the distance metric on two graphs E_a and E_b as K_g(E_a, E_b) and its result as KG_{E_a E_b}. Without loss of generality, we assume KN_{ab} and KG_{E_a E_b} are in the range [0, 1], and the higher the score of two instances, the closer they are.
Generally, a network E_{n×n} consists of multiple groups of nodes. Nodes in the same group are highly connected, and nodes from different groups have few or no links between them. We present such a toy example in Fig. 1. The network in Fig. 1 includes two groups of people: the left group E_1, which contains Jack, Alice, Jason, and Mike, and the right group E_2, which contains Dan, Lily, Bob, and Eric. The two groups E_1 and E_2 are very close in their topologies. For the sake of simplicity, we demonstrate the process of estimating the functional similarities among Jason, Jack, Alice, and Bob in Fig. 2.
Obviously, Alice and Bob should have a high functional similarity score since they act as the centers of the two groups. Among the existing approaches, the distance metrics K_g measure similarities of whole graphs and cannot be applied to measure similarities among nodes. By K_n(Alice, Bob), the score is 0 as in Fig. 2(a), since Alice and Bob are not reachable from each other. Even in cases where there are a few links between Alice and Bob, the closeness score measured by K_n(Alice, Bob) is still low, indicating that Alice and Bob are far away from each other in the network. Thus K_n does not fit the task of functional node detection.
Intuitively, if two objects have similar impacts on all the other objects, they should have similar functionalities. To formulate this intuition, we define a function K_f(a, b) measuring the impact of node a on node b, and use KF_{ab} to denote the result of K_f(a, b):

(3.2) KF_{ab} = K_f(a, b) = S(E_a, a) \cdot K_g(E_a, E_b) \cdot S(E_b, b),

in which S(E_a, a) represents the impact of node a on its group E_a. Viewing E_a and E_b as two subgraphs, K_g(E_a, E_b), which captures the closeness between E_a and E_b, transmits the impacts between the group E_a and the group E_b. Obviously, K_f(a, b) = K_f(b, a). The K_f function enables calculating the impact of node a on node b and vice versa, no matter whether their groups E_a and E_b are connected. We demonstrate the KF matrix of the toy example in Fig. 2(b).
The functional closeness of two nodes is estimated by the similarity of their impact scores on all the other nodes:

(3.3) S_{ab} = \sum_{i \neq a, b} d(K_f(a, i), K_f(b, i)),

in which S_{ab} is the desired functional similarity between a and b, and d(u, v) is a similarity metric between u and v: the closer u and v are, the higher d(u, v) is.
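The framework of Eqs. 3.2 and 3.3 leaves the node metric K_n, the graph metric K_g, the group impact S(E_a, a) and the similarity d as free choices. As a hedged illustration, the sketch below, which is ours and not the authors' code, takes these as callables and computes the impact matrix KF and the similarity matrix S for a small node set.

    import numpy as np

    def functional_similarity(nodes, group_of, group_impact, kg, d):
        """Eqs. 3.2 and 3.3: build KF, then aggregate impacts into S.

        nodes        -- list of node ids
        group_of     -- dict mapping a node id to its group (e.g. a subgraph)
        group_impact -- callable S(E_a, a): impact of node a on its group
        kg           -- callable K_g(E_a, E_b): closeness of two groups
        d            -- callable d(u, v): similarity of two impact scores
        """
        n = len(nodes)
        KF = np.zeros((n, n))
        for i, a in enumerate(nodes):
            for j, b in enumerate(nodes):
                # Eq. 3.2: KF_ab = S(E_a, a) * K_g(E_a, E_b) * S(E_b, b)
                KF[i, j] = (group_impact(group_of[a], a)
                            * kg(group_of[a], group_of[b])
                            * group_impact(group_of[b], b))
        S = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                others = [k for k in range(n) if k != i and k != j]
                # Eq. 3.3: compare the impacts of i and j on all other nodes
                S[i, j] = sum(d(KF[i, k], KF[j, k]) for k in others)
        return S

Note that the specific instantiation in Eq. 3.4 below replaces the plain sum with an inverse root-sum-of-squares of the impact differences.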
Eq. 3.2 and Eq. 3.3 provide a general framework for estimating node functional similarities. We present a specific formulation of the framework as follows. We estimate S(E_a, a) as:

S(E_a, a) = K_n\Big(a, \arg\max_{i \in [1, n_{E_a}]} \sum_{j=1}^{n_{E_a}} K_n(i, j)\Big),

where n_{E_a} is the number of nodes in the group E_a. In the group E_a, we use \arg\max_{i \in [1, n_{E_a}]} \sum_{j=1}^{n_{E_a}} K_n(i, j) to locate the node having the highest accumulated closeness to all the other nodes in the group, and define the located node as the group center of E_a. For instance, in the toy example in Fig. 1, we identify Alice and Bob as the group centers. S(E_a, a) is then estimated through the closeness between node a and the group center of E_a.
The functional similarity between node a and node b can then be obtained by:

(3.4) S_{ab} = \frac{1}{\sqrt{\sum_{i \neq a, b} (K_f(a, i) - K_f(b, i))^2} + 1}.

By \sqrt{\sum_{i \neq a, b} (K_f(a, i) - K_f(b, i))^2}, we estimate the difference between a's and b's impacts on all the other nodes. We then add 1 to it and invert the result to scale S_{ab} to the range (0, 1]. The higher S_{ab} is, the closer a and b are. By this method, we obtain the functional node similarity matrix of the toy example as in Fig. 2(c). In the result, the functional similarity of Jason and Jack is 1, which indicates that Jason and Jack play the same role in the network. Since Alice and Bob are the centers of two very similar groups, they obtain a high functional similarity (0.96). The result also indicates that Jason and Alice are very different in their node functionalities although they are connected.

3.2 Solution and Analysis In this section, we first present a solution to Eq. 3.1 and then theoretically analyze the solution.

3.2.1 Solution We develop a solution to Eq. 3.1 based on block coordinate techniques. We first divide the unsolved variables into two sets, {W, Y, B, H} and {X}, and then iteratively fix one set and update the other. Specifically, we have the following solution:
(1) Fixing X, the optimization problem can be decomposed into three independent subproblems:

(3.5) \min_{W_{r\times c} \ge 0,\; Y_{n\times c}} f(X_{n\times r} W_{r\times c}, Y_{n\times c}),

(3.6) \min_{B_{f\times r} \ge 0} \|C^T_{n\times f} - B_{f\times r} X^T_{n\times r}\|_F^2,

(3.7) \min_{H_{r\times r} \ge 0} \|S_{n\times n} - X_{n\times r} H_{r\times r} X^T_{n\times r}\|_F^2.

Among the three subproblems, Eq. 3.5 is a classical SVM task with the l_{2,1} norm. With the fixed latent feature matrix X̃_{k×r} and the labels Ỹ_{k×c} of the labeled nodes, Eq. 3.5 can be solved with many off-the-shelf optimization tools. Eq. 3.6 and Eq. 3.7 can be solved efficiently as:

(3.8) B = C^T X (X^T X)^{-1},

(3.9) H = (X^T X)^{-1} X^T S X (X^T X)^{-1}.

(2) Fixing W, Y, B, and H, the objective w.r.t. X is:

(3.10) J(X) = \mathrm{Tr}(O - P X^T + X Q X^T + c_2 \cdot XHX^T XHX^T - 2 c_2 \cdot SXHX^T),

where O = Y Y^T + c_1 \cdot C C^T + c_2 \cdot S S^T, P = 2 Y W^T + 2 c_1 \cdot C B, and Q = W W^T + c_1 \cdot B^T B.
To optimize Eq. 3.10, we employ the following updating function:

(3.11) X_{ij} \leftarrow X_{ij} \left( \frac{(P + 4 c_2 \cdot S X H)_{ij}}{(4 c_2 \cdot X H X^T X H + 2 X Q)_{ij}} \right)^{\frac{1}{4}}.

We summarize the solution in Alg. 1. It is obvious that in each iteration, the solutions to W, Y, B, and H keep decreasing the objective. Thus, in the remainder of this section, we focus on how the updating rule of X in Eq. 3.11 decreases the objective function in Eq. 3.10.
Algorithm 1 Feature Integration based Functional Node Detection (FIND)
Input: The linked data G = {V, E, C} and the labels Ỹ of k nodes in G
Output: The labels Ŷ for the other nodes
1: Calculate the node functional similarity matrix S by Eq. 3.4
2: Randomly initialize X under the non-negativity constraint
3: repeat
4:   Calculate W and Y by Eq. 3.5
5:   Calculate B by Eq. 3.8
6:   Calculate H by Eq. 3.9
7:   Update X by Eq. 3.11
8: until convergence
9: Ŷ = Y − Ỹ

3.2.2 Analysis We first describe two lemmas used in the proof:

Lemma 3.1. (From [1]) For any matrices C \in \mathbb{R}^{n\times n}_+, B \in \mathbb{R}^{k\times k}_+, S \in \mathbb{R}^{n\times k}_+ and S' \in \mathbb{R}^{n\times k}_+, with C and B symmetric, the following inequality holds:

(3.12) \sum_{ip} \frac{(C S' B)_{ip} S_{ip}^2}{S'_{ip}} \ge \mathrm{Tr}(S^T C S B).

Lemma 3.2. (From [16]) For any nonnegative symmetric matrices C \in \mathbb{R}^{k\times k}_+ and B \in \mathbb{R}^{k\times k}_+, and for H \in \mathbb{R}^{n\times k}_+, the following inequality holds:

(3.13) \mathrm{Tr}(H C H^T H B H^T) \le \sum_{ik} \left( \frac{H' C H'^T H' B + H' B H'^T H' C}{2} \right)_{ik} \frac{H_{ik}^4}{H'^3_{ik}}.

Proposition 3.1. The objective in Eq. 3.10 is non-increasing under the updating rule:

(3.14) X^{t+1} = \arg\min_X Z(X, X^t),

in which we define the auxiliary function

Z(X, X') = \sum_{ij} O_{ij} - \sum_{ij} P_{ij} X'_{ij} \Big(1 + \log \frac{X_{ij}}{X'_{ij}}\Big) + \frac{1}{2} \sum_{ij} (X' Q)_{ij} \frac{X'^4_{ij} + X^4_{ij}}{X'^3_{ij}} + c_2 \sum_{ij} (X' H X'^T X' H)_{ij} \frac{X^4_{ij}}{X'^3_{ij}} - 2 c_2 \sum_{ijkl} X'_{ji} H_{jk} X'_{kl} S_{li} \Big(1 + \log \frac{X_{ji} X_{kl}}{X'_{ji} X'_{kl}}\Big).

Proof. According to Lemmas 3.1 and 3.2, we have:

\mathrm{Tr}(X Q X^T) \le \sum_{ij} (X' Q)_{ij} \frac{X^2_{ij}}{X'_{ij}} \le \frac{1}{2} \sum_{ij} (X' Q)_{ij} \frac{X'^4_{ij} + X^4_{ij}}{X'^3_{ij}},

\mathrm{Tr}(c_2 \cdot X H X^T X H X^T) \le c_2 \sum_{ij} (X' H X'^T X' H)_{ij} \frac{X^4_{ij}}{X'^3_{ij}}.

Because u \ge 1 + \log u for all u \in (0, \infty], we have:

\mathrm{Tr}(P X^T) \ge \sum_{ij} P_{ij} X'_{ij} \Big(1 + \log \frac{X_{ij}}{X'_{ij}}\Big),

\mathrm{Tr}(2 c_2 \cdot S X H X^T) \ge 2 c_2 \sum_{ijkl} X'_{ji} H_{jk} X'_{kl} S_{li} \Big(1 + \log \frac{X_{ji} X_{kl}}{X'_{ji} X'_{kl}}\Big).

Summing over all these bounds, we have Z(X, X') \ge J(X). It is also clear that Z(X, X) = J(X). From the updating rule in Eq. 3.14, we can infer:

J(X^{t+1}) \le Z(X^{t+1}, X^t) \le Z(X^t, X^t) = J(X^t).

Notice that the only case in which J(X^{t+1}) = J(X^t) is when X^t is a local minimum of Z(X, X'). Letting X_{min} denote the local minimum of Z(X, X'), we have:

J(X_{min}) \le \dots \le J(X^{t+1}) \le J(X^t) \le \dots \le J(X^1) \le J(X^0),

thus J(X) in Eq. 3.10 is non-increasing under the updating rule in Eq. 3.14.

To utilize Proposition 3.1, we need to prove:

Proposition 3.2. The updating function in Eq. 3.11 satisfies the updating rule in Eq. 3.14.

Proof. To find the local minimum of Z(X, X'), we fix X' and get:

\frac{\partial Z(X, X')}{\partial X_{ij}} = -P_{ij} \frac{X'_{ij}}{X_{ij}} + 2 (X' Q)_{ij} \frac{X^3_{ij}}{X'^3_{ij}} + 4 c_2 (X' H X'^T X' H)_{ij} \frac{X^3_{ij}}{X'^3_{ij}} - 4 c_2 (S X' H)_{ij} \frac{X'_{ij}}{X_{ij}}.

The Hessian matrix of Z(X, X') is:

(3.15) \frac{\partial^2 Z(X, X')}{\partial X_{ij} \partial X_{kl}} = \delta_{ik} \delta_{jl} T_{ij},

which is a diagonal matrix with positive entries:

T_{ij} = \frac{(P_{ij} + 4 c_2 (S X' H)_{ij}) X'_{ij}}{X^2_{ij}} + \frac{6 ((X' Q)_{ij} + 2 c_2 (X' H X'^T X' H)_{ij}) X^2_{ij}}{X'^3_{ij}}.

Thus the Hessian matrix of Z(X, X') is positive semidefinite and Z(X, X') is convex. Its optimal minimum can be obtained by setting \partial Z(X, X') / \partial X_{ij} = 0. We then obtain Eq. 3.11, which satisfies the updating rule in Eq. 3.14.

By Propositions 3.1 and 3.2, the objective in Eq. 3.10 is non-increasing under the updating function in Eq. 3.11.
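To make Alg. 1 concrete, here is a minimal NumPy sketch, ours rather than the authors' code, of one pass of steps 5 to 7 together with the construction of P and Q from Eq. 3.10. The SVM subproblem of Eq. 3.5 (step 4) is assumed to be solved by an off-the-shelf tool that supplies W and Y. The projection of B and H back onto the non-negative orthant and the small eps guard are our additions, since the closed forms in Eqs. 3.8 and 3.9 are unconstrained.

    import numpy as np

    def find_iteration(C, S, X, W, Y, c1, c2, eps=1e-12):
        # Steps 5-7 of Alg. 1 for fixed W (r x c) and Y (n x c).
        XtX_inv = np.linalg.pinv(X.T @ X)
        B = np.maximum(C.T @ X @ XtX_inv, 0)                # Eq. 3.8, projected
        H = np.maximum(XtX_inv @ X.T @ S @ X @ XtX_inv, 0)  # Eq. 3.9, projected
        P = 2 * Y @ W.T + 2 * c1 * C @ B                    # as defined after Eq. 3.10
        Q = W @ W.T + c1 * B.T @ B
        num = P + 4 * c2 * S @ X @ H                        # numerator of Eq. 3.11
        den = 4 * c2 * X @ H @ X.T @ X @ H + 2 * X @ Q      # denominator of Eq. 3.11
        X = X * np.power(num / (den + eps), 0.25)           # multiplicative update
        return X, B, H

Repeating this update inside the loop of Alg. 1 until X stabilizes realizes the non-increasing behavior guaranteed by Propositions 3.1 and 3.2.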
3.3 Extension If no labeled node is available in the target linked data, FIND can also be adapted to the following two cases:
(1) In reality, the origins of linked data are usually dynamic, with evolving links and node characteristics. For the detection of functional nodes in a target snapshot, well-labeled history snapshots of the dynamic origins can be used in the learning process if the target node functionality stays the same over the history and the target snapshots.
(2) Intuitively, labeled functional nodes in a source can be used to help identify functional nodes in a target if the source and the target are of the same type. For instance, identified spams in Gmail can be used to detect spams in Hotmail, and labeled zombie fans in Facebook can be used to filter zombie fans in Google+. The adapted algorithm can thus take advantage of labeled nodes in a source that is the same type of linked data as the target.
Generally, there are three benefits of the adapted algorithm: 1) in a timely learning task, there is no need to re-train the model; 2) it removes the demand for labeled nodes in the target linked data; and 3) it enables the learning of a large target snapshot with a small history or source linked data set.
Let us denote the history or source linked data as G_S and the target linked data as G_T. We then train on G_S and test on G_T as follows. In the training process, we apply FIND in Alg. 1 on G_S to learn the parameter matrices B_S, H_S and W_S. In the testing process, the target linked data G_T can be mapped into the latent feature space X by using the same parameter matrices B_S and H_S:

(3.16) \min_X c_1 \|C^T - B_S X^T\|_F^2 + c_2 \|S - X H_S X^T\|_F^2.

Its solution and theoretical analysis are very similar to those in Section 3.2, so we skip the details here. The updating function w.r.t. X that optimizes Eq. 3.16 is:

(3.17) X_{ij} \leftarrow X_{ij} \left( \frac{(2 c_1 \cdot C B_S + 4 c_2 \cdot S X H_S)_{ij}}{(4 c_2 \cdot X H_S X^T X H_S + 2 c_1 \cdot X B_S^T B_S)_{ij}} \right)^{\frac{1}{4}}.

The target label matrix Y is obtained by plugging X into the trained classifier f(X W_S, Y). We denote the above adapted algorithm as FINDS (Feature Integration based Functional Node Detection with Source/History Linked Data).

4 Empirical Evaluation
We conduct various experiments in this section to evaluate and analyze the effectiveness of the proposed methods, FIND in Alg. 1 and FINDS, on two real world data sets, Advogato [8] and Robot.Net. Both data sets are publicly available¹,².

4.1 Dataset Description The Advogato and Robot.Net data sets were collected by daily crawling of the online community web sites Advogato³ and Robot.Net⁴. The two web sites provide cooperation environments for the development of open-source software and robots, respectively. To promote collaboration, both web sites enable each member to mark linked neighbors as Apprentices, Journeyers, and Masters according to the neighbors' contributions to the cooperative projects.
For the experiments, in Advogato we select the first sampled data in each February from 2007 to 2012 and denote the 6 snapshots as A1 to A6; in Robot.Net we select the first sampled data in each year from 2008 to 2013 and denote the 6 snapshots as R1 to R6. By the votes, the nodes in each snapshot are divided into three functional classes: Apprentice Nodes, Journeyer Nodes and Master Nodes. Due to the page limit, the details of the parameter settings are presented in the supplementary file.

4.2 Baselines and Metric The inputs of the proposed algorithm FIND include three parts: the network structure E_{n×n}, the node characteristics C_{n×f}, and the k existing labeled nodes. Based on the three parts, we choose four baselines for comparison:
(1) The PageRank algorithm [11] is an unsupervised measurement of node importance. Since the concepts of Apprentice, Journeyer, and Master are also related to importance, we set PageRank as the first baseline to examine how well a selected topological feature of the link structure can characterize the target node functions;
(2) In each learning task, the number of target functional nodes is much smaller than the number of the other nodes. One-class SVM (OCSVM) [12] is an established unsupervised model for such imbalanced data sets. We set OCSVM as the second baseline to test how well the node characteristics can capture the features of the target node functions;

¹ http://www.trustlet.org/datasets/advogato/
² http://www.trustlet.org/datasets/robots
³ http://www.advogato.org/
⁴ http://robots.net/
(3) To evaluate the composite impact of the input network structure and the k existing labeled nodes, we create a supervised method that maps the node functional similarity matrix S_{n×n} to a latent feature space with the guidance of the existing labeled nodes, and name the method Supervised Coding of the Functional Similarity Matrix (SCF). This baseline contains the regularized classification loss measuring the prediction errors of the labeled nodes and the reconstruction loss of the mapping from the node functional similarity matrix S_{n×n} to the latent feature matrix X in Eq. 3.1;
(4) Similarly, to evaluate the composite impact of the input node characteristics and the k existing labeled nodes, we compare the proposed algorithms to supervised sparse coding (SSC) [18], which maps the node characteristics to a latent feature space with the guidance of the labeled nodes.

Figure 3: Experiments on Apprentice Node Detection in Advogato (ROC curves, true positive rate vs. false positive rate, for FIND, SSC, SCF, OCSVM, and PageRank)

Due to the class imbalance in the learning tasks, we use the Area Under the Curve (AUC) as the evaluation metric to numerically compare the investigated approaches. AUC values lie in the range [0, 1], and the higher the value, the better the performance.
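As a usage note, the AUC comparison can be reproduced with standard tooling. The snippet below is a minimal sketch using scikit-learn, which is our choice of library rather than one named in the paper; y_score holds a method's real-valued decision values for the target functional class.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Toy illustration: y_true marks target functional nodes (1) vs. the rest (0).
    y_true = np.array([0, 0, 1, 0, 1, 0])
    y_score = np.array([0.1, 0.4, 0.9, 0.3, 0.7, 0.2])
    print(roc_auc_score(y_true, y_score))  # 1.0: every positive outranks every negative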
Table 2: Experiments on the Advogato Dataset

Apprentice Nodes
      SSC     SCF     OCSVM   PageRank  FIND
A1    0.8593  0.7517  0.8070  0.6826    0.9361
A2    0.7323  0.7111  0.6484  0.7089    0.8937
A3    0.6280  0.7246  0.6570  0.6706    0.8943
A4    0.7555  0.7271  0.6625  0.6888    0.8938
A5    0.7468  0.7356  0.6591  0.6689    0.8774
A6    0.7104  0.7377  0.6600  0.6849    0.8825

Journeyer Nodes
      SSC     SCF     OCSVM   PageRank  FIND
A1    0.8986  0.7423  0.7701  0.7767    0.9754
A2    0.9090  0.7567  0.6340  0.7520    0.9741
A3    0.8968  0.7537  0.6306  0.7031    0.9759
A4    0.8144  0.7581  0.6368  0.7079    0.9771
A5    0.8963  0.7629  0.6382  0.6923    0.9763
A6    0.9144  0.7652  0.6395  0.6769    0.9763

Master Nodes
      SSC     SCF     OCSVM   PageRank  FIND
A1    0.8662  0.8036  0.7821  0.7883    0.9830
A2    0.8865  0.8105  0.6327  0.7208    0.9825
A3    0.8803  0.8139  0.6346  0.6883    0.9832
A4    0.8456  0.8157  0.6378  0.6569    0.9802
A5    0.8833  0.8161  0.6409  0.7123    0.9831
A6    0.8514  0.8177  0.6429  0.7110    0.9842
4.3 Performance Study on FIND To evaluate the proposed method FIND in Alg. 1, we experimentally test its performance on detecting Apprentice Nodes, Journeyer Nodes and Master Nodes. In each learning task, we randomly pick 3 nodes in each class as training instances and test on all the other nodes. Table 2 and Table 3 present the results of the three learning tasks on Advogato and Robot.Net, respectively. Moreover, to better illustrate the differences among the investigated methods, we show the ROC curves of detecting Apprentice Nodes on Advogato in Fig. 3.

4.3.1 Experiments on the Advogato Dataset On the Advogato data set, the results of detecting three different types of functional nodes show very consistent trends, as demonstrated in Table 2. OCSVM, which uses node characteristics for unsupervised functional node detection, got the lowest AUC scores; in contrast, PageRank, which mines node importance from the network structure, achieved better performance than OCSVM. This fact indicates that arbitrarily selected node characteristics cannot well characterize node functions. SCF, which maps the node functional similarity matrix to a latent feature space with the supervision of the labeled nodes, significantly outperformed OCSVM and achieved higher AUC scores than PageRank. Among the four baselines, SSC, which encodes the node characteristics into the latent feature space with the guidance of the labeled nodes, achieved the best performance.
The proposed FIND algorithm successfully integrates the discriminative information in the node characteristics and the network structure, and thus achieved significantly better performance than all the baselines. On the mean AUC scores, FIND outperformed SSC, SCF, OCSVM and PageRank by 14.4%, 24.1%, 42.6% and 35.0%, respectively.

4.3.2 Experiments on the Robot.Net Dataset The results of the experiments on the Robot.Net data set are shown in Table 3. SCF performed the worst; PageRank reached better performance than SCF in all the binary learning tasks; SSC obtained slightly higher AUC scores than SCF and PageRank; and OCSVM performed the best among the remaining baselines. Compared to the baselines, the proposed FIND algorithm achieved significantly better performance. On mean AUC scores, FIND made 35.3%, 59.6%, 12.9% and 47.8% improvements over SSC, SCF, OCSVM and PageRank, respectively.
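For transparency, the reported gains are consistent with relative improvements in mean AUC over all eighteen Advogato tasks (3 classes times A1 to A6). A quick check of the FIND-over-SSC figure, using our own sums of the Table 2 entries, is:

    # Mean AUC over the 18 Advogato tasks, summed from Table 2.
    find_mean = 17.1291 / 18  # sum of the 18 FIND entries
    ssc_mean = 14.9751 / 18   # sum of the 18 SSC entries
    print(f"{100 * (find_mean / ssc_mean - 1):.1f}%")  # prints 14.4%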
Table 3: Experiments on the Robot.Net Dataset

Apprentice Nodes
      SSC     SCF     OCSVM   PageRank  FIND
R1    0.7476  0.5327  0.7996  0.6288    0.8887
R2    0.6352  0.5197  0.7398  0.5480    0.8679
R3    0.6176  0.5419  0.7466  0.6136    0.8529
R4    0.5164  0.5416  0.7460  0.6610    0.8714
R5    0.6149  0.5450  0.7506  0.5912    0.8743
R6    0.5689  0.5489  0.7513  0.5923    0.8514

Journeyer Nodes
      SSC     SCF     OCSVM   PageRank  FIND
R1    0.8741  0.6140  0.8228  0.7125    0.9267
R2    0.7735  0.5806  0.8460  0.5540    0.9146
R3    0.7322  0.5772  0.8424  0.6592    0.9389
R4    0.5579  0.5770  0.8436  0.6262    0.9702
R5    0.5306  0.5734  0.8409  0.6275    0.9593
R6    0.8646  0.5699  0.8440  0.5825    0.9556

Master Nodes
      SSC     SCF     OCSVM   PageRank  FIND
R1    0.7673  0.6296  0.8215  0.7075    0.9559
R2    0.6513  0.5988  0.8394  0.6189    0.9340
R3    0.6535  0.6002  0.8424  0.6333    0.9435
R4    0.6956  0.6011  0.8430  0.5967    0.9542
R5    0.7368  0.5930  0.8457  0.5834    0.9062
R6    0.6543  0.5916  0.8459  0.6211    0.9302

Table 4: Experiments on Robot.Net with Labeled R1

Master Nodes
      SSC     SCF     OCSVM   PageRank  FINDS
R1    0.8498  0.6324  0.8225  0.7044    0.9603
R2    0.8091  0.6015  0.8372  0.6257    0.9386
R3    0.8049  0.6027  0.8389  0.6233    0.9538
R4    0.7936  0.6036  0.8405  0.5896    0.9571
R5    0.7330  0.5960  0.8429  0.5876    0.9356
R6    0.9078  0.5930  0.8449  0.6267    0.9662

Table 5: Experiments on Robot.Net with Labeled A1

Master Nodes
      SSC     SCF     OCSVM   PageRank  FINDS
R1    0.8927  0.6324  0.8225  0.7044    0.9505
R2    0.8426  0.6015  0.8372  0.6257    0.9530
R3    0.7032  0.6027  0.8389  0.6233    0.9505
R4    0.6229  0.6036  0.8405  0.5896    0.9395
R5    0.7489  0.5960  0.8429  0.5876    0.9725
R6    0.8874  0.5930  0.8449  0.6267    0.9636

4.4 Performance Study on FINDS As discussed in Section 3.3, the proposed FINDS algorithm can be used in two cases: 1) using a labeled history snapshot as input if the origin of the target linked data set is dynamic; and 2) using a labeled source that is the same type of linked data as the target linked data. In this section, we focus on detecting Master Nodes on the Robot.Net data set and experimentally evaluate the FINDS algorithm in the two cases.

4.4.1 On History Linked Data We set R1 in the Robot.Net data set as the labeled history snapshot and use it to supervise the detection of Master Nodes on R1 to R6. The results are presented in Table 4. Among the five investigated approaches, SCF and PageRank reached very close performance in all the binary learning tasks; SSC increased the mean AUC score by 0.2115 and 0.1902 over SCF and PageRank, respectively; and OCSVM performed slightly better than SSC. Compared to the baselines, the proposed FINDS algorithm achieved significantly better performance. On mean AUC scores, FINDS made 16.6%, 57.4%, 13.6% and 52.0% improvements over SSC, SCF, OCSVM and PageRank, respectively.

4.4.2 On Source Linked Data We set A1 in the Advogato data set as the labeled source linked data and use it to supervise the detection of Master Nodes on R1 to R6. The results are presented in Table 5. In the experiments, the proposed FINDS algorithm performed consistently better than every baseline in all the binary learning tasks. Overall, on the mean AUC scores, the FINDS algorithm outperformed SSC, SCF, OCSVM and PageRank by 22.0%, 57.9%, 14.0% and 52.5%, respectively.

Figure 4: Experiments on Varying Number of Training Instances k and Number of Latent Features r (AUC on R1 to R6)

4.5 Impact of Training Sets In the above experiments, on each data set, we use 3 labeled instances per class in training. To evaluate the impact of the number of training instances k, we also conduct experiments with various settings of k and demonstrate the results in Fig. 4. From the graph, we can observe that when k = 2, since the guidance of the labeled instances is too limited, the AUC scores of the proposed FIND model vary significantly from R1 to R6. Nevertheless, after slightly increasing k to 4, the model achieves on average over 0.9 in AUC scores. Such high performance remains stable as we keep increasing the number of labeled training instances. This experiment confirms that the proposed FIND model is effective in detecting node functions when only a few labeled instances are provided.

4.6 Parameter Sensitivity In FIND and FINDS, the number of latent features r controls the expressive power of the latent feature matrix X_{n×r} and is closely related to the performance of the proposed FIND and FINDS. In this section, we use Master Node detection on Robot.Net to demonstrate how r impacts the performance and to show the process of setting it for FIND. Other parameters are set in the same way. As shown in Fig. 4, on R1 to R6, the AUC scores dramatically increase to over 0.9 when r varies from 1 to 3. When r is larger than 10, the model performs slightly worse due to over-fitting. According to this observation, we set r = 3 in each binary classification task in the experiments.
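The selection of r described above amounts to a small grid search. A hedged sketch follows, assuming a find_auc routine, which the paper does not name, that trains FIND with r latent features and returns a validation AUC:

    def pick_r(find_auc, candidates=(1, 2, 3, 5, 10, 20, 50, 100)):
        # Train once per candidate r and keep the one with the best validation AUC.
        scores = {r: find_auc(r) for r in candidates}
        return max(scores, key=scores.get)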
5 Related Work
Although mining linked data has been a hot topic recently, the work proposed in this paper differs significantly from existing work in both the method and the targeted task. Graph-based semi-supervised learning [17, 19] and collective classification [3, 10] also tackle the classification problem on linked data. In those settings, links are treated as important constraints for classification tasks: connected nodes are more likely to belong to the same class, while disconnected nodes should belong to different classes. In contrast, in the proposed functional node detection task, nodes of the same functionality may not be reachable from each other in the networks. For instance, in Gmail, spammers in different geographical regions may have no links between them at all but still share the same functionality. Therefore, we must extract discriminative information from links instead of treating links as constraints in our task.
There are several existing studies concerning the integration of network structures and node characteristics into a unified latent feature space. The method in [14] summarizes information on linked data into four types of relationships, including CoFollowing, in a supervised framework. In [15], the authors used pseudo-class label information to supervise the integration of features for community detection. In these methods, two nodes have similar latent feature vectors only if they are highly connected with each other, which is different from the proposed algorithms.
6 Conclusions
In this paper, we proposed a novel algorithm for functional node detection on linked data. The basic idea is to seek a joint latent space that captures the characteristics of both node attributes and network structures while maximizing the discriminative features guided by existing labeled nodes. We presented an iterative solution to the model and provided a theoretical analysis of the solution. Moreover, we also developed a novel node similarity metric that measures the functional closeness between nodes. Experiments on two real world data sets validated that the proposed method significantly outperforms the baselines by 13% to 60% on the detection of three different types of functional nodes.
7 Acknowledgment
The work published in this paper is partially supported by the National Science Foundation under Grants No. 1218393 and No. 1016929.
References
[1] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. Proc. of
KDD’06, 19(2), 2006.
[2] L. Ge and A. Zhang. Pseudo cold start link prediction
with multiple sources in social networks. Proc. of
SDM’12, 2012.
[3] L. Getoor and B. Taskar. Introduction to Statistical
Relational Learning, volume L. MIT Press, 2007.
[4] W. Hwang, Y. Cho, A. Zhang, and M. Ramanathan. Bridging centrality: Identifying bridging nodes in scale-free networks. Proc. of KDD'06, 2006.
[5] H. Kashima and A. Inokuchi. Kernels for graph
classification. Computer, 2002.
[6] R. I. Kondor and J. Lafferty. Diffusion kernels on
graphs and other discrete input spaces. Proc. of
ICML’02, 2002.
[7] C. J. Kuhlman, V. S. A. Kumar, M. V. Marathe, S. S.
Ravi, and D. J. Rosenkrantz. Finding critical nodes
for inhibiting diffusion of complex contagions in social
networks. Proc. of ECML’10, 2010.
[8] P. Massa, K. Souren, M. Salvetti, and D. Tomasoni.
Trustlet, open research on trust metrics. Scalable
Computing: Practice and Experience, 2008.
[9] M. McCord and M. Chuah. Spam detection on twitter
using traditional classifiers. Autonomic and Trusted
Computing, 2011.
[10] J. Neville and D. Jensen. Relational dependency
networks. Journal of Machine Learning Research, 8(8),
2007.
[11] L. Page, S. Brin, R. Motwani, and T. Winograd.
The pagerank citation ranking: Bringing order to the
web. World Wide Web Internet And Web Information
Systems, 1998.
[12] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 2001.
[13] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 2011.
[14] J. Tang and H. Liu. Feature selection with linked data
in social media. Proc. of ASUCISE’11, 2011.
[15] J. Tang and H. Liu. Unsupervised feature selection for
linked social media data. Proc. of KDD’12, 2012.
[16] H. Wang, H. Huang, and C. Ding. Simultaneous clustering of multi-type relational data via symmetric nonnegative matrix tri-factorization. Proc. of CIKM’11,
2011.
[17] W. Wang and Z. Zhou. A new analysis of co-training.
ICML, 2010.
[18] J. Yang, K. Yu, and T. Huang. Supervised translation-invariant sparse coding. Proc. of CVPR'10, 2010.
[19] X. Zhu. Semi-supervised learning with graphs. PhD thesis, Carnegie Mellon University, 2005.