LRBM: A Restricted Boltzmann Machine based Approach for Representation Learning on Linked Data
Kang Li, Jing Gao, Suxin Guo, Nan Du, Xiaoyi Li and Aidong Zhang
Department of Computer Science and Engineering
The State University of New York at Buffalo
Emails: {kli22, jing, suxinguo, nandu, xiaoyili, azhang}@buffalo.edu
Abstract—Linked data consist of both node attributes, e.g.,
preferences, posts and degrees, and links which describe the
connections between nodes. They have been widely used to
represent various network systems, such as social networks,
biological networks, etc. Knowledge discovery on linked data
is of great importance to many real applications. One of the
major challenges of learning linked data is how to effectively and
efficiently extract useful information from both node attributes
and links in linked data. Current studies on this topic either
use selected topological statistics to represent network structures,
or linearly map node attributes and network structures to a
shared latent feature space. However, while approaches based
on statistics may miss critical patterns in network structure,
approaches based on linear mappings may not be sufficient to
capture the non-linear characteristics of nodes and links.
To handle the challenge, we propose, to our knowledge,
the first deep learning method to learn from linked data. A
restricted Boltzmann machine model named LRBM is developed
for representation learning on linked data. In LRBM, we aim to
extract the latent feature representation of each node from both
node attributes and network structures, non-linearly map each
pair of nodes to the links, and use hidden units to control the
mapping. The details of how to adapt LRBM for link prediction
and node classification on linked data have also been presented.
In the experiments, we test the performance of LRBM as well as
other baselines on link prediction and node classification. Overall,
the extensive experimental evaluations confirm the effectiveness
of the proposed LRBM model in mining linked data.
I. INTRODUCTION
In many current studies and applications, linked data are used to describe systems consisting of interacting objects. Given that each node represents an object, in linked data, node attributes contain features, preferences, and actions, and links describe interactions between nodes. For instance, in a social network like Facebook, each user is viewed as a node. Node attributes may include gender, habits and the number of friends (i.e., degree), and links can represent whether two users are friends, whether one comments on another's posts, and other types of interactions. The great expressive power of linked data enables this data format to capture both the characteristics and the interactions of objects in various systems, such as linked gene mutation databases and e-commerce websites.
Knowledge discovery on linked data is the process of leveraging both node attributes and link structures for learning about the corresponding systems, and it is of great importance to understanding the characteristics and interaction
patterns in these systems. For instance, [1] discusses how to
predict friendships on social networks. [2] learns from linked
drug networks to predict potentially useful drugs for diseases.
[3] utilizes link structure in linked data to recommend social
groups on social media websites.
One of the major challenges in mining linked data is how
to effectively and efficiently utilize information from both node
attributes and link structures. For this challenge, many existing models seek to represent link structures using selected statistics on networks, and then combine the selected statistics with node attributes. For example, in [4], [5], the authors characterized link structure using recursive egonet-based statistics, and further used the statistics to learn object roles. In [6], researchers detected spammers using the degrees of friends. The primary limitation of these approaches is that the topological statistics in each task are usually subjectively selected. Therefore, such methods may miss critical patterns in linked data. Moreover, when the target tasks are complex, it can be very difficult to select or create relevant topological statistics.
To avoid the limitations of the above methods, other approaches aim to extract a shared representation for both node attributes and link structures. For example, in [7], links are viewed as interactions between the latent features of connected nodes. In [8], latent features are extracted w.r.t. the criterion that objects from different clusters are dissimilar while objects in the same cluster are similar. These methods usually rely on linear mappings to capture the relations among node attributes, network structures and the target latent feature representations of nodes. As a result, such approaches suffer from the simplicity of linear mappings and fail to capture the non-linear characteristics of nodes and links.
To address the above issues, in this paper, we propose
a novel model named LRBM, which stands for Restricted
Boltzmann Machines for Latent Feature Learning in Linked
Data. Different from the aforementioned methods using graph
statistics, the proposed model does not rely on any subjectively
selected topological statistics, and is capable of characterizing
both node attributes and link structures in a unified framework. At the heart of this model is a shared latent feature
representation of each node, which is used to formulate nonlinear relations among nodes, links and hidden units. To avoid
a large amount of sampling, Contrastive Divergence (CD) [9], [10] is applied to train the model, and other techniques such as fine-tuning and parameter sharing are also used to simplify the calculation. Moreover, the proposed LRBM model can also be "stacked" like a traditional RBM to explore "deep" characteristics of nodes and high-order interaction patterns between nodes.
Furthermore, we present the details of how to apply LRBM
to two popular topics on linked data, which are link prediction
and node classification. These two topics are closely related to
tasks such as friend recommendations in social networks and
informative gene selection in genetic networks.
Overall, the contributions of this work include:
• We propose a novel binary and conditional RBM model
to effectively extract latent features of objects in linked
data with unweighted interactions.
• We extend the binary model to general cases to extract
representations of linked data with weighted interactions.
• We present the details of how to apply tensor factorization, parameter sharing and fine-tuning to reduce the number of parameters of the proposed model, and develop the solution to the proposed LRBM model.
• We also provide the details of implementing the proposed
model for link prediction and node classification.
• We experimentally evaluate the proposed method on link prediction and node classification. The overall performance confirms the effectiveness of LRBM.
II. BACKGROUND
Before diving into the details of LRBM, we first introduce
two relevant methods which are graph factorization models
and conditional Restricted Boltzmann Machines (cRBM).
A. Graph Factorization Models
Given a linked data set, suppose the link structure can be
represented by an n × n size relational link matrix L, where n
is the number of the objects and Lij stands for the connection
between node i and node j. In binary graph factorization
models which focus on modeling existence of links between
two arbitrary nodes, entry Lij = 1 indicates the presence of
a link and Lij = 0 indicates the absence of a link between
node i and node j. Such graph factorization models usually favor predicting the existence of unobserved links (entries of L) using observed links as:
$$p(L_{ij} = 1 \mid \mu_i, \mu_j) = \mathcal{F}(\varsigma + \Psi(\mu_i, \mu_j)). \quad (1)$$
In Eq.1, µi and µj are the latent feature representations of
node i and node j, respectively. ς is a bias. p(Lij = 1|µi , µj )
stands for the probability that there is a link between node i
and node j given their latent feature representations.
F is the mapping function that maps the latent factors µi and µj to the conditional probability p(Lij = 1|µi, µj). In related matrix factorization models, F is usually defined as F(x) = x for linear mappings. In other related models, the sigmoid function $\mathcal{F}(x) = \frac{1}{1+e^{-x}}$ and exponential family distributions $\mathcal{F}(x \mid \theta) = \Phi(x)\exp(\phi(\theta)\cdot T(x) - \Xi(\theta))$ are generally used, where Φ(x), φ(θ), T(x) and Ξ(θ) are known functions and θ is the hyper-parameter of F(x|θ).
Fig. 1: Illustrations of an RBM in (A) and a cRBM in (B). In (B), the hidden layer h relates to both the current visible layer vt and the previous visible layer vt−1, and h, vt and vt−1 are connected with a three-way tensor W.
In Eq.1, Ψ(µi , µj ) is a distance function enforcing that
two nodes are likely to interact with each other if their latent
feature representations are similar, and vice versa.
In the latent distance model [11], Ψ(µi, µj) is defined as a negative distance, Ψ(µi, µj) = −d(µi, µj), where d is a distance function. In other models such as the latent eigenmodel [12] and the latent feature relational model [13], [14], Ψ(µi, µj) is defined as $\Psi(\mu_i, \mu_j) = \mu_i^\top W \mu_j$, where W is a parameter matrix subsuming the interaction patterns between node i and node j. W is usually constrained to be symmetric for modeling symmetric (i.e., undirected) linked data and asymmetric for modeling directed linked data.
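As an illustration of Eq.1 (not from the paper), the following minimal Python sketch computes the link probability under the latent eigenmodel form Ψ(µi, µj) = µi^T W µj with a sigmoid F; the latent vectors, the bias and W are hypothetical toy values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def link_probability(mu_i, mu_j, W, bias=0.0):
    """Latent eigenmodel form of Eq.1: p(L_ij = 1) = sigmoid(bias + mu_i^T W mu_j)."""
    score = bias + mu_i @ W @ mu_j
    return sigmoid(score)

# Toy usage with hypothetical 4-dimensional latent features.
rng = np.random.default_rng(0)
mu_i, mu_j = rng.normal(size=4), rng.normal(size=4)
W = rng.normal(size=(4, 4))
W = 0.5 * (W + W.T)          # symmetric W for an undirected network
print(link_probability(mu_i, mu_j, W))
```

For the latent distance model, one would instead replace the bilinear score with the negative distance −d(µi, µj).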
B. Conditional Restricted Boltzmann Machines
As demonstrated in Fig.1.(A), Restricted Boltzmann Machines (RBM) are undirected graphical models that define
a probability distribution over a set of observed or visible
variables and a set of unobserved or hidden variables. In the
model, the term Restricted indicates that there is no visible
to visible interaction or hidden to hidden interaction.
In RBM, the joint distribution between visible variables v and hidden variables h is defined as $p(v, h) = \frac{1}{Z}\exp(-E(v, h))$, where Z is a constant that normalizes the joint probability distribution p(v, h). E(v, h) is the energy function and it is usually defined as:
$$E(v, h) = -\sum_{ij} W_{ij} v_i h_j - \sum_i a_i v_i - \sum_j b_j h_j. \quad (2)$$
In Eq.2, Wij is the weight that connects the visible unit vi and the hidden unit hj. ai and bj are biases for vi and hj, respectively.
The free energy of the model is:
$$F(v) = -\log \sum_h \exp(-E(v, h)) = -\sum_i a_i v_i - \sum_j \log\left(1 + \exp(b_j + v^\top W_{:,j})\right). \quad (3)$$
To learn a parameter θ of the model by gradient descent on the log-likelihood, we have:
$$-\frac{\partial \log p(v)}{\partial \theta} = \frac{\partial F(v)}{\partial \theta} - \sum_{v'} p(v') \frac{\partial F(v')}{\partial \theta}. \quad (4)$$
In Eq.4, the first term after the "=" sign is usually referred to as the positive phase and can be computed exactly. The last term in Eq.4 is referred to as the negative phase. This term stands for an expectation over the model distribution for all possible configurations of the input visible variables v′, and it can hardly be computed directly. To solve this problem, Hinton [15] showed that the negative phase can be approximated using samples obtained by starting a Gibbs chain at a training vector, and named this technique Contrastive Divergence (CD) [9]. The CD process that terminates after K steps of Gibbs sampling is called CD-K.
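To make the CD-K procedure concrete, here is a minimal sketch (assuming K = 1 and a binary RBM as in Eq.2; the learning rate and variable names are illustrative, not tied to any specific implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.01, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM with energy E(v,h) = -v^T W h - a^T v - b^T h."""
    # Positive phase: hidden activations given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of alternating Gibbs sampling (the "reconstruction").
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate gradient of Eq.4: <v h>_data - <v h>_recon.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return W, a, b
```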
The technique CD-K enables RBM models to be trained
efficiently and be applied to handle large data sets. Current
studies show that RBM and the corresponding deep belief
networks can achieve good performance when the data size is
large. Such models are generally used in areas like computer
vision and signal processing.
One of the advances of RBM in the past few years is called
conditional RBM (cRBM) [16], [17]. In such models, the
dependence between unknown parameters and visible inputs
is modeled with all-way correlations.
Current studies apply cRBM to mining time-series data [18], [19]. As demonstrated in Fig.1, at time t, suppose the current visible input is vt and the historical data are v<t (e.g., in Fig.1.(B), v<t is vt−1). cRBM defines a joint probability distribution over vt and the current hidden variable ht, conditional on v<t and the model parameter θ:
$$p(v_t, h_t \mid v_{<t}, \theta) = \frac{1}{Z}\exp(-E(v_t, h_t \mid v_{<t}, \theta)),$$
$$E = -\sum_{ijk} W_{ijk} v_{i,t} v_{j,<t} h_{k,t} - \sum_i a_i v_{i,t} - \sum_j b_j h_{j,t}. \quad (5)$$
In Eq.5, Wijk is a three-way tensor that connects the historical visible variables v<t, the current visible variables vt and the current hidden variables ht. a and b are biases on vt and ht, respectively. Such a cRBM model describes how the historical data v<t relate to the current data vt, and it can be trained efficiently using the CD-K technique.
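For clarity, a small sketch of evaluating the three-way cRBM energy in Eq.5 with NumPy is given below; the dimensions and values are hypothetical:

```python
import numpy as np

def crbm_energy(v_t, v_hist, h_t, W, a, b):
    """Energy of Eq.5: E = -sum_ijk W_ijk * v_i,t * v_j,<t * h_k,t - a^T v_t - b^T h_t."""
    three_way = np.einsum('ijk,i,j,k->', W, v_t, v_hist, h_t)
    return -three_way - a @ v_t - b @ h_t

# Toy usage with 5 visible units, 4 historical units and 3 hidden units.
rng = np.random.default_rng(1)
v_t = rng.integers(0, 2, 5).astype(float)
v_hist = rng.integers(0, 2, 4).astype(float)
h_t = rng.integers(0, 2, 3).astype(float)
W = rng.normal(size=(5, 4, 3))
a, b = rng.normal(size=5), rng.normal(size=3)
print(crbm_energy(v_t, v_hist, h_t, W, a, b))
```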
III. RESTRICTED BOLTZMANN MACHINES FOR LINKED DATA
In this section, we first show a latent factor model as
the starting point, then extend it to the LRBM model. For
simplicity, we develop LRBM from unweighted linked data
and then expand the model to more general cases.
A. A Latent Factor Model
Suppose in a linked data set, Ai is the node attributes of
the i-th node and Ll is the weight of the l-th link. The index
l of a link is one-to-one mapped to a pair of node indices il
and jl , and the weight Ll may contain multiple features, and
each feature stands for the weight of a type of interaction.
Furthermore, suppose a link points from node i to node j,
we denote node i as the sender and node j as the receiver
of this link. In undirected/bidirectional networks where links
are undirected, each node of a link is both the sender and
the receiver. For simplicity, we first consider the case when
weights of links are either 1 or 0, where 1 represents existence
of interactions and 0 represents no interaction.
Inspired by the existing models of graph factorizations,
we seek to use latent feature representations to encode the
characteristic of each node, and to learn how these latent representations relate to their attributes A and links L. Specifically,
we assume that the latent representation of each node contains
two parts, which are sender ”behaviors” and receiver ”behaviors”. By this assumption, each node contributes to its outgoing
links according to its sender ”behavior”, and contributes to
its incoming links according to its receiver ”behavior”. We
explicitly denote the two types of latent representation as
Ri and Si for receiver and sender ”behaviors” of node i,
respectively.
As demonstrated in Fig.2.(A), the node attributes Ai of node i are related to its latent receiver behavior Ri and sender behavior Si. Besides, Si and Rj decide the interaction from node i to node j. To model the interaction of the node attributes A, the links L, and the latent representations S and R, we propose a latent factor model with the energy function:
$$E(S, R; A, L) = -\sum_i W_i^s S_i A_i - \sum_i W_i^r R_i A_i - \sum_i \sum_j \sum_l W_{ijl} L_l S_i R_j. \quad (6)$$
In Eq.6, the first two terms define how the characteristics S and R of nodes impact the node attributes A. In these terms, W^s and W^r are tensors that connect A to S and R, respectively. The third term of Eq.6 defines how the sender behavior Si of node i and the receiver behavior Rj of node j relate to the link Ll. W is a three-way tensor that connects the weight Ll and the latent behaviors Si and Rj.
Notice that, in this model, Ll does not necessarily connect node i and node j. In recent social network research, it is observed that the connectivity of two nodes is also affected by the nodes close to these two nodes (i.e., neighbor nodes). Therefore, in this general energy function, we consider the relations between each link and every directed pair of nodes.
Eq.6 defines a mapping relation among A, R, S and L, and the negative energy −E(S, R; A, L) measures the compatibility of this mapping. In detail, the model contains two parts: one among A, R and S, and one among R, S and L. The first part covers two compressions and can be optimized efficiently using off-the-shelf methods. The second part is a linear mapping between the link matrix L and the two trained latent matrices R and S. In many cases, such a simple linear mapping cannot effectively capture the relations among L, R and S. When the number of known links is small, it also tends to overfit the linked data.
To address the above issue, we keep the first part the same, and replace the second part (the mapping from R and S to L) with a more effective model based on conditional restricted Boltzmann machines.
B. The Binary Model
Given the latent receiver and sender behaviors R and S, to model the probability of a link L, which is related to S and R, we consider an energy function that includes all the components S, R and L. To explicitly characterize the possible ways in which these components are related, we add a hidden variable h.
Fig. 2: Illustrations of a latent factor model in (A) and the binary LRBM in (B). In (A), the node attributes A are independently mapped to the latent sender behavior S and the latent receiver behavior R, and S and R are further mapped to the corresponding links L with a three-way tensor W. In (B), a hidden variable h is added to control the mapping.
The energy function that captures the interactions of the components S, R, L and h is:
$$E(L, h; S, R) = -\sum_{ijkl} W_{ijkl} S_i R_j h_k L_l. \quad (7)$$
As demonstrated in Fig.2.(B), in Eq.7, Wijkl is an element of a four-way tensor W, and it connects the i-th sender Si, the j-th receiver Rj, the k-th hidden variable hk and the l-th link Ll. Specifically, W learns from the training triplets of S, R and L, and it measures how a directed pair of nodes i and j impacts a link l.
The negative energy function −E(L, h; S, R) measures the compatibility of the components S, R, L and h. In this function, the hidden variable matrix h controls the way in which W connects the other components. Therefore, if h is fixed, the energy function in Eq.7 is simply a mapping among the components; if we could perfectly fix the hidden variable matrix h, we would obtain an ideal mapping among S, R and L. To achieve this goal, in practice, we utilize the energy function in Eq.7 to estimate the probability distribution of the hidden variable h and the link matrix L, and then marginalize over all the possible mappings.
In order to model the affinities of h and L, we further enhance the model with biases. While the most classic restricted Boltzmann machines only include additive biases, which "shift" the activity level of each unit, recent conditional models also include gated biases, which "shift" the activity level conditionally. In this paper, the enhanced energy function includes both standard biases and gated biases:
$$E(L, h; S, R) = -\sum_{ijkl} W_{ijkl} S_i R_j h_k L_l - \sum_{kl} c_{kl} h_k L_l - \sum_k a_k h_k - \sum_l b_l L_l. \quad (8)$$
In Eq.8, ckl is the gated bias that shifts the activity levels of hk and Ll conditionally. ak and bl are standard additive biases, and they shift the activity levels of hk and Ll, respectively.
Using the energy function in Eq.8, we define the joint probability distribution p(L, h; S, R) over links and hidden variables as:
$$p(L, h; S, R) = \frac{1}{Z(S, R)} \exp(-E(L, h; S, R)), \quad (9)$$
where
$$Z(S, R) = \sum_{L, h} \exp(-E(L, h; S, R)).$$
In Eq.9, Z(S, R) is a normalizing term that depends on S and R.
By the joint probability distribution in Eq.9, given S and R, the probability of each link can be estimated as:
$$p(L \mid S, R) = \sum_h p(L, h \mid S, R). \quad (10)$$
In the above model, Z(S, R) in Eq.9 and p(L|S, R) in Eq.10 cannot be directly calculated in most cases, because both need to sum over the exponentially many configurations of h, and Z(S, R) further needs to sum over all possible links L.
In practice, however, we do not have to calculate the exact values of Z(S, R) and p(L|S, R) in either the training or the testing process. Specifically, in the training process, given S and R and the set of existing links L, we can infer the probability of activating each unit of h as:
$$p(h_k = 1 \mid S, R, L) = \frac{1}{1 + \exp\left(-\sum_{ijl} W_{ijkl} S_i R_j L_l - \sum_l c_{kl} L_l - a_k\right)}. \quad (11)$$
Notice that in Eq.11, each unit of h is inferred independently based on S, R and the known links. Therefore, there are no hidden-hidden connections in the model.
Similarly, in testing, given S and R, and the approximated hidden variable matrix h, we can infer the probability of activating a link as:
$$p(L_l = 1 \mid S, R, h) = \frac{1}{1 + \exp\left(-\sum_{ijk} W_{ijkl} S_i R_j h_k - \sum_k c_{kl} h_k - b_l\right)}. \quad (12)$$
By Eq.11 and Eq.12, the model estimates the linear dependencies of the components S, R, L and h, and utilizes these dependencies in the learning of unknown links.
Fig. 3: Illustrations of the factored LRBM in (A) and the non-binary LRBM in (B). In (A), the four-way tensor W is factored into W^S, W^R, W^h and W^L, which connect S, R, h and L through factors. In (B), a Gaussian bias is added to the links L, and it enables modeling weights on links.
C. The Factored Model
In general, the model in Eq.8 can be viewed as a regressive model in which a transformation is built between the latent node representations S and R and the links L. In the transformation, the hidden variable h can take exponentially many configurations, and it encodes exponentially many ways of mapping. Fortunately, we do not have to enumerate the exponentially many settings of h, as explained in Eq.11 and Eq.12. However, we have to estimate Wijkl, whose number of parameters grows quartically when the numbers of node attributes, links and hidden variables are comparable.
Recent studies on tensors and factored conditional restricted Boltzmann machines [17], [19] suggest that multi-way, multiplicative interactions can be captured with far fewer parameters by factorizing the weight tensors. Inspired by these studies, we factorize the quartic tensor W into a product of pairwise interactions as shown in Fig.3.
When we apply the factorization to Eq.8, the first term is factorized as:
$$\sum_{ijkl} W_{ijkl} S_i R_j h_k L_l \rightarrow \sum_f \sum_{ijkl} W^S_{if} W^R_{jf} W^h_{kf} W^L_{lf} S_i R_j h_k L_l. \quad (13)$$
In Eq.13, f is the index over a set of pairwise weight matrices W^S, W^R, W^h and W^L. These four pairwise matrices connect the factors to the senders, receivers, hidden variables and links, respectively. Therefore, the factors in the factorization act as agents of the four components S, R, L and h and handle their interactions with the others. By this factorization, the size of the parameters is reduced from O(n^4) to O(n^2) when the sizes of the four components are comparable.
D. The Non-Binary Model
By Eq.8 and Eq.13, we define a gated, factored and conditional restricted Boltzmann machine on unweighted linked data. However, links between objects are not always unweighted. Learning and predicting link weights are extremely useful in many link-related tasks, such as predicting user-user interaction types in social networks and detecting the influence strengths of users in information diffusion networks. Furthermore, due to the diversity of linked data, a binary distribution may not be sufficient to capture the complex characteristics of nodes and links. Therefore, it is also highly desirable to enable the model to cover more general distributions.
To enable modeling continuous weights on links, we keep the hidden variables binary and re-define the energy function with additive Gaussian noise as:
$$E(L, h; S, R) = -\frac{1}{\gamma} \sum_f \sum_{ijkl} W^S_{if} W^R_{jf} W^h_{kf} W^L_{lf} S_i R_j h_k L_l + \frac{1}{2\gamma^2} \sum_l (L_l - b_l)^2 - \frac{1}{\gamma} \sum_{kl} c_{kl} h_k L_l - \frac{1}{\gamma} \sum_k a_k h_k. \quad (14)$$
In Eq.14, γ is the variance. The difference between the model in Eq.14 and the model in Eq.8 is that, in Eq.14, we change the standard additive biases on the links to additive Gaussian noise. Therefore, in Eq.14, instead of shifting the activity level of each unit of the link matrix, the model centers the activity levels with means and variances, and models the continuous characteristics of weights in weighted linked data.
With this model, in the training process, the probability distribution p(h|S, R, L) stays the same because the additive Gaussian noise cancels out in exponentiating and conditioning [16]. In the testing process, the probability distribution p(L|S, R, h) changes into a Gaussian distribution:
$$p(L_l \mid S, R, h) = \mathcal{N}\left(\gamma \sum_f \sum_{ijk} W^L_{lf} W^S_{if} W^R_{jf} W^h_{kf} S_i R_j h_k + \gamma \sum_k c_{kl} h_k + b_l,\ \gamma^2\right). \quad (15)$$
Notice that in Eq.15, the conditional probability of each Ll is estimated independently of the others. This is consistent with the characteristics of restricted Boltzmann machines. The Gaussian distribution in Eq.15 enables regression of the weights Ll. In cases where the weights are not Gaussian distributed, the model can also be adapted to other exponential-family distributions by modifying the biases.
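A minimal sketch of the factored Gaussian conditional in Eq.15 is given below; it assumes scalar sender/receiver behaviors and hypothetical factor matrices, and is meant only to show how the factorization collapses the four-way sum into per-factor products:

```python
import numpy as np

def link_weight_distribution(S, R, h, WS, WR, Wh, WL, c, b, gamma):
    """Eq.15: mean and variance of the Gaussian over each link weight L_l.

    WS, WR: (n, F); Wh: (K, F); WL: (m, F); c: (K, m); b: (m,).
    """
    # Per-factor aggregates of senders, receivers and hidden units.
    s_f = S @ WS          # (F,)
    r_f = R @ WR          # (F,)
    h_f = h @ Wh          # (F,)
    mean = gamma * WL @ (s_f * r_f * h_f) + gamma * (h @ c) + b
    var = gamma ** 2
    return mean, var

def sample_link_weights(mean, var, rng=np.random.default_rng(0)):
    """Draw reconstructed link weights during CD-K, or use `mean` directly for prediction."""
    return rng.normal(mean, np.sqrt(var))
```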
E. The Solution
In Eq.15, the marginal distribution is a mixture of exponentially many Gaussian distributions, and it is intractable to evaluate. Fortunately, it does not need to be explicitly estimated in either the training or the testing process. Following contrastive divergence as used in [18], [19], we can obtain a good approximation to the gradient of the unknown parameters. The learning rules for W^S, W^R, W^h and W^L take the form:
$$\Delta W_{uf} \propto \langle \alpha_u \beta_{v^1 f} \zeta_{v^2 f} \theta_{v^3 f} \rangle_{data} - \langle \alpha_u \beta_{v^1 f} \zeta_{v^2 f} \theta_{v^3 f} \rangle_{recon}. \quad (16)$$
In Eq.16, u, v^1, v^2, v^3 ∈ {i, j, k, l} are the indices of units. α_u stands for the unit connected to W_{uf}. The other terms β_{v^1 f}, ζ_{v^2 f} and θ_{v^3 f} correspond to the other three units involved in the four-way tensor.
⟨·⟩_data is the expectation w.r.t. the data distribution and ⟨·⟩_recon is the expectation w.r.t. the distribution of the "reconstructed" data. Specifically, to estimate ⟨·⟩_recon, we start a Markov chain at the data distribution and perform K steps of alternating Gibbs sampling (i.e., CD-K), in which we iteratively update the hidden variables h and the link weights L according to Eq.11 and Eq.15, respectively.
As an example, the updating rule for W^S_{if} is:
$$\Delta W^S_{if} \propto \left\langle S_i \sum_j W^R_{jf} R_j \sum_k W^h_{kf} h_k \sum_l W^L_{lf} L_l \right\rangle_{data} - \left\langle S_i \sum_j W^R_{jf} R_j \sum_k W^h_{kf} h_k \sum_l W^L_{lf} L_l \right\rangle_{recon}. \quad (17)$$
Similar to [20], the biases ak, bl and ckl can be updated using simplified versions of Eq.16. For instance, the updating rule for ak is:
$$\Delta a_k \propto \langle h_k \rangle_{data} - \langle h_k \rangle_{recon}. \quad (18)$$
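Putting Eq.16–Eq.18 together, the following sketch shows the shape of one CD-K gradient step for the factor matrix W^S; the statistics follow Eq.17, while the learning rate and function names are illustrative assumptions:

```python
import numpy as np

def ws_statistics(S, R, h, L, WR, Wh, WL):
    """Eq.17 statistic: S_i * (sum_j WR_jf R_j)(sum_k Wh_kf h_k)(sum_l WL_lf L_l) per factor f."""
    context = (R @ WR) * (h @ Wh) * (L @ WL)     # (F,)
    return np.outer(S, context)                  # (n, F), one entry per W^S_{if}

def cd_k_update_WS(S, R, h_data, L_data, h_recon, L_recon, WS, WR, Wh, WL, lr=0.01):
    """One gradient step of Eq.16/17: Delta W^S is proportional to <.>_data - <.>_recon."""
    grad = ws_statistics(S, R, h_data, L_data, WR, Wh, WL) \
         - ws_statistics(S, R, h_recon, L_recon, WR, Wh, WL)
    return WS + lr * grad
```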
By Eq.17 and Eq.18, given R, S, L and the updated h, we can optimize the parameters involved in the model. As discussed for Eq.6, R and S are independently mapped from the node attributes A as Si = Ai W_i^s and Ri = Ai W_i^r. To update the parameters W^r and W^s in this mapping, we simply back-propagate the gradients obtained from CD-K. According to the chain rule, the updating rule for W^s is:
$$\Delta W_i^s \propto \langle A_i^\top C_i \rangle_{data} - \langle A_i^\top C_i \rangle_{recon}, \qquad C_i = \sum_f W^S_{if} \sum_j W^R_{jf} R_j \sum_k W^h_{kf} h_k \sum_l W^L_{lf} L_l. \quad (19)$$
For W^r, the updating rule is very similar to Eq.19, thus we omit the details here.
Fig. 4: A unified view of LRBM in (A) and an illustration of deep LRBM in (B). In (A), LRBM can be viewed as a mapping from the visible node attributes Ai to the hidden node features Si ∪ Ri for node i; W, h and L are the detailed mapping relations. In (B), {Si ∪ Ri} for i = 1, ..., N is viewed as the visible variables of the upper LRBM, enabling deep learning.
F. Parameter Sharing
In Section III-C, we factorized the four-way tensor to reduce the size of the parameters from O(n^4) to O(n^2) when the number of units in each way is comparable to the others. To further improve the efficiency of the factored model and take full advantage of the linked structure, in this part we discuss the effect of tying some sets of parameters together and diffusing the learned parameters across the linked data.
As discussed in Section III-E, W_i^s and W_i^r work as mappers that map the node attributes Ai to the latent sender behavior Si and receiver behavior Ri, respectively. To enforce that the mappings are consistent across all the nodes, we can apply the tying functions W_i^s = W_j^s and W_i^r = W_j^r for two arbitrary nodes i and j. By this tying operation, node attributes are mapped into S and R in the same way for all nodes in a linked data set. The updating rules for W^s and W^r are then modified accordingly. For instance, on W^s, we can apply:
$$\Delta W^s \propto \frac{1}{N} \sum_i \langle A_i^\top C_i \rangle_{data} - \langle A_i^\top C_i \rangle_{recon}. \quad (20)$$
In Eq.20, N is the total number of nodes in the linked data set and Ci is the same term as in Eq.19.
The meanings of S and R are latent characteristics of nodes. In current studies, we usually observe that the characteristic of each node is significantly affected by its neighboring nodes in the linked data. To adapt our model to such observations, we diffuse Si and Ri to the directly connected nodes of node i in each updating iteration as:
$$S_i \rightarrow (1 - \eta) S_i + \frac{\eta}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} S_j. \quad (21)$$
In Eq.21, $\mathcal{N}_i$ is the set of neighbors of node i and $|\mathcal{N}_i|$ is the size of the set. η ∈ [0, 1] is a parameter controlling to what degree each node relies on its neighbors. A similar diffusion is also performed on R. In each alternating update, the diffusion starts from the first node in the shuffled node list and ends when all the nodes have been diffused. This diffusion smoothes the latent node representations Si and Ri of node i with those of its neighbors, enhances the impact of local neighbors, and fits the observation that the connectivity of two nodes is affected by neighboring nodes.
The four-way tensor Wijkl connects each pair of nodes i and j to each observed and possible link Ll. Similar to the diffusion of S and R, we can use fine-tuning to constrain each link Ll to be impacted only by the directly connected nodes and the nearby nodes instead of all the nodes. In detail, we set Wijkl = 0 when the shortest path from node i and node j to link l is larger than a threshold.
Such a constraint gives rise to the significance of local impacts. Since in real networks the degree (in-degree and out-degree) of each node is usually unrelated to the number of nodes in the network and can be viewed as a constant, this fine-tuning operation can reduce the number of combinations of Si, Rj and Ll from O(n^4) to O(n^2).
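A minimal sketch of the diffusion step in Eq.21 is shown below, assuming an adjacency-list representation of the network; the paper diffuses nodes sequentially over a shuffled node list, whereas this sketch applies the update synchronously for brevity, and the same routine applies to R and, as in Eq.22, to the node attributes A:

```python
import numpy as np

def diffuse(S, neighbors, eta=0.3):
    """Eq.21: S_i <- (1 - eta) * S_i + eta * mean of S_j over the neighbors of i."""
    S_new = S.copy()
    for i, nbrs in enumerate(neighbors):
        if nbrs:                                  # leave isolated nodes unchanged
            S_new[i] = (1 - eta) * S[i] + eta * S[nbrs].mean(axis=0)
    return S_new

# Toy usage: 4 nodes with 2-dimensional latent sender behaviors.
S = np.arange(8, dtype=float).reshape(4, 2)
neighbors = [[1, 2], [0], [0, 3], [2]]
print(diffuse(S, neighbors))
```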
G. Time Complexity and Extensibility
Suppose the number of hidden units in h, the number of features in the node attributes A, the number of factors in the tensor factorization, etc. are constants, and the number of links is O(n).
In CD-K, sampling the hidden variables h from the others according to Eq.11 takes O(n^2) time. Sampling the links L from the others according to Eq.15 also takes O(n^2), given that the sizes of the hidden variables h and of the factors are tunable and can be viewed as constants. Therefore, CD-K takes O(n^2) time. After CD-K, the updates of W^S, W^R, W^h and W^L take at most O(n^2). Besides, the complexity of updating W^s and W^r is O(n^2). Overall, the time complexity is O(n^2) when the number of links is linearly proportional to the number of nodes.
When the connections between objects are much denser than in the above case, the number of links is usually assumed to be smaller than O(n^{3/2}), which means each two nodes are either directly connected or have one intermediate node on average. In this case, the time complexity is O(n^3).
In some applications, we also need to consider the link prediction problem when insufficient knowledge of node attributes or links is provided, such as friend recommendation for new users of a social network. In such cases, knowledge diffusion can be applied to the "unclear" nodes according to a rule similar to Eq.21. In detail, the attributes of the unclear nodes are enhanced using diffused attributes from their neighbors as:
$$A_i \rightarrow (1 - \eta) A_i + \frac{\eta}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} A_j. \quad (22)$$
The enhanced node attributes Ai are then fed into the trained LRBM model for link prediction.
Similar to other RBM-related models, since each unit in LRBM is sampled and updated independently of the other units, LRBM can be parallelized in the same way as [21]. Due to the space constraint of the paper, we skip the details of parallel implementations of LRBM.
A significant strength of RBM is that it can be "stacked" to support deep learning on visible inputs. We argue that the proposed LRBM can also be extended to support deep learning on linked data, as illustrated in Fig.4. In this framework, we view LRBM as a traditional RBM that maps the visible node attributes A to the hidden node representations {Si ∪ Ri} for i = 1, ..., n. The other involved parameters W and h, and the input link structure L, are treated conceptually as the mapping functions that map Ai to Si ∪ Ri. Therefore, the extracted latent node representation Si ∪ Ri can be used as the visible units to train an LRBM stacked on top, and higher LRBMs can be generated in a similar way.
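To illustrate the stacking idea, the sketch below treats the learned per-node representation Si ∪ Ri as the visible layer of an upper-level RBM and computes its hidden activations as "deep" node features; the upper weights are random placeholders here, whereas in practice they would themselves be trained, e.g., with CD-K:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upper_level_features(S, R, W_upper, b_upper):
    """Treat {S_i union R_i} as the visible layer of an upper RBM; return its hidden activations."""
    V = np.hstack([S, R])                  # (n_nodes, d_S + d_R)
    return sigmoid(V @ W_upper + b_upper)  # (n_nodes, n_hidden_upper): "deep" node features

# Toy usage: 5 nodes, 3-dim S and R, 4 upper hidden units (all values hypothetical).
rng = np.random.default_rng(2)
S, R = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
W_upper, b_upper = rng.normal(size=(6, 4)), np.zeros(4)
print(upper_level_features(S, R, W_upper, b_upper).shape)
```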
Moreover, the proposed LRBM model can also be applied to pure networks which have no node attributes. In detail, pseudo node attributes can be generated by many off-the-shelf graphical statistics. Such pseudo node attributes work as the starting point in training, and the trained model can be viewed as a trade-off between the subjectively selected graphical statistics and the objective link structure.
IV. TWO APPLICATIONS OF LRBM
In this section, we discuss how to apply the proposed
LRBM model to discover knowledge on linked data. Specifically, we present the details of two applications, which are
link prediction and node classification.
A. Link Prediction
Given a snapshot of a linked data set, which contains the attributes of nodes and the links between nodes, can we infer noisy links at the current state and new interactions in the near future? This problem is defined in [22] and is generally recognized as the Link Prediction Problem.
Models for the link prediction problem are very meaningful in many areas. For instance, they can be applied for friend recommendation on social networks, and can also be used to filter noisy links on protein-protein interaction networks.
In general, the proposed LRBM model focuses on characterizing the interactions among the node attributes A, the latent factors h, S and R, and the links L. Therefore, to predict the link between two nodes i and j, if we have known links and attributes of the two nodes, we can use the LRBM model to learn how these two nodes interact with the other nodes, and to learn the probability distribution of their link according to Eq.15.
B. Node Classification
Different from the link prediction problem, which focuses on interactions between nodes, the Node Classification Problem focuses on how each node relates to a set of pre-defined classes. For instance, [23] discusses how to classify users in a social network w.r.t. whether a user is going to buy a product, and [24] explores how to classify users in social networks w.r.t. their social media engagement levels and political interest degrees. In general, given a set of node-class pairs, models for the node classification problem aim to divide nodes into the pre-defined classes.
In the proposed LRBM model, given the node attributes A and the links L, we can extract the sender behavior S and the receiver behavior R of each node. R and S, as the latent node representations, encode information from both the node attributes A and the link structure L, and characterize how each node interacts with the others.
To apply the LRBM model, for node i, we intend to learn a mapping from the trained Ri and Si to the class label Yi. Similar to Eq.8, we add a hidden variable h^Y and biases to control the patterns of the mapping. The energy function for the node classification problem is:
$$E(Y, h^Y; S, R) = -\sum_{ijk} W^Y_{ijk} S_i R_i h^Y_j Y_k - \sum_j a_j h^Y_j - \sum_k b_k Y_k - \sum_{jk} c_{jk} h^Y_j Y_k. \quad (23)$$
In Eq.23, the first term defines how the node class Y relates to the latent node representations S and R, and to the hidden variables h^Y. The other terms are biases and gated biases of Y and h^Y. This node classification model can be trained in a factored style similar to Eq.13, and it can also be adapted to continuous class labels in a way similar to Eq.14. Besides, when classifying new nodes, the same technique as in Eq.22 can be applied.
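As a sketch of how Eq.23 can be used for prediction, the conditional activation of each class unit can be computed analogously to Eq.12; the code below assumes a single node's latent vectors and hypothetical tensor shapes, and one may take the argmax (or a softmax) over the class scores:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def class_unit_activation(s, r, hY, WY, c, b):
    """Conditional activation of each class unit Y_k under Eq.23, analogous to Eq.12:
    sigmoid(sum_ij WY_ijk * s_i * r_i * h_j + sum_j c_jk * h_j + b_k)."""
    scores = np.einsum('ijk,i,j->k', WY, s * r, hY) + hY @ c + b
    return sigmoid(scores)

# Toy usage: a node with 3-dim latent behaviors, 4 hidden units, 2 classes (all hypothetical).
rng = np.random.default_rng(3)
s, r = rng.normal(size=3), rng.normal(size=3)
hY = rng.integers(0, 2, 4).astype(float)
WY, c, b = rng.normal(size=(3, 4, 2)), rng.normal(size=(4, 2)), np.zeros(2)
probs = class_unit_activation(s, r, hY, WY, c, b)
print(probs.argmax())   # predicted class index
```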
V. EXPERIMENTS
In this part, we experimentally evaluate the proposed
LRBM model on link prediction and node classification. The
MATLAB code of our program is publicly available1 .
Datasets
1 https://www.dropbox.com/sh/p9x7cs5t2wyjd4l/AABwJmJafjGq7XnqWuFHZ2Zda
TABLE I: Performance on Link Prediction with RMSE (×10²) and AUC (AUC shown in parentheses).
        SocialE         Robot           Exposure
CN      3.35 (0.530)    4.85 (0.795)    14.53 (0.966)
AA      5.71 (0.718)    5.04 (0.744)    14.74 (0.994)
DLP     5.55 (0.803)    7.31 (0.895)    27.11 (0.715)
Fact    3.17 (0.515)    4.89 (0.788)    15.69 (0.940)
LRBM    0.94 (0.876)    3.57 (0.998)    12.20 (0.973)
We use the following three datasets in the experiments.
The first linked data set, SocialE [25], is a social network of residents in an undergraduate dormitory. This data set was collected from October 2008 to May 2009. It covers information about residents, such as diet, exercise, obesity, eating habits, political opinions, etc., and it also contains interactions between residents, including phone calls, music sharing, friendships, etc. The data set is available upon request2.
The second data set used in the experiments is the Robot, which is a social network of users of the website Robot.Net3. This data set has been crawled daily from the website since 2008. In the data set, each user is labeled by the others as Observer, Apprentice, Journeyer or Master w.r.t. the performance of the user in the community. The data set is publicly available4.
The third data set investigated in the experiments is the
Exposure, which is a gene-gene interaction network. This data
set is obtained from a study on cigarette smoke inhalation [26].
In the study, rats were exposed to cigarette smoke for eight months. Gene expression data were collected in the process, and molecular changes were found that identify the biological and pathological consequences of cigarette smoke inhalation. In our experiments, we use the genes whose p-values (via t-test) are smaller than 0.05.
A. Link Prediction
Metrics and Baselines
In the task of link prediction, we evaluate the performance
of each method on two aspects, which are link existence
prediction and link weight prediction. The former focuses
on predicting if a link exists or not, and the latter concerns
predicting the weight of each link. In the evaluation, we
use Area Under the Curve (AUC) scores on link existence
prediction, and use Root Mean Square Error (RMSE) on link
weight prediction. For AUC scores, the range is [0, 1], and
the larger the AUC score is, the better the performance is.
For RMSE, the smaller the RMSE is, the more accurate a
prediction is. Moreover, to evaluate the stability of each model,
we run each experiment multiple times and calculate standard
error (SE) on the multiple results of the same experiment. The
smaller SE is, the more stable the model is. In the experiments,
we compare the proposed LRBM model with four baselines,
which are: Common Neighbors (CN) [27], Adamic-Adar (AA)
[28], Deep Learning for Link Prediction in Social Service
(DLP) [29], and Link Prediction via Factorization (Fact) [7].
Performance Study
2 http://realitycommons.media.mit.edu/socialevolution.html
3 http://robots.net/
4 http://www.trustlet.org/datasets/robots_net/
Fig. 5: Link Prediction on Robot. (a) Experiments on Increasing Flip Rates: AUC and RMSE·10 over flip rates from 0 to 50. (b) Experiments on Increasing Number of Hidden Units: scores over 10 to 640 units in the hidden layer.
To evaluate the model, we perform link prediction on 'noisy' linked data and evaluate how well the predictions fit the 'clean' linked data. In detail, we 'flip' a set of links in the network structure and train on the distorted linked data. In the flipping process, a selected link is removed if it exists in the original network structure, or added with weight 1 if it does not exist. The performance is then evaluated w.r.t. how well a predicted link structure matches the original (unflipped) network. On each data set, the number of flipped links is the same as the number of existing links. We repeat the process 20 times and show the average performance in Table I. In the table, AUC scores are presented in parentheses and RMSE scores are presented outside of the parentheses.
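For reproducibility, the sketch below shows our interpretation of the flipping procedure described above (the exact selection scheme used in the paper may differ); names are hypothetical:

```python
import numpy as np

def flip_links(L, flip_rate=1.0, rng=np.random.default_rng(0)):
    """Distort a binary link matrix: a flipped link is removed if it exists, added (weight 1) otherwise.

    The number of flipped entries is flip_rate times the number of existing links.
    """
    L_noisy = L.copy()
    n_existing = int(L.sum())
    n_flips = int(flip_rate * n_existing)
    idx = rng.choice(L.size, size=n_flips, replace=False)
    rows, cols = np.unravel_index(idx, L.shape)
    L_noisy[rows, cols] = 1 - L_noisy[rows, cols]
    return L_noisy

# Train on flip_links(L), then score the predicted links against the original (unflipped) L
# with AUC (existence) and RMSE (weights), as reported in Table I.
```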
On link existence prediction, compared to the four baselines, LRBM performs the best on SocialE and Robot, and performs slightly worse than AA on Exposure. Among the three linked data sets, SocialE and Robot are very sparse, with average degrees 8.4 and 3.95, respectively. Exposure is much denser than the other two sets, with average degree 103.5. The baselines CN, AA and Fact all tend to perform better on denser linked data and to perform significantly worse on the two sparse sets. Different from these baselines, LRBM performs comparably well on Robot and Exposure, regardless of whether the linked data is sparse or not. Overall, on average, LRBM outperforms the four baselines by 15.9% to 26.9%.
On link weight prediction, among all five investigated approaches, LRBM performs significantly better than the rest on all the data sets. On average, LRBM outperforms the four baselines on the task of link weight prediction by 26.5% to 58.2%.
Flip Rate Test
Besides the above performance study, we also run experiments on each data set using a varying number of flipped links. We denote the ratio of the number of flipped links to the number of existing links in a data set as the Flip Rate. In this section, we vary the flip rate from 0.1 to 50 on Robot, and present the results in Fig.5. In the figure, the SE is plotted as a bar on each point. To better present the SE, we scale the magnitude of these SE scores by 10 times.
As indicated in Fig.5.(a), as the flip rate increases from 0.1 to 50, the performance of each investigated model decreases. Specifically, the performance of Fact decreases the fastest, and drops below 0.5 when the flip rate is larger than 30. For DLP and AA, the AUC scores decrease in a similar trend, and CN decreases the most slowly. For the proposed LRBM model, the performance decreases from 1 to 0.9. LRBM consistently outperforms the other approaches over all the settings. Moreover, as shown in Fig.5.(a), LRBM achieves the lowest SE in all these cases, which indicates that the performance of LRBM is more stable than the baselines.
Fig. 6: ROC Curves of Node Classification. The model and AUC score of each curve are listed in the legends. (a) SocialE: LRBM 0.836, CF 0.651, LAP 0.782, SSC 0.587, ICA 0.618. (b) Robot: LRBM 0.897, CF 0.595, LAP 0.67, SSC 0.745, ICA 0.821.
Parameter Sensitivity Test
In the proposed model, the hidden layer h controls the mapping between the latent node representations and the links. Therefore, the number of hidden units in h is closely related to the learning ability of LRBM. To test the sensitivity of LRBM to h, we vary the number of units in h from 10 to 640 and present the results in Fig.5.(b). Over the whole test range, LRBM performs very stably. On link existence prediction, the AUC scores stay nearly the same. On link weight prediction, the RMSE scores decrease slightly when the number of units is larger than 40 and smaller than 320. Theoretically, if the number of units is too small, h cannot encode sufficiently many mappings, and if the number of units is too large, h tends to overfit in training. In our settings, based on the results in Fig.5.(b), we set the number of units in h to 100.
Another parameter that is closely related to the performance of LRBM is the number of features in S and R. Since S ∪ R is the target latent feature representation of objects, the number of features in S and R is closely related to the expressive power of S, R and the whole LRBM model. To test the sensitivity to this parameter, we also vary it from 2 to 128 and obtain a set of results similar to Fig.5.(b). In our settings, we set the number of features to 10.
B. Node Classification
Baselines and Metrics
In the task of node classification, we evaluate the performance on whether a model can precisely predict the class label
of each node. In the evaluation, we use Receiver Operating
Characteristic (ROC) to demonstrate and use AUC scores to
quantify the performance of each model. In the experiments,
we compare the proposed LRBM model with four baselines,
which are: Coordinates Factorization (CF) [24], Learning, Analyzing and Prediction Model (LAP) [24], Supervised Sparse
Coding (SSC) [30] and Iterative Class Absorption (ICA) [31].
Performance Study
In the task of node classification, we run experiments on the data sets Robot and SocialE. Since Exposure does not contain labels on node classes, we do not investigate it in this experiment. On Robot, we study whether an object is a Master or not. On SocialE, we utilize the survey on residents' political interests, and seek to predict whether a resident is Enthusiastic about politics or not. On both of these two data sets, we use 6 labeled nodes (3 in each class) in the training process, and use the remaining nodes in testing. The results are presented in Fig.6. On both data sets, LRBM performs the best among the five investigated approaches. CF does not perform well on either data set since it requires sufficient information. SSC and ICA perform worse on SocialE while achieving relatively better performance on Robot. Such inconsistent performance is partially caused by noisy links in SocialE. Overall, the proposed LRBM model achieves 6.91% to 51.01% better performance than the four baselines on the task of node classification.
VI. RELATED WORK
In Section I, we explained that LRBM significantly differs
from the existing studies of linked data. Besides them, there are
also several other approaches considering non-linear characteristics of nodes and links, such as exponential family models
[14] and probabilistic matrix factorization models [32], [33]. In
such models, node attributes and network structures are deterministically mapped to latent feature representations of nodes.
Different from these approaches, the proposed LRBM model
adopts hidden variable h to control the mapping relations, and
in the training process, LRBM optimizes over all the possible
mapping relations (as in Eq.15) to achieve the most likely
mapping relation w.r.t. observable data. Moreover, LRBM is
also extensible to support deep learning on linked data, which
enables the extraction of deep and high-order characteristics
of nodes and interaction patterns.
Although there are several studies using RBM on networks, the proposed LRBM model distinguishes itself from the existing work in both the method and the target task. In [34], RBM as well as cRBM have been applied to the task of learning drug-target relations on multidimensional networks. In [29], RBM has been used for more general link prediction tasks in social networks. In [21], an advanced model named ctRBM has been proposed for link prediction on dynamic networks. In all of these studies, the weights of the links from a node to the other nodes are viewed as the visible features of the node. In detail, if a network has n nodes, each node has a feature vector of size n, and each feature is the weight of the link from this node to the corresponding node. Therefore, the number of visible units is the same as the number of nodes in the network. When networks are sparse, in these models, the latent features of each node (i.e., the hidden variables of each input) are dominated by the 0s which stand for no connection to the corresponding nodes, making the latent features indistinguishable from node to node. Different from these approaches, the proposed LRBM does not treat the connections of node i as features. Instead, LRBM treats the connection Lij as the interaction output of node i and node j. Therefore, the results of LRBM are not dominated by 0s. Besides, LRBM integrates not only networks but also node attributes, which enables LRBM to learn on linked data.
Another advanced RBM that considers the interaction between input units is the Semi-restricted Boltzmann Machine (SRBM) [35]. Different from the proposed LRBM, which seeks to learn the interactions of nodes in linked data, SRBM focuses on using the interactions between input units to capture neighboring pixels of an image. Moreover, in SRBM, interactions between input units are treated separately from the interactions between input units and hidden variables. Instead, in the proposed LRBM, we use a four-way tensor to subsume both types of interactions. Overall, compared to SRBM, our proposed LRBM model is significantly different in both the model and the target tasks.
VII. ACKNOWLEDGEMENT
This work is partially supported by the National Science Foundation under Grants No. 1218393 and No. 1016929.
VIII. CONCLUSIONS
Based on graph factorization models and conditional RBM models, we proposed an RBM model for linked data. At the heart of the proposed LRBM are latent feature representations of objects in linked data. In LRBM, we focused on mapping the latent representations of objects to the corresponding links, and on controlling the mapping with hidden variables and four-way tensors. We then applied tensor factorization, parameter sharing and fine-tuning to reduce the number of parameters involved in LRBM. Besides, a Gaussian bias has been applied to the links to model link weights. The continuous LRBM model scales well w.r.t. the number of links in the linked data and can be stacked for deep learning. Finally, the details of how to apply LRBM to link prediction and node classification on linked data were presented. Experiments on real datasets validated that LRBM outperforms the investigated baselines by 15.9% to 58.2% on link prediction and by 6.91% to 51.01% on node classification. Future work will focus on extending LRBM to mining dynamic and time-evolving linked data.
REFERENCES
[1] B. Taskar, M.-F. Wong, P. Abbeel, and D. Koller, "Link prediction in relational data," Proc. of Neural Information Processing Systems (NIPS'03), 2003.
[2] B. Chen, Y. Ding, and D. J. Wild, "Assessing drug target association using semantic linked data," PLOS Computational Biology, 2012.
[3] J. Tang and H. Liu, "Unsupervised feature selection for linked social media data," Proc. of ACM Conference on Knowledge Discovery and Data Mining (KDD'12), 2012.
[4] K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li, "RolX: Structural role extraction and mining in large graphs," Proc. of ACM Conference on Knowledge Discovery and Data Mining (KDD'12), 2012.
[5] K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos, "It's who you know: Graph mining using recursive structural features," Proc. of ACM Conference on Knowledge Discovery and Data Mining (KDD'11), 2011.
[6] M. McCord and M. Chuah, "Spam detection on Twitter using traditional classifiers," Autonomic and Trusted Computing, 2011.
[7] A. K. Menon and C. Elkan, "Link prediction via matrix factorization," Proc. of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD'11), 2011.
[8] L. Tang and H. Liu, "Relational learning via latent social dimensions," Proc. of ACM Conference on Knowledge Discovery and Data Mining (KDD'09), 2009.
[9] G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, 2006.
[10] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," Proc. of International Conference on Machine Learning (ICML'08), 2008.
[11] P. D. Hoff, A. E. Raftery, and M. S. Handcock, "Latent space approaches to social network analysis," Journal of the American Statistical Association, 2002.
[12] P. D. Hoff, "Modeling homophily and stochastic equivalence in symmetric relational data," Proc. of Neural Information Processing Systems (NIPS'07), 2007.
[13] K. T. Miller and T. L. Griffiths, "Nonparametric latent feature models for link prediction," Proc. of Neural Information Processing Systems (NIPS'09), 2009.
[14] J. Zhu, "Max-margin nonparametric latent feature models for link prediction," Proc. of International Conference on Machine Learning (ICML'12), 2012.
[15] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, 2002.
[16] R. Memisevic and G. Hinton, "Unsupervised learning of image transformations," Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007.
[17] M. A. Ranzato and G. Hinton, "Factored 3-way restricted Boltzmann machines for modeling natural images," Artificial Intelligence, 2010.
[18] G. W. Taylor, G. E. Hinton, and S. Roweis, "Modeling human motion using binary latent variables," Proc. of Neural Information Processing Systems (NIPS'06), 2006.
[19] G. W. Taylor and G. E. Hinton, "Factored conditional restricted Boltzmann machines for modeling motion style," Proc. of International Conference on Machine Learning (ICML'09), 2009.
[20] K. Cho, A. Ilin, and T. Raiko, "Improved learning of Gaussian-Bernoulli restricted Boltzmann machines," Proc. of International Joint Conference on Neural Networks (IJCNN'11), 2011.
[21] X. Li, N. Du, H. Li, K. Li, J. Gao, and A. Zhang, "A deep learning approach to link prediction in dynamic networks," Proc. of SIAM International Conference on Data Mining (SDM'14), 2014.
[22] D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks," Proc. of ACM Conference on Information and Knowledge Management (CIKM'03), 2003.
[23] M. Li, Z. Jiang, B. Luo, J. Tang, Q. Gu, and D. Chen, "Product and user dependent social network models for recommender systems," Advances in Knowledge Discovery and Data Mining, 2013.
[24] K. Li, N. Du, S. Guo, J. Gao, and A. Zhang, "Learning, analyzing and predicting object roles on dynamic networks," Proc. of IEEE International Conference on Data Mining (ICDM'13), 2013.
[25] A. Madan, M. Cebrian, S. Moturu, K. Farrahi, and A. Pentland, "Sensing the 'health state' of a community," Pervasive Computing, 2012.
[26] C. Stevenson, C. Docx, R. Webster, C. Battram, et al., "Comprehensive gene expression profiling of rat lung reveals distinct acute and chronic responses to cigarette smoke inhalation," AJP-Lung, 2007.
[27] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," JASIST, 2007.
[28] L. Adamic and E. Adar, "Friends and neighbors on the web," Social Networks, 2003.
[29] F. Liu, B. Liu, C. Sun, M. Liu, and X. Wang, "Deep learning approaches for link prediction in social network services," Neural Information Processing, 2013.
[30] J. Yang, K. Yu, and T. Huang, "Supervised translation-invariant sparse coding," Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10), 2010.
[31] S. Bhagat, G. Cormode, and S. Muthukrishnan, "Node classification in social networks," Social Network Data Analytics, 2011.
[32] M. Jamali and M. Ester, "A matrix factorization technique with trust propagation for recommendation in social networks," Proc. of ACM Conference on Recommender Systems (RecSys'10), 2010.
[33] H. Ma, H. Yang, M. R. Lyu, and I. King, "SoRec: Social recommendation using probabilistic matrix factorization," Proc. of ACM Conference on Information and Knowledge Management (CIKM'08), 2008.
[34] Y. Wang and J. Zeng, "Predicting drug-target interactions using restricted Boltzmann machines," Bioinformatics, 2013.
[35] S. Osindero and G. Hinton, "Modeling image patches with a directed hierarchy of Markov random fields," Proc. of Neural Information Processing Systems (NIPS'08), 2008.