Probabilistic Diffusion of Social Influence with Incentives

advertisement
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
1
Probabilistic Diffusion of Social Influence with Incentives
Myungcheol Doo, Ling Liu Member, IEEE,
Abstract—With explosive growth of social media, social computing becomes a new IT feature. A core functionality of social computing is
social network analysis, which studies dynamics of social connectivity among people, including how people influence one another and how
fast information diffuses in a social network and what factors stimulate influence diffusion. One of the models for information diffusion is the
heat diffusion model. Although it is simple in capturing the basic principle of social influence, there are several limitations. First, the uniform
heat diffusion is no longer hold in social networks. Second, high degree nodes are most influential in all contexts is not realistic. In this paper
we propose a probabilistic approach of social influence diffusion model with incentives. Our approach has three features. First we define an
influence diffusion probability for each node instead of uniform probability. Second, we categorize nodes into two classes: active and inactive.
Active nodes have chances to influence inactive nodes but not vice versa. Third, we utilize a system defined diffusion threshold to control how
influence is propagated. We study how incentives can be utilized to boost the influence diffusion. Our experiments show the reward-powered
model is more effective in influence diffusion.
Index Terms—Social Influence, Diffusion of Information, Rewards and incentives, Heat Diffusion Model, Probabilistic Model
F
1
S
I NTRODUCTION
network analysis research can be broadly classified into two categories: (i) applying and extending
existing graph mining and machine learning algorithms
to social network datasets to predict and mine features
of statistical significance and (ii) developing innovative
social computing-specific algorithms that can derive new
insights and new values that traditional general purpose
data mining algorithms may fail to deliver. We argue that
social influence analysis falls into the second category and
it studies the diffusion of influence and how the ideas
and influences are spread and propagated through a social
network. For example, the diffusion of medical innovation
or the sudden spread of viruses and contagious diseases
has been studied in the bioinformatics and health science
domain. The effect of ”word of mouth” has been studied in
business marketing, and the pollution propagation in the
water networks has been studied in technological settings.
Heat diffusion model [6], [25], [26], [37] is used to model
and analyze social influence. Although the heat diffusion
kernel is simple and straightforward in capturing the basic
principle of social influence among a social network of
people, there are several serious limitations of using heat
diffusion to model social influence among people.
The first limitation of using heat diffusion kernel for
modeling social influence is the uniform influence distribution assumption [26]. Every node receives equal amount
of heat (influence) from its neighbor nodes having higher
heat (influence) values, and then it propagates its heat
uniformly to each of its neighbor nodes having lower heat
values. The amount of heat distributed to its neighbors
is computed by the out-degree of the node. For example,
node v has 5 neighbors and the current heat value is hv ,
then v distributes h5v to each adjacent node. However, all
friends of v are not equal. Some are best friends while
others may be acquaintances. Edges between two nodes
should be weighted by some measurable factors and the
weights should be applied to the influence(heat) diffusion
computation.
The second limitation is the assumption that high degree
OCIAL
nodes are always more active in heat diffusion process.
In the context of social influence, it is recognized that
high degree nodes are not always active in posting articles
or interacting with neighbors. Some people report that
they cannot reject friend requests from their clients [34] or
they accept friend requests from even vaguely recognized
people [8] to increase their popularity.
Third but not the least, we argue that the heat diffusion based social influence model is not suitable to study
whether and how incentives may stimulate the rate and
the coverage of influence diffusion in a social network. All
heat diffusion models used in [6], [25], [26], [37] consider
only number of edges to compute influence. However, in a
viral marketing, the number of edges is not the only factor
to consider. People promote new products to their friends
due to either explicit incentives such as financial incentives
or simply a desire to share benefits of products with friends.
Good examples are ICQ and PayPal’s marketing strategy.
ICQ gives a user an option to invite user’s friends while
PayPal gives monetary incentives for viral marketing and
both strategies have worked well [9].
Bearing these issues in mind, in this paper we present a
probabilistic approach to model social influence diffusion
with multi-scale rewards as incentives. Our probabilistic
diffusion approach has three unique features. First, we
distinguish active nodes from inactive nodes. Active nodes
represent people who adopted new products while inactive
nodes represent people who did not adopt yet. Active
nodes have chances to influence inactive neighbor nodes
but not vice versa. If active nodes succeed in activating inactive nodes, then newly activated nodes have also chances
to activate their inactive nodes, which is the same as
viral marketing. Second, we compute an influence diffusion
probability for each pair of nodes, which can differentiate
nodes that have higher level of interactions and higher
node degree from those nodes that have low or zero
interactive activities. Third but not the least, we utilizes
a system defined diffusion threshold, combined with the
pair-wise diffusion probability, to control and manage how
influence is propagated across the social network of n
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
2
I NFLUENCE D IFFUSION : A N OVERVIEW
Given that our probabilistic social influence model is an enhancement of both heat diffusion model [26] and stochastic
influence models [16], in this section we briefly describe
the basics of both heat diffusion model and stochastic
influence model, and then outline the design principle
of our approach. We will present our probabilistic social
influence model (PSI) and our algorithm for computing
influence rank in Section 3 and Section 4 respectively.
2.1
Heat Diffusion Kernel
Given a social graph G = (V, E) with V ={v1 , v2 , . . . , vn },
the heat diffusion model diffuses heat by following four
basic rules:
i Heat transfers from a source node to its connected
neighbor nodes and the amount of v’s heat at time t
is Hv (t);
ii At time t, an amount of heat diffused from v to its
neighbors during ∆t is DHv (∆t);
iii At time t, an amount of heat v received from its
neighbors during ∆t is RHv (∆t); and
iv The heat difference of node v during ∆t (from t to t+1)
is Hv (t) − Hv (t + 1) = RHv (∆t) − DHv (∆t).
Thus, heat diffusion equation at time t for n nodes is
computed as follows [26]:
H(t) =eαtK H(0)
=(I + αtK +
=I +
α2 t2 2 α3 t3 3
K +
K + . . . )H(0)
2!
3!
∞
X
α n tn n
K H(0)
n!
n=1
3) Compute H(t) using Eq. 1
Figure 1(a) shows a network of 10 nodes. Each edge
has a weight computed by d1v . v9 has two out edges, thus
E(v9 , v10 ) has weight 0.5. v10 has only one edge, E(v10 , v9 ),
with weight 1. Figure 1(b) shows the amount of heat after t
period of diffusion from v5 for this example social network.
We observe that for nodes v4 and v6 that are one-hop
away from the heat source (black lines), their heat values
increases in the beginning at the same rate because two
nodes have the same edge weight to v5 . In Figure 1(a) v4
and v6 receive the same heat from v5 and thus v6 , a line with
circle symbol, and v4 , a line with a plus symbol, have the
same diffusion pattern over the duration. Also heat values
of two nodes start decreasing at t = 5 differently because
they have different number of 2-hop away nodes. For nodes
v1 , v2 , and v7 , which are 4-hop away from the heat source
(blue lines), their heat values continue to increase until
t = 10. For nodes that are 2-hop away (red lines) or 3-hop
away (v10 ) from the heat source, their heat values increase
slowly and gradually until t = 10 due to the topology. We
call this heat diffusion model a topology-based approach
because its kernel only concerns the node degree and the
topological connectivity.
0.5 v10 0.5 v9 1 0.5 0.17 0.5 v8 0.17 0.25 4 1.0 v3 0.25 0.5 v2 0.5 4
0.17 0.25 v1 v1
v2
v3
v4
v5
v6
v7
v8
v9
v10
4.5
0.5 0.5 0.25 0.25 0.17 0.25 0.17 v
0.17 5
v7 3.5
v6 0.25 Heat
nodes. Based on this probabilistic diffusion model, we
formally study how incentives can be utilized as stimuli to
further boost the influence diffusion rate and coverage. For
each node in a social network, we compute its probabilitybased social influence ranking score, which is measured
by the approximate influence coverage of this node. Our
experiments show that the reward-powered social influence
model is more effective in terms of both diffusion rate and
diffusion coverage of influence.
2
3
2.5
2
1.5
0.5 1
0.5
0.25 0.5 v5 (a) Constructed Graph
0
0
2
4
6
8
10
Time
(b) Heat
Fig. 1. Heat diffusion example
(1)
where H(0) is a n by 1 column matrix that has heat values
of v ∈ V at time 0 and α is a heat conductivity, and K is n
by n matrix defined in Eq. 2.

1


(vj , vi ) ∈ E

dj
K(i, j) =
(2)
−1
i = j and di > 0


 0
otherwise
where di denotes the out-degree of vertex vi .
In this heat diffusion model, given an active heat source
node vi , we compute H(t) and see how many nodes have
heat value larger than a system defined value θ. If heat
value of inactive node vj is greater than θ, then we consider
vj is activated by vi . H(t) is computed in three steps:
1) Set every node inactive by H(0) = 0;
2) Select a node vi that is not visited and mark as visited
then set (i, 1)th element of H(0) as amount of heat of
vi ;
2.2 Stochastic Influence Diffusion Models
In the heat diffusion model, once a heat source is selected
and values of parameters are set, the result is the same
for every experiment. In contrast, the stochastic diffusion
process involves some non-determinacy. Familiar examples
of processes modeled as stochastic time series include stock
market and exchange rate fluctuations, speech and video
signals. The two basic stochastic models used to gauge social influence are Independent Cascading model (ICM) and
Linear Threshold model (LTM). Both models use a directed
graph G = (V, E), where V is a set of vertices representing
users and E is a set of edge expressing friendship relationship. A node u is called active if u adopts information from
her friends, otherwise it is inactive. Initially all nodes are
inactive.
Independent Cascading Model (ICM). ICM [16], [17],
[21] takes advantages of user interaction. Each relationship
represented by E(u, v) has a probability for u to activate
v, denoted by pu,v , which is assigned randomly by the
system. When a node u first becomes active at time t, it is
given a single chance to activate its inactive neighbor v with
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
probability pu,v at time t+1. For example, a coin with biased
probability pu,v that is likely to turn up heads is tossed. If
the output is head, then we consider u is activated. If v is
activated by u at time t + 1, then v also has one chance to
activate its inactive neighbors at time t + 2. This iterative
process is started with an initial set of active nodes and
stops when there is no more activation possible.
We argue that using random probability is unrealistic to
study social influence in real world social networks. One
important challenge is how to design a stochastic influence
diffusion model that is based on meaningful attributes and
relationships of social network nodes, such as different
types and volumes of social interactions, which are critical
to the stochastic influence diffusion process, to define the
computation of the probability pu,v .
Linear Threshold Model (LTM). LTM [7], [18], [27],
[33], [35] considers that each node’s tendency to be active
increases monotonically as more of its neighbors become
active. For example, the more friends of Alice buy new
iPhones, the higher desire Alice will have for buying an
iPhone. Thus, at some point, v’s active neighbors may reach
to a level of influence that can trigger an inactive neighbor,
v, to be active. This is called node-specific threshold, θv .
In all existing LTMs, this threshold is typically assigned
randomly. A node v is influenced by
X each neighbor u
according to wu,v such that θv ≤
wu,v ≤ 1, and
v∈E(u,v)
0 ≤ wu,v ≤ 1. At time 0, only one node u is active and the
rest are inactive. A neighbor node v is activated by u if the
sum
Xof the weights of all u’s is greater than or equal to θv ,
wu,v ≥ θv . Thus, θv is a system-supplied threshold
v∈E(u,v)
parameter in LTM. If θv is small, then the tendency of
activation is high. For example, an iPhone5S should have
a lower θv than the previous generation of iPhone.
Although LTM captures the tendency of information
propagation, it fails to consider the similarity between a
pair of nodes. Furthermore, the use of randomly generated
threshold to compute the influence diffusion is inadequate.
Instead, we should use the probabilistic influence diffusion
model that can incorporate the dynamics of both node
degree and the amount of recent and past activities as well
as the rate of the information to be diffused across the
network.
Furthermore, none of the existing diffusion models have
studied whether and how incentives may be incorporate
to create stimuli for the diffusion of influence to a broader
coverage of the network at a faster rate.
Bearing the above problems in mind, we develop a
probabilistic influence diffusion approach, powered with
rewards as incentives.
3
P ROBABILISTIC S OCIAL I NFLUENCE M ODEL
We present the basic design of our Probabilistic Social
Influence model (PSI), which is designed by combining the
best of both ICM and LTM while removing the limitation
of each. Comparing with the heat-diffusion model, our PSI
model has introduced probability instead of deterministic
management of influence diffusion over the given topology
of social network.
3
First, we argue that the edge weight between a pair of
nodes u and v should be probabilistic in nature but the
probability should not be randomly assigned. Instead, the
probability of the edge weight should reflect the dynamic
interactions between u and v such that the more interactive
activities from u to v should result in the higher weight on
the edge from u to v and thus the higher probability of u
diffusing its influence to v.
Formally, given a social network graph G = (V, E) where
V is a set of nodes and E denotes a set of directed edges,
each node u ∈ V has node attributes, such as the number of
non-interactive activities, N A(u). Nodes are either inactive
or active. Initially all nodes are inactive. If a node u has
performed some interactive activities with v, then an edge
E(u, v) is created from u to v with the edge weight defined
by the number of interactive activities from u to v, IA(u, v).
Based on this collection of information we compute the
probability, w(u, v), for u to activate or influence v. We will
explain how to compute the probability in the next section.
Second, we need to incorporate the probabilistic differentiation factor into our PSI model in order to determine
whether and when to stop propagating information. In the
real word, we witness at least three categories of social
network participants [32]. Some people are really active
in posting reviews about new products or new ideas, and
promote others to adopt them and/or disseminate them to
more people. Some people are only interested in propagating information to their close friends and receive influence
from their close friends. Other people are passive participants in the social network and they may read reviews
only and do not propagate the information to their friends.
Unfortunately, the heat diffusion based influence model
establishes the influence diffusion process by assuming
that (i) every node always propagates information to their
neighbor nodes (e.g., friends) and that (ii) every node
uniformly diffuses its influence to all its neighbor nodes,
which is unrealistic. Thus, in our PSI model, we differentiate different types of participants in a social network
in terms of their influence diffusion adoption style [16],
[17], [32], such as active, friend only, or non-active by
introducing two parameters: θc as the closeness threshold
and A(u) as the influence adopter category. We allow each
node in a social network to set its personalized threshold
θc , which captures the probabilistic characterization of a
node’s interest in influencing its neighbor nodes. We use
A(u) to determine how an active node u is in propagating
information and diffusing influence to its friends. Once u
decides to propagate information based on A(u), we use θc
to determine which friends of u accept the information or
influence. We will describe in detail how to compute and
set these two parameters, θc and A(u), in Section 3.2 and
Section 3.3 respectively.
Third but not the least, we introduce incentives as a
way to stimulate and encourage inactive nodes to become
interested and actively engaged in the diffusion process of
their neighboring nodes. We will present the probabilistic
social influence model with rewards as incentives in Section
3.4.
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
3.1
Activity-based Probability of Influence
When node u becomes active at time t, u has one chance
to activate all of u’s inactive friends, say v, with the
probability w(u, v). The result of activation is either ”active”
or ”inactive”. We can view the outcome of this random
event as being determined by flipping a coin of bias
probability w(u, v): ”head (active)” or ”tail (inactive)”. In
our PSI model, the probability of ”activation” is defined by
w(u, v) and the probability for ”being inactive” is 1−w(u, v).
The computed probability, w(u, v), is assigned to the edge
E(u, v).
It is important to note that the probability w(u, v) should
be computed based on dynamic properties of the social
network graph. Also for different neighbor nodes, their
probability of being activated (influenced) by u should not
follow a uniform distribution, since it is well known that
some friends are more likely to be influenced while others
may be too stubborn to be influenced. Thus w(u, v), the
probability for u to activate its neighbor v, should not be
1
, since the
simply assigned randomly or computed using
du
degree of node u (du ) fails to capture the amount and the
types of activities u has engaged in the social network and
with its neighbors.
In our PSI approach we capture both the amount and
types of activities engaged and the topological connectivity
by defining w(u, v) as the activity-based probability of
influence. Concretely, we categorize all activities of a node
u into two types: non-interactive activities and interactive
activities. We use N A(u) and to denote the number of noninteractive activities performed by u and IA(u, v) to denote
the total number of interactive activities conducted between
u and its neighbor node v where v ∈ V, (u, v) ∈ E. Examples
of non-interactive activities include posting reviews about
the newly purchased camera or posting tips for programming shell scripts. Thus we consider that N A(u) represents
how active u is in performing non-interactive tasks that can
influence others. Examples of interactive activities include
instance chat, co-author papers between two people, commenting on postings of your friends. N A(u) is open to all of
u’s friends and IA(u, v) is exclusively dedicated to a pair of
users, u and v. The topological connectivity can be captured
by aggregating all IA(u, v) for any v ∈ V, (u, v) ∈ E.
Therefore, the activity-based probability of influence from
u to v should be defined by combining these two attributes.
We formulate the probability w(u, v) as follows:
w(u, v) = α
N A(u)
IA(u, v)
+ (1 − α) P
M AX(N A)
s:(u,s)∈E IA(u, s)
(3)
where α ∈ [0, 1] is a damping factor for balancing between
non-interactive activities and interactive activities in computing the influence probability w(u, v), and M AX(N A) =
M AXv∈V (N A(v)).
Note that w(u, v) is high when N A(u) and IA(u, v) are
relatively large. When α is set to a small value, approaching
zero, large N A(u) will no longer imply high probability
w(u, v) if IA(u, v) is relatively small. Similarly, when α is
set to a value approaching one, then large IA(u, v) will no
longer imply high w(u, v) unless u has significantly high
N A(u).
4
Let Xu,v denote the result of activation of v by u.
Then Xu,v can be defined as a binary mode of either
”active” or ”inactive”. Thus, Xu,v can be seen as a discrete
Bernoulli random variable, which outputs a ”head (active)”
with probability w(u, v) or ”tail (inactive)” with probability
1 − w(u, v). We can formally define Xu,v as:
1,
succeed in activating (head)
Xu,v =
0,
fail to activate (tail), u = v, or (u, v) ∈
/E
(4)
and
wu,v ,
x head
PX (x) =
(5)
1 − wu,v ,
x tail
.
3.2 Closeness Threshold, θc
In a real world social network, a node u may influence
different friends differently based on the level of closeness
that u may have with each of its friends. In order to
differentiate friends of u who are very close, who are simply
acquaintances in the past, or who are no longer in close
contact, we introduce a system defined parameter, called
the closeness threshold θc . In our PSI model, each node u
is given a closeness threshold θc (u). Concretely, u have a
chance to activate its friends v with the activation probability w(u, v) if and only if the following condition is satisfied:
w(u, v) > θc (u). This condition implies that u can only
activate or influence v if the activation probability w(u, v) is
above u’s closeness threshold θc (u). If u is actively engaged
in the diffusion of influence across the network, then θc (u)
should be set to a low value. By setting a low θc (u), the
probability that the condition of w(u, v) > θc (u) holds is
high and thus u has many chances to activate neighbors. On
the other hand, if u is only interested in diffusing influence
to its close friend, u can set its closeness threshold θc (u) to
be higher. Thus, the probability of w(u, v) > θc (u) is lower
than previous setting and u has lower chances to activate
neighbors.
For example if we set θc = 0.3 for all nodes, u has two
friends s and t, w(u, s) = 0.25, and w(u, t) = 0.5. Thus,
u can only activate or diffuse its influence to t. This is
because w(u, s) is lower than θc and s is not considered as
a close friend of v at the diffusion time. Thus the influence
diffusion from u to s is not successful by the current
activation probability.
3.3 Adoption Probability Group, A(u) and Pa (u)
In order to incorporate different types of nodes in terms
of their influence diffusion behavior, we classify all social
network nodes according to their capacity with respect to
social influence diffusion, and we refer to it as adoption
intent for simplicity. According to the statistical study
reported in [32], people can be classified into five groups
with respect to their willingness to adopt an innovation
as shown in Table 1: Innovators, Early Adopters, Early
Majority, Late Majority, and Laggards. Innovators are people who are often the first among their friends to adopt
an innovation. They are very social and have interaction
with other innovators. They propagate innovation to Early
Adopters, who are the second group of people who adopt
an innovation faster than the rest though they may not be
the first one. They are more socially forward than Early
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
TABLE 1. Probability to Propagate or Stop
Category
Innovator
Early Adopters
Early Majority
Late Majority
Laggards
Distribution Ratio
2.5%
13.5%
34%
34%
16%
Pa (u)
0.90
0.45
0.23
0.10
0.05
1 − Pa (u)
0.10
0.55
0.77
0.90
0.95
Majority, the third group. People in Early Majority group
tend to be slower in the adoption process and seldom
hold positions of opinion leadership in a system. Thus
they adopt an innovation after a varying length of time.
Also the time of adoption is significantly longer than the
Innovators and Early Adopters. Early Majority propagates
innovation to Late Majority. People in Late Majority are typically skeptical about an innovation and very little opinion
leadership. The last group is Laggards. People in this group
are the last to adopt an innovation. Unlike some of the
previous categories, individuals in this category show little
or no leadership of opinion in influence diffusion. Laggards
typically tend to be in contact with only family and close
friends.
In our PSI model, we adopt above these five categories
of social network nodes to capture the different interests
and willingness of social network nodes with respect to
information propagation and influence diffusion. For example, innovators are trend leaders. They adopt and also
propagate new ideas and innovation to others actively.
Thus, we set high percentage (say 90%) of them to activate
others and only a small percentage (say 10%) of them
may stop propagation. The percentage of stopper increases
as we move to the next category. For the laggards, they
are neither into adopting new things (accepting influence)
nor propagating it (diffusing influence). Thus we can say
that high percentage (e.g. 90%) of them are classified as
stoppers. Table 1 shows the probability of propagating
Pa (u) and the probability to stop diffusion, 1 − Pa (u), in
each category.
Given Pa (u), a node u tosses a coin with the probability
to have a head Pa (u). If u gets a head, u keeps propagating,
otherwise it stops propagation. We represent this random
event as Yu , which is defined formally as follows:
1,
decide to propagate information (head)
Yu =
0,
decide not to propagate information (tail)
(6)
and
Pp (u),
y is head
PY (y) =
(7)
1 − Pp (u),
y is tail
Now we discuss how Pa (u) is used in conjunction with
the closeness threshold θc and the probability of u influencing v, w(u, v).
Without incorporating the categorization of user nodes,
all users are assumed to engage in the influence diffusion.
The probability of u activating one of its inactive neighbors,
say v, is only dependent on w(u, v) and θc . By introducing
Pa (u), we are differentiating nodes that are more actively
engaged in influence diffusion from nodes that are less
interested in propagating influence. Thus, we determine
5
whether a node u will be propagating its influence to its
inactive neighbor nodes in three steps:
1) With the probability Pa (u), node u decides whether to
propagate its influence to its inactive neighbor nodes
or not;
2) Once u is in the state of propagating its influence,
w(u, v) and θc are used to determine which of its inactive neighbor nodes, v, will be selected for activation
process;
3) For these selected candidate nodes v, w(u, v) is used
to determine whether v is activated by u.
Given a node u, we still need to determine which of the
five categories to which u will belong. This will allow us to
obtain the respective Pa (u), the group specific probability
for propagating influence, as shown in Table 1.
In order to categorize nodes, we propose to introduce
the adoption probability group A(u). A naive approach to
compute A(u) is to use the degree of its friends. We argue
that using the degree of friends to compute A(u) is not
sufficient. As studied in [8], [34], some people want to have
many friends just for increasing popularity, and others may
agree friend requests in order not to be impolite. Thus the
degree of friends may not accurately reflect the adopter
probability and thus should be used as one of factors, rather
than the sole factor, of activeness.
In our PSI model, we propose to combine node degree
with the level of activities in both NA and IA categories to
compute A(u), the influence adoption probability group, as
follows:
(1 − β) · du
β · (N A(u) + IA(u))
+
(8)
A(u) =
M AX(N A) + M AX(IA) M AXv∈V (dv )
where N A(u) is the number of non-interactive activities
that u has performed, IA(u) is the number of interactive
activities that u did with her friends, du is the out-degree of
u and M AXv∈V (dv ) is the maximum degree in the graph
G, and β is a balancing weight function, which carefully
combines node degree and activities in computing A(u). By
varying β we can focus more on activities or degree of node
u. When β is 0, A(u) solely depends on the degree of node.
On the other hand, by setting β to 1, A(u) is determined
exclusively by the normalized number of interactive and
non-interactive activities.
A(u) represents the influence adoption probability of
u and thus can be used to categorize node u into the
appropriate influence adoption group as shown in Table
1.
3.4
PSI with Rewards as Incentives
In this section we describe how to incorporate rewards as
incentives to the probabilistic social influence (PSI) model
in terms of reward effects and reward targets.
3.4.1
Reward Effects
In general, companies give out rewards to people in order
to stimulate the product sales through ”word of mouth”
effect [9]. Rewards can be given in many different forms.
One way to model different forms of rewards is to define
rewards in terms of benefits that a user receives and the
efforts that a user would need to make in order to receive
rewards. We propose to formulate reward effects in terms
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
of two factors: Efforts and Benefit. Some rewards require
much more efforts in terms of time or monetary, whereas
other rewards demand low efforts. For example, a credit
card company C1 offers 50,000 points after you spend
$3,000 in first 3 months. Another company C2 offers $10
credit back when you spend $40 at a restaurant. C1 ’s
promotion requires users to spend at least $3,000 while C2 ’s
promotion requires only $40. Thus we use E to represent
a scale of efforts, which ranges between 1 to mE , and mE
is a positive integer defining the upper bound. Similarly,
we use B to denote a scale of benefit, which is ranging
between 1 to mB . C1 ’s promotion offers 50,000 points
in their monetary system, which is worth of $500, thus
the reward effect is 500/3000=1/6, while C2 ’s promotion
gives back $10 credit, thus the reward effect is 10/40=1/4.
Though the two promotions have different raw benefit and
effort scales, we usually use the reward effect to represent
the ratio of benefit over effort.
In PSI, we can formulate reward effects as the following
formula:
R=c
B
E
(9)
, where c is a normalized function and can be used to
make R within the range between 0 and 1. For example,
By setting c to m1B , we can ensure R to be in the range [0,
1]. Figure 2 shows the trend of reward effects by varying B
and E at the same time while we set mE = 10 and mB = 10.
When we fix the effort scale and increase benefit scale from
1 to 10, then the reward effect also increases up to 10%. But
when we increase effort scale, the reward effect decreases
to as low as 0.1%
10
Benefit Scale(1)
Benefit Scale(2)
Benefit Scale(3)
Benefit Scale(4)
Benefit Scale(5)
Benefit Scale(6)
Benefit Scale(7)
Benefit Scale(8)
Benefit Scale(9)
Benefit Scale(10)
Reward Effect (%)
9
8
7
6
5
4
3
4.1
1
0
2
4
6
8
10
Effort Scale
Fig. 2. Reward Effect
Given a system parameter R and the fact that u agrees
to receive reward, there are two ways to incorporate the
reward incentive R into the PSI model. First, we incorporate
the reward incentive into the probability to propagate
influence, by replacing the activity-based influence probaR
bility Pa (u) with the activity-reward based probability Pa(u)
defined by the following formula:
PaR (u) = Pa (u) + (1 − Pa (u))R
(10)
Alternatively we can also incorporate the reward incentive R into A(u), the influence adoption probability group,
which may upgrade u to a higher adoption probability
group. .
Reward Target
For a social network of n nodes, the next question that
we need to address is how to select nodes to receive
reward under a limited budget constraint. This is a common
question when the amount of rewards to be given out for
the purpose of marketing campaign or influence diffusion
stimulation is limited due to the budget constraint. For
example, most of companies have limited marketing budget to promote their products. The ultimate goal here is
to maximize the influence of the viral marketing with a
limited budget of rewards so that the maximization of the
utility of rewards can lead to the largest possible number
of people to buy the products. It is widely recognized that
random selection of marketing targets is ineffective.
In PSI, we promote to distribute rewards by selecting
only one of five groups of social network nodes at a time:
Innovator, Early Adopter, Early Majority, Late Majority, and
Laggards. Clearly, members in the same group have similar
activation probabilities. Innovators have highest activation
probability and laggards have the lowest. We will compare
and determine which group is the most effective group in
terms of promoting new products. Also even though people
in the chosen group will be exposed to the reward but not
all of people will get the reward because of the limited
marketing budget. Thus we again introduce a probabilistic
control such that each person in the group has one chance
to take the reward or not. If the user takes the reward, then
her probability to propagate the social influence will be
increased from Pa (u) to PaR (u) by incorporating the reward
effect into the Eq. (10).
The problem of choosing a subset of k nodes in a social
network graph, which can provide the maximum influence
coverage, is NP hard [21]. Here k is given based on the
resource budget constraint, assuming that the total cost of
sensing the influence at each of the k nodes should not
exceed the given resource budget. Thus a greedy algorithm
is often employed [26].
4
2
0
3.4.2
6
I NFLUENCE R ANK B Y C OVERAGE
Influence Coverage (IC) and Influence Rank (IR)
One criterion to evaluate the effectiveness of an influence
diffusion model over a social network is to measure the
influence coverage for each of its influence sources and to
aggregate influence coverage of all of its influence sources.
Let G = (V, E) denote a social network graph and u ∈ V
is a node that is chosen as the sole influence source at time
t = 0. In other words u is the only active node at t = 0.
The Influence Coverage (IC) of node u over G, denoted by
IC(u), can be defined by the total number of nodes that u
can influence through hop by hop iterative computation
of u’s influence over the entire network graph G. The
iterative computation terminates when the graph traversal
started from u has reached every node of G and thus the
convergence condition is met. For PSI model, the influence
coverage is computed in terms of influence probability, θc ,
and activation probability group A(u).
After we compute IC for every node in a social network
graph G, we can sort all nodes in G by descending order of
their influence coverage score IC and produce a rank value
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
TABLE 2. A(u) and Category of vi
v1
1.00
0.90
IN
A(u)
P (u)
Group
v2
0.47
0.23
EM
v3
0.10
0.05
LA
v4
0.60
0.45
EA
v5
0.18
0.10
LM
v6
0.47
0.23
EM
v7
0.21
0.10
LM
0.13 v10 v8
0.34
0.23
EM
v9
0.24
0.10
LM
v10
0.10
0.05
LA
0.28 v9 0.91 0.51 0.55 0.54 v8 0.62 0.12 0.51 2 v10 n: # of pos3ng n: # of interac3on 60 25 3 3 25 10 10 v9 23 v8 70 20 v3 3 0.28 40 v2 25 25 40 8 v6 10 0.13 v10 7 25 20 10 7 1 v9 0.55 0.54 1 0.6 v5 (a) Graph with Activities
v3 v7 0.51 0.45 0.16 0.5 0.39 v1 0.66 0.52 5 v8 v6 0.62 0.12 0.51 1 2 2 0.91 0.51 0.6 10 40 60 v
4 2 v7 25 v1 2 1 0.48 v4 0.59 v2 0.48 0.67 0.45 0.45 0.06 0.12 v5 (b) Graph with Probability
Fig. 3. Computed weight w(u, v)
Now we use this example to illustrate how to compute
IC and IR. By using the five categories of activation probability groups and their distribution ratios in Table 1, we
categorize the nodes in this example graph by computing
their adopter probability category A(u) using Eq. (8) as
shown in Table 2. Given that A(v1 ) is included in top 2.5%,
node v1 is chosen as an innovator. Similarly, v3 and v10
are considered as laggards since their adopter probability
group A(v3 ) and A(v10 ) are included in the bottom 16%. In
this example, we set θc to 0.5. To compute IC for v1 , we
set only v1 as the influence diffusion source at t = 0. In
other words, v1 is active and all other nodes are inactive,
as shown in Figure 4(a).
At t = 1, v1 decides to activate her inactive neighbors
with Pa (v1 ) which is 0.9 according to Table 1 for innovator.
This can be viewed as v1 tossing a coin with a probability
Pa (v1 ) = 0.9 by Eq. 7. Let Yv1 denote the result of tossing
a coin. If Yv1 returns 1, it decides to propagate. Then it
v4 0.45 0.59 v3 0.28 v9 0.91 0.51 0.55 0.54 v8 0.6 0.6 0.66 v3 v9 v6 0.62 0.12 v4 0.45 0.55 0.54 0.06 0.45 0.6 v3 v7 0.51 0.45 0.16 0.5 0.39 v1 0.66 0.12 v5 v8 v6 0.62 0.12 0.51 0.52 (c) At t = 3
v5 0.45 0.13 0.91 0.51 0.6 0.48 v2 0.48 0.67 0.28 0.16 0.5 0.59 0.45 v10 0.51 0.45 0.51 0.52 v7 0.39 v1 0.12 (b) At t = 2
0.13 v10 v4 v2 0.48 0.67 (a) At t = 1
0.06 0.48 0.59 v3 v5 0.45 v6 0.62 0.12 0.66 0.52 v2 0.48 0.67 0.51 0.16 0.5 0.06 0.51 0.12 v7 0.45 0.39 v1 0.6 0.48 0.66 0.52 v6 v8 0.6 0.55 0.54 v9 0.39 v1 0.91 0.51 0.28 0.16 0.5 0.13 v10 0.51 0.45 0.6 0.6 for each node u. We refer to this rank value the Influence
Rank (IR) of u.
Figure 3(a) shows an example social network graph with
10 nodes and 26 edges. Each node has the number of
postings in their profile page as shown in the number
underlined. Each edge has a weight value defined by the
number of interactions between two nodes. For example,
v9 has 23 postings and 6 interactions with v10 and 20
interactions with v1 . Recall that we use α ∈ [0, 1] as a weight
function to balance between non-interactive activities and
interactive activities in computing the influence probability
w(u, v) as the edge weight for each edge (u, v) ∈ E. If we
set α to be 0.5. In this example graph, M AX(N A) is 70
from v1 . Thus w(v9 , v10 ) is computed as follows based on
Eq. 3:
IA(v9 , v10 )
N A(v9 )
+ (1 − α) P
w(v9 , v10 ) =α
M AX(N A)
s:(v9 ,s)∈E IA(v9 , s)
23
3
=0.5 ·
+ (1 − 0.5) ·
= 0.28
70
3 + 10
Figure 3(b) shows the graph with influence probability
as edge weight upon the completion of the influence probability computation for each edge in Figure 3(a).
v7 7
0.48 v4 0.59 v2 0.48 0.67 0.45 0.45 0.06 0.12 v5 (d) v1 and v4 try to activate
v2
Fig. 4. Example of activation steps
toss coins again with a probability w(v1 , vj ) in order to
activate inactive friends as long as they pass the following
test: w(v1 , vj ) > θc for ∀j : (v1 , vj ) ∈ E) and vj is inactive.
From Figure 4(a), we see that all of v1 ’s outgoing edges
have the probability greater than θc = 0.5, and all of v1 ’s
neighbors are inactive. Thus, v1 tries to activate all of its 6
neighbors, v2 , v3 , v4 , v6 , v8 , and v9 . By the probabilistic
influence diffusion, which combines a discrete Bernoulli
random variable like tossing a coin (Eq. 6) with w(v1 , vq ),
q = 2, 3, 4, 6, 8, 9, we may have four out of the six nodes,
v2 , v4 , v8 , and v9 , activated as shown in Figure 4(b).
At t = 2, these newly activated nodes will follow their
respective activation probability Pa (vi ) for i = 2, 4, 8, 9
to continue propagating or to stop propagation. Let us
assume that by tossing a coin Y2 , Y8 , Y9 are 1 and Y4
is 0. Thus, v2 , v8 , and v9 have a chance to activate their
inactive neighbors. Given that v2 has no inactive friends
to activate, v2 terminates the diffusion process. Thus, only
v9 and v8 , marked by black solid circles, are considered
at time t=2, each has only one inactive friend. v9 tries to
activate v10 and v8 tries to activate v7 . However, given that
w(v8 , v7 ) = 0.45 ≤ θc = 0.5 and w(v9 , v10 ) = 0.28 ≤ θc = 0.5,
thus, both nodes terminate the diffusion process. As a
result, there are no newly activated nodes at t = 2.
At t = 3, we could not find any other active nodes that
have not been examined and no more new nodes can be
activated. Thus the diffusion process converges and we stop
the influence diffusion process with v1 as the sole influence
source at t = 3, as shown in Figure 4(c). The IC, for v1 is
4 since there are four nodes (v2 , v4 , v8 , v9 ) included in the
coverage of v1 ’s influence.
The computation of IC and IR for the remaining nodes
follows a similar procedure. Clearly the IC computation is
the dominating factor in terms of the overall computation
cost of IR. When the size of a social network is big,
this simple and straightforward algorithm for influence
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
coverage computation is not efficient. In the next section
we will describe one efficient implementation of our IC
algorithm.
4.2
Computing IC by Matrix Multiplication
An alternative and more efficient approach to computing
the IC, for all nodes in a social network graph G = (V, E)
is to use matrix multiplication.
Considering the running example given in Figure 3(a),
we can formulate the computation of IC, for node v1 as a
matrix multiplication problem.
Let St (i) denote the binary state of node vi at time t
and the value of St (i) is either 1 (active) or 0 (inactive). For
example, at t = 0, only v1 is active. Thus we have S0 (1) = 1
and S0 (i) = 0 (1 < i < n) and n denotes the size of V .
v1 makes a decision to propagate influence with the
probability Pa (v1 ). Let Y (i) denote the binary state of node
vi by tossing a coin. If we get Y1 = 1, it implies v1 is
propagating the influence to its inactive neighbor nodes
based on both the threshold θc and the probability weight
w(v1 , vj ). vj satisfies that (v1 , vj ) ∈ E and vj is an inactive
neighbor of v1 . This computation can be written as the
multiplication of S0 (1) and Y1 , namely Y1 × S0 (1).
Assume that v2 is activated by v1 at t = 1. Then the state
of v2 , S1 (2), is 1. Thus, the activation results of v2 at t = 1
can be represented by S1 (2) = X1,2 × Y1 × S0 (1) using Eq.
6.
In short, the activation process can be formulated as a
three-step iterative process:
• Step 1: Define the state of the source of information
diffusion by selecting a node vi at time t = 0, denoted
as S0 (i);
• Step 2: Determine whether to keep propagating or not
by multiplying S0 (i) with Yi , the result of a discrete
Bernoulli random variable test, namely Yi × S0 (i);
• Step 3: Compute the state of vj at t + 1, St+1 (j), as
Xi,j × Yi × S0 (i)
Figure 4(d) shows that both v1 and v4 are direct neighbors
of v2 and both nodes are active at time t, then both nodes
will try to activate v2 using PSI. The result of this activation
can be written as X1,2 + X4,2 . If one of the nodes succeeds
with X1,2 + X4,2 ≥ 1, then v2 is activated; Otherwise v2
remains as inactive. v2 ’s state at t+1 is expressed as follows:
St+1 (2) = X1,2 × Y1 × St (1) + X4,2 × Y4 × St (4)
(11)
We can generalize the above equation to compute the
state of a node vi at time t+1 as follows:
X
St+1 (i) =
Xj,i Yj St (j)
(12)
(vj ,vi )∈E
.
Note that for any node vj that has no direct edge connecting to vi at time t, we have St (j) = 0.
In order to compute IC, for all the vertices at the same
time, the matrix representation is utilized. First, we use a
state column vector St to represent the state of vertices at
time t. For example, if v1 and v4 are active and the rest of
the nodes are inactive at t = 0, then S0 is given below:
T
S0 = 1 0 0 1 0 0 0 0 0 0
(13)
8
Let Xu,v denote the result of coin toss (activation). Node
u has only one chance to activate its neighbor node v. Thus,
Xu,v can be pre-computed and stored as a n × n matrix:


X1,1 X2,1 X3,1 · · · Xn,1
 X1,2 X2,2 X3,2 · · · Xn,2 

X=
(14)


···
X1,n X2,n X3,n · · · Xn,n
Similar to Xu,v , Yu is also the result of coin toss to determine if node u is willing to activate its inactive neighbor
nodes. Thus Yu also can be pre-computed and stored as a
1 × n matrix as follows:
Y = Y1 Y2 Y3 · · · Yn
(15)
By Eq. 13, 14, and 15, we can compute the state of all
vertices at time t + 1 after activation at t as follows
St+1 = XY St
(16)
Algorithm 1 provides a sketch of the IC computation for
a given node vi . We set the given node vi as an influence
source and active. Then we perform a matrix multiplication
until there are no more newly activated nodes. For each
iteration we count the number of activated nodes for vi ’s
IC.
Algorithm 1: IC(vi )
1
2
3
4
5
6
7
8
9
10
S0 ← n × n zero matrix; t = 0;
S0 (i) ← 1 ;
/* vi becomes first active */
ICi ← ∅ ;
/* initialize ICi */
while Newly activated nodes exist do
St+1 = XY St ;
if ∀St+1 (j) ≥ 1 then
St+1 (j) ← 1;
end
ICi ← ICi ∪ {Newly activated nodes}; t ← t + 1;
end
4.3 IR Computation Algorithms for Rewards
We have described a simple way to compute the influence
coverage IC for each node in a social network graph when
setting this node as the sole influence source. We call this
approach the independent IC since the influence coverage
is computed for each individual node independently. However, the ultimate goal to incorporate rewards into our PSI
model is to find the subset of k nodes as the recipients of
the rewards, which can provide the maximum aggregate
influence coverage, given k as the total resource budget for
the rewards.
There are many candidate subsets of k nodes, but we
want only the one with the maximum aggregate influence
coverage. If we simply sort all nodes in an descending order
of their individual IC scores, and select the top k nodes
with the highest individual IC scores, we may not find
the subset of k nodes that provide the maximum aggregate
influence coverage when many of the k nodes have large
overlapping in their IC, i.e., a large number of common
nodes included in their influence coverage.
Given the inherent problems with the independent IC
based method, in this section we consider another two
mechanisms to compute the aggregate influence coverage
by the top k nodes: locally minimal overlap IC, and
globally minimal overlap IC. Both are considered common
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
mechanisms to compute the aggregate influence coverage
by a given subset of k nodes, which improves the independent IC in terms of achieving maximum aggregate
influence coverage.
In PSI with rewards, regardless which of the three mechanisms we use to compute the maximum aggregate influence
coverage, we use the same algorithmic structure in our
design and implementation of the respective IC algorithms.
• Step 1: Compute activation matrix X. Result of activation is the same as coin toss. Once a probability is
given, the result can be pre-computed and stored as a
matrix X.
• Step 2: Compute ICi . For each node vi we compute
corresponding ICi . ICi can be computed individually
or together with other active nodes.
• Step 3: Sort ICi and select top-k nodes order by ICi .
Independent IC Given a social network composed of n
nodes, Algorithm 2 shows the steps of how to compute IR,
using independent IC. The basic steps of this algorithms
is first to calculate IC, of each node vi and then sort nodes
in descending order of their IC. The advantage of this
algorithm is the simplicity in terms of conceptualization
and implementation. However, the big weakness of this
algorithm is the problem of potentially large overlapping of
the top-k influential sets, one by each of the top k nodes.
Table 3 shows IC, for 5 nodes: v1 , v2 , v3 , v4 , v5 . Based on
Algorithm 2, top-2 influential nodes are v1 , v2 . But v2 ’s
coverage is completely overlapping with v1 ’s, leading to
poor results in terms of maximum aggregate information
coverage. This motivates us to consider alternative ways
to compute the aggregate influence coverage, which can
minimize the IC, overlap.
Algorithm 2: Independent IC
1
2
3
4
5
6
Compute Y ;
Compute X;
foreach vi ∈ V do
Compute ICi
end
Compute IR by sorting {ICi |vi ∈ V } by set size;
Locally Minimal Overlap IC This is a localized greedy
algorithm. For each node v, the algorithm first computes
ICv . Then instead of simply choosing the top k nodes with
highest individual IC as the top k influence ranked nodes,
it selects the top k most influential nodes iteratively as follows: At first, node v whose ICv is the largest is selected as
the first seed of top-k influential nodes. For each remaining
node u, we compute the locally minimal overlap IC, LCu ,
which is the difference between two coverages, the chosen
universal coverage, U IC, from a set of previously added
highest IR nodes and the remaining individual coverage
from one of the remaining nodes. We select a node u as the
next highest IR, node if its locally minimal overlap LCu
is the largest among all the remaining nodes. After adding
node u as the next highest IR, node, we remove u from
G and set G0 by G − {u} and union the current aggregate
U IC and the individual IC of node u. This process iterates
until G and G0 are different. For example, v1 is selected
as the highest IR, node. The universal coverage U IC is
now ICv1 . Locally minimal overlap IC, LCu , is shown in
Table 3. Other remaining nodes, v2 , v3 , v4 , computes LCu by
computing difference from IC, which is ICv1 . v2 ’s coverage
is completely overlapping with v1 ’s. Therefore, the locally
minimal overlap for v2 is empty so is for v3 . In comparison,
v4 and v5 has non-empty the locally minimal overlap IC
with v4 having the larger the locally minimal overlap. Thus,
v4 is selected as the next highest IR, node.
Algorithm 3: Locally Minimal Overlap IC
1
2
3
4
5
6
7
8
9
10
11
Algorithm 4: Globally Minimal Overlap IC
1
2
3
4
5
6
7
8
9
10
11
13
Node
v1
v2
v3
v4
v5
IC
v11 , v12 , v13 , v14
v11 , v12 , v13
v11 , v12
v15 , v16
v16
Locally Minimal Overlap IC
ø
ø
ø
v15 , v16
v16
Line 1 to 5 in Algorithm 2
U IC ← ∅; Vlocal ← ∅; G0 ← G;Vlocal ← ∅;
while G0 6= ∅ do
foreach vi ∈ (G − Vlocal ) do
LCi ← ICi − U IC;
end
Select vi that has max(LCi );
Vlocal ← Vlocal ∪ {vi }; G0 ← G − vi ;
U IC ← U IC ∪ ICi ;
end
Compute IR by sorting v ∈ Vlocal ;
Globally Minimal Overlap IC Comparing to independent
IC, Algorithm 3 generates a larger aggregate influence
coverage for a given k nodes and thus a better quality
of influence rank IR. However, due to the nature of a
local greedy algorithm, it does not consider cases where
multiple sources of diffusion exist. Thus locally minimal
overlap may still produce large overlapping among the top
k influential sets. For example, on a launch of a new iPhone,
multiple bloggers post their own reviews. The information
that one person receives may come from several diffusion
sources. People may read multiple reviews before deciding
to purchase a new product (the new iPhone). Thus, we need
to devise a global greedy algorithm to model the multiple
sources of information diffusion, which will enable us to
find the globally minimal overlap IC.
12
TABLE 3. IC
9
14
15
16
Line 1 to 5 in Algorithm 2
U IC ← ∅;Find vi that has max(ICi );U IC ← ICi ;
Vglobal ← vi ; G0 ← G;
while G0 6= ∅ do
S0 ← n × n zero matrix;
foreach vi ∈ Vglobal do
S0 (i) ← 1;
/* heat sources */
end
foreach vi ∈ (V − Vglobal ) do
St (i) ← 1; ICi = IC(Vglobal ∪ vi );
GCi ← ICi − U IC;
end
Find vi that has max(GCi );
Vglobal ← Vglobal ∪ vi ; G0 ← G − vi ;
end
Computing IR by sorting v ∈ Vglobal ;
The first portion of Algorithm 4 is similar to that of
Algorithm 3. It computes individual ICi , then selects a
node vi whose ICi is the largest (Line 2). Then we set v
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
In this section we report our experimental evaluation of
the performance and effectiveness of our PSI model. Our
experiments are conducted with two objectives. First, we
want to show the effects of parameters that we present in
our PSI models such as α, a weight function for balancing
between N A and IA, θc , closeness threshold, and β, a
balancing weight function between the sum of N A and IA
and degree. Each parameter affects the number of activated
nodes. These parameters should be carefully chosen for
different types of SNS. Our experiments will be a good
guidance in selecting those parameters. Second, we want
to evaluate the performance of our PSI models with or
without rewards against topology-based and naive activitybased approach. We show that probability and incentive
approach has up to 7 times bigger activation coverage than
previous approaches.
5.1
Datasets
Three datasets are used in this experiments such as DBLP
[3], Epinions [1], and and Facebook [2]. DBLP datasets
consist of paper authors as nodes and their co-authorship as
edges. For example, two authors u and v wrote two papers,
and u and w wrote three papers, then u and v are connected
by E(u, v) and u and w by E(u, w). Because u wrote two
papers with v and three papers with w, we set IA(u, v) as
2 and IA(u, w) as 3. Note that u wrote total 5 papers with
u and v. Writing five papers can be considered not only
as interactive activities but also non-interactive activities.
Therefore, we set N A(u) as 5. DBLP dataset has 4,768 nodes
and 32,020 edges.
Facebook is a social network service that provides a
profile page for each user. Users can update their profile
page, post photos, and leave comments on her posting or
friends’ postings. When a user u posts something on her
page, we consider it as a non-interactive activities. If u
leaves comments on her friend’s posting, it is considered
as an interactive activity. By counting N A and IA, we
compute N A(u) and IA(u, v). We launched a Facebook
app and in total 273 users used the app. Once a user u
allows us to use the private information, we extract their
friends relationship information. For example, one user
may have 400 friends. Then from one user we can create 401
1
1
0.8
0.6
β = 0.0
β = 0.2
β = 0.4
β = 0.6
β = 0.8
β = 1.0
0.4
0.2
0
0
0.2
0.4
0.6
A(u)
0.8
(a) A(u) (DBLP)
1
1
Cumulative Density Function
E XPERIMENTS
Cumulative Density Function
5
user nodes. Some users have more than 1,000 friends. By
doing this, we can create 76,954 user nodes and 1,121,861
friendship relationships from 273 users.
Massa [28] collected data from Epinions, a website for
consumer reviews and trust networks. Epinions provide a
system that users who bought products can leave reviews
on them. Then potential buyers read reviews and determine
if they buy the product or not. The potential buyers do
not solely rely on the reviews but reviews have influence
on users. Therefore Epinions is a good dataset to gauge
influence. On Epinions, users post reviews. We consider
this reviews as N A. For IA, we use a trust list. For example,
if users u and v posted some reviews. When a user w likes
u’s reviews and does not v’s reviews, then w creates a trust
list by adding u and does a block list by adding v. Next
time when w visits Epinions, reviews from w’s trust list
will be shown and one from w’s block list will be filtered
out. We create E(u, v) from the trust link. Epinions dataset
49,288 nodes and 487,002 edges.
5.2 Effects of β
The first set of experiments focus on β, which is used
to compute A(u) and plays a role of balancing a weight
between the number of activities and the degree of u. Figure
5 shows the adopter probability category A(u) by varying
β. x-axis is the value of A(u) and y-axis is the cumulative
density function. We vary the balancing weight β from 0
to 1.
DBLP data and Facebook data show that when we
increase β, A(u) decreases. Greater value of β means we
consider activities are more important factor in computing
A(u) than degree of a node. Activity information consists
of two values; N A and IA. The sum of two values are
normalized divided by the sum of two values M AX(IA) +
M AX(N A). In order to get high A(u), both N A and IA
should be also large. But not all nodes have two large
values. Therefore when we increase β, A(u) is decreasing.
On the other hand, Epinions dataset has different property.
If we increase β, A(u) also increases. Epinions dataset has
N A which is the number of reviews and does not have
IA. Thus, when we increase and set β to be 1, the value of
A(u) is computed by N A(u) only. However, if we set low β
such as 0, then A(u) completely depends on du . Compared
to N A(u), du is lower and the standard deviation is high
such as mean 9 and std 32, respectively. Big difference in du
du
. From now on, we set
means lower value of
M AXu∈V (d)
β as 0.5 to weight on evenly both the number of activities
and the degree of u.
Cumulative Density Function
as an initial seed of source of information diffusion (Line
3). We denote U IC as the universal IC, for the set of influential nodes Vglobal . Now we use Hill-climbing algorithm
to simulate multiple source of influence diffusion. First,
for each node vi not in Vglobal , we add temporarily vi to
Vglobal , then compute ICi , which simulates multiple sources
of information diffusion (Lines 9-12). Given two coverages,
ICi and U IC, we define globally minimal overlap GCi as
a difference between ICi and IC. After computing GCi for
all remaining nodes, we select vi that has the largest GCi
as the next highest IR, node.
In the next section we will evaluate our PSI with rewards,
including comparing these three different mechanisms to
compute IC and thus the influence rank used to select
the top k most influential nodes to receive rewards as
incentives.
10
0.8
0.6
β = 0.0
β = 0.2
β = 0.4
β = 0.6
β = 0.8
β = 1.0
0.4
0.2
0
0
0.2
0.4
0.6
A(u)
0.8
0.8
0.6
β = 0.0
β = 0.2
β = 0.4
β = 0.6
β = 0.8
β = 1.0
0.4
0.2
1
(b) A(u) (Facebook)
Fig. 5. Adopter Probability Category A(u)
0
0
0.2
0.4
0.6
A(u)
0.8
1
(c) A(u) (Epinions)
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
0.6
0.4
0.2
0
0
Topology
α = 0.0
α = 0.2
α = 0.4
α = 0.6
α = 0.8
α = 1.0
0.2
0.4
0.6
0.8
Edge Weight w(u, v)
1
(a) w(u, v) (DBLP)
1
Cumulative Density Function
1
Cumulative Density Function
Cumulative Density Function
1
0.8
0.8
0.6
0.4
0.2
0
0
Topology
α = 0.0
α = 0.2
α = 0.4
α = 0.6
α = 0.8
α = 1.0
0.2
0.4
0.6
0.8
Edge Weight w(u, v)
1
(b) w(u, v) (Facebook)
0.8
0.6
0.4
0.2
0
0
Topology
α = 0.0
α = 0.2
α = 0.4
α = 0.6
α = 0.8
α = 1.0
0.2
0.4
0.6
0.8
Edge Weight w(u, v)
1
(c) w(u, v) (Epinions)
Fig. 6. Probability w(u, v) for different α
Figure 7 shows the effect of α. x-axis shows the number
7000
# of Influenced People
1000
800
600
α = 0.0
α = 0.2
α = 0.4
α = 0.6
α = 0.8
α = 1.0
400
200
0
0
5
10
15
6000
5000
4000
α = 0.0
α = 0.2
α = 0.4
α = 0.6
α = 0.8
α = 1.0
3000
2000
1000
0
0
20
5
10
K
15
20
K
(a) DBLP
(b) Facebook
Fig. 7. Effects of α
5.4
Effects of θc
The parameter θc is a value used for filtering out acquaintances. If w(u, v) is less than θc , we consider u and v is
not close enough for activation process. Figure 8 shows the
effect of θc for all three datasets. We vary θc from 0.01 to
0.05. If we increase θc then more people are excluded for
activating process because there will be more edges with
w(u, v) < θc . Once the number of nodes to be target of
activation is decreased, the number of activated nodes also
decreases. In the following experiments we set θc as 0.05.
160
140
40
θ = 0.01
θ = 0.03
θ = 0.05
35
120
100
80
60
40
25
20
15
10
200
θ = 0.01
θ = 0.03
θ = 0.05
150
100
50
5
20
0
0
30
250
θ = 0.01
θ = 0.03
θ = 0.05
# of Influenced People
180
# of Influenced People
The parameter α is used in computing w(u, v), the probability for u to activate v. α is a balancing weight between N A
and IA in computing w(u, v). However, Epinions dataset
has only N A values, thus we designed an experiment for
DBLP and Facebook datasets only. In this experiments, we
set θc as 0.05 uniformly for all of nodes and β as 0.5 to
evenly weight on activities and the degree of node. Due
to the value of θc , nodes with w(u, v) less than 0.05 are
excluded for activating process.
Figure 6 shows the cumulative density function of w(u, v)
for each dataset. When we increase α, a weight parameter balancing between N A and IA, w(u, v) decreases. In
computing w(u, v) we normalize N A(u) by dividing
from
P
M AX(N A) and IA(u, v) by dividing from
IA(u). In
other words, N A(u) is normalized over the entire N A
values while IA(u, v) is normalized among u’s IA values.
Then the difference between N A(u) and M AX(N A) might
be larger than the difference between IA(u, v) and ΣIA(u).
The bigger difference, the smaller fraction value, which
IS(u, v)
N A(u)
might be smaller than
. Thus,
means
M AX(N A)
ΣIA(u)
when we increase α, w(u, v) is also decreasing as shown
in Figure 6(a). Note that Epinions dataset has no IA information and w(u, v) completely weight on N A. Thus, when
we set α to be 0, then w(u, v) is also 0. As we increase
α, w(u, v) also increases because w(u, v) weights more on
N A(u)
as shown in Figure 6(c).
M AX(N A)
Note that Figure 6(b) shows that more than 90% of nodes
have w(u, v) < 0.01. For collecting Facebook dataset, we
request 273 users to take a part in our experiments. Thus,
for each node u in these 273 nodes, we can get N A(u),
IA(u), and du and we extract u’s friend network which
results in 76,954 friends and 1,121,861 friendship. For each
node v in 76,954 extracted nodes, we have no N A(v) and
very limited information of IA(v) and dv . For example u
is one of 273 Facebook app users, v is not a Facebook app
user. If v leaves a comment on u’s photo, we create two
nodes u and v, edges E(u, v) and E(v, u), and set IA(v, u)
as 1 and N A(v) as 0. Due to the way of constructing the
social network graph, some nodes have N A, while other
do not have N A. Also some nodes have high IA, while
others have lower IA. This distribution results in lower
w(u, v) for node u that is not one of 273 Facebook App
users. Therefore, more than 90% of w(u, v) are lower than
0.01.
of top-k users and y-axis shows the number of activated
nodes. In both datasets, when we set α to be higher, the
number of influenced people tends to be also higher. In
order to have a fair comparison we will set α as 0.6 in the
next experiments.
# of Influenced People
Effects of α
# of Influenced People
5.3
11
5
10
15
20
K
(a) DBLP
0
0
5
10
15
K
(b) Facebook
20
0
0
5
10
15
20
K
(c) Epinion
Fig. 8. Effects of θc
Note that in Facebook dataset the lines for θc = 0.03 and
θc = 0.05 are the same. We explained why more than 90%
of nodes in Facebook data have lower w(u, v) in section 5.3.
95% of nodes in Facebook dataset has w(u, v) less than 0.01.
Thus when we set θc to be larger than 0.01, most of nodes
do not take a part in information diffusion and the number
of activated nodes are extremely low. Note that when we
increase θc larger than 0.01, the number of nodes with
w(u, v) larger than θc is the same. Therefore, the number
of activated nodes for θc = 0.03 and θc = 0.05 are the same.
From now on we set θc as 0.05 except for Facebook dataset
(θc as 0.01).
5.5 Effects of Reward Effect, R
Next set of experiments show how the reward effect R
affects the number of activated nodes. In this experiment
we set α as 0.6, β as 0.5, and θc as 0.05. For each dataset
we vary R from 0.01 to 0.2. If we set R to be 0.01 than
it boosts the Pa (u) 1%. If we set R to be 0.2, then Pa (u)
w(u, v) = w(u, v) + (1 − w(u, v))R
(17)
A node u who agrees to get a reward will have a boosted
Pa (u), which makes u more actively participate in propagating information diffusion, and increased w(u, v), which
allows u to have higher chance to succeed in activating
a neighbor node v. We applied this modified Eq. 17 and
did the same experiments over Facebook dataset. Figure
12 shows the result of the experiment. Similar to other
datasets, innovators respond more actively over rewards.
12
700
680
680
# of Influenced People
700
660
640
620
600
580
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
560
540
520
0
5
10
15
660
640
620
600
580
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
560
540
520
0
20
5
K
680
680
# of Influenced People
# of Influenced People
700
660
640
620
600
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
560
540
520
0
5
10
15
20
(b) DBLP R = 0.05
700
580
10
K
(a) DBLP R = 0.01
15
660
640
620
600
580
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
560
540
520
0
20
5
K
10
15
20
K
(c) DBLP R = 0.10
(d) DBLP R = 0.20
400
400
380
380
# of Influenced People
# of Influenced People
Fig. 9. Effects of R for DBLP
360
340
320
300
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
280
260
240
0
5
10
15
360
340
320
300
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
280
260
240
0
20
5
K
380
# of Influenced People
400
380
360
340
320
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
280
260
240
0
5
10
15
20
(b) Epinions R = 0.05
400
300
10
K
(a) Epinions R = 0.01
# of Influenced People
is increased 20%. We also varied the marketing target. For
each experiment we select only one group as a marketing
candidate. Individuals in the candidate group have chance
to get the reward. If the individual accepts the reward,
then Pa (u) is boosted by R we set. Therefore the impact
of rewards is to decrease the probability of u to become a
stopper.
x-axis shows the number of top-k users and y-axis
shows the number of activated nodes. For each dataset,
we performed the five experiments. For each experiment,
one group is selected as a marketing candidate. We give
individuals in the selected group a chance to take reward.
For example, Innovator line in the chart shows the total
number of influenced nodes when we assign incentives to
only individuals in innovator group.
In DBLP datasets as shown in Figure 9, when we target
innovators, the number of activated nodes are highest.
Early adopters may be the alternative target for marketing but the effect of rewards for the early adopters is
only slightly bigger than other groups. Remaining three
groups, early majority, late majority, and laggards, are not
appropriate targets for reward. Although R increases the
probability to be active, Pa (u), and u becomes active, u may
still have a very low w(u, v). Innovators and Early Adopters
usually have higher w(u, v) because they have large N A(u)
and IA(u). However, Early Majority, Late Majority and
Laggards may have lower w(u, v) due to their low activities.
Therefore, regards less of R’s boosting in Pa (u), low values
in w(u, v) result in the low number of activated nodes.
In Epinions dataset as shown in Figure 10, we do not
have IA information. Therefore w(u, v) values are also low.
Even Innovators may have lower w(u, v). Due to the lower
w(u, v), R is not effective for all types of nodes when R
is lower than 0.1. Like DBLP dataset, individuals in the
innovator group is the most effective marketing target for
rewards.
For Facebook datasets as shown in Figure 11, the number
of activated nodes are the same for all five marketing
target groups. More than 90% of w(u, v) values in Facebook
datasets are lower than θc . This is due to the way of
constructing social graph. w(u, v) is computed using both
N A(u) and IA(u) but Only 273 users have N A and some
users have IA. Thus, w(u, v) is extremely low. Although
rewards boosts the probability to participation, extremely
low w(u, v) diminishes rewards effects. These figures may
mislead that rewards are not effective at all. For Facebook
dataset, we modified reward effect so that R can boost both
Pa (u) and w(u, v). Boosting w(u, v) is computed as follows:
# of Influenced People
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
15
360
340
320
300
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
280
260
20
240
0
K
(c) Epinions R = 0.10
5
10
15
20
K
(d) Epinions R = 0.20
Fig. 10. Effects of R for Epinions
5.6
Comparions
Lastly, we conducted an experiment to show the performance of heat diffusion model (Topology), PSI model
without rewards (Activity), and PSI with rewards (Reward).
We set α as 0.6, β as 0.5, and θc as 0.05. For Facebook
datasets, we use modified Eq. 17 so that we make R
effective. x-axis is the number of top-k influential users
and y-axis is the number of activated nodes. In all three
datasets, heat diffusion model (Topology) has the lowest
number of influenced people. As explained in Section 1,
heat diffusion model sets the uniform probability to activate
friends. When a degree is high, then all of friends may not
1
be activated because
can be very low. But when we
du
3500
3000
2500
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
1500
1000
0
5
10
15
3000
2500
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
1500
1000
0
20
5
10
15
20
(b) Facebook R = 0.05
4500
4000
4000
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
1500
1000
0
5
10
15
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
5
10
15
20
(d) Facebook R = 0.20
4500
5000
4000
4500
# of Influenced People
# of Influenced People
Fig. 11. Effects of R for Facebook
2500
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
1500
1000
0
5
10
15
4000
3500
3000
2500
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
1500
1000
0
20
5
10
15
20
K
(a) Facebook R = 0.01
(b) Facebook R = 0.05
5000
5500
4500
5000
# of Influenced People
# of Influenced People
K
4000
3500
3000
2500
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
1500
1000
0
5
10
15
K
(c) Facebook R = 0.10
1000
350
300
Topology
Activity
Reward
250
200
150
100
50
5
10
15
20
0
0
5
10
15
20
0
0
K
(b) Facebook
5
10
15
20
K
(c) Epinion
4000
3500
3000
2500
Innovator
Early Adopter
Early Majority
Late Majority
Laggards
2000
1000
0
5
10
15
R ELATED W ORK
Social influence analysis as marketing techniques has received increased attention over the last decade [5], [10],
[14], [19], [21], [24], [31], [38]. It holds the potential to
increase brands or products awareness through word-ofmouth promotion. [21], [31] pioneered the concept of social
influence by modeling the selection of influential sets of
individuals in a social graph as a discrete optimization
problem. It utilizes the provable greedy approximation
algorithm for maximizing the spread of influence in a
social network. [31] proposed a cascading viral marketing
algorithm, which tries to find such a subset of individuals
that if these individuals adopt a new product or innovation,
then they will trigger a large cascade of further adoptions.
[26] proposed a heat-diffusion based viral marketing model
with top k most influential nodes which utilizes the heat
diffusion theory from Physics to describe the diffusion of
innovations and help marketing companies divide their
marketing strategies into several phases.
Another relevant area is social influence rankings [12],
[15], [23], [36], which measures and rank nodes in a social
network by their social influence ranks, similar to PageRank, using the number of followers and/or social network
topology.
4500
1500
20
2000
(a) DBLP
6
2500
K
3000
200
Topology
Activity
Reward
3000
K
3500
300
3000
3500
1000
0
(c) Facebook R = 0.10
Topology
Activity
Reward
Fig. 13. Comparisons of Three Approaches
1500
20
400
4000
K
(a) Facebook R = 0.01
2500
500
0
0
4500
3000
450
400
600
100
K
# of Influenced People
# of Influenced People
K
3500
5000
700
3500
13
# of Influenced People
4000
# of Influenced People
4500
4000
# of Influenced People
4500
# of Influenced People
# of Influenced People
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
20
K
(d) Facebook R = 0.20
Fig. 12. Effects of modified R for Facebook
consider activity information, we differentiate w(u, v) so
that some of close friends are activated and this activation
continues to friends of friends.
On top of probability and activity-based approach, we
select Innovators as marketing target and give them a
chance to accept rewards. Figure 13 shows that our PSI
model with and without rewards have the larger number of
influenced nodes, especially for DBLP dataset, the number
of influenced nodes by PSI with rewards is 7 times more
than the number of influenced nodes by topology-based
approach.
In addition, a number of research projects focus on
issue of finding the top k influential people in SNS in
the context of viral marketing so that the selected people
maximize the spread of influence under certain influence
propagating models [4], [11], [12], [20], [22], [29], [36]. [13]
proposed a new heuristic influence maximization algorithm
to maximize the spread of influence under certain influence
cascade models.
Recently, [30] presented a model in which information
can reach a node via the links of the social network or
through the influence of external sources. [39] model social
influence patterns based on self-influence and co-influence
similarity by incorporating statistical significance in selected node attributes and devise a social influence based
distance metric for clustering large graphs demonstrating
higher quality and higher efficiency compared to existing
approaches. Most of the research on this subject also focused on topology of social networks or static attributes
of node state without probabilistic feature and reward
schemes.
JOURNAL OF TRANSACTIONS ON SERVICES COMPUTING: SPECIAL ISSUE, TSC MANUSCRIPT UNDER CONSIDERATION
7
C ONCLUSION
We have presented a probabilistic social influence diffusion
model (PSI) with incentives. Comparing with previous
approaches, our PSI approach has three novel features.
First, we argue that social influence is sensitive to dynamic properties of social network nodes and we define an
activity-based influence diffusion probability for each pair
of nodes instead of uniform distribution of influence based
solely on topology of social network. We categorize nodes
into two classes: active and inactive. Active nodes can have
one chance to influence inactive nodes but not vice versa.
Second, in order to express the real world more accurately,
we introduce some system parameters, such as a weight
function for balancing between N A and IA for computing
w(u, v), a β damping factor for balancing between the
number of activities and the degree of node u, and a
closeness threshold θc to allow probabilistic propagation of
influence across the social network of N nodes. Third but
not the least, we incorporate multi-scale incentives into our
PSI model as stimuli to further boost the influence diffusion
rate and coverage. Finally we conduct an extensive series
of experiments on various parameters and to evaluate the
performance of our approach. Our experiments show that
our PSI model with rewards is more effective in terms
of both diffusion rate and diffusion coverage of influence.
Although the PSI model presented in this paper centered
on quantitative information about activities and a single
homogeneous social network, one of our ongoing research
efforts is to study how qualitative information about activities, such as context and user profile, may further impact
on social influence diffusion rate and coverage. We are also
interested in studying social influence diffusions in multitier heterogeneous information networks.
ACKNOWLEDGMENTS
14
[16] J. Goldenberg, B. Libai, and E. Muller, “Talk of the network: A
complex systems look at the underlying process of word-of-mouth,”
Marketing Letters, 2001.
[17] ——, “Using complex systems analysis to advance marketing theory
development,” Academy of Marketing Science Review, 2001.
[18] M. Granovetter, “Threshold models of collective behavior,” American
Journal of Sociology, 1978.
[19] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins, “Information
diffusion through blogspace,” in WWW, 2004.
[20] J. Y. C. Ho and M. Dempsey, “Viral marketing: Motivations to
forward online content,” Journal of Business Research, 2010.
[21] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the Spread of
Influence through a Social Network,” in SIGKDD, 2003.
[22] M. Kimura, K. Saito, and R. Nakano, “Extracting influential nodes
for information diffusion on a social network,” in AAAI, 2007.
[23] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social
network or a news media?” in WWW, 2010.
[24] K. Lerman and R. Ghosh, “Information contagion: An empirical
study of the spread of news on digg and twitter social networks,” in
ICWSM, 2010.
[25] X. Li, W. Zheng, and M. R. Lyu, “A coalitional game model for
heat diffusion based incentive routing and forwarding scheme,” in
Networking. Springer-Verlag, 2009.
[26] H. Ma, H. Yang, M. R. Lyu, and I. King, “Mining social networks
using heat diffusion processes for marketing candidates selection,”
in CIKM, 2008.
[27] M. W. Macy, “Chains of Cooperation: Threshold Effects in Collective
Action,” American Sociological Review, 1991.
[28] P. Massa and P. Avesani, “Trust Metrics in Recommender Systems,”
in Computing with Social Trust, 2009.
[29] M. Mohite and Y. Narahari, “Incentive compatible influence maximization in social networks and application to viral marketing,” in
AAMS, 2011.
[30] S. A. Myers, C. Zhu, and J. Leskovec, “Information diffusion and
external influence in networks,” in SIGKDD, 2012.
[31] M. Richardson and P. Domingos, “Mining knowledge-sharing sites
for viral marketing,” in SIGKDD, 2002.
[32] E. M. Rogers, “Diffusion of Innovations,” Free Press, New York, 1995.
[33] T. Schelling, “Micromotives and macrobehavior,” Norton, 1978.
[34] M. M. Skeels and J. Grudin, “When social networks cross boundaries:
a case study of workplace use of facebook and linkedin,” in ACM
GROUP, 2009.
[35] T. Valente, Network Models of the Diffusion of Innovations. Hampton
Press, 1995.
[36] J. Weng, E.-P. Lim, J. Jiang, and Q. He, “TwitterRank: finding topicsensitive influential twitterers,” in WSDM, 2010.
[37] H. Yang, I. King, and M. R. Lyu, “DiffusionRank: a possible penicillin
for web spamming,” in SIGIR, 2007.
[38] S. Ye and S. Wu, “Measuring message propagation and social influence on twitter.com,” in Social Informatics. Springer, 2010.
[39] Y. Zhou and L. Liu, “Social influence based clustering of heterogeneous information networks,” in SIGKDD, 2013.
The authors acknowledge the partial support from grants
under NSF CISE NetSE, SaTC and I/UCRC and Intel ISTC
on cloud computing.
R EFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
“Epinions,” http://www.epinions.com.
“Facebook,” http://www.facebook.com.
“The DBLP Computer Science Bibliography.”
I. Anger and C. Kittl, “Measuring Influence on Twitter,” in i-KNOW,
2011.
E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic, “The role of social
networks in information diffusion,” in WWW, 2012.
H. Bao and E. Y. Chang, “AdHeat: an influence-based diffusion
model for propagating hints to match ads,” in WWW, 2010.
E. Berger, “Dynamic Monopolies of Constant Size,” Journal of Combinatorial Theory Series, 2001.
D. Boyd and J. Heer, “Profiles as Conversation: Networked Identity
Performance on Friendster,” in HICSS, 2006.
A. D. Bruyn and G. L. Lilien, “A multi-stage model of word-of-mouth
influence through viral marketing,” IJRM, vol. 25, 2008.
V. Buskens and K. Yamaguchi, “A new model for information diffusion in heterogeneous social networks,” Sociological Methodology,
vol. 29, no. 1, 1999.
M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi, “Measuring
user influence in twitter: The million fallacy,” in AAAI, 2010.
M. Cha, A. Mislove, and K. P. Gummadi, “A measurement-driven
analysis of information propagation in the flickr social network,” in
WWW. ACM, 2009.
W. Chen, C. Wang, and Y. Wang, “Scalable influence maximization
for prevalent viral marketing in large-scale social network,” in KDD,
2010.
P. Domingos and M. Richardson, “Mining the network value of
customers,” in SIGKDD, 2001.
R. Ghosh and K. Lerman, “Predicting influential users in online social
networks,” 2010.
Myungcheol Doo Myungcheol Doo is a staff advanced research engineer at Arris Solutions in Lisle,
Illinois. He is conducting a research on analysis
and visualization of large scaled data related to
advertisements. His prior research topics include
indexing spatial objects, location-based alarms, and
social network analysis. He received B.S. in Computer Science Education from Korea University in
2004, M.S. and Ph.D. in Computer Science from
Georgia Institute of Technology in 2007 and 2012
respectively.
Ling Liu Ling Liu is a Professor in the School
of Computer Science at Georgia Institute of Technology. She directs the research programs in Distributed Data Intensive Systems Lab (DiSL), with
current focus on cloud computing and big data
systems. Prof. Ling Liu is an internationally recognized expert in the areas of Database Systems,
Distributed Computing, Internet Data Management,
and Service oriented computing. Prof. Liu has published over 300 international journal and conference
articles and is a recipient of the best paper award
from a number of top venues, including ICDCS 2003, WWW 2004, IEEE
Cloud 2012, ICWS 2013 and 2005 Pat Goldberg Memorial Best Paper
Award. Prof. Liu is also a recipient of IEEE Computer Society Technical
Achievement Award in 2012. In addition to services as general chair and
PC chairs of numerous IEEE and ACM conferences in data engineering,
very large databases and distributed computing fields, Prof. Liu has served
on editorial board of over a dozen international journals. Dr. Liu’s current
research is partially sponsored by NSF, IBM, and Intel.
Download