Instituto Superior Técnico
Network Science
1st semester 2023/2024
Identification of Efficient Spreaders in Complex Networks
António Coelho, no. 95535
Cristi Savin, no. 95549
Duarte Almeida, no. 95565
Abstract
Spreading processes are a crucial aspect of several domains. However, identifying the most influential
spreaders without testing every node remains an open challenge. In [1] it was shown that metrics other than
the degree or betweenness of a node can yield better predictions. Since then, the number of publications on the
subject has increased massively. In this work, we aim to provide a systematic survey of various methods designed
to discriminate influential nodes. We also compare these methods on real-world public datasets, which, to our
knowledge, has not been done to the extent we propose. The results show that measures that simultaneously
consider both k-shell and node degree achieve the best performance. Our work intends both to provide a complete
introduction to the problem and to offer a framework that lets future works be compared against each other more
easily.
Keywords: Efficient Dissemination, Influential Spreaders, K-shell, Complex Networks
1 Introduction
Spreading processes are ubiquitous across several domains, such as societal interactions and information
dissemination. Identifying influential spreaders is therefore of great importance, since it can support methods to
either hinder spreading (as is the case in pandemic control) [2] or accelerate it (which is desirable in information
dissemination) [1]. However, quantifying a node's influence is a very challenging task, as it often requires making
accurate predictions about the topology of very large networks.
In this paper, we aim to provide a systematic survey of various methods to identify influential nodes and
compare them on highly differentiated public real-world datasets. We focus our attention mainly on
topology-based measures, which are pragmatic and efficient approaches, both desirable properties when working
with very large and highly heterogeneous networks. To our knowledge, this type of work has not yet been done in
the literature, since other papers compare only a subset of these methods or use private or hard-to-access datasets.
We start by examining the groundbreaking work of Kitsak et al. [1] on the k-shell measure, which establishes
that, in some contexts, it can surpass more established metrics like degree or betweenness in evaluating a node's
spreading capability. However, contrary to the authors' claim, we observe that, within a k-shell, nodes with higher
degree might be more influential (Figure 1). This motivates combining both metrics with configurable weights, as
in [3]. The drawback of this proposal is that the weights need calibration: values that are ideal in one scenario may
be sub-optimal on a different network. Another proposal that aims at further exploring and improving the k-shell
method is the one by Liu et al. [4], which combines it with the distance to the core. However, it only obtained a
significant improvement (more than 0.01 points of imprecision) over the k-shell in the Flights dataset (Figure 4).
Besides that, it requires computing shortest distances, which may be prohibitively expensive in very large graphs
when performance is the main concern. Furthermore, Bae and Kim [5] propose another method, the extended
coreness, for identifying influential nodes with the assistance of the k-shell metric. In our experiments, it was
consistently more precise than the other methods (Figure 4). It is made even more attractive by not requiring any
calibration of parameters. Finally, we also analyze two more recent works, MCDE and WNC, by Sheikhahmadi et
al. [6] and Wang et al. [7], respectively. Even though these fell short in our experiments (Figure 4), they present
interesting new ideas to deal with the problem. The first uses entropy [8] to measure how uniformly the neighbors
of a node are spread among the shells, while the second quantifies the importance of edges by attributing more
weight to those that connect hubs.
The rest of this work will be structured as follows. In Section 2, we provide descriptions and motivations for
each measure of spreading efficiency, along with the definition of performance metrics. In Section 3, we evaluate
these measures using real-world datasets.
2 Methods
2.1 Selection of single spreaders
We now explore methods for evaluating the spreading efficiency of individual nodes when they function as
the sole initially infected node. In addition to the measures presented in the following sections, we also consider
node degree as an indicator of this efficiency; given its straightforward nature, it requires no further elaboration.
The k-shell decomposition
The k-shell is a measure that consists of an integer index $k_s$ that is assigned to a node and which expresses
its "coreness", that is, its location within successive layers of the network. It is designed in such a way that smaller
values of $k_s$ are associated with the periphery of the network, while larger $k_s$ values correlate with the innermost
core of the network.
The coreness values are determined through an iterative pruning process. Initially, nodes with a degree of one
($k = 1$) are removed, along with their connected edges. This pruning process continues iteratively until no nodes
with a degree of one remain in the network. All the nodes and links removed during this process collectively form
the 1-shell of the network ($k_s = 1$). This procedure is then repeated on the remaining subgraph, setting $k_s = 2$, and
it continues iteratively until all nodes have been assigned a coreness value.
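The pruning procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `k_shell` and the representation of the graph as a dictionary mapping each node to the set of its neighbors are assumptions made here, not part of the original work, and the implementation favours readability over speed.

```python
def k_shell(adj):
    """Return {node: k_s} for an undirected graph given as {node: set(neighbors)}.

    Assumes no self-loops; isolated nodes (if any) end up in the 1-shell.
    """
    degree = {v: len(nb) for v, nb in adj.items()}  # residual degree
    remaining = set(adj)
    shell = {}
    k = 1
    while remaining:
        # Keep pruning nodes whose residual degree is at most k.
        prune = [v for v in remaining if degree[v] <= k]
        while prune:
            for v in prune:
                shell[v] = k
                remaining.discard(v)
            for v in prune:
                for u in adj[v]:
                    if u in remaining:
                        degree[u] -= 1
            prune = [v for v in remaining if degree[v] <= k]
        k += 1  # move on to the next shell
    return shell
```

For instance, on a triangle with a two-node tail attached to one of its corners, the tail nodes fall in the 1-shell and the triangle in the 2-shell.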
Kitsak et al. motivate the usefulness of this measure by constructing instances where hubs turn out to be poor
spreaders due to their peripheral location within the network. Moreover, they also show through simulations
that nodes in high-$k_s$ layers are more susceptible to infection during a typical epidemic event and are infected
earlier than nodes in lower-$k_s$ layers, sustaining an infection in the early stages of an outbreak. Thus, these nodes
contribute to the epidemic's ability to reach a critical mass and fully develop within the network, and hence they
are expected to exhibit the best spreading capacity. Kitsak et al. [1] also show that nodes with the same degree have
very diverse spreading capabilities, while nodes in the same k-shell have homogeneous spreading capabilities.
Mixed degree decomposition
The k-shell decomposition comes with several notable downsides. Firstly, it only considers the links between
the remaining nodes (i.e., to inner cores), disregarding the connections to the removed nodes, while these turn out
to be important in real networks [3]. Additionally, the k-shell decomposition frequently results in multiple nodes
receiving the same k-shell index, leading to ambiguity in identifying the efficient spreaders within the network
[3][5][7].
To tackle both these problems, the mixed degree decomposition (MDD) resorts to the mixed degree $k_i^{(m)}$ of a
node $i$ as a criterion to remove nodes from the network and add them to shells [3]. It is defined as:
$$k_i^{(m)} = k_i^{(r)} + \lambda \cdot k_i^{(e)} \tag{1}$$
where $\lambda$ denotes a parameter between 0 and 1, $k^{(r)}$ denotes the residual degree (i.e., the number of edges
connected to non-removed nodes) and $k^{(e)}$ denotes the exhausted degree (the number of edges connected to
removed nodes). Initially, $k^{(m)} = k^{(r)} = k$ for all nodes. Then, all nodes with the smallest $k^{(m)}$, say $M$, are removed
and assigned to the $M$-shell, and $k^{(m)}$ is updated for the remaining nodes according to (1). These two steps are
repeated, with the removed nodes added to the $M$-shell, until no remaining node has $k^{(m)}$ less than or equal to
$M$. This iterative procedure continues with the next smallest value until all nodes are assigned to a shell. Note
that, when $\lambda = 0$, the MDD method reduces to the conventional k-shell method, while $\lambda = 1$ corresponds to the
standard degree-based approach. Although this measure tackles the aforementioned problems, it requires a value
for $\lambda$ to be set; however, different network topologies give rise to different optimal values for $\lambda$ [3].
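A minimal sketch of the MDD procedure described above follows. The function name, the adjacency-dictionary input and the small floating-point tolerance are illustrative assumptions rather than details taken from [3].

```python
def mixed_degree_decomposition(adj, lam):
    """Return {node: shell value M} for a graph given as {node: set(neighbors)}.

    lam is the parameter lambda in [0, 1] weighting the exhausted degree.
    """
    residual = {v: len(nb) for v, nb in adj.items()}  # edges to non-removed nodes
    exhausted = {v: 0 for v in adj}                   # edges to removed nodes
    remaining = set(adj)
    shell = {}

    def mixed(v):
        return residual[v] + lam * exhausted[v]

    while remaining:
        m = min(mixed(v) for v in remaining)  # smallest mixed degree, M
        while True:
            # Tolerance guards against floating-point noise (an assumption).
            batch = [v for v in remaining if mixed(v) <= m + 1e-12]
            if not batch:
                break
            for v in batch:
                shell[v] = m
                remaining.discard(v)
            for v in batch:
                for u in adj[v]:
                    if u in remaining:
                        residual[u] -= 1
                        exhausted[u] += 1
    return shell
```

With `lam=0` the exhausted degree is ignored and the result coincides with the k-shell index, while with `lam=1` the mixed degree never changes and every node is assigned to the shell given by its degree.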
Improved k-shell
In order to address the degeneracy issue of the k-shell, the improved k-shell method [4] aims to distinguish nodes
within the same k-shell by favouring nodes that are closer to the core on average:
$$\theta(i) = -\left(k_s^{\max} - k_s(i) + 1\right) \sum_{j \in \Gamma(k_s^{\max})} d_{ij} \tag{2}$$
where $k_s^{\max}$ denotes the maximum k-shell value of the network, $\Gamma(k_s^{\max})$ denotes the network core (i.e., the set of
nodes in the $k_s^{\max}$-shell) and $d_{ij}$ is the shortest distance between nodes $i$ and $j$. The negative sign at the beginning
of the expression is introduced to ensure that higher values correspond to more efficient nodes, aligning with other
similar measures. A drawback of this measure is its computational cost, since it requires computing the shortest
distances between the core nodes and all the other nodes.
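Equation (2) can be evaluated with one breadth-first search per core node, as in the sketch below. The function names and the inputs (an adjacency dictionary `adj` and a precomputed dictionary `ks` of k-shell indices) are hypothetical, and the graph is assumed to be connected and unweighted.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def improved_k_shell(adj, ks):
    """theta(i) = -(ks_max - ks(i) + 1) * sum of distances from i to the core."""
    ks_max = max(ks.values())
    core = [v for v in adj if ks[v] == ks_max]
    total = {v: 0 for v in adj}
    for c in core:  # one BFS per core node
        dist = bfs_distances(adj, c)
        for v in adj:
            total[v] += dist[v]
    return {v: -(ks_max - ks[v] + 1) * total[v] for v in adj}
```

The cost is one traversal of the whole graph per core node, which illustrates why this measure becomes expensive when the core is large.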
Neighborhood Coreness
Based on the fact that the spreading quality of a node is also determined by the spreading quality of its
neighbors [5], we can define a new measure, the neighborhood coreness, as:
$$C_{nc}(i) = \sum_{j \in N(i)} k_s(j) \tag{3}$$
where $N(i)$ denotes the set of nodes adjacent to a node $i$. The effectiveness of this measure, as emphasized by its
creators, lies in its ability to consider both the degree and coreness of neighboring nodes. We can also construct a
higher-order version of this measure, called the extended neighborhood coreness:
$$C_{nc+}(i) = \sum_{j \in N(i)} C_{nc}(j) \tag{4}$$
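Both sums translate directly into code. The sketch below assumes, for illustration only, an adjacency dictionary `adj` and a precomputed dictionary `ks` of k-shell indices; the function names are hypothetical.

```python
def neighborhood_coreness(adj, ks):
    """C_nc(i): sum of the k-shell indices of the neighbors of i."""
    return {v: sum(ks[u] for u in adj[v]) for v in adj}


def extended_neighborhood_coreness(adj, ks):
    """C_nc+(i): sum of the neighborhood coreness of the neighbors of i."""
    c_nc = neighborhood_coreness(adj, ks)
    return {v: sum(c_nc[u] for u in adj[v]) for v in adj}
```

Neither function has parameters to calibrate, and each needs only one pass over the edges once the k-shell indices are known.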
Weighted Neighborhood Centrality
Unlike the previous measures, this approach takes into account the importance of links in facilitating
the spreading process. It builds upon two major assumptions: a node's spreading capacity is determined both
by its own intrinsic qualities and by the collective influence of its neighboring nodes; moreover, the spreading
power of these neighboring nodes is weighted by the importance of the edges connecting them. The weighted
neighborhood centrality [7] is thus defined as:
$$C(i) = k_s(i) + \sum_{j \in N(i)} \frac{w_{ij}}{\langle w \rangle} k_s(j) \tag{5}$$
where $w_{ij}$ denotes the weight of edge $(i, j)$, defined as $w_{ij} = k_i k_j$, which quantifies the diffusion importance by
favouring edges that connect hubs, and $\langle w \rangle$ is the mean value of all edge weights. Note that the expression in
Equation (5) can be adapted to incorporate alternative benchmark measures, although here we exclusively focus
on its variant based on the k-shell measure for the sake of simplicity.
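As a rough illustration of Equation (5), the sketch below computes the edge weights as products of endpoint degrees and normalizes them by their mean. The function name and inputs are assumptions; the graph is taken to be undirected, without self-loops and with at least one edge.

```python
def weighted_neighborhood_centrality(adj, ks):
    """C(i) = ks(i) + sum over neighbors j of (w_ij / <w>) * ks(j)."""
    degree = {v: len(nb) for v, nb in adj.items()}
    # Store each undirected edge once, keyed by its unordered endpoint pair.
    weights = {}
    for v in adj:
        for u in adj[v]:
            weights[frozenset((u, v))] = degree[u] * degree[v]
    mean_w = sum(weights.values()) / len(weights)
    return {
        v: ks[v] + sum(weights[frozenset((u, v))] / mean_w * ks[u] for u in adj[v])
        for v in adj
    }
```

On a three-node path every edge has weight 2, so the normalized weights equal 1 and the measure reduces to the node's own k-shell index plus the sum over its neighbors.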
MCDE
Simulations conducted by Liu et al. [19] revealed that the presence of core-like groups can undermine
the accuracy of influential spreader identification using k-shell decomposition. These core-like groups consist of
nodes that have the highest k-shell values but display poor connectivity with the rest of the network. Consequently,
they do not turn out to be the most effective spreaders. Based on this, the mixed core, degree and entropy (MCDE)
measure [6] considers not only the degree and the k-shell of a node, but also the distribution of its neighbors among
network cores. To favour this dispersion, for each node $i$, MCDE extends the previous MDD measure by employing
Shannon's entropy $E(i)$, which is maximal when the neighbors are uniformly spread among the shells:
$$E(i) = -\sum_{k=1}^{k_s^{\max}} p_k(i) \log\left(p_k(i)\right)$$
where $p_k(i)$ is the proportion of the neighbors of node $i$ that are in core $k$. MCDE is subsequently defined as a
weighted combination of the node's entropy, degree, and k-shell:
$$\mathrm{MCDE}(i) = \alpha k_s(i) + \beta k(i) + \gamma E(i)$$
Similarly to the MDD, this measure also requires the parameters $\alpha$, $\beta$ and $\gamma$ to be adequately set.
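The entropy term and the weighted combination can be sketched as follows. The function names, the default parameter values and the use of the natural logarithm are assumptions made for illustration (the base of the logarithm is not specified above); shells that contain no neighbor of the node contribute nothing to the sum.

```python
import math
from collections import Counter


def shell_entropy(adj, ks, v):
    """Shannon entropy of the distribution of v's neighbors among the shells."""
    counts = Counter(ks[u] for u in adj[v])
    total = sum(counts.values())
    # Natural logarithm assumed; empty shells are skipped automatically.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mcde(adj, ks, alpha=1.0, beta=1.0, gamma=1.0):
    """MCDE(i) = alpha * ks(i) + beta * k(i) + gamma * E(i)."""
    return {
        v: alpha * ks[v] + beta * len(adj[v]) + gamma * shell_entropy(adj, ks, v)
        for v in adj
    }
```

A node whose neighbors all lie in the same shell has zero entropy, so for such a node the measure depends only on its k-shell index and degree.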
2.2 Datasets, Models and Metrics
In order to comprehensively compare the methods presented, we use datasets from different domains and
with different properties. The first is a network of jazz musicians [9] (Jazz), which connects two musicians if they
have played in the same band. The second is a protein-protein interaction network in yeast [10] (Yeast), where
each node corresponds to a protein and the edges are interactions between different proteins. Third, we use a
flights network [11] (Flights), where the nodes are airports and edges indicate flights between them. Finally, we
consider Social Circles in Ego Networks [12] (Facebook), where, for a given user (the central node, which is not
included in the network), their friends are represented as nodes, and two friends are connected if they are also
friends with each other, making it possible to identify "circles" of common attributes among them. For simplicity,
we remove self-loops and only consider the largest connected component of each network.
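The preprocessing step mentioned above (dropping self-loops and keeping the largest connected component) could look like the following sketch. The function name and the edge-list input format are assumptions; the datasets themselves may require additional parsing that is not shown here.

```python
from collections import deque


def largest_connected_component(edges):
    """Build {node: set(neighbors)} for the largest component of an edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # drop self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search from an unvisited node
            x = queue.popleft()
            for y in adj[x]:
                if y not in component:
                    component.add(y)
                    queue.append(y)
        seen |= component
        if len(component) > len(best):
            best = component
    return {v: adj[v] & best for v in best}
```

The returned dictionary maps each node to the set of its neighbors, which is the representation assumed by the other sketches in this section.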
To simulate spreading processes, we employ the Susceptible-Infectious-Recovered (SIR) model with a fixed
recovery rate $\gamma = 0.1$ and a network-dependent infection rate $\beta$. In our analysis, we choose small values for $\beta$,
since the spreading always reaches a large proportion of the network when large values are used [1]. Nonetheless,
we must set $\beta$ above the epidemic threshold, so that the expected fraction of infected nodes is greater than zero
(i.e., $\beta > \beta_c = \lambda_c \gamma = \frac{\gamma \langle k \rangle}{\langle k^2 \rangle - \langle k \rangle}$) [13]. In all networks, we set $\beta$ to be $\beta_c$ rounded up to the decimal place
corresponding to the first non-zero digit of $\beta_c$ (i.e., if $\beta_c = 0.0024$, $\beta$ is set to 0.003). Each SIR simulation is over
when there are no infected nodes left.
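The threshold, the rounding rule and a single SIR run can be sketched as below. This is a sketch under stated assumptions: the function names are hypothetical, and the simulation uses a synchronous discrete-time update (each infected node tries to infect each susceptible neighbor with probability $\beta$ and then recovers with probability $\gamma$), which is one common choice and not necessarily the scheme used for the experiments reported here.

```python
import math
import random


def critical_beta(degrees, gamma=0.1):
    """beta_c = gamma * <k> / (<k^2> - <k>) from a list of node degrees."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return gamma * k1 / (k2 - k1)


def round_up_first_digit(x):
    """Round x > 0 up at the decimal place of its first non-zero digit."""
    exponent = math.floor(math.log10(x))
    return round(math.ceil(x / 10 ** exponent) * 10 ** exponent, -exponent)


def sir_outbreak_fraction(adj, seed_node, beta, gamma=0.1, rng=random):
    """Run one SIR simulation and return the fraction of nodes ever infected."""
    infected, recovered = {seed_node}, set()
    while infected:  # the run ends when no infected nodes are left
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                susceptible = u not in infected and u not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(u)
        newly_recovered = {v for v in infected if rng.random() < gamma}
        recovered |= newly_recovered
        infected = (infected - newly_recovered) | newly_infected
    return len(recovered) / len(adj)


def spreading_efficiency(adj, node, beta, gamma=0.1, runs=1000, seed=0):
    """Average outbreak fraction over several runs seeded at the same node."""
    rng = random.Random(seed)
    total = sum(sir_outbreak_fraction(adj, node, beta, gamma, rng) for _ in range(runs))
    return total / runs
```

Applied to the example in the text, `round_up_first_digit(0.0024)` gives a value equal to 0.003 up to floating-point error.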
To measure a node's spreading capacity, we resort to the spreading efficiency $M_i$, defined as the proportion of
infected nodes in a simulation when node $i$ is the only initially infected node, averaged over 1000 simulations.
To evaluate the effectiveness of the measures that rank individual spreaders, we use the imprecision function [1],
defined as $\epsilon_m(p) = 1 - \frac{M_m(p)}{M_{\mathrm{eff}}(p)}$, where $p \in (0, 1)$, $M_m(p)$ denotes the sum of the efficiencies of the best $Np$ nodes
(as ranked by the measure $m$) and $M_{\mathrm{eff}}(p)$ denotes the sum of the efficiencies of the $Np$ nodes that are actually the
most efficient. Values of $\epsilon_m(p)$ near 0 for all values of $p$ indicate a good ranking measure, while a low imprecision
for values of $p$ near zero indicates that the measure is effective in identifying the best spreaders.
If the lowest rank among the $Np$ selected nodes is $\alpha$ and $n_\alpha$ of the selected nodes have that rank, we average
the imprecision over 1000 random subsets of $n_\alpha$ elements drawn from the $N_\alpha$ network nodes with rank $\alpha$.
Note that this calculation assumes that SIR simulations with every node of the network as the initially infected
node have been conducted beforehand.
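The imprecision function, including the averaging over tied nodes at the cut-off rank, can be sketched as follows. The function name, the argument names and the choice of truncating $Np$ to an integer (with a minimum of one node) are assumptions; `score` holds the values of the ranking measure and `efficiency` the precomputed spreading efficiencies.

```python
import random


def imprecision(score, efficiency, p, n_samples=1000, seed=0):
    """epsilon_m(p) = 1 - M_m(p) / M_eff(p), averaging over ties at the cut-off."""
    rng = random.Random(seed)
    nodes = sorted(score, key=score.get, reverse=True)
    n_top = max(1, int(p * len(nodes)))  # assumed rounding of Np
    m_eff = sum(sorted(efficiency.values(), reverse=True)[:n_top])

    cutoff = score[nodes[n_top - 1]]                 # lowest selected score
    sure = [v for v in nodes if score[v] > cutoff]   # always selected
    tied = [v for v in nodes if score[v] == cutoff]  # candidates at the cut-off
    need = n_top - len(sure)
    base = sum(efficiency[v] for v in sure)

    if need == len(tied):  # no ambiguity, a single evaluation suffices
        return 1 - (base + sum(efficiency[v] for v in tied)) / m_eff

    total = 0.0
    for _ in range(n_samples):  # random subsets of the tied nodes
        pick = rng.sample(tied, need)
        total += 1 - (base + sum(efficiency[v] for v in pick)) / m_eff
    return total / n_samples
```

A measure that ranks the nodes exactly by their efficiency yields an imprecision of zero for every $p$.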
3 Results and Discussion
Table 1 contains several properties of the explored networks for future reference.

| Network | $N$ | $E$ | $\langle k \rangle$ | $\langle k^2 \rangle$ | $H$ | $r$ | $\langle C \rangle$ | $k_{\max}$ | $k_s^{\max}$ | $\beta_c$ | $\beta$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Jazz | 198 | 2742 | 27.697 | 1070.242 | 38.641 | 0.0202 | 0.617 | 100 | 29 | 0.00266 | 0.003 |
| Yeast | 2224 | 6609 | 5.943 | 98.994 | 16.657 | -0.105 | 0.138 | 64 | 10 | 0.00639 | 0.007 |
| Flights | 2905 | 15645 | 10.771 | 601.453 | 55.840 | 0.0489 | 0.456 | 242 | 28 | 0.00182 | 0.002 |
| Facebook | 4039 | 88234 | 43.691 | 4656.144 | 106.570 | 0.0636 | 0.606 | 1045 | 115 | 0.00095 | 0.001 |

Table 1: Several properties of each network. We record the number of nodes $N$, the number of edges $E$, the first
and second degree moments $\langle k \rangle$ and $\langle k^2 \rangle$, the heterogeneity $H = \langle k^2 \rangle / \langle k \rangle$, the network assortativity
coefficient $r$, the clustering coefficient $\langle C \rangle$, the maximum degree and k-shell $k_{\max}$ and $k_s^{\max}$, the threshold
infection rate $\beta_c$ and the used infection rate $\beta$.
Figure 1: Heatmaps of the average efficiency for each range of $k$ and $k_s$. The range $[k_{\min}, k_{\max}]$ was partitioned
into 10 equally spaced bins on a logarithmic scale and the range $[k_s^{\min}, k_s^{\max}]$ was partitioned into 10 equally spaced
bins on a linear scale. Then, for each resulting two-dimensional bin, the average efficiency of the nodes falling into
the corresponding range of $k$ and k-shell values was computed.
[Panels: Facebook, Yeast, Flights, Jazz; horizontal axis $k_s$, vertical axis $k$ (logarithmic), color scale $M$.]
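The binning behind these heatmaps can be sketched as follows. The function names and dictionary inputs are illustrative assumptions; degrees are assumed to be at least 1 so that the logarithm is defined, and values on the upper edge of a range are placed in the last bin.

```python
import math
from collections import defaultdict


def bin_index(value, lo, hi, n_bins, log_scale=False):
    """Index of the equally spaced bin (linear or logarithmic) containing value."""
    if log_scale:
        value, lo, hi = math.log(value), math.log(lo), math.log(hi)
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum falls in the last bin


def efficiency_heatmap(degree, ks, efficiency, n_bins=10):
    """Average efficiency per (degree bin, k-shell bin) cell; empty cells omitted."""
    k_lo, k_hi = min(degree.values()), max(degree.values())
    s_lo, s_hi = min(ks.values()), max(ks.values())
    sums, counts = defaultdict(float), defaultdict(int)
    for v in degree:
        cell = (
            bin_index(degree[v], k_lo, k_hi, n_bins, log_scale=True),
            bin_index(ks[v], s_lo, s_hi, n_bins),
        )
        sums[cell] += efficiency[v]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

The result maps each non-empty cell to its average efficiency, which is the quantity encoded by the color scale.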
Next, we analyse the relationship between the degree, the k-shell and the corresponding average efficiency in
each network through the heatmaps shown in Figure 1. Our results agree with Kitsak et al. [1] in the sense that a
node's degree and its k-shell value are not perfectly correlated: nodes whose degrees fall in the same bin can be
dispersed among all shells, as shown by the rows that are almost entirely filled with colored cells. While these
figures may suggest that there is greater variability in efficiency for each degree value than for each k-shell value,
given the preponderance of vertical color bands and of color gradients along the rows, there are instances where
variability is also significant within a single k-shell, as happens in the two highest k-shells of all networks. This
indicates that, even though k-shells provide a degree of organization, they do not perfectly determine each node's
efficiency. Moreover, while the heatmaps provide evidence for homogeneity of efficiency within each k-shell, they
do not support the claim that efficiency correlates well with the k-shell. For instance, in the Yeast network, the
heatmap counters the claim that the innermost k-shell contains the most efficient spreaders.
Figure 2: Average efficiency of the $0.05N$ nodes with the highest MDD values for each value of $\lambda$ in
$\{0, 0.1, 0.2, \ldots, 0.9, 1.0\}$.
[Panels: Jazz, Flights, Facebook, Yeast; horizontal axis $\lambda$, vertical axis $M(0.05)$.]
In Figure 2, we analyze the average efficiency of the $0.05N$ nodes with the highest MDD values as a function of
$\lambda$. We first note that the optimal value of $\lambda$ differs across the various networks, which confirms its dependency
on the network topology. We highlight the fact that, while the Jazz and Facebook networks exhibit similar
assortativities, clustering coefficients and network heterogeneities, and have first and second degree moments of
the same order of magnitude, the corresponding optimal $\lambda$ values differ by 0.5 (the largest difference between any
two networks). This shows that it is difficult to determine the optimal $\lambda$ a priori solely based on a network's
characteristics.
Moreover, we can also conclude that, for the Jazz network, a measure that assigns more weight to the degree
than to the k-shell (i.e., one that is closer to the usual degree) is more efficient, while a measure that is closer to
the k-shell is more efficient for the three other networks, given that more weight is given to the exhausted degree.
Overall, this analysis suggests that a combined approach considering both the exhausted and residual degree
leads to enhanced efficiency.
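The parameter sweep used in this analysis can be expressed generically: for each candidate parameter value, compute the scores, take the top fraction of nodes and average their efficiencies. The sketch below assumes a hypothetical callable `score_fn` that maps a parameter value to a dictionary of node scores (for instance, an MDD implementation with the graph already bound); the names and the truncation of $0.05N$ to an integer are illustrative choices.

```python
def top_fraction_efficiency(score, efficiency, p=0.05):
    """Average efficiency of the top p*N nodes according to score."""
    n_top = max(1, int(p * len(score)))
    top = sorted(score, key=score.get, reverse=True)[:n_top]
    return sum(efficiency[v] for v in top) / n_top


def grid_search(score_fn, grid, efficiency, p=0.05):
    """Evaluate every parameter value in grid and return (best, all results)."""
    results = {
        param: top_fraction_efficiency(score_fn(param), efficiency, p)
        for param in grid
    }
    best = max(results, key=results.get)
    return best, results


# The grid used in the text: 0, 0.1, ..., 1.0.
LAMBDA_GRID = [i / 10 for i in range(11)]
```

Ties between nodes with equal scores are broken here by the iteration order of the dictionary, which is a simplification.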
We proceed in a similar fashion to find the optimal values of $(\beta, \gamma)$ for the MCDE measure in each network,
this time considering all possible combinations of $\beta$ and $\gamma$ in
$\{0, 0.1, 0.2, \ldots, 0.9, 1.0\} \times \{0, 0.1, 0.2, \ldots, 0.9, 1.0\}$.
A notable observation from Figure 3 is that the parameter $\beta$ plays a pivotal role in determining the efficiency of
the MCDE measure within these networks, since the efficiency values remain relatively consistent for a fixed value
of $\beta$. However, $\gamma$ is still a relevant parameter, since the optimal configurations have $\gamma$ different from 0 (the case
$\gamma = 0$ corresponds in practice to the MDD), with the exception of the one found for the Jazz network. This also
shows that determining the parameters a priori, without performing simulations, remains a daunting task.
Figure 3: Average efficiency of the $0.05N$ nodes with the highest MCDE values for all pairs of possible values for
$(\beta, \gamma)$ in $\{0, 0.1, 0.2, \ldots, 0.9, 1.0\} \times \{0, 0.1, 0.2, \ldots, 0.9, 1.0\}$.
[Panels: Facebook, Yeast, Flights, Jazz; both axes show parameter values from 0 to 10 in units of 0.1, color scale $M$.]
Figure 4: Imprecision function of all considered 8 measures for values of $p$ in $\{0, 0.1, 0.2, \ldots, 0.9, 1.0\}$.
[Panels: Jazz, Flights, Yeast, Facebook; horizontal axis $p$, vertical axis $\epsilon(p)$; legend: k-shell, degree, mdd,
improved k-shell, coreness, extended coreness, weighted neighborhood, mcde.]
We now delve into the imprecision function plots for all eight considered measures. We first note that, in the
Facebook dataset (where the optimal $\lambda$ parameter of the MDD measure was the highest), measures that solely
consider node degree consistently underperform compared to the others, irrespective of the value of $p$. This
suggests that there are structural properties of these networks that link efficiency with either the node degree or
the k-shell, giving rise to a bias toward a subset of measures.
Furthermore, it is worth highlighting that the measures which consistently perform best across all datasets are
those that take into account both degree and k-shell centrality. Specifically, the extended coreness measure stands
out as the top performer for all datasets except Jazz, where it still manages to achieve the lowest imprecision values
for $p \leq 0.02$. This underscores the advantage of considering a combination of degree and coreness measures to
assess node efficiency.
Nevertheless, our analysis does not provide significant evidence regarding the impact of considering the
dispersion of neighbors among cores, as the MCDE both outperforms and underperforms MDD depending on
the dataset.
Lastly, while it remains uncertain whether weighted neighborhood centrality surpasses vanilla centrality, these
results suggest that, in the context of identifying efficient spreaders, it is more advantageous to consider the coreness
of a broader neighborhood rather than explicitly modeling the diffusion importance of edges. Additionally, the
weighted neighborhood centrality appears to find a "sweet spot" by automatically combining degree and coreness
without the need for parameter tuning.
4 Concluding Remarks
In this paper, we conducted a survey on some of the most cited papers on how to identify influential nodes
in complex networks, especially the ones based on topology-based measures. In addition, we provided some
experiments and comparisons between them, given four distinct networks.
From this study, it is possible to conclude that, while the performance of some metrics is reliant on the scenario
(i.e. characteristics of the network in question), some methods’ results seem to consistently exceed others. That is,
algorithms such as the extended coreness, that take into consideration not only the location of nodes in the network,
but also the properties of their neighbors, achieve higher spreading rates on average than those that consider only
the first of the two. These methods also appear to bypass structural biases that networks have in favouring degree
or k-shell as a discriminative feature of spreading efficiency.
Looking forward, it will be interesting to observe in which novel ways researchers will be able to obtain better
results than those mentioned in this paper. Another equally important path forward is the improvement of the
computational complexity of the studied methods, in order for them to become feasible on networks that are many
orders of magnitude larger.
Finally, we expect our work to provide a good comprehensive description of the state-of-the-art, as well as a
framework for past and future works to transparently compare against each other, in diverse types of networks.
Hopefully, this will propel forward this area of research, which has been garnering more and more interest, due to
phenomena such as social networks, and the recent viral epidemics.
References
[1] Maksim Kitsak et al. “Identification of influential spreaders in complex networks”. In: Nature Physics 6.11
(Aug. 2010), pp. 888–893. doi: 10.1038/nphys1746. url: https://doi.org/10.1038%2Fnphys1746.
[2] Christian M. Schneider, Tamara Mihaljev, Shlomo Havlin, and Hans J. Herrmann. “Suppressing epidemics
with a limited amount of immunization units”. In: Phys. Rev. E 84 (6 Dec. 2011), p. 061911. doi:
10.1103/PhysRevE.84.061911. url: https://link.aps.org/doi/10.1103/PhysRevE.84.061911.
[3] An Zeng and Cheng-Jun Zhang. “Ranking spreaders by decomposing complex networks”. In: Physics Letters
A 377.14 (2013), pp. 1031–1035. issn: 0375-9601. doi: 10.1016/j.physleta.2013.02.039.
url: https://www.sciencedirect.com/science/article/pii/S0375960113002260.
[4] Jian-Guo Liu, Zhuo-Ming Ren, and Qiang Guo. “Ranking the spreading influence in complex networks”. In:
Physica A: Statistical Mechanics and its Applications 392.18 (2013), pp. 4154–4159. issn: 0378-4371. doi:
10.1016/j.physa.2013.04.037. url:
https://www.sciencedirect.com/science/article/pii/S0378437113003506.
[5] Joonhyun Bae and Sangwook Kim. “Identifying and ranking influential spreaders in complex networks by
neighborhood coreness”. In: Physica A: Statistical Mechanics and its Applications 395 (2014), pp. 549–559. issn:
0378-4371. doi: 10.1016/j.physa.2013.10.047. url:
https://www.sciencedirect.com/science/article/pii/S0378437113010406.
[6] Amir Sheikhahmadi and Mohammad Ali Nematbakhsh. “Identification of multi-spreader users in social
networks for viral marketing”. In: Journal of Information Science 43.3 (2017), pp. 412–423. doi:
10.1177/0165551516644171. url: https://doi.org/10.1177/0165551516644171.
[7] Junyi Wang, Xiaoni Hou, Kezan Li, and Yong Ding. “A novel weight neighborhood centrality algorithm for
identifying influential spreaders in complex networks”. In: Physica A: Statistical Mechanics and its
Applications 475 (2017), pp. 88–105. issn: 0378-4371. doi: 10.1016/j.physa.2017.02.007.
url: https://www.sciencedirect.com/science/article/pii/S0378437117301218.
[8] Claude Elwood Shannon. “A mathematical theory of communication”. In: The Bell system technical journal
27.3 (1948), pp. 379–423.
[9] Pablo M. Gleiser and Leon Danon. “Community structure in jazz”. In: Advances in Complex
Systems 06.04 (Dec. 2003), pp. 565–573. doi: 10.1142/s0219525903001067. url:
https://doi.org/10.1142%2Fs0219525903001067.
[10] D. Bu et al. “Topological structure analysis of the protein-protein interaction network in budding yeast”. In:
Nucleic acids research 31.9 (2003), pp. 2443–50. issn: 1362-4962. doi: 10.1093/nar/gkg340. url:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC154226/.
[11] Ryan A. Rossi and Nesreen K. Ahmed. “The Network Data Repository with Interactive Graph Analytics and
Visualization”. In: AAAI. 2015. url: https://networkrepository.com.
[12] Jure Leskovec and Julian Mcauley. “Learning to Discover Social Circles in Ego Networks”. In: Advances in
Neural Information Processing Systems. Ed. by F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger. Vol. 25.
Curran Associates, Inc., 2012. url:
https://proceedings.neurips.cc/paper_files/paper/2012/file/7a614fd06c325499f1680b9896beedebPaper.pdf.
[13] Albert-László Barabási. “Network science”. In: Philosophical Transactions of the Royal Society A: Mathematical,
Physical and Engineering Sciences 371.1987 (2013), p. 20120375.
[14] Linyuan Lü, Tao Zhou, Qian-Ming Zhang, and H. Eugene Stanley. “The H-index of a network node and its
relation to degree and coreness”. In: Nature Communications 7.10168 (Jan. 2016). doi: 10.1038/ncomms10168.
url: https://doi.org/10.1038/ncomms10168.
[15] Tian Bian and Yong Deng. “Identifying influential nodes in complex networks: A node information
dimension approach”. In: Chaos: An Interdisciplinary Journal of Nonlinear Science 28.4 (Apr. 2018), p. 043109.
issn: 1054-1500. doi: 10.1063/1.5030894. eprint:
https://pubs.aip.org/aip/cha/articlepdf/doi/10.1063/1.5030894/10314679/043109_1_online.pdf. url:
https://doi.org/10.1063/1.5030894.
[16] Ahmad Zareie, Amir Sheikhahmadi, and Mahdi Jalili. “Influential node ranking in social networks based
on neighborhood diversity”. In: Future Generation Computer Systems 94 (2019), pp. 120–129. issn: 0167-739X.
doi: 10.1016/j.future.2018.11.023. url:
https://www.sciencedirect.com/science/article/pii/S0167739X18319009.
[17] Min Wang, Wanchun Li, Yuning Guo, Xiaoyan Peng, and Yingxiang Li. “Identifying influential spreaders in
complex networks based on improved k-shell method”. In: Physica A: Statistical Mechanics and its Applications
554 (2020), p. 124229. issn: 0378-4371. doi: 10.1016/j.physa.2020.124229. url:
https://www.sciencedirect.com/science/article/pii/S0378437120300558.
[18] Lei Guo, Jian-Hong Lin, Qiang Guo, and Jian-Guo Liu. “Identifying multiple influential spreaders in term
of the distance-based coloring”. In: Physics Letters A 380.7 (2016), pp. 837–842. issn: 0375-9601. doi:
10.1016/j.physleta.2015.12.031. url:
https://www.sciencedirect.com/science/article/pii/S0375960115010671.
[19] Ying Liu, Ming Tang, Tao Zhou, and Do Younghae. “Core-like groups result in invalidation of identifying
super-spreader by k-shell decomposition”. In: Scientific Reports 5.9602 (2015). issn: 2045-2322. doi:
10.1038/srep09602.