Approximation Algorithms for Betweenness Centrality

Elisabetta Bergamini
Karlsruhe Institute of Technology (KIT) · Institute of Theoretical Informatics · Parallel Computing Group
KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
www.kit.edu
Introduction | Betweenness centrality
BC: participation of nodes in the shortest paths of the network
Nodes with high betweenness lie in many shortest paths between pairs of nodes
Given G = (V, E) and v ∈ V:
$$c_B(v) = \sum_{\substack{s,t \in V \\ s \neq v \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where:
σst = number of shortest paths between s and t
σst(v) = number of shortest paths between s and t that go through v
Elisabetta Bergamini – Approximation Algorithms for Betweenness Centrality
[geoidin.wordpress.com]
Recall | Brandes’s algorithm
For each node s:
First step
SSSP (Dijkstra or BFS) from s
While visiting nodes, we also keep track of the number of shortest paths and the predecessors
Second step
Sort nodes by decreasing distance from s
For each node v, compute the dependency as
$$\delta_s(v) = \sum_{w \in \mathrm{succ}(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + \delta_s(w)\right)$$
Add δs(v) to cB(v)
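The two steps above can be sketched in Python for the unweighted case. This is a minimal illustration, not the reference implementation: the function name `brandes_betweenness` and the adjacency-dictionary input format are assumptions made for this example, and pairs (s, t) are counted as ordered pairs, matching the definition of cB(v) given earlier.

```python
from collections import deque


def brandes_betweenness(adj):
    """Exact betweenness for an unweighted graph given as {node: [neighbors]}.

    Hypothetical helper for illustration; sums over ordered pairs (s, t).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # First step: BFS from s, counting shortest paths (sigma)
        # and recording the predecessors of every node.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []  # nodes in order of non-decreasing distance from s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Second step: visit nodes by decreasing distance and
        # accumulate the dependencies delta_s(v).
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the path graph 0 - 1 - 2 - 3, the two inner nodes each lie on four ordered shortest paths and the two end nodes on none.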
Approximation algorithms
Complexity of Brandes’s algorithm: O(nm) for unweighted graphs and O(n(n log n + m)) for weighted graphs
This is too expensive for graphs with millions or billions of edges, hence approximation
What do we want from an approximation algorithm?
It should give us an unbiased estimator for the betweenness of each node v: E(c̃B(v)) = cB(v)
Ideally, it should give us some guarantee on the quality of the approximation
Absolute error guarantee: cB(v) − ε ≤ c̃B(v) ≤ cB(v) + ε
Relative error guarantee: cB(v)/ρ ≤ c̃B(v) ≤ cB(v) · ρ
A simple approximation [Brandes and Pich, 2006]
Choose a set S = {s1, ..., sk} ⊆ V of k source nodes
Each si is chosen uniformly at random in V, i.e. P(si = v) = 1/n ∀v ∈ V
For each si and for each node v, we compute δsi(v) as in Brandes’s algorithm
$$\tilde{c}_B(v) = \frac{n}{k} \sum_{s_i \in S} \delta_{s_i}(v)$$
[Figure: example graph with node v and sampled sources s1, s2, s3, s4]
cB(v) = 12 · 7 + 2 · 18 + 5 · 14 = 190
c̃B(v) = (3 · 7 + 18) · 20/4 = 195
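A minimal sketch of this sampling estimator for unweighted graphs follows. The names `dependencies` and `brandes_pich`, the adjacency-dictionary input, and sampling with replacement through a caller-supplied `random.Random` are all assumptions made for this example.

```python
import random
from collections import deque


def dependencies(adj, s):
    """Dependencies delta_s(v) of source s on every node v (BFS, unweighted)."""
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return delta


def brandes_pich(adj, k, rng):
    """Estimate betweenness from k sources drawn uniformly at random."""
    n = len(adj)
    nodes = list(adj)
    est = {v: 0.0 for v in adj}
    for _ in range(k):
        s = rng.choice(nodes)  # uniform over V, with replacement
        delta = dependencies(adj, s)
        for v in adj:
            if v != s:  # a source contributes nothing to itself
                est[v] += n / k * delta[v]
    return est
```

Calling `brandes_pich(adj, 50, random.Random(0))` returns one estimate per node. Different seeds give different estimates, since the result depends on which sources are drawn.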
A simple approximation [Brandes and Pich, 2006]
Unbiased estimator
k = 1 (only 1 source node)
$$E(\tilde{c}_B(v)) = \sum_{s \in V} \frac{1}{n} \cdot n \cdot \delta_s(v) = c_B(v)$$
With k sources, we take the average of the n · δs(v) over the sources s ∈ S
The average of the expectations is the expectation of the average
Hence c̃B(v) is an unbiased estimator
A simple approximation [Brandes and Pich, 2006]
Unfortunately, the approach has a major limitation: overestimation of neighbors of degree-1 nodes
Consider a degree-1 node w, with neighbor v
If w is sampled, the betweenness of node v will be overestimated
[Figure: degree-1 node w attached to its neighbor v]
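To get a feeling for the size of the effect, here is a short worked bound. It assumes a connected undirected graph with n ≥ 3 nodes, which the slide does not state explicitly. Every shortest path that starts at w must pass through v, so:

```latex
\delta_w(v) = \sum_{t \in V \setminus \{w, v\}} \frac{\sigma_{wt}(v)}{\sigma_{wt}} = n - 2,
\qquad
\frac{n}{k} \cdot \delta_w(v) = \frac{n (n - 2)}{k}
```

With k = 1 the estimate for v is n(n − 2), which is larger than (n − 1)(n − 2), the largest value cB(v) can take when summing over ordered pairs.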
A simple approximation [Brandes and Pich, 2006]
[Figure: the same example graph with node v and sampled sources s1, s2, s3, s4, s5]
cB(v) = 12 · 7 + 2 · 18 + 5 · 14 = 190
c̃B(v) = (3 · 7 + 18 + 18) · 20/5 = 228
A new approach (GSS) [Geisberger et al., 2008]
A Generalized Framework for Betweenness Approximation
Length function l : E → R
For a path P = <e1, ..., ek>, let $l(P) := \sum_{i=1}^{k} l(e_i)$
Scaling function f : [0, 1] → [0, 1]
Let P be of the form P = <s, ..., v, ..., t> and let Psv = <s, ..., v>
We define a scaled contribution δP(v) as
$$\delta_P(v) := \frac{f(l(P_{sv})/l(P))}{\sigma_{st}}$$
A new approach (GSS) [Geisberger et al., 2008]
A Generalized Framework for Betweenness Approximation
Given a shortest path P = <s, ..., v, ..., t>, we call P′ the transposed path, i.e. P′ = <t, ..., v, ..., s>, which is a shortest path in the transposed graph G′
The scaled contribution for v in P′ is
$$\delta_{P'}(v) := \frac{1 - f(l(P_{sv})/l(P))}{\sigma_{st}}$$
[Figure: path from s through v to t, and its transposed path P′]
A new approach (GSS) [Geisberger et al., 2008]
For each of the k samples:
Sample a node x ∈ V uniformly at random
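With probability 1/2 run a forward SSSP from x, otherwise a backward SSSP from x (an SSSP on G′)
Forward search:
$$\delta_x^{(f)}(v) := \sum_{y \in V} \sum \{\delta_P(v) : P \in SP_{xy}(v)\}$$
Backward search:
$$\delta_x^{(b)}(v) := \sum_{y \in V} \sum \{\delta_{P'}(v) : P \in SP_{yx}(v)\}$$
$$\tilde{c}_B(v) = \frac{2n}{k} \sum_{x \in S} \delta_x(v) = \frac{2n}{k} \sum_{x \in S} \begin{cases} \delta_x^{(f)}(v) & \text{if forward} \\ \delta_x^{(b)}(v) & \text{if backward} \end{cases}$$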
A new approach (GSS) [Geisberger et al., 2008]
Example:
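Constant f(x) = 1/2
$$\delta_P(v) := \frac{f(l(P_{sv})/l(P))}{\sigma_{st}} = \frac{1}{2} \cdot \frac{1}{\sigma_{st}}$$
Forward contribution of a sampled node x:
$$\delta_x(v) := \sum_{y \in V} \sum \{\delta_P(v) : P \in SP_{xy}(v)\} = \frac{1}{2} \cdot \sum_{y \in V} \frac{\sigma_{xy}(v)}{\sigma_{xy}}$$
This is the algorithm by Brandes and Pich!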
(only difference: forward and backward searches)
A new approach (GSS) [Geisberger et al., 2008]
Example:
f (x) = x
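$$\delta_P(v) := \frac{l(P_{sv})/l(P)}{\sigma_{st}} = \frac{d(s,v)/d(s,t)}{\sigma_{st}}$$
It can be proven that in this case:
$$\delta_x(v) := \sum_{w \in \mathrm{succ}_x(v)} \frac{d(x,v)}{d(x,w)} \cdot \frac{\sigma_{xv}}{\sigma_{xw}} \left(1 + \delta_x(w)\right)$$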
Same procedure as Brandes and Pich, but with scaled contributions
Nodes close to the sampled node get less weight
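The linear-scaling recursion can be sketched as follows for unweighted, undirected graphs. The names `linear_scaled_dependencies` and `gss_linear` and the adjacency-dictionary input are assumptions made for this example. The sketch also relies on an observation that holds only under these assumptions: with f(x) = x, the backward contribution 1 − d(s, v)/d(s, t) equals d(v, t)/d(s, t), so on an undirected graph the forward and backward searches give the same values and the coin flip is omitted.

```python
import random
from collections import deque


def linear_scaled_dependencies(adj, x):
    """Scaled dependencies delta_x(v) for f(x) = x (BFS, unweighted)."""
    dist = {x: 0}
    sigma = {v: 0 for v in adj}
    sigma[x] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([x])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for v in preds[w]:
            # Same accumulation as Brandes, scaled by d(x, v) / d(x, w).
            scale = dist[v] / dist[w]
            delta[v] += scale * sigma[v] / sigma[w] * (1.0 + delta[w])
    return delta  # delta[x] stays 0 because d(x, x) = 0


def gss_linear(adj, k, rng):
    """Estimate betweenness with linear scaling from k sampled nodes."""
    n = len(adj)
    nodes = list(adj)
    est = {v: 0.0 for v in adj}
    for _ in range(k):
        x = rng.choice(nodes)
        delta = linear_scaled_dependencies(adj, x)
        for v in adj:
            est[v] += 2 * n / k * delta[v]
    return est
```

As a sanity check, summing 2 · δx(v) over all nodes x of a small graph recovers cB(v) exactly, which is the unbiasedness property in the special case where every node is used once.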
A new approach (GSS) [Geisberger et al., 2008]
[Figure: example graph with node v and sampled sources s1, s2, s3, s4, s5]
cB(v) = 12 · 7 + 2 · 18 + 5 · 14 = 190
δs1(v) = 3 · (3/4) + 3 · (3/5) + 3/6 = 4.55
δs2(v) = 3 · (1/2) + 3 · (1/3) + 1/4 = 2.75
c̃B(v) = (4.55 + 2.75 + ...) · (2 · 20)/5 = 193.47
A new approach (GSS) [Geisberger et al., 2008]
Unbiased estimator
k = 1 (only 1 source node)
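$$\begin{aligned}
E(\tilde{c}_B(v)) &= E(2n \cdot \delta_x(v)) \\
&= 2n \cdot \frac{\sum_{s \in V} \delta_s^{(f)}(v) + \sum_{t \in V} \delta_t^{(b)}(v)}{2n} \\
&= \sum_{s,t \in V} \sum \left\{ \frac{f\left(\frac{l(P_{sv})}{l(P)}\right) + 1 - f\left(\frac{l(P_{sv})}{l(P)}\right)}{\sigma_{st}} : P \in SP_{st}(v) \right\} \\
&= \sum_{s,t \in V} \frac{|SP_{st}(v)|}{\sigma_{st}} \\
&= \sum_{s,t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
&= c_B(v)
\end{aligned}$$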
RK algorithm [Riondato and Kornaropoulos, 2014]
A set of r shortest paths between vertex pairs (si, ti), i = 1, ..., r, is sampled
c̃B(v): fraction of sampled paths that go through v
[Figure: three sampled shortest paths between the pairs (s1, t1), (s2, t2), (s3, t3); each path adds 1/3 to every node it goes through]
Each shortest path pst between s and t must be sampled with probability
$$P = \frac{1}{n(n-1)} \cdot \frac{1}{\sigma_{st}}$$
RK algorithm | Paths sampling
Sample a vertex pair (s, t) uniformly at random (n(n − 1) pairs)
Extended SSSP from s → distances + number of shortest paths + list of predecessors
Starting from t, select a predecessor z with probability σz/σt (the numbers of shortest paths from s to z and from s to t)
Repeat this until we reach s
Every shortest path between s and t has the same probability of being sampled
[Figure: node t with predecessors z1, z2, z3 on the shortest paths from s]
P(z1) = 2/4, P(z2) = 1/4, P(z3) = 1/4
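The path-sampling steps can be sketched as follows for unweighted graphs. The names `sample_shortest_path` and `rk_estimate` and the adjacency-dictionary input are assumptions made for this example. For simplicity the sketch runs a full BFS instead of stopping once t is reached, and it counts only the inner nodes of each path, in line with the condition s ≠ v ≠ t in the definition of cB(v). The returned fractions estimate the betweenness divided by n(n − 1); multiplying by n(n − 1) gives the scale used for cB(v) in these slides.

```python
import random
from collections import deque


def sample_shortest_path(adj, s, t, rng):
    """Sample one shortest s-t path uniformly at random; None if unreachable."""
    dist = {s: 0}
    sigma = {s: 1}
    preds = {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    if t not in dist:
        return None
    # Walk back from t, choosing predecessor z with probability sigma_z / sigma_t.
    path = [t]
    while path[-1] != s:
        cands = preds[path[-1]]
        z = rng.choices(cands, weights=[sigma[c] for c in cands])[0]
        path.append(z)
    path.reverse()
    return path


def rk_estimate(adj, r, rng):
    """Fraction of r sampled shortest paths that pass through each node."""
    nodes = list(adj)
    est = {v: 0.0 for v in adj}
    for _ in range(r):
        s, t = rng.sample(nodes, 2)  # ordered pair of distinct nodes
        path = sample_shortest_path(adj, s, t, rng)
        if path is None:
            continue
        for v in path[1:-1]:  # inner nodes only
            est[v] += 1.0 / r
    return est
```

On a path graph there is a single shortest path per pair, so `sample_shortest_path` is deterministic there; on graphs with several shortest paths per pair the result depends on the random generator.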
RK algorithm [Riondato, Kornaropoulos 2014]
Unbiased estimator
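Given a path P, let δP(v) = 1 if v ∈ P, 0 otherwise
k = 1 (only 1 sampled path)
$$\begin{aligned}
E(\tilde{c}_B(v)) &= E(n(n-1) \cdot \delta_P(v)) \\
&= n(n-1) \sum_{\substack{s,t \in V \\ s \neq t}} \sum_{P \in SP_{st}} \frac{1}{n(n-1)} \cdot \frac{1}{\sigma_{st}} \cdot \delta_P(v) \\
&= \sum_{\substack{s,t \in V \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
&= c_B(v)
\end{aligned}$$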
RK algorithm [Riondato, Kornaropoulos 2014]
Absolute error guarantee
Given two arbitrary numbers ε and δ, it is possible to prove that, if the number of sampled paths is at least
$$r = \frac{c}{\varepsilon^2} \left( \lfloor \log_2(VD - 2) \rfloor + 1 + \ln\frac{1}{\delta} \right)$$
then cB(v) − ε ≤ c̃B(v) ≤ cB(v) + ε with probability at least 1 − δ
VD = vertex diameter: number of nodes in the shortest path with the maximum number of nodes
Unweighted graphs: same as the shortest path with maximum weight; weighted graphs: unrelated
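The bound is easy to evaluate numerically. The sketch below is only an illustration: the function name `rk_sample_size` is made up for this example, and the default c = 0.5 is an assumption, since the slide leaves the constant c unspecified.

```python
import math


def rk_sample_size(eps, delta, vertex_diameter, c=0.5):
    """Number of sampled paths required by the bound above, rounded up.

    c is the constant from the bound; 0.5 is an assumed default.
    """
    if vertex_diameter < 3:
        raise ValueError("the bound needs a vertex diameter of at least 3")
    log_term = math.floor(math.log2(vertex_diameter - 2))
    return math.ceil(c / eps ** 2 * (log_term + 1 + math.log(1 / delta)))
```

The sample size grows with 1/ε² but only logarithmically with the vertex diameter and with 1/δ, and it does not depend directly on the number of nodes.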
To summarize...
BP algorithm is simple, but it tends to overestimate nodes close to a sampled source
GSS solves this problem by giving “less importance” to nodes that are close to the source
RK samples single paths instead of source nodes
This allows us to prove a theoretical guarantee
Each sample of RK is easier to compute (the SSSP from s can stop once t is reached, and no dependencies are computed)
However, in practice GSS works better, because it uses more information from each SSSP
Moral of the story: what can be proved to work well is not always the best in practice :-)