Ranking in Networks
http://en.wikipedia.org/wiki/Centrality
Question: Given a "communication network" N, how can we discover its important nodes? How should the importance of the members of the network be defined?
The answer may help in discovering the
• most influential member(s) of a social network;
• key infrastructure nodes in an urban network;
• super-spreaders of disease;
• .........
Polling the members directly is neither efficient nor accurate.
Examples of networks:
• The Web;
• The protein network;
• The network of scientific cooperation;
• .........
The answer is formalized through the notion of centrality, a real-valued function on the nodes of a graph.
The values of this function induce a ranking that identifies the most important nodes. However, the ranking is often meaningless among the less important nodes.
The word "importance" has many meanings, which leads to many different definitions of centrality.
What are the network features that characterize the importance
of a node in a network?
Degree centrality
The degree of a node can be interpreted as its chance of catching whatever is flowing through the network (such as a virus, or some information).
In a directed network (where ties have direction), two separate measures of degree centrality are defined: the in-degree indeg(v) and the out-degree outdeg(v).
The in-degree is interpreted as a measure of popularity; the out-degree as a measure of social involvement.
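As a minimal sketch, assuming the directed network is stored as a list of (source, target) edges, both degree measures can be counted in a single pass. The function name `degree_centralities` and the toy edge list are hypothetical.

```python
from collections import Counter

def degree_centralities(edges):
    """Return (indeg, outdeg) dicts for a directed edge list of (u, v) pairs (hypothetical format)."""
    indeg, outdeg = Counter(), Counter()
    nodes = set()
    for u, v in edges:          # edge u -> v
        outdeg[u] += 1
        indeg[v] += 1
        nodes.update((u, v))
    # List every node explicitly, including those with zero in- or out-degree.
    return ({x: indeg[x] for x in nodes}, {x: outdeg[x] for x in nodes})

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
indeg, outdeg = degree_centralities(edges)
print(sorted(indeg.items()))   # [('a', 1), ('b', 1), ('c', 2)]
print(sorted(outdeg.items()))  # [('a', 2), ('b', 1), ('c', 1)]
```

Here c is the most "popular" node and a the most "socially involved" one.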
Graph Centralization.
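Let G be a connected graph with n ≥ 3 nodes, and let ∆(G) denote the highest degree centrality in G. For a connected graph X on n nodes, let ∆(X) be the highest degree in X, and define
$$H = \max_{X} \sum_{x \in V(X)} \bigl(\Delta(X) - \deg(x)\bigr),$$
where the maximum ranges over all connected graphs X with n nodes.
The degree centralization of the graph G is defined as follows:
$$C(G) = \frac{\sum_{v \in V(G)} \bigl(\Delta(G) - \deg(v)\bigr)}{H}.$$
The maximum defining H is attained when X contains one central node to which all other nodes are connected (a star graph); in this case H = (n − 1)(n − 2). Hence 0 ≤ C(G) ≤ 1, and C(G) = 1 for a star.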
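A small sketch of C(G) using the star-graph value H = (n − 1)(n − 2), assuming an undirected graph given as a dict mapping each node to the set of its neighbours (a hypothetical input format; `degree_centralization` is an illustrative name).

```python
def degree_centralization(adj):
    """Degree centralization C(G) of a connected undirected graph.

    adj maps each node to the set of its neighbours (hypothetical format).
    Uses the star-graph normaliser H = (n - 1)(n - 2), so n must be at least 3.
    """
    n = len(adj)
    if n < 3:
        raise ValueError("C(G) is undefined for fewer than 3 nodes")
    degrees = [len(nbrs) for nbrs in adj.values()]
    max_deg = max(degrees)                      # Delta(G)
    numerator = sum(max_deg - d for d in degrees)
    return numerator / ((n - 1) * (n - 2))

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(degree_centralization(star))  # 1.0
print(degree_centralization(path))  # 0.3333333333333333
```

The star reaches the maximum value 1, while a path with the same number of nodes is far less centralized; a cycle, where all degrees are equal, gives 0.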
Closeness centrality
In connected graphs, dist(x, y) denotes the length of a shortest path from x to y.
The farness of x is defined as
$$\mathrm{farness}(x) = \sum_{y \neq x} \mathrm{dist}(x, y).$$
The closeness of x is defined as
$$\mathrm{cl}(x) = \frac{1}{\mathrm{farness}(x)}.$$
If G is disconnected and vertices x and y belong to different connected components, dist(x, y) = ∞.
When a graph is not strongly connected and no path leads from y to x, we set dist(y, x) = ∞ and use the sum of the reciprocals of the distances, instead of the reciprocal of the sum of the distances, with the convention 1/∞ = 0:
$$H(x) = \sum_{y \neq x} \frac{1}{\mathrm{dist}(y, x)}.$$
For undirected graphs, the notion is known as harmonic centrality.
A variation of the notion is defined as
$$D(x) = \sum_{y \neq x} \frac{1}{2^{\mathrm{dist}(y, x)}},$$
where, again, unreachable vertices contribute 1/2^∞ = 0.
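For unweighted undirected graphs all three quantities follow from one breadth-first search per vertex, since dist(y, x) = dist(x, y). The sketch below assumes a dict mapping each node to its neighbours (a hypothetical format) and applies the conventions above: unreachable vertices make the farness infinite (so cl(x) = 0) and contribute 0 to H(x) and D(x). For directed graphs one would run the BFS on the reversed graph to obtain dist(y, x).

```python
from collections import deque
import math

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source; unreachable nodes are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness_measures(adj, x):
    """Return (cl(x), H(x), D(x)) for node x of an unweighted undirected graph."""
    dist = bfs_distances(adj, x)
    others = [y for y in adj if y != x]
    # Any unreachable node makes the farness infinite.
    if any(y not in dist for y in others):
        farness = math.inf
    else:
        farness = sum(dist[y] for y in others)
    cl = 1 / farness if 0 < farness < math.inf else 0.0
    # Conventions 1/inf = 0 and 2^-inf = 0: unreachable nodes are simply skipped.
    harmonic = sum(1 / dist[y] for y in others if y in dist)
    dangalchev = sum(2.0 ** -dist[y] for y in others if y in dist)
    return cl, harmonic, dangalchev

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(closeness_measures(path, 1))           # (0.25, 2.5, 1.25)
with_isolated = {**path, 4: set()}
print(closeness_measures(with_isolated, 1))  # (0.0, 2.5, 1.25)
```

Adding an isolated vertex drives the classical closeness to 0, while H(x) and D(x) are unaffected, which is why the reciprocal-sum variants are preferred on disconnected graphs.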
Betweenness centrality.
Betweenness is a centrality measure of a vertex within a graph.
It quantifies the number of times a node acts as a bridge along the shortest paths between two other nodes.
It was introduced as a measure of the control a person exerts over the communication between other people in a social network. Vertices that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.
The betweenness of a vertex v in a graph G = (V, E) is computed
as follows:
1. For each pair of vertices (x, y), compute the shortest paths
between them.
2. For each pair of vertices (x, y), determine the fraction of shortest paths that pass through the vertex v.
3. Sum this fraction over all pairs of vertices (x, y).
$$C_B(v) = \sum_{x \neq v \neq y \in V} \frac{\sigma_{xy}(v)}{\sigma_{xy}},$$
where $\sigma_{xy}$ is the total number of shortest paths from node x to node y and $\sigma_{xy}(v)$ is the number of those paths that pass through v.
The betweenness may be normalised by dividing by the number of pairs of vertices not including v, which is (n − 1)(n − 2) for directed graphs and (n − 1)(n − 2)/2 for undirected graphs.
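The three steps can be followed literally for unweighted graphs. In the sketch below, step 1 is a BFS from every vertex that records distances and shortest-path counts σ; step 2 uses the fact that v lies on a shortest x-y path exactly when dist(x, v) + dist(v, y) = dist(x, y), in which case σ_xy(v) = σ_xv · σ_vy; step 3 sums the fractions. The adjacency-dict format and function names are hypothetical, and the sum runs over ordered pairs, so for undirected graphs it is halved to count each pair once.

```python
from collections import deque

def bfs_counts(adj, source):
    """BFS from source: hop distances and numbers of shortest paths (sigma)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:                 # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:        # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, directed=False, normalized=False):
    """C_B(v) computed directly from its definition for an unweighted graph."""
    # Step 1: shortest-path distances and counts from every vertex.
    info = {s: bfs_counts(adj, s) for s in adj}
    n = len(adj)
    cb = {}
    for v in adj:
        dist_v, sigma_v = info[v]
        total = 0.0
        for x in adj:
            dist_x, sigma_x = info[x]
            for y in adj:
                if len({x, v, y}) < 3 or v not in dist_x or y not in dist_v:
                    continue
                # Step 2: v is on a shortest x-y path iff the distances add up;
                # then sigma_xy(v) = sigma_xv * sigma_vy.
                if dist_x[v] + dist_v[y] == dist_x[y]:
                    total += sigma_x[v] * sigma_v[y] / sigma_x[y]
        # Step 3: ordered pairs count each undirected pair {x, y} twice.
        cb[v] = total if directed else total / 2
        if normalized and n > 2:
            pairs = (n - 1) * (n - 2) if directed else (n - 1) * (n - 2) / 2
            cb[v] /= pairs
    return cb

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(betweenness(path))  # {0: 0.0, 1: 2.0, 2: 2.0, 3: 0.0}
```

This direct version costs O(n³) after the BFS phase; it is meant to mirror the definition, not to be fast.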
Computational complexity
Computing the betweenness and closeness centralities of all vertices involves calculating the shortest paths between all pairs of vertices, which takes Θ(V³) time with the Floyd-Warshall algorithm.
On sparse graphs, Johnson's algorithm may be more efficient, taking O(V² log V + VE) time.
For unweighted graphs the calculations can be done with Brandes' algorithm[19], which takes O(VE) time.
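Brandes' algorithm avoids summing over all pairs explicitly: one BFS per source records shortest-path counts and predecessors, and the dependencies are then accumulated backwards in order of decreasing distance. A compact sketch for unweighted graphs, assuming the graph is a dict mapping every node to an iterable of its (out-)neighbours (a hypothetical format):

```python
from collections import deque

def brandes_betweenness(adj, directed=False):
    """Betweenness via Brandes' accumulation for unweighted graphs, O(V E) time."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source BFS recording predecessors and shortest-path counts.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate dependencies, farthest vertices first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if not directed:
        # Each unordered pair {x, y} was counted once from x and once from y.
        for v in cb:
            cb[v] /= 2
    return cb

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(brandes_betweenness(path))  # {0: 0.0, 1: 2.0, 2: 2.0, 3: 0.0}
```

Each source costs one BFS, O(V + E), plus a linear backward pass, giving the O(VE) bound quoted above.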
PageRank
http://en.wikipedia.org/wiki/PageRank
PageRank is a link analysis algorithm.
It assigns a numerical weight to each node of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set.
The numerical weight that it assigns to any given document E is referred to as the PageRank of E and denoted by PR(E).
Other factors like Author Rank can contribute to the importance of a document.
A hyperlink to a page counts as a vote of support. The PageRank
of a page is defined recursively and depends on the number and
PageRank metric of all pages that link to it (“incoming links”).
The main idea of ranking:
a page that is linked to by many pages with high
PageRank receives a high rank itself.
[Figure: example link graph on pages A, B, C, D]
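Initialization: PR(A) = PR(B) = PR(C) = PR(D) = 1.
A better idea is to start from a probability distribution, i.e. values in [0, 1] that sum to 1 (here PR(A) = PR(B) = PR(C) = PR(D) = 1/4).
If B, C and D each link only to A, then PR(A) = PR(B) + PR(C) + PR(D).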
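[Figure: a modified link graph on pages A, B, C, D]
If B has two outbound links, C has one and D has three, each including a link to A, then
$$PR(A) = \frac{PR(B)}{2} + PR(C) + \frac{PR(D)}{3}.$$
In general, each page v divides its PageRank equally among its outbound links:
[Figure: page u with the pages v_1, …, v_d linking to it]
$$PR(u) = \sum_{v \in E^{\mathrm{in}}(u)} \frac{PR(v)}{L(v)},$$
where $E^{\mathrm{in}}(u)$ is the set of pages linking to u and L(v) is the number of outbound links of page v.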
The values PR(v) approximate a probability distribution: the likelihood that a person randomly clicking on links will arrive at any particular page.
Computing PageRank takes several iterations through the collection, each bringing the approximate PageRank values closer to the theoretical true values.
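A minimal sketch of this repeated adjustment, using the simplified formula PR(u) = Σ PR(v)/L(v) without damping. It assumes every page has at least one outbound link and that the link graph is strongly connected and aperiodic; otherwise the plain iteration may fail to converge. The three-page graph and the function name are hypothetical, not the graphs in the figures.

```python
def simple_pagerank(links, iterations=100):
    """Iterate PR(u) = sum of PR(v) / L(v) over pages v linking to u (no damping).

    links maps each page to the list of pages it links to (hypothetical format);
    every page must have at least one outbound link.
    """
    n = len(links)
    pr = {p: 1 / n for p in links}          # start from a probability distribution
    for _ in range(iterations):
        new = {p: 0.0 for p in links}
        for v, outs in links.items():
            share = pr[v] / len(outs)       # v splits its rank over its L(v) links
            for u in outs:
                new[u] += share
        pr = new
    return pr

# Hypothetical graph: A -> B, B -> A and C, C -> A.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
print({p: round(r, 4) for p, r in simple_pagerank(links).items()})
# {'A': 0.4, 'B': 0.4, 'C': 0.2}
```

The limit solves PR(A) = PR(B)/2 + PR(C), PR(B) = PR(A), PR(C) = PR(B)/2 with total rank 1.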
Damping factor
The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking.
The probability, at any step, that the surfer continues is the damping factor d.
Various studies have tested different damping factors, but it is
generally assumed that the damping factor will be set around 0.85.
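$$PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)},$$
where N is the number of pages, M(p_i) is the set of pages that link to p_i, and L(p_j) is the number of outbound links of page p_j.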
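Because the damped equation is linear in the PR values, a small collection can also be solved directly as (I − dM)·PR = (1 − d)/N, where M[i][j] = 1/L(p_j) when p_j links to p_i. The sketch below does this with plain Gaussian elimination; this is a swapped-in technique for illustration, not the iterative method of these notes. It assumes every page has at least one outbound link, and the function name and three-page graph are hypothetical.

```python
def damped_pagerank_direct(links, d=0.85):
    """Solve PR = (1 - d)/N + d * M PR as a linear system (small graphs only).

    links maps each page to the pages it links to (hypothetical format); every
    page needs at least one outbound link so that L(p_j) > 0.
    """
    pages = list(links)
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    # A = I - d*M, with M[i][j] = 1/L(p_j) if p_j links to p_i; b = (1 - d)/N.
    a = [[float(i == j) for j in range(n)] for i in range(n)]
    for pj, outs in links.items():
        for pi in outs:
            a[index[pi]][index[pj]] -= d / len(outs)
    b = [(1 - d) / n] * n
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return dict(zip(pages, x))

# Hypothetical graph: A -> B, B -> A and C, C -> A.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
pr = damped_pagerank_direct(links)
print({p: round(r, 4) for p, r in pr.items()})
# {'A': 0.3974, 'B': 0.3878, 'C': 0.2148}
```

For this graph the exact solution is PR(A) = 703/1769, PR(B) = 686/1769, PR(C) = 380/1769, which sum to 1. For d < 1 the matrix I − dM is nonsingular, so the solution is unique; on web-scale collections, however, iteration is the practical choice.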
Computation
For t = 0,
$$PR(p_i; 0) = \frac{1}{N}.$$
At each step,
$$PR(p_i; t + 1) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)}.$$
The computation ends when
$$\sum_{j} \bigl| PR(p_j; t + 1) - PR(p_j; t) \bigr| < \epsilon.$$
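The iteration and stopping rule above translate almost line by line into the sketch below. It assumes every page has at least one outbound link (dangling pages would need extra handling); the function name, the `max_iter` safeguard and the three-page graph are hypothetical.

```python
def pagerank(links, d=0.85, eps=1e-10, max_iter=1000):
    """Damped PageRank by fixed-point iteration, stopping when the L1 change < eps.

    links maps each page to the pages it links to (hypothetical format);
    every page must have at least one outbound link.
    """
    n = len(links)
    pr = {p: 1 / n for p in links}                 # PR(p_i; 0) = 1/N
    for t in range(max_iter):
        new = {p: (1 - d) / n for p in links}      # the (1 - d)/N term
        for pj, outs in links.items():
            share = d * pr[pj] / len(outs)         # d * PR(p_j; t) / L(p_j)
            for pi in outs:
                new[pi] += share
        change = sum(abs(new[p] - pr[p]) for p in links)
        pr = new
        if change < eps:                           # sum_j |PR(p_j; t+1) - PR(p_j; t)| < eps
            return pr, t + 1
    return pr, max_iter

# Hypothetical graph: A -> B, B -> A and C, C -> A.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
ranks, steps = pagerank(links)
print({p: round(r, 4) for p, r in ranks.items()})
# {'A': 0.3974, 'B': 0.3878, 'C': 0.2148}
```

The function also returns the number of iterations performed, which shows how quickly the damping factor makes the values settle.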