Ranking in Networks
http://en.wikipedia.org/wiki/Centrality

Question: Given a “communication network” N, how do we discover its important nodes? How do we define the importance of members of the network?

The answer may help in the discovery of
• the most influential member(s) of a social network;
• key infrastructure nodes in an urban network;
• super-spreaders of disease;
• .........

Polling the members is neither efficient nor accurate.

Examples of networks:
• the Web;
• the protein network;
• the network of scientific cooperation;
• .........

The answer is implemented via the notion of centrality, which gives a real-valued function on the nodes of a graph. The values of the function provide a ranking which identifies the most important nodes. On the other hand, the ranking is often meaningless for nodes that are not among the most important.

The word “importance” has a wide range of meanings, leading to many different definitions of centrality. What are the network features that characterize the importance of a node in a network?

Degree centrality

The degree centrality of a node v is its degree deg(v), the number of ties incident to v. The degree can be interpreted as the chance of a node catching whatever is flowing through the network (such as a virus or some information).

In a directed network (where ties have direction), two separate measures of degree centrality are defined: in-degree, indeg(v), and out-degree, outdeg(v). In-degree is interpreted as a measure of popularity; out-degree is interpreted as a measure of social involvement.

Graph centralization

Let G be a connected graph; let X ⊆ V(G), where G[X] is also connected. Denote by ∆(X) the highest degree centrality in X. Define

\[ H = \max_{X} \sum_{x \in X} \bigl( \Delta(X) - \deg(x) \bigr). \]

The degree centralization of the graph G is defined as follows:

\[ C(G) = \frac{\sum_{v \in V(G)} \bigl[ \Delta(V) - \deg(v) \bigr]}{H}. \]

The value of H is maximized when the graph X contains one central node to which all other nodes are connected (a star graph), and in this case H = (n − 1)(n − 2), where n is the number of nodes of X.
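The degree measures and the centralization formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is given as a list of edges, it is connected with at least three nodes, and H is taken to be the star value (n − 1)(n − 2) with n the number of nodes of G. The function names are hypothetical.

```python
from collections import defaultdict

def degrees(edges):
    """Return {node: degree} for an undirected graph given as (u, v) pairs."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def in_out_degrees(arcs):
    """Return ({node: indeg}, {node: outdeg}) for arcs given as (tail, head) pairs."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for tail, head in arcs:
        outdeg[tail] += 1
        indeg[head] += 1
        # Make sure every node appears in both dictionaries, possibly with 0.
        indeg[tail] += 0
        outdeg[head] += 0
    return dict(indeg), dict(outdeg)

def degree_centralization(edges):
    """Sum of (max degree - deg(v)) over all nodes, divided by H.

    Assumption: H = (n - 1)(n - 2), the value attained by a star on n nodes,
    so the graph needs at least 3 nodes.
    """
    deg = degrees(edges)
    n = len(deg)
    max_deg = max(deg.values())
    h = (n - 1) * (n - 2)
    return sum(max_deg - d for d in deg.values()) / h

# A star with centre "c" and four leaves is maximally centralized.
star = [("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")]
print(degree_centralization(star))  # 1.0
```

With this choice of H the result lies between 0 (all nodes have the same degree, as in a cycle) and 1 (a star).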
Closeness centrality

In connected graphs, dist(x, y) denotes the length of a shortest path from x to y. The farness of x is defined as

\[ \mathrm{farness}(x) = \sum_{y \neq x} \mathrm{dist}(x, y). \]

The closeness of x is defined as

\[ \mathrm{cl}(x) = \frac{1}{\mathrm{farness}(x)}. \]

If G is disconnected and vertices x and y belong to different connected components, dist(x, y) = ∞. When a graph is not strongly connected and no path connects y with x, we assume dist(y, x) = ∞ and use the sum of the reciprocals of the distances instead of the reciprocal of the sum of the distances, with the convention 1/∞ = 0:

\[ H(x) = \sum_{y \neq x} \frac{1}{\mathrm{dist}(y, x)}. \]

For undirected graphs, the notion is known as harmonic centrality. A variation of the notion is defined as

\[ D(x) = \sum_{y \neq x} \frac{1}{2^{\mathrm{dist}(y, x)}}. \]

Betweenness centrality

Betweenness is a centrality measure of a vertex within a graph. Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. It was introduced as a measure for quantifying the control of a human on the communication between other humans in a social network. Vertices that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness.

The betweenness of a vertex v in a graph G = (V, E) is computed as follows:
1. For each pair of vertices (x, y), compute the shortest paths between them.
2. For each pair of vertices (x, y), determine the fraction of shortest paths that pass through the vertex v.
3. Sum this fraction over all pairs of vertices (x, y).

\[ C_B(v) = \sum_{x \neq v \neq y \in V} \frac{\sigma_{xy}(v)}{\sigma_{xy}} \]

where σ_xy is the total number of shortest paths from node x to node y and σ_xy(v) is the number of those paths that pass through v.

The betweenness may be normalised by dividing by the number of pairs of vertices not including v, which for directed graphs is (n − 1)(n − 2) and for undirected graphs is (n − 1)(n − 2)/2.
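The closeness, harmonic and betweenness definitions above can be followed literally on a small example. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary (every node is a key, and each edge is listed in both directions). The function names are hypothetical, and the betweenness routine is a direct, unoptimized translation of steps 1 to 3, not Brandes' algorithm. It uses the fact that a shortest x-y path passes through v exactly when dist(x, v) + dist(v, y) = dist(x, y), in which case σ_xy(v) = σ_xv · σ_vy.

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, sigma): distances from s and numbers of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def closeness(adj, x):
    """1 / farness(x); assumes the graph is connected."""
    dist, _ = bfs(adj, x)
    return 1.0 / sum(dist.values())

def harmonic(adj, x):
    """Sum of 1 / dist over reachable nodes; unreachable nodes contribute 0."""
    dist, _ = bfs(adj, x)
    return sum(1.0 / d for y, d in dist.items() if y != x)

def betweenness(adj):
    """Sum of sigma_xy(v) / sigma_xy over ordered pairs (x, y), x != v != y."""
    info = {s: bfs(adj, s) for s in adj}
    cb = {v: 0.0 for v in adj}
    for x in adj:
        dist_x, sigma_x = info[x]
        for y in dist_x:
            if y == x:
                continue
            for v in adj:
                if v == x or v == y or v not in dist_x:
                    continue
                dist_v, sigma_v = info[v]
                if y in dist_v and dist_x[v] + dist_v[y] == dist_x[y]:
                    cb[v] += sigma_x[v] * sigma_v[y] / sigma_x[y]
    return cb

# Path a - b - c: only b lies between two other nodes.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(path, "b"))   # 0.5
print(betweenness(path))      # {'a': 0.0, 'b': 2.0, 'c': 0.0}
```

Because the sum runs over ordered pairs, every unordered pair of an undirected graph is counted twice. Dividing the result by (n − 1)(n − 2) therefore gives the same normalised value as dividing the unordered-pair sum by (n − 1)(n − 2)/2.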
Computational complexity

Computing the betweenness and closeness centralities of all vertices in a graph involves calculating the shortest paths between all pairs of vertices, which requires Θ(V³) time with the Floyd-Warshall algorithm. However, on sparse graphs, Johnson’s algorithm may be more efficient, taking O(V² log V + VE) time. In the case of unweighted graphs the calculations can be done with Brandes’ algorithm [19], which takes O(VE) time.

PageRank
http://en.wikipedia.org/wiki/PageRank

PageRank is a link analysis algorithm. It assigns a numerical weighting to each node of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set.

The numerical weight that it assigns to any given document E is referred to as the PageRank of E and denoted by PR(E). Other factors like Author Rank can contribute to the importance of a document.

A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank metric of all pages that link to it (“incoming links”).

The main idea of ranking: a page that is linked to by many pages with high PageRank receives a high rank itself.

[Figure: link graph on the pages A, B, C, D]

Initialization: PR(A) = PR(B) = PR(C) = PR(D) = 1. A better idea is to assume a probability distribution, with values in [0, 1].

\[ PR(A) = PR(B) + PR(C) + PR(D). \]

[Figure: link graph on the pages A, B, C, D]

When a page has several outgoing links, its vote is divided equally among them:

\[ PR(A) = \frac{PR(B)}{2} + PR(C) + \frac{PR(D)}{3}. \]

[Figure: a page u and the pages v1, v2, ..., vd]

\[ PR(u) = \sum_{v \in E^{\mathrm{in}}(u)} \frac{PR(v)}{L(v)} \]

where E^in(u) is the set of pages that link to u and L(v) is the number of outgoing links of page v.

The values of PR(v) approximate a probability distribution of the likelihood that a person randomly clicking on links will arrive at any particular page. The PageRank computations require several iterations through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.

Damping factor

The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking.
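The probability, at any step, that the person will continue is a damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.

\[ PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)} \]

where N is the number of pages, M(p_i) is the set of pages that link to p_i, and L(p_j) is the number of outgoing links of page p_j.

Computation

For t = 0,

\[ PR(p_i; 0) = \frac{1}{N}. \]

\[ PR(p_i; t + 1) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)} \]

The computation ends when

\[ \sum_{j} \bigl| PR(p_j; t + 1) - PR(p_j; t) \bigr| < \epsilon. \]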
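The iteration above can be sketched as follows. This is a minimal illustration, not a production implementation: it assumes every page is a key of the `links` dictionary and has at least one outgoing link (pages without outgoing links are not handled), and the function name, the default `eps`, the `max_iter` safeguard and the four-page link structure are hypothetical choices made for the example.

```python
def pagerank(links, d=0.85, eps=1e-8, max_iter=1000):
    """Iterate PR(p; t+1) = (1 - d)/N + d * sum of PR(q; t)/L(q) over pages q linking to p.

    links: {page: list of pages it links to}; every page must be a key
    and must have at least one outgoing link (assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}            # PR(p; 0) = 1/N
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}
        for q, outs in links.items():
            share = d * pr[q] / len(outs)       # q splits its vote over L(q) links
            for p in outs:
                new[p] += share
        delta = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < eps:                         # stopping rule from the text
            break
    return pr

# Hypothetical link structure with L(B) = 2, L(C) = 1, L(D) = 3.
links = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Since every page here has outgoing links, the values sum to 1 at every step, so they can be read as a probability distribution. In this example no page links to D, so D keeps the minimum value (1 − d)/N.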