Stable Algorithms for Link Analysis
Andrew Y. Ng, Alice X. Zheng, Michael I. Jordan

The paper briefly discusses when link analysis algorithms give stable rankings under small perturbations to linkage patterns. The authors extend this analysis and show how it gives insight into designing stable link analysis methods. They also propose two new algorithms and study their performance on citation data and web hyperlink data.

An important feature of the World Wide Web is its dynamic nature. References can change, become inaccessible, or be missed by a search engine. Hence link analysis needs to be stable to perturbations in the link structure. Kleinberg's HITS algorithm and Google's PageRank algorithm are eigenvector-based methods: they essentially compute principal eigenvectors of particular matrices related to the adjacency graph to determine "authority". Understanding the robustness of link analysis algorithms therefore involves analyzing the stability of these eigenvector calculations.

A brief overview of the HITS and PageRank algorithms is given. An example is shown where a small perturbation to a collection causes a large change in the eigenvectors. An algorithm's stability to small perturbations is determined by the eigengap of S (for HITS authority weights, $S = A^T A$, where A is the adjacency matrix), defined as the difference between the largest and second-largest eigenvalues of S. A section describes the stability of the HITS and PageRank algorithms.

Randomized HITS
Consider a random surfer who can follow hyperlinks both forward and backward. The surfer starts from a randomly chosen page and visits a new page at every time step. At every time step, he tosses a coin with bias $\epsilon$; if the coin lands heads, he jumps to a new web page chosen uniformly at random. If the coin lands tails, he checks whether it is an odd or an even time step.
If it is an odd step, he follows a randomly chosen out-link from the current page; if it is an even step, he traverses a random in-link of the current page. This process defines a random walk on web pages. The stationary distribution on odd time steps is defined to be the authority weights, and the stationary distribution on even time steps is defined to be the hub weights:

$$a^{(t+1)} = \epsilon \vec{1} + (1 - \epsilon) A_{\text{row}}^T h^{(t)}$$
$$h^{(t+1)} = \epsilon \vec{1} + (1 - \epsilon) A_{\text{col}} a^{(t+1)}$$

where $\epsilon$ is the bias of the coin, $\vec{1}$ is the vector of all ones, $A_{\text{row}}$ is A with its rows normalized to sum to 1, and $A_{\text{col}}$ is A with its columns normalized to sum to 1.

Subspace HITS
The idea is that subspaces spanned by a few eigenvectors may sometimes be stable even when the individual eigenvectors are not. If the eigengap between the $k$-th and $(k+1)$-st eigenvalues is large, then the subspace spanned by the first k eigenvectors will be stable. The eigenvectors are therefore treated as a basis for a subspace from which authority scores are obtained. The procedure for calculating authority scores, where $f(\cdot)$ is a non-negative, monotonically increasing function, is:
1. Find the first k eigenvectors $x_1, x_2, \ldots, x_k$ of $S = A^T A$ (or $A A^T$ for hub weights), and their corresponding eigenvalues $\lambda_1, \ldots, \lambda_k$.
2. Let $e_j$ be the j-th basis vector (whose j-th element is 1, and all other elements 0). Calculate the authority scores

$$a_j = \sum_{i=1}^{k} f(\lambda_i) \, (e_j^T x_i)^2 .$$

(This is the square of the length of the projection of $e_j$ onto the subspace spanned by $x_1, x_2, \ldots, x_k$, where the projection in the $x_i$ direction is weighted by $f(\lambda_i)$.)

If we take $f(\lambda) = 1$, the authority weight of page j becomes

$$a_j = \sum_{i=1}^{k} x_{ij}^2 ,$$

where $x_{ij}$ is the j-th element of $x_i$. Thus, the authority weights depend only on the subspace spanned by the k eigenvectors, but not on the eigenvectors themselves.

Finally, the authors report the empirical performance of the four algorithms and explore the issue of "diversity" of the results returned by the algorithms, focusing on the setting of web graphs with multiple connected components.
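
To make the Randomized HITS update concrete, the following is a minimal pure-Python sketch of the alternating authority and hub iteration, not the authors' implementation. The function name `randomized_hits`, the default `eps=0.2`, the fixed iteration count, and the toy graph are all hypothetical. Two further assumptions are made for illustration: the random-jump term is spread as eps / n over the n pages so that both vectors remain probability distributions, and a page with no out-links (or no in-links) passes its mass uniformly to all pages.

```python
def randomized_hits(adj, eps=0.2, iters=100):
    """Sketch of the Randomized HITS iteration (illustrative only).

    adj[i][j] is 1 if page i links to page j, else 0.
    Returns (authority_weights, hub_weights), each summing to 1.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]

    hubs = [1.0 / n] * n
    auth = [1.0 / n] * n
    for _ in range(iters):
        # Authority step: jump with probability eps, otherwise follow
        # a random out-link (this is A_row^T applied to the hub vector).
        new_auth = [eps / n] * n
        for i in range(n):
            if out_deg[i] == 0:
                # Assumption: a page without out-links jumps uniformly.
                for j in range(n):
                    new_auth[j] += (1 - eps) * hubs[i] / n
            else:
                for j in range(n):
                    if adj[i][j]:
                        new_auth[j] += (1 - eps) * hubs[i] / out_deg[i]
        auth = new_auth

        # Hub step: jump with probability eps, otherwise traverse
        # a random in-link (this is A_col applied to the authority vector).
        new_hubs = [eps / n] * n
        for j in range(n):
            if in_deg[j] == 0:
                # Assumption: a page without in-links jumps uniformly.
                for i in range(n):
                    new_hubs[i] += (1 - eps) * auth[j] / n
            else:
                for i in range(n):
                    if adj[i][j]:
                        new_hubs[i] += (1 - eps) * auth[j] / in_deg[j]
        hubs = new_hubs
    return auth, hubs


# Toy graph (hypothetical): pages 0, 1, 2 all link to page 3,
# and page 0 also links to page 1.
toy_adj = [
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
toy_auth, toy_hubs = randomized_hits(toy_adj)
print("authority:", toy_auth)
print("hub:", toy_hubs)
```

In this toy graph, page 3 receives the most in-links and ends up with the largest authority weight. Larger values of `eps` pull both vectors toward the uniform distribution, which is the mechanism that makes the ranking less sensitive to individual link changes.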