Social Networks and PageRank

advertisement
Part I: Web Structure Mining
Chapter 2: Hyperlink Based Ranking
•
•
•
•
•
Social Network Analysis
PageRank
Authorities and Hubs
Link Based Similarity Search
Enhanced Techniques for Page Ranking
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
1
Social Networks
• Directed graph with weights assigned to its edges
• Nodes represent documents and the edges – citations
from one document to other documents.
• Prestige can be associated with the number of input
edges to a node (in-degree).
• Prestige has a recursive nature. depends on the
authority (or again, the prestige) of citations
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
2
Social Networks
• adjacency matrix A
– if document cites document A(u, v)  1
– otherwise A(u, v)  0
• prestige score
p(u)  v A(v, u) p(v)
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
3
Social Networks
• Computing prestige
T

P A P
• Eigen decomposition
P A P
T
– Eigenvector P
– Eigenvalue 
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
4
Social Networks
a
0.548
c
0.726
b
0.413
 0.548  0 0 1  0.548

 


1.325  0.414  1 0 0   0.414
 0.726 1 1 0   0.726

 


0 1 1 


A   0 0 1
1 0 0 


 0 0 1


T
A  1 0 0 
1 1 0 


 P  AT P
  1.325
P  0.548 0.414 0.726
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
T
5
Social Networks
Power Iteration
• P  P0
• Loop:
QP
P  A TQ
P
• While
1
P
P
P Q  
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
6
PageRank
• “Random web surfer” keeps clicking on
hyperlinks at random with uniform probability
• Implements random walk on the web graph
• Page u links to N u web pages
• Probability of visiting page v will be 1 Nu
• Amount of prestige that page v receives from
page u is 1 Nu of the prestige of u
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
7
PageRank
Propagation of page rank R(u )
10
20
20
20
60
20
30
26
4
2
13
13
Nv  w A(v, w)
6
12
A(v, u ) R(v)
R(u )   
Nv
v
6
6
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
8
PageRank
Calculation of page rank
 0 0.5 0.5 


A  0 0 1
1 0 0 


a
0.4
0.4
0.2
0.2
c
0.4
0.2
b
0.2
PT  0.666 0.333 0.666
Norm X
2
 x12  x22  ...  xn2
P A P
T
 0.4   0 0 1   0.4 
  
 
 0 . 2    0 .5 0 0   0 . 2 
 0 . 4   0 .5 1 0   0 . 4 
  
 
Norm
X 1 x1 x2 ... xn
PT  2 1 2
Integers
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
9
Download