The maths of Google (PowerPoint)

the Further Mathematics network
www.fmnetwork.org.uk
Google and Simultaneous Equations
Let Maths take you Further…
Google – The World’s Favourite Search Engine
• Why?
Simple, clean design and fast searches, and it puts the “good stuff” at the top.
• Some Figures
Google indexes billions of pages.
It handles several hundred million queries per day, which is many thousands per second.
It runs on tens of thousands of Linux computers.
It makes up half of the multi-billion-dollar-per-year search engine business.
Page Ranking
Google has to do three things:
a) Crawl the web to find all the publicly accessible websites
b) Arrange the data it finds so that it can be searched quickly
c) Rank the websites in order of importance, so that the most important can be presented to the user first
The rest of this presentation focuses on c) above.
The Mathematics
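[Diagram, left: websites 1, 2 and 3 with arrows showing the links between them]
[Diagram, right: the same links drawn as a directed graph on nodes 1, 2 and 3]

Website   Linked to from
1         3
2         1, 3
3         2
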
Suppose the entire world wide web consisted of only three websites.
An arrow between two websites in the diagram on the left represents
a link between those two websites. For example there is a link from
website 2 to website 3.
We can represent this as a directed graph, see the diagram on the
right.
The Mathematics
The goal is to assign a rank to each website.
The rank should be a measure of interest.
Let x1 be the rank of website 1
Let x2 be the rank of website 2
Let x3 be the rank of website 3
We’ll also insist that the ranks are normalised, i.e. x1 + x2 + x3 = 1
A first attempt is to base the rank on the number of incoming links.
Website i   Incoming links   Rank, xi
1           1                0.25
2           2                0.50
3           1                0.25
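The first attempt in the table above can be sketched in Python. This is only an illustration: the dictionary `links` (each website mapped to the websites it links to) and the function name `rank_by_incoming_links` are assumptions made for this example, not part of the original slides.

```python
def rank_by_incoming_links(links):
    """Rank each website by its share of all incoming links.

    `links` maps each website to the list of websites it links to
    (a hypothetical representation chosen for this sketch).
    """
    incoming = {site: 0 for site in links}
    for source, targets in links.items():
        for target in targets:
            incoming[target] += 1
    total = sum(incoming.values())
    # Dividing by the total number of links makes the ranks add up to 1.
    return {site: count / total for site, count in incoming.items()}


# The three-website example: 1 links to 2, 2 links to 3, 3 links to 1 and 2.
links = {1: [2], 2: [3], 3: [1, 2]}
print(rank_by_incoming_links(links))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

Dividing each count by the total number of links (4 here) is what gives the normalised ranks 0.25, 0.50 and 0.25.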
The Mathematics
However, a good ranking system should surely
take into account the importance of the website
that the links are coming from. A link from
www.bbc.co.uk should be worth more than a
link from www.barometerworld.co.uk.
This was Google’s big idea. We need to think of
the ranks as representing voting power…
For example website 3 has voting power of x3. Since it links to
websites 1 and 2, it will vote 0.5x3 to website 1 and 0.5x3 to website 2.
Website 1 has voting power of x1. Since it only links to website 2, it
will vote x1 to website 2.
Website 2 has voting power of x2. Since it only links to website 3, it
will vote x2 to website 3.
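The voting rule described above can be sketched as follows: each website splits its voting power equally between the websites it links to. The `links` dictionary and the function name `vote_shares` are assumptions made for this illustration.

```python
from fractions import Fraction


def vote_shares(links):
    """Return the fraction of its rank that each website gives to each target.

    `links` maps each website to the list of websites it links to
    (a hypothetical representation chosen for this sketch).
    """
    shares = {}
    for source, targets in links.items():
        for target in targets:
            # A website with k outgoing links gives 1/k of its rank to each.
            shares[(source, target)] = Fraction(1, len(targets))
    return shares


links = {1: [2], 2: [3], 3: [1, 2]}
for (source, target), share in vote_shares(links).items():
    print(f"website {source} gives {share} of its rank to website {target}")
```

For the three-website example this reports that website 3 gives 1/2 of its rank to each of websites 1 and 2, while websites 1 and 2 each give their whole rank to a single website.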
The rank of each website is the total of the votes it receives. This gives the following equations:
The Mathematics
\[
\begin{aligned}
x_1 &= 0.5\,x_3 \\
x_2 &= x_1 + 0.5\,x_3 \\
x_3 &= x_2
\end{aligned}
\]
But this can be written as a matrix equation:
\[
\begin{pmatrix} 0 & 0 & 0.5 \\ 1 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\]
The solution of this with
x1 + x2 + x3 = 1 is
x1 = 0.2, x2 = 0.4, x3 = 0.4
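As a worked check of this solution, here is one possible substitution route (a sketch, not necessarily the route intended on the slide). Putting x3 = x2 and x1 = 0.5 x3 into the second equation gives x2 = 0.5 x3 + 0.5 x3 = x3, which is nothing new, so the three equations alone only fix the ranks up to a common scale. The normalisation condition supplies the missing information:

```latex
\begin{aligned}
x_1 &= 0.5\,x_3, \qquad x_2 = x_3 \\
x_1 + x_2 + x_3 &= 0.5\,x_3 + x_3 + x_3 = 2.5\,x_3 = 1 \\
x_3 &= 0.4, \qquad x_2 = 0.4, \qquad x_1 = 0.2
\end{aligned}
```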
Eigenvalues and Eigenvectors
The ranks x1, x2, x3 are a solution of
\[
\begin{pmatrix} 0 & 0 & 0.5 \\ 1 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\]
So the ranks, written as a column vector, form an eigenvector of the matrix with eigenvalue 1, scaled so that the ranks add up to 1.
You might know this as a fixed point of the matrix.
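The eigenvector statement can be checked directly. The sketch below uses exact fractions, with a hypothetical helper `mat_vec`, to confirm that multiplying the rank vector by the matrix leaves it unchanged, and that any multiple of it is also unchanged, which is why the extra condition that the ranks add up to 1 is needed.

```python
from fractions import Fraction as F


def mat_vec(T, x):
    """Multiply the matrix T (a list of rows) by the column vector x."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]


T = [
    [0, 0, F(1, 2)],
    [1, 0, F(1, 2)],
    [0, 1, 0],
]

ranks = [F(1, 5), F(2, 5), F(2, 5)]   # 0.2, 0.4, 0.4
print(mat_vec(T, ranks) == ranks)      # True: T leaves the ranks unchanged
print(sum(ranks) == 1)                 # True: the ranks are normalised

unscaled = [1, 2, 2]                   # the same direction, not normalised
print(mat_vec(T, unscaled) == unscaled)  # True: still a fixed point
```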
How does Google calculate this so quickly? We need to find a
quick way to calculate this vector, and hence the ranks.
Iterative Computation
Let
\[
T = \begin{pmatrix} 0 & 0 & 0.5 \\ 1 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}
\]
Starting with the vector
\[
x = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}
\]
we look at the sequence of vectors Tx, T²x, T³x, T⁴x, …
The limit of this sequence is the eigenvector of ranks.
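The iteration can be sketched in a few lines of Python. The function names `mat_vec` and `iterate` and the choice of 60 steps are assumptions made for this illustration; nothing here is claimed to be how Google implements it.

```python
def mat_vec(T, x):
    """Multiply the matrix T (a list of rows) by the column vector x."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]


def iterate(T, x, steps):
    """Return T applied to x `steps` times, i.e. the vector T^steps x."""
    for _ in range(steps):
        x = mat_vec(T, x)
    return x


T = [
    [0, 0, 0.5],
    [1, 0, 0.5],
    [0, 1, 0],
]
x0 = [1 / 3, 1 / 3, 1 / 3]

# Print a few terms of the sequence Tx, T^2 x, ... to watch it settle down.
for k in (1, 2, 5, 10, 60):
    print(k, [round(v, 4) for v in iterate(T, x0, k)])
```

The first two terms are (1/6, 1/2, 1/3) and (1/6, 1/3, 1/2). After enough steps the vector settles at the ranks 0.2, 0.4, 0.4 found earlier. Each step only needs one multiplication of a matrix by a vector, which is what makes this approach quick.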
Adding Websites
[Diagram: directed graph on websites 1, 2, 3 and 4]

Site   1   2   3   4
1      0   0   1   0
2      1   0   1   1
3      0   1   0   0
4      0   0   0   0

Let
\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 \\ 1 & 0 & 0.5 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]
This gives ranks 0.2, 0.4, 0.4, 0 respectively.
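The slide shows a table of 0s and 1s and a matrix T without spelling out how one leads to the other. Reading the table as "a 1 in row i, column j means website j links to website i" (an interpretation inferred from comparing the table with T), T is obtained by dividing each column by its sum, so that every website shares out its whole vote. A minimal sketch, with the hypothetical function name `table_to_T`:

```python
from fractions import Fraction


def table_to_T(table):
    """Divide each column of a 0/1 link table by its column sum.

    table[i][j] == 1 is taken to mean that website j+1 links to website i+1.
    A column of zeros (a website with no outgoing links) is left as zeros.
    """
    n = len(table)
    col_sums = [sum(table[i][j] for i in range(n)) for j in range(n)]
    return [
        [Fraction(table[i][j], col_sums[j]) if col_sums[j] else Fraction(0)
         for j in range(n)]
        for i in range(n)
    ]


table = [
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
for row in table_to_T(table):
    print([str(v) for v in row])
```

Website 3 links to two websites, so column 3 becomes 1/2, 1/2, 0, 0. Row 4 is all zeros because no website links to website 4, which is why its rank comes out as 0.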
Adding Websites
[Diagram: directed graph on websites 1, 2, 3 and 4]

Site   1   2   3   4
1      0   0   1   0
2      1   0   1   1
3      0   1   0   0
4      0   1   0   0

Let
\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 \\ 1 & 0 & 0.5 & 1 \\ 0 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0 & 0 \end{pmatrix}
\]
This gives ranks 1/9, 4/9, 2/9, 2/9 respectively.
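These ranks can be reproduced exactly by solving the simultaneous equations Tx = x together with the condition that the ranks add up to 1. The sketch below uses Gauss-Jordan elimination with exact fractions. The function name `solve_ranks` is an assumption for this example, and the method relies on the equations having exactly one normalised solution, which is the case for this matrix.

```python
from fractions import Fraction


def solve_ranks(T):
    """Solve T x = x with sum(x) = 1 exactly, by Gauss-Jordan elimination."""
    n = len(T)
    # Augmented matrix for (T - I) x = 0.
    A = [
        [Fraction(T[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
        for i in range(n)
    ]
    # When every column of T sums to 1 the rows of T - I add up to zero,
    # so one row is redundant; replace it by x1 + ... + xn = 1.
    A[-1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the equations do not have a unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


half = Fraction(1, 2)
T = [
    [0, 0, half, 0],
    [1, 0, half, 1],
    [0, half, 0, 0],
    [0, half, 0, 0],
]
print([str(v) for v in solve_ranks(T)])  # ['1/9', '4/9', '2/9', '2/9']
```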
Adding Websites
[Diagram: directed graph on websites 1, 2, 3, 4 and 5]

Site   1   2   3   4   5
1      0   0   1   0   0
2      1   0   1   0   0
3      0   1   0   0   0
4      0   1   0   0   0
5      0   1   0   0   0

Let
\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 & 0 \\ 1 & 0 & 0.5 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \end{pmatrix}
\]
This gives ranks 0, 0, 0, 0, 0: the problem of dangling nodes!
Google have a way of getting around this.
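One way to see where the ranks go is to watch the total rank during the iteration. Websites 4 and 5 link to nothing, so their columns of T are all zero and whatever rank reaches them is never passed on. The sketch below only illustrates the problem, using 1/3 in place of the rounded 0.333 and an equal starting vector (both assumptions made for this example). It does not attempt to show how Google gets around it.

```python
T = [
    [0, 0,     0.5, 0, 0],
    [1, 0,     0.5, 0, 0],
    [0, 1 / 3, 0,   0, 0],
    [0, 1 / 3, 0,   0, 0],
    [0, 1 / 3, 0,   0, 0],
]

# A column sum of 0 marks a dangling node: a website with no outgoing links.
column_sums = [sum(row[j] for row in T) for j in range(5)]
print("column sums:", column_sums)

x = [0.2] * 5  # start with equal ranks that add up to 1
for step in range(1, 201):
    x = [sum(t * v for t, v in zip(row, x)) for row in T]
    if step in (1, 10, 50, 200):
        # The total rank shrinks because votes sent to 4 and 5 are lost.
        print(step, "total rank:", sum(x))
```

After one step the total has already dropped from 1 to 0.6, because the 0.4 held by websites 4 and 5 is passed to nobody. The total keeps shrinking towards 0, which matches the all-zero ranks on the slide.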
Adding Websites
[Diagram: directed graph on websites 1, 2, 3, 4 and 5]

Site   1   2   3   4   5
1      0   0   1   0   0
2      1   0   1   1   1
3      0   1   0   0   0
4      0   1   0   0   0
5      0   1   0   0   0

Let
\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 & 0 \\ 1 & 0 & 0.5 & 1 & 1 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \end{pmatrix}
\]
This gives ranks 0.08, 0.46, 0.15, 0.15, 0.15 respectively.


In Summary
Google uses simultaneous equations in an extremely inventive way to do its page ranking.
This made it the market leader in search engines!
YOU knew enough mathematics to come up with this idea when you were very young!
What is the next big idea in applications of mathematics?