the Further Mathematics network
www.fmnetwork.org.uk

Google and Simultaneous Equations
Let Maths take you Further…

Google: The World's Favourite Search Engine
• Why? Simple clean design, fast searches, it puts the "good stuff" at the top
• Some Figures
    Google indexes billions of pages
    It handles several hundred million queries per day, with many thousands per second
    It runs on tens of thousands of Linux computers
    It makes up half of the multi-billion-dollar-per-year search engine business

Page Ranking
Google has to do three things:
a) Crawl the web to find all the publicly accessible websites
b) Arrange the data it finds so that it can be searched quickly
c) Rank the websites in order of importance, so that the most important can be presented to the user first
The rest of this presentation focuses on c) above.

The Mathematics
[Diagram: websites 1, 2 and 3 joined by arrows, next to the same links drawn as a directed graph]

Website   Linked to from
1         3
2         1, 3
3         2

Suppose the entire world wide web consisted of only three websites. An arrow between two websites in the diagram on the left represents a link from one website to the other. For example, there is a link from website 2 to website 3. We can represent this as a directed graph; see the diagram on the right.

The Mathematics
The goal is to assign a rank to each website. The rank should be a measure of interest.
Let x1 be the rank of website 1
Let x2 be the rank of website 2
Let x3 be the rank of website 3
We'll also insist that the ranks are normalised, i.e. x1 + x2 + x3 = 1

The first attempt is to base the rank on the number of incoming links.

Website i   Incoming links   Rank, xi
1           1                0.25
2           2                0.50
3           1                0.25

The Mathematics
However, a good ranking system should surely take into account the importance of the website that the links are coming from. A link from www.bbc.co.uk should be worth more than a link from www.barometerworld.co.uk.
This was Google's big idea.
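The first attempt above, ranking by the number of incoming links, can be sketched in a few lines of Python. This is only an illustration of the table on the slide; the names `linked_from` and `rank_by_incoming_links` are made up for this example.

```python
# Link table from the slide: website -> websites that link to it.
# (The dictionary name and function name are illustrative assumptions.)
linked_from = {1: [3], 2: [1, 3], 3: [2]}


def rank_by_incoming_links(linked_from):
    """Give each website its share of all incoming links, so ranks sum to 1."""
    counts = {site: len(sources) for site, sources in linked_from.items()}
    total = sum(counts.values())
    return {site: count / total for site, count in counts.items()}


ranks = rank_by_incoming_links(linked_from)
print(ranks)  # {1: 0.25, 2: 0.5, 3: 0.25}
```

Every link counts the same here, whichever website it comes from, which is exactly the weakness the slide goes on to point out.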
We need to think of the ranks as representing voting power…
For example, website 3 has voting power of x3. Since it links to websites 1 and 2, it will vote 0.5x3 to website 1 and 0.5x3 to website 2.
Website 1 has voting power of x1. Since it only links to website 2, it will vote x1 to website 2.
Website 2 has voting power of x2. Since it only links to website 3, it will vote x2 to website 3.
But the total of the votes a website receives is its rank. This gives the following equations…

The Mathematics
x1 = 0.5x3
x2 = x1 + 0.5x3
x3 = x2

But this can be written as a matrix equation…

\[
\begin{pmatrix} 0 & 0 & 0.5 \\ 1 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\]

The solution of this with x1 + x2 + x3 = 1 is x1 = 0.2, x2 = 0.4, x3 = 0.4

Eigenvalues and Eigenvectors
The ranks x1, x2, x3 are a solution of

\[
\begin{pmatrix} 0 & 0 & 0.5 \\ 1 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\]

So the column vector of ranks is an eigenvector of the matrix with eigenvalue 1, scaled so that the ranks add up to 1. You might know this as a fixed point of the matrix.
How does Google calculate this so quickly? We need to find a quick way to calculate this vector, and hence the ranks.

Iterative Computation
Let

\[
T = \begin{pmatrix} 0 & 0 & 0.5 \\ 1 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}
\]

Starting with the vector

\[
x = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}
\]

we look at the sequence of vectors Tx, T²x, T³x, T⁴x, …
The limit of this sequence is the eigenvector of ranks…

Adding Websites
[Diagram: directed graph of websites 1 to 4]

In the tables below, the entry in row i, column j is 1 when website j links to website i.

Site   1   2   3   4
1      0   0   1   0
2      1   0   1   1
3      0   1   0   0
4      0   0   0   0

Let

\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 \\ 1 & 0 & 0.5 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]

This gives ranks 0.2, 0.4, 0.4, 0 respectively.

Adding Websites
[Diagram: directed graph of websites 1 to 4, with website 2 now also linking to website 4]

Site   1   2   3   4
1      0   0   1   0
2      1   0   1   1
3      0   1   0   0
4      0   1   0   0

Let

\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 \\ 1 & 0 & 0.5 & 1 \\ 0 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0 & 0 \end{pmatrix}
\]

This gives ranks 1/9, 4/9, 2/9, 2/9 respectively.

Adding Websites
[Diagram: directed graph of websites 1 to 5; websites 4 and 5 have no outgoing links]

Site   1   2   3   4   5
1      0   0   1   0   0
2      1   0   1   0   0
3      0   1   0   0   0
4      0   1   0   0   0
5      0   1   0   0   0

Let

\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 & 0 \\ 1 & 0 & 0.5 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \end{pmatrix}
\]

This gives ranks 0, 0, 0, 0, 0: the problem of dangling nodes!
Google have a way of getting around this.
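The iterative computation described above can be tried directly. The following is a minimal sketch in plain Python, not Google's actual implementation: the helper names `mat_vec` and `iterate_ranks` and the choice of 100 steps are assumptions for illustration, and the five-website matrix is built from the site table, with exact 1/3 in place of the rounded 0.333.

```python
def mat_vec(T, x):
    """Multiply the matrix T (a list of rows) by the column vector x."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]


def iterate_ranks(T, steps=100):
    """Start from equal ranks and apply T repeatedly: Tx, T^2 x, T^3 x, ..."""
    n = len(T)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = mat_vec(T, x)
    return x


# Three-website example: the sequence settles on the eigenvector of ranks.
T3 = [
    [0, 0, 0.5],
    [1, 0, 0.5],
    [0, 1, 0],
]
ranks3 = iterate_ranks(T3)
print([round(r, 3) for r in ranks3])  # [0.2, 0.4, 0.4]

# Five-website example where websites 4 and 5 link to nothing (dangling nodes).
third = 1 / 3
T5_dangling = [
    [0, 0, 0.5, 0, 0],
    [1, 0, 0.5, 0, 0],
    [0, third, 0, 0, 0],
    [0, third, 0, 0, 0],
    [0, third, 0, 0, 0],
]
ranks5 = iterate_ranks(T5_dangling)
print(sum(ranks5) < 1e-9)  # True: the votes sent to websites 4 and 5 are never passed on
```

In the dangling case the total rank shrinks at every step, because whatever websites 4 and 5 receive is lost, so the only vector satisfying Tx = x is the zero vector.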
Adding Websites
[Diagram: directed graph of websites 1 to 5, with websites 4 and 5 now linking back to website 2]

Site   1   2   3   4   5
1      0   0   1   0   0
2      1   0   1   1   1
3      0   1   0   0   0
4      0   1   0   0   0
5      0   1   0   0   0

Let

\[
T = \begin{pmatrix} 0 & 0 & 0.5 & 0 & 0 \\ 1 & 0 & 0.5 & 1 & 1 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \\ 0 & 0.333 & 0 & 0 & 0 \end{pmatrix}
\]

This gives ranks 0.08, 0.46, 0.15, 0.15, 0.15 respectively.

In Summary
Google uses simultaneous equations in an extremely inventive way to do its page ranking.
This made it the market leader in search engines!
YOU knew enough mathematics to come up with this idea when you were very young!
What is the next big idea in applications of mathematics?
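As a check on the examples, the ranks can also be found by solving the simultaneous equations exactly rather than iterating. The sketch below is one possible approach, assuming the equations Tx = x together with the condition that the ranks add up to 1; the function name `solve_ranks` is made up for this example, and exact fractions (1/3 rather than 0.333) are used so the answers come out as fractions.

```python
from fractions import Fraction as F


def solve_ranks(T):
    """Solve Tx = x with the ranks summing to 1, by Gauss-Jordan elimination."""
    n = len(T)
    # Augmented rows for (T - I)x = 0.
    A = [[F(T[i][j]) - (1 if i == j else 0) for j in range(n)] + [F(0)]
         for i in range(n)]
    # One equation is redundant, so replace the last by x1 + ... + xn = 1.
    A[n - 1] = [F(1)] * n + [F(1)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


half, third = F(1, 2), F(1, 3)

# Five websites, with websites 4 and 5 linking back to website 2.
T5 = [
    [0, 0, half, 0, 0],
    [1, 0, half, 1, 1],
    [0, third, 0, 0, 0],
    [0, third, 0, 0, 0],
    [0, third, 0, 0, 0],
]
ranks5_exact = solve_ranks(T5)
print([str(r) for r in ranks5_exact])  # ['1/13', '6/13', '2/13', '2/13', '2/13']
```

Rounded to two decimal places, 1/13, 6/13 and 2/13 are 0.08, 0.46 and 0.15, in agreement with the ranks quoted for this example.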