CPS 49S
Google: The Computer Science Within and its Impact on Society
Shivnath Babu
Spring 2007

Discussion Format
• Talk for 10-15 minutes
  – Give an overview
• Give an outline of the discussion points that you have come up with
• Need a scribe
  – Volunteer?

Note
• Make it a habit to check the course web page daily for:
  – Updated notes (presentation, discussion report, and scribe notes)
  – Current and future schedule
  – Announcements

Introduction
• http://www.google.com/corporate/tech.html
• Let us look at some numbers
  – From the paper
  – From searchenginewatch.com

Introduction (contd.)
• Terms
  – HTML (look at the HTML for the class web page), Hypertext, link/hyperlink, inlink, outlink, anchor text, link graph
  – Search engine, meta search engine
  – Information retrieval, crawl, index
• Terms that we will discuss later
  – PageRank, proximity, barrel, …

Discussion Points
• Motivation for Google
  – Human-maintained lists
  – Keyword matching only
  – Advertising --- conflict of interest

Discussion Points (contd.)
• Design Goal #1: High-quality search results
  – Hypertext
  – Proximity
  – PageRank
• Design Goal #2: Good performance
• Design Goal #3: Support for research activities

Next
• Problem: User types in a keyword-based search query. We have to (i) find result pages to answer this query, and (ii) rank these result pages
  – Proximity of terms
  – Anchor text
  – PageRank

Proximity
• Of terms on a web page
• E.g., phrases
• E.g., “anatomy”, “search”, “anatomy search”
• E.g., “google freshman seminar duke”
• Other examples?

Anchor text
• Text around the link
• Often accurate and concise description of page
• May have terms that the page does not contain
  – “search engine”
  – Other examples?
• Can return pages that have not been crawled

PageRank
• First cut: count inlinks
• Basic idea --- “recursive” counting
• Interpretation based on probability
• Demo

Assigned Readings
• For Tue (1/23)
  – Continuation of the anatomy paper
  – Paper on “Taxonomy of Web Search”
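
PageRank: a small sketch for the demo

To make the PageRank points concrete, here is a minimal sketch (not Google's actual implementation) that contrasts the "first cut" of counting inlinks with the "recursive" counting idea. Everything named here is an assumption for illustration: the toy link graph, the function names `inlink_counts` and `pagerank`, the fixed number of iterations, and the choice of the normalized variant of the formula so that the scores sum to 1 and can be read as probabilities (the chance that a "random surfer" is on a page). The damping value 0.85 is the one the anatomy paper mentions as typical. The sketch also assumes every linked-to page appears as a key in the graph.

```python
def inlink_counts(graph):
    """First cut: score each page by how many pages link to it."""
    counts = {page: 0 for page in graph}
    for page, outlinks in graph.items():
        for target in outlinks:
            counts[target] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=50):
    """'Recursive' counting: a page's score depends on the scores of the
    pages that link to it. Computed here by simple repeated updates."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small base share (the surfer jumps at random).
        new_ranks = {page: (1.0 - damping) / n for page in graph}
        for page, outlinks in graph.items():
            if outlinks:
                # A page splits its score evenly among the pages it links to.
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
            else:
                # Assumption: a page with no outlinks spreads its score evenly.
                share = damping * ranks[page] / n
                for target in graph:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical toy link graph: page -> list of pages it links to.
toy_graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

if __name__ == "__main__":
    print("Inlink counts:", inlink_counts(toy_graph))
    for page, score in sorted(pagerank(toy_graph).items()):
        print(page, round(score, 3))
```

In this toy graph, pages A and B each have exactly one inlink, so the first cut cannot tell them apart. Recursive counting ranks A above B, because A's single inlink comes from C, the page with the most inlinks.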