CPS 49S Google: The Computer Science Within and its Impact on Society

Shivnath Babu
Spring 2007
Discussion Format
• Talk for 10-15 minutes
– Give an overview
• Give an outline of the discussion points that you
have come up with
• Need a scribe
– Volunteer?
• Make it a habit to check the course web page
daily for:
– Updated notes (presentation, discussion report, and
scribe notes)
– Current and future schedule
– Announcements
• http://www.google.com/corporate/tech.html
• Let us look at some numbers
– From the paper
– From searchenginewatch.com
Introduction (contd.)
• Terms
– HTML (look at the HTML for the class web page),
Hypertext, link/hyperlink, inlink, outlink, anchor text,
link graph
– Search engine, meta search engine
– Information retrieval, crawl, index
• Terms that we will discuss later
– PageRank, proximity, barrel, …
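To make the terms link, inlink, outlink, and link graph concrete, here is a minimal sketch in Python. The page names and the `build_link_graph` helper are hypothetical, made up for illustration; they do not come from the paper or the class web page.

```python
from collections import defaultdict

# Hypothetical toy web: each page maps to its (target page, anchor text) links.
pages = {
    "a.html": [("b.html", "search engine"), ("c.html", "course page")],
    "b.html": [("c.html", "freshman seminar")],
    "c.html": [],
}

def build_link_graph(pages):
    """Return (outlinks, inlinks), each a dict from page to a sorted list of pages."""
    # Outlinks: the pages that a given page links to.
    outlinks = {page: sorted({target for target, _ in links})
                for page, links in pages.items()}
    # Inlinks: the pages that link to a given page (the same edges, reversed).
    inlinks = defaultdict(set)
    for page, links in pages.items():
        for target, _ in links:
            inlinks[target].add(page)
    return outlinks, {page: sorted(inlinks[page]) for page in pages}

outlinks, inlinks = build_link_graph(pages)
print(outlinks["a.html"])  # pages that a.html points to
print(inlinks["c.html"])   # pages that point to c.html
```

The two dictionaries together are one simple representation of the link graph: pages are nodes and each hyperlink is a directed edge.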
Discussion Points
• Motivation for Google
– Human-maintained lists
– Keyword matching only
– Advertising --- conflict of interest
Discussion Points
• Design Goal #1: High-quality search results
– Hypertext
– Proximity
– PageRank
• Design Goal #2: Good performance
• Design Goal #3: Support for research activities
• Problem: User types in a keyword-based search
query. We have to (i) find result pages to answer
this query, and (ii) rank these result pages
– Proximity of terms
– Anchor text
– PageRank
Proximity
• Of terms on a web page
– E.g., phrases
– E.g., “anatomy”, “search”, “anatomy search”
– E.g., “google freshman seminar duke”
• Other examples?
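One possible way to turn proximity into a number is to measure the shortest window of words on a page that contains every query term. The sketch below assumes a page is already split into lowercase words; `min_span` is a hypothetical helper for discussion, not the scoring function Google uses.

```python
def min_span(tokens, terms):
    """Length in words of the shortest window containing every term, or None."""
    needed = set(terms)
    last_seen = {}  # most recent position of each query term seen so far
    best = None
    for i, tok in enumerate(tokens):
        if tok in needed:
            last_seen[tok] = i
            if len(last_seen) == len(needed):
                # Smallest window ending at position i that covers all terms.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best

title = "the anatomy of a large scale hypertextual web search engine".split()
phrase = "anatomy search".split()
print(min_span(title, ["anatomy", "search"]))
print(min_span(phrase, ["anatomy", "search"]))
```

A smaller span means the terms sit closer together; a page where the query appears as a phrase gets the smallest possible span.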
Anchor text
• Text of the link (the clickable text)
• Often accurate and concise description of page
• May have terms that the page does not contain
– “search engine”
– Other examples?
• Can return pages that have not been crawled
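The last two points can be sketched in a few lines: if anchor text is indexed under the page the link points to, a query can match a page that was never crawled and whose own text may not contain the query terms. The URL and the `index_anchor_text` helper below are hypothetical, for illustration only.

```python
from collections import defaultdict

def index_anchor_text(crawled_pages):
    """Map each anchor-text term to the set of link targets it describes."""
    index = defaultdict(set)
    for links in crawled_pages.values():
        for target, anchor_text in links:
            for term in anchor_text.lower().split():
                # Credit the term to the link target, not to the linking page.
                index[term].add(target)
    return index

# Only a.html has been crawled; its link target has not.
crawled = {
    "a.html": [("http://example.com/engine", "search engine")],
}
anchor_index = index_anchor_text(crawled)
print(anchor_index["search"])
```

Here the query term “search” returns http://example.com/engine even though that page is not among the crawled pages.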
PageRank
• First cut: count inlinks
• Basic idea --- “recursive” counting
• Interpretation based on probability
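The three steps can be compared on a toy graph. The sketch below first counts inlinks, then does the “recursive” counting by repeatedly passing each page's rank to the pages it links to. It assumes the damping factor d = 0.85 mentioned in the anatomy paper and uses the normalized variant in which the ranks sum to 1, so each rank can be read as the probability that a random surfer is on that page. The graph, function names, and the handling of pages without outlinks are assumptions made for this example.

```python
def inlink_counts(outlinks):
    """First cut: score each page by how many pages link to it."""
    counts = {page: 0 for page in outlinks}
    for targets in outlinks.values():
        for target in targets:
            counts[target] += 1
    return counts

def pagerank(outlinks, d=0.85, iterations=50):
    """Iterative PageRank; assumes every link target is also a key of outlinks."""
    n = len(outlinks)
    rank = {page: 1.0 / n for page in outlinks}
    for _ in range(iterations):
        # With probability 1 - d the surfer jumps to a random page.
        new_rank = {page: (1.0 - d) / n for page in outlinks}
        for page, targets in outlinks.items():
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its rank over all pages.
                share = d * rank[page] / n
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
counts = inlink_counts(graph)
ranks = pagerank(graph)
print(counts)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Pages a and b both have one inlink, so plain counting cannot tell them apart. The recursive version ranks a higher, because its single inlink comes from c, which is itself highly ranked.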
Assigned Readings
• For Tue (1/23)
– Continuation of the anatomy paper
– Paper on “Taxonomy of Web Search”