Shivnath Babu
Spring 2007
Google is Born
• Bibliometrics
• Counting inlinks → PageRank (BackRub)
• Google’s initial problems → need for self-policing
– Copyright issues [downloading entire site content]
– Crawling issues [depth-first Vs. breadth-first search]
– Ranking issues
• Portal Mania
– Conflict of Interest
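PageRank (the ranking behind BackRub) refines simple inlink counting: a link counts for more when it comes from a highly ranked page. Below is a minimal sketch of the textbook power-iteration formulation, assuming a damping factor d = 0.85 (a commonly cited value). The four-page graph, the function name `pagerank`, and the iteration count are made up for illustration; this is not Google's production implementation.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank over a graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - d) / n.
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # A page splits its damped rank evenly among its outlinks.
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Hypothetical graph: C has the most inlinks, D has none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Pure inlink counting would tie A and B (one inlink each); PageRank ranks A above B because A's single inlink comes from the highly ranked page C.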
Document File Preparation
• Heterogeneity
• Document → Index → Search
• Manual indexing Vs. automated indexing
• Term extraction
– Stop lists
– Stemming
– Advanced: Disambiguation (Metathesaurus)
• Dictionary/lexicon
• Inverted index
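The preparation steps above (term extraction with a stop list and stemming, then an inverted index from terms to documents) can be sketched as follows. The stop list and the suffix-stripping rule are deliberately tiny, hypothetical stand-ins (the stemmer is NOT the Porter stemmer), and disambiguation is omitted.

```python
import re
from collections import defaultdict

# Hypothetical stop list; real systems use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}

# Crude suffix stripping for illustration only (not the Porter stemmer).
SUFFIXES = ("ing", "ed", "er", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_terms(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def build_inverted_index(docs):
    index = defaultdict(set)  # term -> set of document ids (postings)
    for doc_id, text in docs.items():
        for term in extract_terms(text):
            index[term].add(doc_id)
    # Store each postings list sorted by document id.
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = extract_terms(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

docs = {
    1: "Crawling the Web in breadth-first order",
    2: "The crawler downloads pages and links",
    3: "Ranking pages by counting inlinks",
}
index = build_inverted_index(docs)
print(index["page"])                     # [2, 3]
print(search(index, "crawling pages"))   # [2]
```

Stemming is what lets the query term "crawling" match the document term "crawler": both reduce to "crawl" before lookup.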
Searching the Web
• Different performance metrics for search engines
• Study of coverage of different search engines
– Must be careful about the details
• Using multiple search engines can improve coverage
• HotBot << MetaCrawler << AHOY!
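Why querying several engines improves coverage can be seen with a minimal metasearch merge. This sketch assumes a simple round-robin interleave with duplicate removal; it is not MetaCrawler's actual merging algorithm, and the engine result lists are hypothetical.

```python
from itertools import zip_longest

def merge_results(result_lists):
    """Round-robin interleave ranked lists from several engines, dropping duplicates."""
    merged, seen = [], set()
    # Take the 1st result of every engine, then the 2nd, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical result lists from three engines for the same query.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u4"]
engine_c = ["u5", "u1", "u6"]

merged = merge_results([engine_a, engine_b, engine_c])
print(merged)
for name, results in [("A", engine_a), ("B", engine_b), ("C", engine_c)]:
    print(name, f"{len(results)}/{len(merged)} of merged URLs")
```

Here no single engine returns more than half of the six distinct URLs that the merged list covers.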
Advertising Models for the Internet
• Bill Gross (IdeaLab)
• At that time:
– Search engines were getting spammed
– Tragedy of the commons
– TV/Super-Bowl model of advertising was not beneficial to most advertisers
• Goto.com
– Keyword interest [Good Vs. Bad traffic]
– Pay-per-click
– Auctions, bidding for keywords under competition
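Bidding for keywords under competition can be sketched as a per-keyword auction for ranked ad slots. Goto.com's early scheme is commonly described as charging each winner its own bid per click ("first"); the generalized second-price rule ("second"), where each winner pays the next-highest bid, is a later widely used alternative included for comparison. Advertiser names, bids (in cents), and the `reserve_cents` parameter are hypothetical.

```python
def run_keyword_auction(bids, slots, pricing="first", reserve_cents=0):
    """Assign ad slots for one keyword, highest bid first.

    bids: {advertiser: maximum cost-per-click bid in cents}
    pricing: "first"  -> each winner pays its own bid per click
             "second" -> each winner pays the next-highest bid
    Returns a list of (slot, advertiser, price_per_click_in_cents).
    """
    if pricing not in ("first", "second"):
        raise ValueError("pricing must be 'first' or 'second'")
    # Stable sort: equal bids keep their insertion order.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i, (advertiser, bid) in enumerate(ranked[:slots]):
        if pricing == "first":
            price = bid
        elif i + 1 < len(ranked):
            price = ranked[i + 1][1]  # next-highest bid
        else:
            price = reserve_cents     # nobody below: pay the reserve
        results.append((i + 1, advertiser, price))
    return results

bids = {"acme": 50, "bolt": 35, "cora": 20, "dyne": 10}
print(run_keyword_auction(bids, slots=3, pricing="first"))
print(run_keyword_auction(bids, slots=3, pricing="second"))
```

Under pay-per-click, the advertiser is charged only when a user actually clicks, so the price above is multiplied by the number of clicks each slot receives.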
Advertising Models for the Internet
• To profit from search and control its destiny, a search engine needs three elements:
– High-quality search results
– Syndication
– Own its traffic (i.e., a popular Web site)
Advertising Models for the Internet
• Pay-per-click → click fraud
• Pay-per-print
• Pay-per-call
• Pay-per-sale
Traditional Vs. New Advertising
• When is feedback available? (“real-time”)
– Asset management (performance-based advertising)
– How accurate is the feedback? (video-game example)
• Minimal upfront investment needed
– Earlier, companies would advertise only 5-10% of their products
• Real-time auctions
• “Cholesterol symptoms”
• Advertising was an art; now becoming a science
Online Vs. Offline Advertising
• Online → Internet; Offline → TV, Radio, Newspaper
• Bringing the advertising revolution from online to offline