Search Engines
Vynarack Xaykao
INF 385F: WIRED
Dr. Turnbull
September 30, 2004

Outline
  Google’s origins
  Marketing your site to search engines
  Meta Search Engines (MSEs)
  Future of web searching

Google’s Origins
  Sergey Brin & Lawrence Page (Stanford U.)
  Dark arts: advertiser-driven search engines
  Up to academics to make good engines
  Google focused on basic elements of IR
    Content: scalability (though perfect recall is impossible)
    Relevance: PageRank
    Information need
      Similar pages
      Stemming (bowl, bowling, bowler)

PageRank Factors
  Number of links pointing to a site
  PageRanks of referring pages
  Can you think of a disadvantage of using PageRank to order results?

Google Ranking
  Classify words in hit list by type
    Relative font size
    HTML tags
    Position
  IR score: count-weights & type-weights
  Final rank: IR score & PageRank

Marketing your site to search engines
  1. search engine optimization: use keywords
  2. directory submission & link development
  3. pay-for-placement campaigns: top position guaranteed (Overture)
  4. trusted feed and paid inclusion programs: guaranteed frequent indexing, top placement not guaranteed

Meta Search Engines
  Search several engines simultaneously
  Pros
    Saves the searcher time
    Relevant results
  Cons
    Engines accept different syntax
    Searches can be slow and time out

Types of Meta Search Engines
  1. Real MSEs: combine results from different engines (Vivisimo)
  2. Pseudo MSEs type I: groups the results by search engine (My Net Crawler)
  3. Pseudo MSEs type II: opens a window for each search engine (Multi-Search-Engine.com)
  4. Search Utilities: software that searches engines (Copernic)

Future of Web Searching
  Search engines give people starting points
  Hard part is using sites themselves
  Card & Pirolli’s information foraging theory
    Maximum benefit for minimum effort
    Information has a scent
  Don’t want user to resort to the site search

Next Generation Web Searching
  “We would like a train system that magically lays down new track to suggest useful directions to go based on where we have been so far and what we are trying to do.” (Hearst, 2002, p. 3)
  How?

Metadata
  Types of Metadata
    Creation
    Descriptive
    Administrative
  Good for searching collections of similar items (recipes)
  Searching metadata yields higher relevance

Faceted classification
  S. R. Ranganathan’s Colon Classification (1933)
  Example: design of wooden furniture in 18th century America
    1. personality : furniture
    2. matter : wood
    3. energy : design
    4. space : America
    5. time : 18th century

Next Generation Web Searching
  Figure out people’s tasks
  Ideal site incorporates
    metadata
    using facets for browsing
    search tool for refining

Additional References
  Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the Web. Retrieved September 29, 2004, from http://dbpubs.stanford.edu:8090/aux/index-en.html
  Pirolli, P., & Card, S. K. (1995). Information foraging in information access environments. ACM Conference on Human Factors in Software (CHI '95), Denver, Colorado, 51–58.
  Steckel, M. (2002, October 7). Ranganathan for IAs. Boxes and Arrows. Retrieved September 26, 2004, from http://www.boxesandarrows.com/archives/ranganathan_for_ias.php
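
Supplementary sketch: PageRank Factors
  The two factors on the PageRank Factors slide (number of links pointing to a site, PageRanks of referring pages) can be illustrated with a small iterative calculation. This is a minimal sketch, not Google's implementation. The function name pagerank, the toy_web graph, the damping value of 0.85, the fixed iteration count, and the handling of pages with no outgoing links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a small link graph by repeated redistribution.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions, not tuned values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A referring page splits its own rank among its outgoing links,
                # so links from highly ranked pages are worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" has the most incoming links, "d" has none.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

  In this toy graph the page with the most incoming links ranks highest and the page with no incoming links ranks lowest. It also hints at an answer to the slide's question: rank depends only on link structure, so a new page that nobody links to yet starts at the bottom regardless of its content.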