Search Engines
Vynarack Xaykao
INF 385F: WIRED
Dr. Turnbull
September 30, 2004
Outline
Google’s origins
Marketing your site to search engines
Meta Search Engines (MSEs)
Future of web searching
Google’s Origins
Sergey Brin & Lawrence Page (Stanford U.)
Dark arts: advertiser-driven search engines
Up to academics to make good engines
Google focused on basic elements of IR
Content: scalability (though perfect recall is impossible)
Relevance: PageRank
Information need
Similar pages
Stemming (bowl, bowling, bowler)
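A minimal sketch of the stemming idea, assuming a toy suffix list. The suffixes and the toy_stem function are illustrative assumptions, not Google's method; production engines use fuller algorithms such as Porter's stemmer.

```python
# Toy suffix stripper: maps word variants onto one shared stem.
# The suffix list is an assumption chosen to fit the slide's example.
SUFFIXES = ("ing", "er", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["bowl", "bowling", "bowler"]
stems = {w: toy_stem(w) for w in words}
print(stems)  # every word maps to "bowl"
```

A query for any one of the three words can then match documents containing the others.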
PageRank Factors
Number of links pointing to a site
PageRanks of referring pages
Can you think of a disadvantage of using PageRank to order results?
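The two factors above can be sketched as a simple iterative calculation. This is a simplified, normalized variant of PageRank with a damping factor of 0.85; the four-page link graph and the pagerank function name are hypothetical, made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively pass each page's rank along its outgoing links.

    links maps every page to the list of pages it links to.
    Every link target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: C has three inbound links, D has none.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph C ranks highest because it has the most inbound links, and A ranks second even though only one page links to it, because that page is C.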
Google Ranking
Classify words in hit list by type
Relative font size
HTML tags
Position
IR score: count-weights & type-weights
Final rank: IR score & PageRank
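A rough sketch of how these pieces might fit together. The hit types, the weight values, the capped count-weight, and the weighted-sum combination with PageRank are all assumptions for illustration; the slide does not give Google's actual numbers or combining function.

```python
# Hypothetical type-weights: hits in prominent places count for more.
TYPE_WEIGHTS = {"title": 5.0, "anchor": 4.0, "large_font": 2.0, "plain": 1.0}

# Hypothetical cap so that repeating a word many times stops helping.
COUNT_CAP = 3


def count_weight(count: int) -> float:
    """Grow with the number of hits, then level off at the cap."""
    return float(min(count, COUNT_CAP))


def ir_score(hit_counts: dict) -> float:
    """Combine count-weights and type-weights across hit types."""
    return sum(count_weight(c) * TYPE_WEIGHTS[t] for t, c in hit_counts.items())


def final_rank(hit_counts: dict, pagerank: float, pagerank_weight: float = 10.0) -> float:
    """Assumed combination: IR score plus a scaled PageRank."""
    return ir_score(hit_counts) + pagerank_weight * pagerank


doc1 = {"title": 1, "plain": 10}  # one title hit, ten plain-text hits
doc2 = {"plain": 2}               # two plain-text hits only

print(ir_score(doc1), ir_score(doc2))
print(final_rank(doc1, pagerank=0.1), final_rank(doc2, pagerank=0.9))
```

With these made-up numbers doc1 has the better IR score, but doc2 ends up ahead once its higher PageRank is added in.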
Marketing your site to search engines
1. search engine optimization: use keywords
2. directory submission & link development
3. pay-for-placement campaigns: top position guaranteed (Overture)
4. trusted feed and paid inclusion programs: guaranteed frequent indexing, top placement not guaranteed
Meta Search Engines
Search several engines simultaneously
Pros
Saves the searcher time
Relevant results
Cons
Engines accept different syntax
Searches can be slow and time out
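The timeout problem can be sketched as follows: query every engine at once, then keep whatever has come back when the deadline passes. The two engine functions are stand-ins (assumptions) that simulate a fast and a slow engine; no real search API is called.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fast_engine(query):
    """Stand-in for an engine that answers right away."""
    return [f"fast result for {query}"]


def slow_engine(query):
    """Stand-in for an engine that takes too long."""
    time.sleep(0.5)
    return [f"slow result for {query}"]


ENGINES = {"fast": fast_engine, "slow": slow_engine}


def meta_search(query, timeout=0.1):
    """Query all engines in parallel and drop the ones past the deadline."""
    pool = ThreadPoolExecutor(max_workers=len(ENGINES))
    futures = {pool.submit(fn, query): name for name, fn in ENGINES.items()}
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block the searcher on slow engines
    results = {futures[f]: f.result() for f in done}
    timed_out = sorted(futures[f] for f in not_done)
    return results, timed_out


results, timed_out = meta_search("bowling")
print(results)
print("timed out:", timed_out)
```

The searcher gets the fast engine's results after about a tenth of a second, at the cost of losing whatever the slow engine would have returned.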
Types of Meta Search Engines
1. Real MSEs: combine results from different engines (Vivisimo)
2. Pseudo MSEs type I: groups the results by search engine (My Net Crawler)
3. Pseudo MSEs type II: opens a window for each search engine (Multi-Search-Engine.com)
4. Search Utilities: software that searches engines (Copernic)
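The difference between a real MSE and a pseudo MSE type I can be sketched with two result lists. The round-robin merge below is one simple way to combine rankings, chosen for illustration; it is not a description of how Vivisimo or any other named product works, and the engine names and URLs are made up.

```python
from itertools import zip_longest


def merge_results(results_by_engine):
    """Real-MSE style: interleave the lists by rank and drop duplicates."""
    merged, seen = [], set()
    for tier in zip_longest(*results_by_engine.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked results from two engines.
results_by_engine = {
    "engine_a": ["a.example", "b.example", "c.example"],
    "engine_b": ["b.example", "d.example"],
}

# Pseudo MSE type I: show each engine's list under its own heading.
for engine, urls in results_by_engine.items():
    print(engine, urls)

# Real MSE: one combined list.
merged = merge_results(results_by_engine)
print(merged)  # ['a.example', 'b.example', 'd.example', 'c.example']
```

The merged list shows b.example once even though both engines returned it, which the grouped display cannot do.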
Future of Web Searching
Search engines give people starting points
Hard part is using sites themselves
Card & Pirolli’s information foraging theory
Maximum benefit for minimum effort
Information has a scent
Don’t want user to resort to the site search
Next Generation Web Searching
“We would like a train system that magically lays down new track to suggest useful directions to go based on where we have been so far and what we are trying to do.” (Hearst, 2002, p. 3)
How?
Metadata
Types of Metadata
Creation
Descriptive
Administrative
Good for searching collections of similar items (recipes)
Searching metadata yields higher relevance
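A small sketch of why a metadata search can be more relevant than a full-text search, using the recipe example. The records and the field names (title, main_ingredient, text) are hypothetical.

```python
# Hypothetical recipe records with one descriptive metadata field.
recipes = [
    {
        "title": "Apple pie",
        "main_ingredient": "apple",
        "text": "Serve after a chicken dinner.",
    },
    {
        "title": "Roast chicken",
        "main_ingredient": "chicken",
        "text": "Roast the chicken with lemon.",
    },
]


def full_text_search(records, term):
    """Match the term anywhere in the title or body text."""
    term = term.lower()
    return [
        r["title"]
        for r in records
        if term in (r["title"] + " " + r["text"]).lower()
    ]


def metadata_search(records, field, value):
    """Match only on one metadata field."""
    return [r["title"] for r in records if r.get(field, "").lower() == value.lower()]


print(full_text_search(recipes, "chicken"))
print(metadata_search(recipes, "main_ingredient", "chicken"))
```

The full-text search returns the apple pie because its text happens to mention chicken; the metadata search returns only the chicken recipe.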
Faceted classification
S. R. Ranganathan’s Colon Classification (1933)
Example: design of wooden furniture in 18th century America
1. personality : furniture
2. matter : wood
3. energy : design
4. space : America
5. time : 18th century
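The five facets above can be sketched as a record type, with browsing done by filtering on any combination of facets. The ColonFacets class, the matches helper, and the second catalog entry are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColonFacets:
    """One item described by Ranganathan's five facets."""
    personality: str
    matter: str
    energy: str
    space: str
    time: str


def matches(item, **wanted):
    """True when the item has the wanted value for every named facet."""
    return all(getattr(item, facet) == value for facet, value in wanted.items())


catalog = [
    # The slide's example.
    ColonFacets("furniture", "wood", "design", "America", "18th century"),
    # A made-up second item for contrast.
    ColonFacets("furniture", "metal", "design", "France", "19th century"),
]

hits = [item for item in catalog if matches(item, matter="wood", space="America")]
print(hits)
```

Because each facet is a separate field, a user can narrow by matter first and space second, or the other way round, and arrive at the same item.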
Next Generation Web Searching
Figure out people’s tasks
Ideal site incorporates
metadata using facets for browsing
search tool for refining
Additional References
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the Web. Retrieved September 29, 2004, from http://dbpubs.stanford.edu:8090/aux/index-en.html
Pirolli, P., & Card, S. K. (1995). Information foraging in information access environments. ACM Conference on Human Factors in Computing Systems (CHI '95), Denver, Colorado, 51–58.
Steckel, M. (2002, October 7). Ranganathan for IAs. Boxes and Arrows. Retrieved September 26, 2004, from http://www.boxesandarrows.com/archives/ranganathan_for_ias.php