HotBot: A Search Engine Case Study

Introduction

Owned by Terra/Lycos.

One of the largest web search engines.

Uses the Inktomi database combined with
Direct Hit and the DMOZ Open Directory.

Basic search screen is simple, but the advanced
search allows for a full range of search features.
Databases




Open Directory
Direct Hit
Inktomi
Direct Hit results display if the option for 10 results at a time is
selected and there are 10 results available from Direct Hit. If an
option for more than 10 results at a time is selected the Direct
Hit results are available via a link. Other content comes from
various advertisers, the Lycos Network, and GoTo. The GoTo
and other advertiser results may show up above and/or below
the other results but are under a separate heading such as
"feature listings."
Strengths
Advanced searching capabilities
Page depth limit
Advanced search help
Truncation
Weaknesses
Link searches must be exact
Database size shrank for a while
Advanced features have not always worked correctly
Features







Default Operation: Processed as an AND
Full Boolean Searching: AND, OR, and NOT
Proximity Searching
Truncation with the * symbol
Case sensitive
Extensive, dynamic stop word list
Word Stemming - Search for grammatical word
variants including plural, singular, and tense.
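
The default AND operation and * truncation listed above can be illustrated with a minimal sketch. This is not HotBot's implementation: the function names are hypothetical, and the sketch deliberately ignores case sensitivity, stop words, stemming, and the explicit Boolean operators.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (case is ignored in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_matches(term, tokens):
    """A trailing * is treated as truncation: any token with that prefix matches."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return any(tok.startswith(prefix) for tok in tokens)
    return term in tokens

def matches_all(query, document):
    """Default operation: every query term must be present (implicit AND)."""
    tokens = tokenize(document)
    return all(term_matches(term, tokens) for term in query.split())

# Made-up documents for illustration only.
docs = [
    "Madrigals and pastorals by Francis Pilkington",
    "A history of the English madrigal",
    "Francis Bacon essays",
]
hits = [d for d in docs if matches_all("francis madrigal*", d)]
```

Only the first document satisfies both terms. Without the *, "madrigal" would not match "Madrigals" here, because the sketch does no stemming.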
Field Searches
Field Searching: Searching title words and links to a specific URL
acrobat/applet/activex/audio/embed/flash/form/frame/image/script/shockwave/table/video/vrml

Limits







linkdomain: Limits to pages containing links to the specified domain
outgoingurlext: Limits to pages containing embedded files with the specified extension
scriptlanguage: Limits to pages containing only JavaScript or VBScript
after: [day]/[month]/[year]
before: [day]/[month]/[year]
within: [number/unit]
Language Limit
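
As a rough illustration of the after: and before: date limits, the sketch below parses the [day]/[month]/[year] form and filters on a page's modification date. The function names are hypothetical, treating both bounds as exclusive is an assumption, and within: is left out because the slide does not spell out its unit syntax.

```python
from datetime import date

def parse_dmy(text):
    """Parse a day/month/year string such as '25/12/2000' into a date."""
    day, month, year = (int(part) for part in text.strip().split("/"))
    return date(year, month, day)

def in_date_range(modified, after=None, before=None):
    """Return True if `modified` falls inside the requested limits.

    Assumption: both bounds are exclusive; the source does not say.
    """
    if after is not None and modified <= parse_dmy(after):
        return False
    if before is not None and modified >= parse_dmy(before):
        return False
    return True

# Hypothetical pages with made-up modification dates.
pages = {
    "www.example.org/old.html": date(2000, 12, 31),
    "www.example.org/new.html": date(2001, 3, 1),
}
recent = [url for url, modified in pages.items()
          if in_date_range(modified, after="1/1/2001", before="1/6/2001")]
```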
Unique for Hotbot

Page Type - Default is Any (any page)
Top Page (the root page of a site, e.g. www.unca.edu)



Page Depth - Limits how far down a
subdirectory hierarchy HotBot searches
These limits are useful for finding the primary sites for
organizations or information
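
One way to picture the Top Page and Page Depth limits is to count directory levels in a URL. The sketch below is an assumption about how depth might be measured, not HotBot's definition: a trailing file name is not counted as a level, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

def url_path(url):
    """Return the path part of a URL, accepting bare hosts such as www.unca.edu."""
    if "//" not in url:
        url = "//" + url  # lets urlsplit recognise the host without a scheme
    return urlsplit(url).path

def page_depth(url):
    """Count directory levels below the site root (assumed definition)."""
    path = url_path(url)
    segments = [s for s in path.split("/") if s]
    # Assumption: a final segment without a trailing slash is a file, not a directory.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)

def is_top_page(url):
    """A top page is the root page of a site."""
    return url_path(url) in ("", "/")

def within_depth(urls, max_depth):
    """Keep only URLs at or above the chosen depth."""
    return [u for u in urls if page_depth(u) <= max_depth]
```

For example, `page_depth("www.unca.edu")` is 0 and `page_depth("http://www.unca.edu/a/b/c.html")` is 2 under this definition.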
Sorting
Results are sorted by relevance with
groupings by site available at the end of
each brief record.
 The display includes the relevance score,
title, URL, a brief extract, and date.
HotBot displays 10 records at a time, by
default.

Architecture

Direct Hit:
Provides the breadth of a conventional search engine, with the relevancy of an index which is edited by humans
References the searching activity of millions of users
Adjusts rankings based on the popularity of the retrieved documents
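
The idea of adjusting rankings by popularity can be sketched as a blend of a text relevance score with each result's share of user clicks. The formula, the weight, and the function name are assumptions made for illustration; the slides do not describe Direct Hit's actual scoring.

```python
def rerank_by_popularity(results, clicks, weight=0.5):
    """Re-order (url, relevance) pairs using click popularity.

    `weight` is a hypothetical knob: 0 keeps pure relevance order,
    1 ranks by click share alone.
    """
    total = sum(clicks.get(url, 0) for url, _ in results)

    def score(item):
        url, relevance = item
        popularity = clicks.get(url, 0) / total if total else 0.0
        return (1 - weight) * relevance + weight * popularity

    return sorted(results, key=score, reverse=True)

# Made-up relevance scores and click counts.
results = [("a.example", 0.9), ("b.example", 0.8), ("c.example", 0.7)]
clicks = {"a.example": 10, "b.example": 80, "c.example": 10}
ranked = rerank_by_popularity(results, clicks)
```

Here the heavily clicked second result moves to the top even though its text relevance is lower; with no click data the original order is kept.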

Inktomi
Hosts Web searches for its clients on coupled clusters of workstations computing in parallel
On receiving a search query from a user, the client's Web server interface translates the query from HTTP into Inktomi Data Protocol (IDP) and sends it to the Inktomi Master Cluster
The Master Cluster sends the results in IDP to the client Web server, which translates the information into HTTP and sends it to the user
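
The HTTP-to-IDP round trip described above can be sketched as three translation steps. The IDP wire format is not given in the slides, so plain dictionaries stand in for IDP messages here; the `q` parameter, the function names, and the toy index are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

def http_to_idp(request_url):
    """Client web server: turn an HTTP query into a stand-in IDP request."""
    params = parse_qs(urlsplit(request_url).query)
    return {"type": "query", "terms": params.get("q", [""])[0].split()}

def master_cluster(idp_request, index):
    """Stand-in for the Master Cluster: find pages containing every term."""
    terms = [t.lower() for t in idp_request["terms"]]
    urls = [url for url, text in index.items()
            if all(t in text.lower() for t in terms)]
    return {"type": "results", "urls": urls}

def idp_to_http(idp_response):
    """Client web server: wrap the stand-in IDP results in an HTTP response."""
    body = "\n".join(idp_response["urls"])
    return "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n" + body

def handle_search(request_url, index):
    """User -> HTTP -> IDP -> cluster -> IDP -> HTTP -> user."""
    return idp_to_http(master_cluster(http_to_idp(request_url), index))

# Made-up index and request for illustration.
index = {
    "www.example.org/castle": "Neuschwanstein Castle visitor guide",
    "www.example.org/lute": "Lute music of the Elizabethan era",
}
response = handle_search("http://search.example/?q=neuschwanstein+castle", index)
```

The point of the split is that the user only ever sees HTTP, while the cluster only ever sees IDP; the client web server does both translations.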

Results



Query 1: Information on Home of the Rockefellers
Kykuit - To test the engines on a very specific bit of
Americana - Kykuit, the baronial home of the
Rockefellers on the Hudson River in New York.
Query 2: Information on Neuschwanstein Castle - To
test the engines on a fairly well-known tourist attraction
in Germany - Neuschwanstein Castle
Query 3: Information on Francis Pilkington Madrigals
- To test the engines on retrieval of an obscure musical
reference - the Elizabethan madrigals of Francis
Pilkington.
Query 1: Information on Home of
the Rockefellers Kykuit

Hotbot - 72 Matches



Google - 91 Matches



FPL: www.gorp.com/gorp/location/ny/kyk_hudv.htm
Relevance: Page 14: County Histories
FPL: www.abbeville.com/booktemplate.asp?stockno=2220
Relevance: Page 30: A Book Where Kykuit is mentioned
UNCA Library - 5 Matches


FPL: wncln.appstate.edu/search/...information+on+how+to+use+the+dietary+guidelines&1,1
Relevance: Page 1: Information on how to use dietary guidelines
Query 2: Information on
Neuschwanstein Castle

Hotbot - 2,700 Matches



Google - 4,060 Matches



FPL: www.castlesoftheworld.com/Brochure/
Relevance: Page 10: Castles of the US
FPL: www.neuschwanstein-castle.com/
Relevance: Page 33: A Page on King Ludwig II - No Mention of Neuschwanstein Castle
UNCA Library - 5 Matches


FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+self+employment+tax&1,1
Relevance: Page 1: Information On Self Employment Tax
Query 3: Information on Francis
Pilkington Madrigals

Hotbot - 53 Matches



Google - 33 Matches



FPL: www.medieval.org/emfaq/cds/van624.htm
Relevance: Page 5: A Page about the Lute - No Mention of Madrigals
FPL: www.netstrider.com/search/methods.html
Relevance: Page 3: No mention of Pilkington Madrigals
UNCA Library - 5 Matches


FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+the+red+notice+system&1,1
Relevance: Page 1: Information On The Red Notice System
Conclusion


HotBot is an interface for advanced web searching that sits on top
of a changing set of backends. Both the Inktomi and Direct Hit
technologies serve, in different ways, to provide a relevant list
of results through advanced queries, and both seek to minimize
commercial influence over search results. All of these technologies
are subject to changes in technology and in the business
environment.
Its weaknesses are that it still does not seem to offer the depth
and breadth of some other engines, and that its advanced features
have not always worked correctly. As this engine's index and
searching features continue to grow, these weaknesses should be
overcome.