Hotbot: A Search Engine Case Study

Introduction
  Owned by Terra/Lycos.
  One of the largest web search engines.
  Uses the Inktomi database combined with Direct Hit and the DMOZ Open Directory.
  The basic search screen is simple, but the advanced search allows for a full range of search features.

Databases
  Open Directory
  Direct Hit
  Inktomi
  Direct Hit results display if the option for 10 results at a time is selected and there are 10 results available from Direct Hit. If an option for more than 10 results at a time is selected, the Direct Hit results are available via a link.
  Other content comes from various advertisers, the Lycos Network, and GoTo. The GoTo and other advertiser results may show up above and/or below the other results but are under a separate heading such as "feature listings."

Strengths
  Advanced searching capabilities
  Page depth limit
  Advanced search help
  Truncation

Weaknesses
  Link searches must be exact
  Database size shrank for a while
  Advanced features have not always worked right

Features
  Default Operation: Processed as an AND
  Full Boolean Searching: AND, OR, and NOT
  Proximity Searching
  Truncation with the * symbol
  Case sensitive
  Extensive, dynamic stop word list
  Word Stemming - Search for grammatical word variants including plural, singular, and tense.

Field Searches
  Field Searching: Searching title words and links to a specific URL
  acrobat/applet/activex/audio/embed/flash/form/frame/image/script/shockwave/table/video/vrml

Limits
  linkdomain: Limits to pages containing links to the specified domain
  outgoingurlext: Limits to pages containing embedded files with the specified extension
  scriptlanguage: Limits to pages containing only javascript or vbscript
  after: [day]/[month]/[year]
  before: [day]/[month]/[year]
  within:[number/unit]
  Language Limit

Unique for Hotbot
  Page Type - Default is Any (any pages)
  Top Page (the root page of a URL, e.g. www.unca.edu)
  Page Depth - Limits how far down a subdirectory hierarchy Hotbot searches
  These are useful for finding the primary sites for organizations or information

Sorting
  Results are sorted by relevance, with groupings by site available at the end of each brief record.
  The display includes the relevance score, title, URL, a brief extract, and date.
  HotBot displays 10 records at a time by default.

Architecture: Direct Hit
  Provides the breadth of a conventional search engine, with the relevancy of an index which is edited by humans
  References the searching activity of millions of users
  Adjusts rankings based on the popularity of the retrieved documents

Architecture: Inktomi
  Hosts Web searches for its clients on coupled clusters of parallel-computing workstations
  On receiving a search query from a user, the client Web server's interface translates the query from HTTP into Inktomi Data Protocol (IDP) and sends it to the Inktomi Master Cluster
  The cluster sends the results in IDP to the client Web server, which translates the information into HTTP and sends it to the user

Results
  Query 1: Information on Home of the Rockefellers Kykuit - To test the engines on a very specific bit of Americana: Kykuit, the baronial home of the Rockefellers on the Hudson River in New York.
  Query 2: Information on Neuschwanstein Castle - To test the engines on a fairly well-known tourist attraction in Germany: Neuschwanstein Castle.
  Query 3: Information on Francis Pilkington Madrigals - To test the engines on retrieval of an obscure musical reference: the Elizabethan madrigals of Francis Pilkington.

Query 1: Information on Home of the Rockefellers Kykuit
  Hotbot - 72 Matches
  Google - 91 Matches
    FPL: www.gorp.com/gorp/location/ny/kyk_hudv.htm
    Relevance: Page 14: County histories
    FPL: www.abbeville.com/booktemplate.asp?stockno=2220
    Relevance: Page 30: A book where Kykuit is mentioned
  UNCA Library - 5 Matches
    FPL: wncln.appstate.edu/search/...information+on+how+to+use+the+dietary+guidelines&1,1
    Relevance: Page 1: Information on how to use dietary guidelines

Query 2: Information on Neuschwanstein Castle
  Hotbot - 2,700 Matches
  Google - 4,060 Matches
    FPL: www.castlesoftheworld.com/Brochure/
    Relevance: Page 10: Castles of the US
    FPL: www.neuschwanstein-castle.com/
    Relevance: Page 33: A page on King Ludwig II, no mention of Neuschwanstein Castle
  UNCA Library - 5 Matches
    FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+self+employment+tax&1,1
    Relevance: Page 1: Information On Self Employment Tax

Query 3: Information on Francis Pilkington Madrigals
  Hotbot - 53 Matches
  Google - 33 Matches
    FPL: www.medieval.org/emfaq/cds/van624.htm
    Relevance: Page 5: A page about the lute, no mention of madrigals
    FPL: www.netstrider.com/search/methods.html
    Relevance: Page 3: No mention of Pilkington madrigals
  UNCA Library - 5 Matches
    FPL: wncln.appstate.edu/search/…6,0,0,B/frameset&FF=tinformation+on+the+red+notice+system&1,1
    Relevance: Page 1: Information On The Red Notice System

Conclusion
HotBot is an interface to advanced web searches, and it presents a dynamically changing backend. Both the Inktomi and Direct Hit technologies serve, in different ways, to provide a relevant list of results through advanced queries, and both seek to minimize commercial influence over search results. All of these technologies are subject to change as technology and the business environment develop.
Its weaknesses are that it still does not seem to match the depth and breadth of some other engines, and that its advanced features have not always worked correctly. As this engine's index and search features continue to grow, these weaknesses should be overcome.
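
Illustration: Page Depth
The Page Depth limit listed under the features unique to Hotbot can be pictured with a small sketch. It is only an approximation: how Hotbot itself measured depth is not documented here, so the rule used (count the directory levels below the site root, and treat a final segment containing a dot as a file name) is an assumption, and the names page_depth and within_depth are hypothetical. The sample URLs are taken from the case study itself.

```python
from urllib.parse import urlsplit


def page_depth(url: str) -> int:
    """Count directory levels below the site root (0 for a top page).

    Assumption: a last path segment containing a dot is a file name,
    not a directory level. Hotbot's own rule may have differed.
    """
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    path = urlsplit(url).path
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    segments = [s for s in path.split("/") if s]
    if segments and "." in segments[-1]:
        segments = segments[:-1]
    return len(segments)


def within_depth(urls, max_depth):
    """Keep only the URLs whose depth does not exceed max_depth."""
    return [u for u in urls if page_depth(u) <= max_depth]


urls = [
    "www.unca.edu",
    "www.gorp.com/gorp/location/ny/kyk_hudv.htm",
    "www.castlesoftheworld.com/Brochure/",
    "www.neuschwanstein-castle.com/",
]

for u in urls:
    print(page_depth(u), u)

# A depth limit of 0 corresponds to the Top Page option.
print(within_depth(urls, 0))
```

Under these assumptions, a limit of 0 keeps only root pages such as www.unca.edu, which is why the option is useful for finding the primary site of an organization.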