Hyper Searching the Web

Search Engines: Basic Search (Index), Cluster Search (Themes), Meta-Search (Outsource), “Smarter” Meta-Search (Themes & Outsource)

Basic Search Engines: ex. AltaVista, InfoSeek, Lycos, Excite, HotBot, Google, etc.
- Maintain an index of every word found
- Process: crawling, indexing, and returning results
- Different ranking systems are used; the most common is a heuristic (the easiest solution) that counts the # of times the keywords appear on a page / Google uses PageRank
- No idea of the searcher’s intent, so the “best” result is hard to achieve
- Problems with synonymy and polysemy: ex. “car” and “automobile” (synonymy) / “jaguar” (polysemy)
- One solution: store semantic relations; this can only help with synonymy
- Can’t identify concepts / author intent: ex. the IBM site does not say “computer”

Cluster Search Engine: ex. the site “Clusty”
- Clusters results into categories/themes
- Can show results that would be ranked lower in another search engine: because a word can have different meanings, it can surface the less searched-for websites

Meta Search Engine: ex. DogPile, Surf Wax, Copernic
- Sends the searcher’s query to a database of search engines
- Claimed to be no better than its database; the referenced search engines are often small, free, commercial
- Users can create their own on Google, with up to 5,000 URLs as the “database”

Smarter Meta-Search Engine: ex.
Clever project (not yet available online)
- Includes clustering and linguistic analysis
- Uses hyperlinks to locate hubs and authorities
- “A respected authority is a page that is referred to by many good hubs; a useful hub is a location that points to many valuable authorities”

The Clever Project:
- Obtains a list of webpages from a standard index & follows hyperlinks to enlarge its own collection
- Resulting collection = “root set”
- Each page gets a numerical hub score and a numerical authority score
- Similar to PageRank in how scores are determined: initial guesses & repeated recalculation (useful by-product: clusters of sites)
- Adds to competition because competitors don’t have to acknowledge each other through hyperlinks (a hub that links to all of them connects them instead)

Clever vs Google
- Google: gives initial rankings, keeps page rankings independent of queries, faster, looks forward “link to link”
- Clever: root sets per keyword, page priority through query context, looks forwards and backwards “hub authority”, sometimes too broad ex. Fallingwater
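
Sketch of the hub/authority scoring described above (the approach is commonly known as HITS). This is only an illustration under assumptions: the notes do not give Clever's actual implementation, so the function names `hits` and `normalize`, the tiny `example_links` graph, the fixed iteration count, and the normalization step are all hypothetical choices. It shows the "guess, then recalculate repeatedly" idea: every page starts with the same score, then authority scores are rebuilt from the hubs that point in, and hub scores from the authorities pointed to.

```python
import math


def normalize(scores):
    """Scale scores so their squares sum to 1 (keeps values from growing)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to.

    Returns (hub, auth): two dicts of page -> score.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Initial guess: every page is an equally good hub and authority.
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score: sum of the authority scores of pages it links to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, []))
            for p in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical root set: three hub-like pages pointing at three targets.
example_links = {
    "hub_a": ["auth_x", "auth_y"],
    "hub_b": ["auth_x", "auth_y", "auth_z"],
    "hub_c": ["auth_x"],
}

hub, auth = hits(example_links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(page, round(auth[page], 3), round(hub[page], 3))
```

In this toy graph the page linked to by the most hubs ends up as the top authority, and the page linking to the most authorities ends up as the top hub. Because the root set is built per query, the scores depend on the query context, unlike a query-independent ranking.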