Best Web Directories and Search Engines
Order Out of Chaos on the World Wide Web

Different Ways to Search on the Web
• Input URLs, surf links
• Subject directories
• Search engines
• Metasearch engines

Web Directories
• Small, selective databases
• Created by humans, not machines
• Editors select and place sites into categories for easy retrieval
• User browses categories and links to sites

How Directories Work
• Browse subject categories
• Funnel: from categories to web sites
  Health > Fitness > Yoga > Most popular sites > yogabasics (http://www.yogabasics.com/)
• Directories: Yahoo!, LookSmart, Open Directory

Directory Search Boxes
• Use when subject categories don’t match your topic: winemaking in Slovenia
• Be aware that a directory
  • first searches its own select database
  • may automatically default to a search engine if it finds little in its own database
    • LookSmart defaults to WiseNut
    • Open Directory defaults to Google

Why Use Directories?
• Identify major and quality sites
• Get overview, general information on topic
• Enjoy serendipity in discovery as you manipulate a small, focused file

Small, Selective Directories
• Librarian’s Index to the Internet: 14,000+
• Infomine: 120,000+
• Academic Info: 25,000+
• WWW Virtual Library

Large, Less Selective Directories
• Yahoo!: 3,000,000+
• Open Directory: 3,800,000+
• LookSmart: 2,500,000+
HyperResearch Guide

Web Search Engines

What Are Search Engines?
• Spiders, crawlers
  • Visit and read a web page, follow its links
• Index, catalog
  • Giant book containing the web pages the spider finds
  • Spiders update pages and add to index
• Search interface
  • Sifts through index to find matches to words in search box
  • Ranks pages for relevancy

Search Engines Are the Same and Different
• Search engines are the same
  • Consist of crawlers, index, interface
• Search engines are different
  • Low overlap of database contents: 50% of pages in one not found in another
  • Pages for inclusion found internally, by following links, or by submission
• Remember: Google doesn’t know it all!
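
The three parts described above (spider, index, search interface) can be illustrated with a toy example. This is only a sketch under stated assumptions, not how any real engine is built: the three-page "web" and its `.example` addresses are made up, the interface is assumed to require every query word, and "relevancy" is reduced to counting how often the query words appear on a page.

```python
from collections import Counter, deque

# Hypothetical three-page "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("yoga basics for fitness", ["b.example"]),
    "b.example": ("wine and winemaking in Slovenia", ["a.example", "c.example"]),
    "c.example": ("Slovenia wine wine history", []),
}

def crawl(start):
    """Spider: visit a page, read it, then follow its links (breadth-first)."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = TOY_WEB[url]
        pages[url] = text
        for link in links:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map each lowercased word to {url: number of occurrences}."""
    index = {}
    for url, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index.setdefault(word, {})[url] = count
    return index

def search(index, query):
    """Interface: keep pages containing every query word (an assumption),
    then rank them by how often the query words occur on the page."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    matches = set(postings[0]).intersection(*postings[1:])

    def score(url):
        return sum(p[url] for p in postings)

    # Highest score first; ties broken alphabetically by URL.
    return sorted(matches, key=lambda url: (-score(url), url))

index = build_index(crawl("a.example"))
print(search(index, "Slovenia wine"))
```

Starting the spider from a different page changes what ends up in the index: `crawl("c.example")` finds only that one page, because it has no outgoing links. This is a small-scale picture of why two engines can hold different pages.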
How Search Engines Work
• Spiders “capture” web pages
• Web pages build index, database
• Interface finds words in database
• Engine ranks, describes results
• How engines and directories differ

Spiders Capture Web Pages
• Spider “reads” page text into the index (database): Google ~ 100K, Yahoo ~ 500K
• Spider follows page links, reads new pages into index
• Spider returns to sites (every month or two) and looks for changes
  • Dead sites removed, current sites updated
  • New sites added (through new links found or submitted by others)

Web Pages Build Database
• Current web size: over 15 billion pages
• Each database has different pages
• No engine’s database covers it all
  • Google ~ 29% (4.3 billion+)
  • Yahoo! ~ 20% (3 billion+)
  • HotBot ~ 20% (3 billion+)
  • Teoma ~ 13% (2 billion)
• As pages are updated, nature of database changes

Interface Finds Words in Database
• Provides keyword search box
• Offers search options to affect results
  • Assumes AND between words: Iraq WMD
  • Uses “quotes” for PHRASE searches: “Cold War”
  • Allows FIELD searching: ti:Russian mafia, url:russianmafia
• Offers Simple and Advanced Search (example: Teoma)

Engine Ranks, Describes Results
• How “relevance” is determined
  • Location and frequency of search words: title tag, near top of page of indexed text
  • Click popularity ~ how many “clicks” a site gets
  • Link popularity ~ how often others link to site
  • Subject popularity ~ link popularity within subject communities (Teoma)
• Site results described, keywords highlighted

How Engines and Directories Differ
• Computers vs people
  • Spiders select documents
  • Editors select documents
• Quantity vs quality
  • Engines large, non-judgmental
  • Directories small, want “best”, “most important”
• Technology vs human factor
  • Software ranks items
  • Editors organize pages into subject categories

Why Use Search Engines?
• You need specific rather than general information
  • What role did the Romans play in developing a wine culture in Slovenia?
• You need a large, comprehensive database in contrast to a directory
• Note: remember, search engine crawlers grab anything and everything; selectivity depends on you and the engine’s ranking system

Top Search Engines
• Google: 4.3 billion+
• Yahoo (Inktomi): 3 billion+
• HotBot (Inktomi): 3 billion+
• Teoma: 2 billion+
HyperResearch Guide

Metasearch Engines
• Technologies that search several search engines at the same time
• Pros
  • Increase results when one search engine produces little
  • Save time by searching several engines at once
  • Show results of several engines on one page
• Cons
  • Retrieve too many hits
  • Retrieve less relevant results
  • Cannot read individual search syntax well
    • Cannot tell if syntax requires terms in upper or lower case (OR ~ or ~ and ~ AND?)
    • Cannot tell if title, URL searching allowed, etc.
  • Contain mix of major search engines, not all

Top Metasearch Engines
• Vivisimo: clusters results into subject folders
• Dogpile: refines results, covers major engines
• Ez2find: includes most major engines

A Few Words About the Web and Search Engines

What’s In Search Engines?
• Business, commercial information
• Organizational publications
• Government resources
• Some magazine, newspaper articles
• Some scholarly information
• Teaching materials, unpublished articles
• Books, articles whose copyright expired

What’s Not in Search Engines
• Most books and periodical articles
  • Current, past research, fiction, non-fiction
• Reference materials
  • Best current encyclopedias, handbooks, business advisory services, etc.
• Bulk of human knowledge and research
• Where can some of this information be found?
  • In libraries, in print or via subscription databases available in libraries, institutions, organizations

Widening Google and Yahoo’s Eyes for Scholarship
• OAIster
  • University of Michigan and Yahoo project
  • 3,000,000 scholarly documents
  • 277 institutions involved
• Open WorldCat
  • OCLC and Google project
  • 2,000,000 books in Google index
  • Open access to 54 million books is goal

Search Tips
• Check “advanced” search and options
• Learn about AND, OR, ANY, ALL, PHRASE
• Know how to search in titles, URLs
• Spell it right
• Switch engines, get different results
• Keep up to date about search engines
  • Newspapers and magazines
  • Library web sites

Learn to Evaluate Web Sites
• Accuracy
  • Is information reliable? Where is it from?
  • What does URL tell you (.com, .org, .gov, .edu)?
• Authority
  • Author’s credentials?
  • Address, email given?
• Content and Currency
  • Purpose of site: inform, sell, propagandize?
  • Date?
• Documentation
  • Are sources given, footnotes?
  • Are other links given?

Find and Evaluate
• Use Google to find the Web site titled: The Burmese Mountain Dog
• Evaluate this site for
  • Accuracy
  • Authority
  • Content and Currency
  • Documentation
• Is it a trustworthy Web site?