PowerPoint Presentation on Directories and Search Engines

Best Web Directories and Search Engines
Order Out of Chaos on the World Wide Web
Different Ways to Search on the Web
 Input URLs, surf links
 Subject directories
 Search engines
 Metasearch engines

Web Directories
 Small, selective databases
 Created by humans, not machines
 Editors select sites and place them into categories for easy retrieval
 User browses categories and follows links to sites

How Directories Work

Browse subject categories
 Funnel: from categories to web sites
 Health
 Fitness
• Yoga
Most popular sites: yogabasics
http://www.yogabasics.com/
Directories: Yahoo!, LookSmart, Open Directory
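The funnel from categories to web sites can be pictured as a small tree. The sketch below is only an illustration: the category names and the single site come from the slide, while the nested-dictionary layout and the `browse` helper are assumptions, not how any real directory stores its data.

```python
# Toy directory: categories nest until they reach a list of sites.
# Category names and the site are from the slide; the structure is assumed.
DIRECTORY = {
    "Health": {
        "Fitness": {
            "Yoga": ["http://www.yogabasics.com/"],
        },
    },
}


def browse(directory, path):
    """Follow a list of category names down the tree and return what is there."""
    node = directory
    for category in path:
        node = node[category]  # raises KeyError if the category does not exist
    return node


print(browse(DIRECTORY, ["Health", "Fitness", "Yoga"]))
```

Each step down the path narrows the choice, which is the "funnel" the slide describes.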
Directory Search Boxes
Use when subject categories don’t match your topic: winemaking in Slovenia
Be aware a directory
 first searches its own select database
 may automatically default to a search engine if it finds little in its own database
 LookSmart defaults to WiseNut
 Open Directory defaults to Google
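The "own database first, then default to a search engine" behavior might look roughly like the sketch below. Everything in it is hypothetical: the `directory_search` function, the `min_hits` threshold, and the stand-in `fallback_engine` callable are illustrative and do not describe how LookSmart or Open Directory actually decide when to hand off.

```python
def directory_search(query, own_database, fallback_engine, min_hits=3):
    """Search the directory's own entries first; fall back if too few match.

    own_database maps a site URL to its editor-written description.
    fallback_engine is any function that takes a query and returns results.
    min_hits is an assumed threshold for "finds little".
    """
    words = query.lower().split()
    hits = [site for site, description in own_database.items()
            if all(word in description.lower() for word in words)]
    if len(hits) >= min_hits:
        return "directory", hits
    return "search engine", fallback_engine(query)


# Made-up data for illustration only.
own_database = {
    "http://example.org/wine": "Wine regions of Europe",
    "http://example.org/yoga": "Yoga basics",
}


def fake_engine(query):
    return ["engine result for " + query]


print(directory_search("winemaking in Slovenia", own_database, fake_engine))
```

The first element of the returned pair tells the user which source answered, which is the point of the slide's warning.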
Why Use Directories?
 Identify major and quality sites
 Get overview, general information on topic
 Enjoy serendipity in discovery as you manipulate a small, focused file
Small, Selective Directories
 Librarian’s Index to the Internet: 14,000+
 Infomine: 120,000+
 Academic Info: 25,000+
 WWW Virtual Library
Large, Less Selective Directories

Yahoo!: 3,000,000+

Open Directory: 3,800,000+

LookSmart: 2,500,000+
HyperResearch Guide
Web Search Engines
What Are Search Engines?

Spiders, crawlers
 Visit and read web pages, follow their links
Index, catalog
 Giant book containing the web pages the spider finds
 Spiders update pages and add to index
Search interface
 Sifts through index to find matches to words in search box
 Ranks pages for relevancy
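The three parts named above (spider, index, search interface) can be sketched in a few lines. This is a toy under stated assumptions: the `WEB` dictionary stands in for the real web, the `.example` addresses and page texts are invented, and no real engine is this simple. Ranking is left out here.

```python
from collections import deque

# A pretend web: URL -> (page text, links on that page). All of it is made up.
WEB = {
    "a.example": ("wine culture in Slovenia", ["b.example", "c.example"]),
    "b.example": ("Roman wine trade", ["a.example"]),
    "c.example": ("yoga and fitness", []),
}


def crawl(start):
    """Spider: visit a page, read it, then follow its links (breadth first)."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Index: map each word to the set of pages that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Interface: return pages containing every word typed in the search box."""
    word_sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*word_sets)) if word_sets else []


index = build_index(crawl("a.example"))
print(search(index, "wine"))
```

Note that the search never touches the pages themselves; it only looks words up in the index the spider built.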
Search Engines are Same and Different
Search engines are the same
 Consist of crawlers, index, interface
Search engines are different
 Low overlap of database contents
 50% of pages in one not found in another
 Pages for inclusion found internally, by following links, or by submission
 Remember: Google doesn’t know it all!
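The overlap figure is a simple set calculation. The sketch below shows one way to compute "share of pages in one engine not found in another"; the two tiny page sets are invented so that the answer comes out at the 50% mentioned above, and `share_not_in_other` is a hypothetical helper name.

```python
def share_not_in_other(engine_a, engine_b):
    """Fraction of engine_a's pages that engine_b has not indexed."""
    if not engine_a:
        return 0.0
    return len(engine_a - engine_b) / len(engine_a)


# Made-up page identifiers: the engines share p3 and p4 only.
engine_a = {"p1", "p2", "p3", "p4"}
engine_b = {"p3", "p4", "p5", "p6"}
print(share_not_in_other(engine_a, engine_b))  # 0.5
```

Switching engines therefore reaches pages the first engine never saw.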
How Search Engines Work
 Spiders “capture” web pages
 Web pages build index, database
 Interface finds words in database
 Engine ranks, describes results
 How engines and directories differ
Spiders Capture Web Pages
Spider “reads” page text into index
 Google ~ 100K, Yahoo ~ 500K per page
Spider follows page links, reads new pages into index
Spider returns to sites (every month or two) and looks for changes
 Dead sites removed, current sites updated
 New sites added (through new links found or submitted by others)
Web Pages Build Database
Current web size: over 15 billion pages
Each database has different pages
No engine’s database covers it all
 Google ~ 29% (4.3 billion+)
 Yahoo! ~ 20% (3 billion+)
 HotBot ~ 20% (3 billion+)
 Teoma ~ 13% (2 billion)
As pages are updated, nature of database changes
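The coverage percentages follow from dividing each index size by the estimated web size of 15 billion pages. The short check below uses the deck's own figures, taking 3 billion for Yahoo! as given in the Top Search Engines list; the variable and function names are illustrative.

```python
WEB_SIZE = 15e9  # "over 15 billion pages", from the slide

# Index sizes from the deck, in pages.
INDEX_SIZES = {"Google": 4.3e9, "Yahoo!": 3e9, "HotBot": 3e9, "Teoma": 2e9}


def coverage_percent(index_size, web_size=WEB_SIZE):
    """Share of the web held in one engine's database, rounded to a whole percent."""
    return round(100 * index_size / web_size)


for engine, size in INDEX_SIZES.items():
    print(f"{engine}: ~{coverage_percent(size)}%")
```

Even the largest index here covers well under a third of the estimated web.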
Interface Finds Words in Database
Provides keyword search box
Offers search options to affect results
 Assumes AND between words: Iraq WMD
 Uses “quotes” for PHRASE searches: “Cold War”
 Allows FIELD searching: ti:Russian mafia url:russianmafia
Offers Simple and Advanced Search

Teoma
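The three search options (implied AND, quoted phrases, field searching) can be illustrated with a small matcher. This is a sketch, not any engine's real parser: the `matches` function and the page dictionary layout are assumptions, the `ti:` and `url:` prefixes are borrowed from the slide, and only straight double quotes are recognized.

```python
import re


def matches(page, query):
    """Return True if the page satisfies every term in the query (implied AND)."""
    title_words = re.findall(r"\w+", page["title"].lower())
    url = page["url"].lower()
    words = re.findall(r"\w+", (page["title"] + " " + page["text"]).lower())
    body = " ".join(words)
    for phrase, term in re.findall(r'"([^"]+)"|(\S+)', query.lower()):
        if phrase:  # quoted phrase: words must be adjacent and in order
            needle = " ".join(re.findall(r"\w+", phrase))
            if f" {needle} " not in f" {body} ":
                return False
        elif term.startswith("ti:"):  # field search limited to the title
            if term[3:] not in title_words:
                return False
        elif term.startswith("url:"):  # field search limited to the URL
            if term[4:] not in url:
                return False
        elif term not in words:  # plain word, anywhere on the page
            return False
    return True


# Made-up page for illustration.
page = {
    "title": "Russian Mafia History",
    "url": "http://russianmafia.example/",
    "text": "Organized crime after the Cold War.",
}
print(matches(page, '"cold war" ti:russian url:russianmafia'))
```

With this page, the unquoted query `war cold` matches because both words are present, while the quoted phrase `"war cold"` does not, since the words are not adjacent in that order.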
Engine Ranks, Describes Results
How “relevance” is determined
 Location and frequency of search words: title tag, near top of page of indexed text
 Site popularity ~ how many “clicks” a site gets
 Link popularity ~ how often others link to site
 Subject popularity ~ link popularity within subject communities (Teoma)
Results described, keywords highlighted
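A relevance score that combines location, frequency, and link popularity might be sketched as follows. The weights (5 for a title match, 2 for appearing in the first ten words) are arbitrary assumptions chosen for illustration, and the `relevance` and `rank` helpers are hypothetical; real engines keep their formulas private.

```python
def relevance(page, query_words, inbound_links):
    """Score a page from word frequency, word location, and link popularity."""
    title = page["title"].lower().split()
    text = page["text"].lower().split()
    score = 0.0
    for word in query_words:
        word = word.lower()
        score += text.count(word)  # frequency in the page text
        if word in title:
            score += 5  # location: title (weight is arbitrary)
        if word in text[:10]:
            score += 2  # location: near the top of the page (weight is arbitrary)
    score += inbound_links.get(page["url"], 0)  # link popularity
    return score


def rank(pages, query, inbound_links):
    """Return pages sorted from most to least relevant for the query."""
    words = query.split()
    return sorted(pages, key=lambda p: relevance(p, words, inbound_links), reverse=True)


# Made-up pages and link counts.
pages = [
    {"url": "a", "title": "Yoga basics", "text": "yoga poses and yoga breathing"},
    {"url": "b", "title": "Fitness", "text": "some yoga"},
]
print([p["url"] for p in rank(pages, "yoga", {"a": 1, "b": 3})])
```

Raising page "b" to ten inbound links moves it ahead of "a", which shows how link popularity can outweigh on-page signals.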
How Engines and Directories Differ
Computers vs people
 Spiders select documents
 Editors select documents
Quantity vs quality
 Engines large, non-judgmental
 Directories small, want “best” “most important”
Technology vs human factor
 Software ranks items
 Editors organize pages into subject categories
Why Use Search Engines?
You need specific rather than general information
 What role did the Romans play in developing a wine culture in Slovenia?
You need a large, comprehensive database in contrast to a directory
Note: remember, search engine crawlers grab anything and everything; selectivity depends on you and the engine’s ranking system
Top Search Engines
 Google: 4.3 billion+
 Yahoo (Inktomi): 3 billion+
 HotBot (Inktomi): 3 billion+
 Teoma: 2 billion+
HyperResearch Guide
Metasearch Engines
Technologies that search several search engines at the same time
Pros
 Increase results when one search engine produces little
 Save time by searching several engines at once
 Show results of several engines on one page
Cons
 Retrieve too many hits
 Retrieve less relevant results
 Cannot read individual search syntax well
 Cannot tell if syntax requires terms in upper or lower case (OR~or~and~AND?)
 Cannot tell if title, URL searching allowed, etc.
 Contain mix of major search engines, not all
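The basic idea of a metasearch engine, sending one query to several engines and showing the results on one page, can be sketched like this. The engines here are stand-in functions returning invented results, and the merge rule (pages found by more engines first) is an assumption for illustration, not the method of any product named in this presentation.

```python
def metasearch(query, engines):
    """Send one query to several engines and merge the results into one list."""
    merged = {}
    for name, engine in engines.items():
        for position, url in enumerate(engine(query)):
            entry = merged.setdefault(url, {"engines": [], "best": position})
            entry["engines"].append(name)
            entry["best"] = min(entry["best"], position)
    # URLs found by more engines come first; ties go to the better position.
    return sorted(
        merged,
        key=lambda u: (-len(merged[u]["engines"]), merged[u]["best"], u),
    )


# Stand-in engines with made-up results.
engines = {
    "engine_one": lambda q: ["x.example", "y.example"],
    "engine_two": lambda q: ["y.example", "z.example"],
}
print(metasearch("wine Slovenia", engines))
```

The same query string goes to every engine unchanged, which is why engine-specific syntax such as field searches or case-sensitive operators tends to get lost.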
Top Metasearch Engines
 Vivisimo: clusters results into subject folders
 Dogpile: refines results, covers major engines
 Ez2find: includes most major engines
A Few Words About the Web and Search Engines
What’s In Search Engines?
 Business, commercial information
 Organizational publications
 Government resources
 Some magazine, newspaper articles
 Some scholarly information
 Teaching materials, unpublished articles
 Books, articles whose copyright expired
What’s Not in Search Engines
Most books and periodical articles
 Current, past research, fiction, non-fiction
Reference materials
 Best current encyclopedias, handbooks, business advisory services, etc.
Bulk of human knowledge and research
Where can some of this information be found?
 In libraries in print or via subscription databases available in libraries, institutions, organizations
Widening Google and Yahoo’s Eyes for Scholarship
OAIster
 University of Michigan and Yahoo project
 3,000,000 scholarly documents
 277 institutions involved
Open WorldCat
 OCLC and Google project
 2,000,000 books in Google index
 Open access to 54 million books is goal
Search Tips
Check “advanced” search and options
Learn about AND, OR, ANY, ALL, PHRASE
Know how to search in titles, URLs
Spell it right
Switch engines, get different results
Keep up to date about search engines
 Newspapers and magazines
 Library web sites
Learn to Evaluate Web Sites
Accuracy
 Is information reliable? Where is it from?
 What does URL tell you (.com, .org, .gov, .edu)?
Authority
 Author’s credentials? Address, email given?
Content and Currency
 Purpose of site: inform, sell, propagandize? Date?
Documentation
 Are sources given, footnotes?
 Are other links given?
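Reading the ending of a URL can be done by hand, or with a few lines of code as in the sketch below. The `SUFFIX_HINTS` table and `url_hint` function are illustrative assumptions; the labels are the typical uses of each ending and are only a hint about who publishes a site, never proof that it is reliable.

```python
from urllib.parse import urlparse

# Typical use of common endings; a hint about the publisher, not proof of quality.
SUFFIX_HINTS = {
    "com": "commercial",
    "org": "organization",
    "gov": "U.S. government",
    "edu": "U.S. higher education",
}


def url_hint(url):
    """Return a rough label for the site based on the last part of its host name."""
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1]
    return SUFFIX_HINTS.get(suffix, "unknown")


print(url_hint("http://www.yogabasics.com/"))  # commercial
```

The remaining checks (authority, purpose, date, documentation) still have to be made by reading the site itself.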
Find and Evaluate
Use Google and find the Web site titled: The Burmese Mountain Dog
Evaluate this site for
 Accuracy
 Authority
 Content and Currency
 Documentation
Is it a trustworthy Web site?