Part 8

Slide 1
Advanced Searching
• Use Query Languages.
• Use more than one search engine.
– Or use a metasearch engine such as www.metacrawler.com
• Start with simple searches.
• Add new parameters incrementally.

Slide 2
Query Language Features
• Find an exact string.
– Quotes around words
– [] around words
• Find words near each other.
– President near Hillary
• Find words following each other.
– adams following president
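
The three features above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular search engine implements it. The function names and the 10-word `window` are assumptions made for the example, and real engines differ in how they define "near".

```python
import re

def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_phrase(text, phrase):
    """Exact string: the phrase's words appear consecutively, in order."""
    words, target = tokens(text), tokens(phrase)
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))

def near(text, a, b, window=10):
    """NEAR: a and b occur within `window` words of each other, in either order."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def following(text, word, earlier, window=10):
    """FOLLOWING: `word` occurs after `earlier`, at most `window` words later."""
    words = tokens(text)
    pos_w = [i for i, w in enumerate(words) if w == word.lower()]
    pos_e = [i for i, w in enumerate(words) if w == earlier.lower()]
    return any(0 < i - j <= window for i in pos_w for j in pos_e)

doc = "President John Adams met the press"
print(has_phrase(doc, "john adams"))          # exact string
print(near(doc, "president", "adams"))        # words near each other
print(following(doc, "adams", "president"))   # adams following president
```

Note that `near` ignores word order while `following` does not, which is the only difference between the two operators in this sketch.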

Slide 3
Query Language Features
• Both words must appear.
– AND between words
– + before both words
• At least one word must appear.
– OR between words
– Possibly words in parentheses: (movies music)
• Wildcards: hits on similar or partial names.
– *, ?
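
As a rough illustration of these operators, the sketch below checks one document against AND, OR, and wildcard queries. The function names are made up for the example, and the wildcard matching borrows the rules of Python's `fnmatch` module (`*` for any run of characters, `?` for exactly one), which may not match a given engine's rules exactly.

```python
from fnmatch import fnmatchcase

def words_of(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def matches_all(text, terms):
    """AND (or + before each word): every term must appear."""
    words = set(words_of(text))
    return all(t.lower() in words for t in terms)

def matches_any(text, terms):
    """OR: at least one term must appear."""
    words = set(words_of(text))
    return any(t.lower() in words for t in terms)

def matches_wildcard(text, pattern):
    """Wildcard: some word in the text fits the pattern."""
    return any(fnmatchcase(w, pattern.lower()) for w in words_of(text))

doc = "Classic movies and modern music"
print(matches_all(doc, ["movies", "music"]))    # movies AND music
print(matches_any(doc, ["theater", "music"]))   # theater OR music
print(matches_wildcard(doc, "mov*"))            # partial name
```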

Slide 4
Query Language Features
• Text keyword finds information in the text of the document.
• Anchor keyword finds information in the hypertext links in the document.
• Host keyword finds information only at a particular site.
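
One way to picture these keywords is as a choice of which part of a page record to search. The sketch below assumes a hypothetical page record with `url`, `text`, and `anchors` fields; the field names and the example.com address are placeholders, not part of any real engine's format.

```python
from urllib.parse import urlparse

def field_match(page, field, term):
    """Check `term` against one part of a page: its text, its link text, or its host."""
    term = term.lower()
    if field == "text":
        return term in page["text"].lower()
    if field == "anchor":
        return any(term in a.lower() for a in page["anchors"])
    if field == "host":
        return urlparse(page["url"]).hostname == term
    raise ValueError(f"unknown field: {field}")

# Hypothetical page record used only for illustration.
page = {
    "url": "http://www.example.com/shows.html",
    "text": "Episode guide for the show",
    "anchors": ["Cast list", "Official store"],
}
print(field_match(page, "text", "episode"))
print(field_match(page, "anchor", "official"))
print(field_match(page, "host", "www.example.com"))
```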

Slide 5
Searches for official pages
• Look up the topic you want to find.
• Look at who owns the site.
• Check whether the site is on a free or shared hosting service, or whether it belongs to the large company that owns the property.
• A TV show may have more than one polished-looking site, and some of those may be fan sites.

Slide 6
How Search Engines Work
• Most sites use crawlers/spiders/robots
– They go to web pages and follow links
– They return the pages and links they find.
• Some directories use humans to compile their listings.
– Most sites use some combination of both.
– For example, www.looksmart.com

Slide 7
Crawler Software
• Constantly scans the web.
• Looks for new and updated pages.
• Follows links to new pages.
• May look at META tags for keywords.
• Sends the data to the index/catalog.
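
The crawler steps above can be sketched as a breadth-first walk over links. To keep the example self-contained, the "web" here is an in-memory dictionary rather than real HTTP requests. The record layout (`text`, `links`, `keywords`) is an assumption made for the example, with `keywords` standing in for META tags.

```python
from collections import deque

def crawl(web, seed):
    """Breadth-first crawl of `web`, a dict mapping URL -> page record."""
    seen, queue, fetched = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:              # dead link: nothing to fetch
            continue
        # This tuple is what would be sent on to the index/catalog.
        fetched.append((url, page["text"], page.get("keywords", [])))
        for link in page["links"]:    # follow links to pages not yet seen
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

# Toy web used only for illustration.
web = {
    "a": {"text": "home", "links": ["b", "c"], "keywords": ["start"]},
    "b": {"text": "about", "links": ["a"]},
    "c": {"text": "news", "links": ["b", "missing"]},
}
for record in crawl(web, "a"):
    print(record)
```

A real crawler would also revisit pages to pick up changes, which this one-pass sketch leaves out.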

Slide 8
Index or Catalog
• Receives data from the crawler.
• Organizes the data.
• Stores the data.
• Updates the data.
• May have human intervention.
– Most parts are automated, however.
• Used as the data source for the search engine.
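
A common way to organize this kind of data is an inverted index, which maps each word to the pages that contain it. The slides do not say which structure any given engine uses, so the `Index` class below is only an assumed, minimal illustration of receiving, storing, and updating page data.

```python
from collections import defaultdict

class Index:
    """Toy inverted index: word -> set of URLs containing that word."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> URLs containing it
        self.pages = {}                    # URL -> words currently indexed

    def update(self, url, text):
        """Store a new page, or replace the entry for a page that changed."""
        for word in self.pages.get(url, ()):
            self.postings[word].discard(url)   # drop the stale entries first
        words = set(text.lower().split())
        self.pages[url] = words
        for word in words:
            self.postings[word].add(url)

    def lookup(self, word):
        """Return the URLs whose latest text contains `word`."""
        return set(self.postings.get(word.lower(), ()))

idx = Index()
idx.update("a", "web search basics")
idx.update("b", "search engines")
print(sorted(idx.lookup("search")))
idx.update("a", "cooking tips")          # page "a" was rewritten
print(sorted(idx.lookup("search")))
```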

Slide 9
Search Engine
• Waits for queries to come in.
• Figures out what the queries mean.
• Sends data requests to the catalog.
• Retrieves responses from the catalog.
• Sends responses to the query initiator.
• May use other databases.
– USENET archive on www.google.com
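
The query-handling steps can be sketched as one function that interprets a query, asks the catalog for each word, and combines the answers. The catalog here is assumed to be a plain dictionary from word to a set of URLs. The sketch treats a query as OR if the word OR appears anywhere in it and as AND otherwise, which is a simplification and not a real query parser.

```python
def answer(query, catalog):
    """Interpret 'a AND b' or 'a OR b' against catalog: word -> set of URLs."""
    parts = query.split()
    use_or = "OR" in parts                       # figure out what the query means
    terms = [p.lower() for p in parts if p not in ("AND", "OR")]
    hits = [catalog.get(t, set()) for t in terms]   # data requests to the catalog
    if not hits:
        return []
    combined = set.union(*hits) if use_or else set.intersection(*hits)
    return sorted(combined)                      # response for the query initiator

# Toy catalog used only for illustration.
catalog = {"movies": {"u1", "u2"}, "music": {"u2", "u3"}}
print(answer("movies AND music", catalog))
print(answer("movies OR music", catalog))
```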

Slide 10
How Pages are Ranked
• Location of the keywords in the page.
– Words in the title are given more weight.
– Words near the start of the page or in larger text are given more weight.
• Frequency of the keywords.
– More frequent appearance of the words improves the ranking.
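
A toy scoring function shows how location and frequency might combine. The weights (3 for a title hit, 2 for a hit in the first 20 words, 1 per occurrence) are invented for the example, since real engines do not publish their formulas. Text size is left out because the sketch works on plain text.

```python
def score(title, body, term, title_weight=3.0, early_weight=2.0, early_words=20):
    """Score one page for one keyword using location and frequency."""
    term = term.lower()
    words = body.lower().split()
    total = float(words.count(term))      # frequency: one point per occurrence
    if term in title.lower().split():     # location: keyword in the title
        total += title_weight
    if term in words[:early_words]:       # location: keyword near the start
        total += early_weight
    return total

a = score("Search engines", "search tools help you search the web", "search")
b = score("Home page", "welcome to my page about search", "search")
print(a, b)   # the page with the keyword in its title and repeated in its body scores higher
```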

Slide 11
How Pages are Ranked
• Following links
– More links to a page may help rankings.
– Search engines try to stop “fake” links.
– Higher-quality links may help rankings.
• Clickthrough
– Clicking on a link may give it a higher ranking.
– Search engines try to stop “fake” clicking.
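
The link idea can be illustrated by counting how many different pages link to each page. Ignoring repeated links from the same page and links from a page to itself is a crude stand-in for the "fake" link filtering mentioned above. Real engines use far more elaborate measures of link quality, and the `links` layout is an assumption for the example.

```python
from collections import Counter

def inbound_counts(links):
    """links: dict mapping source URL -> list of target URLs."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):    # repeated links from one page count once
            if target != source:       # a page cannot vote for itself
                counts[target] += 1
    return counts

# Toy link graph used only for illustration.
links = {"a": ["c", "c", "c"], "b": ["c", "a"], "c": ["c"]}
counts = inbound_counts(links)
print(counts["c"], counts["a"], counts["b"])
```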

Slide 12
Paid Listings
• Some sites allow paid listings.
• www.overture.com is all paid listings.
• Most sites have some paid listings.
– www.google.com has sponsored links at top/right.
– www.lycos.com has sponsored links at top.
– www.yahoo.com has sponsored links at top.
– www.looksmart.com has sponsored links at top.
– Other sites have sponsored links as well.

Slide 13
Search Results Format
• Generally, there are sponsored listings or
“affiliates” listed first.
• Category-based engines list directories.
– www.yahoo.com
– www.looksmart.com
• www.excite.com lists other searches that it
thinks you might have wanted to do instead.

Slide 14
Project 4 Hints/Other Searches
• www.mapquest.com
• www.ipl.org
• www.switchboard.com
• www.nic.gov for .gov sites.
• For the 5 webpages on your major, you must pick a different domain for each site.
• www.smartraveler.com