Slide 1
Advanced Searching
• Use query languages.
• Use more than one search engine.
  – Or metasearch engines such as www.metacrawler.com
• Start with simple searches.
• Add new parameters incrementally.

Slide 2
Query Language Features
• Find an exact string.
  – Quotes around words
  – [] around words
• Find words near each other.
  – President near Hillary
• Find words following each other.
  – adams following president

Slide 3
Query Language Features
• Both words must appear.
  – AND between words
  – + before both words
• At least one word must be there.
  – OR between words
  – Possibly words in parentheses (movies music)
• Wildcards
  – Hits on similar or partial names.
  – *, ?

Slide 4
Query Language Features
• Text keyword finds information in the text of the document.
• Anchor keyword finds information in hypertext links in the document.
• Host keyword finds information only at a particular site.
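
The query language features on the slides above (exact string, AND, OR, near, wildcards) can be sketched in a few lines of Python. This is only an illustration of what each operator means, not how any particular search engine implements it. The function names and the default `window` of 10 words for "near" are assumptions made for this example.

```python
import fnmatch
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_phrase(text, phrase):
    """Exact string: the words of `phrase` appear consecutively, in order."""
    words, target = tokenize(text), tokenize(phrase)
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))

def has_all(text, *terms):
    """AND: every term must appear."""
    words = set(tokenize(text))
    return all(t.lower() in words for t in terms)

def has_any(text, *terms):
    """OR: at least one term must appear."""
    words = set(tokenize(text))
    return any(t.lower() in words for t in terms)

def is_near(text, a, b, window=10):
    """NEAR: both words occur within `window` words of each other.
    The window size is an assumed default, not a standard value."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def has_wildcard(text, pattern):
    """Wildcard: * matches any run of characters, ? matches one character."""
    return any(fnmatch.fnmatchcase(w, pattern.lower()) for w in tokenize(text))

doc = "President John Adams followed President George Washington"
print(has_phrase(doc, "john adams"))   # True
print(has_wildcard(doc, "wash*"))      # True
```

Starting with `has_any` and then tightening to `has_all` or `has_phrase` mirrors the advice to start with simple searches and add parameters incrementally.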

Slide 5
Searches for Official Pages
• Look up the topic you want to find.
• Look at who owns the site.
• See whether it’s a free site, a hosting site, or the site of a large company that owns the property.
• A TV show may have more than one really good-looking site, even if some are fansites.

Slide 6
How Search Engines Work
• Most sites use crawlers/spiders/robots.
  – They go to web pages and follow links.
  – They return pages and links.
• Some directories use humans to compile.
  – Most sites use some combination of both.
  – For example www.looksmart.com

Slide 7
Crawler Software
• Constantly scans the web.
• Looks for new and updated pages.
• Follows links to new pages.
• May look at META tags for keywords.
• Sends the data to the index/catalog.

Slide 8
Index or Catalog
• Receives data from the crawler.
• Organizes the data.
• Stores the data.
• Updates the data.
• May have human intervention.
  – Most parts are automated, however.
• Used as data source for the search engine.
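
The crawler and index described above can be sketched with a toy example. To keep it runnable without a network, the "web" here is a hypothetical in-memory dictionary (`TOY_WEB`, with made-up `example` URLs). A real crawler would fetch pages over HTTP and handle many details this sketch ignores. The index is built as a simple word-to-pages mapping, which is one common way to organize such data, not necessarily what any given search engine does.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("search engines use crawlers",
                       ["a.example/about", "b.example/"]),
    "a.example/about": ("crawlers follow links", ["a.example/home"]),
    "b.example/": ("an index stores words", []),
}

def crawl(start, web):
    """Visit pages breadth-first, following links, and yield (url, text)."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        yield url, text
        for link in links:
            # Only follow links to pages that exist and were not seen yet.
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(pages):
    """Organize crawled data as an inverted index: word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages:
        for word in text.lower().split():
            index[word].add(url)
    return index

index = build_index(crawl("a.example/home", TOY_WEB))
print(sorted(index["crawlers"]))  # ['a.example/about', 'a.example/home']
```

The search engine then only needs to look words up in `index`. It never has to visit the pages at query time.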

Slide 9
Search Engine
• Waits for queries to come in.
• Figures out what the queries mean.
• Sends data requests to the catalog.
• Retrieves responses from the catalog.
• Sends responses to the query initiator.
• May use other databases.
  – USENET archive on www.google.com

Slide 10
How Pages are Ranked
• Location of keywords in the page.
  – Words in the title are given more weight.
  – Words near the start of the page, or in larger text, are given more weight.
• Frequency of the keywords.
  – More frequent appearance of the words improves the ranking.

Slide 11
How Pages are Ranked
• Following links
  – More links to a page may help rankings.
  – Search engines try to stop “fake” links.
  – A higher quality of links may help rankings.
• Clickthrough
  – Clicking on a link may give it a higher ranking.
  – Search engines try to stop “fake” clicking.

Slide 12
Paid Listings
• Some sites allow paid listings.
• www.overture.com is all paid listings.
• Most sites have some paid listings.
  – www.google.com has sponsored links at top/right.
  – www.lycos.com has sponsored links at top.
  – www.yahoo.com has sponsored links at top.
  – www.looksmart.com has sponsored links at top.
  – Other sites have sponsored links as well.

Slide 13
Search Results Format
• Generally, there are sponsored listings or “affiliates” listed first.
• Category-based engines list directories.
  – www.yahoo.com
  – www.looksmart.com
• www.excite.com lists other searches that it thinks you might have wanted to do instead.

Slide 14
Project 4 Hints/Other Searches
• www.mapquest.com
• www.ipl.org
• www.switchboard.com
• www.nic.gov for .gov sites.
• For the 5 webpages on your major, you must pick a different domain for each site.
• www.smartraveler.com
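
As a supplement to the "How Pages are Ranked" slides, the two on-page factors (keyword location and keyword frequency) can be sketched as a simple scoring function. The page dictionaries, the function names, and the `title_weight` value of 3.0 are all assumptions chosen for illustration. Real engines combine many more signals, such as links and clickthrough, and do not publish their exact weights.

```python
def score(page, query_words, title_weight=3.0):
    """Score a page: keyword frequency in the body, plus extra
    weight for each keyword occurrence in the title."""
    title = page["title"].lower().split()
    body = page["body"].lower().split()
    total = 0.0
    for word in query_words:
        word = word.lower()
        total += body.count(word)                  # frequency of the keyword
        total += title_weight * title.count(word)  # title words weigh more
    return total

def rank(pages, query_words):
    """Return pages ordered from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, query_words), reverse=True)

# Hypothetical pages, made up for this example.
pages = [
    {"title": "Cooking tips", "body": "adams likes adams apples"},
    {"title": "John Adams", "body": "adams was the second president"},
]
for page in rank(pages, ["adams"]):
    print(page["title"], score(page, ["adams"]))
```

Here the second page ranks first even though the keyword appears fewer times in its body, because the title match carries more weight.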