Advanced Searching
• Use query languages.
• Use more than one search engine.
  – Or use a metasearch engine such as www.metacrawler.com
• Start with simple searches.
• Add new parameters incrementally.

Query Language Features
• Find an exact string.
  – Quotes around words
  – [] around words
• Find words near each other.
  – President near Hillary
• Find words following each other.
  – adams following president

Query Language Features (continued)
• Both words must appear.
  – AND between words
  – + before both words
• At least one word must be there.
  – OR between words
  – Possibly words in parentheses (movies music)
• Wildcards
  – Hits on similar or partial names.
  – *, ?

Query Language Features (continued)
• Text keyword finds information in the text of the document.
• Anchor keyword finds information in hypertext links in the document.
• Host keyword finds information only at a particular site.

Searches for Official Pages
• Look up the topic you want to find.
• Look at who owns the site.
• Check whether it is a free site, a hosting site, or a site run by the large company that owns the property.
• A TV show may have more than one really good-looking site, even if some are fan sites.

How Search Engines Work
• Most sites use crawlers/spiders/robots.
  – They go to web pages and follow links.
  – They return pages and links.
• Some directories use humans to compile listings.
  – Most sites use some combination of both.
  – For example, www.looksmart.com

Crawler Software
• Constantly scans the web.
• Looks for new and updated pages.
• Follows links to new pages.
• May look at META tags for keywords.
• Sends the data to the index/catalog.

Index or Catalog
• Receives data from the crawler.
• Organizes the data.
• Stores the data.
• Updates the data.
• May have human intervention.
  – Most parts are automated, however.
• Used as data source for the search engine.

Search Engine
• Waits for queries to come in.
• Figures out what the queries mean.
• Sends data requests to the catalog.
• Retrieves responses from the catalog.
• Sends responses to the query initiator.
• May use other databases.
  – USENET archive on www.google.com

How Pages are Ranked
• Location of the keywords in the page.
  – Words in the title are given more weight.
  – Words near the start of the page, or in larger type, are given more weight.
• Frequency of the keywords.
  – More frequent appearance of the words improves the ranking.

How Pages are Ranked (continued)
• Following links
  – More links to a page may help rankings.
  – Search engines try to stop “fake” links.
  – A higher quality of links may help rankings.
• Clickthrough
  – Clicking on a link may give it a higher ranking.
  – Search engines try to stop “fake” clicking.

Paid Listings
• Some sites allow paid listings.
• www.overture.com is all paid listings.
• Most sites have some paid listings.
  – www.google.com has sponsored links at top/right.
  – www.lycos.com has sponsored links at top.
  – www.yahoo.com has sponsored links at top.
  – www.looksmart.com has sponsored links at top.
  – Other sites have sponsored links as well.

Search Results Format
• Generally, there are sponsored listings or “affiliates” listed first.
• Category-based engines list directories.
  – www.yahoo.com
  – www.looksmart.com
• www.excite.com lists other searches that it thinks you might have wanted to do instead.

Project 4 Hints/Other Searches
• www.mapquest.com
• www.ipl.org
• www.switchboard.com
• www.nic.gov for .gov sites.
• For the 5 webpages on your major, you must pick different domains for each site.
• www.smartraveler.com
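
Ranking Sketch
The keyword-location and keyword-frequency ideas from the "How Pages are Ranked" slides can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the weights, the 20-word cutoff for "near the start", and the function names (tokenize, score_page, rank_pages) are all hypothetical and do not describe how any real search engine scores pages. Link counts and clickthrough are left out.

```python
import re

# Hypothetical weights for illustration only; real engines do not publish theirs.
TITLE_WEIGHT = 3.0   # keyword appears in the page title
EARLY_WEIGHT = 2.0   # keyword appears among the first EARLY_WORDS body words
BODY_WEIGHT = 1.0    # keyword appears anywhere later in the body
EARLY_WORDS = 20     # assumed cutoff for "near the start of the page"


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_page(title, body, query):
    """Score one page for a query using keyword location and frequency."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    score = 0.0
    for keyword in tokenize(query):
        # Location: every title occurrence earns the larger title weight.
        score += TITLE_WEIGHT * title_words.count(keyword)
        # Frequency: every body occurrence adds to the score,
        # with early occurrences counting for more.
        for position, word in enumerate(body_words):
            if word == keyword:
                score += EARLY_WEIGHT if position < EARLY_WORDS else BODY_WEIGHT
    return score


def rank_pages(pages, query):
    """Return (score, url) pairs for pages that match, best score first."""
    scored = [(score_page(p["title"], p["body"], query), p["url"]) for p in pages]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
```

Calling rank_pages with a list of dictionaries that each have "url", "title", and "body" keys returns the matching pages in ranked order. A page with the keyword in its title outranks one that only mentions it once in the body, which is the behavior the slides describe.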