Googling

Welcome!
While you are waiting, please…
  find in your packet: Exercise 6 - Questions for the Final Exercise, “What Do You Want Google to Tell You?”
  begin writing down your questions in three or more categories

Instructor: Joe Barker
jbarker@library.berkeley.edu
An Infopeople Workshop
2005

This Workshop is Brought to You By the Infopeople Project
Infopeople is a federally-funded grant project supported by the California State Library. It provides a wide variety of training to California libraries. Infopeople workshops are offered around the state and are open registration on a first-come, first-served basis.
For a complete list of workshops, and for other information about the Project, go to the Infopeople Web site at infopeople.org.

Introductions
  Name
  Library
  Position
  How do you use Google?

Workshop Overview
  Google’s way of “thinking”
  Taking charge of the driving
  Using limits to find the hard-to-get
  Finding information on a subject
  Special Google databases and tools
  What to do when Google doesn’t work

Go to: bookmarks.infopeople.org
  Click on extreme_googling_bk.htm
  Make a bookmark of this page
  Add to Favorites

Exercise 1
How does Google “think” about your searches?
Please pause and wait for discussion when you reach a

A Close Look at Google Search Results
• Excerpt of page with your terms
• Matched terms in bold
• Which Google database used
• Approx. # of hits
• Terms actually searched on, as Dictionary links
• URL, size, date last crawled
• Link to Cached copy
• Pages supposedly like this one
• 2nd page from same site
• All Google pages from this site

Don’t believe the number of Results
  They are approximate, changing, and not comprehensive

Default Matching on Search Terms
  Default AND between terms
  Google takes a FUZZY approach
    only some of the words if a page is “important”
    words may occur only in pages that link to the page
    words occur somewhere on the site a page belongs to
  Cached reveals the page as Google found it
    may differ from the current page
  Cached exists if a page is full-text indexed
    About 1 billion pages in Google are not cached
    Not fully searchable
    no Cached if a page owner requests not to be cached

How Can You Know Why Google Found a Page?
  Click Cache link toward end of results
    top area often explains what was matched

Stemming
  Google stems “when appropriate”
    automatically detects word stem or root
    retrieves with various endings
  kite flying gets
    kite  kites  kiting
    fly  flying, flyers, flyer’s, flyers’
  to turn off
    +kite +flying
    "kite flying"
  single word searches not stemmed

Words Google Does Not Search
  Common or “stop” words ignored
    to be or not to be
  no list of “common” terms
    Google tells you below search box in results
  to turn off
    +to +be +or not +to +be
    "to be or not to be"
  single word searches possible on common words

Ranking of Results
  Word order matters
    favoring phrases (words together)
    looks for phrases with something in place of stop words
  word repetition and proximity also count
  Google ranking is a great mystery
  PageRank combines many factors
    popularity - links to a page and their importance
    “importance” - a value of 0 (low) to 10 (high)
    term placement - phrases, proximity, repetition
  See Cheat Sheet #1

Google Preferences
  Interface language
  Selected languages for pages
  SafeSearch filtering
    “moderate” is default
  Number of results returned
    20 or 30 is best
  Open new browser window for search results
  Back of Cheat Sheet #1

The Google Toolbar
  Search any Google databases
  Search within a site
  Pop-up blocker
  Search history list
  Set Google preferences quickly
  Customizable in Options
  download from toolbar.google.com
  Other browsers toolbar
    download from googlebar.mozdev.org

Exercise 2
  Installing the Google Toolbar
  Customizing Preferences

Taking Charge of Driving Google
OR
Getting the Most from Google’s FUZZY Thinking

Improving Google’s “FUZZY” Default AND
  Problems with AND default:
    words can occur anywhere in results pages
    some pages may not contain all of your words
    some may not have any of your words
  Use quotation marks to require words together
    may have different meanings or contexts
    turns common words into unique search terms
    "working mothers" 145,000 (5% of working mothers 2,680,000)
    "dry cells" 11,500 (1% of dry cells 1,010,000)
  Hyphen makes phrases and searches with and without hyphens
    bite-sized retrieves bite-sized, bite sized, bitesized

Force “FUZZY” with OR Searches
  Singulars and plurals not covered by stemming
    parent OR parents
  Equivalent or synonymous terms
    parent OR guardian
  Misspellings
    libarian OR librarian
  Apostrophes and their misuse
    april's OR aprils OR april "fools day"

Ask Google to be “FUZZY”
  Synonym search
    ~ immediately before a word
    sometimes “thinks” of very broad, related terms
      ~food   recipes, nutrition, cooking
      ~facts  information, statistics
      ~help   guide, tutorial, FAQ, manual
    Often: Terms appear in links pointing to a retrieved page
  Take advantage of stemming
    Let stemming handle variant endings:
      "wild flowers" OR wildflowers hike "point reyes" april OR may OR spring
      hike, hikers, hiking, hikes

Ask for “FUZZY” Number Ranges
  Numrange search uses .. (two periods, no spaces)
    babe ruth 1921..1935
      results have highlighted dates within this range
    3..6 megapixels digital camera
      most numbers will be associated with megapixels
    DVD player $250..
      can be open-ended: any number above starting number

The Whole-Word Wildcard: Allowing FUZZY within “ ”
  Can’t remember the exact wording in a phrase?
    Who wrote something like, “The stag at night drank his fill”?
  Try searching:
    "the stag * * * his fill" OR "the stag * * * * his fill"
  ANSWER: “The stag at eve had drunk his fill” - in most sources
    Sir Walter Scott, “Lady of the Lake”
  Construct proximity searches
  Or try GAPS
    www.staggernation.com/cgi-bin/gaps.cgi
    "george bush"
    "george * bush"
    "george * * bush"
    "bush george"
    "bush * george"

Excluding to Control “FUZZIness”
  You want: Medical info about a pancreatitis diet
  Start with:
    pancreatitis diet  172,000
  Eliminate undesirable words in results:
    pancreatitis diet -cat -dog  132,000
    pancreatitis -cat -dog -"support group"  128,000
  Select exclusions carefully

Ask Google to be Very “FUZZY”: Related & Similar
  Two commands for the same function
    click Similar at end of result
    search related:www.infopeople.org
  Sometimes hard to see how related
    links to and from the target page
    major words in and ranking of related pages
  Possible uses
    comparison shopping
    find more sites like a site
      related:www.econsumer.gov
    use to evaluate a suspect page

Exercise 3
  Taking Charge of Driving Google

Limiting to Find the Hard-to-Get

Limiting: Words in <Title>
  intitle: finds pages concentrated on your term
    hybrid cars intitle:mileage  7,060
    hybrid cars mileage  296,000
  with quotes:
    intitle:"cuban embargo"  581
    "cuban embargo"  28,000
  with OR:
    intitle:"global warming" OR intitle:"greenhouse effect"
  Use allintitle: to require all words in title
    allintitle: hybrid cars mileage  86
    can combine only with site:
    allintitle: hybrid cars mileage -site:com  11

Exploiting a Page’s URL
  Limiting to domain (edu, gov, etc):
    site:edu OR site:gov OR site:ca.us
    complete list at: http://en.wikipedia.org/wiki/List_of_Internet_TLDs
  Searching within a Site
    site:
    site:memory.loc.gov lincoln "sheet music"
    works only in top/first part of URL
    omit http:// and final /
    makes Google into a search engine for pages that are indexed in Google
  inurl:
    less specific
    term may be anywhere in URLs
    inurl:lincoln "sheet music"
    finds “lincoln” anywhere in any URL and “sheet music” somewhere in the pages

Limiting to Types of Documents
  filetype:
    OR to find more than one
    form 1040 filetype:pdf - finds forms
  -filetype:
    exclude certain filetypes
    form 1040 -filetype:pdf - finds help with forms
  View as HTML link can be useful
    avoids viruses a document might carry if opened
    allows viewing without the software or reader

Caveats for Limit Commands
  Cannot always be combined
    link: similar: must stand alone
    allintitle: allintext: allinanchor: allinurl: with site: only
  You can mix all other limit commands, usually:
    inurl:ucla intitle:admissions statistics
    intitle:"thyroid disease" site:edu OR site:com
  Be careful not to ask for the impossible:
    site:ucla.edu -inurl:edu
    site:com site:edu site:gov
  Some require understanding HTML hypertext links:
    inanchor:links looks for text in link tags in the HTML code:
      <a href="http://www.pancreasweb.com">Pancreatitis links</a>
      <a href="www.pancreaticdisease.com/links/links.htm">Links</a>
  See Cheat Sheet #3

Advanced Web Search page
Restricted Opportunities
  Useful if you want to:
    Try limiting to pages updated in 3 mos, 6 mos, year
    Change language of results pages
    Select from list of filetype formats
    Change content filtering (also in Preferences)
  I almost never use it
  Not useful if you want to:
    Construct complex searches
    Use OR for more than one limiter
    OR with phrases
    multiple phrases
    site: filetype: inurl:
    Use intitle: inurl:
      only the allin... commands in Advanced Search

Exercise 4
  Limiting

Finding Info on a Subject

Finding Directories & Link Lists
  EXAMPLE - looking for links or directories about: "women's history" "middle east"
  Use words likely to occur in link-list or directory pages
    links OR "directory of" OR guide "women's history" "middle east"
    "what's new" OR "what's cool" "women's history" "middle east"
  <Title> field limit to focus pages you want
    intitle:links OR intitle:"directory of" OR intitle:"encyclopedia of" "women's history" "middle east"
    intitle:"women's history" intitle:directory "middle east"
  Are there agencies or organizations with links on this topic?
    inanchor:links society OR association "middle east" "women's studies"
  Be creative. Substitute database for “directory” to find searchable databases

Google’s Directory
  1.5+ million pages (compare with 8+ billion in web search)
  DMOZ Open Directory
  Google “importance” ranking within directory
  EXAMPLE: women's history middle east OR eastern
  Click on useful subject categories for more:
    Science > Social Sciences > Area Studies > Middle Eastern Studies
    Society > People > Women > Women's Studies > By Topic
    Society > Issues > Human Rights and Liberties > Regional > Middle East

Search Google for Weblogs
  Current commentary, opinions, misc. musings
  Google indexes “important” blogs frequently
    more than most web pages
  Thorough search impossible
    blog OR weblog OR "web log" your subject words
    inurl:blog OR inurl:weblog your subject words
  If you know the software a blog is using:
    "powered by blogger" your subject words
    site:blogspot.com your subject words
    "powered by geeklog" your subject words
  Try searching the Google Directory

Search Google Groups for Info
  Usenet news groups back to 1981
    archive of UNevaluated public thoughts, advice & opinions
    some not found elsewhere
    select threads with more than one article for context
  Search differences:
    search for a group by name
    search within a group
    + required for common words even in " "
      "hair loss" OR "loss +of hair" OR balding group:alt.support.thyroid
    use Advanced Search to limit by group or date posted
  Create new mailing lists with registration

Google as Encyclopedic Glossary
  Use the command define:[no space]
    Google finds and ranks Web pages with definitions
    define:internet
    define:due diligence
  Or build searches for pages with definitions:
    internet "what is"
    "what is the internet"
    "internet stands +for"
    internet ~beginners
    internet ~FAQ
  Also many common facts available:
    population of japan
    currency in algeria
    birthplace of hitler

Exercise 5
  Finding Info on a Subject

Brainstorming
How would you approach Google to solve each of the following problems?
  1. Where can I find some good collections of links and information about migraine headaches?
  2. I want to find websites directing me to good places for bird watching in Northern California.
  3. Blogs about California and the use of blogs in libraries, particularly to keep in touch with other librarians and libraries in the state, and how they’re using blogs?
  4. How can I find debates, from a wide range of perspectives, on what constitutes a near-death experience? I'm interested in proofs that what people report can be believed.
  5. Birthplace of Teddy Roosevelt?
  6. Size of California?
  7. What is the currency of Nepal, and how much of it could US $100 buy as of January 15, 2004?

Special Google Databases and Tools

Shortcuts and Services
  Shortcuts:
    dictionaries and other definitions
    phonebooks - white and yellow
    movie showtimes
    stocks with recent news
    maps, weather
    converters, math problem calculators, physical constants
    number searches
      UPS, FedEx, USPS, VIN, UPC codes, area codes, airplane reg. #, patents, more
    http://www.googleguide.com/shortcuts.html
  Translate
    click [Translate this page]
    or URL or enter text at www.google.com/language_tools
  Page Info - better to enter a URL @ alexa.com
  Many search engines offer useful shortcuts & similar tools:
    See Search Cheat Sheet #4 & Supplement

“Hacking” Google URLs
  Structure of a Google search result URL
  Your search is for: "web searching" tutorial
    http://www.google.com/search?      Google URL; ? indicates query
    num=20&                            Number of results per page
    hl=en&                             Interface language
    lr=&                               Search language blank (ALL)
    safe=off&                          SafeSearch off
    q=%22web+searching%22+tutorial     Query search terms; %22 means quote mark, + joins terms
  Will vary according to your Preferences setting
  You can modify results by changing values

A “Hack” for Country Searches
  Type the search: egypt history 1950..1970
    http://www.google.com/search?num=20&hl=en&lr=&safe=off&q=egypt+history+1950..1970&restrict=countryEG
  Append in Address/URL box (no spaces): &restrict=countryEG
  General format - capitalized country code: &restrict=countryXX
  Complete country codes list: http://en.wikipedia.org/wiki/List_of_Internet_TLDs
  More countries and pages than in Language Tools search page www.google.com/language_tools

Google’s Other Proprietary Databases
  Besides Web, Directory, and Groups
  Images    News
    Use Advanced Search forms
    4,500 news sources
    Useful, specific limit settings
    30 days
    international versions - other news slants
  Froogle for shopping
    1.3+ billion
    SafeSearch filter only works in English language
    shopping sites from Google - a subset + merchant uploads of catalogs not on the web
    no fees, no pay for position
  Catalogs (Google Labs still)
    scanned mail-order catalogs (not web), text searchable
    to navigate within a catalog, click an image and use the special catalogs navigation bar

Local Information
  local.google.com
    “businesses & services” from Google web database + several yellow pages
    topic box
    address/location box
    restrict to 1, 5, 15, 45 miles away
    geographic proximity, maps
    EXAMPLE: vegetarian restaurants
      100 Larkin St, San Francisco, CA
  maps.google.com
    draggable images, satellite view
    local (yellow pages), driving directions
  earth.google.com
    requires download, 200 MB memory
    exotic toy or useful tool?

Google Labs
  More upcoming Google services (beta)
    Print.google.com - search only in Print database
      project to make full text books available online
    Sets - create and explore sequences of things
    Suggest - browse possible search terms
    video.google.com - some TV programs
    My search history - registration and privacy considerations
  Scholar.google.com - special page to search from
    scholarly articles (mostly) on the web
    abstracts if full text not available
    integrated with OCLC for library holdings
    integrated with some college campuses
  See Cheat Sheet #5

Exercise 6
Where would you look?
  1. Choose ONE or TWO questions to answer
  2. Write down what you did & learned
  3. It’s O.K. to talk, ask questions, and help each other as needed

When Google Doesn’t Work

Other Effective Search Engines
  Yahoo Search (3+ billion)
    no 10-word limit
    accepts ( ) around Boolean OR
      ("global warming" OR "greenhouse effect") (site:edu OR site:gov OR site:uk)
    pay-for-position sites not identified
  Teoma (1+ billion)
    popularity within subjects
    sometimes finds link collections as Resources

Bookmarklets for Searching
  JavaScript applications that reside in your Bookmarks or Favorites (Favlets)
  Search engine tools:
    run a search in another search engine
      @Teoma @Yahoo!
    search highlighted text in a search engine
  Information and more about them at searchengineshowdown.com/bmlets

Recommended Directories
  By library people
    LII.ORG
    Academic Info
    Infomine
  Complement to searching
    when search engines do not seem to work
    when you know or have a hunch there is a site about your question

Thinking in Sync with Search Engines
  Search engine balancing act:
  Do we agree with Google’s “importance”?
    tyrannical or democratic?
    favors established more than new websites
    favors trendy, high-speed, consumer, vroom & zoom
  Are Google’s secretiveness & fuzziness trustable?
  Have search engines changed us?
    Do we accept “good enough” quicker?
    Have we given up “thorough” and “certain”?
  Will semantic & linguistic analysis help?
    Or bring in a new age of “whatever” thinking

Exercise 7
Make your own Cheat Sheet
  Write down up to seven things you want to remember to do or practice
  Circle the ONE you like most

Workshop Evaluation
  infopeople.org/WS/eval
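
Supplement to the “Hacking” Google URLs slides
The URL structure described on those slides can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the parameter names (num, hl, lr, safe, q, restrict) and the base address are copied from the slides, the function names build_search_url and read_search_url are made up for illustration, and nothing here checks whether Google still honors these parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address exactly as shown on the slide (an assumption that it is still valid).
GOOGLE_SEARCH = "http://www.google.com/search"


def build_search_url(query, num=20, hl="en", lr="", safe="off", country=None):
    """Assemble a search URL from the parameters named on the slide.

    build_search_url is a hypothetical helper, not part of any Google tool.
    """
    params = {
        "num": num,    # number of results per page
        "hl": hl,      # interface language
        "lr": lr,      # search language; blank means all languages
        "safe": safe,  # SafeSearch setting
        "q": query,    # search terms; urlencode writes " as %22 and a space as +
    }
    if country:
        # The country "hack" from the slide: restrict=country plus a
        # capitalized two-letter country code, appended at the end.
        params["restrict"] = "country" + country.upper()
    return GOOGLE_SEARCH + "?" + urlencode(params)


def read_search_url(url):
    """Decode a search URL back into a dict of parameter lists (hypothetical helper)."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


if __name__ == "__main__":
    print(build_search_url('"web searching" tutorial'))
    print(build_search_url("egypt history 1950..1970", country="eg"))
    print(read_search_url(build_search_url('"web searching" tutorial'))["q"])
```

Changing a keyword argument such as num or safe corresponds to “modify results by changing values” on the slide; the encoding of quote marks and spaces is done by the standard library rather than by hand.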