Google: More Than Words
How can I use Google better?

References:
Patrick Crispen: Google 101, Google 201 (web); 2003.
Tara Calishain, Rael Dornfest: Google Hacks (O'Reilly); 2003.

Wed 18 Aug <Science Week> S. Dekeyser

The PageRank Algorithm

$$PR(A) = (1 - d) + d \left( \frac{PR(T1)}{C(T1)} + \cdots + \frac{PR(Tn)}{C(Tn)} \right)$$

Where
• PR(A) is the PageRank of page A
• PR(T1) is the PageRank of page T1, one of the pages T1 ... Tn that link to page A
• C(T1) is the number of outgoing links from page T1
• d is a damping factor in the range 0 < d < 1, usually set to 0.85

Source: Google Hacks, p. 295

The Biggest Mistake
Typing URLs in the wrong box

Top 20 search terms
1. britney spears
2. victoria gotti
3. kelly blue book
4. doom 3
5. howard stern
6. dallas cowboys
7. paris hilton
8. ebay
9. google
10. yahoo
11. anastasia myskina
12. anime
13. mapquest
14. carmen electra
15. yahoo.com
16. ashlee simpson
17. games
18. lindsay lohan
19. nudist+image
20. jokes
-- Courtesy WordTracker.com

What most people notice
(the same 20 search terms)

What *I* notice
(the same 20 search terms)

The Second Biggest Mistake
Using the wrong tool at the wrong time

Three questions
• Where would you find the telephone number or address of the Bama Six theatre?
• Where would you find the definition of the word “pestilence”?
• Where would you find the name of the war that the Treaty of Westphalia ended?

What would happen if you tried to look up the definition of the word “pestilence” in the telephone book?

YAHOO ISN’T A SEARCH ENGINE!
... it is a directory.

Directories
• Usually human-compiled guides to the web, where sites are organized by category
• Major directories:
– MSN
– Yahoo
– Netscape ODP

What directories are good for
• “What is the Web page address for some company, organization, or entity?” (or “who makes product X?”)
• “Where can I find a list of Web pages that focus on a particular, ‘universal’ topic?”
• In other words, directories are GREAT for “telephone book” searches.

What directories AREN’T good for
• Directories are horrible for “encyclopedia” or “dictionary” searches.
• The only exception is if the topic is so universal that the directories have no choice but to link to a page or two that discuss that topic (and even then the selection will be slim).

Directories v Search Engines
• Directories are
– human-compiled and
– have a small number of pages in their databases (usually in the low millions)
• Search engines are
– machine-compiled and
– have a HUGE number of pages in their databases (usually in the hundreds of millions or even the billions)

The Third Biggest Mistake
Not knowing how to use directories or search engines to actually FIND stuff

Search engine rule #1
Be specific ... because if you aren’t specific, you’ll end up with a bunch of garbage!

Search engine rule #2
Use quotes to search for phrases.
“the tide is rising”
Search engine rule #3
Use the + sign to require.
“the tide is rising” +book

Search engine rule #4
Use the - sign to exclude.
“the tide is rising” +book -drugs

Search engine rule #5
Combine symbols as often as possible (see rule #1).
“the tide is rising” +book -drugs +climate

The five rules
1. Be specific ... because if you aren’t specific, you’ll end up with a bunch of garbage!
2. Use quotes to search for phrases.
3. Use the + sign to require.
4. Use the - sign to exclude.
5. Combine symbols as often as possible (see rule #1).

Part Two: More Stuff No One Tells You
Google’s shocking secrets revealed!

Part Two: Contents
• Google’s Boolean default is AND.
• Capitalization does not matter.
• Google has a hard limit of 10 keywords.
• Google ignores a BUNCH of common words.
• Google does support wildcard searches … sort of.
• The order of your keywords matters.

Google’s Boolean Default is AND
But there are ways to get around that.

Boolean Default is AND
• If you search for more than one keyword at a time, Google will automatically search for pages that contain ALL of your keywords.
• A search for
disney fantasyland pirates
is the same as searching for
disney AND fantasyland AND pirates
• But, if you try to use AND on your own, Google yells at you.
Source: http://www.google.com/help/basics.html

Boolean OR
• Sometimes the default AND gets in the way. That’s where OR comes in.
• The Boolean operator OR is always in all caps and goes between keywords.
• For example, an improvement over our earlier search would be
disney fantasyland OR “pirates of the caribbean”
– This would show you all the pages in Google’s index that contain the word disney AND either the word fantasyland OR the phrase pirates of the caribbean (without the quotes)
Source: http://www.google.com/help/refinesearch.html

Capitalization Does NOT Matter
The old AltaVista trick of typing your keywords in lower case is no longer necessary.

How Insensitive!
• Google is not case sensitive.
• So, the following searches all yield exactly the same results:
disney Disney DISNEY DiSnEy
fantasyland Fantasyland FANTASYLAND FaNtAsYlAnD
pirates Pirates PIRATES pIrAtEs
Source: http://www.google.com/help/basics.html

Google Has a Hard Limit of 10 Keywords
Bet you didn’t know THAT!
Source: Google Hacks, p. 19

Google’s 10 Word Limit
• Google won’t accept more than 10 keywords at a time.
• Any keyword past 10 is simply ignored.
• How can you get around this limit? Well, first you need to remember that …
Source: Google Hacks, p. 19

Google Ignores a BUNCH of Common Words
Words to avoid

Stop Words
To enhance the speed and relevancy of your Web search, Google routinely and automatically ignores common words and characters known as “stop words.”
Source: http://www.google.com/press/guide/reviewguide_7.html

Stop _ _ Name _ Love
• This is certainly not a canonical list, but here are 28 stop words I know about.
• a, about, an, and, are, as, at, be, by, from, how, i, in, is, it, of, on, or, that, the, this, to, we, what, when, where, which, with
• You can force Google to search for a stop word by putting a + in front of it (for example, pirates +of +the caribbean)
Source: 10/23/02 post by Bill Todd to news:google.public.support.general

Dealing with the Word Limit
• Omit the stop words in your search terms and you’ll probably never run into the 10 word limit.
• Another way around the limit is to use wildcards.
Image source: http://www.alloyd.com/

Google DOES Support Wildcard Searches … Sort Of.
When you wish upon a *.

Wildcards
• Wildcards are characters, usually asterisks (*), that represent other characters.
• Google offers full-word wildcards.
• For example, if you search Google for it’s +a * world, Google shows you all of the pages in its database that contain the phrase “it’s a small world” … and “it’s a nano world” … and “it’s a Linux world” … and so on.

it’s +a * world
• The + before a is required because it is a stop word and would otherwise be ignored.
• Most of the hits are phrases because that’s what Google looks for first.
• Oh, and I defy you to get that song out of your head!
Image source: http://themeparksource.com/

Wildcards and the Word Limit
• Remember when I said that one way to get around the 10 word limit was to use wildcards?
• Google doesn’t count wildcards toward the limit.
• For example, Google thinks that
though * mountains divide * * oceans * wide it's * small world after all
is exactly 10 words long.
Source: Google Hacks, p. 19

The Order of Your Keywords Matters
A me life for pirate’s?
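The word-limit rules from the preceding slides (stop words are ignored, a leading + forces a stop word back in, wildcards are free, and anything past 10 keywords is dropped) can be sketched in a few lines of Python. This is only an illustration of the rules as stated on the slides, not Google's actual implementation; the names STOP_WORDS, KEYWORD_LIMIT and effective_keywords are my own, and the sketch assumes a plain query with no quoted phrases or OR operators.

```python
# The 28 stop words listed on the slide (not a canonical list).
STOP_WORDS = {
    "a", "about", "an", "and", "are", "as", "at", "be", "by", "from",
    "how", "i", "in", "is", "it", "of", "on", "or", "that", "the",
    "this", "to", "we", "what", "when", "where", "which", "with",
}
KEYWORD_LIMIT = 10  # hard limit quoted from Google Hacks, p. 19


def effective_keywords(query):
    """Return (kept, dropped) keyword lists for a plain, unquoted query."""
    counted = []
    for token in query.split():
        if token == "*":
            continue  # full-word wildcards do not count toward the limit
        if token.startswith("+"):
            counted.append(token[1:])  # a leading + forces the word to be searched
        elif token.lower() not in STOP_WORDS:
            counted.append(token)  # ordinary keyword
        # anything else is a stop word and is silently ignored
    return counted[:KEYWORD_LIMIT], counted[KEYWORD_LIMIT:]


kept, dropped = effective_keywords(
    "though * mountains divide * * oceans * wide it's * small world after all"
)
print(len(kept), dropped)  # 10 []
print(effective_keywords("pirates of the caribbean")[0])  # ['pirates', 'caribbean']
```

Under these assumptions the wildcard query from the slide comes out at exactly 10 counted words, and pirates of the caribbean shrinks to two keywords unless the stop words are forced with +.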
How Google Works
• When you conduct a search at Google, it searches for
– Phrases, then
– Adjacency, then
– Weights.
• Because Google searches for phrases first, the order of your keywords matters.
Image source: Google
Source: Google Hacks, p. 20-22

For Example
A search for disney fantasyland pirates yields the same number of hits as a search for fantasyland disney pirates, but the order of those hits – especially the first 10 – is noticeably different.

Part Two: In Summary
• Google’s Boolean default is AND.
• Capitalization does not matter.
• Google has a hard limit of 10 keywords.
• Google ignores a BUNCH of common words.
• Google does support wildcard searches … sort of.
• The order of your keywords matters.

Advanced Operators
Query modifiers
• daterange:
• filetype:
• inanchor:
• intext:
• intitle:
• inurl:
• site:
Alternative query types
• cache:
• link:
• related:
• info:
Other information needs
• phonebook:
• stocks:
• define:
• Google Calculator

Other Information Needs
Did you know that Google can look up phone numbers, stock quotes, dictionary definitions, and even the answers to math problems?

phonebook:
• There are actually three different Google phonebook operators.
• Using phonebook: searches the entire Google phonebook.
• Using rphonebook: searches residential listings only.
• Using bphonebook: searches business listings only.
Source: http://www.google.com/help/operators.html
How to Use the Phonebook
• first name (or first initial), last name, city (state is optional)
• first name (or first initial), last name, state
• first name (or first initial), last name, area code
• first name (or first initial), last name, zip code
• phone number, including area code
• last name, city, state
• last name, zip code

phonebook:Data
phonebook:disneyland ca
phonebook:(714) 956-6425

stocks:
• If you begin a query with stocks: Google will treat the rest of the query terms as stock ticker symbols, and will link to a Yahoo finance page showing stock information for those symbols.
• Go crazy with the spaces – Google ignores them!
Source: http://www.google.com/help/operators.html

stocks:Symbol1 Symbol2 …
stocks: msft
stocks: aapl intc msft macr

define:
• If you begin a query with define: Google will display definitions for the word or phrase that follows, if definitions are available.
• There can be no space between define: and the word or phrase you wish to define.
• You don’t need quotes around your phrases.
Source: http://www.google.com/help/features.html#definitions

define:term
define:pirate
define:barbary coast

Google Calculator
• Simply key in what you'd like Google to compute (like 2+2) and then hit enter.
• Google’s Calculator can solve math problems involving basic arithmetic, more complicated math, units of measure and conversions, and physical constants.
Source: http://www.google.com/help/features.html#calculator

3+44
56*78
1.21 GW / 88 mph
100 miles in kilometers
sine(30 degrees)
G*(6e24 kg)/(4000 miles)^2
0x7d3 in roman numerals
For instructions on how to use the Google Calculator, see http://www.google.com/help/calculator.html
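Several of the calculator queries above are easy to check by hand. The Python sketch below reproduces the arithmetic only; it does not call Google and says nothing about how Google formats its answers. The conversion constants and the to_roman helper are my own additions for illustration.

```python
import math

MILE_IN_KM = 1.609344              # international mile, exact by definition
MPH_IN_M_PER_S = 1609.344 / 3600   # one mile per hour in metres per second


def to_roman(n):
    """Convert a positive integer to Roman numerals."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)   # how many times this numeral fits
        parts.append(symbol * count)
    return "".join(parts)


print(3 + 44)                                   # 47
print(56 * 78)                                  # 4368
print(round(100 * MILE_IN_KM, 4))               # 160.9344 (kilometers)
print(round(math.sin(math.radians(30)), 6))     # 0.5
print(to_roman(0x7d3))                          # MMIII (0x7d3 is 2003)

# 1.21 GW / 88 mph: power divided by speed gives a force in newtons.
force_newtons = 1.21e9 / (88 * MPH_IN_M_PER_S)
print(force_newtons)                            # roughly 3.08e7
```

The gravity query (G*(6e24 kg)/(4000 miles)^2) is left out because it needs a value for G, which the slides do not give.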
Advanced Operators
Query modifiers
• daterange:
• filetype:
• inanchor:
• intext:
• intitle:
• inurl:
• site:
Alternative query types
• cache:
• link:
• related:
• info:
Other information needs
• phonebook:
• stocks:
• define:
• Google Calculator
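Going back to the PageRank formula from the start of the talk: because PR(A) depends on the PageRank of the pages linking to A, the values are usually found by repeating the formula until they settle down. The sketch below is a minimal Python illustration under my own assumptions, not Google's implementation. The function name pagerank, the links dictionary, the starting value of 1.0, the fixed iteration count, and the three-page example web are all hypothetical, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page T passes on PR(T) split over its C(T) outgoing links.
            incoming = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page C ends up with the highest value because both other pages link to it, and page B the lowest because its only incoming link is shared with another target.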