Manatee Genealogical Society
MGS Computer Special Interest Group (SIG) 4
http://www.colket.org/genealogy/MGS/

Overview
• History of Browsing
• Problem of Searching
• Solution to Search Problem
• Google Search Basics
• Search Results

Internet Searches
Static Searches (Indexable Nodes)
• Use Google, Bing, or another search engine
• Every word on a page is indexed by a web crawler
Dynamic Searches (Non-Indexable Nodes)
• Private databases: fee or membership (e.g., Ancestry, professional, news); many are available with a library membership
• Commercial databases: shopping, or limited to employees and customers only
• Public databases: city, county, state, and federal records

Dark Web
Static Searches
• Web crawlers visit each node for “public domains”

Who Invented the Internet?

History of Browsing
• Early on, browsing was very cumbersome
  – You generally logged in to the desired computer and searched based on its directory
  – Every computer had its own directory structure and search application(s)
• In 1980, Tim Berners-Lee proposed and prototyped ENQUIRE, a system to share documents
• In 1990, he collaborated with Robert Cailliau on a joint proposal for the World Wide Web (WWW, or W3), a project to share information using hypertext
• The document format became HyperText Markup Language (HTML), defined using text
  – This allowed people to organize the information they wanted to share
  – Links pointed to the information or files, which could then be downloaded
• Requires a browser that can read these HTML files using a protocol called HyperText Transfer Protocol (HTTP)
• Many browsers are available today
  – Internet Explorer (IE), Safari, Netscape, Mozilla Firefox, etc.
  – Even Google has its own browser, called “Google Chrome”
  – You need a current browser to access the latest information

Problem with Searching
• Many search applications were developed based on HTML, BUT:
• A search on “Coke” returns 117,000,000 hits
  – Many of these are menu items at restaurants: much useless information
  – You get hits from every restaurant that has Coke on its menu
  – If you are interested in Coca-Cola headquarters in Atlanta, it may not appear until item 23,672,344
• How do you get RELEVANT hits?
• How do you get relevant hits ordered in a way that facilitates use?
• Google found a way to “solve” this problem

What’s a Google?

Solution to Search Problem - 1
• In 1995, Sergey Brin and Larry Page, while students at Stanford, came up with a concept that used the strength of the Internet community
• Their technology evaluated a site primarily on how many other sites linked to it and ranked search results accordingly
• The technology was called PageRank (named for Larry Page), although it does also rank pages by importance
• PageRank tended to return results that people found useful, resulting in a surprisingly valuable system
• PageRank was patented by Stanford University
• In 1997, BackRub was a PageRank application, so called because the technology analyzed what was going on behind the scenes (the links pointing back to a page)
• In fall 1997, BackRub became Google
• http://infolab.stanford.edu/~backrub/google.html
• Google holds the exclusive license to the PageRank patent; Stanford received 1.8 million Google shares in exchange and sold them in 2005 for $336 million

Solution to Search Problem - 2
• Google is an adaptation of “googol”. A googol is the number 1 followed by 100 zeros (10^100). The term was coined by Milton Sirotta, nephew of the mathematician Edward Kasner. The name reflects the enormous number of WWW pages Google searches.
• In 1998, they left Stanford to develop Google
• They set up shop in the Menlo Park garage of Susan Wojcicki
• 1998: 50 employees; 7 million searches a day
• By 2005, Google was handling 250 million web searches per day
• Sergey Brin’s net worth is 29.9 billion dollars (17th richest in the world in 2014)
• Larry Page’s net worth is 29.8 billion dollars (18th richest in the world in 2014)
• Google headquarters, the Googleplex, is located in Mountain View, California. As of March 31, 2009, the company had 19,786 full-time employees; 46,170 by May 2014, in 68 locations worldwide

Solution to Search Problem - 3
• Most relevant results first

Google Search Basics - 0
• Ready to do some Google searching
• Still a big problem: a simple surname search yields millions of results
    Colket  =>             89,600 results
    Pelot   =>            477,000 results
    Reger   =>          7,650,000 results
    Sparrow =>         63,900,000 results
    Johnson =>        978,000,000 results
    Smith   =>      1,500,000,000 results
• Need to find a way to reduce results
  – Google Basics discusses ways to do this in the search query
  – Google Results discusses ways to do this on the results page

Google Search Basics - 1
Google cares about:
– Singular versus plural: “apple” versus “apples”
– Order of words, which is important for ranking
    “brown bear”: things named “Brown Bear” first: 20,800,000 Hits
    “bear brown”: emphasis on bears: 87,000,000 Hits
    Suggest putting surnames first: Pelot Samuel
– Spelling
    Names originating in another alphabet have many valid transliterations: Mohamed, Mohammed; Pelot, Pelote, Pelotte
    Sometimes you get spelling suggestions
    Sometimes it pays to use misspelled queries
Google does not care about (there are exceptions to these rules):
– Case sensitivity: hence “Samuel Pelot” = “samuel pelot”
– Little words, which are ignored: I, where, how, the, of, an, for, from, it, in, is, single digits, single letters. If they matter, use quotes: “The Who” is a band
– Punctuation: MOST PUNCTUATION IS IGNORED
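
As a rough illustration of the “does not care about” rules above, the sketch below lowercases a query, drops the little words listed on the slide, and keeps quoted phrases intact. The function name normalize_query and the LITTLE_WORDS set are assumptions made up for this example. It is a toy model of the rules as stated here, not Google's actual processing, and it ignores the punctuation exceptions.

```python
import re

# Little words copied from the slide; an illustrative list, not Google's real one.
LITTLE_WORDS = {"i", "where", "how", "the", "of", "an", "for", "from", "it", "in", "is"}


def normalize_query(query):
    """Toy model: return the terms that survive the 'does not care about' rules."""
    terms = []
    # Each match is either a quoted phrase (group 1) or a bare word (group 2).
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            # Quoted phrases are kept whole, little words included.
            terms.append('"' + phrase.lower() + '"')
            continue
        # Case is ignored, and so is punctuation around a bare word.
        word = word.lower().strip(".,;:!?")
        # Drop little words, single letters, and single digits.
        if len(word) <= 1 or word in LITTLE_WORDS:
            continue
        terms.append(word)
    return terms


print(normalize_query("Samuel Pelot") == normalize_query("samuel pelot"))  # True
print(normalize_query("The Who"))      # ['who']
print(normalize_query('"The Who"'))    # ['"the who"']
```

The quoted form is the only one that keeps “the”, which is why the slide recommends quotes when a little word matters.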
Google Search Basics - 2
Punctuation, continued:
– Apostrophes are meaningful: hence Pauls, Paul’s, and Pauls’ require 3 different searches
– A “-” before a word excludes that term (covered later)
– A “-” between 2 or more words strongly connects the words. Example: twelve-year-old dog is treated almost like “twelve year old”
– A “-” by itself is ignored
– A “_” between 2 or more words also strongly connects the words, as in a formal name: Quick_Sort, Mary_Beth. The underscore is treated as a search for MaryBeth | Mary Beth | Mary_Beth
– Quotes require an exact match (covered later)
Exceptions:
– Punctuation in proper names: Google+, AB+, C++, A#
– $ is understood to be dollars: “Nikon $400” ≠ “Nikon 400”. Ditto for ¢, £, ¥, etc.
– @ is understood to be part of an email address, e.g., colket@colket.org
– Hashtags are understood to be trending topics, e.g., #newenglandpatriots

Google Search Basics - 3
Exact Order; Exact Phrase – use quotation marks. This technique is especially useful for genealogy, where it gives very different results:
    Samuel George Pelot            11,000 Hits
    “Samuel George Pelot”              37 Hits
    George Samuel Pelot             8,670 Hits
    “George Samuel Pelot”               0 Hits
Huh??? Should get the same number. Why??? Does not exist.
What about the middle name? Some sources report it as an initial, or as no middle initial (nmi):
    “Samuel Pelot”                    231 Hits
    “Samuel G Pelot”                   24 Hits
    “Samuel G. Pelot”                  24 Hits   (most punctuation is ignored)
    “Samuel nmi Pelot”                  0 Hits
Alexander Bell: 87,200 Hits with G.; 3,390,000 Hits with Graham; 410,200 Hits
Remember, a search for “Alexander Bell” will miss hits for “Alexander G Bell”

Google Search Basics - 4
Search Within Site/Domain – identify the site in the query:
    iraq site:nytimes.com    returns hits on “Iraq” in the NY Times only
    iraq site:.gov           returns hits only from a .gov domain
    iraq site:.iq            returns hits only from an Iraq domain
Good for genealogy research:
    Pelot                        394,000 Hits   Worldwide
    Pelot site:nytimes.com           157 Hits   NY Times only
    Pelot site:.fr                14,700 Hits   French domain
    Pelot site:.ch                 1,070 Hits   Swiss domain
    Pelot site:.ca                 2,900 Hits   Canadian domain
    Pelot site:.us                 2,410 Hits   US domain (not null)
    Pelot site:.mil                   89 Hits   US military domain
    Pelot site:.gov                  947 Hits   US government domain
    Pelot site:.biz                5,480 Hits   Business domain

Google Search Basics - 5
Exclude Terms – use a “-” preceded by a blank
Say you are searching for anti-virus stuff for humans:
    Note: the “-” in “anti-virus” is part of the word (strongly connected)
    anti-virus               132,000,000 Hits   (includes antivirus, anti virus, and anti-virus)
    anti-virus -software      79,100,000 Hits
You can use multiple negations:
    jaguar -cars -football
And for the poor fellow with the surname of “Sparrow”:
    Sparrow                   63,400,000 Hits
    Sparrow -bird             60,400,000 Hits
    Sparrow -bird -book       45,500,000 Hits
Note: Combinations of search terms can be effective

Google Search Basics - 6
OR Operator – sometimes you want hits for either/or. Use capitalized “OR” or the “|” operator:
    Tampa Bay Buccaneers                    2,620,000 Hits
    Tampa Bay Buccaneers 2004                 298,000 Hits
    Tampa Bay Buccaneers 2005                 409,000 Hits
    Tampa Bay Buccaneers 2004 2005            206,000 Hits
    Tampa Bay Buccaneers 2004 OR 2005         726,000 Hits
    Tampa Bay Buccaneers 2004 | 2005          726,000 Hits
Exceptions: phrases such as “FOR BETTER OR FOR WORSE”

Google Search Basics - 7
Feeling Lucky – takes you directly to the page of the first result
Wild Cards
– Use a “*”: works on whole words, not parts of words
– Use a “?”: single characters (officially not in Google)
– For questions such as “How often does Halley's comet appear?”, pose the query as: Halley’s Comet appears every * years (it’s 76 years)
– Also useful for unknown middle names:
    Samuel * Pelot            10,700,000 Hits
    “Samuel * Pelot”           7,910,000 Hits
    “Samuel ? Pelot”                 624 Hits
  Note, for comparison:
    Samuel Pelot                 801,000 Hits
    “Samuel Pelot”                   616 Hits
Ten Word Limit – search terms beyond the first 10 are ignored

Google Search Basics - 8
Misspellings – try alternative spellings
– Thousands of Web sites mention Arnold Schwarznegger: 70,000 Hits
– though the governator spells his name “Schwarzenegger”: 34,500,000 Hits
– Google recognizes some misspellings and provides alternatives (new since Mar 2010)

Google Search Basics - 9
Proximity Search
– Not an advertised Google tool, but a common search tool (e.g., Archive Grid)
– Seems to be useful with Google
– Proximity search syntax: “Samuel Pelot”~3
Hits for:
    Samuel Pelot                 801,000 Hits
    “Samuel Pelot”                   616 Hits
    “Samuel George Pelot”             27 Hits
    “Samuel G Pelot”                  73 Hits
    “Samuel Pelot”~2                 351 Hits   (catches the initial)
    “Samuel Pelot”~3                 190 Hits
    “Samuel Pelot”~4                 158 Hits
    “Samuel Pelot”~7                 126 Hits
    “Samuel Pelot”~10                173 Hits

Google Search Basics - 10
Keep Search Terms Simple
– Most queries do not require advanced operators or unusual syntax
– Simply enter a name, place, product, or concept. Simple is good
Think of terms likely to be on result pages
– Don’t use: My Head Hurts
– Instead use: Headache (a term likely found on a medical page)
Describe what you want in as few words as possible
– Use: Weather Cancun
– Instead of: Weather Report for Cancun Mexico
Choose descriptive terms
– Use: Celebrity Ringtones
– Instead of: Celebrity Sounds

Google Results - 1
[Screenshot of a Google results page with these parts labeled: Start Search; Search Term(s); Advanced Search; Filters; Result Statistics; Link; Uniform Resource Locator (URL); Snippet; Controls for Advanced Search Options; Sponsored Links; sometimes Similar Pages; Cached Pages; Result Links]

Google Results - 2
Ordered by relevance [indented: same site, less relevant]
Also sponsored links, links to news stories, and ads
– True, unpaid results are on the lower left
– Ads are on the right (no more than 10 per page)
– Sponsored links are on top (ads, at a higher rate; colored background)
True unpaid search results show:
– Title
– Text from the site, with snippets containing your search terms (in bold)
– URL (Uniform Resource Locator)
– Size
– Date: NOT when created/updated, but when last crawled
    The dataset in the July 2014 crawl is over 266 TB, containing 4.05 billion web pages
– Indication if cached: a good place to go if the page was removed
    The URL goes to the current page
    The Cached link goes to the cached page, handy if the page was deleted or the link is broken
    The cached version is used to highlight key words
– File format
    .html: use your browser
    .pdf: read with Adobe’s free reader at www.adobe.com
    .doc: read with Microsoft’s free reader at www.microsoft.com
    .ppt: read with Microsoft’s free reader at www.microsoft.com
– Similar results

Google Results - 3
Location Feature – sets the default for searches
– Location is auto-detected by IP address, or entered into the Google Toolbar
– Can be changed if you are looking for stuff in a different location
– **Only works in your selected country**
– A manually set location is stored in a “cookie”
– Can also be turned off
Type of Content – limit results to a particular type of web content (called Filters)
– Images, Videos, News, Shopping, Books, Discussions, Places, Blogs, Real-time (e.g., updates from Twitter), or select the default: Everything
– This is a big recent change. Five years ago one had to search each database; the databases were not integrated. They are now

Note on URLs
• Results of a Google search are provided as a Uniform Resource Locator (URL)
• URL format, using http://www.google.co.uk as the example:
  – http: HyperText Transfer Protocol
  – www: World Wide Web
  – google: domain name
  – .co: domain name extension
  – .uk: domain name country extension
• Domain names: http://www.networksolutions.com/whois/index.jsp
• The URL for my domain name is: http://www.colket.org
• Domain name extensions include: .com .mobi .mil .gov .edu .net .info .org .biz .bz .tv
• Domain name extensions (including country): http://www.networksolutions.com/glossary/glossaryd.jsp#domainnameextensions
• Domain name country extensions: .be .ca .cn .de .es ru.com se.com .us

Note on IP Addresses
• Every domain name in a URL maps to a number called an IP (Internet Protocol) address: http://www.google.com => 216.239.51.99
• IPv4, in the format xxx.xxx.xxx.xxx (e.g., 208.77.188.166)
  – 2^32 can handle 4,294,967,296 addresses
  – Expected to run out in the early 2000s
• IPv6, in the format x:x:x:x:x:x:x:x (e.g., 2001:db8:0:1234:0:567:1:1), defined in the late 1990s
  – 2^128 (or 340,282,366,920,938,463,463,374,607,431,768,211,456) addresses
• IPv4 addresses still work, as all IPv4 addresses map to IPv6
• Operating systems are migrating to IPv6 (e.g., Vista uses IPv6; XP uses IPv4)
  – Go to help/support on your computer and search for IPv6
• Google crawls over 8,000,000,000 pages each month
• Need a current browser

Static versus Dynamic Searches - 1
“Relevancy” might not be relevant to researchers and genealogists.
Google’s use of relevancy is not useful for many types of searches:
• Dynamic databases
• Genealogy searches on family surnames
• Obscure information
• Much non-business-oriented information
• Rather unique information

Static Searches (Indexable Nodes)
• Use Google, Bing, or another search engine
• Every word on a page is indexed by a web crawler
Dynamic Searches (Non-Indexable Nodes)
• Private databases: fee or membership (e.g., Ancestry, professional, news); many are available with a library membership
• Commercial databases: shopping, or limited to employees and customers only
• Public databases: city, county, state, and federal records

Static Versus Dynamic Searches - 2
Desired information is in a separate database
– Auction sites: eBay | Craigslist | UBid | Bid Start | Ebid | US Seek
Web pages are private and not available to Google
– Most businesses have a public web site and a private web site
– Only the data companies want to share is available via Google
Limited-access web sites, typically for-profit sites, e.g.:
– ACM’s Digital Library: no Google access at all
– Ancestry.com: Google provides “teaser” results to entice membership
– Chicago Tribune: you get “teaser” hits on Google, but have to pay to access the data
Many models
Later we will discuss:
– The dark web
– Archive Grid
– New York Times database

Future Plans
Future plans for Computer SIGs:
– Finding Pictures of Your Ancestor on the Internet: 3 Feb 2015
– Using Google for Genealogical Searches: scheduled for 3 March 2015
– Manipulating Photos for Genealogy: scheduled for April 2015
– Using Ancestry.com, requested by Dunham Swift: maybe November 2015
Need inputs: see sheet
What else would you like to have addressed at future Computer SIG meetings?

MGS Computer Special Interest Group (SIG)
WHAT IS IT? A meeting of genealogists interested in using their personal computers to enhance their research.
WHEN? Monthly, on the first Tuesday of the month (October through May), following the main topic speaker.
TIME: About 11:15 AM to 12:15 PM, following the meeting break period after the main MGS speaker.
PLACE: The Central Library Auditorium, Bradenton, FL (same location as our MGS monthly meeting)
WHO: Open to all those interested in using their personal computers to enhance their genealogical research.
PROGRAM: Each month we will discuss and view what's new in genealogy on the Internet. We'll have demonstrations of software and hardware that will facilitate our research. Tips and techniques will be shared by and among those attending each meeting. Genealogy-related computer, Internet, digital photography, and research questions will be fielded during the sessions. We'll look at the newest technology but will keep the discussions as low tech as possible.

What topics would you like to hear?
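
Sketch: combining the search operators
The operators from Google Search Basics 3 through 7 (quotation marks, site:, “-” exclusion, and the “*” wildcard) can be combined mechanically when you want to try several forms of one ancestor's name. The sketch below is one possible way to do that. The function name name_queries and its parameters are made up for this example, and it only builds query strings; it does not contact Google or predict hit counts.

```python
def name_queries(given, surname, middle=None, site=None, exclude=()):
    """Build query strings for one ancestor's name (illustrative helper only)."""
    phrases = [
        f'"{given} {surname}"',      # exact phrase, no middle name
        f'"{given} * {surname}"',    # wildcard stands in for an unknown middle name
    ]
    if middle:
        phrases.append(f'"{given} {middle[0]} {surname}"')  # middle initial
        phrases.append(f'"{given} {middle} {surname}"')     # full middle name

    suffix = []
    if site:
        suffix.append(f"site:{site}")                # restrict to a site or domain
    suffix.extend(f"-{term}" for term in exclude)    # a blank, then "-", excludes a term

    return [" ".join([phrase] + suffix) for phrase in phrases]


for query in name_queries("Samuel", "Pelot", middle="George", site=".fr", exclude=["bird"]):
    print(query)
```

Each printed line can be pasted into the search box as is. Dropping the site or exclude arguments gives the plain quoted forms.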