The Google Search Engine
Christy Gavin
Spring, 2009

What is an internet search engine?
A computer program that retrieves data from the internet.

Google’s Mission
To organize the world’s information and make it universally accessible and useful.

Google Fast Facts
• Invented by Larry Page and Sergey Brin
• Online 1998
• Named after “googol”
• Verb in the OED, 2006
• Added the one trillionth web address, July 2008
• 500 million searchers a year

How does Google find data?
Uses a spider program called GoogleBot.

How GoogleBot finds your website:
• You submit a URL to Google
• You submit a sitemap to Google
• GoogleBot finds a link to your site

What Does Google Do Once It Arrives At a Site?
GoogleBot retrieves web pages and stores them in Google’s index. Google’s index works much like the index found in the back of a book.

[Slide images: A Short Introduction to Gender by Rae Connell; Google web citation]

Google’s Advertising Philosophy
• Search results free of paid ads.
• Avoid flashy banner ads and pop-ups.

Google’s “keyword-targeted text ads”
You do not see the ad unless your search relates to the ad’s topic.

Google’s Way:
Sell ads to make money but… don’t mix ads with the user’s search results.

[Example search: eating disorders]

The Secret World of Relevance Ranking
Relevance ranking is the measure of how well the search results answer the question or search.

How does Google rank the relevance of search results?
The heart of Google ranking is PageRank. PageRank works like a voting system.

Page A links to page Z. This is a vote for page Z. The more pages that link to a page, the higher that page’s PageRank.

Google also looks at the page that casts the vote. If page A has itself received lots of votes, A’s vote counts for more and increases the importance of page Z.

So to be included in Google’s top-ranked results, a page must have:
• lots of votes, and
• votes cast by pages that have received many votes of their own.
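The voting idea above can be sketched in a few lines of Python. This is only an illustration, not Google’s actual implementation: the function name `pagerank`, the damping value of 0.85, the fixed number of iterations, and the tiny link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative PageRank by repeated vote passing (assumed parameters)."""
    # Collect every page that links out or is linked to.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    # Every page starts with an equal share of importance.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Each page keeps a small baseline score regardless of votes.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: B and C vote for A, A votes for Z, D votes for Q.
links = {"B": ["A"], "C": ["A"], "A": ["Z"], "D": ["Q"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Pages Z and Q each receive exactly one vote, but Z’s vote comes from page A, which has votes of its own, while Q’s comes from page D, which has none. In this sketch Z therefore ends up with a higher score than Q, which matches the second bullet above.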
Other ways Google ranks relevance
• Density = frequency of keywords
• Proximity = closeness of keywords
• Prominence = titles, links, tags

Google’s recipe for retrieving relevant results is based on a combination of:
• PageRank
• Density of keywords
• Proximity of keywords
• Prominence of keywords

A top ranking in Google’s results does not indicate the quality of the website.

1. Enter your topic in Google and Yahoo.
2. Compare the top 5 results in each search engine.
3. Do both search engines retrieve the same websites?
4. Do both search engines provide targeted ads?

Identify the 3 most important keywords (concepts).
Byron Hurt takes pains to say that he is a fan of rap, but over time, says Mr. Hurt, a 36-year-old filmmaker, “I began to become very conflicted about the music I love.” A new documentary by Mr. Hurt, “Rap: Beyond Beats and Rhymes,” questions the violence of women in much of rap music.
Excerpt from the New York Times

Keeping It Together: The Double Quote
Use the double quote for a:
distinct individual: “kanye west”
organization: “american medical society”
company: “general motors”
quote: “to be or not to be”

Use double quotes for bound phrases:
“model minority”
“artificial intelligence”
“big bang”

We model friendship formation as a selection process constrained by individuals' ability to make friends. Blacks are generally the most cohesive racial category, although when whites are in the minority, they display stronger selective mixing than do blacks when blacks are in the minority.

Search Engines and Stop Words
Search engines ignore common or overused words: a, the, of, for, how, who ...

Keyword phrase: The way to the school is hard when walking in the rain.
Stored keyword phrase: * way * * school is hard when walking * * rain.

To include stop words in your search you can either use:
1. double quotes: “The way to the school is long and hard when walking in the rain.”
2. + before each stop word: +the +who

Avoid using double quotes with topic searches!
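The stop-word example above can be reproduced with a short Python sketch. The stop-word list here is an assumption: it combines the words named on the slide with “to” and “in”, which the stored phrase also drops. Real search engines use their own lists, and the function name `mask_stop_words` is invented for this illustration.

```python
# Assumed stop-word list: the slide's examples plus "to" and "in".
STOP_WORDS = {"a", "the", "of", "for", "how", "who", "to", "in"}


def mask_stop_words(phrase):
    """Replace each stop word with '*', leaving other words untouched."""
    masked = []
    for token in phrase.split():
        # Compare without surrounding punctuation and ignoring case.
        word = token.strip(".,!?;:\"'").lower()
        masked.append("*" if word in STOP_WORDS else token)
    return " ".join(masked)


print(mask_stop_words("The way to the school is hard when walking in the rain."))
# prints: * way * * school is hard when walking * * rain.
```

Each asterisk marks a position where a word was ignored, which is why a quoted phrase or a leading + is needed when a stop word matters to the search.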