Google: The Search Engine Giant


2008/3/26

Yuh-Jzer Joung

莊 裕 澤

Dept. of Information Management

National Taiwan University

Mar. 2008

Google 1

Google: A Brief History

• Began as a research project by Larry Page (then a Ph.D. student) at Stanford in January 1996.
• Co-founded by Larry Page and Sergey Brin on Sep. 7, 1998, while they were still students at Stanford University.
• Located in Mountain View, California.
• The name "Google" originated from a misspelling of "googol", meaning 10^100 (a 1 followed by 100 zeros).
• "Google" (verb): "to use the Google search engine to obtain information on the Internet."
  - Definition from the Merriam-Webster Collegiate Dictionary and the Oxford English Dictionary, 2006.
• Fortune Magazine's #1 Best Place to Work.

Google’s Mission

• To organize the world's information and make it universally accessible and useful.

[Figure: Google's data center in Oregon]

Financial Performance

• IPO on August 19, 2004.
  - Priced at $85 a share; closed its first day of trading at $100.34.
• Added to the Nasdaq-100 on Dec. 19, 2005.
• Added to the S&P 500 on Mar. 31, 2006.
• The largest American company (by market capitalization) that is not part of the Dow Jones Industrial Average (as of Oct. 31, 2007).

Impressive Financial Success

• Years to reach $2 billion in annual net revenue after generating $80-$100 million...

Source: Goldman Sachs Global Investment Research, Nov. 2004.

Profile and Key Statistics

• Revenue: US$16.593B [2007] (56%)
• Net income: US$4.203B [2007] (25%)
• Employees: 16,805 (Dec. 31, 2007)
• Market Cap: US$145.47B [Feb. 27, 2008]
• See updated data.

Competitors: Yahoo

• Revenue: US$6.7B [2007], US$6.4B [2006]
• Net income: US$730M [2007], US$751M [2006]
• Employees: 13,600 (October 18, 2007)
• Market Cap: US$39.36B [Feb. 27, 2008]
• See updated data.
• Recent news: Microsoft offers US$44.6B bid for Yahoo! [2008-02-01]

Competitors: Microsoft

• Revenue: US$51.12B [2007]
• Net income: US$14.06B [2007]
• Employees: 79,000 (2007)
• Market Cap: US$264.13B [Feb. 27, 2008]
• See updated data.
• Comparison Summary, updated.

The Core Business: Search Engine

• Huge Data Size
  - By 1998, Google had an index of about 60M pages.
  - Handles 250M queries per day (as of Feb. 2003).
  - Search Engine Size Wars
    - As of 2004
      - Google: 8.1B
      - MSN: 5.0B
      - Yahoo: 4.2B (estimate)
    - Yahoo claims 19.2B in 2005
      - 19.2B web documents
      - 1.6B images
      - over 50M audio and video files
    - Google estimated about 24B in 2005
    - Estimated Web pages in WWW
      - 29.7B (as of Feb. 2007)
    - Index sizes are now top secret at each search engine company

Simple is the Beauty

• A simple and clean interface in Google's search engine
  - copied by several other search engines

Key to the Success

• The interface is clear and simple.
• Pages load instantly.
• Placement in search results is never sold to anyone.
• Advertising on the site must offer relevant content and not be a distraction.

Some Fun: Google Doodles

[Images: sample Google Doodles]
• Holiday Logos
  - Leap Year (February 29, 2008)
  - Chinese New Year (February 7, 2008)
  - Happy New Year & 25 Years of TCP/IP (January 1, 2008)
• Fan Logos
• Official Logo

Directories vs. Search Engines

• Directories
  - Selected sites organized (typically manually) hierarchically into categories
  - Yahoo! Directory
    - Changed to crawler-based listings for its main results in Oct. 2002.
    - Originally, the actual web crawling and storage/retrieval of data was powered by Inktomi (bought by Yahoo in 2002), and later by Google until 2004.
    - Although Yahoo also bought Overture Services, Inc. (which owned the AlltheWeb and AltaVista search engines), it kept using Google until 2004.
  - Open Directory Project

Directories vs. Search Engines (contd.)

• Search Engines
  - Search over the contents of the pages themselves
  - Results are organized in response to a query by relevance rankings or other scores

How a Search Engine Operates

[Diagram: search engine pipeline]
1. Crawl the web
2. Store the documents (assigning DocIDs)
3. Parsing, analysis, and indexing (into the indexing DB)
4. User query (arrives at the search engine servers)
5. Check index
6. Return results

• Many technologies are involved in crawling the Web, analyzing and indexing web pages, ranking them, and making the system scalable!

Basic Indexing Concept

Doc 1: Information Technology I: Information Systems and Applications, EMBA, Instructor Yuh-Jzer Joung, …
Doc 2: Yuh-Jzer Joung, Professor, Department of Information Management, National Taiwan University …
Doc 3: Information Retrieval, Basic Index Concept, Distributed Inverted Index, …

[Table: inverted index. Each term (e.g., "Information", "Joung") records its No. of Doc. and Total Freq., and points to a posting list of (DocID, Freq, Position) entries; for example, "Information" occurs in Doc 1 with Freq 2 at positions 1,4.]

• There is more information that needs to be recorded in the index files!
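
The indexing idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tokenizer (lowercase, split on non-alphanumeric characters, so "Yuh-Jzer" becomes two tokens) and the names `tokenize` and `build_index` are hypothetical, not taken from the slides. The index maps each term to a posting list of DocID -> positions, from which the number of documents and the total frequency can be derived.

```python
import re
from collections import defaultdict

# The three sample documents from the slide (trailing ellipses dropped).
docs = {
    1: "Information Technology I: Information Systems and Applications, EMBA, Instructor Yuh-Jzer Joung",
    2: "Yuh-Jzer Joung, Professor, Department of Information Management, National Taiwan University",
    3: "Information Retrieval, Basic Index Concept, Distributed Inverted Index",
}

def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: [1-based word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index

index = build_index(docs)
postings = index["information"]
num_docs = len(postings)                                # documents containing the term
total_freq = sum(len(pos) for pos in postings.values())  # occurrences overall
```

A query for a single term is then a dictionary lookup; multi-term queries intersect the posting lists, and the stored positions allow proximity checks.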


Google’s Architecture

• Sorted barrels = inverted index
• PageRank computed from link structure; combined with IR rank
• IR rank depends on TF, type of "hit", hit proximity, etc.
• Billion documents
• Hundred million queries a day

Page Rank

• PageRank: ranking web pages using the link structure of the web
  - A trademark of Google
  - Developed at Stanford University by Larry Page and Sergey Brin
  - Stanford owns the patent on it

[Figure: example link graph with nodes A-H]

Page Rank (contd.)

\[
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{L(T_1)} + \frac{PR(T_2)}{L(T_2)} + \frac{PR(T_3)}{L(T_3)} + \cdots \right)
\]

• $d$: damping factor, normally set to 0.85.
• $(1 - d)$ can also be viewed as the probability of a user hitting page $A$ directly, so $d$ is the probability of reaching $A$ from other pages.
• $T_1, \ldots, T_n$: pages pointing to page $A$.
• $PR(X)$: PageRank of page $X$.
• $L(T_i)$: the number of links going out of page $T_i$.

[Figure: example link graph with nodes A-H]
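
The formula can be evaluated by simple repeated substitution. The sketch below is a minimal illustration, not Google's implementation: the three-page graph in `links` is made up (the edges of the A-H figure are not recoverable from the slide), and the function name `pagerank` and the fixed iteration count are assumptions.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / L(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T that links here contributes PR(T) / L(T).
            incoming = sum(pr[src] / len(targets)
                           for src, targets in links.items()
                           if page in targets)
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

In this toy graph C ends up with the highest score because it receives links from both A and B, while B, with only half of A's vote, ranks lowest.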

Search Engine Optimization

• A new business that helps websites raise their rankings on Google and other search engines.
  - "On-page" factors: body copy, title tags, H1 heading tags, and image alt attributes
  - "Off-page optimization" factors: anchor text and PageRank

Search Market Share

• As of July 2006, US market
  - 5.6 billion searches in that month

Search Market Share (contd.)

• As of August 2007, US market
  - 7.8 billion searches in that month

Trends: Number of Searches


Zeitgeist: Search patterns, trends, and surprises

• 2007 Year-End Zeitgeist
• Zeitgeist Archive

Fastest Rising (global), 2007: iphone, badoo, facebook, dailymotion, webkinz, youtube, ebuddy, second life, hi5, club penguin

Google Platform

"Google" circa 1997 (google.stanford.edu)

First Production Server (circa 1999)


Google Data Center (Circa 2000)


Data Centers Now

• Major centers in Mountain View, California; Virginia; Atlanta, Georgia; and Dublin, Ireland, with new facilities constructed in The Dalles, Oregon and Saint-Ghislain, Belgium.
• Estimated over 450,000 servers.
• Plans to open one in Asia
  - Taiwan? Malaysia? Japan? Korea? India? (as of Feb. 2008)
• Google spent $2.4B on data centers in 2007.
  - $1.9B in 2006
  - Google typically invests $600M in each new data center.

[Figure: Google's data center in The Dalles]

The Power of Data …


Google AdWords

• Launched in 2000; now Google's main source of revenue.
• Clients choose the keywords for which they would like their ad to appear when someone searches Google with those keywords, and bid on the price.

Google AdWords (contd.)

• CPC (cost-per-click) advertisements
  - The ordering of paid listings (displayed as "Sponsored Links") depends on
    - other advertisers' bids
      - A variation of the Vickrey auction: the highest bidder wins, but the price paid is the second-highest bid.
      - A minimum bid of $0.01 per click (as of 2008)
    - the "quality score" of all ads shown for a given search
      - The quality score is determined by Google based on historical click-through rates and the relevance of an advertiser's ad text and keywords.
• CPM (cost-per-thousand-impressions) advertisements
  - Site-targeted advertisements
  - Advertisers select the sites where they'd like their ad to appear, and set a CPM bid that applies to all those sites.
  - CPM ads are likewise ranked for display partly according to their CPM bid, alongside keyword-targeted CPC ads.
  - A minimum bid of $0.25 per thousand impressions
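
To make the auction mechanics concrete, here is a small second-price sketch. It is a simplification under loud assumptions: the slide only says that ordering depends on bids and quality score, so ranking by `bid * quality` and charging each ad the least it needs to keep its slot are illustrative choices, not Google's published formula. The `Ad` class, the `run_auction` function, and the `quality` field are hypothetical names.

```python
from dataclasses import dataclass

MIN_BID = 0.01  # minimum CPC bid quoted on the slide (as of 2008)

@dataclass
class Ad:
    advertiser: str
    bid: float            # maximum cost-per-click the advertiser offers
    quality: float = 1.0  # assumed quality score; 1.0 means "ignore quality"

def run_auction(ads):
    """Rank ads by bid * quality; each pays just enough to hold its position."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.quality, reverse=True)
    results = []
    for i, ad in enumerate(ranked):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # Smallest bid that still matches the next ad's score.
            price = nxt.bid * nxt.quality / ad.quality
        else:
            price = MIN_BID  # nobody below: only the floor applies
        results.append((ad.advertiser, round(max(price, MIN_BID), 2)))
    return results

# With equal quality this reduces to a plain second-price (Vickrey-style) auction.
plain = run_auction([Ad("a", 1.00), Ad("b", 0.60), Ad("c", 0.25)])
# A higher quality score lets a lower bid win the top slot.
weighted = run_auction([Ad("a", 1.00, quality=0.5), Ad("b", 0.60, quality=1.0)])
```

Because the price is derived from the competitor below, no advertiser ever pays more than its own bid, which is the property that makes truthful bidding attractive in second-price designs.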

Google AdSense

• Lets web publishers earn more revenue from advertising on their sites while maintaining editorial quality.
• Google provides HTML ad code to place on the web pages where you want to display AdWords ads, and then takes care of the rest.

Google AdSense

• AdSense for content
  - automatically crawls the content of your pages and delivers ads that are relevant to your audience and your site content
• AdSense for search
  - allows website publishers to provide Google web and site search to their visitors
• AdSense for mobile content

You Can Make Money Without Doing Evil

• Ads are not allowed on a results page unless they are relevant to that page.
• Advertising can be effective without being flashy.
• Advertising on Google is always clearly identified as a "Sponsored Link."
• No one can buy better PageRank. Rankings are never manipulated to put partners higher in the search results.

Online Ad War

• Google attracted 25% of all US online ad revenue [2006]

Online Ad War (contd.)


Google Earth


Google Earth: Overview

• Developed by Keyhole, Inc. (acquired by Google in 2004) and renamed Google Earth in 2005.
• Free with limited functionality
  - Google Earth Plus ($20 per year)
  - Google Earth Pro ($400 per year)
• Was 2D, but went 3D in Nov. 2006 for some scenes.
  - In August 2007, Hamburg became the first city shown entirely in 3D.
• Added a Sky tool for viewing stars and astronomical images in Aug. 2007.

Some Spec

• Resolutions
  - Baseline resolutions
    - Global: generally 15 m
    - U.S.: < 15 m; some states at 1 m
  - Typical high resolutions
    - U.S.: 1 m, 0.6 m, 0.3 m, 0.15 m (extremely rare; e.g., Cambridge and the Google campus, or Glendale)
    - Europe: 0.3 m, 0.15 m (e.g., Berlin, Zürich, Hamburg)
• Image ages
  - Most international urban images date from 2004 and have not been updated.
  - But cities in the US are kept current.

Google Earth: Demo


Explore sky in Google Earth

• View the sky from your location.

National Security and Privacy Issues

• Some places are obscured through pixelation in Google Earth and Google Maps due to security concerns.
  - Nevertheless, high-resolution photos and aerial surveys of such places are readily available elsewhere on the Internet.
• An individual's right to privacy receives less attention than the state's right to secrecy.

Competitor: Windows Live Local

• Windows Live Local gets "Virtual Earth" 3-D cities

Google Map


Google Map: Overview

• Announced on Feb. 8, 2005; offers street maps, a route planner, and an urban business locator for numerous countries around the world.
  - Provides high-resolution satellite images for most urban areas in Canada and the United States, as well as parts of New Zealand, Australia, Egypt, France, Germany, Hong Kong, Iran, Iceland, Italy, Ireland, Iraq, Japan, Taiwan, the Bahamas, Bermuda, Kuwait, Mexico, the Netherlands, the United Kingdom, and many other countries.
  - All the images shown in Google Maps' satellite mode are at least a year old (wiki).
• Google Maps for Mobile launched in late 2006.
• Street View, a new Google Maps feature offering 360° panoramic street-level views of various U.S. cities, was released on May 25, 2007.

Resolution Can be this High!

[Image: satellite view deep in Chad, Africa]

Resolution Can be this High (contd.)!

[Image: people at a well deep in Chad, Africa]

Google Map: Implementation

• Uses the AJAX (Asynchronous JavaScript and XML) technique.
• Locations are drawn dynamically by positioning a red pin on top of the map images.
• GIS (Geographic Information System) data are provided by Tele Atlas and NAVTEQ.
  - High-resolution satellite imagery is largely provided by DigitalGlobe and its QuickBird satellite, with some imagery also from government sources.
• Provides an API for developers to integrate Google Maps into their web sites.

Maps Front-end Design

• Metaphor: infinite map under a small window
  - Accomplished with DHTML by splitting the map into tiles
  - As the user pans, tiles are rotated (recycled) to make the map seem infinite

[Diagram labels: Pad, Map Tiles, View width]
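
The tiling idea can be illustrated with a little arithmetic: given the window's position over the (conceptually infinite) map, work out which tiles must be present, plus a ring of padding tiles so that panning never exposes a gap. The sketch below is an assumption-laden illustration, not the Maps front-end code: the 256-pixel tile size, the one-tile pad, and the function name `visible_tiles` are all hypothetical.

```python
def visible_tiles(left, top, width, height, tile_size=256, pad=1):
    """Return (col, row) indices of tiles covering the view plus `pad` extra tiles.

    `left`/`top` are the view's pixel offsets on the full map;
    `width`/`height` are the view size in pixels.
    """
    first_col = left // tile_size - pad
    last_col = (left + width - 1) // tile_size + pad
    first_row = top // tile_size - pad
    last_row = (top + height - 1) // tile_size + pad
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]

exact = visible_tiles(0, 0, 256, 256, pad=0)     # view fits one tile exactly
panned = visible_tiles(300, 0, 512, 256, pad=0)  # view panned 300 px to the right
padded = visible_tiles(0, 0, 256, 256, pad=1)    # same view with a one-tile border
```

As the view pans, tiles that drop out of this list can be repositioned and given new images for the indices that enter it, which is the "rotation" the slide refers to.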

Using Google Map APIs

• Putting a map on a page requires only two lines of JavaScript:
    var map = new GMap(document.getElementById("map"));
    map.centerAndZoom(new GPoint(-122.141944, 37.441944), 4);

Google Map: Demo

• A video on the use of location markers.

The Potential of Google Map

• Overlay search results
• and much more …

Google Moon

• Launched in honor of the 36th anniversary of the Apollo 11 moon landing of July 20, 1969.

Google Mars

• Launched in Mar. 2006.

Competitor: Yahoo! Map

• Yahoo! released its own Maps API

Google Docs & Spreadsheets


Google Docs & Spreadsheets

• A free, Web-based word processor, spreadsheet, and presentation application, launched on Oct. 11, 2006.
• Allows users to create and edit documents online while collaborating in real time with other users.

Google Docs & Spreadsheets: Growth Rate

• A free, Web-based word processor, spreadsheet, and presentation application, launched on Oct. 11, 2006.

Software As A Service (SaaS)

• A software application delivery model in which a software vendor develops a web-native software application and hosts and operates it (either independently or through a third party) for use by its customers over the Internet.
• Gartner says 25 percent of new business software will be delivered as software as a service by 2011.

The trend: Application Service Provider (ASP) → On-Demand → SaaS

iGoogle: Start-Page Service

StartPage

• A self-constructed, and often customized, web page that uses small boxes or modules that can house web links, photo feeds, breaking news, search engines, bookmarking links, …
• Also known as "Web Desktop", "Start-page service", "Personalized Portal", …

iGoogle: Background

• Launched in May 2005 as "Google Personalized Homepage"; renamed iGoogle on April 30, 2007.
• Customizable AJAX-based start page for accessing the Web.
  - Supports the use of specially developed "gadgets" to display content on a user's page.
  - The gadgets interact with the user and utilize the Google Gadgets API.
• Competitors:
  - Netvibes, Pageflakes, My Yahoo, Microsoft's Live.com.

iGoogle: Demo



Google Desktop

• First released in Oct. 2004.
• Desktop search software.
  - Can index several types of data
    - email, web browsing history, office documents, instant messenger transcripts from AOL, Google, MSN, Skype, Tencent QQ, and several multimedia file types.
    - Additional file types can be indexed through the use of plug-ins.
• Provides a number of sidebar gadgets
  - Email, Scratch Pad, Photos, News, Weather, Google Talk, ...

Google Gadgets

Windows sidebars for Vista


Start-Page Service: The Potential

• Recall the battle of the portals?

Cloud Computing


Cloud Computing: Function comes from “the Cloud”

• A new piece of jargon describing the use of shared computing resources.
• Related concepts
  - Cluster computing
  - Grid computing
  - Utility computing/on-demand computing

Who Is in the Arena

Source: BusinessWeek

MapReduce: Simplified Data Processing on Large Clusters

[Google 2004]

• MapReduce: a programming model to
  - Process lots of data (TB)
  - Parallelize across hundreds/thousands of CPUs
  - Make this easy
• MapReduce: an associated implementation that provides
  - Automatic parallelization and distribution
  - Fault tolerance
  - I/O scheduling
  - Status and monitoring

MapReduce Programming model

• Two functions inspired by functional programming.
  - map(in_key, in_value) -> list(out_key, intermediate_value)
    - Processes an input key/value pair
    - Produces a set of intermediate pairs

[Diagram: f applied independently to each input pair (key_1, value_1), (key_2, value_2), …]

Programming model (contd.)

  - reduce(out_key, list(intermediate_value)) -> list(out_value)
    - Combines all intermediate values for a particular key
    - Produces a set of merged output values (usually just one)

[Diagram: f folded over inter_value_1, inter_value_2, …, starting from an initial value and returning the result]
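
The two signatures can be tried out with the classic word-count example. The sketch below is a sequential stand-in for the framework, written in Python rather than the C++ library the slides describe; `run_mapreduce`, `map_fn`, `reduce_fn`, and the sample documents are all illustrative names and data, not part of Google's API.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    return [(word, 1) for word in text.lower().split()]

def reduce_fn(word, counts):
    """reduce(out_key, list(intermediate_value)) -> list(out_value)."""
    return [sum(counts)]

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map to every input, group intermediate values by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, inter_value in map_fn(key, value):
            groups[out_key].append(inter_value)  # the "shuffle" step
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}

docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog and the fox")]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
```

The user writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is what the real system distributes across machines.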

Putting It Together

• Imagine how to use this to search documents in a corpus!
• map() functions run in parallel.
• reduce() functions may also run in parallel, but they must wait for the map() phase to finish.

Source: Google Lab

Parallel Execution

Source: Google Lab

Typical Cluster in MapReduce

• 100s/1000s of 2-CPU x86 machines, 2-4 GB of memory
• Limited bisection bandwidth
• Storage is on local IDE disks
• GFS: distributed file system manages data
• Job scheduling system: jobs are made up of tasks; the scheduler assigns tasks to machines
• Implementation is a C++ library linked into user programs

Fault Tolerance

• Handled via re-execution
• Master-slave model
  - On worker failure:
    - Detect failure via periodic heartbeats
    - Re-execute completed and in-progress map tasks
    - Re-execute in-progress reduce tasks
    - Task completion committed through master
  - Master failure:
    - Could handle, but don't yet (master failure unlikely)

Locality Optimization

• Master scheduling policy:
  - map() task inputs are divided into 64 MB blocks (the Google File System block size)
  - Asks GFS for locations of replicas of input file blocks
  - Tries to place map() tasks on the same machine as the physical file data, or at least on the same rack
• Effect: thousands of machines read input at local disk speed
  - Without this, rack switches limit read rate
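
The scheduling preference (same machine, else same rack, else anywhere) can be sketched as a small selection function. This is an illustration only: `pick_worker`, the `rack_of` mapping, and the machine names are hypothetical, and the real master also weighs load and other factors the slide does not describe.

```python
def pick_worker(replica_hosts, idle_workers, rack_of):
    """Choose an idle worker for a map task whose input block lives on `replica_hosts`.

    Preference order: a machine holding a replica, then a machine on the same
    rack as a replica, then any idle machine. Assumes `idle_workers` is non-empty.
    """
    for worker in idle_workers:
        if worker in replica_hosts:
            return worker, "local"
    replica_racks = {rack_of[host] for host in replica_hosts}
    for worker in idle_workers:
        if rack_of[worker] in replica_racks:
            return worker, "same-rack"
    return idle_workers[0], "remote"

# Hypothetical cluster layout: machine name -> rack name.
rack_of = {"m1": "r1", "m2": "r1", "m3": "r2", "m4": "r3"}

local = pick_worker(["m1", "m3"], ["m4", "m3"], rack_of)
same_rack = pick_worker(["m1"], ["m4", "m2"], rack_of)
remote = pick_worker(["m1"], ["m4"], rack_of)
```

Only the "remote" case forces the 64 MB block across a rack switch, which is why the policy keeps aggregate read rates close to local disk speed.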

Refinement: Redundant Execution

• Slow workers significantly lengthen completion time
• Solution: near the end of a phase, spawn backup copies of the remaining tasks and use the result of whichever copy finishes first
• Effect: dramatically shortens job completion time

Refinement: Local Merge

• "Combiner" functions can run on the same machine as a mapper
• Causes a mini-reduce phase to occur before the real reduce phase, to save bandwidth
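
A word-count mapper shows why the local merge saves bandwidth: without a combiner every occurrence travels to the reducers as its own pair. The sketch below is a simplified illustration; `map_words` and `combine` are hypothetical names, and summing is used as the combiner because it is associative and commutative, which is what makes pre-merging safe.

```python
from collections import Counter

def map_words(text):
    """Mapper: emit one (word, 1) pair per occurrence."""
    return [(word, 1) for word in text.lower().split()]

def combine(pairs):
    """Combiner: sum counts per word on the mapper's machine before sending."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())

pairs = map_words("to be or not to be")
combined = combine(pairs)
```

Here six intermediate pairs shrink to four before leaving the machine; on real text with many repeated words the reduction is far larger.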

Applications:

• Find content in thousands upon thousands of documents
• Inverted index construction
• Web link-graph reversal
• Web access log stats
• Term-vector per host
• Document clustering
• …

Hadoop

• Essentially an open-source version of Google's MapReduce and its file system, GFS.
• Yahoo! uses Hadoop extensively in its Web Search and Advertising businesses.
  - The Yahoo! Search Webmap uses Hadoop to run on a Linux cluster of more than 10,000 cores [as of 2008-02-19]
  - Some statistics of Webmap:
    - Number of links between pages in the index: roughly 1 trillion
    - Size of output: over 300 TB, compressed!
    - Number of cores used to run a single MapReduce job: over 10,000
    - Raw disk used in the production cluster: over 5 petabytes

Hadoop (contd.)

• IBM and Google announced a major initiative to use Hadoop to support university courses in distributed programming. [2007-10-08]
  - A cluster of more than 1,600 processors running Hadoop

Google Labs

• Google's technology playground.
  - Favorite ideas that aren't quite ready for prime time.

Where Do the Innovations Come From

• Ideas come from everywhere!
  - All Google engineers are encouraged to spend 20% of their work time (one day per week) on projects that interest them.
  - Half of new product launches originated from the 20% time.
  - And, more importantly, a pleasant working environment.
• You are brilliant, we are hiring!

Google Philosophy

• Focus on the user and all else will follow.
• It's best to do one thing really, really well.
• Fast is better than slow.
• Democracy on the web works.
• You don't need to be at your desk to need an answer.
• You can make money without doing evil.
• There's always more information out there.
• The need for information crosses all borders.
• You can be serious without a suit.
• Great just isn't good enough.

Discussions

• Question 1: In late 2006, Google bought the online video site YouTube for US$1.65 billion in stock. Discuss the strategic implications of this acquisition.

Discussions

• Question 2: Present some interesting projects/products from Google Labs.
