Models of Computation for Massive Data

Jeff M. Phillips

August 28, 2013

Outline

Sequential:
- External Memory / (I/O)-Efficient
- Streaming

Parallel:
- PRAM and BSP
- MapReduce
- GP-GPU
- Distributed Computing

RAM Model

RAM model (Von Neumann Architecture):
- CPU and Memory
- CPU operations (+, −, ∗, ...) take constant time
- Data stored as words, not bits
- Read, Write take constant time

[Figure: a CPU connected to RAM]

Today's Reality

What your computer actually looks like:
- 3+ layers of memory hierarchy
- Small number of CPUs

Many variations!

[Figure: memory hierarchy: Disk (164 GB), RAM (2 GB), L2 cache (4 MB), per-core L1 caches, two CPUs]


External Memory Model

- N = size of problem instance
- B = size of disk block
- M = number of items that fit in Memory
- T = number of items in output
- I/O = block move between Memory and Disk

[Figure: Disk D and Memory M attached to processor P; data moves between them one block per I/O]

Advanced Data Structures: Sorting, Searching
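To make the cost accounting concrete, here is a minimal sketch of the two most common costs in this model: a scan costs ceil(N/B) I/Os, and external merge sort costs about (N/B) log_{M/B}(N/B) I/Os. The sorting formula is the standard external-memory sorting bound, assumed here; the slide itself only names the parameters.

```
import math

def scan_ios(N, B):
    # Scanning N items stored in blocks of B items: ceil(N/B) I/Os.
    return math.ceil(N / B)

def sort_ios(N, B, M):
    # External merge sort: (N/B) blocks per pass, and each merge pass
    # can combine about M/B sorted runs, giving log_{M/B}(N/B) passes.
    n_blocks = N / B
    fanout = M / B
    return math.ceil(n_blocks * math.log(n_blocks, fanout))

print(scan_ios(10**9, 10**4))          # 100000 I/Os to scan 10^9 items
print(sort_ios(10**9, 10**4, 10**8))   # 125000 I/Os: barely more than a scan
```

Note how close sorting is to scanning once B and M are large; that gap is the whole point of the model.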


Streaming Model

CPU makes "one pass" over the data:
- Ordered set A = ⟨a_1, a_2, ..., a_m⟩
- Each a_i ∈ [n], of size log n
- Compute f(A), or maintain f(A_i) for A_i = ⟨a_1, a_2, ..., a_i⟩
- Space restricted to S = O(poly(log m, log n))
- Updates take O(poly(S)) time for each a_i

[Figure: a stream of words, each in [n], passing by a CPU with small memory]

Advanced Algorithms: Approximate, Randomized
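As a taste of what fits in this space bound, a sketch of the Misra-Gries frequent-items summary, one classic streaming algorithm chosen here as an illustration (the slide does not name it): k − 1 counters suffice to estimate every item's count within m/k.

```
def misra_gries(stream, k):
    # Keep at most k-1 counters. Any item occurring more than m/k times
    # survives, and each reported count undercounts by at most m/k.
    counters = {}
    for a in stream:
        if a in counters:
            counters[a] += 1
        elif len(counters) < k - 1:
            counters[a] = 1
        else:
            # No free counter: decrement all, dropping any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

print(misra_gries([1, 2, 1, 3, 1, 4, 1, 5], k=3))  # {1: 2}
```

The space is O(k) words regardless of the stream length m, and each update touches at most k − 1 counters, matching the model's update-time restriction.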


PRAM

Many (p) processors access shared memory:
- EREW: Exclusive Read, Exclusive Write
- CREW: Concurrent Read, Exclusive Write
- CRCW: Concurrent Read, Concurrent Write

Simple model, but has shortcomings...
...such as synchronization.

[Figure: CPU 1, CPU 2, ..., CPU p all attached to one shared RAM]

Advanced Algorithms
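A minimal sketch of the model's flavor: a balanced-tree reduction finding the max of n values in O(log n) rounds with n/2 processors. Each list comprehension below stands in for one synchronous EREW round (every "processor" reads a distinct pair of cells and writes a distinct cell); this is a serial simulation written to show the round structure, not actual parallel code.

```
def parallel_max(values):
    # One pass of the loop = one synchronous PRAM round:
    # processor i reads cells 2i and 2i+1 and writes their max.
    vals = list(values)
    while len(vals) > 1:
        vals = [max(vals[i:i + 2]) for i in range(0, len(vals), 2)]
    return vals[0]

print(parallel_max([5, 3, 9, 1, 7]))  # 9, after ceil(log2 5) = 3 rounds
```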


Bulk Synchronous Parallel

Each processor has its own memory.
Parallelism proceeds in rounds:
1. Compute: each processor computes on its own data: w_i.
2. Synchronize: each processor sends messages to the others: s_i = MessSize × CommCost.
3. Barrier: all processors wait until the others are done.

Runtime: max_i w_i + max_i s_i

Pro: Captures parallelism and synchronization.
Con: Ignores locality.

[Figure: several CPU + RAM pairs exchanging messages]
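The runtime formula is easy to compute directly. A small sketch, where comm_cost is an illustrative parameter standing in for the slide's CommCost:

```
def bsp_round_time(work, msg_sizes, comm_cost):
    # One superstep: max_i w_i + max_i s_i, with s_i = MessSize * CommCost.
    # The slowest computation and the largest message dominate the round.
    return max(work) + max(s * comm_cost for s in msg_sizes)

# Four processors with uneven work and message sizes:
print(bsp_round_time(work=[4, 7, 5, 6],
                     msg_sizes=[10, 2, 8, 1],
                     comm_cost=0.5))  # 7 + 5.0 = 12.0
```

Because both terms are maxima, balancing work and communication across processors is the central design concern in this model.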


MapReduce

Each processor has a full hard drive; data items are <key, value> pairs.
Parallelism proceeds in rounds:
- Map: assigns items to processors by key.
- Reduce: processes all items using value; usually combines many items with the same key.
- Repeat Map+Reduce a constant number of times, often only one round.
- Optional post-processing step.

Pro: Robust (duplication) and simple. Can harness locality.
Con: Somewhat restrictive model.

[Figure: a MAP phase routing items to CPU + RAM nodes, then a REDUCE phase combining them]

Advanced Algorithms
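The canonical example is word count. A single-process sketch of one Map-Reduce round; the shuffle step, which routes pairs to reducers by key, is performed by the framework in a real system:

```
from collections import defaultdict

def map_phase(docs):
    # Map: emit one <key, value> pair per word occurrence.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Framework step: group all values sharing a key onto one processor.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine all items with the same key.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog"]
print(reduce_phase(shuffle(map_phase(docs))))  # {'the': 2, 'quick': 1, ...}
```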


General Purpose GPU

Massive parallelism on your desktop.
Uses the Graphics Processing Unit, designed for efficient video rasterizing.
Each processor corresponds to a pixel p:
- depth buffer: D(p) = min_i ||x − w_i||
- color buffer: C(p) = Σ_i α_i χ_i
- ...

[Figure: a pixel p and scene points w_i]

Pro: Fine-grain, massive parallelism. Cheap. Harnesses locality.
Con: Somewhat restrictive model, hierarchy. Small memory.
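A serial sketch of the depth-buffer computation D(p) = min_i ||x − w_i||, where x is pixel p's position. Each pixel's value depends only on its own position, which is exactly the per-pixel independence a GPU exploits; this Python version (using math.dist, Python 3.8+) just makes the formula concrete.

```
import math

def depth_buffer(pixel_positions, scene_points):
    # D(p) = min over scene points w_i of the distance to pixel p.
    # Every entry is independent, so a GPU computes them all at once.
    return [min(math.dist(x, w) for w in scene_points)
            for x in pixel_positions]

pixels = [(0, 0), (1, 0), (2, 3)]
points = [(1, 1), (4, 0)]
print(depth_buffer(pixels, points))  # nearest-point distance per pixel
```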

Distributed Computing

Many small, slow processors, each holding data.
Communication is very expensive.
- Report to base station
- Merge tree
- Unorganized (peer-to-peer)

Data collection or Distribution

[Figure: sensors holding bit strings, reporting up to a base station (CPU + RAM)]

Advanced Algorithms: Approximate, Randomized
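A sketch of the merge-tree pattern from the list above: summaries are merged pairwise up a binary tree, so the base station receives one summary after O(log #nodes) communication rounds, and each node transmits only once. The merge argument is a placeholder for whatever mergeable summary is in use.

```
def merge_tree(leaf_summaries, merge):
    # Pairwise-merge summaries level by level; each pass of the loop
    # is one communication round in which every surviving node sends
    # one summary upward.
    level = list(leaf_summaries)
    while len(level) > 1:
        level = [merge(level[i], level[i + 1]) if i + 1 < len(level)
                 else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

# e.g., each sensor reports its local max; the base station gets the global max
print(merge_tree([3, 7, 2, 9, 4], max))  # 9
```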


Themes

What are the course goals?
- How to analyze algorithms in each model
- A taste of how to use each model
- When to use each model

Work Plan:
- 1-3 weeks per model:
  - Background and model.
  - Example algorithm analysis in each model.

  I/O   Stream   Parallel   MapReduce   GPU   Distributed
  4     5        4          4           3     3


Class Work

1 Credit Students:
- Attend class (some Fridays less important).
- Ask questions.
- If the above are lacking, there may be quizzes.
- Scribe notes, video-tape lectures, or give lectures.

3 Credit Students: must also do a project!
- Project proposal (Aug 30); approved or rejected by Sept 4.
- Intermediate report (Oct 23).
- Presentations (Dec 11 or 13).

Sequential Review

Turing Machines (Alan Turing, 1936)
- Single tape: MoveL, MoveR, read, write
- Each operation takes constant time
- Memory: a content pointer and an infinite tape

Von Neumann Architecture (Von Neumann + Eckert + Mauchly, 1945)
- Based on ENIAC
- CPU + Memory (RAM): read, write, op = constant time

How fast are the following?
- Scanning (max):   TM: O(n)      VNA: O(n)
- Sorting:          TM: O(n^2)    VNA: O(n log n)
- Searching:        TM: O(n)      VNA: O(log n)
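The searching gap is the easiest one to see: on a RAM, binary search inspects O(log n) cells via constant-time random access, while a single-tape Turing machine must physically move its head across the tape. A minimal sketch:

```
def binary_search(arr, target):
    # O(log n) on a RAM: arr[mid] is a constant-time random access.
    # A single-tape TM pays Theta(n) head movement for the same search.
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([2, 3, 5, 7, 11, 13], 7))  # 3
```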


Asymptotics

How long (in seconds) do the following take, as n grows?
- Searching (log n)
- Max (n)
- Merge-Sort (n log n)
- Bubble-Sort (n^2) ... or Dynamic Programming

  n      Search      Max         Merge       Bubble
  10^1   0.000001    0.000003    0.000005    0.000003
  10^2   0.000001    0.000005    0.000030    0.000105
  10^3   0.000001    0.000006    0.000200    0.007848
  10^4   0.000002    0.000048    0.002698    0.812912
  10^5   0.000001    0.000387    0.029566    83.12960
  10^6   0.000002    0.003988    0.484016    ~2 hours
  10^7   0.000002    0.040698    7.833908    ~9 days
  10^8   0.000007    9.193987    137.9388    -
  10^9   0.001871    >15 min     -           -

Complexity Theory:
- log: polylog(n) = log^c n (... need to load data)
- P: poly(n) = n^c (many cool algorithms)
- EXP: exp(n) = c^n (usually hopeless ... but 1.00001^n is not bad)
- NP: verify a solution in P; finding a solution is conjectured EXP (with an EXP number of parallel machines, solvable in P time)
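The table's timings come from one 2013 machine. A sketch for reproducing its flavor on any machine (absolute numbers will differ, but the growth rates will not):

```
import random
import time

def bubble_sort(a):
    # Textbook O(n^2) sort, used only to make quadratic growth visible.
    a = a[:]
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

n = 10**4
data = [random.random() for _ in range(n)]
for name, fn in [("Max", max), ("Merge", sorted), ("Bubble", bubble_sort)]:
    start = time.perf_counter()
    fn(data)
    print(f"{name}: {time.perf_counter() - start:.6f} s")
```

Doubling n roughly doubles Max, slightly more than doubles Merge, and quadruples Bubble, which is the whole story of the table.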


Data Group

Data Group Meeting: Thursdays @ 12:15-1:30pm in LCR (to be confirmed)
http://datagroup.cs.utah.edu/dbgroup.php