Models of Computation for Massive Data
Jeff M. Phillips
August 22, 2011
Outline

Sequential:
- External Memory / (I/O)-Efficient
- Streaming

Parallel:
- PRAM and BSP
- MapReduce
- GP-GPU
- Distributed Computing

[Figure: runtime vs. data size]
RAM Model

RAM model (Von Neumann Architecture):
- CPU and Memory
- CPU operations (+, −, ∗, . . .) take constant time.
- Read, Write take constant time.

[Figure: a CPU attached to one RAM]
Today's Reality

What your computer actually looks like:
- Small number of CPUs. Many variations!
- 3+ layers of memory hierarchy.

[Figure: memory hierarchy, bigger toward the disk and faster toward the CPU: Disk (164 GB), RAM (2 GB), L2 (4 MB), L1 caches, CPUs]
External Memory Model

- N = size of problem instance
- B = size of disk block
- M = number of items that fit in Memory
- T = number of items in output
- I/O = block move between Memory and Disk

[Figure: processor P with internal Memory of size M; Disk D accessed by block I/O of size B]
Advanced Data Structures
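To make the I/O cost concrete, here is a rough sketch of mine (the names scan_ios and sort_ios and the example numbers are not from the lecture): it counts block transfers rather than CPU steps, giving the familiar scan cost of about N/B I/Os and the external merge-sort cost of about (N/B) log_{M/B}(N/B) I/Os.

import math

def scan_ios(N, B):
    """I/Os to read N items sequentially in blocks of size B."""
    return math.ceil(N / B)

def sort_ios(N, B, M):
    """Rough I/O count for an (M/B)-way external merge sort."""
    runs = math.ceil(N / M)                 # sorted runs produced by the first pass
    fanin = max(M // B, 2)                  # how many runs can be merged per pass
    merge_passes = math.ceil(math.log(runs, fanin)) if runs > 1 else 0
    return (1 + merge_passes) * 2 * scan_ios(N, B)   # each pass reads and writes ~N/B blocks

# Example: 10^9 items, blocks of B = 1024 items, memory holding M = 10^6 items.
N, B, M = 10**9, 1024, 10**6
print(scan_ios(N, B), sort_ios(N, B, M))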
Streaming Model

CPU makes "one pass" over the data.
- Ordered set A = ⟨a1, a2, . . . , am⟩
- Each ai ∈ [n], size log n
- Compute f(A) or maintain f(Ai) for Ai = ⟨a1, a2, . . . , ai⟩.
- Space restricted to S = O(poly(log m, log n)).
- Updates O(poly(S)) for each ai.

[Figure: a CPU with small memory scans a stream of length m, one word ∈ [n] at a time]
Advanced Algorithms: Approximate, Randomized
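One concrete small-space, approximate streaming algorithm (my illustration; the slides do not name it) is the Misra-Gries heavy-hitters summary: one pass over the stream, at most k−1 counters of space, O(k) work per update, and any item occurring more than m/k times is guaranteed to remain in the summary.

import random

def misra_gries(stream, k):
    """One-pass heavy-hitters summary using at most k-1 counters."""
    counters = {}
    for a in stream:                        # single pass over a_1, ..., a_m
        if a in counters:
            counters[a] += 1
        elif len(counters) < k - 1:
            counters[a] = 1
        else:                               # decrement all counters; drop zeros
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Example: a stream of m = 10000 items from [n]; item 7 occurs 4000 > m/k times.
stream = [7] * 4000 + [random.randrange(10**6) for _ in range(6000)]
random.shuffle(stream)
print(misra_gries(stream, k=10))            # 7 is guaranteed to appear as a key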
PRAM

Many (p) processors access a shared memory:
- EREW: Exclusive Read, Exclusive Write
- CREW: Concurrent Read, Exclusive Write
- CRCW: Concurrent Read, Concurrent Write

Simple model, but has shortcomings...
...such as Synchronization.

[Figure: CPU1, CPU2, . . . , CPUp all attached to one shared RAM]
Advanced Algorithms
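A minimal sketch of mine (not the lecture's) of how a PRAM algorithm is stated: summing n numbers in O(log n) synchronous rounds. In each round every simulated processor reads two distinct cells and writes one distinct cell, so the algorithm is valid even in the weakest EREW variant; the inner for-loop stands in for processors acting concurrently.

def erew_parallel_sum(values):
    """Sum n numbers in O(log n) rounds; processors touch disjoint cells."""
    data = list(values)                     # the shared memory
    stride = 1
    while stride < len(data):
        # One synchronous round: these updates are independent and could all
        # be executed in parallel, one per processor.
        for i in range(0, len(data) - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0]

print(erew_parallel_sum(range(1, 17)))      # 136, after log2(16) = 4 rounds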
Bulk Synchronous Parallel

Each processor has its own memory. Parallelism proceeds in rounds:
1. Compute: each processor computes on its own data: wi.
2. Synchronize: each processor sends messages to others: si = m × g × h.
3. Barrier: all processors wait until the others are done.

Runtime: max wi + max si

Pro: Captures Parallelism and Synchronization.
Con: Ignores Locality.

[Figure: several CPU + RAM pairs exchanging messages]
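A small sketch of mine of the round-cost accounting, under the assumption that in si = m × g × h, m is the message size in words, g the cost per word sent, and h the number of messages a processor sends (these symbols are not defined in the extract):

def bsp_round_cost(work, h, m, g):
    """work[i] = local compute time w_i; h[i] = messages sent by processor i;
       m = words per message; g = communication cost per word."""
    send = [m * g * hi for hi in h]         # s_i = m * g * h, as on the slide
    return max(work) + max(send)            # runtime: max w_i + max s_i

# Example: 4 processors with uneven work and communication in one round.
print(bsp_round_cost(work=[100, 250, 180, 90], h=[2, 8, 4, 1], m=64, g=1.5))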
MapReduce

Each processor has a full hard drive; data items are ⟨key, value⟩ pairs.
Parallelism proceeds in rounds:
- Map: assigns items to processors by key.
- Reduce: processes all items using value. Usually combines many items with the same key.
- Repeat M+R a constant number of times, often only one round.
  Optional post-processing step.

Pro: Robust (duplication) and simple. Can harness Locality.
Con: Somewhat restrictive model.

[Figure: MAP routes items to CPU + RAM nodes; REDUCE combines their outputs]
Advanced Algorithms
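A minimal word-count sketch in the MapReduce style (an illustration of mine, not tied to any particular framework): map emits ⟨key, value⟩ pairs, a shuffle groups them by key, and reduce combines all values that share a key.

from collections import defaultdict

def map_fn(document):
    for word in document.split():
        yield (word, 1)                     # emit <word, 1>

def reduce_fn(word, counts):
    return (word, sum(counts))              # combine all items with the same key

def map_reduce(documents):
    groups = defaultdict(list)              # the shuffle: route items by key
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

docs = ["the quick brown fox", "the lazy dog", "the fox"]
print(map_reduce(docs))                     # {'the': 3, 'fox': 2, ...}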
General Purpose GPU

Massive parallelism on your desktop. Uses the Graphics Processing Unit.
Designed for efficient video rasterizing.

Each processor corresponds to pixel p:
- depth buffer: D(p) = min_i ||x − wi||
- color buffer: C(p) = Σ_i αi χi
- ...

Pro: Fine grain, massive parallelism. Cheap.
Con: Somewhat restrictive model. Small memory.

[Figure: a pixel p and sites wi]
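A sketch of mine (not the lecture's code) of the per-pixel computation D(p) = min_i ||x − wi||: every pixel evaluates the same tiny program independently, which is exactly the fine-grained parallelism a GPU exploits; here NumPy vectorization stands in for the GPU threads.

import numpy as np

H, W, k = 64, 64, 5                          # image size and number of sites w_i
rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 1.0, size=(k, 2))   # the w_i

# x: the 2D position of each pixel p on the unit square.
ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
pixels = np.stack([xs, ys], axis=-1)         # shape (H, W, 2)

# D(p) = min_i ||x - w_i||, computed for every pixel at once.
dists = np.linalg.norm(pixels[:, :, None, :] - sites[None, None, :, :], axis=-1)
depth = dists.min(axis=-1)                   # shape (H, W)
print(depth.shape, float(depth.min()), float(depth.max()))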
Distributed Computing

Many small slow processors with data. Communication very expensive.
- Report to base station
- Merge tree
- Unorganized (peer-to-peer)

Data collection or Distribution

[Figure: many sensor nodes holding bit strings report to a base station (CPU + RAM)]
Advanced Algorithms: Approximate, Randomized
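A minimal sketch of mine of the merge-tree pattern: nodes pair up level by level, each pair merges its local summaries, and after about log2(p) rounds a single node holds the combined result to report to the base station; only one message per pair per round is sent, which matters when communication is the expensive resource.

def merge_tree(summaries, merge):
    """Pair nodes level by level; after ~log2(p) rounds one node holds the result."""
    level = list(summaries)
    rounds = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(merge(level[i], level[i + 1]))   # one message per pair
        if len(level) % 2:                              # an unpaired node waits a round
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds

# Example: each node summarizes its local bit string by its count of 1s.
local_data = ["01100", "01100", "0110", "01", "10", "01100"]
counts = [s.count("1") for s in local_data]
total, rounds = merge_tree(counts, lambda a, b: a + b)
print(total, rounds)                                    # 10 ones, 3 rounds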
Themes

What are the course goals?
- How to analyze algorithms in each model
- Taste of how to use each model
- When to use each model
Work Plan:
- 2-3 weeks each model.
  - Background and Model.
  - Example algorithms analysis in each model.
  - Small assignment (total = 1/2 grade).
- Course Project (1/2 grade).
  - Compare single problem in multiple models
  - Solve challenging problem in one model
  - Analyze challenging problem in one model
  - Access to Amazon's EC2. More in about 1 month.
Class Survey

Q1: Algorithms Background
  A. What is the highest algorithms class you have taken?
  B. What was the hardest topic?
  C. Have you seen a randomized algorithm? (which one?)
  D. Have you seen an approximation algorithm? (which one?)

Q2: Programming Background
  A. Have you used C or C++?
  B. Have you used Matlab?
  C. What other languages have you coded in?

Q3: Class interest
  A. Are you registered?
  B. How certain are you to stay in the class? (choose one)
     (a) Definitely staying in!
     (b) Probably staying in.
     (c) Deciding between this and another class.
     (d) Just shopping around...
Data Group
Data Group Meeting
Thursdays @ 12-1pm in Graphics Annex
http://datagroup.cs.utah.edu/dbgroup.php