Models of Computation for Massive Data
Jeff M. Phillips
August 28, 2013
Outline
Sequential:
▶ External Memory / (I/O)-Efficient
▶ Streaming
Parallel:
▶ PRAM and BSP
▶ MapReduce
▶ GP-GPU
▶ Distributed Computing
RAM Model
RAM model (Von Neumann Architecture):
▶ CPU and Memory
▶ CPU operations (+, −, ∗, . . .) take constant time
▶ Data stored as words, not bits.
▶ Read, Write take constant time.
[figure: CPU connected to RAM]
Today’s Reality
What your computer actually looks like:
▶ 3+ layers of memory hierarchy.
▶ Small number of CPUs.
Many variations!
[figure: Disk (164 GB) → RAM (2 GB) → L2 (4 MB) → L1 caches → CPUs]
External Memory Model
[figure: Disk (D) ↔ Memory (M) via block I/O, processor (P)]
▶ N = size of problem instance
▶ B = size of disk block
▶ M = number of items that fit in Memory
▶ T = number of items in output
▶ I/O = block move between Memory and Disk
Advanced Data Structures: Sorting, Searching
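A rough sketch of how costs are counted in this model (function names are my own, and the constants ignore separate read/write passes): scanning touches ⌈N/B⌉ blocks, and external merge sort pays roughly (N/B) passes, one per level of merging with fan-in M/B.

```python
import math

def scan_ios(N, B):
    # Scanning N items touches ceil(N/B) blocks.
    return math.ceil(N / B)

def sort_ios(N, B, M):
    # External merge sort: form ceil(N/M) sorted runs in one pass,
    # then repeatedly merge M/B runs at a time, one pass each.
    # Total ~ (N/B) * log_{M/B}(N/M) block I/Os.
    n_blocks = math.ceil(N / B)
    runs = math.ceil(N / M)
    passes = 1                      # run-formation pass
    while runs > 1:
        runs = math.ceil(runs / (M // B))
        passes += 1
    return n_blocks * passes
```

For example, with N = 10^6 items, B = 100, M = 10^4, sorting needs only two passes over the data, since a single merge level with fan-in 100 suffices.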
Streaming Model
[figure: CPU with small memory; each word ∈ [n]]
CPU makes "one pass" on data
▶ Ordered set A = ⟨a_1, a_2, . . . , a_m⟩
▶ Each a_i ∈ [n], size log n
▶ Compute f(A) or maintain f(A_i) for A_i = ⟨a_1, a_2, . . . , a_i⟩.
▶ Space restricted to S = O(poly(log m, log n)).
▶ Updates O(poly(S)) for each a_i.
Advanced Algorithms: Approximate, Randomized
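A classic example fitting these constraints is the Misra-Gries frequent-items sketch (not from the slide; shown here as an illustration): one pass, at most k − 1 counters of space, constant-time updates.

```python
def misra_gries(stream, k):
    # One-pass frequent-items sketch using at most k-1 counters.
    # Any item occurring more than m/k times in a stream of length m
    # is guaranteed to survive in the counter set.
    counters = {}
    for a in stream:
        if a in counters:
            counters[a] += 1
        elif len(counters) < k - 1:
            counters[a] = 1
        else:
            # decrement every counter; drop those that reach zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

On a stream of length m = 14 where item 1 appears 6 > m/3 times, item 1 is guaranteed to remain among the counters.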
PRAM
Many (p) processors. Access shared memory:
▶ EREW: Exclusive Read, Exclusive Write
▶ CREW: Concurrent Read, Exclusive Write
▶ CRCW: Concurrent Read, Concurrent Write
Simple model, but has shortcomings...
...such as Synchronization.
[figure: CPU 1, CPU 2, . . . , CPU p sharing one RAM]
Advanced Algorithms
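A typical PRAM computation is a tree reduction. This sketch (my own, simulated sequentially) shows how p = n/2 processors compute a max in O(log n) parallel rounds: all pairwise maxes within one round are independent and would run concurrently.

```python
def pram_max(values):
    # Simulates an EREW PRAM reduction: in each round, processor i
    # combines elements 2i and 2i+1, halving the problem size, so
    # the number of rounds is ceil(log2 n).
    data = list(values)
    rounds = 0
    while len(data) > 1:
        data = [max(data[i], data[i + 1]) if i + 1 < len(data) else data[i]
                for i in range(0, len(data), 2)]
        rounds += 1
    return data[0], rounds
```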
Bulk Synchronous Parallel
Each Processor has its own Memory
Parallelism Proceeds in Rounds:
1. Compute: Each processor computes on its own Data: w_i.
2. Synchronize: Each processor sends messages to others: s_i = MessSize × CommCost.
3. Barrier: All processors wait until others done.
Runtime: max w_i + max s_i
Pro: Captures Parallelism and Synchronization
Con: Ignores Locality.
[figure: several CPU + RAM machines connected pairwise]
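The round cost above is easy to compute directly. A minimal sketch (function names mine) of the slide's cost model, where each superstep pays the slowest compute plus the slowest communication:

```python
def bsp_round_cost(work, msg_sizes, comm_cost):
    # Cost of one BSP superstep: max_i w_i + max_i s_i,
    # where s_i = message size * per-word communication cost.
    comm = [m * comm_cost for m in msg_sizes]
    return max(work) + max(comm)

def bsp_total(rounds):
    # rounds: list of (work, msg_sizes, comm_cost) tuples,
    # one per superstep; barriers make the costs additive.
    return sum(bsp_round_cost(w, s, g) for (w, s, g) in rounds)
```

Note the barrier is what makes rounds additive: a fast processor cannot start round t + 1 early, which is exactly the synchronization the model captures and the locality it ignores.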
MapReduce
Each Processor has full hard drive, data items ⟨key, value⟩.
Parallelism Proceeds in Rounds:
▶ Map: assigns items to processor by key.
▶ Reduce: processes all items using value. Usually combines many items with same key.
Repeat M+R a constant number of times, often only one round.
▶ Optional post-processing step.
Pro: Robust (duplication) and simple. Can harness Locality
Con: Somewhat restrictive model
[figure: MAP and REDUCE stages across CPU + RAM machines]
Advanced Algorithms
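A single round of this model can be sketched in a few lines of Python (a toy simulation, not a distributed implementation): the mapper emits ⟨key, value⟩ pairs, the shuffle groups by key, and the reducer combines each group.

```python
from collections import defaultdict

def map_reduce(items, mapper, reducer):
    # One MapReduce round: mapper emits (key, value) pairs,
    # the shuffle groups values by key, reducer combines each group.
    groups = defaultdict(list)
    for item in items:
        for key, value in mapper(item):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# classic word count: one round suffices
lines = ["to be or not", "to be"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda key, values: sum(values),
)
```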
General Purpose GPU
Massive parallelism on your desktop.
Uses Graphics Processing Unit.
Designed for efficient video rasterizing.
Each processor corresponds to pixel p:
▶ depth buffer: D(p) = min_i ||x − w_i||
▶ color buffer: C(p) = Σ_i α_i χ_i
▶ ...
[figure: pixel p viewing scene elements w_i]
Pro: Fine grain, massive parallelism. Cheap. Harnesses Locality.
Con: Somewhat restrictive model, hierarchy. Small memory.
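The depth-buffer formula above is the kind of per-pixel, data-parallel computation GPUs excel at. A sequential sketch in the slide's notation (function and parameter names are mine); on a GPU every pixel's min would run concurrently:

```python
import math

def depth_buffer(pixels, points):
    # For each pixel position x, compute D(p) = min_i ||x - w_i||.
    # Each pixel's computation is independent: ideal for a GPU kernel.
    def dist(x, w):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))
    return [min(dist(x, w) for w in points) for x in pixels]
```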
Distributed Computing
Many small slow processors with data.
Communication very expensive.
▶ Report to base station
▶ Merge tree
▶ Unorganized (peer-to-peer)
[figure: sensors holding bit strings sending to a base station (CPU + RAM)]
Data collection or Distribution
Advanced Algorithms: Approximate, Randomized
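The merge-tree option can be sketched as a binary aggregation tree (my own simulation): instead of p processors all messaging the base station at once, values combine pairwise up the tree, reaching the root in O(log p) communication rounds.

```python
def merge_tree(leaf_values, combine):
    # Aggregate distributed values up a binary merge tree:
    # each round halves the number of active nodes, so only
    # O(log p) rounds are needed and no single node receives
    # more than two messages per round.
    level, rounds = list(leaf_values), 0
    while len(level) > 1:
        level = [combine(level[i], level[i + 1]) if i + 1 < len(level)
                 else level[i] for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds
```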
Themes
What are course goals?
▶ How to analyze algorithms in each model
▶ Taste of how to use each model
▶ When to use each model
Work Plan:
▶ 1-3 weeks each model.
▶ Background and Model.
▶ Example algorithms analysis in each model.

  I/O   Stream   Parallel   MapReduce   GPU   Distributed
   4      5         4           4        3        3
Class Work
1 Credit Students:
▶ Attend Class. (some Fridays less important)
▶ Ask Questions.
▶ If above lacking, may have quizzes.
▶ Scribing Notes, Video-taping Lectures, or Giving Lectures.
3 Credit Students:
Must also do a project!
▶ Project Proposal (Aug 30). Approved or Rejected by Sept 4.
▶ Intermediate Report (Oct 23).
▶ Presentations (Dec 11 or 13).
Sequential Review
Turing Machines (Alan Turing 1936)
▶ Single Tape: MoveL, MoveR, read, write
▶ each constant time
▶ content pointer memory, infinite tape (memory)
Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945)
▶ based on ENIAC
▶ CPU + Memory (RAM): read, write, op = constant time
How fast are the following?
▶ Scanning (max):  TM: O(n)     VNA: O(n)
▶ Sorting:         TM: O(n^2)   VNA: O(n log n)
▶ Searching:       TM: O(n)     VNA: O(log n)
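The searching gap comes from random access: a single-tape TM must walk the tape, while the VNA can binary search. A small step-counting sketch (function names mine) makes the O(n) vs O(log n) contrast concrete:

```python
def tm_search_steps(tape, target):
    # A single-tape TM walks cell by cell: Theta(n) head moves.
    for steps, value in enumerate(tape, start=1):
        if value == target:
            return steps
    return len(tape)

def vna_search_steps(array, target):
    # Constant-time random access enables binary search on
    # sorted input: Theta(log n) probes.
    lo, hi, steps = 0, len(array) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if array[mid] == target:
            return steps
        if array[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps
```

On a sorted array of 1024 items, the TM walk may take over a thousand steps while binary search needs at most 11 probes.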
Asymptotics
How large (in seconds) is:
▶ Searching (log n)
▶ Max (n)
▶ Merge-Sort (n log n)
▶ Bubble-Sort (n^2) ... or Dynamic Programming

  n =    Search     Max        Merge      Bubble
  10     0.000001   0.000003   0.000005   0.000003
  10^2   0.000001   0.000005   0.000030   0.000105
  10^3   0.000001   0.000006   0.000200   0.007848
  10^4   0.000002   0.000048   0.002698   0.812912
  10^5   0.000001   0.000387   0.029566   83.12960
  10^6   0.000002   0.003988   0.484016   ~2 hour
  10^7   0.000002   0.040698   7.833908   ~9 days
  10^8   0.000007   9.193987   137.9388   -
  10^9   0.001871   > 15 min   -          -
Complexity Theory:
▶ log: polylog(n) = log^c n (... need to load data)
▶ P: poly(n) = n^c (many cool algorithms)
▶ EXP: exp(n) = c^n (usually hopeless ... but 1.00001^n not bad)
▶ NP: verify solution in P, find solution conjectured EXP
  (If EXP number parallel machines, then in P time)
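Entries like "~2 hour" in the table follow from simple extrapolation: if cost grows as f(n), then t(n2) ≈ t(n1) · f(n2)/f(n1). A minimal sketch (function name mine), checked against the Bubble column:

```python
def extrapolate(t1, n1, n2, f):
    # Predict runtime at size n2 from a measurement (n1, t1),
    # assuming cost grows as f(n): t2 = t1 * f(n2) / f(n1).
    return t1 * f(n2) / f(n1)

# Bubble sort measured 83.1296 s at n = 10^5; quadratic growth
# predicts 100x that at n = 10^6, i.e. ~8313 s (about 2.3 hours),
# matching the table's "~2 hour" entry.
pred = extrapolate(83.1296, 10**5, 10**6, lambda n: n * n)
```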
Data Group
Data Group Meeting
Thursdays @ 12:15-1:30pm in LCR (to be confirmed)
http://datagroup.cs.utah.edu/dbgroup.php