Jeffrey D. Ullman
Stanford University
Arbitrary Acyclic Flow Among Tasks
Preserving Fault Tolerance
The Blocking Property

MapReduce uses only two functions (Map and
Reduce).
 Each is implemented by a rank of tasks.
 Data flows from Map tasks to Reduce tasks only.




Natural generalization is to allow any number
of functions, connected in an acyclic network.
Each function implemented by tasks that feed
tasks of successor function(s).
Key fault-tolerance (blocking) property: tasks
produce all their output at the end.
Important point: Map tasks never deliver their
output until completed.
 Thus, we can restart a Map task that failed without
fear that a Reduce task has already used some
output of the failed Map task.
1. Clustera – University of Wisconsin.
2. Hyracks – Univ. of California/Irvine.
3. Dryad/DryadLINQ – Microsoft.
4. Nephele/PACT – T. U. Berlin.
5. BOOM – Berkeley.
6. epiC – N. U. Singapore.



Relations D(emp, dept) and S(emp, salary).
Compute the sum of the salaries for each
department.
D JOIN S computed by MapReduce.
 But each Reduce task can also group its emp-dept-salary tuples by dept and sum the salaries.

A third function is needed to take the dept-SUM(salary) pairs from each Reduce task,
organize them by dept, and compute the final
sum for each department.
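
The three-function flow described above can be sketched in a few lines of Python. This is a single-process illustration only, not how any of the listed systems implement it. The function names (`map_phase`, `join_and_group`, `final_aggregate`), the task count `k`, and the use of Python's built-in `hash` are all assumptions made for the example. A real system would also spread the final aggregation over several tasks by hashing on dept; here it is one function.

```python
from collections import defaultdict

def map_phase(D, S, k):
    """Route every tuple to one of k join tasks, chosen by hashing emp."""
    buckets = [{"D": [], "S": []} for _ in range(k)]
    for emp, dept in D:
        buckets[hash(emp) % k]["D"].append((emp, dept))
    for emp, salary in S:
        buckets[hash(emp) % k]["S"].append((emp, salary))
    return buckets

def join_and_group(bucket):
    """Join D and S on emp, then pre-sum salaries by dept inside this task."""
    depts_of = defaultdict(list)
    for emp, dept in bucket["D"]:
        depts_of[emp].append(dept)
    partial = defaultdict(int)
    for emp, salary in bucket["S"]:
        for dept in depts_of.get(emp, []):
            partial[dept] += salary
    return dict(partial)

def final_aggregate(partials):
    """Third function: combine per-task dept sums into the final sums."""
    total = defaultdict(int)
    for partial in partials:
        for dept, subtotal in partial.items():
            total[dept] += subtotal
    return dict(total)

def dept_salary_sums(D, S, k=3):
    buckets = map_phase(D, S, k)
    partials = [join_and_group(b) for b in buckets]
    return final_aggregate(partials)
```

Because employees of one department can land in different join tasks, each task holds only a partial sum per dept, which is why the third function is needed.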
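[Figure: dataflow for the salary example. D and S feed the Map tasks; Map output is hashed by emp to the Join + Group tasks; their output is hashed by dept to the Final Group + Aggregate tasks.]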
Transitive-Closure Example
Fault-Tolerance Problem
Endgame Problem
Some Systems and Approaches
1. PageRank, the original MapReduce application, is really a recursion implemented by many rounds of MapReduce.
2. Analysis of social networks.
3. Many machine-learning algorithms, e.g., gradient descent.
4. PDEs.

Many recursive applications involving large
data are similar to transitive closure:
Nonlinear. Takes log n rounds on an n-node graph:
Path(X,Y) :- Arc(X,Y)
Path(X,Y) :- Path(X,Z) & Path(Z,Y)
(Right) Linear. Takes n rounds on an n-node graph:
Path(X,Y) :- Arc(X,Y)
Path(X,Y) :- Arc(X,Z) & Path(Z,Y)
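
The difference in round counts between the two programs can be checked with a small in-memory sketch. The helper names `compose` and `tc_rounds` are hypothetical, and a "round" here means one application of the recursive rule that adds at least one new Path fact.

```python
def compose(R, T):
    """Relational join: {(x, y) | (x, z) in R and (z, y) in T}."""
    by_first = {}
    for z, y in T:
        by_first.setdefault(z, set()).add(y)
    return {(x, y) for x, z in R for y in by_first.get(z, ())}

def tc_rounds(arcs, linear):
    """Return (Path, rounds) for the right-linear or the nonlinear program."""
    arcs = set(arcs)
    path = set(arcs)                      # Path(X,Y) :- Arc(X,Y)
    rounds = 0
    while True:
        left = arcs if linear else path   # Arc(X,Z) vs. Path(X,Z) in the body
        new = compose(left, path) - path
        if not new:
            return path, rounds
        path |= new
        rounds += 1

chain = [(i, i + 1) for i in range(8)]    # a 9-node chain 0 -> 1 -> ... -> 8
_, linear_rounds = tc_rounds(chain, linear=True)
_, nonlinear_rounds = tc_rounds(chain, linear=False)
```

On this 9-node chain the sketch reports 7 rounds for the linear program and 3 for the nonlinear one, since the longest path length grows by one per round in the first case and doubles in the second.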





Use k tasks.
Nonlinear recursion used here.
Hash function h sends each node of the graph to
one of the k tasks.
Task i receives and stores Path(a,b) if either h(a)
= i or h(b) = i, or both.
Task i must join Path(a,c) with Path(c,b) if h(c) = i.


Data is stored as relation Arc(a,b).
“Map” tasks read chunks of the Arc relation and
send each tuple Arc(a,b) to recursive tasks h(a)
and h(b).
 Treated as if it were tuple Path(a,b).
 If h(a) = h(b), only one task receives.
[Figure: processing at Task i]
1. Path(a,b) received.
2. Store Path(a,b) if new. Otherwise, ignore.
3. Look up Path(b,c) and/or Path(d,a) for any c and d.
4. Send Path(a,c) to tasks h(a) and h(c); send Path(d,b) to tasks h(d) and h(b).
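
The per-task logic above can be simulated in one process to see that it really computes the transitive closure. The sketch below makes several assumptions: a single FIFO queue stands in for the network, `h` defaults to Python's `hash` modulo `k`, and each task scans its whole store on lookup (a real task would index by first and second component). The name `distributed_tc` is hypothetical.

```python
from collections import deque

def distributed_tc(arcs, k, h=None):
    """Simulate k recursive tasks computing nonlinear TC by message passing."""
    if h is None:
        h = lambda node: hash(node) % k      # assumed hash function
    stored = [set() for _ in range(k)]       # Path facts held by each task
    queue = deque()                          # pending (task, fact) messages

    def send(a, b):
        # A fact goes to tasks h(a) and h(b); one copy if they coincide.
        for task in {h(a), h(b)}:
            queue.append((task, (a, b)))

    for a, b in arcs:                        # "Map" step: Arc treated as Path
        send(a, b)

    while queue:
        i, (a, b) = queue.popleft()
        if (a, b) in stored[i]:
            continue                         # not new: ignore
        stored[i].add((a, b))
        if h(b) == i:                        # join Path(a,b) with Path(b,c)
            for x, c in stored[i]:
                if x == b:
                    send(a, c)
        if h(a) == i:                        # join Path(d,a) with Path(a,b)
            for d, y in stored[i]:
                if y == a:
                    send(d, b)
    return set().union(*stored)
```

Both Path(a,c) and Path(c,b) are stored at task h(c), so whichever of the two arrives second triggers the join there; that is why each task only joins on nodes that hash to it.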



MapReduce depends on the blocking property.
Only then can you restart a failed task without
restarting the whole job.
But any recursive task has to deliver some
output and later get more input.

HaLoop iterates Hadoop, once for each round of the
recursion.
 Uses Hadoop blocking-based fault tolerance.



Similar idea: Twister (U. Indiana).
HaLoop tries to run each task in round i at a
compute node where it can find its needed
output from round i – 1.
Also partitions and stores locally a file that is
used at each round.
 Example: Arc in Path(X,Y) :- Arc(X,Z) & Path(Z,Y)


Pregel views all computation as a recursion on some
graph.
Nodes send messages to one another.
 Messages bunched into supersteps, where each
node processes all data received.
 Sending individual messages would result in far too
much overhead.


Checkpoints all compute nodes after some
fixed number of supersteps.
On failure, rolls all tasks back to the previous
checkpoint.
[Figure: shortest-path example. Node N receives the message "I found a path from node M to you of length L" and consults its table of shortest paths to N: "Is this the shortest path from M I know about?" If so, it tells each successor "I found a path from node M to you of length L+w," where w is the length of the arc to that successor: L+5, L+3, and L+6 over arcs of length 5, 3, and 6.]
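
The message exchange pictured above can be sketched as a loop over synchronous supersteps. This is a toy single-process version under stated assumptions: arc lengths are positive, the name `superstep_shortest_paths` and the dictionary layout are made up for the example, and checkpointing is not modeled.

```python
def superstep_shortest_paths(weighted_arcs):
    """Shortest path lengths between all pairs, computed in supersteps.

    weighted_arcs: iterable of (u, v, w) with w > 0.
    Returns (table, supersteps); table[n][m] is the shortest length m -> n.
    """
    out = {}
    nodes = set()
    for u, v, w in weighted_arcs:
        out.setdefault(u, []).append((v, w))
        nodes.update((u, v))
    table = {n: {} for n in nodes}           # per-node table of shortest paths
    inbox = {n: [] for n in nodes}
    for u, succs in out.items():             # initial messages: the arcs
        for v, w in succs:
            inbox[v].append((u, w))
    supersteps = 0
    while any(inbox.values()):
        supersteps += 1
        outbox = {n: [] for n in nodes}
        for n in nodes:
            improved = {}
            for m, length in inbox[n]:       # "shortest path from M I know?"
                if length < table[n].get(m, float("inf")):
                    table[n][m] = length
                    improved[m] = length
            for m, length in improved.items():
                for v, w in out.get(n, []):  # tell each successor
                    outbox[v].append((m, length + w))
        inbox = outbox
    return table, supersteps
```

Each node processes everything it received in the previous superstep at once and forwards at most one message per improved source to each successor, which reflects the bunching of messages into supersteps.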


Giraph: open-source Pregel.
GraphLab: similar system that deals more
effectively with nodes of high degree.
 Will split the work for such a graph node among
several compute nodes.


Some recursive applications allow restart of
tasks even if they have produced some
output.
Example: TC is idempotent; you can send a
task a duplicate Path fact without altering the
result.
 But if you were counting paths, the answer would
be wrong.
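
A tiny sketch of why duplicates are harmless under set semantics but not when counting. The functions `deliver` and `replay_is_harmless` are hypothetical names, and "restart" is modeled simply as the same batch of Path facts being delivered twice.

```python
def deliver(facts, counting):
    """Apply a stream of Path facts to a fresh task state."""
    state = {}
    for fact in facts:
        if counting:
            state[fact] = state.get(fact, 0) + 1   # number of paths seen
        else:
            state[fact] = True                     # set semantics, as in TC
    return state

def replay_is_harmless(facts, counting):
    """Compare one delivery with a delivery repeated after a restart."""
    once = deliver(facts, counting)
    twice = deliver(facts + facts, counting)
    return once == twice

facts = [("a", "b"), ("b", "c"), ("a", "c")]
tc_ok = replay_is_harmless(facts, counting=False)      # True
count_ok = replay_is_harmless(facts, counting=True)    # False
```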

Some recursions, like TC, take a large number
of rounds, but the number of new discoveries
in later rounds drops.
 T. Vassilakis: searches forward on the Web graph
can take hundreds of rounds.

Problem: in a cluster, transmitting small files
carries much overhead.



Decide when to migrate tasks to fewer
compute nodes.
Data for several tasks at the same node are
combined into a single file and distributed at
the receiving end.
Downside: old tasks have a lot of state to
move.

Example: “paths seen so far.”

Nonlinear recursions can terminate in many
fewer steps than equivalent linear recursions.
 Avoids the endgame problem.

Example: TC.
 O(n) rounds on n-node graph for linear.
 O(log n) rounds for nonlinear.


The communication cost (= sum of input sizes
of all tasks) for executing linear TC is generally
lower than that for nonlinear TC.
Why? Each path is discovered only once
(unique-decomposition property).
 Note: distinct paths between the same endpoints
may each be discovered.
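
One rough way to see the effect is to count how many times the recursive rule fires once the closure is complete. Using rule firings as a stand-in for communication cost is an assumption of this sketch, and the names `transitive_closure` and `count_derivations` are hypothetical.

```python
def transitive_closure(arcs):
    """Naive fixed point of the nonlinear TC program (fine for tiny graphs)."""
    path = set(arcs)
    while True:
        new = {(x, y) for x, z in path for z2, y in path if z == z2} - path
        if not new:
            return path
        path |= new

def count_derivations(arcs, linear):
    """Count instantiations of the recursive rule body at the fixed point."""
    arcs = set(arcs)
    path = transitive_closure(arcs)
    left = arcs if linear else path       # Arc(X,Z) vs. Path(X,Z)
    return sum(1 for x, z in left for z2, y in path if z == z2)

chain = [(i, i + 1) for i in range(4)]    # 5-node chain
linear_firings = count_derivations(chain, linear=True)       # 6
nonlinear_firings = count_derivations(chain, linear=False)   # 10
```

On the 5-node chain, the linear rule derives each path of length 2 or more exactly once (6 firings), while the nonlinear rule derives a path of length d once per split point, d - 1 times (10 firings in total).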

(Valduriez-Boral, Ioannidis) Construct a path
from two paths:
1. The first has a length that is a power of 2.
2. The second is no longer than the first.
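
A simplified reading of this two-part construction can be written as follows. The sketch keeps `Q`, the pairs joined by a path of length exactly 2^i, and `P`, the pairs joined by a path of length 1 to 2^i - 1. It is an illustration under those assumptions only: the names `compose` and `smart_tc` are hypothetical, and the sketch omits any refinements the published algorithms use.

```python
def compose(R, T):
    """Relational join: {(x, y) | (x, z) in R and (z, y) in T}."""
    by_first = {}
    for z, y in T:
        by_first.setdefault(z, set()).add(y)
    return {(x, y) for x, z in R for y in by_first.get(z, ())}

def smart_tc(arcs):
    """Return (Path, rounds) using power-of-2 first halves."""
    Q = set(arcs)   # paths of length exactly 2^i, starting with i = 0
    P = set()       # paths of length 1 .. 2^i - 1 (none yet)
    rounds = 0
    while True:
        # A power-of-2 path alone, or followed by a strictly shorter path.
        P_next = P | Q | compose(Q, P)
        if P_next == P:
            return P, rounds
        P = P_next
        Q = compose(Q, Q)   # equal halves give the next power of 2
        rounds += 1
```

The loop stops when a round adds nothing to `P`. Doubling the length bound without finding a new pair means no longer path can add one either, so `P` is then the full closure.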


You can have the unique-decomposition
property with many variants of nonlinear TC.
Example: Balance constructs paths from two
equal-length paths.
 Favor first path when length is odd.


On different graphs, any of the unique-decomposition algorithms – left-linear, right-linear, smart, balanced – could have the lowest
data-volume cost.
Other unique-decomposition algorithms are
possible and also could win.


Can you avoid the endgame problem by
converting any linear recursion into an
equivalent nonlinear recursion that requires
logarithmic rounds?
Answer: Not always, without increasing arity
and data size.
1. (Agrawal, Jagadish, Ness) All linear Datalog recursions reduce to TC.
2. Right-linear chain-rule Datalog programs can be replaced by nonlinear recursions with the same arity, logarithmic rounds, and the unique-decomposition property.
 Chain rule: each subgoal shares variables only with the next, in a circular sense that includes the head.
P(X,Y) :- Blue(X,Y)
P(X,Y) :- Blue(X,Z) & Q(Z,Y)
Q(X,Y) :- Red(X,Z) & P(Z,Y)
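
For reference, the program above can be evaluated directly by naive iteration. The helper names `compose` and `eval_pq` are hypothetical; the sketch only shows what P and Q contain (P holds paths that alternate blue and red arcs and both start and end with blue), not the nonlinear replacement.

```python
def compose(R, T):
    """Relational join: {(x, y) | (x, z) in R and (z, y) in T}."""
    by_first = {}
    for z, y in T:
        by_first.setdefault(z, set()).add(y)
    return {(x, y) for x, z in R for y in by_first.get(z, ())}

def eval_pq(blue, red):
    """Naive fixed point of the P/Q program; returns (P, Q)."""
    blue, red = set(blue), set(red)
    P, Q = set(blue), set()               # P(X,Y) :- Blue(X,Y)
    while True:
        new_Q = compose(red, P) - Q       # Q(X,Y) :- Red(X,Z) & P(Z,Y)
        new_P = compose(blue, Q | new_Q) - P   # P(X,Y) :- Blue(X,Z) & Q(Z,Y)
        if not new_P and not new_Q:
            return P, Q
        P |= new_P
        Q |= new_Q
```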



Reach(X) :- Source(X)
Reach(X) :- Reach(Y) & Arc(Y,X)
Takes linear rounds as stated.
Can compute nonlinear TC to get Reach in
O(log n) rounds.
But, then you compute O(n^2) facts instead of
O(n) facts on an n-node graph.
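
The trade-off can be made concrete with a small comparison. The function names `reach` and `reach_via_tc` are hypothetical, and a round is again counted only when it adds new facts.

```python
def reach(sources, arcs):
    """Unary linear recursion; returns (Reach, rounds)."""
    succ = {}
    for y, x in arcs:
        succ.setdefault(y, set()).add(x)
    reached = set(sources)                # Reach(X) :- Source(X)
    frontier = set(sources)
    rounds = 0
    while True:
        # Reach(X) :- Reach(Y) & Arc(Y,X), applied to the newest facts only
        new = {x for y in frontier for x in succ.get(y, ())} - reached
        if not new:
            return reached, rounds
        reached |= new
        frontier = new
        rounds += 1

def reach_via_tc(sources, arcs):
    """Nonlinear TC first, then select; returns (Reach, rounds, Path facts)."""
    sources = set(sources)
    path = set(arcs)
    rounds = 0
    while True:
        by_first = {}
        for z, y in path:
            by_first.setdefault(z, set()).add(y)
        new = {(x, y) for x, z in path for y in by_first.get(z, ())} - path
        if not new:
            break
        path |= new
        rounds += 1
    reached = sources | {y for x, y in path if x in sources}
    return reached, rounds, len(path)

chain = [(i, i + 1) for i in range(8)]    # 9-node chain, source is node 0
```

On this chain, `reach({0}, chain)` needs 8 rounds and holds 9 facts, while `reach_via_tc({0}, chain)` needs 3 rounds but materializes 36 Path facts.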

Theorem: If you compute Reach using only
unary recursive predicates, then it must take
Ω(n) rounds on a graph of n nodes.
 Proof uses the ideas of Afrati, Cosmadakis, and
Yannakakis from a generation ago.


Key problems are “endgame” and
nonblocking nature of recursive tasks.
In some applications, endgame problem can
be handled by using a nonlinear recursion
that requires O(log n) rounds and has the
unique-decomposition property.
1. How do you best support fault tolerance when tasks are nonblocking?
2. How do you manage tasks when the endgame problem cannot be avoided?
3. When can you replace linear recursion with nonlinear recursion requiring many fewer rounds, (roughly) the same communication cost, and (roughly) the same number of facts discovered?