Minimal MapReduce Algorithms

outline
• INTRODUCTION
• PRELIMINARY AND RELATED WORK
• SORTING
• BASIC MINIMAL ALGORITHMS IN DATABASES
• SLIDING AGGREGATION
• EXPERIMENTS
• CONCLUSIONS
introduction
• Motivation
Although these principles have guided
the design of MapReduce algorithms,
the previous practices have mostly been
on a best-effort basis, paying relatively
less attention to enforcing serious
constraints on different performance
metrics.
introduction
• Minimal MapReduce Algorithms
Minimum footprint
Bounded net-traffic
Constant round
Optimal computation
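The four properties above are usually stated as asymptotic bounds. The block below is a compact restatement, not a quotation from the slides. The symbols are assumptions introduced here: n is the input size, t the number of machines, m = n/t the per-machine share, and T_seq the running time of a sequential algorithm for the same problem.

```latex
% Assumed notation (not on the slide): n = input size, t = number of machines,
% m = n/t, T_{seq} = sequential running time for the same problem.
\begin{align*}
\text{Minimum footprint:}   &\quad \text{each machine uses } O(m) \text{ space at all times} \\
\text{Bounded net-traffic:} &\quad \text{each machine sends and receives } O(m) \text{ words per round} \\
\text{Constant round:}      &\quad \text{the algorithm terminates after } O(1) \text{ rounds} \\
\text{Optimal computation:} &\quad \text{each machine performs } O(T_{seq}/t) \text{ computation in total}
\end{align*}
```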
introduction
• Contributions
The core of this work comprises neat
minimal algorithms for two problems:
Sorting
Sliding Aggregation
introduction
Sorting
Sliding Aggregation
related work
MapReduce
TeraSort
Algorithms on MapReduce
Relevance to Minimal Algorithms
related work-MR
Statelessness for Fault Tolerance
Some MapReduce implementations (e.g.,
Hadoop) place the requirement that, at
the end of a round, each machine should
send all the data in its storage to a
distributed file system.
related work- TS
What's TeraSort?
sorting
-TS

sorting
Define S_i = S ∩ (b_{i−1}, b_i], for 1 ≤ i ≤ t. In
Round 2, all the objects in S_i are gathered
by M_i, which sorts them in the reduce
phase. For TeraSort to be minimal, it must
hold that:
P1. s = O(m).
P2. |S_i| = O(m) for all 1 ≤ i ≤ t.
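The two rounds can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the slides' implementation: the function name `terasort_sim`, the list-of-lists representation of machines, and the sampling probability rho = ln(n·t)/m are all choices made here for illustration. The returned `s` is the number of samples drawn in Round 1, which corresponds to the quantity bounded by P1.

```python
import bisect
import math
import random


def terasort_sim(machines, seed=0):
    """Simulate a two-round TeraSort over per-machine lists of numbers.

    Hypothetical helper for illustration only. Requires at least one object.
    """
    rng = random.Random(seed)
    t = len(machines)
    n = sum(len(part) for part in machines)
    m = n / t
    # Assumed sampling probability; capped at 1 for very small inputs.
    rho = min(1.0, math.log(n * t) / m)

    # Round 1: every machine samples its objects; one machine sorts the samples
    # and picks boundaries b_1 < ... < b_{t-1} at evenly spaced sample ranks.
    samples = sorted(x for part in machines for x in part if rng.random() < rho)
    s = len(samples)
    step = math.ceil(s / t) if s else 1
    boundaries = [samples[i * step - 1] for i in range(1, t) if i * step - 1 < s]

    # Round 2: an object x goes to the bucket S_i with b_{i-1} < x <= b_i.
    # bisect_left counts the boundaries strictly below x, which is that index.
    buckets = [[] for _ in range(t)]
    for part in machines:
        for x in part:
            buckets[bisect.bisect_left(boundaries, x)].append(x)

    # Each machine sorts its own bucket; concatenating buckets gives sorted S.
    return [sorted(b) for b in buckets], s
```

Inspecting `len(b)` for each returned bucket gives a quick empirical feel for P2: with enough samples the bucket sizes stay within a small constant factor of m.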

sorting
Pr
Discussion
Minimality
sorting
Removing the Broadcast Assumption
(by changing round 1)
in databases
Ranking & Skyline
Group by
Semi-Join
in databases
example
sliding aggregation
The window sum of o equals:

win-sum(o) = Σ_{o′ ∈ window(o)} w(o′)
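A sequential reference for the window sum can be sketched with prefix sums. The slide does not define window(o). This sketch assumes it is the set of the l largest objects not exceeding o (o included), that objects are distinct and comparable, and that `weight` returns w(o). The name `window_sums` is hypothetical.

```python
from itertools import accumulate


def window_sums(objects, weight, l):
    """Return {o: win-sum(o)} for distinct objects, using prefix sums.

    Assumption: window(o) holds the l largest objects that do not exceed o.
    """
    ordered = sorted(objects)
    # prefix[k] is the total weight of the k smallest objects.
    prefix = [0] + list(accumulate(weight(o) for o in ordered))
    # The window of the object at rank i covers ranks max(0, i + 1 - l) .. i.
    return {
        o: prefix[i + 1] - prefix[max(0, i + 1 - l)]
        for i, o in enumerate(ordered)
    }


# Example: with w(o) = o and l = 2, window(4) = {2, 4}, so win-sum(4) = 6.
print(window_sums([5, 1, 4, 2], lambda o: o, 2))
```

After sorting, each window sum is a difference of two prefix sums, which is what makes the problem a natural follow-up to a balanced distributed sort.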
sliding aggregation
Sorting with Perfect Balance
sliding aggregation
Sliding Aggregate Computation
experiments
-sorting
experiments
-sorting
experiments
-skyline