outline
• INTRODUCTION
• PRELIMINARY AND RELATED WORK
• SORTING
• BASIC MINIMAL ALGORITHMS IN DATABASES
• SLIDING AGGREGATION
• EXPERIMENTS
• CONCLUSIONS

introduction
• Motivation
Although these principles have guided the design of MapReduce algorithms, previous practice has mostly been on a best-effort basis, paying relatively little attention to enforcing strict constraints on the different performance metrics.

introduction
• Minimal MapReduce Algorithms
  • Minimum footprint
  • Bounded net-traffic
  • Constant round
  • Optimal computation

introduction
• Contributions
The core of this work comprises neat minimal algorithms for two problems:
  • Sorting
  • Sliding Aggregation

introduction
• Sorting
• Sliding Aggregation

related work
• MapReduce
• TeraSort
• Algorithms on MapReduce
• Relevance to Minimal Algorithms

related work - MR
• Statelessness for Fault Tolerance
Some MapReduce implementations (e.g., Hadoop) require that, at the end of a round, each machine send all the data in its storage to a distributed file system.

related work - TS
• What's TeraSort?

sorting - TS

sorting
Define $S_i = S \cap (b_{i-1}, b_i]$, for $1 \le i \le t$. In Round 2, all the objects in $S_i$ are gathered by $M_i$, which sorts them in the reduce phase.
For TeraSort to be minimal, the following must hold:
  P1. $s = O(m)$.
  P2. $|S_i| = O(m)$ for all $1 \le i \le t$.

sorting
• Pr
• Discussion
• Minimality

sorting
• Removing the Broadcast Assumption (by changing Round 1)

in databases
• Ranking & Skyline
• Group by
• Semi-Join

in databases
• example

sliding aggregation
The window sum of $o$ equals:
$\text{win-sum}(o) = \sum_{o' \in \text{window}(o)} w(o')$

sliding aggregation
• Sorting with Perfect Balance

sliding aggregation
• Sliding Aggregate Computation

experiments - sorting

experiments - skyline
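
Returning to the TeraSort slide: the partition rule S_i = S ∩ (b_{i−1}, b_i] and the Round 2 sort can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation. The function name `terasort_sketch`, the parameter `sample_prob`, and the way Round 1 picks boundaries (independent sampling, then evenly spaced objects from the sorted sample) are hypothetical choices made for illustration; the slides state only the partition definition and conditions P1 and P2.

```python
import bisect
import random


def terasort_sketch(S, t, sample_prob, seed=0):
    """Simulate both TeraSort rounds in one process (no real cluster)."""
    rng = random.Random(seed)

    # Round 1 (assumed): sample each object independently, then sort the sample.
    sample = sorted(x for x in S if rng.random() < sample_prob)

    # Pick t-1 boundary objects b_1 <= ... <= b_{t-1}, evenly spaced in the sample.
    # b_0 = -infinity and b_t = +infinity are implicit.
    boundaries = (
        [sample[(i * len(sample)) // t] for i in range(1, t)] if sample else []
    )

    # Round 2: object x goes to machine M_i where b_{i-1} < x <= b_i.
    # bisect_left counts the boundaries strictly below x, i.e. the 0-based slot i-1.
    partitions = [[] for _ in range(t)]
    for x in S:
        partitions[bisect.bisect_left(boundaries, x)].append(x)

    # Reduce phase: each machine sorts its own S_i.
    for part in partitions:
        part.sort()
    return boundaries, partitions


if __name__ == "__main__":
    data = list(range(10_000))
    random.Random(1).shuffle(data)
    boundaries, parts = terasort_sketch(data, t=8, sample_prob=0.05)
    # P2 asks that every |S_i| stays O(m); inspect the sizes to see the balance.
    print("partition sizes:", [len(p) for p in parts])
    merged = [x for p in parts for x in p]
    print("globally sorted:", merged == sorted(data))
```

Concatenating the partitions in machine order yields the sorted output regardless of how the boundaries were chosen; the sample only affects how evenly the objects are spread, which is what P1 and P2 constrain.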