A Model of Computation for MapReduce
Karloff, Suri and Vassilvitskii (SODA'10)
Presented by Ning Xie

Why MapReduce
• Tera- and petabyte data sets (search engines, internet traffic, bioinformatics, etc.)
• Need parallel computing
• Requirements: easy to program, reliable, distributed

What is MapReduce
• A framework for parallel computing originally developed at Google (before '04)
• Widely adopted; it has become a standard for large-scale data analysis
• Hadoop (the open-source version) is being used by Yahoo, Facebook, Adobe, IBM, Amazon, and many institutions in academia

What is MapReduce (cont.)
Three-stage operation:
• Map stage: a mapper operates on a single pair <key, value> and outputs any number of new pairs <key', value'>
• Shuffle stage: all values associated with an individual key are sent to a single machine (done by the system)
• Reduce stage: a reducer operates on all the values of one key and outputs a multiset of <key, value> pairs

What is MapReduce (cont.)
• The map operation is stateless, hence parallel
• The shuffle stage is done automatically by the underlying system
• The reduce stage can start only when all map operations are done (an interleaving of sequential and parallel phases)

An example: the k-th frequency moment of a large data set
• Input: x ∈ Σ^n, where Σ is a finite set of symbols
• Let f(σ) be the frequency of symbol σ; note that Σ_σ f(σ) = n
• We want to compute Σ_σ f(σ)^k

An example (cont.)
• Input to each mapper: <i, x_i>
  M_1(<i, x_i>) = <x_i, i>   (i is the index)
• Input to each reducer: <x_i, {i_1, i_2, …, i_m}>
  R_1(<x_i, {i_1, i_2, …, i_m}>) = <x_i, m^k>
• Each mapper: M_2(<x_i, v>) = <$, v>
• A single reducer: R_2(<$, {v_1, …, v_l}>) = <$, Σ_i v_i>

Formal Definitions
• A MapReduce program consists of a sequence <M_1, R_1, M_2, R_2, …, M_l, R_l> of mappers and reducers
• The input is U_0, a multiset of <key, value> pairs

Formal Definitions (cont.)
Execution of the program: for r = 1, 2, …, l
1. Feed each pair <k, v> in U_{r-1} to the mapper M_r; let U'_r be the multiset of pairs it outputs
2. For each key k, construct the multiset V_{k,r} of values v such that <k, v> ∈ U'_r
3. For each key k, feed k and some arbitrary permutation of V_{k,r} to a separate instance of the reducer R_r; let U_r be the multiset of <key, value> pairs generated by R_r

The MapReduce Class (MRC)
On an input {<key, value>} of total size n:
• Memory: each mapper/reducer uses O(n^{1-ε}) space
• Machines: there are Θ(n^{1-ε}) machines available
• Time: each machine runs in time polynomial in n, not merely polynomial in the length of the input it receives
• Mappers and reducers may be randomized
• Rounds: shuffling is expensive, so the number of rounds is limited; MRC^i is the class with O(log^i n) rounds
• DMRC: the deterministic variant

Comparing MRC with PRAM
• The most relevant classical model of parallel computation is the PRAM (Parallel Random Access Machine); the corresponding complexity class is NC
• Easy relation: DMRC ⊆ P
• Lemma: if NC ≠ P, then DMRC ⊄ NC
• Open question: determine whether DMRC = P, i.e., whether the containment DMRC ⊆ P is strict

Comparing with PRAM (cont.)
Simulation lemma: any CREW (concurrent-read, exclusive-write) PRAM algorithm that uses O(n^{2-2ε}) total memory and O(n^{2-2ε}) processors and runs in time t(n) can be simulated by a DMRC algorithm that runs in O(t(n)) rounds

Example: Finding an MST
Problem: find a minimum spanning tree of a dense graph
The algorithm:
• Randomly partition the vertices into k parts
• For each pair of parts, find the MST of the subgraph induced by the union of the two parts
• Take the union of all the edges in these MSTs; call the resulting graph H
• Compute an MST of H

Finding an MST (cont.)
The algorithm is easy to parallelize:
• The MST of each subgraph can be computed in parallel
Why does it work?
• Theorem: an MST of H is an MST of G
• Proof idea: no relevant edge is discarded when the input graph G is sparsified
(A short sequential Python sketch of this partition-and-merge scheme appears below.)
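The following is a minimal sequential Python sketch of the partition-and-merge scheme above, meant only to make the steps concrete; it is not the authors' implementation. The graph representation (a vertex set plus a list of (weight, u, v) tuples), the helper kruskal_mst, and the name mst_by_partitioning are illustrative choices. In an actual MapReduce execution, each pair of parts would be handled by its own reducer; here the pairs are simply processed in a loop.

import random
from itertools import combinations

def kruskal_mst(nodes, edges):
    # Kruskal's algorithm with a small union-find structure; returns a
    # minimum spanning forest as a list of (weight, u, v) edges.
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def mst_by_partitioning(nodes, edges, k):
    # Step 1: randomly partition the vertices into k parts.
    parts = [set() for _ in range(k)]
    for v in nodes:
        parts[random.randrange(k)].add(v)

    # Step 2: for each pair of parts, compute a minimum spanning forest of
    # the subgraph induced by their union.  In MapReduce, each pair would be
    # shipped to its own reducer; sequentially, we just loop over the pairs.
    h_edges = set()
    for part_i, part_j in combinations(parts, 2):
        union = part_i | part_j
        sub = [(w, u, v) for (w, u, v) in edges if u in union and v in union]
        h_edges.update(kruskal_mst(union, sub))

    # Step 3: H is the union of these forests; by the theorem on the slide
    # above, an MST of H is an MST of the original graph.
    return kruskal_mst(nodes, h_edges)

On a small connected weighted graph, mst_by_partitioning(nodes, edges, k) with any k ≥ 2 returns a spanning tree of the same total weight as running Kruskal on the whole edge set; only the per-pair computations would need to be distributed across reducers to match the MRC analysis on the next slide.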
Finding an MST (cont.)
Why is the algorithm in MRC?
• Let N = |V| and m = |E| = N^{1+c}
• The input size n is roughly m, so N ≈ n^{1/(1+c)}
• Pick k = N^{c/2}
• Lemma: with high probability, each pairwise subgraph has size Õ(N^{1+c/2})
• Hence the input to any reducer has size O(n^{1-ε}) for some ε > 0
• The size of H is also O(n^{1-ε})

Functions Lemma
A very useful building block for designing MapReduce algorithms
Definition (MRC-parallelizable function): let S be a finite set. A function f on S is MRC-parallelizable if there are functions g and h such that:
• for any partition S = T_1 ∪ T_2 ∪ … ∪ T_k, f can be written as f(S) = h(g(T_1), g(T_2), …, g(T_k));
• g and h can each be represented in O(log n) bits;
• g and h can be computed in time polynomial in |S|, and every possible output of g can be expressed in O(log n) bits.

Functions Lemma (cont.)
Lemma (Functions Lemma): let U be a universe of size n and let S = {S_1, …, S_k} be a collection of subsets of U, where k ≤ n^{2-3ε} and Σ_{i=1}^{k} |S_i| ≤ n^{2-2ε}. Let F = {f_1, …, f_k} be a collection of MRC-parallelizable functions. Then the outputs f_1(S_1), …, f_k(S_k) can be computed using O(n^{1-ε}) reducers, each with O(n^{1-ε}) space.

Functions Lemma (cont.)
The power of the lemma:
• The algorithm designer can focus solely on the structure of the subproblems and their inputs
• Distributing the input across reducers is handled by the lemma (an existence result)
The proof of the lemma is not easy:
• It uses universal hashing, Chernoff bounds, etc.

Application of the Functions Lemma: s-t connectivity
Problem: given a graph G and two nodes s and t, are they connected in G?
• Dense graphs: easy, by powering the adjacency matrix
• Sparse graphs?

An O(log n)-round MapReduce algorithm for s-t connectivity
Initially every node is active
For i = 1, 2, …, O(log n) do:
• Each active node becomes a leader with probability 1/2
• Each non-leader active node u looks for a node v adjacent to u's current connected component
• If such a v exists, u becomes passive and every node in u's connected component is relabeled with v's label
Output TRUE if s and t have the same label, FALSE otherwise
(A toy Python simulation of this leader-election idea is sketched after the last slide.)

Conclusions
• A rigorous model for MapReduce
• Very loose requirements on the hardware
• A call for more research in this direction

THANK YOU!
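A toy sequential Python simulation of the leader-election idea from the s-t connectivity slide above. It is only a sketch under simplifying assumptions, not the algorithm from the paper: component bookkeeping is done directly in a dictionary (the part that the Functions Lemma machinery would handle across reducers), coins are flipped per component rather than per active node, non-leader components merge only into adjacent leader components, and the number of phases is a heuristic multiple of log n.

import random
from collections import defaultdict

def st_connected(nodes, edges, s, t, phases=None):
    # Every node starts in its own component, labeled by itself.
    label = {v: v for v in nodes}
    if phases is None:
        phases = 4 * max(len(nodes), 2).bit_length()  # heuristic O(log n)

    for _ in range(phases):
        # Group the nodes by their current component label.
        members = defaultdict(set)
        for v in nodes:
            members[label[v]].add(v)

        # Each component becomes a leader with probability 1/2.
        is_leader = {c: random.random() < 0.5 for c in members}

        # Every non-leader component that touches a leader component picks
        # one such neighbor and merges into it: all of its nodes take the
        # leader component's label.
        target = {}
        for u, v in edges:
            cu, cv = label[u], label[v]
            if cu == cv:
                continue
            if not is_leader[cu] and is_leader[cv]:
                target[cu] = cv
            if not is_leader[cv] and is_leader[cu]:
                target[cv] = cu
        for c, new_c in target.items():
            for w in members[c]:
                label[w] = new_c

    return label[s] == label[t]

For example, st_connected({1, 2, 3, 4}, [(1, 2), (2, 3)], 1, 3) should return True (with high probability over the coin flips, since connected components collapse to a single label after O(log n) phases), while st_connected({1, 2, 3, 4}, [(1, 2), (2, 3)], 1, 4) always returns False because node 4 is isolated.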