Map-Reduce and Datalog Implementation
• Distributed File Systems
• Map-Reduce
• Join Implementations

Humongous Data Problems
We are seeing new applications for very large data operations:
• Web operations, e.g., PageRank.
• Social-network data.
• Collaborative filtering of commercial data.
Result: new infrastructure.
• Distributed file systems.
• Map-reduce/Hadoop/Hive/Pig,…

Role of Datalog
Many operations are remarkably simple. Example: suggest new friends in a social network by looking for violations of transitivity:
suggest(X,Y) :- friend(X,Z) & friend(Z,Y) & NOT friend(X,Y)

Scale of the Problem
Facebook has 250 million subscribers, each with about 300 friends.
The self-join of friend with itself could therefore have 250 million × 300 × 300 = 22.5 trillion tuples.
But because of “locality,” the size would be smaller by a factor of perhaps 10–100.

Distributed File Systems
To deal with computations of this size, companies use large collections of commodity servers, both for storage and for computing (often the same servers).
Files are stored in chunks, typically 64 MB.
Chunks are replicated, typically 3 times.

Cluster Computing
Racks of compute nodes are interconnected, e.g., by gigabit Ethernet.
New element: computations involve so much work that node failures are common.
Map-reduce (Hadoop) is a framework for dealing effectively with node failure, as well as for simplifying certain calculations.

Map-Reduce
You write two functions, Map and Reduce.
Several Map tasks and Reduce tasks implement these functions.
Each Map task gets one or more chunks of input data from a distributed file system.

Map-Reduce – (2)
Map tasks turn their input into a list of key-value pairs. But “keys” are not unique.
A master controller assigns each key, and all output from Map tasks with that key, to one of the Reduce tasks.
Reduce tasks apply some operation to the values associated with one key.

Graph of Map and Reduce Tasks
[Figure: input flows into several Map tasks; Map output flows to several Reduce tasks, which produce the output.]

Example: Join by Map-Reduce
Answer(X,Y) :- r(X,Z) & s(Z,Y)
Map takes each tuple from r, say r(x,z), and produces the key-value pair [z, (r,x)].
From tuple s(z,y), Map produces the key-value pair [z, (s,y)].
Thus, all tuples r(x,z) and s(z,y) with the same z-value go to the same Reduce task.

Join by Map-Reduce – (2)
The Reduce tasks perform a standard join on all the r- and s-tuples they receive.
The output is the union of the results of all Reduce tasks.

Coping With Failures
Because
1. every Map and Reduce task receives all its input at the beginning, and
2. every Map and Reduce task finishes by handing its complete output to the master controller,
any task at a failed node can be restarted without affecting any other task.

Multiway Join Via Map-Reduce
From Afrati/Ullman in EDBT-2010.
Useful for Datalog evaluation because:
• Bodies often have more than two subgoals.
• Seminaive evaluation can involve a complex expression with many relations and their increments (next talk).
The normal procedure is to take a cascade of two-way joins.

Multiway Join – (2)
Sometimes it is more efficient to replicate tuples to several Reduce tasks:
1. When relations have large fan-out. Examples: “friends” or links on the Web.
2. Star joins: the join of a large fact table with smaller dimension tables.
Intuition: replication wins when the intermediate joins of a two-way cascade would be large.

Multiway Join – (3)
Assume k Reduce tasks.
Certain variables get shares of a hash function that maps to k buckets. The product of the shares = k.
If variable X has share x, then each X-value is hashed to one of x hash keys.
The hash key of a Reduce task = the vector of hash values, one for each variable with a share.

Example: Multiway Join
Answer(W,X,Y,Z) :- r(W,X) & s(X,Y) & t(Y,Z)
Only X and Y get a share, say x and y, with xy = k.
Theorem: never give a share to a dominated variable.
• Dominated variable = a variable that appears only where some other variable also appears. Here W appears only alongside X (in r), and Z only alongside Y (in t).
Tuple s(a,b) goes only to Reduce task [h(a), h’(b)].
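To make the share scheme concrete, here is a minimal in-memory simulation of this three-way join. It is a sketch under stated assumptions, not the EDBT-2010 implementation: Python's built-in `hash` reduced modulo the share stands in for h (on X) and h’ (on Y), a dictionary stands in for the shuffle that the master controller performs, and the names `route` and `multiway_join` are hypothetical. Each s-tuple reaches exactly one Reduce task, while r- and t-tuples are replicated along the coordinate whose variable they lack.

```python
from collections import defaultdict
from itertools import product

# Sketch only (assumption): hash(value) % share stands in for h on X and h' on Y.


def route(relation, tup, x_share, y_share):
    """Map step: every Reduce key [h(X), h'(Y)] that must receive this tuple."""
    if relation == "s":  # s(a,b): X = a and Y = b are both known, so one task
        a, b = tup
        return [(hash(a) % x_share, hash(b) % y_share)]
    if relation == "r":  # r(a,b): only X = b is known, so replicate over y buckets
        _, b = tup
        return [(hash(b) % x_share, n) for n in range(y_share)]
    if relation == "t":  # t(a,b): only Y = a is known, so replicate over x buckets
        a, _ = tup
        return [(m, hash(a) % y_share) for m in range(x_share)]
    raise ValueError(f"unknown relation {relation!r}")


def multiway_join(r, s, t, x_share, y_share):
    """Answer(W,X,Y,Z) :- r(W,X) & s(X,Y) & t(Y,Z), using k = x_share * y_share tasks."""
    # Shuffle: group tuples by Reduce key, remembering which relation each came from.
    tasks = defaultdict(lambda: {"r": [], "s": [], "t": []})
    for name, rel in (("r", r), ("s", s), ("t", t)):
        for tup in rel:
            for key in route(name, tup, x_share, y_share):
                tasks[key][name].append(tup)
    # Reduce: each task joins only the tuples it received; the answer is the union.
    answer = set()
    for got in tasks.values():
        for (w, x1), (x2, y1), (y2, z) in product(got["r"], got["s"], got["t"]):
            if x1 == x2 and y1 == y2:
                answer.add((w, x1, y1, z))
    return answer


r = {(1, 10)}
s = {(10, 100)}
t = {(100, "a")}
print(multiway_join(r, s, t, x_share=2, y_share=3))  # {(1, 10, 100, 'a')}
```

Because every answer needs its s-tuple, and each s-tuple reaches only one task, each answer tuple (w,x,y,z) is produced at exactly one Reduce task, namely [h(x), h’(y)].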
Example: Multiway Join – (2)
Answer(W,X,Y,Z) :- r(W,X) & s(X,Y) & t(Y,Z)
However, tuple r(a,b) must go to all Reduce tasks [h(b), n], where n is any of the y different hash values.
Similarly, tuple t(a,b) must go to all Reduce tasks [m, h’(a)], where m is any of the x different hash values.

Example: Multiway Join – (3)
Answer(W,X,Y,Z) :- r(W,X) & s(X,Y) & t(Y,Z)
Each s-tuple is sent once, each r-tuple y times, and each t-tuple x times, so the number of tuples transmitted is |s| + y|r| + x|t|. To minimize it subject to xy = k, pick:
$x = \sqrt{k|r|/|t|}$, $y = \sqrt{k|t|/|r|}$
Intuition: at this choice the costs of distributing the tuples of r and of t are the same, since $y|r| = x|t| = \sqrt{k|r||t|}$.

Summary of Afrati/Ullman EDBT-2010
It is possible to find the optimum shares for the variables of any join.
Usually, the process is a straightforward Lagrangian analysis.
In pathological cases, an exponential search appears necessary.
Constraining the shares to positive integers and adjusting them so that their product is exactly k adds complexity.
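The share computation for the chain join can be checked numerically. The sketch below assumes illustrative relation sizes (|r| = 1,000,000, |s| = 500,000, |t| = 4,000,000, k = 100) that do not come from the talk, and the helper names are hypothetical. It evaluates the cost |s| + y|r| + x|t|, the real-valued Lagrangian optimum, and a brute-force search over positive integer shares whose product is exactly k. That last step is one simple way to handle the integrality constraint, not necessarily the method of the paper.

```python
import math


def communication_cost(r_size, s_size, t_size, x, y):
    """Tuples shipped: each s-tuple once, each r-tuple y times, each t-tuple x times."""
    return s_size + y * r_size + x * t_size


def optimal_real_shares(k, r_size, t_size):
    """Lagrangian optimum of y|r| + x|t| subject to x*y = k; shares may be fractional."""
    x = math.sqrt(k * r_size / t_size)
    y = math.sqrt(k * t_size / r_size)
    return x, y


def best_integer_shares(k, r_size, s_size, t_size):
    """Hypothetical helper: brute force over positive integers x, y with x*y == k (k >= 1)."""
    pairs = [(x, k // x) for x in range(1, k + 1) if k % x == 0]
    return min(pairs, key=lambda p: communication_cost(r_size, s_size, t_size, *p))


# Illustrative sizes only (assumption), chosen so the real optimum is integral.
k, r_size, s_size, t_size = 100, 1_000_000, 500_000, 4_000_000
x, y = optimal_real_shares(k, r_size, t_size)
print(x, y)                    # 5.0 20.0
print(y * r_size, x * t_size)  # 20000000.0 20000000.0 (equal costs for r and t)
print(best_integer_shares(k, r_size, s_size, t_size))  # (5, 20)
```

With k = 12 and |r| = |t|, for instance, the real optimum x = y = √12 ≈ 3.46 is not an integer, and the brute-force search settles on a divisor pair such as (3, 4). This is the kind of adjustment the summary refers to.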