CS 347: Parallel and Distributed Data Management
Notes 02: Distributed DB Design
Hector Garcia-Molina

Distributed DB Design (Chapter 5, Ozsu & Valduriez)
• Top-down approach:
  - have DB…
  - how to split and allocate to the sites
• Multi-DBs (or bottom-up): no design issues!

Two issues in DDB design:
• Fragmentation
• Allocation
Note: the issues are not independent, but we will cover them separately.

Example
Employee relation E(#, name, loc, sal, …)
40% of queries:
  Qa: select * from E where loc = Sa and …
40% of queries:
  Qb: select * from E where loc = Sb and …
Motivation: two sites, Sa and Sb, with Qa arriving at Sa and Qb at Sb.

It does not take a rocket scientist to figure out the fragmentation...

E:
#   NM     Loc  Sal
5   Joe    Sa   10
7   Sally  Sb   25
8   Tom    Sa   15
..

F (fragments of E):
At Sa:
#   NM     Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15
..
At Sb:
#   NM     Loc  Sal
7   Sally  Sb   25
..

F = {F1, F2}, where $F_1 = \sigma_{loc=Sa}(E)$ and $F_2 = \sigma_{loc=Sb}(E)$.
This is called primary horizontal fragmentation.

Fragmentation
• Horizontal
  - Primary: depends on local attributes
  - Derived: depends on a foreign relation
• Vertical
Fragmentation is also called sharding.

Three common horizontal partitioning techniques
• Round robin
• Hash partitioning
• Range partitioning

Round robin
Tuples t1, t2, t3, t4, t5, … of R are dealt out in turn: D0 gets t1, t4, …; D1 gets t2, t5, …; D2 gets t3, …
• Evenly distributes data
• Good for scanning the full relation
• Not good for point or range queries

Hash partitioning
Tuples of R are placed by hashing their key: h(k1) = 2, h(k2) = 0, h(k3) = 0, h(k4) = 1, …
Resulting placement: D0 gets t2, t3; D1 gets t4; D2 gets t1.
• Good for point queries on the key; also for joins
• Not good for range queries, or for point queries not on the key
• If the hash function is good, data is evenly distributed

Range partitioning
R: t1: A=5, t2: A=8, t3: A=2, t4: A=3, …
Partitioning vector on A: V0 = 4, V1 = 7
D0 gets t3, t4 (A below 4); D1 gets t1 (A between 4 and 7); D2 gets t2 (A above 7).
• Good for some range queries on A
• Need to select a good vector; otherwise the partitions are unbalanced:
  - data skew
  - execution skew

Which are good fragmentations?
Example: F = {F1, F2}, where $F_1 = \sigma_{sal<10}(E)$ and $F_2 = \sigma_{sal>20}(E)$.
Problem: some tuples are lost! (Tuples with $10 \le sal \le 20$ are in neither fragment.)

Second example: F = {F3, F4}, where $F_3 = \sigma_{sal<10}(E)$ and $F_4 = \sigma_{sal>5}(E)$.
Tuples with 5 < sal < 10 are duplicated...

Prefer to deal with replication explicitly.
Example: F = {F5, F6, F7}, where $F_5 = \sigma_{sal \le 5}(E)$, $F_6 = \sigma_{5 < sal < 10}(E)$, $F_7 = \sigma_{sal \ge 10}(E)$.
Then replicate F6 if convenient (part of the allocation problem).

Desired properties for horizontal fragmentation
R, F = {F1, F2, …}
(1) Completeness: $\forall t \in R,\ \exists F_i \in F$ such that $t \in F_i$
(2) Disjointness: $\forall t \in F_i,\ \nexists F_j$ such that $t \in F_j$, $i \ne j$, $F_i, F_j \in F$
(3) Reconstruction: ignore

How do we get completeness and disjointness?
(1) Check it "manually"! E.g., $F_1 = \sigma_{sal<10}(E)$; $F_2 = \sigma_{sal \ge 10}(E)$
(2) "Automatically" generate fragments with these properties: desired simple predicates → fragments.

Example of generation
• Say queries use the predicates A<10, A>5, Loc=SA, Loc=SB
• Next:
  - generate "minterm" predicates
  - eliminate useless ones

Minterm predicates (part I): all contain A<10; combined with A>5 this means 5 < A < 10, combined with ¬(A>5) it means A ≤ 5.
(1) A<10 ∧ A>5 ∧ Loc=SA ∧ Loc=SB
(2) A<10 ∧ A>5 ∧ Loc=SA ∧ ¬(Loc=SB)
(3) A<10 ∧ A>5 ∧ ¬(Loc=SA) ∧ Loc=SB
(4) A<10 ∧ A>5 ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)
(5) A<10 ∧ ¬(A>5) ∧ Loc=SA ∧ Loc=SB
(6) A<10 ∧ ¬(A>5) ∧ Loc=SA ∧ ¬(Loc=SB)
(7) A<10 ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ Loc=SB
(8) A<10 ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)

Minterm predicates (part II): all contain ¬(A<10); combined with A>5 this means A ≥ 10, while ¬(A<10) ∧ ¬(A>5) would need A ≥ 10 and A ≤ 5 at once.
(9) ¬(A<10) ∧ A>5 ∧ Loc=SA ∧ Loc=SB
(10) ¬(A<10) ∧ A>5 ∧ Loc=SA ∧ ¬(Loc=SB)
(11) ¬(A<10) ∧ A>5 ∧ ¬(Loc=SA) ∧ Loc=SB
(12) ¬(A<10) ∧ A>5 ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)
(13) ¬(A<10) ∧ ¬(A>5) ∧ Loc=SA ∧ Loc=SB
(14) ¬(A<10) ∧ ¬(A>5) ∧ Loc=SA ∧ ¬(Loc=SB)
(15) ¬(A<10) ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ Loc=SB
(16) ¬(A<10) ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)

Useless minterms are those no tuple can satisfy: (1), (5), (9), (13) require Loc=SA ∧ Loc=SB; (13) to (16) require A ≥ 10 ∧ A ≤ 5; and, if every tuple has Loc = SA or Loc = SB, (4), (8), (12), (16) are empty as well.

Final fragments:
F2: 5 < A < 10 ∧ Loc=SA
F3: 5 < A < 10 ∧ Loc=SB
F6: A ≤ 5 ∧ Loc=SA
F7: A ≤ 5 ∧ Loc=SB
F10: A ≥ 10 ∧ Loc=SA
F11: A ≥ 10 ∧ Loc=SB
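The generate-and-eliminate procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the predicate dictionary and helper names are ours, a tuple is reduced to its (A, Loc) values, and instead of reasoning symbolically about which minterms are contradictory, it keeps the minterms satisfied by at least one value in a small sample domain.

```python
from itertools import product

# Simple predicates from the example, as functions of a tuple's (A, Loc) values.
# The dictionary and function names are illustrative only.
SIMPLE_PREDICATES = {
    "A<10": lambda a, loc: a < 10,
    "A>5": lambda a, loc: a > 5,
    "Loc=SA": lambda a, loc: loc == "SA",
    "Loc=SB": lambda a, loc: loc == "SB",
}


def minterms(predicates):
    """Yield every minterm as a tuple of (predicate name, negated?) pairs.

    itertools.product varies the last predicate fastest, starting from
    "not negated", so the k-th minterm yielded is minterm (k) in the notes.
    """
    names = list(predicates)
    for signs in product([False, True], repeat=len(names)):
        yield tuple(zip(names, signs))


def satisfies(minterm, predicates, a, loc):
    """True if the tuple (a, loc) satisfies every (possibly negated) conjunct."""
    return all(predicates[name](a, loc) != negated for name, negated in minterm)


def useful_minterms(predicates, a_values, loc_values):
    """Keep (number, minterm) pairs satisfied by at least one sample tuple.

    Testing a finite sample domain stands in for a real satisfiability check;
    the sample must include values on both sides of every constant (5 and 10).
    """
    kept = []
    for number, m in enumerate(minterms(predicates), start=1):
        if any(satisfies(m, predicates, a, loc) for a in a_values for loc in loc_values):
            kept.append((number, m))
    return kept


def fragments_containing(kept, predicates, a, loc):
    """Numbers of the kept fragments that the tuple (a, loc) falls into."""
    return [number for number, m in kept if satisfies(m, predicates, a, loc)]


if __name__ == "__main__":
    kept = useful_minterms(SIMPLE_PREDICATES, range(0, 20), ["SA", "SB"])
    for number, m in kept:
        print(number, " AND ".join(f"NOT({n})" if neg else n for n, neg in m))
```

With Loc limited to SA and SB, the minterms kept are (2), (3), (6), (7), (10) and (11), the six final fragments, and every sample tuple lands in exactly one of them. Passing a third, hypothetical location "SC" also keeps (4), (8) and (12), which is why elimination depends on application semantics.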
Note: elimination of useless fragments depends on application semantics. E.g., if Loc could take values other than SA and SB, we need to add the fragments:
F4: 5 < A < 10 ∧ Loc ≠ SA ∧ Loc ≠ SB
F8: A ≤ 5 ∧ Loc ≠ SA ∧ Loc ≠ SB
F12: A ≥ 10 ∧ Loc ≠ SA ∧ Loc ≠ SB

Why does this work?
Predicates:
p1 ∧ p2 ∧ p3 ∧ p4
p1 ∧ p2 ∧ p3 ∧ ¬p4
...
¬p1 ∧ ¬p2 ∧ ¬p3 ∧ ¬p4

(1) Completeness: take t ∈ R. Each pi(t) must be T or F! Say p1(t) = T, p2(t) = T, p3(t) = F, p4(t) = F. Then t is in the fragment with predicate p1 ∧ p2 ∧ ¬p3 ∧ ¬p4.

(2) Disjointness: say t is in the fragment p1 ∧ p2 ∧ ¬p3 ∧ ¬p4. Then p1(t) = T, p2(t) = T, p3(t) = F, p4(t) = F, so t cannot be in any other fragment! (Every other minterm requires the opposite value for at least one pk.)

Summary
• Given simple predicates Pr = {p1, p2, …, pm}, the minterm predicates are $M = \{\, m \mid m = \bigwedge_{1 \le k \le m} p_k^*,\ p_k \in Pr \,\}$, where $p_k^*$ is $p_k$ or $\neg p_k$
• The fragments $\sigma_m(R)$ for all $m \in M$ are complete and disjoint

Another desired fragmentation property: match access patterns
Data that are frequently accessed together (data A, data B, data C in the figure): try to place them in the same fragment.

Return to example: E(#, NM, LOC, SAL, …)
Common queries:
  Qa: select * from E where LOC = Sa and …
  Qb: select * from E where LOC = Sb and …

Three choices:
(1) Pr = { }: F1 = {E}
(2) Pr = {LOC=Sa, LOC=Sb}: F2 = {$\sigma_{loc=Sa}(E)$, $\sigma_{loc=Sb}(E)$}
(3) Pr = {LOC=Sa, LOC=Sb, SAL<10}: F3 = {$\sigma_{loc=Sa \wedge sal<10}(E)$, $\sigma_{loc=Sa \wedge sal \ge 10}(E)$, $\sigma_{loc=Sb \wedge sal<10}(E)$, $\sigma_{loc=Sb \wedge sal \ge 10}(E)$}

In other words, against Qa (select … loc = Sa …) and Qb (select … loc = Sb …):
• F1: one fragment holding all of E
• F2: two fragments, Loc=Sa and Loc=Sb
• F3: four fragments, Loc=Sa ∧ sal<10, Loc=Sa ∧ sal≥10, Loc=Sb ∧ sal<10, Loc=Sb ∧ sal≥10
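One way to make the comparison concrete is to count, for each choice, how many fragments Qa or Qb must read and whether those fragments hold only tuples the query needs. A minimal sketch, assuming fragment and query predicates are written as Python functions of (loc, sal), that only the loc condition of each query matters, and that a small hypothetical sample domain stands in for real data:

```python
# Fragment predicates and query predicates as functions of (loc, sal).
# Only the "loc = ..." part of Qa/Qb is modeled; their "and ..." parts are ignored.
CHOICES = {
    "F1": [lambda loc, sal: True],
    "F2": [lambda loc, sal: loc == "Sa", lambda loc, sal: loc == "Sb"],
    "F3": [
        lambda loc, sal: loc == "Sa" and sal < 10,
        lambda loc, sal: loc == "Sa" and sal >= 10,
        lambda loc, sal: loc == "Sb" and sal < 10,
        lambda loc, sal: loc == "Sb" and sal >= 10,
    ],
}
QUERIES = {
    "Qa": lambda loc, sal: loc == "Sa",
    "Qb": lambda loc, sal: loc == "Sb",
}
# Hypothetical sample domain: both sites, salaries on both sides of 10.
DOMAIN = [(loc, sal) for loc in ("Sa", "Sb") for sal in range(0, 30)]


def profile(fragments, query, domain=DOMAIN):
    """Return (number of fragments the query must read, whether those fragments
    contain only tuples the query needs), judged over the sample domain."""
    touched = [f for f in fragments if any(f(*p) and query(*p) for p in domain)]
    only_needed = all(query(*p) for f in touched for p in domain if f(*p))
    return len(touched), only_needed


if __name__ == "__main__":
    for name, fragments in CHOICES.items():
        for qname, query in QUERIES.items():
            print(name, qname, profile(fragments, query))
```

For either query, F1 gives one fragment that also holds the other site's tuples, F2 gives one fragment with nothing extra, and F3 gives two fragments.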
F2 is good (not F1 or F3): with F2, each query reads exactly one fragment and every tuple in it is relevant; with F1, each query reads a fragment that also holds the other site's tuples; with F3, each query generally has to read two fragments.

Derived horizontal fragmentation
Example: E(#, NM, SAL, LOC), F = {E1, E2} by LOC; J(#, DES, …)
Common query for projects: [Given an employee name, list the projects (s)he works on]

E1 (at Sa):
#   NM     Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15
…
E2 (at Sb):
#   NM     Loc  Sal
7   Sally  Sb   25
12  Fred   Sb   15
…
J:
#   Description
5   work on 347 hw
7   go to moon
5   build table
12  rest
…

Fragmenting J to follow E:
$J_1 = J \ltimes E_1$:
#   Des
5   work on 347 hw
5   build table
…
$J_2 = J \ltimes E_2$:
#   Des
7   go to moon
12  rest
…

Derived horizontal fragmentation (general)
R, with fragmentation F = {F1, F2, …, Fn} (F could be primary or derived)
S, with derived fragmentation D = {D1, D2, …, Dn}, where $D_i = S \ltimes F_i$
Convention: R is the owner, S is the member.

Checking completeness and disjointness of a derived fragmentation
Example: say J contains the tuple (33, build chair), but there is no # = 33 in E1 nor in E2! This J tuple will be in neither J1 nor J2, so the fragmentation is not complete.

To get completeness:
Need to enforce the referential integrity constraint: join attribute (#) of the member relation ⊆ join attribute (#) of the owner relation.

Example:
E1: (# = 5, NM = Joe, Loc = Sa, Sal = 10)
E2: (# = 5, NM = Fred, Loc = Sb, Sal = 20)
J: (# = 5, Description = day off)
Then J1 = {(5, day off)} and J2 = {(5, day off)}: the fragmentation is not disjoint!

To get disjointness:
The join attribute (#) should be a key of the owner relation.

Summary: horizontal fragmentation
• Type: primary, derived
• Properties: completeness, disjointness

Vertical fragmentation
Example:
E:
#  NM     Loc  Sal
5  Joe    Sa   10
7  Sally  Sb   25
8  Fred   Sa   15
…
E1:
#  NM     Loc
5  Joe    Sa
7  Sally  Sb
8  Fred   Sa
…
E2:
#  Sal
5  10
7  25
8  15
…

In general, a relation R[T] is split into fragments R1[T1], …, Rn[Tn], with each $T_i \subseteq T$.
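A minimal sketch of the E1/E2 split above, using Python lists of dicts as a stand-in for relations (the helpers project and natural_join are our own illustrative names): projecting E onto {#, NM, LOC} and {#, SAL} and joining the pieces on # gives E back, while a split that does not repeat the key, such as E1(#, LOC), E2(SAL), cannot be rejoined correctly.

```python
# The relation E from the vertical fragmentation example, as a list of dicts.
E = [
    {"#": 5, "NM": "Joe", "LOC": "Sa", "SAL": 10},
    {"#": 7, "NM": "Sally", "LOC": "Sb", "SAL": 25},
    {"#": 8, "NM": "Fred", "LOC": "Sa", "SAL": 15},
]


def project(relation, attrs):
    """Vertical fragment R_i[T_i]: keep only attrs, dropping duplicate rows."""
    result = []
    for row in relation:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Join two fragments on their common attributes (a cross product if none)."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in common)]


E1 = project(E, ["#", "NM", "LOC"])
E2 = project(E, ["#", "SAL"])

if __name__ == "__main__":
    # Key # repeated in both fragments: the join reconstructs E.
    print(natural_join(E1, E2) == E)
    # Disjoint split E1(#, LOC), E2(SAL): no common attribute, so the join is a
    # 3 x 3 cross product and E cannot be reconstructed.
    print(len(natural_join(project(E, ["#", "LOC"]), project(E, ["SAL"]))))
```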
Vertical fragmentation is just like normalization of relations.

Properties of R[T] → Ri[Ti]:
(1) Completeness: $\bigcup_{\text{all } i} T_i = T$
(2) Disjointness: $T_i \cap T_j = \emptyset$ for all $i, j$ with $i \ne j$
  E.g., E(#, LOC, SAL) → E1(#, LOC), E2(SAL)
  Not a desirable property!! (We could not reconstruct R!)
(3) Lossless join: $\bowtie_{\text{all } i} R_i = R$
  One way to achieve a lossless join: repeat the key in all fragments, i.e., $Key \subseteq T_i$ for all $i$.

How do we decide which attributes are grouped together?
Example: E(#, NM, LOC, SAL) could become
  E1(#, NM, LOC), E2(#, SAL)
or
  E1(#, NM), E2(#, LOC), E3(#, SAL)?

Attribute affinity matrix (symmetric, so only the upper triangle is shown)
      A1   A2   A3   A4   A5
A1    -    50   45   1    0
A2    -    -    48   2    0
A3    -    -    -    0    4
A4    -    -    -    -    75
A5    -    -    -    -    -

A1, A2 and A3 have high mutual affinity (45 to 50), as do A4 and A5 (75), while affinities across the two groups are at most 4. So the relation is split into R1[K, A1, A2, A3] and R2[K, A4, A5], where K is the key.

• The textbook (Ozsu & Valduriez) discusses:
  - how to build the affinity matrix
  - how to identify attribute clusters
  - how to partition the relation
• You are not responsible for:
  - clustering and partitioning algorithms (i.e., skip pages 135-145)

Allocation
Example: E(#, NM, LOC, SAL); $F_1 = \sigma_{loc=Sa}(E)$; $F_2 = \sigma_{loc=Sb}(E)$
Qa: select … where loc = Sa …
Qb: select … where loc = Sb …
Where do F1 and F2 go: site a, site b?

Issues
• Where do queries originate?
• What is the communication cost? And the size of answers, relations, …?
• What is the storage capacity and cost at sites? And the size of fragments?
• What is the processing power at sites?

More issues
• What is the query processing strategy?
  - How are joins done?
  - Where are answers collected?

Do we replicate fragments?
• Cost of updating copies?
• Writes and concurrency control?
• ...
Optimization problem
• What is the best placement of fragments and/or best number of copies to:
  - minimize query response time
  - maximize throughput
  - minimize "some cost"
  - ...
• Subject to constraints?
  - Available storage
  - Available bandwidth, power, …
  - Keep 90% of response times below X
  - ...
This is an incredibly hard problem.

Example: single fragment F
Read cost: $\sum_{i=1}^{m} t_i \min_{j} C_{ij}$
  i: originating site of request
  t_i: read traffic at S_i
  C_ij: retrieval cost of accessing fragment F at S_j from S_i

Scenario (read cost): F is stored at sites 1, 2, 3, with retrieval costs c_{i,1}, c_{i,2}, c_{i,3} from site i; for every site that does not hold F, $C = \infty$. A stream of read requests for F arrives at site i at t_i requests/sec, and each is served by the cheapest copy.

Write cost: $\sum_{i=1}^{m} \sum_{j=1}^{m} X_j\, u_i\, C'_{ij}$
  i: originating site of request
  j: site being updated
  X_j: 0 if F is not stored at S_j, 1 if F is stored at S_j
  u_i: write traffic at S_i
  C'_ij: write cost of updating F at S_j from S_i

Scenario (write cost): updates arrive at site i at u_i updates/sec, and each must be applied to every copy of F.

Storage cost: $\sum_{i=1}^{m} X_i d_i$
  X_i: 0 if F is not stored at S_i, 1 if F is stored at S_i
  d_i: storage cost at S_i

Target function, minimized over the placement X_1, …, X_m:
$\min \left( \sum_{i=1}^{m} \left[ t_i \min_{j} C_{ij} + \sum_{j=1}^{m} X_j\, u_i\, C'_{ij} \right] + \sum_{i=1}^{m} X_i d_i \right)$

Can add more complications, e.g.:
  - multiple fragments
  - fragment sizes
  - concurrency control cost

Case Study: PNUTS
• "Where in the World is My Data?" Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, Hector Garcia-Molina; VLDB 2011
• Distributed object/tuple store for Yahoo!
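The single-fragment target function above is small enough to minimize by brute force when there are only a few sites, which makes the trade-off between read and write traffic easy to see. A minimal sketch, assuming Python lists for t, u, C, C' (named Cw here) and d; the function names and the numbers in the example are hypothetical:

```python
import math
from itertools import product


def total_cost(x, t, u, C, Cw, d):
    """Single-fragment target function: read + write + storage cost.

    x[j] is 1 if F is stored at site j, else 0; t[i] and u[i] are read and write
    traffic at site i; C[i][j] and Cw[i][j] (C' in the notes) are the read and
    write costs from site i to a copy at site j; d[i] is the storage cost at site i.
    """
    m = len(x)
    stored = [j for j in range(m) if x[j]]
    if not stored:
        return math.inf  # no copy of F exists, so reads cannot be served
    # Taking the MIN only over sites that hold F matches C_ij = infinity elsewhere.
    read = sum(t[i] * min(C[i][j] for j in stored) for i in range(m))
    write = sum(x[j] * u[i] * Cw[i][j] for i in range(m) for j in range(m))
    storage = sum(x[i] * d[i] for i in range(m))
    return read + write + storage


def best_allocation(t, u, C, Cw, d):
    """Try all 2^m placements; only feasible for a handful of sites."""
    return min(product([0, 1], repeat=len(t)), key=lambda x: total_cost(x, t, u, C, Cw, d))


if __name__ == "__main__":
    # Hypothetical two-site setup (index 0 = Sa, 1 = Sb): local cost 1, remote cost 5.
    C = [[1, 5], [5, 1]]
    d = [1, 1]
    t = [10, 2]  # most reads originate at Sa
    print(best_allocation(t, [0, 0], C, C, d))    # read-only workload
    print(best_allocation(t, [20, 20], C, C, d))  # write-heavy workload
```

With these numbers the read-only workload is cheapest with a copy at both sites, (1, 1), at cost 14, while the write-heavy workload is cheapest with a single copy at Sa, (1, 0), at cost 141: every extra copy adds write cost for all update traffic.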
Case Study: PNUTS
• Issue: where to locate data
• Issue: what and where to replicate

PNUTS discussion
• Dynamic vs. static fragment placement
• Caching vs. replication

Policy constraints
• MIN_COPIES: the minimum number of full replicas of the record that must exist.
• INCL_LIST: an inclusion list, i.e., the locations where a full replica of the record must exist.
• EXCL_LIST: an exclusion list, i.e., the locations where a full replica of the record cannot exist.

Example rule
• Rule 1:
  IF TABLE_NAME = "Users"
  THEN SET 'MIN_COPIES' = 2
  CONSTRAINT_PRI = 0

Another example rule
• Rule 2:
  IF TABLE_NAME = "Users" AND FIELD_STR('home location') = 'France'
  THEN SET 'MIN_COPIES' = 3 AND SET 'EXCL_LIST' = 'USWest, USEast'
  CONSTRAINT_PRI = 1

Summary
• Description of fragmentation
• Good fragmentations
• Design of fragmentation
• Allocation
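The PNUTS policy rules can be read as data: each rule pairs a condition on a record with constraint settings and a priority. The sketch below encodes an assumption about their semantics, not PNUTS code: all matching rules apply, a higher CONSTRAINT_PRI overrides a lower one when both set the same constraint, and MIN_COPIES defaults to 1 when no rule sets it. The Rule class and helper names are hypothetical, as are location names other than USWest and USEast.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Rule:
    """One placement rule: if condition(table, record) holds, apply settings."""
    condition: Callable[[str, Dict[str, str]], bool]
    settings: Dict[str, object]
    priority: int  # CONSTRAINT_PRI


# Rules 1 and 2 from the notes, transcribed by hand.
RULES: List[Rule] = [
    Rule(lambda table, rec: table == "Users", {"MIN_COPIES": 2}, 0),
    Rule(
        lambda table, rec: table == "Users" and rec.get("home location") == "France",
        {"MIN_COPIES": 3, "EXCL_LIST": {"USWest", "USEast"}},
        1,
    ),
]


def constraints_for(table, record, rules=RULES):
    """Merge the settings of all matching rules; higher priority overrides lower (assumed)."""
    merged = {"MIN_COPIES": 1, "INCL_LIST": set(), "EXCL_LIST": set()}  # assumed defaults
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.condition(table, record):
            merged.update(rule.settings)
    return merged


def placement_ok(replicas: Set[str], constraints) -> bool:
    """Check a set of full-replica locations against the three policy constraints."""
    return (
        len(replicas) >= constraints["MIN_COPIES"]
        and constraints["INCL_LIST"] <= replicas
        and not (constraints["EXCL_LIST"] & replicas)
    )


if __name__ == "__main__":
    c = constraints_for("Users", {"home location": "France"})
    print(c["MIN_COPIES"], sorted(c["EXCL_LIST"]))
    print(placement_ok({"USWest", "Europe", "Asia"}, c))  # hypothetical site names
```

For a Users record whose home location is France, both rules match, so MIN_COPIES becomes 3 and USWest and USEast are excluded; a placement with replicas in USWest, Europe and Asia therefore fails the check, while a Users record from elsewhere only needs 2 copies.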