5. Distributed Database Design Chapter 3 Distributed Database Design 1 Outline Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分) Allocation (片段分配) 2 Outline Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分) Allocation (片段分配) 3 Distributed Database Design The design of a distributed computer system involves making decisions on the placement of data and programs across the sites of a computer network. In distributed DBMSs, such placement involves two things: Placement of the DDBMS software Placement of the applications that run on the database The course concentrates on distribution of data. The distribution of DDBMS and applications are given a priori. 4 Framework of Distribution Behavior of access pattern Dynamic Static Complete Information No sharing Data sharing Data + program sharing Level of sharing Level of knowledge on access pattern behavior Partial Information 5 Design Strategies Top-Down Mostly in designing systems from scratch Mostly in designing homogeneous systems Bottom-up When the databases already exist at a number of sites Combining both 6 Top-Down Design Process User Input Requirement Analysis User Input Objectives Conceptual Design GCS View Integration View Design Access Information Distribution Design ES’s User Input LCS’s Physical Design LIS’s 7 Distribution Design Issues Fragmentation Why fragmentation at all? How should we fragment? How much should we fragment? How to test correctness of decomposition? Allocation Necessary information required for fragmentation and allocation 8 Why Fragment? Can’t we just distribute relations? What is a reasonable unit of distribution? Relation – Views are subsets of relations locality – Extra communication cost Fragments of relations (sub-relations) 9 Fragmentation Unit of distribution = unit of data application accesses Reduce irrelevant data access Facilitate intra-query concurrency over different fragments Can be used with other performance enhancing methods (e.g., indexing and clustering) Applications have conflicting requirements, making disjoint fragmentation a very hard problem Multiple fragment access requires join or union Semantic data control (integrity enforcement) could be very costly 10 About fragmentation How should we fragment? Vertical Fragments sub grouping of attributes Horizontal Fragments sub grouping of tuples Mixed/Hybrid Fragments combination of above two How much to fragment? Too little too much of irrelevant data access Too much too much processing cost Need to find suitable level of fragmentation 11 Correctness Criteria Completeness no loss of data Decomposition of relation R into fragments R1, R2, …, Rn is complete if and only if each data item in R can also be found in some Ri . Reconstruction If relation R is decomposed into fragments R1, R2, …, Rn then there should exist some relational operator such that R = 1 i n Ri Disjointness If relation R is decomposed into fragments R1, R2, …, Rn and data item di is in Rj, then di should not be in any other fragment Rk (k!=j). 12 Allocation Alternatives Full Replication Each fragment resides at each site Partial Replication Each fragment resides at some of the sites Not-replicated (Partitioned) Each fragment resides at only one site Q & A: What are the advantages and disadvantages? 13 Allocation Alternatives Full-Replication Partial -replication Partitioning Query Processing Directory Management Concurrency Control Reliability Reality Easy Same Difficulty Easy or non-existent Same Difficulty Moderate Difficult Easy High Possible application High Low Possible application Realistic 14 Rule of Thumb for Allocation If number-of-read-only-queries is more than number-of-update-queries, then replication is advantageous, otherwise replication may cause problems. 15 Information Requirements Four categories Database information Application information Communication network information Computer system information 16 Outline Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分) Allocation (片段分配) 17 Fragmentation Horizontal fragmentation (HF) Primary horizontal fragmentation (PHF) – based on predicates accessing the relation Derived horizontal fragmentation (DHF) – based on predicates being defined on another logically related relation We shall first study algorithm for horizontal fragmentation, and then study issues related to derived horizontal fragmentation. Vertical fragmentation (VF) Hybrid fragmentation (HVF) 18 PHF Information Requirements Database Information > links PAY Title, Sal EMP L1 PROJ Eno,Ename, Title L2 Jno, Jname, Budget. Loc L3 ASG Eno, Jno, Resp, Dur > Cardinality of each relation: card (R) 19 PHF Information Requirements (cont.) Application Information 1 Simple predicate: Given R(A1 , A2 , ..., An ), with each Ai having domain of values dom(Ai), a simple predicate pj is pj : Ai Value where {<, >, , , } and Value dom(Ai). Example Jname=“maintenance” Budget 200000 Follow ``80/20'' rule 20 PHF Information Requirements (cont.) Application Information 2 minterm predicate: Given R and a set of simple predicates Pr = {p1, p2, ..., pm} on R, the set of minterm predicates M = {m 1, m 2 , ..., m z } is defined as M = {mi | mi = Pj Pr p*j }, 1 i z, 1 j z where p*j = pj or ¬pj Example m1: m2: m3: m4: (Jname=“maintenance”) (Budget 200000) ¬ (Jname=“maintenance”) (Budget 200000) (Jname=“maintenance”) ¬ (Budget 200000) ¬ (Jname=“maintenance”) ¬ (Budget 200000) 21 PHF Information Requirements (cont.) Application Information 3 Minterm selectivity: sel (mi) – number of tuples of the relation that would be accessed by a user query specified according to a given minterm predicate. Access frequency: acc (qi) – frequency with which user applications access data. If Q = {q1 , q2 , ..., qn } is the set of queries, acc (qi ) indicates access frequency of query qi in a given period. 22 Primary Horizontal Fragmentation Each horizontal fragment Ri of relation R is defined by Ri = Fi (R), 1 i w, where Fi is a selection formula, which is (preferably) a minterm predicate. A horizontal fragment Ri of relation R consists of all the tuples of R which satisfy a minterm predicate mi. Given a set of minterm predicates M, there are as many as horizontal fragments of relation R as there are minterm predicates. Set of horizontal fragments also referred to as minterm fragments 23 PHF - Algorithm Input: A relation R, a set of simple predicates Pr Output: The set of fragments of R = {R1 , R2 , ..., Rw }, which obey the fragmentation rules. Preliminaries: Pr should be complete Pr should be minimal 24 Completeness of Simple Predicates A set of simple predicates Pr is said to be complete if and only if the accesses to the tuples of the minterm fragments defined on Pr requires that two tuples of the same minterm fragment have the same probability of being accessed by any application. 25 Completeness of Simple Predicates Example: PNO PNAME BUDGET LOC P1 Instrumentation 150000 Mntreal P2 Database 135000 New York P3 CAD/CAM 250000 New York P4 Maintenance 310000 Paris Applications: Q1: Find the projects at each location Q2: Find projects with budget less than $200,000 incomplete Predicates: Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”} Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET 2000000, BUDGET >200000} 26 Minimality of Simple Predicates If a predicate influences how fragmentation is performed, (i.e., causes a fragment f to be further fragmented into, say, fi and fj), then there should be at least one application that accesses fi and fj differently. In other words, the simple predicate should be relevant in determining a fragmentation. If all the predicates of a set Pr are relevant, then Pr is minimal. 27 Minimality of Simple Predicates Example Applications: Q1: Find the projects at each location Q2: Find projects with budget less than $200,000 Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET 2000000, BUDGET >200000} minimal Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET 2000000, BUDGET >200000, Minimal?? PNAME=“Instrumentation”} 28 COM_MIN Algorithm Input: A relation R and a set of simple predicates Pr Output: A complete and minimal set of simple predicates Pr ’ for Pr Rule 1: A relation or fragment is partitioned into at least two parts which are accessed differently by at least one application. 29 COM_MIN Algorithm (cont.) 1. Initialization • Find a pi Pr such that pi partitions R according to Rule 1 • Set Pr’ = pi ; Pr Pr - pi ; F fi 2. Iteratively add predicates to Pr ’ until it is complete • Find a pj Pr such that pj partitions some fk defined according to minterm predicate over Pr’ according to Rule 1 • Pr’ pj ; Pr = Pr - pj ; F fj • If pk Pr ’ which is nonrelevant, then Pr ’ = Pr ’ - pk ; F F - fk 30 PHORIZONTAL Algorithm Make use of COM_MIN to perform fragmentation Input: A relation R and a set of simple predicates Pr Output: A set of minterm predicates M according to which relation R is to be fragmented Steps: Pr ’ COM_MIN(R, Pr) Determine the set M of minterm predicates Determine the set I of implications among pi Pr ’ Eliminate the contradictory minterms from M 31 Contradictory Minterms Given a minimal and complete set of simple predicates, containing n simple predicates Not all the minterm fragments derived are valid A fragment can be self contradictory because of implications among simple predicates. Example: Dom(Sal): [10000, 200000]; Dom(Loc) = {HK,SF} p1 : sal < 50000; p2 : Loc = HK; p3 : Loc = SF Note: p2 (¬ p3 ); p3 (¬ p2 ) the minterm p1 p2 p3 is self contradictory. 32 PHORIZONTAL Algorithm (cont.) Input: relation R and a set of simple predicates Pr Output: a set of minterm fragments M begin Pr’ = COM-MIN(R, Pr ); M = set of minterm predicates from Pr’ I = set of implications among pi Pr’ for each mi M if mi is contradictory according to I then M = M mi end 33 PHF Example: PAY PAY Application: Check the salary info and determine raise Employee records kept at two sites ==> application runs at two sites TITLE Mech. Eng. Programmer Elec. Eng. Syst. Anal. SAL 27000 24000 40000 34000 Simple predicates: P1: SAL 30000, P2: SAL > 30000 Pr = {P1, P2}, which is complete and minimal Pr’ = Pr Minterm predicates: m1: (SAL 30000); m2: (SAL > 30000) PAY1 PAY2 TITLE Mech. Eng. Programmer SAL 27000 24000 TITLE Elec. Eng. Syst. Anal. SAL 40000 34000 34 PHF Example: PROJ PROJ Applications Find the name and budget of projects given their locations – Issued at three sites PNO PNAME BUDGET LOC P1 Instrumentation 150000 Mntreal P2 Database 135000 New York P3 CAD/CAM 250000 New York P4 Maintenance 310000 Paris Access project information according to budget – One site access 200000; other accesses >200000 m1: LOC = “Montreal” BUDGET 200000 m2 : LOC = “Montreal” BUDGET > 200000 m3 : LOC = “New York” BUDGET 200000 m4 : LOC = “New York” BUDGET > 200000 m5 : LOC = “Paris” BUDGET 200000 M6 : LOC = “Paris” BUDGET > 200000 p1: LOC = “Montreal” p2 : LOC = “New York” p3 : LOC = “Paris” p4: BUDGET 200000 p5 : BUDGET > 200000 35 PHF Example: PROJ - Result PROJ PROJ1 PNO PNAME BUDGET LOC PNO PNAME BUDGET LOC P1 Instrumentation 150000 Mntreal P1 Instrumentation 150000 Mntreal P2 Database 135000 New York P3 CAD/CAM 250000 New York PROJ2 P4 Maintenance 310000 Paris PNO PNAME BUDGET LOC P2 Database 135000 New York m1: LOC = “Montreal” BUDGET 200000 m2 : LOC = “Montreal” BUDGET > 200000 m3 : LOC = “New York” BUDGET 200000 m4 : LOC = “New York” BUDGET > 200000 m5 : LOC = “Paris” BUDGET 200000 M6 : LOC = “Paris” BUDGET > 200000 PROJ4 PNO PNAME BUDGET LOC P3 CAD/CAM 250000 New York PROJ6 PNO PNAME BUDGET P4 Maintenance 310000 LOC Paris 36 PHF - Correctness Completeness Since Pr is complete and minimal, the selection predicates are complete Reconstruction If relation R is fragmented into FR = {R1, R2, …, Rr} R = Ri FR Ri Disjointness Minterm predicates that form the basis of fragmentation should be mutually exclusive 37 DHF: Derived Horizontal Fragmentation Owner relation PAY Title, Sal EMP Each link is an equijoin L1 PROJ Eno, Ename, Title L2 Member relation DHF: Defined on a member relation according to a selection operation on its owner Jno, Jname, Budget. Loc L3 ASG Eno, Jno, Resp, Dur 38 PAY DHF – Example Title, Sal EMP PAY1= SAL 30000 PAY PAY2= SAL> 30000 PAY TITLE Mech. Eng. Programmer SAL 27000 24000 TITLE Elec. Eng. Syst. Anal. SAL 40000 34000 EMP ENO E1 E2 E3 E4 E5 E6 E7 E8 ENAME J. Doe M. Smith A. Lee J. Miller B. Casey L. Chu R. Davis J. Jones TITLE Elec.Eng. Syst. Anal. Mech. Eng. Programmer Syst. Anal. Elec.Eng. Mech. Eng. Syst. Anal. DHF L1 Eno, Ename, Title EMP1 = EMP ENO E3 E4 E7 ENAME A. Lee J. Miller R. Davis EMP2 = EMP ENO E1 E2 E5 E6 E8 ENAME J. Doe M. Smith B. Casey L. Chu J. Jones PAY1 TITLE Mech. Eng. Programmer Mech. Eng. PAY2 TITLE Elec.Eng. Syst. Anal. Syst. Anal. Elec.Eng. Syst. Anal. 39 DHF: Derived Horizontal Fragmentation Let S be horizontally fragmented and let there be a link L with owner(L) = S, and member(L) = R, the derived horizontal fragments of R are defined as Ri = R Si , 1 i w where Si is the horizontal fragment of S, is the semijoin operator, and w is the maximum number of fragments Inputs to derived horizontal fragmentation: partitions of owner relation member relation the semijoin condition The algorithm is straight forward. 40 DHF: Correctness Completeness primary horizontal fragmentation based on completeness of selection predicates. For derived horizontal fragmentation based on referential integrity Reconstruction Same as primary horizontal fragmentation (via union) Disjointedness Simple join graphs between the owner and the member fragments. 41 DHF: Issues Multiple owners for a member relation; how should we derived horizontally fragments of a member relation? There could be a chain of derived horizontal fragmentation. 42 Vertical Fragmentation (VF) Has been studied within the centralized context Vertical partitioning of a relation R produces fragments R1, R2, ..., Rm, each of which contains a subset of R's attributes as well as the primary key of R The object of vertical fragmentation is to reduce irrelevant attribute access, and thus irrelevant data access “Optimal” vertical fragmentation is one that minimizes the irrelevant data access for user applications 43 VF Two Approaches Grouping: each individual attribute one fragment, at each step join some of the fragments until some criteria being satisfied Attributes to fragments Splitting: start with global relation, and generate beneficial partitions based on access behavior of the applications Relations to fragments 44 Vertical Fragmentation Overlapping fragments grouping Non-overlapping fragments Splitting We do not consider the replicated key attributes to be overlapping Advantage – easier to enforce functional dependencies (for integrity checking) 45 VF Information Requirements Application Information Attribute affinities – A measure that indicates how closely related the attributes are – This is obtained from more primitive usage data Attribute usage values – Given a set of queries Q = {q1 , q2, ..., qm} that will run on relation R (A1, A2, ..., An) use (qi, Aj ) = 1 if attribute Aj is referenced by query qi 0 otherwise use (qi, . ) can be defined accordingly. 46 VF Definition of use(qi, Aj) Consider the following 4 queries for relation PROJ q1: SELECT BUDGET FROM PROJ WHERE PNO = val; q2: SELECT PNAME,BUDGET FROM PROJ; q3: SELECT PNAME FROM PROJ WHERE LOC = val; q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=val; Let A1= PNO, A2= PNAME, A3 = BUDGET, A4 = LOC A1 A2 A3 A4 1 q2 0 q3 0 q4 0 q1 0 1 1 0 1 1 0 1 0 0 1 1 47 VF - Affinity Measure aff(Ai, Aj) The attribute affinity measure between two attributes Ai and Aj of a relation R (A1, A2, ..., An) with respect to the set of applications Q = {q1 , q2, ..., qm} is defined as: Aff (Ai, Aj) = { k | use (qk, Ai)=1 use (qk, Aj)=1} l refl (qk) * accl (qk) where refl (qk) is the number of accesses to attributes for each execution of application qk at site l; accl (qk) is the access frequency of query qk at site l . 48 VF - Affinity Measure aff(Ai, Aj) Assume each query in the previous example accesses the attributes once during each execution. Also assume the access frequency of query k at different sites is: q1 S1 S2 S3 15 20 10 q2 5 0 0 q3 25 25 25 q4 3 0 0 then Aff (A1, A3) = 15*1 + 20*1 + 10*1 = 45 and the attribute affinity matrix AA is: A1 A2 A3 A4 1 q2 0 q3 0 q4 0 q1 0 1 1 0 1 1 0 1 0 0 1 1 A1 A1 45 A2 0 A3 45 A4 0 A2 A3 A4 0 45 0 80 5 75 5 53 3 75 3 78 49 AA Matrix for Vertical Fragmentation This affinity matrix will be used to guide the fragmentation effort. The process involves first clustering together the attributes with high affinity for each other, and then splitting the relation accordingly. 50 VF - Correctness A relation R, defined over attribute set A and key K, generates vertical partitioning FR = {R1 , R2 , … ,Rr} Completeness A ARi Reconstruction R = k Ri Ri FR Disjointness Duplicate keys are not considered to be overlapping 51 Hybrid Fragmentation R HF R1 R2 VF VF R11 R12 R21 R22 R23 52 Outline Introduction Fragmentation (片段划分) Horizontal fragmentation (水平划分) Derived horizontal fragmentation (导出式水平划分) Vertical fragmentation (垂直划分) Allocation (片段分配) 53 Allocation File Allocation vs. Database Allocation Fragments are not individual files – Relationships have to be maintained Access to databases is more complicated – Remote file access model not applicable – Relationship between allocation and query processing Cost of integrity enforcement should be considered Cost of concurrency control should be considered 54 Allocation Problem Assuming: F F1, F2 ,, Fn A set of fragments S S1, S2 ,, Sm A set of sites Q Q1, Q2 ,, Qq A set of applications Problem Find the optimal distribution of F over S 55 Optimality with Two Aspects Minimal cost Storing Fi at Sj Querying Fi at Sj Updating Fi at all Sj 's with a copy of Fi Communication Performance Response time Throughput …… Separate the two issues to reduce its complexity. 56 A Simple Formulation of the Cost Problem 1 For a single fragment Fi R = {r1, r2, …, rm} rj : read-only traffic generated at Sj for Fi U = {u1, u2, …, um} uj : update traffic generated at Sj for Fi F, S, Q are defined as before 57 A Simple Formulation of the Cost Problem (cont.) Assume the communication cost between any pair of sites Si and Sj is fixed C(T ) = {c11,c12, c13, …, c1,m, …, cm-1,m} cij : retrieval communication cost C’(U ) = {c’11,c’12, c’13, …, c’1,m, …, c’m-1,m} c’ij : update communication cost 58 A Simple Formulation of the Cost Problem (cont.) 3 D = {d1, d2, …, dm} cost for storing Fi at Sj No capacity constraints for sites and communication links 59 A Simple Formulation of the Cost Problem (cont.) Let The allocation problem is a cost minimization problem for finding the set I S1 , S2 ,, Sm i.e., the sites to store the fragment Fi, where 1 if thefragmentFi is assigned tosite S j xij 0 otherwise 60 A Simple Formulation of the Cost Problem (cont.) For queries from site i Updates : Reads : r (min i Total cost m min[ ( { j| I } sj c) ij ui c'ij j| S j I Storage : j| S j I dj ci , j ) x j d j ] x j u j c'i, j r j jmin |S I i 1 j|S j I j j|S j I This formulation only considers one fragment Fi at site Sj. It is NP-complete. 61 A Precise Formulation of the Cost Problem A precise formulation must consider: All fragments together How query is processed The enforcement of integrity constraint The cost of concurrency control and transaction control 62 Allocation Model in General Allocation Model min (total Cost) subject to – responsetime constraint – storage constraint – processing constraint Decision variable 1 if fragment Fi is stored at site Sj xij 0 otherwise 63 Allocation Model (cont.) Total Cost ∑all queries query processing cost + ∑all sites ∑all fragments cost of storing a fragment at a site Storage Cost (on fragment Fj at site Sk) (unit storage cost at Sk) * (size of Fj) * xjk Query processing Cost (for one query) processing component + transmission component 64 Allocation Model (cont.) Query Processing Cost Processing component access cost + integrity enforcement cost + concurrency control cost access cost: ∑all sites ∑all fragments (number of update accesses + number of read accesses) * xjk * local processing cost at site integrity enforcement and concurrency control costs can be similarly calculated. 65 Allocation Model (cont.) Query Processing Cost Transmission component cost of processing updates + cost of processing retrievals Cost of updates ∑all sites ∑all fragments update message cost + ∑all sites ∑all fragments acknowledgement cost Retrieval costs ∑all fragments minall sites (cost of retrieval command + cost of sending back the result) 66 Allocation Model (cont.) Constraints Response time for each query not longer than maximally allowed response time for that query Storage constraint: The total size of all fragments allocated at a site must be less than the storage capacity at that site. Processing constraint: The total processing load because of all queries at a site must be less than the processing capacity at that site. 67 Solution Methods NP-complete Heuristics approaches Exploring techniques developed in operational research (运筹学) Reduce problem complexity ignore replication first, and then improve with a greedy algorithm 68 Question & Answer 69