Advanced Database Topics
Distributed Databases: Organization & Query Processing

Copyright © Ellis Cohen 2002-2005
These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see http://www.openlineconsult.com/db

Topics
  Distributed Database Architecture
  Location Transparency
  Data Placement & Fragmentation
  Distributed Query Processing

Distributed Databases
Distribution
  Spreading data across multiple network nodes
Partitioning & Fragmentation
  Distributing tables divided into vertical or horizontal parts
Replication
  Replicating (parts of) tables across multiple nodes
Why would we want to distribute or replicate data?

Distribution & Replication
Distribution
  Integrate separate databases
  Decrease network latency by locating data near greatest demand
  Locate data within secure administrative boundaries
  Parallel processing
Replication
  Decrease network latency by placing replicas near multiple high-demand sites
  High availability & reliability in the face of failure
  More parallel processing
  Scalability (a single copy is no longer a bottleneck)
  Disconnected operation

Date's 12 Rules for Integrated Distributed Systems
  1. Local autonomy
  2. No reliance on a central site
  3. Continuous operation
  4. Location transparency
  5. Fragmentation transparency
  6. Replication transparency
  7. Distributed query processing
  8. Distributed transaction management
  9. Hardware independence
  10. Operating system independence
  11. Network independence
  12. DBMS independence (transparent heterogeneity)
To the user, a distributed DB system should look exactly like a non-distributed system.

Design Issues for DDBMSs
  Keep track of data names & locations
  Decide what to fragment & replicate
  Decide placement (allocation, distribution) of objects, fragments & replicas
  Devise strategies for executing transactions & queries that access data from multiple sites
  Manage distributed transactions, including backup & recovery from
  – individual site crashes
  – communication link failures
  Decide which copy/copies of replicated data to access
  Maintain consistency of replicated data copies

Distributed Database Architectures
Architectures
– Multi-Database Architecture
  • Appears to user as separate databases
– TP Monitor / Application Server Architecture
  • Separate server to handle transaction management & other services (e.g. security)
– Federated Database Architecture
  • Appears to user as a single database providing a global schema integrating disparate DBs
– Collaborating Database Architecture
  • A collection of peer databases, which interconnect to one another, providing a global schema to users who connect to an individual peer
Heterogeneity
  Homogeneous: every site runs the same type of DBMS
  Heterogeneous: different sites run different DBMSs (perhaps even non-relational ones)

Coordination
Coordination of a distributed transaction is managed by a coordinator, which resides at a single node
• Multi-Database Architecture: the client is the coordinator
• TP Monitor / Application Server Architecture: the TP Monitor / App Server is the coordinator
• Federated Database Architecture: the Federation Server is the coordinator
• Collaborating Database Architecture: the peer the client connects to is the coordinator

Multi-Database Architecture
Client acts as coordinator
• Issues queries directly to multiple DB servers (subordinates)
• Integrates the results
• Handles distributed transaction management (as well as it can)
[Diagram: Client (coordinator) connected directly to several DB Servers (subordinates)]

Sub-query Distribution
Suppose a coordinator wants to execute the query that lists the project managed by the highest paid employee:
  SELECT * FROM Projs
    WHERE pmgr = (SELECT empno FROM Emps
      WHERE sal = (SELECT max(sal) FROM Emps))
If subordinate S1 holds the Projs table and subordinate S2 holds the Emps table, then the coordinator will
– request S2 to execute the sub-query
    SELECT empno FROM Emps
      WHERE sal = (SELECT max(sal) FROM Emps)
– get the result back (let's call it result), and
– request S1 to execute (and return the results of) the sub-query
    SELECT * FROM Projs WHERE pmgr = result

Sub-transactions
Imagine a coordinator C has started a transaction TC, and is executing a query as part of TC.
– The coordinator divides the query up into sub-queries, which it sends to various subordinates.
– It labels each sub-query with TC, the identity of the main transaction.
When a subordinate S is passed a sub-query
– If it has not yet seen the label TC, it creates a local transaction TS (called a sub-transaction), and associates TS with TC.
– If it has seen TC before, it looks up the corresponding TS.
In either case, S runs the sub-query as part of the local sub-transaction TS.

Transaction Manager
Client acts as coordinator
• Uses Transaction Manager to handle distributed transaction management
Client still
• Issues queries directly to multiple DB servers
• Integrates the results
APIs & protocols are standardized by X/Open.
[Diagram: Client with Transaction Manager; sub-transactions run at each DB Server]

Distributed Transaction Management
Coordinator's transaction manager communicates with each subordinate (participating DB server)
Each subordinate manages its own sub-transactions
– Reflects queries performed by that subordinate on behalf of the parent transaction
– Enforces ACID requirements of the subordinate
– Enables independent recovery by each subordinate
Provides distributed concurrency control to ensure global serializability
Provides atomic commit protocol to ensure global atomicity & durability

The Distributed Commit Problem
• A distributed transaction which executes at multiple sites must either be committed at all sites or aborted at all sites
• It is not acceptable for one sub-transaction to commit and another to abort
• If the coordinator just sends a COMMIT message to two subordinates S1 and S2
  – S1 could get the COMMIT message and commit
  – S2 could crash just before it gets the COMMIT message (and before writing any local sub-transaction state to stable storage), i.e. S2's sub-transaction is aborted
• Obviously a more complicated protocol is needed, which we will address later

TP Monitor / Application Server
Client uses TP Monitor / App Server to execute transactions
Application Server may use load balancing to decide which Application Server should coordinate the transaction
Transaction executing within App Server
• Makes direct calls to multiple DB servers
• Integrates the results
• Uses App Server's Transaction Mgr to handle distributed transaction management
[Diagram: Client connects to one of several App Servers, which send sub-queries to multiple DB Servers]

Heterogeneity
Heterogeneous Databases
– Different data types
– Different SQL commands or syntax
– Different protocols
– Different embedded programming languages
– Different security mechanisms (authentication & access control)
– Different concurrency mechanisms
Heterogeneous Data Models
– Different names
– Different values (esp. units)
– Different constraints & derived values

Heterogeneity Transparency
Non-Transparent: Client must deal with some or all aspects of database heterogeneity directly
Semi-Transparent: Mapping layer hides most differences among databases. Coordinator may still be able to exploit differences (e.g. pass-through SQL)
Transparent: Mapping layer hides differences among databases and among data models
[Diagram: Coordinator, Mapping Layer, DB Server]

Mapping Architecture
Coordinator may be
• Client
• App Server
• DB Server
Mapping layer may reside in
• Coordinator
• DB Server
• Separate Gateway Server
[Diagram: Coordinator, Mapping Layer, DB Servers]

Federated Database Architecture
Federation Layer supports
• Transaction Management
• Heterogeneity Mapping Layer
• Global Schema supported by Distributed Query Processing
Federation Layer may be
• Software layer callable by client (i.e. extended transaction manager)
• Provided by separate Federated DB Server (e.g. extended TP Monitor)
• Integrated with DB server (i.e. Collaborating DB Architecture)
[Diagram: Client / App Server sends transactions & queries to the Federation Layer, which sends sub-transactions & sub-queries to the DB Servers]

Collaborating Database Architecture
Client can connect to one of a set of DB Servers
  Client could itself be an App/DB/Gateway Server
Connecting DB Server
• Provides global schema
• May choose a different DB Server to coordinate the transaction (e.g. based on load balancing, or the one nearest the data)
Coordinating DB Server
• Handles distributed transaction management
• Handles distributed query management
DB Servers
• Appear homogeneous
• May themselves be Federated DBs or Gateway Servers
Collaborating DB servers generally communicate using a private protocol.
[Diagram: Client connected to one of several interconnected DB Servers]

Location Transparency

Location Transparency Requirements
1. DB objects must be able to reside and be created at multiple sites in a system
2. Each DB object must be able to be uniquely named by a transaction
3. The name for a DB object used by a transaction must enable the object to be located efficiently
4. It must be possible to write transaction code that will not need to be modified if either
   • the transaction is executed at a different site
   • the DB objects accessed are moved

Explicit Site Naming
  SELECT * FROM scott.emp@hq.acme.com
If @ (as in Oracle) reflects the table's current location, this does not support the key transparency requirement: code must change when the table moves.
However, if @ identifies the table's birth site, which then holds the table's forwarding location (the site where it is currently located, or a site which does further forwarding), then transparency is retained.
Security considerations
  In what security domain does the transaction run on the remote machine?
  What if the user currently running does not have an account on the remote machine?

Synonyms
  joe@boston> CREATE SYNONYM emp FOR scott.emp@hq.acme.com
  joe@boston> SELECT * FROM emp
Is emp a
– Local synonym [can only be used by joe?]
– Part of joe's schema?
    dilip@boston> SELECT * FROM joe.emp
Even if synonyms are automatically replicated on every machine, there is no guarantee of location transparency, because of naming conflicts.

Location Transparency via Global Directory Management
Design a global directory hierarchy
Provides a separate naming scope for storing synonyms
  joe@boston> CREATE PUBLIC GLOBAL DIRECTORY /stuff
  joe@boston> CREATE PUBLIC DIRECTORY /stuff/empinfo
  joe@boston> CREATE PUBLIC GLOBAL SYNONYM /stuff/empinfo/emp FOR scott.emp@hq.acme.com
  sam@podunk> SELECT * FROM /stuff/empinfo/emp
  // invented syntax
Where is the global directory stored?
– A centralized directory manager (name server) is susceptible to bottlenecks and failures
– It needs to be replicated

Data Placement & Fragmentation

Data Placement
Company HQ in Des Moines
Warehouses in SF, NY, Denver
  SfCust( custid, addr )
  NyCust( custid, addr )
  DenverCust( custid, addr )
A. Place all 3 in Des Moines
B. Place SfCust in SF, NyCust in NY, DenverCust in Denver
C. Place SfCust in SF & Des Moines, NyCust in NY & Des Moines, DenverCust in Denver & Des Moines
How would you decide?

Data Fragmentation
Why would you choose one or another of these approaches?
Horizontal Fragmentation
• Each fragment is a subset of rows
• Fragments do not overlap (otherwise this is partial replication)
• Reconstruction by union
• Updates may require tuple migration
Vertical Fragmentation
• Each fragment is a subset of columns
• All fragments include primary key columns or share ROWIDs
• Reconstruction by join
• Updates do not require tuple migration

Rules for Data Fragmentation
Completeness
  All the data of the global relation must be mapped to the fragments
Reconstruction
  It must always be possible to reconstruct each global relation from its fragments
Disjointness
  If fragments are disjoint, then decisions about replication of data can be made somewhat separately from decisions about fragmentation

Horizontal Fragmentation
Create:
  CREATE TABLE emp ( … )
  PARTITION (
    scott.emp10@hq.acme.com WHERE deptno = 10,
    scott.emp20@dallas.acme.com WHERE deptno = 20,
    scott.emp30@chicago.acme.com WHERE deptno = 30,
    scott.otheremp@hq.acme.com OTHERWISE)
  // invented syntax loosely based on Oracle
The predicates defining all the fragments should be complete and mutually exclusive (or else there is replication).
Reconstruct:
  SELECT * FROM scott.emp10@hq.acme.com UNION
  SELECT * FROM scott.emp20@dallas.acme.com UNION
  SELECT * FROM scott.emp30@chicago.acme.com UNION
  SELECT * FROM scott.otheremp@hq.acme.com

Fragmentation Transparency
  SELECT ename, job FROM emp WHERE sal > 50000
Implement as
  SELECT ename, job FROM scott.emp10@hq.acme.com WHERE sal > 50000
  UNION ALL
  SELECT ename, job FROM scott.emp20@dallas.acme.com WHERE sal > 50000
  UNION ALL
  SELECT ename, job FROM scott.emp30@chicago.acme.com WHERE sal > 50000
  UNION ALL
  SELECT ename, job FROM scott.otheremp@hq.acme.com WHERE sal > 50000
Integrate decomposed queries via union. (UNION ALL is used here because the fragments are disjoint and the original query does not eliminate duplicate ename/job pairs.)

Fragmentation Transparency for Updates
  UPDATE emp SET deptno = 30 WHERE empno = 6749;
  // assumes you know deptno is currently 20;
  // much more complicated otherwise
Implementing this update requires tuple migration.
Implement as
  SELECT * INTO anEmp FROM scott.emp20@dallas.acme.com WHERE empno = 6749;
  DELETE FROM scott.emp20@dallas.acme.com WHERE empno = 6749;
  INSERT INTO scott.emp30@chicago.acme.com VALUES
    ( 6749, anEmp.ename, anEmp.job, anEmp.mgr, anEmp.hiredate, anEmp.sal, anEmp.comm, 30 );

Vertical Fragmentation
Create:
  CREATE TABLE emp ( empno int primary key, … )
  PARTITION (
    ename, job, mgr, deptno AS scott.empinfo@boston.acme.com,
    hiredate AS scott.emphr@hq.acme.com,
    sal, comm AS scott.empacct@hq.acme.com)
  // invented syntax loosely based on Oracle
The columns defining all the fragments should be complete and mutually exclusive. All fragments automatically include the primary key empno to match up rows (or use some other mechanism, such as matching ROWIDs).
Reconstruct:
  SELECT i.empno, i.ename, i.job, i.mgr, h.hiredate, a.sal, a.comm, i.deptno
  FROM scott.empinfo@boston.acme.com i
    NATURAL JOIN scott.emphr@hq.acme.com h
    NATURAL JOIN scott.empacct@hq.acme.com a

Hybrid Fragmentation
  CREATE TABLE emp ( empno int primary key, … )
  PARTITION (
    ename, job, mgr, deptno AS (
      scott.emp10@hq.acme.com WHERE deptno = 10,
      scott.emp20@dallas.acme.com WHERE deptno = 20,
      scott.emp30@chicago.acme.com WHERE deptno = 30,
      scott.otheremp@hq.acme.com OTHERWISE ),
    hiredate AS scott.emphr@hq.acme.com,
    sal, comm AS scott.empacct@hq.acme.com)
  // invented syntax loosely based on Oracle

Data Placement Revisited
Company HQ in Des Moines
Warehouses in SF, NY, Denver
  Cust( custid, addr, whse )
  whse is 'SF', 'NY', or 'Denver'
A. Place Cust at Des Moines
B. Partition Cust by whse: SfCust@SF, NyCust@NY, DenverCust@Denver
C. Leave Cust at Des Moines and also partition as SfCust@SF, NyCust@NY & DenverCust@Denver
How would you decide?

Database Design Problem
Hard optimization problem (even without considering replication)
– Fragmentation: how to fragment tables
– Allocation/Placement: where to place tables and fragments
Relative to minimizing/maximizing some cost function, e.g.
– minimize query response time
– maximize throughput
– must be approximate, since determining the actual query plan is a separate optimization problem
Subject to constraints, e.g.
– Available storage, bandwidth, processing power, …
– Keep 90% of response times below X

Optimization Approach
Factors to consider
  The originating site(s) of queries/updates
  Which attributes are accessed together
  Which attributes & combinations of selection predicates are used from which sites, with which frequencies
  Frequencies of updates that affect combinations of selection predicates
  Data integration costs (costs of joins and unions for fragments) vs. increase in parallelism
  Costs of communication, concurrency control, security & integrity maintenance

Distributed Query Processing
Query processing
  Based on algorithms that analyze queries and convert them into a series of data manipulation operations.
The problem
  Deciding on a strategy for executing each query over the network in the most cost-effective way, however the cost is defined.
Main factors
  I/O, CPU, communication costs
  Opportunity for pipelining & parallel operations

Distributed Query Example
Given tables
  emp( empno, ename, deptno, sal, … ) at site S1 (largest)
  project( pno, pname, mgr, … ) at site S2
  dept( deptno, dname, loc ) at site S3 (smallest)
[Diagram: sites S1 (emp), S2 (project) and S3 (dept), interconnected]

Sub-Querying & Shipping
Queries are executed via a combination of computing queries and shipping data.
For example, suppose we want to execute a query to find out the name of each project, along with the name of its project manager & the name of that manager's department:
  SELECT pname, ename, dname
  FROM project p, emp e, dept d
  WHERE p.mgr = e.empno AND e.deptno = d.deptno

Consider Cost-based Alternatives
Alternative 1
  Ship dept & project to S1
  Process query at S1
Alternative 2
  Ship emp & project to S3
  Process query at S3
Which one is better?

Evaluating Alternatives
In general, alternative 1 is better, because it involves shipping less information (emp, the largest table, stays where it is).
But to really determine the best approach, you must consider
– Communication costs to S1 vs. S3 (what if there is a slow line between S2 & S1?)
– Relative processing speeds and scheduling algorithms at S1 vs. S3
– Size of result & location of coordinator

Intermixing Querying & Shipping
Rather than shipping base tables and performing a single query, it may make sense to
– do a query at one site
– ship the query results to another site
– do a query at that site, joining the results received with data available at that site
In general, a distributed query plan involves a (potentially lengthy) sequence of performing queries and shipping data (either base tables or query results).

Distributed Query Planning Example
For example, suppose we are only interested in projects where the project manager makes more than 8000/month. For those projects, we want the name of the project, the name of the project manager & the name of that manager's department.
Process
  SELECT pname, ename, dname
  FROM project p, emp e, dept d
  WHERE p.mgr = e.empno AND e.deptno = d.deptno AND e.sal > 8000
If there are not very many employees who make > 8000, what's the best plan for executing this query?

Restrict before Ship
  AT S1, COMPUTE emplet AS
    SELECT empno, ename, deptno FROM emp WHERE sal > 8000
  SHIP emplet FROM S1 TO S2
  SHIP dept FROM S3 TO S2
  AT S2, COMPUTE
    SELECT ename, dname, pname
    FROM emplet e, dept d, project p
    WHERE p.mgr = e.empno AND e.deptno = d.deptno
(emplet carries deptno because the join at S2 needs it; dept resides at S3, so it is shipped from there.)

Semijoins
A semijoin is
• a join between two (or more) tables where
• one of the tables is just used to restrict the result, but not to provide any data
Example
List the names of employees whose departments are located in NY
  SELECT e.ename
  FROM emp e, dept d
  WHERE e.deptno = d.deptno AND d.loc = 'NY'
All the result data comes from the emp table.
The dept table is joined with emp simply to restrict the tuples chosen from the emp table.

Using Semijoins in Distributed Queries
1) Some data (generally the result of a query) is shipped from site Sa to site Sb
2) The shipped data is used in a semijoin with the data at Sb. This produces a subset of the data at Sb, restricted based on the data shipped from Sa
3) The result of the semijoin is shipped back to Sa, where it is combined with data already there
[Diagram: step 1 ships from Sa (data Da) to Sb (data Db); step 2 is the semijoin at Sb; step 3 ships the result back to Sa]
If S1 is the coordinator (where the results must end up), how can semijoins be used to produce a more efficient solution to the project manager query?
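The three steps above can be sketched in Python, with two in-memory SQLite databases standing in for sites Sa and Sb. This is only an illustration under stated assumptions: the helper make_site, the sample rows, and the temporary tables empl and emproj are all hypothetical, and "shipping" is modelled as passing Python lists between the two connections.

```python
import sqlite3

def make_site(schema, rows_by_table):
    """Create an in-memory database standing in for one site (hypothetical helper)."""
    db = sqlite3.connect(":memory:")
    db.executescript(schema)
    for table, rows in rows_by_table.items():
        marks = ",".join("?" * len(rows[0]))
        db.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    return db

# Site Sa holds emp; site Sb holds project (made-up sample rows).
sa = make_site(
    "CREATE TABLE emp(empno INT, ename TEXT, deptno INT, sal INT);",
    {"emp": [(1, "ADAMS", 10, 9000), (2, "BLAKE", 20, 5000), (3, "CLARK", 30, 9500)]},
)
sb = make_site(
    "CREATE TABLE project(pno INT, pname TEXT, mgr INT);",
    {"project": [(100, "Apollo", 1), (101, "Borealis", 2),
                 (102, "Corona", 3), (103, "Delta", 2)]},
)

# Step 1: Sa computes the join keys of its qualifying rows and "ships" them to Sb.
shipped_keys = sa.execute("SELECT empno FROM emp WHERE sal > 8000").fetchall()

# Step 2: Sb semijoins project with the shipped keys.
# Only project columns are returned; the keys just restrict the rows.
sb.execute("CREATE TEMP TABLE empl(empno INT)")
sb.executemany("INSERT INTO empl VALUES (?)", shipped_keys)
reduced = sb.execute(
    "SELECT mgr, pname FROM project "
    "WHERE mgr IN (SELECT empno FROM empl) ORDER BY mgr"
).fetchall()

# Step 3: the reduced rows are shipped back to Sa and joined with the data there.
sa.execute("CREATE TEMP TABLE emproj(mgr INT, pname TEXT)")
sa.executemany("INSERT INTO emproj VALUES (?, ?)", reduced)
result = sa.execute(
    "SELECT p.pname, e.ename FROM emp e JOIN emproj p ON p.mgr = e.empno "
    "ORDER BY p.pname"
).fetchall()
```

With this sample data, only two of the four project rows travel back to Sa. That reduction is the point of the semijoin: the cost of shipping a small key list is traded against shipping the whole project table.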

Using Semijoins
1. AT S1, COMPUTE emplet AS
     SELECT empno, ename, deptno FROM emp WHERE sal > 8000
   AT S1, COMPUTE empl AS
     SELECT empno FROM emplet
   SHIP empl FROM S1 TO S2
2. AT S2, COMPUTE emproj AS
     SELECT mgr AS pmgr, pname FROM project, empl
     WHERE mgr = empno ORDER BY mgr
   SHIP emproj FROM S2 TO S1
3. AT S3, COMPUTE deptlet AS
     SELECT deptno, dname FROM dept
   SHIP deptlet FROM S3 TO S1
4. AT S1, COMPUTE
     SELECT pname, ename, dname
     FROM emplet e, deptlet d, emproj p
     WHERE e.deptno = d.deptno AND e.empno = p.pmgr
Shipping empl to S2 limits the tuples from project that are sent back to S1.
[Diagram: S1 (emp), S2 (project), S3 (dept), with arrows numbered to match steps 1 to 4]

Planning Alternatives
Result-Based or Stream-Based
– Result-Based: A site waits until it receives the entire result set shipped to it before it can use it in a query
– Stream-Based: A query at a site will use data streamed to it as it arrives from another site (also called pipelining)
Sequential or Parallel
– Sequential: A site ships data to (or requests data from) one other site at a time
– Parallel: A site can ship data to (or request data from) multiple sites in parallel

Streaming & Pipelining
  AT S3, COMPUTE deptlet AS
    SELECT deptno, dname FROM dept
  SHIP deptlet FROM S3 TO S1
  AT S1, COMPUTE empdept AS
    SELECT empno, ename, dname FROM emp, deptlet
    WHERE emp.deptno = deptlet.deptno AND sal > 8000
    ORDER BY empno
  STREAM empdept FROM S1 TO S2
  AT S2, COMPUTE
    SELECT p.pname, ed.ename, ed.dname
    FROM project p, empdept ed
    WHERE p.mgr = ed.empno
[Diagram: S3 (dept) ships to S1 (emp), which streams to S2 (project)]
When would this approach be useful?
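The difference between stream-based and result-based execution can be sketched with Python generators. This is a minimal illustration, not a real network protocol: stream_empdept stands in for STREAM empdept FROM S1 TO S2, join_at_s2 stands in for the query at S2, and the sample rows and the rows_sent log are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

EmpDept = Tuple[int, str, str]   # (empno, ename, dname)
Project = Tuple[str, int]        # (pname, mgr)

rows_sent: List[int] = []        # records which empnos have left "S1" so far

def stream_empdept() -> Iterator[EmpDept]:
    """Yield empdept rows one at a time, ordered by empno (made-up rows)."""
    for row in [(1, "ADAMS", "SALES"), (3, "CLARK", "RESEARCH"), (7, "FORD", "SALES")]:
        rows_sent.append(row[0])
        yield row

def join_at_s2(stream: Iterator[EmpDept],
               projects: List[Project]) -> Iterator[Tuple[str, str, str]]:
    """Join each arriving empdept row against the local project table."""
    by_mgr: Dict[int, List[str]] = {}
    for pname, mgr in projects:
        by_mgr.setdefault(mgr, []).append(pname)
    for empno, ename, dname in stream:
        for pname in by_mgr.get(empno, []):
            yield (pname, ename, dname)

projects = [("Apollo", 1), ("Corona", 3), ("Delta", 2), ("Echo", 3)]
results = join_at_s2(stream_empdept(), projects)

first = next(results)                 # the first answer is available...
sent_before_first = list(rows_sent)   # ...after only one row has been streamed
remaining = list(results)
```

Here the first joined row is produced while most of empdept has not yet been sent, which is what matters when the goal is the fastest first result. A result-based plan would instead materialize the whole stream (for example with list(stream_empdept())) before starting the join.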

Parallelism & Streaming
  AT S1, COMPUTE empl AS
    SELECT empno FROM emp WHERE sal > 8000 ORDER BY empno
  AT S1, COMPUTE dempl AS
    SELECT DISTINCT deptno FROM emp WHERE sal > 8000
Do in parallel:
  Branch via S2
    STREAM empl FROM S1 TO S2
    AT S2, COMPUTE eproj AS
      SELECT mgr AS pmgr, pname FROM project p, empl e
      WHERE e.empno = p.mgr
    STREAM eproj FROM S2 TO S1
  Branch via S3
    STREAM dempl FROM S1 TO S3
    AT S3, COMPUTE deptlet AS
      SELECT d.deptno, dname FROM dept d, dempl e
      WHERE d.deptno = e.deptno
    STREAM deptlet FROM S3 TO S1
  AT S1, COMPUTE
    SELECT pname, ename, dname
    FROM emp e, eproj p, deptlet d
    WHERE e.deptno = d.deptno AND e.empno = p.pmgr
[Diagram: S1 (emp) streams to and from S2 (project) and S3 (dept) in parallel]

What's Best
Informally, we've talked about how query planning finds the best way to process the query, involving
• subqueries
• shipping/streaming
• parallel execution
But when we say "best", what do we actually mean?

Possible Query Plan Goals
  Fastest complete result
  Fastest first result
  Minimize usage of specific resources
  Combination of the above

Query Optimization
Build initial tree for query
– Build tree reflecting relational algebra corresponding to query
– Modify tree to account for fragmentation (more complex if distributed fragments overlap)
– Incorporate simplest ship operations into tree for accessing remote data
Perform global query optimization
– Apply transformation operators that produce an equivalent tree
– Account for pipelining & parallelism as well
– Use heuristic search algorithm (e.g. hill climbing, simulated annealing, genetic algorithms) to find best distributed query plan, considering replicas
– Use cost function incorporating time taken by I/O, CPU & communication (best if statistics on size of relations & result sets are maintained)

Global vs Local Query Optimization
Global Optimization produces
– A set of decomposed queries to be sent to various DB servers
– Combined with ship/stream instructions
– All placed in a parallel/sequential control flow graph
Local Optimization
– Each local server determines the best way to execute each decomposed query sent to it (though global optimization may generate preliminary plans)
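
The communication part of the cost function mentioned under Query Optimization can be sketched as follows. This is a deliberately simplified model that only compares "ship whole tables to one site" plans, as in the Alternative 1 vs. Alternative 2 discussion. The table sizes, the per-KB link costs, and the names TABLES, ship_cost and best_site are all assumptions made for illustration, and I/O, CPU and result size are ignored.

```python
from typing import Dict, Tuple

# Hypothetical home sites and sizes in KB (emp largest, dept smallest).
TABLES: Dict[str, Tuple[str, int]] = {
    "emp": ("S1", 5000),
    "project": ("S2", 800),
    "dept": ("S3", 50),
}

LinkCost = Dict[Tuple[str, str], float]   # (from_site, to_site) -> cost per KB

def ship_cost(target: str, link_cost: LinkCost, default: float = 1.0) -> float:
    """Cost of shipping every non-local table to `target`."""
    total = 0.0
    for site, size_kb in TABLES.values():
        if site != target:
            total += size_kb * link_cost.get((site, target), default)
    return total

def best_site(link_cost: LinkCost) -> str:
    """Pick the processing site with the lowest total shipping cost."""
    sites = sorted({site for site, _ in TABLES.values()})
    return min(sites, key=lambda s: ship_cost(s, link_cost))

uniform: LinkCost = {}                           # every link costs 1.0 per KB
slow_s2_to_s1: LinkCost = {("S2", "S1"): 10.0}   # slow line between S2 and S1

choice_uniform = best_site(uniform)
choice_slow = best_site(slow_s2_to_s1)
```

Under uniform link costs the model picks S1, matching the rule of thumb that the largest table should stay put. With a slow S2-to-S1 line it picks a different site, which illustrates why the rule of thumb alone is not enough. A real optimizer would search over far more plan shapes (semijoins, streaming, parallel branches) and include I/O and CPU terms.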