Transaction Management in HDBMSs
©2003 Vera Goebel & Denise Ecklund

HDBS Transaction Model
[Figure: global transactions GTi and GTj are submitted to the GTM (global transaction manager), which splits them into global subtransactions { GSTi1, GSTj1, GSTi2, GSTj2 }. Each local DBMS (DBMS 1 ... DBMS n) is reached through a server that acts as a proxy for the GTM. Local transactions (LTk, LTl, LTm, LTn) run at the local DBMSs.]

Transaction Management
• Local transactions: access data at a single site, outside of the global HDBS control.
• Global transactions: are executed under the HDBS control.

Local DBMSs have three types of autonomy:

Autonomy Type | Definition | Resulting Problem
Design | No changes can be made to the local DBMS software to support the HDBMS | Non-serializable schedule for global transactions
Execution | Each local DBMS controls execution of global subtransactions and local transactions (the commit/abort decision) | Non-atomic & non-durable global transactions
Communication | Local DBMSs do not communicate with each other and they do not exchange execution control information | Distributed deadlock cannot be detected

Global Serializability Problem
• GTM is responsible for
  – A serializable schedule for the set of global transactions
  – Coordination of submission and execution of global subtransactions among the local DBMSs
• Serializing the global schedule?

[Figure: GT1 is split into GST11 and GST12, GT2 into GST21, GST22 and GST23; the subtransactions execute at Local DBMS-1, Local DBMS-2 and Local DBMS-3.]

If GST11 < GST22 at site DBMS-1 (GT1 < GT2),
then it must be the case that GST12 < GST23 at site DBMS-2.
If GST23 < GST12 at site DBMS-2 (GT2 < GT1) while GT1 < GT2 at DBMS-1: a non-serializable schedule!
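The ordering argument above can be checked mechanically. The following is a minimal sketch, not taken from the slides: it assumes each site could report the order in which it serialized the global transactions (information an autonomous local DBMS normally does not hand out), merges those orders into one precedence graph, and looks for a cycle. The function name `globally_serializable` and the input format are hypothetical.

```python
from itertools import combinations

def globally_serializable(local_orders):
    """local_orders maps a site name to the list of global transactions,
    in the order that site serialized their subtransactions (assumed input)."""
    # Add an edge GTa -> GTb for every pair that some site ordered a before b.
    edges = {}
    for order in local_orders.values():
        for a, b in combinations(order, 2):
            edges.setdefault(a, set()).add(b)
            edges.setdefault(b, set())

    # Depth-first search; meeting a GREY node again means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in edges}

    def has_cycle(node):
        color[node] = GREY
        for nxt in edges[node]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and has_cycle(nxt):
                return True
        color[node] = BLACK
        return False

    return not any(color[n] == WHITE and has_cycle(n) for n in list(edges))

# Both sites agree on GT1 < GT2.
consistent = {"DBMS-1": ["GT1", "GT2"], "DBMS-2": ["GT1", "GT2"]}
# DBMS-1 orders GT1 < GT2, DBMS-2 orders GT2 < GT1.
conflicting = {"DBMS-1": ["GT1", "GT2"], "DBMS-2": ["GT2", "GT1"]}

print(globally_serializable(consistent))   # True
print(globally_serializable(conflicting))  # False
```

The second case is the situation on the slide: each local order is fine on its own, but the merged graph contains the cycle GT1 < GT2 < GT1.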
Local Transactions and the Global Serializable Schedule
• Local transactions execute outside the control of the GTM
• Local transactions create indirect conflicts with global transactions
• GTM is not aware of local transactions and these indirect conflicts
• In general, the GTM cannot ensure global serializability

Example (LDBMS-1 stores a and b; LDBMS-2 stores c and d):
GT1: r1(a) r1(c)
GT2: r2(b) r2(d)
LT3: w3(a) w3(b)
LT4: w4(c) w4(d)
The GTM believes GT1 < GT2 at both sites.
LDBMS-1: r1(a) c1 w3(a) w3(b) c3 r2(b) c2  =>  LDBMS-1: GT1 < LT3 < GT2
LDBMS-2: w4(c) r1(c) c1 r2(d) c2 w4(d) c4  =>  LDBMS-2: GT2 < LT4 < GT1

Controlling the Execution Order of Global Subtransactions
• Four Strategies:
1) Execute global transactions serially
  • No concurrent execution for global transactions!
  • Does not solve indirect conflicts with local transactions
  • Costs: heavy CC processing at the GTM, low query processing throughput
2) Define a specific order over the global transactions and use the concurrency control mechanism of each local DBMS to enforce that order
  • Every local DB stores one "ticket" object
  • Extend every global subtransaction to access the ticket
      GT1: r1(a) w1(a)
      GT2: r2(b) w2(b)
      newGT1: r1(ticketS1) r1(a) w1(a) w1(ticketS1) c1
      newGT2: r2(ticketS1) r2(b) w2(b) w2(ticketS1) c2
  • Because both subtransactions now read and write the same ticket, they conflict directly at S1. This means GT1 and GT2 will be correctly serialized with respect to all global transactions and all local transactions executed by the local DBMS at S1
3) Use local DBs deploying rigorous CC algorithms
  • If all LDBMSs use rigorous 2-phase locking (R2PL) and support a "prepare-to-commit" interface, then
    – Global transactions are serializable without a CC algorithm at the GTM
    – Local transactions cannot cause indirect conflicts
  Ex: (w4(c) r1(c) c1 r2(d) c2 w4(d) c4) is not a rigorous local schedule.
  In R2PL, T4 holds all locks until commit, so T1 cannot read object c until after T4 commits.
4) Relax the serializability requirement
  • Use "strong correctness" instead
  • Most indirect conflicts have no effect on correctness

Alternative Consistency Models
• Global schedule is not serializable; it is strongly correct
  – Global transactions preserve all data consistency constraints

Constraint-based strategies
• Local serializability: Some HDBS applications have no global constraints because each DBS is (and should be) independent of the others => no global concurrency control mechanism needed. So, local serializability ensures strong correctness of global executions.
  Ex application: travel reservation service for planes, trains, ferries, hotels, etc.
• Limited global constraints: Some applications need global constraints. Define 2 types of data: global data and local data. Global constraints may only span global data, and local transactions may not write to global data. Use two-level serializability (2LSR): local-SR and global-SR.
  Artificial solution: the local site has no autonomy over or direct access to global data; the local site must submit transactions to the GTM to update global data stored at the local site => master-slave relationship.
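Returning to the indirect-conflict example on the slide "Local Transactions and the Global Serializable Schedule", the two local schedules can be analyzed with a small conflict-graph check. This is an illustrative sketch only: the helper names `conflict_edges` and `has_cycle` are hypothetical, and the parser assumes the slide notation `r1(a)`, `w3(b)`, `c1`.

```python
import re

def conflict_edges(schedule):
    """Return (Ti, Tj) pairs where an operation of Ti precedes a conflicting
    operation of Tj: same item, different transactions, at least one write."""
    # Commit markers such as "c1" have no parentheses and are skipped.
    ops = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    edges = set()
    for i, (kind_a, txn_a, item_a) in enumerate(ops):
        for kind_b, txn_b, item_b in ops[i + 1:]:
            if item_a == item_b and txn_a != txn_b and "w" in (kind_a, kind_b):
                edges.add((txn_a, txn_b))
    return edges

def has_cycle(edges):
    """Detect a cycle by repeatedly removing nodes without incoming edges."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if not any(dst == n for _, dst in edges)}
        if not sources:
            return True  # every remaining node has a predecessor
        nodes -= sources
        edges = {(s, d) for s, d in edges if s not in sources}
    return False

ldbms1 = "r1(a) c1 w3(a) w3(b) c3 r2(b) c2"
ldbms2 = "w4(c) r1(c) c1 r2(d) c2 w4(d) c4"
e1, e2 = conflict_edges(ldbms1), conflict_edges(ldbms2)

print(sorted(e1))  # [('1', '3'), ('3', '2')]
print(sorted(e2))  # [('2', '4'), ('4', '1')]
print(has_cycle(e1), has_cycle(e2), has_cycle(e1 | e2))  # False False True
```

Neither site shows a direct edge between transactions 1 and 2, which is why the GTM sees no problem. The cycle only appears once the edges through the local transactions 3 and 4 are combined across both sites.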
Alternative Consistency Models

Non-constraint-based strategies
• Diverge from strong correctness and serializability
1) Epsilon Serializability
  • Allows a specified number of non-serializable conflicts
2) Sets of Compatible Transactions
  • Assume a set of known transactions
  • Pre-analyze the transactions for conflicts
  • Group non-conflicting transactions into compatible sets
  • No concurrency control required among transactions in a compatible set

Global Atomicity and Recovery Problem
• The GTM must guarantee that a global transaction commits at all sites or aborts at all sites
• Local DBMSs wish to preserve their execution autonomy
  – May not implement or export a "prepare-to-commit" interface

[Figure: GT1 is split into GST11 and GST12. The GTM runs 2PC with each GTM proxy, but there is no 2PC between a proxy and its LDBMS. One LDBMS aborts GST11 while the other commits GST12.]

• A local DBMS can unilaterally abort a subtransaction anytime
  – Results in non-atomic global transactions and incorrect global schedules
  – Local transactions and global subtransactions see committed partial results

Note: The first heterogeneous systems did not support update transactions!
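The atomicity problem can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a real protocol implementation: `LocalDBMS`, its `exports_prepare` and `will_abort` flags, and `run_global_transaction` are hypothetical stand-ins for a local DBMS, its prepare-to-commit capability, a unilateral abort, and the GTM.

```python
class LocalDBMS:
    """Hypothetical stand-in for a local DBMS executing one subtransaction."""

    def __init__(self, name, exports_prepare, will_abort=False):
        self.name = name
        self.exports_prepare = exports_prepare  # offers prepare-to-commit?
        self.will_abort = will_abort            # models a unilateral abort
        self.state = "active"

    def prepare(self):
        # Vote phase: a site that votes yes promises it can commit later.
        if self.will_abort:
            self.state = "aborted"
            return False
        self.state = "prepared"
        return True

    def commit(self):
        # An unprepared site is still free to abort on its own.
        if self.state != "prepared" and self.will_abort:
            self.state = "aborted"
        else:
            self.state = "committed"

    def abort(self):
        if self.state != "committed":
            self.state = "aborted"


def run_global_transaction(sites):
    """Toy GTM: use 2PC if every site exports prepare, else just commit."""
    if all(s.exports_prepare for s in sites):
        votes = [s.prepare() for s in sites]
        for s in sites:
            s.commit() if all(votes) else s.abort()
    else:
        # No vote phase is possible, so outcomes may diverge.
        for s in sites:
            s.commit()
    return {s.name: s.state for s in sites}


with_2pc = run_global_transaction([
    LocalDBMS("LDBMS-1", exports_prepare=True, will_abort=True),
    LocalDBMS("LDBMS-2", exports_prepare=True),
])
without_2pc = run_global_transaction([
    LocalDBMS("LDBMS-1", exports_prepare=False, will_abort=True),
    LocalDBMS("LDBMS-2", exports_prepare=False),
])
print(with_2pc)     # {'LDBMS-1': 'aborted', 'LDBMS-2': 'aborted'}
print(without_2pc)  # {'LDBMS-1': 'aborted', 'LDBMS-2': 'committed'}
```

The second result mirrors the figure: GST11 is aborted while GST12 is committed, so the global transaction is not atomic.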
Approaches to Achieve Atomicity and Durability
• If all LDBMSs export a "prepare-to-commit" interface, then use 2PC between the proxy and the LDBMS
• If some LDBMSs do not export "prepare-to-commit", then four approaches:
1) Modify each global subtransaction to "call back to the proxy" just before local commit
  • Blocks the global subtransaction until the GTM completes 2PC with the proxies
  • Possible only if the LDBMS supports a client callback service
  • Fails if the LDBMS uses optimistic concurrency control
  [Figure: 2PC runs between the GTM and the GTM proxy; there is no 2PC between the proxy and the LDBMS.]

• If any global subtransaction aborts:
2) REDO failed write operations from global subtransactions
  - Performed by the proxy, which must maintain a local redo log
3) RETRY failed global subtransactions (read & write operations)
  - Performed by the proxy
  - Inappropriate semantics for many applications or transactions
  - No guarantee that the retry can ever be committed
    Ex: Banking application – withdrawing money can fail "forever"
4) UNDO committed global subtransactions by executing compensating transactions
  - Performed by the GTM
  - Can provide semantic atomicity (called a saga)

Inconsistent data is temporarily visible to other transactions!
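Approach 4 can be sketched as follows. This is a simplified illustration, assuming every subtransaction comes with a hand-written compensating action; `run_saga`, the account dictionary and the step functions are hypothetical names introduced only for this example.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation).
    action() returns True if the subtransaction committed locally."""
    committed = []
    for name, action, compensate in steps:
        if action():
            committed.append((name, compensate))
        else:
            # Undo the already committed subtransactions in reverse order.
            for _done_name, undo in reversed(committed):
                undo()
            return False
    return True


accounts = {"S1": 100, "S2": 50}
log = []

def debit_s1():
    accounts["S1"] -= 30          # committed at site S1 immediately
    log.append("debit S1 committed")
    return True

def undo_debit_s1():
    accounts["S1"] += 30          # compensating transaction
    log.append("debit S1 compensated")

def credit_s2():
    log.append("credit S2 aborted by LDBMS")
    return False                  # models a unilateral abort at site S2

ok = run_saga([
    ("GST11", debit_s1, undo_debit_s1),
    ("GST12", credit_s2, lambda: None),
])
print(ok, accounts)  # False {'S1': 100, 'S2': 50}
print(log)
```

Between the debit and its compensation, other transactions at S1 can read the balance 70. That window is the temporarily visible inconsistent data the slide warns about.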
Global Deadlock Problem
• Same problem as in distributed homogeneous DBMSs

[Figure: global deadlock between Site X (subtransactions T1x, T2x) and Site Y (subtransactions T1y, T2y). The subtransactions hold locks (La, Lb, Lx, Ly), wait for locks held by the other transaction to be released, and wait for their sibling subtransactions at the other site to complete, which forms a cycle.]

• In distributed homogeneous DBMSs, we solved the problem by exchanging lock information to construct the global "waits-for" graph
  – In an HDBS this violates design autonomy and communication autonomy
• Therefore the GTM will be unaware of a global deadlock.
• There are no complete solutions to the global deadlock problem for autonomous multi-database systems.

Status: Transaction Management for HDBS
• Transaction management for HDBSs is a very active research area.
• Distributed transactions over the Internet define new semantics for transaction consistency, allowing development of new solutions.

Open issues:
• What can be done if some of the local subsystems (e.g., file systems) do not support transaction management?
• Performance implications of the transaction management strategy?
• Handling of different degrees of consistency?

Conclusions
An HDBS allows a uniform view on the combination of data maintained by different autonomous database systems.
• available: prototypes & commercial products with a set of fixed / specific drivers (so-called gateways) for existing, widely used data management systems (conventional DBS and file systems)
• missing: systematic support for individual integration of arbitrary data management systems
  – Examples: geographical DBs, multimedia DBs, Internet storefronts, etc.