CSL 771: Database Implementation
Transaction Processing
Maya Ramanath
All material (including figures) from: Concurrency Control and Recovery in Database Systems, Phil Bernstein, Vassos Hadzilacos and Nathan Goodman (http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)

Transactions
• Interaction with the DBMS is through SQL, e.g.
    update Airlines
    set price = price - price*0.1, status = “cheap”
    where price < 5000
• A transaction is a unit of interaction

ACID Properties
• Atomicity
• Consistency
• Isolation
• Durability
The database system must ensure the ACID properties.

Atomicity and Consistency
• Single transaction
  – Execution of a transaction is “all-or-nothing”: either the transaction completes in its entirety, or it “does not even start”
  – As if the transaction never existed
  – No partial effects may be visible
• Two outcomes: a transaction COMMITs or ABORTs

Consistency and Isolation
• Multiple transactions
  – Concurrent execution can cause an inconsistent database state
  – Each transaction is executed as if isolated from the others

Durability
• If a transaction commits, the effects are permanent
• But durability has a bigger scope
  – Catastrophic failures (floods, fires, earthquakes)

What we will study…
• Concurrency Control
  – Ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently
• Recovery
  – Ensuring durability and consistency in case of software/hardware failures

Terminology
• Data item – a tuple, table, block
• Read (x)
• Write (x, 5)
• Start (T)
• Commit (T)
• Abort (T)
• Active transaction – a transaction which has neither committed nor aborted

High level model
Transaction 1, Transaction 2, …, Transaction n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk

Recoverability (1/2)
• Transaction T aborts
  – T wrote some data items
  – T' read items that T wrote
• The DBMS has to…
  – Undo the effects of T
  – Undo the effects of T'
  – But T' has already committed!

  T : Read (x)
  T : Write (x, k)
  T': Read (x)          (reads the value written by T)
  T': Write (y, k')
  T': Commit
  T : Read (y)
  T : Abort

Recoverability (2/2)
• Let T1, …, Tn be a set of transactions, where Ti reads a value written by Tk, k < i
• An execution of the transactions is recoverable if Ti commits only after all such Tk have committed

  Recoverable:      T1: Write (x, 2);  T2: Read (x);  T2: Write (y, 2);  T1: Commit;  T2: Commit
  Not recoverable:  T1: Write (x, 2);  T2: Read (x);  T2: Write (y, 2);  T2: Commit;  T1: Commit

Cascading Aborts (1/2)
• Because T was aborted, the transactions that read values written by T, directly or transitively (here T' and T''), also have to be aborted

  T  : Read (x)
  T  : Write (x, k)
  T' : Read (x)         (reads the value written by T)
  T' : Write (y, k')
  T'': Read (y)         (reads the value written by T')
  T  : Read (y)
  T  : Abort

Cascading Aborts (2/2)
• Recoverable executions do not prevent cascading aborts
• How can we prevent them, then?

  Cascading abort possible:  T1: Write (x, 2);  T2: Read (x);  T2: Write (y, 2);  T1: Commit;  T2: Commit
  No cascading abort:        T1: Write (x, 2);  T1: Commit;  T2: Read (x);  T2: Write (y, 2);  T2: Commit

What we learnt so far…
Reading a value and committing a transaction:

  Not recoverable:
  T1: Write (x, 2);  T2: Read (x);  T2: Write (y, 2);  T2: Commit;  T1: Commit

  Recoverable, but with possible cascading aborts:
  T1: Write (x, 2);  T2: Read (x);  T2: Write (y, 2);  T1: Commit;  T2: Commit

  Recoverable, without cascading aborts:
  T1: Write (x, 2);  T1: Commit;  T2: Read (x);  T2: Write (y, 2);  T2: Commit

Strict Schedule (1/2)
• “Undo”-ing the effects of a transaction = restore the before-image of the data item

  T1: Write (x, 1)
  T1: Write (y, 3)
  T2: Write (y, 1)
  T1: Commit
  T2: Read (x)
  T2: Abort             (undo T2: restore y to its before-image)

  Equivalent to:
  T1: Write (x, 1)
  T1: Write (y, 3)
  T1: Commit

  Final value of y: 3

Strict Schedule (2/2)
Initial value of x: 1

  T1: Write (x, 2)
  T2: Write (x, 3)
  T1: Abort
  T2: Abort

Should x be restored to 1 or to 3 when T1 aborts? Does T2's abort then restore x to 2, its before-image, even though neither write should survive? Restoring before-images gives inconsistent answers once uncommitted writes overwrite each other. Hence the rule for strict executions:

Do not read or write a value which has been written by an active transaction until that transaction has committed or aborted.
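The three kinds of execution above can be checked mechanically from a schedule. Below is a minimal Python sketch (my own illustration, not part of the course material; the tuple encoding of operations and the name `classify` are assumptions) that labels a schedule as recoverable, avoiding cascading aborts, and strict, following the definitions above. It tracks only the last uncommitted writer per item, a simplification that is enough for the small examples used here.

```python
# Operations are (txn, action, item) tuples; action is 'r', 'w', 'c' or 'a'.
def classify(schedule):
    recoverable, aca, strict = True, True, True
    committed = set()
    uncommitted_writes = {}   # item -> last uncommitted writer (simplification)
    reads_from = {}           # reader txn -> writers whose uncommitted data it read
    for t, action, item in schedule:
        if action == 'r':
            w = uncommitted_writes.get(item)
            if w is not None and w != t:
                aca = strict = False          # read an uncommitted value
                reads_from.setdefault(t, set()).add(w)
        elif action == 'w':
            w = uncommitted_writes.get(item)
            if w is not None and w != t:
                strict = False                # overwrote an uncommitted value
            uncommitted_writes[item] = t
        elif action in ('c', 'a'):
            if action == 'c':
                # recoverable: every txn we read from must already be committed
                if not reads_from.get(t, set()) <= committed:
                    recoverable = False
                committed.add(t)
            uncommitted_writes = {k: v for k, v in uncommitted_writes.items() if v != t}
    return recoverable, aca, strict

# w1[x] r2[x] w2[y] c2 c1  ->  (False, False, False): not recoverable
print(classify([(1, 'w', 'x'), (2, 'r', 'x'), (2, 'w', 'y'), (2, 'c', None), (1, 'c', None)]))
```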
The Lost Update Problem
Assume x is your account balance:

  T1: Read (x)
  T2: Read (x)
  T1: Write (x, 200,000)
  T1: Commit
  T2: Write (x, 200)
  T2: Commit

T2 overwrites x using the stale value it read earlier; T1's update is lost.

Serializable Schedules
• Serial schedule – simply execute transactions one after the other
• A serializable schedule is one which is equivalent to some serial schedule

SERIALIZABILITY THEORY

Serializable Schedules
  T1: op11, op12, op13
  T2: op21, op22, op23, op24
• Serial schedule – simply execute the transactions one after the other:
    op11, op12, op13, op21, op22, op23, op24
    op21, op22, op23, op24, op11, op12, op13
• Serializable schedule
  – Interleave operations
  – Ensure the end result is equivalent to some serial schedule

Notation
  r1[x] = Transaction 1, Read (x)
  w1[x] = Transaction 1, Write (x)
  c1 = Transaction 1, Commit
  a1 = Transaction 1, Abort
Example: r1[x], r1[y], w2[x], r2[y], c1, c2

Histories (1/3)
• The operations of a transaction T can be represented by a partial order
  – Example (from the book's figure): a transaction with operations r1[x], r1[y], w1[z] and c1, where c1 follows every other operation

Histories (2/3)
• Conflicting operations
  – Of two operations on the same data item, if at least one of them is a write, then the operations conflict
  – An order has to be specified for conflicting operations

Histories (3/3)
• Complete history
  T1 = r1[x] → w1[x] → c1
  T2 = r2[x] → w2[y] → w2[x] → c2
  T3 = r3[y] → w3[x] → w3[y] → w3[z] → c3
A complete history H1 over T = {T1, T2, T3} contains all operations of T1, T2 and T3, preserves each transaction's own order, and additionally orders the conflicting operations of different transactions (the vertical arrows in the book's figure):
  r2[x] → w2[y] → w2[x] → c2
  r3[y] → w3[x] → w3[y] → w3[z] → c3
  r1[x] → w1[x] → c1

Serializable Histories
• The goal: ensure that the interleaving of operations guarantees a serializable history
• The method
  – When are two histories equivalent?
  – When is a history serial?

Equivalence of Histories (1/2)
H ≅ H' if
1. they are defined over the same set of transactions and have the same operations
2. they order conflicting operations the same way

Equivalence of Histories (2/2)
(Example histories: see the figure in the source.)
Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman

Serial History
• A complete history is serial if, for every pair of transactions Ti and Tk, either all operations of Ti occur before all operations of Tk, or all operations of Tk occur before all operations of Ti
• A history is serializable if its committed projection is equivalent to a serial history

Serialization Graph
Let H be a history.
  SG(H) = (V, E), where V = {T1, …, Tn} and
  E = {(Ti, Tj) | there exist conflicting operations opi in Ti and opj in Tj such that opi precedes opj in H}
For the history H1 above, SG(H1) has nodes T1, T2, T3, with edges given by the orderings of their conflicting operations (see the figure in the source).

Serializability Theorem
A history H is serializable if and only if its serialization graph SG(H) is acyclic.

On your own
How do recoverability, strict schedules and cascading aborts fit into the big picture?
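As an illustration of the Serializability Theorem, here is a small Python sketch (not from the book; the list-of-tuples encoding of a history is an assumption) that builds SG(H) from a totally ordered history and checks it for cycles with a depth-first search. For simplicity it looks at all operations rather than only the committed projection.

```python
from collections import defaultdict

def serialization_graph(history):
    """history: list of (txn, action, item); action is 'r', 'w', 'c' or 'a'."""
    edges = defaultdict(set)
    seen = []                                  # read/write operations in history order
    for t, action, item in history:
        if action not in ('r', 'w'):
            continue
        for t2, a2, i2 in seen:
            # conflicting: same item, different transactions, at least one write
            if i2 == item and t2 != t and 'w' in (action, a2):
                edges[t2].add(t)               # earlier transaction -> later transaction
        seen.append((t, action, item))
    return edges

def has_cycle(edges):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)                  # default WHITE
    def visit(u):
        colour[u] = GREY
        for v in edges[u]:
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False
    return any(colour[u] == WHITE and visit(u) for u in list(edges))

# r1[x] w2[x] w2[y] w1[y]: edges T1 -> T2 (on x) and T2 -> T1 (on y), a cycle,
# so the history is not serializable.
H = [(1, 'r', 'x'), (2, 'w', 'x'), (2, 'w', 'y'), (1, 'w', 'y')]
print(has_cycle(serialization_graph(H)))       # True
```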
LOCKING

High level model
Transaction 1, Transaction 2, …, Transaction n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk

Transaction Management
• The Transaction Manager receives transactions (Transaction 1, Transaction 2, Transaction 3, …, Transaction n) and sends their operations to the scheduler, e.g. Read1(x), Write2(y,k), Read2(x), Commit1
• For each operation, the Scheduler may
  – Execute the op
  – Reject the op
  – Delay the op
  before it reaches the disk

Locking
• Each data item x has a lock associated with it
• If T wants to access x
  – The scheduler first acquires a lock on x on T's behalf
  – Only one transaction can hold a lock on x
• T releases the lock after processing
Locking is used by the scheduler to ensure serializability.

Notation
• Read lock and write lock: rl[x], wl[x]
• Obtaining read and write locks for transaction Ti: rli[x], wli[x]
• Lock table – entries of the form [x, r, Ti]
• Conflicting locks – pli[x] and qlk[y] conflict if x = y and the operations p and q conflict (i ≠ k)
• Unlock: rui[x], wui[x]

Basic 2-Phase Locking (2PL)
• Rule 1: On receiving pi[x], check whether some qlk[x] is set such that p and q conflict. If yes, pi[x] is delayed; if no, acquire pli[x] and schedule pi[x].
• Rule 2: pli[x] cannot be released until pi[x] is completed.
• Rule 3 (2-phase rule): Once a lock is released, no other locks may be obtained.

The 2-phase rule
Once a lock is released no other locks may be obtained.
  T1: r1[x] w1[y] c1
  T2: w2[x] w2[y] c2
  H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
Here T1 obtains wl1[y] after releasing ru1[x], violating the 2-phase rule, and H is not serializable: r1[x] precedes w2[x] but w2[y] precedes w1[y], so SG(H) contains the cycle T1 → T2 → T1.

Correctness of 2PL
2PL always produces serializable histories.
Proof outline
  STEP 1: Characterize the properties of the scheduler
  STEP 2: Prove that any history with these properties is serializable (that is, SG(H) is acyclic)

Deadlocks (1/2)
  T1: r1[x] w1[y] c1
  T2: w2[y] w2[x] c2
Scheduler: rl1[x] wl2[y] r1[x] w2[y] <cannot proceed>
T1 now needs wl1[y], which conflicts with wl2[y]; T2 needs wl2[x], which conflicts with rl1[x]. Neither can proceed.

Deadlocks (2/2)
Strategies to deal with deadlocks
• Timeouts
  – Leads to inefficiency
• Detecting deadlocks
  – Maintain a wait-for graph; a cycle indicates a deadlock
  – Once a deadlock is detected, break the cycle by aborting a transaction
• New problem: starvation

Conservative 2PL
• Avoids deadlocks altogether
  – T declares its readset and writeset
  – The scheduler tries to acquire all required locks
  – If not all locks can be acquired, T waits in a queue
• T never “starts” until all its locks are acquired
  – Therefore, it can never be involved in a deadlock

On your own
Strict 2PL (2PL which ensures only strict schedules)
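A rough Python sketch of the Basic 2PL rules above (the class name and structure are my own, not the book's): conflicting requests are reported back to the caller to be delayed, Rule 2 is left to the caller, and the two-phase rule is enforced by refusing new locks to any transaction that has already released one.

```python
class TwoPhaseLockManager:
    def __init__(self):
        self.lock_table = {}      # item -> list of (txn, mode) entries, mode 'r' or 'w'
        self.shrinking = set()    # transactions that have released at least one lock

    def _conflicts(self, item, txn, mode):
        # Two locks on the same item conflict if held by different txns and one is a write.
        return any(t != txn and 'w' in (mode, m)
                   for t, m in self.lock_table.get(item, []))

    def acquire(self, txn, item, mode):
        """Rule 1 + Rule 3: grant pl_i[x] only if no conflicting lock is set
        and txn has not yet entered its shrinking phase."""
        if txn in self.shrinking:
            raise RuntimeError("two-phase rule violated: %s already released a lock" % txn)
        if self._conflicts(item, txn, mode):
            return False          # caller must delay the operation
        self.lock_table.setdefault(item, []).append((txn, mode))
        return True

    def release(self, txn, item):
        """Rule 2 is the caller's job: release only after the operation has completed."""
        self.lock_table[item] = [(t, m) for t, m in self.lock_table[item] if t != txn]
        self.shrinking.add(txn)

lm = TwoPhaseLockManager()
print(lm.acquire("T1", "x", "r"))   # True
print(lm.acquire("T2", "x", "w"))   # False: conflicts with T1's read lock
```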
Extra Information
• Assumption: data items are organized in a tree
• Can we come up with a better (more efficient) protocol?

Tree Locking Protocol (1/3)
• Rule 1: On receiving ai[x], check whether some other transaction holds alk[x]. If yes, ai[x] is delayed; if no, acquire ali[x] and schedule ai[x].
• Rule 2: If x is not the root and y is the parent of x, then ali[x] can be obtained only if Ti holds ali[y].
• Rule 3: ali[x] cannot be released until ai[x] is completed.
• Rule 4: Once a lock is released, the same lock may not be re-obtained.

Tree Locking Protocol (2/3)
• Proposition: If Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk.
• Theorem: The Tree Locking Protocol always produces serializable schedules.

Tree Locking Protocol (3/3)
• The Tree Locking Protocol avoids deadlock
• Releases locks earlier than 2PL
BUT
• Needs to know the access pattern to be effective
• Transactions should access nodes from root-to-leaf

Multi-granularity Locking (1/3)
• Granularity
  – Refers to the relative size of the data item
  – Attribute, tuple, table, page, file, etc.
• Efficiency depends on the granularity of locking
• Allow transactions to lock at different granularities

Multi-granularity Locking (2/3)
• Lock instance graph
• Explicit and implicit locks
• Intention read and intention write locks
• Intention locks conflict with explicit read and write locks (iwl conflicts with rl and wl; irl conflicts only with wl) but not with other intention locks
Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman

Multi-granularity Locking (3/3)
• To set rli[x] or irli[x], first hold irli[y] or iwli[y], where y is the parent of x
• To set wli[x] or iwli[x], first hold iwli[y], where y is the parent of x
• To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x, or y is an ancestor of x
• To release irli[x] (or iwli[x]), no child of x may be locked by Ti

The Phantom Problem
• How to lock a tuple which (currently) does not exist?
  T1: r1[x1], r1[x2], r1[X], c1
  T2: w2[x3], w2[X], c2
  rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl2[X], w2[x3], w2[X], wu2[x3, X], c2, rl1[X], r1[X], ru1[x1, x2, X], c1
Here X stands for the set of all qualifying tuples and x3 is a new tuple inserted by T2: T1 cannot lock x3 in advance because it does not exist when T1 begins. This is the phantom problem.

NON-LOCK-BASED SCHEDULERS

Timestamp Ordering (1/3)
• Each transaction is associated with a timestamp
  – Ti indicates transaction T with timestamp i
• Each operation in the transaction has the same timestamp

Timestamp Ordering (2/3)
TO Rule: If pi[x] and qk[x] are conflicting operations, then pi[x] is processed before qk[x] iff i < k.
Theorem: If H is a history representing an execution produced by a TO scheduler, then H is serializable.

Timestamp Ordering (3/3)
• For each data item x, maintain: max-rt(x), max-wt(x), c(x)
• Request ri[x]
  – Grant the request if TS(i) >= max-wt(x) and c(x); update max-rt(x)
  – Delay if TS(i) >= max-wt(x) and not c(x)
  – Else abort and restart Ti
• Request wi[x]
  – Grant the request if TS(i) >= max-wt(x) and TS(i) >= max-rt(x); update max-wt(x), set c(x) = false
  – Else abort and restart Ti
ON YOUR OWN: the Thomas write rule, and the actions taken when a transaction has to commit or abort
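The bookkeeping for the TO scheduler above can be sketched as follows (a simplified illustration; the class name and the 'grant'/'delay'/'abort' return values are my own, and the Thomas write rule is deliberately left out, as in the slide).

```python
class TOState:
    def __init__(self):
        self.max_rt = {}    # item -> largest read timestamp so far
        self.max_wt = {}    # item -> largest write timestamp so far
        self.c = {}         # item -> True if the last writer of the item has committed

    def read(self, ts, x):
        if ts < self.max_wt.get(x, 0):
            return 'abort'                       # x was already written by a "later" txn
        if not self.c.get(x, True):
            return 'delay'                       # last writer has not yet committed
        self.max_rt[x] = max(self.max_rt.get(x, 0), ts)
        return 'grant'

    def write(self, ts, x):
        if ts < self.max_wt.get(x, 0) or ts < self.max_rt.get(x, 0):
            return 'abort'                       # would invalidate an earlier read or write
        self.max_wt[x] = ts
        self.c[x] = False                        # commit bit stays false until the writer commits
        return 'grant'

to = TOState()
print(to.write(5, 'x'))   # grant
print(to.read(3, 'x'))    # abort: TS 3 < max-wt(x) = 5
```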
Validation
• Aggressively schedule all operations
• Do not commit until the transaction is “validated”
ON YOUR OWN

Summary
• Lock-based schedulers
  – 2-Phase Locking
  – Tree Locking Protocol
  – Multi-granularity Locking
  – Locking in the presence of updates
• Non-lock-based schedulers
  – Timestamp Ordering
  – Validation-based concurrency control (on your own)
SOURCE: Database Systems: The Complete Book. Garcia-Molina, Ullman and Widom

RECOVERY

Logging
• Log the operations in the transaction(s)
• Believe the log
  – Does the log say transaction T has committed?
  – Or does it say aborted?
  – Or does it have only a partial trace (implicit abort)?
• In case of failures, reconstruct the DB from its log

The basic setup
Transactions T1, T2, T3, …, Tk each have their own buffer space; there is shared buffer space for data and for the LOG; data and log pages are eventually written to the disk.

Terminology
• Data item: an element which can be read or written – a tuple, relation, B+-tree index, etc.
• Input x: fetch x from the disk into the buffer
• Read x, t: read x into local variable t
• Write x, t: write the value of t into x
• Output x: write x to disk

Example
  update Airlines
  set price = price - price*0.1, status = “cheap”
  where price < 5000

  Read P, x;  x = x - x*0.1;  Write P, x
  Read S, y;  y = “CHEAP”;    Write S, y
  Output P
  Output S

The system may fail at any of these points, for example after the Writes but before Output P, between Output P and Output S, or after both Outputs.

Logs
• A sequence of log records
• Need to keep track of
  – Start of transaction
  – Update operations (Write operations)
  – End of transaction (COMMIT or ABORT)
• “Believe” the log; use the log to reconstruct a consistent DB state

Types of logs
• Undo logs
  – Ensure that uncommitted transactions are rolled back (or undone)
• Redo logs
  – Ensure that committed transactions are redone
• Undo/Redo logs
  – Both of the above
All 3 logging styles ensure atomicity and durability.

Undo Logging (1/3)
• <START T>: start of transaction T
• <COMMIT T>
• <ABORT T>
• <T, A, x>: transaction T modified A, whose before-image is x

Undo Logging (2/3)
  <START T>
  Read P, x;  x = x - x*0.1;  Write P, x      log: <T, P, x>
  Read S, y;  y = “CHEAP”;    Write S, y      log: <T, S, y>
  FLUSH LOG
  Output P
  Output S
  <COMMIT T>
  FLUSH LOG
Rules:
  U1: <T, X, v> must be flushed before Output X
  U2: <COMMIT T> must be flushed only after all Outputs

Undo Logging (3/3)
• Recovery with an undo log
  1. If T has a <COMMIT T> entry, do nothing
  2. If T has a <START T> entry but no <COMMIT T>
     – T is incomplete and needs to be undone
     – Restore old values from the <T, X, v> records
• There may be multiple transactions
  – Start scanning from the end of the log

Redo Logging (1/3)
• All incomplete transactions can be ignored
• Redo all completed transactions
• <T, A, x>: transaction T modified A, whose after-image is x

Redo Logging (2/3)
  <START T>
  Read P, x;  x = x - x*0.1;  Write P, x      log: <T, P, x>
  Read S, y;  y = “CHEAP”;    Write S, y      log: <T, S, y>
  <COMMIT T>
  FLUSH LOG
  Output P
  Output S
Rule:
  R1 (write-ahead logging): <T, X, v> and <COMMIT T> must be flushed before Output X

Redo Logging (3/3)
• Recovery with redo logging
  – If T has a <COMMIT T> entry, redo T
  – If T is incomplete, do nothing (add <ABORT T>)
• For multiple transactions
  – Scan from the beginning of the log

Undo/Redo Logging (1/3)
• Undo logging: cannot COMMIT T unless all updates are written to disk
• Redo logging: cannot release memory unless the transaction commits
• Undo/Redo logs attempt to strike a balance

Undo/Redo Logging (2/3)
  <START T>
  Read P, x;  x = x - x*0.1;  Write P, x      log: <T, P, x, a>
  Read S, y;  y = “CHEAP”;    Write S, y      log: <T, S, y, b>
  FLUSH LOG
  Output P
  Output S
  <COMMIT T>      (may be flushed before or after the Outputs)
Rules:
  UR1: <T, X, a, b> (before- and after-image) must be flushed before Output X
  (Compare: U1: <T, X, v> flushed before Output X; U2: <COMMIT T> flushed after all Outputs; R1: <T, X, v> and <COMMIT T> flushed before Output X)

Undo/Redo Logging (3/3)
• Recovery with undo/redo logging
  – Redo all committed transactions (earliest first)
  – Undo all uncommitted transactions (latest first)
What happens if there is a crash while you are writing a log record?
What happens if there is a crash during recovery?

Checkpointing
• Logs can be huge… can we throw away portions of them?
• Can we avoid processing the whole log when there is a crash?
ON YOUR OWN
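Putting the undo/redo recovery procedure above into code, here is a compact Python sketch (my own illustration; the tuple layout of the log records is an assumption, not the notation used in the slides): committed transactions are redone scanning the log forward, uncommitted ones are undone scanning backward.

```python
# Log records: ('START', T), ('COMMIT', T), ('ABORT', T),
# ('UPDATE', T, X, before, after)  -- before- and after-image of X.
def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == 'COMMIT'}
    # Redo committed transactions, earliest record first
    for rec in log:
        if rec[0] == 'UPDATE' and rec[1] in committed:
            _, t, x, before, after = rec
            db[x] = after
    # Undo uncommitted transactions, latest record first
    for rec in reversed(log):
        if rec[0] == 'UPDATE' and rec[1] not in committed:
            _, t, x, before, after = rec
            db[x] = before
    return db

log = [('START', 'T1'), ('UPDATE', 'T1', 'P', 5000, 4500),
       ('START', 'T2'), ('UPDATE', 'T2', 'S', 'normal', 'cheap'),
       ('COMMIT', 'T1')]
print(recover(log, {'P': 9999, 'S': 'cheap'}))   # P redone to 4500, S undone to 'normal'
```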