Quick Review of Apr 29 material
• Transformation of Relational Expressions
  – Equivalence Rules
• Transactions
  – ACID properties (Atomic, Consistent, Isolated, Durable)
  – Transaction States (active, partially committed, failed, committed, aborted)
  – Concurrent Execution and Serializability

Serial and Interleaved Schedules
• Serial schedule
• Interleaved schedule

Another Interleaved Schedule
• The previous serial and interleaved schedules ensured database consistency (A+B before execution = A+B after execution)
• The interleaved schedule on the right is only slightly different, but does not ensure a consistent result:
  – assume A=1000 and B=2000 before execution (sum = 3000)
  – after execution, A=950 and B=2150 (sum = 3100)

Inconsistent Transaction Schedules
• So what caused the problem? What makes one concurrent schedule consistent, and another one inconsistent?
  – Operations on data within a transaction are not relevant, as they run on copies of the data residing in the transaction's local buffers
  – For scheduling purposes, the only significant operations are read and write
• Given two transactions, Ti and Tj, both attempting to access data item Q:
  – if Ti and Tj are both executing read(Q) statements, order does not matter
  – if Ti is doing write(Q) and Tj read(Q), then order does matter
  – same if Tj is writing and Ti reading
  – if both Ti and Tj are executing write(Q), then order might matter if there are any subsequent operations in Ti or Tj accessing Q

Transaction Conflicts
• Two operations on the same data item by different transactions are said to be in conflict if at least one of the operations is a write
• If two consecutive operations of different transactions in a schedule S are not in conflict, then we can swap the two to produce another schedule S' that is conflict equivalent with S
• A schedule S is serializable if it is conflict equivalent (after some series of swaps) to a serial schedule

Transaction Conflict Example
• Example: read/write(B) in T0 do not conflict with read/write(A) in T1, so those operations can be swapped to transform the interleaved schedule into the serial one

  Interleaved schedule:
  T0          T1
  read(A)
  write(A)
              read(A)
  read(B)
              write(A)
  write(B)
              read(B)
              write(B)

  Conflict-equivalent serial schedule (after the swaps):
  T0          T1
  read(A)
  write(A)
  read(B)
  write(B)
              read(A)
              write(A)
              read(B)
              write(B)

Serializability Testing (15.9) and Precedence Graphs
• So we need a simple method to test a schedule S and discover whether it is serializable.
• A simple method involves constructing a directed graph called a precedence graph from S
• Construct a precedence graph as follows:
  – a vertex labelled Ti for every transaction in S
  – an edge from Ti to Tj if any of these three conditions holds:
    • Ti executes write(Q) before Tj executes read(Q)
    • Ti executes read(Q) before Tj executes write(Q)
    • Ti executes write(Q) before Tj executes write(Q)
  – if the graph has a cycle, S is not serializable
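To make this construction concrete, here is a minimal sketch in Python; the schedule representation, the function names, and the DFS-based cycle check are illustrative assumptions rather than anything prescribed by the lecture or textbook.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Build a precedence graph from a schedule.

    `schedule` is an ordered list of (transaction, operation, item) tuples,
    e.g. ("T1", "write", "Q").  An edge Ti -> Tj is added whenever Ti operates
    on an item before Tj does and at least one of the two operations is a
    write (the three conditions listed above).
    """
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "write" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """DFS cycle check; a cycle means the schedule is not conflict serializable."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in edges[node]:
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))

# The interleaved schedule from the Transaction Conflict Example above:
schedule = [
    ("T0", "read", "A"), ("T0", "write", "A"),
    ("T1", "read", "A"), ("T0", "read", "B"),
    ("T1", "write", "A"), ("T0", "write", "B"),
    ("T1", "read", "B"), ("T1", "write", "B"),
]
print(has_cycle(precedence_graph(schedule)))   # False -> conflict serializable
```

For the interleaved schedule from the conflict example, the only edge produced is T0 -> T1, so the graph is acyclic and the schedule is conflict serializable.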
Precedence Graph Example 1
• Compute a precedence graph for schedule B (right)
• three vertices (T1, T2, T3)
• edge from Ti to Tj if
  – Ti writes Q before Tj reads Q
  – Ti reads Q before Tj writes Q
  – Ti writes Q before Tj writes Q

Precedence Graph Example 2
• Slightly more complicated example
• Compute a precedence graph for schedule A (right)
• three vertices (T1, T2, T3)
• edge from Ti to Tj if
  – Ti writes Q before Tj reads Q
  – Ti reads Q before Tj writes Q
  – Ti writes Q before Tj writes Q

Concurrency Control
• So now we can recognize when a schedule is serializable. In practice, it is often difficult and inefficient to determine a schedule in advance, much less examine it for serializability.
• Lock-based protocols are a common mechanism used to prevent transaction conflicts on the fly (i.e., without knowing what operations are coming later)
• The basic concept is simple: to prevent transaction conflict (two transactions working on the same data item with at least one of them writing), we implement a lock system -- a transaction may only access an item if it holds the lock on that item.

Lock-based Protocols
• We recognize two modes of locks:
  – shared: if Ti has a shared-mode ("S") lock on data item Q, then Ti may read, but not write, Q.
  – exclusive: if Ti has an exclusive-mode ("X") lock on Q, then Ti can both read and write Q.
• Transactions must be granted a lock before accessing data
  – A concurrency-control manager handles the granting of locks
  – Multiple S locks are permitted on a single data item, but only one X lock
  – this allows multiple reads (which don't create serializability conflicts) but prevents any R/W, W/R, or W/W interactions (which do create conflicts)

Lock-based Protocols (continued)
• Transactions must request a lock before accessing a data item; they may release the lock at any time once they no longer need access
• If the concurrency-control manager does not grant a requested lock, the transaction must wait until the data item becomes available later on.
• Unfortunately, this can lead to a situation called a deadlock
  – suppose T1 holds a lock on item R and requests a lock on Q, but transaction T2 holds an exclusive lock on Q. So T1 waits. Then T2 reaches the point where it requests a lock on R (still held by the waiting T1). Now both transactions are waiting for each other. Deadlock.
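As a rough illustration of the S/X compatibility rule, here is a minimal sketch of an in-memory lock table; the class and method names (LockTable, request, release) are invented for the example and do not correspond to any particular DBMS API.

```python
class LockTable:
    """Toy lock table: many transactions may share an S lock on a data item,
    but an X lock is incompatible with every other lock."""

    def __init__(self):
        self.locks = {}   # item -> (mode, set of transactions holding it)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        if mode == "S" and held_mode == "S":
            holders.add(txn)          # any number of shared locks may coexist
            return True
        return False                  # R/W, W/R, W/W combinations conflict: wait

    def release(self, txn, item):
        mode, holders = self.locks.get(item, (None, set()))
        holders.discard(txn)
        if not holders:
            self.locks.pop(item, None)

# Two readers coexist; a writer must wait until both release Q.
lt = LockTable()
print(lt.request("T1", "Q", "S"))   # True
print(lt.request("T2", "Q", "S"))   # True
print(lt.request("T3", "Q", "X"))   # False (T3 waits)
```

A real concurrency-control manager would also queue waiting requests, support lock upgrades, and eventually detect deadlocks, but the grant/deny decision reduces to this compatibility check.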
Deadlocks
• To detect a deadlock situation we use a wait-for graph
  – one node for each transaction
  – a directed edge Ti --> Tj if Ti is waiting for a resource locked by Tj
• a cycle in the wait-for graph implies a deadlock
  – The system checks periodically for deadlocks
  – If a deadlock exists, one of the transactions in the cycle must be aborted (rolled back)
• 95% of deadlocks are between just two transactions
• deadlocks are a necessary evil – preferable to allowing the database to become inconsistent
  – deadlocks can be rolled back; inconsistent data is much worse

Two-phase Locking Protocol
• A locking protocol is a set of rules for placing and releasing locks
  – a protocol restricts the number of possible schedules
  – lock-based protocols are pessimistic (they prevent conflicts up front rather than detecting them after the fact)
• Two-phase locking is (by far) the most common protocol
  – growing phase: a transaction may only obtain locks (never release any of its locks)
  – shrinking phase: a transaction may only release locks (never obtain any new locks)

Two-phase with Lock Conversion
• Two-phase locking with lock conversion:
  – an S lock can be upgraded to X during the growing phase
  – an X lock can be downgraded to S during the shrinking phase (this only works if the transaction has already written any changed data value under the X lock, of course)
  – The idea is that during the growing phase, instead of holding an X lock on an item it does not yet need to write, a transaction holds an S lock on it (allowing other transactions to read the old value for longer) until the point where it begins modifying the value.
  – Similarly, in the shrinking phase, once a transaction downgrades an X lock to S, other transactions can begin reading the new value earlier.

Variants on Two-phase Locking
• Strict two-phase locking
  – additionally requires that all X locks are held until commit time
  – prevents any other transaction from seeing uncommitted data
  – viewing uncommitted data can lead to cascading rollbacks
• Rigorous two-phase locking
  – requires that ALL locks are held until commit time
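As a small illustration, here is a sketch of the two-phase rule expressed as a checker over one transaction's lock/unlock trace; the trace format and function name are assumptions made for the example.

```python
def is_two_phase(trace):
    """Check that a single transaction's lock/unlock trace obeys two-phase
    locking: once any lock has been released, no new lock may be acquired.

    `trace` is an ordered list of ("lock" | "unlock", item) pairs.
    """
    shrinking = False
    for action, item in trace:
        if action == "lock" and shrinking:
            return False              # acquired a lock after releasing one
        if action == "unlock":
            shrinking = True          # the shrinking phase has begun
    return True

# Grows on A and B, then shrinks: valid two-phase behaviour.
print(is_two_phase([("lock", "A"), ("lock", "B"),
                    ("unlock", "A"), ("unlock", "B")]))   # True

# Releases A and only then locks B: violates the protocol.
print(is_two_phase([("lock", "A"), ("unlock", "A"),
                    ("lock", "B")]))                      # False
```

Strict and rigorous two-phase locking tighten this further by requiring that exclusive locks (or all locks, respectively) are held until commit time, which this sketch does not model.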