UNIT III TRANSACTION PROCESSING AND CONCURRENCY CONTROL

Introduction, Properties of Transaction, Serializability, Concurrency Control, Locking Mechanisms, Two Phase Commit Protocol, Deadlock.

1. TRANSACTION CONCEPTS

A transaction is a unit of program execution that accesses and possibly updates various data items. A transaction is initiated by a user program written in a high-level data manipulation language or programming language (for example, SQL, C++, or Java), where it is delimited by statements (or function calls) of the form begin transaction and end transaction. The transaction consists of all operations executed between the begin transaction and end transaction.

To ensure integrity of the data, we require that the database system maintain the following properties of transactions:
a. Atomicity
b. Consistency
c. Isolation
d. Durability
These properties are called the ACID properties.

Transactions access data using two operations.
Read(X): This transfers the data item X from the database to a local buffer belonging to the transaction that executed the read operation.
Write(X): This transfers the data item X from the local buffer of the transaction that executed the write back to the database.

Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as

    Ti: read(A);
        A := A - 50;
        write(A);
        read(B);
        B := B + 50;
        write(B);

Transaction management guarantees that a correct transaction maintains the database in a correct state. It guarantees that if the transaction executes some updates and a failure then occurs before the transaction reaches its planned termination, those updates will be undone. Thus the transaction is either executed in its entirety or totally cancelled. The system component that provides this atomicity is called the transaction manager, transaction processing monitor, or TP monitor. ROLLBACK and COMMIT are key to the way it works.

1. COMMIT: The COMMIT operation signals the successful end of a transaction.
It tells the transaction manager that a logical unit of work has been successfully completed, that the database is in a correct state, and that the updates can be recorded or saved.

2. ROLLBACK: By contrast, the ROLLBACK operation signals the unsuccessful end of a transaction. It tells the transaction manager that something has gone wrong, that the database might be in an incorrect state, and that all the updates made by the transaction should be undone.

TRANSACTION PROPERTIES

ACID stands for Atomicity, Consistency, Isolation and Durability.

Atomicity: Transactions are atomic. Consider again the transaction Ti that transfers $50 from account A to account B:

    read(A)
    A := A - 50
    write(A)
    read(B)
    B := B + 50
    write(B)

As before, read(X) transfers the data item X from the database to a local buffer belonging to the transaction that executed the read, and write(X) transfers the data item X from that local buffer back to the database.

Before the execution of transaction Ti, the values of accounts A and B are $1000 and $2000, respectively. Suppose that a power failure, hardware failure, or system error prevents Ti from completing successfully, and that the failure happens after the write(A) operation but before the write(B) operation. The database will then hold the values $950 and $2000. The system has destroyed $50 as a result of the failure, which leaves the database in an inconsistent state.

The basic idea of atomicity is this: the database system keeps track of the old values of any data on which a transaction performs a write, and if the transaction does not terminate successfully, the database system restores the old values. Atomicity is handled by the transaction-management component.

Consistency: Transactions transform a correct state of the database into another correct state, without necessarily preserving correctness at all intermediate points.
In the example, consistency requires that the sum of A and B be unchanged by the execution of the transaction. If the database is consistent before an execution of the transaction, the database remains consistent after the execution of the transaction. Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction.

Isolation: Transactions are isolated from one another. Even though there are many transactions running concurrently, any given transaction's updates are concealed from all the rest until that transaction commits. The database will be temporarily inconsistent while the transaction is in progress: when the amount has been deducted from A but not yet added to B, the database is inconsistent. If a second concurrently running transaction reads A and B at this intermediate point and computes A+B, it will observe an inconsistent value. If the second transaction performs updates on A and B based on the inconsistent values that it read, the database will remain inconsistent even after both transactions have completed. One way to avoid this problem is to execute transactions serially, that is, one after the other. The concurrency-control component is responsible for maintaining the isolation of transactions.

Durability: Once a transaction commits, its updates persist in the database, even if there is a subsequent system crash. A computer system failure may lead to loss of data in main memory, but data written to disk are not lost. Durability is guaranteed by ensuring the following:
1. The updates carried out by the transaction are written to disk.
2. The information stored on disk is sufficient to enable the database to reconstruct the updates when the database system restarts after a failure.
The recovery-management component is responsible for ensuring durability.

TRANSACTION STATES

A transaction must be in one of the following states:
1. Active: The initial state; the transaction stays in this state while it is executing.
2. Partially committed: After the final statement has been executed.
3. Failed: After the discovery that normal execution can no longer proceed.
4. Aborted: After the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
5. Committed: After successful completion.

A transaction has committed only if it has entered the committed state. Similarly, we say that a transaction has aborted only if it has entered the aborted state. A transaction is said to have terminated if it has either committed or aborted.

A transaction starts in the active state. When it finishes its final statement, it enters the partially committed state. At this point, the transaction has completed its execution, but it is still possible that it may have to be aborted, since the actual output may still be temporarily residing in main memory, and thus a hardware failure may preclude its successful completion. The database system then writes out enough information to disk that, even in the event of a failure, the updates performed by the transaction can be re-created when the system restarts after the failure. When the last of this information is written out, the transaction enters the committed state.

A transaction enters the failed state after the system determines that the transaction can no longer proceed with its normal execution (for example, because of hardware or logical errors). Such a transaction must be rolled back. Then, it enters the aborted state. At this point, the system has two options:
1. It can restart the transaction, but only if the transaction was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction. A restarted transaction is considered to be a new transaction.
2. It can kill the transaction.
It usually does so because of some internal logical error that can be corrected only by rewriting the application program, or because the input was bad, or because the desired data were not found in the database.

SERIALIZABILITY

Basic assumption: Each transaction preserves database consistency. Thus serial execution of a set of transactions preserves database consistency. A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability

Simplified view of transactions:
* We ignore operations other than read and write instructions.
* We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes.
* Our simplified schedules consist of only read and write instructions.

Conflicting Instructions:
* Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
* Intuitively, a conflict between li and lj forces a (logical) temporal order between them.
* If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.

Conflict Serializability:
* If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
* We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

View Serializability:
* Let S and S' be two schedules with the same set of transactions. S and S' are view equivalent if the following three conditions are met for each data item Q:
1. If in schedule S transaction Ti reads the initial value of Q, then in schedule S' transaction Ti must also read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S' transaction Ti must also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S'.
* As can be seen, view equivalence is also based purely on reads and writes.

Testing for Serializability:
Consider some schedule of a set of transactions T1, T2, ..., Tn.
Precedence graph: a directed graph whose vertices are the transactions (names). We draw an arc from Ti to Tj if the two transactions conflict and Ti accessed the data item on which the conflict arose earlier. We may label the arc with the item that was accessed.

Test for Conflict Serializability:
A schedule is conflict serializable if and only if its precedence graph is acyclic. Cycle-detection algorithms exist that take order n^2 time, where n is the number of vertices in the graph. (Better algorithms take order n + e, where e is the number of edges.) If the precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph. This is a linear order consistent with the partial order of the graph. For example, a serializability order for Schedule A would be T5 → T1 → T3 → T2 → T4.

Test for View Serializability:
The precedence graph test for conflict serializability cannot be used directly to test for view serializability. An extension of the test to view serializability has cost exponential in the size of the precedence graph. The problem of checking whether a schedule is view serializable falls in the class of NP-complete problems. Thus the existence of an efficient algorithm is extremely unlikely.
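The precedence-graph test for conflict serializability described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the representation of a schedule as a list of (transaction, operation, item) tuples and the function names precedence_graph and serial_order are assumptions made for this example.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Build arcs Ti -> Tj from a schedule of (txn, op, item) steps.

    The tuple layout is an assumption of this sketch; op is "read" or "write".
    """
    edges = defaultdict(set)
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            # Two steps conflict if they belong to different transactions,
            # access the same item, and at least one of them is a write.
            if ti != tj and q_i == q_j and "write" in (op_i, op_j):
                edges[ti].add(tj)
    return edges


def serial_order(schedule):
    """Return a serializability order, or None if the graph has a cycle."""
    edges = precedence_graph(schedule)
    txns = []
    for t, _, _ in schedule:
        if t not in txns:
            txns.append(t)
    indegree = {t: 0 for t in txns}
    for src in list(edges):
        for dst in edges[src]:
            indegree[dst] += 1
    # Topological sort: repeatedly emit a transaction with no incoming arcs.
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for dst in sorted(edges[t]):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    # If some transaction was never emitted, the graph contains a cycle.
    return order if len(order) == len(txns) else None
```

Building the graph this way compares every pair of steps, so it is meant only to make the definition concrete; the topological sort itself runs in time proportional to the number of vertices plus edges.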
However, practical algorithms that check only some sufficient conditions for view serializability can still be used.

CONCURRENCY

Concurrency, in terms of databases, means allowing multiple users to access the data contained within a database at the same time. If concurrent access is not managed by the Database Management System (DBMS) so that simultaneous operations do not interfere with one another, problems can occur when various transactions interleave, resulting in an inconsistent database.

Concurrency is achieved by the DBMS, which interleaves the actions (reads/writes of DB objects) of various transactions. Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins. Concurrent execution of user programs is essential for good DBMS performance. Because disk accesses are frequent and relatively slow, it is important to keep the CPU busy by working on several user programs concurrently. Interleaving the actions of different user programs can lead to inconsistency: for example, a check is cleared while the account balance is being computed. The DBMS ensures that such problems do not arise, so users can pretend they are using a single-user system.

Purpose of Concurrency Control
* To enforce isolation (through mutual exclusion) among conflicting transactions.
* To preserve database consistency through consistency-preserving execution of transactions.
* To resolve read-write and write-write conflicts.
Example: In a concurrent execution environment, if T1 conflicts with T2 over a data item A, then the concurrency control mechanism decides whether T1 or T2 should get A, and whether the other transaction is rolled back or waits.

LOCKING PROTOCOLS

One way to ensure serializability is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item.
The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item. A locking protocol is a set of rules to be followed by each transaction to ensure that, even though the actions of several transactions might be interleaved, the net effect is identical to executing all transactions in some serial order.

LOCKS

A lock is a variable associated with a data item that indicates which operations may currently be applied to it. There are various modes in which a data item may be locked. The two modes are:
1. Shared: It is denoted by S. If a transaction T1 has obtained a shared-mode lock on item P, then T1 can read, but cannot write, P.
2. Exclusive: It is denoted by X. If a transaction T2 has obtained an exclusive-mode lock on item P, then T2 can both read and write P.

Requesting a Lock:
Every transaction requests a lock on data item P in the mode appropriate to the operations that it will perform on P. The request is made to the concurrency control manager. The transaction can proceed with the operation only after the concurrency control manager grants the lock to the transaction.

Lock Compatibility Matrix comp:
The compatibility relation between the two lock modes, exclusive (X) and shared (S), is given by the matrix comp.

    comp(A, B)    S        X
    S             True     False
    X             False    False

An element comp(A, B) of the matrix has the value True if and only if A is shared mode and B is also shared mode. The shared mode is compatible with shared mode, but not with exclusive mode.

a. Lock-S(Q): A transaction requests a shared lock on data item Q by executing this instruction.
b. Lock-X(Q): A transaction requests an exclusive lock on data item Q by executing this instruction.
c. Unlock(Q): A data item Q can be unlocked with this instruction.

To access a data item, any transaction Ti must first lock that item.
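The compatibility matrix comp above can be expressed as a small lookup table. The following Python sketch is illustrative only; the names COMP and can_grant, and the idea of passing in the list of modes currently held by other transactions, are assumptions of this example rather than part of any particular DBMS.

```python
# comp(A, B): True only when both modes are shared.
COMP = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """Return True if a lock in mode `requested` is compatible with every
    mode in `held_modes` (locks held on the item by other transactions)."""
    return all(COMP[(requested, held)] for held in held_modes)
```

For instance, can_grant("S", ["S", "S"]) is True because any number of transactions may share a read lock, while can_grant("X", ["S"]) is False, so the requester would have to wait.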
In any transaction, a data item is unlocked immediately after its final access of the data item.

Example: T1 transfers Rs. 50 from account B to account A. T2 displays the total amount of money in A and B.

    T1                  T2
    lock-X(B);          lock-S(A);
    read(B);            read(A);
    B := B - 50;        unlock(A);
    write(B);           lock-S(B);
    unlock(B);          read(B);
    lock-X(A);          unlock(B);
    read(A);            display(A + B);
    A := A + 50;
    write(A);
    unlock(A);

The transaction making a lock request cannot execute its next action until the concurrency control manager grants the lock. Hence, the lock must be granted in the interval of time between the lock-request operation and the following action of the transaction.

LOCKING TECHNIQUES

A lock is a mechanism to control concurrent access to a data item. Data items can be locked in two modes:
1. Exclusive (X) mode. The data item can be both read and written. An X-lock is requested using the lock-X instruction.
2. Shared (S) mode. The data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.

Pitfalls of Lock-Based Protocols:
The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil. Starvation is also possible if the concurrency control manager is badly designed. For example:
1. A transaction may be waiting for an X-lock on an item while a sequence of other transactions request and are granted an S-lock on the same item.
2. The same transaction is repeatedly rolled back due to deadlocks.
The concurrency control manager can be designed to prevent starvation.

The Two-Phase Locking Protocol:
This is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
o transaction may obtain locks
o transaction may not release locks
Phase 2: Shrinking Phase
o transaction may release locks
o transaction may not obtain locks
The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).

Two-phase locking does not ensure freedom from deadlocks. Cascading rollback is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking, in which a transaction must hold all its exclusive locks until it commits or aborts. Rigorous two-phase locking is even stricter: all locks are held until commit or abort. In this protocol, transactions can be serialized in the order in which they commit.

Lock Conversions:
Two-phase locking with lock conversions:
First Phase:
o can acquire a lock-S on an item
o can acquire a lock-X on an item
o can convert a lock-S to a lock-X (upgrade)
Second Phase:
o can release a lock-S
o can release a lock-X
o can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability, but it still relies on the programmer to insert the various locking instructions.

Implementation of Locking:
A lock manager can be implemented as a separate process to which transactions send lock and unlock requests. The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock). The requesting transaction waits until its request is answered. The lock manager maintains a data structure called a lock table to record granted locks and pending requests. The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked.

TWO PHASE COMMIT

Two-phase commit is important whenever a given transaction can interact with several independent "resource managers".

Example: Consider a transaction running on an IBM mainframe that updates both an IMS database and a DB2 database. If the transaction completes successfully, then both the IMS data and the DB2 data are committed. Conversely, if the transaction fails, then both sets of updates must be rolled back. It is not possible to commit one database update and roll back the other.
If that were done, atomicity would not be maintained in the system. Therefore, the transaction issues a single "global" or system-wide COMMIT or ROLLBACK. That COMMIT or ROLLBACK is handled by a system component called the coordinator. The coordinator's task is to guarantee that all the resource managers commit or all roll back, and to guarantee this even if the system fails in the middle of the process. The two-phase commit protocol is responsible for maintaining such a guarantee.

WORKING

Assume that the transaction has completed and a COMMIT is issued. On receiving the COMMIT request, the coordinator goes through the following two-phase process:

Prepare:
1. Each resource manager must get ready to "go either way" on the transaction.
2. To be able to perform either COMMIT or ROLLBACK as necessary, each participant in the transaction must record all updates performed during the transaction by moving them from temporary storage to permanent storage.
3. The resource manager then replies "OK" or "NOT OK" to the coordinator, depending on whether the write operation succeeded.

Commit:
1. When the coordinator has received replies from all participants, it takes a decision regarding the transaction and records it in the physical log.
2. If all replies were "OK", then the decision is "commit"; if any reply was "NOT OK", the decision is "rollback".
3. The coordinator informs all the participants of its decision.
4. Each participant must then commit or roll back the transaction locally, as instructed by the coordinator.

If the system fails at some point during the process, the restart procedure looks for the decision of the coordinator. If the decision is found, then the two-phase commit process can resume from where it left off. If the decision is not found, then the restart procedure assumes that the decision is ROLLBACK, and the process can complete appropriately. If the participants are on several systems, as in a distributed system, then some participants may have to wait a long time for the coordinator's decision.
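The prepare and commit phases described above can be sketched as a small in-memory simulation in Python. This is only an illustration under simplifying assumptions: the Participant class, its can_prepare flag, and the plain list used as the coordinator's log are hypothetical stand-ins for real resource managers and a physical log, and no failures or network delays are modelled.

```python
class Participant:
    """Hypothetical resource manager used only for this sketch."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # simulates whether the forced write succeeds
        self.state = "active"

    def prepare(self):
        # Phase 1: get ready to go either way, then vote.
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare  # True means "OK", False means "NOT OK"

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision locally.
        self.state = "committed" if decision == "commit" else "rolled back"


def two_phase_commit(participants, log):
    # Phase 1 (prepare): collect a reply from every participant.
    votes = [p.prepare() for p in participants]
    # Decide: commit only if all replies were "OK".
    decision = "commit" if all(votes) else "rollback"
    # Record the decision before informing anyone, so that a restart
    # procedure could find it after a failure.
    log.append(decision)
    # Phase 2 (commit): inform every participant of the decision.
    for p in participants:
        p.finish(decision)
    return decision
```

Calling two_phase_commit with one participant whose can_prepare flag is False yields "rollback" for every participant, which is the all-or-nothing behaviour the protocol is meant to guarantee.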
The data communication manager (DC manager) can also act as a resource manager in a two-phase commit process.

DEADLOCK

Deadlock Handling:
Consider the following two transactions:

    T1: write(X)        T2: write(Y)
        write(Y)            write(X)

Schedule with deadlock:

    T1                      T2
    lock-X on X
    write(X)
                            lock-X on Y
                            write(Y)
                            wait for lock-X on X
    wait for lock-X on Y

The system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.

Deadlock prevention protocols ensure that the system will never enter a deadlock state. Some prevention strategies:
o Require that each transaction lock all its data items before it begins execution (predeclaration).
o Impose a partial ordering on all data items and require that a transaction lock data items only in the order specified by the partial order (graph-based protocol).

More Deadlock Prevention Strategies:
The following schemes use transaction timestamps for the sake of deadlock prevention alone.
wait-die scheme (non-preemptive):
o An older transaction may wait for a younger one to release a data item. Younger transactions never wait for older ones; they are rolled back instead.
o A transaction may die several times before acquiring the needed data item.
wound-wait scheme (preemptive):
o An older transaction wounds (forces the rollback of) a younger transaction instead of waiting for it. Younger transactions may wait for older ones.
o There may be fewer rollbacks than in the wait-die scheme.
In both the wait-die and the wound-wait schemes, a rolled-back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.

Timeout-Based Schemes:
o A transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.
o Thus deadlocks are not possible.
o Simple to implement, but starvation is possible. It is also difficult to determine a good value for the timeout interval.
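The two timestamp-based schemes above differ only in what happens when a requester meets a lock holder. The following Python sketch makes the two decision rules explicit. It assumes, for illustration, that a smaller timestamp means an older transaction; the function names and the returned strings are hypothetical.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger one dies.

    Assumes a smaller timestamp means an older transaction.
    """
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester is rolled back"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger one waits.

    Assumes a smaller timestamp means an older transaction.
    """
    if requester_ts < holder_ts:
        return "holder is rolled back"
    return "requester waits"
```

In both functions a rolled-back transaction would be restarted with its original timestamp, so it eventually becomes the oldest transaction in the system and can no longer be chosen for rollback.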
Deadlock Detection:
Deadlocks can be described in terms of a wait-for graph, which consists of a pair G = (V, E):
o V is a set of vertices (all the transactions in the system).
o E is a set of edges; each element is an ordered pair Ti → Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item. When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti. The system is in a deadlock state if and only if the wait-for graph has a cycle. The system must invoke a deadlock-detection algorithm periodically to look for cycles.

Deadlock Recovery:
When a deadlock is detected:
o Some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the transaction that will incur the minimum cost.
o Rollback: determine how far to roll back the transaction. Total rollback aborts the transaction and then restarts it. It is more effective to roll back the transaction only as far as necessary to break the deadlock.
o Starvation happens if the same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation.
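Deadlock detection on a wait-for graph, followed by victim selection, can be sketched in Python as below. This is a minimal illustration, not a definitive implementation: the graph is assumed to be a dictionary mapping each waiting transaction to the transactions it waits for, and the cost table passed to choose_victim is a hypothetical measure that is assumed to already include the number of previous rollbacks.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if none exists.

    `wait_for` maps Ti to the transactions Tj that Ti is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(wait_for) | {t for ts in wait_for.values() for t in ts}
    colour = {t: WHITE for t in nodes}
    path = []

    def visit(t):
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            if colour[u] == GREY:
                # u is on the current path, so the path from u onward is a cycle.
                return path[path.index(u):]
            if colour[u] == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        colour[t] = BLACK
        path.pop()
        return None

    for t in sorted(nodes):
        if colour[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


def choose_victim(cycle, cost):
    """Pick the transaction in the cycle with the minimum rollback cost."""
    return min(cycle, key=lambda t: cost[t])
```

A depth-first search visits each vertex and edge once, so the check runs in time proportional to the size of the wait-for graph, which keeps periodic invocation cheap.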