Transaction Manager

• Concurrency Control
• Recovery Management

Transactions

A transaction is a unit of program execution that accesses and possibly updates various data items. Put another way, it is a collection of operations that form a single unit of work. It is essential that either all of these operations occur or, in case of failure, none of them do. A database system must ensure proper execution of transactions despite failures: either the entire transaction executes, or none of it does. Furthermore, it must manage concurrent execution of transactions in a way that avoids introducing inconsistency.

Transaction Program

A transaction program is an executable program, coded in a high-level data manipulation language, that updates the contents of a database. The DBMS must ensure that a transaction always transforms the database from one consistent state (before the update) to another consistent state (after the update), although we accept that consistency may be violated while the transaction is in progress.

Database recovery is the process of restoring the database to a correct state following a failure. The failure may be the result of a system crash due to hardware or software errors; a media failure, such as a head crash; or an application software error, such as a logical error in the program that is accessing the database. It may also be the result of unintentional or intentional corruption or destruction of data or facilities by operators or users. Whatever the underlying cause, the DBMS must be able to recover from the failure and restore the database to a consistent state.

Architecture of a TPS Application

[Figure: Notice of Event -> Transaction Keyed -> TPS Program (reading and updating TPS Data) -> Event Response and Report(s).]

The event is recorded by keying it into the computer system as a transaction, which is a representation of the event. One or more TPS programs process the transaction against TPS data. The TPS program generates two types of output.
It sends messages back to the user terminal, and it generates printed documents.

Transaction State

A transaction may not always complete its execution successfully. Such a transaction is termed aborted. To ensure the atomicity property, an aborted transaction must have no effect on the state of the database, so any changes that it made to the database must be undone. Once the changes caused by an aborted transaction have been undone, we say that the transaction has been rolled back.

[Figure: Transaction state diagram with the states active, partially committed, committed, failed, and aborted.]

Transactions access data using two operations:
• read(X), which transfers the data item X from the database to a local buffer belonging to the transaction that executed the read operation.
• write(X), which transfers the data item X from the local buffer of the transaction that executed the write back to the database.

In a real database system, a write does not necessarily update the data on disk immediately; the new value may be held temporarily in memory and written to disk later. For now, however, we assume that the write operation updates the database immediately.

Transaction Concepts

Usually, a transaction is initiated by a user program written in a high-level DML or programming language, where it is delimited by statements (or function calls) of the form begin transaction and end transaction. The transaction consists of all operations executed between begin transaction and end transaction. To ensure the integrity of the data, we require that the database system maintain the ACID properties of transactions.

ACID properties of a transaction ensured by the DBMS

Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.
Consistency. A transaction must preserve the consistency of the database: the integrity constraints are satisfied after it completes.
Isolation.
Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus each transaction is unaware of other transactions executing concurrently in the system.
Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Atomicity: Because of a failure (power failures, hardware failures, and software errors), the state of the system may no longer reflect a real state of the world that the database is supposed to capture. We term such a state an inconsistent state. We must ensure that such inconsistencies are not visible in a database system. [While a transaction is running, the system must at some point be in a temporarily inconsistent state; however, that state is eventually replaced by a consistent one.] The basic idea behind ensuring atomicity is this: the database system keeps track (on disk) of the old values of any data on which a transaction performs a write, and, if the transaction does not complete its execution, the database system restores the old values to make it appear as though the transaction never executed. Ensuring atomicity is the responsibility of the database system itself; specifically, it is handled by a component called the transaction-management component.

Consistency: Ensuring consistency for an individual transaction is the responsibility of the application programmer. This task may be facilitated by automatic testing of integrity constraints.

Isolation: Even if the consistency and atomicity properties are ensured for each transaction, if several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent state. One way to avoid the problems of concurrent execution is to execute transactions serially, that is, one after the other.
However, concurrent execution of transactions provides significant performance benefits. The isolation property ensures that the concurrent execution of transactions results in a system state that is equivalent to a state that could have been obtained had these transactions executed one at a time in some order. Ensuring the isolation property is the responsibility of a component of the database system called the concurrency-control component.

Durability: We assume that a failure of the computer system may result in loss of data in main memory, but that data written to disk are never lost. The DBMS can guarantee durability by ensuring that either:
1. The updates carried out by the transaction have been written to disk before the transaction completes, or
2. Information about the updates carried out by the transaction has been written to disk and is sufficient to enable the database to reconstruct the updates when the database system is restarted after the failure.
Ensuring durability is the responsibility of a component of the database system called the recovery-management component.

[Figure: DBMS transaction subsystem, showing the access manager, transaction manager, scheduler, buffer manager, recovery manager, file manager, and system manager above the database and system catalog.]

A transaction manager is software that monitors the behavior of transactions and decides whether each action can be allowed to execute. The transaction manager coordinates transactions on behalf of application programs. It communicates with the scheduler (sometimes referred to as the lock manager), the module responsible for implementing a particular strategy for concurrency control. If a failure occurs during a transaction, the database could be left inconsistent. It is the task of the recovery manager to ensure that the database is restored to a consistent state. Finally, the buffer manager is responsible for the transfer of data between disk storage and main memory.
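The all-or-nothing behaviour described under atomicity can be observed with a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module; the account table, the transfer function, and the balances are hypothetical and chosen only for illustration. The second transfer fails after its first update has already run, and the rollback restores the old value.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single unit of work (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
            # Stand-in for an integrity constraint: balances may not go negative.
            (bal,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except ValueError:
        return False  # the first UPDATE has been undone by the rollback

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    ok = transfer(conn, 1, 2, 30)       # succeeds and commits
    failed = transfer(conn, 1, 2, 500)  # fails midway and is rolled back
    balances = dict(conn.execute("SELECT id, balance FROM account"))
    return ok, failed, balances

if __name__ == "__main__":
    print(demo())
```

After the failed transfer, account 1 still holds the balance left by the first, committed transfer, which is the behaviour the atomicity property requires.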
Transaction Atomicity in a Single-Transaction System

In a single-transaction system, only one transaction executes at any time. If a transaction is active, no other transaction can start. This situation is the same as having one application connected to the database server at a time. To support atomicity, a database server must support operations to open a transaction, commit a transaction, and roll back a transaction, so that one or more SQL commands can be grouped together. If any command fails, the transaction manager can roll back all of the commands, returning the data source to its original state. If all commands are successful, the transaction manager commits the changes and makes them permanent.

Concurrent Transaction Processing

Concurrency arises when many applications are executing transactions at the same time. A single database server processes all operations, so only one database operation can be processed at a time. However, the operations of the transactions overlap because independent applications are requesting service from the database server in parallel.

A schedule is a sequence of the operations by a set of concurrent transactions that preserves the order of the operations in each of the individual transactions. Clearly, a schedule for a set of transactions must consist of all instructions of those transactions, and must preserve the order in which instructions appear in each individual transaction.

A schedule can be serial or nonserial. A serial schedule consists of a sequence of instructions from various transactions, where the operations of each transaction are executed consecutively without any interleaved operations from other transactions. For a set of n transactions, there exist n! different valid serial schedules. When the database system executes several transactions concurrently, the corresponding schedule no longer needs to be serial.
The OS must perform context switches (CPU time is shared) among all transactions that access the database concurrently. Several execution sequences are possible, since the instructions from several transactions may now be interleaved. The number of possible schedules for a set of n transactions is much larger than n!.

The following two transactions are used in the example schedules:
T1: Read(BALx); BALx = BALx + 100; Write(BALx); Read(BALy); BALy = BALy - 100; Write(BALy);
T2: Read(BALx); BALx = BALx * 1.1; Write(BALx); Read(BALy); BALy = BALy * 1.1; Write(BALy);

Serial schedule: A schedule where the operations of each transaction are executed consecutively without any interleaved operations from other transactions.

Serial schedule (T1 before T2):
T1: Read(BALx); BALx = BALx + 100; Write(BALx); Read(BALy); BALy = BALy - 100; Write(BALy);
T2: Read(BALx); BALx = BALx * 1.1; Write(BALx); Read(BALy); BALy = BALy * 1.1; Write(BALy);

Serial schedule (T2 before T1):
T2: Read(BALx); BALx = BALx * 1.1; Write(BALx); Read(BALy); BALy = BALy * 1.1; Write(BALy);
T1: Read(BALx); BALx = BALx + 100; Write(BALx); Read(BALy); BALy = BALy - 100; Write(BALy);

Nonserial schedule: A schedule where the operations from a set of concurrent transactions are interleaved.

Nonserial schedule (operations in order of execution):
T1: Read(BALx); BALx = BALx + 100;
T2: Read(BALx); BALx = BALx * 1.1; Write(BALx); Read(BALy); BALy = BALy * 1.1;
T1: Write(BALx); Read(BALy); BALy = BALy - 100; Write(BALy);
T2: Write(BALy);

If several transactions run concurrently and control of concurrent execution is left entirely to the OS, database consistency can be destroyed despite the correctness of each individual transaction. We can ensure consistency of the database under concurrent execution by making sure that any schedule that is executed has the same effect as a schedule that could have occurred without any concurrent execution. That is, the schedule should, in some sense, be equivalent to a serial schedule.

Potential problems caused by concurrency

1. Lost update problem: An apparently successfully completed update operation by one user can be overridden by another user. (The initial balance is 15.)

Time | T3 | T4 | balance
Time 1 | balance1 = (select balance from Customer where accountID = 101); balance1 += 5.00; | | 15
Time 2 | | balance2 = (select balance from Customer where accountID = 101); balance2 += 10.00; | 15
Time 3 | update Customer set balance = ?balance1 where accountID = 101; | | 20
Time 4 | | update Customer set balance = ?balance2 where accountID = 101; | 25
Time 5 | Commit | | 25
Time 6 | | Commit | 25

T4 overwrites the update made by T3: the final balance is 25, whereas applying both updates one after the other would give 30.

2. The uncommitted dependency problem: This problem occurs when one transaction is allowed to see the intermediate result of another transaction before it has committed.
(The initial balance is 15.)

Time | T3 | T4 | balance
Time 1 | balance1 = (select balance from Customer where accountID = 101); balance1 += 5.00; | | 15
Time 2 | update Customer set balance = ?balance1 where accountID = 101; | | 20
Time 3 | | balance2 = (select balance from Customer where accountID = 101); balance2 += 10.00; | 20
Time 4 | Rollback | | 15
Time 5 | | update Customer set balance = ?balance2 where accountID = 101; | 30
Time 6 | | Commit | 30

T4 read the value 20, which T3 later rolled back, so the final balance of 30 is based on a value that never officially existed.

3. Incorrect summary problem: An inconsistent retrieval problem that occurs when a transaction reads several values, but another transaction updates some of the values while the first transaction is still executing.

Time | T3 | T4 | bal 101 | bal 102
Time 1 | balance1 = (select balance from Customer where accountID = 101); balance1 += 10.00; | | 15 | 15
Time 2 | update Customer set balance = ?balance1 where accountID = 101; | | 25 | 15
Time 3 | | Total = select sum(balance) from customer where accountID = 101 or accountID = 102 | 25 | 15
Time 4 | | Commit | 25 | 15
Time 5 | balance1 = (select balance from Customer where accountID = 102); balance1 -= 10.00; | | 25 | 15
Time 6 | update Customer set balance = ?balance1 where accountID = 102; | | 25 | 5
Time 7 | Commit | | 25 | 5

T4 computes a total of 40, although the two balances sum to 30 both before and after the transfer made by T3.

A phantom read problem: It occurs when an aggregate operation is repeated by a transaction and yields a different result because of the insertion of a row by another transaction.

Time | T1 | T2
Time 1 | totalA = (select sum(balance) from Customer where zipcode = 31101); (sum is 100) |
Time 2 | | insert into customer (accountID, balance, zipcode) values (105, 10.00, 31101)
Time 3 | totalB = (select sum(balance) from Customer where zipcode = 31101); (sum is 110) |
Time 4 | | Rollback
Time 5 | Commit |

A nonrepeatable read problem: It occurs when a transaction reads the same data item more than once and, between the reads, another transaction modifies the data item.
Time | T1 | T2 | balance
Time 1 | balance1 = (select balance from Customer where accountID = 101); | | 15
Time 2 | | update customer set balance = 0.0 where accountID = 101; | 0.0
Time 3 | balance2 = (select balance from Customer where accountID = 101); | | 0.0

Recoverability: If a transaction fails, the atomicity property requires that we undo the effects of the transaction. In addition, the durability property states that once a transaction commits, its changes cannot be undone.

Recoverable schedule: A schedule where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.

Non-recoverable schedule (the initial balance is 15):

Time | T3 | T4 | balance
Time 1 | balance1 = (select balance from Customer where accountID = 101); balance1 += 5.00; | | 15
Time 2 | update Customer set balance = ?balance1 where accountID = 101; | | 20
Time 3 | | balance2 = (select balance from Customer where accountID = 101); balance2 += 10.00; | 20
Time 4 | | update Customer set balance = ?balance2 where accountID = 101; | 30
Time 5 | | Commit | 30
Time 6 | Rollback | |

T4 reads the value written by T3 and commits before T3 does. When T3 then rolls back, T4 has already committed and can no longer be undone.

Locking: A procedure used to control concurrent access to data. When one transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results.

Locking methods are the most widely used approach to ensure serializability of concurrent transactions. There are several variations, but all share the same fundamental characteristic, namely that a transaction must claim a read (shared) or write (exclusive) lock on a data item before the corresponding database read or write operation. The lock prevents another transaction from modifying the item or, in the case of a write lock, even reading it. Data items of various sizes, ranging from the entire database down to a field, may be locked. The size of the item determines the fineness, or granularity, of the lock.
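The shared/exclusive rule can be sketched as a small lock table. This is only an illustration under simplifying assumptions: the class LockTable and its methods are hypothetical names, a refused request simply returns False instead of queueing the transaction, and a transaction that is the sole holder of a read lock is allowed to upgrade it.

```python
class LockTable:
    """Hypothetical lock table mapping each item to {transaction: mode}."""

    def __init__(self):
        self._holders = {}

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        holders = self._holders.setdefault(item, {})
        # Modes held on this item by transactions other than the requester.
        others = {m for t, m in holders.items() if t != txn}
        if mode == "read":
            granted = "write" not in others   # read locks can be shared
        else:
            granted = not others              # a write lock must be exclusive
        if granted and holders.get(txn) != "write":
            holders[txn] = mode               # keep the stronger mode if already held
        return granted

    def release(self, txn):
        """Release every lock held by txn, as at commit or abort."""
        for holders in self._holders.values():
            holders.pop(txn, None)


if __name__ == "__main__":
    table = LockTable()
    print(table.request("T3", "balance", "write"))  # granted
    print(table.request("T4", "balance", "read"))   # refused: T3 holds a write lock
    table.release("T3")
    print(table.request("T4", "balance", "read"))   # granted after release
```

A real scheduler would place the refused transaction in a wait queue for the item and wake it when the conflicting lock is released.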
Read lock: If a transaction has a read lock on a data item, it can read the item but not update it.
Write lock: If a transaction has a write lock on a data item, it can both read and update the item.

• Any transaction that needs to access a data item must first lock the item, requesting a read lock for read-only access or a write lock for both read and write access.
• If the item is not already locked by another transaction, the lock will be granted.
• If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock. If a read lock is requested on an item that already has a read lock on it, the request will be granted; otherwise, the transaction must wait until the existing lock is released.
• A transaction continues to hold a lock until it explicitly releases it, either during execution or when it terminates (aborts or commits). It is only when the write lock has been released that the effects of the write operation will be made visible to other transactions.

Locking can solve the lost update problem (an apparently successfully completed update operation by one user can be overridden by another user). The initial balance is 15.

Time | T3 | T4 | balance
Time 1 | Write_lock (balance) | | 15
Time 2 | balance1 = (select balance from Customer where accountID = 101); balance1 += 5.00; | Write_lock (balance) | 15
Time 3 | update Customer set balance = ?balance1 where accountID = 101; | Wait | 20
Time 4 | Commit / Unlock (balance) | Wait | 20
Time 5 | | balance2 = (select balance from Customer where accountID = 101); balance2 += 10.00; | 20
Time 6 | | update Customer set balance = ?balance2 where accountID = 101; | 30
Time 7 | | Commit / Unlock (balance) | 30

Locking can solve the uncommitted dependency problem: This problem occurs when one transaction is allowed to see the intermediate result of another transaction before it has committed.
(The initial balance is 15.)

Time | T3 | T4 | balance
Time 1 | Write_lock (balance); balance1 = (select balance from Customer where accountID = 101); balance1 += 5.00; | | 15
Time 2 | update Customer set balance = ?balance1 where accountID = 101; | | 20
Time 3 | | Write_lock (balance) |
Time 4 | | Wait |
Time 5 | Rollback / Unlock (balance) | Wait | 15
Time 6 | | balance2 = (select balance from Customer where accountID = 101); balance2 += 10.00; | 15
Time 7 | | update Customer set balance = ?balance2 where accountID = 101; | 25
Time 8 | | Commit / Unlock (balance) | 25

Locking can solve the incorrect summary problem:

Time | T3 | T4 | bal 101 | bal 102
Time 1 | Write_lock (balance); balance1 = (select balance from Customer where accountID = 101); balance1 += 10.00; | | 15 | 15
Time 2 | update Customer set balance = ?balance1 where accountID = 101; | | 25 | 15
Time 3 | | Write_Lock (balance) | 25 | 15
Time 4 | balance1 = (select balance from Customer where accountID = 102); balance1 -= 10.00; | Wait | 25 | 15
Time 5 | update Customer set balance = ?balance1 where accountID = 102; | Wait | 25 | 5
Time 6 | Commit / Unlock (balance) | | 25 | 5
Time 7 | | Total = select sum(balance) from customer where accountID = 101 or accountID = 102 | 25 | 5
Time 8 | | Commit / Unlock (balance) | 25 | 5

If locks are released too early, the database may become inconsistent:

T1: Write_Lock (balx); Read (balx); balx = balx + 100; Write(balx); Unlock (balx);
T2: Write_Lock (balx); Read (balx); balx = balx * 1.1; Write(balx); Unlock (balx); Write_Lock (baly); Read (baly); baly = baly * 1.1; Write(baly); Unlock (baly); Commit
T1: Write_Lock (baly); Read (baly); baly = baly - 100; Write(baly); Unlock (baly); Commit

Cascading rollback: The situation in which the failure of a single transaction leads to a series of rollbacks. Cascading rollbacks are undesirable, since they potentially lead to the undoing of a significant amount of work. Clearly, it would be useful if we could design protocols that prevent cascading rollbacks. One way to achieve this with two-phase locking is to leave the release of all locks until the end of the transaction.
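The effect of releasing locks too early can be checked with a short calculation. The sketch below replays the early-unlock schedule and both serial orders; the starting balances of 100 and 400 are assumed purely for illustration, and the helper names are hypothetical. Each locked read-modify-write is modelled as one function applied to one item, and the 1.1 factor is kept exact with Fraction.

```python
from fractions import Fraction

RATE = Fraction(11, 10)  # the "* 1.1" step, kept exact to avoid float rounding

# Per-item updates performed by each transaction while it holds the item's lock.
def t1_x(x): return x + 100
def t1_y(y): return y - 100
def t2_x(x): return x * RATE
def t2_y(y): return y * RATE

def run(steps, balx, baly):
    """Apply the (item, update) steps in order and return the final balances."""
    for item, update in steps:
        if item == "x":
            balx = update(balx)
        else:
            baly = update(baly)
    return balx, baly

SERIAL_T1_T2 = [("x", t1_x), ("y", t1_y), ("x", t2_x), ("y", t2_y)]
SERIAL_T2_T1 = [("x", t2_x), ("y", t2_y), ("x", t1_x), ("y", t1_y)]
# T1 unlocks balx early, T2 runs to completion, then T1 updates baly.
EARLY_UNLOCK = [("x", t1_x), ("x", t2_x), ("y", t2_y), ("y", t1_y)]

if __name__ == "__main__":
    for name, steps in [("T1 then T2", SERIAL_T1_T2),
                        ("T2 then T1", SERIAL_T2_T1),
                        ("early unlock", EARLY_UNLOCK)]:
        balx, baly = run(steps, 100, 400)
        print(name, int(balx), int(baly))
```

Under these assumed balances the early-unlock schedule ends with balx matching one serial order and baly matching the other, so it is equivalent to neither serial schedule.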
Example of a cascading rollback:
T1: Write_Lock (balx); Read (balx); Read_Lock (baly); Read (baly); balx = baly + balx; Write(balx); Unlock (balx); ... Rollback
T2: Write_Lock (balx); Read (balx); balx = balx + 100; Write(balx); Unlock (balx); ... Rollback
T3: Read_Lock (balx); ... Rollback

Two-phase locking (2PL): A transaction follows the two-phase locking protocol if all locking operations precede the first unlock operation in the transaction. According to the rules of this protocol, every transaction can be divided into two phases: first a growing phase, in which it acquires all the locks needed but cannot release any locks, and then a shrinking phase, in which it releases its locks but cannot acquire any new locks. The two-phase locking protocol may cause deadlock.

Deadlock: An impasse that may result when two or more transactions are each waiting for locks held by the other to be released. Neither transaction can continue because each is waiting for a lock it cannot obtain until the other completes. Once deadlock occurs, the applications involved cannot resolve the problem. Instead, the DBMS has to recognize that deadlock exists and break the deadlock in some way.

Example of deadlock (the steps of the two transactions are interleaved over Time 1 to Time 8):
T3: Write_lock (balance of account 101); balance1 = (select balance from customer where accountID = 101); balance1 += 10.00; update Customer set balance = ?balance1 where accountID = 101; Write_lock (balance of account 102); Wait
T4: Write_lock (balance of account 102); balance2 = (select balance from customer where accountID = 102); Write_lock (balance of account 101); Wait
T3 holds the lock on account 101 and waits for account 102, so its remaining steps (balance1 = (select balance from customer where accountID = 102); balance1 -= 10.00; update Customer set balance = ?balance1 where accountID = 102;) never run. T4 holds the lock on account 102 and waits for account 101, so it can never execute balance2 = (select balance from customer where accountID = 101).

In addition to these rules, some systems permit a transaction to issue a read lock on an item and then later to upgrade the lock to a write lock. This effectively allows a transaction to examine the data first and then decide whether it wishes to update it. If upgrading is not supported, a transaction must hold write locks on all data items that it may update at some time during its execution, thereby potentially reducing the level of concurrency in the system. For the same reason, some systems also permit a transaction to issue a write lock and then later to downgrade the lock to a read lock.

Granularity of Data Items

Granularity: The size of data items chosen as the unit of protection by a concurrency control protocol.

A data item is chosen to be one of the following, ranging from coarse to fine, where fine granularity refers to small item sizes and coarse granularity refers to large item sizes:
• The entire database.
• A file.
• A page (sometimes called an area or database space: a section of physical disk in which relations are stored).
• A record.
• A field value of a record.

The granularity of the data item that can be locked in a single operation has a significant effect on the overall performance of the concurrency control algorithm. At the coarsest granularity, locking the entire database would prevent any other transactions from executing until the lock is released.
Thus, the coarser the data item size, the lower the degree of concurrency permitted. On the other hand, the finer the item size, the more locking information needs to be stored. The best item size depends upon the nature of the transactions.

The solutions to this problem involve providing a locking mechanism in the database server. Any restriction on the concurrency of transactions reduces the number of transactions that can be executing at any time. This balancing act is a typical trade-off: the more restrictive the concurrency strategy is, the more reliable it is, and the slower it is. DBMS designers, database administrators, and application developers must all carefully consider how much concurrency can be achieved without sacrificing either speed or reliability.

Timestamp-Based Protocol

The use of locks, combined with the two-phase locking protocol, guarantees serializability of schedules. The order of transactions in the equivalent serial schedule is based on the order in which the transactions lock the items they require. If a transaction needs an item that is already locked, it may be forced to wait until the item is released. A different approach that also guarantees serializability uses transaction timestamps to order transaction execution for an equivalent serial schedule.

Timestamp methods for concurrency control are different from locking methods. No locks are involved, and therefore there can be no deadlock. Locking methods generally prevent conflicts by making transactions wait. With timestamp methods, there is no waiting; transactions involved in a conflict are simply rolled back and restarted.

Timestamp: A unique identifier created by the DBMS that indicates the relative starting time of a transaction.

A timestamp can be generated by using the value of the system clock; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
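As a minimal sketch of how a timestamp method replaces waiting with rollback, the code below applies the basic timestamp-ordering rules: each data item remembers the largest timestamp of any transaction that read it and of any transaction that wrote it, and an operation that arrives too late is rejected. The names TimestampScheduler and Rollback are hypothetical, and a logical counter is assumed in place of the system clock so that the example is deterministic.

```python
import itertools


class Rollback(Exception):
    """Raised when an operation is rejected and the transaction must restart."""


class TimestampScheduler:
    def __init__(self):
        self._counter = itertools.count(1)  # logical counter used as the clock
        self.value = {}   # item -> current value
        self.r_ts = {}    # item -> largest timestamp of a successful read
        self.w_ts = {}    # item -> largest timestamp of a successful write

    def begin(self):
        """Assign a unique, increasing timestamp to a new transaction."""
        return next(self._counter)

    def read(self, ts, item):
        # A younger transaction has already overwritten the value this one needs.
        if ts < self.w_ts.get(item, 0):
            raise Rollback(f"read({item}) by TS={ts} is too late")
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return self.value.get(item)

    def write(self, ts, item, new_value):
        # A younger transaction has already read or written the item.
        if ts < self.r_ts.get(item, 0) or ts < self.w_ts.get(item, 0):
            raise Rollback(f"write({item}) by TS={ts} is too late")
        self.value[item] = new_value
        self.w_ts[item] = ts


if __name__ == "__main__":
    sched = TimestampScheduler()
    older, younger = sched.begin(), sched.begin()
    sched.write(younger, "Q", 5)
    try:
        sched.read(older, "Q")
    except Rollback as exc:
        print("rolled back:", exc)
```

In a full system the rejected transaction would be restarted with a new, larger timestamp; that restart logic is omitted here.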
The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj.

With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme: using the value of the system clock as the timestamp, or using a logical counter that is incremented each time a new timestamp is assigned.

Besides timestamps for transactions, there are timestamps for data items. Each data item contains a read-timestamp, giving the timestamp of the last transaction to read the item, and a write-timestamp, giving the timestamp of the last transaction to write (update) the item.
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.

Timestamping: A concurrency control protocol in which the fundamental goal is to order transactions in such a way that older transactions, that is, transactions with smaller timestamps, get priority in the event of conflict.

The Timestamp-Ordering Protocol

For a transaction Ti with timestamp TS(Ti), the timestamp-ordering protocol works as follows:
1. Suppose that transaction Ti issues read(Q).
(a) If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected.
(b) If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
(a) If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the system rejects the write operation.
(b) If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects this write operation.
(c) Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).

Transaction Failure and Recovery Management

Failure Classification

Transaction failure. There are two types of errors that may cause a transaction to fail:
• Logical error: The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or resource limit exceeded.
• System error: The system has entered an undesirable state (for example, deadlock), as a result of which the transaction cannot continue with its normal execution.
System crash. There is a hardware malfunction, or a bug in the DBMS or OS, that causes the loss of the content of volatile storage and brings transaction processing to a halt. The content of nonvolatile storage remains intact.
Disk failure. A disk block loses its content as a result of either a head crash or a failure during a data transfer operation.

Storage Types

Volatile storage. Information residing in volatile storage does not usually survive system crashes. Examples of such storage are main memory and cache memory. Access to volatile storage is extremely fast, both because of the speed of the memory access itself and because it is possible to access any data item in volatile storage directly.
Nonvolatile storage. Information residing in nonvolatile storage survives system crashes. Examples of such storage are disks and magnetic tapes. Both are subject to failure (for example, a head crash), which may result in loss of information.
Stable storage. Information residing in stable storage is never lost (in fact, "never" cannot be strictly guaranteed).
Although stable storage is theoretically impossible to obtain, it can be closely approximated by techniques that make data loss extremely unlikely.

By default, the execution of an SQL statement begins with an implicit request to open a transaction, followed by the processing of the statement, followed automatically by a commit request. Rollback happens only when the SQL statement fails. An application must make explicit calls to the database transaction manager to enter explicit-commit mode and allow multiple SQL statements to execute as a single transaction. An application executes an open transaction statement (begin transaction) to ask the transaction manager to create a new transaction before the next SQL statement executes. The application executes a commit transaction statement to ask the transaction manager to commit the transaction. The application executes a rollback statement to ask the transaction manager to cancel the transaction.

Storage Hierarchy

The database system resides permanently on nonvolatile storage (usually disks) and is partitioned into fixed-length storage units called blocks. Blocks are the units of data transfer from disk to main memory (or from main memory to disk) and may contain several data items. Transactions input information from the disk to main memory, and then output the information back onto the disk. The input and output operations are done in block units. The blocks residing on the disk are referred to as physical blocks; the blocks residing temporarily in main memory are referred to as buffer blocks. The area of memory where blocks reside temporarily is called the disk buffer.

Block movements between disk and main memory are initiated through the following two operations:
• Input(X): transfers the physical block that contains the data item X from disk to main memory.
• Output(X): transfers the buffer block that contains the data item X from main memory to disk and replaces the appropriate physical block there.
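The two block-transfer operations can be sketched with two dictionaries standing in for the disk and the disk buffer. The class BlockStore, the block name B1, and the placement of items A and B in the same block are assumptions made for illustration only.

```python
class BlockStore:
    """Hypothetical model: `disk` holds physical blocks, `buffer` holds buffer blocks."""

    def __init__(self, disk, placement):
        self.disk = disk            # block id -> {item name: value}
        self.placement = placement  # item name -> id of the block containing it
        self.buffer = {}            # block id -> {item name: value}

    def input(self, item):
        """Copy the physical block containing `item` from disk into the buffer."""
        block = self.placement[item]
        self.buffer[block] = dict(self.disk[block])
        return block

    def output(self, item):
        """Copy the buffer block containing `item` back over the physical block."""
        block = self.placement[item]
        self.disk[block] = dict(self.buffer[block])
        return block


if __name__ == "__main__":
    store = BlockStore({"B1": {"A": 1000, "B": 2000}}, {"A": "B1", "B": "B1"})
    block = store.input("A")        # the whole block moves, including item B
    store.buffer[block]["A"] = 950  # change made only in main memory
    print(store.disk["B1"])         # disk still holds the old value of A
    store.output("A")
    print(store.disk["B1"])         # disk now reflects the change
```

The sketch shows why transfers are described in block units: asking for one data item brings in every item stored in the same block.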
Input (A) A B Main Memory Output (B) B Disk Each transaction Ti has a private work area in which copies of all the data items accessed and updated by Ti are kept. The system creates this work area when he transaction is initiated; the system removes it when the transaction either commits or aborts. Each data item X kept in the work area of transaction Ti is denoted by xi. Transaction Ti interacts with the database system by transferring data to and from its work area to the system buffer. Data is transferred by these 2 operations: 1. read(X) assigns the value of data item X to the local variable xi. It executed this operation as follows: a. If block Bx on which X resides is not in main memory, it issues input(Bx). b. It assigns the value of X from the buffer block to xi . 2. Write(X) assigns the value of local variable xi to data item X in the buffer block. It executes this operation as follows: a. If block Bx on which X resides is not in main memory, it issues input(Bx). b. It assigns the value of xi to X in buffer Bx. Both operations may require the transfer of a block from disk to main memory. However, they do not require the transfer of a block from main memory to disk. The output (Bx) operation for the buffer block Bx on which X resides does not need to take effect immediately after write (X) is executed, since the block Bx may contain other data items that are still being accessed. A buffer block is eventually written out to the disk either because the buffer manager needs the memory space for other purposes or because the database system wishes to reflect the change to B on the disk. (DBMS performs a force-output of buffer B if it issues an output B). Algorithms proposed to ensure database consistency and transaction atomicity despite failures are known as recovery algorithms, which have 2 parts :1: Actions taken during normal transaction processing to ensure that enough information exists to allow recovery from failures. 
2. Actions taken after a failure to recover the database contents to a state that ensures database consistency, transaction atomicity, and durability.

Log-Based Recovery

The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types of log records. An update log record describes a single database write. It has these fields:
• Transaction identifier
• Data-item identifier
• Old value
• New value

Other special log records exist to record significant events during transaction processing. Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. (The transaction has its own memory that acts like a cache for the modified data items.) Once a log record exists, we can output the modification to the database if that is desirable. We also have the ability to undo a modification that has already been output to the database, by using the old-value field in the log records.

Deferred Database Modification

This technique ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of a transaction until the transaction partially commits. When a transaction partially commits, the information in the log associated with the transaction is used in executing the deferred writes. If the system crashes before the transaction completes its execution, or if the transaction aborts, then the information in the log is simply ignored.

The execution of transaction Ti proceeds as follows. Before Ti starts its execution, a record <Ti start> is written to the log. A write(X) operation by Ti results in the writing of a new record to the log. Finally, when Ti partially commits, a record <Ti commit> is written to the log.
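The logging steps of deferred modification can be sketched as follows. This is a minimal illustration, not a real DBMS interface: the class name DeferredTransaction, the tuple layout of the log records, and the use of an in-memory list in place of a log on stable storage are all assumptions made for the example.

```python
class DeferredTransaction:
    """Toy transaction that logs new values and defers all database writes."""

    def __init__(self, name, log, database):
        self.name = name
        self.log = log            # shared list standing in for the log
        self.database = database  # dict standing in for the database
        self.work_area = {}       # private copies of written items
        self.log.append((name, "start"))            # <Ti start>

    def read(self, item):
        # Prefer the transaction's own pending value, else the database value.
        return self.work_area.get(item, self.database[item])

    def write(self, item, value):
        self.work_area[item] = value
        self.log.append((self.name, item, value))   # <Ti, X, new value>

    def partial_commit(self):
        self.log.append((self.name, "commit"))      # <Ti commit>
        # Only now are the deferred writes executed, driven by the log.
        for record in self.log:
            if record[0] == self.name and len(record) == 3:
                _, item, new_value = record
                self.database[item] = new_value


database = {"A": 1000, "B": 2000, "C": 700}
log = []
t0 = DeferredTransaction("T0", log, database)
t0.write("A", t0.read("A") - 50)
t0.write("B", t0.read("B") + 50)
snapshot_before_commit = dict(database)  # still the old values
t0.partial_commit()
```

Because the database is untouched until partial_commit runs, discarding the log records of an unfinished transaction is enough to abort it.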
Example. Assume that the current values are A = 1000, B = 2000, and C = 700.

T0: Read(A); A = A - 50; Write(A); Read(B); B = B + 50; Write(B);
T1: Read(C); C = C - 100; Write(C);

The log records only the new values:
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
<T1 start>
<T1, C, 600>
<T1 commit>

When transaction Ti partially commits, the records associated with it in the log are used in executing the deferred writes. Since a failure may occur while this updating is taking place, we must ensure that, before the start of these updates, all the log records are written out to stable storage. Once they have been written, the actual updating takes place, and the transaction enters the committed state.

Normal execution:

Records in the log        Data in the database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>               A = 950, B = 2050
<T1 start>
<T1, C, 600>
<T1 commit>               C = 600

Case 1: failure before the commit record is written.

Records in the log        Data in the database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
(system failure)          A = 1000, B = 2000
<T1 start>
<T1, C, 600>
(system failure)          C = 700

The DBMS does not take any action after recovery from the failure, because the database has been left untouched.

Case 2: failure after the commit record is written.

Records in the log        Data in the database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
(system failure)          A = 950, B = 2050
<T1 start>
<T1, C, 600>
<T1 commit>
(system failure)          C = 600

The DBMS has to perform a redo operation after recovery from the failure.

Case 3: T0 fails before its commit record is written; T1 fails after its commit record is written.

Records in the log        Data in the database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
(system failure)          A = 1000, B = 2000
<T1 start>
<T1, C, 600>
<T1 commit>               C = 600
(system failure)

The DBMS does not take any action for T0, because A and B are untouched, but it must perform a redo of T1 after recovery from the failure.

Using the log, the system can handle any failure that results in the loss of information on volatile storage.
The recovery scheme uses the following recovery procedure: redo(Ti) sets the value of all data items updated by transaction Ti to the new values. The redo operation must be idempotent; that is, executing it several times must be equivalent to executing it once. This characteristic is required if we are to guarantee correct behavior even if a failure occurs during the recovery process.

After a failure, the recovery subsystem consults the log to determine which transactions need to be redone. Transaction Ti needs to be redone if and only if the log contains both the record <Ti start> and the record <Ti commit>. Thus, if the system crashes after the transaction completes its execution, the recovery scheme uses the information in the log to restore the system to the consistent state that existed after the transaction had completed.

Immediate Database Modification

This technique allows database modifications to be output to the database while the transaction is still in the active state. Data modifications written by active transactions are called uncommitted modifications. In the event of a crash or a transaction failure, the system must use the old-value field of the log records to restore the modified data items to the values they had prior to the start of the transaction. The undo operation accomplishes this restoration.

Before a transaction Ti starts its execution, the system writes the record <Ti start> to the log. During its execution, any write(X) operation by Ti is preceded by the writing of the appropriate new update record to the log. When Ti partially commits, the system writes the record <Ti commit> to the log.
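The redo and undo operations, and the idempotence requirement, can be sketched as below. The tuple layout (transaction, item, old value, new value) follows the update log record fields listed under Log-Based Recovery; the function names and the in-memory list and dictionary are hypothetical stand-ins for the log and the database.

```python
def redo(log, database, txn):
    """Set every item updated by `txn` to its new value, scanning the log forward."""
    for record in log:
        if record[0] == txn and len(record) == 4:
            _, item, _old_value, new_value = record
            database[item] = new_value


def undo(log, database, txn):
    """Restore every item updated by `txn` to its old value, scanning backward."""
    for record in reversed(log):
        if record[0] == txn and len(record) == 4:
            _, item, old_value, _new_value = record
            database[item] = old_value


log = [
    ("T0", "start"),
    ("T0", "A", 1000, 950),   # <T0, A, old value, new value>
    ("T0", "B", 2000, 2050),
]

# T0 wrote A and B to the database but never committed: undo it.
database = {"A": 950, "B": 2050, "C": 700}
undo(log, database, "T0")
after_first_undo = dict(database)
undo(log, database, "T0")     # repeating the undo changes nothing
after_second_undo = dict(database)

# Applying redo twice likewise gives the same result as applying it once.
redo(log, database, "T0")
redo(log, database, "T0")
after_redo = dict(database)
```

Both functions assign absolute values taken from the log rather than applying increments such as "subtract 50", which is what makes repeating them harmless if recovery itself is interrupted and restarted.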
Example (the log now records both the old value and the new value):

T0: Read(A); A = A - 50; Write(A); Read(B); B = B + 50; Write(B);
T1: Read(C); C = C - 100; Write(C);

Normal execution:

Records in the log        Data in the database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>       A = 950, B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>         C = 600
<T1 commit>

Case 1: failure before the commit record is written.

Records in the log        Data in the database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>       A = 950, B = 2050
(system failure)
<T1 start>
<T1, C, 700, 600>         C = 600
(system failure)

The DBMS must undo T0 and T1, using the old values, after recovery from the failure.

Case 2: failure after the commit record is written.

Records in the log        Data in the database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>       A = 950, B = 2050
<T0 commit>
(system failure)
<T1 start>
<T1, C, 700, 600>         C = 600
<T1 commit>
(system failure)

The DBMS has to redo T0 and T1, using the new values, after recovery from the failure.

Case 3: T0 fails before its commit record is written; T1 fails after its commit record is written.

Records in the log        Data in the database
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>       A = 950, B = 2050
(system failure)
<T1 start>
<T1, C, 700, 600>         C = 600
<T1 commit>
(system failure)

The DBMS has to undo T0 and redo T1 after recovery from the failure.

After a failure, the recovery subsystem consults the log to determine which transactions need to be undone or redone. Transaction Ti needs to be undone if the log contains the record <Ti start> but not the record <Ti commit>, and it needs to be redone if the log contains both <Ti start> and <Ti commit>. Thus, if the system crashes after the transaction completes its execution, the recovery scheme uses the information in the log to restore the system to the consistent state that existed after the transaction had completed. Since the information in the log is used in reconstructing the state of the database, we require that, before execution of an output(B) operation, the log records corresponding to B be written onto stable storage.

Rollback Segment (RBS)

An Oraclex database has a data area that contains a rollback segment (RBS) entry for each open transaction.
An RBS entry is a set of images of the rows that have been modified by the transaction. The images represent the values of the rows before the execution of the transaction. Each update operation executed by a transaction is applied to a row of a database table only after the previous value of the row is added to the RBS entry.

[Figure: Oraclex database server. Transaction T performs T.A write r, T.B write s, T.C read s, and T.D read u. The rollback segment holds the before images; the database tables hold the updated values.]

The open transaction operation creates a new RBS entry and associates it with the transaction. The execution of a transaction commit operation deletes the RBS entry and makes the changes permanent. The execution of a rollback operation restores all of the modified rows from the RBS entry.

In other DBMSs, the transaction has its own memory that acts like a cache for the modified rows. During the execution, the database tables are not changed. Instead, the new row images are written into the memory of the transaction. All accesses to rows in database tables go first to the transaction cache. If a row is not found there, the full database tables are used. The commit operation flushes the cache by writing the new row values to the database tables and deleting the cache. The rollback operation deletes the cache, leaving the database unchanged.

[Figure: cached-updates database server. Transaction T performs T.A write r, T.B write s, T.C read s, and T.D read u. The update segment holds the updated values; the database tables hold the before images.]

Checkpoints

When a system failure occurs, the DBMS must consult the log to determine which transactions need to be redone and which need to be undone. In principle, the entire log must be searched to determine this information. There are two major difficulties with this approach:
1. The search process is time consuming.
2. Most of the transactions that, according to the algorithm, need to be redone have already written their updates into the database.
Although redoing them will cause no harm, it will nevertheless cause recovery to take longer.

To reduce the number of transactions to be redone and undone, the system periodically performs checkpoints, which require the following sequence of actions to take place:
1. Output onto stable storage all log records currently residing in main memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>.

Transactions are not allowed to perform any update actions, such as writing to a buffer block or writing a log record, while a checkpoint is in progress.

The presence of a <checkpoint> record in the log allows the system to streamline its recovery procedure. Consider a transaction Ti that committed prior to the checkpoint. For such a transaction, the <Ti commit> record appears in the log before the <checkpoint> record. Any database modifications made by Ti must have been written to the database either prior to the checkpoint or as part of the checkpoint itself. Thus, at recovery time there is no need to perform a redo operation on Ti.

This observation allows us to refine our previous recovery schemes. After a failure has occurred, the recovery scheme examines the log to determine the most recent transaction Ti that started executing before the most recent checkpoint took place. It can find such a transaction by searching the log backward, from the end of the log, until it finds the first <checkpoint> record (since we are searching backward, the record found is the final <checkpoint> record in the log); then it continues the search backward until it finds the next <Ti start> record. This record identifies a transaction Ti. Once the system has identified transaction Ti, the redo and undo operations need to be applied only to transaction Ti and to all transactions Tj that started execution after transaction Ti. Let us denote these transactions by the set T.
The remainder (earlier part) of the log can be ignored, and can be erased whenever desired. The exact recovery operations to be performed depend on the modification technique being used. For the immediate-modification technique, the recovery operations are:
• For all transactions Tk in T that have no <Tk commit> record in the log, execute undo(Tk).
• For all transactions Tk in T such that the record <Tk commit> appears in the log, execute redo(Tk).

Consider the set of transactions {T0, T1, ..., T100} executed in the order of the subscripts. Suppose that the most recent checkpoint took place during the execution of transaction T67. Thus, only transactions T67, T68, ..., T100 need to be considered during recovery. Each of them needs to be redone if it has committed; otherwise, it needs to be undone.
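The checkpoint-based procedure for the immediate-modification technique can be sketched as follows. This is a simplified illustration that assumes transactions run one after another, as in the example above; the function name recover, the tuple layout of the log records, and the transaction T2 with its values are hypothetical.

```python
def recover(log, database):
    """Undo or redo only the transactions in the set T found via the last checkpoint."""
    # Find the final <checkpoint> record (equivalent to searching backward).
    checkpoints = [i for i, record in enumerate(log) if record == ("checkpoint",)]
    first = 0  # with no checkpoint, the whole log must be considered
    if checkpoints:
        # Continue backward to the nearest <Ti start> before the checkpoint.
        for i in range(checkpoints[-1] - 1, -1, -1):
            if len(log[i]) == 2 and log[i][1] == "start":
                first = i
                break
    tail = log[first:]  # the earlier part of the log is ignored

    started = [r[0] for r in tail if len(r) == 2 and r[1] == "start"]
    committed = {r[0] for r in tail if len(r) == 2 and r[1] == "commit"}

    # undo(Tk): restore old values, scanning backward, for uncommitted Tk.
    for record in reversed(tail):
        if len(record) == 4 and record[0] not in committed:
            database[record[1]] = record[2]
    # redo(Tk): reapply new values, scanning forward, for committed Tk.
    for record in tail:
        if len(record) == 4 and record[0] in committed:
            database[record[1]] = record[3]

    undone = [t for t in started if t not in committed]
    redone = [t for t in started if t in committed]
    return undone, redone


log = [
    ("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050), ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 700, 600),
    ("checkpoint",),
    ("T1", "commit"),
    ("T2", "start"), ("T2", "A", 950, 900),   # hypothetical T2, crash before commit
]
database = {"A": 900, "B": 2050, "C": 600}    # state on disk at the time of the crash
undone, redone = recover(log, database)
```

In this run T0 lies entirely before the starting point, so it is skipped; T1 was active at the checkpoint and later committed, so it is redone; T2 never committed, so it is undone.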