* Topic 6.3: Transactions and Concurrency Control (Hari Uday)
* Part 1: Basic Knowledge
* Originally developed for databases, but can be applied to distributed file management systems
* Motivation: to make use of clean and powerful atomic transaction semantics (ACID)
* Transaction Services:
  * Clients at each distributed site issue transaction service requests to the local transaction processing system (TPS)
  * A TPS consists of a transaction manager (TM), a scheduler (SCH), and an object manager (OM)
  * A transaction may invoke operations on remote objects (sub-transactions), which are sent to the remote TPS
  * The transaction manager works in tandem with other TMs to make sure local as well as remote transactions execute consistently
  * The scheduler orders access operations on objects so that conflicts are avoided
  * The object manager ensures the coherency of replicas and caches and provides the interface to the file system
* Atomicity, Consistency, Isolation, and Durability:
  * Atomicity: either all of a transaction's actions are carried out or none are
  * Consistency: when a user's transaction runs to completion by itself against a 'consistent' database instance, it leaves the database in a 'consistent' state
  * Isolation: users can understand a transaction without considering the effect of other concurrently executing transactions
  * Durability: once a transaction has successfully completed, its effects persist even if the system crashes before all its changes are reflected on disk
* Client process: a client initiates a transaction by issuing a begin-transaction to its local TM. The TM generates a transaction ID and a workspace for subsequent read/write operations.
* The TM then sends the transaction ID to the scheduler; this is called the execution phase. Access requests from the TM are handled by the scheduler. To maintain atomicity, a rejected operation causes the TM to abort. Finally, the TM either commits or aborts.
* The SCH chooses a concurrency control protocol. Three approaches, which either prevent inconsistency or validate consistency:
  * 1st approach: locks (shared and exclusive) prevent conflicting operations from interleaving
  * 2nd approach: each individual access operation is checked by the SCH and is either accepted, tentatively accepted, or rejected; timestamps are used (see the timestamp-ordering sketch below)
  * 3rd approach: conflicts are ignored in the execution phase; consistency is validated at the end of the execution phase, and only transactions that are globally validated are allowed to commit
* The OM is responsible for interfacing with the underlying file service for the actual operations on an object
  * Provides cache management for efficiency
  * Participates in failure recovery protocols to support durability
  * If objects are replicated, the consistency of replicas is managed by a replica management protocol
* The concept of transactions is very important in multiuser file systems (NFS, NTFS)
* Scheduling transactions: a transaction is a series of actions, including READs/WRITEs of files as well as ABORTs and COMMITs
  * A schedule is a list of actions from a set of transactions; the order in which two actions of a transaction T appear in a schedule must be the same as the order in which they appear in T
  * Serial schedule: a schedule that does not interleave the actions of different transactions
  * Equivalent schedules: for any file system, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule
  * Serializable schedule: a schedule that is equivalent to some serial execution of the transactions (a conflict-serializability test is sketched below)
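* As an illustration of the 2nd approach, here is a minimal Python sketch of basic timestamp ordering. The class and method names are illustrative assumptions, not part of any real TPS, and the "tentatively accepted" case is omitted: the scheduler just tracks the largest read/write timestamps per object and accepts or rejects each operation against them.

```python
# Toy sketch of basic timestamp ordering (illustrative names, not a real API).
class TimestampScheduler:
    def __init__(self):
        self.rts = {}   # object -> largest transaction timestamp that read it
        self.wts = {}   # object -> timestamp of the last accepted write

    def read(self, ts, obj):
        # Reject a read that arrives after a younger transaction's write.
        if ts < self.wts.get(obj, 0):
            return "reject"            # the TM must abort this transaction
        self.rts[obj] = max(self.rts.get(obj, 0), ts)
        return "accept"

    def write(self, ts, obj):
        # Reject a write that arrives after a younger transaction's read or write.
        if ts < self.rts.get(obj, 0) or ts < self.wts.get(obj, 0):
            return "reject"
        self.wts[obj] = ts
        return "accept"

sched = TimestampScheduler()
print(sched.write(1, "x"))   # accept
print(sched.read(3, "x"))    # accept: rts["x"] becomes 3
print(sched.write(2, "x"))   # reject: a younger transaction (ts 3) already read "x"
```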
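* To make the serializability definition testable: a schedule is conflict-serializable exactly when its precedence graph (an edge Ti -> Tj for every pair of conflicting actions in which Ti acts first) is acyclic. Below is a toy Python test, assuming schedules are encoded as (transaction, action, object) triples; the encoding is an assumption for illustration.

```python
# Conflict-serializability test via a precedence graph (toy encoding).
def is_conflict_serializable(schedule):
    """schedule: list of (txn, action, obj) triples, e.g. ("T1", "R", "A")."""
    # Edge Ti -> Tj for each conflict: same object, different txns, >= 1 WRITE.
    edges = set()
    for i, (ti, ai, oi) in enumerate(schedule):
        for tj, aj, oj in schedule[i + 1:]:
            if ti != tj and oi == oj and "W" in (ai, aj):
                edges.add((ti, tj))

    # Serializable iff the graph has no cycle (checked by depth-first search).
    visiting, done = set(), set()

    def has_cycle(t):
        visiting.add(t)
        for a, b in edges:
            if a == t and (b in visiting or (b not in done and has_cycle(b))):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    txns = {t for t, _, _ in schedule}
    return not any(has_cycle(t) for t in txns if t not in done)

# RW conflicts in both directions form the cycle T1 -> T2 -> T1: not serializable.
print(is_conflict_serializable(
    [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]))  # False
```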
* If each transaction preserves consistency, every serializable schedule preserves consistency
* Two actions on the same data object conflict if at least one of them is a WRITE:
  * write-read (WR) conflict: T2 could read a DB object that has been modified by T1, which has not yet committed
  * read-write (RW) conflict: T2 could change the value of an object that has been read by T1, while T1 is still in progress
  * write-write (WW) conflict: T2 could overwrite the value of an object that has already been modified by T1, while T1 is still in progress
* A serializable schedule is a schedule whose effect on any consistent database instance is guaranteed to be identical to that of some complete serial schedule over the set of committed transactions
* In a recoverable schedule, transactions commit only after all transactions whose changes they read have committed
* A lock is a small bookkeeping object associated with a data object
* A locking protocol is a set of rules to be followed by each transaction to ensure that, even though actions of several transactions might be interleaved, the net effect is identical to executing all transactions in some serial order. Two mechanisms:
  * Strict 2PL locking
  * Timestamps
* Strict 2PL protocol (see the lock-table sketch below):
  * Each transaction must obtain an S (shared) lock on an object before reading it, and an X (exclusive) lock on an object before writing it
  * All locks held by a transaction are released when the transaction completes
  * If a transaction holds an X lock on an object, no other transaction can get a lock (S or X) on that object
  * Strict 2PL allows only serializable schedules
* If a transaction Ti is aborted, all its actions have to be undone. Not only that: if Tj reads an object last written by Ti, Tj must be aborted as well.
* Most systems avoid such cascading aborts by releasing a transaction's locks only at commit time: if Ti writes an object, Tj can read it only after Ti commits.
* In order to undo the actions of an aborted transaction, the system maintains a log in which every write is recorded. This mechanism is also used to recover from system crashes: all transactions active at the time of the crash are aborted when the system comes back up.
* The following actions are recorded in the log (see the write-ahead-logging sketch below):
  * Ti writes an object: the old value and the new value. The log record must go to disk before the changed page!
  * Ti commits/aborts: a log record indicating this action
* Log records are chained together by transaction ID, so it is easy to undo a specific transaction
* The log is often duplexed and archived on stable storage
* All log-related activities are handled transparently by the DBMS
* Crash Recovery:
  * A recovery manager is responsible for ensuring transaction atomicity and durability
  * Can the changes made to an object in the buffer pool by a transaction T be written to disk before T commits? (steal?)
  * When a transaction commits, must all the changes it has made to objects in the buffer pool be immediately forced to disk? (force?)
  * Recommended: use a steal, no-force approach
* Three phases for the recovery algorithm (sketched below):
  * Analysis: scan the log forward (from the most recent checkpoint) to identify all transactions that were active, and all dirty pages in the buffer pool, at the time of the crash
  * Redo: redo all updates to dirty pages in the buffer pool, as needed, to ensure that all logged updates are in fact carried out and written to disk
  * Undo: the writes of all transactions that were active at the crash are undone, working backwards in the log
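* Returning to Strict 2PL: below is a minimal sketch of the lock table it implies. The `LockManager` class is an illustrative assumption; a real lock manager also queues blocked requests and detects deadlocks, which this toy omits.

```python
# Toy Strict 2PL lock table: S is compatible only with S, X with nothing,
# and a transaction's locks are released in one batch at commit/abort.
class LockManager:
    def __init__(self):
        self.locks = {}   # obj -> (mode, set of holding txns), mode "S" or "X"

    def acquire(self, txn, obj, mode):
        held = self.locks.get(obj)
        if held is None:                       # unlocked: grant immediately
            self.locks[obj] = (mode, {txn})
            return True
        cur_mode, holders = held
        if mode == "S" and cur_mode == "S":    # shared readers coexist
            holders.add(txn)
            return True
        if holders == {txn}:                   # sole holder may upgrade S -> X
            self.locks[obj] = ("X" if "X" in (mode, cur_mode) else "S", holders)
            return True
        return False                           # conflict: caller blocks or aborts

    def release_all(self, txn):
        # Strict 2PL: everything is released only when the transaction completes.
        for obj in list(self.locks):
            mode, holders = self.locks[obj]
            holders.discard(txn)
            if not holders:
                del self.locks[obj]
```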
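* The write-ahead rule above ("log record before the changed page") can be sketched as follows; the `log`/`disk` structures and function names are stand-ins for stable storage and the buffer manager, not a real DBMS interface:

```python
# Toy write-ahead logging: the update record (old and new value) is appended
# to the log before the changed page may reach disk.
log = []     # append-only log on stable storage
disk = {}    # object -> value on disk
lsn = 0      # log sequence number

def logged_write(txn, obj, new_value):
    global lsn
    lsn += 1
    # 1. Force the log record, carrying both old and new value, to the log ...
    log.append({"lsn": lsn, "txn": txn, "type": "update",
                "obj": obj, "old": disk.get(obj), "new": new_value})
    # 2. ... and only then write the changed page (done eagerly here; a real
    #    steal/no-force buffer manager may defer or advance this write).
    disk[obj] = new_value

def commit(txn):
    global lsn
    lsn += 1
    log.append({"lsn": lsn, "txn": txn, "type": "commit"})
```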
* Some care must be taken to handle the case of a crash occurring during the recovery process itself!
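* With that caveat in mind, here is a compact sketch of the three phases over a log shaped like the one above. A real recovery manager works from checkpoints, a dirty-page table, and per-page LSNs; this toy version shows only the analysis/redo/undo structure:

```python
# Toy three-phase recovery over the log/disk structures sketched earlier.
def recover(log, disk):
    # Analysis: transactions with updates but no commit record were active
    # at the crash ("losers").
    committed = {r["txn"] for r in log if r["type"] == "commit"}
    losers = {r["txn"] for r in log if r["type"] == "update"} - committed

    # Redo: replay every logged update forward, so the disk reflects all
    # logged changes regardless of what was flushed before the crash.
    for r in log:
        if r["type"] == "update":
            disk[r["obj"]] = r["new"]

    # Undo: walk the log backwards, restoring old values for the losers.
    for r in reversed(log):
        if r["type"] == "update" and r["txn"] in losers:
            if r["old"] is None:
                disk.pop(r["obj"], None)   # object did not exist before
            else:
                disk[r["obj"]] = r["old"]
```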