Transaction Processing
John Ortiz

Introduction
Transactions are motivated by two of the properties of DBMSs discussed back in our first lecture:
  Multi-user database access
  Safety from system crashes
Main issues:
  How to model concurrent execution of user programs?
  How to guarantee acceptable DB behavior?
  How to deal with system crashes?

Lecture 19 Transaction Processing 2

Why Concurrency?
Allowing only serial execution of user programs may cause poor system performance:
  Low throughput, long response time
  Poor resource utilization (CPU, disks)
Concurrent execution of user programs is essential for good DBMS performance.
Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.

Example: Why Concurrency?
Assume each user's program uses CPU and I/O resources (disks) in an interleaved fashion: CPU, R(X), CPU, W(X)
Suppose each CPU request takes 1 time unit and each I/O request takes 5 time units, so one program needs 1 + 5 + 1 + 5 = 12 time units.
For a 2 GHz machine, one clock tick is ½ ns.
An 8 millisecond seek time is 8,000 microseconds, which is 8,000,000 ns, or about 16 million clock ticks.
Clearly the CPU can get quite a bit done while the disk is searching for a block.

Example: Why Concurrency?
[Figure: CPU and I/O timelines for T1, T2, T3, T4 with one disk. Serial schedule: 48 time units. Non-serial schedule: 41 time units.]

Example: Why Concurrency?
[Figure: CPU and I/O timelines for T1, T2, T3, T4 with two disks (I/O 1, I/O 2). Serial schedule: 48 time units. Non-serial schedule: 22 time units.]

Transaction
A user program may carry out many operations on data retrieved from the database, but the DBMS is only concerned with what data is read from or written to the database (on disk).
A transaction is a sequence of database actions that is considered a unit of work.
  DB actions: read (R(X)), write (W(X)), commit, abort
A transaction represents the DBMS's abstract view of:
  Interactive user sessions
  Execution of user programs

Example: Transaction
Account(Ano, Name, Type, Balance)
A user wants to run:
  update Account set Balance = Balance - 50 where Ano = 10001
  update Account set Balance = Balance + 50 where Ano = 12300
Let A be the account with Ano = 10001 and B be the account with Ano = 12300.
The transaction is R(A), W(A), R(B), W(B)

States of a Transaction
[Figure: state-transition diagram. States: active, partially committed, committed, failed, aborted. Transition labels: begin transaction, read/write, end transaction, commit, exception, failure, abort.]

Consistency of Transaction
Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins.
The DBMS will enforce some integrity constraints (ICs), depending on the ICs declared in CREATE TABLE statements.
Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).

Atomicity of Transactions
A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions.
A very important property guaranteed by the DBMS for all transactions is that they are atomic. That is, a user can think of a transaction as always executing all its actions in one step, or not executing any actions at all.
The DBMS logs all actions so that it can undo the actions of aborted transactions.

Example: Why Atomicity?
Account(Ano, Name, Type, Balance)
A user wants to run:
  update Account set Balance = Balance - 50 where Ano = 10001
  update Account set Balance = Balance + 50 where Ano = 12300
Suppose the system crashes in the middle.
Possible outcome w/o recovery: $50 transferred or lost
The operations must be done as a unit.

Durability
The DBMS often keeps data in a main-memory buffer to improve system efficiency.
Data in the buffer is volatile (it may be lost if the system crashes).
When a transaction commits, the DBMS must guarantee that all updates made by the transaction will not be lost even if the system crashes later.
The DBMS uses the log to redo actions of committed transactions if necessary.

Isolation
Users submit transactions, and can think of each transaction as executing by itself (in isolation).
Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions.
The DBMS guarantees that interleaved transactions do not interfere with each other.

Example: Why Isolation?
Two users (programs) do this at the same time:
  User 1: update Student set GPA = 3.7 where SID = 123
  User 2: update Student set Major = 'CS' where SID = 123
Sequence of events: for each user, read tuple, modify attribute, write tuple.
Possible outcomes w/o concurrency control: one change or both

Example: Why Isolation?
Emp(EID, Name, Dept, Sal, Start, Loc)
  User 1: update Emp set Dept = 'Sales' where Loc = 'Downtown'
  User 2: update Emp set Start = 3/1/00 where Start = 2/29/00
Possible outcomes w/o concurrency control: each tuple has one change or both, and the result may be inconsistent across tuples

Example: Interleaved Transactions
Consider two transactions:
  T1: BEGIN A=A+100, B=B-100 END
  T2: BEGIN A=1.06*A, B=1.06*B END
One possible interleaved execution:
  T1: A=A+100, B=B-100
  T2: A=1.06*A, B=1.06*B
It is OK. But what about another interleaving?
  T1: A=A+100, B=B-100
  T2: A=1.06*A, B=1.06*B
[The horizontal offsets that distinguished the two interleavings on the slide are not preserved.]

Schedule: Modeling Concurrency
Schedule: a sequence of operations from a set of transactions, where operations from any one transaction are in their original order
Notation:
  Ri(X): read X by Ti
  Wi(X): write X by Ti
Example: R1(A), W1(A), R2(B), W2(B), R1(C), W1(C)
  T1        T2
  R(A)
  W(A)
            R(B)
            W(B)
  R(C)
  W(C)

Schedule (cont.)
A schedule represents some actual sequence of database actions.
In a complete schedule, each transaction ends in commit or abort.
A schedule transforms the database from an initial state to a final state.
[Figure: Initial state -> a schedule -> Final state]

Schedule (cont.)
Assume a consistent initial state.
A schedule is a representation of an execution of operations from a set of transactions.
Ignore aborted transactions and incomplete (not yet committed) transactions.
Two operations in a schedule conflict if
  1. They belong to different transactions
  2. They access the same data item
  3. At least one of them is a write operation

Anomalies with Concurrency
Interleaving transactions may cause many kinds of consistency problems:
  Reading Uncommitted Data (“dirty reads”): R1(A), W1(A), R2(A), W2(A), C2, R1(B), A1
  Unrepeatable Reads: R1(A), R2(A), W2(A), C2, R1(A), W1(A), C1
  Overwriting Uncommitted Data (lost update): R1(A), R2(A), W2(A), W1(A)

Anomalies with Concurrency
Incorrect Summary Problem
  Data items may be changed by one transaction while another transaction is in the process of calculating an aggregate value.
  A correct “sum” may be obtained prior to any change, or immediately after any change.

Serial Schedule
An acceptable schedule must transform the database from one consistent state to another consistent state.
Serial schedule: one transaction runs entirely before the next transaction starts.
  T1: R(X), W(X)
  T2: R(X), W(X)
  Serial:     R1(X) W1(X) C1 R2(X) W2(X) C2
  Serial:     R2(X) W2(X) C2 R1(X) W1(X) C1
  Non-serial: R1(X) R2(X) W2(X) W1(X) C1 C2

Serial Schedule IS Acceptable
Serial schedules guarantee transaction isolation and consistency.
Different serial schedules can have different final states.
N transactions may form N! different serial schedules.
Any state from a serial schedule is acceptable: the DBMS makes no guarantee about the order in which transactions are executed.

Example: Serial Schedules
  T1: R(X), X=X+10, W(X)
  T2: R(X), X=X*2, W(X)
Initial X = 20
  S1: R1(X) W1(X) C1 R2(X) W2(X) C2    Final X = (20 + 10) * 2 = 60
  S2: R2(X) W2(X) C2 R1(X) W1(X) C1    Final X = 20 * 2 + 10 = 50

Is Non-Serial Schedule Acceptable?
  T1: R(X), X=X*2, W(X), R(Y), Y=Y-5, W(Y)
  T2: R(X), X=X+10, W(X)
Initial X = 20, Y = 35
  S1: R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y) C1 C2
  S2: R1(X) W1(X) R1(Y) W1(Y) C1 R2(X) W2(X) C2
Final X = 50, Y = 30
S1 is non-serial, yet it reaches the same final state as the serial schedule S2.

Serializable Schedules
Serializable schedule: equivalent to a serial schedule of committed transactions.
  Non-serial (allows concurrent execution)
  Acceptable (final state is what some serial schedule would have produced)
Types of serializable schedules depend on how the equivalence is defined:
  Conflict: based on conflicting operations
  View: based on viewing of data
Ex: p.645, text does not show commits

Lock-Based Concurrency Control
Strict Two-phase Locking (Strict 2PL) Protocol:
  Each transaction must obtain an S (shared) lock on an object before reading it, and an X (exclusive) lock on an object before writing it.
  All locks held by a transaction are released when the transaction completes.
  If a transaction holds an X lock on an object, no other transaction can get a lock (S or X) on that object.
Strict 2PL allows only serializable schedules.

Cascading Aborts
When a transaction aborts, all its actions are undone. The DBMS uses a log to keep track of the actions of each transaction.
If T1 reads uncommitted data written by T2 (dirty read) and T2 must be aborted, then T1 must also be aborted (cascading aborts).
  T1: R(A) W(A) … Abort
  T2: R(A) W(A) …
Cascadeless schedule: transactions only read data written by committed transactions

Recoverability
If a transaction fails, the DBMS must return the DB to its previous state.
  1. Computer failure: hardware, software, network, memory error
  2. Transaction error: erroneous input, division by zero
  3. Local errors: insufficient funds, data not found
  4. Concurrency control enforcement: transaction aborted
  5. Disk failure: hard disk crash (listed in text but not much different from 1.)
  6. Physical catastrophe: power, theft, fire, etc.
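
The final states quoted for S1 and S2 in the non-serial schedule example can be checked by replaying the schedules step by step. The sketch below is only an illustration under simplifying assumptions: the names `parse`, `replay`, and `updates` are hypothetical, each transaction keeps a private copy of every item it reads, a write stores the updated copy back, and commits have no effect.

```python
from typing import Callable, Dict, List, Tuple

# One step of a schedule: (transaction id, action, data item).
# Actions: "R" = read, "W" = write, "C" = commit.
Step = Tuple[int, str, str]


def parse(text: str) -> List[Step]:
    """Parse the slide notation, e.g. "R1(X) W1(X) C1"."""
    steps: List[Step] = []
    for tok in text.split():
        if tok[0] == "C":
            steps.append((int(tok[1:]), "C", ""))
        else:
            name, item = tok.rstrip(")").split("(")
            steps.append((int(name[1:]), name[0], item))
    return steps


def replay(schedule: List[Step],
           updates: Dict[Tuple[int, str], Callable[[float], float]],
           db: Dict[str, float]) -> Dict[str, float]:
    """Replay a schedule and return the final database state."""
    db = dict(db)  # work on a copy so the caller's state is untouched
    local: Dict[Tuple[int, str], float] = {}
    for txn, action, item in schedule:
        if action == "R":
            # Read: copy the current database value into the transaction.
            local[(txn, item)] = db[item]
        elif action == "W":
            # Write: apply the transaction's update to its private copy.
            db[item] = updates[(txn, item)](local[(txn, item)])
        # "C" (commit) changes nothing in this simplified model.
    return db


# T1: X = X*2, Y = Y-5    T2: X = X+10    (the example from the slides)
updates = {
    (1, "X"): lambda x: x * 2,
    (1, "Y"): lambda y: y - 5,
    (2, "X"): lambda x: x + 10,
}
initial = {"X": 20, "Y": 35}

s1 = parse("R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y) C1 C2")  # non-serial
s2 = parse("R1(X) W1(X) R1(Y) W1(Y) C1 R2(X) W2(X) C2")  # serial: T1, T2
s3 = parse("R2(X) W2(X) C2 R1(X) W1(X) R1(Y) W1(Y) C1")  # serial: T2, T1
bad = parse("R1(X) R2(X) W2(X) W1(X) R1(Y) W1(Y) C1 C2")  # lost update

print(replay(s1, updates, initial))   # {'X': 50, 'Y': 30}
print(replay(s2, updates, initial))   # {'X': 50, 'Y': 30}
print(replay(s3, updates, initial))   # {'X': 60, 'Y': 30}
print(replay(bad, updates, initial))  # {'X': 40, 'Y': 30}
```

The last schedule ends with X = 40, which neither serial order produces, so it is not acceptable; S1 matches the serial order T1, T2.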

Recoverability
If T1 reads data from T2, commits, and then T2 needs to abort, what should the DBMS do?
  This situation is undesirable!
A schedule is recoverable if every transaction commits only after all transactions from which it reads data commit.
Cascadeless schedules are recoverable (but not vice-versa!).
Real systems typically ensure that only recoverable schedules arise (through locking).

Summary
Transactions model the DBMS's view of user programs.
Concurrency control and recovery are important issues in DBMSs.
Transactions must have ACID properties:
  Atomicity
  Consistency
  Isolation
  Durability
C & I are guaranteed by concurrency control.
A & D are guaranteed by crash recovery.

Summary (cont.)
A schedule models concurrent execution of transactions.
Conflicts arise when two transactions access the same object, and one of the transactions is modifying it.
Serial execution is our model of correctness.
Serializability allows us to “simulate” serial execution with better performance.
Concurrent execution should avoid cascading aborts and be recoverable.
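
The definitions of recoverable and cascadeless schedules can be turned into a small checker. This is a minimal sketch, not a DBMS algorithm: the names `parse` and `classify` are hypothetical, and it assumes a transaction "reads from" whichever other transaction made the most recent write to the item that has not been undone by an abort.

```python
from typing import Dict, List, Set, Tuple

# (transaction id, action, item); actions: R, W, C (commit), A (abort)
Step = Tuple[int, str, str]


def parse(text: str) -> List[Step]:
    """Parse slide notation such as "R1(A), W1(A), C2, A1"."""
    steps: List[Step] = []
    for tok in text.replace(",", " ").split():
        if tok[0] in "CA":
            steps.append((int(tok[1:]), tok[0], ""))
        else:
            name, item = tok.rstrip(")").split("(")
            steps.append((int(name[1:]), name[0], item))
    return steps


def classify(schedule: List[Step]) -> Tuple[bool, bool]:
    """Return (recoverable, cascadeless) for a schedule."""
    writers: Dict[str, List[int]] = {}    # per item: writers still in effect
    committed: Set[int] = set()
    reads_from: Dict[int, Set[int]] = {}  # reader -> transactions it read from
    recoverable = True
    cascadeless = True
    for txn, action, item in schedule:
        if action == "W":
            writers.setdefault(item, []).append(txn)
        elif action == "R":
            history = writers.get(item, [])
            if history and history[-1] != txn:
                source = history[-1]
                reads_from.setdefault(txn, set()).add(source)
                if source not in committed:
                    cascadeless = False   # dirty read
        elif action == "C":
            # Committing before every transaction we read from has committed
            # makes the schedule unrecoverable.
            if any(s not in committed for s in reads_from.get(txn, ())):
                recoverable = False
            committed.add(txn)
        elif action == "A":
            # Undo: the aborted transaction's writes no longer count.
            for history in writers.values():
                history[:] = [w for w in history if w != txn]
    return recoverable, cascadeless


# The dirty-read schedule from the anomalies slide: T2 reads T1's
# uncommitted write and commits, then T1 aborts.
print(classify(parse("R1(A), W1(A), R2(A), W2(A), C2, R1(B), A1")))  # (False, False)
# T2 reads dirty data but commits only after T1 has committed.
print(classify(parse("W1(A) R2(A) C1 C2")))  # (True, False)
# T2 reads only committed data.
print(classify(parse("W1(A) C1 R2(A) C2")))  # (True, True)
```

The second schedule shows why the implication runs one way only: it is recoverable, yet an abort of T1 before C1 would have forced T2 to abort as well.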