On Transactions

1 Most database systems allow multiple users to execute simultaneously on common data. Each user runs his/her own process, accessing possibly the same data (relations, rows, objects). Without adequate control, such concurrent access can lead to:
-- inconsistent data in the database
-- wrong results (for example, an inconsistent view of the data)

2 Consider an accounts relation: A(aid, balance, owner)
Assume a depositor Smith has two accounts: (A1, $900, smith) and (A2, $100, smith).
Consider two processes P1 and P2, where:
P1: move $400 from A1 to A2
P2: perform a credit check on depositor Smith and, if the total balance in the bank is at least $900, issue a credit card

3 Only P1 updates the database. Hence the three possible states in which we can find Smith's balances are as follows:
State 1: A1.balance = $900, A2.balance = $100 (values before any update from P1 takes place)
State 2: A1.balance = $500, A2.balance = $100 (values after subtracting $400 from A1.balance)
State 3: A1.balance = $500, A2.balance = $500 (values after adding $400 to A2.balance)
--- NOTE that State 2 is an intermediate state and should NOT be visible to process P2 (inconsistent view).

4 To avoid inconsistent views and a number of other problems that can arise with concurrent access, the DBMS provides a feature called the transaction. A transaction offers easier programming to the database programmer. One of the guarantees of this feature is that whatever is declared as a transaction runs in ISOLATION, i.e., other processes cannot interfere with it. Using the notion of a transaction, each process is able to "package" together a series of database operations that should be executed in isolation.

5 In general:
Transaction: a means by which an application programmer can "package" together a sequence of database operations, so that this part of the program is executed with the ACID properties of a transaction.
If a transaction contains both reads and updates, it represents an attempt by the programmer to change the state of the database.

6 If it contains only reads: an attempt to view data from the database.
NOTE: there is NO begin-transaction statement in standard SQL. A transaction begins when there is no active transaction in progress and an SQL statement is executed that accesses the data. Hence neither the start of the application program nor a DECLARE CURSOR statement is the beginning of a transaction.

7 But when we issue a SELECT ... FROM ..., UPDATE ..., INSERT, DELETE or OPEN CURSOR, and there is no active transaction in progress, a new transaction starts. While a transaction is in progress, any updates it makes or data it reads cannot be "seen" or updated by other concurrent transactions.
There are TWO SQL statements in an application program to end a transactional execution:

8 (1) COMMIT - the programmer uses this statement to inform the DBMS that the ongoing transaction has completed successfully. (Then all the updates of this transaction become persistent and visible to other transactions.)
(2) ROLLBACK - the programmer tells the DBMS that the ongoing transaction has finished unsuccessfully (aborted).

9 NOTE: if neither a COMMIT nor a ROLLBACK statement is executed for a transaction in progress before the application program terminates, then the system applies a default action (Commit or Rollback). Which one? It depends on the DBMS.
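To make the COMMIT/ROLLBACK pairing concrete, here is a minimal Python sketch (not part of these notes) that packages P1's transfer as one transaction. It assumes the table A(aid, balance, owner) from the example and a hypothetical SQLite file bank.db; any SQL DBMS behaves similarly.

```python
import sqlite3

conn = sqlite3.connect("bank.db")   # hypothetical database holding table A(aid, balance, owner)
cur = conn.cursor()
try:
    # P1: move $400 from A1 to A2 -- both updates belong to one transaction
    cur.execute("UPDATE A SET balance = balance - 400 WHERE aid = 'A1'")
    cur.execute("UPDATE A SET balance = balance + 400 WHERE aid = 'A2'")
    conn.commit()     # COMMIT: both updates become persistent and visible together
except Exception:
    conn.rollback()   # ROLLBACK: the transaction aborts; no partial transfer remains visible
    raise
```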
10 Question: why do we need to issue ROLLBACK or COMMIT ourselves?
It is usually bad programming practice to keep a transaction active across user interactions. [Add a COMMIT before requesting user input.] Why?

11 Because the system does not know the limits of a transaction. Rather, it is the programmer who indicates when a transaction ends.
What problems were there before the use of transactions?

12 1. Creation of INCONSISTENT RESULTS
2. Errors in concurrent execution (the inconsistent analysis problem)
3. Uncertainty as to what changes become permanent

13 Examples:
-- CASE 1: the lost update problem
T1: read_item(X); X := X - N;
T2: read_item(X); X := X + M;
T1: write_item(X); read_item(Y);
T2: write_item(X);
T1: Y := Y + N; write_item(Y);
Item X has an incorrect value because its update by T1 is "lost" (overwritten by T2's write).

14 -- CASE 2: the dirty read problem
T1: read_item(X); X := X - N; write_item(X);
T2: read_item(X); X := X + M; write_item(X);
T1: read_item(Y);
Transaction T1 now fails and must change the value of X back to its old value; meanwhile, T2 has read the "temporary" incorrect value of X.

15 -- CASE 3: the inconsistent analysis problem
T3: sum := 0; read_item(A); sum := sum + A; ...
T1: read_item(X); X := X - N; write_item(X);
T3: read_item(X); sum := sum + X; read_item(Y); sum := sum + Y;
T1: read_item(Y); Y := Y + N; write_item(Y);
T3 reads X after N is subtracted but reads Y before N is added, so a wrong summary is the result (off by N). This is an RW conflict (unrepeatable read), since T1 writes Y after T3 has read Y.

16 -- CASE 4: We normally buffer popular pages in memory, to reduce I/O. Very popular pages then remain in main memory for extended periods.
Problem: after a crash, the contents of main memory are lost. (This problem exists independently of concurrency.)

17 One solution: after each update, write the buffered page back to disk (stable storage). Bad. (Why?)
Better solution: use a LOG. Using the notion of a transaction, it will become clear when the contents of the LOG have to be written to disk.

18 Hence, TRANSACTIONS will be used for both: CONCURRENCY & RECOVERY.

19 How can we ensure concurrency & recovery? Transactions have 4 basic properties:
Atomic, Consistent, Isolated, Durable -- the ACID properties.

20 Atomicity: the instructions "packaged" in a transaction either all happen or none of them happens. (All: the transaction commits. None: the transaction is aborted.)
-- Hence a transaction cannot be left partially complete.
-- Example: if a transaction updates 200 records (e.g., give a 10% raise to all employees), it cannot end after only 150 of them were updated.

21 Consistency: a transaction preserves the consistency (integrity constraints) of the database. Hence, during a transaction the database may go from a consistent state to an inconsistent one, but after the transaction terminates the database is again consistent (provided the transaction program is consistent with the DB constraints).

22 Isolation:
-- A transaction executes as if it were the only transaction running. Thus it is independent of other concurrently running transactions.
-- Hence: no transaction can see intermediate results of another transaction.

23 Durability: after a transaction commits, its updates (results) persist, even if system failures occur.

24 Note:
-- Recovery is related to: Atomicity & Durability (a transaction fails on its own; system failures).
-- Concurrency is related to: Consistency & Isolation.

25 First we'll talk about concurrency. It's easier if we consider concurrency on its own [i.e., let's assume for the moment that there are no failures]. To understand concurrency we need to define transaction histories, or schedules.

26 NOTE: concurrent execution of transactions means that their operations are interleaved.
Problem: if this interleaving is done carelessly, it results in errors. To avoid such errors, some concurrency control is needed.
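As a sanity check on Case 1, the following toy Python simulation (my own, with made-up values X = 100, N = 10, M = 20) contrasts a serial execution with the lost-update interleaving:

```python
# Toy simulation of the lost-update problem; x1 and x2 stand for the local copies of X
# that T1 and T2 read into their own program variables.
def serial():
    X = 100
    x1 = X; x1 -= 10; X = x1      # T1: read, update, write
    x2 = X; x2 += 20; X = x2      # T2: read, update, write
    return X                      # 110: both updates are applied

def lost_update():
    X = 100
    x1 = X                        # T1 reads X
    x2 = X                        # T2 reads the same old value of X
    x1 -= 10; X = x1              # T1 writes 90
    x2 += 20; X = x2              # T2 writes 120, overwriting T1's update
    return X                      # 120: T1's update is lost

print(serial(), lost_update())    # 110 120
```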
27 When a transaction starts it gets a UNIQUE ID (transaction identifier). The most important operations inside a transaction package are those that read/write the database, so we concentrate on these only.

28 Notation: Ri(A)
Meaning: the transaction with ID i performs a read on data item A.
A can be a table, a record, an object (the granularity depends on the application).

29 Example: a table X(SSN, Value). Transaction i performs Ri(A), which corresponds to:
SELECT Value INTO :progr.val FROM X WHERE SSN = A
Similarly, Wj(B) corresponds to:
UPDATE X SET Value = :progr.val WHERE SSN = B
as part of transaction j.

30 We sometimes also write the values read/written, e.g., Ri(A,30), Wj(B,20).
Usually a SELECT ... FROM ... WHERE reads/writes more than one item (a whole predicate), e.g.:
UPDATE X SET Value = 1.1 * Value WHERE SSN BETWEEN :low AND :high
-- The DBMS "sees" it as a sequence of writes Wj(ss1), Wj(ss2), ...

31 For simplicity assume only reads/writes (in general we may also have inserts). In addition we are interested in:
Cj (transaction j commits)
Ai (transaction i aborts)
Then a history, or schedule, is an interleaved series of R, W, C, A operations.
Example: ... R2(A) W2(A) R1(A) R3(B) R1(B) C1 R2(B) ...

32 Another representation: one column per transaction (T1, T2, T3), with time flowing downwards:
T2: R2(A) W2(A)
T1: R1(A)
T3: R3(B)
T1: R1(B) C1
T2: R2(B)

33 How is the notion of transactions supported in a DBMS?
User 1, User 2, ..., User n
Application programs: issue calls on behalf of the user: OPEN CURSOR, UPDATE, FETCH, SELECT, INSERT, DELETE, COMMIT (WORK), ROLLBACK.
Transaction Manager: intercepts the calls and initiates transactions when appropriate; assigns the number Ti; decides which Ti to abort in the event of deadlock; passes on ROLLBACK as an Abort(Ti) call.
Scheduler: interprets all calls as sequences of reads and writes; ensures a serializable schedule, using R and W locks; detects deadlocks and passes such information back to the TM.

34 Problem: the scheduler "sees" a sequence of operations from various transactions. How does it decide whether this interleaving produces correct results?
First, what is correct? A serial execution is correct. In a serial schedule each transaction finishes in its entirety before the next one executes.

35 Example:
(a) Schedule A: T1 followed by T2
T1: read_item(X); X := X - N; write_item(X); read_item(Y); Y := Y + N; write_item(Y);
T2: read_item(X); X := X + M; write_item(X);
(b) Schedule B: T2 followed by T1
T2: read_item(X); X := X + M; write_item(X);
T1: read_item(X); X := X - N; write_item(X); read_item(Y); Y := Y + N; write_item(Y);

36 (c) Two schedules with interleaving of operations:
Schedule C (not serializable):
T1: read_item(X); X := X - N;
T2: read_item(X); X := X + M;
T1: write_item(X); read_item(Y);
T2: write_item(X);
T1: Y := Y + N; write_item(Y);
Schedule D (serializable, equivalent to T1 followed by T2):
T1: read_item(X); X := X - N; write_item(X);
T2: read_item(X); X := X + M; write_item(X);
T1: read_item(Y); Y := Y + N; write_item(Y);

37 The serial schedule is easy for the scheduler to implement:
-- simply delay all other transactions until the first one finishes; repeat for the second transaction, etc. (FCFS)
-- But: NO concurrency.
Question: how can we interleave (i.e., increase concurrency) while still getting correct results?

38 A schedule is called SERIALIZABLE if it is equivalent to a serial schedule (of the committed transactions).
-- That is: if it produces the same effect as a serial schedule.
Since a serial schedule is correct, a serializable schedule is also CORRECT.

39 How do we define equivalence analytically?
-- Definition: two operations are called CONFLICTING if:
1. they are from different transactions,
2. they access the same item, and
3. at least one of them is a WRITE.
-- Definition: two schedules that contain the same operations are conflict-equivalent if conflicting operations appear in the same order in both schedules.
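Using the R/W notation above, a schedule can be written as a list of (transaction id, operation, item) triples, and the conflicting-pairs test follows directly from the definition. This is only an illustrative sketch; the triple encoding is my own, not the scheduler's internal format.

```python
# Sketch: a schedule as (transaction id, operation, item) triples.
# Conflicting pair: different transactions, same item, at least one 'W'.
def conflicting_pairs(schedule):
    pairs = []
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            ti, op1, x = schedule[i]
            tj, op2, y = schedule[j]
            if ti != tj and x == y and 'W' in (op1, op2):
                pairs.append((schedule[i], schedule[j]))
    return pairs

# The example history from the notes: R2(A) W2(A) R1(A) R3(B) R1(B) R2(B)
H = [(2, 'R', 'A'), (2, 'W', 'A'), (1, 'R', 'A'), (3, 'R', 'B'), (1, 'R', 'B'), (2, 'R', 'B')]
print(conflicting_pairs(H))   # [((2, 'W', 'A'), (1, 'R', 'A'))] -- W2(A) conflicts with R1(A)
```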
40 -- Conflict serializable: a schedule that is conflict-equivalent to a serial one.
(Hence, when two operations conflict in a schedule, the order in which they occur is important.)

41 There are 3 types of conflicting operation pairs in a schedule:
(1) Ri(A) ... Wj(A): then in an equivalent serial schedule we should have Ti << Tj
(2) Wk(A) ... Rl(A): then in an equivalent serial schedule Tk << Tl
(3) Wp(A) ... Wr(A): then in an equivalent serial schedule Tp << Tr
-- NOTE: Ri(A) ... Rj(A) does NOT imply Ti << Tj.

42 Also: Ri(A) ... Wj(B) does not imply anything, as no conflict exists.
Note: transitivity holds, hence Ri(A) ... Wk(A) ... Rj(A) gives Ti << Tk and Tk << Tj, so Ti << Tk << Tj.
Thus we have one way to check for conflict serializability.

43 Consider: H: R2(A) W2(A) R1(A) R1(B) R2(B) W2(B) C1 C2
-- This history (schedule) is not serializable. Why?
-- W2(A) conflicts with R1(A) => T2 << T1 (by rule (2))
-- R1(B) conflicts with W2(B) => T1 << T2 (by rule (1))
which is a contradiction.

44 See why this schedule may produce an incorrect execution. Suppose A = 50, B = 50.
T1: add A + B, print it.
T2: transfer 30 from A to B.
R2(A,50) W2(A,20) R1(A,20) R1(B,50) R2(B,50) W2(B,80) C1 C2
T1 will print A + B = 70, which is wrong (the total should be 100).

45 H': R1(A) R2(A) W1(A) W2(A) C1 C2
R1(A) ... W2(A) => T1 << T2, and R2(A) ... W1(A) => T2 << T1.
Also non-serializable (this is a lost-update schedule).

46 H'': W1(A) W2(A) W2(B) W1(B) C1 C2
W1(A) ... W2(A) => T1 << T2, and W2(B) ... W1(B) => T2 << T1.
How can the scheduler check for conflict serializability? Use a PRECEDENCE GRAPH, a directed graph where:
vertices: the committed transactions of the schedule
edges: induced by conflicting operations (an edge Ti -> Tj for each conflict in which Ti's operation precedes Tj's).

47 Serializability Theorem: a schedule (history) H has an equivalent serial execution H' (i.e., it is serializable) iff the precedence graph of H contains NO cycle.
Example: for H' above, T1 << T2 and T2 << T1 give a two-node graph with edges 1 -> 2 and 2 -> 1, which has a cycle, so H' is not serializable.

48 In general the graph has as many nodes as there are transactions.
[Figure: an example precedence graph on transactions 1-4 with no cycle; the schedule is serializable, and an equivalent serial order can be read off the graph (a topological order).]

49 -- NOTE: another form of equivalence is "view equivalence".
It is also based on read/write operations, but it is less stringent than conflict equivalence.
Two schedules S1 and S2 that contain the same transactions are said to be view equivalent if:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S1, then Ti must also read the initial value of Q in S2.

50 2. For each data item Q, if transaction Ti executes read(Q) in schedule S1 and that value was produced by transaction Tj (if any), then Ti must also read, in schedule S2, the value of Q that was produced by Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in S1 must perform the final write(Q) operation in S2.

51 Conditions 1 and 2 ensure that each transaction reads the same values in both schedules. Condition 3, coupled with 1 and 2, ensures that both schedules result in the same final system state.
Every conflict-serializable schedule is view-serializable, but there are view-serializable schedules that are not conflict-serializable.

52 Example:
T2: read(Q)
T3: write(Q); commit
T2: write(Q); commit
T5: write(Q); commit

53 This schedule is view-equivalent to the serial schedule <T2, T3, T5> but is not conflict-serializable.
(The reason: writes that do not come after a read of the item by the same transaction, such as write(Q) in T3 and write(Q) in T5. These are called blind writes. Blind writes appear in every view-serializable schedule that is not conflict-serializable!)
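Here is a sketch of the precedence-graph test from the serializability theorem, reusing the triple encoding from the earlier sketch (again my own illustration, not the scheduler's actual code):

```python
def precedence_graph(schedule):
    edges = set()
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            ti, op1, x = schedule[i]
            tj, op2, y = schedule[j]
            if ti != tj and x == y and 'W' in (op1, op2):
                edges.add((ti, tj))     # Ti must precede Tj in any equivalent serial order
    return edges

def has_cycle(edges):
    nodes = {t for edge in edges for t in edge}
    color = {n: 0 for n in nodes}       # 0 = unvisited, 1 = in progress, 2 = done
    def dfs(u):
        color[u] = 1
        for a, b in edges:
            if a == u and (color[b] == 1 or (color[b] == 0 and dfs(b))):
                return True             # reached a node still in progress: a cycle
        color[u] = 2
        return False
    return any(color[n] == 0 and dfs(n) for n in nodes)

# H from the notes: R2(A) W2(A) R1(A) R1(B) R2(B) W2(B) C1 C2
H = [(2, 'R', 'A'), (2, 'W', 'A'), (1, 'R', 'A'), (1, 'R', 'B'), (2, 'R', 'B'), (2, 'W', 'B')]
g = precedence_graph(H)
print(g)              # {(2, 1), (1, 2)}: T2 << T1 and T1 << T2
print(has_cycle(g))   # True -> H is not conflict-serializable
```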
54 [Figure: nesting of schedule classes -- serial schedules inside conflict-serializable schedules, inside view-serializable schedules, inside all schedules.]

55 Note: for conflict-serializability we can create the precedence (dependency) graph and decide whether a schedule is serializable (no cycles).
For view-serializability there is no easy way to decide that (deciding it is an NP-complete problem, i.e., almost certainly we need an algorithm that is exponential in the size of the input). Hence view-serializability is of no practical use.

56 However, even checking the precedence graph for cycles is not practical if there are many transactions.
A more practical solution: use locks and 2-phase locking.
(2-phase locking is a protocol that says how locks are to be used. Locking on its own is not enough.)

57 First we discuss locks. There are 2 kinds of locks:
-- read (or shared)
-- write (or exclusive)
Compatibility table (can a requested lock be granted, given a lock already held by another transaction?):
             R held   W held
R requested   Yes       No
W requested   No        No

58 When Ti issues Ri(A), the scheduler intercepts this call and first issues a read lock on A for i: RLi(A). Similarly, for Wi(A) it issues a write lock: WLi(A).
Before granting a lock on an item to a transaction, the scheduler requires that no other transaction holds a conflicting lock on that item. Hence a transaction may wait until no conflicting lock on the item exists.

59 NOTE: conflicting locks work similarly to the notion of conflicting operations (see the compatibility table).
Locking on its own is NOT enough.
H1: R1(A) R2(B) W2(B) R2(A) W2(A) R1(B) C1 C2
Recall that H1 is not serializable. Note, however, that locking alone could allow the above schedule to happen (RU: release a read lock, WU: release a write lock):

60 RL1(A) R1(A) RU1(A) RL2(B) R2(B) WL2(B) W2(B) WU2(B) RL2(A) R2(A) WL2(A) W2(A) RL1(B) R1(B) C1 C2
Here we get a lock when it is needed and let it go as soon as we are done. Instead: 2-phase locking (2PL).
Two phases: a growing phase (during which locks are acquired) and then a shrinking phase (during which locks are released).

61 But the two phases are strictly separate, i.e., after the shrinking phase starts (the first lock is released), no new lock can be obtained. A transaction cannot release a lock and then acquire a new lock.

62 It can be proved that the schedules allowed by 2PL are conflict serializable.
(Note: there are a few serializable schedules that 2PL would not allow. But 2PL has the advantage of being a practical solution for concurrency.)
[Figure: number of locks acquired by a transaction over time -- rising during the growing phase, falling during the shrinking phase.]
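A minimal lock-table sketch, assuming the compatibility table above (shared locks coexist; anything involving a write lock conflicts). The class and method names are my own simplification; a real scheduler would also queue waiters and feed the deadlock handling discussed next.

```python
class LockTable:
    def __init__(self):
        self.locks = {}                    # item -> (mode, set of holders)

    def request(self, tid, item, mode):    # mode is 'R' or 'W'
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {tid})
            return "granted"
        held_mode, holders = held
        if mode == 'R' and held_mode == 'R':
            holders.add(tid)               # shared locks are compatible
            return "granted"
        if holders == {tid}:
            # the requester is the only holder: grant (possibly upgrading R to W)
            self.locks[item] = ('W' if 'W' in (mode, held_mode) else 'R', holders)
            return "granted"
        return "wait"                      # conflicting lock held by another transaction

lt = LockTable()
print(lt.request(1, 'A', 'R'))   # granted
print(lt.request(2, 'A', 'R'))   # granted (R / R compatible)
print(lt.request(2, 'A', 'W'))   # wait    (T1 still holds a read lock on A)
```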
63 Problem: using locks may lead to deadlock.
T1: RL(A) R(A)
T2: WL(B) W(B)
T1: requests RL(B) -- waits (T2 holds WL(B))
T2: requests WL(A) -- waits (T1 holds RL(A))

64 How to deal with deadlock:
(A) Deadlock detection, using a WAITS-FOR GRAPH over the uncommitted transactions.
[Figure: a small waits-for graph with nodes T1, T2, T3.]

65 The scheduler creates a new node when a new transaction starts, adds an edge when a transaction waits for another transaction, removes the edge when the waiting is over, and removes the node when the transaction commits.
The scheduler tests this graph for cycles at regular intervals. If a cycle (deadlock) is found, the TM is informed and chooses a victim transaction to abort. The aborted transaction will be retried later.

66 (B) Deadlock prevention: we can prevent deadlocks by giving each transaction a priority and ensuring that lower-priority transactions are never allowed to wait for higher-priority ones (or vice versa).
(One way to assign priorities is by timestamp: the older transaction -- lower timestamp -- has the higher priority.)
There are two ways to prevent deadlock:

67 -- Wait-die: suppose Ti requests a lock and Tj already holds a conflicting lock.
If TS(Tj) > TS(Ti) (i.e., Ti is older and has the higher priority) => Ti waits;
else Ti aborts (i.e., the lower-priority transaction is killed).
(Note: the transaction that holds the lock is not affected.)

68 -- Wound-wait: Ti requests a lock and Tj already holds a conflicting lock.
If TS(Tj) > TS(Ti) (Ti has the higher priority) => Tj aborts (now a higher-priority transaction can preempt the transaction that already holds the lock);
else Ti waits.
-- In both cases no deadlock can occur.

69 In wait-die, a lower-priority transaction can never wait for a higher-priority one (the lower-priority one aborts).
In wound-wait, a higher-priority transaction never waits for a lower-priority one (the lower-priority one aborts).
Difference: wound-wait is preemptive (a transaction that is running can be aborted if a higher-priority one asks for its locks => work is lost).

70 In wait-die, younger transactions don't have a chance!
Which one to choose depends on the transaction workload and the application. As said, the usual transaction priority is its timestamp.
Note: if a transaction is aborted for deadlock prevention, it must be restarted with the same timestamp as before, so as to avoid repeated aborts. (Why? It will eventually become the highest-priority (oldest) transaction and will get the locks!)
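The two prevention rules above fit in a few lines. A sketch, assuming priority = timestamp (smaller timestamp = older = higher priority); the function names are my own:

```python
def wait_die(ts_requester, ts_holder):
    # Ti (requester) asks for a lock that Tj (holder) already holds in a conflicting mode
    if ts_requester < ts_holder:
        return "wait"                      # an older requester is allowed to wait
    return "abort requester (die)"         # a younger requester is killed; the holder is unaffected

def wound_wait(ts_requester, ts_holder):
    if ts_requester < ts_holder:
        return "abort holder (wound)"      # an older requester preempts the younger holder
    return "wait"                          # a younger requester waits for the older holder

print(wait_die(1, 2), "|", wound_wait(1, 2))   # wait | abort holder (wound)
print(wait_die(2, 1), "|", wound_wait(2, 1))   # abort requester (die) | wait
```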
71 In practice: strict 2PL -- release all locks only when the transaction commits or aborts.
(Easier to implement, as you don't need to know when the shrinking phase can start, and it safeguards against cascading aborts.)
But: it limits concurrency more than classical 2PL.

72 There is also a version of strict 2PL called conservative 2PL: a transaction gets all the locks it will ever need when it starts (or else it keeps waiting until it can get all of them).
Advantage: no deadlock (deadlock prevention).
Disadvantage: limits concurrency, as a transaction gets all locks earlier than actually needed.

73 Obviously: Conservative 2PL ⊆ Strict 2PL ⊆ 2PL (in terms of the schedules each allows).
Definition: a schedule is called strict if a value written by a transaction T is not read or overwritten by another transaction until T is aborted or committed.

74 A schedule is called recoverable if its transactions commit only after all transactions whose changes they read have committed.

75 A schedule avoids cascading aborts if aborting a transaction can be accomplished without cascading the abort to other transactions.
Strict => avoids cascading aborts => recoverable.
[Figure: nested classes -- strict schedules inside schedules that avoid cascading aborts, inside recoverable schedules.]

76 Strict 2PL produces strict schedules => it does not create cascading aborts & is recoverable. 2PL could have cascading aborts => in practice strict 2PL is common.

77 [Venn diagram for classes of schedules: serial, conflict-serializable, view-serializable and all schedules, overlaid with strict, avoids-cascading-aborts and recoverable; S1-S12 mark example schedules in the various regions.]

78 Concurrency control without locks. To avoid deadlock there are other techniques for concurrency control:
-- timestamps
-- multiversioning
-- optimistic CC
(2PL, timestamps and multiversioning are examples of pessimistic concurrency control, for systems that expect a lot of conflicts. We also have optimistic schemes, which are more efficient if the number of conflicts is relatively low.)

79 Another technique for concurrency control: timestamping.
A timestamp is a unique id that identifies a transaction. It is created by the DBMS. Assume the DBMS has a counter and, when a transaction starts, gives it the current value as its timestamp (like the TID, but now it is used for implementing concurrency control). Timestamps determine the serializability order (i.e., no locks are used).

80 If TS(Ti) < TS(Tj), then the system must guarantee that the produced schedule is equivalent to a serial one where Ti is before Tj.
How is that ensured? When an item is accessed by more than one transaction, it is accessed in an order that does not violate serializability.

81 Each item x has two variables associated with it:
1. read_timestamp(x): the largest timestamp of a transaction that has already read item x successfully.
2. write_timestamp(x): the largest timestamp of a transaction that has already written item x successfully.

82 So when a transaction T issues a read(x) or write(x), the timestamp TS(T) is compared with read_TS(x) and write_TS(x) to check whether the order is violated. If the serializability order is violated, then T is aborted (rolled back) and resubmitted (with a new TS).

83 Protocol:
(1) Transaction T issues a write(x) operation:
a. if read_TS(x) > TS(T) => abort & roll back T (why? some later transaction T1 has already used this item);
b. if write_TS(x) > TS(T), then do NOT execute T's write(x); just continue (some later T1 has already written a later value for x);
c. else execute write(x) and set write_TS(x) = TS(T).

84 (2) T issues a read(x) operation:
a. if write_TS(x) > TS(T), then abort & roll back transaction T (since it would be reading a later value);
b. else (i.e., write_TS(x) <= TS(T)), execute read(x) and set read_TS(x) = MAX(TS(T), current read_TS(x)).
(This is a disadvantage of TS: even for a read we may have to update data.)

85 Explanations:
2.a: T tries to read a value of x which has already been overwritten; hence this read is rejected and T is rolled back.
1.a: the value of x that T is producing was needed earlier (and it was assumed that it would never be produced); hence the write is rejected and T is rolled back.
1.b: T tries to write an obsolete value. No need to do that; just continue. [This is Thomas's Write Rule. It allows us to continue on obsolete writes instead of aborting T => increases concurrency.]

86 Note: if a T that has aborted is restarted with the same timestamp, it is guaranteed to abort again! (=> use a new TS when restarting T).
(Note: this is a different policy from the use of timestamps in deadlock prevention.)

87 Example:
T1: read(X); read(Y); display(X+Y)
T2: read(X); X := X - 50; write(X); read(Y); Y := Y + 50; write(Y); display(X+Y)

88 Suppose TS(T1) < TS(T2). Then the following schedule is possible (and it is serializable):
T1: read(X)
T2: read(X); X := X - 50; write(X)
T1: read(Y)
T2: read(Y)
T1: display(X+Y)
T2: Y := Y + 50; write(Y); display(X+Y)
(Check it!)

89 Note: there are serializable schedules that are possible under 2PL and not possible under timestamping, and vice versa.
If the Thomas Write Rule is not used then, like 2PL, the TS protocol allows ONLY conflict-serializable schedules (but each allows schedules that the other does not).
With TWR allowed, the TS protocol allows some serializable schedules that are not conflict-serializable.

90 Example (assume TS(T1) < TS(T2)):
T1: R(A)
T2: W(A); commit
T1: W(A); commit
Not conflict-serializable (T1 << T2 << T1) but still serializable. Why?
Tricky: under the Thomas Write Rule, T1's W(A) is obsolete (write_TS(A) is already TS(T2) > TS(T1)), so it is skipped -- as if it never happened. What actually executes is:
T1: R(A); commit
T2: W(A); commit
which is serializable (equivalent to T1 followed by T2)!

91 TS (with or without TWR) may permit schedules that are NOT recoverable.
Example (assume TS(T1) = 1, TS(T2) = 2):
T1: W(A)
T2: R(A)
T1: W(B)
T2: C
It is not recoverable, since T2 reads a change made by T1 and commits before T1 commits. However, it is allowed by the TS protocol (with or without TWR).

92 One solution: buffer writes until a transaction commits!
Hence W1(A) is buffered until T1 commits (write_TS(A) is updated, though). R2(A), even though permissible, is not executed (i.e., T2 blocks) until T1 commits; then A is written from the buffer to disk & T2 continues.
Note: buffering looks like blocking! (as if there were an exclusive lock on A!)

93 Even with this modification, TS & 2PL are still not the same! (each admits schedules that the other does not, and vice versa).
Since recoverability is essential, the above modification is usually needed. But then 2PL seems (and is) more practical than TS for centralized DBs. TS has an advantage in distributed DBs!
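Before moving on, here is a sketch of the read/write rules (1) and (2) above, with the Thomas Write Rule as an option. The dictionaries and the illustrative timestamps are my own; a real DBMS stores read_TS/write_TS with each item.

```python
read_ts, write_ts = {}, {}      # per-item read/write timestamps, 0 meaning "never accessed"

def ts_read(T, x):
    if write_ts.get(x, 0) > T:
        return "abort"          # rule 2.a: x was already overwritten by a later transaction
    read_ts[x] = max(read_ts.get(x, 0), T)
    return "read ok"            # rule 2.b

def ts_write(T, x, thomas_write_rule=True):
    if read_ts.get(x, 0) > T:
        return "abort"          # rule 1.a: a later transaction already read the old value
    if write_ts.get(x, 0) > T:
        # rule 1.b: a later value already exists; with TWR just skip this obsolete write
        return "skip write" if thomas_write_rule else "abort"
    write_ts[x] = T
    return "write ok"           # rule 1.c

print(ts_write(2, 'A'))   # write ok   (write_TS(A) = 2)
print(ts_write(1, 'A'))   # skip write (rule 1.b / Thomas Write Rule: the value is obsolete)
print(ts_read(1, 'A'))    # abort      (rule 2.a: A was already overwritten by TS 2)
print(ts_read(3, 'A'), ts_write(3, 'A'))   # read ok write ok (both timestamps of A become 3)
```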
94 Multiversion schemes.
Up to now we ensured serializability by: (1) waiting (locking, 2PL), or (2) aborting (timestamps).
There is another approach, under which each write(x) creates a new version of item x. With this approach a read(x) never fails: we simply read the appropriate version of item x. It is a better CC protocol for workloads dominated by transactions that only read values from the DB.

95 The concurrency control scheme must ensure that the selection of the version to be read is done in a manner that ensures serializability. This must be done quickly for good performance. Again we will use timestamps. Each transaction Ti gets a unique, static timestamp TS(Ti). With each data item Q, a sequence of versions <Q1, Q2, ..., Qm> is associated.

96 Each version Qk has three data fields:
- content: the value of version Qk;
- write_timestamp(Qk): the timestamp of the transaction that created Qk;
- read_timestamp(Qk): the largest timestamp of any transaction that successfully read version Qk.
A transaction Ti creates a version Qk of data item Q by issuing a write(Q) operation. The content of Qk is written by Ti, and the write_TS and read_TS of Qk are initialized to TS(Ti). The read_TS of Qk is updated whenever a transaction Tj reads Qk and read_TS(Qk) < TS(Tj).

97 The following multiversion protocol ensures serializability.
- Assume Ti issues read(Q) or write(Q).
- Let Qk be the version of Q with the largest write_TS(Qk) <= TS(Ti).
(1) If Ti issues a read(Q), then the value returned is the content of Qk.
(2) If Ti issues a write(Q) and TS(Ti) < read_TS(Qk), then Ti is rolled back. Else (TS(Ti) >= read_TS(Qk)) a new version of Q is created.

98 Justification:
(1) The transaction needs to read the most recent version relative to its timestamp.
(2) A transaction will abort if it comes "too late" in doing a write (some other, later transaction has already read the item).
Advantage: a read operation never fails and never has to wait. Thus good for systems with many reads and few writes.

99 Disadvantages:
(a) when we read(Q), we need to update read_TS(Q) (i.e., maybe a second I/O);
(b) conflicts are still resolved through rollbacks (as with timestamp protocols) instead of waits (as with locking protocols), which may be too expensive (a lot of work may have been done by a transaction that is forced to abort).
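A sketch of the multiversion read/write rules, keeping each version as a (content, write_TS, read_TS) tuple. The data structure and the example values are my own simplification of the protocol above.

```python
versions = {'Q': [(100, 0, 0)]}              # initial version of Q: content 100, write_TS 0, read_TS 0

def pick_version(item, ts):
    # the version Qk with the largest write_TS(Qk) <= TS(Ti)
    return max((v for v in versions[item] if v[1] <= ts), key=lambda v: v[1])

def mv_read(T, item):
    content, wts, rts = qk = pick_version(item, T)
    versions[item][versions[item].index(qk)] = (content, wts, max(rts, T))   # bump read_TS(Qk)
    return content

def mv_write(T, item, value):
    content, wts, rts = pick_version(item, T)
    if T < rts:
        return "rollback"                    # a later transaction already read Qk
    versions[item].append((value, T, T))     # create a new version written by T
    return "new version"

print(mv_read(2, 'Q'))        # 100: reads the initial version; its read_TS becomes 2
print(mv_write(1, 'Q', 50))   # rollback: TS 1 comes "too late", TS 2 already read Qk
print(mv_write(3, 'Q', 70))   # new version with write_TS = 3
print(mv_read(2, 'Q'))        # still 100: TS 2 keeps reading the version <= its timestamp
```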
100 Optimistic concurrency control.
The previous techniques assume that there is much contention for common data among transactions. The concurrency control followed by 2PL, timestamps and multiversioning is thus pessimistic. But this creates overhead in running a transaction (e.g., keeping the locks, checking timestamps, etc.). To reduce such overhead we may let transactions run freely and check them (validate them) when they finish.

101 If not many transactions contend for the same resources (data), most of them will pass the validation test! To monitor this execution, a transaction passes through two or three different phases in its lifetime:
1. Read phase. During this phase the execution of the transaction Ti takes place. The values of the various data items "read" are stored in variables local to Ti. All "write" operations are performed on these local variables, without updating the database.

102 2. Validation phase. When its work is done, transaction Ti performs a validation test to determine whether it can copy to the database the temporary local variables (that hold the results of its "writes") without violating serializability. (Basically it checks whether its results violate serializability with respect to the already committed transactions.)

103 3. Write phase. If Ti succeeds in validation, then the actual updates are applied to the database; otherwise Ti is rolled back.
To perform the validation test we associate three different timestamps with Ti:
-- Start(Ti): the time Ti started its execution;
-- Validation(Ti): the time Ti finished its read phase and started its validation phase;
-- Finish(Ti): the time when Ti finished its write phase.

104 The serializability order is determined by the timestamp-ordering technique, using Validation(Ti) as the transaction's identifying timestamp, i.e., TS(Ti) = Validation(Ti). Then, if TS(Tj) < TS(Tk), any produced schedule should be equivalent to a serial schedule where Tj is before Tk.
The validation test of transaction Ti requires that, for all Th such that TS(Th) < TS(Ti), one of the following two conditions must hold:

105 1. Finish(Th) < Start(Ti): since Th completes its execution before Ti started, serializability is maintained.
2. The set of data items written by Th does not intersect with the data items read by Ti, and Th completes its write phase before Ti starts its validation phase (Start(Ti) < Finish(Th) < Validation(Ti)).
-- This condition ensures that the writes of Th and Ti do not overlap. Since the writes of Th do not affect the reads of Ti, and since Ti cannot affect the reads of Th, serializability is maintained.

106 Example: consider T17 and T18 with TS(T17) < TS(T18). Then the following schedule is serializable and is allowed by the validation protocol (but not allowed by the 2PL or timestamp-ordering protocols):

107
T17: read(B)
T18: read(B); B := B - 50; read(A); A := A + 50
T17: read(A); (validate); display(A+B)
T18: (validate); write(B); write(A); display(A+B)
Note: the validation protocol guards against cascading rollbacks, since the actual writes take place only after the validation phase (i.e., after the transaction has effectively committed).

108 [Figure: Venn diagram comparing the sets of schedules allowed by optimistic CC, conservative 2PL, strict 2PL, 2PL, timestamping without TWR, timestamping with TWR and multiversioning, within the conflict-serializable and serializable classes.]

109 Comparison of all CC protocols -- interesting point: all the CC protocols discussed allow subsets of the conflict-serializable schedules (one more reason why we concentrate on conflict-serializability).
Note that checking the validation criteria requires maintaining lists of the objects read/written by each transaction.

110 Also: while one transaction is being validated, no other transaction can commit (otherwise the first transaction could miss conflicts with respect to the newly committed transaction). Hence even optimistic concurrency control has some overhead (2PL has lock maintenance, and 2PL blocks => transactions have to wait), while optimistic CC may require restarting a transaction.
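A sketch of the validation test above: Ti is checked against every earlier-validated Th (those with TS(Th) < TS(Ti)). The dictionary fields and the example values are my own, not a DBMS data structure.

```python
def validate(Ti, earlier):
    """Ti, Th: dicts with 'start', 'validation', 'finish' times plus 'read_set'/'write_set'."""
    for Th in earlier:                       # every Th with TS(Th) < TS(Ti)
        if Th['finish'] < Ti['start']:
            continue                         # condition 1: Th finished before Ti started
        if not (Th['write_set'] & Ti['read_set']) and Th['finish'] < Ti['validation']:
            continue                         # condition 2: no write/read overlap, Th finished writing first
        return False                         # neither condition holds: Ti must be rolled back
    return True                              # Ti may enter its write phase

Th    = {'start': 1, 'validation': 3, 'finish': 4, 'read_set': {'B'}, 'write_set': {'B'}}
T_ok  = {'start': 2, 'validation': 5, 'finish': None, 'read_set': {'A'}, 'write_set': {'A'}}
T_bad = {'start': 2, 'validation': 5, 'finish': None, 'read_set': {'B'}, 'write_set': set()}
print(validate(T_ok, [Th]))    # True:  Th wrote only B, which T_ok never read
print(validate(T_bad, [Th]))   # False: T_bad read B, which Th wrote while T_bad was running
```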
111 Transaction support in SQL-92.
A transaction is automatically started when, for example, the user issues SELECT, UPDATE, CREATE TABLE, INSERT, etc. A transaction can be terminated by
-- a COMMIT command, or
-- a ROLLBACK command (the SQL keyword for ABORT).
Each transaction has
-- an access mode
-- an isolation level
-- diagnostics (for error conditions)

112 Access mode:
-- READ ONLY: the transaction is not allowed to modify the DB (i.e., only shared locks);
-- READ WRITE: the transaction is allowed to modify the DB (the default).
Isolation level: controls the extent to which a transaction is exposed to other concurrent transactions.
Choices:
-- Read Uncommitted
-- Read Committed
-- Repeatable Read
-- Serializable

113 Transaction isolation levels in SQL-92:
Level               Dirty Read   Unrepeatable Read   Phantom
Read Uncommitted    Maybe        Maybe               Maybe
Read Committed      No           Maybe               Maybe
Repeatable Read     No           No                  Maybe
Serializable        No           No                  No

114 Recall:
-- Dirty read (WR conflict): a transaction could read an object written by an uncommitted transaction.
-- Unrepeatable read (RW conflict): a transaction could write an object that has been read by an uncommitted transaction.
-- Phantom: a transaction reads a collection of objects twice and sees different results, even though it did not modify them itself.
The highest degree of isolation (strict 2PL is used): SERIALIZABLE (the default).

115 Repeatable Read is the same as Serializable, except that it does not lock "sets" of objects (hence the phantom phenomenon can occur).
The lower the isolation degree, the less safe the transaction, but system performance may improve. (Some transactions can live with a few missing values, e.g., statistical queries => a lower isolation degree.)
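A short sketch of choosing an isolation level from an application, using the SQL-92 SET TRANSACTION statement through a Python DB-API driver. psycopg2 and the bank database are only illustrative assumptions (sqlite3, for instance, does not support this statement).

```python
import psycopg2   # any driver against a DBMS that supports SET TRANSACTION would do

conn = psycopg2.connect("dbname=bank")   # hypothetical database with table A(aid, balance, owner)
cur = conn.cursor()
cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
cur.execute("SELECT balance FROM A WHERE owner = 'smith'")
print(cur.fetchall())   # repeated reads inside this transaction return the same rows (no unrepeatable reads)
conn.commit()           # ends the transaction; the next SQL statement starts a new one
```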