Concurrency Control Protocols

To ensure that concurrent transactions execute correctly, a DBMS uses concurrency control protocols.

Classification of Concurrency Protocols:
1) Lock based Protocol
   a) Two phase locking protocol
      i) Basic 2PL
      ii) Conservative 2PL
      iii) Strict 2PL
      iv) Rigorous 2PL
   b) Graph based Protocol
2) Time Stamp based Protocol
   a) Time Stamp Ordering
   b) Thomas’s Write Rule
3) Multiple Granularity Protocol
4) Multi Version Protocol

Two-Phase-Locking Protocol

Basic 2PL: A transaction is said to follow the two phase locking protocol if all of its locking operations precede its first unlock operation.
Expanding (growing) = first phase: locks are acquired and none are released.
Shrinking = second phase: locks are released and no new locks can be acquired.
During the shrinking phase, downgrading a lock (exclusive to shared) is allowed, but upgrading (shared to exclusive) is not.

Conservative 2PL (or static 2PL): The transaction locks all the items it needs BEFORE execution begins, by predeclaring its read set and write set. If any item in the read or write set is already locked by another transaction, the transaction waits and does not acquire any locks. This variation is deadlock free but not very realistic, because the read and write sets are rarely known in advance.

Strict 2PL: The transaction does not release its write locks until AFTER it commits or aborts. This variation is not deadlock free, but it guarantees recoverable schedules. More precisely, it produces strict schedules: no transaction can read or write X until the last transaction that wrote X has committed or aborted. This is the most popular variation of 2PL.

Rigorous 2PL: No lock, shared or exclusive, is released until after the transaction commits or aborts. The transaction is therefore in its expanding phase until it ends.

Graph based protocols

The simplest graph based protocol is the tree locking protocol, which employs only exclusive locks and applies when the data items are organized as a tree. In the tree locking protocol, each transaction Ti can lock a data item at most once and must observe the following rules.
a) All locks are exclusive locks.
b) The first lock by Ti can be on any data item, including the root node.
c) Ti can lock a data item Q only if Ti currently holds a lock on the parent of Q.
d) A data item may be unlocked at any time.
e) Ti cannot subsequently relock a data item that it has locked and unlocked.

A schedule of transactions that all follow the tree locking protocol can be shown to be serializable. The transactions need not be two phase.

Advantage of tree locking control:
a. Unlike in the two phase locking protocol, a data item can be unlocked as soon as it is no longer needed. This leads to shorter waiting times and an increase in concurrency.

Disadvantage of tree locking control:
a. A transaction may have to lock data items that it does not access: to lock a descendant, it must first lock the parent. So the number of locks and the associated locking overhead are high.

Timestamp based protocols

The use of locks, combined with the two phase locking protocol, allows us to guarantee serializability of schedules. The order of transactions in the equivalent serial schedule is based on the order in which the executing transactions lock the items they require. If a transaction needs an item that is already locked, it may be forced to wait until the item is released. A different approach that guarantees serializability uses transaction timestamps to order transaction execution for an equivalent serial schedule.

Time Stamps

A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamp values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be regarded as the transaction start time. Each transaction Ti in the system is assigned a unique timestamp, denoted by TS(Ti). If a new transaction Tj enters the system after Ti, then TS(Ti) < TS(Tj). This is known as the timestamp ordering scheme. To implement this scheme, each data item Q is associated with two timestamp values.
1. W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
2. R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.

Timestamp ordering protocol

The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. The protocol operates as follows.
A. Suppose transaction Ti issues read(Q).
   a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected and Ti is rolled back.
   b. If TS(Ti) >= W-timestamp(Q), then the read operation is executed and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
B. Suppose transaction Ti issues write(Q).
   a. If TS(Ti) < R-timestamp(Q), a younger transaction has already read the current value of the item, and it would be an error to update it now. This occurs when Ti is late in doing its write and a younger transaction has already read the old value. The write operation is rejected and Ti is rolled back.
   b. If TS(Ti) < W-timestamp(Q), Ti is asking to write an item whose value has already been written by a younger transaction, i.e., Ti is attempting to write an obsolete value of Q. So Ti is rolled back and restarted with a later timestamp.
   c. Otherwise, the write operation proceeds and W-timestamp(Q) is set to TS(Ti).
This scheme is called basic timestamp ordering. It guarantees that the schedules of transactions are conflict serializable, so the results are equivalent to those of a serial schedule.

Advantages of timestamp ordering protocol
1) Conflicting operations are processed in timestamp order, and therefore the protocol ensures conflict serializability.
2) Since the protocol does not use locks, no transaction ever waits and deadlocks cannot occur.

Disadvantages of timestamp ordering protocol
1) Starvation may occur if a transaction is continually aborted and restarted.
2) It does not ensure recoverable schedules.
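The read and write rules (A and B above) can be sketched in a few lines of Python. This is a minimal, single-threaded illustration and not a real DBMS component. The names `Item`, `Rollback`, `read`, and `write` are hypothetical and chosen only for this example. Timestamps are assumed to be positive integers, with 0 meaning "never read or written".

```python
from dataclasses import dataclass


class Rollback(Exception):
    """Signals that the issuing transaction must be rolled back and restarted."""


@dataclass
class Item:
    value: object = None
    r_ts: int = 0  # R-timestamp: largest timestamp of any successful reader
    w_ts: int = 0  # W-timestamp: largest timestamp of any successful writer


def read(item: Item, ts: int):
    """Apply the read rule for a transaction with timestamp ts."""
    # Read rule a: the value this transaction should see was already overwritten.
    if ts < item.w_ts:
        raise Rollback(f"read rejected: TS={ts} < W-timestamp={item.w_ts}")
    # Read rule b: execute the read and advance the R-timestamp.
    item.r_ts = max(item.r_ts, ts)
    return item.value


def write(item: Item, ts: int, value) -> None:
    """Apply the write rule for a transaction with timestamp ts."""
    # Write rule a: a younger transaction has already read the current value.
    if ts < item.r_ts:
        raise Rollback(f"write rejected: TS={ts} < R-timestamp={item.r_ts}")
    # Write rule b: a younger transaction has already written the item.
    if ts < item.w_ts:
        raise Rollback(f"write rejected: TS={ts} < W-timestamp={item.w_ts}")
    # Write rule c: perform the write and record the writer's timestamp.
    item.value = value
    item.w_ts = ts
```

For instance, after `write(q, 5, 20)` and `read(q, 7)` on a fresh `Item`, a late `write(q, 6, 30)` raises `Rollback`, because the transaction with timestamp 7 has already read the value that the transaction with timestamp 6 wants to replace.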
Thomas’s Write Rule

Thomas’s Write Rule is a modification of the basic timestamp ordering protocol that relaxes conflict serializability and provides greater concurrency by ignoring obsolete write operations.

Suppose transaction Ti issues read(Q): no change, same as in the timestamp ordering protocol.

Suppose transaction Ti issues write(Q):
a) If TS(Ti) < R-timestamp(Q), a younger transaction has already read the current value of the item, and it would be an error to update it now. This occurs when Ti is late in doing its write and a younger transaction has already read the old value. As in the basic protocol, the write is rejected and Ti is rolled back.
b) If TS(Ti) < W-timestamp(Q), a younger transaction has already updated the value of the item, and the value that the older transaction is writing is obsolete. In this case the write operation can safely be ignored. This is sometimes known as the ignore obsolete write rule and allows greater concurrency.
c) Otherwise, the write is executed and W-timestamp(Q) is set to TS(Ti), as in the basic protocol.

Multiple Granularity

In all the concurrency control schemes so far, each individual data item has been the unit on which synchronization is performed. However, it can be advantageous to group several data items and treat them as one individual unit. For example, if a transaction Ti needs to access the entire database under a locking protocol, Ti must lock each item in the database, which is a time consuming process. It would be better if Ti could issue a single lock request to lock the entire database. On the other hand, if Ti needs to access only a few data items, it should not be required to lock the entire database.

A data item can be one of the following.
1. A database record
2. A field value of a database record
3. A disk block
4. A whole file
5. The whole database

The size of a data item is often called the data item granularity. Fine granularity refers to small item sizes, whereas coarse granularity refers to large item sizes.
The best item size depends on the type of transaction.

A hierarchy of data granularities, where the small granularities are nested within larger ones, can be represented graphically as a tree. In the tree, each node represents an independent data item, and each nonleaf node of the multiple granularity tree represents the data associated with its descendants.

Level 0: DB (entire database)
Level 1: Files
Level 2: Pages
Level 3: Records
Level 4: Fields

The highest level represents the entire database, followed by files, pages, records and fields. Each node can be locked in shared or exclusive mode. When a transaction locks a node, all the descendants of that node are implicitly locked in the same lock mode.

To make multiple granularity level locking practical, additional types of locks called intention locks are needed. The idea behind intention locks is for a transaction to indicate, along the path from the root to the desired node, what type of lock it will require on one of the node’s descendants. There are three types of intention locks:
1. Intention Shared (IS): indicates that a shared lock will be requested on some descendant node.
2. Intention Exclusive (IX): indicates that an exclusive lock will be requested on some descendant node.
3. Shared Intention Exclusive (SIX): indicates that the current node is locked in shared mode and that an exclusive lock will be requested on some descendant node.

Compatibility Matrix for multiple granularity locking (T = compatible, F = not compatible)

        IS    IX    S     SIX   X
IS      T     T     T     T     F
IX      T     T     F     F     F
S       T     F     T     F     F
SIX     T     F     F     F     F
X       F     F     F     F     F

Multi version Schemes

In multi version database systems, each write operation on a data item Q creates a new version of Q. When a read(Q) operation is issued, the system selects one of the versions of Q to read. The concurrency control scheme must ensure that the version to be read is selected in a manner that ensures serializability. There are two multi version schemes:
1. Multi version timestamp ordering
2. Multi version two phase locking

Multi version timestamp ordering

In this technique, the system keeps several versions Q1, Q2, ..., Qk of each data item Q. For each version Qk, the system keeps the value of the version and the following two timestamps.
1) W-timestamp(Qk) is the timestamp of the transaction that created version Qk.
2) R-timestamp(Qk) is the largest timestamp of any transaction that successfully read version Qk.

The scheme operates as follows when transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
1) If transaction Ti issues a read(Q), then the value returned is the content of version Qk, and R-timestamp(Qk) is set to the maximum of R-timestamp(Qk) and TS(Ti).
2) If transaction Ti issues a write(Q) and TS(Ti) < R-timestamp(Qk), then Ti is rolled back. Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten. Otherwise, a new version of Q is created.

Advantages
1) A read request never fails and is never made to wait.

Disadvantages
1) It requires more storage to maintain multiple versions of each data item.
2) Reading a data item also requires updating the R-timestamp field, resulting in two potential disk accesses.
3) Conflicts between transactions are resolved through rollbacks rather than through waits. This may be expensive.

Multi version two Phase Locking

The multi version two phase locking protocol attempts to combine the advantages of multi version concurrency control with the advantages of two phase locking. In the standard locking scheme, once a transaction obtains a write lock on an item, no other transaction can access that item. Multi version two phase locking instead allows other transactions to read an item X while a single transaction T holds a write lock on X. This is accomplished by allowing two versions of each item X. When an update transaction reads an item, it gets a shared lock on the item and reads the latest version of that item.
When an update transaction wants to write an item, it first gets an exclusive lock on the item and then creates a new version of the data item. The write is performed on the new version, and the timestamp of the new version is initially set to the value ∞.

Advantages
1) Reads can proceed concurrently with a write operation, which is not permitted in standard two phase locking.
2) It avoids cascading aborts, since transactions are only allowed to read a version that was written by a committed transaction.

Disadvantages
1) It requires more storage to maintain multiple versions of each data item.
2) Deadlocks may occur.
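The version handling described above can be sketched as follows. This is only an illustration under stated assumptions, not a complete protocol. Lock acquisition is left out entirely, so the caller of `write` is assumed to already hold the exclusive lock. The commit step is also an assumption that the text does not spell out: at commit, the ∞ timestamp of each new version is replaced by the next value of a global counter, and a reader sees the committed version with the largest timestamp not exceeding the counter value it recorded when it started. The names `Version`, `Store`, `ts_counter`, and `read_committed` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Version:
    value: object
    ts: float  # math.inf until the writing transaction commits


class Store:
    def __init__(self):
        self.ts_counter = 0  # assumed global commit counter
        self.items = {}      # item name -> list of Version objects

    def write(self, name, value) -> Version:
        """Create a new, uncommitted version (caller holds the exclusive lock)."""
        version = Version(value, math.inf)
        self.items.setdefault(name, []).append(version)
        return version

    def commit(self, new_versions) -> None:
        """ASSUMPTION: stamp the writer's versions with ts_counter + 1, then advance."""
        commit_ts = self.ts_counter + 1
        for version in new_versions:
            version.ts = commit_ts
        self.ts_counter = commit_ts

    def read_committed(self, name, reader_ts):
        """Return the committed value with the largest timestamp <= reader_ts."""
        visible = [v for v in self.items.get(name, []) if v.ts <= reader_ts]
        if not visible:
            return None
        return max(visible, key=lambda v: v.ts).value
```

Because an uncommitted version carries timestamp ∞, no reader timestamp can reach it. A reader that recorded `ts_counter` before a writer commits keeps seeing the older committed value, which is how reads proceed alongside a write and why only committed data is ever read.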