Advanced Database Systems and Data Warehousing
INTEGRITY AND CONCURRENCY IN DATABASE SYSTEMS
By: Benmammass Mehdi

Outline
► Integrity
    Introduction
    Achieving integrity in a database system
    ► Integrity Subsystem Component
    ► Integrity Rules
► Concurrency
    Introduction
    Some important definitions
    Lock-Based Protocols
    Deadlock Avoidance in lock-based protocols
    Locking Granularity
    Optimistic Concurrency Control
► Conclusion

Introduction - Integrity
► The main features that a database system should exhibit are:
    Accuracy
    Correctness
    Validity
► An integrity constraint guards against accidental damage to the database.
► It ensures that authorized changes to the database do not result in a loss of data consistency.
► The Integrity Subsystem is a component of the DBMS.

Integrity subsystem
► The role of an integrity subsystem is to:
    Monitor transactions and detect integrity violations.
    Take appropriate action when a violation occurs.
► The integrity subsystem is provided with a set of rules that define the following:
    the errors to check for;
    when to check for these errors;
    what to do if an error occurs.

Integrity Rules
► A set of rules stored in the system dictionary by the Integrity Rule Compiler.
► Before a new integrity rule is adopted, it must be consistent with all the existing rules.
    RULE#1 : AFTER UPDATING sales.quantity :
             sales.quantity > 0
             ELSE DO ;
                 set return code to "RULE#1 violated" ;
                 REJECT ;
             END ;

Integrity Rules
► The general structure of an integrity rule is:
    Trigger condition (after updating, inserting, ...)
    Constraint (sales.quantity > 0)
    Violation response (ELSE DO ...)
► There are three types of integrity rules:
    Domain integrity rules
    Relation integrity rules
    Fanset integrity rules

Domain Integrity Rule (1)
    DCL S# PRIMARY DOMAIN CHARACTER (5)
        SUBSTR (S#,1,1) = 'S'
        AND IS_NUMERIC (SUBSTR (S#,2,4))
        ELSE DO ;
            set return code to "S# domain rule violated" ;
            REJECT ;
        END ;
► S# is a string of 5 characters. The first character is an S and the last 4 characters are numeric.

Domain Integrity Rule (2)
► Composite domains: for example, a domain DATE composed of the three domains DAY, MONTH and YEAR.
► User-written procedures.
► Interdomain procedures: conversion rules (procedures) that make it possible, for example, to compare two values from two distinct domains (a distance expressed in kilometres and one expressed in miles).

Relation Integrity Rule (1)
► Immediate record state constraints
    After updating or inserting sales.quantity, verify: sales.quantity > 0
► Immediate record transition constraints
    New_date > sales.date
► Immediate set state/transition constraints
    Define key uniqueness and enforce non-null key values (Entity Integrity Rule).
    Impose referential integrity (Foreign Key Integrity Rule).

Relation Integrity Rule (2)
► Deferred record state constraints
► Deferred record transition constraints
► Deferred set state constraints
    Applied at the end of the transaction (WHEN COMMITTING).
    This kind of constraint is needed because the set of updates in a transaction sometimes violates the rule temporarily.
► Deferred set transition constraints

Other Integrity Constraints
► Fanset integrity rules
    Used in network databases.
    They prevent integrity violations by providing referential integrity.
► Triggered procedures
    Integrity rules are a special case of triggered procedures.
    Triggered procedures are useful for carrying out the following tasks:
    ► Warning the user that deleting a client will also delete all of its sales.
    ► Access security.
    ► Performance measurement of the database.
    ► Controlling stored records (compressing and decompressing data when storing and retrieving it).
    ► Exception reporting (for example, expiry dates of medicines).

Concurrency Control - Introduction
► Contention occurs when two or more users try to access the same record simultaneously.
► Concurrency occurs when multiple users can access the same resource and each user accesses the resource in isolation.
    Concurrency is high when a user sees no apparent wait time for a request.
    Concurrency is low when wait times are evident.
► Consistency occurs when users access a shared resource and, across all operations, the resource exhibits the same characteristics and satisfies all the constraints.
Concurrency Control - Introduction
► Example (bank transactions):
    2 accounts A and B (assume initial balances A = B = 100 DH)
    2 transactions T1 and T2 that will be executed concurrently
    ► T1 : start, A=A+100, B=B-100, COMMIT
    ► T2 : start, A=A*1.05, B=B*1.05, COMMIT

Concurrency Control - Introduction
Consider these two different sequences of execution:

    Sequence 1 (serial)        Sequence 2 (interleaved)
    T1  A=A+100                T1  A=A+100
    T1  B=B-100                T2  A=A*1.05    <- INTERFERENCE
    T1  COMMIT                 T2  B=B*1.05
    T2  A=A*1.05               T2  COMMIT
    T2  B=B*1.05               T1  B=B-100
    T2  COMMIT                 T1  COMMIT

    A = 210 DH                 A = 210 DH
    B = 0 DH                   B = 5 DH

Concurrency Control - Solutions
► Lock-Based Protocols
► Timestamp Techniques
► Optimistic Concurrency Control

Concurrency Control – Lock Manager
► A lock manager can be implemented as a separate process to which transactions send lock and unlock requests.
► The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock).
► The requesting transaction waits until its request is answered.
► The lock manager maintains a data structure called a lock table to record granted locks and pending requests.

Concurrency – LB Protocols (1)
Principle:
► A transaction asks for a lock on a record before updating it.
► After the update, the record is unlocked.
► There are two types of locks:
    exclusive (X) mode: data can be both read and written. An X-lock is requested using the lock-X instruction. Records are the unit of locking.
    shared (S) mode: data can only be read. An S-lock is requested using the lock-S instruction. Tables are the unit of locking.
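
The bank example earlier in this section can be replayed step by step to see where the two results come from. The sketch below is only an illustration: the `run` helper and the way a schedule is encoded as (transaction, account, operation) tuples are assumptions made for this example, not part of the presentation.

```python
# Replay of the two execution sequences of the bank example.
# `run` and the schedule encoding are hypothetical helpers for illustration.

def run(schedule, balances):
    """Apply (transaction, account, operation) steps in the given order."""
    state = dict(balances)
    for _txn, account, operation in schedule:
        state[account] = operation(state[account])
    return state

def add_100(x):
    return x + 100

def sub_100(x):
    return x - 100

def interest(x):
    # 5% interest, rounded to two decimals to avoid float noise
    return round(x * 1.05, 2)

initial = {"A": 100, "B": 100}

# Sequence 1: T1 runs completely, then T2.
serial = [
    ("T1", "A", add_100), ("T1", "B", sub_100),
    ("T2", "A", interest), ("T2", "B", interest),
]

# Sequence 2: T2 runs between the two updates of T1.
interleaved = [
    ("T1", "A", add_100),
    ("T2", "A", interest), ("T2", "B", interest),
    ("T1", "B", sub_100),
]

serial_result = run(serial, initial)
interleaved_result = run(interleaved, initial)
print(serial_result)       # A = 210, B = 0
print(interleaved_result)  # A = 210, B = 5
```

In the interleaved sequence T2 computes interest on B before T1 has withdrawn the 100 DH, so B ends at 5 DH instead of 0 DH.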
LB Protocols (2)
Shared and Exclusive Locks
[Figure: Transaction A and Transaction B, each with one shared lock and two exclusive locks on the Accounts table (Row 1 to Row 8).]

LB Protocols (3)
► Compatibility matrix

                      Shared lock    Exclusive lock
    Shared lock       True           False
    Exclusive lock    False          False

Lock Table
► Black rectangles indicate granted locks, white ones indicate waiting requests.
► The lock table also records the type of lock granted or requested.
► A new request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks.
► Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted.
► If a transaction aborts, all waiting or granted requests of the transaction are deleted.
    The lock manager may keep a list of locks held by each transaction, to implement this efficiently.

LB Protocols (4) – PX Protocol
► Any transaction that intends to update a record must first execute an exclusive lock request (X-lock) on that record.
► If the lock cannot be acquired, the transaction goes into a wait state.
► When the record becomes available, the lock can be granted and the transaction can resume processing.
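
The lock table behaviour described above (one queue per data item, grant on compatibility, re-check on unlock) can be sketched in a few lines of Python. This is a minimal single-threaded sketch under stated assumptions: the class name `LockTable`, its methods and the choice to compare a new request with every earlier request in the queue (granted or waiting) are illustrative, not taken from a real DBMS.

```python
from collections import defaultdict, deque

# Compatibility matrix from the slide: only S with S is compatible.
COMPATIBLE = {
    ("S", "S"): True, ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}

class LockTable:
    """Hypothetical lock table: one FIFO queue of requests per data item."""

    def __init__(self):
        # item -> deque of [transaction, mode, granted] entries
        self.queues = defaultdict(deque)

    def request(self, txn, item, mode):
        """Append a request; grant it if compatible with all earlier ones."""
        queue = self.queues[item]
        granted = all(COMPATIBLE[(m, mode)] for _, m, _ in queue)
        queue.append([txn, mode, granted])
        return granted

    def unlock(self, txn, item):
        """Delete the transaction's request, then re-check later requests."""
        self.queues[item] = deque(e for e in self.queues[item] if e[0] != txn)
        self._regrant(self.queues[item])

    def abort(self, txn):
        """Delete every waiting or granted request of the transaction."""
        for item in list(self.queues):
            self.unlock(txn, item)

    def holders(self, item):
        return [t for t, _, granted in self.queues[item] if granted]

    def _regrant(self, queue):
        entries = list(queue)
        for i, entry in enumerate(entries):
            if entry[2]:
                continue
            if all(COMPATIBLE[(m, entry[1])] for _, m, _ in entries[:i]):
                entry[2] = True   # the waiting request can now be granted
            else:
                break             # keep FIFO order: later requests still wait

table = LockTable()
r1 = table.request("T1", "row1", "S")   # granted
r2 = table.request("T2", "row1", "S")   # granted, S is compatible with S
r3 = table.request("T3", "row1", "X")   # must wait
table.unlock("T1", "row1")
still_waiting = table.holders("row1")   # T2 still holds its S-lock
table.unlock("T2", "row1")
final_holders = table.holders("row1")   # T3 now holds the X-lock
```

Keeping waiting requests in arrival order means a later S request cannot overtake a waiting X request, which limits starvation of writers.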
LB Protocols (5) – PX Protocol
► Example: 2 transactions

    Transaction 1       Transaction 2
    lock-X(B)           lock-S(A)
    read(B)             read(A)
    B = B - 50          unlock(A)
    write(B)            lock-S(B)
    unlock(B)           read(B)
    lock-X(A)           unlock(B)
    read(A)             display(A+B)
    A = A + 50
    write(A)
    unlock(A)

LB Protocols (6) – PX Protocol
Execution sequence:

    Transaction 1       Transaction 2       Concurrency control manager
    lock-X(B)
                                            grant-X(B)
    read(B)
    B = B - 50
    write(B)
    unlock(B)
                        lock-S(A)
                                            grant-S(A)
                        read(A)
                        unlock(A)
                        lock-S(B)
                                            grant-S(B)
                        read(B)
                        unlock(B)
                        display(A+B)
    lock-X(A)
                                            grant-X(A)
    read(A)
    A = A + 50
    write(A)
    unlock(A)

LB Protocols (7) – PX Protocol
► Serializability: an interleaved execution sequence of a set of transactions is serializable if it produces the same results as some serial execution of those transactions.
► We have to look at the lock requests of each transaction and find an order in which to execute them without any interference between them. The resulting sequence, if there is one, implies that the two transactions are serializable.
► The PX protocol can then be applied.

LB Protocols (8) – PX Protocol
► Using the lock-based mechanism, deadlock and starvation can occur. This is an example of deadlock:
[Figure: Transaction A and Transaction B each hold a shared lock on the Accounts table (Row 1 to Row 8); each has already X-locked one row and asks for an X-lock on a row that the other has already X-locked.]

LB Protocols – PXC Protocol
► Derived from the PX protocol.
► Exclusive locks are retained until the end of the transaction (COMMIT or ROLLBACK).
► PXC helps to avoid the loss of updates because of a ROLLBACK. No transaction is allowed to update a record that holds an uncommitted change.

LB Protocols – PS / PSC Protocols
► Any transaction that updates a record must first ask for a shared lock on that record.
► During the transaction, just before the update command, comes a request to change the lock-S into a lock-X.
► A transaction should not be allowed to lock itself out.
► The goal here is to limit the duration of X-locks.
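
The lock promotion described above can block when two transactions hold an S-lock on the same record. The following Python sketch is only an illustration of that situation: the helper names `sfind` and `blockers_for_update` and the set-based bookkeeping are assumptions for this example, and it models S-locks only (no X-lock is held when the example starts).

```python
# Which transactions hold an S-lock on each record (hypothetical bookkeeping).
s_holders = {"Record1": set()}

def sfind(txn, record):
    """Take an S-lock; S is compatible with S, so it is always granted here."""
    s_holders[record].add(txn)

def blockers_for_update(txn, record):
    """Transactions whose S-locks prevent txn from promoting its S-lock to X.

    The transaction's own S-lock is excluded, so it never locks itself out.
    """
    return s_holders[record] - {txn}

# Both transactions read the record first, as the PS protocol requires.
sfind("A", "Record1")
sfind("B", "Record1")

# Each then asks to promote its S-lock to an X-lock.
a_waits_for = blockers_for_update("A", "Record1")
b_waits_for = blockers_for_update("B", "Record1")

# A waits for B and B waits for A: neither promotion can ever be granted.
deadlock = "B" in a_waits_for and "A" in b_waits_for
print(a_waits_for, b_waits_for, deadlock)
```

A transaction that is the only S-holder gets an empty set of blockers and can be promoted immediately.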
LB Protocols – PS Protocol
► Example (here deadlock occurs at T4):

    Time    Transaction A       Transaction B
    T1      SFIND Record1       ---
    T2      ---                 SFIND Record1
    T3      UPD Record1         ---
    T4      ---                 UPD Record1

LB Protocols – PU / PUC Protocol
► This protocol uses a third lock state: the update lock.
► Any transaction that intends to update a record is required to ask for a U-lock on that record. A U-lock is compatible with an S-lock but not with another U-lock.
► Replacing S-locks by U-locks will prevent deadlock.

LB Protocols – PU / PUC Protocol
► Compatibility matrix:

            X        S        U
    X       False    False    False
    S       False    True     True
    U       False    True     False

► Example: compare the PU and PS protocols.

LB Protocols – PU Protocol
► This protocol is more efficient than the previous ones.
► It considerably limits deadlock, because it decreases the number of S-locks.

LB Protocols – Two Phase Locking Protocol
► 2PL ensures conflict-serializable schedules.
► 2PL includes two phases:
    Growing phase: the transaction may obtain locks and may not release locks.
    Shrinking phase: the transaction may release locks and may not obtain locks.
► The serialization order of the transactions is determined by the order of their lock points (the point at which a transaction acquires its final lock).
► If all transactions are two-phase, then all executions are serializable.

LB Protocols – 2PL Protocol
Example: [figure not recoverable]

LB Protocols – 2PL Protocol
► Several protocols are derived from, or are alternatives to, 2PL:
    Strict two-phase locking: a transaction must hold all its exclusive locks until it commits.
    Rigorous two-phase locking is even stricter: all locks (shared and exclusive) are held until commit. In this protocol transactions can be serialized in the order in which they commit.
    Graph-based protocol: an order of accessing data is fixed in advance. If a transaction has to update Row2 and read Row1, it has to access these data in the predefined order.

LB Protocols – Deadlock Avoidance
► Deadlock prevention protocols ensure that the system will never enter a deadlock state.
    Prevention can be achieved using different strategies:
        Transaction Scheduling
        Request Rejection
        Transaction Retry

LB-Protocols : Deadlock Prevention Strategies
► Timeout-based schemes: a transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.
    Deadlocks are therefore not possible.
    Simple to implement, but starvation is possible.
    It is also difficult to determine a good value for the timeout interval.

LB-Protocols : Deadlock Prevention Strategies
► What to do when a deadlock is detected?
► Some transaction will have to be rolled back to break the deadlock. Select as victim the transaction whose rollback will incur the minimum cost.
► We have to determine how far to roll back the transaction:
    Total rollback: abort the transaction and then restart it.
    Partial rollback: roll the transaction back only as far as necessary to break the deadlock, which is more effective.

Deadlock Avoidance : Transaction Scheduling
► Two transactions will not be run concurrently if their data requirements conflict.
► This requires knowing the data requirements of each transaction before run time, which is generally not possible until the transaction runs.
► Consequently, the lock unit is a set of records and locks are applied at transaction initiation instead of during execution.

Deadlock Avoidance : Request Rejection
► The system rejects any lock request that cannot be applied.
► It uses the deadlock detection algorithm.
► When trying to grant a lock request, if a deadlock is detected, the request is rejected.

Deadlock Avoidance : Transaction Retry
► Transactions are timestamped with their start time.
    Example: A requests a lock on a record already locked by B.
► wait-die scheme (non-preemptive)
    A waits if it is older than B; otherwise it dies: it is rolled back and automatically retried.
    A transaction may die several times before acquiring the data item it needs.
► wound-wait scheme (preemptive)
    A waits if it is younger than B; otherwise it wounds B (forces the younger transaction to roll back) instead of waiting for it.
    Younger transactions may wait for older ones.
    Fewer rollbacks than in the wait-die scheme.
► Transactions retain their timestamps even if they are rolled back.

LB-Protocols : Deadlock Detection Algorithm
► The system is in a deadlock state if and only if the wait-for graph has a cycle.
► The system must invoke a deadlock-detection algorithm periodically to look for cycles.

LB-Protocols : Deadlock Detection Algorithm
[Figures: wait-for graph without a cycle; wait-for graph with a cycle]

LB Protocols : Locking Granularity
► Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones.
► The hierarchy can be represented graphically as a tree. When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.
► Granularity of locking (level in the tree where locking is done):
    fine granularity (lower in the tree): high concurrency, high locking overhead
    coarse granularity (higher in the tree): low locking overhead, low concurrency

LB Protocols : Locking Granularity
► The highest level in the example hierarchy is the entire database.
► The levels below are of type area, file and record, in that order.

LB Protocols : Intent Locking Protocol
► In addition to the S and X lock modes, there are three additional lock modes with multiple granularity:
    intention-shared (IS): indicates explicit locking at a lower level of the tree, but only with shared locks.
    intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.
    shared and intention-exclusive (SIX): the subtree rooted at that node is locked explicitly in shared mode, and explicit locking is being done at a lower level with exclusive-mode locks.
► Intention locks allow a higher-level node to be locked in S or X mode without having to check all descendant nodes.

LB Protocols : Intent Locking Protocol
Compatibility matrix:

            IS       IX       S        SIX      X
    IS      True     True     True     True     False
    IX      True     True     False    False    False
    S       True     False    True     False    False
    SIX     True     False    False    False    False
    X       False    False    False    False    False

LB Protocols : Intent Locking Protocol
► Transaction Ti can lock a node Q, using the following rules:
    1. The lock compatibility matrix must be observed.
    2. The root of the tree must be locked first, and may be locked in any mode.
    3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode.
    4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode.
    5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
    6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
► Locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.

LB Protocols
► Default locking behavior for Oracle
    A pure SELECT will not lock any row.
    INSERT, UPDATE or DELETE will place an exclusive lock (X-lock) on each affected row.
    SELECT ... FROM ... FOR UPDATE will place a Row Share lock (ROW SHARE mode).

LB Protocols
► Oracle syntax:
    LOCK TABLE [schema.]table [options] IN lock_mode MODE [NOWAIT]
    Options:
    ► PARTITION partition
    ► SUBPARTITION subpartition
    ► @dblink
    Lock modes:
    ► EXCLUSIVE
    ► SHARE
    ► ROW EXCLUSIVE
    ► SHARE ROW EXCLUSIVE
    ► ROW SHARE | SHARE UPDATE (synonyms)

Optimistic Concurrency Control
► A transaction in OCC is composed of three phases: Read, Validation, Write.
    Read phase
        Transactions access the database to load data, then they update the data in a separate buffer.
    Validation phase
        For each transaction, the system checks whether there is any conflict with another transaction. If there is, the transaction is rolled back; otherwise the write phase can proceed.
    Write phase
        Updates are written from the buffer to the database.
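
The three OCC phases can be sketched as follows. This is a minimal single-threaded Python illustration under stated assumptions: the `Database` and `Transaction` classes are hypothetical, and conflicts are detected with per-item version counters, which is one possible validation technique and not necessarily the one the presentation has in mind. Items that are written without being read first are not validated in this sketch.

```python
class Database:
    """Hypothetical in-memory store with a version counter per item."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # bumped on each committed write

class Transaction:
    def __init__(self, db):
        self.db = db
        self.read_versions = {}  # item -> version seen during the read phase
        self.buffer = {}         # private buffer holding the pending updates

    # Read phase: load data from the database, update only the buffer.
    def read(self, key):
        if key in self.buffer:
            return self.buffer[key]
        self.read_versions.setdefault(key, self.db.version[key])
        return self.db.data[key]

    def write(self, key, value):
        self.buffer[key] = value

    def commit(self):
        # Validation phase: has any item we read been changed since?
        for key, seen in self.read_versions.items():
            if self.db.version[key] != seen:
                self.buffer.clear()  # conflict: roll back by discarding the buffer
                return False
        # Write phase: copy the buffer to the database.
        for key, value in self.buffer.items():
            self.db.data[key] = value
            self.db.version[key] = self.db.version.get(key, 0) + 1
        return True

db = Database({"A": 100})
t1 = Transaction(db)
t2 = Transaction(db)

t1.write("A", t1.read("A") + 100)  # both transactions read A = 100
t2.write("A", t2.read("A") * 2)

t1_committed = t1.commit()  # validates, then writes A = 200
t2_committed = t2.commit()  # fails validation: A changed after t2 read it
print(t1_committed, t2_committed, db.data["A"])
```

No transaction ever waits for another one here; a conflicting transaction is simply rolled back at validation time and would have to be retried.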
CONCLUSION
► Locking is a pessimistic concurrency control technique, because it assumes maximum contention.
► OCC is deadlock-free because it does not use locking.

Q&A