Granularity of Locks and Degrees of Consistency in a Shared Database
J.N. Gray, R.A. Lorie, G.R. Putzolu, I.L. Traiger, 1977
(Presentation by Randy Ellis)

Topics to be Discussed:
* Transaction definition
* Database object hierarchy
* Lock types
* Lock compatibility
* Lock granularity / protocol
* Transaction consistency
* Transaction isolation
* Transaction interdependence
* Transaction recovery

An Initial Definition of "Transaction":
A transaction is a logical unit of work performed against the database by a single client process/session. For now, we will accept that a transaction may be made up of multiple statements/operations, and we will assume that the boundary between one transaction and the next is well defined by the database.

The Tradeoff!
* Concurrency: the ability to have one transaction work with one set of resources while another transaction "simultaneously" works with others.
* Overhead: the amount of file structure, memory, and processing time needed to keep those different transactions from affecting each other undesirably.

The DBMS Structure Waterfall:
Shows the resources that different transactions must "share":
Database, DB Spaces, Table Spaces, Tables, Views, Pages, Indexes, Cursors, Rows, Columns - and probably more!

The "Simplified" DBMS Structure Waterfall:
Now this is something we can work with! It still shows the resources that different transactions must "share", and it is a DIRECTED ACYCLIC GRAPH (DAG):
Database -> Table Spaces (Areas) -> Tables (Files/Relations) -> Rows (Records), with Indexes -> Rows as a second path.
* Each resource instance has a class/type name and a unique ID/name (e.g. Database:Accounting, Table:Salaries).
* Each resource must be reached THROUGH A PARENT.
* Note: there are two ways to reach a record (through its table or through an index).
Transactions protect themselves from each other's changes by "locking" resources (generally).

Types of Database Locks:
* Null (NL) - no lock on the node.
* Exclusive/Write (X) - prevents other transactions from overwriting changes that have not been committed, and MAY prevent other transactions from reading changes that have not been committed.
* Share/Read (S) - prevents other transactions from overwriting data the holder is currently looking at.
* Intention locks (I_) - set on ancestors to denote that we "intend" to set the real lock on one of their descendants. Intention locks therefore have no use on leaf nodes.

Note: The combination of intention locks on the ancestors and real locks on the descendants is equivalent to using real locks only on the desired node. We have to set sentinels higher up in the digraph to prevent conflicts with transactions arriving at our node via other paths. We could use real locks higher up, but using "intention" locks lets other transactions keep working with that node while still avoiding changes that would conflict with the lock further down -- this increases our concurrency.

Lock Compatibility:
When a transaction attempts to create a new lock on an object that already has locks, the locking mechanism determines whether the new lock is "compatible" with the existing one, based on the types of the new and existing locks.
If compatible: the new lock is set and the transaction is allowed to proceed.
If not:
1) The transaction may be given an error (non-blocking) to relay to the user, OR
2) The transaction may be suspended and enqueued on the object until the incompatible lock(s) are released.
Note: If the locks are never removed, the locking mechanism will time out and check for deadlock, failing one of the two transactions to correct the situation. (A compatibility-check sketch follows below.)
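To make the compatibility check concrete, here is a minimal Python sketch of a lock manager deciding whether a requested mode can coexist with the modes other transactions already hold on an object. It is illustrative only (LockMode, COMPATIBLE, and can_grant are names invented here, not the paper's); the Yes cells encode the compatibility matrix shown on the next slide.

    from enum import Enum

    class LockMode(Enum):
        NL  = "NL"   # null - no lock
        IS  = "IS"   # intention share
        IX  = "IX"   # intention exclusive
        S   = "S"    # share (read)
        SIX = "SIX"  # share + intention exclusive
        X   = "X"    # exclusive (write)

    # For each mode already held (key), the set of modes a new requester
    # may still be granted (value) - i.e. the "Yes" cells of the matrix.
    COMPATIBLE = {
        LockMode.NL:  {LockMode.NL, LockMode.IS, LockMode.IX, LockMode.S, LockMode.SIX, LockMode.X},
        LockMode.IS:  {LockMode.NL, LockMode.IS, LockMode.IX, LockMode.S, LockMode.SIX},
        LockMode.IX:  {LockMode.NL, LockMode.IS, LockMode.IX},
        LockMode.S:   {LockMode.NL, LockMode.IS, LockMode.S},
        LockMode.SIX: {LockMode.NL, LockMode.IS},
        LockMode.X:   {LockMode.NL},
    }

    def can_grant(requested, held_by_others):
        """Grant only if the requested mode is compatible with every lock other
        transactions already hold on this object; otherwise the requester is
        given an error or enqueued until the conflicting locks are released."""
        return all(requested in COMPATIBLE[held] for held in held_by_others)

    print(can_grant(LockMode.IS, [LockMode.IX]))  # True  - IS and IX are compatible
    print(can_grant(LockMode.S,  [LockMode.IX]))  # False - S conflicts with IX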
Lock Compatibility Matrix:
If Tran 1 holds a lock of the given type (row) and Tran 2 requests a lock of the given type (column), will that request be granted?

Tran 1 holds \ Tran 2 requests:   NL    IS    IX    S     SIX   X
NL                                Yes   Yes   Yes   Yes   Yes   Yes
IS                                Yes   Yes   Yes   Yes   Yes   No
IX                                Yes   Yes   Yes   No    No    No
S                                 Yes   Yes   No    Yes   No    No
SIX                               Yes   Yes   No    No    No    No
X                                 Yes   No    No    No    No    No

(SIX = Share with Intent eXclusive: an S lock on the node combined with the intent to set X locks on its descendants.)

Lock Granularity:
* A transaction may set locks on more than one node at a time, and those nodes are not restricted to be leaf nodes.
* If a transaction sets a lock on a non-leaf node, that lock "implicitly" applies to all descendants of that node.
* Locking at a lower (item) level allows other transactions to access the other items in the lot -- this HIGH-GRANULARITY approach provides maximum concurrency but requires more overhead to maintain.
* Locking at a higher (lot) level keeps other transactions away from a whole group of items with a single lock -- this LOW-GRANULARITY approach provides minimum overhead but reduces concurrency.

A consistent locking protocol must be used (a sketch appears at the end of this section):
* To create an "S" or "IS" lock, one must hold an "IX" or "IS" lock on the parent.
* To create an "X", "SIX", or "IX" lock, one must hold an "SIX" or "IX" lock on the parent.
* Locks always proceed from the root node on down.
* Locks are always released from the leaf on up.

Who decides the granularity at which locks are established?
The optimizer does: by evaluating the needs of the query against the statistics and catalog, and passing the choice to the locking mechanism as part of the access plan.

A More Thorough Definition of "Transaction":
* Some UNITS OF WORK may be a single statement, but others may span multiple statements that depend on each other (e.g. if one statement within the unit of work fails, so should the others).
* Therefore DBMSs give us the ability to bundle statements into logical groups called TRANSACTIONS, which have well-defined starting and ending points controlled by the DBMS client, so that the DBMS can guarantee the logical units of work move the database from one consistent state to another.
* As work is done within a transaction, the original values are preserved. Clients can end a transaction by rolling back (undoing) all work performed since the transaction began, OR by committing that work, which makes it available to all future transactions (and abdicates the right to roll it back).
* Data that has been added, changed, or deleted within a transaction but has not yet been committed can be called "dirty" data.
* The transaction creates "locks" to signal other clients' transactions that this transaction is changing the data, or requires data it has read to remain unchanged (usually).

Transaction Consistency:
Transactions help guarantee that the database is VALID and RECOVERABLE by providing mechanisms that:
1) Guarantee data the transaction is working with will not change during the transaction.
2) Remember the work the transaction performed, so the database can be rolled back to the last consistent state (before the transaction began) if a "bad thing" happens while running the transaction.

"Bad things" include:
1) Power failure
2) Hardware or system software failure
3) Deadlock conflict with another transaction's lock(s)
4) Security violation (are you authorized to work with that table/object?)
5) Constraint violation (did you put a valid value in that column?)
6) Referential integrity violation (did you delete a record another relation uses?)
7) Flow integrity violation (did a dependency step fail in a multi-step process?)
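Here is a minimal, illustrative Python sketch of the locking protocol above (LockManager, lock_record_for_update, and the example hierarchy are assumptions made for this sketch, not the paper's code): intention locks are set root-down on the ancestors, the real lock is set on the target, and everything is released leaf-up.

    class LockManager:
        """Toy lock table: node -> list of (transaction, mode). A real lock
        manager would also run the compatibility check from the matrix above
        and block or fail conflicting requests; this sketch only records locks."""
        def __init__(self):
            self.held = {}

        def acquire(self, txn, node, mode):
            self.held.setdefault(node, []).append((txn, mode))

        def release(self, txn, node):
            self.held[node] = [(t, m) for t, m in self.held.get(node, []) if t != txn]

    def lock_record_for_update(mgr, txn, path):
        """Set IX on every ancestor (root first), then X on the target record.
        This satisfies the rule that an X/SIX/IX lock requires an SIX or IX
        lock on the parent."""
        *ancestors, target = path
        for node in ancestors:
            mgr.acquire(txn, node, "IX")
        mgr.acquire(txn, target, "X")

    def unlock_all(mgr, txn, path):
        """Release leaf first, then back up toward the root."""
        for node in reversed(path):
            mgr.release(txn, node)

    mgr = LockManager()
    path = ["Database:Accounting", "Area:A1", "Table:Salaries", "Record:Row42"]
    lock_record_for_update(mgr, "T1", path)   # IX on each ancestor, X on the record
    unlock_all(mgr, "T1", path)

For a read, the same walk would set IS locks on the ancestors and an S lock on the target record.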
Degrees of Consistency (Isolation):
The "degree of consistency" desired by a client depends on how much it needs to isolate its view of the database from other transactions -- hence it is also called the "isolation level". DBMSs use different combinations of locks and transaction/audit log entries to effect their degree of consistency.

Sometimes it is not required that a transaction's data be "perfectly" consistent. When this is the case, we can reduce overhead and increase concurrency by telling the DBMS to use a lower isolation level when beginning a transaction. The isolation level of the transaction then guides the DBMS when setting and respecting locks. All transactions must adhere to at least degree 0 and create "well-formed" locks (set the lock before touching the data).

General Isolation Levels:
Degree   Guarantee (in addition to the lower degrees)                         Lock protocol
0        Transactions don't overwrite other transactions' dirty data          Write locks, 1-phase
1        Transactions don't commit data until the end of the transaction      Write locks, 2-phase
2        Transaction does not read dirty data from other transactions         Write locks 2-phase; read locks 1-phase
3        Other transactions do not dirty any data read by this transaction    Write locks 2-phase; read locks 2-phase

1-Phase: data can be flushed and locks released in mid-transaction.
2-Phase: locks are held until the end of the transaction.

Transaction Interdependency:
"The principal application of dependency definition is as a proof technique and for discussing schedules and recovery issues."
Transactions can work with the same data! And the order in which transactions set locks (which is controlled by the degree of consistency) affects how well a transaction can be recovered when "something bad" happens.

Dependency relations between two transactions touching the same object:
Reads <<< Writes    degree 3
Writes << Reads     degree 2
Writes <  Writes    degree 1
Reads  -  Reads     degree 0 (no dependency)

Note: A schedule's consistency degree is guaranteed only if the closure of (<, <<, <<<) is a partial order. (A sketch of this check appears at the end of this section.)

[Diagram: an example schedule in which transactions T1 and T2 interleave Begin, L(lock), R(read), W(write), U(unlock), and End operations on objects A and B; the annotations (T1 <<< T2, T2 << T1, T2 < T1, T1 - T2, etc.) show the dependencies of each degree that the interleaving creates.]

So, along with maintaining locks, a DBMS also needs to keep track of WHEN locks are established and data is changed. Most DBMSs employ a "Transaction Log" to handle this.

Transaction Recovery:
When "something bad" happens...
* Transactions make sure to follow an exact order -- establish lock, update log, update record, commit, unlock -- and this "well-formed", "two-phase" protocol guarantees changes are safely in the log if we crash.
* Periodically, the DBMS suspends commits, does a quick flush of all committed changes, and records a "checkpoint" in the log, so the whole log need not be replayed again.
* If we need to recover, we clear all the dirty data from the database and cache and replay the log from the last checkpoint according to the guidelines below:

Degree 0
  Plan: operations are "replayed" with no dependencies (in any order).
  Side effects: since an operation may have used uncommitted data, its "old" value may not be accurate.
Degree 1
  Plan: sort operations according to writes and then time (< order); try to detect changes that were never made.
  Side effects: since an operation may have used uncommitted data, its "old" value may not be accurate.
Degree 2
  Plan: transactions can be "replayed" in the order they were created (data changes are guaranteed "idempotent" by the locking protocol) - make sure the log order is accurate.
  Side effects: synchronous operations may suffer, as reads can be performed in a different order.
Degree 3
  Plan: sort operations by writes, then write-reads, then read-writes, then time (<<< order) -- changes are guaranteed "idempotent" by the locking protocol.
  Side effects: none! (other than slowness)
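The "partial order" note above can be made concrete with a small Python sketch (illustrative only; the schedule format and the names dependencies and is_partial_order are assumptions made here): build the dependency edges a schedule induces between transactions and check that they contain no cycle.

    from itertools import combinations

    def dependencies(schedule):
        """Yield (earlier_txn, later_txn, degree) for conflicting steps on the
        same object by different transactions: write->write gives a degree-1
        (<) edge, write->read a degree-2 (<<) edge, read->write a degree-3
        (<<<) edge; read->read creates no dependency."""
        for (t1, a1, o1), (t2, a2, o2) in combinations(schedule, 2):
            if t1 == t2 or o1 != o2:
                continue
            if a1 == "W" and a2 == "W":
                yield (t1, t2, 1)
            elif a1 == "W" and a2 == "R":
                yield (t1, t2, 2)
            elif a1 == "R" and a2 == "W":
                yield (t1, t2, 3)

    def is_partial_order(schedule):
        """True if the union of the dependency edges has no cycle between
        transactions (a cycle means the closure cannot be a partial order)."""
        edges = {(a, b) for a, b, _ in dependencies(schedule)}
        txns = {t for t, _, _ in schedule}
        def reachable(src, dst, seen=()):
            return any(b == dst or (b not in seen and reachable(b, dst, seen + (b,)))
                       for a, b in edges if a == src)
        return not any(reachable(t, t) for t in txns)

    # Example: T1 reads A, T2 writes A, T2 reads B, T1 writes B -> a cycle,
    # so no serial order of T1 and T2 is consistent with this schedule.
    schedule = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
    print(is_partial_order(schedule))   # False: T1 <<< T2 and T2 <<< T1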
Summary:
* Whenever multiple users have simultaneous access to the database, additional overhead (usually in the form of locking) must be added to avoid conflicts.
* Locks can be set at different granularities on the objects within the database -- those objects are organized into an acyclic hierarchy to promote consistency.
* Locks come in different flavors. The protocol suggested in this paper uses Read and Write locks with an Intention modifier, and a table specifying the compatibility rules between these locks was established.
* A transaction is a logical grouping of operations into a single unit of work. The DBMS automatically maintains locks and log entries for the transaction, keeping it from conflicting with other transactions and enabling it to recover if a failure is encountered.
* Transactions are "isolated" from each other based on a desired degree of consistency given by the client. The isolation level controls how locks are established and respected.
* The ordering of transactions differs at different isolation levels, and this ordering can affect the recoverability of the database. Dependency identification rules were addressed and recovery plans were discussed.

Look at the readings book, page 189 -- it pretty much sums it all up, nice and neat!