Granularity of Locks and Degrees of Consistency in a Shared Database
Authors: J.N. Gray, R.A. Lorie, G.R. Putzolu and I.L. Traiger
Presented by: Charles Braxmeier and Stuart Ness

Motivation

In a multi-user database, it is essential to provide a means of running transactions concurrently in order to increase the efficiency of the database. At the time this article was published, only very simple policies for concurrent processing had been implemented, and they were not efficient in a massively concurrent system: systems with fast concurrent policies did not offer fine-grained locking, while fine-grained policies carried a large amount of overhead to create the locks. Concurrent processing allows a database to process multiple transactions at a given time. The research described in this paper was performed in the mid-1970s, well before current database standards; nevertheless, it demonstrated a fast approach that reduces overhead without giving up the ability to lock the database at different levels.

Problem Definition

Databases need an efficient means of providing concurrent transactions. Early databases did not focus heavily on this issue because most originally operated in single-user mode. However, as large multi-user database systems became the norm, it became apparent that concurrent processing was needed. Many early forms of concurrency control used either large-grained locks (locking the whole database, file, or table that the transaction was using) or individual record locks, which required a large amount of overhead to check every transaction each time a new request was issued. Without a structure to implement an inherited lock, explicitly locking an entire sub-tree required a great deal of computation time.
Contributions

The article argues that lock granularity is essential to providing efficient concurrent transactions: without finer-grained locking, a large concurrent system would not be practical. To address this, the authors provide two main mechanisms that allow finer-grained resource locking. The first mechanism uses an implicit structure to lock an entire sub-tree: all nodes underneath a particular node are implicitly locked at the parent's lock level or higher. The second mechanism introduces an 'intention' mode, which allows a given node to lock its ancestors without requiring them to hold a full lock. This provides a quick (computationally inexpensive) means of checking whether two transactions can coexist. In addition to these two mechanisms, the paper also suggests ways to deal with preemption and deadlock under the new modes. (Deadlock occurs particularly when two transactions that coexist under lower, compatible permissions both wish to change their modes in an incompatible way.)

Concepts – 1

There are some key concepts that need to be understood. The first is the granularity of locks. As shown in the figure, the database is structured much like a tree. The article makes some assumptions about how the database is set up: it has a database level, which contains multiple areas (folders); each area contains many files, and each file contains individual records. This assumption mimics the basic layout of a file system, and the resulting tree structure is what makes the locking mechanism possible. For instance, a lock at the database level would allow a complete replacement of all existing files and folders.
A lock on an area would allow direct replacement of an entire file, and a lock on a file would allow a mass upload of records. In addition to the tree structure available for locks, there are five different lock modes: exclusive mode (X, write) and share mode (S, read), both of which were already in use, plus the new intention share (IS), intention exclusive (IX), and share and intention exclusive (SIX) modes. The last three modes provide a clear marker of what type of hold a lower level has on a particular node. For instance, if a file is marked in IS mode, it can easily be derived that some record within it is held in S mode.

Concepts – 2

To understand how these modes fit together, note that only certain modes can coexist: if a node is currently held in one mode, it cannot simultaneously be locked in an incompatible mode. (Explain the chart.) While a node may need to be locked at a certain level, the path is always locked from the root down to the desired node. This avoids a large amount of searching to determine who should have access: if any node along the path is locked in an incompatible way, the transaction is placed on a waiting queue. (Show the example of how the record would be locked.) The article notes that leaf nodes cannot be locked in an 'intention' mode, since they have no descendants for the intention to cover.

Concepts – 3

The article then explains how implicit locking would work. Implicit locking uses the idea that if an ancestor is held at a certain level, then all of that ancestor's descendants are also held at that level or higher. For instance, if file F were locked in exclusive (write) mode, then that exclusive lock would implicitly apply to any record R that exists within file F.
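The mode compatibility chart and the root-to-leaf locking rule described above can be sketched in code. The mode names and the compatibility matrix follow the article; the Python encoding and the helper names (compatible, lock_path_for_exclusive) are illustrative, not from the paper.

```python
# Compatibility chart for the five lock modes, as given in the article.
# True means two transactions may hold the same node in those two modes.
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def compatible(held: str, requested: str) -> bool:
    """Can a node already held in `held` mode also be granted in `requested` mode?"""
    return COMPAT[held][requested]

def lock_path_for_exclusive(path):
    """Root-to-leaf protocol: to lock the last node of `path` exclusively,
    first take IX (intention exclusive) on every ancestor, then X on the
    node itself."""
    return [(node, "IX") for node in path[:-1]] + [(path[-1], "X")]
```

For example, exclusively locking a record requires IX on the database, the area, and the file before X on the record, and since IX is compatible with IS, another transaction may still read elsewhere in the same file.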
So, rather than explicitly specifying each record needed within F, imposing the exclusive lock at one level locks all sub-levels implicitly. The article then points out that, unfortunately, most database file systems are not truly hierarchical; most look more like a directed acyclic graph (DAG). In a database, this simply means that a file may be reachable either via the file structure or via an index structure. Therefore, to lock a record, not only must the file be locked, but the index must be locked as well. The article provides a process that essentially creates a hierarchical lock, either explicit or implicit, on both the file and the index. (Go through the example of how this is done.)

Concepts – 4

The article then states that locks can become dynamic by being based on a particular value. This is a simple extension of the DAG case with one added condition: if a transaction moves from one index interval to another, both intervals must be locked, along with their required paths. However, offering these new modes would not help unless the database were able to schedule and grant the new requests. In its basic form, the scheduler continually grants requests whose modes are "allowable" (see the chart) until the next arriving request is no longer compatible. For example, suppose the node's current state is intention share and an arriving request wants exclusive access. A queue is then formed with the exclusive request at its head. That request waits until the currently granted set becomes compatible or empty; the scheduler then grants however many of the following requests are compatible with the head's mode. One problem that can occur is when a process starts with one permission level and later requests a different permission level.
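This FIFO granting discipline can be sketched as follows. The compatibility pairs come from the article's chart; the schedule function and its list/queue representation are hypothetical, added here only to make the mechanism concrete.

```python
from collections import deque

def compatible(held: str, requested: str) -> bool:
    """Compatibility chart from the article, encoded as the set of
    mode pairs that may coexist on the same node."""
    ok = {("IS", "IS"), ("IS", "IX"), ("IS", "S"), ("IS", "SIX"),
          ("IX", "IS"), ("IX", "IX"),
          ("S", "IS"), ("S", "S"),
          ("SIX", "IS")}
    return (held, requested) in ok

def schedule(granted, waiting):
    """Grant waiting requests in FIFO order: each request is granted only
    if it is compatible with every mode already granted; the first
    incompatible request blocks itself and everything queued behind it."""
    granted, waiting = list(granted), deque(waiting)
    while waiting and all(compatible(g, waiting[0]) for g in granted):
        granted.append(waiting.popleft())
    return granted, list(waiting)
```

For the example above: with IS already granted, an arriving X request is queued, and even a compatible IS request arriving after it must wait behind the X at the head of the queue.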
Essentially, if the newly requested level is lower than the current level, nothing happens; if it is higher, the system checks whether the current state will support the higher mode. If so, the change can happen immediately; if not, the transaction must wait at its current access level until the node's state allows the requested permission. This can cause deadlock when multiple processes request higher permissions and sit waiting for the state to clear, because while the transactions wait, they do not release their current locks. The article says that at this point one or more of the level-change requests would have to be preempted to alleviate the deadlock.

Validation Methodology

This is largely an informal theory paper: it offers no proofs, but instead shares findings from an implementation on an IBM research lab system. This essentially amounts to showing a working proof of concept. The unfortunate part is that it lacks a formal proof or any statistical evidence of performance improvements.

Assumptions

A few assumptions underlie the article. The first is that the database can be organized into some form of hierarchy, whether a strict hierarchy or a directed acyclic graph. The second is that the database technology considered is that of 1975. Obviously it is impossible to predict the future, but this assumption means the paper does not extend to all DB types (relational, structured, etc.). Without this assumption, the specific techniques may not hold up; however, the basic principles of the locking mechanism would remain possible as long as a hierarchical structure could be imposed. If such a structure cannot be imposed, as in more complex queries, this locking mechanism may lose its advantages.

Rewrite Changes

If this paper were rewritten, it would need to reflect the current state of databases.
In addition, comparisons between this method and other methods would be beneficial to show the concurrency-versus-overhead tradeoffs and how this approach fits among the options for concurrency control. Finally, the process and ideas remain fundamentally realizable given the hierarchical systems still used as storage mechanisms; preserving the notions of lock groups, the scheduling discipline, and the hierarchy would benefit that part of a database system.