On the Isolation Property of ACID Transactions

Martti Laiho 1, Tim Lessner 2, Matti Kurki 1, Kari Silpiö 1
1 Dept. of Business Information Technology, Haaga-Helia University of Applied Sciences,
Ratapihantie 13, 00520 Helsinki, Finland
E-mail: <firstname>.<lastname> @ haaga-helia.fi
2 School of Computing, University of the West of Scotland
Paisley PA1 2BE, UK
E-mail: tim.lessner @ uws.ac.uk
Abstract
The ACID properties of SQL transactions have generally been referred to as the model of reliable flat
transactions, but in this paper we argue that the original definition of the isolation
property, the "big I" in the ACID acronym, is too strict for the modern, mainstream DBMS products
used by the industry. We therefore propose a more precise acronym, ACiD, in which the
lower-case "i" stands for a relaxed definition of isolation for serializable transactions, controlled by a
properly selected isolation level of the transaction. The purpose of this paper is to encourage
discussion, especially in the education sector.
Keywords: DBMS, SQL, transaction, ACID, isolation, concurrency
Introduction
Databases provide reliable data storage services, and especially for business-critical
applications the use of these services requires that the applications requesting them
use the concurrency control (CC) protocol of SQL transactions properly. Too often the prevailing
attitude is to skip detailed reasoning about the concurrency control mechanism
implemented by a database management system (DBMS) and to rely on the ACID paradigm instead,
even though, considering the real CC implementations in database products, transaction isolation
(the “I” property of ACID) is less straightforward.
At DBTechNet, a consortium of European database teachers and professionals (see
www.DBTechNet.org), we are concerned about the knowledge and practical skills of teachers,
students, and European database professionals in the reliable use of the mainstream database
products in applications. As part of our DBTech EXT project, partially funded by the EU LLP
Transversal Programme, we have developed free training materials and hands-on labs, which we
call Virtual Laboratory Workshops (VLWs), on various subject areas of database technology
using the free editions of the mainstream database systems used by the industry today. One of
our VLWs focuses on SQL concurrency technologies, and while developing the tutorial for the
Concurrency VLW, we noticed a contradiction between the theory and the current mainstream
DBMS products used by the industry. To provide a more realistic transaction model, with the
objective of encouraging further discussion also in the education domain, we present a slightly
different, but nevertheless important, interpretation of ACID, which we call ACiD. The
lower-case "i" emphasizes the concurrency services that the mainstream DBMS products of
today actually make available for the isolation requests of applications.
Isolation in Database Literature
Today the ISO SQL standard and the typical commercial database systems used by the industry
support so-called flat transactions, and the ideal of reliable SQL transactions is called the ACID
transaction. The acronym ACID, as defined by Haerder and Reuter [1], is built from the initials of the
four transaction properties, freely quoted below:
Atomicity
A transaction will be either successfully committed or rolled back, i.e. it is of "all-or-nothing" type.
Consistency
"Each successful transaction by definition commits only legal results."
Isolation
"Events within a transaction must be hidden from other transactions running
concurrently."
Durability
"Once a transaction has been completed and has committed its results to the
database, the system must guarantee that these results survive any subsequent
malfunctions."
The definitions of these properties vary in the database literature, and there is no official definition of
the ACID transaction, even though the concept is generally accepted.
In this paper we focus on the isolation property, and to begin with we quote some typical
isolation definitions from the literature:
In their classic and seminal transaction processing book, Gray and Reuter [2] define isolation as
follows: "even though transactions execute concurrently, it appears to each transaction, T, that
others execute either before T or after T, but not both."
Other pioneers in the transaction processing field, Bernstein and Newcomer [4], define isolation as
follows: "the technical definition of isolation is serializability. An execution is serializable (meaning
isolated) if its effect is the same as running the transactions serially, one after the next, in
sequence, with no overlap in executing any two of them".
In his classic database textbook, Chris Date [5] defines isolation simply as follows: "transactions
are isolated from one another", and in his latest critical book on SQL [6] he defines the transaction
isolation property as "any given transaction’s updates are concealed from all other transactions
until the given transaction commits."
Comparing the original definition by Haerder and Reuter [1] with the definition by Date [6], both
imply that calling any transaction an ACID transaction, in terms of isolation, requires that all
concurrent transactions are isolated from each other. This means that strict isolation is not a
property of a single transaction, but a property of the whole database. However, the SQL
implementations of the mainstream DBMS products, such as DB2, Oracle, or SQL Server, do not
offer a configuration option that would enforce this for the whole database.
Jim Melton, the architect of the ANSI / ISO SQL standard, and Alan R. Simon [7] define isolation as
follows: "the system must give each transaction the illusion that it is running in isolation; that is, it
appears that all other transaction either ran previous to the start of the transaction or ran
subsequent to its commit." and later they define isolation as "The regulation that concurrent
transactions have no effect on one another (except in some implementations, where some of the
concurrent transactions may be 'blocked' and therefore delayed or even forced to abort or restart)".
The latter statement is crucial in practice when working with the DBMS products used by the
industry.
Isolation in the SQL Standard and Implementations
The mainstream SQL implementations use either (i) multi-granular locking scheme concurrency
control (which we call LSCC), for example DB2 and SQL Server, or (ii) multi-versioning
concurrency control (MVCC), for example Oracle and a specially configured SQL Server database,
as explained in Figure 1 and in detail in our tutorial on SQL concurrency technologies [8]. A
third concurrency control technology is optimistic concurrency control (OCC), in which transactions
make all database modifications in their private workspace, and only at the commit phase are their
changes synchronized back into the database as an indivisible atomic action. For example, the
Pyrrho DBMS of the University of the West of Scotland (UWS) [12] implements such an OCC.
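The OCC idea described above (private workspace, synchronization only at commit) can be sketched in a few lines. This is a minimal single-process illustration in Python, not the Pyrrho implementation: the names Database, OptimisticTransaction and ConflictError are hypothetical, and the commit-time check against per-row version counters is one simple validation scheme assumed for the example.

```python
class ConflictError(Exception):
    """Raised when commit-time validation fails and the transaction must restart."""


class Database:
    """Toy key-value store; every committed write bumps the key's version counter."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in self.data}


class OptimisticTransaction:
    """Hypothetical OCC transaction: private workspace plus commit-time validation."""

    def __init__(self, db):
        self.db = db
        self.seen_versions = {}  # key -> version observed at first read
        self.workspace = {}      # key -> value written privately

    def read(self, key):
        if key in self.workspace:          # read our own uncommitted write
            return self.workspace[key]
        self.seen_versions.setdefault(key, self.db.version[key])
        return self.db.data[key]

    def write(self, key, value):
        self.workspace[key] = value        # nothing reaches the database yet

    def commit(self):
        # Validation: fail if anything we read was changed by a later commit.
        for key, seen in self.seen_versions.items():
            if self.db.version[key] != seen:
                raise ConflictError(f"key {key!r} changed since it was read")
        # Write phase: single-threaded here, so it is trivially indivisible.
        for key, value in self.workspace.items():
            self.db.data[key] = value
            self.db.version[key] = self.db.version.get(key, 0) + 1


db = Database({1: 100})
a, b = OptimisticTransaction(db), OptimisticTransaction(db)
a.read(1)                    # A reads 100
b.write(1, b.read(1) + 1)
b.commit()                   # B commits first; row 1 is now 101
a.write(1, a.read(1) + 1)
try:
    a.commit()
    outcome = "committed"
except ConflictError:
    outcome = "aborted"
print(outcome, db.data[1])   # aborted 101
```

In this sketch the later committer is the one that fails validation; a real system would also have to make the validation and write phases atomic with respect to other committing transactions.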
The concurrency control service provided by the DBMS is usually not configured or
controlled by the application directly, for example by lock and unlock statements (see footnote 1).
Instead, the SQL standard and its implementations provide an isolation level that can be set for the
transaction, in effect as the default isolation of the SQL statements in the transaction. The requested
isolation level does not guarantee a successful execution of the transaction, but it guarantees that an
unsuccessful execution cannot be committed. If an LSCC system cannot provide the requested
isolation level to a transaction, as the result of a concurrency conflict such as a deadlock, the
transaction will be aborted and its actions in the database need to be rolled back. The same
applies to concurrency conflicts of transactions in MVCC, except that it is the application
that, based on the exception, has to decide whether the transaction should be rolled back.
In Figure 1 we present a generic explanation of the MVCC concurrency control mechanism. An
MVCC system typically supports the following two isolation levels, Read Committed and Serializable,
which have different semantics from the ISO SQL isolation levels of the same names. At the MVCC
Read Committed isolation level, read operations access, without locking, the latest committed
versions of the requested rows. A more appropriate name for this isolation level would be Currently
Committed, as it has lately been named in the DB2 implementation that avoids read locking. At the
MVCC Serializable isolation level, a read operation accesses for each requested row the latest
version committed by the start time of the transaction. A more appropriate name for this isolation
level would be Snapshot, as it is called in the MVCC implementation of SQL Server 2005 and later
editions, or Snapshot Isolation (SI) as suggested by Fekete [9].
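The difference between the two MVCC read rules can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the mechanism of any particular product: VersionedRow and its integer commit timestamps are hypothetical, and only committed versions of a single row are modelled.

```python
from bisect import bisect_right


class VersionedRow:
    """Committed versions of one row, ordered by commit timestamp."""

    def __init__(self, value, commit_ts=0):
        self.commit_times = [commit_ts]
        self.values = [value]

    def commit_version(self, value, commit_ts):
        if commit_ts <= self.commit_times[-1]:
            raise ValueError("commit timestamps must be strictly increasing")
        self.commit_times.append(commit_ts)
        self.values.append(value)

    def read_latest_committed(self):
        """MVCC Read Committed: newest committed version at the time of the read."""
        return self.values[-1]

    def read_as_of(self, start_ts):
        """MVCC Serializable (Snapshot): newest version committed at or before start_ts."""
        index = bisect_right(self.commit_times, start_ts)
        if index == 0:
            raise LookupError("row did not exist at that time")
        return self.values[index - 1]


row = VersionedRow(100, commit_ts=0)
reader_start_ts = 5                     # the reading transaction starts here
row.commit_version(101, commit_ts=7)    # a concurrent writer commits later

print(row.read_latest_committed())      # 101
print(row.read_as_of(reader_start_ts))  # 100
```

The same row therefore yields 101 to a reader at MVCC Read Committed and 100 to a reader at Snapshot, and neither read takes a lock.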
Footnote 1. Some DBMS systems provide a table-level locking possibility, and in Oracle a common
practice has been to lock rows using the SELECT .. FOR UPDATE mechanism, but these locks are
always kept up to the end of the transaction.
Figure 1. Basic idea of the MVCC concurrency control mechanism
The ANSI / ISO SQL standard defines the following four well-known isolation levels in terms of
anomalies (badly behaving transaction types such as Dirty Read, Unrepeatable Read, and
Phantoms) against which the selected isolation level protects the transaction, as listed in
Table 1.
ISO SQL                  Anomaly:
isolation level          Dirty Read     Unrepeatable Read   Phantoms
Read Uncommitted         possible       possible            possible
Read Committed           not possible   possible            possible
Repeatable Read          not possible   not possible        possible
Serializable             not possible   not possible        not possible
Table 1. Concurrency anomalies solved by the SQL isolation levels
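Table 1 can also be read as a lookup: given the anomalies a transaction must be protected against, which is the least restrictive ISO SQL level that rules them out? A minimal Python sketch of that lookup follows; the function name weakest_level_preventing is hypothetical, and the data is taken only from Table 1, so it reflects the standard's definitions and not the behaviour of any MVCC implementation.

```python
# Anomalies each ISO SQL isolation level still allows, ordered weakest to strictest.
POSSIBLE_ANOMALIES = {
    "READ UNCOMMITTED": {"dirty read", "unrepeatable read", "phantoms"},
    "READ COMMITTED": {"unrepeatable read", "phantoms"},
    "REPEATABLE READ": {"phantoms"},
    "SERIALIZABLE": set(),
}
KNOWN_ANOMALIES = POSSIBLE_ANOMALIES["READ UNCOMMITTED"]


def weakest_level_preventing(anomalies):
    """Return the least restrictive level under which none of `anomalies` is possible."""
    unwanted = set(anomalies)
    unknown = unwanted - KNOWN_ANOMALIES
    if unknown:
        raise ValueError(f"unknown anomalies: {sorted(unknown)}")
    # Dicts keep insertion order, so levels are tried from weakest to strictest;
    # SERIALIZABLE allows no anomaly, so the loop always returns.
    for level, possible in POSSIBLE_ANOMALIES.items():
        if not (possible & unwanted):
            return level


print(weakest_level_preventing({"dirty read"}))         # READ COMMITTED
print(weakest_level_preventing({"unrepeatable read"}))  # REPEATABLE READ
print(weakest_level_preventing({"phantoms"}))           # SERIALIZABLE
```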
The isolation level should be set at the beginning of every transaction, as we explain in detail in our
Concurrency Paper [8]. The ISO SQL format of the command is the following:
SET TRANSACTION ISOLATION LEVEL <isolation level>
DB2, for example, uses a slightly different syntax and different isolation level names.
The standard does not directly state how the isolation levels map to concurrency control
implementations, but they clearly assume LSCC as the concurrency control mechanism, since the
isolation levels can be explained by the protection of read operations by multi-granular shared and
intent locking and by the duration of these locks, while write operations are protected by exclusive
locks up to the end of the transaction. This has been criticized by Berenson et al. [3], since these
isolation levels do not match all the concurrency control technologies implemented in the mainstream
DBMS products, ignoring especially the MVCC implementations.
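The locking explanation of the isolation levels can be illustrated with a toy lock table. The Python sketch below is an assumption-laden simplification, not the lock manager of any product: LockTable and its methods are hypothetical, locks are row-level only (no intent or range locks, so Serializable is treated like Repeatable Read and phantoms are not modelled), and a request that would have to wait simply returns False.

```python
LEVELS = ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE")


class LockTable:
    """Toy row-level lock table: shared (S) locks for reads, exclusive (X) for writes."""

    def __init__(self):
        self.shared = {}     # row id -> set of transactions holding an S lock
        self.exclusive = {}  # row id -> transaction holding the X lock

    def _x_held_by_other(self, txn, row):
        return self.exclusive.get(row, txn) != txn

    def read(self, txn, row, level):
        """Return True if the read proceeds, False if it would have to wait."""
        if level not in LEVELS:
            raise ValueError(f"unknown isolation level {level!r}")
        if level == "READ UNCOMMITTED":
            return True                  # no S lock requested: dirty reads possible
        if self._x_held_by_other(txn, row):
            return False                 # blocked by an uncommitted writer
        if level != "READ COMMITTED":
            # REPEATABLE READ and SERIALIZABLE keep the S lock to end of transaction.
            self.shared.setdefault(row, set()).add(txn)
        # READ COMMITTED releases its S lock right after the read: nothing is kept.
        return True

    def write(self, txn, row):
        """Return True if the X lock is granted, False if the write would have to wait."""
        other_readers = self.shared.get(row, set()) - {txn}
        if other_readers or self._x_held_by_other(txn, row):
            return False
        self.exclusive[row] = txn        # X locks are always held to end of transaction
        return True

    def end_transaction(self, txn):
        """Commit or rollback: release every lock held by txn."""
        for holders in self.shared.values():
            holders.discard(txn)
        self.exclusive = {r: t for r, t in self.exclusive.items() if t != txn}


locks = LockTable()
locks.read("A", 1, "READ COMMITTED")
print(locks.write("B", 1))    # True: A kept no lock on row 1
locks.end_transaction("B")
locks.read("A", 1, "REPEATABLE READ")
print(locks.write("B", 1))    # False: A still holds its S lock on row 1
```

The only difference between the two runs is the duration of the reader's shared lock, which is exactly what separates Read Committed from Repeatable Read in the locking interpretation.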
Dirty read means that a read operation is not protected by shared locks, so the transaction can
read uncommitted data written by other concurrent transactions. We can therefore conclude
that any DBMS which supports Read Uncommitted as an isolation level, for example SQL Server
and DB2 (DB2 calls this isolation level UR), does not support the strict ACID isolation of
Haerder and Reuter [1].
The default isolation level of the mainstream database systems is Read Committed or a compatible
isolation level. On LSCC systems this means that even if the data read by a transaction was
covered by shared locks, the locks can be released before the transaction commits, so it is
possible that some concurrent transaction changes the content which the transaction just read, and
if the transaction accesses the same data again it will see the new content produced by a transaction
which has been running concurrently. We illustrate this isolation problem using a simple table
initialized as presented in Listing 1 and then running the concurrency scenario presented in Listing 2.
CREATE TABLE T (
    id  INT NOT NULL PRIMARY KEY,
    val INT NOT NULL
) ;
BEGIN TRANSACTION ;
INSERT INTO T (id, val) VALUES (1, 100) ;
COMMIT;
Listing 1. Initializing commands for the isolation test on SQL Server 2008
Step  Transaction A                         Transaction B
1     BEGIN TRANSACTION;
      SET TRANSACTION ISOLATION LEVEL
        READ COMMITTED;
2     SELECT val FROM T WHERE id=1;
      VAL
      -----------
      100
3                                           BEGIN TRANSACTION;
                                            SET TRANSACTION ISOLATION LEVEL
                                              SERIALIZABLE;
4                                           UPDATE T SET val=val+1 WHERE id=1;
5                                           COMMIT;
6     UPDATE T SET val=val+1 WHERE id=1;
7     SELECT val FROM T WHERE id=1;
      VAL
      -----------
      102
8     COMMIT;
Listing 2. Concurrency scenario of the isolation test on SQL Server 2008
There is nothing that the well-behaving transaction B can do to prevent the concurrent transaction
A from seeing the result of its update. The test clearly shows that transactions A and B
do not serialize, but according to the relaxed definition of isolation transaction B is an ACiD transaction.
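That the scenario of Listing 2 does not serialize can be checked mechanically against the definition quoted earlier (same effect as running the transactions one after the other). The Python sketch below is a brute-force illustration, assuming a simplified model in which every operation takes effect immediately and a transaction's "effect" is the values it reads plus the final database state; the names run and has_equivalent_serial_order are hypothetical.

```python
from itertools import permutations


def run(schedule, initial):
    """Execute (txn, operation) steps in order; return (final state, values read per txn)."""
    state = dict(initial)
    reads = {}
    for txn, (kind, row, *args) in schedule:
        if kind == "read":
            reads.setdefault(txn, []).append(state[row])
        elif kind == "add":
            state[row] += args[0]
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return state, reads


def has_equivalent_serial_order(schedule, initial):
    """True if some one-after-another order gives the same reads and final state."""
    txns = list(dict.fromkeys(txn for txn, _ in schedule))
    observed = run(schedule, initial)
    for order in permutations(txns):
        serial = [step for name in order for step in schedule if step[0] == name]
        if run(serial, initial) == observed:
            return True
    return False


listing_2 = [
    ("A", ("read", 1)),      # step 2: A reads 100
    ("B", ("add", 1, 1)),    # steps 4-5: B updates and commits
    ("A", ("add", 1, 1)),    # step 6: A updates
    ("A", ("read", 1)),      # step 7: A reads 102
]
print(has_equivalent_serial_order(listing_2, {1: 100}))  # False
```

Run serially, A would read either 100 and then 101 (A before B) or 101 and then 102 (B before A); the observed pair 100 and 102 matches neither order, even though the final value 102 is the same in all three executions.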
According to the ISO SQL standard [7] and Gray and Reuter [2, p. 398], a transaction with an isolation
level lower than REPEATABLE READ is not allowed to write into the database. However, the
mainstream DBMS products have not implemented this restriction, so this decision is left to the
application developers, who should know what they are doing instead of just trusting textbooks.
On MVCC systems, such as Oracle, the isolation level also called Read Committed reads the
latest committed version of the requested data without waiting for any locks to be released. In both
cases (LSCC and MVCC) it is possible that some concurrent transaction updates this data and
commits, and if the current transaction reads the data again, it will violate the strict ACID isolation.
When the isolation test of Listings 1 and 2 is run on Oracle, the result is the same.
The ACID transaction is useful as the ideal of serializable transactions, but above we have shown
that none of these mainstream DBMS products supports ACID transactions according to the
strict isolation definition of Haerder and Reuter [1]. The only DBMS we know of so far that
supports strict ACID transactions is the Pyrrho DBMS, as it uses genuine OCC as its only
concurrency control technology (and accordingly Serializable as the only supported isolation level).
So application programmers need to understand that they will not get ACID SQL transactions by
default. However, by understanding the concurrency control technology of the DBMS and applying
the proper isolation levels to the SQL transactions accordingly, transactions can be made
serializable, depending on the mix of concurrent transactions. A proper isolation level can provide a
concurrency mechanism that prevents the transaction from seeing updates made by concurrent
transactions, and protects the transaction's updates against overwriting by others during the
transaction. However, the transaction cannot be held responsible for the possible misbehaviour of a
concurrent transaction, for instance one that performs a dirty read. This kind of relaxed isolation is a
property of the transaction, and we could call such a transaction an “ACiD transaction”.
ACID (or actually ACiD) isolation has often been considered too restrictive (blocking) for concurrent
transactions in terms of throughput of transactions in LSCC systems, and therefore some less
demanding isolation levels, such as Read Uncommitted and Read Committed (or Cursor Stability),
have been defined in both the SQL standard and implementations.
The point is, however, that isolation affects the performance of concurrent applications and of the
DBMS server, and a trade-off between availability (scalability and performance) and consistency is
crucial, since isolation basically affects consistency in terms of write and read anomalies. This
trade-off should be considered as reliability spanning both availability, at a level acceptable in terms
of application requirements, and consistency. Note that consistency, the “C” in ACID, could be
subjected to a similar discussion, and different transactions indeed have different consistency
requirements, but in terms of ACID, consistency refers to data integrity, and inconsistent data due
to concurrency anomalies is caused by insufficient isolation.
Since coping with both performance and consistency requirements is part of the art of transaction
programming, it is important for educational institutions to discuss ACID more in terms of ACiD,
thereby considering the differences between CC mechanisms and the aforementioned trade-off.
From a research point of view it will be important to provide models that describe this trade-off,
and, based on such models, tools that assist developers in making the “right” decision. One model
analysing the side-effects and correctness of transactions run under MVCC or Strict Two-Phase
Locking (S2PL) is provided by Fekete [9]. We would like to point out that S2PL is a biased concept in the
CC context of current commercial DBMS products, because some variant of LSCC is the typically
implemented CC mechanism, whereas S2PL is a theory and a principle to be applied by
developers when they select isolation levels strong enough to guarantee the serializability of the
transactions. A more practical scenario and examples of user transactions consisting of interrelated
SQL transactions using different isolation levels are provided in our RVV Paper [10].
To improve the teaching of the actual CC mechanisms in database systems as implemented by the
leading DB vendors, the DBTech EXT project has implemented the Concurrency VLW, with a tutorial
and hands-on concurrency exercises to be experimented with and verified using free downloadable
virtual computer images with Oracle XE and DB2 Express-C servers installed.
Conclusion
The classical serializability theory and the ACID transaction model have been covered in many
textbooks, research papers and prototypes, providing concept universes in their own right and
serving as ideas for the development of future DBMS systems. However, for the purposes of practical
advanced-level education of application developers we need realistic concepts for mastering the
reliable use of the modern DBMS products used by the industry today. This raises the requirement
to understand the effects of the actual concurrency control services provided for the available
isolation levels. Otherwise students will get confused when trying to apply the theory in practice or
when reading statements like the following: “The isolation level describes the degree to which the
data being updated is visible to other transactions” [11]. The suggested acronym ACiD simply serves
as a reminder of the distinction between the theory and the concurrency control services
implemented in commercial database products.
Acknowledgements
This paper is the result of collaborative work undertaken along the lines of the DBTechNet
Consortium (www.DBTechNet.org). The authors participate in DBTech EXT, a project partially
funded by the European Commission LLP Transversal Programme (Project Number: 143371-LLP-1-2008-1-FI-KA3-KA3MP). This paper reflects the views only of the authors, and the Commission
cannot be held responsible for any use which may be made of the information contained therein.
References
[1] T. Haerder, A. Reuter, "Principles of Transaction-Oriented Database Recovery", ACM
Computing Surveys, Vol. 15, No. 4, December 1983
[2] J. Gray, A. Reuter, "Transaction Processing: Concepts and Techniques", Morgan Kaufmann,
1993
[3] H. Berenson et al., "A Critique of ANSI SQL Isolation Levels", Microsoft Research, Technical
Report MSR-TR-95-51, June 1995
[4] P. A. Bernstein, E. Newcomer, "Principles of Transaction Processing For the Systems
Professional", Morgan Kaufmann, 1997
[5] C. J. Date, "An Introduction to Database Systems", 8th ed., Addison-Wesley, 2004
[6] C. J. Date, "SQL and Relational Theory: How to write accurate SQL code", O'Reilly, 2009
[7] J. Melton, A. R. Simon, "SQL:1999 Understanding Relational Language Components", Morgan
Kaufmann, 2002
[8] M. Laiho, F. Laux, "SQL Concurrency Technologies", DBTechNet tutorial at
http://www.dbtechnet.org/papers/SQL_ConcurrencyTechnologies.pdf (2010-07-12)
[9] A. Fekete, "Allocating isolation levels to transactions", PODS '05: Proceedings of the
twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, ACM,
2005, 206-215
[10] M. Laiho, F. Laux, "On Row Version Verifying ..", DBTechNet tutorial at
http://www.dbtechnet.org/papers/RVV_Paper.pdf (2010-07-12)
[11] S. Bodoff et al., "The J2EE(TM) Tutorial", 1.3 edition, Addison-Wesley, 2002
[12] M. Crowe, "The Pyrrho Database Management System", University of the West of Scotland,
http://www.pyrrhodb.com/