Multiversion ROCC for Concurrency Control¹
Victor Shi, Shan Duanmu and William Perrizo
Computer Science Department
North Dakota State University
Fargo, ND 58105
Abstract: This paper discusses a multi-version method for concurrency control. Multi-version
Read-commit Order Concurrency Control (MVROCC) uses three timestamps (Start, Commit and
Combined) for transaction validation. The Start timestamp records the time when the transaction
reads its first data object. The Commit timestamp records the time when the transaction requests
to commit. The Combined timestamp is set either to the Start timestamp or to the Commit
timestamp, depending on the result of a conflict check using the proposed “intervening interval
validation” algorithm. Unlike Snapshot Isolation, MVROCC is fully serializable; it is also more
permissive, since it allows one-sided write-write conflicts. Our preliminary simulation results
showed that the throughput of MVROCC is 5% higher than that of Snapshot Isolation. More
importantly, MVROCC guarantees execution correctness while achieving good performance.
Because MVROCC is easy to implement, it may attract attention from the database industry: in a
system that has already implemented Snapshot Isolation mechanisms, the major change is to
replace the first-committer-wins policy with the proposed intervening validation algorithm.
Key Words: Concurrency control, High performance, Database systems.
1 Introduction
High system throughput is a desirable feature in database systems. In traditional database
management systems, two-phase locking (2PL) is used to support concurrent access while
maintaining correctness. With increasing demand for higher system throughput, researchers have
explored numerous mechanisms for better performance with regard to system throughput and
transaction response time. The mechanisms that have demonstrated better performance can be
classified into three categories: optimistic, multi-version and deadlock avoidance.
¹ Patents on ROCC and MVROCC technology are pending.
In general, concurrency control methods that use optimistic mechanisms allow data access
with no restrictions and validate the data when the transaction commits. Probably the most
successful commercially used optimistic method is the field call and its Escrow variant [1, 2].
The field-call method tests predicates at the time the transaction requests access to data items.
A REDO log record is generated if the predicate test succeeds. All updates are deferred to
transaction commit time. When the transaction commits, all its predicates are tested again to
validate that all required conditions still hold. The drawback is that the commit may return an
error if the update predicate has become false since the field call was issued. The Escrow method
overcomes this drawback by introducing an Escrow record, which preserves the truth of the
predicate between the time the transaction first makes the field call and the time the predicate
is reevaluated at commit time. The truth preservation works well when the field is an
aggregate.
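As a rough illustration of the Escrow idea, the sketch below models an aggregate field guarded by the predicate value >= 0. The class name EscrowField and its methods are hypothetical and are not taken from [1, 2]; the sketch only shows how setting requested amounts aside keeps the predicate true until commit.

```python
class EscrowField:
    """Hypothetical aggregate field guarded by the predicate value >= 0."""

    def __init__(self, value):
        self.value = value      # committed value of the field
        self.escrowed = {}      # txn id -> total amount set aside for that txn

    def request_decrement(self, txn, amount):
        """Field call: test the predicate against the worst case and, if it
        holds, set the amount aside so the predicate stays true until commit."""
        pending = sum(self.escrowed.values())
        if self.value - pending - amount < 0:
            return False        # predicate would fail, so refuse the request now
        self.escrowed[txn] = self.escrowed.get(txn, 0) + amount
        return True

    def commit(self, txn):
        """Apply the deferred update; the escrowed amount guarantees it is safe."""
        self.value -= self.escrowed.pop(txn, 0)

    def abort(self, txn):
        """Release the escrowed amount without touching the committed value."""
        self.escrowed.pop(txn, None)


stock = EscrowField(100)
ok1 = stock.request_decrement("T1", 70)   # accepted: 100 - 0 - 70 >= 0
ok2 = stock.request_decrement("T2", 50)   # refused: 100 - 70 - 50 < 0
stock.commit("T1")
```

In this sketch T2 learns at request time, not at commit time, that its update cannot be honoured, which is the drawback of the plain field call that the Escrow record is meant to remove.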
With the fast evolution of technology, storage is no longer a problem and multi-version
concurrency control methods have become attractive. A successful example of multi-version
concurrency control is the Snapshot Isolation method. It was introduced in 1995 and then adopted
by Oracle, PostgreSQL, Microsoft SQL Server and Exchange as a method for “serializable”
isolation level transactions. Though Snapshot Isolation suffers from the “write skew” problem
and thus is not truly serializable [5], it has an obvious advantage: read-only transactions
always succeed.
A representative method of deadlock-avoidance concurrency control (CC) is the WDL (Wait-Depth
Limit) method proposed by IBM researchers [3]. It limits the length of the waiting list so that
no deadlock can form. The same researchers then proposed a hybrid two-phase CC method [3, 4],
which uses an optimistic CC method in the first execution phase and conservative 2PL in the
restart phase [4].
In [6] and [9], we proposed methods for improving system performance. Our simulation
study showed that ROCC (Read-commit Order Concurrency Control [6]) significantly
outperforms WDL and 2PL, and has performance comparable to that of Snapshot Isolation. Since
Snapshot Isolation does not provide full serializability guarantees, ROCC seems very promising.
In this paper we explore a multi-version ROCC method (MVROCC). Our intentions are
twofold: to keep the advantage of Snapshot Isolation and to fix its “write skew” problem at the
system level. We expect that MVROCC may have better performance while maintaining
correctness. In the next section, we summarize related work. Section 3 presents the proposed
multi-version ROCC. In Section 4 we analyze and discuss the possible performance gain.
Section 5 concludes the paper and discusses future work.
2 Related work
2.1 ROCC
The Read-commit Order Concurrency Control method (ROCC [6]) is a deadlock-free, serializable
concurrency control method based on optimistic mechanisms. It maintains a Read-Commit
queue (RC-queue) that records the access order of transactions. Along with the RC-queue, an
“intervening” validation algorithm is developed for execution validation. In addition to the
traditional operation conflict, element conflict is introduced to reduce transaction restarts.
Through the intervening element conflict check, transaction restarts and validation complexity
are reduced significantly.
The key contribution of the ROCC method is the intervening validation algorithm. This
validation algorithm checks only for conflicts between the validating transaction’s elements and
the elements of other transactions that lie in its “intervening interval” (between its read
elements and its commit element). An element in the RC queue contains the identifier of the
transaction, the data items to be accessed in one request, and the read/write operations in the
request. Element conflict is a conflict between a transaction’s operations represented by one
element and the operations represented by another element that belongs to a different
transaction. Four types of elements are defined: Read element, Commit element, Validated element
and Restart element. A Read element represents the read/write request message a transaction
submits. A Commit element represents a commit request message. A Validated element corresponds
to a transaction that has been validated, or a transaction that does not need validation. A
Restart element contains all the identifiers of data items and the operations that a failed
transaction intends to perform.
The scheduler’s intervening validation algorithm works as follows. When a commit request
message arrives, the scheduler generates a Commit element and posts it to the RC queue, which
starts the validation process for that transaction. The validation process traverses the queue
from the Commit element to the first Read element (called “first” below) of the validating
transaction. If “first” conflicts with no elements in the intervening interval between “first”
and the next element of the validating transaction, the validation process combines “first” with
that next element and renames the combination as “first”. This is iterated until the Commit
element is reached. If “first” conflicts with no intervening elements in any step, the validation
passes. On success, the scheduler sends a commit request message to the execution queue so that
the data manager performs the write operations; otherwise it sends out a restart request message.
If a conflict is found, let “second” be the Commit element. Check if “second” conflicts with
any elements in the intervening interval between “first” and “second” as above. If both “first”
and “second” have conflicts in their intervening intervals, then the validation fails. Otherwise
the validation passes. Note that this check for a second conflict allows one write-write conflict
to exist in the interval.
If the data manager performs operations following their order in the RC queue when conflicts
occur, and the transaction scheduler validates execution using the above “intervening”
validation, then the execution of transactions is serializable.
2.2 Snapshot isolation
Snapshot isolation is an optimistic multi-version concurrency control method in which the
scheduler uses a first-committer-wins policy to avoid lost-update problems. A transaction always
reads data from a snapshot of the (committed) data as of the time the transaction starts, called
its Start-Timestamp. Reads from a transaction are never blocked, provided the snapshot data can
be maintained (an assumed Data Manager function). The transaction’s writes (updates, inserts and
deletes) are also recorded in this snapshot, so that they can be read again if the transaction
accesses the data a second time. Updates of other transactions beginning after the transaction’s
Start-Timestamp are invisible to the transaction.
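The snapshot read rule can be sketched as follows. This is a minimal illustration, not the data manager of any particular system: the class VersionStore, its method names and the integer timestamps are assumptions, and a version is taken to be visible when its commit timestamp is at or before the reader’s Start-Timestamp.

```python
class VersionStore:
    """Hypothetical multi-version store: item -> list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}

    def install(self, item, commit_ts, value):
        """Add a committed version and keep the versions ordered by timestamp."""
        self.versions.setdefault(item, []).append((commit_ts, value))
        self.versions[item].sort(key=lambda version: version[0])

    def snapshot_read(self, item, start_ts, own_writes=None):
        """Return the transaction's own write if it has one, otherwise the
        latest version committed at or before start_ts."""
        if own_writes and item in own_writes:
            return own_writes[item]
        visible = [val for ts, val in self.versions.get(item, []) if ts <= start_ts]
        if not visible:
            raise KeyError(item)
        return visible[-1]


store = VersionStore()
store.install("x", commit_ts=1, value=50)
store.install("x", commit_ts=5, value=-40)

old_view = store.snapshot_read("x", start_ts=3)                       # sees the version from ts 1
new_view = store.snapshot_read("x", start_ts=6)                       # sees the version from ts 5
own_view = store.snapshot_read("x", start_ts=3, own_writes={"x": 10}) # sees its own write
```

A reader that started at timestamp 3 keeps seeing the value committed at timestamp 1 even after a newer version is installed, which is why reads never block.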
Suppose we have a transaction, T1. When T1 is ready to commit, it gets a Commit-Timestamp,
which is larger than any existing Start-Timestamp or Commit-Timestamp. The transaction commits
only if no other transaction, T2, with a Commit-Timestamp in T1’s interval
[Start-Timestamp, Commit-Timestamp] wrote data that T1 also wrote. Otherwise, T1 will abort.
This is the so-called first-committer-wins policy. As pointed out in [5][7], the
first-committer-wins policy allows the following history H1:
H1: r1[x=50] r1[y=50] r2[x=50] r2[y=50] w1[y=-40] w2[x=-40] c1 c2
Thus systems that use Snapshot isolation (Oracle, PostgreSQL, Microsoft SQL Server and
Exchange) have the write-skew problem. The system errors caused by write-skew problems could be
“thousands per day” and “quite damaging” [7].
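To make the problem concrete, the sketch below applies a first-committer-wins check to this history. The Txn class, the function name and the integer timestamps are illustrative assumptions; only the rule itself (abort when a transaction that committed inside the interval wrote a common item) comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    start_ts: int
    commit_ts: int
    write_set: set = field(default_factory=set)


def first_committer_wins(txn, committed):
    """Return True if txn may commit: no already-committed transaction whose
    Commit-Timestamp lies in txn's interval wrote an item txn also wrote."""
    for other in committed:
        in_interval = txn.start_ts <= other.commit_ts <= txn.commit_ts
        if in_interval and (txn.write_set & other.write_set):
            return False
    return True


# T1 writes y and T2 writes x; the logical timestamps are assumed.
t1 = Txn("T1", start_ts=1, commit_ts=3, write_set={"y"})
t2 = Txn("T2", start_ts=2, commit_ts=4, write_set={"x"})

committed = []
results = {}
for t in (t1, t2):                      # commit order c1, c2
    results[t.name] = first_committer_wins(t, committed)
    if results[t.name]:
        committed.append(t)

# Both transactions read the snapshot x = 50, y = 50 and withdraw 90.
x, y = 50 - 90, 50 - 90
print(results, x + y)
```

Because the two write sets are disjoint, the check lets both transactions commit, and the final state x + y = -80 breaks a constraint such as x + y > 0.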
To fix the write skew problem, Fekete et al. [7] proposed promoting read operations to
write operations at the application level, so that the first-committer-wins policy can be used
to force one of the transactions in the cycle to abort. Such approaches require all possible
operations to be known in advance. In addition, the promotion of reads may significantly
degrade system performance.
3 Multi-version ROCC
Multi-version ROCC can be described as follows:
a. Read committed data (the latest committed version) to avoid cascading aborts.
b. Set a timestamp for each read-element and each write-element.
c. When the transaction commits, check whether there are two or more intervening
element-conflicts based on the read/write timestamps. If not, the transaction commits.
Otherwise the transaction aborts.
d. Unlike ROCC, deferred writes are not necessary, because multi-version storage allows
immediate writes without causing the cascading abort problem.
An interesting question is what the timestamps for read operations and write operations are,
and which committed data version should be read when a read request arrives.
A simple option is to follow the Snapshot idea, i.e., all reads should read the latest data
version committed before the Start-timestamp. This Start-timestamp could be any time before
the transaction’s first read. Thus the timestamp for all reads in a transaction is the
Start-timestamp. Also, the timestamp for all writes can be set to the Commit-timestamp to avoid
the cascading abort problem. Thus, unlike ROCC, in which there may exist many intervals for
conflict checking, the interval [Start-timestamp, Commit-timestamp] is the only interval
left for validation. Intervening validation checking is thus simplified significantly.
The difference between MVROCC and Snapshot Isolation is the scheduler algorithm: we use
our intervening validation algorithm to replace the “first-committer-wins” policy to achieve
serializability and better performance. Before we formalize the description of the intervening
interval validation algorithm, we use two examples to explain how our algorithm avoids “write
skew” and why it will produce better performance than Snapshot Isolation.
Example 1: avoiding “write skew”
We rewrite the history H1 with timestamps to help explain how our intervening validation
algorithm avoids “write skew”. In this history, x and y are two accounts shared by a couple, with
initial values x=50 and y=50. The integrity constraint is x + y > 0, i.e., the accounts do not
allow an overdraft. Transaction T2 withdraws 90 from x while transaction T1 concurrently
withdraws 90 from y. While Snapshot Isolation allows H1, the execution result
x + y = (-40) + (-40) = -80 < 0 clearly violates the constraint (x + y > 0).
H1: r1[x=50] r1[y=50] r2[x=50] r2[y=50] w1[y=-40] w2[x=-40] c1 c2
(Timestamps marked on the history in the figure: T1’s Start-timestamp, T2’s Start-timestamp,
T1’s Commit-timestamp)
Figure 1 Validation when T1 commits
MVROCC checks the element conflicts in T1’s interval [Start-timestamp, Commit-timestamp]
when T1 commits. The Read element with T1’s Start-timestamp is r1[x=50] r1[y=50] and the commit
element with T1’s Commit-timestamp is w1[y=-40] c1. The intervening element between the Read
element and the commit element is r2[x=50] r2[y=50] (note that w2[x=-40] is not included, since
its timestamp is T2’s Commit-timestamp, which has not yet been assigned when T1 commits).
Thus we find only one element conflict (T1’s commit element with the intervening element on y).
T1 can commit, and T1’s read element and commit element merge into a combined element whose
timestamp equals T1’s Commit-timestamp, as shown in Figure 2. This is because T1’s read element
has no conflict with its intervening element and thus can freely move down to merge with its
commit element.
r2[x=50] r2[y=50] r1[x=50] r1[y=50] w1[y=-40] c1 w2[x=-40] c2
(Timestamps marked on the history in the figure: T2’s Start-timestamp, T1’s Commit-timestamp,
T2’s Commit-timestamp)
Figure 2 Validation when T2 commits
When T2 commits, its read element is r2[x=50] r2[y=50] with its Start-timestamp. T2’s
commit element is w2[x=-40] c2. The intervening element within T2’s interval [Start-timestamp,
Commit-timestamp] is r1[x=50] r1[y=50] w1[y=-40] c1. We find a read-write conflict on y
between T2’s read element and the intervening element, and a read-write conflict on x between
the intervening element and T2’s commit element. Thus T2 has two-sided element conflicts and
has to be aborted according to the criteria set by the intervening validation algorithm.
“Write skew” is therefore avoided.
In the above example we introduced some new concepts – read element, commit element,
intervening element and combined element. We give their formal definitions as follows.
Definition #1 (Read element): A transaction T’s read element is the set of all T’s reads, with a
timestamp equal to the time when T reads its first data item. A transaction can have only one
read element due to the snapshot read principle imposed on the MVROCC mechanism (the snapshot
read principle requires that all reads be on data versions committed before the
Start-timestamp).
Definition #2 (Commit element): A transaction T’s commit element is the set of all T’s writes,
with a timestamp equal to the time when T requests to commit. A transaction can have only one
commit element due to the snapshot write principle imposed on the MVROCC mechanism (the snapshot
write principle requires all writes to be invisible until the transaction commits).
Definition #3 (Intervening element): A transaction T’s intervening element is the set of all
reads/writes of other transactions with timestamps within T’s timestamp interval
[Start-timestamp, Commit-timestamp]. A transaction can have only one intervening element because
it has only one interval.
Definition #4 (Combined element): A transaction T’s combined element is the set of all T’s
reads/writes, with a timestamp determined by the intervening validation algorithm. A transaction
has exactly one combined element after it commits. Otherwise it has to abort.
With the four element definitions, we give the upper-sided and lower-sided element conflict
definitions.
Definition #5 (Upper-sided element conflict): A transaction T has an upper-sided element
conflict if operations in its read element conflict with operations in its intervening element.
Definition #6 (Lower-sided element conflict): A transaction T has a lower-sided element
conflict if operations in its commit element conflict with operations in its intervening element.
With definitions #1 to #6, we give the following intervening validation algorithm:
When transaction T requests to commit
    Check if T has an upper-sided element conflict;
    If T has an upper-sided element conflict
    {   Check if T has a lower-sided element conflict;
        If T has a lower-sided element conflict
            T aborts;
        Else
        {   T commits;
            set T’s combined element’s timestamp to its Start-timestamp;
        }
    }
    Else
    {   T commits;
        set T’s combined element’s timestamp to its Commit-timestamp;
    }
Figure 3 The intervening validation algorithm in MVROCC
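The algorithm in Figure 3 can be sketched in Python and replayed on Example 1. This is an illustrative sketch under stated assumptions, not the authors’ implementation: the Scheduler and Txn classes, the flat log of timestamped elements, the integer timestamps (assumed unique) and the inclusive interval test are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    start_ts: int
    reads: frozenset
    writes: frozenset = frozenset()


def ops_conflict(ops_a, ops_b):
    """Two operation sets conflict if they touch a common item and at least
    one of the two operations on that item is a write."""
    return any(
        item_a == item_b and "w" in (kind_a, kind_b)
        for kind_a, item_a in ops_a
        for kind_b, item_b in ops_b
    )


class Scheduler:
    def __init__(self):
        self.log = []   # timestamped elements: (timestamp, owner name, ops)

    def begin(self, txn):
        """Post the read element at the transaction's Start-timestamp."""
        self.log.append((txn.start_ts, txn.name, {("r", i) for i in txn.reads}))

    def commit(self, txn, commit_ts):
        """Intervening validation; returns the combined element's timestamp
        on success, or None if the transaction has to abort."""
        read_el = {("r", i) for i in txn.reads}
        commit_el = {("w", i) for i in txn.writes}
        intervening = set()
        for ts, owner, ops in self.log:
            if owner != txn.name and txn.start_ts <= ts <= commit_ts:
                intervening |= ops
        upper = ops_conflict(read_el, intervening)
        lower = ops_conflict(commit_el, intervening)
        # In either outcome the separate read element leaves the log.
        self.log = [e for e in self.log if e[1] != txn.name]
        if upper and lower:
            return None                       # two-sided conflict: abort
        combined_ts = txn.start_ts if upper else commit_ts
        self.log.append((combined_ts, txn.name, read_el | commit_el))
        return combined_ts


s = Scheduler()
t1 = Txn("T1", 1, frozenset({"x", "y"}), frozenset({"y"}))
t2 = Txn("T2", 2, frozenset({"x", "y"}), frozenset({"x"}))
s.begin(t1)
s.begin(t2)
r1 = s.commit(t1, commit_ts=3)   # only a lower-sided conflict: commits at ts 3
r2 = s.commit(t2, commit_ts=4)   # two-sided conflict: aborts
```

The sketch evaluates both conflict tests up front for brevity, whereas Figure 3 checks the lower side only when an upper-sided conflict exists; the decisions are the same.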
From the above algorithm, we can see that MVROCC aborts only transactions that have both an
upper-sided element conflict and a lower-sided element conflict (two-sided conflicts). We can
prove that MVROCC produces a serializable execution history.
Proof: Suppose MVROCC produces a history that contains a cycle Ti … Tj … Ti, and
MVROCC allows Ti to commit. Then Ti must have two timestamps after it commits: one
before Tj’s timestamp and the other after Tj’s timestamp. This contradicts the fact that Ti can
have only one timestamp after it commits (note that we move an element to merge it only when
there is no element conflict, so such a movement does not affect the conflict history).
4 Discussions
The following goals are what we tried to achieve when developing MVROCC:
1. Keep the advantage of Snapshot Isolation. In particular, we want to guarantee that read-only
transactions (queries) succeed.
2. Be more permissive than Snapshot Isolation, and thus provide better performance.
3. Guarantee correctness.
4. Keep the modification to Snapshot Isolation to a minimum, so that it can be easily
implemented in commercial products that have already implemented the Snapshot
Isolation mechanism.
5. Easy recovery when the system crashes.
Goals #1 and #4 are achieved by reusing as much of Snapshot Isolation as possible. MVROCC
simply replaces the first-committer-wins policy with its own intervening validation algorithm
described in Figure 3. This intervening validation algorithm makes the execution serializable,
as proved in Section 3. It is also more permissive than the first-committer-wins policy, since
it aborts only transactions that have two-sided conflicts. As Adya et al. pointed out in [8],
write-write conflicts do not always cause non-serializable executions in multi-version database
systems. For example, consider the history H2 in which transaction T2 increments the salaries of
all employees for which “Dept = Sales”, and transaction T1 adds two employees, x and y, to the
Sales department.
H2: w1[x1] r2[Dept=Sales: z] w1[y1] w2[z] c2 c1
Transaction T1 will be aborted by Snapshot Isolation because it has a predicate write-write
conflict on y: y is in z since it satisfies the predicate “Dept = Sales”, and w2[z] is in T1’s
interval [Start-timestamp, Commit-timestamp]. MVROCC will permit the history because T1 has
only a lower-sided predicate conflict on x and y (x and y are both in z, thus w2[z] conflicts
with w1[x] w1[y]). The equivalent serial history is T2 T1 (r2[Dept=Sales: z] w2[z] c2 w1[x]
w1[y] c1, where w1[x] and w1[y] take effect at c1). In general, MVROCC allows a write-write
conflict within the interval as long as it is one-sided (a transaction commits when it has only
a lower-sided element conflict).
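The two decisions for T1 in H2 can be compared with a few set operations. The sketch is only illustrative: the item e0 stands for an assumed pre-existing Sales employee, x and y are treated as members of z because they satisfy the predicate, and T2’s operations are assumed to fall inside T1’s interval as the text states.

```python
# T1 in H2: no reads, writes x and y. T2: predicate read and write over z.
t1_reads, t1_writes = set(), {"x", "y"}
z = {"e0", "x", "y"}
t2_reads, t2_writes = set(z), set(z)

# First-committer-wins: T2 committed inside T1's interval, so T1 aborts if
# the two write sets overlap.
si_commits_t1 = not (t1_writes & t2_writes)

# MVROCC: T2's operations form T1's intervening element.
upper = bool(t1_reads & t2_writes)                 # T1 reads vs. intervening writes
lower = bool(t1_writes & (t2_reads | t2_writes))   # T1 writes vs. any intervening op
mvrocc_commits_t1 = not (upper and lower)

print(si_commits_t1, mvrocc_commits_t1)
```

Since T1 has an empty read element it cannot have an upper-sided conflict, so only the lower side is hit and the validation lets it commit.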
Goal #5 (easy recovery) can be achieved by removing all the versions written by
transactions that have not yet committed.
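A minimal sketch of this recovery step is shown below. The layout of the version store (item mapped to a list of writer and value pairs) and the function name recover are assumptions made for illustration; how the set of committed transactions is obtained after a crash is outside the sketch.

```python
def recover(versions, committed):
    """Drop every version whose writer is not in the committed set.

    versions: item -> list of (writer transaction, value), oldest first.
    committed: set of names of transactions known to have committed.
    """
    return {
        item: [(writer, value) for writer, value in vs if writer in committed]
        for item, vs in versions.items()
    }


before_crash = {
    "x": [("T0", 50), ("T2", -40)],   # T2 had not committed at the crash
    "y": [("T0", 50), ("T1", -40)],
}
after_recovery = recover(before_crash, committed={"T0", "T1"})
```

No undo of in-place updates is needed in this sketch, because an uncommitted write only ever exists as a separate version.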
5 Conclusions and future work
In this paper we discussed MVROCC, a multi-version concurrency control method for
database systems. Compared to ROCC, the implementation of MVROCC is surprisingly simple.
No special technique is needed to store the RC queue for recovery purposes. No over-declaration
is required as in [9]. Compared to Snapshot Isolation as used in Oracle 9i, PostgreSQL,
Microsoft SQL Server and Exchange, MVROCC may have fewer restarts and still guarantees
execution correctness (our preliminary simulation results showed that the throughput of MVROCC
was 5% higher than that of Snapshot Isolation). It is also surprising to find that, to implement
MVROCC, we only need to replace the first-committer-wins policy with intervening validation, a
simple algorithm described in Figure 3. Our future work is to construct a prototype for further
evaluation of MVROCC against Snapshot Isolation.
References
[1] J. Gray and A. Reuter, “Transaction Processing: Concepts and Techniques”, Morgan
Kaufmann, 1993.
[2] P. O’Neil, “The Escrow transaction method”, ACM Transactions on Database Systems,
Vol. 11, No. 4, 1986, pp. 405-429.
[3] P. Franaszek, J. Robinson and A. Thomasian, “Concurrency control for high performance
environments”, ACM Transactions on Database Systems, Vol. 17, No. 2, 1992, pp. 304-345.
[4] A. Thomasian, “Distributed optimistic concurrency control methods for high-performance
transaction processing”, IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 1,
1998, pp. 173-189.
[5] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil and P. O’Neil, “A critique of ANSI
SQL isolation levels”, ACM SIGMOD, 1995, pp. 1-10.
[6] V. T. S. Shi and W. Perrizo, “A New Method for Concurrency Control in
Centralized Database Systems”, ISCA CATA-2002.
[7] A. Fekete, E. O’Neil, P. O’Neil and D. Shasha, “Making Snapshot Isolation Serializable”,
in print.
[8] A. Adya, B. Liskov and P. O’Neil, “Generalized Isolation Level Definitions”, Proceedings
of the IEEE International Conference on Data Engineering, 2000.
[9] W. Perrizo, “Request Order Linked List (ROLL): A Concurrency Control Object”,
Proceedings of the IEEE International Conference on Data Engineering, 1991.