Transaction Operations

advertisement
TRANSACTIONS and CONCURRENCY
Database Technology Lecture
by
Ross Lee Graham
Senior Lecturer, IISLAB
rosgr@ida.liu.se
CHAPTER 19: Transaction Processing Concepts
1.
2.
3.
4.
5.
6.
Transactional Databases (Queries and Transactions)
Desirable Properties of Transactions (ACID)
Update Problems not using Transactions
Serializability, Schedules, Recoverability,
System Log
Transaction Support in SQL
CHAPTER 20: Concurrency Control Techniques
7.
8.
9.
10.
Concurrency Control
Lock Manager, Binary Locking Scheme
Two-Phase Locking
Deadlock and Starvation
QUERIES and TRANSACTIONS
QUERIES (Read- Only)
Write)
TRANSACTIONS (Read-
Read Data
[Manipulate Data]
View Results
Do not write to DB
Read Data
Manipulate Data
[View Results]
Write to DB (commit)


Materialized Views are not transactions.
Transactions change the content of the database (write to DB).
Note that your main text is not strict about this distinction.
Desirable Properties of Transactions
ACID
1.
2.
3.
4.
Atomicity
A transaction is either performed in its entirety or not performed at
all.
Consistency Preservation
The complete execution of a transaction takes the database from
one consistent state to another.
Isolation
The execution of a transaction is not interfered with by concurrent
transactions.
Durability (Permanency)
Changes applied to the database by a committed transaction must
persist in the database.
Update Problems
When not using Transactions
The lost update problem
Two interleaved transactions T1 and T2 that access the same database
items.
T2 reads item X before T1 updates it, and writes X back after T1’s
update.
T1’s update is lost.
The temporary update (or dirty read) problem
T2 reads item X, which has been written by T1. T1 fails and must change
X’s value back. T2 has done a “dirty read” of X.
The incorrect summary problem
T computes a sum of item values, of which some are updated by other
transactions before T reads them and has finished.
Basic Transaction Concepts
Transactions have two basic operations:
read_item(X): reads a database item named X into a program
variable.
write_item(X): writes the value of a program variable X to the
database item named X.
NOTE: Transactions operate on database items. The Granularity of
database items refers to whether they are fields, records, or whole disk blocks
(concepts are independent of granularity).
Read and Write Operations
Read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a main memory buffer (if it’s not already there).
3. Copy item X from the buffer to the program variable named X.
Write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a main memory buffer (if it’s not already there).
3. Copy item X from the program variable named X into its location in the
buffer.
4. Store the updated block from the buffer back to disk.
Additional Transaction Operations
Transaction Operations
BEGIN_TRANSACTION
READ and WRITE
END_TRANSACTION
COMMIT_TRANSACTION
ROLLBACK (or ABORT)
Recovery techniques use the following operations
UNDO – similar to ROLLBACK but on a single operation.
REDO – a single operation is redone.
The System Log



Keeps track of all transaction operations that affect the values of
database items.
Enables recovery after (transaction) failures.
Kept on disk, and periodically backed up.
Types of Log Records
T is a unique transaction-id generated by the system
1.
[begin_transaction, T]
2.
[write_item, T, X, old_value, new_value]
3.
[read_item, T, X]
4.
[commit, T]
5.
[abort, T]
Recovery using log records
 Undo WRITE operations by tracing backwards through the log,
resetting old_value.
 Redo WRITE operation by tracing forward, setting X to
new_value.
Commit Point of a Transaction
Definition:
 A transaction T reaches its commit point when all its operations that
access the database have been executed successfully and recorded
to the log.
 Beyond the commit point, the transaction is said to be committed
and its effects assumed to be permanently recorded in the database.
The transaction then writes an entry [commit, T] into the log.
Force writing a log: Before T reaches its commit point, all log records
must be written to disk.
When recovering after a (system) failure:
 Rollback of transactions: Needed for transactions that have a
[begin_transaction, T] in the Log, but no [commit, T].
 Redoing transactions: Needed for transactions that have a
[commit, T] in the Log after the last database checkpoint
(writing all dirty database to disk).
SERIALIZATION
In a serial schedule, each transaction is performed in its entirety in
serial order. There is no interleafing.
DEFINITION: A schedule S is said to be serial if, for every transaction T
participating in the schedule, all the operations of T are executed
consecutively in the schedule; otherwise, the schedule is called
nonserial.
SERIAL SCHEDULES
From the definition of serial schedules it follows that:

The commit (or abort) of the active transaction initiates execution of
the next transaction.

Only one transaction at a time is active. No interleafing occurs in a
serial schedule.
Every transaction is assumed to be correct if executed on its own.
Therefore, if the transactions are independent of each other, then every
serial schedule is regarded as correct.
Therefore, it does not matter which transaction is executed first—Order
is not significant.
Why we like Interleafing

In a serial schedule, if a transaction waits for an I/O operation to
complete, idle CPU time is generated and wasted for lack of use.

Other transactions may also be in line waiting for the completion of a
transaction.
For these reasons, serial schedules are generally considered
unacceptable in practice.
Interleaving could improve the use of the CPU cycles.
Some nonserial schedules produce erroneous results and some
produce correct results.
This introduces the use of the concept of ‘serializability of a schedule’.
A schedule S of n transactions is serializable if it is equivalent to some
serial schedule of the same n transactions.
SCHEDULES OF TRANSACTIONS
A schedule S of n transactions T1, T2, …, Tn is an ordering of the operations of
the transactions.
For each transaction Ti in S, the order of operations of Ti sustain this same order in S
even though its operations are interleaved with operations from other transactions, Tj.
SCHEDULE NOTATION
The schedule keeps track of read_item, write_item, commit, abort.
Shorthand notation—r, w, c, a.
The shorthand notation appears with the transaction number as subscript.
Each operation on a value is identified by r or w. The commit or abort
operations apply to the transaction itself.
In this notation we write Figure 19.3(a) as follows:
Sa : r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
CONFLICT
CONFLICT between two operations in a schedule must satisfy all three of the
following:
1. They belong to different transactions.
2. They access the same item.
3. At least one of the operations is a write.
CONFLICT EXAMPLE
Sa : r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
The following operations conflict
r1(X) and w2(X)
r2(X) and w1(X)
w1(X) and w2(X)
Can you find operation pairs that do not conflict?
COMPLETE SCHEDULE
A schedule S of n transactions T = {T1, T2, …, Tn} is said to be a complete
schedule if the following conditions hold:
1. The operations in S are exactly those operations in any Ti in T, and Ti is
closed with the commit or abort operation.
2. Any pair of operations from the same transaction Ti in S follow the same
order in S as in Ti.
3. The order of any conflicting pair of operations in S is determinable.
PARTIAL ORDER and TOTAL ORDER
It is not necessary to determine an order between non-conflicting operations.
This allows for partial ordering of the operations of n transactions.
To implement a partial ordering would require parallel processing.
Without parallel processing we can say that every schedule is totally ordered.
COMMITTED PROJECTION
It is not easy to encounter complete schedules in a transaction processing system
because transactions are continually being submitted to the system.
What we use instead is what we call a committed projection C(S) of a schedule S,
which includes only those transactions Ti in S that are finalized by their
corresponding commit operation ci.
SCHEDULES BASED ON RECOVERABILITY
First distinguish between schedules that are recoverable and schedules which are not
recoverable.
If we can ensure that a transaction T, when committed, never has to roll back, then we
have a demarcation between recoverable and non-recoverable schedules.
Schedules determined as non-recoverable should not be permitted.
Among the recoverable schedules, transaction failures generate a spectrum of
recoverability, from easy to complex.
RECOVERABLE SCHEDULE
A schedule S is recoverable if no transaction T in S commits until all transactions
T’, that have written an item that T reads, have committed.
A transaction T reads from transaction T’ in a schedule S if some item X is first
written by T’ and read later by T.
In addition, T’ should not be aborted before before T reads item X, and there should
be no transactions that write X after T’ writes it and before T reads it (unless those
transactions, if any, have aborted before T reads X).
RECOVERABLE SCHEDULE (EXAMPLE 1)
Sa’: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
This is the same as Sa except two commit operations have been added to Sa.
Is Sa’ recoverable?
Yes, even though it suffers from the lost update problem.
If the transactions T1 and T2 are submitted at about the same time, and suppose that
their operations are interleaved... If T2 reads a value of X before T1 changes it in the
database, the update from T1 is lost.
RECOVERABLE SCHEDULE (EXAMPLE 2)
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
Sc is not recoverable, because T2 reads item X from T1, and then T2 commits before
T1 commits. If T1 aborts after the c2 ooperation in Sc, then the value of X that T2
read is no longer valid and Tw must be aborted after it had been committed. This
schedule is not recoverable.
For the schedule to be recoverable, the c2 operation in Sc must be postponed until
after T1 commits, else both must abort because the X that T2 reads is no longer valid.
Sd: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1; c2;
Se: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); a1; a2;
CASCADING ROLLBACK
An uncommitted transaction has to be rolled back because it read an item from a
transaction that failed. This is the case for Se.
This form of rollback is undesirable, since it can lead to undoing a significant amount
of work. It is desirable to restrict the schedules to those where cascading rollbacks
cannot occur.
A schedule is said to avoid cascading rollback if every transaction in the schedule
reads only items that were written by committed transactions. This guarantees that
read items will not be discarded.
STRICT SCHEDULE
Transactions can neither read nor write an item X until the last transaction that wrote
X has committed or aborted.
Strict schedules simplify the recovery process.
The process of undoing a write (X) operation of an aborted transaction is simply to
restore the before image, the old-value for X.
Though this always works correctly for strict schedules, it may not work for
recoverable or cascadeless schedules.
EXAMPLE:
Sf: w1(X, 5); w2(X, 8); a1;
CATEGORY HIERARCHY OF SCHEDULES
We have the following categories:
1. recoverable
2. cascadeless
3. strict
If the schedule is cascadeless it is recoverable.
If it is strict it is cascadeless.
The reverse is not always true.
SERIALIZABILITY OF SCHEDULES
As a trivial example of a concurrency-control scheme, consider this one:
A transaction acquires a lock on the entire database before it starts, and releases the
lock after it has committed.
While a transaction holds a lock, no other transaction is allowed to acquire the lock,
and all must therefore wait for the lock to be released.
As a result of this locking policy, only one transaction can execute at a time.
Therefore, only serial schedules are generated.
These are trivially serializable and it is easy to verify that they are cascadeless as well.
This is also a trivial implementation of transaction isolation.
So, formally how do we define the terms serial or nonserial?
SERIAL/NONSERIAL
A schedule S is serial if for any transaction Ti executing in S, all the operations in Ti
are executed consecutively in S; otherwise, S is called nonserial.
In other words, only one transaction at a time is executed in S. There is no
interleaving. Commit or abort signals the end of the transaction and the next
transaction in S, if there is one, begins.
SERIALIZABILITY FOR INTERLEAVING
CONCURRENT TRANSACTIONS
A schedule S of n interleaved transactions is serializable if it is equivalent to some
serial schedule of the same n transactions.
For concurrency cases, the two main forms of serializability are:
1. Conflict Serializability
2. View Serializability
Both are based on the read and write operations.
When designing concurrency control schemes, we must show that schedules
generated by the scheme are serializable. To do that, we must first understand how to
determine, given a particular schedule S, whether the schedule is serializable.
CONFLICT SERIALIZABILITY
Does the order matter?
Determining possible conflict orders in pairs
1. Ii = read(Q), Ij = read(Q). This order does not matter since Q holds the same value
for both reads.
2. Ii = read(Q), Ij = write(Q). This order matters since the value of Q changes.
Therefore, similarly for Ii = write(Q), Ij = read(Q).
3. Ii = write(Q), Ij = write(Q). This order matters since the value of Q may change
and only the last write value is preserved. Therefore, whatever follows must use
that last Q value only.
We say that Ii and Ij conflict if at least one of the pair is a write
instruction
TEST FOR CONFLICT SERIALIZABILITY
Let S be a schedule.
We construct a directed graph, called a precedence graph from S.
This graph consists of a pair G = (V,E), where V is a set of vertices, and E is a set of
edges.
The set of vertices consists of all the transactions participating in the schedule.
The set of edges consists of all edges Ti -> Tj for which one of the following three
conditions holds
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
If an edge Ti -> Tj exists in the precedence graph, then, in any serial schedule S’
equivalent to S, Ti must appear before Tj.
CONFLICT SERIALIZABILITY ALGORITHM:
Testing conflict serializability of a schedule S.
1. For each transaction Ti participating in schedule S, create a node labeled Ti in the
precedence graph.
2. For each case in S where Tj executes a read(Q) after Ti executes a write(Q), create
an edge (Ti-> Tj) in the precedence graph.
3. For each case in S where Tj executes a write(Q) after Ti executes a read(Q), create
an edge (Ti-> Tj) in the precedence graph.
4. For each case in S where Tj executes a write(Q) after Ti executes a write(Q),
create an edge (Ti-> Tj) in the precedence graph.
5. The schedule S is serializable if and only if the precedence graph has no cycles.
CONFLICT SERIALIZABILITY ALGORITHM (cont’d)
For a graph cycle, the start node is also the end node of the cycle. Therefore, no
precedence can be established.
There are cycle-detection algorithms. They require on the order of n2 operations,
where n is the number of vertices (transactions) in the graph.
In the case of partially ordered schedules we can use topological sorting algorithms to
establish serializable schedules.
CONFLICT SERIALIZABILITY (negative example)
Testing conflict serializability of Schedule C
Schedule C
Precedence Graph for Schedule C
(p645)
(p648)
T1
read_item(X):
X:=X-N;
T2
X
read_item(X);
X:=X+M;
write_item(X);
read_item(Y);
T1
T2
write_item(X);
Y:=Y+N;
write_item(Y);
X
This graph has a cycle.
Therefore it is NOT serializable.
CONFLICT SERIALIZABILITY (positive example)
Testing conflict serializability of Schedule D
Schedule D
Precedence Graph for Schedule D
(p645)
(p648)
T1
read_item(X):
X:=X-N;
write_item(X);
T2
read_item(X);
X:=X+M;
write_item(X)
read_item(Y);
Y:=Y+N;
write_item(Y);
T1
T2
X
This graph has no cycles.
Therefore schedule D is serializable.
TEST FOR VIEW SERIALIZABILITY
Consider two schedules S and S’, where the same set of transactions participates in
both schedules. The schedules S and S’ are said to be view equivalent if the following
three conditions are met for each data item Q:
1. If transaction Ti reads the initial value of Q in schedule S, then transaction Ti
must, in schedule S’, also read the initial value of Q.
2. If transaction Ti executes read(Q) in schedule S, and that value was produced by
transaction Tj (if any), then transaction Ti must, in schedule S’, also read the value
of Q that was produced by transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S
must perform the final write(Q) operation in schedule S’.
A schedule S is said to be view serializable if it is view equivalent to a serial schedule.
VIEW SERIALIZABILITY (example)
Regard the following schedule—
T3
read3(Q)
write3(Q)
|
|
|
|
|
T4
write4(Q)
|
|
|
|
|
T6
write6(Q)
Sm: r3(Q); w4(Q); w3(Q); w6(Q); c1; c2; c3;
This schedule is view serializable. It is view equivalent to the serial schedule
<T3, T4, T6>, since one read(Q) instruction reads the initial value of Q in both
schedules, and T6 performs the final write of Q in both schedules.
VIEW SERIALIZABILITY (cont’d)
Every conflict-serializable schedule is view serializable, but there are view
serializable schedules that are not conflict serializable. The above schedule is not
conflict serializable, since every pair of consecutive instructions conflicts, and thus,
no swapping of instructions is possible.
Observe in the previous schedule that transactions T4 and T6 perform
write(Q) operations without having performed a read(Q) operation.
Writes of this sort are called blind writes. Blind writes appear in any viewserializable schedule that is not conflict serializable.
VIEW SERIALIZABILITY (cont’d)
NO EFFICIENT ALGORITHM
We can modify the precedence-graph test for conflict serializability to test for view
serializability. However, the resultant is expensive in CPU time. It is computationally
an expensive problem.
For conflict serializability, when we know the conflict pair case we either choose the
edge Ti -> Tj or the edge Tj -> Ti. This is no longer the case in view serializability.
This is why no efficient algorithm exists for this view test.
TRANSACTION DEFINITION IN SQL
A data manipulation language must include a construct for specifying the set of
actions that constitute a transaction.
The SQL standard specifies that a transaction begins implicitly. Transactions are
ended by one of the following SQL statements:
COMMIT
--commits the current transaction and begins a new one.
ROLLBACK --causes the current transaction to abort.
If the program terminates without either of these commands, the updates are either
committed or rolled back. Which of the two is not specified by the standard. The
choice is implementation dependent.
LOCKS
There are different ways in which a data item may be locked. We shall restrict
ourselves to two ways:
1. Shared. If a transaction Ti has obtained a shared-mode lock (denoted by
S) on item Q, then Ti can read, but cannot write, Q.
2. Exclusive. If a transaction Ti has obtained an exclusive-mode lock
(denoted by X) on item Q, then Ti can both read and write Q.
We can show a compatibility table for these modes:
S
X
|
|
|
S
T
F
|
|
|
X
F
F
|
|
|
For a transaction to unlock a data item immediately after its final access is not always
desirable. Serializability may not be ensured.
EXAMPLE: Let A and B be two accounts in a simplified banking system. They are
accessed by transactions T1 and T2. Transaction T1 transfers 50 sek from account B
to account A, and is defined as
T1:
lock-X(B);
read(B);
B := B – 50;
write(B);
unlock(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(A);
Transaction T2 displays the total amount of money in accounts A and B—that is, the
sum A + B—and is defined as
T2:
lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A + B);
Suppose that the values of accounts A and B are 100 sek and 200 sek, respectively.
If these transactions are executed serially, <T1,T2> or <T2,T1>, the result is the
same, 300 sek.
If T1 and T2 execute concurrently, the following schedule is possible
T1
lock-X(B)
read(B)
B:= B-50
write(B)
unlock(B)
lock-X(A)
read(A)
A:=A+50
write(A)
unlock(A)
|
T2
|
|
|
|
|
|
| lock-S(A)
|
| read(A)
| unlock(A)
| lock-S(B)
|
| read(B)
| unlock(B)
| display(A+B)
|
|
|
|
|
|
| concurrency-control manager
|
| grant-X(B,T1)
|
|
|
|
|
| grant-S(B,T2)
|
|
|
| grant-S(B,T2)
|
|
|
|
| grant-X(A,T1)
|
|
|
|
COMMENTS
If we do not use locking, or unlock data items as soon as possible after reading or
writing them, we may get inconsistent states.
On the other hand, if we do not unlock a data item before requesting a lock on another
data item, deadlocks may occur.
More about deadlocks this later.
STARVATION
Suppose transaction T2 has a shared-mode lock on a data item,
transaction T1 requests an exclusive-mode lock on the same data item.
T1 has to wait for T2 to release the shared-mode lock.
Meanwhile, T3 requests a shared-mode lock.
This is compatible with the T2 lock and so T3 may be granted the lock before T2
finishes and T1 may have to wait for T3 as well. But etc….
It is possible that there is a sequence of transactions that each requests a shared-mode
lock on the data item and T1 never gets the exclusive-mode lock.
T1 may never make progress and is said to be starved.
Avoid Starvation
When a transaction Ti requests a lock on a data item Q in a particular mode M, the
lock is granted provided that
1. There is no other transaction holding a lock on Q in a mode that conflicts with M.
2. There is no other transaction that is waiting for a lock on Q, and that made its lock
request before Ti.
TWO-PHASE LOCKING PROTOCOL
One protocol that ensures serializability is the two-phase locking protocol. This
protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new
locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as
needed.
Once the transaction releases a lock, it enters the shrinking phase, and it can issue no
more lock requests.
Two-phase locking does not ensure freedom from deadlock.
Cascading rollback may also occur.
STRICT TWO-PHASE LOCKING
Cascading rollbacks can be avoided by a modification of two-phase locking called the
strict two-phase locking protocol.
The strict two-phase locking protocol requires that in addition to locking being twophase, all exclusive-mode locks taken by a transaction must be held until that
transaction commits.
RIGOROUS TWO-PHASE LOCKING PROTOCOL
Another variant of two-phase locking is the rigorous two-phase locking protocol.
This requires all locks to be held until the transaction commits.
Most database systems implement either strict or rigorous two-phase locking.
These two-phase locking protocols can reduce transactions to serial schedules. This
observation led to a refinement of the basic two-phase locking protocol, in which lock
conversions are allowed.
A shared to exclusive is an upgrade.
An exclusive to a shared is a downgrade.
Upgrading can only take place in the growing phase.
Downgrading can only take place in the shrinking phase.
These conversions allow for more concurrent transactions.
WHAT IS DEADLOCK
A system is in deadlock if there exists a set of transactions such that every transaction
in the set is waiting for another transaction in the set.
More precisely, there exists a set of waiting transactions {T0,T1,T2,…,Tn} such that
T0 is waiting for a data item that is held by T1, T1 for 2 etc.. and Tn is waiting for T0.
None of these transaction can make progress in this situation.
DEADLOCK HANDLING
The only remedy to deadlock is for the system to invoke some drastic action, such as
rolling back some of the transactions involved in the deadlock.
There are two principle approaches.
1. We can use a deadlock prevention protocol to ensure that the system will never
enter a deadlock state.
2. We can allow the system to enter a deadlock state and then try to recover using a
deadlock-detection and deadlock recovery scheme.
Both methods may result in a rollback.
Download