UNIT III
TRANSACTION PROCESSING AND CONCURRENCY CONTROL
Introduction - Properties of Transaction - Serializability - Concurrency Control - Locking
Mechanisms - Two Phase Commit Protocol - Deadlock.




1. TRANSACTION CONCEPTS
A Transaction is a unit of program execution that accesses and possibly updates various data
items.
A transaction is initiated by a user program written in a high-level data manipulation
language or programming language (for example, SQL, C++, Java), where it is delimited by
statements (or function calls) of the form begin transaction and end transaction.
The transaction consists of all operations executed between the begin transaction and end
transaction.
To ensure integrity of the data, we require that the database system maintain the following
properties of the transactions.
a. Atomicity
b. Consistency
c. Isolation
d. Durability
These properties are called ACID properties.
 Transactions access data using two operations.
 Read(X): This transfers the data item X from the database to a local buffer belonging
to the transaction that executed the read operation.
 Write(X): This transfers the data item X from the local buffer of the transaction that
executed the write back to the database.
 Let Ti be a transaction that transfers $50 from account A to account B. This transaction can
be defined as
Ti:
read(A);
A := A - 50;
write(A);
read(B);
B := B + 50;
write(B);
 Transaction management guarantees a correct transaction and maintains the database in a
correct state.
 It guarantees that if the transaction executes some updates and then a failure occurs before the
transaction reaches its planned termination, then those updates will be undone.
 Thus the transaction either executes entirely or is totally cancelled.
 The system component that provides this atomicity is called transaction manager or
transaction processing monitor or TP monitor.
 ROLLBACK and COMMIT are key to the way it works.
1. COMMIT:
 The COMMIT operation signals successful end of transaction.
 It tells the transaction manager that a logical unit of work has been successfully completed
and database is in correct state and the updates can be recorded or saved.
2. ROLLBACK:
 By contrast, the ROLLBACK operation signals unsuccessful end of transaction.
 It tells the transaction manager that something has gone wrong, the database might be in
incorrect state and all the updates made by the transaction should be undone.
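The COMMIT and ROLLBACK behaviour described above can be sketched with Python's built-in sqlite3 module. The table name `account`, the helper names `make_db`, `transfer` and `balances`, and the overdraft check are assumptions made for this illustration; they are not part of the notes.

```python
import sqlite3

def make_db():
    """Create an in-memory database holding accounts A and B (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: COMMIT on success, ROLLBACK on any error."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        (bal,) = conn.execute("SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")  # hypothetical business rule
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.commit()      # successful end of the logical unit of work
        return True
    except Exception:
        conn.rollback()    # undo every update made since the transaction began
        return False

def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A failed transfer leaves both balances exactly as they were, because the debit already applied to the source account is undone by the rollback.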
TRANSACTION PROPERTIES
 ACID stands for Atomicity, Consistency, Isolation and Durability.
Atomicity:
 Transactions are atomic.
Consider the following example
Transaction to transfer $50 from account A to account B:
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
 read(X), which transfers the data item X from the database to a local buffer belonging to the
transaction that executed the read operation.
 write(X), which transfers the data item X from the local buffer of the transaction that
executed the write back to the database.
 Before the execution of transaction Ti, the values of accounts A and B are $1000 and $2000,
respectively. Suppose that a power failure, hardware failure or system error prevents Ti from
completing its execution.
 If the failure happens after the write(A) operation but before the write(B) operation, the
database will hold the values $950 and $2000.
 The system has destroyed $50 as a result of the failure, leaving the database in an
inconsistent state.
 The basic idea of atomicity is: The database system keeps track of the old values of any data
on which a transaction performs a write, if the transaction does not terminate successfully
then the database system restores the old values.
 Atomicity is handled by the transaction-management component.
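The idea of keeping old values can be sketched as a toy undo log over an in-memory dictionary. The class name `UndoTransaction` and the dictionary standing in for the database are assumptions for illustration only; a real system keeps this information in a log on stable storage.

```python
class UndoTransaction:
    """Toy transaction over a dict 'database' that remembers overwritten values."""

    def __init__(self, db):
        self.db = db
        self.old_values = {}   # item -> value it had before this transaction's first write

    def read(self, item):
        return self.db[item]

    def write(self, item, value):
        # Remember the old value only once, before the first overwrite of the item.
        self.old_values.setdefault(item, self.db[item])
        self.db[item] = value

    def commit(self):
        # The updates stay; the saved old values are no longer needed.
        self.old_values.clear()

    def rollback(self):
        # Restore every item this transaction has written.
        for item, old in self.old_values.items():
            self.db[item] = old
        self.old_values.clear()
```

If the transfer fails after write(A), calling `rollback()` puts A back to $1000, so the $50 is not lost.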
Consistency:
 Transactions transform a correct state of the database into another correct state, without
necessarily preserving correctness at all intermediate points.
 In the example, consistency requires that the sum of A and B be unchanged by the
execution of the transaction.
 If the database is consistent before an execution of the transaction, the database remains
consistent after the execution of the transaction.
 Ensuring consistency for an individual transaction is the responsibility of the application
programmer who codes the transaction.
Isolation:
 Transactions are isolated from one another.
 Even though there are many transactions running concurrently, any given transaction's
updates are concealed from all the rest, until that transaction commits.
 The database will be temporarily inconsistent while the transaction is in progress.
 When the amount has been deducted from A but not yet added to B, the database is
inconsistent.
 If a second concurrently running transaction reads A and B at this intermediate point and
computes A+B, it will observe an inconsistent value.
 If the second transaction performs updates on A and B based on the inconsistent values that it
read, the database will remain inconsistent even after both transactions are completed.
 One way to avoid this problem is to execute transactions serially, that is, one after the other.
 The concurrency-control component maintains the isolation of transactions.
Durability:
 Once a transaction commits, its updates persist in the database, even if there is a subsequent
system crash.
 The computer system failure may lead to loss of data in main memory, but data written to
disk are not lost.
 Durability is guaranteed by ensuring the following.
 The updates carried out by the transaction should be written to the disk.
 Information stored in the disk should be sufficient to enable the database to
reconstruct the updates when the database system restarts after failure.
 The recovery-management component is responsible for ensuring durability.
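The two durability conditions above can be sketched as a tiny redo log. This is a simplified illustration, not how any particular DBMS does it: the function names `log_commit` and `recover`, the JSON-lines log format, and the log path are all assumptions.

```python
import json
import os

def log_commit(log_path, updates):
    """Append a committed transaction's updates to the log and force them to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(updates) + "\n")
        log.flush()
        os.fsync(log.fileno())   # ask the operating system to push the data to the disk

def recover(log_path, initial):
    """Rebuild the database state after a restart by replaying the logged updates."""
    db = dict(initial)
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                db.update(json.loads(line))
    return db
```

Because the log record reaches the disk before the commit is reported, the updates can be reconstructed even if the in-memory copy of the data is lost.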
TRANSACTION STATES
A transaction must be in one of the following states:
1. Active: The initial state; the transaction stays in this state while it is executing.
2. Partially committed: After the final statement has been executed.
3. Failed: After the discovery that normal execution can no longer proceed.
4. Aborted: After the transaction has been rolled back and the database has been restored to its
state prior to the start of the transaction.
5. Committed: After successful completion.
A transaction has committed only if it has entered the committed state.
Similarly, we say that a transaction has aborted only if it has entered the aborted state.
A transaction is said to have terminated if it has either committed or aborted.
A transaction starts in the active state.
When it finishes its final statement, it enters the partially committed state.
At this point, the transaction has completed its execution, but it is still possible that it may
have to be aborted, since the actual output may still be temporarily residing in main memory,
and thus a hardware failure may preclude its successful completion.
The database system then writes out enough information to disk that, even in the event of a
failure, the updates performed by the transaction can be re-created when the system restarts
after the failure.
When the last of this information is written out, the transaction enters the committed state.
 A transaction enters the failed state after the system determines that the transaction can no
longer proceed with its normal execution (for example, because of hardware or logical
errors).
 Such a transaction must be rolled back.
 Then, it enters the aborted state. At this point, the system has two options:
 It can restart the transaction, but only if the transaction was aborted as a result of
some hardware or software error that was not created through the internal logic of the
transaction. A restarted transaction is considered to be a new transaction.
 It can kill the transaction. It usually does so because of some internal logical error that
can be corrected only by rewriting the application program, or because the input was
bad, or because the desired data were not found in the database.
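The state transitions described in this section can be written down as a small state machine. The names `State`, `ALLOWED`, `move` and `is_terminated` are chosen for this sketch only.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Transitions permitted by the description above.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),      # a restart creates a new transaction instead
    State.COMMITTED: set(),
}

def move(current, target):
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target

def is_terminated(state):
    """A transaction has terminated once it has committed or aborted."""
    return state in (State.COMMITTED, State.ABORTED)
```

Note that a partially committed transaction can still move to the failed state, which matches the remark that a hardware failure may preclude successful completion.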
SERIALIZABILITY



Basic Assumption – Each transaction preserves database consistency.
Thus serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
 Simplified view of transactions
 We ignore operations other than read and write instructions
 We assume that transactions may perform arbitrary computations on
data in local buffers in between reads and writes.
 Our simplified schedules consist of only read and write instructions.
Conflicting Instructions:
*Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists
some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
*Intuitively, a conflict between li and lj forces a (logical) temporal order between them.
*If li and lj are consecutive in a schedule and they do not conflict, their results
would remain the same even if they had been interchanged in the schedule.
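The conflict rule can be expressed as a one-line predicate. The `Op` record and the function name `conflicts` are assumptions for this sketch.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str    # transaction name, e.g. "Ti"
    kind: str   # "read" or "write"
    item: str   # data item accessed, e.g. "Q"

def conflicts(a, b):
    """Two instructions of different transactions conflict if they access
    the same item and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "write" in (a.kind, b.kind)
```

Only the read/read combination on the same item is conflict-free, which is why consecutive reads can be swapped without changing the result.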
Conflict Serializability:
*If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict equivalent.
*We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule.
View Serializability:
*Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’
also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced
by transaction Tj (if any), then in schedule S’ also transaction Ti must read the
value of Q that was produced by the same write(Q) operation of transaction Tj
.
3. The transaction (if any) that performs the final write(Q) operation in schedule
S must also perform the final write(Q) operation in schedule S’.
*As can be seen, view equivalence is also based purely on reads and writes alone.
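The three conditions can be checked mechanically for two given schedules. In this sketch a schedule is assumed to be a list of (transaction, kind, item) tuples, and both schedules are assumed to contain the same transactions with each transaction's own operations in the same order; the names `view_profile` and `view_equivalent` are illustrative.

```python
def view_profile(schedule):
    """Summarise what each read sees and which write is final for every item."""
    last_write = {}    # item -> (txn, k): the k-th write of that item by txn
    write_count = {}   # (txn, item) -> number of writes seen so far
    reads = {}         # txn -> [(item, source write or None for the initial value)]
    for txn, kind, item in schedule:
        if kind == "read":
            reads.setdefault(txn, []).append((item, last_write.get(item)))
        else:
            k = write_count.get((txn, item), 0) + 1
            write_count[(txn, item)] = k
            last_write[item] = (txn, k)
    # After the loop, last_write holds the final write of each item.
    return reads, last_write

def view_equivalent(s1, s2):
    """Conditions 1 and 2 compare the reads; condition 3 compares the final writes."""
    return view_profile(s1) == view_profile(s2)
```

This compares two specific schedules only. Deciding whether a schedule is view equivalent to some serial schedule would require trying the possible serial orders.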
Testing for Serializability:
 Consider some schedule of a set of transactions T1, T2, ..., Tn
 Precedence graph: a directed graph where the vertices are the transactions (names).
 We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data
item on which the conflict arose earlier.
 We may label the arc by the item that was accessed.
 Example 1
Test for Conflict Serializability:
 A schedule is conflict serializable if and only if its precedence graph is acyclic.
 Cycle-detection algorithms exist which take order n^2 time, where n is the number of
vertices in the graph.
 (Better algorithms take order n + e time, where e is the number of edges.)

 If the precedence graph is acyclic, the serializability order can be obtained by a
topological sorting of the graph.
 This is a linear order consistent with the partial order of the graph.
 For example, a serializability order for Schedule A would be
T5 → T1 → T3 → T2 → T4
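The precedence-graph test can be sketched with the standard-library graphlib module (Python 3.9 or later). A schedule is assumed here to be a list of (transaction, kind, item) tuples, and the function names are illustrative. Building the graph by comparing every pair of operations is quadratic in the schedule length, which is acceptable for a demonstration.

```python
from graphlib import CycleError, TopologicalSorter

def precedence_graph(schedule):
    """Return {Tj: set of Ti such that there is an arc Ti -> Tj}."""
    preds = {txn: set() for txn, _, _ in schedule}
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            # Arc Ti -> Tj: the operations conflict and Ti accessed the item first.
            if ti != tj and item_i == item_j and "write" in (kind_i, kind_j):
                preds[tj].add(ti)
    return preds

def serial_order(schedule):
    """Return one serializability order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(precedence_graph(schedule)).static_order())
    except CycleError:
        return None
```

A result of None means the schedule is not conflict serializable; otherwise the returned list is one equivalent serial order.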
Test for View Serializability:
 The precedence graph test for conflict serializability cannot be used directly to test for
view serializability.
 Extension to test for view serializability has cost exponential in the
size of the precedence graph.
 The problem of checking if a schedule is view serializable falls in the class of NP-complete problems.
 Thus existence of an efficient algorithm is extremely unlikely.
 However practical algorithms that just check some sufficient conditions for view
serializability can still be used.








CONCURRENCY
Concurrency in terms of databases means allowing multiple users to access the data
contained within a database at the same time.
If concurrent access is not managed by the Database Management System (DBMS) so that
simultaneous operations don't interfere with one another, problems can occur when various
transactions interleave, resulting in an inconsistent database.
Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB
objects) of various transactions.
Each transaction must leave the database in a consistent state if the DB is consistent when
the transaction begins.
Concurrent execution of user programs is essential for good DBMS performance.
Because disk accesses are frequent, and relatively slow, it is important to keep the CPU
humming by working on several user programs concurrently.
Interleaving actions of different user programs can lead to inconsistency: e.g., check is
cleared while account balance is being computed.
DBMS ensures such problems don’t arise: users can pretend they are using a single-user
system.
Purpose of Concurrency Control




To enforce Isolation (through mutual exclusion) among conflicting transactions.
To preserve database consistency through consistency preserving execution of transactions.
To resolve read-write and write-write conflicts.
Example: In a concurrent execution environment, if T1 conflicts with T2 over a data item
A, then the concurrency control mechanism decides whether T1 or T2 should get A and whether
the other transaction is rolled back or waits.
LOCKING PROTOCOLS
 One way to ensure serializability is to require that data items be accessed in a mutually
exclusive manner; that is, while one transaction is accessing a data item, no other
transaction can modify that data item.
 The most common method used to implement this requirement is to allow a transaction to
access a data item only if it is currently holding a lock on that item.
 A locking protocol is a set of rules to be followed by each transaction to ensure that, even
though the actions of several transactions might be interleaved; the net effect is identical to
executing all transactions in some serial order.
LOCKS
 A lock is a variable associated with a data item that indicates whether the possible
operations can currently be applied to it.
 There are various modes in which a data item may be locked.
 The two modes are
1. Shared
 It is denoted by ‘S’.
 If a transaction T1 has obtained a shared mode lock on item P, then T1 can
read, but cannot write P.
2. Exclusive
 It is denoted by X.
 If a transaction T2 has obtained an exclusive mode lock on item P, then T2
can both read and write P.
Requesting a Lock:
 Every transaction requests a lock on data item P, depending on the types of operations that it
will perform on P.
 The request is made to the ‘Concurrency Control Manager’.
 The transaction can proceed with the operation only after the concurrency control manager ‘grants’
the lock to the transaction.
Lock Compatibility Matrix ‘comp’:
 The compatibility relation between 2 modes of locks exclusive (X) and shared (S) is given by
the matrix ‘comp’.
comp(A, B) |   S   |   X
    S      | True  | False
    X      | False | False
 An element comp(A, B) of the matrix has the value ‘True’ if and only if mode A is compatible
with mode B, that is, only when A and B are both shared mode.
 The shared mode is compatible with shared mode, but not with exclusive mode.
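The compatibility matrix translates directly into a lookup table. The names `COMP` and `can_grant` are assumptions for this sketch.

```python
# comp(A, B): may a lock in mode A be granted while another transaction holds mode B?
COMP = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant the request only if it is compatible with every lock
    currently held on the item by other transactions."""
    return all(COMP[(requested, held)] for held in held_modes)
```

Any number of shared locks can coexist on an item, but an exclusive lock can be granted only when no other transaction holds a lock on it.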
a. Lock-S (Q):
 A transaction requests a shared lock on data item Q by executing this instruction.
b. Lock-X (Q):
 A transaction requests an exclusive lock on data item Q by executing this instruction.
c. Unlock (Q):
 A data item Q can be unlocked with this instruction.
 To access a data item, a transaction Ti must first lock that item, and it must hold the
lock for as long as it accesses the item. In the example that follows, each transaction
unlocks a data item immediately after its final access to it.
 Example:
T1 transfers Rs.50/- from account B to account A.
T2 displays the total amount of money in A & B.
T1: lock-X(B);
    read(B);
    B := B - 50;
    write(B);
    unlock(B);
    lock-X(A);
    read(A);
    A := A + 50;
    write(A);
    unlock(A);
T2: lock-S(A);
    read(A);
    unlock(A);
    lock-S(B);
    read(B);
    unlock(B);
    display(A + B);
 The transaction making a lock request cannot execute its next action until the concurrency
control manager grants the lock.
 Hence, the lock must be granted in the interval between the lock-request operation and the
transaction's next action.
LOCKING TECHNIQUES
 A lock is a mechanism to control concurrent access to a data item
 Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested
using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S
instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after
its request is granted.
Pitfalls of Lock-Based Protocols:
 The potential for deadlock exists in most locking protocols. Deadlocks are a necessary
evil.
Starvation is also possible if the concurrency control manager is badly designed. For example:
1. A transaction may be waiting for an X-lock on an item, while a sequence of
other transactions request and are granted an S-lock on the same item.
2. The same transaction is repeatedly rolled back due to deadlocks.
 Concurrency control manager can be designed to prevent starvation.
The Two-Phase Locking Protocol:
This is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
o transaction may obtain locks
o transaction may not release locks
Phase 2: Shrinking Phase
 transaction may release locks
 transaction may not obtain locks
 The protocol assures serializability. It can be proved that the transactions can be
serialized in the order of their lock points (i.e. the point where a transaction acquired
its final lock).
 Two-phase locking does not ensure freedom from deadlocks.
 Cascading roll-back is possible under two-phase locking. To avoid this, follow a
modified protocol called strict two-phase locking. Here a transaction must hold all
its exclusive locks till it commits/aborts.
 Rigorous two-phase locking is even stricter: here all locks are held till commit/abort.
In this protocol transactions can be serialized in the order in which they commit.
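Whether a single transaction obeys the basic two-phase rule can be checked by scanning its actions in order. In this sketch a transaction is assumed to be a list of (action, item) pairs, and the function name `is_two_phase` is illustrative.

```python
def is_two_phase(actions):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False
    for name, _item in actions:
        if name == "unlock":
            shrinking = True              # the shrinking phase has begun
        elif name.startswith("lock") and shrinking:
            return False                  # a lock requested after a release
    return True
```

A transfer transaction that unlocks B before it locks A fails this check. Moving the lock on A ahead of the unlock on B makes it two-phase.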
Lock Conversions:
 Two-phase locking with lock conversions:
– First Phase:
o can acquire a lock-S on item
o can acquire a lock-X on item
o can convert a lock-S to a lock-X (upgrade)
– Second Phase:
o can release a lock-S
o can release a lock-X
o can convert a lock-X to a lock-S (downgrade)
 This protocol assures serializability, but it still relies on the programmer to insert the
various locking instructions.
Implementation of Locking:
 A lock manager can be implemented as a separate process to which transactions send
lock and unlock requests.
 The lock manager replies to a lock request by sending a lock grant message (or a
message asking the transaction to roll back, in case of a deadlock).


The requesting transaction waits until its request is answered.
The lock manager maintains a data-structure called a lock table to record granted
locks and pending requests.
The lock table is usually implemented as an in-memory hash table indexed on the name of the
data item being locked.
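A lock table of this kind can be sketched with two dictionaries keyed by data item name. The class and method names are assumptions, and the first-come-first-served rule (a new request waits whenever earlier requests are already queued) is one possible design choice, used here so that a waiting X-lock request is not starved by later S-lock requests.

```python
from collections import defaultdict, deque

class LockTable:
    """In-memory lock table: granted locks and pending requests per data item."""

    def __init__(self):
        self.granted = defaultdict(dict)    # item -> {txn: mode}
        self.waiting = defaultdict(deque)   # item -> queue of (txn, mode)

    def _compatible(self, item, txn, mode):
        others = [m for t, m in self.granted[item].items() if t != txn]
        return all(mode == "S" and m == "S" for m in others)

    def lock(self, item, txn, mode):
        """Return True if granted immediately, False if the request was queued."""
        if not self.waiting[item] and self._compatible(item, txn, mode):
            self.granted[item][txn] = mode
            return True
        self.waiting[item].append((txn, mode))
        return False

    def unlock(self, item, txn):
        """Release txn's lock and return the transactions that are now granted."""
        self.granted[item].pop(txn, None)
        woken = []
        while self.waiting[item]:
            nxt, mode = self.waiting[item][0]
            if not self._compatible(item, nxt, mode):
                break
            self.waiting[item].popleft()
            self.granted[item][nxt] = mode
            woken.append(nxt)
        return woken
```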
TWO PHASE COMMIT
 Two-phase commit is important whenever a given transaction can interact with several
independent “resource managers”.
 Example:
 Consider a transaction running on an IBM mainframe that updates both an IMS
database and a DB2 database. If the transaction completes successfully, then both
IMS data and DB2 data are committed.
 Conversely, if the transaction fails, then both the updates must be rolled back.
 The system must not commit one database update and roll back the other; if it did,
atomicity would not be maintained.
 Therefore, the transaction issues a single “global” or system-wide COMMIT or
ROLLBACK.
 That COMMIT or ROLLBACK is handled by a system component called the
coordinator.
 The coordinator's task is to guarantee that the resource managers either all commit or all roll back.
 It must provide this guarantee even if the system fails in the middle of the process.
 The two-phase commit protocol is responsible for maintaining such a guarantee.
WORKING
 Assume that the transaction has completed and a COMMIT is issued. On receiving the
COMMIT request, the coordinator goes through the following two-phase process:
Prepare:
1. The coordinator instructs every resource manager to get ready to “go either way” on the transaction.
2. Each participant in the transaction forces all updates performed during the transaction
from temporary storage to permanent storage, so that it can perform either COMMIT or
ROLLBACK as necessary.
3. Each resource manager then replies “OK” to the coordinator if the forced write succeeded,
or “NOT OK” otherwise.
Commit:
1. When the coordinator has received replies from all participants, it takes a decision regarding
the transaction and records it in the physical log.
2. If all replies were “OK”, then the decision is “commit”; if any reply was “NOT OK”, the
decision is “rollback”.
3. The coordinator informs all the participants of its decision.
4. Each participant must then commit or roll back the transaction locally, as instructed by the
coordinator.
 If the system fails at some point during the process, the restart procedure looks for the
decision of the coordinator.
 If the decision is found then the two phase commit can start processing from where it has left
off.
 If the decision is not found then it assumes that the decision is ROLLBACK and the process
can complete appropriately.
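The prepare and commit phases, and the restart rule, can be sketched as follows. Everything runs in one process here; the `Participant` class, the list used as the coordinator's log, and the function names are assumptions for illustration, and real resource managers would communicate over a network and write to stable storage.

```python
class Participant:
    """Stand-in for a resource manager taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # whether its forced write will succeed
        self.state = "active"

    def prepare(self):
        # Phase 1: force updates to permanent storage, then vote.
        self.state = "prepared" if self.can_commit else "failed"
        return "OK" if self.can_commit else "NOT OK"

    def finish(self, decision):
        # Phase 2: obey the coordinator's decision locally.
        self.state = "committed" if decision == "commit" else "rolled back"

def two_phase_commit(participants, log):
    """Run both phases and return the coordinator's decision."""
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v == "OK" for v in votes) else "rollback"
    log.append(decision)               # record the decision before informing anyone
    for p in participants:
        p.finish(decision)
    return decision

def decision_after_restart(log):
    """Restart rule: if no decision was logged, assume ROLLBACK."""
    return log[-1] if log else "rollback"
```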
 If the participants are on several systems, as in a distributed system, then some participants
may have to wait a long time for the coordinator's decision.
 The data communication manager (DC manager) can act as a resource manager in a two-phase commit process.
DEADLOCK
Deadlock Handling:
 Consider the following two transactions:
T1: write(X); write(Y)
T2: write(Y); write(X)
 Schedule with deadlock:
T1: lock-X on X
T1: write(X)
T2: lock-X on Y
T2: write(Y)
T2: wait for lock-X on X
T1: wait for lock-X on Y

System is deadlocked if there is a set of transactions such that every transaction in the
set is waiting for another transaction in the set.
 Deadlock prevention protocols ensure that the system will never enter into a deadlock
state. Some prevention strategies :
 Require that each transaction locks all its data items before it begins execution
(predeclaration).
 Impose partial ordering of all data items and require that a transaction can lock
data items only in the order specified by the partial order (graph-based protocol).
More Deadlock Prevention Strategies:
 Following schemes use transaction timestamps for the sake of deadlock prevention
alone.
 wait-die scheme — non-preemptive
o older transaction may wait for younger one to release data item. Younger
transactions never wait for older ones; they are rolled back instead.
o a transaction may die several times before acquiring needed data item
 wound-wait scheme — preemptive
o older transaction wounds (forces rollback) of younger transaction instead of
waiting for it. Younger transactions may wait for older ones.
o may be fewer rollbacks than wait-die scheme.
Deadlock prevention:
 In both the wait-die and the wound-wait schemes, a rolled-back transaction is
restarted with its original timestamp. Older transactions thus have precedence over
newer ones, and starvation is hence avoided.
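The two timestamp schemes differ only in what happens when a requester meets a lock holder. In this sketch a smaller timestamp means an older transaction; the function names and the returned strings are illustrative.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "rollback requester"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger requester waits."""
    return "rollback holder" if requester_ts < holder_ts else "wait"
```

In both schemes the transaction that is rolled back is always the younger one, which is why restarting it with its original timestamp eventually makes it the oldest and lets it finish.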
 Timeout-Based Schemes :
o a transaction waits for a lock only for a specified amount of time. After that,
the wait times out and the transaction is rolled back.
o thus deadlocks are not possible.
o simple to implement; but starvation is possible. Also difficult to determine
good value of the timeout interval.
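A timeout-based lock wait can be illustrated with the standard threading module, whose `Lock.acquire` accepts a timeout in seconds. The wrapper name `lock_with_timeout` is an assumption; in a DBMS the caller would roll the transaction back when it returns False.

```python
import threading

def lock_with_timeout(lock, timeout_s):
    """Wait at most timeout_s seconds for the lock.

    Returns True if the lock was acquired, False if the wait timed out
    (the caller should then roll back and possibly retry).
    """
    return lock.acquire(timeout=timeout_s)
```

Choosing `timeout_s` is the hard part: too short causes needless rollbacks, too long leaves deadlocked transactions waiting.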
Deadlock Detection:
 Deadlocks can be described as a wait-for graph, which consists of a pair G = (V,E),
o V is a set of vertices (all the transactions in the system)
o E is a set of edges; each element is an ordered pair Ti → Tj.
 If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is
waiting for Tj to release a data item.
 When Ti requests a data item currently being held by Tj, then the edge Ti → Tj is inserted
in the wait-for graph. This edge is removed only when Tj is no longer holding a data
item needed by Ti.
 The system is in a deadlock state if and only if the wait-for graph has a cycle. The system must
invoke a deadlock-detection algorithm periodically to look for cycles.
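Cycle detection on the wait-for graph can be sketched as a depth-first search. The graph is assumed to be a dictionary mapping each transaction to the set of transactions it is waiting for; the function name `find_deadlock` is illustrative.

```python
def find_deadlock(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GREY:                      # edge back into the current path
                return path[path.index(u):]
            if c == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

Each vertex and edge is visited once, so the search takes time proportional to the number of transactions plus the number of wait-for edges.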
Deadlock Recovery:
 When deadlock is detected :
o Some transaction will have to be rolled back (made a victim) to break the
deadlock. Select as victim the transaction that will incur the minimum
cost.
o Rollback: determine how far to roll back the transaction.
 Total rollback: abort the transaction and then restart it.
 It is more effective to roll back the transaction only as far as necessary to break
the deadlock.
o Starvation happens if the same transaction is always chosen as victim. Include
the number of rollbacks in the cost factor to avoid starvation.
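Victim selection with a starvation guard can be sketched as a cost function. The cost formula, the `rollback_weight` parameter and the (work done, rollbacks so far) representation are assumptions for illustration; a real system would weigh other factors as well.

```python
def choose_victim(candidates, rollback_weight=10):
    """Pick the deadlocked transaction that is cheapest to roll back.

    candidates: {txn: (work_done, rollbacks_so_far)}.
    Each earlier rollback raises the cost, so a transaction that has
    already been a victim becomes less likely to be chosen again.
    """
    def cost(txn):
        work_done, rollbacks = candidates[txn]
        return work_done + rollback_weight * rollbacks

    return min(candidates, key=cost)
```

With `rollback_weight=0` the transaction that has done the least work would be chosen every time, which is exactly the starvation case described above.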