Transaction management overview Concurrency control

advertisement
Database Systems
(資料庫系統)
December 26, 2005
Lecture 13
Merry Christmas
1
Announcement
• Assignment #6 will be out later today.
• Practicum #4 will also be out later today.
– Due in two weeks
2
Overview of Transaction
Management
Chapter 16
3
Transaction
• Transaction = one execution of a user program.
– Example: transfer money from account A to account B.
– A Sequence of Read & Write Operations
• For good performance, concurrent execution of user programs is
needed.
– Disk accesses are slow and frequent - keep the CPU busy by running
several transactions at the same time.
– Concurrency Control ensures that the result of concurrent execution of
several transactions is the same as some serial (one at a time)
execution of the same set of transactions.
• How can the results be different?
• Must also handle system crash in the middle of a transaction (or
aborted transactions).
– Crash recovery ensures that partially aborted transactions are not seen
by other transactions.
• How can the results be seen by other transactions?
4
Non-serializable Schedule
• A is the number of available copies of a
book = 1.
• T1 wants to buy one copy.
• T2 wants to buy one copy.
• T1 gets an error.
T1
T2
R(A=1)
Check if
(A>0)
R(A=1)
Check if
(A>0)
– The result is different from any serial
schedule T1,T2 or T2,T1
• How to ensure/detect a schedule is
serializable?
W(A=0)
W(A=0)
Error!
Commit
Commit
5
Unrecoverable Schedule
•
•
•
•
•
T1 deducts $100 from A.
T2 adds 6% interests to A.
T2 commits.
T1 aborts.
Why is the problem?
T1
R(A)
W(A)
R(A)
– Undo T1 => T2 has read a value for A that should
never been there.
– But T2 has committed! (may not be able to undo
committed actions).
– This is called unrecoverable schedule.
•
•
Is this a serializable schedule?
How to ensure/detect that a schedule is
recoverable (& serializable)?
T2
W(A)
Commit
Abort
6
Outline
• Four fundamental properties of transactions (ACID)
– Atomic, Consistency, Isolation, Durability
•
•
•
•
•
•
Schedule actions in a set of concurrent transactions
Problems in concurrent execution of transactions
Lock-based concurrency control (Strict 2PL)
Performance issues with lock-based concurrency control
Transaction support in SQL
Introduction to crash recovery
7
ACID Properties
• The DBMS’s abstract view of a transaction: a sequence
of read and write actions.
• Atomic: either all actions in a transaction are carried out
or none.
– Transfer $100 from account A to account B: R(A), A=A-100, W(A),
R(B), B=B+100, W(B) => all actions or none (retry).
– The system ensures this property.
– How can a transaction be incomplete?
• Aborted by DBMS, system crash, or error in user program.
– How to ensure atomicity during a crash?
• Maintain a log of write actions in partial transactions. Read the log
and undo these write actions.
8
ACID Properties (2)
• Consistency: run by itself with no concurrent execution leave the DB
in a “good” state.
– This is the responsibility of the user.
– No increase in total balance during a transfer => the user makes sure
that credit and debit the same amount.
• Isolation: transactions are protected from effects of concurrently
scheduling other transactions.
– The system (concurrency control) ensures this property.
– The result of concurrent execution is the same as some order of serial
execution (no interleaving).
– T1 || T2 produces the same result as either T1;T2 or T2;T1.
9
ACID Properties (3)
• Durability: the effect of a completed transaction should persist
across system crashes.
– The system (crash recovery) ensures durability property and atomicity
property.
– What can a DBMS do ensure durability?
– Maintain a log of write actions in partial transactions. If system crashes
before the changes are made to disk from memory, read the log to
remember and restore changes when the system restarts.
10
Schedules
• A transaction is seen by DBMS as a list of read and write actions on
DB objects (tuples or tables).
– Denote RT(O), WT(O) as read and write actions of transaction T on object
O.
• A transaction also need to specify a final action:
– commit action means the transaction completes successfully.
– abort action means to terminate and undo all actions
• A schedule is an execution sequence of actions (read, write, commit,
abort) from a set of transactions.
– A schedule can interleave actions from different transactions.
– A serial schedule has no interleaving actions from different transactions.
11
Examples of Schedules
Schedule with
Interleaving Execution
T1
T2
Serial Schedule
T1
R(A)
R(A)
W(A)
W(A)
R(B)
R(C)
W(B)
W(C)
Commit
Commit
T2
R(C)
R(B)
W(C)
W(B)
Commit
Commit
12
Concurrent Execution of
Transactions
• Why do concurrent executions of
transactions?
– Better performance.
– Disk I/O is slow. While waiting for disk I/O
on one transaction (T1), switch to another
transaction (T2) to keep the CPU busy.
T1
R(A)
W(A)
R(B)
• System throughput: the average number of
W(B)
transactions completed in a given time (per
second).
• Response Time: difference between
transaction completion time and submission time.
– Concurrent execution helps response time of
small transaction (T2).
T2
Commit
R(C)
W(C)
Commit
13
Serializable Schedule
• A serializable schedule is a schedule that
produces identical result as some serial
schedule.
– A serial schedule has no interleaving actions from
multiple transactions.
T1
R(A)
W(A)
R(A)
• We have a serializable schedule of T1 & T2.
– Assume that T2:W(A) does not influence T1:W(B).
– It produces the same result as executing T1 then
T2 (denote T1;T2).
T2
W(A)
R(B)
W(B)
R(B)
W(B)
Commit
Commit
14
Anomalies in Interleaved Execution
• Under what situations can an arbitrary (non-serializable)
schedule produce inconsistent results from a serial
schedule?
– Three possible situations in interleaved execution.
• Write-read (WR) conflict
• Read-write (RW) conflict
• Write-write (WW) conflict
– Note that you can still have serializable schedules with the
above conflicts.
15
WR Conflict (Dirty Read)
• Situation: T2 reads an object that has been
modified by T1, but T1 has not committed.
• T1 transfers $100 from A to B. T2 adds 6%
interests to A and B. A non-serializable
schedule is:
T1
R(A)
W(A-100)
R(A)
– Step 1: deduct $100 from A.
– Step 2: add 6% interest to A & B.
– Step 3: credit $100 in B.
W(A+6%)
R(B)
• Why is the problem?
– The result is different from any serial schedule > Bank adds $6 less interest.
– Why? Inconsistent state when T2 completes.
The value of A written by T1 is read by T2 before R(B)
T1 completes.
• A Transaction must leave DB in a consistent
state after it completes!
T2
W(B+6%)
Commit
W(B+100)
Commit
16
RW Conflicts (Unrepeatable Read)
• Situation: T2 modifies A that has been
read by T1, while T1 is still in progress.
– When T1 tries to read A again, it will get a
different result, although it has not modified
A in the meantime.
T1
R(A==1)
Check if (A>0)
R(A==1)
• A is the number of available copies of a
book = 1. T1 wants to buy one copy. T2
wants to buy one copy. T1 gets an error.
– The result is different from any serial
schedule
T2
Check if (A>0)
W(A=0)
W(A)
Error!
Commit
Commit
17
WW Conflict (Overwriting
Uncommitted Data)
• Situation: T2 overwrites the value of an
object A, which has already been
modified by T1, while T1 is still in
progress.
• T1 wants to set salaries of Harry & Larry
to ($2000, $2000). T2 wants to set them
to ($1000, $1000).
– The left schedule can produce the results
($1000, $2000).
– The result is different from any serial
schedule.
– This is called lost update (T2 overwrites
T1’s A value, so T1’s value of A is lost.)
T1
T2
W(A=2000)
W(A=1000)
W(B=1000)
Commit
W(B=2000)
Commit
18
Schedules with Aborted
Transactions
•
Serializable schedule needs to produce the
correct results under aborted transactions.
– For aborted transactions, undo all their actions as if
they were never carried out (atomic property).
•
T1
R(A)
W(A)
T1 deducts $100 from A. T2 adds 6% interests to
A. A non-recoverable schedule:
R(A)
W(A)
– Step 1: T1 deducts $100 from A.
– Step 2: T2 adds 6% interest to A. commit.
– Step 3: T1 abort!
•
R(B)
W(B)
Why is the problem?
– Undo T1 => T2 has read a value for A that should
never been there. But T2 has committed! (may not
be able to undo committed actions).
– This is called unrecoverable schedule.
T2
Commit
Abort
19
Another Problem in Undo
• Undo T1: restore value of A to
before T1’s change (A=5).
• Problem: T2’s change to A is also
lost, even if T2 has already
committed.
T1
T2
A=5
R(A)
W(A) // A=6
R(A)
W(A) // A=7
R(B)
W(B)
Commit
Abort
20
Lock-Based Concurrency Control
• Concurrency control ensures that
– (1) Only serializable, recoverable schedules are allowed
– (2) Actions of committed transactions are not lost while undoing
aborted transactions.
• How to guarantee safe interleaving of transactions’
actions (serializability & recoverability)?
– Strict Two-Phase Locking (Strict 2PL)
• A lock is a small bookkeeping object associated with a
DB object.
– Shared lock: several transactions can have shared locks on the
same DB object.
– Exclusive lock: only one transaction can have an exclusive lock
on a DB object.
21
Strict 2PL
• Rule #1: If a transaction T wants to read an object, it requests a
shared lock. Denote as S(O).
• Rule #2: If a transaction T wants to write an object, it requests an
exclusive lock. Denote as X(O).
• Rule #3: When a transaction is completed (aborted), it releases all
held locks.
• Notes:
– A transaction suspends when DBMS cannot grant the lock.
– Why shared lock for read? (Ok for concurrent reads but no concurrent
read/write => avoid RW/WR conflicts)
– Why exclusive lock for write? (Allow no concurrent reads/writes =>
avoid RW/WR/WW conflicts)
– Requests to acquire or release locks can be automatically inserted into
transactions.
22
Example #1 of Strict 2PL
T1
T2
Suspend
T1
T2
X(A)
X(A)
X(A)
R(A)
R(A)
R(A)
W(A)
W(A)
W(A)
X(B)
X(B)
X(B)
R(B)
R(B)
W(B)
W(B)
Commit
Commit
R(B)
W(B)
Commit //
Release locks
X(A)
R(A)
W(A)
R(B)
W(B)
Commit
23
Example #2 of Strict 2PL
T1
T2
T1
T2
S(A)
S(A)
S(A)
R(A)
R(A)
R(A)
X(C)
X(B)
S(A)
R(C)
R(B)
R(A)
W(C)
W(B)
X(B)
Commit
Commit
R(B)
W(B)
Commit
X(C)
R(C)
W(C)
Commit
24
Deadlocks
• T1 and T2 will make no further
progress.
• Two ways to handle deadlock:
– Prevent deadlocks from occurring.
– Detection deadlocks and resolve
them.
• Detect deadlocked transactions
by timeout (assuming they are
waiting for a lock for too long)
Abort the transactions.
T1
T2
X(A)
X(B)
Request X(B)
Blocked!
Request X(A)
Blocked too!
25
Performance of Locking
• Lock-based schemes are designed to
resolve conflicts between transactions. It
has two mechanisms:
• Both have costs and may impact
throughput (# transactions completed per
second).
– More active transaction executing
concurrently, higher the probably of blocking.
– Thrashing can occur when too many
blocked transactions (or aborted
transactions).
Throughput
– Blocking: waiting for a lock.
– Aborting: waiting for too long, restarting it.
thrashing
# Active Transactions
26
Improve Throughput
• Prevent thrashing: monitor % blocked transactions and
reduce the number of active transactions executing
concurrently.
• Other methods to improve throughputs:
– Lock the smallest sized objects possible (reduce the likelihood of
two transactions need the same lock).
– Reduce the time that transaction hold locks (reduce blocking
time of other transactions)
– Reduce hot spots (hot spots = frequently accessed and modified
objects).
27
What to Lock in SQL?
• Option 1: table granularity
– T1: shared lock on S
– T2: exclusive lock on S
– Big sized object -> low
concurrency (T1 no|| T2)
• Option 2: row granularity
– T1: shared lock on rows with
rating = 8.
– T2: exclusive lock on rows with
S.name=“Joe” AND S.rating=8
– Smaller granularity -> better
concurrency (T1 || T2)
<T1>
SELECT S.rating, MIN(S.age)
FROM Sailors S
WHERE S.rating = 8
<T2>
UPDATE Sailors S
SET
S.age=10
WHERE S.name=“Joe” AND
S.rating=8
28
Concurrency Control
Chapter 17 (17.1~17.4)
29
The Question
• How to ensure that a schedule is both
serializable & recoverable?
– Serializable: correct results from interleaving of
transactions
– Recoverable: safe from aborted transactions
30
Conflict Serializable Schedules
• Two schedules are conflict equivalent if:
–
–
Involve the same (write/read) actions of the same transactions
Every pair of conflicting actions is ordered the same way
•
Conflicting actions = actions on the same data object and at least
one of the action is a write.
• Schedule S is conflict serializable if S is conflict
equivalent to a serial schedule
– A serial schedule is a schedule with no interleaving actions from
different transactions.
– A serializable schedule is a schedule that produces identical
result as some serial schedule.
– A conflict serializable schedule is serializable + (closer to
recoverable)
31
Example
• Serializable, but not conflict
serializable schedule
– Why serializable?
– Same result as the serial
schedule T1,T2,T3.
– Why not conflict serializable?
– Writes of T1 and T2 (conflicting
actions) are ordered differently
than the serial schedule of
T1,T2,T3.
T1
T2
T3
R(A)
W(A)
Commit
W(A)
Commit
W(A)
Commit
32
Precedence Graph
T1
• How do we know a given
schedule is conflict serializable?
• Capture the conflicting actions on
a precedence graph & check for
cycle in the graph.
T2
T3
1
R(A)
W(A)
2
Commit
W(A)
Commit
W(A)
– One node per transaction
– Edge from Ti to Tj if an action of Ti
precedes and conflicts (R/W, W/R, W/R)
with one of Tj’s actions.
• Schedule is conflict serializable if
and only if its precedence graph
is acyclic.
Commit
1
T1
T2
2
T3
33
Strict 2PL
• Strict Two-phase Locking (Strict 2PL) Protocol:
– If a transaction T wants to read an object, it requests a shared
lock. Denote as S(O). If a transaction T wants to write an object,
it requests an exclusive lock. Denote as X(O).
– Locks are released only when transaction is completed (aborted).
– If a transaction holds an X lock on an object, no other transaction
can get a lock (S or X) on that object.
• Why Strict 2PL allows only conflict serializable schedules?
– Say T1 & T2 have conflicting actions.
– If T1 obtains a lock on the data object first, T2’s conflicting
actions must wait until T1 is done.
– What if T2 obtains a lock on another data object before T1?
34
T1
T2
Conflicting action on data #1 (get lock)
Conflicting action on data #2 (get
lock?)
Conflicting action on data #2 (get lock)
Commit (release locks)
Conflicting action on data #1
(get lock)
35
Two-Phase Locking (2PL)
• 2PL Protocol is the same as Strict 2PL, except
–
–
–
•
•
2PL also produces conflict serializable schedules.
What is the benefit of 2PL over strict 2PL?
–
•
A transaction can release locks before the end (unlike Strict 2PL),
i.e., after it is done with reading & writing the objects.
However, a transaction can not request additional locks after it
releases any locks.
2PL has lock growing phase and shrinking phase.
Smaller lock holding time -> better concurrency
What is the benefit of strict 2PL over 2PL?
–
Recoverable schedule vs. non-recoverable schedule
36
Non-recoverable
Schedule
• Schedule allowed by 2PL
may not be recoverable in
aborts:
T1
T2
X(A)
R(A)
W(A)
Release X(A)
X(A)
– Say T1 aborts and we need
to undo T1.
– But T2 has read a value for
A that should never been
there.
– But T2 has committed!
(may not be able to undo
committed actions).
R(A)
W(A)
X(B)
R(B)
Release X(A)
W(B)
Release X(B)
Commit
Abort
37
38
Lock Management
• Lock and unlock requests are handled by the lock
manager.
• Each lock manager maintains a lock table of lock table
entries.
• Each lock table entry keeps info about:
– The data object (page, record) being locked
– Number of transactions currently holding a lock (>1 if shared
mode)
– Type of lock held (shared or exclusive)
– Lock request queue
39
Lock Management Implementation
• If a shared lock is requested:
– Check if the request queue is empty. Check if the lock is in
exclusive/share mode. If (yes,share), grant the lock & update
the lock table entry.
• When a transaction aborts or commits, it releases all
lock.
– Update the lock table entry. Check for lock request queue, …
• Locking and unlocking have to be atomic operations:
– E.g., cannot have concurrent operations on the same lock table
entry.
40
Lock Conversions
• Lock upgrade: transaction that
holds a shared lock can be
upgraded to hold an exclusive lock.
– Get shared lock on each row in a table.
– When a row meets the condition, get
an exclusive lock.
• Alternative approach is lock
downgrade.
UPDATE Sailors S
SET
S.age=10
WHERE S.name=“Joe”
AND S.rating=8
– Get exclusive lock on each row in a
table.
– When a row does not meet the
condition, downgrade to shared lock.
41
Deadlocks
• Deadlock: Cycle of transactions waiting for
locks to be released by each other.
• Two ways of dealing with deadlocks:
–
–
Deadlock detection
Deadlock prevention
• Deadline Detection:
T1
S(A)
R(A)
•
Nodes are transactions
There is an edge from Ti to Tj if Ti is waiting
for Tj to release a lock
S(B)
R(B)
– Create a waits-for graph:
•
T2
X(B): W(B)
X(A): W(A)
– Periodically check for cycles in the waits-for
graph,
• cycle = Deadlock
• Resolve a deadlock by aborting a transaction on a
cycle.
42
Deadlock Detection
T1
T2
T3
T4
T1
T2
T4
T3
T1
T2
T4
T3
S(A)
R(A)
X(B)
W(B)
S(B)
S(C)
R(C)
X(C)
X(B)
X(A)
43
Deadlock Prevention
• Basic ideas:
– Assign priorities to transactions based on timestamps when they
start up. (lower timestamps = higher priority)
– Lower priorities cannot wait for higher-priority transactions.
• Assume Ti wants a lock that Tj holds. Two policies are
possible to prevent deadlocks:
–
–
Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti
aborts (lower-priority T never waits for higher-priority T)
Wound-wait: If Ti has higher priority, Tj aborts; otherwise Ti waits.
(higher-priority T never waits for lower-priority T)
• Why these two policies prevent deadlocks?
– The oldest transaction (highest priority one) will eventually get all
the locks it requires!
44
Wait-Die Policy
• Lower-priority T
never waits for
higher-priority T
• If Ti has higher
priority, Ti waits
for Tj;
• Otherwise Ti
aborts
T1
T2
T3
T4
S(A)
R(A)
X(B)
W(B)
S(B)
S(C)
R(C)
X(C)
X(B) // abort
X(A) // abort
45
Wound-Wait Policy
•
•
•
Higher-priority T
never waits for
lower-priority T
If Ti has higher
priority, Tj aborts;
Otherwise Ti
waits.
T1
T2
T3
T4
S(A)
R(A)
X(B)
W(B)
S(B) // Abort T2
Abort
46
Deadlock Prevention (2)
• If a transaction re-starts, make sure it has its original
timestamp.
– It will not be forever aborted due to low priority.
• Conservative 2PL: ensure no deadlock (no blocking)
during transaction
– Each transaction begins by getting all locks it will ever need
– What is the tradeoff?
• Longer lock holding time -> reduce concurrency
47
Download