Transaction Processing

John Ortiz
Introduction
• Transactions are motivated by two of the DBMS properties discussed in our first lecture:
  - Multi-user database access
  - Safety from system crashes
• Main issues:
  - How to model concurrent execution of user programs?
  - How to guarantee acceptable DB behavior?
  - How to deal with system crashes?
Lecture 19
Transaction Processing
Why Concurrency?
• Allowing only serial execution of user programs may cause poor system performance
  - Low throughput, long response time
  - Poor resource utilization (CPU, disks)
• Concurrent execution of user programs is essential for good DBMS performance.
• Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently
Example: Why Concurrency?
• Assume each user's program uses CPU and I/O resources (disks) in an interleaved fashion:
  CPU, R(X), CPU, W(X)
• Suppose each CPU request takes 1 time unit and each I/O request takes 5 time units.
• For a 2 GHz machine, one clock tick is ½ ns
  - An 8 millisecond seek time is 8,000 microseconds, which is 8,000,000 ns
  - Clearly the CPU can get quite a bit done while the disk is searching for a block
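To make "quite a bit" concrete, the slide's own numbers can be turned into a count of clock ticks per seek. This is only a back-of-the-envelope sketch; the constant names are illustrative, not from the lecture.

```python
CLOCK_HZ = 2_000_000_000   # 2 GHz machine, as stated on the slide
SEEK_MS = 8                # 8 ms seek time, as stated on the slide

tick_ns = 1_000_000_000 / CLOCK_HZ     # 0.5 ns per clock tick
seek_ns = SEEK_MS * 1_000_000          # 8,000,000 ns per seek
ticks_per_seek = seek_ns / tick_ns     # clock ticks that elapse during one seek

print(tick_ns, seek_ns, ticks_per_seek)
```

Under these figures, roughly 16 million clock ticks pass while the disk completes a single seek.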
Example: Why Concurrency?
[Figure: CPU and I/O timelines for transactions T1-T4. Serial schedule: 48 time units. Non-serial schedule: 41 time units.]
Example: Why Concurrency?
[Figure: CPU and I/O timelines for transactions T1-T4. Serial schedule: 48 time units. Non-serial schedule using 2 disks (I/O 1, I/O 2): 22 time units.]
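The totals in these timelines can be reproduced with a small simulation. The sketch below assumes four transactions that each run CPU, R(X), CPU, W(X) with the stated costs (1 and 5 time units), one CPU, and a simple round-robin policy in which each I/O goes to whichever disk frees up first. The scheduling policy and the function name `makespan` are assumptions; the slides do not say how the figures were drawn.

```python
CPU, IO = "cpu", "io"
PROGRAM = [(CPU, 1), (IO, 5), (CPU, 1), (IO, 5)]   # CPU, R(X), CPU, W(X)

def makespan(n_txns, n_disks, serial=False):
    """Total time units needed to finish n_txns copies of PROGRAM."""
    steps = range(len(PROGRAM))
    if serial:
        # One transaction runs entirely before the next one starts.
        order = [(t, s) for t in range(n_txns) for s in steps]
    else:
        # Round-robin: every transaction does step 0, then step 1, ...
        order = [(t, s) for s in steps for t in range(n_txns)]
    cpu_free = 0                   # time at which the CPU is next free
    disk_free = [0] * n_disks      # time at which each disk is next free
    ready = [0] * n_txns           # time at which each txn can run its next step
    finish = 0
    for t, s in order:
        kind, cost = PROGRAM[s]
        if serial and s == 0:
            ready[t] = finish      # wait for all earlier transactions
        if kind == CPU:
            start = max(ready[t], cpu_free)
            cpu_free = start + cost
        else:
            d = min(range(n_disks), key=lambda i: disk_free[i])
            start = max(ready[t], disk_free[d])
            disk_free[d] = start + cost
        ready[t] = start + cost
        finish = max(finish, ready[t])
    return finish

print(makespan(4, 1, serial=True), makespan(4, 1), makespan(4, 2))  # 48 41 22
```

With one disk the disk itself becomes the bottleneck (eight I/Os of 5 units plus the first CPU step), which is why adding a second disk roughly halves the non-serial time.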
Transaction
• A user program may carry out many operations on data retrieved from the database, but the DBMS is only concerned with what data is read from or written to the database (on disk)
• A transaction is a sequence of database actions that is considered a unit of work
  - DB actions: read (R(X)), write (W(X)), commit, abort
  - Represents the DBMS's abstract view of
    - Interactive user sessions
    - Execution of user programs
Example: Transaction
• Account(Ano, Name, Type, Balance)
• A user wants to
  update Account set Balance = Balance - 50 where Ano = 10001
  update Account set Balance = Balance + 50 where Ano = 12300
• Let A be the account with Ano = 10001 and B be the account with Ano = 12300. The transaction is
  R(A), W(A), R(B), W(B)
States of a Transaction
[Figure: state-transition diagram. States: active, partially committed, committed, failed, aborted. Transition labels: begin transaction, read/write, end transaction, commit, exception, failure, abort.]
Consistency of Transaction
• Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins.
  - The DBMS will enforce some integrity constraints (ICs), depending on the ICs declared in CREATE TABLE statements.
  - Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
Atomicity of Transactions
• A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions.
• A very important property guaranteed by the DBMS for all transactions is that they are atomic. That is, a user can think of a transaction as always executing all its actions in one step, or not executing any actions at all.
  - The DBMS logs all actions so that it can undo the actions of aborted transactions.
Example: Why Atomicity?
• Account(Ano, Name, Type, Balance)
• A user wants to
  update Account set Balance = Balance - 50 where Ano = 10001
  update Account set Balance = Balance + 50 where Ano = 12300
• The system crashes in the middle
• Possible outcomes without recovery: $50 transferred or lost
• The operations must be done as a unit
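The transfer can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the lecture's own code: the starting balances, the `transfer` helper, and the `crash_midway` flag are made up for illustration, and the "crash" is only simulated by raising an exception between the two updates (a real crash is handled by the DBMS recovery manager using its log).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Ano INTEGER PRIMARY KEY, Name TEXT, "
             "Type TEXT, Balance INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?, ?, ?)",
                 [(10001, "A", "checking", 500),    # assumed starting balances
                  (12300, "B", "checking", 200)])
conn.commit()

def transfer(conn, src, dst, amount, crash_midway=False):
    """Move `amount` from src to dst as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Ano = ?",
                         (amount, src))
            if crash_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Ano = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT Ano, Balance FROM Account"))

transfer(conn, 10001, 12300, 50, crash_midway=True)
print(balances(conn))   # {10001: 500, 12300: 200}: the debit was undone
transfer(conn, 10001, 12300, 50)
print(balances(conn))   # {10001: 450, 12300: 250}
```

Either both updates take effect or neither does; the $50 is never lost halfway.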
Durability
• The DBMS often keeps data in a main-memory buffer to improve system efficiency. Data in the buffer is volatile (it may be lost if the system crashes)
• When a transaction commits, the DBMS must guarantee that all updates made by the transaction will not be lost even if the system crashes later
  - The DBMS uses the log to redo the actions of committed transactions if necessary
Isolation
• Users submit transactions, and can think of each transaction as executing by itself (in isolation)
• Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions
  - The DBMS guarantees that interleaved transactions do not interfere with each other
Example: Why Isolation?
• Two users (programs) do this at the same time
  - User 1: update Student set GPA = 3.7 where SID = 123
  - User 2: update Student set Major = 'CS' where SID = 123
• Sequence of events: for each user, read tuple, modify attribute, write tuple.
• Possible outcomes without concurrency control: only one of the changes survives, or both do
Example: Why Isolation?
• Emp(EID, Name, Dept, Sal, Start, Loc)
• User 1: update Emp set Dept = 'Sales' where Loc = 'Downtown'
• User 2: update Emp set Start = 3/1/00 where Start = 2/29/00
• Possible outcomes without concurrency control: each tuple has one change or both, and the result may be inconsistent across tuples
Example: Interleaved Transactions
• Consider two transactions:
  T1: BEGIN A=A+100, B=B-100 END
  T2: BEGIN A=1.06*A, B=1.06*B END
• One possible interleaved execution:
  T1: A=A+100,            B=B-100
  T2:           A=1.06*A,          B=1.06*B
• This one is OK. But what about another interleaving?
  T1: A=A+100,                      B=B-100
  T2:           A=1.06*A, B=1.06*B
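Whether an interleaving is "OK" can be checked by running it and comparing the result with the two serial orders. The sketch below assumes starting balances of A = B = 1000 (not given on the slide) and uses integer arithmetic (`* 106 // 100`) for the 6% interest so that the comparison is exact for these values.

```python
def t1_a(s): s["A"] += 100                      # T1: A = A + 100
def t1_b(s): s["B"] -= 100                      # T1: B = B - 100
def t2_a(s): s["A"] = s["A"] * 106 // 100       # T2: A = 1.06 * A
def t2_b(s): s["B"] = s["B"] * 106 // 100       # T2: B = 1.06 * B

def run(steps, a=1000, b=1000):
    """Apply the steps in order and return the final (A, B)."""
    state = {"A": a, "B": b}
    for step in steps:
        step(state)
    return state["A"], state["B"]

serial_results = {
    run([t1_a, t1_b, t2_a, t2_b]),   # T1 then T2
    run([t2_a, t2_b, t1_a, t1_b]),   # T2 then T1
}
first = run([t1_a, t2_a, t1_b, t2_b])    # first interleaving on the slide
second = run([t1_a, t2_a, t2_b, t1_b])   # second interleaving on the slide

print(first, first in serial_results)     # (1166, 954) True
print(second, second in serial_results)   # (1166, 960) False
```

Under these assumptions the first interleaving matches the serial order T1, T2, while the second produces a state that no serial order could: A received interest after the deposit, but B received interest before the withdrawal.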
Schedule: Modeling Concurrency
• Schedule: a sequence of operations from a set of transactions, where operations from any one transaction are in their original order
• Notation:
  - Ri(X): read X by Ti
  - Wi(X): write X by Ti
• Example: R1(A), W1(A), R2(B), W2(B), R1(C), W1(C)
  T1      T2
  R(A)
  W(A)
          R(B)
          W(B)
  R(C)
  W(C)
Schedule (cont.)
• Represents some actual sequence of database actions.
• In a complete schedule, each transaction ends in commit or abort.
• A schedule transforms the database from an initial state to a final state
[Figure: initial state, transformed by a schedule into the final state]
Schedule (cont.)
• Assume a consistent initial state
• A representation of an execution of operations from a set of transactions
• Ignore
  - Aborted transactions
  - Incomplete (not yet committed) transactions
• Operations in a schedule conflict if
  1. They belong to different transactions
  2. They access the same data item
  3. At least one of them is a write operation
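The three conditions translate directly into a predicate over pairs of operations. The `Op` tuple below is an illustrative representation chosen for this sketch, not notation from the lecture.

```python
from typing import NamedTuple

class Op(NamedTuple):
    kind: str   # "R" for read, "W" for write
    txn: int    # transaction number i in Ri(X) / Wi(X)
    item: str   # data item X

def conflicts(p: Op, q: Op) -> bool:
    """True if p and q conflict under the three conditions above."""
    return (p.txn != q.txn               # 1. different transactions
            and p.item == q.item         # 2. same data item
            and "W" in (p.kind, q.kind)) # 3. at least one is a write

print(conflicts(Op("R", 1, "A"), Op("W", 2, "A")))  # True
print(conflicts(Op("R", 1, "A"), Op("R", 2, "A")))  # False: two reads
```

Two reads never conflict, and neither do operations on different items or within the same transaction.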
Anomalies with Concurrency
• Interleaving transactions may cause many kinds of consistency problems (below, Ci means Ti commits and Ai means Ti aborts)
• Reading uncommitted data ("dirty reads"):
  R1(A), W1(A), R2(A), W2(A), C2, R1(B), A1
• Unrepeatable reads:
  R1(A), R2(A), W2(A), C2, R1(A), W1(A), C1
• Overwriting uncommitted data (lost update):
  R1(A), R2(A), W2(A), W1(A)
Anomalies with Concurrency
• Incorrect summary problem
  - Data items may be changed by one transaction while another transaction is in the process of calculating an aggregate value
  - A correct "sum" may be obtained prior to any change, or immediately after any change
Serial Schedule
• An acceptable schedule must transform the database from one consistent state to another consistent state
• Serial schedule: one transaction runs entirely before the next transaction starts.
  T1: R(X), W(X)
  T2: R(X), W(X)
  Serial:     R1(X) W1(X) C1 R2(X) W2(X) C2
  Serial:     R2(X) W2(X) C2 R1(X) W1(X) C1
  Non-serial: R1(X) R2(X) W2(X) W1(X) C1 C2
Serial Schedule IS Acceptable
• Serial schedules guarantee transaction isolation and consistency
• Different serial schedules can have different final states
  - N transactions may form N! different serial schedules
  - Any state from a serial schedule is acceptable: the DBMS makes no guarantee about the order in which transactions are executed
Example: Serial Schedules
• T1: R(X), X=X+10, W(X)
  T2: R(X), X=X*2, W(X)
  Initial: X = 20
  S1: R1(X) W1(X) C1 R2(X) W2(X) C2    Final: X = 60
  S2: R2(X) W2(X) C2 R1(X) W1(X) C1    Final: X = 50
Is Non-Serial Schedule Acceptable?
T1: R(X), X=X*2, W(X), R(Y), Y=Y-5, W(Y)
T2: R(X), X=X+10, W(X)
Initial: X=20, Y=35
S1: R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y) C1 C2    Final: X=50, Y=30
S2: R1(X) W1(X) R1(Y) W1(Y) C1 R2(X) W2(X) C2    Final: X=50, Y=30
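One way to check this is to execute both schedules and compare the final states. The sketch below uses a simple model that is an assumption of this example: Ri(X) copies X from the database into Ti's private workspace, and Wi(X) writes back the result of Ti's computation on that copy. Commits are ignored, and the `UPDATES` table and `execute` function are illustrative names.

```python
import re

# What each transaction computes between reading and writing an item.
UPDATES = {
    (1, "X"): lambda x: x * 2,    # T1: X = X * 2
    (1, "Y"): lambda y: y - 5,    # T1: Y = Y - 5
    (2, "X"): lambda x: x + 10,   # T2: X = X + 10
}

def execute(schedule, db):
    """Run a schedule such as 'R1(X) W1(X) ...' and return the final state."""
    db = dict(db)
    local = {}   # (txn, item) -> value the transaction read
    for kind, txn, item in re.findall(r"([RW])(\d+)\((\w+)\)", schedule):
        key = (int(txn), item)
        if kind == "R":
            local[key] = db[item]
        else:
            db[item] = UPDATES[key](local[key])
    return db

initial = {"X": 20, "Y": 35}
s1 = "R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y) C1 C2"
s2 = "R1(X) W1(X) R1(Y) W1(Y) C1 R2(X) W2(X) C2"
bad = "R1(X) R2(X) W1(X) W2(X) R1(Y) W1(Y) C1 C2"   # not from the slide

print(execute(s1, initial))   # {'X': 50, 'Y': 30}
print(execute(s2, initial))   # {'X': 50, 'Y': 30}
print(execute(bad, initial))  # {'X': 30, 'Y': 30}: T1's update to X is lost
```

S1 is non-serial yet ends in the same state as the serial schedule S2, so it is acceptable. The extra schedule `bad` shows that not every interleaving is: it ends with X=30, which neither serial order (X=50 or X=60) produces.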
Serializable Schedules
• Serializable schedule: equivalent to a serial schedule of committed transactions.
  - May be non-serial (allows concurrent execution)
  - Acceptable (final state is what some serial schedule would have produced)
• Types of serializable schedules depend on how the equivalence is defined
  - Conflict: based on conflicting operations
  - View: based on viewing of data
• Ex: p.645, text does not show commits
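Conflict serializability is commonly tested with a precedence graph, a technique not shown on the slide: add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when the graph has no cycle. The sketch below assumes every transaction in the schedule commits and ignores commit and abort actions; the function names are illustrative.

```python
import re
from itertools import combinations

def parse(schedule):
    """'R1(X) W2(X)' -> [('R', 1, 'X'), ('W', 2, 'X')]; C/A actions are skipped."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([RW])(\d+)\((\w+)\)", schedule)]

def precedence_edges(ops):
    edges = set()
    # combinations() yields pairs in schedule order (earlier op first).
    for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2):
        if t1 != t2 and x1 == x2 and "W" in (k1, k2):
            edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    ops = parse(schedule)
    edges = precedence_edges(ops)
    nodes = {t for _, t, _ in ops}
    # Repeatedly remove transactions with no incoming edge;
    # if none can be removed while nodes remain, there is a cycle.
    while nodes:
        free = {n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)}
        if not free:
            return False
        nodes -= free
    return True

print(is_conflict_serializable("R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y) C1 C2"))  # True
print(is_conflict_serializable("R1(X) R2(X) W2(X) W1(X) C1 C2"))              # False
```

The second schedule is the lost-update pattern: T1 must precede T2 (R1 before W2) and T2 must precede T1 (R2 before W1), so no equivalent serial order exists.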
Lock-Based Concurrency Control
• Strict Two-phase Locking (Strict 2PL) Protocol:
  - Each transaction must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
  - All locks held by a transaction are released when the transaction completes
  - If a transaction holds an X lock on an object, no other transaction can get a lock (S or X) on that object.
• Strict 2PL allows only serializable schedules.
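The locking rules can be sketched as a small lock table. This is a simplified illustration, not a real lock manager: `LockTable` and its methods are hypothetical names, a denied request simply returns False instead of blocking, and there is no wait queue or deadlock handling.

```python
class LockTable:
    def __init__(self):
        self.locks = {}   # object -> (mode, set of transaction ids holding it)

    def acquire(self, txn, obj, mode):
        """Request an 'S' or 'X' lock. True if granted, False if txn must wait."""
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # Sole holder: a repeat request, or an upgrade from S to X.
            if mode == "X":
                self.locks[obj] = ("X", holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)          # shared locks are compatible
            return True
        return False                  # any combination involving X conflicts

    def release_all(self, txn):
        """Strict 2PL: called only when txn completes (commit or abort)."""
        for obj in list(self.locks):
            _, holders = self.locks[obj]
            holders.discard(txn)
            if not holders:
                del self.locks[obj]

table = LockTable()
print(table.acquire(1, "A", "S"))   # True
print(table.acquire(2, "A", "S"))   # True: S locks can be shared
print(table.acquire(2, "A", "X"))   # False: T1 still holds an S lock
table.release_all(1)                # T1 completes
print(table.acquire(2, "A", "X"))   # True: T2 is now the sole holder
```

Because locks are released only at completion, no other transaction can read or overwrite data written by a transaction that has not yet finished.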
Cascading Aborts
• When a transaction aborts, all its actions are undone. The DBMS uses a log to keep track of the actions of each transaction
• If T2 reads uncommitted data written by T1 (dirty read) and T1 is aborted, then T2 must also be aborted (cascading abort)
  T1: R(A) W(A)             … Abort
  T2:           R(A) W(A) …
• Cascadeless schedule: transactions only read data written by committed transactions
Recoverability
• If a transaction fails, the DBMS must return the DB to its previous state
  1. Computer failure: hardware, software, network, memory error
  2. Transaction error: erroneous input, division by zero
  3. Local errors: insufficient funds, data not found
  4. Concurrency control enforcement: transaction aborted
  5. Disk failure: hard disk crash (listed in text but not much different from 1.)
  6. Physical catastrophe: power, theft, fire, etc.
Recoverability
• If T1 reads data from T2, commits, and then T2 needs to abort, what should the DBMS do? This situation is undesirable!
• A schedule is recoverable if every transaction commits only after all transactions from which it reads data have committed.
• Cascadeless schedules are recoverable (but not vice versa!).
• Real systems typically ensure that only recoverable schedules arise (through locking).
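Both definitions can be checked mechanically on a schedule written in the Ri(X), Wi(X), Ci, Ai notation. The sketch below is deliberately simplified and its names are illustrative: a transaction "reads from" whichever other transaction most recently wrote the item, and writes of aborted transactions are not undone before later reads.

```python
import re

def parse(schedule):
    """'W1(A) R2(A) C1 A2' -> [('W',1,'A'), ('R',2,'A'), ('C',1,None), ('A',2,None)]"""
    return [(k, int(t), x or None)
            for k, t, x in re.findall(r"([RWCA])(\d+)(?:\((\w+)\))?", schedule)]

def analyse(schedule):
    """Return (recoverable, cascadeless) for the schedule."""
    last_writer = {}   # item -> transaction that wrote it most recently
    commit_pos = {}    # txn -> position of its commit in the schedule
    reads_from = []    # (reader, writer, writer had committed at read time)
    for pos, (kind, txn, item) in enumerate(parse(schedule)):
        if kind == "W":
            last_writer[item] = txn
        elif kind == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.append((txn, writer, writer in commit_pos))
        elif kind == "C":
            commit_pos[txn] = pos
    # Cascadeless: every read sees data written by an already committed txn.
    cascadeless = all(done for _, _, done in reads_from)
    # Recoverable: a committed reader commits after each writer it read from.
    recoverable = all(
        w in commit_pos and commit_pos[w] < commit_pos[r]
        for r, w, _ in reads_from if r in commit_pos
    )
    return recoverable, cascadeless

print(analyse("R1(A) W1(A) R2(A) W2(A) C2 R1(B) A1"))  # (False, False)
print(analyse("W1(A) R2(A) C1 C2"))                     # (True, False)
print(analyse("W1(A) C1 R2(A) C2"))                     # (True, True)
```

The first schedule is the dirty-read pattern: T2 commits after reading T1's uncommitted write, and T1 then aborts, so the schedule is not recoverable. The second is recoverable but not cascadeless, which illustrates the "not vice versa" remark.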
Summary
• Transactions model the DBMS's view of user programs
• Concurrency control and recovery are important issues in DBMSs
• Transactions must have the ACID properties
  - Atomicity
  - Consistency
  - Isolation
  - Durability
• C & I are guaranteed by concurrency control
• A & D are guaranteed by crash recovery
Summary (cont.)
• A schedule models concurrent execution of transactions
• Conflicts arise when two transactions access the same object and one of the transactions is modifying it
• Serial execution is our model of correctness
• Serializability allows us to "simulate" serial execution with better performance
• Concurrent execution should avoid cascading aborts and be recoverable