On Transactions


Most databases allow multiple users to
execute simultaneously on common data.
Each user runs his or her own process,
possibly accessing the same data (relations,
rows, objects).
Without adequate control, such concurrent
access can lead to:
--inconsistent data in the database
--wrong results
For example, an inconsistent view of data.
Consider an accounts relation:
A(aid, balance, owner)
Assume a depositor Smith has two accounts:
(A1, $900, smith) (A2, $100, smith)
Consider two processes P1 and P2, where:
P1: move $400 from A1 to A2
P2: perform a credit check on depositor Smith and if
total balance in the bank is at least $900, issue a
credit card
Only P1 updates the database. Hence the three
possible states in which we can find Smith's
balances are as follows:
State 1: A1.balance=$900, A2.balance=$100
Values before any update from P1 takes place
State 2: A1.balance=$500, A2.balance=$100
Values after subtracting $400 from A1.balance
State 3: A1.balance=$500, A2.balance=$500
Values after adding $400 to A2.balance
--- NOTE that State 2 is an intermediate state and should
NOT be visible to process P2 (inconsistent view): P2 would see
a total of $600 and wrongly refuse the credit card.

To avoid inconsistent views and a number of other
problems that can arise with concurrent access, the
DBMS provides a feature called a transaction.
A transaction makes programming easier for the
database programmer. One of the guarantees of
this feature is that whatever is declared as a
transaction runs in ISOLATION, i.e., other
processes cannot interfere with it.
Using the notion of a transaction, each process is
able to "package" together a series of database
operations that should be executed in isolation.
In general:

Transaction: a means by which an
application programmer can “package”
together a sequence of database operations,
so that this part of the program is executed
with the ACID properties of a transaction.

If a transaction contains both reads and
updates, it represents an attempt by the
programmer to change the state of the database.

If it contains only reads, it is an attempt to view data from the
database.
NOTE: there is NO begin-transaction
statement in standard SQL-92.
A transaction begins when there is no active
transaction in progress and an SQL
statement is performed to access the data.
Hence starting the application program,
or DECLAREing a cursor, is NOT the
beginning of a transaction.

But when we issue a SELECT…FROM…,
UPDATE, INSERT, DELETE or OPEN
CURSOR, and there is no active transaction in
progress, a new transaction starts.

While a transaction is in progress, the updates it
makes cannot be 'seen', and the data it reads cannot be
updated, by other concurrent transactions.

There are TWO SQL statements in an application
program to end a transactional execution:

(1) COMMIT - the programmer uses this
statement to inform the DBMS that the
ongoing transaction has completed
successfully.
(Then all the updates of this transaction
become persistent and visible to other
transactions.)
(2) ROLLBACK - the programmer tells the
DBMS that the ongoing transaction has
finished unsuccessfully (aborted).
(Then all the updates of this transaction are undone.)
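A minimal sketch of COMMIT and ROLLBACK, assuming Python's sqlite3 module with its default settings (it opens a transaction implicitly before the first UPDATE, matching the "no begin-transaction statement" note). The relation A and Smith's balances come from the earlier example; the transfer function and its insufficient-funds check are illustrative assumptions.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction."""
    try:
        # The first UPDATE implicitly starts the transaction.
        conn.execute("UPDATE A SET balance = balance - ? WHERE aid = ?", (amount, src))
        (bal,) = conn.execute("SELECT balance FROM A WHERE aid = ?", (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE A SET balance = balance + ? WHERE aid = ?", (amount, dst))
        conn.commit()      # COMMIT: both updates become persistent and visible
        return True
    except ValueError:
        conn.rollback()    # ROLLBACK: the partial update is undone
        return False

def balances(conn):
    return dict(conn.execute("SELECT aid, balance FROM A ORDER BY aid"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (aid TEXT PRIMARY KEY, balance INTEGER, owner TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)",
                 [("A1", 900, "smith"), ("A2", 100, "smith")])
conn.commit()

ok1 = transfer(conn, "A1", "A2", 400)      # succeeds and commits
ok2 = transfer(conn, "A1", "A2", 10_000)   # fails and rolls back
```

After the failed transfer the balances are the same as after the first one, so the intermediate state (money taken from A1 but not yet added to A2) never becomes permanent.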

NOTE: if neither a Commit nor a Rollback
statement is executed for a transaction in
progress before the application program
terminates, then the system applies a default
action (Commit or Rollback).
Which one? It depends on the DBMS.

Question: why do we need to issue 'rollback' or
'commit' on our own?

It is usually bad programming practice to keep a
transaction active across user interactions.
[Add a commit before requesting user input]
Why?

So, the system does not know the limits of a
transaction.

Rather it is ??? when a transaction ends.

What problems were there before the use of
transactions?
1. Creation of INCONSISTENT RESULTS

2. Errors under concurrent execution (the
inconsistent analysis problem)

3. Uncertainty as to which changes become
permanent.

Examples:
-- CASE 1: The Lost Update Problem
T1                  T2
read_item(X);
X:=X-N;
                    read_item(X);
                    X:=X+M;
write_item(X);
read_item(Y);
                    write_item(X);
Y:=Y+N;
write_item(Y);
Item X has an incorrect value because its update
by T1 is "lost" (overwritten) by T2's write_item(X).
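The lost update can be replayed with a small simulation. This is only a sketch: the step format (txn, op, item, delta), the run function and the starting values X=100, Y=50, N=10, M=20 are assumptions made for illustration.

```python
def run(schedule, db):
    """Run (txn, op, item[, delta]) steps against a copy of db.

    'read' copies the item into the transaction's private workspace,
    'add' changes the private copy, 'write' copies it back to the database.
    """
    db = dict(db)
    local = {}
    for txn, op, item, *delta in schedule:
        if op == "read":
            local[txn, item] = db[item]
        elif op == "add":
            local[txn, item] += delta[0]
        elif op == "write":
            db[item] = local[txn, item]
    return db

N, M = 10, 20
T1 = [("T1", "read", "X"), ("T1", "add", "X", -N), ("T1", "write", "X"),
      ("T1", "read", "Y"), ("T1", "add", "Y", +N), ("T1", "write", "Y")]
T2 = [("T2", "read", "X"), ("T2", "add", "X", +M), ("T2", "write", "X")]

# The interleaving of CASE 1: T2 reads X before T1 writes it back.
lost_update = [T1[0], T1[1], T2[0], T2[1], T1[2], T1[3], T2[2], T1[4], T1[5]]

start = {"X": 100, "Y": 50}
serial = run(T1 + T2, start)            # X = 100 - N + M
interleaved = run(lost_update, start)   # X = 100 + M, T1's update is lost
```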
-- CASE 2: The Dirty Read Problem
T1                  T2
read_item(X);
X:=X-N;
write_item(X);
                    read_item(X);
                    X:=X+M;
                    write_item(X);
read_item(Y);
Trans. T1 fails and must
change the value of X back
to its old value; meanwhile,
T2 has read the "temporary"
incorrect value of X.
-- CASE 3: The Inconsistent Analysis Problem
T1                  T3
                    sum:=0;
                    read_item(A);
                    sum:=sum+A;
                    …
read_item(X);
X:=X-N;
write_item(X);
                    read_item(X);
                    sum:=sum+X;
                    read_item(Y);
                    sum:=sum+Y;
read_item(Y);
Y:=Y+N;
write_item(Y);
T3 reads X after N is subtracted, and
reads Y before N is added, so a wrong
summary is the result (off by N).
RW Conflict (Unrepeatable Read), as
T1 writes (Y) what T3 read.
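The "off by N" claim can be checked with plain arithmetic. The sketch below replays the interleaving step by step; the starting values A=30, X=100, Y=50 and N=10 are assumptions chosen only for illustration.

```python
N = 10
db = {"A": 30, "X": 100, "Y": 50}
correct_total = sum(db.values())   # T1 only moves N from X to Y, so the total never changes

total = 0                 # T3: sum := 0
total += db["A"]          # T3: read_item(A); sum := sum + A
db["X"] -= N              # T1: read_item(X); X := X - N; write_item(X)
total += db["X"]          # T3: reads X after N was subtracted
total += db["Y"]          # T3: reads Y before N is added
db["Y"] += N              # T1: read_item(Y); Y := Y + N; write_item(Y)

error = correct_total - total   # how far T3's summary is from the true total
```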
-- CASE 4:
We normally buffer popular pages in memory, to
reduce I/O.
Then very popular pages remain in main memory
for extended periods.
Problem: After a crash, the contents of main
memory are lost.
(This problem exists independently of concurrency.)
One solution:

After each update, write the buffered page
back to disk (stable storage). Bad. (Why?)
Better solution: use a LOG.

Using the notion of a transaction, it will
become clear when the contents of the LOG
have to be written to disk.

Hence, TRANSACTIONS will be used for
both:
CONCURRENCY & Recovery
How can we ensure Concurrency &
Recovery?
Transactions have 4 basic properties:
Atomic
Consistent
Isolated
Durable
(the ACID properties)

Atomicity: The instructions 'packaged' in a
transaction either all happen or none of
them happens.
(All => the transaction commits
None => the transaction aborts)
--Hence a transaction cannot be left partially
complete.
--Example: if a transaction updates 200
records (e.g. give a 10% raise to all
employees), it cannot end after only 150 of
them were updated.

Consistency:
A transaction preserves the consistency
(integrity constraints) of the database.
During a transaction, the database
may go from a consistent state to an
inconsistent one, but after the transaction
terminates the database is again
consistent (provided the transaction
program is consistent with the DB
constraints).

ISOLATION:
--A transaction executes as if it were the only
transaction running. Thus it is independent
of other concurrently running transactions.
--Hence: no transaction can see intermediate
results of other transactions.

DURABILITY:
After a transaction commits, its updates
(results) persist, even if system failures
occur.

Note:
--Recovery is related to:
Atomicity (a trans. fails on its own) & Durability (system failures)
--Concurrency is related to:
Consistency & Isolation.
First we’ll talk about concurrency. It’s
easier if we consider concurrency on its
own[i.e. let’s assume for the moment that
there are no failures]
 To understand Concurrency we need to
define transaction Histories or Schedules.

NOTE: Concurrent execution of
transactions means that their operations are
interleaved.

Problem: if this interleaving is done
carelessly, it results in errors.
To avoid such errors, some concurrency
control is needed.

When a transaction starts it gets a UNIQUE
ID (transaction identifier).

The most important operations inside a
transaction package are those that read/write
the database. Hence we concentrate on these
only.

Notation: Ri(A)
Meaning: the trans. with ID i performs a read
on data item A.
A can be a table, a record, an object
(granularity depends on the application).

Example: Table X with columns (SSN, Value), containing
a row with SSN = A.
Trans. i performs Ri(A):
Select Value into :progr.val
From X
where SSN=A
Similarly, Wj(B), as part of transaction j:
Update X
set Value=:progr.val
where SSN=B

We sometimes attach the values read/written, e.g.
Ri(A,30), Wj(B,20)

Usually a select_from_where (or an update) reads/writes
more than one item (a whole predicate)
e.g.
Update X
set value = 1.1*value
where SSN between :low and :high
--The DBMS "sees" it as a sequence of writes
Wj(ss1), Wj(ss2)…

For simplicity assume only read/write(in
general we may also have insert)
 In addition we are interested in
Cj (transaction j commits)
Ai (transaction i aborts)
 Then the history or schedule is an
interleaved series of R,W,C,A’s
 Example:
… R2(A) W2(A) R1(A) R3(B) R1(B) C1 R2(B)…
Another representation (time runs downward):
T1          T2          T3
            R2(A)
            W2(A)
R1(A)
                        R3(B)
R1(B)
C1
            R2(B)

How is the notion of transactions supported
in a DBMS?
User1, User2, … User n
   |
Application Programs:
issue calls on behalf of the user: Open Cursor, Update,
Fetch, Select, Insert, Delete, Commit Work, Rollback
   |
Transaction Manager:
intercepts calls, and initiates transactions when
appropriate; assigns number, Ti. Decides which Ti to
abort in event of deadlock; passes on Rollback as
Abort Ti call.
   |
Scheduler:
interprets all calls as sequences of reads and writes.
Assures serializable schedule, using R and W locks.
Detects deadlocks and passes back such information
to TM.

Problem: The scheduler “sees” a sequence
of operations from various transactions.
How does it decide whether this
interleaving produces correct results?
First what is correct?
 A serial execution is correct.
In a serial schedule each transaction finishes
in its entirety before the next one executes.
Example:
(a) Schedule A: T1 followed by T2
T1                  T2
read_item(X);
X:=X-N;
write_item(X);
read_item(Y);
Y:=Y+N;
write_item(Y);
                    read_item(X);
                    X:=X+M;
                    write_item(X);

(b) Schedule B: T2 followed by T1
T1                  T2
                    read_item(X);
                    X:=X+M;
                    write_item(X);
read_item(X);
X:=X-N;
write_item(X);
read_item(Y);
Y:=Y+N;
write_item(Y);
(c) Two schedules with interleaving of operations.
First schedule:
T1                  T2
read_item(X);
X:=X-N;
                    read_item(X);
                    X:=X+M;
write_item(X);
read_item(Y);
                    write_item(X);
Y:=Y+N;
write_item(Y);

Second schedule:
T1                  T2
read_item(X);
X:=X-N;
write_item(X);
                    read_item(X);
                    X:=X+M;
                    write_item(X);
read_item(Y);
Y:=Y+N;
write_item(Y);

The serial schedule is easy for the scheduler
to implement:
--simply delay all other transactions until the
first one finishes. Repeat for second
transaction etc.
(FCFS)
--But : NO concurrency
 Question: How to interleave(i.e. increase
concurrency) while still having correct results?
A schedule is called SERIALIZABLE if it is
equivalent to a serial schedule (of the committed
transactions).
--That is: if it produces the same effect as a serial
schedule. Since a serial schedule is correct,
a serializable schedule is CORRECT.

How to analytically define equivalence?
--Definition: Two operations are called
CONFLICTING if:
1. They are from different transactions,
2. They access the same item, and,
3. At least one of them is WRITE.
--Definition: Two Schedules that contain the
same operations are conflict-equivalent if
conflicting operations appear in the same
order in both schedules.
--Conflict serializable: a schedule that is
conflict equivalent to a serial one.
(Hence, when two operations conflict in a
schedule, the order in which they occur is
important).
3 types of conflicting operations in a
schedule:
(1) Ri(A)…Wj(A)
then in an equivalent serial schedule we
should have: Ti<<Tj
(2) Wk(A)…Rl(A)
then in an equiv. serial: Tk<<Tl
(3) Wp(A)…Wr(A)
then in an equiv. serial: Tp<<Tr
--NOTE: Ri(A)…Rj(A) does NOT imply Ti<<Tj
Also: Ri(A)…Wj(B) does not imply anything
as no conflict exists.
Note: Transitivity holds, hence
Ri(A)……Wk(A)……Rj(A)
gives Ti<<Tk and Tk<<Tj => Ti<<Tk<<Tj

Thus we have one way to check for conflict
serializability.

Consider:
H: R2(A) W2(A) R1(A) R1(B) R2(B) W2(B) C1 C2
--This history (schedule) is not serializable.
Why?
--W2(A) conflicts with R1(A)
=> T2<<T1    (1)
--R1(B) conflicts with W2(B)
=> T1<<T2    (2)
which is a contradiction to (1)

See why this schedule may produce
an incorrect execution:
Suppose A=50, B=50
T1: add A+B, print it
T2: transfer 30 from A to B
R2(A,50) W2(A,20) R1(A,20) R1(B,50)
R2(B,50) W2(B,80) C1 C2
T1 will print A+B=70,
which is wrong: the total is 100 both before and after the transfer.
H’: R1(A) R2(A) W1(A) W2(A) C1 C2
R1(A)…W2(A) => T1<<T2
R2(A)…W1(A) => T2<<T1
Also non-serializable
(this is a lost-update schedule)
H’’: W1(A) W2(A) W2(B) W1(B) C1 C2
W1(A)…W2(A) => T1<<T2
W2(B)…W1(B) => T2<<T1
How can the scheduler check for conflict serializability?
Use a PRECEDENCE GRAPH:
a directed graph where
Vertices: committed trans. of the schedule
Edges: conflicting operations (an edge Ti -> Tj for each conflict that requires Ti<<Tj)

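Serializability Theorem:
A schedule (history) H has an equivalent serial
execution (i.e. it is conflict serializable) iff the precedence
graph of H contains NO cycle.
Example: the precedence graph of H’ has the edges
1 -> 2 (from T1<<T2) and 2 -> 1 (from T2<<T1).
These form a cycle, so H’ is not serializable.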
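The precedence-graph test can be sketched in a few lines. This is an illustrative sketch, not how a real scheduler is built: the function names and the parsing of the Ri(A)/Wi(A) notation are assumptions, and every transaction in the history is assumed to commit.

```python
import re
from itertools import combinations

def parse(history):
    """'R2(A) W2(A) C1' -> [('R', 2, 'A'), ('W', 2, 'A')]; commits are ignored."""
    return [(op, int(t), item)
            for op, t, item in re.findall(r"([RW])(\d+)\((\w+)\)", history)]

def precedence_edges(history):
    """Edge (i, j) whenever an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for (op1, t1, x1), (op2, t2, x2) in combinations(parse(history), 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}                      # node -> "visiting" or "done"

    def visit(n):
        if state.get(n) == "visiting":
            return True             # back edge: a cycle exists
        if state.get(n) == "done":
            return False
        state[n] = "visiting"
        found = any(visit(m) for m in graph[n])
        state[n] = "done"
        return found

    return any(visit(n) for n in graph)

def conflict_serializable(history):
    return not has_cycle(precedence_edges(history))

H = "R2(A) W2(A) R1(A) R1(B) R2(B) W2(B) C1 C2"
```

For H the edges are 2 -> 1 and 1 -> 2, so `conflict_serializable(H)` is False, in line with the argument given for H.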

In general the graph has many nodes, as
there are many transactions.
[Figure: precedence graph over nodes 1, 2, 3, 4 that contains no cycle]
This is serializable.
Equivalent serial: [orders shown in the figure]
-- NOTE

Another form of equivalence is
“view equivalence”
 Also based on read/write operations but less stringent
than “conflict equivalence”
 Two schedules S1 and S2 that contain the same
transactions are said to be view equivalent if:
1. For each data item Q, if trans. Ti reads the initial
value of Q in schedule S1, then Ti must also read the
initial value of Q in S2.
2. For each data item Q, if trans. Ti executes
read(Q) in schedule S1 and that value was
produced by trans. Tj (if any), then Ti must
in schedule S2 also read the value of Q that
was produced by Tj.
3. For each data item Q, the transaction (if
any) that performs the final write (Q)
operation in S1 must perform the final
write(Q) operation in S2.
Conditions 1 and 2 ensure that each trans.
reads the same values in both schedules.
Condition 3, coupled with 1 and 2, ensures that
both schedules result in the same final
system state.
Every conflict-serializable schedule is
view-serializable, but there are view-serializable
schedules that are not conflict-serializable.
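The three conditions translate directly into a check. A minimal sketch, assuming a schedule is given as a list of (txn, op, item) triples with op 'R' or 'W' (a representation chosen here for illustration): a read whose source is None reads the initial value (condition 1), otherwise the source is the transaction that produced the value (condition 2), and the last writer per item covers condition 3.

```python
def view_profile(schedule):
    """Return (reads_from, final_writer) for a list of (txn, op, item)."""
    last_writer = {}     # item -> txn that wrote it most recently
    reads_from = {}      # txn -> list of (item, writer or None), in program order
    for txn, op, item in schedule:
        if op == "R":
            reads_from.setdefault(txn, []).append((item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads_from, last_writer

def view_equivalent(s1, s2):
    # Same operations, same reads-from relation, same final writes.
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)

# A schedule with blind writes, and two serial schedules of the same transactions.
s = [("T2", "R", "Q"), ("T3", "W", "Q"), ("T2", "W", "Q"), ("T5", "W", "Q")]
serial_235 = [("T2", "R", "Q"), ("T2", "W", "Q"), ("T3", "W", "Q"), ("T5", "W", "Q")]
serial_325 = [("T3", "W", "Q"), ("T2", "R", "Q"), ("T2", "W", "Q"), ("T5", "W", "Q")]
```

Here `s` is view equivalent to the serial order T2, T3, T5 but not to T3, T2, T5, because in the latter T2 would read the value written by T3.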

Example
T2              T3              T5
Read(Q)
                Write(Q)
                Commit
Write(Q)
Commit
                                Write(Q)
                                Commit

This schedule is view-equivalent to the serial
schedule <T2,T3,T5> but is not conflict
serializable.
(The reason: writes that do not come after a
read by the same trans., such as write(Q) in T3 and write(Q) in T5.
These are called Blind Writes.
Blind writes appear in any view-serializable
schedule that is not conflict-serializable!)
[Figure: nested classes of schedules]
All schedules
contain View-serializable schedules,
which contain Conflict-serializable schedules,
which contain Serial schedules

Note:
 In conflict-serializable we can create the
dependency-graph and decide whether a schedule is
serializable (no cycles)
 For view-serializability there is no easy way to
decide that(many graphs have to be tested for
cycles which is an NP-complete problem, i.e.
almost certainly we will need an exponential-time
algorithm on the size of the graphs to search)
 Hence view-serializability is of no practical use.
However, checking the dependency graph for
cycles is still not practical if there are many
transactions.
A more practical solution: use locks and 2-phase locking.
(2-phase locking is a protocol that says how
locks are used. Locking on its own is not
enough.)

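First we discuss locks:
There are 2 kinds of locks
-- read (or shared)
-- write (or exclusive)
Truth Table (can a requested lock be granted while another trans. holds a lock on the same item?)
                Held: R     Held: W
Requested: R    Yes         No
Requested: W    No          No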

When Ti issues a Ri(A), the scheduler
intercepts this call and first issues a read
lock on A for i.
---RLi(A)
Similarly, for Wi(A) it issues a write lock
---WLi(A)
Before granting a lock to a transaction for a
data item, the scheduler requires that no other
transaction holds a conflicting lock on the
requested item. Hence a transaction may
wait until no conflicting lock on this item
exists.
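The granting rule can be sketched as a small lock table. This is a simplified illustration under stated assumptions: the LockTable class and its method names are hypothetical, a refused request simply returns False instead of queueing the transaction, and the compatibility matrix is the read/write truth table (only read with read is compatible).

```python
# (held mode, requested mode) -> can both be held by different transactions?
COMPATIBLE = {("R", "R"): True, ("R", "W"): False,
              ("W", "R"): False, ("W", "W"): False}

class LockTable:
    def __init__(self):
        self.held = {}   # item -> list of (txn, mode)

    def request(self, txn, mode, item):
        """Grant and record the lock if possible; return False if txn must wait."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False
        holders.append((txn, mode))
        return True

    def release(self, txn, item):
        """Drop every lock txn holds on item."""
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]

lt = LockTable()
r1 = lt.request("T1", "R", "A")   # granted
r2 = lt.request("T2", "R", "A")   # granted: two read locks are compatible
w2 = lt.request("T2", "W", "A")   # refused: T1 still holds a read lock
```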

NOTE: Conflicting locks work similarly to
the notion of conflicting operations (see
truth table)
Locking on its own is NOT enough.
H1: R1(A) R2(B) W2(B) R2(A) W2(A) R1(B) C1 C2
H1 is not serializable (R1(A)…W2(A) gives T1<<T2, W2(B)…R1(B) gives T2<<T1).
Note, however, that locking alone could allow the
above schedule to happen:
(RU: unlock read_lock
WU: unlock write_lock)

RL1(A) R1(A) RU1(A) RL2(B) R2(B) WL2(B)
W2(B) WU2(B) RL2(A) R2(A) WL2(A) W2(A)
RL1(B) R1(B) C1 C2
Here we get a lock when it is needed and we
let it go after we are done.
Instead:
2-phase locking
Two phases: growing phase
(during which locks are acquired)
and then shrinking phase (when locks are
released)

But the two phases are separate. i.e. after
shrinking phase starts(first lock is released)
no new lock can be obtained.
 A trans. cannot release a lock and then
acquire a new lock.
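Whether a given lock/unlock sequence obeys the two-phase rule can be checked mechanically. A minimal sketch, assuming histories written in the RL/WL/RU/WU notation used above; the function name and the second example history are illustrative assumptions.

```python
import re

def is_two_phase(history):
    """True if no transaction acquires a lock after it has released one."""
    shrinking = set()    # transactions that have already released some lock
    for kind, txn in re.findall(r"\b(RL|WL|RU|WU)(\d+)\(", history):
        if kind in ("RU", "WU"):
            shrinking.add(txn)
        elif txn in shrinking:
            return False          # lock requested during the shrinking phase
    return True

# Lock-as-needed, release-when-done: T1 releases A and later locks B.
not_2pl = ("RL1(A) R1(A) RU1(A) RL2(B) R2(B) WL2(B) W2(B) WU2(B) "
           "RL2(A) R2(A) WL2(A) W2(A) RL1(B) R1(B) C1 C2")

# T1 acquires both locks before releasing either one.
two_phase = "RL1(A) R1(A) RL1(B) R1(B) RU1(A) RU1(B) C1"
```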
It can be proved that the schedules allowed
by 2PL are conflict serializable.
(Note: there are a few serializable
schedules that 2PL would not allow. But
2PL has the advantage of being a practical
solution for concurrency.)
[Figure: locks acquired by a trans. plotted against time]
Problem: Using locks may lead to deadlock
T4                      T2
RL(A)
R(A)
                        WL(B)
                        W(B)
RL(B), R(B)  (waits)
                        WL(A)  (waits)

How to deal with deadlock:
(A). Deadlock detection
WAITS-FOR GRAPH
(nodes: uncommitted transactions)
[Figure: waits-for graph over T1, T2, T3]

The scheduler creates a new node when a
new trans. starts. It puts in an edge when a trans.
waits (for another trans.) and takes the edge
away when the waiting is done. It takes a node
away when a trans. commits.
The scheduler tests this graph for cycles at
regular intervals.
If a cycle (deadlock) is found, then the TM is
informed and chooses a victim transaction
to abort.
The aborted trans. will be retried later.
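The bookkeeping described above can be sketched as follows. The WaitsForGraph class and its method names are hypothetical, and the cycle search is a plain depth-first search that favours brevity over efficiency.

```python
class WaitsForGraph:
    def __init__(self):
        self.waits = {}   # txn -> set of txns it is waiting for

    def start(self, txn):
        self.waits.setdefault(txn, set())

    def wait(self, txn, holder):
        self.waits.setdefault(holder, set())
        self.waits.setdefault(txn, set()).add(holder)

    def stop_waiting(self, txn, holder):
        self.waits[txn].discard(holder)

    def remove(self, txn):
        """Called when txn commits or is aborted as a deadlock victim."""
        self.waits.pop(txn, None)
        for targets in self.waits.values():
            targets.discard(txn)

    def find_deadlock(self):
        """Return the transactions on some cycle, or None if there is none."""
        def dfs(node, path):
            if node in path:
                return path[path.index(node):]
            for nxt in self.waits.get(node, ()):
                cycle = dfs(nxt, path + [node])
                if cycle:
                    return cycle
            return None

        for txn in self.waits:
            cycle = dfs(txn, [])
            if cycle:
                return cycle
        return None

g = WaitsForGraph()
g.wait("T4", "T2")   # T4 waits for a lock held by T2
g.wait("T2", "T4")   # T2 waits for a lock held by T4: deadlock
deadlock = g.find_deadlock()
```

Aborting one of the transactions on the cycle (here via `g.remove`) breaks the deadlock.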

(B). Deadlock Prevention
We can prevent deadlocks by giving each
transaction a priority and ensuring that lower
priority trans. are not allowed to wait for higher
priority ones (or vice versa).
(One way to assign priorities: timestamps.
The older transaction, with the lower timestamp, has
higher priority.)
* Two ways to prevent:
-- Wait-die: Suppose Ti requests a lock and Tj
holds a conflicting lock already.
If Ti has higher priority than Tj (TS(Ti) < TS(Tj))
=> Ti waits
else Ti aborts (i.e. a lower priority transaction is killed)
(Note: the trans. holding the lock is not affected)
-- Wound-wait: Ti requests a lock and Tj holds a
conflicting lock already.
If Ti has higher priority than Tj (TS(Ti) < TS(Tj))
=> Tj aborts
(now a higher priority trans. can preempt the
trans. that already has the lock)
else Ti waits
-- In both cases no deadlock can occur.
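The two policies reduce to one comparison each. A minimal sketch, assuming priorities are timestamps (lower timestamp = older = higher priority); the function names and the returned strings are illustrative.

```python
def wait_die(ts_requester, ts_holder):
    """Older requester waits; younger requester dies (aborts)."""
    return "wait" if ts_requester < ts_holder else "abort requester"

def wound_wait(ts_requester, ts_holder):
    """Older requester wounds (aborts) the holder; younger requester waits."""
    return "abort holder" if ts_requester < ts_holder else "wait"

# Ti (timestamp 1) is older than Tj (timestamp 2).
older_requests = (wait_die(1, 2), wound_wait(1, 2))
younger_requests = (wait_die(2, 1), wound_wait(2, 1))
```

In both policies the transaction that gets aborted is always the younger one; they differ in whether the requester or the lock holder is the one sacrificed.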

In Wait-Die
lower priority trans. can never wait for
higher priority trans. (the lower priority one aborts)
In Wound-Wait
higher priority trans. never wait for
lower priority ones (the lower priority one aborts)
Difference: Wound-Wait is preemptive
(a running trans. can be aborted if a higher
priority trans. asks for its locks => work is lost)
In wait-die younger transactions don't have a
chance!
Which one to choose depends on the transaction
workload and application.
As said, a usual trans. priority is its timestamp.
Note: if a trans. is aborted for deadlock prevention,
it must be restarted with the same timestamp as
before, so as to avoid repeated aborts. (Why? It will
eventually become the oldest trans., i.e. the one with the highest priority,
and will get the locks!)

In practice: strict-2PL
Release all locks together, when the trans. ends
(easier to implement, as you don't need to
know when shrinking starts, and it safeguards
against cascading aborts)
But: it limits concurrency more than classical
2PL.

There is also a version
of strict 2PL called
conservative-2PL
A transaction gets all the locks that it will ever
need when it starts (or else it keeps waiting
until it can get them all).
Advantage: no deadlock (deadlock
prevention)
Disadvantage: limits concurrency, as a
transaction gets all its locks earlier than
actually needed.

Obviously:
[Figure: nested sets, Conservative 2PL inside Strict 2PL inside 2PL]

Definition: a schedule is called strict if a
value written by a trans. T is not read or
overwritten by another trans. until T has
aborted or committed.

A schedule is called recoverable if its
transactions commit only after all
transactions whose changes they read
commit.
A schedule avoids cascading aborts if
aborting a transaction can be accomplished
without cascading the abort to other
transactions.
strict => avoids casc. aborts => recoverable
[Figure: nested sets, Strict inside Avoid casc. aborts inside Recoverable]

Strict-2PL produces strict schedules
=> it does not create cascading
aborts & is recoverable
2PL could have casc. aborts
=> in practice strict-2PL is common
[Figure: Venn Diagram for Classes of Schedules. Regions: All Schedules, View Serializable, Conflict Serializable, Recoverable, Avoid Cascading Abort, Strict, serial; sample schedules S1 to S12 are placed in the regions.]
Concurrency control without locks

To avoid deadlock there are other
techniques for concurrency control:
-- Timestamps (pessimistic, as 2PL)
-- Multiversioning (pessimistic, as 2PL)
-- Optimistic CC
(2PL, timestamps and multiversioning are
examples of pessimistic conc. control,
for systems that expect a lot of conflicts.
We also have optimistic schemes that are
more efficient if the number of conflicts is
relatively low.)

Another technique for concurrency control:
Timestamping
A timestamp: a unique id that identifies a
transaction. It is created by the DBMS.
Assume the DBMS has a counter, and when a
transaction starts it is given the counter's value as its timestamp
(like the TID, but now it is used for
implementing concurrency control).
Timestamps determine the serializability
order (i.e. no locks are used).

If TS(Ti) < TS(Tj) then the system must
guarantee that the produced schedule is
equivalent to a serial one where Ti is before
Tj.
How is that ensured?
When an item is accessed by more than one
transaction, it is accessed in an order that
does not violate serializability.

Each item x has two variables associated
with it:
1. read_timestamp(x): the largest timestamp
of a transaction that has already read item x,
successfully.
2. write_timestamp(x): the largest
timestamp of a transaction that has already
written item x, successfully.
So when a transaction T issues a read(x) or
write(x), the timestamp TS(T) is compared
with read_TS(x) and write_TS(x) to check whether
the order is violated.
If the serializability order is violated, then T
is aborted (rolled back) and resubmitted (with
a new TS).
Protocol:
(1) trans. T issues a write(x) operation:
a. if read_TS(x) > TS(T) => abort &
rollback T (why? some later trans. T1 has
already read this item)
b. if write_TS(x) > TS(T) then do NOT
execute the write(x) of T; just continue (some later
T1 has already written a later value for x).
c. else execute write(x) and set
write_TS(x) = TS(T).

(2) T issues a read(X) operation:
a. if write_TS(X) > TS(T) then abort &
rollback trans. T (since it would read a value written
by a later trans.).
b. else (i.e. write_TS(X) <= TS(T)), execute
read(X) and set
read_TS(X) = MAX(TS(T), current read_TS(X))
(This is a TS disadvantage: even for a read we may
have to update data)

Explanations:
2.a: T tries to read a value of X which was already
overwritten.
Hence reject this read and roll back T.
1.a: the value of X that T is producing was
previously needed (and it was assumed that it would
never be produced). Hence the write is rejected and T
is rolled back.
1.b: T tries to write an obsolete value. No need to
do that. Just continue. [This is the Thomas Write Rule.
It allows T to continue on obsolete writes instead of
aborting => increases concurrency.]
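Rules (1) and (2) can be sketched as a tiny scheduler. The class and method names are hypothetical; timestamps are assumed to be positive integers, with 0 standing for "never read/written". With thomas_write_rule=False, case 1.b aborts the writer instead of skipping the write.

```python
class Aborted(Exception):
    """Raised when the issuing transaction must be rolled back."""

class TimestampScheduler:
    def __init__(self, thomas_write_rule=True):
        self.twr = thomas_write_rule
        self.read_ts = {}    # item -> largest TS of a successful reader
        self.write_ts = {}   # item -> largest TS of a successful writer
        self.value = {}

    def read(self, ts, x):
        if self.write_ts.get(x, 0) > ts:                      # rule 2.a
            raise Aborted(f"read({x}) by TS {ts} arrives too late")
        self.read_ts[x] = max(self.read_ts.get(x, 0), ts)     # rule 2.b
        return self.value.get(x)

    def write(self, ts, x, v):
        """Return True if the write was applied, False if skipped as obsolete."""
        if self.read_ts.get(x, 0) > ts:                       # rule 1.a
            raise Aborted(f"write({x}) by TS {ts}: a later trans. already read {x}")
        if self.write_ts.get(x, 0) > ts:                      # rule 1.b
            if self.twr:
                return False
            raise Aborted(f"write({x}) by TS {ts} is obsolete")
        self.value[x] = v                                     # rule 1.c
        self.write_ts[x] = ts
        return True

s = TimestampScheduler(thomas_write_rule=True)
s.read(1, "A")                          # T1 (TS 1) reads A
applied_t2 = s.write(2, "A", "from T2") # T2 (TS 2) writes A
applied_t1 = s.write(1, "A", "from T1") # T1's write is obsolete and skipped
```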

Note: if T that has aborted is restarted with
same timestamp it is guaranteed to be
aborted again!
(=> use new TS when restart T)
(note: different policy than TS in deadlock
prevention)
Example:
T1: Read(X); Read(Y); Display(X+Y)
T2: Read(X); X=X-50; Write(X); Read(Y); Y=Y+50; Write(Y); Display(X+Y)

Suppose TS(T1) < TS(T2);
then the following schedule is possible
(and it is serializable). Check it!
T1                  T2
Read(X)
                    Read(X)
                    X=X-50
                    Write(X)
Read(Y)
                    Read(Y)
Display(X+Y)
                    Y=Y+50
                    Write(Y)
                    Display(X+Y)

Note: there are serializable schedules that
are possible under 2PL and are not possible
under timestamping and vice versa.
 If Thomas Write Rule is not used then (like
2PL) the TS protocol allows ONLY conflict
serializable schedules.(But each allows
schedules that the other does not)
 With the TWR allowed, the TS protocol
allows some serializable schedules that are
not conflict serializable.

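Ex. (with TS(T1) < TS(T2))
T1              T2
R(A)
                W(A)
                Commit
W(A)
Commit
Not conflict serializable
(T1 << T2 << T1)
but it is still serializable. Why?
Tricky: under the TWR, T1's W(A) is obsolete and is skipped. It is not
seen by anyone, hence it is as if it never happened:
T1: R(A), Commit
T2: W(A), Commit
(T1 followed by T2)
which is serializable!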

TS (with or without TWR) may permit schedules
that are NOT recoverable.
Ex. Assume TS(T1)=1, TS(T2)=2
T1              T2
W(A)
                R(A)
                W(B)
                C
It is not recoverable since T2 reads a change of T1
& commits before T1 commits.
However, it is allowed by the TS protocol (with or
without TWR).

One solution: Buffer writes until a trans.
commits!
Hence W1(A) is buffered until T1 commits
(write_TS(A) is updated though). R2(A), even
though permissible, is not allowed (i.e. T2
blocks) until T1 commits; then A is written
from the buffer to disk & T2 continues.
Note: buffering looks like blocking! (As if there were an
exclusive lock on A!)

Even with this modification TS & 2PL are
still not the same!(one admits schedules that
the other does not & vice-versa)
 Since recoverability is essential, the above
modification is usually needed. But then
2PL seems (and is) more practical than TS
for centralized DBs.
 TS has an advantage in distributed DBs!
Multiversion Schemes

Up to now we ensured serializability by:
(1) wait (locking, 2PL)
(2) abort (timestamps)
There is another approach, under which each
write(x) creates a new version of item x.
With this approach a read(x) never fails: simply
read the appropriate version of item x.
A better C.C. protocol for workloads dominated by
transactions that only read values from the DB.

The concurrency control scheme must
ensure that the selection of the version to be
read is done in a manner that ensures
serializability.
 This must be done quickly for good
performance. Again we will use timestamps.
 Each trans. Ti gets a unique, static
timestamp TS(Ti).
 With each data item Q, a sequence of
versions < Q1, Q2, …, Qm > is associated.
Each version Qk has three data fields:
- content (the value of version Qk)
- write_timestamp(Qk): the timestamp of the trans.
that created Qk.
- read_timestamp(Qk): the largest timestamp of
any trans. that successfully read version Qk.
A trans. Ti creates Qk on data item Q by issuing a
write(Q) operation. The content of Qk is written
by Ti and the write_TS and read_TS of Qk are
initialized to TS(Ti).
The read_TS of Qk is updated when a trans. Tj
reads Qk and read_TS(Qk) < TS(Tj).

The following multiversion protocol ensures
serializability.
- Assume Ti issues read(Q) or write(Q).
- Let Qk be the version of Q with the largest
write_TS(Qk) <= TS(Ti)
(1) if Ti issues a read(Q) then the value
returned is the content of Qk.
(2) if Ti issues a write(Q) and
TS(Ti) < read_TS(Qk) then Ti is rolled back.
Else (TS(Ti) >= read_TS(Qk)) a new version
of Q is created.
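A minimal sketch of the multiversion rules for a single data item. The class name is hypothetical and timestamps are assumed to be positive integers, with the initial version carrying timestamp 0. One detail is an added assumption, not stated in the rules above: a transaction that writes the same item twice overwrites its own version instead of creating a second one.

```python
class Aborted(Exception):
    """Raised when the writing transaction must be rolled back."""

class MultiversionItem:
    def __init__(self, initial):
        # Each version is [write_ts, read_ts, content].
        self.versions = [[0, 0, initial]]

    def _version_for(self, ts):
        # The version with the largest write_ts not exceeding ts.
        return max((v for v in self.versions if v[0] <= ts), key=lambda v: v[0])

    def read(self, ts):
        v = self._version_for(ts)
        v[1] = max(v[1], ts)          # update read_TS of the version read
        return v[2]

    def write(self, ts, content):
        v = self._version_for(ts)
        if ts < v[1]:
            raise Aborted("a later trans. has already read this version")
        if v[0] == ts:
            v[2] = content            # assumption: overwrite own version
        else:
            self.versions.append([ts, ts, content])

q = MultiversionItem(100)
seen_by_t3 = q.read(3)        # T3 reads the initial version
try:
    q.write(2, 200)           # T2 comes too late: T3 already read that version
    t2_aborted = False
except Aborted:
    t2_aborted = True
q.write(5, 500)               # T5 creates a new version
seen_by_t4 = q.read(4)        # T4 still reads the old version, without waiting
seen_by_t6 = q.read(6)        # T6 reads T5's version
```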
Justification
(1) the trans. needs to read the most recent
version relative to its timestamp.
(2) A trans. will abort if it comes "too late"
in doing a write (some other transaction that
was later has already read the item).
Advantage: a read operation never fails and
never has to wait. Thus good for systems
with many reads and few writes.

Disadvantage:
(a) when read(Q), need to update the
read_TS(Q) (i.e. maybe a second I/O)
(b) conflicts are still resolved through
rollbacks (as with timestamp protocols)
instead of waits (as with locking protocols),
which may be too expensive
(a lot of work may have been done on a
transaction that is forced to abort).
Optimistic concurrency control
Previous techniques assume that there is much
contention for common data among
transactions. The concurrency control followed
by 2PL, timestamps and multiversioning is thus
pessimistic.
But this creates overhead in running a
transaction (e.g. keeping the locks, checking
timestamps etc.)
To reduce such overhead we may let
transactions run freely and check them
(validate them) when they finish.

If not many transactions contend for some
resources (data) most of them will pass the
validation test!
 To monitor this execution, a transaction is
said to be in two or three different phases in
its lifetime:
1. Read Phase. During this phase, the
execution of the trans. Ti takes place. The
values of the various data items “read” are
stored in variables local to Ti. All “write”
operations are performed on these local
variables, without updating the database.
2. Validation Phase. When its work is done,
trans. Ti performs a validation test to
determine whether it can copy to the
database the temporary local variables (that
hold the results of its “writes”), without
violating the serializability. (basically it
checks whether its results violate
serializability against the already committed
trans.)
3. Write Phase. If Ti succeeds in validation,
then the actual updates are applied to the
database. Otherwise Ti is rolled back.
To perform the validation test we associate
three different timestamps with Ti:
--Start(Ti): the time Ti started its execution.
--Validation(Ti): the time Ti finished its read
phase and started its validation phase.
--Finish(Ti): the time when Ti finished its
write phase.

The serializability order is determined by the
timestamp ordering technique, using
Validation(Ti) as the transaction's identifying
timestamp, i.e.: TS(Ti) = Validation(Ti)
Then if TS(Tj) < TS(Tk), any produced schedule
should be equivalent to a serial schedule where Tj
is before Tk.
The validation test of transaction Ti requires that
for all Th such that TS(Th) < TS(Ti), one of the
following two conditions must hold:
1. Finish(Th) < Start(Ti). Since Th completes its
execution before Ti starts, serializability is
maintained.
2. The set of data items written by Th does not
intersect with the data items read by Ti, and Th
completes its write phase before Ti's validation
(Start(Ti) < Finish(Th) < Validation(Ti))
--This condition ensures that the writes of Th
and Ti do not overlap. Since the writes of
Th do not affect the reads of Ti, and since Ti
cannot affect the reads of Th,
serializability is maintained.
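The two conditions can be sketched as a validation function. The Txn record and its field names are assumptions made for illustration; times are plain integers and `finish` stays None until the write phase is over.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Txn:
    start: int
    validation: int
    finish: Optional[int] = None              # None while the write phase is not done
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(ti, earlier):
    """earlier: every Th with TS(Th) < TS(Ti), i.e. validated before Ti."""
    for th in earlier:
        if th.finish is not None and th.finish < ti.start:
            continue                          # condition 1
        if (th.finish is not None and th.finish < ti.validation
                and not (th.write_set & ti.read_set)):
            continue                          # condition 2
        return False                          # neither holds: Ti is rolled back
    return True

th = Txn(start=1, validation=3, finish=4, write_set={"B"})
disjoint = Txn(start=2, validation=5, read_set={"A"})     # overlaps Th, reads only A
overlapping = Txn(start=2, validation=5, read_set={"B"})  # read B while Th was writing it
later = Txn(start=5, validation=6, read_set={"B"})        # started after Th finished
```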

Example: Consider T17, T18 with
TS(T17) < TS(T18). Then the following schedule
is serializable and allowed by the validation
protocol (but not allowed by 2PL or
timestamp-ordering protocols)
T17                 T18
read(B)
                    read(B)
                    B:=B-50
                    write(B)
                    read(A)
                    A:=A+50
                    write(A)
read(A)
display(A+B)
                    display(A+B)
Note: the validation
protocol guards against
cascading rollbacks
since the actual writes
take place only after the
validation phase (i.e. once
the trans. has committed).
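[Figure: classes of schedules admitted by the protocols discussed (Optimistic; Conservative 2PL, Strict 2PL, 2PL; Timestamp W/O TWR; Timestamp with TWR; Multiversion), drawn against the Conflict-serializable and Serializable classes]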

Comparison of all CC protocols. An interesting
point:
the schedules admitted by the CC protocols discussed are subsets of
the conflict-serializable schedules, with timestamping under the TWR
as the exception noted earlier (one more
reason why we concentrate on conflict-serializability).
Note that checking the validation criteria
requires the maintenance of lists of objects
read/written by each transaction.

Also: while one transaction is being
validated no other transaction can commit
(otherwise the first trans. could miss
conflicts with respect to the newly
committed trans.)
Hence even optimistic concurrency control
has some overhead: 2PL has lock
maintenance and blocking (=> transactions have to wait),
while Opt. C. Control may require restarting
a transaction.
Transaction Support in SQL-92

A transaction is automatically started when for example
the user writes SELECT, UPDATE, CREATE TABLE,
INSERT etc.
 A transaction can be terminated by
-- a COMMIT command
-- a ROLLBACK command (the SQL keyword for ABORT)
 Each Transaction has
-- access mode
-- isolation level
-- diagnostics (for error conditions)

Access mode
--Read Only: the transaction is not allowed
to modify the DB. (i.e. only shared locks)
--Read Write: the transaction is allowed to
modify the DB.(it’s the default)
 Isolation level: controls the extent to which
a transaction is exposed to other concurrent
transactions.
Choices: -- Read uncommitted
-- Read committed
-- Repeatable Read
-- Serializable
Transaction Isolation Levels in SQL-92
Level               Dirty Read   Unrepeatable Read   Phantom
Read Uncommitted    Maybe        Maybe               Maybe
Read Committed      No           Maybe               Maybe
Repeatable Read     No           No                  Maybe
Serializable        No           No                  No

Recall
--Dirty Read (WR conflict): a transaction could
read an object written by an uncommitted trans.
--Unrepeatable Read (RW conflict): a transaction
could write an object that has been read by an
uncommitted transaction.
--Phantom: a transaction reads a collection of
objects twice and sees different results even
though it did not modify them itself.
Highest degree of isolation (strict 2PL is used):
serializable (default)
Repeatable read is the same as serializable except
that it does not lock "sets" of objects (hence
the phantom phenomenon could occur).
The lower the isolation degree, the less safe
the transaction, but system performance
may improve (some trans. can live with
a few missing values,
e.g. statistical queries => lower isol. degree).