Slides on Normalization
CSE
4701
Chapter 14-1
Towards Normalization of Relations

We take each Relation Individually and “Improve”
Them in Terms of the Desired Characteristics
Normalization Decomposes Relations into Smaller
Relations that Result in
 No Information Loss
 Support for Reconstruction
 No Spurious Joins
Query Execution Time May Increase
 Denormalization May Be Necessary Later on
Objectives: Minimizing
 Redundancy
 Insertion, Deletion, and Update Anomalies
Chapter 14-2
What is the Normalization Process?
Provides DB Designers with the Ability to “Improve”
their Relations
Deal with Redundancies and Anomalies
Normalization Procedure Provides DB Designers with
 A Formal Framework for Analyzing Relation
Schemas based on their Keys and on the Functional
Dependencies among their Attributes
 A Series of Normal Form Tests that can be Carried
out on Individual Relation Schemas so the
Relational DB can be Normalized to Desired Degree
Chapter 14-3
What are Normal Forms?
A Normal Form is a Condition using Keys and FDs to
Certify Whether a Relation Schema meets Criteria
 Primary keys (1NF, 2NF, 3NF)
 All Candidate Keys ( 2NF, 3NF, BCNF)
 Multivalued Dependencies (4NF) - Chapter 15
 Join Dependencies (5NF) - Chapter 15
Figure labels: 1NF, 2NF, 3NF, 4NF, 5NF
Chapter 14-4
How is Normalization Attained?
Typically, Normalization is Attained through a Process
of Decomposition that Breaks Apart Relations to
Remove Redundancies and Anomalies
In Process, we must Maintain Two Properties:
 Lossless Join or Nonadditive Join Property
Guarantees the Spurious Tuple Generation Problem
does not occur on Decomposed Relations


Dependency Preservation Property
Ensures that each FD is Represented in some
Individual Relation(s) after Decomposition
Premise: Relational Schema with Primary Keys and
Functional Dependencies Specified
Chapter 14-5
Recall Key Constraints
Superkey (SK):
 Any Subset of Attributes Whose Values are
Guaranteed to Distinguish Among Tuples
Candidate Key (CK):
 A Superkey with a Minimal Set of Attributes (No
Attribute Can Be Removed Without Destroying the
Uniqueness -- Minimal Identity)
 A Value of an Attribute or a Set of Attributes in a
Relation That Uniquely Identifies a Tuple
 There may be Multiple Candidate Keys
Chapter 14-6
Recall Key Constraints
Primary Key (PK):
 Choose One From Candidate Keys
 The Primary Key Attributes are Underlined
Foreign Key (FK):
 An Attribute or a Combination of Attributes (Say A)
of Relation R1 Which Occurs as the Primary Key of
another Relation R2 (Defined on the Same Domain)
 Allows Linkages Between Relations that are
Tracked and Establish Dependencies
 Useful to Capture ER Relationships
Chapter 14-7
Superkeys vs. Candidate Keys
Superkey of R:
 A Superkey SK is a Set of Attributes of R Such that
No Two Tuples in Any Valid Relation Instance R(r)
will Have the Same Value for SK
 Given R(U), U is the Set of Attributes of R and a
Relation Instance of R, Denoted As R(r), For Any
Distinct Tuples T1 and T2 in R(r), T1[SK] ≠ T2[SK]
 For Cars, Valid Superkeys Must Contain:
SerialNo OR {State, Reg#} OR Both
 For EMPLOYEE, {SSN} is a Key and
{SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are
all SUPERKEYS
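The condition T1[SK] ≠ T2[SK] can be tested against a single relation instance. The sketch below is only an illustration: the CAR tuples are invented, and the helper name `is_superkey_in_instance` is hypothetical. Note that one instance can only refute a superkey claim; it cannot prove that the property holds for every valid instance.

```python
def is_superkey_in_instance(rows, attrs):
    """Return True if no two rows of this instance agree on all of `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False  # two tuples share the same value for attrs
        seen.add(value)
    return True


# Hypothetical CAR tuples, invented for illustration only.
cars = [
    {"State": "CT", "Reg#": "A1", "SerialNo": 100, "Make": "Ford", "Model": "Focus"},
    {"State": "NY", "Reg#": "A1", "SerialNo": 101, "Make": "Ford", "Model": "Focus"},
    {"State": "CT", "Reg#": "B2", "SerialNo": 102, "Make": "Honda", "Model": "Civic"},
]

print(is_superkey_in_instance(cars, ["SerialNo"]))       # True
print(is_superkey_in_instance(cars, ["State", "Reg#"]))  # True
print(is_superkey_in_instance(cars, ["Reg#"]))           # False
print(is_superkey_in_instance(cars, ["Make", "Model"]))  # False
```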
Chapter 14-8
Superkeys vs. Candidate Keys
Candidate Key of R:
 A "Minimal" Superkey: a Candidate Key K is a
Superkey s.t. Removal of any Attribute From K
Results in a Set of Attributes that is Not a Superkey
 Given R(U), U is the Set of Attributes of R and a
Relation Instance of R, Denoted as R(r)
A Superkey K is a Candidate Key iff for any A in K, there can exist
Two Distinct Tuples T1 and T2 in a Valid R(r) such that
T1[K-A] = T2[K-A]
 In Previous, (State, Reg#, Make, Model) is a SK
Is it a CK? Why or Why Not?
Chapter 14-9
Example and Remaining Definitions
Example:
 CAR(State, Reg#, SerialNo, Make, Model, Year)
 Primary key is {State, Reg#}
 It has two candidate keys (also superkeys)

Key1 = {State, Reg#}
 Key2 = {SerialNo}
{SerialNo} can also be Chosen as Primary Key
Definition: Prime Attribute - Attribute A of R that is
Member of some Candidate Key K of R
Definition: Non-Prime Attribute - An Attribute that is
not Prime (i.e., Not a Member of Any Candidate Key)
In WORKS_ON, SSN and Pnumber are PRIME
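Superkeys and candidate keys can also be checked at the schema level with attribute closure. The sketch below assumes that the two stated keys of CAR are the only functional dependencies; the helper names (`closure`, `is_superkey`, `is_candidate_key`) are hypothetical and not part of the slides.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of FDs given as (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a superkey that stops being one if any attribute is removed."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return all(not is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


CAR = {"State", "Reg#", "SerialNo", "Make", "Model", "Year"}
# Assumption: the only FDs are the ones implied by the two keys on the slide.
CAR_FDS = [({"State", "Reg#"}, CAR), ({"SerialNo"}, CAR)]

print(is_candidate_key({"State", "Reg#"}, CAR, CAR_FDS))               # True
print(is_candidate_key({"SerialNo"}, CAR, CAR_FDS))                    # True
print(is_superkey({"State", "Reg#", "Make", "Model"}, CAR, CAR_FDS))   # True
print(is_candidate_key({"State", "Reg#", "Make", "Model"}, CAR, CAR_FDS))  # False
```

Under these assumed FDs, the prime attributes of CAR are State, Reg#, and SerialNo, and the remaining attributes are non-prime.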




Chapter 14-10
First Normal Form (1NF)
All Attributes Must Be Atomic Values:
 Only Simple and Indivisible Values in the Domain
of Attributes.
 Each Attribute in a 1NF Relation is a Single Value
 Disallows Composite Attributes, Multivalued
Attributes, and Nested Relations (Non-Atomic)
A 1NF Relation cannot have as an Attribute Value:
 A Set of Values (Set-Value)
 A Tuple of Values (Nested Relation)
1NF is a Standard Assumption of Relational DBs
Chapter 14-11
One Example of 1NF
Consider Following Department Relation
What is the Inherent Problem?
DLOCATIONS is Multi-valued
Chapter 14-12
What are Possible Solutions?
Decompose: Move the Attribute DLOCATIONS that
Violates 1NF into a Separate Relation
DEPT_LOCATIONS(DNUMBER, DLOCATION)
Expand the key to have a Separate Tuple in the
DEPARTMENT relation for each location (below)
Introduce DLOC1, DLOC2, DLOC3, if there are
Three Maximum Locations
Problems with Each? Best Solution?
Chapter 14-13
Another 1NF Example - Nested Relations
EMP_PROJ - Table and Tuples
Transition to:
Chapter 14-14
Second Normal Form (2NF)
Second Normal Form Focuses on the Concepts of
Primary Keys and Full Functional Dependencies
Intuitively:
 A Relation Schema R is in Second Normal Form
(2NF) if Every Non-Prime Attribute A in R is Fully
Functionally Dependent on the Primary Key
 R can be Decomposed into 2NF Relations via the
Process of 2NF Normalization
 Successful Process Typically Involves Decomposing
R into Two or More Relations
 Iteratively Applying to Each Relation in Schema
Chapter 14-15
Full Functional Dependency
Full FD - Formally:
Given R(U) and X, Y ⊆ U. If X → Y holds, and there
exists no X' such that X' ⊂ X and X' → Y holds over
R, then Y is fully dependent on X, denoted as X → Y
with an f written over the arrow
Full FD - Intuitively: A FD X → Y where Removal of
any Attribute from X means the FD no Longer Holds
 {SSN, PNUMBER} → HOURS is full since Neither
SSN → HOURS nor PNUMBER → HOURS holds
 What about in the Following: {S#, CN} → Grade
Chapter 14-16
Partial Functional Dependency
Partial FD - Formally:
Given R(U) and X, Y ⊆ U. If X → Y holds but Y is not
fully dependent on X, then Y is partially
functionally dependent on X, denoted as X → Y
with a p written over the arrow
Partial FD - Intuitively: Removal of an Attribute from
the L.H.S. still Results in a Valid FD
 {SSN, PNUMBER} → ENAME is Partial since
Removing PNUMBER still Results in the Valid FD
SSN → ENAME
 Are Following Full or Partial?
{S#, CN} → CN, {S#, CN} → S#
{S#, CN, DNAME} → Grade
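The full/partial distinction can be checked mechanically with attribute closure. The following is a minimal sketch, assuming that the listed FDs are the complete set for each example; `closure` and `classify_fd` are hypothetical helper names. Removing one attribute at a time is enough, because any smaller determinant is contained in some set obtained by dropping a single attribute.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of FDs given as (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify_fd(lhs, rhs, fds):
    """Classify lhs -> rhs as 'full', 'partial', or 'does not hold'."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return "does not hold"
    for attr in lhs:
        # If rhs is still determined after dropping one attribute, the FD is partial.
        if rhs <= closure(lhs - {attr}, fds):
            return "partial"
    return "full"


# FDs taken from the two slides above; assumed to be the complete set.
EMP_PROJ_FDS = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]
STUDENT_FDS = [({"S#", "CN"}, {"Grade"})]

print(classify_fd({"SSN", "PNUMBER"}, {"HOURS"}, EMP_PROJ_FDS))      # full
print(classify_fd({"SSN", "PNUMBER"}, {"ENAME"}, EMP_PROJ_FDS))      # partial
print(classify_fd({"S#", "CN", "DNAME"}, {"Grade"}, STUDENT_FDS))    # partial
```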
Chapter 14-17
Second Normal Form (2NF)
Formal 2NF Definition
R ∈ 2NF iff
 (i) R ∈ 1NF;
 (ii) all Non-Key Attributes in R are Fully
Functionally Dependent on Every Key.
Alternative Definition:
R ∈ 2NF iff Every Attribute is Either
 a Member of a Candidate Key, or
 Fully Dependent on Every Key.
Reason: Partial Functional Dependencies may cause
Update Problems
Chapter 14-18
Another Way to View the Problem

If the Primary Key Contains a Single Attribute, then No
Need to Test for Problems
This is 1NF but not 2NF since
 Ename, a Non-Prime Attribute in FD2, Violates 2NF
since it Depends on Part of the Key (SSN)
 Pname and Ploc, two Non-Prime Attributes in FD3,
Violate 2NF since they Depend on Part of the Key (Pnumber)
Chapter 14-19
One Example of 2NF
Consider the Example Below
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
fd1: {S#, CN} → Grade
fd2: S# → DName
fd3: DName → DHead
STUDENT_DEPT ∈ 1NF
But STUDENT_DEPT ∉ 2NF
“{S#, CN} → DName, DHead” is a Partial FD (since S# → DName and
DName → DHead) that Causes Anomalies
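The 2NF test for STUDENT_DEPT can be sketched in code: find the candidate keys, then look for non-prime attributes determined by a proper subset of a key. This is an illustrative sketch only; the helper names are hypothetical, the key search is brute force (adequate for slide-sized schemas), and the FD set is assumed to be exactly fd1, fd2, and fd3.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of FDs given as (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for all minimal superkeys, smallest sets first."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def violations_2nf(attrs, fds):
    """List (part_of_key, non_prime_attribute) pairs that break 2NF."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in closure(part, fds) - prime:
                    found.add((part, attr))
    return sorted(found)


STUDENT_DEPT = {"S#", "DName", "DHead", "CN", "Grade"}
FDS = [
    ({"S#", "CN"}, {"Grade"}),   # fd1
    ({"S#"}, {"DName"}),         # fd2
    ({"DName"}, {"DHead"}),      # fd3
]

print(candidate_keys(STUDENT_DEPT, FDS))
print(violations_2nf(STUDENT_DEPT, FDS))
```

The reported violations are DName and DHead, each determined by S# alone, which is only part of the key {S#, CN}.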
Chapter 14-20
Recall the Anomalies…
STUDENT_DEPT(S#, DName, DHead, CN, Grade)



Insertion Anomalies:
 No Department Can Be Recorded if it has No
Student Enrolled in Courses
Deletion Anomalies:
 Deleting the Last Student in a Department will also
Delete the Department
Update Anomalies:
 Changing the Head of a Department Requires Modifying All
Students in that Department Due to Redundancies
Chapter 14-21
One Example of 2NF (Continued)
Decomposition into 2NF by Separating Course
Information from Department Information (Link S#)
S_D(S#, DName, DHead)
 fd2: S# → DName
 fd3: DName → DHead
S_C(S#, CN, Grade)
 fd1: {S#, CN} → Grade
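To see that this decomposition loses no information, one can project an instance onto S_D and S_C and join the two projections back on S#. The sketch below uses invented STUDENT_DEPT tuples, and the helpers `project`, `natural_join`, and `as_set` are hypothetical names.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(distinct)]


def natural_join(left, right):
    """Natural join on the attributes the two relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical STUDENT_DEPT tuples, invented for illustration only.
student_dept = [
    {"S#": "S1", "DName": "D1", "DHead": "John", "CN": "C1", "Grade": "A"},
    {"S#": "S1", "DName": "D1", "DHead": "John", "CN": "C2", "Grade": "B"},
    {"S#": "S2", "DName": "D1", "DHead": "John", "CN": "C1", "Grade": "B"},
    {"S#": "S3", "DName": "D2", "DHead": "Smith", "CN": "C2", "Grade": "A"},
]

s_d = project(student_dept, ["S#", "DName", "DHead"])
s_c = project(student_dept, ["S#", "CN", "Grade"])
rejoined = natural_join(s_d, s_c)

print(len(s_d), len(s_c), len(rejoined))
print(as_set(rejoined) == as_set(student_dept))  # True: no spurious tuples
```

S_D still repeats the pair (D1, John) for every student in D1, which is the redundancy that 3NF addresses.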
Chapter 14-22
Another Example of 2NF
EMP_PROJ is 1NF with Key {SSN, PNUMBER} but…
 SSN → ENAME - Means ENAME, a Non-Prime
Attribute, Depends Partially on {SSN, PNUMBER},
i.e., Depends on Only SSN and not Both
 PNUMBER → {PNAME, PLOCATION} - Means
PNAME, PLOCATION, two Non-Prime Attributes,
Depend Partially on {SSN, PNUMBER}, i.e., Depend
on Only PNUMBER and not Both
Chapter 14-23
Another Example of 2NF
What Does Decomposition Below Accomplish?
 ENAME Fully Dependent on SSN
 PNAME, PLOC Fully Dependent on PNUMBER

Result: 2NF for EP1, EP2, and EP3
Chapter 14-24
Yet Another Example of 2NF
Consider 1NF LOTS to Track Building Lots for Towns
What is the 2NF Problem?
 FD3: COUNTY_NAME → TAX_RATE Means
TAX_RATE Depends Partially on Candidate Key
{COUNTY_NAME, LOT#}
 All Other Non-Prime Attributes are Fine
Chapter 14-25
Yet Another Example of 2NF
What Does Decomposition Below Accomplish?
 TAX_RATE Fully Dependent on
COUNTY_NAME
Result: 2NF for LOTS1 and LOTS2
Chapter 14-26
Third Normal Form (3NF)
Third Normal Form Focuses on the Concepts of
Primary Keys and Transitive Functional Dependencies
Intuitively:
 A Relation Schema R is in Third Normal Form
(3NF) if it is in 2NF and no Non-Prime Attribute A
in R is Transitively Dependent on Primary Key
 R can be Decomposed into 3NF Relations via the
Process of 3NF Normalization
In X → Y and Y → Z, with X as the Primary Key, there
is a Problem only if Y is not a Candidate Key.
EMP(SSN, Emp#, Salary), SSN → Emp# → Salary
isn’t a Problem Since Emp# is a Candidate Key
Chapter 14-27
Transitive Partial FDs
Transitive FD - Formally:
Given R(U) and X, Y ⊆ U.
If X → Y, Y ⊈ X and Y ↛ X, Y → Z, then Z is called
transitively functionally dependent on X.
Transitive FD - Intuitively: a FD X → Z that can be
derived from two FDs X → Y and Y → Z
SSN → ENAME is non-transitive Since there is no set of
Attributes X where SSN → X and X → ENAME
For FD X → Z that can be derived from two FDs X → Y
and Y → Z, if Y is a Candidate Key – No Problem
Chapter 14-28
Third Normal Form (3NF)
Formal 3NF Definition
R ∈ 3NF iff
(i) R ∈ 2NF;
(ii) No Non-Key Attribute of R is Transitively
Dependent on Any Candidate Key.
Alternative Definition:
R ∈ 3NF iff for every Non-Trivial FD X → Y, either
 X is a superkey, or
 Y is a key (prime) attribute.
Reason: Transitive Functional Dependencies may cause
Update Problems
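The alternative definition translates directly into a check over the given FDs: an FD X → A is acceptable if X is a superkey or A is prime. Below is a minimal sketch applied to S_D(S#, DName, DHead) from the earlier 2NF example; the helper names are hypothetical, and only the listed FDs are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of FDs given as (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for all minimal superkeys, smallest sets first."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def violations_3nf(attrs, fds):
    """List (lhs, attribute) pairs where lhs is not a superkey and attribute is non-prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attrs:
            continue  # X is a superkey
        for attr in sorted(rhs - lhs):  # skip the trivial part of the FD
            if attr not in prime:
                found.append((tuple(sorted(lhs)), attr))
    return found


S_D_FDS = [({"S#"}, {"DName"}), ({"DName"}, {"DHead"})]
print(violations_3nf({"S#", "DName", "DHead"}, S_D_FDS))        # DName -> DHead violates 3NF
print(violations_3nf({"S#", "DName"}, [({"S#"}, {"DName"})]))   # []
print(violations_3nf({"DName", "DHead"}, [({"DName"}, {"DHead"})]))  # []
```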
Chapter 14-29
One Example of 3NF
STUDENT_DEPT(S#, DName, DHead, CN, Grade) ∉ 2NF
S_D(S#, DName, DHead) ∈ 2NF
S_C(S#, CN, Grade) ∈ 2NF
S_D ∉ 3NF
S_C ∈ 3NF
“S# → DHead” is a Transitive FD in S_D and
“DHead” is a non-key attribute since
S# (X) → DName (Y) and DName (Y) → DHead (Z)
fd1: {S#, CN} → Grade
fd2: S# → DName
fd3: DName → DHead
Transitive: S# → DHead
Chapter 14-30
One Example of 3NF
S_C(S#, CN, Grade) ∈ 2NF
S_D(S#, DName, DHead) ∈ 2NF
 fd2: S# → DName
 fd3: DName → DHead
 Transitive fd: S# → DHead
Decompose to Eliminate the Transitivity Within S_D
 S_D(S#, DName) ∈ 3NF
 DEPT(DName, DHead) ∈ 3NF
Chapter 14-31
Another Example of 3NF
EMP_DEPT is 2NF with Key SSN, but there are Two
Transitive Dependencies (Undesirable)
 SSN → DNUMBER and DNUMBER → DNAME
Means DNAME, Neither Key Nor Subset of Key, is
Transitively Dependent on SSN
 SSN is the Only Candidate Key of EMP_DEPT!
 Note: Also Similar Problem with SSN and
DMGRSSN via DNUMBER
Chapter 14-32
Another Example of 3NF
To Attain 3NF, Decompose into ED1 and ED2
Intuitively - we are Separating Out Employees and
Departments from One Another
Chapter 14-33
Yet Another Example of 3NF
Recall 2NF Solution for Building Lots Problem
What is the 3NF Problem? Violates Alternative Definition
 In LOTS1, FD4: AREA → PRICE
AREA is not a Superkey
PRICE is not a Prime Attribute of LOTS1
Chapter 14-34
Yet Another Example of 3NF
Decompose to Introduce a Separate Key AREA
Result: 3NF for LOTS1A and LOTS1B
Chapter 14-35
1NF and 2NF – Maintain FDs!
Chapter 14-36
Transition to 3NF – Maintain FDs!
Chapter 14-37
Summary of Progression – Maintain FDs!
STUDENT_DEPT
1NF
 STUDENT_DEPT(S#, DName, DHead, CN, Grade) with fd1, fd2, fd3
eliminate partial FDs
2NF
 S_C(S#, CN, Grade) with fd1
 S_D(S#, DName, DHead) with fd2, fd3
eliminate transitive FDs
3NF
 S_C(S#, CN, Grade) with fd1
 S_D(S#, DName) with fd2
 DEPT(DName, DHead) with fd3
Chapter 14-38
Summary of 1NF, 2NF, 3NF Concepts
1NF
 Test: Relation should have no nonatomic attributes or nested relations.
 Remedy (Normalization): Form new relations for each nonatomic attribute or nested relation.
2NF
 Test: For relations where primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
 Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
3NF
 Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
 Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
Chapter 14-39
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form Focuses on Searching for
Remaining Anomalies that can Arise in FDs
Intuitively:
 A Relation Schema R is in Boyce-Codd Normal
Form (BCNF) if Whenever an FD X → A Holds in
R, then X is a Superkey of R
 R can be Decomposed into BCNF Relations via the
Process of BCNF Normalization
There exist Relations that are in 3NF but not in BCNF
The Goal is to have each Relation in BCNF (or 3NF)
Chapter 14-40
Boyce-Codd Normal Form (BCNF)
Formal BCNF Definition
R ∈ BCNF iff
(i) R ∈ 1NF;
(ii) for every Non-Trivial FD X → Y, X is a Superkey,
i.e., if X → Y and Y ⊈ X, then X Contains a Key.
Properties of BCNF
 All Non-key Attributes Fully Dependent on Every Key
 All Key Attributes Fully Dependent on the Keys that
they do not Belong to
 No Attribute Fully Dependent on any Set of Non-key
Attributes
Chapter 14-41
Comparing the Normal Forms
Poor Relational Schema Design
Developed as Stepping Stone
1NF
 Eliminate partial FDs of non-key attributes to key
2NF
 Eliminate transitive FDs of non-key attributes to key
3NF
 Eliminate partial and transitive FDs of key attributes to key
BCNF
Eliminate the non-trivial functional dependencies of
non-key attributes to key
Most 3NF Relations are in BCNF - BCNF Eliminates All Update Anomalies
Chapter 14-42
One Example of BCNF
Recall 3NF Solution for Building Lots Problem
Suppose that AREA is Sizes in Acres with
 AREAs in Tolland County 0.5, 0.6, …, 1.0
 AREAs in Windham County 1.1, 1.2, …, 2.0
Adding FD5: “AREA → COUNTY_NAME”
What Does Data in LOTS1A Look like for Given Set
of Properties?
Chapter 14-43
One Example of BCNF
LOTS1A
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9
What is the Problem Here?
 What if you Delete W11?
 You have “Lost” the “Windham, 1.1” Combination
Also - Redundancy since “County Name, Area” is
Repeated in Multiple Tuples Throughout LOTS1A
Even Though LOTS1A is in 3NF - Still Problems
Problems with FD5: “AREA → COUNTY_NAME”
Chapter 14-44
Transition to BCNF – Maintain FDs!
Add new FD5
Chapter 14-45
One Example of BCNF
FD5: “AREA → COUNTY_NAME”
 Satisfies 3NF: COUNTY_NAME is Prime Attribute
 Violates BCNF: AREA not a SuperKey of LOTS1A
So Do One More Split
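The BCNF test is simpler than the 3NF test because it needs only attribute closure: every non-trivial FD must have a superkey on its left-hand side. The sketch below is illustrative. FD5 comes from the slide, while the other two LOTS1A dependencies (PROPERTY_ID# and {COUNTY_NAME, LOT#} each determining the remaining attributes) are assumptions, since the slide showing them did not survive as text. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of FDs given as (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_bcnf(attrs, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    attrs = set(attrs)
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs - lhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attrs
    ]


LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
LOTS1A_FDS = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),   # assumed key FD
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),   # assumed key FD
    ({"AREA"}, {"COUNTY_NAME"}),                           # FD5 from the slide
]

print(violations_bcnf(LOTS1A, LOTS1A_FDS))
# After the split: LOTS1AX(PROPERTY_ID#, AREA, LOT#) and LOTS1AY(AREA, COUNTY_NAME)
print(violations_bcnf({"PROPERTY_ID#", "AREA", "LOT#"}, [({"PROPERTY_ID#"}, {"AREA", "LOT#"})]))
print(violations_bcnf({"AREA", "COUNTY_NAME"}, [({"AREA"}, {"COUNTY_NAME"})]))
```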
Chapter 14-46
One Example of BCNF
LOTS1AX
PROPERTY_ID#  LOT#  AREA
T11           L1    0.5
T12           L2    0.8
W13           L6    1.5
W11           L1    1.1
W12           L4    1.6
T10           L3    0.9
LOTS1AY
AREA  COUNTY_NAME
0.5   Tolland
...   Tolland
1.0   Tolland
1.1   Windham
...   Windham
2.0   Windham
Chapter 14-47
Another Example of BCNF

Consider the TEACH Relation:
TEACH(STUDENT, COURSE, INSTRUCTOR)
in 3NF but NOT BCNF with
 FD1: {STUDENT, COURSE} → INSTRUCTOR
 FD2: INSTRUCTOR → COURSE
3 Possible Decompositions of TEACH:
1. T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE)
2. T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT)
3. T1(INSTRUCTOR, COURSE), T2(INSTRUCTOR, STUDENT)
All Three “Lose” FD1!
3rd is Best Since After Join, Recaptures FD1 and
Doesn’t Generate any Spurious Tuples
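The claim about spurious tuples can be checked by projecting a TEACH instance onto each pair of schemas and joining the projections back together. The tuples below are a small illustrative sample chosen to satisfy FD1 and FD2 (the slide's own table is not reproduced here), and `project` and `rejoin` are hypothetical helper names.

```python
COLS = ("STUDENT", "COURSE", "INSTRUCTOR")

# Illustrative sample tuples; they satisfy FD1 and FD2.
TEACH = {
    ("Narayan", "Database", "Mark"),
    ("Smith", "Database", "Navathe"),
    ("Smith", "Operating Systems", "Ammar"),
    ("Wallace", "Database", "Mark"),
    ("Narayan", "Operating Systems", "Ammar"),
}


def project(rows, cols):
    """Projection of TEACH-shaped tuples onto the named columns."""
    idx = [COLS.index(c) for c in cols]
    return {tuple(row[i] for i in idx) for row in rows}


def rejoin(cols1, cols2):
    """Project TEACH onto cols1 and cols2, natural-join them, return tuples in COLS order."""
    t1, t2 = project(TEACH, cols1), project(TEACH, cols2)
    common = [c for c in cols1 if c in cols2]
    result = set()
    for a in t1:
        ra = dict(zip(cols1, a))
        for b in t2:
            rb = dict(zip(cols2, b))
            if all(ra[c] == rb[c] for c in common):
                merged = {**ra, **rb}
                result.add(tuple(merged[c] for c in COLS))
    return result


d1 = rejoin(("STUDENT", "INSTRUCTOR"), ("STUDENT", "COURSE"))
d2 = rejoin(("COURSE", "INSTRUCTOR"), ("COURSE", "STUDENT"))
d3 = rejoin(("INSTRUCTOR", "COURSE"), ("INSTRUCTOR", "STUDENT"))

for label, joined in (("1", d1), ("2", d2), ("3", d3)):
    print("Decomposition", label, "spurious tuples:", len(joined - TEACH))
```

On this sample only the third decomposition rejoins to exactly the original relation. That matches the general test: INSTRUCTOR is the common attribute and INSTRUCTOR → COURSE makes it a key of T1.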
Chapter 14-48
What Does Table Look Like?
Note TEACH in 3NF but NOT BCNF
Chapter 14-49
Reflections on Normalization
Normalization
 A Tool for Validating the Quality of the Schema,
Rather than Merely as a Method for Designing a
Relational Schema
 Promotes Each Concept of the Application Domain
Mapping to Exactly One Concept of the Schema
Normalization Process
 Actually a Process of Concept Separation
 Concept Separation is Result of Applying a Top-down
Methodology for Producing a Schema Via
Subsequent Refinements and Decompositions
Chapter 14-50
Relational DB Design Process
Normalization Process Focused on Decomposition
Raises a Number of Questions
 How do we Decompose a Schema into a Desirable
Normal Form?
 What Criteria Should the Decomposed Schemas
Follow in order to Preserve the Semantics of the
Original Schema?
 Can we Guarantee the Decomposition’s Quality?
 Can we Prevent the “Loss” of Information?
 Are Dependencies Maintained in Decomposition?
Chapter 14-51
Recall Transitive FD/Update Anomalies
R = ( U, F )
U = { S#, DName, DHead }
F = { S# → DName,
DName → DHead }
S#  DName  DHead
S1  D1     John
S2  D1     John
S3  D2     Smith
S4  D3     Black
“S# → DHead” is a Transitive FD
 When S4 Graduates, Head Information of D3 Lost
 Similarly, If D5 has No Students Yet, then the Head
Information cannot be Stored in this Database
 Update Head of Any Department Requires an
Update to Every Student Enrolled in the Dept.
Chapter 14-52
What are Possible Decompositions?
R = ( U, F ) U = { S#, DName, DHead }
F = { S# → DName, DName → DHead }
S#  DName  DHead
S1  D1     John
S2  D1     John
S3  D2     Smith
S4  D3     Black
Information Based
ρ = { R1({S#}, ∅), R2({DName}, ∅), R3({DHead}, ∅) }
ρ is Neither Lossless nor FD-Preserving
Chapter 14-53
What are Possible Decompositions?
R = ( U, F ) U = { S#, DName, DHead }
F = { S# → DName, DName → DHead }
S#  DName
S1  D1
S2  D1
S3  D2
S4  D3
S#  DHead
S1  John
S2  John
S3  Smith
S4  Black
•Lossless Decomposition but
not Dependency-Preserving
•DName → DHead is lost in
the decomposition
ρ = { R1({S#, DName}, {S# → DName}),
R2({S#, DHead}, {S# → DHead}) }
ρ is Lossless but not FD-Preserving
Chapter 14-54
What are Possible Decompositions?
R = ( U, F ) U = { S#, DName, DHead }
F = { S# → DName, DName → DHead }
S#  DName
S1  D1
S2  D1
S3  D2
S4  D3
DName  DHead
D1     John
D2     Smith
D3     Black
Lossless & dependency preserving decomposition
ρ = { R1({S#, DName}, {S# → DName}),
R3({DName, DHead}, {DName → DHead}) }
ρ is both Lossless and FD-Preserving
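Both properties can be tested from the FDs alone. The sketch below uses two standard checks that the slides do not spell out, so treat them as assumptions about method: a binary decomposition is lossless if the common attributes functionally determine all of R1 or all of R2, and an FD is preserved if its right-hand side can be reached by repeatedly taking closures restricted to each component. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of FDs given as (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) must determine R1 or R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


def preserves_dependencies(parts, fds):
    """Check that every FD follows from the FDs projected onto the components."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                gained = closure(reached & part, fds) & part
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not rhs <= reached:
            return False
    return True


F = [({"S#"}, {"DName"}), ({"DName"}, {"DHead"})]
R1, R2, R3 = {"S#", "DName"}, {"S#", "DHead"}, {"DName", "DHead"}

print(is_lossless_binary(R1, R2, F), preserves_dependencies([R1, R2], F))  # True False
print(is_lossless_binary(R1, R3, F), preserves_dependencies([R1, R3], F))  # True True
print(preserves_dependencies([{"S#"}, {"DName"}, {"DHead"}], F))           # False
```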
Chapter 14-55
Summary of Normalization
1NF
 Eliminate the Partial Functional Dependencies of
Non-prime Attributes to Key Attributes
2NF
 Eliminate the Transitive Functional Dependencies
of Non-prime Attributes to Key Attributes
3NF
 Eliminate the Partial and Transitive Functional
Dependencies of Prime (Key) Attributes to Key
BCNF
Up to 3NF: Lossless Decomposition and Dependency Preserving
To BCNF: Lossless Decomposition but not Dependency Preserving
Chapter 14-56
The Entire Normalization Picture
1NF
 Eliminate Partial FDs of Non-prime Attributes to Key
2NF
 Eliminate Transitive FDs of Non-prime Attributes to Key
3NF
 Eliminate Partial and Transitive FDs of Prime Attributes to Key
BCNF
 Eliminate Non-trivial and Nonfunctional Multi-Valued Dependencies
4NF
 Eliminate Join Dependencies that are Not Implied by Candidate Key
5NF
Chapter 14-57
What are Multi-Valued Dependencies?
Focused on the Concept of Multi-Valued Dependencies
A MVD X →→ Y Indicates that a Value of X
Corresponds to a Set of Values of Y, Independent of
the Other Attributes
Consider EMP with MVDs:
 ENAME →→ PNAME (E works on many P)
 ENAME →→ DNAME (E has many Dependents)
Chapter 14-58
What is Fourth Normal Form (4NF)?
A Relation Schema R is in Fourth Normal Form
(4NF) w.r.t Dependencies F (FD and MVD) if for
every Non-Trivial MVD X →→ Y in F+, X is a
Superkey for R
Reconsider EMP with MVDs:
 ENAME →→ PNAME (E works on many P)
 ENAME →→ DNAME (E has many Dependents)
ENAME is Not a Superkey of R since Need Triple of
ENAME, PNAME, and DNAME to Distinguish
We need to Decompose EMP!
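Whether an MVD holds in a given instance can be tested from its definition: for any two tuples that agree on X, the tuple that takes its Y values from the first and its remaining values from the second must also be present. The sketch below uses a small assumed EMP instance (one employee, two projects, two dependents); `mvd_holds` is a hypothetical helper name.

```python
ATTRS = ("ENAME", "PNAME", "DNAME")


def mvd_holds(rows, x, y):
    """Check X ->-> Y on one instance by testing the tuple-swapping condition."""
    rest = [a for a in ATTRS if a not in x and a not in y]
    present = {tuple(row[a] for a in ATTRS) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required tuple: X and Y values from t1, remaining values from t2.
                t3 = {a: t1[a] for a in list(x) + list(y)}
                t3.update({a: t2[a] for a in rest})
                if tuple(t3[a] for a in ATTRS) not in present:
                    return False
    return True


# Assumed sample instance: Smith works on X and Y and has dependents John and Anna.
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

print(mvd_holds(emp, ["ENAME"], ["PNAME"]))       # True
print(mvd_holds(emp, ["ENAME"], ["DNAME"]))       # True
print(mvd_holds(emp[:3], ["ENAME"], ["PNAME"]))   # False: one combination is missing
```

All four combinations must be stored to keep the two independent facts consistent, which is the redundancy that the 4NF decomposition removes.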



Chapter 14-59
Decomposition into 4NF
ENAME →→ PNAME is Trivial MVD: ENAME ∪ PNAME is
Equal to EMP_PROJECTS (same for ENAME →→ DNAME)
Chapter 14-60
What about the Supply Table?
In 4NF But Not in 5NF since: Supplier supplies Parts,
Supplier supplies Projects, & Parts Used on Projects
Removes Join Dependencies – Many-many-many
Chapter 14-61
Slides on Query Optimization
Chapter 14-62
Simplification
Why Simplify?
 The Simpler the Query, the Less Work there is and
the Better the Performance
How? Use transformation rules
 Elimination of Redundancy
Idempotency Rules
p1 ∧ ¬(p1) = false
¬(p1 ∧ p2) = ¬(p1) ∨ ¬(p2)
p1 ∨ false = p1
…
 Application of Transitivity
 Use of Integrity Rules
Example
 x > a and x > b
Chapter 14-63
Restructuring

Convert Relational Calculus to
Relational Algebra
Make use of Query Trees
Example
Find the names of employees
other than J. Doe who worked
on the CAD/CAM project for
either 1 or 2 years.
SELECT ENAME
FROM E, W, P
WHERE E.ENO=W.ENO
AND W.JNO=P.JNO
AND E.ENAME<>"J. Doe"
AND P.JNAME="CAD/CAM"
AND (W.DUR=12 OR W.DUR=24)
Query tree (root to leaves):
 Project: π_ENAME
 Select: σ_{(DUR=12 OR DUR=24) AND JNAME="CAD/CAM" AND ENAME≠"J. DOE"}
 Join: ⋈_JNO of P with the join below
 Join: ⋈_ENO of W and E
Chapter 14-64
Query Optimization Objectives
Improving Performance
Arriving at a Query Plan of Execution
Analyzing the Relational Algebra Query
 Replace Costly Operations
 Do Selections and Projections Early
Optimization Heuristics for the Relational Algebra
 Performing Selection and Projection Before Join
 Combining Several Selections Over a Single
Relation Into One Selection
 Find Common Subexpressions
 Algebraic Rewriting/transformation Rules
General Transformation Rules for Relational Algebra
Chapter 14-65
Query Optimization: An Example
Why is it important?
SELECT ENAME
FROM E, W
WHERE E.ENO = W.ENO
AND W.RESP = "Manager"
Strategy 1
 π_ENAME(σ_{RESP="Manager" ∧ E.ENO=W.ENO}(E × W))
Strategy 2
 π_ENAME(E ⋈_ENO (σ_{RESP="Manager"}(W)))
Chapter 14-66
Cost of Alternatives

Assume:
 card(E) = 4,000; card(W) = 10,000
 10% of tuples in W satisfy RESP="Manager"
(selection generates 1,000 tuples)
Execution time Proportional to the Sum of the
Cardinalities of the Temporary Relations
Searching is Done by Sequential Scanning
Strategy 1
 Cartesian prod. = 40,000,000
 Search over all = 40,000,000
 Total = 80,000,000
Strategy 2
 Selection over W = 10,000
 Join (4,000 × 1,000) = 4,000,000
 Total = 4,010,000
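The two totals follow directly from the stated assumptions, and the arithmetic can be reproduced as a short sketch. The variable names are illustrative; the cost model is the simple one on the slide (sum of the cardinalities processed, with sequential scanning).

```python
card_e = 4_000
card_w = 10_000
manager_percent = 10  # 10% of W satisfies RESP="Manager"

# Strategy 1: build E x W, then scan all of it to apply the selection.
cartesian = card_e * card_w
strategy1 = cartesian + cartesian

# Strategy 2: scan W once for the selection, then join E with the selected tuples.
selected_w = card_w * manager_percent // 100
strategy2 = card_w + card_e * selected_w

print(strategy1)  # 80000000
print(strategy2)  # 4010000
print(strategy1 / strategy2)  # roughly a factor of 20
```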
Chapter 14-67
General Query Optimization Strategy
Perform Selections Early
 Yields Smaller Intermediate Results
 Direct Impact on Subsequent Join/Cartesian Prod.
Combine Selections with a Prior Cartesian Product into
a Theta or Equi Join
 Join is a Cheaper Operation
Combine (Cascade) Selections and Projections
 π_A(π_B(R)) ≡ π_A(R), where A ⊆ B
 σ_p1(σ_p2(R)) ≡ σ_{p1 ^ p2}(R)
This Results in One Pass Instead of Two over Table
Chapter 14-68
General Query Optimization Strategy
Identify Common Subexpressions
 Compute Once and Store
 use Stored Version for Subsequent Times
 Often Useful When Views are Employed
Preprocess Data via Sorts and Indexes
 Speeds up Searches and Joins by Limiting Scope
Evaluate and Assess Different Options
 For Cartesian Product, Use Smaller Relation for
Comparison
 Use System Catalog (Meta-data) to Effect Order in
Query Execution Plan
Chapter 14-69
Relational Algebra Transformations
1. Cascade of Selection
 σ_{p1 ^ p2 ^ … ^ pn}(R) ≡ σ_p1(σ_p2(...(σ_pn(R))...))
2. Commutativity of Selection
 σ_p1(σ_p2(R)) ≡ σ_p2(σ_p1(R))
 σ_{p1 or p2}(R) ≡ σ_p1(R) ∪ σ_p2(R)
3. Cascade of Projection
 π_A1(π_A2(...(π_An(R))...)) ≡ π_A1(R) if A1 ⊆ A2 ⊆ ... ⊆ An
4. Commuting Selection with Projection (p Uses Only the A’s)
 π_{A1,A2,...,An}(σ_p(R)) ≡ σ_p(π_{A1,A2,...,An}(R))
Chapter 14-70
Relational Algebra Transformations
5. Commutativity of Theta Join and Cartesian Product
 R ⋈_A S ≡ S ⋈_A R
 R × S ≡ S × R
6. Commuting Selection with Theta Join (Cartesian)
 σ_{p(A)}(R × S) ≡ (σ_{p(A)}(R)) × S
A defined on R only
 σ_{p(A) ^ p(B)}(R × S) ≡ (σ_{p(A)}(R)) × (σ_{p(B)}(S))
(A defined on R, B defined on S)
 Also Holds for Theta Join as Well
7. Commuting Projection with Theta Join (Cartesian)
 π_C(R × S) ≡ π_A(R) × π_B(S) where A ∪ B = C
 A are Attributes in C for R and B are Attributes in C for S
Chapter 14-71
Relational Algebra Transformations
8. Commutativity of Set Operations
 R ∪ S ≡ S ∪ R
 R ∩ S ≡ S ∩ R
9. Associativity of Set Operations
 (R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
 (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
 (R × S) × T ≡ R × (S × T)
 (R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
10. Commuting Select with Set Operations
 σ_{p(Ai)}(R ∪ T) ≡ σ_{p(Ai)}(R) ∪ σ_{p(Ai)}(T)
where Ai is defined on both R and T
Chapter 14-72
Relational Algebra Transformations
11. Commuting Projection with Union
 C(R
q(Aj,Bk) S) A(R)
q(Aj,Bk) B(S)

C(R S) A’ (R) B’ (S)
where R[A] and S[B]
C = A' B' where A'  A, B’  B
12. Converting Selection/Cartesian Into Theta Join
 σ_C(R × S) ≡ R ⋈_C S
Chapter 14-73
Using Heuristics in Query Optimization
Process for heuristics optimization
1. The parser of a high-level query generates an initial
internal representation;
2. Apply heuristics rules to optimize the internal
representation.
3. A query execution plan is generated to execute
groups of operations based on the access paths
available on the files involved in the query.

The main heuristic is to apply first the
operations that reduce size of intermediate
results

E.g., Apply SELECT and PROJECT operations
before applying the JOIN or other operations.
Chapter 14-74
Using Heuristics in Query Optimization (2)

Query tree:



A tree data structure that corresponds to a relational algebra
expression. It represents the input relations of the query as
leaf nodes of the tree, and represents the relational algebra
operations as internal nodes.
An execution of the query tree consists of executing an
internal node operation whenever its operands are
available and then replacing that internal node by the
relation that results from executing the operation.
Query graph:

A graph data structure that corresponds to a relational
calculus expression. It does not indicate an order on which
operations to perform first. There is only a single graph
corresponding to each query.
Chapter 14-75
Using Heuristics in Query Optimization

Heuristic Optimization of Query Trees:




The same query could correspond to many different
relational algebra expressions — and hence many different
query trees.
Remember – Not One Soln to Each Query on Exam
The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute.
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’ AND
PNUMBER=PNO AND ESSN=SSN
AND BDATE > ‘1957-12-31’;
Chapter 14-76
Heuristics Algebraic Optimization Concepts
Using Cascade of Selections Rule, Break up Any
Selections With Conjunctive Conditions Into a Cascade
of Selections
 Allows More Freedom in Moving Selections Down
Different Branches of the Tree
Using Commutativity of Selections with Other
Operations Rules, Move Each Selection Down the
Query Tree as far as Possible
If Possible, Combine a Cartesian Product With a
Selection Into a Join
Chapter 14-77
Heuristics Algebraic Optimization Concepts
Using Associativity of Binary Operations, Rearrange
the Leaf Nodes So That the Most Restrictive Selections
Are Executed First
 The Fewer Tuples the Resulting Relation Contains,
the More Restrictive the Selection
 Reducing the Size of Intermediate Results Improves
Performance
Using Cascade of Projections and Commutativity of
Projections with Other Operations, Move Projections
Down the Query Tree as Far as Possible
Identify Subtrees that Represent Groups of Operations
that can be Executed by a Single Algorithm
Chapter 14-78
Summary of All Rules
1. Cascade of Selection
2. Commutativity of Selection
3. Cascade of Projection
4. Commuting Selection with Projection (p Uses Only the A’s)
5. Commutativity of Theta Join and Cartesian Product
6. Commuting Selection with Theta Join (Cartesian)
7. Commuting Projection with Theta Join (Cartesian)
8. Commutativity of Set Operations
9. Associativity of Set Operations
10. Commuting Select with Set Operations
11. Commuting Projection with Union
12. Converting Selection/Cartesian Into Theta Join
Chapter 14-79
Heuristic Algebraic Optimization Algorithm
Use Rule 1 to Break up Selects with Conjunctions into
a Cascade to Move them Down the Query Tree
Use Rules 2, 4, 6, and 10 to Commute Select with
Project, Join, Cart. Prod., Union, and Intersection
Use Rule 5 (Commute) and 9 (Associative) to
Rearrange the Leaf Nodes of Query Tree to:
 Most Restrictive Select Executed First
 Avoid Cartesian Product in Leaf Nodes
Use Rule 12 to Convert a Select/Cart Prod to Join
Use Rules 3, 4, 7, and 11 to Cascade and Commute
Project - Pushing Down Tree as Far as Possible
Identify Subtrees that Can Execute as Independent
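The effect of pushing selections below joins can be seen on a toy instance of the E, W, P relations used in the CAD/CAM query. The sketch below is illustrative only: the tuples are invented, the `select`, `join`, and `project` helpers are hypothetical, and cost is approximated by the number of tuples in the intermediate join results.

```python
# Hypothetical tuples, invented for illustration only.
E = [
    {"ENO": 1, "ENAME": "J. Doe"},
    {"ENO": 2, "ENAME": "M. Smith"},
    {"ENO": 3, "ENAME": "A. Lee"},
]
P = [{"JNO": 10, "JNAME": "CAD/CAM"}, {"JNO": 20, "JNAME": "Maintenance"}]
W = [
    {"ENO": 1, "JNO": 10, "DUR": 12},
    {"ENO": 2, "JNO": 10, "DUR": 24},
    {"ENO": 3, "JNO": 10, "DUR": 36},
    {"ENO": 2, "JNO": 20, "DUR": 12},
    {"ENO": 3, "JNO": 20, "DUR": 24},
]


def select(rows, pred):
    return [row for row in rows if pred(row)]


def join(left, right, attr):
    """Equi-join on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def project(rows, attr):
    return sorted({row[attr] for row in rows})


def not_doe(row):
    return row["ENAME"] != "J. Doe"


def cad_cam(row):
    return row["JNAME"] == "CAD/CAM"


def one_or_two_years(row):
    return row["DUR"] in (12, 24)


# Canonical plan: join everything first, then apply the whole condition.
j1 = join(W, E, "ENO")
j2 = join(j1, P, "JNO")
canonical = project(
    select(j2, lambda r: not_doe(r) and cad_cam(r) and one_or_two_years(r)), "ENAME"
)
canonical_cost = len(j1) + len(j2)

# Optimized plan: cascade the selections and push each one down to its relation.
k1 = join(select(W, one_or_two_years), select(E, not_doe), "ENO")
k2 = join(k1, select(P, cad_cam), "JNO")
optimized = project(k2, "ENAME")
optimized_cost = len(k1) + len(k2)

print(canonical, canonical_cost)
print(optimized, optimized_cost)
```

Both plans return the same names, but the optimized plan builds smaller intermediate relations, which is the point of the selection-first heuristic.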
Chapter 14-80
Heuristic Optimization: Example
Canonical query tree at the end of
query preprocessing phase
E(ENAME, ENO)
P(JNO, JNAME)
W(ENO, JNO, DUR)
Query tree:
π_ENAME(σ_{(DUR=12 OR DUR=24) AND JNAME="CAD/CAM" AND ENAME≠"J. DOE"}(P ⋈_JNO (W ⋈_ENO E)))
Chapter 14-81
Heuristic Optimization– Example
Use cascading of selections
rule to decompose selections
Query tree:
π_ENAME(σ_{DUR=12 OR DUR=24}(σ_{JNAME="CAD/CAM"}(σ_{ENAME≠"J. DOE"}(P ⋈_JNO (W ⋈_ENO E)))))
Chapter 14-82
Heuristic Optimization– Example
Push selection down
using commutativity of
selection over join
Query tree:
π_ENAME(σ_{DUR=12 OR DUR=24}(σ_{JNAME="CAD/CAM"}(P ⋈_JNO (W ⋈_ENO σ_{ENAME≠"J. Doe"}(E)))))
Chapter 14-83
Heuristic Optimization–Example
Push selection down
using commutativity of
selection over join
Query tree:
π_ENAME(σ_{DUR=12 OR DUR=24}(σ_{JNAME="CAD/CAM"}(P) ⋈_JNO (W ⋈_ENO σ_{ENAME≠"J. Doe"}(E))))
Chapter 14-84
Heuristic Optimization–Example
Push selection down
Query tree:
π_ENAME(σ_{JNAME="CAD/CAM"}(P) ⋈_JNO (σ_{DUR=12 OR DUR=24}(W) ⋈_ENO σ_{ENAME≠"J. Doe"}(E)))
Chapter 14-85
Heuristic Optimization–Example
Do early projection
Query tree:
π_ENAME(π_JNO(σ_{JNAME="CAD/CAM"}(P)) ⋈_JNO π_{JNO,ENAME}(π_{JNO,ENO}(σ_{DUR=12 OR DUR=24}(W)) ⋈_ENO π_{ENO,ENAME}(σ_{ENAME≠"J. Doe"}(E))))
Chapter 14-86
Heuristic Optimization–Example
Identify subtrees that
can be implemented in
one algorithm
Query tree:
π_ENAME(π_JNO(σ_{JNAME="CAD/CAM"}(P)) ⋈_JNO π_{JNO,ENAME}(π_{JNO,ENO}(σ_{DUR=12 OR DUR=24}(W)) ⋈_ENO π_{ENO,ENAME}(σ_{ENAME≠"J. Doe"}(E))))
Chapter 14-87
Heuristic Optimization: A Second Example
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Let XLOANS = π_S(σ_F(Loans × Borrowers × Books))
where:
S = {Title, Author, Pname, LC_No, Name,
Addr, City, Card_No, Date}
and
F = {Borrower.Card_No = Loans.Card_No ^
Books.LC_No = Loans.LC_No}
Chapter 14-88
Heuristic Optimization: A Second Example

Query tree for XLOANS:
π_{Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date}(
σ_{Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No}(
Loans × Borrower × Books))
Chapter 14-89
Heuristic Optimization: A Second Example
 Title
 Date  1/1/88


Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
Borrower.Card_No = Loans.Card_No ^
Books.LC_No = Loans.LC_No
X
Books
X
Loans
Query= TITLE(Date  1/1/88 (XLOANS))
Borrower
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-90
Heuristic Optimization: A Second Example
 Title
Try to Cascade
Date  1/1/88
 Date  1/1/88


Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
Borrower.Card_No = Loans.Card_No ^
Books.LC_No = Loans.LC_No
X
Books
X
Loans
Borrower
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-91
Heuristic Optimization: A Second Example
 Title

Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
 Date  1/1/88

Commute Select
and Project
Borrower.Card_No = Loans.Card_No ^
Books.LC_No = Loans.LC_No
X
Books
X
Loans
Borrower
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-92
Heuristic Optimization: A Second Example
 Title


Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
Borrower.Card_No = Loans.Card_No ^
Books.LC_No = Loans.LC_No
 Date  1/1/88
Commute Select
and Select
X
Books
X
Loans
Borrower
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-93
Heuristic Optimization: A Second Example
 Title


Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
Borrower.Card_No = Loans.Card_No ^
Books.LC_No = Loans.LC_No
X
Books
X
 Date  1/1/88
Loans
Borrower
Commute Select and
Cartesian Product
Two Levels Down
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-94
Heuristic Optimization: A Second Example
 Title
Try to Cascade


Borrower.Card_No = Loans.Card_No
Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
Borrower.Card_No = Loans.Card_No ^
Books.LC_No = Loans.LC_No
X
Books
X
 Date  1/1/88
Loans
Borrower
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-95
Heuristic Optimization: A Second Example
 Title


Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
Books.LC_No = Loans.LC_No
X

Books
Borrower.Card_No = Loans.Card_No
Commute Select and
Cartesian Product
One Level Down
X
What’s Next?
 Date  1/1/88
Loans
Borrower
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-96
Heuristic Optimization: A Second Example
 Title


Title, Author, Pname,
LC_No, Name, Addr,
City, Card_No, Date
Books.LC_No = Loans.LC_No
X
Combine
Projections
Books

Borrower.Card_No = Loans.Card_No
X
 Date  1/1/88
Loans
Borrower
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Chapter 14-97
Heuristic Optimization: A Second Example
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
 Title

Books.LC_No = Loans.LC_No
X

Books
Borrower.Card_No = Loans.Card_No
X
 Date  1/1/88
Loans
Borrower
What is Still a Problem?
We are Not Projecting so All Attributes are
Still Collected Until the Final Project!
Chapter 14-98
Heuristic Optimization: A Second Example
 Title

 Loans.LC_No
 Books.LC_No, Title
X

 Loans.LC_No,
Books.LC_No = Loans.LC_No
Books
Borrower.Card_No = Loans.Card_No
X
 Borr.Card_No
Loans.Card_No
 Date  1/1/88
Loans
Borrower
Add Strategic Projections
to Send Only the Minimum
Up the Tree as Needed
for Join/Result Set
Chapter 14-99
Heuristic Optimization: A Second Example
 Title
What is the Final Step?
Combine Select and
Cartesian Product

Books.LC_No = Loans.LC_No
Result: Equijoins!
 Loans.LC_No
X

 Loans.LC_No,
 Books.LC_No, Title
Books
Borrower.Card_No = Loans.Card_No
X
 Borr.Card_No
Loans.Card_No
 Date  1/1/88
Borrower
Loans
Chapter 14-100
Heuristic Optimization: A Second Example
FINAL TREE with
Equijoins!
 Title
LC_No
 Loans.LC_No
 Books.LC_No, Title
Books
Card_No
 Loans.LC_No,
 Borr.Card_No
Loans.Card_No
 Date  1/1/88
Borrower
Loans
Chapter 14-101
Heuristic Optimization: A Third Example

Heuristic Optimization of Query Trees:



The same query could correspond to many different
relational algebra expressions — and hence many
different query trees.
The task of heuristic optimization of query trees is to find
a final query tree that is efficient to execute.
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’ AND
PNUMBER=PNO AND ESSN=SSN
AND BDATE > ‘1957-12-31’;
Chapter 14-102
Heuristic Optimization: A Third Example
What’s one Approach?
Chapter 14-103
Heuristic Optimization: A Third Example
Moving Selects Down
Is this Optimal?
Chapter 14-104
Heuristic Optimization: A Third Example
No! Prior Version
Retrieved All Employees
Without First Applying
Pname Select
Chapter 14-105
Heuristic Optimization: A Third Example
Replace CART PRODUCT
Plus SELECT with JOIN!
What’s left to do?
Chapter 14-106
Heuristic Optimization: A Third Example
Chapter 14-107
Heuristic Optimization: A Fourth Example
Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
Query: Find all Sailors who have Reserved red Boats,
who are younger than 30, and who have a
rating of at least 11.
SELECT S.sid, S.sname, S.age
FROM Sailors S, Boats B, Reserves R
WHERE B.bid=R.bid AND S.sid=R.sid AND S.Rating >= 11 AND
B.color = “Red” AND S.age < 30;
πS.sid, S.sname, S.age(σ B.bid=R.bid^S.sid=R.sid^S.age<30^
B.color=“Red”^S.rating≥11(B×S×R))
Chapter 14-108
Heuristic Optimization: A Fourth Example


S.sid, S.sname, S.age
B.bid=R.bid^S.sid=R.sid^
S.age < 30 ^ S.Rating >= 11 ^ B.color = “Red”
X
Boats
Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
X
Reserves
Sailors
Step 1 - Break up Selects
Chapter 14-109
Heuristic Optimization: A Fourth Example

S.sid, S.sname, S.age

B.bid=R.bid^S.sid=R.sid

S.age < 30 ^ S.Rating >= 11

B.color = “Red”
Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
X
Boats
X
Step 2 – Move that Boats Select
Reserves
Sailors
Chapter 14-110
Heuristic Optimization: A Fourth Example

S.sid, S.sname, S.age

B.bid=R.bid^S.sid=R.sid

S.age < 30 ^ S.Rating >= 11
X

Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
B.color = “Red”
Boats
Step 3 – Move that Sailor Select
X
Sailors
Reserves
Chapter 14-111
Heuristic Optimization: A Fourth Example


S.sid, S.sname, S.age
Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
B.bid=R.bid^S.sid=R.sid
X
X

B.color = “Red”
Boats
Reserves

S.age < 30 ^ S.Rating >= 11
Sailors
Step 4 – Introduce Projections
Chapter 14-112
Heuristic Optimization: A Fourth Example


S.sid, S.sname, S.age
Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
B.bid=R.bid^S.sid=R.sid
Step 5 – What’s Next Step?
X


X
B.bid
B.color = “Red”
Boats

R.sid,R.bid
Reserves


S.sid,S.sname,S.age
S.age < 30 ^ S.Rating >= 11
Sailors
Chapter 14-113
Heuristic Optimization: A Fourth Example

S.sid, S.sname, S.age

B.bid=R.bid
Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
Step 6 - Move Down S.sid=R.sid
X

S.sid=R.sid
Step 7 – What’s Next Step?
X



B.bid
B.color = “Red”
Boats

R.sid,R.bid
Reserves

S.sid,S.sname,S.age
S.age < 30 ^ S.Rating >= 11
Sailors
Chapter 14-114
Heuristic Optimization: A Fourth Example

S.sid, S.sname, S.age

B.bid=R.bid
Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
Step 7 – Combined for Equi Join
X
Step 8 – What’s Final Step?
S.sid=R.sid



B.bid
B.color = “Red”
Boats

R.sid,R.bid
Reserves

S.sid,S.sname,S.age
S.age < 30 ^ S.Rating >= 11
Sailors
Chapter 14-115
Heuristic Optimization: A Fourth Example

Sailors (sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
S.sid, S.sname, S.age
Step 8 – Introduce Final EquiJoin
B.bid=R.bid
S.sid=R.sid



B.bid
B.color = “Red”
Boats

R.sid,R.bid
Reserves

S.sid,S.sname,S.age
S.age < 30 ^ S.Rating >= 11
Sailors
Chapter 14-116
Converting Relational Algebra to Query
Tree
Movies1997 = πLname,Fname,State(σPerson.PersonID = AllActors.PersonID ^
Movies1997.ShowID=MovieRoles.ShowID ^ Year=1997
(Person x Movies x MovieRoles))
 Lname,Fname,State

Person.PersonID = AllActors.PersonID ^
Movies1997.ShowID=MovieRoles.ShowID ^ Year=1997
X
X
Person
Movies
MovieRoles
Chapter 14-117
Converting Relational Algebra to Query
Tree
FriendsActors = πLname,Fname,RLName,RFName
(σShowName=Friends ^ TVRoles.ShowID = Friends.ShowID ^ EpisodeID>10 ^ EpisodeId<26 ^
Person.PersonID = RoleNames.PersonID(TVShows x TVRoles x Roles x Person))
ShowID

ShowName=Friends ^ TVRoles.ShowID = Friends.ShowID ^
EpisodeID>10 ^ EpisodeId<26 ^ Person.PersonID =
RoleNames.PersonID
X
X
TVShows
TVRoles
Roles
X
Person
Chapter 14-118
Heuristics Query Optimization: Summary

First Apply Operations that Reduce the Size of
Intermediate Results
 Move Selections and Projections Down the Tree as
far as Possible
 Early Selections Reduce the Number of Tuples
 Early Projections Reduce the Number of Attributes
The Most Restrictive Selection and Join Operations Should be
Executed Before Other Similar Operations
 This is Accomplished by Reordering the Leaf Nodes of
the Tree Among Themselves and Adjusting the Rest of
the Tree Appropriately
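The effect of these heuristics can be illustrated on the Sailors/Boats/Reserves query from the fourth example. The following is a minimal Python sketch, not a real optimizer: the table rows are made up, and `canonical_plan` and `pushed_down_plan` are hypothetical helper names. It compares the largest intermediate result of the canonical plan with that of the plan after selections and projections are pushed to the leaves.

```python
from itertools import product

# Made-up rows for the Sailors / Boats / Reserves schemas (assumption).
SAILORS = [
    {"sid": 1, "sname": "Ann", "rating": 12, "age": 25},
    {"sid": 2, "sname": "Bob", "rating": 9, "age": 28},
    {"sid": 3, "sname": "Cy", "rating": 11, "age": 41},
]
BOATS = [
    {"bid": 10, "bname": "Wave", "color": "Red"},
    {"bid": 11, "bname": "Gull", "color": "Blue"},
]
RESERVES = [
    {"sid": 1, "bid": 10},
    {"sid": 1, "bid": 11},
    {"sid": 2, "bid": 10},
    {"sid": 3, "bid": 10},
]


def canonical_plan():
    """Cartesian product first, then one combined selection, then projection."""
    rows = list(product(BOATS, SAILORS, RESERVES))
    kept = [
        (b, s, r) for b, s, r in rows
        if b["bid"] == r["bid"] and s["sid"] == r["sid"]
        and s["age"] < 30 and s["rating"] >= 11 and b["color"] == "Red"
    ]
    result = {(s["sid"], s["sname"], s["age"]) for _, s, _ in kept}
    return result, len(rows)


def pushed_down_plan():
    """Select and project at the leaves, then join the reduced inputs."""
    red_bids = [b["bid"] for b in BOATS if b["color"] == "Red"]
    sailors = [(s["sid"], s["sname"], s["age"]) for s in SAILORS
               if s["age"] < 30 and s["rating"] >= 11]
    pairs = [(r["sid"], r["bid"]) for r in RESERVES]
    # Equijoin on sid, then equijoin on bid.
    s_r = [(s, bid) for s in sailors for sid, bid in pairs if s[0] == sid]
    s_r_b = [s for s, bid in s_r if bid in red_bids]
    return set(s_r_b), max(len(s_r), len(s_r_b))


canon_result, canon_size = canonical_plan()
opt_result, opt_size = pushed_down_plan()
print(canon_result == opt_result, canon_size, opt_size)
```

Both plans return the same answer; only the size of what flows up the tree differs.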
Chapter 14-119
Slides on Concurrency Control Algorithms
Chapter 14-120
What is a Schedule?


Transaction schedule or history:
 When transactions are executing concurrently in an
interleaved fashion, the order of execution of
operations from the various transactions forms
what is known as a transaction schedule
A schedule S of n transactions T1, T2, …, Tn is:
 Ordering of operations of transactions where, for
each transaction Ti that participates in S, the
operations of Ti in S must appear in the same
order in which they occur in Ti.
 Operations from other transactions Tj can be
interleaved with the operations of Ti in S.
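The definition above can be checked mechanically. Below is a small Python sketch under assumed conventions: a schedule is a list of hypothetical `(txn, op, item)` triples, with `"r"`, `"w"`, and `"c"` standing for read, write, and commit. The function `is_valid_schedule` is an illustrative name, not part of any DBMS.

```python
def is_valid_schedule(schedule, transactions):
    """Check that `schedule` preserves the internal order of every transaction.

    schedule:      list of (txn, op, item) triples, e.g. ("T1", "r", "X")
    transactions:  dict mapping txn name -> its own list of (op, item) pairs
    """
    if any(txn not in transactions for txn, _, _ in schedule):
        return False
    for txn, ops in transactions.items():
        # Project the schedule onto one transaction and compare with its program.
        projection = [(op, item) for t, op, item in schedule if t == txn]
        if projection != ops:
            return False
    return True


T1 = [("r", "X"), ("w", "X"), ("r", "Y"), ("w", "Y"), ("c", None)]
T2 = [("r", "X"), ("w", "X"), ("c", None)]
TXNS = {"T1": T1, "T2": T2}

# R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1
S = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "r", "X"), ("T2", "w", "X"),
     ("T2", "c", None), ("T1", "r", "Y"), ("T1", "w", "Y"), ("T1", "c", None)]

# Same operations, but T1 writes X before reading it: not a schedule of T1.
BAD = [S[1], S[0]] + S[2:]

print(is_valid_schedule(S, TXNS), is_valid_schedule(BAD, TXNS))
```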
Chaps19&20-121
What is a Schedule?


A Schedule S is a Sequence of R/W Operations,
Which End with Commit or Abort
 Different Transactions Executing Concurrently in
an Interleaved Fashion with One Another
 Each Transaction a Sequence of R/W Operations
Two Schedules S1 and S2 are Equivalent,
Denoted as S1 ≡ S2 , If and Only If S1 and S2
 Execute the Same Set of Transactions
 Produce the Same Results (i.e., Both Take the DB
to the Same Final State)
Chaps19&20-122
Transactions and a Schedule



Below are Transactions T1 and T2
Note that Their Interleaved Execution Shown
Below is an Example of One Possible Schedule
There are Many Different Interleaves of T1 and T2
T1
T2
Read(X);
X:=X;
Write(X);
Read(X);
X:=X;
Write(X);
commit;
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Schedule S: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;
Chaps19&20-123
Transactions and a Schedule

What Happens if the Schedule Changes to:
T1
T2
T2
Read(X);
X:=X;
Read(X);
X:=X;
Write(X);
Read(X);
Read(X);
X:=X;
Write(X);
commit;
Read(Y);
Y = Y + 20;
Write(Y);
commit;
T1
Write(X);
X:=X;
Write(X);
commit;
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Chaps19&20-124
Equivalent Schedules


Are the Two Schedules below Equivalent?
S1 and S4 are Equivalent, since They have the Same
Set of Transactions and Produce the Same Results
T1
T2
Read(X);
X:=X;
Write(X);
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Schedule S1
T1
T2
Schedule S4
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
commit;
Read(Y);
Y = Y + 20;
Write(Y);
commit;
S1: R1(X),W1(X), R1(Y), W1(Y), c1, R2(X), W2(X), c2;
S4: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;
Chaps19&20-125
What are Different Types of Schedules?




Recoverable schedule:
 One where no committed transaction needs to be rolled back.
 No transaction T in S commits until all transactions
T’ that write an item that T reads have committed.
Cascadeless schedule:
 One where every transaction reads only the items
that are written by committed transactions.
Cascaded rollback:
 A schedule in which uncommitted transactions that
read an item from a failed transaction must be
rolled back – Read value written by Failed Trans
Strict Schedules:
 A schedule in which a transaction can neither read
nor write an item X until the last transaction that
wrote X has committed.
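The recoverable and cascadeless conditions can be tested directly on a schedule. This is a hedged Python sketch: it assumes schedules are lists of hypothetical `(txn, op, item)` triples with ops `"r"`, `"w"`, `"c"`, and it assumes no transaction aborts. The helper names are illustrative only.

```python
def reads_from(schedule):
    """List (reader, writer, position) for reads of an item last written by another txn."""
    last_writer = {}
    found = []
    for pos, (txn, op, item) in enumerate(schedule):
        if op == "w":
            last_writer[item] = txn
        elif op == "r" and last_writer.get(item, txn) != txn:
            found.append((txn, last_writer[item], pos))
    return found


def commit_positions(schedule):
    return {txn: pos for pos, (txn, op, _) in enumerate(schedule) if op == "c"}


def is_recoverable(schedule):
    """Every reader commits only after the writer it read from has committed."""
    commits = commit_positions(schedule)
    never = float("inf")
    return all(commits.get(writer, never) < commits[reader]
               for reader, writer, _ in reads_from(schedule)
               if reader in commits)


def is_cascadeless(schedule):
    """Every read sees only values written by already committed transactions."""
    commits = commit_positions(schedule)
    never = float("inf")
    return all(commits.get(writer, never) < pos
               for _, writer, pos in reads_from(schedule))


# R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1: T2 commits before T1.
S = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "r", "X"), ("T2", "w", "X"),
     ("T2", "c", None), ("T1", "r", "Y"), ("T1", "w", "Y"), ("T1", "c", None)]
# W1(X), R2(X), c1, c2: recoverable, but T2 read uncommitted data.
S_REC = [("T1", "w", "X"), ("T2", "r", "X"), ("T1", "c", None), ("T2", "c", None)]
# W1(X), c1, R2(X), c2: cascadeless (and therefore recoverable).
S_CL = [("T1", "w", "X"), ("T1", "c", None), ("T2", "r", "X"), ("T2", "c", None)]

for name, sched in (("S", S), ("S_REC", S_REC), ("S_CL", S_CL)):
    print(name, is_recoverable(sched), is_cascadeless(sched))
```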
Chaps19&20-126
Serial and Serializable Schedules

Serial schedule:
 A schedule S is serial if, for every transaction T
participating in the schedule, all the operations of
T are executed consecutively in the schedule.
 Otherwise, the schedule is called nonserial schedule.


Serializable schedule:
 A schedule S is serializable if it is equivalent to
some serial schedule of the same n transactions.
Being serializable implies that the schedule is a correct
schedule that:
 Leaves the database in a consistent state.
 The interleaving of operations results in a state as
if the transactions were serially executed, while
achieving efficiency due to concurrent execution.
Chaps19&20-127
Serializability of Schedules


A Serial Execution of Transactions Runs One
Transaction at a Time (e.g., T1 and T2 or T2 and T1)
 All R/W Operations in Each Transaction Occur
Consecutively in S, No Interleaving
 Consistency: a Serial Schedule takes a Consistent
Initial DB State to a Consistent Final State
A Schedule S is Called Serializable If there Exists an
Equivalent Serial Schedule
 A Serializable Schedule also takes a Consistent
Initial DB State to Another Consistent DB State
 An Interleaved Execution of a Set of Transactions
is Considered Correct if it Produces the Same Final
Result as Some Serial Execution of the Same Set
of Transactions
 We Call such an Execution Serializable
Chaps19&20-128
Example of Serializability



Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
 After S1 or S2 X = 7 and Y = 40
These are the two Possible Serial Schedules
Schedule S1
T1
T2
Schedule S2
T1
T2
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Chaps19&20-129
Example of Serializability


Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
 After S1 or S2 X = 7 and Y = 40
 Is S3 a Serializable Schedule?
Schedule S1
T1
T2
Schedule S2
T1
T2
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Schedule S3
T1
T2
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Read(X);
X:=X;
Write(X);
commit;
Chaps19&20-130
Example of Serializability


Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
 After S1 or S2 X = 7 and Y = 40
 Is S4 a Serializable Schedule?
Schedule S1
T1
T2
Schedule S2
T1
T2
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Schedule S4
T1
T2
Read(X);
X:=X;
Write(X);
Read(X);
X:=X;
Write(X);
commit;
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Chaps19&20-131
Two Serial Schedules with Different Results


Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
 After S1 X = 7 and Y = 28
 After S2 X = 7 and Y = 27
Schedule S1
T1
T2
Schedule S2
T1
T2
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = X + 20;
Write(Y);
commit;
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = X + 20;
Write(Y);
commit;
A Schedule is Serializable
if it Matches Either S1 or S2 ,
Even if S1 and S2 Produce
Different Results!
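The two serial orders can be simulated to see how they reach different final states. The sketch below is an assumption-laden illustration: the update operators on X are not legible in the slide text, so the decrements of 2 (T1) and 1 (T2) are chosen only because they reproduce the stated results from X = 10, Y = 20. The names `t1`, `t2`, and `run_serial` are hypothetical.

```python
def t1(db):
    x = db["X"] - 2        # assumed update; the slide's operator is not legible
    db["X"] = x
    db["Y"] = x + 20       # the slide's Y = X + 20


def t2(db):
    db["X"] = db["X"] - 1  # assumed update


def run_serial(order, initial):
    """Run whole transactions one after another on a copy of the database."""
    db = dict(initial)
    for txn in order:
        txn(db)
    return db


START = {"X": 10, "Y": 20}
s1 = run_serial([t1, t2], START)   # T1 then T2
s2 = run_serial([t2, t1], START)   # T2 then T1
print(s1, s2)
```

Either outcome is acceptable: serializability only requires equivalence to some serial order, not to a particular one.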
Chaps19&20-132
Thoughts on Serializability


Serializability is hard to check
 Interleaving of operations occurs in an operating
system through some scheduler
 Difficult to determine beforehand how the
operations in a schedule will be interleaved
Need to Adopt a Practical Approach
 Come up with methods (protocols) to ensure
serializability.
 However, it is not possible to determine when a
schedule begins and when it ends.
 Hence, we reduce the problem of checking the
whole schedule to checking only a committed
projection of the schedule
Chaps19&20-133
How do we Check for Conflicts?

Testing for conflict serializability:
 Look at only read_Item (X) and write_Item (X)
operations
 Constructs a precedence graph (serialization graph)
with directed edges
 An edge is created from Ti to Tj if one of the
operations in Ti appears before a conflicting
operation in Tj
 The schedule is serializable if and only if the
precedence graph has no cycles.
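The test above can be sketched in a few lines of Python. The representation is an assumption: a schedule is a list of hypothetical `(txn, op, item)` triples with `"r"` and `"w"` for read_item and write_item; `precedence_edges` and `has_cycle` are illustrative names.

```python
def precedence_edges(schedule):
    """Return arcs (Ti, Tj): an operation of Ti precedes a conflicting one of Tj."""
    ops = [(t, op, x) for t, op, x in schedule if op in ("r", "w")]
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(ops):
        for tj, op_j, x_j in ops[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x_i == x_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Repeatedly delete nodes with no incoming arc; leftovers mean a cycle."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(v != n for _, v in edges)}
        if not sources:
            return True
        nodes -= sources
        edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return False


# R1(X), W1(X), R2(X), W2(X), R1(Y), W1(Y)
S4 = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "r", "X"),
      ("T2", "w", "X"), ("T1", "r", "Y"), ("T1", "w", "Y")]
# R1(X), R2(X), W1(X), W2(X): each transaction reads before the other writes.
LOST = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"), ("T2", "w", "X")]

print(precedence_edges(S4), has_cycle(precedence_edges(S4)))
print(sorted(precedence_edges(LOST)), has_cycle(precedence_edges(LOST)))
```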
Chaps19&20-134
The Serializability Theorem



A Dependency Exists Between Two Transactions If:
 They Access the Same Data Item Consecutively in
the Schedule and One of the Accesses is a Write
Three Cases: T2 Depends on T1 , Denoted by T1 → T2
 T2 Executes a Read(x) after a Write(x) by T1
 T2 Executes a Write(x) after a Read(x) by T1
 T2 Executes a Write(x) after a Write(x) by T1
 Don’t Care about Read(x) followed by Read(x)
Transaction T1 Precedes Transaction T2 If:
 There is a Dependency Between T1 and T2, and
 The R/W Operation in T1 Precedes the Dependent
T2 Operation in the Schedule
Chaps19&20-135
The Serializability Theorem


A Precedence Graph of a Schedule is a Graph
G = <TN, DE>, where
 Each Node is a Single Transaction;
i.e.,TN = {T1, ..., Tn} (n>1)
and
 Each Arc (Edge) Represents a Dependency Going
from the Preceding Transaction to the Other
i.e., DE = {eij | eij = (Ti, Tj), Ti, Tj ∈ TN}
 Use Dependency Cases on Prior Slide
The Serializability Theorem
 A Schedule is Serializable if and only if its
Precedence Graph is Acyclic
Chaps19&20-136
Serializability Theorem Example



Consider S1 and S2 for Transactions T1 and T2
Consider the Two Precedence Graphs for S1 and S2
No Cycles in Either Graph!
Schedule S1
T1
T2
X
T2
X
T2
T1
T2
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Schedule S1
T1
T1
Schedule S2
Read(X);
X:=X;
Write(X);
commit;
Read(X);
X:=X;
Write(X);
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Schedule S2
Chaps19&20-137
What are Precedence Graphs for S3 and S4?


For S3
 T1 → T2 (T2 Write(X) After T1 Write(X))
 T2 → T1 (T1 Write(X) After T2 Read (X))
For S4 T1 → T2 (T2 Read/Write(X) After T1 Write(X))
X
Schedule S3
T1
T1
T2
Read(X);
X:=X;
X
Write(X);
Read(Y);
Schedule S3
T1
T2
X
Schedule S4
T2
Y = Y + 20;
Write(Y);
commit;
Read(X);
X:=X;
Write(X);
commit;
Schedule S4
T1
T2
Read(X);
X:=X;
Write(X);
Read(X);
X:=X;
Write(X);
commit;
Read(Y);
Y = Y + 20;
Write(Y);
commit;
Chaps19&20-138
Four Schedules and their …
Chaps19&20-139
… Precedence Graphs
Chaps19&20-140
Serializability Facts



Serializability Emphasizes Throughput
Serializable Executions Allow us to Enjoy the Benefits
of Concurrency without Giving up Any Correctness
 However, we May NOT GET the Same Result
Testing for Serializability Difficult in Practice:
 Finding a Serializable Schedule for an Arbitrary
Set of Transactions is NP-hard
 Interleaving of Operations From Concurrent
Transactions is Determined Dynamically at Run-time
 Practically Almost Impossible to Determine
Ordering of Operations Beforehand to Ensure
Serializability
Chaps19&20-141
Database Concurrency Control


Purpose of Concurrency Control
 To enforce Isolation (through mutual exclusion)
among conflicting transactions.
 To preserve database consistency through
consistency preserving execution of transactions.
 To resolve read-write and write-write conflicts.
Example:
 In concurrent execution environment if T1
conflicts with T2 over a data item A, then the
existing concurrency control decides if T1 or T2
should get the A and if the other transaction is
rolled-back or waits.
Chaps19&20-142
Concurrency Control





Different Locking-Based Algorithms
 Binary Locks (Lock and Unlock)
 Share Read Locks and Exclusive Write Locks
 Write Lock Does Not Imply Read
2 Phase Protocol
 All Locks Must Precede All Unlocks in Trans.
 True for All Transactions - Schedule Serializable
Concurrency Control Implementation Techniques
Optimistic Concurrency Control
 Time-Based Access to Information
 Consider “When” Information Read/Written to
Identify Potential or Prior Conflicts
We’ll Deviate from Textbook Notation
Chaps19&20-143
Summary of CC Techniques



Two-Phase Locking
 Most Important in Practice
 Used by a Majority of DBMSs
 Serializes in the Middle of Transactions
 Low Overhead
 Relatively Low Concurrency
Timestamp-Based
 Based on Multiple Versions of Data Items
 Serializes at the Beginning of Transactions
 Mostly Used in Distributed DBMSs
Optimistic Concurrency Control Methods
 Serializes at the End of Transactions
 Relatively High Concurrency
Chaps19&20-144
Recalling Important Concepts

Transaction: Sequence of Database Commands that
Must be Executed as a Single Unit (Program)
 Recall SQL Update Query
 Equivalent to Multiple Operations
 Read from DB, Modify (Local Copy), Write to DB
 Modify Sometimes Delete and Insert


Granularity: Size of Data that is Locked for an
Executing DB Transaction - Wide Range
 Database
 Relation (Tuple vs. Entire Table)
 Attribute (Column)
 Meta-Data (System Catalog)
Locking: Provides Means for Synchronization
Chaps19&20-145
Transaction Example


Two Possible Outcomes for T1 and T2 – Let A = 5
 If T1 First, then A = 150
 If T2 First, then A = 60
Is this a Problem?
T1
T2
T1
T2
LOCK A
READ A
A=A*10
WRITE A
UNLOCK A
commit;
LOCK A
READ A
A=A+10
WRITE A
UNLOCK A
commit;
LOCK A
READ A
A=A*10
WRITE A
UNLOCK A
commit;
LOCK A
READ A
A=A+10
WRITE A
UNLOCK A
commit;
Chaps19&20-146
Transaction Example




The Two Different Orderings of
T1 and T2 Represent Alternate
Serial Schedules (Non-Interleaved)
Key Concept: Concurrent (Interleaved) Execution of
Several DB Transactions is Correct if and only if its
Effect is the Same as that Obtained by Running the
Same Transactions in a Serial Order
If Result is Either 150 or 60 – it is OK!
This is the Concept of Serializability!
T1
LOCK A
READ A
A=A+10
WRITE A
UNLOCK A
commit;
T2
LOCK A
READ A
A=A*10
WRITE A
UNLOCK A
commit;
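The two acceptable outcomes, and one that is not acceptable, can be reproduced with a short Python sketch. It takes T1 as A=A+10 and T2 as A=A*10, as listed on this slide, with A = 5 initially. The function `lost_update` is a hypothetical illustration of what could happen if both transactions read A before either writes it, which the locks rule out.

```python
def t1(a):
    return a + 10   # T1: A = A + 10


def t2(a):
    return a * 10   # T2: A = A * 10


A0 = 5
serial_outcomes = {t2(t1(A0)), t1(t2(A0))}   # T1 then T2, T2 then T1


def lost_update(a):
    """Unlocked interleaving: both read A, then T1 writes, then T2 overwrites."""
    read_by_t1 = a
    read_by_t2 = a
    a = t1(read_by_t1)   # T1 writes its result
    a = t2(read_by_t2)   # T2 overwrites it using the stale value
    return a


print(sorted(serial_outcomes), lost_update(A0))
```

The interleaved result matches neither serial order, so that execution would not be serializable.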
Chaps19&20-147
Recalling Key Definitions





A Schedule for a Set of Transactions is the Order in
Which the Elementary Steps (Read, Lock, Assign,
Commit, etc.) are Performed
A Schedule is Serial if All Steps of Each Transaction
Occur Consecutively
A Schedule is Serializable if it is Equivalent to
“Some” Serial Schedule
If T1, T2 and T3 are Transactions - What are the
Possible Serial Schedules?
 T2 T3 T1
 T1 T2 T3
 T3 T1 T2
 T1 T3 T2
 T3 T2 T1
 T2 T1 T3
Different Serial Schedules for 4 Transactions?
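The serial schedules of a set of transactions are simply the orderings of whole transactions, so there are n! of them. A brief Python sketch (the helper name `serial_schedules` is hypothetical) enumerates them:

```python
from itertools import permutations
from math import factorial


def serial_schedules(txns):
    """Every ordering of whole transactions is one serial schedule."""
    return [" ".join(order) for order in permutations(txns)]


three = serial_schedules(["T1", "T2", "T3"])
four = serial_schedules(["T1", "T2", "T3", "T4"])
print(three)
print(len(three), len(four), factorial(4))
```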
Chaps19&20-148
Another Example of Serializability


Two Serial Schedules – Let A = 15, B = 25, C=5
What are Values of A, B, and C after Each?
A = 5, B = 15, C=25
S1
T1
Read(A);
A:=A-10;
Write(A);
Read(B);
B = B + 10;
Write(B);
commit;
T2
Read(B);
B:=B-20;
Write(B);
Read(C);
C=C+20
Write(C)
commit;
S2
T1
T2
Read(B);
B:=B-20;
Write(B);
Read(C);
C=C+20
Write(C)
commit;
Read(A);
A:=A-10;
Write(A);
Read(B);
B = B + 10;
Write(B);
commit;
Chaps19&20-149
Another Example of Serializability


Is S3 or S4 Serializable? – Let A = 15, B = 25, C = 5
Serial Values: A = 5, B = 15, C=25
T1
A = 5
B = 15
C = 25
T2
Read(A);
Read(B);
T1
T2
Read(A);
A:=A-10;
Read(B);
A:=A-10;
B:=B-20;
A = 5
B = 35
C = 25
Write(A);
B:=B-20;
Write(A);
Write(B);
Read(B);
Write(B);
Read(B);
Read(C);
B = B + 10;
Read(C);
B = B + 10;
C=C+20
Write(B);
Write(C)
commit;
commit;
Write(B);
commit;
C=C+20
Write(C)
commit;
Chaps19&20-150
Locks


Lock: Variable Associated with a Data Item in DB,
Describing the Status of that Item w.r.t. Possible Ops.
 A Means of Synchronizing the Access by
Concurrent Transactions to the Database Item
 Managed by Lock Manager
Binary Locks: Lock(x) and Unlock(x)
 A Transaction T Must Issue the Lock(x) before any
Read(x) or Write(x)
 A Transaction T Must use the Unlock(x) After all
Read(x)/Write(x) Operations are Completed in T
 System Catalog Maintains a Lock Table for All
Locked Items
 Lock(x)(or Unlock(x)) will not be Granted if there
Already Exists a Lock(x) (or Unlock(x))
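A binary lock table can be sketched as a dictionary from item to holding transaction. This is an illustrative Python sketch, not how any particular DBMS implements its lock manager: `BinaryLockTable` is a hypothetical class, and a refused request simply returns `False` where a real lock manager would typically make the transaction wait.

```python
class BinaryLockTable:
    """Tracks which transaction, if any, currently holds the lock on each item."""

    def __init__(self):
        self._holder = {}   # item -> transaction holding its lock

    def lock(self, txn, item):
        """Grant Lock(item) only if no lock on the item already exists."""
        if item in self._holder:
            return False
        self._holder[item] = txn
        return True

    def unlock(self, txn, item):
        """Grant Unlock(item) only to the transaction that holds the lock."""
        if self._holder.get(item) != txn:
            return False
        del self._holder[item]
        return True


table = BinaryLockTable()
print(table.lock("T1", "X"))     # granted
print(table.lock("T2", "X"))     # refused: T1 holds X
print(table.unlock("T1", "X"))   # granted
print(table.lock("T2", "X"))     # now granted
```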
Chaps19&20-151
A Basic Lock/Unlock Model




Database Transaction is a Sequence of Lock/Unlocks
Item Locked must Eventually be Unlocked
A Transaction Holds a Lock between Lock and
Unlock Statements
Lock/Unlock Assumes that the Value of the Item
Changes (Always Assumes a Write)
a0
f(a0)  a0
Lock A
Unlock A
f(a0)

For a Number of Transactions that Lock/Unlock A,
we’d have: f1(f2(f3( … fn( a0))))
Chaps19&20-152
Example - Assessing Schedule


Consider Three Transactions Below:
 T1 has f1(a) and f2(b)
 T2 has f3(b) and f4(c) and f5(a)
 T3 has f6(a) and f7 (c)
Functions Represent actions that Modify Instances a,
b, and c of Data Items A, B, and C, Respectively
T1
Lock A
Lock B
Unlock A
Unlock B
T2
Lock B
Lock C
Unlock B
Lock A
Unlock C
Unlock A
T3
Lock A
Lock C
Unlock C
Unlock A
Chaps19&20-153
Example - Assessing Schedule

Consider the Schedule with Changes to a, b, and c
T1 Lock A
T2 Lock B
T2 Lock C
T2 Unlock B
T1 Lock B
T1 Unlock A
T2 Lock A
T2 Unlock C
T2 Unlock A
T3 Lock A
T3 Lock C
T1 Unlock B
T3 Unlock C
T3 Unlock A

A
a
a
a
a
a
f1(a)
f1(a)
f1(a)
f5 (f1(a))
f5 (f1(a))
f5 (f1(a))
f5 (f1(a))
f5 (f1(a))
f6(f5 (f1(a)))
B
b
b
b
f3(b)
f3(b)
f3(b)
f3(b)
f3(b)
f3(b)
f3(b)
f3(b)
f2 (f3(b))
f2 (f3(b))
f2 (f3(b))
C
c
c
c
c
c
c
c
f4( c )
f4( c )
f4( c )
f4( c )
f4( c )
f7 (f4( c ))
f7 (f4( c ))
Is this Schedule Serializable?
Chaps19&20-154
Is this Schedule Serializable?





Focus on the Final Line - It indicates the Effective
Order of Execution of Each Transaction for a, b, and c
 T1 has f1(a) and f2(b)
 T2 has f3(b) and f4(c) and f5(a)
 T3 has f6(a) and f7 (c)
For A - Order of Transactions is T1 T2 T3
For B - T2 Must Precede T1
For C - T2 Must Precede T3
Can All Three Conditions be True w.r.t. Order?
T3 Unlock A
A
f6(f5 (f1(a)))
B
f2 (f3(b))
C
f7 (f4( c ))
Chaps19&20-155
Determining Serializability in this Model




Examine Schedule Based on Order in Which Various
Transactions Obtain Locks
Order must be Equivalent to Some Hypothetical Serial
Schedule of Transactions
If Orders for Different Data Items Forces Two
Transactions to Appear in a Different Order
(T2 Must Precede T1 and T1 Must Precede T2 )
There is a Paradox!
This is Equivalent to Searching for Cycles in a
Directed Graph
Chaps19&20-156
Recall Topological Sort



Graph is Acyclic
Find a Node of Graph with ONLY Arrows Leaving
(no Entering)
Delete Node and Arrows
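The procedure recalled here can be written as a short Python sketch. The function name `topological_order` is hypothetical; ties between candidate nodes are broken alphabetically only to make the output deterministic. It returns `None` when no node without entering arrows exists, which means the graph has a cycle.

```python
def topological_order(nodes, edges):
    """Repeatedly pick a node with no entering arc, emit it, delete it and its arcs."""
    nodes = set(nodes)
    edges = set(edges)
    order = []
    while nodes:
        sources = sorted(n for n in nodes if all(v != n for _, v in edges))
        if not sources:
            return None          # every remaining node has an entering arc: cycle
        node = sources[0]
        order.append(node)
        nodes.remove(node)
        edges = {(u, v) for u, v in edges if u != node}
    return order


print(topological_order({"T1", "T2", "T3"}, {("T1", "T2"), ("T2", "T3")}))
print(topological_order({"T1", "T2"}, {("T1", "T2"), ("T2", "T1")}))
```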
Chaps19&20-157
Algorithm 1: Binary Lock Model



Input: Schedule S for Transactions T1, T2 , … Tk
Output: Determination if S is Serializable, and If so,
an Equivalent Serial Schedule
Method: Create a Directed Precedence Graph G:
 Let S = a1 ; a2 ; … ; an where each ai is
Tj :Lock Am or Tj : Unlock Am
 For each ai = Tj : Unlock Am , find next
ap = Ts : Lock Am (i < p ≤ n) (Ts is next Trans. to
lock Am), and if one exists, draw Arc in G from Tj to Ts
 Repeat Until All Unlock/Lock are Checked
 Review the Resulting Precedence Graph
 If G has Cycles - Non-Serializable
 If G is Acyclic - Topological Sort to Find an Equivalent
Serial Schedule
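Algorithm 1 can be sketched in Python as follows. The representation is an assumption: each step is written as a line such as `T1 Lock A`, and `parse`, `lock_precedence_edges`, and `serial_order` are hypothetical helper names. The first schedule is the 14-step Lock/Unlock example from these slides; the second is the shorter 8-step example.

```python
def parse(text):
    """Turn lines such as 'T1 Lock A' into (txn, action, item) triples."""
    return [tuple(line.split()) for line in text.strip().splitlines()]


def lock_precedence_edges(schedule):
    """Arc Tj -> Ts when Ts is the next transaction to lock an item Tj unlocked."""
    edges = set()
    for i, (tj, action, item) in enumerate(schedule):
        if action != "Unlock":
            continue
        for ts, next_action, next_item in schedule[i + 1:]:
            if next_action == "Lock" and next_item == item:
                if ts != tj:
                    edges.add((tj, ts))
                break  # only the next lock on this item matters
    return edges


def serial_order(schedule):
    """Equivalent serial order via topological sort, or None if G has a cycle."""
    nodes = {txn for txn, _, _ in schedule}
    edges = lock_precedence_edges(schedule)
    order = []
    while nodes:
        sources = sorted(n for n in nodes if all(v != n for _, v in edges))
        if not sources:
            return None
        order.append(sources[0])
        nodes.remove(sources[0])
        edges = {(u, v) for u, v in edges if u != sources[0]}
    return order


FIRST = parse("""
    T1 Lock A
    T2 Lock B
    T2 Lock C
    T2 Unlock B
    T1 Lock B
    T1 Unlock A
    T2 Lock A
    T2 Unlock C
    T2 Unlock A
    T3 Lock A
    T3 Lock C
    T1 Unlock B
    T3 Unlock C
    T3 Unlock A
""")
SECOND = parse("""
    T2 Lock A
    T2 Unlock A
    T3 Lock A
    T3 Unlock A
    T1 Lock B
    T1 Unlock B
    T2 Lock B
    T2 Unlock B
""")

print(sorted(lock_precedence_edges(FIRST)), serial_order(FIRST))
print(sorted(lock_precedence_edges(SECOND)), serial_order(SECOND))
```

In the first schedule the arcs between T1 and T2 run in both directions, so no serial order exists; the second has an acyclic graph.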
Chaps19&20-158
Precedence Graph for Prior Example

T1 Lock A
T2 Lock B
T2 Lock C
T2 Unlock B
T1 Lock B
T1 Unlock A
T2 Lock A
T2 Unlock C
T2 Unlock A
T3 Lock A
T3 Lock C
T1 Unlock B
T3 Unlock C
T3 Unlock A
Look for Unlock Lock Combos on the
Same Data Item
 T2 Unlock B and T1 Lock B
 T1 Unlock A and T2 Lock A
 T2 Unlock C and T3 Lock C
 T2 Unlock A and T3 Lock A
B
T1
T2
A, C
A
T3

IS IT SERIALIZABLE?
Chaps19&20-159
Another Example
T2 Lock A
T2 Unlock A
T3 Lock A
T3 Unlock A
T1 Lock B
T1 Unlock B
T2 Lock B
T2 Unlock B

Look for Unlock Lock Combos on the
Same Data Item
 T2 Unlock A and T3 Lock A
 T1 Unlock B and T2 Lock B

IS IT SERIALIZABLE?
IF SO WHAT IS THE SCHEDULE?

T1
T2
A
B
T3
Chaps19&20-160
Two-Phase Protocol



Two-Phase Protocol - All Locks Must Precede All
Unlocks in the Schedule for a Transaction
Which of the Transactions Below are Two-Phase?
Why or Why Not?
T1
Lock A
Lock B
Unlock A
Unlock B
T2
Lock B
Lock C
Unlock B
Lock A
Unlock C
Unlock A
T3
Lock A
Lock C
Unlock C
Unlock A
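The two-phase condition is easy to check per transaction: once any Unlock has been issued, no further Lock may appear. A minimal Python sketch, with `is_two_phase` as a hypothetical helper name and the three transactions written as `(action, item)` pairs:

```python
def is_two_phase(actions):
    """True if every Lock precedes every Unlock in the transaction."""
    seen_unlock = False
    for action, _item in actions:
        if action == "Unlock":
            seen_unlock = True
        elif action == "Lock" and seen_unlock:
            return False   # a Lock after an Unlock violates the protocol
    return True


T1 = [("Lock", "A"), ("Lock", "B"), ("Unlock", "A"), ("Unlock", "B")]
T2 = [("Lock", "B"), ("Lock", "C"), ("Unlock", "B"),
      ("Lock", "A"), ("Unlock", "C"), ("Unlock", "A")]
T3 = [("Lock", "A"), ("Lock", "C"), ("Unlock", "C"), ("Unlock", "A")]

print(is_two_phase(T1), is_two_phase(T2), is_two_phase(T3))
```

T2 fails the check because it issues Lock A after Unlock B.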
Chaps19&20-161
Theorems Regarding Serializability


Theorem 1: Algorithm 1 Correctly Determines if a
Schedule S is Serializable (omit the proof).
Theorem 2: If S is any Schedule of 2 Phase
Transactions (i.e., all of its Transactions are 2-Phase),
then S is Serializable.
 Proof by Contradiction.
 Suppose Not - then by Theorem 1, S has a
Precedence Graph G with a Cycle
 T1 → T2 → T3 … → Tp → T1
 Each Arc Goes from an Unlock in its Tail Transaction
to a Lock in its Head Transaction
 In T1 → T2 , T1 is Unlock, so all Remaining
Actions must also be Unlock, since S is 2 Phase
 However, in Tp → T1 , T1 is Lock, which is a
Contradiction to Fact that S is 2 Phase
Chaps19&20-162
Problems of Binary Locks



Only One Transaction Can Hold a Lock on a Given
Item
No Shared Reading is Allowed - Too Restrictive
For Example
 T1 is Read Only on X - Yet Needs Full Lock
 T2 is Read Only on X and Y - Needs Full Locks
T1
Read(X);
Read(Y);
time
t1
t2
Y = Y + 20;
Write(Y);
T2
t3
t4
t5
Read(X);
Read(Y)
commit;
commit;
Chaps19&20-163
Algorithm 2: A Read/Write Lock Model





Refines the Granularity of Locking to Differentiate
Between Read and Write Locks
Improves Concurrent Access
Rlock (Shared): If T has an Rlock A, then Any Other
Transaction can Also Rlock A, but All Transactions
are Forbidden from Wlock A until All Transactions
with Rlock A issue Ulock A (Multiple Reads)
Wlock (Exclusive): If T has Wlock A, then All Other
Transactions are Forbidden to Rlock or Wlock A Until
T Ulocks A (Write Implies Reading, Single Write)
Two Schedules are Equivalent if:
 Produce Same Value for Each Data Item
 Each Rlock on an Item Occurs in Both Schedules
at a Time When Locked Item has the Same Value
Chaps19&20-164
Motivating Algorithm 2



Rlock (Shared): Multiple Reads Allowed
Wlock (Exclusive): Write Implies Reading, Sole Write
Identify All Dependencies Among Transactions that
Read and Write the Same Item
 If Ti :Rlock A and Tj : Wlock A is Next Trans
to Write A – put in an arc from Ti to Tj
 Ti must precede Tj in the Schedule w.r.t. A

If Ti :Wlock A and Tj : Wlock A is Next Trans
to Write A – put in an arc from Ti to Tj
 Ti must precede Tj in the Schedule w.r.t. A

If Tm: Rlock A between Ti :Wlock A and Tj :
Wlock– put in an arc from Ti to Tm
 Tm must follow Ti in the Schedule w.r.t. A
Chaps19&20-165
Algorithm 2: Read/Write Lock Model



Input: Schedule S for Transactions T1, T2 , … Tk
Output: Is S Serializable? If so, Serial Schedule
Method: Create a Directed Precedence Graph G:
 Suppose in S, Ti :Rlock A.
 If Tj : Wlock A is the Next Transaction to Wlock A (if
it exists) then place an Arc from Ti to Tj.
 Repeat for all Ti’s, all Rlocks before Wlock on A!
 Suppose in S, Ti :Wlock A.
 If Tj : Wlock A is the Next Transaction to Wlock A (if
it exists) then place an Arc from Ti to Tj.
 If Also exists Tm :Rlock A after Ti :Wlock A but before
Tj : Wlock A, then Draw an Arc from Ti to Tm.
 Review the Resulting Precedence Graph
 If G has Cycles - Non-Serializable
 If G is Acyclic - Topological Sort for Serial Schedule
Chaps19&20-166
Algorithm 2: Read/Write Lock Model

Look for Following Arcs:

Add Arc: Ti :Rlock A to Tj : Wlock A
 where Tj is the NEXT transaction to Write A

Add Arc: Ti :Wlock A to Tj : Wlock A
 where Tj is the NEXT transaction to Write A

Add Arc: Ti :Wlock A to Tm :Rlock A
 where Tm :Rlock A occurs after Ti :Wlock A but before Tj :
Wlock A
Chaps19&20-167
Consider the Following Schedule

What are the Dependencies Among Transactions?
Step  T1        T2        T3        T4
(1)                       Wlock A
(2)                                 Rlock B
(3)                       Unlock A
(4)   Rlock A
(5)                                 Unlock B
(6)                       Wlock B
(7)             Rlock A
(8)                       Unlock B
(9)   Wlock B
(10)            Unlock A
(11)  Unlock A
(12)                                Wlock A
(13)  Unlock B
(14)            Rlock B
(15)                                Unlock A
(16)            Unlock B
Chaps19&20-168
What are the Different Cases?
(Schedule as on the previous slide, Steps (1) to (16))
For Each Rlock (T1 :Rlock A, T2 :Rlock A) Look for Next T to Wlock A
 T1 before T4, T2 before T4
For Each Wlock (T3 :Wlock A) Look for Next T to Rlock or Wlock A
 T3 before T1, T3 before T2, T3 before T4
For Each Rlock (T4 :Rlock B) Look for Next T to Wlock B
For Each Wlock (T3 :Wlock B) Look for Next T to Wlock B
 T4 before T3, T3 before T1
Chaps19&20-169
Consider the Following Schedule

What is the Precedence Graph G?
(Schedule repeated from two slides earlier: Steps (1) to (16) of T1, T2, T3, T4)
Chaps19&20-170
Precedence Graph



What is the Resulting Precedence Graph?
Is the Schedule Serializable?
Why or Why Not?
T1 before T4, T2 before T4
T3 before T1, T3 before T2, T3 before T4
T4 before T3, T3 before T1
(Figure: precedence graph on T1, T2, T3, T4; arc labels shown: A:RW (×2), A:WR, A:WW, B:WW (×2), B:RW)
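The arc rules of Algorithm 2 can be sketched in Python and run on the 16-step schedule from the slides. This is an illustration under assumptions, not the course's implementation: the `(txn, op, item)` encoding and the function names are hypothetical, the sketch also draws an arc from a writer to later readers when no later writer exists, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter, CycleError


def precedence_arcs(schedule):
    """Arcs of G for a list of (txn, op, item); op is "R", "W" or "U"."""
    arcs = set()
    for pos, (ti, op, item) in enumerate(schedule):
        if op == "U":
            continue
        for tj, later_op, later_item in schedule[pos + 1:]:
            if later_item != item or tj == ti:
                continue
            if later_op == "W":                  # next transaction to Wlock the item
                arcs.add((ti, tj))
                break
            if later_op == "R" and op == "W":    # reader after Ti's write
                arcs.add((ti, tj))
    return arcs


def serial_schedule(txns, arcs):
    """Topological order of G, or None when G has a cycle."""
    preds = {t: set() for t in txns}
    for before, after in arcs:
        preds[after].add(before)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


# Steps (1) to (16) of the example schedule, in step order.
S = [("T3", "W", "A"), ("T4", "R", "B"), ("T3", "U", "A"), ("T1", "R", "A"),
     ("T4", "U", "B"), ("T3", "W", "B"), ("T2", "R", "A"), ("T3", "U", "B"),
     ("T1", "W", "B"), ("T2", "U", "A"), ("T1", "U", "A"), ("T4", "W", "A"),
     ("T1", "U", "B"), ("T2", "R", "B"), ("T4", "U", "A"), ("T2", "U", "B")]
arcs = precedence_arcs(S)
print(sorted(arcs))
print(serial_schedule(["T1", "T2", "T3", "T4"], arcs))
```

Because the arcs include both T3 before T4 (on A) and T4 before T3 (on B), the graph has a cycle and no serial order is returned.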
Chaps19&20-171
A Read-Only/Write-Only Lock Model


Revision of the Read/Write Model for Algorithm 2
Refining Our Assumptions
 Assume that a Wlock on an Item Does not Mean
that the Transaction First Reads the Item
Contrary to First Two Models
 Example:
Read A; Read B; C=A+B; A=A-1; Write A; Write C
Reads A, B and Writes A,C (No Read on C)

Reformulate Notion of Equivalent Schedules
Chaps19&20-172
How Does This Model Differ from Alg. 2?




Consider the Schedule Segment:
T1 : Wlock A
T1 : Ulock A
T2 : Wlock A
T2 : Ulock A
In Algorithm 2 - T2 : Wlock A Assumes that T2 Reads
the Value Written by T1
However, This Need Not be True in the New Model
If Between T1 and T2, No Transaction Rlocks A, then
 Value Written by T1 is Lost,
 T1 Does not Have to Precede T2 in a Schedule
w.r.t. A
Chaps19&20-173
Motivating Algorithm 3





Rlock (Shared): Multiple Reads Allowed
Wlock (Exclusive):
 Write Does Not Mean Read, Sole Write
 Successive Writes without intervening Read
Means the Effects of Earlier Writes Disappear
For a Clean Start
 All Items are Written Prior to 1st Step of Sched
For a Clean Finish
 All Items are Read After last Step of Sched
Identify All Dependencies Among Transactions that
Write (Ti) and Read (Tj) Same Item (T0 through Tf )
 Add Arc from Ti to Tj (Ti is BEFORE Tj )
 For Next “Reads” after “Write”
 Can’t be Intervening Writes
Chaps19&20-174
Intuitive View of Algorithm 3




If Tj Reads Value of “A” Written by Ti , then Ti Must
Precede Tj in any Serial Schedule
 For WR Combo - Draw an Arc from Ti to Tj
Now Consider a T that also Writes “A”
 T Must be either Before Ti or After Tj
 Add in a Pair of Arcs T to Ti and Tj to T of Which
one Must be Chosen in the Final Precedence Graph
Serializability Occurs if After Choices Made for each
“T” Pair, the Resulting Graph is Acyclic
G is Referred to as a “Polygraph” with Nodes, Arcs,
and Alternate Arcs
Chaps19&20-175
Redefine Serializability


Conditions on Serializability Must be Redefined in
Support of the Write-Does-Not-Assume Read Model
If in Schedule S, Tj Reads “A” Written by Ti, then
 Ti Before Tj in any Serial Schedule Equivalent to S
 Further, if there is a T that Writes “A”, then in any
Serial Schedule Equivalent to S, T is Before Ti or
After Tj, but may not be Between Ti and Tj
 Graphically, we have:
(Figure: arc Ti → Tj labeled A:WR; T is placed either before Ti or after Tj, never between them)
Chaps19&20-176
Algorithm 3 Example Schedule
Transactions: T1, T2, T3, T4 (lock requests listed in step order)
(1) Rlock A    (2) Rlock A    (3) Wlock C    (4) Unlock C
(5) Rlock C    (6) Wlock B    (7) Unlock B   (8) Rlock B
(9) Unlock A   (10) Unlock A  (11) Wlock A   (12) Rlock C
(13) Wlock D   (14) Unlock B  (15) Unlock C  (16) Rlock B
(17) Unlock A  (18) Wlock A   (19) Unlock B  (20) Wlock B
(21) Unlock B  (22) Unlock D  (23) Unlock C  (24) Unlock A
Chaps19&20-177
Augmentation of Precedence Graph


In Support of the Write Does Not Imply Read Model,
we must Augment the Precedence Graph:
 Add an Initial Transaction To that Writes Every
Item, and a Final Transaction Tf that Reads Every
Item
 When a Transaction T’s Output is Invisible in Tf
(I.e., the Value is Lost), Then T is Referred to as a
Useless Transaction
 Useless Transactions have no Paths from
Transaction to Tf
Note: Maintain Same set of Locks (Rlock, Wlock,
Ulock) with Different Interpretation on Wlock
Chaps19&20-178
Algorithm 3 – Augmented Graph
T0 Writes A, B, C, D Prior to Step (1): Write A, Write B, Write C, Write D
Steps (1) to (24) of T1, T2, T3, T4 as in the Algorithm 3 Example Schedule
Tf Reads A, B, C, D After Step (24): Read A, Read B, Read C, Read D
Chaps19&20-179
Algorithm 3 – Steps 1 to 4



Input: Schedule S for Transactions T1, T2 , … Tk
Output: Is S Serializable? If so, Serial Schedule
Method: Create a Directed Polygraph P:
1. Augment S with Dummy To (Write Every Item)
and Dummy Tf (Read Every Item)
2. Create Initial Polygraph P by Adding Nodes for
To, Tf, and Each Ti Transaction , in S
3. Place an Arc from Ti to Tj Whenever Tj Reads A
in Augmented S (with Dummy States) that was
Last Written by Ti. Write to Read for Each Item
 Repeat this Step for all Arcs.
 Don’t Forget to Consider Dummy States!

4. Discover Useless Transactions - T is Useless if
there is no Path from T to Tf
This is the “Initialization” Phase of Algorithm 3
Chaps19&20-180
Resulting Polygraph - Steps 1 to 2

Create the Polygraph by
1. Add To and Tf to S,
2. Add To , Tf , T1 , T2 , T3 , T4 to Polygraph P
(Polygraph nodes so far, no arcs yet: T0, T1, T2, T3, T4, Tf)
3. Augment Schedule with To and Tf
 T0: Write A, Write B, Write C, Write D (before Step (1))
 Steps (1) to (24) as in the Algorithm 3 Example Schedule
 Tf: Read A, Read B, Read C, Read D (after Step (24))
Chaps19&20-181
Alg 3 Step 3 - Init=T0 & Fin=Tf
(Augmented schedule as on the previous slide: T0 Writes, Steps (1) to (24), Tf Reads)
For Each Write, Ask: Who Reads X after Ti Writes X?
 Asked for Items A, B, C, D and Writers T0, T1, T2, T4
 No one Reads A after T3 Writes A
Chaps19&20-182
Step 3 -Write to Reads on A
Chaps19&20-183
Step 3 - Write to Reads on B
Chaps19&20-184
Step 3 - Write to Reads on C
Chaps19&20-185
Step 3 - Write to Reads on D
Chaps19&20-186
Resulting Polygraph - Steps 1 to 3




1. Add To and Tf to S,
2. Add To , Tf , T1 , T2 , T3 , T4 to Polygraph P
3. Look for Ti Write X to Tj Read X for all Items X
4. Look for Useless Transactions - No Paths from T to Tf
(Figure: polygraph on T0, T1, T2, T3, T4, Tf; arc labels shown: A:WR (×3), B:WR (×3), C:WR (×3), D:WR)
Chaps19&20-187
Resulting Polygraph - Steps 1-4




1. Add To and Tf to S,
2. Add To , Tf , T1 , T2 , T3 , T4 to Polygraph P
3. Look for Ti Write X to Tj Read X for all Items X
4. T3 is Useless (No Path from T3 to Tf) - Remove Arcs Into T3 – This Completes Step 4
(Figure: polygraph on T0, T1, T2, T3, T4, Tf; arc labels shown: A:WR (×3), B:WR (×3), C:WR (×2), D:WR)
Chaps19&20-188
Algorithm 3 – Steps 5 to 7

Method: Reassess the Initial Polygraph P:
5. For Each Remaining Arc Ti W to Tj R (meaning
that Tj Reads Item A Written by Ti )
Consider all T ≠ To and T ≠ Tf that also Write A:
I. If Ti = To and Tj = Tf then Add No Arcs
II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to
Ti and Tj to T
6. Determine if P is Acyclic by “Choosing” One
Transaction Arc for Each Pair - Make Choices
Carefully
7. If Acyclic - Serializable - Perform Topological
Sort without To , Tf for Equivalent Serial Schedule.
Else - Not Serializable
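Steps 1 to 7 can be sketched in Python as follows. This is a hedged illustration, not the course's implementation: all function names are hypothetical, Step 6 is done by brute force over every choice of alternate arcs (exponential in the number of pairs), and `graphlib` requires Python 3.9 or newer. The assignment of the example's reads and writes to T1..T4 is an assumption, reconstructed so that it matches the arcs the slides report; Unlock steps are omitted because they do not affect the polygraph.

```python
from itertools import product
from graphlib import TopologicalSorter, CycleError


def reads_from(schedule, items):
    """Steps 1-3: (writer, reader, item) arcs with dummy T0 and Tf."""
    last_writer = {x: "T0" for x in items}        # T0 writes every item first
    arcs = set()
    for txn, op, x in schedule:
        if op == "R" and last_writer[x] != txn:
            arcs.add((last_writer[x], txn, x))
        elif op == "W":
            last_writer[x] = txn
    for x in items:                                # Tf reads every item last
        arcs.add((last_writer[x], "Tf", x))
    return arcs


def useless(txns, arcs):
    """Step 4: transactions with no path to Tf."""
    reaches_tf = {"Tf"}
    grew = True
    while grew:
        grew = False
        for writer, reader, _ in arcs:
            if reader in reaches_tf and writer not in reaches_tf:
                reaches_tf.add(writer)
                grew = True
    return set(txns) - reaches_tf


def build_polygraph(schedule, txns, items):
    """Steps 1-5: return (fixed arcs, alternate arc pairs, useless txns)."""
    wr = reads_from(schedule, items)
    dead = useless(txns, wr)
    wr = {arc for arc in wr if arc[1] not in dead}   # drop arcs into useless txns
    fixed = {(w, r) for w, r, _ in wr}
    pairs = []
    for ti, tj, x in sorted(wr):
        writers = {t for t, op, y in schedule if op == "W" and y == x}
        for t in sorted(writers - {ti, tj}):
            if ti == "T0" and tj == "Tf":
                continue                              # case I: no arcs
            if ti == "T0":
                fixed.add((tj, t))                    # case II: T after Tj
            elif tj == "Tf":
                fixed.add((t, ti))                    # case III: T before Ti
            else:
                pairs.append(((t, ti), (tj, t)))      # case IV: choose one
    return fixed, pairs, dead


def equivalent_serial_order(txns, fixed, pairs):
    """Steps 6-7: try each choice of alternates; None if all are cyclic."""
    for choice in product(*pairs):
        preds = {n: set() for n in ["T0", *txns, "Tf"]}
        for before, after in fixed | set(choice):
            preds[after].add(before)
        try:
            order = list(TopologicalSorter(preds).static_order())
        except CycleError:
            continue
        return [t for t in order if t not in ("T0", "Tf")]
    return None


# Assumed assignment of the example's Rlock/Wlock steps to transactions.
S = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "C"), ("T3", "R", "C"),
     ("T1", "W", "B"), ("T2", "R", "B"), ("T3", "W", "A"), ("T4", "R", "C"),
     ("T2", "W", "D"), ("T4", "R", "B"), ("T4", "W", "A"), ("T4", "W", "B")]
TXNS = ["T1", "T2", "T3", "T4"]
fixed, pairs, dead = build_polygraph(S, TXNS, "ABCD")
order = equivalent_serial_order(TXNS, fixed, pairs)
print(dead, pairs, order)
```

Under the assumed assignment the sketch finds T3 useless, produces a single Case IV pair for item B, and returns an order only for the choice that places T4 after T2.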
Chaps19&20-189
What are Four Cases of Step 5 Conceptually?

5. For Each Remaining Arc Ti W to Tj R
Consider all T ≠ To and T ≠ Tf that also Write A:
I. If Ti = To and Tj = Tf then Add No Arcs
II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj
to T
General Case: Ti → Tj labeled X:WR
Case I: no new arc (T0 → Tf labeled X:WR)
Case II: Add Arc from Tj to T - T is after Tj
(T0 → Tj labeled X:WR, Tj → T labeled II X:RW)
Chaps19&20-190
What are Four Cases of Step 5 Conceptually?

5. For Each Remaining Arc Ti W to Tj R
Consider all T ≠ To and T ≠ Tf that also Write A:
I. If Ti = To and Tj = Tf then Add No Arcs
II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj
to T
General Case: Ti → Tj labeled X:WR
Case III: Add Arc from T to Ti – T is before Ti
(T → Ti labeled III X:RW, Ti → Tf labeled X:WR)
Chaps19&20-191
What are Four Cases of Step 5 Conceptually?

5. For Each Remaining Arc Ti W to Tj R
Consider all T ≠ To and T ≠ Tf that also Write A:
I. If Ti = To and Tj = Tf then Add No Arcs
II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to Ti
and Tj to T
General Case: Ti → Tj labeled X:WR
Case IV: Add in two Arcs - T is after Tj or before Ti
(Ti → Tj labeled X:WR; alternates T → Ti and Tj → T, each labeled IV X:RW)
Chaps19&20-192
Step 5 - Go Thru Each Write/Read Arrow
(Augmented schedule as before: T0 Writes, Steps (1) to (24), Tf Reads)
For T0 to T1 Arc: Who Else Writes A?
For T0 to T2 Arc: Who Else Writes A?
For T4 to Tf Arc: Who Else Writes A?
Chaps19&20-193
Resulting Polygraph - Step 5 - A:WR
(Figure, before: polygraph on T0, T1, T2, T3, T4, Tf; arc labels shown: A:WR (×3), B:WR (×3), C:WR (×2), D:WR)
(Figure, after: same polygraph plus four arcs labeled II A:RW and one arc labeled III A:RW)
Chaps19&20-194
Resulting Polygraph - Step 5 - A:WR

5. For Each Arc Ti to Tj Consider All T’s that Write X
I. If Ti = To and Tj = Tf then Add No Arcs
II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
IV. If Ti ≠ To and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Check Item A (see new arcs/labels - Case II and III)
(Figure: polygraph with four arcs labeled II A:RW and one arc labeled III A:RW added)
Chaps19&20-195
Alg 3 Ex - Step 5 - Who Else Writes C/D?
(Augmented schedule as before: T0 Writes, Steps (1) to (24), Tf Reads)
For the Three T1 Arcs: Does Anyone Else Write C?
For the One T2 Arc: Does Anyone Else Write D?
No Writes - No New Arcs
Chaps19&20-196
Resulting Polygraph-Step 5- C:WR & D:WR

5. For Each Arc Ti to Tj Consider All T’s that Write X
I. If Ti = To and Tj = Tf then Add No Arcs
II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
IV. If Ti ≠ To and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Do any Other Transactions Write C or Write D for the
arrows labeled C:WR and D:WR Respectively?
(Figure: polygraph unchanged from the previous step, with four II A:RW arcs and one III A:RW arc)
Chaps19&20-197
Alg 3 Ex - Step 5 - Who Else Writes B?
(Augmented schedule as before: T0 Writes, Steps (1) to (24), Tf Reads)
For Each B:WR Arc: Who Else Writes B?
 T4 Writes B
 Case IV - Two Arcs: Arc from T2 to T4 (T4 after T2) and Arc from T4 to T1 (T4 before T1)
Chaps19&20-198
Two Added Arcs for Case IV and B
T4 Follows T2 and T4 Before T1
(Figure: polygraph with the two alternate arcs labeled IV B:RW added)
Chaps19&20-199
Resulting Polygraph - Step 5 and 6

5. For Each Arc Ti to Tj Consider All T’s that Write X
I. If Ti = To and Tj = Tf then Add No Arcs
II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
IV. If Ti ≠ To and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Check Item B (see new arcs - including alternates - dashed)
 For T1 to T2, T4 writes - so add T2 to T4 and T4 to T1 – Case IV
 Either T4 After T2 or Before T1 - no new arcs for other WRs.
(Figure: polygraph with the two alternate arcs labeled IV B:RW shown dashed)
Chaps19&20-200
Resulting Polygraph - Step 5 and 6

6. Which Option of Pair of Arcs Should be Chosen? Why?
(Figure: polygraph with both alternate arcs labeled IV B:RW)
Chaps19&20-201
Final Polygraph - Step 7

Final Graph with Arc Removed - Delete Dummy States Below
(Figure: polygraph with one of the two alternate IV B:RW arcs removed)
Topological Sort Yields Order: T1 , T2 , T3 , T4
(Figure: final graph on T1, T2, T3, T4 without T0 and Tf; arc labels shown: C:WR, B:WR (×2), II A:RW (×4), IV B:RW, III A:RW)
Chaps19&20-202
Why Optimistic Concurrency Control?


Motivated by Disadvantages of Locking Techniques
 Lock Maintenance
 Deadlock-Free Locking Protocols Limit
Concurrency
 Secondary Memory Access Causes Locks to be
Held for a Long Duration
 Locks Typically Held Until Transaction
Completes, Which Reduces Concurrency
 Often Needed in “Worst” Case Only
 Overhead - Locking + Deadlock Detection
Key Concept
 Write Collisions in Large Databases for “Many”
Applications are Rare
 OCC: “Don’t Worry be Happy” Approach
Chaps19&20-203
Basic Ideas of OCC



Interference Between Transactions is Rare and
Locking Incurs too Much Overhead
Instead, Allow Each Transaction to Execute Freely,
and Check Serializability at the end of the Transaction
Win (Allow to Commit) If No Interference Occurs or
There have been No Conflicts
Pessimistic execution: Validate → Read (and Compute) → Write
Optimistic execution: Read (and Compute) → Validate → Write
Chaps19&20-204
How Does OCC Work?





Execute Transactions Ad-Hoc - Let them Go
Uncontrolled
Maintain Information of “Relevant” Actions Against
DB (Often in Conjunction with Recovery/Journal)
When Transactions Finish - Check to see if Everything
Proceeded Satisfactorily
Assumes that Probability of Transaction Interference
is Quite Small
Two Questions re. OCC:
 How Do We know Everything Went OK?
 How do we Recover if it Didn’t?
Chaps19&20-205
What is a Timestamp?

Timestamp
 A system generated clock “tick” to record event
 Two events cannot occur at same “tick”
 A monotonically increasing variable (integer)
indicating the age of an operation or a transaction.
 A larger timestamp value indicates a more recent
event or operation.
 Timestamp based algorithm uses timestamp to
serialize the execution of concurrent transactions.
 For DB Transactions, a timestamp could be:
 Time that transaction is initiated
 Time of first read/write of transaction
 Remains unchanged throughout all Transaction steps
Chaps19&20-206
How are Timestamps Utilized?




Each Transaction has unique Timestamp(TS) when
started
Associated with the Read time and Write time (when
Stored) of Each Item in the DB
Let t1 be the TS of a Transaction and B an Item with TS t2
Avoid “impossible” situations –
 A Transaction CANNOT read the value of an Item
if it was not written until after transaction executed
 Trans TS t1 can’t read Item B with write TS t2 if t2 > t1

A Transaction CANNOT write an Item if that Item
has an old value read at a later time (after)
 Trans TS t1 can’t write Item B with read TS t2 if t2 > t1
 If happens - Trans TS t1 must abort
Chaps19&20-207
OCC Utilizes Timestamps




Timestamps are Clock Ticks used to Record the Major
Milestones in the Execution of a Transaction
Examples Include:
 Start Time of Transaction
 Read/Write Times for DB Items
 Finish Time of Transaction
 Commit Time of Transaction
Two Important Definitions are:
 Read Time of an Item: Highest Time Stamp
Possessed by Any Transaction that Reads the Item
 Write Time of an Item: Highest Time Stamp
Possessed by Any Transaction that Wrote the Item
A Transaction has a Fixed Time when it Started that is
Constant Throughout its Execution
Chaps19&20-208
How are Timestamps Used?



Focus on “When” Reads and Writes Occur
Transaction Cannot Read an Item if its Value was Not
Written Until After the Transaction Finished its
Execution
 Transaction T with Timestamp t1 Cannot Read an
Item with a Write Time of t2 if t2 > t1
 If this is the Case, T Must Abort and be Restarted
 Can’t Read Item if it hasn’t been Written
Transaction Cannot Write an Item if that Item has its
Old Value Read at a Later Time
 Transaction T with Timestamp t1 Cannot Write an
Item with a Read Time of t2 if t2 > t1
 If this is the Case, T Must Abort and be Restarted
 Can’t Write Item Being Read at a Later Time
Chaps19&20-209
Algorithm 4: Optimistic CC

Let T be a Transaction with Timestamp t Attempting
to Perform Operation X on a Data Item I with
Readtime tR and Writetime tW
 If (X = Read and t ≥ tW ) Perform Oper
 If t > tR then set tR = t for Data Item I (read after write)
 If (X = Write and t ≥ tR and t ≥ tW ) Perform Oper
 If t > tW then set tW = t for Data Item I (write after read)
 If (X = Write and tR ≤ t < tW ) then Do Nothing
since Later Write will Cancel out the Write of T
 If (X = Read and t < tW ) or
(X = Write and t < tR ) then Abort the Operation
 1st - T trying to Read Item Before it was Written
 2nd - T trying to Write an Item Before it was Read
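The four rules of Algorithm 4 can be sketched in Python and run on the three-transaction example used in these slides (T1 = 200, T2 = 150, T3 = 175). This is an illustrative sketch under assumptions: the `Item` class, the `attempt` function and the result strings `"ok"`, `"skip"` and `"abort"` are hypothetical names. The read time is kept as the highest timestamp of any reader, following the definition of Read Time given earlier.

```python
from dataclasses import dataclass


@dataclass
class Item:
    rt: int = 0   # read time: highest timestamp of any reader
    wt: int = 0   # write time: highest timestamp of any writer


def attempt(ts, op, item):
    """Apply one operation of a transaction with timestamp ts.

    Returns "ok" (performed), "skip" (write ignored) or "abort".
    """
    if op == "read":
        if ts < item.wt:
            return "abort"            # value was written by a later transaction
        item.rt = max(item.rt, ts)
        return "ok"
    if op == "write":
        if ts < item.rt:
            return "abort"            # a later transaction already read the old value
        if ts < item.wt:
            return "skip"             # a later write cancels this one
        item.wt = ts
        return "ok"
    raise ValueError(f"unknown operation: {op}")


TS = {"T1": 200, "T2": 150, "T3": 175}
db = {"A": Item(), "B": Item(), "C": Item()}
steps = [("T1", "read", "B"), ("T2", "read", "A"), ("T3", "read", "C"),
         ("T1", "write", "B"), ("T1", "write", "A"),
         ("T2", "write", "C"), ("T3", "write", "A")]
results = [attempt(TS[t], op, db[x]) for t, op, x in steps]
print(results)
print(db)
```

Steps (1) to (5) succeed, step (6) aborts T2 because 150 < RT(C) = 175, and step (7) is skipped because RT(A) = 150 ≤ 175 < WT(A) = 200.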
Chaps19&20-210
Example of OCC
Transaction Timestamps: T1 = 200, T2 = 150, T3 = 175
Step  Operation     A (RT/WT)   B (RT/WT)   C (RT/WT)
      (initial)     0/0         0/0         0/0
(1)   T1: Read B    0/0         200/0       0/0
(2)   T2: Read A    150/0       200/0       0/0
(3)   T3: Read C    150/0       200/0       175/0
(4)   T1: Write B   150/0       200/200     175/0
(5)   T1: Write A   150/200     200/200     175/0
 What Happens at Each Step w.r.t. RT/WT?
(1) T1 TS 200 ≥ B.WT = 0 – set B.RT = 200
(2) T2 TS 150 ≥ A.WT = 0 – set A.RT = 150
(3) T3 TS 175 ≥ C.WT = 0 – set C.RT = 175
(4) T1 TS 200 ≥ B.RT = 200 – set B.WT = 200
(5) T1 TS 200 ≥ A.RT = 150 – set A.WT = 200
Chaps19&20-211
T2 TS 150 ≥ A.WT = 0 – set A.RT = 150
T1 TS 175 ≥ A.WT = 0 – set A.RT = 175
T1 TS 175 ≥ C.RT = 0 – set C.WT = 175
T3 TS 200 ≥ C.WT = 0 – set C.RT = 200
T1 TS 175 ≥ B.RT = 0 – set B.WT = 175
T4 TS 225 ≥ B.WT = 175 – set B.RT = 225
T3 TS 200 ≥ A.RT = 175 – set A.WT = 200
T4 TS 225 ≥ C.RT = 0 – set C.WT = 225
T2 TS 150 ≥ D.RT = 0 – set D.WT = 150
T2 TS 150 IS NOT ≥ B.WT = 225 – ABORT T2
Chaps19&20-212
Example of OCC
Transaction Timestamps: T1 = 200, T2 = 150, T3 = 175
Step  Operation     A (RT/WT)   B (RT/WT)   C (RT/WT)
      (initial)     0/0         0/0         0/0
(1)   T1: Read B    0/0         200/0       0/0
(2)   T2: Read A    150/0       200/0       0/0
(3)   T3: Read C    150/0       200/0       175/0
(4)   T1: Write B   150/0       200/200     175/0
(5)   T1: Write A   150/200     200/200     175/0
(6)   T2: Write C   150/200     200/200     175/0
What Happens at Step 6? T2 TS = 150 < RT(C) = 175
T2 is Trying to Write C after a Later Read - Consequence - Abort T2
Chaps19&20-213
Example of OCC
Transaction Timestamps: T1 = 200, T2 = 150, T3 = 175
Step  Operation     A (RT/WT)   B (RT/WT)   C (RT/WT)
      (initial)     0/0         0/0         0/0
(1)   T1: Read B    0/0         200/0       0/0
(2)   T2: Read A    150/0       200/0       0/0
(3)   T3: Read C    150/0       200/0       175/0
(4)   T1: Write B   150/0       200/200     175/0
(5)   T1: Write A   150/200     200/200     175/0
(6)   T2: Write C   150/200     200/200     175/0
(7)   T3: Write A   150/200     200/200     175/0
Step (7): A.RT = 150 ≤ T3 TS 175 < A.WT = 200 - T3 can Finish, but the Write has No Effect
Chaps19&20-214
Summary of Example

T1 Completes Successfully; T2 Aborts;
T3 Completes but Doesn’t Write A
Transaction Timestamps: T1 = 200, T2 = 150, T3 = 175
Step  Operation     A (RT/WT)   B (RT/WT)   C (RT/WT)
      (initial)     0/0         0/0         0/0
(1)   T1: Read B    0/0         200/0       0/0
(2)   T2: Read A    150/0       200/0       0/0
(3)   T3: Read C    150/0       200/0       175/0
(4)   T1: Write B   150/0       200/200     175/0
(5)   T1: Write A   150/200     200/200     175/0
(6)   T2: Write C   150/200     200/200     175/0
(7)   T3: Write A   150/200     200/200     175/0
Chaps19&20-215
Viewing OCC vs. Phases of Execution



Read Phase:
 Database Information Read from Secondary
Storage into Primary Memory
 All Writes are to Local Workspace
Validate Phase:
 Check to see if Integrity of Data has not been
Violated
Write Phase:
 Update the DB (Secondary Storage) from Local
Copies
Optimistic execution: Read (and Compute) → Validate → Write
Chaps19&20-216
Contrasting PCC and OCC




Transaction Control
 PCC: Control by Having Transactions Wait
 OCC: Control by Having Transactions Backed up
Serializability
 PCC: Ordering of Data Items
 OCC: Ordering of Transactions
Biggest Potential Problem
 PCC: Deadlock, or rather Preventing it
 OCC: Starvation
Different Applications Suited to Different Approaches
 Some DBMS Support Both
 DBA Can Configure on Application-by-Application Basis
Chaps19&20-217