Slides on Normalization
CSE 4701, Chapter 14

Towards Normalization of Relations
- We take each relation individually and "improve" it in terms of the desired characteristics.
- Normalization decomposes relations into smaller relations in a way that results in:
  - No information loss
  - Support for reconstruction
  - No spurious joins
- Query execution time may increase, so denormalization may be necessary later on.
- Objectives:
  - Minimizing redundancy
  - Minimizing insertion, deletion, and update anomalies

What is the Normalization Process?
- Provides DB designers with the ability to "improve" their relations and to deal with redundancies and anomalies.
- The normalization procedure provides DB designers with:
  - A formal framework for analyzing relation schemas based on their keys and on the functional dependencies (FDs) among their attributes
  - A series of normal form tests that can be carried out on individual relation schemas, so the relational DB can be normalized to the desired degree

What are Normal Forms?
- A normal form is a condition, stated in terms of keys and FDs, that certifies whether a relation schema meets particular criteria:
  - Primary keys (1NF, 2NF, 3NF)
  - All candidate keys (2NF, 3NF, BCNF)
  - Multivalued dependencies (4NF), Chapter 15
  - Join dependencies (5NF), Chapter 15
(Figure: the normal forms 1NF, 2NF, 3NF, 4NF, 5NF)

How is Normalization Attained?
- Typically, normalization is attained through a process of decomposition that breaks apart relations to remove redundancies and anomalies.
- In the process, we must maintain two properties:
  - Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur for the decomposed relations
  - Dependency preservation property: ensures that each FD is represented in some individual relation(s) after decomposition
- Premise: a relational schema with its primary keys and functional dependencies specified

Recall Key Constraints
- Superkey (SK): any set of attributes whose values are guaranteed to distinguish among tuples
- Candidate key (CK): a superkey with a minimal set of attributes (no attribute can be removed without destroying uniqueness, i.e., minimal identity)
  - The value of an attribute or a set of attributes in a relation that uniquely identifies a tuple
  - There may be multiple candidate keys

Recall Key Constraints (continued)
- Primary key (PK): one key chosen from the candidate keys; the primary key attributes are underlined
- Foreign key (FK): an attribute or combination of attributes (say A) of relation R1 that occurs as the primary key of another relation R2 (defined on the same domain)
  - Allows linkages between relations that are tracked and establish dependencies
  - Useful to capture ER relationships

Superkeys vs. Candidate Keys
- Superkey of R: a superkey SK is a set of attributes of R such that no two tuples in any valid relation instance r(R) have the same value for SK.
  - Given R(U), where U is the set of attributes of R, and a relation instance r(R): for any distinct tuples $t_1$ and $t_2$ in $r(R)$, $t_1[SK] \neq t_2[SK]$.
- For CAR, valid superkeys must contain SerialNo, or {State, Reg#}, or both.
- For EMPLOYEE, {SSN} is a key, and {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are all superkeys.

Superkeys vs. Candidate Keys (continued)
- Candidate key of R: a "minimal" superkey. A candidate key K is a superkey such that removing any attribute from K results in a set of attributes that is not a superkey.
  - Given R(U) and a relation instance r(R): K is a candidate key iff K is a superkey and, for any $A \in K$, there exist two distinct tuples $t_1$ and $t_2$ in $r(R)$ such that $t_1[K - \{A\}] = t_2[K - \{A\}]$.
- In the previous example, {State, Reg#, Make, Model} is a superkey. Is it a candidate key? Why or why not?

Example and Remaining Definitions
- Example: CAR(State, Reg#, SerialNo, Make, Model, Year)
  - The primary key is {State, Reg#}.
  - It has two candidate keys (which are also superkeys): Key1 = {State, Reg#} and Key2 = {SerialNo}.
  - {SerialNo} could also have been chosen as the primary key.
- Definition: prime attribute: an attribute A of R that is a member of some candidate key K of R
- Definition: non-prime attribute: an attribute that is not prime (i.e., not a member of any candidate key)
- In WORKS_ON, SSN and Pnumber are prime.

First Normal Form (1NF)
- All attributes must have atomic values: only simple, indivisible values in the domain of each attribute.
  - Each attribute in a 1NF relation holds a single value.
  - Disallows composite attributes, multivalued attributes, and nested relations (non-atomic values).
- A 1NF relation cannot have an attribute value that is:
  - A set of values (set-valued)
  - A tuple of values (nested relation)
- 1NF is a standard assumption of relational DBs.

One Example of 1NF
- Consider the DEPARTMENT relation (figure). What is the inherent problem?
- DLOCATIONS is multivalued.

What are Possible Solutions?
- Decompose: move the attribute DLOCATIONS that violates 1NF into a separate relation DEPT_LOCATIONS(DNUMBER, DLOCATION).
- Expand the key so there is a separate tuple in the DEPARTMENT relation for each location (figure).
- Introduce DLOC1, DLOC2, DLOC3, if there are at most three locations.
- Problems with each? Best solution?
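The instance-level definitions of superkey and candidate key above can be checked mechanically. The following Python sketch uses a small, hypothetical CAR instance (the rows are invented for illustration). An instance can only refute a key: keys come from the semantics of the schema, not from one snapshot of its data.

```python
# Hypothetical CAR instance; the values are illustrative only.
CAR_ROWS = [
    {"State": "CT", "Reg#": "ABC123", "SerialNo": "S1", "Make": "Ford", "Model": "Focus", "Year": 2012},
    {"State": "CT", "Reg#": "XYZ789", "SerialNo": "S2", "Make": "Ford", "Model": "Focus", "Year": 2015},
    {"State": "MA", "Reg#": "ABC123", "SerialNo": "S3", "Make": "Honda", "Model": "Civic", "Year": 2012},
    {"State": "NY", "Reg#": "QRS456", "SerialNo": "S4", "Make": "Honda", "Model": "Civic", "Year": 2018},
]


def is_superkey(rows, attrs):
    """True if no two distinct tuples agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """A superkey that stops being one when any single attribute is removed.

    Checking single-attribute removals suffices because every superset of a
    superkey is itself a superkey.
    """
    if not is_superkey(rows, attrs):
        return False
    return all(not is_superkey(rows, [a for a in attrs if a != dropped]) for dropped in attrs)


print(is_candidate_key(CAR_ROWS, ["State", "Reg#"]))                  # True
print(is_candidate_key(CAR_ROWS, ["SerialNo"]))                       # True
print(is_candidate_key(CAR_ROWS, ["State", "Reg#", "Make", "Model"]))  # False
```

On this instance, {State, Reg#, Make, Model} is a superkey but not a candidate key. Make and Model can be dropped without losing uniqueness, which answers the slide's question at the instance level.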
Another 1NF Example: Nested Relations
- EMP_PROJ: table and tuples (figure)
- Transition to 1NF (figure)

Second Normal Form (2NF)
- Second normal form focuses on the concepts of primary keys and full functional dependencies.
- Intuitively: a relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
- R can be decomposed into 2NF relations via the process of 2NF normalization.
  - A successful process typically decomposes R into two or more relations.
  - It is applied iteratively to each relation in the schema.

Full Functional Dependency
- Full FD, formally: given R(U) and $X, Y \subseteq U$, if $X \rightarrow Y$ holds and there is no $X' \subset X$ such that $X' \rightarrow Y$ holds over R, then Y is fully functionally dependent on X, denoted $X \xrightarrow{f} Y$.
- Full FD, intuitively: an FD X → Y where removing any attribute from X means the FD no longer holds.
  - {SSN, PNUMBER} → HOURS is full, since neither SSN → HOURS nor PNUMBER → HOURS holds.
  - What about {S#, CN} → Grade?

Partial Functional Dependency
- Partial FD, formally: given R(U) and $X, Y \subseteq U$, if $X \rightarrow Y$ holds but Y is not fully dependent on X, then Y is partially functionally dependent on X, denoted $X \xrightarrow{p} Y$.
- Partial FD, intuitively: removing some attribute from the left-hand side still results in a valid FD.
  - {SSN, PNUMBER} → ENAME is partial, since removing PNUMBER still leaves the valid FD SSN → ENAME.
  - Are the following full or partial? {S#, CN} → CN, {S#, CN} → S#, {S#, CN, DName} → Grade

Second Normal Form (2NF), Formally
- Formal 2NF definition: $R \in \text{2NF}$ iff (i) $R \in \text{1NF}$ and (ii) every non-prime (non-key) attribute of R is fully functionally dependent on every candidate key.
- Alternative definition: $R \in \text{2NF}$ iff every attribute is either a member of a candidate key (prime) or fully dependent on every key.
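Whether an FD is full or partial can be decided from the FDs alone by using attribute closure. X → Y is full when Y ⊆ X⁺ but Y ⊄ (X − {A})⁺ for every A ∈ X. The following minimal Python sketch uses the EMP_PROJ FDs from these slides ({SSN, PNUMBER} → HOURS, SSN → ENAME, PNUMBER → {PNAME, PLOCATION}). The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+: all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def holds(lhs, rhs, fds):
    """Does lhs -> rhs follow from fds?"""
    return set(rhs) <= closure(lhs, fds)


def is_full(lhs, rhs, fds):
    """lhs -> rhs is full if it holds and fails once any attribute is removed from lhs."""
    lhs = set(lhs)
    return holds(lhs, rhs, fds) and all(not holds(lhs - {a}, rhs, fds) for a in lhs)


# FDs of EMP_PROJ as (lhs, rhs) pairs of sets.
EMP_PROJ_FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(is_full({"SSN", "PNUMBER"}, {"HOURS"}, EMP_PROJ_FDS))  # True: full
print(is_full({"SSN", "PNUMBER"}, {"ENAME"}, EMP_PROJ_FDS))  # False: partial via SSN -> ENAME
print(is_full({"S#", "CN"}, {"CN"}, []))                     # False: CN alone suffices
```

Trivial FDs such as {S#, CN} → CN come out as partial, because a single attribute of the left-hand side already determines the right-hand side.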
- Reason: partial functional dependencies may cause update problems.

Another Way to View the Problem
- If the primary key contains a single attribute, there is no need to test for 2NF problems.
- EMP_PROJ (figure) is in 1NF but not in 2NF:
  - ENAME, a non-prime attribute in FD2, violates 2NF since it depends on part of the key (SSN).
  - PNAME and PLOCATION, two non-prime attributes in FD3, violate 2NF since they depend on part of the key (PNUMBER).

One Example of 2NF
- Consider STUDENT_DEPT(S#, DName, DHead, CN, Grade) with:
  - fd1: {S#, CN} → Grade
  - fd2: S# → DName
  - fd3: DName → DHead
- STUDENT_DEPT ∈ 1NF, but STUDENT_DEPT ∉ 2NF. Because S# → DName and DName → DHead, "{S#, CN} → DName, DHead" is a partial FD, and it causes anomalies.

Recall the Anomalies…
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
- Insertion anomalies: a department cannot be recorded unless at least one student is enrolled in a course.
- Deletion anomalies: deleting the last student in a department also deletes the department.
- Update anomalies: because of redundancy, changing the head of a department requires modifying every student in that department.

One Example of 2NF (continued)
- Decompose into 2NF by separating course information from department information (linked by S#):
  - S_D(S#, DName, DHead) with fd2 and fd3
  - S_C(S#, CN, Grade) with fd1

Another Example of 2NF
- EMP_PROJ is in 1NF with key {SSN, PNUMBER}, but:
  - SSN → ENAME means that ENAME, a non-prime attribute, depends partially on {SSN, PNUMBER}, i.e., it depends only on SSN and not on both.
  - PNUMBER → {PNAME, PLOCATION} means that PNAME and PLOCATION, two non-prime attributes, depend partially on {SSN, PNUMBER}, i.e., they depend only on PNUMBER and not on both.

Another Example of 2NF (continued)
- What does the decomposition into EP1(SSN, PNUMBER, HOURS), EP2(SSN, ENAME), and EP3(PNUMBER, PNAME, PLOCATION) accomplish?
  - ENAME is fully dependent on SSN.
  - PNAME and PLOCATION are fully dependent on PNUMBER.
  - Result: EP1, EP2, and EP3 are all in 2NF.

Yet Another Example of 2NF
- Consider the 1NF relation LOTS, which tracks building lots for towns (figure). What is the 2NF problem?
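The 2NF test can be automated in the same way. Compute the candidate keys, then look for a proper subset of a key whose closure contains a non-prime attribute. The following brute-force Python sketch is applied to STUDENT_DEPT with fd1 to fd3. It is exponential in the number of attributes, so it only suits small teaching schemas. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if closure(combo, fds) >= attrs and not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys


def partial_dependencies(attrs, fds):
    """(part of a key, non-prime attributes it determines) pairs; an empty list means 2NF."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                dependents = (closure(part, fds) & attrs) - prime
                if dependents:
                    found.append((frozenset(part), dependents))
    return found


STUDENT_DEPT = {"S#", "DName", "DHead", "CN", "Grade"}
FDS = [({"S#", "CN"}, {"Grade"}), ({"S#"}, {"DName"}), ({"DName"}, {"DHead"})]

for part, deps in partial_dependencies(STUDENT_DEPT, FDS):
    print(sorted(part), "->", sorted(deps))                    # ['S#'] -> ['DHead', 'DName']
print(partial_dependencies({"S#", "DName", "DHead"}, FDS))     # []: S_D is in 2NF
print(partial_dependencies({"S#", "CN", "Grade"}, FDS))        # []: S_C is in 2NF
```

The test intersects each closure with the relation's own attributes, so the full FD set can be passed when checking the decomposed relations S_D and S_C.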
- FD3: COUNTY_NAME → TAX_RATE means that TAX_RATE depends partially on the candidate key {COUNTY_NAME, LOT#}.
- All other non-prime attributes are fine.

Yet Another Example of 2NF (continued)
- What does decomposing into LOTS1 and LOTS2 (figure) accomplish?
  - TAX_RATE is fully dependent on COUNTY_NAME.
  - Result: LOTS1 and LOTS2 are in 2NF.

Third Normal Form (3NF)
- Third normal form focuses on the concepts of primary keys and transitive functional dependencies.
- Intuitively: a relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
- R can be decomposed into 3NF relations via the process of 3NF normalization.
- Given X → Y and Y → Z with X as the primary key, there is a problem only if Y is not a candidate key.
  - EMP(SSN, Emp#, Salary) with SSN → Emp# → Salary is not a problem, since Emp# is a candidate key.

Transitive Functional Dependencies
- Transitive FD, formally: given R(U) and $X, Y, Z \subseteq U$, if $X \rightarrow Y$, $Y \not\rightarrow X$, $Y \not\subseteq X$, and $Y \rightarrow Z$, then Z is transitively functionally dependent on X.
- Transitive FD, intuitively: an FD X → Z that can be derived from two FDs X → Y and Y → Z.
  - SSN → ENAME is non-transitive, since there is no set of attributes X such that SSN → X and X → ENAME.
  - For an FD X → Z derived from X → Y and Y → Z, there is no problem if Y is a candidate key.

Third Normal Form (3NF), Formally
- Formal 3NF definition: $R \in \text{3NF}$ iff (i) $R \in \text{2NF}$ and (ii) no non-prime attribute of R is transitively dependent on any candidate key.
- Alternative definition: $R \in \text{3NF}$ iff for every nontrivial FD $X \rightarrow A$, either X is a superkey of R or A is a prime (key) attribute.
- Reason: transitive functional dependencies may cause update problems.

One Example of 3NF
- STUDENT_DEPT(S#, DName, DHead, CN, Grade) ∉ 2NF
  - S_D(S#, DName, DHead) ∈ 2NF and S_C(S#, CN, Grade) ∈ 2NF
  - S_D ∉ 3NF, S_C ∈ 3NF
- "S# → DHead" is a transitive FD in S_D, and DHead is a non-key attribute, since S# (X) → DName (Y) and DName (Y) → DHead (Z).

One Example of 3NF (continued)
- In S_D(S#, DName, DHead), fd2: S# → DName and fd3: DName → DHead together yield the transitive FD S# → DHead.
- Decompose to eliminate the transitivity within S_D:
  - S_D(S#, DName) ∈ 3NF, with fd2
  - DEPT(DName, DHead) ∈ 3NF, with fd3
  - S_C(S#, CN, Grade) is unchanged.

Another Example of 3NF
- EMP_DEPT is in 2NF with key SSN, but it has two (undesirable) transitive dependencies:
  - SSN → DNUMBER and DNUMBER → DNAME mean that DNAME, which is neither a key nor a subset of a key, is transitively dependent on SSN.
  - SSN is the only candidate key of EMP_DEPT!
  - Note: there is a similar problem with SSN and DMGRSSN via DNUMBER.

Another Example of 3NF (continued)
- To attain 3NF, decompose into ED1 and ED2 (figure).
- Intuitively, we are separating employees and departments from one another.

Yet Another Example of 3NF
- Recall the 2NF solution for the building lots problem. What is the 3NF problem?
- LOTS1 violates the alternative definition through FD4: AREA → PRICE:
  - AREA is not a superkey of LOTS1.
  - PRICE is not a prime attribute of LOTS1.

Yet Another Example of 3NF (continued)
- Decompose to introduce a separate relation keyed by AREA.
- Result: LOTS1A and LOTS1B are in 3NF.

1NF and 2NF: Maintain FDs! (figure)

Transition to 3NF: Maintain FDs! (figure)

Summary of Progression: Maintain FDs!
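The alternative 3NF definition (every nontrivial X → A has X a superkey or A prime) can be tested by brute force over all attribute subsets X of the relation. This also covers FDs that are only implied on a projected relation. The following Python sketch assumes that EMP_DEPT has the attributes ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN and that it splits into ED1 and ED2 as in the usual textbook version of this example. Those attribute lists are assumptions, since the slides show them only in a figure.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if closure(combo, fds) >= attrs and not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys


def violations_3nf(attrs, fds):
    """Pairs (X, A) with X -> A non-trivial inside attrs, X not a superkey, A non-prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x_plus = closure(combo, fds)
            if x_plus >= attrs:
                continue  # X is a superkey, so X -> A is never a violation
            for a in sorted((x_plus & attrs) - set(combo) - prime):
                found.append((frozenset(combo), a))
    return found


def is_3nf(attrs, fds):
    return not violations_3nf(attrs, fds)


# Assumed (textbook-style) attributes and FDs for EMP_DEPT.
FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
ED1 = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER"}
ED2 = {"DNUMBER", "DNAME", "DMGRSSN"}

print(is_3nf(EMP_DEPT, FDS), is_3nf(ED1, FDS), is_3nf(ED2, FDS))  # False True True
```

The violations reported for EMP_DEPT include DNUMBER → DNAME and DNUMBER → DMGRSSN, which are the two transitive dependencies named on the slide.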
- 1NF: STUDENT_DEPT(S#, DName, DHead, CN, Grade) with fd1, fd2, fd3
- 2NF (eliminate partial FDs): S_C(S#, CN, Grade) with fd1; S_D(S#, DName, DHead) with fd2, fd3
- 3NF (eliminate transitive FDs): S_C(S#, CN, Grade) with fd1; S_D(S#, DName) with fd2; DEPT(DName, DHead) with fd3

Summary of 1NF, 2NF, 3NF Concepts
1NF
- Test: the relation should have no non-atomic attributes or nested relations.
- Remedy (normalization): form new relations for each non-atomic attribute or nested relation.
2NF
- Test: for relations whose primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key.
- Remedy (normalization): decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
3NF
- Test: the relation should not have a non-key attribute functionally determined by another non-key attribute (or by a set of non-key attributes). That is, there should be no transitive dependency of a non-key attribute on the primary key.
- Remedy (normalization): decompose and set up a relation that includes the non-key attribute(s) that functionally determine(s) other non-key attribute(s).

Boyce-Codd Normal Form (BCNF)
- Boyce-Codd normal form focuses on searching for remaining anomalies that can arise from FDs.
- Intuitively: a relation schema R is in Boyce-Codd normal form (BCNF) if, whenever a nontrivial FD X → A holds in R, X is a superkey of R.
- R can be decomposed into BCNF relations via the process of BCNF normalization.
- There exist relations that are in 3NF but not in BCNF.
- The goal is to have each relation in BCNF (or 3NF).

Boyce-Codd Normal Form (BCNF), Formally
- Formal BCNF definition: $R \in \text{BCNF}$ iff (i) $R \in \text{1NF}$ and (ii) for every FD $X \rightarrow Y$ with $Y \not\subseteq X$, X is a superkey, i.e., X contains a key.
- Properties: if $R \in \text{BCNF}$, then:
  - All non-key attributes are fully dependent on every key.
  - All key attributes are fully dependent on the keys they do not belong to.
  - No attribute is fully dependent on any set of non-key attributes.

Comparing the Normal Forms
- Poor relational schema design → 1NF
- 1NF → 2NF: eliminate partial FDs of non-key attributes on the key (2NF was developed as a stepping stone)
- 2NF → 3NF: eliminate transitive FDs of non-key attributes on the key
- 3NF → BCNF: eliminate partial and transitive FDs of key attributes on the key
- Overall, 1NF → 3NF eliminates the non-trivial FDs of non-key attributes on anything other than the key.
- Most 3NF relations are also in BCNF; BCNF eliminates all update anomalies caused by FDs.

One Example of BCNF
- Recall the 3NF solution for the building lots problem.
- Suppose AREA is a size in acres, with:
  - AREAs in Tolland County: 0.5, 0.6, …, 1.0
  - AREAs in Windham County: 1.1, 1.2, …, 2.0
- This adds FD5: AREA → COUNTY_NAME.
- What does the data in LOTS1A look like for a given set of properties?

One Example of BCNF (continued)
LOTS1A
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

- What is the problem here?
  - What if you delete W11? You have "lost" the (Windham, 1.1) combination.
  - There is also redundancy, since the (COUNTY_NAME, AREA) pairing is repeated in multiple tuples throughout LOTS1A.
- Even though LOTS1A is in 3NF, there are still problems, caused by FD5: AREA → COUNTY_NAME.

Transition to BCNF: Maintain FDs!
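The BCNF condition differs from the alternative 3NF test only by dropping the "or A is prime" escape clause. The following Python sketch reports violations of either form and applies them to LOTS1A with FD1, FD2 and the new FD5. The FD1/FD2 right-hand sides used here, restricted to LOTS1A's attributes, are an assumption based on the building-lots example.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if closure(combo, fds) >= attrs and not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys


def violations(attrs, fds, bcnf):
    """Non-trivial X -> A inside attrs whose X is not a superkey.

    With bcnf=False (the 3NF test), a prime right-hand side A is tolerated.
    """
    prime = set().union(*candidate_keys(attrs, fds))
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x_plus = closure(combo, fds)
            if x_plus >= attrs:
                continue  # X is a superkey
            for a in sorted((x_plus & attrs) - set(combo)):
                if bcnf or a not in prime:
                    found.append((frozenset(combo), a))
    return found


FDS = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),  # FD1 (assumed form)
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),  # FD2 (assumed form)
    ({"AREA"}, {"COUNTY_NAME"}),                          # FD5
]
LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}

print(violations(LOTS1A, FDS, bcnf=False))  # []
print(violations(LOTS1A, FDS, bcnf=True))   # [(frozenset({'AREA'}), 'COUNTY_NAME')]
```

Every attribute of LOTS1A turns out to be prime under these FDs, because {AREA, LOT#} is also a candidate key. That is why the 3NF test passes while the BCNF test flags FD5.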
- Add the new FD5 (figure).

One Example of BCNF (continued)
- FD5: AREA → COUNTY_NAME
  - Satisfies 3NF: COUNTY_NAME is a prime attribute.
  - Violates BCNF: AREA is not a superkey of LOTS1A.
- So do one more split.

One Example of BCNF (continued)
LOTS1A is split into LOTS1AX and LOTS1AY.

LOTS1AX
PROPERTY_ID#  LOT#  AREA
T11           L1    0.5
T12           L2    0.8
W13           L6    1.5
W11           L1    1.1
W12           L4    1.6
T10           L3    0.9

LOTS1AY
AREA  COUNTY_NAME
0.5   Tolland
...   Tolland
1.0   Tolland
1.1   Windham
...   Windham
2.0   Windham

Another Example of BCNF
- Consider the TEACH relation: TEACH(STUDENT, COURSE, INSTRUCTOR) is in 3NF but NOT in BCNF, with:
  - FD1: {STUDENT, COURSE} → INSTRUCTOR
  - FD2: INSTRUCTOR → COURSE
- Three possible decompositions of TEACH:
  1. T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE)
  2. T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT)
  3. T1(INSTRUCTOR, COURSE), T2(INSTRUCTOR, STUDENT)
- All three "lose" FD1!
- The 3rd is best. Its join reconstructs TEACH without generating any spurious tuples, so FD1 holds again in the joined result.

What Does the Table Look Like?
- (Figure.) Note that TEACH is in 3NF but NOT in BCNF.

Reflections on Normalization
- Normalization is a tool for validating the quality of the schema, rather than merely a method for designing a relational schema.
- It promotes each concept of the application domain mapping to exactly one concept of the schema.
- The normalization process is actually a process of concept separation.
- Concept separation is the result of applying a top-down methodology that produces a schema via successive refinements and decompositions.

Relational DB Design Process
- The normalization process, focused on decomposition, raises a number of questions:
  - How do we decompose a schema into a desirable normal form?
  - What criteria should the decomposed schemas follow in order to preserve the semantics of the original schema?
  - Can we guarantee the decomposition's quality?
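For a decomposition of R into exactly two relations, there is a standard test for the lossless (nonadditive) join property. The decomposition is lossless iff the common attributes R1 ∩ R2 functionally determine R1 − R2 or R2 − R1. The following Python sketch applies this test to the three TEACH decompositions. The helper names are hypothetical.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: R1 ∩ R2 must functionally determine R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
TEACH_FDS = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # FD1
    ({"INSTRUCTOR"}, {"COURSE"}),             # FD2
]
DECOMPOSITIONS = {
    1: ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    2: ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    3: ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
}

# FD2 is what breaks BCNF: the closure of INSTRUCTOR does not cover TEACH.
print(closure({"INSTRUCTOR"}, TEACH_FDS) >= TEACH)  # False
for number, (r1, r2) in DECOMPOSITIONS.items():
    print(number, is_lossless_binary(r1, r2, TEACH_FDS))  # 1 False, 2 False, 3 True
```

Only decomposition 3 passes, because its common attribute INSTRUCTOR determines COURSE. This test applies to binary decompositions only; splits into more than two relations need the general tableau (chase) test.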
  - Can we prevent the "loss" of information?
  - Are dependencies maintained in the decomposition?

Recall Transitive FD/Update Anomalies
- R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}

S#  DName  DHead
S1  D1     John
S2  D1     John
S3  D2     Smith
S4  D3     Black

- "S# → DHead" is a transitive FD.
  - When S4 graduates, the head information for D3 is lost.
  - Similarly, if D5 has no students yet, its head information cannot be stored in this database.
  - Updating the head of any department requires an update to every student enrolled in that department.

What are Possible Decompositions?
- R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}, with the instance above
- The information-based decomposition $\rho_1 = \{R_1(S\#), R_2(DName), R_3(DHead)\}$ is neither lossless nor FD-preserving.

What are Possible Decompositions? (continued)
R1
S#  DName
S1  D1
S2  D1
S3  D2
S4  D3

R2
S#  DHead
S1  John
S2  John
S3  Smith
S4  Black

- $\rho_2 = \{R_1(\{S\#, DName\}, \{S\# \rightarrow DName\}),\ R_2(\{S\#, DHead\}, \{S\# \rightarrow DHead\})\}$ is lossless but not FD-preserving.
- DName → DHead is lost in the decomposition.

What are Possible Decompositions? (continued)
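Whether a decomposition loses information on this particular instance can be seen by projecting R onto each part and natural-joining the parts back together. The following Python sketch uses the four-tuple instance from the slide. It compares the split into single attributes (rho1) with the {S#, DName}/{S#, DHead} split (rho2). Representing tuples as frozensets of (attribute, value) pairs is an implementation choice made for this sketch.

```python
from itertools import product

# Instance of R(S#, DName, DHead) from the slide.
R = [
    {"S#": "S1", "DName": "D1", "DHead": "John"},
    {"S#": "S2", "DName": "D1", "DHead": "John"},
    {"S#": "S3", "DName": "D2", "DHead": "Smith"},
    {"S#": "S4", "DName": "D3", "DHead": "Black"},
]


def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on their common attributes."""
    out = set()
    for t, u in product(left, right):
        dt, du = dict(t), dict(u)
        if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
            out.add(frozenset({**dt, **du}.items()))
    return out


def join_all(parts):
    result = parts[0]
    for part in parts[1:]:
        result = natural_join(result, part)
    return result


original = project(R, ["S#", "DName", "DHead"])
rho1 = [project(R, ["S#"]), project(R, ["DName"]), project(R, ["DHead"])]
rho2 = [project(R, ["S#", "DName"]), project(R, ["S#", "DHead"])]

print(len(original), len(join_all(rho1)), len(join_all(rho2)))  # 4 36 4
print(join_all(rho2) == original)                               # True
```

rho1 rebuilds 36 tuples instead of 4, so 32 of them are spurious, while rho2 reproduces R exactly. A check on one instance can expose a lossy decomposition, but it cannot prove losslessness for every instance.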
- R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}

R1
S#  DName
S1  D1
S2  D1
S3  D2
S4  D3

R3
DName  DHead
D1     John
D2     Smith
D3     Black

- $\rho_3 = \{R_1(\{S\#, DName\}, \{S\# \rightarrow DName\}),\ R_3(\{DName, DHead\}, \{DName \rightarrow DHead\})\}$ is both lossless and FD-preserving.

Summary of Normalization
- 1NF → 2NF: eliminate the partial FDs of non-prime attributes on key attributes.
- 2NF → 3NF: eliminate the transitive FDs of non-prime attributes on key attributes.
- Up to 3NF, the decomposition can be both lossless and dependency-preserving.
- 3NF → BCNF: eliminate the partial and transitive FDs of prime (key) attributes on the key. The decomposition is lossless but not necessarily dependency-preserving.

The Entire Normalization Picture
- 1NF → 2NF: eliminate partial FDs of non-prime attributes on the key
- 2NF → 3NF: eliminate transitive FDs of non-prime attributes on the key
- 3NF → BCNF: eliminate partial and transitive FDs of prime attributes on the key
- BCNF → 4NF: eliminate non-trivial multivalued dependencies that are not functional dependencies
- 4NF → 5NF: eliminate join dependencies that are not implied by the candidate keys

What are Multivalued Dependencies?
- Focused on the concept of multivalued dependencies (MVDs).
- An MVD $X \twoheadrightarrow Y$ indicates that a value of X corresponds to a set of multiple values of Y, independent of the remaining attributes.
- Consider EMP with the MVDs:
  - ENAME ↠ PNAME (an employee works on many projects)
  - ENAME ↠ DNAME (an employee has many dependents)

What is Fourth Normal Form (4NF)?
- A relation schema R is in fourth normal form (4NF) with respect to a set of dependencies F (FDs and MVDs) if, for every non-trivial MVD $X \twoheadrightarrow Y$ in $F^+$, X is a superkey of R.
- Reconsider EMP with the MVDs ENAME ↠ PNAME and ENAME ↠ DNAME:
  - ENAME is not a superkey of EMP, since the triple of ENAME, PNAME, and DNAME is needed to distinguish tuples.
  - We need to decompose EMP!

Decomposition into 4NF
- In EMP_PROJECTS(ENAME, PNAME), ENAME ↠ PNAME is a trivial MVD, because ENAME and PNAME together make up the entire relation. The same holds for ENAME ↠ DNAME in the dependents relation.

What about the Supply Table?
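Dependency preservation can be tested without computing projected FD sets, using the standard iterative test. Start from Z = X and repeatedly add (Z ∩ Ri)⁺ ∩ Ri for each relation Ri. X → Y is preserved iff Y ⊆ Z at the end. The following Python sketch compares the {S#, DName}/{S#, DHead} decomposition (rho2) with the {S#, DName}/{DName, DHead} decomposition (rho3). The function names are hypothetical.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """Is lhs -> rhs implied by the FDs that hold within the individual relations?"""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


def is_dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)


F = [({"S#"}, {"DName"}), ({"DName"}, {"DHead"})]
rho2 = [{"S#", "DName"}, {"S#", "DHead"}]
rho3 = [{"S#", "DName"}, {"DName", "DHead"}]

print(is_dependency_preserving(rho2, F))  # False: DName -> DHead is lost
print(is_dependency_preserving(rho3, F))  # True
```

The test runs in polynomial time. Computing the projection of F⁺ onto each relation directly can be exponential, which is why this iterative form is the usual choice.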
- SUPPLY is in 4NF but not in 5NF, since:
  - A supplier supplies parts,
  - A supplier supplies projects, and
  - Parts are used on projects.
- Decomposing SUPPLY into three binary relations removes the join dependency (a many-many-many relationship).

Slides on Query Optimization

Simplification
- Why simplify? The simpler the query, the less work there is and the better the performance.
- How? Use transformation rules:
  - Elimination of redundancy, using idempotency rules such as:
    - $p_1 \land \neg(p_1) = false$
    - $\neg(p_1 \land p_2) = \neg(p_1) \lor \neg(p_2)$
    - $p_1 \lor false = p_1$
    - …
  - Application of transitivity
  - Use of integrity rules
- Example: x > a AND x > b (if a ≥ b, this reduces to x > a)

Restructuring
- Convert relational calculus to relational algebra.
- Make use of query trees.
- Example: find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.

    SELECT ENAME
    FROM   E, W, P
    WHERE  E.ENO = W.ENO
      AND  W.JNO = P.JNO
      AND  E.ENAME <> 'J. Doe'
      AND  P.JNAME = 'CAD/CAM'
      AND  (W.DUR = 12 OR W.DUR = 24)

- Query tree (project, select, join): $\Pi_{ENAME}(\sigma_{(DUR=12 \lor DUR=24) \land JNAME=\text{"CAD/CAM"} \land ENAME \neq \text{"J. Doe"}}(P \bowtie_{JNO} (W \bowtie_{ENO} E)))$

Query Optimization Objectives
- Improving performance
- Arriving at a query plan of execution
- Analyzing the relational algebra query:
  - Replace costly operations
  - Do selections and projections early
- Optimization heuristics for the relational algebra:
  - Performing selection and projection before join
  - Combining several selections over a single relation into one selection
  - Finding common subexpressions
- Algebraic rewriting/transformation rules:
  - General transformation rules for relational algebra

Query Optimization: An Example
- Why is it important?
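The idempotency and De Morgan rules listed under Simplification can be checked exhaustively over all truth assignments, and the "x > a AND x > b" example can be checked over a small integer range. The following Python sketch assumes two-valued logic. In SQL, a NULL operand makes p AND NOT p evaluate to UNKNOWN rather than false, so such a rewrite must account for NULLs.

```python
from itertools import product

# Each rule: (left-hand side, right-hand side) as functions of truth values p1, p2.
RULES = {
    "p1 AND NOT p1 = false": (lambda p1, p2: p1 and not p1, lambda p1, p2: False),
    "NOT (p1 AND p2) = NOT p1 OR NOT p2": (lambda p1, p2: not (p1 and p2),
                                           lambda p1, p2: (not p1) or (not p2)),
    "p1 OR false = p1": (lambda p1, p2: p1 or False, lambda p1, p2: p1),
}


def equivalent(lhs, rhs):
    """Compare both sides on all four truth assignments (two-valued logic only)."""
    return all(lhs(p1, p2) == rhs(p1, p2) for p1, p2 in product([False, True], repeat=2))


def redundant_comparison_ok(values):
    """x > a AND x > b behaves like x > max(a, b) for every x, a, b in values."""
    return all((x > a and x > b) == (x > max(a, b)) for x, a, b in product(values, repeat=3))


for name, (lhs, rhs) in RULES.items():
    print(name, equivalent(lhs, rhs))          # each line ends with True
print(redundant_comparison_ok(range(-3, 4)))   # True
```

A checker like this also catches a mis-stated rule. For example, comparing NOT (p1 AND p2) with NOT p1 AND NOT p2 returns False.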
    SELECT ENAME
    FROM   E, W
    WHERE  E.ENO = W.ENO
      AND  W.RESP = 'Manager'

- Strategy 1: $\Pi_{ENAME}(\sigma_{RESP=\text{"Manager"} \land E.ENO=W.ENO}(E \times W))$
- Strategy 2: $\Pi_{ENAME}(E \bowtie_{ENO} \sigma_{RESP=\text{"Manager"}}(W))$

Cost of Alternatives
- Assume:
  - card(E) = 4,000; card(W) = 10,000
  - 10% of the tuples in W satisfy RESP = "Manager" (the selection generates 1,000 tuples)
  - Execution time is proportional to the sum of the cardinalities of the temporary relations.
  - Searching is done by sequential scanning.
- Strategy 1:
  - Cartesian product: 4,000 × 10,000 = 40,000,000
  - Search over all: 40,000,000
  - Total: 80,000,000
- Strategy 2:
  - Selection over W: 10,000
  - Join: 4,000 × 1,000 = 4,000,000
  - Total: 4,010,000

General Query Optimization Strategy
- Perform selections early:
  - Yields smaller intermediate results
  - Has a direct impact on subsequent joins/Cartesian products
- Combine selections with a prior Cartesian product into a theta join or equijoin; a join is a cheaper operation.
- Combine (cascade) selections and projections:
  - $\Pi_A(\Pi_B(R)) \equiv \Pi_A(R)$ for $A \subseteq B$
  - $\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_1 \land p_2}(R)$
  - This results in one pass over the table instead of two.

General Query Optimization Strategy (continued)
- Identify common subexpressions:
  - Compute once, store, and use the stored version on subsequent occasions.
  - Often useful when views are employed.
- Preprocess data via sorts and indexes:
  - Speeds up searches and joins by limiting their scope.
- Evaluate and assess different options:
  - For a Cartesian product, use the smaller relation for comparison.
  - Use the system catalog (metadata) to determine the order in the query execution plan.

Relational Algebra Transformations
1. Cascade of selection: $\sigma_{p_1 \land p_2 \land \cdots \land p_n}(R) \equiv \sigma_{p_1}(\sigma_{p_2}(\cdots(\sigma_{p_n}(R))\cdots))$
2. Commutativity of selection: $\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_2}(\sigma_{p_1}(R))$; also $\sigma_{p_1 \lor p_2}(R) \equiv \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$
3. Cascade of projection: $\Pi_{A_1}(\Pi_{A_2}(\cdots(\Pi_{A_n}(R))\cdots)) \equiv \Pi_{A_1}(R)$ if $A_1 \subseteq A_2 \subseteq \cdots \subseteq A_n$
4. Commuting selection with projection: $\Pi_{A_1,\ldots,A_n}(\sigma_p(R)) \equiv \sigma_p(\Pi_{A_1,\ldots,A_n}(R))$, provided p references only attributes among $A_1, \ldots, A_n$

Relational Algebra Transformations (continued)
5. Commutativity of theta join and Cartesian product: $R \bowtie_A S \equiv S \bowtie_A R$; $R \times S \equiv S \times R$
6. Commuting selection with theta join (also holds for Cartesian product):
   - $\sigma_{p(A)}(R \bowtie S) \equiv \sigma_{p(A)}(R) \bowtie S$, where A is defined on R only
   - $\sigma_{p(A) \land p(B)}(R \bowtie S) \equiv \sigma_{p(A)}(R) \bowtie \sigma_{p(B)}(S)$, where A is defined on R and B is defined on S
7. Commuting projection with theta join (also holds for Cartesian product): $\Pi_C(R \bowtie S) \equiv \Pi_A(R) \bowtie \Pi_B(S)$, where $A \cup B = C$, A are the attributes in C from R, B are the attributes in C from S, and the join condition references only attributes in C

Relational Algebra Transformations (continued)
8. Commutativity of set operations: $R \cup S \equiv S \cup R$; $R \cap S \equiv S \cap R$
9. Associativity of set operations (and of joins and Cartesian products):
   - $(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T)$
   - $(R \times S) \times T \equiv R \times (S \times T)$
   - $(R \cup S) \cup T \equiv R \cup (S \cup T)$
   - $(R \cap S) \cap T \equiv R \cap (S \cap T)$
10. Commuting selection with set operations: $\sigma_{p(A_i)}(R \mathbin{\theta} T) \equiv \sigma_{p(A_i)}(R) \mathbin{\theta} \sigma_{p(A_i)}(T)$, where θ is ∪, ∩, or − and $A_i$ is defined on both R and T

Relational Algebra Transformations (continued)
11. Commuting projection with theta join and with union:
   - $\Pi_C(R \bowtie_{q(A_j, B_k)} S) \equiv \Pi_{A'}(R) \bowtie_{q(A_j, B_k)} \Pi_{B'}(S)$, where R[A] and S[B], $C = A' \cup B'$, $A' \subseteq A$, $B' \subseteq B$, with $A_j \in A'$ and $B_k \in B'$
   - $\Pi_C(R \cup S) \equiv \Pi_C(R) \cup \Pi_C(S)$
12. Converting selection/Cartesian product into theta join: $\sigma_C(R \times S) \equiv R \bowtie_C S$

Using Heuristics in Query Optimization
- Process for heuristic optimization:
  1. The parser of a high-level query generates an initial internal representation.
  2. Heuristic rules are applied to optimize the internal representation.
  3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
- The main heuristic is to first apply the operations that reduce the size of intermediate results.
  - E.g., apply SELECT and PROJECT operations before applying JOIN or other binary operations.

Using Heuristics in Query Optimization (continued)
- Query tree: a tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and the relational algebra operations as internal nodes.
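The two strategies in the example return the same answer; only the size of their intermediate results differs. The following Python sketch shows this on a scaled-down, hypothetical E/W instance (40 employees and 100 assignments, 10 of them as manager). It then reproduces the slide's cost arithmetic for the full-size case.

```python
# Hypothetical, scaled-down instances of E(ENO, ENAME) and W(ENO, RESP).
EMP = [{"ENO": i, "ENAME": f"E{i}"} for i in range(40)]
WORKS = [{"ENO": (7 * j) % 40, "RESP": "Manager" if j % 10 == 0 else "Analyst"}
         for j in range(100)]


def strategy1(emp, works):
    """Cartesian product, then selection, then projection on ENAME."""
    cartesian = [(e, w) for e in emp for w in works]
    selected = [(e, w) for e, w in cartesian
                if w["RESP"] == "Manager" and e["ENO"] == w["ENO"]]
    return {e["ENAME"] for e, _ in selected}, len(cartesian)


def strategy2(emp, works):
    """Selection on W first, then join on ENO, then projection on ENAME."""
    managers = [w for w in works if w["RESP"] == "Manager"]
    joined = [(e, w) for e in emp for w in managers if e["ENO"] == w["ENO"]]
    return {e["ENAME"] for e, _ in joined}, len(managers)


def slide_cost(card_e, card_w, manager_percent):
    """The slide's cost model: sum of the cardinalities of the temporary relations."""
    card_sel = card_w * manager_percent // 100
    strategy1_cost = card_e * card_w + card_e * card_w  # build E x W, then scan all of it
    strategy2_cost = card_w + card_e * card_sel         # scan W, then compare E with the selected tuples
    return strategy1_cost, strategy2_cost


names1, temp1 = strategy1(EMP, WORKS)
names2, temp2 = strategy2(EMP, WORKS)
print(names1 == names2, temp1, temp2)   # True 4000 10
print(slide_cost(4000, 10000, 10))      # (80000000, 4010000)
```

The first temporary relation has 4,000 tuples under Strategy 1 but only 10 under Strategy 2. This is the same effect, at toy scale, that makes Strategy 2 about twenty times cheaper in the slide's cost model.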
  - Executing the query tree means executing an internal node operation whenever its operands are available, and then replacing that internal node by the relation that results from executing the operation.
- Query graph: a graph data structure that corresponds to a relational calculus expression.
  - It does not indicate an order in which to perform the operations.
  - There is only a single graph corresponding to each query.

Using Heuristics in Query Optimization (continued)
- Heuristic optimization of query trees:
  - The same query could correspond to many different relational algebra expressions, and hence to many different query trees. (Remember: there is not just one solution to each query on the exam.)
  - The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.
- Example:

    Q: SELECT LNAME
       FROM   EMPLOYEE, WORKS_ON, PROJECT
       WHERE  PNAME = 'AQUARIUS' AND PNUMBER = PNO
         AND  ESSN = SSN AND BDATE > '1957-12-31';

Heuristic Algebraic Optimization Concepts
- Using the cascade of selections rule, break up any selections with conjunctive conditions into a cascade of selections.
  - This allows more freedom in moving selections down different branches of the tree.
- Using the rules for commutativity of selection with other operations, move each selection down the query tree as far as possible.
- If possible, combine a Cartesian product with a selection into a join.

Heuristic Algebraic Optimization Concepts (continued)
- Using associativity of binary operations, rearrange the leaf nodes so that the most restrictive selections are executed first.
  - The fewer tuples the resulting relation contains, the more restrictive the selection.
  - Reducing the size of intermediate results improves performance.
- Using cascade of projections and commutativity of projections with other operations, move projections down the query tree as far as possible.
- Identify subtrees that represent groups of operations that can be executed by a single algorithm.

Summary of All Rules
1. Cascade of selection
2. Commutativity of selection
3. Cascade of projection
4. Commuting selection with projection
5. Commutativity of theta join and Cartesian product
6. Commuting selection with theta join (Cartesian product)
7. Commuting projection with theta join (Cartesian product)
8. Commutativity of set operations
9. Associativity of set operations
10. Commuting selection with set operations
11. Commuting projection with union
12. Converting selection/Cartesian product into theta join

Heuristic Algebraic Optimization Algorithm
- Use rule 1 to break up selects with conjunctions into a cascade, so they can be moved down the query tree.
- Use rules 2, 4, 6, and 10 to commute select with project, join, Cartesian product, union, and intersection.
- Use rules 5 (commutativity) and 9 (associativity) to rearrange the leaf nodes of the query tree so that:
  - The most restrictive select is executed first.
  - Cartesian products in leaf nodes are avoided.
- Use rule 12 to convert a select/Cartesian product pair into a join.
- Use rules 3, 4, 7, and 11 to cascade and commute project, pushing it down the tree as far as possible.
- Identify subtrees that can be executed independently, each by a single algorithm.

Heuristic Optimization: Example
- Canonical query tree at the end of the query preprocessing phase, with E(ENAME, ENO), W(ENO, JNO, DUR), P(JNO, JNAME):
  $\Pi_{ENAME}(\sigma_{(DUR=12 \lor DUR=24) \land JNAME=\text{"CAD/CAM"} \land ENAME \neq \text{"J. Doe"}}(P \bowtie_{JNO} (W \bowtie_{ENO} E)))$
- Use the cascading of selections rule to decompose the selection:
  $\Pi_{ENAME}(\sigma_{DUR=12 \lor DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(\sigma_{ENAME \neq \text{"J. Doe"}}(P \bowtie_{JNO} (W \bowtie_{ENO} E)))))$
- Push the selection on ENAME down, using commutativity of selection over join:
  $\Pi_{ENAME}(\sigma_{DUR=12 \lor DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(P \bowtie_{JNO} (W \bowtie_{ENO} \sigma_{ENAME \neq \text{"J. Doe"}}(E)))))$
- Push the selection on JNAME down, using commutativity of selection over join:
  $\Pi_{ENAME}(\sigma_{DUR=12 \lor DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(P) \bowtie_{JNO} (W \bowtie_{ENO} \sigma_{ENAME \neq \text{"J. Doe"}}(E))))$
- Push the selection on DUR down:
  $\Pi_{ENAME}(\sigma_{JNAME=\text{"CAD/CAM"}}(P) \bowtie_{JNO} (\sigma_{DUR=12 \lor DUR=24}(W) \bowtie_{ENO} \sigma_{ENAME \neq \text{"J. Doe"}}(E)))$
- Do early projection:
  $\Pi_{ENAME}(\Pi_{JNO}(\sigma_{JNAME=\text{"CAD/CAM"}}(P)) \bowtie_{JNO} \Pi_{JNO,ENAME}(\Pi_{JNO,ENO}(\sigma_{DUR=12 \lor DUR=24}(W)) \bowtie_{ENO} \Pi_{ENO,ENAME}(\sigma_{ENAME \neq \text{"J. Doe"}}(E))))$
- Identify subtrees that can be implemented in one algorithm (figure).

Heuristic Optimization: A Second Example
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)

- Let $XLOANS = \Pi_S(\sigma_F(LOANS \times BORROWERS \times BOOKS))$, where:
  - S = {Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date}
  - F = BORROWERS.Card_No = LOANS.Card_No ∧ BOOKS.LC_No = LOANS.LC_No
- As a query tree: $\Pi_S(\sigma_F((LOANS \times BORROWERS) \times BOOKS))$
- The query: $\Pi_{Title}(\sigma_{Date\,\theta\,1/1/88}(XLOANS))$, where θ stands for the comparison on Date.
- Try to cascade: substituting XLOANS gives
  $\Pi_{Title}(\sigma_{Date\,\theta\,1/1/88}(\Pi_S(\sigma_{BORROWERS.Card\_No = LOANS.Card\_No \land BOOKS.LC\_No = LOANS.LC\_No}((LOANS \times BORROWERS) \times BOOKS))))$
Heuristic Optimization: A Second Example (continued)
- Commute select and project (valid because Date is in S):
  $\Pi_{Title}(\Pi_S(\sigma_{Date\,\theta\,1/1/88}(\sigma_{BORROWERS.Card\_No = LOANS.Card\_No \land BOOKS.LC\_No = LOANS.LC\_No}((LOANS \times BORROWERS) \times BOOKS))))$
- Commute select and select:
  $\Pi_{Title}(\Pi_S(\sigma_{BORROWERS.Card\_No = LOANS.Card\_No \land BOOKS.LC\_No = LOANS.LC\_No}(\sigma_{Date\,\theta\,1/1/88}((LOANS \times BORROWERS) \times BOOKS))))$
- Commute select and Cartesian product two levels down:
  $\Pi_{Title}(\Pi_S(\sigma_{BORROWERS.Card\_No = LOANS.Card\_No \land BOOKS.LC\_No = LOANS.LC\_No}((\sigma_{Date\,\theta\,1/1/88}(LOANS) \times BORROWERS) \times BOOKS)))$
- Try to cascade the join condition:
  $\Pi_{Title}(\Pi_S(\sigma_{BOOKS.LC\_No = LOANS.LC\_No}(\sigma_{BORROWERS.Card\_No = LOANS.Card\_No}((\sigma_{Date\,\theta\,1/1/88}(LOANS) \times BORROWERS) \times BOOKS))))$
- Commute select and Cartesian product one level down:
  $\Pi_{Title}(\Pi_S(\sigma_{BOOKS.LC\_No = LOANS.LC\_No}(\sigma_{BORROWERS.Card\_No = LOANS.Card\_No}(\sigma_{Date\,\theta\,1/1/88}(LOANS) \times BORROWERS) \times BOOKS)))$
- What's next?
Date 1/1/88 Loans Borrower BOOKS(Title, Author, Pname, LC_No) PUBLISHERS(Pname, Paddr, Pcity) BORROWERS(Name, Addr, City, Card_No) LOANS(Card_No, LC_No, Date) Chapter 14-96 Heuristic Optimization: A Second Example Title CSE 4701 Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date Books.LC_No = Loans.LC_No X Combine Projections Books Borrower.Card_No = Loans.Card_No X Date 1/1/88 Loans Borrower BOOKS(Title, Author, Pname, LC_No) PUBLISHERS(Pname, Paddr, Pcity) BORROWERS(Name, Addr, City, Card_No) LOANS(Card_No, LC_No, Date) Chapter 14-97 Heuristic Optimization: A Second Example BOOKS(Title, Author, Pname, LC_No) PUBLISHERS(Pname, Paddr, Pcity) BORROWERS(Name, Addr, City, Card_No) LOANS(Card_No, LC_No, Date) Title CSE 4701 Books.LC_No = Loans.LC_No X Books Borrower.Card_No = Loans.Card_No X Date 1/1/88 Loans Borrower What is Still a Problem? We are Not Projecting so All Attributes are Still Collected Until the Final Project! Chapter 14-98 Heuristic Optimization: A Second Example Title CSE 4701 Loans.LC_No Books.LC_No, Title X Loans.LC_No, Books.LC_No = Loans.LC_No Books Borrower.Card_No = Loans.Card_No X Borr.Card_No Loans.Card_No Date 1/1/88 Loans Borrower Add Strategic Projections to Send Only the Minimum Up the Tree as Needed for Join/Result Set Chapter 14-99 Heuristic Optimization: A Second Example CSE 4701 Title What is the Final Step? Combine Select and Cartesian Product Books.LC_No = Loans.LC_No Result: Equijoins! Loans.LC_No X Loans.LC_No, Books.LC_No, Title Books Borrower.Card_No = Loans.Card_No X Borr.Card_No Loans.Card_No Date 1/1/88 Borrower Loans Chapter 14-100 Heuristic Optimization: A Second Example CSE 4701 FINAL TREE with Equijoins! 
Query tree: π_{Title}(π_{Books.LC_No, Title}(Books) ⋈_{LC_No} π_{Loans.LC_No}(π_{Loans.LC_No, Loans.Card_No}(σ_{Date 1/1/88}(Loans)) ⋈_{Card_No} π_{Borr.Card_No}(Borrower)))

Heuristic Optimization: A Third Example
Heuristic Optimization of Query Trees: The same query could correspond to many different relational algebra expressions, and hence many different query trees. The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.
Example:
Q: SELECT LNAME
   FROM EMPLOYEE, WORKS_ON, PROJECT
   WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN AND BDATE > '1957-12-31';

Heuristic Optimization: A Third Example
What's One Approach?
[Query tree figure]

Heuristic Optimization: A Third Example
Moving Selects Down. Is this Optimal?
[Query tree figure]

Heuristic Optimization: A Third Example
No! The Prior Version Retrieved All Employees Without First Applying the PNAME Select.
[Query tree figure]

Heuristic Optimization: A Third Example
Replace CARTESIAN PRODUCT Plus SELECT with JOIN! What's Left to Do?
[Query tree figure]

Heuristic Optimization: A Third Example
[Query tree figure]

Heuristic Optimization: A Fourth Example
Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day, rname)
Query: Find all Sailors who have Reserved Red Boats, who are Younger than 30, and who have a Rating of at Least 11.
SELECT S.sid, S.sname, S.age
FROM Sailors S, Boats B, Reserves R
WHERE B.bid = R.bid AND S.sid = R.sid AND S.rating >= 11 AND B.color = 'Red' AND S.age < 30;
π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid ∧ S.sid=R.sid ∧ S.age<30 ∧ B.color='Red' ∧ S.rating≥11}(B × S × R))

Heuristic Optimization: A Fourth Example
Query tree: π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid ∧ S.sid=R.sid ∧ S.age<30 ∧ S.rating≥11 ∧ B.color='Red'}(Boats × (Reserves × Sailors)))
Step 1 – Break Up Selects

Heuristic Optimization: A Fourth Example
Query tree: π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid ∧ S.sid=R.sid}(σ_{S.age<30 ∧ S.rating≥11}(σ_{B.color='Red'}(Boats × (Reserves × Sailors)))))
Step 2 – Move the Boats Select

Heuristic Optimization: A Fourth Example
Query tree: π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid ∧ S.sid=R.sid}(σ_{S.age<30 ∧ S.rating≥11}(σ_{B.color='Red'}(Boats) × (Reserves × Sailors))))
Step 3 – Move the Sailors Select

Heuristic Optimization: A Fourth Example
Query tree: π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid ∧ S.sid=R.sid}(σ_{B.color='Red'}(Boats) × (Reserves × σ_{S.age<30 ∧ S.rating≥11}(Sailors))))
Step 4 – Introduce Projections

Heuristic Optimization: A Fourth Example
Query tree: π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid ∧ S.sid=R.sid}(π_{B.bid}(σ_{B.color='Red'}(Boats)) × (π_{R.sid, R.bid}(Reserves) × π_{S.sid, S.sname, S.age}(σ_{S.age<30 ∧ S.rating≥11}(Sailors)))))
Step 5 – What's the Next Step?

Heuristic Optimization: A Fourth Example
Step 6 – Move Down S.sid=R.sid:
Query tree: π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid}(π_{B.bid}(σ_{B.color='Red'}(Boats)) × σ_{S.sid=R.sid}(π_{R.sid, R.bid}(Reserves) × π_{S.sid, S.sname, S.age}(σ_{S.age<30 ∧ S.rating≥11}(Sailors)))))
Step 7 – What's the Next Step?

Heuristic Optimization: A Fourth Example
Step 7 – Combine into an Equijoin:
Query tree: π_{S.sid, S.sname, S.age}(σ_{B.bid=R.bid}(π_{B.bid}(σ_{B.color='Red'}(Boats)) × (π_{R.sid, R.bid}(Reserves) ⋈_{S.sid=R.sid} π_{S.sid, S.sname, S.age}(σ_{S.age<30 ∧ S.rating≥11}(Sailors)))))
Step 8 – What's the Final Step?

Heuristic Optimization: A Fourth Example
Step 8 – Introduce the Final Equijoin:
Query tree: π_{S.sid, S.sname, S.age}(π_{B.bid}(σ_{B.color='Red'}(Boats)) ⋈_{B.bid=R.bid} (π_{R.sid, R.bid}(Reserves) ⋈_{S.sid=R.sid} π_{S.sid, S.sname, S.age}(σ_{S.age<30 ∧ S.rating≥11}(Sailors))))

Converting Relational Algebra to Query Tree
Movies1997 = π_{Lname,Fname,State}(σ_{Person.PersonID = AllActors.PersonID ∧ Movies1997.ShowID = MovieRoles.ShowID ∧ Year=1997}(Person × Movies × MovieRoles))
Query tree: π_{Lname,Fname,State} over the selection above, over (Person × Movies) × MovieRoles

Converting Relational Algebra to Query Tree
FriendsActors = π_{Lname,Fname,RLName,RFName}(σ_{ShowName=Friends ∧ TVRoles.ShowID = Friends.ShowID ∧ EpisodeID>10 ∧ EpisodeID<26 ∧ Person.PersonID = RoleNames.PersonID}(TVShows × TVRoles × Roles × Person))
Query tree: the projection over the selection, over the Cartesian products of TVShows, TVRoles, Roles, and Person

Heuristics Query Optimization: Summary
First Apply Operations that Reduce the Size of Intermediate Results
Move Selections and Projections Down the Tree as far as Possible
Early Selections Reduce the Number of Tuples
Early Projections Reduce the Number of Attributes
The Most Restrictive Selections and Joins Should be Executed Before Other Similar Operations
This is Accomplished by Reordering the Leaf Nodes of the Tree Among Themselves and Adjusting the Rest of the Tree Appropriately

Slides on Concurrency Control Algorithms

What is a Schedule?
Transaction schedule or history: When transactions are executing concurrently in an interleaved fashion, the order of execution of operations from the various transactions forms what is known as a transaction schedule.
A schedule S of n transactions T1, T2, …, Tn is an ordering of the operations of the transactions where, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti.
Operations from other transactions Tj can be interleaved with the operations of Ti in S.
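The definition of a schedule can be checked mechanically. This is a minimal sketch that uses the R1(X) / W1(X) / c1 shorthand from the following slides (treating a1 as an abort is an added assumption); the function names parse and is_valid_schedule are hypothetical. It verifies that projecting the schedule onto each Ti reproduces Ti's own operation sequence, which is exactly the order-preservation requirement above.

```python
import re

# Shorthand from the slides: R1(X) = T1 reads X, W2(Y) = T2 writes Y,
# c1 = T1 commits; "a1" (abort) is an added assumption.
OP = re.compile(r"([RWca])(\d+)(?:\((\w+)\))?")


def parse(schedule):
    """'R1(X), c1' -> [('R', 1, 'X'), ('c', 1, None)]."""
    return [(m[1], int(m[2]), m[3]) for m in OP.finditer(schedule)]


def is_valid_schedule(schedule, transactions):
    """True if `schedule` is an interleaving of `transactions`: every operation
    belongs to a known Ti, and the subsequence of Ti's operations equals Ti."""
    ops = parse(schedule)
    if any(tid not in transactions for _, tid, _ in ops):
        return False
    for tid, own_ops in transactions.items():
        if [op for op in ops if op[1] == tid] != parse(own_ops):
            return False
    return True


TRANSACTIONS = {
    1: "R1(X), W1(X), R1(Y), W1(Y), c1",
    2: "R2(X), W2(X), c2",
}

# True: the schedule S used on the next slides.
print(is_valid_schedule("R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1", TRANSACTIONS))
# False: T1's own operations appear in a different order.
print(is_valid_schedule("R1(Y), W1(Y), R1(X), W1(X), c1, R2(X), W2(X), c2", TRANSACTIONS))
```

Any interleaving passes as long as each transaction's operations keep their relative order, so the same check accepts both serial orders and every mixed one.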
A Schedule S is a Sequence of R/W Operations, Which Ends with a Commit or Abort
Different Transactions Execute Concurrently in an Interleaved Fashion with One Another
Each Transaction is a Sequence of R/W Operations
Two Schedules S1 and S2 are Equivalent, Denoted as S1 ≡ S2, If and Only If:
S1 and S2 Execute the Same Set of Transactions, and
They Produce the Same Results (i.e., Both Take the DB to the Same Final State)

Transactions and a Schedule
Below are Transactions T1 and T2. Their Interleaved Execution Shown Below is an Example of One Possible Schedule; there are Many Different Interleavings of T1 and T2.
T1: Read(X); X:=X-2; Write(X);
T2: Read(X); X:=X-1; Write(X); commit;
T1: Read(Y); Y = Y + 20; Write(Y); commit;
Schedule S: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;

Transactions and a Schedule
What Happens if the Schedule Changes to:
[Figure: two alternative interleavings of the operations of T1 and T2]

Equivalent Schedules
Are the Two Schedules below Equivalent?
S1: R1(X), W1(X), R1(Y), W1(Y), c1, R2(X), W2(X), c2;
S4: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;
S1 and S4 are Equivalent, since They have the Same Set of Transactions and Produce the Same Results.

What are Different Types of Schedules?
Recoverable schedule: One where no committed transaction ever needs to be rolled back. No transaction T in S commits until all transactions T' that write an item that T reads have committed.
Cascadeless schedule: One where every transaction reads only the items that are written by committed transactions.
Cascaded rollback: A schedule in which uncommitted transactions that read an item from a failed transaction (i.e., read a value written by the failed transaction) must be rolled back.
Strict schedule: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted).

Serial and Serializable Schedules
Serial schedule: A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule. Otherwise, the schedule is called a nonserial schedule.
Serializable schedule: A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.
Being serializable implies that the schedule is a correct schedule that:
Leaves the database in a consistent state.
Interleaves operations so that the result is as if the transactions were executed serially, while achieving efficiency due to concurrent execution.

Serializability of Schedules
A Serial Execution of Transactions Runs One Transaction at a Time (e.g., T1 then T2, or T2 then T1)
All R/W Operations in Each Transaction Occur Consecutively in S, with No Interleaving
Consistency: a Serial Schedule takes a Consistent Initial DB State to a Consistent Final State
A Schedule S is Called Serializable If there Exists an Equivalent Serial Schedule
A Serializable Schedule also takes a Consistent Initial DB State to Another Consistent DB State
An Interleaved Execution of a Set of Transactions is Considered Correct if it Produces the Same Final Result as Some Serial Execution of the Same Set of Transactions
We Call such an Execution Serializable

Example of Serializability
Consider S1 and S2 for Transactions T1 and T2. If X = 10 and Y = 20, then after S1 or S2, X = 7 and Y = 40.
These are the Two Possible Serial Schedules:
Schedule S1 (T1 then T2):
T1: Read(X); X:=X-2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); X:=X-1; Write(X); commit;
Schedule S2 (T2 then T1):
T2: Read(X); X:=X-1; Write(X); commit;
T1: Read(X); X:=X-2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;

Example of Serializability
With the Same S1 and S2 (X = 10 and Y = 20 lead to X = 7 and Y = 40): Is S3 a Serializable Schedule?
Schedule S3: an interleaving of T1 and T2 in which T2 reads X before T1 writes X, and T2 writes X after T1 writes X.

Example of Serializability
With the Same S1 and S2 (X = 10 and Y = 20 lead to X = 7 and Y = 40): Is S4 a Serializable Schedule?
Schedule S4: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1
T1: Read(X); X:=X-2; Write(X);
T2: Read(X); X:=X-1; Write(X); commit;
T1: Read(Y); Y = Y + 20; Write(Y); commit;

Two Serial Schedules with Different Results
Now let T1 compute Y = X + 20 instead of Y = Y + 20. If X = 10 and Y = 20:
After S1 (T1 then T2): X = 7 and Y = 28
After S2 (T2 then T1): X = 7 and Y = 27
T1: Read(X); X:=X-2; Write(X); Read(Y); Y = X + 20; Write(Y); commit;
T2: Read(X); X:=X-1; Write(X); commit;
A Schedule is Serializable if it Matches Either S1 or S2, Even if S1 and S2 Produce Different Results!
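The result-equivalence idea in these slides can be replayed on the data given (X = 10, Y = 20). This sketch assumes T1 computes X := X-2 and T2 computes X := X-1, which are the values consistent with the final X = 7 and with Y = 28 / Y = 27 in the last example. It also assumes S3 is the interleaving R1(X), R2(X), W1(X), R1(Y), W2(X), W1(Y), chosen to match the conflicts listed for S3 on the precedence-graph slides. The helper names run and equivalent_to_some_serial are hypothetical.

```python
# Each transaction is a list of steps; "compute" steps change only the
# transaction's private copy, as in Read(X); X:=X-2; Write(X).
T1 = [
    ("read", "X"), ("compute", lambda v: v.update(X=v["X"] - 2)), ("write", "X"),
    ("read", "Y"), ("compute", lambda v: v.update(Y=v["Y"] + 20)), ("write", "Y"),
]
T2 = [("read", "X"), ("compute", lambda v: v.update(X=v["X"] - 1)), ("write", "X")]
TXNS = {1: T1, 2: T2}
INITIAL = {"X": 10, "Y": 20}

# A schedule lists, step by step, which transaction runs its next step.
S1 = [1, 1, 1, 1, 1, 1, 2, 2, 2]  # serial: T1 then T2
S2 = [2, 2, 2, 1, 1, 1, 1, 1, 1]  # serial: T2 then T1
S4 = [1, 1, 1, 2, 2, 2, 1, 1, 1]  # R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y)
S3 = [1, 1, 2, 2, 1, 1, 2, 1, 1]  # assumed: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)


def run(schedule, transactions, initial):
    """Execute the interleaving and return the final database state."""
    db = dict(initial)
    local = {tid: {} for tid in transactions}
    pc = {tid: 0 for tid in transactions}
    for tid in schedule:
        kind, arg = transactions[tid][pc[tid]]
        pc[tid] += 1
        if kind == "read":
            local[tid][arg] = db[arg]
        elif kind == "write":
            db[arg] = local[tid][arg]
        else:  # "compute": update the private copy only
            arg(local[tid])
    if any(pc[tid] != len(steps) for tid, steps in transactions.items()):
        raise ValueError("schedule does not run every step of every transaction")
    return db


def equivalent_to_some_serial(schedule):
    """Result equivalence on this one input: same final state as S1 or S2."""
    serial_results = [run(s, TXNS, INITIAL) for s in (S1, S2)]
    return run(schedule, TXNS, INITIAL) in serial_results


for name, s in [("S1", S1), ("S2", S2), ("S3", S3), ("S4", S4)]:
    print(name, run(s, TXNS, INITIAL), equivalent_to_some_serial(s))
```

S4 ends in the same state as both serial schedules, while S3 loses T1's update to X (X = 9). Matching final values for one particular input is only evidence, not proof; the precedence-graph test on the following slides decides serializability from the order of conflicting operations alone.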
Thoughts on Serializability
Serializability is Hard to Check:
Interleaving of operations occurs in an operating system through some scheduler
It is difficult to determine beforehand how the operations in a schedule will be interleaved
Need to Adopt a Practical Approach:
Come up with methods (protocols) to ensure serializability
However, it is not possible to determine when a schedule begins and when it ends
Hence, we reduce the problem of checking the whole schedule to checking only a committed projection of the schedule

How do we Check for Conflicts?
Testing for conflict serializability:
Look at only read_item(X) and write_item(X) operations
Construct a precedence graph (serialization graph) with directed edges
An edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj
The schedule is serializable if and only if the precedence graph has no cycles

The Serializability Theorem
A Dependency Exists Between Two Transactions If:
They Access the Same Data Item Consecutively in the Schedule, and
One of the Accesses is a Write
Three Cases, where T2 Depends on T1, Denoted by T1 → T2:
T2 Executes a Read(x) after a Write(x) by T1
T2 Executes a Write(x) after a Read(x) by T1
T2 Executes a Write(x) after a Write(x) by T1
We Don't Care about a Read(x) followed by a Read(x)
Transaction T1 Precedes Transaction T2 If:
There is a Dependency Between T1 and T2, and
The R/W Operation in T1 Precedes the Dependent T2 Operation in the Schedule

The Serializability Theorem
A Precedence Graph of a Schedule is a Graph G = <TN, DE>, where:
Each Node is a Single Transaction, i.e., TN = {T1, ..., Tn} (n > 1), and
Each Arc (Edge) Represents a Dependency Going from the Preceding Transaction to the Other, i.e., DE = {eij | eij = (Ti, Tj), Ti, Tj ∈ TN}
Use the Dependency Cases on the Prior Slide
The Serializability Theorem: A Schedule is Serializable if and only if its Precedence Graph is Acyclic
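The conflict test above (an edge for each read/write, write/read, or write/write pair on the same item, then an acyclicity check) fits in a few lines. This is a sketch: precedence_graph and serial_order are hypothetical names, an edge is added for every later conflicting operation as in the conflict-testing slide, and the acyclicity check is the repeated remove-a-node-with-no-incoming-arcs procedure (Kahn's topological sort). S4 is the schedule from the equivalence slides; the S3 interleaving is an assumption chosen to match the conflicts listed for S3 on the next slide.

```python
import re
from collections import defaultdict

# Only reads and writes matter for conflicts; commits such as "c2" are ignored.
OP = re.compile(r"([RW])(\d+)\((\w+)\)")


def precedence_graph(schedule):
    """Nodes are transactions; an edge Ti -> Tj is added when an operation of
    Ti precedes a conflicting operation of Tj (same item, at least one write)."""
    ops = [(m[1], int(m[2]), m[3]) for m in OP.finditer(schedule)]
    nodes = {tid for _, tid, _ in ops}
    edges = set()
    for i, (kind_i, ti, item_i) in enumerate(ops):
        for kind_j, tj, item_j in ops[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (kind_i, kind_j):
                edges.add((ti, tj))
    return nodes, edges


def serial_order(schedule):
    """Topological sort of the precedence graph: repeatedly remove a node with
    no incoming edges. Returns None if a cycle remains (not serializable)."""
    nodes, edges = precedence_graph(schedule)
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(set)
    for a, b in edges:
        successors[a].add(b)
        indegree[b] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
        ready.sort()
    return order if len(order) == len(nodes) else None


S4 = "R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1"
S3 = "R1(X), R2(X), W1(X), R1(Y), W2(X), W1(Y)"  # assumed interleaving

print(precedence_graph(S4)[1], serial_order(S4))  # {(1, 2)} [1, 2]
print(serial_order(S3))  # edges in both directions, so None
```

The returned order is one equivalent serial schedule; when several nodes have no incoming arcs at once, the sketch picks the smallest transaction number, so ties are broken deterministically rather than by any rule from the slides.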
Serializability Theorem Example
Consider S1 and S2 for Transactions T1 and T2, and the Two Precedence Graphs for S1 and S2:
Schedule S1: T1 → T2 (on X)
Schedule S2: T2 → T1 (on X)
No Cycles in Either Graph!

What are the Precedence Graphs for S3 and S4?
For S3:
T1 → T2 (T2 Write(X) After T1 Write(X))
T2 → T1 (T1 Write(X) After T2 Read(X))
For S4:
T1 → T2 (T2 Read/Write(X) After T1 Write(X))
So the graph for S3 has a cycle (T1 → T2 → T1) and S3 is not serializable, while the graph for S4 is acyclic and S4 is equivalent to the serial order T1, T2.

Four Schedules and their Precedence Graphs
[Figure: four schedules and their precedence graphs]

Serializability Facts
Serializability Emphasizes Throughput
Serializable Executions Allow us to Enjoy the Benefits of Concurrency without Giving up Any Correctness
However, we May NOT Get the Same Result (Different Serial Orders can Produce Different Results)
Testing for Serializability is Difficult in Practice:
Finding a Serializable Schedule for an Arbitrary Set of Transactions is NP-hard
Interleaving of Operations from Concurrent Transactions is Determined Dynamically at Run-time
It is Practically Almost Impossible to Determine the Ordering of Operations Beforehand to Ensure Serializability

Database Concurrency Control
Purpose of Concurrency Control:
To enforce Isolation (through mutual exclusion) among conflicting transactions.
To preserve database consistency through consistency-preserving execution of transactions.
To resolve read-write and write-write conflicts.
Example: In a concurrent execution environment, if T1 conflicts with T2 over a data item A, then the concurrency control mechanism decides whether T1 or T2 should get A, and whether the other transaction is rolled back or waits.

Concurrency Control
Different Locking-Based Algorithms:
Binary Locks (Lock and Unlock)
Shared Read Locks and Exclusive Write Locks
Write Lock Does Not Imply Read
2-Phase Protocol: All Locks Must Precede All Unlocks in a Transaction; If this Holds for All Transactions, the Schedule is Serializable
Concurrency Control Implementation Techniques:
Optimistic Concurrency Control
Time-Based Access to Information: Consider "When" Information is Read/Written to Identify Potential or Prior Conflicts
We'll Deviate from Textbook Notation

Summary of CC Techniques
Two-Phase Locking:
Most Important in Practice; Used by a Majority of DBMSs
Serializes in the Middle of Transactions
Low Overhead
Relatively Low Concurrency
Timestamp-Based:
Based on Multiple Versions of Data Items
Serializes at the Beginning of Transactions
Mostly Used in Distributed DBMSs
Optimistic Concurrency Control Methods:
Serializes at the End of Transactions
Relatively High Concurrency

Recalling Important Concepts
Transaction: Sequence of Database Commands that Must be Executed as a Single Unit (Program)
Recall that an SQL Update Query is Equivalent to Multiple Operations: Read from DB, Modify (Local Copy), Write to DB
A Modify is Sometimes a Delete and an Insert
Granularity: Size of Data that is Locked for an Executing DB Transaction; there is a Wide Range:
Database
Relation (Tuple vs. Entire Table)
Attribute (Column)
Meta-Data (System Catalog)
Locking: Provides a Means for Synchronization

Transaction Example
Two Possible Outcomes for T1 and T2 – Let A = 5:
If T1 First, then A = 150
If T2 First, then A = 60
Is this a Problem?
T1: LOCK A; READ A; A=A+10; WRITE A; UNLOCK A; commit;
T2: LOCK A; READ A; A=A*10; WRITE A; UNLOCK A; commit;
Serial Order T1, T2: A = 5 → 15 → 150
Serial Order T2, T1: A = 5 → 50 → 60

Transaction Example
The Two Different Orderings of T1 and T2 Represent Alternate Serial Schedules (Non-Interleaved)
Key Concept: Concurrent (Interleaved) Execution of Several DB Transactions is Correct if and only if its Effect is the Same as that Obtained by Running the Same Transactions in Some Serial Order
If the Result is Either 150 or 60, it is OK! This is the Concept of Serializability!

Recalling Key Definitions
A Schedule for a Set of Transactions is the Order in Which the Elementary Steps (Read, Lock, Assign, Commit, etc.) are Performed
A Schedule is Serial if All Steps of Each Transaction Occur Consecutively
A Schedule is Serializable if it is Equivalent to "Some" Serial Schedule
If T1, T2 and T3 are Transactions, What are the Possible Serial Schedules?
T1 T2 T3; T1 T3 T2; T2 T1 T3; T2 T3 T1; T3 T1 T2; T3 T2 T1
How Many Different Serial Schedules are there for 4 Transactions?

Another Example of Serializability
Two Serial Schedules – Let A = 15, B = 25, C = 5
What are the Values of A, B, and C after Each?
A = 5, B = 15, C = 25 (after either schedule)
T1: Read(A); A:=A-10; Write(A); Read(B); B = B + 10; Write(B); commit;
T2: Read(B); B:=B-20; Write(B); Read(C); C = C + 20; Write(C); commit;
S1: T1, then T2
S2: T2, then T1

Another Example of Serializability
Is S3 or S4 Serializable? Let A = 15, B = 25, C = 5; the Serial Values are A = 5, B = 15, C = 25
S3 (an interleaving of T1 and T2): A = 5, B = 15, C = 25, which matches the serial values
S4 (an interleaving of T1 and T2): A = 5, B = 35, C = 25; T1 read the original B and wrote B last, so T2's update to B is lost

Locks
Lock: Variable Associated with a Data Item in the DB, Describing the Status of that Item w.r.t. Possible Operations
A Means of Synchronizing the Access by Concurrent Transactions to the Database Item
Managed by the Lock Manager
Binary Locks: Lock(x) and Unlock(x)
A Transaction T Must Issue Lock(x) before any Read(x) or Write(x)
A Transaction T Must Issue Unlock(x) After all Read(x)/Write(x) Operations are Completed in T
The System Catalog Maintains a Lock Table for All Locked Items
Lock(x) will not be Granted if x is Already Locked, and Unlock(x) is Only Valid if x is Locked

A Basic Lock/Unlock Model
A Database Transaction is a Sequence of Lock/Unlocks
An Item Locked must Eventually be Unlocked
A Transaction Holds a Lock between the Lock and Unlock Statements
Lock/Unlock Assumes that the Value of the Item Changes (Always Assumes a Write): A Has Value a0 at Lock A and Value f(a0) at Unlock A
For a Number of Transactions that Lock/Unlock A, we'd have: f1(f2(f3( … fn(a0))))

Example - Assessing Schedule
Consider the Three Transactions Below:
T1 has f1(a) and f2(b)
T2 has f3(b), f4(c), and f5(a)
T3 has f6(a) and f7(c)
The Functions Represent Actions that Modify Instances a, b, and c of Data Items A, B, and C, Respectively
T1: Lock A; Lock B; Unlock A; Unlock B
T2: Lock B; Lock C; Unlock B; Lock A; Unlock C; Unlock A
T3: Lock A; Lock C; Unlock C; Unlock A

Example - Assessing Schedule
Consider the Schedule with its Changes to a, b, and c:
Step          A              B           C
T1 Lock A     a              b           c
T2 Lock B     a              b           c
T2 Lock C     a              b           c
T2 Unlock B   a              f3(b)       c
T1 Lock B     a              f3(b)       c
T1 Unlock A   f1(a)          f3(b)       c
T2 Lock A     f1(a)          f3(b)       c
T2 Unlock C   f1(a)          f3(b)       f4(c)
T2 Unlock A   f5(f1(a))      f3(b)       f4(c)
T3 Lock A     f5(f1(a))      f3(b)       f4(c)
T3 Lock C     f5(f1(a))      f3(b)       f4(c)
T1 Unlock B   f5(f1(a))      f2(f3(b))   f4(c)
T3 Unlock C   f5(f1(a))      f2(f3(b))   f7(f4(c))
T3 Unlock A   f6(f5(f1(a)))  f2(f3(b))   f7(f4(c))
Is this Schedule Serializable?

Is this Schedule Serializable?
Focus on the Final Line (after T3 Unlock A): A = f6(f5(f1(a))), B = f2(f3(b)), C = f7(f4(c))
It Indicates the Effective Order of Execution of the Transactions for a, b, and c:
For A: The Order of Transactions is T1, T2, T3
For B: T2 Must Precede T1
For C: T2 Must Precede T3
Can All Three Conditions be True w.r.t. Order?

Determining Serializability in this Model
Examine the Schedule Based on the Order in Which the Various Transactions Obtain Locks
The Order must be Equivalent to Some Hypothetical Serial Schedule of the Transactions
If the Orders for Different Data Items Force Two Transactions to Appear in Different Orders (T2 Must Precede T1 and T1 Must Precede T2), There is a Paradox!
This is Equivalent to Searching for Cycles in a Directed Graph

Recall Topological Sort
If the Graph is Acyclic:
Find a Node of the Graph with ONLY Arrows Leaving (No Arrows Entering)
Delete the Node and its Arrows
Repeat until No Nodes Remain; the Order of Deletion is a Topological Order

Algorithm 1: Binary Lock Model
Input: Schedule S for Transactions T1, T2, …, Tk
Output: Determination of whether S is Serializable, and If so, an Equivalent Serial Schedule
Method: Create a Directed Precedence Graph G:
Let S = a1; a2; …; an, where each ai is Tj: Lock Am or Tj: Unlock Am
For each ai = Tj: Unlock Am, find the next ap = Ts: Lock Am (i < p ≤ n), i.e., Ts is the next transaction to lock Am; if one exists, draw an Arc in G from Tj to Ts
Repeat Until All Unlock/Lock Pairs are Checked
Review the Resulting Precedence Graph:
If G has Cycles: Non-Serializable
If G is Acyclic: Topological Sort to Find an Equivalent Serial Schedule

Precedence Graph for Prior Example
Schedule: T1 Lock A; T2 Lock B; T2 Lock C; T2 Unlock B; T1 Lock B; T1 Unlock A; T2 Lock A; T2 Unlock C; T2 Unlock A; T3 Lock A; T3 Lock C; T1 Unlock B; T3 Unlock C; T3 Unlock A
Look for Unlock → Lock Combos on the Same Data Item:
T2 Unlock B and T1 Lock B: Arc T2 → T1 (B)
T1 Unlock A and T2 Lock A: Arc T1 → T2 (A)
T2 Unlock C and T3 Lock C: Arc T2 → T3 (C)
T2 Unlock A and T3 Lock A: Arc T2 → T3 (A)
IS IT SERIALIZABLE?

Another Example
Schedule: T2 Lock A; T2 Unlock A; T3 Lock A; T3 Unlock A; T1 Lock B; T1 Unlock B; T2 Lock B; T2 Unlock B
Look for Unlock → Lock Combos on the Same Data Item:
T2 Unlock A and T3 Lock A: Arc T2 → T3 (A)
T1 Unlock B and T2 Lock B: Arc T1 → T2 (B)
IS IT SERIALIZABLE? IF SO, WHAT IS THE SERIAL SCHEDULE?

Two-Phase Protocol
Two-Phase Protocol: All Locks Must Precede All Unlocks in the Schedule for a Transaction
Which of the Transactions Below are Two-Phase? Why or Why Not?
T1: Lock A; Lock B; Unlock A; Unlock B
T2: Lock B; Lock C; Unlock B; Lock A; Unlock C; Unlock A
T3: Lock A; Lock C; Unlock C; Unlock A

Theorems Regarding Serializability
Theorem 1: Algorithm 1 Correctly Determines if a Schedule S is Serializable (proof omitted).
Theorem 2: If S is any Schedule of 2-Phase Transactions (i.e., all of its Transactions are 2-Phase), then S is Serializable.
Proof by Contradiction: Suppose Not. Then by Theorem 1, S has a Precedence Graph G with a Cycle T1 → T2 → T3 → … → Tp → T1.
Each arc Ti → Tj means Ti Unlocks an item before Tj Locks it; since each Tj is 2-Phase, its Lock precedes its own Unlock on the next arc, so the events along the cycle occur in increasing time order.
In T1 → T2, T1 Unlocks, so all of T1's Remaining Actions must also be Unlocks, since S is 2-Phase.
However, in Tp → T1, T1 Locks at a later time, which Contradicts the Fact that S is 2-Phase.

Problems of Binary Locks
Only One Transaction Can Hold a Lock on a Given Item
No Shared Reading is Allowed, which is Too Restrictive
For Example:
T1 is Read-Only on X, Yet Needs a Full Lock
T2 is Read-Only on X and Y, Yet Needs Full Locks
T1: Read(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); Read(Y); commit;
[Figure: timeline t1 to t5 interleaving T1 and T2]

Algorithm 2: A Read/Write Lock Model
Refines the Granularity of Locking to Differentiate Between Read and Write Locks
Improves Concurrent Access
Rlock (Shared): If T has an Rlock on A, then Any Other Transaction can Also Rlock A, but All Transactions are Forbidden to Wlock A until All Transactions with an Rlock on A Issue Ulock A (Multiple Reads)
Wlock (Exclusive): If T has a Wlock on A, then All Other Transactions are Forbidden to Rlock or Wlock A Until T Ulocks A (Write Implies Reading, Single Write)
Two Schedules are Equivalent if They:
Produce the Same Value for Each Data Item, and
Each Rlock on an Item Occurs in Both Schedules at a Time When the Locked Item has the Same Value

Motivating Algorithm 2
Rlock (Shared): Multiple Reads Allowed
Wlock (Exclusive): Write Implies Reading, Sole Write
Identify All Dependencies Among Transactions that Read and Write the Same Item:
If Ti: Rlock A and Tj: Wlock A is the Next Transaction to Write A, put in an arc from Ti to Tj (Ti must precede Tj in the Schedule w.r.t. A)
If Ti: Wlock A and Tj: Wlock A is the Next Transaction to Write A, put in an arc from Ti to Tj (Ti must precede Tj in the Schedule w.r.t. A)
If Tm: Rlock A occurs between Ti: Wlock A and Tj: Wlock A, put in an arc from Ti to Tm (Tm must follow Ti in the Schedule w.r.t. A)

Algorithm 2: Read/Write Lock Model
Input: Schedule S for Transactions T1, T2, …, Tk
Output: Is S Serializable? If so, a Serial Schedule
Method: Create a Directed Precedence Graph G:
Suppose in S, Ti: Rlock A. If Tj: Wlock A is the Next Transaction to Wlock A (if it exists), then place an Arc from Ti to Tj. Repeat for all Ti's, i.e., for all Rlocks on A before that Wlock!
Suppose in S, Ti: Wlock A. If Tj: Wlock A is the Next Transaction to Wlock A (if it exists), then place an Arc from Ti to Tj. If there Also exists Tm: Rlock A after Ti: Wlock A but before Tj: Wlock A, then Draw an Arc from Ti to Tm.
Review the Resulting Precedence Graph:
If G has Cycles: Non-Serializable
If G is Acyclic: Topological Sort for a Serial Schedule

Algorithm 2: Read/Write Lock Model
Look for the Following Arcs:
Add Arc: Ti: Rlock A to Tj: Wlock A, where Tj is the NEXT transaction to Write A
Add Arc: Ti: Wlock A to Tj: Wlock A, where Tj is the NEXT transaction to Write A
Add Arc: Ti: Wlock A to Tm: Rlock A, where Tm: Rlock A occurs after Ti: Wlock A but before Tj: Wlock A

Consider the Following Schedule: What are the Dependencies Among Transactions?
(1) Wlock A, (2) Rlock B, (3) Unlock A, (4) Rlock A, (5) Unlock B, (6) Wlock B, (7) Rlock A, (8) Unlock B, (9) Wlock B, (10) Unlock A, (11) Unlock A, (12) Wlock A, (13) Unlock B, (14) Rlock B, (15) Unlock A, (16) Unlock B
T4's steps are (2) Rlock B, (5) Unlock B, (12) Wlock A, and (15) Unlock A.

What are the Different Cases?
T1 before T4, T2 before T4 T3 before T1, T3 before T2, T3 before T4 CSE T4 before T3, T3 before T1 4701 T1 T2 T3 (1) Wlock A (2) (3) Unlock A (4) Rlock A (5) (6) Wlock B (7) Rlock A (8) Unlock B (9) Wlock B (10) Unlock A (11) Unlock A (12) (13) Unlock B (14) Rlock B (15) (16) Unlock B T4 For Each Rlock T1 :Rlock A T2 :Rlock A Look for Next T to Wlock A Rlock B For Each Wlock T3 :Wlock A Look for Unlock BNext T to Rlock or Wlock A For Each Rlock T4 :Rlock B Next T to Wlock B Wlock A Unlock A For Each Wlock T3 :Wlock B Look for Next T to Wlock Chaps19&20-169 B Consider the Following Schedule What is the Precedence Graph G? CSE 4701 T1 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) T2 T3 Wlock A T4 Rlock B Unlock A Rlock A Unlock B Wlock B Rlock A Unlock B Wlock B Unlock A Unlock A Wlock A Unlock B Rlock B Unlock A Unlock B Chaps19&20-170 Precedence Graph CSE 4701 What is the Resulting Precedence Graph? Is the Schedule Serializable? Why or Why Not? T1 before T4, T2 before T4 T3 before T1, T3 before T2, T3 before T4 T4 before T3, T3 before T1 T1 T2 A:RW A:RW A:WR B:WW A:WW B:WW T4 T3 B:RW Chaps19&20-171 A Read-Only/Write-Only Lock Model CSE 4701 Revision of the Read/Write Model for Algorithm 2 Refining Our Assumptions Assume that a Wlock on an Item Does not Mean that the Transaction First Reads the Item Contrary to First Two Models Example: Read A; Read B; C=A+B; A=A-1; Write A; Write C Reads A, B and Writes A,C (No Read on C) Reformulate Notion of Equivalent Schedules Chaps19&20-172 How Does This Model Differ from Alg. 2? CSE 4701 Consider the Schedule Segment: T1 : Wlock A T1 : Ulock A T2 : Wlock A T2 : Ulock A In Algorithm 2 - T2 : Wlock A Assumes that T2 Reads the Value Written by T1 However, This Need Not be True in the New Model If Between T1 and T2, No Transaction Rlocks A, then Value Written by is T1 Lost, T1 Does not Have to Precede T2 in a Schedule w.r.t. 
A Chaps19&20-173 Motivating Algorithm 3 CSE 4701 Rlock (Shared): Multiple Reads Allowed Wlock (Exclusive): Write Does Not Mean Read, Sole Write Successive Writes without intervening Read Means the Effects of Earlier Writes Disappear For a Clean Start All Items Written Prior before 1st Step of Sched For a Clean Finish All Items are Read After last Step of Sched Identify All Dependencies Among Transactions that Write (Ti) and Read (Tj) Same Item (T0 through Tf ) Add Arc from Ti to Tj (Ti is BEFORE Tj ) For Next “Reads” after “Write” Can’t be Intervening Writes Chaps19&20-174 Intuitive View of Algorithm 3 CSE 4701 If Tj Reads Value of “A” Written by Ti , then Tj Must Precede in any Serial Schedule For WR Combo - Draw an Arc from Ti to Tj Now Consider a T that also Writes “A” T Must be either Before Ti or After Tj Add in a Pair of Arcs T to Ti and Tj to T of Which one Must be Chosen in the Final Precedence Graph Serializability Occurs if After Choices Made for each “T” Pair, the Resulting Graph is Acyclic G is Referred to as a “Polygraph” with Nodes, Arcs, and Alternate Arcs Chaps19&20-175 Redefine Serializability CSE 4701 Conditions on Serializability Must be Redefined in Support of the Write-Does-Not-Assume Read Model If in Schedule S, Tj Reads “A” Written by Ti, then Ti Before Tj in any Serial Schedule Equivalent to S Further, if there is a T that Writes “A”, then in any Serial Schedule Equivalent to S, T is Before Ti or After Tj, but may not be Between Ti and Tj Graphically, we have: T A:WR A:WR Ti A:WR Ti Tj Tj A:RW A:RW T T A:WR Chaps19&20-176 Algorithm 3 Example Schedule T1 CSE 4701 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23) (24) T2 T3 T4 Rlock A Rlock A Wlock C Unlock C Rlock C Wlock B Unlock B Rlock B Unlock A Unlock A Wlock A Rlock C Wlock D Unlock B Unlock C Rlock B Unlock A Wlock A Unlock B Wlock B Unlock B Unlock D Unlock C Unlock A Chaps19&20-177 Augmentation of Precedence Graph CSE 4701 In 
Augmentation of Precedence Graph
In support of the write-does-not-imply-read model, we must augment the precedence graph: add an initial transaction T0 that writes every item, and a final transaction Tf that reads every item
When a transaction T's output is invisible in Tf (i.e., the value is lost), T is referred to as a useless transaction
A useless transaction has no path from itself to Tf
Note: maintain the same set of locks (Rlock, Wlock, Unlock), with a different interpretation of Wlock

Algorithm 3 – Augmented Graph
T0 writes A, B, C, D prior to step (1)
Steps (1) to (24) are the example schedule, unchanged
Tf reads A, B, C, D after step (24)

Algorithm 3 – Steps 1 to 4
Input: schedule S for transactions T1, T2, …, Tk
Output: is S serializable? If so, an equivalent serial schedule
Method: create a directed polygraph P:
1. Augment S with a dummy T0 (writes every item) and a dummy Tf (reads every item)
2. Create the initial polygraph P by adding nodes for T0, Tf, and each transaction Ti in S
3. Place an arc from Ti to Tj whenever Tj reads A in the augmented S (with the dummy transactions) and that value was last written by Ti (write to read, for each item). Repeat this step for every item, and don't forget to consider the dummy transactions!
4. Discover useless transactions: T is useless if there is no path from T to Tf
This is the "initialization" phase of Algorithm 3

Resulting Polygraph – Steps 1 to 2
Create the polygraph by:
1. Adding T0 and Tf to S
2. Adding T0, Tf, T1, T2, T3, T4 to polygraph P
[Figure: nodes T0, T1, T2, T3, T4, Tf, with no arcs yet]
3. Augment the schedule with T0 and Tf: T0 writes A, B, C, D before step (1), and Tf reads A, B, C, D after step (24)

Alg 3 Step 3 – Init = T0 & Fin = Tf
For each write, ask who reads the value next:
Who reads A after T0 writes A? Who reads A after T4 writes A?
Who reads B after T1 writes B? Who reads C after T2 writes C? Who reads D after T2 writes D?
No one reads A after T3 writes A

Step 3 – Write to Reads on A: T0 → T1, T0 → T2, T4 → Tf
Step 3 – Write to Reads on B: T1 → T4, T1 → T2, T4 → Tf
Step 3 – Write to Reads on C: T2 → T3, T2 → T4, T2 → Tf
Step 3 – Write to Reads on D: T2 → Tf

Resulting Polygraph – Steps 1 to 3
1. Add T0 and Tf to S
2. Add T0, Tf, T1, T2, T3, T4 to polygraph P
3. Look for Ti writes X → Tj reads X, for all items X
4. Look for useless transactions: no path from T to Tf
[Figure: polygraph with three A:WR arcs, three B:WR arcs, three C:WR arcs, and one D:WR arc]

Resulting Polygraph – Steps 1-4
Steps 1 to 3 as above
4. T3 is useless (there is no path from T3 to Tf), so remove the arcs into T3 (T2 → T3, C:WR); this completes step 4
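The initialization phase (steps 1 to 4) can be sketched in Python under the same reading of the schedule: an Rlock reads the item, a Wlock writes it without reading, and T0/Tf are added as the dummy writer and reader. The string encoding and the name `initial_polygraph` are hypothetical. Useless transactions are found by searching backwards from Tf, which is equivalent to checking, for each T, whether a path from T to Tf exists.

```python
from collections import defaultdict

# Hypothetical encoding of the 24-step example schedule ("txn op item" per step).
STEPS = """T1 Rlock A; T2 Rlock A; T2 Wlock C; T2 Unlock C; T3 Rlock C; T1 Wlock B;
T1 Unlock B; T4 Rlock B; T1 Unlock A; T2 Unlock A; T3 Wlock A; T4 Rlock C;
T2 Wlock D; T4 Unlock B; T3 Unlock C; T2 Rlock B; T3 Unlock A; T4 Wlock A;
T2 Unlock B; T4 Wlock B; T4 Unlock B; T2 Unlock D; T4 Unlock C; T4 Unlock A"""
SCHEDULE = [tuple(s.split()) for s in STEPS.replace("\n", " ").split(";")]


def initial_polygraph(schedule):
    """Steps 1-4 of Algorithm 3: returns (nodes, write-to-read arcs, useless txns)."""
    items = sorted({item for _, _, item in schedule})
    txns = sorted({txn for txn, _, _ in schedule})
    nodes = ["T0", *txns, "Tf"]                     # steps 1 and 2
    last_writer = {x: "T0" for x in items}          # T0 writes every item first
    arcs = set()
    for txn, op, item in schedule:                  # step 3
        if op == "Rlock" and last_writer[item] != txn:
            arcs.add((last_writer[item], txn, item))
        elif op == "Wlock":
            last_writer[item] = txn
    arcs |= {(last_writer[x], "Tf", x) for x in items}  # Tf reads every item last
    # Step 4: walk backwards from Tf; any transaction not reached is useless.
    preds = defaultdict(set)
    for a, b, _ in arcs:
        preds[b].add(a)
    reached, stack = {"Tf"}, ["Tf"]
    while stack:
        for p in preds[stack.pop()]:
            if p not in reached:
                reached.add(p)
                stack.append(p)
    useless = {t for t in txns if t not in reached}
    # As on the slides, drop the arcs that enter a useless transaction.
    arcs = {(a, b, x) for a, b, x in arcs if b not in useless}
    return nodes, arcs, useless


nodes, arcs, useless = initial_polygraph(SCHEDULE)
print(sorted(useless))  # ['T3']
print(len(arcs))        # 9
```

The nine remaining arcs are the ones shown on the "Steps 1-4" slide: the ten write-to-read arcs minus T2 → T3 on C.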
Algorithm 3 – Steps 5 to 7
Method: reassess the initial polygraph P:
5. For each remaining arc from Ti (write) to Tj (read), meaning that Tj reads item A written by Ti, consider every T ≠ T0, T ≠ Tf (other than Ti and Tj) that also writes A:
I. If Ti = T0 and Tj = Tf, add no arcs
II. If Ti = T0 and Tj ≠ Tf, add an arc from Tj to T
III. If Ti ≠ T0 and Tj = Tf, add an arc from T to Ti
IV. If Ti ≠ T0 and Tj ≠ Tf, add the arc pair T → Ti and Tj → T
6. Determine whether P is acyclic by choosing one arc from each pair; make the choices carefully
7. If acyclic, S is serializable: perform a topological sort, without T0 and Tf, to obtain an equivalent serial schedule; otherwise, S is not serializable

What Are the Four Cases of Step 5 Conceptually?
General case: Ti → Tj, labeled X:WR
Case I: T0 → Tf (X:WR); add no new arc
Case II: T0 → Tj (X:WR); add an arc from Tj to T (X:RW), since T is after Tj
Case III: Ti → Tf (X:WR); add an arc from T to Ti (X:RW), since T is before Ti
Case IV: Ti → Tj (X:WR) with Ti ≠ T0 and Tj ≠ Tf; add two arcs, T → Ti and Tj → T (X:RW), since T is either after Tj or before Ti

Step 5 – Go Thru Each Write/Read Arrow
For the T0 → T1 and T0 → T2 arcs: who else writes A?
For the T4 → Tf arc: who else writes A?

Resulting Polygraph – Step 5 – A:WR
Check item A (see the new arcs and labels, cases II and III); T3 and T4 both write A:
For T0 → T1 (case II): add T1 → T3 and T1 → T4 (A:RW)
For T0 → T2 (case II): add T2 → T3 and T2 → T4 (A:RW)
For T4 → Tf (case III): T3 also writes A, so add T3 → T4 (A:RW)

Alg 3 Ex – Step 5 – Who Else Writes C/D?
For the T2 arcs labeled C:WR and D:WR: does anyone else write C? D?
No other transaction writes C or D, so no new arcs

Resulting Polygraph – Step 5 – C:WR & D:WR
Do any other transactions write C or D, for the arrows labeled C:WR and D:WR respectively? No, so step 5 adds nothing for C or D

Alg 3 Ex – Step 5 – Who Else Writes B?
For the T1 → T4 arc: who else writes B? Only T4, which is Tj itself, so no arc
For the T4 → Tf arc (case III): T1 also writes B, which requires the arc T1 → T4; that arc is already there
For the T1 → T2 arc (case IV): T4 also writes B, so add two arcs: T4 after T2, or T4 before T1

Two Added Arcs for Case IV and B: T4 Follows T2 or T4 Before T1
[Figure: polygraph with the alternate pair T2 → T4 and T4 → T1, both labeled IV B:RW]
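Steps 5 to 7 can be sketched in Python starting from the nine arcs left after step 4 and the writers of each item, both hard-coded here from the example (the names `ARCS`, `WRITERS`, `step5`, `topo_sort`, and `steps_5_to_7` are hypothetical). As in the worked example for the T1 → T4 arc on B, the sketch assumes T ranges over writers other than Ti and Tj. Step 6 is done by brute force over every way of choosing one arc per pair, which is exponential in the number of pairs but fine for a slide-sized example.

```python
from itertools import product

# Arcs left after step 4 in the slides' example: (writer, reader, item).
ARCS = {("T0", "T1", "A"), ("T0", "T2", "A"), ("T4", "Tf", "A"),
        ("T1", "T4", "B"), ("T1", "T2", "B"), ("T4", "Tf", "B"),
        ("T2", "T4", "C"), ("T2", "Tf", "C"), ("T2", "Tf", "D")}
# Transactions (other than T0 and Tf) that write each item in the schedule.
WRITERS = {"A": {"T3", "T4"}, "B": {"T1", "T4"}, "C": {"T2"}, "D": {"T2"}}
TXNS = ["T1", "T2", "T3", "T4"]


def step5(arcs, writers):
    """Cases I-IV: return (fixed arcs, list of alternative arc pairs)."""
    fixed = {(ti, tj) for ti, tj, _ in arcs}
    pairs = []
    for ti, tj, x in sorted(arcs):
        for t in sorted(writers.get(x, set()) - {ti, tj}):
            if ti == "T0" and tj == "Tf":   # case I: no arcs
                continue
            if ti == "T0":                  # case II: T after Tj
                fixed.add((tj, t))
            elif tj == "Tf":                # case III: T before Ti
                fixed.add((t, ti))
            else:                           # case IV: T before Ti or after Tj
                pairs.append(((t, ti), (tj, t)))
    return fixed, pairs


def topo_sort(nodes, edges):
    """Kahn's algorithm; returns None if the graph has a cycle."""
    indeg = {n: 0 for n in nodes}
    for _, b in edges:
        indeg[b] += 1
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for a, b in sorted(edges):
            if a == n:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
        ready.sort()
    return order if len(order) == len(nodes) else None


def steps_5_to_7(arcs, writers, txns):
    fixed, pairs = step5(arcs, writers)
    nodes = ["T0", *txns, "Tf"]
    # Step 6: try each way of picking one arc per pair until P is acyclic.
    for choice in product(*pairs):
        order = topo_sort(nodes, fixed | set(choice))
        if order is not None:
            # Step 7: the serial schedule omits the dummy transactions.
            return [t for t in order if t not in ("T0", "Tf")]
    return None  # every choice leaves a cycle: not serializable


print(step5(ARCS, WRITERS)[1])            # [(('T4', 'T1'), ('T2', 'T4'))]
print(steps_5_to_7(ARCS, WRITERS, TXNS))  # ['T1', 'T2', 'T3', 'T4']
```

The only pair is the case IV pair on B; picking T4 → T1 closes a cycle with T1 → T4, so the search settles on T2 → T4, matching the slides' final order.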
Resulting Polygraph – Step 5 and 6
Item B (see the new arcs, including the dashed alternates):
For T1 → T2, T4 writes B, so add T2 → T4 and T4 → T1 (case IV): either T4 is after T2 or it is before T1
There are no new arcs for the other WR arcs

Resulting Polygraph – Step 5 and 6
6. Which option of the pair of arcs should be chosen? Why?
T2 → T4 must be chosen: T4 → T1 would close a cycle with the existing arc T1 → T4 (B:WR)

Final Polygraph – Step 7
Final graph with the unchosen arc removed; then delete the dummy transactions T0 and Tf
Topological sort yields the order: T1, T2, T3, T4

Why Optimistic Concurrency Control?
Motivated by the disadvantages of locking techniques:
Lock maintenance
Deadlock-free locking protocols limit concurrency
Secondary memory access causes locks to be held for a long duration
Locks are typically held until the transaction completes, which reduces concurrency
Locking is often needed only in the "worst" case
Overhead: locking plus deadlock detection
Key concept: for "many" applications, write collisions in large databases are rare
OCC: a "Don't Worry, Be Happy" approach

Basic Ideas of OCC
Interference between transactions is rare, and locking incurs too much overhead
Instead, allow each transaction to execute freely, and check serializability at the end of the transaction
Win (allow the commit) if no interference has occurred, i.e., there have been no conflicts
Pessimistic execution: Validate → Read → Write (and Compute)
Optimistic execution: Read → Validate → Write (and Compute)

How Does OCC Work?
Execute transactions ad hoc: let them go uncontrolled
Maintain information on "relevant" actions against the DB (often in conjunction with the recovery journal)
When transactions finish, check whether everything proceeded satisfactorily
This assumes that the probability of transaction interference is quite small
Two questions about OCC: How do we know everything went OK? How do we recover if it didn't?

What is a Timestamp?
A timestamp is:
A system-generated clock "tick" that records an event; two events cannot occur at the same "tick"
A monotonically increasing variable (integer) indicating the age of an operation or a transaction; a larger timestamp value indicates a more recent event or operation
A timestamp-based algorithm uses timestamps to serialize the execution of concurrent transactions
For DB transactions, a timestamp could be:
The time that the transaction is initiated
The time of the transaction's first read/write
It remains unchanged throughout all transaction steps

How are Timestamps Utilized?
Each transaction has a unique timestamp (TS) when it starts
Each item in the DB has a read time and a write time (when stored)
Let t1 be the TS of a transaction and B an item with timestamp t2
Avoid "impossible" situations:
A transaction CANNOT read the value of an item that was not written until after the transaction executed: a transaction with TS t1 can't read item B with write TS t2 if t2 > t1
A transaction CANNOT write an item whose old value was read at a later time: a transaction with TS t1 can't write item B with read TS t2 if t2 > t1
If either happens, the transaction with TS t1 must abort

OCC Utilizes Timestamps
Timestamps are clock ticks used to record the major milestones in the execution of a transaction, for example:
Start time of the transaction
Read/write times of DB items
Finish time of the transaction
Commit time of the transaction
Two important definitions:
Read time of an item: the highest timestamp possessed by any transaction that read the item
Write time of an item: the highest timestamp possessed by any transaction that wrote the item
A transaction's timestamp is fixed when it starts and stays constant throughout its execution

How are Timestamps Used?
Focus on "when" reads and writes occur
A transaction cannot read an item whose value was written by a later transaction (one with a larger timestamp)
Transaction T with timestamp t1 cannot read an item with a write time of t2 if t2 > t1
If this is the case, T must abort and be restarted: it can't read an item that, from its point of view, hasn't been written yet
A transaction cannot write an item whose old value was read at a later time
Transaction T with timestamp t1 cannot write an item with a read time of t2 if t2 > t1
If this is the case, T must abort and be restarted: it can't write an item that is being read at a later time

Algorithm 4: Optimistic CC
Let T be a transaction with timestamp t attempting to perform operation X on a data item I with read time tR and write time tW
If (X = Read and t ≥ tW): perform the operation; if t > tR, set tR = t for I (read after write)
If (X = Write and t ≥ tR and t ≥ tW): perform the operation; set tW = t for I (write after read)
If (X = Write and tR ≤ t < tW): do nothing, since the later write cancels out the write of T
If (X = Read and t < tW) or (X = Write and t < tR): abort the operation
First abort case: T is trying to read an item before (in timestamp order) it was written
Second abort case: T is trying to write an item that a later transaction has already read

Example of OCC
Timestamps: T1 = 200, T2 = 150, T3 = 175
(1) T1: Read B
(2) T2: Read A
(3) T3: Read C
(4) T1: Write B
(5) T1: Write A
Read time (RT) and write time (WT) of A, B, C after each step:
Start: A RT=0 WT=0; B RT=0 WT=0; C RT=0 WT=0
(1): A RT=0 WT=0; B RT=200 WT=0; C RT=0 WT=0
(2): A RT=150 WT=0; B RT=200 WT=0; C RT=0 WT=0
(3): A RT=150 WT=0; B RT=200 WT=0; C RT=175 WT=0
(4): A RT=150 WT=0; B RT=200 WT=200; C RT=175 WT=0
(5): A RT=150 WT=200; B RT=200 WT=200; C RT=175 WT=0
What happens at each step w.r.t. RT/WT?
(1) T1 TS 200 ≥ B.WT = 0: set B.RT = 200
(2) T2 TS 150 ≥ A.WT = 0: set A.RT = 150
(3) T3 TS 175 ≥ C.WT = 0: set C.RT = 175
(4) T1 TS 200 ≥ B.RT = 200: set B.WT = 200
(5) T1 TS 200 ≥ A.RT = 150: set A.WT = 200

A second example, with timestamps T1 = 175, T2 = 150, T3 = 200, T4 = 225 (steps in the order listed):
T2 TS 150 ≥ A.WT = 0: set A.RT = 150
T1 TS 175 ≥ A.WT = 0: set A.RT = 175
T1 TS 175 ≥ C.RT = 0: set C.WT = 175
T3 TS 200 ≥ C.WT = 175: set C.RT = 200
T1 TS 175 ≥ B.RT = 0: set B.WT = 175
T4 TS 225 ≥ B.WT = 175: set B.RT = 225
T3 TS 200 ≥ A.RT = 175: set A.WT = 200
T4 TS 225 ≥ C.RT = 200: set C.WT = 225
T2 TS 150 ≥ D.RT = 0: set D.WT = 150
T2 TS 150 is NOT ≥ B's timestamps (RT = 225, WT = 175): ABORT T2

Example of OCC (continued)
(6) T2: Write C
State before step (6): A RT=150 WT=200; B RT=200 WT=200; C RT=175 WT=0
What happens at step 6? T2's timestamp 150 < C.RT = 175: T2 is trying to write C after a later transaction (T3) has read it. Consequence: abort T2; the RT/WT values are unchanged

Example of OCC (continued)
(7) T3: Write A
T3's timestamp 175 ≥ A.RT = 150 but 175 < A.WT = 200, so a later write (T1's) has already superseded it: T3 can finish, but its write of A has no effect
State after step (7): A RT=150 WT=200; B RT=200 WT=200; C RT=175 WT=0

Summary of Example
T1 completes successfully; T2 aborts; T3 completes but doesn't write A
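Algorithm 4's four rules fit in one function, which can then be replayed on the three-transaction example (T1 = 200, T2 = 150, T3 = 175). This is a minimal sketch: the names `Item`, `access`, and `replay` are hypothetical. An aborted transaction's later steps are simply skipped, and the sketch does not undo writes an aborted transaction made earlier (T2 made none in this example).

```python
from dataclasses import dataclass


@dataclass
class Item:
    rt: int = 0  # read time: highest timestamp of any transaction that read it
    wt: int = 0  # write time: highest timestamp of any transaction that wrote it


def access(item, t, op):
    """Algorithm 4 for one operation by a transaction with timestamp t.

    Returns "done", "ignored" (a later write already superseded it) or "abort".
    """
    if op == "read":
        if t < item.wt:
            return "abort"          # value was written by a later transaction
        item.rt = max(item.rt, t)
        return "done"
    if t < item.rt:
        return "abort"              # a later transaction already read the old value
    if t < item.wt:
        return "ignored"            # tR <= t < tW: the later write cancels this one
    item.wt = t
    return "done"


def replay(steps, ts):
    items = {name: Item() for _, _, name in steps}
    aborted, results = set(), []
    for txn, op, name in steps:
        if txn in aborted:
            results.append("skipped")  # an aborted transaction runs no further steps
            continue
        outcome = access(items[name], ts[txn], op)
        if outcome == "abort":
            aborted.add(txn)
        results.append(outcome)
    return items, results


TS = {"T1": 200, "T2": 150, "T3": 175}
STEPS = [("T1", "read", "B"), ("T2", "read", "A"), ("T3", "read", "C"),
         ("T1", "write", "B"), ("T1", "write", "A"), ("T2", "write", "C"),
         ("T3", "write", "A")]
items, results = replay(STEPS, TS)
print(results)  # ['done', 'done', 'done', 'done', 'done', 'abort', 'ignored']
print(items)    # {'B': Item(rt=200, wt=200), 'A': Item(rt=150, wt=200), 'C': Item(rt=175, wt=0)}
```

The read rule keeps the maximum reader timestamp, matching the definition of an item's read time as the highest timestamp of any transaction that read it.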
Viewing OCC vs. Phases of Execution
Read phase: database information is read from secondary storage into primary memory; all writes go to a local workspace
Validate phase: check that the integrity of the data has not been violated
Write phase: update the DB (secondary storage) from the local copies
Optimistic execution: Read → Validate → Write (and Compute)

Contrasting PCC and OCC
Transaction control
PCC: control by having transactions wait
OCC: control by having transactions backed up
Serializability
PCC: ordering of data items
OCC: ordering of transactions
Biggest potential problem
PCC: deadlock, or rather preventing it
OCC: starvation
Different applications are suited to different approaches
Some DBMSs support both; the DBA can configure the choice on an application-by-application basis
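The read, validate, and write phases can be sketched as a toy single-threaded store. The slides do not fix a particular validation test, so this sketch assumes one common choice, backward validation: a committing transaction is rejected if any transaction that committed after it started wrote an item it read. The class `OCCStore` and its methods are hypothetical; a real system would also have to make validation plus the write phase atomic.

```python
class OCCStore:
    """Toy single-threaded store run in OCC's read, validate and write phases."""

    def __init__(self, data):
        self.data = dict(data)  # the "database" (secondary storage)
        self.commits = 0        # number of transactions committed so far
        self.history = []       # (commit number, set of keys written)

    def begin(self):
        # Each transaction records how many commits had happened when it started.
        return {"start": self.commits, "reads": set(), "writes": {}}

    def read(self, txn, key):
        # Read phase: prefer the transaction's own local copy.
        txn["reads"].add(key)
        return txn["writes"].get(key, self.data[key])

    def write(self, txn, key, value):
        # Read phase: writes go only to the local workspace.
        txn["writes"][key] = value

    def commit(self, txn):
        # Validate phase (backward validation): reject txn if a transaction
        # that committed after txn started wrote an item txn has read.
        for number, written in self.history:
            if number > txn["start"] and written & txn["reads"]:
                return False    # caller backs txn up and restarts it
        # Write phase: copy the local workspace into the database.
        self.data.update(txn["writes"])
        self.commits += 1
        self.history.append((self.commits, set(txn["writes"])))
        return True


store = OCCStore({"x": 1, "y": 2})
t1, t2 = store.begin(), store.begin()
store.write(t1, "x", store.read(t1, "x") + 1)
store.write(t2, "y", store.read(t2, "x") * 10)
print(store.commit(t1))  # True
print(store.commit(t2))  # False: t1 committed a new x after t2 read x
t3 = store.begin()       # t2's work, restarted
store.write(t3, "y", store.read(t3, "x") * 10)
print(store.commit(t3), store.data)  # True {'x': 2, 'y': 20}
```

Backing t2 up and rerunning it as t3 is the "control by having transactions backed up" behavior from the contrast slide; under an adversarial workload the same transaction could keep failing validation, which is the starvation risk listed for OCC.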