Quick Review of Apr 24 material • Sorting (Sections 13.4) • Sort-merge Algorithm for external sorting • Join Operation implementations (sect. 13.5) – – – – – – Size estimation Nested-loop method (and cost) Sort-Merge Join (and cost) Hash Join (and cost) Indexed Join (and cost) 3-way Join Remaining Material • Read over sections the remainder of chapter 13 on your own (you are responsible for the material) • We’re going to look at section 14.3 today (following slides) • We’ve covered a significant part of the material in chapter 14 already, mixed in with the chapter 13 material in the class notes. Read through chapter 14 yourself; you are responsible for the material. • We have five classes remaining (counting today) – – – – – the bulk of it will be spent on chapter 15 (15.1-5, 15.9) additional material will be sections 16.1 (lock-based protocols) and section 17.4 (log-based recovery) if possible, we will attempt to finish up in time to use May 13 as a review May 15? Optional Study Class? Transformation of Relational Expressions • (Section 14.3) • Two relational algebra expressions are equivalent if they generate the same set of tuples on every legal database instance. – Important for optimization – Allows one relational expression to be replaced by another without affecting the results – We can choose among equivalent expressions according to which are lower cost (size or speed depending upon our needs) – An equivalence rule says that expression A is equivalent to B; we may replace A with B and vice versa in the optimizer. Equivalence Rules (1) In the discussion that follows: Ex represents a relational algebra expression y represents a predicate Lz represents a particular list of attributes and are selection and projection as usual • cascade of selections: a conjunction of selections can be deconstructed into a series of them 12(E) = 1(2(E)) • selections are commutative 1(2(E)) = 2( 1(E)) Equivalence Rules (2) • cascade of projections: only the final ops in a series of projections is necessary L1(L2(…(L3(E))…)) = L1(E) • selections can be combined with Cartesian products and theta joins (E1 X E2) = E1 |X| E2 1(E1 | X|2 E2) = E1 |X|12 E2 • joins and theta-joins are commutative (if you ignore the order of attributes, which can be corrected by an appropriate projection) E1 |X| E2 = E2 |X| E1 Equivalence Rules (3) • joins and cross product are associative (E1 |X|1 3 E2 ) |X|2 E3 = E1 |X|1 (E2 |X|23 E3) (where 2 involves attributes from only E2 and E3.) (E1 |X| E2 ) |X| E3 = E1 |X| (E2 |X| E3) (E1 X E2 ) X E3 = E1 X (E2 X E3) (the above two equivalences are useful special cases of the first one) Equivalence Rules (4) • selection distributes over join in two cases: – if all the attributes in 1 appear only in one expression (say E1) 1 (E1 |X| E2) = (1 (E1 )) |X| E2 – if all the attributes in 1 appear only in E1 and all the attributes in 2 appear only in E2 1 2 (E1 |X| E2) = (1 (E1 )) |X| (2 ( E2)) (note that the first case is a special case of the second) Equivalence Rules (5) • projection distributes over join. Given L1 and L2 being attributes of E1 and E2 respectively: – if involves attributes entirely from L1 and L2 then L1L2 (E1 |X| E2) = (L1 (E1)) |X| (L2 ( E2)) – if involves attribute lists L3 and L4 (both not in L1 L2) from E1 and E2 respectively L1L2 (E1 |X| E2) = L1L2 (L1L3 (E1)) |X| (L2L4 ( E2)) Equivalence Rules (6) • union and intersection are commutative (difference is not) E1 E2 = E2 E1 E1 E2 = E2 E1 • union and intersection are associative (difference is not) (E1 E2) E3 = E1 (E2 E3) (E1 E2) E3 = E1 (E2 E3) • selection distributes over union, intersection, set difference (E1 E2) = (E1) (E2) (E1 E2) = (E1) (E2) (E1 E2) = (E1) E2 (E1 - E2) = (E1) - (E2) (E1 - E2) = (E1) - E2 Chapter 15: Transactions • A transaction is a single logical unit of database work -for example, a transfer of funds from checking to savings account. It is a set of operations delimited by statements of the form begin transaction and end transaction • To ensure database integrity, we require that transactions have the ACID properties – Atomic: all or nothing gets done. – Consistent: preserves the consistency of the database – Isolated: unaware of any other concurrent transactions (as if there were none) – Durable: after completion the results are permanent even through system crashes ACID Transactions • Transactions access data using two operations: – read (X) – write (X) • ACID property violations: – Consistency is the responsibility of the programmer – System crash half-way through: atomicity issues – Another transaction modifies the data half-way through: isolation violation – example: transfer $50 from account A to account B T: read(A); A:=A-50; write (A); read (B); B:=B+50; write (B) Transaction States • Five possible states for a transaction: – active (executing) – partially committed (after last statement’s execution) – failed (can no longer proceed) – committed (successful completion) – aborted (after transaction has been rolled back Transaction States • Committed or Aborted transactions are called terminated • Aborted transactions may be – restarted as a new transaction – killed if it is clear that it will fail again • Rollbacks – can be requested by the transaction itself (go back to a given execution state) – some actions can’t be rolled back (e.g., a printed message, or an ATM cash withdrawal) Concurrent Execution • Transaction processing systems allow multiple transactions to run concurrently – improves throughput and resource utilization (I/O on transaction A can be done in parallel with CPU processing on transaction B) – reduced waiting time (short transactions need not wait for long ones operating on other parts of the database) – however, this can cause problems with the “I” (Isolation) property of ACID Serializability • Scheduling transactions so that they are serializable (equivalent to some serial schedule of the same transctions) ensures database consistency (and the “I” property of ACID) • serial schedule is equivalent to having the transactions execute one at a time • non-serial or interleaved schedule permits concurrent execution Serial Schedule • To the right T1 and T2 are on a serial schedule: T2 begins after T1 finishes. No problems. Interleaved Schedule • This is an example of an interleaved concurrent schedule that raises a number of database consistency concerns.