CS 405G: Introduction to Database Systems
Lecture 10: Normalization and Transactions
Instructor: Chen Qian
3/28 Quiz 4
7/1/2016 Chen Qian @ University of Kentucky

Normalization
• Normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.
• A normal form is a certification that tells whether a relation schema is in a particular state.

First Normal Form (1NF)
• A normal form characterizes a relation (not an attribute, a key, etc.).
• We can only say “this relation or table is in 1NF”.
• A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.

2nd Normal Form
• An attribute A of a relation R is a nonprimary attribute if it is not part of any key in R; otherwise, A is a primary attribute.
• R is in (general) 2nd normal form if no nonprimary attribute A in R is partially functionally dependent on any key of R.

Decomposition
Original relation:

    EID   PID  Ename         email           Pname         Hours
    1234  10   John Smith    jsmith@ac.com   B2B platform  10
    1123  9    Ben Liu       bliu@ac.com     CRM           40
    1234  9    John Smith    jsmith@ac.com   CRM           30
    1023  10   Susan Sidhuk  ssidhuk@ac.com  B2B platform  40

After decomposition (EID in the second relation is a foreign key):

    EID   Ename         email
    1234  John Smith    jsmith@ac.com
    1123  Ben Liu       bliu@ac.com
    1023  Susan Sidhuk  ssidhuk@ac.com

    EID   PID  Pname         Hours
    1234  10   B2B platform  10
    1123  9    CRM           40
    1234  9    CRM           30
    1023  10   B2B platform  40

• Decomposition eliminates redundancy.
• To get back to the original relation, use natural join.
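As a small illustration of the last point, the sketch below joins the two decomposed relations back together on their shared attribute. The list-of-dicts representation and the names `employee`, `works_on`, and `natural_join` are assumptions made for this example, not part of the lecture.

```python
# Decomposed relations from the slide, one dict per tuple.
# (The variable names are hypothetical; the slide does not name these tables.)
employee = [
    {"EID": 1234, "Ename": "John Smith", "email": "jsmith@ac.com"},
    {"EID": 1123, "Ename": "Ben Liu", "email": "bliu@ac.com"},
    {"EID": 1023, "Ename": "Susan Sidhuk", "email": "ssidhuk@ac.com"},
]
works_on = [
    {"EID": 1234, "PID": 10, "Pname": "B2B platform", "Hours": 10},
    {"EID": 1123, "PID": 9, "Pname": "CRM", "Hours": 40},
    {"EID": 1234, "PID": 9, "Pname": "CRM", "Hours": 30},
    {"EID": 1023, "PID": 10, "Pname": "B2B platform", "Hours": 40},
]


def natural_join(r, s):
    """Join two relations on every attribute name they have in common."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    joined_rows = []
    for t in r:
        for u in s:
            # Keep the pair only if it agrees on all shared attributes.
            if all(t[a] == u[a] for a in common):
                joined_rows.append({**t, **u})
    return joined_rows


joined = natural_join(employee, works_on)
for row in joined:
    print(row)
```

The nested loop is the simplest possible join strategy; a real DBMS would normally use an index or a hash join instead.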
Decomposition
• Decomposition may be applied recursively.

Before:

    EID   PID  Pname         Hours
    1234  10   B2B platform  10
    1123  9    CRM           40
    1234  9    CRM           30
    1023  10   B2B platform  40

After:

    PID  Pname
    10   B2B platform
    9    CRM

    EID   PID  Hours
    1234  10   10
    1123  9    40
    1234  9    30
    1023  10   40

Third normal form
• 3NF requires that there are no non-trivial functional dependencies of non-key attributes on something other than a superset of a candidate key.
• Recall: an FD X -> Y is non-trivial if Y is not a subset of X (it is completely non-trivial if X and Y have no attributes in common).
• In summary, all non-key attributes are mutually independent.

Boyce-Codd normal form (BCNF)
• BCNF requires that there are no non-trivial functional dependencies of attributes on something other than a superset of a candidate key (called a superkey).
• All attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A -> A).
• A table is said to be in BCNF if and only if it is in 3NF and every non-trivial, left-irreducible functional dependency has a candidate key as its determinant.
• In more informal terms, a table is in BCNF if it is in 3NF and the only determinants are the candidate keys.
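One way to make the BCNF condition concrete is to compute attribute closures and flag every listed FD whose left-hand side is not a superkey. The sketch below does this for the lecture's WorkOn (EID, Ename, email, PID, hours) relation. The FD {EID, PID} -> hours is an assumption added so the relation has a key; the function names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the listed FDs that are non-trivial and whose LHS is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        is_superkey = closure(lhs, fds) >= schema
        if nontrivial and not is_superkey:
            bad.append((lhs, rhs))
    return bad


work_on = frozenset({"EID", "Ename", "email", "PID", "hours"})
fds = [
    (frozenset({"EID"}), frozenset({"Ename", "email"})),  # from the lecture
    (frozenset({"EID", "PID"}), frozenset({"hours"})),    # assumed for this sketch
]

violations = bcnf_violations(work_on, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Under these assumptions the only violation reported is EID -> Ename, email, which is the dependency the lecture uses to split WorkOn.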
BCNF decomposition example
WorkOn (EID, Ename, email, PID, hours)
    BCNF violation: EID -> Ename, email
    Decompose into:
        Student (EID, Ename, email)    BCNF
        Grade (EID, PID, hours)        BCNF

Another example
WorkOn (EID, Ename, email, PID, hours)
    BCNF violation: email -> EID
    Decompose into:
        StudentID (email, EID)                      BCNF
        StudentGrade’ (email, Ename, PID, hours)
            BCNF violation: email -> Ename
            Decompose into:
                StudentName (email, Ename)    BCNF
                Grade (email, PID, hours)     BCNF

Normalization
• There is a sequence to normal forms: 1NF is considered the weakest, 2NF is stronger than 1NF, 3NF is stronger than 2NF, and BCNF is considered the strongest.
• Also, any relation that is in BCNF is in 3NF; any relation in 3NF is in 2NF; and any relation in 2NF is in 1NF.

In 3NF, but not in BCNF:
    Relation: (student_no, course_no, instr_no)
• Instructor teaches one course only.
• Student takes a course and has one instructor.
    {student_no, course_no} -> instr_no
    instr_no -> course_no
• Not in BCNF since we have instr_no -> course_no, but instr_no is not a candidate key.

Decomposing (student_no, course_no, instr_no):
    (student_no, instr_no)
        {student_no, instr_no} -> student_no
        {student_no, instr_no} -> instr_no
    (course_no, instr_no)
        instr_no -> course_no

2NF, but not in 3NF, nor in BCNF:
    Relation: (inv_no, line_no, prod_no, prod_desc, qty)
• Not in 3NF since prod_no is not a candidate key and we have prod_no -> prod_desc.

Summary
• Philosophy behind BCNF: Data should depend on the key, the whole key, and nothing but the key!
• Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement!
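The gap between 3NF and BCNF summarized above can be checked mechanically on the lecture's two small examples. The following is a minimal sketch: candidate keys are found by brute force, which is only reasonable for tiny schemas. For the invoice relation the FD {inv_no, line_no} -> prod_no, qty is an assumption (the slide states only prod_no -> prod_desc), and all function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure covers the schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            # Smaller keys are found first, so any superset of one is not minimal.
            if closure(attrs, fds) >= schema and not any(k <= attrs for k in keys):
                keys.append(frozenset(attrs))
    return keys


def in_bcnf(schema, fds):
    """Every listed FD is trivial or has a superkey on the left."""
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)


def in_3nf(schema, fds):
    """As BCNF, but an FD is also allowed if it only determines prime attributes."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(
        rhs <= lhs or closure(lhs, fds) >= schema or (rhs - lhs) <= prime
        for lhs, rhs in fds
    )


teaching = frozenset({"student_no", "course_no", "instr_no"})
teaching_fds = [
    (frozenset({"student_no", "course_no"}), frozenset({"instr_no"})),
    (frozenset({"instr_no"}), frozenset({"course_no"})),
]

invoice = frozenset({"inv_no", "line_no", "prod_no", "prod_desc", "qty"})
invoice_fds = [
    (frozenset({"inv_no", "line_no"}), frozenset({"prod_no", "qty"})),  # assumed
    (frozenset({"prod_no"}), frozenset({"prod_desc"})),                 # from the slide
]

print("teaching:", in_3nf(teaching, teaching_fds), in_bcnf(teaching, teaching_fds))
print("invoice: ", in_3nf(invoice, invoice_fds), in_bcnf(invoice, invoice_fds))
```

In the teaching relation every attribute belongs to some candidate key, which is why instr_no -> course_no is tolerated by 3NF while still violating BCNF.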
Basic knowledge
• Transaction view of DBMS
    Read(x)
    Write(x)
• ACID
    Atomicity: TX’s are either completely done or not done at all
    Consistency: TX’s should leave the database in a consistent state
    Isolation: TX’s must behave as if they are executed in isolation
    Durability: Effects of committed TX’s are resilient against failures
• SQL transactions
    -- Begins implicitly
    SELECT …;
    UPDATE …;
    ROLLBACK | COMMIT;

Concurrency control
• Goal: ensure the “I” (isolation) in ACID
    T1: read(A); write(A); read(B); write(B); commit;
    T2: read(A); write(A); read(C); write(C); commit;
• Data items: A, B, C

Good versus bad schedules
• Good!
    T1.r(A), T1.w(A), T1.r(B), T1.w(B), T2.r(A), T2.w(A), T2.r(C), T2.w(C)
• Good! (But why?)
    T1.r(A), T1.w(A), T2.r(A), T2.w(A), T1.r(B), T2.r(C), T1.w(B), T2.w(C)
• Bad!
    T1.r(A) [read 400], T2.r(A) [read 400], T1.w(A) [write 400 - 100], T2.w(A) [write 400 - 50], T1.r(B), T2.r(C), T1.w(B), T2.w(C)

Serial schedule
• Execute transactions in order, with no interleaving of operations
    T1.r(A), T1.w(A), T1.r(B), T1.w(B), T2.r(A), T2.w(A), T2.r(C), T2.w(C)
    T2.r(A), T2.w(A), T2.r(C), T2.w(C), T1.r(A), T1.w(A), T1.r(B), T1.w(B)
• Isolation achieved by definition!
• Problem: no concurrency at all
• Question: how to reorder operations to allow more concurrency?

Conflicting operations
• Two operations on the same data item conflict if at least one of the operations is a write
    r(X) and w(X) conflict
    w(X) and r(X) conflict
    w(X) and w(X) conflict
    r(X) and r(X) do not
    r/w(X) and r/w(Y) do not
• Order of conflicting operations matters
    E.g., if T1.r(A) precedes T2.w(A), then conceptually, T1 should precede T2

Precedence graph
• A node for each transaction
• A directed edge from Ti to Tj if an operation of Ti precedes and conflicts with an operation of Tj in the schedule
• Schedule: T1.r(A), T1.w(A), T2.r(A), T2.w(A), T1.r(B), T2.r(C), T1.w(B), T2.w(C)
    Graph: T1 -> T2
    Good: no cycle
• Schedule: T1.r(A), T2.r(A), T1.w(A), T2.w(A), T1.r(B), T2.r(C), T1.w(B), T2.w(C)
    Graph: T1 -> T2 and T2 -> T1
    Bad: cycle

Conflict-serializable schedule
• A schedule is conflict-serializable iff its precedence graph has no cycles
• A conflict-serializable schedule is equivalent to some serial schedule (and therefore is “good”)
    In that serial schedule, transactions are executed in the topological order of the precedence graph
    You can get to that serial schedule by repeatedly swapping adjacent, non-conflicting operations from different transactions

Remember those from OS class?
• Lock: a high-level concept that describes the state of a data item with respect to read/write operations
    Spinlock
    Semaphore
    Monitor
• Deadlock: A set of processes is deadlocked if each process is waiting for an event that only another process in the set can cause
• Starvation: A program continues to run indefinitely but fails to make any progress

Next
• Guarantee conflict-serializable schedule with 2 phase locking

Locking Rules
• If a transaction wants to read an object, it must first request a shared lock (S mode) on that object
• If a transaction wants to modify an object, it must first request an exclusive lock (X mode) on that object
• Allow one exclusive lock, or multiple shared locks

Compatibility matrix (Grant the lock?)

    Mode of lock(s) currently      Mode of the lock requested
    held by other transactions     S      X
    S                              Y      N
    X                              N      N

Basic locking is not enough
• T1: Add 1 to both A and B (preserves A=B)
• T2: Multiply both A and B by 2 (preserves A=B)
• Possible schedule under locking:
    T1: lock-X(A), r(A) [read 100], w(A) [write 100+1], unlock(A)
    T2: lock-X(A), r(A) [read 101], w(A) [write 101*2], unlock(A)
    T2: lock-X(B), r(B) [read 100], w(B) [write 100*2], unlock(B)
    T1: lock-X(B), r(B) [read 200], w(B) [write 200+1], unlock(B)
• But still not conflict-serializable! The precedence graph has edges T1 -> T2 (on A) and T2 -> T1 (on B), and the result is A ≠ B!

Two-phase locking (2PL)
• All lock requests precede all unlock requests
• Phase 1: obtain locks, phase 2: release locks
    T1: lock-X(A), r(A), w(A), lock-X(B), unlock(A)
    T2: lock-X(A), r(A), w(A), lock-X(B) (cannot obtain the lock on B until T1 unlocks)
    T1: r(B), w(B), unlock(B)
    T2: r(B), w(B)
• Resulting schedule: T1.r(A), T1.w(A), T2.r(A), T2.w(A), T1.r(B), T1.w(B), T2.r(B), T2.w(B)
• 2PL guarantees a conflict-serializable schedule

Problem of 2PL
T1 T2 r(A) w(A) r(A) w(A) r(B) w(B) Abort!
r(B) w(B)
• T2 has read uncommitted data written by T1
• If T1 aborts, then T2 must abort as well
• Cascading aborts possible if other transactions have read data written by T2
• Even worse, what if T2 commits before T1?
    Schedule is not recoverable if the system crashes right after T2 commits

Strict 2PL
• Only release locks at commit/abort time
    A writer will block all other readers until the writer commits or aborts
• Used in most commercial DBMS

Next ...
• A few examples

Non-2PL, A = 1000, B = 2000, Output = ?
    T1: Lock_X(A), Read(A), A := A-50, Write(A)
    T2: Lock_S(A) (requested)
    T1: Unlock(A)
    T2: Read(A), Unlock(A), Lock_S(B)
    T1: Lock_X(B) (requested)
    T2: Read(B), Unlock(B), PRINT(A+B)
    T1: Read(B), B := B+50, Write(B), Unlock(B)

2PL, A = 1000, B = 2000, Output = ?
    T1: Lock_X(A), Read(A), A := A-50, Write(A), Lock_X(B), Unlock(A)
    T2: Lock_S(A), Read(A)
    T1: Read(B), B := B+50, Write(B), Unlock(B)
    T2: Lock_S(B), Unlock(A), Read(B), Unlock(B), PRINT(A+B)

Strict 2PL, A = 1000, B = 2000, Output = ?
    T1: Lock_X(A), Read(A), A := A-50, Write(A), Lock_X(B), Read(B), B := B+50, Write(B), Unlock(A), Unlock(B)
    T2: Lock_S(A), Read(A), Lock_S(B), Read(B), PRINT(A+B), Unlock(A), Unlock(B)

Lock Management
• Lock and unlock requests are handled by the Lock Manager (LM)
• LM keeps an entry for each currently held lock. Entry contains:
    List of transactions (xacts) currently holding the lock
    Type of lock held (shared or exclusive)
    Queue of lock requests

Lock Management, cont.
• When a lock request arrives:
    Does any other xact hold a conflicting lock?
    If no, grant the lock.
    If yes, put the requestor into the wait queue.
• Lock upgrade:
    A transaction holding a shared lock can request to upgrade it to exclusive

Deadlocks
• Deadlock: Cycle of transactions waiting for locks to be released by each other.
• Two ways of dealing with deadlocks:
    prevention
    detection
• Many systems just punt and use timeouts
    What are the dangers with this approach?

Deadlock Detection
• Create and maintain a “waits-for” graph
• Periodically check for cycles in the graph

Deadlock Detection (Continued)
Example (lock requests of each transaction, in order):
    T1: S(A), S(D), S(B)
    T2: X(B), X(C)
    T3: S(D), S(C), X(A)
    T4: X(B)
Deadlock! (waits-for graph over T1, T2, T3, T4)

Deadlock Prevention
• Assign priorities based on timestamps (older transactions, with smaller timestamps, have higher priority).
• Say Ti wants a lock that Tj holds. Two policies are possible:
    Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts
    Wound-Wait: If Ti has higher priority, Tj aborts; otherwise Ti waits
• Why do these schemes guarantee no deadlocks?
• Important detail: If a transaction restarts, make sure it gets its original timestamp. Why?

Summary
• Correctness criterion for isolation is “serializability”.
    In practice, we use “conflict serializability,” which is somewhat more restrictive but easy to enforce.
• Two Phase Locking and Strict 2PL: Locks implement the notions of conflict directly.
    The lock manager keeps track of the locks issued.
    Deadlocks may arise; can either be prevented or detected.
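To tie the summary back to the precedence-graph test, here is a minimal sketch that builds the graph for a schedule and reports whether it is conflict-serializable. The tuple encoding of operations and the names `precedence_graph`, `has_cycle`, and `conflict_serializable` are assumptions for illustration. The two schedules are the good and bad interleavings of T1 and T2 as read from the lecture's flattened figures.

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, op, item) with op being 'r' or 'w'.

    Returns the set of edges (Ti, Tj) where an operation of Ti precedes and
    conflicts with an operation of Tj.
    """
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting":
                return True  # back edge: v is still on the current DFS path
            if v not in state and visit(v):
                return True
        state[u] = "done"
        return False

    return any(u not in state and visit(u) for u in graph)


def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


good = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
        ("T1", "r", "B"), ("T2", "r", "C"), ("T1", "w", "B"), ("T2", "w", "C")]
bad = [("T1", "r", "A"), ("T2", "r", "A"), ("T1", "w", "A"), ("T2", "w", "A"),
       ("T1", "r", "B"), ("T2", "r", "C"), ("T1", "w", "B"), ("T2", "w", "C")]

print("good:", precedence_graph(good), conflict_serializable(good))
print("bad: ", precedence_graph(bad), conflict_serializable(bad))
```

The same `has_cycle` helper could in principle be run on a waits-for graph for deadlock detection, since both checks reduce to finding a cycle in a directed graph.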