CS 405G: Introduction to Database Systems
Lecture 8: SQL III and Functional Dependency
Instructor: Chen Qian
3/4 HW3 due
Midterm exam to 3/7 (1-2pm)
7/1/2016 Chen Qian @ University of Kentucky

Trigger options
  Possible events include:
    INSERT ON table
    DELETE ON table
    UPDATE [OF column] ON table
  Granularity: trigger can be activated
    FOR EACH ROW modified
    FOR EACH STATEMENT that performs modification
  Timing: action can be executed
    AFTER or BEFORE the triggering event

Transition variables
  OLD ROW: the modified row before the triggering event
  NEW ROW: the modified row after the triggering event
  OLD TABLE: a hypothetical read-only table containing all modified rows before the triggering event
  NEW TABLE: a hypothetical table containing all modified rows after the triggering event
  Not all of them make sense all the time, e.g.
    AFTER INSERT statement-level triggers: can use only NEW TABLE
    BEFORE DELETE row-level triggers: can use only OLD ROW
    etc.

Statement-level trigger example
  CREATE TRIGGER AutoRecruit
  AFTER INSERT ON Student
  REFERENCING NEW TABLE AS newStudents
  FOR EACH STATEMENT
    INSERT INTO Enroll
      (SELECT SID, 'CS405'
       FROM newStudents
       WHERE GPA > 3.0);
  Efficiency?

BEFORE trigger example
  Never give faculty more than 50% raise in one update
  CREATE TRIGGER NotTooGreedy
  BEFORE UPDATE OF salary ON Faculty
  REFERENCING OLD ROW AS o, NEW ROW AS n
  FOR EACH ROW
  WHEN (n.salary > 1.5 * o.salary)
    SET n.salary = 1.5 * o.salary;
  BEFORE triggers are often used to "condition" data
  Another option is to raise an error in the trigger body to abort the transaction that caused the trigger to fire

Statement- vs. row-level triggers
  Why are both needed?
  Certain triggers are only possible at statement level
    If the average GPA of students inserted by this statement exceeds 3.0, do ...
  Simple row-level triggers are easier to implement and may be more efficient
    Statement-level triggers require a significant amount of state to be maintained in OLD TABLE and NEW TABLE
    However, a row-level trigger does get fired for each row, so complex row-level triggers may be inefficient for statements that generate lots of modifications

Another statement-level trigger
  Give faculty a raise if GPAs in one update statement are all increasing
  CREATE TRIGGER AutoRaise
  AFTER UPDATE OF GPA ON Student
  REFERENCING OLD TABLE AS o, NEW TABLE AS n
  FOR EACH STATEMENT
  WHEN (NOT EXISTS(SELECT * FROM o, n
                   WHERE o.SID = n.SID
                   AND o.GPA >= n.GPA))
    UPDATE Faculty SET salary = salary + 1000;
  A row-level trigger would be difficult to write in this case

System issues
  Recursive firing of triggers
    Action of one trigger causes another trigger to fire
    Can get into an infinite loop
    Some DBMS restrict trigger actions
    Most DBMS set a maximum level of recursion (16 in DB2)
  Interaction with constraints (very tricky to get right!)
    When do we check if a triggering event violates constraints?
      After a BEFORE trigger (so the trigger can fix a potential violation)
      Before an AFTER trigger
    AFTER triggers also see the effects of, say, cascaded deletes caused by referential integrity constraint violations

Summary of SQL features covered so far
  Query
  Modification
  Constraints
  Triggers

Exercise
  Consider the following relational schema and briefly answer the questions that follow:
  Define a table constraint on Emp that will ensure that every employee makes at least $10,000.
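One possible shape of such a table constraint is sketched below. The schema for this exercise did not survive in the slide text, so the columns of Emp (eid, ename, age, salary) are an assumption, as are the sample rows. The sketch uses Python's built-in sqlite3 module only to run the SQL; the constraint itself is the CHECK clause.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Assumed schema: the slide's actual Emp definition is not in the text.
conn.execute("""
    CREATE TABLE Emp (
        eid    INTEGER PRIMARY KEY,
        ename  TEXT,
        age    INTEGER,
        salary REAL,
        CHECK (salary >= 10000)   -- table constraint: at least $10,000
    )
""")

# Hypothetical row that satisfies the constraint.
conn.execute("INSERT INTO Emp VALUES (1, 'Alice', 41, 52000)")

# Hypothetical row that violates it; SQLite rejects the statement.
try:
    conn.execute("INSERT INTO Emp VALUES (2, 'Bob', 29, 9000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT ename FROM Emp").fetchall()
```

Only the first row is stored; the second insert fails the CHECK and raises an integrity error instead of silently storing bad data.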

Exercise
  Define a table constraint on Dept that will ensure that all managers have age > 30.

Exercise
  Print the names and ages of each employee who works in both the Hardware department and the Software department.

Exercise
  For each department with more than 20 full-time-equivalent employees (i.e., where the part-time and full-time employees add up to at least that many full-time employees; each full-time employee's time is counted as 100), print the did together with the number of employees that work in that department.

Exercise
  Print the name of each employee whose salary exceeds the budget of all of the departments that he or she works in.

Exercise
  Find the enames of managers who manage the departments with the largest budgets.

Exercise
  If a manager manages more than one department, he or she controls the sum of all the budgets for those departments. Find the managerids of managers who control more than $5 million.

Exercise
  Find the managerids of managers who control the largest amounts.

Exercise
  Find the enames of managers who manage only departments with budgets larger than $1 million, but at least one department with budget less than $5 million.

Homework 3
  (b) Find the snames of suppliers who supply every part.
  (c) Find the sids of suppliers who charge more for some part than the average cost of that part (averaged over all the suppliers who supply that part).
  (d) Find the sids of suppliers who supply a red part and a green part.

  (a) Write the SQL statements required to create these relations, including appropriate versions of all primary and foreign key integrity constraints.
  (b) Express each of the following integrity constraints in SQL unless it is implied by the primary and foreign key constraint; if so, explain how it is implied. If the constraint cannot be expressed in SQL, say so.
    I. Every class has a minimum enrollment of 5 students and a maximum enrollment of 30 students.
    II. The department with the most faculty members must have fewer than twice the number of faculty members in the department with the fewest faculty members.

Functional Dependency is not included in the Midterm Exam

Today's Topic
  Functional Dependency
  Normalization
  Decomposition
  BCNF

Motivation
  How do we tell if a design is bad, e.g., Enroll(SID, Sname, CID, Cname, grade)?
  This design has redundancy, because the name of a student is recorded multiple times, once for each course the student is taking

    SID   CID  Sname         Cname  grade
    1234  10   John Smith    DB     A
    1123  9    Ben Liu       NET    A
    1234  9    John Smith    NET    B
    1123  10   Ben Liu       DB     C
    1023  10   Susan Sidhuk  DB     B

  The same data without the redundancy, split into three tables:

    SID   Sname
    1234  John Smith
    1123  Ben Liu
    1023  Susan Sidhuk

    CID  Cname
    9    NET
    10   DB

    SID   CID  grade
    1234  10   A
    1123  9    A
    1234  9    B
    1123  10   C
    1023  10   B

Why is redundancy bad?
  It wastes disk space.
  What if we want to perform update operations on the relation?
    INSERT a new course that no student has enrolled in yet
    UPDATE the name of "John Smith" to "John L. Smith"
    DELETE the last student enrolled in a certain course

    SID   CID  Sname         Cname  grade
    1234  10   John Smith    DB     A
    1123  9    Ben Liu       NET    A
    1234  9    John Smith    NET    B
    1123  10   Ben Liu       DB     C
    1023  10   Susan Sidhuk  DB     B

Functional dependencies
  A functional dependency (FD) has the form X -> Y, where X and Y are sets of attributes in a relation R
  X -> Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all attributes in Y
    t1[X] = t2[X] ⇒ t1[Y] = t2[Y]

    X  Y  Z
    a  b  c
    a  ?  ?

    The Y value in the second row must be "b"; the Z value could be anything, e.g. d

FD examples
  Address (street_address, city, state, zip)
    street_address, city, state -> zip
    zip -> city, state
    zip, state -> zip?
      This is a trivial FD
      Trivial FD: LHS ⊇ RHS
    zip -> state, zip?
      This is non-trivial, but not completely non-trivial
      Completely non-trivial FD: LHS ∩ RHS = ∅

Functional Dependencies
  An FD is a property of the attributes in the schema R
  The constraint must hold on every relation instance r(R)
  If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])

Keys redefined using FD's
  Let attr(R) be the set of all attributes of R. A set of attributes K is a (candidate) key for a relation R if
    K -> attr(R) - K (that is, K is a "super key"), and
    no proper subset of K satisfies the above condition (that is, K is minimal: the dependency is a full functional dependency)
  Address (street_address, city, state, zip)
    {street_address, city, state, zip}   Super key
    {street_address, city, zip}          Super key
    {street_address, zip}                Key
    {zip}                                Non-key

Reasoning with FDs
  Given a relation R and a set of FDs F
    Does another FD follow from F?
    Are some of the FDs in F redundant (i.e., they follow from the others)?
    Is K a key of R?
    What are all the keys of R?

Attribute closure
  Given R, a set of FDs F that hold in R, and a set of attributes Z in R:
  The closure of Z (denoted Z+) with respect to F is the set of all attributes {A1, A2, ...} functionally determined by Z (that is, Z -> A1 A2 ...)
  Algorithm for computing the closure
    Start with closure = Z
    If X -> Y is in F and all attributes of X are already in the closure, then also add Y to the closure
    Repeat until no more attributes can be added

A more complex example
  WorkOn(EID, Ename, email, PID, Pname, Hours)
    EID -> Ename, email
    email -> EID
    PID -> Pname
    EID, PID -> Hours
  (Not a good design, and we will see why later)

Example of computing closure
  F includes:
    EID -> Ename, email
    email -> EID
    PID -> Pname
    EID, PID -> Hours
  { PID, email }+ = ?
  Starting from: closure = { PID, email }
    email -> EID
      Add EID; closure is now { PID, email, EID }
    EID -> Ename, email
      Add Ename; closure is now { PID, email, EID, Ename }
    PID -> Pname
      Add Pname; closure is now { PID, Pname, email, EID, Ename }
    EID, PID -> Hours
      Add Hours; closure is now all the attributes in WorkOn

Using attribute closure
  Given a relation R and set of FDs F
  Does another FD X -> Y follow from F?
    Compute X+ with respect to F
    If Y ⊆ X+, then X -> Y follows from F
  Is K a super key of R?
    Compute K+ with respect to F
    If K+ contains all the attributes of R, K is a super key
  Is a super key K a key of R?
    Test whether K' = K - {a} is a super key of R for every attribute a ∈ K; if no such K' is a super key, K is a key

Rules of FDs
  Armstrong's axioms
    Reflexivity: If Y ⊆ X, then X -> Y
    Augmentation: If X -> Y, then XZ -> YZ for any Z
    Transitivity: If X -> Y and Y -> Z, then X -> Z
  Rules derived from axioms
    Splitting: If X -> YZ, then X -> Y and X -> Z
    Combining: If X -> Y and X -> Z, then X -> YZ

Using rules of FD's
  Given a relation R and set of FDs F
  Does another FD X -> Y follow from F?
    Use the rules to come up with a proof
  Example:
    F includes: EID -> Ename, email; email -> EID; EID, PID -> Hours; PID -> Pname
    PID, email -> Hours?
      email -> EID (given in F)
      PID, email -> PID, EID (augmentation)
      PID, EID -> Hours (given in F)
      PID, email -> Hours (transitivity)

Example of redundancy
  WorkOn (EID, Ename, email, PID, Pname, Hours)
  We say X -> Y is a partial dependency if there exists a proper subset X' ⊂ X such that X' -> Y
    e.g. EID, email -> Ename, email
  Otherwise, X -> Y is a full dependency
    e.g. EID, PID -> Hours

    EID   PID  Ename         email           Pname         Hours
    1234  10   John Smith    jsmith@ac.com   B2B platform  10
    1123  9    Ben Liu       bliu@ac.com     CRM           40
    1234  9    John Smith    jsmith@ac.com   CRM           30
    1023  10   Susan Sidhuk  ssidhuk@ac.com  B2B platform  40

Database Normalization
  Database normalization relates to the level of redundancy in a relational database's structure.
  The key idea is to reduce the chance of having multiple different versions of the same data.
  Well-normalized databases have a schema that reflects the true dependencies between tracked quantities.
  Any increase in normalization generally involves splitting existing tables into multiple ones, which must be re-joined each time a query is issued.

Normalization
  Normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.
  A normal form is a certification that tells whether a relation schema is in a particular state

Normal Forms
  Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF.
  3NF is widely considered to be sufficient.
  Normalizing beyond 3NF can be tricky with current SQL technology as of 2005
  Full normalization is considered a good exercise to help discover all potential internal database consistency problems.

First Normal Form (1NF)
  A normal form characterizes a relation (not an attribute, a key, etc.)
  We can only say "this relation or table is in 1NF"
  A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.

2nd Normal Form
  An attribute A of a relation R is a nonprimary attribute if it is not part of any key in R; otherwise, A is a primary attribute.
  R is in (general) 2nd normal form if every nonprimary attribute A in R is not partially functionally dependent on any key of R

Redundancy Example
  Redundancy arises when a nonprimary attribute is only partially dependent on a key,
    e.g. EID, PID -> Ename
  In this case, the attribute (Ename) should be moved, together with the attributes it fully depends on (EID), into a new table.
  So, to check whether a table includes this kind of redundancy, try every nonprimary attribute and check whether it fully depends on every key.

Second Normal Form (2NF)
  2NF prescribes full functional dependency on the primary key.
  It most commonly applies to tables that have composite primary keys, where two or more attributes comprise the primary key.
  It requires that there are no non-trivial functional dependencies of a non-key attribute on a part (proper subset) of a candidate key.
  A table is said to be in 2NF if and only if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key

Decomposition

    EID   PID  Ename         email           Pname         Hours
    1234  10   John Smith    jsmith@ac.com   B2B platform  10
    1123  9    Ben Liu       bliu@ac.com     CRM           40
    1234  9    John Smith    jsmith@ac.com   CRM           30
    1023  10   Susan Sidhuk  ssidhuk@ac.com  B2B platform  40

  Decomposition (EID in the second table is a foreign key referencing the first):

    EID   Ename         email
    1234  John Smith    jsmith@ac.com
    1123  Ben Liu       bliu@ac.com
    1023  Susan Sidhuk  ssidhuk@ac.com

    EID   PID  Pname         Hours
    1234  10   B2B platform  10
    1123  9    CRM           40
    1234  9    CRM           30
    1023  10   B2B platform  40

  Decomposition eliminates redundancy
  To get back to the original relation, use natural join.

Decomposition
  Decomposition may be applied recursively

    EID   PID  Pname         Hours
    1234  10   B2B platform  10
    1123  9    CRM           40
    1234  9    CRM           30
    1023  10   B2B platform  40

  is decomposed into

    PID  Pname
    10   B2B platform
    9    CRM

    EID   PID  Hours
    1234  10   10
    1123  9    40
    1234  9    30
    1023  10   40

Unnecessary decomposition

    EID   Ename         email
    1234  John Smith    jsmith@ac.com
    1123  Ben Liu       bliu@ac.com
    1023  Susan Sidhuk  ssidhuk@ac.com

  is decomposed into

    EID   Ename
    1234  John Smith
    1123  Ben Liu
    1023  Susan Sidhuk

    EID   email
    1234  jsmith@ac.com
    1123  bliu@ac.com
    1023  ssidhuk@ac.com

  Fine: join returns the original relation
  Unnecessary: no redundancy is removed, and now EID is stored twice!

Bad decomposition

    EID   PID  Hours
    1234  10   10
    1123  9    40
    1234  9    30
    1023  10   40

  is decomposed into

    EID   PID
    1234  10
    1123  9
    1234  9
    1023  10

    EID   Hours
    1234  10
    1123  40
    1234  30
    1023  40

  Association between PID and Hours is lost
  Join returns more rows than the original relation

Lossless join decomposition
  Decompose relation R into relations S and T
    attrs(R) = attrs(S) ∪ attrs(T)
    S = π_attrs(S)(R)
    T = π_attrs(T)(R)
  The decomposition is a lossless join decomposition if, given known constraints such as FD's, we can guarantee that R = S ⋈ T
  Any decomposition gives R ⊆ S ⋈ T (why?)
  A lossy decomposition is one with R ⊂ S ⋈ T

Loss? But I got more rows!
  "Loss" refers not to the loss of tuples, but to the loss of information
  Or, the ability to distinguish different original tuples

Questions about decomposition
  When to decompose
  How to come up with a correct decomposition (i.e., lossless join decomposition)
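
The difference between the bad decomposition and a lossless one can be checked by hand on the example instances from the slides. The following is a minimal sketch: the helper names relation, project and natural_join are made up for illustration, and relations are modeled as Python sets of tuples. Note that it only tests one instance; a decomposition is lossless only if R = S ⋈ T is guaranteed for every instance that satisfies the FDs.

```python
def relation(attrs, rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(rel, attrs):
    """Projection: keep only attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def natural_join(s, t):
    """Natural join: merge every pair of tuples that agree on all shared attributes."""
    result = set()
    for x in map(dict, s):
        for y in map(dict, t):
            if all(x[a] == y[a] for a in x.keys() & y.keys()):
                result.add(frozenset({**x, **y}.items()))
    return result

# Instance from the "Bad decomposition" slide.
R = relation(("EID", "PID", "Hours"),
             [(1234, 10, 10), (1123, 9, 40), (1234, 9, 30), (1023, 10, 40)])
bad_join = natural_join(project(R, ("EID", "PID")), project(R, ("EID", "Hours")))

# Instance from the recursive decomposition slide, split on PID -> Pname.
W = relation(("EID", "PID", "Pname", "Hours"),
             [(1234, 10, "B2B platform", 10), (1123, 9, "CRM", 40),
              (1234, 9, "CRM", 30), (1023, 10, "B2B platform", 40)])
good_join = natural_join(project(W, ("PID", "Pname")), project(W, ("EID", "PID", "Hours")))

print(len(R), len(bad_join))   # original rows vs. rows after the lossy join
print(R < bad_join)            # True when R is a proper subset of the join
print(good_join == W)          # True when the join reproduces the original
```

For employee 1234 the bad join pairs each of the two PID values with each of the two Hours values, which is where the spurious rows come from; the join can no longer tell which hours belong to which project.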