CS 405G: Introduction to Database Systems Lecture 7: SQL II Instructor: Chen Qian Homework 2 due on 2/26 Next Monday (2/29) Dr. Qian will be on business travel. The TA Fatemah Alghamedy will give a lecture of practicing questions. Propose to change Midterm exam to 3/9 (1-2pm) 7/1/2016 Chen Qian @ University of Kentucky 2 Today’s Topic NULL value Outer join Nested queries 7/1/2016 Chen Qian @ University of Kentucky 3 Incomplete information Example: Student (SID, name, age, GPA) Value unknown Value not applicable We do not know Nelson’s age Nelson has not taken any classes yet; what is his GPA? How to express these values in SQL language? 7/1/2016 Chen Qian @ University of Kentucky 4 Solution 1 A dedicated special value for each domain (type) GPA cannot be –1, so use –1 as a special value to indicate a missing or invalid GPA Leads to incorrect answers if not careful Complicates applications SELECT AVG(GPA) FROM Student; SELECT AVG(GPA) FROM Student WHERE GPA <> -1; Remember the Y2K bug? 7/1/2016 “00” was used as a missing or invalid year value Chen Qian @ University of Kentucky 5 Solution 2 A valid-bit for every column Student (SID, name, name_is_valid, age, age_is_valid, GPA, GPA_is_valid) Complicates schema and queries 7/1/2016 SELECT AVG(GPA) FROM Student WHERE GPA_is_valid; Chen Qian @ University of Kentucky 6 Solution 3? Decompose the table; missing row = missing value StudentName (SID, name) StudentAge (SID, age) StudentGPA (SID, GPA) StudentID (SID) Conceptually the cleanest solution Still complicates schema and queries 7/1/2016 How to get all information about a student in a table? Would natural join work? Chen Qian @ University of Kentucky 7 SQL’s solution A special value NULL For every domain Special rules for dealing with NULL’s Example: Student (SID, name, age, GPA) 7/1/2016 < 789, “Nelson”, NULL, NULL > Chen Qian @ University of Kentucky 8 Three-valued logic TRUE = 1, FALSE = 0, UNKNOWN = 0.5 x AND y = min(x, y) x OR y = max(x, y) NOT x = 1 – x AND True True True False False NULL UNK False NULL False False UNK False AND False UNK OR True True True False False NULL TRUE False False False NULL TRUE UNK OR UNK UNK WHERE and HAVING clauses only select rows for output if the condition evaluates to TRUE UNKNOWN is not enough 7/1/2016 Chen Qian @ University of Kentucky 9 Computing with NULL’s (Arithmetic operation) when we operate on a NULL and another value (including another NULL) using +, –, etc., the result is NULL Aggregate functions ignore NULL, except COUNT(*) (since it counts rows) When we compare a NULL with another value (including another NULL) using =, >, etc., the result is UNKNOWN 7/1/2016 Chen Qian @ University of Kentucky 10 Unfortunate consequences SELECT AVG(GPA) FROM Student SELECT SUM(GPA)/COUNT(*) FROM Student; 3.4 2.72 SELECT * FROM Student; SELECT * FROM Student WHERE GPA = GPA Not equivalent 7/1/2016 sid name age gpa 1234 John Smith 21 3.5 1123 Mary Carter 19 3.8 1011 Bob Lee 22 NULL 1204 Susan Wong 22 3.4 1306 Kevin Kim 18 2.9 Chen Qian @ University of Kentucky 11 Another problem Example: Who has NULL GPA values? SELECT * FROM Student WHERE GPA = NULL; (SELECT * FROM Student) EXCEPT ALL (SELECT * FROM Student WHERE GPA = GPA) Does not work; never returns anything Works, but ugly Introduced built-in predicates IS NULL and IS NOT NULL 7/1/2016 SELECT * FROM Student WHERE GPA IS NULL; Chen Qian @ University of Kentucky 12 Outerjoin motivation Example: a master class list SELECT c.CID, s.SID FROM Enroll e, Student s, Course c WHERE e.SID = s.SID and c.CID = e.CID; What if a student take no classes What if a course with no student enrolled yet? For these students, CID column should be NULL For these courses, SID should be NULL Therefore the above query does not return the complete list! 7/1/2016 Chen Qian @ University of Kentucky 13 Outerjoin SELECT * FROM R FULL OUTER JOIN S ON p; A full outer join between R and S (denoted R includes all rows in the result of R pS, plus S) “Dangling” R rows (those that do not join with any S rows) padded with NULL’s for S’s columns “Dangling” S rows (those that do not join with any R rows) padded with NULL’s for R’s columns 7/1/2016 Chen Qian @ University of Kentucky 14 Outerjoin (II) SELECT * FROM R LEFT SELECT * FROM R RIGHT OUTER JOIN S ON p; A left outer join includes rows in R S plus dangling R rows padded with NULL’s A right outer join includes rows in R S plus dangling S rows padded with NULL’s 7/1/2016 OUTER JOIN S ON p;; Chen Qian @ University of Kentucky 15 Outerjoin examples SELECT * FROM Employee LEFT OUTER JOIN Department ON Eid = Mid Employee Eid Name Eid Name Did Mid Dname 1123 John Smith 1123 John Smith 4 1123 Research 1234 Mary Carter 1234 Mary Carter 5 1234 Finance 1311 Bob Lee 1311 Bob Lee NULL NULL NULL SELECT * FROM Employee FULL RIGHTOUTER OUTER JOIN Department ON Eid = Mid Department Did Mid Dname Eid Name Did Mid Dname 4 1123 Research 1123 John Smith 4 1123 Research 5 1234 Finance 1234 Mary Carter 5 1234 Finance 6 1312 HR NULL 1311 Bob NULL Lee NULL 6 NULL 1312 NULL HR NULL NULL 6 1312 HR 7/1/2016 Chen Qian @ University of Kentucky 16 Summary Query SELECT-FROM-WHERE statements Ordering Set and bag operations Aggregation and grouping Table expressions, subqueries NULL Outerjoins 7/1/2016 Chen Qian @ University of Kentucky 17 Exercise Goal: Get the Name of one’s supervisor EMPLOYEE (Eid, Mid, Name) SQL Statement: SELECT EID, Name FROM EMPLOYEE WHERE MId =Eid; SELECT E1.EID, E2.Name FROM EMPLOYEE E1, EMPLOYEE E2 WHERE E1.MId = E2.EId; Eid Mid Name Eid Name 1123 1234 John Smith 1123 Mary Carter 1234 1311 Mary Carter 1234 Bob Lee 1311 1611 Bob Lee 1311 Jack Snow 1455 1611 Lisa Wang 1455 Jack Snow 1611 1611 Jack Snow 1611 Jack Snow 7/1/2016 Chen Qian @ University of Kentucky 18 SELECT DISTINCT E2.Name FROM EMPLOYEE E1, EMPLOYEE E2 WHERE E1.MId = E2.EId; Name Mary Carter Bob Lee Jack Snow 7/1/2016 Chen Qian @ University of Kentucky 19 More on Nested Queries: Scoping To find out which table a column belongs to Start with the immediately surrounding query If not found, look in the one surrounding that; repeat if necessary Use table_name.column_name notation and AS (renaming) to avoid confusion 7/1/2016 Chen Qian @ University of Kentucky 20 An Example SELECT * FROM Student s WHERE EXISTS (SELECT * FROM Enroll e WHERE SID = s.SID AND EXISTS (SELECT * FROM Enroll WHERE SID = s.SID AND CID <> e.CID)); Students who are taking at least two courses 7/1/2016 Chen Qian @ University of Kentucky 21 Quantified subqueries A quantified subquery can be used as a value in a WHERE condition Universal quantification (for all): … WHERE x op ALL (subquery) … True iff for all t in the result of subquery, x op t Existential quantification (exists): … WHERE x op ANY (subquery) … True iff there exists some t in the result of subquery such that x op t Beware 7/1/2016 In common parlance, “any” and “all” seem to be synonyms In SQL, ANY really means “some” Chen Qian @ University of Kentucky 22 Examples of quantified subqueries Which employees have the highest salary? Employee (Sid, Name, Salary) SELECT * FROM Employee WHERE Salary >= ALL (SELECT Salary FROM Employee); How about the lowest salary? SELECT * FROM Employee WHERE Salary <= ALL (SELECT salary FROM Employee); 7/1/2016 Chen Qian @ University of Kentucky 23 More ways of getting the highest Salary Who has the highest Salary? SELECT * FROM Employee e WHERE NOT EXISTS (SELECT * FROM Employee WHERE Salary > e.Salary); SELECT * FROM Employee WHERE Eid NOT IN (SELECT e1.SID FROM Employee e1, Employee e2 WHERE e1.Salary < e2.Salary); 7/1/2016 Chen Qian @ University of Kentucky 24 Nested Queries Nested queries do not add expression power to SQL For convenient Write intuitive SQL queries Can always use SQL queries without nesting to complete the same task (though sometime it is hard) 7/1/2016 Chen Qian @ University of Kentucky 25 More Exercise Sailors (sid: INTEGER, sname: string, rating: INTEGER, age: REAL) Boats (bid: INTEGER, bname: string, color: string) Reserves (sid: INTEGER, bid: INTEGER) Sailors Boats sid 1 sname Fred 2 3 Jim 2 Nancy 8 Reserves 7/1/2016 rating 7 sid bid 1 2 102 102 age 22 bid 101 bname Nina color red 39 27 102 103 Pinta Santa Maria blue red Chen Qian @ University of Kentucky 26 Exercise I Find sid’s of sailors who’ve reserved a red AND a green boat SELECT R1.sid FROM Boats B1, Reserves R1, Boats B2, Reserves R2 WHERE B1.color=‘red’ AND B2.color=‘green’ AND R1.bid=B1.bid AND R2.bid=B2.bid AND R1.sid=R2.sid SELECT sid FROM Boats, Reserves Where B.color = ‘red’ INTERSECT (SELECT sid FROM Boats, Reserves Where B.color = ‘green’) 7/1/2016 Chen Qian @ University of Kentucky 27 Exercise II Find sid’s of sailors who have not reserved a boat SELECT sid Non-monotonic operation! FROM Sailors EXCEPT (SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid=R.sid) 7/1/2016 Chen Qian @ University of Kentucky 28 Exercise III (a tough one) Find sailors who’ve reserved all boats Non-monotonic operation! SELECT Sid FROM Sailor #All sailors EXCEPT #Those who do not reserve all SELECT Sid boats FROM #All possible combinations (SELECT bid, sid between sid and bid FROM Boat, Sailor EXCEPT #Existing reservations Reserves) 7/1/2016 Chen Qian @ University of Kentucky 29 Summary of SQL features covered so far SELECT-FROM-WHERE statements Set and bag operations Ordering Aggregation and grouping Table expressions, subqueries NULL’s and outerjoins Next: data modification statements, constraints 7/1/2016 Chen Qian @ University of Kentucky 30 Review: Create Table in SQL An SQL relation is defined using the create table command: CREATE TABLE r (A1 D1, A2 D2, ..., An Dn, (integrity-constraint1), ..., (integrity-constraintk)) 7/1/2016 r is the name of the relation each Ai is an attribute name in the schema of relation r Di is the data type of values in the domain of attribute Ai Chen Qian @ Univ. of Kentucky 31 Constrains in SQL Attribute constraint Key constraint NOT NULL DEFAULT <value> PRIMARY KEY UNIQUE Referential Integrity FOREIGN KEY 7/1/2016 Chen Qian @ Univ. of Kentucky 32 Constraints in Table Declaration Restrictions on allowable data in a database In addition to the simple structure and type restrictions imposed by the table definitions Declared as part of the schema Enforced by the DBMS Why use constraints? Protect data integrity (catch errors) Tell the DBMS about the data (so it can optimize better) 7/1/2016 Chen Qian @ Univ. of Kentucky 33 Types of SQL constraints Key constraint NOT NULL Referential integrity (foreign key) 7/1/2016 Chen Qian @ Univ. of Kentucky 34 Key declaration At most one PRIMARY KEY per table Typically implies a primary index Rows are stored inside the index, typically sorted by the primary key value ( best speedup for queries) Any number of UNIQUE keys per table Typically implies a secondary index To speedup queries 7/1/2016 Chen Qian @ Univ. of Kentucky 35 NOT NULL Indicates that the value of an attribute can not be NULL 7/1/2016 Chen Qian @ Univ. of Kentucky 36 Examples CREATE TABLE Student (SID INTEGER NOT NULL PRIMARY KEY, name VARCHAR(30) NOT NULL, Used for query email VARCHAR(30) UNIQUE, age INTEGER, processing GPA FLOAT); CREATE TABLE Course (CID CHAR(10) NOT NULL PRIMARY KEY, title VARCHAR(100) NOT NULL); CREATE TABLE Enroll (SID INTEGER NOT NULL, CID CHAR(10) NOT NULL, PRIMARY KEY(SID, CID)); This form is required for multi-attribute keys 7/1/2016 Chen Qian @ Univ. of Kentucky 37 Referential integrity example Enroll.SID references Student.SID If an SID appears in Enroll, it must appear in Student Enroll.CID references Course.CID If a CID appears in Enroll, it must appear in Course That is, no “dangling pointers” 7/1/2016 Chen Qian @ Univ. of Kentucky 38 Referential integrity in SQL Referenced column(s) must be PRIMARY KEY Referencing column(s) form a FOREIGN KEY Example CREATE TABLE Enroll (SID INTEGER NOT NULL, CID CHAR(10) NOT NULL, PRIMARY KEY(SID, CID), FOREIGN KEY CID REFERENCES Course(CID) FOREIGN KEY SID REFERENCES Student(SID)); 7/1/2016 Chen Qian @ Univ. of Kentucky 39 Enforcing referential integrity Example: Enroll.SID references Student.SID Insert or update an Enroll row so it refers to a nonexistent SID Reject Delete or update a Student row whose SID is referenced by some Enroll row Reject Cascade: ripple changes to all referring rows Set NULL: set all references to NULL Set DEFAULT: use the default value of the attribute All four options can be specified in SQL 7/1/2016 Chen Qian @ Univ. of Kentucky 40 Database Modification Operations INSERT DELETE UPDATE 7/1/2016 Chen Qian @ Univ. of Kentucky 41 INSERT Insert one row INSERT INTO Enroll VALUES (456, ’EECS647’); Student 456 takes EECS647 Insert the result of a query INSERT INTO Enroll (SELECT SID, ’CS647’ FROM Student WHERE SID NOT IN (SELECT SID FROM Enroll WHERE CID = ’CS647’)); 7/1/2016 Qian @ Univ. of Kentucky Force everybody toChen take EECS647! 42 DELETE Delete everything DELETE FROM Enroll; Delete according to a WHERE condition Example: Student 456 drops CS116 DELETE FROM Enroll WHERE SID = 456 AND CID = ’CS116’; Example: Drop students from all CS classes with GPA lower than 1.0 7/1/2016 DELETE FROM Enroll WHERE SID IN (SELECT SID FROM Student WHERE GPA < 1.0) AND CID LIKE ’CS%’; Chen Qian @ Univ. of Kentucky 43 UPDATE Example: Student 142 changes name to “Barney” 7/1/2016 UPDATE Student SET name = ’Barney’ WHERE SID = 142; Chen Qian @ Univ. of Kentucky 44 Summary of SQL features covered so far Query Constraints Modification SELECT-FROM-WHERE statements Set and bag operations Table expressions, subqueries Aggregation and grouping Ordering Outerjoins INSERT/DELETE/UPDATE Next: triggers, views, indexes 7/1/2016 Chen Qian @ Univ. of Kentucky 45 Tuple- and attribute-based CHECK’s Associated with a single table Only checked when a tuple or an attribute is inserted or updated Example: CREATE TABLE Enroll (SID INTEGER NOT NULL CHECK (SID IN (SELECT SID FROM Student)), CID ...); Is it a referential integrity constraint? Not quite; not checked when Student is modified 7/1/2016 Chen Qian @ Univ. of Kentucky 46 Build-in Domain Types in SQL Numeric: int. Integer. smallint. Small integer (a machine-dependent subset of the integer domain type). decimal(p,d). Fixed point number, with user-specified precision of a total of p digits, with d digits to the right of decimal point. real, double precision. Floating point and double-precision floating point numbers. String char(n). Fixed length character string, with user-specified length n. varchar(n). Variable length character strings, with user-specified maximum length n. 7/1/2016 Chen Qian @ Univ. of Kentucky 47 User-Created Domains CREATE DOMAIN <domain name> AS <domain type> Example CREATE DOMAIN SSN_TYPE as CHAR(9) CREATE DOMAIN SALARY as Int CHECK SALARY > 0 7/1/2016 Chen Qian @ Univ. of Kentucky 48 Drop and Alter Table Constructs The drop table command deletes all information about the dropped relation from the database. The alter table command is used to add attributes to an existing relation: ALTER TABLE r ADD[DROP] COLUMN A D where A is the name of the attribute to be added (dropped) to relation r and D is the domain of A. 7/1/2016 All tuples in the relation are assigned null as the value for the new attribute. Chen Qian @ Univ. of Kentucky 49 Advanced Techniques No-chicken-no-egg problem CREATE TABLE Dept (name CHAR(20) NOT NULL PRIMARY KEY, chair CHAR(30) NOT NULL REFERENCES Prof(name)); CREATE TABLE Prof (name CHAR(30) NOT NULL PRIMARY KEY, dept CHAR(20) NOT NULL REFERENCES Dept(name)); The first INSERT will always violate a constraint! Deferred constraint checking is necessary Check only at the end of a transaction Allowed in SQL as an option 7/1/2016 Chen Qian @ Univ. of Kentucky 50 “Active” data Constraint enforcement: When an operation violates a constraint, abort the operation or try to “fix” data Example: enforcing referential integrity constraints Generalize to arbitrary constraints? Data monitoring: When something happens to the data, automatically execute some action Example: When price rises above $20 per share, sell Example: When enrollment is at the limit and more students try to register, email the instructor 7/1/2016 Chen Qian @ Univ. of Kentucky 51 Triggers A trigger is an event-condition-action (ECA) rule When event occurs, test condition; if condition is satisfied, execute action Example: Event: whenever there comes a new student… Condition: with GPA higher than 3.0… Action: then make him/her take CS 405G! 7/1/2016 Chen Qian @ Univ. of Kentucky 52 Trigger example CREATE TRIGGER CS405AutoRecruit AFTER INSERT ON Student Event REFERENCING NEW ROW AS newStudent FOR EACH ROW WHEN (newStudent.GPA > 3.0)Condition INSERT INTO Enroll VALUES(newStudent.SID, ’CS405’); Action 7/1/2016 Chen Qian @ Univ. of Kentucky 53 Trigger options Possible events include: Granularity—trigger can be activated: INSERT ON table DELETE ON table UPDATE [OF column] ON table FOR EACH ROW modified FOR EACH STATEMENT that performs modification Timing—action can be executed: AFTER or BEFORE the triggering event 7/1/2016 Chen Qian @ Univ. of Kentucky 54 Transition variables OLD ROW: the modified row before the triggering event NEW ROW: the modified row after the triggering event OLD TABLE: a hypothetical read-only table containing all modified rows before the triggering event NEW TABLE: a hypothetical table containing all modified rows after the triggering event Not all of them make sense all the time, e.g. AFTER INSERT statement-level triggers BEFORE DELETE row-level triggers Can use only NEW TABLE Can use only OLD ROW etc. 7/1/2016 Chen Qian @ Univ. of Kentucky 55 Statement-level trigger example CREATE TRIGGER AutoRecruit AFTER INSERT ON Student REFERENCING NEW TABLE AS newStudents FOR EACH STATEMENT INSERT INTO Enroll (SELECT SID, ’CS405’ FROM newStudents WHERE GPA > 3.0); Efficiency??? 7/1/2016 Chen Qian @ Univ. of Kentucky 56 BEFORE trigger example Never give faculty more than 50% raise in one update CREATE TRIGGER NotTooGreedy BEFORE UPDATE OF salary ON Faculty REFERENCING OLD ROW AS o, NEW ROW AS n FOR EACH ROW WHEN (n.salary > 1.5 * o.salary) SET n.salary = 1.5 * o.salary; BEFORE triggers are often used to “condition” data Another option is to raise an error in the trigger body to abort the transaction that caused the trigger to fire 7/1/2016 Chen Qian @ Univ. of Kentucky 57 Statement- vs. row-level triggers Why are both needed? Certain triggers are only possible at statement level If the average GPA of students inserted by this statement exceeds 3.0, do … Simple row-level triggers are easier to implement and may be more efficient Statement-level triggers require significant amount of state to be maintained in OLD TABLE and NEW TABLE However, a row-level trigger does get fired for each row, so complex row-level triggers may be inefficient for statements that generate lots of modifications 7/1/2016 Chen Qian @ Univ. of Kentucky 58 Another statement-level trigger Give faculty a raise if GPA’s in one update statement are all increasing CREATE TRIGGER AutoRaise AFTER UPDATE OF GPA ON Student REFERENCING OLD TABLE AS o, NEW TABLE AS n FOR EACH STATEMENT WHEN (NOT EXISTS(SELECT * FROM o, n WHERE o.SID = n.SID AND o.GPA >= n.GPA)) UPDATE Faculty SET salary = salary + 1000; A row-level trigger would be difficult to write in this case 7/1/2016 Chen Qian @ Univ. of Kentucky 59 System issues Recursive firing of triggers Action of one trigger causes another trigger to fire Can get into an infinite loop Some DBMS restrict trigger actions Most DBMS set a maximum level of recursion (16 in DB2) Interaction with constraints (very tricky to get right!) When do we check if a triggering event violates constraints? After a BEFORE trigger (so the trigger can fix a potential violation) Before an AFTER trigger AFTER triggers also see the effects of, say, cascaded deletes caused by referential integrity constraint violations 7/1/2016 Chen Qian @ Univ. of Kentucky 60 Views A view is like a “virtual” table Defined by a query, which describes how to compute the view contents on the fly DBMS stores the view definition query instead of view contents Can be used in queries just like a regular table 7/1/2016 Chen Qian @ Univ. of Kentucky 61 Creating and dropping views Example: CS647roster CREATE VIEW CS647Roster AS SELECT SID, name, age, GPACalled “base tables” FROM Student WHERE SID IN (SELECT SID FROM Enroll WHERE CID = ’CS647’); To drop a view DROP VIEW view_name; 7/1/2016 Chen Qian @ Univ. of Kentucky 62 Using views in queries Example: find the average GPA of CS647 students SELECT AVG(GPA) FROM CS647Roster; To process the query, replace the reference to the view by its definition SELECT AVG(GPA) FROM (SELECT SID, name, age, GPA FROM Student WHERE SID IN (SELECT SID FROM Enroll WHERE CID = ’CS647’)); 7/1/2016 Chen Qian @ Univ. of Kentucky 63 Why use views? To hide data from users To hide complexity from users Logical data independence If applications deal with views, we can change the underlying schema without affecting applications Recall physical data independence: change the physical organization of data without affecting applications To provide a uniform interface for different implementations or sources Real database applications use tons of views 7/1/2016 Chen Qian @ Univ. of Kentucky 64 Modifying views Does not seem to make sense since views are virtual But does make sense if that is how users see the database Goal: modify the base tables such that the modification would appear to have been accomplished on the view Be careful! There may be one way to modify There may be many ways to modify There may be no way to modify 7/1/2016 Chen Qian @ Univ. of Kentucky 65 A simple case CREATE VIEW StudentGPA AS SELECT SID, GPA FROM Student; DELETE FROM StudentGPA WHERE SID = 123; translates to: DELETE FROM Student WHERE SID = 123; 7/1/2016 Chen Qian @ Univ. of Kentucky 66 An impossible case CREATE VIEW HighGPAStudent AS SELECT SID, GPA FROM Student WHERE GPA > 3.7; INSERT INTO HighGPAStudent VALUES(987, 2.5); No matter what you do on Student, the inserted row will not be in HighGPAStudent 7/1/2016 Chen Qian @ Univ. of Kentucky 67 A case with too many possibilities CREATE VIEW AverageGPA(GPA) AS SELECT AVG(GPA) FROM Student; Note that you can rename columns in view definition UPDATE AverageGPA SET GPA = 2.5; Set everybody’s GPA to 2.5? Adjust everybody’s GPA by the same amount? Just lower Lisa’s GPA? 7/1/2016 Chen Qian @ Univ. of Kentucky 68 Summary of SQL features covered so far Query Modification Constraints Triggers Views 7/1/2016 Chen Qian @ Univ. of Kentucky 69