SQL: Structured Query Language ('Sequel')
Chapter 5

Running Example
Instances of the Sailors and Reserves relations used in our examples.

R1
    sid   bid   day
    22    101   10/10/96
    58    103   11/12/96

S1
    sid   sname    rating   age
    22    dustin   7        45.0
    31    lubber   8        55.5
    58    rusty    10       35.0

S2
    sid   sname    rating   age
    28    yuppy    9        35.0
    31    lubber   8        55.5
    44    guppy    5        35.0
    58    rusty    10       35.0

Basic SQL Query
    SELECT [DISTINCT] target-list
    FROM   relation-list
    WHERE  qualification
• relation-list: a list of relation names.
• target-list: a list of attributes of relations in relation-list.
• qualification: comparisons (Attr op const or Attr1 op Attr2, where op is one of <, >, =, <=, >=, <>) combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates. By default, duplicates are not eliminated!

Example Query
    SELECT S.sname
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid AND R.bid=103

Comparisons in Oracle
[figure]

Conceptual Evaluation Strategy
    SELECT S.sname
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid AND R.bid=103
The semantics of an SQL query are defined in terms of the following conceptual evaluation strategy:
• Compute the cross-product of relation-list.
• Discard resulting tuples if they fail the qualification.
• Delete attributes that are not in target-list.
• If DISTINCT is specified, eliminate duplicate rows.
This is probably the least efficient way to compute a query! The optimizer must find more efficient strategies that compute the same answers.

Using the instances R1 and S1 above as Reserves (R) and Sailors (S), the cross-product for the example query is:
    (sid)  sname   rating  age   (sid)  bid  day
    22     dustin  7       45.0  22     101  10/10/96
    22     dustin  7       45.0  58     103  11/12/96
    31     lubber  8       55.5  22     101  10/10/96
    31     lubber  8       55.5  58     103  11/12/96
    58     rusty   10      35.0  22     101  10/10/96
    58     rusty   10      35.0  58     103  11/12/96

A Note on Range Variables
Range variables are needed only if the same relation appears twice in the FROM clause.
The previous query can also be written as:
    SELECT S.sname
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid AND bid=103
or
    SELECT sname
    FROM Sailors, Reserves
    WHERE Sailors.sid=Reserves.sid AND bid=103
It is good style, however, to always use range variables!

Find sailors who've reserved at least one boat
    SELECT S.sid
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid
Question: What is the output of this query?

Expressions and Strings
    SELECT S.age, age1=S.age-5, 2*S.age AS age2
    FROM Sailors S
    WHERE S.sname LIKE 'B_%B'
• AS and = are two ways to name fields in the result.
• LIKE is used for string matching. '_' stands for any one character and '%' stands for 0 or more arbitrary characters.
• Meaning of the query: find sailors (the age of each sailor and two fields defined by expressions) whose names begin and end with B and contain at least three characters.

SQL Queries Using Set Operations

Find sid's of sailors who've reserved a red or a green boat
    SELECT S.sid
    FROM Sailors S, Reserves R, Boats B
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND (B.color='red' OR B.color='green')
UNION: computes the union of any two union-compatible sets of tuples.
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
    UNION
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='green'

Find sid's of sailors who've reserved a red and a green boat
If we replace OR by AND as done below, what are the semantics?
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND (B.color='red' AND B.color='green')

Find sid's of sailors who've reserved a red and a green boat
INTERSECT: computes the intersection of two union-compatible sets of tuples.
What happens if we change S.sid to S.sname? (sid is a key field; sname is not.)
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
    INTERSECT
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='green'

Could the query be written without INTERSECT?
    SELECT S.sid
    FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2
    WHERE (S.sid=R1.sid AND R1.bid=B1.bid AND B1.color='red')
      AND (S.sid=R2.sid AND R2.bid=B2.bid AND B2.color='green')
Do we need the Sailors relation in the FROM clauses?

Find sid's of sailors who've reserved a red and not a green boat
What if we want to compute the DIFFERENCE instead?
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
    EXCEPT
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='green'
EXCEPT is part of the SQL/92 standard, but some systems don't support it. What then? Consider this attempt without EXCEPT:
    SELECT S.sid
    FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2
    WHERE S.sid=R1.sid AND R1.bid=B1.bid
      AND S.sid=R2.sid AND R2.bid=B2.bid
      AND (B1.color='red' AND NOT (B2.color='green'))
Is the query above that avoids EXCEPT (using NOT on B2.color) correct, or not?

About Duplicates
By default, UNION, INTERSECT and EXCEPT eliminate duplicates, unless we use the special keyword ALL.
If a row appears m times in the first input and n times in the second, the number of copies of that row in the result is:
    UNION ALL       m + n
    INTERSECT ALL   min(m, n)
    EXCEPT ALL      m - n (0 if n > m)

Nested Queries
Find names of sailors who've reserved boat #103:
    SELECT S.sname
    FROM Sailors S
    WHERE S.sid IN (SELECT R.sid
                    FROM Reserves R
                    WHERE R.bid=103)
The WHERE, FROM and HAVING clauses can themselves contain another SQL query, called a subquery. This is a powerful feature of SQL.

Nested Queries Evaluation
Nested loops evaluation of the query above: for each Sailors tuple,
• evaluate the WHERE clause qualification,
• which here means re-computing the subquery.

Nested Queries
Change the query to find sailors who've not reserved #103:
    SELECT S.sname
    FROM Sailors S
    WHERE S.sid NOT IN (SELECT R.sid
                        FROM Reserves R
                        WHERE R.bid=103)
Could the original IN query be rewritten without nesting?

Nested Queries with Correlation
Find names of sailors who've reserved boat #103:
    SELECT S.sname
    FROM Sailors S
    WHERE EXISTS (SELECT *
                  FROM Reserves R
                  WHERE R.bid=103 AND S.sid=R.sid)
EXISTS is another set comparison operator, like IN.
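As a rough check on the nested-query forms above, the following sketch runs the IN, NOT IN and correlated EXISTS queries on the small S1 and R1 instances from the running example. It assumes Python's built-in sqlite3 module as the SQL engine, which the slides do not prescribe, and the helper name `names` is made up for this illustration. A flat join is included as one possible un-nested formulation.

```python
import sqlite3

# In-memory database loaded with the S1 / R1 instances from the running example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors VALUES (22, 'dustin', 7, 45.0);
    INSERT INTO Sailors VALUES (31, 'lubber', 8, 55.5);
    INSERT INTO Sailors VALUES (58, 'rusty', 10, 35.0);
    INSERT INTO Reserves VALUES (22, 101, '10/10/96');
    INSERT INTO Reserves VALUES (58, 103, '11/12/96');
""")


def names(sql):
    """Hypothetical helper: run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Uncorrelated subquery with IN.
in_names = names("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
    ORDER BY S.sid""")

# Correlated subquery with EXISTS: the inner query mentions S.sid.
exists_names = names("""
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid = 103 AND S.sid = R.sid)
    ORDER BY S.sid""")

# Flat join. Unlike IN / EXISTS, this would list a sailor once per matching
# reservation if boat 103 had been reserved by the same sailor on several days.
join_names = names("""
    SELECT S.sname FROM Sailors S, Reserves R
    WHERE S.sid = R.sid AND R.bid = 103
    ORDER BY S.sid""")

# Sailors who have NOT reserved boat 103.
not_in_names = names("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
    ORDER BY S.sid""")

print(in_names, exists_names, join_names, not_in_names)
```

On this instance the three positive formulations agree; that agreement is a property of this data, not a proof that the forms are interchangeable in general.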
Nested Queries with Correlation
Find names of sailors who've reserved boat #103:
    SELECT S.sname
    FROM Sailors S
    WHERE EXISTS (SELECT *
                  FROM Reserves R
                  WHERE R.bid=103 AND S.sid=R.sid)
Why, in general, must this subquery be recomputed for each Sailors tuple?
Could this query be rewritten as a flat SQL query?

More on Set-Comparison Operators
op ANY and op ALL, where op is one of <, <=, =, <>, >=, >.
Find sailors whose rating is greater than that of some sailor called Horatio:
    SELECT *
    FROM Sailors S
    WHERE S.rating > ANY (SELECT S2.rating
                          FROM Sailors S2
                          WHERE S2.sname='Horatio')
IN is equivalent to = ANY.
NOT IN is equivalent to <> ALL.

More on Set-Comparison Operators
Find sailors whose rating is greater than that of all sailors called Horatio:
    SELECT *
    FROM Sailors S
    WHERE S.rating > ALL (SELECT S2.rating
                          FROM Sailors S2
                          WHERE S2.sname='Horatio')
What if there is no sailor with that name? "> ALL" returns TRUE if the set is empty.

Rewriting INTERSECT Queries Using IN
Find sid's of sailors who've reserved both a red and a green boat:
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
      AND S.sid IN (SELECT S2.sid
                    FROM Sailors S2, Boats B2, Reserves R2
                    WHERE S2.sid=R2.sid AND R2.bid=B2.bid
                      AND B2.color='green')

Rewriting INTERSECT Queries Using IN
Find sname's of sailors who've reserved both a red and a green boat:
    SELECT S.sname
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
      AND S.sid IN (SELECT S2.sid
                    FROM Sailors S2, Boats B2, Reserves R2
                    WHERE S2.sid=R2.sid AND R2.bid=B2.bid
                      AND B2.color='green')
This is the INTERSECT query rewritten using IN. Note: this version can return the sname (and not just the sid) of sailors who've reserved both red and green boats.

Division in SQL
Find sailors who've reserved all boats.
(1)
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS ((SELECT B.bid
                       FROM Boats B)
                      EXCEPT
                      (SELECT R.bid
                       FROM Reserves R
                       WHERE R.sid=S.sid))
Can we do it without EXCEPT? Let's try:
(2)
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS (SELECT B.bid
                      FROM Boats B
                      WHERE NOT EXISTS (SELECT R.bid
                                        FROM Reserves R
                                        WHERE R.bid=B.bid
                                          AND R.sid=S.sid))
Reading of (2): Sailors S such that ... there is no boat B without ... a Reserves tuple showing S reserved B.

Aggregation and Having Clauses

Aggregate Operators
A significant extension of relational algebra.
    COUNT (*)
    COUNT ( [DISTINCT] A)
    SUM ( [DISTINCT] A)
    AVG ( [DISTINCT] A)
    MAX (A)
    MIN (A)
Why is there no DISTINCT option for MAX and MIN?

Examples:
    SELECT COUNT (*)
    FROM Sailors S

    SELECT AVG (S.age)
    FROM Sailors S
    WHERE S.rating=10

    SELECT COUNT (DISTINCT S.rating)
    FROM Sailors S
    WHERE S.sname='Bob'

More Examples of Aggregate Operators
    SELECT S.sname
    FROM Sailors S
    WHERE S.rating = (SELECT MAX (S2.rating)
                      FROM Sailors S2)

    SELECT AVG (DISTINCT S.age)
    FROM Sailors S
    WHERE S.rating=10

Find name and age of the oldest sailor(s)
    SELECT S.sname, MAX (S.age)
    FROM Sailors S
• Is this first query legal?
• No!
• Why not?
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE S.age = (SELECT MAX (S2.age)
                   FROM Sailors S2)
Is this second query legal?

Motivation for Grouping
So far, we've applied aggregate operators to all (qualifying) tuples.
Question: what if we want to apply an aggregate to each group of tuples?
Example: Find the age of the youngest sailor for each rating level.
Example procedure: suppose the rating values are {1, 2, ..., 10}; then we can write 10 queries:
    For i = 1, 2, ..., 10:
        SELECT MIN (S.age)
        FROM Sailors S
        WHERE S.rating = i
What's the problem with the above?
• In general, we don't know how many rating levels exist.
• Nor what the rating values for these levels are.

Add Group By to SQL

Queries With GROUP BY and HAVING
    SELECT   [DISTINCT] target-list
    FROM     relation-list
    WHERE    qualification
    GROUP BY grouping-list
    HAVING   group-qualification
• A group is a set of tuples that each have the same value for all attributes in grouping-list.
• The HAVING clause is a restriction on each group.
• The target-list contains: (i) attribute names; (ii) terms of the form aggregate-op (column-name), e.g., MIN (S.age).
• REQUIREMENT: the attribute names in (i) must be a subset of grouping-list. Why? Each answer tuple corresponds to a group, and must have a single value for each attribute.

Find age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors
    SELECT S.rating, MIN (S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT (*) > 1

GROUP BY: Conceptual Evaluation
1. Compute the cross-product of relation-list.
2. Discard tuples that fail the qualification.
3. Delete 'unnecessary' fields.
4. Partition the remaining tuples into groups by the value of the attributes in grouping-list (GROUP BY).
5. Eliminate groups using the group-qualification (HAVING).
Each remaining group yields a single value per target-list entry. That is, one answer tuple is generated per qualifying group.
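The five conceptual steps above can be sketched directly in code. The version below is only an illustration: it assumes Python with the built-in sqlite3 module, the function name `conceptual_group_by` is invented here, and the Sailors rows are sample data. It walks through the steps by hand for the youngest-sailor query and compares the outcome with what an SQL engine returns.

```python
import sqlite3
from collections import defaultdict

# Sample Sailors rows as (sid, sname, rating, age); data assumed for illustration.
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]


def conceptual_group_by(rows):
    """Hand evaluation of: SELECT rating, MIN(age) ... WHERE age >= 18
    GROUP BY rating HAVING COUNT(*) > 1."""
    # Steps 1-2: only one relation, so the cross-product is the relation itself;
    # discard tuples that fail the WHERE qualification.
    qualifying = [row for row in rows if row[3] >= 18]
    # Step 3: delete fields the query does not need (keep rating and age).
    projected = [(rating, age) for (_sid, _sname, rating, age) in qualifying]
    # Step 4: partition the remaining tuples by the grouping attribute.
    groups = defaultdict(list)
    for rating, age in projected:
        groups[rating].append(age)
    # Step 5: apply the HAVING condition, then emit one answer tuple per group.
    return sorted((rating, min(ages)) for rating, ages in groups.items()
                  if len(ages) > 1)


manual_answer = conceptual_group_by(SAILORS)

# The same query evaluated by SQLite, for comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)
sql_answer = conn.execute("""
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating
""").fetchall()

print(manual_answer)
print(sql_answer)
```

A real optimizer would not materialize these intermediate lists; the hand-written steps only mirror the definition of the query's meaning.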
Find age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors
    SELECT S.rating, MIN (S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT (*) > 1

Sailors instance:
    sid   sname     rating   age
    22    dustin    7        45.0
    29    brutus    1        33.0
    31    lubber    8        55.5
    32    andy      8        25.5
    58    rusty     10       35.0
    64    horatio   7        35.0
    71    zorba     10       16.0
    74    horatio   9        35.0
    85    art       3        25.5
    95    bob       3        63.5
    96    frodo     3        25.5

Evaluation on this instance:
Keep only the rating and age fields:
    rating   age
    7        45.0
    1        33.0
    8        55.5
    8        25.5
    10       35.0
    7        35.0
    10       16.0
    9        35.0
    3        25.5
    3        63.5
    3        25.5
Discard tuples with age < 18 and group by rating:
    rating   age
    1        33.0
    3        25.5
    3        63.5
    3        25.5
    7        45.0
    7        35.0
    8        55.5
    8        25.5
    9        35.0
    10       35.0
Keep groups with more than one tuple and take the minimum age of each:
    rating   minage
    3        25.5
    7        35.0
    8        25.5

Find age of the youngest sailor with age >= 18, for each rating with at least 2 sailors between 18 and 60
    SELECT S.rating, MIN (S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18 AND S.age <= 60
    GROUP BY S.rating
    HAVING COUNT (*) > 1
Answer relation (on the same Sailors instance):
    rating   minage
    3        25.5
    7        35.0
    8        25.5

For each red boat, find the number of reservations for this boat
    SELECT B.bid, COUNT (*) AS scount
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
    GROUP BY B.bid
Grouping over a join of three relations.
Q: What do we get if we remove B.color='red' from the WHERE clause and add a HAVING clause with this condition?
• Conceptually there is no difference, but the query is illegal.
• Only columns that appear in GROUP BY can appear in HAVING, unless they are arguments of an aggregate operator in HAVING.

Find age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age)
The HAVING clause can also contain a subquery.
    SELECT S.rating, MIN (S.age)
    FROM Sailors S
    WHERE S.age > 18
    GROUP BY S.rating
    HAVING 1 < (SELECT COUNT (*)
                FROM Sailors S2
                WHERE S.rating=S2.rating)
To instead find the age of the youngest sailor with age > 18, for each rating with at least 2 such sailors (of age > 18), replace the HAVING clause with:
    HAVING COUNT (*) > 1

Find those ratings for which the average age is the minimum over all ratings
    SELECT S.rating
    FROM Sailors S
    WHERE S.age = (SELECT MIN (AVG (S2.age))
                   FROM Sailors S2)
WRONG: aggregate operations cannot be nested!

Find those ratings for which the average age is the minimum over all ratings
Correct solution (in SQL/92):
    SELECT Temp.rating, Temp.avg_age
    FROM (SELECT S.rating, AVG (S.age) AS avg_age
          FROM Sailors S
          GROUP BY S.rating) AS Temp
    WHERE Temp.avg_age = (SELECT MIN (Temp2.avg_age)
                          FROM (SELECT S.rating, AVG (S.age) AS avg_age
                                FROM Sailors S
                                GROUP BY S.rating) AS Temp2)

Null Values
Field values in a tuple are sometimes:
• unknown (e.g., a rating has not been assigned), or
• inapplicable (e.g., no maiden name for a male person).
SQL provides a special value null for such situations.
The presence of null complicates many issues.

Comparisons Using Null Values
• Is (S.rating = 8) TRUE or FALSE? What if rating has a null value in the tuple?
• We need a 3-valued logic: a condition can be true, false or unknown.
• SQL provides the special operators IS NULL and IS NOT NULL to check whether a value is or is not null.
• To disallow null values in a column: rating INTEGER NOT NULL

Comparison with NULL Values
• Arithmetic operations on NULL return NULL.
• Comparison operators on NULL return UNKNOWN.
• We can explicitly check whether a value is null or not with IS NULL and IS NOT NULL.

Truth Table with UNKNOWN
    UNKNOWN AND TRUE     = UNKNOWN
    UNKNOWN OR  TRUE     = TRUE
    UNKNOWN AND FALSE    = FALSE
    UNKNOWN OR  FALSE    = UNKNOWN
    UNKNOWN AND UNKNOWN  = UNKNOWN
    UNKNOWN OR  UNKNOWN  = UNKNOWN
    NOT UNKNOWN          = UNKNOWN
The WHERE clause is satisfied only when it evaluates to TRUE.
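The truth table above can be probed on a real engine. The sketch below assumes Python's built-in sqlite3 module; in SQLite, TRUE is reported as 1, FALSE as 0 and UNKNOWN as NULL (None in Python). The helper names `evaluate` and `sids` and the three Sailors rows are hypothetical, introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Hypothetical helper: evaluate a constant SQL expression.
    SQLite encodes TRUE as 1, FALSE as 0 and UNKNOWN as NULL (None)."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


# Each entry corresponds to one row of the truth table, with NULL as UNKNOWN.
truth_table = {expr: evaluate(expr) for expr in (
    "NULL AND 1", "NULL OR 1", "NULL AND 0", "NULL OR 0",
    "NULL AND NULL", "NULL OR NULL", "NOT NULL",
    "NULL = 8",   # a comparison involving NULL is UNKNOWN
    "NULL + 1",   # arithmetic on NULL yields NULL
)}

# Made-up rows: sailor 3 has no rating assigned.
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?)",
                 [(1, "a", 8), (2, "b", 7), (3, "c", None)])


def sids(where):
    """Hypothetical helper: sids of the rows satisfying a WHERE condition."""
    query = f"SELECT sid FROM Sailors WHERE {where} ORDER BY sid"
    return [row[0] for row in conn.execute(query)]


eq_8 = sids("rating = 8")            # sailor 3 is not returned: condition is UNKNOWN
not_eq_8 = sids("NOT (rating = 8)")  # still not returned: NOT UNKNOWN is UNKNOWN
is_null = sids("rating IS NULL")     # the explicit test finds sailor 3

print(truth_table)
print(eq_8, not_eq_8, is_null)
```

Note that the sailor with a null rating appears in neither `eq_8` nor `not_eq_8`, which is the practical consequence of WHERE accepting only TRUE.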
Outer Joins: Special Operators
• Left Outer Join
• Right Outer Join
• Full Outer Join
    SELECT S.sid, R.bid
    FROM Sailors S LEFT OUTER JOIN Reserves R
    ON S.sid = R.sid
Sailors rows (left) without a matching Reserves row (right) appear in the result, but not vice versa.
    SELECT S.sid, R.bid
    FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R

Sorting: ORDER BY Clause
    SELECT *
    FROM Student
    WHERE sNumber >= 1
    ORDER BY sNumber, sName
In relational algebra, writing the sort operator as $\tau$: $\tau_{sNumber,\,sName}(\sigma_{sNumber \geq 1}(Student))$

SQL Queries: Summary
    SELECT [DISTINCT] a1, a2, ..., an
    FROM R1, R2, ..., Rm
    [WHERE C1]
    [GROUP BY g1, g2, ..., gl [HAVING C2]]
    [ORDER BY o1, o2, ..., oj]

Summary
• SQL was an important factor in the early acceptance of the relational model: it is easy to understand!
• SQL is relationally complete; in fact, it has significantly more expressive power than relational algebra.
• There are many alternative ways to write a query, so the optimizer must look for the most efficient evaluation plan.
• In practice, users (still) need to be aware of how queries are optimized and evaluated for best results.
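To see what the LEFT OUTER JOIN described earlier returns, the following sketch runs it on the S1 and R1 instances of the running example, next to an ordinary inner join for contrast. It assumes Python's built-in sqlite3 module as the engine; other systems may differ in which outer join variants they accept.

```python
import sqlite3

# S1 / R1 instances from the running example, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors VALUES (22, 'dustin', 7, 45.0);
    INSERT INTO Sailors VALUES (31, 'lubber', 8, 55.5);
    INSERT INTO Sailors VALUES (58, 'rusty', 10, 35.0);
    INSERT INTO Reserves VALUES (22, 101, '10/10/96');
    INSERT INTO Reserves VALUES (58, 103, '11/12/96');
""")

# Inner join: sailor 31 has no reservation, so no row is produced for him.
inner_rows = conn.execute("""
    SELECT S.sid, R.bid
    FROM Sailors S JOIN Reserves R ON S.sid = R.sid
    ORDER BY S.sid
""").fetchall()

# Left outer join: every Sailors row appears; bid is NULL (None) when unmatched.
outer_rows = conn.execute("""
    SELECT S.sid, R.bid
    FROM Sailors S LEFT OUTER JOIN Reserves R ON S.sid = R.sid
    ORDER BY S.sid
""").fetchall()

# Natural variant: joins on the only shared column name, sid.
natural_rows = conn.execute("""
    SELECT S.sid, R.bid
    FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R
    ORDER BY S.sid
""").fetchall()

print(inner_rows)
print(outer_rows)
print(natural_rows)
```

The ORDER BY clause is added here only to make the row order deterministic for comparison.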