SQL REVIEW

Example Schema
We will use these table definitions in our examples.
    Sailors(sid: integer, sname: string, rating: integer, age: real)
    Boats(bid: integer, bname: string, color: string)
    Reserves(sid: integer, bid: integer, day: date)

Basic SQL Query
    SELECT [DISTINCT] target-list
    FROM   relation-list
    WHERE  qualification
• relation-list: A list of relation names.
• target-list: A list of attributes of relations in relation-list.
• qualification: Comparisons (“Attr op const” or “Attr1 op Attr2,” where op is one of <, >, ≤, ≥, =, ≠) combined using AND, OR, and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates.

Querying Relations (1)
What does the following query compute?
    SELECT S.name, E.cid
    FROM   Students S, Enrolled E
    WHERE  S.sid=E.sid AND E.grade='A'

Enrolled
    sid    cid          grade
    53831  Carnatic101  C
    53831  Reggae203    B
    53650  Topology112  A
    53666  History105   B

Students
    sid    name   login     age  gpa
    53831  Zhang  zhang@ee  19   3.2
    53650  Smith  smith@cs  21   3.9
    53666  Jones  jones@cs  20   3.5

• S.sid=E.sid: a student with this sid has an entry in Enrolled.
• E.grade='A': the Enrolled entry has a grade of “A”.
• Answer: retrieve the names of students and the courses in which they received an “A” grade.

Querying Relations (2)
On the Students and Enrolled instances above, the query returns:
    S.name  E.cid
    Smith   Topology112

Semantics of SQL
The query processor takes a query of the form
    SELECT [DISTINCT] target-list
    FROM   relation-list
    WHERE  qualification
and its meaning is defined by a conceptual evaluation strategy. This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.
[Figure: conceptual evaluation, from query to result]
• R1 × R2 × R3 × ···: define the search space.
• <, >, ≤, ≥, =, ≠: select rows.
• Projection: select columns.

Example of Conceptual Evaluation
    SELECT S.sname
    FROM   Sailors S, Reserves R
    WHERE  S.sid=R.sid AND R.bid=103
S and R are range variables.

Reserves
    sid  bid  day
    22   101  10/10/96
    58   103  11/12/96

Sailors
    sid  sname   rating  age
    22   dustin  7       45.0
    31   lubber  8       55.5
    58   rusty   10      35.0

Example of Conceptual Evaluation
    SELECT S.sname
    FROM   Sailors S, Reserves R
    WHERE  S.sid=R.sid AND R.bid=103

S×R
    (sid)  sname   rating  age   (sid)  bid  day
    22     dustin  7       45.0  22     101  10/10/96
    22     dustin  7       45.0  58     103  11/12/96
    31     lubber  8       55.5  22     101  10/10/96
    31     lubber  8       55.5  58     103  11/12/96
    58     rusty   10      35.0  22     101  10/10/96
    58     rusty   10      35.0  58     103  11/12/96   <- Answer

A Note on Range Variables
Really needed only if the same relation appears twice in the FROM clause. The previous query can be written in two ways:
    SELECT S.sname
    FROM   Sailors S, Reserves R
    WHERE  S.sid=R.sid AND R.bid=103
OR
    SELECT sname
    FROM   Sailors, Reserves
    WHERE  Sailors.sid=Reserves.sid AND bid=103
It is good style, however, to use range variables always!
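The conceptual evaluation above can be replayed outside a database. The following is a minimal sketch, assuming Python's standard sqlite3 module as a stand-in engine (the slides name no particular system); the helper names conceptual_eval and run_both_forms are made up for illustration. It computes the answer by cross product, row selection, and projection, then checks that both ways of writing the query agree.

```python
import sqlite3
from itertools import product

# Instances copied from the slides: Sailors(sid, sname, rating, age)
# and Reserves(sid, bid, day).
SAILORS = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
RESERVES = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]


def conceptual_eval(sailors, reserves):
    """Naive strategy: cross product, then select rows, then project columns."""
    cross = list(product(sailors, reserves))  # S x R: 3 * 2 = 6 rows
    # Qualification: S.sid = R.sid AND R.bid = 103
    selected = [(s, r) for s, r in cross if s[0] == r[0] and r[1] == 103]
    # Projection onto S.sname
    return [(s[1],) for s, _ in selected]


def run_both_forms(sailors, reserves):
    """Run the query with and without range variables (sqlite3 is an assumed engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
    conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT)")
    conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", sailors)
    conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)", reserves)
    with_vars = conn.execute(
        "SELECT S.sname FROM Sailors S, Reserves R "
        "WHERE S.sid = R.sid AND R.bid = 103"
    ).fetchall()
    without_vars = conn.execute(
        "SELECT sname FROM Sailors, Reserves "
        "WHERE Sailors.sid = Reserves.sid AND bid = 103"
    ).fetchall()
    conn.close()
    return with_vars, without_vars


print(conceptual_eval(SAILORS, RESERVES))
print(run_both_forms(SAILORS, RESERVES))
```

The naive version materializes all six rows of S×R before filtering, which is why the slides call it the least efficient strategy; a real engine is free to pick any plan that returns the same rows.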
Aggregate Operators
Significant extension of relational algebra.
    COUNT (*)             The number of rows in the relation
    COUNT ([DISTINCT] A)  The number of (unique) values in the A column
    SUM ([DISTINCT] A)    The sum of all (unique) values in the A column
    AVG ([DISTINCT] A)    The average of all (unique) values in the A column
    MAX (A)               The maximum value in the A column
    MIN (A)               The minimum value in the A column

Aggregate Operators
• Count the number of sailors:
    SELECT COUNT (*)
    FROM   Sailors S
• Find the average age of sailors with a rating of 10:
    SELECT AVG (S.age)
    FROM   Sailors S
    WHERE  S.rating=10
• Find the name of sailors with the highest rating (the subquery computes the maximum rating):
    SELECT S.sname
    FROM   Sailors S
    WHERE  S.rating = (SELECT MAX (S2.rating)
                       FROM   Sailors S2)
• Count the number of distinct ratings of sailors called “Bob”:
    SELECT COUNT (DISTINCT S.rating)
    FROM   Sailors S
    WHERE  S.sname='Bob'
• Find the average of the distinct ages of sailors with a rating of 10:
    SELECT AVG (DISTINCT S.age)
    FROM   Sailors S
    WHERE  S.rating=10

GROUP BY and HAVING (1)
• So far, we’ve applied aggregate operators to all (qualifying) tuples.
    [Figure: Relation -> Qualifier -> Aggregator -> a single value (32 in the figure)]
    SELECT AVG (S.age)
    FROM   Sailors S
    WHERE  S.rating=10
    Here AVG (S.age) is the aggregator and S.rating=10 is the qualifier: find the average age of sailors with a rating of 10.
• Sometimes, we want to apply them to each of several groups of tuples.
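The aggregate queries above can be tried on a small instance. This is a rough sketch, assuming Python's standard sqlite3 module as the engine. The first three Sailors rows come from the earlier slides; the three “Bob” rows are hypothetical, added only so that DISTINCT changes the result. The helper name aggregate_examples is likewise made up.

```python
import sqlite3

# First three rows are from the slides; the "Bob" rows are hypothetical
# sample data added so that DISTINCT makes a visible difference.
SAILORS = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
    (90, "Bob", 10, 35.0),
    (91, "Bob", 10, 50.0),
    (92, "Bob", 3, 25.0),
]


def aggregate_examples(sailors):
    """Run each aggregate query from the slides and collect the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
    conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", sailors)

    def scalar(sql):
        # Each of these queries returns exactly one row with one column.
        return conn.execute(sql).fetchone()[0]

    results = {
        "count_all": scalar("SELECT COUNT(*) FROM Sailors S"),
        "avg_age_rating_10": scalar(
            "SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10"),
        "avg_distinct_age_rating_10": scalar(
            "SELECT AVG(DISTINCT S.age) FROM Sailors S WHERE S.rating = 10"),
        "distinct_ratings_of_bob": scalar(
            "SELECT COUNT(DISTINCT S.rating) FROM Sailors S WHERE S.sname = 'Bob'"),
        # The nested query computes the maximum rating once; the outer query
        # keeps the sailors whose rating equals it. Sorted for stable output.
        "top_rated_names": sorted(
            row[0] for row in conn.execute(
                "SELECT S.sname FROM Sailors S "
                "WHERE S.rating = (SELECT MAX(S2.rating) FROM Sailors S2)")),
    }
    conn.close()
    return results


for name, value in aggregate_examples(SAILORS).items():
    print(name, value)
```

On this instance the two rating-10 sailors aged 35.0 count twice under AVG but once under AVG (DISTINCT ...), so the two averages differ (40.0 versus 42.5).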
    [Figure: the relation is partitioned into Group 1, Group 2, and Group 3; an aggregator is applied to each group, producing one value per group (12, 9, and 11 in the figure)]

Queries With GROUP BY and HAVING
    SELECT   [DISTINCT] target-list
    FROM     relation-list
    WHERE    qualification
    GROUP BY grouping-list
    HAVING   group-qualification
The target-list may include aggregate terms such as MIN(Attribute).
    [Figure: SELECT, FROM, and WHERE output a table; GROUP BY partitions it into Group 1, Group 2, and Group 3, each with its own aggregator; HAVING is a qualifier selecting groups]

Find the age of the youngest sailor with age ≥ 18, for each rating with at least 2 such sailors
    SELECT   S.rating, MIN (S.age)
    FROM     Sailors S
    WHERE    S.age >= 18
    GROUP BY S.rating
    HAVING   COUNT (*) > 1

Input relation: Sailors
    sid  sname    rating  age
    22   dustin   7       45.0
    31   lubber   8       55.5
    71   zorba    10      16.0   <- Disqualify
    64   horatio  7       35.0
    29   brutus   1       33.0
    58   rusty    10      35.0

Only S.rating and S.age are mentioned in SELECT:
    rating  age
    1       33.0
    7       45.0
    7       35.0
    8       55.5
    10      35.0
4 rating groups. Only one group satisfies HAVING.

Answer
    rating  age
    7       35.0

Summary
• SQL was an important factor in the early acceptance of the relational model; more natural than earlier, procedural query languages.
• Relationally complete; in fact, significantly more expressive power than relational algebra.
• Even queries that can be expressed in RA can often be expressed more naturally in SQL.
• Many alternative ways to write a query; optimizer should look for most efficient evaluation plan. In practice, users need to be aware of how queries are optimized and evaluated for best results.
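As a check on the GROUP BY and HAVING example from earlier in this review, the sketch below loads the six-row Sailors instance from the slides into an in-memory database and runs the query. It assumes Python's standard sqlite3 module as the engine. The helper name youngest_per_rating, the second query that exposes the per-group counts, and the ORDER BY in that query are additions made here for illustration.

```python
import sqlite3

# Sailors instance from the GROUP BY / HAVING slide.
SAILORS = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0),
    (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0),
]


def youngest_per_rating(sailors):
    """Return (groups before HAVING, final answer) for the slide's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
    conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", sailors)

    # Illustration only: what each rating group looks like after WHERE,
    # before HAVING removes the groups with a single member.
    groups = conn.execute(
        "SELECT S.rating, COUNT(*), MIN(S.age) FROM Sailors S "
        "WHERE S.age >= 18 GROUP BY S.rating ORDER BY S.rating"
    ).fetchall()

    # The query from the slide.
    answer = conn.execute(
        "SELECT S.rating, MIN(S.age) FROM Sailors S "
        "WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1"
    ).fetchall()
    conn.close()
    return groups, answer


groups, answer = youngest_per_rating(SAILORS)
print(groups)
print(answer)
```

WHERE runs before grouping, so zorba (age 16.0) is dropped first and rating 10 is left with one member; HAVING then filters whole groups, not individual rows.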