SQL CS 186, Spring 2007, Lecture 7 R&G, Chapter 5 Mary Roth The important thing is not to stop questioning. Albert Einstein Life is just a bowl of queries. -Anon (not Forrest Gump) Administrivia • Homework 1 due Thursday, Feb 8 10 p.m. • Source code for diskmgr and global are available on class web site • Coming up: – Homework 2 handed out Feb 13 – Midterm 1: in class February 22 Questions? • Review • Query languages provide 2 key advantages: – Less work for user asking query – More opportunities for optimization • Algebra and safe calculus are simple and powerful models for query languages for relational model – Have same expressive power – Algebra is more operational; calculus is more declarative • SQL can express every query that is expressible in relational algebra/calculus. (and more) Review: Where have we been? Theory Relational Calculus Practice Lecture 6 Query Optimization and Execution Relational Operators Relational Algebra Relational Model Lecture 5 Files and Access Methods Lectures 3 &4 Buffer Management Disk Space Management Lecture 2 DB Where are we going next? This week After the midterm Practice Query Optimization and Execution Relational Operators SQL Next week Files and Access Methods Buffer Management Disk Space Management DB Review: Relational Calculus Example Find names, ages and reservation dates of sailors rated > 7 who’ve reserved boat #103 3 quantifiers, but only 1 is free. The free quantifier defines the shape of the result. sid S 22 S 31 S 58 R {S1 | SSailors S.rating > 7 R(RReserves R.bid = 103 R.sid = S.sid) (S1.sname = S.name S1.age = S.age S1.day = R.day)} R sname rating age dustin 7 45.0 lubber 8 55.5 rusty 10 35.0 sid 22 58 bid 101 103 sname S1 day 10/10/96 11/12/96 age day rusty 35.0 11/12/96 Review: The SQL Query Language • The most widely used relational query language. • Standardized (although most systems add their own “special sauce” -- including PostgreSQL) • We will study SQL92 -- a basic subset Review: SQL • Two sublanguages: – DDL – Data Definition Language • Define and modify schema (at all 3 levels) – DML – Data Manipulation Language • Queries and IUD (insert update delete) • DBMS is responsible for efficient evaluation. – Relational completeness means we can define precise semantics for relational queries. – Optimizer can re-order operations, without affecting query answer. – Choices driven by “cost model” Review: DDL CREATE TABLE Sailors (sid INTEGER,NOT NULL, sname CHAR(20), rating INTEGER, age REAL, PRIMARY KEY sid) CREATE TABLE Boats (bid INTEGER,NOT NULL, bname CHAR (20), color CHAR(10) PRIMARY KEY bid) CREATE TABLE Reserves (sid INTEGER, NOT NULL, bid INTEGER, NOT NULL, day DATE,NOT NULL, PRIMARY KEY (sid, bid, day), FOREIGN KEY sid REFERENCES Sailors, FOREIGN KEY bid REFERENCES Boats) Sailors sid sname rating age 1 Frodo 7 22 2 Bilbo 2 39 3 Sam 8 27 Boats bid bname color 101 Nina red 102 Pinta blue 103 Santa Maria red Reserves sid bid day 1 102 9/12 2 102 9/13 Integrity Constraints (ICs) • A foreign key constraint is an Integrity Constraint: – a condition that must be true for any instance of the database; – Specified when schema is defined. – Checked when relations are modified. • Primary/foreign key constraints; but databases support more general constraints as well. – e.g. domain constraints like: • Rating must be between 1 and 10 ALTER TABLE SAILORS ADD CONSTRAINT RATING CHECK (RATING >= 1 AND RATING < 10) • Or even more complex (and potentially nonsensical): ALTER TABLE SAILORS ADD CONSTRAINT RATING CHECK (RATING*AGE/4 <= SID) DBMSs have fairly sophisticated support for constraints! • Specify them on CREATE or ALTER TABLE statements • Column Constraints: expressions for column constraint must produce boolean results and reference the related column’s value only. NOT NULL | NULL | UNIQUE | PRIMARY KEY | CHECK ( expression) FOREIGN KEY (column) referenced_table [ ON DELETE action ] [ ON UPDATE action ] } action is one of: NO ACTION, CASCADE, SET NULL, SET DEFAULT DBMSs have fairly sophisticated support for constraints! • Table Constraints: UNIQUE ( column_name [, ... ] ) PRIMARY KEY ( column_name [, ... ] ) | CHECK ( expression ) | FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ON DELETE action ] [ ON UPDATE action ] } Here, expressions, keys, etc can include multiple columns Integrity Constraints can help prevent data consistency errors • …but they have drawbacks: – Expensive – Can’t always return a meaningful error back to the application. e.g: What if you saw this error when you enrolled in a course online? “A violation of the constraint imposed by a unique index or a unique constraint occurred”. – Can be inconvenient e.g. What if the ‘Sailing Class’ application wants to register new (unrated) sailors with rating 0? • So they aren’t widely used – Software developers often prefer to keep the integrity logic in applications instead Intermission SQL DML • DML includes 4 main statements: SELECT (query), INSERT, UPDATE and DELETE We’ll spend a lot of time on this one • e.g: To find the names of all 19 year old students: PROJECT SELECT S.name FROM Students S WHERE S.age=19 SELECT sid name login age gpa 53666 Jones jones@cs 18 3.4 53688 Smith smith@ee 18 3.2 53650 Smith smith@math 19 3.8 Querying Multiple Relations • Can specify a join over two tables as follows: SELECT S.name, E.cid PROJECT SELECT FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B' JOIN sid 53831 53831 53650 53666 cid grade Carnatic101 C Reggae203 B Topology112 A History105 B sid name 53666 Jones login 18 3.4 53688 Smith smith@ee 18 3.2 result = jones@cs age gpa S.name Jones E.cid History105 Basic SQL Query DISTINCT: optional keyword indicating target-list : A list of attributes answer should not contain duplicates. of tables in relation-list In SQL, default is that duplicates are not eliminated! (Result is called a “multiset”) SELECT [DISTINCT] target-list FROM relation-list WHERE qualification qualification : Comparisons combined using AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of ,,,, etc. relation-list : A list of relation names, possibly with a rangevariable after each name Query Semantics • Semantics of an SQL query are defined in terms of the following conceptual evaluation strategy: 1. FROM clause: compute cross-product of all tables 2. WHERE clause: Check conditions, discard tuples that fail. (called “selection”). 3. SELECT clause: Delete unwanted fields. (called “projection”). 4. If DISTINCT specified, eliminate duplicate rows. • Probably the least efficient way to compute a query! – An optimizer will find more efficient strategies to get the same answer. Query Semantics Example SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND bid=103 Boats Sailors sid sname rating age 1 Frodo 7 22 2 Bilbo 2 39 3 Sam 8 27 bid bname color 101 Nina red 102 Pinta blue 103 Santa Maria red X Reserves sid bid day 1 102 9/12 2 103 9/13 Step 1: Compute the cross product Sailors Reserves sid sname rating age sid bid day 1 Frodo 7 22 1 102 9/12 2 Bilbo 2 39 2 103 9/13 3 Sam 8 27 ... SailorsXReserves sid sname rating age sid bid day 1 Frodo 7 22 1 102 9/12 1 Frodo 7 22 2 103 9/13 2 Bilbo 2 39 1 102 9/12 2 Bilbo 2 39 2 103 9/13 3 Sam 8 27 1 103 9/12 3 Sam 8 27 2 103 9/13 Step 1: How big? Sailors Reserves sid sname rating age sid bid day 1 Frodo 7 22 1 102 9/12 2 Bilbo 2 39 2 103 9/13 3 Sam 8 27 Question: If |S| is cardinality of Sailors, and |R| is cardinality of Reserves, What is the cardinality of Sailors X Reserves? Answer: |S| * |R| |Sailors X Reserves| = 3X2 = 6 Step 2: Check conditions in where clause SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND bid=103 SailorsXReserves sid sname rating age sid bid day 1 Frodo 7 22 1 102 9/12 1 Frodo 7 22 2 103 9/13 2 Bilbo 2 39 1 102 9/12 2 Bilbo 2 39 2 103 9/13 3 Sam 8 27 1 102 9/12 3 Sam 8 27 2 103 9/13 Step 3: Delete unwanted fields SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND bid=103 SailorsXReserves sid sname rating age sid bid day 1 Frodo 7 22 1 102 9/12 1 Frodo 7 22 2 103 9/13 2 Bilbo 2 39 1 102 9/12 2 Bilbo 2 39 2 103 9/13 3 Sam 8 27 1 102 9/12 3 Sam 8 27 2 103 9/13 Range Variables •Used for short hand •Needed when ambiguity could arise e.g two tables with the same column name: SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND Reserves.bid=103 SELECT sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103 Question: do range variables remind you of anything? Variables in relational calculus Sometimes you need a range variable e.g a Self-join: SELECT R1.bid, R1.date FROM Reserves R1, Reserves R2 WHERE R1.bid = R2.bid and R1.date = R2.date and R1.sid != R2.sid R1 day 103 9/12 103 9/12 Reserves Reserves R1 R1 bid sid bid day 1 102 9/12 3 103 9/12 4 103 9/13 2 103 9/12 R2 R2 R2 R2 sid bid day 1 102 9/12 3 103 9/12 4 103 9/13 2 103 9/12 Sometimes you need a range variable SELECT R1.bid, R1.day FROM Reserves R1, Reserves R2 WHERE R1.bid = R2.bid and R1.day = R2.day and R1.sid != R2.sid bid day 103 9/12 103 9/12 What are we computing? Boats reserved on the same day by different sailors SELECT Clause Expressions • Can use “*” if you want all columns: SELECT * FROM Sailors x WHERE x.age > 20 • Can use arithmetic expressions (add other operations we’ll discuss later) SELECT S.age, S.age-5 AS age1, 2*S.age AS age2 FROM Sailors S WHERE S.sname = ‘Dustin’ • Can use AS to provide column names SELECT S1.sname AS name1, S2.sname AS name2 FROM Sailors S1, Sailors S2 WHERE 2*S1.rating = S2.rating - 1 WHERE Clause Expressions • Can also have expressions in WHERE clause: SELECT S1.sname AS name1, S2.sname AS name2 FROM Sailors S1, Sailors S2 WHERE 2*S1.rating = S2.rating - 1 •“LIKE” is used for string matching. SELECT S.age, S.age-5 AS age1, 2*S.age AS age2 FROM Sailors S WHERE S.sname LIKE ‘B_l%o’ `_’ stands for any one character and `%’ stands for 0 or more arbitrary characters. SELECT DISTINCT Sailors Reserves sid sname rating age sid bid day 1 Frodo 7 22 1 102 9/12 2 Bilbo 2 39 2 103 9/12 3 Sam 8 27 2 102 9/13 SELECT DISTINCT S.sid FROM Sailors S, Reserves R WHERE S.sid=R.sid sid Find sailors that have reserved at least one boat 1 2 SELECT DISTINCT • How about: SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid=R.sid sid 1 2 2 Sailors SELECT DISTINCT How about: SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid vs: SELECT DISTINCT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid sid sname rating age sname Frodo Bilbo Bilbo 1 Frodo 7 22 2 Bilbo 2 39 3 Sam 8 27 4 Bilbo 5 32 Reserves sid bid day 1 102 9/12 2 103 9/13 4 105 9/13 sname Frodo Bilbo Do we find all sailors that reserved at least one boat? ANDs, ORs, UNIONs and INTERSECTs Find sids of sailors who’ve reserved a red or a green boat SELECT R.sid FROM Boats B,Reserves R WHERE(B.color=‘red’ OR B.color=‘green’) AND R.bid=B.bid sid 2 Sailors sid sname rating age X 1 Frodo 7 22 2 Bilbo 2 39 3 Sam 8 27 4 Boats bid bname color 101 Nina red 102 Pinta blue 103 Santa Maria red 105 Titanic green Reserves sid bid day 1 102 9/12 2 103 9/13 4 105 9/13 ANDs and ORs Find sids of sailors who’ve reserved a red and a green boat Boats SELECT R.sid bid bname FROM Boats B,Reserves R 101 Nina WHERE(B.color=‘red’ AND 102 Pinta B.color=‘green’) 103 Santa Maria AND R.bid=B.bid X 105 Sailors sid sname rating age 1 Frodo 7 22 2 Bilbo 2 39 3 Sam 8 27 Titanic color red blue red green Reserves sid bid day 1 101 9/12 2 103 9/13 1 105 9/13 Use INTERSECT instead of AND SELECT R.sid FROM Boats B,Reserves R WHERE B.color = ‘red’ AND R.bid=B.bid INTERSECT SELECT R.sid FROM Boats B,Reserves R WHERE B.color = ‘green’ AND R.bid=B.bid sid 1 2 sid 1 = Exercise: try to rewrite this query using a self join instead of INTERSECT! Boats bid bname color 101 Nina red 102 Pinta blue 103 Santa Maria red 105 Titanic green Reserves sid bid day sid 1 101 9/12 1 2 103 9/13 1 105 9/13 Could also use UNION for the OR query SELECT R.sid FROM Boats B, Reserves R WHERE B.color = ‘red’ AND R.bid=B.bid UNION SELECT R.sid FROM Boats B, Reserves R WHERE B.color = ‘green’ AND R.bid=B.bid sid 2 sid 4 = Boats bid bname color 101 Nina red 102 Pinta blue 103 Santa Maria red 105 Titanic green Reserves sid sid bid day 2 1 102 9/12 4 2 103 9/13 4 105 9/13 EXCEPT: Set Difference Find sids of sailors who have not reserved a boat SELECT S.sid FROM Sailors S EXCEPT SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid=R.sid First find the set of sailors who have reserved a boat… and then compare it with the rest of the sailors Sailors sid sname rating age 1 Frodo 7 22 2 Bilbo 2 39 3 Sam 8 27 Reserves sid bid day 1 102 9/12 2 103 9/13 1 105 9/13 sid 3