CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Today Mapping relational algebra to SQL Complex SQL Queries See “movies database sql queries” on class webpage for the actual queries SQL Formal Semantics Advanced SQL Features SQL Query Examples Movie(title, year, length, inColor, studioName, producerC#) StarsIn(movieTitle, movieYear, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#) Queries: Producer with maximum average length of movies Find producer of Star Wars. All producers of movies in which harrison ford stars SQL Query Examples Movie(title, year, length, inColor, studioName, producerC#) StarsIn(movieTitle, movieYear, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#) Queries: Find movie titles that appear more than once Find number of people 3 hops away from Kevin Bacon More SQL Select * into temptable from X1, … Having WHERE is to FROM what HAVING is to GROUPBY Duplicates By definition, relations are sets Problem: Not practical to remove duplicates after every operation Why ? So… So No duplicates allowed SQL by default does not remove duplicates SQL follows bag semantics, not set semantics Implicitly we keep count of number of copies of each tuple Formal Semantics of SQL RA can only express SELECT DISTINCT queries • To express SQL, must extend RA to a bag algebra Bags (aka: multisets) like sets, but can have duplicates e.g: {5, 3, 3} e.g: homes = cname ccity Johnson Smith Johnson Smith Brighton Perry Brighton R.H. • Next: will define RA*: a bag version of RA Formal Semantics of SQL: RA* *p (r): preserves copies in r e.g: *city = Brighton (homes) = cname Johnson Johnson ccity Brighton Brighton 2. *A1, …, An (r): no duplicate elimination e.g: *cname (homes) = cname Johnson Smith Johnson Smith Formal Semantics of SQL: RA* r * s : additive union A 1 1 2 B α α β A B 2 β 3 α 1 α * r 4. r -* s: e.g: r -* s = = s A B 1 α 1 α 2 β 2 β 3 α 1 α bag difference A B 1 α s -* r = A B 3 α Formal Semantics of SQL: RA* r * s: A 1 1 2 cartesian product B α α β * C + - = A 1 1 1 1 2 2 B α α α α β β C + + + - Formal Semantics of SQL Query: SELECT FROM WHERE a1, ….., an r1, ….., rm p Semantics: *A1, …, An (*p (r1 * … * rm) ) Query: SELECT DISTINCT FROM WHERE (1) a1, ….., an r1, ….., rm p Semantics: What is the only operator to change in (1)? A1, …, An (*p (r1 * … * rm) ) (2) Set/Bag Operations Revisited Set Operations UNION INTERSECT EXCEPT ≡ U ≡ ∩ ≡ - Bag Operations UNION ALL INTERSECT ALL EXCEPT ALL ≡ U* ≡ ∩* ≡ -* Duplicate Counting: Given m copies of t in r, n copies of t in s, how many copies of t in: r UNION ALL s? A: m + n r INTERSECT ALL s? A: min (m, n) r EXCEPT ALL s? A: max (0, m-n) SQL: Summary Clause Eval Order SELECT [(DISTINCT)] FROM WHERE INTO GROUP BY HAVING ORDER BY 4 1 2 7 3 5 6 AS UNION ALL UNION 8 (similarly intersection, except) Semantics (RA/RA*) (or *) * * Extended relational operator g * Can’t express: requires ordered sets, bags r U* U Next Database updates Modification of the Database – Deletion Delete all account records at the Perryridge branch delete from account where branch-name = ‘Perryridge’ Delete all accounts at every branch located in Needham city. delete from account where branch-name in (select branch-name from branch where branch-city = ‘Needham’) delete from depositor where account-number in (select account-number from branch, account where branch-city = ‘Needham’ and branch.branch-name = account.branch-name) Example Query Delete the record of all accounts with balances below the average at the bank. delete from account where balance < (select avg (balance) from account) Problem: as we delete tuples from deposit, the average balance changes Solution used in SQL: 1. First, compute avg balance and find all tuples to delete 2. Next, delete all tuples found above (without recomputing avg or retesting the tuples) Modification of the Database – Insertion Add a new tuple to account insert into account values (‘A-9732’, ‘Perryridge’,1200) or equivalently insert into account (branch-name, balance, accountnumber) values (‘Perryridge’, 1200, ‘A-9732’) Add a new tuple to account with balance set to null insert into account values (‘A-777’,‘Perryridge’, null) Modification of the Database – Updates Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%. Write two update statements: update account set balance = balance 1.06 where balance > 10000 update account set balance = balance 1.05 where balance 10000 The order is important Can be done better using the case statement