CS 440 Database Management Systems
Lecture 2: Review of Relational Model and SQL

Announcements
• We have posted the first assignment.
  – Written assignment
  – Due on January 14th
• Practice questions with solutions will be posted today after class.
  – The textbook website has solutions for some of its problems.
• Piazza is up and running.
  – It is the preferred way of communicating with the course staff.
  – If you have not been invited, email our TA: subraman@onid.oregonstate.edu

Relational Database Management
  Conceptual Design    Entity Relationship (ER) Model
  Schema               Relational Model
  Physical Layer       Files and Indexes
• Focus of this lecture: Relational Model & SQL

Relational Model
• The relational model defines:
  – a way of organizing data: relations
  – operations to query and/or manipulate the data
• Much easier to use than procedural languages.
  – Say what you want instead of how to get it.
• Everything is a relation.
  – Both the data and the result of a query

Relation: example
• Relation name: Book
• Attribute names: Title, Price, Category, Year
• Tuples:
  Title          Price    Category  Year
  MySQL          $102.1   computer  2001
  Cell biology   $201.69  biology   1954
  French cinema  $53.99   art       2002
  NBA History    $63.65   sport     2010

Relation
• Attributes
  – Atomic values
  – Atomic types: string, integer, real, date, …
• Each relation must have keys.
  – Attributes without duplicate values
  – A relation does not contain duplicate tuples.
• Reordering tuples does not change the relation.
• Reordering attributes does not change the relation.

Database Schema vs. Database Instance
• Schema of a relation
  – Name of the relation and names of its attributes
  – E.g.: Person(Name, Address, SSN)
  – Types of the attributes
  – Constraints on the values of the attributes
• Schema of the database
  – Set of relation schemata
  – E.g.: Person(Name, Address, SSN), Employment(Company, SSN)

Database Schema vs. Database Instance
• Schema: Book(Title, Price, Category, Year)
• Instance:
  Title          Price    Category  Year
  MySQL          $102.1   computer  2001
  Cell biology   $201.69  biology   1954
  French cinema  $53.99   art       2002
  NBA History    $63.65   sport     2010

Relational algebra: operations on relations
• Basic operations:
  – Selection (σ): selects a subset of rows from a relation.
  – Projection (π): deletes unwanted columns from a relation.
  – Cross-product (×): allows us to combine two relations.
  – Set-difference (−): tuples in relation 1, but not in relation 2.
  – Union (∪): tuples in relation 1 or in relation 2.
• Additional operations:
  – Intersection, join, …: not essential, but (very!) useful.
• Since each operation returns a relation, operations can be composed. (The algebra is "closed".)

Example Schema
  Beers(name, manf)
  Bars(name, addr, license)
  Drinkers(name, addr, phone)
  Likes(drinker, beer)
  Sells(bar, beer, price)
  Frequents(drinker, bar)

Projection
• Deletes attributes that are not in the projection list.
• The schema of the result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.

  Likes
  drinker  beer
  John     Bud Lite
  Alice    Bud
  Smith    Bud
  Ron      Bud Lite

  π drinker(Likes)
  drinker
  John
  Alice
  Smith
  Ron

Selection
• Selects the rows that satisfy the selection condition.
• The schema of the result is identical to the schema of the (only) input relation.
• The result relation can be the input for another relational algebra operation! (Operator composition.)

  Sells
  bar        beer  price
  Blind pig  Bud   9
  Quality    Bud   15

  σ price<10(Sells)
  bar        beer  price
  Blind pig  Bud   9

Union, Intersection, Set-Difference
• All of these operations take two input relations, which must be union-compatible:
  – Same number of fields.
  – "Corresponding" fields have the same type.
• What is the schema of the result?
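
The basic operators above can be sketched in a few lines of Python. This is only an illustrative model, not how a DBMS implements them: the `Relation` type and the functions `select`, `project`, `union` and `difference` are hypothetical names chosen for this sketch, a relation is assumed to be a tuple of attribute names plus a set of rows, and union-compatibility is checked only by number of fields (types are not checked). For the schema of a union or difference, the sketch assumes the common convention of reusing the attribute names of the first input.

```python
from typing import Callable, NamedTuple


class Relation(NamedTuple):
    attrs: tuple      # attribute names, in order
    rows: frozenset   # a set of tuples, so duplicate tuples cannot occur


def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """Selection: keep the rows for which pred(row as a dict) is true."""
    kept = frozenset(t for t in r.rows if pred(dict(zip(r.attrs, t))))
    return Relation(r.attrs, kept)


def project(r: Relation, attrs: list) -> Relation:
    """Projection: keep only the listed attributes (set semantics drops duplicates)."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.rows))


def _check_compatible(r: Relation, s: Relation) -> None:
    # Simplification: only the number of fields is compared, not their types.
    if len(r.attrs) != len(s.attrs):
        raise ValueError("relations are not union-compatible")


def union(r: Relation, s: Relation) -> Relation:
    """Union: tuples in r or in s; attribute names taken from r (assumed convention)."""
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows | s.rows)


def difference(r: Relation, s: Relation) -> Relation:
    """Set-difference: tuples in r but not in s."""
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows - s.rows)


# The Sells and Likes instances from the slides above.
sells = Relation(("bar", "beer", "price"),
                 frozenset({("Blind pig", "Bud", 9), ("Quality", "Bud", 15)}))
likes = Relation(("drinker", "beer"),
                 frozenset({("John", "Bud Lite"), ("Alice", "Bud"),
                            ("Smith", "Bud"), ("Ron", "Bud Lite")}))

cheap = select(sells, lambda t: t["price"] < 10)   # selection with price < 10
cheap_bars = project(cheap, ["bar"])               # composition: project the selection
liked_beers = project(likes, ["beer"])             # duplicates collapse under set semantics
sold_beers = project(sells, ["beer"])
all_beers = union(liked_beers, sold_beers)
liked_not_sold = difference(liked_beers, sold_beers)

print(sorted(cheap_bars.rows), sorted(all_beers.rows), sorted(liked_not_sold.rows))
```

Because every function returns a `Relation`, the output of one call can be passed straight into another, which is what "closed" means in practice.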

Cross-Product
• Each row of the first relation is paired with each row of the second relation.
• The result schema has one field per field of each input relation.

  Bars
  name       addr     license
  Blind pig  1st St.  201
  Quality    2nd St.  302

  Frequents
  drinker  bar
  John     Blind pig
  Alice    Quality

  Bars × Frequents
  name       addr     license  drinker  bar
  Blind pig  1st St.  201      John     Blind pig
  Quality    2nd St.  302      Alice    Quality
  Blind pig  1st St.  201      Alice    Quality
  Quality    2nd St.  302      John     Blind pig

Joins
• R ⋈_c S = σ_c(R × S)

  Bars
  name       addr     license
  Blind pig  1st St.  201
  Quality    2nd St.  302

  Frequents
  drinker  bar
  John     Blind pig
  Alice    Quality

  Bars ⋈_{Bars.name = Frequents.bar} Frequents
  name       addr     license  drinker  bar
  Blind pig  1st St.  201      John     Blind pig
  Quality    2nd St.  302      Alice    Quality

Joins
• The result schema is the same as that of the cross-product.
• The result has fewer tuples than the cross-product, so it might be possible to compute it more efficiently.
• If the condition is an equality, the join is called an equijoin.
• Natural join: equijoin on all common fields.

SQL
• A declarative language for querying data stored in relational databases
  – Implements relational algebra with slight modifications.
• Many standards: SQL92, SQL99, …
  – We focus on the core functionalities.

The Basic Form
    SELECT  returned attribute(s)      (one or more)
    FROM    relation(s)
    WHERE   conditions on the tuples of the relation(s)
1. Apply the WHERE clause's conditions to the tuples of the relations in the FROM clause.
2. Return the values of the attributes in the SELECT clause.

Single Relation Query
• What beers are made by Anheuser-Busch?

  Beers
  name      manf
  Bud       Anheuser-Busch
  Bud Lite  Anheuser-Busch
  Bud 2.0   Adams

    SELECT name
    FROM Beers
    WHERE manf = 'Anheuser-Busch';

  name
  Bud
  Bud Lite

Using *
• What beers are made by Anheuser-Busch?
    SELECT *
    FROM Beers
    WHERE manf = 'Anheuser-Busch';

  name      manf
  Bud       Anheuser-Busch
  Bud Lite  Anheuser-Busch

WHERE clause
• May have complex conditions
• Logical operators: OR, AND, NOT
• Comparison operators: <, >, =, <>, …
• Type-specific operators: LIKE, …

Null Values
• Some tuples may not contain any value for some of their attributes.
  – The operator did not enter the data.
  – The operator did not know the value.
  – …
• Ex: We do not know Fred's salary.
  – Putting 0.0 would be wrong: Fred is not on unpaid leave!
• Databases use the null value for these cases.

A value not like any other value!
• A tuple in the Sells relation:
  bar      beer  price
  Joe Bar  Bud   NULL

    SELECT *
    FROM Sells
    WHERE price < 0.0 OR price >= 0.0;

• Does not return Joe Bar: a comparison with NULL evaluates to unknown rather than true, so the tuple fails the WHERE condition.

A value not like any other value!
• A tuple in the Sells relation:
  bar      beer  price
  Joe Bar  Bud   NULL

    SELECT *
    FROM Sells
    WHERE price IS NULL;

• Use IS NULL to test for a missing value; this query returns Joe Bar.

Multi Relation Query: Join
• Find relations between different types of entities: these have more business value!
• Ex: Using relations Likes(drinker, beer) and Frequents(drinker, bar), find the beers liked by at least one person who frequents Joe Bar.

    SELECT Likes.beer
    FROM Likes, Frequents
    WHERE Frequents.bar = 'Joe Bar'
      AND Frequents.drinker = Likes.drinker;

Join Queries
• They generally require processing a large number of tuples, which is time consuming.
• Relational Database Management Systems (RDBMS) have ways to process them efficiently.
  – We talk more about this later in the course.

Subqueries
• SQL queries that appear in the WHERE or FROM parts of another query.
• Example: Using Sells(bar, beer, price), find the bars that serve Miller for the same price Joe Bar charges for Bud.
  – Figure out Joe's price for Bud: JoePrice
  – Find the bars that offer Miller at price = JoePrice

Subqueries
    SELECT bar
    FROM Sells
    WHERE beer = 'Miller'
      AND price = (SELECT price
                   FROM Sells
                   WHERE bar = 'Joe Bar' AND beer = 'Bud');
• The parenthesized inner SELECT is the subquery.

Subqueries: ALL, ANY
• We would like to compare a value to a set of values.
• Example: Using Sells(bar, beer, price), find the bars that serve Miller for a cheaper price than the price that every bar charges for Bud.
  – Figure out the set of all prices for Bud: BudPrice.
  – Find the bars that offer Miller at a cheaper price than all values in BudPrice.

Subqueries: ALL, ANY
    SELECT bar
    FROM Sells
    WHERE beer = 'Miller'
      AND price < ALL (SELECT price
                       FROM Sells
                       WHERE beer = 'Bud');
• The parenthesized inner SELECT is the subquery.
• What if we use ANY instead of ALL?
  – It returns the bars that serve Miller for a cheaper price than the price that at least one bar charges for Bud.

Subqueries: IN
• We would like to check if the result of a subquery contains a particular value.
• Example: Using Beers(name, manf) and Likes(drinker, beer), find the manf of each beer John likes.

    SELECT manf
    FROM Beers
    WHERE name IN (SELECT beer
                   FROM Likes
                   WHERE drinker = 'John');
• The subquery returns a set of beers.

Subqueries: EXISTS
• We would like to check if a subquery has any result.
• Example: Using Beers(name, manf), find the beers that are the only beer made by their manufacturers.

    SELECT name
    FROM Beers b1
    WHERE NOT EXISTS (SELECT *
                      FROM Beers
                      WHERE manf = b1.manf AND name <> b1.name);

Bag versus Set
• Duplicates are allowed in bags.
  – {a, a, b, b, b} vs. {a, b}
• Generally, the results of SQL queries are bags.

  Beers
  name      manf
  Bud       Anheuser-Busch
  Bud Lite  Anheuser-Busch
  Bud       B-company

    SELECT name
    FROM Beers;

  name
  Bud
  Bud Lite
  Bud

Removing Duplicates
• Use DISTINCT.

  Beers
  name      manf
  Bud       Anheuser-Busch
  Bud Lite  Anheuser-Busch
  Bud       B-company

    SELECT DISTINCT name
    FROM Beers;

  name
  Bud
  Bud Lite

Set Operations
• R UNION S
  – Returns the tuples found in relation R or in relation S.
• R INTERSECT S
  – Returns the tuples common to relation R and relation S.
• R EXCEPT S
  – Returns the tuples found in relation R but not in relation S.

Set Operations: Example
• Using relations Likes(drinker, beer), Sells(bar, beer, price), and Frequents(drinker, bar), find the drinkers and beers such that
  – the drinker likes the beer, and
  – the drinker frequents at least one bar that sells the beer.
• The "and" shows that we should compute an intersection.

Set Operations: Example
    (SELECT * FROM Likes)                 -- the drinker likes the beer
    INTERSECT
    (SELECT drinker, beer                 -- the drinker frequents a bar
     FROM Sells, Frequents                -- that sells the beer
     WHERE Frequents.bar = Sells.bar);

Set Operations
• The results of set operations in SQL do not have any duplicate tuples.
• Adding ALL forces them to keep duplicates:
  – … INTERSECT … becomes … INTERSECT ALL …
  – … UNION … becomes … UNION ALL …
  – … EXCEPT … becomes … EXCEPT ALL …
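
The duplicate handling described above, along with the NULL and join behavior from earlier slides, can be checked with a small script. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for the course's database system. The sample rows are hypothetical and chosen only for illustration, and `rows` is a helper name made up for this sketch. Two SQLite-specific points are worth noting as assumptions of this setup: SQLite does not accept parentheses around the operands of INTERSECT, so the intersection query is written without them, and of the ALL variants it provides only UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Likes(drinker TEXT, beer TEXT);
    CREATE TABLE Frequents(drinker TEXT, bar TEXT);
    CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL);
    -- Hypothetical sample rows, not taken from the lecture.
    INSERT INTO Likes VALUES ('John', 'Bud'), ('Alice', 'Bud Lite'), ('John', 'Miller');
    INSERT INTO Frequents VALUES ('John', 'Joe Bar'), ('Alice', 'Quality');
    INSERT INTO Sells VALUES ('Joe Bar', 'Bud', NULL), ('Quality', 'Bud', 15),
                             ('Joe Bar', 'Miller', 9);
""")


def rows(sql):
    """Run a query and return its rows sorted, so results are deterministic."""
    return sorted(conn.execute(sql).fetchall())


# NULL: the row with a NULL price satisfies neither comparison.
priced = rows("SELECT bar, beer FROM Sells WHERE price < 0.0 OR price >= 0.0")
unpriced = rows("SELECT bar, beer FROM Sells WHERE price IS NULL")

# Join: beers liked by at least one person who frequents Joe Bar.
joe_beers = rows("""
    SELECT Likes.beer
    FROM Likes, Frequents
    WHERE Frequents.bar = 'Joe Bar' AND Frequents.drinker = Likes.drinker
""")

# Bag versus set: a plain SELECT keeps duplicates, DISTINCT removes them.
bag = rows("SELECT beer FROM Sells")
distinct = rows("SELECT DISTINCT beer FROM Sells")

# Intersection: the drinker likes the beer AND frequents a bar that sells it.
likes_and_can_buy = rows("""
    SELECT drinker, beer FROM Likes
    INTERSECT
    SELECT drinker, beer FROM Sells, Frequents WHERE Frequents.bar = Sells.bar
""")

# UNION removes duplicates, UNION ALL keeps them.
union_set = rows("SELECT beer FROM Likes UNION SELECT beer FROM Sells")
union_bag = rows("SELECT beer FROM Likes UNION ALL SELECT beer FROM Sells")

for name in ("priced", "unpriced", "joe_beers", "bag", "distinct",
             "likes_and_can_buy", "union_set", "union_bag"):
    print(name, globals()[name])
```

On a system that supports the full set of operators, the same tables could be used to try INTERSECT ALL, EXCEPT ALL, and the ALL/ANY subquery comparisons.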