COP5725 Advanced Database Systems
Spring 2016
DB Fundamentals
Tallahassee, Florida, 2016

What are Database Management Systems
A DBMS is a system for providing EFFICIENT, CONVENIENT, and SAFE MULTI-USER storage of and access to MASSIVE amounts of PERSISTENT data.

Example: Banking System
• Data
  – Information on accounts, customers, balances, current interest rates, transaction histories, etc.
• MASSIVE
  – Many gigabytes at a minimum for big banks, more if we keep a history of all transactions, even more if we keep images of checks
  -> Far too big to fit in main memory
• PERSISTENT
  – Data outlives the programs that operate on it

Example: Banking System
• SAFE:
  – from system failures
  – from malicious users
• CONVENIENT:
  – simple commands to debit an account, get a balance, write a statement, transfer funds, etc.
  – unpredicted queries should also be easy
• EFFICIENT:
  – don't search all files in order to get the balance of one account, get all accounts with low balances, get large transactions, etc.
  – massive data! -> DBMSs are carefully tuned for performance

Multi-user Access
• Many people/programs access the same database, or even the same data, simultaneously -> Need careful controls
  – Alex @ ATM1: withdraw $100 from account #007
      get balance from database;
      if balance >= 100 then balance := balance - 100; dispense cash;
      put new balance into database;
  – Bob @ ATM2: withdraw $50 from account #007
      get balance from database;
      if balance >= 50 then balance := balance - 50; dispense cash;
      put new balance into database;
  – Initial balance = 120. Final balance = ??

Why File Systems Won't Work
• Storing data: the file system is limited
  – size is limited by disk or address space
  – when the system crashes we may lose data
  – password/file-based authorization is insufficient
• Query/update:
  – need to write a new C++/Java program for every new query
  – need to worry about performance
• Concurrency: limited protection
  – need to worry about interfering with other users
  – need to offer different views to different users (e.g., registrar, students, professors)
• Schema change:
  – entails changing file formats
  – need to rewrite virtually all applications
These problems are what motivated the notion of a DBMS!

DBMS Architecture
[Figure: DBMS architecture. Users, web forms, applications, and the DBA submit queries, transactions, and DDL commands. Components shown: Query Parser, Query Rewriter, Query Optimizer, Query Executor; Transaction Manager, Concurrency Control (lock tables), Logging & Recovery; DDL Processor; records and indexes; Buffer Manager (main-memory buffer: data, indexes, log, etc.); Storage Manager (storage: data, metadata, indexes, log, etc.)]

Data Structuring: Model, Schema, Data
• Data model
  – conceptual structuring of the data stored in the database
  – ex: data is a set of records, each with student-ID, name, address, courses, photo
  – ex: data is a graph where nodes represent cities and edges represent airline routes
• Schema versus data
  – schema: describes how data is to be structured; defined at set-up time; rarely changes (also called "metadata")
  – data: the actual "instance" of the database; changes rapidly
  – analogy: types versus variables in programming languages

Schema vs. Data
• Schema: name of the relation, name of each field, type of each field
  – Students (Sid: string, Name: string, Age: integer, GPA: real)
  – A template for describing a student
• Data: an example instance of the relation

    Sid   Name    Age  GPA
    0001  Alex    19   3.55
    0002  Bob     22   3.10
    0003  Chris   20   3.80
    0004  David   20   3.95
    0005  Eugene  21   3.30

Data Structuring: Model, Schema, Data
• Data Definition Language (DDL)
  – Commands for setting up the schema of a database
• Data Manipulation Language (DML)
  – Commands to manipulate data in a database:
    • RETRIEVE, INSERT, DELETE, MODIFY
  – Also called "query language"

People
• DBMS user: queries/modifies data
• DBMS application designer: sets up the schema, loads data, …
• DBMS administrator: user management, performance tuning, …
• DBMS implementer: builds systems

Key Steps in Building DB Applications
• Step 0: pick an application domain
• Step 1: conceptual design
  – Discuss with your teammate what to model in the application domain
  – Need a modeling language to express what you want
    • The ER model is the most popular such language
  – Output: an ER diagram of the application domain
• Step 2: pick a type of DBMS
  – Relational DBMS is the most popular and is our focus

Key Steps in Building DB Applications
• Step 3: translate the ER design to a relational schema
  – Use a set of rules to translate from ER to a relational schema
  – Use a set of schema refinement rules to transform that relational schema into a good relational schema
    • 1NF, 2NF, 3NF, BCNF, 4NF, …
  – At this point
    • You have a good relational schema on paper

Key Steps in Building DB Applications
• Step 4: implement your relational schema in a relational DBMS using a "database programming language" called SQL
  – SELECT-FROM-WHERE-GROUPBY-HAVING
• Step 5: ordinary users cannot interact with the database directly, and the database cannot do everything you want; hence write your application program in C++, Java, PHP, etc. to handle the interaction and take care of the things that the database cannot do

Constraints
• Constraint: an assertion about the database that must be true at all times
  – Part of the database schema
  – Very important in database design
• Finding constraints is part of the modeling process
  – Keys: social security number uniquely identifies a person
  – Single-value constraints: a person can have only one father
  – Referential integrity constraints: if you work for a company, it must exist in the database
  – Domain constraints: people's ages are between 0 and 150
  – General constraints: all others (e.g., at most 30 students enroll in a class)

More about Keys
• Every entity must have a key
  – why?
• A key can consist of more than one attribute
• There can be more than one key for an entity set
  – Among all candidate keys, one key will be designated as the primary key

ER Model vs. Relational Model
• Both are used to model data
• ER model has many concepts
  – Entities, relationships, attributes, etc.
  – Well-suited for capturing the application requirements
  – Not well-suited for computer implementation
• Relational model
  – Has just a single concept: relation (table)
  – World is represented with a collection of tables
  – Well-suited for efficient manipulations on computers

Relation: An Example

    Products
    Name          Price   Category     Manufacturer
    Gizmo         19.99   Gadgets      Gizmo works
    Power gizmo   29.99   Gadgets      Gizmo works
    Single touch  149.99  Photography  Canon
    Multi touch   203.99  Household    Hitachi

  – Products: name of the table (relation)
  – Name, Price, Category, Manufacturer: columns (fields, attributes)
  – Each line of values: a row (record, tuple)
  – Each column draws its values from a domain (atomic type)

Relations
• Schema vs. instance = columns vs. rows
• Schema of a relation
  1. Relation name
  2. Attribute names
  3. Attribute types (domains)
• Schema of a database
  – A set of relation schemas
• Questions
  – When do you determine a schema (instance)?
  – How often do you change your mind?
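The difference between a relation's schema and its instance can be tried out directly. The sketch below is only an illustration, assuming Python's built-in sqlite3 module and a throwaway in-memory database; the helper names `schema` and `instance` are made up for this example, and SQLite's TEXT and REAL types stand in for the domains of the Products relation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.execute(
    "CREATE TABLE Products "
    "(name TEXT, price REAL, category TEXT, manufacturer TEXT)"
)

def schema(table):
    """Hypothetical helper: (attribute name, declared type) pairs of a table."""
    # PRAGMA table_info yields rows (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

def instance(table):
    """Hypothetical helper: the current set of rows of a table."""
    return conn.execute(f"SELECT * FROM {table}").fetchall()

schema_before = schema("Products")

# The instance changes often: add four tuples, then delete one.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "Gizmo works"),
        ("Power gizmo", 29.99, "Gadgets", "Gizmo works"),
        ("Single touch", 149.99, "Photography", "Canon"),
        ("Multi touch", 203.99, "Household", "Hitachi"),
    ],
)
conn.execute("DELETE FROM Products WHERE name = 'Multi touch'")

# The schema did not change while the instance did.
schema_after = schema("Products")
print(schema_after)
print(instance("Products"))
```

Updates to the rows leave the schema untouched, which is why the schema can be fixed at set-up time while the instance changes rapidly.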

Relations
• The database maintains a current database state
• Updates to the data happen very frequently
  – add a tuple
  – delete a tuple
  – modify an attribute in a tuple
• Updates to the schema are relatively rare, and rather painful. Why?

Defining a Database Schema
• A database schema comprises declarations for the relations ("tables") of the database
• The simplest form of creation is:
    CREATE TABLE <name> ( <list of elements> );
• You may remove a relation from the database schema by:
    DROP TABLE <name>;

Elements of Table Declarations
• The principal element is a pair consisting of an attribute and a type
• The most common types are:
  – INT or INTEGER (synonyms)
  – REAL or FLOAT (synonyms)
  – CHAR(n) = fixed-length string of n characters
  – VARCHAR(n) = variable-length string of up to n characters

Example: Create Table
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL
    );

Declaring Keys
• An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE
  – Each says that the attribute(s) so declared functionally determine all the attributes of the relation schema
  – Single-attribute keys:
    CREATE TABLE Beers (
        name CHAR(20) UNIQUE,
        manf CHAR(20)
    );

Multi-attribute Keys
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL,
        PRIMARY KEY (bar, beer)
    );

Foreign Keys
• A foreign key is a field whose values are keys in another relation
  – Must correspond to the primary key of the second relation
  – Like a "logical pointer"

    Enrolled
    sid    cid          grade
    53666  Carnatic101  C
    53666  Reggae203    B
    53650  Topology112  A
    53666  History105   B

    Students
    sid    name   login       age  gpa
    53666  Jones  jones@cs    18   3.4
    53688  Smith  smith@eecs  18   3.2
    53650  Smith  smith@math  19   3.8

    CREATE TABLE Enrolled (
        sid   CHAR(20),
        cid   CHAR(20),
        grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students,
        FOREIGN KEY (cid) REFERENCES Courses
    )

Relational Algebra
• Querying the database: specify what we want from our database
  – Find all the people who earn more than $1,000,000 and pay taxes in Tallahassee
• Could write it in C++/Java, but that is a bad idea
• Instead use high-level query languages:
  – Theoretical: Relational Algebra, Datalog
  – Practical: SQL
  – Relational algebra: a basic set of operations on relations that provides the basic principles

What is an "Algebra"?
• Mathematical system consisting of:
  – Operands: variables or values from which new values can be constructed
  – Operators: symbols denoting procedures that construct new values from given values
• Examples
  – Arithmetic algebra, linear algebra, Boolean algebra, …
    • What are the operands?
    • What are the operators?

What is Relational Algebra?
• An algebra
  – Whose operands are relations or variables that represent relations
  – Whose operators are designed to do the common things that we need to do with relations in a database
    • relations as input, new relation as output
  – Can be used as a query language for relations

Relational Operators at a Glance
• Five basic RA operations:
  – Basic set operations
    • union, difference (no intersection, no complement)
  – Selection: σ
  – Projection: π
  – Cartesian product: ×
• When our relations have attribute names:
  – Renaming: ρ
• Derived operations:
  – Intersection, complement
  – Joins (natural join, equi-join, theta join, semi-join)

Set Operations
• Union: all tuples in R1 or R2, denoted as R1 ∪ R2
  – R1, R2 must have the same schema
  – R1 ∪ R2 has the same schema as R1, R2
  – Example:
    • Active-Employees ∪ Retired-Employees
• Difference: all tuples in R1 and not in R2, denoted as R1 – R2
  – R1, R2 must have the same schema
  – R1 – R2 has the same schema as R1, R2
  – Example:
    • All-Employees – Retired-Employees

Selection
• Returns all tuples which satisfy a condition, denoted as σ_c(R)
  – c is a condition built from =, <, >, AND, OR, NOT
  – Output schema: same as input schema
  – Find all employees with salary more than $40,000:
    • σ_{Salary > 40000}(Employee)

    Employee
    SSN        Name   Dept-ID  Salary
    111060000  Alex   1        30K
    754320032  Bob    1        32K
    983210129  Chris  2        45K

    σ_{Salary > 40000}(Employee)
    SSN        Name   Dept-ID  Salary
    983210129  Chris  2        45K

Projection
• Unary operation: returns certain columns, denoted as π_{A1,…,An}(R)
  – Eliminates duplicate tuples!
  – Input schema: R(B1, …, Bm)
  – Condition: {A1, …, An} ⊆ {B1, …, Bm}
  – Output schema: S(A1, …, An)
• Example: project social-security number and names:
  – π_{SSN, Name}(Employee)

    Employee
    SSN        Name   Dept-ID  Salary
    111060000  Alex   1        30K
    754320032  Bob    1        32K
    983210129  Chris  2        45K

    π_{SSN, Name}(Employee)
    SSN        Name
    111060000  Alex
    754320032  Bob
    983210129  Chris

Selection vs. Projection
• Think of a relation as a table
  – How are they similar?
  – How are they different?
  – Why do you need both?

Cartesian Product
• Each tuple in R1 with each tuple in R2, denoted as R1 × R2
  – Input schemas: R1(A1, …, An), R2(B1, …, Bm)
  – Output schema: S(A1, …, An, B1, …, Bm)
  – Very rare in practice; but joins are very common
  – Example: Employee × Dependent

Example

    Employee
    SSN        Name
    111060000  Alex
    754320032  Brandy

    Dependent
    Employee-SSN  Dependent-Name
    111060000     Chris
    754320032     David

    Employee × Dependent
    SSN        Name    Employee-SSN  Dependent-Name
    111060000  Alex    111060000     Chris
    111060000  Alex    754320032     David
    754320032  Brandy  111060000     Chris
    754320032  Brandy  754320032     David

Renaming
• Notation: ρ_{S(B1,…,Bn)}(R)
• Does not change the relation instance
• Changes the relation schema only
  – Input schema: R(A1, …, An)
  – Output schema: S(B1, …, Bn)
• Example: ρ_{Soc-sec-num, firstname}(Employee)

    Employee                 ρ_{Soc-sec-num, firstname}(Employee)
    SSN        Name          Soc-sec-num  firstname
    111060000  Alex          111060000    Alex
    754320032  Bob           754320032    Bob
    983210129  Chris         983210129    Chris

Set Operations: Intersection
• Intersection: all tuples both in R1 and in R2, denoted as R1 ∩ R2
  – R1, R2 must have the same schema
  – R1 ∩ R2 has the same schema as R1, R2
  – Example:
    • UnionizedEmployees ∩ RetiredEmployees
• Intersection is derived:
  – R1 ∩ R2 = R1 – (R1 – R2). Why?
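The identity R1 ∩ R2 = R1 – (R1 – R2) can be checked on a small instance. The following is a minimal sketch, assuming relations are modeled as Python sets of tuples; the function names and the membership of the two example relations are invented for illustration.

```python
def difference(r1, r2):
    """R1 - R2: the tuples of r1 that do not appear in r2."""
    return {t for t in r1 if t not in r2}

def intersection(r1, r2):
    """R1 ∩ R2, derived only from difference: R1 - (R1 - R2)."""
    # (R1 - R2) is exactly the part of R1 that is NOT shared with R2,
    # so removing it from R1 leaves the shared tuples.
    return difference(r1, difference(r1, r2))

# Hypothetical instances over the schema (SSN, Name).
unionized_employees = {(111060000, "Alex"), (754320032, "Bob")}
retired_employees = {(754320032, "Bob"), (983210129, "Chris")}

both = intersection(unionized_employees, retired_employees)
print(both)
```

Because intersection can be built this way, it does not need to be one of the basic operators.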

Theta Join
• A join that involves a predicate θ, denoted as R1 ⋈_θ R2
  – Input schemas: R1(A1, …, An), R2(B1, …, Bm)
  – Output schema: S(A1, …, An, B1, …, Bm)
  – Derived operator: R1 ⋈_θ R2 = σ_θ(R1 × R2)
    • Take the product R1 × R2
    • Then apply σ_θ to the result
    • As for selection, θ can be any Boolean-valued condition

Theta Join: Example

    Sells
    Bar            Beer    Price
    AJ's           Bud     2.5
    AJ's           Miller  2.75
    Michael's Pub  Bud     2.5
    Michael's Pub  Corona  3.0

    Bar
    Name           Address
    AJ's           1800 Tennessee
    Michael's Pub  513 Gaines

    BarInfo := Sells ⋈_{Sells.Bar = Bar.Name} Bar
    Bar            Beer    Price  Name           Address
    AJ's           Bud     2.5    AJ's           1800 Tennessee
    AJ's           Miller  2.75   AJ's           1800 Tennessee
    Michael's Pub  Bud     2.5    Michael's Pub  513 Gaines
    Michael's Pub  Corona  3.0    Michael's Pub  513 Gaines

Natural Join
• Notation: R1 ⋈ R2
• Input schema: R1(A1, …, An), R2(B1, …, Bm)
• Output schema: S(C1, …, Cp)
  – Where {C1, …, Cp} = {A1, …, An} ∪ {B1, …, Bm}
• Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes:
  – {A1, …, An} ∩ {B1, …, Bm} (called the join attributes)

Natural Join: Examples

    Employee
    SSN        Name
    111060000  Alex
    754320032  Brandy

    Dependent
    SSN        Dependent-Name
    111060000  Chris
    754320032  David

    Employee ⋈ Dependent = π_{SSN, Name, Dependent-Name}(σ_{Employee.SSN = Dependent.SSN}(Employee × Dependent))
    SSN        Name    Dependent-Name
    111060000  Alex    Chris
    754320032  Brandy  David

Natural Join: Examples

    R        S
    A  B     B  C
    X  Y     Z  U
    X  Z     V  W
    Y  Z     Z  V
    Z  V

    R ⋈ S
    A  B  C
    X  Z  U
    X  Z  V
    Y  Z  U
    Y  Z  V
    Z  V  W

Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
• Given R(A, B, C), S(D, E), what is R ⋈ S?
• Given R(A, B), S(A, B), what is R ⋈ S?
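The definition of natural join (pair up tuples that agree on all shared attributes, and keep each shared attribute once) can be sketched in a few lines. This is an illustrative implementation only, assuming a relation is represented as a list of attribute names plus a set of tuples; the function name `natural_join` and this representation are choices made for the example, not something prescribed by the slides. The data is the R(A, B), S(B, C) example above.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Naive nested-loop natural join of R and S.

    Returns (output attribute list, set of output tuples).
    """
    common = [a for a in r_attrs if a in s_attrs]      # join attributes
    s_only = [a for a in s_attrs if a not in r_attrs]  # attributes only in S
    out_attrs = list(r_attrs) + s_only
    out_rows = set()
    for r in r_rows:
        r_t = dict(zip(r_attrs, r))
        for s in s_rows:
            s_t = dict(zip(s_attrs, s))
            # Keep the pair only if it agrees on every join attribute.
            if all(r_t[a] == s_t[a] for a in common):
                out_rows.add(tuple(r) + tuple(s_t[a] for a in s_only))
    return out_attrs, out_rows

R_attrs = ["A", "B"]
R_rows = {("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")}
S_attrs = ["B", "C"]
S_rows = {("Z", "U"), ("V", "W"), ("Z", "V")}

attrs, rows = natural_join(R_attrs, R_rows, S_attrs, S_rows)
print(attrs)
for row in sorted(rows):
    print(row)
```

The nested loop mirrors the definition as a selection over the Cartesian product; it is meant to show the semantics, not an efficient join algorithm.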

Equi-join
• Special case of theta join: condition c contains only a conjunction of equalities
  – Result schema is the same as that of the Cartesian product
  – May have fewer tuples than the Cartesian product
  – Most frequently used in practice: R1 ⋈_{A=B} R2
  – Natural join is a particular case of equi-join
  – A lot of research on how to do it efficiently

Building Complex Expressions
• Algebras allow us to express sequences of operations in a natural way
  – Example
    • In arithmetic algebra: (x + 4)*(y - 3)
  – Relational algebra allows the same
• Three notations, just as in arithmetic:
  1. Sequences of assignment statements
  2. Expressions with several operators
  3. Expression trees

Sequences of Assignments
• Create temporary relation names
• Renaming can be implied by giving relations a list of attributes
• Example: R3 := R1 JOIN_C R2 can be written:
    R4 := R1 × R2
    R3 := SELECT_C (R4)

Expressions with Several Operators
• Example: the theta-join R3 := R1 JOIN_C R2 can be written:
    R3 := SELECT_C (R1 × R2)
• Precedence of relational operators:
  1. Unary operators (select, project, rename) have highest precedence and bind first
  2. Then come products and joins
  3. Then intersection
  4. Finally, union and set difference bind last
• But you can always insert parentheses to force the order you desire

Expression Trees
• Leaves are operands: either variables standing for relations or particular constant relations
• Interior nodes are operators, applied to their child or children

Expression Tree: Examples
Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3

    UNION
      PROJECT_{name}
        SELECT_{addr = "Tennessee St."}
          Bars
      RENAME_{R(name)}
        PROJECT_{bar}
          SELECT_{price < 3 AND beer = "Bud"}
            Sells

Question: How to do this?
• Using Sells(bar, beer, price), find the bars that sell two different beers at the same price

Glimpse Ahead: Efficient Implementations of Operators
• σ_{age >= 30 AND age <= 35}(Employees)
  – Method 1: scan the file, test each employee
  – Method 2: use an index on age
  – Which one is better? Depends a lot…
• Employees ⋈ Relatives
  – Iterate over Employees, then over Relatives
  – Iterate over Relatives, then over Employees
  – Sort Employees and Relatives, then do a "merge-join"
  – "Hash-join"
  – Etc.

Glimpse Ahead: Optimizations
    Product(pid, name, price, category, maker-cid)
    Purchase(buyer-ssn, seller-ssn, store, pid)
    Person(ssn, name, phone number, city)
• Which is better:
    σ_{price>100}(Product) ⋈ (Purchase ⋈ σ_{city=sea}(Person))
    (σ_{price>100}(Product) ⋈ Purchase) ⋈ σ_{city=sea}(Person)
• Depends! This is the optimizer's job…

SQL
• Standard language for querying and manipulating data
  – SQL stands for Structured Query Language
  – Initially developed at IBM by Donald Chamberlin and Raymond Boyce in the early 1970s, and called SEQUEL (Structured English Query Language)
  – Many standards out there: SQL-92 (also known as SQL2), SQL:1999 (also known as SQL3)
  – Vendors support various subsets of these standards
• Why SQL?
  – A very-high-level language, in which the programmer is able to avoid specifying a lot of data-manipulation details that would be necessary in languages like C++
  – Its queries are "optimized" quite well, yielding efficient query executions

Introduction
• Two sublanguages
  – DDL (Data Definition Language)
    • define and modify schema
        CREATE TABLE table_name (
            { column_name data_type [ DEFAULT default_expr ] [ column_constraint [, ... ] ]
              | table_constraint } [, ... ]
        )
  – DML (Data Manipulation Language)
    • Queries can be written intuitively: Select-From-Where

Select-From-Where Statements
• The principal form of a SQL query is:
    SELECT desired attributes
    FROM one or more tables
    WHERE condition about tuples of the tables

Our Running Example
• Most of our SQL queries will be based on the following database schema
  – Underline indicates key attributes
    Beers(name, manf)
    Bars(name, addr, license)
    Drinkers(name, addr, phone)
    Likes(drinker, beer)
    Sells(bar, beer, price)
    Frequents(drinker, bar)

Select-From-Where Example
• Using Beers(name, manf), what beers are made by Busch?
    SELECT name
    FROM Beers
    WHERE manf = 'Busch';

    name
    'Bud'
    'Bud Lite'
    'Michelob'
• The answer is a relation with a single attribute, name, and tuples with the name of each beer by Busch, such as Bud

Single-Relation Query
• Operation
  1. Begin with the relation in the FROM clause
  2. Apply the selection indicated by the WHERE clause
  3. Apply the extended projection indicated by the SELECT clause
• Semantics
  1. To implement this algorithm, think of a tuple variable ranging over each tuple of the relation mentioned in FROM
  2. Check if the "current" tuple satisfies the WHERE clause
  3. If so, compute the attributes or expressions of the SELECT clause using the components of this tuple

* In SELECT clauses
• When there is one relation in the FROM clause, * in the SELECT clause stands for "all attributes of this relation."
• Example using Beers(name, manf):
    SELECT *
    FROM Beers
    WHERE manf = 'Busch';

    name        manf
    'Bud'       'Busch'
    'Bud Lite'  'Busch'
    'Michelob'  'Busch'
• Now, the result has each of the attributes of Beers

Renaming Attributes
• If you want the result to have different attribute names, use "AS <new name>" to rename an attribute
• Example based on Beers(name, manf):
    SELECT name AS beer, manf
    FROM Beers
    WHERE manf = 'Busch'

    beer        manf
    'Bud'       'Busch'
    'Bud Lite'  'Busch'
    'Michelob'  'Busch'

Expressions in SELECT Clauses
• Any expression that makes sense can appear as an element of a SELECT clause
• Example: from Sells(bar, beer, price):
    SELECT bar, beer, price * 120 AS priceInYen
    FROM Sells;

    bar    beer    priceInYen
    Joe's  Bud     300
    Sue's  Miller  360
    …      …       …

Complex Conditions in WHERE Clause
• From Sells(bar, beer, price), find the price Joe's Bar charges for "cheap" beers:
    SELECT price
    FROM Sells
    WHERE bar = 'Joe Bar' AND price < 5.0;

Selections
• What you can use in WHERE:
  – attribute names of the relation(s) used in the FROM
  – comparison operators: =, <>, <, >, <=, >=
  – arithmetic operations: stockprice*2
  – operations on strings (e.g., "||" for concatenation)
  – lexicographic order on strings
  – pattern matching: s LIKE p
  – special operations for comparing dates and times

NULL Values
• Tuples in SQL relations can have NULL as a value for one or more components
• Meaning depends on context.
  Two common cases:
  – Missing value: e.g., we know Joe's Bar has some address, but we don't know what it is
  – Inapplicable: e.g., the value of attribute spouse for an unmarried person
• The logic of conditions in SQL is really 3-valued logic: TRUE, FALSE, UNKNOWN
  – When any value is compared with NULL, the truth value is UNKNOWN
  – A query only produces a tuple in the answer if its value for the WHERE clause is TRUE (not FALSE or UNKNOWN)

Three-Valued Logic
• To understand how AND, OR, and NOT work in 3-valued logic, think of TRUE = 1, FALSE = 0, and UNKNOWN = ½; AND = MIN; OR = MAX; NOT(x) = 1 - x
• Example:
    TRUE AND (FALSE OR NOT(UNKNOWN))
      = MIN(1, MAX(0, (1 - ½)))
      = MIN(1, MAX(0, ½))
      = MIN(1, ½)
      = ½, i.e., UNKNOWN

Surprising Example
• From the following Sells relation:

    bar    beer  price
    Joe's  Bud   NULL

    SELECT bar
    FROM Sells
    WHERE price < 2.00 OR price >= 2.00;
• Both comparisons evaluate to UNKNOWN, so the OR is UNKNOWN as well, and Joe's is not returned even though the condition looks like it must always hold

Multi-relation Queries
• Interesting queries often combine data from more than one relation. We can address several relations in one query by listing them all in the FROM clause.
  – Distinguish attributes of the same name by "<relation>.<attribute>"
  – Example: Using relations Likes(drinker, beer) and Frequents(drinker, bar), find the beers liked by at least one person who frequents Joe's Bar.
    SELECT Likes.beer
    FROM Likes, Frequents
    WHERE Frequents.bar = 'Joe Bar' AND Frequents.drinker = Likes.drinker;

Semantics
• Almost the same as for single-relation queries:
  1. Start with the (Cartesian) product of all the relations in the FROM clause
  2. Apply the selection condition from the WHERE clause
  3. Project onto the list of attributes and expressions in the SELECT clause
    SELECT a1, a2, …, ak
    FROM R1 AS x1, R2 AS x2, …, Rn AS xn
    WHERE Conditions
• Translation to relational algebra:
    π_{a1,…,ak}(σ_{Conditions}(R1 × R2 × … × Rn))
• Select-From-Where queries are precisely Select-Project-Join

Semantics
    SELECT a1, a2, …, ak
    FROM R1 AS x1, R2 AS x2, …, Rn AS xn
    WHERE Conditions

    Answer = {}
    for x1 in R1 do
      for x2 in R2 do
        …
          for xn in Rn do
            if Conditions then
              Answer = Answer ∪ {(a1, …, ak)}
    return Answer

Explicit Tuple-Variables
• Sometimes, a query needs to use two copies of the same relation
  – Distinguish copies by following the relation name by the name of a tuple-variable, in the FROM clause
  – It's always an option to rename relations this way, even when not essential
    SELECT s1.bar
    FROM Sells s1, Sells s2
    WHERE s1.beer = s2.beer AND s1.price < s2.price;

SubQueries
• A parenthesized SELECT-FROM-WHERE statement (subquery) can be used as a value in a number of places, including FROM and WHERE clauses
  – Example: in place of a relation in the FROM clause, we can place another query, and then query its result
    • It is better to use a tuple-variable to name tuples of the result
• Subqueries that return a scalar
  – If a subquery is guaranteed to produce one tuple with one component, then the subquery can be used as a value
    • A "single" tuple is often guaranteed by a key constraint
    • A run-time error occurs if the subquery returns more than one tuple; if it returns no tuple, standard SQL treats the value as NULL

Example
• From Sells(bar, beer, price), find the bars that serve Miller for the same price Joe charges for Bud
  – Two queries would surely work:
    1. Find the price Joe charges for Bud
    2. Find the bars that serve Miller at that price
    SELECT bar
    FROM Sells
    WHERE beer = 'Miller' AND
          price = (SELECT price
                   FROM Sells
                   WHERE bar = 'Joe Bar' AND beer = 'Bud')

The IN Operator
• <tuple> IN <relation> is true if and only if the tuple is a member of the relation
  – <tuple> NOT IN <relation> means the opposite
  – IN-expressions can appear in WHERE clauses
  – The <relation> is often a subquery
• Query: From Beers(name, manf) and Likes(drinker, beer), find the name and manufacturer of each beer that Fred likes
    SELECT *
    FROM Beers
    WHERE name IN (SELECT beer
                   FROM Likes
                   WHERE drinker = 'Fred');
  – The subquery is the set of beers Fred likes

The Exists Operator
• EXISTS( <relation> ) is true if and only if the <relation> is not empty
  – Being a Boolean-valued operator, EXISTS can appear in WHERE clauses
• Query: From Beers(name, manf), find those beers that are the only beer by their manufacturer
    SELECT name
    FROM Beers b1
    WHERE NOT EXISTS (SELECT *
                      FROM Beers
                      WHERE manf = b1.manf AND name <> b1.name);
  – The subquery is the set of beers with the same manf as b1, but not the same beer
  – Scope rule: manf refers to the closest nested FROM with a relation having that attribute

The Operator ANY
• x = ANY( <relation> ) is a Boolean condition meaning that x equals at least one tuple in the relation
• Similarly, = can be replaced by any of the comparison operators
  – Example: x >= ANY( <relation> ) means x is greater than or equal to at least one tuple in the relation
  – Note that the tuples must have one component only

The Operator ALL
• x <> ALL( <relation> ) is true if and only if for every tuple t in the relation, x is not equal to t
  – That is, x is not a member of the relation.
• The <> can be replaced by any comparison operator
  – Example: x >= ALL( <relation> ) means there is no tuple larger than x in the relation
• Query: From Sells(bar, beer, price), find the beer(s) sold for the highest price
    SELECT beer
    FROM Sells
    WHERE price >= ALL (SELECT price
                        FROM Sells);
  – price from the outer Sells must not be less than any price

Bag (Set) Semantics for SFW Queries
• The SELECT-FROM-WHERE statement uses bag semantics
  – Selection: preserve the number of occurrences
  – Projection: preserve the number of occurrences (no duplicate elimination)
  – Cartesian product, join: no duplicate elimination
• The default for union, intersection, and difference is set semantics, and is expressed by the following forms, each involving subqueries:
  – ( subquery ) UNION ( subquery )
  – ( subquery ) INTERSECT ( subquery )
  – ( subquery ) EXCEPT ( subquery )

Example
• Happy Drinker: From relations Likes(drinker, beer), Sells(bar, beer, price) and Frequents(drinker, bar), find the drinkers and beers such that:
  1. The drinker likes the beer, and
  2. The drinker frequents at least one bar that sells the beer
    (SELECT * FROM Likes)
    INTERSECT
    (SELECT drinker, beer
     FROM Sells, Frequents
     WHERE Frequents.bar = Sells.bar);
  – The second subquery: the drinker frequents a bar that sells the beer

Set vs. Bag: Efficiency
• When doing projection in relational algebra, it is easier to avoid eliminating duplicates
  – Just work tuple-at-a-time
• When doing intersection or difference, it is most efficient to sort the relations first
  – At that point you may as well eliminate the duplicates anyway

Controlling Duplicate Elimination
• Force the result to be a set by SELECT DISTINCT
  – From Sells(bar, beer, price), find all the different prices charged for beers:
    SELECT DISTINCT price
    FROM Sells;
• Force the result to be a bag (i.e., don't eliminate duplicates) by ALL, as in . . . UNION ALL . . .
  – Example: the following lists drinkers who frequent more bars than they like beers, and does so as many times as the difference of those counts:
    (SELECT drinker FROM Frequents)
    EXCEPT ALL
    (SELECT drinker FROM Likes);

Aggregations
• SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column
  – e.g., COUNT(*) counts the number of tuples
• Query: From Sells(bar, beer, price), find the average price of Bud
    SELECT AVG(price)
    FROM Sells
    WHERE beer = 'Bud'

Group By
• We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes
  – The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group
• Query: From Sells(bar, beer, price), find the average price for each beer:
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer

Example
• Query: From Sells(bar, beer, price) and Frequents(drinker, bar), find for each drinker the average price of Bud at the bars they frequent:
    SELECT drinker, AVG(price)
    FROM Frequents, Sells
    WHERE beer = 'Bud' AND Frequents.bar = Sells.bar
    GROUP BY drinker;
  – Compute drinker-bar-price of Bud tuples first, then group by drinker

Restriction on SELECT Lists With Aggregation
• If any aggregation is used, then each element of the SELECT list must be either:
  1. Aggregated, or
  2. An attribute on the GROUP BY list
• Question: How about this query?
    SELECT bar, MIN(price)
    FROM Sells
    WHERE beer = 'Bud';

Having Clause
• HAVING <condition> may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated
  – These conditions may refer to any relation or tuple-variable in the FROM clause
  – They may refer to attributes of those relations, as long as the attribute makes sense within a group; i.e., it is either:
    1. A grouping attribute, or
    2. Aggregated

Having Clause: Example
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(bar) >= 3 OR beer = 'Michelob';

General form of Grouping and Aggregation
    SELECT S
    FROM R1, …, Rn
    WHERE C1
    GROUP BY a1, …, ak
    HAVING C2
• S = may contain attributes a1, …, ak and/or any aggregates but NO OTHER ATTRIBUTES
• C1 = is any condition on the attributes in R1, …, Rn
• C2 = is any condition on aggregate expressions or grouping attributes

General form of Grouping and Aggregation
    SELECT S
    FROM R1, …, Rn
    WHERE C1
    GROUP BY a1, …, ak
    HAVING C2
• Evaluation steps:
  1. Compute the FROM-WHERE part, obtain a table with all attributes in R1, …, Rn
  2. Group by the attributes a1, …, ak
  3. Compute the aggregates in C2 and keep only groups satisfying C2
  4. Compute aggregates in S and return the result

Modifications
• A modification command does not return a result as a query does, but it changes the database in some way
• There are three kinds of modifications:
  1. Insert a tuple or tuples
  2. Delete a tuple or tuples
  3. Update the value(s) of an existing tuple or tuples

Insertion
• To insert a single tuple:
    INSERT INTO <relation> VALUES ( <list of values> );
• Example: add to Likes(drinker, beer) the fact that Sally likes Bud:
    INSERT INTO Likes VALUES('Sally', 'Bud');

Specifying Attributes in INSERT
• We may add to the relation name a list of attributes
• There are two reasons to do so:
  1. We forget the standard order of attributes for the relation
  2. We don't have values for all attributes, and we want the system to fill in missing components with NULL or a default value
• Another way to add the fact that Sally likes Bud to Likes(drinker, beer):
    INSERT INTO Likes(beer, drinker) VALUES('Bud', 'Sally');

Inserting Many Tuples
• We may insert the entire result of a query into a relation, using the form:
    INSERT INTO <relation> ( <subquery> );
• E.g.,
    INSERT INTO Beers(name) SELECT beer FROM Sells;

Example: Insert a Subquery
• Using Frequents(drinker, bar), enter into the new relation PotBuddies(name) all of Sally's "potential buddies," i.e., those drinkers who frequent at least one bar that Sally also frequents
    INSERT INTO PotBuddies
    (SELECT d2.drinker
     FROM Frequents d1, Frequents d2
     WHERE d1.drinker = 'Sally' AND
           d2.drinker <> 'Sally' AND
           d1.bar = d2.bar);
  – d2.drinker is the other drinker
  – The FROM and WHERE clauses produce pairs of Frequents tuples where the first is for Sally, the second is for someone else, and the bars are the same

Deletion
• To delete tuples satisfying a condition from some relation:
    DELETE FROM <relation> WHERE <condition>;
• Example: Delete from Likes(drinker, beer) the fact that Sally likes Bud:
    DELETE FROM Likes
    WHERE drinker = 'Sally' AND beer = 'Bud';

Delete all Tuples
• Make the relation Likes empty:
    DELETE FROM Likes;
• Note that no WHERE clause is needed

Delete Many Tuples
• Delete from Beers(name, manf) all beers for which there is another beer by the same manufacturer.
    DELETE FROM Beers b
    WHERE EXISTS (SELECT name
                  FROM Beers a
                  WHERE a.manf = b.manf AND a.name <> b.name);
  – The subquery returns the beers with the same manufacturer as, and a different name from, the beer represented by tuple b

Semantics of Deletion
• Suppose Busch makes only Bud and Bud Lite, and suppose we come to the tuple b for Bud first
  – The subquery is nonempty, because of the Bud Lite tuple, so we delete Bud
  – Now, when b is the tuple for Bud Lite, do we delete that tuple too?
• The answer is that we do delete Bud Lite as well.
  The reason is that deletion proceeds in two stages:
  1. Mark all tuples for which the WHERE condition is satisfied in the original relation
  2. Delete the marked tuples

Updates
• To change certain attributes in certain tuples of a relation:
    UPDATE <relation>
    SET <list of attribute assignments>
    WHERE <condition on tuples>;
• Example: Change drinker Fred's phone number to 555-1212:
    UPDATE Drinkers
    SET phone = '555-1212'
    WHERE name = 'Fred';

Update Several Tuples
• Raise every cheap price (below 3.0) by 7%:
    UPDATE Sells
    SET price = price * 1.07
    WHERE price < 3.0;

Views
• A view is a "virtual table", a relation that is defined in terms of the contents of other tables and views
  – Declare by: CREATE VIEW <name> AS <query>;
• In contrast, a relation whose value is really stored in the database is called a base table

Example: View Definition
• CanDrink(drinker, beer) is a view "containing" the drinker-beer pairs such that the drinker frequents at least one bar that serves the beer:
    CREATE VIEW CanDrink AS
    SELECT drinker, beer
    FROM Frequents, Sells
    WHERE Frequents.bar = Sells.bar;

Example: Accessing a View
• You may query a view as if it were a base table
  – There is a limited ability to modify views if the modification makes sense as a modification of the underlying base table
• Example:
    SELECT beer
    FROM CanDrink
    WHERE drinker = 'Sally';

What Happens When a View Is Used?
• The DBMS starts by interpreting the query as if the view were a base table
  – A typical DBMS turns the query into something like relational algebra
• The queries defining any views used by the query are also replaced by their algebraic equivalents, and "spliced into" the expression tree for the query

Example: View Expansion

    PROJ_{beer}
      SELECT_{drinker = 'Sally'}
        CanDrink, which expands to:
          PROJ_{drinker, beer}
            JOIN_{Frequents.bar = Sells.bar}
              Frequents
              Sells

Have fun!