CMSC424: Database Design Lecture 5 CMSC424, Spring 2005 Review: Relational Algebra Relational Algebra Operators 1. Select () 2. Project () 3. Set Union (U) 4. Set Difference (-) 5. Cartesian Product () 6. Rename () These are called fundamental operations CMSC424, Spring 2005 Relational Algebra Redundant Operators 1. Natural Join ( ) 2. Division ( ) 3. Outer Joins ( ) 4. Update ( ) CMSC424, Spring 2005 Natural Join Notation: Relation1 Relation2 Idea: match tuples on common attributes A B C D E B D A B C D E α α α β 10 10 20 10 ‘a’ ‘a’ ‘b’ ‘c’ α α β β 10 20 10 10 1 2 2 3 3 α α α β β 10 10 20 10 10 ‘a’ ‘a’ ‘a’ ‘b’ ‘c’ 1 2 2 3 + + r = s CMSC424, Spring 2005 + + + Division Notation: Relation1 Relation2 Idea: expresses “for all” queries Query: Find customers who have accounts in all branches in Brooklyn r1 all branches in Brooklyn r2 associate customers with branches they have accounts in Now what ? Use the division operator CMSC424, Spring 2005 Outer Joins Motivation: loan = bname lno Downtown Redwood Perry L-170 L-230 L-260 amt cname lno 3000 borrower = 4000 1700 Jones Smith Hayes L-170 L-230 L-155 = loan borrower = bname lno amt cname Downtown Redwood L-170 L-230 3000 4000 Jones Smith Join result loses… any record of Perry any record of Hayes CMSC424, Spring 2005 Outer Joins loan = bname lno amt Downtown Redwood Perry L-170 L-230 L-260 3000 4000 1700 borrower = cname lno Jones Smith Hayes L-170 L-230 L-155 1. Left Outer Join ( ) • preserves all tuples in left relation loan borrower bname Downtown Redwood Perry = lno amt cname L-170 3000 L-230 4000 L-260 1700 Jones Smith ┴ CMSC424, Spring 2005 ┴ = NULL Outer Joins loan = bname lno amt Downtown Redwood Perry L-170 L-230 L-260 3000 4000 1700 borrower = cname lno Jones Smith Hayes L-170 L-230 L-155 2. Right Outer Join ( ) • preserves all tuples in right relation loan borrower = bname lno amt cname Downtown Redwood ┴ L-170 L-230 L-155 3000 4000 ┴ Jones Smith Hayes CMSC424, Spring 2005 ┴ = NULL Outer Joins loan = bname lno amt Downtown Redwood Perry L-170 L-230 L-260 3000 4000 1700 borrower = cname lno Jones Smith Hayes L-170 L-230 L-155 3. Full Outer Join ( ) • preserves all tuples in both relations loan borrower = bname lno amt cname Downtown Redwood Perry ┴ L-170 L-230 L-260 L-155 3000 4000 1700 ┴ Jones Smith ┴ Hayes CMSC424, Spring 2005 ┴ = NULL Update Notation: Identifier Query Common Uses: 1. Deletion: r r – s e.g., account account – σbname=Perry (account) (deletes all Perry accounts) 2. Insertion: r r s e.g., branch branch {(Waltham, Boston, 7M)} (inserts new branch with bname = Waltham, bcity = Boston, assets = 7M) 3. Update: r πe1,…,en (r) e.g., account πbname,acct_no,bal*1.05 (account) (adds 5% interest to account balances) CMSC424, Spring 2005 Extended Relational Algebra 1. Generalized projection 2. Aggregates CMSC424, Spring 2005 Generalized Projection Notation: e ,…,e (Relation) 1 n e1,…,en can include arithmetic expressions – not just attributes Example credit = cname limit balance Jones Turner 5000 3000 2000 2500 Then… π cname, limit - balance (credit) = cname limit-balance Jones Turner 3000 500 CMSC424, Spring 2005 Generalized Projection Notation: e ,…,e (Relation) 1 n e1,…,en can include arithmetic expressions – not just attributes Example credit = cname limit balance Jones Turner 5000 3000 2000 2500 Then… π cname, limit - balance as limitbalance (credit) = CMSC424, Spring 2005 cname limitbalance Jones Turner 3000 500 Aggregate Functions and Operations Aggregation function takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values CMSC424, Spring 2005 Aggregate Operation – Example Relation r: g sum(c) as sumC (r) A B C 7 7 3 10 sum-C 27 CMSC424, Spring 2005 Aggregate Functions and Operations General form: G1, G2, …, Gn g F1( A1), F2( A2),…, Fn( An) (E) E is any relational-algebra expression G1, G2 …, Gn is a list of attributes on which to group (can be empty) Each Fi is an aggregate function Each Ai is an attribute name CMSC424, Spring 2005 Aggregate Operation – Example Relation account grouped by branch-name: branch-name account-number Perryridge Perryridge Brighton Brighton Redwood branch-name g balance A-102 A-201 A-217 A-215 A-222 sum(balance) 400 900 750 750 700 (account) branch-name Perryridge Brighton Redwood CMSC424, Spring 2005 balance 1300 1500 700 Other Theoretical Languages Relational Calculus Non-procedural Tuple relational calculus Examples Safety Domain relational calculus CMSC424, Spring 2005 Review: Query Languages Relational Algebra Theoretical Use Practical Use Formal semantics of practical QL’s Internal query representation for query optimizers Foundation for SQL TRC Language Expressivity SQL-92 - Standard for Relational DB Query Languages SQL-99 - Standard for Object-Relational DB Query Languages OQL - Standard for Object-Oriented DB Query Languages XQuery - Standard for XML-based DB Query Languages CMSC424, Spring 2005 SQL - Introduction Standard DML/DDL for relational DB’s • DML = Data Manipulation Language (queries, updates) • DDL = Data Definition Language (create tables, indexes, …) Also includes • • • • View definition Security (Authorization) Integrity constraints Transactions History • Early 70’s, IBM system R project (SEQUEL) • Later, become standard (Structured Query Language) CMSC424, Spring 2005 SQL: Basic Structure SELECT FROM WHERE P A1, ….., An r1, ….., rm Equivalent to: A1,A2,…,An (σP (r1 … rn )) CMSC424, Spring 2005 A Simple SELECT-FROM-WHERE Query SELECT FROM WHERE Similar to bname loan amt > 1000 bname ( amt > 1000 (loan) But not quite bname Redwood Perry Downtown Perry ) Why preserve duplicates? Can instead write : SELECT DISTINCT bname FROM loan WHERE amt > 1000 Duplicates are retained (i.e., result not a set) (removes duplicates from result) We will discuss bag algebra a bit later CMSC424, Spring 2005 Another SELECT-FROM-WHERE Query SELECT FROM WHERE Similar to cname, balance depositor, account depositor.acct_no = account.acct_no cname, Returns: cname Johnson Smith Hayes Turner Johnson Jones Lindsay balance (depositor account ) Note: balance 500 700 400 350 900 750 700 Can also write SELECT FROM WHERE d.cname, a.balance depositor as d, account as a d.acct_no = a.acct_no (neccessary for self-joins) CMSC424, Spring 2005 The SELECT Clause • Equivalent to (generalized) projection, despite name • Can use ‘*’ to get all attributes e.g: SELECT FROM * loan • Can write SELECT DISTINCT to eliminate duplicates • Can write SELECT ALL to preserve duplicates (default) • Can include arithmetic expressions e.g: SELECT FROM bname, acct_no, balance*1.05 account CMSC424, Spring 2005 The FROM Clause • Equivalent to cartesian product () (or , depending on WHERE clause) • Binds tuples in relations to variable names e.g: FROM borrower, loan • Computes borrower loan • Identifies borrower, loan columns in result, allowing one to write WHERE borrower.lno = loan.lno e.g: FROM borrower as b, loan as l allows one to write WHERE b.lno = l.lno CMSC424, Spring 2005 The WHERE Clause • Equivalent to Selection, despite name • WHERE predicate can be: 1. Simple attribute relop attribute (or constant) (relop: =, <>, <, >, <=, >=) 2. Complex (using AND, OR, NOT, BETWEEN) e.g: SELECT FROM WHERE lno loan amt BETWEEN 90000 AND 100000 is the same as… SELECT FROM WHERE lno loan amt >= 90000 AND amt <= 100000 CMSC424, Spring 2005 Data Definition Language Allows specification of relation schema as well as: Attribute domains Integrity constraints Security and authorization information Creation of Indexes … CMSC424, Spring 2005 DDL CREATE TABLE branch (branch-name char(15) not null, branch-city char(30), assets integer, primary key (branch-name), check (assets >= 0)) DROP TABLE branch ALTER TABLE branch ADD zipcode integer CMSC424, Spring 2005