CMSC424: Database Design
Lecture 4
CMSC424, Spring 2005

Review: Relational Data Model
  Key abstraction: the relation

  Mathematical relations
    Given sets R = {1, 2, 3} and S = {3, 4}:
    • R × S = { (1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4) }
    • A relation on R, S is any subset (⊆) of R × S (e.g., { (1, 4), (3, 4) })

  Database relations
    Given attribute domains
      Branches = { Downtown, Brighton, … }
      Accounts = { A-101, A-201, A-217, … }
      Balances = ℝ
    Account ⊆ Branches × Accounts × Balances, e.g.
      { (Downtown, A-101, 500), (Brighton, A-201, 900), (Brighton, A-217, 500) }

      bname     acct_no  balance
      Downtown  A-101    500
      Brighton  A-201    900
      Brighton  A-217    500

Review: Terms and Definitions
  1. Tables = Relations
  2. Columns = Attributes
  3. Rows = Tuples
  4. Relation Schema (or Schema)
     A list of attributes and their domains
     We will require the domains to be atomic
     E.g. account(account-number, branch-name, balance)
  5. Relation Instance
     A particular instantiation of a relation with actual values
     Will change with time

Bank Database: Schema
  Account(bname, acct_no, balance)
  Branch(bname, bcity, assets)
  Depositor(cname, acct_no)
  Borrower(cname, lno)
  Customer(cname, cstreet, ccity)
  Loan(bname, lno, amt)

Bank Database: An Instance
  Account
    bname     acct_no  balance
    Downtown  A-101    500
    Mianus    A-215    700
    Perry     A-102    400
    R.H.      A-305    350
    Brighton  A-201    900
    Redwood   A-222    700
    Brighton  A-217    750

  Branch
    bname     bcity       assets
    Downtown  Brooklyn    9M
    Redwood   Palo Alto   2.1M
    Perry     Horseneck   1.7M
    Mianus    Horseneck   0.4M
    R.H.      Horseneck   8M
    Pownel    Bennington  0.3M
    N. Town   Rye         3.7M
    Brighton  Brooklyn    7.1M

  Depositor
    cname    acct_no
    Johnson  A-101
    Smith    A-215
    Hayes    A-102
    Turner   A-305
    Johnson  A-201
    Jones    A-217
    Lindsay  A-222

  Customer
    cname     cstreet    ccity
    Jones     Main       Harrison
    Smith     North      Rye
    Hayes     Main       Harrison
    Curry     North      Rye
    Lindsay   Park       Pittsfield
    Turner    Putnam     Stanford
    Williams  Nassau     Princeton
    Adams     Spring     Pittsfield
    Johnson   Alma       Palo Alto
    Glenn     Sand Hill  Woodside
    Brooks    Senator    Brooklyn
    Green     Walnut     Stanford

  Borrower
    cname     lno
    Jones     L-17
    Smith     L-23
    Hayes     L-15
    Jackson   L-14
    Curry     L-93
    Smith     L-11
    Williams  L-17
    Adams     L-16

  Loan
    bname     lno   amt
    Downtown  L-17  1000
    Redwood   L-23  2000
    Perry     L-15  1500
    Downtown  L-14  1500
    Mianus    L-93  500
    R.H.      L-11  900
    Perry     L-16  1300

Review: Keys and Relations
  As in the E/R model:
  1. Superkeys
     • sets of attributes of a table for which every row has a distinct set of values
  2. Candidate keys
     • "minimal" superkeys
  3. Primary keys
     • DBA-chosen candidate keys
  Keys act as integrity constraints, i.e., they guard against illegal/invalid instances of a given schema.
  E.g., Branch = (bname, bcity, assets), with bname as the key:
    bname     bcity     assets
    Brighton  Brooklyn  5M
    Brighton  Boston    3M
  Invalid!! (two tuples share the key value bname = Brighton)

More on Keys
  Determining primary keys
    If the relation schema is derived from an E-R diagram, we can determine the primary keys using the original entity and relationship sets.
    Otherwise, proceed the same way as for E-R diagrams:
    • Find candidate keys (minimal sets of attributes that can uniquely identify a tuple)
    • Designate one of them to be the primary key
  Foreign keys
    If a relation schema includes the primary key of another relation schema, that attribute is called a foreign key.

Schema Diagram for the Banking Enterprise
  [Figure: schema diagram for the banking enterprise]

Relational Query Languages
  Recall: Query = "retrieval program"
  Language examples:
    Theoretical:
      1. Relational Algebra
      2. Relational Calculus
         a. Tuple Relational Calculus (TRC)
         b. Domain Relational Calculus (DRC)
    Practical:
      1. SQL (originally SEQUEL, from System R)
      2. Quel (used in Ingres)
      3. Datalog (Prolog-like; used in research lab systems)
  Theoretical QLs give semantics to practical QLs.

Relational Algebra
  Basic operators
    1. select (σ)
    2. project (π)
    3. union (∪)
    4. set difference (–)
    5. Cartesian product (×)
    6. rename (ρ)
  Each operator takes one or two relations as input and produces a relation as output:
    Relation(s) → Relational Operator → Relation

Select (σ)
  Notation: σ_{predicate}(Relation)
  Relation: can be the name of a table, or another query
  Predicate:
    1. Simple
       • attribute1 = attribute2
       • attribute = constant value
       (also: ≠, <, >, ≤, ≥)
    2. Complex
       • predicate AND predicate
       • predicate OR predicate
       • NOT predicate

  Examples:
    σ_{bcity = "Brooklyn"}(branch) =
      bname     bcity     assets
      Downtown  Brooklyn  9M
      Brighton  Brooklyn  7.1M

    σ_{assets > 8M}(σ_{bcity = "Brooklyn"}(branch)) =
      bname     bcity     assets
      Downtown  Brooklyn  9M

Project (π)
  Notation: π_{A1, …, An}(Relation)
  • Each Ai is an attribute
  • Idea: π selects columns (vs. σ, which selects rows)
  • Because a relation is a set, duplicate rows are removed from the result

  Examples:
    π_{cstreet, ccity}(customer) =
      cstreet    ccity
      Main       Harrison
      North      Rye
      Park       Pittsfield
      Putnam     Stanford
      Nassau     Princeton
      Spring     Pittsfield
      Alma       Palo Alto
      Sand Hill  Woodside
      Senator    Brooklyn
      Walnut     Stanford
    (12 customers, but only 10 distinct (cstreet, ccity) pairs)

    π_{bcity}(σ_{assets > 5M}(branch)) =
      bcity
      Brooklyn
      Horseneck

Union (∪)
  Notation: Relation1 ∪ Relation2
  R ∪ S is valid only if:
    1. R, S have the same number of columns (arity)
    2. corresponding columns of R, S have the same domain (compatibility)

  Example:
    π_{cname}(depositor) ∪ π_{cname}(borrower) =
      cname
      Johnson
      Smith
      Hayes
      Turner
      Jones
      Lindsay
      Jackson
      Curry
      Williams
      Adams
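The operators so far can be tried out on the bank instance with a small Python sketch. It assumes a relation is modeled as a pair (attribute names, frozenset of value tuples); the helper names `make`, `select`, `project`, `union`, and `is_superkey` are hypothetical, not a library API, and assets are stored as numbers in millions (9M becomes 9.0). Because rows live in a set, projection drops duplicates automatically, matching the 10-row customer projection; `union` checks arity only, not domain compatibility, and `is_superkey` can only test one instance, never the schema itself.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Sequence, Tuple

# Assumed model: (ordered attribute names, frozenset of value tuples).
Relation = Tuple[Tuple[str, ...], FrozenSet[tuple]]


def make(attrs: Sequence[str], rows: Iterable[Sequence]) -> Relation:
    return tuple(attrs), frozenset(tuple(r) for r in rows)


def select(rel: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    """σ_pred(rel): keep rows where pred holds (pred sees a name -> value dict)."""
    attrs, rows = rel
    return attrs, frozenset(r for r in rows if pred(dict(zip(attrs, r))))


def project(rel: Relation, *cols: str) -> Relation:
    """π_cols(rel): keep the named columns; the frozenset drops duplicate rows."""
    attrs, rows = rel
    idx = [attrs.index(c) for c in cols]
    return tuple(cols), frozenset(tuple(r[i] for i in idx) for r in rows)


def union(r: Relation, s: Relation) -> Relation:
    """r ∪ s; only arity is checked here, not domain compatibility."""
    if len(r[0]) != len(s[0]):
        raise ValueError("union requires relations of the same arity")
    return r[0], r[1] | s[1]


def is_superkey(rel: Relation, *cols: str) -> bool:
    """True if no two rows of this particular instance agree on cols."""
    return len(project(rel, *cols)[1]) == len(rel[1])


branch = make(("bname", "bcity", "assets"), [  # assets in millions
    ("Downtown", "Brooklyn", 9.0), ("Redwood", "Palo Alto", 2.1),
    ("Perry", "Horseneck", 1.7), ("Mianus", "Horseneck", 0.4),
    ("R.H.", "Horseneck", 8.0), ("Pownel", "Bennington", 0.3),
    ("N. Town", "Rye", 3.7), ("Brighton", "Brooklyn", 7.1),
])
customer = make(("cname", "cstreet", "ccity"), [
    ("Jones", "Main", "Harrison"), ("Smith", "North", "Rye"),
    ("Hayes", "Main", "Harrison"), ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"), ("Turner", "Putnam", "Stanford"),
    ("Williams", "Nassau", "Princeton"), ("Adams", "Spring", "Pittsfield"),
    ("Johnson", "Alma", "Palo Alto"), ("Glenn", "Sand Hill", "Woodside"),
    ("Brooks", "Senator", "Brooklyn"), ("Green", "Walnut", "Stanford"),
])
depositor = make(("cname", "acct_no"), [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
])
borrower = make(("cname", "lno"), [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
])

brooklyn = select(branch, lambda t: t["bcity"] == "Brooklyn")
big_brooklyn = select(brooklyn, lambda t: t["assets"] > 8)
streets = project(customer, "cstreet", "ccity")
rich_cities = project(select(branch, lambda t: t["assets"] > 5), "bcity")
names = union(project(depositor, "cname"), project(borrower, "cname"))
invalid = make(("bname", "bcity", "assets"),
               [("Brighton", "Brooklyn", 5.0), ("Brighton", "Boston", 3.0)])

print(sorted(big_brooklyn[1]))  # [('Downtown', 'Brooklyn', 9.0)]
print(len(streets[1]))          # 10
print(sorted(rich_cities[1]))   # [('Brooklyn',), ('Horseneck',)]
print(len(names[1]))            # 10
print(is_superkey(branch, "bname"), is_superkey(invalid, "bname"))  # True False
```

Selections compose by nesting calls, exactly as σ expressions nest on the slides.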
Set Difference (–)
  Notation: Relation1 – Relation2
  R – S is valid only if:
    1. R, S have the same number of columns (arity)
    2. corresponding columns of R, S have the same domain (compatibility)

  Example:
    π_{bname}(σ_{amt ≥ 1000}(loan)) – π_{bname}(σ_{balance ≥ 700}(account))

    σ_{amt ≥ 1000}(loan) =
      bname     lno   amt
      Downtown  L-17  1000
      Redwood   L-23  2000
      Perry     L-15  1500
      Downtown  L-14  1500
      Perry     L-16  1300

    σ_{balance ≥ 700}(account) =
      bname     acct_no  balance
      Mianus    A-215    700
      Brighton  A-201    900
      Redwood   A-222    700
      Brighton  A-217    750

    Projecting both on bname gives {Downtown, Redwood, Perry} – {Mianus, Brighton, Redwood} =
      bname
      Downtown
      Perry

Cartesian Product (×)
  Notation: Relation1 × Relation2
  R × S is like the cross product for mathematical relations:
  • every tuple of R is appended to every tuple of S

  Example: depositor × borrower
  How many tuples are in the result?
  A: 56 (7 depositor tuples × 8 borrower tuples)

    depositor.cname  acct_no  borrower.cname  lno
    Johnson          A-101    Jones           L-17
    Johnson          A-101    Smith           L-23
    Johnson          A-101    Hayes           L-15
    Johnson          A-101    Jackson         L-14
    Johnson          A-101    Curry           L-93
    Johnson          A-101    Smith           L-11
    Johnson          A-101    Williams        L-17
    Johnson          A-101    Adams           L-16
    Smith            A-215    Jones           L-17
    …                …        …               …

Rename (ρ)
  Notation: ρ_{identifier}(Relation)
    renames a relation, or
  Notation: ρ_{identifier0(identifier1, …, identifiern)}(Relation)
    renames a relation and the columns of an n-column relation
  Use: massage relations to make ∪ and – valid, or to make them more readable
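Difference, Cartesian product, and rename can be sketched in the same hypothetical set-of-tuples model (`make`, `select`, `project`, `difference`, `cartesian`, and `rename` are illustrative names, not a library API). The difference example uses the predicate balance ≥ 700, which is the one that reproduces the intermediate account table shown for this example; `rename` changes only column names, since this model does not track relation names. Note that `cartesian` keeps both `cname` columns, which is exactly the ambiguity that renaming resolves.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Sequence, Tuple

# Assumed model: (ordered attribute names, frozenset of value tuples).
Relation = Tuple[Tuple[str, ...], FrozenSet[tuple]]


def make(attrs: Sequence[str], rows: Iterable[Sequence]) -> Relation:
    return tuple(attrs), frozenset(tuple(r) for r in rows)


def select(rel: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    attrs, rows = rel
    return attrs, frozenset(r for r in rows if pred(dict(zip(attrs, r))))


def project(rel: Relation, *cols: str) -> Relation:
    attrs, rows = rel
    idx = [attrs.index(c) for c in cols]
    return tuple(cols), frozenset(tuple(r[i] for i in idx) for r in rows)


def difference(r: Relation, s: Relation) -> Relation:
    """r – s: tuples of r that are not in s (arity must match)."""
    if len(r[0]) != len(s[0]):
        raise ValueError("difference requires relations of the same arity")
    return r[0], r[1] - s[1]


def cartesian(r: Relation, s: Relation) -> Relation:
    """r × s: every tuple of r concatenated with every tuple of s."""
    return r[0] + s[0], frozenset(a + b for a, b in product(r[1], s[1]))


def rename(rel: Relation, new_attrs: Sequence[str]) -> Relation:
    """Column-renaming form of ρ: same rows, one new name per column."""
    if len(new_attrs) != len(rel[0]):
        raise ValueError("rename needs one new name per column")
    return tuple(new_attrs), rel[1]


account = make(("bname", "acct_no", "balance"), [
    ("Downtown", "A-101", 500), ("Mianus", "A-215", 700), ("Perry", "A-102", 400),
    ("R.H.", "A-305", 350), ("Brighton", "A-201", 900), ("Redwood", "A-222", 700),
    ("Brighton", "A-217", 750),
])
loan = make(("bname", "lno", "amt"), [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000), ("Perry", "L-15", 1500),
    ("Downtown", "L-14", 1500), ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
])
depositor = make(("cname", "acct_no"), [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
])
borrower = make(("cname", "lno"), [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
])

diff = difference(
    project(select(loan, lambda t: t["amt"] >= 1000), "bname"),
    project(select(account, lambda t: t["balance"] >= 700), "bname"),
)
prod = cartesian(depositor, borrower)
res = rename(prod, ("dcname", "acctno", "bcname", "lno"))

print(sorted(diff[1]))  # [('Downtown',), ('Perry',)]
print(prod[0])          # ('cname', 'acct_no', 'cname', 'lno'): two columns named cname
print(len(prod[1]))     # 56
print(res[0])           # ('dcname', 'acctno', 'bcname', 'lno')
```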
Rename (ρ)
  Example:
    ρ_{res(dcname, acctno, bcname, lno)}(depositor × borrower) =

      res
      dcname   acctno  bcname    lno
      Johnson  A-101   Jones     L-17
      Johnson  A-101   Smith     L-23
      Johnson  A-101   Hayes     L-15
      Johnson  A-101   Jackson   L-14
      Johnson  A-101   Curry     L-93
      Johnson  A-101   Smith     L-11
      Johnson  A-101   Williams  L-17
      Johnson  A-101   Adams     L-16
      Smith    A-215   Jones     L-17
      …        …       …         …

Example Query in RA
  Determine the lno's of loans whose amount is larger than the amt of some other loan
  (i.e., the lno's of all non-minimal loans).
  Can do this in steps:
    Temp1 ← …
    Temp2 ← … Temp1 …
    …
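A direct Python transcription of this query, assuming relations are plain sets of tuples and that column names are tracked separately in schema tuples (`temp2_schema` and `temp3_schema` are illustrative names). It follows the stepwise construction used for this example: project on (lno, amt), rename the copy to (lno2, amt2), take the Cartesian product, select amt > amt2, and project on lno.

```python
from itertools import product

# Loan instance as (bname, lno, amt) tuples.
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000), ("Perry", "L-15", 1500),
    ("Downtown", "L-14", 1500), ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
}

# Temp1 <- π_{lno,amt}(loan), columns (lno, amt)
temp1 = {(lno, amt) for (_bname, lno, amt) in loan}

# Temp2 <- ρ_{Temp2(lno2,amt2)}(Temp1): same tuples under new column names.
# The names live only in this schema tuple; the rows are unchanged.
temp2_schema = ("lno2", "amt2")
temp2 = set(temp1)

# Temp3 <- Temp1 × Temp2, columns (lno, amt, lno2, amt2)
temp3_schema = ("lno", "amt") + temp2_schema
temp3 = {a + b for a, b in product(temp1, temp2)}

# Temp4 <- σ_{amt > amt2}(Temp3): keep pairs whose first loan is larger
i_amt, i_amt2 = temp3_schema.index("amt"), temp3_schema.index("amt2")
temp4 = {t for t in temp3 if t[i_amt] > t[i_amt2]}

# Result <- π_{lno}(Temp4), kept as 1-tuples so it is still a relation
i_lno = temp3_schema.index("lno")
result = {(t[i_lno],) for t in temp4}

print(len(temp3))      # 49
print(sorted(result))  # [('L-11',), ('L-14',), ('L-15',), ('L-16',), ('L-17',), ('L-23',)]
```

Only L-93 (amt 500, the minimum) is missing, since no loan has a smaller amount to compare against. The rename is what makes the selection expressible: without distinct names, amt > amt2 could not say which copy's column is meant.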
Example Query in RA
  1. Find the base data we need
     Temp1 ← π_{lno, amt}(loan)
       lno   amt
       L-17  1000
       L-23  2000
       L-15  1500
       L-14  1500
       L-93  500
       L-11  900
       L-16  1300

  2. Make a copy of (1)
     Temp2 ← ρ_{Temp2(lno2, amt2)}(Temp1)
       lno2  amt2
       L-17  1000
       L-23  2000
       L-15  1500
       L-14  1500
       L-93  500
       L-11  900
       L-16  1300

  3. Take the Cartesian product of (1) and (2)
     Temp3 ← Temp1 × Temp2
       lno   amt   lno2  amt2
       L-17  1000  L-17  1000
       L-17  1000  L-23  2000
       …     …     …     …
       L-17  1000  L-16  1300
       L-23  2000  L-17  1000
       L-23  2000  L-23  2000
       …     …     …     …
       L-23  2000  L-16  1300
       …     …     …     …

  4. Select non-minimal loans
     Temp4 ← σ_{amt > amt2}(Temp3)

  5. Project on lno
     Result ← π_{lno}(Temp4)

  … or, if you prefer, as a single expression:
     π_{lno}(σ_{amt > amt2}(π_{lno, amt}(loan) × ρ_{Temp2(lno2, amt2)}(π_{lno, amt}(loan))))

What we learned so far…
  Relational algebra operators
    1. Select
    2. Project
    3. Set Union
    4. Set Difference
    5. Cartesian Product
    6. Rename
  These are called the fundamental operations.

Formal Definition
  Basic expression:
  • a relation in the database
  • a constant relation, e.g. {(A-101, Downtown, 500), (A-215, Mianus, 700), …}
  Let E1 and E2 be relational-algebra expressions; then the following are also relational-algebra expressions:
    1. σ_{P}(E1), where P is a predicate on attributes in E1
    2. π_{S}(E1), where S is a list containing some attributes in E1
    3. E1 ∪ E2
    4. E1 – E2
    5. E1 × E2
    6. ρ_{x}(E1), where x is the new name for the result of E1

Relational Algebra: Redundant Operators
  1. Natural Join (⋈)
  2. Division (÷)
  3. Outer Joins (⟕, ⟖, ⟗)
  4. Update (←) (we've already been using it)
  • Redundant: each of the above can be expressed in terms of the fundamental operators, e.g.
      depositor ⋈ borrower = π…(σ…(depositor × ρ…(borrower)))
  • Added as a convenience

Natural Join
  Notation: Relation1 ⋈ Relation2
  Idea: combines ρ, ×, and σ (with a final π that keeps one copy of each shared attribute)
  [Example: relations r and s and their natural join r ⋈ s]

  depositor ⋈ borrower ≡ π_{cname, acct_no, lno}(σ_{cname = cname2}(depositor × ρ_{t(cname2, lno)}(borrower)))

Division
  Notation: Relation1 ÷ Relation2
  Idea: expresses "for all" queries

    r =
      A  B
      α  1
      α  2
      α  3
      β  1
      γ  1
      γ  3
      γ  4
      γ  6
      δ  1
      δ  2

    s =
      B
      1
      2

    r ÷ s =
      A
      α
      δ

  Query: find the values of A in r that are paired with every B value in s.

Division: Another Way to Look at It
  Integer division: 17 ÷ 3 = 5, the largest value of i such that i · 3 ≤ 17
  Relational division: r ÷ s = t, the largest t such that t × s ⊆ r
  (For the r and s above, t = {α, δ}.)

Division: A More Complex Example
    r =
      A  B  C  D  E
      α  a  α  a  1
      α  a  γ  a  1
      α  a  γ  b  1
      β  a  γ  a  1
      β  a  γ  b  3
      γ  a  γ  a  1
      γ  a  γ  b  1
      γ  a  β  b  1

    s =
      D  E
      a  1
      b  1

    r ÷ s = t =
      A  B  C
      α  a  γ
      γ  a  γ
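Both redundant operators can be sketched straight from their definitions in a hypothetical set-of-tuples model (`make`, `natural_join`, and `divide` are illustrative names, not a library API). `natural_join` pairs tuples that agree on every shared attribute name and keeps one copy of each shared column, mirroring the π/σ/ρ rewrite of depositor ⋈ borrower; `divide` keeps each candidate t over the attributes of r that are not in s exactly when t × s ⊆ r. Filtering candidates this way is chosen for readability; it is not the rewrite of ÷ in terms of the fundamental operators.

```python
from itertools import product
from typing import FrozenSet, Iterable, Sequence, Tuple

# Assumed model: (ordered attribute names, frozenset of value tuples).
Relation = Tuple[Tuple[str, ...], FrozenSet[tuple]]


def make(attrs: Sequence[str], rows: Iterable[Sequence]) -> Relation:
    return tuple(attrs), frozenset(tuple(r) for r in rows)


def natural_join(r: Relation, s: Relation) -> Relation:
    """r ⋈ s: pair tuples agreeing on all shared attributes; shared columns appear once."""
    common = [a for a in r[0] if a in s[0]]
    s_only = [a for a in s[0] if a not in common]
    ri = {a: i for i, a in enumerate(r[0])}
    si = {a: i for i, a in enumerate(s[0])}
    rows = frozenset(
        tr + tuple(ts[si[a]] for a in s_only)
        for tr, ts in product(r[1], s[1])
        if all(tr[ri[a]] == ts[si[a]] for a in common)
    )
    return r[0] + tuple(s_only), rows


def divide(r: Relation, s: Relation) -> Relation:
    """r ÷ s: tuples t over r's non-s attributes with t × s ⊆ r.
    Assumes every attribute of s is also an attribute of r."""
    keep = [a for a in r[0] if a not in s[0]]
    ri = {a: i for i, a in enumerate(r[0])}
    candidates = {tuple(row[ri[a]] for a in keep) for row in r[1]}

    def full_row(t: tuple, u: tuple) -> tuple:
        # Rebuild an r-tuple, in r's column order, from t (over keep) and u (over s).
        values = dict(zip(keep, t))
        values.update(zip(s[0], u))
        return tuple(values[a] for a in r[0])

    rows = frozenset(t for t in candidates if all(full_row(t, u) in r[1] for u in s[1]))
    return tuple(keep), rows


depositor = make(("cname", "acct_no"), [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
])
borrower = make(("cname", "lno"), [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
])
r1 = make(("A", "B"), [("α", 1), ("α", 2), ("α", 3), ("β", 1), ("γ", 1),
                       ("γ", 3), ("γ", 4), ("γ", 6), ("δ", 1), ("δ", 2)])
s1 = make(("B",), [(1,), (2,)])
r2 = make(("A", "B", "C", "D", "E"), [
    ("α", "a", "α", "a", 1), ("α", "a", "γ", "a", 1), ("α", "a", "γ", "b", 1),
    ("β", "a", "γ", "a", 1), ("β", "a", "γ", "b", 3), ("γ", "a", "γ", "a", 1),
    ("γ", "a", "γ", "b", 1), ("γ", "a", "β", "b", 1),
])
s2 = make(("D", "E"), [("a", 1), ("b", 1)])

joined = natural_join(depositor, borrower)
print(joined[0])                  # ('cname', 'acct_no', 'lno')
print(sorted(joined[1]))          # [('Hayes', 'A-102', 'L-15'), ('Jones', 'A-217', 'L-17'), ('Smith', 'A-215', 'L-11'), ('Smith', 'A-215', 'L-23')]
print(sorted(divide(r1, s1)[1]))  # [('α',), ('δ',)]
print(sorted(divide(r2, s2)[1]))  # [('α', 'a', 'γ'), ('γ', 'a', 'γ')]
```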