CMSC424: Database Design
Instructor: Amol Deshpande
amol@cs.umd.edu

Today…
    Integrity Constraints
    Relational Database Design

Referential Integrity Constraints

Idea: prevent “dangling tuples” (e.g., a loan with bname Kenmore when there is no Kenmore tuple in branch).

    Referencing relation (e.g. loan): “foreign key” bname
    Referenced relation (e.g. branch): primary key bname

Referential integrity: ensure that every foreign key value in the referencing relation also appears as a primary key value in the referenced relation.
(Note: the converse need not hold, i.e., not all branches have to have loans.)

[Figure: tuples in the referencing relation (e.g. loan) with bname = x point to the tuple with bname = x in the referenced relation (e.g. branch).]

In SQL:

    CREATE TABLE branch (
        bname CHAR(15) PRIMARY KEY
        ....);

    CREATE TABLE loan (
        .........
        FOREIGN KEY (bname) REFERENCES branch);

Affects:
    1) Insertions, updates of referencing relation
    2) Deletions, updates of referenced relation

[Figure: tuples ti and tj in referencing relation A have value x in column c; referenced relation B contains the tuple with c = x.]

What happens when we try to delete (or update) the referenced tuple in B?

Ans: 3 possibilities
    1) Reject the deletion/update
    2) Set ti[c], tj[c] = NULL
    3) Propagate the deletion/update
        DELETE: delete ti, tj
        UPDATE: set ti[c], tj[c] to the updated values

In SQL:

    CREATE TABLE A (
        .....
        FOREIGN KEY (c) REFERENCES B action
        ..........);

Action:
    1) Left blank (deletion/update rejected)
    2) ON DELETE SET NULL / ON UPDATE SET NULL
        sets ti[c] = NULL, tj[c] = NULL
    3) ON DELETE CASCADE
        deletes ti, tj
       ON UPDATE CASCADE
        sets ti[c], tj[c] to the new key values

Global Constraints

Idea: two kinds
    1) Single relation (constraint spans multiple columns)
        E.g.: CHECK (total = svngs + check), declared in the CREATE TABLE
    2) Multiple relations: CREATE ASSERTION

SQL examples:

1) Single relation: all Bkln branches must have assets > 5M

    CREATE TABLE branch (
        ..........
        bcity CHAR(15),
        assets INT,
        CHECK (NOT (bcity = 'Bkln') OR assets > 5000000));

Affects:
    insertions into branch
    updates of bcity or assets in branch

Global Constraints

SQL example:

2) Multiple relations: every loan has a borrower with a savings account

    CHECK (NOT EXISTS (
        SELECT *
        FROM loan AS L
        WHERE NOT EXISTS (
            SELECT *
            FROM borrower B, depositor D, account A
            WHERE B.cname = D.cname
              AND D.acct_no = A.acct_no
              AND L.lno = B.lno)))

Problem: where to put this constraint? At depositor? Loan? ....

Ans: none of the above:

    CREATE ASSERTION loan_constraint
        CHECK ( ..... )

Checked with EVERY DB update! Very expensive.....

Summary: Integrity Constraints

Key Constraints (PRIMARY KEY, UNIQUE)
    Where declared: CREATE TABLE
    Affects: insertions, updates
    Expense: moderate

Attribute Constraints (NOT NULL, CHECK)
    Where declared: CREATE TABLE, CREATE DOMAIN
    Affects: insertions, updates
    Expense: cheap

Referential Integrity (FOREIGN KEY .... REFERENCES ....)
    Where declared: table tag
    Affects:
        1. Insertions into referencing rel’n
        2. Updates of referencing rel’n of relevant attrs
        3. Deletions from referenced rel’n
        4. Updates of referenced rel’n
    Expense:
        1, 2: like key constraints. Another reason to index/sort on the primary keys.
        3, 4: depends on
            a. update/delete policy chosen
            b. existence of indexes on foreign key

Global Constraints
    Where declared: table tag (CHECK) or outside table (CREATE ASSERTION)
    Affects:
        1. For a single rel’n constraint: insertions and updates of relevant attrs
        2. For assertions: every db modification
    Expense:
        1. cheap
        2. very expensive

SQL

Is that it?
    Unfortunately, no.
    The SQL3 standard is several hundred pages long (if not several thousand).
    And expensive too.

We will discuss a few more constructs along the way
    E.g. embedded SQL, creating indexes, etc.

Again, this is what the reference books are for; you just need to know where to look in the reference book.

Questions?

Next: Relational Database Design

Relational Database Design

How did we come up with the schema that we used?
    E.g. why not store the actor names with movies?
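One way to make the question above concrete is to try it. The sketch below is only an illustration, not part of the original slides: it uses Python's built-in sqlite3 module, keeps only a subset of the Movie columns (inColor and producerC# are left out for brevity), and reuses the lecture's Star Wars rows. The changed length (125) and the movie "In Preproduction" are hypothetical values made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: a trimmed-down Movie table with the star name stored per row.
conn.execute("""
    CREATE TABLE Movie (
        title      TEXT,
        year       INT,
        length     INT,
        studioName TEXT,
        starName   TEXT   -- actor stored together with the movie
    )""")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?, ?)",
    [
        ("Star Wars", 1977, 121, "Fox", "Hamill"),
        ("Star Wars", 1977, 121, "Fox", "Fisher"),
        ("Star Wars", 1977, 121, "Fox", "H. Ford"),
    ],
)

# Redundancy: the movie's length and studio are repeated once per star.
copies = conn.execute(
    "SELECT COUNT(*) FROM Movie WHERE title = 'Star Wars' AND year = 1977"
).fetchone()[0]

# Update anomaly: an update that reaches only one copy (hypothetical new
# length 125) leaves the same movie with two different lengths.
conn.execute(
    "UPDATE Movie SET length = 125 "
    "WHERE title = 'Star Wars' AND starName = 'Hamill'"
)
lengths = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT length FROM Movie "
        "WHERE title = 'Star Wars' AND year = 1977 ORDER BY length"
    )
]

# Nulls: a movie with no actors yet (hypothetical row) can only be stored
# by putting NULL in starName.
conn.execute("INSERT INTO Movie VALUES ('In Preproduction', 2006, NULL, 'Fox', NULL)")
movies_without_star = conn.execute(
    "SELECT COUNT(*) FROM Movie WHERE starName IS NULL"
).fetchone()[0]

print(copies, lengths, movies_without_star)
```

Keeping the stars in a separate StarsIn relation, as in the schema used so far, stores each movie's length exactly once, so neither problem can arise there.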
Topics:
    Formal definition of what it means to be a “good” schema.
    How to achieve it.

Movies Database Schema

    Movie(title, year, length, inColor, studioName, producerC#)
    StarsIn(movieTitle, movieYear, starName)
    MovieStar(name, address, gender, birthdate)
    MovieExec(name, address, cert#, netWorth)
    Studio(name, address, presC#)

Changed to:

    Movie(title, year, length, inColor, studioName, producerC#, starName)
        <StarsIn merged into above>
    MovieStar(name, address, gender, birthdate)
    MovieExec(name, address, cert#, netWorth)
    Studio(name, address, presC#)

Movie(title, year, length, inColor, studioName, producerC#, starName)

    Title      Year  Length  inColor  StudioName  prodC#  StarName
    Star Wars  1977  121     Yes      Fox         128     Hamill
    Star Wars  1977  121     Yes      Fox         128     Fisher
    Star Wars  1977  121     Yes      Fox         128     H. Ford
    King Kong  2005  187     Yes      Universal   150     Watts
    King Kong  1933  100     No       RKO         20      Fay

Issues:
    1. Redundancy
        Higher storage, inconsistencies (“anomalies”)
    2. Need nulls
        Unable to represent some information without using nulls
        How to store movies w/o actors (pre-productions etc.)?

Movie(title, year, length, inColor, studioName, producerC#, starNames)

    Title      Year  Length  inColor  StudioName  prodC#  StarNames
    Star Wars  1977  121     Yes      Fox         128     {Hamill, Fisher, H. Ford}
    King Kong  2005  187     Yes      Universal   150     Watts
    King Kong  1933  100     No       RKO         20      Fay

Issues:
    3. Avoid sets
        - Hard to represent
        - Hard to query

Smaller schemas always good ????

Split Studio(name, address, presC#) into:
    Studio1(name, presC#)
    Studio2(name, address) ???

    Studio1                 Studio2
    Name       presC#       Name       Address
    Fox        101          Fox        Address1
    Studio2    101          Studio2    Address1
    Universal  102          Universal  Address2

This process is also called “decomposition”.

Issues:
    4. Requires more joins (w/o any obvious benefits)
    5. Hard to check for some dependencies
        What if the “address” is actually the presC#’s address?
        No easy way to ensure that constraint (w/o a join).

Smaller schemas always good ????
Decompose StarsIn(movieTitle, movieYear, starName) into:
    StarsIn1(movieTitle, movieYear)
    StarsIn2(movieTitle, starName) ???

    StarsIn1                   StarsIn2
    movieTitle  movieYear      movieTitle  starName
    Star Wars   1977           Star Wars   Hamill
    King Kong   1933           King Kong   Watts
    King Kong   2005           King Kong   Fay

Issues:
    6. “Joining” them back results in more tuples than what we started with
        (King Kong, 1933, Watts) & (King Kong, 2005, Fay)
       This is a “lossy” decomposition
        We lost some constraints/information
       The previous example was a “lossless” decomposition.

Desiderata
    No sets
    Correct and faithful to the original design
        Avoid lossy decompositions
    As little redundancy as possible
        To avoid potential anomalies
    No “inability to represent information”
        Nulls shouldn’t be required to store information
    Dependency preservation
        Should be possible to check for constraints

Approach

We will encode and list all our knowledge about the schema somehow
    Functional dependencies (FDs)
        SSN → name (SSN “implies” name)
            If two tuples have the same “SSN”, they must have the same “name”
        movietitle → length --- Not true.
        But, (movietitle, movieYear) → length --- True.

We will define a set of rules that the schema must follow to be considered good
    “Normal forms”: 1NF, 2NF, 3NF, BCNF, 4NF, …
    Rules specify constraints on the schemas and FDs
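The lossy StarsIn decomposition and the two length dependencies above can be checked on the lecture's sample rows. The following is a minimal sketch using Python's sqlite3 module, not something from the original slides. The helper name fd_violations and the three-column Movie table are assumptions introduced for illustration. Note that testing one instance can only refute an FD; whether an FD truly holds is a statement about every legal instance and comes from knowledge of the domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample rows taken from the lecture's tables.
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT)")
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", [
    ("Star Wars", 1977, "Hamill"),
    ("King Kong", 2005, "Watts"),
    ("King Kong", 1933, "Fay"),
])

# Decompose by projecting onto the two smaller schemas.
conn.execute("CREATE TABLE StarsIn1 AS SELECT DISTINCT movieTitle, movieYear FROM StarsIn")
conn.execute("CREATE TABLE StarsIn2 AS SELECT DISTINCT movieTitle, starName FROM StarsIn")

# Join the pieces back on the shared attribute and compare with the original.
original = set(conn.execute("SELECT movieTitle, movieYear, starName FROM StarsIn"))
rejoined = set(conn.execute("""
    SELECT s1.movieTitle, s1.movieYear, s2.starName
    FROM StarsIn1 AS s1
    JOIN StarsIn2 AS s2 ON s1.movieTitle = s2.movieTitle"""))
spurious = rejoined - original   # tuples that were never in StarsIn


def fd_violations(conn, table, lhs, rhs):
    """Hypothetical helper: list the lhs values that occur with more than
    one rhs value. An empty list means this instance does not violate
    lhs -> rhs (it does not prove the FD in general)."""
    cols = ", ".join(lhs)
    sql = (f"SELECT {cols} FROM {table} "
           f"GROUP BY {cols} HAVING COUNT(DISTINCT {rhs}) > 1")
    return conn.execute(sql).fetchall()


# Assumption: a Movie table reduced to the columns needed for the FD check.
conn.execute("CREATE TABLE Movie (title TEXT, year INT, length INT)")
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?)", [
    ("Star Wars", 1977, 121),
    ("King Kong", 2005, 187),
    ("King Kong", 1933, 100),
])

title_only = fd_violations(conn, "Movie", ["title"], "length")
title_and_year = fd_violations(conn, "Movie", ["title", "year"], "length")

print(sorted(spurious))
print(title_only, title_and_year)
```

The join brings back the two spurious King Kong tuples named above because movieTitle alone does not determine movieYear or starName. The same grouping query reports King Kong as a counterexample to movietitle → length and finds none for (movietitle, movieYear) → length.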