CMSC424: Database Design Instructor: Amol Deshpande

advertisement
CMSC424: Database
Design
Instructor: Amol Deshpande
amol@cs.umd.edu
Today…


Integrity Constraints
Relational Database Design
Referential Integrity Constraints
Idea: prevent “dangling tuples” (e.g.: a loan with a
bname, Kenmore, when no Kenmore tuple in branch)
Referencing
Relation
(e.g. loan)
“foreign key”
bname
Referenced
Relation
(e.g. branch)
primary key
bname
Ref Integrity:
ensure that:
foreign key value 
primary key value
(note: don’t need to ensure , i.e., not all branches have to have loans)
Referential Integrity Constraints
bname
Referencing
Relation
(e.g. loan)
x
x
bname
x
Referenced
Relation
(e.g. branch)
In SQL:
CREATE TABLE branch(
bname CHAR(15) PRIMARY KEY
....)
CREATE TABLE loan (
.........
FOREIGN KEY bname REFERENCES branch);
Affects:
1) Insertions, updates of referencing relation
2) Deletions, updates of referenced relation
Referential Integrity Constraints
c
ti
x
tj
x
c
x
A
B
what happens when
we try to delete
this tuple?
Ans: 3 possibilities
1) reject deletion/ update
2) set
ti [c], tj[c] = NULL
3) propagate deletion/update
DELETE: delete ti, tj
UPDATE: set ti[c], tj[c] to updated values
Referential Integrity Constraints
c
ti
x
tj
x
A
c
x
B
what happens when
we try to delete
this tuple?
CREATE TABLE A ( .....
FOREIGN KEY c REFERENCES B action
.......... )
Action:
1) left blank (deletion/update rejected)
2) ON DELETE SET NULL/ ON UPDATE SET NULL
sets ti[c] = NULL, tj[c] = NULL
3) ON DELETE CASCADE
deletes ti, tj
ON UPDATE CASCADE
sets ti[c], tj[c] to new key values
Global Constraints
Idea: two kinds
1) single relation (constraints spans multiple columns)
E.g.: CHECK (total = svngs + check) declared in the CREATE TABLE
2) multiple relations: CREATE ASSERTION
SQL examples:
1) single relation: All Bkln branches must have assets > 5M
CREATE TABLE branch (
..........
bcity CHAR(15),
assets INT,
CHECK (NOT(bcity = ‘Bkln’) OR assets > 5M))
Affects:
insertions into branch
updates of bcity or assets in branch
Global Constraints
SQL example:
2) Multiple relations: every loan has a borrower with a savings account
CHECK (NOT EXISTS (
SELECT *
FROM loan AS L
WHERE NOT EXISTS(
SELECT *
FROM borrower B, depositor D, account A
WHERE B.cname = D.cname AND
D.acct_no = A.acct_no AND
L.lno = B.lno)))
Problem: Where to put this constraint? At depositor? Loan? ....
Ans: None of the above:
CREATE ASSERTION loan-constraint
CHECK( ..... )
Checked with EVERY DB update!
very expensive.....
Summary: Integrity Constraints
Constraint Type
Where declared
Affects...
Expense
Key Constraints
CREATE TABLE
Insertions, Updates
Moderate
Insertions, Updates
Cheap
1.Insertions into
referencing rel’n
1,2: like key constraints.
Another reason to
index/sort on the primary
keys
(PRIMARY KEY, UNIQUE)
Attribute Constraints
CREATE TABLE
CREATE DOMAIN
Referential Integrity
(Not NULL, CHECK)
Table Tag
(FOREIGN KEY ....
REFERENCES ....)
Global Constraints
Table Tag (CHECK)
or
outside table
(CREATE ASSERTION)
2. Updates of
referencing rel’n of
relevant attrs
3,4: depends on
3. Deletions from
referenced rel’n
a. update/delete policy
chosen
4. Update of
referenced rel’n
b. existence of indexes
on foreign key
1. For single rel’n
constraint, with
insertion, deletion of
relevant attrs
1. cheap
2. For assesrtions w/
every db modification
2. very expensive
SQL
Is that it ?
Unfortunately No
SQL 3 standard is several hundreds of pages (if not several
thousands)
And expensive too..
We will discuss a few more constructs along the way
E.g. Embedded SQL, creating indexes etc
Again, this is what the reference books are for; you just need
to know where to look in the reference book
Questions ?
Next:
Relational Database Design
Relational Database Design
Where did we come up with the schema that we
used ?
E.g. why not store the actor names with movies ?
Topics:
Formal definition of what it means to be a “good”
schema.
How to achieve it.
Movies Database Schema
Movie(title, year, length, inColor, studioName, producerC#)
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
Changed to:
Movie(title, year, length, inColor, studioName, producerC#, starName)
<StarsIn merged into above>
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
Movie(title, year, length, inColor, studioName, producerC#, starName)
Title
Year
Length
inColor StudioName
prodC#
StarName
Star wars
1977
121
Yes
Fox
128
Hamill
Star wars
1977
121
Yes
Fox
128
Fisher
Star wars
1977
121
Yes
Fox
128
H. Ford
King Kong
2005
187
Yes
Universal
150
Watts
King Kong
1933
100
no
RKO
20
Fay
Issues:
1.
Redundancy  higher storage, inconsistencies (“anomalies”)
2.
Need nulls
Unable to represent some information without using nulls
How to store movies w/o actors (pre-productions etc) ?
Movie(title, year, length, inColor, studioName, producerC#, starNames)
Title
Year
Length
inColor StudioName
prodC#
StarNames
Star wars
1977
121
Yes
Fox
128
{Hamill,
Fisher, H.
ford}
King Kong
2005
187
Yes
Universal
150
Watts
King Kong
1933
100
no
RKO
20
Fay
Issues:
3. Avoid sets
- Hard to represent
- Hard to query
Smaller schemas always good ????
Split Studio(name, address, presC#) into:
Studio1 (name, presC#)
Studio2(name, address)???
Name
presC#
Name
Address
Fox
101
Fox
Address1
Studio2
101
Studio2
Address1
Universial
102
Universial
Address2
This process is also called “decomposition”
Issues:
4. Requires more joins (w/o any obvious benefits)
5. Hard to check for some dependencies
What if the “address” is actually the presC#’s address ?
No easy way to ensure that constraint (w/o a join).
Smaller schemas always good ????
Decompose StarsIn(movieTitle, movieYear, starName) into:
StarsIn1(movieTitle, movieYear)
StarsIn2(movieTitle, starName) ???
movieTitle
movieYear
movieTitle
starName
Star wars
1977
Star Wars
Hamill
King Kong
1933
King Kong
Watts
King Kong
2005
King Kong
Faye
Issues:
6. “joining” them back results in more tuples than what we started with
(King Kong, 1933, Watts) & (King Kong, 2005, Faye)
This is a “lossy” decomposition
We lost some constraints/information
The previous example was a “lossless” decomposition.
Desiredata
No sets
Correct and faithful to the original design
Avoid lossy decompositions
As little redundancy as possible
To avoid potential anomalies
No “inability to represent information”
Nulls shouldn’t be required to store information
Dependency preservation
Should be possible to check for constraints
Approach
We will encode and list all our knowledge about the schema
somehow
Functional dependencies (FDs)
SSN  name
(SSN “implies” length)
If two tuples have the same “SSN”, they must have the same “name”
movietitle  length --- Not true.
But, (movietitle, movieYear)  length --- True.
We will define a set of rules that the schema must follow to be
considered good
“Normal forms”: 1NF, 2NF, 3NF, BCNF, 4NF, …
Rules specify constraints on the schemas and FDs
Download