CS 405G: Introduction to Database Systems Instructor: Chen Qian

CS 405G: Introduction to
Database Systems
Lecture 8: SQL III and Functional Dependency
Instructor: Chen Qian


3/4 HW3 due
Midterm exam to 3/7 (1-2pm)
7/1/2016
Chen Qian @ University of Kentucky
2
Trigger options
- Possible events include:
  - INSERT ON table
  - DELETE ON table
  - UPDATE [OF column] ON table
- Granularity: a trigger can be activated
  - FOR EACH ROW modified
  - FOR EACH STATEMENT that performs modification
- Timing: the action can be executed
  - AFTER or BEFORE the triggering event
Transition variables
- OLD ROW: the modified row before the triggering event
- NEW ROW: the modified row after the triggering event
- OLD TABLE: a hypothetical read-only table containing all modified rows before the triggering event
- NEW TABLE: a hypothetical table containing all modified rows after the triggering event
- Not all of them make sense all the time, e.g.
  - AFTER INSERT statement-level triggers: can use only NEW TABLE
  - BEFORE DELETE row-level triggers: can use only OLD ROW
  - etc.
Statement-level trigger example
CREATE TRIGGER AutoRecruit
AFTER INSERT ON Student
REFERENCING NEW TABLE AS newStudents
FOR EACH STATEMENT
INSERT INTO Enroll
(SELECT SID, 'CS405'
FROM newStudents
WHERE GPA > 3.0);
Efficiency???
BEFORE trigger example



Never give faculty more than 50% raise in one update
CREATE TRIGGER NotTooGreedy
BEFORE UPDATE OF salary ON Faculty
REFERENCING OLD ROW AS o, NEW ROW AS n
FOR EACH ROW
WHEN (n.salary > 1.5 * o.salary)
SET n.salary = 1.5 * o.salary;
BEFORE triggers are often used to “condition” data
Another option is to raise an error in the trigger body to abort the
transaction that caused the trigger to fire
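
The error-raising option can be sketched as follows. This is only an illustration, assuming SQLite driven from Python's sqlite3 module rather than the SQL standard syntax used on the slide: SQLite names the transition variables OLD and NEW, does not allow SET on them, and uses its own RAISE(ABORT, ...) function to reject the triggering statement. The sample row and the function name demo_raise_guard are made up.

```python
import sqlite3


def demo_raise_guard():
    """Reject any update that raises a salary by more than 50%."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Faculty (name TEXT PRIMARY KEY, salary REAL)")
    conn.execute("INSERT INTO Faculty VALUES ('Ann', 100.0)")
    # SQLite dialect: OLD/NEW instead of REFERENCING ... AS o, n
    conn.execute("""
        CREATE TRIGGER NotTooGreedy
        BEFORE UPDATE OF salary ON Faculty
        FOR EACH ROW
        WHEN NEW.salary > 1.5 * OLD.salary
        BEGIN
            SELECT RAISE(ABORT, 'raise exceeds 50%');
        END
    """)
    rejected = False
    try:
        # 100 -> 200 is a 100% raise, so the trigger aborts the statement
        conn.execute("UPDATE Faculty SET salary = 200.0 WHERE name = 'Ann'")
    except sqlite3.DatabaseError:
        rejected = True
    # 100 -> 140 is within the limit, so this update goes through
    conn.execute("UPDATE Faculty SET salary = 140.0 WHERE name = 'Ann'")
    salary = conn.execute("SELECT salary FROM Faculty").fetchone()[0]
    conn.close()
    return rejected, salary
```

Unlike the SET version above, which silently caps the new salary, this variant leaves the row untouched and reports the problem to the caller.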
Statement- vs. row-level triggers
Why are both needed?
- Certain triggers are only possible at statement level
  - If the average GPA of students inserted by this statement exceeds 3.0, do …
- Simple row-level triggers are easier to implement and may be more efficient
  - Statement-level triggers require a significant amount of state to be maintained in OLD TABLE and NEW TABLE
  - However, a row-level trigger does get fired for each row, so complex row-level triggers may be inefficient for statements that generate lots of modifications
Another statement-level trigger

Give faculty a raise if all GPAs changed by one update statement
are increasing
CREATE TRIGGER AutoRaise
AFTER UPDATE OF GPA ON Student
REFERENCING OLD TABLE AS o, NEW TABLE AS n
FOR EACH STATEMENT
WHEN (NOT EXISTS(SELECT * FROM o, n
WHERE o.SID = n.SID
AND o.GPA >= n.GPA))
UPDATE Faculty SET salary = salary + 1000;

A row-level trigger would be difficult to write in this case
System issues
- Recursive firing of triggers
  - Action of one trigger causes another trigger to fire
  - Can get into an infinite loop
  - Some DBMS restrict trigger actions
  - Most DBMS set a maximum level of recursion (16 in DB2)
- Interaction with constraints (very tricky to get right!)
  - When do we check if a triggering event violates constraints?
    - After a BEFORE trigger (so the trigger can fix a potential violation)
    - Before an AFTER trigger
    - AFTER triggers also see the effects of, say, cascaded deletes caused by referential integrity constraint violations
Summary of SQL features covered so far
- Query
- Modification
- Constraints
- Triggers
Exercise

Consider the following relational schema and briefly
answer the questions that follow:

Define a table constraint on Emp that will ensure that
every employee makes at least $10,000.
Exercise

Define a table constraint on Dept that will ensure that all
managers have age > 30.
Exercise

Print the names and ages of each employee who works
in both the Hardware department and the Software
department.
Exercise

For each department with more than 20 full-time-equivalent
employees (i.e., where the part-time and full-time employees add
up to at least that many full-time employees; each full-time
employee's time is counted as 100), print the did together with the
number of employees that work in that department.
Exercise

Print the name of each employee whose salary exceeds
the budget of all of the departments that he or she works
in.
Exercise

Find the enames of managers who manage the
departments with the largest budgets.
Exercise

If a manager manages more than one department, he or
she controls the sum of all the budgets for those
departments. Find the managerids of managers who
control more than $5 million.
Exercise

Find the managerids of managers who control the
largest amounts.
Exercise

Find the enames of managers who manage only
departments with budgets larger than $1 million, but at
least one department with budget less than $5 million.
Homework 3

(b) Find the snames of suppliers who supply every part.

(c) Find the sids of suppliers who charge more for some
part than the average cost of that part (averaged over all
the suppliers who supply that part).

(d) Find the sids of suppliers who supply a red part and
a green part.

(a). Write the SQL statements required to create these
relations, including appropriate versions of all primary and
foreign key integrity constraints



(b). Express each of the following integrity constraints
in SQL unless it is implied by the primary and foreign
key constraint; if so, explain how it is implied. If the
constraint cannot be expressed in SQL, say so.
I. Every class has a minimum enrollment of 5 students
and a maximum enrollment of 30 students.
Add

II. The department with the most faculty members must
have fewer than twice the number of faculty members in
the department with the fewest faculty members

Functional Dependency is not included in the Midterm
Exam
Today’s Topic
- Functional Dependency
- Normalization
- Decomposition
- BCNF
Motivation

How do we tell if a design is bad, e.g.,
Enroll(SID, Sname, CID, Cname, grade)?

This design has redundancy, because the name of a student
is recorded multiple times, once for each course the student
is taking

SID   CID  Sname         Cname  grade
1234  10   John Smith    DB     A
1123  9    Ben Liu       NET    A
1234  9    John Smith    NET    B
1123  10   Ben Liu       DB     C
1023  10   Susan Sidhuk  DB     B

SID   Sname
1234  John Smith
1123  Ben Liu
1023  Susan Sidhuk

CID  Cname
9    NET
10   DB

SID   CID  grade
1234  10   A
1123  9    A
1234  9    B
1123  10   C
1023  10   B

Why is redundancy bad?
- It wastes disk space.
- What if we want to perform update operations on the relation?
  - INSERT a new course that no student has enrolled in yet.
  - UPDATE the name of “John Smith” to “John L. Smith”
  - DELETE the last student enrolled in a certain course

SID   CID  Sname         Cname  grade
1234  10   John Smith    DB     A
1123  9    Ben Liu       NET    A
1234  9    John Smith    NET    B
1123  10   Ben Liu       DB     C
1023  10   Susan Sidhuk  DB     B

Functional dependencies


A functional dependency (FD) has the form X -> Y,
where X and Y are sets of attributes in a relation R
X -> Y means that whenever two tuples in R agree on all
the attributes in X, they must also agree on all attributes
in Y

t1[X] = t2[X] ⇒ t1[Y] = t2[Y]

X  Y  Z
a  b  c
a  ?  ?

In the second row, Y must be “b”; Z could be anything, e.g. d.
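
The definition can be checked mechanically on a given relation instance. The following is a minimal sketch; the function name fd_holds and the list-of-dicts representation are assumptions made for illustration. Note that an instance can only refute an FD: an FD is a constraint on the schema, so passing this check on one instance does not prove the FD.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this relation instance.

    rows is a list of dicts (attribute name -> value); lhs and rhs are
    lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x_val = tuple(row[a] for a in lhs)
        y_val = tuple(row[a] for a in rhs)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True


# The two-row instance from the slide, with the second row filled in.
r = [
    {"X": "a", "Y": "b", "Z": "c"},
    {"X": "a", "Y": "b", "Z": "d"},
]
x_determines_y = fd_holds(r, ["X"], ["Y"])  # both rows agree on Y
x_determines_z = fd_holds(r, ["X"], ["Z"])  # rows differ on Z
```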
FD examples
Address (street_address, city, state, zip)
- street_address, city, state -> zip
- zip -> city, state
- zip, state -> zip?
  - This is a trivial FD
  - Trivial FD: LHS ⊇ RHS
- zip -> state, zip?
  - This is non-trivial, but not completely non-trivial
  - Completely non-trivial FD: LHS ∩ RHS = ∅
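
The three cases can be told apart with two set tests. A small sketch (the function name classify_fd is made up for illustration):

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs by comparing its two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"  # RHS is contained in LHS
    if lhs & rhs:
        return "non-trivial"  # sides overlap, but RHS adds something new
    return "completely non-trivial"  # LHS and RHS are disjoint


a = classify_fd({"zip", "state"}, {"zip"})
b = classify_fd({"zip"}, {"state", "zip"})
c = classify_fd({"zip"}, {"city", "state"})
```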
Functional Dependencies
- An FD is a property of the attributes in the schema R
- The constraint must hold on every relation instance r(R)
- If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])
Keys redefined using FDs
Let attr(R) be the set of all attributes of R. A set of
attributes K is a (candidate) key for a relation R if
- K -> attr(R) - K, and
  - That is, K is a “super key”
- No proper subset of K satisfies the above condition
  - That is, K is minimal (attr(R) - K is fully functionally dependent on K)
Address (street_address, city, state, zip)
- {street_address, city, state, zip}: super key
- {street_address, city, zip}: super key
- {street_address, zip}: key
- {zip}: non-key
Reasoning with FDs
Given a relation R and a set of FDs F
- Does another FD follow from F?
  - Are some of the FDs in F redundant (i.e., they follow from the others)?
- Is K a key of R?
  - What are all the keys of R?
Attribute closure
- Given R, a set of FDs F that hold in R, and a set of attributes Z in R:
  the closure of Z (denoted Z+) with respect to F is the set of all attributes {A1, A2, …} functionally determined by Z (that is, Z -> A1 A2 …)
- Algorithm for computing the closure
  - Start with closure = Z
  - If X -> Y is in F and X is already in the closure, then also add Y to the closure
  - Repeat until no more attributes can be added
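
The three steps of the algorithm translate almost directly into code. This is a minimal sketch, assuming each FD is represented as a (lhs, rhs) pair of attribute sets; the function name closure and the constant ADDRESS_FDS are chosen here for illustration, with the FDs taken from the Address example.

```python
def closure(attrs, fds):
    """Return the closure of attrs with respect to the FDs in fds."""
    result = set(attrs)  # start with closure = Z
    changed = True
    while changed:  # repeat until no more attributes can be added
        changed = False
        for lhs, rhs in fds:
            # X -> Y applies when all of X is already in the closure
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


ADDRESS_FDS = [
    ({"street_address", "city", "state"}, {"zip"}),
    ({"zip"}, {"city", "state"}),
]
zip_closure = closure({"zip"}, ADDRESS_FDS)
street_zip_closure = closure({"street_address", "zip"}, ADDRESS_FDS)
```

Here zip_closure lacks street_address, which matches {zip} being a non-key, while street_zip_closure contains all four attributes.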
A more complex example
WorkOn(EID, Ename, email, PID, Pname, Hours)
- EID -> Ename, email
- email -> EID
- PID -> Pname
- EID, PID -> Hours
(Not a good design, and we will see why later)
Example of computing closure
- F includes:
  - EID -> Ename, email
  - email -> EID
  - PID -> Pname
  - EID, PID -> Hours
- { PID, email }+ = ?
  - Starting from: closure = { PID, email }
  - email -> EID
    - Add EID; closure is now { PID, email, EID }
  - EID -> Ename, email
    - Add Ename, email; closure is now { PID, email, EID, Ename }
  - PID -> Pname
    - Add Pname; closure is now { PID, Pname, email, EID, Ename }
  - EID, PID -> Hours
    - Add Hours; closure is now all the attributes in WorkOn
Using attribute closure
Given a relation R and set of FDs F
- Does another FD X -> Y follow from F?
  - Compute X+ with respect to F
  - If Y ⊆ X+, then X -> Y follows from F
- Is K a super key of R?
  - Compute K+ with respect to F
  - If K+ contains all the attributes of R, K is a super key
- Is a super key K a key of R?
  - Test whether K’ = K – { a } is a super key of R for each a ∈ K; K is a key only if no such K’ is a super key
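
The three tests, plus a brute-force answer to "what are all the keys of R?", can be sketched on top of the closure computation. The helper names (follows, is_superkey, is_key, candidate_keys) are made up for illustration, and the key enumeration simply tries every attribute subset, so it is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(lhs, rhs, fds):
    """X -> Y follows from F iff Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)


def is_superkey(k, all_attrs, fds):
    """K is a super key iff K+ contains every attribute of R."""
    return closure(k, fds) >= set(all_attrs)


def is_key(k, all_attrs, fds):
    """A super key K is a key iff dropping any one attribute breaks it."""
    k = set(k)
    return is_superkey(k, all_attrs, fds) and all(
        not is_superkey(k - {a}, all_attrs, fds) for a in k
    )


def candidate_keys(all_attrs, fds):
    """Enumerate all keys, smallest subsets first (exponential in |R|)."""
    attrs = sorted(all_attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(cand, all_attrs, fds):
                keys.append(cand)
    return keys


WORKON_ATTRS = {"EID", "Ename", "email", "PID", "Pname", "Hours"}
WORKON_FDS = [
    ({"EID"}, {"Ename", "email"}),
    ({"email"}, {"EID"}),
    ({"PID"}, {"Pname"}),
    ({"EID", "PID"}, {"Hours"}),
]
workon_keys = candidate_keys(WORKON_ATTRS, WORKON_FDS)
```

For WorkOn this finds two keys, {EID, PID} and {PID, email}, since EID and email determine each other.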
Rules of FDs
- Armstrong’s axioms
  - Reflexivity: If Y ⊆ X, then X -> Y
  - Augmentation: If X -> Y, then XZ -> YZ for any Z
  - Transitivity: If X -> Y and Y -> Z, then X -> Z
- Rules derived from axioms
  - Splitting: If X -> YZ, then X -> Y and X -> Z
  - Combining: If X -> Y and X -> Z, then X -> YZ
Using rules of FDs
Given a relation R and set of FDs F
- Does another FD X -> Y follow from F?
  - Use the rules to come up with a proof
- Example:
  - F includes: EID -> Ename, email; email -> EID; EID, PID -> Hours; PID -> Pname
  - PID, email -> Hours?
    - email -> EID (given in F)
    - PID, email -> PID, EID (augmentation)
    - PID, EID -> Hours (given in F)
    - PID, email -> Hours (transitivity)
Example of redundancy
- WorkOn (EID, Ename, email, PID, Pname, Hours)
- We say X -> Y is a partial dependency if there exists an X’ ⊂ X such that X’ -> Y
  - e.g. EID, email -> Ename, email
- Otherwise, X -> Y is a full dependency
  - e.g. EID, PID -> Hours

EID   PID  Ename         email           Pname         Hours
1234  10   John Smith    jsmith@ac.com   B2B platform  10
1123  9    Ben Liu       bliu@ac.com     CRM           40
1234  9    John Smith    jsmith@ac.com   CRM           30
1023  10   Susan Sidhuk  ssidhuk@ac.com  B2B platform  40

Database Normalization
- Database normalization relates to the level of redundancy in a relational database’s structure.
- The key idea is to reduce the chance of having multiple different versions of the same data.
- Well-normalized databases have a schema that reflects the true dependencies between tracked quantities.
- Any increase in normalization generally involves splitting existing tables into multiple ones, which must be re-joined each time a query is issued.
Normalization
- Normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.
- A normal form is a certification that tells whether a relation schema is in a particular state
Normal Forms
- Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF.
- 3NF is widely considered to be sufficient.
- Normalizing beyond 3NF can be tricky with current SQL technology as of 2005
- Full normalization is considered a good exercise to help discover all potential internal database consistency problems.
First Normal Form (1NF)
- A normal form characterizes a relation (not an attribute, a key, etc.)
  - We can only say “this relation or table is in 1NF”
- A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.
2nd Normal Form
- An attribute A of a relation R is a nonprimary attribute if it is not part of any key in R; otherwise, A is a primary attribute.
- R is in (general) 2nd normal form if every nonprimary attribute A in R is not partially functionally dependent on any key of R
Redundancy Example
- Redundancy arises when a nonprimary attribute is only partially dependent on a key.
  - e.g. EID, PID -> Ename
- In this case, the attribute (Ename) should be moved, together with the attributes it fully depends on (EID), into a new table.
- So, to check whether a table includes redundancy, try every nonprimary attribute and check whether it fully depends on each key.
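
The check described above can be sketched with the attribute closure: a nonprimary attribute is partially dependent on a key K exactly when it already appears in the closure of K minus one attribute. The function name partial_dependencies is made up for illustration, and the candidate keys are assumed to be known and passed in.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(keys, all_attrs, fds):
    """Return (part_of_key, attribute) pairs that violate 2NF."""
    primary = set().union(*keys)  # attributes that belong to some key
    nonprimary = set(all_attrs) - primary
    found = set()
    for key in keys:
        for dropped in key:
            smaller = frozenset(key) - {dropped}
            # any nonprimary attribute still determined is a violation
            for attr in closure(smaller, fds) & nonprimary:
                found.add((smaller, attr))
    return found


WORKON_ATTRS = {"EID", "Ename", "email", "PID", "Pname", "Hours"}
WORKON_FDS = [
    ({"EID"}, {"Ename", "email"}),
    ({"email"}, {"EID"}),
    ({"PID"}, {"Pname"}),
    ({"EID", "PID"}, {"Hours"}),
]
WORKON_KEYS = [{"EID", "PID"}, {"email", "PID"}]
violations = partial_dependencies(WORKON_KEYS, WORKON_ATTRS, WORKON_FDS)
```

An empty result means the relation is in 2NF with respect to the given keys and FDs; for WorkOn the result is non-empty because Ename and Pname each depend on only part of a key.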
Second Normal Form (2NF)
- 2NF prescribes full functional dependency on the primary key.
- It most commonly applies to tables that have composite primary keys, where two or more attributes comprise the primary key.
- It requires that there are no non-trivial functional dependencies of a non-key attribute on a part (subset) of a candidate key.
- A table is said to be in 2NF if and only if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key
Decomposition

EID   PID  Ename         email           Pname         Hours
1234  10   John Smith    jsmith@ac.com   B2B platform  10
1123  9    Ben Liu       bliu@ac.com     CRM           40
1234  9    John Smith    jsmith@ac.com   CRM           30
1023  10   Susan Sidhuk  ssidhuk@ac.com  B2B platform  40

Decomposition (foreign key: EID)

EID   Ename         email
1234  John Smith    jsmith@ac.com
1123  Ben Liu       bliu@ac.com
1023  Susan Sidhuk  ssidhuk@ac.com

EID   PID  Pname         Hours
1234  10   B2B platform  10
1123  9    CRM           40
1234  9    CRM           30
1023  10   B2B platform  40

- Decomposition eliminates redundancy
- To get back to the original relation, use natural join.
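
The claim that the natural join recovers the original relation can be checked on the sample data. This is only a sketch: relations are modeled as sets of rows, each row a frozenset of (attribute, value) pairs, and the helper names relation, project and natural_join are made up for illustration.

```python
def relation(attrs, tuples):
    """Build a relation: a set of rows, each a frozenset of (attr, value)."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rel, attrs):
    """Projection onto attrs; duplicate rows collapse automatically."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(s, t):
    """Combine every pair of rows that agree on all shared attributes."""
    out = set()
    for x in s:
        x_vals = dict(x)
        for y in t:
            if all(x_vals.get(a, v) == v for a, v in y):
                out.add(x | y)
    return out


work_on = relation(
    ("EID", "PID", "Ename", "email", "Pname", "Hours"),
    [
        (1234, 10, "John Smith", "jsmith@ac.com", "B2B platform", 10),
        (1123, 9, "Ben Liu", "bliu@ac.com", "CRM", 40),
        (1234, 9, "John Smith", "jsmith@ac.com", "CRM", 30),
        (1023, 10, "Susan Sidhuk", "ssidhuk@ac.com", "B2B platform", 40),
    ],
)
employees = project(work_on, {"EID", "Ename", "email"})
assignments = project(work_on, {"EID", "PID", "Pname", "Hours"})
rejoined = natural_join(employees, assignments)
```

The employee table shrinks to three rows because John Smith is now stored once, and rejoined equals work_on.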
Decomposition

Decomposition may be applied recursively

EID   PID  Pname         Hours
1234  10   B2B platform  10
1123  9    CRM           40
1234  9    CRM           30
1023  10   B2B platform  40

PID  Pname
10   B2B platform
9    CRM

EID   PID  Hours
1234  10   10
1123  9    40
1234  9    30
1023  10   40

Unnecessary decomposition

EID   Ename         email
1234  John Smith    jsmith@ac.com
1123  Ben Liu       bliu@ac.com
1023  Susan Sidhuk  ssidhuk@ac.com

EID   Ename
1234  John Smith
1123  Ben Liu
1023  Susan Sidhuk

EID   email
1234  jsmith@ac.com
1123  bliu@ac.com
1023  ssidhuk@ac.com

- Fine: join returns the original relation
- Unnecessary: no redundancy is removed, and now EID is stored twice
Bad decomposition

EID   PID  Hours
1234  10   10
1123  9    40
1234  9    30
1023  10   40

EID   PID
1234  10
1123  9
1234  9
1023  10

EID   Hours
1234  10
1123  40
1234  30
1023  40

- Association between PID and hours is lost
- Join returns more rows than the original relation
Lossless join decomposition
- Decompose relation R into relations S and T
  - attrs(R) = attrs(S) ∪ attrs(T)
  - S = π_attrs(S)(R)
  - T = π_attrs(T)(R)
- The decomposition is a lossless join decomposition if, given known constraints such as FDs, we can guarantee that R = S ⋈ T
- Any decomposition gives R ⊆ S ⋈ T (why?)
  - A lossy decomposition is one with R ⊂ S ⋈ T
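
The relationship R ⊆ S ⋈ T, and the strict containment in the lossy case, can be observed on the (EID, PID, Hours) data. A minimal sketch, again modeling each row as a frozenset of (attribute, value) pairs; the helper names are made up for illustration. Checking one instance only demonstrates a lossy decomposition; it cannot prove that a decomposition is lossless for every instance.

```python
def relation(attrs, tuples):
    """Build a relation: a set of rows, each a frozenset of (attr, value)."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rel, attrs):
    """Projection onto attrs; duplicate rows collapse automatically."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(s, t):
    """Combine every pair of rows that agree on all shared attributes."""
    out = set()
    for x in s:
        x_vals = dict(x)
        for y in t:
            if all(x_vals.get(a, v) == v for a, v in y):
                out.add(x | y)
    return out


r = relation(
    ("EID", "PID", "Hours"),
    [(1234, 10, 10), (1123, 9, 40), (1234, 9, 30), (1023, 10, 40)],
)
s = project(r, {"EID", "PID"})
t = project(r, {"EID", "Hours"})
joined = natural_join(s, t)
spurious = joined - r  # rows the join invents that were never in R
```

Employee 1234 has two projects and two hour values, so the join pairs them in all four ways and produces two spurious rows; the original four rows are all still present, but can no longer be told apart from the invented ones.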
Loss? But I got more rows!
- “Loss” refers not to the loss of tuples, but to the loss of information
  - Or, the ability to distinguish different original tuples

EID   PID  Hours
1234  10   10
1123  9    40
1234  9    30
1023  10   40

EID   PID
1234  10
1123  9
1234  9
1023  10

EID   Hours
1234  10
1123  40
1234  30
1023  40

Questions about decomposition
- When to decompose
- How to come up with a correct decomposition (i.e., lossless join decomposition)