Relational Model - Department of Information Systems • NJIT

The Relational Model –
Functional Dependencies
& Normalization
Objectives
• Optimal database design: selection of appropriate relations/tables for a given set of attributes
• Minimize update anomalies:
  Redundancy
  Update
  Inconsistent data
  Additions
  Deletions
Definition of Anomaly
Something that deviates from our
expectations
Example
CUSTNUMB  CUSTNAME   CUSTADDR      SNUMB  SLSRNAME
123       Jones, R.  19 Oak St.    3      Adams, M.
456       Lan, J.    4 Pine St.    6      Smith, R.
461       Chu, W.    22 Main St.   12     Brown, M.
489       Obie, S.   76 High St.   6      Smith, R.
514       Wise, R    17 Birch St.  3      Adams, M.
...       ...        ...           ...    ...
999       Side, E.   87 Bay St.    12     Brown, N.
Specific Anomalies In This Relation
• Redundancy
  Why repeat the sales rep name for Adams in each record? Suppose Adams has 500 customers: that means you repeat Adams' name 500 times!
• Update
  Suppose sales rep Mary Adams marries and changes her name. How many rows do we need to update?
• Inconsistent data
  Notice that Brown's first initial varies: M, N
• Additions
  New sales rep J. Doe can't be entered until he has a customer
• Deletion
  Delete all customers of Adams, and we lose the name of the sales rep Adams

Decomposition Of Relations
The previous table can be decomposed into the following two tables:

CUSTNUMB  CUSTNAME   CUSTADDR      SNUMB
123       Jones, R.  19 Oak St.    3
456       Lan, J.    4 Pine St.    6
461       Chu, W.    22 Main St.   12
489       Obie, S.   76 High St.   6
514       Wise, R    17 Birch St.  3
...       ...        ...           ...
999       Side, E.   87 Bay St.    12

SNUMB  SLSRNAME
3      Adams, M.
6      Smith, R.
12     Brown, M.
Notice That This Decomposition Resolved All
Database Anomalies
• REDUNDANCY
  NONE EXISTS
• UPDATE
  JUST CHANGE MARY ADAMS' LAST NAME (ONCE) IN salesrep relation
• INCONSISTENT DATA
  IMPOSSIBLE - M. BROWN'S NAME APPEARS ONLY ONCE!
• ADDITIONS
  ADD NEW SLSR J. DOE TO salesrep relation
• DELETIONS
  WE CAN DELETE ALL OF ADAMS' CUSTOMERS AND STILL HAVE ADAMS IN salesrep
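The decomposition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the first five sample rows, the replacement name "Baker, M." and the salesrep number 15 for J. Doe are hypothetical, and the variable names (combined, customer, salesrep) are not from the slides.

```python
# Five sample rows of the combined table:
# (CUSTNUMB, CUSTNAME, CUSTADDR, SNUMB, SLSRNAME)
combined = [
    (123, "Jones, R.", "19 Oak St.", 3, "Adams, M."),
    (456, "Lan, J.", "4 Pine St.", 6, "Smith, R."),
    (461, "Chu, W.", "22 Main St.", 12, "Brown, M."),
    (489, "Obie, S.", "76 High St.", 6, "Smith, R."),
    (514, "Wise, R", "17 Birch St.", 3, "Adams, M."),
]

# Project onto the two smaller relations.
customer = [(num, name, addr, snumb) for num, name, addr, snumb, _ in combined]
salesrep = {snumb: slsrname for _, _, _, snumb, slsrname in combined}

# Joining the two relations on SNUMB gives back the original rows,
# so no information was lost by decomposing.
rejoined = [(n, nm, a, s, salesrep[s]) for n, nm, a, s in customer]
lossless = rejoined == combined

# UPDATE: the name is stored once, so one change is enough.
salesrep[3] = "Baker, M."          # hypothetical new name

# ADDITION: a sales rep can exist before having any customer.
salesrep[15] = "Doe, J."           # hypothetical salesrep number

# DELETION: removing all of rep 3's customers keeps the rep on file.
customer = [row for row in customer if row[3] != 3]

print(lossless, salesrep)
```

Each sales rep name now lives in exactly one place, which is why the update, addition and deletion steps touch only one relation each.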
Conceptual Tools Needed For
Decomposition
Functional Dependencies
Lossless Join Decomposition
Normal Forms
Functional Dependencies
Common Issue in Designing a New
Database From Existing Data
We have obtained one or more tables of
existing data (such as from a spreadsheet
or extracts from an existing corporate
database).
The data is to be stored in a new database.
DATABASE DESIGN QUESTION: Should
the data be stored as received, or should it
be transformed for storage?
Should We Combine ORDER_ITEM and
SKU_DATA into One Table (SKU_DATA)?
Should we store these two tables as they are, or should we combine them
into one table in our new database?
But First—
We need to understand:
• The relational model
• Relational model terminology

The Relational Model
• Introduced in 1970
• Created by E.F. Codd, an IBM engineer
• The model uses mathematics known as "relational algebra"

Now the standard model for
commercial DBMS products.
Important Relational Model Terms
Entity
Relation
Functional Dependency
Determinant
Candidate Key
Composite Key
Primary Key
Surrogate Key
Foreign Key
Referential integrity constraint
Normal Form
Multivalued Dependency (new for us)
Entity
An entity is some identifiable thing that users want to track:
• Customers
• Computers
• Sales

Relations
A relation is a two-dimensional table that has the following characteristics:
• Rows contain data about an entity.
• Columns contain data about attributes of the entity.
• All entries in a column are of the same kind.
• Each column has a unique name.
• Cells of the table hold a single value.
• The order of the columns is unimportant.
• The order of the rows is unimportant.
• No two rows may be identical.
A Typical Relation
Tables That Are Not Relations:
Multiple Entries per Cell
Tables That Are Not Relations:
Table with Required Row
Order
A Valid Relation with Values of
Different Length
An INVALID relation (cells in a valid relation are supposed to hold a single value, but the Phone cells for Employees 400 and 700 each hold multiple phone numbers)
Alternative Terminology
Although not all tables are relations, as
we have seen on the previous slides, the
terms table and relation are generally
used interchangeably.
The following sets of terms are
equivalent:
Functional Dependency
A functional dependency occurs when the value of one attribute (or set of attributes) determines the value of a second attribute (or set of attributes):
StudentID → StudentName
StudentID → (DormName, DormRoom, Fee)
The attribute (or set of attributes) on the left side of the functional dependency is called the determinant.
Functional dependencies may be based on equations:
ExtendedPrice = Quantity × UnitPrice
(Quantity, UnitPrice) → ExtendedPrice
But functional dependencies are definitely not equations!
Functional Dependencies Are
Not Equations: An Example
We can deduce the following set of Functional Dependencies from
the above diagram
ObjectColor → Weight
ObjectColor → Shape
ObjectColor → (Weight, Shape)
But, does Shape functionally determine anything? (NO!)
Composite Determinants
Composite determinant: a determinant of a functional
dependency that consists of more than one attribute.
(StudentName, ClassName) → (Grade)
Functional Dependency Rules
(Not a complete list)
If A → (B, C), then A → B and A → C
If (A, B) → C, then we cannot conclude that either A or B determines C by itself
Functional Dependency Review
A functional dependency occurs when the value of one attribute (or set of attributes) determines the value of a second attribute (or set of attributes):
StudentID → StudentName
StudentID → (DormName, DormRoom, Fee)
The attribute on the left side of the functional dependency is called the determinant; the attribute on the right side is called the dependent.
Functional dependencies may be based on equations:
ExtendedPrice = Quantity × UnitPrice
(Quantity, UnitPrice) → ExtendedPrice
Functional dependencies are not equations
Composite Determinants
Composite determinant: A
determinant of a functional
dependency that consists of more than
one attribute
Example of a Composite Determinant:
(StudentName, ClassName) → (Grade)
Find the functional dependencies
in the SKU_DATA Table
Ask yourself the question – if we know the value of a particular
attribute, will that value determine a unique value of some other
attribute? (If “yes,” then we have a functional dependency between
the attributes.)
Functional Dependencies in
the SKU_DATA Table

SKU → (SKU_Description, Department, Buyer)
SKU_Description → (SKU, Department, Buyer)
Buyer → Department
Find the functional dependencies
in the ORDER_ITEM Table
Functional dependencies in ORDER_ITEM
Table
• (OrderNumber, SKU) → (Quantity, Price, ExtendedPrice)
  Note that OrderNumber by itself does not functionally determine any other attribute.
  While SKU, from the data, does appear to functionally determine Price, we always need to be very careful in making inferences from data. Prices may change in the future, and the price might often be tied to a particular order. So we prefer to use the composite of SKU and OrderNumber as the determinant, rather than SKU by itself.
• (Quantity, Price) → (ExtendedPrice)
  Note that this is derived from the equation ExtendedPrice = Quantity * Price

When are determinant values unique?
A determinant has unique values (i.e., all values are
different) in a relation if, and only if, it functionally
determines every other attribute in the relation

So, in SKU_Data, SKU has all different (unique) values,
and it functionally determines every attribute in the table.
On the other hand, Buyer, though a determinant, does
not have unique values, and does not functionally
determine all the other attributes in the relation.
So, you cannot find the determinants of all functional
dependencies simply by looking for unique values in
one column
A     B     C     D     E
a(1)  b(1)  c(1)  d(1)  e(1)
a(1)  b(1)  c(2)  d(1)  e(1)
a(2)  b(1)  c(1)  d(1)  e(1)
a(2)  b(2)  c(1)  d(2)  e(1)
a(2)  b(2)  c(2)  d(3)  e(2)
BC ----> D (True or False?)
B ----> A (True or False?)
D ----> BE (True or False?)
AB ----> C (True or False?)
The Answers
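BC ----> D : TRUE (each (B, C) pair appears with exactly one D value)
B ----> A : FALSE (b(1) appears with both a(1) and a(2))
D ----> BE : TRUE (each D value appears with exactly one (B, E) pair)
AB ----> C : FALSE (the pair a(1), b(1) appears with both c(1) and c(2))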
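The four questions can be checked mechanically against the five rows of the exercise table. The sketch below is a minimal illustration, not part of the original slides: the helper name fd_holds is hypothetical, and values such as a(1) are abbreviated to a1. It only tests whether the sample data is consistent with an FD; as noted for ORDER_ITEM, data alone cannot prove that an FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but disagree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True

# The five rows of the exercise table, columns A..E.
table = [
    "a1 b1 c1 d1 e1",
    "a1 b1 c2 d1 e1",
    "a2 b1 c1 d1 e1",
    "a2 b2 c1 d2 e1",
    "a2 b2 c2 d3 e2",
]
rows = [dict(zip("ABCDE", line.split())) for line in table]

questions = {
    "BC->D": ("BC", "D"),
    "B->A": ("B", "A"),
    "D->BE": ("D", "BE"),
    "AB->C": ("AB", "C"),
}
results = {name: fd_holds(rows, lhs, rhs) for name, (lhs, rhs) in questions.items()}
print(results)
```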




Deducing Functional
Dependencies
Since BC ----> D and D ----> BE, can we
conclude that BC ----> BE ?
YES! (We will call this transitivity)
If BC ----> D and BC ----> A, can we
conclude that D ----> A ?
NO! Nor can we conclude A ----> D.
Superkeys & FD's
• A superkey is an attribute or a set of attributes that identifies an entity UNIQUELY.
• In a relation (table), a SUPERKEY is any column or set of columns whose values can be used to distinguish one row from another.
• Since a superkey identifies each item uniquely, it functionally determines all the attributes of a relation.





STUID is a superkey
SOCSEC is a superkey
STUNAME is NOT a superkey
STUID,STUNAME IS a superkey
STUID,ANY OTHER SET OF ATTRIBUTES is a superkey
The Formal Theory
Definition Of A Superkey
A set of attributes K is a superkey of relation (table) R if K ----> R
In other words, a superkey functionally determines all the attributes in R
More On Superkeys
A superkey is a candidate key if it is
minimal, i.e., if X is a superkey, then X
minus {any attribute of X} is NOT a
superkey.
A primary key is a candidate key which
we choose to be THE "key."
Superkeys, Candidate Keys And
Primary Keys
• Superkey: a set of attributes which functionally determines all of the attributes in the relation
• Candidate key: from the set of superkeys, we eliminate all those superkeys which have "extra" attributes (a superkey has an "extra" attribute if, when we remove this attribute, the resulting set of attributes is also a superkey).
• Primary key: if there is more than 1 candidate key, then the candidate key we choose for THE key is called the primary key - if there is exactly 1 candidate key, then that candidate key is the primary key.

Example - Obtain Candidate Keys
Consider the following scheme from an airline database
system:
( P(pilot) , F(flight# ), D(date), T (scheduled time to depart) )
We have the following FD's:
F ----> T
PDT ----> F
FD ----> P
Provide some superkeys:
• PDT is a superkey, and FD is a superkey.
• Is PDT a candidate key?
• PD is not a superkey, nor is DT, nor is PT.
• So, PDT is a candidate key.
• FD is also a candidate key, since neither F nor D by itself is a superkey.
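The candidate keys of the airline scheme can also be found by brute force. The sketch below is one possible approach, not the method used in the slides: it computes the attribute closure of every subset of {P, F, D, T}, smallest subsets first, and keeps those whose closure is the whole scheme and which contain no smaller key. The function names closure and candidate_keys are assumptions made for this illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure covers every attribute."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(all_attrs):
                keys.append(combo)
    return keys

# FDs from the example: F -> T, PDT -> F, FD -> P
fds = [("F", "T"), ("PDT", "F"), ("FD", "P")]
keys = candidate_keys("PFDT", fds)
print(keys)
```

Trying smaller subsets first is what guarantees minimality here; for schemes with many attributes this exhaustive search grows exponentially, so it is suitable only for small classroom examples.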
Surrogate Keys
A surrogate key is an artificial attribute/column added to a relation to serve as a primary key:
• Often DBMS supplied
• Short, numeric and never changes - an ideal primary key!
• Has artificial values that are meaningless to users
• Normally hidden in forms and reports

Example of Surrogate Keys
RENTAL_PROPERTY without surrogate key (primary key: Street, City, State/Province, Zip/PostalCode, Country):
RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
RENTAL_PROPERTY with surrogate key (primary key: PropertyID):
RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
Trivial FD's
A functional dependency is defined to be trivial if it is satisfied by every relation
Example of a trivial functional dependency:
• AB ----> A is satisfied by every relation involving A and B.
Trivial FD's
Generalization and rule for trivial FD's:
An FD is trivial if it has the form X ----> Y, where Y is a subset of X.
• So, ABCD ----> ABC is a trivial FD.
A trivial FD does not make a significant statement about real world constraints - we are thus only interested in non-trivial FD's.
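The subset rule translates directly into a one-line test. The helper name is_trivial is hypothetical, and attribute sets are written as strings of single-letter attribute names purely for brevity.

```python
def is_trivial(lhs, rhs):
    """An FD X ----> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)

checks = {
    "AB->A": is_trivial("AB", "A"),          # right side is contained in the left side
    "ABCD->ABC": is_trivial("ABCD", "ABC"),  # likewise
    "A->B": is_trivial("A", "B"),            # B is not part of the left side
}
print(checks)
```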
Another FD “Rule”
If (A, B) → C, then we cannot conclude that either A or B by itself functionally determines C.


Normal Forms
There are numerous "normal forms" which are
categorizations based upon the kinds of “problems”
that relations have.
These will be discussed:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
FIRST NORMAL FORM

A relation is in first normal form (1NF) iff every
attribute in every row can contain only a single
value. A 1NF relation cannot have any row that
contains a repeating grouping of attribute values.
Example Of A Relation Not In 1NF
Ordnumb  Orddte  Partnumb  Numbord
12489    30109   AX12      11
12491    30209   BT04      1
                 BZ66      1
12495    30409   CX11      2

We can convert the above table to 1NF by flattening:

Ordnumb  Orddte  Partnumb  Numbord
12489    30109   AX12      11
12491    30209   BT04      1
12491    30209   BZ66      1
12495    30409   CX11      2
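Flattening can be sketched as follows. The nested list-of-dicts layout used to represent the repeating group is an assumption made for this illustration; the values are the ones shown in the tables.

```python
# Each order carries a repeating group of (Partnumb, Numbord) pairs,
# which is what keeps the table out of 1NF.
orders = [
    {"Ordnumb": 12489, "Orddte": 30109, "items": [("AX12", 11)]},
    {"Ordnumb": 12491, "Orddte": 30209, "items": [("BT04", 1), ("BZ66", 1)]},
    {"Ordnumb": 12495, "Orddte": 30409, "items": [("CX11", 2)]},
]

# Flatten: emit one row per (order, part) pair, repeating the order-level
# attributes so that every cell holds a single value.
flat = [
    (order["Ordnumb"], order["Orddte"], partnumb, numbord)
    for order in orders
    for partnumb, numbord in order["items"]
]

for row in flat:
    print(row)
```

Note that flattening repeats Orddte for order 12491, which is exactly the kind of redundancy the later normal forms address.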
Second Normal Form


Definition: an attribute is a non-key attribute if it is
not a part of the primary key
Definition: A relation is in second normal form (2NF) if it is in first normal form and no non-key attribute is dependent on only a portion of the primary key (this can happen only when the primary key is composite, i.e., consists of 2 or more attributes)
Example Of A Relation In 1NF,
But Not 2NF
Ordnumb  Orddte  Partnumb  PartDesc  Numbord  Quoprice
12489    90509   AX12      MOUSE     11       14.95
12491    90509   BT04      DRV270G   1        120.99
12491    90509   BZ66      DRV180G   1        80.95
12495    90709   AX12      MOUSE     4        14.95
The following FD's hold on this relation:
Ordnumb ----> Orddte
Partnumb ----> PartDesc
Ordnumb, Partnumb ----> Numbord, Quoprice
The relation is NOT in 2NF because PartDesc is dependent on only a portion of the primary key, and similarly for Orddte
Transform Relation To 2NF

First, take each subset of the set of attributes which make up
the primary key, and begin a relation with this subset as its
primary key
(Ordnumb)
(Partnumb)
(Ordnumb, Partnumb)

Then, place each of the other attributes with the appropriate
primary key, i.e., place each one with the minimal collection on
which it depends
(Ordnumb, Orddte)
(Partnumb, PartDesc)
(Ordnumb, Partnumb, Numbord, Quoprice)
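The two steps can be sketched in Python. This is an illustration under stated assumptions: the function name partial_dependencies is hypothetical, and the code inspects only the FDs that are explicitly listed (it does not derive implied FDs).

```python
def partial_dependencies(primary_key, fds):
    """List FDs whose determinant is a proper subset of the primary key
    and which determine at least one non-key attribute."""
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        if set(lhs) < key:  # proper subset: only a portion of the key
            nonkey = [a for a in rhs if a not in key]
            if nonkey:
                found.append((lhs, nonkey))
    return found

# FDs that hold on the order relation.
fds = [
    (("Ordnumb",), ("Orddte",)),
    (("Partnumb",), ("PartDesc",)),
    (("Ordnumb", "Partnumb"), ("Numbord", "Quoprice")),
]
primary_key = ("Ordnumb", "Partnumb")

violations = partial_dependencies(primary_key, fds)

# Place each attribute with the minimal determinant it depends on:
# one relation per determinant, with that determinant as its primary key.
relations = {lhs: lhs + rhs for lhs, rhs in fds}

print(violations)
print(relations)
```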
Third Normal Form


A relation is in Third Normal Form (3NF) iff it is in
Second Normal Form and there is no non-key
attribute which is functionally dependent upon
another non-key attribute in any functional
dependency
("each non-key attribute must depend upon the
key, the whole key, and nothing but the key")
Example Of Relation In 2NF, But Not 3NF



Consider STUDENT(STUID, STUNAME, MAJOR,
CREDITS, FSJS) with the following FD's:
Stuid ----> Stuname, Major, Credits, FSJS
Credits---> FSJS
Since the non-key attribute FSJS depends on the non-key attribute Credits, STUDENT is not in 3NF
To create 3NF here, form a new relation (STATS) with
the functionally dependent attribute and its determinant
STU2 ( Stuid, Stuname, Major, Credits) R1
STATS ( Credits, FSJS ) R2
Boyce-Codd Normal Form (BCNF)



Reminder: a determinant is an attribute (or
collection of attributes) that functionally
determines another attribute (or set of attributes),
i.e., it is the LHS of a functional dependency
Example: in sosec ---------> stuname, sosec is a
determinant
Def.: A relation is in Boyce-Codd normal form if
every determinant is a candidate key
Another Example Of 2NF Relation
(Not In 3NF And Not In BCNF)
GIVEN: PC (TAGNUM, COMPID, EMPNUM,EMPNAME,LOCATION)
and given the following functional dependencies:
FD1: TAGNUM ---->COMPID,EMPNUM,EMPNAME,LOCATION.
FD2: EMPNUM-----> EMPNAME
This Relation Satisfies 2NF, But
Not 3NF Or BCNF
TAGNUM  COMPID  EMPNUM  EMPNAME       LOCATION
32808   M759    611     DINH, M.      ACCOUNTING
37691   B121    124     ALVAREZ, R    SALES
57772   C007    567     FEINSTEIN, B  INFO SYSTEMS
59836   B221    124     ALVAREZ, R    HOME
77740   M759    567     FEINSTEIN, B  HOME
Some Anomalies Present In
This Relation



UPDATE: If Betty Feinstein gets married, must change
more than 1 record
INCONSISTENT DATA: Potential problem due to
redundancy
ADDITIONS: New employee 347 cannot be added until a
pc is assigned
Why Is The PC Relation Not In 3NF
Or Boyce Codd Normal Form?
1) It is in 2NF (there is no non-key attribute dependent on only a
portion of the primary key, since the primary key consists of only 1
attribute)
2) The primary key is TAGNUM.
3) The only candidate key is TAGNUM.
4) There are 2 determinants - TAGNUM and EMPNUM.
5) Since EMPNUM is a determinant but not a candidate key, the
relation is not in BCNF. And it's not in 3NF either.
Changing Our PC Relation To 3NF
PC (TAGNUM, COMPID, EMPNUM, EMPNAME, LOCATION)
is replaced by
PC (TAGNUM, COMPID, EMPNUM, LOCATION)
and
EMPLOYEE (EMPNUM, EMPNAME)
Transforming A 3NF
Relation To BCNF
1) For each determinant that is not a candidate key, remove from
the relation the attributes which are functionally determined by
this determinant.
2) Create a new table containing all the attributes from the
original relation which were functionally determined by this
determinant.
3) Make the determinant the primary key of this new relation.
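The three steps can be sketched as a small function. The name split_on_determinant is hypothetical, and the sketch handles one offending determinant at a time. It is applied here to a relation with attributes SID, MAJOR and FACNAME in which FACNAME determines MAJOR but FACNAME is not a candidate key.

```python
def split_on_determinant(attrs, determinant, dependents):
    """Split a relation on an FD whose determinant is not a candidate key.

    Returns (remaining, new_relation):
      remaining    - the original attributes minus the dependents (step 1)
      new_relation - the determinant followed by its dependents (step 2);
                     the determinant is the new primary key (step 3)
    """
    remaining = [a for a in attrs if a not in dependents]
    new_relation = list(determinant) + [a for a in attrs if a in dependents]
    return remaining, new_relation

advisor = ["SID", "MAJOR", "FACNAME"]
stu_adv, adv_major = split_on_determinant(advisor, ["FACNAME"], ["MAJOR"])
print(stu_adv, adv_major)
```

The determinant stays behind in the original relation, where it acts as a foreign key to the new relation.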
Important Points
• A relation in 3NF may or may not be in Boyce Codd Normal Form
• BUT, a relation in Boyce Codd Normal Form will ALWAYS be in 3NF.
• {Some textbooks consider Boyce Codd Normal Form to be "the" third Normal Form. Ours does not.}
Example of a relation in 3NF which is
NOT in BCNF
Suppose that, in a given university:
1. Students may have one or more majors.
2. A major may have several faculty members as advisers.
3. A faculty member can advise in only one major area.

SID  MAJOR       FACNAME
100  Math        Cauchy
150  Psychology  Jung
200  Math        Riemann
250  Math        Cauchy
300  Psychology  Perls
300  Math        Riemann
Things to note from this example
• The primary key is not SID !!
• The primary key consists of two attributes: SID and MAJOR.
• There is an important functional dependency corresponding to the statement "A Faculty member can advise students in only one major area."
  FACNAME -----> MAJOR
• The relation IS in 2NF, since there are no non-key attributes dependent on only a portion of the primary key.
• The relation is in 3NF, but NOT in BCNF.
The ADVISOR relation transformed to
Boyce Codd Normal Form
STU-ADV(SID, FACNAME)

SID  FACNAME
100  Cauchy
150  Jung
200  Riemann
250  Cauchy
300  Perls
300  Riemann

ADV-MAJOR(FACNAME, MAJOR)

FACNAME  MAJOR
Cauchy   Math
Jung     Psychology
Riemann  Math
Perls    Psychology
Going Directly to BCNF
Example 1 of Going Directly to BCNF
The SKU_DATA TABLE
Working Through The Example
SKU_DATA (SKU, SKU_Description, Department, Buyer)
Identify the FDs:
a) SKU → (SKU_Description, Department, Buyer)
b) SKU_Description → (SKU, Department, Buyer)
c) Buyer → Department
SKU and SKU_Description are candidate keys, Buyer is NOT a
candidate key, so SKU_DATA is not in BCNF. Placing the columns
of the problem FD (c) into a separate relation, with the determinant
Buyer as the primary key, and making Buyer a foreign key in the
SKU_DATA relation, we obtain:
SKU_DATA2 (SKU, SKU_Description, Buyer)
BUYER (Buyer, Department)
Where SKU_DATA2.Buyer must exist in BUYER.Buyer
The Resulting Populated SKU_DATA2 and
BUYER Relations, in BCNF
Example 2 of Going Directly to BCNF
The EQUIPMENT_REPAIR table
Working Through The Example
EQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost,
RepairNumber, RepairDate, RepairAmount)
Identify the FDs:
a) ItemNumber → (Type, AcquisitionCost)
b) RepairNumber → (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount)
RepairNumber is a candidate key, ItemNumber is NOT a candidate key, so
EQUIPMENT_REPAIR is not in BCNF. Placing the columns of the
problem FD (a) into a separate relation, with the determinant ItemNumber
as the primary key, and making ItemNumber a foreign key in the REPAIR
relation, we obtain:
ITEM (ItemNumber, Type, AcquisitionCost)
REPAIR (RepairNumber, RepairDate, RepairAmount, ItemNumber)
Where REPAIR.ItemNumber must exist in
ITEM.ItemNumber
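The referential integrity constraint can be demonstrated with Python's built-in sqlite3 module. This is only a sketch: the column types and the sample values (item 100, repair numbers 2000 and 2001, the dates and amounts) are hypothetical and are not taken from the slides, and SQLite is used purely for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE ITEM (
        ItemNumber      INTEGER PRIMARY KEY,
        Type            TEXT,
        AcquisitionCost REAL
    )""")
conn.execute("""
    CREATE TABLE REPAIR (
        RepairNumber INTEGER PRIMARY KEY,
        RepairDate   TEXT,
        RepairAmount REAL,
        ItemNumber   INTEGER NOT NULL REFERENCES ITEM (ItemNumber)
    )""")

# Hypothetical sample data: one item and one repair that refers to it.
conn.execute("INSERT INTO ITEM VALUES (100, 'Drill Press', 3500.0)")
conn.execute("INSERT INTO REPAIR VALUES (2000, '2013-05-05', 375.0, 100)")

# A repair for an item that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO REPAIR VALUES (2001, '2013-05-06', 50.0, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

repair_count = conn.execute("SELECT COUNT(*) FROM REPAIR").fetchone()[0]
print(rejected, repair_count)
```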
The Resulting Populated REPAIR
and ITEM Relations, in BCNF
SUMMARY OF NORMAL FORMS
WE HAVE COVERED
1NF – A table that qualifies as a relation is in 1NF
2NF – A relation is in 2NF if it is in 1NF and all of its non-key attributes are dependent on all of the primary key
3NF – A relation is in 3NF if it is in 2NF and there is no non-key attribute which is functionally dependent upon another non-key attribute in any functional dependency, or, equivalently, there are no transitive dependencies of non-key attributes on the key (i.e., there are no FDs A → B and B → C where A is the key and B and C are non-key attributes)
Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key