Database Processing
Normalization
Chapter 3
David M. Kroenke
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
Chapter Premise
• We have one or more tables of data
• The data is to be stored in a new database
• QUESTION: Should the data be stored as
received, or should it be transformed for
storage?
Important Relational Model Terms
• Entity
• Relation
• Functional Dependency
• Determinant
• Candidate Key
• Composite Key
• Primary Key
• Surrogate Key
• Foreign Key
• Referential integrity constraint
• Normal Form
Entity
• An entity is some identifiable thing that
users want to track:
– Customers
– Computers
– Sales
• Rows contain data about an entity
• Columns contain data about attributes of
the entity
Relation
• Relational DBMS products store data about entities in
relations, which are a special type of table
• A relation is a two-dimensional table that has the
following characteristics:
– All entries in a column are of the same kind
– Each column has a unique name
– Cells of the table hold a single value
– The order of the columns and rows is unimportant
– No two rows may be identical
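The characteristics listed above can be checked mechanically on a small table. The following is a minimal sketch, not a definitive test: the function name `relation_problems`, the sample employee data, and the use of Python types as a stand-in for "the same kind" are all assumptions made for illustration.

```python
def relation_problems(columns, rows):
    """Return reasons why a table fails to be a relation (empty list if none found)."""
    problems = []
    # Each column has a unique name.
    if len(set(columns)) != len(columns):
        problems.append("column names are not unique")
    seen_rows = []
    for row in rows:
        cells = list(row)
        if len(cells) != len(columns):
            problems.append(f"row {cells!r} does not have one cell per column")
        # Cells hold a single value: a collection inside a cell is rejected.
        if any(isinstance(cell, (list, tuple, set, dict)) for cell in cells):
            problems.append(f"row {cells!r} has a multi-valued cell")
        # No two rows may be identical.
        if cells in seen_rows:
            problems.append(f"row {cells!r} appears more than once")
        seen_rows.append(cells)
    # All entries in a column are of the same kind (approximated by Python type).
    for index, name in enumerate(columns):
        kinds = {type(row[index]) for row in rows if index < len(row)}
        if len(kinds) > 1:
            problems.append(f"column {name!r} mixes kinds of entries")
    return problems


# Hypothetical sample data, not taken from the slides.
ok_problems = relation_problems(
    ["EmployeeNumber", "FirstName", "LastName"],
    [(100, "Mary", "Abernathy"), (101, "Jerry", "Cadley")],
)
bad_problems = relation_problems(
    ["EmployeeNumber", "FirstName", "Phone"],
    [(100, "Mary", ["555-0100", "555-0101"]), (101, "Jerry", "555-0102")],
)
print(ok_problems)
print(bad_problems)
```

The order of rows and columns is deliberately not checked, since a relation places no meaning on it.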
A Relation
Tables That Are Not Relations:
Multiple Entries per Cell
Functional Dependency
• A functional dependency occurs when the value of one
attribute (or set of attributes) determines the value of a
second attribute (or set of attributes):
StudentID → StudentName
StudentID → (DormName, DormRoom, Fee)
• The attribute on the left side of the functional
dependency is called the determinant
• Functional dependencies may be based on equations:
ExtendedPrice = Quantity × UnitPrice
(Quantity, UnitPrice) → ExtendedPrice
• Functional dependencies are not equations!
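A functional dependency can be tested against sample rows: it is violated as soon as two rows agree on the determinant but differ on the dependent attributes. The sketch below assumes rows are stored as dictionaries; the helper name `fd_holds` and the student data are hypothetical. Note that sample data can only refute a dependency, never prove it.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in determinant)
        right = tuple(row[a] for a in dependent)
        # setdefault records the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical sample rows, not taken from the slides.
students = [
    {"StudentID": 100, "StudentName": "Jones", "DormName": "Stephens"},
    {"StudentID": 200, "StudentName": "Jones", "DormName": "Alkire"},
    {"StudentID": 300, "StudentName": "Baker", "DormName": "Stephens"},
]

id_determines_name = fd_holds(students, ["StudentID"], ["StudentName"])
name_determines_id = fd_holds(students, ["StudentName"], ["StudentID"])
print(id_determines_name)  # True: each StudentID appears with one name
print(name_determines_id)  # False: "Jones" appears with two different StudentIDs
```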
Functional Dependencies Are Not Equations
ObjectColor → Weight
ObjectColor → Shape
ObjectColor → (Weight, Shape)
Composite Determinants
• Composite determinant: A determinant
of a functional dependency that consists of
more than one attribute
(StudentName, ClassName) → (Grade)
Functional Dependency Rules
• If A → (B, C), then A → B and A → C
• If (A, B) → C, it does not follow that A → C or B → C:
in general, neither A nor B determines C by itself
Functional Dependencies in the
SKU_DATA Table
SKU → (SKU_Description, Department, Buyer)
SKU_Description → (SKU, Department, Buyer)
Buyer → Department
What Makes Determinant Values Unique?
• A determinant is unique in a relation if, and
only if, it determines every other column in
the relation
• You cannot find the determinants of all
functional dependencies simply by looking
for unique values in one column:
– Data set limitations
Keys
• A key is a combination of one or more
columns that is used to identify rows in a
relation
• A composite key is a key that consists of
two or more columns
Candidate and Primary Keys
• A candidate key is a key that determines all of
the other columns in a relation
• A primary key is a candidate key selected as
the primary means of identifying rows in a
relation:
– There is one and only one primary key per relation
– The primary key may be a composite key
– The ideal primary key is short, numeric and never
changes
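One way to find candidate keys from a list of functional dependencies is to compute, for each set of columns, everything those columns determine (the attribute closure) and keep the smallest sets that determine every column. The sketch below applies this to the SKU_DATA dependencies given earlier. The function names are hypothetical, and the minimality check is the usual textbook refinement rather than something stated on the slide.

```python
from itertools import combinations

ATTRIBUTES = {"SKU", "SKU_Description", "Department", "Buyer"}
# Functional dependencies of SKU_DATA as (determinant, dependent attributes).
FDS = [
    ({"SKU"}, {"SKU_Description", "Department", "Buyer"}),
    ({"SKU_Description"}, {"SKU", "Department", "Buyer"}),
    ({"Buyer"}, {"Department"}),
]


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            # Skip supersets of a key already found, so only minimal keys are kept.
            if any(key <= combo for key in keys):
                continue
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys


keys = candidate_keys(ATTRIBUTES, FDS)
print(keys)                      # the sets {'SKU'} and {'SKU_Description'}
print(closure({"Buyer"}, FDS))   # Buyer and Department only, so Buyer is not a key
```

Either candidate key could be selected as the primary key; SKU is the more natural choice because it is short and numeric.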
Surrogate Keys
• A surrogate key is an artificial column
added to a relation to serve as a primary
key:
– DBMS supplied
– Short, numeric and never changes – an ideal
primary key!
– Has artificial values that are meaningless to
users
– Normally hidden in forms and reports
Surrogate Keys
Relation Descriptions:
• RENTAL_PROPERTY without surrogate
key:
RENTAL_PROPERTY (Street, City,
State/Province, Zip/PostalCode, Rental_Rate)
• RENTAL_PROPERTY with surrogate key:
RENTAL_PROPERTY (PropertyID, Street, City,
State/Province, Zip/PostalCode, Rental_Rate)
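A DBMS-supplied surrogate key can be sketched with Python's built-in `sqlite3` module, where an `INTEGER PRIMARY KEY AUTOINCREMENT` column plays the role of PropertyID. This is one possible illustration under stated assumptions: the column names `StateProvince` and `ZipPostalCode` stand in for State/Province and Zip/PostalCode (a slash is not valid in an unquoted SQL identifier), and the property rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE RENTAL_PROPERTY (
        PropertyID    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        Street        TEXT NOT NULL,
        City          TEXT NOT NULL,
        StateProvince TEXT NOT NULL,
        ZipPostalCode TEXT NOT NULL,
        Rental_Rate   REAL NOT NULL
    )
    """
)

# Hypothetical rows; PropertyID is omitted so that the DBMS supplies it.
properties = [
    ("123 Main St", "Seattle", "WA", "98101", 1500.0),
    ("45 Oak Ave", "Vancouver", "BC", "V5K 0A1", 1800.0),
]
conn.executemany(
    "INSERT INTO RENTAL_PROPERTY "
    "(Street, City, StateProvince, ZipPostalCode, Rental_Rate) "
    "VALUES (?, ?, ?, ?, ?)",
    properties,
)

ids = [row[0] for row in conn.execute(
    "SELECT PropertyID FROM RENTAL_PROPERTY ORDER BY PropertyID"
)]
print(ids)  # [1, 2]
```

The generated values carry no meaning for users, which is why such a column is normally hidden in forms and reports.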
Foreign Keys
• A foreign key is the primary key of one
relation that is placed in another relation to
form a link between the relations:
– A foreign key can be a single column or a
composite key
– The term refers to the fact that key values are
foreign to the relation in which they appear as
foreign key values
Foreign Keys
Relation Descriptions:
DEPARTMENT (DeptName, BudgetCode, ManagerName)
EMPLOYEE (EmpNumber, EmpName, DeptName)
Primary keys: DEPARTMENT.DeptName and EMPLOYEE.EmpNumber
Foreign key: EMPLOYEE.DeptName refers to DEPARTMENT.DeptName
The Referential Integrity Constraint
• A referential integrity constraint is a
statement that limits the values of the
foreign key to those already existing as
primary key values in the corresponding
relation
Foreign Key with a
Referential Integrity Constraint
SKU_DATA (SKU, SKU_Description, Department, Buyer)
ORDER_ITEM (OrderNumber, SKU, Quantity, Price, ExtendedPrice)
Referential Integrity Constraint: ORDER_ITEM.SKU must first exist in SKU_DATA.SKU
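The constraint above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the composite primary key (OrderNumber, SKU) on ORDER_ITEM, the column types, and the sample values are illustrative choices, not taken from the slides. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute(
    """
    CREATE TABLE SKU_DATA (
        SKU             INTEGER PRIMARY KEY,
        SKU_Description TEXT,
        Department      TEXT,
        Buyer           TEXT
    )
    """
)
conn.execute(
    """
    CREATE TABLE ORDER_ITEM (
        OrderNumber   INTEGER,
        SKU           INTEGER REFERENCES SKU_DATA (SKU),  -- foreign key
        Quantity      INTEGER,
        Price         REAL,
        ExtendedPrice REAL,
        PRIMARY KEY (OrderNumber, SKU)  -- assumed composite key
    )
    """
)

# Hypothetical sample values.
conn.execute("INSERT INTO SKU_DATA VALUES (1001, 'Sample item', 'Sample dept', 'Sample buyer')")
conn.execute("INSERT INTO ORDER_ITEM VALUES (1, 1001, 2, 10.0, 20.0)")  # SKU exists: accepted

try:
    # SKU 9999 does not exist in SKU_DATA, so the DBMS should refuse this row.
    conn.execute("INSERT INTO ORDER_ITEM VALUES (1, 9999, 1, 5.0, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```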
Modification Anomalies
An anomaly is an undesirable consequence
of data modification
– Deletion anomaly
– Insertion anomaly
– Update anomaly
Normal Forms
• Relations are categorized into normal forms
based on which modification anomalies or other
problems they are subject to
Normal Forms
• 1NF – A table that qualifies as a relation is in 1NF
• 2NF – A relation is in 2NF if all of its nonkey attributes
are dependent on all of the primary key
• 3NF – A relation is in 3NF if it is in 2NF and has no
transitive dependencies
• Boyce-Codd Normal Form (BCNF) – A relation is in
BCNF if every determinant is a candidate key
“I swear to construct my tables so that all nonkey
columns are dependent on the key, the whole key
and nothing but the key, so help me Codd.”
First Normal Form (1NF)
• To be in First Normal Form (1NF) a
relation must have only single-valued
attributes -- neither repeating groups nor
arrays are permitted
Violation of 1NF
PlayerID     Name        Team                  AtBats     Hits
123-456789   Alomar, R   Cleveland, New York   200, 120   50, 58
234-567890   Alomar, S   Cleveland, Chicago    60, 150    17, 40
Key: PlayerID
Determinants: PlayerID → Name
(PlayerID, Team) → (AtBats, Hits)
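The 1NF violation above can be repaired by emitting one row per (PlayerID, Team) pair so that every cell holds a single value. The sketch below assumes that the multi-valued cells line up positionally (the first team goes with the first AtBats and Hits value, and so on) and reads the flattened slide table as AtBats 200, 120 with Hits 50, 58 for the first player and AtBats 60, 150 with Hits 17, 40 for the second. The function name `to_1nf` is hypothetical.

```python
# The table as shown, with multi-valued cells represented as lists.
unnormalized = [
    {"PlayerID": "123-456789", "Name": "Alomar, R",
     "Team": ["Cleveland", "New York"], "AtBats": [200, 120], "Hits": [50, 58]},
    {"PlayerID": "234-567890", "Name": "Alomar, S",
     "Team": ["Cleveland", "Chicago"], "AtBats": [60, 150], "Hits": [17, 40]},
]


def to_1nf(rows):
    """Emit one single-valued row per (PlayerID, Team) pair."""
    flat = []
    for row in rows:
        # zip pairs the values positionally: first team with first AtBats and Hits.
        for team, at_bats, hits in zip(row["Team"], row["AtBats"], row["Hits"]):
            flat.append({
                "PlayerID": row["PlayerID"],
                "Name": row["Name"],
                "Team": team,
                "AtBats": at_bats,
                "Hits": hits,
            })
    return flat


flat = to_1nf(unnormalized)
for row in flat:
    print(row)
```

In the flattened table the key becomes (PlayerID, Team). Name still depends on PlayerID alone, so the result is in 1NF but not yet in 2NF.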
Second Normal Form (2NF)
• To be in Second Normal Form (2NF) the
relation must be in 1NF and each nonkey
attribute must be dependent on the whole
key (not a subset of the key)
Violation of 2NF
ItemNo   CustomerID   Quantity   CreditRtg
12       57           25         OK
34       679          3          Poor
Key: (ItemNo, CustomerID)
Determinants: (ItemNo, CustomerID) → Quantity
CustomerID → CreditRtg
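Because CreditRtg depends on CustomerID alone, which is only part of the key, the usual remedy is to move that dependency into its own relation. The sketch below projects the table onto two smaller relations and joins them back to confirm that nothing is lost. The helper `project` and the variable names `orders` and `customers` are hypothetical.

```python
order_lines = [
    {"ItemNo": 12, "CustomerID": 57, "Quantity": 25, "CreditRtg": "OK"},
    {"ItemNo": 34, "CustomerID": 679, "Quantity": 3, "CreditRtg": "Poor"},
]


def project(rows, attributes):
    """Project rows onto `attributes`, dropping duplicates and keeping first-seen order."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# (ItemNo, CustomerID) → Quantity stays with the full key.
orders = project(order_lines, ["ItemNo", "CustomerID", "Quantity"])
# CustomerID → CreditRtg moves to a relation keyed by CustomerID.
customers = project(order_lines, ["CustomerID", "CreditRtg"])

# Joining on CustomerID reproduces the original rows.
rating = {c["CustomerID"]: c["CreditRtg"] for c in customers}
rejoined = [{**o, "CreditRtg": rating[o["CustomerID"]]} for o in orders]
print(rejoined == order_lines)  # True
```

After the split, a customer's credit rating is stored once, so changing it touches a single row.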
Third Normal Form (3NF)
• To be in Third Normal Form (3NF) the relation
must be in 2NF and no transitive dependencies
may exist within the relation.
• A transitive dependency exists when an attribute is
indirectly functionally dependent on the key (that
is, the dependency runs through another nonkey
attribute)
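The SKU_DATA dependencies listed earlier contain a transitive dependency: SKU → Buyer and Buyer → Department, so Department depends on the key only through the nonkey attribute Buyer. The sketch below flags dependencies whose determinant consists solely of nonkey attributes. It is a simplified check under stated assumptions: the candidate keys are supplied by hand, determinants that mix key and nonkey attributes are not examined, and the function name is hypothetical.

```python
# Functional dependencies of SKU_DATA as (determinant, dependent attributes).
FDS = [
    ({"SKU"}, {"SKU_Description", "Department", "Buyer"}),
    ({"SKU_Description"}, {"SKU", "Department", "Buyer"}),
    ({"Buyer"}, {"Department"}),
]
CANDIDATE_KEYS = [{"SKU"}, {"SKU_Description"}]


def transitive_dependencies(fds, candidate_keys):
    """Return (determinant, attribute) pairs where a nonkey determinant fixes a nonkey attribute."""
    # Attributes that belong to at least one candidate key.
    key_attributes = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        # A determinant with no key attributes cannot itself be a key, so the key
        # reaches the dependent attribute only by passing through this determinant.
        if lhs.isdisjoint(key_attributes):
            for attr in sorted(rhs - key_attributes - lhs):
                found.append((sorted(lhs), attr))
    return found


violations = transitive_dependencies(FDS, CANDIDATE_KEYS)
print(violations)  # [(['Buyer'], 'Department')]
```

A common fix is to place Buyer and Department in a separate relation keyed by Buyer and keep Buyer in SKU_DATA as a foreign key.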
Violation of 3NF
Boyce-Codd Normal Form (BCNF)
To be in Boyce-Codd Normal Form
(BCNF) the relation must be in 3NF and
every determinant must be a candidate
key.
Steps for BCNF
Violation of BCNF
Client   ProbType     Consultant
Alpha    Marketing    Gomez
Alpha    Production   Raginski
Beta     Marketing    Gomez
Omega    Marketing    Taylor
Key: (Client, ProbType)
Candidate Key: (Client, Consultant)
Determinants: Consultant → ProbType
* No two consultants have the same name
* A consultant specializes in just one problem type
* A consultant can be assigned to multiple clients
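The BCNF rule can be applied directly to this example: list the determinants, compare them with the candidate keys, and split off any determinant that is not a candidate key. The sketch below is one way to do that. The variable names and the two resulting relations (consultant specialties and client assignments) are illustrative assumptions, not a decomposition given on the slide.

```python
CANDIDATE_KEYS = [{"Client", "ProbType"}, {"Client", "Consultant"}]
DETERMINANTS = [{"Client", "ProbType"}, {"Client", "Consultant"}, {"Consultant"}]

# BCNF requires every determinant to be a candidate key.
violations = [d for d in DETERMINANTS if d not in CANDIDATE_KEYS]
print(violations)  # [{'Consultant'}]

# Rows of the table as (Client, ProbType, Consultant).
assignments = [
    ("Alpha", "Marketing", "Gomez"),
    ("Alpha", "Production", "Raginski"),
    ("Beta", "Marketing", "Gomez"),
    ("Omega", "Marketing", "Taylor"),
]

# Move Consultant → ProbType into its own relation keyed by Consultant.
specialties = sorted({(consultant, prob) for _, prob, consultant in assignments})
# Keep the client-to-consultant assignments separately.
client_consultants = sorted({(client, consultant) for client, _, consultant in assignments})

# Joining the two relations on Consultant reproduces the original rows.
prob_of = dict(specialties)
rejoined = sorted((client, prob_of[c], c) for client, c in client_consultants)
print(rejoined == sorted(assignments))  # True
```

In the decomposed design each consultant's problem type is stored once, and in each new relation the only determinant is a candidate key.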