Chapter 5: The Relational Model and Normalization

The Relational Model, Normalization, and Modification Anomalies
Outline
• Part 1: Introduction
– Overview of Normalization, Big names in normalization
• Part 2: Normal Forms
– Definitions and Techniques
• Part 3: Tokenization
– Concepts and Application
• Part 4: Epilog
– BCNF Revisited, Related Activities, and Reflections
Part 1: Introduction
• Overview of Normalization approaches
• Review of Normal Forms
• Some big names in the history of normalization
• 3 of Codd's 12 rules
Normalization
• Normalization
– A process of evaluating and converting a relation to reduce modification anomalies
• Modification Anomaly
– An unexpected consequence resulting from maintenance of the data in a database
• Two Normalization approaches
– Top-Down—relationships among entities
– Bottom-Up—relationships among attributes
Top-Down Normalization
• Many-to-Many associations are "Factored"
• Net outcomes:
– More entities
– More relations
• Quality: Many-to-Many associations are eliminated
[Diagram: a Many-to-Many association between entities A and B is replaced by A and B each related to a new intersection entity C]
Bottom-Up Normalization
• Application of a sequence of transformations that results in an improved data model at each stage
• Quality
– A quality rating system for a relation
– Elimination of maintenance problems and minimization of data replication
• Introduced by E. F. Codd with the Relational Model
[Diagram: a relation with key (A, B) and non-key attributes C, D is decomposed into two smaller relations]
Known Normal Forms
• Bottom-Up: known normal forms, in order from lowest to highest:
– First Normal Form (1NF): elimination of repeating field types; "atomic" fields; no duplicate rows
– Second Normal Form (2NF): elimination of partial key dependencies
– Third Normal Form (3NF): elimination of transitive dependencies among non-key attributes
– Boyce-Codd Normal Form (BCNF): elimination of dependencies whose determinant is not a candidate key
– Fourth Normal Form (4NF): elimination of multi-valued dependencies
– Fifth Normal Form (5NF): elimination of join anomalies
– Domain/Key Normal Form (DKNF): elimination of all modification anomalies
• Top-Down: no specific normal form—the goal might be viewed as the elimination of Many-to-Many associations
• Known normal forms are related one to the other; the ordering is:
DKNF => 5NF => 4NF => BCNF => 3NF => 2NF => 1NF
A Normalization Strategy
• For the initial Logical Model (LM)
– Apply the Top-Down approach, removing all Many-to-Many associations
– Creates new objects and relationships
• For the resulting LM
– Apply the Bottom-Up approach, to the level of quality desired (usually 3NF or BCNF)
– Creates new objects and relationships
• Reassess the resulting LM
– LM quality is the least quality level of any object in the LM
– Optimization/Tuning Note: "No major application will run in Third Normal Form." (George Koch, formerly a senior vice president of Oracle)
Normalization: Contributors
• Dr. E. F. Codd
– "A Relational Model of Data for Large Shared Data Banks," CACM, Vol. 13, No. 6, June 1970
– Introduced the Relational Model
– Identified the first three normal forms
• Dr. R. F. Boyce
– Extended Codd's original three forms
• Dr. R. Fagin
– Extended the theory as proposed by Codd and introduced another way of evaluating a design
• Dr. David M. Kroenke, Author & Educator
– Instrumental in clarifying the theory of normal forms
Codd's 12 rules
(for defining a fully relational DBMS)
• Published in the 1980s by Codd to defend his original notion of a relational DBMS
• Really 13 rules (0-12)
• Rules 1, 2, and 3 are relevant to a discussion of Modification Anomalies
Codd's 12 rules (continued)
• Rule 1: Information Rule
– All information in the database should be represented in one and only one way: as values in a table.
– Note: rows and columns are not ordered
• Source:
http://www.cse.ohio-state.edu/~sgomori/570/coddsrules.html
Codd's 12 rules (continued)
• Rule 2: Guaranteed Access Rule
– Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name.
– Note: no repeating rows, no pointers, no repeated fields—this eliminates CODASYL and OO models
• Source:
http://www.cse.ohio-state.edu/~sgomori/570/coddsrules.html
Codd's 12 rules (continued)
• Rule 3: Systematic Treatment of Null Values
– Null values (distinct from an empty character string or a string of blank characters, and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type.
– Note: Primary Keys may not be NULL (see Rule 1)
Source:
http://www.cse.ohio-state.edu/~sgomori/570/coddsrules.html
Part 2: Normal Forms
• Definitions and terminology
– Functional Dependency
– Key
– The first 4 normal forms
– Selecting a key
Review: Functional Dependency
• If A and B are non-empty attribute collections for a relation R, then B is Functionally Dependent on A if for each value of A there is exactly one value of B.
• Remarks:
– A is said to Functionally Determine B
– A is called a Determinant
– The relationship is written A → B
Review: Key
• Definitions: given that ε is the set of all attributes for a relation R,
– A is an Identifier for R if A → ε
– K is a Key for a relation R if and only if
• K is an identifier for R, and
• no non-empty proper subset of K is an identifier for R
• Notes:
– All relations have a key (covered in later slides)
– An attribute that belongs to the selected key is called a Key Attribute
– All other attributes are called Non-Key Attributes
Example
• ε = {STUD-ID, STUD-NAME, DORM, DORM-FEE}
A = {STUD-ID}
B = {DORM}
C = {DORM-FEE}
• Two identified Functional Dependencies
– A → ε
– B → C
• A is a Key:
– cannot discard since only 1 attribute
• B is not a Key:
– does not determine all attributes
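A proposed functional dependency can be spot-checked against sample data by looking for determinant values paired with more than one dependent value. A minimal sketch in SQL (the STUDENT table and underscore column names are hypothetical stand-ins for the example's attributes): B → C (DORM → DORM_FEE) holds in the stored data only if this query returns no rows.

  SELECT DORM
  FROM   STUDENT
  GROUP  BY DORM
  HAVING COUNT(DISTINCT DORM_FEE) > 1;  -- any row returned is a DORM value that violates DORM -> DORM_FEE

Such a query can only show that the current data violates the dependency; whether the dependency truly holds is a statement about the application's semantics, not about the sample rows.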
Review: First Normal Form
• A relation R is in FIRST NORMAL FORM (1NF) if and only if
– R has no repeating attribute types AND
– all attribute types of R are "atomic" AND
– R has no repeated rows
• Not a functional-dependency condition, but essential to comply with Codd's Rule 2 for a relational DBMS
Review: Second Normal Form
• Definitions:
– A relation R with key K has a Partial Key Dependency if and only if a collection of non-key attributes is determined by (i.e., functionally dependent on) a non-empty proper subset of K.
– A relation R is in SECOND NORMAL FORM (2NF) if and only if
• R is in 1NF AND
• R has no partial key dependencies
• Note: by definition, any relation in 2NF is in 1NF—thus the LM is improved
Review: Third Normal Form
• Definitions:
– For A and C, attribute collections for a relation R, there is a Transitive Dependency of C upon A if there is an attribute collection B of R for which
• A → B and
• B → C.
– A relation R is in THIRD NORMAL FORM (3NF) if and only if
• R is in 2NF AND
• R has no transitive dependencies of one non-key attribute collection upon another non-key attribute collection
• Note: 3NF implies 2NF
Review: Boyce-Codd Normal Form
• Definitions:
– Attribute collections A and B of a relation R are Candidate Keys for R if and only if
• A is a key for R, and
• B is a key for R, and
• A is not equal to B
– A relation R is in BOYCE-CODD NORMAL FORM (BCNF) if and only if
• R is in 3NF AND
• all determinants of R are candidate keys
Selecting a Key
1. Identify all determinants.
2. The set of all attributes is a finite set that is an identifier for the relation—place these attributes on the "Key" side of the LM diagram.
3. One by one, move an attribute determined by other attribute collections from the "Key" side to the "Non-Key" side.
4. Repeat step 3 until there are no more attributes determined by attributes on the "Key" side.
5. The attributes remaining on the "Key" side are a key.
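Once a key has been selected this way, it can be sanity-checked against sample data. A small SQL sketch (a hypothetical relation R with the attributes used in the diagrams): the pair (A, B) can serve as a key for the stored rows only if no combination of values appears more than once.

  SELECT A, B, COUNT(*) AS row_count
  FROM   R
  GROUP  BY A, B
  HAVING COUNT(*) > 1;  -- any row returned means (A, B) does not uniquely identify a row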
Key Selection
[Diagram sequence: what is the minimal set of attributes that uniquely identifies the row? Starting with the attributes A, B, C, D on the "Key" side, and based on the functional dependencies, each determined attribute is moved out of the key one by one. At the end, A and B remain on the "Key" side and C and D are on the "Non-Key" side—the minimal set of attributes that uniquely identifies the row.]
Summary of Normal Forms
[Diagram: a relation with attributes A, B, C, D split into "Key" and "Non-Key" sides, annotated with the steps Find Key, 2NF, 3NF, and BCNF]
Part 3: Tokenization
• Modification anomalies and Design flaws
• Tokenized tables
– Functional Dependencies: 2NF, 3NF, and BCNF
– Tokenized Tables: an approach permitting a narrow focus on the dependency, not the actual data
• Observe the result of a normalization process on
– Maintenance of data
– the Logical Model
Normalization (Review)
• Normalization
– A process of evaluating and converting a relation to reduce modification anomalies
• Modification Anomaly
– An unexpected consequence resulting from maintenance of the data in a database
Anomalies: Types
• Two types associated with databases
– Modification Anomaly:
• Three basic types of modification anomalies:
– Insertion,
– Deletion, and
– Update
– Design Anomaly
• A flaw in the logical design of the database itself
• Connecting the two types of anomalies
– For each modification anomaly there is a corresponding design anomaly, and
– Whenever there is a design anomaly there are modification anomalies which may surface.
Anomalies: Design
• Design anomaly types have been
– classified, and
– criteria for their removal developed
• Normalization is the process of removing design anomalies
• A Normal Form identifies a type of anomaly
Tokenized Tables
• Designed to emphasize the relationship between modification anomalies and normal forms using abstractions of actual data tables, Tokenized Tables.
• A supplement to the traditional, textbook approach.
• Daigle, Roy (1996), "Teaching Normalization Concepts with Abstraction," AIS
Tokenized Tables
• Most approaches to normalization use context-based data tables and intuition to examine the relationship.
Definitions…
• Problem abstraction
– removing the irrelevant, i.e., the context
• Solution generalization
– exhibiting data anomalies independent of context
• Constraint: a "faithful" state of the database
• Verification
– demonstrate that normalization removes the data anomalies
Tokenized Tables
• "Tokenize" refers to the process of converting context-based data into a symbolic representation
– Each attribute is assigned its own variable name (A, B, C, etc.)
– Each data value is assigned a distinct symbolic value (a1, a2, …; b1, b2, …; etc.)
Tokenization (context)
COURSE ID | SECTION | COURSE TITLE   | INSTRUCTOR NAME | INSTRUCTOR LOCATION
ISC 285   | 101     | Programming II | Chapman         | FCW 9
ITE 285   | 101     | Programming II | Chapman         | FCW 9
ACC 201   | 501     | Fund Acctg     | Miller          | MCOB 310
MKT 300   | 801     | Intro Mktg     | Bennett         | MCOB 310
MKT 300   | 802     | Intro Mktg     | Beatty          | MCOB 333

Determinants:
1. COURSE ID → COURSE TITLE (courses could be cross-listed)
2. COURSE ID, SECTION → INSTRUCTOR NAME (a course could be taught by different faculty)
3. INSTRUCTOR NAME → INSTRUCTOR LOCATION (faculty could share an office location)
4. COURSE ID, SECTION is a key for this table
Tokenization (context free)
COURSE ID | SECTION | COURSE TITLE   | INSTRUCTOR NAME | INSTRUCTOR LOCATION
ISC 285   | 101     | Programming II | Chapman         | FCW 9
ITE 285   | 101     | Programming II | Chapman         | FCW 9
ACC 201   | 501     | Fund Acctg     | Miller          | MCOB 310
MKT 300   | 801     | Intro Mktg     | Bennett         | MCOB 310
MKT 300   | 802     | Intro Mktg     | Beatty          | MCOB 333

A  | B  | C  | D  | E
a1 | b1 | c1 | d1 | e1
a2 | b1 | c1 | d1 | e1
a3 | b2 | c2 | d2 | e2
a4 | b3 | c3 | d3 | e2
a4 | b4 | c3 | d4 | e3
Tokenization (abstraction)
A  | B  | C  | D  | E
a1 | b1 | c1 | d1 | e1
a2 | b1 | c1 | d1 | e1
a3 | b2 | c2 | d2 | e2
a4 | b3 | c3 | d3 | e2
a4 | b4 | c3 | d4 | e3
Conceptual Diagram
[Conceptual diagram: key attributes A, B; non-key attributes C, D, E]
Determinants (without context):
1. A → C
2. A, B → D
3. D → E
4. A, B is a key
Tokenization (initial)
Conceptual Diagram
[Key attributes A, B; non-key attributes C, D, E]
Tokenization (faithful representation)
Conceptual Diagram: [key attributes A, B; non-key attributes C, D, E]
Tokenized Table (empty):
A | B | C | D | E
Tokenization (faithful representation)
Conceptual Diagram: [key attributes A, B; non-key attributes C, D, E]
Tokenized Table:
A  | B  | C  | D  | E
a1 | b1 | c1 | d1 | e1
a2 | b1 | c1 | d1 | e1
a3 | b2 | c2 | d2 | e2
a4 | b3 | c3 | d3 | e2
a4 | b4 | c3 | d4 | e3
An Illustration (normalization to 3NF)

Tokenized Conceptual Diagrams
Step 1 – Unnormalized Relation: key A; A → B; transitive dependency B → C
Step 4 – Normalization Procedure yields the Normalized Relations (A, B) and (B, C)

Tokenized Data Tables
Step 2 – faithful state of the unnormalized relation:
A  | B  | C
a1 | b1 | c1
a2 | b2 | c2
a3 | b1 | c1
a4 | b3 | c1
a5 | b2 | c2
Step 3 – Modification Anomalies:
• Insertion: b4, c3
• Deletion: row a4
• Update: b1, c1 to b1, c5; or a3, b1 to a3, b2
Step 5 – Corresponding Projection:
A  | B        B  | C
a1 | b1       b1 | c1
a2 | b2       b2 | c2
a3 | b1       b3 | c1
a4 | b3
a5 | b2
Step 6 – Modification anomalies (Insertion, Deletion, Update): Removed
Guidelines for locating
Modification Anomalies
• Insertion
– of a "small" entity produces a KEY problem for the "large" entity (violates Codd's Rule 3)
• Deletion
– of a "large" entity loses a "small" entity (loss of data integrity)
• Updates
– to the "small" entity need to be performed in several places (violates Codd's Rule 2)
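The insertion guideline can be made concrete with a primary key constraint. A hedged SQL sketch (a hypothetical COURSE_SECTION table modeled on the tokenization example; names, types, and the inserted values are illustrative only): because the key columns are declared NOT NULL, the "small" entity (an instructor and office) cannot be recorded until some course section exists for it.

  CREATE TABLE COURSE_SECTION (
      COURSE_ID           VARCHAR(10) NOT NULL,
      SECTION             VARCHAR(5)  NOT NULL,
      COURSE_TITLE        VARCHAR(40),
      INSTRUCTOR_NAME     VARCHAR(40),
      INSTRUCTOR_LOCATION VARCHAR(20),
      PRIMARY KEY (COURSE_ID, SECTION)
  );

  -- Fails: the new instructor has no course section yet, so the key values are NULL (hypothetical data).
  INSERT INTO COURSE_SECTION (COURSE_ID, SECTION, INSTRUCTOR_NAME, INSTRUCTOR_LOCATION)
  VALUES (NULL, NULL, 'Nguyen', 'FCW 12');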
2NF (Steps 1 and 2)
Step 1 – Initial Design
Conceptual Diagram: [key (A, B); non-key attributes C, D; partial key dependency B → D]
Step 2 – Create a faithful table state
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1
a1 | b2 | c2 | d2
a2 | b1 | c1 | d1
a2 | b2 | c3 | d2
a3 | b3 | c2 | d1
Problems?
Building the table—is it a faithful representation of the determinants?
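To experiment with the example, the tokenized relation can be materialized directly. A minimal sketch in SQL (the table name R and the VARCHAR types are arbitrary choices; any engine that accepts multi-row VALUES, such as SQLite or PostgreSQL, will do): create the unnormalized relation with key (A, B) and load the faithful state from Step 2.

  CREATE TABLE R (
      A VARCHAR(5) NOT NULL,
      B VARCHAR(5) NOT NULL,
      C VARCHAR(5),
      D VARCHAR(5),
      PRIMARY KEY (A, B)
  );

  INSERT INTO R (A, B, C, D) VALUES
      ('a1', 'b1', 'c1', 'd1'),
      ('a1', 'b2', 'c2', 'd2'),
      ('a2', 'b1', 'c1', 'd1'),
      ('a2', 'b2', 'c3', 'd2'),
      ('a3', 'b3', 'c2', 'd1');  -- faithful: the partial dependency B -> D repeats (b1 -> d1 twice, b2 -> d2 twice)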
2NF
[Conceptual diagram: key (A, B); non-key C, D]
Modification Anomalies
Insert: b4 → d3
Deletion: row (a3, b3)
Update: b1 → d1 to b1 → d4
Step 3 – Find the insertion anomaly
Tokenized Data Table
A    | B  | C    | D
a1   | b1 | c1   | d1
a1   | b2 | c2   | d2
a2   | b1 | c1   | d1
a2   | b2 | c3   | d2
a3   | b3 | c2   | d1
null | b4 | null | d3
Can't have null in the Primary Key
2NF
[Conceptual diagram: key (A, B); non-key C, D]
Modification Anomalies
Insert: b4 → d3
Deletion: row (a3, b3)
Update: b1 → d1 to b1 → d4
Step 3 – Find the deletion anomaly
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1
a1 | b2 | c2 | d2
a2 | b1 | c1 | d1
a2 | b2 | c3 | d2
a3 | b3 | c2 | d1   (deleting this row loses the information b3 → d1)
2NF
[Conceptual diagram: key (A, B); non-key C, D]
Modification Anomalies
Insert: b4 → d3
Deletion: row (a3, b3)
Update: b1 → d1 to b1 → d4
Step 3 – Find the update anomaly
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1 → d4
a1 | b2 | c2 | d2
a2 | b1 | c1 | d1   (you have to change it here too)
a2 | b2 | c3 | d2
a3 | b3 | c2 | d1
2NF
Step 4 – Remove the partial key dependency (Normalize to 2NF)
[Original: key (A, B); non-key C, D]
Pull out the partial dependency (B → D) into its own Parent relation; the non-partial dependency ((A, B) → C) stays in the Child.
Child: key (A, B) → C, with B a Foreign Key
Parent: key B → D
2NF
Step 5 – Build new tables from the original with SQL (Projection / distinct rows)
Original:
A  | B  | C  | D
a1 | b1 | c1 | d1
a1 | b2 | c2 | d2
a2 | b1 | c1 | d1
a2 | b2 | c3 | d2
a3 | b3 | c2 | d1
Child (A, B, C):
A  | B  | C
a1 | b1 | c1
a1 | b2 | c2
a2 | b1 | c1
a2 | b2 | c3
a3 | b3 | c2
Parent (B, D):
B  | D
b1 | d1
b2 | d2
b3 | d1
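A sketch of Step 5 in SQL, following the slide's projection / distinct-rows idea (CHILD and PARENT are placeholder names; the unnormalized table is assumed to be R as in the earlier sketch):

  -- Parent: the partial dependency B -> D in its own table.
  CREATE TABLE PARENT AS
  SELECT DISTINCT B, D FROM R;

  -- Child: the full-key dependency (A, B) -> C; B remains as a foreign key to PARENT.
  CREATE TABLE CHILD AS
  SELECT DISTINCT A, B, C FROM R;

Note that CREATE TABLE ... AS SELECT generally does not carry over key or foreign key constraints, so in practice the primary keys and the foreign key from CHILD.B to PARENT.B would be declared explicitly afterwards.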
2NF
Step 5 – Verify removal of the insertion anomaly
Modification Anomalies: Insert: b4 → d3; Deletion: row (a3, b3); Update: b1 → d1 to b1 → d4
Child (A, B, C):
A  | B  | C
a1 | b1 | c1
a1 | b2 | c2
a2 | b1 | c1
a2 | b2 | c3
a3 | b3 | c2
Parent (B, D):
B  | D
b1 | d1
b2 | d2
b3 | d1
b4 | d3   (inserted: a value for A is not required—A is not an attribute of this table!)
2NF
Step 5 – Verify removal of the deletion anomaly
Modification Anomalies: Insert: b4 → d3; Deletion: row (a3, b3); Update: b1 → d1 to b1 → d4
Child (A, B, C): the row (a3, b3, c2) is deleted
A  | B  | C
a1 | b1 | c1
a1 | b2 | c2
a2 | b1 | c1
a2 | b2 | c3
a3 | b3 | c2
Parent (B, D):
B  | D
b1 | d1
b2 | d2
b3 | d1
b4 | d3
No loss of information: b3 → d1 is still recorded in the Parent table
2NF
Step 5 – Verify removal of the update anomaly
Modification Anomalies: Insert: b4 → d3; Deletion: row (a3, b3); Update: b1 → d1 to b1 → d4
Child (A, B, C):
A  | B  | C
a1 | b1 | c1
a1 | b2 | c2
a2 | b1 | c1
a2 | b2 | c3
a3 | b3 | c2
Parent (B, D):
B  | D
b1 | d4   (updated)
b2 | d2
b3 | d1
b4 | d3
Only one place to change the value
3NF
Conceptual Diagram: [key A; non-key B, C; transitive relationship B → C]
Tokenized Data Table
A  | B  | C
a1 | b1 | c1
a2 | b2 | c2
a3 | b1 | c1
a4 | b3 | c1
a5 | b2 | c2
We're building the table—does it capture what could happen over time?
3NF
[Conceptual diagram: key A; non-key B, C; B → C]
Modification Anomalies
Insert: b4 → c3
Deletion: row a4
Update: (b1 → c1) to (b1 → c5) OR (a3 → b1) to (a3 → b2)
Tokenized Data Table
A    | B  | C
a1   | b1 | c1
a2   | b2 | c2
a3   | b1 | c1
a4   | b3 | c1
a5   | b2 | c2
null | b4 | c3
Can't have null in the Primary Key
3NF
[Conceptual diagram: key A; non-key B, C; B → C]
Modification Anomalies
Insert: b4 → c3
Deletion: row a4
Update: (b1 → c1) to (b1 → c5) OR (a3 → b1) to (a3 → b2)
Tokenized Data Table
A  | B  | C
a1 | b1 | c1
a2 | b2 | c2
a3 | b1 | c1
a4 | b3 | c1   (deleting this row loses the information b3 → c1)
a5 | b2 | c2
3NF
[Conceptual diagram: key A; non-key B, C; B → C]
Modification Anomalies
Insert: b4 → c3
Deletion: row a4
Update: (b1 → c1) to (b1 → c5) OR (a3 → b1) to (a3 → b2)
Tokenized Data Table
A  | B  | C
a1 | b1 | c1 → c5
a2 | b2 | c2
a3 | b1 | c1   (you have to change it here too)
a4 | b3 | c1
a5 | b2 | c2
3NF
[Conceptual diagram: key A; non-key B, C; B → C]
Modification Anomalies (context: A = SID, B = Building, C = Fee)
Insert: b4 → c3
Deletion: row a4
Update: (b1 → c1) to (b1 → c5) OR (a3 → b1) to (a3 → b2)
What about this one?
Tokenized Data Table
A  | B        | C
a1 | b1       | c1
a2 | b2       | c2
a3 | b1 → b2  | c1
a4 | b3       | c1
a5 | b2       | c2
Do we change c1 to c2? Or c2 to c1?
3NF
Normalize to 3NF
Move the transitive dependency (B → C) to a new Parent table.
[Original: key A; non-key B, C; transitive relationship B → C]
[Result: Child: key A → B, with B a Foreign Key; Parent: key B → C]
3NF
Normalize to 3NF: Projection / distinct rows
Original:
A  | B  | C
a1 | b1 | c1
a2 | b2 | c2
a3 | b1 | c1
a4 | b3 | c1
a5 | b2 | c2
Child (A, B):
A  | B
a1 | b1
a2 | b2
a3 | b1
a4 | b3
a5 | b2
Parent (B, C):
B  | C
b1 | c1
b2 | c2
b3 | c1
Assignment: Verify that the anomalies were removed!
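A sketch of that verification in SQL (STUDENT_DORM(A, B) and DORM_FEE(B, C) are hypothetical names for the two projections, following the A = SID, B = Building, C = Fee reading used earlier): each modification that was anomalous before now touches exactly one table and, for the update, exactly one row.

  -- Insertion anomaly removed: b4 -> c3 can be recorded without a student (no A value needed).
  INSERT INTO DORM_FEE (B, C) VALUES ('b4', 'c3');

  -- Deletion anomaly removed: dropping student a4 no longer loses the fact b3 -> c1.
  DELETE FROM STUDENT_DORM WHERE A = 'a4';

  -- Update anomaly removed: the fee for b1 is changed in exactly one place.
  UPDATE DORM_FEE SET C = 'c5' WHERE B = 'b1';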
BCNF
Conceptual Diagram: [key (A, B); non-key C, D; "D" is a determinant (D → B) but not a candidate key]
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1
a2 | b1 | c2 | d1
a1 | b2 | c2 | d2
a2 | b2 | c1 | d2
a3 | b1 | c1 | d3
We're building the table—does it capture what could happen over time?
BCNF
[Conceptual diagram: key (A, B); non-key C, D; D → B]
Modification Anomalies
Insert: d4 → b3
Deletion: row (a3, b1)
Update: (d1 → b1) to (d1 → b2) OR (a2, b1) → d1 to (a2, b1) → d2
Tokenized Data Table
A    | B  | C    | D
a1   | b1 | c1   | d1
a2   | b1 | c2   | d1
a1   | b2 | c2   | d2
a2   | b2 | c1   | d2
a3   | b1 | c1   | d3
null | b3 | null | d4
Can't have null in the Primary Key
BCNF
[Conceptual diagram: key (A, B); non-key C, D; D → B]
Modification Anomalies
Insert: d4 → b3
Deletion: row (a3, b1)
Update: (d1 → b1) to (d1 → b2) OR (a2, b1) → d1 to (a2, b1) → d2
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1
a2 | b1 | c2 | d1
a1 | b2 | c2 | d2
a2 | b2 | c1 | d2
a3 | b1 | c1 | d3   (deleting this row loses the information d3 → b1)
BCNF
[Conceptual diagram: key (A, B); non-key C, D; D → B]
Modification Anomalies
Insert: d4 → b3
Deletion: row (a3, b1)
Update: (d1 → b1) to (d1 → b2) OR (a2, b1) → d1 to (a2, b1) → d2
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1   (change b1 to b2 here …)
a2 | b1 | c2 | d1   (… and you have to change it here too)
a1 | b2 | c2 | d2
a2 | b2 | c1 | d2
a3 | b1 | c1 | d3
BCNF
[Conceptual diagram: key (A, B); non-key C, D; D → B]
Modification Anomalies
Insert: d4 → b3
Deletion: row (a3, b1)
Update: (d1 → b1) to (d1 → b2) OR (a2, b1) → d1 to (a2, b1) → d2
Hmm....
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1
a2 | b1 | c2 | d1
a1 | b2 | c2 | d2
a2 | b2 | c1 | d2
a3 | b1 | c1 | d3
BCNF
Normalized to BCNF
[Original: key (A, B); non-key C, D; determinant D → B]
[Result: Child: key (A, D) → C, with D a Foreign Key; Parent: key D → B]
BCNF
Normalized to BCNF: Projection / distinct rows
Original:
A  | B  | C  | D
a1 | b1 | c1 | d1
a2 | b1 | c2 | d1
a1 | b2 | c2 | d2
a2 | b2 | c1 | d2
a3 | b1 | c3 | d3
Child (A, C, D):
A  | C  | D
a1 | c1 | d1
a2 | c2 | d1
a1 | c2 | d2
a2 | c1 | d2
a3 | c3 | d3
Parent (D, B):
D  | B
d1 | b1
d2 | b2
d3 | b1
Assignment: Verify that the anomalies were removed!
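A design consequence worth noting: the determinant D, which was not a candidate key of the original relation, becomes the primary key of the new parent table, and the child's key changes from (A, B) to (A, D). A hedged DDL sketch (table names and types are illustrative only):

  CREATE TABLE PARENT_D (
      D VARCHAR(5) PRIMARY KEY,
      B VARCHAR(5) NOT NULL            -- D -> B is now stored in exactly one row per D value
  );

  CREATE TABLE CHILD_ADC (
      A VARCHAR(5) NOT NULL,
      D VARCHAR(5) NOT NULL REFERENCES PARENT_D (D),
      C VARCHAR(5),
      PRIMARY KEY (A, D)               -- the child keeps D as a foreign key in place of B
  );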
BCNF
A  | B  | C  | D
a1 | b1 | c1 | d1
a1 | b2 | c1 | d1
a1 | b3 | c3 | d2
a2 | b4 | c4 | d3
a2 | b1 | c5 | d3
a2 | b2 | c6 | d4
• An advisor can only advise on one project
• A project can have multiple advisors
• A student can be on multiple projects
BCNF
Conceptual Diagram: [key (A, B); non-key C, D; "D" is a determinant (D → A) but not a candidate key]
Tokenized Data Table
A  | B  | C  | D
a1 | b1 | c1 | d1
a1 | b2 | c1 | d1
a1 | b3 | c3 | d2
a2 | b4 | c4 | d3
a2 | b1 | c5 | d3
a2 | b2 | c6 | d4
BCNF
Normalized to BCNF
[Original: key (A, B); non-key C, D; determinant D → A]
[Result: Child: key (B, D) → C, with D a Foreign Key; Parent: key D → A]
BCNF
[Tokenized table with columns A, B, C, D]
Epilog
• BCNF Revisited
• Related Activities
• Reflections
– Data Redundancy vs Data Replication
– Learning about Modification Anomalies
– Past … Future
– A Hypothesis?
Another way to understand BCNF
• I always had heartburn about BCNF…
• So I searched for another approach…my approach
– authors gave a definition involving "candidate keys," but
– I never found an author who demonstrated how to use a candidate key in the normalization step.
1. Find another candidate key (I believe this is why authors shied away from this approach, because you have to "prove" it is a key!)
2. Revise the diagram using the newly found candidate key
3. Ask the question: can this diagram be normalized using known normalization steps?
The process is illustrated on the next two slides…
BCNF—Revisited
NOT INTUITIVE! Definition?... Candidate keys?
[Diagram: the relation with key (A, B), non-key C and D, and determinant D → A, decomposed per the BCNF definition into a child (B, D → C, with D a Foreign Key) and a parent (D → A)]
BCNF—Revise the diagram using a different key!
Need to show that (B, D) is a candidate key!
[Diagram: the same relation redrawn with (B, D) as the key, determining C and A]
BCNF—Revise the diagram using a different key!
[Diagram: the relation redrawn with key (B, D) determining C and A; this diagram now shows a partial key dependency (D → A)]
Assert: (B, D) is a Key (uses Armstrong's Axioms—look them up!)
Here's the proof, for those of you who are interested:
1. Given: A, B → C and D → A
2. B → B and D → D (Reflexive property)
3. D → A (Given)
4. B, D → B, A (2 & 3)
5. B, A → C (Given)
6. B, D → C (4 & 5, Transitive property)
∴ B, D → A, B, C, D (combining 2, 4, and 6), so B, D is a key
∴ the diagram on the top left can be rewritten as the diagram on the top right!
BCNF—Normalize to 2NF
[Diagram: the relation with key (B, D) and partial dependency D → A is normalized to 2NF. Child: key (B, D) → C, with D a Foreign Key; Parent: key D → A]
Normalize to 2NF!
Does this help you better understand the original normalization step?
Related Activities
• SQL exercises (a sketch follows this slide)
– write SQL statements to create new (normalized) tables from the original table
– drop the original table
– the original table can be made into a view from the new tables (Why bother?)
• E-R diagrams
– supplements the impact on normal forms
– Establishes associations among newly created objects
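A sketch of the full exercise in SQL, reusing the hypothetical names from the 2NF sketches above (R is the original table; CHILD and PARENT are its projections, assumed already created): drop the original and recreate it as a view over the new tables, which is one answer to "Why bother?": existing queries against R keep working.

  DROP TABLE R;

  -- The original relation is now a join of the normalized tables.
  CREATE VIEW R AS
  SELECT CHILD.A, CHILD.B, CHILD.C, PARENT.D
  FROM   CHILD
  JOIN   PARENT ON PARENT.B = CHILD.B;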
Reflections: Data Redundancy vs Data Replication
• Data Redundancy: unnecessary data Replication
– Applying normal forms (bottom-up) transforms relationships among attributes into relationships among objects
– For the relational model, data Replication of Foreign Keys (FK) is necessary to retain the original relationship among attributes … through the new relationships among the objects created as a consequence of a normalization process
Reflections: Learning about
Modification Anomalies
• Can the relationship between the
modification anomalies and design
flaws be more clearly examined (and
learned…) by abstracting the
relationship?
• This is an empirical question.
Reflections: Past … Future
• Past—Windows-Based tool (Hari Munikar)
– permitted students to
• construct a tokenized table (in 1NF)
• find a key for the table
• normalize in steps to 3NF
– desired functionality
• construct a context-based table
• automatic conversion to tokenized form
• extend to BCNF
• construct an E-R diagram from the resulting diagram
• Evaluation of the approach: Is it effective?
– An open research project…Anyone interested?
Reflections: A Hypothesis?
• Premises:
– students will benefit from an
examination that focuses on the
underlying principles because of problem
abstraction, solution generalization, and
verification
– Context-based examples can be a source
of confusion