Normalization for Logical Design

IT 20303
• The Relational DBMS
• Section 05
Relational Database Theory
• Normalization for Logical Database
Design
Relational Database Theory
• Normalization
– Process of analyzing a grouping of
data items
• Based on inherent characteristics
• Often applied to existing files or
databases
Relational Database Theory
• Normalization
– Principles
• Data items belong together in a
logical group
• Group of items can be identified
by own unique identifier
Relational Database Theory
• Normalization
– Data in the group describes one,
and only one, thing
– A Bottom-Up approach
Relational Database Theory
• Why Normalize
– Avoid update anomalies
• Nasty side effects
– Minimize storage of redundant data
– Support simpler logic for
manipulating data
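A minimal sketch of an update anomaly, using Python's standard sqlite3 module. The table name ward_patient and the replacement nurse 'A Smith' are illustrative assumptions; the ward and patient values follow the hospital sample data in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ward_patient ("
    " ward_name TEXT, senior_nurse TEXT, patient_no INTEGER PRIMARY KEY)"
)
# The senior nurse is stored once per patient, so it is repeated three times.
conn.executemany(
    "INSERT INTO ward_patient VALUES (?, ?, ?)",
    [("Liston", "J Bryan", 45812),
     ("Liston", "J Bryan", 71384),
     ("Liston", "J Bryan", 69355)],
)

# A careless update that touches only one of the three redundant copies.
# 'A Smith' is a made-up name used only for this illustration.
conn.execute(
    "UPDATE ward_patient SET senior_nurse = 'A Smith' WHERE patient_no = 45812"
)

nurses = [row[0] for row in conn.execute(
    "SELECT DISTINCT senior_nurse FROM ward_patient"
    " WHERE ward_name = 'Liston' ORDER BY senior_nurse"
)]
print(nurses)  # the ward now appears to have two different senior nurses
```

If the senior nurse were stored once per ward, the same update could not leave the data inconsistent.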
Relational Database Theory
• Why Not Normalize
– Data is never (or very rarely) updated
– Data warehouse system is seldom
normalized
Relational Database Theory
• Sample Data Not Normalized
WARD NAME  WARD TYPE   NO. OF BEDS  SENIOR NURSE  PATIENT NO  PATIENT NAME  DATE OF BIRTH
Liston     Orthopedic  6            J Bryan       45812       D Carter      21/02/65
                                                  71384       R Willis      08/10/46
                                                  69355       G Barnes      17/06/41
Godlee     General     10           V Fox         52217       M Brown       21/02/35
                                                  10823       R Willis      12/03/54
Relational Database Theory
• How to Normalize Data using Functional
Dependencies
– Definition of Functional Dependency
• Given a relation R, attribute Y of R is
functionally dependent on attribute X of
R, if and only if each X value in R has
associated with it precisely one Y-value in
R (at any one time)
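The definition above can be checked mechanically against sample rows. The following is a sketch only: the function name holds and the short attribute names are assumptions made for illustration, and the rows are a subset of the hospital ward data.

```python
def holds(rows, x, y):
    """Return True if every x value in rows is paired with exactly one y value."""
    seen = {}
    for row in rows:
        # setdefault stores the first y seen for this x, then returns that stored y
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

rows = [
    {"ward": "Liston", "nurse": "J Bryan", "patient_no": 45812, "patient": "D Carter"},
    {"ward": "Liston", "nurse": "J Bryan", "patient_no": 71384, "patient": "R Willis"},
    {"ward": "Godlee", "nurse": "V Fox", "patient_no": 10823, "patient": "R Willis"},
]

print(holds(rows, "ward", "nurse"))       # True
print(holds(rows, "patient_no", "ward"))  # True
print(holds(rows, "patient", "ward"))     # False: two patients are named R Willis
```

A check like this can only refute a dependency. Whether a dependency truly holds is a statement about the meaning of the data, not about one sample of it.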
Relational Database Theory
• Y of R is dependent on X of R
• X functionally determines Y (written X --> Y)
WARD NAME  WARD TYPE   NO. OF BEDS  SENIOR NURSE  PATIENT NO  PATIENT NAME  DATE OF BIRTH
Liston     Orthopedic  6            J Bryan       45812       D Carter      21/02/65
Liston     Orthopedic  6            J Bryan       71384       R Willis      08/10/46
Liston     Orthopedic  6            J Bryan       69355       G Barnes      17/06/41
Godlee     General     10           V Fox         52217       M Brown       21/02/35
Godlee     General     10           V Fox         10823       R Willis      12/03/54
Relational Database Theory
• Functional Dependency Diagram of
Hospital Ward Example
Patient No --> Patient Name
Patient No --> Date of Birth
Patient No --> Ward Name
Ward Name --> Ward Type
Ward Name --> No of Beds
Ward Name --> Senior Nurse
Relational Database Theory
• Table structure based on FD Diagram
WARD
WARD NAME  WARD TYPE   NO OF BEDS  SENIOR NURSE
Liston     Orthopedic  6           J Bryan
Godlee     General     10          V Fox

PATIENT
PATIENT NO  PATIENT NAME  DATE OF BIRTH  WARD NAME
45812       D Carter      21/2/65        Liston
71384       R Willis      8/10/46        Liston
52217       M Brown       21/2/35        Godlee
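The two-table structure can be sketched in SQL through Python's standard sqlite3 module. The lowercase column names are assumptions made for illustration; the rows come from the hospital sample data. The join at the end shows that the original flat view is still recoverable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ward (
    ward_name    TEXT PRIMARY KEY,
    ward_type    TEXT,
    no_of_beds   INTEGER,
    senior_nurse TEXT
);
-- SQLite records the REFERENCES clause but enforces it only when
-- PRAGMA foreign_keys = ON is set.
CREATE TABLE patient (
    patient_no    INTEGER PRIMARY KEY,
    patient_name  TEXT,
    date_of_birth TEXT,
    ward_name     TEXT REFERENCES ward(ward_name)
);
INSERT INTO ward VALUES ('Liston', 'Orthopedic', 6, 'J Bryan'),
                        ('Godlee', 'General', 10, 'V Fox');
INSERT INTO patient VALUES (45812, 'D Carter', '21/02/65', 'Liston'),
                           (71384, 'R Willis', '08/10/46', 'Liston'),
                           (10823, 'R Willis', '12/03/54', 'Godlee');
""")

# Each ward fact is stored once; the join repeats it per patient on demand.
flat = conn.execute("""
    SELECT w.ward_name, w.senior_nurse, p.patient_no, p.patient_name
    FROM patient AS p
    JOIN ward AS w ON w.ward_name = p.ward_name
    ORDER BY p.patient_no
""").fetchall()
for row in flat:
    print(row)
```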
Relational Database Theory
• Normalization using Codd’s Rules
– Origin
• Early enthusiasts wanted to use relational
theory
• Sought rules for structuring data in
relational model
Relational Database Theory
• Normalization using Codd’s Rules
– Codd and contemporaries developed rules for
“Normal Forms”
• 1NF, 2NF, 3NF: the levels normally applied in
database design
• Boyce/Codd NF, sometimes called 3.5NF
• 4NF
• 5NF
Relational Database Theory
• Customer-Order-Line Item Example
– Assume an existing order-entry program
and data file:
ORD-NO  CUST-NO  CUSTNAME  ADDR  PROD-NO  PRODNAME  UNIT-PRC  QTY  TOT-ITEM-PRC  …  PROD-NO  PRODNAME  UNIT-PRC  QTY  TOT-ITEM-PRC
Relational Database Theory
• 1NF – Break out repeating groups
Before:
ORDER
ORD-NO  CUST-NO  CUSTNAME  ADDR  PROD-NO  PRODNAME  UNIT-PRC  QTY  TOT-ITEM-PRC  …  PROD-NO  PRODNAME  UNIT-PRC  QTY  TOT-ITEM-PRC
After:
ORDER
ORD-NO  CUST-NO  CUSTNAME  ADDR
LINEITEM
ORD-NO  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC
Relational Database Theory
• 2NF – Break out attributes dependent on part of
the primary key
Before:
LINEITEM
ORD-NO  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC
After:
ORDER
ORD-NO  CUST-NO  CUSTNAME  ADDR
LINEITEM
ORD-NO  PROD-NO  QTY  TOT-ITEM-PRC
PRODUCT
PROD-NO  PROD-NAME  UNIT-PRC
Relational Database Theory
• 3NF – Break out attributes wholly dependent on
another key
Before:
ORDER
ORD-NO  CUST-NO  CUSTNAME  ADDR
After:
CUSTOMER
CUST-NO  CUSTNAME  ADDR
ORDER
ORD-NO  CUST-NO
LINEITEM
ORD-NO  PROD-NO  QTY  TOT-ITEM-PRC
PRODUCT
PROD-NO  PROD-NAME  UNIT-PRC
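The resulting 3NF structure might be declared as follows, using Python's standard sqlite3 module. This is a sketch under assumptions: the table is named orders because ORDER is a reserved word in SQL, the column types are guesses, and every sample row is hypothetical rather than taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_no  INTEGER PRIMARY KEY,
    custname TEXT,
    addr     TEXT
);
CREATE TABLE orders (              -- ORDER is a reserved word in SQL
    ord_no  INTEGER PRIMARY KEY,
    cust_no INTEGER REFERENCES customer(cust_no)
);
CREATE TABLE product (
    prod_no   TEXT PRIMARY KEY,
    prod_name TEXT,
    unit_prc  REAL
);
CREATE TABLE lineitem (
    ord_no       INTEGER REFERENCES orders(ord_no),
    prod_no      TEXT REFERENCES product(prod_no),
    qty          INTEGER,
    tot_item_prc REAL,
    PRIMARY KEY (ord_no, prod_no)
);
-- Hypothetical sample rows, invented for this illustration.
INSERT INTO customer VALUES (7, 'Acme', '1 Main St');
INSERT INTO orders   VALUES (1001, 7);
INSERT INTO product  VALUES ('P1', 'Bolt', 0.5), ('P2', 'Nut', 0.25);
INSERT INTO lineitem VALUES (1001, 'P1', 10, 5.0), (1001, 'P2', 4, 1.0);
""")

# Rebuild the flat order-entry view from the four tables.
flat = conn.execute("""
    SELECT o.ord_no, c.custname, p.prod_name, p.unit_prc, l.qty, l.tot_item_prc
    FROM lineitem AS l
    JOIN orders   AS o ON o.ord_no  = l.ord_no
    JOIN customer AS c ON c.cust_no = o.cust_no
    JOIN product  AS p ON p.prod_no = l.prod_no
    ORDER BY l.prod_no
""").fetchall()
for row in flat:
    print(row)
```

The composite primary key on lineitem mirrors the ORD-NO plus PROD-NO key in the slides.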
Relational Database Theory
• Rules for 1NF, 2NF, & 3NF
– 1NF
• Break out repeating groups into a separate entity
– 2NF
• Break out attributes that are dependent on part
of the primary key into a separate entity
• Called Partial Dependency
– 3NF
• Break out attributes that are wholly dependent
on another key (not PK) into a separate entity
• Called Transitive Dependency
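The 2NF and 3NF rules can be illustrated with a small classifier. This is a deliberately simplified sketch: the function name classify is hypothetical, it assumes a single candidate key and single-attribute dependents, and it treats any dependency on non-key attributes as transitive.

```python
def classify(primary_key, fds):
    """Label each dependency as 'full', 'partial' (breaks 2NF) or 'transitive' (breaks 3NF)."""
    key = frozenset(primary_key)
    labels = {}
    for determinant, dependent in fds:
        det = frozenset(determinant)
        if det == key:
            labels[dependent] = "full"
        elif det < key:  # proper subset of the primary key
            labels[dependent] = "partial"
        else:            # determined by attributes outside the key
            labels[dependent] = "transitive"
    return labels

# LINEITEM after 1NF: composite key ORD-NO + PROD-NO
lineitem = classify(
    {"ORD-NO", "PROD-NO"},
    [({"ORD-NO", "PROD-NO"}, "QTY"),
     ({"PROD-NO"}, "PROD-NAME"),
     ({"PROD-NO"}, "UNIT-PRC")],
)
print(lineitem)

# ORDER after 2NF: key ORD-NO, customer details hang off CUST-NO
order = classify(
    {"ORD-NO"},
    [({"ORD-NO"}, "CUST-NO"),
     ({"CUST-NO"}, "CUSTNAME"),
     ({"CUST-NO"}, "ADDR")],
)
print(order)
```

Attributes labelled partial move to a table keyed by that part of the key (PRODUCT); attributes labelled transitive move to a table keyed by their determinant (CUSTOMER).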
Relational Database Theory
• Normalization
– A relation R is in 3rd Normal Form (3NF) if
and only if the non-key attributes of R (if
any) are:
• Mutually independent, and
• Fully dependent on the primary key of R
Relational Database Theory
• Normalization Cont’d
– A relation is in 3NF if all the non-key attributes are
functionally dependent
• On the Key
• On the Whole Key, and
• On Nothing but the Key
–(So Help Me Codd)
Relational Database Theory
• Reconcile differences between the Data Model and
Normalized Data Structures
– Data model and normalized data structures must be
reconciled
– Discard data items from old files that are no longer
needed
• Calculation fields
• Redundant fields
– Resolve discrepancies in data item names
– Ensure that new fields are really necessary
• Use standard naming conventions
Relational Database Theory
• Example 01:
PART-NO  SUPP-1  SUPP-2  SUPP-3  SUPP-4
WDGT01   XYZZY   FOOBAR  NULL    NULL
– What happens when a part has more than four
suppliers?
– What happens when a supplier is dropped?
– How do you query the parts with two or more
suppliers?
– Normalized Table:
PART-NO  SUPP
WDGT01   XYZZY
WDGT01   FOOBAR
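With one row per part and supplier, the questions above reduce to ordinary SQL. The sketch below uses Python's standard sqlite3 module; the table name part_supplier and the extra part GADG02 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part_supplier (
    part_no TEXT,
    supp    TEXT,
    PRIMARY KEY (part_no, supp)
);
-- GADG02 is a made-up part with a single supplier, added for contrast.
INSERT INTO part_supplier VALUES ('WDGT01', 'XYZZY'),
                                 ('WDGT01', 'FOOBAR'),
                                 ('GADG02', 'XYZZY');
""")

# Parts with two or more suppliers: a plain GROUP BY with HAVING.
multi = conn.execute("""
    SELECT part_no, COUNT(*) AS n_suppliers
    FROM part_supplier
    GROUP BY part_no
    HAVING COUNT(*) >= 2
""").fetchall()
print(multi)  # [('WDGT01', 2)]

# Dropping a supplier is a single DELETE; no columns need shuffling.
conn.execute("DELETE FROM part_supplier WHERE supp = 'FOOBAR'")
remaining = conn.execute("SELECT COUNT(*) FROM part_supplier").fetchone()[0]
print(remaining)  # 2
```

A fifth supplier for a part is simply one more row, so the four-supplier limit of the original layout disappears.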
Relational Database Theory
• Example 02: Normalize this table
PART-NO  SUPP    PART_DESC    SUPP-ADDRESS
WDGT01   XYZZY   Blue Widget  123 Bluejay Way
WDGT01   FOOBAR  Blue Widget  544 Old Orchard
– Normalized tables:
PART-NO  SUPP
WDGT01   XYZZY
WDGT01   FOOBAR

SUPP    SUPP-ADDRESS
XYZZY   123 Bluejay Way
FOOBAR  544 Old Orchard

PART-NO  PART_DESC
WDGT01   Blue Widget
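One way to sanity-check a decomposition like this is to confirm that joining the pieces reproduces the original rows. The sketch below does so with plain Python sets; the variable names are assumptions chosen for illustration.

```python
# Original unnormalized rows: (PART-NO, SUPP, PART_DESC, SUPP-ADDRESS)
original = {
    ("WDGT01", "XYZZY", "Blue Widget", "123 Bluejay Way"),
    ("WDGT01", "FOOBAR", "Blue Widget", "544 Old Orchard"),
}

# Project the original rows onto the three normalized tables.
part_supp = {(p, s) for p, s, _, _ in original}
supplier = {(s, addr) for _, s, _, addr in original}
part = {(p, desc) for p, _, desc, _ in original}

# Natural join of the three projections on SUPP and PART-NO.
rejoined = {
    (p, s, desc, addr)
    for p, s in part_supp
    for s2, addr in supplier if s2 == s
    for p2, desc in part if p2 == p
}

print(len(part))             # 1: the description is now stored once
print(rejoined == original)  # True: no rows lost or invented
```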
End Section
Relational Database Theory
• Multiple ways to Normalize Data
– Normalization can be accomplished in different
ways
• Well-formed E-R model is normalized
• Functional dependencies
• Codd’s Rules for 1NF, 2NF, & 3NF
– Discrepancies indicate something is missing or
changed
– One approach validates or checks another
approach
Relational Database Theory
• Impact of Normalization
– Improve the integrity of data
• Purpose is to eliminate update anomalies
– Minimize storage of redundant data
– Reduce the complexity of programming logic
• Emphasis now is on maintainability, simplicity of program
• Normalized data can minimize complexity of code that
manipulates the data
– Enhance the stability, “goodness” of database design
• Normalized data tends to be easier to understand
• Normalized data can be used by many different
applications more easily
Relational Database Theory
• Impact of Normalization on Performance
– Concern that a large number of tables, and
table joins, will result in poor performance
• Join can be a very expensive operation
• Test to determine frequency of joins,
number of tables joined
– After database is created and
available
Relational Database Theory
• Impact of Normalization on Performance
Cont’d
– Requirements for application performance,
response time dictate corrective actions
– Performance addressed in section on
physical database design
• There are alternatives to de-normalizing
data to improve performance
Relational Database Theory
• Recommendations for Data that is Updated
– First Normalize
– Don’t be dismayed by too many tables
• Normalization increases number of tables
but improves logic
– Normalization is a helpful logical database
design technique…for any DBMS
Relational Database Theory
• Objective of the design process is a “Good”
design
– The logical database design process
• Is well understood
• Uses complementary techniques
• Can be automated with CASE tools
Relational Database Theory
• Objective of the design process is a “Good” design
cont’d
– A “Good” database design
• Contains all the important entities and data items
• Has stable primary keys
• Identifies clearly all relationships
• Has table structures in 3NF
• Is understood by designers and users
• Accurately models the real world, as described
in the requirements
Relational Database Theory
• Questions?