Data Normalization and Denormalization

Normalization
A337
Structure

What is a database?
◦ Tables of information
  - Rows are referred to as records
  - Columns are referred to as fields or attributes
  - The record identifier is referred to as the record key
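The vocabulary above maps naturally onto a few lines of Python. This is a minimal sketch built on assumed conventions:
- each record (row) is a dict whose keys are the fields (columns);
- a table is a list of such records;
- a lookup index is built on the record key.

The CUSTOMERS rows come from the 3NF example later in these slides. The names customers and by_key are illustrative only.

```python
# Each record (row) is a dict: field name (column) -> value.
customers = [
    {"Cust_Code": "WHEEL", "Cust_Name": "Wheelaway Cycle Center"},
    {"Cust_Code": "ETC", "Cust_Name": "Bikes Et Cetera"},
    {"Cust_Code": "IBS", "Cust_Name": "Inter. Bicycle Sales"},
]

# Index the records by their record key (Cust_Code) for direct lookup.
by_key = {record["Cust_Code"]: record for record in customers}

# A record key must identify exactly one record, so no key may repeat.
assert len(by_key) == len(customers), "duplicate record key"

print(by_key["ETC"]["Cust_Name"])  # Bikes Et Cetera
```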

Types
◦ Relational (the most common) and object-oriented
◦ Hierarchical and network (much older types)
A337 - Reed Smith
Database structure

Two approaches to the structure issue:
◦ Conceptual (you start with the question "What information should I have?")
  - Build an ERD from "scratch"
◦ Empirical (you already know what data there will be; you just want to organize it into tables)
  - NORMALIZATION
Database Tables and Normalization

Normalization - Process for evaluating and correcting table structures to minimize data redundancies
◦ Works through a series of stages called normal forms:
  - First normal form (1NF)
  - Second normal form (2NF)
  - Third normal form (3NF)
◦ Higher normal forms exist, but they are rarely necessary
Normalization

Why?
◦ Data structures need to:
  - Minimize redundancy
  - Avoid insertion, update, and deletion anomalies
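An update anomaly is easy to reproduce on any table that repeats the same fact in several rows. The sketch below is hypothetical:
- it keeps one row for each of the three WHEEL orders in the SALES_ORDERS example, reduced to three columns;
- it then renames the customer in a single row only.

The new name "Wheelaway Cycles" is made up for the illustration.

```python
# Hypothetical rows: one line item per WHEEL order, reduced to three columns.
rows = [
    {"SO_Number": "1010", "Cust_Code": "WHEEL", "Cust_Name": "Wheelaway Cycle Center"},
    {"SO_Number": "1012", "Cust_Code": "WHEEL", "Cust_Name": "Wheelaway Cycle Center"},
    {"SO_Number": "1015", "Cust_Code": "WHEEL", "Cust_Name": "Wheelaway Cycle Center"},
]

# The customer changes its name, but the update reaches only order 1015.
rows[2]["Cust_Name"] = "Wheelaway Cycles"  # made-up name for illustration

# One customer code now maps to two names: the table contradicts itself.
names_for_wheel = {r["Cust_Name"] for r in rows if r["Cust_Code"] == "WHEEL"}
print(sorted(names_for_wheel))  # ['Wheelaway Cycle Center', 'Wheelaway Cycles']
```

If the name were stored once, in a table keyed by Cust_Code, the same change would be a single-row update and this inconsistency could not arise.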

How?
◦ Restructure information such that:
  - Only flat (rectangular) files exist (1st normal form) – No Nulls/Repeating Groups
  - All items in each record depend upon the whole primary record key (2nd normal form) – No Partial Dependencies
  - If a field depends upon another field, then the "other" must be a primary key (3rd normal form) – No Transitive Dependencies
Normalize the following table:
SALES_ORDERS
SO_Number  Item_Number  Item_Name          Qty_Ordered  Cust_Code  Cust_Name
1010       2010-0050    Formed Handlebar   2            WHEEL      Wheelaway Cycle Center
           1000-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
1011       1002-1       24 in. Bicycle     5            ETC        Bikes Et Cetera
           1001-1       26 in. Bicycle     10           ETC        Bikes Et Cetera
1012       1003-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
           1001-1       26 in. Bicycle     10           WHEEL      Wheelaway Cycle Center
1013       1001-1       26 in. Bicycle     50           IBS        Inter. Bicycle Sales
1014       1003-1       20 in. Bicycle     25           ETC        Bikes Et Cetera
1015       1003-1       20 in. Bicycle     25           WHEEL      Wheelaway Cycle Center
1016       3961-1041    Tire Tube, 26 in.  5            ETC        Bikes Et Cetera
           3965-1050    Spoke Reflector    50           ETC        Bikes Et Cetera
           1003-1       20 in. Bicycle     5            ETC        Bikes Et Cetera
           1000-1       20 in. Bicycle     4            ETC        Bikes Et Cetera
What is wrong with this solution?
First Normal Form

Eliminate Nulls/Repeating Groups
◦ Eliminate repeating groups by eliminating nulls: replace each blank cell that holds an implied value with the actual value

Select a primary key
◦ May be a composite key (here, SO_Number + Item_Number)
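The fill-in step can be sketched as a "fill down" over the table in display order. The sketch makes three assumptions:
- rows are dicts;
- None marks a blank (implied) SO_Number;
- only the first two orders of the example are used.

fill_down is a hypothetical helper. The final check confirms that the chosen composite key (SO_Number, Item_Number) is unique.

```python
# Unnormalized rows in display order; None marks an implied (blank) SO_Number.
unnormalized = [
    {"SO_Number": "1010", "Item_Number": "2010-0050", "Qty_Ordered": 2},
    {"SO_Number": None, "Item_Number": "1000-1", "Qty_Ordered": 5},
    {"SO_Number": "1011", "Item_Number": "1002-1", "Qty_Ordered": 5},
    {"SO_Number": None, "Item_Number": "1001-1", "Qty_Ordered": 10},
]


def fill_down(rows, column):
    """Replace each blank (None) in column with the last non-blank value above it."""
    filled, last = [], None
    for row in rows:
        value = row[column] if row[column] is not None else last
        if value is None:
            raise ValueError(f"first row has no value for {column}")
        filled.append({**row, column: value})  # copy; the input rows are untouched
        last = value
    return filled


first_nf = fill_down(unnormalized, "SO_Number")

# Primary key check: the composite (SO_Number, Item_Number) must be unique.
keys = [(r["SO_Number"], r["Item_Number"]) for r in first_nf]
assert len(keys) == len(set(keys)), "composite key is not unique"

print([r["SO_Number"] for r in first_nf])  # ['1010', '1010', '1011', '1011']
```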
1NF:
SALES_ORDERS
SO_Number  Item_Number  Item_Name          Qty_Ordered  Cust_Code  Cust_Name
1010       2010-0050    Formed Handlebar   2            WHEEL      Wheelaway Cycle Center
1010       1000-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
1011       1002-1       24 in. Bicycle     5            ETC        Bikes Et Cetera
1011       1001-1       26 in. Bicycle     10           ETC        Bikes Et Cetera
1012       1003-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
1012       1001-1       26 in. Bicycle     10           WHEEL      Wheelaway Cycle Center
1013       1001-1       26 in. Bicycle     50           IBS        Inter. Bicycle Sales
1014       1003-1       20 in. Bicycle     25           ETC        Bikes Et Cetera
1015       1003-1       20 in. Bicycle     25           WHEEL      Wheelaway Cycle Center
1016       3961-1041    Tire Tube, 26 in.  5            ETC        Bikes Et Cetera
1016       3965-1050    Spoke Reflector    50           ETC        Bikes Et Cetera
1016       1003-1       20 in. Bicycle     5            ETC        Bikes Et Cetera
1016       1000-1       20 in. Bicycle     4            ETC        Bikes Et Cetera
What is wrong with this solution?

Partial Dependencies
◦ For example, the sales order number is not relevant in determining the item name; the item name depends only on the item number
◦ Similarly, the customer code and customer name do not depend upon the item number; they depend only upon the sales order number
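Partial dependencies can be checked mechanically against a key. In this sketch:
- determines is a hypothetical helper that tests whether one column functionally determines another in the given rows;
- the rows are a subset of the 1NF table, with Qty_Ordered omitted.

One caution: sample data can only disprove a dependency, never prove it. The real dependencies come from what the data means.

```python
# Subset of the 1NF SALES_ORDERS rows (Qty_Ordered omitted for brevity).
columns = ["SO_Number", "Item_Number", "Item_Name", "Cust_Code", "Cust_Name"]
table = [dict(zip(columns, values)) for values in [
    ("1010", "2010-0050", "Formed Handlebar", "WHEEL", "Wheelaway Cycle Center"),
    ("1010", "1000-1", "20 in. Bicycle", "WHEEL", "Wheelaway Cycle Center"),
    ("1011", "1002-1", "24 in. Bicycle", "ETC", "Bikes Et Cetera"),
    ("1011", "1001-1", "26 in. Bicycle", "ETC", "Bikes Et Cetera"),
    ("1012", "1001-1", "26 in. Bicycle", "WHEEL", "Wheelaway Cycle Center"),
    ("1016", "1000-1", "20 in. Bicycle", "ETC", "Bikes Et Cetera"),
]]


def determines(rows, lhs, rhs):
    """True if rows that agree on column lhs always agree on column rhs."""
    seen = {}
    return all(seen.setdefault(r[lhs], r[rhs]) == r[rhs] for r in rows)


key = ["SO_Number", "Item_Number"]  # composite primary key chosen in 1NF
non_key = [c for c in columns if c not in key]

# Partial dependency: a non-key column determined by only one part of the key.
partial = sorted((part, col) for part in key for col in non_key
                 if determines(table, part, col))
print(partial)
# [('Item_Number', 'Item_Name'), ('SO_Number', 'Cust_Code'), ('SO_Number', 'Cust_Name')]
```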
Second Normal Form

Eliminate Partial Dependencies
◦ Write each key component on a separate line, and then write the original (composite) key on the last line
◦ Each component will become the key in a new table

Identify the Dependent Attributes
◦ Determine which attributes are dependent on which other attributes
Creating 3 Tables
◦ SO_Number (key of SALES_ORDERS)
◦ Item_Number (key of INVENTORY_ITEMS)
◦ SO_Number, Item_Number (composite key of the sales order line-item table)
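The three-table split amounts to projecting the 1NF rows onto each key's columns and discarding duplicate rows. This is a minimal sketch with three assumptions:
- rows are dicts;
- project is a hypothetical helper;
- only three 1NF rows are used, chosen so that both an order and an item repeat.

```python
# Three rows from the 1NF SALES_ORDERS table in these slides.
first_nf = [
    {"SO_Number": "1010", "Item_Number": "2010-0050", "Item_Name": "Formed Handlebar",
     "Qty_Ordered": 2, "Cust_Code": "WHEEL", "Cust_Name": "Wheelaway Cycle Center"},
    {"SO_Number": "1010", "Item_Number": "1000-1", "Item_Name": "20 in. Bicycle",
     "Qty_Ordered": 5, "Cust_Code": "WHEEL", "Cust_Name": "Wheelaway Cycle Center"},
    {"SO_Number": "1016", "Item_Number": "1000-1", "Item_Name": "20 in. Bicycle",
     "Qty_Ordered": 4, "Cust_Code": "ETC", "Cust_Name": "Bikes Et Cetera"},
]


def project(rows, columns):
    """Keep only the given columns, dropping duplicate rows but preserving order."""
    result = []
    for row in rows:
        picked = {c: row[c] for c in columns}
        if picked not in result:
            result.append(picked)
    return result


# One table per key: SO_Number, Item_Number, and the composite key.
sales_orders = project(first_nf, ["SO_Number", "Cust_Code", "Cust_Name"])
inventory_items = project(first_nf, ["Item_Number", "Item_Name"])
line_items = project(first_nf, ["SO_Number", "Item_Number", "Qty_Ordered"])

print(len(sales_orders), len(inventory_items), len(line_items))  # 2 2 3
```

Order 1010 and item 1000-1 each appear twice in the input but only once in their own table, which is exactly the redundancy 2NF removes.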
2NF:
INVENTORY_ITEMS
Item_Number  Item_Name
1000-1       20 in. Bicycle
1001-1       26 in. Bicycle
1002-1       24 in. Bicycle
1003-1       20 in. Bicycle
2010-0050    Formed Handlebar
3961-1041    Tire Tube, 26 in.
3965-1050    Spoke Reflector

SALES_ORDER_INVENTORY (line items)
SO_Number  Item_Number  Qty_Ordered
1010       2010-0050    2
1010       1000-1       5
1011       1002-1       5
1011       1001-1       10
1012       1003-1       5
1012       1001-1       10
1013       1001-1       50
1014       1003-1       25
1015       1003-1       25
1016       3961-1041    5
1016       3965-1050    50
1016       1003-1       5
1016       1000-1       4

SALES_ORDERS
SO_Number  Cust_Code  Cust_Name
1010       WHEEL      Wheelaway Cycle Center
1011       ETC        Bikes Et Cetera
1012       WHEEL      Wheelaway Cycle Center
1013       IBS        Inter. Bicycle Sales
1014       ETC        Bikes Et Cetera
1015       WHEEL      Wheelaway Cycle Center
1016       ETC        Bikes Et Cetera
What is wrong with this solution?

Transitive Dependencies
◦ Notice that the third column of the SALES_ORDERS table holds the customer name, which depends upon the customer code
◦ But the customer code is not the primary key
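The transitive chain SO_Number → Cust_Code → Cust_Name can be confirmed on the 2NF SALES_ORDERS rows, along with the redundancy it causes. The sketch assumes dict rows copied from the 2NF table, and determines is a hypothetical helper. As with any check on sample data, it can only refute a dependency, not prove one.

```python
from collections import Counter

# 2NF SALES_ORDERS rows, copied from the table in these slides.
columns = ["SO_Number", "Cust_Code", "Cust_Name"]
sales_orders = [dict(zip(columns, values)) for values in [
    ("1010", "WHEEL", "Wheelaway Cycle Center"),
    ("1011", "ETC", "Bikes Et Cetera"),
    ("1012", "WHEEL", "Wheelaway Cycle Center"),
    ("1013", "IBS", "Inter. Bicycle Sales"),
    ("1014", "ETC", "Bikes Et Cetera"),
    ("1015", "WHEEL", "Wheelaway Cycle Center"),
    ("1016", "ETC", "Bikes Et Cetera"),
]]


def determines(rows, lhs, rhs):
    """True if rows that agree on column lhs always agree on column rhs."""
    seen = {}
    return all(seen.setdefault(r[lhs], r[rhs]) == r[rhs] for r in rows)


# The chain SO_Number -> Cust_Code -> Cust_Name holds in the data...
print(determines(sales_orders, "SO_Number", "Cust_Code"))  # True
print(determines(sales_orders, "Cust_Code", "Cust_Name"))  # True

# ...so each customer's name is stored once per order that customer placed.
name_counts = Counter(r["Cust_Name"] for r in sales_orders)
print(name_counts["Wheelaway Cycle Center"])  # 3
```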
Third Normal Form
1. For every transitive dependency, write its determinant as a PK for a new table
2. Identify the attributes dependent on each determinant identified in Step 1, and identify the dependency
3. Remove the dependent attributes in transitive relationship(s) from each table that has such a transitive relationship
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
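Applied to the 2NF SALES_ORDERS table, the three steps above might look like the sketch below. It rests on three assumptions:
- rows are dicts;
- the transitive dependency Cust_Code → Cust_Name has already been identified by hand;
- the new CUSTOMERS table is a plain dict keyed by its PK, built so that it rejects any customer code mapped to two different names.

```python
# 2NF SALES_ORDERS rows, copied from the table in these slides.
columns = ["SO_Number", "Cust_Code", "Cust_Name"]
sales_orders_2nf = [dict(zip(columns, values)) for values in [
    ("1010", "WHEEL", "Wheelaway Cycle Center"),
    ("1011", "ETC", "Bikes Et Cetera"),
    ("1012", "WHEEL", "Wheelaway Cycle Center"),
    ("1013", "IBS", "Inter. Bicycle Sales"),
    ("1014", "ETC", "Bikes Et Cetera"),
    ("1015", "WHEEL", "Wheelaway Cycle Center"),
    ("1016", "ETC", "Bikes Et Cetera"),
]]

# The transitive dependency identified by hand: Cust_Code -> Cust_Name.
determinant, dependents = "Cust_Code", ["Cust_Name"]

# Steps 1-2: the determinant becomes the PK of a new table holding its dependents.
customers = {}
for row in sales_orders_2nf:
    values = {c: row[c] for c in dependents}
    # setdefault keeps the first values seen; a mismatch means the dependency fails.
    if customers.setdefault(row[determinant], values) != values:
        raise ValueError(f"{row[determinant]} maps to conflicting values")

# Step 3: remove the dependent attributes from the original table.
sales_orders_3nf = [{k: v for k, v in row.items() if k not in dependents}
                    for row in sales_orders_2nf]

print(sorted(customers))    # ['ETC', 'IBS', 'WHEEL']
print(sales_orders_3nf[0])  # {'SO_Number': '1010', 'Cust_Code': 'WHEEL'}
```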
3NF:
INVENTORY_ITEMS
Item_Number  Item_Name
1000-1       20 in. Bicycle
1001-1       26 in. Bicycle
1002-1       24 in. Bicycle
1003-1       20 in. Bicycle
2010-0050    Formed Handlebar
3961-1041    Tire Tube, 26 in.
3965-1050    Spoke Reflector

SALES_ORDER_INVENTORY (line items)
SO_Number  Item_Number  Qty_Ordered
1010       2010-0050    2
1010       1000-1       5
1011       1002-1       5
1011       1001-1       10
1012       1003-1       5
1012       1001-1       10
1013       1001-1       50
1014       1003-1       25
1015       1003-1       25
1016       3961-1041    5
1016       3965-1050    50
1016       1003-1       5
1016       1000-1       4

SALES_ORDERS
SO_Number  Cust_Code
1010       WHEEL
1011       ETC
1012       WHEEL
1013       IBS
1014       ETC
1015       WHEEL
1016       ETC

CUSTOMERS
Cust_Code  Cust_Name
ETC        Bikes Et Cetera
IBS        Inter. Bicycle Sales
WHEEL      Wheelaway Cycle Center
Denormalization

Creation of normalized relations is an important database design goal

Processing requirements should also be a goal

If tables are decomposed to conform to normalization requirements:
◦ The number of database tables expands
Denormalization (continued)

Joining a larger number of tables takes additional disk input/output (I/O) operations and processing logic
◦ This reduces system speed

Conflicts among design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization
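To make the extra work concrete, the sketch below rebuilds the flat sales-order view from the four 3NF tables. It is an in-memory illustration only: Python dict lookups stand in for DBMS joins, no disk I/O is modeled, and only three line items from the example are used. Storing flat_view permanently instead of recomputing it would be one form of denormalization. That trades repeated Cust_Name and Item_Name values for fewer joins at read time.

```python
# Subsets of the 3NF tables from these slides, each keyed by its PK.
customers = {"WHEEL": "Wheelaway Cycle Center", "ETC": "Bikes Et Cetera"}
inventory_items = {"1000-1": "20 in. Bicycle", "2010-0050": "Formed Handlebar"}
sales_orders = {"1010": "WHEEL", "1016": "ETC"}  # SO_Number -> Cust_Code
line_items = [  # (SO_Number, Item_Number, Qty_Ordered)
    ("1010", "2010-0050", 2),
    ("1010", "1000-1", 5),
    ("1016", "1000-1", 4),
]

flat_view = []
for so_number, item_number, qty in line_items:
    cust_code = sales_orders[so_number]  # join 1: line item -> sales order
    flat_view.append({
        "SO_Number": so_number,
        "Item_Number": item_number,
        "Item_Name": inventory_items[item_number],  # join 2: line item -> item
        "Qty_Ordered": qty,
        "Cust_Code": cust_code,
        "Cust_Name": customers[cust_code],  # join 3: sales order -> customer
    })

for row in flat_view:
    print(row["SO_Number"], row["Item_Name"], row["Qty_Ordered"], row["Cust_Name"])
```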
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel