normalforms

Normalization
• Also called “loss-less decomposition”
• Process of optimizing table structures to
eliminate redundancy and avoid anomalies
and problems with extensibility.
• Supports the golden rule: Each fact should
be stored in the database only once.
• Does not provide the solution to all design
problems but provides a solid foundation.
Normal Forms
• 1st Normal Form
• 2nd Normal Form
• 3rd Normal Form
• BCNF
• 4th Normal Form
• 5th Normal Form
• Domain-Key Normal Form
1st Normal Form
First Normal Form is violated if:
• The relation has no identifiable primary key.
• Any attempt has been made to store a multivalued fact in a tuple.
1st NF - Example
Evaluate the design solutions on the next four slides for:
• Query-ability
• Join-ability
• Constrain-ability
• Extensibility (of Language Domain)
• Extensibility (of Schema)
1NF Example – Schema 1 (correct)
Programs Table
EMPID  LANGUAGE
23     COBOL
23     JAVA
23     SQL
31     SQL
32     JAVA
32     SQL
32     VB
32     COBOL
36     VB
36     SQL
36     JAVA
37     COBOL
37     SQL

Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY
23     Jones     Mark    M    ITR   555-1087  45000
25     Smith     Sara    F    FINC  555-2222  55000
26     Billings  David   M    ACTG  555-4356  42000
31     Dance     Ivanna  F    ACTG  444-4887  60000
32     Jones     Mary    F    ITR   555-8745  70000
35     Barker    Bob     M    ACTG  555-6565  44000
36     Woods     Robin   M    ITR   555-9812  90000
37     Jones     Mary    F    FINC  555-1234  56000

Languages Table
NAME   FULLNAME
COBOL  COmmon Business Oriented Language
JAVA   JAVA
SQL    Structured Query Language
VB     Visual Basic
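The evaluation criteria can be tried against Schema 1 with a small sketch using Python's built-in sqlite3 module. The lower-case table and column names, the trimmed Employees columns, and the 'FORTRAN' value are assumptions made for illustration only; they are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE employees (empid INTEGER PRIMARY KEY, lname TEXT, fname TEXT);
CREATE TABLE languages (name TEXT PRIMARY KEY, fullname TEXT);
CREATE TABLE programs (
    empid    INTEGER NOT NULL REFERENCES employees(empid),
    language TEXT    NOT NULL REFERENCES languages(name),
    PRIMARY KEY (empid, language)
);
""")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (23, "Jones", "Mark"), (25, "Smith", "Sara"), (26, "Billings", "David"),
    (31, "Dance", "Ivanna"), (32, "Jones", "Mary"), (35, "Barker", "Bob"),
    (36, "Woods", "Robin"), (37, "Jones", "Mary"),
])
conn.executemany("INSERT INTO languages VALUES (?, ?)", [
    ("COBOL", "COmmon Business Oriented Language"), ("JAVA", "JAVA"),
    ("SQL", "Structured Query Language"), ("VB", "Visual Basic"),
])
conn.executemany("INSERT INTO programs VALUES (?, ?)", [
    (23, "COBOL"), (23, "JAVA"), (23, "SQL"), (31, "SQL"),
    (32, "JAVA"), (32, "SQL"), (32, "VB"), (32, "COBOL"),
    (36, "VB"), (36, "SQL"), (36, "JAVA"), (37, "COBOL"), (37, "SQL"),
])

# Query-ability: one simple predicate finds every SQL programmer.
sql_programmers = [empid for (empid,) in conn.execute(
    "SELECT empid FROM programs WHERE language = 'SQL' ORDER BY empid")]
print(sql_programmers)  # [23, 31, 32, 36, 37]

# Join-ability: the language code joins directly to the Languages table.
fullnames_23 = [name for (name,) in conn.execute(
    "SELECT l.fullname FROM programs p JOIN languages l ON l.name = p.language "
    "WHERE p.empid = 23 ORDER BY l.name")]

# Constrain-ability: a language missing from Languages is rejected.
# 'FORTRAN' is a hypothetical value chosen only to trigger the constraint.
try:
    conn.execute("INSERT INTO programs VALUES (25, 'FORTRAN')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Under this layout, extending the language domain is an INSERT into the Languages table rather than a change to any table definition.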
1NF Example – Schema 2 (incorrect)
Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY  LANGUAGES
23     Jones     Mark    M    ITR   555-1087  45000   COBOL, JAVA, SQL
25     Smith     Sara    F    FINC  555-2222  55000
26     Billings  David   M    ACTG  555-4356  42000
31     Dance     Ivanna  F    ACTG  444-4887  60000   SQL
32     Jones     Mary    F    ITR   555-8745  70000   JAVA, SQL, VB, COBOL
35     Barker    Bob     M    ACTG  555-6565  44000
36     Woods     Robin   M    ITR   555-9812  90000   VB, SQL, JAVA
37     Jones     Mary    F    FINC  555-1234  56000   COBOL, SQL

Languages Table
NAME   FULLNAME
COBOL  COmmon Business Oriented Language
JAVA   JAVA
SQL    Structured Query Language
VB     Visual Basic
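A minimal sketch of why the packed LANGUAGES column hurts query-ability: every query has to parse strings before it can compare values. The dictionary below copies the EMPID and LANGUAGES columns of Schema 2; the function and variable names are hypothetical.

```python
# EMPID -> packed LANGUAGES value, copied from the Schema 2 Employees table.
employees_schema2 = {
    23: "COBOL, JAVA, SQL",
    25: "",
    26: "",
    31: "SQL",
    32: "JAVA, SQL, VB, COBOL",
    35: "",
    36: "VB, SQL, JAVA",
    37: "COBOL, SQL",
}


def unpack_languages(packed):
    """Split one comma-separated LANGUAGES value into clean language codes."""
    return [code.strip() for code in packed.split(",") if code.strip()]


# Even a simple question needs string parsing for every row.
sql_programmers = [empid for empid, packed in employees_schema2.items()
                   if "SQL" in unpack_languages(packed)]
print(sql_programmers)  # [23, 31, 32, 36, 37]

# Unpacking every row yields the (EMPID, LANGUAGE) pairs of the Programs table.
programs = [(empid, code)
            for empid, packed in employees_schema2.items()
            for code in unpack_languages(packed)]
print(len(programs))  # 13
```

The unpacked pairs are the rows that Schema 1 stores directly, where each pair can also be checked against the Languages table.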
1NF Example – Schema 3 (incorrect)
Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY  LANG1  LANG2  LANG3  LANG4
23     Jones     Mark    M    ITR   555-1087  45000   COBOL  JAVA   SQL
25     Smith     Sara    F    FINC  555-2222  55000
26     Billings  David   M    ACTG  555-4356  42000
31     Dance     Ivanna  F    ACTG  444-4887  60000   SQL
32     Jones     Mary    F    ITR   555-8745  70000   JAVA   SQL    VB     COBOL
35     Barker    Bob     M    ACTG  555-6565  44000
36     Woods     Robin   M    ITR   555-9812  90000   VB     SQL    JAVA
37     Jones     Mary    F    FINC  555-1234  56000   COBOL  SQL

Languages Table
NAME   FULLNAME
COBOL  COmmon Business Oriented Language
JAVA   JAVA
SQL    Structured Query Language
VB     Visual Basic
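With repeating LANG1..LANG4 columns, a query has to name every column, and a fifth language would need a schema change. The sketch below uses Python's sqlite3 module with hypothetical lower-case names and only the EMPID and LANG columns of Schema 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employees (
    empid INTEGER PRIMARY KEY,
    lang1 TEXT, lang2 TEXT, lang3 TEXT, lang4 TEXT
)""")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", [
    (23, "COBOL", "JAVA", "SQL", None),
    (25, None, None, None, None),
    (26, None, None, None, None),
    (31, "SQL", None, None, None),
    (32, "JAVA", "SQL", "VB", "COBOL"),
    (35, None, None, None, None),
    (36, "VB", "SQL", "JAVA", None),
    (37, "COBOL", "SQL", None, None),
])

# The predicate must be repeated once per LANGn column.
sql_programmers = [empid for (empid,) in conn.execute("""
    SELECT empid FROM employees
    WHERE lang1 = 'SQL' OR lang2 = 'SQL' OR lang3 = 'SQL' OR lang4 = 'SQL'
    ORDER BY empid""")]
print(sql_programmers)  # [23, 31, 32, 36, 37]

# Unpivoting the four columns recovers the (EMPID, LANGUAGE) rows of Schema 1.
programs = conn.execute("""
    SELECT empid, lang1 FROM employees WHERE lang1 IS NOT NULL
    UNION ALL SELECT empid, lang2 FROM employees WHERE lang2 IS NOT NULL
    UNION ALL SELECT empid, lang3 FROM employees WHERE lang3 IS NOT NULL
    UNION ALL SELECT empid, lang4 FROM employees WHERE lang4 IS NOT NULL
""").fetchall()
print(len(programs))  # 13
```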
1NF Example – Schema 4 (incorrect)
Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY  COBOL  JAVA  SQL  VB
23     Jones     Mark    M    ITR   555-1087  45000   T      T     T    F
25     Smith     Sara    F    FINC  555-2222  55000   F      F     F    F
26     Billings  David   M    ACTG  555-4356  42000   F      F     F    F
31     Dance     Ivanna  F    ACTG  444-4887  60000   F      F     T    F
32     Jones     Mary    F    ITR   555-8745  70000   T      T     T    T
35     Barker    Bob     M    ACTG  555-6565  44000   F      F     F    F
36     Woods     Robin   M    ITR   555-9812  90000   F      T     T    T
37     Jones     Mary    F    FINC  555-1234  56000   T      F     T    F

Languages Table
NAME   FULLNAME
COBOL  COmmon Business Oriented Language
JAVA   JAVA
SQL    Structured Query Language
VB     Visual Basic
2nd Normal Form
Second Normal Form is violated if:
• First Normal Form is violated
• A non-key field is functionally dependent on
only part of a composite key (a partial key).
partial key → non-key
2NF Example – Raw Data
                              DEBIT   CREDIT
JE #1 02-JAN-2003
  100 Cash                   20,000
      310 Smith-Capital               20,000
  (owner investment)
JE #2 03-JAN-2003
  100 Cash                   30,000
      220 Notes Payable               30,000
  (borrowed money)
JE #3 03-JAN-2003
  120 Supplies                5,000
      100 Cash                         1,000
      220 Notes Payable                4,000
  (purchased supplies)
2NF Example – Violation
Transactions Table
JENO  LINENO  DATE         DESCRIPTION         ACCTNO  ACCTNAME       AMOUNT
1     1       02-JAN-2003  Owner investment    100     Cash           20,000
1     2       02-JAN-2003  Owner investment    310     Smith-Capital  (20,000)
2     1       03-JAN-2003  Borrowed money      100     Cash           30,000
2     2       03-JAN-2003  Borrowed money      220     Notes Payable  (30,000)
3     1       03-JAN-2003  Purchased Supplies  120     Supplies       5,000
3     2       03-JAN-2003  Purchased Supplies  100     Cash           (1,000)
3     3       03-JAN-2003  Purchased Supplies  220     Notes Payable  (4,000)

Is there a non-key field which is functionally dependent
on a partial key?
2NF Example – Violation
FDs that indicate violation of 2NF
JENO → DATE, DESCRIPTION
(the key is JENO + LINENO, so JENO alone is a partial key)

JENO  LINENO  DATE         DESCRIPTION         ACCTNO  ACCTNAME       AMOUNT
1     1       02-JAN-2003  Owner investment    100     Cash           20,000
1     2       02-JAN-2003  Owner investment    310     Smith-Capital  (20,000)
2     1       03-JAN-2003  Borrowed money      100     Cash           30,000
2     2       03-JAN-2003  Borrowed money      220     Notes Payable  (30,000)
3     1       03-JAN-2003  Purchased Supplies  120     Supplies       5,000
3     2       03-JAN-2003  Purchased Supplies  100     Cash           (1,000)
3     3       03-JAN-2003  Purchased Supplies  220     Notes Payable  (4,000)
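One way to look for such dependencies is to test candidate FDs against the sample rows. The sketch below is a hypothetical helper, not part of the slides; amounts in parentheses are written as negative integers. A data sample can only refute an FD. Whether an FD truly holds is a question about the meaning of the data.

```python
COLUMNS = ("JENO", "LINENO", "DATE", "DESCRIPTION", "ACCTNO", "ACCTNAME", "AMOUNT")
DATA = [
    (1, 1, "02-JAN-2003", "Owner investment", 100, "Cash", 20000),
    (1, 2, "02-JAN-2003", "Owner investment", 310, "Smith-Capital", -20000),
    (2, 1, "03-JAN-2003", "Borrowed money", 100, "Cash", 30000),
    (2, 2, "03-JAN-2003", "Borrowed money", 220, "Notes Payable", -30000),
    (3, 1, "03-JAN-2003", "Purchased Supplies", 120, "Supplies", 5000),
    (3, 2, "03-JAN-2003", "Purchased Supplies", 100, "Cash", -1000),
    (3, 3, "03-JAN-2003", "Purchased Supplies", 220, "Notes Payable", -4000),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[col] for col in lhs)
        dependent = tuple(row[col] for col in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# The full key determines every other field, as a key must.
full_key_fd = fd_holds(rows, ["JENO", "LINENO"], ["AMOUNT"])          # True
# JENO alone (a partial key) already determines DATE and DESCRIPTION.
partial_key_fd = fd_holds(rows, ["JENO"], ["DATE", "DESCRIPTION"])    # True
# LINENO alone does not: line 1 appears with two different dates.
lineno_fd = fd_holds(rows, ["LINENO"], ["DATE"])                      # False
```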
2NF Example – Corrected
Journal_Entry Table
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table
JENO  LINENO  ACCTNO  ACCTNAME       AMOUNT
1     1       100     Cash           20,000
1     2       310     Smith-Capital  (20,000)
2     1       100     Cash           30,000
2     2       220     Notes Payable  (30,000)
3     1       120     Supplies       5,000
3     2       100     Cash           (1,000)
3     3       220     Notes Payable  (4,000)
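The correction is a lossless decomposition: projecting the wide table onto the two new tables and joining them on JENO gives back exactly the original rows. A minimal sketch, with hypothetical variable names and parenthesized amounts written as negative integers:

```python
# Rows of the original wide table:
# (JENO, LINENO, DATE, DESCRIPTION, ACCTNO, ACCTNAME, AMOUNT)
wide = [
    (1, 1, "02-JAN-2003", "Owner investment", 100, "Cash", 20000),
    (1, 2, "02-JAN-2003", "Owner investment", 310, "Smith-Capital", -20000),
    (2, 1, "03-JAN-2003", "Borrowed money", 100, "Cash", 30000),
    (2, 2, "03-JAN-2003", "Borrowed money", 220, "Notes Payable", -30000),
    (3, 1, "03-JAN-2003", "Purchased Supplies", 120, "Supplies", 5000),
    (3, 2, "03-JAN-2003", "Purchased Supplies", 100, "Cash", -1000),
    (3, 3, "03-JAN-2003", "Purchased Supplies", 220, "Notes Payable", -4000),
]

# Projection 1: facts that depend on JENO alone, stored once per journal entry.
journal_entry = sorted({(jeno, date, desc)
                        for jeno, lineno, date, desc, acctno, acctname, amount in wide})

# Projection 2: facts that depend on the full key (JENO, LINENO).
transactions = [(jeno, lineno, acctno, acctname, amount)
                for jeno, lineno, date, desc, acctno, acctname, amount in wide]

# Join the projections back together on JENO.
header = {jeno: (date, desc) for jeno, date, desc in journal_entry}
rejoined = [(jeno, lineno, *header[jeno], acctno, acctname, amount)
            for jeno, lineno, acctno, acctname, amount in transactions]

print(len(journal_entry))  # 3
print(rejoined == wide)    # True
```

The date and description of each journal entry are now stored once instead of once per line.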
3rd Normal Form
Third Normal Form is violated if:
• Second Normal Form is violated
• A non-key field is functionally dependent
on another non-key field(s).
non-key → non-key
Note: A candidate key is not a non-key field.
3NF Example – Violation
Journal_Entry Table
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table
JENO  LINENO  ACCTNO  ACCTNAME       AMOUNT
1     1       100     Cash           20,000
1     2       310     Smith-Capital  (20,000)
2     1       100     Cash           30,000
2     2       220     Notes Payable  (30,000)
3     1       120     Supplies       5,000
3     2       100     Cash           (1,000)
3     3       220     Notes Payable  (4,000)

Are there any non-key fields which functionally
determine another non-key field?
Are there any redundant facts?
3NF Example – Violation
FD that indicates violation of 3NF
ACCTNO → ACCTNAME

Journal_Entry Table
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table
JENO  LINENO  ACCTNO  ACCTNAME       AMOUNT
1     1       100     Cash           20,000
1     2       310     Smith-Capital  (20,000)
2     1       100     Cash           30,000
2     2       220     Notes Payable  (30,000)
3     1       120     Supplies       5,000
3     2       100     Cash           (1,000)
3     3       220     Notes Payable  (4,000)

Anomalies if not corrected:
• update (if the name of account 100 changes, it must be
changed in multiple places, risking inconsistency)
• deletion (can't delete JE #3 and its transactions without
losing information about account 120)
• insertion (can't set up a new account, Jones-Capital,
for a new partner unless we first have a transaction
involving that account)
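The update and deletion anomalies can be reproduced with a small sqlite3 sketch. The table names transactions_2nf and accounts, the lower-case column names, and the new account name 'Cash on Hand' are assumptions for illustration; parenthesized amounts are stored as negative integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions_2nf (
    jeno INTEGER, lineno INTEGER, acctno INTEGER, acctname TEXT, amount INTEGER,
    PRIMARY KEY (jeno, lineno)
);
CREATE TABLE accounts (acctno INTEGER PRIMARY KEY, acctname TEXT);
""")
rows = [
    (1, 1, 100, "Cash", 20000), (1, 2, 310, "Smith-Capital", -20000),
    (2, 1, 100, "Cash", 30000), (2, 2, 220, "Notes Payable", -30000),
    (3, 1, 120, "Supplies", 5000), (3, 2, 100, "Cash", -1000),
    (3, 3, 220, "Notes Payable", -4000),
]
conn.executemany("INSERT INTO transactions_2nf VALUES (?, ?, ?, ?, ?)", rows)
# A separate Accounts table stores each (ACCTNO, ACCTNAME) fact once.
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 sorted({(acctno, acctname) for _, _, acctno, acctname, _ in rows}))

# Update anomaly: renaming account 100 touches every transaction line that uses it.
touched_2nf = conn.execute(
    "UPDATE transactions_2nf SET acctname = 'Cash on Hand' WHERE acctno = 100").rowcount
touched_accounts = conn.execute(
    "UPDATE accounts SET acctname = 'Cash on Hand' WHERE acctno = 100").rowcount
print(touched_2nf, touched_accounts)  # 3 1

# Deletion anomaly: removing JE #3 erases the only mention of account 120.
conn.execute("DELETE FROM transactions_2nf WHERE jeno = 3")
mentions_of_120 = conn.execute(
    "SELECT COUNT(*) FROM transactions_2nf WHERE acctno = 120").fetchone()[0]
name_in_accounts = conn.execute(
    "SELECT acctname FROM accounts WHERE acctno = 120").fetchone()[0]
print(mentions_of_120, name_in_accounts)  # 0 Supplies
```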
3NF Example – Corrected
Accounts Table
ACCTNO  ACCTNAME
100     Cash
120     Supplies
220     Notes Payable
310     Smith-Capital

Journal_Entry Table
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table
JENO  LINENO  ACCTNO  AMOUNT
1     1       100     20,000
1     2       310     (20,000)
2     1       100     30,000
2     2       220     (30,000)
3     1       120     5,000
3     2       100     (1,000)
3     3       220     (4,000)
3NF Example – Corrected
Final Dependencies
All non-key fields are functionally dependent
on the primary key and only the primary key.

Accounts Table: ACCTNO → ACCTNAME
ACCTNO  ACCTNAME
100     Cash
120     Supplies
220     Notes Payable
310     Smith-Capital

Journal_Entry Table: JENO → DATE, DESCRIPTION
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table: JENO, LINENO → ACCTNO, AMOUNT
JENO  LINENO  ACCTNO  AMOUNT
1     1       100     20,000
1     2       310     (20,000)
2     1       100     30,000
2     2       220     (30,000)
3     1       120     5,000
3     2       100     (1,000)
3     3       220     (4,000)
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form is violated if:
• Third Normal Form is violated
• A partial key is functionally dependent
on a non-key field(s).
non-key → partial key
BCNF Example
Semantics
• A student can have more than one major
• A student has a different advisor for each
major.
• Each advisor advises for only one major.
BCNF Example – Violation
Student_Majors Table
SID  MAJOR             ADVISOR
1    PHYSICS           EINSTEIN
1    BIOLOGY           LIVINGSTON
2    PHYSICS           BOHR
2    COMPUTER SCIENCE  CODD
3    PHYSICS           EINSTEIN
4    BIOLOGY           LIVINGSTON
4    ACCOUNTING        PACIOLI
5    PHYSICS           EINSTEIN
6    PHYSICS           BOHR
6    BIOLOGY           DARWIN
7    COMPUTER SCIENCE  CODD
7    BIOLOGY           DARWIN

Does this relation violate third normal form?
Are there any redundant facts?
BCNF Example – Violation
FD that violates BCNF
ADVISOR → MAJOR

It is important that you convince yourself that
MAJOR does not functionally determine ADVISOR
(PHYSICS, for example, appears with both EINSTEIN and BOHR).

SID  MAJOR             ADVISOR
1    PHYSICS           EINSTEIN
1    BIOLOGY           LIVINGSTON
2    PHYSICS           BOHR
2    COMPUTER SCIENCE  CODD
3    PHYSICS           EINSTEIN
4    BIOLOGY           LIVINGSTON
4    ACCOUNTING        PACIOLI
5    PHYSICS           EINSTEIN
6    PHYSICS           BOHR
6    BIOLOGY           DARWIN
7    COMPUTER SCIENCE  CODD
7    BIOLOGY           DARWIN
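The dependencies in this table can be checked against the sample rows with a short sketch. The helper name determines and the column-index constants are hypothetical; a sample can only refute a dependency, so the True results are consistent with the stated semantics rather than a proof of them.

```python
from collections import defaultdict

# (SID, MAJOR, ADVISOR) rows of the Student_Majors table.
student_majors = [
    (1, "PHYSICS", "EINSTEIN"), (1, "BIOLOGY", "LIVINGSTON"),
    (2, "PHYSICS", "BOHR"), (2, "COMPUTER SCIENCE", "CODD"),
    (3, "PHYSICS", "EINSTEIN"), (4, "BIOLOGY", "LIVINGSTON"),
    (4, "ACCOUNTING", "PACIOLI"), (5, "PHYSICS", "EINSTEIN"),
    (6, "PHYSICS", "BOHR"), (6, "BIOLOGY", "DARWIN"),
    (7, "COMPUTER SCIENCE", "CODD"), (7, "BIOLOGY", "DARWIN"),
]
SID, MAJOR, ADVISOR = 0, 1, 2  # column positions


def determines(rows, lhs, rhs):
    """True if each combination of lhs columns maps to one value of the rhs column."""
    images = defaultdict(set)
    for row in rows:
        images[tuple(row[i] for i in lhs)].add(row[rhs])
    return all(len(values) == 1 for values in images.values())


key_fd = determines(student_majors, [SID, MAJOR], ADVISOR)          # True
advisor_to_major = determines(student_majors, [ADVISOR], MAJOR)     # True
major_to_advisor = determines(student_majors, [MAJOR], ADVISOR)     # False
```

ADVISOR is a non-key field here and MAJOR is part of the key, which is the pattern the BCNF rule describes.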
BCNF Example – Corrected
Advisors Table
ADVISOR     MAJOR
BOHR        PHYSICS
CODD        COMPUTER SCIENCE
DARWIN      BIOLOGY
EINSTEIN    PHYSICS
LIVINGSTON  BIOLOGY
PACIOLI     ACCOUNTING

Student_Advisors Table
SID  ADVISOR
1    EINSTEIN
1    LIVINGSTON
2    BOHR
2    CODD
3    EINSTEIN
4    LIVINGSTON
4    PACIOLI
5    EINSTEIN
6    BOHR
6    DARWIN
7    CODD
7    DARWIN

Note that if the key of the original Student_Majors table
had, counter-intuitively, been defined as SID & ADVISOR,
this would have been a 2NF violation.
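A minimal sketch, with hypothetical variable names, showing that the two corrected tables are projections of Student_Majors and that joining them on ADVISOR reproduces the original rows:

```python
# (SID, MAJOR, ADVISOR) rows of the original Student_Majors table.
student_majors = {
    (1, "PHYSICS", "EINSTEIN"), (1, "BIOLOGY", "LIVINGSTON"),
    (2, "PHYSICS", "BOHR"), (2, "COMPUTER SCIENCE", "CODD"),
    (3, "PHYSICS", "EINSTEIN"), (4, "BIOLOGY", "LIVINGSTON"),
    (4, "ACCOUNTING", "PACIOLI"), (5, "PHYSICS", "EINSTEIN"),
    (6, "PHYSICS", "BOHR"), (6, "BIOLOGY", "DARWIN"),
    (7, "COMPUTER SCIENCE", "CODD"), (7, "BIOLOGY", "DARWIN"),
}

# Advisors: each advisor's major is stored exactly once.
advisors = {(advisor, major) for sid, major, advisor in student_majors}
# Student_Advisors: which students work with which advisors.
student_advisors = {(sid, advisor) for sid, major, advisor in student_majors}

# Join on ADVISOR. Because ADVISOR determines MAJOR, a dict lookup is enough.
major_of = dict(advisors)
rejoined = {(sid, major_of[advisor], advisor) for sid, advisor in student_advisors}

print(len(advisors), len(student_advisors))  # 6 12
print(rejoined == student_majors)            # True
```

The fact "EINSTEIN advises PHYSICS" appears three times in Student_Majors and once in Advisors.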
4th Normal Form
4th Normal Form is violated if:
• Boyce-Codd Normal Form is violated
• A partial key has multiple independent
multi-valued dependencies to other partial
keys.
partial-key1 →→ partial-key2
partial-key1 →→ partial-key3
4NF Example – Violation
Instruments_Languages
Name  Instrument  Language
Fred  Piano       French
Fred  Flute       Italian
Fred  Flute       Spanish
Jane  Piano       French
Jane  Oboe        French
Sam   Piano       French
Sam   Oboe        Spanish
Sam   Flute       Spanish
4NF Example – Violation
Name  Instrument  Language
Fred  Piano       French
Fred  Flute       Italian
Fred  Flute       Spanish
Jane  Piano       French
Jane  Oboe        French
Sam   Piano       French
Sam   Oboe        Spanish
Sam   Flute       Spanish

Does this relation violate 1st, 2nd, 3rd, or BCNF?
Are there any redundant facts?
4NF Example – Correction
LanguagesSpoken
Name  Language
Fred  French
Fred  Italian
Fred  Spanish
Jane  French
Sam   French
Sam   Spanish

InstrumentsPlayed
Name  Instrument
Fred  Piano
Fred  Flute
Jane  Piano
Jane  Oboe
Sam   Piano
Sam   Oboe
Sam   Flute
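A minimal sketch, with hypothetical variable names, that derives the two corrected tables by projecting the sample rows of Instruments_Languages onto (Name, Language) and (Name, Instrument):

```python
# (Name, Instrument, Language) rows of the Instruments_Languages table.
instruments_languages = [
    ("Fred", "Piano", "French"), ("Fred", "Flute", "Italian"),
    ("Fred", "Flute", "Spanish"), ("Jane", "Piano", "French"),
    ("Jane", "Oboe", "French"), ("Sam", "Piano", "French"),
    ("Sam", "Oboe", "Spanish"), ("Sam", "Flute", "Spanish"),
]

# Each independent multi-valued fact gets its own table; the set removes repeats.
languages_spoken = sorted({(name, language)
                           for name, instrument, language in instruments_languages})
instruments_played = sorted({(name, instrument)
                             for name, instrument, language in instruments_languages})

print(len(languages_spoken))    # 6
print(len(instruments_played))  # 7
```

In the combined table, "Jane speaks French" and "Fred plays Flute" each appear twice; after the projection each fact is stored once.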