NORMALIZATION

Relation P:
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12.0     London
P2   Bolt    Green   17.0     Paris
P3   Screw   Blue    17.0     Rome
P4   Screw   Red     14.0     London
P5   Cam     Blue    12.0     Paris
P6   Cog     Red     19.0     London
Relation S:
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Black   30       Paris
S4   Clerk   20       London
S5   Adams   30       Athens
Relation SP:
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
Relation SCP:
S#   CITY     P#   QTY   STATUS
S1   London   P1   100   20
S1   London   P2   100   20
S2   Paris    P1   200   10
S2   Paris    P2   200   10
S3   Paris    P2   300   10
S4   London   P2   400   20
S4   London   P4   400   20
S4   London   P5   400   20
• The problem with SCP is redundancy.
• In the relational model a relation is always
normalized: every tuple holds exactly one legal
value for each attribute.
• Thus we can say that relations are always
normalized, that is, in first normal form (1NF).
• A given relation might be normalized yet still
possess certain undesirable properties.
• The principles of further normalization allow us to
recognize such cases and to replace such
relations by ones that are more desirable in some
way.
• In the case of SCP, they tell us how to
replace it by two more desirable relations,
{s#, city, status} and {s#, p#, qty}.
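The replacement of SCP by its two projections can be sketched in a few lines of Python. This is only an illustration, assuming relations are modelled as sets of tuples; the names SC, SPQ and rejoined are chosen here and do not come from the slides.

```python
# SCP tuples as (S#, CITY, P#, QTY, STATUS), copied from the SCP table.
SCP = {
    ("S1", "London", "P1", 100, 20),
    ("S1", "London", "P2", 100, 20),
    ("S2", "Paris", "P1", 200, 10),
    ("S2", "Paris", "P2", 200, 10),
    ("S3", "Paris", "P2", 300, 10),
    ("S4", "London", "P2", 400, 20),
    ("S4", "London", "P4", 400, 20),
    ("S4", "London", "P5", 400, 20),
}

# Projection on {S#, CITY, STATUS}: city and status are stored once per supplier.
SC = {(s, city, status) for (s, city, p, qty, status) in SCP}

# Projection on {S#, P#, QTY}: one tuple per shipment.
SPQ = {(s, p, qty) for (s, city, p, qty, status) in SCP}

# Natural join of the two projections over S#.
rejoined = {
    (s, city, p, qty, status)
    for (s, city, status) in SC
    for (s2, p, qty) in SPQ
    if s == s2
}

print(len(SCP), len(SC), len(SPQ))
print(rejoined == SCP)
```

The supplier projection holds four tuples instead of eight repeated city and status values, and the join gives back exactly the original SCP tuples.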
Normalization
• Database normalization is the process of
organizing the fields and tables of a relational
database to minimize redundancy.
• Normalization usually involves dividing large
tables into smaller (and less redundant) tables
and defining relationships between them.
• The objective is to isolate data so that additions,
deletions, and modifications of a field can be
made in just one table and then propagated
through the rest of the database using the
defined relationships.
Non-loss decomposition
• The normalization procedure involves breaking
down, or decomposing, a given relation into
other relations, and that decomposition is
required to be reversible so that no
information is lost.
• Relation S:
S#   STATUS   CITY
S3   30       Paris
S5   30       Athens

(A) Non-loss decomposition
SST:
S#   STATUS
S3   30
S5   30
SC:
S#   CITY
S3   Paris
S5   Athens

(B) Lossy decomposition
SST:
S#   STATUS
S3   30
S5   30
STC:
STATUS   CITY
30       Paris
30       Athens
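The difference between cases (A) and (B) can be checked by joining the projections back together. The sketch below assumes relations are modelled as Python sets of tuples; join_a and join_b are names introduced here for illustration.

```python
# Relation S as (S#, STATUS, CITY), copied from the example.
S = {("S3", 30, "Paris"), ("S5", 30, "Athens")}

# The three projections used in decompositions (A) and (B).
SST = {(s, status) for (s, status, city) in S}
SC = {(s, city) for (s, status, city) in S}
STC = {(status, city) for (s, status, city) in S}

# (A) Join SST and SC over the common attribute S#.
join_a = {
    (s, status, city)
    for (s, status) in SST
    for (s2, city) in SC
    if s == s2
}

# (B) Join SST and STC over the common attribute STATUS.
join_b = {
    (s, status, city)
    for (s, status) in SST
    for (status2, city) in STC
    if status == status2
}

print(join_a == S)         # True: (A) is reversible
print(sorted(join_b - S))  # [('S3', 30, 'Athens'), ('S5', 30, 'Paris')]
```

In case (B) the join produces two spurious tuples, so it is no longer possible to tell which supplier is in which city.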
FD diagram
[Figure: FD diagram over the attributes S#, SNAME, STATUS, CITY and S#, P#, QTY]
FIRST, SECOND, THIRD NORMAL FORMS
• (Informal definition):
• Third normal form: A relvar is in 3NF if and only if the
nonkey attributes (if any) are
A. mutually independent, and
B. irreducibly dependent on the primary key.
• (Even more informal definition):
• Third normal form: A relvar is in 3NF if and only if, for
all time, each tuple consists of a primary key value that
identifies some entity, together with a set of zero or more
mutually independent attribute values that describe that
entity in some way.
First NF
• A relvar is in 1NF if and only if, in every legal value of
that relvar, every tuple contains exactly one value for
each attribute.
• Relvar FIRST {s#, status, city, p#, qty}, primary key {s#, p#}
[Figure: FD diagram for FIRST over the attributes S#, P#, STATUS, CITY, QTY]
S#   STATUS   CITY
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON

P#   QTY
P1   300
P2   200
P3   400
P4   200
P5   100
P6   100
P2   400
P2   200
P2   200
P2   200
P4   300
P5   400
FIRST
S#   STATUS   CITY     P#   QTY
S1   20       London   P1   300
S1   20       London   P2   200
S1   20       London   P3   400
S1   20       London   P4   200
S1   20       London   P5   100
S1   20       London   P6   100
S2   10       Paris    P1   300
S2   10       Paris    P2   400
S3   10       Paris    P2   200
S4   20       London   P2   200
S4   20       London   P4   300
S4   20       London   P5   400
Example
Faculty code   Name        Date of birth   Subject   Hours
100            Yogesh      17/07/64        DSA       16
                                           SS        8
                                           IS        12
101            Amit        24/12/72        MIS       16
                                           PM        8
                                           IS        8
102            Omprakash   03/02/80        PWRC      8
                                           PCOM      8
                                           IP        16
103            Nitin       28/11/66        DT        10
                                           PCOM      8
                                           SS        8
Faculty code   Faculty name   Date of birth   Subject   Hours
100            Yogesh         17/07/64        DSA       16
100            Yogesh         17/07/64        SS        8
100            Yogesh         17/07/64        IS        12
101            Amit           24/12/72        MIS       16
101            Amit           24/12/72        PM        8
101            Amit           24/12/72        IS        12
102            Omprakash      03/02/80        PWRC      8
102            Omprakash      03/02/80        PCOM      8
102            Omprakash      03/02/80        IP        16
103            Nitin          28/11/66        DT        10
103            Nitin          28/11/66        PCOM      8
103            Nitin          28/11/66        SS        8
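The step from the grouped faculty table to the flat 1NF table can be sketched as follows. This is a minimal illustration covering two of the four faculty members; the nested-list representation and the names unnormalized and flat are assumptions made here, not part of the slides.

```python
# Each faculty member carries a repeating group of (subject, hours) pairs.
unnormalized = [
    (100, "Yogesh", "17/07/64", [("DSA", 16), ("SS", 8), ("IS", 12)]),
    (102, "Omprakash", "03/02/80", [("PWRC", 8), ("PCOM", 8), ("IP", 16)]),
]

# 1NF: repeat the faculty details once per subject, so that every tuple
# contains exactly one value for each attribute.
flat = [
    (code, name, dob, subject, hours)
    for code, name, dob, subjects in unnormalized
    for subject, hours in subjects
]

for row in flat:
    print(row)
```

Each faculty member with three subjects becomes three tuples, which is exactly where the repeated name and date of birth in the 1NF table come from.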
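Difficulties with the update operations
• Redundancies in relation FIRST lead to a
variety of difficulties, known as update
anomalies, with the following operations.
• INSERT: a supplier's city cannot be recorded
until that supplier supplies at least one part,
because p# is part of the primary key.
• DELETE: deleting a supplier's only shipment
also deletes the information about that
supplier's city.
• UPDATE: a supplier's city appears in many
tuples, so a change of city must be applied
consistently to every one of them.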
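The UPDATE and DELETE cases can be made concrete with a few FIRST tuples. This is a hedged sketch: the helper cities_per_supplier and the city "Amsterdam" are hypothetical, introduced only for the illustration.

```python
# A few FIRST tuples as (S#, STATUS, CITY, P#, QTY), copied from relation FIRST.
FIRST = [
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S3", 10, "Paris", "P2", 200),
]


def cities_per_supplier(rows):
    """Map each S# to the set of cities recorded for it."""
    cities = {}
    for s, status, city, p, qty in rows:
        cities.setdefault(s, set()).add(city)
    return cities


# UPDATE anomaly: S1 moves to Amsterdam (a hypothetical city), but only one
# of its tuples is changed, so S1 now appears with two different cities.
updated = [("S1", 20, "Amsterdam", "P1", 300)] + FIRST[1:]
print(len(cities_per_supplier(updated)["S1"]))  # 2

# DELETE anomaly: removing S3's only shipment also removes the fact that
# S3 is located in Paris.
after_delete = [r for r in FIRST if not (r[0] == "S3" and r[3] == "P2")]
print("S3" in cities_per_supplier(after_delete))  # False
```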
• SECOND
S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   10       Paris
S4   20       London
S5   30       Athens
• SP
S#   P#   QTY
S1   P1   100
S1   P2   100
S2   P1   200
S2   P2   200
S3   P2   300
S4   P2   400
S4   P4   400
S4   P5   400
FD diagram of SECOND and SP
[Figure: FD diagrams for SECOND (S#, STATUS, CITY) and SP (S#, P#, QTY)]
Second NF
• A relvar is in 2NF if and only if, it is in 1NF and every
nonkey attribute is irreducibly dependent on the primary
key.
OR
• A relvar is in 2NF if and only if it is in 1NF and every
nonkey attribute is fully functionally dependent on the
whole primary key, and not just on part of the primary
key.
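Whether a nonkey attribute depends on only part of the key can be probed against sample data. The sketch below uses a hypothetical helper fd_holds and six FIRST tuples; note that sample data can only refute an FD, since a real FD must hold for all time.

```python
ATTRS = ("S#", "STATUS", "CITY", "P#", "QTY")

# Six tuples copied from relation FIRST, primary key {S#, P#}.
FIRST = [
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S2", 10, "Paris", "P1", 300),
    ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200),
    ("S4", 20, "London", "P2", 200),
]


def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        record = dict(zip(ATTRS, row))
        key = tuple(record[a] for a in lhs)
        value = tuple(record[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(FIRST, ["S#", "P#"], ["CITY"]))  # True: the key determines CITY
print(fd_holds(FIRST, ["S#"], ["CITY"]))        # True: so does S# alone
print(fd_holds(FIRST, ["S#"], ["QTY"]))         # False: QTY needs the whole key
```

CITY is determined by S# alone, so it is not irreducibly dependent on the primary key and FIRST is not in 2NF, while QTY does depend on the whole key.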
Third NF
• A relvar is in 3NF if and only if it is in 2NF
and every nonkey attribute is
nontransitively dependent on the primary
key.
• (There should not be any transitive dependency).
• SC:
S#   CITY
S1   London
S2   Paris
S3   Paris
S4   London
S5   Athens
• CS:
CITY     STATUS
Athens   30
London   20
Paris    10
Rome     50
FD diagram of SC and CS
[Figure: FD diagrams for SC (S# --> CITY) and CS (CITY --> STATUS)]
BCNF
• Boyce/Codd normal form
• Up to now it was assumed that every
relation has just one candidate key.
• Consider what happens when a relation has
more than one candidate key.
• 3NF did not adequately deal with the case
of a relation that
– had two or more candidate keys,
– whose candidate keys were composite, and
– whose candidate keys overlapped.
• So the original definition of 3NF was
replaced by a stronger definition due to
Boyce and Codd that caters for this
case.
• Before explaining BCNF, recall two
terms:
– Determinant (the left side of an FD)
• In {A} --> {B}, A is called the determinant.
– Trivial and nontrivial FDs
• An FD is trivial if its right side is a subset of its left side.
Definition of BCNF
• Formal definition:
• A relation is in BCNF if and only if every nontrivial,
left-irreducible FD has a candidate key as its determinant.
• Informal definition:
– A relation is in BCNF if and only if every
determinant is a candidate key.
– In other words, the only arrows in the FD diagram
are arrows out of candidate keys.
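The informal definition can be turned into a small check based on attribute closure. This is a sketch under stated assumptions: the FDs listed for FIRST and SP are taken to be the ones in the earlier FD diagrams, and closure and bcnf_violations are hypothetical helper names. The check tests whether each determinant is a superkey, which for left-irreducible FDs is the same as being a candidate key.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(heading, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)              # skip trivial FDs
        and closure(lhs, fds) != set(heading)    # determinant is not a superkey
    ]


# Assumed FDs for FIRST: S# --> CITY, CITY --> STATUS, {S#, P#} --> QTY.
FIRST = ("S#", "STATUS", "CITY", "P#", "QTY")
FIRST_FDS = [
    (("S#",), ("CITY",)),
    (("CITY",), ("STATUS",)),
    (("S#", "P#"), ("QTY",)),
]

# Assumed FD for SP: {S#, P#} --> QTY.
SP = ("S#", "P#", "QTY")
SP_FDS = [(("S#", "P#"), ("QTY",))]

print(bcnf_violations(FIRST, FIRST_FDS))  # the FDs out of S# and out of CITY
print(bcnf_violations(SP, SP_FDS))        # []
```

FIRST has two arrows that do not start at a candidate key, so it is not in BCNF, whereas the only arrow in SP starts at its key.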
• Before considering some examples
involving more than one candidate key, let
us convince ourselves that relations FIRST
and SECOND, which were not in 3NF, are
not in BCNF either.
• Relation FIRST contains three determinants,
{sup_no}, {city} and {sup_no, part_no}; only
{sup_no, part_no} is a candidate key.
• Relation SECOND is also not in BCNF,
because the determinant {city} in the FD
{city} --> {status} is not a candidate key.
• Consider another example having two
candidate keys.
• Relation SUPPLIER { sup_no, aadhar_no,
sup_name, city}
• Candidate keys: {sup_no} and
{aadhar_no}
• Assumption:
– for all time, it is the case that every supplier
has a unique sup_no and also a unique
aadhar_no.
Sup_no   Aadhar_no   Sup_name   city
S1       12483847    John       London
S2       57475688    Devid      Paris
S3       67578488    Prince     Athens
S4       57465663    John       Rome
S5       34344567    Madonna    New York
• So the FD diagram will be
[Figure: FD diagram over the attributes Sup_no, Aadhar_no, Sup_name, city]
4NF
Course   Teachers      Texts
DBMS     Devangimam    Henry Korth
         Tejassir      Ivan Bayross
Maths    Krishnamam    Vector analysis
         Patatsir      Trigonometry
Relation: CTX
Course   Teachers      Texts
DBMS     Devangimam    Henry Korth
DBMS     Devangimam    Ivan Bayross
DBMS     Tejassir      Henry Korth
DBMS     Tejassir      Ivan Bayross
Maths    Krishnamam    Vector analysis
Maths    Krishnamam    Trigonometry
Maths    Patatsir      Vector analysis
Maths    Patatsir      Trigonometry
• In relation CTX there are two multi-valued
dependencies (MVDs):
– Course ->-> Teacher and
– Course ->-> Text.
• MVDs occur when two or more independent multi-
valued facts about the same attribute occur within
the same table.
• Here the MVD means that
– A course does not have a unique
corresponding teacher,
– but each course has a well-defined set of
teachers.
– So, for a given course C (say Maths) and a
given text x (say Vector analysis), the set of
teachers t matching the pair (C, x) in CTX
depends on the value of C alone; it makes no
difference which particular value of x we
choose.
• Problems with CTX:
– It involves a good deal of redundancy.
– This leads to update anomalies.
• For example, to add the information that the DBMS
course can be taught by a new teacher, it is necessary
to insert two separate tuples, one for each of the two
texts.
• The problem is caused by the fact that teachers
and texts are completely independent of one
another.
– So CTX can be decomposed into two
projections called CT and CX.
• CT
Course   Teachers
DBMS     Devangimam
DBMS     Tejassir
Maths    Krishnamam
Maths    Patatsir
• CX
Course   Texts
DBMS     Henry Korth
DBMS     Ivan Bayross
Maths    Vector analysis
Maths    Trigonometry
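That CT and CX together carry the same information as CTX can be checked by joining them over Course. The sketch below assumes relations are modelled as sets of tuples; the teacher "New teacher" is hypothetical and only serves to show the insert case.

```python
# CTX tuples as (Course, Teacher, Text), copied from relation CTX.
CTX = {
    ("DBMS", "Devangimam", "Henry Korth"),
    ("DBMS", "Devangimam", "Ivan Bayross"),
    ("DBMS", "Tejassir", "Henry Korth"),
    ("DBMS", "Tejassir", "Ivan Bayross"),
    ("Maths", "Krishnamam", "Vector analysis"),
    ("Maths", "Krishnamam", "Trigonometry"),
    ("Maths", "Patatsir", "Vector analysis"),
    ("Maths", "Patatsir", "Trigonometry"),
}

# The two projections.
CT = {(c, t) for (c, t, x) in CTX}
CX = {(c, x) for (c, t, x) in CTX}


def join(ct, cx):
    """Natural join of a (Course, Teacher) and a (Course, Text) relation."""
    return {(c, t, x) for (c, t) in ct for (c2, x) in cx if c == c2}


print(join(CT, CX) == CTX)  # True: the decomposition is non-loss

# Adding a (hypothetical) new DBMS teacher is one insert into CT, yet it
# corresponds to two tuples of CTX, one per DBMS text.
CT_new = CT | {("DBMS", "New teacher")}
print(len(join(CT_new, CX) - CTX))  # 2
```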
Definition of 4NF
• A relation R is in Fourth Normal Form
(4NF) if and only if the following conditions
are satisfied simultaneously:
– R is already in 3NF or BCNF.
– R contains no nontrivial multi-valued
dependencies other than those whose
determinant is a candidate key.
5NF
• A relation R is in Fifth Normal Form (5NF)
if and only if the following conditions are
satisfied simultaneously:
– R is already in 4NF.
– It cannot be further non-loss
decomposed.
• In all of the normal forms discussed so
far, non-loss decomposition was achieved by
decomposing a single table into two separate
tables.
• Non-loss decomposition is possible because of
the availability of the join operator as part of the
relational model.
• In considering 5NF, consideration must be given
to tables where this non-loss decomposition can
only be achieved by decomposition into three or
more separate tables.
• Such decomposition is not always possible, as is
shown by the following example.
Under these circumstances,
the 'agent_company_product' table is as shown
below:
This table can be decomposed into its three
projections without loss of information, as
demonstrated below:
If the natural join of P1 and P2 is taken, the result
is:
(The spurious row is asterisked.)
- Now, if this result is joined with P3 over the
columns 'company' and 'product_name', the following
table is obtained:
- The original table, therefore, violated 5NF simply
because it was non-loss decomposable into its
three projections.
- But see the table below, named
‘Agent_Company_product’.
If we divide it into two tables P1 and P2, then a
spurious record appears when P1 and
P2 are joined.
• But if it is divided into three tables, the join
of all three projections P1, P2 and
P3 (company, product_name) still
contains a spurious record.
- So it is simply not possible to decompose
the 'AGENT_COMPANY_PRODUCT'
table, populated as shown, without losing
information.
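The two situations described above can be reproduced with a small sketch. The rows below are hypothetical: they are not the rows of the slide tables, and only illustrate the same behaviour. The projections are assumed to be P1 (agent, company), P2 (agent, product) and P3 (company, product).

```python
def three_way_join(acp):
    """Project acp onto P1, P2 and P3, then join the three projections."""
    p1 = {(a, c) for (a, c, p) in acp}   # (agent, company)
    p2 = {(a, p) for (a, c, p) in acp}   # (agent, product)
    p3 = {(c, p) for (a, c, p) in acp}   # (company, product)
    # Join P1 and P2 over agent; this may contain spurious rows.
    p1_p2 = {(a, c, p) for (a, c) in p1 for (a2, p) in p2 if a == a2}
    # Join the result with P3 over company and product.
    return {(a, c, p) for (a, c, p) in p1_p2 if (c, p) in p3}


# Hypothetical rows as (agent, company, product).
decomposable = {
    ("a1", "c1", "y"),
    ("a1", "c2", "x"),
    ("a2", "c1", "x"),
    ("a1", "c1", "x"),
}
# The same table without its last row.
not_decomposable = decomposable - {("a1", "c1", "x")}

print(three_way_join(decomposable) == decomposable)           # True
print(three_way_join(not_decomposable) - not_decomposable)    # {('a1', 'c1', 'x')}
```

For the first table the spurious row created by joining P1 and P2 is removed by P3, so the three projections lose nothing. For the second table the three-way join still returns a row that was never there, so that table cannot be decomposed this way.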