Proposed by Codd in 1972
Takes a relation through a series of steps to certify whether it satisfies a certain normal form
Initially Codd proposed three normal forms
Boyce-Codd normal form (BCNF) was later introduced by Boyce and Codd
Based on functional dependencies between attributes of a relation
Later, 4th and 5th normal forms were introduced, based on multi-valued dependencies and join dependencies respectively
Normalization is the process of efficiently organizing data in a database
There are two goals of the normalization process:
Eliminating redundant data
For example, storing the same data in more than one table
Ensuring data dependencies make sense
Only storing related data in a table
Both goals reduce the amount of space a database consumes and ensure that data is logically stored
Through normalization we want to design for our relational database a set of files that
Contain all the data necessary for the purposes that the database is to serve
Have as little redundancy as possible
Accommodate multiple values for types of data that require them
Permit efficient updates of the data in the database
Avoid the danger of losing data unknowingly
Normalization Avoids
Duplication of Data
The same data is listed in multiple lines of the database
Insert Anomaly
A record about an entity cannot be inserted into the table without first inserting information about another entity.
Example: cannot enter a customer without a sales order
Delete Anomaly
A record cannot be deleted without deleting a record about a related entity. Cannot delete a sales order without deleting all of the customer’s information.
Update Anomaly
Cannot update information without changing information in many places. To update customer information, it must be updated for each sales order the customer has placed
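The update anomaly can be illustrated with a small sketch. The rows, addresses and the helper name addresses_for are hypothetical, chosen only for illustration; the point is that a customer address stored on every sales order can drift out of sync, while an address stored once cannot.

```python
# Hypothetical un-normalized sales orders: the customer address repeats per order.
orders = [
    {"SalesOrderNo": 1, "CustomerNo": 10, "CustomerAdd": "12 Hill St"},
    {"SalesOrderNo": 2, "CustomerNo": 10, "CustomerAdd": "12 Hill St"},
    {"SalesOrderNo": 3, "CustomerNo": 20, "CustomerAdd": "5 Lake Rd"},
]


def addresses_for(rows, customer_no):
    """Return the set of distinct addresses stored for one customer."""
    return {r["CustomerAdd"] for r in rows if r["CustomerNo"] == customer_no}


# Update anomaly: changing the address on only one order leaves two versions.
orders[0]["CustomerAdd"] = "99 New Ave"
inconsistent = addresses_for(orders, 10)

# Normalized alternative: the address is stored once, keyed by CustomerNo,
# so a single assignment updates it everywhere it is used.
customers = {10: "12 Hill St", 20: "5 Lake Rd"}
customers[10] = "99 New Ave"
```

After the partial update, inconsistent holds two different addresses for the same customer, which is exactly the situation normalization is meant to rule out.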
Guidelines for ensuring that databases are normalized
Numbered from 1 through 5: 1NF, 2NF, 3NF, 4NF and 5NF
In practical applications:
We often see the first three normal forms
Occasionally we see 4th normal form
5th normal form is rarely seen
Normalization is a three-stage process:
After the first stage, the data is said to be in first normal form
After the second, it is in second normal form
After the third, it is in third normal form
Begin with a list of all of the fields that must appear in the database. Think of this as one big table.
Do not include computed fields
One place to begin getting this information is from a printed document used by the system.
Additional attributes besides those for the entities described on the document can be added to the database.
ORDERS (SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice)
Functional Dependency
The value of one attribute in a table is determined entirely by the value of another attribute or set of attributes (in a well-designed table, the primary key)
Partial Dependency
A type of functional dependency where an attribute is functionally dependent on only part of the primary key (the primary key must be a composite key).
Transitive Dependency
A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.
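A functional dependency can be tested against sample data with a short sketch. The function name fd_holds and the rows are hypothetical; the column names are taken from the ORDERS field list. Note that sample rows can only refute a dependency: a dependency that holds in one instance is not thereby proven for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs --> rhs holds in this table instance.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical order lines; primary key assumed to be (SalesOrderNo, ItemNo).
order_lines = [
    {"SalesOrderNo": 1, "ItemNo": "A1", "Qty": 2, "Description": "Bolt",
     "CustomerNo": 10, "CustomerName": "Rao"},
    {"SalesOrderNo": 1, "ItemNo": "B2", "Qty": 1, "Description": "Nut",
     "CustomerNo": 10, "CustomerName": "Rao"},
    {"SalesOrderNo": 2, "ItemNo": "A1", "Qty": 5, "Description": "Bolt",
     "CustomerNo": 20, "CustomerName": "Iyer"},
]

full = fd_holds(order_lines, ["SalesOrderNo", "ItemNo"], ["Qty"])
partial = fd_holds(order_lines, ["ItemNo"], ["Description"])
transitive_step = fd_holds(order_lines, ["CustomerNo"], ["CustomerName"])
not_an_fd = fd_holds(order_lines, ["ItemNo"], ["Qty"])
```

Here Description depends on ItemNo alone (a partial dependency on the composite key), and CustomerName depends on CustomerNo, a non-key attribute (the second step of a transitive dependency).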
1NF disallows multi-valued attributes, composite attributes and complex attributes
The domain of an attribute must include only atomic values (simple and indivisible)
Disallows ‘relations within relations’ or ‘relations as attribute values within tuples’
DEPARTMENT
DNAME           DNUMBER  DMGRENO    DLOCATIONS
Research        5        333445555  {Bangalore, New Delhi, Hyderabad}
Administration  4        987654321  {Chennai}
Headquarters    1        888665555  {Hyderabad}
DLOCATIONS is not an atomic attribute. There are two ways to look at it:
The domain of DLOCATIONS contains atomic values, but some tuples hold a set of these values
The domain of DLOCATIONS contains sets of values and hence is nonatomic
Techniques to achieve 1NF
1. Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation DEPT_LOCATIONS
DEPARTMENT
DNAME           DNUMBER  DMGRENO
Research        5        333445555
Administration  4        987654321
Headquarters    1        888665555
DEPT_LOCATIONS
DNUMBER  DLOCATION
5        Bangalore
5        New Delhi
5        Hyderabad
4        Chennai
1        Hyderabad
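The first technique can be sketched in a few lines. The function name split_multivalued is hypothetical; the rows are the DEPARTMENT data shown above, with the set-valued DLOCATIONS held as a Python list.

```python
# DEPARTMENT rows with the multi-valued DLOCATIONS attribute.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRENO": "333445555",
     "DLOCATIONS": ["Bangalore", "New Delhi", "Hyderabad"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRENO": "987654321",
     "DLOCATIONS": ["Chennai"]},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DMGRENO": "888665555",
     "DLOCATIONS": ["Hyderabad"]},
]


def split_multivalued(rows, key, multi, single_name):
    """Move a multi-valued attribute into its own relation (technique 1).

    Returns (base, detail): base is rows without the multi-valued column,
    detail has one (key, value) row per individual value.
    """
    base = [{k: v for k, v in r.items() if k != multi} for r in rows]
    detail = [{key: r[key], single_name: v} for r in rows for v in r[multi]]
    return base, detail


dept, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION"
)
```

The primary key of the new DEPT_LOCATIONS relation is the combination {DNUMBER, DLOCATION}, and every value in both relations is now atomic.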
2. Expand the key so that there is a separate tuple in the original DEPARTMENT relation for each location
Disadvantage: introduces redundancy in the relation
DNAME           DNUMBER  DMGRENO    DLOCATION
Research        5        333445555  Bangalore
Research        5        333445555  New Delhi
Research        5        333445555  Hyderabad
Administration  4        987654321  Chennai
Headquarters    1        888665555  Hyderabad
3. If the maximum number of values is known for the attribute, replace the attribute with that number of atomic attributes
Disadvantage: introduces null values
DNAME           DNUMBER  DMGRENO    DLOCATION1  DLOCATION2  DLOCATION3
Research        5        333445555  Bangalore   New Delhi   Hyderabad
Administration  4        987654321  Chennai     NULL        NULL
Headquarters    1        888665555  Hyderabad   NULL        NULL
EMP_PROJ (ENO, ENAME, {PROJS (PNUMBER, HOURS)})
ENO is the primary key of EMP_PROJ and PNUMBER is the partial key of the nested relation PROJS
PERSON (IDNO, ENAME, ADDRESS, AGE, PROFESSION, {CAR_LIC}, {PHONE})
For 2NF, the relation should already be in first normal form
Based on full functional dependency
A functional dependency X --> Y is a full functional dependency if removal of any attribute A from X means that the dependency no longer holds
That is, for every A ∈ X, (X - {A}) does not functionally determine Y
A functional dependency X --> Y is a partial dependency if, for some attribute A ∈ X, (X - {A}) --> Y still holds
ENO PNO HOURS ENAME PNAME PLOCATION
A relation R is in 2NF if every non-prime attribute A in R is fully functionally dependent on the primary key of R
If primary key contains one attribute, the test need not be applied at all
ENAME ENO DOB ADDRESS DNUMBER DNAME DMGRENO
If the relation is not in 2NF, it can be ‘second normalized’ into a number of 2NF relations in which non-prime attributes are associated only with the part of the primary key on which they are fully functionally dependent
Before: (ENO, PNO, HOURS, ENAME, PNAME, PLOCATION)
After: (ENO, PNO, HOURS), (ENO, ENAME) and (PNO, PNAME, PLOCATION)
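The 2NF step can be sketched with an attribute-closure helper. The three functional dependencies below are an assumption read off from the decomposition shown above, and the function names closure, partial_dependencies and to_2nf are hypothetical. The sketch also assumes the given key is the only candidate key and is meant for a two-attribute key; with larger keys, a subset and its supersets may both be reported.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper part of the key to the non-prime attributes it determines."""
    non_prime = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & non_prime
            if determined:
                found[frozenset(part)] = determined
    return found


def to_2nf(key, attributes, fds):
    """One relation per partial key, plus the key with what fully depends on it."""
    partial = partial_dependencies(key, attributes, fds)
    moved = set().union(*partial.values())
    relations = [set(part) | dep for part, dep in partial.items()]
    relations.append(set(attributes) - moved)
    return relations


# Assumed dependencies for the employee-project relation.
attributes = {"ENO", "PNO", "HOURS", "ENAME", "PNAME", "PLOCATION"}
key = {"ENO", "PNO"}
fds = [
    (frozenset({"ENO", "PNO"}), frozenset({"HOURS"})),
    (frozenset({"ENO"}), frozenset({"ENAME"})),
    (frozenset({"PNO"}), frozenset({"PNAME", "PLOCATION"})),
]

partial = partial_dependencies(key, attributes, fds)
relations = to_2nf(key, attributes, fds)
```

Under these assumptions, relations holds the same three attribute sets as the decomposition on the slide.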
For 3NF, the relation should already be in second normal form
Based on transitive dependency
A functional dependency X --> Y in a relation R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X --> Z and Z --> Y hold
ENAME ENO DOB ADDRESS DNUMBER DNAME DMGRENO
The dependencies ENO --> DNUMBER and DNUMBER --> DMGRENO hold, and DNUMBER is neither a key nor a subset of a key, so DMGRENO is transitively dependent on ENO
A relation R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key
Before: (ENAME, ENO, DOB, ADDRESS, DNUMBER, DNAME, DMGRENO)
After: (ENAME, ENO, DOB, ADDRESS, DNUMBER) and (DNUMBER, DNAME, DMGRENO)
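The 3NF decomposition can be checked on sample data: project the original rows onto the two new relations, join them back on DNUMBER, and compare with the original. This is a minimal sketch; the employee names and numbers are hypothetical, DOB and ADDRESS are left out for brevity, and the helper names project and natural_join are not from the slides.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples."""
    unique = {tuple(row[c] for c in columns) for row in rows}
    return [dict(zip(columns, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Join two non-empty lists of dict rows on all column names they share."""
    shared = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[c] == r[c] for c in shared)
    ]


# Hypothetical employees; department values follow the DEPARTMENT example.
emp_dept = [
    {"ENAME": "Asha", "ENO": "E1", "DNUMBER": 5,
     "DNAME": "Research", "DMGRENO": "333445555"},
    {"ENAME": "Ravi", "ENO": "E2", "DNUMBER": 5,
     "DNAME": "Research", "DMGRENO": "333445555"},
    {"ENAME": "Meera", "ENO": "E3", "DNUMBER": 4,
     "DNAME": "Administration", "DMGRENO": "987654321"},
]

employee = project(emp_dept, ["ENAME", "ENO", "DNUMBER"])
department = project(emp_dept, ["DNUMBER", "DNAME", "DMGRENO"])
rejoined = natural_join(employee, department)
```

The department name and manager are now stored once per department rather than once per employee, and the join on DNUMBER recovers exactly the original rows.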
In what normal form is this relation?
GRADES ( StudentID , Course# , Semester# , Grade)
Suppose you are given a relation R = (A, B, C, D, E) with the following functional dependencies: {CE --> D, D --> B, C --> A}.
a. Find all candidate keys.
b. Identify the best normal form that R satisfies
( 1NF , 2NF , 3NF )
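An answer to part (a) can be checked with a brute-force sketch: compute the closure of every attribute subset, smallest first, and keep the minimal ones whose closure is all of R. The function names closure and candidate_keys are hypothetical, and the search is exponential in the number of attributes, so it only suits small exercises like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            # Supersets of a key already found are superkeys, not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# The dependencies from the exercise: CE --> D, D --> B, C --> A.
fds = [
    (frozenset("CE"), frozenset("D")),
    (frozenset("D"), frozenset("B")),
    (frozenset("C"), frozenset("A")),
]
keys = candidate_keys("ABCDE", fds)
print([sorted(k) for k in keys])
```

For part (b), compare each dependency's left-hand side with the keys found: a non-prime attribute determined by a proper part of a key is a partial dependency.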
What is normalization?
A relational database is composed of tables that contain related data. The process of organizing this data efficiently is called normalization.
What is 1NF (first normal form)?
The domain of an attribute must include only atomic (simple, indivisible) values.
What is 2NF?
A relation schema R is in 2NF if it is in 1NF and every non-prime attribute A in R is fully functionally dependent on the primary key.
What is 3NF?
A relation schema R is in 3NF if it is in 2NF and for every FD X --> A either of the following is true:
X is a Super-key of R.
A is a prime attribute of R.
In other words, if every non prime attribute is non-transitively dependent on primary key.
NORMAL FORM: TEST and REMEDY
1NF
Test: Relation should have no non-atomic attributes or nested relations
Remedy: Form new relations for each non-atomic attribute or nested relation
2NF
Test: For relations where the primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key
Remedy: Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it
3NF
Test: Relation should not have a non-key attribute functionally determined by another non-key attribute. There should be no transitive dependency of a non-key attribute on the primary key
Remedy: Decompose and set up a relation that includes the non-key attribute(s) that functionally determine other non-key attribute(s)
Conceptual Design:
Patient (NID, Name, Age, {ContactDetails (Address, {Telephone})}, Ward, WardInCharge, WardLocation)
Convert this relation into 1st Normal Form, 2nd Normal Form and 3rd Normal Form
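BCNF
Simpler form of 3NF
Stricter than 3NF
Every relation in BCNF is also in 3NF
A relation in 3NF is not necessarily in BCNF
A relation R is in BCNF if, for every functional dependency X --> Y that holds in R, at least one of the following is true:
X --> Y is trivial (i.e., Y ⊆ X)
X is a superkey for R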
R = (A, B, C)
F = {A --> B, B --> C}
Key = {A}
R is not in BCNF (B --> C holds, but B is not a superkey)
Decomposition: R1 = (A, B), R2 = (B, C)
R1 and R2 are in BCNF
The decomposition is a lossless join
The decomposition is dependency preserving
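The BCNF example can be verified with a short sketch. The function names closure, bcnf_violations and lossless_binary are hypothetical. One simplification is assumed: for a sub-relation, only the given dependencies that lie entirely inside it are checked, which is enough for this example but is not a complete test in general (that would need the dependencies projected from the full closure of F).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the given FDs inside relation that are neither trivial nor keyed."""
    relation = set(relation)
    violations = []
    for lhs, rhs in fds:
        if not (lhs | rhs) <= relation:
            continue  # this FD mentions attributes outside the relation
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= relation
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


def lossless_binary(r1, r2, fds):
    """A split into r1 and r2 is lossless if their common attributes determine one of them."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


# F = {A --> B, B --> C} on R = (A, B, C).
fds = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("C")),
]
violations_r = bcnf_violations("ABC", fds)
violations_r1 = bcnf_violations("AB", fds)
violations_r2 = bcnf_violations("BC", fds)
lossless = lossless_binary("AB", "BC", fds)
```

The decomposition is also dependency preserving because each dependency in F falls wholly inside one of the two new relations: A --> B in R1 and B --> C in R2.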
PROPERTY_ID LOCATION PROVINCE AREA PRICE TAX_RATE
Patient No  Patient Name  Appointment Id  Time   Doctor
1           John          1               09:00  Zorro
2           Kerr          0               09:00  Killer
3           Adam          0               10:00  Zorro
4           Robert        1               13:00  Killer
5           Zane          0               14:00  Zorro
Patno --> PatName
Patno,appNo --> Time,doctor
Time --> appNo
Grade_report (
StudNo, StudName,
Grade (
Major,
Advisor,
Grade (
CourseNo,
Ctitle,
InstrucName,
InstructLocn,
Grade
)
)
)