NORMALIZATION

Relation P:
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12.0     London
P2   Bolt    Green   17.0     Paris
P3   Screw   Blue    17.0     Rome
P4   Screw   Red     14.0     London
P5   Cam     Blue    12.0     Paris
P6   Cog     Red     19.0     London

Relation S:
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Black   30       Paris
S4   Clerk   20       London
S5   Adams   30       Athens

Relation SP:
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

Relation SCP:
S#   CITY     P#   QTY   STATUS
S1   London   P1   100   20
S1   London   P2   100   20
S2   Paris    P1   200   10
S2   Paris    P2   200   10
S3   Paris    P2   300   10
S4   London   P2   400   20
S4   London   P4   400   20
S4   London   P5   400   20

• The problem with SCP is redundancy: a supplier's CITY and STATUS are repeated in every tuple for a part that the supplier supplies.
• Relations are always normalized as long as they contain only legal values (every tuple has exactly one value for each attribute).
• Thus we can say that every relation is normalized, i.e., in first normal form (1NF).
• A given relation might be normalized and yet still possess certain undesirable properties.
• The principles of further normalization allow us to recognize such cases and to replace such relations by ones that are more desirable in some way.
• In the case of SCP, these principles tell us how to replace it by two more desirable relations, {S#, CITY, STATUS} and {S#, P#, QTY}.

Normalization
• Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy.
• Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.
• The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Nonloss decomposition
• The normalization procedure involves decomposing a given relation into other relations, and the decomposition is required to be reversible, so that no information is lost.
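The SCP example can be checked mechanically. The following is a minimal Python sketch, assuming a relation is modeled as a (heading, set of tuples) pair; the helper names project, natural_join and same_relation are illustrative and not part of the notes. It projects SCP onto {S#, CITY, STATUS} and {S#, P#, QTY}, joins the two projections over S#, and confirms that SCP comes back unchanged, so the decomposition is nonloss. The last lines are a hypothetical contrast that projects onto {CITY, P#, QTY} instead, so that the join is over CITY.

```python
def project(rel, attrs):
    """Projection: keep only `attrs`; the set body removes duplicate tuples."""
    heading, body = rel
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in body}


def natural_join(r1, r2):
    """Natural join: combine every pair of tuples that agree on all common attributes."""
    (h1, b1), (h2, b2) = r1, r2
    common = [(h1.index(a), h2.index(a)) for a in h1 if a in h2]
    extra = [h2.index(a) for a in h2 if a not in h1]
    body = {t1 + tuple(t2[j] for j in extra)
            for t1 in b1 for t2 in b2
            if all(t1[i] == t2[j] for i, j in common)}
    return h1 + tuple(h2[j] for j in extra), body


def same_relation(r1, r2):
    """Equal headings (as sets) and equal bodies, ignoring attribute order."""
    return set(r1[0]) == set(r2[0]) and project(r2, r1[0])[1] == r1[1]


SCP = (("S#", "CITY", "P#", "QTY", "STATUS"), {
    ("S1", "London", "P1", 100, 20),
    ("S1", "London", "P2", 100, 20),
    ("S2", "Paris", "P1", 200, 10),
    ("S2", "Paris", "P2", 200, 10),
    ("S3", "Paris", "P2", 300, 10),
    ("S4", "London", "P2", 400, 20),
    ("S4", "London", "P4", 400, 20),
    ("S4", "London", "P5", 400, 20),
})

# The two relations suggested in the notes: {S#, CITY, STATUS} and {S#, P#, QTY}.
SUPPLIERS = project(SCP, ("S#", "CITY", "STATUS"))
SHIPMENTS = project(SCP, ("S#", "P#", "QTY"))

print(len(SUPPLIERS[1]), len(SHIPMENTS[1]))  # 4 8: city/status stored once per supplier
print(same_relation(natural_join(SUPPLIERS, SHIPMENTS), SCP))  # True: nonloss

# Hypothetical contrast: joining over CITY instead of S# invents spurious tuples.
BY_CITY = project(SCP, ("CITY", "P#", "QTY"))
lossy = natural_join(SUPPLIERS, BY_CITY)
print(len(lossy[1]), same_relation(lossy, SCP))  # 16 False
```

Joining over CITY pairs every London supplier with every London shipment (and likewise for Paris), which is why the lossy version has 16 tuples instead of 8.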
• Example: relation S (two tuples) and two possible decompositions of it.

S:
S#   STATUS   CITY
S3   30       Paris
S5   30       Athens

(A) Nonloss decomposition into SST and SC:
SST:
S#   STATUS
S3   30
S5   30
SC:
S#   CITY
S3   Paris
S5   Athens

(B) Lossy decomposition into SST and STC:
SST:
S#   STATUS
S3   30
S5   30
STC:
STATUS   CITY
30       Paris
30       Athens

• In (A), joining SST and SC over S# gives back exactly the original S. In (B), joining SST and STC over STATUS yields four tuples, including the spurious (S3, 30, Athens) and (S5, 30, Paris), so we can no longer tell which supplier is located in which city: information has been lost.

[FD diagram: S# --> {SNAME, STATUS, CITY}; {S#, P#} --> QTY]

FIRST, SECOND, THIRD NORMAL FORMS
• (Informal definition)
• Third normal form: A relvar is in 3NF if and only if the nonkey attributes (if any) are
  A. mutually independent, and
  B. irreducibly dependent on the primary key.
• (Even more informal definition)
• Third normal form: A relvar is in 3NF if and only if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way.

First NF
• A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute.
• Relvar FIRST {S#, STATUS, CITY, P#, QTY}, primary key {S#, P#}.

[FD diagram of FIRST: {S#, P#} --> QTY; S# --> CITY; S# --> STATUS; CITY --> STATUS]

Unnormalized form (one row per supplier, with a repeating group of P#/QTY pairs):
S#   STATUS   CITY     (P#, QTY) pairs
S1   20       London   (P1, 300) (P2, 200) (P3, 400) (P4, 200) (P5, 100) (P6, 100)
S2   10       Paris    (P1, 300) (P2, 400)
S3   10       Paris    (P2, 200)
S4   20       London   (P2, 200) (P4, 300) (P5, 400)

FIRST:
S#   STATUS   CITY     P#   QTY
S1   20       London   P1   300
S1   20       London   P2   200
S1   20       London   P3   400
S1   20       London   P4   200
S1   20       London   P5   100
S1   20       London   P6   100
S2   10       Paris    P1   300
S2   10       Paris    P2   400
S3   10       Paris    P2   200
S4   20       London   P2   200
S4   20       London   P4   300
S4   20       London   P5   400

Example (unnormalized faculty table, with a repeating group of subjects):
Faculty code   Faculty name   Date of birth   Subject   Hours
100            Yogesh         17/07/64        DSA       16
                                              SS        8
                                              IS        12
101            Amit           24/12/72        MIS       16
                                              PM        8
                                              IS        8
102            Omprakash      03/02/80        PWRC      8
                                              PCOM      8
                                              IP        16
103            Nitin          28/11/66        DT        10
                                              PCOM      8
                                              SS        8

The same data in 1NF (one row per faculty member per subject):
Faculty code   Faculty name   Date of birth   Subject   Hours
100            Yogesh         17/07/64        DSA       16
100            Yogesh         17/07/64        SS        8
100            Yogesh         17/07/64        IS        12
101            Amit           24/12/72        MIS       16
101            Amit           24/12/72        PM        8
101            Amit           24/12/72        IS        12
102            Omprakash      03/02/80        PWRC      8
102            Omprakash      03/02/80        PCOM      8
102            Omprakash      03/02/80        IP        16
103            Nitin          28/11/66        DT        10
103            Nitin          28/11/66        PCOM      8
103            Nitin          28/11/66        SS        8

Difficulties with the update operation
• Redundancies in relation FIRST lead to a variety of difficulties with update operations, known as update anomalies:
  – INSERT: we cannot record that a supplier is located in a particular city (for example, that S5 is located in Athens) until that supplier supplies at least one part.
  – DELETE: deleting the only FIRST tuple for a supplier (for example, S3's shipment of P2) also deletes the fact that the supplier is located in Paris.
  – UPDATE: the city of a supplier such as S1 appears in many tuples, so changing it means updating every one of them, with the risk of leaving the data inconsistent.

• Relations SECOND and SP:

SECOND:
S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   10       Paris
S4   20       London
S5   30       Athens

SP:
S#   P#   QTY
S1   P1   100
S1   P2   100
S2   P1   200
S2   P2   200
S3   P2   300
S4   P2   400
S4   P4   400
S4   P5   400

[FD diagram of SECOND and SP: S# --> CITY; S# --> STATUS; CITY --> STATUS (SECOND); {S#, P#} --> QTY (SP)]

Second NF
• A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.
OR
• A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is fully functionally dependent on the whole primary key, and not just on part of it.

Third NF
• A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.
• (That is, there must be no transitive dependency.)

SC:
S#   CITY
S1   London
S2   Paris
S3   Paris
S4   London
S5   Athens

CS:
CITY     STATUS
Athens   30
London   20
Paris    10
Rome     50

[FD diagram of SC and CS: S# --> CITY (SC); CITY --> STATUS (CS)]

BCNF
• Boyce/Codd normal form
• Up to now it was assumed that every relation has just one candidate key.
• Consider what happens when a relation has more than one candidate key.
• 3NF did not adequately deal with the case of a relation that
  – had two or more candidate keys,
  – where the candidate keys were composite, and
  – they overlapped.
• So the original definition of 3NF was replaced by a stronger definition, due to Boyce and Codd, that catered for this case.
• Before explaining BCNF, recall:
  – Determinant (the left-hand side of an FD)
    • In {A} --> {B}, A is called the determinant.
  – Trivial and nontrivial FDs (an FD is trivial if its right-hand side is a subset of its left-hand side)

Definition of BCNF
• Formal definition:
• A relation is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant.
• Informal definition:
  – A relation is in BCNF if and only if every determinant is a candidate key.
  – In other words, the only arrows in the FD diagram are arrows out of candidate keys.
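The 2NF and BCNF tests above can be mechanized once the FDs are written down. This is a minimal Python sketch, assuming FDs are given as (determinant, dependent) pairs of attribute sets; the helper names fd, closure, candidate_keys, partial_dependencies and bcnf_violations are illustrative. The FDs used are S# --> CITY, CITY --> STATUS and (for FIRST) {S#, P#} --> QTY; S# --> STATUS follows from the first two by transitivity. Candidate keys are found by brute force, which is adequate for relations of this size.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names, e.g. fd("S# P#", "QTY")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(heading, fds):
    """Irreducible attribute sets whose closure is the whole heading (brute force)."""
    keys = []
    for size in range(1, len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not irreducible
            if closure(s, fds) == frozenset(heading):
                keys.append(s)
    return keys


def partial_dependencies(heading, fds):
    """2NF test: nonkey attributes determined by a proper subset of a candidate key."""
    keys = candidate_keys(heading, fds)
    key_attrs = frozenset().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                dependents = closure(part, fds) - key_attrs
                if dependents:
                    found.append((frozenset(part), dependents))
    return found


def bcnf_violations(heading, fds):
    """BCNF test: given nontrivial FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != frozenset(heading)]


FIRST = {"S#", "STATUS", "CITY", "P#", "QTY"}
FIRST_FDS = [fd("S#", "CITY"), fd("CITY", "STATUS"), fd("S# P#", "QTY")]
SECOND = {"S#", "STATUS", "CITY"}
SECOND_FDS = [fd("S#", "CITY"), fd("CITY", "STATUS")]

print([sorted(k) for k in candidate_keys(FIRST, FIRST_FDS)])  # [['P#', 'S#']]
print([(sorted(p), sorted(d)) for p, d in partial_dependencies(FIRST, FIRST_FDS)])
# [(['S#'], ['CITY', 'STATUS'])]: FIRST is not in 2NF
print(partial_dependencies(SECOND, SECOND_FDS))  # []: SECOND is in 2NF
print([(sorted(l), sorted(r)) for l, r in bcnf_violations(SECOND, SECOND_FDS)])
# [(['CITY'], ['STATUS'])]: CITY is a determinant but not a candidate key
```

For SC {S#, CITY} with S# --> CITY, and CS {CITY, STATUS} with CITY --> STATUS, bcnf_violations returns an empty list, which is why replacing SECOND by SC and CS removes the problem.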
• Before considering some examples involving more than one candidate key, let us convince ourselves that relations FIRST and SECOND, which were not in 3NF, are not in BCNF either.
• Relation FIRST contains three determinants, {S#}, {CITY} and {S#, P#}; only {S#, P#} is a candidate key, so FIRST is not in BCNF.
• Relation SECOND is also not in BCNF, because the determinant {CITY} in the FD {CITY} --> {STATUS} is not a candidate key.
• Now consider an example with two candidate keys.
• Relation SUPPLIER {sup_no, aadhar_no, sup_name, city}
• Candidate keys: {sup_no} and {aadhar_no}
• Assumption:
  – for all time, every supplier has a unique sup_no and also a unique aadhar_no.

Sup_no   Aadhar_no   Sup_name   City
S1       12483847    John       London
S2       57475688    Devid      Paris
S3       67578488    Prince     Athens
S4       57465663    John       Rome
S5       34344567    Madonna    New York

• So the FD diagram will be:
[FD diagram: sup_no --> {aadhar_no, sup_name, city}; aadhar_no --> {sup_no, sup_name, city}]
• Every determinant (sup_no and aadhar_no) is a candidate key, so SUPPLIER is in BCNF.

4NF
Course table (unnormalized):
Course   Teachers               Texts
DBMS     Devangimam, Tejassir   Henry Korth, Ivan Bayross
Maths    Krishnamam, Patatsir   Vector analysis, Trigonometry

Relation CTX:
Course   Teacher      Text
DBMS     Devangimam   Henry Korth
DBMS     Devangimam   Ivan Bayross
DBMS     Tejassir     Henry Korth
DBMS     Tejassir     Ivan Bayross
Maths    Krishnamam   Vector analysis
Maths    Krishnamam   Trigonometry
Maths    Patatsir     Vector analysis
Maths    Patatsir     Trigonometry

• In relation CTX there are multi-valued dependencies (MVDs):
  – Course ->-> Teacher, and
  – Course ->-> Text.
• MVDs occur when two or more independent multi-valued facts about the same attribute occur within the same table.
• Here the MVD means that:
  – a course does not have a single corresponding teacher,
  – but each course does have a well-defined set of teachers;
  – so, for a given value C of Course (say Maths) and a given value x of Text (say Vector analysis), the set of teachers t matching the pair (C, x) in CTX depends on the value C alone: it makes no difference which particular value of x we choose.
• Problems with CTX:
  – It involves a good deal of redundancy.
  – It leads to update anomalies. For example, to add the information that the DBMS course can be taught by a new teacher, it is necessary to insert two separate tuples, one for each of the two texts.
• The problem is caused by the fact that teachers and texts are completely independent of one another.
  – So CTX can be decomposed into two projections, CT and CX.

CT:
Course   Teacher
DBMS     Devangimam
DBMS     Tejassir
Maths    Krishnamam
Maths    Patatsir

CX:
Course   Text
DBMS     Henry Korth
DBMS     Ivan Bayross
Maths    Vector analysis
Maths    Trigonometry

Definition of 4NF
• A relation R is in fourth normal form (4NF) if and only if both of the following conditions are satisfied:
  – R is already in BCNF.
  – R contains no nontrivial multi-valued dependencies other than those implied by a candidate key (i.e., for every nontrivial MVD A ->-> B in R, A is a superkey).

5NF
• A relation R is in fifth normal form (5NF) if and only if both of the following conditions are satisfied:
  – R is already in 4NF.
  – R cannot be further nonloss decomposed, other than by decompositions implied by its candidate keys.
• In all of the further normal forms discussed so far, nonloss decomposition was achieved by decomposing a single table into two separate tables.
• Nonloss decomposition is possible because the join operator is available as part of the relational model.
• In considering 5NF, we must consider tables for which nonloss decomposition can be achieved only by decomposing them into three or more separate tables.
• Such decomposition is not always possible, as the following example shows.
[Tables for this example (the agent company product table, its projections P1, P2 and P3, and their joins) are not reproduced.]
• Under these circumstances, consider the 'agent company product' table.
• This table can be decomposed into its three projections, P1, P2 and P3, without loss of information.
• If the natural join of P1 and P2 is taken, the result contains a spurious row (marked with an asterisk).
• Now, if this result is joined with P3 over the columns (company, product_name), the spurious row disappears and the original table is obtained.
• The original table, therefore, violated 5NF simply because it was nonloss decomposable into its three projections.
• But now consider the table named 'Agent_Company_product' (not reproduced).
• If we divide it into two tables, P1 and P2, spurious records appear when P1 and P2 are joined.
• But even if it is divided into three tables, the join of all three projections P1, P2 and P3 (company, product_name) still contains a spurious record.
• So it is simply not possible to decompose the 'AGENT_COMPANY_PRODUCT' table, populated as shown, without losing information.
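Both the 4NF and the 5NF discussions come down to one question: is a relation equal to the join of certain of its projections? Below is a minimal Python sketch of that test (helper names are illustrative). The CTX rows are taken from the notes. The agent/company/product rows (A1, C1, X1 and so on) are hypothetical, because the original tables are not available in these notes, and the attribute sets P1 = {AGENT, COMPANY} and P2 = {AGENT, PRODUCT_NAME} are likewise an assumption; only P3 = {COMPANY, PRODUCT_NAME} is given in the text.

```python
from functools import reduce


def project(rel, attrs):
    """Projection of a (heading, body) relation onto `attrs`; the set removes duplicates."""
    heading, body = rel
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in body}


def natural_join(r1, r2):
    """Natural join: combine every pair of tuples that agree on all common attributes."""
    (h1, b1), (h2, b2) = r1, r2
    common = [(h1.index(a), h2.index(a)) for a in h1 if a in h2]
    extra = [h2.index(a) for a in h2 if a not in h1]
    body = {t1 + tuple(t2[j] for j in extra)
            for t1 in b1 for t2 in b2
            if all(t1[i] == t2[j] for i, j in common)}
    return h1 + tuple(h2[j] for j in extra), body


def join_of_projections(rel, attr_sets):
    """Project `rel` onto each attribute set, join the pieces, reorder to rel's heading."""
    joined = reduce(natural_join, [project(rel, attrs) for attrs in attr_sets])
    return project(joined, rel[0])[1]


def is_nonloss(rel, attr_sets):
    """True if `rel` equals the join of its projections (the join dependency holds)."""
    return join_of_projections(rel, attr_sets) == rel[1]


# 4NF example from the notes: CTX and its projections CT and CX.
CTX = (("COURSE", "TEACHER", "TEXT"), {
    ("DBMS", "Devangimam", "Henry Korth"), ("DBMS", "Devangimam", "Ivan Bayross"),
    ("DBMS", "Tejassir", "Henry Korth"), ("DBMS", "Tejassir", "Ivan Bayross"),
    ("Maths", "Krishnamam", "Vector analysis"), ("Maths", "Krishnamam", "Trigonometry"),
    ("Maths", "Patatsir", "Vector analysis"), ("Maths", "Patatsir", "Trigonometry"),
})
print(is_nonloss(CTX, [("COURSE", "TEACHER"), ("COURSE", "TEXT")]))  # True

# 5NF example: HYPOTHETICAL rows and an ASSUMED choice of attributes for P1 and P2.
HEAD = ("AGENT", "COMPANY", "PRODUCT_NAME")
P1, P2, P3 = ("AGENT", "COMPANY"), ("AGENT", "PRODUCT_NAME"), ("COMPANY", "PRODUCT_NAME")
ACP_1 = (HEAD, {("A1", "C1", "X2"), ("A1", "C2", "X1"), ("A2", "C1", "X1"), ("A1", "C1", "X1")})
ACP_2 = (HEAD, {("A1", "C1", "X2"), ("A1", "C2", "X1"), ("A2", "C1", "X1")})

print(join_of_projections(ACP_1, [P1, P2]) - ACP_1[1])  # {('A1', 'C2', 'X2')}: spurious
print(is_nonloss(ACP_1, [P1, P2, P3]))  # True: joining with P3 removes it
print(join_of_projections(ACP_2, [P1, P2, P3]) - ACP_2[1])  # {('A1', 'C1', 'X1')}
print(is_nonloss(ACP_2, [P1, P2, P3]))  # False: no nonloss three-way split
```

ACP_2 has exactly the same three projections as ACP_1, so the three-way join cannot tell them apart; this mirrors the point above that whether the table can be decomposed without loss depends on how it is populated.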