Relational Database Design Relational Database Design RDBMS design issues – Pitfalls and Normalization. Overview of Normal Forms Pitfalls in Relational Database Design Functional Dependencies Decomposition Boyce-Codd Normal Form Third Normal Form Multivalued Dependencies and Fourth Normal Form Overall Database Design Process Database System Concepts 7.2 ©Silberschatz, Korth and Sudarshan The Banking Schema branch = (branch_name, branch_city, assets) customer = (customer_id, customer_name, customer_street, customer_city) loan = (loan_number, amount) account = (account_number, balance) employee = (employee_id. employee_name, telephone_number, start_date) dependent_name = (employee_id, dname) account_branch = (account_number, branch_name) loan_branch = (loan_number, branch_name) borrower = (customer_id, loan_number) depositor = (customer_id, account_number) cust_banker = (customer_id, employee_id, type) works_for = (worker_employee_id, manager_employee_id) payment = (loan_number, payment_number, payment_date, payment_amount) savings_account = (account_number, interest_rate) checking_account = (account_number, overdraft_amount) Database System Concepts 7.3 ©Silberschatz, Korth and Sudarshan Combine Schemas? Suppose we combine borrower and loan to get bor_loan = (customer_id, loan_number, amount ) Result is possible repetition of information (L-100 in example below) Database System Concepts 7.4 ©Silberschatz, Korth and Sudarshan A Combined Schema Without Repetition Consider combining loan_branch and loan loan_amt_br = (loan_number, amount, branch_name) No repetition (as suggested by example below) Database System Concepts 7.5 ©Silberschatz, Korth and Sudarshan What About Smaller Schemas? Suppose we had started with bor_loan. How would we know to split up (decompose) it into borrower and loan? Write a rule “if there were a schema (loan_number, amount), then loan_number would be a candidate key” Denote as a functional dependency: loan_number amount In bor_loan, because loan_number is not a candidate key, the amount of a loan may have to be repeated. This indicates the need to decompose bor_loan. Not all decompositions are good. Suppose we decompose employee into employee1 = (employee_id, employee_name) employee2 = (employee_name, telephone_number, start_date) The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition. Database System Concepts 7.6 ©Silberschatz, Korth and Sudarshan A Lossy Decomposition Database System Concepts 7.7 ©Silberschatz, Korth and Sudarshan Pitfalls in Relational Database Design Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to Repetition of Information. Inability to represent certain information. Design Goals: Avoid redundant data Ensure that relationships among attributes are represented Facilitate the checking of updates for violation of database integrity constraints. Database System Concepts 7.8 ©Silberschatz, Korth and Sudarshan RDBMS Design issues So far we have assumed that attributes are grouped to form a relation schema by using the common sense of database designer or by mapping a schema defined by ER model. We still need some formal measure of why one grouping of attributes into a relation schema may be better than another. Unsatisfactory relation schemas that do not meet certain conditions – the normal form tests – are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties. Thus, the normalization procedure provides database designers with; A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree. Database System Concepts 7.9 ©Silberschatz, Korth and Sudarshan First Normal Form A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. These criteria are basically concerned with ensuring that the table is a faithful representation of a relation and that it is free of repeating groups. Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of atomicity. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions).“ Meaning a field should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same field. Database System Concepts 7.10 ©Silberschatz, Korth and Sudarshan First Normal Form Domain is atomic if its elements are considered to be indivisible units Examples of non-atomic domains: Set of names, composite attributes Identification numbers like CS101 that can be broken up into parts A relational schema R is in first normal form if the domains of all attributes of R are atomic Non-atomic values complicate storage and encourage redundant (repeated) storage of data Example: Set of accounts stored with each customer, and set of owners stored with each account We assume all relations are in first normal form (and revisit this again!) Database System Concepts 7.11 ©Silberschatz, Korth and Sudarshan First Normal Form (Cont’d) Atomicity is actually a property of how the elements of the domain are used. Example: Strings would normally be considered indivisible Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. Doing so is a bad idea: leads to encoding of information in application program rather than in the database. Database System Concepts 7.12 ©Silberschatz, Korth and Sudarshan Goal — Devise a Theory for the Following Decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that each relation is in good form the decomposition is a lossless-join decomposition Our theory is based on: functional dependencies multivalued dependencies Database System Concepts 7.13 ©Silberschatz, Korth and Sudarshan Overview of Normal Forms 1NF ( First Normal Form) To understand 2NF 3NF Concept of FD’s ( Functional Dependency) required BCNF To understand 4NF 5NF Concept of MVD (Multi Valued Dependency) is required Database System Concepts 7.14 ©Silberschatz, Korth and Sudarshan Normal Forms & FDs: Review Unnormalized – There are multivalued attributes or repeating groups 1 NF – No composite attributes or repeating groups, 2 NF – 1 NF plus no partial dependencies 3 NF – 2 NF plus no transitive dependencies Database System Concepts 7.15 ©Silberschatz, Korth and Sudarshan Normalization The basic objective of normalization is to reduce the various anomalies in the database. Normalization can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of ; Minimizing redundancy Minimizing the insertion, deletion, and update anomalies. Unsatisfactory relation schemas that do not meet certain conditions – the normal form tests – are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties. Thus, the normalization procedure provides database designers with; A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree. Database System Concepts 7.16 ©Silberschatz, Korth and Sudarshan Normalization… The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized. Normal forms when considered in isolation from other factors, do not guarantee a good database design. It is generally not sufficient to check separately that each relation schema in the database is, say, in BCNF or 3NF. Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relation schemas, taken together should possess; The Lossless join, The dependency preservation property, which ensures that each functional dependency is represented in some individual relations resulting after decomposition. Database System Concepts 7.17 ©Silberschatz, Korth and Sudarshan RDBMS design RDBMS design involves checking the current design through Normal Form Test. If the design is not in desired Normal form, then Decompose the Relations (Tables) into smaller ones Fulfill the properties of decomposition. Properties of decomposition Functional dependency preservation – Identify the FDs in the given Relation schema – Apply Armstrong Axioms to find set of all FDs (Closure) Database System Concepts 7.18 ©Silberschatz, Korth and Sudarshan Overview To understand 2NF 3NF FD’s required BCNF FD’s & Closure of FDs required To understand 4NF 5NF Database System Concepts MVD is required 7.19 ©Silberschatz, Korth and Sudarshan First Normal Form Domain is atomic if its elements are considered to be indivisible units Examples of non-atomic domains: Set of names, composite attributes Identification numbers like CS101 that can be broken up into parts A relational schema R is in first normal form if the domains of all attributes of R are atomic Non-atomic values complicate storage and encourage redundant (repeated) storage of data Database System Concepts 7.20 ©Silberschatz, Korth and Sudarshan Keys & Functional Dependency A Key Attributes is special attribute which can uniquely identify the record for Entity A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity. Ex. Super key : ( customer-id, Customer-name) A candidate key of an entity set is a minimal super key Customer-id is candidate key of customer account-number is candidate key of account Although several candidate keys may exist, one of the candidate keys is selected to be the primary key. Functional Dependencies shows us Inter – relation among attributes 21 Database System Concepts 7.21 ©Silberschatz, Korth and Sudarshan Functional Dependencies and Keys A FD is a generalization of the notion of a key. How ? If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y X - Y For Example in Relation schema Student (sid, name, supervisor_id, specialization), we write the functional dependency: {sid} {name, supervisor_id, specialization} The sid determines all attributes (i.e., the entire record) If two tuples in the relation student have the same sid, then they must have the same values on all attributes. In other words they must be the same tuple (since the relational modes does not allow duplicate records) Database System Concepts 7.22 ©Silberschatz, Korth and Sudarshan Functional Dependency Functional Dependency : Inter – relation among attributes of an entity. Let R be a relation on the relation scheme R, then R satisfies the functional – dependency X - Y If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. ( X determines Y Y functionally dependent on X ) FD’s can be used to group the attributes into Relation-scheme, which is in a particular Normal forms Approach of Normalization : Step I : Test the given Relation scheme for any Normal form Step II : If the given R-scheme is not in any N.F. (2NF/3NF) Step III : Decompose the Relation-scheme into small Relationschema. Step IV : Test the decomposed schema for Normal forms. Database System Concepts 7.23 ©Silberschatz, Korth and Sudarshan FDs example Database System Concepts 7.24 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.25 ©Silberschatz, Korth and Sudarshan Functional Dependency Inter – relation among attributes of an entity. Let R be a relation on the relation scheme R, then R satisfies the functional – dependency X - Y If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. X determines Y Y functionally dependent on X FDs allows us to express constraints that we can not express with Superkeys ! FD’s can be used to group the attributes into Relationscheme, which is in a particular Normal forms Database System Concepts 7.26 ©Silberschatz, Korth and Sudarshan FDs Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loan may, by chance, satisfy amount customer_name. Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: bor_loan = (customer_id, loan_number, amount , customer_name). We expect this functional dependency to hold: loan_number amount but would not expect the following to hold: amount customer_name Database System Concepts 7.27 ©Silberschatz, Korth and Sudarshan How to identify FDs Identification of correct FDs are key to efficient DBMS Design . There are many commonly committed errors in identifying FDs. R satisfies the functional – dependency X Y If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. X determines Y While identifying FDs we should ensure that each attribute in X uniquely determines and not that some tuples satisfy it ! Database System Concepts 7.28 ©Silberschatz, Korth and Sudarshan Common pitfalls in identifying FDs Design Goals & Normalization Design Anomalies Identifying FDs : When & How ? Procedure of FDs identifications ? Pitfalls in FDs identification Limitations with Normal Forms Decompose the Relation Scheme FDs to be taken care of while decomposition Dependency preservation, Lossless join. Database System Concepts 7.29 ©Silberschatz, Korth and Sudarshan Procedure to identify FDs How to Identify the FDs in Scheme. We start finding any two tuples with the same X value and then Y values in these tuples must be same. Repeat these procedure until all such pairs of tuples with the same X values are examined with Y values. Then verify that whether R in real world scenario satisfies X Y or not ? Should FDs be identified only by observing the tuples/data ? Analyse the FDs and see whether it will work ? Database System Concepts 7.30 ©Silberschatz, Korth and Sudarshan Common pitfalls in FDs in the Schema Whether the FDs Course Professor Professor Course Satisfied in the following Schema ? Why so ? If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. Then verify that whether R in real world scenario satisfies Professor -> Course or not ? Can same value for Professor attribute may have more than one values for Course in real world scenario? DBMS Design & Normalization : Step I : Test the given Relation scheme for any Normal form Step II : Identify & Analyse the FDs of the given Scheme. Step III If the given R-scheme is not in any N.F. (2NF/3NF) then Step IV. Step IV: Decompose the Relation-scheme into small Relation-schema. Step V : Test the decomposed schema for Normal forms. Database System Concepts 7.32 ©Silberschatz, Korth and Sudarshan Normalization Using Functional Dependencies When we decompose a relation schema R with a set of functional dependencies F into R1, R2,.., Rn we want Lossless-join decomposition: Otherwise decomposition would result in information loss. No redundancy: The relations Ri preferably should be in either BoyceCodd Normal Form or Third Normal Form. Dependency preservation: Let Fi be the set of dependencies F+ that include only attributes in Ri. Preferably the decomposition should be dependency preserving, that is, (F1 F2 … Fn)+ = F+ Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive. Database System Concepts 7.33 ©Silberschatz, Korth and Sudarshan Functional Dependencies (Cont.) A functional dependency is trivial if it is satisfied by all instances of a relation Example: customer_name, loan_number customer_name customer_name customer_name In general, is trivial if Loss less Join : so there is no information loss after decomposition Dependency Preservation : Ensures that each functional dependency is represented in some relation after decomposition. Database System Concepts 7.34 ©Silberschatz, Korth and Sudarshan Closure of a Set of Functional Dependencies Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. E.g. If A B and B C, then we can infer that A C The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F+. We can find all of F+ by applying Armstrong’s Axioms: if , then (reflexivity) if , then (augmentation) if , and , then (transitivity) These rules are sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold). Database System Concepts 7.35 ©Silberschatz, Korth and Sudarshan Example R = (A, B, C, G, H, I) F={ AB AC CG H CG I B H} some members of F+ AH by transitivity from A B and B H AG I by augmenting A C with G, to get AG CG and then transitivity with CG I CG HI from CG H and CG I : “union rule” can be inferred from – definition of functional dependencies, or – Augmentation of CG I to infer CG CGI, augmentation of CG H to infer CGI HI, and then transitivity Database System Concepts 7.36 ©Silberschatz, Korth and Sudarshan Functional Dependencies (Cont.) K is a superkey for relation schema R if and only if K R K is a candidate key for R if and only if K R, and for no K, R Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: bor_loan = (customer_id, loan_number, amount ). We expect this functional dependency to hold: loan_number amount but would not expect the following to hold: amount customer_name Database System Concepts 7.37 ©Silberschatz, Korth and Sudarshan 1NF 1NF :- A Relation scheme is said to be in 1NF .if there are no composite attributes, and every attribute is having atomic or indivisible values. 2NF :- A Relation is said to be in 2NF; 1. IF it is in 1NF and 2. Non –key attribute are functionally-dependent on the key – attributes. further , if the key has more than one attribute, then no non key attributes should be functionally dependent upon a part of the key attributes. Database System Concepts 7.38 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.39 ©Silberschatz, Korth and Sudarshan Design Goals & Pitfalls with FDs The basic objective of normalization is to reduce the various anomalies in the database. Normalization can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of ; Minimizing redundancy Minimizing the insertion, deletion, and update anomalies. How to identify FDs and prime / nonprime attributes? An Attribute A in a relation is called a prime attribute if A is a part of any candidate key of the relation. There are some pitfalls in FDs identifying ! Database System Concepts 7.40 ©Silberschatz, Korth and Sudarshan Decomposition The bad design suggests that we should decompose that table. It is generally not sufficient to check separately that each relation schema in the database is, say, in 2NF or 3NF. Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relation schemas, taken together should possess; The Lossless join, The dependency preservation property, which ensures that each functional dependency is represented in some individual relations resulting after decomposition. Database System Concepts 7.41 ©Silberschatz, Korth and Sudarshan How to identify Normal Forms Unnormalized – There are multivalued attributes or repeating groups, No Key attributes 1NF 2NF 3NF BCNF Concept of FD’s ( Functional Dependency) required What are the Key Attributes ? What are the Non-Key Attributes ? Find out Functional Dependencies ? Database System Concepts 7.42 ©Silberschatz, Korth and Sudarshan Second Normal Form 2NF :- A Relation is said to be in 2NF; 1. IF it is in 1NF and 2. Non –key attribute are functionally-dependent on the key – attributes. further ,if the key has more than one attribute ,then no non key attributes should be functionally dependent upon a part of the key attributes. Database System Concepts 7.43 ©Silberschatz, Korth and Sudarshan 2 NF Cont… Key attribute B C A D E Entity Nonkey attribute F G Partial Dependency is not allowed Why ? What is the effect of allowing partial dependency ? Is Second Normal Form capable to give good Database implementation ? Database System Concepts 7.44 ©Silberschatz, Korth and Sudarshan First Normal Form A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. These criteria are basically concerned with ensuring that the table is a faithful representation of a relation and that it is free of repeating groups. Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of atomicity. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." First Normal Form Scheme should be such that in a given table there should be No Composite Attributes No Repeating groups and All the attributes can be uniqueky identified by Key Attributes Database System Concepts 7.45 ©Silberschatz, Korth and Sudarshan Normal Forms & FDs: Review Unnormalized – There are multivalued attributes or repeating groups 1 NF – No composite attributes or repeating groups, 2 NF – 1 NF plus no partial dependencies 3 NF – 2 NF plus no transitive dependencies Database System Concepts 7.46 ©Silberschatz, Korth and Sudarshan Normal Form Test for TEACHES Let us consider Teaches Relation for testing whether it fulfills any Normal Form ? What are the FDs in this scheme ? TEACHES contains attributes Professor, Course, Room, Room_Cap, Enrol_Lmt (Enrolment Limit). The relation scheme for the relation TEACHES is (Prof, Course, Room, Room_Cap,Enrol_lmt) The domain the attribute Prof is all the faulty members of the university. The domain of the attribute Course is the courses offered by the university. The domain of Room is the rooms in the buildings of the university. The domain of Room_Cap is an integer value indicating the seating capacity of the room. Database System Concepts 7.47 ©Silberschatz, Korth and Sudarshan Design - Analysis Go for Normal Form Tests ? The Teaches relation is in First Normal Form or not ? Identify the FDs in Scheme. Should FDs be identified only by observing the tuples/data ? Analyse the FDs and see how will it work ? Go for higher Normal Form Tests ? What is the remedy of the problem ? Database System Concepts 7.48 ©Silberschatz, Korth and Sudarshan Teaches Relation First Normal Form What are the Key Attributes ? What are the Non-Key Attributes ? Find out Functional Dependencies ? Are Non –key attribute functionally-dependent on the key – attributes.? Database System Concepts 7.49 ©Silberschatz, Korth and Sudarshan FDs in Scheme The domain of the Enrlo_Lmt is also integer value and should be less than or equal to the corresponding value for Room_Cap. The TEACHES relation is in first normal form since it’s attributes contain only atomic values and there are key attributes uniquely identifying record. FDs ? The course is scheduled in a given room and each Course uniquely identifies Room Course Room since the room has the given maximum number of available seats, there is a functional dependency Room Room_Cap And hence from transitivity Course Room Room_Cap Thus the functional dependencies in this relation are {Course (Prof, Room, Room_Cap, Enrol_Lmt), Room Room_Cap} Database System Concepts 7.50 ©Silberschatz, Korth and Sudarshan FDs Analysis Database System Concepts 7.51 ©Silberschatz, Korth and Sudarshan Analysis of Scheme There is an other transitive dependency (Here we assume that Enrol_Lmt is the upper limit on registration for a course and is based solely on the room capacity) Room Room_Cap Enrol_Lmt The presence of these transitive dependency in TEACHES will cause the following problems: The capacity of a room cannot be entered in the database unless a course is scheduled in the room ( Insert Anamoly)and The capacity of a room in which only one course is scheduled will be deleted if the only course scheduled in that room is deleted. ( Delete Anamoly) Because the same room can appear more than once in the database, there could be inconsistencies between the multiple occurrences of the attribute pair Room and Room_Cap. Database System Concepts 7.52 ©Silberschatz, Korth and Sudarshan Why Anomalies? TEACHES ( Professor, Course, Room, Room_Cap, Enrol_Lmt ) Anamolies occur in the given TEACHES scheme because of partial dependencies; which means Non-key attributes are NOT functionally dependent on Key attributes ! This violates the condition of Second Normal form ! Thus the given relation is in First Normal Form only. We have to Decompose the relation . Database System Concepts 7.53 ©Silberschatz, Korth and Sudarshan Second Normal Form 2NF :- A Relation is said to be in 2NF; 1. IF it is in 1NF and 2. Non –key attribute are functionally-dependent on the key – attributes. further ,if the key has more than one attribute ,then no non key attributes should be functionally dependent upon a part of the key attributes. Database System Concepts 7.54 ©Silberschatz, Korth and Sudarshan Normal Forms & Decomposition Case Study of TEACHES Relation First Normal Form? Anomalies in 1NF Decomposition for 2NF Is 2NF without anomalies? Decomposition for 3NF Decomposition principles Definition of 3NF Need for Closure Armstrong Axiom for Closure Database System Concepts 7.55 ©Silberschatz, Korth and Sudarshan Decomposition The bad design suggests that we should decompose that table. It is generally not sufficient to check separately that each relation schema in the database is, say, in 2NF or 3NF. Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relation schemas, taken together should possess; The Lossless join, The dependency preservation property, which ensures that each functional dependency is represented in some individual relations resulting after decomposition. Database System Concepts 7.56 ©Silberschatz, Korth and Sudarshan First Normal Form Database System Concepts 7.57 ©Silberschatz, Korth and Sudarshan Procedure to identify FDs How to Identify the FDs in Scheme. We start finding any two tuples with the same X value and then Y values in these tuples must be same. Repeat these procedure until all such pairs of tuples with the same X values are examined with Y values. Then verify that whether R in real world scenario X Y satisfies or not ? Should FDs be identified only by observing the tuples/data ? Analyse the FDs and see whether it will work ? Database System Concepts 7.58 ©Silberschatz, Korth and Sudarshan Decomposition into 2NF The Decomposed Relation does not have Partial Dependency. Thus the Schema is now in Second Normal Form . Is Second Normal Form capable to remove all anomalies and give good Database implementation ? Database System Concepts 7.59 ©Silberschatz, Korth and Sudarshan No Anomaly ? Although 2 NF does not give any Partial dependency but there can be other possibilities of dependencies, There is a interrelation join dependency between COURSE_DETAILS and ROOM_DETAILS to enforce the constraint that the Enrol_Lmt be less than or equal to the Room_Cap. Relation COURSE_DETAILS has a Transitive dependency Course Room Enrol_Lmt We can not insert a Room value in the relation alongwith an Enrol_Lmt value unless we have a Course value to go alonwith Room value. Non-key attribute is functionally dependent on another non key attribute.! Database System Concepts 7.60 ©Silberschatz, Korth and Sudarshan Further decomposition Database System Concepts 7.61 ©Silberschatz, Korth and Sudarshan Third Normal Form : Third Normal Form : A relation scheme is said to be in 3 NF if ; It is in 2 NF. No non-key attribute is functionally-dependent upon any Non- key attribute. Thus, there should be no transitive-dependency of a non-key attribute on the primary key. 3 NF Scheme does not allow partial dependencies like in 2NF but additionally 3NF does not allow any transitive dependencies. A relation scheme R < S, F > is in 3 NF if for all non-trivial Functional dependencies in F + (closure of F) of the form X A, either X contains a key (i.e. X is a super-key) or A is a prime attribute Database System Concepts 7.62 ©Silberschatz, Korth and Sudarshan Normal Forms & FDs: Review Unnormalized – There are multivalued attributes or repeating groups and record can not be uniquely identified by Primary key 1 NF – No composite attributes or repeating groups, 2 NF – 1 NF plus no partial dependencies 3 NF – 2 NF plus no transitive dependencies Database System Concepts 7.63 ©Silberschatz, Korth and Sudarshan Dependencies Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part). As we can see Course_ID and Units (credits) are not uniquely identified by Stud_ID and Name STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.64 ©Silberschatz, Korth and Sudarshan Dependencies Type Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key. Name is non-key attribute which is determined by Cust_ID and does not require the other part of primary key namely Order_ID Partial Dependency CUSTOMER Database System Concepts Cust_ID Name Order_ID 101 AT&T 1234 101 AT&T 156 125 Cisco 1250 7.65 ©Silberschatz, Korth and Sudarshan Dependencies types Transitive Dependency – when a non-key attribute determines another non-key attribute. Employee is the relation in which Emp_ID is the key attribute Dept_ID is a non-key attribute and also the Dept_Name is non-key attribute. Transitive Dependency EMPLOYEE Emp_ID F_Name L_Name Dept_ID Dept_Name 111 Mary Jones 1 Acct 122 Sarah Smith 2 Mktg Database System Concepts 7.66 ©Silberschatz, Korth and Sudarshan Example 1: Determine NF with FDs There is no composite attributes. All attributes are directly or indirectly determined by the primary key;, thus the relation is at least in 1 NF ISBN Title ISBN Publisher Publisher Address BOOK ISBN Database System Concepts Title Publisher 7.67 Address ©Silberschatz, Korth and Sudarshan Example 1: Determine NF ISBN Title ISBN Publisher Publisher Address The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there can’t be partial dependencies. Therefore, the relation is at least in 2NF BOOK ISBN Database System Concepts Title Publisher 7.68 Address ©Silberschatz, Korth and Sudarshan Example 1: Determine NF ISBN Title ISBN Publisher Publisher Address Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3 NF. BOOK ISBN Database System Concepts Title Publisher 7.69 Address ©Silberschatz, Korth and Sudarshan Example 1: Determine NF ISBN Title ISBN Publisher Publisher Address We know that the relation is at least in 2NF, and it is not in 3 NF. Therefore, we conclude that the relation is in 2NF. BOOK ISBN Database System Concepts Title Publisher 7.70 Address ©Silberschatz, Korth and Sudarshan Example 1: Determine NF ISBN Title ISBN Publisher Publisher Address In our solution we will write the following justification: 1) No Composite attributes, therefore at least 1NF 2) No partial dependencies, therefore at least 2NF 3) There is a transitive dependency (Publisher Address), therefore, not 3NF Conclusion: The relation is in 2NF BOOK ISBN Database System Concepts Title Publisher 7.71 Address ©Silberschatz, Korth and Sudarshan Example 2: Determine NF Product_ID Description All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1 NF ORDER Order_No Database System Concepts Product_ID 7.72 Description ©Silberschatz, Korth and Sudarshan Example 2: Determine NF Product_ID Description The relation is at least in 1NF. There is a COMPOSITE Primary Key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of PK, determines Description; hence, there is a partial dependency. Therefore, the relation is not 2NF. No sense to check for transitive dependencies! ORDER Order_No Database System Concepts Product_ID 7.73 Description ©Silberschatz, Korth and Sudarshan Example 2: Determine NF Product_ID Description We know that the relation is at least in 1NF, and it is not in 2 NF. Therefore, we conclude that the relation is in 1 NF. ORDER Order_No Database System Concepts Product_ID 7.74 Description ©Silberschatz, Korth and Sudarshan Example 2: Determine NF Product_ID Description In your solution you will write the following justification: 1) No M/V attributes, therefore at least 1NF 2) There is a partial dependency (Product_ID Description), therefore not in 2NF Conclusion: The relation is in 1NF ORDER Order_No Database System Concepts Product_ID 7.75 Description ©Silberschatz, Korth and Sudarshan Example 3: Determine NF Comp_ID and No are not determined by the primary key; therefore, the relation is NOT in 1 NF. No sense in looking at partial or transitive dependencies. Part_ID Description Part_ID Price Part_ID, Comp_ID No PART Part_ID Database System Concepts Descr Price 7.76 Comp_ID No ©Silberschatz, Korth and Sudarshan Example 3: Determine NF Part_ID Description Part_ID Price Part_ID, Comp_ID No In your solution you will write the following justification: 1) There are M/V attributes; therefore, not 1NF Conclusion: The relation is not normalized. PART Part_ID Database System Concepts Descr Price 7.77 Comp_ID No ©Silberschatz, Korth and Sudarshan Bringing a Relation to 1NF STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.78 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 1NF Option 1: Make a determinant of the repeating group (or the multivalued attribute) a part of the primary key. Composite Primary Key STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.79 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 1NF Option 2: Remove the entire repeating group from the relation. Create another relation which would contain all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group will comprise a primary key. STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.80 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 1NF STUDENT Stud_ID Name 101 Lennon 125 Jonson STUDENT_COURSE Stud_ID Course Units 101 MSI 250 3 101 MSI 415 3 125 MSI 331 3 Database System Concepts 7.81 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 2NF Composite Primary Key STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.82 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 2NF Goal: Remove Partial Dependencies Partial Dependencies Composite Primary Key STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.83 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 2NF Remove attributes that are dependent from the part but not the whole of the primary key from the original relation. For each partial dependency, create a new relation, with the corresponding part of the primary key from the original as the primary key. STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.84 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 2NF CUSTOMER STUDENT_COURSE Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Stud_ID Course_ID 101 MSI 250 101 MSI 415 125 MSI 331 COURSE STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Database System Concepts 7.85 ©Silberschatz, Korth and Sudarshan Bringing a Relation to 3NF Goal: Get rid of transitive dependencies. Transitive Dependency EMPLOYEE Emp_ID F_Name L_Name 111 Mary Jones 1 Acct 122 Sarah Smith 2 Mktg Database System Concepts 7.86 Dept_ID Dept_Name ©Silberschatz, Korth and Sudarshan Bringing a Relation to 3NF Remove the attributes, which are dependent on a non-key attribute, from the original relation. For each transitive dependency, create a new relation with the non-key attribute which is a determinant in the transitive dependency as a primary key, and the dependent non-key attribute as a dependent. EMPLOYEE Emp_ID F_Name L_Name 111 Mary Jones 1 Acct 122 Sarah Smith 2 Mktg Database System Concepts 7.87 Dept_ID Dept_Name ©Silberschatz, Korth and Sudarshan Bringing a Relation to 3NF EMPLOYEE Emp_ID F_Name L_Name Dept_ID Dept_Name 111 Mary Jones 1 Acct 122 Sarah Smith 2 Mktg EMPLOYEE Emp_ID F_Name L_Name Dept_ID 111 Mary Jones 1 122 Sarah Smith 2 DEPARTMENT Dept_ID Dept_Name Database System Concepts 1 Acct 2 Mktg 7.88 ©Silberschatz, Korth and Sudarshan Decomposition The bad design suggests that we should decompose that table. It is generally not sufficient to check separately that each relation schema in the database is, say, in 2NF or 3NF. Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relation schemas, taken together should possess; The Lossless join, The dependency preservation property, which ensures that each functional dependency is represented in some individual relations resulting after decomposition. Database System Concepts 7.89 ©Silberschatz, Korth and Sudarshan Boyce-Codd Normal Form A relation scheme R < S, F > is in Boyce Codd Normal Form if for every non-trivial FD in F+ of the form X A Where X subset S A belongs S X is Superkey of R “ BCNF is stronger than 3 NF ” How ? If a relation is in BCNF then it must be in 3NF, but it is not necessary that if it is in BCNF then it should be in 3 NF as well. Database System Concepts 7.90 ©Silberschatz, Korth and Sudarshan Example R = (A, B, C, G, H, I) F={ AB AC CG H CG I B H} some members of F+ AH by transitivity from A B and B H AG I by augmenting A C with G, to get AG CG and then transitivity with CG I CG HI from CG H and CG I : “union rule” can be inferred from – definition of functional dependencies, or – Augmentation of CG I to infer CG CGI, augmentation of CG H to infer CGI HI, and then transitivity Database System Concepts 7.91 ©Silberschatz, Korth and Sudarshan Boyce-Codd Normal Form A relation scheme R < S, F > is in Boyce Codd Normal Form if for every non-trivial FD in F+ of the form X A Where X subset S A belongs S X is Superkey of R “ BCNF is stronger than 3 NF ” How ? If a relation is in BCNF then it must be in 3NF, but it is not necessary that if it is in BCNF then it should be in 3 NF as well. Database System Concepts 7.92 ©Silberschatz, Korth and Sudarshan BCNF A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form , where R and R, at least one of the following holds: is trivial (i.e., ) is a superkey for R Boyce-Codd normal form (or BCNF) requires that there are no non-trivial functional dependencies of attributes on something else than a superset of a candidate key. At this stage, all attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A). Database System Concepts 7.93 ©Silberschatz, Korth and Sudarshan Overview To understand 2NF 3NF BCNF FD’s required To understand 4NF 5NF Database System Concepts MVD is required 7.94 ©Silberschatz, Korth and Sudarshan Normalization Using Functional Dependencies When we decompose a relation schema R with a set of functional dependencies F into R1, R2,.., Rn we want Lossless-join decomposition: Otherwise decomposition would result in information loss. No redundancy: The relations Ri preferably should be in either BoyceCodd Normal Form or Third Normal Form. Dependency preservation: Let Fi be the set of dependencies F+ that include only attributes in Ri. Preferably the decomposition should be dependency preserving, that is, (F1 F2 … Fn)+ = F+ Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive. Database System Concepts 7.95 ©Silberschatz, Korth and Sudarshan Example R = (A, B, C) F = {A B, B C) R1 = (A, B), R2 = (B, C) Lossless-join decomposition: R1 R2 = {B} and B BC Dependency preserving R1 = (A, B), R2 = (A, C) Lossless-join decomposition: R1 R2 = {A} and A AB Not dependency preserving (cannot check B C without computing R1 Database System Concepts 7.96 R2) ©Silberschatz, Korth and Sudarshan Testing for Dependency Preservation To check if a dependency is preserved in a decomposition of R into R1, R2, …, Rn we apply the following simplified test (with attribute closure done w.r.t. F) result = while (changes to result) do for each Ri in the decomposition t = (result Ri)+ Ri result = result t If result contains all attributes in , then the functional dependency is preserved. We apply the test on all dependencies in F to check if a decomposition is dependency preserving This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 F2 … Fn)+ Database System Concepts 7.97 ©Silberschatz, Korth and Sudarshan Design Objectives & FDs Preservation The basic objective of normalization is to reduce the various anomalies in the database. Normalization can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties. How to identify FDs and prime / nonprime attributes? An Attribute A in a relation is called a prime attribute if A is a part of any candidate key of the relation. Normalization goals: Dependency preservation Trivial Dependency :defn. 3 Normal Form & BCNF! Database System Concepts 7.98 ©Silberschatz, Korth and Sudarshan Normalization Using Functional Dependencies When we decompose a relation schema R with a set of functional dependencies F into R1, R2,.., Rn we want Lossless-join decomposition: Otherwise decomposition would result in information loss. No redundancy: The relations Ri preferably should be in either BoyceCodd Normal Form or Third Normal Form. Dependency preservation: Let Fi be the set of dependencies F+ that include only attributes in Ri. Preferably the decomposition should be dependency preserving, that is, (F1 F2 … Fn)+ = F+ Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive. Database System Concepts 7.99 ©Silberschatz, Korth and Sudarshan Functional Dependencies (Cont.) A functional dependency is trivial (obvious) if it is satisfied by all instances of a relation E.g. customer-name, loan-number customer-name customer-name customer-name In general, is trivial if Database System Concepts 7.100 ©Silberschatz, Korth and Sudarshan Design Goals Goal for a relational database design is: BCNF. Lossless join. Dependency preservation. If we cannot achieve this, we accept one of Lack of dependency preservation Redundancy due to use of 3NF Interestingly, SQL does not provide a direct way of specifying functional dependencies other than superkeys. Can specify FDs using assertions, but they are expensive to test Even if we had a dependency preserving decomposition, using SQL we would not be able to efficiently test a functional dependency whose left hand side is not a key. Database System Concepts 7.101 ©Silberschatz, Korth and Sudarshan Multivalued Dependencies (MVDs) For Achieving Higher Normal Forms like 4 NF and 5 NF, we have to understand MVDs. FDs rule out certain tuples from being in a relation, if A B then we can not have two tuples with same A value but different B values. MVDs do not rule out existence of certain tuples as there can be MultiValues Dependency ! Let R be a relation schema and let R and R. The multivalued dependency holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[] = t2 [], there exist tuples t3 and t4 in r such that: Database System Concepts t1[] = t2 [] = t3 [] = t4 [] t3[] = t1 [] t3[R – ] = t2[R – ] t4 [] = t2[] t4[R – ] = t1[R7.102 – ] ©Silberschatz, Korth and Sudarshan Decomposition Properties Decompose the relation schema Lending-schema (branch-name, branch-city, assets, customer-name, loannumber, amount) into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1 R2 Lossless-join decomposition. For all possible relations r on schema R r = R1 (r) R2 (r) A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+: R1 R2 R1 R1 R2 R2 Database System Concepts 7.103 ©Silberschatz, Korth and Sudarshan Dependency Preservation Let Fi be the set of dependencies F + that include only attributes in Ri. A decomposition is dependency preserving, if (F1 F2 … Fn )+ = F + If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive. Database System Concepts 7.104 ©Silberschatz, Korth and Sudarshan Third Normal Form A relation schema R is in third normal form (3NF) if for all: in F+ at least one of the following holds: is trivial (i.e., ) is a superkey for R Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later). Database System Concepts 7.105 ©Silberschatz, Korth and Sudarshan Closure of a Set of Functional Dependencies We can find all of F+ by applying Armstrong’s Axioms: if , then (reflexivity) if , then (augmentation) if , and , then (transitivity) These rules are sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold). Database System Concepts 7.106 ©Silberschatz, Korth and Sudarshan Third Normal Form: Motivation There are some situations where BCNF is not dependency preserving, and efficient checking for FD violation on updates is important Solution: define a weaker normal form, called Third Normal Form. Allows some redundancy (with resultant problems; we will see examples later) But FDs can be checked on individual relations without computing a join. There is always a lossless-join, dependency-preserving decomposition into 3NF. Database System Concepts 7.107 ©Silberschatz, Korth and Sudarshan Boyce–Codd Normal Form (BCNF) Based on functional dependencies that take into account all candidate keys in a relation, however BCNF also has additional constraints compared with general definition of 3NF. BCNF - A relation is in BCNF if and only if every determinant is a candidate key. Database System Concepts 7.108 ©Silberschatz, Korth and Sudarshan Third Normal Form : Closure Third Normal Form : A relation scheme is said to be in 3 NF if ; It is in 2 NF. No non-key attribute is functionally-dependent upon any Non- key attribute. Thus, there should be no transitive-dependency of a non-key attribute on the primary key. 3 NF Scheme does not allow partial dependencies like in 2NF but additionally 3NF does not allow any transitive dependencies. A relation scheme R < S, F > is in 3 NF if for all non-trivial Functional dependencies in F + (closure of F) of the form X A, either X contains a key (i.e. X is a super-key) or A is a prime attribute Database System Concepts 7.109 ©Silberschatz, Korth and Sudarshan Boyce–Codd normal form (BCNF) Difference between 3NF and BCNF is that for a functional dependency A B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key. Whereas, BCNF insists that for this dependency to remain in a relation, A must be a candidate key. Every relation in BCNF is also in 3NF. However, relation in 3NF may not be in BCNF. Database System Concepts 7.110 ©Silberschatz, Korth and Sudarshan Boyce–Codd normal form (BCNF) Violation of BCNF is quite rare. Potential to violate BCNF may occur in a relation that: contains two (or more) composite candidate keys; the candidate keys overlap (i.e. have at least one attribute in common). Database System Concepts 7.111 ©Silberschatz, Korth and Sudarshan General Definitions of 2NF and 3NF Second normal form (2NF) A relation that is in 1NF and every nonprimary-key attribute is fully functionally dependent on any candidate key. Third normal form (3NF) A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on any candidate key. Database System Concepts 7.112 ©Silberschatz, Korth and Sudarshan Third Normal Form A relation schema R is in third normal form (3NF) if for all: in F+ at least one of the following holds: is trivial (i.e., ) is a superkey for R Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later). Database System Concepts 7.113 ©Silberschatz, Korth and Sudarshan Third Normal Form: Motivation There are some situations where BCNF is not dependency preserving, and efficient checking for FD violation on updates is important Solution: define a weaker normal form, called Third Normal Form. Allows some redundancy (with resultant problems; we will see examples later) But FDs can be checked on individual relations without computing a join. There is always a lossless-join, dependency-preserving decomposition into 3NF. Database System Concepts 7.114 ©Silberschatz, Korth and Sudarshan Third Normal Form A relation schema R is in third normal form (3NF) if for all: in F+ at least one of the following holds: is trivial (i.e., ) is a superkey for R Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later). Database System Concepts 7.115 ©Silberschatz, Korth and Sudarshan 3NF A Relation scheme is said to be in 3nf if; IT is 2NF No non-key attribute is functionaly dependent upon upon any non –key attribute . Thus ,there should be no Transitive-dependencey of a non-key attribute on the primary key Database System Concepts 7.116 ©Silberschatz, Korth and Sudarshan 3NF (Cont.) Example R = (J, K, L) F = {JK L, L K} Two candidate keys: JK and JL R is in 3NF JK L LK JK is a superkey K is contained in a candidate key BCNF decomposition has (JL) and (LK) Testing for JK L requires a join There is some redundancy in this schema Equivalent to example in book: Banker-schema = (branch-name, customer-name, banker-name) banker-name branch name branch name customer-name banker-name Database System Concepts 7.117 ©Silberschatz, Korth and Sudarshan Testing for 3NF Optimization: Need to check only FDs in F, need not check all FDs in F+. Use attribute closure to check, for each dependency , if is a superkey. If is not a superkey, we have to verify if each attribute in is contained in a candidate key of R this test is rather more expensive, since it involve finding candidate keys testing for 3NF has been shown to be NP-hard Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time Database System Concepts 7.118 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.119 ©Silberschatz, Korth and Sudarshan Overview and Higher Normal Form Overview of Normal Forms Third Normal Form: Weaker & Better ? Why BCNF & Why not BCNF ? Closure Approach FDs Analysis MVDs & 4 NF Performance & Denormalization ? Database System Concepts 7.120 ©Silberschatz, Korth and Sudarshan Relationship Between Normal Forms Database System Concepts 7.121 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.122 ©Silberschatz, Korth and Sudarshan Normal Forms & FDs: Review UNF (Unnormalized) – There are multivalued attributes or repeating groups, Record can not be uniquely identified. 1 NF – No composite attributes or repeating groups, 2 NF – 1 NF plus no partial dependencies 3 NF – 2 NF plus no transitive dependencies BCNF - 3 NF plus A relation is in BCNF if and only if every determinant is a candidate key. Database System Concepts Determinant: If A B is a FD and if B is not functionally dependent upon any subset of A , then A is said to be the determinant of B. 7.123 ©Silberschatz, Korth and Sudarshan 3 NF & BCNF 3 NF can also be defined in another way, using Closure Concept of FDs. Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. E.g. If A B and B C, then we can infer that A C The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F+. The Closure F+ can be used to give alternate definition of 3 NF. Database System Concepts 7.124 ©Silberschatz, Korth and Sudarshan Third Normal Form : Third Normal Form : A relation scheme is said to be in 3 NF if ; There should be no transitive-dependency of a non-key attribute on the primary key. 3 NF Scheme does not allow partial dependencies like in 2NF but additionally 3NF does not allow any transitive dependencies. A relation scheme R < S, F > is in 3 NF if for all non-trivial Functional dependencies in F + (closure of F) of the form X A, either X contains a key (i.e. X is a super-key) or A is a prime attribute Database System Concepts 7.125 ©Silberschatz, Korth and Sudarshan Third Normal Form A relation schema R is in third normal form (3NF) if for all: in F+ at least one of the following holds: is trivial (i.e., ) is a superkey for R Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later). Database System Concepts 7.126 ©Silberschatz, Korth and Sudarshan Boyce-Codd Normal Form A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form , where R and R, at least one of the following holds: is trivial (i.e., ) is a superkey for R Boyce-Codd normal form (or BCNF) requires that there are no non-trivial functional dependencies of attributes on something else than a superset of a candidate key. At this stage, all attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A). Database System Concepts 7.127 ©Silberschatz, Korth and Sudarshan Fourth Normal Form A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form , where R and R, at least one of the following hold: is trivial (i.e., or = R) is a superkey for schema R If a relation is in 4NF it is in BCNF Database System Concepts 7.128 ©Silberschatz, Korth and Sudarshan When to use BCNF Although BCNF is stricter than 3 NF band it is not neccesary that if some scheme is in 3 NF then it must be in BCNF but In most cases 3 NF is also BCNF . 3 NF is NOT BCNF only when a relation has composite candidate keys that have atleast one attribute in common. The BCNF is implemented in cases where the table has (a) Multiple candidate keys (b) Composite candidate keys (c) Candidate keys that overlap Database System Concepts 7.129 ©Silberschatz, Korth and Sudarshan Why we require BCNF Assume that a relation has more than one possible key, Assume further that the composite keys have a common attribute. IF an attribute of a composite key is dependent on an attribute of the other composite key, then this ,normalization called BCNF is needed For example:- professor ( professor code, dept, head of dept, percent time) Now it is assumed that 1. A professor can work in more than one department. 2 .The percentage of the time he spends in each department is given. 3. Each department has only one head of department. Database System Concepts 7.130 ©Silberschatz, Korth and Sudarshan 3NF relation Professor code Department Head of Department Percent P1 Physics Ghosh 50 P1 mathmematics Krishnan 50 P2 Chemistry Rao 25 p2 physics ghosh 75 p3 mathematics krishnan 100 Here the names of dept, and head are duplicated, if professor p2 resigns rows 3 and row 4 are deleted . We lose the information that rao is the head of the department of chemistry. So the Normalization of the relation is done by creating a new relation for department of chemistry Database System Concepts 7.131 ©Silberschatz, Korth and Sudarshan The Relationship diagram The two possible composite keys are professor code & dept or professor code & head of dept Observe that department as well as head of dept are not non-key attribute. They are part of a composite key. Head of the department department Professor code Percent time department Head of department Department Head of department Professor code Database System Concepts Percent time 7.132 ©Silberschatz, Korth and Sudarshan Now Relation is in BCNF department Percent time Professor code Department Head of department Prof code department percent P1 Physics 50 P1 Maths 50 P2 Chemistry 25 P2 Physics 75 p3 Maths 100 Database System Concepts 7.133 department Head of dept Physics Ghosh Maths Krishnan chemistry rao ©Silberschatz, Korth and Sudarshan 3NF Table Not in BCNF Figure 4.7 Database System Concepts 7.134 ©Silberschatz, Korth and Sudarshan Decomposition of Table Structure to Meet BCNF Database System Concepts 7.135 ©Silberschatz, Korth and Sudarshan BCNF Conversion Results Database System Concepts 7.136 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.137 ©Silberschatz, Korth and Sudarshan DBMS Design MVDs & Higher Normal Forms Why MVDs MVDs Rules Splitting in MVDs DBMS Design & Role of Normalization Limitation of Normalization Database System Concepts 7.138 ©Silberschatz, Korth and Sudarshan Multivalued Dependencies (MVDs) FDs rule out certain tuples from being in a relation, if A B then we can not have two tuples with same A value but different B values. MVDs do not rule out existence of certain tuples as there can be MultiValued Dependency ! Let R be a relation schema and let R and R. The multivalued dependency which can be read as α multidetermines β holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[] = t2 [], there exist tuples t3 and t4 in r such that: t1[] = t2 [] = t3 [] = t4 [] t3[] = t1 [] t3[R – ] = t2[R – ] t4 [] = t2[] t4[R – ] = t1[R – ] Database System Concepts 7.139 ©Silberschatz, Korth and Sudarshan MVDs Multivalued dependencies are also referred to as tuple generating dependencies. After the Boyce -Codd normal form the results may be devoid of any functional dependencies but it may encounter multivalued dependencies as the multivalued dependencies also cause redundancy of data. For eg: If there are 3 attributes involved in a Relation A,B, and C.. Then for every value of A we will have respective values for B and C.. But it is a necessary in the 4th normal form that both B and C values are independent of each other. This is represented by .,, A->>B A->>C.. MVD or Multivalued Dependency is a dependency where one attribute value is potentially a "multivalued fact" about another and the attributes must be independent of each other. Database System Concepts 7.140 ©Silberschatz, Korth and Sudarshan Theory of MVDs From the definition of multivalued dependency, we can derive the following rule: If , then That is, every functional dependency is also a multivalued dependency The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D. We can compute D+ from D, using the formal definitions of functional dependencies and multivalued dependencies. We can manage with such reasoning for very simple multivalued dependencies, which seem to be most common in practice For complex dependencies, it is better to reason about sets of dependencies using a system of inference rules . Database System Concepts 7.141 ©Silberschatz, Korth and Sudarshan MVDs & Higher Normal Forms For Achieving Higher Normal Forms like 4 NF, we have to understand MVDs. Consider following Scheme of a database of teaching courses, the books recommended for the course, and the lecturers who will be teaching the course: Because the lecturers attached to the course and the books attached to the course are independent of each other, this database design has a Multi-Valued Dependency; Course Book Lecturer AHA Silberschatz John D AHA Nederpelt William M AHA Silberschatz William M AHA Nederpelt John D AHA Silberschatz Christian G AHA Nederpelt Christian G OSO Silberschatz John D OSO Silberschatz William M Database System Concepts If we were to add a new book to the AHA course, we would have to add one record for each of the lecturers on that course, and vice versa. 7.142 There are two multivalued dependencies in this relation: {course} {book} & equivalently {course} {lecturer} ©Silberschatz, Korth and Sudarshan Boyce-Codd Normal Form A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form , where R and R, at least one of the following holds: is trivial (i.e., ) is a superkey for R Boyce-Codd normal form (or BCNF) requires that there are no non-trivial functional dependencies of attributes on something else than a superset of a candidate key. At this stage, all attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A). Database System Concepts 7.143 ©Silberschatz, Korth and Sudarshan Fourth Normal Form A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form , where R and R, at least one of the following hold: is trivial (i.e., or = R) is a superkey for schema R If a relation is in 4NF it is in BCNF Database System Concepts 7.144 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.145 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.146 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.147 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.148 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.149 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.150 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.151 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.152 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.153 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.154 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.155 ©Silberschatz, Korth and Sudarshan Database System Concepts 7.156 ©Silberschatz, Korth and Sudarshan Further Normal Forms Join dependencies generalize multivalued dependencies lead to project-join normal form (PJNF) (also called fifth normal form) A class of even more general constraints, leads to a normal form called domain-key normal form. Problem with these generalized constraints: are hard to reason with, and no set of sound and complete set of inference rules exists. Hence rarely used Database System Concepts 7.157 ©Silberschatz, Korth and Sudarshan 5 NF Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multivalued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R Database System Concepts 7.158 ©Silberschatz, Korth and Sudarshan Normalization and Database design While we have tried to present the normal forms in a simple and understandable way, we are by no means suggesting that the data design process is correspondingly simple. The design process involves many complexities which are quite beyond the scope of general discussion. In the first place, an initial set of data elements and records has to be developed, as candidates for normalization. Then the factors affecting normalization have to be assessed: Single-valued vs. multi-valued dependencies. Dependency on the entire key. Independent vs. dependent facts. The presence of mutual constraints. The presence of non-unique or non-singular representations. And, finally, the desirability of normalization has to be assessed, in terms of its performance impact on retrieval applications. Database System Concepts 7.159 ©Silberschatz, Korth and Sudarshan DBMS: Normalization & Beyond Normalization : » Advantages & Disadvantages Normalization : Sufficiency »Does it ensure best design Recent developments in DBMS and Data & Knowledge Engineering Database System Concepts 7.160 ©Silberschatz, Korth and Sudarshan ER Model and Normalization When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization. However, in a real (imperfect) design, there can be functional dependencies from non-key attributes of an entity to other attributes of the entity Example: an employee entity with attributes department_number and department_address, and a functional dependency department_number department_address Good design would have made department an entity Functional dependencies from non-key attributes of a relationship set possible, but rare --- most relationships are binary Database System Concepts 7.161 ©Silberschatz, Korth and Sudarshan Denormalization for Performance May want to use non-normalized schema for performance Normalization imposes conditions for Check thus causing computations and slowing down the operations. For example, displaying customer_name along with account_number and balance requires join of account with depositor Alternative 1: Use denormalized relation containing attributes of account as well as depositor with all above attributes faster lookup extra space and extra execution time for updates extra coding work for programmer and possibility of error in extra code Alternative 2: use a materialized view defined as account depositor Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors Database System Concepts 7.162 ©Silberschatz, Korth and Sudarshan Other Design Issues Some aspects of database design are not caught by normalization Examples of bad database design, to be avoided: Instead of earnings (company_id, year, amount ), use earnings_2004, earnings_2005, earnings_2006, etc., all on the schema (company_id, earnings). Above are in BCNF, but make querying across years difficult and needs new table each year company_year(company_id, earnings_2004, earnings_2005, earnings_2006) Also in BCNF, but also makes querying across years difficult and requires new attribute each year. Is an example of a crosstab, where values for one attribute become column names Used in spreadsheets, and in data analysis tools Database System Concepts 7.163 ©Silberschatz, Korth and Sudarshan Advantages of Normalization 1) Avoids data modification (INSERT/DELETE/UPDATE) anomalies as each data item lives in One place 2) Normalization is conceptually cleaner and easier to maintain and change as your needs change 3) Fewer null values and less opportunity for inconsistency 4) A better handle on database security 5) The normalization process helps maximize the use of clustered indexes, which is the most powerful and useful type of index available. As more data is separated into multiple tables because of normalization the more clustered indexes become available to help speed up data access. Database System Concepts 7.164 ©Silberschatz, Korth and Sudarshan Disadvantages 1) Requires much more CPU, memory, and I/O to process thus normalized data gives reduced database performance 2) Requires more joins to get the desired result. A poorly-written query can bring the database down 3) Maintenance overhead. The higher the level of normalization, the greater the number of tables in the database Database System Concepts 7.165 ©Silberschatz, Korth and Sudarshan Adequacy of Normalization Is Normalization alone is enough for a good design ? Does all the design anomalies are removed by Normalization to highest Normal Forms ? Is normalization always lead to best design? Do we sometime need to De-normalized the Data ? When do we need to De-normalize ? Database System Concepts 7.166 ©Silberschatz, Korth and Sudarshan