Normalization Stanislava Armstrong http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/lecture_slides.html 1 Last Lecture Determine the outcome from executing each of the following statements on the Holidays table below: ID Destination Price 1 Cuba 0 2 NULL 250 3 Cuba 750 4 Colombia NULL 5 Cuba 500 6 ‘’ 1000 SELECT Destination, COUNT(*) FROM Holiday WHERE ISNULL(Price) = TRUE GROUP BY Destination; ID Destination Price 4 Colombia NULL 2 Coursework • http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/as sessment.html • 3 parts: design, implement, use • There are 100 marks available • Electronic submission through cw submit • Due on Monday 6th Dec 2010, 5pm • Late submissions will be penalized by 5% per day • If you are working from home pay special attention to be consistent with the names of columns 3 Normalization There are many forms of normalization: • First Normal Form • Second Normal Form • Third Normal Form • Boyce-Cod Normal Form • Fourth Normal Form • Fifth Normal Form • Domain/Key Normal Form • Sixth Normal Form 4 Redundancy and Normalisation • Redundant data – Can be determined from other data in the database – Leads to various problems • INSERT anomalies • UPDATE anomalies • DELETE anomalies • Normalisation – Aims to reduce data redundancy – Redundancy is expressed in terms of dependencies – Normal Forms are hierarchical – Each normal form required the removal of a particular type of dependency in addition to those removed for the normal forms that comes before it. 5 First Normal Form • In most definitions of the relational model – All data values should be atomic – This means that table entries should be single values, not sets or composite objects • A relation is said to be in first normal form (1NF) if all data values it contains are atomic • To convert to a 1NF relation, split up any non-atomic values 6 Normalisation to 1NF Order_date Kingdom King Execution_num Execution_date 2010-11-04, 2010-11-12 Fairyland Queen Fairy 1, 2 2010-11-17 2010-11-14 Trolland The Troll King 3 2010-11-18 Order_date Kingdom King Execution_num Execution_date 2010-11-04 Fairyland Queen Fairy 1 2010-11-17 2010-11-12 Fairyland Queen Fairy 2 2010-11-17 The Troll King 3 2010-11-18 2010-11-14 Trolland 7 Problems in 1NF • INSERT anomalies – Can't add a new kingdom without adding an Order_date for it • DELETE anomalies – If we remove the order from 2010-11-14 we will remove Trolland as well • UPDATE anomalies – To change the king of Fairyland we have to change two rows Order_date Kingdom King Execution_num Execution_date 2010-11-04 Fairyland Queen Fairy 1 2010-11-17 2010-11-12 Fairyland Queen Fairy 1 2010-11-17 The Troll King 2 2010-11-18 2010-11-14 Trolland 8 Functional Dependencies • Redundancy is often caused by a functional dependency • A functional dependency (FD) is a link between two sets of attributes in a relation • We can normalise a relation by removing undesirable FDs • A set of attributes, A, functionally determines another set, B, or: there exists a functional dependency between A and B (A B), if whenever two rows of the relation have the same values for all the attributes in A, then they also have the same values for all the attributes in B. 9 Properties of FDs • In any relation – The primary key FDs any set of attributes in the relation KX • K is the primary key, X is a set of attributes – Same for candidate keys – Any set of attributes is a FD on itself XX • Rules for FDs (Just for fun:) – Reflexivity: If B is a subset of A then AB – Augmentation: If A B then AUCBUC – Transitivity: If A B and B C then A C 10 FDs and Normalisation • We define a set of 'normal forms' – Each normal form has fewer FDs than the last – Since FDs represent redundancy, each normal form has less redundancy than the last • Not all FDs cause a problem – We identify various sorts of FD that do – Each normal form removes a type of FD that is a problem – We will also need a way to remove FDs 11 Representation • • • • Diagrammatically functional dependencies are represented by arrows which start from the set of columns, which functionally determines the set of columns where the arrow ends The candidate keys in the relation are represented by a box which is drawn around them Trivial FD are not represented on the diagram FDs dependent on the whole candidate key are not represented on the diagram Order_date Kingdom King Execution_num Execution_date Functional dependencies can also be represented in the following way: • {Execution_num} {Execution_date} • {Kingdom} {King} 12 Partial FDs and 2NF • Partial FDs: • Let us call attributes which are part of some candidate key, key attributes, and the rest non-key attributes. – A FD, A B is a partial FD, if some attribute of A can be removed and the FD still holds – Formally, there is some proper subset of A, C A, such that C B Second normal form: • A relation is in second normal form (2NF) if it is in 1NF and no non-key attribute is partially dependent on a candidate key • In other words, no C B where C is a strict subset of a candidate key and B is a nonkey attribute. • Normalizing to 2NF can be achieved by splitting the relation to two relations, where: – Given that the original relation is not in 2NF due to FD A B – The first new relation will contain all of the columns contained in A and from B – The second relation will contain all of the columns which are in the original relation, but are not contained in either A or B, and the columns contained in A 13 Example Order_date Order_date Kingdom Kingdom King Execution_num Execution_num Execution_Date Execution_Date Kingdom King 14 Problems Resolved in 2NF • Problems in 1NF • In 2NF we have resolved all of those problems, but we have some other problems remaining – INSERT – Can't add a new kingdom without adding an Order_date for it – UPDATE – To change the king of Fairyland we have to change two rows – DELETE – If we remove the order from 2010-11-14 we will remove Trolland as well Order_date Kingdom Execution_num Execution_date 2010-11-04 Fairyland 1 2010-11-17 2010-11-12 Fairyland 1 2010-11-17 2010-11-14 Trolland 2010-11-18 2 15 Problems Remaining in 2NF • INSERT anomalies – Can’t assign a date to an execution number that is not yet assigned to an order • UPDATE anomalies – To change the Execution_date for execution 1 we have to change two rows Order_date Kingdom Execution_num Execution_date 2010-11-04 Fairyland 1 2010-11-17 2010-11-12 Fairyland 1 2010-11-17 2010-11-14 Trolland 2010-11-18 2 16 Transitive FDs and 3NF • Transitive FDs: – A FD, A C is a transitive FD, if there is some set B such that A B and B C are non-trivial FDs – A B non-trivial means: B is not a subset of A – We have ABC • Third normal form – A relation is in third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on a candidate key • To move a relationship from 2NF to 3NF: – Given the transitive FD A B C – We split the relation into two new relations – The first contains all of the columns contained in B and C – The second contains all of the columns which are not contained in A, B or C and the columns contained in A and B 17 Example Order_date Kingdom Execution_num Order_date Kingdom Execution_num Kingdom Execution_Date Execution_num Kingdom King Execution_Date King 18 Problems Resolved in 3NF • Problems in 2NF – INSERT – Can’t assign a date to an execution number that is not yet assigned to an order – UPDATE – To change the Execution_date for execution 1 we have to change two rows • In 3NF all of these are resolved (for this relation – but 3NF can still have anomalies!) Order_date Kingdom Execution_num Execution_num Execution_date 2010-11-04 Fairyland 1 1 2010-11-17 2010-11-12 Fairyland 1 1 2010-11-17 2010-11-14 Trolland 2 2010-11-18 2 19 Problems Remaining in 3NF • INSERT anomalies – You can’t add melon without adding a kingdom • UPDATE anomalies – Changing the execution_num for the Fairlyland order, means changing two rows • DELETE anomalies – Deleting Fairlyland deletes apple Order_date Kingdom Fruit Execution_num 2010-11-04 Fairyland apple 1 2010-11-12 Fairyland kiwi 1 2010-11-14 Trolland kiwi 2 20 Boyce-Codd Normal Form • A relation is in Boyce-Codd normal form (BCNF) if for every FD A B either – B is contained in A (the FD is trivial), or – A contains a candidate key of the relation, • The same as 3NF except in 3NF we only worry about non-key Bs • If there is only one candidate key then 3NF and BCNF are the same • In other words: every determinant in a non-trivial dependency is a (super) key. 21 Example Order_date fruit OrderID Kingdom fruit OrderID Order_date Kingdom OrderID 22 Decomposition Properties • Lossless: Data should not be lost or created when splitting up relations • Dependency preservation: It is desirable that FDs are preserved when splitting up relations • Normalisation to 3NF is always lossless and dependency preserving • Normalisation to BCNF is lossless, but may not preserve all dependencies 23 Normalisation and Design • Normalisation is related to DB design – A database should normally be in 3NF at least – If your design leads to a non3NF DB, then you might want to revise it • When you find you have a non-3NF DB – Identify the FDs that are causing a problem – Think whether they will lead to any insert, update, or delete anomalies – Try to remove them 24 Normalisation Summary • First normal form – All data values are atomic • Second normal form – In 1NF plus no non-key attribute is partially dependent on a candidate key • Third normal form – In 2NF plus no non-key attribute depends transitively on a candidate key • Boyce-Codd Normal Form – In 2NF plus no attribute depends on a non super key 25 Denormalisation • Normalisation – Removes data redundancy – Solves INSERT, UPDATE, and DELETE anomalies – This makes it easier to maintain the information in the database in a consistent state • However – It leads to more tables in the database – Often these need to be joined back together, which is expensive to do – So sometimes (not often) it is worth ‘denormalising’ 26 Denormalisation • You might want to denormalise if – Database speeds are unacceptable (not just a bit slow) – There are going to be very few INSERTs, UPDATEs, or DELETEs – There are going to be lots of SELECTs that involve the joining of tables 27 Exercise • Given the following schema determine what normal form it is in and transform it to BCNF if possible. Price Paid Due Hotel Location Client Client_address BookingID 28 Reading Material • The Manga Guide to Databases – chapter 3 • Database Systems – A Practical Approach to Design, Implementation and Management by Connolly and Begg –chapters 13 and 14 • Any other book – chapter on normalization 29