Normalisation

Before normalisation:
One table with lots of repetition and no relationships.

During normalisation:
• Remove all calculated fields.
• Unmerge all cells.
• Remove things that are not needed, e.g. headings, pictures, etc.
• Make data atomic, i.e. split up a field if it contains too many different concepts.
• Get rid of repeating groups, i.e. simplify the data.
• Identify the different relations with their primary keys and split them into different tables.
• Use foreign keys to link a table to the primary key of another table, i.e. create relationships.
• No two tables may have the same primary key.
• Remove data redundancy (unnecessary repetition in records), but keep necessary repetition, e.g. many learners are in Grade 12.

At the end, each table must contain field headings and content only, all tables must be related, and the tables must hold only useful data.

Terms and alternative names

Term      Alternative names
table     entity / relation
record    row / tuple / entity instance
field     column / attribute

• A table is also called a relation.
• A table definition is often written as a list of field names, with the primary key underlined.
• For example, the Student table: Student (StudentID, Name, Class, House), where StudentID is the primary key.
• Normalisation is a technique for designing relational database tables
  – to minimise duplication of information, and
  – to safeguard the database against certain types of logical or structural problems, namely data anomalies.

Avoid the following:

• Repeating groups
  – Repetition within one field of a record:

    Name             Grade   Sport
    Roger Philips    10      Cricket
    Patience Mbata   10      Netball, Hockey
    Hennie Venter    12      Hockey, Rugby, Golf
    Sarah Cohen      11      Golf

  – Repetition within one record across multiple fields that hold the same kind of data:

    Name             Grade   Sport1    Sport2   Sport3
    Roger Philips    10      Cricket
    Patience Mbata   10      Netball   Hockey
    Hennie Venter    12      Hockey    Rugby    Golf
    Sarah Cohen      11      Golf

• Data redundancy
  – Repetition across multiple records in one or more fields:

    Name             Grade   Sport
    Roger Philips    10      Cricket
    Patience Mbata   10      Netball
    Patience Mbata   10      Hockey
    Hennie Venter    12      Hockey
    Hennie Venter    12      Rugby
    Hennie Venter    12      Golf
    Sarah Cohen      11      Golf

• Anomalies
  – Update anomaly
    • occurs when we have to update the same data in more than one place
    • human error could lead to inconsistencies
    • example: Hennie Venter’s real name is Hendrik; we might change it in only one place instead of in every place where his name appears. And what if there are two learners called Hennie?
  – Deletion anomaly
    • occurs when a deletion causes an unnecessary loss of data
    • example: Sarah Cohen doesn’t do golf anymore, but the fact that she no longer does any sport doesn’t mean we should delete her completely from the database
  – Insertion anomaly
    • occurs when we add a record that doesn’t satisfy the primary key requirements of the design
    • example: if Name and Sport together form the primary key, we cannot add a new learner who doesn’t do any sport, because the Sport part of the key would be empty
• INSERT, DELETE and UPDATE SQL statements can cause problems if a database is not structured correctly.

NB: PLAN before you DESIGN your database tables.
• Normalisation is an iterative (repetitive) process of refining relations, producing new relations in successive normal forms.
• The data in the database is broken down into tables in clearly defined stages, using a set of rules.

You have to:
• understand the problem
• try to get the database to represent the real world as accurately as possible
• be aware of any assumptions you are making (deliberately or not)

Derived or calculated field
• A field whose value is calculated from other fields.
• Example: if you have a cost price, a sales price and a profit field, then the profit field is derived (calculated) from the cost price and sales price fields.
• A derived field should be left out! You can use an SQL statement to calculate it again whenever you need it.

Non-atomic data
• A field that contains too much information and could be split into multiple fields.
• Example: an address field contains: 31 Vonkle Street, Linksfield, Germiston, 1778
• If I wanted to find all students living in Germiston, the SQL statement would be problematic, because the town is buried inside a longer piece of text.
• Solution: break the address field into multiple fields, e.g. street address, area (suburb), city, province, postal code, etc.
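The table-definition notation in the Terms section, Student (StudentID, Name, Class, House), maps directly onto an SQL table definition. The following is a minimal sketch using Python's built-in sqlite3 module. The column types and the sample rows are assumptions made for illustration, because the notes give only the field names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Student (StudentID, Name, Class, House) with StudentID as the primary key.
# The column types are assumed; the notes list only the field names.
conn.execute(
    """
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Class     TEXT,
        House     TEXT
    )
    """
)

# One record (row / tuple) with a value for every field (column / attribute).
# The values are hypothetical sample data.
conn.execute("INSERT INTO Student VALUES (1, 'Roger Philips', '10A', 'Blue')")

# A primary key must be unique, so a second record with StudentID 1 is rejected.
try:
    conn.execute("INSERT INTO Student VALUES (1, 'Sarah Cohen', '11B', 'Green')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```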
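The repeating-groups and data-redundancy examples can be normalised by splitting the learner data into two tables linked by a foreign key. The sketch below uses Python's sqlite3 module. The table names Learner and LearnerSport and the numeric LearnerID key are assumptions for illustration and are not taken from the notes. Learner holds each learner's details once. LearnerSport holds one atomic sport per record, so its primary key (LearnerID, Sport) differs from Learner's.

```python
import sqlite3

# The unnormalised data from the repeating-groups example:
# one record per learner, with several sports packed into each record.
unnormalised = [
    ("Roger Philips", 10, ["Cricket"]),
    ("Patience Mbata", 10, ["Netball", "Hockey"]),
    ("Hennie Venter", 12, ["Hockey", "Rugby", "Golf"]),
    ("Sarah Cohen", 11, ["Golf"]),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Learner (LearnerID, Name, Grade): each learner is stored exactly once.
conn.execute(
    "CREATE TABLE Learner ("
    " LearnerID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Grade INTEGER NOT NULL)"
)
# LearnerSport (LearnerID, Sport): one sport per record, and LearnerID is a
# foreign key that links each record to the primary key of Learner.
conn.execute(
    "CREATE TABLE LearnerSport ("
    " LearnerID INTEGER NOT NULL REFERENCES Learner(LearnerID),"
    " Sport TEXT NOT NULL,"
    " PRIMARY KEY (LearnerID, Sport))"
)

for learner_id, (name, grade, sports) in enumerate(unnormalised, start=1):
    conn.execute("INSERT INTO Learner VALUES (?, ?, ?)", (learner_id, name, grade))
    for sport in sports:
        conn.execute("INSERT INTO LearnerSport VALUES (?, ?)", (learner_id, sport))

# "Who plays hockey?" is now a single join instead of checking Sport1, Sport2 and Sport3.
hockey_players = [
    row[0]
    for row in conn.execute(
        "SELECT L.Name FROM Learner AS L"
        " JOIN LearnerSport AS S ON S.LearnerID = L.LearnerID"
        " WHERE S.Sport = 'Hockey' ORDER BY L.Name"
    )
]
print(hockey_players)  # ['Hennie Venter', 'Patience Mbata']
```

Grade 12 and the learners' names still repeat where it is necessary. Each fact, though, is now stored in exactly one place.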
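The update, deletion and insertion anomalies can be reproduced on the flat one-row-per-sport table from the data-redundancy example. This sketch, again using Python's sqlite3 module, contrasts that table with a normalised Learner/LearnerSport design. The table names, the trimmed-down data and the learner called 'New Learner' are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Flat table with (Name, Sport) as the primary key, as in the insertion-anomaly example.
# NOT NULL is explicit because SQLite does not force composite key columns to be non-null.
conn.execute(
    "CREATE TABLE LearnerFlat (Name TEXT NOT NULL, Grade INTEGER,"
    " Sport TEXT NOT NULL, PRIMARY KEY (Name, Sport))"
)
conn.executemany(
    "INSERT INTO LearnerFlat VALUES (?, ?, ?)",
    [
        ("Hennie Venter", 12, "Hockey"),
        ("Hennie Venter", 12, "Rugby"),
        ("Hennie Venter", 12, "Golf"),
        ("Sarah Cohen", 11, "Golf"),
    ],
)

# Update anomaly: the name is changed in only one of Hennie's three records.
conn.execute(
    "UPDATE LearnerFlat SET Name = 'Hendrik Venter'"
    " WHERE Name = 'Hennie Venter' AND Sport = 'Hockey'"
)
flat_names = sorted({r[0] for r in conn.execute("SELECT Name FROM LearnerFlat WHERE Grade = 12")})
print(flat_names)  # ['Hendrik Venter', 'Hennie Venter'], i.e. inconsistent data

# Insertion anomaly: a learner without a sport cannot be added (hypothetical learner).
try:
    conn.execute("INSERT INTO LearnerFlat VALUES ('New Learner', 10, NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: removing Sarah's only sport removes Sarah herself.
conn.execute("DELETE FROM LearnerFlat WHERE Name = 'Sarah Cohen' AND Sport = 'Golf'")
sarah_flat = conn.execute(
    "SELECT COUNT(*) FROM LearnerFlat WHERE Name = 'Sarah Cohen'"
).fetchone()[0]

# Normalised design: names stored once, sports in a separate related table.
conn.execute("CREATE TABLE Learner (LearnerID INTEGER PRIMARY KEY, Name TEXT, Grade INTEGER)")
conn.execute(
    "CREATE TABLE LearnerSport (LearnerID INTEGER REFERENCES Learner(LearnerID),"
    " Sport TEXT, PRIMARY KEY (LearnerID, Sport))"
)
conn.executemany("INSERT INTO Learner VALUES (?, ?, ?)", [(1, "Hennie Venter", 12), (2, "Sarah Cohen", 11)])
conn.executemany(
    "INSERT INTO LearnerSport VALUES (?, ?)",
    [(1, "Hockey"), (1, "Rugby"), (1, "Golf"), (2, "Golf")],
)

# One UPDATE in one place corrects the name for every sport.
conn.execute("UPDATE Learner SET Name = 'Hendrik Venter' WHERE LearnerID = 1")
# Deleting Sarah's sport leaves her learner record intact.
conn.execute("DELETE FROM LearnerSport WHERE LearnerID = 2 AND Sport = 'Golf'")
sarah_normalised = conn.execute(
    "SELECT COUNT(*) FROM Learner WHERE Name = 'Sarah Cohen'"
).fetchone()[0]
print(sarah_flat, sarah_normalised)  # 0 1
```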
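The derived-field advice (leave the profit field out and calculate it with SQL) can be sketched as follows with Python's sqlite3 module. The Product table, its column names and the prices are hypothetical examples, not data from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Product (ProductID, Description, CostPrice, SalesPrice): no Profit field is stored.
conn.execute(
    "CREATE TABLE Product ("
    " ProductID INTEGER PRIMARY KEY,"
    " Description TEXT,"
    " CostPrice REAL,"
    " SalesPrice REAL)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [(1, "Pen", 5.00, 8.50), (2, "File", 20.00, 32.00)],
)

# The profit is calculated by the query whenever it is needed.
rows = conn.execute(
    "SELECT Description, SalesPrice - CostPrice AS Profit FROM Product ORDER BY ProductID"
).fetchall()
print(rows)  # [('Pen', 3.5), ('File', 12.0)]

# After a price change, the calculated profit is automatically up to date,
# whereas a stored Profit field would also have had to be updated.
conn.execute("UPDATE Product SET SalesPrice = 9.00 WHERE ProductID = 1")
pen_profit = conn.execute(
    "SELECT SalesPrice - CostPrice FROM Product WHERE ProductID = 1"
).fetchone()[0]
print(pen_profit)  # 4.0
```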
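The non-atomic address example can be sketched with Python's sqlite3 module. The table and column names (StudentOld, StreetAddress, Suburb, City, PostalCode) are assumptions for illustration, as is the rule that every address has exactly four comma-separated parts. The postal code is stored as text so that any leading zeros would be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Non-atomic design: the whole address is packed into one field.
conn.execute("CREATE TABLE StudentOld (StudentID INTEGER PRIMARY KEY, Address TEXT)")
conn.execute(
    "INSERT INTO StudentOld VALUES (1, '31 Vonkle Street, Linksfield, Germiston, 1778')"
)

# Finding Germiston students needs a pattern match on text, which would also
# match any street or suburb whose name merely contains the word.
old_matches = conn.execute(
    "SELECT StudentID FROM StudentOld WHERE Address LIKE '%Germiston%'"
).fetchall()

# Atomic design: each part of the address is its own field.
conn.execute(
    "CREATE TABLE Student ("
    " StudentID INTEGER PRIMARY KEY,"
    " StreetAddress TEXT, Suburb TEXT, City TEXT, PostalCode TEXT)"
)

# Split each old value on commas (assumes exactly four parts per address).
for student_id, address in conn.execute("SELECT StudentID, Address FROM StudentOld").fetchall():
    street, suburb, city, postal_code = [part.strip() for part in address.split(",")]
    conn.execute(
        "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
        (student_id, street, suburb, city, postal_code),
    )

# Now the query is an exact comparison on a single field.
new_matches = conn.execute("SELECT StudentID FROM Student WHERE City = 'Germiston'").fetchall()
print(new_matches)  # [(1,)]
```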