Uploaded by Eshan Naidoo

Normalisation summary

Normalisation
Before normalisation: One table with lots of repetition and no relationships
During normalisation:
• Remove all calculated fields
• Unmerge all cells
• Remove things that are not needed, e.g. headings, pictures, etc.
• Make data atomic, i.e. split up a field if it contains too many different concepts
• Get rid of repeating groups, i.e. simplify data
• Identify the different relations with their primary keys and split them into different tables. Use foreign keys to link a table to the primary key of another table, i.e. create relationships
• No two tables may have the same primary key!!!
• Remove data redundancy (unnecessary repetition in records), but keep necessary repetition, e.g. many learners are in Grade 12.
• At the end, each table must consist of field headings and content only, and all tables must be related and contain only useful data
Terms and alternative names
Term     Alternative names
table    entity / relation
record   row / tuple / entity instance
field    column / attribute
• A table is also called a relation
• A table definition is often written as a list of field names, with the primary key underlined
• For example, the Student table, where StudentID is the primary key:
Student (StudentID, Name, Class, House)
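The relation notation maps directly onto an SQL table definition. The sketch below uses Python's built-in sqlite3 module. The column types and the Class and House values are assumptions made only for illustration. The main point is that StudentID is declared as the primary key, so the database refuses a second record with the same StudentID.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Student (StudentID, Name, Class, House) with StudentID as the primary key.
# The column types are assumed for illustration.
conn.execute("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Class     TEXT,
        House     TEXT
    )
""")

# Hypothetical sample values for Class and House
conn.execute("INSERT INTO Student VALUES (1, 'Roger Philips', '10A', 'Red')")

try:
    # Same StudentID again: rejected because primary key values must be unique
    conn.execute("INSERT INTO Student VALUES (1, 'Sarah Cohen', '11B', 'Blue')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```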
• Normalisation is a technique for designing relational database tables
– to minimise duplication of information
– and to safeguard the database against certain types of logical or structural problems, namely
data anomalies
Normalisation aims to avoid the following:
• repeating groups
Repetition in a record in a field
Name            Grade  Sport
Roger Philips   10     Cricket
Patience Mbata  10     Netball, Hockey
Hennie Venter   12     Hockey, Rugby, Golf
Sarah Cohen     11     Golf
Repetition in a record in multiple fields that have the same kind of data
Name            Grade  Sport1   Sport2  Sport3
Roger Philips   10     Cricket
Patience Mbata  10     Netball  Hockey
Hennie Venter   12     Hockey   Rugby   Golf
Sarah Cohen     11     Golf
• data redundancy
Repetition in multiple records in one or more fields
Name            Grade  Sport
Roger Philips   10     Cricket
Patience Mbata  10     Netball
Patience Mbata  10     Hockey
Hennie Venter   12     Hockey
Hennie Venter   12     Rugby
Hennie Venter   12     Golf
• anomalies
– Update anomaly
• when we have to update the same data in more than one place
• human error could lead to inconsistencies
• example: Hennie Venter’s real name is Hendrik; we might change it in one place only
instead of in all the places where his name appears. Or what if there are two Hennies…
– Deletion anomaly
• when a deletion unnecessarily causes the loss of other data
• example: Sarah Cohen doesn’t do golf anymore, but just because she doesn’t do any
sports doesn’t mean we should delete her completely from the database
– Insertion anomaly
• when we cannot add a record because it has no value for part of the primary key
• example: if Name and Sport together form the primary key, we cannot add a new learner who
doesn’t do any sport
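The sketch below shows how splitting the sports example into two related tables removes the repeating groups and the redundancy, and how each anomaly then needs only one local change. It uses Python's built-in sqlite3 module. The LearnerID key, the table names Learner and LearnerSport, and the extra learner "New Learner" are hypothetical choices for illustration and do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on
conn.execute("PRAGMA foreign_keys = ON")

# Each learner is stored once: no data redundancy.
# LearnerID is a hypothetical primary key chosen for this sketch.
conn.execute("""
    CREATE TABLE Learner (
        LearnerID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Grade     INTEGER NOT NULL
    )
""")

# One record per learner-sport pair: no repeating groups.
# LearnerID here is a foreign key linking back to Learner.
conn.execute("""
    CREATE TABLE LearnerSport (
        LearnerID INTEGER NOT NULL REFERENCES Learner(LearnerID),
        Sport     TEXT NOT NULL,
        PRIMARY KEY (LearnerID, Sport)
    )
""")

learners = [(1, "Roger Philips", 10), (2, "Patience Mbata", 10),
            (3, "Hennie Venter", 12), (4, "Sarah Cohen", 11)]
sports = [(1, "Cricket"), (2, "Netball"), (2, "Hockey"),
          (3, "Hockey"), (3, "Rugby"), (3, "Golf"), (4, "Golf")]
conn.executemany("INSERT INTO Learner VALUES (?, ?, ?)", learners)
conn.executemany("INSERT INTO LearnerSport VALUES (?, ?)", sports)

# Update anomaly avoided: the name is changed in exactly one place
conn.execute("UPDATE Learner SET Name = 'Hendrik Venter' WHERE LearnerID = 3")

# Deletion anomaly avoided: Sarah stops golf, but her learner record stays
conn.execute("DELETE FROM LearnerSport WHERE LearnerID = 4 AND Sport = 'Golf'")

# Insertion anomaly avoided: a learner who does no sport can still be added
# ("New Learner" is a hypothetical record)
conn.execute("INSERT INTO Learner VALUES (5, 'New Learner', 11)")

# The relationship rebuilds the combined view; learners without sport show None
for name, sport in conn.execute("""
    SELECT Learner.Name, LearnerSport.Sport
    FROM Learner
    LEFT JOIN LearnerSport ON Learner.LearnerID = LearnerSport.LearnerID
    ORDER BY Learner.LearnerID, LearnerSport.Sport
"""):
    print(name, sport)
```

In this design the Grade of Hennie is stored once, no matter how many sports he plays. The composite primary key (LearnerID, Sport) only applies to the LearnerSport table, so it no longer blocks adding a learner.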
• INSERT, DELETE and UPDATE SQL statements can cause problems if a database is not structured
correctly. NB: PLAN before you DESIGN your database tables!
• Normalisation is an iterative (repeated) process of breaking relations down, producing new relations in
successive normal forms.
• The data in the database is broken down into tables in clearly defined stages using a set of rules
You have to:
• understand the problem
• try to get the database to represent the real world as accurately as possible
• be aware of any assumptions you are making (deliberately or not)
Derived or calculated fields
• A field whose value is calculated from other fields
• Example: if you have a cost price, sales price and profit field, then the profit field would be derived
(calculated) from the cost price and sales price fields
• A derived field should be left out! You can use an SQL statement to calculate it again whenever it is needed.
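The sketch below leaves the derived field out of the table and recalculates it in a query. It uses Python's built-in sqlite3 module. The Product table, the ProductID key and the prices are hypothetical examples and do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Profit is NOT stored: it is derived from CostPrice and SalesPrice
conn.execute("""
    CREATE TABLE Product (
        ProductID  INTEGER PRIMARY KEY,
        CostPrice  REAL NOT NULL,
        SalesPrice REAL NOT NULL
    )
""")
# Hypothetical sample prices
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, 50.0, 80.0), (2, 120.0, 175.0)])

# Calculate the profit whenever it is needed, so it can never be out of date
rows = conn.execute("""
    SELECT ProductID, SalesPrice - CostPrice AS Profit
    FROM Product
    ORDER BY ProductID
""").fetchall()
print(rows)  # [(1, 30.0), (2, 55.0)]
```

If a sales price changes later, the profit in the next query changes with it. A stored Profit field would have to be updated separately.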
Non-atomic data
• When a field contains too much information, which could be split into multiple fields
• Example: address field contains: 31 Vonkle street, Linksfield, Germiston, 1778
• If I wanted to find all students living in Germiston, the SQL statement would be problematic, because the town is
buried inside one long text value.
• Solution: break the address field into multiple fields, e.g. street address, suburb, town, province, postal code, etc.
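The sketch below compares a non-atomic address field with an atomic one. It uses Python's built-in sqlite3 module. The table and column names are hypothetical. The second student and the street name "Germiston Road" are invented only to show how searching inside one long text field can return the wrong students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Non-atomic design: the whole address is in one field
conn.execute("CREATE TABLE StudentOneField (StudentID INTEGER PRIMARY KEY, Address TEXT)")

# Atomic design: each part of the address has its own field.
# PostalCode is TEXT so that leading zeros are kept.
conn.execute("""
    CREATE TABLE StudentAtomic (
        StudentID     INTEGER PRIMARY KEY,
        StreetAddress TEXT,
        Suburb        TEXT,
        Town          TEXT,
        PostalCode    TEXT
    )
""")

conn.executemany("INSERT INTO StudentOneField VALUES (?, ?)", [
    (1, "31 Vonkle street, Linksfield, Germiston, 1778"),
    (2, "8 Germiston Road, Northtown, Southville, 9999"),  # hypothetical address
])
conn.executemany("INSERT INTO StudentAtomic VALUES (?, ?, ?, ?, ?)", [
    (1, "31 Vonkle street", "Linksfield", "Germiston", "1778"),
    (2, "8 Germiston Road", "Northtown", "Southville", "9999"),  # hypothetical
])

# One address field: we can only search inside the text,
# so student 2 is wrongly matched because of the street name
non_atomic = conn.execute(
    "SELECT StudentID FROM StudentOneField "
    "WHERE Address LIKE '%Germiston%' ORDER BY StudentID").fetchall()

# Atomic fields: the query states exactly what we mean
atomic = conn.execute(
    "SELECT StudentID FROM StudentAtomic "
    "WHERE Town = 'Germiston' ORDER BY StudentID").fetchall()

print(non_atomic, atomic)  # [(1,), (2,)] [(1,)]
```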