SDLC Tute - Monash University, Victoria, School of Information

Monash University
School of Information Management and Systems
IMS1907 Database Systems
Semester 2, 2004
Tutorial Weeks 8 & 9 - Normalisation of Data
Tutor's Notes
Tutorial Objectives:
- to develop further understanding of detailed data modelling
- to practise normalisation (detailed data modelling) skills
Tutorial Resources:
- Tutorial 8 & 9 handout – Normalisation of data
- IMS1907 Lecture Weeks 7 and 8 – Data modelling and Normalisation
Tutorial Task:
These questions will be considered over the next 2 weeks. There is no need to
address all questions – choose a selection that gradually increases in complexity
over weeks 8 and 9. Stress the importance of practice to learning this skill and
encourage students to do all these exercises and to find additional exercises in
textbooks.
1. Provide short answers to the following questions:
a) What are the objectives of the process of normalising data?
Stable, robust, flexible data structures – easy to add new attributes, new
structures, data structures do not change much
b) What is meant by the term 'primary key'?
An attribute or combination of attributes that uniquely identifies all other attributes
in a record or relation
c) What is meant by the term 'foreign key'?
An attribute that appears in one table as a PK and also in a second table as a
means of joining the tables together
d) What is meant by the term 'candidate key'?
Two or more attributes that appear in a relation where any of those attributes can
be used to uniquely identify all the other attributes in the relation – we refer to
these attributes as candidate keys and choose one to act as the PK – any further
dependencies on the other candidate keys are ignored from that point. We
generally choose the most stable attribute or the one over which we have most
control.
e) What is meant by the term 'functional dependency'?
A functional dependency occurs where for each value of an attribute (or
combination of attributes) in a relation, there is only ever one value of a second
attribute – all attributes in a relation should be fully functionally dependent on the
PK
f) Describe the steps required to convert an unnormalised relation to Third Normal
Form relations.
1 – identify PK
2 – identify repeating groups
3 – remove repeating groups → 1NF
4 – identify partial dependencies within the PK
5 – identify partial dependencies of non-key attributes on part of the PK
6 – remove partial dependencies → 2NF
7 – identify transitive dependencies between non-key attributes
8 – remove transitive dependencies → 3NF
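A functional dependency can be tested against sample data. The following is a minimal sketch in Python and is not part of the handout: the helper name fd_holds and the sample rows are invented for illustration. Sample data can only disprove a dependency; whether it really holds is decided by the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows (not taken from the handout).
rows = [
    {"emp_no": 1, "emp_name": "Lee", "dept_no": 10, "dept_name": "Sales"},
    {"emp_no": 2, "emp_name": "Kim", "dept_no": 10, "dept_name": "Sales"},
    {"emp_no": 3, "emp_name": "Lee", "dept_no": 20, "dept_name": "Audit"},
]

print(fd_holds(rows, ["dept_no"], ["dept_name"]))  # dept_no determines dept_name here
print(fd_holds(rows, ["emp_name"], ["dept_no"]))   # two employees named Lee differ
```

In these rows emp_no determines every other attribute, which is what we expect of a PK.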
2. Investigate the data in the following examples to establish all the business rules.
Some clues are given in each case. Draw an ER model for the relation. Then use
the steps of normalisation to fully normalise the data given. Check your answer by
drawing a data structure diagram for the answer and comparing it with the
expected diagram of your business rules.
These are very straightforward – you can do a selection of these but recommend
that students attempt them all. It is important that you work through these – show
how you get to each form and indicate all dependencies using arrows – I expect
this in the exam.
a) EMPLOYEE
(EMP-NO., EMP-NAME, SALARY, (PROJ-NO, PROJ-NAME, COMPLETION-DATE))
[Each project has a due date by which completion is expected.]
EMPLOYEE
(EMP-NO., EMP-NAME, SALARY, (PROJ-NO, PROJ-NAME,
COMPLETION-DATE))
1NF
EMPLOYEE
(EMP-NO., EMP-NAME, SALARY)
EMPLOYEE-PROJECT (EMP-NO, PROJ-NO, PROJ-NAME, COMPLETION-DATE)
2NF
The proj-name and completion date depend on proj-no resulting in the following
relations:
EMPLOYEE
(EMP-NO., EMP-NAME, SALARY)
PROJECT (PROJ-NO, PROJ-NAME, COMPLETION-DATE)
EMPLOYEE-PROJECT (EMP-NO, PROJ-NO)
2NF=3NF
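One way to check a decomposition like this is to project the flat data onto each relation and then join the pieces back together. The sketch below uses made-up rows (they are not from the handout) and plain Python sets; if the join gives back exactly the original rows, nothing was lost.

```python
# Hypothetical flat rows:
# (emp_no, emp_name, salary, proj_no, proj_name, completion_date)
flat = [
    (101, "Ann", 50000, "P1", "Payroll", "2004-10-01"),
    (101, "Ann", 50000, "P2", "Intranet", "2004-12-15"),
    (102, "Raj", 62000, "P1", "Payroll", "2004-10-01"),
]

# Project onto the three 3NF relations; sets remove the duplicated facts.
employee = {(e, n, s) for e, n, s, *_ in flat}
project = {(p, pn, d) for *_, p, pn, d in flat}
employee_project = {(r[0], r[3]) for r in flat}

# Join back through the two keys held in EMPLOYEE-PROJECT.
emp_by_no = {e[0]: e for e in employee}
proj_by_no = {p[0]: p for p in project}
rejoined = {emp_by_no[e] + proj_by_no[p] for e, p in employee_project}

print(len(employee), len(project), len(employee_project))
print(rejoined == set(flat))
```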
b) EMPLOYEE (EMP-NO, EMP-NAME, EMP-LOCATION, DEPT-NO, DEPT-NAME)
[Each employee has an office that is his/her "location"]
The data set here indicates that an employee only works for one dept – it would
result in slightly different 3NF if the opposite was true.
EMPLOYEE (EMP-NO, EMP-NAME, EMP-LOCATION, DEPT-NO, DEPT-NAME)
1NF
EMPLOYEE (EMP-NO, EMP-NAME, EMP-LOCATION, DEPT-NO, DEPT-NAME)
The PK has only a single part, so there are no partial dependencies and
1NF=2NF
3NF
The following transitive dependency exists in the 2NF relation
Dept-name depends on dept-no
This results in the following 3NF relations:
EMPLOYEE (EMP-NO, EMP-NAME, EMP-LOCATION, DEPT-NO)
DEPT (DEPT-NO, DEPT-NAME)
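To show what the 3NF relations buy us, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the relations above, but the data types and sample rows are assumptions made for the example. The department name is stored once, so renaming it is a single update, and the foreign key stops an employee being given a department that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE dept (
    dept_no   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_no       INTEGER PRIMARY KEY,
    emp_name     TEXT NOT NULL,
    emp_location TEXT,
    dept_no      INTEGER NOT NULL REFERENCES dept (dept_no)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO dept VALUES (10, 'Sales')")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Lee", "Room 2.01", 10), (2, "Kim", "Room 2.02", 10)],
)

# One update renames the department for every employee in it.
conn.execute("UPDATE dept SET dept_name = 'Marketing' WHERE dept_no = 10")
names = conn.execute(
    "SELECT e.emp_name, d.dept_name FROM employee e "
    "JOIN dept d ON d.dept_no = e.dept_no ORDER BY e.emp_no"
).fetchall()
print(names)

# The FK rejects an employee whose department does not exist.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Ali', 'Room 3.10', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```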
c) PROGRAMMER (PROGRAMMER-ID, PROGRAMMER-NAME, (PACKAGE-NO, PACKAGE-NAME, NO-HRS-WORKED))
[A package is a collection of programs. Several programmers may work on the
same package at the same time. For costing purposes, the Department wants to
know how many hours each programmer spent on each package.]
PROGRAMMER (PROGRAMMER-ID, PROGRAMMER-NAME, (PACKAGE-NO,
PACKAGE-NAME, NO-HRS-WORKED))
1NF
PROGRAMMER (PROGRAMMER-ID, PROGRAMMER-NAME)
PROGRAMMER-PACKAGE (PROGRAMMER-ID, PACKAGE-NO, PACKAGE-NAME, NO-HRS-WORKED)
2NF
The package-name depends on the package-no, so the following 2NF relations result:
PROGRAMMER (PROGRAMMER-ID, PROGRAMMER-NAME)
PACKAGE ( PACKAGE-NO, PACKAGE-NAME)
PROGRAMMER-PACKAGE (PROGRAMMER-ID, PACKAGE-NO, NO-HRS-WORKED)
2NF=3NF
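NO-HRS-WORKED stays in PROGRAMMER-PACKAGE because it depends on the whole two-part key. A small Python sketch of that idea, with invented identifiers and hours: the dictionary key is the composite PK, and the costing figure per package is just a sum over it.

```python
from collections import defaultdict

# Hypothetical PROGRAMMER-PACKAGE rows keyed by the composite PK
# (programmer_id, package_no) -> no_hrs_worked
hours_worked = {
    ("PR1", "PK1"): 12,
    ("PR1", "PK2"): 5,
    ("PR2", "PK1"): 8,
}

# Hours for one programmer on one package: a direct lookup on the full key.
print(hours_worked[("PR2", "PK1")])

# Total hours per package, for costing purposes.
hours_per_package = defaultdict(int)
for (programmer_id, package_no), hours in hours_worked.items():
    hours_per_package[package_no] += hours
print(dict(hours_per_package))
```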
d) PART (PART-NO, PART-DESCRIPTION, (SUPPLIER-NO, SUPPLIER-NAME,
SUPPLIER-ADDRESS, PRICE))
[The same part may be available from different suppliers at different prices.]
PART (PART-NO, PART-DESCRIPTION, (SUPPLIER-NO, SUPPLIER-NAME,
SUPPLIER-ADDRESS, PRICE))
1NF
PART (PART-NO, PART-DESCRIPTION)
PART-SUPPLIER (PART-NO, SUPPLIER-NO, SUPPLIER-NAME, SUPPLIER-ADDRESS, PRICE)
2NF
The supplier-name and supplier-address depend on the supplier-no, so the
following 2NF relations result:
PART (PART-NO, PART-DESCRIPTION)
SUPPLIER (SUPPLIER-NO, SUPPLIER-NAME, SUPPLIER-ADDRESS)
PART-SUPPLIER (PART-NO, SUPPLIER-NO, PRICE)
2NF=3NF
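The 2NF step can be made mechanical once the dependencies are written down: a partial dependency is one whose left-hand side is only part of the PK. The sketch below is illustrative only; the function name partial_dependencies and the way the dependencies are encoded as (left, right) tuples are assumptions for the example.

```python
def partial_dependencies(key, fds):
    """Return the dependencies whose left-hand side is a proper subset of the key."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds if set(lhs) < key]


# 1NF relation PART-SUPPLIER and the dependencies identified in the answer.
key = ("part_no", "supplier_no")
fds = [
    (("part_no", "supplier_no"), ("price",)),
    (("supplier_no",), ("supplier_name", "supplier_address")),
]

for lhs, rhs in partial_dependencies(key, fds):
    print(lhs, "->", rhs)
```

Only the supplier dependency is reported, which is why SUPPLIER is split out while PRICE stays with the full key.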
e) EMPLOYEE (EMP-NO, EMP-NAME, (SKILL-CODE, SKILL-DESC),
SALARY)
[An employee's initial salary may have taken his skill levels into account but there
is no direct relationship between skills and salary level.]
EMPLOYEE (EMP-NO, EMP-NAME, (SKILL-CODE, SKILL-DESC), SALARY)
1NF
EMPLOYEE (EMP-NO, EMP-NAME, SALARY)
EMPLOYEE-SKILL (EMP-NO, SKILL-CODE, SKILL-DESC)
2NF
The skill-desc depends on the skill-code, so the following 2NF relations result:
EMPLOYEE (EMP-NO, EMP-NAME, SALARY)
SKILL (SKILL-CODE, SKILL-DESC)
EMPLOYEE-SKILL (EMP-NO, SKILL-CODE)
2NF=3NF
f) REGION
(REGION-NAME, REGION-MANAGER, LOCATION (CUST-NAME, CUST-ADDRESS))
[A customer is serviced only by his local region. Any customers with multiple
branches are given different customer names.]
REGION (REGION-NAME, REGION-MANAGER, LOCATION (CUST-NAME,
CUST-ADDRESS))
The PK here is interesting. Strictly speaking, the only key we need is cust-name,
as this will uniquely ID the rest of the relation. We will allow region-name to be
included, but strictly speaking it is redundant and over-specified.
Also, region-name, region-manager and location can be considered as candidate
keys as they identify each other. We choose region-name as the most stable.
1NF
REGION (REGION-NAME, REGION-MANAGER, LOCATION)
REGION-CUSTOMER (REGION-NAME, CUST-NAME, CUST-ADDRESS)
2NF
The cust-address depends on the cust-name, so the following 2NF relations result:
REGION (REGION-NAME, REGION-MANAGER, LOCATION)
CUSTOMER (CUST-NAME, CUST-ADDRESS)
REGION-CUSTOMER (REGION-NAME, CUST-NAME)
Although region-manager and location are dependent on each other, we ignore
this: we discarded them as candidate keys and can disregard them, so
2NF=3NF
The next question was covered in the workshop so it can be ignored – solution
included anyway
2. The data in the following table contains an example of data that is not fully
normalised.
- Draw the ER model for this relation
- Define the create anomaly using data from the table.
- Define the delete anomaly using data from the table.
- Define the update anomaly using data from the table.
- Express the structure of the table above as a set of 3NF relations. Show the
steps you follow to obtain these relations.
- Draw the resulting DSD
Book-no  Copy  Call-no   Borrower-no  Name   Address
1256     3     102.64.c  12345        Adams  Brighton
3297     1     356.66d   35666        Boyle  Caulfield
2672     1     785.99e   24287        Boyle  Frankston
1256     1     102.64c   35926        Brown  Caulfield
3357     2     557.22a   23510        Dent   Prahran
6889     2     229.89d   35926        Brown  Caulfield
BOOK (Book-no, copy, call-no, borrower-no, name, address)
Book-no and call-no are candidate keys – we control book-no
The PK of this relation needs both parts to uniquely ID the borrower of a book but it
needs to be arranged to ID repeating groups.
BOOK (Book-no, call-no, (copy, borrower-no, name, address))
1NF
BOOK (Book-no, call-no)
BORROWED-BOOK (Book-no, copy, borrower-no, name, address)
No partial dependencies exist in borrowed-book so
1NF=2NF
3NF
Name and address are dependent on borrower-no resulting in the following 3NF
BOOK (Book-no, call-no)
BORROWER (borrower-no, name, address)
BORROWED-BOOK (Book-no, copy, borrower-no)
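The update and delete anomalies asked for in this question can be demonstrated directly on the unnormalised table. The sketch below copies the six rows from the table in the question into Python tuples; the variable names are only for illustration.

```python
# Rows copied from the table in the question:
# (book_no, copy, call_no, borrower_no, name, address)
loans = [
    (1256, 3, "102.64.c", 12345, "Adams", "Brighton"),
    (3297, 1, "356.66d", 35666, "Boyle", "Caulfield"),
    (2672, 1, "785.99e", 24287, "Boyle", "Frankston"),
    (1256, 1, "102.64c", 35926, "Brown", "Caulfield"),
    (3357, 2, "557.22a", 23510, "Dent", "Prahran"),
    (6889, 2, "229.89d", 35926, "Brown", "Caulfield"),
]

# Update anomaly: Brown's address is stored once per loan, so a change of
# address has to be applied to every one of these rows.
rows_to_update = [r for r in loans if r[3] == 35926]
print(len(rows_to_update))

# Delete anomaly: removing the row for book 3357 copy 2 removes the only
# record of borrower 23510 (Dent).
after_return = [r for r in loans if (r[0], r[1]) != (3357, 2)]
dent_still_known = any(r[3] == 23510 for r in after_return)
print(dent_still_known)
```

The create anomaly is the mirror image: a new borrower cannot be recorded in this table until they borrow a book. With a separate BORROWER relation none of these problems arise.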
3. Consider the data in the table below. The table has been designed to record
information about purchase orders. Some business rules may be inferred from the
data values but you should list any further assumptions you make.
PO-No   Supp-No  Supp-Name  Item-No  Item-Desc  Qty  Cost  PO-Date
158976  4576     Grey       3593     Nut        35   $8    15/5/98
158976  4576     Grey       9284     Bolt       40   $25   15/5/98
158976  4576     Grey       3598     Washer     30   $5    15/5/98
454638  7589     White      3485     Spring     400  $200  14/5/98
374365  3849     White      3593     Nut        10   $3    12/5/98
374365  3489     White      5467     Screw      11   $3    12/5/98
- Draw the ER model for this relation
- Consider the create, delete and update anomalies using the table data.
- Express the structure of the table above as a set of 3NF relations. Show the
steps you follow to obtain these relations.
- Draw the resulting DSD
PO (PO-no, supp-no, supp-name, item-no, item-desc, qty, cost, PO-date)
This needs rearranging to ID repeating groups.
PO (PO-no, PO-date, supp-no, supp-name, (item-no, item-desc, qty, cost))
1NF
PO (PO-no, PO-date, supp-no, supp-name)
PO-ITEM (PO-no, item-no, item-desc, qty, cost)
2NF
Item-desc depends on item-no, and there is no apparent relationship between
item-no and its cost. This results in the following 2NF
PO (PO-no, PO-date, supp-no, supp-name)
ITEM (item-no, item-desc)
PO-ITEM (PO-no, item-no, qty, cost)
3NF
Supp-name depends on supp-no resulting in the following 3NF
PO (PO-no, PO-date, supp-no)
SUPPLIER (supp-no, supp-name)
ITEM (item-no, item-desc)
PO-ITEM (PO-no, item-no, qty, cost)
4. The TopTech Computer Training Company offers courses in IT to businesses and
other organisations. The following database table currently stores the trainee
records for the Course and Trainee Records information system at TopTech.
Assume that this is the sole table to record data about companies, trainees and the
courses they take.
Some of the business rules may be inferred from the data in the table. Other
business rules you need to consider are as follows:
– A trainee may have attended courses while working for the same or different
companies
– A trainee can only attend one course on a given day
– More than one course may be conducted on a single day
– A trainee may ‘fail’ a course and therefore attend it twice
Company No  Company Name          Phone      Trainee No  Trainee Name   Address    Course     Date     Paid
123         BJP                   9812 3456  4067        Bill Nguyen    Clayton    Notes 1    1/6/98   YES
123         BJP                   9812 3456  4067        Bill Nguyen    Clayton    Notes 2    9/6/98   YES
245         Henderson Consulting  9574 1234  2122        Amanda Pappas  Dandenong  Notes 1    20/7/98  YES
378         Dunlop Uni            9905 5000  4067        Bill Nguyen    Clayton    MS Office  3/8/98   NO
378         Dunlop Uni            9905 5000  3095        Jenny Tran     Berwick    Notes 1    20/7/98  NO
378         Dunlop Uni            9905 5000  1997        John Murphy    Altona     MS Office  3/8/98   NO
- Draw the ER model based on the above information
- Check that your ER diagram is correct and captures all business rules
- Express the structure of the table above as a set of 3NF relations. Show the
steps you follow to obtain these relations.
- Draw the resulting DSD
COMPANY (Company-no, name, phone, (trainee-no, trainee-name, address,
(course, (date, paid))))
1NF
COMPANY (Company-no, name, phone)
COMPANY-TRAINEE (Company-no, trainee-no, trainee-name, address)
COMPANY-TRAINEE-COURSE (Company-no, trainee-no, course)
COURSE-STATUS (Company-no, trainee-no, course, date, paid)
2NF
trainee-name and address depend on trainee-no
company-no, course and paid are all dependent on trainee-no and date (i.e. if we
know trainee-no and date we know the company, course and its paid status, because
the trainee can only be working for one company on that date and can only attend
one course)
This results in the following 2NF relations
COMPANY (Company-no, name, phone)
TRAINEE (trainee-no, trainee-name, address)
COMPANY-TRAINEE (Company-no, trainee-no)
COMPANY-TRAINEE-COURSE (Company-no, trainee-no, course)
COURSE-STATUS (Company-no, trainee-no, course, date, paid)
There are no transitive dependencies so
2NF=3NF
The DSD for this is as follows
[DSD figure: COMPANY, COMPANY-TRAINEE, TRAINEE, COMPANY-TRAINEE-COURSE, COURSE-STATUS]
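The business rule that a trainee can only attend one course on a given day is what makes (trainee-no, date) determine the rest of the attendance data. One possible way to enforce it in a database is a uniqueness constraint, sketched below with Python's sqlite3 module. The column names and types are assumptions for the example; the six rows come from the table in the question, with dates kept as the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE course_status (
    company_no  INTEGER NOT NULL,
    trainee_no  INTEGER NOT NULL,
    course      TEXT    NOT NULL,
    attend_date TEXT    NOT NULL,
    paid        TEXT    NOT NULL,
    UNIQUE (trainee_no, attend_date)  -- one course per trainee per day
)""")

rows = [
    (123, 4067, "Notes 1", "1/6/98", "YES"),
    (123, 4067, "Notes 2", "9/6/98", "YES"),
    (245, 2122, "Notes 1", "20/7/98", "YES"),
    (378, 4067, "MS Office", "3/8/98", "NO"),
    (378, 3095, "Notes 1", "20/7/98", "NO"),
    (378, 1997, "MS Office", "3/8/98", "NO"),
]
conn.executemany("INSERT INTO course_status VALUES (?, ?, ?, ?, ?)", rows)

# A second course for trainee 4067 on 1/6/98 breaks the business rule.
try:
    conn.execute(
        "INSERT INTO course_status VALUES (378, 4067, 'MS Office', '1/6/98', 'NO')"
    )
    clash_rejected = False
except sqlite3.IntegrityError:
    clash_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM course_status").fetchone()[0]
print(clash_rejected, row_count)
```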
5. The relation below and its accompanying business rules is a portion of the data
from a personnel system. Convert this relation into a set of third normal form
relations.
Employee (Personnel number, employee name, employee address, employee
telephone number, employee date of birth, department number, department
name, commencement date, job title, (training course name, training course
date, course duration, skill level acquired), (project number, project name,
project start date, project end date))
1. Each project may have a number of employees assigned to it.
2. Each course may be attended by a number of employees.
3. Every time a training course is run it is of the same duration.
4. Project start date is the date on which a project commences and project end
date is the date on which it is completed.
List any further assumptions you make about the business rules that apply.
Assumptions:
– commencement date and job title relate to the starting date with the company
and job title at this time
– an employee may attend a particular course more than once
– an employee only attends one course on a given day
– each course has a fixed duration
Employee (Personnel number, employee name, employee address,
employee telephone number, employee date of birth, department
number, department name, commencement date, job title, (training
course name, training course date, course duration, skill level
acquired), (project number, project name, project start date, project
end date))
1NF
Employee (Personnel number, employee name, employee address,
employee telephone number, employee date of birth, department
number, department name, commencement date, job title)
Employee-Course (Personnel number, training course name, training course
date, course duration, skill level acquired)
Employee-Project (Personnel number, project number, project name, project
start date, project end date)
2NF
training course duration is dependent on training course name
training course name is dependent on personnel number and training course
date
project name, project start date and project end date are dependent on
project number
Employee (Personnel number, employee name, employee address,
employee telephone number, employee date of birth, department
number, department name, commencement date, job title)
Course (training course name, course duration)
Employee-Course (Personnel number, training course name, training course
date, skill level acquired)
Project (project number, project name, project start date, project end date)
Employee-Project (Personnel number, project number)
3NF
department name depends on department number
This results in the following 3NF relations
Employee (Personnel number, employee name, employee address,
employee telephone number, employee date of birth, department
number, commencement date, job title)
Department (department number, department name)
Course (training course name, course duration)
Employee-Course (Personnel number, training course name, training course
date, skill level acquired)
Project (project number, project name, project start date, project end date)
Employee-Project (Personnel number, project number)
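A useful check on a chosen key is to compute the closure of its attributes, that is, everything that can be reached from them by following the functional dependencies. The sketch below is a minimal Python version of that check for the 1NF Employee-Course relation, using the two dependencies listed in the 2NF step; the function name closure and the shortened attribute names are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency when its whole left-hand side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (("course_name",), ("course_duration",)),
    (("personnel_no", "course_date"), ("course_name", "skill_level")),
]
employee_course = {
    "personnel_no", "course_name", "course_date", "course_duration", "skill_level",
}

# personnel number + course date reaches every attribute, so it works as a key.
print(closure({"personnel_no", "course_date"}, fds) == employee_course)
# personnel number alone determines nothing else in this relation.
print(closure({"personnel_no"}, fds))
```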
6. The SIMS Alumni Association wishes to keep records of all their past students and
their employment history ie the companies they have worked for and the positions
they have held within the companies.
The data for these records is contained in a single table as follows:
Student No  Student Name  Student Address  Course Code  Course Name  Company Name  Company Address  Position Held       Date
1256        Jane          Brighton         9458         M. Comp      Boles         Carnegie         Programmer          050297
3297        Jack          Caulfield        2358         Bach. I.S.   Felstra       Melbourne        Programmer          030593
2672        Bill          Frankston        9458         M. Comp      Mobil         Melbourne        Analyst Programmer  020794
1256        Jane          Brighton         9458         M. Comp      Waysafe       Geelong          Systems Analyst     100195
1256        Jane          Brighton         2358         Bach. I.S.   Sands         Caulfield        Systems Analyst     050593
1256        Jane          Brighton         9458         M. Comp      Waysafe       Geelong          IT Project Manager  180899
The business rules may be inferred from the data in the table but list any further
assumptions you make.
- Draw the ER model based on the above information
- Check that your ER diagram is correct and captures all business rules
- Express the structure of the table above as a set of 3NF relations. Show the
steps you follow to obtain these relations.
- Draw the resulting DSD.
From the table above the following UNF relation can be formed
STUDENT (student-no, student-name, student-address, course-code, course-name,
company-name, company-address, position-held, date)
From the table above the following rules can be determined:
– A student can take more than one course
– A student can work at more than one company
– A student can have held more than one position with the same company but
on different dates
The following assumptions are also made:
On any given date a student will only hold one position with one company
The following UNF can be determined:
STUDENT (student-no, student-name, student-address, (course-code, course-name),
(company-name, company-address, (position-held, date)))
1NF
STUDENT (student-no, student-name, student-address)
STUDENT-COURSE (student-no, course-code, course-name)
STUDENT-COMPANY (student-no, company-name, company-address)
STUDENT-COMPANY-POSITION (student-no, company-name, position-held, date)
The following dependencies exist:
course-code → course-name
company-name → company-address
student-no, date → company-name, position-held (on any particular date, a student
can have only been holding one position at one company)
This leads to the following 2NF relations
2NF
STUDENT (student-no, student-name, student-address)
COURSE (course-code, course-name)
STUDENT-COURSE (student-no, course-code)
COMPANY (company-name, company-address)
STUDENT-COMPANY (student-no, company-name)
STUDENT-COMPANY-POSITION (student-no, company-name, position-held, date)
There are no transitive dependencies so
2NF=3NF
7. The relations below represent a portion of the data from an insurance claims
system. Convert these relations into a set of third normal form relations. Show all
intermediate forms of the relation between unnormalised and third normal form.
List any assumptions you make concerning the "business rules".
Claim (claim no., claim date, claimant name, claim type, claim type payment
rate, claim details, policy no., policy type)
Policy (policy no., policy type, policy date, policy renewal date, policy amount
due, no. of claims processed, (claim no., date of claim, claim type,
claim amount paid))
The following relatively straightforward ER can be drawn
[ER diagram: POLICY, CLAIM]
So merging the two relations above we arrive at the following two UNF relations
Claim (claim no., claim date, claimant name, claim type, claim type payment rate,
claim amount paid, claim details, policy no., policy type, policy date, policy
renewal date, policy amount due, no. of claims processed)
Policy (policy no., policy type, policy date, policy renewal date, policy amount due,
no. of claims processed, (claim no., claim date, claimant name, claim type,
claim type payment rate, claim amount paid, claim details))
We should now normalise both of these separately and merge our final 3NF
relations.
Claim (claim no., claim date, claimant name, claim type, claim type payment rate,
claim amount paid, claim details, policy no., policy type, policy date, policy
renewal date, policy amount due, no. of claims processed)
There are no repeating groups so
1NF
Claim (claim no., claim date, claimant name, claim type, claim type payment rate,
claim amount paid, claim details, policy no., policy type, policy date, policy
renewal date, policy amount due, no. of claims processed)
There are no partial dependencies so
2NF
Claim (claim no., claim date, claimant name, claim type, claim type payment rate,
claim amount paid, claim details, policy no., policy type, policy date, policy
renewal date, policy amount due, no. of claims processed)
The following transitive dependencies exist
claim type → claim type payment rate
policy no → policy type, policy date, policy renewal date, policy amount due, no. of
claims processed
These result in the following 3NF relations
3NF
Claim-Type (claim type, claim type payment rate)
Policy (policy no., policy type, policy date, policy renewal date, policy amount due,
no. of claims processed)
Claim (claim no., claim date, claimant name, claim type, claim amount paid, claim
details, policy no.)
Now we normalise the Policy relation
Policy (policy no., policy type, policy date, policy renewal date, policy amount due,
no. of claims processed, (claim no., claim date, claimant name, claim type,
claim type payment rate, claim amount paid, claim details))
1NF
Policy (policy no., policy type, policy date, policy renewal date, policy amount due,
no. of claims processed)
Policy-Claim (policy no., claim no., claim date, claimant name, claim type, claim type
payment rate, claim amount paid, claim details)
The following dependencies exist
claim no → policy no, claim date, claimant name, claim type, claim type payment
rate, claim amount paid, claim details
Removing these dependencies we end up with
2NF
Policy (policy no., policy type, policy date, policy renewal date, policy amount due,
no. of claims processed)
Claim (claim no., policy no, claim date, claimant name, claim type, claim type
payment rate, claim amount paid, claim details)
The following transitive dependencies exist
claim type → claim type payment rate
Removing this dependency we end up with
3NF
Policy (policy no., policy type, policy date, policy renewal date, policy amount due,
no. of claims processed)
Claim (claim no., policy no, claim date, claimant name, claim type, claim amount
paid, claim details)
Claim Type (claim type, claim type payment rate)
Merging the two groups of relations we end up with the following final 3NF relations
3NF
Policy (policy no., policy type, policy date, policy renewal date, policy amount due,
no. of claims processed)
Claim (claim no., policy no, claim date, claimant name, claim type, claim amount
paid, claim details)
Claim Type (claim type, claim type payment rate)
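The final merge can be thought of as combining relations that share the same PK and taking the union of their attributes. The sketch below illustrates this in Python with the two groups of 3NF relations from this answer; representing each group as a dictionary from PK to attribute list, and the function name merge_relations, are assumptions made for the example.

```python
def merge_relations(*groups):
    """Merge relations that share a PK, keeping each attribute once."""
    merged = {}
    for group in groups:
        for key, attrs in group.items():
            target = merged.setdefault(key, [])
            for attr in attrs:
                if attr not in target:
                    target.append(attr)
    return merged


policy_attrs = ["policy no", "policy type", "policy date", "policy renewal date",
                "policy amount due", "no. of claims processed"]

# 3NF relations obtained from the Claim relation, keyed by PK.
from_claim = {
    ("claim type",): ["claim type", "claim type payment rate"],
    ("policy no",): policy_attrs,
    ("claim no",): ["claim no", "claim date", "claimant name", "claim type",
                    "claim amount paid", "claim details", "policy no"],
}
# 3NF relations obtained from the Policy relation, keyed by PK.
from_policy = {
    ("policy no",): policy_attrs,
    ("claim no",): ["claim no", "policy no", "claim date", "claimant name",
                    "claim type", "claim amount paid", "claim details"],
    ("claim type",): ["claim type", "claim type payment rate"],
}

final = merge_relations(from_claim, from_policy)
for key, attrs in final.items():
    print(key, attrs)
```

Both routes give the same three PKs with the same attributes, so the merge produces exactly the three final relations listed above.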