Normalisation
Help me Codd!!
Reading: Connolly and Begg, chapters 13 & 14 (4th ed.)
[Figure: a single relation (A B C D E F), labelled "1NF?", decomposed into Relation1 (A B), Relation2 (A* C D E*) and Relation3 (E F)]
From this… [Figure: one wide unnormalised table with columns including regno, student_name, sex, student_address, code, title, result, requires, lecturer_id, lecturer_name, lecturer_address, qual and position; the same student, course and lecturer details repeat across many rows]
…to this [Figure: the normalised result, a set of smaller tables]
In 3+ easy(?) steps
What is normalisation?
• A method for database design
  – Takes a set of attributes and derives the relational model
  – By separating out the required tables
  – A completely different approach from ERM
• The theory examines how “good” a schema is
  – Transforms non-normalised schemas
  – Minimises storage
  – But should arrive at the same result as ERM
• A minimum of three steps (1NF, 2NF, 3NF) is used
• At each stage the normal form gets stronger (i.e. removes more redundancy), so the relation is less open to update anomalies
• All stages are based on functional dependencies
Functional Dependency
• Underpins the normalisation process
• If every value of column A uniquely determines the value in column B, then
  – B is functionally dependent on A (B depends on A)
  – A determines B, or, formally, A → B (A is called the determinant)
• For example,
  – EmpID → Age, Dept (A → B, C)
  – EmpID, Project → Role (X, Y → Z)
• Note that multiple attributes are often involved
[Figure: functional dependency diagram over EmpID, Project, Age, Dept, Dsize, Budget and Role]
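The definition can be turned into a check against sample rows. This is a minimal sketch, assuming each row is a Python dict keyed by column name; the helper name fd_holds and the two sample rows are hypothetical. Note that sample data can only refute a functional dependency: an FD is a rule about every valid state of the relation, so a sample in which it happens to hold does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True unless two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[b] for b in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False  # same determinant value, different dependent value
    return True


# Hypothetical sample rows for the EmpID, Project -> Role example
rows = [
    {"EmpID": "E1", "Project": "P1", "Role": "Prog."},
    {"EmpID": "E1", "Project": "P2", "Role": "Analyst"},
]

print(fd_holds(rows, ["EmpID", "Project"], ["Role"]))  # True
print(fd_holds(rows, ["EmpID"], ["Role"]))  # False: E1 has two roles
```

The second call shows why this determinant needs both attributes: EmpID on its own does not determine Role.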
Rules for functional dependency
• A → B does NOT automatically mean B → A
  – E.g. student ID → name, but not name → student ID (two students can share a name)
• Transitive dependency:
  – If A → B and B → C, then A → C
• Many other rules
  – E.g. if X, Y → Z but X → Z also holds
  – In this case Z is partially dependent on X, Y
• “Transitive” and “partial” dependency are two key concepts of the normalisation process
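These rules can be applied mechanically by computing the attribute closure X+, the set of all attributes that X determines once every applicable FD has been followed. The sketch below is the standard closure algorithm written in Python; the function name closure and the single-letter FDs are illustrative only.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole determinant is in the closure, add its dependents
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Transitivity: A -> B and B -> C give A -> C
print(sorted(closure({"A"}, [({"A"}, {"B"}), ({"B"}, {"C"})])))  # ['A', 'B', 'C']

# StudentID -> Name does not give Name -> StudentID
print(closure({"Name"}, [({"StudentID"}, {"Name"})]))  # {'Name'}

# X, Y -> Z with X -> Z as well: Z is partially dependent on {X, Y}
print("Z" in closure({"X"}, [({"X", "Y"}, {"Z"}), ({"X"}, {"Z"})]))  # True
```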
A Question for you!
[Figure: functional dependency diagram over EmpID, Project, Age, Dept, Dsize, Budget and Role, with four dependencies labelled A, B, C and D]

EmpID  Project  Age  Dept  Dsize  Budget  Role
E1     P2       33   D2    10     100     Analyst
E1     P1       33   D2    10     200     Prog.
E2     P1       34   D5    10     200     Prog.
E2     P2       34   D5    20     100     Analyst

Which functional dependency is violated by the data?
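One way to answer is to test each dependency against the four rows. This sketch assumes the candidate dependencies are EmpID → Age, Dept; Dept → Dsize; Project → Budget; and EmpID, Project → Role (the first and last are given earlier in these notes, the other two are inferred from the way this example is decomposed for 2NF); fd_holds is a hypothetical helper name.

```python
def fd_holds(rows, lhs, rhs):
    """True unless two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[b] for b in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


columns = ["EmpID", "Project", "Age", "Dept", "Dsize", "Budget", "Role"]
data = [
    ("E1", "P2", 33, "D2", 10, 100, "Analyst"),
    ("E1", "P1", 33, "D2", 10, 200, "Prog."),
    ("E2", "P1", 34, "D5", 10, 200, "Prog."),
    ("E2", "P2", 34, "D5", 20, 100, "Analyst"),
]
rows = [dict(zip(columns, values)) for values in data]

# Assumed candidate dependencies for this example
candidate_fds = [
    (["EmpID"], ["Age", "Dept"]),
    (["Dept"], ["Dsize"]),
    (["Project"], ["Budget"]),
    (["EmpID", "Project"], ["Role"]),
]

violated = [(lhs, rhs) for lhs, rhs in candidate_fds if not fd_holds(rows, lhs, rhs)]
for lhs, rhs in violated:
    print(", ".join(lhs), "->", ", ".join(rhs))  # Dept -> Dsize
```

Department D5 appears with Dsize 10 in one row and 20 in another, which is exactly the kind of inconsistency that redundant storage allows.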
Unnormalised Form
• Relation contains:
  – non-atomic attribute values

ID  Employee  Salary  Project
1   Grey      31000   A
2   Brown     35000   B,C
3   White     55000   A,B,C
4   Black     47000   A,C

Violation of 1NF: non-atomic values
First Normal Form
• Permits only single (atomic) attribute values

Flattening the repeating group into one table introduces redundancy:

ID  Employee  Salary  Project  Budget
1   Grey      31000   A        10
2   Brown     35000   B        5
2   Brown     35000   C        5
3   White     55000   A        5
3   White     55000   B        5
3   White     55000   C        5
4   Black     47000   A        10
4   Black     47000   C        5

Instead, remove the repeating group into its own table, along with the primary key from the other table:

ID  Employee  Salary
1   Grey      31000
2   Brown     35000
3   White     55000
4   Black     47000

ID (fk)  Project  Budget
1        A        10
2        B        5
2        C        5
3        A        5
3        B        5
3        C        5
4        A        10
4        C        5
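The same transformation can be sketched in a few lines of Python. This assumes the unnormalised relation is held as tuples whose last field is a list of projects (the non-atomic value); Budget is left out because it does not appear in the unnormalised table, and the variable names are hypothetical.

```python
# Unnormalised relation: the Project field holds a list, i.e. a non-atomic value
unnormalised = [
    (1, "Grey", 31000, ["A"]),
    (2, "Brown", 35000, ["B", "C"]),
    (3, "White", 55000, ["A", "B", "C"]),
    (4, "Black", 47000, ["A", "C"]),
]

# Flattening gives atomic values, but repeats Employee and Salary per project
flattened = [(emp_id, name, salary, p)
             for emp_id, name, salary, projects in unnormalised
             for p in projects]

# Better: keep one row per employee ...
employee = [(emp_id, name, salary) for emp_id, name, salary, _ in unnormalised]

# ... and move the repeating group to its own table, with ID as a foreign key
employee_project = [(emp_id, p)
                    for emp_id, _, _, projects in unnormalised
                    for p in projects]

print(len(flattened), len(employee), len(employee_project))  # 8 4 8
```

Each salary is now stored once instead of once per project, so changing an employee's salary touches a single row.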
Second Normal Form
• Full Functional Dependency (FFD)
  – X → Y is an FFD if removal of any attribute from X removes the dependency
  – X → Y is a partial dependency if removal of some attribute from X leaves the dependency intact
• 2NF test
  – involves testing for partial dependency on the PK (so a 2NF violation is only possible when the PK is composite; a 1NF relation with a single-attribute PK is already in 2NF)
• Relation R is in 2NF if:
  – R is in 1NF and every non-primary-key attribute in R is FFD on the primary key of R
[Figure: functional dependency diagram over EmpID, Project, Age, Dept, Dsize, Budget and Role]
So which FDs violate 2NF?
“Second normalised” by:
– moving the partially dependent non-primary-key attributes into a new relation, together with the part of the primary key on which they are fully functionally dependent
{EmpID, Age, Dept, Dsize}
{Project, Budget}
{EmpID*, Project*, Role}
2NF
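The 2NF step can be sketched as: for each proper part of the composite PK, compute which non-key attributes it determines on its own, and move them out together with that part. This is a simplified sketch that assumes the FDs EmpID → Age, Dept; Dept → Dsize; Project → Budget; and EmpID, Project → Role (Dept → Dsize and Project → Budget are inferred from the decomposition above); the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under the given (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Assumed FDs for the EmpID/Project example
fds = [
    ({"EmpID"}, {"Age", "Dept"}),
    ({"Dept"}, {"Dsize"}),
    ({"Project"}, {"Budget"}),
    ({"EmpID", "Project"}, {"Role"}),
]
pk = {"EmpID", "Project"}
non_key = {"Age", "Dept", "Dsize", "Budget", "Role"}


def second_normal_form(pk, non_key, fds):
    """Move attributes that depend on only part of the PK into their own relation."""
    relations = []
    remaining = set(non_key)
    # Try each proper, non-empty part of the PK, smallest parts first
    for size in range(1, len(pk)):
        for part in combinations(sorted(pk), size):
            dependents = closure(part, fds) & remaining
            if dependents:
                relations.append(set(part) | dependents)
                remaining -= dependents
    # Attributes not claimed by any part of the PK stay with the whole PK
    relations.append(set(pk) | remaining)
    return relations


for relation in second_normal_form(pk, non_key, fds):
    print(sorted(relation))
# ['Age', 'Dept', 'Dsize', 'EmpID']
# ['Budget', 'Project']
# ['EmpID', 'Project', 'Role']
```

Dsize lands with EmpID because EmpID → Dept → Dsize; that chain is a transitive dependency, which is left for the 3NF step.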
Third Normal Form
• Remove transitive dependency
• Conditions
  – A non-primary-key attribute Z is transitively dependent on primary key X if:
    X → Y and Y → Z, where Y is not a candidate key (attribute Y provides the transition to the PK)
A [EmpID*, Project*, Role]
B [EmpID, Age, Dept, Dsize]
C [Project, Budget]
D None of the above
Which of the above could have a transitive dependency?
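A simple mechanical version of this test can be sketched in Python, assuming the FDs EmpID → Age, Dept; Dept → Dsize; Project → Budget; and EmpID, Project → Role. Within each of the relations A, B and C, it looks for an FD whose determinant Y consists only of non-key attributes and which determines another non-key attribute Z. Restricting Y to non-key attributes is a simplification of the general rule (Y not a candidate key), and the function name is hypothetical.

```python
# Assumed FDs for the EmpID/Project example
fds = [
    ({"EmpID"}, {"Age", "Dept"}),
    ({"Dept"}, {"Dsize"}),
    ({"Project"}, {"Budget"}),
    ({"EmpID", "Project"}, {"Role"}),
]

# The three 2NF relations from the question: (attributes, primary key)
relations = {
    "A": ({"EmpID", "Project", "Role"}, {"EmpID", "Project"}),
    "B": ({"EmpID", "Age", "Dept", "Dsize"}, {"EmpID"}),
    "C": ({"Project", "Budget"}, {"Project"}),
}


def transitive_dependencies(attrs, pk, fds):
    """Return (Y, Z) pairs where a non-key determinant Y gives PK -> Y -> Z."""
    non_key = attrs - pk
    found = []
    for lhs, rhs in fds:
        dependents = (rhs & non_key) - lhs
        # The PK determines every attribute, so PK -> Y holds; a non-key Y that
        # also determines a non-key Z gives the transitive chain PK -> Y -> Z
        if lhs <= non_key and dependents:
            found.append((sorted(lhs), sorted(dependents)))
    return found


for name, (attrs, pk) in relations.items():
    print(name, transitive_dependencies(attrs, pk, fds))
# A []
# B [(['Dept'], ['Dsize'])]
# C []
```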
Here is an un-normalised table

Ord#  Date     Cust#  Name   Prod#  Desc   Qty  Supplier  Tel
1     12/1/01  1      Jones  1      Disk   3    X         101
1     12/1/01  1      Jones  2      CD     5    Y         223
2     13/1/01  2      Black  1      Disk   1    X         101
2     13/1/01  2      Black  2      CD     1    Y         223
2     13/1/01  2      Black  3      Mouse  1    X         101
3     13/1/01  1      Jones  3      Mouse  1    X         101

Normalise it to 1NF
In 1NF, the repeating group of order lines moves into its own table, with Ord# as a foreign key:

Ord#  Date     Cust#  Name
1     12/1/01  1      Jones
2     13/1/01  2      Black
3     13/1/01  1      Jones

Ord# (fk)  Prod#  Desc   Qty  Supplier  Tel
1          1      Disk   3    X         101
1          2      CD     5    Y         223
2          1      Disk   1    X         101
2          2      CD     1    Y         223
2          3      Mouse  1    X         101
3          3      Mouse  1    X         101
Order (Ord#, Date, Cust#, Name) is already in 2NF: its primary key, Ord#, is a single attribute, so no partial dependency is possible.

Now we normalise the order-line table (Ord#, Prod#, Desc, Qty, Supplier, Tel) to 2NF, remembering to test on the PK (Ord#, Prod#) for any partial dependency. Desc, Supplier and Tel depend on Prod# alone, so they move to a new table keyed on Prod#:

Prod#  Desc   Supplier  Tel
1      Disk   X         101
2      CD     Y         223
3      Mouse  X         101

Ord# (fk)  Prod# (fk)  Qty
1          1           3
1          2           5
2          1           1
2          2           1
2          3           1
3          3           1
So, any transitive dependency in (Ord#, Date, Cust#, Name), (Prod#, Desc, Supplier, Tel) or (Ord#, Prod#, Qty)?
Yes! But not in all …
In (Ord#, Date, Cust#, Name), Ord# → Cust# → Name, so Name moves to a new Customer table (Cust#, Name) and Cust# stays behind as a foreign key.
In (Prod#, Desc, Supplier, Tel), Prod# → Supplier → Tel, so Tel moves to a new Supplier table (Supplier, Tel) and Supplier stays behind as a foreign key.
(Ord#, Prod#, Qty) has no transitive dependency: OK!
Final Decomposition

Ord# (fk)  Prod# (fk)  Qty
1          1           3
1          2           5
2          1           1
2          2           1
2          3           1
3          3           1

Ord#  Date     Cust# (fk)
1     12/1/01  1
2     13/1/01  2
3     13/1/01  1

Cust#  Name
1      Jones
2      Black

Prod#  Desc   Supplier (fk)
1      Disk   X
2      CD     Y
3      Mouse  X

Supplier  Tel
X         101
Y         223

Now in 3NF
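A quick sanity check on the final decomposition is that joining the five tables on their foreign keys gives back exactly the original rows, i.e. the decomposition is lossless. This sketch assumes Python dicts keyed by primary key stand in for the tables and a dict lookup stands in for the join; the variable and function names are hypothetical.

```python
# The five 3NF tables; dicts are keyed by primary key
order_line = [(1, 1, 3), (1, 2, 5), (2, 1, 1), (2, 2, 1), (2, 3, 1), (3, 3, 1)]  # (Ord#, Prod#, Qty)
orders = {1: ("12/1/01", 1), 2: ("13/1/01", 2), 3: ("13/1/01", 1)}  # Ord# -> (Date, Cust#)
customers = {1: "Jones", 2: "Black"}  # Cust# -> Name
products = {1: ("Disk", "X"), 2: ("CD", "Y"), 3: ("Mouse", "X")}  # Prod# -> (Desc, Supplier)
suppliers = {"X": 101, "Y": 223}  # Supplier -> Tel

# The original table: (Ord#, Date, Cust#, Name, Prod#, Desc, Qty, Supplier, Tel)
original = [
    (1, "12/1/01", 1, "Jones", 1, "Disk", 3, "X", 101),
    (1, "12/1/01", 1, "Jones", 2, "CD", 5, "Y", 223),
    (2, "13/1/01", 2, "Black", 1, "Disk", 1, "X", 101),
    (2, "13/1/01", 2, "Black", 2, "CD", 1, "Y", 223),
    (2, "13/1/01", 2, "Black", 3, "Mouse", 1, "X", 101),
    (3, "13/1/01", 1, "Jones", 3, "Mouse", 1, "X", 101),
]


def rejoin():
    """Join every order line to its order, customer, product and supplier."""
    rows = []
    for ord_no, prod_no, qty in order_line:
        date, cust_no = orders[ord_no]
        desc, supplier = products[prod_no]
        rows.append((ord_no, date, cust_no, customers[cust_no],
                     prod_no, desc, qty, supplier, suppliers[supplier]))
    return rows


print(rejoin() == original)  # True
```

Each supplier's telephone number is now stored once, in suppliers, so changing it is a single update.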
The underlying E-R Model
[Figure: E-R diagram with entities Customer, Order, Product and Supplier; Customer makes Order, Order has Product, and Supplier despatches Product, with multiplicities 1..1, 0..* and 1..* on the associations]
How many tables would you get from mapping?
So Normalisation to 3NF is Normal!!
• Remember, 2NF and 3NF disallow partial and transitive dependencies, respectively, on the PK; a relation that has them is open to update anomalies
• But even at 3NF, a relation may on rare occasions be open to update anomalies due to redundancy
• So we look briefly at two stronger forms:
  – Boyce-Codd
  – 4NF
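The update anomalies mentioned above can be sketched with the order-line data from the worked example. This is illustrative only: the new telephone number 999 is hypothetical, and a list of tuples stands in for the 1NF table.

```python
# 1NF order-line rows: (Ord#, Prod#, Desc, Qty, Supplier, Tel)
lines = [
    (1, 1, "Disk", 3, "X", 101),
    (1, 2, "CD", 5, "Y", 223),
    (2, 1, "Disk", 1, "X", 101),
    (2, 2, "CD", 1, "Y", 223),
    (2, 3, "Mouse", 1, "X", 101),
    (3, 3, "Mouse", 1, "X", 101),
]

# Supplier X gets a new (hypothetical) number, but only one row is updated
lines[0] = lines[0][:5] + (999,)

# The table now holds conflicting values for a single fact
tels_for_x = sorted({row[5] for row in lines if row[4] == "X"})
print(tels_for_x)  # [101, 999]

# In 3NF the number is stored once, so one update keeps it consistent
suppliers = {"X": 101, "Y": 223}
suppliers["X"] = 999
print(suppliers["X"])  # 999
```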
Boyce-Codd NF
• Is a stronger normal form than 3NF
• Definition: a relation is in BCNF if, and only if, every determinant is a candidate key
• And remember that a candidate key is any key that could become the PK of the relation (i.e. there may be competition for it!)
• Potential to violate BCNF comes from:
  – A relation containing at least 2 composite candidate keys
  – Or candidate keys overlapping (i.e. they have at least one attribute in common)
BCNF Example
• Consider the candidate keys for:

clientNo  interviewDate  interviewTime  staffNo  roomNo
CR76      13/5/08        10.30          SG5      G101
CR56      13/5/08        12.00          SG5      G101
CR74      13/5/08        12.00          SG37     G102
CR56      1/7/08         10.30          SG5      G102

• FD1 {PK}: clientNo, interviewDate → interviewTime, staffNo, roomNo
• FD2 {CK}: staffNo, interviewDate, interviewTime → clientNo
• FD3 {CK}: roomNo, interviewDate, interviewTime → staffNo, clientNo
• FD4: staffNo, interviewDate → roomNo
• PK is the primary key and CK is a candidate key.
• But what about FD4? Its determinant (staffNo, interviewDate) is not a candidate key, so the relation is not in BCNF
Adapted from Connolly and Begg, 2005, 4th ed. Page 420
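The BCNF condition can be tested by computing the closure of each determinant: if the closure does not cover every attribute of the relation, the determinant is not a key and that FD violates BCNF. A minimal sketch using FD1 to FD4 above; testing for a superkey rather than a minimal candidate key is the usual way this check is carried out, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under the given (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


attributes = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
fds = {
    "FD1": ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"}),
    "FD2": ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),
    "FD3": ({"roomNo", "interviewDate", "interviewTime"}, {"staffNo", "clientNo"}),
    "FD4": ({"staffNo", "interviewDate"}, {"roomNo"}),
}


def bcnf_violations(attributes, fds):
    """Names of FDs whose determinant does not determine the whole relation."""
    return [name for name, (lhs, _) in fds.items()
            if closure(lhs, fds.values()) != attributes]


print(bcnf_violations(attributes, fds))  # ['FD4']
```

The determinant of FD4 only reaches staffNo, interviewDate and roomNo, which is why roomNo ends up in a table of its own in the decomposition.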
So new decomposition?

clientNo  interviewDate*  interviewTime  staffNo*
CR76      13/5/08         10.30          SG5
CR56      13/5/08         12.00          SG5
CR74      13/5/08         12.00          SG37
CR56      1/7/08          10.30          SG5

interviewDate  staffNo  roomNo
13/5/08        SG5      G101
13/5/08        SG37     G102
1/7/08         SG5      G102

So the duplication in the room number is now eradicated
4NF
• The problem it addresses comes from 2 independent multi-valued attributes in one relation
  – E.g. for each value of A there is a set of values for B and a set for C, while B and C remain independent of each other
[Figure: class Branch with attributes BranchNo, staffName[1..*] and ownerName[1..*]]
• So if you model your databases from ERMs, this type of dependency should not arise.
Example of 4NF

branchNo  staffName  ownerName
C003      Anne       Carol
C003      David      Carol
C003      Anne       Tina
C003      David      Tina

branchNo*  staffName
C003       Anne
C003       David

branchNo*  ownerName
C003       Carol
C003       Tina

Note: if step 9 is applied to multi-valued attributes, we map this correctly and avoid such redundancy, as the two smaller tables would be the result of the mapping!
Adapted from Connolly and Begg, 2005, 4th ed. Page 428
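A multi-valued dependency branchNo ↠ staffName holds exactly when the relation equals the join of its projections onto (branchNo, staffName) and (branchNo, ownerName), so the check can be sketched by decomposing and rejoining. The sketch uses sets of tuples; the function name mvd_holds and the extra staff member "Ella" are hypothetical.

```python
from itertools import product


def mvd_holds(rows):
    """True if rows equal the join of their (branch, staff) and (branch, owner) projections."""
    staff = {(branch, s) for branch, s, _ in rows}
    owners = {(branch, o) for branch, _, o in rows}
    # Natural join on branchNo
    rejoined = {(b1, s, o) for (b1, s), (b2, o) in product(staff, owners) if b1 == b2}
    return rejoined == set(rows)


rows = {
    ("C003", "Anne", "Carol"),
    ("C003", "David", "Carol"),
    ("C003", "Anne", "Tina"),
    ("C003", "David", "Tina"),
}
print(mvd_holds(rows))  # True

# A new staff member needs one row per owner; adding only one row breaks the pattern
print(mvd_holds(rows | {("C003", "Ella", "Carol")}))  # False
```

In the decomposed tables the same change is a single row in the (branchNo, staffName) table, which is the redundancy 4NF removes.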
Normal Form Summary
• Normal forms describe a relation’s degree of normalisation
• Each stage is a stronger form than the last
  – less vulnerable to update anomalies
• First Normal Form (1NF)
  – The relation has no non-atomic values
  – Or the relation has “no repeating group”
• 2nd Normal Form (2NF)
  – The relation has no partial dependencies
  – All non-key attributes are fully functionally dependent on the PK
• 3rd Normal Form (3NF)
  – The relation has no transitive dependencies
• Boyce-Codd
  – Every determinant is a candidate key
• 4NF – no multi-valued dependencies