Database Design

advertisement
Database Design
Foundations
8/09)
FoundationsofofInformation
InformationManagement
Management(WS
(WS200
2008/09)
– Chapter 4 –
?
© 2008 Prof. Dr. Rainer Manthey
Principles
Principlesof
ofDatabase
DatabaseDesign
Design
Foundations of IM
1
Design of DB applications
Relative costs for compensating
the consequences of errors:
11
Database development
Requirements
Requirementsanalysis
analysis
10
10
Design
Design
100
100
Implementation
Implementation
>>
>>100
100
Usage
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
Topic of this
chapter
2
DB-Design: Phases
Application area
in the "real world"
Conceptual
design
Design documentation
Conceptual level
Logical
design
Logical structure
„on paper"
Logical level
Physical
design
Physical structure
in storage
© 2008 Prof. Dr. Rainer Manthey
Physical level
Foundations of IM
3
Conceptual design
Application area
in the "real world"
Formally
Formallydefine
definewhat
whatyou
youknow
know
about
the world
“ without
about„„the
world“
withoutthinking
thinking
ofofcomputers
yet)!
computersand
anddatabases
databases((yet)!
Conceptual
design
Design documentation
Conceptual level
Logical
design
Logical structure
„on paper"
Logical level
Physical
design
Physical structure
in storage
© 2008 Prof. Dr. Rainer Manthey
Physical level
Foundations of IM
4
The Entity-Relationship model
• Entity-Relationship data model (ER model):
• Proposed in 1976 in a paper by Peter Chen
• Graphical notation for application modeling (ER diagrams)
• Independent semantic data model
(aiming at the meaning of concepts in real world)
• Predecessor of today‘s object-oriented data models
• Extremely successful as a means of „pre-design“ of relational DBs
•
The ER model offers few very simple and basic concepts:
• Entities (objects), characterized by attributes (properties)
• Binary or n-ary relationships between entities,
possibly characterized by attributes as well
• Often not mentioned explicitly, but important and basic:
Values: printable symbols as values of attributes;
play a subordinate role (characterizing objects)
• Roles: Names for the special meaning an entity has within a
relationship
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
5
An entity
Each entity is completely characterized
by the values of all its attributes.
an
anentity
entity
its
itsattributes
attributes
765
persNr firstName
name age
"Rainer"
"Manthey"
53
topic
rank
"Informatik" "C3"
their
theirvalues
values
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
6
Two similar entities
persNr firstName
name age
765
"Rainer"
"Manthey"
topic
53
rank
"Informatik" "C3"
another
anotherentity
entity
persNr firstName
name age
??
„Armin B." „Cremers"
60
topic
attributes
identicalattributes
rank identical
"Informatik" "C4"
partially
partiallydifferent
differentvalues
values
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
7
Entity types
•
Similar entities can be combined into entity types.
• „Similarity" requires at least identical attribute structure.
(Attribute names and corresponding value domains are identical.)
•
Entity types are graphically represented by rectangles. Attributes label the
line connecting an entity type and a value domain (often symbolized
by an oval):
professor
persNr firstName
name age
Int
String
© 2008 Prof. Dr. Rainer Manthey
String
Name of the type
topic
Int
Foundations of IM
rank
String
String
8
Entity types: Alternative representations
• Domain names are often omitted (in case they are obvious or irrelevant).
Instead attributes are placed inside ovals:
professor
persNr
firstName
name
age
topic
rank
• In larger diagrams, the attribute structure is often entirely omitted in order to
save space (or is written down in abbreviated form only):
professor
persNr, firstName, name, . . .
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
9
Instances and population of an entity type
• Each entity „belongs to" at least one entity type:
It is called an instance of this type.
an instance of
professor
• The set of all current instances of an entity type is calles its current population.
a population of
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
professor
10
Multiple classification
• One and the same entity can be an instance of various entity types.
professor
instance of
instance of
fan of
Werder Bremen
• In such a case, the attributes of the different types of this entity may well be
quite different.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
11
Classification in presence of identical attributes
• Entities with the same attributes do not at all have to be instances of the same
type!
instance of
Ass
name, firstName, age
instance of
Fan of football club
Werder Bremen
name, firstName, age
• Almost always additional classification criteria are required, usually not
derivable from the attribute structure.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
12
Key attributes
•
As in the relational model (Access, SQL), there are usually one or more attributes
per entity type the values of which are sufficient for uniquely identifying each
instance:
Key
Keyattributes
attributes
professor
persNr
firstName
name
age
topic
rank
• (Primary) key attributes are usually underlined in an ER-diagram.
• Keys ought to be "minimal" (no attribute can be omitted).
• A distinction between primary key and other candidate keys is not made in the
ER-model, even though it would be useful to do so.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
13
Relationships
• 2nd main concept of the ER-model: elementary relationships between two
or more entities (possibly with their own attribute values)
Relationship
Relationship
PersNr
765
year of assignment
1992
name
Universität Bonn
• Each relationship is uniquely characterized by the key values of the participating entities and by the values of all relationship attributes.
• However, there are no separate key attributes for relationships, as the keys of
the participating entities always suffice for unique identification.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
14
Multiple relationships
Entitites may participate in various relationships (also similar ones).
year of assignment
1992
year of assignment
1999
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
15
Relationship types
•
Similar relationships may be grouped into relationship types.
• „Similarity" requires at least identical attribute structure and identical types
(and number) of participating entities.
•
Relationship types are graphically denoted in the ER-model by a diamond.
Attributes are written as for entity types.
assigned_to
university
professor
persNr, . . .
name, . . .
year of assignment
Int
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
16
Instances of a relationship type
assigned_to
university
professor
name, . . .
persNr, . . .
year of assignment
Int
Relationship types have instances, too: Individual relationships between individual
entities are analogously considered instances of the corresponding R-type.
assigned_to
year of assignment
persNr
765
name
...
© 2008 Prof. Dr. Rainer Manthey
1992
Foundations of IM
...
Universität Bonn
17
Type vs. instance
assigned_to
university
professor
persNr, . . .
year of assignment
name, . . .
relationship
relationshiptype
type
entity
entitytype
type
Int
The distinction between types and instances is difficult even for specialists:
Try to be precise from the very beginning!
assigned_to
year of assignment
persNr
765
name
...
entity
entity
© 2008 Prof. Dr. Rainer Manthey
1992
...
relationship
relationship Universität Bonn
Foundations of IM
18
Roles
• One and the same entity type may participate in a relationship type more than
once, e.g.:
mother
descendent_of
person
daughter
•
For a (syntactical) distinction between the different „forms of participation“,
special designators are used, called roles. Roles are used as labels of the lines
connecting the resp. entity and relationship type.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
19
Extended Geo-DB: ER-diagram
neighbour_country
capital
country 1
capital_of_country
country
city
country 2
city_in_country
stadt_an_fluss
region_
in_country
river_
through_
country
region
river
source river
relationship type
entity type
source_of
© 2008 Prof. Dr. Rainer Manthey
role
Foundations of IM
20
Relationship types and mathematical relations
On the level of instances, a relation over the populations of the entity types involved
is associated with each relationship type.
E1
R
E2
entities not
involved in R
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
21
Functional relationships
In many cases, at most one entity on one „side" of a relationship type may be
associated with a particular entity on the other „side“ of the relationship, e.g.:
country
city
city_in_country
••
••
Each
.
Eachcity
cityisisininexactly
exactlyone
onecountry
country.
In
, however
, arbitrarily
.
Ineach
eachcountry
country,
however,
arbitrarilymany
manycities
citiesmay
maybe
besituated
situated.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
22
Functional relationships (2)
• Mathematically, city_in_country is a function:
city_in_country
: city
city_in_country:
city
→
→ country
country
• More exactly: . . . a function from the population of city into the
population of country.
population(city)
© 2008 Prof. Dr. Rainer Manthey
population(country)
Foundations of IM
23
Functional relationships (3)
• In the ER-model, such restrictions of the admissible combinations can be
expressed by means of so-called functionalities, annotations attached to the
edges connecting entity and relationship types.
• There are four different kinds of binary relationships expressible by means of
functionalities:
11:: 11
11::NN
NN:: 11 NN::M
M
• In this context, N resp. M stands for arbitrary integer values ≥ 0.
• In the ‚city_in_country'-example, an N:1-relationship is appropriate: There is
exactly one country per city, but arbitrarily many citites per country (N ≥ 0).
city
N
1
country
city_in_country
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
24
Functional relationships (4)
• Functionalities of type 1 : 1, 1 : N or N : 1 define partial functions where some
of the instances of the types involved possibly are not related at all.
city
N
1
country
city_in_country
City without country!
• In the „normal“ ER-model, total functions cannot be distinguished from partial
ones – in extensions of the model there are additional graphical means for explicitly stating whether a function is partial or total.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
25
Functional relationships (5)
• In a 1 : 1-relationship each instance of one of the entity types involved is related
to none or exactly one of the instances of the other entity type.
• An N : M-relationship can be considered the „normal case" without restrictions
on the number of participating entities.
• If no functionalities have been stated for a relationship type, then an implicit
N : M functionality is assumed.
• Functionalities can be defined for relationships with more than two entities involved, too:
E1
N
M
E2
R:
R: EE11 ×× EE22 →
→ EE33
R
1
E3
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
26
Functional relationships (6)
• If in an n-ary relationship several edges are marked by '1',
then the resp. relationship type represents several partial functions:
E1
N
1
E2
RR(1)(1):: EE1 ×× EE2 →
E
1
2 → E33
R
RR(2)(2):: EE1 ×× EE3 →
E
1
3 → E22
1
E3
• . . . and analogously:
E1
1
1
E2
RR(1)(1):: EE1 ×× EE2 →
E
1
2 → E33
R
RR(2)(2):: EE1 ×× EE3 →
E
1
3 → E22
1
RR(3)(3):: EE2 ×× EE3 →
E
2
3 → E11
E3
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
27
Extended ER model: Generalization
• The concepts introduced so far have been contained in Chen‘s original proposal
throughout. Since then, however, various extensions have been proposed:
Extended Entity-Relationship Model (EER-model)
• Most important extensions (as in object-oriented models):
Generalization
Generalization
• This means:
• formation of subtypes of entity types
• sub-/supertype relationships (type hierarchy)
• inheritance of attributes and of „participations" in R-types
•
Special graphical notation for generalization of E-types:
is_a
...
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
28
Generalization (2)
supertype
capital_of
city
region
name, area, inhabitants
is_a
subtypes
country
state
prime minister
governor
„Inheritance" in this example means:
• Both subtypes inherit all attributes of the supertype,
i.e., they „own" these attributes without explicit definition.
• Both subtypes participate in the relationship type capital_of',
which has been explicitly defined for the supertype only.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
29
Generalization (3)
region
Generalization always means
that the populations of the subtypes
are subsets of the population of the
supertype.
is_a
country
state
population(region)
population(country)
This circumstance motivates the
notion 'is_a'-relationship:
"Every country is a region."
population(state)
Quantifier over instances!
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
30
Generalization (4)
In the example: special case
disjoint generalization
(Empty intersection of the populations)
In general:
This form of the 'is_a'-notation
just means some form of
subset formation, i.e.
overlapping, incomplete
subdivision.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
31
Logical design: ER ⇒ Relations
Conceptual
design
ER
ERmodel
model
Mapping
Design documentation
Logical
design
Relational
Relationalmodel
model
Logical structure
„on paper"
Improvement
Physical
design
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
11stststep
step
Map
MapER
ERtypes
typessystema
systematically
ticallytotorelations:
relations:
From
Fromsemantic
semanticconcepts
concepts
totosyntactic
syntacticstructures
structures!!
32
ER ⇒ Relations: Mapping Entity types
• Mapping from ER model to relational model:
• in principle very easy: per type one relation (table)
• in detail and for extensions: quite difficult
• Mapping of entity types: rather obvious
⇒ table name
• attribute ⇒ column name
• type
E
A1 A2 . . . An
E
A1 A2
...
An
• Key attributes are mapped to primary key columns.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
33
ER ⇒ Relations: Mapping Relationship types
•
For relationship types:
• Use the same basic idea: one relation per type.
• but: How are participating entity types represented?
city_in_state
city
name
inhabitants
...
state
since
name
area
...
• Obvious solution as well: The participating entities are represented by means
of the values of their primary key attributes (possibly after renaming).
city_in_state
city.name
state.name
since
Foreign key
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
34
ER ⇒ Relations: Mapping Relationship types (2)
In presence of special functionalities (1 : N, N : 1, 1 : 1 resp.), a separate relationship table is not necessary, as the relationship information can be embedded into
the table of the entity type on the N-side:
city
city_in_state
N
name
inhabitants
...
city
city.name
© 2008 Prof. Dr. Rainer Manthey
1
since
state
name
area
...
inhabitants . . . state.name
Foundations of IM
Foreign key
since
35
Relational representation of generalization hierarchies
How
Howtotorealize
realizeinheritance
inheritanceand
and
sub
-/super type
sub-/super
typerelationships
relationshipsrelationally
relationally??
E1
A1, A2, . . .
• Relational model:
does not know any inheritance!
inheritance • Inheritance thus has to be "simulated" .
is_a
E2
B1, B2, . . .
Relational representation of a super type E1 is obvious:
E1
© 2008 Prof. Dr. Rainer Manthey
A1 A2
. . .
Foundations of IM
36
Relational representation of generalization hierarchies (2)
E1
E1
A1, A2, . . .
A1
a11
a21
A2
a12
a22
. . .
. . .
. . .
is_a
inherited
Inheritance
E2
E2
B1, B2, . . .
„native" attributes
A1 A2 . . . B1 B2 . .
.a11 a12 . . . b11 b12 . . .
• Obvious relational realization of a subtype:
subtype relation E2 owns „native" and inherited attributes.
• But: Values of the inherited attributes of all E2-instances have to be
(redundantly) repeated in the E1-relation in this case!
• Reason: Each E2 -instance is an E1-instance, too !
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
37
Relational representation of generalization hierarchies (3)
Avoiding duplication: Store only „native" attributes
(+ key for joining) in the subtype relation!
E1
A1, A2, . . .
inherited attributes
is_a
inheritance
E1
E2
A1 A2
a11 a12
a21 a22
. . .
. . .
. . .
B1, B2, . . .
joint
key
E2
But
does
Butininthis
thiscase
caserelation
relationEE22does
no
nolonger
longercontain
containall
allattributes
attributesof
of
EE1--entities
entities !!
1
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
„native" attributes
A1 B1 B2
. .
.a11 b11 b12 . . .
38
Relational representation of generalization hierarchies (4)
way out: E2-population is completely realized by means of a view joining
the inherited and the native attributes.
inherited attributes
E1
E2-global
A1 A2 . . . B1 B2 . . .
a11 a12 . . . b11 b12 . . .
A1 A2
a11 a12
a21 a22
. . .
. . .
. . .
„native" attributes
E2-local
CREATE
-global AS
CREATEVIEW
VIEWE2
E2-global
AS
(SELECT
*
(SELECT *
FROM
-local
FROM E1
E1 INNER
INNERJOIN
JOIN E2
E2-local
ON
-local.A1)
ON E1.A1
E1.A1==E2
E2-local.A1)
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
A1 B1 B2
. .
.a11 b11 b12 . . .
39
Relational representation of generalization hierarchies (5)
3rd alternative (also free of redundancies and using a view):
Distribute values of the inherited attributes to different relations –
super types are reconstructed via views.
E1-local
E1-global
A1 A2
. . .
a11 a12 . . .
a21 a22 . . .
A1 A2
a21 a22
∪
π
CREATE
-global AS
CREATEVIEW
VIEWE1
E1-global
AS
( (TABLE
-local ) )
TABLE E1
E1-local
UNION
UNION
(SELECT
(SELECT AA1,1,AA2,2, . . . . . .
FROM
FROM E2
E2) )
© 2008 Prof. Dr. Rainer Manthey
E2
inherited
. . .
. . .
„native" attributes
A1 A2 . . . B1 B2 . .
.a11 a12 . . . b11 b12 . . .
Foundations of IM
40
Relational representation of generalization hierarchies (6)
Which
the best"
Whichof
ofthe
thethree
threealternatives
alternativesisis„„the
best"??
1. Relations E1 and E2, no views:
+
−
short access time (without any join of tables)
high requirements for space (due to redundant storage)
2. Relations E1 and E2-local, view E2-global (JOIN):
+
−
Only key attribute values are stored redundantly.
Access to E2-attributes is slower (due to join).
3. Relations E2 and E1-local, view E1-global (PROJECT-UNION):
+
−
© 2008 Prof. Dr. Rainer Manthey
No duplication of any attribute values.
Access to E1-attributes is slower (due to projection and union) .
Foundations of IM
41
Generalization and updates
What
entity isisdeleted
deleted??
Whathappens
happensififan
anEE22--entity
E1
A1, A2, . . .
is_a
E2
B1, B2, . . .
• relational variant 1 (inherited attributes duplicated):
Deletion from both relations is necessary.
• relational variant 2 (inherited and native A. separated):
Deletion from both relations is necessary.
• relational variant 3 (E2-attributes only in one relation):
no propagation of deletions required
⇒
⇒ In
Invariants
variants11and
and2:2: referential
referentialintegrity
integrityconstraints
constraints
with
.
withdelete
deletecascade
cascadeisisrequired
required.
• For insertions and modifications: Changes in several relations may be necessary,
too (depending on the chosen strategy).
• Deletion of instances of the super type E1: Cascading deletion if the resp.
instance is an E2-instance, too (again referential integrity).
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
42
Relationships and referential integrity
Each relationship type induces FOREIGN KEY-constraints as well:
•
With N : M-functionality:
river_through_state
river
M
N
state
CREATE
CREATETABLE
TABLE river_through_state
river_through_state
( (River
String
REFERENCES
,
River String REFERENCES river
river,
State
State String
String REFERENCES
REFERENCES state
state) )
• With N : 1-functionality:
city_in_state
city
N
1
state
CREATE
CREATETABLE
TABLE city
city
( (. .. .. ., ,
State
State String
String REFERENCES
REFERENCES state
state) )
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
43
Functionalities as constraints
• But: Uniqueness of the state in the city_in_state-relationship has
not yet been expressed !
• An additional CHECK-constraint is required contraining the number of
state instances:
city_in_state
city
N
1
state
CREATE
CREATETABLE
TABLE city
city
( (. .. .. ., ,
CHECK
CHECK COUNT
COUNT(SELECT
(SELECT State
State
FROM
city
FROM
citySS
WHERE
WHERE S.Name
S.Name==Name)
Name)=<
=< 11
Implicit universal quantifier ranging
over each city row !
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
44
ER ⇒ Relations (2)
Conceptual
design
ER
ERmodel
model
Mapping
Design documentation
Logical
design
Relational
Relationalmodel
model
Logical structure
„on paper"
Improvement
Physical
design
22ndndstep
step
(after
(afterreaching
reachingrelations):
relations):
Improve
Improvedesign
designby
by
„„fine
fine tuning
“ the
tuning“
the
structure
structureof
oftables
tables
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
45
Redundancies due to bad design
• Example of „bad design“ (remaining after mapping from ER level):
‚City_in_state‘ and ‚capital_of‘ have been placed into a single table.
City
Bonn
Köln
Essen
...
Mainz
Trier
...
State
Capital
NW
NW
NW
Düsseldorf
Düsseldorf
Düsseldorf
RP
RP
Mainz
Mainz
Redundantly
:
Redundantlystored
storedinformation
information:
DDüsseldorf
üsseldorf isisthe
thecapital
capital
of
-Westphalia.
ofNorth
NorthRhine
Rhine-Westphalia.
• Obviously one topic (Which city is the capital of . . . ?) has been combined with
another topic (In which state is a certain city situated ?) in such an „unlucky“
manner that considerable redundancies occur, resulting in waste of space.
What
a topic
“ mean
Whatdoes
does„„a
topic“
mean??
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
46
Anomalies resulting from redundancies
• An immediate consequence of such cases of storing multiple topics in one table
is the occurrence of so-called anomalies when updating such tables:
Assume Köln (as largest
city in NW) replaces
Düsseldorf as capital:
One fact changes,
but multiple updates
have to be made.
City
State
Capital
Bonn
Köln
Essen
...
Mainz
Trier
...
NW
NW
NW
Düsseldorf
Düsseldorf
Düsseldorf
RP
RP
Mainz
Mainz
• Analoguos anomalies may happen due to insertions and deletions:
• An instance of topic 1 disappears, as soon as it is no longer associated
with any instance of topic 2.
• A new instance of topic 1 can only be inserted, if it is combined
with an instance of topic 2 (or null values are used).
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
47
Decompositions avoid redundancies and anomalies
How
defects“ ((anomalies,
anomalies, redundancies
) ??
Howtotoprevent
preventsuch
such„„defects“
redundancies)
In the example, there is a simple remedy:
Separate the two topics into different relations!
City
State
Capital
Bonn
Köln
Essen
...
Mainz
Trier
...
NW
NW
NW
Düsseldorf
Düsseldorf
Düsseldorf
Decomposition
RP
RP
Mainz
Mainz
No redundancy, no anomaly !
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
City
Bonn
Köln
Neuss
...
State
NW
NW
NW
State
Capital
NW
RP
...
Düsseldorf
Mainz
48
Functional dependencies and decompositions
•
Already discovered by Codd before 1970: Functional relationships between
attributes are of help for finding meaningful decompositions and for avoiding
reduncancies!
•
Resulting from this observation, Codd developed an elaborate theory of
relational normal forms.
•
Prerequisite: Designers identify such functional relationships during schema
design (quite similar to identifying functionalities in the ER model) and
express them as special integrity constraints.
Functional
Functionaldependency
dependency
•
(short: FD)
Principle of functional dependency:
• Let A and B be attributes of a relation R.
• B depends functionally on A, if in each state of R each A-value always
occurs in combination with the same, uniquely determined B-value.
• symbolic notation: A → B
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
49
FDs in the example
•
•
•
•
Each city lies in exactly one state: City → State
Each state has exactly one capital: State → Capital
But also: If a city is a capital then it is the capital of exactly one state:
Capital → State
Each city is associated with exactly one capital (namely the capital of its state):
City → Capital
• Decomposition separates FDs:
City
State
Capital
Bonn
Köln
Essen
...
Mainz
Trier
...
NW
NW
NW
Düsseldorf
Düsseldorf
Düsseldorf
City
State
RP
RP
Mainz
Mainz
State
Capital
• One FD seems to be lost, however: City → Capital
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
50
"Codd's thesis"
• Thesis (claim) on which the normalization theory of Codd is based:
Attributes
Attributesconnected
connectedvia
viaan
anFD
FDrepresent
representsemantically
semanticallysignificant
significanttopics
topics
of
.
ofthe
theapplication
applicationdomain
domain.
• That is: Every FD identifies a topic – thus separating topics means separating FDs.
• But: Not every such topic is necessarily represented by an FD.
• Moreover: FD-connection is a sufficient, but not a necessary criterion for the
existence of a ‚topic‘.
• Basic idea of Codd‘s approach to normalization of relations:
Decompose relations in such a way that „normally" each FD has a component relation of its own. But try to identify exceptions where several FDs
may „coexist“ in one and the same relation !
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
51
Transitive FDs and Armstrong‘s axioms
• There are FDs which are derivable from other FDs, already known. An important
example are so-called transitive FDs:
α → β is a transitive FD if there is an attribute set γ,
such that α → γ and γ → β are both FDs, but not γ → α.
• Every such transitive case leads to a (new) functional dependency.
• There are two other such inference rules for FDs, called the „Armstrong axioms“
(as they have been discovered by the Canadian scientist W. Armstrong).
ββ⊆⊆αα ⇒
⇒ αα→
→ββ
Every subset depends on its superset.
αα→
→ββ ⇒
⇒ ααγγ →
→ ββγγ
Augmentation on both sides.
• Not to be found in the literature, but quite useful: Special notion for FDs which
are not transitive FDs:
α → β is a direct FD if there is no attribute set γ,
such that α → γ and γ → β are both FDs, but not γ → α.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
52
Normal forms
• Properly determining FDs and investigating their properties is the basis for each
meaningful decomposition of relations into components free of redundancies.
„„Codd's
Codd's recepy
“ (in
):
recepy“
(inshort
shortfrom
from):
Redundancies
Redundanciescan
canbe
besafely
safelyavoided
avoidedififFDs
FDsalways
alwaysoriginate
originatefrom
from
candidate
.
candidatekeys
keysof
ofaarelation
relation.
• Codd defined various degrees of FD separation, called normal forms – the process
of transforming a given schema into relations all of which exhibit a given normal
form is called normalization.
• The most important normal form is the third normal form (short: 3NF) defined as
follows:
rd
AArelation
relationisisinin33rdnormal
normalform
form ⇔
⇔
Each
-key attribute
Eachnon
non-key
attributefunctionally
functionallydepends
dependsdirectly
directly
on
oneach
eachcandidate
candidatekey
keyof
ofthe
therelation
relation. .
• There are various other normal forms (1NF, 2NF, 4NF and others).
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
53
Normalization in the example schema
• Our example schema from the geographic domain originally was not in 3NF,
as it still contains a transitive dependency pointing from a candidate key (City)
to a non-key attribute (Capital):
(ℜ ,
)
City
Bonn
Köln
Essen
...
3NF
State
NW
NW
NW
Capital
Düsseldorf
Düsseldorf
Düsseldorf
(ℜ ,
3NF
)
City
(ℜ ,
State
)
3NF
State
Capital
• After decomposition, however, each of the resulting component schemas is
in 3NF! Such a decomposition will always be possible.
• The „lost“ dependency City → Capital can be reconstructed by means of the
transitivity axiom of Armstrong.
© 2008 Prof. Dr. Rainer Manthey
Foundations of IM
54
Download