Functional Dependency

advertisement
Functional
Dependency
CS157a Sec. 2
Koichiro Hongo
What is Functional Dependency?
A functional dependency is a constraint between
two sets of attributes in a relation from a database.
●
An attribute or set of attributes X is said to
functionally determine another attribute
Y (written X  Y) if and only if each X value is
associated with at most one Y value.
Customarily we call X determinant set and
Y a dependent set.
●
So if we are given the value of X we can determine
the value of Y.
●
Why is Fanctional Dependency
Important?
The determination of functional dependencies
is an important part of designing databases in
the relational model, and in database normalization
and denormalization.
●
The functional dependencies, along with the attribute
domains, are selected so as to generate constraints
that would exclude as much data inappropriate to
the user domain from the system as possible.
●
Functional Dependency
Inference Rules
Reflexivity:
If B is a subset of A then A functionally determines B
For example:
{name, location}  {name}
Functional Dependency
Inference Rules(cont.)
Augmentation:
If B is a subset of A and C functionally determines D
then A and C functionally determine B and D
For example:
{name, location} and {birthdate, time}  {name} and {age}
(as {name} is a subset of {name, location} and {birthdate, time}
functionally determines {age})
Functional Dependency
Inference Rules(cont.)
Transitivity:
If A functionally determines B and B functionally
determines C then A functionally determines C
For example:
{name, location}  {initials}
(as {name, location} functionally determines
{name} and {name} functionally determines {initials})
Functional Dependency
Inference Rules(cont.)
Pseudo transitivity:
If A functionally determines B and
B and C functionally determine D
then A and C functionally determine D
Functional Dependency
Inference Rules(cont.)
Union:
If A functionally determines B and A functionally
determines C then A functionally determines B and C
For example:
{name, location, birthdate, time}  {initials, age}
(as {name, location, birthdate, time}  {initials} and
{name, location, birthdate, time}  {age})
Functional Dependency
Inference Rules(cont.)
Decomposition:
If A functionally determines B and C
then A functionally determines B and
A functionally determines C
For example:
{name, location, birthdate, time}  {initials, age} implies that
{name, location, birthdate, time}  {initials} and
{name, location, birthdate, time}  {age}
Trivial
Functional Dependencies
Some functional dependencies are said to be trivial
because they are satisfied by all relation.
For example:
AA
X  Y if Y is a subset of X
Keys and
Functional Dependencies
●
Keys and, more generally, functional dependencies,
are constraints on the database that require relations
to satisfy certain properties. Relations that satisfy
all such constraints are legal relations.
What is Superkey?
(review)
A superkey is defined in the relational model as a
set of attributes of a relation for which it holds that
in all instances of the relation there are no two
distinct tuples that have the same values for the
attributes in this set.
Equivalently a superkey can also be defined as
those sets of attributes of a relation upon which all
attributes of the relation are functionally dependent.
Keys and Functional
Dependencies(cont.)
Functional dependencies allow us to express constraints
that we cannot express with superkeys.
Let's consider the schema of the example in the
textbook(p.265) Figure 7.2
Figure 7.2
Partial list of tuples in relations loan, borrower, and bor_loan
customer_id
loan_number
amount
.
.
.
L-100
.
.
.
.
.
.
.
.
.
.
.
.
23-652
15-202
23-521
10000
.
.
.
.
.
.
loan
loan_number
L-100
L-100
L-100
.
.
.
borrower
customer_id
.
.
.
23-652
15-202
23-521
.
.
.
loan_number
.
.
.
amount
.
.
.
L-100
L-100
L-100
.
.
.
10000
10000
10000
.
.
.
bor_loan
Keys and Functional
Dependencies(cont.)
Figure 7.2, we consider the schema
bor_loan = (customer_id, loan_number, amount)
in which the functional dependency loan_number  amount holds
because for each loan (identified by loan_number) there is a
unique amount. We denote the fact that the pair of
attributes(customer_id, loan_amount) forms a superkey for
bor_loan by writing:
customer_id, loan_number  customer_id, loan_number, amount
or, equivalently,
customer_id, loan_number  bor_loan
Keys and Functional
Dependencies(cont.)
We shall use functional dependencies in two ways:
1. To test relations to see whether they are legal under
a given set of functional dependencies. If a relation r
is a legal under a set F of functional dependencies,
we say that r satisfies F.
2. To specify constraints on the set of legal relations.
We shall thus concern ourselves with only those
relations that satisfy a given set of functional
dependencies. If we wish to constrain ourselves
to relations on schema R that satisfy a set F of
functional dependencies, we say that F holds on R.
Database Normalization
Database normalization relates to the level of redundancy
in a relational database's structure. The key idea is to
reduce the chance of having multiple different versions
of the same data, like an address, by storing all potentially
duplicated data in different tables and linking to them
instead of using a copy. Then updating the address in
one place will instantly change all the places where the
address is used.
●
Well-normalized databases have a schema that reflects the
true dependencies between tracked quantities. This means
that updates can be quickly performed with little risk of
data becoming inconsistent.
●
History of
Database Normalization
British computer scientist who made
seminal contributions to the theory of
relational databases.
●
While working for IBM, he created the
relational model for database management.
●
Edgar F. Codd first proposed the process
of normalization and what came to be
known as the 1st normal form in his
paper A Relational Model of Data for
Large Shared Data Banks
Communications of the ACM, Vol. 13,
No. 6, June 1970, pp. 377-387
●
Edgar F. Codd
Normal Forms
In the relational model, formal methods exist
for quantifying "how normalized" a database is.
These classifications are called normal forms,
and there are algorithms for converting a given
database between them.
●
Edgar F. Codd originally established three normal
forms: 1NF, 2NF and 3NF.
There are now others that are generally accepted,
but 3NF is widely considered to be sufficient for
many practical applications. Most tables when
reaching 3NF are also in BCNF.
●
1NF(First Normal Form)
In the relational model, we formalize this idea that
attributes do not have any substructure. A domain
is atomic if elements of the domain are considered
to be indivisible units. We say that a relation schema
R is in first normal form if the domains of all
attributes of R are atomic.
In other words, a relation schema R is in 1NF if there
are no muntivalued attributes.
1NF(First Normal Form)
(cont.)
To understand first normal form (1NF),
consider these two examples of things you might know:
"What is your favorite color?"
"What food will you not eat?"
A difference between these two examples is that, you
can have only one favorite color; but, there is very little
limitation on the number of foods you might not eat.
Data that has a single value such as "person's favorite
color" is inherently in first normal form. Storing such
data has not much changed since Codd wrote and needs
no further explanation here. Data that has multiple values
must be stored differently.
2NF(Second Normal Form)
Second normal form (2NF) prescribes full functional
dependancy on the primary key. It most commonly
applies to tables that have composite primary keys,
where two or more attributes comprise the primary key.
It requires that there are no non-trivial functional
dependencies of a non-key attribute on a part (subset)
of a candidate key.
A table is said to be in the 2NF if and only if it is in the
1NF and every non-key attribute is irreducibly
dependent on the primary key.
2NF(Second Normal Form)
(cont.)
How about this example?
Example:
Machine_parts(part_number, supplier_name, price, supplier_address)
In this case, price is fully dependent on the primary
key(different suppliers may charge different price
on the same part).
However, supplier_address is partially dependent
because it only depends on the supplier_name.
Therefore, this table is not in 2NF.
3NF(Third Normal Form)
Third normal form (3NF) requires that there are no
non-trivial functional dependencies of non-key
attributes on something other than a superset of a
candidate key.
A table is in 3NF if it is in 2NF, and none of the nonprimary key attributes is a fact about any other nonprimary key attribute.
In summary, all non-key attributes are mutually
independent(i.e. there should not be transitive
dependencies).
3NF(Third Normal Form)
(cont.)
How about this example?
Example:
Machine_parts(part_number, supplier_name, supplier_address)
In this case, supplier_address is dependent on
supplier_name. Therefore, there is a transitive
dependency in the table.
It means that it is not in 3NF.
BCNF
(Boyce-Codd Normal Form)
Boyce-Codd normal form (or BCNF) requires that there are
no non-trivial functional dependencies of attributes on
something else than a superset of a candidate key. At this stage,
all attributes are dependent on a key, a whole key and nothing
but a key (excluding trivial dependencies, like AA).
A table is said to be in the BCNF if and only if it is in the 3NF
and every non-trivial, left-irreducible functional dependency
has a candidate key as its determinant.
In more informal terms, a table is in BCNF if it is in 3NF and
the only determinants are the candidate keys.
Work Cited:
Silberschatz, Avi. Hank Korth, and S. Sudarshan.
DATABASE SYSTEM CONCEPTS FIFTH
EDITION. New York: The McGraw-Hill
Companies, Inc., 205. 263-309.
Wikipedia:
<http://en.wikipedia.org/wiki/Functional_dependency>
<http://en.wikipedia.org/wiki/Database_normalization>
Download