Uploaded by Mohammud Fudhail Salamuth

Functional Dependency

Lecture 14
Functional Dependency
Designing a database for an enterprise
• Objective
• Create an accurate representation of the data, the relationships
between the data, and the constraints on the data that are
pertinent to the enterprise.
• Techniques
• Entity-Relationship (ER) Modeling
• Normalization
Informal Design Guidelines for Relational Databases
• Relational database design: The grouping of attributes to form
"good" relation schemas
• Two levels of relation schemas:
– The logical "user view" level
– The storage "base relation" level
• Design is concerned mainly with base relations.
• Criteria for "good" base relations:
– Discuss informal guidelines for good relational design
– Discuss formal concepts of functional dependencies and normal forms
(1NF, 2NF, 3NF, BCNF)
Semantics of the Relation Attributes
• Each tuple in a relation should represent one entity
or relationship instance
– Only foreign keys should be used to refer to other entities
– Entity and relationship attributes should be kept apart as
much as possible
– Design a schema that can be explained easily relation by
relation. The semantics of attributes should be easy to
interpret.
Example tables that suffer from poor design because they
mix attributes from distinct real-world entities.
Redundant Information in Tuples and Update Anomalies
• Mixing attributes of multiple entities may
cause problems:
– Information is stored redundantly, wasting
storage
• Problems with update anomalies:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies
Normalization Problem
• Suppose we design a table EMP_DEPT with the
attributes ENAME, SSN, BDATE, ADDRESS,
DNUMBER, DNAME, DMGRSSN.
What's wrong with using just a single entity class (or
table)?
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS,
DNUMBER, DNAME, DMGRSSN)
Answer: What's Wrong …
• Redundancy: DNAME and DMGRSSN are repeated for every
employee of a department.
• Extra work: If we change the name of a department,
we have to change it in multiple places.
• Anomalies: We could change DNUMBER without
changing DNAME, or vice versa.
• Too many NULLs: If an employee is unassigned,
DNUMBER, DNAME and DMGRSSN would all be NULL.
Redundancy: DNUMBER → {DNAME, DMGRSSN}
– Entities with the same value for DNUMBER have the same value for
DNAME and DMGRSSN
– Including DNAME and DMGRSSN in the entity class is redundant,
since they can be derived from DNUMBER
• Redundancy causes duplicate work
• Suppose the company wants to rename department 5 (DNUMBER 5) to
Sales. That change must be made in the tuple of every employee in that department
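The duplicate work can be illustrated with a minimal Python sketch that renames department 5 in both designs. The sample tuples and the helpers `rename_in_emp_dept` and `rename_in_department` are hypothetical, and BDATE and ADDRESS are left out for brevity.

```python
# Hypothetical EMP_DEPT tuples (BDATE and ADDRESS omitted for brevity).
EMP_DEPT = [
    {"ENAME": "Alice", "SSN": "123456789", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333445555"},
    {"ENAME": "Bob", "SSN": "453453453", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333445555"},
    {"ENAME": "Carol", "SSN": "333445555", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333445555"},
    {"ENAME": "Dave", "SSN": "999887777", "DNUMBER": 4, "DNAME": "Administration", "DMGRSSN": "987654321"},
]

# The same department facts in a separate DEPARTMENT relation, keyed by DNUMBER.
DEPARTMENT = {
    5: {"DNAME": "Research", "DMGRSSN": "333445555"},
    4: {"DNAME": "Administration", "DMGRSSN": "987654321"},
}


def rename_in_emp_dept(rows, dnumber, new_name):
    """Single-table design: every employee tuple of the department must change."""
    touched = 0
    for row in rows:
        if row["DNUMBER"] == dnumber:
            row["DNAME"] = new_name
            touched += 1
    return touched


def rename_in_department(departments, dnumber, new_name):
    """Decomposed design: the department name is stored exactly once."""
    departments[dnumber]["DNAME"] = new_name
    return 1


print(rename_in_emp_dept(EMP_DEPT, 5, "Sales"))      # 3
print(rename_in_department(DEPARTMENT, 5, "Sales"))  # 1
```

The single-table design touches one tuple per employee in the department, while the separate DEPARTMENT relation needs a single change.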
Redundancy and Anomaly
• Redundancy can cause anomalies (inconsistencies) if
modifications are not done carefully
• Update Anomaly:
– Updating a value in a single cell can make the database
inconsistent
• Insertion Anomaly:
– Adding an entity can make the database inconsistent
• Deletion Anomaly:
– Deleting some information can make the database inconsistent
or cause unintended loss of information
EXAMPLE OF AN UPDATE ANOMALY
• Changing the manager of department 5 from
333445555 to 999999999 requires this update
to be made in the tuples of all employees who work in that
department; otherwise the database will become
inconsistent.
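A sketch of what "inconsistent" means here, assuming hypothetical EMP_DEPT tuples in Python: if only some of department 5's tuples receive the new manager, the table disagrees with itself about who manages department 5. The helper `inconsistent_departments` is illustrative, not a standard function.

```python
import copy
from collections import defaultdict

# Hypothetical EMP_DEPT tuples: three employees in department 5, one in department 4.
EMP_DEPT = [
    {"SSN": "123456789", "DNUMBER": 5, "DMGRSSN": "333445555"},
    {"SSN": "453453453", "DNUMBER": 5, "DMGRSSN": "333445555"},
    {"SSN": "333445555", "DNUMBER": 5, "DMGRSSN": "333445555"},
    {"SSN": "999887777", "DNUMBER": 4, "DMGRSSN": "987654321"},
]


def inconsistent_departments(rows):
    """Return the DNUMBERs whose tuples disagree about the manager's SSN."""
    managers = defaultdict(set)
    for row in rows:
        managers[row["DNUMBER"]].add(row["DMGRSSN"])
    return sorted(d for d, mgrs in managers.items() if len(mgrs) > 1)


# Careless update: only the first tuple of department 5 is changed.
partial = copy.deepcopy(EMP_DEPT)
partial[0]["DMGRSSN"] = "999999999"

# Careful update: every tuple of department 5 is changed.
complete = copy.deepcopy(EMP_DEPT)
for row in complete:
    if row["DNUMBER"] == 5:
        row["DMGRSSN"] = "999999999"

print(inconsistent_departments(partial))   # [5]
print(inconsistent_departments(complete))  # []
```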
EXAMPLE OF AN INSERT ANOMALY
ENAME | SSN | BDATE | ADDRESS | DNUMBER | DNAME | DMGRSSN
James, Bill | 111111111 | 1980-03-12 | 333 Sims Av, Chicago | NULL | NULL | NULL
NULL | NULL | NULL | NULL | 6 | Sales | 999999999
• Cannot insert a new department unless an employee is assigned to it
– or we must insert NULL values into the employee attributes of that tuple,
which is not allowed if SSN is the primary key.
• Conversely, cannot insert a new employee unless he/she is assigned to a
department
– or we must insert NULL values into the department attributes if the employee
does not work for a department yet.
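The insertion anomaly can be reproduced with Python's built-in `sqlite3` module, using the tuples from the table above. This is a sketch under the assumption that SSN is declared as the primary key of EMP_DEPT; SQLite is chosen only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SSN is the primary key. SQLite (unlike the SQL standard) lets a non-INTEGER
# PRIMARY KEY column hold NULL, so NOT NULL is spelled out explicitly.
conn.execute(
    """CREATE TABLE EMP_DEPT (
           ENAME TEXT, SSN TEXT PRIMARY KEY NOT NULL, BDATE TEXT, ADDRESS TEXT,
           DNUMBER INTEGER, DNAME TEXT, DMGRSSN TEXT)"""
)

# New employee, no department yet: accepted, but the department columns are NULL.
conn.execute(
    "INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, NULL, NULL, NULL)",
    ("James, Bill", "111111111", "1980-03-12", "333 Sims Av, Chicago"),
)

# New department, no employees yet: SSN would have to be NULL, so it is rejected.
try:
    conn.execute(
        "INSERT INTO EMP_DEPT VALUES (NULL, NULL, NULL, NULL, 6, 'Sales', '999999999')"
    )
    department_inserted = True
except sqlite3.IntegrityError as exc:
    department_inserted = False
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM EMP_DEPT").fetchone()[0])  # 1
```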
EXAMPLE OF A DELETE ANOMALY
• Deleting a department means deleting every tuple with its DNUMBER, which
also deletes all the employees who work in that department.
• Conversely, if an employee is the only employee in a department, deleting
that employee also deletes the corresponding department.
– The information concerning that department is lost from the database.
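A minimal Python sketch of the second case, using hypothetical tuples in which one employee is the only member of department 6; the helper names are illustrative.

```python
# Hypothetical EMP_DEPT tuples; the second employee is the only one in department 6.
EMP_DEPT = [
    {"ENAME": "Alice", "SSN": "123456789", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333445555"},
    {"ENAME": "Frank", "SSN": "555555555", "DNUMBER": 6, "DNAME": "Sales", "DMGRSSN": "999999999"},
]


def delete_employee(rows, ssn):
    """Delete an employee by removing the employee's whole EMP_DEPT tuple."""
    return [row for row in rows if row["SSN"] != ssn]


def known_departments(rows):
    """Departments the database still knows about (DNUMBER -> DNAME)."""
    return {row["DNUMBER"]: row["DNAME"] for row in rows if row["DNUMBER"] is not None}


after = delete_employee(EMP_DEPT, "555555555")
print(known_departments(EMP_DEPT))  # {5: 'Research', 6: 'Sales'}
print(known_departments(after))     # {5: 'Research'}
```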
Anomalies and good design
• Ideally, design a schema that does not suffer
from the insertion, deletion and update
anomalies.
• Normalization is a process that can often be
used to arrive at such schemas.
Null Values in Tuples
• Relations should be designed such that their tuples
will have as few NULL values as possible
• The NULL value means ‘no data’ and is different from
values such as 0 for numeric types or the empty string
for string types.
• Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
• NULLs can have multiple interpretations:
– The attribute does not apply to this tuple
– The value is unknown
– The value is known to exist but is not available
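The difference between NULL, 0 and the empty string can be checked directly in SQL. This sketch runs in SQLite through Python's `sqlite3` module, which returns SQL NULL as `None`; the results are SQLite's, and some systems differ (Oracle, for example, treats '' as NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Comparing NULL with '=' yields NULL (unknown), even against NULL itself;
# only IS NULL tests for the absence of a value. 0 and '' are ordinary values.
row = conn.execute(
    "SELECT NULL = 0, NULL = '', NULL = NULL, NULL IS NULL, 0 IS NULL, '' IS NULL"
).fetchone()
print(row)  # (None, None, None, 1, 0, 0)
```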
Decomposing Relations
• The problems we came across in the
example can be eliminated by splitting (or
decomposing) the relation schema into
multiple relations.
– Care must be taken not to lose information; a decomposition that
loses information is called a lossy decomposition
– We should achieve a lossless decomposition
Illustration of lossy decomposition
• The original database schema is as follows
EMP_DEPT( ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME,
DMGRSSN)
• If the Relation is decomposed into
DEPARTMENT(DNUMBER, DNAME, DMGRSSN)
EMPLOYEE(SSN, ENAME, BDATE, ADDRESS)
• This kind of design eliminates
– the need for NULL values
– the types of anomalies we have come across
BUT
We have LOST the information about the relationship between
EMPLOYEE and the DEPARTMENT in which the employee works
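The loss can be made concrete with a small Python sketch on hypothetical tuples: project EMP_DEPT onto the two schemas above, then join them back. Because EMPLOYEE and DEPARTMENT share no attribute, the natural join pairs every employee with every department and produces spurious tuples. The helpers `project` and `natural_join` are illustrative implementations of the relational operators, not a library API.

```python
# Hypothetical EMP_DEPT tuples (BDATE and ADDRESS omitted for brevity).
EMP_DEPT = [
    {"SSN": "123456789", "ENAME": "Alice", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333445555"},
    {"SSN": "999887777", "ENAME": "Dave", "DNUMBER": 4, "DNAME": "Administration", "DMGRSSN": "987654321"},
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on all shared attributes."""
    result = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            shared = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in shared):
                result.add(frozenset({**ld, **rd}.items()))
    return result


employee = project(EMP_DEPT, ["SSN", "ENAME"])
department = project(EMP_DEPT, ["DNUMBER", "DNAME", "DMGRSSN"])
original = project(EMP_DEPT, ["SSN", "ENAME", "DNUMBER", "DNAME", "DMGRSSN"])

# No shared attribute, so the join degenerates to a Cartesian product.
rejoined = natural_join(employee, department)
print(len(original), len(rejoined))  # 2 4
print(len(rejoined - original))      # 2  (spurious tuples)
```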
Lossless Decomposition
• To prevent the LOSS of information, the solution is to place
a foreign key in one of the relations.
• A foreign key is a set of attributes in one relation whose values
must match the primary key of another relation (or be NULL).
The solution
DEPARTMENT(DNUMBER, DNAME, DMGRSSN)
EMPLOYEE(SSN, DNUMBER, ENAME, BDATE, ADDRESS)
What happens if SSN is placed in DEPARTMENT as a foreign key,
instead of DNUMBER being a foreign key
in EMPLOYEE?
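With DNUMBER in EMPLOYEE, joining the two relations on DNUMBER rebuilds the original EMP_DEPT tuples. For a decomposition into two relations this is guaranteed when the shared attributes form a key of one of them, as DNUMBER is for DEPARTMENT. This SQLite sketch (run through Python's `sqlite3` module, with hypothetical sample rows) also shows the DBMS enforcing the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT, DMGRSSN TEXT);
    CREATE TABLE EMPLOYEE (
        SSN TEXT PRIMARY KEY NOT NULL,
        DNUMBER INTEGER REFERENCES DEPARTMENT (DNUMBER),
        ENAME TEXT, BDATE TEXT, ADDRESS TEXT);
    """
)
conn.executemany(
    "INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
    [(5, "Research", "333445555"), (6, "Sales", "999999999")],
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [
        ("123456789", 5, "Alice", None, None),
        ("111111111", 6, "James, Bill", "1980-03-12", "333 Sims Av, Chicago"),
    ],
)

# Joining on the foreign key rebuilds each employee's department, with no spurious tuples.
rows = conn.execute(
    """SELECT e.ENAME, e.SSN, d.DNUMBER, d.DNAME, d.DMGRSSN
       FROM EMPLOYEE AS e JOIN DEPARTMENT AS d ON e.DNUMBER = d.DNUMBER
       ORDER BY e.SSN"""
).fetchall()
for row in rows:
    print(row)

# The foreign key also rejects an employee assigned to a department that does not exist.
try:
    conn.execute("INSERT INTO EMPLOYEE (SSN, DNUMBER) VALUES ('222222222', 9)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```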
Candidate Keys of a relation
• A candidate key for a relation is a set of its
attributes that satisfy:
– Uniqueness.
• The values of the attributes uniquely identify a tuple.
– Minimality.
• No proper subset of the attributes has the uniqueness
property.
• If uniqueness is satisfied (but not necessarily
minimality) the attributes are said to form a
superkey.
Candidate Keys of a Relation: Examples
• Example: Employee Table
• Primary key – {SSN}
• Candidate key – {SSN}
• Superkeys (among others) – {SSN}, {SSN, ENAME},
{SSN, ENAME, BDATE}
• {SSN, ENAME} and {SSN, ENAME, BDATE} are
superkeys but not candidate keys, because they are not minimal.
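Because keys are defined on the schema, one way to check these examples is from functional dependencies rather than from sample data. This Python sketch assumes the FD SSN → {ENAME, BDATE, ADDRESS} for Employee and uses attribute closure (the set of all attributes that a given set determines), a standard technique built on the FD inference rules; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """Uniqueness: attrs determine every attribute of the schema."""
    return closure(attrs, fds) >= schema


def is_candidate_key(attrs, schema, fds):
    """Uniqueness plus minimality. Because any superset of a superkey is also a
    superkey, it is enough to try dropping one attribute at a time."""
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )


EMPLOYEE = {"SSN", "ENAME", "BDATE", "ADDRESS"}
F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS"})]  # assumed FD: SSN -> {ENAME, BDATE, ADDRESS}

print(is_candidate_key({"SSN"}, EMPLOYEE, F))           # True
print(is_superkey({"SSN", "ENAME"}, EMPLOYEE, F))       # True
print(is_candidate_key({"SSN", "ENAME"}, EMPLOYEE, F))  # False
```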
Candidate keys vs Primary Keys
• Note that the concept of a candidate key is defined
with respect to the relation (schema), and not with
respect to any particular instance of the relation.
• The primary key of a relation in a DBMS should be
a candidate key, but there could be several
candidate keys to choose from. When talking
about normalization, it is irrelevant which key is
chosen as primary key.
Functional Dependencies
• Functional dependencies (FDs) are used to
specify formal measures of the "goodness" of
relational designs
• FDs and keys are used to define normal forms
for relations
• FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes
• FDs specify which attributes in a table or entity
class are determined by other attributes
Simple Functional Dependencies (FDs)
• Dependencies among attributes
A→B
A functionally determines B
B functionally depends on A
Each value of A determines exactly one value of B:
if two or more tuples (of a specific table or entity class)
have the same value for A, they have the same value
for B
(e.g., every employee with the same value for DNUMBER,
say 5, has the same value for DNAME, say Research)
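The definition can be checked pairwise on a table instance. This Python sketch uses hypothetical tuples and an illustrative helper `fd_holds`; note that one instance can only show that an FD is violated, not that it holds for the schema, since FDs come from the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Check A -> B on one table instance: tuples that agree on lhs must agree on rhs."""
    seen = {}
    for row in rows:
        a_value = tuple(row[attr] for attr in lhs)
        b_value = tuple(row[attr] for attr in rhs)
        # setdefault records the first B value seen for this A value.
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True


# Hypothetical tuples of an employee table.
rows = [
    {"ENAME": "Alice", "DNUMBER": 5, "DNAME": "Research"},
    {"ENAME": "Bob", "DNUMBER": 5, "DNAME": "Research"},
    {"ENAME": "Carol", "DNUMBER": 4, "DNAME": "Administration"},
]

print(fd_holds(rows, ["DNUMBER"], ["DNAME"]))  # True
print(fd_holds(rows, ["DNUMBER"], ["ENAME"]))  # False: two employees share DNUMBER 5
```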
FDs for a Normalized Example
EMPLOYEE ( SSN, ENAME, BDATE, ADDRESS)
SSN → ENAME
SSN → BDATE
SSN → ADDRESS
SSN can be used to look up (and
therefore uniquely determine) all
the other attributes of an Employee
tuple
This can also be written as
SSN → ENAME, BDATE, ADDRESS
or SSN → { ENAME, BDATE, ADDRESS }
Also
SSN → SSN
(this is a trivial FD, which we usually don't write)
An Example Functional Dependency
Simple Candidate Keys
• A simple candidate key is a single attribute of an entity class
or table that uniquely identifies a tuple (it is UNIQUE and
NOT NULL)
Employee(EMPNO, SSN, ENAME, BDATE, ADDRESS)
Simple candidate keys: EMPNO, SSN
Both EMPNO and SSN uniquely
identify an employee
A designer chooses the primary key from
among the candidate keys
Determinants & Dependents
empno (determinant) → addr (dependent)
In a simple FD, the determinant is a single attribute
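Full Functional Dependency
A full functional dependency X → Y is a functional
dependency where the removal of any attribute A from X
makes the FD invalid, i.e. (X − {A}) → Y no longer holds.
• For example, if we have a primary key
{EMPNO, SSN}
and we define a full FD based on it, then
{EMPNO, SSN} → {ENAME, BDATE, ADDRESS}
• If either EMPNO or SSN is missing, then the FD no longer
holds (i.e., it becomes invalid), because both attributes are
required to identify an employee
• This assumes neither EMPNO nor SSN identifies an employee on its
own; if each were a candidate key, as in the Employee example with
simple candidate keys, {EMPNO, SSN} → {ENAME, BDATE, ADDRESS}
would not be a full FD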
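Whether an FD is full depends on which other FDs hold, which can be checked with attribute closure. This Python sketch assumes two alternative FD sets for the example above; `closure` and `is_full_fd` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(x, y, fds):
    """X -> Y is full if it holds but no X minus one attribute still determines Y.
    Checking single-attribute removals suffices, since closure is monotone."""
    if not y <= closure(x, fds):
        return False  # X -> Y does not hold at all
    return not any(y <= closure(x - {a}, fds) for a in x)


X = {"EMPNO", "SSN"}
Y = {"ENAME", "BDATE", "ADDRESS"}

# Assumption 1: only the pair {EMPNO, SSN} identifies an employee.
F1 = [(X, Y)]
# Assumption 2: EMPNO and SSN are each a candidate key on their own.
F2 = [({"EMPNO"}, Y | {"SSN"}), ({"SSN"}, Y | {"EMPNO"})]

print(is_full_fd(X, Y, F1))  # True
print(is_full_fd(X, Y, F2))  # False: EMPNO or SSN alone already determines Y
```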
Inference rules
• (Reflexive): If Y is a subset of X, then X → Y
• (Augmentation): If X → Y, then XZ → YZ
– (Notation: XZ stands for X ∪ Z)
• (Transitive): If X → Y and Y → Z, then X → Z
• (Decomposition): If X → YZ, then X → Y
• (Union): If X → Y and X → Z, then X → YZ
• (Pseudotransitive): If X → Y and WY → Z, then WX → Z
Inference Rules for Functional Dependencies: Example
• Given F =
{SSN → {ENAME, BDATE, ADDRESS, DNUMBER},
DNUMBER → {DNAME, DMGRSSN} }
we can infer
SSN → {DNAME, DMGRSSN} (decomposition, then transitive)
SSN → SSN (reflexive)
DNUMBER → DNUMBER (reflexive)
Question 1
Examine Table 1 shown below. This table represents
the hours worked per week for temporary staff at each
branch of a company.
a) Describe the concept of functional dependency.
b) Identify the functional dependencies represented
by the data shown in Table 1. State any
assumptions you make about the data (if
necessary).
c) Table 1 is susceptible to anomalies. Provide
examples of how such anomalies could occur in
this table.