Student Relation ITCS 3160 Jing Yang 2010 Fall

11/17/2010
ITCS 3160
Jing Yang
2010 Fall
Class 16: Basics of Functional Dependencies and Normalization for Relational Databases (ch.15)
Student Relation
Is this a good db design? Can you suggest a better design?
Informal Design Guidelines for Relation Schemas
• Measures of quality
– Making sure attribute semantics are clear
– Reducing redundant information in tuples
– Reducing NULL values in tuples
– Disallowing the possibility of generating spurious tuples
Imparting Clear Semantics to Attributes in Relations
• Semantics of a relation – Meaning resulting from interpretation of attribute values in a tuple
• Easier to explain semantics of relation
– Indicates better schema design
Guideline 1
• Design relation schema so that it is easy to explain its meaning
• Do not combine attributes from multiple entity types and relationship types into a single relation
Redundant Information in Tuples and Update Anomalies
• Redundancy is at the root of several problems associated with relational schemas:
– redundant storage, update anomalies
• Storing natural joins of base relations leads to update anomalies
• Types of update anomalies:
– Insertion
– Deletion
– Modification
• What if you want to add a tuple with advisor id 440?
• What if you delete the first and the last tuples?
• What if you want to update advisor_office for one advisor?
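The three anomalies can be reproduced on a small in-memory relation. The sketch below uses made-up rows (the names, ids, and office numbers are assumptions for illustration, not the table from the slide) in which every student tuple repeats the advisor's office.

```python
# Hypothetical flat relation: each student row repeats the advisor's office.
students = [
    {"sid": 1, "name": "Ann", "advisor_id": 101, "advisor_office": "W-301"},
    {"sid": 2, "name": "Bob", "advisor_id": 101, "advisor_office": "W-301"},
    {"sid": 3, "name": "Cat", "advisor_id": 220, "advisor_office": "W-415"},
]

def offices_by_advisor(rows):
    """Map each advisor_id to the set of offices recorded for it."""
    offices = {}
    for row in rows:
        offices.setdefault(row["advisor_id"], set()).add(row["advisor_office"])
    return offices

# Modification anomaly: updating one copy leaves the other copy stale.
modified = [dict(r) for r in students]
modified[0]["advisor_office"] = "W-999"
inconsistent = {a for a, o in offices_by_advisor(modified).items() if len(o) > 1}

# Deletion anomaly: removing advisor 220's only student also erases
# everything the relation knew about advisor 220.
remaining = [r for r in students if r["sid"] != 3]
lost_advisors = set(offices_by_advisor(students)) - set(offices_by_advisor(remaining))

# Insertion anomaly: advisor 440 has no student yet, so the new row would
# need NULLs (None) in the student attributes, including the key sid.
new_row = {"sid": None, "name": None, "advisor_id": 440, "advisor_office": "W-120"}

print(inconsistent)   # {101}
print(lost_advisors)  # {220}
```

Splitting the advisor attributes into their own relation keyed by advisor_id removes all three problems, because each office is then stored exactly once.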
Guideline 2
• Design base relation schemas so that no update anomalies are present in the relations
• If any anomalies are present:
– Note them clearly
– Make sure that the programs that update the database will operate correctly
NULL Values in Tuples
• May group many attributes together into a “fat” relation
– Can end up with many NULLs
• Problems with NULLs
– Wasted storage space
– Problems understanding meaning
Guideline 3
• Avoid placing attributes in a base relation whose values may frequently be NULL
• If NULLs are unavoidable:
– Make sure that they apply in exceptional cases only, not to a majority of tuples
Generation of Spurious Tuples
• spurious tuples: meaningless tuples produced by a natural join
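A minimal sketch of how spurious tuples arise, assuming a hypothetical relation of employees, projects, and project locations (all names and rows are invented for illustration). The relation is split on an attribute that is not a key of either part and then joined back.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; returns a list of dicts."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen

def natural_join(r, s):
    """Join rows that agree on every attribute the two relations share."""
    out = []
    for a in r:
        for b in s:
            common = set(a) & set(b)
            if all(a[c] == b[c] for c in common):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out

# Hypothetical data: who works on which project, and where.
works = [
    {"ename": "Smith", "pname": "ProductX", "plocation": "Houston"},
    {"ename": "Wong", "pname": "ProductY", "plocation": "Houston"},
]

# Poor decomposition: the shared attribute plocation is not a key of either part.
emp_locs = project(works, ["ename", "plocation"])
proj_locs = project(works, ["pname", "plocation"])

rejoined = natural_join(emp_locs, proj_locs)
spurious = [t for t in rejoined if t not in works]
print(len(rejoined), len(spurious))  # 4 2
```

The join returns every original tuple plus two that were never stored (Smith on ProductY, Wong on ProductX), so the original relation cannot be recovered from its parts.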
Guideline 4
• Design relation schemas to be joined with equality conditions on attributes that are appropriately related
– Guarantees that no spurious tuples are generated
• Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
Discussion
• Anomalies cause redundant work to be done
• Waste of storage space due to NULLs
• Generation of invalid and spurious data during joins
Normalization
• ideas → E/R → relations → better (normalized) relations
• Normalization: process of making relations better by decomposing them into smaller relations to
– reduce redundancy
– eliminate update anomalies
• final goal: all relations in Boyce‐Codd Normal Form (BCNF)
• Analytic tool: Functional Dependency
Functional Dependency (cont’d)
• R is a relation, A and B are its attributes. R.A ‐> R.B if and only if, for each value of A, no more than one value of B is associated with it. Namely, if t1 and t2 are two tuples in the relation R and t1(A) = t2(A), then we must have t1(B) = t2(B)
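The definition translates directly into a check over the tuples of a relation instance. The sketch below is one possible way to write it; the function name fd_violations and the sample rows are assumptions for illustration.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of tuples that agree on lhs but differ on rhs."""
    first_seen = {}
    violations = []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        if key in first_seen and first_seen[key][0] != val:
            violations.append((first_seen[key][1], row))
        first_seen.setdefault(key, (val, row))
    return violations

# Hypothetical instance of a student relation.
student = [
    {"sid": 1, "name": "Ann", "did": 10, "dname": "CS"},
    {"sid": 2, "name": "Bob", "did": 10, "dname": "CS"},
    {"sid": 3, "name": "Ann", "did": 20, "dname": "Math"},
]

print(fd_violations(student, ["did"], ["dname"]))      # []
print(len(fd_violations(student, ["name"], ["did"])))  # 1
```

Both arguments are lists, so the same function covers the case where A and B are sets of attributes. An empty result only means this particular instance does not refute the FD; whether the FD really holds depends on the meaning of the attributes.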
Student Relation
Functional Dependency (cont’d)
• R is a relation, R.A ‐> R.B
• A and B can be sets of attributes
Functional Dependency (cont’d.)
• Given a populated relation
– Cannot determine which FDs hold and which do not, unless the meaning of and relationships among the attributes are known
– Can state that FD does not hold if there are tuples that show violation of such an FD
Normalization of Relations
• Assumption: a set of FDs is given for each relation; each relation has a designated primary key
• Takes a relation schema through a series of tests
– Certify whether it satisfies a certain normal form
– Proceeds in a top‐down fashion
• Normal form tests
Normalization of Relations (cont’d.)
• Properties that the relational schemas should have:
– Nonadditive join property
• Must be achieved at any cost
– Dependency preservation property
• desirable
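For a decomposition into exactly two relations, the nonadditive join property can be tested with the standard rule: the common attributes must functionally determine all of one side. The sketch below assumes FDs are given as (left, right) pairs of attribute sets; the schema and FDs are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_nonadditive(r1, r2, fds):
    """Binary test: the shared attributes must determine all of r1 or all of r2."""
    common = closure(set(r1) & set(r2), fds)
    return set(r1) <= common or set(r2) <= common

# Hypothetical FDs: sid -> name, did and did -> dname.
fds = [({"sid"}, {"name", "did"}), ({"did"}, {"dname"})]

good = is_nonadditive({"sid", "name", "did"}, {"did", "dname"}, fds)
bad = is_nonadditive({"sid", "name"}, {"name", "did", "dname"}, fds)
print(good, bad)  # True False
```

This test applies only to binary decompositions under functional dependencies; decompositions into three or more relations need a more general algorithm.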
Practical Use of Normal Forms
• Normalization carried out in practice
– Resulting designs are of high quality and meet the desirable properties stated previously
– Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF
• Do not need to normalize to the highest possible normal form
Definitions of Keys and Attributes Participating in Keys
• Definition of superkey and key (key has to be minimal)
• Candidate key
– If more than one key in a relation schema
• One is primary key
• Others are secondary keys
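Superkeys and candidate keys can be computed from a set of FDs with attribute closure. The following is a brute-force sketch, assuming FDs are (left, right) pairs of attribute sets; the schema and FDs are hypothetical, and the search is exponential in the number of attributes, so it suits only small classroom examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if is_superkey(s, schema, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

# Hypothetical schema: sid and email each identify a student.
schema = {"sid", "email", "name"}
fds = [({"sid"}, {"email", "name"}), ({"email"}, {"sid"})]

print(is_superkey({"sid", "name"}, schema, fds))  # True, but not minimal
print(candidate_keys(schema, fds))                # [{'email'}, {'sid'}]
```

Here {sid, name} is a superkey but not a key, because dropping name still leaves a superkey. Either candidate key may be designated the primary key.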
First Normal Form
• Part of the formal definition of a relation in the basic (flat) relational model
• Only attribute values permitted are single atomic (or indivisible) values
Practice
First Normal Form (cont’d.)
• Techniques to achieve first normal form
– Remove attribute and place in separate relation
– Expand the key
– Use several atomic attributes
Question: Which approach is the best?
First Normal Form (cont’d.)
• Does not allow nested relations
• To change to 1NF:
– Remove nested relation attributes into a new relation
– Propagate the primary key into it
– Unnest relation into a set of 1NF relations
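The unnesting steps can be sketched on a nested record. The example assumes a hypothetical employee record with a relation-valued children attribute; the function name unnest and the data are illustrative only.

```python
# Hypothetical nested relation: "children" is a relation-valued attribute.
employees_nested = [
    {"man#": 1, "name": "Lee", "children": [
        {"childname": "Kim", "birthyear": 2001},
        {"childname": "Jo", "birthyear": 2004},
    ]},
    {"man#": 2, "name": "Ray", "children": []},
]

def unnest(rows, key, nested_attr):
    """Split rows into a flat parent relation and a child relation
    that carries the parent's primary key."""
    parent, child = [], []
    for row in rows:
        # Step 1: remove the nested attribute from the parent relation.
        parent.append({a: v for a, v in row.items() if a != nested_attr})
        # Step 2: propagate the parent key into every nested tuple.
        for inner in row[nested_attr]:
            child.append({key: row[key], **inner})
    return parent, child

employee, children = unnest(employees_nested, "man#", "children")
print(employee)
print(children)
```

The new children relation needs the combined key (man#, childname), since man# alone no longer identifies a tuple.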
• employee (man#, name, birthdate)
• jobhistory (man#, jobdate, title)
• salaryhistory (man#, jobdate, salarydate, salary)
• children (man#, childname, birthyear)
Second Normal Form
• Based on concept of full functional dependency
– Versus partial dependency
Consider what happens if a new column called “Employ_Name” is added to this table
Second Normal Form
• Practice: second normalize the previous table with the “Employ_Name” attribute
• Second normalize into a number of 2NF relations
– Nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent
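A minimal sketch of second normalization, under two simplifying assumptions: the primary key is the only candidate key, and only the FDs listed explicitly are inspected (implied FDs are not derived). The relation, its attributes, and the function names are hypothetical and are not the practice table.

```python
def partial_dependencies(primary_key, fds):
    """Listed FDs whose left side is a proper subset of the primary key
    and whose right side contains at least one non-key attribute."""
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        dependents = rhs - key
        if lhs < key and dependents:
            found.append((lhs, dependents))
    return found

def decompose_2nf(schema, primary_key, fds):
    """One relation per partial dependency, plus what is left of the original."""
    moved = set()
    relations = []
    for lhs, dependents in partial_dependencies(primary_key, fds):
        relations.append(lhs | dependents)
        moved |= dependents
    relations.append(set(schema) - moved)
    return relations

# Hypothetical relation with primary key (ssn, pnumber).
schema = {"ssn", "pnumber", "hours", "ename", "pname"}
key = {"ssn", "pnumber"}
fds = [
    ({"ssn", "pnumber"}, {"hours"}),  # full dependency on the whole key
    ({"ssn"}, {"ename"}),             # partial dependency
    ({"pnumber"}, {"pname"}),         # partial dependency
]

for relation in decompose_2nf(schema, key, fds):
    print(sorted(relation))
```

Each attribute that depends on only part of the key moves to a relation keyed by that part, and the remaining relation keeps the full key with the attributes that depend on all of it.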
Practice
Third Normal Form
• Based on concept of transitive dependency
– X ‐> Y, Y ‐> Z and Y is neither a candidate key nor a subset of any key
• Example:
• Problematic FD
– Left‐hand side is part of primary key
– Left‐hand side is a nonkey attribute
Practice
General Definition of Third Normal Form
A functional dependency X → Y is trivial if Y is a subset of X
Boyce‐Codd Normal Form
• Every relation in BCNF is also in 3NF
– Relation in 3NF is not necessarily in BCNF
• Difference:
– Condition which allows A to be prime is absent from BCNF
• Most relation schemas that are in 3NF are also in BCNF
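The difference between the two tests can be made concrete in code. In the sketch below, a nontrivial FD X -> A is accepted when X is a superkey; the 3NF test additionally accepts it when A is a prime attribute. The relation (student, course, instructor), its FDs, and the function names are assumptions for illustration, and only the listed FDs are checked.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys (brute force, small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violations(schema, fds, allow_prime_rhs):
    """FDs X -> A that break 3NF (allow_prime_rhs=True) or BCNF (False)."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):               # ignore the trivial part
            if closure(lhs, fds) >= set(schema):  # X is a superkey
                continue
            if allow_prime_rhs and a in prime:    # extra clause only 3NF has
                continue
            bad.append((lhs, a))
    return bad

# Hypothetical relation: each instructor teaches exactly one course.
schema = {"student", "course", "instructor"}
fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

print(violations(schema, fds, allow_prime_rhs=True))   # []
print(violations(schema, fds, allow_prime_rhs=False))  # [({'instructor'}, 'course')]
```

The relation passes the 3NF test because course is part of the candidate key (student, course), but it fails BCNF because instructor is not a superkey.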
Example
Interesting Example
• Is this relation in BCNF?
• Is there any redundancy?
Multivalued Dependency
Ename ->-> Pname
Ename ->-> Dname
Fourth Normal Form
• Fourth normal form (4NF)
– Violated when a relation has undesirable multivalued dependencies
Fifth Normal Form
• A table is in the fifth normal form (5NF) if it cannot have a lossless decomposition into any number of smaller tables.
Practice
• CAR_SALE (CarID, Option_type, Option_Listprice, Sale_date, Discounted_price)
• CarID → Sale_date
• Option_type→ Option_Listprice
• CarID, Option_type → Discounted_price
Practice
• Student (Sid, Name, Age, Did, Dname)
Practice
• Car_Sale(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)
• Date_sold‐>Discount_amt
• Salesperson#‐>Commission%
Homework 3
• Textbook: 15.30, 15.31
Normalize the following tables
L_C (Loan_no, Amount, Type, Ssn, Phone, Name, Addr)
Author (AID, Aname)
Book (Bid, Btitle, Pid, Year)
Publisher (Pid, Pname, Paddress)
Author_Book(AID, Bid)
Queries:
1. Find the title and year of all books published by “Addison Wesley” (a publisher)
2. Find all authors who have published any books with “Addison Wesley”
3. How many books were published by “Addison Wesley” in year 2009?
4. Print the name of each author and the number of books published by that author.
5. Which publisher published the largest number of books?