The Entity-Relationship Model

ISOM
MIS710 Module 1b
Relational Model and Normalization
Arijit Sengupta
Structure of this semester
MIS710
0. Intro: Database Fundamentals
1. Design: Conceptual Modeling, Relational Model, Normalization
2. Querying: Query Languages, Advanced SQL
3. Applications: Java DB Applications (JDBC)
4. Advanced Topics: Transaction Management, Data Mining
Newbie → Users → Designers → Developers → Professionals
Today’s Buzzwords
• Relational Model
• Superkey, Candidate Key, Primary Key
and Foreign Key
• Entity Integrity Rule
• Referential Integrity Rule
• Normalization
• First, Second, Third, and Boyce-Codd
Normal Forms
• Unnormalization
Objectives of this lecture
• Understand the Relational Model and its properties
• Understand the notion of keys
• Understand the use and importance of referential
integrity
• Provide an alternative way to design relations using
semantics rather than concepts
• Take an existing “flat file” design and create a
relational design from it through the process of
Normalization
• Identify sources of problems (or anomalies) within a
given relational design
• Argue about improvements to designs created by
others
Relational Data Model
• Originally proposed by Codd in 1970
• Based on mathematical set theory
Relation (attribute names in the header row, one tuple per row, attribute values in the cells):
ID   Name    Age   Address       GPA
S1   Jose    21    Stoned Hill   3.1
S2   Alice   18    BigHead       3.2
S3   Lin     32    Done-Audy     2.9
S4   Joyce   20    Atlanta       3.7
S5   Sunil   27    Mare-iota     3.2
Relation: Properties
• A relation is a set of tuples
• A tuple is a set of attribute-value properties
(relations)
  - Ordering of attributes is immaterial
  - Ordering of Tuples is immaterial
• Tuples are distinct from one another
• Attributes contain atomic values only
Emp#   Name                  Address
E1     'Jose' 'M.' 'Smith'   '3413 Main Street', 'Atlanta', 'GA'
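The set-based properties above can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the helper name `make_tuple` is made up, and the rows are a subset of the student data shown earlier.

```python
# A relation modeled as a set of tuples; each tuple is a set of
# (attribute, value) pairs, so neither ordering carries meaning.
def make_tuple(**attrs):
    return frozenset(attrs.items())

r1 = {
    make_tuple(ID="S1", Name="Jose", Age=21),
    make_tuple(ID="S2", Name="Alice", Age=18),
}
r2 = {
    make_tuple(Age=18, Name="Alice", ID="S2"),   # attributes reordered
    make_tuple(Name="Jose", Age=21, ID="S1"),    # tuples reordered
    make_tuple(ID="S1", Name="Jose", Age=21),    # duplicate collapses in a set
}

same_relation = (r1 == r2)
print(same_relation, len(r2))
```

Because both levels are sets, reordering attributes or tuples gives the same relation, and a repeated tuple is stored only once.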
Attributes
• Attribute name
  - Attribute names are unique within a relation
• Attribute domain
  - Set of all possible values an attribute may take
  - Domain (GPA) =
  - Domain (name) =
  - Domain (DateOfBirth) =
  - Domain (year)
• Number of attributes: degree of the relation
Tuples
• Aggregation of attribute values
  - S1 = (s1, ‘Jose’, 21, ‘Stoned Hill’, 3.1)
  - S2 = (s2, ‘Alice’, 18, ‘BigHead’, 3.2)
• Cardinality: Number of tuples in a relation
ID   Name    Age   Address       GPA
S1   Jose    21    Stoned Hill   3.1
S2   Alice   18    BigHead       3.2
S3   Lin     32    Done-Audy     2.9
S4   Joyce   20    Atlanta       3.7
S5   Sunil   27    Mare-iota     3.2
• What is the difference between the cardinality and the degree?
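As a rough illustration of the two terms, the sketch below counts both for the student relation and for a projection of it. The variable names are hypothetical; the data is copied from the slide.

```python
# Degree counts attributes (columns); cardinality counts tuples (rows).
attributes = ("ID", "Name", "Age", "Address", "GPA")
tuples = [
    ("S1", "Jose", 21, "Stoned Hill", 3.1),
    ("S2", "Alice", 18, "BigHead", 3.2),
    ("S3", "Lin", 32, "Done-Audy", 2.9),
    ("S4", "Joyce", 20, "Atlanta", 3.7),
    ("S5", "Sunil", 27, "Mare-iota", 3.2),
]

degree = len(attributes)
cardinality = len(tuples)

# Projecting onto (ID, Name) lowers the degree but keeps every tuple here,
# because ID values are all distinct.
projected = {(row[0], row[1]) for row in tuples}
projected_degree = 2
projected_cardinality = len(projected)

print(degree, cardinality, projected_degree, projected_cardinality)
```

The two numbers happen to coincide for the full relation in this instance; the projection shows that they vary independently.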
Primary Keys
• Superkey: SK, a subset of attributes of R,
satisfying Uniqueness, that is, no two tuples
have the same combination of values for these
attributes
• Candidate Key: K, a superkey SK, satisfying
minimality, that is, no component of K can be
eliminated without destroying the uniqueness
property.
• Primary Key: PK, the selected Candidate key, K.
• Can a primary key be composed of multiple
attributes?
• Can a relation have multiple primary keys?
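The uniqueness and minimality conditions can be tried out on a concrete instance. The sketch below is only an instance-level check with made-up function names: a key is a statement about every legal state of the relation, so an instance can rule a key out but cannot prove one.

```python
from itertools import combinations

ATTRS = ("ID", "Name", "Age", "Address", "GPA")
ROWS = [
    ("S1", "Jose", 21, "Stoned Hill", 3.1),
    ("S2", "Alice", 18, "BigHead", 3.2),
    ("S3", "Lin", 32, "Done-Audy", 2.9),
    ("S4", "Joyce", 20, "Atlanta", 3.7),
    ("S5", "Sunil", 27, "Mare-iota", 3.2),
]

def is_superkey(attrs):
    """Uniqueness: no two rows agree on all the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    seen = {tuple(row[i] for i in idx) for row in ROWS}
    return len(seen) == len(ROWS)

def candidate_keys():
    """Minimality: keep a superkey only if no smaller key is contained in it."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            if is_superkey(combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

print(candidate_keys())
```

On this small instance Name, Age and Address also look unique, which is exactly why the choice of key has to come from the meaning of the data rather than from a sample of it.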
Keys - example
Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price)
• Superkeys?
• Candidate keys?
• Primary key?
Entity Integrity Rule
• The primary key of a base relation
cannot contain a NULL value.
• Enforcement of the rule:
An update which results in a NULL value
in the primary key must be rejected.
• Are the following ok?
Primary Key: (Course, Section)
Course   Section   Meets   Enrolled
201      1         MW      20
201      NULL      TTh     25
NULL     NULL      MWF     18
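One way to apply the rule to the three rows above is sketched below. It assumes the primary key is (Course, Section) and uses Python's `None` to stand in for NULL; the function name is illustrative.

```python
PRIMARY_KEY = ("Course", "Section")

rows = [
    {"Course": 201, "Section": 1, "Meets": "MW", "Enrolled": 20},
    {"Course": 201, "Section": None, "Meets": "TTh", "Enrolled": 25},
    {"Course": None, "Section": None, "Meets": "MWF", "Enrolled": 18},
]

def satisfies_entity_integrity(row, pk=PRIMARY_KEY):
    """A row is acceptable only if no primary-key attribute is NULL."""
    return all(row[attr] is not None for attr in pk)

results = [satisfies_entity_integrity(r) for r in rows]
print(results)
```

Under these assumptions only the first row passes: a NULL in any part of a composite primary key is enough to reject the row.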
Foreign Key
Physician (ID, Name, …)
Patient (ID, Name, PhysID*, …)
Club (ID, Name, …)
Player (ID, Name, ?*, …)
Order (OrdID, Date, …, ?*)
Customer (ID, Name, …, ?*)
Dept (DeptID, Name, …, ?*)
Employee (EID, Name, …, ?*)
• Attribute(s) of one relation that reference(s) the PK of
another relation
• FK may or may not be (a part of) the PK of this relation
Course (CourseID, Name, …, ?*)
Student (SID, Name, …, ?*)
Class (ClassID, Meets, …, ?*)
Registration (?)
• Can an FK refer to a part of the PK of another relation?
• Can an FK refer to a PK of the same relation?
Foreign Key ..
• FK and referenced PK may have different
names
• The values of FK must draw from the value set
of PK
[Figure: the Primary Key and the Foreign Key each have a Domain; the Foreign Key draws from the Value Set of the Primary Key]
• How do we define the Domain of an FK?
• Can an FK have a NULL value?
• What can we enforce with PKs and FKs?
Referential Integrity Rule
• If FK is the foreign key of a relation R2, which matches the
primary key PK of the relation R1, then:
  - the FK value must match the PK value in some tuple of R1, or
  - the FK value may be NULL, but only if the FK is not (a part
    of) the PK of R2.
• Enforcement of the Rule
  - An update on either a referenced PK or an FK must satisfy
    the rule. Otherwise, the operation is rejected.
• Which operation on the primary key may violate this rule?
• Which operation on the foreign key may violate this rule?
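The rule can be checked mechanically. The sketch below uses the Physician and Patient relations from the foreign key slide; the row values and the function name `fk_violations` are invented for illustration.

```python
# Hypothetical data: R1 = physicians (referenced), R2 = patients (referencing).
physicians = [{"ID": "D1"}, {"ID": "D2"}]
patients = [
    {"ID": "P1", "PhysID": "D1"},   # matches a PK value in R1
    {"ID": "P2", "PhysID": None},   # NULL is allowed: PhysID is not part of Patient's PK
    {"ID": "P3", "PhysID": "D9"},   # dangling reference
]

def fk_violations(child_rows, fk, parent_rows, pk):
    """Return child rows whose non-NULL FK value matches no parent PK value."""
    parent_keys = {p[pk] for p in parent_rows}
    return [c for c in child_rows
            if c[fk] is not None and c[fk] not in parent_keys]

violations = fk_violations(patients, "PhysID", physicians, "ID")
print(violations)
```

Inserting or updating a patient can create such a dangling value, and so can deleting or re-keying a physician, which is one way to approach the two questions above.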
Referential Integrity Enforcement
• If an operation violates referential integrity:
  - Restrict: reject the operation
  - Cascade: try to propagate the operation to all dependent FK
    values; if that is not possible, reject the operation
  - Nullify (or Default): set all dependent FK values to NULL (or a
    default value); if that is not possible, reject the operation
• Cases for each of the above situations?
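The three policies can be observed with SQLite through Python's standard `sqlite3` module. This is a sketch under assumptions: the table and column names are invented, SQLite's `ON DELETE` clause stands in for the policy, and other DBMSs may differ in syntax and defaults.

```python
import sqlite3

def delete_referenced_row(on_delete):
    """Delete a physician that a patient still references, under one policy."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    con.execute("CREATE TABLE physician (id TEXT PRIMARY KEY, name TEXT)")
    con.execute(
        "CREATE TABLE patient (id TEXT PRIMARY KEY, name TEXT, "
        f"phys_id TEXT REFERENCES physician(id) ON DELETE {on_delete})"
    )
    con.execute("INSERT INTO physician VALUES ('D1', 'Dr. A')")
    con.execute("INSERT INTO patient VALUES ('P1', 'Pat', 'D1')")
    try:
        con.execute("DELETE FROM physician WHERE id = 'D1'")
    except sqlite3.IntegrityError:
        return "rejected"
    # Show what happened to the dependent FK values.
    return con.execute("SELECT phys_id FROM patient").fetchall()

for policy in ("RESTRICT", "CASCADE", "SET NULL"):
    print(policy, delete_referenced_row(policy))
```

With RESTRICT the delete is refused, with CASCADE the dependent patient row disappears, and with SET NULL the patient row survives with a NULL physician.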
Creating Relations
-- Oracle-style DDL (NUMBER is Oracle's numeric type)
create table STUDENT (
    ID     char(11) not null primary key,
    Name   char(30) not null,
    Age    int,
    GPA    number(2,1));
create table COURSE (
    CourseNo     char(6) not null primary key,
    CourseName   char(30) not null,
    CreditHours  number(2,1));
-- Each registration references one student and one course;
-- the FK columns carry the same types as the PKs they reference.
create table REGISTRATION (
    ID         char(11) references STUDENT (ID)
               on delete cascade,
    CourseNum  char(6) references COURSE (CourseNo),
    primary key (ID, CourseNum));
Normalization - Motivating Example
SID  Name    Grade  Course#  Text  Major  Dept
s1   Joseph  A      CIS800   b1    CIS    CIS
s1   Joseph  B      CIS820   b2    CIS    CIS
s1   Joseph  A      CIS872   b5    CIS    CIS
s2   Alice   A      CIS800   b1    CS     MCS
s2   Alice   A      CIS872   b5    CS     MCS
s3   Tom     B      CIS800   b1    Acct   Acct
s3   Tom     B      CIS872   b5    Acct   Acct
s3   Tom     A      CIS860   b1    Acct   Acct
• Is there any redundant data?
• Can we insert a new course# with a new textbook?
• What should be done if ‘CIS’ is changed to ‘MIS’?
• What would happen if we remove all CIS800 students?
Why Normalization?
• Poor Relation Design causes Anomalies
  - Insertion anomalies: Insertion of some piece
    of information cannot be performed unless
    other irrelevant information is added to it.
  - Update anomalies: Update of a single piece
    of information requires updates to multiple
    tuples.
  - Deletion anomalies: Deletion of a piece of
    information removes other unrelated but
    necessary information.
• Normalization improves the design to
remove these anomalies
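The three anomalies can be made concrete with the data from the motivating example. The sketch below assumes the key of that relation is (SID, Course#); the new course CIS880 and text b7 are hypothetical values used only to show the insertion problem.

```python
# (SID, Name, Grade, Course#, Text, Major, Dept) from the motivating example
ROWS = [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]
SID, COURSE, TEXT, DEPT = 0, 3, 4, 6

# Update anomaly: renaming department 'CIS' touches every tuple that repeats it.
tuples_to_update = sum(1 for r in ROWS if r[DEPT] == "CIS")

# Deletion anomaly: removing the only CIS820 enrollment also removes the
# fact that CIS820 uses text b2.
def known_texts(rows):
    return {r[COURSE]: r[TEXT] for r in rows}

after_delete = [r for r in ROWS if r[COURSE] != "CIS820"]
lost_courses = set(known_texts(ROWS)) - set(known_texts(after_delete))

# Insertion anomaly: a course with no students has no SID, but SID is part
# of the assumed key, so the tuple violates entity integrity.
new_course = (None, None, None, "CIS880", "b7", None, None)  # hypothetical
can_insert = new_course[SID] is not None

print(tuples_to_update, lost_courses, can_insert)
```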
Why Normalization?
• Benefits
  - contain minimum amount of redundancy
  - allow users to insert, delete and modify
    tuples in the relation without errors or
    inconsistencies.
  - improve quality of information in the database
  - decrease storage space for the database
• Costs
  - may contribute to performance problems
  - may require more storage in some cases
Unnormalized Relation
STUDENT  STUDENT  COURSE  COURSE        INSTR
ID       NAME     ID      NAME          NAME    ROOM  CREDITS  GRADE
224      Waters   CIS20   Intro CBIS    Greene  205G  5        A
                  CIS40   Database Mgt  Hong    311S  5        B
                  CIS50   Sys.Analysis  Purao   139S  5        B
351      Byron    CIS30   COBOL         Brown   629G  3        B
                  CIS50   Sys.Analysis  Purao   139S  5        C
421      Smith    CIS20   Intro CBIS    Greene  205G  5        B
                  CIS30   COBOL         Brown   629G  3        B
                  CIS50   Sys.Analysis  Purao   139S  5        B
• Create a ‘Definition’ for this relation.
• Do you see any problems in the definition?
• Do you see any anomalies in the data?
Normal Forms
Unnormalized Relation (NF2)
  ↓ Only atomic attributes
First Normal Form (1NF)
  ↓ Remove partial dependencies
Second Normal Form (2NF)
  ↓ Remove transitive dependency
Third Normal Form (3NF)
Higher Order Forms
  Dependency preservation: BCNF
  Remove Multi-valued Dependencies: 4NF
  Remove Join Dependencies: 5NF
The Basis of Normalization
• Functional Dependency (FD)
  - Consider two attributes, X and Y, and two
    arbitrary tuples r1 and r2 of a relation R.
• Y is functionally dependent on X iff:
  value of X in r1 = value of X in r2
  implies
  value of Y in r1 = value of Y in r2
• Also stated as: R.X → R.Y or X → Y
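The definition translates directly into a check over pairs of tuples. The sketch below runs it on the data from the motivating example; the function name `holds` is illustrative, and an instance can only refute an FD, since an FD is a claim about all legal states of the relation.

```python
COLUMNS = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
DATA = [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]
ROWS = [dict(zip(COLUMNS, row)) for row in DATA]

def holds(rows, lhs, rhs):
    """True if equal values on lhs always imply equal values on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

print(holds(ROWS, ["Course#"], ["Text"]))   # each course uses one text
print(holds(ROWS, ["Text"], ["Course#"]))   # b1 is used by two courses
print(holds(ROWS, ["SID"], ["Grade"]))      # s1 has both A and B
```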
Properties of FDs
• If R.X → R.Y or X → Y
  - X is called the determinant of Y.
  - X may or may not be the key attribute of R.
  - Whether an FD holds depends on the semantic meaning of the attributes
    • Name → Address?
  - X and Y may be composite
  - X and Y may be mutually dependent on each other
    • Husband → Wife, Wife → Husband
  - The same Y value may occur in multiple tuples
    • Course# → Text
Fully Functional Dependencies
• When is X → Y a FFD?
  When Y is not functionally dependent on any proper subset
  of X
• X → Y is then a fully functional dependency ( FFD )
  ( SID, Course# ) → Name?
  ( SID, Course# ) → Grade?
  ( SID, Name ) → Major?
  ( SID, Name ) → SID?
• By default, the term FD refers to FFD
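A possible instance-level test for full functional dependency is sketched below: X → Y must hold, and no proper subset of X may determine Y. The function names are made up, the rows are the relevant columns of the motivating example, and the result reflects only this sample.

```python
from itertools import combinations

ROWS = [
    {"SID": "s1", "Name": "Joseph", "Course#": "CIS800", "Grade": "A"},
    {"SID": "s1", "Name": "Joseph", "Course#": "CIS820", "Grade": "B"},
    {"SID": "s1", "Name": "Joseph", "Course#": "CIS872", "Grade": "A"},
    {"SID": "s2", "Name": "Alice", "Course#": "CIS800", "Grade": "A"},
    {"SID": "s2", "Name": "Alice", "Course#": "CIS872", "Grade": "A"},
    {"SID": "s3", "Name": "Tom", "Course#": "CIS800", "Grade": "B"},
    {"SID": "s3", "Name": "Tom", "Course#": "CIS872", "Grade": "B"},
    {"SID": "s3", "Name": "Tom", "Course#": "CIS860", "Grade": "A"},
]

def holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_fully_functional(rows, lhs, rhs):
    """lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True

print(is_fully_functional(ROWS, ("SID", "Course#"), ("Grade",)))
print(is_fully_functional(ROWS, ("SID", "Course#"), ("Name",)))
```

Grade needs both SID and Course#, while Name is already determined by SID alone, so only the first dependency is full on this data.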
Transitive Dependencies
• Given attributes X, Y, and Z of a relation R,
• Z is transitively dependent on X (X → Z)
  iff
  X → Y and Y → Z
• For example:
  SID → Dept, SID → Major,
  Dept → School, Major → Dept
• Do you see any Transitive Functional Dependencies?
Some Inference Rules for FDs
• An FD is redundant if it can be derived from other FDs based on
a set of inference rules. Some of these rules are:
• Reflexive rule: If X ⊇ Y, then X → Y
  - X always determines a subset of itself.
• Augmentation rule: If X → Y, then XZ → YZ
  - Adding the same attribute(s) to both sides does not change the FD.
• Transitive rule: If X → Y & Y → Z, then X → Z
  - Functional dependencies can be ‘chained’.
• Decomposition rule: If X → YZ, then X → Y and X → Z
• Given: { SID → Name, SID → Major, Major → Dept }, which
of the following are redundant?
  SID → School, SID → Dept, Dept → School
  SID → ( Name, Major ), (SID, Name) → (Major, Name)
  SID → SID, SID → (Name, SID)
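Whether an FD can be derived from a given set is commonly tested with the attribute-closure algorithm, which applies the rules above until nothing new is added. The sketch below is a minimal version; the function names `closure` and `implied` are illustrative, and the FDs are the three given on the slide.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implied(lhs, rhs, fds):
    """lhs -> rhs is derivable (redundant) iff rhs lies in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

GIVEN = [
    (("SID",), ("Name",)),
    (("SID",), ("Major",)),
    (("Major",), ("Dept",)),
]

print(implied(("SID",), ("Dept",), GIVEN))           # via Major
print(implied(("SID",), ("School",), GIVEN))         # nothing determines School
print(implied(("SID", "Name"), ("Major", "Name"), GIVEN))
```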
First Normal Form
• DEFINITION
  - A relation R is in first normal form (1NF) if and
    only if all underlying domains contain atomic
    values only.
• Translation
  - To be in first normal form the table must not
    contain any repeating attributes.
• Implication
  - Are all ‘relations’ in First Normal Form (1NF) ?
Example - 1NF
The ‘unnormalized’ relation
has been decomposed in two.
Relation: Student
StudentID  StudentName
224        Waters
351        Byron
421        Smith
Relation: Student-Course
StudentID  Course#  Course Title  Instrname  ROOM  CREDITS  GRADE
224        CIS20    Intro CBIS    Greene     205G  5        A
224        CIS40    Database Mgt  Hong       311S  5        B
224        CIS50    Sys.Analysis  Purao      139S  5        B
351        CIS30    COBOL         Brown      629G  3        B
351        CIS50    Sys.Analysis  Purao      139S  5        C
421        CIS20    Intro CBIS    Greene     205G  5        B
421        CIS30    COBOL         Brown      629G  3        B
421        CIS50    Sys.Analysis  Purao      139S  5        B
• What are the PKs?
Anomalies (with only 1NF)
• Insertion Anomaly
  - A new course cannot be inserted in the database (relation
    Student-Course) until a student registers for that course.
• Update Anomaly
  - If the instructor of a course is changed, this fact would
    have to be noted at many places in the database (many
    tuples of the relation Student-Course).
• Deletion Anomaly
  - Withdrawal of all students from an existing course (that is,
    deletion of related tuples from the relation Student-Course)
    will result in unwarranted removal of that course
    from the database.
Anomalies in 1NF
Course (SID, Name, Grade, Course#, Text, Major, Dept)
• 1NF Relations have anomalies
  - Redundant Information ?
  - Update Anomalies ?
  - Insertion Anomalies ?
  - Deletion Anomalies ?
[Dependency diagram with nodes: SID, Name, Major, Dept, Course#, Text, Grade]
Second Normal Form
• DEFINITION
  - A relation R is in second normal form (2NF) if
    and only if it is in 1NF and every nonkey
    attribute is dependent on the full primary key.
• Translation
  - A table is in second normal form if there are
    no partial dependencies.
• Implication
  - What kinds of primary keys may lead to a
    violation of the Second Normal Form (2NF) ?
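Partial dependencies can be found by checking what each proper subset of a composite key determines on its own. The sketch below does this with attribute closure. The FDs are an assumption read off the student and course example data (they are not stated as FDs in the deck), and the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Assumed key and FDs for the Student-Course relation.
KEY = ("StudentId", "CourseId")
FDS = [
    (("StudentId",), ("StudentName",)),
    (("CourseId",), ("CourseTitle", "Instructor", "Classroom", "Credits")),
    (("StudentId", "CourseId"), ("Grade",)),
]

def partial_dependencies(key, fds):
    """Map each proper subset of the key to the nonkey attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(key, size):
            determined = closure(part, fds) - set(key)
            if determined:
                found[part] = determined
    return found

partials = partial_dependencies(KEY, FDS)
for part, attrs in partials.items():
    print(part, "->", sorted(attrs))
```

Each entry suggests one relation to split off, which leaves Grade alone with the full key. A relation with a single-attribute key has no proper subsets to test, so it cannot violate 2NF.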
Bubble Chart
• Reconsider the example ..
[Bubble chart with nodes: StudentId + CourseId (compound key), StudentName, CourseTitle, Instructor, Classroom, Credits, Grade]
Dealing with Compound Keys
• Revised Bubble Chart
[Revised bubble chart with nodes: StudentId, CourseId, StudentName, CourseTitle, Instructor, Classroom, Credits, Grade]
Example - 2NF
STUDENT  STUDENT
ID       NAME
224      Waters
351      Byron
421      Smith

COURSE  COURSE
ID      TITLE             CREDITS
CIS20   Intro to CIS      5
CIS30   Java              3
CIS40   DBMS              5
CIS50   Systems Analysis  5

STUDENT  COURSE
ID       ID      GRADE
224      CIS20   A
224      CIS40   B
224      CIS50   B
351      CIS30   B
351      CIS50   C
421      CIS20   B
421      CIS30   B
421      CIS50   B
Anomalies with (only) 2NF
STUDENT  STUDENT                    ADVISOR  ADVISOR  TOTAL
ID       NAME     STATUS   ADVISOR  OFFICE   PHONE    CREDITS
224      Waters   Junior   Young    CBA221   726104   105
351      Byron    Soph     Greene   CBA215   718434   77
421      Smith    Junior   Young    CBA221   726104   97
• Insertion anomaly
  - Information about a faculty (potential advisor) cannot be
    added to the database unless a student is assigned to
    him/her.
• Update anomaly
  - If the advisor’s office location or phone were changed, many
    tuples would need to be changed.
• Deletion anomaly
  - If all students assigned to an advisor graduate, information
    about the advisor will disappear from the database.
Third Normal Form
• DEFINITION
  - A relation R is in third normal form (3NF) if and only if
    it is in 2NF and every nonkey attribute is non-transitively
    dependent on the primary key.
• Translation
  - A table is in Third Normal Form if every non-key
    attribute is determined by the key, and nothing else.
• Implication
  - How many total attributes must the relation have for a
    possible violation of the Third Normal Form (3NF) ?
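A transitive dependency shows up as an FD whose determinant is a nonkey attribute. The sketch below looks for such FDs. It assumes a single candidate key, and the FDs are an assumption based on the student and advisor example (they are not listed as FDs in the deck).

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Assumed key and FDs for the student-advisor relation.
KEY = ("StudentId",)
FDS = [
    (("StudentId",), ("StudentName", "Status", "TotalCredits", "Advisor")),
    (("Advisor",), ("AdvisorOffice", "AdvisorPhone")),
]

def transitive_dependencies(key, fds):
    """FDs from nonkey attributes (not a superkey) to other nonkey attributes."""
    all_attrs = set(key)
    for lhs, rhs in fds:
        all_attrs |= set(lhs) | set(rhs)
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == all_attrs
        targets = set(rhs) - set(key) - set(lhs)
        if not is_superkey and not (set(lhs) & set(key)) and targets:
            found.append((lhs, targets))
    return found

violations = transitive_dependencies(KEY, FDS)
print(violations)
```

Under these assumptions the only offender is Advisor determining the office and phone, which points to moving those two attributes into a separate advisor relation.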
3NF Example
• Chalk out the relations.
[Bubble charts with nodes: StudentId, StudentName, Status, TotalCredits, Advisor; and Advisor, AdvisorOffice, AdvisorPhone]
• How do you maintain student-advisor relation?
Boyce-Codd Normal Form (BCNF)
• Update anomalies occur in a 3NF
relation R if
  - R has multiple candidate keys,
  - Those candidate keys are composite, and
  - The candidate keys overlap.
Computer-Lab (SID, Account, Class, Hours)
• A relation R is in BCNF iff every
determinant is a candidate key.
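The BCNF condition can be tested by checking that the closure of every determinant covers all attributes. The sketch below applies this to Computer-Lab. The deck does not list the FDs for this relation, so the ones used here are a guess (one account per student, hours recorded per student per class).

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

ATTRS = {"SID", "Account", "Class", "Hours"}
# Assumed FDs: SID and Account identify each other; hours depend on who and which class.
FDS = [
    (("SID",), ("Account",)),
    (("Account",), ("SID",)),
    (("SID", "Class"), ("Hours",)),
    (("Account", "Class"), ("Hours",)),
]

def bcnf_violations(attrs, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attrs)
    ]

violations = bcnf_violations(ATTRS, FDS)
print(violations)
```

With these FDs the candidate keys are (SID, Class) and (Account, Class), which are composite and overlap on Class. SID and Account are determinants without being keys, so the relation can be in 3NF and still fail BCNF.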
The Normalization Process
1. Flatten the Table Completely (no composite
columns)
2. Find the Key and “all” FDs (well, as many as
you can possibly detect)
3. Find Partial Dependencies and decompose
relation using them (2NF)
4. Find Transitive dependencies and decompose
using them (3NF)
5. Remember: this is not a deterministic method.
It depends on the order in which FDs are
chosen, so the same Relation with the same set of FDs
can lead to different decompositions!
Lossless Decomposition
• A bad decomposition loses information
• In a good decomposition
  - The join of decomposed relations restores the original
    relation
  - Decomposed relations can be maintained
    independently
• Rissanen’s rule for non-loss decomposition:
Two projections R1 and R2 of a relation R are
independent iff:
  - Every FD in R can be logically deduced from those in
    R1 and R2, and
  - The common attributes of R1 and R2 form a
    candidate key for at least one of the pair.
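The first property of a good decomposition can be tested on an instance: project, join back, and compare with the original. The sketch below is only such an instance-level check, with illustrative function names and a subset of the student-advisor data from the 2NF anomalies slide.

```python
STUDENTS = [
    {"ID": 224, "Status": "Junior", "Advisor": "Young", "Office": "CBA221", "Credits": 105},
    {"ID": 351, "Status": "Soph", "Advisor": "Greene", "Office": "CBA215", "Credits": 77},
    {"ID": 421, "Status": "Junior", "Advisor": "Young", "Office": "CBA221", "Credits": 97},
]

def project(rows, attrs):
    """Projection with duplicate elimination."""
    distinct = {tuple((a, r[a]) for a in attrs) for r in rows}
    return [dict(t) for t in distinct]

def natural_join(r1, r2):
    """Combine rows that agree on all common attributes."""
    out = []
    for a in r1:
        for b in r2:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    canon = lambda rs: {frozenset(r.items()) for r in rs}
    return canon(joined) == canon(rows)

# Common attribute Advisor is a key of the second projection.
good = is_lossless(STUDENTS, ("ID", "Status", "Advisor", "Credits"), ("Advisor", "Office"))
# Common attribute Status is a key of neither projection: spurious tuples appear.
bad = is_lossless(STUDENTS, ("ID", "Status", "Advisor", "Office"), ("Status", "Credits"))
print(good, bad)
```

Splitting on Advisor restores the three original tuples, while splitting on Status pairs each Junior with both Junior credit totals and produces five.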
Higher Normal Forms
• Fourth Normal Form
  - Multivalued Dependencies (Fagin 1977)
• Fifth Normal Form
  - Join Dependencies (Fagin 1979)
• Other Dependencies
  - Inclusion Dependencies (Casanova 1981)
  - Template Dependencies (Sadri 1982)
  - Domain-Key Normal Form (Fagin 1981)
In-class Exercise – Normalize this: