Database Systems: Design, Implementation, and

4
Building the Database
E-R Diagrams and
Normalization of Database
Tables
MIS 304 Winter 2006
Review of Last Class
• A relational database is a set of linked tables.
• The links are accomplished by using keys:
– Primary keys identify a unique value in each row
(entity instance).
– Foreign keys use the primary key of another table
to connect the tables.
• In order to query two linked tables we need to do
a join.
– Joins are powerful but potentially costly in terms
of computer resources.
Class Objective
• Fill out our understanding of E-R diagrams.
• Look at how databases get “loaded.”
• Understand the concept of a “normal form.”
• Understand why these concepts are important.
• Be able to move a database or table from one
normal form to another.
The Building Blocks of an E-R Diagram
• Entity
• Attribute
• Relationship
• Identifier
Data Models: Degrees of Data
Abstraction
Figure 3.1
The Entity Relationship (E-R) Model
• Represents conceptual view
• Main Components
– Entities
• Corresponds to entire table, not row
• Represented by rectangle
– Attributes
– Relationships
Attributes
• Characteristics of entities
• Domain is set of possible values
• Primary keys underlined
Figure 3.6
The Entity Relationship (E-R) Model
– A derived attribute is not physically stored
within the database; instead, it is derived by
using an algorithm.
• Example: AGE can be derived from the date of
birth and the current date.
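The AGE example can be sketched in a few lines of Python. This is only an illustration: the function name derive_age and the dates used are assumptions, not part of the course material.

```python
from datetime import date

def derive_age(date_of_birth: date, today: date) -> int:
    """Return the age in whole years; computed on demand, never stored."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

# Hypothetical values, chosen only for illustration.
print(derive_age(date(1980, 6, 15), date(2006, 1, 10)))
```

Only the date of birth would be stored; AGE is recomputed whenever it is needed, so it can never go stale.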
Basic E-R Model Entity Presentation
Multivalued Attribute in Relational DBMS
• The relational DBMS cannot implement multivalued
attributes.
• Possible courses of action for the designer
– Within the original entity, create several new attributes,
one for each of the original multivalued attribute’s
components (Figure 4.9).
– Create a new entity composed of the original
multivalued attribute’s components
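The second option (a new entity) might look like the sketch below, using Python's built-in sqlite3 module. The CAR and CAR_COLOR tables, their columns, and the sample values are hypothetical and were chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car (
        car_vin   TEXT PRIMARY KEY,
        car_model TEXT
    );
    -- New entity: one row per component of the multivalued attribute.
    CREATE TABLE car_color (
        car_vin TEXT REFERENCES car(car_vin),
        section TEXT,
        color   TEXT,
        PRIMARY KEY (car_vin, section)
    );
""")
conn.execute("INSERT INTO car VALUES ('VIN001', 'Sedan')")
conn.executemany(
    "INSERT INTO car_color VALUES (?, ?, ?)",
    [("VIN001", "top", "white"), ("VIN001", "body", "blue"), ("VIN001", "trim", "gold")],
)

# Each color is now a separate row, so adding a fourth section needs no schema change.
colors = conn.execute(
    "SELECT section, color FROM car_color WHERE car_vin = ? ORDER BY section",
    ("VIN001",),
).fetchall()
print(colors)
```

The first option (several new attributes in the original entity) would instead add columns such as a top color, body color, and trim color to CAR, which is simpler but fixes the number of components in advance.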
A New Entity Set Composed
of Multivalued
Attribute’s Components
Splitting the Multivalued Attributes
into New Attributes
Connectivity and Cardinality in an ERD
Figure 3.12
Weak Entity
• Existence-dependent on another entity
• Has primary key that is partially or totally
derived from parent entity
Relationship Participation
• The participation is optional if one entity
occurrence does not require a corresponding
entity occurrence in a particular relationship.
• An optional entity is shown by a small circle on
the side of the optional entity.
Relationship Degree
• Unary
– Single entity, “Recursive”
– Exists between occurrences of same entity set
• Binary
– Two entities associated
• Ternary
– Three entities associated
The Chen Representation of the Invoicing Problem
The Crow’s Foot Representation
of the Invoicing Problem
Database Tables and Normalization
• Normalization is a process for assigning
attributes to entities. It reduces data
redundancies and helps eliminate the data
anomalies.
• Normalization works with data modeling to make
the database more useful.
• The highest level of normalization is not always
the most desirable.
Normalization Stages
• Normalization works through a series of stages
called normal forms:
– First normal form (1NF)
– Second normal form (2NF)
– Third normal form (3NF)
– Fourth normal form (4NF)
– Fifth normal form (5NF)
The Need for Normalization
• Case of a Construction Company
– Building project -- Project number, Name, Employees
assigned to the project.
– Employee -- Employee number, Name, Job classification
– The company charges its clients by billing the hours spent
on each project. The hourly billing rate is dependent on the
employee’s position.
– Periodically, a report is generated.
– The table whose contents correspond to the reporting
requirements is shown in Table 5.1.
Artifacts
• Companies have ingrained processes and
procedures.
• These are often visible by looking at the paper
trail.
• Data reports and forms are a big part of that trail.
• You can look at the output artifacts as well as the
input ones.
A Table Whose Structure
Matches the Report Format
Database Tables and Normalization
• Problems with Figure 5.1
– The project number is intended to be a primary key, but
it contains nulls.
– The table displays data redundancies.
– The table entries invite data inconsistencies.
– The data redundancies yield the following anomalies:
• Update anomalies.
• Addition anomalies.
• Deletion anomalies.
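A small sketch of how redundancy leads to update and deletion anomalies. The rows below are hypothetical and only loosely modeled on the construction company report; the column names are assumptions.

```python
# Hypothetical flat table: the billing rate is repeated on every assignment row.
flat = [
    {"PROJ_NUM": 1, "EMP_NUM": 101, "JOB_CLASS": "Engineer", "CHG_HOUR": 80.0},
    {"PROJ_NUM": 2, "EMP_NUM": 101, "JOB_CLASS": "Engineer", "CHG_HOUR": 80.0},
]

# Update anomaly: the rate is changed in one row but not in the other.
flat[0]["CHG_HOUR"] = 85.0
rates = {row["CHG_HOUR"] for row in flat if row["JOB_CLASS"] == "Engineer"}
print(rates)  # the same job class now shows two different rates

# Deletion anomaly: removing employee 101 also removes the only record
# of what an Engineer is billed at.
remaining = [row for row in flat if row["EMP_NUM"] != 101]
known_job_classes = {row["JOB_CLASS"] for row in remaining}
print(known_job_classes)
```

An addition anomaly is the mirror image: in this layout a new job class and its rate cannot be recorded until some employee is assigned to a project.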
Conversion to First Normal Form
• A relational table must not contain repeating groups.
– Repeating groups can be eliminated by adding the
appropriate entry in at least the primary key column(s).
27
Data Organization: First Normal Form
What do you notice?
• How much redundancy is there in the table?
• What are the keys?
Another Example
Region   Salesperson   Month   Total
West     Jones         Jan     1550
                       Feb     1290
         Smith         Jan     1990
                       Feb     2010
East     Brinkley      Jan     2202
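Eliminating the repeating groups in this example amounts to filling every blank cell with the value it repeats. The sketch below assumes one reading of the slide's layout (Jones and Smith in West with Jan and Feb totals, Brinkley in East with a Jan total); the function name fill_down is hypothetical.

```python
# Rows as they appear in a report layout: None marks a cell left blank
# because the value repeats from the row above (a repeating group).
report_rows = [
    ("West", "Jones", "Jan", 1550),
    (None, None, "Feb", 1290),
    (None, "Smith", "Jan", 1990),
    (None, None, "Feb", 2010),
    ("East", "Brinkley", "Jan", 2202),
]

def fill_down(rows):
    """Copy the last seen value into each blank cell so every row is complete."""
    filled, previous = [], None
    for row in rows:
        if previous is not None:
            row = tuple(cell if cell is not None else prev
                        for cell, prev in zip(row, previous))
        filled.append(row)
        previous = row
    return filled

flat_rows = fill_down(report_rows)
for row in flat_rows:
    print(row)
```

After filling, every row carries its own Region and Salesperson, and in this sample the pair (Salesperson, Month) identifies each row.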
Tools to help in Normalization
• Data Models
• Dependency diagrams, a new tool.
Data Models
• Last time we saw how to use a data model to
evaluate the Cost/Benefit of adding and removing
relations between tables.
Dependency Diagram (1NF)
• Dependency Diagram
– The primary key components are bold, underlined, and shaded in a
different color.
– The arrows above entities indicate all desirable dependencies, i.e.,
dependencies that are based on PK.
– The arrows below the dependency diagram indicate less desirable
dependencies -- partial dependencies and transitive dependencies.
Figure 4.4
Forms of Dependency
• Dependency
• Partial Dependency
• Transitive Dependency
Database Tables and Normalization
• 1NF Definition
– The term first normal form (1NF) describes the
tabular format in which:
• All the key attributes are defined.
• There are no repeating groups in the table.
• All attributes are dependent on the primary key.
Conversion to Second Normal Form
• Starting with the 1NF format, the database can be
converted into the 2NF format by using the
Dependency Diagram and:
– Writing each key component on a separate line, and
then writing the original key on the last line and
– Writing the dependent attributes after each new key.
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS,
CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
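The split into PROJECT, EMPLOYEE, and ASSIGN can be sketched as three projections of the 1NF table. The sample rows and the helper name project are hypothetical; only the column names come from the slide.

```python
COLUMNS = ("PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
           "JOB_CLASS", "CHG_HOUR", "HOURS")

# Hypothetical 1NF rows; the key is (PROJ_NUM, EMP_NUM).
rows_1nf = [
    (1, "Alpha", 101, "Ann", "Engineer", 80.0, 10.0),
    (1, "Alpha", 102, "Bob", "Analyst", 60.0, 5.0),
    (2, "Beta", 101, "Ann", "Engineer", 80.0, 7.5),
]

def project(rows, columns, keep):
    """Relational projection: keep the named columns and drop duplicate rows."""
    idx = [columns.index(name) for name in keep]
    seen = []
    for row in rows:
        picked = tuple(row[i] for i in idx)
        if picked not in seen:
            seen.append(picked)
    return seen

# One table per key component, plus one for the original composite key.
project_table = project(rows_1nf, COLUMNS, ("PROJ_NUM", "PROJ_NAME"))
employee_table = project(rows_1nf, COLUMNS, ("EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"))
assign_table = project(rows_1nf, COLUMNS, ("PROJ_NUM", "EMP_NUM", "HOURS"))

print(project_table)
print(employee_table)
print(assign_table)
```

Each project name and each employee's details now appear once, while ASSIGN keeps only the attribute (HOURS) that depends on the whole key.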
Second Normal Form (2NF) Conversion Results
2NF Definition
• A table is in 2NF if:
– It is in 1NF and
– It includes no partial dependencies; that is, no
attribute is dependent on only a portion of the
primary key.
(It is still possible for a table in 2NF to exhibit transitive
dependency; that is, one or more attributes may be
functionally dependent on nonkey attributes.)
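One way to look for partial dependencies in sample data is to test whether a candidate dependency is ever contradicted. The sketch below assumes hypothetical rows and a hypothetical helper fd_holds. Note that data can only refute a dependency; real dependencies come from the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it afterwards.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical 1NF rows with the composite key (PROJ_NUM, EMP_NUM).
rows = [
    {"PROJ_NUM": 1, "EMP_NUM": 101, "PROJ_NAME": "Alpha", "EMP_NAME": "Ann", "HOURS": 10.0},
    {"PROJ_NUM": 1, "EMP_NUM": 102, "PROJ_NAME": "Alpha", "EMP_NAME": "Bob", "HOURS": 5.0},
    {"PROJ_NUM": 2, "EMP_NUM": 101, "PROJ_NAME": "Beta", "EMP_NAME": "Ann", "HOURS": 7.5},
]

# PROJ_NAME depends on only part of the key: a partial dependency.
print(fd_holds(rows, ["PROJ_NUM"], ["PROJ_NAME"]))
# HOURS needs the whole key; neither half determines it alone.
print(fd_holds(rows, ["PROJ_NUM"], ["HOURS"]))
print(fd_holds(rows, ["PROJ_NUM", "EMP_NUM"], ["HOURS"]))
```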
Effect on Queries
• How does this differ from the 1NF version?
• How does this affect the SQL queries you are
likely to run on the database?
– Number of joins
– Number of subqueries
Conversion to Third Normal Form
• Create a separate table with attributes in a transitive
functional dependence relationship.
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
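The four 3NF tables can be tried out with Python's built-in sqlite3 module. The table and column names follow the slide; the data types, sample rows, and the billing query are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (proj_num INTEGER PRIMARY KEY, proj_name TEXT);
    CREATE TABLE job (job_class TEXT PRIMARY KEY, chg_hour REAL);
    CREATE TABLE employee (
        emp_num   INTEGER PRIMARY KEY,
        emp_name  TEXT,
        job_class TEXT REFERENCES job(job_class)
    );
    CREATE TABLE assign (
        proj_num INTEGER REFERENCES project(proj_num),
        emp_num  INTEGER REFERENCES employee(emp_num),
        hours    REAL,
        PRIMARY KEY (proj_num, emp_num)
    );
    -- Hypothetical sample data.
    INSERT INTO project VALUES (1, 'Alpha');
    INSERT INTO job VALUES ('Engineer', 80.0), ('Analyst', 60.0);
    INSERT INTO employee VALUES (101, 'Ann', 'Engineer'), (102, 'Bob', 'Analyst');
    INSERT INTO assign VALUES (1, 101, 10.0), (1, 102, 5.0);
""")

# Rebuilding the billing report now takes three joins across four tables.
charges = conn.execute("""
    SELECT p.proj_name, e.emp_name, a.hours * j.chg_hour AS charge
    FROM assign a
    JOIN project  p ON p.proj_num  = a.proj_num
    JOIN employee e ON e.emp_num   = a.emp_num
    JOIN job      j ON j.job_class = e.job_class
    ORDER BY e.emp_num
""").fetchall()
print(charges)
```

The hourly rate is stored once per job class, so a rate change is a single-row update in JOB; the price is the extra join needed to bring the rate back into the report.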
3NF Definition
• A table is in 3NF if:
– It is in 2NF and
– It contains no transitive dependencies.
The Completed Database
The Query Effects
• Now how many “Joins” do we have?
• What is stored where?
Additional DB Enhancements
Figure 4.6
Boyce-Codd Normal Form (BCNF)
• A table is in Boyce-Codd normal form (BCNF) if every
determinant in the table is a candidate key.
(A determinant is any attribute whose value determines other
values within a row.)
• If a table contains only one candidate key, the 3NF and the
BCNF are equivalent.
• BCNF is a special case of 3NF.
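A BCNF check can be sketched with an attribute-closure computation. The sketch tests each determinant for being a superkey, which is the usual formal statement of the rule; the relation R(A, B, C, D), its dependencies, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the table."""
    return closure(attrs, fds) == set(all_attrs)

def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not is_superkey(lhs, all_attrs, fds)
    ]

# Hypothetical table R(A, B, C, D) with A + B -> C, D and C -> B.
attrs = ("A", "B", "C", "D")
fds = [(("A", "B"), ("C", "D")), (("C",), ("B",))]
violations = bcnf_violations(attrs, fds)
print(violations)
```

In this hypothetical table C determines B without being a key, so the table is not in BCNF. It is still in 3NF because B is part of the candidate key (A, B), so C -> B is not a transitive dependency of a nonkey attribute.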
3NF Table Not in BCNF
Figure 4.7
Sample Data for a BCNF Conversion
Table 5.2
Decomposition into BCNF
Figure 4.9
Normalization and Database Design
• Database Design and Normalization Example:
(Construction Company)
– Summary of Operations:
• The company manages many projects.
• Each project requires the services of many employees.
• An employee may be assigned to several different projects.
• Some employees are not assigned to a project and perform
duties not specifically related to a project. Some employees are
part of a labor pool, to be shared by all project teams.
• Each employee has a (single) primary job classification. This
job classification determines the hourly billing rate.
• Many employees can have the same job classification.
Normalization and Database Design
• Normalization should be part of the design
process
• E-R Diagram provides macro view
• Normalization provides micro view of entities
– Focuses on characteristics of specific entities
– May yield additional entities
• Difficult to separate normalization from E-R
diagramming
• Business rules must be determined
Initial ERD for Contracting Company
Figure 4.10
Normalization and Database Design
• Three Entities After Transitive Dependency Removed
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME,
EMP_INITIAL, JOB_CODE)
JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
Modified ERD for
Contracting Company
Figure 4.11
Final ERD for
Contracting Company
Figure 4.12
Normalization and Database Design
• Attribute ASSIGN_HOURS is assigned to
the composite entity ASSIGN.
• “Manages” relationship is created
between EMPLOYEE and PROJECT.
PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME,
EMP_INITIAL, EMP_HIREDATE, JOB_CODE)
JOB (JOB_CODE, JOB_DESCRIPTION,
JOB_CHG_HOUR)
ASSIGN (ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM,
EMP_NUM, ASSIGN_HOURS)
The Relational Schema For
The Contracting Company
Higher-Level Normal Forms
• 4NF Definition
– A table is in 4NF if it is in 3NF and has no multiple sets
of multivalued dependencies.
Conversion to 4NF
Figure 4.14: Multivalued Dependencies
Figure 4.15: Set of Tables in 4NF
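Why two independent multivalued facts belong in separate tables can be sketched as follows. The employee number, organizations, and assignment numbers are hypothetical, as are the variable names.

```python
from itertools import product

# Hypothetical facts: the organizations an employee volunteers for and the
# projects the employee is assigned to. The two sets are independent.
emp_org = {(10123, "Red Cross"), (10123, "United Way")}
emp_assign = {(10123, 1), (10123, 3), (10123, 4)}

# A single table holding both facts needs every org/assignment combination.
combined = {
    (emp, org, assign)
    for (emp, org), (emp2, assign) in product(emp_org, emp_assign)
    if emp == emp2
}
print(len(combined))  # 2 organizations x 3 assignments

# Projecting the combined table gives back the two smaller tables,
# so nothing is lost by storing them separately.
orgs_again = {(e, o) for e, o, _ in combined}
assigns_again = {(e, a) for e, _, a in combined}
print(orgs_again == emp_org, assigns_again == emp_assign)
```

The two small tables hold five rows between them instead of six, and the gap widens quickly as either list grows.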
Multivalued Attributes
• Can you think of a kind of database that has lots
of Multivalued attributes?
Fifth normal form (5NF)
• A table can be reconstructed from other tables.
• There exists some rule that enables a relation to be inferred.
• Base case
– Consultants provide skills to one or more firms and firms can use many
consultants; a consultant has many skills and a skill can be used by
many firms; and a firm can have a need for many skills and the same
skill can be required by many firms.
[Diagram: CONSULTANT (*consultid), FIRM (*firmid), and SKILL (*skilldesc), all linked through ASSIGNMENT]
Fifth normal form (5NF)
• The rule
– If a consultant has a certain skill (e.g., database) and
has a contract with the firm that requires that skill (e.g.,
IBM), then the consultant advises the firm on that skill
(i.e., he advises IBM on database)
[Diagram: CONSULTANT (*consultid), FIRM (*firmid), and SKILL (*skilldesc), linked by CONTRACT, ADVISE, and REQUIRE]
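The rule can be sketched as a three-way join: ADVISE is not stored but inferred from three smaller relations. The names contract, require, and advise follow the diagram labels; has_skill and all sample values are assumptions.

```python
# Hypothetical binary relations.
has_skill = {("c1", "database"), ("c1", "networking"), ("c2", "database")}  # consultant, skill
contract = {("c1", "IBM"), ("c2", "Acme")}                                  # consultant, firm
require = {("IBM", "database"), ("Acme", "networking")}                     # firm, skill

# The rule: a consultant advises a firm on a skill when the consultant has
# the skill, has a contract with the firm, and the firm requires the skill.
advise = {
    (consultant, firm, skill)
    for consultant, skill in has_skill
    for contracted, firm in contract
    if contracted == consultant and (firm, skill) in require
}
print(advise)
```

Here c2 has a contract with Acme, but Acme requires networking and c2 offers only database, so no ADVISE row is inferred for that pair.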
Why is this useful?
• Multi Locations
• Disaster Recovery
• Query optimization
Denormalization
• Normalization is one of many database design
goals
• Normalized table requirements
– Additional processing
– Loss of system speed
• Normalization purity is difficult to sustain due to
conflict in:
– Design efficiency
– Information requirements
– Processing
Unnormalized Table Defects
• Data updates less efficient
• Indexing more cumbersome
• No simple strategies for creating views
The Initial 1NF Structure
Identifying the Possible PK Attributes
Table Structures Based On The Selected PKs
Closing the Loop
• Data Modeling, Table Design, and Normalization fit into an
overall Systems Development Life Cycle (SDLC) that
describes the history of any project.
• The Database component(s) of the system exhibits its
own independent Data Base Life Cycle (DBLC)
• The Data Base Administrator’s job is to shepherd the
database through the:
Analysis > Design > Implementation > Support
path that characterizes the SDLC.
• At any two points in history the same data may be
organized very differently.
Systems Development Life Cycle
Database Lifecycle (DBLC)
Figure 6.3
How it fits together
Execution
• None of these paths is prescribed in this
methodology.
• Time is often the enemy, and multiple trips
through the system slow the project down.
• Remember Brooks's “The Mythical Man-Month”:
nine women, working together in any known
way, cannot produce a baby in one month.