Relational Database Design and Normalization

Chapter 2: Relational Database Design and Normalization
August 2016
Conventional Files Versus the Database
• Database Design in Perspective
– To fully exploit the advantages of database technology,
a database must be carefully designed.
– The end product is called a database schema, a
technical blueprint of the database.
– Database design translates the data models that were
developed for the system users during the definition
phase into data structures supported by the chosen
database technology.
– Subsequent to database design, system builders will
construct those data structures using the language and
tools of the chosen database technology.
• Databases
– Database Architecture:
• A systems analyst, or database analyst, designs the
structure of the data in terms of record types, fields
contained in those record types, and relationships that
exist between record types.
• These structures are defined to the database
management system using its data definition language.
– Data definition language (or DDL) is used by the DBMS to
physically establish those record types, fields, and structural
relationships. Additionally, the DDL defines views of the
database. Views restrict the portion of a database that may
be used or accessed by different users and programs. DDLs
record the definitions in a permanent data repository.
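The role of the DDL described above can be sketched with SQL run through Python's built-in sqlite3 module. This is a minimal illustration under assumptions, not the slides' own example: SQLite stands in for "the chosen DBMS", and the table, view, and column names are hypothetical (the sample customer comes from the Customers Table shown later in the deck).

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# DDL: define a record type (table) and its fields (columns).
conn.execute("""
    CREATE TABLE customers (
        customer_number  INTEGER PRIMARY KEY,
        customer_name    TEXT NOT NULL,
        customer_balance REAL NOT NULL DEFAULT 0
    )
""")

# DDL: define a view that exposes only part of the table.
conn.execute("""
    CREATE VIEW customer_directory AS
    SELECT customer_number, customer_name FROM customers
""")

conn.execute("INSERT INTO customers VALUES (10112, 'Luck Star', 1455.77)")

# Programs that read the view never see the balance column.
cursor = conn.execute("SELECT * FROM customer_directory")
view_columns = [column[0] for column in cursor.description]

# The DBMS keeps the definitions themselves in its catalog (the metadata).
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(view_columns)
print(catalog)
```

The catalog query is SQLite-specific; other DBMSs expose their metadata repository through different system tables.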
[Figure: Database architecture. Programmers, systems analysts and/or database designers, and end users reach the Database Management System (DBMS) through the Data Manipulation Language (DML), the Data Definition Language (DDL), an optional host-based transaction processing monitor or the internal TP monitor (optional), and a proprietary data manipulation language and/or report writers. The DBMS manages the stored data and the metadata.]
Database Concepts
• Databases
– Database Architecture:
• Some data dictionaries include formal, elaborate
software that helps database specialists track metadata
(the data about the data), such as record and field
definitions, synonyms, data relationships, validation
rules, help messages, and so forth.
• The database management system also provides a data
manipulation language to access and use the database in
applications.
– A data manipulation language (or DML) is used to create,
read, update, and delete records in the database, and to
navigate between different records and types of records. The
DBMS and DML hide the details concerning how records are
organized and allocated to the disk.
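The four DML operations named above (create, read, update, delete) can be sketched as follows. This is an illustrative sketch assuming SQL as the DML and SQLite as the DBMS; the table and column names are hypothetical, and the product values are borrowed from the Products Table shown later in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_number    TEXT PRIMARY KEY,
        description       TEXT,
        quantity_in_stock INTEGER
    )
""")

# Create: add new records.
conn.execute("INSERT INTO products VALUES (?, ?, ?)", ("77B12", "Widget", 8000))
conn.execute("INSERT INTO products VALUES (?, ?, ?)", ("77F02", "Gadget", 2))

# Read: fetch a field by key; where the record sits on disk is hidden.
in_stock = conn.execute(
    "SELECT quantity_in_stock FROM products WHERE product_number = ?",
    ("77B12",),
).fetchone()[0]

# Update: change a field in an existing record.
conn.execute(
    "UPDATE products SET quantity_in_stock = quantity_in_stock - 500 "
    "WHERE product_number = ?",
    ("77B12",),
)

# Delete: remove a record.
conn.execute("DELETE FROM products WHERE product_number = ?", ("77F02",))

remaining = conn.execute(
    "SELECT product_number, quantity_in_stock FROM products"
).fetchall()
print(in_stock, remaining)
```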
Database Concepts
• Databases
– Database Architecture:
• Many DBMSs don’t require the use of a DDL to construct
the database, or a DML to access the database.
– They provide their own tools and commands to perform
those tasks. This is especially true of PC-based DBMSs.
• Many DBMSs also include proprietary report writing and
inquiry tools to allow users to access and format data
without directly using the DML.
• Some DBMSs include a transaction processing monitor
(or TP monitor) that manages on-line accesses to the
database, and ensures that transactions that impact
multiple tables are fully processed as a single unit.
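The "single unit" behavior described for transactions that touch multiple tables can be sketched with an ordinary database transaction. This is not a TP monitor; it only assumes SQLite's transaction support to show the all-or-nothing idea. The function name `place_order_line`, the CHECK rule, and order number A640 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_number    TEXT PRIMARY KEY,
        quantity_in_stock INTEGER NOT NULL CHECK (quantity_in_stock >= 0)
    );
    CREATE TABLE ordered_products (
        order_number     TEXT NOT NULL,
        product_number   TEXT NOT NULL,
        quantity_ordered INTEGER NOT NULL
    );
    INSERT INTO products VALUES ('77F02', 2);
""")


def place_order_line(order_number, product_number, quantity):
    """Record an order line and reduce stock as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO ordered_products VALUES (?, ?, ?)",
                (order_number, product_number, quantity),
            )
            conn.execute(
                "UPDATE products SET quantity_in_stock = quantity_in_stock - ? "
                "WHERE product_number = ?",
                (quantity, product_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = place_order_line("A633", "77F02", 1)  # stock goes from 2 to 1
refused = place_order_line("A640", "77F02", 5)   # stock would go negative

line_count = conn.execute("SELECT COUNT(*) FROM ordered_products").fetchone()[0]
stock = conn.execute("SELECT quantity_in_stock FROM products").fetchone()[0]
print(accepted, refused, line_count, stock)
```

When the second call fails on the stock update, the order line inserted a moment earlier is rolled back with it, so neither table records half of the transaction.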
Database Concepts
• Databases
– Relational Database Management Systems:
• There are several types of database management
systems and they can be classified according to the way
they structure records.
• Early database management systems organized records
in hierarchies or networks implemented with indexes
and linked lists.
• Relational databases implement data in a series of
tables that are ‘related’ to one another via foreign keys.
– Files are seen as simple two-dimensional tables, also known
as relations.
– The rows are records.
– The columns correspond to fields.
[Figure: Entity relationship diagram. Customer (places) Order (sells) Ordered Product (sold on) Product.]
Customers Table
Customer Number   Customer Name       Customer Balance   …
10112             Luck Star                    1455.77
10113             Pemrose                        12.14
10114             Hartman                         0.00
10117             K-Jack Industries             -20.00

Orders Table
Order Number   Customer Number (foreign key)   …
A633           10112
A634           10114
A635           10112

Ordered Products Table
Order Number (foreign key)   Product Number (foreign key)   Quantity Ordered   …
A633                         77F02                                         1
A633                         77B12                                       500
A634                         77B13                                       100
A634                         77F01                                         5
A635                         77B12                                       300
A635                         77B15                                        15

Products Table
Product Number   Product Description   Quantity in Stock   …
77B12            Widget                             8000
77B13            Widget                                0
77B15            Widget                               52
77F01            Gadget                               20
77F02            Gadget                                2
Database Concepts for the Systems Analyst
• Databases
– Relational Database Management Systems:
• In most relational databases, both the DDL and the DML
are provided by SQL (which stands for Structured Query
Language).
– SQL supports not only queries, but complete database
creation and maintenance.
– A fundamental characteristic of relational SQL is that
commands return ‘a set’ of records, not necessarily just a
single record (as in non-relational database and file
technology).
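The set-oriented behavior of SQL can be seen with the sample Customers and Orders data from the earlier slide. The sketch below assumes SQLite via Python's sqlite3 module; the lowercase table and column names are hypothetical renderings of the slide's headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT
    );
    CREATE TABLE orders (
        order_number    TEXT PRIMARY KEY,
        customer_number INTEGER REFERENCES customers (customer_number)
    );
    INSERT INTO customers VALUES
        (10112, 'Luck Star'), (10113, 'Pemrose'),
        (10114, 'Hartman'), (10117, 'K-Jack Industries');
    INSERT INTO orders VALUES
        ('A633', 10112), ('A634', 10114), ('A635', 10112);
""")

# One command follows the foreign key and returns a set of records.
rows = conn.execute("""
    SELECT c.customer_name, o.order_number
    FROM customers AS c
    JOIN orders AS o ON o.customer_number = c.customer_number
    ORDER BY o.order_number
""").fetchall()

for customer_name, order_number in rows:
    print(order_number, customer_name)
```

The query returns every matching row at once; customers with no orders (Pemrose, K-Jack Industries) simply do not appear in the result set.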
Database Concepts for the Systems Analyst
• Databases
– Relational Database Management Systems:
• High-end relational databases also extend the SQL
language to support triggers and stored procedures.
– Triggers are programs embedded within a table that are
automatically invoked by updates to that table (and may in
turn update other tables).
– Stored procedures are programs embedded within a table
that can be called from an application program.
• Both triggers and stored procedures are reusable
because they are stored with the tables themselves.
– This eliminates the need for application programmers to
create the equivalent logic within each application that uses
the tables.
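A trigger of the kind described above might look like the following sketch. It assumes SQLite, which supports triggers but not stored procedures, so only the trigger half is shown; the trigger name `reduce_stock` and the table layout are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_number    TEXT PRIMARY KEY,
        quantity_in_stock INTEGER NOT NULL
    );
    CREATE TABLE ordered_products (
        order_number     TEXT,
        product_number   TEXT,
        quantity_ordered INTEGER
    );

    -- Stored with the tables; fires after every insert into ordered_products.
    CREATE TRIGGER reduce_stock AFTER INSERT ON ordered_products
    BEGIN
        UPDATE products
        SET quantity_in_stock = quantity_in_stock - NEW.quantity_ordered
        WHERE product_number = NEW.product_number;
    END;

    INSERT INTO products VALUES ('77B12', 8000);
""")

# The application only inserts the order line; the trigger adjusts stock.
conn.execute("INSERT INTO ordered_products VALUES ('A633', '77B12', 500)")

stock = conn.execute(
    "SELECT quantity_in_stock FROM products WHERE product_number = '77B12'"
).fetchone()[0]
print(stock)
```

Because the stock adjustment lives in the database, every application that inserts order lines gets the same behavior without repeating the logic.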
Data Analysis for Database Design
• What is a Good Data Model?
– A good data model is simple.
• As a general rule, the data attributes that describe an
entity should describe only that entity.
– A good data model is essentially non-redundant.
• This means that each data attribute, other than foreign
keys, describes at most one entity.
– A good data model should be flexible and
adaptable to future needs.
• We should make the data models as application-independent
as possible to encourage database
structures that can be extended or modified without
impact to current programs.
Data Analysis for Database Design
• Data Analysis
– Data analysis is a process that prepares a data
model for implementation as a simple, non-redundant,
flexible, and adaptable database.
The specific technique is called normalization.
– Normalization is a technique that organizes data
attributes such that they are grouped together to form
stable, flexible, and adaptive entities.
Data Analysis for Database Design
• Data Analysis
– Normalization is a three-step technique that places the
data model into first normal form, second normal form,
and third normal form.
• An entity is in first normal form (1NF) if there are no
attributes that can have more than one value for a single
instance of the entity.
• An entity is in second normal form (2NF) if it is already in 1NF,
and if the values of all non-primary key attributes are
dependent on the full primary key – not just part of it.
• An entity is in third normal form (3NF) if it is already in 2NF,
and if the values of its non-primary key attributes are not
dependent on any other non-primary key attributes.
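The 1NF rule above (no attribute may hold more than one value for a single entity instance) can be sketched in a few lines. This is a minimal illustration that assumes an entity instance is held as a Python dictionary; the helper name `split_repeating_group` and every sample value are hypothetical.

```python
def split_repeating_group(record, key, group):
    """Move a multi-valued attribute group out of a record (toward 1NF).

    Returns the parent row without the group, plus one child row per
    repeated value, each carrying the parent's key as a foreign key.
    """
    parent = {name: value for name, value in record.items() if name != group}
    children = [{key: record[key], **item} for item in record[group]]
    return parent, children


# Hypothetical unnormalized CLUB instance: its agreement attributes repeat.
club = {
    "club_name": "Audio Club",
    "club_charter_date": "2016-08-01",
    "agreements": [
        {"agreement_number": 1, "obligation_period": 12},
        {"agreement_number": 2, "obligation_period": 24},
    ],
}

club_row, agreement_rows = split_repeating_group(club, "club_name", "agreements")
print(club_row)
print(agreement_rows)
```

Each child row is identified by the parent key plus its own number, which is why the resulting entity ends up with a concatenated key.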
Data Analysis for Database Design
• Normalization Example
– First Normal Form:
• The first step in data analysis is to place each entity
into 1NF.
[Figure: Unnormalized logical data model. Relationship labels shown in the diagram: sold, placed, enrolls in, sponsors, generates, features, is a. The entities and their attributes follow.]

PRODUCT
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK2)
--- Non-Key Data ---
Quantity-in-Stock
Product-Type
Suggested-Retail-Price
Club-Default-Unit-Price
Current-Special-Unit-Price
Current-Month-Units-Sold
Current-Year-Units-Sold
Total-Lifetime-Units-Sold

MEMBER ORDER
--- Key Data ---
Order-Number (PK)
--- Non-Key Data ---
Order-Creation-Date
Order-Automatic-Fill-Date
Member-Number (FK1)
Member-Name
Member-Address
Shipping-Address
Shipping-Instructions
Club-Name (FK2)
Promotion-Number (FK2)
0 { Ordered-Product-Description } n
0 { Ordered-Product-Title } n
1 { Quantity-Ordered } n
1 { Purchased-Unit-Price } n
1 { Extended-Price } n
Order-Sub-Total-Cost
Order-Sales-Tax
Ship-Via-Method
Shipping-Charge
Order-Status
Prepaid-Amount
Method-of-Payment

MEMBER
--- Key Data ---
Member-Number (PK1)
--- Non-Key Data ---
Member-Name
Member-Status
Member-Street-Address
Member-Daytime-Phone-Number
Date-of-Last-Order
Member-Balance-Due
Member-Bonus-Balance-Available
Member-Credit-Card-Information
1 { Club-Name } n
1 { Agreement-Number } n
1 { Taste-Code } n
1 { Media-Preference } n
1 { Date-Enrolled } n
1 { Expiration-Date } n
1 { Number-of-Credits-Required } n
1 { Number-of-Credits-Earned } n

CLUB
--- Key Data ---
Club-Name (PK)
--- Non-Key Data ---
Club-Description
Club-Charter-Date
1 { Agreement-Number } n
1 { Agreement-Active-Date } n
1 { Agreement-Expiration-Date } n
1 { Obligation-Period } n
1 { Required-Number-of-Credits } n
1 { Bonus-Credits-After-Obligation } n

MERCHANDISE
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK1)
--- Non-Key Data ---
Merchandise-Name
Merchandise-Description
Merchandise-Size
Merchandise-Color
Unit-of-Measure

TITLE
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK2)
--- Non-Key Data ---
Title-of-Work
Title-Cover
Catalog-Description
Copyright-Date
Entertainment-Category
Credit-Value

PROMOTION
--- Key Data ---
Club-Name (PK1)
Promotion-Number (PK1)
--- Non-Key Data ---
Product-Number (FK1)
Promotion-Release-Date
Promotion-Status
Promotion-Type
Automatic-Fill-Delay

AUDIO TITLE
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK1)
--- Non-Key Data ---
Artist
Audio-Category
Audio-Sub-Category
Number-of-Units-in-Package
Audio-Media-Code
Content-Advisory-Code

VIDEO TITLE
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK1)
--- Non-Key Data ---
Producer
Director
Video-Category
Video-Sub-Category
Closed-Captioned
Language
Running-Time
Video-Media-Type
Video-Encoding
Screen-Aspect
MPA-Rating-Code

GAME TITLE
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK1)
--- Non-Key Data ---
Manufacturer
Game-Category
Game-Sub-Category
Game-Platform
Game-Media-Type
Number-of-Players
Parent-Advisory-Code
[Figure: Placing MEMBER ORDER into 1NF. Relationship labels shown in the diagram: sells, sold as.]

MEMBER ORDER (unnormalized)
--- Key Data ---
Order-Number (PK)
--- Non-Key Data ---
Order-Creation-Date
Order-Automatic-Fill-Date
Member-Number (FK1)
Member-Name
Member-Address
Shipping-Address
Shipping-Instructions
Club-Name (FK2)
Promotion-Number (FK2)
0 { Ordered-Product-Description } n
0 { Ordered-Product-Title } n
1 { Quantity-Ordered } n
1 { Purchased-Unit-Price } n
1 { Extended-Price } n
Order-Sub-Total-Cost
Order-Sales-Tax
Ship-Via-Method
Shipping-Charge
Order-Status
Prepaid-Amount
Method-of-Payment

CORRECTION

MEMBER ORDER (1NF)
--- Key Data ---
Order-Number (PK)
--- Non-Key Data ---
Order-Creation-Date
Order-Automatic-Fill-Date
Member-Number (FK1)
Member-Name
Member-Address
Shipping-Address
Shipping-Instructions
Club-Name (FK2)
Order-Sub-Total-Cost
Order-Sales-Tax
Ship-Via-Method
Shipping-Charge
Order-Status
Prepaid-Amount

MEMBER ORDERED PRODUCT (1NF)
--- Key Data ---
Member-Number (PK1) (FK)
Product-Number (PK1) (FK)
--- Non-Key Data ---
Ordered-Product-Description
Ordered-Product-Title
Quantity-Ordered
Purchased-Unit-Price
Extended-Price

PRODUCT (1NF)
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK2)
--- Non-Key Data ---
Quantity-in-Stock
Product-Type
Suggested-Retail-Price
Club-Default-Unit-Price
Current-Special-Unit-Price
Current-Month-Units-Sold
Current-Year-Units-Sold
Total-Lifetime-Units-Sold
[Figure: Placing CLUB into 1NF. Relationship label shown in the diagram: establishes.]

CLUB (unnormalized)
--- Key Data ---
Club-Name (PK)
--- Non-Key Data ---
Club-Description
Club-Charter-Date
1 { Agreement-Number } n
1 { Agreement-Active-Date } n
1 { Agreement-Expiration-Date } n
1 { Obligation-Period } n
1 { Required-Number-of-Credits } n
1 { Bonus-Credits-After-Obligation } n

CORRECTION

CLUB (1NF)
--- Key Data ---
Club-Name (PK)
--- Non-Key Data ---
Club-Description
Club-Charter-Date

AGREEMENT (1NF)
--- Key Data ---
Club-Name (PK1) (FK)
Agreement-Number (PK1)
--- Non-Key Data ---
Agreement-Active-Date
Agreement-Expiration-Date
Obligation-Period
Required-Number-of-Credits
Bonus-Credits-After-Obligation
[Figure: Placing MEMBER into 1NF. Relationship labels shown in the diagram: enrolls in, binds, sponsors, establishes.]

MEMBER (unnormalized)
--- Key Data ---
Member-Number (PK1)
--- Non-Key Data ---
Member-Name
Member-Status
Member-Address
Member-Daytime-Phone-Number
Date-of-Last-Order
Member-Balance-Due
Member-Bonus-Balance-Available
Member-Credit-Card-Information
1 { Club-Name } n
1 { Agreement-Number } n
1 { Taste-Code } n
1 { Media-Preference } n
1 { Date-Enrolled } n
1 { Expiration-Date } n
1 { Number-of-Credits-Required } n
1 { Number-of-Credits-Earned } n

CORRECTION

MEMBER (1NF)
--- Key Data ---
Member-Number (PK1)
--- Non-Key Data ---
Member-Name
Member-Status
Member-Street-Address
Member-Daytime-Phone-Number
Date-of-Last-Order
Member-Balance-Due
Member-Bonus-Balance-Available
Member-Credit-Card-Information

CLUB MEMBERSHIP (1NF)
--- Key Data ---
Member-Number (PK1) (FK)
Club-Name (PK1) (FK)
Agreement-Number (PK1) (FK)
--- Non-Key Data ---
Taste-Code
Media-Preference
Date-Enrolled
Expiration-Date
Number-of-Credits-Required
Number-of-Credits-Earned

AGREEMENT (1NF)
--- Key Data ---
Club-Name (PK1) (FK)
Agreement-Number (PK1)
--- Non-Key Data ---
Agreement-Active-Date
Agreement-Expiration-Date
Obligation-Period
Required-Number-of-Credits
Bonus-Credits-After-Obligation

CLUB (1NF)
--- Key Data ---
Club-Name (PK)
--- Non-Key Data ---
Club-Description
Club-Charter-Date
Data Analysis for Database Design
• Normalization Example
– Second Normal Form:
• The next step of data analysis is to place the entities into
2NF.
– It is assumed that you have already placed all entities into
1NF.
– 2NF looks for an anomaly called a partial dependency,
meaning one or more attributes whose values are determined
by only part of the primary key.
– Entities that have a single attribute primary key are already
in 2NF.
– Only those entities that have a concatenated key need to be
checked.
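Checking a concatenated key for partial dependencies can be sketched against sample data. The helpers below are hypothetical and only test whether the sample rows are consistent with a dependency; a real dependency is a business rule, so data can refute one but never prove it. The rows reuse the Ordered Products and Products tables from the earlier slide, with the product description deliberately copied into each order line.

```python
from collections import defaultdict


def determines(rows, lhs, rhs):
    """True if equal values of lhs always go with equal values of rhs."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[a] for a in lhs)].add(tuple(row[a] for a in rhs))
    return all(len(values) == 1 for values in seen.values())


def partial_dependencies(rows, key, non_key):
    """List (key_part, attribute) pairs where part of the key suffices."""
    return [
        (part, attribute)
        for attribute in non_key
        for part in key
        if determines(rows, [part], [attribute])
    ]


columns = ("order_number", "product_number", "description", "quantity_ordered")
data = [
    ("A633", "77F02", "Gadget", 1),
    ("A633", "77B12", "Widget", 500),
    ("A634", "77B13", "Widget", 100),
    ("A634", "77F01", "Gadget", 5),
    ("A635", "77B12", "Widget", 300),
    ("A635", "77B15", "Widget", 15),
]
rows = [dict(zip(columns, values)) for values in data]

found = partial_dependencies(
    rows,
    key=["order_number", "product_number"],
    non_key=["description", "quantity_ordered"],
)
print(found)
```

In this sample the description follows the product number alone, so it belongs with the product, while the quantity ordered needs the full key.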
[Figure: Placing MEMBER ORDERED PRODUCT into 2NF. Relationship labels shown in the diagram: sold as, is a.]

MEMBER ORDERED PRODUCT (1NF)
--- Key Data ---
Member-Number (PK1) (FK)
Product-Number (PK1) (FK)
--- Non-Key Data ---
Ordered-Product-Description
Ordered-Product-Title
Quantity-Ordered
Purchased-Unit-Price
Extended-Price

CORRECTION

MEMBER ORDERED PRODUCT (2NF)
--- Key Data ---
Member-Number (PK1) (FK)
Product-Number (PK1) (FK)
--- Non-Key Data ---
Quantity-Ordered
Purchased-Unit-Price
Extended-Price

PRODUCT (2NF)
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK2)
--- Non-Key Data ---
Quantity-in-Stock
Product-Type
Suggested-Retail-Price
Club-Default-Unit-Price
Current-Special-Unit-Price
Current-Month-Units-Sold
Current-Year-Units-Sold
Total-Lifetime-Units-Sold

MERCHANDISE (2NF)
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK1)
--- Non-Key Data ---
Merchandise-Name
Merchandise-Description
Merchandise-Size
Merchandise-Color
Unit-of-Measure

TITLE (2NF)
--- Key Data ---
Product-Number (PK1)
Universal-Product-Code (PK2)
--- Non-Key Data ---
Title-of-Work
Title-Cover
Catalog-Description
Copyright-Date
Entertainment-Category
Credit-Value
Data Analysis for Database Design
• Normalization Example
– Third Normal Form:
• Entities are assumed to be in 2NF before beginning 3NF
analysis.
• Third normal form analysis looks for two types of problems,
derived data and transitive dependencies.
– In both cases, the fundamental error is that non-key attributes are
dependent on other non-key attributes.
– Derived attributes are those whose values can either be calculated
from other attributes, or derived through logic from the values of
other attributes.
– A transitive dependency exists when a non-key attribute is
dependent on another non-key attribute (other than by derivation).
– Transitive analysis is only performed on those entities that do not
have a concatenated key.
Data Analysis for Database Design
• Normalization Example
– Third Normal Form: (continued)
• A transitive dependency usually indicates that an
undiscovered entity is still embedded within the problem
entity.
• “An entity is said to be in third normal form if every
non-primary key attribute is dependent on the primary key,
the whole primary key, and nothing but the primary key.”
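Both 3NF problems can be sketched with a small schema that stores neither the derived value nor the transitively dependent one. This is an illustrative sketch assuming SQLite; the table names, the member "Pat Example", and the prices are hypothetical, and the order lines are keyed by order and product number as in the Ordered Products Table on the earlier slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (
        member_number INTEGER PRIMARY KEY,
        member_name   TEXT NOT NULL
    );
    -- No member_name here: it depends on member_number, not on the order.
    CREATE TABLE member_orders (
        order_number  TEXT PRIMARY KEY,
        member_number INTEGER NOT NULL REFERENCES members (member_number)
    );
    -- No extended_price here: it can be calculated from the other columns.
    CREATE TABLE ordered_products (
        order_number         TEXT NOT NULL REFERENCES member_orders (order_number),
        product_number       TEXT NOT NULL,
        quantity_ordered     INTEGER NOT NULL,
        purchased_unit_price REAL NOT NULL,
        PRIMARY KEY (order_number, product_number)
    );
    INSERT INTO members VALUES (1001, 'Pat Example');
    INSERT INTO member_orders VALUES ('A633', 1001);
    INSERT INTO ordered_products VALUES
        ('A633', '77F02', 1, 12.5), ('A633', '77B12', 4, 2.5);
""")

# Derived data is computed when it is needed.
extended = conn.execute("""
    SELECT product_number, quantity_ordered * purchased_unit_price
    FROM ordered_products
    ORDER BY product_number
""").fetchall()

# Transitively dependent data is reached through the foreign key.
member_name = conn.execute("""
    SELECT m.member_name
    FROM member_orders AS o
    JOIN members AS m ON m.member_number = o.member_number
    WHERE o.order_number = 'A633'
""").fetchone()[0]
print(extended, member_name)
```

Storing each fact once means a changed member name or unit price cannot leave a stale copy behind in the order tables.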
MEMBER ORDERED PRODUCT (2NF)
--- Key Data ---
Member-Number (PK1) (FK)
Product-Number (PK1) (FK)
--- Non-Key Data ---
Quantity-Ordered
Purchased-Unit-Price
Extended-Price

CORRECTION (Extended-Price is derived from Quantity-Ordered and Purchased-Unit-Price, so it is removed)

MEMBER ORDERED PRODUCT (3NF)
--- Key Data ---
Member-Number (PK1) (FK)
Product-Number (PK1) (FK)
--- Non-Key Data ---
Quantity-Ordered
Purchased-Unit-Price
[Figure: Placing MEMBER ORDER into 3NF. Relationship label shown in the diagram: placed.]

MEMBER (3NF)
--- Key Data ---
Member-Number (PK1)
--- Non-Key Data ---
Member-Name
Member-Status
Member-Street-Address
Member-Daytime-Phone-Number
Date-of-Last-Order
Member-Balance-Due
Member-Bonus-Balance-Available
Member-Credit-Card-Information

MEMBER ORDER (2NF)
--- Key Data ---
Order-Number (PK)
--- Non-Key Data ---
Order-Creation-Date
Order-Automatic-Fill-Date
Member-Number (FK1)
Member-Name
Member-Address
Shipping-Address
Shipping-Instructions
Club-Name (FK2)
Order-Sub-Total-Cost
Order-Sales-Tax
Ship-Via-Method
Shipping-Charge
Order-Status
Prepaid-Amount

CORRECTION (Member-Name and Member-Address are determined by Member-Number, a non-key attribute, so they are removed)

MEMBER ORDER (3NF)
--- Key Data ---
Order-Number (PK)
--- Non-Key Data ---
Order-Creation-Date
Order-Automatic-Fill-Date
Member-Number (FK1)
Shipping-Address
Shipping-Instructions
Club-Name (FK2)
Order-Sub-Total-Cost
Order-Sales-Tax
Ship-Via-Method
Shipping-Charge
Order-Status
Prepaid-Amount
Data Analysis for Database Design
• Simplification by Inspection:
• When several analysts work on a common
application, it is not unusual to create problems
that won’t be taken care of by normalization.
– These problems are best solved through simplification by
inspection, a process wherein a data entity in 3NF is
further simplified by such efforts as addressing subtle
data redundancy.
Data Analysis for Database Design
– CASE Support for Normalization:
• Most CASE tools can only normalize to first normal
form.
– They accomplish this in one of two ways.
» They look for many-to-many relationships and
resolve those relationships into associative entities.
» They look for attributes specifically described as
having multiple values for a single entity instance.
• It is exceedingly difficult for a CASE tool to identify
second and third normal form errors.
– That would require the CASE tool to have the intelligence
to recognize partial and transitive dependencies.
Database Design
• The Database Schema
– The design of a database is depicted as a special
model called a database schema.
• A database schema is the physical model or blueprint
for a database. It represents the technical
implementation of the logical data model.
• A relational database schema defines the database
structure in terms of tables, keys, indexes, and integrity
rules.
• A database schema specifies details based on the
capabilities, terminology, and constraints of the chosen
database management system.
Database Design
• The Database Schema
– Rules and guidelines for transforming the logical data
model into a physical relational database schema:
1 Each fundamental, associative, and weak entity is
implemented as a separate table.
– The primary key is identified as such and implemented as an
index into the table.
– Each secondary key is implemented as its own index into the
table.
– Each foreign key will be implemented as such.
– Attributes will be implemented with fields.
» These fields correspond to columns in the table.
Database Design
• The Database Schema
– Rules and guidelines for transforming the logical data
model into a physical relational database schema:
(continued)
– The following technical details must usually be specified for
each attribute.
» Data type. Each DBMS supports different data types, and terms for
those data types.
» Size of the Field. Different DBMSs express precision of real numbers
differently.
» NULL or NOT NULL. Must the field have a value before the record can
be committed to storage?
» Domains. Many DBMSs can automatically validate data to ensure that
fields contain legal values.
» Default. Many DBMSs allow a default value to be automatically set in
the event that a user or programmer submits a record without a value.
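The first rule and its per-attribute details can be sketched as a small schema. This is a hedged example assuming SQLite; the column sizes, the default of 12, and the index name are hypothetical choices, and the entities are the CLUB and AGREEMENT entities from the normalization example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each entity becomes a table; each attribute becomes a column.
    CREATE TABLE club (
        club_name         VARCHAR(30) NOT NULL PRIMARY KEY,  -- data type, size
        club_description  VARCHAR(200),                      -- NULL allowed
        club_charter_date DATE
    );
    CREATE TABLE agreement (
        club_name         VARCHAR(30) NOT NULL
                          REFERENCES club (club_name),       -- foreign key
        agreement_number  INTEGER NOT NULL,
        obligation_period INTEGER NOT NULL DEFAULT 12        -- default
                          CHECK (obligation_period > 0),     -- domain
        required_number_of_credits INTEGER NOT NULL
                          CHECK (required_number_of_credits >= 0),
        PRIMARY KEY (club_name, agreement_number)            -- indexed key
    );
    -- A secondary key gets its own index.
    CREATE INDEX agreement_by_number ON agreement (agreement_number);

    INSERT INTO club (club_name) VALUES ('Audio Club');
    INSERT INTO agreement (club_name, agreement_number, required_number_of_credits)
    VALUES ('Audio Club', 1, 6);
""")

# The omitted obligation_period was filled in from the DEFAULT clause.
default_period = conn.execute(
    "SELECT obligation_period FROM agreement"
).fetchone()[0]

# Column 1 of each PRAGMA index_list row is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('agreement')")]
print(default_period, index_names)
```

SQLite accepts declared sizes such as VARCHAR(30) but does not enforce them, which is one instance of the point that each DBMS treats data types and field sizes differently.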
Database Design
• The Database Schema
– Rules and guidelines for transforming the logical data
model into a physical relational database schema:
(continued)
2 Supertype/subtype entities present additional options as
follows:
– Most CASE tools do not currently support object-like
constructs such as supertypes and subtypes.
– Most CASE tools default to creating a separate table for each
entity supertype and subtype.
– If the subtypes are of similar size and data content, a
database administrator may elect to collapse the subtypes
into the supertype to create a single table.
3 Evaluate and specify referential integrity constraints.
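The two supertype/subtype options in rule 2 can be compared side by side. The sketch below assumes SQLite and uses the PRODUCT, TITLE, and MERCHANDISE entities from the normalization example with only a few columns each; the table name `product_collapsed` and the product type codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A: a separate table for the supertype and for each subtype,
    -- all sharing the same primary key.
    CREATE TABLE product (
        product_number TEXT NOT NULL PRIMARY KEY,
        product_type   TEXT NOT NULL
    );
    CREATE TABLE title (
        product_number TEXT NOT NULL PRIMARY KEY
                       REFERENCES product (product_number),
        title_of_work  TEXT NOT NULL
    );
    CREATE TABLE merchandise (
        product_number   TEXT NOT NULL PRIMARY KEY
                         REFERENCES product (product_number),
        merchandise_name TEXT NOT NULL
    );

    -- Option B: subtypes collapsed into the supertype. Columns that
    -- belong to only one subtype must now allow NULL.
    CREATE TABLE product_collapsed (
        product_number   TEXT NOT NULL PRIMARY KEY,
        product_type     TEXT NOT NULL
                         CHECK (product_type IN ('TITLE', 'MERCHANDISE')),
        title_of_work    TEXT,
        merchandise_name TEXT
    );
""")

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Option B saves joins at the cost of nullable columns, which is why it suits subtypes of similar size and content.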
Database Design
• Data and Referential Integrity
– There are at least three types of data integrity that
must be designed into any database: key integrity,
domain integrity, and referential integrity.
– Key Integrity:
• Every table should have a primary key (which may be
concatenated).
– The primary key must be controlled such that no two records
in the table have the same primary key value.
– The primary key for a record must never be allowed to have
a NULL value.
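Key integrity can be delegated to the DBMS by declaring the primary key. The sketch below assumes SQLite and a hypothetical helper `try_insert`; NOT NULL is spelled out on each key column because SQLite does not always imply it for primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agreement (
        club_name        TEXT NOT NULL,
        agreement_number INTEGER NOT NULL,
        PRIMARY KEY (club_name, agreement_number)  -- concatenated key
    )
""")


def try_insert(club_name, agreement_number):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO agreement VALUES (?, ?)",
                (club_name, agreement_number),
            )
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


results = [
    try_insert("Audio Club", 1),  # new key value
    try_insert("Audio Club", 2),  # differs in one part of the key
    try_insert("Audio Club", 1),  # duplicate key value
    try_insert(None, 3),          # NULL in part of the key
]
for result in results:
    print(result)
```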
Database Design
• Data and Referential Integrity
– Domain Integrity:
• Appropriate controls must be designed to ensure
that no field takes on a value that is outside of the
range of legal values.
– Referential Integrity:
• A referential integrity error exists when a foreign
key value in one table has no matching primary key
value in the related table.
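Domain and referential integrity can both be declared in the schema so that the DBMS refuses bad records. This sketch assumes SQLite, where foreign key checking must be switched on per connection; the list of legal order statuses, the helper `is_rejected`, and customer number 99999 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE customers (
        customer_number INTEGER NOT NULL PRIMARY KEY,
        customer_name   TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_number    TEXT NOT NULL PRIMARY KEY,
        customer_number INTEGER NOT NULL
                        REFERENCES customers (customer_number),
        order_status    TEXT NOT NULL
                        CHECK (order_status IN ('OPEN', 'SHIPPED', 'CANCELLED'))
    );
    INSERT INTO customers VALUES (10112, 'Luck Star');
""")


def is_rejected(order_number, customer_number, order_status):
    """Return True if the DBMS refuses the order record."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (order_number, customer_number, order_status),
            )
        return False
    except sqlite3.IntegrityError:
        return True


bad_domain = is_rejected("A633", 10112, "LOST")  # status outside the domain
orphan = is_rejected("A634", 99999, "OPEN")      # no such customer
valid = is_rejected("A635", 10112, "OPEN")       # legal on both counts
print(bad_domain, orphan, valid)
```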
Database Design
• Data and Referential Integrity
– Referential Integrity:
• Referential integrity is specified in the form of deletion
rules as follows:
– No restriction.
» Any record in the table may be deleted without regard
to any records in any other tables.
– Delete:Cascade.
» A deletion of a record in the table must be automatically
followed by the deletion of matching records in a
related table.
– Delete:Restrict.
» A deletion of a record in the table must be disallowed
until any matching records are deleted from a related
table.
– Delete:Set Null.
» A deletion of a record in the table must be
automatically followed by setting any matching keys
in a related table to the value NULL.
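The deletion rules above map onto SQL's ON DELETE clause. The sketch below assumes SQLite and a hypothetical helper that rebuilds a tiny customers/orders database for each rule. "No restriction" corresponds to declaring no foreign key constraint at all, so it is not shown.

```python
import sqlite3


def delete_customer_with_rule(rule):
    """Delete customer 10112 under an ON DELETE rule and report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(f"""
        CREATE TABLE customers (customer_number INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE orders (
            order_number    TEXT NOT NULL PRIMARY KEY,
            customer_number INTEGER
                            REFERENCES customers (customer_number)
                            ON DELETE {rule}
        );
        INSERT INTO customers VALUES (10112);
        INSERT INTO orders VALUES ('A633', 10112);
    """)
    try:
        with conn:
            conn.execute("DELETE FROM customers WHERE customer_number = 10112")
    except sqlite3.IntegrityError:
        return "delete refused"
    return conn.execute(
        "SELECT order_number, customer_number FROM orders"
    ).fetchall()


outcomes = {
    rule: delete_customer_with_rule(rule)
    for rule in ("CASCADE", "RESTRICT", "SET NULL")
}
for rule, outcome in outcomes.items():
    print(rule, outcome)
```

Under CASCADE the matching order disappears, under RESTRICT the customer cannot be deleted while the order exists, and under SET NULL the order survives with an empty customer number.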
Database Design
• Roles
– Some database shops insist that no two fields
have exactly the same name.
• This presents an obvious problem with foreign keys, which
normally carry the same name as the primary key they reference.
– A role name is an alternate name for a foreign
key that clearly distinguishes the purpose that
the foreign key serves in the table.
– The decision to require role names or not is
usually established by the data or database
administrator.
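A role name might be applied as in the following sketch, which assumes SQLite. The column name `ordering_member_number` is a hypothetical role name chosen to say why the member appears in the order table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (
        member_number INTEGER NOT NULL PRIMARY KEY
    );
    CREATE TABLE member_orders (
        order_number TEXT NOT NULL PRIMARY KEY,
        -- Role name: a distinct name for the foreign key to members.
        ordering_member_number INTEGER NOT NULL
                               REFERENCES members (member_number)
    );
""")

# Each PRAGMA row is (id, seq, table, from, to, on_update, on_delete, match).
foreign_key = conn.execute("PRAGMA foreign_key_list(member_orders)").fetchone()
referenced_table, role_name, primary_key = foreign_key[2:5]
print(role_name, "->", f"{referenced_table}.{primary_key}")
```

The foreign key still points at the same primary key; only the column name changes, so no two fields in the database share a name.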