More on the Relational Database Model
• We have presented details of four basic steps involved in
relational database design:
– Identification of entities
– Identification of candidate key fields and actual key fields
– Creation of entity-relationship diagram
– Design of relational database
• We will examine the four basic steps above in more detail,
focusing on:
– Rules for transactional database design
– Ensuring integrity of data in tables
– Basic operations for combining/analyzing data in tables
– Defining indexes
– Cleaning dirty data
Monday, July 11, 2016
MIS 90-728 Lecture Notes
1
Database Design for Transactions
• The central role of most databases is to record transactions:
– Cars are towed;
– Students are assigned to classes;
– Clients in a social services office have their needs assessed.
• These entities are useful only if there are other entities that
explain:
– Who provides the service;
– Who receives the service;
– How the service is characterized.
• In addition, a linking entity set can be created to handle a
many-to-many relationship:
– Multiple students can attend multiple classes;
– Multiple cars can undergo multiple inspection processes.
Products-in-Columns Form
• A particular object is used in a fixed (small) number of
transactions
– Example: car inspection
Inspection data:
Inspection Date   VIN        Garage   Brakes   Steering   Lights
8/15/95           23878110   12A4     P        P          P
8/06/96           23878110   12A4     P        P          P
8/08/97           82736618   12A4     P        P          P
...               ...        ...      ...      ...        ...
E-R Diagram:
VEHICLE 1 --(undergoes)-- M INSPECTION M --(performs)-- 1 GARAGE
Relational Database Design:
GARAGE (Garage ID, Garage Name, ...)
VEHICLE (VIN, Make, Year, ...)
INSPECTION (Inspection Number, VIN@, Garage ID@, Inspection Date,
Brakes, Lights, Steering)
Recall that one-to-many relationships are implemented in relational
databases as foreign keys
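The products-in-columns design above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the snake_case table and column names and the ISO date format are choices made here, and only the VIN, garage and inspection values shown on the slide are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE garage  (garage_id TEXT PRIMARY KEY NOT NULL, garage_name TEXT);
CREATE TABLE vehicle (vin TEXT PRIMARY KEY NOT NULL, make TEXT, year INTEGER);
CREATE TABLE inspection (
    inspection_number INTEGER PRIMARY KEY,
    vin             TEXT NOT NULL REFERENCES vehicle(vin),      -- VIN@
    garage_id       TEXT NOT NULL REFERENCES garage(garage_id), -- Garage ID@
    inspection_date TEXT,
    brakes TEXT, lights TEXT, steering TEXT   -- one column per inspected item
);
""")

# Parent rows first, so that the foreign keys in INSPECTION have something to match.
conn.execute("INSERT INTO garage (garage_id) VALUES ('12A4')")
conn.executemany("INSERT INTO vehicle (vin) VALUES (?)",
                 [("23878110",), ("82736618",)])
conn.executemany(
    "INSERT INTO inspection (vin, garage_id, inspection_date, brakes, lights, steering) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [("23878110", "12A4", "1995-08-15", "P", "P", "P"),
     ("23878110", "12A4", "1996-08-06", "P", "P", "P"),
     ("82736618", "12A4", "1997-08-08", "P", "P", "P")],
)

# One vehicle, many inspections: the "1" side is found by following the foreign key.
inspections_per_vehicle = conn.execute(
    "SELECT vin, COUNT(*) FROM inspection GROUP BY vin ORDER BY vin"
).fetchall()
print(inspections_per_vehicle)
```

The fixed set of result columns (brakes, lights, steering) is what makes this form workable only when the number of items per transaction is small.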
Classical Linking Table Form

• A particular object is used in a variable (large) number of
transactions
– Example: a store sells 5,000 different products (SKUs).
Can we use products-in-columns form?
PRODUCT M --(used in)-- M SALE
CUSTOMER 1 --(requests)-- M SALE
SALESPERSON 1 --(rings up)-- M SALE
Problems with a many-to-many relationship:
• Data redundancy:
• System efficiency and output errors:
Products-in-Rows Form
• Devise a “bridge” (linking) entity that replaces one many-to-many relationship with two one-to-many relationships.
– Example: create an SPLINK table with attributes based only on
ProductID and SaleID
E-R Diagram:
PRODUCT 1 --(appears in)-- M SPLINK M --(appears in)-- 1 SALE
CUSTOMER 1 --(requests)-- M SALE
SALESPERSON 1 --(conducts)-- M SALE
Relational Database Design:
SALESPERSON (Employee ID, Employee First Name, Employee Last Name, ...)
CUSTOMER (Customer ID, Customer First Name, Customer Last Name, ...)
PRODUCT (Product Code, Product Description)
SALE (Sale ID, Transaction Date, Customer ID@, Employee ID@)
SPLINK (Sale ID@, Product Code@, Quantity, Price)
tblSPLINK
SaleID   ProductID   Quantity   UnitPrice
30012    A           12         32.95
30012    C           24         99.95
30013    A           100        32.95
30013    B           150        59.75
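A minimal sketch of the products-in-rows form, again using Python's sqlite3 module. The rows are the four tblSPLINK rows shown above. The snake_case names are assumptions, and the CUSTOMER and SALESPERSON tables are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check foreign keys
conn.executescript("""
CREATE TABLE product (product_code TEXT PRIMARY KEY NOT NULL, product_description TEXT);
CREATE TABLE sale    (sale_id INTEGER PRIMARY KEY, transaction_date TEXT);
CREATE TABLE splink (
    sale_id      INTEGER NOT NULL REFERENCES sale(sale_id),
    product_code TEXT    NOT NULL REFERENCES product(product_code),
    quantity     INTEGER,
    unit_price   REAL,
    PRIMARY KEY (sale_id, product_code)  -- one row per product per sale
);
""")

conn.executemany("INSERT INTO product (product_code) VALUES (?)",
                 [("A",), ("B",), ("C",)])
conn.executemany("INSERT INTO sale (sale_id) VALUES (?)", [(30012,), (30013,)])
conn.executemany("INSERT INTO splink VALUES (?, ?, ?, ?)", [
    (30012, "A", 12, 32.95),
    (30012, "C", 24, 99.95),
    (30013, "A", 100, 32.95),
    (30013, "B", 150, 59.75),
])

# Each sale may contain any number of products without changing the table design.
sale_totals = dict(conn.execute(
    "SELECT sale_id, SUM(quantity * unit_price) FROM splink GROUP BY sale_id"
).fetchall())
for sale_id, total in sorted(sale_totals.items()):
    print(sale_id, round(total, 2))
```

The composite primary key (sale_id, product_code) is one possible design choice; it prevents the same product from appearing twice in one sale.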
One-to-One Relationships
• A one-to-one relationship occurs when an instance of one
entity is associated with one (or possibly zero) instances of
another entity.
– For example, a TEACHER entity can be associated with one instance
of the OFFICE entity and vice versa
TEACHER 1 ---- 1 OFFICE
Why might one define distinct entities in a one-to-one relationship?
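One common way to implement a one-to-one relationship, sketched here with Python's sqlite3 module, is a foreign key that is also declared UNIQUE. The table and column names are assumptions made for illustration. The foreign key is left nullable so that an office may have no teacher (the "possibly zero" case).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY);
CREATE TABLE office (
    office_id  INTEGER PRIMARY KEY,
    -- UNIQUE turns the usual one-to-many foreign key into one-to-one.
    teacher_id INTEGER UNIQUE REFERENCES teacher(teacher_id)
);
""")
conn.execute("INSERT INTO teacher VALUES (1)")
conn.execute("INSERT INTO office VALUES (101, 1)")     # teacher 1 gets office 101
conn.execute("INSERT INTO office VALUES (103, NULL)")  # an unassigned office is allowed

try:
    conn.execute("INSERT INTO office VALUES (102, 1)")  # a second office for teacher 1
    second_office_allowed = True
except sqlite3.IntegrityError:
    second_office_allowed = False

print("second office allowed:", second_office_allowed)
```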
Generalization Hierarchies
• A generalization hierarchy occurs if one entity is uniquely
associated with two or more other entities, and that relationship
can be characterized as two or more one-to-one entity
relationships
– For example, a TEACHER entity could be linked with the entities
ASSIGNED_PC, ASSIGNED_MAC and ASSIGNED_WORKST
E-R Diagram:
TEACHER 1 --G-- 1 ASSIGNED_PC
TEACHER 1 --G-- 1 ASSIGNED_MAC
TEACHER 1 --G-- 1 ASSIGNED_WORKST
Properties of Keys and Data Integrity
Recall the definition of a primary key:
In addition, a primary key must:
– Take non-null values at all times
– Determine all non-key attributes
These properties represent entity integrity for the RDBM.
There are a variety of other keys that can appear in RDBMs:
– Superkey:
– Candidate key:
– Secondary key:
A foreign key field in a given table must either:
– be null (no value has yet been assigned), or
– take a value equal to one of the key values in the related table
This property represents referential integrity for the RDBM.
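The two integrity rules can be observed directly in a small experiment. The sketch below uses Python's sqlite3 module with assumed table names and the hypothetical garage ID 99Z9. Note that SQLite needs an explicit NOT NULL on a non-integer primary key and the foreign_keys pragma before it enforces these rules. Other database packages may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE garage (garage_id TEXT PRIMARY KEY NOT NULL);
CREATE TABLE inspection (
    inspection_number INTEGER PRIMARY KEY NOT NULL,
    garage_id TEXT REFERENCES garage(garage_id)   -- nullable foreign key
);
""")
conn.execute("INSERT INTO garage VALUES ('12A4')")

def try_insert(sql, params):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    # Entity integrity: the primary key must be non-null and unique.
    "null primary key":      try_insert("INSERT INTO garage VALUES (?)", (None,)),
    "duplicate primary key": try_insert("INSERT INTO garage VALUES (?)", ("12A4",)),
    # Referential integrity: the foreign key is null or matches a key in the related table.
    "null foreign key":      try_insert("INSERT INTO inspection VALUES (?, ?)", (1, None)),
    "matching foreign key":  try_insert("INSERT INTO inspection VALUES (?, ?)", (2, "12A4")),
    "dangling foreign key":  try_insert("INSERT INTO inspection VALUES (?, ?)", (3, "99Z9")),
}
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```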
Code Attributes
• Codes are a way of classifying attribute values that ensures
uniformity and consistency
– Example: consider an attribute of the STUDENT entity called ST_Dept
Without codes, the department “Electrical and Computer Engineering”
could be recorded as: “Elec./Comp. Eng.”, “ECE”, “E&CE”, etc.
• Codes reduce key entry errors, inconsistency and obsolescence.
• Three types of code implementation schemes:
– Self-documenting:
– Cryptic codes and descriptions:
– All codes in a single table:
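As a rough illustration of how a code table keeps ST_Dept values uniform, the sketch below stores one code per department and lets a foreign key reject any other spelling. The table name dept_code and the choice of "ECE" as the stored code are assumptions. This is not a statement of which of the three schemes above it corresponds to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dept_code (
    code        TEXT PRIMARY KEY NOT NULL,
    description TEXT NOT NULL
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    st_dept    TEXT REFERENCES dept_code(code)  -- only listed codes are allowed
);
""")
conn.execute(
    "INSERT INTO dept_code VALUES ('ECE', 'Electrical and Computer Engineering')"
)
conn.execute("INSERT INTO student VALUES (1, 'ECE')")

try:
    conn.execute("INSERT INTO student VALUES (2, 'E&CE')")  # a non-standard spelling
    variant_accepted = True
except sqlite3.IntegrityError:
    variant_accepted = False

# The full description is stored once and recovered through the code.
descriptions = conn.execute("""
    SELECT s.student_id, d.description
    FROM student AS s JOIN dept_code AS d ON d.code = s.st_dept
""").fetchall()
print(variant_accepted, descriptions)
```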
Relational Database Table Operations
A number of operations that can be performed on one or more
tables form the foundation for queries:
• Union
• Intersect
• Difference
• Product
• Select
• Project
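The six operations can be tried out on very small tables. In the sketch below a table is modelled as a Python set of row tuples, which is a simplification chosen here rather than how a database package stores data. All rows are hypothetical.

```python
from itertools import product as cartesian_product

# Two tables with the same columns (name, dept), so they can be combined row-wise.
table_a = {("Ann", "ECE"), ("Bob", "MIS")}
table_b = {("Bob", "MIS"), ("Cho", "ECE")}

union = table_a | table_b        # rows found in either table
intersect = table_a & table_b    # rows found in both tables
difference = table_a - table_b   # rows in table_a that are not in table_b

# Product: every row of one table paired with every row of another.
depts = {("ECE",), ("MIS",)}
product = {a + d for a, d in cartesian_product(table_a, depts)}

# Select: keep only the rows that satisfy a condition (a subset of rows).
select = {row for row in table_a if row[1] == "ECE"}

# Project: keep only chosen columns (a subset of columns); duplicates collapse.
project = {(dept,) for _name, dept in union}

print(sorted(union))
print(sorted(project))
```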
Relational Database Table Operations
• Join: combine information from two or more tables linked
by common attributes
– Example:
StudentID   Tract   Grade   Score
393452297   8034    8       95
495772944   8035    8       85
190385723   8037    9       90

Tract   MedFamInc   AvgFamSize
8031    52300       1.6
8034    27300       2.5
8035    32457       2.2
8036    39080       3.6
8037    29032       3.9
8040    15900       2.7
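The join in this example can be reproduced with Python's sqlite3 module. The values are those from the two tables in the example. The table names student and tract are assumptions, since the slide does not name the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, tract INTEGER,
                      grade INTEGER, score INTEGER);
CREATE TABLE tract   (tract INTEGER PRIMARY KEY, med_fam_inc INTEGER,
                      avg_fam_size REAL);
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (393452297, 8034, 8, 95),
    (495772944, 8035, 8, 85),
    (190385723, 8037, 9, 90),
])
conn.executemany("INSERT INTO tract VALUES (?, ?, ?)", [
    (8031, 52300, 1.6), (8034, 27300, 2.5), (8035, 32457, 2.2),
    (8036, 39080, 3.6), (8037, 29032, 3.9), (8040, 15900, 2.7),
])

# Join on the common attribute Tract: each student row gains the census data
# for the tract the student lives in. Tracts with no students drop out.
joined = conn.execute("""
    SELECT s.student_id, s.tract, s.grade, s.score, t.med_fam_inc, t.avg_fam_size
    FROM student AS s
    JOIN tract AS t ON t.tract = s.tract
    ORDER BY s.tract
""").fetchall()
for row in joined:
    print(row)
```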
When is Data Redundancy a Good Thing?
• Generally, aside from foreign keys, there should be as little
repetition of entity attributes between tables as possible.
• Sometimes, however, it is advantageous to have certain fields
appear in more than one table as non-FK fields:
Customer (Cus_Code, Cus_LName, Cus_FName, Cus_Initial, Cus_AreaCode, Cus_Phone)
Invoice (Inv_Number, Cus_Code, Inv_Date)
Line (Inv_Number, Line_Number, Prod_Code, Line_Units, Line_Price)
Product (Prod_Code, Prod_Description, Prod_Price, Prod_OnHand)
Relationships: Customer 1:M Invoice, Invoice 1:M Line, Product 1:M Line
• Line_Price represents the price of the item at the time it was
purchased;
• Line_Number allows reporting of purchases in the order they
were actually made
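The value of storing Line_Price separately from Prod_Price shows up when a product's price changes after a sale. The sketch below uses Python's sqlite3 module. The product code, prices and invoice number are hypothetical values chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (prod_code TEXT PRIMARY KEY NOT NULL, prod_price REAL);
CREATE TABLE line (
    inv_number  INTEGER,
    line_number INTEGER,
    prod_code   TEXT REFERENCES product(prod_code),
    line_units  INTEGER,
    line_price  REAL,       -- copy of prod_price at the time of purchase
    PRIMARY KEY (inv_number, line_number)
);
""")
conn.execute("INSERT INTO product VALUES ('P1', 10.00)")

# At the time of sale, the current product price is copied into the invoice line.
conn.execute("""
    INSERT INTO line (inv_number, line_number, prod_code, line_units, line_price)
    SELECT 1001, 1, prod_code, 3, prod_price FROM product WHERE prod_code = 'P1'
""")

# Later the product is repriced; the stored invoice line is unaffected.
conn.execute("UPDATE product SET prod_price = 12.50 WHERE prod_code = 'P1'")

invoice_total = conn.execute(
    "SELECT SUM(line_units * line_price) FROM line WHERE inv_number = 1001"
).fetchone()[0]
repriced_total = conn.execute("""
    SELECT SUM(l.line_units * p.prod_price)
    FROM line AS l JOIN product AS p ON p.prod_code = l.prod_code
    WHERE l.inv_number = 1001
""").fetchone()[0]
print(invoice_total, repriced_total)
```

Without the redundant Line_Price field, the only available figure would be the second one, which misstates what the customer actually paid.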
Indexes and Data Retrieval
Primary key fields distinguish every table row (entity) from every other row, enabling
easy lookups of particular rows by primary key value.
But what if we wish to identify particular rows of a table quickly through non-primary key values?
For example, we wish to identify all invoices that were created on a certain
date, but do not wish to examine each row of the table to find a date that
matches the one in mind.
Define a table index, or “pointer” to rows of the table, based on
values of the designated attribute.
P_Num (Index key)   Pointers to Painting Table
123                 1,2,4
126                 3,5
Multiple indexes for a table are defined easily in Access
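The idea behind the P_Num index can be imitated in a few lines of Python. This is a conceptual sketch only. Real database packages use more elaborate index structures, and the row numbers below are simply those implied by the pointer lists on the slide.

```python
from collections import defaultdict

# Row number -> P_Num value for each row of the Painting table.
painting_rows = {1: 123, 2: 123, 3: 126, 4: 123, 5: 126}

def build_index(rows):
    """Map each key value to the list of row numbers that contain it."""
    index = defaultdict(list)
    for row_number, key in sorted(rows.items()):
        index[key].append(row_number)
    return dict(index)

p_num_index = build_index(painting_rows)

# A lookup now goes straight to the matching rows instead of scanning every row.
print(p_num_index[123])
print(p_num_index[126])
```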
Data Cleaning
• Database development often entails input of data from
multiple, unreliable sources:
– mainframe databases
– spreadsheets
– paper files
– verbal descriptions
• Foreign data must be preprocessed (“cleaned”) before the
database may be updated:
– maintain consistent lists of codes
– identify duplicate records
– identify records common to two or more tables
– update primary keys via missing foreign key values
Data cleaning may be the most important task in database
implementation
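The four cleaning tasks listed above might be sketched as small Python helpers. Everything here is hypothetical: the records, the field names, the code mapping, and the reading of the last task as "find foreign key values with no matching primary key, so the missing keys can be added".

```python
from collections import Counter

# Hypothetical mapping from free-text spellings to one standard code.
DEPT_CODES = {"ece": "ECE", "e&ce": "ECE", "elec./comp. eng.": "ECE", "mis": "MIS"}

def clean_code(raw):
    """Return the standard code for a raw value, or None if it is unrecognized."""
    return DEPT_CODES.get(raw.strip().lower())

def duplicate_keys(records, key):
    """Return key values that occur in more than one record."""
    counts = Counter(record[key] for record in records)
    return sorted(value for value, n in counts.items() if n > 1)

def common_keys(table_a, table_b, key):
    """Return key values present in both tables."""
    return sorted({r[key] for r in table_a} & {r[key] for r in table_b})

def orphan_foreign_keys(child_rows, fk, parent_rows, pk):
    """Return foreign key values with no matching primary key in the parent table."""
    parent_keys = {r[pk] for r in parent_rows}
    return sorted({r[fk] for r in child_rows if r[fk] is not None} - parent_keys)

# Hypothetical incoming records from two sources, plus the existing code table.
spreadsheet = [
    {"student_id": 1, "dept": "E&CE"},
    {"student_id": 2, "dept": "Elec./Comp. Eng."},
    {"student_id": 2, "dept": "ECE"},
]
paper_files = [{"student_id": 2, "dept": "ece"}, {"student_id": 3, "dept": "MIS"}]
departments = [{"code": "ECE"}]

cleaned = [{**r, "dept": clean_code(r["dept"])} for r in spreadsheet + paper_files]
duplicates = duplicate_keys(spreadsheet, "student_id")
shared = common_keys(spreadsheet, paper_files, "student_id")
orphans = orphan_foreign_keys(cleaned, "dept", departments, "code")
print(duplicates, shared, orphans)
```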
A Note on Names
• Entities:
– Names should be simple and descriptive and capitalized for clarity, e.g.
EMPLOYEE, JOB CLASSIFICATION, etc.
– Attribute names should be title case and correspond to natural language,
e.g. First Name, Rank, Salary
• Tables
– Names can match the names of corresponding entities in title case, e.g.
Employee
– OR follow so-called “Hungarian convention”: “tbl” preceding name,
e.g. tblEmployee, tblJobClassification
• Fields
– Field names may be identical in format to Attribute names
– OR be linked explicitly to table name, e.g. Emp_Fname, Emp_LName
Most RDBMs in practice use Hungarian convention and field
names linked to table names. Whatever you do, be consistent!
Bringing It All Together in a Commercial
RDBM Package
• Good relational database software packages allow the
user to:
– Record and display the design of every table, including field
names, descriptions, types, ranges, key fields, foreign keys and
indexes in a data dictionary;
– Record and display relationships between tables;
– Support at least the fundamental relational functions SELECT,
PROJECT and JOIN;
Packages such as Microsoft Access, FoxPro and Oracle
support this functionality