Normal Forms

Introduction to
Database Design
July 2006
Ken Nunes
knunes @ sdsc.edu
SAN DIEGO SUPERCOMPUTER CENTER
Database Design Agenda
•Introductions
•General Design Considerations
•Entity-Relationship Model
•Normalization
•Overview of SQL
•Star Schemas
•Additional Information
•Q&A
General Design Considerations
•Users
•Application Requirements
•Legacy Systems/Data
Users
•Who are they?
•Administrative
•Scientific
•Technical
•Impact
•Access Controls
•Interfaces
•Service levels
Application Requirements
•What kind of database?
•OnLine Analytical Processing (OLAP)
•OnLine Transactional Processing (OLTP)
•Budget
•Platform / Vendor
•Workflow?
•order of operations
•error handling
•reporting
Legacy Systems/Data
•What systems are currently in place?
•Where does the data come from?
•How is it generated?
•What format is it in?
•What is the data used for?
•Which parts of the system must remain static?
Entity - Relationship Model
A logical design method which emphasizes
simplicity and readability.
•Basic objects of the model are:
•Entities
•Relationships
•Attributes
Entities
Data objects detailed by the information in the
database.
•Denoted by rectangles in the model.
[Diagram: two entity rectangles, Employee and Department]
Attributes
Characteristics of entities or relationships.
•Denoted by ellipses in the model.
[Diagram: Employee with attributes Name and SSN; Department with attributes Name and Budget]
Relationships
Represent associations between entities.
•Denoted by diamonds in the model.
[Diagram: Employee (Name, SSN) -- works in (Start date) -- Department (Name, Budget)]
Relationship Connectivity
Constraints on the mapping of the associated
entities in the relationship.
•Denoted by variables between the related entities.
•Generally, values for connectivity are expressed as “one” or
“many”
[Diagram: Employee (Name, SSN) N -- works in (Start date) -- 1 Department (Name, Budget)]
Connectivity
one-to-one:    Department 1 -- has -- 1 Manager
one-to-many:   Department 1 -- has -- N Project
many-to-many:  Employee M -- works on -- N Project
ER example
Retailer wants to create an online webstore.
•The retailer requires information on:
•Customers
•Items
•Orders
Webstore Entities & Attributes
•Customers - name, credit card, address
•Items - name, price, inventory
•Orders - item, quantity, cost, date, status
[Diagram: Customers (name, credit card, address); Items (name, price, inventory); Orders (item, quantity, cost, date, status)]
Webstore Relationships
Identify the relationships.
•The orders are recorded each time a customer
purchases items, so the customer and order
entities are related.
•Each customer may make several purchases so the relationship
is one-to-many
[Diagram: Customer 1 -- purchase -- N Order]
Webstore Relationships
Identify the relationships.
•The order consists of the items a customer
purchases but each item can be found in multiple
orders.
•Since an order can contain multiple items and an item can appear in
multiple orders, the relationship is many-to-many.
[Diagram: Order M -- consists -- N Item]
Webstore ER Diagram
[Diagram: Customers (name, credit card, address) 1 -- purchase -- N Orders (item, quantity, cost, date, status); Orders M -- consists -- N Items (name, price, inventory)]
Logical Design to Physical Design
Creating relational SQL schemas from entity-relationship models.
•Transform each entity into a table with the key and its
attributes.
•Transform each relationship as either a relationship
table (many-to-many) or a “foreign key” (one-to-one
and one-to-many).
Entity tables
Transform each entity into a table with a key and
its attributes.
[Diagram: Employee entity with attributes Name and SSN]
create table employee
(emp_no number,
name varchar2(256),
ssn number,
primary key (emp_no));
Foreign Keys
Transform each one-to-one or one-to-many relationship
as a “foreign key”.
•Foreign key is a reference in the child (many) table to the primary
key of the parent (one) table.
[Diagram: Department 1 -- has -- N Employee]
create table department
(dept_no number,
name varchar2(50),
primary key (dept_no));
create table employee
(emp_no number,
dept_no number,
name varchar2(256),
ssn number,
primary key (emp_no),
foreign key (dept_no) references department);
Foreign Key
Department
dept_no  name
1        Accounting
2        Human Resources
3        IT
Employee
emp_no  dept_no  name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin
Accounting has 1 employee:
Brian Burnett
Human Resources has 2 employees:
Nora Edwards
Ben Smith
IT has 3 employees:
Ajay Patel
John O'Leary
Julia Lenin
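The per-department headcounts can be checked by following the foreign key with a join. This is a minimal sketch, not the deck's own code: it assumes Python's built-in sqlite3 module in place of Oracle, uses SQLite column types, and omits the ssn column.

```python
import sqlite3

# In-memory database; SQLite types stand in for Oracle's number/varchar2 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table department
(dept_no integer primary key,
 name text);
create table employee
(emp_no integer primary key,
 dept_no integer references department (dept_no),
 name text);
insert into department values
 (1, 'Accounting'), (2, 'Human Resources'), (3, 'IT');
insert into employee values
 (1, 2, 'Nora Edwards'), (2, 3, 'Ajay Patel'), (3, 2, 'Ben Smith'),
 (4, 1, 'Brian Burnett'), (5, 3, 'John O''Leary'), (6, 3, 'Julia Lenin');
""")

# Count the child (employee) rows that reference each parent (department) row.
headcount = conn.execute("""
select d.name, count(e.emp_no)
from department d
join employee e on e.dept_no = d.dept_no
group by d.dept_no, d.name
order by d.dept_no
""").fetchall()

print(headcount)
```

The result pairs each department name with its number of employees, matching the lists on the slide.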
Many-to-Many tables
Transform each many-to-many relationship as a table.
•The relationship table will contain the foreign keys to the related
entities as well as any relationship attributes.
[Diagram: Project N -- has (Start date) -- M Employee]
create table project_employee_details
(proj_no number,
emp_no number,
start_date date,
primary key (proj_no, emp_no),
foreign key (proj_no) references project,
foreign key (emp_no) references employee);
Many-to-Many tables
Project
proj_no  name
1        Employee Audit
2        Budget
3        Intranet
Project_employee_details
proj_no  emp_no  start_date
1        4       4/7/03
3        6       8/12/02
3        5       3/4/01
2        6       11/11/02
3        2       12/2/03
2        1       7/21/04
Employee
emp_no  dept_no  name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin
Employee Audit has 1 employee:
Brian Burnett
Budget has 2 employees:
Julia Lenin
Nora Edwards
Intranet has 3 employees:
Julia Lenin
John O'Leary
Ajay Patel
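Resolving a many-to-many relationship takes two joins, one on each foreign key of the relationship table. The sketch below assumes Python's sqlite3 module rather than Oracle, stores start_date as text, and introduces a hypothetical helper named members for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table project (proj_no integer primary key, name text);
create table employee (emp_no integer primary key, name text);
create table project_employee_details
(proj_no integer references project (proj_no),
 emp_no integer references employee (emp_no),
 start_date text,
 primary key (proj_no, emp_no));
insert into project values
 (1, 'Employee Audit'), (2, 'Budget'), (3, 'Intranet');
insert into employee values
 (1, 'Nora Edwards'), (2, 'Ajay Patel'), (3, 'Ben Smith'),
 (4, 'Brian Burnett'), (5, 'John O''Leary'), (6, 'Julia Lenin');
insert into project_employee_details values
 (1, 4, '4/7/03'), (3, 6, '8/12/02'), (3, 5, '3/4/01'),
 (2, 6, '11/11/02'), (3, 2, '12/2/03'), (2, 1, '7/21/04');
""")

def members(project_name):
    """Return the names of employees on a project, ordered by emp_no."""
    rows = conn.execute("""
    select e.name
    from project p
    join project_employee_details d on d.proj_no = p.proj_no
    join employee e on e.emp_no = d.emp_no
    where p.name = ?
    order by e.emp_no
    """, (project_name,)).fetchall()
    return [name for (name,) in rows]

print(members("Intranet"))
```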
Normalization
A logical design method which minimizes data
redundancy and reduces design flaws.
•Consists of applying various “normal” forms to
the database design.
•The normal forms break down large tables into
smaller subsets.
First Normal Form (1NF)
Each attribute must be atomic
• No repeating columns within a row.
• No multi-valued columns.
1NF simplifies attributes
• Queries become easier.
1NF
Employee (unnormalized)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java
Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
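The move from the unnormalized table to 1NF amounts to emitting one row per value of the multi-valued column. A small sketch in plain Python; the function name to_1nf is a hypothetical choice for illustration.

```python
# The unnormalized rows from the slide; skills holds several values per row.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]

def to_1nf(rows):
    """Emit one row per (employee, skill) so every column holds a single value."""
    result = []
    for emp_no, name, dept_no, dept_name, skills in rows:
        for skill in skills.split(","):
            result.append((emp_no, name, dept_no, dept_name, skill.strip()))
    return result

employee_1nf = to_1nf(unnormalized)
for row in employee_1nf:
    print(row)
```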
Second Normal Form (2NF)
Each non-key attribute must be functionally dependent on
the whole primary key.
• Functional dependence - the property of one or more
attributes that uniquely determines the value of other
attributes.
• Any attributes that do not depend on the whole key are
moved into a smaller (subset) table.
2NF improves data integrity.
• Prevents update, insert, and delete anomalies.
Functional Dependence
Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
Name, dept_no, and dept_name are functionally dependent on
emp_no. (emp_no -> name, dept_no, dept_name)
Skills is not functionally dependent on emp_no, since a single emp_no
is paired with several skills values.
2NF
Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
Skills (2NF)
emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
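The 2NF split can be expressed as two projections of the 1NF table, and joining them back on emp_no should return the original rows. A sketch under stated assumptions: Python's sqlite3 module instead of Oracle, and a hypothetical source table named employee_1nf.

```python
import sqlite3

rows_1nf = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
create table employee_1nf
(emp_no integer, name text, dept_no integer, dept_name text, skills text)
""")
conn.executemany("insert into employee_1nf values (?, ?, ?, ?, ?)", rows_1nf)

# Project the 1NF table into the two 2NF tables shown on the slide.
conn.executescript("""
create table employee as
 select distinct emp_no, name, dept_no, dept_name from employee_1nf;
create table skills as
 select emp_no, skills from employee_1nf;
""")

employee_count = conn.execute("select count(*) from employee").fetchone()[0]

# Joining the two tables on emp_no reconstructs the 1NF rows.
rejoined = conn.execute("""
select e.emp_no, e.name, e.dept_no, e.dept_name, s.skills
from employee e
join skills s on s.emp_no = e.emp_no
""").fetchall()

print(employee_count, len(rejoined))
```

The employee details are now stored once per employee, while the join still recovers every (employee, skill) pair.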
Data Integrity
Employee (1NF)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
• Insert Anomaly - adding null values. e.g., a new department with no employees
cannot be inserted without leaving emp_no, part of the key, null.
• Update Anomaly - a single name change requires multiple updates, which degrades
performance and risks leaving rows inconsistent. e.g., changing the IT dept_name to IS.
• Delete Anomaly - deleting wanted information. e.g., deleting the IT department's
rows removes employee Barbara Jones from the database.
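The update and delete anomalies can be reproduced directly on the Employee (1NF) data. This is an illustrative sketch assuming Python's sqlite3 module and a hypothetical table name employee_1nf; it is not part of the original slides.

```python
import sqlite3

rows_1nf = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
create table employee_1nf
(emp_no integer, name text, dept_no integer, dept_name text, skills text,
 primary key (emp_no, skills))
""")
conn.executemany("insert into employee_1nf values (?, ?, ?, ?, ?)", rows_1nf)

# Update anomaly: one logical rename rewrites every row that repeats the name.
rows_touched = conn.execute(
    "update employee_1nf set dept_name = 'IS' where dept_no = 224"
).rowcount

# Delete anomaly: removing the department's rows also removes its only employee.
conn.execute("delete from employee_1nf where dept_no = 224")
remaining = [name for (_, name) in conn.execute(
    "select distinct emp_no, name from employee_1nf order by emp_no"
)]

print(rows_touched, remaining)
```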
Third Normal Form (3NF)
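Remove transitive dependencies.
• Transitive dependence - a non-key attribute depends on the key
only through another non-key attribute, which usually means two
separate entities exist within one table.
• Any transitively dependent attributes are moved into a smaller
(subset) table.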
3NF further improves data integrity.
• Prevents update, insert, and delete anomalies.
Transitive Dependence
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
Dept_no and dept_name are functionally dependent on
emp_no; however, dept_name depends on emp_no only through
dept_no (emp_no -> dept_no -> dept_name), so department can be
considered a separate entity.
3NF
Employee (2NF)
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
Employee (3NF)
emp_no  name           dept_no
1       Kevin Jacobs   201
2       Barbara Jones  224
3       Jake Rivera    201
Department (3NF)
dept_no  dept_name
201      R&D
224      IT
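With the department split out, a rename touches a single row and a join still recovers each employee's department name. A minimal sketch, assuming Python's sqlite3 module and SQLite column types rather than the Oracle syntax used elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table department
(dept_no integer primary key,
 dept_name text);
create table employee
(emp_no integer primary key,
 name text,
 dept_no integer references department (dept_no));
insert into department values (201, 'R&D'), (224, 'IT');
insert into employee values
 (1, 'Kevin Jacobs', 201), (2, 'Barbara Jones', 224), (3, 'Jake Rivera', 201);
""")

# The department name is stored once, so the rename is a one-row update.
rows_touched = conn.execute(
    "update department set dept_name = 'IS' where dept_no = 224"
).rowcount

# A join on dept_no recovers the dept_name column of the 2NF table.
staff = conn.execute("""
select e.name, d.dept_name
from employee e
join department d on d.dept_no = e.dept_no
order by e.emp_no
""").fetchall()

print(rows_touched, staff)
```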
Other Normal Forms
Boyce-Codd Normal Form (BCNF)
• Strengthens 3NF by requiring the determinant (left-hand side)
of every non-trivial functional dependency to be a superkey (a
column or columns that uniquely identify a row)
Fourth Normal Form (4NF)
• Eliminate non-trivial multivalued dependencies whose
determinant is not a superkey.
Fifth Normal Form (5NF)
• Eliminate join dependencies not implied by the candidate keys.
Normalizing our webstore (1NF)
orders
order_id  cust_id  item_id  quantity  cost  date     status
405       45       34       2         100   2/3/06   shipped
405       45       35       1         50    2/3/06   shipped
405       45       56       3         75    2/3/06   shipped
408       78       56       2         50    3/5/06   refunded
410       102      72       2         150   3/10/06  shipped
410       102      81       1         175   3/10/06  shipped
items
item_id  name          price  inventory
34       sweater red   50     21
35       sweater blue  50     10
56       t-shirt       25     76
72       jeans         75     5
81       jacket        175    9
customers
cust_id  name          address         credit_card_num  credit_card_type
45       Mike Speedy   123 A St.       45154            visa
45       Mike Speedy   123 A St.       32499            mastercard
45       Mike Speedy   123 A St.       12834            discover
78       Frank Newmon  2 Main St.      45698            visa
102      Joe Powers    343 Blue Blvd.  94065            mastercard
102      Joe Powers    343 Blue Blvd.  10532            discover
Normalizing our webstore (2NF & 3NF)
customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.
credit_cards
cust_id  num    type
45       45154  visa
45       32499  mastercard
45       12834  discover
78       45698  visa
102      94065  mastercard
102      10532  discover
Normalizing our webstore (2NF & 3NF)
items
item_id  name          price  inventory
34       sweater red   50     21
35       sweater blue  50     10
56       t-shirt       25     76
72       jeans         75     5
81       jacket        175    9
orders
order_id  cust_id  date     status
405       45       2/3/06   shipped
408       78       3/5/06   refunded
410       102      3/10/06  shipped
order details
order_id  item_id  quantity  cost
405       34       2         100
405       35       1         50
405       56       3         75
408       56       2         50
410       72       2         150
410       81       1         175
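In the sample data every cost equals quantity times the item's price, so order totals can be computed by joining order details to items. The sketch below assumes Python's sqlite3 module, keeps only the columns the query needs, and leaves out the stored cost column to show that it is derivable; those simplifications are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table items
(item_id integer primary key, name text, price integer, inventory integer);
create table orders
(order_id integer primary key, cust_id integer);
create table order_details
(order_id integer references orders (order_id),
 item_id integer references items (item_id),
 quantity integer,
 primary key (order_id, item_id));
insert into items values
 (34, 'sweater red', 50, 21), (35, 'sweater blue', 50, 10),
 (56, 't-shirt', 25, 76), (72, 'jeans', 75, 5), (81, 'jacket', 175, 9);
insert into orders values (405, 45), (408, 78), (410, 102);
insert into order_details values
 (405, 34, 2), (405, 35, 1), (405, 56, 3),
 (408, 56, 2), (410, 72, 2), (410, 81, 1);
""")

# Total per order, derived from quantity and the item's price.
order_totals = conn.execute("""
select d.order_id, sum(d.quantity * i.price)
from order_details d
join items i on i.item_id = d.item_id
group by d.order_id
order by d.order_id
""").fetchall()

print(order_totals)
```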
Revisit webstore ER diagram
[Diagram: Customers (name, address) 1 -- have -- N Credit card (card type, card number); Customers 1 -- purchase -- N Orders (status, date); Orders 1 -- consists -- N Order details (quantity, cost); Order details M -- consists -- N Items (name, price, inventory)]
Structured Query Language
SQL is the standard language for data definition
and data manipulation for relational database
systems.
• Nonprocedural
• Universal
Data Definition Language
The aspect of SQL that defines and manipulates
objects in a database.
• create tables
• alter tables
• drop tables
• create views
Create Table
[Diagram: Customer (name, address) 1 -- have -- N Credit card (card type, card number)]
create table customer
(cust_id number,
name varchar(50) not null,
address varchar(256) not null,
primary key (cust_id));
create table credit_card
(cust_id number not null,
credit_card_type varchar(10) not null,
credit_card_num number not null,
foreign key (cust_id) references customer);
Modifying Tables
alter table customer modify name varchar(256);
alter table customer add credit_limit number;
drop table customer;
Data Manipulation Language
The aspect of SQL used to manipulate the data in
a database.
• queries
• updates
• inserts
• deletes
Select command
Used to query data from database tables.
• Format:
Select <columns> From <table>
Where <condition>;
Query example
customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.
Select name from customers;
result:
Mike Speedy
Frank Newmon
Joe Powers
Query example
customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.
select name from customers
where address = '123 A St.';
result:
Mike Speedy
Query example
customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.
credit_cards
cust_id  num    type
45       45154  visa
45       32499  mastercard
45       12834  discover
78       45698  visa
102      94065  mastercard
102      10532  discover
select * from customers, credit_cards
where customers.cust_id = credit_cards.cust_id
and type = 'visa';
returns:
cust_id  name          address     cust_id  num    type
45       Mike Speedy   123 A St.   45       45154  visa
78       Frank Newmon  2 Main St.  78       45698  visa
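A join query has to name both tables in its from clause. The same query can be sketched with explicit join ... on syntax and run against the sample rows; this assumes Python's sqlite3 module, not the Oracle environment the slides target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers
(cust_id integer primary key, name text, address text);
create table credit_cards
(cust_id integer references customers (cust_id), num integer, type text);
insert into customers values
 (45, 'Mike Speedy', '123 A St.'),
 (78, 'Frank Newmon', '2 Main St.'),
 (102, 'Joe Powers', '343 Blue Blvd.');
insert into credit_cards values
 (45, 45154, 'visa'), (45, 32499, 'mastercard'), (45, 12834, 'discover'),
 (78, 45698, 'visa'), (102, 94065, 'mastercard'), (102, 10532, 'discover');
""")

# Both tables appear in the from clause; the on condition pairs matching rows.
visa_holders = conn.execute("""
select *
from customers
join credit_cards on customers.cust_id = credit_cards.cust_id
where credit_cards.type = 'visa'
order by customers.cust_id
""").fetchall()

for row in visa_holders:
    print(row)
```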
Changing Data
There are 3 commands that change data in a
table.
Insert:
insert into <table> (<columns>) values (<values>);
insert into customer (cust_id, name) values (3, 'Fred Flintstone');
Update:
update <table> set <column> = <value> where <condition>;
update customer set name = 'Mark Speedy' where cust_id = 45;
Delete:
delete from <table> where <condition>;
delete from customer where cust_id = 45;
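The three commands can be run in sequence to see their effect on a table. This sketch assumes Python's sqlite3 module and passes values as bound parameters; the address column is left nullable here so the two-column insert succeeds (with a not null constraint on address, the insert would also have to supply one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# address is nullable in this sketch (assumption) so the insert below is valid.
conn.execute("""
create table customer
(cust_id integer primary key,
 name text not null,
 address text)
""")
conn.execute("insert into customer values (45, 'Mike Speedy', '123 A St.')")

# Insert a new row, then change the name on an existing row.
conn.execute("insert into customer (cust_id, name) values (?, ?)",
             (3, "Fred Flintstone"))
conn.execute("update customer set name = ? where cust_id = ?",
             ("Mark Speedy", 45))
after_update = conn.execute(
    "select cust_id, name from customer order by cust_id").fetchall()

# Delete the row that was just updated.
conn.execute("delete from customer where cust_id = ?", (45,))
after_delete = conn.execute(
    "select cust_id, name from customer order by cust_id").fetchall()
conn.commit()

print(after_update)
print(after_delete)
```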
Star Schemas
Designed for data retrieval
• Best for use in decision support tasks such as Data
Warehouses and Data Marts.
• Denormalized - allows for faster querying due to fewer
joins.
• Slow performance for insert, delete, and update
transactions.
• Comprised of two types of tables: facts and dimensions.
Fact Table
The main table in a star schema is the Fact table.
• Contains groupings of measures of an event to be
analyzed.
•Measure - numeric data
Invoice Facts
units sold
unit amount
total sale price
Dimension Table
Dimension tables are groupings of descriptors
and measures of the fact.
•descriptor - non-numeric data
Customer Dimension
cust_dim_key
name
address
phone
Location Dimension
loc_dim_key
store number
store address
store phone
Time Dimension
time_dim_key
invoice date
due date
delivered date
Product Dimension
prod_dim_key
product
price
cost
Star Schema
The fact table forms a one to many relationship with each
dimension table.
[Diagram: Invoice Facts (cust_dim_key, loc_dim_key, time_dim_key, prod_dim_key, units sold, unit amount, total sale price) in the center; Customer Dimension (cust_dim_key, name, address, phone), Location Dimension (loc_dim_key, store number, store address, store phone), Time Dimension (time_dim_key, invoice date, due date, delivered date), and Product Dimension (prod_dim_key, product, price, cost) each 1 -- N Invoice Facts]
Analyzing the webstore
The manager needs to analyze the orders
obtained from the webstore.
• From this we will use the order table to create our fact
table.
Order Facts
date
items
customers
Webstore Dimension
We have 2 dimensions for the schema:
customers and items.
Customer Dimension
cust_dim_key
name
address
credit_card_type
Item Dimension
item_dim_key
name
price
inventory
Webstore Star Schema
[Diagram: Order Facts (date, items, customers) in the center; Customer Dimension (cust_dim_key, name, address, credit_card_type) 1 -- N Order Facts; Item Dimension (item_dim_key, name, price, inventory) 1 -- N Order Facts]
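A star schema query typically joins the fact table to one dimension and aggregates a measure. The sketch below loads the webstore sample data into a fact table and two dimensions, assuming Python's sqlite3 module. The quantity and cost measures on order_facts are assumptions added for illustration, and the date and credit_card_type columns are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customer_dim
(cust_dim_key integer primary key, name text, address text);
create table item_dim
(item_dim_key integer primary key, name text, price integer, inventory integer);
create table order_facts
(item_dim_key integer references item_dim (item_dim_key),
 cust_dim_key integer references customer_dim (cust_dim_key),
 quantity integer,  -- assumed measure
 cost integer);     -- assumed measure
insert into customer_dim values
 (45, 'Mike Speedy', '123 A St.'),
 (78, 'Frank Newmon', '2 Main St.'),
 (102, 'Joe Powers', '343 Blue Blvd.');
insert into item_dim values
 (34, 'sweater red', 50, 21), (35, 'sweater blue', 50, 10),
 (56, 't-shirt', 25, 76), (72, 'jeans', 75, 5), (81, 'jacket', 175, 9);
insert into order_facts values
 (34, 45, 2, 100), (35, 45, 1, 50), (56, 45, 3, 75),
 (56, 78, 2, 50), (72, 102, 2, 150), (81, 102, 1, 175);
""")

# One join per dimension used: total cost grouped by item.
sales_by_item = conn.execute("""
select i.name, sum(f.cost)
from order_facts f
join item_dim i on i.item_dim_key = f.item_dim_key
group by i.item_dim_key, i.name
order by i.item_dim_key
""").fetchall()

# The same fact rows grouped by customer instead.
sales_by_customer = conn.execute("""
select c.name, sum(f.cost)
from order_facts f
join customer_dim c on c.cust_dim_key = f.cust_dim_key
group by c.cust_dim_key, c.name
order by c.cust_dim_key
""").fetchall()

print(sales_by_item)
print(sales_by_customer)
```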
Books and Reference
•Database Design for Mere Mortals, Michael J. Hernandez
•Information Modeling and Relational Databases, Terry Halpin
•Database Modeling and Design, Toby J. Teorey
Continuing Education
UCSD Extension
Data Management Courses
DBA Certificate Program
Database Application Developer Certificate Program
Data Central
The Data Services Group provides Data Allocations for
the research community.
• http://datacentral.sdsc.edu/
•Tools and expertise for making data collections
available to the broader scientific community.
•Provide disk, tape, and database storage resources.