Week 11 November 7 • Data Normalization and ERD

advertisement
1
Week 11
November 7
• Data Normalization and ERD
• Conceptual, Logical and Physical Database Design
R. Ching, Ph.D. • MIS • California State University, Sacramento
Data Normalization
2
• The purpose of normalization is to produce a stable set of
relations that is a faithful model of the operations of the
enterprise.
– Achieve a design that is highly flexible
– Reduce redundancy
– Ensure that the design is free of certain update,
insertion and deletion anomalies
Catherine Richardo, 1990
R. Ching, Ph.D. • MIS • California State University, Sacramento
Normalization
3
1NF
Flat file
2NF
Partial dependencies removed
3NF
Transitive dependencies removed
BCNF
Every determinant is a candidate key
4NF
Non-tivial multi-valued dependencies
removed
R. Ching, Ph.D. • MIS • California State University, Sacramento
Order No.
Date:
Stereos To Go
6 / 15 / 99
Address:
Invoice
John Smith
2036-26 Street
Sacramento
CA 95819
City
Date Shipped:
Item
Number
Product
Code
State
0000 000 0000 0
John Smith
1/05
Zip Code
6 / 18 / 99
Product Description/Manufacturer
1
SAGX730
Pioneer Remote A/V Receiver
2
AT10
CDPC725
Cervwin Vega Loudspeakers
Sony Disc-Jockey CD Changer
3
Go, Hogs
Stereos To Go
0000-000-0000-0
Account No.
Customer:
10001
Qty
Price
1 56995
1 35995
1 39995
4
5
Subtotal
Shipping & Handling
Sales Tax
Total
R. Ching, Ph.D. • MIS • California State University, Sacramento
132985
10000
10306
153291
4
Unnormalized Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account
Cust_name Cust_addr Cust_city Cust_state Zip_code,
Item1 Item1_descrip Item1_qty Item1_price,
Item2 Item2_descrip Item2_qty Item2_price, . . . ,
Item7 Item7_descrip Item7_qty Item7_price)
How would a program process the data to recreate the invoice?
R. Ching, Ph.D. • MIS • California State University, Sacramento
5
Unnormalized to 1NF
6
(Invoice_number, Invoice_date, Date_delivered, Cust_account
Cust_name Cust_addr Cust_city Cust_state Zip_code,
Item1, Item1_descrip, Item1_qty, Item1_price,
Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Repeating groups
Item7, Item7_descrip, Item7_qty, Item7_price)
A flat file places all the data of a transaction into a single record.
This is reminiscent of a COBOL or BASIC program
processing a single transaction with one read statement.
R. Ching, Ph.D. • MIS • California State University, Sacramento
Unnormalized to 1NF
7
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
Item, Item_descrip, Item_qty, Item_price)
Nominated group of attributes
to serve as the key
(form a unique combination)
• Eliminate the repeating groups.
• Each row retains data for one item.
• If a person bought 5 items, we
would have five tuples
R. Ching, Ph.D. • MIS • California State University, Sacramento
1NF
8
Flat File
Item
Description
Item Item
Quantity Price
10001 123456 John Smith ••• SAGX730 Pioneer Remote A/V Rec
10001 123456 John Smith •••
AT10
1
569.95
Cerwin Vega Loudspeakers 1
359.95
10001 123456 John Smith ••• CDPC725 Sony Disc Jockey CD
1
399.95
10001 123456 John Smith ••• S/H
Shipping
1
100.00
10001 123456 John Smith ••• Tax
Sales Tax
1
103.06
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 1NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code,
Item, Item_descrip, Item_qty, Item_price)
Functional dependencies and determinants
Example: item_descrip is functionally dependent on item,
such that item is the determinant of item_descript.
R. Ching, Ph.D. • MIS • California State University, Sacramento
9
From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
(Item, Item_descrip, Item_qty, Item_price)
Is this unique by itself?
What happens if the item is purchased more than once?
R. Ching, Ph.D. • MIS • California State University, Sacramento
10
From 1NF to 2NF
11
(Invoice_number, Invoice_date, Date_delivered,
Cust_account, Cust_name, Cust_addr, Cust_city,
Cust_state, Zip_code)
Partial dependency
(Invoice_number, Item, Item_descrip, Item_qty, Item_price)
Composite key (forms a unique combination)
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 1NF to 2NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
R. Ching, Ph.D. • MIS • California State University, Sacramento
12
From 2NF to 3NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account,
Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
(Invoice_number, Item, Item_qty, Item_price)
(Item, Item_descrip)
Which attributes are dependent on others?
Is there a problem?
R. Ching, Ph.D. • MIS • California State University, Sacramento
13
Transitive Dependencies and Anomalies
• Insertion anomalies
– To add a new row, all customer (name, address, city,
state, zip code, phone) and products (description) must
be consistent with previous entries
• Deletion anomalies
– By deleting a row, a customer or product may cease to
exist
• Modification anomalies
– To modify a customer’s or product’s data in one row, all
modifications must be carried out to all others
R. Ching, Ph.D. • MIS • California State University, Sacramento
14
Insertion and Modification Anomalies
For example…
15
Insert a new Panasonic product
Product_code Manufacturer_name
DVD-A110
PV-4210
PV-4250
Panasonic
Panasonic
Panasonic
CT-32S35
PAN
Inconsistency
DVD-A110
PV-4210
PV-4250
CT-32S35
Panasonic
PanaSonic
Pana Sonic
PAN
R. Ching, Ph.D. • MIS • California State University, Sacramento
Change all Panasonic
products’ manufacturer
name to “Panasonic USA”
Deletion Anomaly
For Example…
4377182
4398711
4578461
4873179
John Smith
Arnold S
Gray Davis
Lisa Carr
16




Sacramento
Davis
Sacramento
Reno
CA
CA
CA
NV
95831
95691
95831
89557
By deleting customer Arnold S, we would also be deleting
Davis, California.
R. Ching, Ph.D. • MIS • California State University, Sacramento
Transitive
Dependencies
 A condition where A, B, C
are attributes of a relation
such that if A  B and
B  C, then C is transitively
dependent on A via B
(provided that A is not
functionally dependent on B
or C).
Invoice_number
Invoice_date
Date_delivered
Cust_account
Cust_name
Cust_addr
Cust_city
Cust_state
Zip_code
Item
Item_descrip
Invoice_number+Item
Item_qty
Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
17
Why Should City and State Be Separated
from Customer Relation?
• City and state are dependent on zip code for their values
and not the customer’s identifier (i.e., key).
Zip_code  City, State
• Otherwise,
Cust_account  Cust_addr, Zip_code  City, State
In which case, you have transitive dependency.
R. Ching, Ph.D. • MIS • California State University, Sacramento
18
3NF
Invoice Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation
(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation
(Zip_code, City, State)
Invoice_items Relation
(Invoice_number, Item, Item_qty, Item_price)
Items Relation
(Item, Item_descrip)
R. Ching, Ph.D. • MIS • California State University, Sacramento
19
3NF
20
Invoice Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation
(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation
(Zip_code, City, State)
Invoice_items Relation
(Invoice_number, Item, Item_qty, Item_price)
Items Relation
Manufacturers Relation
(Item, Item_descrip)
(Manuf_code, Manuf_name)
Since the Items relation contains the manufacturer’s name in the
description, a separate Manufacturers relation can be created
R. Ching, Ph.D. • MIS • California State University, Sacramento
21
R. Ching, Ph.D. • MIS • California State University, Sacramento
First to Third Normal Form
(1NF - 3NF)
• 1NF: A relation is in first normal form if and only if every
attribute is single-valued for each tuple (remove the
repeating or multi-value attributes and create a flat file)
• 2NF: A relation is in second normal form if and only if it
is in first normal form and the nonkey attributes are fully
functionally dependent on the key (remove partial
dependencies)
• 3NF: A relation is in third normal form if it is in second
normal form and no nonkey attribute is transitively
dependent on the key (remove transitive dependencies)
R. Ching, Ph.D. • MIS • California State University, Sacramento
22
23
Putting It Together
ERD of the Normalized Data Model
R. Ching, Ph.D. • MIS • California State University, Sacramento
3NF
24
Invoice Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation
(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation
(Zip_code, City, State)
Invoice_items Relation
(Invoice_number, Item, Item_qty, Item_price)
Items Relation
Manufacturers Relation
(Item, Item_descrip, Manuf_code)
(Manuf_code, Manuf_name)
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERD
Invoices
Invoice_number
Invoice_date
Date_delivered
Cust_account
Invoice_items
Invoice_number
Item
Item_qty
Item_price
25
Customers
Cust_account
Cust_name
Cust_addr
Zip_code
Items
Item
Item_descrip
Manuf_code
R. Ching, Ph.D. • MIS • California State University, Sacramento
Zip_Codes
Zip_code
City
State
Manufacturers
Manuf_code
Manuf_name
ERD
Invoices
Invoice_number
Invoice_date
Date_delivered
Cust_account
26
Order
(0..*)
(1..1)
Customers
Cust_account
Cust_name
Cust_addr
Zip_code
Locate
(0..*)
(1..1)
Zip_Codes
Zip_code
City
State
(1..1)
 Have
(1..*)
Invoice_items
Items
Invoice_number Appear on Item
(0..*)
(1..1) Item_descrip
Item
Manuf_code
Item_qty
Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
Manufacturers
Produce
(0..*)
(1..1)
Manuf_code
Manuf_name
ERD
Invoices
Invoice_number
Invoice_date
Date_delivered
Cust_account
(1..1)
 Have
(1..*)
Invoice_items
27
Order
(0..*)
(1..1)
Customers
Cust_account
Cust_name
Cust_addr
Zip_code
Locate
(0..*)
(1..1)
Zip_Codes
Zip_code
City
State
Partial
Zip codes locate Customers.
() A zip code can be related to a
minimum of zero and a maximum of
Manufacturers
Items
many customers.
Produce
() A customer
can be related
to a
Item
Manuf_code
(1..1)
Item_descrip
minimum and(0..*)
maximum
of Manuf_name
one zip code.
Invoice_number Appear on
(0..*)
(1..1)
Item
Manuf_code
Item_qty
Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERD
Invoices
Invoice_number
Invoice_date
Date_delivered
Cust_account
28
Order
(0..*)
(1..1)
Partial
Customers
Cust_account
Cust_name
Cust_addr
Zip_code
Locate
(0..*)
(1..1)
Zip_Codes
Zip_code
City
State
(1..1)
Customers order (items) on invoices.
 Have
() A customer can be related to a
(1..*)
minimum
of zero and a maximum of
Invoice_items
Items
many invoices.
() An invoiceAppear
can be on
related
Item to a
Invoice_number
(0..*)
(1..1)ofItem_descrip
Item
minimum and maximum
one
Manuf_code
Item_qty
customer.
Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
Manufacturers
Produce
(0..*)
(1..1)
Manuf_code
Manuf_name
ERD
Invoices
Invoice_number
Invoice_date
Date_delivered
Cust_account
29
Customers
Zip_Codes
Order
Cust_account
Locate
Zip_code
(0..*)
(1..1) Cust_name
(0..*)
(1..1) City
Cust_addr
State
Invoices
possess
invoice
items
Zip_code
() An invoice can be related to a
(1..1)
minimum of one and a maximum of
 Have Mandatory many invoice items.
(1..*)
() An invoice item can be related to a
minimum
invoice.
Invoice_items
Items and maximum of oneManufacturers
Invoice_number Appear on Item
(0..*)
(1..1) Item_descrip
Item
Manuf_code
Item_qty
Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
Produce
(0..*)
(1..1)
Manuf_code
Manuf_name
ERD
30
Invoices
Customers
Items are sold on invoice items.
Invoice_number
() An item canOrder
be relatedCust_account
to a
Invoice_date
Cust_name
(0..*)
(1..1)
minimum
of
zero
and
a
maximum
of
Date_delivered
Cust_addr
many invoice items.
Cust_account
Zip_code
Locate
(0..*)
(1..1)
Zip_Codes
Zip_code
City
State
()
An invoice item can be related to a
(1..1)
minimum and maximum of one item.
 Have
(1..*)
Partial
Invoice_items
Items
Invoice_number Appear on Item
(0..*)
(1..1) Item_descrip
Item
Manuf_code
Item_qty
Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
Manufacturers
Produce
(0..*)
(1..1)
Manuf_code
Manuf_name
ERD
Invoices
Invoice_number
Invoice_date
Date_delivered
Cust_account
(1..1)
 Have
31
Order
(0..*)
(1..1)
Manufacturers produce items.
Customers
Zip_Codes
()
A manufacturer can be related
to a
Cust_account
Zip_code
minimum
of zeroLocate
and a maximum
of
Cust_name
(0..*)
(1..1) City
many
items.
Cust_addr
State
()
An
item
can
be
related
to
a
Zip_code
minimum and maximum of one
manufacturer.
(1..*)
Partial
Invoice_items
Items
Invoice_number Appear on Item
(0..*)
(1..1) Item_descrip
Item
Manuf_code
Item_qty
Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
Manufacturers
Produce
(0..*)
(1..1)
Manuf_code
Manuf_name
32
Higher Forms of Data Normalization
•
•
•
•
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain Key Normal Form (DKNF)
R. Ching, Ph.D. • MIS • California State University, Sacramento
Boyce-Codd Normal Form (BCNF)
• A relation is in Boyce-Codd normal form if and only if
every determinant is a candidate key
A determines B
Attribute A
Attribute B
Determinant
(B is functionally dependent on A)
• For a relation with only one candidate key, 3NF and BCNF
are equivalent.
• Usually occurs when keys of different relations overlap
R. Ching, Ph.D. • MIS • California State University, Sacramento
33
BCNF Example
User (UserID, Dept, Name, ComputerID, EmpClassification)
ComputerID  Dept (a department issues a computer)
UserID, Dept  ComputerID, Name, EmpCassification
(Employees may have the same name and
UserIDs are unique within the department only)
UserID, ComputerID  Dept, Name, EmpClassification
BCNF
UserComputer (ComputerID, Dept)
User (UserID, ComputerID, Name, EmpClassification)
R. Ching, Ph.D. • MIS • California State University, Sacramento
34
From 3NF to BCNF
35
Invoice Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation
(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation
(Zip_code, City, State)
Invoice_items Relation
(Invoice_number, Item, Item_qty, Item_price)
Items Relation
(Item, Item_descrip)
Manufacturers Relation
(Manuf_code, Manuf_name)
R. Ching, Ph.D. • MIS • California State University, Sacramento
Candidate keys?
Fourth Normal Form (4NF)
• A relation is in fourth normal form if and only if it is in
Boyce-Codd normal form and there are no nontrivial
dependencies.
– Identify all determinants and make sure they are
candidate keys
R. Ching, Ph.D. • MIS • California State University, Sacramento
36
4NF Example
37
Employee (EmployeeID, Dept, Project)
Matrix
management
100
100
100
102
102
102
Finance
Marketing
Finance
Finance
Marketing
Finance
4NF
F177-99
F177-99
F288-00
F288-00
F177-99
F177-99
Employee (EmployeeID, Dept)
Projects (EmployeeID, Project)
R. Ching, Ph.D. • MIS • California State University, Sacramento
Multivalued
dependencies
4NF Example
Matrix
management
100
100
100
102
102
102
100
100
102
102
38
Finance
Marketing
Finance
Finance
Marketing
Finance
4NF
Finance
Marketing
Finance
Marketing
R. Ching, Ph.D. • MIS • California State University, Sacramento
F177-99
F177-99
F288-00
F288-00
F177-99
F177-99
100
100
102
102
Multivalued
dependencies
F177-99
F288-00
F288-00
F177-99
Fifth Normal Form (5NF)
aka Project-Join NF
39
• A relation is in fifth normal form if no remaining nonloss
projections (i.e., all projects preserve all information
contained in the original relation)are possible, except the
trivial one in which the key appears in each project.
– The join of all projects will result in the original
relation
– No systematic method exists for obtaining 5NF or for
ensuring that a set of relations is indeed 5NF
Ricardo, 1990
R. Ching, Ph.D. • MIS • California State University, Sacramento
Domain-Key Normal Form (DKNF)
• A relation is in domain-key normal form if every constraint
is a logical consequence of domain constraints or key
constraints (i.e., all possible values are a result of an
imposed constraint)
– There is no proven method of converting a design to
DKNF, so it remains an ideal rather than a state that can
readily be achieved
Ricardo, 1990
R. Ching, Ph.D. • MIS • California State University, Sacramento
40
DKNF
41
For example:
Emp_ID, Emp_name, Classification, Position, Salary
Domain for Classification:
• Executive
• Manager
• Staff
R. Ching, Ph.D. • MIS • California State University, Sacramento
Domain for Position
• Strategic Planner
• CIO
• Vice President
Domain for Position
• Programmer/Analyst I
• Programmer/Analyst II
• Database/Analyst I
Database Design Methodology
Conceptual database design
• Build conceptual representation of the
database
Logical database design
• Translate conceptual representation to
logical structure of the database
Physical database design
• Operatioanlize logical structure in a
physical implementation
R. Ching, Ph.D. • MIS • California State University, Sacramento
42
Conceptual Database Design
• The process of constructing a model of the data used in an
enterprise, independent of all physical considerations
• What’s involved…
–
–
–
–
–
–
–
–
Identify entity types, relationship types
Identify and associate attributes with entity or relationship types
Determine attribute domains
Determine candidate, primary and alternate key attributes
Consider use of enhanced modeling concepts
Check model for redundancies
Validate conceptual model against user transactions
Review conceptual data model with the users
R. Ching, Ph.D. • MIS • California State University, Sacramento
43
Logical Database Design
• The process of constructing a model of the data used in an
enterprise based on a specific data model, but independent
of a particular DBMS and other physical considerations
• What’s involved…
–
–
–
–
–
–
–
Derive relations for logical data model
Validate relations using data normalization
Validate relations against user transactions
Check integrity constraints
Review logical data model (ERD) with the users
Merge logical data models into global data model
Check for future growth
R. Ching, Ph.D. • MIS • California State University, Sacramento
44
Gather Information
• Meet with the users to get gather information
– Interviews
– Documents
R. Ching, Ph.D. • MIS • California State University, Sacramento
45
Derive Relations
46
 Invoices have invoice items
Invoice Items
 One-to-many relationship
Have
1..1
Invoice number (pk) 1..*
Product code (pk)
Manufacture code
Quantity
 Mandatory (all
Sales Price
 Weak entity type
(Invoice number is
part of key)
invoices must have
at least one invoice
item
Invoice
Invoice number (pk)
Invoice date
Delivery date
Sales type
Customer account
 Strong entity type
• Strong and weak entity types
• Relationship types (cardinality)
• Participation (mandatory vs. partial)
R. Ching, Ph.D. • MIS • California State University, Sacramento
Validate Relations
• Normalize relations
• Validate against transactions - Can a transaction be
recreated given the data retained in the relations?)
• Check integrity constraints
– Required data (not null)
– Domain constraints (in, references)
– Multiplicity
– Entity integrity (primary key)
– Referential integrity (foreign key)
– General constraints (business rules)
R. Ching, Ph.D. • MIS • California State University, Sacramento
47
Review Data Model with the Users
• Be pleasant and professional, not arrogant, challenging or
condescending
– Not everyone is receptive to change
– Your role is to facilitate change
• The user is always “right” – It’s his/her data
• Document all change requests (CYA)
• Listen, listen, listen… (Even if you don’t agree)
R. Ching, Ph.D. • MIS • California State University, Sacramento
48
Logical Global Data Model
49
Invoice
Records Transactions
Inventory
Counts and retail prices
Local
Data
Models
Cust Accounts
Customer credit accounts
Cust Billing
Global Data
Model
Customer credit sales
Vendor History
Vendor performance
Product Sales
Sales history
R. Ching, Ph.D. • MIS • California State University, Sacramento
Local data models are merged
to create a (near) normalized
global data model
Physical Database Design
• The process of producing a description of the
implementation of the database on secondary storage
• It describes the base relations, fle organizations and
indexes used to achieve efficient access to the data and nay
associated integrity constraints and security measures
• What’s involved…
– Translate logical data model for target DBMS: Design base
relations, representation of derived data and general constraints
– Design file organizations and indexes: Analyze transactions,
choose file organizations, choose indexes, Dictated by the DB product
estimate disk space requirements
– Design user views and security mechanisms
– Consider the introduction of controlled redundancy
– Monitor and tune the operational system
R. Ching, Ph.D. • MIS • California State University, Sacramento
50
Logical vs. Physical Database Design
• Logical
The process of constructing a model of the information use
the enterprise based on one model of data, BUT
independent of a particular DBMS and other physical
aspects.
• Physical
The process of producing a description of the
implementation of the database on secondary storage; it
describes the storage structures and access methods used to
gain access effectively.
Whereas the logical database design is concerned with the
what, physical database design is concerned with the how.
R. Ching, Ph.D. • MIS • California State University, Sacramento
51
Physical Database Design
Five steps:
• Translate the global (enterprise) logical data model for the
target DBMS
• Design files organizations and indexes, estimate database
space (disk space requirements)
• Design and implement user views and security
mechanisms
• Consider the introduction of controlled redundancy
(denormalization)
• Monitor and tune the operational system
R. Ching, Ph.D. • MIS • California State University, Sacramento
52
Translate the Global Logical Database Model
for the Target DBMS
• Design the relations for the target DBMS
– Decide how to represent the base relations in the global
logical data model in the target DBMS
• Specify keys (primary, foreign), default values,
integrity constraints (table, column), and indexes
• Design integrity rules for the target DBMS
– Design the enterprise constraints for the target DBMS
• Applies to updates and inserts
R. Ching, Ph.D. • MIS • California State University, Sacramento
53
Design and Implement the Physical
Representation
• Determine the file organizations and access methods that
will be used to store the base relations (i.e., the way in
which relations and tuples will be held in secondary
storage)
Depends on the vendor!
– Understand the system resources
• Understand the capabilities of the hardware (CPU,
memory, disk I-O)
• Analyze the software’s performance and limitations
on the network (client/server) and Internet
R. Ching, Ph.D. • MIS • California State University, Sacramento
54
Design and Implement the Physical
Representation
• Analyze the transactions - understand the functionality of
the transactions that will run on the database, and analyze
the import transactions
• Choose file organization
• Choose secondary indexes - determine whether secondary
indexes will enhance performance
– Index the primary key (if it is not the key of the file
organization)
– Do not index small relations
– Add a secondary index to a heavily used secondary key
– Add a secondary index to a frequently used foreign key
R. Ching, Ph.D. • MIS • California State University, Sacramento
55
Design and Implement the Physical
Representation
(cont.)
– AVOID INDEXING AN ATTRIBUTE OR
RELATION THAT IS FREQUENTLY UPDATED
– Avoid indexing an attribute if the query will retrieve a
large portion of the tuples in a relation
– Avoid indexing attributes that consist of long character
strings
R. Ching, Ph.D. • MIS • California State University, Sacramento
56
Design and Implement the Physical
Representation
• Consider the introduction of controlled redundancy
– Determine whether introducing redundancy in a
controlled manner by relaxing the normalization rules
will enhance performance
• Denormalize only when necessary
– However, denormalizing
» Makes implementation more complex
» Sacrifices flexibility
» May slow down updates (although retrievals
may be increased)
R. Ching, Ph.D. • MIS • California State University, Sacramento
57
3NF (Logical Database Design)
58
Invoice Relation
Invoice_number Invoice_date Date_delivered
Cust_account
Customer Relation
Cust_account Cust_name Cust_addr
Zip_code
Zip_code Relation
Zip_code
City
State
Invoice_items Relation
Invoice_number Item Item_qty Item_price
Items Relation
Manufacturers Relation
Item Item_descrip Manuf_code
Manuf_code
R. Ching, Ph.D. • MIS • California State University, Sacramento
Manuf_name
Denormalization
59
• Duplicating attributes or combining relations
– Combining 1:1 relationships
Customers Relation
Cust_account Cust_name Cust_addr
Zip_code
Customer_accounts Relation
Cust_account Account_type Credit_limit Current_balance Pay_history
Customers Relation
Cust_account Cust_name Cust_addr Zip_code
Credit_limit Current_balance Pay_history
R. Ching, Ph.D. • MIS • California State University, Sacramento
Account_type
Denomalization
60
• Duplicating attributes or combining relations
– Duplicating nonkey attributes in 1:M relationships to
reduce joins (creating partial or transitive dependencies)
Customers Relation
Cust_account Cust_name Cust_addr Zip_code
Credit_limit Current_balance Pay_history
Account_type
Zip_codes Relation
Zip_code
City
State
Customers Relation
Cust_account Cust_name Cust_addr City
Account_type Credit_limit Current_balance
R. Ching, Ph.D. • MIS • California State University, Sacramento
State Zip_code
Pay_history
Denomalization
61
(cont.)
– Reference tables (introducing transitive dependencies)
Invoice_items Relation
Invoice_number Item Item_qty Item_price
Items Relation
Manufacturers Relation
Item Item_descrip Manuf_code
Manuf_code
Manuf_name
Problem: In order to know the manufacturer’s name of a
customer’s purchased item, a join between Items and
Manufacturers must be performed
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denomalization
62
(cont.)
– Reference tables (introducing transitive dependencies)
Invoice_items Relation
Invoice_number Item Item_qty Item_price
Items Relation
Manufacturers Relation
Item Item_descrip Manuf_code
Manuf_code
Manuf_name
Invoice_number Item Manuf_code
Item_price
Manuf_name
Item_qty
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denomalization
63
(cont.)
– Duplicating foreign key attributes in 1:M relationships to
reduce joins
Invoice_items Relation
Invoice_number Item Item_qty Item_price
Items Relation
Manufacturers Relation
Item Item_descrip Manuf_code
Manuf_code
Manuf_name
Problem: To find the manufacturer’s name of a product (e.g., Sony
CDP-525) from line_items (relation), two joins must be made:
manufacturers to products, and products to manufacturers.
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denomalization
64
(cont.)
Invoice_items Relation
Invoice_number Item Item_qty Item_price
Items Relation
Manufacturers Relation
Item Item_descrip Manuf_code
Manuf_code
Invoice_number Item Manuf_code
R. Ching, Ph.D. • MIS • California State University, Sacramento
Manuf_name
Item_qty Item_price
Denomalization
65
(cont.)
– Duplicating attributes in M:N relationships to reduce
joins
If joint accounts are allowed and different types of accounts
(i.e., long term, revolving) are available:
Customers Relation
Cust_account Cust_name Cust_addr
Zip_code Soc_Sec_Num
M:N
Customer_accounts Relation
Cust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_Num
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denormalization
66
123456789 John Smith
123456789 Jane Smith
112233445 John Doe
…
...
…
123-45-6789
987-65-4321
567-32-1234
A customer can have
several accounts...
123456789
123456789
543219876
678901234
548794133
…
…
…
…
…
123-45-6789
987-65-7321
123-45-6789
987-65-7321
567-32-1234
R. Ching, Ph.D. • MIS • California State University, Sacramento
An account can have
several owners...
Denormalization
67
(cont.)
Customers Relation
Cust_account Cust_name Cust_addr
Zip_code Soc_Sec_Num
M:N
Customer_accounts Relation
Cust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_Num
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denormalization
68
(cont.)
• Duplicating attributes in M:N relationships to reduce joins
Customers Relation
Cust_account Cust_name Cust_addr
Zip_code Soc_Sec_Num
Customer_accounts Relation
Cust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_Num
Cust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_Num Cust_name
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denomalization
69
(cont.)
– Introducing repeating groups (if the number of
occurrences is known and/or constant)
– Creating extract tables (in an extreme case, an
unnormalized relation) - frees computing resources
Cust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_Num1 Cust_name1 Soc_Sec_num2 Cust_name2
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denomalization
70
(cont.)
– Introduction of “codes” to
• Simplify the composite key
• Retain the original sequence
Invoice_items Relation
Invoice_number Item Manuf_code Item_qty Item_price
Invoice_number Item_number Item Manuf_code Item_qty Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
Invoice
No.
0
0
0
1
O
r
d
e
r
N
o
. 1
1
5
9
3
D
a
t
e
: 6
/
/
A
c
c
o
u
n
t
N
o
.
71
S
t
e
r
e
o
s
T
o
G
o
1
0
0
5
2
5
9
4
1
2
i
l
l
i
a
m
T
e
l
l
C
u
s
t
o
m
e
r
:W
A
d
d
r
e
s
s
:
2
0
3
6
2
6
S
t
r
e
e
t
S
a
c
r
a
m
e
n
t
o
C
A
9
5
8
1
9
C
i
t
y
S
t
a
t
eZ
i
p
C
o
d
e
1
8
9
3
D
a
t
e
S
h
i
p
p
e
d
: 6
/
/
r
o
d
u
c
t
I
t
e
m P
C
o
d
e
N
u
m
b
e
r
P
r
o
d
u
c
t
D
e
s
c
r
i
p
t
i
o
n
/
M
a
n
u
f
a
c
t
u
r
e
r
r
i
c
e
Q
t
y P
S
A
G
X
7
3
0
P
i
o
n
e
e
r
R
e
m
o
t
e
A
/
V
R
e
c
e
i
v
e
r1
5
6
9
9
5
2A
T
1
0 C
e
r
v
w
i
n
V
e
g
a
L
o
u
d
s
p
e
a
k
e
r
s 1
3
5
9
9
5
3C
D
P
C
7
2
5
S
o
n
y
D
i
s
c
J
o
c
k
e
y
C
D
C
h
a
n
g
e
r1
3
9
9
9
5
1
4
5
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denormalization
10001 AT10
10001 CDPC725
10001 SAGX730
CV Loudspeakers
2
SON Disc-Jockey CD Changer 1
PIO Remote A/V Receiver
1
72
359.95
399.95
569.95
Key
Problem: These items are not in the sequence as they appear on
the original document when retrieved from the table.
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denormalization
10001 01 SAGX730 PIO Remote A/V Receiver
1
10001 02 AT10
CV Loudspeakers
2
10001 03 CDPC725 SON Disc-Jockey CD Changer 1
Key
R. Ching, Ph.D. • MIS • California State University, Sacramento
73
569.95
359.95
399.95
Denomalization
74
(cont.)
– Introducing calculated attributes
• Simplify processing
Invoice_items Relation
Invoice_number Item Manuf_code Item_qty Item_price
Invoice_number Item_number Item Manuf_code Item_qty Item_price
Extended_price
Item_qty x Item_price
R. Ching, Ph.D. • MIS • California State University, Sacramento
Denormalization
10001 01 SAGX730 PIO Remote A/V Receiver
1
10001 02 AT10
CV Loudspeakers
2
10001 03 CDPC725 SON Disc-Jockey CD Changer 1
75
569.95 569.95
359.95 719.90
399.95 399.95
Calculation
R. Ching, Ph.D. • MIS • California State University, Sacramento
76
R. Ching, Ph.D. • MIS • California State University, Sacramento
Download