Lecture06_257 - Courses - University of California, Berkeley

Database Design: Logical
Models: Normalization and The
Relational Model
University of California, Berkeley
School of Information
IS 257: Database Management
IS 257 – Fall 2006
2006.09.14 - SLIDE 1
Announcements
• I will be away next week
• Instead we will have an informal workshop
to work on issues of choosing and
designing your personal Databases
Lecture Outline
• Review
– Conceptual Model and UML
• Logical Model for the Diveshop
database
• Normalization
• Relational Advantages and
Disadvantages
DiveShop ER Diagram
[Figure: ER diagram of the DiveShop database. Entities: DiveCust, DiveOrds, DiveItem, DiveStok, ShipVia, Dest, Sites, BioSite, BioLife, ShipWrck. Attributes shown: Customer No, Order No, Item No, ShipVia, Destination Name, Destination no, Site No, Species No. Relationship cardinalities are marked 1, n and 1/n.]
Database Design Process
[Figure: database design process. Boxes: Application 1 to Application 4, each with its own conceptual requirements and its own External Model; Conceptual Model; Logical Model; Internal Model.]
Logical Model: Mapping to a Relational
Model
• Each entity in the ER Diagram becomes a
relation.
• A properly normalized (next time) ER diagram
will indicate where intersection relations for
many-to-many mappings are needed.
• Relationships are indicated by common columns
(or domains) in tables that are related.
• We will examine the tables for the Diveshop
derived from the ER diagram
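A minimal sketch of the mapping described above, using Python's built-in sqlite3 module rather than the Access database used in class. Two entities from the ER diagram become two relations, and the relationship between them is carried by the common column. The snake_case column names and the reduced column set are assumptions made for illustration; the data row is taken from the DIVECUST and DIVEORDS slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Each entity becomes a relation; column names here are illustrative only.
conn.executescript("""
CREATE TABLE DIVECUST (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE DIVEORDS (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES DIVECUST(customer_no),
    sale_date   TEXT
);
""")
conn.execute("INSERT INTO DIVECUST VALUES (1480, 'Louis Jazdzewski')")
conn.execute("INSERT INTO DIVEORDS VALUES (307, 1480, '9/1/99')")

# The relationship is expressed by the shared customer_no column.
rows = conn.execute("""
    SELECT o.order_no, c.name
    FROM DIVEORDS AS o
    JOIN DIVECUST AS c ON o.customer_no = c.customer_no
""").fetchall()
print(rows)
```

The foreign key declaration is optional for the join itself; it only asks the DBMS to reject orders that point at a nonexistent customer.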
Customer = DIVECUST
Customer No | Name | Street | City | State/Prov | Zip/Postal Code | Country | Phone | First Contact
1480 | Louis Jazdzewski | 2501 O'Connor | New Orleans | LA | 60332 | U.S.A. | (902) 555-8888 | 1/29/95
1481 | Barbara Wright | 6344 W. Freeway | San Francisco | CA | 95031 | U.S.A. | (415) 555-4321 | 2/2/93
1909 | Stephen Bredenburg | 559 N.E. 167 Place | Indianapolis | IN | 46241 | U.S.A. | (317) 555-3644 | 1/5/93
1913 | Phillip Davoust | 123 First Street | Berkeley | CA | 94704 | U.S.A. | (415) 555-9184 | 3/9/98
1969 | David Burgett | 320 Montgomery Street | Seattle | WA | 98105 | U.S.A. | (206) 555-7580 | 3/12/99
2001 | Mary Rioux | 1701 Gateway Blvd. #385 | Pueblo | CO | 81002 | U.S.A. | (719) 555-2010 | 3/15/97
2306 | Kim Lopez | 14134 Nottingham Lane | Honolulu | HI | 96826 | U.S.A. | (808) 555-5050 | 1/29/99
2589 | Hiram Marley | 7233 Mill Run Drive | San Francisco | CA | 94123 | U.S.A. | (415) 555-6430 | 2/18/99
3154 | Tanya Kulesa | 505 S. Flower, Mail Stop 48943 | New York | NY | 10032 | U.S.A. | (212) 555-6750 | 1/30/99
3333 | Charles Sekaron | 110 East Park Avenue, Box 8 | Miller | SD | 57362 | U.S.A. | (613) 555-4333 | 3/16/98
3684 | Lowell Lutz | 915 E. Fesler | Dallas | TX | 75043 | U.S.A. | (214) 555-2722 | 2/15/99
4158 | Keith Lucas | 56 South Euclid | Chicago | IL | 60542 | U.S.A. | (312) 555-4310 | 3/17/98
4175 | Karen Ng | 2134 Elmhill Pike | Klamath Falls | OR | 97603 | U.S.A. | (503) 555-4700 | 3/20/99
5510 | Ken Soule | 58 Sansome Street | Aurora | CO | 89022 | U.S.A. | (303) 555-6695 | 2/5/99
Dive Order = DIVEORDS
Order No | Customer No | Sale Date | Ship Via | PaymentMethod | CcNumber | CcExpDate | No Of People | Depart Date | Return Date | Destination | VacationCost
307 | 1480 | 9/1/99 | UPS | Visa | 12345 678 90 | 1/1/01 | 2 | 11/8/00 | 11/15/00 | Fiji | 10000
310 | 1481 | 9/1/99 | FedEx | Check | | | 1 | 4/4/00 | 4/18/00 | Santa Barbara | 6000
313 | 1909 | 9/1/99 | Walk In | Visa | 456456456 | 9/11/00 | 4 | 6/27/00 | 7/11/00 | Cozumel | 8000
314 | 1913 | 9/1/99 | FedEx | Check | | | 3 | 2/7/00 | 2/14/00 | Monterey | 6000
317 | 1969 | 9/1/99 | FedEx | AmEx | 432432432 | 12/31/02 | 4 | 5/9/00 | 5/16/00 | Fiji | 20000
320 | 2001 | 9/1/99 | Walk In | Cash | | | 1 | 10/10/00 | 10/17/00 | Santa Barbara | 3000
321 | 2306 | 9/1/99 | Emery | Master Card | 1112223334 | 8/12/00 | 1 | 3/15/00 | 4/12/00 | New Jersey | 8000
325 | 2589 | 9/1/99 | Emery | AmEx | 332332332 | 12/10/99 | 1 | 3/15/00 | 4/12/00 | New Jersey | 8000
326 | 3333 | 9/1/99 | FedEx | Money Order | | | 2 | 2/10/00 | 2/17/00 | Monterey | 4000
327 | 3684 | 9/1/99 | DHL | Master Card | 122122321 | 11/9/99 | 4 | 3/10/00 | 3/23/00 | Florida | 24000
329 | 4158 | 9/1/99 | Walk In | Cash | | | 1 | 5/4/00 | 5/15/00 | Cozumel | 1571
330 | 4175 | 9/1/99 | FedEx | Check | | | 2 | 7/3/00 | 7/10/00 | Florida | 6000
331 | 5510 | 9/1/99 | FedEx | Money Order | | | 6 | 6/20/00 | 6/30/00 | Santa Barbara | 36000
333 | 5926 | 9/1/99 | DHL | Discover | 123123123 | 12/21/00 | 2 | 6/10/00 | 6/17/00 | Fiji | 10000
336 | 5719 | 9/1/99 | FedEx | Cash | | | 10 | 4/2/00 | 4/24/00 | Great Barrier Reef | 200000
Line item = DIVEITEM
Order No | Item No | Rental/Sale | Qty
307 | 90010 | Rental | 4
307 | 90020 | Rental | 1
307 | 90021 | Rental | 1
307 | 90030 | Rental | 2
307 | 90051 | Rental | 2
310 | 90011 | Rental | 1
310 | 90045 | Rental | 1
310 | 90059 | Rental | 1
310 | 90074 | Rental | 1
310 | 90078 | Rental | 1
313 | 90127 | Sale | 1
314 | 90072 | Rental | 3
314 | 90094 | Rental | 3
314 | 90100 | Rental | 3
317 | 90012 | Sale | 2
Line Note values shown on the slide (the rows they belong to cannot be recovered from this copy):
This is our most popular mask.
These are our best selling fins.
A good weight belt for beginners
Holds 10 cubic feet of cargo.
Shipping information = SHIPVIA
Ship Via | Ship Cost
DHL | 8
Emery | 11
FedEx | 12
UPS | 10
US Mail | 6
Dive Equipment Stock = DIVESTOK
Item No | Description | Equipment Class | On Hand | Reorder Point | Cost | Sale Price | Rental Price
90010 | Shotgun 2 - Clear | Snorkel | 12 | 2 | $18.00 | $30.00 | $2.00
90011 | Shotgun 2 - Red | Snorkel | 12 | 2 | $18.00 | $30.00 | $2.00
90012 | Shotgun 2 - Teal | Snorkel | 11 | 2 | $18.00 | $30.00 | $2.00
90020 | Tri-Vent Mask - Clear | Mask | 14 | 2 | $62.50 | $100.00 | $5.00
90021 | Tri-Vent Mask - Red | Mask | 10 | 2 | $62.50 | $100.00 | $5.00
90022 | Tri-Vent Mask - Teal | Mask | 14 | 2 | $62.50 | $100.00 | $7.00
90023 | Quad Vision Mask - Clear | Mask | 11 | 2 | $48.25 | $80.00 | $7.00
90024 | Quad Vision Mask - Red | Mask | 13 | 2 | $48.25 | $80.00 | $7.00
90025 | Quad Vision Mask - Teal | Mask | 10 | 2 | $48.25 | $80.00 | $10.00
90030 | Sea Wing Fins - Clear | Fins | 12 | 2 | $60.00 | $100.00 | $12.00
90031 | Sea Wing Fins - Red | Fins | 11 | 2 | $60.00 | $100.00 | $12.00
90032 | Sea Wing Fins - Teal | Fins | 12 | 2 | $60.00 | $100.00 | $12.00
90033 | Jet Fin - Black | Fins | 14 | 2 | $30.00 | $60.00 | $10.00
90040 | D350 Second Stage | Regulator | 11 | 1 | $162.50 | $270.00 | $20.00
90041 | G250 Second Stage | Regulator | 13 | 1 | $144.50 | $240.00 | $20.00
90042 | G200 Second Stage | Regulator | 12 | 1 | $105.25 | $175.00 | $20.00
Dive Locations = DEST
Destination No | Destination Name | Avg Temp (F) | Avg Temp (C) | Spring Temp (F) | Spring Temp (C) | Summer Temp (F) | Summer Temp (C) | Fall Temp (F) | Fall Temp (C) | Winter Temp (F) | Winter Temp (C) | Accomodations | Night Life | Body of Water | Travel Cost
1 | Cozumel | 78 | 25.556 | 76 | 24.444 | 84 | 28.889 | 78 | 25.556 | 74 | 23.333 | Cheap | Sleepy | Caribbean | 1000
2 | Great Barrier Reef | 80 | 26.667 | 76 | 24.444 | 84 | 28.889 | 78 | 25.556 | 76 | 24.444 | Moderate | Pleasant | Coral Sea | 5000
3 | Monterey | 60 | 15.556 | 62 | 16.667 | 64 | 17.778 | 64 | 17.778 | 58 | 14.444 | Expensive | Wild | Pacific | 2000
4 | Santa Barbara | 75 | 23.889 | 73 | 22.777 | 78 | 25.556 | 72 | 22.222 | 70 | 21.111 | Expensive | Wild | Pacific | 3000
5 | Florida | 77 | 25 | 75 | 23.889 | 85 | 29.444 | 78 | 25.556 | 70 | 21.111 | Moderate | Pleasant | Caribbean | 3000
6 | Fiji | 75 | 23.889 | 76 | 24.444 | 80 | 26.667 | 74 | 23.333 | 70 | 21.111 | Expensive | Sleepy | South Pacific | 5000
7 | New Jersey | 57 | 13.889 | 57 | 13.89 | 60 | 15.556 | 58 | 14.444 | 53 | 11.667 | Expensive | Pleasant | Atlantic | 2000
Dive Sites = SITE
Site No | Destination No | Site Name | Site Highlight | Distance from Town (mi) | Distance from Town (km) | Depth (ft) | Depth (m) | Visibility (ft) | Visibility (m) | Current | Skill Level
1001 | 1 | Palancar Reef | Reef | 10 | 16.09 | 100 | 30.48 | 150 | 45.72 | Strong | Intermediate
1002 | 1 | Santa Rosa Reef | Reef | 8 | 12.87 | 80 | 24.384 | 150 | 45.72 | Strong | Intermediate
1003 | 1 | Chancanab Reef | Reef | 4 | 6.437 | 60 | 18.288 | 100 | 30.48 | Mild | Beginning
1004 | 1 | Punta Sur | Reef | 13 | 20.92 | 120 | 36.576 | 175 | 53.34 | Strong | Advanced
1005 | 1 | Yocab Reef | Reef | 6 | 9.656 | 50 | 15.24 | 100 | 30.48 | Mild | Beginning
2001 | 2 | Heron Island | Reef | 50 | 80.47 | 90 | 27.432 | 150 | 45.72 | Mild | Intermediate
2002 | 2 | Cod Hole | Fish | 45 | 72.42 | 50 | 15.24 | 150 | 45.72 | Mild | Beginning
2003 | 2 | Butterfly Bay | Caves | 20 | 32.19 | 70 | 21.336 | 70 | 21.336 | None | Advanced
2004 | 2 | Wheeler Reef | Marine Life | 30 | 48.28 | 50 | 15.24 | 125 | 38.1 | Mild | Beginning
2005 | 2 | Watanabe | Marine Life | 130 | 209.2 | 150 | 45.72 | 200 | 60.96 | None | Intermediate
3001 | 3 | Point Lobos | Marine Life | 3 | 4.828 | 60 | 18.288 | 75 | 22.86 | None | Beginning
3002 | 3 | Macabee Beach | Marine Life | 0.1 | 0.161 | 40 | 12.192 | 40 | 12.192 | None | Beginning
3003 | 3 | Pinnacles | Pinnacle | 1 | 1.609 | 60 | 18.288 | 50 | 15.24 | Mild | Beginning
3004 | 3 | Monastery Beach | Marine Life | 3 | 4.828 | 50 | 15.24 | 40 | 12.192 | Surge | Beginning
(The Site Notes column is blank for all rows shown.)
Sea Life = BIOLIFE
Species No | Category | Common Name | Species Name | Length (cm) | Length (in)
90020 | Triggerfish | Clown Triggerfish | Ballistoides conspicillum | 50 | 19.685
90030 | Snapper | Red Emperor | Lutjanus sebae | 60 | 23.622
90050 | Wrasse | Giant Maori Wrasse | Cheilinus undulatus | 229 | 90.157
90070 | Angelfish | Blue Angelfish | Pomacanthus nauarchus | 30 | 11.811
90080 | Cod | Lunartail Rockcod | Variola louti | 80 | 31.496
90090 | Scorpionfish | Firefish | Pterois volitans | 38 | 14.961
90100 | Butterflyfish | Ornate Butterflyfish | Chaetodon Ornatissimus | 19 | 7.4803
90110 | Shark | Swell Shark | Cephaloscyllium ventriosum | 102 | 40.157
90120 | Ray | Bat Ray | Myliobatis californica | 56 | 22.047
90130 | Eel | California Moray | Gymnothorax mordax | 150 | 59.055
90140 | Cod | Lingcod | Ophiodon elongatus | 150 | 59.055
(The Notes and Graphic columns are blank for all rows shown.)
BIOSITE -- linking relation
Species No | Site No
90010 | 2001
90010 | 2002
90010 | 2003
90010 | 2004
90010 | 2005
90010 | 6001
90010 | 6003
90010 | 6004
90010 | 6005
90020 | 2001
90020 | 2002
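To illustrate how a linking (intersection) relation such as BIOSITE resolves a many-to-many relationship, here is a small sketch using Python's sqlite3 module. The column names are assumptions, and only a handful of rows from the slides are loaded (species 90020 at sites 2001 and 2002).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BIOLIFE (species_no INTEGER PRIMARY KEY, common_name TEXT);
CREATE TABLE SITES   (site_no    INTEGER PRIMARY KEY, site_name   TEXT);
-- The linking relation holds only the two keys; its own key is the pair.
CREATE TABLE BIOSITE (
    species_no INTEGER REFERENCES BIOLIFE(species_no),
    site_no    INTEGER REFERENCES SITES(site_no),
    PRIMARY KEY (species_no, site_no)
);
INSERT INTO BIOLIFE VALUES (90020, 'Clown Triggerfish');
INSERT INTO SITES   VALUES (2001, 'Heron Island'), (2002, 'Cod Hole');
INSERT INTO BIOSITE VALUES (90020, 2001), (90020, 2002);
""")

# Which sites have Clown Triggerfish? Walk through the linking relation.
site_names = [row[0] for row in conn.execute("""
    SELECT s.site_name
    FROM BIOSITE AS b
    JOIN SITES   AS s ON s.site_no    = b.site_no
    JOIN BIOLIFE AS f ON f.species_no = b.species_no
    WHERE f.common_name = 'Clown Triggerfish'
    ORDER BY s.site_no
""")]
print(site_names)
```

The same BIOSITE rows answer the reverse question (which species live at a given site) by filtering on the site instead of the species.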
Shipwrecks = SHIPWRK
Ship Name | Site No | Category | Type | Interest | Tonnage | Length (ft) | Length (m) | Beam (ft) | Beam (m) | Cause | Date Sunk | Passengers/Crew | Survivors | Condition
Delaware | 7007 | Commercial | Steam Freighter | Treasure | 1646 | 252 | 76.8096 | 37 | 11.2776 | Fire | | 66 | 66 | Broken
F.S.Loop | 4004 | Commercial | Steam Schooner | Machinery | 794 | 193 | 58.8264 | 39 | 11.8872 | Deliberate | 1/1/47 | 0 | | Scattered
Gosford | 4001 | Commercial | Barque Rigged Sail | Fixture | 2250 | 280 | 85.344 | 42 | 12.8016 | Fire | | | | Intact
Great Isaac | 7002 | Commercial | Seagoing Tug | Fixture | 1117 | 185 | 56.388 | 37 | 11.2776 | Collision | 4/16/47 | 27 | 27 | Intact
Lizzie D | 7001 | Commercial | Tug/Rumrunner | Treasure | 122 | 84 | 25.6032 | 21 | 6.4008 | Unknown | 10/19/22 | 8 | 0 | Intact
Mohawk | 7004 | Passenger | Ocean Liner | Treasure | 8140 | 402 | 122.5296 | 54 | 16.4592 | Collision | 1/25/35 | 163 | 118 | Scattered
R.P. Resor | 7006 | Commercial | Oil Tanker | Treasure | 7450 | 435 | 132.588 | 66.8 | 20.36064 | Military | 2/28/42 | 50 | 2 | Broken
Star of Scotland | 4002 | Passenger | British Q-Boat | Treasure | 1250 | 263 | 80.1624 | 35 | 10.668 | Weather | 1/22/42 | 5 | 4 | Broken
Tolten | 7008 | Commercial | Freighter | Fixture | 1858 | 280 | 85.344 | 43 | 13.1064 | Military | 3/13/42 | 28 | 1 | Intact
USS Moody | 4006 | Military | WWI Destroyer | Treasure | 1308 | 314 | 95.7072 | 31 | 9.4488 | Deliberate | 1/1/33 | 0 | | Intact
Valiant | 4003 | Passenger | Luxury Motor Yacht | Treasure | 444 | 162.4 | 49.49952 | 26 | 7.9248 | Fire | 12/17/30 | 25 | 25 | Intact
(The Comments and Graph columns are blank for all rows shown.)
Mapping to Other Models
• Hierarchical
– Need to make decisions about access paths
• Network
– Need to pre-specify all of the links and sets
• Object-Oriented
– What are the objects, datatypes, their
methods and the access points for them
• Object-Relational
– Same as relational, but what new datatypes
might be needed or useful (more on OR later)
Normalization
• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets
of relations containing the same data
• Normalization is a multi-step process
beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base:
Structured Techniques for Design,
Performance, and Management.
Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
Normalization
[Figure: requirements added at each level of normalization]
• First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only
• Second Normal Form: full functional dependency of nonkey attributes on the primary key
• Third Normal Form: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data can repeat
within a column
Unnormalized Relation
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever
First Normal Form
• To move to First Normal Form a relation
must contain only atomic values at each
row and column.
– No repeating groups
– A column or set of columns is called a
Candidate Key when its values can uniquely
identify the row in the relation.
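As a rough illustration of removing repeating groups, the sketch below flattens an unnormalized patient record (one record per patient, with a list of surgeries inside it) into rows that hold a single atomic value in every column. The dictionary layout and the function name are assumptions made for this example; the values come from the hospital slides.

```python
# Unnormalized: one record per patient, with a repeating group of surgeries.
unnormalized = [
    {"patient_no": 1111, "patient_name": "John White",
     "surgeries": [(145, "01-Jan-95"), (311, "12-Jun-95")]},
    {"patient_no": 2345, "patient_name": "Charles Brown",
     "surgeries": [(189, "08-Jan-96")]},
]

def to_first_normal_form(records):
    """Emit one row per (patient, surgery) so every column holds one value."""
    rows = []
    for record in records:
        for surgeon_no, surgery_date in record["surgeries"]:
            rows.append((record["patient_no"], surgeon_no,
                         surgery_date, record["patient_name"]))
    return rows

rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

Note that the patient name is now repeated on every row for that patient, which is exactly the redundancy the later normal forms address.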
First Normal Form
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none
1NF Storage Anomalies
• Insertion: A new patient has not yet undergone
surgery -- hence no surgeon # -- Since surgeon
# is part of the key we can’t insert.
• Insertion: If a surgeon is newly hired and hasn’t
operated yet -- there will be no way to include
that person in the database.
• Update: If a patient comes in for a new
procedure, and has moved, we need to change
multiple address entries.
• Deletion (type 1): Deleting a patient record may
also delete all info about a surgeon.
• Deletion (type 2): When there are functional
dependencies (like side effects and drug)
changing one item eliminates other information.
Second Normal Form
• A relation is said to be in Second Normal
Form when every nonkey attribute is fully
functionally dependent on the primary
key.
– That is, every nonkey attribute needs the full
primary key for unique identification
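One informal way to spot a partial dependency is to test whether part of the key already determines a nonkey attribute in the data at hand. The helper below is a hypothetical sketch (the function name and row layout are assumptions). It can only show that sample rows are consistent with a dependency; it cannot prove the dependency holds in general.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows_1nf = [
    {"patient_no": 1111, "surgeon_no": 145, "patient_name": "John White",
     "surgery": "Gallstones removal"},
    {"patient_no": 1111, "surgeon_no": 311, "patient_name": "John White",
     "surgery": "Kidney stones removal"},
    {"patient_no": 1234, "surgeon_no": 243, "patient_name": "Mary Jones",
     "surgery": "Eye Cataract removal"},
]

# Patient name depends on Patient # alone: a partial dependency on the key.
name_needs_only_patient = determines(rows_1nf, ["patient_no"], ["patient_name"])
# Surgery is not fixed by Patient # alone; it needs more of the key.
surgery_needs_only_patient = determines(rows_1nf, ["patient_no"], ["surgery"])
print(name_needs_only_patient, surgery_needs_only_patient)
```

Attributes like patient name, which pass the test with only part of the key, are the ones that move to their own relation in Second Normal Form.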
Second Normal Form
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY
Second Normal Form
Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold
Second Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever
1NF Storage Anomalies Removed
• Insertion: Can now enter new patients without
surgery.
• Insertion: Can now enter Surgeons who haven’t
operated.
• Deletion (type 1): If Charles Brown dies the
corresponding tuples from Patient and Surgery
tables can be deleted without losing information
on David Rosen.
• Update: If John White comes in for third time,
and has moved, we only need to change the
Patient table
2NF Storage Anomalies
• Insertion: Cannot enter the fact that a particular
drug has a particular side effect unless it is given
to a patient.
• Deletion: If John White receives some other drug
because of the penicillin rash, and a new drug
and side effect are entered, we lose the
information that penicillin can cause a rash
• Update: If drug side effects change (a new
formula) we have to update multiple occurrences
of side effects.
Third Normal Form
• A relation is said to be in Third Normal Form if
there is no transitive functional dependency
between nonkey attributes
– When one nonkey attribute can be determined by
one or more other nonkey attributes there is said to be a
transitive functional dependency.
• The side effect column in the Surgery table is
determined by the drug administered
– Side effect is transitively functionally dependent on
drug so Surgery is not 3NF
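A small sketch of the decomposition this implies: project the drug and side-effect columns out of Surgery into their own relation, keep the drug as the linking column, and confirm that joining the two pieces gives back the original rows. The tuple layout and variable names are assumptions; the rows are a subset of the Surgery table above.

```python
# (patient_no, surgeon_no, surgery_date, surgery, drug, side_effect)
surgery_2nf = [
    (1111, 145, "01-Jan-95", "Gallstones removal", "Penicillin", "rash"),
    (1234, 243, "05-Apr-94", "Eye Cataract removal", "Tetracycline", "Fever"),
    (6845, 243, "05-Apr-94", "Eye Cornea Replacement", "Tetracycline", "Fever"),
    (2345, 189, "08-Jan-96", "Open Heart Surgery", "Cephalosporin", "none"),
]

# Surgery keeps the drug but drops the side effect it transitively determined.
surgery_3nf = [row[:5] for row in surgery_2nf]

# Drug holds each (drug, side effect) pair exactly once.
drug = sorted({(row[4], row[5]) for row in surgery_2nf})

# Joining on the drug column reproduces the original rows (a lossless split).
side_effect_of = dict(drug)
rejoined = [row + (side_effect_of[row[4]],) for row in surgery_3nf]

print(drug)
print(rejoined == surgery_2nf)
```

After the split, a change to a drug's side effect touches one row in the Drug relation instead of every surgery that used the drug.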
Third Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline
Third Normal Form
Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever
2NF Storage Anomalies Removed
• Insertion: We can now enter the fact that a
particular drug has a particular side effect
in the Drug relation.
• Deletion: If John White receives some
other drug as a result of the rash from
penicillin, the information on penicillin
and rash is still maintained.
• Update: The side effects for each drug
appear only once.
Boyce-Codd Normal Form
• Most 3NF relations are also BCNF
relations.
• A 3NF relation can fail to be in BCNF only if all of the following hold:
– Candidate keys in the relation are composite
keys (they are not single attributes)
– There is more than one candidate key in the
relation, and
– The keys are not disjoint, that is, some
attributes in the keys are common
Most 3NF Relations are also BCNF – Is
this one?
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY
BCNF Relations
Patient # | Patient Name
1111 | John White
1234 | Mary Jones
2345 | Charles Brown
4876 | Hal Kane
5123 | Paul Kosher
6845 | Ann Hood

Patient # | Patient Address
1111 | 15 New St. New York, NY
1234 | 10 Main St. Rye, NY
2345 | Dogwood Lane Harrison, NY
4876 | 55 Boston Post Road, Chester, CN
5123 | Blind Brook Mamaroneck, NY
6845 | Hilton Road Larchmont, NY
Fourth Normal Form
• Any relation is in Fourth Normal Form if it
is BCNF and any multivalued
dependencies are trivial
• Eliminate non-trivial multivalued
dependencies by projecting into simpler
tables
Fifth Normal Form
• A relation is in 5NF if every join
dependency in the relation is implied by
the keys of the relation
• Implies that relations that have been
decomposed in previous NF can be
recombined via natural joins to recreate
the original relation.
Effectiveness and Efficiency Issues for
DBMS
• Focus on the relational model
• Any column in a relational database can
be searched for values.
• To improve efficiency indexes using
storage structures such as BTrees and
Hashing are used
• But many useful functions are not
indexable and require complete scans of
the database
Example: Text Fields
• In a conventional RDBMS, when a text field
is indexed, the index supports only exact matching of the text
field contents (or greater-than and less-than comparisons).
– Can search for individual words using pattern
matching, but a full scan is required.
• Text searching is still done best (and
fastest) by specialized text search
programs (Search Engines) that we will
look at more later.
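The difference between an indexed exact match and a word search can be observed with SQLite's EXPLAIN QUERY PLAN. This is only a sketch on SQLite through Python's sqlite3 module; other RDBMS report plans differently, and the table and index names below are assumptions loosely modeled on the BIBFILE example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibfile (accno TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON bibfile(title)")

def plan(sql):
    """Return SQLite's human-readable plan text for a query."""
    # The plan description is the fourth column of each EXPLAIN QUERY PLAN row.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Exact match on the indexed column: the index can be used.
exact = plan("SELECT * FROM bibfile WHERE title = 'TEACHER IN AMERICA'")

# Word inside the text: a leading wildcard forces every row to be examined.
word = plan("SELECT * FROM bibfile WHERE title LIKE '%AMERICA%'")

print(exact)
print(word)
```

The first plan reports a SEARCH using idx_title, while the second reports a SCAN of the whole table.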
Normalization
• Normalization is performed to reduce or
eliminate Insertion, Deletion or Update
anomalies.
• However, a completely normalized
database may not be the most efficient or
effective implementation.
• “Denormalization” is sometimes used to
improve efficiency.
Normalizing to death
• Normalization splits database information
across multiple tables.
• To retrieve complete information from a
normalized database, the JOIN operation
must be used.
• JOIN tends to be expensive in terms of
processing time, and very large joins are
very expensive.
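For instance, once the hospital data is split into Patient, Surgeon and Surgery relations, a report that names both people needs two joins. The sketch below uses Python's sqlite3 module with assumed lowercase table and column names and a single row from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_no INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE surgeon (surgeon_no INTEGER PRIMARY KEY, surgeon_name TEXT);
CREATE TABLE surgery (
    patient_no   INTEGER REFERENCES patient(patient_no),
    surgeon_no   INTEGER REFERENCES surgeon(surgeon_no),
    surgery_date TEXT,
    surgery      TEXT,
    PRIMARY KEY (patient_no, surgeon_no, surgery_date)
);
INSERT INTO patient VALUES (2345, 'Charles Brown');
INSERT INTO surgeon VALUES (189, 'David Rosen');
INSERT INTO surgery VALUES (2345, 189, '08-Jan-96', 'Open Heart Surgery');
""")

# Two joins are needed to put the names back next to the surgery facts.
report = conn.execute("""
    SELECT p.patient_name, g.surgeon_name, s.surgery_date, s.surgery
    FROM surgery AS s
    JOIN patient AS p ON p.patient_no = s.patient_no
    JOIN surgeon AS g ON g.surgeon_no = s.surgeon_no
""").fetchall()
print(report)
```

With one row the cost is invisible; the concern on the slide is about the same query shape over very large tables.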
Downward Denormalization
Before:
Customer (ID, Address, Name, Telephone)
Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID)
After:
Customer (ID, Address, Name, Telephone)
Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)
Upward Denormalization
Before:
Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name)
Order Item (Order No, Item No, Item Price, Num Ordered)
After:
Order (Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name, Order Price)
Order Item (Order No, Item No, Item Price, Num Ordered)
Denormalization
• Usually driven by the need to improve
query speed
• Query speed is improved at the expense
of more complex or problematic DML
(Data manipulation language) for updates,
deletions and insertions.
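The trade-off can be sketched with the downward denormalization example: copying the customer name into the order table lets orders be listed without a join, but a name change now has to be applied in two places. The table names, column names and sample values below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
-- Denormalized: cust_name duplicates customer.name.
CREATE TABLE orders (
    order_no  INTEGER PRIMARY KEY,
    cust_id   INTEGER REFERENCES customer(id),
    cust_name TEXT
);
INSERT INTO customer VALUES (1, 'A. Diver');
INSERT INTO orders VALUES (100, 1, 'A. Diver'), (101, 1, 'A. Diver');
""")

# Faster read path: no join is needed to show the customer name.
listing = conn.execute(
    "SELECT order_no, cust_name FROM orders ORDER BY order_no").fetchall()

# More complex write path: both copies must change in one transaction.
with conn:
    conn.execute("UPDATE customer SET name = 'A. Swimmer' WHERE id = 1")
    conn.execute("UPDATE orders SET cust_name = 'A. Swimmer' WHERE cust_id = 1")

# Count orders whose copied name no longer matches the customer table.
stale = conn.execute("""
    SELECT COUNT(*) FROM orders AS o
    JOIN customer AS c ON c.id = o.cust_id
    WHERE o.cust_name <> c.name
""").fetchone()[0]
print(listing, stale)
```

If the second UPDATE were forgotten, the stale count would be 2, which is the update anomaly that normalization was meant to prevent.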
Using RDBMS to help normalize
• Example database: Cookie
• Database of books, libraries, publisher and
holding information for a shared (union)
catalog
Cookie relationships
Cookie BIBFILE relation
ACCNO | AUTHOR | TITLE | LOC | PUBID | DATE | PRICE | PAGINATION | ILL | HEIGHT
A003 | AMERICAN LIBRARY ASSOCIATION | ALA BULLETIN | CHICAGO | 04 | | $3.00 | 63 V. | ILL. | 26
T082 | ANDERSON, THEODORE | THE TEACHING OF MODERN LANGUAGES | PARIS | 53 | 1955 | $10.95 | 294 P. | | 22
C024 | AXT, RICHARD G. | COLLEGE SELF STUDY : LECTURES ON INSTITU | BOULDER, CO. | 51 | 1960 | $7.00 | X, 300 P. | GRAPHS | 28
B006 | BALDERSTON, FREDERICK E. | MANAGING TODAYS UNIVERSITY | SAN FRANCISCO | 27 | 1975 | $6.00 | XVI, 307 P. | | 24
B007 | BARZUN, JACQUES | TEACHER IN AMERICA | GARDEN CITY | 18 | 1954 | $7.00 | 280 P. | | 18
B005 | BARZUN, JACQUES | THE AMERICAN UNIVERSITY : HOW IT RUNS, W | NEW YORK | 24 | 1970 | $5.00 | XII, 319 P. | | 20
B008 | BARZUN, JACQUES | THE HOUSE OF INTELLECT | NEW YORK | 24 | 1961 | $8.00 | VIII, 271 P. | | 21
B010 | BELL, DANIEL | THE COMING OF POST-INDUSTRIAL SOCIETY : | NEW YORK | 09 | 1976 | $10.00 | XXVII, 507 P. | | 21
B009 | BENSON, CHARLES S. | IMPLEMENTING THE LEARNING SOCIETY | SAN FRANCISCO | 27 | 1974 | $9.00 | XVII, 147 P. | | 24
B012 | BERG, IVAR | EDUCATION AND JOBS : THE GREAT TRAINING | BOSTON | 10 | 1971 | $12.00 | XX, 200 P. | | 21
B011 | BERSI, ROBERT M. | RESTRUCTURING THE BACCALAUREATE | WASHINGTON, D.C. | 03 | 1973 | $11.00 | IV, 160P. | | 23
B014 | BEVERIDGE, WILLIAM I. | THE ART OF SCIENTIFIC INVESTIGATION | NEW YORK | 58 | 1957 | $14.00 | XIV, 239 P. | | 18
B013 | BIRD, CAROLINE | THE CASE AGAINST COLLEGE | NEW YORK | 08 | 1975 | $13.00 | XII, 308 P. | | 18
B016 | BISSELL, CLAUDE T. | THE STRENGTH OF THE UNIVERSITY | TORONTO | 57 | 1968 | $14.00 | VII, 251 P. | | 21
B017 | BLAIR, GLENN MYERS | EDUCATIONAL PSYCHOLOGY | NEW YORK | 30 | 1962 | $11.00 | 678 P. | | 24
F047 | BLAKE, ELIAS, JR. | THE FUTURE OF THE BLACK COLLEGES | CAMBRIDGE, MA. | 02 | 1971 | $14.25 | VIII, PP. 539 | | 23
B116 | BOLAND, R.J. | CRITICAL ISSUES IN INFORMATION SYSTEMS R | CHICHESTER, ENG. | 63 | 1987 | $30.95 | XV, 394 P. | ILL. | 24
S102 | BROWN, SANBORN C., ED. | SCIENTIFIC MANPOWER | CAMBRIDGE, MASS. | 29 | 1971 | $4.00 | X, 180 P. | | 26
B118 | BUCKLAND, MICHAEL K. | LIBRARY SERVICES IN THEORY AND CONTEXT | ELMSFORD, NY | 70 | 1983 | $12.00 | XII, 201 P. | ILL. | 23
B018 | BUDIG, GENE A. | ACADEMIC QUICKSAND : SOME TRENDS AND ISS | LINCOLN, NEBRASKA | 37 | 1973 | $13.00 | 74 P. | | 23
C031 | CALIFORNIA. DEPT. OF JUSTICE | LAW IN THE SCHOOL | MONTCLAIR, N.J. | 35 | 1974 | $0.50 | IV, 87 P. | | 21
C032 | CAMPBELL, MARGARET A. | WHY WOULD A GIRL GO INTO MEDICINE? | OLD WESTBURY, N.Y. | 48 | 1973 | $1.50 | V, 114 P. | | 24
C034 | CARNEGIE COMMISSION ON HIGHER | A DIGEST OF REPORTS OF THE CARNEGIE COMM | NEW YORK | 30 | 1974 | $3.50 | 399 P. | | 24
(Long AUTHOR and TITLE values are cut off at the column width, as on the slide.)
How to Normalize?
• Currently no way to have multiple authors
for a given book, and there is duplicate
data spread over the BIBFILE table
• Can we use the DBMS to help us
normalize?
• Access example…
Database Creation in Access
• Simplest to use a design view
– wizards are available, but less flexible
• Need to watch the default values
• Helps to know what the primary key is, or
if one is to be created automatically
– Automatic creation is more complex in other
RDBMS and ORDBMS
• Need to make decision about the physical
storage of the data
Database Creation in Access
• Some Simple Examples
Advantages of RDBMS
• Relational Database Management
Systems (RDBMS)
• Possible to design complex data storage
and retrieval systems with ease (and
without conventional programming).
• Support for ACID transactions
– Atomic
– Consistent
– Isolated
– Durable
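As one small illustration of the "Atomic" property, the sketch below runs two stock updates in a single transaction using Python's sqlite3 module. The second update violates a constraint, so the first is rolled back as well. The table layout and the CHECK constraint are assumptions made for this example; the item numbers and quantities echo the DIVESTOK slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE divestok (
        item_no INTEGER PRIMARY KEY,
        on_hand INTEGER CHECK (on_hand >= 0)
    )
""")
conn.execute("INSERT INTO divestok VALUES (90010, 12), (90011, 12)")
conn.commit()

try:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE divestok SET on_hand = on_hand - 4 WHERE item_no = 90010")
        # This would drive on_hand below zero, so the CHECK constraint fails.
        conn.execute(
            "UPDATE divestok SET on_hand = on_hand - 20 WHERE item_no = 90011")
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the first update, is undone

on_hand = dict(conn.execute("SELECT item_no, on_hand FROM divestok"))
print(on_hand)
```

Both items still show 12 on hand afterward: either all of the transaction's changes are applied or none are.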
Advantages of RDBMS
• Support for very large databases
• Automatic optimization of searching (when
possible)
• RDBMS have a simple view of the
database that conforms to much of the
data used in business
• Standard query language (SQL)
Disadvantages of RDBMS
• Until recently, no real support for complex
objects such as documents, video, images,
spatial or time-series data. (ORDBMS add -- or
make available support for these)
• Often poor support for storage of complex
objects from OOP languages (Disassembling the
car to park it in the garage)
• Usually no efficient and effective integrated
support for things like text searching within fields
(MySQL does have simple keyword searching
now with index support)
Next Week
• Database Design Workshop