Database Design Practical Guide

ARCH-3: Database Design, a Practical Guide
Gus Björklund
Wizard, Progress Software Corporation
Ask questions as we go
if I am not being clear.
Warning: there is a mistake in these slides.
ARCH-3: Database Design A Practical Guide
© 2007 Progress Software Corporation
Rules are made to be broken
To every rule,
there is an exception!
If you thought this talk was going to be about indexing …
It isn’t. Nor is it about performance.
Topics
• Theory:
  – What is Database Design
  – Basic Elements
  – Representing the Model as Tables
• Practice
  – An Example
• Some Other Topics
First, a little theory
What do we mean by database design?
• A process for defining a model of a subset of the “real”¹ world, then representing it as data in tables in a relational database
At least, that’s the definition we will use for the purposes of this talk.
¹ Well, for small values of real, anyway.
Basic Elements
What do we put in our model?
• Just 3 things:
  – Entities
  – Attributes
  – Relationships
The “entity-relationship model” was described by Peter Chen in 1976.
See http://bit.csc.lsu.edu/~chen/chen.html
Basic Elements: Entities
• Can be thought of as nouns
  – People
    – author, composer, performer, seller, buyer
  – Places
    – home, IP address, URL, destination, factory, store
  – Things
    – song, recording, instrument, car, invoice
Is “telephone number” a place or a thing?
Basic Elements: Attributes
Entities have attributes
• Can be thought of as adjectives (but only loosely):
  – Length
  – Color
  – Horsepower
  – Part number
  – Song Title
  – Publication Date
  – Size
  – Fabric
  – Owner
Is “telephone number” an attribute or an entity?
Basic Elements: Relationships
Entities are connected by relationships
• Can be thought of as verbs:
  – has a
  – owns
  – contains
  – supervises
  – performs
  – called
  – sold
  – purchased
  – proved
Is “telephone number” a relationship?
Relationships have attributes too
In May, 1995, Andrew Wiles published a proof of Fermat’s Last Theorem
Relationships have attributes too
attribute: In May, 1995,
entity: Andrew Wiles
relationship: published
entity: a proof of Fermat’s Last Theorem
What goes in an entity
• Identifying attributes
  – Must be able to uniquely identify the entity
  – There can be more than one way to identify it
  – An identifier can be composite (made up of several attributes)
• Descriptive attributes
  – the values you need to keep track of
  – generally should be simple, not complex
What to include in your model
• The things your application has to keep track of
  – Telephones, wires, switches
• The actions your application or its users perform
  – Make calls, send telephone bills, collect payments
• Some attributes of the things and actions
  – Originating number, date and time of call, duration, called number
• Keep it simple
• Be accurate
• Keep it up to date
What to include in your model
• Consider the goals of the system
• Everything you include should be there for a reason you can state
  – in no more than two sentences
• Everything should have a clear name
  – if you can’t name it, it doesn’t belong
• Talk to the stakeholders!!!
What to leave out of your model
• The real world has properties that don’t matter (to your application)
• The real world has relationships that don’t matter
• Things happen in the real world that don’t matter
• Keep it simple
  – If you can’t say why you need it, leave it out
Logical vs Physical Data Models
• Logical entities often require multiple tables to represent them
  – Tables can be thought of as logical or physical
  – It depends on your point of view
• There is also the physical storage database layout
  – storage areas
  – data extents
  – disks
  – etc.
• We aren’t going to talk about the physical database layout
• We will talk about tables
Mapping Your Model to a Database
Simply put,
• Entities become tables
  – Identifiers become indexes
• Attributes become columns
  – Data types: pick appropriate ones
• Relationships become tables or foreign keys
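The sketch below makes this mapping concrete. It uses Python's built-in sqlite3 module only as a stand-in for the Progress database the talk assumes. The table and column names (disc, track, performer, track_performer) are hypothetical. Each disc row is assumed to describe one title rather than one physical copy. Linking tracks directly to performers is a simplification that the music-store example later refines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity "disc" -> table; its identifier (upc) -> unique index
CREATE TABLE disc (
    upc   TEXT NOT NULL,
    title TEXT NOT NULL             -- attribute -> column
);
CREATE UNIQUE INDEX disc_upc ON disc (upc);

-- One-to-many relationship (a disc has many tracks) -> foreign key column
CREATE TABLE track (
    upc       TEXT    NOT NULL REFERENCES disc (upc),
    track_num INTEGER NOT NULL,
    duration  TEXT
);
CREATE UNIQUE INDEX track_key ON track (upc, track_num);

-- Entity "performer"
CREATE TABLE performer (
    performer_id INTEGER NOT NULL,
    name         TEXT    NOT NULL
);
CREATE UNIQUE INDEX performer_key ON performer (performer_id);

-- Many-to-many relationship (tracks <-> performers) -> a table of its own
CREATE TABLE track_performer (
    upc          TEXT    NOT NULL,
    track_num    INTEGER NOT NULL,
    performer_id INTEGER NOT NULL REFERENCES performer (performer_id),
    FOREIGN KEY (upc, track_num) REFERENCES track (upc, track_num)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['disc', 'performer', 'track', 'track_performer']
```

The one-to-many relationship needs no table of its own, because the foreign key column in track carries it. The many-to-many relationship does need one.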
“In theory, there is no difference between
theory and practice, but in practice there is.”
Jan van de Snepscheut
Now for some practice.
An example
• Music store
  – Buys compact disc recordings from distributors
  – Has inventory
  – Allows customers to search for what they want
    – Maybe in an in-store kiosk or on the web
  – Sells compact discs to customers
What should we do first?
Activities
• We buy discs from a distributor
• Orders are sent to a distributor
• Orders are delivered to the store
• Orders may be cancelled
• We sell discs to customers in sales transactions
• Customers buy discs in sales transactions
• Customers search for what they want to buy
Which of these must be remembered by the system?
What do we need to keep track of?
• Discs we have
• Discs we sold
• Discs we know about and can get
• Discs we have ordered
• Information needed to do our income tax
  – what we paid for stock
  – when we bought it
  – what we sold it for
  – when we sold it
Disc entities
• UPC Code: 8697-07416-2
• Manufacturer: Sony BMG
• Cost to us: $2.00
• Price charged: $17.95
• Tax charged: $0.80
• Date purchased: March 19, 2007
• Date sold: June 9, 2007
Disc table might look like this
upc           manuf           cost  price  tax   datePurch   dateSold
8697-07416-2  Sony BMG        2.00  17.95  0.90  2007-03-19  2007-06-09
8697-07416-2  Sony BMG        2.00  ?      ?     2007-06-09  ?
314-510347-2  Island Records  2.21  15.95  0.80  2006-01-12  2007-02-14
314-510347-2  Island Records  2.21  ?      ?     2006-01-12  ?
What’s wrong?
• Is upc a unique identifier?
• Might have bought from a distributor
• Have no information about what is on the disc
  – How do customers search?
• Don’t know when disc was made
• Could be more than one tax jurisdiction
  – provincial tax, city tax
• Don’t know if disc is on order
• Don’t know who bought it
• Duplicated data
• Etc., etc.
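The first question can be answered by asking the data itself. The sketch below loads the four rows from the table above into an in-memory SQLite table. SQLite stands in for the real database, and the unknown values, shown as ?, become NULL. The query then lists every upc that occurs more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE disc (upc TEXT, manuf TEXT, cost REAL, price REAL,
                                   tax REAL, datePurch TEXT, dateSold TEXT)""")
rows = [
    ("8697-07416-2", "Sony BMG", 2.00, 17.95, 0.90, "2007-03-19", "2007-06-09"),
    ("8697-07416-2", "Sony BMG", 2.00, None, None, "2007-06-09", None),
    ("314-510347-2", "Island Records", 2.21, 15.95, 0.80, "2006-01-12", "2007-02-14"),
    ("314-510347-2", "Island Records", 2.21, None, None, "2006-01-12", None),
]
conn.executemany("INSERT INTO disc VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Any upc that appears more than once cannot be this table's unique key
dupes = conn.execute("""
    SELECT upc, COUNT(*) FROM disc
    GROUP BY upc HAVING COUNT(*) > 1
    ORDER BY upc""").fetchall()
print(dupes)
```

Both upcs repeat. A upc identifies a recording, but each row here describes one physical copy, so the manufacturer is stored again for every copy.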
Disc entities take 2
• UPC Code: 8697-07416-2
• Manufacturer: Sony BMG
• Distributor: Bob’s Wholesale CD’s
• Cost to us: $2.00
• Price charged: $17.95
• Tax charged: $0.80
• Date ordered: March 19, 2007
• Date received: March 20, 2007
• Date sold: June 9, 2007
• Disc Title: “The Essential Joshua Bell”
• Artist: Joshua Bell
• Track 1: “Danse Russe”
• Track 2: “Violin Concerto in E Minor”
• Track 3: “Nocturne in C-sharp Minor”
• etc.
Example: Now What’s wrong?
• This is getting messy
• Activities combined with disc’s attributes
• Have duplicated information
• How many tracks can there be?
• What if there is more than one artist?
• Don’t have all the information a customer might want to use to search
Discs revisited
• Discs have titles
• Discs have pictures on the cover
• Discs contain tracks
• Discs are made by manufacturers
• Discs are purchased from distributors
• Discs are ordered from distributors
• Discs are delivered to the store
• Discs are sold to customers
“Discs contain tracks …”
• Tracks contain songs
• Tracks occur in order
• Tracks have a duration
• Songs are performed in performances
• Songs have performers (usually)
• Songs have composers
• Songs have names (titles)
• Songs have a key (but not always)
• Performances are done by performers
• Performers can be groups (bands, orchestras, etc.)
• Performances are performed in a location or venue
We seem to need these entities
• Discs
• Manufacturers
• Distributors
• Orders
• Customers
• Inventory
• Tracks
• Songs
• Performers
• Groups?
Songs have names (titles).
Are names properties of songs?
Or are they entities related to songs?
Or are they something else?
Song data (track 1)
Title          “Danse Russe” from Swan Lake, Op. 20
Time           4:30
Composer       Peter Tchaikovsky
Category       Classical, violin, orchestra
Performers     Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra
Track number   1
Disc upc       8697-07416-2
Song data (track 2)
Title          Violin Concerto in E Minor, Op. 64
Time           6:27
Composer       Felix Mendelssohn
Category       Classical, violin, orchestra
Performers     Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Track number   2
Disc upc       8697-07416-2
Performance data
Title          Violin Concerto in E Minor, Op. 64
Time           6:27
Composer       Felix Mendelssohn
Category       Classical, violin, orchestra
Performers     Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance data take 2
Title                  Violin Concerto in E Minor, Op. 64
Time                   6:27
Composer               Felix Mendelssohn
Category               Classical, violin, orchestra
Performers             Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance Date       ?
Performance Location   ?
Performer data
id  name
1   Joshua Bell
2   Sir Roger Norrington
3   Camerata Salzburg
4   Michael Tilson Thomas
5   Berlin Philharmonic
6   Bono
7   The Edge
8   Adam Clayton
9   Larry Mullen
Performance to Performer Relationship
performance id   performer id
1                1
1                2
1                3
1                …
2                1
2                4
2                5
2                …
325              6
325              7
325              8
325              9
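A junction table like this one can be queried from either side. The sketch below uses SQLite purely as an illustration. The table names performer and performance_performer are hypothetical, the rows come from the two tables above, and the elided “…” rows are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE performer (
    performer_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
-- One row per (performance, performer) pair: the many-to-many link
CREATE TABLE performance_performer (
    performance_id INTEGER NOT NULL,
    performer_id   INTEGER NOT NULL REFERENCES performer (performer_id),
    PRIMARY KEY (performance_id, performer_id)
);
""")
conn.executemany("INSERT INTO performer VALUES (?, ?)", [
    (1, "Joshua Bell"), (2, "Sir Roger Norrington"), (3, "Camerata Salzburg"),
    (4, "Michael Tilson Thomas"), (5, "Berlin Philharmonic"), (6, "Bono"),
    (7, "The Edge"), (8, "Adam Clayton"), (9, "Larry Mullen"),
])
# Rows copied from the slide; the elided "…" rows are omitted
conn.executemany("INSERT INTO performance_performer VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (2, 5),
    (325, 6), (325, 7), (325, 8), (325, 9),
])

# One direction: every performance a given performer took part in
bell = [r[0] for r in conn.execute(
    "SELECT performance_id FROM performance_performer "
    "WHERE performer_id = 1 ORDER BY performance_id")]

# Other direction: every performer in a given performance
band = [r[0] for r in conn.execute("""
    SELECT p.name
    FROM performance_performer AS pp
    JOIN performer AS p ON p.performer_id = pp.performer_id
    WHERE pp.performance_id = 325
    ORDER BY p.performer_id""")]

print(bell)  # [1, 2]
print(band)  # ['Bono', 'The Edge', 'Adam Clayton', 'Larry Mullen']
```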
Performance data take 3
Performance id   2
Title            Violin Concerto in E Minor, Op. 64
Time             6:27
Composer         Felix Mendelssohn
Category         Classical, violin, orchestra
Track to Performance Relationship
Disc upc       Track Num   Performance id
8697-07416-2   1           1
8697-07416-2   2           2
…              …           …
314-510347-2   1           325
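A track's identifier here is composite. Neither the disc upc nor the track number is unique on its own, but the pair is. The sketch below, in SQLite, declares that composite key and lets the database reject a duplicate pair. SQLite is a stand-in for the real database, and the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE track (
    upc            TEXT    NOT NULL,
    track_num      INTEGER NOT NULL,
    performance_id INTEGER NOT NULL,
    PRIMARY KEY (upc, track_num)      -- composite identifier
)""")
conn.executemany("INSERT INTO track VALUES (?, ?, ?)", [
    ("8697-07416-2", 1, 1),
    ("8697-07416-2", 2, 2),
    ("314-510347-2", 1, 325),   # track 1 again, but on a different disc: allowed
])

# The same (upc, track_num) pair a second time is rejected
try:
    conn.execute("INSERT INTO track VALUES ('8697-07416-2', 1, 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```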
Relationships (so far):
[Diagram: track, performance, performer and disc, linked by relationships labelled “one to one”, “one to many” and “many to many”]
What happened to Songs?
Relationships (take 2):
[Diagram: song, track, performance, performer and disc, linked by relationships labelled “one to one”, “one to many” and “many to many”]
Relationships (take 3):
[Diagram: relationships among disc, track, song, performance and performer]
What about “business entities”? Where are they?
Business entities
[Diagram: disc, track, song, performance and performer]
Should you use arrays?
Indexes
• Enforce uniqueness
• Make searches faster
• Enable fast retrieval of entities by their identities
• Enable finding entities with certain attributes
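The first and last of these roles show up in a small SQLite sketch. SQLite stands in for the Progress database, and the song table and index names are hypothetical. A unique index rejects a duplicate identifier, and EXPLAIN QUERY PLAN shows whether a search on an attribute would use an ordinary index. The planner's wording differs between SQLite versions, so the check only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (song_id INTEGER NOT NULL, title TEXT, composer TEXT);
CREATE UNIQUE INDEX song_key ON song (song_id);      -- enforces uniqueness
CREATE INDEX song_composer ON song (composer);       -- finds songs by attribute
""")
conn.executemany("INSERT INTO song VALUES (?, ?, ?)", [
    (1, "Danse Russe", "Peter Tchaikovsky"),
    (2, "Violin Concerto in E Minor, Op. 64", "Felix Mendelssohn"),
])

# Uniqueness: the unique index refuses a second row with song_id 1
try:
    conn.execute("INSERT INTO song VALUES (1, 'Duplicate', 'Nobody')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Searching by attribute: ask the planner whether it would use the index
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM song WHERE composer = ?",
    ("Felix Mendelssohn",)).fetchall()
uses_index = any("song_composer" in str(row[-1]) for row in plan)

print(duplicate_rejected, uses_index)
```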
What indexes do we need
for the music store database?
Tables
0) Discs
1) Tracks
2) Songs
3) Performers
4) Performances
5) Tracks of discs
6) Performances of songs
7) Performers of performances
What indexes do we need?
0) Indexes for identifying attributes
1) A unique row identifier
2) Indexes for the queries you will do
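For the Discs table, the three kinds of index might look like the sketch below. It rests on several assumptions. Each row describes one disc title, so upc is unique. A hypothetical disc_id column serves as the row identifier. The kiosk is guessed to search by title and artist, but your real queries decide the third group. SQLite is used only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE disc (
    disc_id TEXT NOT NULL,     -- unique row identifier (e.g. a UUID)
    upc     TEXT NOT NULL,     -- identifying attribute
    title   TEXT NOT NULL,
    artist  TEXT
);
-- 0) index for the identifying attribute
CREATE UNIQUE INDEX disc_upc ON disc (upc);
-- 1) index for the unique row identifier
CREATE UNIQUE INDEX disc_row_id ON disc (disc_id);
-- 2) indexes for the searches customers are assumed to run
CREATE INDEX disc_title ON disc (title);
CREATE INDEX disc_artist ON disc (artist);
""")

# PRAGMA index_list returns (seq, name, unique, ...) for each index on the table
indexes = {row[1]: bool(row[2]) for row in conn.execute("PRAGMA index_list('disc')")}
for name, unique in sorted(indexes.items()):
    print(name, "unique" if unique else "non-unique")
```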
What should we do next?
Other Topics
• Normalization
• Unique keys
• Word indexes
• Naming
• Customisation
Normalization
• Oversimplified, it means:
  – Don’t duplicate data
• Attributes should be simple
  – have only one value
  – be necessary
  – not be derived data
  – not repeat
• Complicated attributes are often entities in their own right
  – For example, addresses might be
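The Track 1, Track 2, Track 3 … fields in “Disc entities take 2” are a typical repeating attribute. The Python sketch below splits such a record into one row per track. The dictionary layout and the trackN naming convention are assumptions made for the example.

```python
# Hypothetical "take 2" style record: tracks kept as repeating columns
disc_row = {
    "upc": "8697-07416-2",
    "title": "The Essential Joshua Bell",
    "track1": "Danse Russe",
    "track2": "Violin Concerto in E Minor",
    "track3": "Nocturne in C-sharp Minor",
}


def normalize(row):
    """Split repeating trackN columns out into (upc, track_num, title) rows."""
    disc = {k: v for k, v in row.items() if not k.startswith("track")}
    tracks = sorted(
        (row["upc"], int(k[len("track"):]), v)
        for k, v in row.items()
        if k.startswith("track") and v is not None
    )
    return disc, tracks


disc, tracks = normalize(disc_row)
print(disc)  # {'upc': '8697-07416-2', 'title': 'The Essential Joshua Bell'}
for track in tracks:
    print(track)
```

Each track becomes its own row keyed by (upc, track_num), so a disc can hold any number of tracks without adding columns.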
Unique keys
• EVERY table must have a unique key
• EVERY row needs a unique identifier
  – that never changes, even if the row is moved to another database (e.g., if you replicate)
• Often, users don’t need to see it
• Use a UUID or sequence or maybe datetime
• Unique key is the ONLY way to identify rows unambiguously
• ROWIDs are temporary and can change
• Use the same method throughout
  – You’ll be glad you did
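The sketch below shows why a stored identifier beats a physical row address. It uses Python's uuid module, with SQLite as an analogy: SQLite's rowid plays the part of a ROWID, and the customer table is hypothetical. The UUID is copied along with the row, while the row address is assigned afresh by the receiving database.

```python
import sqlite3
import uuid


def new_row_id() -> str:
    """Return a random UUID string to use as a permanent row identifier."""
    return str(uuid.uuid4())


schema = "CREATE TABLE customer (cust_id TEXT PRIMARY KEY, name TEXT)"

source = sqlite3.connect(":memory:")
source.execute(schema)
source.execute("INSERT INTO customer VALUES (?, ?)", (new_row_id(), "Alice"))

# A second database (think: a replica) that already holds another row
replica = sqlite3.connect(":memory:")
replica.execute(schema)
replica.execute("INSERT INTO customer VALUES (?, ?)", (new_row_id(), "Bob"))

# Copy Alice across: the UUID travels with the row, the internal rowid does not
for row in source.execute("SELECT cust_id, name FROM customer").fetchall():
    replica.execute("INSERT INTO customer VALUES (?, ?)", row)

src_id, src_rowid = source.execute(
    "SELECT cust_id, rowid FROM customer WHERE name = 'Alice'").fetchone()
dst_id, dst_rowid = replica.execute(
    "SELECT cust_id, rowid FROM customer WHERE name = 'Alice'").fetchone()
print(src_id == dst_id, src_rowid, dst_rowid)  # True 1 2
```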
Word indexes
• Can be used to hold multiple status or attribute values
  – Conflicts with normalisation
  – Flexible
• Easy to add new ones
• Queries are fast
• Example:
  – Category: classical, violin, orchestral, concerto
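Word indexes are a feature of the Progress database itself. The Python sketch below only imitates the idea with an in-memory inverted index, so it is not the OpenEdge implementation. The category strings come from the slides, but which performance each one belongs to is hypothetical.

```python
from collections import defaultdict

# Hypothetical category text per performance id
categories = {
    1: "Classical, violin, orchestra",
    2: "classical, violin, orchestral, concerto",
}

# Build the word index: each word maps to the ids whose text contains it
word_index = defaultdict(set)
for perf_id, text in categories.items():
    for word in text.replace(",", " ").lower().split():
        word_index[word].add(perf_id)


def contains_all(*words):
    """Ids whose category text contains every given word (an AND search)."""
    sets = [word_index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


print(sorted(contains_all("violin", "concerto")))  # [2]
```

Adding a new category only means adding a word to the text. No schema change is needed, and that flexibility is also where word indexes depart from normalisation.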
Naming
Good names are crucial to understanding
• What is in the column “GL01262”?
Naming
Good names are crucial to understanding
• Table and column names should have clear meanings everyone can understand
  – “GL01262” vs “dateEntered”
• Names with dashes cause inconvenience with SQL
  – “order-date”
• Booleans should be named for truth value
  – “backOrdered”
• No double negations
  – “notOutOfStock”
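The short SQLite sketch below shows the inconvenience of dashes. The table and column names are illustrative. The dashed name works only when double-quoted, while a camel-case name and a boolean named for its truth value read naturally in a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "order-date" is only legal as a quoted identifier; orderDate needs no quotes
conn.execute(
    'CREATE TABLE orders ("order-date" TEXT, orderDate TEXT, backOrdered INTEGER)')
conn.execute("INSERT INTO orders VALUES ('2007-03-19', '2007-03-19', 0)")

quoted = conn.execute('SELECT "order-date" FROM orders').fetchone()[0]
plain = conn.execute(
    "SELECT orderDate FROM orders WHERE NOT backOrdered").fetchone()[0]

# Unquoted, the dash reads as a minus sign (and ORDER is a reserved word),
# so the statement is rejected
try:
    conn.execute("SELECT order-date FROM orders")
    unquoted_failed = False
except sqlite3.OperationalError:
    unquoted_failed = True

print(quoted, plain, unquoted_failed)
```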
Making tables customizable
We will look at 3 ways:
• Spare columns
• Separate table with spare columns
• Separate table with name/value pairs
Spare columns in table
custnum  name   city     extra1  extra2  extra3
001      Bob    Phoenix  frozen  ?       0.0
002      Alice  Boston   ?       125.46  0.12
003      Eve    Denver   ?       ?       ?
• What data types should you use?
• How many spare columns?
• Wasted columns when not used
• How do you know what each spare got used for?
• How do you know how many unused spares you have?
Separate table for spare columns
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  extra1  extra2  extra3
001      frozen  ?       0.0
002      ?       125.46  0.12
Separate table for spare columns
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  status  owed    discount
001      frozen  ?       0.0
002      ?       125.46  0.12
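With named columns, the extension table is read through an outer join, so customers with no extra data still appear. The sketch below uses SQLite, with table names assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custnum TEXT PRIMARY KEY, name TEXT, city TEXT);
-- Optional extension table: at most one row per customer
CREATE TABLE customer_ext (
    custnum  TEXT PRIMARY KEY REFERENCES customer (custnum),
    status   TEXT,
    owed     REAL,
    discount REAL
);
INSERT INTO customer VALUES
    ('001', 'Bob', 'Phoenix'), ('002', 'Alice', 'Boston'), ('003', 'Eve', 'Denver');
INSERT INTO customer_ext VALUES
    ('001', 'frozen', NULL, 0.0), ('002', NULL, 125.46, 0.12);
""")

# LEFT JOIN keeps Eve even though she has no extension row
rows = conn.execute("""
    SELECT c.custnum, c.name, e.status, e.owed, e.discount
    FROM customer AS c
    LEFT JOIN customer_ext AS e ON e.custnum = c.custnum
    ORDER BY c.custnum""").fetchall()
for row in rows:
    print(row)
```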
Separate table with name/value pairs
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  name      value
001      status    frozen
002      owed      125.46
002      discount  0.12
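Reading name/value pairs back means collecting one customer's rows into a single structure. The sketch below uses Python with SQLite. The customer_attr table name is hypothetical, and storing every value as text is an assumption. That assumption is one of the costs of this design, because the application must convert numbers such as owed back itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custnum TEXT PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE customer_attr (
    custnum TEXT NOT NULL REFERENCES customer (custnum),
    name    TEXT NOT NULL,      -- attribute name, e.g. 'owed'
    value   TEXT,               -- every value stored as text
    PRIMARY KEY (custnum, name)
);
INSERT INTO customer VALUES
    ('001', 'Bob', 'Phoenix'), ('002', 'Alice', 'Boston'), ('003', 'Eve', 'Denver');
INSERT INTO customer_attr VALUES
    ('001', 'status', 'frozen'), ('002', 'owed', '125.46'), ('002', 'discount', '0.12');
""")


def attributes(custnum):
    """Return one customer's custom attributes as a dict of name -> text value."""
    return dict(conn.execute(
        "SELECT name, value FROM customer_attr WHERE custnum = ?", (custnum,)))


alice = attributes("002")
owed = float(alice["owed"])   # the application converts types itself
print(alice, owed)
```

Adding a new attribute needs no schema change. The trade-off is that types, validation and pivoting back into columns all move into the application.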
Modeling Tools
• PCase
• Enterprise Architect
• Power Designer
• ConceptDraw
• Erwin
• Rational
Pencil and paper!
Blackboard!
Summary
• Understand the requirements
• Leave out what is not needed
• Review the design with stakeholders
• Evolve the design as changes come up
• Test to make sure it works
  – Can it do everything that is needed?
  – Does it perform adequately?
• Expect changes to come
Homework
• Papers
  – Wiles, A.: “Modular elliptic curves and Fermat’s Last Theorem”, Annals of Mathematics 141 (3): 443-551, 1995
  – Chen, P.: “The Entity-Relationship Model -- Toward a Unified View of Data”, ACM TODS Vol 1, No 1, 1976
• Wikipedia articles to start from:
  – entity-relationship model
  – data model
• Books:
  – Teorey, Lightstone, Nadeau: “Database Modeling and Design”, Morgan Kaufmann
Questions