ARCH-3: Database Design, a Practical Guide
Gus Björklund, Wizard, Progress Software Corporation
© 2007 Progress Software Corporation

Ask questions as we go if I am not being clear.
Warning: there is a mistake in these slides.

Rules are made to be broken
To every rule, there is an exception!

If you thought this talk was going to be about indexing …
It isn’t. Nor is it about performance.

Topics
Theory:
• What is Database Design
• Basic Elements
• Representing the Model as Tables
Practice:
• An Example
Some Other Topics

First, a little theory

What do we mean by database design?
A process for defining a model of a subset of the “real” [1] world, then representing it as data in tables in a relational database.
At least, that’s the definition we will use for the purposes of this talk.
[1] Well, for small values of real, anyway.

Basic Elements
What do we put in our model? Just 3 things:
• Entities
• Attributes
• Relationships
The “entity-relationship model” was described by Peter Chen in 1976. See http://bit.csc.lsu.edu/~chen/chen.html

Basic Elements: Entities
Can be thought of as nouns
• People – author, composer, performer, seller, buyer
• Places – home, IP address, URL, destination, factory, store
• Things – song, recording, instrument, car, invoice
Is “telephone number” a place or a thing?

Basic Elements: Attributes
Entities have attributes
Can be thought of as adjectives (but only loosely):
• Length
• Color
• Horsepower
• Part number
• Song Title
• Publication Date
• Size
• Fabric
• Owner
Is “telephone number” an attribute or an entity?

Basic Elements: Relationships
Entities are connected by relationships
Can be thought of as verbs:
• has a
• owns
• contains
• supervises
• performs
• called
• sold
• purchased
• proved
Is “telephone number” a relationship?

Relationships have attributes too
In May, 1995, Andrew Wiles published a proof of Fermat’s Last Theorem
• attribute: In May, 1995,
• entity: Andrew Wiles
• relationship: published
• entity: a proof of Fermat’s Last Theorem

What goes in an entity
Identifying attributes
• Must be able to uniquely identify the entity
• Can have more than one way to identify it
• Id can be composite
Descriptive attributes
• The values you need to keep track of
• Generally should be simple, not complex

What to include in your model
The things your application has to keep track of
• Telephones, wires, switches
The actions your application or its users perform
• Make calls, send telephone bills, collect payments
Some attributes of the things and actions
• Originating number, date and time of call, duration, called number
Keep it simple
Be accurate
Keep it up to date

What to include in your model (continued)
Consider the goals of the system
Everything you include should be there for a reason you can state
• in no more than two sentences
Everything should have a clear name
• if you can’t name it, it doesn’t belong
Talk to the stakeholders!!!

What to leave out of your model
The real world has properties that don’t matter (to your application)
The real world has relationships that don’t matter
Things happen in the real world that don’t matter
Keep it simple
• If you can’t say why you need it, leave it out

Logical vs Physical Data Models
Logical entities often require multiple tables to represent them
• Tables can be thought of as logical or physical
• It depends on your point of view
There is also the physical storage database layout
• storage areas
• data extents
• disks
• etc.
We aren’t going to talk about the physical database layout
We will talk about tables

Mapping Your Model to a Database
Simply put,
Entities become tables
• Identifiers become indexes
Attributes become columns
• Data types: pick appropriate ones
Relationships become tables or foreign keys

“In theory, there is no difference between theory and practice, but in practice there is.”
Jan van de Snepscheut

Now for some practice.

An example
Music store
• Buys compact disc recordings from distributors
• Has inventory
• Allows customers to search for what they want
  – Maybe in an in-store kiosk or on the web
• Sells compact discs to customers

What should we do first?

Activities
We buy discs from a distributor
Orders are sent to a distributor
Orders are delivered to the store
Orders may be cancelled
We sell discs to customers in sales transactions
Customers buy discs in sales transactions
Customers search for what they want to buy
Which of these must be remembered by the system?

What do we need to keep track of
Discs we have
Discs we sold
Discs we know about and can get
Discs we have ordered
Information needed to do our income tax
• what we paid for stock
• when we bought it
• what we sold it for
• when we sold it

Disc entities
UPC Code:        8697-07416-2
Manufacturer:    Sony BMG
Cost to us:      $ 2.00
Price charged:   $ 17.95
Tax charged:     $ 0.80
Date purchased:  March 19, 2007
Date sold:       June 9, 2007

Disc table might look like this
upc           manuf           cost  price  tax   datePurch   dateSold
8697-07416-2  Sony BMG        2.00  17.95  0.90  2007-03-19  2007-06-09
8697-07416-2  Sony BMG        2.00  ?      ?     2007-06-09  ?
314-510347-2  Island Records  2.21  15.95  0.80  2006-01-12  2007-02-14
314-510347-2  Island Records  2.21  ?      ?     2006-01-12  ?

What’s wrong?
Is upc a unique identifier?
Might have bought from a distributor
Have no information about what is on the disc
• How do customers search?
Don’t know when disc was made
Could be more than one tax jurisdiction
• provincial tax, city tax
Don’t know if disc is on order
Don’t know who bought it
Duplicated data
Etc., etc.
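
Two of the problems listed above (upc is not a unique identifier, and data is duplicated) can be checked directly against the four rows of the disc table. The following is only a minimal sketch, assuming Python's standard sqlite3 module as a stand-in database; the table name disc and the use of None for the "?" cells are choices made for this illustration, not part of the original slides.

```python
import sqlite3

# Rows copied from the "Disc table" slide; None stands in for the "?" cells.
rows = [
    ("8697-07416-2", "Sony BMG", 2.00, 17.95, 0.90, "2007-03-19", "2007-06-09"),
    ("8697-07416-2", "Sony BMG", 2.00, None, None, "2007-06-09", None),
    ("314-510347-2", "Island Records", 2.21, 15.95, 0.80, "2006-01-12", "2007-02-14"),
    ("314-510347-2", "Island Records", 2.21, None, None, "2006-01-12", None),
]

conn = sqlite3.connect(":memory:")
# Hypothetical flat table with the same columns as the slide.
conn.execute(
    "CREATE TABLE disc (upc TEXT, manuf TEXT, cost REAL, price REAL,"
    " tax REAL, datePurch TEXT, dateSold TEXT)"
)
conn.executemany("INSERT INTO disc VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# A upc that appears on more than one row cannot identify a row by itself.
duplicate_upcs = conn.execute(
    "SELECT upc, COUNT(*) FROM disc"
    " GROUP BY upc HAVING COUNT(*) > 1 ORDER BY upc"
).fetchall()

# The manufacturer name is stored once per physical copy, not once per title.
manuf_copies = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT manuf) FROM disc"
).fetchone()

print(duplicate_upcs)  # every upc in the sample occurs twice
print(manuf_copies)    # (rows stored, distinct manufacturer names)
```

In this sample every upc occurs on two rows, so a unique index on upc alone would reject the second copy of each title; that is the sense in which the flat table mixes "a title we know about" with "a copy we hold".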

Disc entities take 2
UPC Code:       8697-07416-2
Manufacturer:   Sony BMG
Distributor:    Bob’s Wholesale CD’s
Cost to us:     $ 2.00
Price charged:  $ 17.95
Tax charged:    $ 0.80
Date ordered:   March 19, 2007
Date received:  March 20, 2007
Date sold:      June 9, 2007
Disc Title:     “The Essential Joshua Bell”
Artist:         Joshua Bell
Track 1:        “Danse Russe”
Track 2:        “Violin Concerto in E Minor”
Track 3:        “Nocturne in C-sharp Minor”
etc.

Example: Now What’s wrong?
This is getting messy
Activities combined with disc’s attributes
Have duplicated information
How many tracks can there be?
What if there is more than one artist?
Don’t have all the information a customer might want to use to search

Discs revisited
Discs have titles
Discs have pictures on the cover
Discs contain tracks
Discs are made by manufacturers
Discs are purchased from distributors
Discs are ordered from distributors
Discs are delivered to the store
Discs are sold to customers

“Discs contain tracks …”
Tracks contain songs
Tracks occur in order
Tracks have a duration
Songs are performed in performances
Songs have performers (usually)
Songs have composers
Songs have names (titles)
Songs have a key (but not always)
Performances are done by performers
Performers can be groups (bands, orchestras, etc.)
Performances are performed in a location or venue

We seem to need these entities
Discs, Manufacturers, Distributors, Orders, Customers, Inventory
Tracks, Songs, Performers, Groups
?

Songs have names (titles).
Are names properties of songs?
Or are they entities related to songs?
Or are they something else?

Song data (track 1)
Title         “Danse Russe” from Swan Lake, Op. 20
Time          4:30
Composer      Peter Tchaikovsky
Category      Classical, violin, orchestra
Performers    Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra
Track number  1
Disc upc      8697-07416-2

Song data (track 2)
Title         Violin Concerto in E Minor, Op. 64
Time          6:27
Composer      Felix Mendelssohn
Category      Classical, violin, orchestra
Performers    Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Track number  2
Disc upc      8697-07416-2

Performance data
Title       Violin Concerto in E Minor, Op. 64
Time        6:27
Composer    Felix Mendelssohn
Category    Classical, violin, orchestra
Performers  Joshua Bell, Sir Roger Norrington, Camerata Salzburg

Performance data take 2
Title                 Violin Concerto in E Minor, Op. 64
Time                  6:27
Composer              Felix Mendelssohn
Category              Classical, violin, orchestra
Performers            Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance Date      ?
Performance Location  ?

Performer data
id  name
1   Joshua Bell
2   Sir Roger Norrington
3   Camerata Salzburg
4   Michael Tilson Thomas
5   Berlin Philharmonic
6   Bono
7   The Edge
8   Adam Clayton
9   Larry Mullen

Performance to Performer Relationship
performance id  performer id
1               1
1               2
1               3
1               …
2               1
2               4
2               5
2               …
325             6
325             7
325             8
325             9

Performance data take 3
Performance id  2
Title           Violin Concerto in E Minor, Op. 64
Time            6:27
Composer        Felix Mendelssohn
Category        Classical, violin, orchestra

Track to Performance Relationship
Disc upc      Track Num  Performance id
8697-07416-2  1          1
8697-07416-2  2          2
…             …          …
314-510347-2  1          325

Relationships (so far):
[Diagram: track and performance, one to one; disc and track, one to many; performer and performance, many to many]

What happened to Songs?

Relationships (take 2):
[Diagram: song and performance, one to many; track and performance, one to one; disc and track, one to many; performer and performance, many to many]

Relationships (take 3):
[Diagram: a disc linked to its tracks; each track linked to a song, a performance, and performers]

What about “business entities”? Where are they?

Business entities
[Diagram, shown as three build slides: the same disc, track, song, performance, and performer diagram]

Should you use arrays?
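
The many-to-many relationship between performances and performers shown in the tables above can be sketched as a separate relationship table holding one row per pair. This is a minimal sketch using Python's standard sqlite3 module; the table names performer and performance_performer and the two helper functions are hypothetical, and the rows are copied from the slides with the "…" rows left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE performer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Relationship table: one row per (performance, performer) pair.
-- In a fuller schema performance_id would reference a performance table.
CREATE TABLE performance_performer (
    performance_id INTEGER NOT NULL,
    performer_id   INTEGER NOT NULL REFERENCES performer(id),
    PRIMARY KEY (performance_id, performer_id)
);
""")

conn.executemany("INSERT INTO performer VALUES (?, ?)", [
    (1, "Joshua Bell"), (2, "Sir Roger Norrington"), (3, "Camerata Salzburg"),
    (4, "Michael Tilson Thomas"), (5, "Berlin Philharmonic"), (6, "Bono"),
    (7, "The Edge"), (8, "Adam Clayton"), (9, "Larry Mullen"),
])
conn.executemany("INSERT INTO performance_performer VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 4), (2, 5),
    (325, 6), (325, 7), (325, 8), (325, 9),
])

def performers_of(performance_id):
    """Names of all performers linked to one performance, in id order."""
    rows = conn.execute(
        "SELECT p.name FROM performer p"
        " JOIN performance_performer pp ON pp.performer_id = p.id"
        " WHERE pp.performance_id = ? ORDER BY p.id",
        (performance_id,),
    ).fetchall()
    return [name for (name,) in rows]

def performances_of(performer_id):
    """Ids of all performances linked to one performer, in id order."""
    rows = conn.execute(
        "SELECT performance_id FROM performance_performer"
        " WHERE performer_id = ? ORDER BY performance_id",
        (performer_id,),
    ).fetchall()
    return [pid for (pid,) in rows]

print(performers_of(325))
print(performances_of(1))
```

Because each pair is its own row, a performance can have any number of performers and a performer any number of performances, without the fixed "Track 1, Track 2, Track 3" style columns criticized earlier.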

Indexes
Enforce uniqueness
Make searches faster
Enable fast retrieval of entities by their identities
Enable finding entities with certain attributes

What indexes do we need for the music store database?

Tables
0) Discs
1) Tracks
2) Songs
3) Performers
4) Performances
5) Tracks of discs
6) Performances of songs
7) Performers of performances

What indexes do we need
0) Indexes for identifying attributes
1) A unique row identifier
2) Indexes for the queries you will do

What should we do next?

Other Topics
Normalization
Unique keys
Word indexes
Naming
Customization

Normalization
Oversimplified, it means:
• Don’t duplicate data
Attributes should be simple
• have only one value
• be necessary
• not derived data
• don’t repeat
Complicated attributes are often entities in their own right
• For example, addresses might be

Unique keys
EVERY table must have a unique key
EVERY row needs a unique identifier
• that never changes even if moved to another database (i.e. if you replicate)
Often, users don’t need to see it
Use a UUID or sequence or maybe datetime
Unique key is the ONLY way to identify rows unambiguously
ROWIDs are temporary and can change
Use the same method throughout
• You’ll be glad you did

Word indexes
Can be used to hold multiple status or attribute values
• Conflicts with normalization
• Flexible
Easy to add new ones
Queries are fast
Example:
• Category: classical, violin, orchestral, concerto

Naming
Good names are crucial to understanding
• What is in the column “GL01262”?
Table and column names should have clear meanings everyone can understand
• “GL01262” vs “dateEntered”
Names with dashes cause inconvenience with SQL
• “order-date”
Booleans should be named for truth value
• “backOrdered”
No double negations
• “notOutOfStock”

Making tables customizable
We will look at 3 ways:
Spare columns
Separate table with spare columns
Separate table with name/value pairs

Spare columns in table
custnum  name   city     extra1  extra2  extra3
001      Bob    Phoenix  frozen  ?       0.0
002      Alice  Boston   ?       125.46  0.12
003      Eve    Denver   ?       ?       ?
What data types should you use?
How many spare columns?
Wasted columns when not used
How do you know what each spare got used for?
How do you know how many unused spares you have?

Separate table for spare columns
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  extra1  extra2  extra3
001      frozen  ?       0.0
002      ?       125.46  0.12

Separate table for spare columns (with named columns)
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  status  owed    discount
001      frozen  ?       0.0
002      ?       125.46  0.12

Separate table with name/value pairs
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  name      value
001      status    frozen
002      owed      125.46
002      discount  0.12

Modeling Tools
PCase
Enterprise Architect
Power Designer
ConceptDraw
Erwin
Rational
Pencil and paper!
Blackboard!

Summary
Understand the requirements
Leave out what is not needed
Review the design with stakeholders
Evolve the design as changes come up
Test to make sure it works
• Can it do everything that is needed?
• Does it perform adequately?
Expect changes to come

Homework
Papers
• Wiles, A.: “Modular elliptic curves and Fermat’s Last Theorem”, Annals of Mathematics 141 (3): 443-551
• Chen, P.: “The Entity-Relationship Model -- Toward a Unified View of Data”, ACM TODS Vol 1, No 1, 1976
Wikipedia articles to start from:
• entity-relationship model
• data model
Books:
• Teorey, Lightstone, Nadeau: “Database Modeling and Design”, Morgan Kaufmann.

Questions
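
Appendix sketch: name/value pairs
The third customization approach from the slides above, a separate table with name/value pairs, might look like the following. This is only a sketch under assumptions: it uses Python's standard sqlite3 module, the table names customer and customer_extra and the helper extras_for are hypothetical, and every value is stored as text, which is one possible choice rather than anything the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    custnum TEXT PRIMARY KEY,
    name    TEXT,
    city    TEXT
);
-- One row per extra attribute; customers without extras have no rows here.
CREATE TABLE customer_extra (
    custnum TEXT NOT NULL REFERENCES customer(custnum),
    name    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (custnum, name)
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("001", "Bob", "Phoenix"),
    ("002", "Alice", "Boston"),
    ("003", "Eve", "Denver"),
])
conn.executemany("INSERT INTO customer_extra VALUES (?, ?, ?)", [
    ("001", "status", "frozen"),
    ("002", "owed", "125.46"),
    ("002", "discount", "0.12"),
])

def extras_for(custnum):
    """Return the extra attributes of one customer as a name -> value dict."""
    rows = conn.execute(
        "SELECT name, value FROM customer_extra WHERE custnum = ?",
        (custnum,),
    ).fetchall()
    return dict(rows)

print(extras_for("002"))
print(extras_for("003"))
```

Compared with spare columns, the attribute name is stored with the value, so it is clear what each extra is used for and no columns sit empty; the trade-off in this sketch is that numeric values such as owed come back as text and must be converted by the application.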