T81-490b Systems Analysis and Development Project
Database Design – Creating the Physical Data Model – Part 1

T81-490b -- Database Design – Creating the Physical Data Model – Part 1 -- Class #12

Announcements
Change to the Schedule

Where we're going:
- Assignment #6: Process Modeling - due next week
- Talk about Physical Database tonight and next week
- Exam #2
- Assignment #7: Data Dictionary - will be assigned tonight
- Assignment #8: Report and Screen Design
- 5 more class nights

Tonight's Topic: Going from the Conceptual/Logical Model to the Physical Model

Make Assignment #7 tonight.

Take Quiz on Reading Assignment #6.

Make Assignment #7 now.

Making the transition from conceptual data design to physical data design
In Assignment #5, you created a model that was partly conceptual and partly logical. Here are some specific definitions.
Conceptual Data Model:
- high-level, business-oriented view
- non-critical details left out
- emphasizes the most important entities, attributes, and relationships
Goal: clarity at a high level

Logical Data Model:
- fully normalized
- all attributes defined
- all candidate keys specified
- primary key identified
- foreign keys clearly defined or implied
- any remaining many-to-many relationships translated into associative entities
- cardinality specified
- optionality specified
Goal: a complete document from which a physical database can be developed

Physical Data Model:
- dependent on your specific DBMS
- specified by the DDL statements that will actually be used to create the database objects
- may not be fully normalized
Goal: creation of a physical database

Normalization -- a quick review
Before we begin to create the Physical Model, we must make sure that our Logical Model is normalized. Normalization is essentially the process of identifying the one best place where each fact belongs. It is important for data integrity, and for ease of loading data into our database.

The Normal Forms
1NF
1. Eliminate repeating groups.
2. Eliminate/resolve non-atomic data.
2NF
1. Every non-key attribute depends on the whole primary key, not just part of a composite key.
3NF
1. No non-key attribute depends on another non-key attribute (no transitive dependencies).

This is corny and over-worn, but I'll say it anyway. In 3NF, "Every attribute depends upon the key, the whole key, and nothing but the key... so help me Codd."
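As a minimal sketch of "one best place for each fact", the Python below removes a transitive dependency from some hypothetical order data (not the Soccer exercise): customer_name depends on customer_id, which in turn depends on the order key, so 3NF moves the name into its own customers table where it is stored exactly once. The names OrderRow and normalize are illustrative only.

```python
from typing import NamedTuple

# Hypothetical unnormalized rows: customer_name depends on customer_id,
# not on order_id, so it is repeated on every order (a transitive dependency).
class OrderRow(NamedTuple):
    order_id: int
    customer_id: int
    customer_name: str
    order_total: float

def normalize(rows):
    """Split flat order rows into a customers table and an orders table (3NF).

    Each customer name is stored once, keyed by customer_id. Raises ValueError
    if one customer_id arrives with two different names: the kind of
    inconsistency that redundant copies of a fact invite.
    """
    customers = {}  # customer_id -> customer_name
    orders = {}     # order_id -> (customer_id, order_total)
    for r in rows:
        existing = customers.setdefault(r.customer_id, r.customer_name)
        if existing != r.customer_name:
            raise ValueError(f"conflicting names for customer {r.customer_id}")
        orders[r.order_id] = (r.customer_id, r.order_total)
    return customers, orders

rows = [
    OrderRow(1, 10, "Acme", 250.0),
    OrderRow(2, 10, "Acme", 75.5),
    OrderRow(3, 20, "Globex", 120.0),
]
customers, orders = normalize(rows)
print(customers)  # {10: 'Acme', 20: 'Globex'}
print(orders)     # {1: (10, 250.0), 2: (10, 75.5), 3: (20, 120.0)}
```

The orders table keeps only the foreign key customer_id; a join on that key reconstructs the original rows without storing "Acme" twice.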
To ensure a working knowledge of Normalization, do the Soccer exercise.

Break

The Physical Data Model
The physical data model is created by transforming the logical data model into a physical implementation based on the DBMS to be used for deployment. It is very vendor-specific, so you will need a good working knowledge of the DBMS.
The term "model" is a little misleading. It is not a diagram, like an ERD or DFD. It is basically a set of DDL statements that create the database objects.
(Please note: this whole discussion assumes you will be using a Relational Database Management System.)

Basic Transformations
1. Transform Entities to Tables
2. Transform Attributes to Columns
   The naming rules of the DBMS may not let you keep the same names you had in the logical model.
   --> Look at the handout on abbreviations.
3. Transform Domains to Data Types
   Each column must have a data type and size, and possibly a number of decimal places. More about data types later.
   Columns may also need constraints:
   - NOT NULL constraints
   - Uniqueness constraints
   - "Check" constraints: a specific set or range of values
4. Transform Relationships into Referential Constraints
   - Primary Keys and Foreign Keys

A good CASE tool will generate DDL from your data model.
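Here is a small, hedged sketch of the four basic transformations for a hypothetical DEPARTMENT (1) to EMPLOYEE (many) model. The table names, abbreviated column names, types, and CHECK ranges are assumptions for illustration. SQLite, run from Python's sqlite3 module, stands in for the target DBMS: it accepts VARCHAR(n) and DECIMAL(p,s) but does not enforce the sizes, and it checks foreign keys only after PRAGMA foreign_keys = ON, so DDL for a production DBMS would differ in the details.

```python
import sqlite3

# Hypothetical logical model: DEPARTMENT (1) ---< (many) EMPLOYEE.
DDL = """
CREATE TABLE dept (                           -- entity DEPARTMENT -> table
    dept_id    INTEGER      PRIMARY KEY,      -- primary key
    dept_name  VARCHAR(30)  NOT NULL UNIQUE   -- NOT NULL + uniqueness
);

CREATE TABLE emp (                            -- entity EMPLOYEE -> table
    emp_id     INTEGER       PRIMARY KEY,
    emp_lname  VARCHAR(30)   NOT NULL,        -- attribute "last name", abbreviated
    salary     DECIMAL(9,2)  CHECK (salary BETWEEN 0 AND 500000),     -- range check
    status     CHAR(1)       NOT NULL CHECK (status IN ('A', 'T')),  -- value list
    dept_id    INTEGER       NOT NULL REFERENCES dept (dept_id)      -- relationship -> FK
);
"""

def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn

def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_schema()
conn.execute("INSERT INTO dept VALUES (1, 'Accounting')")
insert_emp = "INSERT INTO emp VALUES (?, ?, ?, ?, ?)"
print(try_insert(conn, insert_emp, (100, "Smith", 52000, "A", 1)))   # True
print(try_insert(conn, insert_emp, (101, "Jones", 48000, "X", 1)))   # False: CHECK on status
print(try_insert(conn, insert_emp, (102, "Brown", 61000, "A", 99)))  # False: no dept 99
```

The point of the sketch is that every rule from the logical model (keys, optionality, domains, relationships) ends up as a constraint the database itself enforces, not as a convention that each program must remember.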
Handout and discuss: ERD to DDL

Other physical model structures to be discussed later (next week) include:
- Physical data structures: tablespaces, datafiles, extents, blocks, rows
- Performance structures: indexes
- Security features: grants, views

Performance -- preliminary introduction
What are the performance issues?
- Essentially: How fast does it run? (Does it run fast enough?)
- Possibly also: How well does it scale? (Well enough?)

Scalability
- It worked in test with small amounts of data.
- The results were correct.
- It ran fast.
- Why wouldn't it be the same in production?
- You will soon scale up. Maybe data is gradually added to the system.
- The results (hopefully) will still be correct.
- As volume increases, things start to slow down. Why?
What's the solution to these issues? Performance Tuning / Design for Performance

Denormalization
Why denormalize? One reason only: performance.
Don't denormalize just because you think it might be helpful. Try your best to tune performance first; denormalize only as a last resort in a long performance tuning effort.
- It is disruptive.
- It takes time (maybe downtime for the end users).
- It could be interpreted as you fixing an implementation mistake.
Document what you did, why you did it, and when.

The downside: redundant data. Now, when you load that data, you must load it in more than one place. How do you handle that?
If you do it in programs:
- you might forget, in a new program.
- it's a lot of work that way.
- an ad hoc user adds a row of data, and suddenly the redundant copies no longer match.
You might use Triggers:
- at the database level
- independent of programs or ad hoc users

Examples of reasons you might denormalize (NOT an exhaustive list):
1. Prejoined Tables
   If the join must be done often, and is prohibitively expensive.
   Advantage: you do the join only once.
   Must be periodically refreshed or rebuilt (which will do the join again).
   In Oracle, this is called a Materialized View.
2. Report Tables
   Often a report cannot be generated using SQL only. You can create a Prejoined Table with just the information needed for the report. Then write a program to do a simple SELECT from the report table and do the remainder of the formatting.
3. Derivable Data
   If the cost (in CPU cycles) of calculating data is prohibitive, you might physically store the calculated data instead of calculating it on the fly each time.

Handout the Soccer "partially denormalized" solution -- discuss.

A note about Disk Utilization
In the olden days, disk was very expensive. Now, it is "cheap". But don't be wasteful due to 1) incompetence, or 2) apathy.
Another triple constraint triangle (no, this is not on the Exam):
- Storage -- don't squander disk
- Performance -- don't spend time packing/unpacking, compressing/decompressing, decoding
- Maintainability -- don't be so cryptic that nobody can understand your code
Watch out for the ever-expanding data store where no data is ever deleted.
Design a plan to archive and delete old data that is no longer actively used.
See the chart that follows.

[Figure: SMT_DATA Trend. Size in millions (y-axis, 0 to 1800) by date (x-axis, 2/21/2000 to 2/21/2005)]
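One step of such a plan can be sketched as an archive-then-purge run inside a single transaction. This is a minimal illustration, not the plan used for SMT_DATA: the table names txn and txn_archive, the ISO date column, and the cutoff date are all hypothetical, and SQLite via Python's sqlite3 module stands in for the production DBMS.

```python
import sqlite3

# Hypothetical active and archive tables with identical columns.
SCHEMA = """
CREATE TABLE txn (
    txn_id   INTEGER PRIMARY KEY,
    txn_date TEXT    NOT NULL,   -- 'YYYY-MM-DD', so string comparison orders by date
    amount   REAL
);
CREATE TABLE txn_archive (
    txn_id   INTEGER PRIMARY KEY,
    txn_date TEXT    NOT NULL,
    amount   REAL
);
"""

def archive_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Copy rows dated before `cutoff` into txn_archive, then delete them from txn.

    Both statements run in one transaction (the `with conn` block commits on
    success and rolls back on an exception), so a failure never leaves a row
    deleted without its archived copy. Returns the number of rows moved.
    """
    with conn:
        conn.execute(
            "INSERT INTO txn_archive (txn_id, txn_date, amount) "
            "SELECT txn_id, txn_date, amount FROM txn WHERE txn_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM txn WHERE txn_date < ?", (cutoff,))
        return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany(
        "INSERT INTO txn VALUES (?, ?, ?)",
        [(1, "2001-03-15", 10.0), (2, "2003-06-01", 20.0), (3, "2005-01-10", 30.0)],
    )

moved = archive_older_than(conn, "2004-01-01")
active = conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM txn_archive").fetchone()[0]
print(moved, active, archived)  # 2 1 2
```

In practice the cutoff would come from a retention policy agreed with the business, and the archive table would often live in cheaper storage or a separate tablespace.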