Physical Database Design
Barry Floyd
BUS 498 Advanced Database Management Systems

Introduction: The Physical Database Design Process
- Goal: translate our conceptual designs into physical reality.
- Draw on the requirements analysis and our conceptual data model.

Agenda
- Data volume and usage analysis
- Data distribution strategy (discussed later in the quarter)
- Indexes
- Denormalization

Overview
- Physical design is an important step in the database design process (and also the last step).
- Decisions made here affect:
  - data accessibility
  - response times
  - usability

Vocabulary
- Data volume: how many records there are.
- Data usage: how often, and in what manner, the records are used.

Data Volume Analysis
Use volume analysis to:
- select physical storage devices
- estimate storage costs

Example entities: LOCATION, PATIENT, PHYSICIAN, TREATMENT, CHARGE, ITEM.
Given volumes: LOCATION 100, PHYSICIAN 50, ITEM 500.
Derived volumes:
- PATIENT = 1,000. A patient record is kept active for 30 days, and the average length of stay is 3 days, so 100 × 30 / 3 = 1,000 (on average, 10 patients per location).
- TREATMENT = 4,000. Each patient has 4 treatments on average: 1,000 × 4 = 4,000.
- CHARGE = 10,000. Each patient has 10 charges on average: 1,000 × 10 = 10,000.
We now know the number of records in each entity and the average number of related records for each relationship.

Data Usage Analysis
- Identify the major transactions and processes that hit the database.
- Analyze each transaction and process to determine the access paths used and their frequency of use.
- Create a composite map from the individual analyses.

For example, transaction MVCH-4 (CREATE PATIENT BILL) runs on average 2 times per hour, with a peak of 10 per hour.
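The volume derivations above are simple arithmetic, so they are easy to recompute whenever an assumption changes. A minimal Python sketch, assuming the figures stated in the example (100 locations, 30-day retention, 3-day average stay, 4 treatments and 10 charges per patient); the function name `estimate_volumes` is hypothetical:

```python
# Minimal sketch (hypothetical helper, not from the slides): derive the
# PATIENT, TREATMENT, and CHARGE volumes from the example's assumptions.


def estimate_volumes(
    locations: int = 100,
    retention_days: int = 30,
    avg_stay_days: int = 3,
    treatments_per_patient: int = 4,
    charges_per_patient: int = 10,
) -> dict[str, int]:
    """Return estimated record counts for the derived entities."""
    # Each location turns over a patient every avg_stay_days on average,
    # and patient records stay active for retention_days.
    patients = locations * retention_days // avg_stay_days
    return {
        "PATIENT": patients,
        "TREATMENT": patients * treatments_per_patient,
        "CHARGE": patients * charges_per_patient,
    }


if __name__ == "__main__":
    for entity, rows in estimate_volumes().items():
        print(f"{entity:<10}{rows:>8,}")
```

Calling `estimate_volumes(avg_stay_days=5)`, for instance, gives 600 patients; integer division rounds down when the numbers do not divide evenly.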
Transaction Analysis Form: MVCH-4, CREATE PATIENT BILL (average 2/hr, peak 10/hr)
Entities accessed: (1) PATIENT, 1,000 records; (2) CHARGE, 10,000 records; (3) ITEM, 500 records.

No.  Access name      Access type  Refs per transaction  Refs per period (peak hour)
(1)  ENTRY-PATIENT    READ         1                     10
(2)  PATIENT-CHARGE   READ         10                    100
(3)  CHARGE-ITEM      READ         10                    100

References per period = references per transaction × peak volume (10 transactions per hour).

Composite Usage Map
Determine how the data structures are accessed by each transaction and process, including programs, standard (programmed) queries, and ad hoc queries.

[Figure: composite usage map over LOCATION (100), TREATMENT (4,000), PATIENT (1,000), PHYSICIAN (50), CHARGE (10,000), and ITEM (500); the number on each access path is accesses per hour at peak volume.]

Summary
Given volume and usage knowledge, we can consider different physical implementation strategies, including:
- indexes
- denormalization
- clustering

Indexes
Purpose:
- Speed up access to a particular row or group of rows in a table.
- Enforce uniqueness.
- Eliminate the need to re-sort the table each time we need to create a sequenced list.

Index example (an index on name, pointing to row numbers in the table):

Index (sorted)     Table (unsorted)
Name     Row       Row  Name
Allen    3         1    Marvin
Brian    6         2    John
Carole   7         3    Allen
John     2         4    Sue
Karen    5         5    Karen
Marvin   1         6    Brian
Sharon   8         7    Carole
Sue      4         8    Sharon

Example
SELECT NAME, DEPT, RATING FROM EMP WHERE RATING = 10;
Indexing on RATING improves the performance of this query. Without an index, the DBMS must do a full table scan.

Costs of an index
- Storage space.
- Maintenance: the index must be updated for each insert, delete, or change in the value of an indexed field. In one benchmark, an insert into a table without indexes took 0.11 seconds; with 8 indexes, it took 0.94 seconds.

Access Indexes
- Created automatically on the primary key. You must create other indexes as needed.
- Note: creating a unique index on a foreign key turns the relationship into a 1:1 relationship rather than a 1:M relationship.
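The effect of an index on the EMP query, and the 1:1 effect of a unique index on a foreign key, can both be observed with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for Oracle or Access, the EMP rows are synthetic, and the EMP_ADDRESS table and both index names are hypothetical. The exact wording of SQLite's plan output varies by version, but it reports a SCAN of EMP before the index exists and a SEARCH using the index afterward.

```python
import sqlite3

# Hypothetical schema: EMP mirrors the slide's query; EMP_ADDRESS is an
# illustrative child table used to show the unique-index-on-foreign-key effect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (
        EMP_NO INTEGER PRIMARY KEY,
        NAME   TEXT,
        DEPT   TEXT,
        RATING INTEGER
    );
    CREATE TABLE EMP_ADDRESS (
        ADDR_ID INTEGER PRIMARY KEY,
        EMP_NO  INTEGER REFERENCES EMP (EMP_NO),
        STREET  TEXT
    );
""")
# Synthetic data: 1,000 employees with ratings 0..10.
conn.executemany(
    "INSERT INTO EMP (NAME, DEPT, RATING) VALUES (?, ?, ?)",
    [(f"emp{i}", f"D{i % 5}", i % 11) for i in range(1000)],
)

QUERY = "SELECT NAME, DEPT, RATING FROM EMP WHERE RATING = 10"


def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row has four columns; the last one is the human-readable detail.
    return " | ".join(row[3] for row in rows)


before = plan(QUERY)  # no usable index: SQLite reports a scan of EMP
conn.execute("CREATE INDEX idx_emp_rating ON EMP (RATING)")
after = plan(QUERY)   # SQLite reports a search using idx_emp_rating

# A unique index on the foreign key allows at most one child row per parent,
# which turns the 1:M relationship into a 1:1 relationship.
conn.execute("CREATE UNIQUE INDEX idx_addr_emp ON EMP_ADDRESS (EMP_NO)")
conn.execute("INSERT INTO EMP_ADDRESS (EMP_NO, STREET) VALUES (1, 'First St')")
try:
    conn.execute("INSERT INTO EMP_ADDRESS (EMP_NO, STREET) VALUES (1, 'Second St')")
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

print("without index:", before)
print("with index:   ", after)
print("second address rejected:", one_to_one_enforced)
```

From this point on, SQLite also has to maintain idx_emp_rating on every insert, delete, or RATING change in EMP, which is the maintenance cost listed under the costs of an index.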
Let's consider Oracle indexes and performance.

Oracle Indexes: 26,000 rows, 7 rows per block
Index only:    SELECT COUNT(*) FROM EMP WHERE EMP_NO > 0
Index + table: SELECT EMP_NAME FROM EMP WHERE EMP_NO > 0

% of file read   Index only (s)   Index + table (s)   Full table scan (s)
8.5              0.66             12.03               35.70
15.5             1.04             16.21               35.70
25.2             1.54             25.45               35.70
50.7             2.80             33.89               35.70
100              5.72             87.23               35.70

Breakeven: index + table access becomes slower than a full table scan somewhere between 50.7% and 100% of the file read; index-only access stays faster throughout.

Oracle Indexes: 26,000 rows, 258 rows per block
Same queries as above.

% of file read   Index only (s)   Index + table (s)   Full table scan (s)
8.5              0.66             2.31                4.52
15.5             1.05             4.01                4.52
25.2             1.59             6.37                4.52
50.7             2.91             12.69               4.52
100              6.01             25.37               4.52

Breakeven: index + table access becomes slower than a full table scan between 15.5% and 25.2% of the file read; index-only access, between 50.7% and 100%. With more rows per block, a full scan reads far fewer blocks, so the breakeven point drops sharply.

Rules of thumb
- Use indexes generously for decision support/retrieval-based applications.
- Use indexes judiciously for transaction processing applications.

Places to use indexes
- Primary keys
- Foreign keys
- Non-key attributes referred to in qualification, sorting, and grouping (WHERE, ORDER BY, GROUP BY)

Denormalization
Goal: reduce the number of physical reads from the storage devices by reducing the number of joins.

Costs of denormalization
- Makes coding more complex.
- Often sacrifices flexibility.
- Speeds up retrieval but slows updates.

Including children in the parent record
Example: multiple addresses in the personnel record. Appropriate when:
- the absolute number of children for a parent is known (e.g., 2 addresses);
- the number won't change over time;
- the number is not very large.

Clusters in Oracle
- Clustering stores records from two tables in the same physical storage space.
- Only useful for equi-joins.
- Improves performance by 2-3 times.

Storing most recent child data in the parent record
Applies when a parent has multiple children that have an ordering (e.g., date of order). For example, the parent record might store the amount of the last order.
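One way to keep "most recent child" data current in the parent is a trigger on the child table. A minimal sketch with Python's sqlite3, assuming a hypothetical CUSTOMER/ORDERS schema in which the parent stores the date and amount of its latest order; the trigger name and the LAST_ORDER_* columns are illustrative, and SQLite stands in for whatever DBMS is actually used:

```python
import sqlite3

# Hypothetical schema: LAST_ORDER_DATE and LAST_ORDER_AMOUNT are denormalized
# copies of data from the customer's most recent order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (
        CUST_ID           INTEGER PRIMARY KEY,
        NAME              TEXT NOT NULL,
        LAST_ORDER_DATE   TEXT,
        LAST_ORDER_AMOUNT REAL
    );
    CREATE TABLE ORDERS (
        ORDER_ID   INTEGER PRIMARY KEY,
        CUST_ID    INTEGER NOT NULL REFERENCES CUSTOMER (CUST_ID),
        ORDER_DATE TEXT NOT NULL,  -- ISO-8601 text, so text order = date order
        AMOUNT     REAL NOT NULL
    );
    -- Every insert into ORDERS also updates CUSTOMER: the extra update
    -- cost that buys cheaper reads. Older orders do not overwrite newer ones.
    CREATE TRIGGER trg_last_order AFTER INSERT ON ORDERS
    BEGIN
        UPDATE CUSTOMER
           SET LAST_ORDER_DATE = NEW.ORDER_DATE,
               LAST_ORDER_AMOUNT = NEW.AMOUNT
         WHERE CUST_ID = NEW.CUST_ID
           AND (LAST_ORDER_DATE IS NULL OR NEW.ORDER_DATE >= LAST_ORDER_DATE);
    END;
""")
conn.execute("INSERT INTO CUSTOMER (CUST_ID, NAME) VALUES (1, 'Allen')")
conn.executemany(
    "INSERT INTO ORDERS (CUST_ID, ORDER_DATE, AMOUNT) VALUES (?, ?, ?)",
    [(1, "2024-01-05", 120.0), (1, "2024-03-10", 75.5), (1, "2024-02-01", 300.0)],
)

# Denormalized read: one parent row, no join or sort over ORDERS.
denormalized = conn.execute(
    "SELECT LAST_ORDER_AMOUNT FROM CUSTOMER WHERE CUST_ID = 1"
).fetchone()[0]

# Normalized equivalent: find the latest child row.
normalized = conn.execute(
    """SELECT AMOUNT FROM ORDERS
        WHERE CUST_ID = 1
        ORDER BY ORDER_DATE DESC
        LIMIT 1"""
).fetchone()[0]

print(denormalized, normalized)
```

Both queries return 75.5, but the first reads a single parent row. This sketch handles inserts only; deleting or changing an order would need additional triggers to keep the copied values correct.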
Similarly, an account record might store the amount of the last dividend paid to that account.

Store running totals / create extract tables
- Store summary data derived from child records (e.g., year-to-date sales).
- Create a summary table containing aggregate values over some period (say, one month).

Duplicating a key beyond an immediate child record
CLASS  (CLASS_ID)
PARTS  (PART_ID, CLASS_ID)
ORDERS (ORDER_ID, PART_ID, CLASS_ID)   <- add CLASS_ID to ORDERS

Consider the SQL for this example.

Before (three-table join):
    SELECT O.PART_ID, O.ORDER_ID, C.CLASS_ID, C.CLASS_DESC
    FROM CLASS C, PARTS P, ORDERS O
    WHERE O.PART_ID = P.PART_ID
      AND P.CLASS_ID = C.CLASS_ID;

After adding CLASS_ID to ORDERS (two-table join; PARTS is no longer needed):
    SELECT O.PART_ID, O.ORDER_ID, C.CLASS_ID, C.CLASS_DESC
    FROM CLASS C, ORDERS O
    WHERE O.CLASS_ID = C.CLASS_ID;

Record Partitioning
Breaking up a record into two parts: A, B, C, D, E, F, G becomes A, B, C, D and E, F, G.

Summary
Logical design gives you information about how to build the system. Good physical design also takes into account the performance of the final design; to know how best to do this, you must understand how the system is being used!