Information Resources Management April 3, 2001 Agenda Administrivia Physical Database Design Database Integrity Performance Administrivia Exam 2 Regrade Requests Exam SQL Create Database Enter query(s) as submitted Submit to me Database (electronic) Graded homework (paper) Reserve the right to change test data and reexecute query Foreign Keys Inserts require all FK values be the value of a primary key in the reference table Update and delete constraints are also possible Referential Integrity ON DELETE CASCADE/RESTRICT/SET NULL ON UPDATE CASCADE/RESTRICT/SET NULL Default ON DELETE RESTRICT ON UPDATE CASCADE Example CREATE TABLE PCAccess (PC# INTEGER, EmpID CHAR(9), AccessType CHAR(15), PRIMARY KEY (PC#, EmpID), FOREIGN KEY (EmpID) REFERENCES (Employee), FOREIGN KEY (PC#) REFERENCES (PC)) Example - PCAccess Table PC# EmpID AccessType 1 1 Full 1 2 Restricted 2 1 Full 3 4 Full 3 2 Semi-Restricted Example #1 PC# 1 1 2 3 3 EmpID 1 2 1 4 2 INSERT INTO PCAccess (PC#) VALUES (4) AccessType Full Restricted Full Full Semi-Restricted Example #2 PC# 1 1 2 3 3 EmpID 1 2 1 4 2 INSERT INTO PCAccess (PC#, EmpID) VALUES (4,5) AccessType Full Restricted Full Full Semi-Restricted Example #3 PC# 1 1 2 3 3 EmpID 1 2 1 4 2 UPDATE Employee SET EmpID = 10 WHERE EmpID = 1 AccessType Full Restricted Full Full Semi-Restricted Example #4 PC# 1 1 2 3 3 EmpID 1 2 1 4 2 UPDATE PCAccess SET EmpID = 10 WHERE EmpID = 1 AccessType Full Restricted Full Full Semi-Restricted Example #5 PC# 1 1 2 3 3 EmpID 1 2 1 4 2 DELETE FROM Employee WHERE EmpID = 2 AccessType Full Restricted Full Full Semi-Restricted Example #6 PC# 1 1 2 3 3 EmpID 1 2 1 4 2 DELETE FROM PCAccess WHERE EmpID = 2 AccessType Full Restricted Full Full Semi-Restricted Example #7 PC# 1 1 2 3 3 DELETE FROM PC WHERE PC# = 3 EmpID 1 2 1 4 2 AccessType Full Restricted Full Full Semi-Restricted Cascading Chain followed until the end Especially for deletes If mix of CASCADE, RESTRICT, SET NULL Will get all or nothing Update & Delete Constraints CREATE TABLE T1 (A CHAR(5) B CHAR(5) PRIMARY KEY (A,B) FOREIGN KEY (A) REFERENCES (T2) ON DELETE RESTRICT ON UPDATE RESTRICT) CREATE TABLE T2 (C CHAR(5) D VARCHAR(30) PRIMARY KEY (C)) Constraints T1 T2 A X1 X1 X1 X1 X2 X3 X3 B Y1 Y2 A1 A4 A4 Y5 Y1 C X1 X2 X3 D X1 Desc X2 Desc X3 Desc Want to update the value of X1 to be X11. What has to happen? Performance Requires Knowledge of DBMS Applications Data Users & Expectations Environment Performance Classes OLTP On-Line Transaction Processing OLAP On-Line Analytic Processing Mix of OLTP and OLAP OLTP Throughput Driven Throughput - number of transactions per unit of time Lots of Transactions Mix of Update and Query High Concurrency OLAP Response Time Driven Response Time - single transaction Very Large, Possibly Complex, Transactions Query Evaluation and Optimization Performance Tuning Consider the Mix of OLTP & OLAP Interference Between Types Example: Single daily large analytic transaction, rest simple transactions, locking could prevent others from running. Tuning Levels DBMS Hardware Design Interactions Between Levels DBMS Parameter Tuning Specific to DBMS Buffers - Buffer Pool Logging - Checkpoints Lock Management Space Allocation - Log, Data, Freespace Thread Management Operating System Tuning Hardware Tuning Memory CPU Disk RAID Number of Drives Partitioning Architecture -- Parallel Systems? RAID Redundant Array of Inexpensive Disks Appears as single disk Physical storage difference - no database differences Increase performance Provide recovery from disk failure Negative Effect of RAID MTBF (mean time between failures) Increase by factor = # of drives used # Drives 1 2 4 8 MTBF 730 days 365 182 ½ 91 ¼ How RAID Works Striping - dividing equally across all disks 1 2 3 4 1 2 3 4 Stripe 1 5 6 7 8 Stripe 2 9 10 11 12 Stripe 3 Stripe n RAID Levels RAID-0 RAID-1 RAID-2 RAID-3 RAID-4 RAID-5 RAID-6 RAID-0 All disks store unique data Very fast No fault tolerance or recovery 1 1 2 2 3 3 4 4 5 6 7 8 9 10 11 12 RAID-1 Fully Redundant Faster Reads/Slower Writes High fault tolerance -- easy recovery 1 1 2 2 3 1 4 2 3 4 3 4 5 6 5 6 RAID-2 Each record spans all drives Some disks store ECC (error correction codes) Parity checks allow error detection and correction 1 1a 2 1b 3 4 ECC ECC 2a 2b ECC ECC 3a 3b ECC ECC RAID-3 Each record spans all drives One disk stores ECC Single-User 1 1a 2 1b 3 1c 4 ECC 2a 2b 2c ECC 3a 3b 3c ECC RAID-4 Each record stored on a single disk One drive for ECC Multi-user reads; Single-User writes 1 1 2 2 3 3 4 ECC 4 5 6 ECC 7 8 9 ECC RAID-5 (Rotating Parity Array) Drive has both data and ECC ECCs rotate to different drive Multi-user reads and writes 1 1 2 2 3 3 4 ECC ECC 4 5 6 6 ECC 7 8 9 10 ECC 11 12 13 14 ECC RAID-6 P+Q Redundancy P - “parity” Q - “extra parity” 2 bits of ECC per 4 bits of data Handles multiple disk failures Reed-Solomon codes Introduction to the Theory of ErrorCorrecting Codes, Pless (1989) Your Mileage May Vary “We note that numerous improvements have been proposed to the basic RAID schemes described here. As a result, sometimes there is confusion about the exact definitions of the different RAID levels.” RAID Usage 1, 3, and 5 outperform others RAID-1 - fastest, no storage cost, but not fault tolerant RAID-3 - single-user only RAID-5 - higher speed than single disk, fault tolerant, multi-user, but some storage cost and slower write times Design Tuning Transactions Physical Database Transaction Tuning The DBMS optimizes so why worry? An optimized poorly written transaction can always be outperformed by a wellwritten nonoptimized one. EXPLAIN (DB2) What did the optimizer come up with? Transaction Tuning Distributed Databases Client-Server Network performance becomes an additional concern Transaction Tuning DBA participation in program reviews and walkthroughs Continuous Monitoring Transaction Tuning Heuristics Single query instead of multiple queries “multiple” includes sub-queries Avoid long-running transactions Avoid large quantities of updates Locking and logging Reduce number of tables joined Transaction Tuning Heuristics Reduce sorting Return less data rather than more Don’t shift logic from query to program Optimizer is likely to be faster Less data is returned Transaction Tuning Example Get the names of all managers whose offices have property listed in Pgh. SELECT * FROM Property as P, Office as O, Manager as M, Employee as E WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr = M.OfficeNbr AND M.EmpID = E.EmpID AND PropertyID IN (SELECT PropertyID FROM Property as P2 WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr) Transaction Tuning Example SELECT * FROM Property as P, Office as O, Manager as M, Employee as E WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr = M.OfficeNbr AND M.EmpID = E.EmpID AND PropertyID IN (SELECT PropertyID FROM Property as P2 WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr) More is selected than is needed. Transaction Tuning Example SELECT * FROM Property as P, Office as O, Manager as M, Employee as E WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr = M.OfficeNbr AND M.EmpID = E.EmpID AND PropertyID IN (SELECT PropertyID FROM Property as P2 WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr) Some joined tables can be eliminated. Transaction Tuning Example SELECT * FROM Property as P, Office as O, Manager as M, Employee as E WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr = M.OfficeNbr AND M.EmpID = E.EmpID AND PropertyID IN (SELECT PropertyID FROM Property as P2 WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr) Subquery is executed once per office. Transaction Tuning Example SELECT E.Name FROM Employee as E WHERE E.MgrFlag = 1 AND OfficeNbr IN (SELECT OfficeNbr FROM Property as P WHERE P.City = ‘Pgh’) Version without any joins - 2 single table queries only. Transaction Tuning Example SELECT DISTINCT E.Name FROM Property as P, Employee as E WHERE P.OfficeNbr = E.OfficeNbr AND E.MgrFlag = 1 AND P.City = ‘Pgh’ Single query with join. Transaction Tuning Explain (or similar tool) can help to identify how each transaction will access the data and what temporary tables will have to be created to execute the query With multiple options, test them Order of conditions in WHERE can affect the optimization and performance I.E., put MgrFlag = 1 first Physical Database Tuning Indices Schema Tuning Retaining Normalization Denormalization Indices Unique Nonunique Single Attribute Multiple Attributes (concatenated or composite key) Primary Key Secondary Index Additional Indices Index decreases read time but increases update time Based on queries - even single query (EXPLAIN) Indices need reorganization Inserts, Updates, Deletes Specify freespace Reduce frequency of reorganizations Schema Tuning Staying Normal Split Tables - Vertical Partitioning Highly used vs. infrequently used columns Don’t partition if result will be more joins Keys are duplicated Schema Tuning Staying Normal Variable length fields (VARCHAR, others) Indeterminant record lengths Row locations vary Vertically partition row into two tables, one with fixed and one with variable columns Schema Tuning Leaving Normal Normalization Eliminates duplication Reduces anomalies Does not result in efficiency Denormalize for performance Denormalization Warnings Increases chance of errors or inconsistencies May result in reprogramming if business rules change Optimizes based on current transaction mix Increases duplication and space required Increases programming complexity Always normalize first then denormalize Denormalization Partition Rows Combine Tables Combine and Partition Replicate Data Combining Opportunities One-to-one (optional) allow nulls Many-to-many (assoc. entity) 2 tables instead of 3 Reference data (one-to-many) “one” not use elsewhere few of “many” Combining Examples Employee-Spouse (name and SSN only) Owner-PctOwned - Property few owners with multiple properties Property-Type (description) one type per property Partitioning Horizontal By row type Separate processing by type Supertype/subtype decision Vertical (already seen) Both Replication Intentionally repeating data Example: Owner-PctOwned-Property Owner includes PctOwned & PropertyID Property includes majority OwnerSSN and PctOwned Performance Tuning Not a one-time event Monitoring probably more important Things change applications, database (table) sizes, data characteristics hardware, operating system, DBMS