Information Resources Management
April 10, 2001

Agenda
  Administrivia
  Database Design
    Denormalization
  Database Administration
    Security
    Backup & Recovery
    Concurrency Controls

Administrivia

Schema Tuning
  Staying Normal
    Split Tables - Vertical Partitioning
      Highly used vs. infrequently used columns
      Don’t partition if result will be more joins
      Keys are duplicated

Schema Tuning
  Staying Normal
    Variable length fields (VARCHAR, others)
      Indeterminate record lengths
      Row locations vary
    Vertically partition row into two tables, one with fixed and one with variable columns

Schema Tuning
  Leaving Normal
    Normalization
      Eliminates duplication
      Reduces anomalies
      Does not result in efficiency
    Denormalize for performance

Denormalization Warnings
  Increases chance of errors or inconsistencies
  May result in reprogramming if business rules change
  Optimizes based on current transaction mix
  Increases duplication and space required
  Increases programming complexity
  Always normalize first then denormalize

Denormalization
  Partition Rows
  Combine Tables
  Combine and Partition
  Replicate Data

Combining Opportunities
  One-to-one (optional)
    allow nulls
  Many-to-many (assoc.
entity)
    2 tables instead of 3
  Reference data (one-to-many)
    “one” not used elsewhere
    few of “many”

Combining Examples
  Employee-Spouse (name and SSN only)
  Owner-PctOwned-Property
    few owners with multiple properties
  Property-Type (description)
    one type per property

Partitioning
  Horizontal
    By row type
    Separate processing by type
    Supertype/subtype decision
  Vertical (already seen)
  Both

Replication
  Intentionally repeating data
  Example: Owner-PctOwned-Property
    Owner includes PctOwned & PropertyID
    Property includes majority OwnerSSN and PctOwned

Performance Tuning
  Not a one-time event
  Monitoring probably more important
  Things change
    applications, database (table) sizes, data characteristics
    hardware, operating system, DBMS

Database Administration
  Security
  Backup & Recovery
  Concurrency Controls

Security - Authorization
  Row Operations
    Read
    Insert
    Update
    Delete
  Table Operations
    Index Creation/Removal
    Resource - New Tables
    Alteration
    Drop

Authorization Granularity
  Table-level only
  View is the same as a table

Views
  Select statement that is given a table name
  Views can select from other views
  CREATE VIEW OfficeEmps AS
    (SELECT O.OfficeNbr, E1.EmpID, E1.Name, M.EmpID AS MgrID, E2.Name AS MgrName
     FROM Office AS O, Manager AS M, Employee AS E1, Employee AS E2
     WHERE O.OfficeNbr = M.OfficeNbr
       AND M.EmpID = E2.EmpID
       AND O.OfficeNbr = E1.OfficeNbr
       AND E1.EmpID <> E2.EmpID)

Enhancing Granularity Through Views
  Specific Columns - SELECT xxxx
  Specific Rows - WHERE xxxx=yyyy
  Both

SQL
  GRANT privilege ON table TO user (WITH GRANT OPTION)
  REVOKE privilege ON table FROM user (RESTRICT or CASCADE)
    RESTRICT or CASCADE applies to GRANTs made by that user on that table

Types of Failures
  Transaction
    Logical
    System
  System
    Operating System
    Hardware
    Network
  Disk

Recovery Approaches
  Switch - mirror DB needed (RAID-1)
  Restore/Rerun
    Previous backup
    Rerun all transactions (needed)
  Log-Based
    Rollback - undo incomplete
    Rollforward - previous backup

Requirements
  Permanently write changes without changing the database

Transaction States
  Partially
Committed - transaction is done
  Fully Committed - changes have been made

Log-Based Recovery
  Log - record of all database activity
  Log Records
    Transaction start
    Transaction write (update)
      new and old values
    Transaction abort
    Transaction commit

Log-Based Recovery
  Deferred
  Immediate

Deferred Log
  (Diagram: Trans, Log, DB)
  Database modification occurs after transaction commits

Deferred Log
  Only new values kept in update log record
  Only committed changes need to be reapplied at recovery
  Uncommitted changes can be removed from the log

Deferred Log Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500

Deferred Log Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: only deletes from log

Deferred Log Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: REDO(T1) - commit vs.
actual database update

Deferred Log Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: REDO(T1); Delete T2 from Log

Deferred Log Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: REDO(T1); REDO(T2)

Failure During Recovery
  Recovery from recovery must be possible
  Redo must be executable multiple times without any differences from a single execution

Immediate Modification
  (Diagram: Trans, Log, DB)
  Database modified as transaction proceeds

Immediate Modification
  Update log records require old and new values
  Recovery requires either a REDO or an UNDO based on whether or not each transaction was committed

Immediate Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=10,11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=25000,26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500

Immediate Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=10,11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=25000,26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: UNDO(T1)

Immediate Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500,
WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=10,11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=25000,26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: REDO(T1)

Immediate Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=10,11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=25000,26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: UNDO(T2); REDO(T1) -- order can be important

Immediate Example
  Transaction:
    T1: READ(EMP=75), GRADE=11, WRITE(EMP=75), COMMIT **committed**
    T2: READ(EMP=75), READ(GRADE=11), SALARY = 26500, WRITE(EMP=75), COMMIT **committed**
  Log: <T1 START>; <T1 EMP=75, GRADE=10,11>; <T1 COMMIT>; <T2 START>; <T2 EMP=75, SALARY=25000,26500>; <T2 COMMIT>
  Database: EMP=75, GRADE=11; EMP=75, SALARY=26500
  Recovery: REDO(T1); REDO(T2)

Logging Requirements
  Log must always be in “stable storage”
  All log writes must be successful
  Log kept separate from database
  Backup copy of database that coincides with start of a new log
  Recovery needed depends on type of failure
  Database restart must recover completely before allowing new transactions

Checkpoints
  Recovery has to search entire log
  Many REDOs are unnecessary
  Recovery can be a lengthy process
  Checkpoints are used to limit the recovery action that is needed

Checkpoints
  1. Flush all log records to permanent storage
  2. Flush all data buffers to permanent storage
  3. Write a <checkpoint> to the permanent storage copy of the log
  No updates are allowed while checkpointing

Checkpoint Recovery
  1. Search from end of log to most recent <checkpoint>
  2.
Continue searching backward until the first transaction <START> before the <checkpoint>
  3. From that <START> onward, UNDO and REDO all transactions (Serial execution only)

Advantages of Logging
  Less Overhead at Commit
  No Data Fragmentation
  No Need for Garbage Collection
  Faster recovery
  Support for Concurrency

Transactions
  Concept
  State
  Serializability
  Maintaining Serializability

Transaction
  Single Unit of Work - User’s Perspective
  Multiple Operations Required
  Properties (ACID)
    Atomicity - all or none
    Consistency - database consistency maintained
    Isolation - appearance of being alone
    Durability - changes persist

Transaction State
  Active
  Partially Committed
  Committed
  Failed
  Aborted

Implementing Transactions in SQL
  COMMIT WORK
  ROLLBACK WORK

Atomicity & Durability
  Easiest
    Completely new copy of database
    Update new copy
    Don’t update pointer until commit
  Recoverable from failure at any point provided the acknowledgement of the commit and the update of the pointer occur simultaneously.

Concurrency
  Multiple Transactions
  Serial (one at a time) is best but
    combination of slow & fast in single transaction
    short and long transactions
  Concurrency must be handled carefully

Example
  Employee (EmpID, Grade, Salary)
  Grade (Grade, Midpoint)
  Employee: (75, 10, 25000)
  Grade: (10, 20000), (11, 30000)

Example
  T1 - Change employee #75 to grade 11
    READ (Employee)
    Grade = 11
    WRITE (Employee)
  T2 - Update salaries by 5% of midpoint
    READ (Employee)
    READ (Grade)
    Salary = Salary + (0.05 * Midpoint)
    WRITE (Employee)

Example - Serial Execution
  T1 then T2
    Result: Salary = 26500 (25000 + .05*30000)
  T2 then T1
    Result: Salary = 26000 (25000 + .05*20000)

Concurrent Execution
  T1: READ (Employee)
  T2: READ (Employee)
  T1: Grade = 11
  T1: WRITE (Employee)
  T2: READ (Grade)
  T2: Salary =
  T2: WRITE (Employee)
  Result?

Concurrent Execution
  T2: READ (Employee)
  T2: READ (Grade)
  T1: READ (Employee)
  T1: Grade = 11
  T1: WRITE (Employee)
  T2: Salary =
  T2: WRITE (Employee)
  Result?
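The serial and concurrent schedules for T1 and T2 can be replayed with a small simulation. This is only a sketch under stated assumptions: the tables are hypothetical in-memory dictionaries, each transaction reads the whole Employee row into a private workspace and writes the whole row back, and the names `fresh_db`, `t1`, `t2`, and `run` are illustrative, not part of the lecture.

```python
def fresh_db():
    """Hypothetical in-memory copy of the example tables."""
    return {
        "employee": {75: {"grade": 10, "salary": 25000}},
        "grade": {10: 20000, 11: 30000},  # grade -> midpoint
    }


def t1(db, ws):
    """T1: change employee 75 to grade 11. Each yield ends one step."""
    ws["emp"] = dict(db["employee"][75])   # READ (Employee)
    yield
    ws["emp"]["grade"] = 11                # Grade = 11
    yield
    db["employee"][75] = dict(ws["emp"])   # WRITE (Employee), whole row (assumption)
    yield


def t2(db, ws):
    """T2: raise salary by 5% of the grade midpoint."""
    ws["emp"] = dict(db["employee"][75])          # READ (Employee)
    yield
    ws["mid"] = db["grade"][ws["emp"]["grade"]]   # READ (Grade)
    yield
    ws["emp"]["salary"] += ws["mid"] * 5 // 100   # Salary = Salary + 5% of Midpoint
    yield
    db["employee"][75] = dict(ws["emp"])          # WRITE (Employee), whole row (assumption)
    yield


def run(schedule):
    """Execute one step of the named transaction per schedule entry."""
    db = fresh_db()
    txns = {"T1": t1(db, {}), "T2": t2(db, {})}
    for name in schedule:
        next(txns[name])
    return db["employee"][75]


serial_t1_t2 = run(["T1"] * 3 + ["T2"] * 4)
serial_t2_t1 = run(["T2"] * 4 + ["T1"] * 3)
interleaved_a = run(["T1", "T2", "T1", "T1", "T2", "T2", "T2"])
interleaved_b = run(["T2", "T2", "T1", "T1", "T1", "T2", "T2"])
```

Under these assumptions both interleavings finish with grade 10 and salary 26000. That matches neither serial order, because T2 writes back the stale row it read earlier and T1's grade change is lost.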
Recoverable Schedules
  If T2 reads an item updated by T1, T1 must commit before T2 commits

Cascadeless Schedule
  If T2 reads an item updated by T1, T1 must commit before T2 reads

Not Recoverable
  T1: READ (Employee)
  T1: WRITE (Employee)
  T2: READ (Employee)
  T2: READ (Grade)
  T2: WRITE (Employee)
  T2: COMMIT
  T1: ROLLBACK
  Result?

Recoverable
  T1: READ (Employee)
  T1: WRITE (Employee)
  T2: READ (Employee)
  T2: READ (Grade)
  T2: WRITE (Employee)
  T1: COMMIT
  T2: COMMIT
  Result?

Recoverable
  T1: READ (Employee)
  T1: WRITE (Employee)
  T2: READ (Employee)
  T2: READ (Grade)
  T2: WRITE (Employee)
  T1: ROLLBACK
  T2: ?????
  Result?

Cascadeless
  T1: READ (Employee)
  T1: WRITE (Employee)
  T1: COMMIT
  T2: READ (Employee)
  T2: READ (Grade)
  T2: WRITE (Employee)
  T2: COMMIT
  Result?

Ensuring Serializability
  Concurrency Control Schemes
  Can’t analyze transactions
    some in progress
    analysis longer than transaction
    already running continue to run

Concurrency Control - Locks
  Shared - Read only (LOCK-S)
  Exclusive - Read/Write (LOCK-X)
  Compatibility of Locks
    multiple transactions can hold the same lock only if it is a shared lock

Deadlocks
  T1: READ(A), READ(B), WRITE(A)
  T2: READ(B), READ(A), WRITE(B)
  T1: LOCK-X(A), READ(A), LOCK-S(B), WRITE(A), UNLOCK(A), UNLOCK(B)
  T2: LOCK-X(B), READ(B), LOCK-S(A), WRITE(B), UNLOCK(A), UNLOCK(B)

Locking Protocol
  Set of Rules
  Reduce Possibility of Deadlocks
  Create “appearance” of serial execution to each transaction

Two-Phase Locking Protocol
  Growing Phase
    Can obtain but not release locks
  Shrinking Phase
    Can release but not obtain locks
  First release of a lock is the transition between phases (lock point)

Two-Phase Locking
  Strict
    Prevent cascading rollbacks
    Exclusive locks (LOCK-X) held until commit
  Rigorous
    All locks held until commit

Lock Conversion
  Changing a Lock
    Upgrade - shared to exclusive
    Downgrade - exclusive to shared
  Can only upgrade in growing phase
  Can only downgrade in shrinking phase

Most Used Locking Scheme
  Read
    LOCK-S(A), READ(A)
  Write
    If LOCK-S(A), UPGRADE(A), WRITE(A)
    If no lock, LOCK-X(A), WRITE(A)
  Locks held until COMMIT or ROLLBACK
    Strict - exclusive only
    Rigorous - all locks
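The shared/exclusive compatibility rule, lock upgrade, and the release of all locks at COMMIT or ROLLBACK can be sketched with a toy lock table. This is a hypothetical illustration, not a real DBMS lock manager: `LockTable` and its methods are invented names, and a refused request returns False where a real system would make the transaction wait or roll back.

```python
class LockTable:
    """Toy lock table: item -> {transaction: "S" or "X"}."""

    def __init__(self):
        self.holders = {}

    def lock_s(self, txn, item):
        """LOCK-S: granted unless another transaction holds LOCK-X."""
        held = self.holders.setdefault(item, {})
        if any(mode == "X" for t, mode in held.items() if t != txn):
            return False
        held.setdefault(txn, "S")  # keep an existing X lock if txn has one
        return True

    def lock_x(self, txn, item):
        """LOCK-X or UPGRADE: granted only if no other transaction holds a lock."""
        held = self.holders.setdefault(item, {})
        if any(t != txn for t in held):
            return False
        held[txn] = "X"
        return True

    def release_all(self, txn):
        """Rigorous two-phase locking: drop every lock at COMMIT or ROLLBACK."""
        for held in self.holders.values():
            held.pop(txn, None)


# Replay of the deadlock slide: each transaction holds one item exclusively
# and then asks for the other transaction's item.
locks = LockTable()
t1_has_a = locks.lock_x("T1", "A")
t2_has_b = locks.lock_x("T2", "B")
t1_gets_b = locks.lock_s("T1", "B")
t2_gets_a = locks.lock_s("T2", "A")
deadlocked = not t1_gets_b and not t2_gets_a

# Rolling back T2 as the victim releases its locks, so T1 can proceed.
locks.release_all("T2")
t1_retry_b = locks.lock_s("T1", "B")

# Upgrade is refused while another transaction shares the lock.
shared = LockTable()
shared.lock_s("T1", "C")
shared.lock_s("T2", "C")
upgrade_blocked = not shared.lock_x("T1", "C")
shared.release_all("T2")
upgrade_ok = shared.lock_x("T1", "C")
```

Returning False instead of blocking keeps the sketch single-threaded; deadlock detection and victim selection would sit on top of such a table.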
Granularity
  Lock only what is needed
  Could be
    Row
    Table
    Set of Tables
    Entire Database
  Model as a tree with the database at the root and the rows as the leaves

Intention Locking
  To lock a row
    Traverse the tree from the root to the row
    Put intention locks on the nodes on the way down
  Intention locks provide knowledge of lower level locks when a higher level lock is desired; this avoids having to traverse the entire tree to lock the database

Intention Locking
  Locks Acquired Top-Down
  Locks Released Bottom-Up

Deadlocks
  Prevention
  Recovery

Deadlock Prevention
  1. Acquire all locks simultaneously
  2. Rollback instead of waiting for a lock
  2a. Lock wait timeouts

Deadlock Recovery
  If not prevented, deadlocks must be detected and recovered
  Detection - periodically search for problems
  Recovery
    Select a victim - which one?
    Rollback - how far?
  Avoid Starvation
    Always killing the same victim which never gets to execute

Recovery with Concurrency
  Locking Protocol
  Transaction Rollback
  Checkpoints
  Restart Recovery

Locking Protocol
  Recovery Dependent on Locking
  Multiple UNDOs may not work correctly if a second transaction reads a value updated by a prior transaction before the prior transaction commits
  Use Two-Phase Locking that is at least Strict

Transaction Rollback
  Use log records to complete the rollback
  Must rollback from most recent to earlier updates
  Release exclusive locks after rollback is completed

Checkpoints
  Multiple transactions can be active at a checkpoint
  Change <checkpoint> log record to include list of all currently active transactions
  Still have to halt other processing while checkpointing

Checkpointing - When?
  More frequent checkpoints -> faster recovery
  Less frequent checkpoints -> longer recovery
  MTBF - all components
  Timing
  Amount of Activity
    # transactions
    # updates
    log file size

Restart Recovery
  Redo list - commit found
  Undo list - start found - not on Redo list
  Scan log backwards from the end
  Stop at <checkpoint>
  For each transaction on the checkpoint list not on the Redo list, add it to the Undo list

Restart Recovery
  1. Starting again at the end of the log, Undo all transactions on the Undo list
  2. Return to the most recent checkpoint
  3. Move forward and redo all transactions on the redo list

Homework #8
  Database Design
  Database Administration
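The restart recovery steps can be sketched as follows. The log format here is an assumption made for illustration: records are tuples `("start", txn)`, `("write", txn, item, old, new)`, `("commit", txn)`, and `("checkpoint", [active txns])`, the database is a plain dictionary, and `restart_recovery` is a hypothetical name. Immediate modification is assumed, so update records carry both old and new values.

```python
def restart_recovery(log, db):
    """Build the Redo and Undo lists, then undo backward and redo forward."""
    redo, undo = set(), set()
    cp = 0  # if no <checkpoint> is found, redo from the start of the log

    # Scan backward from the end of the log, stopping at the checkpoint.
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] == "commit":
            redo.add(rec[1])
        elif rec[0] == "start" and rec[1] not in redo:
            undo.add(rec[1])
        elif rec[0] == "checkpoint":
            # Active at the checkpoint and never committed: must be undone.
            undo.update(t for t in rec[1] if t not in redo)
            cp = i
            break

    # 1. From the end of the log, undo writes of Undo-list transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] in undo:
            db[rec[2]] = rec[3]  # restore old value

    # 2-3. From the most recent checkpoint, move forward and redo.
    for rec in log[cp:]:
        if rec[0] == "write" and rec[1] in redo:
            db[rec[2]] = rec[4]  # reapply new value

    return redo, undo


# Hypothetical log ending in a crash: T1 committed, T2 and T3 did not.
log = [
    ("start", "T1"),
    ("write", "T1", "A", 10, 11),
    ("start", "T2"),
    ("checkpoint", ["T1", "T2"]),
    ("write", "T1", "B", 1, 2),
    ("commit", "T1"),
    ("write", "T2", "C", 5, 6),
    ("start", "T3"),
    ("write", "T3", "D", 7, 8),
]
# Assumed disk state at the crash: T1's write to B had not reached disk.
db = {"A": 11, "B": 1, "C": 6, "D": 8}
redo_list, undo_list = restart_recovery(log, db)
first_pass = dict(db)
restart_recovery(log, db)  # a failure during recovery means running it again
```

Running the function a second time leaves the database unchanged, which is the repeatability that recovery from a failed recovery depends on.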