Chapter 23 6e - 21 5e: Recovery
CSE 4701
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
A portion of these slides is used with the permission of Dr. Ling Lui, Associate Professor, College of Computing, Georgia Tech. Remaining slides represent new material.

Overview of Material
- Recall Transaction Concepts
  - What is a Transaction
  - Two-Phase Commit and Recovery
- Motivating Recovery
  - What is it? Why is it Needed?
  - Database States and Failures
- Recovery Techniques
  - Alternative UNDO/REDO Approaches
  - Checkpointing
  - Shadow Paging
- Concluding Remarks
- Chap 21/23 depends on Chap 19/20 21/22 (TP)

Modeling Transactions
- System State
  - All Aspects which Encompass a Snapshot of the DB
  - Including Assertions, Constraints, Meta-Data, etc., that can be used to Maintain and Verify DB
- Transactions must Preserve System State by Ensuring DB Consistency
- Failures Require Corrective Actions via
  - Undo - Correct to a Prior Consistent State
  - Redo - Rerun Aborted/Incomplete Transactions
- DB Actions in a Transaction Categorized by:
  - Unprotected - Abort Does not Require Undo/Redo
  - Protected - Abort May Require Undo/Redo
  - Real - Once Done - Transaction Cannot be Undone

Complications in Transaction Execution
- What are Different Types of Transactions?
- Simple
  - Linear Sequence of Actions
  - Inter-Transaction Concurrency
  - What we’ve seen so far in TP/CC
- Nested Transactions
  - Transactions within Transactions (e.g., Nested Select)
  - Requires Intra-Transaction Concurrency Control
- Long-Term Transactions
  - Require Hours or Days for Execution
  - Can’t Just Make DB Unavailable for Long Periods
  - Both OCC and PCC are Often Infeasible!
Examples of Nested/Long-Lived
- Nested
  - Travel Agent Example - Reserving a Trip
    - Airline Reservation
    - Hotel Reservation
    - Car Rental
  - A Transaction of Transactions
- Long-Term Transactions
  - CAD/CAM
  - Consider a Jet Engine
  - Some Analysis Techniques (Structure, Fluids, etc.) may Require Hours and Update DB
  - DB Can’t be Unavailable for the Long Term

Handling Nested/Long-Term Transactions
- Time-Domain Addressing
  - Maintain Time History of DB Objects
  - All Versions/Values of Objects are Stored
  - Never Delete - Record All Values (States)
- For Example
  - Consider Data Item V with Initial Value v
  - Value is “Reset” by Transactions at Different Times - Let t Represent the Time of a Transaction
  - Let T1 T2 T3 be Transaction Execution Order
  - Suppose T1 Accesses v - Record as < v0, t1 >
  - Suppose T2 Accesses v - Record as < v1, t2 >
  - Suppose T3 Accesses v - Record as < v2, t3 >
  - What Happens if t2 > t3 and Both are Modifications?

Two Phase Commit Policy
- All Actions for a Transaction are Performed in a Workspace (in Memory) Rather than Directly on the DB Copy of the Data
- These Actions are Written in Log/Journal (Including the Commit Action)
- Leads to Two-Phase Commit Policy
  - Transaction Cannot Write to DB Until Committed
  - Transaction Cannot Commit Until All Changes have been Recorded First in the Log/Journal
- Two Phases are:
  - Phase 1: Write Data in Log/Journal
  - Phase 2: Write Data in DB
- Failure Can Occur Anytime!
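The two phases above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the `Transaction` class, the dict standing in for the database, and the list standing in for the log/journal are all assumptions made for the example.

```python
class Transaction:
    """Hypothetical helper: buffers writes in a private workspace until commit."""

    def __init__(self, tid, db, log):
        self.tid = tid
        self.db = db          # dict standing in for the DB copy of the data
        self.log = log        # list standing in for the log/journal
        self.workspace = {}   # in-memory workspace holding pending changes

    def write(self, item, value):
        # Changes go to the workspace only; the database is untouched.
        self.workspace[item] = value

    def commit(self):
        # Phase 1: record every change, then the commit action, in the log.
        for item, value in self.workspace.items():
            self.log.append(("write", self.tid, item, value))
        self.log.append(("commit", self.tid))
        # Phase 2: only now copy the changes into the database.
        self.db.update(self.workspace)
        self.workspace.clear()


db = {"X": 100, "Y": 50}
log = []
t1 = Transaction("T1", db, log)
t1.write("X", 200)
before_commit = dict(db)   # snapshot taken while the change is still private
t1.commit()
```

Because the log is complete before the first database write, a failure during Phase 2 can be repaired by replaying the logged writes.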
Recovery and Two-Phase Commit
- Two-Phase Commit Still Requires Recovery
- Failure Before Any Commit to Log/Journal
  - Must Redo Transaction T and Undo Effects of Any Dependent Transactions that Read Results of T
- Failure After Partial Commit to Log/Journal
  - Must Undo/Redo Transaction T and Undo Effects of Any Dependent Transactions that Read Results of T
- Failure After Total Commit to Log/Journal
  - Failure Before Any Writes to DB (Permanent Writes) - Use Log to Write to DB
  - Failure After Partial Writes to DB (Permanent Writes) - Use Log to Write to DB - Consider Dependent Transactions
  - Failure After All Writes to DB - Not a Problem

What is Database Recovery?
- Purpose of Database Recovery
  - To bring the database into the last consistent state that existed prior to the failure
  - To preserve transaction properties (Atomicity, Consistency, Isolation and Durability)
- Example: If the system crashes before a fund transfer transaction completes its execution, then one or both accounts may have an incorrect value. Thus, the database must be restored to the state before the transaction modified any of the accounts.

Recall ACID Transaction Model
- Database Consists of Set of Data Items
  - Read(x) Gets Last Stored Value in X
  - Write(x) Stores a New Value Into X
- Atomicity: A Set of R/W Operations that Either Completes Entirely or Not at All
- Consistency: R/W Operations take the Database from One Consistent State to Another Consistent State
- Isolation: No Intermediate Values Produced by the R/W Operations will be Visible to Other Transactions
- Durability: Once the Transaction is Completed, and All the Updates Are Committed, then these Changes Must Never be Lost because of Subsequent Failure

Why is Recovery Needed?
- Impossible to Build a Perfect System
  - There will be Failures of Various Types
  - Ability to Recover and Restart
- Unreliability Can Occur
  - Not Really Failures, but Unexpected Behavior
  - Inconsistent Results at Different Times
- Unavailability Often Happens
  - Can’t Run Transactions When Desired
- How is Recovery Achieved?
  - Redundancy for Fault-Tolerance
  - Mirroring/Shadowing for Data (Disks are Cheap)
  - Ability to UNDO/REDO to obtain “Correct” and “Consistent” DB State

Why is Recovery Needed?
- Transactions are Liable to Fail for Many Reasons
  - Hardware or Software Failure
  - Deadlock Occurs
  - Transaction Error (e.g., Divide by Zero) after Partial Execution
- In Any Case We May Need to Abort a Completed Transaction Due to Error in Another Transaction
- We Must Recover the DB to “Correct” State
- What do OSs Do?
  - Weekly Backups of File System
  - Incremental Backups (To Another Disk)
  - RAID Arrays
  - System and Editor Log Files

How is Recovery Supported?
- Transaction Log
  - Recovery from any type of failure requires the data values prior to modification (BFIM - BeFore IMage) and the new values after modification (AFIM - AFter IMage).
  - These values and other information are stored in a sequential file called the transaction log. A sample log is given below. Back P and Next P point to the previous and next log records of the same transaction.

T ID   Back P   Next P   Operation   Data item   BFIM      AFIM
T1     0        1        Begin
T1     1        4        Write       X           X = 100   X = 200
T2     0        8        Begin
T1     2        5        W           Y           Y = 50    Y = 100
T1     4        7        R           M           M = 200   M = 200
T3     0        9        R           N           N = 400   N = 400
T1     5        nil      End

What are Types of Updates?
- Immediate Update: As soon as a data item is modified in cache, the disk copy is updated.
- Deferred Update: All modified data items in the cache are written either after a transaction ends its execution or after a fixed number of transactions have completed their execution.
- Shadow update: The modified version of a data item does not overwrite its disk copy but is written at a separate disk location.
- In-place update: The disk version of the data item is overwritten by the cache version.

How is Data Staged To/From DB?
- Data Caching
  - Data items to be modified are first stored into the database cache by the Cache Manager (CM).
  - After modification they are flushed (written) to the disk.
  - Flushing is controlled by Modified and Pin-Unpin bits.
    - Pin-Unpin: Instructs the operating system not to flush the data item.
    - Modified: Indicates the AFIM of the data item.

What Does Database Recovery Do?
- Either Transaction Roll-Back (Undo) or Roll-Forward (Redo)
- To maintain atomicity, a transaction’s operations are redone or undone.
  - Undo: Restore all BFIMs on to disk (Remove all AFIMs) - Roll Back
  - Redo: Restore all AFIMs on to disk - Roll Forward
- Database recovery is achieved either by performing only Undos or only Redos or by a combination of the two. These operations are recorded in the log as they happen.

Example: Transactions and Log

Example: Roll-Back Over Time
- Roll-back: One execution of T1, T2 and T3 as recorded in the log.
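Undo and redo as defined above reduce to copying before or after images out of the log. The following Python sketch reuses the X and Y values from the sample log; the `WriteRecord` tuple, the function names, and the dict standing in for the disk are assumptions for illustration only.

```python
from collections import namedtuple

# One logged write: transaction id, data item, before image, after image.
WriteRecord = namedtuple("WriteRecord", "tid item bfim afim")


def undo(db, log, tid):
    """Roll back: restore the BFIMs of tid's writes, newest record first."""
    for rec in reversed(log):
        if rec.tid == tid:
            db[rec.item] = rec.bfim


def redo(db, log, tid):
    """Roll forward: reapply the AFIMs of tid's writes, oldest record first."""
    for rec in log:
        if rec.tid == tid:
            db[rec.item] = rec.afim


log = [WriteRecord("T1", "X", 100, 200), WriteRecord("T1", "Y", 50, 100)]
disk = {"X": 200, "Y": 50}   # assumed crash state: X was flushed, Y was not

rolled_forward = dict(disk)
redo(rolled_forward, log, "T1")

rolled_back = dict(disk)
undo(rolled_back, log, "T1")
```

Both functions only assign stored images, so running either one twice gives the same result as running it once.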
Database Recovery Approaches
- Evolved from OS Techniques
- Backup Copies of Database
  - Tape Copies (early days) and CD Copies
  - Online (Mirror or FTP)
  - Off Site Storage of DB (Daily/Weekly)
- Maintenance of Journal or Log File Containing All Changes to DB Since Last “Backup”
- Each Journal Entry Contains
  - Transaction ID
  - Old/New Values of Data Item(s)
  - Beginning/Ending Point of Transaction
- When Failure Occurs
  - Redo Aborted Transactions/Rollback Completed Transactions/Undo Partially Executed Transaction

Recovery Objective
- Maintaining DB State - Three Possibilities
  - Correct State - Contains Most Recent Copies of Data put Into DB by Users and Contains no Data Deleted by Users
  - Valid State - Contains Information that is Part of a Correct State
  - Consistent State - Valid State Plus DB Information Must Satisfy the User’s Consistency Criteria
- What are Analogies for these Three States?

DB State Concepts
- Consider Source Files (.c) and Object Files (.o)
  - oldtest.c/oldtest.o; test.c/test.o; newtest.c/newtest.o
- What are Different States in this Case
  - Correct State
    - Most Recent Source and Object File
    - newtest.c and newtest.o
  - Valid State
    - A Source and Object File But Not Necessarily the Source that Corresponds to the Object
    - oldtest.c and newtest.o
  - Consistent State
    - A Source and its Corresponding Object File but Not Necessarily the Most Recent One
    - test.c and test.o

Kinds of DB Recovery
- Recovery to a Correct State
- Recovery to a Correct State that May have Existed in the Past
- Recovery to a Possible Previous State (May not have Existed)
- Recovery to a Valid State (May be Undesirable)
- Recovery to a Consistent State (Old - Backup Version)
- Crash Resistance
  - Keep DB in a State Such that If Failure Occurs, System will Always be in Correct State
  - This is Almost Impossible in Practice!

Types of Failures in DBMS
- Transaction Failures
  - Transaction Aborts (Unilaterally/Due to Deadlock) Avg.
3% of Transactions Abort Abnormally
- System (Site) Failures
  - Processor, Main Memory, Power Supply, ...
  - Main Memory Contents Are Lost, but Secondary Storage Contents Are Safe
  - Partial vs. Total Failure
- Media Failures
  - Secondary Storage Devices (Stored Data Is Lost)
  - Head Crash/Controller Failure (?)
- Communication Failures
  - Lost/Undeliverable Messages
  - Network Partitioning

Recovery Concepts
- Recovery from Failures:
  - Transaction Abort: Rollbacks - To Earlier DB State
  - Machine Crash: Database in Temporary State
    - Undo Aborted Transactions for Permanent DB Changes
    - Redo Committed Transactions - Written in Log but Not in DB - Dependent on Aborted Transactions
  - Media Failure: Need a Backup Copy
    - Strategies of Keeping the Backup Copy

Recovery Management Architecture
- Volatile Storage
  - Main Memory of the Computer System (RAM)
- Stable Storage
  - Resilient to Failures and Loses its Contents Only in Media Failures (e.g., Head Crashes on Disks)
  - Implemented via a Combination of Hardware (Non-volatile Storage) and Software (Stable-write, Stable-read, Clean-up) Components
[Figure: Recovery Manager and Database Buffer Manager in main memory; Fetch/Flush and Read/Write paths connect the stable database on secondary storage with the database buffers (volatile database)]

Cascaded Aborts
- Reading Uncommitted Data
  - May Increase Concurrency
  - May Cause Cascaded Aborts
- 2PL Example (over time):
T1: X_Lock(X) Read(X); Write(X); Unlock(X); abort;
T2: X_Lock(X) Read(X); ...
Write(X); commit;
- Problem Due to Transaction Durability Property
- To Avoid Cascaded Aborts: Read Only Committed Data
  - Read(X) in T2 is on “Old” Value of X (Before T1)

Atomic Commit
- Execution Algorithm:
  - All the Writes are Stored in Private Workspaces
  - At Commit Time, All of the Writes are Performed on Database Atomically (Write All or Nothing)
- If a Transaction Aborts (or Machine/System Crashes Before Commit):
  - Throw Away the Workspaces
- If the Machine Crashes in the Middle of Writing
  - Use Only Idempotent Writes (No Incr/Decr)
  - Re-executing Transaction is Equivalent to Executing it Once
  - After a Crash, Start Over With All Idempotent Writes
- Expensive Commit

Log-Based Techniques
- Database Log
  - Every Action of a Transaction Must Not Only Perform the Action, but Must Also Write a Log Record to an Append-only File
[Figure: an Update Operation takes the old stable database state to the new stable database state and writes to the database log]
- Database Log
  - A Log File Maintained by the DBMS and Residing on Stable Storage
  - When a Change is Made to the Database, a Record Containing Values of the Updated Item is Written to the Log File

Log Information
- The Log Contains Information Used by the Recovery Process to Restore the Consistency of a System
- The Types of Log Records Include
  - Start: Transaction T Has Started
  - Read: Transaction T Has Read Data Item X
  - Write: Transaction T Has Changed a Value
    - Track the Old Value (BFIM - Before Image) of X
    - Track the New Value (AFIM - After Image) of X
  - Abort
  - Commit

Transactions and the Log
- Consider the Transactions Below
- The Execution Results in Write Actions to Log
- Objective: Recover Using Log Which has “State” of DB w.r.t.
Concurrently Executing Transactions

REDO Protocol
[Figure: REDO uses the database log to take the old stable database state to the new stable database state]
- REDO'ing an Action Means Performing it Again
- The REDO Operation Uses the Log Information and Performs the Action that Might have Been Done Before, or Not Done Due to Failures
- The REDO Operation Generates the New Image

UNDO Protocol
[Figure: UNDO uses the database log to take the new stable database state back to the old stable database state]
- UNDO'ing an Action Means to Restore the Object to its Before Image
- The UNDO Operation Uses the Log Information and Restores the Old Value of the Object

Consider Following Schedule/Log
- What Occurs When the System Crashes?
  - * T3 is rolled back since it did not reach its commit point
  - ** T2 is rolled back since it reads the value of item B written by T3

View Schedule Graphically
- Like CC Algorithms, Recovery Algorithms Track
  - Who is Writing What Data Item
  - Who is Reading Written Data Items
  - Is Write to Permanent DB?
  - Is Read from Committed or Uncommitted Data?

Why Logging?
- Upon Recovery:
  - All of T1's Effects should be Reflected in the Database (REDO If Necessary Due to a Failure)
  - None of T2's Effects should be Reflected in the Database (UNDO If Necessary)
[Figure: timeline from 0 to t; T1 begins and ends before the system crash at time t, T2 begins but has not ended when the crash occurs]

What Does System Log Contain?
- Conceptually, Two Logs
- Undo Log
  - Contains Transaction Actions Needed to Undo the Effects of a Transaction if there is a Failure
  - Attempt to Bring the DB Back to a Prior Correct State
  - Worst Case - Valid or Consistent State
- Redo Log
  - Contains Transaction Actions Needed to Redo the Effects of a Transaction that Did Not Complete
  - Attempt to Roll the DB Forward to a New Correct State
  - Avoid Valid or Consistent State
- What Approach can your Application Live with?
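The T1/T2 scenario from the "Why Logging?" slide can be turned into a small recovery routine: transactions with a commit record are redone, all others are undone. This Python sketch uses an assumed tuple layout for log records and an assumed crash state; it also assumes that no data item is written by both a committed and an uncommitted transaction, which keeps the two passes independent.

```python
def recover(db, log):
    """Undo uncommitted writes, then redo committed ones (illustrative only)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Undo pass: scan backward, restoring before images of uncommitted writes.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _tid, item, bfim, _afim = rec
            db[item] = bfim
    # Redo pass: scan forward, reapplying after images of committed writes.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _tid, item, _bfim, afim = rec
            db[item] = afim
    return committed


# Assumed record layout: ("write", tid, item, BFIM, AFIM)
log = [
    ("begin", "T1"),
    ("write", "T1", "A", 10, 20),
    ("begin", "T2"),
    ("write", "T2", "B", 5, 7),
    ("commit", "T1"),
    # system crash here: T2 never reached its commit point
]
# Assumed disk contents at the crash: T1's write is missing, T2's is present.
db = {"A": 10, "B": 7}
winners = recover(db, log)
```

After recovery the database reflects all of T1 and none of T2, which is the outcome the slide asks for.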
When to Write Log Records to Stable Store
- Assume a Transaction T Updates Page p
- Fortunate Case
  - System Writes p in Stable Database
  - System Updates Stable Log for this Update
  - SYSTEM FAILURE OCCURS (Before T Commits)
  - We Can Recover (Undo) by Restoring p to its Old State by Using the Log
- Unfortunate Case
  - System Writes p in Stable Database
  - SYSTEM FAILURE OCCURS (Before Stable Log is Updated)
  - We Cannot Recover From this Failure Because there is No Log Record to Restore the Old Value
- Solution: Write-Ahead Log (WAL) Protocol

Write-Ahead Log Protocol
- WAL Protocol:
  1. Before a Stable Database is Updated, the Undo Portion of the Log should be Written to the Stable Log (Force-write) - Separate Disk
  2. When a Transaction Commits, the Redo Portion of the Log Must Be Written to Stable Log Prior to the Updating of the Stable Database
- Notice:
  - If a System Crashes Before a Transaction is Completely Committed, then All Operations Must Be Undone
    - Need the Before Images - BFIM - (Undo Portion of Log)
  - Once a Transaction is Committed, Some of its Actions Might have to Be Redone
    - Need After Images - AFIM - (Redo Portion of Log)

Logging Interface
[Figure: Local Recovery Manager (LRM) and Database Buffer Manager in main memory with log buffers and database buffers (volatile database); Fetch/Flush and Read/Write paths to the stable database and stable log on secondary storage]
- Possible Execution Strategies:
  - Undo/No-Redo (Immediate Update)
  - No-Undo/Redo (Deferred Update)
  - Undo/Redo
  - No-Undo/No-Redo

Undo/No-Redo
- Incremental Log with Immediate Updates (Undo-only)
- Execution Algorithm: All the Writes are Performed ‘Directly’ on the Stable DB
- Abort
  - Buffer Manager May have Written Some of the Updated Pages into Stable Database
  - LRM Performs Transaction Undo (Partial Undo)
  - If Transaction T Aborts (or Machine Crashes), Recovery Procedure Undo(T) Must Undo its Effects, e.g., by Consulting the Log File
- Commit
  - LRM Issues a Flush Command to the Buffer
Manager for All Updated Pages
  - LRM Writes an "End_of_transaction" into Log
- Recover
  - No Need to Perform Redo
  - Perform Global Undo

No-Undo/Redo
- Incremental Log With Deferred Updates (Redo-only)
- Execution Algorithm: All Writes are Performed in Private Workspaces (Log Files)
- Abort
  - None of Updated Pages have Been Written into Stable DB
  - Throw Away the Workspaces, i.e., Release the Fixed Pages
- Commit
  - LRM Writes an "End_of_transaction" Record into the Log
  - LRM Sends an Unfix Command to the Buffer Manager for All Pages that were Previously Fixed
- Recover
  - Perform Partial Redo
  - If a Commit is Interrupted by Crash, Must Gradually Redo the Unwritten but Committed Transaction Operations
  - No Need to Perform Global Undo

Undo/Redo
- Execution Algorithm: All the Writes are Gradually Written to the Database, Before or After the Commit Time
- Abort
  - Buffer Manager May have Written Some of the Updated Pages Into Stable Database
  - LRM Performs Transaction Undo (Partial Undo) to “Cover Up” Inconsistency by Undoing its Effects
- Commit
  - LRM Writes an "End_of_transaction" Record into the Log
- Recover
  - For Transactions with a "Begin_transaction" and an "End_of_transaction" Record in the Log, a Partial Redo is Initiated by LRM
  - For Transactions with Only a "Begin_transaction" in the Log, a Global Undo is Executed by LRM

No-Undo/No-Redo
- Abort
  - None of the Updated Pages have been Written Into Stable Database
  - Release the Fixed Pages
- Commit (the Following Have to Be Done Atomically)
  - LRM Issues a Flush Command to the Buffer Manager for All Updated Pages
  - LRM Sends an Unfix Command to the Buffer Manager for All Pages that were Previously Fixed
  - LRM Writes an "End_of_transaction" Record into the Log
- Recover
  - No Need to Do Anything

Log-Based Recovery Summary
- Types of Failures:
  - Transaction Aborts
  - Machine Crashes
- Data Movement Policies:
  - Deferred Updates
  - Immediate Updates
  - No Deferred or Immediate Updates
- Recovery Actions:

                                 Transactions
                                 Committed   Aborted
All Blocks On Disk               Ok          Undo
Some (Not All) Blocks on Disk    Redo        Undo
No Blocks On Disk                Redo        Ok

Log-Based Recovery Strategies
- Undo-Only Recovery (Immediate Updates)
  - Immediate Updates on the Database
  - Do Not Leave Blocks on Disk After Commit
  - Force All Blocks at Commit Time
- Redo-Only Recovery (Deferred Updates)
  - Do Not Write Blocks to Disk Before Commit
  - Deferred Update
- Redo and Undo Recovery:
  - Write to the Database Before or After Commit
  - If Abort, Undo
  - If Crash, Redo/Undo

Deferred Update Example - Single-User
- The [write_item,...] operations of T1 are redone
- T2 log entries are ignored by the recovery process since it didn’t finish

Deferred Update Example - Multi-User
- T2 and T3 are ignored because they did not reach their commit points (didn’t finish)
- T4 is redone because its commit point is after the last system checkpoint

Checkpoints
- Checkpoints - Analogous to a Mini-Commit
- Tell the Recovery Scheme what Changes have Actually Been Made to the Database
- In Addition to the Log File, the System Periodically Performs Check Points
- Transaction Save Points
  - Internally Consistent Synchronization Points
  - Long Transactions May Return to them Instead of the Begin-transaction
- DBMS Check Points
  - Transaction-Consistent Log Record
  - Force All Committed Pages to Disk
  - Flush All Log Records and a Check Point Record
  - DB Recovery May Start From the Last Check Point Instead of the Beginning of the Transaction

What is the Checkpoint Process?
- From time to time (randomly or under some criteria) the database flushes its buffer to the database disk to minimize the task of recovery. The following steps define a checkpoint operation:
  1. Suspend execution of transactions temporarily.
  2. Force-write modified buffer data to disk.
  3. Write a [checkpoint] record to the log, save the log to disk.
  4. Resume normal transaction execution.
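The four checkpoint steps can be sketched as follows. This is a toy model under stated assumptions: the `Dbms` class, its fields, and `records_after_last_checkpoint` are hypothetical names, and in-memory dicts and lists stand in for the disk, the buffer, and the stable log.

```python
class Dbms:
    """Toy model of a buffer, a stable database, and a log (illustrative)."""

    def __init__(self):
        self.disk = {}          # stable database
        self.buffer = {}        # modified items still held in cache
        self.log = []           # stable log
        self.accepting = True   # False while transactions are suspended

    def write(self, tid, item, value):
        if not self.accepting:
            raise RuntimeError("transactions are suspended")
        self.log.append(("write", tid, item, value))
        self.buffer[item] = value

    def checkpoint(self):
        self.accepting = False             # 1. suspend transactions
        self.disk.update(self.buffer)      # 2. force-write modified buffers
        self.buffer.clear()
        self.log.append(("checkpoint",))   # 3. write [checkpoint] to the log
        self.accepting = True              # 4. resume normal execution

    def records_after_last_checkpoint(self):
        """Return the log suffix that recovery still has to examine."""
        for i in range(len(self.log) - 1, -1, -1):
            if self.log[i][0] == "checkpoint":
                return self.log[i + 1:]
        return list(self.log)


dbms = Dbms()
dbms.write("T1", "X", 200)
dbms.checkpoint()
dbms.write("T2", "Y", 100)
tail = dbms.records_after_last_checkpoint()
```

Only the records in `tail` need to be scanned after a crash, since everything before the checkpoint record is already on disk.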
- During recovery, redo or undo is required only for transactions appearing after the [checkpoint] record.

Augmenting Log with CheckPoints
- Consider the Prior Schedule
- Note the Addition of [checkpoint] Entries
- This is a Save Point that Can be
  - At Logical States (After Commits)
  - Regular (After Every X Log Entries)
- Reduce Recovery Effort to go Back to Last Checkpoint

Recovery Technique: Shadow Paging
- The Database is Partitioned into Fixed-sized Pages (Blocks).
- The System Maintains Two Tables for Each Transaction:
  - A Current Page Table
  - A Shadow Page Table
- When the Transaction Starts, the Current and Shadow Page Tables are Identical
- Current Page Table
  - Kept in Main Memory If it is not too Large
  - Each Update Creates a New Page from the Free-page-list
  - Modify the Current Page Table to Record the Modifications to the Database by Using Pointers to Point to the New Pages That Hold the Modified Data Values
- Shadow Page Table
  - Saved on Nonvolatile Storage (e.g., Disks)
  - Used to Keep the Pointers to the Old Pages Before the Updates

Shadow Paging
- The AFIM does not overwrite its BFIM but is recorded at another place on the disk. Thus, at any time a data item has an AFIM and a BFIM (shadow copy of the data item) at two different places on the disk, on two different disks, or with X and Y in memory and X’ and Y’ on disk.
[Figure: database holding X, Y, X' and Y']
  - X and Y: Shadow copies of data items
  - X' and Y': Current copies of data items

Shadow Paging
- To manage access of data items by concurrent transactions, two directories (current & shadow) are used. The directory arrangement is illustrated below. Here a page is a data item.
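The interplay of the current and shadow directories can also be sketched in code. This Python sketch is not the original slide figure: `ShadowPagedDb` and its methods are hypothetical, and appending to a list stands in for taking a page from the free-page list.

```python
class ShadowPagedDb:
    """Toy shadow-paging store; names and layout are illustrative."""

    def __init__(self, pages):
        self.pages = list(pages)                 # physical pages
        self.shadow = list(range(len(pages)))    # shadow page table (on disk)
        self.current = list(self.shadow)         # current page table (in memory)

    def read(self, n):
        return self.pages[self.current[n]]

    def write(self, n, value):
        # Never overwrite the old page: store the new version in a fresh
        # page and point the current table at it.
        self.pages.append(value)
        self.current[n] = len(self.pages) - 1

    def commit(self):
        # The current page table replaces the shadow page table.
        self.shadow = list(self.current)

    def abort(self):
        # Recovery: fall back to the shadow page table.
        self.current = list(self.shadow)


store = ShadowPagedDb(["x0", "y0"])
store.write(0, "x1")
seen_in_txn = store.read(0)
store.abort()                 # uncommitted update is discarded
after_abort = store.read(0)
store.write(1, "y1")
store.commit()
store.abort()                 # nothing to discard after a commit
after_commit = store.read(1)
```

Old page versions stay in `pages` after a commit, which corresponds to the garbage that a real system has to collect periodically.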
- Still a Concern if One Shared Disk Fails

Shadow Paging
- Correctness:
  - Case (1): The Transaction has Not Committed
    - When the System Crashes and the Backup is Needed, Copy the Shadow Page Table into Main Memory
    - Case (1) Guarantees that the State is Recovered to the One Before the Execution of the Transaction
  - Case (2): The Transaction has Committed
    - Starting Address of the Current Page Table Replaces the Starting Address of the Last Shadow Page Table
    - All Changes are Reflected in the Current Page Table

Shadow Paging
- Advantages:
  - No Overhead of Log File
  - Recovery From Failure is Faster
- Disadvantages:
  - Pages that are Desirable to Be Physically Close by May Scatter All Over the Disk
  - Each Time a Transaction Commits, Pages Containing the Old Version of Changed Data Become Free but Unavailable
  - Garbage Collection is Fired up Periodically so that “Third” and Higher Versions of Same Page are Freed
  - Difficult to Allow Concurrent Execution of Transactions
  - Vulnerable to Failure if Single Disk

Backups for Recovery
- Stop the TP and Make a Copy
  - Easy to Implement and Correct
  - Introduces Periods of Unavailability
  - With Two Versions You Can Always Make a Backup (May Be Old)
- Fuzzy Dump
  - Read the Database Incrementally
  - Example: Read All Accounts, Money Transfer
  - Incremental Reading of Entire Databases
- Off Site Storage - Backup Tapes/Disks
  - Companies Today Maintain Copies of Critical DBs

Media Failure Recovery
- Disks are Cheap Today - There is No Reason Not to have Multiple Hard Drives for DB Copies
- Concept of “Mirrored” FTP and other Sites
- Architecture
[Figure: Local Recovery Manager and Database Buffer Manager in main memory with log buffers and database buffers (volatile database); Fetch/Flush and Read/Write paths to the stable database and stable log on secondary storage, which are in turn written to an archive database and an archive log]

Concluding Remarks
- Intent of Chapter
  - Review and Reinforce Transaction Processing Concepts and Their
Relationship to Recovery
  - Introduction to Concepts of Undo, Redo, and Cascading (for Dependent Transactions)
- Discussion of Correct vs. Valid vs. Consistent Database States
  - What can your Application Live With?
  - What about your Semester Project?
- Alternative Recovery Techniques
  - Log-Based
  - Checkpoint
  - Shadow-Paging
  - What about Combinations?