Resource Managers

Jim Gray, Microsoft, Gray@Microsoft.com
Andreas Reuter, International University, Andreas.Reuter@i-u.de

Course schedule (columns: Mon, Tue, Wed, Thur, Fri)
  9:00   Overview, TP mons, Log, Files & Buffers, B-tree
  11:00  Faults, Lock Theory, ResMgr, COM+, Access Paths
  1:30   Tolerance, Lock Techniq, CICS & Inet, Corba, Groupware
  3:30   T Models, Queues, Adv TM, Replication, Benchmark
  7:00   Party, Workflow, Cyberbrick, Party

Gray & Reuter: Resource Manager

Whirlwind Tour: The Actors
Resource managers:
  • provide ACID objects (transactional objects)
  • use the log manager to record changes
  • use the transaction manager to coordinate multi-RM changes
  • use the communication manager to make transactional RPCs
[Figure: communication managers, transaction managers, log managers with their logs, and resource managers with their objects, spanning volatile storage and durable storage.]

Whirlwind Tour: The Application Verbs
    TRID      Begin_Work(context *);      /* begin a transaction                        */
    Boolean   Commit_Work(context *);     /* commit the transaction                     */
    void      Abort_Work(void);           /* rollback to savepoint zero                 */

    savepoint Save_Work(context *);       /* establish a savepoint                      */
    savepoint Rollback_Work(savepoint);   /* return to savepoint (savepoint 0 = abort)  */
    Boolean   Prepare_Work(context *);    /* put transaction in prepared state          */
    context   Read_Context(void);         /* return current savepoint context           */
    TRID      Chain_Work(context *);      /* end current and start next trans           */

    TRID      My_Trid(void);              /* return current transaction identifier      */
    TRID      Leave_Transaction(void);    /* set process trid null, return current id   */
    Boolean   Resume_Transaction(TRID);   /* set process trid to desired trid           */

    enum tran_status { ACTIVE, PREPARED, ABORTING, COMMITTING, ABORTED, COMMITTED };
    tran_status Status_Transaction(TRID); /* transaction identifier status              */

Whirlwind Tour: Types of Transaction Executions
  A Simple Commit:
    Begin, Action, Action, Save, Action, Save, Action, Action, Action, Save, Action, Action, Commit
  A Simple Abort:
    Begin, Action, Action, Save, Action, Save, Action, Action, Action, Save, Action, Rollback
  A Partial Rollback:
    Begin, Action, Action, Save, Action, Save, Action, Action, Action, Save, Action, Rollback, Action, Action, Action, Save, Action, Commit
  A Persistent Transaction Surviving a System Restart:
    Begin, Action, Action, Save Persistent, Action, Save, Action, Restart, Action, Save, Action, Commit
  In the original figure, shaded stuff is "undone".

Whirlwind Tour: The TRID Flow
  • Call graph: who calls whom.
  • TRIDs flow on all such calls.
  • The application is typically the root.
  • An RM can be an application (it uses a transactional RM to store its state).
[Figure: a transaction's call graph from the application through application servers to resource managers.]

Whirlwind Tour: Normal (No Failure) Transaction Execution
  • The TM generates the TRID at Begin_Work() and coordinates commit.
  • Each RM joins the work, generates log records, and allows commit.
[Figure: the application calls Begin_Work() and receives a transid, then sends work requests to the resource manager. The resource manager issues Join_Work to the transaction manager, lock requests to the lock manager, and log records to the log manager. At Commit_Work() the transaction manager invokes the transaction callbacks: commit phase 1 (vote yes/no), write the commit log record and force the log, then commit phase 2 (ack).]

WW Tour: The Resource Manager View
[Figure: resource manager invocation. The resource manager calls the transaction manager (Identify, SaveWork, RollbackWork, Join, StatusTransaction, Leave, Resume) and receives transaction management callbacks (Save, Prepare, Commit, UNDO, REDO, Checkpoint). It is invoked through rmCall(...) and returns a response via its own service interface functions, calls other resource managers through rmCall(...) (depends on application), and exposes TP monitor administrative functions and callbacks to install, start, and schedule a resource manager.]

WW Tour: The Resource Manager View (Callbacks)
    Boolean Savepoint(LSN *);             /* invoked at tran Save_Work(). Returns RM vote   */
    Boolean Prepare(LSN *);               /* invoked at phase 1. Return vote on commit      */
    void    Commit();                     /* called at commit phase 2                       */
    void    Abort();                      /* called at failed commit phase 2 or abort       */

    void    UNDO(LSN);                    /* Undo the log record with this LSN              */
    void    REDO(LSN);                    /* Redo the log record with this LSN              */
    Boolean UNDO_Savepoint(LSN);          /* Vote TRUE if can return to savepoint           */
    void    REDO_Savepoint(LSN);          /* Redo a savepoint.                              */

    void    TM_Startup(LSN);              /* TM restarting. Passes RM ckpt LSN              */
    LSN     Checkpoint(LSN * low_water);  /* TM checkpointing. Return RM ckpt LSN,          */
                                          /* set low water LSN                              */
    Boolean Join_Work(RMID, TRID);        /* Become part of a transaction                   */

WW Tour: The Transaction Manager
  • Transaction rollback. Coordinates transaction rollback to a savepoint or abort. Rollbacks can be initiated by any participant.
  • Resource manager restart. If an RM fails and restarts, the TM presents the checkpoint anchor and the RM undo/redo log.
  • System restart. The TM drives local RM recovery (like RM restart) and resolves any in-doubt distributed transactions.
  • Media recovery. The TM helps the RM reconstruct damaged objects by providing archive copies of the object plus the log of the object since it was archived.
  • Node restart. Transaction commit among independent TMs when a TM fails.

WW Tour: When a Transaction Aborts
[Figure: as in normal execution, the application calls Begin_Work(), receives a transid, and sends work requests; the resource manager issues Join_Work, lock requests, and log records. At Rollback_Work() the transaction manager reads the transaction's log records from the log manager and calls Undo(log record) on the resource manager, writes an abort record in the log, then reports Aborted(transid).]
  • At transaction rollback the TM drives undo of each RM joined to the transaction.
  • Rollback can be to savepoint 0 (abort) or a partial rollback.
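The rollback flow above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the log is an in-memory array, the resource manager is a toy array of integer cells, and an `undone` flag stands in for the compensation log records a real system would write. The names `log_entry`, `rm_update`, `tm_save_work`, `rm_undo` and `tm_rollback` are hypothetical and are not the interfaces listed in these slides.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_MAX 64
#define CELLS   8

typedef int TRID;
typedef int LSN;                  /* assumption: an LSN is an index into the_log */

/* Hypothetical log record: either a savepoint marker or an update with undo info. */
typedef struct {
    TRID trid;
    int  is_savepoint;            /* 1 = savepoint marker, 0 = update record      */
    int  savepoint_no;            /* meaningful only for savepoint markers        */
    int  cell;                    /* which cell the update touched                */
    int  old_value;               /* value before the update (undo information)   */
    int  undone;                  /* stand-in for a compensation log record       */
} log_entry;

static log_entry the_log[LOG_MAX];
static int       log_len = 0;
static int       cells[CELLS];    /* toy resource manager state, initially zero   */

/* DO: log the old value, then apply the update. */
LSN rm_update(TRID trid, int cell, int new_value)
{
    assert(log_len < LOG_MAX && cell >= 0 && cell < CELLS);
    log_entry e = { trid, 0, 0, cell, cells[cell], 0 };
    the_log[log_len] = e;
    cells[cell] = new_value;
    return log_len++;
}

/* Record savepoint number savepoint_no (> 0) for the transaction. */
LSN tm_save_work(TRID trid, int savepoint_no)
{
    assert(log_len < LOG_MAX && savepoint_no > 0);
    log_entry e = { trid, 1, savepoint_no, 0, 0, 0 };
    the_log[log_len] = e;
    return log_len++;
}

/* RM undo callback: restore the old value recorded at this LSN. */
void rm_undo(LSN lsn)
{
    cells[the_log[lsn].cell] = the_log[lsn].old_value;
    the_log[lsn].undone = 1;
}

/* TM rollback: scan the transaction's records newest first, calling undo,
   and stop at the requested savepoint. Savepoint 0 means abort (undo all).
   Returns the number of records undone. */
int tm_rollback(TRID trid, int savepoint_no)
{
    int count = 0;
    for (LSN lsn = log_len - 1; lsn >= 0; lsn--) {
        log_entry *e = &the_log[lsn];
        if (e->trid != trid)
            continue;                         /* another transaction's record */
        if (e->is_savepoint) {
            if (savepoint_no != 0 && e->savepoint_no == savepoint_no)
                break;                        /* reached the target savepoint */
            continue;
        }
        if (e->undone)
            continue;                         /* already rolled back earlier  */
        rm_undo(lsn);
        count++;
    }
    return count;
}
```

Calling `tm_rollback(t, 1)` undoes only the updates made after savepoint 1, while `tm_rollback(t, 0)` continues back to the start of the transaction, which corresponds to an abort.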

WW Tour: The Transaction Manager at Restart/Recovery
[Figure: the transaction manager reads log records from the log manager: find the checkpoint, read the log forward, redo each op, and at the end undo soft savepoints and transactions. It calls Redo(log record) and Undo(log record) on the resource manager.]
  • At restart, the TM reading the log drives RM recovery.
  • Single log scan.
  • Single resolver of transactions.
  • Multiple logs are possible, but more complex and more work.

End of Whirlwind Tour

Resource Manager Concepts: Undo Redo Protocol
DO-UNDO-REDO Protocol
  DO:    Old State -> New State + log record
  UNDO:  New State + log record -> Old State
  REDO:  Old State + log record -> New State

Resource Manager Concepts: Transaction UNDO Protocol
    declare cursor for transaction_log              /* a cursor on the transaction's log     */
        select rmid, lsn                            /* it returns the resource manager name  */
        from log                                    /* and record id (log sequence number)   */
        where trid = :trid
        descending lsn;                             /* and returns records in LIFO order     */

    void transaction_undo(TRID trid)                /* Undo the specified transaction.       */
    {   int sqlcode;                                /* event variables set by sql            */
        open cursor transaction_log;                /* open an sql cursor on the trans log   */
        while (TRUE)                                /* scan trans log backwards & undo each  */
        {   fetch transaction_log into :rmid, :lsn; /* fetch the next most recent log rec    */
            if (sqlcode != 0) break;                /* if no more, trans is undone, end loop */
            rmid.undo(lsn);                         /* tell RM to undo that record           */
        }
        close cursor transaction_log;               /* Undo scan is complete, close cursor   */
    };                                              /* return to caller                      */

  • If UNDO is to a savepoint, the UNDO stops at the desired savepoint.

Resource Manager Concepts: Restart REDO Protocol
    void log_redo(void)
    {   declare cursor for the_log                  /* declare cursor from log start forward */
            select rmid, lsn                        /* gets RM id and log record id (lsn)    */
            from log                                /* of all log records                    */
            ascending lsn;                          /* in FIFO order                         */
        open cursor the_log;                        /* open an sql cursor on the log table   */
        while (TRUE)                                /* Scan log forward & redo each record.  */
        {   fetch the_log into :rmid, :lsn;         /* fetch the next log record             */
            if (sqlcode != 0) break;                /* if no more, then all redone, end loop */
            rmid.redo(lsn);                         /* tell RM to redo that record           */
        }
        close cursor the_log;                       /* Redo scan complete, close cursor      */
    };                                              /* return to caller                      */

  Note: REDO forwards, UNDO backwards.

Idempotence
[Figure: the undo log record maps the new state to the old state; the redo log record maps the old state to the new state.]
  F(F(X)) == F(X): needed in case restart fails (and restarts).
    Redo(Redo(old_state, log), log) = Redo(new_state, log) = new_state
    Undo(Undo(new_state, log), log) = Undo(old_state, log) = old_state

Testable State: Can Tell If It Happened
    IF operation not idempotent AND state not testable
    THEN recovery is impossible
    ELSE for F in {UNDO, REDO}:
        not testable:  WHILE (! ACK) F(F(X))
        testable:      WHILE (not desired state) {F(x)}
[Figure: a test tells whether an unknown state is the old state or the new state.]

Real Operations: Can Not Be Undone
  • Defer operations until commit is assured.
  • Perform them as part of phase 2 of commit.
  • If an operation must be undone for some reason, generate a compensation log record to be processed by some higher authority.
[Figure: DO, UNDO, Commit and REDO transitions between old state and new state, each with its log record, plus a compensation log record.]

Example: Communications Session RM
Session and Message Recovery Actions

  Action   Sender                                      Receiver
  DO       log message & seqno; send                   establish savepoint; log message & seqno; acknowledge
  UNDO     send cancellation (generates log record)    log cancellation message; return to savepoint; acknowledge
  REDO     resend message                              if not duplicate <normal DO processing>, else just acknowledge
  COMMIT   send any deferred (real) messages           do it

  Ops are idempotent (sequence numbers) and testable (sequence numbers).

Kinds of Logging
  Physical: keep old and new value of container (page, file, ...)
    Pro: Simple. Allows recovery of physical object (e.g. broken page).
    Con: Generates LOTS of log data.
  Logical: keep call params such that you can compute F(x), F^-1(x)
    Pro: Sounds simple. Compact log.
    Con: Doesn't work (wrong failure model). Operations do not fail cleanly.

Sample Physical Log Record
    struct compressed_log_record_for_page_update
    {   int      opcode;              /* opcode will say compressed page update */
        filename fname;               /* name of file that was updated          */
        long     pageno;              /* page that was updated                  */
        long     offset;              /* offset within page that was updated    */
        long     length;              /* length of field that was updated       */
        char     old_value[length];   /* old value of field                     */
        char     new_value[length];   /* new value of field                     */
    };

  • Ordinary sequential insert is OK.
  • Update of a sorted (B-tree) page:
      update LSN
      update page space map
      update pointer to record
      insert record at correct spot (move 1/2 the others)
  • Essentially writes the whole page (old and new): 16KB log records for 100-byte updates.
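A physical (old value, new value) log record of the kind shown above can be sketched in standard C with a flexible array member. This is a minimal illustration, not the deck's implementation: the 8 KB `PAGE_SIZE`, the names `phys_log_rec`, `phys_log_update`, `phys_undo`, `phys_redo` and `phys_rec_size`, and the choice to store the old value followed by the new value in one buffer are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 8192            /* assumption: page size chosen for the sketch */

/* Hypothetical physical log record: header plus old and new field images. */
typedef struct {
    long          pageno;         /* page that was updated                  */
    size_t        offset;         /* offset within page that was updated    */
    size_t        length;         /* length of the field that was updated   */
    unsigned char values[];       /* old value, followed by new value       */
} phys_log_rec;

/* DO: capture the old bytes, record the new bytes, then update the page.
   Returns a heap-allocated record (caller frees), or NULL if out of memory. */
phys_log_rec *phys_log_update(unsigned char *page, long pageno, size_t offset,
                              const void *new_value, size_t length)
{
    assert(offset + length <= PAGE_SIZE);
    phys_log_rec *r = malloc(sizeof *r + 2 * length);
    if (r == NULL)
        return NULL;
    r->pageno = pageno;
    r->offset = offset;
    r->length = length;
    memcpy(r->values, page + offset, length);          /* old value */
    memcpy(r->values + length, new_value, length);     /* new value */
    memcpy(page + offset, new_value, length);          /* apply it  */
    return r;
}

/* UNDO: copy the old value back. Overwriting is idempotent. */
void phys_undo(unsigned char *page, const phys_log_rec *r)
{
    memcpy(page + r->offset, r->values, r->length);
}

/* REDO: copy the new value in again. Overwriting is idempotent. */
void phys_redo(unsigned char *page, const phys_log_rec *r)
{
    memcpy(page + r->offset, r->values + r->length, r->length);
}

/* Bytes of log this update costs: header plus old and new images. */
size_t phys_rec_size(const phys_log_rec *r)
{
    return sizeof *r + 2 * r->length;
}
```

Because undo and redo simply overwrite bytes, applying either one twice leaves the page in the same state as applying it once. The record size grows with twice the number of bytes touched, which is why an insert that shifts half of a sorted page ends up logging close to the whole page twice.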

Sample Logical Log Record
    struct logical_log_record_for_insert
    {   int      opcode;              /* opcode will say insert             */
        filename fname;               /* name of file that was updated      */
        long     length;              /* length of record that was updated  */
        char     record[length];      /* value of record                    */
    };

  • Very compact.
  • Implies page update(s) for the record (may be many pages long).
  • Implies index updates (may be many indices on the base table).

The Trouble with Logical Logging
  • Logical logging needs to start UNDO/REDO with an action-consistent state: no half-completed operations.
    For example, for insert(table, record), ALL or NONE of the indices should be updated when logical UNDO/REDO is invoked.
  • Problem: the failure model is page and message action consistency (Lampson/Sturgis model of Chapter 3).
  • Actions can fail due to:
      Logic: e.g. duplicate key
      Limit: ran out of space
      Contention: deadlock
      Media: broken page or session
      System: computer failure/restart

Making Logical Logging Work: Shadows
  • Keep an old copy of each page.
  • Reset the page to the old copy at abort (no undo log).
  • Discard the old copy at commit.
  • Handles all online failures due to:
      Logic: e.g. duplicate key
      Limit: ran out of space
      Contention: deadlock
  • Problem: forces page locking, only one updater per page.
  • What about restart? Need to atomically write out all changed pages.

Making Logical Logging Work: Shadows (at Disc Level)
  • Perform the same shadow trick at disc level.
  • Keep a shadow copy of old pages.
  • Write out new pages.
  • In one careful write, write out the new page root.
  • This makes the update atomic.
[Figure: A Shadow Update, showing the old directory and the new directory over data pages A, B, C and their free space bit maps.]

Shadows
  Pro: Simple. Not such a bad deal with non-volatile RAM.
  Con: page locking
       extra space
       extra overhead (for page maps)
       extra IO
       declusters sequential data

Compromise: Physio-Logical Logging
  • Physical to a "page" (physical container).
  • Logical within a "page".
  • Keep old and new value of container (page, file, ...)
    Pro: Simple. Allows recovery of physical object (e.g. broken page).
    Con: Generates LOTS of log data.

Logical vs Physio-logical Logging
  Insert record r into table A (table A has indices B and C).
  Logical log record:
      insert, A, r
  Physiological log records:
      insert, A, page 508, r
      insert, B, page 72, s
      insert, C, page 94, t
  Note: physical log records would be bigger for sorted pages.

Physiological Logging Rules
  • Complex operations are a sequence of simple operations on pages and messages.
  • Each operation is constructed as a mini-transaction:
      lock the object in exclusive mode
      transform the object
      generate an UNDO-REDO log record
      record the log LSN in the object
      unlock the object
  • Action-consistent object: when the object semaphore is free, no ops are in progress.
  • Log consistency: the log contains log records of all complete page/msg actions.

Physiological Logging Rules: Online Operation - Only Need the Fix Rule
  • Each operation is structured as a mini-transaction.
  • Each operation generates an UNDO record.
  • No page operation fails with the semaphore set (the exception handler must clean up state and UNFIX any pages).
  • Then rollback can be physical to a page/session/container and logical within the page/session/container.

Physiological Logging Rules: Restart Operation - Need WAL and F@C
  Need a page-action consistent disc state.
    • Pages are action consistent.
    • Committed actions can be redone from the log.
    • Uncommitted actions can be undone from the log.
  WAL: Write Ahead Log
    • Write undo/redo log records before overwriting the disc page.
    • Only write action-consistent pages.
  Force-Log-At-Commit
    • Make transaction log records durable at commit.

Physiological Logging Rules: WAL and F@C
  WAL: Write Ahead Log
    write page:
        get page semaphore
        copy page
        give page semaphore       /* avoids holding semaphore during IO  */
        Force_log(Page(LSN))      /* WAL logic, probably already flushed */
        write copy to disc
    WAL gives idempotence and testability.
  Force-Log-At-Commit
    At commit phase 1: Force_log(transaction.max_lsn)

WAL & F@C in Pictures
[Figure: four LSNs: volatile page versions (VVlsn), volatile log records (VLlsn), durable log records (DLlsn), persistent page versions (PVlsn).]
  online:   VVlsn = VLlsn
  restart:  DLlsn <= VVlsn
            PVlsn <= DLlsn
  Commit:   commit_lsn <= DLlsn
  At restart all volatile memory is reset and must be reconstructed from persistent memory, leaving:
  restart:  PVlsn <= DLlsn
            commit_lsn <= DLlsn
  FIX, WAL and F@C assure these assertions.

The One Bit Resource Manager
  Manages an array of transactional bits (the free space bit map).
    i = get_bit();     /* gets a free bit and sets it                    */
    give_bit(i);       /* returns a free bit (when transaction commits)  */

The Bitmap and Its Log Records
  The Data Structure
    struct {                        /* layout of the one-bit RM data structure */
        LSN        lsn;             /* page LSN for WAL protocol               */
        xsemaphore sem;             /* semaphore regulates access to the page  */
        Boolean    bit[BITS];       /* page.bit[i] = TRUE => bit[i] is free    */
    } page;                         /* allocates the page structure            */

  The Log Records
    struct {                        /* log record format for the one-bit RM    */
        int     index;              /* index of bit that was updated           */
        Boolean value;              /* new value of bit[index]                 */
    } log_rec;                      /* log record used by the one-bit RM       */
    const int rec_size = sizeof(log_rec);   /* size of the log record body     */

Page and Log Consistency for 1-Bit RM
  Data is dirty if it reflects an uncommitted transaction update. Otherwise, data is clean.
  Page Consistency:
    • No clean free bit has been given to any transaction.
    • Every clean busy bit was given to exactly one transaction.
    • Dirty bits are locked in X mode by updating transactions.
    • The page.lsn reflects the most recent log record for the page.
  Log Consistency:
    • The log contains a record for every completed mini-transaction update to the page.

give_bit()
  • get_bit() and give_bit(i) temporarily violate page consistency.
  • The mini-transaction holds the semaphore while violating consistency.
  • It makes page and log mutually consistent before releasing the semaphore.
  => each mini-transaction observes a consistent page state.

    void give_bit(int i)                                    /* free a bit                               */
    {   if (LOCK_GRANTED == lock(i, LOCK_X, LOCK_LONG, 0))  /* lock bit                                 */
        {   Xsem_get(&page.sem);                            /* get page sem                             */
            page.bit[i] = TRUE;                             /* free the bit                             */
            log_rec.index = i;                              /* generate log rec                         */
            log_rec.value = TRUE;                           /* saying bit is free                       */
            page.lsn = log_insert(log_rec, rec_size);       /* write log rec & update lsn               */
            Xsem_give(&page.sem);                           /* page consistent                          */
        }
        else                                                /* if lock failed, caller doesn't own bit,  */
            Abort_Work();                                   /* in that case abort caller's trans        */
        return;
    };

get_bit()
    int get_bit(void)                                       /* allocate a bit and return its index      */
    {   int i;                                              /* loop variable                            */
        Xsem_get(&page.sem);                                /* get the page semaphore                   */
        for (i = 0; i < BITS; i++)                          /* loop looking for a free bit              */
        {   if (page.bit[i])                                /* if bit is free, may be dirty (so locked) */
            {   if (LOCK_GRANTED == lock(i, LOCK_X, LOCK_LONG, 0))  /* lock bit                         */
                {   page.bit[i] = FALSE;                    /* got lock on it, so it was free           */
                    log_rec.value = FALSE;                  /* generate log rec describing update       */
                    log_rec.index = i;
                    page.lsn = log_insert(log_rec, rec_size);  /* write log rec & update lsn            */
                    Xsem_give(&page.sem);                   /* page now consistent, give up sem         */
                    return i;                               /* return to caller                         */
                }                                           /* else lock bounce so bit dirty            */
            }
        }                                                   /* try next free bit                        */
        Xsem_give(&page.sem);                               /* if no free bits, give up semaphore       */
        Abort_Work();                                       /* abort transaction                        */
        return -1;                                          /* returns -1 if no bits are available      */
    };

Compensation Logging
[Figure: logical UNDO takes the new state and the log record back to the old state and writes a compensation log record.]
  • Undo may generate a log record recording the undo step.
  • This makes the page LSN monotonic.
  • A similar technique was used for the Communication Manager (the session sequence number was monotonic).

1-Bit RM UNDO Callback
    void undo(LSN lsn)                                      /* undo a one-bit RM operation              */
    {   int i;                                              /* bit index                                */
        Boolean value;                                      /* old bit value from log rec to be undone  */
        log_rec_header header;                              /* buffer to hold log record header         */
        rec_size = log_read_lsn(lsn, header, 0, log_rec, big);  /* read log rec                         */
        Xsem_get(&page.sem);                                /* get the page semaphore                   */
        i = log_rec.index;                                  /* get bit index from log record            */
        value = !log_rec.value;                             /* get complement of new bit value          */
        page.bit[i] = value;                                /* update bit to old value                  */
        log_rec.value = value;                              /* make a compensation log record           */
        page.lsn = log_insert(log_rec, rec_size);           /* log it and bump page lsn                 */
        Xsem_give(&page.sem);                               /* free the page semaphore                  */
        return;
    }

1-Bit RM Checkpoint Callback
    LSN checkpoint(LSN * low_water)                         /* copy 1-page RM state to persistent store */
    {   Xsem_get(&page.sem);                                /* get the page semaphore                   */
        *low_water = log_flush(page.lsn);                   /* WAL force up to page lsn, and            */
                                                            /* set low water mark                       */
        write(file, page, 0, sizeof(page));                 /* write page to persistent memory          */
        Xsem_give(&page.sem);                               /* give page semaphore                      */
        return NULLlsn;                                     /* return checkpoint lsn (none needed)      */
    }

1-Bit RM REDO Callback
    void redo(LSN lsn)                                      /* redo a free space operation              */
    {   int i;                                              /* bit index                                */
        Boolean value;                                      /* new bit value from log rec to be redone  */
        log_rec_header header;                              /* buffer to hold log record header         */
        rec_size = log_read_lsn(lsn, header, 0, log_rec, big);  /* read log record                      */
        i = log_rec.index;                                  /* get bit index                            */
        lock(i, LOCK_X, LOCK_LONG, 0);                      /* get lock on the bit (often not needed)   */
        Xsem_get(&page.sem);                                /* get the page semaphore                   */
        if (page.lsn < lsn)                                 /* if bit version older than log record     */
        {   value = log_rec.value;                          /* then redo the op. get new bit value      */
            page.bit[i] = value;                            /* apply new bit value to bit               */
            page.lsn = lsn;                                 /* advance the page lsn                     */
        }
        Xsem_give(&page.sem);                               /* free the page semaphore                  */
        return;
    };

1-Bit RM Noise Callbacks
    Boolean prepare(LSN * lsn)                              /* 1-bit RM has no phase 1 work             */
    {   *lsn = NULLlsn; return TRUE; };

    void Commit(void)                                       /* Commit: release locks &                  */
    {   unlock_class(LOCK_LONG, TRUE, MyRMID()); };         /* return                                   */

    void Abort(void)                                        /* Abort: release all locks &               */
    {   unlock_class(LOCK_LONG, TRUE, MyRMID()); };         /* return                                   */

    Boolean savepoint(LSN * lsn)                            /* no work to do at savepoint               */
    {   *lsn = NULLlsn; return TRUE; };

    void UNDO_savepoint(LSN lsn)                            /* rollback work or abort transaction       */
    {   if (savepoint == 0)                                 /* if at savepoint zero (abort)             */
            unlock_class(LOCK_LONG, TRUE, MyRMID());        /* release all locks                        */
    };

Summary
  • Model: complex actions are a page/message action sequence.
  • LSN: each page carries an LSN and a semaphore.
  • ReadFix: read actions get the semaphore in shared mode.
  • WriteFix: update actions get the semaphore in exclusive mode, generate one or more log records covering the page, advance the page LSN to match the highest LSN, and give the semaphore.
  • WAL: log_flush(page.LSN) before overwriting the persistent page.
  • F@C: force all log records up to the commit LSN at commit.
  • Compensation Logging: invalidate an undone log record with a compensating log record.
  • Idempotence via LSN: the page LSN makes REDO idempotent.

Two Phase Commit
  • Getting two or more logs to agree.
  • Getting two or more RMs to agree.
  • Atomically and durably, even in case one of them fails and restarts.
  The TM phases:
    Prepare.   Invoke each joined RM asking for its vote.
    Decide.    If all vote yes, durably write the commit log record.
    Commit.    Invoke each joined RM, telling it the commit decision.
    Complete.  Write commit completion when all RMs ACK.

Centralized Case of Two Phase Commit
  Each participant (TM and RM) goes through a sequence of states:
    Null, Active, Prepared, Committing, Committed, Aborting, Aborted
  These generate log records.

Examples
  Committed:
    begin
    DO rm1
    DO rm2
    DO rm2
    prepare rm2 {locks}
    commit {rm1, rm2}
    complete
  Aborted:
    begin
    DO rm1
    DO rm2
    DO rm2
    UNDO rm2
    UNDO rm2
    UNDO rm1
    UNDO begin {rm1, rm2}
    complete

Transitions in Case of Restart
  • The Active state is not persistent; the others are persistent. This holds for both TM and RM.
  • Log records make them persistent (redo).
  • The TM tries to drive states to the right (to committed, aborted).
[Figure: state diagram over Null, Active, Prepared, Committing, Committed, Aborting, Aborted.]

Successful Two Phase Commit
  Message/call flow from the TM to each RM joined to the transaction.
  Coordinator (states: Active, Prepared, Committing, Committed)
    • Local prepare; send Prepare.
    • On "yes": write commit record in log (force).
    • Send Commit; local commit work.
    • On Ack: write completion record in log (lazy).
  Participant (states: Active, Prepared, Committing, Committed)
    • On Prepare: local prepare; write prepare record in log (force); answer yes.
    • On Commit: local commit work; write completion record in log (lazy).
    • Ack when durable.
  If TM and RM share the same log, the RM FORCE can piggyback on the TM FORCE:
  one IO to commit a transaction (less if commit is grouped).

Abort Two Phase Commit
  • If an RM sends "NO" or no response (timeout), the TM starts abort.
  • It calls UNDO of each transaction log record.
  • It may stop at a savepoint.
  • At the begin_trans record it calls the ABORT() callback of each joined RM.

Distributed Two Phase Commit
  • Tracking joined TMs: the communications manager helps, much as TRPC helps in the local case.
[Figure: a call carrying (trid, data) crosses a session between two communications managers. On the caller side, if this is the first time, the communications manager tells transaction manager A that the trid is outgoing to B; on the callee side, if this is the first time, it tells the transaction manager that the trid is incoming from A.]
  • The root TM owes a Prepare/Commit/Abort message to each joined TM.
  • A joined TM does "local" commit.

Full Transaction State Diagram
  Next section explains how these states are implemented.
[Figure: live states (null = save point 0, begun = save point 1, ..., save point n) and complete states, arranged into volatile states, persistent states and durable states: active, persistent save point n, prepared, committing, aborting, committed, aborted.]

Summary of Resource Manager Concepts
  • DO/UNDO/REDO
  • Idempotent, Testable, Real operations
  • Logical vs Physical logging
  • Shadows to make logical logging work
  • Physiological logging
  • Fix, WAL, Force-at-commit
  • Page/Message/Log consistency
  • RM callbacks (the 1-bit resource manager): Join, Prepare, Commit, Abort, UNDO, REDO, ....
  • Restart REDO/UNDO
  • Two phase commit (RM story is simple).
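
The TM phases listed under Two Phase Commit (Prepare, Decide, Commit, Complete) can be sketched as a single coordinator function in C. This is a simplified illustration under stated assumptions, not the deck's transaction manager: everything runs in one process, the per-RM callback table `rm_callbacks`, the `force_log_fn` stand-in for forcing the commit record, the `toy_rm` test resource manager and the function `tm_commit` are all hypothetical names, and undo of log records before the abort callback is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TX_COMMITTED, TX_ABORTED } tx_outcome;

/* Hypothetical callback table for one joined resource manager. */
typedef struct {
    int  (*prepare)(void *rm);    /* phase 1: nonzero means "vote yes"     */
    void (*commit)(void *rm);     /* phase 2: commit decision              */
    void (*abort_tx)(void *rm);   /* abort decision                        */
    void *rm;                     /* the resource manager's own state      */
} rm_callbacks;

/* Stand-in for durably writing (forcing) a log record; nonzero on success. */
typedef int (*force_log_fn)(const char *record);

static void abort_all(const rm_callbacks *rms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rms[i].abort_tx(rms[i].rm);
}

/* Coordinator: Prepare, Decide, Commit. The lazy completion record is left
   as a comment because this sketch has no asynchronous log writer. */
tx_outcome tm_commit(const rm_callbacks *rms, size_t n, force_log_fn force_log)
{
    assert(rms != NULL && force_log != NULL);

    /* Prepare: ask each joined RM for its vote; any "no" aborts everyone. */
    for (size_t i = 0; i < n; i++) {
        if (!rms[i].prepare(rms[i].rm)) {
            abort_all(rms, n);
            return TX_ABORTED;
        }
    }

    /* Decide: the transaction commits once the commit record is durable. */
    if (!force_log("commit")) {
        abort_all(rms, n);
        return TX_ABORTED;
    }

    /* Commit: tell each joined RM the decision. */
    for (size_t i = 0; i < n; i++)
        rms[i].commit(rms[i].rm);

    /* Complete: a completion record would be written lazily here. */
    return TX_COMMITTED;
}

/* Toy resource manager used to exercise the coordinator. */
typedef struct { int vote_yes; int committed; int aborted; } toy_rm;

static int  toy_prepare(void *p) { return ((toy_rm *)p)->vote_yes; }
static void toy_commit(void *p)  { ((toy_rm *)p)->committed = 1; }
static void toy_abort(void *p)   { ((toy_rm *)p)->aborted = 1; }

rm_callbacks toy_callbacks(toy_rm *rm)
{
    rm_callbacks cb = { toy_prepare, toy_commit, toy_abort, rm };
    return cb;
}

static int commit_records = 0;    /* counts forced commit records */

int toy_force_log(const char *record)
{
    (void)record;
    commit_records++;
    return 1;
}
```

With two toy RMs that both vote yes, `tm_commit` forces one commit record and calls both commit callbacks; if either votes no, no commit record is written and every joined RM receives the abort callback.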