Resource Managers

Jim Gray
Microsoft, Gray @ Microsoft.com
Andreas Reuter
International University, Andreas.Reuter@i-u.de
Course schedule (sessions listed in order Mon, Tue, Wed, Thur, Fri):
9:00   Overview, TP mons, Log, Files & Buffers, B-tree
11:00  Faults, Lock Theory, ResMgr, COM+, Access Paths
1:30   Tolerance, Lock Techniq, CICS & Inet, Corba, Groupware
3:30   T Models, Queues, Adv TM, Replication, Benchmark
7:00   Party, Workflow, Cyberbrick, Party
Gray & Reuter: Resource Manager
Whirlwind Tour: The Actors
Resource managers
• provide ACID objects (transactional objects)
• Use log manager to record changes
• Use transaction manager to coordinate multi-RM changes
• Use communication manager to make transactional RPCs
[Figure: communication managers, transaction managers, log managers with their logs, and resource managers holding objects in volatile and durable storage.]
Whirlwind Tour: the Application Verbs
TRID       Begin_Work(context *);       /* begin a transaction                       */
Boolean    Commit_Work(context *);      /* commit the transaction                    */
void       Abort_Work(void);            /* rollback to savepoint zero                */
savepoint  Save_Work(context *);        /* establish a savepoint                     */
savepoint  Rollback_Work(savepoint);    /* return to savept (savept 0 = abort)       */
Boolean    Prepare_Work(context *);     /* put transaction in prepared state         */
context    Read_Context(void);          /* return current savepoint context          */
TRID       Chain_Work(context *);       /* end current and start next trans          */
TRID       My_Trid(void);               /* return current transaction identifier     */
TRID       Leave_Transaction(void);     /* set process trid null, return current id  */
Boolean    Resume_Transaction(TRID);    /* set process trid to desired trid          */
enum tran_status { ACTIVE, PREPARED, ABORTING, COMMITTING, ABORTED, COMMITTED };
tran_status Status_Transaction(TRID);   /* transaction identifier status             */
Whirlwind Tour
Types Of Transaction Executions
[Figure: four transaction execution timelines built from Begin, Action, Save, Rollback, and Commit steps: A Simple Commit; A Simple Abort; A Partial Rollback (work resumes after the rollback and ends in Commit); A Persistent Transaction Surviving A System Restart (Save Persistent precedes the Restart, after which work continues to Commit). Shaded stuff is "undone".]
Whirlwind Tour: the TRID Flow
Call graph: who calls whom.
TRIDs flow on all such calls.
Application is typically root.
RM can be an application (use a transactional RM to store state)
[Figure: call graph of a transaction, from the application through application servers to resource managers.]
Whirlwind tour: Normal (no failure) Transaction Execution
TM generates the TRID at Begin_Work() and coordinates commit.
RM joins work, generates log records, allows commit.
[Figure: message flow among Application, Resource Manager, Lock Manager, Transaction Manager, and Log Manager. The application calls Begin_Work() and receives a transid, then sends work requests to the resource manager. The RM's normal functions issue lock requests, Join_Work, and log records. At Commit_Work() the TM invokes the RM's transaction callback functions: Commit Phase 1? (yes/no), then writes the commit log record and forces the log, then Commit Phase 2 (ack).]
WW tour: The Resource Manager view
[Figure: a resource manager and its interfaces.
• Calls to the Transaction Manager: Identify, SaveWork, RollbackWork, Join, StatusTransaction, Leave, Resume.
• Transaction management callbacks: Save, Prepare, Commit, UNDO, REDO, Checkpoint.
• TP monitor: administrative functions and callbacks to install, start, and schedule a resource manager.
• Resource manager's own service interface functions: invocation by rmCall(...) and response (depends on application).
• Other resource managers: reached by rmCall(...), with their own callbacks.]
WW tour: The Resource manager view
Boolean Savepoint(LSN *);             /* invoked at tran Save_Work(). Returns RM vote      */
Boolean Prepare(LSN *);               /* invoked at phase_1. Return vote on commit         */
void    Commit();                     /* called at commit phase 2                          */
void    Abort();                      /* called at failed commit phase 2 or abort          */
void    UNDO(LSN);                    /* Undo the log record with this LSN                 */
void    REDO(LSN);                    /* Redo the log record with this LSN                 */
Boolean UNDO_Savepoint(LSN);          /* Vote TRUE if can return to savepoint              */
void    REDO_Savepoint(LSN);          /* Redo a savepoint.                                 */
void    TM_Startup(LSN);              /* TM restarting. Passes RM ckpt LSN                 */
LSN     Checkpoint(LSN * low_water);  /* TM checkpointing, Return RM ckpt LSN,             */
                                      /*   set low water LSN                               */
Boolean Join_Work(RMID, TRID);        /* Become part of a transaction                      */
WW Tour: The Transaction Manager
Transaction rollback.
coordinates transaction rollback to a savepoint or abort
rollbacks can be initiated by any participant.
Resource manager restart.
If an RM fails and restarts, TM presents checkpoint anchor & RM undo/redo log
System restart.
TM drives local RM recovery (like RM restart)
TM resolves any in-doubt distributed transactions
Media recovery.
TM helps RM reconstruct damaged objects by providing
archive copies of object + the log of object since archived.
Node restart.
Transaction commit among independent TMs when a TM fails.
WW Tour: When a Transaction Aborts
[Figure: abort message flow among Application, Resource Manager, Lock Manager, Transaction Manager, and Log Manager. Begin_Work(), work requests, lock requests, Join_Work, and log records proceed as in normal execution. On Rollback_Work() the TM reads the transaction's log records and calls Undo(log record) on the RM's transaction callbacks, writes an abort record in the log, and reports Aborted(transid).]
At transaction rollback
TM drives undo of each RM joined to the transaction
Can be to savepoint 0 (abort) or partial rollback.
WW tour: the Transaction Manager at Restart/Recovery
[Figure: at restart the Transaction Manager finds the checkpoint, reads the log forward through the Log Manager, and calls Redo(log record) on the Resource Manager for each operation. At the end it undoes soft savepoints and transactions by calling Undo(log record).]
At restart, TM reading the log drives RM recovery.
Single log scan.
Single resolver of transactions.
Multiple logs possible, but more complex/more work.
End of Whirl-Wind Tour
Resource Manager Concepts:
Undo Redo Protocol
DO-UNDO-REDO Protocol
DO:   Old State -> New State + log record
UNDO: New State + log record -> Old State
REDO: Old State + log record -> New State
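The three operations can be sketched for a single integer-valued object. This is only an illustration: the log_record layout and the function names do_op, undo_op, and redo_op are assumptions made for the example, not part of the slides.

```c
#include <assert.h>

/* Hypothetical log record for an integer object: keeps both images. */
typedef struct {
    int old_value;   /* state before DO */
    int new_value;   /* state after DO  */
} log_record;

/* DO: old state -> new state, plus a log record describing the change. */
int do_op(int old_state, int new_value, log_record *rec) {
    rec->old_value = old_state;
    rec->new_value = new_value;
    return new_value;
}

/* UNDO: new state + log record -> old state. */
int undo_op(int new_state, const log_record *rec) {
    (void)new_state;             /* the old image alone determines the result */
    return rec->old_value;
}

/* REDO: old state + log record -> new state. */
int redo_op(int old_state, const log_record *rec) {
    (void)old_state;             /* the new image alone determines the result */
    return rec->new_value;
}
```

Calling do_op(10, 42, &rec) yields 42 and fills rec; undo_op then returns 10 and redo_op returns 42.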
Resource Manager Concepts:
Transaction UNDO Protocol
declare cursor for transaction_log        /* a cursor on the transaction's log     */
  select rmid, lsn                        /* it returns the resource manager name  */
  from   log                              /* and record id (log sequence number)   */
  where  trid = :trid                     /*                                       */
  descending lsn;                         /* and returns records in LIFO order     */
void transaction_undo(TRID trid)          /* Undo the specified transaction.       */
{ int sqlcode;                            /* event variables set by sql            */
  open cursor transaction_log;            /* open an sql cursor on the trans log   */
  while (TRUE)                            /* scan trans log backwards & undo each  */
  {                                       /*                                       */
    fetch transaction_log into :rmid, :lsn; /* fetch the next most recent log rec  */
    if (sqlcode != 0) break;              /* if no more, trans is undone, end loop */
    rmid.undo(lsn);                       /* tell RM to undo that record           */
  }                                       /*                                       */
  close cursor transaction_log;           /* Undo scan is complete, close cursor   */
};                                        /* return to caller                      */
• If UNDO to savepoint, the UNDO stops at desired savepoint
Resource Manager Concepts:
Restart REDO Protocol
void log_redo(void)                       /*                                        */
{ declare cursor for the_log              /* declare cursor from log start forward  */
    select rmid, lsn                      /* gets RM id and log record id (lsn)     */
    from   log                            /* of all log records.                    */
    ascending lsn;                        /* in FIFO order                          */
  open cursor the_log;                    /* open an sql cursor on the log table    */
  while (TRUE)                            /* Scan log forward & redo each record.   */
  { fetch the_log into :rmid, :lsn;       /* fetch the next log record              */
    if (sqlcode != 0) break;              /* if no more, then all redone, end loop  */
    rmid.redo(lsn); }                     /* tell RM to redo that record            */
  close cursor the_log;                   /* Redo scan complete, close cursor       */
};                                        /* return to caller                       */
Note: REDO forwards, UNDO backwards
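The two scans can be tried out without a SQL cursor. The sketch below replaces the log table with an in-memory array whose position (index + 1) stands in for the LSN. The log_entry layout, the rm_state array, and the stop_lsn parameter (0 means full abort) are assumptions made for the example, and idempotence is deliberately ignored here.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RMS 2

/* Hypothetical in-memory log: array index + 1 serves as the LSN. */
typedef struct { int trid; int rmid; int delta; } log_entry;

int rm_state[NUM_RMS];                 /* one integer of state per RM */

/* RM callbacks: each RM simply adds or subtracts the logged delta. */
void rm_redo(const log_entry *e) { rm_state[e->rmid] += e->delta; }
void rm_undo(const log_entry *e) { rm_state[e->rmid] -= e->delta; }

/* Undo one transaction: scan backwards (LIFO) and stop at stop_lsn. */
void transaction_undo(const log_entry *log, size_t n, int trid, size_t stop_lsn) {
    for (size_t lsn = n; lsn > stop_lsn; lsn--)
        if (log[lsn - 1].trid == trid)
            rm_undo(&log[lsn - 1]);    /* tell the RM to undo that record */
}

/* Restart redo: scan forwards (FIFO) over every record in the log. */
void log_redo(const log_entry *log, size_t n) {
    for (size_t lsn = 1; lsn <= n; lsn++)
        rm_redo(&log[lsn - 1]);        /* tell the RM to redo that record */
}
```

Passing a savepoint's LSN as stop_lsn gives the partial rollback mentioned above: records at or before that LSN are left alone.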
Idempotence
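[Figure: undo with the log record leads to the old state and redo with the log record leads to the new state, whichever of the two states it starts from.]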
F(F(X)) == F(X): Needed in case restart fails (and restarts)
Redo(Redo(old_state,log), log) = Redo(new_state,log) = new_state
Undo(Undo(new_state,log), log) = Undo(old_state,log) = old_state
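A small contrast shows why the form of the log record matters for idempotence. In this sketch (types and names are illustrative assumptions), a record carrying old and new images can be applied any number of times, while a record carrying only a delta cannot.

```c
#include <assert.h>

/* Assumed physical-style record: old and new images of an integer. */
typedef struct { int old_value; int new_value; } value_log_rec;

/* Idempotent: installing an image twice gives the same state as once. */
int redo_value(int state, const value_log_rec *r) { (void)state; return r->new_value; }
int undo_value(int state, const value_log_rec *r) { (void)state; return r->old_value; }

/* NOT idempotent: a logical "add delta" record applied twice adds twice. */
int redo_delta(int state, int delta) { return state + delta; }
```

With the record {10, 15}, redo_value applied twice from 10 still gives 15, whereas redo_delta(redo_delta(10, 5), 5) gives 20. A delta-style record therefore needs some other guard, such as a testable state, before it is safe to repeat.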
Testable State: Can Tell If It Happened.
IF operation not idempotent AND state not testable
THEN recovery is impossible
ELSE for F in {UNDO, REDO}:
not testable: WHILE (! ACK) F(F(X))
testable: WHILE ( not desired state) {F(x)}
[Figure: a test on an unknown state reveals whether it is the old state or the new state.]
Real Operations: Can Not Be Undone
Defer operations until commit is assured.
Perform as part of Phase 2 of commit
If must undo for some reason,
generate compensation log record
to be processed by some higher authority.
[Figure: for a real operation, DO leaves the old state in place and only writes a log record, and UNDO before commit likewise leaves the old state. After the commit log record, REDO with the log record produces the new state; undoing after that point requires a compensation log record.]
Example: Communications Session RM
Session And Message Recovery Actions
          Sender                                      Receiver
DO        log message & seqno; send                   establish savepoint; log message & seqno; acknowledge
UNDO      send cancellation (generates log record)    log cancellation message; return to savepoint; acknowledge
REDO      resend message                              if not duplicate <normal DO processing> else just acknowledge
COMMIT    send any deferred (real) messages           do it
Ops are idempotent (sequence numbers)
and testable (sequence numbers)
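The receiver's REDO rule ("if not duplicate, normal DO processing, else just acknowledge") can be sketched with a per-session sequence number. The session struct and the receive function are hypothetical names for this example; the payload sum stands in for whatever the application does with a message.

```c
#include <assert.h>

/* Hypothetical receiver side of a session. */
typedef struct {
    unsigned last_seqno;   /* highest sequence number already processed */
    int      total;        /* application state: sum of message payloads */
} session;

/* Returns 1 if the message was applied, 0 if it was a duplicate.
   Either way the caller would acknowledge the message. */
int receive(session *s, unsigned seqno, int payload) {
    if (seqno <= s->last_seqno)   /* testable: we can tell it already happened */
        return 0;                 /* duplicate: just acknowledge */
    s->total += payload;          /* normal DO processing */
    s->last_seqno = seqno;
    return 1;
}
```

A resent message carries the sequence number it was first sent with, so applying the resend leaves the receiver's state unchanged.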
Kinds of Logging
Physical:
Keep old and new value of container (page, file,...)
Pro: Simple
Allows recovery of physical object (e.g. broken page)
Con: Generates LOTS of log data
Logical:
Keep call params such that you can compute F(x), F⁻¹(x)
Pro: Sounds simple
Compact log.
Con: Doesn't work (wrong failure model).
Operations do not fail cleanly.
Sample Physical LOG RECORD
struct compressed_log_record_for_page_update  /*                                  */
{ int      opcode;              /* opcode will say compressed page update */
  filename fname;               /* name of file that was updated          */
  long     pageno;              /* page that was updated                  */
  long     offset;              /* offset within page that was updated    */
  long     length;              /* length of field that was updated       */
  char     old_value[length];   /* old value of field                     */
  char     new_value[length];   /* new value of field                     */
};                              /*                                        */
Ordinary sequential insert is OK.
Update of sorted (B-tree) page:
update LSN
update page space map
update pointer to record
insert record at correct spot (move 1/2 the others)
Essentially writes whole page (old and new).
16KB log records for 100-byte updates.
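A compilable variant of the compressed page-update record might look like the sketch below. Since C arrays need a constant bound, it caps the field length with MAX_FIELD; that constant, PAGE_BYTES, and the function names are assumptions for the example, and the opcode and file name are omitted for brevity.

```c
#include <assert.h>
#include <string.h>

#define PAGE_BYTES 64
#define MAX_FIELD  16

/* Fixed-size variant of the slide's record. */
typedef struct {
    long pageno;                 /* page that was updated      */
    long offset;                 /* offset within the page     */
    long length;                 /* length of the updated field */
    char old_value[MAX_FIELD];   /* old value of field         */
    char new_value[MAX_FIELD];   /* new value of field         */
} page_update_rec;

/* Update a field in place and fill in the log record; returns 0 on success. */
int page_update(char *page, long pageno, long offset,
                const char *val, long length, page_update_rec *rec) {
    if (length < 0 || length > MAX_FIELD || offset < 0 ||
        offset + length > PAGE_BYTES)
        return -1;                                   /* field does not fit */
    rec->pageno = pageno;
    rec->offset = offset;
    rec->length = length;
    memcpy(rec->old_value, page + offset, (size_t)length);
    memcpy(rec->new_value, val, (size_t)length);
    memcpy(page + offset, val, (size_t)length);
    return 0;
}

/* UNDO and REDO just reinstall the old or new image, so both are idempotent. */
void page_undo(char *page, const page_update_rec *rec) {
    memcpy(page + rec->offset, rec->old_value, (size_t)rec->length);
}
void page_redo(char *page, const page_update_rec *rec) {
    memcpy(page + rec->offset, rec->new_value, (size_t)rec->length);
}
```

The record grows with the number of bytes that change, which is why an insert that shifts half of a sorted page produces such large log records.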
Sample Logical LOG RECORD
struct logical_log_record_for_insert      /*                                  */
{ int      opcode;              /* opcode will say insert             */
  filename fname;               /* name of file that was updated      */
  long     length;              /* length of record that was updated  */
  char     record[length];      /* value of record                    */
};                              /*                                    */
Very compact.
Implies page update(s) for record (may be many pages long).
Implies index updates (may be many indices on base table)
The trouble with Logical Logging
Logical logging needs to start UNDO/REDO with an action-consistent state.
No half completed operations.
for example: insert (table, record)
ALL or NONE of the indices should be updated
when logical UNDO/REDO is invoked.
Problem:
Failure model is Page & Message action consistency
(Lampson /Sturgis model of Chapter 3).
Actions can fail due to:
Logic: e.g. duplicate key.
Limit: ran out of space
Contention: deadlock
Media: broken page or session
System: computer failure/restart
Making Logical Logging Work: Shadows
Keep old copy of each page
Reset page to old copy at abort (no undo log)
Discard old copy at commit.
Handles all online failures due to:
Logic: e.g. duplicate key.
Limit: ran out of space
Contention: deadlock
Problem: forces page locking, only one updater per page.
What about restart?
Need to atomically write out all changed pages.
Making Logical Logging Work: Shadows
Perform same shadow trick at disc level.
Keep shadow copy of old pages.
Write out new pages.
In one careful write, write out new page root.
Makes update atomic
[Figure: A Shadow Update. The old directory points to data pages A, B, and C and to the old free space bit map. The new directory points to A, C, a new copy of B, and a new free space bit map.]
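The disc-level shadow trick can be sketched with two directories and a single root field whose update plays the role of the one careful write. The disc layout, the names, and the crash_before_flip flag (used only to simulate a failure) are assumptions for the example; the free space bit map is left out, so the caller supplies the free slot.

```c
#include <assert.h>
#include <string.h>

#define NPAGES 3      /* logical pages */
#define NSLOTS 8      /* physical page slots on the simulated disc */

typedef struct { int slot[NPAGES]; } directory;   /* logical page -> slot */

typedef struct {
    char      data[NSLOTS][8];   /* page contents */
    directory dir[2];            /* current and shadow directory */
    int       root;              /* which directory is current */
} disc;

/* Write the new page to a free slot, build a new directory, then flip root. */
void shadow_update(disc *d, int page, const char *value,
                   int free_slot, int crash_before_flip) {
    int cur = d->root, next = 1 - cur;
    strncpy(d->data[free_slot], value, 7);    /* write out new page */
    d->data[free_slot][7] = '\0';
    d->dir[next] = d->dir[cur];               /* copy the old directory */
    d->dir[next].slot[page] = free_slot;      /* point it at the new page */
    if (crash_before_flip) return;            /* old root still intact */
    d->root = next;                           /* the one careful write */
}

const char *read_page(const disc *d, int page) {
    return d->data[d->dir[d->root].slot[page]];
}
```

Until root flips, every reader still sees the old pages, so a crash at any earlier point leaves the old state untouched.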
Shadows
Pro: Simple
Not such a bad deal with non-volatile ram
Con: page locking
extra space
extra overhead (for page maps)
extra IO
declusters sequential data
Compromise Physio-Logical Logging
Physio-Logical Logging
Physical to a "page" (physical container)
Logical within a "page".
Keep old and new value of container (page, file,...)
Pro: Simple
Allows recovery of physical object (e.g. broken page)
Con: Generates LOTS of log data
Logical vs Physio-logical Logging
Insert record r into table A (Table A has Index B and Index C)
Logical log record
  insert, A, r
Physiological log records
  insert, A, page 508, r
  insert, B, page 72, s
  insert, C, page 94, t
Note: physical log records would be bigger for sorted pages.
Physiological Logging Rules
Complex operations are a sequence of simple operations on pages and
messages.
Each operation is constructed as a mini-transaction:
lock the object in exclusive mode
transform the object
generate an UNDO-REDO log record
record log LSN in object
unlock the object.
Action Consistent Object:
When object semaphore free, no ops in progress.
Log-Consistency:
contains log records of all complete page/msg actions.
Physiological Logging Rules
Online Operation - Only Need the Fix Rule
Each operation is structured as a mini-transaction.
Each operation generates an UNDO record.
No page operation fails with the semaphore set.
(exception handler must clean up state
and UNFIX any pages).
Then Rollback can be
physical to a page/session/container and
logical within page/session/container.
Physiological Logging Rules
Restart Operation - Need WAL and F@C
Need Page-Action consistent disc state.
Pages are action consistent.
Committed actions can be redone from log.
Uncommitted actions can be undone from log.
WAL: Write Ahead Log
Write undo/redo log records before overwriting disc page
Only write action-consistent pages
Force-Log-At-Commit
Make transaction log records durable at commit.
Physiological Logging Rules
WAL and F@C
WAL: Write Ahead Log
write page:
get page semaphore
copy page
give page semaphore /* avoids holding semaphore during IO */
Force_log(Page(LSN)) /*WAL logic, probably already flushed*/
Write copy to disc.
WAL gives idempotence and testability.
Force-Log-At-Commit
At commit phase 1:
Force_log(transaction.max_lsn)
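The write-page steps and the commit-time force can be simulated with two LSN counters. In the sketch below, sim_log, sim_page, force_log, write_page, and commit_force are illustrative names, the semaphore is omitted because the example is single-threaded, and the "disc" is just another struct.

```c
#include <assert.h>

typedef unsigned long LSN;

/* Simulated log: records up to volatile_lsn exist,
   records up to durable_lsn are on disc. */
typedef struct { LSN volatile_lsn; LSN durable_lsn; } sim_log;

typedef struct { LSN lsn; int payload; } sim_page;

/* Force the log up to lsn (a no-op if already durable). */
LSN force_log(sim_log *log, LSN lsn) {
    if (lsn > log->volatile_lsn) lsn = log->volatile_lsn;
    if (lsn > log->durable_lsn)  log->durable_lsn = lsn;
    return log->durable_lsn;
}

/* WAL page write: copy the page, force the log to the page LSN,
   then write the copy to the persistent version. */
void write_page(sim_log *log, const sim_page *volatile_page,
                sim_page *persistent_page) {
    sim_page copy = *volatile_page;        /* taken under the semaphore in the slide */
    force_log(log, copy.lsn);              /* WAL logic */
    assert(copy.lsn <= log->durable_lsn);  /* PVlsn <= DLlsn must now hold */
    *persistent_page = copy;               /* write copy to disc */
}

/* Force-log-at-commit: make the transaction's log records durable. */
void commit_force(sim_log *log, LSN max_lsn) { force_log(log, max_lsn); }
```

Copying the page before forcing the log mirrors the slide's point that the semaphore should not be held during IO.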
WAL & F@C in Pictures
Volatile Page Versions: VVlsn
Volatile Log Records: VLlsn
Durable Log Records: DLlsn
Persistent Page Versions: PVlsn
online: VVlsn = VLlsn
restart: DLlsn <= VVlsn
PVlsn <= DLlsn
Commit:
commit_lsn <= DLlsn
At restart all volatile memory is reset and must be
reconstructed from persistent memory.
restart: PVlsn <= DLlsn
commit_lsn <= DLlsn
FIX, WAL and F@C assure these assertions
The One Bit Resource Manager
Manages an array of transactional bits (the free space bit map).
i = get_bit();     /* gets a free bit and sets it                       */
give_bit(i);       /* returns a free bit (when transaction commits)     */
The Bitmap and Its Log Records
The Data Structure
struct {                      /* layout of the one-bit RM data structure  */
  LSN        lsn;             /* page LSN for WAL protocol                */
  xsemaphore sem;             /* semaphore regulates access to the page   */
  Boolean    bit[BITS];       /* page.bit[i] = TRUE => bit[i] is free     */
} page;                       /* allocates the page structure             */
The Log Records
struct                        /* log record format for the one-bit RM     */
{ int     index;              /* index of bit that was updated            */
  Boolean value;              /* new value of bit[index]                  */
} log_rec;                    /* log record used by the one-bit RM        */
const int rec_size = sizeof(log_rec); /* size of the log record body.     */
Page and Log Consistency for 1-Bit RM
Data is dirty if it reflects an uncommitted transaction update.
Otherwise, data is clean.
Page Consistency:
• No clean free bit has been given to any transaction.
• Every clean busy bit was given to exactly one transaction.
• Dirty bits locked in X mode by updating transactions .
• The page.lsn reflects most recent log record for page.
Log Consistency:
• Log contains a record for every completed
mini-transaction update to the page.
give_bit()
get_bit() & give_bit(i) temporarily violate page consistency.
Mini-transaction holds semaphore while violating consistency.
Makes page & log mutually consistent before releasing sem.
=> each mini-transaction observes a consistent page state.
void give_bit(int i)                               /* free a bit                               */
{ if (LOCK_GRANTED==lock(i,LOCK_X,LOCK_LONG,0))    /* Lock bit                                 */
    { Xsem_get(&page.sem);                         /* get page sem                             */
      page.bit[i] = TRUE;                          /* free the bit                             */
      log_rec.index = i;                           /* generate log rec                         */
      log_rec.value = TRUE;                        /* saying bit is free                       */
      page.lsn = log_insert(log_rec,rec_size);     /* write log rec & update lsn               */
      Xsem_give(&page.sem);}                       /* page consistent                          */
  else                                             /* if lock failed, caller doesn't own bit,  */
    Abort_Work();                                  /* in that case abort caller's trans        */
  return; };                                       /*                                          */
get_bit()
int get_bit(void)                                  /* allocate a bit and return its index      */
{ int i;                                           /* loop variable                            */
  Xsem_get(&page.sem);                             /* get the page semaphore                   */
  for (i = 0; i < BITS; i++)                       /* loop looking for a free bit              */
  { if (page.bit[i])                               /* if bit is free, may be dirty (so locked) */
    { if (LOCK_GRANTED == lock(i,LOCK_X,LOCK_LONG,0)) /* lock bit                              */
      { page.bit[i] = FALSE;                       /* got lock on it, so it was free           */
        log_rec.value = FALSE;                     /* generate log rec describing update       */
        log_rec.index = i;                         /*                                          */
        page.lsn = log_insert(log_rec,rec_size);   /* write log rec & update lsn               */
        Xsem_give(&page.sem);                      /* page now consistent, give up sem         */
        return i; }                                /* return to caller                         */
    };                                             /* else lock bounce so bit dirty            */
  };                                               /* try next free bit,                       */
  Xsem_give(&page.sem);                            /* if no free bits, give up semaphore       */
  Abort_Work();                                    /* abort transaction                        */
  return -1; };                                    /* returns -1 if no bits are available.     */
Compensation Logging
[Figure: UNDO uses the log record to take the new state back to the logical old state and writes a compensation log record.]
Undo may generate a log record recording undo step
Makes Page LSN monotonic
Similar technique was used for Communication Manager
(session sequence number was monotonic)
1-bit RM UNDO Callback
void undo(LSN lsn)                                 /* undo a one-bit RM operation              */
{ int i;                                           /* bit index                                */
  Boolean value;                                   /* old bit value from log rec to be undone  */
  log_rec_header header;                           /* buffer to hold log record header         */
  rec_size = log_read_lsn(lsn,header,0,log_rec,big); /* read log rec                           */
  Xsem_get(&page.sem);                             /* get the page semaphore                   */
  i = log_rec.index;                               /* get bit index from log record            */
  value = ! log_rec.value;                         /* get complement of new bit value          */
  page.bit[i] = value;                             /* update bit to old value                  */
  log_rec.value = value;                           /* make a compensation log record           */
  page.lsn = log_insert(log_rec,rec_size);         /* log it and bump page lsn                 */
  Xsem_give(&page.sem);                            /* free the page semaphore                  */
  return; }                                        /*                                          */
1-bit RM Checkpoint Callback
LSN checkpoint(LSN * low_water)                    /* copy 1-page RM state to persistent store */
{ Xsem_get(&page.sem);                             /* get the page semaphore                   */
  *low_water = log_flush(page.lsn);                /* WAL force up to page lsn, and            */
                                                   /*   set low water mark                     */
  write(file,page,0,sizeof(page));                 /* write page to persistent memory          */
  Xsem_give(&page.sem);                            /* give page semaphore                      */
  return NULLlsn; }                                /* return checkpoint lsn (none needed)      */
1-bit RM REDO Callback
void redo(LSN lsn)                                 /* redo a free space operation              */
{ int i;                                           /* bit index                                */
  Boolean value;                                   /* new bit value from log rec to be redone  */
  log_rec_header header;                           /* buffer to hold log record header         */
  rec_size = log_read_lsn(lsn,header,0,log_rec,big); /* read log record                        */
  i = log_rec.index;                               /* Get bit index                            */
  lock(i,LOCK_X,LOCK_LONG,0);                      /* get lock on the bit (often not needed)   */
  Xsem_get(&page.sem);                             /* get the page semaphore                   */
  if (page.lsn < lsn)                              /* if bit version older than log record     */
    { value = log_rec.value;                       /* then redo the op. get new bit value      */
      page.bit[i] = value;                         /* apply new bit value to bit               */
      page.lsn = lsn; }                            /* advance the page lsn                     */
  Xsem_give(&page.sem);                            /* free the page semaphore                  */
  return; };                                       /*                                          */
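The interplay of the page LSN, compensation logging, and the REDO test can be exercised in a single-threaded simulation. The sketch below is not the slides' code: it drops locks and semaphores, keeps the log in an array where LSN n lives at index n, and uses the assumed names bit_log_rec, bit_page, and set_bit.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS    8
#define LOG_MAX 64

typedef int LSN;                                  /* 0 plays the role of NULLlsn */
typedef struct { int index; bool value; } bit_log_rec;
typedef struct { LSN lsn; bool bit[BITS]; } bit_page;   /* true => bit is free */

bit_log_rec the_log[LOG_MAX + 1];                 /* LSN n lives in the_log[n] */
LSN log_end = 0;

LSN log_insert(bit_log_rec rec) {
    assert(log_end < LOG_MAX);                    /* simulated log is bounded */
    the_log[++log_end] = rec;
    return log_end;
}

/* Mini-transaction: set one bit, log the new value, stamp the page LSN. */
void set_bit(bit_page *p, int i, bool value) {
    bit_log_rec rec = { i, value };
    p->bit[i] = value;
    p->lsn = log_insert(rec);
}

/* UNDO: restore the complement and log it as a compensation record. */
void undo(bit_page *p, LSN lsn) {
    set_bit(p, the_log[lsn].index, !the_log[lsn].value);
}

/* REDO: apply the record only if the page is older than the record. */
void redo(bit_page *p, LSN lsn) {
    if (p->lsn < lsn) {
        p->bit[the_log[lsn].index] = the_log[lsn].value;
        p->lsn = lsn;
    }
}
```

Replaying a record against a page whose LSN is already at or past it changes nothing, which is what makes a repeated restart safe. Because undo writes a compensation record, the page LSN only ever moves forward.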
1-BIT Rm Noise Callbacks
Boolean prepare(LSN * lsn)                         /* 1-bit RM has no phase 1 work            */
{ *lsn = NULLlsn; return TRUE; };                  /*                                         */
void Commit(void)                                  /* Commit release locks &                  */
{ unlock_class(LOCK_LONG, TRUE, MyRMID()); };      /* return                                  */
void Abort(void)                                   /* Abort release all locks &               */
{ unlock_class(LOCK_LONG, TRUE, MyRMID()); };      /* return                                  */
Boolean savepoint(LSN * lsn)                       /* no work to do at savepoint              */
{ *lsn = NULLlsn; return TRUE; };                  /*                                         */
void UNDO_savepoint(LSN lsn)                       /* rollback work or abort transaction      */
{ if (savepoint == 0)                              /* if at savepoint zero (abort)            */
    unlock_class(LOCK_LONG, TRUE, MyRMID());       /* release all locks                       */
};                                                 /*                                         */
Summary
Model: Complex actions are a page/message action sequence.
LSN: Each page carries an LSN and a semaphore.
ReadFix: Read actions get semaphore in shared mode.
WriteFix: Update actions get semaphore in exclusive mode,
generate one or more log records covering the page,
advance the page LSN to match highest LSN
give semaphore
WAL: log_flush(page.LSN) before overwriting persistent page
F@C: force all log records up to the commit LSN at commit
Compensation Logging: Invalidate undone log record with a
compensating log record.
Idempotence via LSN: page LSN makes REDO idempotent
Two Phase Commit
Getting two or more logs to agree
Getting two or more RMs to agree
Atomically and Durably
Even in case one of them fails and restarts.
The TM phases
Prepare. Invoke each joined RM asking for its vote.
Decide. If all vote yes, durably write commit log record.
Commit. Invoke each joined RM, telling it commit
decision.
Complete. Write commit completion when all RM ACK.
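The four TM phases can be sketched as a loop over the joined RMs. Everything here is a simplification for illustration: joined_rm, rm_phase, and the commit_logged flag (standing in for the durably written commit record) are assumed names, and failures, timeouts, and acknowledgements are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RM_ACTIVE, RM_PREPARED, RM_COMMITTED, RM_ABORTED } rm_phase;

/* Hypothetical joined resource manager: the vote it will cast and its state. */
typedef struct { bool will_vote_yes; rm_phase state; } joined_rm;

bool rm_prepare(joined_rm *rm) {
    if (rm->will_vote_yes) rm->state = RM_PREPARED;
    return rm->will_vote_yes;
}
void rm_commit(joined_rm *rm) { rm->state = RM_COMMITTED; }
void rm_abort(joined_rm *rm)  { rm->state = RM_ABORTED; }

/* Returns true if the transaction committed. */
bool two_phase_commit(joined_rm *rms, size_t n, bool *commit_logged) {
    bool all_yes = true;
    for (size_t i = 0; i < n && all_yes; i++)   /* Prepare: ask each RM for its vote */
        all_yes = rm_prepare(&rms[i]);
    *commit_logged = all_yes;                   /* Decide: commit record only if all yes */
    for (size_t i = 0; i < n; i++) {            /* Commit: tell each RM the decision */
        if (all_yes) rm_commit(&rms[i]);
        else         rm_abort(&rms[i]);
    }
    return all_yes;                             /* Complete: all RMs have been told */
}
```

A single "no" vote stops the prepare round early and every joined RM, including those that already voted yes, is told to abort.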
Centralized Case of Two Phase Commit
Each participant: (TM &RM) goes through a
sequence of states
Null
Active
Prepared
Committing
Committed
Aborting
Aborted
These generate log records
Examples
Committed
begin
DO rm1
DO rm2
DO rm2
prepare rm2 {locks}
commit { rm1, rm2}
complete
Aborted
begin
DO rm1
DO rm2
DO rm2
UNDO rm2
UNDO rm2
UNDO rm1
UNDO begin { rm1, rm2}
complete
Transitions in Case of Restart
Active state not persistent, others are persistent
For both TM and RM.
Log records make them persistent (redo)
TM tries to drive states to the right. (to committed, aborted)
[Figure: state transition diagram over Null, Active, Prepared, Committing, Aborting, Committed, and Aborted, with the final states Committed and Aborted at the right.]
Successful two phase commit
Message/Call flow from TM to each RM joined to transaction
Coordinator (states Active, Prepared, Committing, Committed):
  Active: Local Prepare (lazy); send Prepare to each participant.
  Prepared: on "yes" votes, Write Commit Record In Log (force).
  Committing: send Commit; Local Commit Work (lazy).
  On Ack: Write Completion Record In Log (lazy).
  Committed.
Participant (states Active, Prepared, Committing, Committed):
  Active: on Prepare, Local Prepare; Write Prepare Record In Log (force); reply yes.
  Prepared: on Commit, Local Commit Work; Write Completion Record In Log (lazy).
  Committing: Ack when durable.
  Committed.
If TM and RM share the same log,
the RM FORCE can piggyback on the TM FORCE
One IO to commit a transaction (less if commit is grouped)
Abort Two Phase Commit
If RM sends "NO" or no response (timeout), TM starts abort.
Calls UNDO of each trans log record
May stop at a savepoint.
At begin_trans it calls ABORT() callback of each joined RM
Distributed two phase commit
Tracking joined TMs -- the communications manager helps
Much as TRPC helps in the local case.
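[Figure: a call carrying (trid, data) passes from the caller through its communications manager, across a session, to the callee's communications manager. The first time a trid crosses the session, each communications manager notifies its local transaction manager: the trid is outgoing to B on the calling side and incoming from A on the called side.]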
Root TM owes a Prepare/Commit/Abort message to each joined TM.
Joined TM does "local" commit.
Full Transaction State Diagram
Next section explains how these states are implemented.
[Figure: full transaction state diagram. Live states: null (= save point 0), begun (= save point 1), save point n, active, persistent save point n, prepared, committing, aborting. Complete states: committed, aborted. The states are grouped into volatile, persistent, and durable states.]
Summary of Resource Manager Concepts
DO/UNDO/REDO
Idempotent, Testable, Real operations
Logical vs Physical logging
Shadows to make logical logging work
Physiological logging
Fix, WAL, Force-at-commit
Page/Message/Log consistency
RM callbacks (the 1-bit resource manager)
Join, Prepare, Commit, Abort, UNDO, REDO, ....
Restart REDO/UNDO
Two phase commit (RM story is simple).