Advanced Transaction Management

         9:00                  11:00                     13:30                 15:30               18:00
Aug. 2   Intro & terminology   Reliability               Fault tolerance       Transaction models  Reception
Aug. 3   TP mons & ORBs        Logging & res. Mgr.       Files & Buffer Mgr.   Structured files    Workflow
Aug. 4   Locking theory        Res. Mgr. & Trans. Mgr.   COM+                  Access paths        Cyberbricks
Aug. 5   Locking techniques    CICS & TP & Internet      CORBA/EJB + TP        Groupware           Party
Aug. 6   Queueing              Advanced Trans. Mgr.      Replication           Performance & TPC   FREE

Chapter 13

Outline
  Mixing heterogeneous TMs
  High-Availability Commit & Transfer of Commit
  Optimizing Commit
  Disaster Protection via Data/Application Replication

© Jim Gray, Andreas Reuter   Transaction Processing - Concepts and Techniques   WICS August 2 - 6, 1999

Mixing Transaction Managers
  Four standards:
    LU 6.2 ~ APPC ~ CPIC ~ CICS: de facto TP standard
    X/Open + OSI/TP: the de jure TP standard
    OTS: the CORBA standard
    TIP: de facto interoperability standard
  Almost everyone interoperates with LU6.2.
  LU6.2 has evolved to have presumed abort, not reuse aborted trids, ... other fixes.
  LU6.2 is "open": two-phase commit, documented interface, reconnection / resolve is documented.
  Internally, everyone uses private protocols with many tricks.

Mixing "OLD" Transaction Managers
  Many old TP monitors are not open:
    Do not expose 2PC (prepare() and commit())
    => insist on being root commit coordinator.
  All will become X/Open-compliant eventually and thus be open TP monitors.
  If stuck with a "closed" TM, can still get atomicity if:
    1. Only one closed TM is involved.
    2.
       The TM is direct, not queued.

Mixing with a Closed Transaction Manager
  All "open" TMs and RMs prepared; closed TM does "RUMP".
  Transaction Gateway to Closed Transaction Mgr:
    Do Transaction
    While not acknowledged
      Send trid + data
      Wait
  Closed Transaction Mgr (with Done Table):
    If not duplicate
      Do transaction
      Insert trid in done table
      Commit
    Acknowledge

  deferred_update(int id, complex_type list_of_updates)    /* rump logic */
  { int n;
    Begin_Work();                                          /* start a new transaction */
    select count(*) into :n from done where id = :id;      /* test if work was done */
    if (n == 0) {                                          /* if not done */
      do list_of_updates;                                  /* then do the list of updates */
      insert into done values (:id);                       /* flag transaction done */
    }
    Commit_Work();                                         /* commit update and flag */
    acknowledge;                                           /* reply success to caller */
  }                                                        /* in both cases. */

  Status_Transaction(TRID trid)
  { select count(*) into :ans from done where id = :trid;  /* 1 if done, 0 if not */
    return ans; }

Mixing Open Transaction Managers
  Gateway translates between external and internal TRIDs.
  Gateway translates between external and internal protocols.
  Gateway participates in transaction resolution (is a TM in both worlds).
  [Figure: Transaction Gateway between the "Foreign" Transaction Managers (OSI Protocol Stack) and "Our" Transaction Manager (Local Protocol), with a Trid Map Table pairing his trid with our trid]

Mixing Open Transaction Managers
  Multiple entry problem: TRID enters the system twice via two different paths.
    "Works" but looks like two separate transactions.
    Commit dependency is external to the system.
  Fancy option problem: external/internal TM has an option the other does not.
    Fake (or turn off) optimizations/options not supported by one side or the other.

Non-Blocking Commit
  The problem: what if the coordinator fails?
  Solutions:
    1. Wait.
    2. Appoint a new coordinator.
  Appointment can be thought of as a process pair (n-plex).
  Works great in a cluster (no communications failures).
  [Figure: message flow among the Process Pair (Primary, Backup), the Participants, and the Log. Labels: Prepare (+ list of participants and sessions), ack; Prepare, Prepared; Commit, ack; Write Commit Log Record; Commit, Committed; Complete, ack; Write "Complete" Log Record]

Non-Blocking Commit in a WAN: 3-Phase or Heuristic or Operator Command
  A wide area net can partition.
  Process pairs cannot reliably decide to take over.
  Solution(s):
    1. Three phase protocol
       Broadcast participant list and decision as part of phase 1.5;
       let (a majority of) participants decide if the coordinator fails.
    2. Heuristic decisions
       Default to commit/abort.
       Announce Heuristic Mismatch at reconnect if wrong guess.
    3. Human decision
       Announce Operator Mismatch at reconnect if wrong guess.

Transfer of Commit
  What if a participant
    is more secure than the coordinator?
    is more reliable than the coordinator?
    is faster than the coordinator?
  Transfer commit authority to him?
  [Figure: two panels, each showing Gas Pump, LA Bank, SF Bank, and Visa]

Transfer of Commit
  Is also an optimization: saves messages if done as part of commit.
  Called the nested commit protocol or last resource manager optimization.
  [Figure: two message flows.
   Labels in the "No Transfer of Commit" panel: Begin, Dequeue, doit work request, Commit_Work(), Prepare, Commit, Phase 2 Commit, complete, Enqueue.
   Labels in the "Transfer of Commit" panel: Begin, Dequeue, Prepare, doit work request + You are Root!, Commit_Work(), Phase 2 Commit, complete, Enqueue.]
  2 messages vs 5 messages (plus one lazy msg)

Transfer of Commit: More Complex Case
  More complex if the root has more than one branch:
    Need to set up new sessions among "trusted" nodes.
    Root sends new root name to all participants at phase 1.
  [Figure: two panels, each showing the nodes Deutschland, Libya, and US]

Optimizing Commit
  Can optimize:
    Delay: milliseconds/commit
    Message cost: number, size, urgency of messages
    IO cost: number, size, or urgency of IO
    CPU cost: cycles used
    Throughput: maximum commit rate

Commit: the General Case
  Prepare():
    1 RPC or message pair per RM and one per non-root TM
    1 forced IO per RM (prepare record)
    1 forced IO per TM (commit record)
  Commit(): the same.
  Summary of 2PC cost:
    IO:       2(RM+TM)
    RPCs:     2(RM+(TM-1))
    Messages: 4(RM+(TM-1)) (equivalent to RPCs)
    Delay:    2 IO ~ 50ms ~ 10Kins
              4 msg ~ 20ms ~ 50Kins
              50ms*(RM+TM) + 20ms*(RM+TM-1)
  These are the error-free counts (i.e.
  the minimum values)

Commit: Simple Optimizations
  Presumed abort saves a TM IO (implicit in protocol above).
  Do phase 1, phase 2 in parallel (saves delay).
  Common log (saves RM log forces):
    IO:       2(TM)
    Messages: 4(RM+TM-1) (equivalent to RPCs)
    Delay:    2*IO*TM + 4*M*(RM+TM-1) ~ 50ms*TM + 40ms*(RM+TM-1)
  Use Local RPC (10x faster):
    ~ 50ms*TM + 1ms*RM + 40ms*(TM-1)
  Use WADS for low IO latency (3ms vs 25ms):
    ~ 6ms*TM + 1ms*RM + 40ms*(TM-1)
  Simple case of 1 TM, 2 RM: 6ms + 2ms + 0 ~ 8ms delay for a commit.

Group Commit Optimization
  Amortizes IO and messages across several transactions.
  Adds delay.
  If N transactions in a group:
    IO and message cost per transaction is ~ 1/N.
    Small extra delay if one slow step in original path.
  As the system heats up (commit rate rises) to 25tps, start to install group commit
  with a 30ms threshold (at 100tps: 3.3 trans/group).

Simple Commit Optimizations
  Read-only: just get phase 1 call to release locks.
    Note: may violate ACID; should release read locks at phase 2
    if any locks acquired during phase 1.
    Saves messages (Phase 2) and IO (no RM IO).
    True read-only transaction must prepare at phase 1, unlock at phase 2.
  Unjoin: RM does no work at commit/abort.
  Lazy: user-requested group commit.
    Piggybacks on others; no extra IO or messages.

Transaction Commit Trees
  [Figure: commit trees built from TM and RM nodes. Panels: one node case, deep, bush, general. Labels: share log, LRPC, transfer commit]
  [Figure label: Parallel transfer]

Transfer of COMMIT: Linear COMMIT
  Parent and other sub-trees prepare, then transfer commit authority to the remaining child.
  Last in chain becomes commit coordinator.
  More delay, fewer messages.
  For N=2: same delay, 3 vs 4 messages. Always use it.
  [Figure: chains of TM/RM nodes illustrating linear commit]

Disaster Recovery at a Remote Site
  Replicate
    Data
    Applications
    Network connection
  at 2 (or more) sites.
  Symmetric design: either site can process transactions.
  Asymmetric design: one site is master of each data item.
    Allows:
      Caching
      Batching of updates at backup
  So far, asymmetric design is most popular.
  To get symmetry, have each node master 1/2 of the db/net.
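
The last point (each node masters half of the data, and is backup for the other half) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Site, master_of, update, and ship_logs are hypothetical, the byte-sum partitioning rule is an arbitrary stand-in for whatever rule assigns items to masters, and the "log" is just an in-memory list of (key, value) records.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    data: dict = field(default_factory=dict)    # this site's copy of the database
    outbox: list = field(default_factory=list)  # log records not yet shipped to the peer

    def apply(self, key, value):
        self.data[key] = value


def master_of(key, sites):
    """Hypothetical partitioning rule: the byte sum of the key picks the master site."""
    return sites[sum(key.encode()) % len(sites)]


def update(key, value, sites):
    """Asymmetric per item: only the item's master applies the update directly."""
    master = master_of(key, sites)
    master.apply(key, value)
    master.outbox.append((key, value))  # log record to be shipped later
    return master


def ship_logs(sites):
    """Batch-apply each site's queued log records at every other site."""
    for src in sites:
        for dst in sites:
            if dst is not src:
                for key, value in src.outbox:
                    dst.apply(key, value)
        src.outbox.clear()
```

With two sites, each one is primary for roughly half of the keys and backup for the rest, so both sites process transactions while every individual item still has a single master. Batching shows up as the gap between update() and ship_logs().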

Sample Physical LOG RECORD
  Basic idea of asymmetric design:
    send log from primary to backup
    backup applies log to its copy
    backup is in constant media recovery
    backup processes/sessions/data ready to take over
  [Figure: four system-pair configurations.
   Basic Idea: System Pair (clients, session, primary, log, backup).
   Symmetric: Two System Pairs.
   Hub: Central Site Backs up Several Primaries.
   Vault: Backup stores Log and Archive Dumps.]

Sample Physical LOG RECORD
  Need some way to decide failure.
    Easy in a cluster
    Hard in a WAN (partition possible)
  Solutions:
    Extra wires
    Wires on demand (dialup)
    Human (operator)
    Quorum device
  Kind of log?
    Logical log is best:
      loose coupling (allows backup to be a different TM/RM)
      failure independence (different from physiological log)

Takeover Logic
  /* Initialization:
       tell primary I'm here
       set up all RMs and application processes
       open all initial sessions to clients. */
  /* The main backup loop */
  While (not primary) {redo log}
  /* Takeover:
       redo rest of log
       resend most recent message on each session
       abort any incomplete transactions */
  /* Become Primary:
       tell application processes to start accepting requests.
  */

Session Takeover
  Just like process pairs.
  Session sequence numbers eliminate duplicates.
  So, get at-least-once delivery: resend msg at takeover.
  [Figure: two ways to reroute clients to the backup. Network Switches: clients reach Primary or Backup through the network (OSI, SNA, TCP/IP, X.25, etc.). Front Ends: a front-end switch sits between the clients and Primary/Backup.]

Catch-up After Failure
  Failed node at restart executes normal restart,
  then enters backup logic.
  If both fail, an outside observer must say who is best;
  the backup has to match its log to the new primary.
  Design issue: are nodes bit-for-bit identical?
  If so, backup must "trim" log to match primary.

How Safe?
  1-SAFE:     no extra delay, risks lost transactions
  2-SAFE:     extra delay (if backup up), single fault tolerant, high availability
  VERY-SAFE:  extra delay, no lost transactions, low availability
  [Figure: commit message flows among client, primary, and backup for 1-Safe, 2-Safe, and Very Safe in two cases: Both Up; Primary Up, Backup Down. With the backup down, Very Safe is out of service.]

System Pairs vs Replicated Data
  System pairs replicate the application:
    DB
    application processes
    sessions
  Data replicators only replicate data.
  Other aspects are left as an exercise for the application designer.

System Pair Benefits
  Tolerates faults:
    Hardware
    Environment
    Operations
    Heisenbugs
  Can replace software/hardware online.
  Can move backup to new building or...
  Allows design diversity: backup can be completely different.
  Step 1: Both systems are running version V1.
    Primary V1, Backup V1
  Step 2: Backup is cold-loaded as version V2.
    Primary V1, Backup V2
  Step 3: SWITCH to Backup.
    Backup V1, Primary V2
  Step 4: Backup is cold-loaded as version V2.
    Backup V2, Primary V2
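
The four online-upgrade steps for a system pair can be traced with a small Python sketch. It is a toy model under stated assumptions, not an actual takeover implementation: Node, cold_load, switch, and rolling_upgrade are hypothetical names, and a "cold load" is reduced to changing a version string on whichever node is currently the backup.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    role: str  # "primary" or "backup"


def cold_load(node, version):
    """Reload a node with new software; only the backup may be taken down."""
    if node.role != "backup":
        raise ValueError("only the backup can be cold-loaded")
    node.version = version


def switch(a, b):
    """Takeover: the two nodes exchange roles."""
    a.role, b.role = b.role, a.role


def rolling_upgrade(a, b, new_version):
    """Run steps 1-4 and return a snapshot of the pair after each step."""
    history = []

    def snapshot():
        history.append({n.name: (n.role, n.version) for n in (a, b)})

    def current_backup():
        return a if a.role == "backup" else b

    snapshot()                                 # Step 1: both run the old version
    cold_load(current_backup(), new_version)   # Step 2: upgrade the backup
    snapshot()
    switch(a, b)                               # Step 3: SWITCH to backup
    snapshot()
    cold_load(current_backup(), new_version)   # Step 4: upgrade the old primary
    snapshot()
    return history
```

Because cold_load refuses to touch the primary, the sketch makes the key property visible: at every step some node is still acting as primary, so service is never interrupted by the upgrade itself.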