Advanced Transaction Management 

[Course schedule, Aug. 2 to Aug. 6, with sessions starting at 9:00, 11:00, 13:30, 15:30, and 18:00. Aug. 2: Intro & terminology; Reliability; Fault tolerance; Transaction models; Reception. Topics on the remaining days include TP monitors & ORBs, logging, locking, resource and transaction managers, files and buffer manager, structured files, access paths, CICS, CORBA, COM+, EJB, queueing, replication, performance & TPC, groupware, workflow, and Cyberbricks.]
Chapter 13

Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication
© Jim Gray, Andreas Reuter
Transaction Processing - Concepts and Techniques
WICS August 2 - 6, 1999

Mixing Transaction Managers

Four standards:
  LU 6.2 ~ APPC ~ CPIC ~ CICS: the de facto TP standard.
  X/Open + OSI/TP: the de jure TP standard.
  OTS: the CORBA standard.
  TIP: the de facto interoperability standard.
Almost everyone interoperates with LU 6.2.
LU 6.2 has evolved: it now has presumed abort, does not reuse aborted trids, and includes other fixes.
LU 6.2 is an "open" two-phase commit: the interface is documented, and reconnection/resolve is documented.
Internally, everyone uses private protocols with many tricks.
Mixing "OLD" Transaction Managers

Many old TP monitors are not open:
  They do not expose 2PC (prepare() and commit()).
  => They insist on being the root commit coordinator.

All will become X/Open-compliant eventually and thus be open TP monitors.

If stuck with a "closed" TM, you can still get atomicity if:
  1. Only one closed TM is involved.
  2. The TM is direct, not queued.
Mixing with a Closed Transaction Manager
All "open" TMs and RMs are prepared; the closed TM then does the "RUMP".

Transaction gateway to the closed transaction manager:
  Do transaction
  While not acknowledged:
    Send trid + data
    Wait

Closed transaction manager (keeps a done table):
  If not duplicate:
    Do transaction
    Insert trid in done table
    Commit
  Acknowledge

deferred_update(int id, complex_type list_of_updates)    /* rump logic                    */
{   int n;
    Begin_Work();                                         /* start a new transaction       */
    select count(*) into :n from done where id = :id;     /* test if work was done         */
    if (n == 0) {                                         /* if not done                   */
        do list_of_updates;                               /* then do the list of updates   */
        insert into done values (:id);                    /* flag transaction done         */
    }
    Commit_Work();                                        /* commit update and flag        */
    acknowledge;                                          /* reply success to caller       */
}                                                         /* in both cases                 */

Status_Transaction(TRID trid)                             /* was this trid already done?   */
{   int ans;
    select count(*) into :ans from done where id = :trid; /* same key column as above      */
    return ans;
}
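
The rump logic above can be tried out with an in-memory SQLite database. This is only a minimal sketch: the `account` table, the `(account, delta)` update format, and the function names are assumptions made for illustration and are not part of the slides. It shows that the update and the done-table insert commit together, so a resent request is acknowledged without being applied twice.

```python
import sqlite3

def deferred_update(conn, trid, updates):
    """Apply `updates` at most once per trid, then acknowledge in both cases."""
    with conn:  # one local transaction: commit on success, rollback on error
        done = conn.execute(
            "SELECT COUNT(*) FROM done WHERE trid = ?", (trid,)
        ).fetchone()[0]
        if done == 0:  # not a duplicate: do the work and flag it as done
            for account, delta in updates:
                conn.execute(
                    "UPDATE account SET balance = balance + ? WHERE id = ?",
                    (delta, account),
                )
            conn.execute("INSERT INTO done (trid) VALUES (?)", (trid,))
    return "ack"

def status_transaction(conn, trid):
    """Return True if the trid is recorded in the done table."""
    row = conn.execute("SELECT COUNT(*) FROM done WHERE trid = ?", (trid,)).fetchone()
    return row[0] == 1

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE done (trid INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, 100)")
    conn.commit()
    deferred_update(conn, 42, [(1, -30)])
    deferred_update(conn, 42, [(1, -30)])  # gateway resends: must be a no-op
    balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
    return balance, status_transaction(conn, 42), status_transaction(conn, 7)
```

Calling `demo()` debits the account once even though the request arrives twice, which is the property the gateway's resend loop relies on.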
Mixing Open Transaction Managers
The gateway translates between external and internal TRIDs.
The gateway translates between external and internal protocols.
The gateway participates in transaction resolution (it is a TM in both worlds).
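
A trid map table can be pictured as a pair of dictionaries. The sketch below is an assumption-laden illustration: the class name, the `local-N` naming scheme, and the sample foreign trids are all hypothetical. It only shows the translation step, not the protocol translation or the commit resolution.

```python
import itertools

class TridGateway:
    """Keeps a two-way map between foreign ("his") and local ("our") trids."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._to_local = {}    # foreign trid -> local trid
        self._to_foreign = {}  # local trid -> foreign trid

    def import_trid(self, foreign_trid):
        """Return the local trid for a foreign one, creating the mapping once."""
        if foreign_trid not in self._to_local:
            local = f"local-{next(self._counter)}"
            self._to_local[foreign_trid] = local
            self._to_foreign[local] = foreign_trid
        return self._to_local[foreign_trid]

    def export_trid(self, local_trid):
        """Translate a local trid back for messages to the foreign TM."""
        return self._to_foreign[local_trid]

gateway = TridGateway()
first = gateway.import_trid("LU62:0001")
again = gateway.import_trid("LU62:0001")  # same foreign trid, same local trid
other = gateway.import_trid("OSI:7")
```

Reusing the existing mapping on a repeated import keeps one foreign transaction from becoming two local ones at this gateway.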

[Figure: "Foreign" transaction managers connect over the OSI protocol stack to the transaction gateway. The gateway keeps a trid map table (his trid / our trid) and talks to "our" transaction manager over the local protocol.]
Mixing Open Transaction Managers


Multiple entry problem:
  The same TRID enters the system twice, via two different paths.
  This "works", but it looks like two separate transactions.
  The commit dependency is external to the system.

Fancy option problem:
  The external or internal TM has an option the other does not.
  The gateway fakes (or turns off) optimizations/options not supported by one side or the other.
Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication
Non-Blocking Commit
The problem: what if the coordinator fails?
Solutions:
1. Wait.
2. Appoint a new coordinator.
Appointment can be thought of as a process pair (n-plex).
This works great in a cluster (no communications failures).
[Figure: commit message flow among the process pair (primary and backup), the participants, and the log. Labels in order: Prepare (+ list of participants and sessions), ack; Prepare, Prepared; Commit, ack; Write Commit Log Record; Commit, Committed; Complete, ack; Write "Complete" Log Record.]
Non-Blocking Commit in a WAN:
3-Phase or Heuristic or Operator Command
A wide area network can partition.
Process pairs cannot reliably decide to take over.
Solution(s):
1. Three-phase protocol
   Broadcast the participant list and the decision as part of phase 1.5; let a majority of participants decide if the coordinator fails.
2. Heuristic decisions
   Default to commit/abort.
   Announce Heuristic Mismatch at reconnect if the guess was wrong.
3. Human decision
   Announce Operator Mismatch at reconnect if the guess was wrong.
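
The heuristic option can be sketched as a participant that guesses while partitioned and compares its guess with the real outcome at reconnect. The class and method names below are hypothetical, and the sketch leaves out logging and the actual release of locks.

```python
from enum import Enum

class Outcome(Enum):
    COMMIT = "commit"
    ABORT = "abort"

class InDoubtParticipant:
    """A prepared participant that resolves in-doubt transactions by default."""

    def __init__(self, default=Outcome.ABORT):
        self.default = default
        self.guesses = {}  # trid -> outcome chosen while the coordinator was unreachable

    def decide_heuristically(self, trid):
        """Resolve the in-doubt transaction using the configured default."""
        self.guesses[trid] = self.default
        return self.default

    def reconnect(self, trid, coordinator_outcome):
        """Compare the guess with the coordinator's decision at reconnect."""
        guess = self.guesses.pop(trid, None)
        if guess is not None and guess != coordinator_outcome:
            return "heuristic mismatch"
        return "ok"

participant = InDoubtParticipant(default=Outcome.ABORT)
participant.decide_heuristically("t1")
wrong_guess = participant.reconnect("t1", Outcome.COMMIT)
participant.decide_heuristically("t2")
right_guess = participant.reconnect("t2", Outcome.ABORT)
```

The operator-command variant has the same shape, with a human supplying the outcome instead of a configured default.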
Transfer of Commit
What if a participant
  is more secure than the coordinator?
  is more reliable than the coordinator?
  is faster than the coordinator?
Then transfer commit authority to that participant.
[Figure: two diagrams, each connecting Gas Pump, LA Bank, SF Bank, and Visa.]
Transfer of Commit
Transfer of commit is also an optimization:
  It saves messages if done as part of commit.
  It is called the nested commit protocol or the last resource manager optimization.
[Figure: message flow with no transfer of commit. Labels: Begin, Dequeue, doit, work request, Commit_Work(), Commit, Phase 2 Commit, complete, Enqueue, Prepare.]
[Figure: message flow with transfer of commit. Labels: Begin, Dequeue, Prepare, doit, work request + "You are Root!", Phase 2 Commit, Enqueue, Commit_Work(), complete.]
2 messages vs 5 messages (plus one lazy msg)
Transfer of Commit: More Complex Case
More complex if the root has more than one branch:
  New sessions must be set up among the "trusted" nodes.
[Figure: two diagrams, each connecting Deutschland, Libya, and US.]
The root sends the new root's name to all participants at phase 1.
Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication
Optimizing Commit
Can optimize:
Delay: milliseconds/commit
Message cost: number, size, urgency of messages
IO cost: number, size, or urgency of IO
CPU cost: cycles used
Throughput: maximum commit rate.
Commit: the General Case
Prepare():
  1 RPC or message pair per RM, and one per non-root TM
  1 forced IO per RM (prepare record)
  1 forced IO per TM (commit record)
Commit():
  The same.
Summary of 2PC cost:
  IO: 2(RM+TM)
  RPCs: 2(RM+(TM-1))
  Messages: 4(RM+(TM-1)) (equivalent to RPCs)
  Delay: 2 IO ~ 50ms ~ 10K instructions
         4 msg ~ 20ms ~ 50K instructions
         total ~ 50ms*(RM+TM) + 20ms*(RM+TM-1)

These are the error-free counts (i.e. the minimum values).
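
The cost summary above is plain arithmetic, so it can be written as a small function. This is a sketch under the slide's own figures: 25 ms per forced IO (from "2 IO ~ 50ms") and 5 ms per message (from "4 msg ~ 20ms"). The function and parameter names are made up for illustration.

```python
def two_phase_commit_cost(rm, tm, io_ms=25.0, msg_ms=5.0):
    """Error-free 2PC cost for `rm` resource managers and `tm` transaction managers."""
    participants = rm + tm - 1           # everyone except the root TM
    ios = 2 * (rm + tm)                  # one forced IO per RM and TM, in each phase
    rpcs = 2 * participants              # one RPC per participant, in each phase
    messages = 2 * rpcs                  # each RPC is a request/reply message pair
    delay_ms = 2 * io_ms * (rm + tm) + 4 * msg_ms * participants
    return {"ios": ios, "rpcs": rpcs, "messages": messages, "delay_ms": delay_ms}

# One TM with two RMs: 6 forced IOs, 4 RPCs, 8 messages, 190 ms.
cost = two_phase_commit_cost(rm=2, tm=1)
```

The delay term treats every IO and message as sequential, which matches the unoptimized formula on the slide rather than any parallel execution.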
Commit: Simple Optimizations
Presumed abort saves a TM IO (implicit in the protocol above).
Do phase 1 and phase 2 in parallel (saves delay).
Common log (saves RM log forces):
  IO: 2(TM)
  Messages: 4(RM+TM-1) (equivalent to RPCs)
  Delay: 2*IO*TM + 4*M*(RM+TM-1)
         ~ 50ms*TM + 40ms*(RM+TM-1)
Use local RPC (10x faster):
         ~ 50ms*TM + 1ms*RM + 40ms*(TM-1)
Use WADS for low IO latency (3ms vs 25ms):
         ~ 6ms*TM + 1ms*RM + 40ms*(TM-1)
Simple case of 1 TM, 2 RM:
         ~ 8ms delay for a commit.
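
The successive optimizations can be checked numerically. The sketch below uses this slide's figures (25 ms per forced IO, 40 ms per four remote messages, 3 ms per IO with WADS) and assumes that the bare "RM" term stands for about 1 ms per local RM, which is the value that reproduces the 8 ms total. The function names are hypothetical.

```python
def delay_common_log(tm, rm, io_ms=25.0, msg_ms=10.0):
    """Common log: 2 forced IOs per TM, 4 messages per non-root participant."""
    return 2 * io_ms * tm + 4 * msg_ms * (rm + tm - 1)

def delay_local_rpc(tm, rm, io_ms=25.0, msg_ms=10.0, local_rm_ms=1.0):
    """Local RPC: each RM costs local_rm_ms; only non-root TMs pay for messages."""
    return 2 * io_ms * tm + local_rm_ms * rm + 4 * msg_ms * (tm - 1)

# Simple case from the slide: 1 TM and 2 RMs.
common_log = delay_common_log(tm=1, rm=2)            # 130.0 ms
local_rpc = delay_local_rpc(tm=1, rm=2)              # 52.0 ms
with_wads = delay_local_rpc(tm=1, rm=2, io_ms=3.0)   # 8.0 ms
```

Most of the gain in the single-node case comes from removing message delay first and IO latency second.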
Group Commit Optimization
Amortizes IO and messages across several transactions.
Adds delay.
If there are N transactions in a group:
  IO and message cost per transaction is ~ 1/N.

  Small extra delay if there is one slow step in the original path.
As the system heats up (the commit rate rises to 25tps), start to use group commit with a 30ms threshold
(at 100tps: 3.3 trans/group).
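
Group commit can be illustrated by batching commit requests that arrive within one threshold window, so that each batch needs a single log force. This is a simplified sketch: it assumes a window opens at the first arrival and closes `threshold_ms` later, and the arrival times are invented. Real group sizes depend on how the window boundary is handled.

```python
def group_commits(arrivals_ms, threshold_ms):
    """Group commit requests; each group shares one forced log write."""
    groups = []
    for t in sorted(arrivals_ms):
        # Join the open group if this request arrives before its window closes.
        if groups and t < groups[-1][0] + threshold_ms:
            groups[-1].append(t)
        else:
            groups.append([t])  # otherwise open a new group (and a new window)
    return groups

arrivals = [0, 5, 12, 40, 41, 90]
groups = group_commits(arrivals, threshold_ms=30)
forces_without_grouping = len(arrivals)  # one log force per transaction
forces_with_grouping = len(groups)       # one log force per group
```

Here six commits need three log forces instead of six, at the price of holding early arrivals until their window closes.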
Simple Commit Optimizations
Read-only: the RM just gets the phase 1 call and releases its locks.
  Note: this may violate ACID; read locks should be released at phase 2 if any locks are acquired during phase 1.
  Saves messages (phase 2) and IO (no RM IO).
  A true read-only transaction must prepare at phase 1 and unlock at phase 2.

Unjoin: the RM does no work at commit/abort.
Lazy: user-requested group commit. Piggybacks on others; no extra IO or messages.
Transaction Commit Trees
[Figure: commit tree shapes built from TM and RM nodes: the one-node case, deep, bush, and general. Annotations on the trees: share log, LRPC, transfer commit, parallel transfer.]
Transfer of COMMIT: Linear COMMIT
Parent and other sub-trees prepare
then transfer commit authority to remaining child.
Last in chain becomes commit coordinator.
More delay, fewer messages
For N=2, Same delay, 3 vs 4 messages.
Always use it.
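
The message counts can be compared by generating the messages each scheme sends. The slide only gives the N=2 figures (3 vs 4 messages); the generalization to longer chains below is an assumption, with one "prepared, you are root" message down each link, one commit message back, and one ack per link. The function names and message labels are illustrative.

```python
def general_commit_trace(n):
    """Ordinary 2PC: root 0 exchanges 4 messages with each of n-1 participants."""
    trace = []
    for label, from_root in (("prepare", True), ("prepared", False),
                             ("commit", True), ("ack", False)):
        for i in range(1, n):
            trace.append((0, i, label) if from_root else (i, 0, label))
    return trace

def linear_commit_trace(n):
    """Linear commit: authority moves down the chain, the decision flows back."""
    trace = []
    for i in range(n - 1):            # each node prepares and passes authority on
        trace.append((i, i + 1, "prepared, you are root"))
    for i in range(n - 1, 0, -1):     # the last node commits and tells its parent
        trace.append((i, i - 1, "commit"))
    for i in range(n - 1):            # acks let the new coordinator forget
        trace.append((i, i + 1, "ack"))
    return trace

two_node = (len(linear_commit_trace(2)), len(general_commit_trace(2)))
five_node = (len(linear_commit_trace(5)), len(general_commit_trace(5)))
```

Under these assumptions the linear scheme sends 3(N-1) messages against 4(N-1), but each message in the chain waits for the previous one, which is the extra delay the slide mentions.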
[Figure: commit trees drawn as TM/RM pairs.]
Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication
Disaster Recovery at a Remote Site
Replicate:
  Data
  Applications
  Network connections at 2 (or more) sites
Symmetric design:
  Either site can process transactions.
Asymmetric design:
  One site is master of each data item.
  Allows:
    Caching
    Batching of updates at backup
So far, the asymmetric design is the most popular.
To get symmetry, have each node master 1/2 of the db/net.
Sample Physical LOG RECORD
Basic idea of asymmetric design:
send log from primary to backup
backup applies log to its copy
backup is in constant media recovery
backup processes/sessions/data ready to take over
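
The log-shipping idea can be sketched with two small classes. Everything here is an illustrative assumption: the `(lsn, key, value)` record format, the class names, and the pull-style `ship` call. A real system would ship the log continuously and handle transactions, not single writes.

```python
class Primary:
    """Applies updates locally and appends a redo record for each one."""

    def __init__(self):
        self.db = {}
        self.log = []  # list of (lsn, key, value) redo records

    def update(self, key, value):
        self.log.append((len(self.log) + 1, key, value))
        self.db[key] = value

    def ship(self, after_lsn):
        """Return the log records the backup has not seen yet."""
        return [record for record in self.log if record[0] > after_lsn]

class Backup:
    """Stays in constant media recovery by redoing the shipped log."""

    def __init__(self):
        self.db = {}
        self.applied_lsn = 0

    def apply(self, records):
        for lsn, key, value in records:
            if lsn > self.applied_lsn:  # skip records that were already redone
                self.db[key] = value
                self.applied_lsn = lsn

primary, backup = Primary(), Backup()
primary.update("a", 1)
primary.update("b", 2)
backup.apply(primary.ship(backup.applied_lsn))
primary.update("a", 3)
backup.apply(primary.ship(0))  # resending old records is harmless
```

Tracking the last applied LSN makes redo idempotent, so the backup can tolerate duplicated log shipments.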
[Figure: system pair configurations.
  Basic idea: a client has a session to the primary, which sends its log to the backup; together they form a system pair.
  Symmetric: two system pairs.
  Hub: a central site backs up several primaries.
  Vault: the backup stores the log and archive dumps.]
Sample Physical LOG RECORD
Need some way to detect failure.
  Easy in a cluster
  Hard in a WAN (partition possible)
  Solutions:
    Extra wires
    Wires on demand (dialup)
    Human (operator)
    Quorum device
Kind of log?
  Logical log is best:
    loose coupling (allows the backup to be a different TM/RM)
    failure independence (different from physiological log)
Takeover Logic
/* Initialization */
    Tell primary I'm here
    Set up all RMs and application processes
    Open all initial sessions to clients
/* The main backup loop */
    While (not primary) { redo log }
/* Takeover */
    Redo rest of log
    Resend most recent message on each session
    Abort any incomplete transactions
/* Become primary */
    Tell application processes to start accepting requests
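
The takeover step can be sketched as one function over the remaining log and the open sessions. The record format `(trid, kind, payload)` and the function name are assumptions for illustration. As a simplification, the sketch applies a transaction's updates only when its commit record is seen, which gives the same final state as redoing everything and then aborting the incomplete transactions.

```python
def takeover(log_tail, sessions):
    """Finish redo, find transactions to abort, and pick messages to resend."""
    db = {}
    pending = {}  # trid -> updates seen so far without a commit record
    for trid, kind, payload in log_tail:
        if kind == "update":
            pending.setdefault(trid, []).append(payload)
        elif kind == "commit":
            for key, value in pending.pop(trid, []):
                db[key] = value
    aborted = sorted(pending)  # incomplete transactions at takeover time
    resend = {name: sent[-1] for name, sent in sessions.items() if sent}
    return db, aborted, resend

log_tail = [
    ("t1", "update", ("a", 1)),
    ("t1", "commit", None),
    ("t2", "update", ("b", 2)),  # no commit record: t2 is incomplete
]
sessions = {"s1": ["m1", "m2"], "s2": []}
db, aborted, resend = takeover(log_tail, sessions)
```

After this step the new primary would tell its application processes to start accepting requests.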
Session Takeover



Just like process pairs
Session sequence numbers eliminate duplicates
So, get at-least-once delivery: resend msg at takeover
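
Duplicate elimination with session sequence numbers can be sketched on the receiving side. The class name and message contents are hypothetical; the sketch assumes sequence numbers start at 1 and increase by one per message on a session.

```python
class SessionReceiver:
    """Delivers each message once, using per-session sequence numbers."""

    def __init__(self):
        self.last_seq = 0
        self.delivered = []

    def receive(self, seq, message):
        """Return True if the message is new, False if it is a duplicate."""
        if seq <= self.last_seq:
            return False  # already seen: a resend after takeover
        self.last_seq = seq
        self.delivered.append(message)
        return True

client = SessionReceiver()
client.receive(1, "reply-1")
client.receive(2, "reply-2")
duplicate_accepted = client.receive(2, "reply-2")  # backup resends at takeover
client.receive(3, "reply-3")
```

Resending gives at-least-once delivery, and the sequence check removes the duplicates, so the client sees each message exactly once.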
[Figure: two ways to switch clients from the primary to the backup. Either the network switches clients (OSI, SNA, TCP/IP, X.25, etc.), or front ends switch clients (OSI, SNA, TCP/IP, X.25, etc.).]
Catch-up After Failure
A failed node executes normal restart when it restarts, then enters the backup logic.
If both fail, an outside observer must say which one is best; the backup has to match its log to the new primary.
Design issue: are the nodes bit-for-bit identical?
  If so, the backup must “trim” its log to match the primary.
How Safe?
1-SAFE:    no extra delay, risks lost transactions
2-SAFE:    extra delay (if backup is up), single fault tolerant, high availability
VERY-SAFE: extra delay, no lost transactions, low availability
[Figure: commit message flow between client, primary, and backup for 1-Safe, 2-Safe, and Very Safe, in two cases: both up, and primary up with backup down. With the backup down, Very Safe is out of service.]
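
The three safety levels differ in whether the primary accepts a commit and whether it waits for the backup first. The function below is a minimal sketch of that decision; the mode strings and the returned dictionary are invented for illustration.

```python
def commit_policy(mode, backup_up):
    """Decide how the primary handles a commit request under each safety level."""
    if mode == "1-safe":
        # Commit locally and reply at once; the backup catches up later.
        return {"accept": True, "wait_for_backup": False}
    if mode == "2-safe":
        # Wait for the backup only when it is reachable.
        return {"accept": True, "wait_for_backup": backup_up}
    if mode == "very-safe":
        # Refuse to commit unless the backup can also commit.
        return {"accept": backup_up, "wait_for_backup": backup_up}
    raise ValueError(f"unknown mode: {mode}")

both_up = {mode: commit_policy(mode, backup_up=True)
           for mode in ("1-safe", "2-safe", "very-safe")}
backup_down = {mode: commit_policy(mode, backup_up=False)
               for mode in ("1-safe", "2-safe", "very-safe")}
```

With the backup down, only the very-safe mode goes out of service, which is why it has low availability.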
System Pairs vs Replicated Data
System pairs replicate the application:
  DB
  application processes
  sessions
Data replicators only replicate data.
Other aspects are left as an exercise for the application designer.
System Pair Benefits
Tolerates faults
Hardware
Environment
Operations
Heisenbugs
Can replace software/hardware online
Can move backup to new building or...
Allows design diversity: backup can be completely different
Step 1: Both systems are running version V1. (Primary V1, Backup V1)
Step 2: Backup is cold-loaded as version V2. (Primary V1, Backup V2)
Step 3: SWITCH to backup. (Backup V1, Primary V2)
Step 4: Backup is cold-loaded as version V2. (Backup V2, Primary V2)
Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application
Replication