UCBerkeley_Gray_FT_Availability_talk

FT 101
Jim Gray, Microsoft Research
http://research.microsoft.com/~gray/Talks/
Advertisement: 80% of the slides are not shown (they are hidden), so view the talk with PPT to see them all.
Outline
• Terminology and empirical measures
• General methods to mask faults.
• Software-fault tolerance
• Summary
1
Dependability: The 3 ITIES
[Diagram: overlapping circles labeled Reliability, Integrity, Security, Availability]
• Reliability / Integrity: does the right thing.
  (Also large MTTF)
• Availability: does it now.
  (Also small MTTR; Availability = MTTF / (MTTF + MTTR))
• System Availability:
  if 90% of terminals are up & 99% of the DB is up,
  then ~89% of transactions are serviced on time (see the sketch below).
• Holistic vs. Reductionist view
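A minimal numeric check of the slide's example (illustrative helper name; the availability formula is the one stated above, and independent parts multiply):

```python
# Sketch: Availability = MTTF / (MTTF + MTTR); independent components multiply.
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

print(availability(2000.0, 1.0))   # e.g. the power-failure line later: ~0.9995
print(0.90 * 0.99)                 # 90% terminals x 99% DB -> 0.891 (~89% on time)
```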
2
High Availability System Classes
Goal: Build Class 6 Systems

System Type               Unavailable    Availability    Availability
                          (min/year)                     Class
Unmanaged                 50,000         90.%            1
Managed                    5,000         99.%            2
Well Managed                 500         99.9%           3
Fault Tolerant                50         99.99%          4
High-Availability              5         99.999%         5
Very-High-Availability          .5       99.9999%        6
Ultra-Availability              .05      99.99999%       7

UnAvailability = MTTR/MTBF
can cut it in half by halving MTTR or doubling MTBF
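A small sketch (assumed helper names) mapping an availability figure to the table's unavailable minutes per year and its availability class (the number of leading nines):

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60            # 525,600

def unavailable_minutes_per_year(avail):
    return (1.0 - avail) * MINUTES_PER_YEAR

def availability_class(avail):
    return round(-math.log10(1.0 - avail))  # count of leading nines

for a in (0.90, 0.99, 0.999, 0.9999, 0.99999):
    print(a, round(unavailable_minutes_per_year(a)), availability_class(a))
# 0.99 -> ~5,256 min/year (the table rounds to 5,000), class 2
```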
3
Demo: looking at some nodes
• Look at http://uptime.netcraft.com/
• Internet Node availability:
92% mean,
97% median
Darrell Long (UCSC)
ftp://ftp.cse.ucsc.edu/pub/tr/
– ucsc-crl-90-46.ps.Z "A Study of the Reliability of Internet Sites"
– ucsc-crl-91-06.ps.Z "Estimating the Reliability of Hosts Using the
Internet"
– ucsc-crl-93-40.ps.Z "A Study of the Reliability of Hosts on the Internet"
– ucsc-crl-95-16.ps.Z "A Longitudinal Survey of Internet Host Reliability"
4
Sources of Failures          MTTF            MTTR
Power Failure:               2,000 hr        1 hr
Phone Lines
  Soft                       >0.1 hr         0.1 hr
  Hard                       4,000 hr        10 hr
Hardware Modules:            100,000 hr      10 hr (many are transient)
Software:                    1 Bug / 1,000 Lines Of Code (after vendor-user testing)
                             => Thousands of bugs in System!
Most software failures are transient: dump & restart the system.
Useful fact: 8,760 hrs/year ~ 10k hr/year
5
Case Study - Japan
"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe).
[Pie chart: causes of outages]
  Vendor (hardware and software)    42%
  Application software              25%
  Communications (telecom) lines    12%
  Environment                       11.2%
  Operations                         9.3%

MTTF by cause:
  Vendor (hardware and software)    5 Months
  Application software              9 Months
  Communications lines              1.5 Years
  Operations                        2 Years
  Environment                       2 Years
  Overall                           10 Weeks
1,383 institutions reported (6/84 - 7/85)
7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES
To Get 10 Year MTTF, Must Attack All These Areas
6
Case Studies - Tandem Trends
Reported MTTF by Component

[Chart: Mean Time to System Failure (years) by Cause, 1985-1989;
 lines for maintenance, hardware, environment, operations, software, and total]

                  1985     1987     1990
SOFTWARE             2       53       33    Years
HARDWARE            29       91      310    Years
MAINTENANCE         45      162      409    Years
OPERATIONS          99      171      136    Years
ENVIRONMENT        142      214      346    Years
SYSTEM               8       20       21    Years
Problem: Systematic Under-reporting
7
Many Software Faults are Soft

After: Design Review, Code Inspection, Alpha Test, Beta Test,
       10k Hrs Of Gamma Test (Production)
Most Software Faults Are Transient

  MVS Functional Recovery Routines    5:1
  Tandem Spooler                      100:1
  Adams                               >100:1

Terminology:
  Heisenbug: Works On Retry
  Bohrbug: Faults Again On Retry

Adams: "Optimizing Preventative Service of Software Products", IBM J R&D, 28.1, 1984.
Gray: "Why Do Computers Stop", Tandem TR85.7, 1985.
Mourad: "The Reliability of the IBM/XA Operating System", 15 ISFTCS, 1985.
8
Summary of FT Studies
• Current Situation: ~4-year MTTF =>
Fault Tolerance Works.
• Hardware is GREAT (maintenance and MTTF).
• Software masks most hardware faults.
• Many hidden software outages in operations:
– New Software.
– Utilities.
• Must make all software ONLINE.
• Software seems to define a 30-year MTTF ceiling.
• Reasonable Goal: 100-year MTTF.
class 4 today => class 6 tomorrow.
9
Fault Tolerance vs Disaster Tolerance
• Fault-Tolerance: mask local faults
– RAID disks
– Uninterruptible Power Supplies
– Cluster Failover
• Disaster Tolerance: masks site failures
– Protects against fire, flood, sabotage,..
– Redundant system and service at remote site.
– Use design diversity
10
Outline
• Terminology and empirical measures
• General methods to mask faults.
• Software-fault tolerance
• Summary
11
Fault Model
• Failures are independent
So, single fault tolerance is a big win
• Hardware fails fast (blue-screen)
• Software fails-fast (or goes to sleep)
• Software often repaired by reboot:
– Heisenbugs
• Operations tasks: major source of outage
– Utility operations
– Software upgrades
12
Fault Tolerance Techniques
• Fail fast modules: work or stop
• Spare modules: instant repair time.
• Independent module failures (by design):
  MTTF_pair ~ MTTF^2 / MTTR (so want tiny MTTR; see the worked example after this list)
• Message based OS: Fault Isolation
software has no shared memory.
• Session-oriented comm: Reliable messages
detect lost/duplicate messages
coordinate messages with commit
• Process pairs: Mask Hardware & Software Faults
• Transactions: give A.C.I.D. (simple fault model)
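A worked instance (illustrative numbers) of the pair formula above; the point is that shrinking MTTR pays off quadratically:

```python
# Sketch: MTTF_pair ~ MTTF^2 / MTTR.
HOURS_PER_YEAR = 8760.0
mttf = 1.0 * HOURS_PER_YEAR            # a one-year-MTTF module
for mttr in (12.0, 1.0):               # 12-hour vs 1-hour repair
    mttf_pair = mttf ** 2 / mttr
    print(mttr, round(mttf_pair / HOURS_PER_YEAR), "years")
# 12-hour MTTR -> ~730 years; 1-hour MTTR -> ~8,760 years
```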
13
Example: the FT Bank
[Diagram: Fault Tolerant Computer with a Backup System]
System MTTF > 10 YEARS (except for power & terminals)
Modularity & Repair are KEY:
  von Neumann needed 20,000x redundancy in wires and switches
  We use 2x redundancy.
Redundant hardware can support peak loads (so it is not really "redundant").
14
Fail-Fast is Good, Repair is Needed
[Diagram: lifecycle of a module]
Fail-fast gives short fault latency.
High Availability is low UN-Availability:
  Unavailability ~ MTTR / MTTF
Improving either MTTR or MTTF gives benefit.
Simple redundancy does not help much.
15
Hardware Reliability/Availability
(how to make it fail fast)

Basic FailFast Designs:  Pair,  Triplex
Recursive Availability Designs:  Triple Modular Redundancy,  Pair & Spare

Comparator Strategies:
  Duplex:
    Fail-Fast: fail if either fails (e.g. duplexed cpus)
    vs
    Fail-Soft: fail if both fail (e.g. disc, atm, ...)
    Note: in recursive pairs, the parent knows which is bad.
  Triplex:
    Fail-Fast: fail if 2 fail (triplexed cpus)
    Fail-Soft: fail if 3 fail (triplexed FailFast cpus)
16
Redundant Designs have Worse MTTF!
[Markov state diagrams omitted; the resulting MTTFs with no repair are:]

  Duplex, fail-fast:          MTTF/2
  Duplex, fail-soft:          3/2 x MTTF
  TMR, fail-fast:             5/6 x MTTF
  TMR, fail-soft:             11/6 x MTTF
  Pair & Spare, fail-fast:    3/4 x MTTF
  Pair & Spare, fail-soft:    ~2.1 x MTTF

The Airplane Rule:
  A two-engine airplane has twice as many engine problems as a one-engine plane.

THIS IS NOT GOOD: variance is lower but MTTF is worse.
Simple redundancy does not improve MTTF (sometimes it hurts).
This is just an example of the airplane rule.
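A sketch (assumed model: independent modules with exponential lifetimes, MTTF = 1 unit, no repair) that reproduces the figures above:

```python
from fractions import Fraction

def kth_failure(n, k):
    """Expected time until the k-th of n modules has failed, in units of one module's MTTF."""
    return sum(Fraction(1, n - i) for i in range(k))

print("duplex fail-fast    :", kth_failure(2, 1))   # 1/2
print("duplex fail-soft    :", kth_failure(2, 2))   # 3/2
print("TMR fail-fast       :", kth_failure(3, 2))   # 5/6  (majority lost at 2nd failure)
print("TMR fail-soft       :", kth_failure(3, 3))   # 11/6
print("pair&spare fail-soft:", kth_failure(4, 4))   # 25/12 ~ 2.1
# pair & spare, fail-fast: a fail-soft pair of fail-fast pairs,
# each pair exponential with mean MTTF/2 -> (3/2) * (1/2) = 3/4
print("pair&spare fail-fast:", Fraction(3, 2) * kth_failure(2, 1))
```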
17
Add Repair: Get 10^4 Improvement

[Markov state diagrams with repair omitted; adding a repair transition (rate 1/MTTR)
 raises Duplex fail-soft to ~10^4 x MTTF and Triplex fail-soft to ~10^5 x MTTF]

Availability estimates (1-year MTTF modules, 12-hour MTTR):

  Configuration                MTTF            Equation                Cost
  SIMPLEX                      1 year          MTTF                    1
  DUPLEX: FAIL-FAST            ~0.5 years      MTTF/2                  2+
  DUPLEX: FAIL-SOFT            ~1.5 years      MTTF(3/2)               2+
  TRIPLEX: FAIL-FAST           ~0.8 years      MTTF(5/6)               3+
  TRIPLEX: FAIL-SOFT           ~1.8 years      1.8 MTTF                3+
  PAIR AND SPARE: FAIL-FAST    ~0.7 years      MTTF(3/4)               4+
  TRIPLEX WITH REPAIR          >10^5 years     MTTF^3 / (3 MTTR^2)     3+
  DUPLEX FAIL-SOFT + REPAIR    >10^4 years     MTTF^2 / (2 MTTR)       2
18
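A sketch plugging the table's repair formulas into illustrative parameters. The 12-hour MTTR is the table's stated assumption; the faster-MTTR case is my own addition to show how quickly the figures reach the >10^4 / >10^5-year range once repair is automatic rather than manual:

```python
HOURS_PER_YEAR = 8760.0
mttf = 1.0 * HOURS_PER_YEAR                       # 1-year MTTF modules

for mttr in (12.0, 0.1):                          # manual repair vs ~6-minute auto-restart
    duplex_repair  = mttf ** 2 / (2 * mttr)       # MTTF^2 / (2 MTTR)
    triplex_repair = mttf ** 3 / (3 * mttr ** 2)  # MTTF^3 / (3 MTTR^2)
    print(mttr, round(duplex_repair / HOURS_PER_YEAR),
          round(triplex_repair / HOURS_PER_YEAR))
# 12-hour MTTR : duplex ~365 years,    triplex ~180,000 years
# 0.1-hour MTTR: duplex ~43,800 years, triplex ~2.6e9 years
```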
When To Repair?
Chances Of Tolerating A Fault are 1000:1 (class 3)
A 1995 study: Processor & Disc Rated At ~ 10khr MTTF
  Computed Single Failures    Observed Double Fails    Ratio
  10k Processor Fails         14 Double                ~ 1000 : 1
  40k Disc Fails              26 Double                ~ 1000 : 1

Hardware Maintenance:
  On-Line Maintenance "Works" 999 Times Out Of 1000.
  The chance a duplexed disc will fail during maintenance? ~1:1000.
Risk Is 30x Higher During Maintenance
=> Do It Off Peak Hour
Software Maintenance:
Repair Only Virulent Bugs
Wait For Next Release To Fix Benign Bugs
19
OK: So Far
Hardware fail-fast is easy
Redundancy plus Repair is great (Class 7 availability)
Hardware redundancy & repair is via modules.
How can we get instant software repair?
We Know How To Get Reliable Storage
RAID Or Dumps And Transaction Logs.
We Know How To Get Available Storage
Fail Soft Duplexed Discs (RAID 1...N).
? How do we get reliable execution?
? How do we get available execution?
20
Outline
• Terminology and empirical measures
• General methods to mask faults.
• Software-fault tolerance
• Summary
21
Key Idea

[Diagram: Software Masks — Hardware Faults, Environmental Faults, Maintenance —
 through Architecture and Distribution]

• Software automates / eliminates operators
So,
• In the limit there are only software & design faults.
Software-fault tolerance is the key to dependability.  INVENT IT!
22
Software Techniques:
Learning from Hardware
Recall that most outages are not hardware.
Most outages in Fault Tolerant Systems are SOFTWARE
Fault Avoidance Techniques: Good & Correct design.
After that: Software Fault Tolerance Techniques:
Modularity (isolation, fault containment)
Design diversity
N-Version Programming: N-different implementations
Defensive Programming: Check parameters and data
Auditors: Check data structures in background
Transactions: to clean up state after a failure
Paradox: Need Fail-Fast Software
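A toy sketch (illustrative names, not from the talk) of two items above, defensive programming plus an auditor-style invariant check; the point of the "paradox" is to fail fast rather than limp along with corrupted state:

```python
def transfer(accounts, src, dst, amount):
    """Defensive: validate parameters, audit an invariant, fail fast on trouble."""
    total_before = sum(accounts.values())
    assert amount > 0, "fail fast: bad amount"
    assert src in accounts and dst in accounts, "fail fast: unknown account"
    accounts[src] -= amount
    accounts[dst] += amount
    # auditor: money must be conserved; if not, stop now so a pair/transaction can repair
    assert sum(accounts.values()) == total_before, "fail fast: corrupted state"

accounts = {"A": 100, "B": 50}
transfer(accounts, "A", "B", 30)     # ok
# transfer(accounts, "A", "C", 10)   # would stop immediately instead of corrupting data
```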
23
Fail-Fast and High-Availability Execution

Software N-Plexing: Design Diversity
  N-Version Programming
    Write the same program N times (N > 3)
    Compare the outputs of all programs and take a majority vote

Process Pairs: Instant restart (repair)
  Use defensive programming to make a process fail-fast
  Have a restarted process ready in a separate environment
  The second process "takes over" if the primary faults
  The transaction mechanism can clean up distributed state
    if takeover happens in the middle of a computation.

[Diagram: LOGICAL PROCESS = PROCESS PAIR — a SESSION attaches to the PRIMARY
 PROCESS, which ships STATE INFORMATION to the BACKUP PROCESS]
24
What Is MTTF of N-Version Program?
First fails after MTTF/N
Second fails after MTTF/(N-1),...
so MTTF(1/N + 1/(N-1) + ... + 1/2)
harmonic series goes to infinity, but VERY slowly
for example 100-version programming gives
~4 MTTF of 1-version programming
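A quick numeric check of the harmonic-sum claim above (a sketch; time is in units of one version's MTTF):

```python
def n_version_mttf(n):
    # time until all but one of n versions have failed: 1/n + 1/(n-1) + ... + 1/2
    return sum(1.0 / k for k in range(2, n + 1))

print(n_version_mttf(3))      # ~0.83
print(n_version_mttf(100))    # ~4.19 -> "100-version programming gives ~4x"
```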
Reduces variance
N-Version Programming Needs REPAIR
If a program fails, must reset its state from other
programs.
=> programs have common data/state representation.
How does this work for
  Database Systems?
  Operating Systems?
  Network Systems?
Answer: I don't know.
25
Why Process Pairs Mask Faults:
Many Software Faults are Soft

After: Design Review, Code Inspection, Alpha Test, Beta Test,
       10k Hrs Of Gamma Test (Production)
Most Software Faults Are Transient

  MVS Functional Recovery Routines    5:1
  Tandem Spooler                      100:1
  Adams                               >100:1

Terminology:
  Heisenbug: Works On Retry
  Bohrbug: Faults Again On Retry

Adams: "Optimizing Preventative Service of Software Products", IBM J R&D, 28.1, 1984.
Gray: "Why Do Computers Stop", Tandem TR85.7, 1985.
Mourad: "The Reliability of the IBM/XA Operating System", 15 ISFTCS, 1985.
26
Heisenbugs:
A Probabilistic Approach to Availability

There is considerable evidence that (1) production systems have about one bug
per thousand lines of code, (2) these bugs manifest themselves stochastically:
failures are due to a confluence of rare events, and (3) system
mean-time-to-failure has a lower bound of a decade or so. To make highly
available systems, architects must tolerate these failures by providing instant
repair (un-availability is approximated by repair_time / time_to_fail, so
cutting the repair time in half makes things twice as good). Ultimately, one
builds a set of standby servers which have both design diversity and geographic
diversity. This minimizes common-mode failures.
27
Process Pair Repair Strategy

If the software fault (bug) is a Bohrbug, then there is no repair:
  "wait for the next release" or
  "get an emergency bug fix" or
  "get a new vendor"

If the software fault is a Heisenbug, then repair is:
  reboot and retry, or
  switch to the backup process (instant restart)

[Diagram: LOGICAL PROCESS = PROCESS PAIR — SESSION, PRIMARY PROCESS,
 STATE INFORMATION, BACKUP PROCESS]

PROCESS PAIRS tolerate Hardware Faults and Heisenbugs.
Repair time is seconds, and could be milliseconds if time is critical.

Flavors Of Process Pair:
  Lockstep
  Automatic
  State Checkpointing
  Delta Checkpointing
  Persistent
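A toy state-checkpointing sketch (names and structure assumed, not Tandem's API): the primary checkpoints its state to the backup after each request, so the backup can take over instantly when the primary hits a Heisenbug:

```python
class ProcessPair:
    def __init__(self):
        self.primary_up = True
        self.primary_state = {}
        self.backup_state = {}                      # stands in for the backup's memory

    def checkpoint(self):
        self.backup_state = dict(self.primary_state)   # the "STATE INFORMATION" message

    def request(self, key, value):
        if self.primary_up:
            self.primary_state[key] = value
            self.checkpoint()
            return "primary"
        # takeover: the backup continues from the last checkpoint
        self.backup_state[key] = value
        return "backup"

pair = ProcessPair()
pair.request("x", 1)
pair.primary_up = False                             # simulate a Heisenbug crash
print(pair.request("y", 2), pair.backup_state)      # backup {'x': 1, 'y': 2}
```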
28
How Takeover Masks Failures

Server resets at takeover. But what about:
  Application State?
  Database State?
  Network State?

[Diagram: LOGICAL PROCESS = PROCESS PAIR — SESSION, PRIMARY PROCESS,
 STATE INFORMATION, BACKUP PROCESS]

Answer: Use Transactions To Reset State!
  Abort the transaction if the process fails.
  Keeps the Network "Up"
  Keeps the System "Up"
  Reprocesses some transactions on failure
29
PROCESS PAIRS - SUMMARY
Transactions Give Reliability
Process Pairs Give Availability
Process Pairs Are Expensive & Hard To Program
Transactions + Persistent Process Pairs
=> Fault Tolerant Sessions &
Execution
When Tandem Converted To This Style
Saved 3x Messages
Saved 5x Message Bytes
Made Programming Easier
30
SYSTEM PAIRS FOR HIGH AVAILABILITY

[Diagram: Primary and Backup sites]

Programs, Data, and Processes are replicated at two sites.
  The pair looks like a single system.
  The system becomes a logical concept.
  Like Process Pairs: System Pairs.
Backup receives the transaction log (spooled if the backup is down).
If the primary fails or the operator switches, the backup offers service.
31
SYSTEM PAIR CONFIGURATION OPTIONS

Mutual Backup: each has 1/2 of the Database & Application
  [Diagram: two sites, each Primary for half the data and Backup for the other half]

Hub: one site acts as backup for many others
  [Diagram: several Primary sites sharing one Backup site]

In general, the configuration can be any directed graph.

Stale replicas: Lazy replication
  [Diagram: Primary -> Backup -> read-only Copies]
32
SYSTEM PAIRS FOR: SOFTWARE MAINTENANCE

Step 1: Both systems are running V1.            (Primary) V1    (Backup) V1
Step 2: Backup is cold-loaded as V2.            (Primary) V1    (Backup) V2
Step 3: SWITCH to Backup.                       (Backup)  V1    (Primary) V2
Step 4: The old primary is cold-loaded as V2.   (Backup)  V2    (Primary) V2
Similar ideas apply to:
Database Reorganization
Hardware modification (e.g. add discs, processors,...)
Hardware maintenance
Environmental changes (rewire, new air conditioning)
Move primary or backup to new location.
33
SYSTEM PAIR BENEFITS
Protects against ENVIRONMENT:
weather
utilities
sabotage
Protects against OPERATOR FAILURE:
two sites, two sets of operators
Protects against MAINTENANCE OUTAGES
work on backup
software/hardware install/upgrade/move...
Protects against HARDWARE FAILURES
backup takes over
Protects against TRANSIENT SOFTWARE ERRORS
Allows design diversity
  (different sites have different software/hardware)
34
Key Idea

[Diagram: Software Masks — Hardware Faults, Environmental Faults, Maintenance —
 through Architecture and Distribution]

• Software automates / eliminates operators
So,
• In the limit there are only software & design faults.
  Many are Heisenbugs.
Software-fault tolerance is the key to dependability.  INVENT IT!
35
References
Adams, E. (1984). "Optimizing Preventative Service of Software Products." IBM Journal of
Research and Development. 28(1): 2-14.
Anderson, T. and B. Randell. (1979). Computing Systems Reliability.
Garcia-Molina, H. and C. A. Polyzois. (1990). Issues in Disaster Recovery. 35th IEEE
Compcon 90. 573-577.
Gray, J. (1986). Why Do Computers Stop and What Can We Do About It. 5th Symposium on
Reliability in Distributed Software and Database Systems. 3-12.
Gray, J. (1990). “A Census of Tandem System Availability between 1985 and 1990.” IEEE
Transactions on Reliability. 39(4): 409-418.
Gray, J. N., Reuter, A. (1993). Transaction Processing Concepts and Techniques. San Mateo,
Morgan Kaufmann.
Lampson, B. W. (1981). Atomic Transactions. Distributed Systems -- Architecture and
Implementation: An Advanced Course. ACM, Springer-Verlag.
Laprie, J. C. (1985). Dependable Computing and Fault Tolerance: Concepts and Terminology.
15’th FTCS. 2-11.
Long, D.D., J. L. Carroll, and C.J. Park (1991). A study of the reliability of Internet sites. Proc
10’th Symposium on Reliable Distributed Systems, pp. 177-186, Pisa, September 1991.
Long, D., A. Muir and R. Golding. (1995). "A Longitudinal Study of Internet Host Reliability."
Proceedings of the Symposium on Reliable Distributed Systems, Bad Neuenahr, Germany: IEEE,
September 1995. 2-9.
36
37
Scaleable
Replicated Databases
Jim Gray (Microsoft)
Pat Helland (Microsoft)
Dennis Shasha (Columbia)
Pat O’Neil (U.Mass)
38
Outline
• Replication strategies
  – Lazy and Eager
  – Master and Group
• How centralized databases scale
– deadlocks rise non-linearly with
• transaction size
• concurrency
• Replication systems are unstable on scaleup
• A possible solution
39
Scaleup, Replication, Partition

Base case: a 1 TPS system
  [100 Users -> 1 TPS server]

Scaleup: to a 2 TPS centralized system
  [200 Users -> 2 TPS server]

Partitioning: two 1 TPS systems
  [100 Users -> 1 TPS server at each site: 1 tps local, 0 tps to the other site]

Replication: two 2 TPS systems
  [100 Users -> 2 TPS server at each site: 1 tps of local work plus 1 tps of
   replicated updates from the other site]
  • N^2 more work
40
Why Replicate Databases?

• Give users a local copy for
  – Performance
  – Availability
  – Mobility (they are disconnected)
• But... What if they update it?
• Must propagate updates to the other copies
41
Propagation Strategies

• Eager: Send the update right away
  – (part of the same transaction)
  – N times larger transactions
• Lazy: Send the update asynchronously
  – a separate transaction per node
  – N times more transactions
• Either way
  – N times more updates per second per node
  – N^2 times more work overall

[Diagram: an eager transaction performs Write A, Write B, Write C, Commit at
 every node inside one transaction; a lazy transaction commits locally, then
 sends a separate Write A, Write B, Write C, Commit transaction to each other node]
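A toy sketch (assumed in-memory model) contrasting the two styles above: eager writes every copy inside one N-times-larger transaction; lazy commits locally and queues one follow-on transaction per copy:

```python
replicas = [dict(), dict(), dict()]          # N = 3 copies of the database

def eager_update(key, value):
    # one transaction touching all N copies (all-or-nothing via 2PC in a real system)
    for db in replicas:
        db[key] = value

def lazy_update(key, value):
    replicas[0][key] = value                 # the local transaction commits first
    return [(db, key, value) for db in replicas[1:]]   # N-1 deferred transactions

eager_update("A", 1)
pending = lazy_update("B", 2)
for db, key, value in pending:               # asynchronous propagation, applied later
    db[key] = value
print(replicas)
```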
42
Update Control Strategies
• Master
– Each object has a master node
– All updates start with the master
– Broadcast to the subscribers
• Group
– Object can be updated by anyone
– Update broadcast to all others
• Everyone wants Lazy Group:
– update anywhere, anytime, anyway
43
Quiz Questions: Name One
• Eager
  – Master: N-Plexed disks
  – Group:  ?
• Lazy
  – Master: Bibles, Bank accounts, SQLserver
  – Group:  Name servers, Oracle, Access...
• Note: Lazy contradicts Serializable
  – If two lazy updates collide, then ... reconcile
    • discard one transaction (or use some other rule)
    • ask for human advice
  – Meanwhile, nodes disagree =>
    Network DB state diverges: System Delusion
44
Anecdotal Evidence
• Update Anywhere systems are attractive
• Products offer the feature
• It demos well
• But when it scales up
– Reconciliations start to cascade
– Database drifts “out of sync” (System Delusion)
• What’s going on?
45
Outline
• Replication strategies
– Lazy and Eager
– Master and Group
• How centralized databases scale
– deadlocks rise non-linearly
• Replication is unstable on scaleup
• A possible solution
46
Simple Model of Waits

• DB_size records
• TPS transactions per second
• Each transaction:
  – Picks Actions records uniformly from the set of DB_size records
  – Then commits
• About Transactions x Actions / 2 resources are locked
• Chance a request waits is (Transactions x Actions) / (2 x DB_size)
• Action rate is TPS x Actions
• Active Transactions = TPS x Actions x Action_Time
• Wait Rate = Action rate x Chance a request waits
            = (TPS^2 x Actions^3 x Action_Time) / (2 x DB_size)
• 10x more transactions => 100x more waits
47
Simple Model of Deadlocks

• A deadlock is a wait cycle
• Cycle of length 2:
  – Wait rate x Chance the waitee waits for the waiter
  – Wait rate x (P(wait) / Transactions)
    = [(TPS^2 x Actions^3 x Action_Time) / (2 x DB_size)]
      x [(TPS x Actions^3 x Action_Time) / (2 x DB_size)] / (TPS x Actions x Action_Time)
    = (TPS^2 x Actions^5 x Action_Time) / (4 x DB_size^2)
• Cycles of length 3 are ~PW^3, so ignored.
• 10x bigger transactions => 100,000x more deadlocks
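A sketch evaluating the two formulas above with made-up parameters; the point is the scaling, not the absolute numbers:

```python
def wait_rate(tps, actions, action_time, db_size):
    return tps**2 * actions**3 * action_time / (2 * db_size)

def deadlock_rate(tps, actions, action_time, db_size):
    return tps**2 * actions**5 * action_time / (4 * db_size**2)

base = dict(tps=100, actions=10, action_time=0.01, db_size=1_000_000)
print(wait_rate(**base), deadlock_rate(**base))
print(wait_rate(**{**base, "tps": 1000}) / wait_rate(**base))              # 100.0   (10x TPS)
print(deadlock_rate(**{**base, "actions": 100}) / deadlock_rate(**base))   # 100000.0 (10x size)
```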
48
Summary So Far

• Even centralized systems are unstable
• Waits:
  – Square of concurrency
  – 3rd power of transaction size
• Deadlock rate:
  – Square of concurrency
  – 5th power of transaction size
49
Outline
• Replication strategies
• How centralized databases scale
• Replication is unstable on scaleup
• Eager (master & group)
• Lazy (master & group & disconnected)
• A possible solution
50
Eager Transactions are FAT

• If there are N nodes, an eager transaction is Nx bigger
  – Takes Nx longer
  – 10x nodes => 1,000x deadlocks
  – (derivation in paper)
• Master is slightly better than group
• Good news:
  – Eager transactions only deadlock
  – No need for reconciliation

[Diagram: the same Write A, Write B, Write C, Commit sequence executed at every node]
51
Lazy Master & Group

[Diagram: a lazy transaction carries TRID, Timestamp and, for each update,
 OID, old time, new value; the Write A, Write B, Write C, Commit stream
 propagates asynchronously to the other nodes]

• Use optimistic concurrency control
  – Keep a transaction timestamp with each record
  – Updates carry the old + new timestamp
  – If the record has the old timestamp:
    • set the value to the new value
    • set the timestamp to the new timestamp
  – If the record does not match the old timestamp:
    • reject the lazy transaction
  – Not SNAPSHOT isolation (stale reads)
• Reconciliation:
  – Some nodes are updated
  – Some nodes are "being reconciled"
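A minimal sketch (assumed record layout) of the timestamp rule above, as applied when a lazy update arrives at a replica:

```python
def apply_lazy_update(record, old_ts, new_ts, new_value):
    """record is {'value': ..., 'ts': ...} held at the replica."""
    if record["ts"] == old_ts:          # no one updated it since the originator read it
        record["value"] = new_value
        record["ts"] = new_ts
        return "applied"
    return "rejected -> reconcile"      # stale: reject the lazy transaction

rec = {"value": 100, "ts": 7}
print(apply_lazy_update(rec, old_ts=7, new_ts=8, new_value=120))   # applied
print(apply_lazy_update(rec, old_ts=7, new_ts=9, new_value=90))    # rejected
```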
52
Reconciliation

• Reconciliation means System Delusion
  – Data inconsistent with itself and reality
• How frequent is it?
• Lazy transactions are not fat
  – but there are N times as many
  – Eager waits become Lazy reconciliations
  – Rate is: (TPS^2 x (Actions x Nodes)^3 x Action_Time) / (2 x DB_size)
  – Assuming everyone is connected
53
Eager & Lazy: Disconnected

• Suppose mobile nodes are disconnected for a day
• When they reconnect:
  – get all incoming updates
  – send all delayed updates
• Incoming is: Nodes x TPS x Actions x Disconnect_Time
• Outgoing is: TPS x Actions x Disconnect_Time
• Conflicts are the intersection of these two sets
  ~ Disconnect_Time x (TPS x Actions x Nodes)^2 / DB_size
54
Outline

• Replication strategies (lazy & eager, master & group)
• How centralized databases scale
• Replication is unstable on scaleup
• A possible solution
  – Two-tier architecture: Mobile & Base nodes
  – Base nodes master objects
  – Tentative transactions at mobile nodes
    • Transactions must be commutative
  – Re-apply transactions on reconnect
  – Transactions may be rejected
55
Safe Approach

• Each object is mastered at a node
• Update transactions only read and write master items
• Lazy replication to the other nodes
• Allow reads of stale data (on user request)
• PROBLEMS:
  – doesn't support mobile users
  – deadlocks explode with scaleup
• ?? How do banks work ???
56
Two Tier Replication

• Two kinds of nodes:
  – Base nodes: always connected, always up
  – Mobile nodes: occasionally connected
• Data is mastered at base nodes
• Mobile nodes
  – have stale copies
  – make tentative updates

[Diagram: Mobile nodes connect intermittently to a Base Node]
57
Mobile Node Makes Tentative Updates

• Updates the local database while disconnected
• Saves the transactions
• When the Mobile node reconnects:
  – tentative transactions are re-done as Eager-Master
    (at the original time??)
  – some may be rejected
    (this replaces reconciliation)
• No System Delusion.

[Diagram: the Mobile node sends its tentative transactions to the Base Node;
 the Base Node returns base updates & failed base transactions]
58
Tentative Transactions

• Must be commutative with others
  – Debit $50 rather than Change $150 to $100.
• Must have acceptance criteria
  – Account balance is positive
  – Ship date no later than quoted
  – Price is no greater than quoted

[Diagram: Tentative Transactions at the local DB; Transactions From Others;
 Updates & Rejects flow back]
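A toy sketch (illustrative names) of a commutative tentative transaction with an acceptance criterion, re-executed against the base (master) copy on reconnect:

```python
def debit(account, amount):
    """Commutative: 'debit 50' rather than 'change 150 to 100'."""
    def apply(db):
        if db[account] - amount < 0:     # acceptance criterion: balance stays positive
            return False                 # reject this tentative transaction
        db[account] -= amount
        return True
    return apply

base_db = {"acct": 120}                                # master copy at the base node
tentative = [debit("acct", 50), debit("acct", 100)]    # queued while disconnected
print([t(base_db) for t in tentative], base_db)        # [True, False] {'acct': 70}
```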
59
Refinement: Mobile Node Can
Master Some Data
• Mobile node can master “private” data
– Only mobile node updates this data
– Others only read that data
• Examples:
– Orders generated by salesman
– Mail generated by user
– Documents generated by Notes user.
60
Virtue of 2-Tier Approach

• Allows mobile operation
• No system delusion
• Rejects are detected at reconnect (know right away)
• If commutativity works,
  – No reconciliations
  – Even though work rises as (Mobile + Base)^2
61
Outline

• Replication strategies (lazy & eager, master & group)
• How centralized databases scale
• Replication is unstable on scaleup
• A possible solution (two-tier architecture)
  – Tentative transactions at mobile nodes
  – Re-apply transactions on reconnect
  – Transactions may be rejected & reconciled
• Avoids system delusion
62