CS514: Intermediate Course in Operating Systems Lecture 13: Oct. 5

Professor Ken Birman
Ben Atkin: TA
Consistency
• How can we relate models of
consistency to cost and
availability?
• Is it possible to reconcile
transactional replication with
virtual synchrony replication?
Consistency
• Various models
– Multiple copies of some object but
behavior mimics a single nonfaulty object
– ACID: 1-copy SR plus durability
– FLP style of consensus
– Dynamic uniformity versus static
model
Basic “design points”
• Does the model guarantee anything
relative to “last words” of a process
that fails?
– Yes for transactions: ACID
– No, in virtual synchrony
• Can do better using “flush” primitive
• And can mimic transactional replication if we
require that primary partition is also a
quorum of some statically specified set of
processes
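The quorum requirement above amounts to a one-line test. A minimal sketch, with a function name of our choosing rather than any toolkit's API:

```python
def is_primary(partition, all_members):
    """A partition may make progress (and mimic transactional
    replication) only if it contains a majority -- a quorum -- of
    the statically specified membership; two disjoint partitions
    can never both pass this test."""
    return len(set(partition) & set(all_members)) > len(all_members) / 2
```

With static membership {a, b, c}, the partition {a, b} may act as primary while {c} must block until the partition heals.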
Are actions
asynchronous?
• No in case of transactions
– We can do things locally
– But at commit time, we need to
synchronize
– And most transactional replication
schemes are heavily synchronous
• Yes for virtual synchrony
– But only with cbcast or fbcast
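The reason cbcast can stay asynchronous is that delivery is decided by a purely local vector-timestamp test; no round-trip to other group members is needed. A sketch of the standard causal delivery rule (function names are ours):

```python
def deliverable(vt_msg, sender, vt_local):
    """cbcast delivery test: the message is the next one expected
    from its sender, and everything it causally depends on has
    already been delivered here."""
    return (vt_msg[sender] == vt_local[sender] + 1 and
            all(vt_msg[k] <= vt_local[k]
                for k in range(len(vt_msg)) if k != sender))

def deliver(vt_msg, sender, vt_local):
    """Deliver the message: advance the local vector clock."""
    vt_local[sender] = vt_msg[sender]
```

A message whose test fails simply waits in a local queue; the sender never blocks, which is what makes the primitive asynchronous.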
Mixing models
• Virtual synchrony is like “weak
transactional serializability”
– In fact, connection can be made
precise
– We use a model called
linearizability by Wing and Herlihy
• Much recent work on database
replication mixes models…
Real systems have
varied needs
• Must match choice of
properties to needs of the
application
• Find that multiple models are
hard to avoid
– We want the stronger models for
database applications
– But where data won’t persist, the
cheaper models suffice…
Digression
• Need to strengthen our intuition
• Can we find examples of real
systems that might need group
communication or data replication?
– Ideally, systems that can’t be built in any
other way
– Use this to think about properties
required for application correctness
Joint Battlespace
Infosphere
[Diagram: robust infrastructure for the Joint Battlespace Infosphere]
Distributed Trading System
[Diagram: trader clients connected to pricing DB’s, historical data, current pricing, and analytics, with market data feeds from Tokyo, London, Zurich, ... arriving over a long-haul WAN spooler]
• Availability for historical data
• Load balancing and consistent message delivery for price distribution
• Parallel execution for analytics
Distributed Service Node
[Diagram: telephone trunk lines and a dumb switch deliver calls, changes, adds, and deletes over a telephony data/digitized voice path to a cluster of x86/UNIX and RISC/UNIX nodes on an Ethernet, coordinated by Isis; one phone number per person]
• Replicated files for digitized voice store
• Redundancy for database availability
• Load balancing for call handling & routing
Shop Floor Process Control
Example
[Diagram: a WorkStream server (VAX), a recipe management server, and a data collection server (HP) connected over Enet to operator clients (HP and PC) and to HP station controllers driving the factory equipment]
The List Goes On
• Air traffic control system
• Medical decision support in a hospital
• Providing real-time data in support of
banking or major risk-management
strategies in finance
• Real-time system for balancing power
production and consumption in the power
grid
• Telephone system for providing services in
setting with mobile users and complex
requirements
Challenge faced by
developers
• We have multiple notions of
consistency now:
– Transactional, with persistent data
– Process groups with dynamic uniformity
– Process groups without dynamic
uniformity
– Primary partition notion of progress
– Non-primary partitions with merge
• How can we make the right choices
for a given situation?
One size fits all?
• One possibility is that we’ll simply
need multiple options
– User would somehow specify their
requirements
– Given this information, system would
configure protocols appropriately
• Alternative is to decide to
standardize on one scheme
– Likely to be a strong, more costly option
“Understanding” CATOCS
• Paper by Cheriton, Skeen in 1993
• They argue that end-to-end approach
dictates
– Simplicity in the GCS
– Properties enforced near end-points
• Paper is full of mistakes but the
point is well taken
– People don’t want to pay for properties
they don’t actually require!
French air traffic control
• They wanted to use replication and
group communication in a system for
high availability controller consoles
• Issues they faced
– How strong is the consistency need?
– Where should we use groups?
– Where should we use transactions?
Air traffic control
• Much use of computer technologies
– Flight management system (controls
airplane)
– Flaps, engine controls (critical
subsystems)
– Navigational systems
– TCAS (collision avoidance system)
– Air traffic control system on ground
• In-flight, approach, international “hand-off”
– Airport ground system (runways, gates,
etc)
ATC system components
[Diagram: onboard systems and radar feed the controllers’ consoles, supported by an X.500 directory and the air traffic database (flight plans, etc)]
Possible uses of groups
• To replicate data in console clusters
• For administration of console
clusters
• For administration of the “whole
system”
• For radar communication from radar
to the consoles
• To inform consoles when flight plan
database is updated
• To replicate the database itself
French air traffic control
• Some conclusions
– They use transactions for the flight plan
database
• In fact would love to find ways to replicate this
“geographically”
• But the topic remains research
– They use one process group for each set
of 3-5 control consoles
– They use unreliable hardware multicast
to distribute radar inputs
• Different groups treated in different
ways
French air traffic control
• Different consistency in different uses
• In some cases, forced changes to the
application itself
– E.g. different consoles may not have
identical radar images
• Choices always favored
– Simplicity
– Avoiding technology performance and
scaling limits
Air traffic control
example
• Controller interacts with service:
“where can I safely route flight TWA
857?”
• Service responds: “sector 17.8.09 is
available”
... what forms of consistency are
needed in order to make this a
safe action to perform?
Observations that can
help
• Real systems are client-server
structured
• Early work on process group
computing tended to forget this!
– Isis system said “RPC can be harmful”
but then took the next step and said “so
we won’t think in client-server terms”.
This was a mistake!
– Beware systems that provide a single
API system-wide
A multi-tier API
• Separate concerns:
– Client system wants a simple interface,
RPC to servers and a reliable stream
connection back, wants to think of the
whole system as a single server
– Server wants to implement a WAN
abstraction out of multiple component
servers
– Server itself wants replication and load-balancing for fault-tolerance
• Need security and management API
throughout
Sample but typical issue
• It is very appealing to say
– This server poses a problem
– So I’ll rip it out…
– … and replace it with a high
availability group server
• Often, in practice, the existing
code on the client side
precludes such upgrades!
Separate concerns
• Consistency goals for client are
different from goals within lower
levels of many systems
• At client level, main issue is dynamic
uniformity: does a disconnected
client continue to act on basis of
information provided before the
partitioning? Do we care?
• In ATC example, the answer is yes,
so we need dynamic uniformity
guarantees
WAN architecture
• Mental model for this level is a
network whose component nodes
are servers
• Each server initiates updates to data
it “owns” and distributes this data to
the other servers
• May also have globally owned data
but this is an uncommon case!
• For global data we need dynamic
uniformity, but for locally owned
data, a weaker solution suffices
Consistency approach in
partitionable network
• Free to update your local data
• When partition ends, state merges
by propagation of local updates to
remote sites, which had safe but
stale view of other sites’ local data.
(Treated formally by Malkhi, Dolev,
Strong; Keidar, others)
• Global updates may be done using
dynamically uniform protocols, but
will obviously be delayed during
partitioning events
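Under the "each server owns its own data" rule, the merge step is simple: only the owner ever updates an item, so per-owner version numbers totally order the updates. A hypothetical sketch (data layout and names are ours):

```python
def merge_states(local, remote):
    """Merge two sites' views after a partition heals.  Each entry is
    owner -> (version, value); since only the owner updates its own
    item, the higher version is always the newer, authoritative copy."""
    merged = dict(local)
    for owner, (version, value) in remote.items():
        if owner not in merged or merged[owner][0] < version:
            merged[owner] = (version, value)
    return merged
```

Each side applies the same merge, so both end up with identical state without any global coordination.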
Within a server
• At this level, see a server replicated
on multiple nodes for
– Fault-tolerance (availability or
recoverability)
– Load-balancing
– Replication to improve response time
• Goal is primary component progress
and no need for dynamic uniformity
Worst case for a
replicated server?
• If application wants recoverability,
server replication may be costly and
counterproductive
• Many real database systems actually
sacrifice transactional guarantees to
fudge this case:
– Primary/backup approach with log sent
from primary to backup periodically
– Failure can cause some transactions to
“vanish” until primary recovers and lost
log records are discovered
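The window of vulnerability is easy to see in a toy model of periodic log shipping. In this hypothetical sketch (names ours), the primary acknowledges commits immediately but ships its log to the backup only every few transactions:

```python
class Primary:
    """Toy primary that acknowledges commits immediately but ships
    its log to the backup only every `ship_every` transactions."""
    def __init__(self, ship_every):
        self.log, self.backup = [], []
        self.shipped, self.ship_every = 0, ship_every

    def commit(self, txn):
        self.log.append(txn)  # reported as committed to the client now
        if len(self.log) - self.shipped >= self.ship_every:
            self.backup.extend(self.log[self.shipped:])  # periodic shipment
            self.shipped = len(self.log)
```

If the primary crashes after acknowledging transactions that were never shipped, the backup takes over without them; those transactions "vanish" until the primary's log is recovered.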
Observations?
• Complex systems may exhibit
multiple needs, superimposed
• Needs are easiest to understand
when approached in terms of
architectural structure
• Literature of distributed consistency
is often confusing because different
goals are blurred in papers
Example of a blurry goal
• Essential point of the famous FLP
result: can’t guarantee liveness in a
system that also provides an
external consistency property such
as dynamic uniformity or database
atomicity
• Can evade in settings with accurate
failure detectors... but real systems
can make mistakes
• But often, we didn’t actually want
this form of consistency!
Example of a blurry goal
(cont)
• Moreover, FLP result may require a
very “clever” adversary strategy.
• Friedman and Vaysburd have a proof
that an adversary that cannot
predict the future is arbitrarily
unlikely to prevent consensus!
• On the other hand, it is easy to force
a system to wait if it wants external
consistency. Think about 2PC and
3PC. This is a more serious issue.
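The waiting is visible even in a toy 2PC coordinator. In this sketch (ours), a missing vote models a participant that crashed or was partitioned away before replying; the coordinator cannot safely decide either way:

```python
def two_phase_commit(votes):
    """Phase 1 of 2PC.  A None vote models a participant that crashed
    or was partitioned away before replying: the coordinator cannot
    safely decide either way, so the transaction blocks."""
    if any(v is None for v in votes):
        return "blocked"  # must wait for the missing participant
    return "commit" if all(votes) else "abort"
```

Participants that already voted yes must hold their locks until the outcome is known, which is exactly the waiting that external consistency forces.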
Much of the theory is
misunderstood!
• Theory tends to make sweeping
claims: “The impossibility of group
membership in asynchronous
systems”
• These claims are, strictly speaking,
correct
• But they may not be relevant in
specific practical settings because,
often, the practical situation needs
much weaker guarantees!
When do we need FLP
style consistency?
• Few real systems need such strong
forms of consistency
• Yet relatively little is understood
about the full spectrum of weaker
consistency options
• Interplay between consistency of
the fault-tolerance solution and
other properties like security or real-time further clouds the picture
Why should this bother
us?
• Main problem is that we can’t just pick a
single level of consistency that will make
all users happy
• Database consistency model is extremely
slow: even database vendors don’t respect
the model (primary/backup “window of
vulnerability” is accepted because 2PC is
too costly)
• Dynamic uniformity costs a factor of 100-1000 compared to non-uniform protocols
... but non-uniform
protocols are too weak!
• Usually, non-uniform protocols are
adequate
– They capture “all the things that a
system can detect about itself”
– They make sense when partitioning
can’t occur, as on a cluster of computers
working as a server
• But they don’t solve our ATC
example and are too weak for a
wide-area database replicated over
many servers
Optimal Transactions
• Best known dynamic uniformity
solution is actually not the static
scheme we’ve examined
• This optimal approach
– Was first developed by Lamport in his
Paxos paper, but the paper was very
hard to follow
– Later, Keidar, Chockler and Dolev
showed that a version of 3-phase
commit gives optimal progress; the
scheme was very similar to Paxos
– But performance is still “poor”
Long-term prospects?
• Systems in which you pay for what you use
• API’s specialized to particular models of
computation in which API can make a
choice even if same choice wouldn’t work
for other API’s
• Tremendous performance variation
depending on the nature of the tradeoffs
accepted
• Big difference depending on whether we
care about actions by that partitioned-off
controller
Theory side
• Beginning to see a solid and well-founded
theory of consistency that deals with the non-uniform case
– For example, Lynch has an IOA model of vsync.
• Remains hard to express the guarantees of
such a model in the usual temporal logic
style
– Cristian and Fetzer did some nice work on this
• Challenge is that we want to say things
“about” the execution, but we often don’t
know “yet” if the process we are talking
about will turn out to be a member of the
system or partitioned away!
Other directions?
• In subsequent lectures will look at
probabilistic guarantees
– These are fairly basic to guarantees of
real-time behavior
– They can be integrated into more
traditional replication and process group
methods, but not easily
• Self-stabilization is another option,
we won’t consider it here (Dijkstra)
Self-stabilization
• Idea is that the system, when
pushed out of a consistent state,
settles back into one after the
environment stops “pushing”
• Example: after a failure, if no more
failures occur, some system
guarantee is restored
• Concern is that this may not bound
the degree of deviation from correct
behavior while the system is being
perturbed.
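Dijkstra's classic example is the K-state token ring: started in an arbitrary state, it converges to exactly one circulating privilege once perturbations stop. A small simulation sketch (ours), with K ≥ n (here 5 states for 4 processes) and a central scheduler that fires the lowest-numbered privileged process:

```python
def privileged(x, i, K):
    """Process 0 holds a privilege when it equals its predecessor
    (the last process); every other process when it differs from its."""
    return x[0] == x[-1] if i == 0 else x[i] != x[i - 1]

def step(x, K):
    """Central scheduler: fire the lowest-numbered privileged process."""
    for i in range(len(x)):
        if privileged(x, i, K):
            x[i] = (x[0] + 1) % K if i == 0 else x[i - 1]
            return

def stabilized(x, K):
    """Legitimate states have exactly one privilege in the ring."""
    return sum(privileged(x, i, K) for i in range(len(x))) == 1
```

Starting from a corrupted state such as [2, 1, 0, 2], a few steps restore the single-privilege invariant, and it holds from then on.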
Curious problems?
• Is reliability/consistency fundamentally
“unstable”?
– Many systems tend to thrash if reliability technology is
scaled up enough (except if goals are probabilistic)
– Example: reliable 1-n communication is harder and
harder to scale as n gets larger. (But probabilistically
reliable 1-n communication may do better as we scale)
– Underlying theory completely unknown: research topic
... can we develop a “90% reliable” protocol?
Is there a window of stable behavior?
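The scaling claim can be made concrete with a little arithmetic. In this sketch (ours), each of n receivers independently gets the message with probability p per attempt; with one attempt the chance that all n receive it collapses as n grows, while a few independent repair rounds (as in gossip-style protocols) restore it:

```python
def p_all_receive(n, p, rounds=1):
    """Probability that every one of n receivers gets the message,
    when each attempt succeeds independently with probability p and
    each receiver gets `rounds` independent chances (e.g. gossip
    retransmissions)."""
    miss = (1 - p) ** rounds   # chance one receiver misses every attempt
    return (1 - miss) ** n
```

With p = 0.9, all 1000 receivers succeed with probability well under 1% after one attempt, but six repair rounds push it above 99% — the intuition behind probabilistically reliable protocols.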
Consistency is
ubiquitous
• We often talk about “the” behavior of “a”
system, not the “joint” behavior of its
components
• Implication is that our specifications
implicitly assume that there is a mapping
from “the system” to the execution model
• Confusion over consistency thus has
fundamental implications. One of the
hardest problems we face in distributed
computing today!
Moving on…
• But enough about replication
• Now start to think about higher
level system issues
• Next week: briefly look at what
is known about how and why
systems fail
• Then look at a variety of
structuring options and trends…