Reliability and State Machines in
an Advanced Network Testbed
Mac Newbold
School of Computing
University of Utah
MS Thesis Defense
April 5, 2004
Advisor: Prof. Jay Lepreau
Distributed Systems
• Distributed Systems are complex
– Many components
– Distributed across multiple systems
• Component failures are relatively common
– But should not cause system breakdown
• “A distributed system is one in which the
failure of a computer you didn’t even know
existed can render your own computer
unusable.” – Leslie Lamport, quoted in
CACM, June 1992
April 5th, 2004
Mac Newbold - MS Thesis Defense
2
Our Context: Emulab
• Emulab is an advanced network testbed
• Complex time- and space-shared system
• System dynamically reconfigures nodes and
network links to create “experiments”
• Key architectural feature: Central Database
– System uses DB for storage, communication
• Complex system with many different scripts
and programs on clients and servers
Emulab Background
• First prototype in April 2000 (10 nodes)
• In production since Oct. 2000 (40 nodes)
• Early versions weren’t perfect
– Reliability problems
– Experiments of limited size
– Inefficient use of resources
• Problem is becoming harder
– 200 nodes, 400 remote, 2000 virtual nodes
Four Key Challenges
Emulab requirements:
• Reliability
• Scalability
• Performance and Efficiency
• Generality and Portability
1. Reliability
• Complex systems are hard to make reliable
• Many sources of unreliability:
– Hardware – commodity PCs as nodes
– Software – misconfiguration, bugs, etc.
– Humans – can interrupt at any time
• More complexity and more parts mean
higher chance that something is broken at
any given time
2. Scalability
• Almost everything a testbed provides is
harder to provide at a larger scale
• Larger scale requires more resources
• If throughput doesn’t increase, things slow
down at larger scale
• Increased load adversely affects reliability
• Practical scalability limited by reliability,
performance and efficiency
3. Performance and Efficiency
• One direct requirement on performance:
– Emulab is used in an interactive style
• System tasks must complete in “a few minutes”
• Indirect requirements:
– Scalability requirement places high demands
on many system components
– Maximize efficient resource utilization
• As many users/experiments as possible in the
shortest time with the fewest resources
4. Generality and Portability
• Workload Generality
– Wide variety of research, teaching,
development, and testing activities
– Good support for minimally instrumented and non-instrumented client OS’s and devices
• Model generality
– Evolving system
– New types of network devices
• Portable and non-intrusive client software
Summary of Challenges
• Three challenges are closely related
– Fourth is a constraint more than a challenge
• Reliability is key
• Failure rates directly impact scalability,
performance, and efficiency
• Generality and portability requirements
constrain any solution used to address the
challenges
Thesis Statement
Enhancing Emulab with a flexible framework
based on state machines provides better
monitoring and control, and improves the
reliability, scalability, performance,
efficiency, and generality of the system.
Outline
• Introduction, Background, and Challenges
• Thesis Statement
• State Machines in Emulab
– Interactions, Control Models, stated
– Node Boot State Machines
• Results
• Related and Future Work
• Conclusion
State Machines
• Also called Finite State Machines (FSMs),
Finite State Automata (FSAs), or
Statecharts in UML
• Well-known model
• Simple
• Explicit model
• Rich and flexible
• Easy to understand and visualize
Example State Machine
• Three main parts:
• States
• Transitions
– Directional
• Events
– Associated with
transitions – labels
• Stored in database
– diagrams generated
automatically from DB
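The three parts above, stored as rows in the central database, can be sketched as a lookup table. This is a hypothetical layout for illustration, not Emulab's actual schema; the state and event names are stand-ins.

```python
# Hypothetical table layout for one machine; Emulab stores states,
# transitions, and event labels in its central database and generates
# the diagrams automatically from those rows.
STATES = {"SHUTDOWN", "BOOTING", "ISUP"}

# (from_state, event) -> to_state : each transition is directional, and
# the event associated with it serves as the edge label in the diagram.
TRANSITIONS = {
    ("SHUTDOWN", "Booting"): "BOOTING",
    ("BOOTING", "ISUp"): "ISUP",
    ("ISUP", "Reboot"): "SHUTDOWN",
}

def next_state(state, event):
    """Return the successor state, or None if the event is invalid here."""
    return TRANSITIONS.get((state, event))
```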
State Machines in Emulab
• Each state machine has a “type”
– Currently three: node boot, node allocation,
and experiment status
• Multiple machines allowed within a type
– An entity is in only one state, in one machine, per type
• States can have “timeouts” with actions
– Timer starts when state is entered
• State “triggers” – Entry Actions
Direct Interaction
• Within a type, can take a transition from a
state in one machine (or “mode”) to a state
in another machine of that type
– Known as “mode transition” in Emulab
– Similar to hierarchical state machines
• Highlights similarities/symmetries
– Most machines are variations of another
• Improved code reuse
Direct Interaction Example
Models of State Machine Control
• Centralized monitoring & control: stated
– State changes submitted, checked for
correctness, applicable actions performed
– Daemon tracks timeouts
– Used for Node Boot state machines
• Distributed management of state machine
– No central service enforcing correctness
– No dependency on central service
– Timeouts harder to implement
The stated State Daemon
• Listens for events continuously
• State transitions cause database updates
• Invalid transitions cause notifications
• Timeouts, timeout actions, triggers configurable in DB for each state
• Caching – only writer of node boot states
• Modular design – dispatch events to proper
action handlers
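The daemon's cycle can be sketched as follows. The per-state configuration table, class, and handler names are assumptions for illustration, not stated's real interface; only the behavior (validate transition, update cached DB state, fire entry triggers, check per-state timeouts) follows the slides.

```python
import time

# Hypothetical per-state configuration, mirroring the claim that timeouts,
# timeout actions, and entry triggers are configured in the DB per state.
STATE_CONFIG = {
    "BOOTING": {"timeout": 180, "on_timeout": "reboot", "on_entry": None},
    "TBSETUP": {"timeout": 600, "on_timeout": "notify", "on_entry": None},
    "ISUP":    {"timeout": None, "on_timeout": None, "on_entry": "free_node"},
}

class StateDaemon:
    def __init__(self, transitions):
        self.transitions = transitions   # (state, event) -> new state
        self.node_state = {}             # cached copy of node boot states
        self.entered_at = {}             # when each node entered its state

    def handle_event(self, node, event):
        cur = self.node_state.get(node, "SHUTDOWN")
        new = self.transitions.get((cur, event))
        if new is None:                  # stated mails a notification here
            return f"invalid transition: {node} in {cur}, event {event}"
        self.node_state[node] = new      # the only writer of boot states
        self.entered_at[node] = time.time()
        # Dispatch the new state's entry trigger to its action handler.
        return STATE_CONFIG.get(new, {}).get("on_entry")

    def check_timeouts(self, now):
        """Return (node, timeout_action) for every expired state timer."""
        expired = []
        for node, state in self.node_state.items():
            limit = STATE_CONFIG.get(state, {}).get("timeout")
            if limit is not None and now - self.entered_at[node] > limit:
                expired.append((node, STATE_CONFIG[state]["on_timeout"]))
        return expired
```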
Node Boot State Machines
• Nodes in Emulab self-configure
• Monitored via state machines – stated
• “Normal” node boot machine
• Variations – “Minimal”
• Reloading node disks
Node Self-Configuration
• Nodes send state events during booting to
allow progress to be monitored
• “Global knowledge” inside state daemon
– Better decisions about recovery steps
• Finer granularity gives more information for
recovery, allows for shorter timeouts
• Each OS image is associated with a “mode”
(state machine) that describes its behavior
“Normal” Node Boot Machine
• Start in SHUTDOWN
• DHCP, start OS booting
• When Emulab-specific
configuration begins,
enter TBSETUP
• ISUP when finished
• In case of failure, can
retry from SHUTDOWN
Variations of Node Boot
• Example: MINIMAL
• For OS images with
little or no special
Emulab support
• ISUP generated by
stated if necessary
– Immediate or ping
• SilentReboot allowed
in this mode
Reloading Node Disks
• Mode transition
into RELOAD /
SHUTDOWN
• RELOADDONE
transitions into
mode for
newly-installed
OS image
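A mode transition of the kind described above might look like this sketch. Mode, state, and event names loosely follow the slides, but the RELOADING state and the table layout are hypothetical additions for illustration.

```python
# Sketch of a "mode transition": within one machine type, a transition
# may lead from a state in one machine (mode) to a state in another.
# A target mode of None means "the mode of the newly installed OS image".
MODES = {
    "NORMAL": {("TBSETUP", "ISUp"): ("NORMAL", "ISUP")},
    "RELOAD": {
        ("SHUTDOWN", "Booting"): ("RELOAD", "RELOADING"),
        # RELOADDONE leaves RELOAD for the new image's own machine
        ("RELOADING", "ReloadDone"): (None, "SHUTDOWN"),
    },
}

def step(mode, state, event, next_os_mode):
    """Apply one event; resolve None to the newly installed image's mode."""
    target = MODES[mode].get((state, event))
    if target is None:
        return mode, state            # no such transition: stay put
    new_mode, new_state = target
    if new_mode is None:              # mode transition out of RELOAD
        new_mode = next_os_mode
    return new_mode, new_state
```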
Reloading and Mode Transitions
Experiment Status State Machine
• Uses distributed model
– Stored in database, but not
strictly enforced
• Documents life-cycle
• Restricts user interruption
– Reduces a source of errors
• Can queue, activate,
modify, restart, swap, or
terminate an expt.
Node Allocation State Machine
• Distributed control model
• Diagram documents the
way the states are used by
the program, but not
currently enforced
• Either reloads nodes with a
custom image, or reboots
them as members of the
experiment
Results: Context
• Emulab in production 1 year before state
machines were added
• In production 3 years since first stated
– 650 users, 150 projects, 75 institutions
• 19 papers, top venues
– Over 155,000 nodes allocated in nearly 10,000
experiment instances
– 13 classes at 10 universities
• Emulab SW on 6 more testbeds, 4 planned
Results
• Anecdotal: (others in thesis)
– Reliability/Performance: Preventing race
conditions
– Generality: Graceful handling of custom OS
images
– Generality: New node types
• Experiment:
– Reliability/Scalability: Improved timeout
mechanisms
Reliability/Performance:
Preventing Race Conditions
• Expt. ends, nodes move to holding expt.,
get reloaded, then freed while they boot
• Problem: Getting allocated while booting
• Node appears unresponsive, gets forcefully
power cycled, corrupts FS on disk
• Solution: don’t free immediately
• Add trigger on next ISUP for a node that
finishes booting, that frees it when booted
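The deferred-free fix can be sketched as a one-shot entry trigger on the node's next ISUP. The class and method names are hypothetical; only the idea (never free a node that is still booting) comes from the slides.

```python
# Sketch of the deferred-free fix: a node released while still booting is
# freed only by an entry trigger on its next ISUP, so it can never be
# allocated and forcefully power-cycled mid-boot, corrupting its disk.
class NodePool:
    def __init__(self):
        self.free_nodes = set()
        self.pending_free = set()     # released, but still booting

    def release(self, node, booted):
        if booted:
            self.free_nodes.add(node)
        else:
            self.pending_free.add(node)   # the ISUP trigger frees it later

    def on_isup(self, node):
        """Entry trigger run by the state daemon when the node reaches ISUP."""
        if node in self.pending_free:
            self.pending_free.discard(node)
            self.free_nodes.add(node)
```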
Generality: Graceful Handling
of Custom OS Images
• Users create custom OS images
• Emulab client software is optional
• Problem: Nodes don’t send state events
• Solution: “Minimal” state machine
• SHUTDOWN: maybe on server, optional
• BOOTING: server side, trigger checks ISUP
• ISUP: either node sends, or generated when pingable, or generated immediately
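Synthesizing ISUP on the server side, as described above, might be sketched like this. The policy names and the ping helper are illustrative assumptions, not stated's actual API; the ping flags shown are the Linux ones.

```python
import subprocess

def generate_isup(node_addr, policy):
    """For images that never report ISUP themselves, the state daemon can
    synthesize it: immediately on boot, or once the node answers a ping.
    (Policy names and this helper are illustrative, not stated's API.)"""
    if policy == "immediate":
        return True
    if policy == "ping":
        # one ICMP echo request with a 1-second deadline (Linux ping flags)
        rc = subprocess.call(
            ["ping", "-c", "1", "-W", "1", node_addr],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return rc == 0
    return False    # "node": wait for the node to send ISUP on its own
```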
Generality: New Node Types
• Emulab is always growing and changing
• State machine model and our framework
are flexible to provide graceful evolution
• We’ve added 5 new node types
– IXPs, wide-area, PlanetLab, vnodes, sim-nodes
• Mostly used existing machines
• 2 new machines, slight variations
• 1 change to stated to add a new trigger
Reliability/Scalability:
Improved Timeout Mechanisms
• Before: reboot node, wait for it to boot
– Static, 7 minute timeout
• Pragmatic – minimizes false positives/negatives
• Avg. 4 min., but max. error-free boot is 15 min.
• 11 minute delay is too long
• Improved: state machine monitoring
– Fine-grained, context-sensitive timeouts
– Faster error detection
– Better monitoring and control
Reliability/Scalability:
Improved Timeout Mechanisms
• Experiment: Measure expt. swap-in time,
with and without the improvements
– Synthetic but plausible scenario
• One node, loads an RPM (8 min. install)
• Node reboots, timeout during RPM install
– Reboots again, timeout again, mark node dead
– Try twice per swap-in, 3 swap-in attempts
– Total failure in 45 min., 3 nodes “dead”
Reliability/Scalability:
Improved Timeout Mechanisms
• With state machines:
– Timeouts: SHUTDOWN 2 min, BOOTING 3 min,
TBSETUP 10 min
• Node reboots, enters BOOTING
• 1 minute: Enters TBSETUP
• 9 minutes: Enters ISUP, expt. ready
• Succeeds, with no dead nodes or retries
• Cut time from 45 min. to 9 min. (80%)
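A quick check of the arithmetic behind the 80% claim, using the scenario numbers from these slides:

```python
# Old scheme: an 8-minute RPM install runs longer than the static
# 7-minute boot timeout, so every boot attempt "fails".
old_timeout_min = 7
tries_per_swapin = 2      # reboot, time out, reboot, time out, mark dead
swapin_attempts = 3
# Lower bound on wasted time under the old scheme; with reboot overhead
# the slides report total failure after about 45 minutes.
old_min = old_timeout_min * tries_per_swapin * swapin_attempts   # 42

# With per-state timeouts (SHUTDOWN 2, BOOTING 3, TBSETUP 10 minutes),
# the install fits inside TBSETUP and the swap-in succeeds in 9 minutes.
new_total = 9
reduction = 1 - new_total / 45        # vs. the reported 45-minute failure
print(old_min, f"{reduction:.0%}")    # 42 80%
```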
Limitations and Issues
• stated is critical infrastructure
– Another single point of failure
– More system complexity, new bugs,
complicated debugging
– Potential for scaling problems (none seen yet)
• Simple heuristics for error detection
– Send mail for invalid transitions
Summary of Results
• Explicit model requires careful thought
– Improves design and implementation
• Visualization makes it easier to understand
• Faster and more accurate error detection
• Better reliability helps scalability/efficiency
– Bigger expts. possible, less overhead per expt.
• Flexibility for evolution, workload generality
Related Work
• “Standard” Finite State Automata – basics
– Timed Automata – have global clock
• Message Sequence Charts (MSCs)
– “Scenarios” – hierarchy, like modes/machines
• UML Statecharts
– States have entry actions – “triggers”
– Hierarchical states – similar to modes
– Can model Emulab’s timeouts
Future Work
• Further developing distributed control
– Add monitoring, timeouts, triggers
• Better heuristics for error detection
– Only flag clustered or related errors
• Implement more ideas from other systems
– UML’s exit actions, guarded transitions, etc.
• Move code into database – i.e. triggers
– Easier to modify, framework code vs. machine
Conclusion
Enhancing Emulab with a flexible framework
based on state machines provides better
monitoring and control, and improves the
reliability, scalability, performance,
efficiency, and generality of the system.
Bonus Slides
Demonstrating Improvement
• Currently: programs have their own retry
and timeout mechanisms for node reboots
– No knowledge of progress, just completion
– Can cause failures by forcing a reboot, which
can damage file systems on node disk
• “New Way”: stated handles timeout and
retry during rebooting
– Implemented, not installed
– Knows if progress is being made
– Programs simply wait for ISUP or failure event
Demonstrating Improvement (cont’d)
• These failures directly hurt reliability
– Node failure can cause experiment setup to fail
• Significant impact on scaling, performance,
efficiency
• Maximum experiment size is limited by
node failure rate
– Failures make things take longer
– A slower system means less efficient use of
resources
Demonstrating Improvement (cont’d)
• Compare current vs. new: failure rate, time
to completion, etc.
• Test data; one of:
– Historical experiments
– Artificially high load
– Fault injection, e.g. reboots
• Why new way should help:
– Better knowledge for intelligent recovery
• Know when to wait longer and when to retry
– Shorter timeouts allow for early error detection
Future Work:
Modeling Indirect Interaction
• Occur between machines of different types
– Due to external relationships between the
entities tracked by each type of machine
• Examples:
– Same entity may be tracked in two different
types of machine
• Nodes are in Boot and Allocation machines
– Other relationship between entities
• Nodes may be “owned” or allocated to an
experiment – links Expt. Status and Node machines
Question and Answer