Pervasive Debugging

advertisement
Pervasive Debugging
The Problem
Existing tools can be characterized as peer debugging
and utilize a “just a bunch of debuggers” architecture:
gdb
gdb
Target
Target
gdb
gdb
Target
Target
A traditional debugger or monitor runs on each node
ÿA central console issues command to them.
ÿFeatures are limited by the back-end debuggers.
ÿThis approach is taken in popular tools such as p2d2.
User
User
Interface
interface
Virtual
Virtual
Node
Node11
Network
Network
Pervasive
Pervasive
Debugger
debugger
Target
Target
Network
Network
Target
Target
Target
Target
gdb
gdb
Target
Target
This is complementary to existing tools: where
appropriate re-use their visualization features and
control interfaces.
User
User
Interface
Interface
Virtual
Virtual
Node
Node22
Virtual
VirtualNetwork
network
and
andDevices
devices
Physical
PhysicalHardware
hardware
Process grid
Source code &
Stack of focus
process
The entire system executes in a virtual environment
that is under the control of the debugger. A range of
implementation techniques are possible with different
trade-offs:
These systems can provide good data visualization,
consistent user interfaces between machines, abstraction
over PVM, MPI etc.
They do not provide:
ÿMulti-node breakpoints.
“Stop if more than one node thinks it is the leader.”
ÿNetwork-related or device-related breakpoints.
“Stop if a packet is dropped.”
ÿAtomic suspension of entire systems.
ÿDeterministic re-execution.
ÿDebugging across multiple layers of abstraction.
ÿEvaluation of performance outside the physical
system’s parameters.
Detail
Program output
Performance
http://www.cl.cam.ac.uk/netos/pdb/
Understanding and correcting the behaviour of parallel
and distributed applications is hard. It is difficult to
control the processes spread out over multiple nodes. It
is difficult to investigate the operating system kernels,
network links and other devices which are used.
Our Approach
Pervasive debugging conceptually places the
debugger below the entire distributed system. The
debugger genuinely operates at a whole-application
level rather than a per-node level:
ÿThe entire system can be stopped atomically.
ÿBreakpoints can relate to multi-node conditions.
ÿAn arbitrary inter-node network can be emulated.
Run all of the virtual nodes within
a single hardware-level or
instruction-level simulator, e.g.
Bochs. Suitable for small systems
with fine-grained concurrency –
e.g. mutex or work-queue
implementations.
Run the virtual nodes over a highperformance virtual machine
monitor, e.g. Xen.
Explore the federation of multiple
virtual machine monitors on separate
physical machines and the use of
timewarp-simulation style execution.
Steven Hand, Tim Harris, Alex Ho, Rashid Mehmood
{firstname.lastname}@cl.cam.ac.uk
Implementation
We’re building a prototype pervasive debugger using the Xen
virtual machine monitor (developed under the Programmable
Networks programme).
ÿz/VM-style virtualization on an uncooperative x86 architecture.
ÿSupport full-featured multi-user multi-application OSes.
ÿOSes are ported to a new ‘x86-xeno’ architecture.
ÿSimilar to x86, but call to Xen for privileged operations.
ÿMost kernel code executes directly.
ÿPorting requires access to kernel source code.
ÿExposing real resources important for correctness and
performance.
ÿRetain compatibility with OS ABIs.
ÿAll application code & libraries execute directly and without
recompilation.
ÿRun unmodified Linux, Windows XP and BSD application
binaries and libraries.
Current directions
Our research is broadly split into two categories.
The first is developing implementation methods for high
performance pervasive debugging. The second is exploring
the new techniques enabled by pervasive debugging.
Integration with existing tools and resources from
existing serial and distributed debuggers
ÿTo enable re-use of user-interfaces from p2d2 while
executing the system under test within a fully controllable
environment.
ÿTo enable controlled communication outside the virtual
environment, e.g. access to name services.
Network virtualization
ÿDescription of the topology and behaviour of the network
connecting the virtual nodes.
ÿValidation of behaviour observed within the pervasive
debugger against behaviour in real systems.
System-wide assertions
3)*45&#67
3)*45&#68
3)*45&#69
ÿDefinition of primitive
:
;
<
=
events generated within
each node (e.g. temporal
events such as breakpoints,
>6?6
67@:A<B
C6?668@;A=B
or spatial events such as
watchpoints triggered by
D6?6
69@>ACB
changes to shared memory
or disk storage).
ÿEvent-composition to build up system-level events.
Performance of Xen versus native execution
1.1
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
L
X
V
U
SPEC INT2000 (score)
L
X
V
U
Linux build time (s)
L
X
V
U
OSDB-OLTP (tup/s)
L
X
V
U
SPEC WEB99 (score)
Benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U)
Scalability of Xen hosting multiple virtual nodes
1000
800
600
400
200
0
L
X
2
L
X
4
L
X
8
L
X
16
Simultaneous SPEC WEB99 Instances on Linux (L) and Xen(X)
Stress testing and fault injection
ÿ The virtual network could introduce extreme load or
packet-reordering: allows testing of corner-cases rarely
seen in practice.
Live debugging
ÿ There’s actually no need to suspend the virtual nodes while
the debugger is attached
ÿ Can use the debug interface for general visualization and
inspection during execution.
Federating execution over multiple physical machines
ÿ The controllable environment provided by Xen should
allow timewarp-simulation style execution.
ÿ Speculatively run nodes in parallel.
ÿ Use checkpointing and re-execution to deal with causality
problems.
References
ÿ T. Harris, “Dependable Computing Needs Pervasive
Debugging”,Proceedings of the 2002 ACM SIGOPS
European Workshop, September 2002.
ÿ A. Ho, “Pervasive Debugging: A Fresh Approach to
Understanding Distributed Applications”, 8th Cabernet
Radicals Workshop, October 2003.
ÿ P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris,
A. Ho, R. Neugebauer, I. Pratt, and A. Warfield,
“Xen and the Art of Virtualization”, Proceedings of the
19th ACM Symposium on Operating Systems Principles,
October 2003.
Download