Pervasive Debugging
http://www.cl.cam.ac.uk/netos/pdb/

The Problem

Understanding and correcting the behaviour of parallel and distributed applications is hard. It is difficult to control the processes spread out over multiple nodes. It is difficult to investigate the operating system kernels, network links and other devices which are used.

Existing tools can be characterized as peer debugging and utilize a “just a bunch of debuggers” architecture:

[Figure: peer-debugging architecture; a separate gdb instance attached to each target.]

- A traditional debugger or monitor runs on each node.
- A central console issues commands to them.
- Features are limited by the back-end debuggers.
- This approach is taken in popular tools such as p2d2.

[Figure: debugger user-interface screenshot; labelled panels: process grid, source code & stack of focus process, detail, program output, performance.]

These systems can provide good data visualization, consistent user interfaces between machines, and abstraction over PVM, MPI etc. They do not provide:

- Multi-node breakpoints. “Stop if more than one node thinks it is the leader.”
- Network-related or device-related breakpoints. “Stop if a packet is dropped.”
- Atomic suspension of entire systems.
- Deterministic re-execution.
- Debugging across multiple layers of abstraction.
- Evaluation of performance outside the physical system’s parameters.

[Figure: pervasive debugging architecture; labels: user interface, virtual node 1, virtual node 2, virtual network and devices, pervasive debugger, physical hardware, targets, gdb.]

The entire system executes in a virtual environment that is under the control of the debugger. This is complementary to existing tools: where appropriate, re-use their visualization features and control interfaces. A range of implementation techniques is possible, with different trade-offs.
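The multi-node breakpoint quoted above, “Stop if more than one node thinks it is the leader”, amounts to a predicate evaluated over the state of every node at the same instant. The following is a minimal sketch of that idea in Python, not the project's implementation: `NodeState`, `Snapshot` and `first_violation` are hypothetical names, and it assumes the debugger can take a consistent snapshot of all nodes while the system is suspended.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node state visible to the debugger."""
    node_id: int
    is_leader: bool


# Assumption: a snapshot maps node id -> state, captured while
# every node is stopped, so the states are mutually consistent.
Snapshot = Dict[int, NodeState]


def more_than_one_leader(snapshot: Snapshot) -> bool:
    """Multi-node condition: two or more nodes believe they are the leader."""
    return sum(1 for state in snapshot.values() if state.is_leader) > 1


def first_violation(trace: List[Snapshot],
                    condition: Callable[[Snapshot], bool]) -> int:
    """Return the index of the first snapshot where the condition holds, or -1."""
    for step, snapshot in enumerate(trace):
        if condition(snapshot):
            return step
    return -1


# Example trace: node 3 wrongly promotes itself at step 1.
example_trace = [
    {1: NodeState(1, True), 2: NodeState(2, False), 3: NodeState(3, False)},
    {1: NodeState(1, True), 2: NodeState(2, False), 3: NodeState(3, True)},
]
print(first_violation(example_trace, more_than_one_leader))
```

A per-node debugger cannot evaluate `more_than_one_leader` directly, because no single back-end sees more than one node's state. This is why the condition needs a whole-system view.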
Our Approach

Pervasive debugging conceptually places the debugger below the entire distributed system. The debugger genuinely operates at a whole-application level rather than a per-node level:

- The entire system can be stopped atomically.
- Breakpoints can relate to multi-node conditions.
- An arbitrary inter-node network can be emulated.

Run all of the virtual nodes within a single hardware-level or instruction-level simulator, e.g. Bochs. Suitable for small systems with fine-grained concurrency, e.g. mutex or work-queue implementations.

Run the virtual nodes over a high-performance virtual machine monitor, e.g. Xen.

Explore the federation of multiple virtual machine monitors on separate physical machines and the use of timewarp-simulation style execution.

Steven Hand, Tim Harris, Alex Ho, Rashid Mehmood
{firstname.lastname}@cl.cam.ac.uk

Implementation

We’re building a prototype pervasive debugger using the Xen virtual machine monitor (developed under the Programmable Networks programme).

- z/VM-style virtualization on an uncooperative x86 architecture.
- Support full-featured multi-user multi-application OSes.
- OSes are ported to a new ‘x86-xeno’ architecture.
- Similar to x86, but call to Xen for privileged operations.
- Most kernel code executes directly.
- Porting requires access to kernel source code.
- Exposing real resources important for correctness and performance.
- Retain compatibility with OS ABIs.
- All application code & libraries execute directly and without recompilation.
- Run unmodified Linux, Windows XP and BSD application binaries and libraries.

Current directions

Our research is broadly split into two categories. The first is developing implementation methods for high performance pervasive debugging. The second is exploring the new techniques enabled by pervasive debugging.
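As an illustration of the kind of technique an emulated inter-node network enables, a network-related breakpoint such as “Stop if a packet is dropped” becomes an ordinary check inside the emulator. The sketch below is a toy model in Python under stated assumptions, not the Xen-based prototype: `VirtualNetwork`, `Packet`, `NetworkBreakpoint` and `run` are hypothetical names, and packet loss is modelled with a seeded pseudo-random generator so that a run can be repeated exactly.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src: int
    dst: int
    payload: str


class NetworkBreakpoint(Exception):
    """Hypothetical signal that suspends the emulated system at a network event."""

    def __init__(self, event: str, packet: Packet):
        super().__init__(f"{event}: {packet}")
        self.event = event
        self.packet = packet


class VirtualNetwork:
    """Toy emulated link that drops packets with a fixed probability."""

    def __init__(self, drop_probability: float, seed: int,
                 break_on_drop: bool = False):
        self.drop_probability = drop_probability
        # A fixed seed makes the sequence of drops identical on every run.
        self.rng = random.Random(seed)
        self.break_on_drop = break_on_drop
        self.delivered: List[Packet] = []

    def send(self, packet: Packet) -> bool:
        """Deliver the packet, or drop it (optionally raising a breakpoint)."""
        if self.rng.random() < self.drop_probability:
            if self.break_on_drop:
                raise NetworkBreakpoint("packet dropped", packet)
            return False
        self.delivered.append(packet)
        return True


def run(seed: int) -> Optional[int]:
    """Send ten packets; return the index of the first drop, or None."""
    net = VirtualNetwork(drop_probability=0.3, seed=seed, break_on_drop=True)
    for i in range(10):
        try:
            net.send(Packet(src=1, dst=2, payload=f"msg-{i}"))
        except NetworkBreakpoint:
            return i
    return None


# The same seed stops at the same packet on every run.
print(run(7) == run(7))
```

Because the drop decision is made by the emulator rather than by physical hardware, the same seed reproduces the same drop. This is the toy analogue of combining network-related breakpoints with deterministic re-execution.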
Integration with existing tools and resources from existing serial and distributed debuggers
- To enable re-use of user interfaces from p2d2 while executing the system under test within a fully controllable environment.
- To enable controlled communication outside the virtual environment, e.g. access to name services.

Network virtualization
- Description of the topology and behaviour of the network connecting the virtual nodes.
- Validation of behaviour observed within the pervasive debugger against behaviour in real systems.

System-wide assertions
- Definition of primitive events generated within each node (e.g. temporal events such as breakpoints, or spatial events such as watchpoints triggered by changes to shared memory or disk storage).
- Event-composition to build up system-level events.

[Figure: diagram accompanying system-wide assertions; labels not recoverable.]

[Figure: Performance of Xen versus native execution. Relative performance (axis 0.0 to 1.1) for SPEC INT2000 (score), Linux build time (s), OSDB-OLTP (tup/s) and SPEC WEB99 (score). Benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U).]

[Figure: Scalability of Xen hosting multiple virtual nodes. Simultaneous SPEC WEB99 instances (2, 4, 8, 16) on Linux (L) and Xen (X); axis 0 to 1000.]

Stress testing and fault injection
- The virtual network could introduce extreme load or packet reordering: allows testing of corner cases rarely seen in practice.

Live debugging
- There’s actually no need to suspend the virtual nodes while the debugger is attached.
- Can use the debug interface for general visualization and inspection during execution.

Federating execution over multiple physical machines
- The controllable environment provided by Xen should allow timewarp-simulation style execution.
- Speculatively run nodes in parallel.
- Use checkpointing and re-execution to deal with causality problems.

References
- T. Harris, “Dependable Computing Needs Pervasive Debugging”, Proceedings of the 2002 ACM SIGOPS European Workshop, September 2002.
- A. Ho, “Pervasive Debugging: A Fresh Approach to Understanding Distributed Applications”, 8th Cabernet Radicals Workshop, October 2003.
- P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the Art of Virtualization”, Proceedings of the 19th ACM Symposium on Operating Systems Principles, October 2003.