Disco

Kit Cischke
09/09/08
CS 5090

DISCO: RUNNING COMMODITY OPERATING SYSTEMS ON SCALABLE MULTIPROCESSORS

Overview
• Background
  • What are we doing here?
• A Return to Virtual Machine Monitors
  • What does Disco do?
• Disco: A Return to VMMs
  • How does Disco do it?
• Experimental Results
  • How well does Disco dance?

The Basic Problem
• With the explosion of multiprocessor machines, especially of the NUMA variety, the problem of effectively using the machines becomes more immediate.
  • NUMA = Non-Uniform Memory Access; shows up a lot in clusters.
• The authors point out that the problem applies to any major hardware innovation, not just multiprocessors.

Potential Solution
• Solution: Rewrite the operating system to address fault-tolerance and scalability.
• Flaws:
  • Rewriting will introduce bugs.
  • Bugs can disrupt the system or the applications.
  • Instabilities are usually less-tolerated on these kinds of systems because of their application space.
  • You may not have access to the OS.

Not So Good
• Okay. So that wasn’t so good. What else do we have?
• How about Virtual Machine Monitors?
• A new twist on an old idea, which may work better now that we have faster processors.

Enter Disco
• Disco is a system VM that presents a similar fundamental machine to all of the various OS’s that might be running on the machine.
• These can be commodity OS’s, uniprocessor, multiprocessor or specialty systems.

Disco VMM
• Fundamentally, the hardware is a cluster, but Disco introduces some global policies to manage all of the resources, which makes for better usage of the hardware.
• We’ll use commodity operating systems and write the VMM. Rather than millions of lines of code, we’ll write a few thousand.
• What if the resource needs exceed that of the commodity OS?

Scalability
• Very simple changes to the commodity OS (maybe on the driver level or kernel extension) can allow virtual machines to share resources.
  • E.g., a parallel database could have a cache in shared memory and multiple virtual processors running on virtual machines.
• Support for specialized OS’s that need the power of multiple processors but not all of the features offered by a commodity OS.

Further Benefits
• Multiple copies of an OS naturally address scalability and fault containment.
  • Need greater scaling? Add a VM.
  • Only the monitor and the system protocols (NFS, etc.) need to scale.
  • OS or application crashes? No problem. The rest of the system is isolated.
• NUMA memory management issues are addressed.
• Multiple versions of different OS’s provide legacy support and convenient upgrade paths.

Not All Sunshine & Roses
• VMM Overhead
  • Additional exception processing, instruction execution and memory to virtualize hardware.
  • Privileged instructions aren’t directly executed on the hardware, so we need to fake it. I/O requests need to be intercepted and remapped.
  • Memory overhead is rough too.
    • Consider having 6 copies of Vista in memory simultaneously.
• Resource Management
  • VMM can’t make intelligent decisions about code streams without info from the OS.

One Last Disadvantage
• Communication
  • Sometimes resources simply can’t be shared the way we want.
• Most of these can be mitigated though.
  • For example, most operating systems have good NFS support. So use it.
  • But… We can make it even better! (Details forthcoming.)

Introducing Disco
• VMM designed for the FLASH multiprocessor machine
  • FLASH is an academic machine designed at Stanford University.
  • It is a collection of nodes, each containing a processor, memory, and I/O. It uses directory cache coherence, which makes it look like a CC-NUMA machine.
• Has also been ported to a number of other machines.

Disco’s Interface
• The virtual CPU of Disco is an abstraction of a MIPS R10000.
  • Not only emulates but extends (e.g., reduces some kernel operations to simple load/store instructions).
• A presented abstraction of physical memory starting at address 0 (zero).
• I/O Devices
  • Disks, network interfaces, interrupts, clocks, etc.
  • Special interfaces for network and disks.

Disco’s Implementation
• Implemented as a multi-threaded shared-memory program.
• Careful attention paid to memory placement, cache-aware data structures and processor communication patterns.
• Disco is only 13,000 lines of code.
  • Windows Server 2003: ~50,000,000
  • Red Hat 7.1: ~30,000,000
  • Mac OS X 10.4: ~86,000,000

Disco’s Implementation
• The execution of a virtual processor is mapped one-for-one to a real processor.
  • At each context switch, the state of a processor is made to be that of a VP.
• On MIPS, Disco runs in kernel mode and puts the processor in the appropriate mode for what’s being run.
  • Supervisor mode for OS, user mode for apps
• Simple scheduler allows VP’s to be time-shared across the physical processors.

Disco’s Implementation
• Virtual Physical Memory
  • This discussion goes on for 1.5 pages. To sum up:
  • The OS makes requests to physical addresses, and Disco translates them to machine addresses.
  • Disco uses the hardware TLB for this.
  • Switching a different VP onto a new processor requires a TLB flush, so Disco maintains a 2nd-level TLB to offset the performance hit.
  • There’s a technical issue with TLBs, kernel space and the MIPS processor that threw them for a loop.

NUMA Memory Management
• In an effort to mitigate the nonuniform effects of a NUMA machine, Disco does a bunch of stuff:
  • Allocates as much memory as possible with “affinity” to a processor.
  • Migrates or replicates pages across virtual machines to reduce long memory accesses.

Virtual I/O Devices
• Obviously Disco needs to intercept I/O requests and direct them to the actual device.
• Primarily handled by installing drivers for Disco I/O in the guest OS.
• DMA provides an interesting challenge, in that the DMA addresses need the same translation as regular accesses.
• However, we can do some especially cool things with DMA requests to disk.

Copy-on-Write Disks
• All disk DMA requests are caught and analyzed. If the data is already in memory, we don’t have to go to disk for it.
• If the request is for a full page, we just update a pointer in the requesting virtual machine.
• So what?
  • Multiple VM’s can share data without being aware of it. Only modifying the data causes a copy to be made.
  • Awesome for scaling up apps by using multiple copies of an OS. Only really need one copy of the OS kernel, libraries, etc.

My Favorite: Networking
• The copy-on-write disk stuff is great for non-persistent disks. But what about persistent ones? Let’s just use NFS.
• But here’s a dumb thing: A VM has a copy of information it wants to send to another VM on the same physical machine. In a naïve approach, we’d let that data be duplicated, taking up extra memory pointlessly.
• So, let’s use copy-on-write for our network interface too!

Virtual Network Interface
• Disco provides a virtual subnet for VM’s to talk to each other.
• This virtual device is Ethernet-like, but with no maximum transfer size.
• Transfers are accomplished by updating pointers rather than actually copying data (until absolutely necessary).
• The OS sends out the requests as NFS requests.
• “Ah,” you say. “But what about the data locality as a VM starts accessing those files and memory?”
  • Page replication and migration!

About those Commodity OS’s
• So what do we really need to do to get these commodity operating systems running on Disco?
  • Surprisingly a lot and a little.
• Minor changes were needed to IRIX’s HAL, amounting to 2 header files and 15 lines of assembly code. This did lead to a full kernel recompile though.
• Disco needs device drivers. Let’s just steal them from IRIX!
• Don’t trap on every privileged register access. Convert them into normal loads/stores to a special address space, linked to the privileged registers.
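
Sketch: Copy-on-Write Sharing
The pointer-updating trick from the copy-on-write disk and network slides can be illustrated with a toy model. This is only a sketch under stated assumptions, not Disco's actual code: the `Monitor` class, its method names, and the one-page-per-disk-block simplification are all hypothetical. It shows a monitor that serves a repeated disk read from memory by mapping the same machine page into a second VM, and that makes a private copy only when a VM writes.

```python
"""Toy model of copy-on-write page sharing (hypothetical, not Disco's code)."""


class Monitor:
    def __init__(self, disk):
        self.disk = disk      # block number -> bytes (assume one page per block)
        self.pages = {}       # machine page number -> bytearray contents
        self.refs = {}        # machine page number -> reference count
        self.pmap = {}        # (vm, physical page) -> machine page number
        self.cached = {}      # block number -> machine page already holding it
        self.disk_reads = 0   # how often we really went to disk
        self._next = 0

    def _alloc(self, data):
        # Allocate a fresh machine page holding a copy of `data`.
        mpn = self._next
        self._next += 1
        self.pages[mpn] = bytearray(data)
        self.refs[mpn] = 0
        return mpn

    def _map(self, vm, ppn, mpn):
        # Point the VM's physical page at a machine page, fixing refcounts.
        old = self.pmap.get((vm, ppn))
        if old is not None:
            self.refs[old] -= 1
        self.pmap[(vm, ppn)] = mpn
        self.refs[mpn] += 1

    def disk_read(self, vm, ppn, block):
        # Intercepted full-page disk request: share the page if it is in memory.
        mpn = self.cached.get(block)
        if mpn is None:
            self.disk_reads += 1
            mpn = self._alloc(self.disk[block])
            self.cached[block] = mpn
            self.refs[mpn] += 1   # the cache itself holds one reference
        self._map(vm, ppn, mpn)

    def load(self, vm, ppn):
        return bytes(self.pages[self.pmap[(vm, ppn)]])

    def store(self, vm, ppn, offset, value):
        # A write to a shared page triggers the copy; others keep the original.
        mpn = self.pmap[(vm, ppn)]
        if self.refs[mpn] > 1:
            private = self._alloc(self.pages[mpn])
            self._map(vm, ppn, private)
            mpn = private
        self.pages[mpn][offset] = value


mon = Monitor({7: b"kernel text"})
mon.disk_read("vm1", 0, 7)          # goes to disk
mon.disk_read("vm2", 3, 7)          # served by remapping, no disk access
mon.store("vm1", 0, 0, ord("K"))    # vm1 gets a private copy
```

Because the cache holds its own reference, the first write by any VM always copies, so the in-memory image of the disk block stays clean for later readers. A real monitor would also have to handle write-back, eviction, and partial-page requests, which this sketch ignores.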

More Patching
• “Hinting” added to HAL to help the VMM not do dumb things (or at least do fewer dumb things).
• When the OS goes idle, the MIPS (usually) defaults to a low power mode. Disco just stops scheduling the VM until something interesting happens.
• Other minor things were done, but that required patching the kernel.

SPLASHOS
• Some high-performance apps might need most or all of the machine. The authors wrote a “thin” operating system to run SPLASH-2 applications.
• Mostly proof-of-concept.

Experimental Results
• Bad Idea: Target your software for a machine that doesn’t physically exist.
  • Like, I don’t know, FLASH?
• Disco was validated using two alternatives:
  • SimOS
  • SGI Origin200 board that will form the basis of FLASH

Experimental Design
• Use 4 representative workloads for parallel applications:
  • Software Development (Pmake of a large app)
  • Hardware Development (Verilog simulator)
  • Scientific Computing (Raytracing and a sorting algorithm)
  • Commercial Database (Sybase)
• Not only are they representative, but they each have characteristics that are interesting to study.
  • For example, Pmake is multiprogrammed, with lots of short-lived processes, and is OS & I/O intensive.

Simplest Results Graph
• Overhead of Disco is pretty modest compared to the uniprocessor results.
• Raytrace is the lowest, at only 3%. Pmake is the highest, at 16%.
• The main hits come from additional traps and TLB misses (from all the flushing Disco does).
• Interestingly, less time is spent in the kernel in Raytrace, Engineering and Database.
• Running a 64-bit system mitigates the impact of TLB misses.

Memory Utilization
• Key thing here is how 8 VM’s don’t require 8x the memory of 1 VM.
• Interestingly, we have 8 copies of IRIX running in less than 256 MB of physical RAM!

Scalability
• Page migration and replication were disabled for these runs.
• All use 8 processors and 256 MB of memory.
• IRIX has a terrible bottleneck in synchronizing the system’s memory management code.
• It also has a “lazy” evaluation policy in the virtual memory system that drags “normal” RADIX down.
• Overall though, check out those performance gains!

Page Migration Benefits
• The 100% UMA results give a lower bound on execution time, and so an upper bound on the gains available from page migration and replication.
• But in short, the policies work great.

Real Hardware
• Experiences on the real SGI hardware pretty much confirm the simulations, at least at the uniprocessor level.
• Overheads tend to be in the range of 3-8% on Pmake and the Engineering simulation.

Summing Up
• Disco works pretty well.
• Memory usage scales well, processor utilization scales well.
• Performance overheads are relatively small for most loads.
• Lots of engineering challenges, but most seem to have been overcome.

Final Thoughts
• Everything in this paper seems, in retrospect, to be totally obvious. However, the combination of all of these factors seems like it would have taken just a ton of work.
  • Plus, I don’t think I could have done it half as well, to be honest.
• Targeting a non-existent machine seems a little silly.
• Overall, interesting paper.
