XEN AND THE ART OF VIRTUALIZATION
P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield
UCCL
SOSP 2003

Paper highlights
• A very efficient virtual machine hypervisor
• Main objectives were
    – Low overhead
    – Scalability
• Key ideas
    – Paravirtualization: faster but requires changes to the guest OS
    – Use of x86 protection rings

Virtual machines
• Let different operating systems run at the same time on a single computer
    – Windows, Linux and Mac OS
    – A real-time OS and a conventional OS
    – A production OS and a new OS being tested

How it is done
• A hypervisor/VM monitor defines two or more virtual machines
• Each virtual machine has
    – Its own virtual CPU
    – Its own virtual physical memory
    – Its own virtual disk(s)

The virtualization process
[Figure: the hypervisor maps virtual hardware #1 and virtual hardware #2, each with its own CPU, memory and disk, onto the actual hardware (CPU, memory, disk)]

Reminder
• In a conventional OS,
    – Kernel executes in privileged/supervisor mode
        • Can do virtually everything
    – User processes execute in user mode
        • Cannot modify their page tables
        • Cannot execute privileged instructions

A conventional architecture
[Figure: user processes run in user mode and enter the kernel, which runs in privileged mode, through system calls]

Two virtual machines
[Figure: two VMs, each with its user processes and its VM kernel, run in user mode; only the hypervisor runs in privileged mode]

Explanations (II)
• Whenever the kernel of a VM issues a privileged instruction, an interrupt occurs
    – The hypervisor takes control and does the physical equivalent of what the VM attempted to do:
        • Must convert virtual RAM addresses into physical RAM addresses
        • Must convert virtual disk block addresses into physical block addresses

Translating a block address
[Figure: the VM kernel requests "Access block x, y of my virtual disk"; the hypervisor determines "That's block v, w of the actual disk" and issues "Access block v, w of actual disk" to the actual disk]

Handling I/Os
• Difficult task because
    – Wide variety of devices
    – Some devices may be shared among several VMs
        • Printers
        • Shared disk partition
            – Want to let Linux and Windows access the same files

Virtual Memory Issues
• Each VM kernel manages its own memory
    – Its page tables map program virtual addresses into what it believes to be physical addresses

The dilemma
[Figure: the VM kernel of user process A states "Page 735 of process A is stored in page frame 435"; the hypervisor knows "That's page frame 993 of the actual RAM"]

The solution (I)
• Address translation must remain fast!
    – Hypervisor lets each VM kernel manage its own page tables but does not use them
        • They contain bogus mappings!
    – It maintains instead its own shadow page tables with the correct mappings
        • Used to handle TLB misses

Why it works
• Most memory accesses go through the TLB
• The system can tolerate slower page table updates

The solution (II)
• To keep its shadow page tables up to date, the hypervisor must track any changes made by the VM kernels
• Mark page tables read-only
    – Each attempt to update them by a VM kernel results in an interrupt

Nastiest Issue
• The whole VM approach assumes that a kernel executing in user mode will behave exactly like a kernel executing in privileged mode, except that privileged instructions will be trapped
• Not true for all architectures!
    – Intel x86 Pop flags (POPF) instruction
    – …

The VMware Solution
• Mask the issue through clever software
• Dynamic "binary translation" when direct execution of code would not work

The Xen Solution
• Presenting a virtual machine abstraction that is "similar but not identical to the underlying hardware"
    – Paravirtualization
• Big advantage is faster performance
• Big limitation is the need to modify the guest operating system

Impact on Guest OS
• Had to modify
    – 2,995 lines of Linux code
        • 1.36 % of total x86 code base
    – 4,620 lines of Windows XP code
        • 0.04 % of total x86 code base

Memory management
• Virtual machine exported by the hypervisor is not identical to a physical machine
    – Share of physical memory of each virtual machine may consist of non-contiguous pages

Xen Tenets
• Support for unmodified application binaries is essential
• Supporting full multi-application guest OSes is important
    – Raises guest OS protection issues
• Paravirtualization is necessary to achieve high performance
• Bad idea to hide the effect of virtualization from guest OSes

Xen Memory Management
• Complicated because x86 TLB
    – Is hardware-managed
    – Has no tags identifying process address spaces
        • Need to flush the TLB at each context switch

Clever Trick
• The top 64 MB region of each address space is reserved for Xen
    – Can execute Xen code without changing the page map and flushing the TLB

Guest OS protection issues
• Must prevent user applications from altering the guest OS
    – No good solution if guest OS kernel runs in user mode
• Xen takes advantage of the x86 ring architecture

x86 Protection Rings
• Concept pioneered by MULTICS
• Multiple levels of protection
    – Level 0 can do everything
    – Level 1 can interfere with levels 2 and 3 but cannot interfere with level 0
    – Level 2 can interfere with level 3 but cannot interfere with levels 0 and 1
    – Level 3 has no special privileges

With Conventional OSes
[Figure: protection rings, with the kernel in ring 0 and user processes in ring 3]
Rings 1 and 2 are not used

With Xen
[Figure: protection rings occupied by Xen, the guest OS and user processes]
Guest OSes
run in ring 1

Control transfer (I)
• Hypercalls:
    – Synchronous calls from a domain to the Xen hypervisor
    – Implemented through a software trap mechanism
        • Same as conventional system calls

Control transfer (II)
• From Xen to domains:
    – Asynchronous event mechanism
        • Akin to Unix signals
        • Small number of events

Data transfer between rings
• There is now an additional protection domain between guest OSes and I/O devices
    – Need a fast mechanism for handling data transfers

Subsystem virtualization
• CPU:
    – Uses the borrowed virtual time scheduling algorithm (BVT)
• Time and timers:
    – Guest domains have access to both virtual time and real time
• Virtual address translation:
    – Xen is only involved in page table updates

Subsystem virtualization
• Privileged instructions:
    – Validated and executed by Xen

Performance Comparison
Higher values are better!
Key
• L is for native Linux (upper bound)
• X is for XenoLinux (Xen + Linux)
• V is for VMware Workstation 3.2 + Linux
• U is for User-Mode Linux (a port of Linux that runs in user mode on top of Linux)

Conclusions
• Xen is fast!
• The similar performance of all four solutions on the SPEC 2000 benchmark (the one on the left) should not be surprising:
    – This benchmark is CPU-bound, performs infrequent I/O and interacts very little with the OS
    – OS performance is essentially irrelevant
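
Sketch: shadow page tables

To make the shadow page table idea from the virtual memory slides concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Xen or VMware code: the class `ShadowPager`, its method names and the dictionary layout are all hypothetical. It reuses the numbers from "The dilemma": the VM kernel believes page 735 lives in page frame 435, which is really page frame 993 of the actual RAM.

```python
# Hypothetical sketch: names and data layout are illustrative only,
# not taken from Xen, VMware or any real hypervisor.

class ShadowPager:
    """Keeps a shadow page table in sync with one guest page table."""

    def __init__(self, guest_to_machine):
        # Guest "physical" frame -> actual machine frame (known only to the hypervisor).
        self.guest_to_machine = dict(guest_to_machine)
        # Virtual page -> guest "physical" frame (what the VM kernel believes).
        self.guest_table = {}
        # Virtual page -> machine frame (the correct mapping used for translation).
        self.shadow_table = {}

    def guest_update(self, virtual_page, guest_frame):
        """Models the interrupt raised when the VM kernel writes its read-only page table."""
        if guest_frame not in self.guest_to_machine:
            raise ValueError(f"guest frame {guest_frame} does not belong to this VM")
        # Record the guest's own (bogus) mapping, then refresh the shadow entry.
        self.guest_table[virtual_page] = guest_frame
        self.shadow_table[virtual_page] = self.guest_to_machine[guest_frame]

    def translate(self, virtual_page):
        """Models the lookup done on a TLB miss: only the shadow table is consulted."""
        return self.shadow_table[virtual_page]


pager = ShadowPager({435: 993})
pager.guest_update(735, 435)   # VM kernel: "page 735 is stored in page frame 435"
frame = pager.translate(735)   # hypervisor's answer: the actual page frame
print(frame)
```

In this sketch updates are the slow path, since each one goes through the hypervisor, while translations touch only the shadow table. That matches the "Why it works" argument: most accesses hit the TLB, and the system can tolerate slower page table updates.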