Virtualization … from a reliability perspective DAIMI Henrik Bærbak Christensen 1 Credits Some slides from – E6998 - Virtual Machines Lecture 2 CPU Virtualization Scott Devine VMware, Inc. DAIMI Henrik Bærbak Christensen 2 Agenda What is it? – Overview – Classifications – Virtualization of CPU Virtualization to improve reliability – Fault Detection and Removal • Configuration Testing • Distribution Testing • Record-n-Playback – Fault Tolerance • Recovery by snapshots • Lock-step execution DAIMI Henrik Bærbak Christensen 3 Classes of Virtualization DAIMI Henrik Bærbak Christensen 4 What is it? vir•tu•al (adj): – existing in essence or effect, though not in actual fact Example – ScummVM is a program which allows you to run certain classic graphical point-and-click adventure games, provided you already have their data files. The clever part about this: ScummVM just replaces the executables shipped with the games, allowing you to play them on systems for which they were never designed! DAIMI Henrik Bærbak Christensen 5 A Physical Machine Hardware – Processors, devices, memory, etc. Software – Built to the given hardware (Instruction Set Architecture, e.g. x86) – Built to given OS (App. Programming Interface, e.g. Win XP) – OS controls hardware DAIMI Henrik Bærbak Christensen 6 A Virtual Machine Hardware Abstraction – Virtual processor, memory, devices, etc. Virtualization Software – Indirection: Decouple hardware and OS – Multiplex physical hardware across guest VMs DAIMI Henrik Bærbak Christensen 7 Why VMs? Many advantages (often business related) – Utilization of resources • VM1 that idles automatically free resources to VM2 – Of course not the case for physical machines – Isolation • Crash, virus, in VM1 does not propagate to VM2 – Encapsulation • • • • A VM is a file: (OS,Apps,Data,Config, run-time state) Content distribution: Demos as snapshot Snapshot provides recovery point Migration to new physical machines by snapshot/restore – In e.g. server farms DAIMI Henrik Bærbak Christensen 8 Why VMs? More advantages – Maintenance costs • Maintain 100 Linux-based web server machines versus a few big VM platforms – Legacy VMs • Run ancient apps (like DOS or SCUMM-based games) – Create once, run anywhere • No configuration issues – E.g. complex setup of app suite of database, servers, etc. – … and some reliability related issues that we will come back to… DAIMI Henrik Bærbak Christensen 9 Classification DAIMI Henrik Bærbak Christensen 10 Types of VMs Smith & Nair 2005 Two superclasses – Process VM – System VM Both can be subclassed based upon supporting virtualization of same or different ISA (Instruction Set Architecture). DAIMI Henrik Bærbak Christensen 11 Process / System VM DAIMI Henrik Bærbak Christensen 12 Process VM The process VM puts the virtualization border line at the process line. – A application process/program executes in a • Logical address space (assigned/handled by the OS) • Using user-level instructions and registers (CPU user-mode) • Only do IO using OS calls or High-Level Library(HHL) calls Thus a process VM becomes a virtual OS in which a process may execute. Example: Different ISA Process VM – JavaVM defines its own ISA (stack-based) as well as normal OS operations DAIMI Henrik Bærbak Christensen 13 System VM The system VM sets the virtualization border at the system or hardware line – A system/OS executes in a • Physical memory space • Using the full ISA of the underlying machine • Interact directly with IO Thus a system VM becomes a virtual machine in which a system as a whole may execute Example: Same ISA System VM – VMWare: Executes Guest OS that is running on the x86 ISA. DAIMI Henrik Bærbak Christensen 14 Other examples System VM Different ISA – VirtualPC running Windows on Mac Classic VMs – Virtual Machine Monitor (VMM) directly on hardware • VMWare ESX • Drivers Hosted VMs – VMM runs as an application in an OS • VMWare Workstation • Drivers DAIMI Henrik Bærbak Christensen 15 Exercise How would you classify ScummVM? DAIMI Henrik Bærbak Christensen 16 How does it work? Just a glimpse DAIMI Henrik Bærbak Christensen 17 Definition [Rosenblum & Garfinkel 2005] – A CPU architecture is virtualizable if it supports the basic VMM technique of direct execution—executing the virtual machine on the real machine, while letting the VMM retain ultimate control of the CPU. That is, in a ‘same ISA’, execution of a guest OS instructions should be executed directly by the hardware – Why? Performance of course! – Ex. JavaVM poses a performance penalty DAIMI Henrik Bærbak Christensen 18 Challenges The problem is that not all instructions are possible to execute directly! To see the problem we have to dig a bit into modern CPU architectures – I stopped around Z80 and Motorola 68000 – 80286: • First with protected mode… • Support multitasking, process protection, memory mgt., … DAIMI Henrik Bærbak Christensen 19 A few CPU terms Supervisor mode/Privileged mode: – “An execution mode on some processors which enables execution of all instructions, including privileged instructions. It may also give access to a different address space, to memory management hardware and to other peripherals. This is the mode in which the operating system usually runs.” (Wikipedia) DAIMI Henrik Bærbak Christensen 20 Supervisor/User mode x86 architecture – Ring 0: Kernel mode • Can do anything • OS and device drivers – Ring 3: User mode • Limited Instruction Set • Apps can fail at any time without impact on rest of system! • User applications User apps must do system call (call OS) to interact with hardware like device drivers... DAIMI Henrik Bærbak Christensen 21 Performance Switching from “user mode” to “kernel mode” is, in most existing systems, very expensive. It has been measured, on the basic request getpid, to cost 1000-1500 cycles on most machines. DAIMI Henrik Bærbak Christensen 22 Trapping What happens if code in user mode executes a privileged instruction? – In computing and operating systems, a trap is a type of synchronous interrupt typically caused by an exceptional condition (e.g. division by zero or invalid memory access) in a user process. A trap usually results in a switch to kernel mode, wherein the operating system performs some action before returning control to the originating process. DAIMI Henrik Bærbak Christensen 23 Virtualization Requirements If VMWare is an app, running in Windows, and it runs the Linux OS “inside”, it follows that – Privileged instructions have to run in user-mode! VVM: Virtual Machine Monitor – VVM runs in priviledged mode – All ‘inside’ runs in user mode DAIMI Henrik Bærbak Christensen 24 VMM and privileged instructions OK, so… – Linux runs in user mode inside a VMM – Now the Linux kernel disables interrupts (CLI) • Which is certainly a privileged instruction which is not allowed in user mode So – what can we do about that? – Emulation – Trap-and-Emulate – Binary Translation DAIMI Henrik Bærbak Christensen 25 Emulation Example: CPUState static struct { uint32 GPR[16]; uint32 LR; uint32 PC; int IE; int IRQ; } CPUState; void CPU_CLI(void) { CPUState.IE = 0; } void CPU_STI(void) { CPUState.IE = 1; } CLI=Clear Interrupt Flag STI=Set Interrupt Flag IE = Interrupt Enabled Goal for CPU virtualization techniques – Process normal instructions as fast as possible – Forward privileged instructions to emulation routines DAIMI Henrik Bærbak Christensen 26 Instruction Interpretation Emulate Fetch/Decode/Execute pipeline in software Positives – Easy to implement – Minimal complexity Negatives – Slow! DAIMI Henrik Bærbak Christensen 27 Example: Virtualizing the Interrupt Flag w/ Instruction Interpreter void CPU_Run(void) { while (1) { inst = Fetch(CPUState.PC); CPUState.PC += 4; switch (inst) { case ADD: CPUState.GPR[rd] = GPR[rn] + GPR[rm]; break; … case CLI: CPU_CLI(); break; case STI: CPU_STI(); break; } void CPU_CLI(void) { CPUState.IE = 0; } void CPU_STI(void) { CPUState.IE = 1; } void CPU_Vector(int exc) { CPUState.LR = CPUState.PC; CPUState.PC = disTab[exc]; } if (CPUState.IRQ && CPUState.IE) { CPUState.IE = 0; CPU_Vector(EXC_INT); } } } DAIMI Henrik Bærbak Christensen 28 Guest OS + Applications Page Fault Unprivileged Trap and Emulate Undef Instr MMU Emulation CPU Emulation I/O Emulation Privileged vIRQ Virtual Machine Monitor DAIMI Henrik Bærbak Christensen 29 The protocol… The Linux kernel code contains CLI – As in user mode, the CPU traps and transfer control to the VVM – VVM sets internal flag that interrupts are disabled • Interrupts are now not delivered to the guest OS until it calls the STI • which again traps and the VVM clears the internal flag! • From that time on, interrupts are again delivered. Trap-and-Emulate: – All user mode instructions execute full speed – Privileged instructions are trapped and emulated… DAIMI Henrik Bærbak Christensen 30 Issues with Trap and Emulate Not all architectures support it Trap costs may be high – Cf. performance measurements earlier DAIMI Henrik Bærbak Christensen 31 Challenges… The x86 is not virtualizable – i.e. there are privileged instructions that do not trap! Ex. – POPF = pop CPU flag from stack • Contains the interrupt flag bit and thus can enable/disable interrupts without the VVM being notified! Require advanced techniques to cope beyond trap-and-emulate… – Para Virtualization – Binary Translation DAIMI Henrik Bærbak Christensen 32 Para Virtualization Para virtualization = replacing non-virtualizable portions of the original instruction set with easily virtualizable and more efficient equivalents. – OS code has to be ported! – User apps run unmodified. Ex. Disco: Change MIPS interrupt instruction to read/write of special memory location in the VVM – More efficient if handled by Trapping – Iris OS had to be ported… DAIMI Henrik Bærbak Christensen 33 Binary Translation Basic idea: – Read code blocks before the CPU gets them – For each instruction classify • “Ident”: copy directly to translation cache (TC) • “Inline”: replace instruction by inline equivalent instructions and copy these to TC • “Callouts”: replace instruction by call to emulation code in VVM – Let CPU execute contents of translation cache instead of original block TC: keep the translated block for future execution without any translation DAIMI Henrik Bærbak Christensen 34 Basic Blocks Guest Code vPC mov ebx, eax cli and ebx, ~0xfff mov ebx, cr3 Straight-line code Basic Block sti ret DAIMI Control flow Henrik Bærbak Christensen 35 Binary Translation Guest Code vPC mov ebx, eax cli DAIMI Translation Cache mov ebx, eax call HANDLE_CLI and ebx, ~0xfff and ebx, ~0xfff mov ebx, cr3 mov [CO_ARG], ebx sti call HANDLE_CR3 ret call HANDLE_STI jmp HANDLE_RET Henrik Bærbak Christensen start 36 Binary Translation Guest Code vPC mov ebx, eax cli Translation Cache mov ebx, eax mov [CPU_IE], 0 and ebx, ~0xfff and ebx, ~0xfff mov ebx, cr3 mov [CO_ARG], ebx sti call HANDLE_CR3 ret mov [CPU_IE], 1 test [CPU_IRQ], 1 start jne DAIMI call HANDLE_INTS jmp HANDLE_RET Henrik Bærbak Christensen 37 CPU future… DAIMI Henrik Bærbak Christensen 38 Hypervisor mode Modern/Future CPU architectures – Make a CPU with a Ring -1 – VVM runs in Ring -1, and can run the OS in Ring 0. – Recent CPUs from Intel and AMD offer x86 virtualization instructions for a hypervisor to control Ring 0 hardware access. […] a guest operating system can run Ring 0 operations natively without affecting other guests or the host OS. DAIMI Henrik Bærbak Christensen 39 Virtualization and Reliable Architectures? What reliability can we get from using VM techniques? DAIMI Henrik Bærbak Christensen 40 The three approaches Three approaches to dependable systems – Fault avoidance: simply avoid introducing defects! – Fault detection and removal: Find and remove the defects before they cause failures. – Fault tolerance: Ensure that faults does not lead to failures. DAIMI Henrik Bærbak Christensen 41 DAIMI Henrik Bærbak Christensen 42 Fault Avoidance I fail so see any here – Fault avoidance in Sommerville’s perspective classify a set of static techniques • Programming languages, etc. – VVM’s are dynamic techniques… DAIMI Henrik Bærbak Christensen 43 Fault Detection and Removal Again, as in testing the detection aspect is central. How does VVM’s help? – Configuration Testing! • HelpDesk: – Large set of snapshots of typical customer configs » Like Internet Explorer 6 on Windows XP – Just resume this snapshot, reproduce defect and report to development • Configuration suite – Configuration testing on a single tester’s machine – Ex: I tested linux variants of my book’s code on my Win XP VMWare within Ubuntu 9.0.4 installed DAIMI Henrik Bærbak Christensen 44 Fault Detection Record and Playback – http://blogs.vmware.com/workstation/2008/04/enhanc ed-execut.html – See VMWare demo later DAIMI Henrik Bærbak Christensen 45 Fault Detection Distributed systems are complex Make a virtual network with a set of virtual machines – Lower bandwidth of network connection – Induce packet loss – Kill machines to find defective behavior in the rest – Test graceful degrading algorithms DAIMI Henrik Bærbak Christensen 46 Fault Tolerance Backward recovery technique – Restore previous state of system and restart Ex. Hardware/device failure on machine – Migrate last good snapshot to new machine VMWare also works on migrating running VMs – Capture run-time state of running VM – Hot migration allows full hot standby techniques… Dynamic VM management – Load balancing by create/destroy VMs – Hardware failure automatically migrate VMs DAIMI Henrik Bærbak Christensen 47 Fault Tolerance Lock-step execution – Primary VM has secondary VM running in shadow operation – If primary fails, the secondary acts as hot standby – http://www.vmware.com/files/pdf/perf-vspherefault_tolerance.pdf – See demo later… DAIMI Henrik Bærbak Christensen 48 Summary Virtualization – Isolate Apps+OS from hardware – CPU near techniques to do this fast • Trap-and-emulate & Binary Translation Dependability relation – Fault Detection • Configurations; Tricky setups; record-playback – Fault Tolerance • Support hot standby DAIMI Henrik Bærbak Christensen 49