Checkpoint/Restore in the Palacios Virtual Machine Monitor
EECS 441 – Resource Virtualization
Steven Jaconette, Eugenia Gabrielova, Nicoara Talpes
Instructor: Peter Dinda

Agenda
• Background
• Motivation
• Design
• Implementation
• Future work

Virtual Machine Monitors
• Virtual Machine:
  o Software emulation or virtualization of a physical machine
• Virtual Machine Monitor (VMM):
  o Allows multiple OSes to share the physical machine's resources
• Each guest OS believes it is running directly on hardware

Palacios VMM
• Virtual machine monitor developed at Northwestern
• Targeted at the Red Storm supercomputer at Sandia National Laboratories
• Linked into a host OS; supports both 64-bit and 32-bit guests
• Provides guests with the functionality of an Intel/AMD processor, memory, interrupts, and hardware devices

Palacios VMM Structure
[figure: Palacios VMM structure]

Checkpoint / Restore
• Checkpoint: suspend a running OS instance and copy its state somewhere else (kernel memory, disk)
• Restore: copy the saved OS instance to its destination and resume it
• Used as part of OS migration
• Useful when you know a machine is about to fail and want to move its memory elsewhere quickly

[figure: checkpoint/restore overview]

Motivation
• Palacios currently cannot put guests to sleep and later restore them with memory intact
• This functionality is the first step toward live migration of guests
• Both checkpointing and live migration have important applications in supercomputing

Checkpoint / Restore in Other Systems
• Used in live OS migration
• VMware VirtualCenter: quiesces the VM after the pre-copy phase
• Xen Virtual Machine Monitor: same procedure as VMware
  o The OS instance suspends itself and is moved to the destination host; the suspended copy of the VM state is then resumed

Guest State in Palacios
• Structures that make up the VM: VMCB, registers, pointers reachable from guest_info
• Devices
• Interrupts
• Static information
• Pointers

Guest State

struct guest_info {
    ullong_t rip;
    uint_t cpl;

    struct v3_gprs vm_regs;
    struct v3_ctrl_regs ctrl_regs;
    struct v3_dbg_regs dbg_regs;
    struct v3_segments segments;

    addr_t mem_size;               // Probably in bytes for now....
    v3_shdw_map_t mem_map;

    v3_vm_operating_mode_t run_state;

    void * vmm_data;

    struct vm_time time_state;

    uint_t enable_profiler;
    struct v3_profiler profiler;

    v3_paging_mode_t shdw_pg_mode;
    struct shadow_page_state shdw_pg_state;
    addr_t direct_map_pt;
    // nested_paging_t nested_page_state;

    // This structure is how we get interrupts for the guest
    struct v3_intr_state intr_state;

    v3_io_map_t io_map;
    struct v3_msr_map msr_map;

    // device_map
    struct vmm_dev_mgr dev_mgr;

    struct v3_host_events host_event_hooks;

    v3_vm_cpu_mode_t cpu_mode;
    v3_vm_mem_mode_t mem_mode;

    void * decoder_state;

    v3_msr_t guest_efer;

    /* Do we need these ? */
    v3_msr_t guest_star;
    v3_msr_t guest_lstar;
    v3_msr_t guest_cstar;
    v3_msr_t guest_syscall_mask;
    v3_msr_t guest_gs_base;
};
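As a rough illustration of what "flattening" this structure can look like (the idea behind Design 1 on the next slide), the sketch below copies only the architectural register fields of a pared-down guest structure into a flat byte buffer. All names here (ckpt_core_state, flat_buf, ckpt_save_regs) are hypothetical stand-ins for illustration, not the Palacios API.

/* Hedged sketch: flattening the register portion of guest state into a
 * byte buffer at checkpoint time.  Types and names are illustrative. */
#include <stdint.h>
#include <string.h>

struct v3_gprs { uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp; };

/* Only fields that must survive a checkpoint; device state, interrupt
 * hooks, and raw pointers would be reconstructed at restore time. */
struct ckpt_core_state {
    uint64_t rip;
    uint32_t cpl;
    struct v3_gprs vm_regs;
};

struct flat_buf {
    uint8_t * data;    /* destination buffer (kernel memory or disk image) */
    size_t    len;     /* total capacity */
    size_t    off;     /* next free byte */
};

/* Append one field to the flat buffer; returns 0 on success. */
static int flat_put(struct flat_buf * b, const void * src, size_t n) {
    if (b->off + n > b->len) return -1;   /* out of space */
    memcpy(b->data + b->off, src, n);
    b->off += n;
    return 0;
}

/* Serialize the checkpointable core of a guest into buf. */
int ckpt_save_regs(const struct ckpt_core_state * s, struct flat_buf * b) {
    if (flat_put(b, &s->rip,     sizeof(s->rip))     != 0) return -1;
    if (flat_put(b, &s->cpl,     sizeof(s->cpl))     != 0) return -1;
    if (flat_put(b, &s->vm_regs, sizeof(s->vm_regs)) != 0) return -1;
    return 0;
}

Note that fields holding raw pointers (vmm_data, decoder_state, and the like) are deliberately excluded: they would be rebuilt at restore time, which is exactly the point Design 1 makes about not checkpointing everything.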
Design 1: Serialization
• "Flatten" guest state information at checkpoint
• Not all guest information should be checkpointed:
  o Devices, interrupts
  o Static data from XML configuration files
• Restore from the saved guest state information
  o Similar to configuring the virtual machine at boot

Design 2: Per-Guest Heap with Pointer Tagging
• For each guest, checkpoint its heap and later restore it into the address space
• The heap's starting address may differ after the copy
• Pointers must be fixed up so they do not point to the wrong memory addresses
• During the copy, record the start of the heap, track the pointers to addresses within the heap, and save them as offsets
• Problem: mallocs in external libraries and void pointers

[figure: per-guest heap with pointer tagging]

Design 3: Per-Guest Heap with User Space Mapping
• Create a "per-guest" heap for each VM, as before
• Do not tag or fix up pointers
• Map the heap to a well-known address in user space
  o Mark the pages as "system" to prevent modification
  o On a checkpoint, copy from this address
  o On a restore, copy back to it
• Switch between VMs through process context switches

[figure: per-guest heap in user space]

Implementation
• To create a per-guest heap, we must allocate a chunk of memory to represent the heap
• The host OS provides Palacios with malloc/free functions
  o Currently these are Kitten kernel memory allocator functions
• To allocate out of our own chunk, we needed to define new allocation functions (see the backup sketch after the Questions slide)

Implementation
• Checkpoint / Restore
  o Checkpoint: find the next available location in user space, then copy the relevant guest state there
  o A queue tracks the locations of previously checkpointed guest data
  o Restore: get the checkpoint address from the queue and copy the data back to the guest heap

Future Work
• More coding is needed to test this design
• It has the potential to greatly simplify checkpoint/restore of virtual machines
• What's next:
  o Live migration of guests
  o User-space per-guest heaps in a different host OS

Questions?
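Backup: Per-Guest Heap and Checkpoint Queue

The sketch below is a rough, self-contained illustration of the two Implementation slides: a bump allocator that carves allocations out of a per-guest heap chunk, plus a queue of previously saved heap images. All names and sizes (guest_heap, ckpt_queue, GUEST_HEAP_SIZE) are hypothetical; the real code would obtain the chunk from the host OS (Kitten) allocator and copy to/from the well-known user-space mapping of Design 3.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_HEAP_SIZE  (16 * 1024)   /* toy size for illustration */
#define MAX_CKPTS        8

/* A per-guest heap: one contiguous chunk, carved up by a bump allocator. */
struct guest_heap {
    uint8_t mem[GUEST_HEAP_SIZE];
    size_t  used;
};

/* Allocate out of the guest's own chunk instead of the host allocator. */
static void * guest_malloc(struct guest_heap * h, size_t n) {
    if (h->used + n > GUEST_HEAP_SIZE) return NULL;
    void * p = &h->mem[h->used];
    h->used += n;
    return p;
}

/* Queue of previously checkpointed guest heap images (user-space copies). */
struct ckpt_queue {
    uint8_t * slots[MAX_CKPTS];   /* saved heap images */
    size_t    sizes[MAX_CKPTS];
    int       head, tail, count;
};

/* Checkpoint: copy the live heap to the next available location (dest). */
static int checkpoint(struct ckpt_queue * q, const struct guest_heap * h,
                      uint8_t * dest) {
    if (q->count == MAX_CKPTS) return -1;
    memcpy(dest, h->mem, h->used);
    q->slots[q->tail] = dest;
    q->sizes[q->tail] = h->used;
    q->tail = (q->tail + 1) % MAX_CKPTS;
    q->count++;
    return 0;
}

/* Restore: pop the oldest saved image and copy it back into the heap. */
static int restore(struct ckpt_queue * q, struct guest_heap * h) {
    if (q->count == 0) return -1;
    h->used = q->sizes[q->head];
    memcpy(h->mem, q->slots[q->head], h->used);
    q->head = (q->head + 1) % MAX_CKPTS;
    q->count--;
    return 0;
}

Because the heap always comes back at the same well-known address, restore is a single copy with no pointer fixup, which is exactly what makes Design 3 simpler than the pointer-tagging approach of Design 2.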