Frontier Information Technology: Virtualization (2)
Virtualization (V12N)
  薛智文 (Chih-Wen Hsueh)
  cwhsueh@csie.ntu.edu.tw
  http://www.csie.ntu.edu.tw/~cwhsueh/
  Fall 100 (academic year), Fri, Nov 4, periods 6-8, DTH 104
  Department of Computer Science and Information Engineering, National Taiwan University

Outline
  Introduction
  Xen Architecture
  Hypercall
  CPU Virtualization
  Memory Virtualization
  I/O Device Virtualization
  Hardware Virtual Machine
  Benchmark
  Summary
  [Figure label: Domain 1]

NEWS Lab, Graduate Institute of Networking and Multimedia, Department of CSIE

How to Virtualize?
  Full Virtualization: binary translation
  Para Virtualization: hypercall
  Hardware Assisted Virtualization: trap and emulate, Intel VT-x & AMD SVM

Virtual Machine Monitor (VMM) / Hypervisor
  VM: Virtual Machine = Guest OS + Virtual Devices
  Type I (Hypervisor): VM0, VM1, …, VMN run on the hypervisor (e.g. Xen), which runs directly on the hardware
  Type II (Hosted VMM): VM0, VM1, …, VMN run on a hosted VMM (e.g. VMware), which runs on a host operating system on the hardware

Hypervisor (VMM) Type
  Type I
    Type I + Microkernel: Xen (open source, Citrix), Microsoft Hyper-V
    Type I + Integrated kernel: VMware ESX, KVM (Kernel-based VM)
  Type II (Host OS + Guest OS)
    VMware GSX, VMware Workstation, Microsoft Virtual PC, Microsoft Virtual Server, Sun VirtualBox

Xen Architecture (1/2)
  [Figure: Xen architecture with Domain 0 and three Domain U guests]

Xen Architecture (2/2)
  Compared to common Linux:

  Linux                   Xen
  System Calls            Hyper Calls
  Signals                 Events
  Interrupts              Physical + Virtual Interrupts
  CPU                     PCPU + VCPU
  Filesystem              XenStore
  POSIX Shared Memory     Grant Tables/Shared Pages

Hyper Call
  System Call: int 0x80

    // linux/include/asm/unistd.h
    #define __NR_restart_syscall 0
    #define __NR_exit            1
    #define __NR_fork            2
    #define __NR_read            3
    …

  Hyper Call: int 0x82

    // xen/include/public/xen.h
    #define __HYPERVISOR_set_trap_table 0
    #define __HYPERVISOR_mmu_update     1
    #define __HYPERVISOR_set_gdt        2
    #define __HYPERVISOR_stack_switch   3
    …

  Flow: Guest OS calls HYPERVISOR_sched_op -> int 82h -> Hypervisor: hypercall -> hypercall_table -> do_sched_op -> iret -> resume Guest OS

Grant Table
  Page mapping & page transferring
  Page as a unit
  Grant reference
    (GR)
  Grant entry
  [Figure: page mapping between Domain A and Domain B; step labels: create GR, send GR, map page, access page, unmap page, inform, release GR]
  [Figure: page transferring between Domain A and Domain B; step labels: create GR, send GR, transfer page, inform, receive page, release GR]

Event Channel
  A lightweight signal mechanism
  Uses "ports" as identifiers (pending + mask)
  Four major purposes: IDC, IPI, vIRQ, pIRQ
  [Figure: Guest OSes with their VCPUs above the Hypervisor (Virtual CPU, Virtual Memory, Scheduling) and the Hardware (Physical CPU, Physical Memory, Eth0, Eth1, …); IDC, IPI, vIRQ and pIRQ are drawn as event paths]

CPU Virtualization
  Architecture
    [Figure: Apps run on Guest OSes; the Hypervisor schedules VCPUs onto PCPUs]
  2 scheduling algorithms (non-work-conserving)
    Simple Earliest Deadline First (SEDF)
    Credit

Interrupt
  Physical interrupt
    For the hypervisor or for guest OSes
  Virtual interrupt
    Ask guest OSes to do
    8 for now (max is 24)
  [Figure: two stacks, each with Hardware (Device, PIC, IRQn); one shows an OS with its ISR, the other shows the Hypervisor delivering an event to the Guest OSes]

Memory Virtualization (1/2)
  Two-level memory
    Application: Virtual Memory
    OS: Physical Memory
  Three-level memory (with a hypervisor): Virtual, Pseudo-physical, Machine
    Application: Virtual Memory
    Guest OS: Pseudo-Physical Memory
    Hypervisor: Machine Memory
  P2M and M2P tables translate between pseudo-physical and machine memory

Memory Virtualization (2/2)
  168M memory for hypervisor

  Area                                                  Size
  MPT, Machine-to-Physical Translation Table (RO)       16M
  Page-Frame Information                                96M
  MPT, Machine-to-Physical Translation Table (R/W)      16M
  Linear Page Table                                     8M
  Shadow Linear Page Table                              8M
  Per Domain Mappings                                   8M
  Direct Map                                            12M
  I/O Remap                                             4M

  Other labels in the figure: 0xFC000000, 0xFC400000, 0xFFFFFFFF, Heap

Memory Virtualization - Translation
  4 mechanisms to manipulate page tables
    Paravirtualized page tables
    Writable page tables (only level 1 is writable)
    Shadow page tables
    Hardware-assisted paging
  Figure labels: Virtual Memory, Page Table, MMU (VM->PFN), Page Fault!
  Shadow Page Table (VM->MFN or VM->P2M), Pseudo-Physical Memory, P2M, Second Level Paging, HAP, Machine Memory

Memory Virtualization - Shared Info Page
  Structure
    MAX: 32 VCPUs
    event channel
    TSC
    memory
    wall clock
  Compare with start_info_page

                   Start Info Page     Shared Info Page
    Mapped by      Domain Builder      Guest OS
    Information    Static              Dynamically Updated

I/O Device Virtualization
  The hypervisor also provides three mechanisms to use devices.
    Emulated Devices
    Paravirtualized Driver
    Pass-through

I/O Device Virtualization - Emulated Devices
  Implemented by QEMU (QEMU-DM)
  e.g. sound cards: ac97, sb16, etc.

I/O Device Virtualization - Paravirtualized Driver
  Split Device Driver Model
  An example of sending packets
  [Figure: Front-End Driver, Back-End Driver, Native Driver]

I/O Device Virtualization - I/O Ring
  Carries no data; it only transfers requests/replies
  An example with GR
  [Figure: Dom U and Dom 0 exchange GRs over the I/O Channel; labels: Grant Table, Hypervisor, Active Grant Table, Device]

I/O Device Virtualization - Pass-Through
  Pass the device through and use it directly
  [Figure: Dom 0 and Dom U guests, each with a Native Driver, above the Hypervisor (Virtual CPU, Virtual Memory, Scheduling) and the Hardware (Physical CPU, Physical Memory, Eth0, Eth1, …)]

Hardware Virtual Machine
  Intel Virtualization Technology

  Technology   Description                             Virtualization     Implementation
  VT-x         Root/NonRoot, Extended Page Tables      CPU, Memory        Instruction Set
  VT-i         As VT-x, for Itanium
  VT-d         DMA, Interrupt                          Devices            IOMMU (Chipset)
  VT-c         Classify Packets                        Network Devices    VMDq, VMDc

CPU Benchmark (1/2)
  [Chart; annotation: 8.3%]
  Average over 100 tests, deviation: 0.066~0.128%

CPU Benchmark (2/2)
  [Chart; annotation: 5%]
  Calculate 32M digits of π.

Hard Disk Drive Benchmark
  [Chart]

Network Benchmark (1/2)
  [Chart; annotation: 59%]
  Testing time: 180 seconds, deviation: 0.12~0.26%.

Network Benchmark (2/2)
  [Chart]
  Average: 9.82%
  Sample period: 2 seconds

Answers for Big Questions
  How fast can virtualization be?
    95+%
    99.9%
  What kinds of applications?
    Well …
  What problems might it incur?
    Technical
    Data Security
    Business
    Politics
    Globalization (G11N) = Internationalization (I18N) + Localization (L10N)
    …

Summary
  Stay hungry to be full [of passion].
  Stay foolish to be smart [on absorption].
  When the false passes for true, the true is also false (假若真時真亦假).
    Virtualized reality.
    Real virtualization.
  Virtualized to go anywhere.
  Key is the system. System is the key.
    E.g. Virtual Tape Library
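
Supplementary sketch for the Hyper Call slide
  The dispatch path shown there (the guest raises int 82h, the hypervisor indexes hypercall_table by the hypercall number, the handler runs, and iret resumes the guest) can be illustrated in plain user-space C. This is only a sketch under stated assumptions: the four hypercall numbers are the ones the slide lists from xen/include/public/xen.h, while every toy_* name, the handler bodies, and the TOY_ENOSYS value are hypothetical and are not Xen's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypercall numbers as listed on the Hyper Call slide. */
#define __HYPERVISOR_set_trap_table 0
#define __HYPERVISOR_mmu_update     1
#define __HYPERVISOR_set_gdt        2
#define __HYPERVISOR_stack_switch   3

#define TOY_NR_HYPERCALLS 4      /* hypothetical: only the four entries above */
#define TOY_ENOSYS        (-38)  /* hypothetical error for an unknown number */

typedef long (*toy_hypercall_fn)(unsigned long arg);

/* Hypothetical stand-in handlers: each returns a distinct tag so the
 * caller can see which table slot was selected. */
static long toy_set_trap_table(unsigned long arg) { (void)arg; return 10; }
static long toy_mmu_update(unsigned long arg)     { (void)arg; return 11; }
static long toy_set_gdt(unsigned long arg)        { (void)arg; return 12; }
static long toy_stack_switch(unsigned long arg)   { (void)arg; return 13; }

/* Table indexed by hypercall number, mirroring the slide's hypercall_table. */
static const toy_hypercall_fn toy_hypercall_table[TOY_NR_HYPERCALLS] = {
    [__HYPERVISOR_set_trap_table] = toy_set_trap_table,
    [__HYPERVISOR_mmu_update]     = toy_mmu_update,
    [__HYPERVISOR_set_gdt]        = toy_set_gdt,
    [__HYPERVISOR_stack_switch]   = toy_stack_switch,
};

/* Stand-in for the int 82h entry point: validate the number, look up the
 * handler, call it, and hand the result back to the "guest". */
long toy_hypercall(unsigned int nr, unsigned long arg)
{
    if (nr >= TOY_NR_HYPERCALLS)
        return TOY_ENOSYS;

    toy_hypercall_fn fn = toy_hypercall_table[nr];
    assert(fn != NULL); /* every slot in this toy table is populated */
    return fn(arg);
}
```

  For example, toy_hypercall(__HYPERVISOR_mmu_update, 0) selects the second slot of the table, and an out-of-range number returns TOY_ENOSYS instead of indexing past the end of the table. The bounds check before the lookup is the part worth noticing: the number comes from the guest, so it cannot be trusted as an index.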