A Primer on Virtualization Software & Services Group 1 Agenda • Westmere EP on Intel Roadmap • Adv. Enc. Std: New Instructions (AES-NI) in Westmere EP • Intel® Virtualization Technology – a primer – – – – – – – Evolution of Virtualization Intel Virtualization Technologies A Framework for Optimizing Virtualization – improving efficiency and scaling Reducing Virtualization Overheads: Processor, Memory, I/O Improving VM scaling VMM readiness Improving VMM security through TXT • Summary Content condensed from: “Intel® Virtualization Technology in the Enterprise”, Rich Uhlig, Intel Fellow, IDF 2009 San Francisco USA. Software & Services Group Evolution of Virtualization OS OS OS OS OS OS VMM VMM VMM Host Host Host Flexible Resource Management VMM Host Basic Consolidation OS OS VMM Host No Virtualization OS Host OS Host OS Host Drives Priorities: • Lower VMM Overheads • Richer Workloads in VMs • Seamless VM Migration • Power Efficiency • Increased Scaling (VMs, vCPUs) • Improved Reliability Software & Services Group 3 Intel® Virtualization Technologies Intel® VT-x Processor Intel® VT-d Chipset Intel® VT-c Network Intel® VT-x Hardware assists for robust virtualization Intel® VT FlexMigration – Flexible live migration Intel® VT FlexPriority – Interrupt acceleration Intel® EPT – Memory Virtualization Intel® VT for Directed I/O Reliability and Security through device Isolation I/O performance with direct assignment Intel® VT for Connectivity NIC Enhancement with VMDq Single Root IOV support Network Performance and reduced CPU utilization Intel® I/OAT for virtualization Lower CPU Overhead and Data Acceleration Software & Services Group A Framework for Optimizing Virtualization • Reduce overheads from virtualization • • • • • • Intel® VT-x Latency Reductions Extended Page Tables Virtual Processor IDs APIC Virtualization (Flex Priority) I/O Assignment via DMA Remapping Introduce capabilities that increase scaling out across VMs • • • Intel® Hyper-Threading Technology PAUSE-loop Exiting Network Virtualization 2. … and scale out across VMs 1. Reduce overhead within each VM… OS OS … OS VMM Host Intel® Virtualization Technology (Intel® VT) for IA-32, Intel® 64 and Intel® Architecture (Intel® VT-x) 5 Software & Services Group Reducing VT Latencies What makes VM Context Switching expensive Virtual Machine Ctl Structure (VMCS) – Maintains Guest & Host reg. state – “Backed” by host physical memory Saving-Loading privileged state • Compounded by consistency checking Addressing Context changes • Translation Lookaside Buffer (TLB) flushes –Accessed via architectural VMREAD/VMWRITE –Enables caching of VMCS state on-die Virtual Processor IDs (VPIDs) –Tag µarch structures (TLBs) –Removes need to flush TLBs Software & Services Group 6 Intel® Virtualization Technology (VT-x) • VT-x® provides architected assists to allow guest OSes to run directly on hardware • On Nehalem and Westmere VT-x is extended with: Extended Page Tables (EPT) Eliminates VM exits to the VMM for shadow pagetable maintenance Virtual Processor IDs (VPID) Avoid flushes on VM transitions to give a lower-cost VM transition time Guest Preemption Timer -- lets a VMM preempt a guest OS Aids VMM vendors in flexibility and Quality of Service (QoS) Descriptor Table Exiting –Traps on modifications of guest DTs Allows VMM to protect a guest from internal attack Transition Latency reductions Continuing improvements in microarchitectural handling of VMM round trips Software & Services Group Issues with abstracting physical memory Address Translation • Guest OS expects contiguous, zerobased physical memory • VMM must preserve this illusion Page-table Shadowing • VMM intercepts paging operations • Constructs copy of page tables VM0 VMn Guest Page Tables Guest Page Tables Induced VM Exits Remap VMM Shadow Page Tables Overheads • VM exits add to execution time • Shadow page tables consume significant host memory TLB CPU0 Memory Software & Services Group 8 How Extended Page Tables help with abstracting Physical Memory Extended Page Tables (EPT) • Map guest physical to host address • New hardware page-table walker Performance Benefit • A guest OS can modify its own page tables freely and without VM exits Memory Savings • A single EPT supports entire VM: instead of a shadow pagetable per guest process Intel® Virtualization Technology (Intel® VT) for IA-32, Intel® 64 and Intel® Architecture (Intel® VT-x) 9 Software & Services Group Linear Addr Extended Page Table (EPT) Walk CR3 EPTP L4 L3 L2 L1 EPTP L4 L3 L2 L1 EPTP L4 L3 L2 L1 EPTP L4 L3 L2 L1 EPTP L4 L3 L2 L1 PML4 TLB PDP PDE PTE Physical Addr 2-level TLB reduces page-table walks • VPID tags help to retain TLB entries Paging-structure caches • Cache intermediate steps in walk • Result: Reduce length of walks • Common case: Much better than 24 steps Software & Services Group 10 Difficulties in virtualizing I/O 1 Virtual Device Interface • Traps device commands • Translates DMA operations • Injects virtual interrupts Software Methods • I/O Device Emulation • Paravirtualize Device Interface Para- 2 virtualization I/O Device Emulation VM0 VMn Guest Device Driver Guest Device Driver Device Model Device Model Physical Device Driver VMM Challenges • Controlling DMA and interrupts • Overheads of copying I/O buffers Storage Memory Network Software & Services Group 11 Solution: DMA remapping (Intel® VT-d) DMA Requests Dev 31, Func 7 Device ID Virtual Address Length Bus 255 Dev P, Func 2 4KB Page Frame Bus N Fault Generation Bus 0 Dev P, Func 1 4KB Page Tables Dev 0, Func 0 Device D1 DMA Remapping Engine Translation Cache Context Cache Memory Access with Host Physical Address Device Assignment Structures Address Translation Structures Device D2 Address Translation Structures Memory-resident Partitioning & Translation Structures Software & Services Group 12 Intel® VT FlexPriority APIC Task Priority Register (TPR) • Controls interrupt delivery through the APIC • Accessed very frequently by some guest OSes Without Intel VT FlexPriority With Intel VT FlexPriority VM VM Guest OS No VM Exits VM Exits VMM APIC-TPR access in software • Fetch/decode instruction • Emulate APIC-TPR behavior • Thousands of cycles per exit VMM Guest OS APIC TPR access in HW configure • Instruction executes directly • Hardware emulates APIC-TPR access • No VM exit in the common case Software & Services Group 13 Technologies to improve VM scaling • Hyper-threading • Decreasing lock holder preemption impact • Network virtualization with Virtual Machine Device Queues • Single Root I/O Virtualization (SR-IOV) Software & Services Group 14 Reducing Lock Holder preemption impact Problem: • In an SMP guest, a vCPU holding a lock may get preempted • Other vCPUs that attempt to acquire that lock spin for the full quantum Solution: Pause-Loop Exiting (PLE) • Spin-locking code typically uses PAUSE instructions in a loop • A longer than “normal” loop duration taken as a sign of lockholder preemption • When that happens, HW forces an exit into VMM • VMM takes control and schedules some other vCPU spin_lock: attempt lock-acquire; if fail { PAUSE; jmp spin_lock } Software & Services Group 15 Network Virtualization: HW traffic management Virtual Machine Device Queues (VMDq) • Data packets grouped and sorted in HW • Packets sent to their respective VMs • Round-robin servicing on transmit 10.0 9.2 9.5 4.0 (vNIC) (vNIC) (vNIC) Layer 2 Software Switch Rx1 Rx1 Rx1 Rxn Rxn Rx2 NIC w/VMDq 4.0 0.0 w/o VMDq w/ VMDq Source Intel. w/ VMDq Jumbo Frames Tests measure Wire Speed Receive (Rx) Side Performance With VMDq on Intel® 82598 10 Gigabit Ethernet Controller Intel® Virtualization Technology (Intel® VT) for Connectivity (Intel® VT-c) 16 VMn Layer 2 Classified Sorter 6.0 2.0 VM2 VMM Throughput (Gbps) 8.0 VM1 LAN MAC/PHY Rx1 Rx2 Rx1 Rxn Rx1 Rxn Idle Idle Txn Txn Tx2 Txn Tx2 Tx1 Software & Services Group PCI-SIG SR-IOV • Description: – PCI-SIG Single Root I/O Virtualization (SR-IOV) Standard: – Allows for I/O devices to be simultaneously shared among VMs – Virtual interfaces can be directly assigned to reduce routing overheads • Benefits: – Provide near native performance due to direct connectivity – Allows direct VM control of I/O virtual functions • Requirements: – Intel VT-d as the core platform ingredient – BIOS support of SR-IOV Software & Services Group Westmere: Trusted Execution Technology (TXT) Guest Guest Server Extensions VM VM • TXT uses features in processor, chipset and TPM to enable more secure platforms • TXT works through measurement, memory locking and sealing secrets • TXT helps prevent software attacks Guest VM VMM Rootkit Hypervisor Platform hardware with VT-x support Helps prevent hijacking by rootkit Processor Processor – such as, attempts to insert rogue VMM (rootkit hypervisor) Chipset – and, reset-attacks that are designed to TPM compromise platform secrets in memory – and, BIOS and firmware update attacks TXT technology incorporates multiple components VT TXT Makes Platforms More Robust Against SW-based Attacks Software & Services Group Intel VT Features/VMM Support Feature Virtualization PWR Microsoft Hyper-V Xen OSS Red Hat (RHEL) KVM Novell (SLES) Min Rev Date Min Rev Date Min Rev Date Min Rev Date Rev Date Min Rev Date Min Rev for Nehalem 3.5U4 Now Win Svr ’08 Now Any Now Any Now Any Recent Now Any Now FlexPriority 3.5U4 Now Win Svr ’08 Now 3.1 Now 5.2 Now 2.6.24 Now 10 SP1 Now FlexMigration 3.5U4 Now Win Svr ’082 Now 3.3 Now TBD TBD TBD TBD EPT + VPID 4.0 Now Win Svr ’08 R2 1H’10 3.3 Now 5.3 Now 2.6.26 Now 10 SP2 Now Chipset (VT-d) VT-d2 4.0 Now TBD TBD 3.3 Now 5.4 Q3’09 2.6.31 Now 11 Now Network (VT-c) SR-IOV TBD TBD TBD TBD 3.41 Now3 TBD TBD TBD TBD TBD TBD VMDq 3.5U4 Now Win Svr ’08 R2 1H’10 TBD TBD TBD TBD TBD TBD TBD TBD Power Mgmt (S, 4.0 Now 1.0 (Sstate TBD) Now 3.3 Now TBD TBD 11 Now SSE 4.2 4.0 Now TBD TBD Any Now Any Now Any Now Turbo 4.0 Now 1.0 Now Any Now Any Now Any Now HT (SMT) 3.5U4 Now 1.0 Now Any Now Any Now Any Now Processor (VT-x) Other VMware ESX P, C, T States) Notes: Table is based on latest information as of: VT-c : July 09; VT-x and VT-d: May 09 and subject to change; Please contact vendors directly for unreleased products 1 Requires other ecosystem support 2 3 Product allows capability but robustness improvements in future releases PCI-SIG SR-IOV standard is supported now Software & Services Group Software & Services Group 20