Virtualization Trends, Challenges and Solutions
Naresh Sehgal, Ph.D., MBA
Lead SW Architect, Enterprise Platforms and Services Division
Intel Corp, Bangalore
Email: naresh.k.sehgal@intel.com
Convergence 08

Robert X. Cringely on computers:
"If the automobile had followed the same development cycle as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside."

Hardware Virtual Machines (VMs)
[Diagram: without VMs, a single OS sits directly on the physical host hardware (processors, memory, graphics, network, storage, keyboard/mouse); with VMs, a VM Monitor (VMM) runs on the hardware and hosts multiple VMs (VM0 ... VMn), each with its own guest OS and applications]
- Virtualization adds a new layer of software: the VM Monitor (VMM)
- Without VMs: a single OS owns all hardware resources
- With VMs: multiple OSes share the hardware resources
Virtualization enables multiple operating systems to run on the same platform.

How long has virtualization been around?
1. A recent development: ~5 years
2. A while: 10 years
3. Older than Microsoft: 30 years
4. A lot longer: >40 years
Would you believe ~45-50 years?
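The "new layer of software" above can be pictured as a dispatch loop: guests run freely until they attempt an operation that would touch real hardware, at which point control traps to the VMM. A toy sketch (not Intel code; the guest programs and the privileged operation shown are illustrative assumptions):

```python
# Toy model of a VMM multiplexing guests over one set of hardware resources.
# Guest "programs" are lists of ("compute", work) or ("privileged", op) steps;
# privileged steps trap to the VMM, which emulates them instead of letting
# the guest touch the real platform.

class Guest:
    def __init__(self, name, program):
        self.name = name
        self.program = program

def run_vmm(guests):
    """Run each guest's steps; privileged operations trap to the VMM."""
    log = []
    for guest in guests:
        for kind, arg in guest.program:
            if kind == "privileged":
                log.append(f"VMM: trapped '{arg}' from {guest.name}, emulating")
            else:
                log.append(f"{guest.name}: ran {arg}")
    return log

log = run_vmm([
    Guest("VM0", [("compute", "app work"), ("privileged", "out 0x64")]),
    Guest("VM1", [("compute", "app work")]),
])
```

Each guest behaves as if it owns the machine; only the VMM ever acts on the hardware, which is what lets multiple OSes safely share one platform.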
Virtualization Timeline (1950s to today)
- 1950s-60s: Strachey, "Time Sharing in Large Fast Computers"; MIT Project MAC; IBM & MIT Compatible Time Sharing System; IBM M44/44X project
- 1970s: Goldberg, "Survey of Virtual Machines Research"
- 1980s-90s: Connectix is founded; VMware is founded
- 2000s: open-source Xen is released; Microsoft acquires Connectix; Intel introduces Intel Virtualization Technology

Virtualization Challenges
- Complexity: CPU virtualization requires binary translation or paravirtualization; I/O devices must be emulated in software
- Functionality: paravirtualization may limit the supported guest OSes; guest OSes "see" only a simulated platform and simulated I/O devices
- Reliability and protection: I/O device drivers run as part of the host OS or hypervisor; there is no protection from errant DMA that corrupts memory
- Performance: overheads of address translation in software; extra memory required (e.g., translated code, shadow tables)

Processor Virtualization
- Without VT, a software VMM in ring 0 relies on three techniques: (1) ring deprivileging: apps stay in ring 3 while the guest OS is moved out of ring 0 (e.g., into ring 1); (2) binary translation: a legacy OS such as WinXP runs through a binary translator and translation cache; (3) paravirtualization: a modified OS such as Linux cooperates with the VMM
- With VT, the VMM runs in VMX root mode on standard IA-32 or IPF processors with VT-x (or VT-i); guest OSes run at their intended rings, and hardware VM entry / VM exit transitions are configured through the VM Control Structure (VMCS); memory and I/O virtualization remain the VMM's job

Intel® Virtualization Technology (VT)
- First VT-based software solutions: a virtual machine monitor hosting multiple OS/app stacks on Intel® processors with Virtualization Technology
- Intel® Core™2 microarchitecture based systems: first to market with native virtualization support; broadest HW and SW ecosystem support; significant increase in performance and improved VT performance across all segments
- Mobile: Intel® Core™2 Duo Mobile Processor for Intel® Centrino® Duo Mobile Technology
- Desktop: Intel® Core™2 Duo Desktop Processor E6000 sequence
- Server: Dual-Core Intel® Xeon® Processor 5100 series
Get more done on every server; get more capabilities on every client.

Today's Uses: Servers
Virtualization addresses today's IT concerns:
- Server consolidation: many physical servers (HW0 ... HWn), each with its own OS and apps, collapse onto VMM-managed hosts; 10:1 in many cases
- Test and development: multiple OS/app stacks on one machine enable rapid deployment
Virtualization increases server utilization and simplifies legacy software migration.

Emerging Server Usage Models
Toward a true "lights out" datacenter:
- Dynamic load balancing: VMs migrate between hosts to balance utilization with head room (e.g., evening out hosts at 90% and 30% CPU usage to roughly 62% and 63%)
- Disaster recovery: VMs restart on surviving hosts, upholding high levels of business continuity
Intel® Virtualization Technology will play an integral role in the next generation of VMMs.

Emerging Business Usage Models
The professional business platform: built-in manageability, proactive security, and energy-efficient performance, delivered by the Intel platform plus software.

vPro™ Key Features
- Remote manageability: repair down systems, securely update systems, audit powered-down PCs
- Prevents malicious packets from entering the OS
- Supported by over 45 OEMs, ISVs, and IT outsourcers (e.g., HP OpenView)
More details in the IDF vPro™ tracks.

Intel® Virtualization and Intel® vPro™ Technology
- VM0, the user partition: the user OS (Win2K, XP) and applications (App0, App1, ... Appn)
- VM1, the service partition: a stack owned and managed by the IT department, protected from users, running a "firewall" application
- A "management" application runs in the service partition on a service OS (WinCE or Linux)
- Both partitions run on a lightweight VMM (LWVMM) over an Intel® architecture platform with VT and AMT
Intel VT is used to create a separate, independent, hardware-based environment inside the PC:
- Service partition: allows IT administrators to create a dedicated, tamper-resistant service environment where tasks can run independently, isolated from the main operating system as well as from the end user
- User partition: the OS and applications
- Help desk or console access remains available even when the user partition is "down"
Intel, the Intel logo, and Intel architecture are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Intel® Virtualization Technology Evolution
VMM software evolution over time with hardware support:
- Past, no hardware support: software-only VMMs relying on binary translation and paravirtualization
- Vector 1, processor focus (VT-x, VT-i): establishes the foundation for virtualization in the IA-32 and Itanium architectures; simpler and more secure VMMs built on virtualizable ISAs, followed by ongoing evolution of support, both micro-architectural (e.g., lower VM switch times) and architectural (e.g., extended page tables, EPT)
- Vector 2, platform focus (VT-d): hardware support for I/O-device virtualization: device DMA remapping, direct assignment of I/O devices to VMs, and device-independent control over DMA
- Vector 3, I/O focus (PCI-SIG*): standards for I/O-device sharing: multi-context I/O devices and endpoint device translation caching, under definition in the PCI-SIG IOV working group
The result is increasingly better CPU and I/O virtualization performance and functionality as I/O devices and VMMs exploit the infrastructure provided by VT-x, VT-i, and VT-d.
*Other names and brands may be claimed as the property of others

Options for I/O Virtualization
- Hypervisor model: I/O services and device drivers live in the hypervisor, which exposes shared devices to the guest VMs (VM0 ... VMn). Pros: high performance, I/O device sharing, VM migration. Con: a large hypervisor.
- Service VM model: I/O services and device drivers live in dedicated service VMs; shared devices are reached through them. Pros: higher security, I/O device sharing, VM migration. Con: lower performance.
- Pass-through model: each guest VM runs its own device drivers against devices assigned directly to it. Pros: higher performance, rich device features. Cons: limited sharing, VM migration limits.
VT goal: support all three models.

VT-d Overview
VT-d provides the infrastructure for I/O virtualization:
- Defines an architecture for DMA and interrupt remapping
- A common architecture across IA platforms
- Will be supported broadly across Intel® chipsets
[Diagram: CPUs on the system bus connect to a north bridge containing the VT-d logic, DRAM interface, integrated devices, and PCIe* root ports; a south bridge carries PCI, LPC, and legacy devices]

How does VT-d work?
- Each VM thinks its memory is 0-based: it addresses guest physical addresses (GPAs)
- Each VM's memory is actually mapped to a different range of system memory: host physical addresses (HPAs)
- VT-d performs the address mapping between GPA and HPA
- VT-d catches any DMA attempt that would cross a VM's memory boundary
[Diagram: three VMs (VM0, VM1, VM2), each with its own 0-based guest address space, mapped onto disjoint regions of host physical memory]

DMA Remapping: Hardware Overview
- A DMA request carries a device ID (bus 0-255, device 0-31, function 0-7), a virtual address, and a length
- The DMA remapping engine consults memory-resident partitioning and translation structures: device-assignment structures select per-device address-translation structures (4KB page tables), accelerated by a context cache and a translation cache
- Valid requests proceed as memory accesses with host physical addresses; invalid requests trigger fault generation

VT-d Applied to the Hypervisor Model
Improved reliability and protection:
- The hypervisor programs the remap tables
- Errant DMA is detected by the hardware and reported to the hypervisor / device driver
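The GPA-to-HPA remapping and boundary checking that VT-d performs can be sketched as follows. This is a minimal model, not VT-d's real data structures: the per-VM table, region bases, and sizes are made-up illustrations of the idea that each VM sees a 0-based address space backed by a disjoint host region.

```python
# Minimal sketch of VT-d-style DMA remapping: each VM addresses guest
# physical addresses (GPAs) starting at 0; the remap table maps them onto
# disjoint host physical regions, and any DMA that would cross a VM's
# boundary faults instead of corrupting another VM's memory.

REMAP = {  # vm_id -> (host_physical_base, region_size); values are invented
    "VM0": (0x10000, 0x4000),
    "VM1": (0x20000, 0x2000),
}

def translate_dma(vm_id, gpa, length):
    """Translate a guest-physical DMA target to host-physical, faulting
    on any access outside the VM's assigned region."""
    base, size = REMAP[vm_id]
    if gpa < 0 or gpa + length > size:
        raise PermissionError(f"{vm_id}: DMA fault at GPA {gpa:#x} (+{length:#x})")
    return base + gpa  # each VM sees addresses starting at 0

hpa = translate_dma("VM0", 0x100, 0x80)  # maps into VM0's host region
try:
    translate_dma("VM1", 0x1FF0, 0x100)  # would cross VM1's boundary
except PermissionError:
    pass  # hardware would report this fault to the hypervisor
```

The boundary check is the protection property described above: errant DMA is caught by the remapping hardware rather than silently overwriting memory that belongs to another VM or to the hypervisor.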
Bounce Buffer Support
- Limited DMA addressability in I/O devices limits their access to high memory
- A "bounce buffer" is a software technique that copies I/O buffers between high memory and memory the device can address
- VT-d eliminates the need for bounce buffers
- The above is equally useful for standard OSes: VT-d does not require a VMM to function
Hypervisor model with VT-d: pros: higher performance, I/O device sharing, VM migration; con: a larger hypervisor.

VT-d Applied to the Service VM Model
Device driver deprivileging:
- Device drivers run above the hypervisor as part of a "service OS"
- Guest device drivers program devices in a DMA-virtual address space
- The service VM forwards DMA API calls to the hypervisor
- The hypervisor sets up the DMA-virtual to host-physical translation
Further improvement in protection: a guest device driver cannot compromise hypervisor code or data.
Pros: high security, I/O device sharing, VM migration. Con: lower performance.

VT-d Applied to the Pass-through Model
Direct device assignment to the guest OS:
- The guest OS directly programs the physical device
- For legacy guests, the hypervisor sets up the guest-physical to host-physical DMA mapping
- For remapping-aware guests, the hypervisor is involved in the map/unmap of DMA buffers
PCI-SIG I/O Virtualization Working Group: activity toward standardizing natively sharable I/O devices; IOV devices provide virtual interfaces, each independently assignable to VMs.
Pros: highest performance, a smaller hypervisor, device-assisted sharing. Con: VM migration limits.

DMA Remapping: IOTLB Scaling
Address Translation Services (ATS) extensions to PCIe* enable IOTLB scaling: an ATS endpoint implements "device IOTLBs". Device IOTLBs can be used to improve performance:
- Cache only static translations (e.g., command buffers)
- Pre-fetch translations to reduce latency
- Minimize the dependency on root-complex caching
- Support device-specific demand I/O paging

Address Translation Services (ATS)
ATS translation flow:
1. The device issues a translation request to the root complex
2. The root-complex remap hardware provides a translation response
3. The device caches the translation locally in its device IOTLB
4. The device can then issue DMA with the translated address; translated DMA from ATS-enabled devices bypasses address translation in the root complex
VT-d supports per-device control of ATS.

Invalidation Architecture
Invalidation enforces the consistency of translation caches and is required whenever software updates the translation structures.
- Invalidation primitives: global, domain-selective, and page-range invalidations, plus support for device-IOTLB invalidation (through ATS)
- Invalidation software interfaces: a synchronous interface through MMIO registers and a queued interface through an invalidation queue

ATS Invalidation Flow
1. The root complex issues an invalidation request to the device; each request carries a unique invalidation tag
2. The device invalidates the specified mappings from its device IOTLB
3. The device issues an invalidation response; responses may be coalesced

Mapping to VMM Software Challenges
A VMM (a.k.a. hypervisor) hosting VMs (VM0 ... VMn) layers its functions over the physical platform resources:
- Higher-level VMM functions: resource discovery / provisioning / scheduling / user interface
- Processor virtualization: ring deprivileging, binary translation, and virtual-CPU configuration
- Memory virtualization: page-table shadowing and EPT configuration
- I/O device virtualization: DMA and interrupt remapping configuration, device emulation, and device sharing
- Physical platform resources providing the hardware assists: processors with VT-x / VT-x2, chipset VT-d / VT-d2, memory, and I/O devices (storage, network) with PCI-SIG IOV and VMDq support

Example 6: Virtualization overhead on an Intel® experimental client VMM* (vs. native OS)
[Chart: PCMark performance indicators for the virtualized client relative to native, roughly 83-100% across the System, CPU, Memory, Graphics, and HDD categories (97.88%, 100.00%, 99.67%, 93.90%, 85.69%, 83.44%)]
- Relatively low virtualization overheads for a client benchmark
- Targeting <10% overhead with improved SW techniques
- Further VMM SW optimization and next-generation VT features will reduce virtualization overheads
*Pre-beta version. Source: Intel Corporation. Projections and technical specifications are based on internal analysis and subject to change.

Summary: A Better IA Platform
First to market and massive ecosystem support:
- Choice: broadest virtualization software support in the industry
- Robust: first x86 hardware-assisted virtualization technology (Intel VT)
- Innovation: a common specification means enhanced virtualization on x86 and will set the standard
- Flexibility: leverage the widely deployed infrastructure of Intel® Xeon® processor-based servers for advanced failover and dynamic load balancing
Better platform reliability:
- Critical when more applications run on the same server; more reliability features
- Proven platform architecture: almost 40X more IA-based servers shipped than AMD-based since 1996 [1]
Performance headroom ("choose the right basket"):
- Intel® Xeon® processors have key performance features for virtualization: dual-core, Hyper-Threading, I/O, memory, and larger caches
[1] Source: Q4'05 IDC Server Tracker, 1996-2005 total systems shipped
Whitepaper on virtualization benefits:
http://www.intel.com/business/bss/products/server/virtualization_wp.pdf

Backup / Q&A

Example 1: SysBench Running with VMware*'s ESX Server*
[Figure 1: Normalized SysBench results for the two test servers, a Dual-Core AMD Opteron 285-based server and a Dual-Core Intel Xeon processor 5160-based server, in the one, two, and four virtual machine environments. Higher numbers are better.]
*Source: Principled Technologies (PT) performance report, http://www.principledtechnologies.com/clients/reports/Intel/VMSysBench0706.pdf
System configuration in backup foils.

Example 2: SPECjbb Running with VMware*'s ESX Server**
[Figure 2: Normalized SPECjbb2005 results for the two test servers, a Dual-Core AMD Opteron 285-based server and a Dual-Core Intel Xeon processor 5160-based server, in the one, two, and four virtual machine environments. Higher numbers are better.]
**Source: Principled Technologies (PT) performance report comparing the Dual-Core AMD Opteron 285 with the Dual-Core Intel® Xeon® Processor 5160

Example 3: Microsoft* Virtual Server* Java Performance with 4 VMs
- Benchmark: BEA WebLogic JRockit® JVM (build R26.0.0-188-528751.5.0_04-2005110-0920-linuxx86_64), Java JFT workload
- VMM: Microsoft* Virtual Server* 2005 R2 SP1
- Guest OS: Windows 2003* Enterprise Edition R2 (32-bit)
- Systems: HP DL385 with 2x AMD Opteron 2.6GHz (2x1MB); SuperMicro SDP with 2x Dual-Core Intel® Xeon® Processor 3.0GHz; 16x1GB memory
- Result: up to 53% gain (normalized score 1.53 vs. 1) for the Intel® Xeon® based system
Source: Intel Corporation. Projections and technical specifications are based on internal analysis and subject to change. System configuration details in backup.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
Example 4: Energy-Efficient Performance
Performance/Watt, BEA WebLogic JRockit® JVM on Microsoft* Virtual Server:
- HP DL385 with 2x AMD Opteron 2.6GHz: 1 (baseline)
- SuperMicro SDP with 2x Dual-Core Intel® Xeon® Processor 3.0GHz: 1.6
Intel® Core™2 Duo based systems provide Energy-Efficient Performance (EEP) leadership in a virtualized environment.
Source: Intel Corporation. Projections and technical specifications are based on internal analysis and subject to change.

Example 5: SpecJBB 2005 on Microsoft* Virtual Server*
Configuration:
- Host OS: Microsoft* Server 2003 x64 Enterprise Edition SP1 RTM
- Virtualization: Microsoft* Virtual Server* R2 Beta SP1 ver. 1.1.512.0 EE (Drop B1036 vmm.sys); Microsoft* Virtual Machine Windows* Guest Editions ver. 13.705
- Guest OS: RedHat V9, 2.4.20-8 kernel (32-bit)
- Workload: SpecJBB 2005
Results (Bops, RH32 guests):
- Opteron 1P (SW): 8408
- Intel® Xeon® 5100 series (SW): 13938
- Intel® Xeon® 5100 series (VT): 16404
Observations:
- Intel® Xeon® software-virtualized guest performance is 1.66x that of the Opteron SW (no Pacifica) result
- Intel® Xeon® VT performance is 1.18x that of Intel® Xeon® software-only virtualization (no VT)
- Intel® Xeon® VT performance is 1.95x that of the Opteron SW result
Source: Intel Corporation. Projections and technical specifications are based on internal analysis and subject to change.
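The three speedup factors quoted in Example 5 follow directly from the raw Bops scores in the chart; a quick arithmetic check:

```python
# Verify Example 5's quoted ratios from its raw SpecJBB 2005 Bops scores.
opteron_sw = 8408    # Opteron 1P, RH32 guest, software virtualization
xeon_sw    = 13938   # Intel Xeon 5100 series, software virtualization
xeon_vt    = 16404   # Intel Xeon 5100 series, with VT

print(round(xeon_sw / opteron_sw, 2))  # Xeon SW vs Opteron SW -> 1.66
print(round(xeon_vt / xeon_sw, 2))     # Xeon VT vs Xeon SW    -> 1.18
print(round(xeon_vt / opteron_sw, 2))  # Xeon VT vs Opteron SW -> 1.95
```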