HYPER-V IOMMU ARCHITECTURE

AGENDA
- [Brief] Overview of DMA and Interrupts
- Hyper-V Current I/O Architecture
- Device Assignment I/O Architecture
- Security Issues
- IOMMU Overview
- Hypervisor IOMMU Architecture
- Additional Use Cases
- References
- Questions

TERMINOLOGY
- System Physical Address (SPA): an address that refers to the physical memory in the system
- Guest Physical Address (GPA): an address that refers to the guest's view of physical memory
- Logical Processor (LP): a physical CPU, which can be a thread (SMT) or a core
- Virtual Processor (VP): a virtual CPU for the partitions
- Root partition: a partition that manages the child partitions

DMA AND INTERRUPTS

DIRECT MEMORY ACCESS (DMA)
- Allows devices to read/write data to/from memory
- Uses physical addressing
- Can be coherent or non-coherent
- Allows the processor to work in parallel while data is accessed by the device

DMA (CONT.)
Device A requests DMA at 0x21000

  Device A --> DMA Request (Address: 0x21000) --> DMA Request (Physical Address: 0x21000)

INTERRUPTS
- Interrupts are events that indicate that a condition exists somewhere in the system, the processor, or within the currently executing program that requires the attention of a processor (Intel Manual)
- Interrupts can be generated by the processor, by devices, or by the currently executing program
- On all modern PC systems, interrupts are processed through the Local APIC unit on each processor

PROCESSOR INTERRUPT HANDLING
- Each processor has one Local APIC unit
- Each Local APIC unit has a unique ID in the system
- The Local APIC receives interrupt requests from internal and external sources and sends them to the processor core for handling
- The Local APIC can send interrupt requests to other logical processors in the system.
  This is called an Inter-Processor Interrupt (IPI)
- These requests are sent by writing to the ICR register on the APIC page

INTERRUPT HANDLING (CONT.)
- Interrupts are identified from various fields in the interrupt request
  [Figure: fields of the interrupt request]

DEVICE INTERRUPTS
Two ways for a device to generate an interrupt request:
- I/O APIC
- Message Signaled Interrupt (MSI)

DEVICE INTERRUPTS: I/O APIC
- Used by devices that are connected to an I/O APIC in the system
- The interrupt request arrives at the I/O APIC, which sends it to the Local APIC
- Multiple devices can be connected to the same I/O APIC input and therefore share the interrupt

DEVICE INTERRUPTS: MSI
- A device generates an MSI interrupt by writing the MSI Data to a special address called the MSI Address
- Each device can generate its own interrupts, so there is no need to share interrupts between multiple devices
- This is a much more flexible method and is required on PCIe (PCI Express) devices

HYPER-V CURRENT I/O ARCHITECTURE

  Root Partition        Child Partition
  ----------------      ----------------
  Networking Stack      Networking Stack
  VSP                   VSC
  NIC Driver
  ---------------- VMBUS ----------------
  -------------- Hypervisor -------------
  NIC

LIMITATIONS
- Significant cross-partition communication overhead
- Hardware features (such as RSS) are not utilized
- Extra memory copy overhead
- Root partition bottleneck

DEVICE ASSIGNMENT I/O ARCHITECTURE

  Root Partition        Child Partition
  ----------------      ----------------
  Networking Stack      Networking Stack
  NIC Driver            NIC Driver
  -------------- Hypervisor -------------
  NIC                   NIC

SECURITY ISSUES
- The child partition is untrusted
- The device assigned to the child partition has the ability:
  - to do DMA
  - to generate interrupt requests
- This means the child partition can program the device to:
  - Read/write the memory of the hypervisor, the root partition, or other partitions
  - Spoof interrupts from other devices
  - Mount a DoS attack by causing an interrupt storm
  - Effectively take over the whole system

WHAT IS NEEDED FOR SAFE DEVICE ASSIGNMENT
- Ability to restrict a device to accessing only its assigned child partition's memory
- Ability to prevent interrupt spoofing or interrupt storm attacks from such devices
- The answer: IOMMU

IOMMU OVERVIEW

WHAT IS AN IOMMU?
- Hardware that allows DMA requests to be programmed in terms of GPA and verified using I/O page tables before accessing memory (DMA remapping)
- Hardware that allows interrupt requests to be verified using interrupt remapping tables and targeted to the specified processors (interrupt remapping)
- Intel VT-d2 and AMD IOMMU

DMA REMAPPING
- Provides translation of GPA to SPA for DMA requests
- Uses I/O page tables for translation
- Supports domains
  - A domain is a container for one or more devices that share one I/O page table
  - I/O page tables ensure that a device can only access memory that belongs to its assigned domain
  - I/O page tables can only be accessed by privileged software, which prevents tampering by untrusted software

DMA REMAPPING (CONT.)
Device A requests DMA at 0x21000

  Device A --> DMA Request (Address: 0x21000) --> IOMMU --> DMA Request (Address: 0xCF000)

  IOMMU translation table:
  GPA        SPA
  0x21000    0xCF000
  0x22000    0xD0000
  0x23000    0xD1000

DMA REMAPPING (CONT.)
Device B requests DMA (uses different translation tables)

  Device B --> DMA Request (Address: 0x21000) --> IOMMU --> DMA Request (Address: 0xEA000)

  IOMMU translation table used for Device B:
  GPA        SPA
  0x21000    0xEA000
  0x22000    0xEB000
  0x23000    0xEC000

  Other IOMMU translation table (same entries as for Device A):
  GPA        SPA
  0x21000    0xCF000
  0x22000    0xD0000
  0x23000    0xD1000

DMA REMAPPING (CONT.)
Device C requests DMA

  Device C --> DMA Request (Address: 0x21000) --> IOMMU

  IOMMU translation table:
  GPA        SPA
  0x21000    -
  0x22000    -
  0x23000    -

DMA REMAPPING
- Enables DMAs to be programmed in terms of GPA
- Allows DMA access to be restricted to a subset of physical memory

INTERRUPT REMAPPING
- Allows software to set up an interrupt remapping table (IRT)
- Interrupt requests from devices are intercepted and routed based on the IRT
- Restricts the interrupt requests from a device to the assigned vector on the assigned processor

INTERRUPT REMAPPING (CONT.)

  Device (BDF-A) --> Interrupt Request (Index* 2) --> IOMMU --> Interrupt Request (LP 2, Vector 0x82) --> Logical Processor 2

  IOMMU interrupt remapping table:
  Device ID    Interrupt Index    Target
  BDF-A        1                  LP 1, Vector 0x80
  BDF-A        2                  LP 2, Vector 0x82
  BDF-B        3                  LP 1, Vector 0x81
  BDF-C        4                  LP 2, Vector 0x91

  *Interrupt Index is calculated from the MSI Data/Address pair or the IOAPIC RTE.

HYPERVISOR IOMMU ARCHITECTURE
- Introduces two new concepts: Device Domain and Device Interrupt
- Provides support for DMA remapping and interrupt remapping to multiple partitions
- Abstracts hardware differences between the AMD IOMMU and Intel VT-d2 implementations

DEVICE DOMAIN
- A device domain is a container for one or more devices
- Contains one I/O page table
- Associated with a partition
- Devices can be attached to (and detached from) a domain
- All DMA requests from an attached device go through the domain's I/O page table
- Uses PCI Segment/Bus/Device/Function as the device identifier
- One identity-mapped domain per partition

DEVICE DOMAIN
[Diagram: Root Partition and Child Partition above the Hypervisor; a Device (assigned to the child partition) sits behind the IOMMU, which uses an I/O Page Table (for the child partition, set up by the hypervisor)]

DEVICE INTERRUPTS
- Hardware Interrupt Remapping
  - Ensures that a device interrupt arrives on a specific physical processor and interrupt vector
  - Ensures that a device can't spoof an interrupt request
- Software Interrupt Remapping
  - Maps a specific hardware interrupt from an LP to a VP
- Routing an interrupt from a device to a specific (VP, vector) pair requires both software and hardware interrupt remapping

DEVICE INTERRUPTS
[Diagram: Root Partition, Child Partition (1) and Child Partition (2) with their Virtual Processors; the Hypervisor holds the Software Interrupt Remapping Table; a Device (assigned to child partition 1) sits behind the IOMMU, which holds the Hardware Interrupt Remapping Table; Logical Processor 0 and Logical Processor 1]

REFERENCES
- AMD IOMMU specifications: http://www.amd.com/usen/assets/content_type/white_papers_and_tech_docs/34434.pdf
- Intel VT-d2 specifications: ftp://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf