Hypervisor IOMMU Architecture

HYPER-V IOMMU ARCHITECTURE
AGENDA
- [Brief] Overview of DMA and Interrupts
- Hyper-V Current I/O Architecture
- Device Assignment I/O Architecture
- Security Issues
- IOMMU Overview
- Hypervisor IOMMU Architecture
- Additional Use Cases
- References
- Questions

TERMINOLOGY

- System Physical Address (SPA): an address that refers to the physical memory in the system
- Guest Physical Address (GPA): an address that refers to the guest's view of physical memory
- Logical Processor (LP): a physical CPU, which can be an SMT thread or a core
- Virtual Processor (VP): a virtual CPU presented to a partition
- Root partition: the partition that manages the child partitions

DMA AND INTERRUPTS
DIRECT MEMORY ACCESS (DMA)
- Allows devices to read data from and write data to memory
- Uses physical addressing
- Can be coherent or non-coherent
- Allows the processor to work in parallel while data is accessed by the device

DMA (CONT.)

Device A requests DMA at 0x21000

[Figure: Device A issues a DMA request for address 0x21000. Without an IOMMU the address is used unchanged, so the request reaches memory at physical address 0x21000.]
INTERRUPTS
- Interrupts are events that indicate that a condition exists somewhere in the system, the processor, or within the currently executing program that requires the attention of a processor (Intel Manual)
- Interrupts can be generated by the processor, by devices, or by the currently executing program
- On all modern PC systems, interrupts are processed through the Local APIC unit on each processor

PROCESSOR INTERRUPT HANDLING
- Each processor has one Local APIC unit
- Each Local APIC unit has a unique ID in the system
- The Local APIC receives interrupt requests from internal and external sources and sends them to the processor core for handling
- The Local APIC can send interrupt requests to other logical processors in the system. This is called an Inter-Processor Interrupt (IPI)
- IPIs are sent by writing to the Interrupt Command Register (ICR) on the APIC page
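
As a rough illustration of the last point, the sketch below composes the two 32-bit halves of an ICR value for a fixed-delivery IPI aimed at a single Local APIC. It assumes the xAPIC-mode bit layout described in the Intel manual; the constant and function names are illustrative, and the code only computes the values rather than touching hardware.

```python
# Bit positions assume the xAPIC-mode ICR layout; all names are illustrative.
ICR_DELIVERY_FIXED = 0b000 << 8   # delivery mode: fixed
ICR_DEST_PHYSICAL = 0 << 11       # destination mode: physical APIC ID
ICR_LEVEL_ASSERT = 1 << 14        # level: assert


def make_ipi_icr(dest_apic_id, vector):
    """Return (icr_high, icr_low) for a fixed IPI to one Local APIC."""
    if not 0 <= dest_apic_id <= 0xFF:
        raise ValueError("xAPIC destination IDs are 8 bits wide")
    if not 0 <= vector <= 0xFF:
        raise ValueError("interrupt vectors are 8 bits wide")
    icr_low = vector | ICR_DELIVERY_FIXED | ICR_DEST_PHYSICAL | ICR_LEVEL_ASSERT
    icr_high = dest_apic_id << 24  # destination field: bits 56-63 of the ICR
    return icr_high, icr_low


# Software writes the high half first; writing the low half sends the IPI.
icr_high, icr_low = make_ipi_icr(dest_apic_id=2, vector=0x82)
print(hex(icr_high), hex(icr_low))
```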

INTERRUPT HANDLING (CONT…)

Interrupts are identified from various fields in the interrupt request.

[Figure: fields of the interrupt request]
DEVICE INTERRUPTS

Two ways for a device to generate an interrupt request:
- I/O APIC
- Message Signaled Interrupt (MSI)

DEVICE INTERRUPTS

I/O APIC
- Used by devices that are connected to an I/O APIC in the system
- The interrupt request arrives at the I/O APIC, which forwards it to a Local APIC
- Multiple devices can be connected to the same I/O APIC input, thus sharing the interrupt
DEVICE INTERRUPTS

MSI
- A device generates an MSI by writing MSI Data to a special address called the MSI Address
- Each device can generate its own interrupts, so there is no need to share interrupts between multiple devices
- This is a much more flexible method and is required for PCIe (PCI Express) devices
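
To make the MSI Address/Data pair concrete, here is a minimal sketch of how the two values could be composed on an x86 system without interrupt remapping. It assumes the compatibility-format layout from the Intel manual (address window at 0xFEE00000, destination APIC ID in address bits 12-19, vector in the low byte of the data); the function name and example values are illustrative.

```python
MSI_ADDRESS_BASE = 0xFEE00000  # x86 MSI address window (assumed layout)


def make_msi(dest_apic_id, vector):
    """Return (msi_address, msi_data) for a fixed, edge-triggered MSI."""
    if not 0 <= dest_apic_id <= 0xFF:
        raise ValueError("destination APIC ID must fit in 8 bits")
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    # Redirection hint and destination mode bits are left at 0:
    # the interrupt goes to exactly the physical APIC ID given.
    address = MSI_ADDRESS_BASE | (dest_apic_id << 12)
    # Delivery mode bits (8-10) are 0 for fixed delivery, and the
    # trigger mode bit (15) is 0 for edge, so only the vector remains.
    data = vector
    return address, data


address, data = make_msi(dest_apic_id=1, vector=0x80)
print(hex(address), hex(data))
```

The device raises the interrupt by performing a memory write of `data` to `address`; this is why a device that can be programmed freely can also forge such writes.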

HYPER-V CURRENT I/O ARCHITECTURE
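[Figure: The child partition's networking stack sits on top of a VSC, which communicates over VMBUS with the VSP in the root partition. The root partition's networking stack and NIC driver access the physical NIC. The hypervisor runs beneath both partitions.]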
LIMITATIONS
- Significant cross-partition communication overhead
- Hardware features (such as RSS) are not utilized
- Extra memory copy overhead
- Root partition bottleneck

DEVICE ASSIGNMENT I/O ARCHITECTURE
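[Figure: The root partition and the child partition each run their own networking stack and NIC driver, and each NIC driver accesses its own physical NIC directly. The hypervisor runs beneath both partitions.]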
SECURITY ISSUES
- The child partition is untrusted
- The device assigned to the child partition has the ability to:
  - Perform DMA
  - Generate interrupt requests
- This means the child partition can program the device to:
  - Read or write memory belonging to the hypervisor, the root partition, or other partitions
  - Spoof interrupts from other devices
  - Mount a DoS attack by causing an interrupt storm
  - Effectively take over the whole system

WHAT IS NEEDED FOR SAFE DEVICE ASSIGNMENT
- Ability to restrict a device to accessing only its assigned child partition's memory
- Ability to prevent interrupt spoofing and interrupt storm attacks from such devices
- The answer: IOMMU

IOMMU OVERVIEW
WHAT IS AN IOMMU?
- Hardware that allows DMA requests to be programmed in terms of GPAs and verified against I/O page tables before accessing memory (DMA remapping)
- Hardware that allows interrupt requests to be verified against interrupt remapping tables and targeted to the specified processors (interrupt remapping)
- Implementations: Intel VT-d2 and AMD IOMMU

DMA REMAPPING

- Provides translation of GPA to SPA for DMA requests
- Uses I/O page tables for translation
- Supports domains
  - A domain is a container for one or more devices that share one I/O page table
  - I/O page tables ensure that a device can only access memory that belongs to its assigned domain
  - I/O page tables can only be accessed by privileged software, which prevents tampering by untrusted software

DMA REMAPPING (CONT.)

Device A requests DMA at 0x21000

Device A -> IOMMU: DMA request, address 0x21000 (GPA)

I/O page table used for Device A:
GPA        SPA
0x21000    0xCF000
0x22000    0xD0000
0x23000    0xD1000

IOMMU -> memory: DMA request, address 0xCF000 (SPA)
DMA REMAPPING (CONT.)

Device B requests DMA (uses different translation tables)

Device B -> IOMMU: DMA request, address 0x21000 (GPA)

I/O page table used for Device B:
GPA        SPA
0x21000    0xEA000
0x22000    0xEB000
0x23000    0xEC000

Other I/O page table (not used for Device B):
GPA        SPA
0x21000    0xCF000
0x22000    0xD0000
0x23000    0xD1000

IOMMU -> memory: DMA request, address 0xEA000 (SPA)
DMA REMAPPING (CONT.)

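Device C requests DMA

Device C -> IOMMU: DMA request, address 0x21000 (GPA)

I/O page table used for Device C:
GPA        SPA
0x21000    -
0x22000    -
0x23000    -

No translation exists for the requested GPA, so no DMA request is forwarded to memory.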
DMA REMAPPING
- Enables DMA to be programmed in terms of GPAs
- Allows DMA access to be restricted to a subset of physical memory
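
The Device A, B, and C examples above can be modeled with a small sketch. It assumes 4 KiB pages and a single-level lookup table per device, which is a simplification of real multi-level I/O page tables; the class and method names are illustrative, while the addresses are the ones from the slides.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


class DmaFault(Exception):
    """Raised when a DMA request has no valid translation."""


class ToyIommu:
    def __init__(self):
        self._tables = {}  # device name -> {GPA page: SPA page}

    def set_table(self, device, table):
        self._tables[device] = dict(table)

    def translate(self, device, gpa):
        """Translate a device's GPA to an SPA, or raise DmaFault."""
        table = self._tables.get(device, {})
        page = gpa & ~(PAGE_SIZE - 1)
        offset = gpa & (PAGE_SIZE - 1)
        if page not in table:
            raise DmaFault(f"{device}: no mapping for GPA {gpa:#x}")
        return table[page] | offset


iommu = ToyIommu()
iommu.set_table("Device A", {0x21000: 0xCF000, 0x22000: 0xD0000, 0x23000: 0xD1000})
iommu.set_table("Device B", {0x21000: 0xEA000, 0x22000: 0xEB000, 0x23000: 0xEC000})
iommu.set_table("Device C", {})  # nothing mapped: every request faults

print(hex(iommu.translate("Device A", 0x21000)))
print(hex(iommu.translate("Device B", 0x21000)))
```

The same GPA resolves to different SPAs for Device A and Device B because each lookup starts from the table selected for the requesting device, and a request from Device C raises `DmaFault` instead of reaching memory.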

INTERRUPT REMAPPING
- Allows software to set up an interrupt remapping table (IRT)
- Interrupt requests from devices are intercepted and routed based on the IRT
- Restricts the interrupt requests from a device to the assigned vector on the assigned processor

INTERRUPT REMAPPING (CONT.)
Device BDF-A -> IOMMU: interrupt request, index* 2

Interrupt remapping table:
Device ID    Interrupt Index    Target
BDF-A        1                  LP 1, Vector 0x80
BDF-A        2                  LP 2, Vector 0x82
BDF-B        3                  LP 1, Vector 0x81
BDF-C        4                  LP 2, Vector 0x91

IOMMU -> Logical Processor 2: interrupt request, LP 2, Vector 0x82

*The interrupt index is calculated from the MSI Data/Address pair or the I/O APIC RTE.
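
The lookup in the figure can be sketched as follows. The table contents are taken from the slide; treating each entry as "index selects the entry, device ID must match" is an assumption about how the check is performed, and the names are illustrative.

```python
# Interrupt index -> (owning device ID, target LP, vector), from the slide.
INTERRUPT_REMAPPING_TABLE = {
    1: ("BDF-A", 1, 0x80),
    2: ("BDF-A", 2, 0x82),
    3: ("BDF-B", 1, 0x81),
    4: ("BDF-C", 2, 0x91),
}


def remap_interrupt(device_id, index):
    """Return (lp, vector) for a valid request, or None if it is blocked."""
    entry = INTERRUPT_REMAPPING_TABLE.get(index)
    if entry is None:
        return None  # no entry programmed for this index
    owner, lp, vector = entry
    if owner != device_id:
        return None  # the requester does not own this entry: spoof attempt
    return lp, vector


print(remap_interrupt("BDF-A", 2))  # request shown in the figure
print(remap_interrupt("BDF-B", 2))  # BDF-B trying to use BDF-A's entry
```

Because the device can only select an index, and never a raw processor or vector, it cannot deliver an interrupt anywhere that privileged software has not programmed for it.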
HYPERVISOR IOMMU ARCHITECTURE
- Introduces two new concepts: Device Domain and Device Interrupt
- Provides support for DMA remapping and interrupt remapping to multiple partitions
- Abstracts hardware differences between the AMD IOMMU and Intel VT-d2 implementations

DEVICE DOMAIN
- A device domain is a container for one or more devices
- Contains one I/O page table
- Associated with a partition
- Devices can be attached to (and detached from) a domain
- All DMA requests from an attached device go through the domain's I/O page table
- Uses PCI Segment/Bus/Device/Function as the device identifier
- One identity-mapped domain per partition
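
As a minimal sketch of the bookkeeping this implies, the code below models a device domain and the attach/detach operations. All class and method names are hypothetical and are not the hypervisor's interface; the rule that a device belongs to at most one domain at a time is an assumption made for the sketch.

```python
from collections import namedtuple

# PCI Segment/Bus/Device/Function, used as the device identifier.
Sbdf = namedtuple("Sbdf", "segment bus device function")


class DeviceDomain:
    def __init__(self, partition_id):
        self.partition_id = partition_id  # partition the domain belongs to
        self.io_page_table = {}           # GPA page -> SPA page
        self.devices = set()              # attached Sbdf identifiers


class DomainManager:
    def __init__(self):
        self._owner = {}  # Sbdf -> DeviceDomain

    def attach(self, domain, sbdf):
        # Assumption: a device is attached to at most one domain.
        if sbdf in self._owner:
            raise ValueError("device is already attached to a domain")
        self._owner[sbdf] = domain
        domain.devices.add(sbdf)

    def detach(self, sbdf):
        domain = self._owner.pop(sbdf)  # KeyError if not attached
        domain.devices.remove(sbdf)

    def page_table_for(self, sbdf):
        """I/O page table that governs this device's DMA, if any."""
        domain = self._owner.get(sbdf)
        return None if domain is None else domain.io_page_table


manager = DomainManager()
child_domain = DeviceDomain(partition_id=1)
nic = Sbdf(segment=0, bus=3, device=0, function=0)
manager.attach(child_domain, nic)
```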

DEVICE DOMAIN
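[Figure: A device assigned to the child partition issues DMA through the IOMMU, which translates it using the I/O page table that the hypervisor set up for the child partition. The root partition and the child partition run above the hypervisor.]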
DEVICE INTERRUPTS

Hardware Interrupt Remapping
- Ensures that a device interrupt arrives on a specific physical processor and interrupt vector
- Ensures that a device can't spoof an interrupt request

Software Interrupt Remapping
- Maps a specific hardware interrupt from an LP to a VP

Routing an interrupt from a device to a specific (VP, vector) pair requires both software and hardware interrupt remapping
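
The two stages can be chained in a short sketch. The hardware table entry reuses the BDF-A example from the interrupt remapping slide; the software table entry (partition name, VP number, and virtual vector) is a hypothetical value chosen only for illustration, as are all function and variable names.

```python
# Stage 1 (IOMMU): (device ID, interrupt index) -> (LP, hardware vector).
HARDWARE_TABLE = {
    ("BDF-A", 2): (2, 0x82),
}

# Stage 2 (hypervisor): (LP, hardware vector) -> (partition, VP, virtual vector).
# These target values are hypothetical.
SOFTWARE_TABLE = {
    (2, 0x82): ("child-1", 0, 0x51),
}


def route_device_interrupt(device_id, index):
    """Return (partition, vp, vector), or None if either stage rejects."""
    hw_target = HARDWARE_TABLE.get((device_id, index))
    if hw_target is None:
        return None  # blocked by hardware interrupt remapping
    # The interrupt arrives on the LP; the hypervisor then forwards it
    # to the virtual processor registered for that (LP, vector) pair.
    return SOFTWARE_TABLE.get(hw_target)


print(route_device_interrupt("BDF-A", 2))
```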
DEVICE INTERRUPTS
[Figure: A device assigned to child partition 1 sends its interrupt through the IOMMU's hardware interrupt remapping table, which targets Logical Processor 0 or 1. The hypervisor's software interrupt remapping table then routes the interrupt to a virtual processor. Child partition 1, child partition 2, and the root partition each have virtual processors.]
REFERENCES

- AMD IOMMU specifications: http://www.amd.com/usen/assets/content_type/white_papers_and_tech_docs/34434.pdf
- Intel VT-d2 specifications: ftp://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf