IEEE Hot Interconnects 18, 2010

Multi-Root Share of Single-Root I/O Virtualization (SR-IOV) Compliant PCI Express Device

Jun Suzuki, Teruyuki Baba, Yoichi Hidaka†, Nobuharu Kami, Junichi Higuchi, Takashi Yoshikawa
System Platforms Research Laboratories, NEC
† IP Network Division, NEC
© NEC Corporation 2010

Background

Efficient Resource Use with I/O Device Sharing

Inside Single Computer
• Common I/O interface: virtio
• Direct access between VM and device: VT-d + SR-IOV

Among Multiple Computers
• Further resource efficiency, reduction of space and power consumption

[Figure: VMs in one computer sharing an I/O device via virtio in the hypervisor or via VT-d + SR-IOV; several computers sharing I/O devices]

Requirements for Device Sharing among Multiple Computers

Interoperability with Conventional System
• Conventional PCIe is a single-computer system
• Modification of the specification should be minimal

Performance
• Efficient resource sharing

[Figure: performance versus interoperability chart positioning PCIe MR-IOV and Device Controller]

Related Works

(A) Device Controller: Low Performance
• Provides I/O services for network-connected clients
• Uses InfiniBand to mitigate the performance bottleneck
J. Satran et al., “Scalable I/O – a Well-Architected Way to Do Scalable, Secure and Virtualized I/O”, USENIX WIOV’08, 2008.

[Figure: clients connected to a device controller with a conventional I/O device]

(B) PCIe Multi-Root I/O Virtualization (MR-IOV): Hardware Modification, High Performance
• PCIe standard enhancement for multi-host sharing
• Changes the hardware of the switch and the I/O device
“Multi-Root I/O Virtualization and Sharing Specification Revision 1.0”, PCI-SIG, 2008.

[Figure: PCIe buses connected through an MR PCIe switch to an MR I/O device]

Recent PCIe Hardware Enhancement for Device Sharing

MR-IOV adoption is slow, but SR-IOV is widely accepted

Multi-Root I/O Virtualization (MR-IOV)
• Large modification in hardware and driver
• Changes the hardware of the PCIe switch and the I/O device

[Figure: PCIe buses connected through an MR PCIe switch to an MR I/O device]

Single-Root I/O Virtualization (SR-IOV)
• Direct access to device from VMs
• NICs and storage devices are commercially available

[Figure: VMs in one computer directly accessing an SR I/O device; the slide relates conventional PCIe, SR-IOV, and MR-IOV]

Our Proposal: Sharing “SR-IOV” Device among Computers Using “ExpEther” Interconnect (PCIe over Ethernet) [1]

Achieve resource sharing as efficient as MR-IOV using common SR-IOV devices

[Figure: computers, each with a driver and an ExpEther (EE) bridge, connected over standard Ethernet to an EE bridge in an I/O expansion box holding an SR-IOV device]
[Figure: performance versus interoperability chart positioning PCIe MR-IOV, Device Controller, and the proposal using an SR-IOV device]

[1] J. Suzuki et al., “ExpressEther – Ethernet-Based Virtualization Technology for Reconfigurable Hardware Platform”, 14th IEEE Symposium on High-Performance Interconnects, pages 45-51, 2006.

Our Approach for Sharing SR-IOV Device

ExpEther as interconnect platform between computers and I/O devices via standard Ethernet

Device access intervention at I/O-side ExpEther bridge
• Assigning VF with Address Translation
• Virtual I/O Endpoint
• Hardware Packet Processing

Benefits: no modification to device or driver, high performance

ExpEther as Interconnect Platform
• PCIe space separation with Gr-ID (VLAN), native PCIe hot-plug
• PCIe bus extension with virtual PCIe switch
• Reliable and low-latency transport of PCIe packets encapsulated in Ethernet frames [1]

[Figure: Computers A, B, and C, each on a PCIe bus with an ExpEther bridge, connected over Ethernet (labeled as a virtual PCIe switch, with a system manager) to ExpEther bridges on the PCIe buses of I/O Devices A, B, and C; Gr-ID divides the bridges into Group 1 and Group 2]

[1] H. Shimonishi et al., “A Congestion Control Algorithm for Data Center Area Communications”, 2008 International CQR Workshop.

SR-IOV Device
• Virtual Function (VF) communicates directly with VM (DomU)
• VF is memory-mapped to DomU; DMA runs between VF and DomU
• Physical Function (PF) receives device configuration

[Figure: case for Xen. In one computer, Dom0 runs the PCI driver, SR-PCIM, and the PF driver; DomU A and DomU B each run a PCI driver and a VF driver. The SR-IOV-compliant I/O device exposes PF, VF A, and VF B, each with configuration registers and MMIO]

Device Access Intervention at I/O-Side ExpEther Bridge

Assigning VF to Each Computer
(1) VF Assignment with Address Translation
(2) Virtual I/O Endpoint
(3) HW Packet Processing

[Figure: Computers A and B each run a PCI driver and a VF driver. The I/O-side ExpEther bridge holds a virtual configuration register and address translation for each computer, together with the PCI driver, SR-PCIM, and PF driver. The SR-IOV-compliant I/O device exposes PF, VF A, and VF B, each with configuration registers and MMIO]

(1) Address Translation for VF Assignment
• An SR-IOV device needs to be mapped into a single address space
• The bridge creates a device address space and remaps it into the address space of each computer
• Addresses of PCIe packets are translated at the bridge

[Figure: the device address space (starting at 0x0) holds PF, VF A MMIO, and VF B MMIO of the SR-IOV I/O device; address translation in the I/O-side EE bridge maps VF A to VF A' in Computer A's address space and VF B to VF B' in Computer B's address space (32 or 64 bit), where the VF drivers access them]

(2) Virtual I/O Endpoint
• Configuration access is forwarded to a virtual register, to avoid interference
• Only driver access is forwarded to the device
• Actual device configuration is performed by the ExpEther bridge

[Figure: in Computer A, configuration access from the PCI driver ends at the virtual configuration register in the I/O-side EE bridge, while driver access from the VF driver passes through address translation to the MMIO of VF A in the SR-IOV I/O device]

(3) Hardware Packet Processing
• Address translation of PCIe packets is done in hardware
• High-speed forwarding and efficient sharing of I/O device

[Figure: the VF driver in Computer A reaches the MMIO of VF A through the hardware forwarding engine in the I/O-side EE bridge, which performs address translation using an address translation table]

Prototype Implementation

[Figure: I/O expansion box and I/O-side ExpEther bridge card. The bridge is an FPGA with two 10GbE XFP ports; the box provides PCIe Gen1 x8 slots for the SR-IOV device and for the ExpEther bridge]
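
The bridge-side mechanisms (1) and (2) can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the ExpEther implementation: the names `VfWindow`, `IoSideBridge`, `config_write`, `config_read`, and `host_to_device`, the table layout, and all addresses are hypothetical, and the real bridge performs the lookup in a hardware forwarding engine as described in (3).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class VfWindow:
    """One row of a hypothetical address translation table."""
    host_base: int    # where the computer mapped its VF (computer address space)
    device_base: int  # where the VF's MMIO sits in the device address space
    size: int         # window length in bytes


@dataclass
class IoSideBridge:
    """Toy model of the I/O-side bridge; all names here are illustrative."""
    windows: Dict[str, VfWindow]
    virtual_config: Dict[str, Dict[int, int]] = field(default_factory=dict)

    def config_write(self, host: str, offset: int, value: int) -> None:
        # (2) Virtual I/O endpoint: a configuration write is kept in a
        # per-computer virtual register and is not forwarded to the device.
        self.virtual_config.setdefault(host, {})[offset] = value

    def config_read(self, host: str, offset: int) -> int:
        # Unwritten virtual registers read as 0 in this sketch.
        return self.virtual_config.get(host, {}).get(offset, 0)

    def host_to_device(self, host: str, addr: int) -> int:
        # (1) Address translation: rewrite an MMIO address from the
        # computer's address space into the device address space.
        window = self.windows[host]
        offset = addr - window.host_base
        if not 0 <= offset < window.size:
            raise ValueError("address outside the VF window assigned to " + host)
        return window.device_base + offset


# Hypothetical layout: VF A and VF B are adjacent in the device address space.
bridge = IoSideBridge(windows={
    "A": VfWindow(host_base=0xF0000000, device_base=0x0000, size=0x4000),
    "B": VfWindow(host_base=0xE0000000, device_base=0x4000, size=0x4000),
})
device_addr = bridge.host_to_device("B", 0xE0000010)  # lands inside VF B's MMIO
```

Keeping one window per computer means each VF driver only ever sees addresses in its own computer's address space, while the device sees a single consistent address space, which is the property slide (1) asks for.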

Experimental Setup

EE: ExpEther
ExpEther load-balancing enables 10GbE x2, PCIe Gen1 x8 transport

[Figure: Servers 1, 2, and 3, each with an EE bridge, connect through two 10GbE switches to the I/O box, which holds the EE bridge for device sharing and a PCIe SR-IOV 10GbE NIC; the NIC connects over 10GbE and an IP network to the benchmark client]

Server PCIe Tree
• Assigned VF is handled as a conventional I/O device
• Driven with the vendor VF driver

    [root@**** ~]# lspci -vt
    -[0000:00]-+-00.0  Intel Corporation 82X38/X48 Express DRAM Controller
               +-01.0-[0000:01]----00.0  Neterion Inc. X3100 Series 10 Gigabit Ethernet PCIe
               +-06.0-[0000:02]--+-00.0  Intel Corporation 82571EB Gigabit Ethernet Controller
               |                 \-00.1  Intel Corporation 82571EB Gigabit Ethernet Controller
               +-19.0  Intel Corporation 82566DM-2 Gigabit Network Connection

Shared Bandwidth of SR-IOV NIC
• Receive: 9.9Gb/s. Max efficiency: 99%
• Send case shows overhead. Max efficiency: 77%

[Figure: iperf results. Client bandwidth [Gb/s] (0 to 10) versus number of servers sharing the NIC (1, 2, 3), for receive and send, with contributions of Server 1, Server 2, and Server 3]

Receive: Client -> Server
Send: Server -> Client
MTU: 9000, txqueuelen: 12000

Send Overhead

The SR-IOV NIC has an implementation limit on the number of DMA read requests it can send at one time
Increasing this limit brings performance close to that of insertion into a host I/O slot

[Figure: the shared NIC sends up to N memory read requests to the server; the server returns completions with data for up to N requests]

TCP/IP Communication between Servers
• Inter-server communication via SR-IOV NIC
• Source of performance overhead is the same as in the client send test

[Figure: bandwidth [Gb/s] (0 to 10) of Server 1, Server 2, and Server 3, send and receive, for the cases Srv1 to Srv2, Srv1 to Srv2 and Srv3, and Srv2 and Srv3 to Srv1]

Conclusion

Achieved Sharing of SR-IOV Device among Multiple Computers
• Interconnects computers and I/O devices with standard Ethernet
• No modification to the SR-IOV device or its driver
• Max efficiency of shared device: 99%

Proposed Technologies to Share SR-IOV Device
• “PCIe over Ethernet”, ExpEther
• VF assignment with address translation
• Virtual I/O endpoint
• Hardware packet processing
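
The Send Overhead slide attributes the lower send efficiency to a cap of N DMA read requests that the NIC can have in flight at one time. One way to see why such a cap matters is a simple bandwidth-delay bound: with at most N reads outstanding per round trip, throughput cannot exceed N times the read size divided by the round-trip time. The sketch below is an illustrative model only; the function names and every numeric value are hypothetical and are not measurements from the slides.

```python
def read_limited_gbps(max_outstanding: int, read_bytes: int, round_trip_us: float) -> float:
    """Upper bound in Gb/s when at most `max_outstanding` DMA reads of
    `read_bytes` each can complete per round trip of `round_trip_us`."""
    bits_per_round_trip = max_outstanding * read_bytes * 8
    return bits_per_round_trip / (round_trip_us * 1e-6) / 1e9


def send_bandwidth_gbps(link_gbps: float, max_outstanding: int,
                        read_bytes: int, round_trip_us: float) -> float:
    """Send throughput is capped by the link rate or by the
    outstanding-read bound, whichever is lower."""
    return min(link_gbps, read_limited_gbps(max_outstanding, read_bytes, round_trip_us))


# Hypothetical numbers: 10 Gb/s link, 512-byte reads, 2 microsecond round trip.
limited = send_bandwidth_gbps(10.0, max_outstanding=4, read_bytes=512, round_trip_us=2.0)
raised = send_bandwidth_gbps(10.0, max_outstanding=8, read_bytes=512, round_trip_us=2.0)
```

With these made-up values the bound is below the link rate for N = 4 and reaches the link rate for N = 8, which matches the slide's qualitative point that raising the limit recovers performance.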