IEEE Hot Interconnects 18, 2010
Multi-Root Share of Single-Root I/O Virtualization
(SR-IOV) Compliant PCI Express Device
Jun Suzuki
Teruyuki Baba
Yoichi Hidaka†
Nobuharu Kami
Junichi Higuchi
Takashi Yoshikawa
System Platforms Research Laboratories, NEC
† IP Network Division, NEC
Background
Efficient Resource Use with I/O Device Sharing
Inside Single Computer
• Common I/O interface: virtio
• Direct access between VM and device: VT-d + SR-IOV
Among Multiple Computers
• Further resource efficiency, reduction of space and power consumption
[Figure: inside a single computer, VMs reach an I/O device through the hypervisor (virtio) or directly (VT-d + SR-IOV); among multiple computers, I/O devices are shared between computers.]
© NEC Corporation 2010
Requirements for Device Sharing among Multiple Computers
Interoperability with Conventional System
• Conventional PCIe assumes a single-computer system
• Modifications to the specification should be minimal
Performance
• Efficient resource sharing
[Figure: performance versus interoperability chart placing PCIe MR-IOV and Device Controller.]
Related Works
(A) Device Controller
Conventional I/O Device, Low Performance
• Provides I/O services for network-connected clients
• Uses InfiniBand to mitigate the performance bottleneck
J. Satran et al., “Scalable I/O – a Well-Architected Way to Do Scalable, Secure and Virtualized I/O”, USENIX WIOV’08, 2008.
[Figure: clients connected to a device controller that fronts an I/O device.]
(B) PCIe Multi-Root I/O Virtualization (MR-IOV)
Hardware Modification, High Performance
• PCIe standard enhancement for multi-host sharing
• Requires hardware changes to the switch and I/O device
“Multi-Root I/O Virtualization and Sharing Specification Revision 1.0”, PCI-SIG, 2008.
[Figure: PCIe bus, MR PCIe switch, and MR I/O device.]
Recent PCIe Hardware Enhancement for Device Sharing
MR-IOV adoption is slow, but SR-IOV is widely accepted
[Figure: MR-IOV and SR-IOV shown as enhancements of conventional PCIe.]
Multi-Root I/O Virtualization (MR-IOV)
• Large modification in hardware and driver
• Requires hardware changes to the PCIe switch and I/O device
[Figure: PCIe bus, MR PCIe switch, and MR I/O device.]
Single-Root I/O Virtualization (SR-IOV)
• Direct access to device from VMs
• NICs and storage devices are commercially available
[Figure: a computer with three VMs accessing an SR I/O device.]
Our Proposal: Sharing “SR-IOV” Device among Computers
Using “ExpEther” Interconnect – PCIe over Ethernet [1]
Achieve resource sharing as efficient as MR-IOV using common SR-IOV devices
[Figure: computers, each with a driver and an ExpEther (EE) bridge, connect over standard Ethernet to an EE bridge in an I/O expansion box holding an SR-IOV device. A performance versus interoperability chart places the proposal using SR-IOV alongside PCIe MR-IOV and Device Controller.]
[1] J. Suzuki et al., “ExpressEther – Ethernet-Based Virtualization Technology for Reconfigurable Hardware Platform”, 14th IEEE
Symposium on High-Performance Interconnects, pages 45-51, 2006.
Our Approach for Sharing SR-IOV Device
ExpEther as interconnect platform between computers and I/O devices via standard Ethernet
Device access intervention at I/O-side ExpEther bridge
• Assigning VF with Address Translation
• Virtual I/O Endpoint
• Hardware Packet Processing
No Modification to Device, Driver; High Performance
ExpEther as Interconnect Platform
• PCIe space separation with Gr-ID (VLAN), native PCIe hot-plug
• PCIe bus extension with virtual PCIe switch
• Reliable and low-latency transport of PCIe-encapsulated Ethernet frames[1]
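The Gr-ID separation above can be pictured with a small sketch. It assumes only what the slide states, that bridges sharing a Gr-ID (carried as a VLAN) form one PCIe space; the `Bridge` class, the helper names, and the group membership below are hypothetical and not taken from the ExpEther implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str   # hypothetical label for an ExpEther bridge
    gr_id: int  # group ID, carried as a VLAN ID per the slide


def pcie_spaces(bridges):
    """Group bridges by Gr-ID; each group is treated as one separate PCIe space."""
    groups = defaultdict(list)
    for bridge in bridges:
        groups[bridge.gr_id].append(bridge.name)
    return dict(groups)


def can_reach(a, b):
    """Bridges exchange encapsulated PCIe packets only inside the same group."""
    return a.gr_id == b.gr_id


# Made-up membership, for illustration only.
bridges = [
    Bridge("computer-A", 1),
    Bridge("io-device-A", 1),
    Bridge("computer-B", 2),
    Bridge("io-device-B", 2),
]
spaces = pcie_spaces(bridges)
```

Moving a device to another computer then amounts to changing one bridge's `gr_id`, which is how the sketch reflects the reconfigurable grouping.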
[Figure: Computers A to C and I/O Devices A to C each attach by a PCIe bus to an ExpEther bridge. The bridges connect over Ethernet and form a virtual PCIe switch, with a system manager and Gr-IDs separating Group 1 and Group 2.]
[1] H. Shimonishi et al., “A Congestion Control Algorithm for Data Center Area Communications”, 2008 International CQR Workshop.
SR-IOV Device
• Virtual Function (VF) directly communicates with VM (DomU)
• VF is memory-mapped to DomU, DMA between VF and DomU
• Physical Function (PF) receives device configuration
Case for Xen
[Figure: a computer runs Dom0 (PCI driver, SR-PCIM, PF driver) and DomU A and DomU B (each with a PCI driver and VF driver). The SR-IOV-compliant I/O device exposes PF, VF A, and VF B, each with configuration registers and MMIO.]
Device Access Intervention at I/O-Side ExpEther Bridge
Assigning VF to Each Computer
[Figure: Computers A and B, each with a PCI driver and VF driver, connect to the I/O-side ExpEther bridge. The bridge hosts the PCI driver, SR-PCIM, and PF driver, and provides a virtual configuration register and address translation per computer. Labeled steps: (1) VF assignment with address translation, (2) virtual I/O endpoint, (3) HW packet processing. The SR-IOV-compliant I/O device exposes PF, VF A, and VF B, each with configuration registers and MMIO.]
(1) Address Translation for VF Assignment
• An SR-IOV device needs to be mapped into a single address space
• A device address space is created and remapped into each computer's address space
• PCIe packet addresses are translated at the bridge
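The remapping described above can be sketched as a per-computer window lookup. This is a minimal illustration under stated assumptions, not the bridge's actual logic: the `Window` type, the `to_device` helper, and every base address and size below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    host_base: int    # where the VF appears in the computer's address space
    device_base: int  # where the VF's MMIO sits in the device address space
    size: int         # window length in bytes


def to_device(window, host_addr):
    """Translate a computer-side address into the device address space."""
    offset = host_addr - window.host_base
    if not 0 <= offset < window.size:
        raise ValueError("address outside the assigned VF window")
    return window.device_base + offset


# Hypothetical translation table held at the I/O-side bridge, one entry per computer.
table = {
    "A": Window(host_base=0xF000_0000, device_base=0x0010_0000, size=0x4000),
    "B": Window(host_base=0xE000_0000, device_base=0x0010_4000, size=0x4000),
}

addr_a = to_device(table["A"], 0xF000_0010)
addr_b = to_device(table["B"], 0xE000_0010)
```

Each computer keeps whatever base address its own PCI enumeration assigned, while the device sees one contiguous space in which VF A and VF B do not overlap.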
[Figure: the VF drivers on Computers A and B use 32- or 64-bit addresses. The I/O-side EE bridge translates between the Computer A address space (VF A'), the Computer B address space (VF B'), and the device address space starting at 0x0, which holds PF, VF A MMIO, and VF B MMIO of the SR-IOV I/O device.]
(2) Virtual I/O Endpoint
• Configuration accesses are forwarded to a virtual register, to avoid interference
• Only driver accesses are forwarded to the device
• Actual device configuration is performed by the ExpEther bridge
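The split between configuration access and driver access can be sketched as a dispatcher inside the bridge. This is an illustrative model only: the `VirtualEndpoint` class, the `"config"`/`"memory"` labels, and the dictionary standing in for VF MMIO are hypothetical, and the bridge's own configuration of the real device is not modeled.

```python
class VirtualEndpoint:
    """Per-computer view of one VF as seen through the I/O-side bridge."""

    def __init__(self, device_mmio):
        self.virtual_config = {}        # virtual configuration register kept in the bridge
        self.device_mmio = device_mmio  # stand-in for the VF's MMIO region on the device

    def access(self, kind, offset, value=None):
        """Handle one access; a read when value is None, otherwise a write."""
        if kind == "config":
            target = self.virtual_config  # absorbed by the bridge, never reaches the device
        elif kind == "memory":
            target = self.device_mmio     # driver access, forwarded to the device
        else:
            raise ValueError(f"unknown access kind: {kind}")
        if value is None:
            return target.get(offset, 0)
        target[offset] = value
        return None


vf_a_mmio = {}
endpoint = VirtualEndpoint(vf_a_mmio)
endpoint.access("config", 0x10, 0xF0000000)  # e.g. a base address written by the host PCI driver
endpoint.access("memory", 0x00, 0x1)         # e.g. a register write by the VF driver
```

Because configuration writes from one computer land only in its own virtual register, they cannot disturb the settings another computer or the bridge applied to the shared device.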
[Figure: on Computer A, configuration access from the PCI driver terminates at the virtual configuration register in the I/O-side EE bridge, while driver access from the VF driver passes through address translation to VF A MMIO on the SR-IOV I/O device. VF A's configuration register on the device is not accessed directly by the computer.]
(3) Hardware Packet Processing
• Address translation of PCIe packets in hardware
• High-speed forwarding and efficient sharing of I/O device
[Figure: the VF driver on Computer A accesses VF A MMIO on the SR-IOV I/O device through the I/O-side EE bridge, where a hardware forwarding engine performs address translation using an address translation table.]
Prototype Implementation
[Figure: I/O expansion box and I/O-side ExpEther bridge.]
• I/O-side ExpEther bridge: ExpEther bridge FPGA, 10GbE XFP ports
• I/O expansion box: PCIe Gen1 x8 slot for the SR-IOV device, PCIe Gen1 x8 slot for the ExpEther bridge
Experimental Setup
EE: ExpEther
[Figure: Servers 1 to 3, each with an EE bridge, connect through two 10GbE switches to the EE bridge for device sharing in the I/O box, which attaches over PCIe to an SR-IOV 10GbE NIC. The NIC connects over 10GbE through an IP network to the benchmark client.]
ExpEther load-balancing enables 10GbE x2, PCIe Gen1 x8 transport
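The experimental setup pairs two 10GbE links with a PCIe Gen1 x8 slot. A rough check of why two links fit that slot, assuming the standard PCIe Gen1 figures of 2.5 GT/s per lane with 8b/10b encoding and ignoring Ethernet and encapsulation overhead (the function and constant names are illustrative):

```python
PCIE_GEN1_GT_PER_LANE = 2.5   # GT/s per lane for PCIe Gen1
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line encoding


def pcie_gen1_data_rate_gbps(lanes):
    """Raw data rate per direction of a PCIe Gen1 link, in Gb/s."""
    return lanes * PCIE_GEN1_GT_PER_LANE * ENCODING_EFFICIENCY


pcie_rate = pcie_gen1_data_rate_gbps(8)  # x8 link
ethernet_rate = 2 * 10.0                 # two 10GbE links, raw line rate

print(f"PCIe Gen1 x8: {pcie_rate:.1f} Gb/s, 10GbE x2: {ethernet_rate:.1f} Gb/s")
```

Under these assumptions a single 10GbE link falls short of the x8 link's 16 Gb/s, while two load-balanced links exceed it.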
Server PCIe Tree
• Assigned VF is handled as conventional I/O device
• Driven with vendor VF driver
[root@**** ~]# lspci -vt
-[0000:00]-+-00.0  Intel Corporation 82X38/X48 Express DRAM Controller
           +-01.0-[0000:01]----00.0  Neterion Inc. X3100 Series 10 Gigabit Ethernet PCIe
           +-06.0-[0000:02]--+-00.0  Intel Corporation 82571EB Gigabit Ethernet Controller
           |                 \-00.1  Intel Corporation 82571EB Gigabit Ethernet Controller
           +-19.0  Intel Corporation 82566DM-2 Gigabit Network Connection
Shared Bandwidth of SR-IOV NIC
• Receive: 9.9Gb/s. Max efficiency: 99%
• Send case shows overhead. Max efficiency: 77%
[Figure: iperf results. Client bandwidth [Gb/s] (0 to 10) versus #Servers Sharing NIC (1 to 3), stacked by Server 1, Server 2, and Server 3, for the Receive and Send cases.]
Receive: Client -> Server
Send: Server -> Client
MTU: 9000, txqueuelen: 12000
Send Overhead
The SR-IOV NIC has an implementation limit on the number of DMA read requests it can issue at one time
Increasing this limit brings performance close to that of inserting the NIC in a host I/O slot
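Why a cap on outstanding read requests limits send bandwidth can be sketched with a simple in-flight-data bound: at most N requests' worth of data can be returned per round trip between the NIC and server memory. The function name and all numbers below are hypothetical; the slides give neither N, the request size, nor the latency.

```python
def read_throughput_gbps(outstanding, request_bytes, round_trip_us):
    """Upper bound on DMA-read throughput with a fixed number of requests in flight."""
    bits_in_flight = outstanding * request_bytes * 8
    return bits_in_flight / (round_trip_us * 1e3)  # bits per nanosecond equals Gb/s


# Hypothetical values: 512-byte reads, 4 microsecond round trip over the Ethernet path.
limited = read_throughput_gbps(outstanding=8, request_bytes=512, round_trip_us=4.0)
raised = read_throughput_gbps(outstanding=16, request_bytes=512, round_trip_us=4.0)

# The same limit with a much shorter round trip, as when the NIC sits in a host slot.
local = read_throughput_gbps(outstanding=8, request_bytes=512, round_trip_us=1.0)
```

The bound scales linearly with the limit and inversely with the round-trip time, which is consistent with the slide's point that the added path latency only hurts when N is small.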
[Figure: the shared NIC issues up to N memory read requests to the server, and the server returns completions with data for up to N requests.]
TCP/IP Communication between Servers
• Inter-server communication via SR-IOV NIC
• The source of the performance overhead is the same as in the client send test
[Figure: Server 1 bandwidth [Gb/s] (0 to 10), stacked by Server 2 and Server 3. Send cases: Srv1 to Srv2; Srv1 to Srv2 and Srv3. Receive case: Srv2 and Srv3 to Srv1.]
Conclusion
Achieved Sharing of SR-IOV Device among Multiple Computers
• Interconnects computers and I/O devices with standard Ethernet
• No modification to the SR-IOV device or its driver
• Max efficiency of shared device: 99%
Proposed Technologies to Share SR-IOV Device
• “PCIe over Ethernet”, ExpEther
• VF assignment with address translation
• Virtual I/O endpoint
• Hardware packet processing