Fido: Fast Inter-Virtual-Machine Communication for Enterprise Appliances †

Anton Burtsev†, Kiran Srinivasan, Prashanth Radhakrishnan, Lakshmi N. Bairavasundaram, Kaladhar Voruganti, Garth R. Goodson
†University of Utah, School of Computing
NetApp, Inc.
Enterprise appliances
• Network-attached storage, routers, etc.
• High performance
• Scalable and highly available access
Example Appliance
• Monolithic kernel
• Kernel components
Problems:
• Fault isolation
• Performance isolation
• Resource provisioning
Split architecture
Benefits of virtualization
• High availability
• Fault-isolation
• Micro-reboots
• Partial functionality in case of failure
• Performance isolation
• Resource allocation
• Consolidation and load balancing, VM migration
• Non-disruptive updates
• Hardware upgrades via VM migration
• Software updates as micro-reboots
• Computation to data migration
Main Problem: Performance
Is it possible to match the performance of a monolithic environment?
• Large amount of data movement between components
  • Mostly cross-core
  • Connection oriented (established once)
  • Throughput optimized (asynchronous)
  • Coarse grained (no one-word messages)
  • Multi-stage data processing
• Main cost contributors
  • Transitions to the hypervisor
  • Memory map/copy operations
  • Not VM context switches (multi-cores)
  • Not IPC marshaling
Main Insight: Relaxed Trust Model
• Appliance is built by a single organization
• Components:
  • Pre-tested and qualified
  • Collaborative and non-malicious
• Share memory read-only across VMs! (see the sketch below)
• Fast inter-VM communication
  • Exchange only pointers to data
  • No hypervisor calls (only cross-core notification)
  • No memory map/copy operations
  • Zero-copy across entire appliance
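Since all components are pre-qualified and non-malicious, a producer VM can grant its memory to peer VMs read-only once, at connection setup. A minimal sketch of what sharing one page might look like on Xen: gnttab_grant_foreign_access() is the standard grant-table call, while fido_share_page and its one-page-at-a-time use are illustrative assumptions, not Fido's actual code.

#include <xen/grant_table.h>   /* header name varies across XenLinux trees */

/* Hypothetical helper: grant one frame of this VM's memory to a
 * peer domain, mapped read-only on the peer's side. */
static int fido_share_page(domid_t peer, unsigned long frame,
                           grant_ref_t *ref_out)
{
        /* Non-zero last argument: the peer maps the page read-only,
         * so a faulty peer can read our data but never corrupt it. */
        int ref = gnttab_grant_foreign_access(peer, frame, 1);
        if (ref < 0)
                return ref;     /* out of grant references */
        *ref_out = (grant_ref_t)ref;
        return 0;
}

Because grants are established once per connection, the data path itself needs no hypervisor calls; only a cross-core notification remains.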
Contributions
• Fast inter-VM communication mechanism
• Abstraction of a single address space for traditional systems
• Case study
  • Realistic microkernelized network-attached storage
Design
Design Goals
• Performance
  • High throughput
• Practicality
  • Minimal guest system and hypervisor dependencies
  • No intrusive guest kernel changes
• Generality
  • Support for different communication mechanisms in the guest system
Transitive Zero Copy
• Goal
  • Zero-copy across entire appliance
  • No changes to guest kernel
• Observation
  • Multi-stage data processing
Pseudo Global Virtual Address Space
[Figure: the 2^64-byte virtual address space, partitioned into per-VM regions]
• Insight:
  • CPUs support a 64-bit address space
  • Individual VMs have no need for all of it (see the sketch below)
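One way to realize this insight: statically carve the 64-bit virtual address space into equal slots, one per VM, and have every VM map each peer's memory at the peer's own addresses. A pointer then means the same thing in every VM, and its owner is recoverable from the address alone. The slot size below is an illustrative assumption; the talk does not give Fido's actual layout.

/* Illustrative layout: each VM owns one fixed 512GB slot of the
 * shared 64-bit virtual address space. */
#define FIDO_SLOT_SHIFT        39                       /* 512GB per VM */
#define FIDO_SLOT_SIZE         (1UL << FIDO_SLOT_SHIFT)
#define FIDO_SLOT_BASE(vm_id)  ((unsigned long)(vm_id) << FIDO_SLOT_SHIFT)

/* Which VM allocated this address? Valid in any VM, since all VMs
 * share the same global layout. */
static inline unsigned int fido_addr_owner(unsigned long vaddr)
{
        return (unsigned int)(vaddr >> FIDO_SLOT_SHIFT);
}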
Fido: High-level View
• “c” – connection management
• “m” – memory mapping
• “s” – cross-VM signaling
IPC Organization
• Shared memory ring (see the sketch below)
  • Pointers to data
  • For complex data structures
    • Scatter-gather array
    • Translate pointers
• Signaling:
  • Cross-core interrupts (event channels)
  • Batching and in-ring polling
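A sketch of such a ring under stated assumptions: the field names, the 256-slot size, and fido_send are illustrative, not Fido's real structures. Ring entries are scatter-gather descriptors holding already-translated pseudo-global pointers; the producer kicks the consumer's event channel only when the consumer is not already polling, which batches cross-core interrupts.

#include <stdint.h>

#define FIDO_RING_SLOTS 256             /* illustrative size (power of two) */

struct fido_sg_entry {                  /* one fragment of a message     */
        uint64_t gaddr;                 /* pointer in the pseudo global  */
        uint32_t len;                   /* virtual address space         */
};

struct fido_ring {                      /* lives in shared memory        */
        volatile uint32_t prod;         /* producer cursor               */
        volatile uint32_t cons;         /* consumer cursor               */
        volatile uint32_t cons_polling; /* consumer is busy-polling      */
        struct fido_sg_entry slot[FIDO_RING_SLOTS];
};

/* Enqueue one fragment; returns -1 if the ring is full. notify()
 * stands in for a cross-core kick, e.g. a Xen event channel. */
static int fido_send(struct fido_ring *r, uint64_t gaddr, uint32_t len,
                     void (*notify)(void))
{
        if (r->prod - r->cons == FIDO_RING_SLOTS)
                return -1;              /* full: caller backs off        */

        struct fido_sg_entry *e = &r->slot[r->prod % FIDO_RING_SLOTS];
        e->gaddr = gaddr;               /* pass a pointer, not the data  */
        e->len   = len;

        __sync_synchronize();           /* publish entry before cursor   */
        r->prod++;

        if (!r->cons_polling)           /* batching: no interrupt while  */
                notify();               /* the peer is already polling   */
        return 0;
}

The payload itself never moves: both sides dereference gaddr directly, which is what keeps the path zero-copy.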
Fast device-level communication
• MMNet (see the sketch below)
  • Link-level
  • Standard network device interface
  • Supports full transitive zero-copy
• MMBlk
  • Block-level
  • Standard block device interface
  • Zero-copy on write
  • Incurs one copy on read
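To see how MMNet stays practical, here is a sketch of its transmit path as an ordinary Linux 2.6.18 network driver hook (dev->hard_start_xmit). mmnet_xmit, virt_to_global, and kick_peer are hypothetical names building on the ring sketch above, not MMNet's real symbols.

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/jiffies.h>

/* Illustrative primitives from the earlier sketches. */
struct fido_ring;
extern int fido_send(struct fido_ring *r, u64 gaddr, u32 len,
                     void (*notify)(void));
extern u64 virt_to_global(void *vaddr);  /* local -> pseudo-global addr */
extern void kick_peer(void);             /* cross-core event kick       */

/* Post the pseudo-global address of the payload instead of copying
 * it; the peer VM reads the data in place. The skb must stay alive
 * until the peer consumes it; that bookkeeping is elided here. */
static int mmnet_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct fido_ring *ring = netdev_priv(dev);

        if (fido_send(ring, virt_to_global(skb->data), skb->len,
                      kick_peer) < 0)
                return NETDEV_TX_BUSY;   /* ring full; the stack retries */

        dev->trans_start = jiffies;
        return NETDEV_TX_OK;
}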
Evaluation
MMNet Evaluation
• Compared configurations: Loop, NetFront, XenLoop, MMNet
• AMD Opteron with two 2.1GHz 4-core CPUs (8 cores total)
• 16GB RAM
• NVidia 1Gbps NICs
• 64-bit Xen (3.2), 64-bit Linux (2.6.18.8)
• Netperf benchmark (2.4.4)
MMNet: TCP Throughput
[Chart: throughput (Mbps, 0–12000) vs. message size (0.5KB–256KB) for Monolithic, Netfront, XenLoop, and MMNet]
MMBlk Evaluation
• Compared configurations: Monolithic, XenBlk, MMBlk
• Same hardware
  • AMD Opteron with two 2.1GHz 4-core CPUs (8 cores total)
  • 16GB RAM
  • NVidia 1Gbps NICs
• VMs are configured with 4GB and 1GB RAM
• 3GB in-memory file system (TMPFS)
• IOZone benchmark
MMBlk Sequential Writes
[Chart: throughput (MB/s, 0–600) vs. record size (4KB–4MB) for Monolithic, XenBlk, and MMBlk]
Case Study
Network-attached Storage
• RAM
  • VMs have 1GB each, except the FS VM (4GB)
  • Monolithic system has 7GB RAM
• Disks:
  • RAID5 over three 64MB/s disks
• Benchmark
  • IOZone reads/writes an 8GB file over NFS (async)
Sequential Writes
[Chart: throughput (MB/s, 0–90) vs. record size (4KB–4MB) for Monolithic, Native-Xen, and MM-Xen]
Sequential Reads
[Chart: throughput (MB/s, 0–80) vs. record size (4KB–4MB) for Monolithic, Native-Xen, and MM-Xen]
TPC-C (On-Line Transaction Processing)
[Chart: transactions per minute (tpmC, 0–350) for Monolithic, MM-Xen, and Native-Xen]
Conclusions
• We match monolithic performance
  • “Microkernelization” of traditional systems is possible!
• Fast inter-VM communication
  • The search for VM communication mechanisms is not over
• Important aspects of design
  • Trust model
  • VM as a library (for example, FSVA)
  • End-to-end zero copy
  • Pseudo Global Virtual Address Space
• There are still problems to solve
  • Full end-to-end zero copy
  • Cross-VM memory management
  • Full utilization of pipelined parallelism
Thank you.
aburtsev@flux.utah.edu
Backup Slides
Related Work
• Traditional microkernels [L4, EROS, Coyotos]
  • Synchronous (effectively thread migration)
  • Optimized for single-CPU: fast context switch, small messages (often in registers), efficient marshaling (IDL)
• Buffer management [Fbufs, IO-Lite, Beltway Buffers]
  • Shared buffer is a unit of protection
  • FastForward – fast cache-to-cache data transfer
• VMs [Xen split drivers, XWay, XenSocket, XenLoop]
  • Page flipping, later buffer sharing
  • IVC, VMCI
• Language-based protection [Singularity]
  • Shared heap, zero-copy (only pointer transfer)
• Hardware acceleration [Solarflare]
• Multi-core OSes [Barrelfish, Corey, fos]