Survey of State-of-the-art in Inter-VM Communication Mechanisms
Jian Wang

Outline
• Introduction
• Shared memory research
• Scheduler optimization research
• Challenges and problems

Introduction
[Figure: Virtual Machine A and Virtual Machine B on one physical machine, separated by the hypervisor (Virtual Machine Monitor)]
• Virtualization technology mainly focuses on building the isolation barrier between co-located VMs.
• However, applications often need to communicate across this isolation barrier, e.g. high-performance grid applications, web services, virtual network appliances, transaction processing, and graphics rendering.

Communication between co-located VMs over the standard network path is transparent to applications, BUT it has high overhead:

Metric                   Native loopback   Xen inter-VM
Flood ping RTT (µs)      6                 140
TCP bandwidth (Mbps)     4666              2656
UDP bandwidth (Mbps)     4928              707

Communication data path between co-located VMs
[Figure: packets are routed from VM 1 through Domain 0 to VM 2. VM 1 puts the packet into a page and asks Xen to transmit the page; Domain 0 asks Xen to swap/copy the pages to VM 2.]

Advantages of using shared memory
• No per-packet processing
• Pages are reused in a circular buffer
• Writes are visible immediately
• Fewer hypercalls (only for signaling)
[Figure: VM 1 allocates one pool of pages and asks Xen to share the pages with VM 2]

Design goals
1. Performance: high throughput, low latency, and acceptable CPU consumption.
2. Transparency: don't change the application; don't change the kernel.
3. Dynamism: on-the-fly setup/teardown of channels, automatic discovery, and migration support.

[Figure: CPU timeline over time slices t1 to t4 in which Dom1 and Dom2 are interleaved with other domains (DomX, …)]

Scheduler-induced delays
[Figure: a JBoss server sends query1 and query2 to a DB server and receives reply1 and reply2. On dedicated servers, each exchange incurs only network latency; on a consolidated server, scheduler-induced delays are added to each exchange.]

Challenges and problems
• Lack of communication awareness in the VCPU scheduler
  - The scheduler has no knowledge of the timing requirements of tasks/applications within each VM.
• Absence of support for real-time inter-VM interactions
• Unpredictability of current VM scheduling mechanisms

Goals
• Low latency
• Independence from other domains' workloads
• Predictability

Shared Memory Research

XenSocket (Xiaolan Zhang, Suzanne McIntosh)
• Shared memory between two domains
• One-way communication pipe
• Sits below the socket layer and bypasses the TCP/IP stack
• Limitations: no automatic discovery, no migration support, no transparency

XenSocket API compared with standard sockets
Standard (TCP/IP) sockets:
  Server: socket(); bind(sockaddr_inet); listen(); accept();
  Client: socket(); connect(sockaddr_inet);
  Addressing: remote address, remote port #, local port #
XenSocket:
  Server: socket(); bind(sockaddr_xen);      (specifies the remote VM #; the system returns a grant # for the client)
  Client: socket(); connect(sockaddr_xen);   (specifies the remote VM # and the remote grant #)

XWay (Kangho Kim, Cheiyol Kim)
• Bi-directional communication
• Transparent to applications
• Sits below the socket layer
• Limitations: significant kernel modifications, no migration support, TCP only
[Figure: Domain A and Domain B connected by an event channel; each side has a send queue (SQ) and a receive queue (RQ), each with head and tail pointers]

IVC (Wei Huang, Matthew Koop)
• IVC library providing efficient intra-physical-node communication through shared memory
• Provides automatic discovery and migration support
• Limitations: user transparency and kernel transparency are not fully supported; only the MPI protocol is supported
IVC consists of two parts:
• A user-space communication library
• A kernel driver
It uses a general socket-style interface.

MMNet (Prashanth Radhakrishnan, Kiran Srinivasan)
• Maps in the entire physical memory of the peer VM
• Zero copy between guest kernels
• Limitations: on-the-fly setup/teardown of channels is not supported; VMs must fully trust each other, which is not practical.
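The shared-memory channels above all rest on the same structure: a circular buffer in shared pages, indexed by head and tail pointers (as in XWay's SQ/RQ), where writes are visible immediately and the hypervisor is involved only for signaling. The sketch below is a single-threaded Python model of that idea, not any project's actual code. `SharedRing` and its `notify` callback are hypothetical; `notify` stands in for an event-channel signal, and the producer raises it only when the ring goes from empty to non-empty, which is one common way to keep hypercalls to a minimum. A real cross-VM ring also needs memory barriers and a race-free notification protocol, both of which this model ignores.

```python
class SharedRing:
    """Toy single-producer/single-consumer ring modeling a shared-memory FIFO.

    head and tail are free-running counters; the slot index is counter % size.
    In a real inter-VM channel these would live in shared pages (assumption).
    """

    def __init__(self, size, notify):
        self.size = size
        self.slots = [None] * size
        self.head = 0          # next slot the consumer will read
        self.tail = 0          # next slot the producer will write
        self.notify = notify   # hypothetical stand-in for an event-channel signal

    def __len__(self):
        return self.tail - self.head

    def put(self, item):
        """Producer side: returns False if the ring is full."""
        if len(self) == self.size:
            return False
        was_empty = len(self) == 0
        self.slots[self.tail % self.size] = item
        self.tail += 1         # publish the slot only after writing it
        if was_empty:
            self.notify()      # signal only on the empty -> non-empty transition
        return True

    def get(self):
        """Consumer side: returns None if the ring is empty."""
        if len(self) == 0:
            return None
        item = self.slots[self.head % self.size]
        self.head += 1
        return item


if __name__ == "__main__":
    signals = []
    ring = SharedRing(4, notify=lambda: signals.append("evtchn"))
    for pkt in ("p1", "p2", "p3"):
        ring.put(pkt)
    print(len(signals))            # 1: only the first put found the ring empty
    print(ring.get(), ring.get())  # p1 p2
```

Because slots are reused, no per-packet page allocation or page exchange is needed; the only cross-domain cost left is the occasional signal.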
XenLoop (Jian Wang, Kartik Gopalan)
• Enables direct traffic exchange between co-located VMs
• Transparency for applications and libraries
• Kernel transparency
• Automatic discovery of co-located VMs
• On-the-fly setup/teardown of XenLoop channels
• Migration transparency

XenLoop Architecture
[Figure: in Virtual Machine A and Virtual Machine B, the stack is Applications, Socket Layer, Transport Layer, Network Layer, XenLoop Layer, Netfront. The XenLoop layers of A and B exchange data through two FIFOs (A→B and B→A) and an event channel. Domain 0 runs the software bridge and domain discovery.]
• A Netfilter hook captures and examines outgoing packets.
• Lockless producer-consumer circular buffers (FIFOs) carry the data.
• A one-bit bidirectional event channel notifies the other endpoint that data is available in a FIFO.

Comparison of mechanisms

Property                              | XenSocket          | XWay               | IVC                        | MMNet          | XenLoop
User transparent                      | X                  | √                  | X                          | √              | √
Kernel transparent                    | √                  | X                  | X                          | √              | √
Transparent migration support         | X                  | X                  | Not fully transparent      | X              | √
Standard protocol support             | X                  | Only TCP           | Only MPI or app protocols  | √              | √
Auto VM discovery & connection setup  | X                  | X                  | √                          | √              | √
Complete memory isolation             | √                  | √                  | √                          | X              | √
Location in software stack            | Below socket layer | Below socket layer | User library + syscalls    | Below IP layer | Below IP layer
Copying overhead                      | 2 copies           | 2 copies           | 2 copies                   | 2 copies       | 4 copies at present

Scheduler Optimization Research
• Preferentially schedule communication-oriented domains
• Introduce short-term unfairness
• Trade off performance vs. fairness
• Address the characteristics of inter-VM communication

Sriram Govindan, Arjun R. Nath
• Prefer the VM with the most pending network packets, both to be sent and to be received
• Predict pending packets: receive prediction and send prediction
• Fairness: still preserve reservation guarantees over a coarser time scale (PERIOD)

Packet reception
[Figure: guest domains Domain 1 … Domain n and Domain0 above the hypervisor]
1. A packet arrives at the NIC: Domain0.pending++.
2. The NIC raises an interrupt; now schedule Domain0.
3. Domain0 passes the packet to Domain 1: Domain0.pending--, Domain1.pending++.
4. Schedule Domain 1.
5. Domain 1 consumes the packet: Domain1.pending--.

Diego Ongaro, Alan L. Cox
• Boosting I/O domains: used when an idle domain is sent a virtual interrupt
• Run-queue ordering: within each state, sort domains by credits remaining
• Tickling too soon: don't tickle while sending virtual interrupts

Hwanju Kim, Hyeontaek Lim
• Use task information to determine whether a domain that receives an event notification is I/O-bound
• Give the domain a partial boost if it is I/O-bound
• Partial boosting: a partially boosted VCPU can preempt a running VCPU and handle the pending event. Whenever the VCPU is inferred to be non-I/O-bound, the VMM revokes the CPU from it.
• Use correlation information to predict whether an event is directed at I/O tasks (block I/O, network I/O)

Jian Wang, Kartik Gopalan
[Figure: CPU timeline over time slices t1 to t4 with Dom1, DomX and Dom2: Dom2 cannot get a time slice as early as possible]
[Figure: within one time slice (30 ms), one-way AICT (Dom1 to Dom2) and two-way AICT (Dom1 to Dom2 and back to Dom1)]
• Basic idea: donate unused time slices to the target domain.
• Proper accounting: when the source domain donates a time slice to the target guest, charge the credits to the source domain instead of the target domain.

Real-time guarantee
• Coordinate with the guest scheduler
• Compositional VM systems
[Figure: Web Server (Dom1), Application Server (Dom2), Database Server (Dom3)]

Conclusion
For co-located inter-VM communication:
• Shared memory greatly improves performance.
• Optimizing the scheduler brings substantial benefits.

Thank you. Questions?

Backup slides

XenLoop performance: Netperf UDP_STREAM [figure]
XenLoop performance (contd.) [figures on two further slides]
XenLoop performance (contd.): migration transparency, with the VMs colocated, separated, and separated again [figure]

Future Work
• Compatibility with routed-mode Xen setup: implemented, under testing.
• Packet interception between the socket and transport layers, done without changing the kernel. This will reduce 4 copies to 2 (as in the other mechanisms), significantly improving bandwidth.
• XenLoop for Windows guests? [Figure: a XenLoop channel between a Windows guest and a Linux guest] The XenLoop architecture is mostly OS-agnostic.
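The Sriram Govindan / Arjun R. Nath slides describe preferring the domain with the most pending packets while still honoring reservations over a coarser PERIOD. The Python sketch below models that policy together with the pending-counter updates from the packet-reception slide. It is an illustration under stated assumptions, not the authors' implementation: `Domain`, `pick_next`, and the helper functions are hypothetical, the reservation is expressed in made-up milliseconds, and the send/receive prediction step is not modeled (`pending` is treated as a known count).

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    reservation: int   # CPU time (ms) guaranteed per PERIOD (hypothetical unit)
    used: int = 0      # CPU time consumed in the current PERIOD
    pending: int = 0   # packets waiting to be sent or received


def pick_next(domains):
    """Prefer the eligible domain with the most pending packets.

    Domains that have used up their reservation for this PERIOD are skipped,
    so the short-term preference cannot break the coarse-grained guarantee.
    """
    eligible = [d for d in domains if d.used < d.reservation]
    if not eligible:
        return None
    return max(eligible, key=lambda d: d.pending)  # max() keeps the first on ties


def packet_arrives(dom0):
    dom0.pending += 1       # NIC interrupt: Domain0 now has work to do


def dom0_delivers(dom0, guest):
    dom0.pending -= 1       # Domain0 hands the packet on ...
    guest.pending += 1      # ... so the guest now has a pending packet


def guest_consumes(guest):
    guest.pending -= 1
```

For example, after `packet_arrives(dom0)` the scheduler picks Domain0; after `dom0_delivers(dom0, d1)` it picks Domain 1, unless Domain 1 has already exhausted its reservation for the current PERIOD.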
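The Diego Ongaro / Alan L. Cox slide lists two mechanisms that are easy to state as code: boosting an idle domain when it is sent a virtual interrupt, and ordering the run queue by credits remaining within each state. The Python sketch below assumes the three priority states of Xen's credit scheduler (BOOST, UNDER, OVER), assumes that only an idle VCPU that still has credit (UNDER) is boosted, and assumes that more remaining credits means earlier placement. `VCPU`, `send_virtual_interrupt`, and `order_run_queue` are hypothetical names, and the "tickling" change is not modeled.

```python
from dataclasses import dataclass

# Priority states, assumed to follow Xen's credit scheduler: lower value runs first.
PRIORITY = {"BOOST": 0, "UNDER": 1, "OVER": 2}


@dataclass
class VCPU:
    name: str
    state: str          # "BOOST", "UNDER" or "OVER"
    credits: int        # credits remaining in the current accounting period
    idle: bool = False


def send_virtual_interrupt(vcpu):
    """Boost an idle VCPU that still has credit when an event is sent to it (assumption)."""
    if vcpu.idle and vcpu.state == "UNDER":
        vcpu.state = "BOOST"
    vcpu.idle = False


def order_run_queue(vcpus):
    """Order by state first, then by credits remaining (more credits first)."""
    return sorted(vcpus, key=lambda v: (PRIORITY[v.state], -v.credits))
```

Sorting within a state means that, among domains that are all UNDER, the one that has consumed the least of its share runs first instead of whichever happened to be queued first.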
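The Jian Wang / Kartik Gopalan slide proposes donating the unused part of a time slice (30 ms) to the target domain and charging the donated time to the source. The Python sketch below shows only that accounting rule. It assumes, for illustration, a hypothetical credit unit of 1 credit per ms of CPU; `VM`, `run_slice`, and `run_with_donation` are hypothetical names, not part of any scheduler.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    credits: int   # hypothetical credit balance, 1 credit per ms of CPU


def run_slice(runner, payer, ms):
    """Run `runner` for `ms` milliseconds and charge `payer` for it."""
    payer.credits -= ms
    return (runner.name, ms)


def run_with_donation(source, target, slice_ms, used_ms):
    """Source runs for used_ms, then donates the rest of its slice to target.

    The donated time is charged to the source, not the target, so the
    target's own credit balance (and hence its fair share) is untouched.
    """
    timeline = [run_slice(source, source, used_ms)]
    remaining = slice_ms - used_ms
    if remaining > 0:
        timeline.append(run_slice(target, source, remaining))
    return timeline
```

With a 30 ms slice of which Dom1 uses 5 ms, Dom2 runs immediately for the remaining 25 ms instead of waiting behind other domains, and Dom1 pays for the full 30 ms.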