Network Slicing in Virtual Machines for Cloud Computing

CS 8803 AIAD Project Report

Bilal Anwer, Ankur Nayak, Pradeep Patil

Motivation:

Cloud computing is the technology of the future. There are many players in the cloud computing arena, IBM, Microsoft, Google, and Amazon being a few of them. ACM SIGCOMM Computer Communication Review defines the Cloud as follows: "Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing also for an optimum resource utilization. This pool of resources is typically exploited by a pay-per-use model in which guarantees are offered by the Infrastructure Provider by means of customized SLAs."

Although the current virtualized-resource model provides resource guarantees in terms of CPU load, there are no such guarantees for network slicing. One big reason for this lack of guarantees is the underlying virtualization infrastructure. Even though Amazon EC2 [6] provides machine-level granularity, it does not provide much flexibility to slice network traffic and allocate it to different virtual machines. There is also no provision for granularity of the storage infrastructure.

Though there has been a lot of work on CPU slicing and CPU scheduling, surprisingly little work has been done on slicing network resources. Today, it is quite possible for one guest operating system to hog the entire network bandwidth, which could easily have been shared between two or more guest operating systems. There is also the possibility of cloud computing infrastructure being compromised by denial-of-service attacks. With the emergence of state-of-the-art VM technologies, however, the issue of network slicing can now be addressed to ensure a fair share of the network bandwidth.

There is a distinct lack of knowledge about network behavior in cloud computing. Adding more bandwidth is not the solution; we need a ground-up study of the current scenario. The network is as important a resource as the CPU, and our research has shown that there is a need for network resource management in cloud computing. In this project, we analyzed network behavior by implementing support for network slicing in the Xen virtual machine infrastructure. We performed various experiments to support our claim that network slicing is indeed required in current cloud computing infrastructure; in particular, we found that Xen does not provide any guarantees with respect to network bandwidth. To this end, we propose a novel architecture that provides users with the capability of auditing network bandwidth in order to ensure fairness.

Background and related work:

The Cloud infrastructure provides users with resources for whatever services they want to run. To give Cloud users computing infrastructure, each user is given a set of virtual machines on which user-specified software can be installed to run different services. So, instead of allocating different physical machines to customers, cloud computing gives users the ability to install their own virtual machines with the OS of their choice. Virtual machines also allow the cloud infrastructure provider to relocate VMs to a different set of machines in case of failures or other problems.
As cloud computing increasingly finds its way into popular use, our project would contribute in a major way to the existing technology, since network bandwidth is a critical resource. We have chosen Xen as the base virtual machine technology over which we plan to implement our scheduling algorithms. Xen is the most popular cloud computing infrastructure and hence serves as a realistic environment. Moreover, Xen is open source and has a really active and helpful community.

What is Xen?

Xen is a virtual machine technology based on the concept of para-virtualization. Para-virtualization is the technique of changing the kernel of the operating system in order to *port* it to an idealized virtual machine abstraction. This abstraction leads to significant improvements in performance and efficiency. The basic Xen architecture consists of a hypervisor, or virtual machine monitor, which sits between the guest operating systems and the actual hardware. All hardware accesses are controlled by the hypervisor. The Xen domains, which host the guest operating systems, lie above the hypervisor. A special domain, domain 0, controls the creation and management of the other domains. The Xen hypervisor code runs at the Ring 0 privilege level, whereas the hosted domains run at Ring 1 or Ring 3. Xen also supports virtual machine migration: by transferring VM memory pages from one machine to another, it allows seamless migration of a VM from one physical device to another.

There has been similar work on traffic shaping in the networking domain. Traffic shaping is a mechanism that gives administrators greater control over network bandwidth. It is almost always done by delaying some packets to give priority to others, and it is used to control outgoing traffic (a minimal token-bucket sketch of this mechanism appears at the end of this section). Such shaping can be done at the IP layer using iptables, but a kernel-level solution is preferable since it provides much faster packet forwarding.

The Xen hypervisor has a lot of networking-related facilities [3]. Users can create bridges and routes inside Xen, but there is no mechanism in Xen to control the flow of packets to different virtual machines. To the best of our knowledge, there has been no prior work on network slicing in the core Xen architecture. Although the main Xen code does not support any traffic shaping, a few developers have implemented traffic shaping in Xen using a guest VM. This method clearly has its drawbacks: having a guest VM handle traffic shaping introduces extra overheads and latencies, and there may also be security issues in having a guest VM control a critical resource like network bandwidth.

Among other related work, we also explored VINI [4] and PlanetLab [5]. These tools allow users to get a slice of the network infrastructure by allocating various nodes for performing networking experiments. But these represent "Internet slicing," not network slicing: we do not allocate nodes, we allocate bandwidth. Though one can achieve a similar effect with nodes that provide network guarantees, we attain a finer granularity by working at the network layer with a separate scheduler.

We can even view regular TCP (Reno) as having an adaptive scheduling mechanism. TCP uses additive increase and multiplicative decrease (AIMD) to ensure that a shared link is utilized equally by all TCP connections running over it. TCP achieves fairness over a long period of time, but this behavior does not extend to other transport-layer protocols. Also, the balance it achieves emerges over a period of time that is not deterministic, and there is no control over the scheduling mechanism.
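To make the AIMD fairness behavior concrete, the following toy simulation (a sketch with arbitrary constants, not part of our experimental code) models two flows that additively probe for bandwidth and multiplicatively back off on congestion. Their rates converge toward an equal share only after many rounds, which illustrates why TCP's fairness is slow and offers no control to the infrastructure provider:

```python
"""Toy simulation of TCP-style AIMD sharing (illustrative sketch only)."""

CAPACITY = 100.0   # link capacity, in arbitrary rate units
ALPHA = 1.0        # additive increase per round
BETA = 0.5         # multiplicative decrease factor

def aimd(rounds=200, r1=5.0, r2=60.0):
    """Simulate two AIMD flows sharing one link; return their final rates."""
    for _ in range(rounds):
        if r1 + r2 > CAPACITY:   # congestion signal: both flows back off
            r1 *= BETA
            r2 *= BETA
        else:                    # spare capacity: both flows probe upward
            r1 += ALPHA
            r2 += ALPHA
    return r1, r2

if __name__ == "__main__":
    r1, r2 = aimd()
    # The gap between the two rates halves on every backoff, so the
    # flows drift toward an equal share, but only gradually.
    print("flow 1: %.1f  flow 2: %.1f" % (r1, r2))
```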
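For reference, the packet-delaying behavior at the heart of the traffic shaping described earlier can be captured by a token bucket. The sketch below is an illustrative user-space model only; real shapers (for example, the Linux tc queueing disciplines) implement this inside the kernel, and the rate and burst parameters here are placeholders:

```python
import time

class TokenBucket:
    """Minimal token-bucket shaper sketch: a packet is delayed until
    enough tokens (bytes of credit) have accumulated, capping the
    average outgoing rate at `rate` bytes/s with bursts up to `burst`.
    """
    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.time()

    def send(self, nbytes):
        """Block (i.e., delay the packet) until nbytes of credit exist."""
        while True:
            now = time.time()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

if __name__ == "__main__":
    shaper = TokenBucket(rate=125000, burst=3000)  # ~1 Mbit/s, 2-packet burst
    t0 = time.time()
    for _ in range(100):
        shaper.send(1500)                          # one 1500-byte packet
    print("sent 100 packets in %.2f s" % (time.time() - t0))
```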
As we have seen, none of these existing mechanisms provides an answer to the question of network bandwidth control in cloud computing. Our work intends to provide a solution for control of network resources in cloud computing infrastructure.

Experimental Setup:

We performed three different types of experiments to gauge the network slicing capabilities of Xen. All experiments were performed on the Linux platform using three different laptop machines; the configuration of these machines is provided in the appendix. The tools used included httperf, which was used for measuring application-level performance of the VMs, and the Linux kernel packet generator (pktgen), which we used to send traffic to the VMs at line rate. We measured kernel-level network statistics using data from the /proc file system. Specifically, we looked at /proc/net/dev to find the initial and final packet counts, which we used to calculate the number of packets sent during a single experimental run (a short sketch of this measurement appears at the end of this section).

We used the following configuration for each VM (a sample domain configuration sketch also appears at the end of this section):

• Memory = 128 MB per VM
• HD = 2 GB
• Debian 'Etch' based domUs

The three individual experiments are as follows:

1. Network throughput variation in multiple VMs
2. Web server performance with varying number of VMs
3. Impact of CPU usage on network throughput

• Network throughput variation in multiple VMs:

[Fig 2: experimental setup for network throughput variation in multiple VMs. Fig 3: packets per second received by each of the two DomUs, and their sum, for packet sizes from 64 to 1518 bytes.]

Fig 2 shows the experimental setup for network throughput variation in multiple VMs. We connected two machines to a XenBox running two DomUs using an Ethernet switch, and sent traffic to the two DomUs from the two source machines using pktgen, as shown in the figure. Fig 3 describes the experimental results we obtained. We observed that in the presence of one network-active VM, the full bandwidth is allocated to that VM. In the case of two network-active VMs, the behavior depends on the stability of the DomUs, as is evident from the graph: with stable DomUs, bandwidth is distributed equally, resulting in a fair share for each VM; with unstable DomUs, one VM hogs the entire bandwidth, starving the other VM of network resources.

• Web server performance with varying number of VMs:

[Fig 4: experimental setup for measuring web server performance with varying number of VMs. Fig 5: total connections per second versus number of VMs.]

Fig 4 shows the setup for measuring web server performance with a varying number of VMs. The setup consists of a source machine and a XenBox. We ran multiple HTTP clients on the source machine and sent traffic to individual DomUs running on the XenBox. We measured application performance in terms of total connections per second, collecting this data while varying the number of VMs from 1 to 7. Fig 5 shows the results. We found that as we increased the number of VMs, the total number of connections dropped in an exponential manner: as we can see from the graph, the maximum starts at approximately 5000 connections and then gradually drops to a stable value of 2000 connections for 5 or more VMs.

• Impact of CPU usage on network throughput:

[Fig 6: experimental setup for observing the impact of CPU usage on network throughput. Fig 7: packets received by the network-intensive VM as the number of CPU-intensive VMs increases.]

Fig 6 shows the setup for observing the impact of CPU usage on network throughput, while Fig 7 shows the resulting graph. In Fig 6 we have a source machine which sends traffic to one VM while the other VMs run CPU-intensive jobs. As we increase the number of VMs, the total number of packets received by the network-intensive VM decreases.

We also found a general relationship between packet size and network throughput: the maximum throughput is attained when the packet size is 500 bytes.
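For concreteness, a domU of the kind used above could be described by an xm configuration file along the following lines. This is a hedged sketch rather than our actual configuration: the kernel version, image paths, and bridge name are illustrative placeholders (Xen xm configuration files use Python syntax):

```python
# Hypothetical /etc/xen/domu1.cfg -- Xen xm domain configuration sketch.
# Paths and the kernel version are placeholders, not our exact setup.
kernel  = "/boot/vmlinuz-2.6.18-xen"            # paravirtualized guest kernel
ramdisk = "/boot/initrd-2.6.18-xen.img"
memory  = 128                                    # 128 MB per VM, as above
name    = "domu1"
vif     = ['bridge=xenbr0']                      # attach to the Xen bridge
disk    = ['file:/srv/xen/domu1.img,xvda1,w']    # 2 GB Debian Etch image
root    = "/dev/xvda1 ro"
```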
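The packet-count measurement mentioned in the setup can be sketched as follows. The field offsets follow the standard Linux /proc/net/dev layout, and the 10-second window stands in for a full experimental run:

```python
#!/usr/bin/env python
"""Sketch: per-interface packet deltas over one run, via /proc/net/dev."""

import time

def read_packet_counts(path="/proc/net/dev"):
    """Return {interface: (rx_packets, tx_packets)}."""
    counts = {}
    with open(path) as f:
        for line in f.readlines()[2:]:      # skip the two header lines
            name, data = line.split(":", 1)
            fields = data.split()
            # fields[1] = received packets, fields[9] = transmitted packets
            counts[name.strip()] = (int(fields[1]), int(fields[9]))
    return counts

if __name__ == "__main__":
    before = read_packet_counts()
    time.sleep(10)                          # stands in for one experiment run
    after = read_packet_counts()
    for iface in sorted(after):
        rx = after[iface][0] - before[iface][0]
        tx = after[iface][1] - before[iface][1]
        print("%s: rx=%d tx=%d packets" % (iface, rx, tx))
```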
Conclusions:

Finally, we present our conclusions based on the experiments we performed:

• Combining CPU-intensive VMs with network-intensive VMs is a bad idea. In such cases the network-intensive VMs tend to suffer more than their counterparts, because they do not get a fair share of CPU cycles for the computations required by network-related tasks.

• Web server performance drops rapidly with an increasing number of VMs, so service providers need to report performance numbers along with the number of VMs used to generate them.

• Network slicing is indeed required in cloud computing services, since Xen provides no network bandwidth guarantees, as observed in the first experiment.

References:

1. Lecture by Prof. Ling Liu, Spring 2009 class of CS 8803 Advanced Internet Application Development.
2. Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In Proceedings of SOSP 2003.
3. http://wiki.xensource.com/xenwiki/XenNetworking
4. Andy Bavier, Nick Feamster, Mark Huang, Larry Peterson, and Jennifer Rexford. In VINI veritas: realistic and controlled network experimentation. ACM SIGCOMM Computer Communication Review, 36(4), October 2006.
5. Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. PlanetLab: an overlay testbed for broad-coverage services. ACM SIGCOMM Computer Communication Review, 33(3), July 2003.
6. http://aws.amazon.com/ec2/