CS 8803 AIAD Project Proposal

Network Slicing in Virtual Machines for Cloud Computing

Ankur Nayak, Bilal Anwer, Pradeep Patil

1) Motivation:

Cloud computing is a recent phenomenon. There are many players in the cloud computing arena, IBM, Microsoft, Google and Amazon being a few of them. An article in ACM SIGCOMM Computer Communication Review defines a cloud as follows: "Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing also for an optimum resource utilization. This pool of resources is typically exploited by a pay-per-use model in which guarantees are offered by the Infrastructure Provider by means of customized SLAs".

Although the current virtualized resource model provides resource guarantees in terms of CPU load, there are no such guarantees for network slicing. One major reason for the absence of these guarantees is the underlying virtualization infrastructure. In Amazon's case this is EC2 [6], which does not provide much flexibility to slice network traffic and allocate it to different virtual machines. There is also no such granularity for the storage infrastructure.

Although there has been a lot of work on CPU slicing and CPU scheduling, surprisingly little work has been done on slicing network resources. Today it is quite possible for one guest operating system to hog the entire network bandwidth, which could easily have been shared between two or more guest operating systems. Cloud computing can also be compromised by Denial of Service attacks. With the emergence of state-of-the-art VM technologies, however, the issue of network slicing can now be addressed to ensure a fair share of the network bandwidth.
In this project, we propose to address this important issue by implementing support for network slicing in the Xen virtual machine infrastructure. We want to add support for network traffic control in Xen, and then provide different scheduling algorithms and compare their performance based on a categorization of cloud computing resources. We will explore different ways of scheduling network bandwidth among guest VMs. We define network slicing in two modes, static and dynamic. We will also evaluate different adaptive schemes for bandwidth allocation, driven by on-demand requests from network applications. We envision our project giving administrators ease and flexibility in defining and configuring network properties and usage for different cloud environments.

2) Background and related work:

Cloud infrastructure provides users with resources for whatever services they want to run. To provide this computing infrastructure, cloud users are given a set of virtual machines on which they can install their own software to run different services. So, instead of allocating separate physical machines to customers, cloud computing gives users the ability to install their own virtual machines with the OS of their choice. Virtual machines also give the cloud infrastructure provider a set of machines that can be relocated in case of failures or other problems.

As cloud computing finds increasingly widespread use, our project would be a meaningful addition to the existing technology, since network bandwidth is a critical resource. We have chosen Xen as the base virtual machine technology on which we plan to implement our scheduling algorithms. Xen is one of the most widely deployed virtualization platforms in cloud computing and hence serves as a realistic environment. Moreover, Xen is open source and has an active and helpful community.

2.1) What is Xen?
Xen is a virtual machine technology based on the concept of para-virtualization. Para-virtualization is the technique of changing the kernel of the operating system in order to *port* it to an idealized virtual machine abstraction. This abstraction leads to a significant improvement in performance and efficiency. The basic Xen architecture consists of a hypervisor, or virtual machine monitor, which sits between the guest operating systems and the actual hardware. All hardware accesses are controlled by the hypervisor. The Xen domains, which host the guest operating systems, lie above the hypervisor. A special domain, domain 0, controls the creation and management of the other domains. The Xen hypervisor runs at the Ring 0 privilege level, whereas the hosted domains run at the less privileged Ring 1 (guest kernel) or Ring 3 (applications) levels. Xen also supports virtual machine migration: it transfers VM memory pages from one machine to another to allow seamless migration of a VM from one physical device to another.

There has been similar work on traffic shaping in the network domain. Traffic shaping is a mechanism that gives administrators greater control over network bandwidth. It is almost always done by delaying some packets to give priority to others, and it is used to control outgoing traffic. Our work differs in that we introduce a general approach to handling network slicing. We emphasize scheduling strategies, in addition to delaying packets, to enable stronger control over the network.

The Xen hypervisor has many networking-related facilities [3]. Users can create bridges and routes inside Xen, but there is no mechanism in Xen to control the flow of packets to different virtual machines. To the best of our knowledge, there has been no prior work on network slicing in the core Xen architecture. Although the main Xen code does not support traffic shaping, a few developers have been able to implement traffic shaping in Xen using a guest VM.
This method clearly has its drawbacks. Having a guest VM handle traffic shaping introduces extra overhead and latency. Moreover, there may be security issues in letting a guest VM control a critical resource like network bandwidth.

Among other related work, we also explored VINI [4] and PlanetLab [5]. These tools allow users to get a slice of the network infrastructure by allocating various nodes for performing networking experiments. However, these represent "Internet slicing" and not network slicing. We do not allocate nodes; we allocate bandwidth. Although one can achieve a similar effect with nodes that provide network guarantees, we attain a finer granularity by doing it at the network layer with a separate scheduler.

We can even view regular TCP (Reno) as having an adaptive scheduling mechanism. TCP uses additive increase and multiplicative decrease to ensure that a shared link is utilized equally by all TCP connections running over it. TCP achieves fairness over a long period of time, but this does not extend to other transport layer protocols. Also, the time it takes to reach this balance is not deterministic, and there is no control over the scheduling mechanism.

As we can see, none of these solutions answers the question of network bandwidth control in cloud computing. Our work intends to provide a solution for controlling network resources in cloud computing infrastructure.

3) Proposed Work:

Before delving into what we want to do in network slicing, we categorize virtual machine infrastructure based on its network traffic usage, as follows.

3.1) VM categorization based on Network Traffic:

We intend to categorize servers based on their traffic loads. The server categorization is used to derive scheduling algorithms. There are many kinds of cloud computing applications, but we intend to focus mostly on web-based applications.
Thus we categorize them as follows:

‐ Control Oriented VMs
‐ Data Oriented VMs

3.1.1) Control Oriented VMs:

Control traffic allows the user to change configuration-related settings for a particular machine. We think that even a CPU-intensive application will need some network resources, but those resources will only be required when we need to change some configuration or perform some command and control operation on the virtual machine. Other than that, there will be very little data traffic, essentially just the input data given to the virtual machine.

3.1.2) Data Oriented VMs:

Data traffic usually consumes most of the network resources. Virtual machines that use a high amount of data bandwidth will obviously need more network resources than machines with mostly control traffic. Although all VMs carry both data and control traffic, we make this distinction in order to provide a view from the network resource perspective.

Both control oriented and data oriented VMs can be high or low in CPU usage. We are only concerned with the data usage of nodes, and we want to schedule based on traffic loads to support an adaptive scheduling algorithm.

3.2) Dynamic Scheduling:

This division of virtual machines allows the scheduler to allocate bandwidth dynamically as the amount of network traffic changes. The categorization is based on network traffic usage, and we do not take the VM's CPU usage into account. Dynamic scheduling takes into account changes in the bandwidth usage of different VMs, based on network traffic statistics, and is essentially real-time scheduling of network traffic in response to changes in the traffic load.

Consider the scenario where all the VMs residing on a physical host are running network-intensive applications. To simplify the case, we assume all VMs are running web server applications. During normal traffic hours, all the web servers will get an equal share of network resources.
But if a particular site running on one of the web servers sees a sudden and sustained surge in bandwidth, then we need adaptive scheduling to allocate more bandwidth to that server. Here we need to make sure that a malicious user who owns a set of virtual machines on different physical machines does not deny other virtual machine owners their fair share of resources. Therefore, even in dynamic scheduling, we want to make sure that a network-hungry VM does not throttle the other VMs residing on the same physical machine beyond a certain limit.

Attacks in Dynamic Scheduling Algorithm:

Consider a set of physical hosts G, where G = [A1, A2, A3, A4, ..., An]. Assume that each physical host has a set of virtual machines running on it, and let there be m virtual machines running on each physical host, e.g. A1 = [V1, V2, V3, V4, ..., Vm]. Now suppose a user has n virtual machines and each of them resides on a different physical host. If the user launches a DoS attack on its own virtual machines, the attack will ultimately result in a DoS attack on all the other (m-1)*n virtual machines as well, because each of the user's n VMs shares its host with m-1 others.

To launch this kind of 'attack' the user does not even need to be 'evil'. Suppose the user's behavior is quite rational and a synchronization happens every night, which involves sending synchronization traffic from one VM to a mirror VM while the mirror keeps serving web requests. The user's traffic then becomes t + t', where t is the normal daily traffic and t' is the traffic load due to synchronization. If this traffic load persists for one hour every day, then it costs the other nodes in the cloud one hour of degraded network performance every day. Although these kinds of issues could be resolved by localizing a customer's VMs to the same physical hosts, localization is not the solution, as it makes the customer's services more prone to hardware failures.
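The requirement above, that a surging VM may receive extra bandwidth but must not throttle its neighbours beyond a certain limit, can be sketched as follows. This is only an illustrative sketch and not the proposed Xen implementation: the names `allocate_bandwidth`, `affected_neighbours` and the `floor_fraction` parameter are hypothetical, and the use of max-min (water-filling) sharing for the leftover capacity is an assumption made for the example.

```python
def allocate_bandwidth(demands, capacity, floor_fraction=0.5):
    """Split `capacity` among VMs on one physical host.

    demands:        dict mapping VM name -> requested bandwidth
    capacity:       total bandwidth of the host's link
    floor_fraction: hypothetical knob; each VM is guaranteed this
                    fraction of its equal share (capacity / number of VMs)
    """
    n = len(demands)
    if n == 0:
        return {}
    floor = floor_fraction * capacity / n

    # Step 1: hand out the guaranteed minimum, never more than requested.
    alloc = {vm: min(d, floor) for vm, d in demands.items()}
    remaining = capacity - sum(alloc.values())

    # Step 2: share what is left by max-min fairness (water-filling)
    # among the VMs that still want more.
    unsatisfied = {vm for vm, d in demands.items() if d > alloc[vm]}
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        done = set()
        for vm in unsatisfied:
            grant = min(share, demands[vm] - alloc[vm])
            alloc[vm] += grant
            remaining -= grant
            if demands[vm] - alloc[vm] <= 1e-9:
                done.add(vm)
        if not done:
            break  # every VM took a full share, so the link is exhausted
        unsatisfied -= done
    return alloc


def affected_neighbours(m, n):
    """VMs hurt when a user floods n VMs placed on n different hosts,
    each host running m VMs: (m - 1) * n, as in the text above."""
    return (m - 1) * n


if __name__ == "__main__":
    # Three web servers on a 100 Mbit/s link; web3 sees a traffic surge.
    demands = {"web1": 10.0, "web2": 10.0, "web3": 200.0}
    for vm, bw in sorted(allocate_bandwidth(demands, 100.0).items()):
        print(vm, round(bw, 2))
    print(affected_neighbours(m=4, n=3))
```

In this example the two lightly loaded servers keep everything they ask for and the surging server receives the rest (about 80 of the 100 units). If all three servers demanded more than their share, each would end up with an equal third, so the surging VM could not push the others below their fair share.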
3.3) Static Scheduling:

In this case the categorization of each virtual machine is provided by the admin in order to allocate network resources. There are three categories:

‐ High Network Usage: Examples of these kinds of servers are crawlers, where there is a lot of data traffic.
‐ Low Network Usage: Examples include virtual machines which do not have much data traffic but still require control traffic.
‐ Normal Network Usage: Virtual machines which have 'normal' amounts of data traffic. By normal we mean that the network traffic usage of the virtual machine is equivalent to the fair share of that particular virtual machine.

3.4) User control:

The administrator of the cloud will have control of VM network resources. In case of anomalies relative to the user-specified or admin-specified behavior, the admin needs to be informed about the changes in network traffic for all the VMs. For example, in the dynamic scheduling attack scenario described above, the admin can be informed about the continuous throttling caused by the 'malicious' user so that proper action can be taken.

4) Plan of action:

‐ Installing Xen and becoming familiar with Xen and the networking infrastructure it provides.
‐ Implementing network slicing in Xen.
‐ Defining static priorities in the context of network slicing for servers in the cloud.
‐ Implementing static scheduling inside Xen.
‐ Defining dynamic scheduling priorities in the context of network slicing for servers in the cloud.
‐ Implementing dynamic scheduling inside Xen.
‐ Implementing user control by allowing administrators to set scheduling priority based on the user-specified scheduling algorithm.
‐ Identifying scenarios where static and dynamic scheduling can be interchanged, and proposing which strategy to use in different cloud computing scenarios.
‐ Testing and evaluating the different scheduling algorithms and obtaining results from them.

4.1) Resource requirement:

Xen is free and open source software, so obtaining it is not a problem.
We intend to use the GCC compiler for development and GDB for debugging. The development will follow an incremental approach, and we will need at least two x86-based machines on which to install Xen and do initial experimentation.

4.2) Schedule:

The following is the schedule we intend to follow:

Time               Work
February end       Finish setting up Xen and understand the internal network workings of Xen
Mid March          Read more on Xen internals, prepare an implementation spec on network slicing, and start implementation
March end          Finish 50-70 percent of coding
April first week   Finish coding and start testing
Mid April          Finish evaluation and record observations

5) Evaluation and Testing:

In order to test network slicing, we need a simulation environment hosting the Xen hypervisor. We will have a set of guest operating systems which log all incoming traffic using tcpdump.

First, we need to prepare a special set of traffic data to evaluate the performance of packet reception in the domUs without network slicing. For this we will run tcpdump on each of the domUs and send traffic to all the domUs with an equal traffic share. Once we have logged this traffic, we will use the prepared traffic to check the performance of the network slicing module. tcpdump should provide us with accurate timestamps for incoming packets. Assuming that the delay between packets being received in dom0 and being copied to the domUs is the same for all guest OSes, the prioritized domUs should receive more traffic than the non-prioritized domUs in a given period of time.

Once network slicing testing is done, we will move on to static scheduling testing using the different priority levels mentioned earlier.

After static scheduling testing is done, we will test dynamic scheduling, which requires data from a set of web servers. We will keep this simple and use two or three web servers.
The data needs to be arranged so that initially all the servers receive the same number of hits, but after some time one server, say server X, starts receiving more traffic. In this case, if our scheduling policy allows it, the other servers should relinquish bandwidth to server X. If the policy does not allow it, then there must be no change in the bandwidth of server X and it will start dropping packets. This can be observed as a drop in incoming traffic for server X.

6) Future Work:

We intend to implement network slicing in Xen and experiment with as many scheduling algorithms and policies as possible. In the course of this work we expect to come across new questions. The most obvious one is how to implement these changes in an actual cloud infrastructure, and what the implications would be for our results. There is also a need to study the effect of network scheduling in the presence of CPU scheduling algorithms, i.e. how network scheduling can affect the CPU scheduling done by Xen. Apart from CPU and network slicing, there is room for storage slicing in cloud computing. We are not aware of any cloud computing infrastructure that provides storage slicing.

7) References:

1‐ Lecture by Prof. Ling Liu during Spring 2009 class of CS8803 Advanced Internet Application Development.
2‐ Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, Xen and the art of virtualization
3‐ http://wiki.xensource.com/xenwiki/XenNetworking
4‐ Andy Bavier, Nick Feamster, Mark Huang, Larry Peterson, Jennifer Rexford, In VINI veritas: realistic and controlled network experimentation, ACM SIGCOMM Computer Communication Review, v.36 n.4, October 2006
5‐ Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, Mic Bowman, PlanetLab: an overlay testbed for broad-coverage services, ACM SIGCOMM Computer Communication Review, v.33 n.3, July 2003
6‐ http://aws.amazon.com/ec2/