FIOS: A Flexible Virtualized I/O Subsystem to Alleviate Interference among Virtual Machines

Qi Zhang, Hai Jin, Xiaofei Liao, Dingding Li, Wei Deng
Cluster and Grid Computing Lab
Services Computing Technology and System Lab
Huazhong University of Science and Technology, Wuhan, 430074, China
hjin@hust.edu.cn

ABSTRACT
Serving as the infrastructure of cloud computing, virtualization technologies have attracted considerable interest in recent years for their excellent resource utilization, scalability, and high availability. The Virtual Machine Monitor (VMM), a key element of cloud computing, enables multiple guest operating systems to run simultaneously on the same physical resources. This sharing can lead to significant disk I/O performance interference among virtual machines (VMs). In particular, the I/O performance of non-I/O-intensive domains can be seriously degraded by the arrival of I/O-intensive ones. We address this problem by building a block-level cache in the virtualization layer to absorb I/O requests from different domains. This method not only effectively alleviates the I/O performance interference caused by I/O-intensive domains, but also greatly improves the I/O performance of the guest OSes. We implement and evaluate a Flexible I/O Subsystem (FIOS) within the Xen VMM and show an evident reduction of I/O performance interference among virtual machines as well as a remarkable improvement of disk throughput.

Categories and Subject Descriptors
B.4.3 [Interconnections (Subsystems)]: Asynchronous/synchronous operation

General Terms
Performance, Design, Experimentation

Keywords
Cloud Computing, I/O, Xen, Virtualization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICUIMC'12, February 20–22, 2012, Kuala Lumpur, Malaysia.
Copyright 2012 ACM 978-1-4503-1172-4…$10.00.

1. INTRODUCTION
Cloud computing, which makes it possible for users to access large pools of computational and storage resources on demand, is gaining prominence. This trend is evidenced by the rapid growth of cloud services and cloud platforms such as Gmail, Facebook, and Amazon EC2. The Virtual Machine Monitor (VMM), e.g. Xen, plays a key role in cloud computing. A VMM offers an abstract and unified layer on top of the underlying hardware resources and provides services that allow multiple operating systems to execute concurrently on the same computer hardware. Although virtualization technologies offer many benefits, including flexibility, security, ease of configuration and management, and reduction of cost [1], they also introduce an obvious problem: performance interference. Applications running in virtual machines always expect to own the physical resources exclusively so that they can achieve the best performance. However, when multiple virtual machines run simultaneously on the same computer, contention for the underlying physical resources directly leads to performance degradation of the applications.
Various studies [4, 6, 26] have focused on solving this problem, but most of them concentrate on the allocation of CPU and network bandwidth; little attention has been paid to alleviating disk I/O performance interference among virtual machines, which is a critical factor in determining the overall performance of I/O applications running in virtual machines. Specifically, once an I/O-intensive application starts to run, the I/O performance of co-located applications residing in other virtual machines is seriously degraded. For example, in order to provide a good user experience, an interactive application such as Microsoft Word requires a short average response time; at the same time, extracting the Linux source code from a compressed file requires intensive I/O to minimize the execution time. When these two applications run simultaneously in different virtual machines and contend for the limited disk I/O bandwidth, the I/O performance of Microsoft Word decreases significantly. Apart from the limited disk I/O bandwidth, other factors can also lead to this performance degradation. For instance, in virtual machine environments, an I/O operation has to trap into the VMM and/or a privileged VM, which may turn out to be a performance bottleneck for virtualized I/O systems [2]. Previous research has revealed that in para-virtualization, the VMM protects well-behaved virtual machines from all misbehaving domains except the disk I/O intensive one [3]. This paper is the first to provide a method that not only largely alleviates the I/O performance interference caused by I/O-intensive domains, but also improves the I/O performance of virtual machines. We allocate a block-level cache in the VMM to absorb I/O requests from guest VMs, and a time-sharing cache management strategy is also introduced to improve the efficiency of FIOS. Our experiments show an evident reduction of I/O performance interference among virtual machines as well as a remarkable improvement of disk throughput.

The rest of this paper is organized as follows. We describe the related work in the next section. Section 3 discusses the design of FIOS. Section 4 introduces Xen and Blktap briefly and presents our implementation. Section 5 uses queuing theory to analyze the performance of FIOS. Section 6 describes the experimental methodology and discusses the results. Finally, Section 7 concludes the paper and discusses future work.

2. RELATED WORK
Based on Xen, Gupta et al. have designed and implemented a set of primitives to enforce performance isolation across virtual machines [6]. First, they implement XenMon, a tool that can accurately measure per-VM resource consumption. Second, the SEDF-DC scheduler is introduced to account for the total resource consumption of a VM when allocating CPU. Finally, they use ShareGuard to restrict the resource usage in the VMM on behalf of each virtual machine. Compared with our work, most of Gupta's research focuses on the isolation of network I/O and CPU performance; their methods do not address the performance of disk I/O. Similar to our solution, Hu et al. have presented a novel disk storage architecture called DCD for the purpose of optimizing disk I/O performance [18]. They use a small log disk, called a cache-disk, as a secondary disk cache to optimize write performance.
The physical properties of the cache-disk are the same as those of a normal disk, but its different data units and different data access patterns give it a higher data access speed. Whether this architecture works well in virtualized environments requires further investigation. Many studies [4, 24, 25] focus on adjusting the schedulers in the VMM to provide each virtual machine with fair performance. Some of them reveal that traditional VMM schedulers focus on fairly sharing the processor resources among domains while leaving the scheduling of I/O resources as a secondary concern. Although certain extensions, such as boost optimization, sorting the run queue based on remaining credits, and tricking the scheduler [4], have been applied to VMM scheduling to improve the I/O performance of virtual machines, the effect of these approaches depends on the type of applications running in the virtual machines. Another study [5] examined different combinations of schedulers in the virtual machine and the VMM, and found that different combinations lead to different I/O performance, with the best results obtained when the NOOP scheduling algorithm is used in the VMM. However, it does not provide any method to alleviate the I/O performance interference among virtual machines, especially that caused by an I/O-intensive domain. Other studies concentrate on enhancing I/O performance isolation in virtualized environments [6], but most of them pay attention to the isolation of network I/O and CPU allocation. There have also been other recent works concentrating on characterizing the behavior and resource consumption of I/O applications [8, 19, 23] and on improving virtualized I/O performance [2, 16, 22], but few of them address alleviating disk I/O performance interference among virtual machines.

3. DESIGN
In this section, we first discuss the goals of FIOS. Then, detailed solutions are described to meet these goals. The solutions are divided into four parts: the first three parts follow the processing of I/O requests in FIOS, and in the fourth part a time-sharing strategy for the usage of physical disk bandwidth is introduced to avoid excessive memory consumption.

3.1 Goals
Our primary goal is to minimize the degradation of I/O performance caused by I/O-intensive virtual machines. Transparency and portability are also our goals: various kinds of operating systems and applications should be able to adapt to FIOS without any change to their source code, and the architecture of FIOS should not be restricted to any specific virtualized environment.

3.2 Solutions
3.2.1 Intercepting requests
Virtual machines are not permitted to manipulate the physical disk directly; all the I/O operations emitted by virtual machines are handled by the VMM [7]. The processing of I/O requests in a virtualized environment is shown in Figure 1. I/O requests from the guest OS are first put into the disk queue of the virtual machine and handled by the virtual disk driver. Then, these requests are taken by the physical disk driver located in the VMM, which packs these logical I/O requests into physical ones. After that, the physical disk driver sends commands to the disk controller and starts the data transfer. When the data transfer has completed, the VMM informs the corresponding virtual machine, and the applications that are waiting for the result of the I/O operations can continue working.
Figure 1. I/O path in a virtualized environment.

The bandwidth of the physical disk is limited; thus, once an I/O-intensive virtual machine exists, which is likely to consume the bandwidth in great quantities, the I/O performance of the other virtual machines is affected severely. Considering that memory access is much faster than disk access, we allocate a V-cache in the VMM for each guest domain and intercept the I/O requests from virtual machines. As shown in Figure 2, instead of being sent directly to the physical disk driver, logical I/O requests are intercepted by the VMM in FIOS. The VMM informs the I/O application in the virtual machine to continue working right after these requests have arrived in the V-cache. A thread called "FlushCtrl" in the VMM handles the requests in the V-cache and completes the data transfer some time later. In this way, requests from I/O-intensive virtual machines can be absorbed by the V-cache and their excessive consumption of disk bandwidth can be reduced. Therefore, the I/O performance of other virtual machines is not hurt by the arrival of an I/O-intensive domain. Besides, I/O applications in a virtual machine do not have to wait for the completion of I/O requests, so the I/O performance of virtual machines is also improved.

Figure 2. I/O path in FIOS.

begin;
r = receive a write request;
if (exist a request R in V-cache & R.addr == r.addr) {
    R.data = r.data;
    mark R as the one used most recently;
} else {
    insert r into V-cache;
    mark r as the one used most recently;
}
end;

Figure 3. Processing a write request in FIOS.

Through the real-time interception of I/O requests in the VMM, we can detect an I/O-intensive virtual machine quickly and accurately. An I/O-intensive virtual machine issues I/O operations very frequently, so its corresponding virtual disk queue is usually filled with a large number of requests. We use two metrics to judge whether a virtual machine is I/O intensive: (1) the history I/O record, i.e., the total number of bytes transferred by this domain, and (2) the virtual I/O rate, i.e., the number of bytes transferred per CPU second [8], both of which can be collected in the VMM by scanning the virtual disk queue.
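The paper does not give code for this detection step; the following is a minimal C sketch of how the two metrics above could be combined, assuming hypothetical per-domain counters (total_bytes, cpu_seconds) and threshold values (HISTORY_THRESHOLD, RATE_THRESHOLD) that the VMM would maintain while scanning the virtual disk queues.

    #include <stdint.h>

    /* Hypothetical per-domain I/O statistics collected by the VMM
     * while scanning the virtual disk queue of each guest. */
    struct domain_io_stats {
        uint64_t total_bytes;   /* history I/O record: total bytes transferred */
        double   cpu_seconds;   /* CPU time consumed by the domain so far */
    };

    /* Assumed threshold values; in practice they would have to be tuned. */
    #define HISTORY_THRESHOLD  (1ULL << 30)           /* 1 GB transferred in total */
    #define RATE_THRESHOLD     (64.0 * 1024 * 1024)   /* 64 MB per CPU second */

    /* Classify a domain as I/O intensive when both the accumulated
     * transfer volume and the virtual I/O rate are high. */
    static int is_io_intensive(const struct domain_io_stats *s)
    {
        double virtual_io_rate =
            (s->cpu_seconds > 0.0) ? (double)s->total_bytes / s->cpu_seconds : 0.0;

        return s->total_bytes >= HISTORY_THRESHOLD &&
               virtual_io_rate >= RATE_THRESHOLD;
    }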
3.2.2 Arranging requests
Arranging I/O requests in the corresponding V-cache brings many advantages. First of all, the direct connection between the I/O process in the virtual machine and the physical disk is cut off. From the perspective of the virtual machine, its I/O operations can be considered finished when the I/O requests reach the V-cache; therefore, even if multiple virtual machines execute at the same time, the torrent of I/O requests can be absorbed by the V-cache, and the bandwidth of the physical disk does not become their primary point of contention. Secondly, storing requests provides the opportunity for batch processing, which can largely increase the overall I/O throughput. Because of the semantic gap between the VM and the VMM, many I/O optimization methods no longer work well in virtual machine environments. For example, metadata operations in the VM are translated into small file reads/writes in the VMM, which directly decreases I/O performance; this problem can be mitigated if these requests are stored and processed in batches. Besides these advantages, each virtual machine in FIOS is initialized with its own V-cache to store its own I/O requests. On one hand, the isolation and security of I/O data coming from different virtual machines can be better ensured; on the other hand, the VMM can handle the requests separately according to the different priorities of the virtual machines.

3.2.3 Disposing requests
Although the I/O requests from a given domain are all stored in one V-cache, different kinds of requests are treated differently according to their own characteristics. As shown in Figure 3, when the VMM receives a write request, FIOS checks whether the V-cache already holds a request whose physical disk address is the same as that of the incoming one. If so, we only replace the I/O data of the existing request with that of the received one; if not, the received write request is stored into the V-cache and marked as the most recently used. Once the write request has been inserted into the V-cache, the VMM notifies the corresponding virtual machine immediately, so that the I/O applications in the virtual machine can continue running. As shown in Figure 4, when the VMM receives a read request, FIOS checks whether the V-cache holds a request whose physical disk address is the same as that of the received one. If so, the corresponding I/O data is returned to the I/O application in the virtual machine immediately and marked as the most recently used; if not, the data has to be fetched from disk: it is, on one hand, sent directly to the I/O application in the virtual machine and, on the other hand, stored in the V-cache so that the next time the data is needed it can be obtained quickly from the V-cache.

begin;
r = receive a read request;
if (exist a request R in V-cache & R.addr == r.addr) {
    return R.data to the process in VM;
    mark R as the one used most recently;
} else {
    get data from disk;
    return the data to the process in VM;
    store the data in V-cache;
    mark r as the one used most recently;
}
end;

Figure 4. Processing a read request in FIOS.

Besides dealing with new requests, refreshing the V-cache is also a key aspect of FIOS. Because memory is scarce, the strategy for refreshing the cache is of significant importance to the performance of FIOS. We describe the key factors associated with refreshing as follows. First, we need two thresholds, an upper limit and a lower limit, which indicate when to begin the refresh operation and when to stop: once the utilization of the cache reaches the upper limit, the "FlushCtrl" thread in the VMM starts to flush the data in the V-cache until the utilization of the cache drops to the lower limit. How to determine the values of these two thresholds is discussed later in this paper. Second, a replacement algorithm is indispensable. We choose LRU (Least Recently Used) [9] as the replacement algorithm of the V-cache, since it exploits the locality of programs and thus enhances the hit rate of the V-cache, so that the overall I/O performance of the whole system can be improved.
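The pseudocode of Figures 3 and 4 can be expressed as a compact C sketch together with the LRU marking just described; the data structures and helper functions below (vcache_lookup, vcache_insert_mru, vcache_touch, disk_read, reply_to_vm) are hypothetical names used only for illustration, not FIOS's actual interfaces.

    #include <stdint.h>
    #include <stddef.h>

    struct vreq {                 /* one cached I/O request */
        uint64_t addr;            /* physical disk address */
        void    *data;
        size_t   size;
    };

    /* Hypothetical helpers: LRU-ordered V-cache operations and I/O plumbing. */
    struct vreq *vcache_lookup(uint64_t addr);          /* find request by disk address */
    void vcache_insert_mru(struct vreq *r);             /* insert as most recently used */
    void vcache_touch(struct vreq *r);                  /* move to MRU position */
    void disk_read(uint64_t addr, void *buf, size_t n); /* synchronous read from disk */
    void reply_to_vm(const void *data, size_t n);       /* complete the guest's request */

    /* Figure 3: absorb a write without touching the disk. */
    void handle_write(struct vreq *r)
    {
        struct vreq *hit = vcache_lookup(r->addr);
        if (hit) {
            hit->data = r->data;  /* overwrite the cached data in place
                                     (a real implementation would release the old buffer) */
            hit->size = r->size;
            vcache_touch(hit);
        } else {
            vcache_insert_mru(r);
        }
        /* The guest is notified immediately; FlushCtrl writes back later. */
    }

    /* Figure 4: serve a read from the V-cache if possible. */
    void handle_read(struct vreq *r)
    {
        struct vreq *hit = vcache_lookup(r->addr);
        if (hit) {
            vcache_touch(hit);
            reply_to_vm(hit->data, hit->size);
        } else {
            disk_read(r->addr, r->data, r->size);
            reply_to_vm(r->data, r->size);
            vcache_insert_mru(r); /* keep a copy for future reads */
        }
    }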
Third, there are particular occasions on which refreshing must start immediately:
1) If a virtual machine shuts down normally, the I/O requests in the corresponding V-cache should be handled in time and the V-cache should be released, in order to avoid unnecessary data loss that may damage the file system.
2) If a virtual machine crashes abruptly, the I/O requests in its V-cache must be processed as soon as possible to guarantee the integrity of the file system and avoid damage to the disk image.
3) Before a virtual machine begins to migrate to another VMM, the remaining requests in its V-cache should be flushed immediately for the sake of ensuring the integrity and consistency of the data on the physical disk.

Last but not least, we create several "FlushCtrl" threads in FIOS that are responsible for flushing the data in the V-caches. The number of these threads varies between 0 and 5 according to the utilization of the V-caches. "FlushCtrl" copies all the data from the corresponding V-cache into its own data space, so that the V-cache can be cleaned up and made ready for further use. However, this method raises a problem that should not be ignored: how to guarantee data consistency, that is, what happens if an I/O thread in a virtual machine needs to read data that is in neither the V-cache nor the physical disk? In this case, the VMM has to get the data from the data space of the corresponding "FlushCtrl" thread. Due to the locality of programs and the LRU algorithm we adopt, the probability of this situation is rather small, so the overall I/O performance of FIOS is not seriously affected.

3.2.4 Avoiding excessive memory consumption
Compared with the physical disk, memory is so scarce that we cannot expand the capacity of the V-cache without limit; otherwise, the system will collapse because of excessive memory consumption. Therefore, we have designed a time-sharing strategy for the usage of physical disk bandwidth. We divide I/O-intensive VMs into two categories: short-term and long-term. The former refers to domains that produce a large amount of I/O requests in a short time, which the V-cache is able to accommodate, while the latter refers to VMs that constantly produce requests over a long period, which the V-cache cannot sustain.

As shown in Figure 5, for a short-term I/O-intensive VM we split its V-cache into several parts (e.g., part 1, part 2, and part 3 as illustrated in Figure 5) and attach each part to the other non-I/O-intensive virtual machines in order to increase their capacity for accommodating I/O requests; we then allocate all the bandwidth of the physical disk to this domain. This has several advantages (a policy sketch is given after Figure 5). On one hand, allocating all the I/O bandwidth of the physical disk to the short-term I/O-intensive VM guarantees that the torrent of I/O requests from this domain can be handled in a short period of time. On the other hand, the V-caches of the other non-I/O-intensive VMs are unlikely to reach their upper limits during this short period because of their increased capacity; that is to say, these VMs do not need any physical disk bandwidth for the moment. Therefore, the short-term I/O-intensive virtual machine will not affect the I/O performance of the other non-I/O-intensive VMs.

Figure 5. Dealing with a short-term I/O-intensive domain.
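The following is a minimal sketch, under our own assumptions about the available bookkeeping, of how the time-sharing policy above might be dispatched; burst_duration, vcache_free_bytes, the grant/suspend helpers, and the 30-second cut-off are hypothetical names and values, not part of the FIOS code.

    #include <stdint.h>

    /* Hypothetical per-domain view used by the time-sharing policy. */
    struct io_domain {
        int      id;
        int      io_intensive;        /* set by the detection step of Section 3.2.1 */
        double   burst_duration;      /* seconds the domain has been I/O intensive */
        uint64_t pending_bytes;       /* bytes currently queued for this domain */
        uint64_t vcache_free_bytes;   /* remaining V-cache capacity */
    };

    /* Assumed helpers provided elsewhere in the VMM. */
    void redistribute_vcache_parts(struct io_domain *d);  /* Figure 5: split and lend V-cache */
    void grant_full_disk_bandwidth(struct io_domain *d);
    void rebuild_vcache(struct io_domain *d);             /* Figure 6: long-term case */
    void defer_flush_until_disk_idle(struct io_domain *d);

    #define LONG_TERM_SECONDS 30.0    /* assumed cut-off, not from the paper */

    void apply_time_sharing_policy(struct io_domain *d)
    {
        if (!d->io_intensive)
            return;

        if (d->burst_duration < LONG_TERM_SECONDS &&
            d->pending_bytes <= d->vcache_free_bytes) {
            /* Short-term: its own V-cache can absorb the burst. */
            redistribute_vcache_parts(d);   /* lend V-cache space to the other VMs */
            grant_full_disk_bandwidth(d);   /* drain the burst quickly */
        } else {
            /* Long-term: stop feeding the disk from this domain's V-cache and
             * flush it only when the disk would otherwise be idle. */
            rebuild_vcache(d);
            defer_flush_until_disk_idle(d);
        }
    }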
As a short-term I/O-intensive domain continues to run, it may become a long-term one. In this situation, illustrated in Figure 6, we first rebuild the V-cache (V-cache1) for this domain to prevent its further consumption of disk bandwidth. Second, we stop handling the I/O requests in V-cache1 and allocate all the physical disk bandwidth to the other non-I/O-intensive VMs. When the utilization of V-cache1 reaches its upper limit, the state of the physical disk is examined: if it is not busy, we stop handling the I/O requests in the V-caches of the non-I/O-intensive VMs and allocate all the bandwidth of the physical disk to the I/O-intensive VM until the non-I/O-intensive VMs need to refresh their V-caches again; if the physical disk is busy, we suspend the I/O-intensive virtual machine until the physical disk is free. In this way, the priority of I/O requests from non-I/O-intensive VMs is raised, so that their I/O performance is not affected by the long-term I/O-intensive VM.

Figure 6. Dealing with a long-term I/O-intensive domain.

4. IMPLEMENTATION
In this section, we first introduce the Xen VMM and Blktap, on which FIOS is implemented. We then describe the implementation of the V-cache in detail.

4.1 Xen and Blktap
In order to avoid the overhead on virtual machine performance caused by instruction translation and simulation, the Xen [11] hypervisor offers a split driver model, which allows the guest OS to access the real device efficiently with the help of Domain 0 [7]. As illustrated in Figure 7, the driver of the physical disk is divided into two parts: the part residing in the guest OS is called the front-end, and the part residing in domain 0 is called the back-end. The two ends communicate with each other through event channels and an I/O control ring, which is a memory page shared by both ends. The front-end takes the I/O requests issued by the guest OS, puts them into the I/O control ring, and notifies domain 0 through the event channel in the VMM to handle these requests. Domain 0 is responsible for transferring data between the I/O buffers and the physical disk.

Figure 7. Split driver model in Xen.

Blktap [12] corresponds to the back-end of the disk driver residing in domain 0. Besides the kernel I/O control ring shared with the front-end, Blktap is also equipped with an I/O ring shared with the user-space process Tapdisk, which provides users with I/O operation interfaces such as open and close, read and write. Blktap maps the I/O requests in the kernel ring to the user ring; Tapdisk fetches the requests from the user ring, issues new file I/O operations in the appropriate manner, and then submits them to the kernel of Dom0, just as an ordinary process does during a common disk I/O.
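To make the ring mechanism concrete, here is a simplified, self-contained producer/consumer sketch of a shared request ring in C. It only illustrates the idea of the front-end producing requests and the back-end consuming them from a shared page; it is not Blktap's or Xen's actual ring interface, and all names and sizes are ours.

    #include <stdint.h>

    #define RING_SIZE 32              /* assumed number of slots in the shared page */

    struct ring_request {
        uint64_t sector;              /* starting disk sector */
        uint32_t nr_sectors;
        uint32_t write;               /* 1 = write, 0 = read */
    };

    /* A single-producer/single-consumer ring living in a page shared by
     * the front-end (producer) and the back-end (consumer). */
    struct shared_ring {
        volatile uint32_t prod;       /* next free slot, advanced by the front-end */
        volatile uint32_t cons;       /* next unread slot, advanced by the back-end */
        struct ring_request req[RING_SIZE];
    };

    /* Front-end side: enqueue a request; returns 0 if the ring is full. */
    static int ring_put(struct shared_ring *r, const struct ring_request *q)
    {
        if (r->prod - r->cons == RING_SIZE)
            return 0;
        r->req[r->prod % RING_SIZE] = *q;
        r->prod++;                    /* real code would add a memory barrier here,
                                         then notify the other end via an event channel */
        return 1;
    }

    /* Back-end side: dequeue the next request; returns 0 if the ring is empty. */
    static int ring_get(struct shared_ring *r, struct ring_request *q)
    {
        if (r->cons == r->prod)
            return 0;
        *q = r->req[r->cons % RING_SIZE];
        r->cons++;
        return 1;
    }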
Tapdisk has several unique advantages because it resides in the user space of domain 0. First, metadata disk formats such as copy-on-write, encrypted disks, sparse formats, and other compression features can be easily implemented. Second, it facilitates the development of soft devices. Third, it allows soft devices to be constructed as user-space applications in a virtual machine, and developers can work with high-level languages and debuggers [13]. Tapdisk opens the image file of the virtual machine with the O_DIRECT flag in order to guarantee the semantics of I/O operations in virtual machines; that is, when the virtual machine triggers a flush operation, it must wait until the data reaches the physical disk. Therefore, a heavy burden on the disk can be expected as long as an I/O-intensive domain exists.

4.2 V-Cache based on Xen and Blktap
Our implementation is based on the Xen hypervisor and makes use of Blktap. We change the flow of processing I/O requests by modifying Blktap. A V-cache is established in the VMM. Instead of forwarding I/O requests directly to the physical disk driver, FIOS puts these requests into the corresponding V-cache and then returns as if the handling of the I/O operation had completed. I/O requests stored in the cache are handled by "FlushCtrl" some time later. In order to guarantee the correctness of I/O operations, the items stored in the cache include the content, the size, and the physical disk address of the I/O data. Besides, since the size of the V-cache is limited, its content must be refreshed in time to avoid overflow. We choose a doubly linked list supplemented with an AVL tree to organize the data structure of the V-cache, as shown in Figure 8. Every node in the linked list represents an I/O request. We will discuss the benefits brought by these data structures in detail in the latter part of this paper.

Figure 8. Organization of the V-cache: each list node holds the fields addr, content, data_size, previous, and next; each AVL tree node holds the disk address and a pointer (ListNodeAddr) to the corresponding list node.

Each V-cache is represented by the head node of a doubly linked list and the root node of an AVL tree. The reason for supplementing the V-cache with an AVL tree is as follows: if the virtual machine issues I/O requests frequently, the doubly linked list becomes very large, and searching for the corresponding node in this list becomes a heavy task that seriously hurts the performance of FIOS. We therefore use another data structure, the AVL tree [14], for the search operation. For an AVL tree with N nodes, the average search complexity is O(log N), which is far lower than that of a linked list, O(N), especially when N is very large. However, in order to decrease data redundancy in the V-cache, we do not duplicate the nodes of the linked list in the AVL tree. Instead, each AVL tree node only stores a pointer indicating the memory address of the corresponding node in the doubly linked list; each pointer is a four-byte integer on a 32-bit machine.
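Based on the fields shown in Figure 8, the V-cache node layout could be declared roughly as follows. This is our reading of the figure rather than the actual FIOS source: field types are assumptions, and the AVL rebalancing code is omitted.

    #include <stdint.h>

    /* One I/O request held in the doubly linked (LRU-ordered) list. */
    struct list_node {
        uint64_t addr;                /* physical disk address of the data */
        char    *content;             /* the I/O data itself */
        int      data_size;
        struct list_node *previous;
        struct list_node *next;
    };

    /* One index entry in the AVL tree: it stores only the disk address as the
     * search key and a pointer to the list node, so the data is not duplicated. */
    struct tree_node {
        uint64_t addr;                /* search key */
        struct list_node *list_node;  /* "ListNodeAddr" in Figure 8 */
        struct tree_node *left;
        struct tree_node *right;
        int height;                   /* needed for AVL rebalancing (omitted here) */
    };

    /* A V-cache is identified by the head of its list and the root of its tree. */
    struct vcache {
        struct list_node *head;       /* most recently used request */
        struct list_node *tail;       /* least recently used request */
        struct tree_node *root;
        uint64_t used_bytes;
        uint64_t capacity;
    };

    /* O(log N) lookup by disk address through the AVL index. */
    static struct list_node *vcache_find(struct tree_node *root, uint64_t addr)
    {
        while (root) {
            if (addr == root->addr)
                return root->list_node;
            root = (addr < root->addr) ? root->left : root->right;
        }
        return 0;
    }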
Refreshing the cache is an inefficient but inevitable operation. We take the following measures to improve the efficiency of refreshing. First, requests are combined in the V-cache. Although the operating system does the same work before submitting requests to the disk queue, the maximum disk queue length is only on the order of 100 or 1000 entries [15], which is much smaller than the capacity of the V-cache, so combining requests in the V-cache is more effective. Second, requests in the V-cache are sorted according to their disk addresses so that disk seeking can be reduced. Third, a new thread called "FlushCtrl" performs the refreshing asynchronously.

As mentioned before, the thresholds of the V-cache are significant factors in deciding when to flush. However, choosing appropriate threshold values is a hard problem and, more importantly, their optimal values depend heavily on the workload. In the situation of light I/O, the upper limit of the V-cache should be larger so that the cache can store more requests before a flush; in the situation of heavy I/O, the upper limit should be smaller to avoid overflow of the cache [16]. We take a rate-based approach to set the value of the upper limit. Suppose that a(t) is the arrival rate of I/O requests at the V-cache at time t. If a(t) > a(t-1), which means the arrival rate of I/O requests is increasing, the value of the upper limit should be decreased; on the contrary, if a(t) < a(t-1), which means the arrival rate of I/O requests is decreasing, the value of the upper limit should be increased. So we calculate the value of the upper limit as follows:

h(t) = h(t-1) * (a(t-1) / a(t))    (1)

The value of the lower limit is decided not only by the arrival rate at the previous time, but also by the flushing rate, in other words, by how fast I/O requests can be handled by the physical driver. Suppose f(t) is the flushing rate. When f(t) is higher than the arrival rate, the value of the lower limit can be larger in order to make the flushing less aggressive; when f(t) is lower than the arrival rate, the value of the lower limit should be smaller so as to prevent overflow of the V-cache. So we calculate the value of the lower threshold as follows:

l(t) = l(t-1) * (a(t-1) / a(t)) * (f(t) / f(t-1))    (2)

Experiments with different functions were carried out to adjust the values of h(t) and l(t); these two simple schemes perform well in practice and have low computational requirements [16].
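As an illustration only, the following C sketch implements the two update rules above. The representation of the thresholds as fractions of the cache capacity, the sampling of a(t) and f(t), and the clamping bounds are our assumptions, not values given in the paper.

    struct vcache_thresholds {
        double upper;   /* h(t): occupancy at which FlushCtrl starts flushing */
        double lower;   /* l(t): occupancy at which FlushCtrl stops flushing */
    };

    /* Update the thresholds according to formulas (1) and (2).
     * a_prev/a_now: arrival rates at t-1 and t (requests per second).
     * f_prev/f_now: flushing rates at t-1 and t. */
    static void update_thresholds(struct vcache_thresholds *th,
                                  double a_prev, double a_now,
                                  double f_prev, double f_now)
    {
        if (a_now <= 0.0 || f_prev <= 0.0)
            return;                                   /* avoid division by zero */

        th->upper = th->upper * (a_prev / a_now);                   /* formula (1) */
        th->lower = th->lower * (a_prev / a_now) * (f_now / f_prev); /* formula (2) */

        /* Assumed sanity bounds: keep 0 < lower < upper <= 1.0 (1.0 = cache full). */
        if (th->upper > 1.0)        th->upper = 1.0;
        if (th->lower >= th->upper) th->lower = th->upper * 0.5;
        if (th->lower < 0.05)       th->lower = 0.05;
    }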
5. SIMULATION
In this section, queuing theory is used to analyze and compare the I/O performance interference caused by I/O-intensive domains in the Xen VMM and in FIOS. Before deriving the performance parameters of the models, the following assumptions are made:

1) The arrival of I/O requests follows a Poisson process with arrival rate λ, so the number of I/O requests arriving in each VM obeys:

P{N(t+s) - N(t) = k} = ((λs)^k / k!) e^(-λs),  k = 0, 1, 2, ...    (3)

We also assume that the arrival rate of an I/O-intensive VM is T times that of a non-I/O-intensive VM; usually, T is much larger than 1.

2) The service of the virtual disk queues by the physical disk driver follows an exponential distribution with service rate μ.

3) The physical disk scheduler is CFQ (Complete Fair Queuing), which is the default and most popular scheduler in domain 0.

Figure 9. I/O service model in Xen with Blktap.

As shown in Figure 9, if the number of VMs is N, each virtual disk queue receives requests at rate λ/N and the I/O service can be modeled accordingly. According to queuing theory, the average virtual disk queue length of each virtual machine is given by formula (4):

Ls = Σ_{k=0}^{∞} k p_k = ρ / (1 - ρ)    (4)

where ρ = λ/(Nμ) represents the utilization of the I/O bandwidth. The average response time of each I/O request is determined by formula (5):

Ws = Ls / (λ/N) = 1 / (μ(1 - ρ))    (5)

5.1 I/O Performance Interference in Xen
When an I/O-intensive VM starts to run, both the average length of the virtual disk queue and the average response time of each I/O request are affected. Figure 10 describes the model reflecting the arrival of an I/O-intensive VM.

Figure 10. I/O service model in Xen when an I/O-intensive VM arrives.

In this case the total arrival rate at the physical disk becomes (N+T-1)λ/N, so the utilization of the I/O bandwidth rises to

ρ1 = (N+T-1)λ / (Nμ)    (6)

Therefore, according to formula (6), the average virtual disk queue length of each VM and the average response time of each I/O request become

Ls1 = ρ1 / (1 - ρ1) = (N+T-1)λ / (Nμ - (N+T-1)λ)    (7)

Ws1 = 1 / (μ(1 - ρ1)) = N / (Nμ - (N+T-1)λ)    (8)

Given the same values of the parameters λ, μ, and N, both Ls1 and Ws1 are governed by the value of the parameter T: the larger T is, the more Ls1 and Ws1 grow; in other words, the I/O performance of non-I/O-intensive VMs is degraded by the arrival of an I/O-intensive domain.

5.2 I/O Performance Interference in FIOS
Figure 11 describes the model in FIOS. I/O requests can be stored in the V-cache; that is, they do not have to reach the physical disk before returning. Thus in FIOS the service rate is μ' = Mμ, where M represents the speed ratio of memory access to disk access. Since each VM is equipped with an independent V-cache, when an I/O-intensive VM arrives, Ls and Ws in FIOS can be calculated as follows:

ρ' = (λ/N) / (Mμ) = λ / (NMμ)    (9)

Ls' = ρ' / (1 - ρ') = λ / (NMμ - λ)    (10)

Ws' = Ls' / (λ/N) = N / (NMμ - λ)    (11)

Figure 11. I/O service model in FIOS when an I/O-intensive VM arrives.

Given the same parameters λ and μ, comparing (7) with (10) and (8) with (11) shows that Ls' < Ls1 and Ws' < Ws1; it is clear that in FIOS the I/O performance interference caused by an I/O-intensive VM is reduced. Moreover, comparing (4) with (10) and (5) with (11), we find that Ls' < Ls and Ws' < Ws, so it is reasonable to predict that the I/O performance of non-I/O-intensive VMs in FIOS will be better than in Xen.

6. EVALUATION
In this section, we first describe the key metrics used to evaluate the performance of FIOS. Then we introduce the corresponding benchmarks used in the experiments. Finally, we describe the evaluation steps, and the results of the experiments are presented and analyzed.

6.1 Key Metrics
Generally speaking, we choose three primary criteria to estimate the performance of FIOS: throughput, latency, and fairness. Throughput is the amount of I/O data handled by the system in a unit of time; in FIOS, the throughput is measured by the specific I/O benchmarks described below. Latency is defined simply as the time the system spends to complete a single I/O operation; it is also an important indicator of the I/O performance of a virtual machine. Fairness is the equality of the throughput divided among the different running virtual machines [16]. We use Jain's fairness measure to quantify the fairness between virtual machines. Jain's fairness measure [17] ranges between 0, which means completely unfair, and 1, which means completely fair. It is defined as

fairness = (Σ_{i=1}^{n} X_i)^2 / (n Σ_{i=1}^{n} X_i^2)    (12)

where X_i is the throughput of virtual machine i.
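The following short C function, included only for illustration, computes formula (12) from an array of per-VM throughputs; the function name and the sample values in main are ours, not measurements from the paper.

    #include <stdio.h>

    /* Jain's fairness index, formula (12): (sum x_i)^2 / (n * sum x_i^2). */
    static double jain_fairness(const double *x, int n)
    {
        double sum = 0.0, sum_sq = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i];
            sum_sq += x[i] * x[i];
        }
        if (n == 0 || sum_sq == 0.0)
            return 0.0;
        return (sum * sum) / (n * sum_sq);
    }

    int main(void)
    {
        /* Hypothetical per-VM throughputs in MB/s, not measured values. */
        double throughput[3] = { 120.0, 118.0, 121.0 };
        printf("fairness = %.3f\n", jain_fairness(throughput, 3));
        return 0;
    }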
6.2 Experimental Steps
All results in this paper were collected on an Intel Xeon platform with two 1.6 GHz processors, 4 GB of RAM, and a 160 GB 7200 RPM SATA II hard drive (ST3160815AS). The Linux 2.6.18.8 kernel was used throughout. Dom0 runs a 64-bit CentOS 5.4 distribution and the hypervisor is Xen 3.4.3. Each virtual machine was allocated 40 GB of the 160 GB physical disk and used ext3 in ordered mode as its file system. The virtual disks were created in contiguous space on the physical disk to minimize seek time when performing I/O operations on different virtual disks. In order to avoid the effect of the memory cache inside the virtual machines, we allocated each virtual machine only 256 MB of memory, which is small enough to trigger disk operations during the I/O process.

Clearly, different workloads have different disk access patterns, and no single I/O system is optimal for all of them [18]. Thus we selected two typical benchmarks to estimate the performance of FIOS: IOZone and DBench. IOZone [20] is a file system benchmark tool; it generates and measures a variety of file operations to test file I/O performance. DBench [21] is a tool that generates I/O workloads against either a file system or a networked server; the workload can be specified by a configuration file in the DBench working directory, which consists of a mixture of file system operations. Table 1 reports the behavior of the I/O subsystem in the original Xen VMM while running IOZone to read/write a 2 GB file and while running DBench, respectively, in a domain configured with 256 MB of memory.

Table 1. Disk bandwidth consumption of different benchmarks
                            IOZone       DBench
Average queue length        142.33       2.05
Average waiting time        1167.05 ms   0.86 ms
I/O bandwidth utilization   99.00%       24.29%

According to Table 1, when running IOZone the average physical disk utilization reaches 99%, while when running DBench it is only about 25%. Therefore, we use IOZone to emulate an I/O-intensive VM and DBench to emulate a non-I/O-intensive VM.

6.3 Results
We carried out the experiments by comparing the throughput, latency, and fairness of I/O operations in virtual machines running on the Xen VMM and on FIOS. To begin with, three non-I/O-intensive VMs are run on the Xen VMM and on FIOS respectively, and the average throughput measured by DBench is collected. While they are running, another VM is started, which executes IOZone. Figure 12 shows that, first, when running on Xen, the I/O throughput of all non-I/O-intensive domains is heavily hurt by the arrival of the I/O-intensive VM at about the 200th second, decreasing from about 125 MB/sec to 85 MB/sec, a nearly 32% loss. When running on FIOS, however, there is only a very slight decrease in the I/O throughput of the non-I/O-intensive VMs. This is largely due to the V-cache we have created in FIOS, which intercepts and stores the I/O requests from the VMs. Second, comparing the two sets of curves in Figure 12, even without the interruption of the I/O-intensive virtual machine, the I/O throughput of non-I/O-intensive virtual machines running on FIOS is about 225 MB/sec, while on the original Xen VMM it is about 125 MB/sec, almost 45% lower than the former. The reason is that in FIOS, I/O operations return immediately when they reach the corresponding V-cache and do not have to wait until the data has been written to the physical disk.

Figure 12. Non-I/O-intensive VMs running on Xen and FIOS, with interruption from an I/O-intensive domain.
We can observe the fairness of the VMs' I/O throughput from Figure 13: the value of fairness is 0.99 on both Xen and FIOS, because the default scheduling algorithm in Xen is CFQ (Completely Fair Queuing) and our modification does not alter this property.

Figure 13. Fairness of VMs' I/O performance.

Figures 14 and 15 show the latency of I/O operations in the VMs running on Xen and FIOS, respectively.

Figure 14. I/O latency of VMs on Xen.

Figure 15. I/O latency of VMs on FIOS.

Comparing the two figures, it is obvious, first of all, that the curves in Figure 14 are significantly serrated, while the curves in Figure 15 are much smoother apart from a few exceptions. Each point on the curves indicates the latency of an I/O operation, and the saw teeth in Figure 14 indicate jitter in I/O performance. Besides, the average I/O latencies in Figure 14 are far larger than those in Figure 15, which demonstrates that even without an I/O-intensive VM, interference among normal domains is still significant and largely reduces the VMs' I/O performance. Finally, when the I/O-intensive VM is started, the I/O latencies in Figure 14 suffer an obvious increase, which lasts until the I/O-intensive VM is stopped. In Figure 15, however, the latencies remain nearly the same except for a few sharp increases; these increases are due to the time needed to identify the I/O-intensive VM.

In another experiment, different numbers (1, 2, 3, 4) of non-I/O-intensive VMs are run on the VMM to see how the number of VMs affects the performance of FIOS. Again, an I/O-intensive VM is started, at about the 60th second. As shown in Figure 16, when running on Xen, the average I/O throughput per VM decreases by more than 50%, from about 170 MB/sec to 80 MB/sec, as the number of these machines increases from 1 to 4. When running on FIOS, however, the case is different: we can hardly notice any decline in the I/O throughput of the non-I/O-intensive VMs. This is because in FIOS every VM has its own V-cache, which acts as a cushion for I/O operations.

Figure 16. Different numbers of VMs running on Xen and FIOS, with interruption from an I/O-intensive domain.

In the third experiment, various numbers of non-I/O-intensive VMs are run on Xen and FIOS respectively, this time without any interruption from an I/O-intensive VM. This experiment is designed to reveal the I/O performance interference among non-I/O-intensive VMs. Figure 17 shows that when running on FIOS, the I/O throughput of these virtual machines decreases only about 6%, from 230 MB/sec to 216 MB/sec, as the number of normal VMs increases from 1 to 5. On the contrary, when running on Xen, the I/O throughput suffers a significant decrease of about 75%, from 171 MB/sec to 43 MB/sec. This reveals that a newly arriving VM, whether I/O intensive or not, will affect the I/O performance of the existing domains as long as it has I/O operations, whereas in FIOS the virtual machines only interfere slightly with each other.

Figure 17. Different numbers of VMs running on FIOS and Xen.

7. CONCLUSION AND FUTURE WORK
Disk I/O is a time-consuming operation. When multiple domains share the same disk resource, the I/O performance interference among them is conspicuous because of the limited disk bandwidth. We have demonstrated that when 5 VMs run simultaneously on the same VMM, their I/O performance is only 25% of that achieved when there is only one running VM. Worse still, when there exists an I/O-intensive VM, the I/O performance of the other domains is seriously degraded. Our implementation effectively avoids the I/O performance interference among VMs and, in particular, prevents the remarkable damage brought by I/O-intensive domains. In our system, when 5 VMs run simultaneously on the same VMM, their I/O performance achieves nearly 94% of that when only one domain is running, and the I/O performance of non-I/O-intensive VMs is not seriously affected by the arrival of an I/O-intensive domain.
Furthermore, when running on FIOS, the VMs' I/O performance is improved significantly in comparison with running on the Xen VMM. Moreover, we believe that our implementation can be easily and conveniently applied to other virtualization infrastructures.

In the future, our studies will focus on improving the flexibility and stability of the V-cache in FIOS. When multiple I/O-intensive domains run simultaneously on the VMM, the I/O traffic they produce may be so heavy that it exceeds the capacity of FIOS, which can easily lead to overflow of the V-cache and a significant decrease of the overall performance of FIOS. Therefore, more elaborate cache management strategies are needed to deal with this situation. We also plan to employ SSDs as the physical storage in this system. It is well known that, compared with traditional disks, SSDs bear many advantages such as higher performance and lower energy consumption, but whether SSDs can be well adapted to virtualized environments is an interesting and challenging issue.

8. ACKNOWLEDGMENTS
This work is supported by the China National Natural Science Foundation (NSFC) (No. 60973133) and the MoE-Intel Information Technology Special Research Foundation under grant No. MOE-INTEL-10-05.

9. REFERENCES
[1] Che, J., He, Q., Gao, Q., and Huang, D. 2008. Performance measuring and comparing of virtual machine monitors. In Proceedings of Embedded and Ubiquitous Computing (EUC). 381-386.
[2] Liu, J., Huang, W., Abali, B., and Panda, D.K. 2006. High performance VMM-bypass I/O in virtual machines. In Proceedings of the USENIX Annual Technical Conference. 29-42.
[3] Deshane, T., McCabe, M., and Neefe, J. Performance isolation of a misbehaving virtual machine with Xen, VMware and Solaris containers. http://people.clarkson.edu/~jnm/publications/isolationOfMisbehavingVMs.pdf.
[4] Ongaro, D., Cox, A.L., and Rixner, S. 2008. Scheduling I/O in virtual machine monitors. In Proceedings of the 4th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE). ACM, New York, NY, 1-10.
[5] Boutcher, D. and Chandra, A. 2010. Does virtualization make disk scheduling passé? ACM SIGOPS Operating Systems Review, Vol. 44, 20-24.
[6] Gupta, D., Cherkasova, L., Gardner, R., and Vahdat, A. 2006. Enforcing performance isolation across virtual machines in Xen. In Proceedings of the International Conference on Middleware. 342-362.
[7] Fraser, K., Hand, S., Neugebauer, R., Pratt, I., Warfield, A., and Williamson, M. 2004. Safe hardware access with the Xen virtual machine monitor. In Proceedings of the 1st Workshop on Operating System and Architectural Support for the on demand IT InfraStructure (OASIS).
[8] Pasquale, B.K. and Polyzos, G.C. 1994. Dynamic I/O characterization of I/O intensive scientific applications. In Proceedings of the 1994 ACM/IEEE Conference on Supercomputing. ACM, New York, NY, 660-669.
[9] Chrobak, M. and Noga, J. 1998. LRU is better than FIFO. In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms (SODA). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 78-81.
[10] Clark, C., Fraser, K., Hand, S., Hansen, J.G., Jul, E., Limpach, C., Pratt, I., and Warfield, A. 2005. Live migration of virtual machines. In Proceedings of the 2nd Symposium on Networked Systems Design & Implementation (NSDI), Vol. 2. USENIX Association, Berkeley, CA, USA, 273-286.
[11] Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. 2003. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP). ACM, New York, NY, 164-177.
[12] Blktap. http://wiki.xensource.com/xenwiki/blktap.
[13] Warfield, A., Hand, S., Fraser, K., and Deegan, T. 2005. Facilitating the development of soft devices. In Proceedings of the USENIX Annual Technical Conference. 379-382.
[14] Larsen, K.S. 1994. AVL trees with relaxed balance. In Proceedings of the 8th International Parallel Processing Symposium. 888-893.
[15] Ruemmler, C. and Wilkes, J. 1993. UNIX disk access patterns. In Proceedings of Winter 1993 USENIX. 405-420.
[16] Batsakis, A., Burns, R., Kanevsky, A., Lentini, J., and Talpey, T. 2008. AWOL: an adaptive write optimizations layer. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST). 67-80.
[17] Jain, R., Chiu, D.M., and Hawe, W.R. 1984. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. Technical Report TR-301, DEC Research.
[18] Hu, Y. and Yang, Q. 1996. DCD—disk caching disk: a new approach for boosting I/O performance. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA). ACM, New York, NY, 169-178.
[19] Cherkasova, L. and Gardner, R. 2005. Measuring CPU overhead for I/O processing in the Xen virtual machine monitor. In Proceedings of the USENIX Annual Technical Conference.
[20] Norcott, W.D. 2001. IOZone. http://www.iozone.org.
[21] DBench. http://dbench.samba.org.
[22] Dong, Y., Dai, J., Huang, Z., Guan, H., Tian, K., and Jiang, Y. 2009. Towards high-quality I/O virtualization. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference. ACM, New York, NY, 12-19.
[23] Chadha, V., Illikkal, R., Iyer, R., Moses, J., Newell, D., and Figueiredo, R.J. 2007. I/O processing in a virtualized platform: a simulation-driven approach. In Proceedings of the 3rd International Conference on Virtual Execution Environments (VEE). ACM, New York, NY, USA, 116-125.
[24] Seelam, S.R. and Teller, P.J. 2006. Fairness and performance isolation: an analysis of disk scheduling algorithms. In Proceedings of the IEEE International Conference on Cluster Computing. 1-10.
[25] Seelam, S.R. and Teller, P.J. 2007. Virtual I/O scheduler: a scheduler of schedulers for performance virtualization. In Proceedings of the 3rd International Conference on Virtual Execution Environments (VEE). ACM, New York, NY, 105-115.
[26] Mei, Y., Liu, L., Pu, X., and Sivathanu, S. 2010. Performance measurements and analysis of network I/O applications in virtualized cloud. In Proceedings of the 3rd International Conference on Cloud Computing. 59-66.