PhD Hypothesis
Mark White
College of Engineering and Informatics, NUI Galway
Wednesday 24th October 2012

A Resource Bartering Economy for Autonomous Virtual Machines

Executive Statement

Our proposal centers on the concept of a Barter Economy for Virtual Machines (VMs). The primary objective is to trade surplus resource entitlements between VMs to:
- Optimize utilization of VM resources in a Resource Pool (RP)
- Define more flexible SLAs (per-VM and system-wide)
- Reduce VM migrations

Bartering will result in RP resources, which would otherwise be idle due to surplus (unused) entitlements, being optimized. For the purposes of this research the RP bounds our system, i.e. a number of VMs share a local and finite set of hardware resources. The resource pool is a limiting factor in so far as the VMs within it would have been initially provisioned for reasons beyond the scope of this research (e.g. storage proximity, geographical proximity to application requests, High Availability (HA), High Performance Cluster Computing (HPCC)). To look beyond the resource pool would involve factoring in these provisioning decisions. Solutions would typically involve migration to hosts beyond the boundaries of the resource pool, typically clashing with the original provisioning decisions. We consider it infeasible to have a VM using remote resources, i.e. traversing networks to access the hardware resources it requires. Rather, we focus on optimizing the existing resources within the confines of a RP, using a VM bartering economy to balance utilization.

An Example: VM1 needs more Disk I/O. It is holding an entitlement to 20% CPU but only using 10%. It may trade its surplus 10% CPU entitlement with VM2, which has a surplus of Disk I/O but needs more CPU. VM1 and VM2 are complementary.

Complementary VMs

Complementary VMs are those that require different resources at the same time to service their workloads, resulting in optimization of all available resources in a pool. Bartering involves identification of complementary VMs with which to trade. To identify complementary VMs a ‘buyer’ VM might use an agent to search by requesting the other VMs (‘sellers’) to declare their resource entitlements. Once a potential ‘seller’ has been found, the negotiation process begins, as in the example given above. An arbitrator may be provided if required. If more than one ‘seller’ is found then additional parameters may be required to resolve which VM to trade with, or the ‘buyer’ VM may trade with all suitable VMs to acquire the resources it needs.

Beloglazov et al. [1] state that over-provisioning may occur if the potential for resource sharing by co-located VMs with different workload patterns is not taken into account. They show that if workloads with similar resource usage requirements are co-located there may be greater potential for bottlenecks to occur (e.g. increased competition for disk among disk-bound (I/O) applications), whereas workloads with different resource requirements may be able to exist on the same server with less contention, i.e. the workloads are complementary. Pu et al. [2] have shown that co-location of CPU-intensive and network-intensive workloads incurs the least resource contention, delivering higher aggregate performance. In a utopian system, each VM would be simultaneously using a different resource to the point where each resource was being fully optimized but none was being competed for. Although this is perhaps an unrealistic situation, the degree to which co-located workloads are complementary is a key performance issue when scheduling.
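The VM1/VM2 trade above can be expressed as a minimal sketch. The class and function names below (VM, surplus, trade) are hypothetical, and the single-percentage entitlements are a simplification; in the full proposal entitlements would be expressed as MDRAs, and the arbitrator and multi-seller resolution described above are omitted.

```python
# Minimal sketch of a single barter trade between two complementary VMs.
# Entitlements and usage are simple percentages per resource here; the full
# proposal would use MDRAs in their place. All names are illustrative.

class VM:
    def __init__(self, name, entitlement, usage):
        self.name = name
        self.entitlement = dict(entitlement)  # % of the pool the VM is entitled to
        self.usage = dict(usage)              # % of the pool the VM actually uses

    def surplus(self, resource):
        """Entitlement the VM holds but is not currently using."""
        return self.entitlement[resource] - self.usage[resource]

def trade(buyer, seller, wanted, offered, amount_wanted, amount_offered):
    """Swap surplus entitlements between a 'buyer' and a complementary 'seller'."""
    assert seller.surplus(wanted) >= amount_wanted, "seller has no surplus to sell"
    assert buyer.surplus(offered) >= amount_offered, "buyer has no surplus to offer"
    buyer.entitlement[wanted] += amount_wanted
    seller.entitlement[wanted] -= amount_wanted
    seller.entitlement[offered] += amount_offered
    buyer.entitlement[offered] -= amount_offered

# The example from the text: VM1 holds 20% CPU but uses only 10% and needs more
# disk I/O; VM2 has surplus disk I/O but needs more CPU.
vm1 = VM("VM1", entitlement={"cpu": 20, "disk_io": 10}, usage={"cpu": 10, "disk_io": 10})
vm2 = VM("VM2", entitlement={"cpu": 10, "disk_io": 30}, usage={"cpu": 10, "disk_io": 15})

trade(vm1, vm2, wanted="disk_io", offered="cpu", amount_wanted=10, amount_offered=10)
print(vm1.entitlement)  # {'cpu': 10, 'disk_io': 20}
print(vm2.entitlement)  # {'cpu': 20, 'disk_io': 20}
```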
Multi-Dimensional Resource Allocation (MDRA)

We suggest that a typical outcome of a trade would be a Multi-Dimensional (temporal and spatial, e.g. moving average) Resource Allocation (MDRA). The effect of the temporal aspect of the allocation is to ‘smooth’ periods of high and low usage over time, theoretically reducing the frequency of re-negotiation for more resources. The MDRA would form the basis for definition of a VM’s SLAs, i.e. continuous monitoring of its own SLAs would identify the need for more resources and induce a subsequent trade.

Scheduling

The CPU scheduler for Virtual Machines (VMs) places requests on the queue according to a priority parameter. Requests with a higher priority (i.e. closer to 0) will be granted CPU access before those with lower priorities although, if backfilling is enabled, a lower-priority request may ‘skip’ onto the CPU while the higher-priority request is awaiting some external input. Scheduling priorities are calculated differently in VMWare and Xen.

In VMWare, VMs are assigned a share, limit and reservation when initially provisioned. It is these parameters which form the basis of VMWare priorities and ‘entitlements’. The VMWare scheduler uses the priority to place the vCPU on the queue and monitors the entitlement as the vCPU request is being processed, dynamically altering the priority as the vCPU uses up an increasing percentage of its entitlement.

In Xen, VMs are assigned a weight and a cap. These are the parameters required to calculate a Xen priority. As the VM request is processed the scheduler keeps track of whether the VM is ‘over’ or ‘under’ its entitlement and places / replaces the request on the queue accordingly.

In both VMWare and Xen, the base parameters required to calculate a priority / entitlement are statically set by the operator when the VM is first provisioned and cannot be changed without operator intervention. It is our contention that we can optimize scheduling by increasing the accuracy of VM entitlements (and the subsequent priorities read by the scheduler). Calculation of the priorities on the scheduler would be based on the MDRAs traded between two VMs rather than the existing VMWare (share, reservation & limit) or Xen (weight & cap) parameters. Our MDRAs would react to changing conditions, redefining resource entitlements each time two VMs perform a trade. This both eliminates the need for operator intervention and optimizes resource utilization. In effect we propose replacement of the existing static parameters with dynamic MDRAs. Changes to the scheduler may be required so that entitlements (calculated based on the VM’s MDRA) can be monitored while a VM process is consuming CPU or some other resource.

Decentralization

In our barter economy each VM is autonomous, managing its own resource requirements and only trading as and when it identifies an additional need for resource. This eliminates centralized (and continuous) monitoring of RP resources. While some ‘agent’ may be deployed to search the RP for complementary VMs with which to trade, this agent would only need to be deployed on a per-trade basis. Another agent may also be required to arbitrate the trade but again, only on a per-trade basis.
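One way to picture the replacement of the static share / limit / reservation (VMWare) and weight / cap (Xen) parameters with a dynamic, MDRA-derived entitlement is sketched below. The moving-average window, headroom factor and priority formula are assumptions made purely for illustration; they are not the VMWare or Xen calculations.

```python
# Sketch of a dynamic, MDRA-derived entitlement and scheduler priority
# (illustrative only; the averaging window, headroom factor and priority
# formula are assumptions, not the actual VMWare or Xen calculations).

from collections import deque

class MDRA:
    """Temporal/spatial allocation: a moving average of recent usage per resource."""
    def __init__(self, resources, window=12):
        self.history = {r: deque(maxlen=window) for r in resources}

    def record(self, usage):
        """Record one monitoring sample, e.g. {'cpu': 12.0, 'disk_io': 40.0}."""
        for resource, value in usage.items():
            self.history[resource].append(value)

    def entitlement(self, resource, headroom=1.2):
        """Smoothed entitlement: moving average plus a small headroom factor."""
        samples = self.history[resource]
        if not samples:
            return 0.0
        return headroom * sum(samples) / len(samples)

def priority(mdra, resource, consumed_so_far):
    """Lower value = higher priority (closer to 0): a VM that has consumed less of
    its MDRA-derived entitlement is placed ahead of one that has consumed more."""
    entitlement = mdra.entitlement(resource)
    if entitlement == 0:
        return 1.0
    return min(consumed_so_far / entitlement, 1.0)

# Example: after a trade, the entitlement reflects recent usage rather than a static share.
mdra = MDRA(("cpu", "disk_io"))
for sample in ({"cpu": 10, "disk_io": 35}, {"cpu": 12, "disk_io": 40}, {"cpu": 8, "disk_io": 38}):
    mdra.record(sample)
print(mdra.entitlement("cpu"))                          # 12.0
print(priority(mdra, "cpu", consumed_so_far=6.0))       # 0.5
```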
‘Live’ Trading

In the same way as VM migrations can be performed on a ‘live’ basis, the ideal would be that VMs do not have to pause / stop while trading their resources.

Process Flow

The resource allocation and scheduling processes are intrinsically related but considered as separate entities for the purposes of our research. VM entitlement (calculated based on the MDRA) will be read by the scheduler and VM requests positioned on the physical CPU (pCPU) queue accordingly. At the highest level, the full VM operation will be a continuous 3-step process (a minimal sketch of the cycle follows the list):
1. Allocation (Search, Trade, MDRA, Calculation of VM entitlement for the scheduler)
2. Scheduling
3. SLA Monitoring, to identify the need for more resource. SLAs will be defined using the MDRA as a basis.
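As referenced above, a minimal sketch of the 3-step cycle is given below, under the assumption that each autonomous VM runs its own loop. Every injected callable stands in for a component described elsewhere in this proposal rather than an existing API.

```python
# Rough sketch of the continuous Allocation -> Scheduling -> SLA Monitoring cycle
# run by each autonomous VM. The step functions are injected as callables because
# each one (search, trade, scheduler, SLA check) is a placeholder for a component
# described elsewhere in the proposal, not an existing API.

def run_vm_cycle(vm, allocate, schedule, sla_compliant, max_cycles=3):
    """allocate(vm): search / trade and refresh the MDRA-derived entitlement.
    schedule(vm): position the VM's requests on the pCPU queue.
    sla_compliant(vm): return False when more resource is needed (triggers a trade)."""
    needs_resources = True
    for _ in range(max_cycles):                       # a real VM would loop continuously
        if needs_resources:
            allocate(vm)                              # 1. Allocation (search, trade, MDRA)
        schedule(vm)                                  # 2. Scheduling
        needs_resources = not sla_compliant(vm)       # 3. SLA monitoring against the MDRA

# Trivial stand-ins so the sketch executes; real implementations are future work.
vm = {"name": "VM1", "entitlement": {}}
run_vm_cycle(vm,
             allocate=lambda v: v["entitlement"].update(cpu=12.0),
             schedule=lambda v: None,
             sla_compliant=lambda v: True)
print(vm["entitlement"])
```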
Related Work

Bartering

In 2003 Fu et al. [3] designed SHARP, a framework for secure, distributed resource management using a bartering system. They specifically examined the need for security in a system which spanned trust domains (i.e. external networks). Our work is constrained to a local resource pool, eliminating the need for similar trust agreements.

Feldman et al. [4] initially proposed a system where VMs receive a share of resources in proportion to a bid they make during an auction. The ‘auction’ takes place intermittently (with a regular time interval). This system later became Tycoon [5]. Our work is differentiated in that our research is not auction-based, i.e. VMs dictate the interval between trades based on an identified need for additional resource. Theoretically the interval in our system is extended by the increased accuracy of the MDRA which results from a trade. In addition, an auction system assumes knowledge of ALL available resources (i.e. a 1 VM -> many resources relationship) whereas the relationship in our system is 1 -> 1, each VM only needing to identify resources belonging to another VM.

Most interesting in the work performed by Chun et al. [6] is the expression, translation and enforcement of a resource’s value both to a particular VM and to the system as a whole. Without a normalising currency in a barter economy (e.g. £, €, $), the valuation method by which x network is traded with y CPU becomes critical. To successfully trade, a value must be assigned to both resources being traded by the VMs while also accounting for system-wide supply and demand. We agree with Chun et al. where they propose that the cluster / RP is the optimal environment in which to deploy a bartering economy because prior provisioning decisions have already placed the VMs in that particular configuration. (A sketch of one possible valuation approach is given at the end of this subsection.)

Mancinelli et al. [7] modelled the resource characteristics of an application / VM and execution (host) environments so that reasoning could be performed on ‘compatibility’ and ‘goodness’. Compatibility may be viewed for our purposes as the notion that sufficient resources are available on the host to supply the VM’s demand, while goodness examines the best way to adapt the available resources to suit the application being serviced by the VM, i.e. definition of our MDRA.

Wood et al. [17, 18] designed a model which predicts the virtualization overhead required by a host and the likely resources required by an application being provisioned on that host. They close the paper expressing the intention to investigate:
1. “How [sic] these modelling techniques can be used to predict the aggregate resource requirements of virtual machines co-located [sic] on a single host” and to
2. “Determine [sic] when an application’s resource requirements are likely to exceed the virtual system’s capacity”.

Their work is useful in that we wish to:
a) Facilitate a new VM requesting access to a RP by trading with the existing VMs for the resources it requires, ideally resulting in complementary VMs being co-located
b) Include continuous monitoring of SLA compliance, analogous to (2) above.
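Returning to the valuation problem raised in the discussion of Chun et al., one possible (and entirely illustrative) approach is to weight each resource by its current scarcity in the pool, so that a trade of x network for y CPU can be normalised without a currency. The scarcity weighting below is an assumption for illustration only, not a settled part of the design.

```python
# Illustrative scarcity-based valuation of heterogeneous resources, as one
# possible answer to the normalisation problem raised by Chun et al. The
# weighting scheme is an assumption for illustration only.

def scarcity_weights(pool_capacity, pool_used):
    """Value a unit of each resource by how scarce it currently is in the pool:
    the closer a resource is to exhaustion, the more a unit of it is worth."""
    weights = {}
    for resource, capacity in pool_capacity.items():
        free_fraction = max(capacity - pool_used[resource], 1e-9) / capacity
        weights[resource] = 1.0 / free_fraction
    return weights

def fair_exchange(amount_offered, offered, wanted, weights):
    """How much of 'wanted' the offered amount is worth under the current weights."""
    return amount_offered * weights[offered] / weights[wanted]

# Example: disk I/O is scarcer than CPU, so 10% of CPU buys less than 10% of disk I/O.
capacity = {"cpu": 100.0, "disk_io": 100.0}
used = {"cpu": 50.0, "disk_io": 80.0}
w = scarcity_weights(capacity, used)
print(fair_exchange(10.0, offered="cpu", wanted="disk_io", weights=w))  # 4.0
```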
Scheduling

“For any serious VM deployment, the platform will need to give users control over the scheduling parameters and provide flexible mechanisms that allow a wide variety of resource allocation policies.” Cherkasova et al. [8]

Rosu et al. [9] examine resource re-allocation for real-time systems when the requirement for additional resource is identified. Because the systems are real-time (e.g. radar), they focus on designing a model (and metrics) to analyse the time from identification of a need to the time the application has settled back into normal operation. Their approach centralizes the resource management function rather than assigning responsibility to the application / VM. Interestingly, they distinguish between different deficit catalysts, i.e. where conditions changed (in the application workload itself or in resources being used by other applications) to create the resource deficit in the application under analysis. Their metrics are similar to the settling time and steady-state error metrics in [10].

Steere et al. [11] first proposed a dynamic feedback-driven proportion allocator which monitors each request at the CPU as it proceeds and reduces its allocation as it begins to use more than its fair share of the available time-slices. Previous efforts required the operator to provide reservation parameters. This dynamic feedback monitoring eliminates both:
- Priority-based scheduling problems such as starvation, priority inversion and lack of fine-grain allocation
- The prerequisite for the operator to ‘guess’ the required CPU reservation for the application.
Their work contributes in so far as our scheduler will monitor the VM’s MDRAs, adjusting access to the CPU accordingly.

Cherkasova et al. [12] compared the three Xen schedulers. In particular they tested performance issues (scheduler errors) relating to the contention for CPU resource between VMs. Stage et al. [13] designed a network-aware scheduling system which took different workloads on the host into account. Our work is complementary to both in that we are also interested in solutions for resource contention.

Hu et al. [14] propose a genetic algorithm to identify the optimum load balance. They include historical workload data and system variations as parameters to the algorithm, which examines the effect a variety of possible allocations will have on resource balance ahead of the actual allocation. They are motivated in part, as we are, by the effort to reduce the number of VM migrations required to find resources, i.e. the effort to optimize is performed at source as an alternative to migrating to a remote host with surplus resources.

VMWare vSphere Distributed Resource Scheduler (DRS [21]) creates cluster-based resource pools and, by continuously monitoring storage, CPU and RAM utilization, can allocate available resources automatically (if configured to do so) based on pre-defined policies that reflect business needs and priorities, i.e. SLAs. VMWare vMotion must be available and configured to enable this functionality as VM migrations (provided by vMotion) are the process which facilitates the load balancing activity. If vMotion is not available DRS will make recommendations which must then be manually performed by the operator. DRS continuously monitors resource usage but only ‘kicks in’ when 2 specific conditions are met:
1. There is a load imbalance, i.e. a host is close to maximum CPU or memory utilization while another host has available resources
2. A VM is not receiving its resource entitlement (share, reservation or limit)

Distributed Power Management (DPM) is also included in the DRS package, i.e. servers can be switched off automatically to save power at times when resource requirements are reduced, e.g. at night. VM affinity can also be configured in DRS such that all attempts are made to keep VMs (which are communicating amongst themselves on the same host) co-located. If co-location is not possible, these VMs should, at least, be attached to a common physical switch. The DRS software is a centralized analysis system which continuously monitors the available resources and balances the load within the cluster according to the individual requirements of the VMs.

Service Level Agreements (SLAs)

There are numerous resource allocation schemes for managing VMs in a DC. In particular, SLA@SOI has completed extensive research in recent years in the area of SLA-focused (e.g. CPU, memory, location, isolation, hardware redundancy level) VM allocation and re-provisioning [15]. Burchard et al. [19] developed an architecture for SLA-based resource management, concluding that runtime monitoring of resource usage data is the key to providing the most granular SLAs possible. The underlying concept is that VMs are assigned to the most appropriate hosts in the DC according to both the service level and DC objectives. Hyser et al. [16] suggest that a provisioning scheme which also includes energy constraints may choose to violate user-based SLAs ‘if the financial penalty for doing so was [sic] less than the cost of the power required to meet the agreement’. Ejarque et al. [20] designed a virtual resource management framework which is based on compliance with SLAs. The design reacts almost instantaneously to impending SLA violations and searches for the additional resources required to maintain compliance. Their work complements ours in that the system prefers to find resources locally rather than migrating the VM to another host.

Conclusion

The central question is whether or not the improvement gained in provisioning and resource management / scheduling with a barter economy is sufficient to warrant the overhead required to create the economy in the first place. The gain is worth it if the economy can be run with minimal effect on existing resources and all resources (not just those available to the VM economy) are used more efficiently than in existing systems. It is proposed that an economic (mathematical) model be initially designed to encapsulate the concepts discussed herein; simulation / testing would subsequently be performed using the Xen virtualized environment. Comparison against the existing Xen environment would test our hypothesis: that autonomous VMs running in a barter economy can manage their resources, provisioning / scheduling as efficiently (if not more so) than existing, centralized resource management systems. In doing so, we intend to show that more flexible and accurate SLAs can be defined for DC clients and that the number of migrations required to balance the resource load can also be reduced.

Appendices

Appendix A: MDRA Visualization

Changing resource requirements are multi-dimensional, i.e. they have a spatial and a temporal aspect. For example, if the resources under consideration were CPU, network and memory, a 3D parallelepiped [Figure A1] could be created from each set of vectors and the volume of each calculated using the absolute determinant of the 3 × 3 matrix. While visualization work would take place off-host due to the resources required to perform the calculations, the actual comparisons between requirements would be performed on-host using the scalar values, i.e. the vectors’ lengths (a minimal worked calculation is given after Figure A1).

Figure A1 – The truth table for possible resource variations in time visualized using a parallelepiped which encapsulates 3 parameters (CPU, network and memory).
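The volume calculation described in Appendix A can be worked through as follows; the resource vectors are arbitrary illustrative values, and the length function corresponds to the scalar comparison intended to run on-host.

```python
# Volume of the parallelepiped spanned by three resource vectors (CPU, network,
# memory), computed as the absolute determinant of the 3 x 3 matrix they form.
# The example vectors are arbitrary and purely illustrative.

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def mdra_volume(v1, v2, v3):
    """Absolute determinant = volume of the parallelepiped spanned by the vectors."""
    return abs(det3([v1, v2, v3]))

def length(v):
    """Scalar vector length, used for the lighter-weight on-host comparisons."""
    return sum(x * x for x in v) ** 0.5

# Three samples of (CPU, network, memory) requirements at different times.
v1, v2, v3 = (20.0, 5.0, 10.0), (10.0, 15.0, 5.0), (5.0, 10.0, 25.0)
print(mdra_volume(v1, v2, v3))          # off-host visualization / volume
print([length(v) for v in (v1, v2, v3)])  # on-host scalar comparison
```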
Appendix B: Xen Scheduling Algorithm

Each CPU manages a local run queue of runnable VCPUs, sorted by VCPU priority. A VCPU’s priority can be one of two values, over or under, representing whether this VCPU has or hasn’t yet exceeded its fair share of CPU resource in the ongoing accounting period. When inserting a VCPU onto a run queue, it is put after all other VCPUs of equal priority to it. As a VCPU runs, it consumes credits. Every so often, a system-wide accounting thread recomputes how many credits each active VM has earned and bumps the credits. Negative credits imply a priority of over; until a VCPU consumes its allotted credits, its priority is under. On each CPU, at every scheduling decision (when a VCPU blocks, yields, completes its time slice, or is awakened), the next VCPU to run is picked off the head of the run queue. The scheduling decision is the common path of the scheduler and is therefore designed to be lightweight and efficient; no accounting takes place in this code path. When a CPU doesn’t find a VCPU of priority under on its local run queue, it will look on other CPUs for one. This load balancing guarantees each VM receives its fair share of CPU resources system-wide. Before a CPU goes idle, it will look on other CPUs to find any runnable VCPU. This guarantees that no CPU idles when there is runnable work in the system.
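The run-queue behaviour described above might be mimicked, in much simplified form, as follows. This is an illustration of the over / under priority mechanism as described in this appendix, not the Xen credit scheduler source; the fixed time-slice cost and the absence of the system-wide accounting thread and cross-CPU load balancing are simplifications.

```python
# Much-simplified illustration of the over/under run-queue behaviour described
# in Appendix B. This is not the Xen credit scheduler source code.

from collections import deque

UNDER, OVER = 0, 1   # UNDER sorts ahead of OVER

class VCPU:
    def __init__(self, name, credits):
        self.name = name
        self.credits = credits

    @property
    def priority(self):
        return UNDER if self.credits > 0 else OVER

def insert(run_queue, vcpu):
    """Insert after all other VCPUs of equal priority (as described above)."""
    position = len(run_queue)
    for i, queued in enumerate(run_queue):
        if queued.priority > vcpu.priority:
            position = i
            break
    run_queue.insert(position, vcpu)

def schedule_once(run_queue, time_slice_cost=30):
    """Pick the head of the queue, charge it credits, and re-queue it.
    (Real accounting is done periodically by a separate thread, omitted here.)"""
    vcpu = run_queue.popleft()
    vcpu.credits -= time_slice_cost
    insert(run_queue, vcpu)
    return vcpu.name

run_queue = deque()
for v in (VCPU("dom1-vcpu0", credits=60), VCPU("dom2-vcpu0", credits=-10)):
    insert(run_queue, v)
print([schedule_once(run_queue) for _ in range(4)])
```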
Appendix C: Barter Flow Chart

References

[1] Beloglazov, A., Abawajy, J., Buyya, R. Energy-Aware Resource Allocation Heuristics for Efficient Management of Data Centers for Cloud Computing. Future Generation Computer Systems, Elsevier Science, Amsterdam, The Netherlands, 2011.
[2] X. Pu, L. Liu, Y. Mei, S. Sivathanu, Y. Koh, C. Pu. Understanding Performance Interference of I/O Workload in Virtualized Cloud Environments. In Intl. Conf. on Cloud Computing, 2010.
[3] Y. Fu, J. Chase, B. Chun, S. Schwab, A. Vahdat. SHARP: An Architecture for Secure Resource Peering. ACM SIGOPS Operating Systems Review 37 (5) (2003) 133–148.
[4] M. Feldman, K. Lai, and L. Zhang. The Proportional-Share Allocation Market for Computational Resources. IEEE Transactions on Parallel and Distributed Systems, vol. 20, pp. 1075–1088, 2009.
[5] Lai, K., Huberman, B.A., Fine, L. Tycoon: A Distributed Market-Based Resource Allocation System. Technical Report, HP Labs, Palo Alto, CA, USA, April 2004.
[6] Brent Chun and David Culler. Market-Based Proportional Resource Sharing for Clusters. Technical Report, University of California, Berkeley, September 1999.
[7] Mancinelli, F., Inverardi, P. A Resource Model for Adaptable Applications. In SEAMS ’06: Proceedings of the 2006 International Workshop on Self-Adaptation and Self-Managing Systems, pp. 9–15. ACM Press, New York, 2006.
[8] L. Cherkasova, D. Gupta, and A. Vahdat. When Virtual is Harder than Real: Resource Allocation Challenges in Virtual Machine-Based IT Environments. Technical Report HPL-2007-25, HP Laboratories Palo Alto, Feb. 2007.
[9] D. Rosu, K. Schwan, S. Yalamanchili and R. Jha. On Adaptive Resource Allocation for Complex Real-Time Applications. 18th IEEE Real-Time Systems Symposium, Dec. 1997.
[10] C. Lu, J.A. Stankovic, T. Abdelzaher, G. Tao, S.H. Son, M. Marley. Performance Specification and Metrics for Adaptive Real-Time Systems. In IEEE Real-Time Systems Symposium, Orlando, Florida, November 2000.
[11] David C. Steere, A. Goel, J. Gruenberg, D. McNamee, C. Pu, J. Walpole. A Feedback-Driven Proportion Allocator for Real-Rate Scheduling. 3rd Symposium on Operating Systems Design and Implementation, Feb. 1999.
[12] Cherkasova, L., Gupta, D., Vahdat, A. Comparison of the Three CPU Schedulers in Xen. SIGMETRICS Perform. Eval. Rev., ACM, 2007, 35, 42–51.
[13] A. Stage and T. Setzer. Network-Aware Migration Control and Scheduling of Differentiated Virtual Machine Workloads. In Proc. CLOUD ’09, IEEE Computer Society, 2009, pp. 9–14.
[14] Jinhua Hu, Jianhua Gu, Guofei Sun, and Tianhai Zhao. A Scheduling Strategy on Load Balancing of Virtual Machine Resources in Cloud Computing Environment. In 3rd International Symposium on Parallel Architectures, Algorithms and Programming, pp. 89–96.
[15] C. Hyser, B. McKee, R. Gardner, and B. Watson. Autonomic Virtual Machine Placement in the Data Center. Technical Report HPL-2007-189, HP Laboratories, Feb. 2008.
[16] F. Checconi, T. Cucinotta, and M. Stein. Real-Time Issues in Live Migration of Virtual Machines. In Euro-Par 2009 – Parallel Processing Workshops, pages 454–466. Springer, 2010.
[17] T. Wood, L. Cherkasova, K. M. Ozonat, and P. J. Shenoy. Profiling and Modelling Resource Usage of Virtualized Applications. In Valérie Issarny and Richard E. Schantz, editors, Middleware, volume 5346 of Lecture Notes in Computer Science, pages 366–387. Springer, 2008.
[18] Wood, Timothy; Cherkasova, Ludmila; Ozonat, Kivanc; Shenoy, Prashant. Predicting Application Resource Requirements in Virtual Environments. HP Laboratories, Technical Report HPL-2008-122, 2008.
[19] L.-O. Burchard, M. Hovestadt, O. Kao, A. Keller, and B. Linnert. The Virtual Resource Manager: An Architecture for SLA-Aware Resource Management. Proc. Fourth IEEE/ACM Int’l Symp. Cluster Computing and the Grid (CCGrid ’04), 2004.
[20] J. Ejarque, M. de Palol, Í. Goiri, F. Julià, J. Guitart, R. Badia, and J. Torres. SLA-Driven Semantically-Enhanced Dynamic Resource Allocator for Virtualized Service Providers. In Proceedings of the 2008 Fourth IEEE International Conference on eScience, IEEE Computer Society, Washington, DC, USA, 2008, pp. 8–15.
[21] VMWare. What’s New in VMware vSphere 5.1 – Technical White Paper V2.0, June 2012.