PhD Hypothesis
Mark White
College of Engineering and Informatics
NUI Galway, Wednesday 24th October 2012
A Resource Bartering Economy for Autonomous Virtual Machines
Executive Statement
Our proposal centers on the concept of a Barter Economy for Virtual Machines (VMs). The
primary objective is to trade surplus resource entitlements between VMs to:

• Optimize utilization of VM resources in a Resource Pool (RP)
• Define more flexible SLAs (per-VM and system-wide)
• Reduce VM migrations
Bartering will ensure that RP resources which would otherwise be idle, due to surplus (unused) entitlements, are put to productive use.
For the purposes of this research the RP bounds our system i.e. a number of VMs share a
local and finite set of hardware resources. The resource pool is a limiting factor in so far as
the VMs within it would have been initially provisioned for reasons beyond the scope of this
research (e.g. storage proximity, geographical proximity to application requests, High
Availability (HA), High Performance Cluster Computing (HPCC)). To look beyond the resource pool would involve factoring in these provisioning decisions. Solutions would typically involve migration to hosts beyond the boundaries of the resource pool, clashing with the original provisioning decisions. We consider it infeasible to have a VM
using remote resources i.e. traversing networks to access the hardware resources it requires.
Rather, we focus on optimizing the existing resources within the confines of a RP - using a
VM bartering economy to balance utilization.
An Example:
VM1 needs more Disk I/O. It is holding an entitlement to 20% CPU but only using 10%. It
may trade its surplus 10% CPU entitlement with VM2 which has a surplus of Disk I/O but
needs more CPU. VM1 and VM2 are complementary.
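A minimal sketch of this exchange is given below, assuming a simple per-resource entitlement / usage record for each VM; the VM class and trade_surplus helper are illustrative names, not part of any hypervisor API.

```python
# Minimal sketch of a two-party barter between complementary VMs.
# All names and numbers are illustrative assumptions.

class VM:
    def __init__(self, name, entitlement, usage):
        self.name = name
        self.entitlement = dict(entitlement)  # e.g. {"cpu": 20, "disk_io": 30} (% of pool)
        self.usage = dict(usage)              # current consumption, same units

    def surplus(self, resource):
        """Entitlement held but not currently used."""
        return max(self.entitlement[resource] - self.usage[resource], 0)


def trade_surplus(buyer, seller, wanted, offered, amount_wanted, amount_offered):
    """Buyer gives up `amount_offered` of `offered` in exchange for
    `amount_wanted` of `wanted`, bounded by each party's surplus."""
    give = min(amount_offered, buyer.surplus(offered))
    take = min(amount_wanted, seller.surplus(wanted))
    buyer.entitlement[offered] -= give
    seller.entitlement[offered] += give
    seller.entitlement[wanted] -= take
    buyer.entitlement[wanted] += take
    return give, take


if __name__ == "__main__":
    vm1 = VM("VM1", {"cpu": 20, "disk_io": 10}, {"cpu": 10, "disk_io": 10})
    vm2 = VM("VM2", {"cpu": 10, "disk_io": 30}, {"cpu": 10, "disk_io": 15})
    trade_surplus(vm1, vm2, wanted="disk_io", offered="cpu",
                  amount_wanted=10, amount_offered=10)
    print(vm1.entitlement, vm2.entitlement)
```

In this illustration VM1 ends up holding the Disk I/O entitlement it needs while VM2 gains the CPU entitlement it lacked.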
Complementary VMs
Complementary VMs are those that require different resources at the same time to service
their workloads, resulting in optimization of all available resources in a pool. Bartering
involves identification of complementary VMs with which to trade.
To identify complementary VMs a ‘buyer’ VM might use an agent to search by requesting
the other VMs (‘sellers’) to declare their resource entitlements. Once a potential ‘seller’ has
been found, the negotiation process begins, as in the example given above. An arbitrator may be provided if required. If more than one ‘seller’ is found, then additional parameters may be required to resolve which VM to trade with, or the ‘buyer’ VM may trade with all suitable VMs to acquire the resources it needs.
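One possible (assumed) realisation of the ‘buyer’ search is a query over the declared entitlements of the other VMs in the pool, returning the sellers whose surplus matches the buyer’s deficit; the find_sellers helper and the dictionary layout below are illustrative assumptions, not a prescribed protocol.

```python
# Illustrative search for complementary 'seller' VMs; data structures are assumptions.

def find_sellers(buyer, sellers, wanted, offered):
    """Return sellers that hold a surplus of `wanted` and report a need for `offered`.

    Each VM is a dict: {"name", "entitlement": {...}, "usage": {...}, "needs": [...]}.
    """
    def surplus(vm, res):
        return vm["entitlement"].get(res, 0) - vm["usage"].get(res, 0)

    matches = []
    for vm in sellers:
        if surplus(vm, wanted) > 0 and offered in vm["needs"]:
            matches.append((vm["name"], surplus(vm, wanted)))
    # Sort by largest surplus first; further tie-breaking parameters could be added here.
    return sorted(matches, key=lambda m: m[1], reverse=True)


if __name__ == "__main__":
    buyer = {"name": "VM1", "entitlement": {"cpu": 20}, "usage": {"cpu": 10}, "needs": ["disk_io"]}
    pool = [
        {"name": "VM2", "entitlement": {"disk_io": 30}, "usage": {"disk_io": 15}, "needs": ["cpu"]},
        {"name": "VM3", "entitlement": {"disk_io": 10}, "usage": {"disk_io": 10}, "needs": ["memory"]},
    ]
    print(find_sellers(buyer, pool, wanted="disk_io", offered="cpu"))  # VM2 only
```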
Beloglazov et al. [1] state that over-provisioning may occur if the potential for resource
sharing by co-located VMs with different workload patterns is not taken into account. They
show that if workloads with similar resource usage requirements are co-located there may be
greater potential for bottlenecks to occur (e.g. increased competition for disk among disk-bound (I/O) applications), whereas workloads with different resource requirements may be able to exist on the same server with less contention i.e. the workloads are complementary.
Pu et al. [2] have shown that co-location of CPU-intensive and network-intensive workloads incurs the least resource contention, delivering higher aggregate performance. In a utopian system, each VM would simultaneously be using a different resource, to the point where each resource was fully utilized but none was competed for. Although this is
perhaps an unrealistic situation, the degree to which co-located workloads are complementary
is a key performance issue when scheduling.
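To make ‘complementary’ measurable, one assumed heuristic (not taken from [1] or [2]) is to treat each workload’s mean per-resource usage as a vector and score a pair of VMs by how little their usage overlaps; lower overlap suggests less contention when co-located.

```python
# A simple, assumed complementarity heuristic: the less two workloads' per-resource
# demands overlap, the more complementary (less contentious) they are.

def complementarity(usage_a, usage_b):
    """usage_a/usage_b: dicts of resource -> fraction of host capacity (0..1).
    Returns a score in [0, 1]; 1.0 means no resource is demanded by both VMs."""
    resources = set(usage_a) | set(usage_b)
    overlap = sum(min(usage_a.get(r, 0.0), usage_b.get(r, 0.0)) for r in resources)
    demand = sum(max(usage_a.get(r, 0.0), usage_b.get(r, 0.0)) for r in resources)
    return 1.0 - overlap / demand if demand else 1.0


if __name__ == "__main__":
    cpu_bound = {"cpu": 0.8, "net": 0.1, "disk": 0.1}
    net_bound = {"cpu": 0.1, "net": 0.8, "disk": 0.1}
    disk_bound = {"cpu": 0.1, "net": 0.1, "disk": 0.8}
    print(complementarity(cpu_bound, net_bound))    # high score: complementary pair
    print(complementarity(disk_bound, disk_bound))  # low score: both contend for disk
```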
Multi-Dimensional Resource Allocation (MDRA)
We suggest that a typical outcome of a trade would be a Multi-Dimensional (temporal and
spatial e.g. moving average) Resource Allocation (MDRA). The effect of the temporal aspect
of the allocation is to ‘smooth’ periods of high and low usage over time, theoretically extending the interval between re-negotiations for more resources. The MDRA would form the
basis for definition of a VM’s SLAs i.e. continuous monitoring of its own SLAs would
identify the need for more resources and induce a subsequent trade.
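A minimal sketch of one possible MDRA follows, assuming a per-resource moving average over recent usage samples (the temporal dimension) maintained across several resources at once (the spatial dimension); the window length and headroom factor are illustrative assumptions.

```python
# Illustrative MDRA: a per-resource moving average over recent usage samples.
from collections import defaultdict, deque

class MDRA:
    def __init__(self, window=12, headroom=1.1):
        self.window = window          # number of samples smoothed (temporal dimension)
        self.headroom = headroom      # assumed safety margin on top of the average
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, resource, value):
        """Add one usage sample (e.g. % of pool capacity) for a resource."""
        self.samples[resource].append(value)

    def allocation(self, resource):
        """Smoothed entitlement for one resource: moving average plus headroom."""
        window = self.samples[resource]
        if not window:
            return 0.0
        return (sum(window) / len(window)) * self.headroom

    def snapshot(self):
        """The multi-dimensional allocation across all monitored resources."""
        return {r: self.allocation(r) for r in self.samples}


if __name__ == "__main__":
    mdra = MDRA(window=4)
    for cpu, disk in [(10, 40), (30, 20), (20, 30), (20, 30)]:
        mdra.record("cpu", cpu)
        mdra.record("disk_io", disk)
    print(mdra.snapshot())  # e.g. {'cpu': 22.0, 'disk_io': 33.0}
```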
Scheduling
The CPU scheduler for Virtual Machines (VMs) places requests on the queue according to a
priority parameter. Requests with a higher priority (i.e. closer to 0) will be granted CPU
access before those with lower priorities although, if backfilling is enabled, a lower priority
request may ‘skip’ onto the CPU while the higher priority request is awaiting some external
input. Scheduling priorities are calculated differently in VMWare and Xen.
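Purely to illustrate the queueing behaviour just described (lower priority values dispatched first, with backfilling allowing a runnable lower-priority request to overtake a blocked higher-priority one), the sketch below uses a plain sorted list; it is an assumption-laden illustration, not the VMWare or Xen implementation.

```python
# Simplified run-queue sketch: lower priority value = dispatched first; with
# backfilling, a runnable request may overtake a blocked, higher-priority one.
# Illustration of the behaviour described in the text, not a real scheduler.

def pick_next(queue, backfill=True):
    """queue: list of dicts {"vm", "priority", "blocked"}; priority 0 is highest."""
    for request in sorted(queue, key=lambda r: r["priority"]):
        if not request["blocked"]:
            return request["vm"]
        if not backfill:
            return None  # head of queue awaits external input and may not be skipped
    return None


if __name__ == "__main__":
    queue = [
        {"vm": "VM1", "priority": 0, "blocked": True},   # waiting on I/O
        {"vm": "VM2", "priority": 5, "blocked": False},
    ]
    print(pick_next(queue, backfill=True))   # VM2 skips ahead
    print(pick_next(queue, backfill=False))  # None: the CPU waits for VM1
```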
In VMWare, VMs are assigned a share, limit and reservation when initially provisioned. It is
these parameters which form the basis of VMWare priorities and ‘entitlements’. The
VMWare scheduler uses the priority to place the vCPU on the queue and monitors the
entitlement as the vCPU request is being processed – dynamically altering the priority as the
vCPU uses up an increasing percentage of its entitlement.
In Xen, VMs are assigned a weight and a cap. These are the parameters required to calculate
a Xen priority. As the VM request is processed the scheduler keeps track of whether the VM
is ‘over’ or ‘under’ its entitlement and places / replaces the request on the queue accordingly.
In both VMWare and Xen, the base parameters required to calculate a priority / entitlement
are statically set by the operator when the VM is first provisioned and cannot be changed
without operator intervention. It is our contention that we can optimize scheduling by
increasing the accuracy of VM entitlements (and subsequent priorities read by the scheduler).
Calculation of the priorities on the scheduler would be based on the MDRAs traded between
two VMs rather than the existing VMWare (share, reservation & limit) or Xen (weight &
cap) parameters. Our MDRAs would react to changing conditions, redefining resource
entitlements each time two VMs perform a trade. This both eliminates the need for operator
intervention and optimizes resource utilization. In effect we propose replacement of the
existing static parameters with dynamic MDRAs. Changes to the scheduler may be required
so that entitlements (calculated based on the VM’s MDRA) can be monitored while a VM
process is consuming CPU or some other resource.
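As a minimal sketch of this replacement, assuming a hypothetical mapping from the MDRA to a numeric priority (the ratio of consumption to the dynamic entitlement), the code below shows the scheduler reading its entitlement from the MDRA rather than from static parameters. The formula is illustrative only and is not drawn from VMWare or Xen.

```python
# Illustrative mapping from a dynamic MDRA to a scheduler priority.
# Lower value = higher priority; the formula is an assumption, not VMWare's or Xen's.

def entitlement_from_mdra(mdra, resource):
    """The MDRA (resource -> smoothed allocation) is the entitlement the scheduler reads."""
    return mdra.get(resource, 0.0)

def priority(consumed, mdra, resource):
    """Priority rises (numerically) as the VM consumes more of its dynamic entitlement."""
    entitlement = entitlement_from_mdra(mdra, resource)
    if entitlement <= 0:
        return float("inf")          # no entitlement: schedule last
    return consumed / entitlement    # <1.0 under entitlement, >1.0 over entitlement


if __name__ == "__main__":
    mdra_vm1 = {"cpu": 22.0, "disk_io": 33.0}   # produced by a trade / smoothing step
    print(priority(consumed=11.0, mdra=mdra_vm1, resource="cpu"))  # 0.5 -> runs sooner
    print(priority(consumed=30.0, mdra=mdra_vm1, resource="cpu"))  # ~1.36 -> runs later
```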
Decentralization
In our barter economy each VM is autonomous - managing its own resource requirements
and only trading as and when it identifies an additional need for resources. This eliminates centralized (and continuous) monitoring of RP resources. While some ‘agent’ may be
deployed to search the RP for complementary VMs with which to trade, this agent would
only need to be deployed on a per-trade basis. Another agent may also be required to arbitrate
the trade but again, only on a per-trade basis.
‘Live’ Trading
In the same way as VM migrations can be performed on a ‘live’ basis, the ideal would be that
VMs do not have to pause / stop while trading their resources.
Process Flow
The resource allocation and scheduling processes are intrinsically related but considered as
separate entities for the purposes of our research.
VM entitlement (calculated based on the MDRA) will be read by the scheduler and VM
requests positioned on the physical CPU (pCPU) queue accordingly. At the highest level, the
full VM operation will be a continuous 3-step process (see the sketch after the list):
1. Allocation (Search, Trade, MDRA, Calculation of VM entitlement for the scheduler)
2. Scheduling
3. SLA Monitoring – to identify the need for more resource. SLAs will be defined using
the MDRA as a basis
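The cycle could be driven by a loop of the following shape; search_and_trade, schedule and sla_breached are placeholder names standing in for the components described earlier, and the whole structure is a sketch rather than an implementation.

```python
# Sketch of the continuous Allocation -> Scheduling -> SLA-monitoring cycle.
# All helper names are placeholders for components described in the text.
import time

def run_vm_lifecycle(vm, steps=3, interval=1.0):
    mdra = {"cpu": 20.0, "disk_io": 20.0}          # starting allocation (illustrative)
    for _ in range(steps):
        # 1. Allocation: search for a complementary VM and trade if an extra need exists.
        if sla_breached(vm, mdra):
            mdra = search_and_trade(vm, mdra)

        # 2. Scheduling: entitlements derived from the MDRA are handed to the scheduler.
        schedule(vm, mdra)

        # 3. SLA monitoring: usage is observed against SLAs defined from the MDRA.
        time.sleep(interval)

# --- placeholder components -------------------------------------------------
def sla_breached(vm, mdra):
    return vm["usage"]["disk_io"] > mdra["disk_io"]

def search_and_trade(vm, mdra):
    traded = dict(mdra)
    traded["cpu"] -= 10.0       # give away surplus CPU entitlement
    traded["disk_io"] += 10.0   # receive disk I/O entitlement in exchange
    return traded

def schedule(vm, mdra):
    print(f"{vm['name']}: scheduling with entitlements {mdra}")

if __name__ == "__main__":
    run_vm_lifecycle({"name": "VM1", "usage": {"cpu": 10.0, "disk_io": 25.0}},
                     steps=2, interval=0.1)
```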
Related Work
Bartering
In 2003 Fu et al. [3] designed SHARP, a framework for secure, distributed resource
management using a bartering system. They specifically examined the need for security in a
system which spanned trust domains (i.e. external networks). Our work is constrained to a local resource pool, eliminating the need for similar trust agreements.
Feldman et al. [4] initially proposed a system where VMs receive a share of resources in
proportion to a bid they make during an auction. The ‘auction’ takes place intermittently
(with a regular time interval). This system later became Tycoon [5]. Our work is
differentiated in that our research is not auction-based i.e. VMs dictate the interval between
trades based on an identified need for additional resource. Theoretically the interval in our
system is extended by the increased accuracy of the MDRA which results from a trade. In
addition, an auction system assumes knowledge of ALL available resources (i.e. a one-VM-to-many-resources relationship), whereas the relationship in our system is one-to-one, each VM only needing to identify resources belonging to another VM.
Most interesting in the work of Chun and Culler [6] is the expression, translation and enforcement of a resource’s value, both to a particular VM and to the system as a whole.
Without a normalising currency (e.g. £, €, $) in a barter economy, the valuation method by which x units of network are traded for y units of CPU becomes critical. To successfully trade, a value must
be assigned to both resources being traded by the VMs while also accounting for system-wide
supply and demand. We agree with Chun and Culler where they propose that the cluster / RP is the
optimal environment in which to deploy a bartering economy because prior provisioning
decisions have already placed the VMs in that particular configuration.
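To illustrate the valuation problem, one assumed approach (not taken from [6]) derives an implicit exchange rate from pool-wide scarcity: a resource in short supply is worth more units of an abundant one.

```python
# Assumed scarcity-based valuation: without a currency, the exchange rate between
# two resources is derived from pool-wide supply and demand.

def scarcity(demand, supply):
    """Fraction of pool capacity currently demanded; higher = scarcer = more valuable."""
    return demand / supply if supply else float("inf")

def exchange_rate(pool, give, receive):
    """How many units of `give` buy one unit of `receive` under this valuation."""
    return (scarcity(pool[receive]["demand"], pool[receive]["supply"])
            / scarcity(pool[give]["demand"], pool[give]["supply"]))


if __name__ == "__main__":
    pool = {
        "cpu":     {"supply": 100.0, "demand": 40.0},   # plentiful
        "disk_io": {"supply": 100.0, "demand": 80.0},   # scarce
    }
    # Disk I/O is twice as scarce as CPU, so 1 unit of disk I/O costs 2 units of CPU.
    print(exchange_rate(pool, give="cpu", receive="disk_io"))  # 2.0
```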
Mancinelli et al. [7] modelled the resource characteristics of an application / VM and
execution (host) environments so that reasoning could be performed on ‘compatibility’ and
‘goodness’. Compatibility may be viewed for our purposes as the notion that sufficient
resources are available on the host to supply the VM’s demand while goodness examines the
best way to adapt the available resources to suit the application being serviced by the VM –
i.e. definition of our MDRA.
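Interpreting the two notions for our purposes, compatibility could be a boolean check that the host can cover the VM’s demand, and goodness a measure of how tightly a candidate allocation (e.g. an MDRA) fits that demand; the sketch below is an assumed reading, not Mancinelli et al.’s model.

```python
# Assumed interpretation of 'compatibility' and 'goodness' for our purposes.

def compatible(host_free, vm_demand):
    """Compatibility: the host has enough free capacity for every resource demanded."""
    return all(host_free.get(r, 0.0) >= need for r, need in vm_demand.items())

def goodness(allocation, vm_demand):
    """Goodness: how tightly a candidate allocation (e.g. an MDRA) fits the demand.
    0 = perfect fit; larger values mean more waste or shortfall."""
    resources = set(allocation) | set(vm_demand)
    return sum(abs(allocation.get(r, 0.0) - vm_demand.get(r, 0.0)) for r in resources)


if __name__ == "__main__":
    host_free = {"cpu": 30.0, "disk_io": 20.0}
    demand = {"cpu": 15.0, "disk_io": 10.0}
    print(compatible(host_free, demand))                     # True
    print(goodness({"cpu": 20.0, "disk_io": 10.0}, demand))  # 5.0 (some CPU waste)
```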
Wood et al. [17, 18] designed a model which predicts the virtualization overhead required by
a host and the likely resources required by an application being provisioned on that host.
They close the paper expressing the intention to investigate:
1. “How these modelling techniques can be used to predict the aggregate resource requirements of virtual machines co-located on a single host and to
2. Determine when an application’s resource requirements are likely to exceed the virtual system’s capacity”.
Their work is useful in that we wish to:
a) Facilitate a new VM requesting access to a RP by trading with the existing VMs for
the resources it requires, ideally resulting in complementary VMs being co-located
b) Include continuous monitoring of SLA compliance - analogous to (2) above
Scheduling
“For any serious VM deployment, the platform will need to give users control over the
scheduling parameters and provide flexible mechanisms that allow a wide variety of resource
allocation policies.” Cherkasova et al. [8]
Rosu et al. [9] examine resource re-allocation for real time systems when the requirement for
additional resource is identified. Due to the systems being real-time (e.g. radar) they focus on
designing a model (and metrics) to analyse the time from identification of a need to the time
the application has settled back into normal operation. Their approach centralizes the
resource management function rather than assigning responsibility to the application / VM.
Interestingly, they distinguish between different deficit catalysts i.e. where conditions
changed (in the application workload itself or in resources being used by other applications)
to create the resource deficit in the application under analysis. Their metrics are similar to the
settling time and steady-state error metrics in [10].
Steere et al. [11] first proposed a dynamic feedback-driven proportion allocator which
monitors each request at the CPU as it proceeds and reduces its allocation as it begins to use
more than its fair share of the available time-slices. Previous efforts required the operator to
provide reservation parameters. This dynamic feedback monitoring eliminates both:
• priority-based scheduling problems such as starvation, priority inversion, and lack of fine-grain allocation
• the prerequisite for the operator to ‘guess’ the required CPU reservation for the application.
Their work contributes in so far as our scheduler will monitor the VM’s MDRAs – adjusting
access to the CPU accordingly.
Cherkasova et al. [12] compared the three Xen schedulers. In particular they tested
performance (scheduler errors) issues relating to the contention for CPU resource between
VMs. Stage et al. [13] designed a network-aware scheduling system which took different
workloads on the host into account. Our work is complementary to both in that we are also
interested in solutions for resource contention.
Hu et al. [14] propose a genetic algorithm to identify the optimum load balance. They include historical workload data and system variations as parameters to the algorithm, which examines the effect a variety of possible allocations will have on resource balance ahead of
the actual allocation. They are motivated in part, as we are, by the effort to reduce the number
of VM migrations required to find resources i.e. the effort to optimize is performed at source
as an alternative to migrating to a remote host with surplus resources.
VMWare vSphere Distributed Resources Scheduler (DRS [21]) creates cluster-based
resource pools and by continuously monitoring storage, CPU and RAM utilization, can
allocate available resources automatically (if configured to do so) based on pre-defined
policies that reflect business needs and priorities i.e. SLAs. VMWare vMotion must be available and configured to enable this functionality, as VM migrations (provided by vMotion) are the mechanism which facilitates the load-balancing activity. If vMotion is not available, DRS will make recommendations which must then be manually performed by the operator. DRS continuously monitors resource usage but only ‘kicks in’ when two specific conditions are met (a simplified check is sketched after the list):
1. There is a load imbalance i.e. a host is close to maximum CPU or memory utilization
while another host has available resources
2. A VM is not receiving its resource entitlement (share, reservation or limit)
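Purely to illustrate the two triggers, the simplified check below flags a rebalance when either condition holds; the thresholds and field names are assumptions and do not reflect VMWare’s actual DRS algorithm or configuration.

```python
# Illustrative check of the two DRS trigger conditions listed above.
# Thresholds and field names are assumptions, not VMWare's actual algorithm.

def should_rebalance(hosts, vms, hot=0.9, cool=0.6):
    """hosts: name -> {"cpu", "mem"} utilisation fractions.
    vms: list of {"name", "entitled", "received"} resource amounts."""
    # Condition 1: load imbalance, one host near saturation while another has headroom.
    utilisations = [max(h["cpu"], h["mem"]) for h in hosts.values()]
    imbalance = max(utilisations) >= hot and min(utilisations) <= cool

    # Condition 2: a VM is not receiving its configured entitlement.
    starved = any(vm["received"] < vm["entitled"] for vm in vms)

    return imbalance or starved


if __name__ == "__main__":
    hosts = {"hostA": {"cpu": 0.95, "mem": 0.7}, "hostB": {"cpu": 0.3, "mem": 0.4}}
    vms = [{"name": "VM1", "entitled": 20.0, "received": 20.0}]
    print(should_rebalance(hosts, vms))  # True: hostA is hot while hostB has headroom
```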
Distributed Power Management (DPM) is also included in the DRS package i.e. servers can
be switched off automatically to save power at times when resource requirements are reduced
e.g. at night. VM affinity can also be configured in DRS such that all attempts are made to
keep VMs (which are communicating amongst themselves on the same host) co-located. If
co-location is not possible, these VMs should, at least, be attached to a common physical
switch.
The DRS software is a centralized analysis system which continuously monitors the available
resources and balances the load within the cluster according to the individual requirements of
the VMs.
Service Level Agreements (SLAs)
There are numerous resource allocation schemes for managing VMs in a DC. In particular, SLA@SOI
has completed extensive research in recent years in the area of SLA-focused (e.g. CPU, memory,
location, isolation, hardware redundancy level) VM allocation and re-provisioning [15].
Burchard et al. [19] developed an architecture for SLA-based resource management,
concluding that runtime monitoring of resource usage data is the key to providing the most
granular SLAs possible. The underlying concept is that VMs are assigned to the most
appropriate hosts in the DC according to both the service level and DC objectives. Hyser et al. [16] suggest that a provisioning scheme which also includes energy constraints may choose to violate user-based SLAs ‘if the financial penalty for doing so was less than the cost of the power required to meet the agreement’.
Ejarque et al. [20] designed a virtual resource management framework which is based on
compliance with SLAs. The design reacts almost instantaneously to impending SLA
violations and searches for the additional resources required to maintain compliance. Their
work complements ours in that the system prefers to find resources locally rather than
migrating the VM to another host.
Conclusion
The central question is whether the improvement gained in provisioning and resource management / scheduling with a barter economy is sufficient to warrant the overhead required to create the economy in the first place. The gain is worth it if the economy can be
run with minimal effect on existing resources and all resources (not just those available to the
VM economy) are used more efficiently than in existing systems.
It is proposed that an economic (mathematical) model be initially designed to encapsulate the concepts discussed herein, with simulation / testing subsequently performed using the Xen virtualized environment. Comparison with the existing Xen environment would test our hypothesis: that autonomous VMs running in a barter economy can manage their resources, provisioning and scheduling at least as efficiently as existing, centralized resource
management systems. In doing so, we intend to show that more flexible and accurate SLAs
can be defined for DC clients and the number of migrations required to balance the resource
load can also be reduced.
Appendices
Appendix A: MDRA Visualization
Changing resource requirements are multi-dimensional i.e. they have a spatial and a temporal
aspect. For example, if the resources under consideration were CPU, network and memory, a
3D parallelepiped [Figure A1] could be created from the resource vectors and its volume calculated using the absolute value of the determinant of the 3 × 3 matrix. While visualization work would take place off-host, due to the resources required to perform the calculations, the actual comparisons between requirements would be performed on-host using the scalar values i.e. the vectors’ lengths.
Figure A1 – The truth table for possible resource variations in time visualized using a parallelepiped which
encapsulates 3 parameters (CPU, network and memory).
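A brief numerical sketch of the calculation described above: the off-host volume via the absolute determinant of the 3 × 3 matrix of resource vectors, and the cheaper on-host comparison via the vectors’ lengths. The vectors themselves are arbitrary illustrative values.

```python
# Sketch of the Appendix A calculation: parallelepiped volume off-host via the
# absolute determinant, and the cheaper on-host comparison via vector lengths.
# The vectors below are arbitrary illustrative values (CPU, network, memory).
import numpy as np

requirement = np.array([
    [20.0,  5.0,  2.0],   # CPU-dominated sample
    [ 3.0, 25.0,  4.0],   # network-dominated sample
    [ 2.0,  6.0, 30.0],   # memory-dominated sample
])

# Off-host: volume of the parallelepiped spanned by the three resource vectors.
volume = abs(np.linalg.det(requirement))

# On-host: compare requirements using only the scalar lengths of the vectors.
lengths = np.linalg.norm(requirement, axis=1)

print(f"volume  = {volume:.1f}")
print(f"lengths = {np.round(lengths, 1)}")
```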
Appendix B: Xen Scheduling Algorithm
Each CPU manages a local run queue of runnable VCPUs. This queue is sorted by VCPU
priority. A VCPU's priority can be one of two values: over or under, representing whether this VCPU has or hasn't yet exceeded its fair share of CPU resource in the ongoing
accounting period. When inserting a VCPU onto a run queue, it is put after all other VCPUs
of equal priority to it.
As a VCPU runs, it consumes credits. Every so often, a system-wide accounting thread recomputes how many credits each active VM has earned and bumps the credits. Negative
credits imply a priority of over. Until a VCPU consumes its allotted credits, its priority is
under.
On each CPU, at every scheduling decision (when a VCPU blocks, yields, completes its time slice, or is awakened), the next VCPU to run is picked off the head of the run queue. The scheduling decision is the common path of the scheduler and is therefore designed to be lightweight and efficient. No accounting takes place in this code path.
When a CPU doesn't find a VCPU of priority under on its local run queue, it will look on
other CPUs for one. This load balancing guarantees each VM receives its fair share of CPU
resources system-wide. Before a CPU goes idle, it will look on other CPUs to find any
runnable VCPU. This guarantees that no CPU idles when there is runnable work in the
system.
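The simplified sketch below captures the behaviour described in this appendix (an under/over priority derived from the credit balance, insertion after equals within a priority class, and picking the head of the queue); it omits SMP load balancing and the real accounting arithmetic, so it illustrates the description rather than Xen’s actual credit scheduler code.

```python
# Simplified illustration of the credit-scheduler behaviour described above:
# UNDER/OVER priority from the credit balance, FIFO within a priority class,
# head-of-queue pick. Not Xen's actual implementation (no SMP balancing,
# no real accounting maths).

UNDER, OVER = 0, 1   # lower value sorts first

class VCPU:
    def __init__(self, name, credits):
        self.name = name
        self.credits = credits

    @property
    def priority(self):
        return UNDER if self.credits > 0 else OVER   # negative credits imply OVER

class RunQueue:
    def __init__(self):
        self.queue = []

    def insert(self, vcpu):
        # Insert after all other VCPUs of equal priority (FIFO within a priority class).
        index = len(self.queue)
        for i, other in enumerate(self.queue):
            if other.priority > vcpu.priority:
                index = i
                break
        self.queue.insert(index, vcpu)

    def pick_next(self):
        # The scheduling decision: take the head of the run queue (no accounting here).
        return self.queue.pop(0) if self.queue else None


if __name__ == "__main__":
    rq = RunQueue()
    for vcpu in [VCPU("dom1-v0", credits=-30), VCPU("dom2-v0", credits=50),
                 VCPU("dom3-v0", credits=10)]:
        rq.insert(vcpu)
    running = rq.pick_next()
    print(running.name)                # dom2-v0 (first UNDER VCPU inserted)
    running.credits -= 100             # consumes credits while running
    rq.insert(running)                 # re-queued as OVER once credits go negative
    print([v.name for v in rq.queue])  # ['dom3-v0', 'dom1-v0', 'dom2-v0']
```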
Appendix C: Barter Flow Chart
References
[1] Beloglazov, A., Abawajy J., Buyya, R. 2011. Energy-Aware Resource Allocation
Heuristics for Efficient Management of Data Centers for Cloud Computing. Future
Generation Computer Systems. Elsevier Science. Amsterdam. The Netherlands.
[2] X. Pu, L. Liu, Y. Mei, S. Sivathanu, Y. Koh, C. Pu. Understanding Performance
Interference of I/O Workload in Virtualized Cloud Environments. In Intl. Conf. on Cloud
Computing, 2010.
[3] Y. Fu, J. Chase, B. Chun, S. Schwab, A. Vahdat, SHARP: An Architecture for Secure
Resource Peering, ACM SIGOPS Operating Systems Review 37 (5) (2003) 133–148.
[4] M. Feldman, K. Lai, and L. Zhang. The Proportional-Share Allocation Market for
Computational Resources. IEEE Transactions on Parallel and Distributed Systems, vol. 20,
pp. 1075–1088, 2009.
[5] Lai, K., Huberman, B.A., Fine, L. Tycoon: A Distributed Market-Based Resource
Allocation System. Technical Report, HP Labs, Palo Alto, CA, USA, April 2004.
[6] Brent Chun and David Culler, Market-Based Proportional Resource Sharing for Clusters,
Technical Report, University of California, Berkeley, September 1999.
[7] Mancinelli, F., Inverardi, P.: A resource model for adaptable applications. In: SEAMS
’06. Proceedings of the 2006 international workshop on Self-adaptation and self-managing
systems, pp. 9–15. ACM Press, New York (2006)
[8] L. Cherkasova, D. Gupta, and A. Vahdat. When Virtual is Harder than Real: Resource Allocation Challenges in Virtual Machine-Based IT Environments. Technical Report HPL-2007-25, HP Laboratories Palo Alto, Feb. 2007.
[9] D. Rosu, K. Schwan, S. Yalamanchili and R. Jha. On Adaptive Resource Allocation for
Complex Real-Time Applications. 18th IEEE Real-Time Systems Symposium, Dec., 1997.
[10] C. Lu, J. A. Stankovic, T. Abdelzaher, G. Tao, S. H. Son, M. Marley. Performance
Specification and Metrics for Adaptive Real-Time Systems. In IEEE Real-Time Systems
Symposium, Orlando, Florida, November 2000.
[11] David C. Steere, A. Goel, J. Gruenberg, D. McNamee, C. Pu, J. Walpole. A Feedback-Driven Proportion Allocator for Real-Rate Scheduling. 3rd Symposium on Operating Systems
Design and Implementation, Feb 1999.
[12] Cherkasova, L.; Gupta, D. & Vahdat, A. Comparison of the Three CPU Schedulers in
Xen, SIGMETRICS Perform. Eval. Rev., ACM, 2007, 35, 42-51.
[13] A. Stage and T. Setzer, Network-Aware Migration Control and Scheduling of
Differentiated Virtual Machine Workloads, in Proc. CLOUD’09. IEEE Computer Society,
2009, pp. 9–14.
[14] Jinhua Hu, Jianhua Gu, Guofei Sun, and Tianhai Zhao. A Scheduling Strategy on Load
Balancing of Virtual Machine Resources in Cloud Computing Environment. In 3rd
International Symposium on Parallel Architectures, Algorithms and Programming, pp. 89-96.
[15] C. Hyser, B. McKee, R. Gardner, and B. Watson. Autonomic Virtual Machine
Placement in the Data Center. Technical Report HPL-2007-189, HP Laboratories, Feb. 2008.
[16] F. Checconi, T. Cucinotta, and M. Stein. Real-Time Issues in Live Migration of Virtual
Machines. In Euro-Par 2009–Parallel Processing Workshops, pages 454–466. Springer, 2010.
[17] T. Wood, L. Cherkasova, K. M. Ozonat, and P. J. Shenoy. Profiling and Modelling
Resource Usage of Virtualized Applications. In Valérie Issarny and Richard E. Schantz,
editors, Middleware, volume 5346 of Lecture Notes in Computer Science, pages 366–387.
Springer, 2008.
[18] Wood, Timothy; Cherkasova, Ludmila; Ozonat, Kivanc; Shenoy, Prashant. Predicting
Application Resource Requirements in Virtual Environments. HP Laboratories, Technical
Report HPL-2008-122, 2008.
[19] L.-O. Burchard, M. Hovestadt, O. Kao, A. Keller, and B. Linnert. The Virtual Resource
Manager: An Architecture for SLA-Aware Resource Management. Proc. Fourth IEEE/ACM
Int’l Symp. Cluster Computing and the Grid (CCGrid ’04), 2004.
[20] J. Ejarque, M. de Palol, Í. Goiri, F. Julià, J. Guitart, R. Badia, and J. Torres. SLA-Driven Semantically-Enhanced Dynamic Resource Allocator for Virtualized Service
Providers. In Proceedings of the 2008 Fourth IEEE International Conference on eScience.
IEEE Computer Society Washington, DC, USA, 2008, pp. 8–15.
[21] VMWare, What’s New in VMware vSphere 5.1 – Technical White Paper V2.0, June
2012