GeoMig: Online Multiple VM Live Migration

Flavio Esposito
Advanced Technology Group, Exegy Inc., St. Louis, MO

Walter Cerroni
DEI - University of Bologna, Italy
Abstract—The Cloud computing paradigm enables innovative
and disruptive services by allowing enterprises to lease computing, storage and network resources from physical infrastructure
owners, to offer a persistently available service. This shift
in infrastructure management responsibility has brought new
revenue models and new challenges to Cloud providers. One of
those challenges is to efficiently migrate multiple virtual machines
(VMs) within the hosting infrastructure, since these migrations
are often required to be “live”, i.e., without noticeable service
interruptions. In this paper we propose a geometric programming
model and an online multi-VM live migration algorithm based on
such model. The goal of the geometric program is to minimize the
total migration time via optimal bit-rate assignments. By solving
our geometric program we gained qualitative and quantitative
insights into the design of efficient solutions for multi-VM live
migrations. We found that transferring merely a few rounds
of dirty memory pages is enough to significantly lower the
total migration time. We also demonstrated that, under realistic
settings, the proposed method converges sharply to an optimal
bit-rate assignment, making our approach a viable solution for
improving current live-migration implementations.
I. INTRODUCTION
Resource virtualization is one of the key technologies for
an efficient deployment of Cloud computing services [1].
Decoupling service instances from the underlying processing,
storage and communication hardware allows flexible deployment of any application on any server within any data center,
independently of the specific operating system and platform
being used. In particular, the use of virtual machines (VMs) to
implement end-user services enables flexible workload management operations, such as server consolidation and multi-tenant isolation [2]. Live migration is an additional feature that
allows VMs to move from one host to another, with minimal
disruption to end-user service availability [3].
Most of the (virtual) services hosted today within a VM
are based on multi-tier applications [4]. Typical examples
include front-end, business logic and back-end tiers of e-commerce services, or clustered MapReduce computing environments. Joint deployment of multiple correlated VMs is also
envisioned by the emerging Network Function Virtualization
paradigm, which is radically changing the way network operators plan to develop and build future network infrastructures,
adopting a Cloud-oriented approach [5]. In general, a Cloud
customer should be considered as a tenant running multiple
VMs which could be either strictly or loosely correlated, and
exchange significant amounts of traffic. Therefore, moving a
given tenant’s workload often means migrating a group of
VMs, as well as the virtual networks used to interconnect
them. On the other hand, migrating a set of tenants for physical
machine maintenance requires moving multiple virtual machines running uncorrelated processes.
Related Work. Although existing Cloud products do offer a
range of solutions for joint management of multiple VMs running multi-tier applications [6], they do not allow simultaneous
VM live migration, do not consider potential future failures,
or do not optimize the migration bandwidth allocated to
each memory migration round, leading to unnecessarily longer
service disruption times. After the seminal work on live-migration [3], a significant number of implementations and
research efforts were carried out with the idea of moving VMs
with minimal service interruption [2], [7]–[9]. Most of the
existing work deals with single VM migration; few solutions
however considered the issue of migrating groups of correlated VMs, such as those executing multi-tier applications. In
particular, VMFlockMS [10] focuses on the migration of large
VM disk images across different data centers. Differently from
our work, this approach is intended mainly for non-live VM
migrations and the optimal allocation of inter-data center link
bandwidth is not considered. Ye et al. [11] experimentally
attempted to assess the role of different resource reservation
techniques and migration strategies on the live-migration of
multiple VMs. Kikuchi et al. [12] investigated the performance
of concurrent live-migrations in both VM consolidation and
dispersion experiments. Differently from our approach, which
considers an optimal bit-rate allocation, these solutions do
not capture the significant impact of network resources on
live-migration performance. Other implementation-based studies were carried out to experiment with simultaneous live-migration under different assumptions and pursuing different
objectives [13]–[16] but they do not optimize bandwidth
allocation for each memory transfer round. To our knowledge,
the only online algorithm designed for VM live migration was
proposed in [17], where VM placement heuristics consider
server workload, VM performance degradation and energy but
do not capture network topology and bandwidth allocation for
server interconnection, as we do in our approach. Furthermore,
their migration model does not capture memory dirtying rates.
Our Contribution. In this paper, we dissect the impact of
the multi-VM live migration downtime and propose GeoMig,
a randomized online algorithm that uses a novel geometric
program to optimize each memory transfer round during the
migration of multiple VMs. We derive GeoMig’s competitive
ratio from known k-server problem results. We first propose
a geometric program [18] to optimize a set of bandwidth
allocation variables, each of them representing the allocated
bandwidth of a given migration round. As a consequence, the
optimal bandwidth values minimize the migration latency. In
contrast to earlier contributions in this area, e.g. [17], our
model also captures the multi-commodity flow costs of the
underlying physical network hosting the memory transfers.
Our model provides a quantitative analysis of the memory pre-copy phase and the resulting service downtime. GeoMig's
objective aims at limiting two conflicting metrics: the service
interruption (downtime), and the time of the VM memory pre-copy. We quantify the tussle between the downtime and the
pre-copy in Section IV. GeoMig’s objective also controls the
iterative transfer algorithm used in live migration.
With our evaluation, we are able to address qualitative
design hypotheses, such as whether the stop-and-copy procedure always
increases the migration time when transferring multiple VMs,
and quantitative questions, such as how many dirty-page transfer rounds are necessary to minimize the total migration
time. We also empirically demonstrate that, under realistic
settings, the proposed geometric program converges sharply
to an optimal bit-rate assignment solution, making it a viable
contribution to develop advanced Cloud management tools.
Paper Organization. The rest of the paper is organized as
follows: we introduce our optimization model in Section II
and the GeoMig algorithm in Section III; we then discuss performance results and other insights in Section IV; finally, Section V
concludes our work.
II. OPTIMAL MIGRATION OF MULTIPLE VMs
In this section we present our optimization model for multiple VM migrations. We start with a quantitative analysis of the live-migration process, then we introduce the optimal formulation of migration rate allocation, and finally we discuss the geometric programming approach used to solve the problem.

A. Optimization Model

The two key parameters which quantify the performance of VM live-migration are the downtime and the total migration time. These two quantities tend to have opposite behaviors, and should be carefully balanced. In fact, the downtime measures the impact of the migration on the end-user's perceived quality of service, whereas the total migration time measures the impact on the Cloud network infrastructure. The effect of a generic pre-copy algorithm on VM migration timings has already been modeled, starting with the simple case of a single VM [19]. The same model has then been generalized to multiple VMs, evaluating how the performance parameters depend on VM migration scheduling and mutual interactions when providing services to the end-user [20]. In this section we leverage the aforementioned multiple-VM migration performance model to formulate a geometric programming optimization methodology, whose goal is to find a tradeoff between downtime and migration time.

We adopt the following parameter definitions:
• M is the number of VMs to be migrated in a batch;
• Vmem,j is the memory size of the j-th VM to be migrated, ∀j ∈ J = {1, . . . , M};
• Di,j is the memory dirtying rate during round i in the push (i.e., pre-copy) phase of the j-th VM;
• Vi,j is the amount of dirty memory of VM j copied during round i;
• Ti,j is the time needed to transfer Vi,j;
• nj is the number of rounds in the push phase of VM j;
• R is the total bit-rate of the migration channel;
• ri,j is the channel bit-rate reserved in round i to migrate VM j;
• τdown is the maximum allowed VM downtime.

The equations that rule the live-migration process of VM j are [20]:

V0,j = Vmem,j = r0,j T0,j    (1)

Vi,j = Di−1,j Ti−1,j = ri,j Ti,j,    i ∈ Ij = {1, . . . , nj}    (2)

The last transfer round, i.e., the stop-and-copy phase, starts as soon as i reaches the smallest value nj such that:

Vnj,j = Dnj−1,j Tnj−1,j ≤ rnj,j τdown    (3)

Writing equations (1) and (2) recursively results in:

T0,j = Vmem,j / r0,j    (4)

Ti,j = Vi,j / ri,j = (Vmem,j / r0,j) Π_{h=1}^{i} (Dh−1,j / rh,j),    i ∈ Ij    (5)

Note how Equation (5) is a posynomial function [21]. To simplify the formulation of the optimization model, we fix the size of the problem to be solved, i.e., we assume a fixed number of rounds n̄ common to all VM migrations:

nj = n̄,    Ij = I = {1, . . . , n̄},    ∀j ∈ J    (6)

Depending on the choice of n̄, the assumed number of rounds can be sufficient or not to reach the stop condition. This means that, for a given set of input parameters (namely Vmem,j, Di,j, and R), the chosen value of n̄ could be different from the smallest one which satisfies inequality (3). Under this assumption we are able to quantify the behavior of the pre-copy algorithm using n̄ as a control parameter and understand the role of the number of rounds on live-migration performance.

From equations (4) and (5) we can compute two objective functions when n̄ rounds are executed in the push phase: the total pre-copy time, computed as the sum of the durations of the push phases of all VMs:

Tpre(n̄, r) = Σ_{j=1}^{M} (Vmem,j / r0,j) [1 + Σ_{i=1}^{n̄−1} Π_{h=1}^{i} (Dh−1,j / rh,j)]    (7)

and the total downtime, computed as the sum of the durations of the stop-and-copy phases of all VMs:

Tdown(n̄, r) = Σ_{j=1}^{M} (Vmem,j / r0,j) Π_{h=1}^{n̄} (Dh−1,j / rh,j)    (8)
The expression in (7), which increases with the number of
rounds n̄, quantifies the amount of time required to bring all
migrating VMs to the stop-and-copy phase. On the other hand,
the second objective function in (8) measures the period during
which any VM is inactive, so it is suitable to quantify the
downtime of the services provided by the migrating VMs. This
expression does not include the duration of the resume phase,
which typically has a fixed value determined by technology.
If the allocated migration bit-rate ri,j is always greater than
the dirtying rate Di−1,j , the expression in (8) decreases when
n̄ increases. Therefore, with these two objective functions we
intend to capture the opposite trend of migration time and
downtime.
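To make this tussle concrete, the following minimal Python sketch (ours, using illustrative values rather than the settings of Section IV) evaluates the per-VM recursion of equations (1)-(5) for a fixed, non-optimized bit-rate assignment and reports the corresponding per-VM contributions to Tpre in (7) and Tdown in (8).

```python
def migration_times(V_mem, D, r):
    """Per-VM pre-copy time and downtime for a fixed bit-rate assignment.

    V_mem : memory size of one VM [bits]
    D     : dirtying rates D[i-1] observed in rounds 1..n [bit/s]
    r     : bit-rates r[0..n]; round 0 copies the full memory, round n is
            the stop-and-copy phase
    Returns this VM's contribution to T_pre (eq. 7) and T_down (eq. 8).
    """
    T = [V_mem / r[0]]                  # T_0 = V_mem / r_0, eq. (4)
    for i in range(1, len(r)):
        V_i = D[i - 1] * T[i - 1]       # dirty memory produced meanwhile, eq. (2)
        T.append(V_i / r[i])            # T_i = V_i / r_i, eq. (5)
    return sum(T[:-1]), T[-1]           # push rounds 0..n-1, stop-and-copy round n

# Illustrative (assumed) values: 2 GB VM, 200 Mb/s dirtying rate, 1 Gb/s channel
for n_rounds in (1, 5):
    Tp, Td = migration_times(2e9 * 8, [2e8] * n_rounds, [1e9] * (n_rounds + 1))
    print(f"n = {n_rounds}: T_pre = {Tp:.2f} s, T_down = {Td:.4f} s")
```

With a constant dirtying rate well below the channel rate, adding rounds barely increases the pre-copy time while shrinking the final stop-and-copy transfer geometrically, which is exactly the trend the two objectives are meant to trade off.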
At each round i of the migration of each VM j, given Vmem,j and Di−1,j, we wish to allocate the bit-rates ri,j so that a combination of the two objective functions

Tmig(n̄, r) = Cpre Tpre(n̄, r) + Cdown Tdown(n̄, r)    (9)

which we call the total migration time, is minimized.¹ Given a flow network, that is, a directed graph G = (V, E) with source s ∈ V and sink t ∈ V, where edge (u, v) ∈ E has capacity C(u, v) > 0, the flow fi,j(u, v) ≥ 0 is the number of bits flowing over edge (u, v) for migration round i and VM j. Formally, we have the following geometric program:

min_{ri,j}  Tmig(n̄, r)    (10a)
s.t.  Di−1,j < ri,j,    ∀i ∈ I, ∀j ∈ J    (10b)
      Σ_{j=1}^{M} ri,j ≤ R,    ∀i ∈ I    (10c)
      Σ_{j=1}^{M} fi,j(u, v) ≤ C(u, v),    ∀(u, v) ∈ E, ∀i    (10d)
      Σ_{w∈V} fi,j(u, w) = 0,    ∀i, ∀j, ∀u ≠ si,j, ti,j    (10e)
      fi,j(u, v) = −fi,j(v, u),    ∀i, ∀j, ∀(u, v) ∈ E    (10f)
      Σ_{w∈V} fi,j(si,j, w) = Σ_{w∈V} fi,j(w, ti,j) = di,j,    ∀si,j, ti,j    (10g)
      Di,j ≥ 0,    ∀i ∈ I, ∀j ∈ J    (10h)
      ri,j > 0,    ∀i ∈ I, ∀j ∈ J    (10i)

The page dirtying rate constraints (10b) ensure that the VM migration is sustainable, i.e., during the entire migration process we can transfer (consume) memory faster than the memory to be transferred is produced. The set of constraints (10c) ensures that at each memory transfer round we do not allocate more bit-rate than is available at any time (R). Equations (10d)-(10g) are the flow conservation constraints: they ensure that the net flow at each physical node is zero, except for the source si,j and the destination ti,j, and that each flow demand di,j, i.e., the bit-rate associated with the migration of VM j during round i, is guaranteed on each physical link along the path from si,j to ti,j.

¹ Note that, in the case of a single VM migration (M = 1) and choosing Cpre = Cdown = 1, the expression in (9) gives exactly the time needed to perform the push and stop-and-copy phases, i.e., the actual VM transfer time.
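As an illustration of how the rate-allocation core of program (10) can be instantiated with an off-the-shelf solver, the sketch below encodes objective (9) with constraints (10b)-(10c) as a geometric program using cvxpy's gp=True mode. This is a minimal sketch under our own assumptions: it is not the GGPLAB-based implementation described in Section IV, it relaxes the strict inequality (10b) to a non-strict one, it omits the multi-commodity flow constraints (10d)-(10g), and all parameter values are illustrative.

```python
import cvxpy as cp

# Illustrative parameters (assumptions, not the values used in Section IV)
M, n_bar = 3, 5                    # VMs in the batch, pre-copy rounds
V_mem = [32e9, 16e9, 8e9]          # memory sizes [bits] (4, 2, 1 GB)
D_rate = 2e8                       # dirtying rate [bit/s], same for all rounds
R = 1e9                            # total migration channel bit-rate [bit/s]
C_pre, C_down = 1.0, 1.0           # objective weights of eq. (9)

# r[i, j]: bit-rate reserved in round i for VM j (round 0 copies the full memory)
r = cp.Variable((n_bar + 1, M), pos=True)

pre_terms, down_terms = [], []
for j in range(M):
    base = V_mem[j] / r[0, j]                 # T_{0,j}, eq. (4)
    pre_terms.append(base)
    prod = 1.0
    for i in range(1, n_bar + 1):
        prod = prod * (D_rate / r[i, j])      # prod_{h<=i} D_{h-1,j} / r_{h,j}
        if i < n_bar:
            pre_terms.append(base * prod)     # rounds 1 .. n_bar-1 of eq. (7)
    down_terms.append(base * prod)            # stop-and-copy term, eq. (8)

T_pre = cp.sum(cp.hstack(pre_terms))
T_down = cp.sum(cp.hstack(down_terms))

constraints = [cp.sum(r, axis=1) <= R]        # shared channel, eq. (10c)
constraints += [r[i, j] >= D_rate             # sustainability, eq. (10b) relaxed
                for i in range(1, n_bar + 1) for j in range(M)]

prob = cp.Problem(cp.Minimize(C_pre * T_pre + C_down * T_down), constraints)
prob.solve(gp=True)                           # solved as a (log-log convex) GP
print(f"T_mig* = {prob.value:.2f} s")
print("optimal bit-rates [bit/s]:\n", r.value)
```

Since every term of (9) is a monomial in the rates ri,j, the objective is a posynomial and the problem is log-log convex, so the solver handles it after the standard logarithmic change of variables.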
III. ONLINE MULTI-VM LIVE-MIGRATION WITH GeoMig
So far we have shown how minimizing the migration time
depends on the bandwidth allocation for each dirty memory
page transfer. In this section, we argue that the destination
physical machine and the route that leads to it are also
important aspects to consider when designing a multi-VM live-migration algorithm. We consider a realistic (online) setting in
which the migration algorithm is unaware of future multi-VM migration requests, since the application behavior and possible
network failures are not known in advance.
Historically, the performance of online algorithms has been evaluated using competitive analysis, where the online algorithm is compared to the optimal offline algorithm that knows the (migration) request sequence in advance. Given an input sequence σ, if we denote the cost incurred by the online algorithm A as CA(σ), and the cost incurred by the optimal offline algorithm opt as Copt(σ), then A is called c-competitive if there exists a constant a such that CA(σ) ≤ c · Copt(σ) + a for all request sequences σ. The factor c is called the competitive ratio of A. In this section, we present GeoMig, a multi-VM live-migration online algorithm that leverages our geometric program (10), and we show that the following competitive ratio inequality holds: c ≤ (5/4)M · 2^M − 2M, where M is the number of VMs to be simultaneously migrated.
GeoMig Intuition. It is perhaps counterintuitive at first that a greedy algorithm which migrates, in an online fashion, a set of VMs to their closest physical machines, i.e., those with the smallest cost, does not minimize the total migration cost over time. The migration cost may be minimal if we consider a single (multi-VM) live-migration request, but it may not be for a sequence of multiple migration requests. Note that we do not propose to migrate a set of VMs to a generic random point, but to a random point within the feasible set of hosting machines: for example, all physical machines reachable within a given delay, or located in a given geolocation within a data center.
Example 1. (Knowing future failures would help). Assume
that a VM which has to migrate from a source physical
node (PN) S experiences the lowest cost when moving to
a PN D1 , with a downtime (cost) of 5 seconds, but a less
preferred path would lead the VM to a destination physical
machine D2 with a downtime of 6 seconds. Assume also that
immediately after the migration to D1 is complete, PN D1
becomes unavailable. The VM needs to migrate again e.g.,
going back to PN S, hence experiencing another 5 seconds
of downtime. The total downtime over the longer time period
is hence cost(S → D1 ) + cost(D1 → S) = 10 seconds, but
it would have been merely cost(S → D2 ) = 6 seconds if we
had known about the subsequent D1 failure.
For the reasons explained in Example 1, it is well known
that, in general, randomized algorithms may improve the competitive ratio of online algorithms. This is because, intuitively,
an adaptive adversary cannot predict the online algorithm's moves, as they are not known in advance.
Migrating k-servers. We observe that, given a physical network, the problem of selecting the lowest-cost hosting physical machines to migrate multiple VMs to k = M distinct physical machines can be reduced from the k-server problem. The k-server problem is that of planning the movement of k servers on the vertices of a graph G under a sequence of requests. Each request consists of the name of a vertex, and is satisfied by placing a server at the requested vertex (migrating the VM to one of the k servers). The requests must be satisfied in their order of occurrence. The cost of satisfying a sequence of requests is the distance moved by the servers. We map the cost function of moving servers in the Euclidean space to the precomputed pairwise migration costs resulting from our geometric program (10).
Based on the above observations, we designed GeoMig leveraging Harmonic [22], a known randomized algorithm designed for the online k-server problem. The reduction from the k-server problem led us to the following results:

Proposition III.1. Let R be a randomized online algorithm that manages the migration of multiple VMs to k distinct servers in any physical network, modeled as an undirected graph. Then the competitive ratio against an adaptive online adversary is CR_aon ≥ k.

Proof. (sketch) In the subsection Migrating k-servers we informally described the reduction from the online k-server problem to the online migration to k servers. The result is then an immediate corollary of Theorem 13.7 of [23].
Theorem III.1. Given k potentially hosting physical machines, the competitive ratio of GeoMig is c ≤ (5/4)k · 2^k − 2k.

Proof. (sketch) Let di be the distance between the i-th server managed by the Harmonic algorithm and the requested point, for 1 ≤ i ≤ k. The Harmonic algorithm [22] chooses, independently from the past, the j-th server with probability pj = (1/dj) / Σ_{i=1}^{k} (1/di). Because Grove [24] showed that the competitive ratio of the Harmonic algorithm is bounded by (5/4)k · 2^k − 2k, we have the claim.
GeoMig uses randomized choices to subvert the high cost induced by a non-oblivious adaptive adversary. Let us denote with x the ID of the physical machine currently hosting the VM to be migrated, and with yi the identifier of a new candidate physical machine, where i belongs to the index set of all feasible physical machines. Then, the destination physical machine yi is chosen with probability:

pi = (1/d(x, yi)) / Σ_{j=1}^{k} (1/d(x, yj))    (11)

where d(x, yi) is the migration cost for moving a VM from physical machine x to yi. The smaller the cost to migrate a VM to physical machine yi, the higher the probability of picking yi as the new hosting physical machine (among the k feasible ones). With this randomized strategy, an adversary that picks the VM to be migrated closest to the next failing physical node cannot harm the overall long-term service downtime, as it cannot deterministically predict the destination physical machine chosen by GeoMig.
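A minimal sketch of this randomized destination choice follows, assuming the pairwise migration costs d(x, yi) have already been computed, e.g., by solving program (10) for each candidate destination; the machine identifiers and cost values are illustrative, not taken from our implementation.

```python
import random

def pick_destination(costs):
    """Randomly pick the destination physical machine per eq. (11).

    costs: dict mapping candidate machine ID y_i -> migration cost d(x, y_i) > 0.
    Cheaper destinations are proportionally more likely, but none is chosen
    deterministically, which is what defeats an adaptive adversary.
    """
    candidates = list(costs)
    weights = [1.0 / costs[y] for y in candidates]   # p_i proportional to 1/d(x, y_i)
    return random.choices(candidates, weights=weights, k=1)[0]

# Illustrative costs (downtimes in seconds, as in Example 1, plus one extra candidate)
print(pick_destination({"D1": 5.0, "D2": 6.0, "D3": 9.0}))
```

Sampling proportionally to 1/d(x, yi) keeps low-cost destinations likely while denying the adversary a deterministic target.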
IV. PERFORMANCE EVALUATION
In this section, we first show how GeoMig performs compared to a greedy migration heuristic; then, by solving our geometric program across a wide range of parameter settings, we answer a set of qualitative design questions (e.g., should live migration always be used when transferring multiple VMs?) and quantitative ones (e.g., how many dirty-page transfer rounds are needed to minimize the total migration time?).
A. Simulation Environment
To validate our model and gain insights into the performance of (online) multi-VM live-migration, we built our own simulator leveraging the geometric program solver GGPLAB [25]. GGPLAB uses the primal-dual interior-point method [26] to solve the convex version of the original (non-linear) geometric program. We were able to obtain similar results across a wide range of the parameter space, but we present only a representative subset that summarizes our main messages. We generate our physical network topologies using the BRITE [27] topology generator, and we obtained similar results using physical “fat-tree”, Barabasi-Albert, and Waxman connectivity models, with a wide range of physical network sizes. All our results are reported with 95% confidence intervals.
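For readers who want to reproduce a comparable setup without BRITE, the following hedged sketch uses networkx to generate a Barabasi-Albert physical topology and the pairwise hop distances, which could serve both as rough migration path costs d(x, yi) and, averaged, as a bandwidth-agnostic normalization factor like the one used in the next subsection; topology sizes and parameters are illustrative assumptions.

```python
import networkx as nx

# Barabasi-Albert physical topology with 100 physical machines (illustrative;
# the paper's topologies were generated with BRITE [27])
G = nx.barabasi_albert_graph(n=100, m=2, seed=42)

# Pairwise shortest-path hop distances between physical machines; their average
# is one possible bandwidth-agnostic normalization factor for migration costs
lengths = dict(nx.all_pairs_shortest_path_length(G))
hops = [d for u, dists in lengths.items() for v, d in dists.items() if u != v]
avg_hop_cost = sum(hops) / len(hops)
print(f"{G.number_of_nodes()} machines, average hop distance = {avg_hop_cost:.2f}")
```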
B. Simulation Results
(1) GeoMig improves the cost of migrating multiple VMs by up to 47%. We found that, independently of the size and the connectivity model, in a physical network with up to 10% of random physical node failures, a greedy alternative which always selects the cheapest (e.g., closest) destination physical machines has a higher average migration cost (Figures 1a and 1b). To make our results agnostic to the chosen bandwidth, we normalize our costs (migration times) by dividing them by the average hop-to-hop cost over the physical network.
(2) Independently of the migration strategy used (online or offline), the downtime can be reduced by up to two orders of magnitude by increasing the number of transfer rounds. In Figure 1c we show the impact on the downtime when simultaneously migrating 6 VMs. The maximum rate was set to 1 Gbps, and the values of minimum migration time were obtained by solving Problem (10) with Cpre = 0 and Cdown = 1. The size of each VM is chosen from a Gaussian distribution with the mean given in the legend and a standard deviation of 300 MB. The results are shown on a semi-logarithmic scale to highlight that the downtime may diminish by two orders of magnitude as we increase the number of rounds. The same two-orders-of-magnitude reduction was obtained when sampling a smaller and a larger number of VMs, with sizes drawn from a uniform distribution and from a bimodal distribution in which 20% of the time the sampled VM is 10 times larger (results not shown).

Fig. 1. (a-b) The GeoMig online multi-VM live-migration strategy has a lower (normalized) migration cost than a greedy heuristic that always selects the destination physical machines with the lowest cost (curves: Greedy, GeoMig, Optimal). Costs were computed using our geometric program on a physical network following (a) the Barabasi-Albert (100 physical machines) and (b) the Waxman (200 physical machines) connectivity models. (c) The downtime is reduced by two orders of magnitude with only a few more rounds. (d) The pre-copy time reaches a saturation point after a few dirty-page transfer rounds.

Fig. 2. The downtime significantly decreases when increasing the number of rounds: (a) Impact on the total migration time during the live migration of 6 VMs. (b) Convergence time of the geometric program solved with an iterative method. (c) After only 10 iterations, the primal-dual interior-point method that solves our geometric program finds rates very close to the optimal. (d) Impact of the weight α in the “tradeoff” objective function α · Tpre + (1 − α) · Tdown, for nj = 0, 1, 3, 5, 7, 9.
Although a diminishing delay is expected when the number of allowed rounds increases, quantifying such a downtime decrease is important to gain insights on how to tune or re-architect the QEMU-KVM hypervisor migration functionalities. For example, delay-sensitive applications like online gaming or high-frequency trading may require memory migrations with the maximum possible number of transfer rounds. For bandwidth-sensitive applications instead, e.g., peer-to-peer applications, it may be preferable to reduce the migration bit-rate and tolerate a longer service interruption, rather than enduring a prolonged pre-copy phase at the lower bit-rate. We have also tested our model with different values of the maximum bit-rate, as well as different dirtying rates, but we did not find any significant qualitative differences and hence we omit such results.
(3) The total migration time improvement diminishes as we increase the number of transfer rounds. This diminishing effect is a direct consequence of the diminishing duration of each subsequent transfer round. This result gives insights on the importance of allocating enough bandwidth to guarantee a given quality of service to VMs running memory-intensive applications or exhibiting high dirtying rates.
The values of minimum migration time were obtained by solving GeoMig (and so Problem (10)) with Cpre = 1 and Cdown = 0. In Figure 1d, the impact of the pre-copy time alone is computed for 3 VMs at a maximum available rate of 0.5 Gbps. The VM sizes are chosen from a Gaussian distribution with mean as shown in the legend and a standard deviation of 300 MB.
(4) Few transferring rounds are enough to minimize the total
migration time. In this experiment, we evaluate GeoMig with
a full posynomial function representing the total migration
time Tmig (n̄, r), i.e., when both downtime and pre-copy time
have equal weight Cdown = Cpre = 1 in the geometric
program objective function. Note that, even after the change of
variables, the posynomial function Tmig (n̄, r) is a summation
of two non-linear functions, so we should not expect to see
the total migration time as merely a summation of the two
values obtained by solving the two subproblems separately with Cdown = 0 and Cpre = 0, respectively. For this experiment, we selected the results with 6 VMs whose sizes are chosen from a uniform distribution with average size indicated by the legend and a standard deviation of 300 MB (Figure 2a); similar results were obtained with VM sizes chosen from a Gaussian and from a bimodal distribution.
(5) GeoMig has a rapid convergence time, and near-optimal bit-rate values are reached after only a few iterations. To the total migration time we also have to add the convergence time of the iterative algorithm which solves the geometric program. If this time were too high, it might be inconvenient to wait for an optimal solution. In Figures 2b and 2c we show that this is not the case: the GeoMig convergence time is bounded by roughly 60 ms per round using a machine with standard processing power (Intel Core i3 CPU at 1.4 GHz with 4 GB of memory).
(6) The stop-and-copy procedure always increases the migration time when transferring multiple VMs. To gain additional
insights on the impact of the two different components of the
total migration time, we solve GeoMig with different weights
on the two objective function coefficients, the pre-copy time
and the downtime. In particular, we set Cpre = α where
α ∈ [0, 1] and Cdown = (1 − α), and assess the impact on the migration time for different numbers of transfer rounds n̄ as we increase the value of α from 0 to 1 (Figure 2d). nj = n̄ is a
parameter of the figure, while α represents how much weight
we assign to the pre-copy phase during the bit-rate assignment.
Migrating the VMs with a single stop-and-copy phase
means setting n̄ = nj = 0. In this case, the total migration
time is equivalent to the pre-copy time, which is equivalent to
the downtime. As soon as the number of rounds increases
(parameter n̄ > 0) the total migration time drops. This
suggests that we should never transfer (multiple) VMs with
merely a stop-and-copy procedure.
(7) Increasing the number of transferring rounds does not help
after the first few rounds. Increasing the number of rounds
leads to a diminishing marginal improvement in the total
migration time as shown by the different gradients (or slopes)
of the curves with n̄ > 0 in Figure 2d. Another interesting
observation from Figure 2d is that, as we assign more weight
to the pre-copy time component of the objective function, the
resulting total migration time increases.
V. CONCLUSIONS
In this paper we studied the live-migration of multiple virtual machines, a fundamental problem in networked Cloud service management. First, we built a live-migration model, and a
geometric programming formulation that, when solved, returns
the minimum total migration time by optimally allocating the
bit-rates across the multiple VMs to be migrated. Then we
proposed a novel randomized online multi-VM live-migration
algorithm, GeoMig, and showed that the competitive ratio of
GeoMig is c ≤ (5/4)k · 2^k − 2k, where k is the number of
potentially hosting physical machines which satisfy the set
of migration constraints given by our geometric program. The
optimization problem used by GeoMig aims at simultaneously
limiting both the service interruption (downtime), and the
time of VM pre-copy, along with a proper control of the
iterative memory copying algorithm used in live-migration.
By solving the geometric program across a wide range of
parameter settings, we were able to answer qualitative and
quantitative design questions. We have also shown that, under
realistic settings, the proposed geometric program converges
sharply to an optimal bit-rate assignment, making it a viable
and useful solution for improving the current live-migration
implementations.
REFERENCES
[1] R. Buyya, J. Broberg, and A. M. Goscinski, Eds., Cloud Computing:
Principles and Paradigms. New York: Wiley, 2011.
[2] V. Medina and J. M. Garcı́a, “A survey of migration mechanisms of
virtual machines,” ACM Comp. Surveys, Jan 2014.
[3] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach,
I. Pratt, and A. Warfield, “Live migration of virtual machines,” in
Proceedings of the 2nd USENIX Symposium on Networked Systems
Design & Implementation (NSDI), Boston, MA, May 2005.
[4] X. Wang, Z. Du, Y. Chen, and S. Li, “Virtualization-based autonomic
resource management for multi-tier web applications in shared data
center,” Journal of Systems and Software, vol. 81, no. 9, pp. 1591–1608,
September 2008.
[5] “Network Functions Virtualisation (NFV): Network operator perspectives on industry progress,” ETSI White Paper, The European Telecommunications Standards Institute, 2013.
[6] S. J. Bigelow. (2014, Nov.) How can an enterprise benefit from using a VMware vApp? [Online]. Available: http://searchvmware.techtarget.com/answer/How-can-an-enterprise-benefit-from-using-a-VMware-vApp
[7] D. Kapil, E. Pilli, and R. Joshi, “Live virtual machine migration
techniques: Survey and research challenges,” in Advance Computing
Conference (IACC), 2013 IEEE 3rd International, Feb 2013, pp. 963–
969.
[8] S. Acharya and D. A. D’Mello, “A taxonomy of live virtual machine
(VM) migration mechanisms in cloud computing environment,” in Proc.
of ICGCE, Chennai, India, Dec. 2013.
[9] R. Boutaba, Q. Zhang, and M. F. Zhani, “Virtual machine migration:
Benefits, challenges and approaches,” in Communication Infrastructures for Cloud Computing: Design and Applications, H. Mouftah and
B. Kantarci, Eds. IGI-Global, 2013.
[10] S. Al-Kiswany, D. Subhraveti, P. Sarkar, and M. Ripeanu, “VMFlock:
Virtual machine co-migration for the cloud,” in Proceedings of the
20th International ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC ’11), San Jose, CA, June 2011.
[11] K. Ye, X. Jiang, D. Huang, J. Chen, and B. Wang, “Live migration of
multiple virtual machines with resource reservation in cloud computing
environments,” in Proceedings of IEEE International Conference on
Cloud Computing (CLOUD 2011), July 2011.
[12] S. Kikuchi and Y. Matsumoto, “Performance modeling of concurrent live
migration operations in cloud computing systems using PRISM probabilistic model checker,” in Proceedings of the 4th IEEE International
Conference on Cloud Computing, (CLOUD 2011), Washington, DC, July
2011.
[13] U. Deshpande, X. Wang, and K. Gopalan, “Live gang migration of
virtual machines,” in Proceedings of the 20th International ACM Symposium on High Performance Parallel and Distributed Computing (HPDC
’11), San Jose, CA, June 2011.
[14] U. Deshpande et al., “Inter-rack live migration of multiple virtual
machines,” in Proc. of VTDC, Delft, The Netherlands, June 2012.
[15] K. Ye, X. Jiang, R. Ma, and F. Yan, “VC-migration: Live migration of
virtual clusters in the cloud,” in Proceedings of the 13th ACM/IEEE
International Conference on Grid Computing (GRID 2012), Beijing,
China, September 2012.
[16] E. Keller, S. Ghorbani, M. Caesar, and J. Rexford, “Live migration of
an entire network (and its hosts),” in Proc. of HotNets-XI, 2012.
[17] A. Beloglazov and R. Buyya, “Optimal online deterministic algorithms
and adaptive heuristics for energy and performance efficient dynamic
consolidation of virtual machines in cloud data centers,” Concurrency and Computation: Practice and Experience, Sept. 2012.
[18] R. J. Duffin, E. L. Peterson, and C. M. Zener, Geometric Programming:
Theory and Application. New York: Wiley, 1967.
[19] H. Liu et al., “Performance and energy modeling for live migration of
virtual machines,” Cluster Computing, vol. 16, no. 2, June 2013.
[20] W. Cerroni and F. Callegati, “Live migration of virtual network functions
in cloud-based edge networks,” in Proc. of ICC, June 2014.
[21] S. Boyd and L. Vandenberghe, Convex Optimization. New York: Cambridge University Press, 2004.
[22] P. Raghavan and M. Snir, “Memory versus randomization in on-line
algorithms,” IBM Journal of Research and Development, vol. 38, no. 6,
pp. 683–707, November 1994.
[23] R. Motwani and P. Raghavan, Randomized Algorithms. New York, NY,
USA: Cambridge University Press, 1995.
[24] E. F. Grove, “The harmonic online k-server algorithm is competitive,” in
Proceedings of the Twenty-third Annual ACM Symposium on Theory of
Computing, ser. STOC ’91. New York, NY, USA: ACM, 1991, pp. 260–
266. [Online]. Available: http://doi.acm.org/10.1145/103418.103448
[25] GGPLAB. [Online]. Available: http://stanford.edu/~boyd/ggplab/
[26] S. J. Wright, Primal-Dual Interior-Point Methods. Philadelphia: Society
for Industrial and Applied Mathematics (SIAM), 1997.
[27] A. Medina et al., “BRITE: An approach to universal topology generation,” in Proc. of MASCOTS, 2001.