Harmonia: Tenant-Provider Cooperation for Work-Conserving
Bandwidth Guarantees
Frederick Douglas, Lucian Popa, Sujata Banerjee, Praveen Yalagandula, Jayaram
Mudigonda, Matthew Caesar
Hewlett Packard Labs
HPE-2016-16
Keyword(s):
Cloud bandwidth guarantees; multi-path networks
Abstract:
In current cloud datacenters, a tenant’s share of the network bandwidth is determined by
congestion control interactions with other tenants’ traffic. This can lead to unfair bandwidth
allocation, and unpredictable application performance. Performance isolation can be achieved
by providing bandwidth guarantees for each tenant. However, since networks are lightly-loaded
on average and providers do not want to waste network resources, a guarantee scheme should
be work conserving – i.e., fully utilize the network, even when some tenants are not active.
Unfortunately, the work-conserving guarantee schemes proposed thus far suffer from two
problems that have hindered their deployment: (a) they are complex and incur a significant
overhead to the provider; (b) they have not been designed or tested with multi-path networks,
even though all modern cloud datacenter networks have multiple paths and, as we will show,
bandwidth guarantees must be split across multiple paths to fully utilize the network. To address
these challenges, we propose a radical approach that departs from the belief that in a cloud with
bandwidth guarantees, the provider alone should be in charge of spare bandwidth allocation.
Unlike prior proposals, we empower tenants to control the work-conserving bandwidth, just as
today, while the provider only has to enforce bandwidth guarantees as an additional service.
Our solution, Harmonia, relies on two key ingredients: (a) we separate bandwidth guaranteed
traffic from work-conserving traffic through the use of priority queues, and (b) we use MPTCP to
apportion traffic (at packet granularity) to be guaranteed or work conserving, as well as to split
guarantees across multiple paths.
External Posting Date: February 19, 2016 [Fulltext]
Internal Posting Date: February 19, 2016 [Fulltext]
© Copyright 2016 Hewlett Packard Enterprise Development LP
Harmonia: Tenant-Provider Cooperation for Work-Conserving
Bandwidth Guarantees
Frederick Douglas (UIUC), Lucian Popa (Databricks), Sujata Banerjee (HP Labs),
Praveen Yalagandula (Avi Networks), Jayaram Mudigonda (Google), Matthew Caesar (UIUC)
ABSTRACT
In current cloud datacenters, a tenant's share of the network bandwidth is determined by congestion control interactions with other tenants' traffic. This can lead to unfair bandwidth allocation and unpredictable application performance. Performance isolation can be achieved by providing bandwidth guarantees for each tenant. However, since networks are lightly loaded on average and providers do not want to waste network resources, a guarantee scheme should be work-conserving, i.e., it should fully utilize the network even when some tenants are not active. Unfortunately, the work-conserving guarantee schemes proposed thus far suffer from two problems that have hindered their deployment: (a) they are complex and incur a significant overhead for the provider; (b) they have not been designed or tested with multi-path networks, even though all modern cloud datacenter networks have multiple paths and, as we will show, bandwidth guarantees must be split across multiple paths to fully utilize the network. To address these challenges, we propose a radical approach that departs from the belief that, in a cloud with bandwidth guarantees, the provider alone should be in charge of spare bandwidth allocation. Unlike prior proposals, we empower tenants to control the work-conserving bandwidth, just as today, while the provider only has to enforce bandwidth guarantees as an additional service. Our solution, Harmonia, relies on two key ingredients: (a) we separate bandwidth-guaranteed traffic from work-conserving traffic through the use of priority queues, and (b) we use MPTCP to apportion traffic (at packet granularity) to be guaranteed or work-conserving, as well as to split guarantees across multiple paths.

1. INTRODUCTION

Modern cloud datacenter networks are mostly free-for-alls. A tenant's share of the network bandwidth is determined by the interaction of its VMs' congestion control algorithms with the unpredictable and wildly varying network usage of other tenants. This can lead to unfair bandwidth allocation and unpredictable application performance [3, 4, 11, 12, 14, 15, 17, 18].

Providing bandwidth guarantees in the cloud would lead to predictable lower bounds on the performance of tenant applications [3, 4, 8, 11, 14, 15, 17]. Bandwidth guarantees would thus be an excellent addition to today's cloud offerings, putting the network on par with other resources such as CPU and memory, and making the cloud more attractive for network-constrained applications.

However, slicing cloud networks into statically guaranteed shares is insufficient. Average load in datacenter networks is typically low [5, 20], and clouds provide good average network performance, albeit with no lower bound on the worst case. Thus, the ability to be work-conserving (i.e., to redistribute idle bandwidth to the active flows) is crucial, perhaps even more so than the guarantees themselves. Therefore, cloud providers cannot offer just strict bandwidth guarantees; they must offer work-conserving bandwidth guarantees (from now on WCBG), where tenants can exceed their guarantees when there is free capacity [11, 14, 15, 17].

Unfortunately, providing WCBG is a much harder problem than providing non-work-conserving guarantees. Existing solutions for offering WCBG suffer from two problems that have hindered their deployment.

The first problem with existing solutions is that they are complex, with high overhead. To be work-conserving, current solutions implement complex adaptive algorithms [11, 15, 17]. Moreover, to be efficient and satisfy guarantees, these algorithms must operate on very small timescales (milliseconds), resulting in high overhead.

The second problem is that none of the existing methods for sharing bandwidth in the cloud have been designed for or tested with multi-path networks, even though (a) almost all modern cloud datacenter networks have multiple paths and (b) reserving bandwidth guarantees across multiple paths is necessary to fully utilize the network, as we will show later in this paper (§3). We will argue that support for multiple paths cannot be easily retrofitted onto existing solutions. In fact, the problem of offering WCBG is so difficult that most approaches only provide guarantees when congestion occurs at the edge of the network [11, 17].

These reasons have tipped the balance for cloud providers, such as Amazon, Microsoft, IBM or HP, against offering a guaranteed bandwidth service, even after numerous efforts aimed at this goal.

This paper proposes a radical, lightweight approach that will make it much easier for providers to offer WCBG on today's networks. Our solution combines three key ideas:

1. We fully decouple the provision of guarantees from the goal of fully utilizing the network. Unlike previous proposals for achieving WCBG, e.g., [11, 14, 15, 17], work conservation in Harmonia is achieved just as it is today: by tenants' congestion control algorithms filling all available capacity, rather than by providers' control processes constantly tweaking bandwidth allocations. This decoupling is achieved by separating a flow into "guaranteed" and "work-conserving" portions, with the guaranteed portion rate-limited and directed to higher-priority queues in the network. This separation should be transparent to the tenant.

2. We make multi-pathing a primary design decision. In §3 we demonstrate that in a multi-path topology, such as a fat tree, bandwidth reservation schemes that cannot split a single VM-to-VM reservation across multiple paths are fundamentally inefficient. Intuitively, there are situations where single-path schemes are forced to duplicate one reservation onto two links, whereas a multi-path scheme could safely split the reservation load over the two links.

3. To make the previous two ideas transparent to the user, and to accommodate packet reordering, uneven congestion across paths, and failed links, we propose the use of a multi-path transport protocol inside tenant VMs. In our prototype, we use MPTCP [16]. MPTCP is a mature standard beginning to see serious adoption (e.g., Apple iOS 7). In the future, other such protocols could be used instead of MPTCP.

In a nutshell, our solution works as follows. We completely separate guaranteed traffic from work-conserving traffic (we refer to the traffic sent by a tenant up to its bandwidth guarantee as guaranteed traffic, and to the traffic above its guarantee as work-conserving traffic) by using priority queuing: guaranteed traffic is forwarded using a queue with higher priority than the work-conserving, opportunistic traffic. We fully expose this decoupling at the level of tenant VMs, through multiple virtual network interfaces: one interface's traffic is given a high priority but is rate-limited to enforce guarantees, while the others are not rate-limited, but get a lower priority.

We spread all the guaranteed traffic uniformly across all paths. We make sure that the guaranteed traffic will not congest any link by relying on prior work for non-work-conserving guarantees, such as Oktopus [3] or a subset of ElasticSwitch [15]. At the same time, to achieve work conservation, we allow tenants to send lower-priority traffic at will on all paths. Tenants who are content with non-work-conserving guarantees, or who want only best-effort traffic without any guarantees, can operate unmodified from today. However, we expect the tenants that want full WCBG to replace TCP with MPTCP [16] in their VMs. As MPTCP is a transparent, drop-in replacement for TCP, and very easy to install (just an apt-get on Ubuntu, for example), we do not believe this to be unduly demanding. (In fact, we expect providers to offer VM images with MPTCP pre-installed, so that users do not need to install it themselves.) Furthermore, we stress that installing MPTCP is the only change that users need to make to their VMs and applications: MPTCP handles the splitting of traffic onto the guaranteed and work-conserving paths, while presenting the application with what appears to be a single TCP connection through the standard sockets API. Note that MPTCP has a dual role in our design, enabling both the split between priority queues (guaranteed vs. non-guaranteed traffic) and the split across multiple physical paths.
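To make the tenant-side requirement concrete, here is a minimal sketch (an illustration only, not code from Harmonia or MPTCP): the application simply opens an ordinary TCP connection through the standard sockets API, and a transparent MPTCP stack in the guest kernel, as assumed above, is what splits that connection into subflows over the guaranteed and work-conserving interfaces. The peer address is hypothetical.

```python
# Illustration only: a tenant application needs no changes under Harmonia.
# It uses the ordinary sockets API; a transparent MPTCP stack in the VM
# (as assumed in the text) splits the single logical connection into
# subflows over the guaranteed and work-conserving virtual interfaces.
import socket

def send_blob(host: str, port: int, payload: bytes) -> None:
    # Looks like plain TCP to the application; whether a given packet rides
    # the rate-limited high-priority subflow or an opportunistic low-priority
    # one is decided underneath, by MPTCP.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)

if __name__ == "__main__":
    send_blob("10.0.0.2", 9000, b"x" * 1_000_000)  # hypothetical peer VM
```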
2. PRIOR WORK ON BANDWIDTH GUARANTEES IN THE CLOUD

There is a tradeoff between the benefits of offering bandwidth guarantees and the overhead to the cloud provider. With existing proposals, the overhead dominates; therefore, cloud providers do not offer bandwidth guarantees. We now discuss why existing proposals are so expensive.

Gatekeeper [17] and EyeQ [11] require datacenters with fully provisioned network cores; otherwise, they cannot provide guarantees. However, all cloud datacenters that we are aware of are oversubscribed, and congestion is known to typically occur in the core [5, 20]. Moreover, oversubscription may never disappear: many providers consider spending the money to build networks with higher bisection bandwidth to be wasteful, since average utilization is low.

FairCloud [14] (the PS-P allocation scheme) can be very expensive because it requires a large number of hardware queues on each switch, on the order of the number of hosted VMs. Current commodity switches provide only a few queues.

Finally, ElasticSwitch [15] has a significant CPU overhead, requiring an extra core per VM in the worst case, and one core per 15 VMs on average.

Further, as we show in the next section, prior solutions are not well suited to run on multi-path networks. This is problematic, since most datacenter networks today have multiple paths. Moreover, trying to retrofit support for multiple paths onto these solutions would greatly increase their already large overhead.

3. RESERVING BANDWIDTH ON MULTI-PATH NETWORKS

In this section, we aim to answer the following question: how can bandwidth guarantees be reserved efficiently in a multi-path datacenter network? We first show that single-path reservations are not efficient (§3.1). Next, we describe efficient multi-path reservations, and show that existing solutions are hard to adapt for such reservations (§3.2).

We consider only symmetric, tree-like topologies, such as the fat tree (folded Clos) and its derivatives, e.g., [1, 7]. All current and in-progress datacenter topologies we are aware of belong to this category. We leave the study of other topologies, such as random [19] and small-world [21] graphs and topologies containing servers [9, 10], to future work.
Figure 1: Hose model example. All the VMs A, B, ..., X of one tenant are connected to a single virtual switch, each through a dedicated link of capacity Bwmin_A, Bwmin_B, ..., Bwmin_X.
We focus on the most popular model for offering bandwidth guarantees, the hose model [3, 4, 11, 14, 15, 17]. Fig. 1
shows an example of the hose model, which provides the
abstraction of a single switch that connects all the VMs of
one tenant with dedicated links. The capacity of each virtual link represents the minimum bandwidth guaranteed to
the VM connected to that link.
Most operators today use multiple paths by dynamically
load balancing flows along different paths, e.g., [2, 7]. However, each flow is still physically routed along a single path.
When providing bandwidth guarantees on multi-path networks, cloud providers can start with a similar approach
and always route each VM-to-VM flow across a single path.
When this path is selected at the time of VM placement, we
call this approach static VM-to-VM routing. When it can
be updated during runtime, we call it dynamic VM-to-VM
routing.
Unfortunately, reserving bandwidth based on single-path VM-to-VM routing (either static or dynamic) is fundamentally inefficient and cannot effectively use the network infrastructure, as it leads to duplicate reservations: when each VM-to-VM reservation uses a single path, one VM's hose-model guarantee might need to be duplicated across multiple paths, as we will show. Dynamically reserving VM-to-VM guarantees on individual paths, instead of statically pinning them to a path, could reduce the degree of inefficiency, but would be extremely complex. We discuss these issues in more detail next and show that, for multi-path networks, bandwidth guarantees should be split across multiple paths.
3.1 Single-Path Reservations are Wasteful

To show that single-path reservations are not efficient, we first define the notion of an efficient reservation. On single-path networks (such as single-rooted trees), reserving bandwidth guarantees for the hose model on a link L requires reserving the minimum of the summed guarantees of the nodes in the two halves of the network connected by L [3, 4, 15]. For example, in Fig. 1, assume VMs A and B are on one side of L, and all other VMs are on the other. In this case, the bandwidth reserved on L should be R_L = min(Bwmin_A + Bwmin_B, Bwmin_C + ... + Bwmin_X). R_L is the maximum traffic that can be sent over L under the hose model, and we define an efficient reservation as one that reserves exactly this quantity R_L on every such link L. Any reservation higher than R_L would be wasteful.

The definition of an efficient reservation generalizes to multi-path networks by replacing the single link L with a cut in the network topology: if a set of links S forms a cut, an efficient reservation reserves across S a total of R_S, computed analogously to the single-link case as the minimum of the summed guarantees of the VMs on the two sides of the cut.
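The rule above is easy to state in code. The helper below is a minimal sketch for illustration (not part of Harmonia or any cited system); it computes the efficient hose-model reservation for a link or cut given the guarantees on each side, and the example values are hypothetical.

```python
# Sketch: efficient hose-model reservation on a link (or cut) L.
# The inputs are the per-VM guarantees on the two sides of L; the efficient
# reservation is the smaller of the two sums, since no hose-compliant
# traffic pattern can push more than that across L.
def efficient_reservation(side_a_guarantees, side_b_guarantees):
    return min(sum(side_a_guarantees), sum(side_b_guarantees))

# Example in the spirit of Fig. 1 (values in Mbps, chosen arbitrarily):
# VMs A and B on one side of L, VMs C..X on the other.
print(efficient_reservation([500, 300], [400, 400, 200]))  # -> 800
```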
We now present an example topology and guarantee request for which no efficient single-path reservation exists.

Figure 2: Example showing that reserving bandwidth guarantees using static single-path routing between VMs is inefficient. The red and green boxes represent servers that host VMs that request guarantees. The VMs on a fully shaded server request a total of 1 Gbps. Colored lines represent proposed link reservations.

Example of single-path inefficiency: Fig. 2 shows (half of) a quaternary fat tree [1]. All links are 1 Gbps, which makes this a fully-subscribed network (it is easy to derive similar examples for oversubscribed topologies). There are two tenants in this example, depicted as red and green. Each tenant requests a hose-model guarantee as in Fig. 1. The sum of the hose-model bandwidth guarantees requested by the VMs located on each server is proportional to the fraction of the server that is shaded; VMs on a fully shaded server request a total of 1 Gbps. (Note that the network does indeed have enough capacity to sustain this reservation load: imagine aggregating the capacity of the parallel links of the multi-path tree into a single-path tree, and then mapping all hose models' virtual switches onto the core switch.) For example, the red VMs on H6 request 333 Mbps. Now let us try to find an efficient single-path static reservation.

To begin, consider only the servers H1, H5 and H8 belonging to the red tenant in Fig. 2. The efficient way to connect these VMs in the hose model is to fully reserve a tree embedded in the physical topology. Any other approach requires more reservations. There are four such trees to choose from; without loss of generality we choose the one marked with solid red lines.

Now, consider the green tenant with servers H3 and H7.
Given the tree we just reserved for the red tenant, the efficient reservation must go through A4; we choose the path
marked with the solid green line.
Finally, consider host H6. Satisfying the hose model for the red tenant means satisfying any hose-model-compliant communication pattern. For instance, VMs in H5 could send at 1 Gbps to H1 while VMs in H6 send 333 Mbps to H8. Thus, H6 must have a reservation to H8 that does not intersect H5's reservation to H1. Since E3-A3 is fully reserved, the reservation to H8 must go through A4, as shown by the dotted red line. However, A4-E4 is fully reserved by green. Therefore, this request pattern cannot be satisfied by static single-path reservations. (The argument remains valid no matter what non-zero bandwidth guarantee H6 requests; we chose 333 Mbps as an arbitrary example.)
Fundamentally, single-path reservations are inefficient because placing the reservations for a single VM's different VM-to-VM flows on different paths can lead to duplicate bandwidth reservations. This can be unavoidable when not all reservations fit in the capacity of a single link. For example, in Fig. 2, any single-path reservation for the red tenant would be inefficient (irrespective of the green tenant).
In trying to use single-path routing while providing bandwidth guarantees, providers can go beyond static reservations and dynamically reserve bandwidth. Specifically, when a new VM-to-VM communication starts, a centralized load balancer for guarantees may dynamically assign a path to this communication. This approach suffers from the same fundamental limitation described above (e.g., assume each shaded component in Fig. 2 is a single VM); however, when the number of VMs is large, it could mitigate the effect. Note that the guarantee load balancer needs to be centralized, since it must also perform admission control and needs to know the reservation status of each link that is going to be used by the reservation. However, this approach is significantly more complex than Hedera [2], an already complex and high-overhead load-balancing system. Essentially, a dynamic bandwidth reservation system must implement Hedera's load-balancing algorithm [2], plus Oktopus's admission control algorithm [3], plus ElasticSwitch's guarantee partitioning [15] (or its equivalent in Oktopus). This would lead to an extremely complex and heavyweight system that would not even be work-conserving! Making such a system work-conserving would be an even more challenging task, due to the scale and the high dynamicity of the traffic that needs to be centrally coordinated.
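To give a flavor of just one of these pieces, here is a simplified sketch of the admission-control step such a centralized guarantee load balancer would have to perform before pinning a new VM-to-VM reservation to a path: check and update the residual capacity of every link on that path. This is an illustration only (not taken from Hedera, Oktopus, or ElasticSwitch), and the link names and numbers are hypothetical.

```python
# Simplified sketch of centralized admission control for a single-path
# reservation: 'reserved' tracks committed bandwidth per link, and a request
# is admitted only if every link on the chosen path has enough headroom.
def try_admit(path, demand, reserved, capacity):
    """path: list of link ids; demand and capacities in Mbps."""
    if any(reserved.get(link, 0.0) + demand > capacity[link] for link in path):
        return False                      # would oversubscribe some link
    for link in path:                     # commit the reservation
        reserved[link] = reserved.get(link, 0.0) + demand
    return True

capacity = {"E1-A1": 1000, "A1-C1": 1000}           # hypothetical 1 Gbps links
reserved = {}
print(try_admit(["E1-A1", "A1-C1"], 600, reserved, capacity))  # True
print(try_admit(["E1-A1", "A1-C1"], 600, reserved, capacity))  # False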
3.2 Multi-Path Reservations

We have shown that single-path reservations are inefficient. Now we discuss how to create and use efficient multi-path reservations.

For symmetric, tree-like multi-path topologies, there is an easy method for efficient reservations: reserve bandwidth uniformly across all paths. For example, on a fat tree such as the one presented in Fig. 2, we can apply the Oktopus [3] algorithm on the conceptual single-path tree that results from merging the parallel links on the same level.

For example, for the red tenant in Fig. 2 we would have the following reservations: 0.5 Gbps on links E1-A1 and E1-A2, 0.25 Gbps on all links between the core and the aggregation switches, 0.5 Gbps on E4-A3 and E4-A4, and 1.33/2 Gbps (about 0.67 Gbps) on each of E3-A3 and E3-A4.
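The per-link numbers above follow mechanically from reserving min(traffic below, traffic above) on each conceptual merged link and splitting it evenly over the parallel physical links. The snippet below is a reconstruction of that arithmetic for the red tenant's guarantees in Fig. 2 (1 Gbps each on H1, H5, H8 and 333 Mbps on H6); the fan-out values assume the quaternary fat tree of the example.

```python
# Sketch: reproduce the red tenant's multi-path reservations from Fig. 2.
# Rule: on each merged link, reserve min(guarantees below, guarantees above),
# then split the reservation evenly over the parallel physical links.
red = {"H1": 1.0, "H5": 1.0, "H6": 1.0 / 3, "H8": 1.0}   # hose guarantees, Gbps
total = sum(red.values())

def per_link(below_gbps, parallel_links):
    return min(below_gbps, total - below_gbps) / parallel_links

# Edge switches have 2 uplinks (to the pod's two aggregation switches).
print(round(per_link(red["H1"], 2), 2))               # E1-A1, E1-A2 -> 0.5
print(round(per_link(red["H5"] + red["H6"], 2), 2))   # E3-A3, E3-A4 -> 0.67
print(round(per_link(red["H8"], 2), 2))               # E4-A3, E4-A4 -> 0.5

# Each pod reaches the core over 4 links (2 aggregation switches x 2 uplinks).
print(round(per_link(red["H1"], 4), 2))               # core-aggregation -> 0.25
```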
Since efficient bandwidth reservations require multiple paths,
a VM pair communicating with a single TCP/UDP flow must
have its flow split across multiple paths. Unfortunately, none
of the prior proposals for providing WCBG is easily adaptable to such reservations.
For example, we can try applying prior works designed
for single-path reservations – e.g. ElasticSwitch [15] – to
multi-path reservations by scattering packets uniformly on
all paths. However, this approach becomes infeasible when
there are failures in the network. In this case, uniformly scattering packets no longer works, because the network is no
longer symmetric, and some links will become more loaded/congested than others [6, 16]. Thus, a complex mechanism must be used to keep each link uncongested. This
mechanism must essentially implement something like
MPTCP [16], or like ElasticSwitch’s Rate Allocation [15]
on each path. For providers, implementing such an approach
inside hypervisors is a very complex and resource intensive
task. An alternative could be to rely on new hardware support [6], but this is not currently available.
4. HARMONIA
Our solution, Harmonia, addresses both of the concerns
we have raised: it significantly reduces the overhead for
providers to offer WCBG compared to previous approaches,
and it works on multi-path networks.
At a high level, Harmonia separates out the guaranteed
traffic from the work-conserving traffic by assigning them to
separate priority queues in the network switches. In addition,
both of these traffic types utilize all paths available to them.
Fig. 3 presents an overview of our solution’s architecture,
which can be summarized by four high-level architectural features that we describe next.
Figure 3: Overview of Harmonia's architecture. Each VM has an interface with bandwidth guarantees, whose traffic is rate-limited in the hypervisor and forwarded through high-priority queues, plus work-conserving interfaces whose traffic is forwarded through low-priority queues.
(1) Separate VM interfaces for guaranteed and work-conserving traffic: Harmonia exposes multiple virtual network interfaces to each VM. The main interface (solid blue line in Fig. 3) is associated with guaranteed network bandwidth. (More precisely, rather than having one purely guaranteed interface, we discriminate among pairs of interfaces: only the subflow from the sender's first interface to the receiver's first interface is mapped to a guaranteed, high-priority VLAN.) The other interfaces are for sending work-conserving traffic, i.e., the volume of traffic that exceeds the VM's bandwidth guarantee. This traffic is best-effort; any or all packets may be dropped. The traffic splitting between the guaranteed and work-conserving interfaces is handled by MPTCP; as far as the application knows, its communications are just standard, single-path TCP connections.
(2) Rate Limiting in hypervisors to enforce guarantees:
The high priority paths are only intended for satisfying guarantees; tenants cannot be allowed to use them at will. Therefore, Harmonia must rate-limit the traffic on these paths to
enforce the guarantees. For this purpose we can use any
non-work-conserving solution, such as the centralized Oktopus [3], or the distributed Guarantee Partitioning of ElasticSwitch [15]. Both of these solutions are implemented entirely in the hypervisor, and work with commodity switches.
Providing non-work-conserving guarantees in the hypervisor is much simpler than providing WCBG, because of
the different timescale on which rate-limiters must be updated. When providing WCBG, rate-limiters must be updated whenever conditions change on any link inside the
network, while for non-work-conserving guarantees, rate-limiters need only be updated when traffic demands change
at a given VM.
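As a deliberately naive illustration of the hypervisor-side rate limiting that suffices here, the sketch below splits a VM's hose guarantee equally among its currently active destinations and uses each share as the rate limit for that destination's guaranteed traffic. The equal split is a simplification for the sketch; Oktopus and ElasticSwitch's Guarantee Partitioning compute these shares more carefully. The VM names are hypothetical.

```python
# Naive sketch of guarantee partitioning in the hypervisor: a VM's hose-model
# guarantee is divided among its active destinations, and each share becomes
# the rate limit for that destination's high-priority (guaranteed) traffic.
# Limits change only when the set of active destinations (or the guarantee)
# changes, not whenever conditions change somewhere inside the network.
def partition_guarantee(vm_guarantee_mbps, active_destinations):
    if not active_destinations:
        return {}
    share = vm_guarantee_mbps / len(active_destinations)
    return {dst: share for dst in active_destinations}

limits = partition_guarantee(225, ["vm-b", "vm-c", "vm-d"])
print(limits)   # {'vm-b': 75.0, 'vm-c': 75.0, 'vm-d': 75.0}
```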
(3) Priority queuing in network switches: Most commodity network switches are equipped with a small number of static priority queues. Harmonia uses two of these queues to handle guaranteed and work-conserving traffic separately. Inside each network element, the guaranteed traffic is forwarded using a higher-priority queue, while the work-conserving traffic is forwarded using a lower-priority queue. Thus, work-conserving traffic is forwarded only when a link is not fully utilized by guaranteed traffic. Since our system uses only two priority queues on each switch, we do not expect this to be a barrier to adoption.
(4) Multi-path Support: We need to both (a) utilize
our multi-path reservations, and (b) allow tenants to use the
spare capacity on all paths. To achieve (a), we scatter the
guaranteed traffic uniformly across all the available paths.
Since this traffic is rate-limited and the offered guarantees
do not oversubscribe the network capacity, the guaranteed
traffic should never congest any link in the network. Packet
scattering has been shown to perform well when congestion
is low [6, 16]. Then, to achieve (b) – i.e., to fully utilize
any spare bandwidth – we expose all physical paths to each
VM using “work-conserving interfaces” (one for each path).
The packets on these interfaces will be tagged as lower priority. (Again, note that users will not use these paths explicitly; MPTCP on top of multiple interfaces presents itself as a vanilla TCP implementation through the sockets API.)
For example, consider a multi-rooted fat-tree topology in a datacenter with 8 core switches; this results in a maximum of 8 independent shortest paths between two VMs. In our current architecture, each VM has 3 interfaces exposed to it: a main interface G, and two additional interfaces W1 and W2 for the work-conserving traffic. When a VM sends packets from its G interface to the G interface of another VM, it will appear as if the two VMs are connected to the conceptual dedicated switch of the hose model, with link bandwidths set to the bandwidth guarantee, as shown in Fig. 1. Under the hood, this traffic is rate-limited, scattered across all the available paths, and forwarded using the high-priority queues.

The other 8 combinations of interfaces, i.e., between a G and a W interface or between two W interfaces, are each mapped to the low-priority queue on one of the 8 paths. Thus, we need to expose at least ⌊√N⌋ + 1 virtual interfaces to each VM, where N is the maximum number of distinct paths between two VMs.
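The interface arithmetic can be checked with a few lines of code. The sketch below is an illustration of the mapping just described: with ⌊√N⌋ + 1 interfaces per VM there is one (G, G) sender/receiver pair for guaranteed traffic and at least N remaining pairs, one per low-priority path. The round-robin assignment of pairs to paths is an arbitrary choice made for the sketch.

```python
# Sketch: number of virtual interfaces per VM, and how sender/receiver
# interface pairs map to priorities and paths. N is the number of distinct
# physical paths between two VMs (8 in the example above).
import math
from itertools import product

def num_interfaces(n_paths):
    return math.isqrt(n_paths) + 1        # floor(sqrt(N)) + 1

def pair_mapping(n_paths):
    ifaces = ["G"] + [f"W{i}" for i in range(1, num_interfaces(n_paths))]
    mapping, next_path = {}, 0
    for src, dst in product(ifaces, repeat=2):
        if (src, dst) == ("G", "G"):
            mapping[(src, dst)] = ("high priority", "scattered over all paths")
        else:
            mapping[(src, dst)] = ("low priority", f"path {next_path % n_paths}")
            next_path += 1
    return mapping

for pair, target in pair_mapping(8).items():
    print(pair, "->", target)   # 1 guaranteed pair + 8 work-conserving pairs
```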
To fully utilize all these paths and thus gain work conservation in addition to their guarantees, tenants must use
a multi-path protocol, such as MPTCP [16]. We also envision providers implementing simple forms of multi-path
UDP protocols; otherwise, each UDP flow can only be sent
on one of these paths, either as part of the guaranteed bandwidth, or on one of the work-conserving paths.
To route along multiple paths, we use different VLANs, similar to SPAIN [13]. Each VLAN is rooted at a core switch. Each core switch has two VLANs, at two different priority levels (i.e., one for guaranteed traffic and one for work-conserving traffic). Thus, if the network has 8 core switches, Harmonia requires 16 VLANs. The capabilities of modern switches easily satisfy this requirement. (Standard 802.1Q supports 4094 VLANs, allowing our design to scale to over 2,000 core switches. A typical datacenter fat tree might have 48 core switches, which would correspond to 7 virtual network interfaces in our design; a theoretical 2000-ary fat tree would support 2 billion hosts.)

A kernel module in the hypervisor dynamically tags each packet with a VLAN. Traffic between two guaranteed interfaces is scattered uniformly across all high-priority VLANs, while each work-conserving path is consistently mapped to the same lower-priority VLAN.
Harmonia copes well with failures. The rate limit for the guaranteed traffic of each VM is divided uniformly across all paths. When a link L fails, only the VMs whose traffic is routed through L are affected; moreover, no link in the network can be congested by the guaranteed traffic, which means packet scattering still provides good results. All other VMs, whose traffic does not traverse L, are unaffected by L's failure.
Tenants who do not care about WCBG could join a Harmonia datacenter without being exposed to our MPTCP-and-virtual-interfaces setup: their single normal interface can be treated either as a guaranteed one, giving them non-work-conserving guarantees, or as a work-conserving one, giving them traditional best-effort service. If the best-effort tenants are getting an unacceptably low share of the non-guaranteed bandwidth, providers could switch to three priority queues: (0) guaranteed traffic, (1) traffic of best-effort tenants, and (2) work-conserving traffic of tenants with guarantees. (Note that queues 0 and 2 are the ones discussed up to now.)
5. EVALUATION
We have implemented a prototype of Harmonia and we
have tested it on a small multi-path datacenter testbed with
36 servers. Our topology is similar to the one in Fig. 2; we
use 3 core switches and 6 racks, each with 6 servers. To
implement non-work-conserving bandwidth guarantees (and
rate-limit the guaranteed interfaces), we reuse part of ElasticSwitch [15] (more specifically the Guarantee Partitioning
algorithm).
Our goals for this evaluation are to show that our solution: (1) provides guarantees on multi-path networks, (2)
is work-conserving on multi-path networks and (3) has low
overhead.
To demonstrate the first two goals, we constructed the following experiment. We selected 6 servers in one rack (under
a single ToR switch), and ran 2 VMs on each server: one
receiving a single TCP flow, and the other receiving multiple full-blast UDP flows on all possible paths. All senders
were located on servers in other racks. Each VM was given
a 225Mbps bandwidth guarantee. This fat tree topology had
3 core switches, making a total of 3Gbps available to the 12
receiving VMs; we leave 10% of the capacity unreserved,
hence the 225 Mbps guarantees.
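For completeness, the arithmetic behind that guarantee value is spelled out below (a restatement of the numbers given above, nothing more).

```python
# Worked numbers behind the 225 Mbps guarantee: 3 core switches x 1 Gbps
# gives 3 Gbps into the rack; reserving 90% of that capacity and dividing it
# over the 12 receiving VMs yields 225 Mbps per VM.
core_capacity_mbps = 3 * 1000
reserved_fraction = 0.9
receiving_vms = 12
print(core_capacity_mbps * reserved_fraction / receiving_vms)   # 225.0
```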
Fig. 4 depicts the throughput of the TCP flows as we increase the number of UDP senders. With no UDP competition, the TCP flows fully utilized the available capacity, competing evenly on top of the guarantees. When UDP flows were added, both the UDP and TCP receivers' guarantees were honored, giving them both an average total throughput of roughly 225 Mbps. We repeated the experiment with a varying number of UDP senders, to ensure that a large number of active flows would not detract from a single flow's guarantee.

Figure 4: Harmonia provides bandwidth guarantees in the presence of malicious users. A tenant has a 225 Mbps hose-model bandwidth guarantee across its VMs. The tenant is able to grab the entire bandwidth when there are no other users, and his guarantee is respected even when he is using TCP and many other users blast UDP traffic on all interfaces.

Harmonia adds very little CPU overhead to the servers it runs on, even when managing many flows. We compare Harmonia's CPU overhead with ElasticSwitch's overhead by repeating the ElasticSwitch overhead experiment [15]. This experiment starts a number of long flows, and measures the average CPU utilization of the WCBG system on one machine. Results are shown in Fig. 5.

Figure 5: CPU overhead: Harmonia vs. ElasticSwitch (one-core CPU overhead, in percent, as a function of the number of VM-to-VM flows, from 0 to 250).

Fig. 5 shows that Harmonia markedly improves upon the overhead of ElasticSwitch. Whereas ElasticSwitch quickly becomes a large burden on its servers as more flows are started, Harmonia maintains a fairly small footprint: it stays under 10% of one core on an Intel Xeon X3370 CPU, while ElasticSwitch reaches 50%.
6. SUMMARY
We close with two key design points about Harmonia. (1)
Harmonia splits the task of bandwidth management between
providers and tenants: the provider handles traffic up to the
bandwidth guarantee, while the tenant handles traffic above
it. This simplifies providers’ efforts to deploy guarantees.
We note that our solution is transparent to tenants, who
do not need to change their applications but only need to
use VMs with MPTCP installed (a drop-in replacement for
standard TCP implementations).
(2) This is the first work to consider multi-path reservations for work-conserving bandwidth guarantees in cloud
datacenters. We showed that single-path reservations are inefficient, and existing proposals for bandwidth guarantees
are not easily adaptable to multi-path reservations.
Using experiments in a real datacenter, we showed preliminary results indicating that Harmonia indeed achieves its
goals.
7. REFERENCES
[1] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable,
commodity data center network architecture. In
SIGCOMM. ACM, 2008.
[2] M. Al-Fares, S. Radhakrishnan, B. Raghavan,
N. Huang, and A. Vahdat. Hedera: Dynamic Flow
Scheduling for Data Center Networks. In NSDI, 2010.
[3] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron.
Towards Predictable Datacenter Networks. In ACM
SIGCOMM, 2011.
[4] H. Ballani, K. Jang, T. Karagiannis, C. Kim, D. Gunawardena, et al. Chatty Tenants and the Cloud Network Sharing Problem. In USENIX NSDI, 2013.
[5] T. Benson, A. Akella, and D. A. Maltz. Network
traffic characteristics of data centers in the wild. In
IMC. ACM, 2010.
[6] A. A. Dixit, P. Prakash, Y. C. Hu, and R. R. Kompella.
On the impact of packet spraying in data center
networks. In INFOCOM. IEEE, 2013.
[7] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula,
C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and
S. Sengupta. VL2: A Scalable and Flexible Data
Center Network. ACM SIGCOMM, 2009.
[8] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun,
W. Wu, and Y. Zhang. Secondnet: a data center
network virtualization architecture with bandwidth
guarantees. In CoNEXT. ACM, 2010.
[9] C. Guo, G. Lu, et al. BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers. In ACM SIGCOMM, 2009.
[10] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu.
Dcell: A Scalable and Fault-tolerant Network
Structure for Data Centers. In SIGCOMM, 2008.
[11] V. Jeyakumar, M. Alizadeh, D. Mazières,
B. Prabhakar, C. Kim, and A. Greenberg. EyeQ:
Practical Network Performance Isolation at the Edge.
In USENIX NSDI, 2013.
[12] T. Lam, S. Radhakrishnan, A. Vahdat, and
G. Varghese. NetShare: Virtualizing Data Center
Networks across Services. UCSD TR, 2010.
[13] J. Mudigonda, P. Yalagandula, M. Al-Fares, and J. C.
Mogul. SPAIN: COTS data-center Ethernet for
multipathing over arbitrary topologies. In USENIX
NSDI, 2010.
[14] L. Popa, G. Kumar, M. Chowdhury,
A. Krishnamurthy, S. Ratnasamy, and I. Stoica.
FairCloud: Sharing the Network in Cloud Computing.
In ACM SIGCOMM, 2012.
[15] L. Popa, P. Yalagandula, S. Banerjee, J. C. Mogul,
Y. Turner, and J. R. Santos. ElasticSwitch: Practical
Work-Conserving Bandwidth Guarantees for Cloud
Computing. In SIGCOMM. ACM, 2013.
[16] C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley. Improving Datacenter Performance and Robustness with Multipath TCP. In ACM SIGCOMM, 2011.
[17] H. Rodrigues, J. R. Santos, Y. Turner, P. Soares, and D. Guedes. Gatekeeper: Supporting bandwidth guarantees for multi-tenant datacenter networks. In USENIX WIOV, 2011.
[18] A. Shieh, S. Kandula, A. Greenberg, C. Kim, and B. Saha. Sharing the Data Center Network. In USENIX NSDI, 2011.
[19] A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey. Jellyfish: Networking Data Centers Randomly. In USENIX NSDI, 2012.
[20] S. Kandula, S. Sengupta, A. Greenberg, P. Patel, and R. Chaiken. The Nature of Datacenter Traffic: Measurements & Analysis. In IMC. ACM, 2009.
[21] G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. S. E. Ng, M. Kozuch, and M. P. Ryan. c-Through: Part-time optics in data centers. In SIGCOMM. ACM, 2010.