Network Sharing

The story thus far: Sharing

Omega + Mesos

How to share end-host resources?


Think CPU, Memory, I/O
Different ways to share:

- Fair sharing: the idealist's view.
  - Everyone should get equal access.
- Proportional sharing: ideal for the public cloud.
  - You get access to an amount equal to how much you pay.
- Priority/deadline-based sharing: ideal for the private data center.
  - Companies care about completion times.
What about the network?

Isn’t this important?
Network Caring is Network Sharing

- The network is important to a job's completion time.
- The default form of network sharing is TCP:
  - A vague notion of fair sharing, where fairness is based on individual flows.
  - Work-conserving.
- Per-flow sharing is biased:
  - VMs with many flows get a greater share of the network.
What is the best form of Network Sharing?

- Fair sharing:
  - Per-source-based fairness? Reducers can cheat: many flows to one destination.
  - Per-destination-based fairness? Maps can cheat.
- Fairness === Bad:
  - No one can predict anything.
  - And we like predictability: we want short and predictable latency.
- Min-bandwidth guarantees:
  - Perfect! But:
    - Implementations can lead to inefficiency.
    - How do you predict bandwidth demands?
Congestion Kills Predictability

[Figure: mean, median, and 99th-percentile latency under congestion; 200ms scale]
How can you share the network?

- Endhost sharing schemes:
  - Use default TCP? Never!
  - Change the hypervisor: requires virtualization.
  - Change the endhost's TCP stack: invasive and undesirable.
- In-network sharing schemes:
  - Use queues and rate-limiters: limited to enforcing 7-8 different guarantees.
  - Utilize ECN: requires switches that support the ECN mechanism.
  - Other switch modifications: expensive and highly unlikely, except maybe OpenFlow.
ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing
Lucian Popa, Praveen Yalagandula*, Sujata Banerjee, Jeffrey C. Mogul+, Yoshio Turner, Jose Renato Santos
HP Labs (* Avi Networks, + Google)
Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
   - Tenants can affect each other's traffic:
     - MapReduce jobs can affect the performance of user-facing applications.
     - Large MapReduce jobs can delay the completion of small jobs.
   - Bandwidth guarantees offer predictable performance.
Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
   - Hose model: each of a tenant's VMs gets a bandwidth guarantee (BX, BY, BZ, ...) to a virtual (imaginary) switch VS that connects all of that tenant's VMs (sketched below).
   - Other models, such as TAG [HotCloud'13], are based on the hose model.

[Figure: VMs X, Y, Z of one tenant attached to virtual switch VS with bandwidth guarantees BX, BY, BZ]
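To make the hose model concrete, here is a minimal Python sketch (names and structure are mine, not the paper's code) of its defining property: the guaranteed traffic between any two sets of VMs is capped by the smaller side's aggregate guarantee.

```python
# Sketch of the hose model's defining property (illustrative only).
# Each VM v has a single guarantee B[v] to an imaginary non-blocking switch VS.

def hose_cap(senders, receivers, B):
    """Guaranteed bandwidth from `senders` to `receivers` under the hose model."""
    return min(sum(B[s] for s in senders), sum(B[r] for r in receivers))

B = {"X": 100, "Y": 100, "Z": 100}          # Mbps, one tenant's VMs
print(hose_cap({"X"}, {"Y", "Z"}, B))       # 100: capped by X's own guarantee
print(hose_cap({"X", "Y"}, {"Z"}, B))       # 100: capped by Z's guarantee
```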
Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation
   - Tenants can use spare bandwidth from unallocated or underutilized guarantees.
   - This significantly increases performance:
     - Average traffic is low [IMC09, IMC10].
     - Traffic is bursty.
Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation

[Figure: X→Y bandwidth over time under ElasticSwitch; the rate is pinned at Bmin while everything is reserved and used, and rises above Bmin when there is free capacity]
Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation
3. Be Practical
   - Topology independent: works with oversubscribed topologies.
   - Inexpensive: per-VM/per-tenant queues are expensive → work with commodity switches.
   - Scalable: a centralized controller can be a bottleneck → distributed solution.
     - Hard to partition: VMs can cause bottlenecks anywhere in the network.
Goals

1. Provide Minimum Bandwidth Guarantees in Clouds
2. Work-Conserving Allocation
3. Be Practical
Prior Work

  Scheme                                 | Guarantees       | Work-conserving | Practical
  ---------------------------------------+------------------+-----------------+--------------------------
  Seawall [NSDI'11], NetShare [TR],      | X (fair sharing) | √               | √
  FairCloud (PS-L/N) [SIGCOMM'12]        |                  |                 |
  SecondNet [CoNEXT'10]                  | √                | X               | √
  Oktopus [SIGCOMM'10]                   | √                | X               | ~X (centralized)
  Gatekeeper [WIOV'11], EyeQ [NSDI'13]   | √                | √               | X (congestion-free core)
  FairCloud (PS-P) [SIGCOMM'12]          | √                | √               | X (queue/VM)
  Hadrian [NSDI'13]                      | √                | √               | X (weighted RCP)
  ElasticSwitch                          | √                | √               | √
Outline

- Motivation and Goals
- Overview
- More Details
  - Guarantee Partitioning
  - Rate Allocation
- Evaluation
ElasticSwitch Overview: Operates At Runtime

- Tenant selects bandwidth guarantees. Models: hose, TAG, etc.
- VM setup: VMs are placed and admission control ensures all guarantees can be met, e.g., Oktopus [SIGCOMM'10], Hadrian [NSDI'13], CloudMirror [HotCloud'13].
- Runtime: ElasticSwitch enforces bandwidth guarantees and provides work-conservation.
ElasticSwitch Overview: Runs In Hypervisors

- Resides in the hypervisor of each host.
- Distributed: communicates pairwise, following data flows.

[Figure: several hosts, each with VMs above a hypervisor running ElasticSwitch, all connected by the network]
ElasticSwitch Overview: Two Layers

- Guarantee Partitioning: provides guarantees.
- Rate Allocation: provides work-conservation.
Both layers run in the hypervisor.
ElasticSwitch Overview: Guarantee Partitioning

1. Guarantee Partitioning: turns hose-model guarantees into VM-to-VM pipe guarantees (intra-tenant).
   - VM-to-VM control is necessary; coarser granularity is not enough.

[Figure: virtual switch VS with hose guarantees BX, BY, BZ partitioned into VM-to-VM pipe guarantees BXY and BXZ]

VM-to-VM guarantees → bandwidths as if the tenant communicates on a physical hose network.
ElasticSwitch Overview: Rate Allocation

1. Guarantee Partitioning: turns hose-model guarantees into VM-to-VM pipe guarantees.
2. Rate Allocation: uses rate limiters; increases the rate between X and Y above BXY when there is no congestion between X and Y.
   - Work-conserving allocation of unreserved/unused capacity (inter-tenant).

[Figure: hypervisors at X and Y; X's hypervisor runs a rate limiter enforcing RateXY ≥ BXY]
ElasticSwitch Overview: Periodic Application

- Guarantee Partitioning is applied periodically and on new VM-to-VM pairs; it hands VM-to-VM guarantees down to Rate Allocation.
- Rate Allocation is applied periodically, more often; it feeds demand estimates back to Guarantee Partitioning.
(A schematic of the two loops is sketched below.)
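A toy rendering of this two-timescale structure, purely illustrative: the periods, update rules, and helper names are all assumptions, not the paper's code.

```python
# Schematic of the two control loops (all constants and rules are assumed).
hose = {"X": 100.0, "Y": 100.0}     # per-VM hose guarantees (Mbps)
pairs = [("X", "Y")]                # active VM-to-VM pairs
pair_guar = {}                      # output of Guarantee Partitioning
demand = {("X", "Y"): 10.0}         # demand estimates fed back by Rate Allocation

def gp_round():
    """Slower loop: re-partition hose guarantees using demand estimates."""
    for (s, d) in pairs:
        pair_guar[(s, d)] = min(hose[s], hose[d])   # one pair: no splitting needed

def ra_round(congested):
    """Faster loop: move limiters above the guarantee when uncongested."""
    for pair, b in pair_guar.items():
        rate = b if congested else b * 1.5          # toy increase rule
        demand[pair] = min(demand[pair] * 2, rate)  # toy demand estimate
        print(pair, "limiter:", rate, "Mbps (guarantee", b, "Mbps)")

gp_round()
for c in [False, True]:
    ra_round(congested=c)
```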
Outline

- Motivation and Goals
- Overview
- More Details
  - Guarantee Partitioning
  - Rate Allocation
- Evaluation
Guarantee Partitioning – Overview

[Figure: tenant VMs X, Q, Z, T, Y under virtual switch VS1; X's hose guarantee BX is partitioned into BXZ and BXY; Y's guarantee BY is partitioned across the X→Y, T→Y, Q→Y pairs]

- Max-min allocation.
- Goals:
  A. Safety – don't violate the hose model.
  B. Efficiency – don't waste guarantee.
  C. No Starvation – don't block traffic.
Guarantee Partitioning – Overview

Example: BX = … = BQ = 100Mbps, max-min allocation.

[Figure: X→Z gets 66Mbps; X→Y, T→Y, and Q→Y each get 33Mbps, since Y's guarantee is split three ways]

- Goals:
  A. Safety – don't violate the hose model.
  B. Efficiency – don't waste guarantee.
  C. No Starvation – don't block traffic.
Guarantee Partitioning – Operation

- The hypervisor divides the guarantee of each hosted VM between that VM's VM-to-VM pairs, in each direction (e.g., BX into BX^XZ and BX^XY; BY into BY^XY, BY^TY, BY^QY).
- The source hypervisor then uses the minimum of the source and destination allocations for the pair:

  BXY = min(BX^XY, BY^XY)
Guarantee Partitioning – Safety

BXY = min(BX^XY, BY^XY)

Safety: hose-model guarantees are not exceeded. Because each pipe takes the minimum of the two sides' allocations, the pipes touching a VM can never sum to more than its own hose guarantee.
Guarantee Partitioning – Operation

Example: BX = … = BQ = 100Mbps.

BXY = min(BX^XY, BY^XY)
Guarantee Partitioning – Operation

Example: BX = … = BQ = 100Mbps.

- X splits its guarantee: BX^XZ = 50, BX^XY = 50.
- Y splits its guarantee: BY^XY = 33, BY^TY = 33, BY^QY = 33.
- BXY = min(50, 33) = 33.

(See the sketch below.)
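A minimal sketch of this split-then-min step, reproducing the numbers above (function and variable names are mine, for illustration only):

```python
# Sketch: even split of each hose guarantee, then the min rule per pair.
def split_evenly(B, peers):
    """Divide a VM's hose guarantee evenly across its active VM-to-VM pairs."""
    return {p: B / len(peers) for p in peers}

bx = split_evenly(100.0, ["Z", "Y"])        # X sends to Z and Y -> 50 / 50
by = split_evenly(100.0, ["X", "T", "Q"])   # Y hears from X, T, Q -> 33 / 33 / 33

B_XY = min(bx["Y"], by["X"])                # source takes the min of both sides
print(round(B_XY, 1))                       # 33.3 Mbps, as in the example
```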
Guarantee Partitioning – Efficiency

Example: BX = … = BQ = 100Mbps; BX^XZ = 50, BX^XY = 50, BXY = 33.

(1) What happens when flows have low demands?
The hypervisor divides guarantees max-min based on demands (future demands are estimated from history).
Guarantee Partitioning – Efficiency

(1) What happens when flows have low demands?
(2) How to avoid unallocated guarantees?

[Figure: since BXY is capped at 33 by Y's side, X lowers BX^XY from 50 to 33 and raises BX^XZ from 50 to 66]
Guarantee Partitioning – Efficiency

- The source considers the destination's allocation when the destination is the bottleneck: BX^XY drops to 33, freeing guarantee so that BX^XZ grows to 66.
- Guarantee Partitioning converges.

(A one-round sketch of this re-division follows.)
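Here is one way to render a single re-division round at the source as water-filling; this is my rendering under the constraints described above, not the paper's exact algorithm.

```python
# One re-division round at the source (water-filling sketch, illustrative).
def redivide(B, caps):
    """Max-min split of hose guarantee B across pairs, each capped by what
    the destination side has allocated to it."""
    alloc = {p: 0.0 for p in caps}
    remaining, active = B, set(caps)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        for p in sorted(active):
            give = min(share, caps[p] - alloc[p])
            alloc[p] += give
            remaining -= give
            if caps[p] - alloc[p] <= 1e-9:
                active.discard(p)       # destination-bottlenecked pair is done
    return alloc

# Y caps X->Y at ~33 (Y splits 100 three ways); X->Z is unconstrained.
print(redivide(100.0, {"Y": 100.0 / 3, "Z": float("inf")}))
# {'Y': 33.3..., 'Z': 66.6...}: matches the 33 / 66 split above
```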
Outline

- Motivation and Goals
- Overview
- More Details
  - Guarantee Partitioning
  - Rate Allocation
- Evaluation
Rate Allocation

[Figure: Guarantee Partitioning hands BXY to Rate Allocation, which drives the X→Y rate limiter using congestion data; the limiter rate RXY sits at BXY when the network is fully used and rises above BXY when there is spare bandwidth]
Rate Allocation

RXY = max(BXY, R_TCP-like)

The pair's rate limiter never drops below its guarantee BXY; a TCP-like congestion-controlled rate lets it climb above BXY when there is no congestion. (Sketched below.)
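A toy version of this rule: a simple AIMD update stands in for the paper's TCP-Cubic-like rate, and all constants here are assumptions.

```python
# Toy version of RXY = max(BXY, R_TCP-like); constants are assumed.
def next_rate(r_cc, guarantee, congested, link=1000.0, step=50.0):
    """One limiter update (Mbps): congestion-controlled rate, floored at BXY."""
    r_cc = r_cc / 2 if congested else min(r_cc + step, link)
    return r_cc, max(guarantee, r_cc)

r_cc, b_xy = 100.0, 100.0
for congested in [False, False, False, True, True]:
    r_cc, limit = next_rate(r_cc, b_xy, congested)
    print(f"cc rate {r_cc:6.1f}  limiter {limit:6.1f}")
# The limiter climbs above the guarantee while uncongested, and under
# congestion falls back to the guarantee -- never below it.
```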
[Figure: rate limiter rate (Mbps) over a few seconds; the rate climbs well above the guarantee while spare capacity exists and falls back toward the guarantee when another tenant's traffic arrives]
Rate Allocation

RXY = max(BXY, R_weighted-TCP-like)

The weight is the BXY guarantee. Example: a 1Gbps link L shared by X→Y with BXY = 100Mbps and Z→T with BZT = 200Mbps is split in proportion to the guarantees: RXY = 333Mbps, RZT = 666Mbps.
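The 333/666 split is just the link capacity divided in proportion to the guarantees; a quick check (illustrative helper, my naming):

```python
# Weighted sharing at a fully used link: rates proportional to guarantees.
def weighted_shares(link, guarantees):
    total = sum(guarantees.values())
    return {pair: link * b / total for pair, b in guarantees.items()}

print(weighted_shares(1000.0, {"XY": 100.0, "ZT": 200.0}))
# {'XY': 333.3..., 'ZT': 666.6...}: the 1Gbps link splits 1:2, as above
```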
Rate Allocation – Congestion Detection

- Detect congestion through dropped packets:
  - Hypervisors add and monitor sequence numbers in packet headers (sketched below).
- Use ECN, if available.
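A minimal sketch of drop detection from hypervisor-added sequence numbers; this is illustrative only (it ignores reordering, and the real header format differs).

```python
# Sketch: receiver-side hypervisor spots gaps in per-pair sequence numbers.
class DropDetector:
    def __init__(self):
        self.expected = 0   # next sequence number we expect from this pair
        self.dropped = 0    # running loss count, used as a congestion signal

    def on_packet(self, seq):
        if seq > self.expected:              # gap => packets lost in transit
            self.dropped += seq - self.expected
        self.expected = seq + 1

d = DropDetector()
for seq in [0, 1, 2, 5, 6]:                  # packets 3 and 4 never arrived
    d.on_packet(seq)
print(d.dropped)                             # 2 -> congestion feedback to the source
```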
Rate Allocation – Adaptive Algorithm

- Uses Seawall [NSDI'11] as the rate-allocation algorithm (TCP-Cubic-like).
- Essential improvements (for when using dropped packets):
  - Problem: many flows probing for spare bandwidth affect the guarantees of others.
  - Hold-increase: hold off probing for free bandwidth after a congestion event; the holding time is inversely proportional to the guarantee (see the sketch below).

[Figure: rate increase resumes only after the holding time elapses]
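A sketch of the hold-increase rule; the proportionality constant is an assumption, the inverse relationship is from the slide.

```python
# Hold-increase sketch: holding time inversely proportional to the guarantee.
def holding_time(guarantee_mbps, k=1000.0):
    """Seconds to wait after a congestion event before probing again."""
    return k / guarantee_mbps

def may_probe(now, last_congestion, guarantee_mbps):
    return now - last_congestion >= holding_time(guarantee_mbps)

# A pair with a 500Mbps guarantee resumes probing 5x sooner than a 100Mbps
# pair, so many small flows cannot erode large guarantees by probing.
print(holding_time(100.0), holding_time(500.0))   # 10.0 2.0
print(may_probe(now=3.0, last_congestion=0.0, guarantee_mbps=500.0))  # True
```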
Outline

- Motivation and Goals
- Overview
- More Details
  - Guarantee Partitioning
  - Rate Allocation
- Evaluation
Evaluation – MapReduce

Setup:
- 44 servers, 4x-oversubscribed topology, 4 VMs per server.
- Each tenant runs one job; all VMs of all tenants have the same guarantee.

Two scenarios:
- Light: 10% of VM slots are either a mapper or a reducer, randomly placed.
- Heavy: 100% of VM slots are either a mapper or a reducer; mappers are placed in one half of the datacenter.
Evaluation – MapReduce

[Figure: CDF of worst-case shuffle completion time, normalized to a static reservation]
Evaluation – MapReduce (Light Setup)

[Figure: CDF of worst-case shuffle completion time / static reservation, ElasticSwitch vs. No Protection]

- Longest completion time is reduced relative to No Protection.
- Work-conserving pays off: jobs finish faster than under a static reservation.
Evaluation – MapReduce (Heavy Setup)

[Figure: CDF of worst-case shuffle completion time / static reservation, ElasticSwitch vs. No Protection]

- ElasticSwitch enforces guarantees even in the worst case.
- Guarantees are useful in reducing worst-case shuffle completion time (up to 160x vs. No Protection).
ElasticSwitch Summary

Properties:
1. Bandwidth guarantees: hose model or derivatives.
2. Work-conserving.
3. Practical: oversubscribed topologies, commodity switches, decentralized.

Design: two layers.
- Guarantee Partitioning: provides guarantees by transforming hose-model guarantees into VM-to-VM guarantees.
- Rate Allocation: enables work conservation by increasing rate limits above guarantees when there is no congestion.

HP Labs is hiring!
Future Work

- Reduce overhead: ElasticSwitch uses on average 1 core per 15 VMs, and in the worst case 1 core per VM.
- Multi-path solution: single-path reservations are inefficient, and no existing solution works on multi-path networks.
- VM placement: placing VMs in different locations impacts the guarantees that can be made.
Open Questions

- How do you integrate network sharing with endhost sharing?
- What are the implications of different sharing mechanisms for each other?
- How does the network architecture affect network sharing?
- How do you do admission control?
- How do you detect demand?
- How does payment fit into this question? And if it does, when VMs from different people communicate, who dictates the price, and who gets charged?
ElasticSwitch – Detecting Demand

- Optimize for the bimodal distribution of flows:
  - Most flows are short; a few flows carry most of the bytes.
  - Short flows care about latency; long flows care about throughput.
- Start with a small guarantee for a new VM-to-VM flow.
- If demand is not satisfied, increase the guarantee exponentially (see the sketch below).
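A sketch of that ramp-up policy; the initial value, growth factor, and cap are assumptions, only the "start small, grow exponentially while unsatisfied" shape comes from the slide.

```python
# Exponential ramp-up of a new pair's allocation (constants assumed).
def update_guarantee(current, demand_satisfied, cap, factor=2.0):
    """Grow the allocation geometrically until demand is met or capped."""
    return current if demand_satisfied else min(current * factor, cap)

g = 1.0                                   # Mbps: start small for a new flow
for satisfied in [False, False, False, True]:
    g = update_guarantee(g, satisfied, cap=100.0)
    print(g)                              # 2.0, 4.0, 8.0, then holds at 8.0
```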
Perfect Network Architecture

- What happens in the perfect network architecture?
- Implications:
  - No loss in the network core.
  - Losses only at the edge of the network:
    - Edge uplinks between server and ToR.
    - Or hypervisor-to-VM virtual links.
  - These loss-free core networks are real:
    - VL2 at Azure.
    - Clos at Google.
[Figure: 99.9th-percentile utilization (%) at Core vs. Edge links]
- Hottest storage cluster: 1000x more drops at the Edge than in the Core.
- 16 of 17 clusters: 0 drops in the Core.
- Timescales: over 2 weeks; 99.9th percentile = several minutes.
Transmit/Receive Modules

[Figure: per-VM transmit paths through the hypervisor, each with a rate limiter (host link 8Gb/s; per-destination limits such as 1-2Gb/s) and congestion detectors]

Per-destination rate limiters: engaged only if the destination is congested… bypassed otherwise.
Transmit/Receive Modules

RCP: rate feedback (R) every 10kB (no per-source state needed).

[Figure: a feedback packet carrying "Rate: 1Gb/s" travels back to the sender, which sets its rate limiter accordingly]

Per-destination rate limiters: engaged only if the destination is congested… bypassed otherwise.