ppt - Bob Briscoe

advertisement
Network Performance Isolation in Data
Centres using ConEx Congestion Policing
draft-briscoe-conex-policing-01
draft-briscoe-conex-data-centre-02
Bob Briscoe
Chief Researcher, BT
IRTF DCLC Jul 2014
Bob Briscoe’s work is part-funded by the European Community
under its Seventh Framework Programme through the
Trilogy 2 project (ICT-317756)
© British Telecommunications plc
1
purpose of talk
• work proposal for the data centre latency control r-g
– data centre queuing delay control
– designed for global scope (inter-data-centre,... Inter-net)
– this talk: adds first step: intra-data centre
• without any new protocols
• started in the IETF congestion exposure (ConEx) w-g
• generalised for initial deployment without ConEx
– and even without ECN end-to-end
– now even without ECN on switches (in slides, not draft)
© British Telecommunications plc
2
Network Performance Isolation
in Data Centres
• An important problem
– isolating between tenants, or departments
– virtualisation isolates CPU / memory / storage
– but network and I/O system
is highly multiplexed & distributed
bit-rate
• SDN-based (edge) capacity partitioning*
– configuration churn: nightmare at scale
– poor use of capacity
bit-rate
time
• edge-based weighted round robin (or WFQ)
– More common
– but biases towards heavy hitters (no concept of time)
© British Telecommunications plc
* every problem in computer science can be solved by not thinking
then hiding the resulting
mess under a layer of abstraction
3
time
Outline Design – First Step
edge bottlenecks by capacity design
• Edge policing like Diffserv
– but congestion policing (per guest)
• isolation within FIFO queue
w
VM sender
VM receiver
congestion policer
guest OS
hypervisor
switching
hosts
switches
• no config on switches
© British Telecommunications plc
4
bottleneck congestion policer
• in a well-provisioned link, policer rarely intervenes
• but whenever needed, it limits queue growth
FIFO
incoming
packet
stream
Y
AQM
p(t)
w1
congestion
token
bucket
network
buffer
policer
w2
wi
…
di(t)
outgoing packet stream
foreach pkt {
i = classify_user(pkt)
di += wi*(tnow-ti) //fill
ti = tnow
di -= s * p
//drain
if (di<0) {drop(pkt)}
}
ci
…
s: packet size
p: drop prob of AQM
meter
© British Telecommunications plc
5
actually each bucket needs to be two buckets
to limit bursts of congestion
policer
AQM
police if either bucket empties
kwi
• similar code
di(t)
C
congestion
burst limiter
p(t)
wi
– except 2 token buckets
ci
main congestion
token bucket
meter
© British Telecommunications plc
6
...
if (di1<0 || di2<0)
{drop(pkt)}
performance isolation outcome
• WRR or WFQ
rate
time
• congestion policer
rate
– with unequal traffic loads
time
• congestion policer
rate
– treats equal traffic loads
equivalently to WRR
time
© British Telecommunications plc
7
Outline Design
edge and core queue control
• Edge policing like Diffserv
– but congestion policing (per-guest)
• Hose model
• intra-class isolation in all FIFO queues
w
VM sender
VM receiver
TEP / congestion policer
TEP / audit
guest OS
hypervisor
switching
hosts
switches
• FIFO ECN marking on L3 switches
• no other config on switches
© British Telecommunications plc
8
trusted path congestion feedback
• Initial deployment
– all under control of infrastructure admin
• ECN on guest hosts: optional
transport
sender
ConEx
packets
policer
transport
receiver
audit
– ECN enabled across tunnel
infrastructure
• ConEx on guest hosts: optional
– any ConEx-enabled packet
doesn’t require tunnel feedback
transport
sender
TEP
policer
Non-ConEx
packets
feedback between
tunnel endpoints
• details – see spare slide or draft
infrastructure
© British Telecommunications plc
9
transport
receiver
TEP
Features of Solution
• Network performance isolation between tenants
• No loss of LAN-like multiplexing benefits
• work-conserving
•
•
•
•
Zero (tenant-related) switch configuration
No change to existing switch implementations
Weighted performance differentiation
Simplest possible contract
• per-tenant network-wide allowance
• tenant can freely move VMs around without changing allowance
• sender constraint, but with transferable allowance
• Transport-Agnostic
• Extensible to wide-area and inter-data-centre interconnect
© British Telecommunications plc
10
call for interest
• implementation in hypervisors
• evaluation
© British Telecommunications plc
11
Network Performance Isolation in Data
Centres using congestion policing
draft-briscoe-conex-policing-01
draft-briscoe-conex-data-centre-02
Q&A
& spare slides
measuring contribution to congestion
= bytes weighted by congestion level
= bytes dropped (or ECN-marked)
= ‘congestion-volume’
as simple to measure as volume 1%
bit-rate
time
congestion
0.01%
0.01% congestion
1% congestion
3MB
300MB
100MB
10GB
1MB
1MB
© British Telecommunications plc
13
time
unilateral deployment technique for data centre operator
• exploits:
• widespread edge-edge tunnels in multi-tenant DCs to isolate forwarding
• a side-effect of standard tunnelling (IP-in-IP or any ECN link encap)
D
S
00  10
E
C
N
ingress
1
D
S
policer
E
C
N
D
S
E
C
N
11  drop
congested
network 10  11
element
2
D
S
E
C
N
E
• for e2e transports that don’t
support ECN, the operator can:
1. at encap: alter 00 to 10 in outer
2. at interior buffers: turn on ECN
• defers any drops until egress E
• audit A just before egress
can see packets to be dropped
14
IPFIX
egress
A
D
S
E
C
N
audit
E
C
N
D
S
meter
collector
E
C
N
D
S
D
S
E
C
N
3
exporter
• for e2e transports that don’t
support ConEx, the operator can
create its own trusted feedback:
3. at decap: only for Not-ConEx
packets, feedback aggregate
congestion marking counters:
• CE outer, Not-ECT inner = loss
• CE outer, ECT inner = ECN
Download