Modern congestion control

TCP Congestion Control at the Network Edge
Jennifer Rexford
Fall 2014 (TTh 3:00-4:20 in CS 105)
COS 561: Advanced Computer Networks
http://www.cs.princeton.edu/courses/archive/fall14/cos561/
Original TCP Design
Internet
Wireless and Data-Center Networks
[Figure: a data-center network and a wireless network, each attached to the Internet]
Data-Center Networks
Modular Network Topology
• Containers
• Racks
– Multiple servers
– Top-of-rack switches
Tree-Like Topologies
[Figure: tree-like topology with CR nodes at the top, AR nodes in the middle, and many S nodes at the bottom]
• Many equal-cost paths
• Small round-trip times (e.g., < 250 microseconds)
Commodity Switches
• Low-cost switches
– Especially for top-of-rack switches
• Simple memory architecture
– Small packet-buffer space
– Shared buffer over all input ports
– Simple drop-tail queues
Multi-Tier Applications
[Figure: a front-end server fans out to several aggregators, and each aggregator fans out to many workers]
Application Mix
• Partition-aggregate workflow
– Multiple workers working in parallel
– Straggler slows down the entire system
– Many workers send responses at the same time
• Diverse mix of traffic
– Low latency for short flows
– High throughput for long flows
• Multi-tenancy
– Many tenants sharing the same platform
– Running the network at high levels of utilization
• Small number of large flows on links
TCP Incast Problem
• Multiple workers transmitting to one aggregator
– Many flows traversing the same link
– Burst of packets sent at (nearly) the same time
– … into a relatively small switch memory
• Leading to high packet loss
– Some results are slow to arrive
– May be excluded from the final results
• Developer software changes
– Limit the size of worker responses
– Randomize the sending time for responses
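The incast effect, and why spreading out the responses helps, can be seen in a toy model. The sketch below is a deliberate simplification and not a model of any real switch: time advances in ticks, the output port drains one packet per tick, each worker sends its whole response in a single tick, and a fixed stagger stands in for random jitter so the result is reproducible. The function name and all parameters are hypothetical.

```python
def simulate_incast(start_ticks, pkts_per_worker, buffer_pkts):
    """Return how many packets a drop-tail output queue drops.

    start_ticks[i] is the tick at which worker i sends its whole response.
    The queue holds at most buffer_pkts packets and drains one per tick.
    """
    arrivals = {}
    for t in start_ticks:
        arrivals[t] = arrivals.get(t, 0) + pkts_per_worker

    queue = 0
    dropped = 0
    for t in range(max(start_ticks) + 1):
        incoming = arrivals.get(t, 0)
        accepted = min(incoming, buffer_pkts - queue)
        dropped += incoming - accepted   # drop-tail: the excess is lost
        queue += accepted
        if queue > 0:
            queue -= 1                   # drain one packet per tick
    return dropped


# 8 workers, 10 packets each, into a 32-packet buffer (assumed numbers).
synchronized = [0] * 8                   # every worker answers at once
staggered = [i * 10 for i in range(8)]   # responses spread out in time

drops_sync = simulate_incast(synchronized, 10, 32)   # 48 packets dropped
drops_stag = simulate_incast(staggered, 10, 32)      # 0 packets dropped
```

With synchronized senders, 80 packets hit a 32-packet buffer in one tick; with staggered senders the queue empties between responses.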
Queue Buildup
• Mix of long and short flows
– Long flows fill up the buffers in switches
– … causing queuing delay (and loss) for the short flows
– E.g., queuing delay of 1-14 milliseconds
• Large relative to propagation delay
– E.g., 100 microseconds intra-rack
– E.g., 250 microseconds inter-rack
– Leading to RTT variance and big throughput drop
• Shared switch buffers
– Short flows on one port
– … affected by long flows on other ports
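To see how a full buffer comes to dwarf the propagation delay, queuing delay can be estimated as the queued bytes divided by the link rate. The 1 Gbps link rate and the queue depths below are illustrative assumptions, not figures from the slides.

```python
def queuing_delay_us(queued_bytes, link_gbps):
    """Time to drain queued_bytes at link_gbps, in microseconds."""
    bits = queued_bytes * 8
    return bits / (link_gbps * 1e9) * 1e6


# Assumed example: a 1 Gbps port with a shallow and a deep backlog.
delay_small = queuing_delay_us(125_000, 1.0)     # roughly 1 millisecond
delay_large = queuing_delay_us(1_500_000, 1.0)   # roughly 12 milliseconds

# Ratio to the 100-microsecond intra-rack propagation delay quoted above.
ratio_small = delay_small / 100
```

Even the shallow backlog adds about ten times the intra-rack propagation delay to every short flow queued behind it.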
TCP Outcast Problem
• Mix of flows at two different input ports
– Many inter-rack flows
– Few intra-rack flows
– Destined for same output
• Burst of packet arrivals
– Arriving on one input port
– Causing bursty loss for the other
[Figure: portion of the tree topology, showing AR and S nodes]
• Harmful to the intra-rack flows
– Lose multiple packets
– Loss detected by timeout
– Irony: worse throughput despite lower RTT!
Delayed Acknowledgments
• Sending ACKs can be expensive
– E.g., send 40-byte ACK packet for each data packet
• Delayed ACKs reduce the overhead
– Receiver waits before sending the ACK
– … in the hope of piggybacking the ACK on a response
• Delayed-ACK mechanism
– Set a timer when the data arrives (e.g., 200 msec)
– Piggyback the ACK or send ACK for every other packet
– … or send an ACK after the timer expires
• A 200 msec delayed-ACK timeout is an eternity at data-center RTTs!
– Disable delayed ACKs, or shorten the timer
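The delayed-ACK mechanism above can be sketched as a small receiver-side policy. This is a minimal illustration under stated assumptions, not the logic of any particular TCP stack: the class and method names are hypothetical, time is passed in explicitly as milliseconds, and sequence numbers are ignored.

```python
class DelayedAckReceiver:
    """Toy policy: ACK every second segment, piggyback, or fire on a timer."""

    def __init__(self, timeout_ms=200):
        self.timeout_ms = timeout_ms
        self.unacked = 0          # data segments received but not yet ACKed
        self.deadline_ms = None   # when a pending ACK must go out on its own

    def on_data(self, now_ms):
        """Data segment arrives; return True if an ACK is sent right away."""
        self.unacked += 1
        if self.unacked >= 2:     # ACK every other packet
            self._reset()
            return True
        self.deadline_ms = now_ms + self.timeout_ms
        return False

    def on_response(self):
        """Application sends data back; return True if an ACK rides along."""
        had_pending = self.unacked > 0
        self._reset()
        return had_pending

    def on_tick(self, now_ms):
        """Return True if the timer expired and a standalone ACK is sent."""
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            self._reset()
            return True
        return False

    def _reset(self):
        self.unacked = 0
        self.deadline_ms = None


rx = DelayedAckReceiver(timeout_ms=200)
first = rx.on_data(now_ms=0)      # False: wait for more data or a response
second = rx.on_data(now_ms=1)     # True: second segment triggers the ACK
lone = rx.on_data(now_ms=10)      # False: a single segment is left pending
early = rx.on_tick(now_ms=100)    # False: timer has not expired yet
late = rx.on_tick(now_ms=210)     # True: ACK goes out 200 msec after the data
```

The lone segment at the end is the problem case: its ACK is held for the full timer, which is hundreds of round trips at data-center RTTs.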
Data-Center TCP (DCTCP)
• Key observation
– TCP reacts to the presence of congestion
– … not to the extent of congestion
• Measuring extent of congestion
– Mark packets when buffer exceeds a threshold
• Reacting to congestion
– Reduce cwnd in proportion to fraction of marked packets
• Benefits
– React early, as queue starts to build
– Prevent harm to packets on other ports
– Get workers to reduce sending rate early
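The reaction to congestion can be sketched in a few lines. The update rule below follows the published DCTCP description: keep a running estimate alpha of the fraction of marked packets, and on a window containing marks scale cwnd by (1 - alpha/2). The gain g = 1/16, the initial alpha of 0, the plain additive increase, and the class and method names are assumptions made for illustration; slow start and loss handling are left out.

```python
class DctcpSender:
    """Minimal DCTCP-style window adjustment (a sketch, not a full sender)."""

    def __init__(self, cwnd=10.0, g=1 / 16):
        self.cwnd = cwnd      # congestion window, in packets
        self.g = g            # weight given to the newest observation (assumed)
        self.alpha = 0.0      # running estimate of the marked fraction

    def on_window_acked(self, acked_pkts, marked_pkts):
        """Process one window's worth of ACKs and return the new cwnd."""
        frac = marked_pkts / acked_pkts if acked_pkts else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked_pkts > 0:
            # Cut in proportion to the extent of congestion.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1.0  # ordinary additive increase
        return self.cwnd


heavy = DctcpSender(cwnd=100.0)
heavy.on_window_acked(acked_pkts=100, marked_pkts=100)   # cwnd becomes 96.875

light = DctcpSender(cwnd=100.0)
light.on_window_acked(acked_pkts=100, marked_pkts=10)    # a much smaller cut
```

If every packet keeps getting marked, alpha approaches 1 and the cut approaches the familiar halving; with only a few marks the window barely shrinks.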
Poor Multi-Path Load Balancing
• Multiple shortest paths between pairs of hosts
– Spread the load over multiple paths
• Equal-cost multipath
– Round robin
– Hash-based
• Uneven load
– Elephant flows congest some paths
– … while other paths are lightly loaded
• Reducing congestion
– Careful routing of elephant flows
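Hash-based equal-cost multipath, and the uneven load it can produce, can be sketched as follows. CRC32 stands in here for whatever hash a real switch uses, and the function names, addresses, and flow sizes are all hypothetical.

```python
import zlib


def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    """Pick a path index by hashing the flow's five-tuple."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_paths


def path_load(flows, num_paths):
    """flows is a list of (five_tuple, size_bytes); return bytes per path."""
    load = [0] * num_paths
    for five_tuple, size in flows:
        load[ecmp_path(*five_tuple, num_paths)] += size
    return load


flows = [
    (("10.0.0.1", "10.0.1.1", 40000, 80, "tcp"), 1_000_000_000),  # elephant
    (("10.0.0.2", "10.0.1.1", 40001, 80, "tcp"), 10_000),         # mouse
    (("10.0.0.3", "10.0.1.1", 40002, 80, "tcp"), 10_000),         # mouse
    (("10.0.0.4", "10.0.1.1", 40003, 80, "tcp"), 10_000),         # mouse
]
load = path_load(flows, num_paths=4)
```

Because the hash sees only packet headers, every packet of a flow follows the same path, which avoids reordering. The hash knows nothing about flow size, though, so whichever path the elephant lands on carries nearly all the bytes.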
Wireless Networks
TCP Design Setting
• Relatively low packet loss
– E.g., hopefully less than 1%
– Okay to retransmit lost packets from the sender
• Loss is caused primarily by congestion
– Use loss as an implicit signal of congestion
– … and reduce the sending rate
• Relatively stable round-trip times
– Use RTT estimate in retransmission timer
• End-points are always on
• Stable end-point IP addresses
– Use IP addresses as end-point identifiers
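The dependence of the retransmission timer on a stable RTT can be made concrete with the standard smoothed estimator (the one specified in RFC 6298, with gains 1/8 and 1/4). The sketch below leaves out clock granularity and Karn's algorithm, and the class name and the `min_rto` parameter are illustrative assumptions.

```python
class RtoEstimator:
    """Smoothed RTT and variance estimator for the retransmission timer."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation

    def __init__(self, min_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto   # lower bound on the timer, in seconds

    def update(self, sample):
        """Feed one RTT sample (seconds) and return the new timeout."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Variation is updated first, using the old smoothed RTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return max(self.min_rto, self.srtt + 4 * self.rttvar)


stable = RtoEstimator(min_rto=0.0)
jittery = RtoEstimator(min_rto=0.0)
for i in range(20):
    rto_stable = stable.update(0.1)                          # steady 100 ms
    rto_jittery = jittery.update(0.05 if i % 2 else 0.5)     # 50 to 500 ms
```

With steady samples the timeout settles close to the RTT itself. With wildly varying samples the variance term dominates and the timeout balloons, so a real loss takes far longer to detect.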
Problems in Wireless Networks
• Limited bandwidth
• High latencies
• High bit-error rates
• Temporary disconnections
• Slow handoffs
• Mobile device disconnects to save energy, bearers, etc.
Link-Level Retransmission
• Retransmit over the wireless link
– Hide packet losses from the end hosts
– … by retransmitting lost packets on the wireless link
– Works for any transport protocol
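A rough calculation shows how much loss link-level retransmission can hide. The sketch assumes each transmission attempt is lost independently with the same probability, which is optimistic because wireless losses tend to come in bursts. The function name and the numbers are illustrative.

```python
def residual_loss(link_loss, max_retries):
    """Probability that a packet is still lost after all link-level retries.

    Assumes independent losses: the packet is lost only if the original
    transmission and every one of the max_retries retransmissions fail.
    """
    return link_loss ** (max_retries + 1)


raw = residual_loss(0.10, max_retries=0)      # no link-level retransmission
hidden = residual_loss(0.10, max_retries=3)   # up to three local retries
```

Under these assumptions a 10% raw loss rate drops to roughly 0.01% as seen by the end hosts, at the cost of extra delay on the wireless hop.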
Split Connection
• Two TCP connections
– Between the fixed host and the base station
– Between the base station and the mobile device
• Other optimizations
– Compression, just-in-time delivery, etc.
Burst Optimization
• Radio wakeup is expensive
– Wake up
– Establish a bearer
– Use battery and signaling resources
• Burst optimization
– Send bigger chunks less often
– … to allow the mobile device to go to idle state
Lossless Handover
• Mobile moves from one base station to another
– Packets in flight still arrive at the old base station
– … and could lead to bursty loss (and TCP timeout)
• Old base station can buffer packets
– Send buffered packets to the new base station
Freezing the Connection
• Mobile device can predict temporary disconnection
– E.g., fading, handoff
• Mobile can ask the fixed host to stop sending
– Advertise a receive window of 0
• Benefits
– Avoids wasted transmission of data
– Avoids loss that triggers timeouts, decrease in cwnd, etc.
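Why a zero receive window freezes the sender without disturbing its congestion state can be seen from the usual sending rule: the sender may have at most min(cwnd, rwnd) bytes in flight. The function below is a minimal sketch of that rule, with hypothetical names and byte counts.

```python
def sendable_bytes(cwnd, rwnd, in_flight):
    """Bytes the sender may still transmit right now.

    The usable window is the smaller of the congestion window (cwnd)
    and the receiver's advertised window (rwnd), minus unACKed data.
    """
    return max(0, min(cwnd, rwnd) - in_flight)


normal = sendable_bytes(cwnd=20_000, rwnd=65_535, in_flight=5_000)   # 15000
frozen = sendable_bytes(cwnd=20_000, rwnd=0, in_flight=5_000)        # 0
resumed = sendable_bytes(cwnd=20_000, rwnd=65_535, in_flight=0)      # 20000
```

While the window is zero nothing new is sent, so nothing is lost. The sender's cwnd is untouched throughout, so it can resume at full rate once the mobile device advertises a non-zero window again.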
Discussion
CUBIC paper