MPAT: Aggregate TCP Congestion Management as a Building Block for Internet QoS
Manpreet Singh, Prashant Pradhan* and Paul Francis

Each TCP flow gets equal bandwidth
[Chart: congestion window size vs. time for a red and a blue flow; both converge to the same share]

Our goal: enable bandwidth apportionment among TCP flows in a best-effort network
[Chart: congestion window size vs. time; the red flow is apportioned a larger share than the blue flow]

Transparency
• No network support:
  – ISPs, routers, gateways, etc.
  – Clients unmodified
• TCP-friendliness
  – "Total" bandwidth should be the same
[Charts: congestion window vs. time with and without differentiation; the total bandwidth is the same in both cases]

Why is it so hard?
• The fair share of a TCP flow keeps changing dynamically with time.
[Diagram: server and client connected through a bottleneck link carrying a lot of cross-traffic]

Why not open extra TCP flows?
• pTCP scheme [Sivakumar et al.]
  – Open more TCP flows for a high-priority application
• The resulting behavior is unfriendly to the network
• A large number of flows active at a bottleneck leads to significant unfairness in TCP

Why not modify the AIMD parameters?
• mulTCP scheme [Crowcroft et al.]
  – Use different AIMD parameters for each flow
  – Increase more aggressively on successful transmission.
  – Decrease more conservatively on packet loss.
• Unfair to the background traffic
• Does not scale to larger differentials
  – Large number of timeouts
  – Two mulTCP flows running together try to "compete" with each other

Properties of MPAT
• Key insight: send the packets of one flow through the open congestion window of another flow.
• Scalability
  – Substantial differentiation between flows (demonstrated up to 95:1)
  – Holds fair share (demonstrated up to 100 flows)
• Adaptability
  – Changing performance requirements
  – Transient network congestion
• Transparency
  – Changes only at the server side
  – Friendly to other flows

MPAT: an illustration
Server with two flows to an unmodified client; total congestion window = 10, target ratio 4:1.

          Congestion window   Target 4:1 apportionment
  Flow1   5                   8
  Flow2   5                   2

MPAT: transmit processing
[Diagram: red packets 1-5 fill the red congestion window (cwnd TCP1)]
• Send three additional red packets (6, 7, 8) through the congestion window of the blue flow (cwnd TCP2), alongside blue packets 1 and 2.

MPAT: implementation
Maintain a virtual mapping (sketched in code a few slides below).
• New variable: MPAT window
• Actual window = min(MPAT window, recv window)
• Map each outgoing packet to one of the congestion windows.

  Seqno       Congestion window
  Red 1-5     Red
  Red 6-8     Blue
  Blue 1-2    Blue

MPAT: receive processing
For every ACK received on a flow, update the congestion window through which that packet was sent, using the same seqno-to-window mapping as above.

TCP-friendliness
Invariant: each congestion window experiences the same loss rate.
[Chart: congestion windows of the red flow, the blue flow, and their total vs. time]

MPAT decouples reliability from congestion control
• The red flow is responsible for the reliability of all red packets.
  – (e.g. buffering, retransmission, etc.)
• Does not break the "end-to-end" principle.
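To make the transmit and receive processing above concrete, here is a minimal illustrative sketch in Python. It is not the actual MPAT implementation; the names MpatFlow, MpatAggregate, send_packet and on_ack are hypothetical, and real MPAT does this inside a user-level TCP stack rather than with toy counters.

```python
# Minimal sketch of MPAT's virtual mapping (illustrative only; names such as
# MpatFlow, MpatAggregate, send_packet and on_ack are hypothetical).

class MpatFlow:
    """One TCP flow inside an MPAT aggregate."""
    def __init__(self, name, cwnd):
        self.name = name          # e.g. "red" or "blue"
        self.cwnd = cwnd          # congestion window maintained by normal TCP rules
        self.in_flight = 0        # packets currently charged to this window

class MpatAggregate:
    def __init__(self, flows):
        self.flows = flows
        self.seq_to_window = {}   # (owner, seqno) -> flow whose cwnd carried the packet

    def send_packet(self, owner, seqno):
        """Charge an outgoing packet of `owner` to any window with room.

        The owner flow keeps responsibility for reliability (buffering,
        retransmission); only the congestion accounting is borrowed."""
        for flow in self.flows:
            if flow.in_flight < flow.cwnd:
                flow.in_flight += 1
                self.seq_to_window[(owner.name, seqno)] = flow
                return flow
        return None               # aggregate window exhausted, must wait

    def on_ack(self, owner, seqno):
        """Credit the ACK to the window through which the packet was sent."""
        flow = self.seq_to_window.pop((owner.name, seqno), None)
        if flow is not None:
            flow.in_flight -= 1
            # flow.cwnd then grows or shrinks per normal TCP rules for that flow

# Scenario from the transmit-processing slide: both windows are 5, and the red
# flow sends eight packets, so packets 6-8 ride the blue flow's window.
red, blue = MpatFlow("red", cwnd=5), MpatFlow("blue", cwnd=5)
agg = MpatAggregate([red, blue])
carriers = [agg.send_packet(red, seqno).name for seqno in range(1, 9)]
print(carriers)        # ['red', 'red', 'red', 'red', 'red', 'blue', 'blue', 'blue']
agg.on_ack(red, 6)     # the ACK for red packet 6 credits blue's window
```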
Experimental Setup
• Wide-area network test-bed
  – PlanetLab
  – Experiments over the real Internet
• User-level TCP implementation
• Unconstrained buffer at both ends
• Goal: test the fairness and scalability of MPAT

Bandwidth Apportionment
The MPAT scheme is used to apportion total bandwidth in the ratio 1:2:3:4:5.
[Chart: bandwidth (KBps) of the five flows vs. time elapsed (sec)]
MPAT can apportion available bandwidth among its flows, irrespective of the total fair share (a toy apportionment sketch appears later, after the fairness-across-aggregates table).

Scalability of MPAT
[Chart: achieved differential vs. target differential for MPAT and mulTCP, up to 95]
A 95x differential was achieved in experiments.

Responsiveness
[Chart: relative bandwidth vs. time elapsed (sec); the achieved differential closely tracks the target differential, with an inset zooming in around t = 240 s]
MPAT adapts itself very quickly to dynamically changing performance requirements.

Fairness
• 16 MPAT flows
• Target ratio: 1 : 2 : 3 : … : 15 : 16
• 10 standard TCP flows in the background
[Chart: relative bandwidth vs. time elapsed (sec)]

Applicability in the real world
• Deployment:
  – Enterprise networks
  – Grid applications
• Gold vs. Silver customers
• Background transfers

Sample enterprise network (runs over the best-effort Internet)
[Diagram: New Delhi (application server), San Jose (database server), New York (web server), Zurich (transaction server)]

Background transfers
• Data that humans are not waiting for
  – Non-deadline-critical
• Examples
  – Pre-fetched traffic on the Web
  – File system backup
  – Large-scale data distribution services
  – Background software updates
  – Media file sharing
• Grid applications

Future work
• Benefit short flows:
  – Map multiple short flows onto a single long flow
  – Warm start
• Middle box
  – Avoid changing all the senders
• Detect shared congestion:
  – Subnet-based aggregation

Conclusions
• MPAT is a very promising approach for bandwidth apportionment
• Highly scalable and adaptive:
  – Substantial differentiation between flows (demonstrated up to 95:1)
  – Adapts very quickly to transient network congestion
• Transparent to the network and clients:
  – Changes only at the server side
  – Friendly to other flows

Extra slides…

Reduced variance
[Charts: total bandwidth of an MPAT aggregate with N=16 vs. bandwidth of a single mulTCP flow with N=16, both in KBps over time elapsed (sec)]
MPAT exhibits much lower variance in throughput than mulTCP.

Fairness across aggregates
Bandwidth of 5 MPAT aggregates (N = 2, 4, 6, 8, 10) running simultaneously.
[Chart: bandwidth (KBps) of each aggregate vs. time elapsed (sec)]
Multiple MPAT aggregates "cooperate" with each other.

Multiple MPAT aggregates running simultaneously cooperate with each other:

  N           # Fast retransmits        # Timeouts               Bandwidth (KBps)
              With agg.   No agg.       With agg.   No agg.      With agg.   No agg.
  2 (MPAT)    565         540           44          28           97.6        106.7
  4 (MPAT)    564         542           34          25           98.5        110.9
  6 (MPAT)    555         546           39          27           98.1        108.5
  8 (MPAT)    535         539           38          25           99.3        106.4
  10 (MPAT)   537         531           41          26           97.9        109.5
  5 (TCP)     577         538           41          30           93.6        107.1
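As a toy illustration of the 1:2:3:4:5 apportionment experiment above, the sketch below splits an aggregate window among flows in a target ratio. The function name and the rounding policy are my own assumptions for illustration, not the paper's implementation, which apportions windows inside its user-level TCP stack.

```python
# Toy sketch of apportioning an aggregate window in a target ratio
# (hypothetical helper; the rounding policy is an assumption).

def apportion(total_window, weights):
    """Split `total_window` packets among flows in proportion to `weights`,
    giving any remainder from rounding down to the highest-weight flows."""
    total_weight = sum(weights)
    shares = [total_window * w // total_weight for w in weights]
    leftover = total_window - sum(shares)
    for i in sorted(range(len(weights)), key=lambda i: -weights[i])[:leftover]:
        shares[i] += 1
    return shares

# An aggregate fair share of 30 packets split 1:2:3:4:5 across five flows.
print(apportion(30, [1, 2, 3, 4, 5]))   # -> [2, 4, 6, 8, 10]
```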
Congestion Manager (CM)
Goal: to ensure fairness.
[Diagram: sender-side CM architecture. TCP1-TCP4 use the CM through a data/callback API; the CM's congestion controller, scheduler and flow-integration modules keep per-"aggregate" statistics (cwnd, ssthresh, rtt, etc.) and perform per-flow scheduling, driven by feedback from the receiver]
• An end-system architecture for congestion management.
• CM abstracts all congestion-related information into one place.
• Separates reliability from congestion control.

Issues with CM
[Diagram: five flows (TCP1-TCP5) sharing one Congestion Manager]
• CM maintains one congestion window per "aggregate".
• Unfair allocation of bandwidth to CM flows.

mulTCP
• Goal: design a mechanism to give one flow N times more bandwidth than another.
• TCP throughput = f(α, β) / (rtt · sqrt(p))
  – α: additive increase factor
  – β: multiplicative decrease factor
  – p: loss probability
  – rtt: round-trip time
• Set α = N and β = 1 − 1/(2N)
  – Increase more aggressively on successful transmission.
  – Decrease more conservatively on packet loss.
  – (A rough numerical sketch of the resulting gain appears at the end of these slides.)
• Does not scale with N
  – The loss process induced is much different from that of N standard TCP flows.
  – Unstable controller as N increases.

Gain in throughput of mulTCP
[Chart: gain in throughput of mulTCP]

Drawbacks of mulTCP
• Does not scale with N
  – Large number of timeouts
  – The loss process induced by a single mulTCP flow is much different
• Increased variance with N
  – Amplitude increases with N
• Unstable controller as N grows
  – Two mulTCP flows running together try to "compete" with each other

TCP Nice
• Two-level prioritization scheme
• Can only give less bandwidth to low-priority applications
• Cannot give more bandwidth to deadline-critical jobs
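The mulTCP slide above cites the throughput relation f(α, β) / (rtt · sqrt(p)). As a rough, non-authoritative sketch, the snippet below plugs α = N and β = 1 − 1/(2N) into the standard deterministic AIMD model, where f(α, β) = sqrt(α(1+β) / (2(1−β))); both the model and the helper names are my assumptions, not the mulTCP authors' code.

```python
# Rough sketch under an assumed deterministic AIMD model:
# steady-state throughput ~ f(alpha, beta) / (rtt * sqrt(p)),
# with f(alpha, beta) = sqrt(alpha * (1 + beta) / (2 * (1 - beta))).
from math import sqrt

def f(alpha, beta):
    """AIMD constant: additive increase alpha per RTT, window cut to beta*W on loss."""
    return sqrt(alpha * (1 + beta) / (2 * (1 - beta)))

def multcp_gain(n):
    """Throughput of one mulTCP flow (alpha=N, beta=1-1/(2N)) relative to standard TCP."""
    return f(n, 1 - 1 / (2 * n)) / f(1, 0.5)

for n in (1, 2, 4, 8, 16):
    print(n, round(multcp_gain(n), 2))
# The gain grows roughly linearly with N in this idealized model; in practice a single
# mulTCP flow suffers more timeouts and a noisier loss process as N grows, which is
# the scaling problem MPAT avoids by keeping N real congestion windows.
```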