TCP Nice: A Mechanism for Background Transfers
Arun Venkataramani, Ravi Kokku, Mike Dahlin
Laboratory for Advanced Systems Research (LASR)
Department of Computer Sciences, University of Texas at Austin
July 12, 2016

What are background transfers?
- Data that humans are not waiting for
- Not deadline-critical
- Unlimited demand
Examples:
- Prefetched traffic on the Web
- File system backup
- Large-scale data distribution services
- Background software updates
- Media file sharing

Desired Properties
- Utilization of spare network capacity
- No interference with regular transfers
  - Self-interference: applications hurt their own performance
  - Cross-interference: applications hurt other applications' performance

Goal: Self-tuning protocol
Many systems use magic numbers:
- Prefetch threshold [DuChamp99, Mogul96, ...]
- Rate limiting [Tivoli01, Crovella98, ...]
- Off-peak hours [Dykes01, many backup systems, ...]
Limitations of manual tuning:
- Difficult to balance concerns; the proper balance varies over time
- Burdens application developers: complicates application design, increases the risk of deployment, reduces the benefits of deployment

TCP Nice
Goal: the abstraction of free, infinite bandwidth
- Applications say what they want; the OS manages resources and scheduling
- Self-tuning transport layer
  - Reduces the risk of interference with foreground (FG) traffic
  - Achieves significant utilization of spare capacity for background (BG) traffic
  - Simplifies application design

Outline
- Motivation
- How Nice works
- Evaluation
- Case studies
- Conclusions

Why change TCP?
TCP does network resource management, so flow prioritization belongs there.
Alternative: router prioritization
+ More responsive; a simple one-bit priority
- Hard to deploy
Question: can end-to-end congestion control achieve noninterference and utilization?

Traditional TCP (Reno)
Adjusts the congestion window (cwnd) to compete for a "fair share" of bandwidth (= cwnd/RTT). Packet losses signal congestion:
- Additive increase: no losses → increase cwnd by one packet per RTT
- Multiplicative decrease: one packet lost (triple duplicate ACK) → halve cwnd; multi-packet loss → set cwnd = 1
Problem: the signal arrives after the damage is done.

TCP Nice
- Proactively detects congestion
- Uses increasing RTT as the congestion signal (congestion grows queue lengths, which grows RTTs)
- Responds aggressively to congestion
- Modifies only sender-side congestion control: receiver and network unchanged, TCP friendly

TCP Vegas
- Early congestion detection using RTTs
- Additive decrease on early congestion
- Restricts the number of packets each flow maintains at the bottleneck router
- Small queues, yet good utilization

TCP Nice: basic algorithm
1. Early detection: a threshold queue-length increase, inferred from RTTs
2. Multiplicative decrease on early congestion
3. Maintain small queues, as Vegas does
4.
Allow cwnd < 1.0
Per-ack operation:
- if (curRTT > minRTT + threshold * (maxRTT - minRTT)) numCong++
Per-round operation:
- if (numCong > f * W) W ← W / 2
- else { ... Vegas congestion control ... }

Nice: the works
[Figure: a bottleneck link of bandwidth m pkts/s with queue capacity B; minRTT = t, maxRTT = t + B/m; additive/multiplicative increase and decrease rules for Reno and Nice.]
- Non-interference: getting out of the way in time
- Utilization: maintaining a small queue

Evaluation of Nice
- Analysis
- Simulation micro-benchmarks
- WAN measurements
- Case studies

Theoretical Analysis
- Proves a small bound on interference
- Main result: interference decreases exponentially with bottleneck queue capacity, independent of the number of Nice flows
- Guided algorithmic design: all 4 aspects of the protocol are essential
- Unrealistic model: fluid approximation, synchronous packet drops

TCP Nice Evaluation
Micro-benchmarks (ns simulation):
- Test interference under more realistic assumptions
  - Near-optimal interference: 10-100x less than standard TCP
- Determine whether Nice can get useful throughput
  - 50-80% of spare capacity
- Test under stressful network conditions
- Test under restricted workloads and topologies

TCP Nice Evaluation
Prototype measurements on the WAN:
+ Test interference in the real world
+ Determine whether throughput is available to Nice
- Limited ability to control the system
Results:
- Nice causes minimal interference; standard TCP causes significant interference
- Significant spare bandwidth throughout the day

NS Simulations
Background traffic: long-lived flows
- Nice, rate limiting, Reno, Vegas, Vegas-0
- Ideal: router prioritization
Foreground traffic: Squid proxy trace
- Reno, Vegas, RED
- On/Off UDP
Single-bottleneck topology
Parameters: spare capacity, number of Nice
flows, threshold
Metric: average document transfer latency

Network Conditions
[Plot: document latency (sec, log scale) vs. spare capacity for V0, Reno, Nice, Vegas, and router prioritization.]
Nice causes low interference even when there isn't much spare capacity.

Scalability
[Plot: document latency (sec, log scale) vs. number of BG flows for Vegas, Reno, V0, Nice, and router prioritization.]
W < 1 allows Nice to scale to any number of background flows.

Utilization
[Plot: BG throughput (KB) vs. number of BG flows for Vegas, Reno, router prioritization, V0, and Nice.]
Nice utilizes 50-80% of spare capacity without stealing any bandwidth from FG.

Case study 1: Data Distribution
Tivoli Data Exchange system
- Complexity: thousands of clients, on different kinds of networks (phone line, cable modem, T1)
- Problem: tuning the data sending rate to reduce interference and maximize utilization
- Current solution: a manual setting for each set of clients

Case study 1: Data Distribution
[Plot: ping latency (ms) vs. completion time (sec), Austin to London; manually tuned operating points vs. the Nice point.]

Case study 2: Prefetch-Nice
- Novel non-intrusive prefetching system [USITS'03]
- Eliminates network interference using TCP Nice
- Eliminates server interference using a simple monitor
- Readily deployable: no modifications to HTTP or the network; JavaScript based
- Self-tuning architecture, end-to-end interference elimination

Case study 2: Prefetch-Nice
[Plot: response time vs. aggressiveness for a hand-tuned threshold, an aggressive threshold, and the self-tuning protocol.]
A self-tuning architecture gives near-optimal performance for any setting.

Prefetching Case Study
[Plot: response time (sec) over a cable modem for Reno vs. Nice under prefetching modes None, Conservative, and Aggressive.]

Prefetching Case Study
[Plot: response time (sec) over a phone line for Reno vs. Nice under prefetching modes None, Conservative, and Aggressive.]

Conclusions
- An end-to-end strategy can implement simple priorities
- There is enough usable spare bandwidth out there to be Nicely harnessed
- Nice makes application design easy

TCP Nice Evaluation: Key Results
Analytical:
- Interference is bounded by a factor that falls exponentially with bottleneck queue size, independent of the number of Nice flows
- All three aspects of the algorithm are fundamentally important
Micro-benchmarks (ns simulation):
- Near-optimal interference, up to 10x less than standard TCP
- TCP Nice reaps a substantial fraction of spare bandwidth
Prototype measurements on the WAN:
- TCP Nice: minimal interference; standard TCP: significant interference
- Significant spare bandwidth available throughout the day

Freedom from thresholds
[Plot: document latency (sec) vs. threshold t; interference I ≈ O(e^(-(B/m)(1-t))).]
- Dependence on the threshold is weak: a small threshold gives low interference

Manual tuning
Prefetching: attractive, but restraint is necessary
- Prefetch bandwidth limit [DuChamp99, Mogul96]
- Threshold probability of access [Venk'01]
- Pace packets by guessing the user's idle time [Cro'98]
- Prefetching during off-peak hours [Dykes'01]
Data distribution / mirroring
- Tune data sending rates on a per-user basis
- Windows XP Background Intelligent Transfer Service (BITS): a rate-throttling approach with different rates for different priority levels

Goal: Self-tuning protocol
Hand-tuning may not work
- Difficult to balance
concerns
- Proper balance varies over time
Burdens application developers
- Complicates application design
- Increases risk of deployment
- Reduces benefits of deployment

Nice: the works
[Figure: increase/decrease rule tables for Reno, Vegas, and Nice over a bottleneck of bandwidth m pkts/s and queue capacity B; minRTT = t, maxRTT = t + B/m.]
- Nested congestion triggers: Nice < Vegas < Reno

TCP Vegas
[Figure: expected vs. actual throughput as a function of window size, with bounds a and b.]
- E = W / minRTT (expected throughput)
- A = W / observedRTT (actual throughput)
- Diff = (E - A) * minRTT
- if (Diff < a) W ← W + 1
- else if (Diff > b) W ← W - 1
Additive decrease when packets queue.

Nice features
- Small a, b are not good enough; not even zero!
- Need to detect absolute queue length
- Aggressive backoff is necessary
- The protocol must scale to a large number of flows
Solution:
- Detection: 'threshold' queue build-up
- Avoidance: multiplicative backoff
- Scalability: the window is allowed to drop below 1

Internet measurements
Long Nice and Reno flows; interference metric: flow completion time
[Plot: transfer times (sec) for various foreground/background flow combinations; Nice vs. Reno over a cable modem.]

Internet measurements
Long Nice and Reno flows; interference metric: flow completion time
Extensive measurements over different kinds of networks:
- Phone line
- Cable modem
- Transcontinental link (Austin to London)
- University network (Abilene: UT Austin to Delaware)

Internet measurements
[Plot: transfer times (sec) for various foreground/background flow combinations; Nice vs. Reno, Austin to London.]

Internet measurements
Long Nice and Reno
flows; interference metric: flow completion time
[Plot: transfer times (sec) for FG, BG, and mixed FG/BG flow combinations; Nice vs. Reno over a phone line.]

Internet measurements
Austin to London
[Plot: response time (sec) vs. time of day, May 10 to May 15; Reno vs. Nice.]
Useful spare capacity throughout the day.

Internet measurements
Cable modem
[Plot: response time (sec) vs. time of day, May 13 to May 16; Reno vs. Nice.]
Useful spare capacity throughout the day.

NS Simulations - RED
[Plot: BG throughput (KB) vs. number of BG flows for router prioritization, Nice-0.03, Nice, Vegas, and Reno.]

NS Simulations - RED
[Plot: document latency (sec) vs. spare capacity for Reno, Vegas, Nice, Nice-0.03, and router prioritization.]

Analytic Evaluation
- All three aspects of the protocol are essential
- Main result: interference decreases exponentially with bottleneck queue capacity, irrespective of the number of Nice flows
- Assumptions: fluid approximation model; long-lived Nice and Reno flows; queue capacity >> number of foreground flows
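The per-ack / per-round pseudocode in the deck can be sketched in executable form. This is a minimal illustration, not the authors' implementation: the parameter values (threshold, the round fraction f, the Vegas bounds a and b, and the 1/48 window floor) are assumptions chosen for readability.

```python
# Sketch of TCP Nice's sender-side congestion control, following the
# per-ack / per-round pseudocode in the slides. Parameter values are
# illustrative assumptions, not the authors' settings.

class NiceWindow:
    def __init__(self, threshold=0.2, f=0.5, a=1.0, b=3.0):
        self.threshold = threshold  # fraction of (maxRTT - minRTT) treated as congestion
        self.f = f                  # backoff trigger: marked acks per round > f * W
        self.a, self.b = a, b       # Vegas queue bounds, in packets
        self.min_rtt = float("inf")
        self.max_rtt = 0.0
        self.cwnd = 1.0             # window W, in packets; may drop below 1.0
        self.num_cong = 0           # early-congestion signals seen this round

    def on_ack(self, cur_rtt):
        """Per-ack: update RTT estimates; count early-congestion signals."""
        self.min_rtt = min(self.min_rtt, cur_rtt)
        self.max_rtt = max(self.max_rtt, cur_rtt)
        if cur_rtt > self.min_rtt + self.threshold * (self.max_rtt - self.min_rtt):
            self.num_cong += 1

    def end_round(self, observed_rtt):
        """Per-round (once per RTT): multiplicative decrease on early
        congestion, otherwise Vegas-style additive adjustment."""
        if self.num_cong > self.f * self.cwnd:
            self.cwnd /= 2.0
        else:
            # Vegas: Diff = (E - A) * minRTT, i.e. packets queued at the bottleneck
            diff = self.cwnd * (1.0 - self.min_rtt / observed_rtt)
            if diff < self.a:
                self.cwnd += 1.0
            elif diff > self.b:
                self.cwnd -= 1.0
        self.cwnd = max(self.cwnd, 1.0 / 48)  # fractional window: < 1 pkt per RTT
        self.num_cong = 0
```

The two departures from Vegas are visible here: a multiplicative (not additive) backoff when more than a fraction f of the acks in a round signal queuing, and a window floor below one packet so that even many concurrent Nice flows can keep yielding to foreground traffic.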