FAST TCP
Steven Low
CS/EE, netlab.caltech.edu
Oct 2003

FAST Protocols for Ultrascale Networks
  Internet: distributed feedback control system
    TCP: adapts sending rate to congestion
    AQM: feeds back congestion information
  Poster panels: Theory, Experiment, Implementation, People
  [Figure: feedback loop; TCP sources map q to x, AQM links map y to p, coupled through Rf(s) and Rb'(s)]
  [Equations: source law for flow i (a tan^-1 form, garbled in extraction) and link price law $\dot{p}_l(t) = \frac{1}{c_l}\,(y_l(t) - c_l)$]
  [Figure: research & production networks; WAN in Lab (Caltech), Calren2/Abilene, StarLight (Chicago), CERN (Geneva), SURFNet (Amsterdam); link labels 155Mb/s and 10Gb/s; multi-Gbps, 50-200ms delay]
  [Figure: state diagram; slow start, equilibrium, FAST retransmit, FAST recovery, time out]
  Faculty: Doyle (CDS,EE,BE), Low (CS,EE), Newman (Physics), Paganini (UCLA)
  Staff/Postdoc: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
  Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
  Industry: Doraiswami (Cisco), Yip (Cisco)
  Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
  netlab.caltech.edu/FAST

Outline
  Motivation
  Network model
  FAST TCP
  Equilibrium
  Stability
  Experiments
  TCP/IP
  [Figure: protocol stack; Applications (WWW, Email, Napster, FTP, …), TCP/AQM, IP, Transmission (Ethernet, ATM, POS, WDM, …)]

High Energy Physics
  Large global collaborations
    2000 physicists from 150 institutions in >30 countries
    300-400 physicists in US from >30 universities & labs
  SLAC has 500TB data by 4/2002, world's largest database
  Typical file transfer ~1 TB
    At 622Mbps: ~4 hrs
    At 2.5Gbps: ~1 hr
    At 10Gbps: ~15min
  Gigantic elephants!
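The transfer times quoted on the High Energy Physics slide follow from dividing file size by link rate. The sketch below reproduces that arithmetic under two assumptions that are not stated on the slide: a decimal terabyte (10^12 bytes) and a link that is fully utilized with no protocol overhead. The function name `transfer_time_seconds` is illustrative only.

```python
def transfer_time_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time: total bits divided by link rate (no overhead assumed)."""
    return size_bytes * 8 / rate_bps


ONE_TB = 1e12  # assumption: decimal terabyte

# Link rates taken from the slide.
RATES = [("622 Mbps", 622e6), ("2.5 Gbps", 2.5e9), ("10 Gbps", 10e9)]

for label, rate in RATES:
    seconds = transfer_time_seconds(ONE_TB, rate)
    print(f"{label}: {seconds / 3600:.2f} h ({seconds / 60:.0f} min)")
```

The ideal figures come out slightly below the rounded values on the slide (for example about 13 minutes rather than ~15 at 10Gbps), which is what one would expect once any overhead or less-than-full utilization is allowed for.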
LHC (Large Hadron Collider) at CERN, to open 2007
  Generate data at PB ($10^{15}$ B)/sec
  Filtered in realtime by a factor of $10^6$ to $10^7$
  Data stored at CERN at 100MB/sec
  Many PB of data per year
  To rise to Exabytes ($10^{18}$ B) in a decade

HEP high speed network … that must change

HEP Network (DataTAG)
  [Figure: network map; New York, ABILENE, ESNET, CALREN, STARLIGHT, STAR-TAP, GENEVA, GEANT, UK SuperJANET4, It GARR-B, NL SURFnet, Fr Renater]
  2.5 Gbps Wavelength Triangle 2002
  10 Gbps Triangle in 2003
  Newman (Caltech)

Performance at large windows
  DataTAG Network: CERN (Geneva) – StarLight (Chicago) – SLAC/Level3 (Sunnyvale)
  ns-2 simulation: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows
    J. Wang (Caltech, June 02)
  Experiment: capacity = 1Gbps; 180 ms round trip latency; 1 flow
    C. Jin, D. Wei, S. Ravot, etc (Caltech, Nov 02)
  [Figure: average utilization; series Linux TCP (txq=100), Linux TCP (txq=10000), FAST (txq=100); labels 19%, 27%, 95%; axis marks 1G, 10Gbps]

Outline
  Motivation, Network model, FAST TCP, Equilibrium, Stability, Experiments, TCP/IP

Congestion Control
  [Figure: timing diagram between Source and Destination; data packets 1, 2, …, W sent per RTT, ACKs returned]
  ~ W packets per RTT
  Lost packet detected by missing ACK
  Congestion signal: delay and loss

Congestion control
  Source rates $x_i(t)$, congestion measure $p_l(t)$
  Example congestion measure $p_l(t)$
    Loss (Reno)
    Queueing delay (Vegas)

TCP/AQM
  TCP ($x_i(t)$): Reno, Vegas
  AQM ($p_l(t)$): DropTail, RED, REM/PI, AVQ
  Congestion control is a distributed asynchronous algorithm to share bandwidth
  It has two components
    TCP: adapts sending rate (window) to congestion
    AQM: adjusts & feeds back congestion information
  They form a distributed feedback control system
    Equilibrium & stability depend on both TCP and AQM
    And on delay, capacity, routing, #connections

Network model
  [Figure: feedback loop; TCP sources F1 … FN map q to x, forward routing Rf(s) maps x to y, AQM links G1 … GL map y to p, backward routing Rb'(s) maps p to q]
  $[R_f(s)]_{li} = e^{-\tau^{f}_{li} s}$ if source i uses link l
  $[R_b(s)]_{li} = e^{-\tau^{b}_{li} s}$ if source i uses link l

Vegas model
  for every RTT {
    if W/RTTmin - W/RTT < α then W++
    if W/RTTmin - W/RTT > α then W--
  }
  [slide label on the pseudocode: queue size]
  Fi:
    $\dot{x}_i = \frac{1}{T_i^2(t)}$ if $x_i(t)\,q_i(t) < \alpha_i d_i$
    $\dot{x}_i = -\frac{1}{T_i^2(t)}$ if $x_i(t)\,q_i(t) > \alpha_i d_i$
    $\dot{x}_i = 0$ else
  Gl:
    $\dot{p}_l = \frac{1}{c_l}\,(y_l(t) - c_l)$
  $q_i$: E2E queueing delay
  $p_l$: link queueing delay

Vegas model
  [Figure: same feedback loop as in the network model]
  Fi: $\dot{x}_i = \frac{1}{T_i^2(t)}\,\mathrm{sgn}\!\left(1 - \frac{x_i(t)\,q_i(t)}{\alpha_i d_i}\right)$
  Gl: $\dot{p}_l = \frac{y_l(t)}{c_l} - 1$

Outline
  Motivation, Network model, FAST TCP, Equilibrium, Stability, Experiments, TCP/IP

Methodology
  Protocol (Reno, Vegas, RED, REM/PI…)
    $x(t+1) = F(p(t), x(t))$
    $p(t+1) = G(p(t), x(t))$
  Equilibrium
    Performance: throughput, loss, delay
    Fairness
    Utility
  Dynamics
    Local stability
    Cost of stabilization

Model
  Network
    Links l of capacities $c_l$
  Sources s
    L(s) - links used by source s
    $U_s(x_s)$ - utility if source rate = $x_s$
  [Figure: two links with capacities $c_1$, $c_2$ and three sources $x_1$, $x_2$, $x_3$]
  $x_1 + x_2 \le c_1$
  $x_1 + x_3 \le c_2$

Summary: duality model
  Flow control problem (Kelly, Maulloo, Tan 98)
    $\max_{x_s \ge 0} \sum_s U_s(x_s)$ subject to $Rx \le c$
  Primal-dual algorithm
    $x(t+1) = F(R^T p(t), x(t))$   (Reno, Vegas)
    $p(t+1) = G(p(t), Rx(t))$   (DropTail, RED, REM)
  TCP/AQM
    Maximize utility with different utility functions
  Result (L 00): $(x^*, p^*)$ primal-dual optimal iff
    $y_l^* \le c_l$ with equality if $p_l^* > 0$

Example utility functions
  Reno-1: $U_i(x_i) = \frac{\sqrt{3/2}}{T_i}\,\tan^{-1}\!\left(\sqrt{2/3}\; x_i T_i\right)$
  Reno-2: $U_i(x_i) = \frac{1}{T_i}\,\log\frac{x_i T_i}{2 x_i T_i + 3}$
  Vegas: $U_i(x_i) = \alpha_i \log x_i$
  General: $U_i(x_i) = (1-\alpha)^{-1} x_i^{1-\alpha}$ if $\alpha \ne 1$, and $\log x_i$ if $\alpha = 1$

Game interpretation
  Source s:
    $\max_{x_s \ge 0}\; U_s(x_s) - x_s \sum_l R_{ls}\, p_l$
    $x_s(t+1) = U_s'^{-1}\!\left(\sum_l R_{ls}\, p_l(t)\right)$
  Link l:
    $\max_{p_l \ge 0}\; p_l \left(\sum_s R_{ls}\, x_s - c_l\right)$
    $p_l(t+1) = \left[\,p_l(t) + \gamma_l \left(\sum_s R_{ls}\, x_s(t) - c_l\right)\right]^+$

Synchronous convergence
  Theorem (L & Lapsley 99)
  Provided R has full row rank & $U_s$ strictly concave:
  Gradient projection algorithm of dual problem
  Converges to optimal primal-dual
  solutions if [step-size condition on $\gamma_l$ involving S and L; garbled in extraction]
  Limit point: unique Pareto optimal Nash equilibrium

Asynchronous convergence
  Sources and links update & compute
    at different times
    with different frequencies
    using delayed info
  Theorem (L & Lapsley 99)
  Converges in asynchronous environment with smaller step size

Equilibrium of Vegas
  Network
    Link queueing delays: $p_l$
    Queue length: $c_l p_l$
  Sources
    Throughput: $x_i$
    E2E queueing delay: $q_i$
    Packets buffered: $x_i q_i = \alpha_i d_i$
    Utility function: $U_i(x) = \alpha_i d_i \log x$
    Proportional fairness

Validation (L. Wang, Princeton)
  Source rates (pkts/ms)
    #   src1          src2          src3          src4          src5
    1   5.98 (6)
    2   2.05 (2)      3.92 (4)
    3   0.96 (0.94)   1.46 (1.49)   3.54 (3.57)
    4   0.51 (0.50)   0.72 (0.73)   1.34 (1.35)   3.38 (3.39)
    5   0.29 (0.29)   0.40 (0.40)   0.68 (0.67)   1.30 (1.30)   3.28 (3.34)

    #   queue (pkts)   baseRTT (ms)
    1   19.8 (20)      10.18 (10.18)
    2   59.0 (60)      13.36 (13.51)
    3   127.3 (127)    20.17 (20.28)
    4   237.5 (238)    31.50 (31.50)
    5   416.3 (416)    49.86 (49.80)

Methodology
  Protocol (Reno, Vegas, RED, REM/PI…)
    $x(t+1) = F(p(t), x(t))$
    $p(t+1) = G(p(t), x(t))$
  Equilibrium
    Performance: throughput, loss, delay
    Fairness
    Utility
  Dynamics
    Local stability
    Cost of stabilization

Stability: Reno/RED
  [Figure: feedback loop; TCP sources F1 … FN, Rf(s), AQM links G1 … GL, Rb'(s), signals x, y, p, q]
  TCP:
    Small $\tau$
    Small c
    Large N
  RED:
    Small $\rho$
    Large delay
  Theorem (Low et al, Infocom'02)
  Reno/RED is locally stable if
    $\frac{\rho\, c^3 \tau^3}{2N^3}\,(c\tau + N) \le \frac{\pi (1-\beta)^2}{\sqrt{4\beta^2 + \pi^2 (1-\beta)^2}}$

Stability: scalable control
  [Figure: same feedback loop]
  $x_i(t) = \bar{x}_i\, e^{-\frac{\alpha_i\, q_i(t)}{m_i \tau_i}}$
  $\dot{p}_l(t) = \frac{1}{c_l}\,(y_l(t) - c_l)$
  Theorem (Paganini, Doyle, L, CDC'01)
  Provided R is full rank, feedback loop is locally stable for arbitrary delay, capacity, load and topology

Stability: Stabilized Vegas
  [Figure: same feedback loop]
  [Equation: source law $\dot{x}_i = \frac{1}{T_i^2(t)}\,\tan^{-1}(\cdots)$, with an argument built from $\frac{x_i(t)\,q_i(t)}{\alpha_i d_i}$ and a further term in $q_i(t)$; coefficients garbled in extraction]
  $\dot{p}_l(t) = \frac{1}{c_l}\,(y_l(t) - c_l)$
  Theorem (Choe & L, Infocom'03)
  Provided R is full rank, feedback loop is locally
  stable if [condition on $\max_i x_i T_i$ in terms of $(a, \cdot)$; garbled in extraction]

Stability: Stabilized Vegas
  [Figure: same feedback loop]
  $\dot{x}_i = \frac{1}{T_i^2(t)}\,\mathrm{sgn}\!\left(1 - \frac{x_i(t)\,q_i(t)}{\alpha_i d_i}\right)$
  $\dot{p}_l(t) = \frac{1}{c_l}\,(y_l(t) - c_l)$
  Theorem (Choe & L, Infocom'03)
  Provided R is full rank, feedback loop is locally stable if [condition on $\max_i x_i T_i$ in terms of $(a, \cdot)$; garbled in extraction]

Stability: FAST
  [Figure: same feedback loop]
  [Equation: source law $\dot{x}_i = \frac{1}{T_i^2(t)}\,\tan^{-1}(\cdots)$, same form as Stabilized Vegas; coefficients garbled in extraction]
  $\dot{p}_l(t) = \frac{1}{c_l}\,(y_l(t) - c_l)$
  Application
    Stabilized TCP with current routers
    Queueing delay as congestion measure has right scaling
    Incremental deployment with ECN

Outline
  Motivation, Network model, FAST TCP, Equilibrium, Stability, Experiments, TCP/IP

Window control algorithm
  Theorem (Jin, Wei, L '03)
  In absence of delay
    Mapping from w(t) to w(t+1) is contraction
    Global exponential convergence
    Full utilization after finite time
    Utility function: $\alpha_i \log x_i$ (proportional fairness)

Network
  [Figure: experimental network (Sylvain Ravot, Caltech/CERN)]

FAST BMPS
  [Figure: FAST results for 1, 2, 7, 9 and 10 flows compared with the Internet2 Land Speed Record]
  FAST, Standard MTU
  Throughput averaged over > 1hr

Aggregate throughput
  FAST, Standard MTU
  Utilization averaged over > 1hr
    #flows   duration   average utilization
    1        1hr        95%
    2        1hr        92%
    7        6hr        90%
    9        1.1hr      90%
    10       6hr        88%

Aggregate throughput
  FAST, Standard MTU
  Utilization averaged over 1hr
  [Figure: average utilization at 1G and 2G; series in each group are Linux TCP (txq=100), Linux TCP (txq=10000), FAST; labels at 1G: 19%, 27%, 95%; labels at 2G: 16%, 48%, 92%]

SCinet Caltech-SLAC experiments
  SC2002 Baltimore, Nov 2002
  netlab.caltech.edu/FAST

Acknowledgments
  Prototype
    C. Jin, D. Wei
  Theory
    D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
  Experiment/facilities
    Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S.
    Ravot (Caltech/CERN), S. Singh
    CERN: O. Martin, P. Moroni
    Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
    DataTAG: E. Martelli, J. P. Martin-Flatin
    Internet2: G. Almes, S. Corbato
    Level(3): P. Fernes, R. Struble
    SCinet: G. Goddard, J. Patton
    SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
    StarLight: T. deFanti, L. Winkler
  Major sponsors
    ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF

Dynamic sharing: 3 flows
  Dynamic sharing on Dummynet
    capacity = 800Mbps
    delay = 120ms
    3 flows
    iperf throughput
    Linux 2.4.x (HSTCP: UCL)
  [Figure: throughput traces; panels FAST and Linux]

Dynamic sharing: 3 flows
  [Figure: throughput traces; panels FAST, Linux, HSTCP, STCP; annotation "Steady throughput"]

Dynamic sharing on Dummynet
  capacity = 800Mbps
  delay = 120ms
  14 flows
  iperf throughput
  Linux 2.4.x (HSTCP: UCL)
  [Figure: queue, loss and throughput over 30min; panels FAST, Linux, HSTCP, STCP; annotation "Room for mice !"]

Outline
  Motivation, Network model, FAST TCP, Equilibrium, Stability, Experiments, TCP/IP

Network model
  [Figure: feedback loop; TCP sources F1 … FN map q to x, routing R maps x to y, AQM links G1 … GL map y to p, $R^T$ maps p to q]
  $R_{li} = 1$ if source i uses link l   (IP routing)
  $x(t+1) = F(R^T p(t), x(t))$   (Reno, Vegas)
  $p(t+1) = G(p(t), Rx(t))$   (DT, RED, …)

Motivation
  Primal: $\max_{R}\ \max_{x \ge 0}\ \sum_i U_i(x_i)$ subject to $Rx \le c$
  Dual: $\min_{p \ge 0}\ \sum_i \max_{R_i}\ \max_{x_i \ge 0} \left( U_i(x_i) - x_i \sum_l R_{li}\, p_l \right) + \sum_l p_l c_l$
  Shortest path routing! (the maximization over $R_i$ in the dual)
  Can TCP/IP maximize utility?

TCP-AQM/IP
  Theorem (Wang, et al 03)
  Primal problem is NP-hard
  Proof
  Reduce integer partition to primal problem
    Given: integers {c1, …, cn}
    Find: set A s.t.
    $\sum_{i \in A} c_i = \sum_{i \notin A} c_i$

TCP-AQM/IP
  Theorem (Wang, et al 03)
  Primal problem is NP-hard
  Achievable utility of TCP/IP?
  Stability?
  Duality gap?
  Conclusion: Inevitable tradeoff between
    achievable utility
    routing stability

Ring network
  [Figure: ring of nodes with a single destination; routing variable r; TCP/AQM and IP layers]
  Single destination
  Instant convergence of TCP/IP
  Shortest path routing
  Link cost = [weight lost in extraction] $p_l(t)$ + [weight lost in extraction] $d_l$, where $p_l(t)$ is the price and $d_l$ is static
  Routing iteration: $p_l(0) \to r(0) \to p_l(1) \to r(1) \to \dots$, r(t), r(t+1), …

Ring network
  Stability: r ?
  Utility: V ?
  r*: optimal routing
  V*: max utility

Ring network
  Theorem (Infocom 2003)
    "No" duality gap
    Unstable if [weight lost in extraction] = 0:
      starting from any r(0), subsequent r(t) oscillates between 0 and 1

Ring network
  Theorem (Infocom 2003)
    Solve primal problem asymptotically:
      $|r^* - r| \to 0$
      $V^* - V \to 0$

Ring network
  Theorem (Infocom 2003)
    [parameter lost in extraction] large: globally unstable
    small: globally stable
    medium: depends on r(0)

General network
  Conclusion: Inevitable tradeoff between
    achievable utility
    routing stability
  random graph: 20 nodes, 200 links
  [Figure: achievable utility]

netlab.caltech.edu/FAST
  FAST TCP: motivation, architecture, algorithms, performance.
    submitted for publication, July 1, 2003
  -release: August 2003
    Inquiry: fast-support@cs.caltech.edu
  FAST Project Review
    Caltech, Oct 27-28, 2003
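
The integer partition problem named in the NP-hardness proof of the TCP-AQM/IP slides asks for a set A that splits the given integers into two halves of equal sum. A minimal sketch of that problem follows, assuming non-negative integers; the function name `find_partition` and the dynamic-programming approach are illustrative choices, not something taken from the talk. The running time is pseudo-polynomial (it grows with the size of the numbers), which is consistent with the problem being NP-hard in general.

```python
from typing import Dict, List, Optional


def find_partition(c: List[int]) -> Optional[List[int]]:
    """Return indices A with sum(c[i] for i in A) equal to the sum of the rest,
    or None if no such split exists. Assumes non-negative integers."""
    total = sum(c)
    if total % 2 != 0:
        return None  # an odd total can never be split evenly
    target = total // 2
    # reach[s] holds one list of indices whose values add up to s.
    reach: Dict[int, List[int]] = {0: []}
    for i, value in enumerate(c):
        # Snapshot the items so each index is used at most once per subset.
        for s, indices in list(reach.items()):
            t = s + value
            if t <= target and t not in reach:
                reach[t] = indices + [i]
    return reach.get(target)


example = [3, 1, 1, 2, 2, 1]
subset = find_partition(example)
print(subset)
```

For the example list the total is 10, so any returned index set selects values summing to 5; a list such as [1, 2, 5] has no valid split and yields None.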