FAST TCP

FAST TCP
Steven Low
CS/EE
netlab.CALTECH.edu
Oct 2003
FAST Protocols for Ultrascale Networks
Internet: distributed feedback control system
TCP: adapts sending rate to congestion
AQM: feeds back congestion information


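[Equations on poster: TCP source law (a $\tan^{-1}$ form scaled by $1/T_i(t)^2$; remaining terms not recoverable) and AQM link law]
AQM: $\dot{p}_l = \frac{1}{c_l}\,\bigl(y_l(t) - c_l\bigr)$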
Faculty
Doyle (CDS,EE,BE)
Low (CS,EE)
Newman (Physics)
Paganini (UCLA)
Staff/Postdoc
Bunn (CACR)
Jin (CS)
Ravot (Physics)
Singh (CACR)
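[Diagram: TCP/AQM feedback loop. Forward routing Rf(s) maps source rates x to link rates y; AQM produces prices p; backward routing Rb'(s) returns end-to-end prices q to TCP. The source law compares $x_i(t)\,q_i(t)$ with $\alpha_i d_i$]
[Diagram: testbeds. WAN in Lab (Caltech); research & production networks: Caltech, Calren2/Abilene, StarLight (Chicago), CERN (Geneva); Multi-Gbps, 50-200ms delay]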
Theory
Experiment
People
Implementation
Students
Choe (Postech/CIT)
Hu (Williams)
J. Wang (CDS)
Z.Wang (UCLA)
Wei (CS)
[Diagram: FAST state labels: equilibrium, slow start, retransmit, time out, recovery. Link labels: 155Mb/s, 10Gb/s; SURFNet (Amsterdam)]
Industry
Doraiswami (Cisco)
Yip (Cisco)
Partners
CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
netlab.caltech.edu/FAST
Outline
• Motivation
• Network model
• FAST TCP
• Equilibrium
• Stability
• Experiments
• TCP/IP
netlab.caltech.edu
[Diagram: protocol stack. Applications (WWW, Email, Napster, FTP, …); TCP/AQM; IP; Transmission (Ethernet, ATM, POS, WDM, …)]
High Energy Physics
• Large global collaborations
    - 2000 physicists from 150 institutions in >30 countries
    - 300-400 physicists in US from >30 universities & labs
• SLAC has 500TB data by 4/2002, world’s largest database
• Typical file transfer ~1 TB
    - At 622Mbps: ~4 hrs
    - At 2.5Gbps: ~1 hr
    - At 10Gbps: ~15 min
    - Gigantic elephants!
• LHC (Large Hadron Collider) at CERN, to open 2007
    - Generate data at PB (10^15 B)/sec
    - Filtered in realtime by a factor of 10^6 to 10^7
    - Data stored at CERN at 100MB/sec
    - Many PB of data per year
    - To rise to Exabytes (10^18 B) in a decade
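The file-transfer times quoted on this slide follow from dividing the file size by the line rate. A minimal sketch of that arithmetic, assuming 1 TB = 10^12 bytes and ignoring protocol overhead (the function name is illustrative, not from the talk):

```python
def transfer_time_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time: bits to send divided by link rate in bits/s."""
    return size_bytes * 8 / rate_bps

TB = 1e12  # assumption: decimal terabyte

for label, rate_bps in [("622Mbps", 622e6), ("2.5Gbps", 2.5e9), ("10Gbps", 10e9)]:
    seconds = transfer_time_seconds(TB, rate_bps)
    print(f"{label}: {seconds / 3600:.2f} hr ({seconds / 60:.1f} min)")
```

The results (about 3.6 hr, 0.9 hr and 13 min) round to the slide's ~4 hrs, ~1 hr and ~15 min.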
HEP high speed network
… that must change
HEP Network (DataTAG)
[Map: DataTAG and connected research networks. Labels: New York, ABILENE, ESNET, CALREN, STARLIGHT, STAR-TAP, GENEVA, GEANT, SuperJANET4 (UK), GARR-B (It), SURFnet (NL), Renater (Fr). Source: Newman (Caltech)]
• 2.5 Gbps Wavelength Triangle 2002
• 10 Gbps Triangle in 2003
Performance at large windows
[Figure: ns-2 simulation of average utilization vs. capacity. capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows. J. Wang (Caltech, June 02)]
[Figure: average utilization on the DataTAG Network: CERN (Geneva), StarLight (Chicago), SLAC/Level3 (Sunnyvale). capacity = 1Gbps; 180 ms round trip latency; 1 flow. Bars: Linux TCP txq=100, Linux TCP txq=10000, FAST txq=100; utilization labels: 19%, 27%, 95%. C. Jin, D. Wei, S. Ravot, etc (Caltech, Nov 02)]
Congestion Control
[Diagram: the source sends a window of W data packets (1, 2, …, W) per RTT to the destination; ACKs return to the source; axes: time]
• ~ W packets per RTT
• Lost packet detected by missing ACK
• Congestion signal: delay and loss
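The "~ W packets per RTT" rule ties window size to throughput, which is why multi-Gbps paths with long delay need very large windows. A small sketch of that relation, assuming 1500-byte packets (the packet size and function names are assumptions for illustration):

```python
def window_packets(rate_bps: float, rtt_s: float, packet_bytes: int = 1500) -> float:
    """Window W (packets) needed to sustain rate_bps when ~W packets go out per RTT."""
    return rate_bps * rtt_s / (8 * packet_bytes)

def throughput_bps(window: float, rtt_s: float, packet_bytes: int = 1500) -> float:
    """Throughput achieved by sending `window` packets every RTT."""
    return window * packet_bytes * 8 / rtt_s

# 10Gbps path with 100 ms round trip time
print(round(window_packets(10e9, 0.1)))
```

Under these assumptions a 10Gbps, 100 ms path needs a window on the order of 80,000 packets.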
Congestion control
[Diagram: sources send at rates xi(t); links feed back congestion measure pl(t)]
Example congestion measure pl(t)
• Loss (Reno)
• Queueing delay (Vegas)
TCP/AQM
[Diagram: TCP (Reno, Vegas) sets source rates xi(t); AQM (DropTail, RED, REM/PI, AVQ) sets congestion measures pl(t)]
• Congestion control is a distributed asynchronous algorithm to share bandwidth
• It has two components
    - TCP: adapts sending rate (window) to congestion
    - AQM: adjusts & feeds back congestion information
• They form a distributed feedback control system
    - Equilibrium & stability depend on both TCP and AQM
    - And on delay, capacity, routing, #connections
Network model
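[Diagram: feedback loop. TCP sources F1, …, FN produce rates x; forward routing Rf(s) maps x to link rates y; AQM links G1, …, GL produce prices p; backward routing Rb'(s) maps p to end-to-end prices q]
$[R_f(s)]_{li} = e^{-\tau^f_{li} s}$ if source i uses link l
$[R_b(s)]_{li} = e^{-\tau^b_{li} s}$ if source i uses link l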
Vegas model
for every RTT
{
    if W/RTTmin – W/RTT < α then W ++
    if W/RTTmin – W/RTT > α then W --
}
[Annotation on the difference W/RTTmin – W/RTT: queue size]

Fi:  $\dot{x}_i = \frac{1}{T_i(t)^2}$   if $x_i(t)\,q_i(t) < \alpha_i d_i$
     $\dot{x}_i = -\frac{1}{T_i(t)^2}$  if $x_i(t)\,q_i(t) > \alpha_i d_i$
     $\dot{x}_i = 0$  else
Gl:  $\dot{p}_l = \frac{1}{c_l}\,\bigl(y_l(t) - c_l\bigr)$
$q_i$: E2E queueing delay
$p_l$: Link queueing delay
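The per-RTT Vegas rule above can be exercised on a toy single-bottleneck link. This is a sketch under stated assumptions, not the Vegas implementation: one flow, a fluid queue whose delay is backlog divided by capacity, a window floor of one packet, and illustrative parameter values.

```python
def vegas_update(w: float, rtt: float, rtt_min: float, alpha: float) -> float:
    """One per-RTT Vegas step: compare W/RTTmin - W/RTT with alpha."""
    diff = w / rtt_min - w / rtt
    if diff < alpha:
        return w + 1.0
    if diff > alpha:
        return max(w - 1.0, 1.0)  # assumption: never shrink below one packet
    return w

def simulate_vegas(capacity: float, rtt_min: float, alpha: float, rounds: int = 500) -> float:
    """Single flow on one link; queueing delay = backlog / capacity (toy model)."""
    w = 1.0
    for _ in range(rounds):
        backlog = max(0.0, w - capacity * rtt_min)  # packets held in the queue
        rtt = rtt_min + backlog / capacity
        w = vegas_update(w, rtt, rtt_min, alpha)
    return w

# capacity in pkts/s, rtt_min in s, alpha in pkts/s (hypothetical values)
final_window = simulate_vegas(capacity=1000.0, rtt_min=0.1, alpha=20.0)
```

In this toy model the window settles near (capacity + alpha) * rtt_min, that is, the pipe size plus alpha * rtt_min packets kept in the queue.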
Vegas model
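[Diagram: feedback loop. TCP sources F1, …, FN; forward routing Rf(s) from x to y; AQM links G1, …, GL; backward routing Rb'(s) from p to q]
$F_i = \frac{1}{T_i(t)^2}\,\mathrm{sgn}\!\left(1 - \frac{x_i(t)\,q_i(t)}{\alpha_i d_i}\right)$
$G_l = \frac{y_l(t)}{c_l} - 1$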
Methodology
Protocol (Reno, Vegas, RED, REM/PI…)
    $x(t+1) = F(p(t),\, x(t))$
    $p(t+1) = G(p(t),\, x(t))$
Equilibrium
• Performance
    - Throughput, loss, delay
• Fairness
• Utility
Dynamics
• Local stability
• Cost of stabilization
Model
• Network
    - Links l of capacities cl
• Sources s
    - L(s) - links used by source s
    - Us(xs) - utility if source rate = xs
[Diagram: two links with capacities c1 and c2 shared by sources x1, x2, x3]
    $x_1 + x_2 \le c_1$
    $x_1 + x_3 \le c_2$
Summary: duality model
• Flow control problem (Kelly, Maulloo, Tan 98)
    $\max_{x_s \ge 0} \sum_s U_s(x_s)$  subject to  $Rx \le c$
• Primal-dual algorithm
    $x(t+1) = F(R^T p(t),\, x(t))$    (Reno, Vegas)
    $p(t+1) = G(p(t),\, R x(t))$      (DropTail, RED, REM)
• TCP/AQM
    - Maximize utility with different utility functions
• Result (L 00): (x*, p*) primal-dual optimal iff
    $y_l^* \le c_l$ with equality if $p_l^* > 0$
Example utility functions
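Reno-1:   $U_i(x_i) = \frac{\sqrt{3/2}}{T_i}\,\tan^{-1}\!\left(\sqrt{2/3}\; x_i T_i\right)$
Reno-2:   $U_i(x_i) = \frac{1}{T_i}\,\log\frac{x_i T_i}{2 x_i T_i + 3}$
Vegas:    $U_i(x_i) = \alpha_i \log x_i$
General:  $U_i(x_i) = (1-\alpha)^{-1}\, x_i^{1-\alpha}$ if $\alpha \ne 1$;  $\log x_i$ if $\alpha = 1$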
Game interpretation
• Source s:
    $\max_{x_s \ge 0}\; U_s(x_s) - x_s \sum_l R_{ls}\, p_l$
    $x_s(t+1) = U_s'^{-1}\!\left(\sum_l R_{ls}\, p_l(t)\right)$
• Link l:
    $\max_{p_l \ge 0}\; p_l \left(\sum_s R_{ls}\, x_s - c_l\right)$
    $p_l(t+1) = \left[\, p_l(t) + \gamma_l \left(\sum_s R_{ls}\, x_s(t) - c_l\right) \right]^+$
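The source and link updates above can be iterated directly. The sketch below is an illustration under assumptions that are not on the slide: log utilities (so the inverse marginal utility is 1/q), the two-link, three-source topology of the Model slide with hypothetical capacities c1 = 1 and c2 = 2, a common step size gamma, and a rate cap x_max to keep the source update finite when the path price is near zero.

```python
R = [[1, 1, 0],   # link 1 carries sources 1 and 2
     [1, 0, 1]]   # link 2 carries sources 1 and 3
c = [1.0, 2.0]    # hypothetical link capacities

def source_rates(p, x_max=10.0):
    """Each source picks x_s = U'^{-1}(path price); with U = log this is 1/q."""
    rates = []
    for s in range(len(R[0])):
        q = sum(R[l][s] * p[l] for l in range(len(R)))  # end-to-end price
        rates.append(x_max if q <= 1.0 / x_max else 1.0 / q)
    return rates

def link_prices(p, x, gamma):
    """Each link moves its price along (aggregate rate - capacity), projected to >= 0."""
    new_p = []
    for l in range(len(R)):
        y = sum(R[l][s] * x[s] for s in range(len(x)))
        new_p.append(max(0.0, p[l] + gamma * (y - c[l])))
    return new_p

def run(steps=2000, gamma=0.1):
    p = [1.0, 1.0]
    for _ in range(steps):
        x = source_rates(p)
        p = link_prices(p, x, gamma)
    return source_rates(p), p

x_star, p_star = run()
```

For this example both links end up fully used with positive prices, which matches the optimality condition stated in the duality summary (y_l* = c_l whenever p_l* > 0).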

Synchronous convergence
Theorem (L & Lapsley 99)
Provided R has full row rank & Us strictly concave:
• Gradient projection algorithm of dual problem
• Converges to optimal primal-dual solutions if $\gamma_l < \frac{2}{\bar{\alpha}\, S\, L}$
• Limit point: unique Pareto optimal Nash equilibrium
Asynchronous convergence
Sources and links update & compute
• at different times
• with different frequencies
• using delayed info
Theorem (L & Lapsley 99)
• Converges in asynchronous environment with smaller $\gamma$
Equilibrium of Vegas
Network
• Link queueing delays: $p_l$
• Queue length: $c_l p_l$
Sources
• Throughput: $x_i$
• E2E queueing delay: $q_i$
• Packets buffered: $x_i q_i = \alpha_i d_i$
• Utility function: $U_i(x) = \alpha_i d_i \log x$
• Proportional fairness
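The equilibrium relations above determine rates and delay in closed form when all sources share one bottleneck. A minimal sketch, assuming a single link of capacity c that is fully used, so every source sees the same queueing delay q and the rates sum to c (the function name and the numbers are illustrative):

```python
def vegas_equilibrium(alpha_d, capacity):
    """Single bottleneck: x_i * q = alpha_i * d_i and sum_i x_i = capacity.

    alpha_d  -- list of alpha_i * d_i, the packets each source keeps buffered
    capacity -- link capacity in packets per unit time
    Returns (queueing delay q, list of rates x_i, queue length capacity * q).
    """
    q = sum(alpha_d) / capacity
    rates = [ad / q for ad in alpha_d]
    return q, rates, capacity * q

# two sources buffering 20 and 40 packets on a 6 pkts/ms link (hypothetical)
q, rates, queue = vegas_equilibrium([20.0, 40.0], 6.0)
print(q, rates, queue)
```

Rates come out proportional to alpha_i * d_i, and the queue length equals the total number of buffered packets.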
Validation
(L. Wang, Princeton)

Source rates (pkts/ms)
#   src1          src2          src3          src4          src5
1   5.98 (6)
2   2.05 (2)      3.92 (4)
3   0.96 (0.94)   1.46 (1.49)   3.54 (3.57)
4   0.51 (0.50)   0.72 (0.73)   1.34 (1.35)   3.38 (3.39)
5   0.29 (0.29)   0.40 (0.40)   0.68 (0.67)   1.30 (1.30)   3.28 (3.34)

#   queue (pkts)   baseRTT (ms)
1   19.8 (20)      10.18 (10.18)
2   59.0 (60)      13.36 (13.51)
3   127.3 (127)    20.17 (20.28)
4   237.5 (238)    31.50 (31.50)
5   416.3 (416)    49.86 (49.80)
Stability: Reno/RED
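[Diagram: feedback loop. TCP sources F1, …, FN; forward routing Rf(s) from x to y; AQM links G1, …, GL; backward routing Rb'(s) from p to q]
TCP:
• Small $\tau$
• Small c
• Large N
RED:
• Small $\rho$
• Large delay
Theorem (Low et al, Infocom’02)
Reno/RED is locally stable if
$\frac{\rho\, c^3 \tau^3}{(2N)^3}\,(c\tau + N) \le \frac{\pi (1-\beta)^2}{\sqrt{4\beta^2 + \pi^2 (1-\beta)^2}}$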
Stability: scalable control
[Diagram: feedback loop. TCP sources F1, …, FN; forward routing Rf(s) from x to y; AQM links G1, …, GL; backward routing Rb'(s) from p to q]
TCP: $x_i(t) = \bar{x}_i\, e^{-\frac{\alpha_i\, q_i(t)}{m_i \tau_i}}$
AQM: $\dot{p}_l = \frac{1}{c_l}\,\bigl(y_l(t) - c_l\bigr)$
Theorem (Paganini, Doyle, L, CDC’01)
Provided R is full rank, feedback loop is locally stable for arbitrary delay, capacity, load and topology
Stability: Stabilized Vegas
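[Diagram: feedback loop. TCP sources F1, …, FN; forward routing Rf(s) from x to y; AQM links G1, …, GL; backward routing Rb'(s) from p to q]
TCP: [source law in $\tan^{-1}$ form, scaled by $\frac{1}{T_i(t)^2}$, with the term $1 - \frac{x_i(t)\,q_i(t)}{\alpha_i d_i}$ and a further term in $q_i(t)$; coefficients not recoverable]
AQM: $\dot{p}_l = \frac{1}{c_l}\,\bigl(y_l(t) - c_l\bigr)$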
Theorem (Choe & L, Infocom’03)
Provided R is full rank, feedback loop is locally stable if
max xiTi   (a,  )
Stability: Stabilized Vegas
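[Diagram: feedback loop. TCP sources F1, …, FN; forward routing Rf(s) from x to y; AQM links G1, …, GL; backward routing Rb'(s) from p to q]
TCP: $\dot{x}_i = \frac{1}{T_i(t)^2}\,\mathrm{sgn}\!\left(1 - \frac{x_i(t)\,q_i(t)}{\alpha_i d_i}\right)$
AQM: $\dot{p}_l = \frac{1}{c_l}\,\bigl(y_l(t) - c_l\bigr)$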
Theorem (Choe & L, Infocom’03)
Provided R is full rank, feedback loop is locally stable if
max xiTi   (a,  )
Stability: FAST
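[Diagram: feedback loop. TCP sources F1, …, FN; forward routing Rf(s) from x to y; AQM links G1, …, GL; backward routing Rb'(s) from p to q]
TCP: [source law in $\tan^{-1}$ form, scaled by $\frac{1}{T_i(t)^2}$, with the term $1 - \frac{x_i(t)\,q_i(t)}{\alpha_i d_i}$ and a further term in $q_i(t)$; coefficients not recoverable]
AQM: $\dot{p}_l = \frac{1}{c_l}\,\bigl(y_l(t) - c_l\bigr)$
Application
• Stabilized TCP with current routers
• Queueing delay as congestion measure has right scaling
• Incremental deployment with ECN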
Window control algorithm
Theorem (Jin, Wei, L ‘03)
In absence of delay
• Mapping from w(t) to w(t+1) is contraction
• Global exponential convergence
• Full utilization after finite time
• Utility function: $\alpha_i \log x_i$ (proportional fairness)
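The window update itself did not survive on this slide. As a sketch only, the code below uses the periodic update published for FAST TCP, w <- min{2w, (1 - gamma) w + gamma ((baseRTT/RTT) w + alpha)}, and iterates it without feedback delay on a toy single-bottleneck model. The queue model, the parameter values and all function names are assumptions made for illustration.

```python
def queueing_delay(windows, base_rtts, capacity):
    """Solve sum_i w_i / (d_i + q) = capacity for q >= 0 by bisection."""
    def load(q):
        return sum(w / (d + q) for w, d in zip(windows, base_rtts))
    if load(0.0) <= capacity:
        return 0.0                    # link not saturated, no queue builds
    lo, hi = 0.0, 1.0
    while load(hi) > capacity:        # grow the bracket until it contains q
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if load(mid) > capacity:
            lo = mid
        else:
            hi = mid
    return hi

def fast_update(w, base_rtt, rtt, alpha, gamma=0.5):
    """One FAST window update; growth is capped at doubling."""
    target = (base_rtt / rtt) * w + alpha
    return min(2.0 * w, (1.0 - gamma) * w + gamma * target)

def simulate(base_rtts, alphas, capacity, steps=500):
    windows = [1.0] * len(base_rtts)
    for _ in range(steps):
        q = queueing_delay(windows, base_rtts, capacity)
        windows = [fast_update(w, d, d + q, a)
                   for w, d, a in zip(windows, base_rtts, alphas)]
    q = queueing_delay(windows, base_rtts, capacity)
    rates = [w / (d + q) for w, d in zip(windows, base_rtts)]
    return q, rates

# two flows, base RTTs 100 ms and 200 ms, alpha = 100 and 200 packets,
# link capacity 10000 pkts/s (all hypothetical)
q_eq, rates_eq = simulate([0.1, 0.2], [100.0, 200.0], 10000.0)
```

At the fixed point each flow satisfies x_i * q = alpha_i, so rates are proportional to alpha_i and independent of base RTT, and the link is fully used. That is the proportional-fairness equilibrium named in the theorem.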
Network
[Figure: network diagram (Sylvain Ravot, Caltech/CERN)]
FAST BMPS
[Chart: BMPS by #flows. FAST: 1, 2, 7, 9, 10 flows; Internet2 Land Speed Record: 1, 2 flows]
FAST
• Standard MTU
• Throughput averaged over > 1hr

Aggregate throughput
FAST
• Standard MTU
• Utilization averaged over > 1hr

#flows   duration   average utilization
1        1hr        95%
2        1hr        92%
7        6hr        90%
9        1.1hr      90%
10       6hr        88%
Aggregate throughput
FAST
• Standard MTU
• Utilization averaged over 1hr

Average utilization
capacity   Linux TCP txq=100   Linux TCP txq=10000   FAST
1G         19%                 27%                   95%
2G         16%                 48%                   92%
[Figure caption: SCinet Caltech-SLAC experiments, SC2002, Baltimore, Nov 2002; netlab.caltech.edu/FAST]
Acknowledgments

Prototype
    C. Jin, D. Wei
Theory
    D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Experiment/facilities
    Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
    CERN: O. Martin, P. Moroni
    Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
    DataTAG: E. Martelli, J. P. Martin-Flatin
    Internet2: G. Almes, S. Corbato
    Level(3): P. Fernes, R. Struble
    SCinet: G. Goddard, J. Patton
    SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
    StarLight: T. DeFanti, L. Winkler
Major sponsors
    ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Dynamic sharing on Dummynet: 3 flows
• capacity = 800Mbps
• delay = 120ms
• 3 flows
• iperf throughput
• Linux 2.4.x (HSTCP: UCL)
[Figure: panels for FAST, Linux, HSTCP and STCP; labels: queue, loss, throughput; time span 30min; annotation: Steady throughput]

Dynamic sharing on Dummynet: 14 flows
• capacity = 800Mbps
• delay = 120ms
• 14 flows
• iperf throughput
• Linux 2.4.x (HSTCP: UCL)
[Figure: panels for FAST, Linux, HSTCP and STCP; labels: queue, loss, throughput; time span 30min; annotation: Room for mice!]
Network model
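[Diagram: feedback loop. TCP sources F1, …, FN produce rates x; routing R maps x to link rates y; AQM links G1, …, GL produce prices p; R^T maps p to end-to-end prices q]
$R_{li} = 1$ if source i uses link l    (IP routing)
$x(t+1) = F(R^T p(t),\, x(t))$    (TCP: Reno, Vegas)
$p(t+1) = G(p(t),\, R x(t))$      (AQM: DT, RED, …)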
Motivation
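Primal:  $\max_{R}\, \max_{x \ge 0}\; \sum_i U_i(x_i)$  subject to  $Rx \le c$
Dual:    $\min_{p \ge 0} \left( \sum_i \max_{x_i \ge 0} \left( U_i(x_i) - x_i \min_{R_i} \sum_l R_{li}\, p_l \right) + \sum_l p_l c_l \right)$
Shortest path routing! (the inner $\min_{R_i} \sum_l R_{li}\, p_l$ picks the cheapest path with prices $p_l$ as link costs)
Can TCP/IP maximize utility?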
TCP-AQM/IP
Theorem (Wang, et al 03)
Primal problem is NP-hard
• Proof
    Reduce integer partition to primal problem
    Given: integers {c1, …, cn}
    Find: set A s.t. $\sum_{i \in A} c_i = \sum_{i \notin A} c_i$
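For readers unfamiliar with the integer partition problem used in the reduction, here is a small sketch that searches for the set A by subset-sum dynamic programming. It assumes positive integers; the function name is illustrative. Its running time grows with the magnitude of the integers (pseudo-polynomial), which is consistent with the problem being NP-hard in general.

```python
def find_partition(values):
    """Return indices of a set A with sum(A) == sum(rest), or None if none exists.

    Assumes positive integers. Uses subset-sum dynamic programming over
    reachable sums up to half the total.
    """
    total = sum(values)
    if total % 2:
        return None                      # odd total can never split evenly
    target = total // 2
    reachable = {0: []}                  # sum -> one index set achieving it
    for i, v in enumerate(values):
        # iterate over a snapshot so each value is used at most once
        for s, idx in list(reachable.items()):
            t = s + v
            if t <= target and t not in reachable:
                reachable[t] = idx + [i]
    return reachable.get(target)

example = [3, 1, 1, 2, 2, 1]
A = find_partition(example)
```

For the example list, any returned A has elements summing to 5, half of the total 10.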
TCP-AQM/IP
Theorem (Wang, et al 03)
Primal problem is NP-hard
• Achievable utility of TCP/IP?
• Stability?
• Duality gap?
Conclusion: Inevitable tradeoff between
• achievable utility
• routing stability
Ring network
[Diagram: ring network with a single destination; routing variable r]
• Single destination
• Instant convergence of TCP/IP
• Shortest path routing
• Link cost = weighted sum of pl(t) (price) and dl (static)
[Diagram: TCP/AQM produces prices pl(0), pl(1), …; IP produces routings r(0), r(1), …, r(t), r(t+1), …]
Ring network
destination
 Stability: r ?
 Utility: V ?
r* : optimal routing
V* : max utility
r
TCP/AQM
IP
 pl(0)
 pl(1)
r(0)
r(1)
…
r(t), r(t+1) ,
…
Ring network
destination
 Stability: r ?
 Utility: V ?
link cost =  pl(t) +  dl
r
Theorem (Infocom 2003)
 “No” duality gap
 Unstable if  = 0
starting from any r(0), subsequent
r(t) oscillates between 0 and 1
Ring network
destination
 Stability: r ?
 Utility: V ?
link cost =  pl(t) +  dl
r
Theorem (Infocom 2003)
 Solve primal problem asymptotically
as   
$|r^* - r| \to 0$
$V^* - V \to 0$
Ring network
destination
 Stability: r ?
 Utility: V ?
link cost =  pl(t) +  dl
r
Theorem (Infocom 2003)
  large: globally unstable
  small: globally stable
  medium: depends on r(0)
General network
Conclusion: Inevitable tradeoff between
• achievable utility
• routing stability
[Figure: achievable utility on a random graph, 20 nodes, 200 links]
netlab.caltech.edu/FAST
• FAST TCP: motivation, architecture, algorithms, performance. Submitted for publication, July 1, 2003
 -release: August 2003
Inquiry: fast-support@cs.caltech.edu
 FAST Project Review
Caltech, Oct 27-28, 2003
Download