Congestion Responsiveness of Internet Traffic
(a fresh look at an old problem)
Ravi Prasad & Constantine Dovrolis
Networking and Telecommunications Group, College of Computing, Georgia Tech

TCP and Internet stability
- Stable network: the offered load stays below the capacity (ρ < 1); otherwise, persistent packet losses
- Congestion collapse: fully utilized links, but almost zero per-flow goodput
- Conventional wisdom #1: the Internet manages to be stable due to TCP congestion control
  - TCP: more than 90% of Internet traffic
  - TCP reduces its offered load (send window) upon signs of congestion
  - Negative-feedback loop, stabilizing the queueing system
- Conventional wisdom #2: stability can be maintained without admission control or resource reservations

TCP-centric congestion control
- If all flows use TCP, or TCP-friendly congestion control, then the Internet will be stable
  - TCP congestion control -> no congestion collapse
  - "Promoting the use of end-to-end congestion control in the Internet", Floyd & Fall, ToN'99
  - "Congestion control principles", Floyd, RFC 2914, 2000
- Key modeling unit: persistent flows (they last forever!)
  - "Rate control in communication networks: shadow prices, proportional fairness and stability", Kelly et al., JORS'98
  - "Congestion control for high performance, stability, and fairness in general networks", Paganini et al., ToN'05
  - Number of active flows does not change with time
  - Infinitely long flows can be effectively controlled

Flows are generated by users/applications, not by the transport layer!
[Figure: sender/receiver protocol stacks; the application issues a request and receives a response over the transport and network layers]
- Examples: user clicks a web page, p2p movie download, machine-generated periodic file-system synchronization
- Session: set of finite (i.e., non-persistent) flows generated by a single user action
- Key issue: the session arrival process
  - Does the session arrival rate decrease during congestion?
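The closing question above ("does the session arrival rate decrease during congestion?") can be made concrete with a toy sketch. This is our own illustration with assumed numbers, not from the talk: one user population starts a new session only after the previous one completes, while the other submits sessions at a fixed rate regardless of congestion.

```python
# Toy sketch (assumed parameters): how congestion affects the session
# arrival rate under two user behaviors. Congestion is modeled simply
# as a longer transfer time.

def completion_gated_arrivals(n_users, transfer_time, think_time, horizon):
    """Each user starts a new session only after the previous one completes
    (plus a think time), so slower transfers mean fewer sessions."""
    cycle = transfer_time + think_time
    return n_users * (horizon // cycle)

def fixed_rate_arrivals(rate, horizon):
    """Sessions arrive at a fixed rate, independent of transfer times."""
    return rate * horizon

# Congestion doubles the transfer time from 2 s to 4 s (assumed values):
print(completion_gated_arrivals(100, 2, 8, 600))  # 6000 sessions
print(completion_gated_arrivals(100, 4, 8, 600))  # 5000 sessions: rate drops
print(fixed_rate_arrivals(10, 600))               # 6000 sessions, congested or not
```

The first behavior corresponds to the closed-loop arrival model discussed in these slides, the second to the open-loop model.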
Two fundamental flow arrival models
- Closed-loop model
  - Fixed number of users; each user can generate one session at a time
  - A new session arrival depends on the completion of the previous session
  - E.g., ingress traffic in a campus network (student downloads)
- Open-loop model
  - Sessions arrive in the network independently of congestion
  - Theoretically, an infinite population of users
  - E.g., egress traffic at a popular Web server
- Very different models in terms of congestion responsiveness & stability

Related work
- Open-loop traffic model
  - "Statistical bandwidth sharing: a study of congestion at flow level", Fredj et al., Sigcomm'01
  - "Stability and performance analysis of networks supporting services", Veciana et al., ToN'01
- Closed-loop traffic model
  - "A new method for the analysis of feedback-based protocols with applications to engineering web traffic over the Internet", Heyman et al., Sigmetrics'99
  - "Dimensioning bandwidth for elastic traffic in high-speed data networks", Berger & Kogan, ToN'00
- Main open issues:
  1. What do the previous two models imply for the congestion responsiveness of aggregate Internet traffic?
  2. Which of the previous two models is closer to real Internet traffic?

Our contributions
- Introduce two new metrics for the congestion responsiveness of aggregate Internet traffic
  - Elasticity and instability coefficient
- Examine the congestion responsiveness of several traffic models, including open-loop, closed-loop, and mixed traffic
  - Open-loop TCP traffic is less congestion responsive than even UDP traffic!
- Closed-loop traffic is more congestion responsive than persistent flows
- Design an experimental methodology to measure the Closed-loop Traffic Ratio (CTR)
- Measure the CTR in several Internet packet traces
  - 70-90% of Internet traffic appears to be closed-loop
- Several implications for networking research & practice

Outline
- Congestion responsiveness metrics: elasticity, instability coefficient
- Results for an ideal Processor Sharing (PS) server: closed-loop and open-loop flow arrival models
- Congestion responsiveness of four traffic models: persistent TCP flows, UDP constant-rate streams, open-loop TCP flows, closed-loop TCP flows
- Congestion responsiveness of real network traffic: methodology and measurements
- Summary and implications

Elasticity metric
- Quantifies the extent to which a traffic aggregate backs off upon a congestion event
- U and U': average throughput of the aggregate traffic before and during the stimulus, respectively
- Defined as the fractional change in throughput: f = (U - U') / U
- Depends on the cause of the congestion event
- Canonical congestion event: a persistent TCP transfer (the stimulus) that is not limited by the receiver's window

Elasticity (illustration)
[Figure: stimulus vs. cross-traffic throughput for f = 1 (completely responsive) and f = 0 (completely unresponsive)]
- Positive elasticity: the cross traffic backs off upon congestion
- Negative elasticity: the cross traffic increases its rate upon congestion

Instability Coefficient
- Quantifies whether (and how fast) a traffic aggregate can lead to congestion collapse upon congestion at time t
- Defined as dN(t)/dt, where N(t) is the number of active sessions at time t
- Coefficient <= 0: fixed or decreasing number of active sessions -> stable network
- Coefficient > 0: increasing number of active sessions -> potential to cause congestion collapse
- Larger coefficient: faster move towards congestion collapse

Instability Coefficient (simulations)
- Stable network, coefficient = 0: open-loop model, session arrival rate 200/sec
- Unstable network, coefficient > 0: open-loop model, session arrival rate 400/sec

Closed-loop model - PS server
- N users: cycles of transfer and idle periods
- S: average session size; T_T: average transfer duration; T_I: average idle time
- T_T increases during congestion
- N_a: number of active sessions
- Offered load: R_offered = N S / (T_I + T_T)
- E[N_a] is governed by the ratio N S / (C T_I): small when it is below 1, approaching N otherwise
- Elasticity: f = 1 / (N_a + 1)
- Instability coefficient cannot stay positive indefinitely (since N_a <= N)

Open-loop model - PS server
- Poisson session arrivals; S: average session size; λ: session arrival rate
- Offered load: R_offered = λS; utilization ρ = λS/C
- Stable only if ρ < 1
- Expected throughput of a new transfer: E[S/T] = C(1 - ρ) for ρ < 1, i.e., the available bandwidth
- Elasticity: f = 0
- Instability coefficient > 0 if ρ > 1

Mixed traffic
- Internet traffic: a mix of open-loop and closed-loop traffic
- Characterized by the Closed-loop Traffic Ratio: CTR = (traffic load from the closed-loop model) / (total traffic load)
- f_mix = CTR · f_closed
- Instability coefficient of the mix is positive when ρ_open > 1, not merely when ρ_open + ρ_closed > 1

Persistent TCP transfers
- N homogeneous transfers
- Stimulus increases the RTT and loss rate from (T, p) to (T', p')
- PFTK ("UMass") model to estimate average TCP throughput
- f = 1 - [N (M/T') √(3/(2bp'))] / [N (M/T) √(3/(2bp))] = 1/(N+1) for a fair share
- Number of transfers remains constant, i.e., instability coefficient = 0

Constant-rate UDP transfers
- Fixed number of constant-rate flows
- UDP flows do not react to congestion and do not retransmit lost packets
- Throughput after the stimulus: U' = (1 - p) U, so elasticity f = p > 0
  - Truly congestion-responsive traffic should have larger elasticity than its loss rate
- Instability coefficient is zero: the number of flows does not change during congestion, so it cannot cause congestion collapse

Open-loop TCP transfers
- Poisson stream of TCP flows; sizes uniformly distributed between 16 and 20 packets
- Arrival rate chosen to vary the offered load ρ
- Ideally, f = 0 when ρ < 1; but negative elasticity is possible due to TCP's redundant retransmissions
- The offered load keeps increasing after the stimulus: instability coefficient is positive when ρ > 1 -> possible congestion collapse
- Open-loop traffic is the network's worst enemy

Closed-loop TCP transfers
- When the loss rate is ~0 (i.e., a small number of sessions):
  - The stimulus increases the RTT from T to T', so transfer latency grows from kT to kT'
  - f = k(T' - T) / (kT' + T_I)
- With a small number of active sessions: elasticity is about constant
- With a large number of active sessions: elasticity f > 1/(N_a + 1)
- Closed-loop TCP traffic: more elastic than persistent flows

Summary

Traffic class   | Elasticity              | Stability
----------------+-------------------------+--------------------
Persistent TCP  | elastic, f = 1/(N+1)    | stable
UDP const-rate  | inelastic, f = p        | stable
Open-loop TCP   | inelastic, f <= 0       | unstable if ρ > 1
Closed-loop TCP | elastic, f > 1/(N_a+1)  | stable

(N: number of homogeneous flows; p: loss rate; N_a: number of active sessions)
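The elasticity predictions of the four traffic models can be checked with a back-of-the-envelope sketch. All numbers here (MSS, RTTs, loss rates, N_a) are our own assumptions, and the TCP throughput expression is the simplified PFTK square-root formula, M/(T√(2bp/3)).

```python
from math import sqrt

# Back-of-the-envelope sketch (assumed parameters) of the elasticity
# values predicted by the four traffic models.

def tcp_rate(mss, rtt, p, b=2):
    """Simplified PFTK average TCP throughput: M / (T * sqrt(2bp/3))."""
    return mss / (rtt * sqrt(2 * b * p / 3))

def persistent_tcp_elasticity(rtt, p, rtt2, p2):
    """f = 1 - U'/U for N homogeneous persistent TCP flows (N cancels)."""
    return 1 - tcp_rate(1500, rtt2, p2) / tcp_rate(1500, rtt, p)

# Stimulus raises the RTT 100 ms -> 110 ms and the loss rate 1% -> 1.21%:
f_persistent = persistent_tcp_elasticity(0.100, 0.01, 0.110, 0.0121)

f_udp = 0.01             # UDP const-rate: f equals the loss rate p
f_open = 0.0             # open-loop TCP, rho < 1: ideally zero (or negative)
f_closed = 1 / (10 + 1)  # closed-loop lower bound 1/(Na+1), with Na = 10

print(round(f_persistent, 3), f_udp, f_open, round(f_closed, 3))
```

Note how the persistent-TCP elasticity depends only on the RTT and loss-rate ratios, not on N, since all N flows scale down together.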
What to measure?
- Direct elasticity measurements require packet traces at the bottleneck during a stimulus; we have access to only a couple of such links
- Direct measurements of the instability coefficient require packet traces during congestion events; we have access to only a couple of congested links
- Alternative: measure the CTR (closed-loop traffic ratio), an indirect metric of congestion responsiveness
  - High CTR (close to one): mostly closed-loop traffic
  - Low CTR (close to zero): mostly open-loop traffic

CTR estimation (overview)
- Start with a packet trace from an Internet link
  - Per packet: arrival time, src/dst addresses & ports, size
- Focus only on TCP traffic: HTTP and well-known ports
- Identify users:
  - Downloads: a user is associated with a unique DST address
  - Uploads: a user is associated with a unique SRC address
  - Multi-user hosts and NATs are a problem (see paper for details)
- For each user, identify sessions:
  - Session: one or more connections ("jobs") associated with the same user action
  - E.g., a Web page download: multiple HTTP connections
- Classify sessions as open-loop or closed-loop:
  - Successive sessions from the same user: closed-loop
  - Session from a new user, or a session from a known user after a long idle period: open-loop

From Connections to Jobs to Sessions
- An HTTP 1.1 connection can stay alive across multiple sessions
- Job: segment of a TCP connection that belongs to a single session
- Intra-job packet interarrivals: TCP- and network-dependent (short)
- Inter-job packet interarrivals: caused by user actions (long)
- Classify interarrivals based on a Silence Threshold (STH)

[Trace excerpt, repeated on this and the following two slides, annotated with intra-job and inter-job gaps:]
1105126179.423931 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478309 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478438 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
...
1105126179.613121 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.672432 163.157.239.61 127.207.1.255 80 2290 1420 T 1380

Silence Threshold (STH) estimation
[Figure: distribution of intra-job vs. inter-job packet gaps; STH is chosen to separate the two]

From Jobs to Sessions
- Group jobs from the same user into sessions
- Intuition: jobs from the same session have short, machine-generated interarrivals
- Minimum Session Interarrival (MSI) threshold: aims to distinguish machine-generated from user-initiated events
- MSI = 1-5 seconds
[Figure: the same trace excerpt; job gaps below MSI merge jobs into sessions 1, 2, and 3]

Classify sessions as open/closed-loop
- The first session from a user is always open-loop
- A session from a returning user is also open-loop if it starts more than MTT seconds after the completion of the user's last session
  - MTT: Maximum Think Time; typically several minutes
[Figure: the same trace excerpt; sessions 1 and 2 are open-loop (interarrival > MTT), session 3 is closed-loop (interarrival < MTT)]

Robustness to the MSI & MTT thresholds
- Examined the CTR variation over MSI: 0.1-2 sec and MTT: 10-25 min
- CTR variation < 0.05
- Linear regression: ΔCTR/ΔMSI = -0.0044/sec; ΔCTR/ΔMTT = 0.0037/min
- We use MSI = 1 sec and MTT = 15 min

Sample CTR measurements

                                                           HTTP download      Well-known ports
Link location          Year  Direction  Duration  TCP GB(%)  Bytes(%)  CTR    Bytes(%)  CTR
Georgia Tech           05    In         2 hr      129 (97)   44.7      0.90   18.8      0.60
Georgia Tech           05    Out        2 hr      208 (99)   37.3      0.63   10.6      0.70
Los Nettos             04    Core       1 hr      59 (95)    36.2      0.93   29.3      0.83
UNC, Chapel Hill       03    In         1 hr      41 (87)    22.9      0.95   3.6       0.69
UNC, Chapel Hill       03    Out        1 hr      153 (97)   19.0      0.76   16.8      0.91
Abilene, Indianapolis  02    Core       1 hr      172 (96)   8.0       0.78   33.9      0.91
Abilene, Indianapolis  02    Core       1 hr      178 (85)   11.5      0.82   35.8      0.89
Univ. of Auckland, NZ  01    In         6 hr      0.6 (95)   42.4      0.92   30.6      0.24
Univ. of Auckland, NZ  01    Out        6 hr      1.4 (98)   70.4      0.79   7.6       0.72

Summary
- Persistent transfers have very different congestion responsiveness than finite-size transfers
- Focus on open-loop and closed-loop flow arrivals
- TCP (or TCP-like protocols) is not sufficient to avoid congestion collapse
- Negative feedback at the session/application layer holds the key to network stability
- Measurements show high CTR values for most Internet links we examined
  - Possibly why the Internet is mostly stable
- Is AQM an effective controller?
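For concreteness, the session-extraction step of the CTR methodology can be sketched as follows. This is a simplified illustration of our own: it assumes packets have already been attributed to users, collapses the job-level (STH) step, and uses hypothetical helper names; the actual measurement code is described in the paper.

```python
MSI = 1.0        # seconds: minimum session interarrival (machine vs. user)
MTT = 15 * 60.0  # seconds: maximum think time for a returning user

def classify_sessions(packets, msi=MSI, mtt=MTT):
    """Split each user's packets into sessions (a gap > msi starts a new
    session) and label each session open- or closed-loop."""
    by_user = {}
    for t, user in sorted(packets):
        by_user.setdefault(user, []).append(t)

    labels = []  # one entry per session: "open" or "closed"
    for times in by_user.values():
        prev_session_end = None
        session_start = times[0]
        prev = times[0]
        for t in times[1:] + [None]:         # None flushes the last session
            if t is None or t - prev > msi:  # session boundary
                if prev_session_end is None or session_start - prev_session_end > mtt:
                    labels.append("open")    # new user, or long idle period
                else:
                    labels.append("closed")  # back-to-back session
                prev_session_end = prev
                session_start = t
            if t is not None:
                prev = t
    return labels

# User A: two sessions 4.8 s apart (closed); user B: one session (open).
pkts = [(0.0, "A"), (0.2, "A"), (5.0, "A"), (1000.0, "B")]
print(classify_sessions(pkts))  # ['open', 'closed', 'open']
```

Computing the CTR itself would additionally weigh each session by its bytes; that bookkeeping is omitted here.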
Active Queue Management (AQM)
- Most AQM models assume persistent TCP flows
  - Provides a congestion signal to flows
  - Stabilizes buffer occupancy; controls link utilization
- However, AQM is an ineffective controller in the presence of open-loop TCP traffic
  - The flow arrival process does not react to AQM drops
  - Congestion collapse is still possible with AQM

Is admission control necessary?
- Admission control is an effective way to control the offered load with open-loop traffic
  - Avoids flow aborts and reattempts; see proposals by J. Roberts and others
- However, admission control is not required with closed-loop traffic
  - Closed-loop traffic is self-regulating, as long as the maximum possible number of active sessions does not exceed a certain threshold

What about TCP-friendliness?
- "TCP friendliness" has been proposed for all non-TCP traffic as a way to avoid congestion collapse
- However, like TCP, open-loop TCP-friendly sessions can still cause congestion collapse
- TCP friendliness matters more for fairness (sharing bandwidth almost equally with TCP)

Traffic models for simulations & analysis
- Time to drop the persistent-flows assumption!
  - It is not realistic, and it has very different congestion responsiveness than real Internet traffic
- More realistic aggregate traffic models: a mix of open-loop and closed-loop finite-size sessions
- We need more CTR measurements to characterize the mix
- We need mathematical models of closed-loop traffic that consider user behavior under congestion

Session/application congestion control
- Several existing applications generate sessions independently of network congestion (bad!)
  - Example 1: NNTP servers transfer news periodically
  - Example 2: CDN servers exchange content as needed or periodically
- Client-side control mechanism: do not start a new session before the current session completes
- Server-side control mechanism: use admission control when the number of active sessions exceeds a threshold
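The two session-level control mechanisms above can be sketched with a minimal, hypothetical interface (class and method names are our own, not from the talk):

```python
# Illustrative sketch of session-level congestion control. A client that
# never starts a new session before the current one completes, and a
# server that admits sessions only while the active count is below a limit.

class ClosedLoopClient:
    """Client-side control: at most one outstanding session."""
    def __init__(self):
        self.busy = False
    def try_start(self):
        if self.busy:
            return False        # previous session still in progress
        self.busy = True
        return True
    def finish(self):
        self.busy = False

class AdmissionControlServer:
    """Server-side control: reject sessions beyond `limit` active ones."""
    def __init__(self, limit):
        self.limit = limit
        self.active = 0
    def admit(self):
        if self.active >= self.limit:
            return False        # signal the client to retry later
        self.active += 1
        return True
    def complete(self):
        self.active -= 1

client = ClosedLoopClient()
print(client.try_start(), client.try_start())  # True False
server = AdmissionControlServer(limit=1)
print(server.admit(), server.admit())          # True False
```

Either mechanism restores negative feedback at the session layer: the session arrival process, not just the packet-sending rate, reacts to congestion.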