Congestion Responsiveness of Internet Traffic (a fresh look at an old problem)

Ravi Prasad
&
Constantine Dovrolis
Networking and Telecommunications Group
College of Computing,
Georgia Tech
TCP and Internet stability
 Stable network: the offered load stays below the capacity
(ρ<1)
 Otherwise, persistent packet losses
 Congestion collapse: fully utilized links, but almost zero
per-flow goodput
 Conventional wisdom #1: the Internet manages to be
stable due to TCP congestion control
 TCP: more than 90% of Internet traffic
 TCP reduces offered load (send window) upon signs of
congestion
 Negative-feedback loop, stabilizing queueing system
 Conventional wisdom #2: stability can be maintained
without admission control or resource reservations
TCP-centric congestion control
 If all flows use TCP, or TCP-friendly congestion control, then
the Internet will be stable



TCP congestion control -> no congestion collapse
“Promoting the use of end-to-end congestion control in the
Internet”, Floyd & Fall, ToN’99
“Congestion control principles”, Floyd, RFC2914, 2000
 Key modeling unit: persistent flows (they last forever!)




“Rate control in communication networks: shadow prices,
proportional fairness and stability”, Kelly et al., JORS’98
“Congestion control for high performance, stability, and fairness
in general networks”, Paganini et al., ToN’05
Number of active flows does not change with time
Infinitely long flows can be effectively controlled
Flows are generated by users/applications,
not by the transport layer!
Receiver
Sender
Application
Request
Response
Transport
Network
 Examples: user clicks web page,
p2p movie download, machine-generated periodic FS
synchronization
 Session: Set of finite (i.e., nonpersistent) flows, generated by
single user action
 Key issue: session arrival
process
 Does the session arrival rate
reduce during congestion?
Two fundamental flow arrival models
 Closed-loop model



Fixed number of users, each
user can generate one session
at a time
New session arrival: depends
on completion of previous
session
E.g., ingress traffic in campus
network (student downloads)
 Open-loop model



Sessions arrive in network
independently of congestion
Theoretically, infinite
population of users
E.g., egress traffic at popular
Web server
 Very different models in
terms of congestion
responsiveness & stability
[Figure: users 1, 2, 3, ..., N]
Related work
 Open-loop traffic model


“Statistical bandwidth sharing: a study of congestion at flow
level”, Fredj et al., Sigcomm’01
“Stability and performance analysis of networks supporting
elastic services”, de Veciana et al., ToN’01
 Closed-loop traffic model


“A new method for the analysis of feedback-based protocols
with applications to engineering web traffic over the
Internet”, Heyman et al., Sigmetrics’99
“Dimensioning bandwidth for elastic traffic in high-speed
data networks”, Berger & Kogan, ToN’00
 Main open issues:
1. What do the previous two models imply for the congestion
responsiveness of aggregate Internet traffic?
2. Which of the previous two models is closer to real Internet
traffic?
Our contributions
 Introduce two new metrics for congestion responsiveness of
aggregate Internet traffic

Elasticity and instability coefficient
 Examine congestion responsiveness of several traffic models,
including open-loop, closed-loop, and mixed traffic


Open-loop TCP traffic is less congestion responsive than even
UDP traffic!
Closed-loop traffic is more congestion responsive than
persistent flows
 Design experimental methodology to measure Closed-loop
Traffic Ratio (CTR)


Measure CTR in several Internet packet traces
70-90% of Internet traffic appears to be closed-loop
 Several implications for networking research & practice
Outline
 Congestion responsiveness metrics
 Elasticity
 Instability coefficient
 Results for ideal Processor Sharing (PS) server
 Closed-loop flow arrival model
 Open-loop flow arrival model
 Congestion responsiveness of four traffic models
 Persistent TCP flows
 UDP constant-rate streams
 Open-loop TCP flows
 Closed-loop TCP flows
 Congestion responsiveness of real network traffic
 Methodology and measurements
 Summary and implications
Elasticity metric
 Quantifies the extent to which a traffic aggregate
backs off upon a congestion event
 U and U': average throughput of aggregate traffic
prior to and during the stimulus, respectively
 Defined as the fractional change in throughput:
$f = \frac{U - U'}{U}$
 Depends on the cause of the congestion event
 Canonical congestion event: a persistent TCP transfer
(stimulus) that is not limited by the receiver’s window
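The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `elasticity` and the sample throughput values are hypothetical, and the averaging window before and during the stimulus is left to the caller.

```python
from statistics import mean


def elasticity(before, during):
    """Return f = (U - U') / U from throughput samples.

    before: throughput samples of the aggregate prior to the stimulus
    during: throughput samples of the aggregate while the stimulus is active
    """
    u = mean(before)         # U: average throughput before the stimulus
    u_prime = mean(during)   # U': average throughput during the stimulus
    if u <= 0:
        raise ValueError("average throughput before the stimulus must be positive")
    return (u - u_prime) / u


# Hypothetical samples in Mbps: the aggregate drops from 100 to 80 on average.
f = elasticity([90.0, 100.0, 110.0], [70.0, 80.0, 90.0])
print(f"elasticity f = {f:.2f}")
```

A value near 1 means the aggregate gave up almost all of its throughput, a value near 0 means it did not back off, and a negative value means its throughput increased during the stimulus.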
Elasticity
 f=1
 Completely responsive
 f=0
 Completely unresponsive
[Figure labels: stimulus, cross-traffic]
Elasticity
 Positive elasticity
 Negative elasticity
 When cross traffic increases its rate upon congestion
[Figure labels: stimulus, cross-traffic]
Instability Coefficient
 The instability coefficient quantifies whether (and
how fast) a traffic aggregate can lead to
congestion collapse upon congestion at time t
 Defined as dN(t)/dt
 N(t): number of active sessions at time t
 Instability coefficient ≤ 0
 Fixed or decreasing number of active sessions
 Stable network
 Instability coefficient > 0
 Increasing number of active sessions
 Has the potential to cause congestion collapse
 Larger coefficient: faster move towards congestion collapse
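Since the coefficient is the time derivative dN(t)/dt, one way to estimate it from sampled data is the least-squares slope of N(t) over a measurement window. The sketch below assumes that estimator; the slides do not say how the derivative is computed, and the function name and sample values are hypothetical.

```python
def instability_coefficient(times, active_sessions):
    """Least-squares slope of N(t) versus t, used as an estimate of dN/dt."""
    n = len(times)
    if n < 2 or n != len(active_sessions):
        raise ValueError("need at least two samples and equal-length inputs")
    t_mean = sum(times) / n
    n_mean = sum(active_sessions) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - t_mean) * (x - n_mean) for t, x in zip(times, active_sessions))
    return sxy / sxx


# Hypothetical samples: active sessions counted once per second.
growing = instability_coefficient([0, 1, 2, 3], [10, 12, 14, 16])
steady = instability_coefficient([0, 1, 2, 3], [10, 10, 10, 10])
print(growing, steady)
```

A slope that stays positive over a long window corresponds to the unstable case; a slope near zero or below corresponds to the stable case.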
Instability Coefficient
 Simulation of a stable network: instability coefficient = 0
 Open-loop model: session arrival rate 200/sec
Instability Coefficient
 Simulation of an unstable network: instability coefficient > 0
 Open-loop model: session arrival rate 400/sec
Closed-loop model – PS server
 N users: cycles of transfer
and idle periods
 S: average session size
 T_T: average transfer duration
 T_I: average idle time
 T_T increases during congestion
 N_a: number of active sessions
 Offered rate: $R_{offered} = \frac{N S}{T_I + T_T}$
 Normalized load: $\rho = \frac{N S}{C\,T_I}$
 $E[N_a] \approx \frac{\rho}{1-\rho}$ for $\rho < 1$
 $E[N_a] \approx N - \frac{C\,T_I}{S}$ for $\rho > 1$
 Elasticity: $f = 1/(N_a+1)$
 Instability coefficient: cannot be positive
indefinitely ($N_a < N$)
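A small numeric sketch of the closed-loop PS results, assuming the slide's formulas read ρ = NS/(C·T_I), E[N_a] ≈ ρ/(1-ρ) for ρ < 1, and E[N_a] ≈ N - C·T_I/S for ρ > 1. The function name and all parameter values are hypothetical, and both expressions are asymptotic approximations that are poor near ρ = 1.

```python
def closed_loop_ps(n_users, session_size, capacity, idle_time):
    """Return (rho, expected active sessions, elasticity) for a PS server.

    session_size in bits, capacity in bits/sec, idle_time in seconds.
    """
    rho = n_users * session_size / (capacity * idle_time)
    if rho < 1:
        active = rho / (1 - rho)                 # light-load approximation
    else:
        # Overload: about C*T_I/S users are idle, the rest are transferring.
        active = n_users - capacity * idle_time / session_size
    elasticity = 1 / (active + 1)                # f = 1/(N_a + 1)
    return rho, active, elasticity


# Hypothetical link: 100 Mbps, 1 MB sessions, 10 s average idle time.
for users in (100, 200):
    rho, active, f = closed_loop_ps(users, 8e6, 100e6, 10.0)
    print(f"N={users}: rho={rho:.2f}, E[Na]~{active:.1f}, f~{f:.3f}")
```

Because N_a can never exceed N, the number of active sessions saturates instead of growing without bound, which is why the instability coefficient cannot stay positive.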
Open-loop model – PS server
 Poisson session arrivals
 S: average session size
 λ: session arrival rate
 Offered rate: $R_{offered} = \lambda S$
 Offered load: $\rho = \lambda S / C$
 Stable only if $\rho < 1$
 Expected throughput for a new transfer:
 $E[S/T] = C(1-\rho)$ for $\rho < 1$: the available bandwidth
 Elasticity: $f = 0$
 Instability coefficient: > 0 if $\rho > 1$
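The open-loop results can be checked numerically with a short sketch. It takes the offered load as ρ = λS/C and the available bandwidth as C(1-ρ). The growth rate λ - C/S used for the overloaded case is an added assumption (a fluid approximation that holds for exponentially distributed session sizes), not a formula from the slides, and the capacity and session size are hypothetical values chosen so that 200 sessions/sec is stable and 400 sessions/sec is not.

```python
def open_loop_ps(arrival_rate, session_size, capacity):
    """Return (rho, available bandwidth, approximate dN/dt) for a PS server.

    arrival_rate in sessions/sec, session_size in bits, capacity in bits/sec.
    """
    rho = arrival_rate * session_size / capacity
    if rho < 1:
        available = capacity * (1 - rho)   # expected throughput of a new transfer
        growth = 0.0                       # number of active sessions is stationary
    else:
        available = 0.0
        # Fluid approximation (assumption): arrivals minus completions per second.
        growth = arrival_rate - capacity / session_size
    return rho, available, growth


# Hypothetical link: 100 Mbps capacity, 40 kB (320,000-bit) sessions.
for lam in (200, 400):
    rho, available, growth = open_loop_ps(lam, 320_000, 100e6)
    print(f"lambda={lam}/s: rho={rho:.2f}, available={available/1e6:.0f} Mbps, dN/dt~{growth:.1f}/s")
```

In the overloaded case nothing in the model slows the arrivals down, so the backlog of active sessions keeps growing for as long as ρ stays above 1.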
Mixed traffic
 Internet traffic: mix of open-loop and
closed-loop traffic
 Mixed traffic can be characterized by the
Closed-loop Traffic Ratio (CTR):
$CTR = \frac{\text{traffic load from closed-loop model}}{\text{total traffic load}}$
 $f_{mix} = CTR \cdot f_{closed}$
 Instability coefficient of the mix > 0 when $\rho_{open} > 1$
 Not when $\rho_{open} + \rho_{closed} > 1$
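As a worked example of the mixed-traffic relations, the sketch below computes CTR from byte counts, scales the closed-loop elasticity by it, and applies the stability condition on the open-loop load alone. Function names and numbers are hypothetical.

```python
def closed_loop_traffic_ratio(closed_load, open_load):
    """CTR = closed-loop traffic load / total traffic load."""
    total = closed_load + open_load
    if total <= 0:
        raise ValueError("total traffic load must be positive")
    return closed_load / total


def mixed_elasticity(ctr, f_closed):
    """f_mix = CTR * f_closed (the open-loop share contributes f = 0)."""
    return ctr * f_closed


def mix_is_unstable(rho_open):
    """Only the open-loop load decides whether sessions can pile up."""
    return rho_open > 1


ctr = closed_loop_traffic_ratio(closed_load=90.0, open_load=10.0)
print(ctr, mixed_elasticity(ctr, f_closed=0.2), mix_is_unstable(rho_open=0.4))
```

Note that `mix_is_unstable` deliberately ignores the closed-loop load: a mix with ρ_open = 0.4 and ρ_closed = 0.9 exceeds 1 in total but is still not flagged.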
Persistent TCP transfers
 N homogeneous transfers
 Stimulus increases RTT and loss
rate from (T, p) to (T', p')
 UMass model to estimate TCP
average throughput:
$f = 1 - \frac{\frac{N M}{T'}\sqrt{\frac{3}{2bp'}}}{\frac{N M}{T}\sqrt{\frac{3}{2bp}}} = \frac{1}{N+1}$
 Number of transfers remains
constant, i.e., the instability coefficient = 0
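The elasticity expression for persistent flows can be evaluated directly from the square-root throughput formula. In the sketch below, M is read as the segment size and b as the number of packets acknowledged per ACK; those readings and the default b = 2 are assumptions, as are the function names and sample values. N and M cancel in the ratio, so only the changes in RTT and loss rate matter.

```python
import math


def umass_throughput(n_flows, segment_size, rtt, loss_rate, b=2):
    """Aggregate throughput N * (M / T) * sqrt(3 / (2 b p))."""
    if rtt <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt must be positive and loss_rate in (0, 1]")
    return n_flows * segment_size / rtt * math.sqrt(3.0 / (2.0 * b * loss_rate))


def persistent_tcp_elasticity(rtt, loss, rtt_after, loss_after):
    """f = 1 - U'/U, where U and U' come from the square-root formula."""
    before = umass_throughput(1, 1.0, rtt, loss)
    after = umass_throughput(1, 1.0, rtt_after, loss_after)
    return 1.0 - after / before


# Hypothetical stimulus: RTT grows from 100 ms to 120 ms, loss from 1% to 1.5%.
print(f"f = {persistent_tcp_elasticity(0.100, 0.010, 0.120, 0.015):.3f}")
```

The closed form 1/(N+1) on the slide is consistent with equal sharing: if the stimulus behaves like one more flow, N flows that held the whole link end up with N/(N+1) of it.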
Constant-rate UDP transfers
 Fixed number of constant-rate flows
 UDP flows do not react to congestion, and they do
not retransmit lost packets
 Throughput after stimulus: U’= (1-p)U
 Elasticity f = p >0
 Truly congestion responsive traffic should have
larger elasticity than loss rate
 Instability coefficient is zero
 Number of flows does not change during congestion
 Cannot cause congestion collapse
Open-loop TCP transfers
 Poisson stream of TCP flows
 Size uniformly distributed between 16 and 20 pkts
 Arrival rate λ chosen to vary the offered load ρ
 Ideally, f = 0 when ρ < 1
 But negative elasticity is possible with redundant TCP retransmissions
 Increased offered load after stimulus
 Instability coefficient is positive when ρ > 1
 Possible congestion collapse
 Open-loop traffic is the net’s worst enemy
Closed-loop TCP transfers
 When loss rate ~ 0 (i.e., small number of sessions)
 Stimulus increases RTT from T to T'
 Transfer latency increases from kT to kT'
 $f = \frac{k(T' - T)}{kT' + T_I}$
 With small number of active sessions:
 Elasticity: about constant
 With large number of active sessions:
 Elasticity > 1/(Na+1)
 Closed-loop TCP traffic: more elastic than persistent flows
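The low-loss elasticity can be derived in two steps. The sketch assumes N users who each alternate between a transfer of average size S lasting k round trips and an idle period T_I, so that the aggregate throughput is the session size divided by the cycle length, summed over users; N and S are the symbols of the closed-loop model, used here as an assumption about how the slide's result is obtained.

```latex
U = \frac{N S}{kT + T_I}, \qquad U' = \frac{N S}{kT' + T_I}

f = \frac{U - U'}{U}
  = 1 - \frac{kT + T_I}{kT' + T_I}
  = \frac{k\,(T' - T)}{kT' + T_I}
```

N and S cancel, which matches the observation that the elasticity is about constant when the number of active sessions is small.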
Summary
Traffic class                          Elasticity              Stability
Persistent TCP (N homogeneous flows)   elastic, f = 1/(N+1)    stable
UDP const-rate (p: loss rate)          inelastic, f = p        stable
Open-loop TCP                          inelastic, f ≤ 0        unstable if ρ > 1
Closed-loop TCP                        elastic, f > 1/(Na+1)   stable
What to measure?
 Direct elasticity measurements require packet
traces at bottleneck during stimulus
 We have access to only a couple of such links
 Direct measurements of instability coefficient
require packet traces during congestion events
 We have access to only a couple of congested links
 Alternative: Measure CTR (closed-loop traffic
ratio)
 Indirect metric for congestion responsiveness
 High CTR (close to one): mostly closed-loop traffic
 Low CTR (close to zero): mostly open-loop traffic
CTR estimation (overview)
 Start with packet trace from Internet link
 Per-packet: arrival time, src/dst address & ports, size
 Focus only on TCP traffic: HTTP and well-known ports
 Identify users:
 Downloads: user is associated with unique DST address
 Uploads: user is associated with unique SRC address
 Multi-user hosts and NATs are a problem (see paper for
details)
 For each user, identify sessions:
 Session: one or more connections (“jobs”) associated with
same user action
 E.g., Web page download: multiple HTTP connections
 Classify sessions as open-loop or closed-loop:
 Successive sessions from same user: closed-loop
 Session from a new user, or session arriving from known
user after a long idle period: open-loop
From Connections to Jobs to Sessions





An HTTP 1.1 connection can
stay alive across multiple
sessions
Job : Segment of TCP
connection that belongs to a
single session
Intra-job packet
interarrivals: TCP and
network-dependent (short)
Inter-job packet
interarrivals: caused by user
actions (long)
Classify interarrivals based
on Silence Threshold (STH)
1105126179.423931 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478309 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478438 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478554 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.488433 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.488666 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.488918 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.539748 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.539870 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.539993 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.549085 163.157.239.61 127.207.1.255 80 2290 154 T 114
1105126179.549399 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.611572 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.611702 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612235 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612507 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612752 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.613121 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.672432 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
[Figure annotations on the trace above: inter-job gap, intra-job gap]
Silence Threshold (STH) estimation
[Figure labels: inter-job gap, intra-job gap]
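Splitting a connection into jobs can be sketched as follows: packets of one TCP connection are sorted by arrival time, and a new job starts whenever the gap to the previous packet exceeds the silence threshold. The column interpretation in `parse_packet` (timestamp, source, destination, source port, destination port, size) is an assumption based on the per-packet fields listed earlier, and the STH value of 1 second is a placeholder, since the slides do not state the threshold.

```python
from collections import defaultdict


def parse_packet(line):
    """Parse one trace line; the column meanings are assumed, not documented."""
    fields = line.split()
    return {
        "time": float(fields[0]),
        "src": fields[1],
        "dst": fields[2],
        "sport": int(fields[3]),
        "dport": int(fields[4]),
        "size": int(fields[5]),
    }


def split_into_jobs(timestamps, sth):
    """Group packet timestamps into jobs separated by gaps longer than sth."""
    jobs = []
    for ts in sorted(timestamps):
        if jobs and ts - jobs[-1][-1] <= sth:
            jobs[-1].append(ts)      # short gap: same job (TCP/network timing)
        else:
            jobs.append([ts])        # long gap: a user action started a new job
    return jobs


def jobs_per_connection(lines, sth):
    """Return {connection 4-tuple: list of jobs} for a list of trace lines."""
    by_conn = defaultdict(list)
    for line in lines:
        p = parse_packet(line)
        by_conn[(p["src"], p["dst"], p["sport"], p["dport"])].append(p["time"])
    return {conn: split_into_jobs(ts, sth) for conn, ts in by_conn.items()}


trace = [
    "1105126179.423931 163.157.239.61 127.207.1.255 80 2290 1420 T 1380",
    "1105126179.478309 163.157.239.61 127.207.1.255 80 2290 1420 T 1380",
    "1105126179.478554 163.157.239.61 127.207.1.255 80 2289 1420 T 1380",
]
for conn, jobs in jobs_per_connection(trace, sth=1.0).items():
    print(conn, len(jobs), "job(s)")
```

With these three lines the two connections (destination ports 2290 and 2289) each contain a single job, because every gap is far below the placeholder threshold.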
Group jobs from same user in sessions
 Intuition: jobs from
same session will have
short interarrivals
(machine-generated)
 Minimum Session
Interarrival (MSI)
threshold
 MSI aims to distinguish
machine-generated from
user-initiated events

MSI = 1-5 seconds
[Figure: the packet trace shown earlier, annotated with intra-job and inter-job gaps; job interarrivals below MSI stay in the same session, those above MSI start a new one (sessions 1, 2, 3)]
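Grouping jobs into sessions follows the same pattern, one level up: the job start times of a single user are sorted, and a job whose interarrival from the previous job is below MSI joins the current session. The sketch assumes the comparison is made between consecutive job start times; the function name and the sample start times are hypothetical.

```python
def group_jobs_into_sessions(job_starts, msi=1.0):
    """Group one user's job start times (seconds) into sessions.

    A job that starts less than msi seconds after the previous job is treated
    as machine-generated and stays in the same session; otherwise it is
    treated as a new user action and opens a new session.
    """
    sessions = []
    previous = None
    for start in sorted(job_starts):
        if previous is not None and start - previous < msi:
            sessions[-1].append(start)
        else:
            sessions.append([start])
        previous = start
    return sessions


# Hypothetical job start times for one user.
starts = [0.0, 0.2, 0.5, 10.0, 10.3, 100.0]
for i, session in enumerate(group_jobs_into_sessions(starts, msi=1.0), 1):
    print(f"session {i}: {session}")
```

Here the three jobs within half a second form one session, the pair around t = 10 s forms a second, and the isolated job at t = 100 s forms a third.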
Classify sessions as open/closed-loop
 First session from a user is
always open-loop
 Session from a returning user
is also open-loop, if it starts
more than MTT seconds since
completion of last session
 MTT: Maximum Think Time

Typically, MTT would be
several minutes
[Figure: the packet trace shown earlier, annotated with gaps relative to MSI and MTT. Session 1: open-loop; session 2: open-loop (gap > MTT); session 3: closed-loop (gap < MTT)]
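The classification rule and the resulting CTR can be sketched together. Each session is represented as a hypothetical `(start, end, bytes)` tuple; the first session of a user is open-loop, a later session is open-loop if it starts more than MTT after the previous session completed, and closed-loop otherwise. Weighting CTR by bytes is an assumption about how "traffic load" is measured.

```python
def classify_sessions(sessions, mtt=900.0):
    """Label one user's (start, end, bytes) sessions as 'open' or 'closed'."""
    labels = []
    last_end = None
    for start, end, _size in sorted(sessions):
        if last_end is None or start - last_end > mtt:
            labels.append("open")      # new user, or returned after a long idle period
        else:
            labels.append("closed")    # follows the previous session within MTT
        last_end = end if last_end is None else max(last_end, end)
    return labels


def closed_loop_traffic_ratio(sessions_by_user, mtt=900.0):
    """Byte-weighted CTR over all users: closed-loop bytes / total bytes."""
    closed = total = 0
    for sessions in sessions_by_user.values():
        ordered = sorted(sessions)
        for (_, _, size), label in zip(ordered, classify_sessions(ordered, mtt)):
            total += size
            if label == "closed":
                closed += size
    if total == 0:
        raise ValueError("no traffic to classify")
    return closed / total


# Hypothetical sessions (seconds, bytes), with MTT = 15 minutes.
users = {
    "user_a": [(0, 10, 100), (20, 30, 300), (2000, 2010, 100)],
    "user_b": [(5, 15, 500)],
}
print(classify_sessions(users["user_a"]), closed_loop_traffic_ratio(users))
```

Only the second session of `user_a` counts as closed-loop, so 300 of the 1000 bytes are closed-loop traffic.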
Robustness to MSI & MTT thresholds
 Examined CTR variation in
the following ranges:
 MSI: 0.1 sec - 2 sec
 MTT: 10 min - 25 min
 CTR variation < 0.05
 Linear regression:
 ΔCTR/ΔMSI = -0.0044/sec
 ΔCTR/ΔMTT = 0.0037/min
 We use:
 MSI = 1 sec
 MTT = 15 min
Sample CTR measurements
                                                                HTTP download      Well-known ports
Link location           Year  Direction  Duration  TCP GB (%)   Bytes (%)  CTR     Bytes (%)  CTR
Georgia Tech            2005  In         2 hr      129 (97)     44.7       0.90    18.8       0.60
Georgia Tech            2005  Out        2 hr      208 (99)     37.3       0.63    10.6       0.70
Los Nettos              2004  Core       1 hr      59 (95)      36.2       0.93    29.3       0.83
UNC, Chapel Hill        2003  In         1 hr      41 (87)      22.9       0.95    3.6        0.69
UNC, Chapel Hill        2003  Out        1 hr      153 (97)     19.0       0.76    16.8       0.91
Abilene, Indianapolis   2002  Core       1 hr      172 (96)     8.0        0.78    33.9       0.91
Abilene, Indianapolis   2002  Core       1 hr      178 (85)     11.5       0.82    35.8       0.89
Univ. of Auckland, NZ   2001  In         6 hr      0.6 (95)     42.4       0.92    30.6       0.24
Univ. of Auckland, NZ   2001  Out        6 hr      1.4 (98)     70.4       0.79    7.6        0.72
Summary
 Persistent transfers have very different
congestion responsiveness than finite-size
transfers
 Focus on open-loop and closed-loop flow arrivals
 TCP or TCP-like protocols are not sufficient to
avoid congestion collapse
 Negative feedback at session/application layer
holds key for network stability
 Measurements show high CTR values for most
Internet links we examined
 Possibly why Internet is mostly stable
Is AQM an effective controller?
 Active Queue Management (AQM)
 Most AQM models assume persistent TCP flows
 Provides congestion signal to flows
 Stabilizes buffer occupancy
 Controls link utilization
 However, AQM is an ineffective controller in the
presence of open-loop TCP traffic
 Flow arrival process does not react to AQM drops
 Congestion collapse still possible with AQM
Is admission control necessary?
 Admission control is an effective way to
control the offered load with open-loop traffic
 Avoids flow aborts and reattempts
 See proposals by J. Roberts and others
 However, admission control is not required with
closed-loop traffic
 Closed-loop traffic is self-regulating
 As long as the maximum possible number of active
sessions does not exceed a certain threshold
What about TCP-friendliness?
 “TCP friendliness” has been proposed for
all non-TCP traffic as a way to avoid
congestion collapse
 However, like TCP, open-loop TCP friendly
sessions can still cause congestion collapse
 TCP friendliness is more important for
fairness reasons (share bw almost equally
with TCP)
Traffic models for simulations-analysis
 Time to drop the persistent flows assumption!
 It is not realistic
 It has very different congestion responsiveness
than real Internet traffic
 More realistic aggregate traffic models:
 Mix of both open-loop and closed-loop finite-size
sessions
 We need more CTR measurements to characterize
the mix
 We need mathematical models for closed-loop
traffic behavior, considering user behavior under
congestion
Session/application congestion control
 Several existing applications generate sessions
independent of network congestion (bad!)
 Example-1: NNTP servers transfer news periodically
 Example-2: CDN servers exchange content as
needed or periodically
 Client-side control mechanism:
 Do not start new session before current session
completes
 Server-side control mechanism:
 Use admission control when number of active
sessions exceeds threshold