Long-Range Dependence in a Changing Internet Traffic Mix

STATISTICAL and APPLIED MATHEMATICAL
SCIENCES INSTITUTE
Cheolwoo Park, SAMSI
Félix Hernández-Campos and Don Smith, Department of Computer Science, UNC-Chapel Hill
J. S. Marron, Department of Statistics and Operations Research, UNC-Chapel Hill
David Rolls, Department of Mathematics and Statistics, UNC-Wilmington
1
Measurements
• Captured TCP/IP packet headers on a Gigabit Ethernet link (inbound from the Internet)
[Diagram: Internet, 1 Gbps Ethernet link, UNC (~35,000 Internet users), with a monitor (tcpdump) on the link]
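The analyses on the following slides work with packet counts and byte counts over time. A minimal sketch of how captured headers might be aggregated into such series, assuming each header has already been parsed into a (timestamp in seconds, length in bytes) pair. The record layout, the function name `bin_counts`, and the bin width are illustrative assumptions, not details taken from the study.

```python
def bin_counts(records, bin_width=0.01):
    """Aggregate (timestamp, length) records into per-bin packet and byte counts.

    records:   iterable of (timestamp_seconds, length_bytes); hypothetical layout.
    bin_width: bin size in seconds (the 0.01 s default is an assumption).
    """
    records = list(records)
    if not records:
        return [], []
    start = min(t for t, _ in records)
    end = max(t for t, _ in records)
    n_bins = int((end - start) / bin_width) + 1
    packets = [0] * n_bins
    byte_totals = [0] * n_bins
    for t, length in records:
        # Clamp so that floating-point rounding never indexes past the last bin.
        i = min(int((t - start) / bin_width), n_bins - 1)
        packets[i] += 1
        byte_totals[i] += length
    return packets, byte_totals


# Four made-up headers, aggregated into 1-second bins.
example = [(0.0, 100), (0.5, 200), (1.2, 50), (3.9, 1500)]
pkt_series, byte_series = bin_counts(example, bin_width=1.0)
```

The two returned lists are the packet-count and byte-count series for the same bins, so they can be analyzed side by side.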
2
Summary data
• Two-hour traces, 2nd week in April of 2002 and 2003
– 5:00 AM, 10:00 AM, 3:00 PM, 9:30 PM on each of 7 days
– 28 traces (56 hours) per year
• Totals by year:
                                     2002 traces       2003 traces
  Packets                            ~5 billion        ~10 billion
  Network traffic                    ~1.6 terabytes    ~2.9 terabytes
  TCP packets                        ~95%              ~75%
  UDP packets                        ~5%               ~25%
  TCP bytes                          ~93%              ~86%
  UDP bytes                          ~7%               ~14%
  Max 2-hr. mean link utilization    10%               18%
  Packets dropped by monitor         0.01%-0.16%       0
3
Hurst parameter (H) estimates
and confidence intervals
• H was estimated with wavelet analysis tools (the “logscale diagrams” of D. Veitch).
• H estimates for 2003 packet counts were significantly lower than for 2002 (not true for byte counts).
• Several traces had H > 1 or very wide confidence intervals.
• H estimates did not depend on time of day or day of week (for both packets and bytes) in either year.
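To make the logscale-diagram idea concrete, here is a simplified sketch of a wavelet-based Hurst estimate. It is not the tool used in the study: it assumes a Haar wavelet, weights each octave by its number of coefficients, and omits the bias correction and confidence intervals that a full implementation would provide. The function names are hypothetical. The estimate uses the standard relation H = (slope + 1) / 2, where the slope is that of log2 (mean squared detail coefficient) plotted against octave.

```python
import math
import random


def haar_logscale(x, max_octave):
    """Return [(octave j, n_j, log2 of mean squared Haar detail)] for j = 1..max_octave."""
    approx = list(x)
    points = []
    for j in range(1, max_octave + 1):
        pairs = len(approx) // 2
        if pairs < 2:
            break
        detail = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        energy = sum(d * d for d in detail) / pairs
        if energy > 0:  # skip degenerate octaves (e.g. a constant series)
            points.append((j, pairs, math.log2(energy)))
    return points


def hurst_from_logscale(points):
    """Weighted least-squares slope over the given octaves, mapped to H = (slope + 1) / 2."""
    total = sum(n for _, n, _ in points)
    j_bar = sum(n * j for j, n, _ in points) / total
    y_bar = sum(n * y for _, n, y in points) / total
    num = sum(n * (j - j_bar) * (y - y_bar) for j, n, y in points)
    den = sum(n * (j - j_bar) ** 2 for j, n, _ in points)
    return (num / den + 1) / 2


# White noise has no long-range dependence, so the estimate should be near 0.5.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 13)]
h_noise = hurst_from_logscale(haar_logscale(noise, 8))

# A random walk is nonstationary; this kind of estimator then reports H > 1.
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)
h_walk = hurst_from_logscale(haar_logscale(walk, 8))
```

The random-walk case illustrates one reason an estimate above 1 deserves a closer look: it can signal trends or other nonstationarity rather than a stationary long-range-dependent process.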
4
H not related to link utilization
or active TCP connections
5
Extreme examples of H > 1 or
wide confidence intervals
Trace                       Type      H and CI
Wednesday 10:00 AM, 2002    packets   0.84 [0.44, 1.24]
Wednesday 10:00 AM, 2002    bytes     0.65 [0.26, 1.05]
Wednesday 3:00 PM, 2002     bytes     1.23 [1.11, 1.34]
Monday 10:00 AM, 2003       packets   1.31 [1.13, 1.49]
Friday 3:00 PM, 2003        bytes     0.87 [0.46, 1.27]
Saturday 10:00 AM, 2003     packets   1.18 [0.98, 1.39]
Table I. Traces with H estimates or CI ranges that represent extreme examples
6
Dependent SiZer analysis of
wide CI example
• Test for statistically significant differences from a fractional Gaussian noise (FGN) process with parameters estimated from the data (H = 0.8)
• Top: local linear smoothing of the data with different window widths
• Bottom: statistical inference on trends of the smoothed curve at each window width
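The smoothing step in the top panel can be sketched as follows. This is a minimal illustration of local linear smoothing at several window widths, assuming a Gaussian kernel. The function names are hypothetical, and the inference step of dependent SiZer (testing the fitted slopes against an FGN null) is not implemented here.

```python
import math


def local_linear(xs, ys, x0, h):
    """Local linear fit at x0 with a Gaussian kernel of bandwidth h.

    Returns (level, slope): the smoothed value and the fitted slope at x0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        u = x - x0
        w = math.exp(-0.5 * (u / h) ** 2)
        s0 += w
        s1 += w * u
        s2 += w * u * u
        t0 += w * y
        t1 += w * u * y
    # Solve the 2x2 weighted least-squares normal equations.
    det = s0 * s2 - s1 * s1
    level = (s2 * t0 - s1 * t1) / det
    slope = (s0 * t1 - s1 * t0) / det
    return level, slope


def smooth_family(xs, ys, bandwidths):
    """One smoothed curve per bandwidth, each evaluated at every x."""
    return {h: [local_linear(xs, ys, x0, h) for x0 in xs] for h in bandwidths}


# On exactly linear data, a local linear fit recovers the line at every bandwidth.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
family = smooth_family(xs, ys, bandwidths=[2.0, 5.0, 20.0])
```

Small bandwidths follow fine-scale variation and large ones show only coarse trends, which is why the curves are examined across a whole family of widths rather than at a single one.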
7
Dependent SiZer analysis of H > 1
example
• Analysis shows both non-linear trends and greater variability than an FGN process at many time scales
8
Logscale diagram of typical
2002 and 2003 traces
• Protocol-dependent analysis was suggested by the increase in UDP traffic.
• Filtered the traces to create new traces: TCP only and UDP only.
• TCP is the dominant influence in all cases except 2003 packet counts, where UDP dominates.
• A sharp increase at the middle scales shapes the H estimate (less slope, so lower H).
9
Same conclusion for all traces.
Why?
10
The Blubster effect (2003’s hot new
peer-to-peer file sharing application)
• Recall that UDP packets increased to 25% of 2003 packets (but only 14% of bytes).
• Analysis of the UDP packets found that 70% came from a single application (Blubster) in 2003; that application was negligible in 2002.
• Second filtering: make Blubster-only and “Rest” (TCP + other UDP) traces.
• Blubster alone dominated the H estimate for packets, but not for bytes.
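The two filtering passes (TCP-only versus UDP-only, then one application versus the rest) can be sketched as below. The slides do not say how Blubster packets were identified, so matching on a UDP port is an assumption, and the `Packet` layout, the function names, and the port value 50000 are hypothetical placeholders.

```python
from collections import namedtuple

# Hypothetical parsed-header record; not the format used in the study.
Packet = namedtuple("Packet", "time proto src_port dst_port length")


def split_by_protocol(packets):
    """First filtering pass: TCP-only and UDP-only traces."""
    tcp = [p for p in packets if p.proto == "TCP"]
    udp = [p for p in packets if p.proto == "UDP"]
    return tcp, udp


def split_app_vs_rest(packets, app_port):
    """Second filtering pass: one UDP application versus everything else.

    Assumes the application can be recognized by a UDP port number.
    """
    app, rest = [], []
    for p in packets:
        is_app = p.proto == "UDP" and app_port in (p.src_port, p.dst_port)
        (app if is_app else rest).append(p)
    return app, rest


trace = [
    Packet(0.0, "TCP", 80, 1234, 1500),
    Packet(0.1, "UDP", 50000, 50000, 60),  # placeholder port, not a real value
    Packet(0.2, "UDP", 53, 4000, 90),
]
tcp_only, udp_only = split_by_protocol(trace)
app_only, rest = split_app_vs_rest(trace, app_port=50000)
```

Each filtered trace can then be turned into its own packet-count and byte-count series and analyzed separately.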
11
Why?
Blubster’s packet traffic is periodic
• SiZer analysis of the Blubster trace, looking for structure beyond white noise
• Found high-frequency variability with periods in the 1-5 second range (caused by update and search queries among peers)
• These correspond to the time scales in the logscale diagram where UDP dominates the wavelet coefficients.
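The study found the periodicity with SiZer. As a simpler stand-in that shows what a period in the 1-5 second range looks like in a count series, the sketch below uses the sample autocorrelation instead. The function names, the bin width, and the synthetic series are assumptions made for illustration.

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + lag] for i in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


def dominant_period(x, bin_width, min_lag, max_lag):
    """Return (period_seconds, correlation) at the strongest lag in [min_lag, max_lag]."""
    acf = autocorrelation(x, max_lag)
    best = max(range(min_lag, max_lag + 1), key=lambda k: acf[k])
    return best * bin_width, acf[best]


# Synthetic packet counts in 0.1 s bins with a 2-second cycle (20 bins per cycle).
counts = [10.0 + 5.0 * math.sin(2.0 * math.pi * i / 20.0) for i in range(400)]
period, strength = dominant_period(counts, bin_width=0.1, min_lag=5, max_lag=35)
```

Searching only lags that correspond to the time scales of interest keeps the trivial peak at lag 0 out of the result.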
12
Results summary
• We presented results from a study of traffic on the UNC Internet link in two years, 2002 and 2003.
• A single application generating about 18% of packets (roughly 70% of the 25% UDP share) and < 10% of bytes in the traces can strongly influence the H estimate (in this case, because of periodic behavior).
• A significant number of traces produced H estimates > 1 or wide confidence intervals.
• Dependent SiZer is an effective tool for augmenting wavelet analysis and understanding structure in Internet data.
• H was not related to time of day, day of week, link utilization, or number of active TCP connections.
13