PPT

advertisement
Estimating TCP Latency
Approximately with
Passive Measurements
Sriharsha Gangam, Jaideep
Chandrashekar, Ítalo Cunha, Jim Kurose
Motivation: Network
Troubleshooting
?
?
?
Latency and network
performance problems?
Where? Why?
• Home network or access link?
o WiFi signal or access link?
• ISP, enterprise and content provider networks
o Within the network or outside?
2
Passive Measurement
in the Middle
Passive measurement
device (gateway, router)
A
B
Challenge: Constrained Resources
• Decompose path latency of TCP flows
o Latency: Important interactive web-applications
o Other approaches: active probing, NetFlow
• Challenges
o Constrained devices and high data rates
3
TCP Latency
Source
TS1
TM1
TD1
Time
• Existing methods:
Emulate TCP state for
each flow
Destination
TD2
• Accurate estimates but
expensive
TM2
TS2
RTT = TM2 – TM1
4
TCP Latency
Source
• Scaling challenges
Large TCP
state
Time
o Large packet and flow
arrivals
o Constrained devices
(limited memory/CPU)
Destination
• ALE – Approximate
Latency Estimator
Expensive
lookup
5
Design Space
Ideal
solution
Accuracy
Existing solutions
e.g., tcptrace
ALE Goals:
Configurable
tradeoff
Expensive for
constrained devices
Overhead
6
ALE Overview
• Eliminate packet
timestamps
Source
Destination
o Group into time intervals
with granularity (w)
o SEG: 0-99, ACK: 100
Time
• Store expected ACK
ΔT = w
ΔT = w
• Use probabilistic set
membership data
structure
o Counting Bloom Filters
7
ALE Overview
• Sliding window of buckets (time intervals)
o Buckets contain a counting bloom filter (CBF)
RTT = Δ + w + w/2 Lookup
hash(inv(FlowId),
100)
hash(FlowId, 99+1)
i
i
i
i
S 0, 99 A 100
w
w/2 Match Max Error: w/2
i Δ
CBF
Current Bucket
TALE
CBF
CBF
CBF
w
w
w
Missed
Latencies
W
8
Controlling Error with
ALE Parameters
ALE-U (uniform)
Number of buckets = W/w
Increase W:
Higher
Coverage
W
w
Decrease w: Higher Accuracy
9
ALE-Exponential (ALE-E)
• Process large and small latency flows simultaneously
• Absolute error is proportional to the latency
• Larger buckets shift slowly
w
A
D
B
E
F
G
C
2w
4w
DC
BA
DCBA
Counting
Bloom Filter
(CBF) Merge
10
Error Sources
• Bloom filters are probabilistic structures
o False positives and negatives
Artifacts from TCP
• Retransmitted packets and Reordered packets
o Excess (to tcptrace) erroneous RTT samples
• ACK numbers not on SEQ boundaries, Cumulative ACKs
11
Evaluation Methodology
• 2 x Tier 1 backbone link traces (60s each) from
CAIDA
o Flows: 2,423,461, Packets: 54,089,453
o Bi-directional Flows: 93,791, Packets: 5,060,357
• Ground truth/baseline comparison: tcptrace
o Emulates TCP state machine
• Latencies ALE and tcptrace
o Packet and flow level
• Overhead of ALE and tcptrace
o Memory and compute
12
Latency Distribution
300 ms(%)
60 ms 120 ms Frequency
500 ms
60
40
20
0
0
100
200 300 400
Latency (ms)
500
Majority of the
latencies
600
13
Accuracy: RTT Samples
• Range 0-60 ms
• Box plot of RTT
differences
• More memory =>
Closer to tcptrace
Exact match with tcptrace
Increasing overhead
(smaller ‘w’)
14
Accuracy: RTT Samples
ALE-E(12) outperforms ALE-U(24)
for small latencies
Increasing overhead
Exact match with tcptrace
15
Accuracy: Flow Statistics
CDF
• Many applications use
aggregated flow
statics
• Small w (more buckets)
o Median flow latencies
approach tcptrace
Exact match with tcptrace
16
Accuracy: Flow Statistics
Small w: Standard
deviation of flow
latencies approach
tcptrace
Exact match with tcptrace
17
Compute and Memory
Overhead
• Sample flows uniformly at random at rates 0.1, 0.2, .
. . 0.7
o 5 pcap sub-traces per rate
o Higher sampling rate (data rate) ⇒ More state for tcptrace
• GNU Linux taskstats API
o Compute time and memory usage
18
Compute Time
Time (s)
0
20
40
60
80
TCPTRACE
U(96)
0.1 0.2 0.3 0.4 0.5 0.6 0.7
Sampling Rate
ALE scales with
increasing data rates
19
Memory Overhead
TCPTRACE
ALE-U(96)
• RSS memory: ≈64 MB
(0.1) to ≈460 MB (0.7)
• RSS memory: 2.0 MB for
all Sampling rates
• VSS memory: ≈74 MB
(0.1) to ≈468 MB (0.7)
• VSS memory: 9.8 MB for
all sampling rates
ALE uses a fixed size data structure
20
Conclusions
• Current TCP measurements emulate state
o Expensive: high data rates, constrained devices
• ALE provides low overhead data structure
o Sliding window of CBF buckets
• Improvements over compute and memory with
tcptrace sacrificing small accuracy
o Tunable parameters
• Simple hash functions and counters
o Hardware Implementation
21
Thank You
22
ALE Compute and
Memory Bounds
• Insertion: O(h) time
o CBF with h hash functions
• Matching ACK number: O(hn) time
o ‘n’ buckets
• Shifting buckets: O(1) time
o Linked list of buckets
• ALE-E: O(C) time to merge CBFs
o ‘C’ counters in the CBF
• Memory usage: n × C × d bits
o ‘d‘ bit CBF counters
23
ALE Error Bounds
• ALE-U: Average case error w/4
• ALE-U: Worst case error w/2
• ALE-E: Average case error (3w/16)2i
• ALE-E: Worst case error (2i−1w)
o ACK has a match in bucket i
24
ALE Parameters
•
•
•
•
‘w’ on accuracy requirements
‘W’ estimate of the maximum latencies expected
CBFs with h = 4 hash functions
C on traffic rate, w, false positive rate and h
• E.g., For w = 20 ms, h = 4, and m = R × w, the
constraint for optimal h (h = (C/m) ln 2) yields C =
40396 counters
25
Compute Time
Time (s)
0
50 0
100
0
150
0
20 0
0
25 0
0
0.1
10 min trace
0.2
TCPTRACE
U(96)
0.5
0.4
0.3
Sampling Rate
ALE scales with
increasing data rates
0.6
0.7
26
Accuracy: RTT Samples
EXCESS MISSED VALID.SAMPLES
(0,60] (60,120] (120,300](300,2e+03
Number of Samples
0e+ E(12U(12U(24U(48U(96
)
)
)
)
1e+00 )
2e+05
3e+05
4e+05
05
2 00 0
00
40
0
6 0000
8 0000
00
5 00 0
10 0 0 0
15 0 0 0 0
000
0e +
1e+00
2e+05
3e+05
05
27
Memory Overhead
Maximum Virtual Memory (MB)
0
100
200
3 00
Maximum Resident Memory (MB)
4 00
0
10 0
2 00
300
4 00
TCPTRACE
U(96)
U(96)
0.1 0.2 0.3 0.4 0.5 0.6 0.7
Sample Rate
TCPTRACE
0.1 0.2 0.3 0.4 0.5 0.6 0.7
Sample Rate
28
ALE Overview
• Eliminate packet timestamps
o Time quantization: Use fixed number of intervals
S
S
S
0, 99
100,
199
200, 299
S
300, 399
400,
S
499
S 500, 599
T1
S
600, 699
700,
S
799
S 800, 899
T3
T2
• Store expected acknowledgement number
S
100
S
400
S
700
S
200
S
500
S
800
S
300
S
600
S
900
T1
T2 = T1 + w
T3 = T1 + 2w
29
Motivation: Network
Troubleshooting
Latency Sensitive
Applications
Latency and network
performance problems?
Where? Why?
30
Download