Estimating TCP Latency Approximately with Passive Measurements Sriharsha Gangam, Jaideep Chandrashekar, Ítalo Cunha, Jim Kurose Motivation: Network Troubleshooting ? ? ? Latency and network performance problems? Where? Why? • Home network or access link? o WiFi signal or access link? • ISP, enterprise and content provider networks o Within the network or outside? 2 Passive Measurement in the Middle Passive measurement device (gateway, router) A B Challenge: Constrained Resources • Decompose path latency of TCP flows o Latency: Important interactive web-applications o Other approaches: active probing, NetFlow • Challenges o Constrained devices and high data rates 3 TCP Latency Source TS1 TM1 TD1 Time • Existing methods: Emulate TCP state for each flow Destination TD2 • Accurate estimates but expensive TM2 TS2 RTT = TM2 – TM1 4 TCP Latency Source • Scaling challenges Large TCP state Time o Large packet and flow arrivals o Constrained devices (limited memory/CPU) Destination • ALE – Approximate Latency Estimator Expensive lookup 5 Design Space Ideal solution Accuracy Existing solutions e.g., tcptrace ALE Goals: Configurable tradeoff Expensive for constrained devices Overhead 6 ALE Overview • Eliminate packet timestamps Source Destination o Group into time intervals with granularity (w) o SEG: 0-99, ACK: 100 Time • Store expected ACK ΔT = w ΔT = w • Use probabilistic set membership data structure o Counting Bloom Filters 7 ALE Overview • Sliding window of buckets (time intervals) o Buckets contain a counting bloom filter (CBF) RTT = Δ + w + w/2 Lookup hash(inv(FlowId), 100) hash(FlowId, 99+1) i i i i S 0, 99 A 100 w w/2 Match Max Error: w/2 i Δ CBF Current Bucket TALE CBF CBF CBF w w w Missed Latencies W 8 Controlling Error with ALE Parameters ALE-U (uniform) Number of buckets = W/w Increase W: Higher Coverage W w Decrease w: Higher Accuracy 9 ALE-Exponential (ALE-E) • Process large and small latency flows simultaneously • Absolute error is proportional to the latency • Larger buckets shift slowly w A D B E F G C 2w 4w DC BA DCBA Counting Bloom Filter (CBF) Merge 10 Error Sources • Bloom filters are probabilistic structures o False positives and negatives Artifacts from TCP • Retransmitted packets and Reordered packets o Excess (to tcptrace) erroneous RTT samples • ACK numbers not on SEQ boundaries, Cumulative ACKs 11 Evaluation Methodology • 2 x Tier 1 backbone link traces (60s each) from CAIDA o Flows: 2,423,461, Packets: 54,089,453 o Bi-directional Flows: 93,791, Packets: 5,060,357 • Ground truth/baseline comparison: tcptrace o Emulates TCP state machine • Latencies ALE and tcptrace o Packet and flow level • Overhead of ALE and tcptrace o Memory and compute 12 Latency Distribution 300 ms(%) 60 ms 120 ms Frequency 500 ms 60 40 20 0 0 100 200 300 400 Latency (ms) 500 Majority of the latencies 600 13 Accuracy: RTT Samples • Range 0-60 ms • Box plot of RTT differences • More memory => Closer to tcptrace Exact match with tcptrace Increasing overhead (smaller ‘w’) 14 Accuracy: RTT Samples ALE-E(12) outperforms ALE-U(24) for small latencies Increasing overhead Exact match with tcptrace 15 Accuracy: Flow Statistics CDF • Many applications use aggregated flow statics • Small w (more buckets) o Median flow latencies approach tcptrace Exact match with tcptrace 16 Accuracy: Flow Statistics Small w: Standard deviation of flow latencies approach tcptrace Exact match with tcptrace 17 Compute and Memory Overhead • Sample flows uniformly at random at rates 0.1, 0.2, . . . 0.7 o 5 pcap sub-traces per rate o Higher sampling rate (data rate) ⇒ More state for tcptrace • GNU Linux taskstats API o Compute time and memory usage 18 Compute Time Time (s) 0 20 40 60 80 TCPTRACE U(96) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Sampling Rate ALE scales with increasing data rates 19 Memory Overhead TCPTRACE ALE-U(96) • RSS memory: ≈64 MB (0.1) to ≈460 MB (0.7) • RSS memory: 2.0 MB for all Sampling rates • VSS memory: ≈74 MB (0.1) to ≈468 MB (0.7) • VSS memory: 9.8 MB for all sampling rates ALE uses a fixed size data structure 20 Conclusions • Current TCP measurements emulate state o Expensive: high data rates, constrained devices • ALE provides low overhead data structure o Sliding window of CBF buckets • Improvements over compute and memory with tcptrace sacrificing small accuracy o Tunable parameters • Simple hash functions and counters o Hardware Implementation 21 Thank You 22 ALE Compute and Memory Bounds • Insertion: O(h) time o CBF with h hash functions • Matching ACK number: O(hn) time o ‘n’ buckets • Shifting buckets: O(1) time o Linked list of buckets • ALE-E: O(C) time to merge CBFs o ‘C’ counters in the CBF • Memory usage: n × C × d bits o ‘d‘ bit CBF counters 23 ALE Error Bounds • ALE-U: Average case error w/4 • ALE-U: Worst case error w/2 • ALE-E: Average case error (3w/16)2i • ALE-E: Worst case error (2i−1w) o ACK has a match in bucket i 24 ALE Parameters • • • • ‘w’ on accuracy requirements ‘W’ estimate of the maximum latencies expected CBFs with h = 4 hash functions C on traffic rate, w, false positive rate and h • E.g., For w = 20 ms, h = 4, and m = R × w, the constraint for optimal h (h = (C/m) ln 2) yields C = 40396 counters 25 Compute Time Time (s) 0 50 0 100 0 150 0 20 0 0 25 0 0 0.1 10 min trace 0.2 TCPTRACE U(96) 0.5 0.4 0.3 Sampling Rate ALE scales with increasing data rates 0.6 0.7 26 Accuracy: RTT Samples EXCESS MISSED VALID.SAMPLES (0,60] (60,120] (120,300](300,2e+03 Number of Samples 0e+ E(12U(12U(24U(48U(96 ) ) ) ) 1e+00 ) 2e+05 3e+05 4e+05 05 2 00 0 00 40 0 6 0000 8 0000 00 5 00 0 10 0 0 0 15 0 0 0 0 000 0e + 1e+00 2e+05 3e+05 05 27 Memory Overhead Maximum Virtual Memory (MB) 0 100 200 3 00 Maximum Resident Memory (MB) 4 00 0 10 0 2 00 300 4 00 TCPTRACE U(96) U(96) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Sample Rate TCPTRACE 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Sample Rate 28 ALE Overview • Eliminate packet timestamps o Time quantization: Use fixed number of intervals S S S 0, 99 100, 199 200, 299 S 300, 399 400, S 499 S 500, 599 T1 S 600, 699 700, S 799 S 800, 899 T3 T2 • Store expected acknowledgement number S 100 S 400 S 700 S 200 S 500 S 800 S 300 S 600 S 900 T1 T2 = T1 + w T3 = T1 + 2w 29 Motivation: Network Troubleshooting Latency Sensitive Applications Latency and network performance problems? Where? Why? 30