PPT

Estimating TCP Latency Approximately with Passive Measurements Sriharsha Gangam, Jaideep Chandrashekar, Ítalo Cunha, Jim Kurose Motivation: Network Troubleshooting ? ? ? Latency and network performance problems? Where? Why? • Home network or access link? o WiFi signal or access link? • ISP, enterprise and content provider networks o Within the network or outside? 2 Passive Measurement in the Middle Passive measurement device (gateway, router) A B Challenge: Constrained Resources • Decompose path latency of TCP flows o Latency: Important interactive web-applications o Other approaches: active probing, NetFlow • Challenges o Constrained devices and high data rates 3 TCP Latency Source TS1 TM1 TD1 Time • Existing methods: Emulate TCP state for each flow Destination TD2 • Accurate estimates but expensive TM2 TS2 RTT = TM2 – TM1 4 TCP Latency Source • Scaling challenges Large TCP state Time o Large packet and flow arrivals o Constrained devices (limited memory/CPU) Destination • ALE – Approximate Latency Estimator Expensive lookup 5 Design Space Ideal solution Accuracy Existing solutions e.g., tcptrace ALE Goals: Configurable tradeoff Expensive for constrained devices Overhead 6 ALE Overview • Eliminate packet timestamps Source Destination o Group into time intervals with granularity (w) o SEG: 0-99, ACK: 100 Time • Store expected ACK ΔT = w ΔT = w • Use probabilistic set membership data structure o Counting Bloom Filters 7 ALE Overview • Sliding window of buckets (time intervals) o Buckets contain a counting bloom filter (CBF) RTT = Δ + w + w/2 Lookup hash(inv(FlowId), 100) hash(FlowId, 99+1) i i i i S 0, 99 A 100 w w/2 Match Max Error: w/2 i Δ CBF Current Bucket TALE CBF CBF CBF w w w Missed Latencies W 8 Controlling Error with ALE Parameters ALE-U (uniform) Number of buckets = W/w Increase W: Higher Coverage W w Decrease w: Higher Accuracy 9 ALE-Exponential (ALE-E) • Process large and small latency flows simultaneously • Absolute error is proportional to the latency • Larger buckets shift slowly w A D B E F G C 2w 4w DC BA DCBA Counting Bloom Filter (CBF) Merge 10 Error Sources • Bloom filters are probabilistic structures o False positives and negatives Artifacts from TCP • Retransmitted packets and Reordered packets o Excess (to tcptrace) erroneous RTT samples • ACK numbers not on SEQ boundaries, Cumulative ACKs 11 Evaluation Methodology • 2 x Tier 1 backbone link traces (60s each) from CAIDA o Flows: 2,423,461, Packets: 54,089,453 o Bi-directional Flows: 93,791, Packets: 5,060,357 • Ground truth/baseline comparison: tcptrace o Emulates TCP state machine • Latencies ALE and tcptrace o Packet and flow level • Overhead of ALE and tcptrace o Memory and compute 12 Latency Distribution 300 ms(%) 60 ms 120 ms Frequency 500 ms 60 40 20 0 0 100 200 300 400 Latency (ms) 500 Majority of the latencies 600 13 Accuracy: RTT Samples • Range 0-60 ms • Box plot of RTT differences • More memory => Closer to tcptrace Exact match with tcptrace Increasing overhead (smaller ‘w’) 14 Accuracy: RTT Samples ALE-E(12) outperforms ALE-U(24) for small latencies Increasing overhead Exact match with tcptrace 15 Accuracy: Flow Statistics CDF • Many applications use aggregated flow statics • Small w (more buckets) o Median flow latencies approach tcptrace Exact match with tcptrace 16 Accuracy: Flow Statistics Small w: Standard deviation of flow latencies approach tcptrace Exact match with tcptrace 17 Compute and Memory Overhead • Sample flows uniformly at random at rates 0.1, 0.2, . . . 0.7 o 5 pcap sub-traces per rate o Higher sampling rate (data rate) ⇒ More state for tcptrace • GNU Linux taskstats API o Compute time and memory usage 18 Compute Time Time (s) 0 20 40 60 80 TCPTRACE U(96) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Sampling Rate ALE scales with increasing data rates 19 Memory Overhead TCPTRACE ALE-U(96) • RSS memory: ≈64 MB (0.1) to ≈460 MB (0.7) • RSS memory: 2.0 MB for all Sampling rates • VSS memory: ≈74 MB (0.1) to ≈468 MB (0.7) • VSS memory: 9.8 MB for all sampling rates ALE uses a fixed size data structure 20 Conclusions • Current TCP measurements emulate state o Expensive: high data rates, constrained devices • ALE provides low overhead data structure o Sliding window of CBF buckets • Improvements over compute and memory with tcptrace sacrificing small accuracy o Tunable parameters • Simple hash functions and counters o Hardware Implementation 21 Thank You 22 ALE Compute and Memory Bounds • Insertion: O(h) time o CBF with h hash functions • Matching ACK number: O(hn) time o ‘n’ buckets • Shifting buckets: O(1) time o Linked list of buckets • ALE-E: O(C) time to merge CBFs o ‘C’ counters in the CBF • Memory usage: n × C × d bits o ‘d‘ bit CBF counters 23 ALE Error Bounds • ALE-U: Average case error w/4 • ALE-U: Worst case error w/2 • ALE-E: Average case error (3w/16)2i • ALE-E: Worst case error (2i−1w) o ACK has a match in bucket i 24 ALE Parameters • • • • ‘w’ on accuracy requirements ‘W’ estimate of the maximum latencies expected CBFs with h = 4 hash functions C on traffic rate, w, false positive rate and h • E.g., For w = 20 ms, h = 4, and m = R × w, the constraint for optimal h (h = (C/m) ln 2) yields C = 40396 counters 25 Compute Time Time (s) 0 50 0 100 0 150 0 20 0 0 25 0 0 0.1 10 min trace 0.2 TCPTRACE U(96) 0.5 0.4 0.3 Sampling Rate ALE scales with increasing data rates 0.6 0.7 26 Accuracy: RTT Samples EXCESS MISSED VALID.SAMPLES (0,60] (60,120] (120,300](300,2e+03 Number of Samples 0e+ E(12U(12U(24U(48U(96 ) ) ) ) 1e+00 ) 2e+05 3e+05 4e+05 05 2 00 0 00 40 0 6 0000 8 0000 00 5 00 0 10 0 0 0 15 0 0 0 0 000 0e + 1e+00 2e+05 3e+05 05 27 Memory Overhead Maximum Virtual Memory (MB) 0 100 200 3 00 Maximum Resident Memory (MB) 4 00 0 10 0 2 00 300 4 00 TCPTRACE U(96) U(96) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Sample Rate TCPTRACE 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Sample Rate 28 ALE Overview • Eliminate packet timestamps o Time quantization: Use fixed number of intervals S S S 0, 99 100, 199 200, 299 S 300, 399 400, S 499 S 500, 599 T1 S 600, 699 700, S 799 S 800, 899 T3 T2 • Store expected acknowledgement number S 100 S 400 S 700 S 200 S 500 S 800 S 300 S 600 S 900 T1 T2 = T1 + w T3 = T1 + 2w 29 Motivation: Network Troubleshooting Latency Sensitive Applications Latency and network performance problems? Where? Why? 30

PPT

Related documents

Products

Support

PPT

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib