Two Samples are Enough: Opportunistic Flow-level Latency Estimation using NetFlow
Myungjin Lee†, Nick Duffield‡, Ramana Rao Kompella†
†Purdue University, ‡AT&T Labs–Research

Per-hop Measurements are Important
[Figure: an IPTV/VoIP/VoD server in AS 1 sends traffic through routers R1, R2, R3 toward AS 2; the path shows 100 ms of latency. Annotations: "Why 100 ms?!", "Which router causes the problem??"]

Aggregate vs. Per-Flow
[Figure: the same path from AS 1 to AS 2 through R1, R2, R3, with aggregate per-hop latencies of 10 ms and 5 ms. Annotations: "Why 100 ms?!", "Aggregate latencies look all right. Why?"]
- Flow-level latency measurements on a per-hop basis

Existing Approaches
- Active probes and tomography
  - Chen et al. [SIGCOMM'04], Duffield et al. [IMC'03]
  - Problems
    - Problem formulation is under-constrained
    - No per-flow latency measurements
- Lossy Difference Aggregator (LDA)
  - Kompella et al. [SIGCOMM'09]
  - Problems
    - Requires hardware modification
    - No per-flow latency measurements

Basic Framework: NetFlow
- A measurement framework widely deployed in routers
- Maintains per-flow state in the form of a flow record
  - Packet and byte counts
  - Flow duration (flow start and end timestamps)
- Usage
  - Normally used for accounting, traffic matrix estimation, etc.
  - Does not support per-flow latency measurements
- Goal: Enable per-flow latency estimation
  - Harness flow start and end timestamps in the NetFlow framework

Obtaining Two Delay Samples
[Figure: NetFlow instances at points A and B (routers R1 and R2, on the path from the IPTV/VoIP/VoD server in AS 1 toward AS 2) each keep a flow record with Flow ID, Start TS, and End TS. The difference between the two start timestamps gives Delay 1, and the difference between the two end timestamps gives Delay 2.]
- Two delay samples per flow

Problem 1: Independent Packet Sampling
[Figure: flow records (Flow ID, Start TS, End TS) at A and B. A only updates on the 1st packet; B only updates on the 2nd packet.]
- No coordination between NetFlow instances

Solution: Hash-based Sampling
[Figure: packets are mapped into a hash space, part of which is the sampling space. Packet labels: "Sampled at both NetFlow instances", "Not sampled at both NetFlow instances".]
- Hash-based sampling achieves coordination

Problem 2: Packet Loss
[Figure: flow records (Flow ID, Start TS, End TS) at A and B. A updates on both packets; the 2nd packet is lost (X) between R1 and R2, so B only updates on the 1st packet.]
- Packet losses may cause inconsistencies

Solution: Packet Digests
[Figure: flow records at A and B extended with Start PD and End PD columns. After a packet loss (X), the packet digests at A and B differ. Annotation: "Detect unusable timestamp".]
- Packet digest achieves packet association

Consistent NetFlow (CNF) Architecture
- Issue I: No coordination between NetFlow instances
  - Solution: Hash-based sampling (filtering) in RFC 5475
    - Same packets are selected on different NetFlow instances
    - IETF PSAMP working group
- Issue II: Packet losses
  - Different packets are sampled on different NetFlow instances
  - Discrepancy in selected packets due to packet losses
  - Solution: Maintaining packet digests
    - Hash of the invariant packet contents
    - Use timestamps iff packet digests match at the two routers

Trivial Estimator: Endpoint
- Use two delay samples belonging to the same flow
- Obtain accurate latency estimates for small flows
- Problem: Accuracy penalty for large flows
- Solution: Multiflow estimator
[Figure: flows (Flow ID) plotted against Time, with the two delay samples of one flow highlighted. Endpoint estimator = Avg(the flow's two delay samples).]

Better Estimator: Multiflow
- Key insight:
  - Packets experiencing the same queuing busy periods will experience similar delays
- Use background delay samples from other flows
- Use only delay samples between the start and end of a flow
[Figure: flows (Flow ID) plotted against Time. Multiflow estimator = Avg(the flow's own delay samples and the delay samples of other flows that fall between its start and end).]

Evaluation
- Simulation setting
- Endpoint vs. Multiflow estimators
- Comparison with Trajectory sampling

Simulation Setting
- Modified YAF
  - Simulate a queuing model with RED active queue management policy
- Dataset
  - CHIC trace
  - 1 min. trace collected from an OC-192 backbone link
  - About 13M packets and 1M flows

Multiflow vs. Endpoint Estimators
[Figure: median relative error of delay mean estimates (y-axis, 0 to 1) vs. flow size (x-axis, log scale, 1 to 10000), one curve each for Endpoint and Multiflow. Annotations: "Endpoint obtains good accuracy", "Multiflow performs better than Endpoint".]

Trajectory Sampling
- Shares some similarity with the CNF architecture
  - Routers use a consistent hash function to sample packets
  - Facilitates direct observation of packet trajectories
- Requires flow ID and timestamps for per-flow latency estimation
  - Aggregate all sampled packets with the same flow key
  - Compute their average latency

Comparison with Trajectory Sampling
[Figure: CDF (y-axis, 0 to 1) of the relative error of delay mean estimates (x-axis, log scale, 0.001 to 10), one curve each for Multiflow and Trajectory. Packet sampling rate = 0.01. Annotation: "Multiflow is 2-3x better than Trajectory".]

Summary
- Our approach retrofits per-flow latency estimates into the NetFlow framework
- Two main ideas
  - Consistent NetFlow architecture ensures that different routers record the same set of flows
  - Multiflow estimator achieves significantly more accurate estimates of per-flow latencies than the prior approach

Questions?
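
Backup: a sketch of the two estimators

The Endpoint and Multiflow estimators, together with the packet-digest check, can be illustrated with a small Python sketch. This is not the authors' implementation. The `FlowRecord` layout, the function names, and the toy timestamps are all hypothetical, and the sketch assumes the two routers' clocks are synchronized and that hash-based sampling has already selected the same packets at both routers.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FlowRecord:
    """Hypothetical CNF flow record: NetFlow timestamps plus packet digests."""
    flow_id: str
    start_ts: float  # timestamp of the first sampled packet
    end_ts: float    # timestamp of the last sampled packet
    start_pd: int    # digest of the first sampled packet
    end_pd: int      # digest of the last sampled packet


def delay_samples(up, down):
    """Return (upstream time, delay) pairs for one flow.

    A timestamp pair is used only if the packet digests match at both routers.
    """
    samples = []
    if up.start_pd == down.start_pd:
        samples.append((up.start_ts, down.start_ts - up.start_ts))
    if up.end_pd == down.end_pd:
        samples.append((up.end_ts, down.end_ts - up.end_ts))
    return samples


def endpoint_estimate(up, down):
    """Endpoint: average of the flow's own usable delay samples."""
    own = [delay for _, delay in delay_samples(up, down)]
    return mean(own) if own else None


def multiflow_estimate(up, pool):
    """Multiflow: average of all samples taken between the flow's start and end."""
    window = [delay for t, delay in pool if up.start_ts <= t <= up.end_ts]
    return mean(window) if window else None


# Toy records (made-up values) at an upstream and a downstream router.
upstream = {
    "f1": FlowRecord("f1", 0.0, 10.0, 0x01, 0x02),
    "f2": FlowRecord("f2", 2.0, 4.0, 0x03, 0x04),
    "f3": FlowRecord("f3", 5.0, 12.0, 0x05, 0x06),
}
downstream = {
    "f1": FlowRecord("f1", 1.0, 13.0, 0x01, 0x02),
    "f2": FlowRecord("f2", 4.0, 6.0, 0x03, 0x04),
    # The last packet of f3 was lost, so the end digest differs.
    "f3": FlowRecord("f3", 6.0, 14.0, 0x05, 0x07),
}

# Pool of usable delay samples across all flows.
pool = [s for fid in upstream for s in delay_samples(upstream[fid], downstream[fid])]

endpoint_f1 = endpoint_estimate(upstream["f1"], downstream["f1"])
multiflow_f1 = multiflow_estimate(upstream["f1"], pool)
endpoint_f3 = endpoint_estimate(upstream["f3"], downstream["f3"])

print("f1 endpoint:", endpoint_f1)
print("f1 multiflow:", multiflow_f1)
print("f3 endpoint:", endpoint_f3)
```

In this toy data, flow f3 contributes only its start sample because its end digests disagree, and the Multiflow estimate for f1 draws on samples from f2 and f3 that fall inside f1's lifetime.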