Trajectory Sampling for Direct Traffic Observation Matthias Grossglauser joint work with Nick Duffield AT&T Labs – Research Traffic Engineering Two large flows overload! Traffic Engineering overload! New egress point for first flow Multi-homed customer Traffic Engineering OSPF shortest path splitting overload! Traffic Engineering • Goal: domain-wide control & management to – Satisfy performance goals – Use resources efficiently • Knobs: – Configuration & topology: provisioning, capacity planning – Routing: OSPF weights, MPLS tunnels, BGP policies,… – Traffic classification (diffserv), admission control,… • Measurements are key: closed control loop – Characterize demand: what’s coming in? – Observe network state: how is the network reacting? (low-level adaptivity!) – Check performance: what’s the customer’s QoS? Traffic Matrix vs. Path Matrix • Traffic matrix – # bytes from ingress i to egress j • Path matrix – Spatial flow of traffic through domain – # bytes for every path from i to j Flow Measurement flow 1 flow 2 flow 3 flow 4 • IP flow abstraction – Set of packets with “same” src and dest IP addresses – Packets that are “close” together in time (a few seconds) • Cisco NetFlow – Router maintains a cache of statistics about active flows – Router exports a measurement record for each flow Inferring the Path Matrix from the Traffic Matrix Network State Uncertainty • Hard to get an up-to-date snapshot of… • …routing – – – – Large state space Vendor-specific implementation Deliberate randomness Multicast • …element states – Links, cards, protocols,… • …element performance – Packet loss, delay at links missing alarms missing “down” alarms noise spurious down Direct Traffic Observation • Goal: direct observation – No network model & state estimation • Basic idea: – Sample packets at each link – Sampling decision based on hash over packet content – Consistent sampling trajectories – Labels based on second hash function • Exploit entropy in packet content to obtain statistically representative set of trajectories Sampling and Labeling • Fields of interest collected only once • Multicast: trajectory is a tree Fields Included in Hashes Collisions: Identical Packets Sampling and Labeling Hashes • x: subset of packet bits, represented as binary number • Sampling hash – h(x) = x mod A – Sample if h(x) < r – r/A: thinning factor • Labeling hash – g(x) = x mod M • Make appropriate choice of A, M – predictable patterns should “mix” well Pseudo-Random Sampling • Goal: infer metrics of interest from trajectory samples – E.g., what fraction of traffic of customer x on a link y? • Question: is sample set statistically representative? – Obvious for “really random” sampling – Distribution of a field in the sampled subset = real distribution? – In other words: does the complement of the field provide enough entropy? Quality of Deterministic Sampling • Experiment: statistical test to check if sampled and full distributions are close – Chi-square statistic to verify independence hypothesis – Hypothesis: sampled distribution consistent with full distribution m01 m02 ... m0 I n j : # samples in bin j mij : # packets (un)sampled in bin j m11 m12 ... m1I 1 I n1 n2 ... nI (mij m ij )2 T i 0 j 1 mij – Confidence level C(T) for hypothesis, where C is 2 cdf of with I-1 degrees of freedom m0 m1 n Chi-square Test on Source Address If C (T ) 1 , then accept hypothesis Bitwise Independence • 2x2 contingency table formed by – sampling decision – l-th bit of packet Optimal Sampling • Fix amount of measurement traffic c per time period • Problem: – – – – n: number of samples in sampling period M: alphabet size, m=log2(M) bits/label nm: total amount of measurement traffic [bits] Goal: maximize # unique labels, subject to nm<c • Result: – optimal alphabet size M*=c log(2) – optimal number of samples n*=M*/log(M*) – example: c=1Mb/period Label Collisions and Trajectory Ambiguity Ambiguity cont. • Rule for acyclic subgraphs + unicast packets: – unambiguous if each connected component of the subgraph is • (a) a source tree • (b) a sink tree without loss Inference Experiment • Experiment: infer from trajectory samples – Estimate fraction of traffic from customer – Source address customer – Source address sampling + label • Fraction of customer traffic on backbone link: ̂ # unique labels common on b, c ˆ # unique labels on b Estimated Fraction (c=1000bit) Estimated Fraction (c=10kbit) Sampling Device MPLS: simple additional logic to look “behind” label stack Sampling Device Implementation • Interface vs. processing speed – OC-192: 10 Gbps – State of the art DSP: • Proc: 600M MACs x 32 bit: 20 Gbps • I/O: 300MHz x 256 bit: 70 Gbps – Moore’s law vs. interface speed growth • Vendor interest: cisco, juniper, avici • Advantages Summary – Trajectory sampling estimates path matrix …and other metrics: loss, link delay – Direct observation: no routing model + network state estimation – No router state – Multicast (source tree), DDoS (sink tree) – Control over measurement overhead – Small measurement delay • Disadvantages – Requires support on linecards • Open questions & research problems – Collection, storage, querying (in progress) – Management interface