CS-459 Report
Manalis Nikiforos, SN 1751, 1/12/2010

Trajectory Sampling for Direct Traffic Observation
by N. G. Duffield & M. Grossglauser, AT&T Labs - Research

The abstract of the paper opens with the statement that traffic measurement is critical for all networks: measurements should provide a clear picture of the spatial flow of traffic, which in turn leads to better resource allocation and capacity planning for the network. Trajectory Sampling is the method the paper proposes for this purpose. The main idea is to observe the trajectories of a subset of the packets. Its key advantages are that it is stateless, it is inexpensive, and the measurement traffic it produces is low and controlled.

Introduction

Traffic observation is essential for all traffic management, whether short-term traffic control or long-term traffic engineering; both need a good picture of the network. Most specific tasks, such as route optimization and failover planning, already rely on some kind of network measurement. But we still need to know more: we need to know the fractions of traffic entering and leaving the network at any given moment, so that we can, for example, focus on a specific problem affecting a customer of an ISP.

There are two kinds of measurements. Indirect measurements rely on a network model; their disadvantage is that they do not perform well on a stateless, heterogeneous network, because of the uncertainty and randomness they are subject to, and they can also be affected by outside effects. Direct measurements do not rely on a network model but on observations made at nodes throughout the network, so they do not suffer from these problems.

Trajectory Sampling is a direct method. It samples packets as they traverse the network: the sampling decision is based on a hash function, and each sampled packet is labeled using a second hash function.
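The hash-based sampling decision and labeling described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: SHA-256 stands in for both hash functions, and the constants A, R and LABEL_BITS as well as all function names are hypothetical.

```python
import hashlib

A = 1000          # hypothetical modulus for the sampling hash
R = 100           # hypothetical: sample when the hash falls in {0, ..., R-1}
LABEL_BITS = 20   # hypothetical label length in bits


def _digest_int(data: bytes, salt: bytes) -> int:
    """Stand-in hash (SHA-256 of salt + data as an integer); not the paper's hash."""
    return int.from_bytes(hashlib.sha256(salt + data).digest(), "big")


def is_sampled(invariant: bytes) -> bool:
    """Sampling decision: depends only on the invariant packet content."""
    return _digest_int(invariant, b"sample") % A < R


def label(invariant: bytes) -> int:
    """Label of a sampled packet, computed with a second, independent hash."""
    return _digest_int(invariant, b"label") % (1 << LABEL_BITS)


def collect(path, invariant: bytes):
    """Reports (link, label) that the measurement points along `path` would emit."""
    return [(link, label(invariant)) for link in path if is_sampled(invariant)]


if __name__ == "__main__":
    path = ["ingress", "core", "egress"]
    for i in range(20):
        reports = collect(path, f"packet-{i}".encode())
        if reports:
            print(i, reports)
```

Because every link evaluates the same deterministic functions on the same invariant content, a packet is reported either on every link of its path or on none, and always with the same label. That property is what lets a collector reassemble trajectories afterwards.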
Formal Description

The trajectory of a packet is the set of links that the packet traverses within the domain. Only the invariant part of the packet, φ(x), is used, that is, the content that does not change from hop to hop. (The figure showing the invariant part of the packet is not reproduced here.)

The sampling decision is based on the sampling hash function h applied to the invariant packet content: a packet x is sampled if h(φ(x)) is in D, where D is the given sampling domain. The indicator function h_D, defined through h_D(φ(x)) = 1 if h(φ(x)) is in D and 0 otherwise, is called the sampling function. Because the same deterministic functions are used on every link, a packet receives the same sampling decision and the same label wherever it is observed, and identical packets necessarily produce identical labels. Identical packets are expected to be rare.

Trajectories can be ambiguous or unambiguous, which is determined from the label subgraph: a label subgraph is unambiguous if each connected component of the subgraph is a source tree, or a sink tree such that for each node of the sink tree the degree of the outbound link is the sum of the degrees of the inbound links. A label can be attributed to its trajectory if it is unique or if it can be disambiguated; unambiguous labels are the ones that tell us something about the network.

The performance part of the paper has several problems to solve: finding suitable hash functions and evaluating them, dealing with identical packets, and finding the best number of samples. The paper goes into considerable mathematical detail. It states that the sampling hash h(φ(x)) = φ(x) mod A is to be used and that an appropriate prefix length has to be found; it presents a graph for this (not reproduced here) and continues with further analysis of the hash functions. It then poses an optimization problem: how to maximize the expected number of unique (unambiguous) samples, subject to the constraint that the total measurement traffic n*m (n samples with labels of m bits each) must not exceed a predefined constant c. A second graph (not reproduced here) shows the result.
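The trade-off in this optimization problem can be explored numerically. The sketch below is not taken from the paper; it assumes that labels are independent and uniformly distributed over the 2^m possible values, in which case a given label is unique among n samples with probability (1 - 2^-m)^(n-1). The function names and the max_bits cap are hypothetical.

```python
def expected_unique(n: int, m: int) -> float:
    """Expected number of unique labels among n samples with m-bit labels,
    assuming independent, uniformly distributed labels (an assumption)."""
    if n <= 0:
        return 0.0
    return n * (1.0 - 2.0 ** -m) ** (n - 1)


def best_label_length(c: int, max_bits: int = 64):
    """Brute-force search over the label length m, using n = c // m samples
    so that the measurement traffic n * m stays within the budget c."""
    best = (0, 0, 0.0)  # (m, n, expected number of unique labels)
    for m in range(1, min(c, max_bits) + 1):
        n = c // m
        u = expected_unique(n, m)
        if u > best[2]:
            best = (m, n, u)
    return best


if __name__ == "__main__":
    m, n, u = best_label_length(10_000)
    print(f"m = {m} bits, n = {n} samples, expected unique labels = {u:.1f}")
```

Short labels allow many samples within the budget but collide often, while long labels are almost always unique but leave room for few samples; the search picks the label length between these two extremes.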
In the traffic measurement part, the paper tests the system in a known environment with a simple scenario: a service provider wants to determine what fraction of the packets on a certain link, at a certain time, belongs to a certain customer. The authors used a specific packet trace, assuming that packets with source prefix S belong to the customer and that the rest is other traffic; similarly, the destination prefix D stands for the link of interest. The reported results were very good. (The results figure is not reproduced here.)

In the discussion part of the paper, the authors analyze implementation issues. They show that the computing power needed for the sampling hash and the sampling decision, as well as for the identification hash, is not much of a concern, and that state-of-the-art DSPs can easily handle it. According to the paper, such DSPs offer roughly double the processing power needed at the network speeds of the time.

Other approaches to the same problem are classified as aggregation-based or sampling-based, and as direct or indirect:

Link measurements (aggregation-based, direct): the statistics are measured on a per-link basis, so there are no measurements of the network as a whole.

Flow aggregation (aggregation-based, indirect): one of the routers does the entire job. There is too much data to transfer, store and compute, and if it is extended to all ingress routers to get all the data, several drawbacks appear: the routing protocols would have to be emulated, which cannot be verified, and a network with all its dynamic changes cannot really be emulated properly.

Active end-to-end probes (sampling-based, indirect): hosts send probes to estimate metrics. This does not require measurement support from the network, but it has the same problems as link measurements.

Trajectory sampling itself belongs to the sampling-based approaches. Sampling has also been proposed before for connection-oriented networks, and it is used on ATM cells to apply QoS; the main difference is that those approaches do not use a sampling hash function.
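The customer-share scenario from the traffic measurement part can be sketched as a simple ratio over the sampled trajectories. This is an illustrative estimator, not the paper's evaluation code; the record type, its field names and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampledTrajectory:
    source_prefix: str   # hypothetical field: prefix the packet's source falls in
    links: frozenset     # links on which this sampled packet was observed


def estimate_customer_fraction(samples, link, customer_prefix):
    """Estimate the fraction of packets on `link` that come from the customer,
    using only the sampled packets. Returns None if no sample crossed the link."""
    on_link = [s for s in samples if link in s.links]
    if not on_link:
        return None
    from_customer = sum(1 for s in on_link if s.source_prefix == customer_prefix)
    return from_customer / len(on_link)
```

Since the sampling decision depends only on the packet content, the sampled packets on a link are intended to behave like a representative subset of all packets on it, which is why a plain ratio can serve as the estimate. Its accuracy depends on how many samples cross the link.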
As an extension, trajectory sampling could be used to detect Distributed Denial-of-Service (DDoS) attacks or for filtering, and it could also be used with probe packets on an end-to-end route.

The conclusions of the paper are:
- A method for consistent sampling over a network has been proposed.
- The packets to be sampled produce a label through a hash function.
- Those labels can be used for post-sampling analysis.
- The system does not need much computing power.
- It can be used with stateless routers.
- The packets are directly observed.

As further work:
- A wider class of hash functions should be tried.
- Tests should be run in a real network, to try to use the benefits of trajectory sampling in network management.