Traffic Matrix Estimation: Existing Techniques and New Directions
A. Medina (Sprint Labs, Boston University), N. Taft (Sprint Labs), K. Salamatian (University of Paris VI), S. Bhattacharyya, C. Diot (Sprint Labs)
Presented by Matthew Caesar

Problem scope
• Environment:
  – Single ISP, provides SLAs to customers
• Goal: Estimate traffic matrix
  – Amount of traffic flowing between each (origin, destination) pair
  – Hard to measure exactly (requires extensive logging and/or offline parsing)
• Why would we want to know the traffic matrix?
  – Helps determine load balancing, routing protocol configuration, dimensioning, provisioning, failover strategies
  – Allows quantification of cost of providing QoS vs. overprovisioning

Solution idea
• Main idea:
  – Measure utilization (“link count”) on each network link
    • Can be easily done in router fast path
    • Done via SNMP query
  – Find a set of OD flows that would produce the measured link counts
• Sticky issue: how to find the set of OD flows?
  – Three techniques:
    • Linear Programming (LP)
    • Bayesian estimation
    • Expectation Maximization (EM)

Traffic Estimation
• Assumptions can be operator’s knowledge (e.g. maybe some pairs are always zero)
• Prior TM: sometimes need seed TM to start with
• Routing Matrix
• Link counts (link utilizations)

Problem setup
• See whiteboard

Scheme #1: Linear Programming (LP)
• Linear program:
  – Objective function + constraints
• Main idea:
  – Try to maximize the total amount of traffic routed through the network
  – Given constraints:
    • Total traffic on each link must not exceed the measured link count
    • Flow conservation
• Observations:
  – Leads to solutions where OD pairs with few intermediate hops are assigned large amounts of bandwidth, while more distant pairs get much less
  – Solution: put more weight on pairs separated by greater distances

Scheme #2: Bayesian Inference
• See whiteboard

Scheme #3: Expectation Maximization (EM)
• See whiteboard

Evaluation Method
• Impossible to obtain “real” traffic matrix via direct measurement.
  – Therefore, use simulations
• How to characterize flow between OD pairs?
  – Tried Constant, Poisson, Gaussian, Uniform and Bimodal (flash crowd) TMs

Results: Linear programming vs. Statistical methods
• Linear programming method performs poorly
  – Assigns zero to many OD pairs, increasing error
  – Problem: tries to match OD pairs to link counts
  – Different objective functions give similar results
  – Error too high for use in practical networks
• Bayesian and EM:
  – EM beats Bayesian in terms of average error and worst-case error
  – Estimation errors are correlated with heavily shared links (links with many OD flows are more likely to be misestimated)

Results: Goodness of prior
• Goodness of prior matrix (seed values)
  – Bayesian is much more sensitive to the prior matrix than EM
    • However, EM is also quite sensitive
    • Perhaps because: EM method has deterministic convergence behavior (can be analyzed) while Bayesian has stochastic convergence (it oscillates)
  – After a certain point, additional measurements don’t provide additional gain
    • Measuring over long periods of time only gives small additional improvement

Results: Marginal gains
• What improvement could be gained if we could measure some components of the traffic matrix directly?
  – Carrier may have the option to deploy a certain amount of monitoring equipment
• 3 ways to add rows:
  – Randomly, row-sum (by traffic volume), and error magnitude
• Results:
  – Error rate drops off roughly linearly with each additional row added
  – Bayesian is not sensitive to the order in which rows are added
  – EM does better when rows are added largest-error first
  – Error reduction from adding a row is 2% for 13 OD pairs

Other results
• Which OD pairs are most difficult to estimate?
  – Error increases as the link-sharing factor increases, and also as path length increases
• How to characterize OD flows?
  – Poisson and Gaussian assumptions hold well, but only for certain hours during the day.

Recommendations
• Network operators know a lot about their network. We need to devise methods to allow incorporation of network-specific information into the estimation scheme.
• We need a better model of OD flows through an ISP.
  – Possible solution: “gravity models” based on utility factor (see whiteboard)
• We need a good way to generate good prior TMs.

References:
Statistical inference:
• http://ic.arc.nasa.gov/ic/projects/bayes-group/html/bayestheorem-long.html
• http://www.math.uah.edu/stat/prob/prob5.html
• http://www.statisticalengineering.com/bayes_thinking.htm
• http://www.stat.psu.edu/~jls/stat544/2001/lec22.pdf
• http://wwweksl.cs.umass.edu/library/Statistics/ExpectationMaximization/
• http://www.owlnet.rice.edu/~msmiley/elec431/em.htm
Traffic matrix estimation:
• http://dimacs.rutgers.edu/Workshops/MiningTutorial/grossglauser-slides.ppt
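
Worked example (sketch)
The "Solution idea" and "Problem setup" slides describe link counts as the result of OD flows being routed over links. A minimal sketch of that relationship follows. The three-node line topology, the routing matrix, and the traffic values are all hypothetical and are not taken from the slides. It shows that two different traffic matrices can produce identical link counts, which is why the estimation schemes need a prior TM or other assumptions.

```python
# Hypothetical topology (an assumption for illustration): A - B - C,
# with two directed links and three OD pairs.
# Rows are links (A->B, B->C); columns are OD pairs (A->B, A->C, B->C).
# An entry is 1 when the OD pair's route crosses the link.
ROUTING = [
    [1, 1, 0],  # link A->B carries OD flows A->B and A->C
    [0, 1, 1],  # link B->C carries OD flows A->C and B->C
]


def link_counts(routing, od_flows):
    """Return the per-link load y = A x for routing matrix A and OD vector x."""
    return [sum(a * x for a, x in zip(row, od_flows)) for row in routing]


# Two made-up candidate traffic matrices, flattened to OD vectors.
tm_one = [5, 5, 5]
tm_two = [10, 0, 10]

counts_one = link_counts(ROUTING, tm_one)
counts_two = link_counts(ROUTING, tm_two)

# Both candidates explain the same measurements, so link counts alone
# cannot tell them apart.
print(counts_one, counts_two)
```

With fewer links than OD pairs, the link counts leave the OD flows under-determined; the second candidate also illustrates the kind of answer that sets some OD pairs to zero.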