Traffic Matrix Estimation: Existing Techniques and New Directions
A. Medina (Sprint Labs, Boston University), N. Taft (Sprint Labs),
K. Salamatian (University of Paris VI), S. Bhattacharyya, C. Diot (Sprint Labs)
Presented by Matthew Caesar
Problem scope
• Environment:
– Single ISP, provides SLAs to customers
• Goal: Estimate traffic matrix
– Amount of traffic flowing between each
(origin, destination) pair
– Hard to measure exactly (requires extensive
logging and/or offline parsing)
• Why would we want to know the traffic
matrix?
– Helps determine load balancing, routing
protocols configuration, dimensioning,
provisioning, failover strategies
– Allows quantification of cost of providing
QoS vs. overprovisioning
Solution idea
• Main idea:
– Measure utilization (“link count”) on each network link
• Can be easily done in router fast path
• Done via SNMP query
– Find a set of OD flows that would produce the measured link counts
• Sticky issue: how to find the set of OD flows?
– Three techniques:
• Linear Programming (LP)
• Bayesian estimation
• Expectation Maximization (EM)
Traffic Estimation
• Assumptions: can encode the operator’s knowledge (e.g.,
some OD pairs may always be zero)
• Prior TM: some methods need a seed TM to start from
• Routing Matrix
• Link counts (link utilizations)
Problem setup
• See whiteboard
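The whiteboard material is not reproduced in these slides. As a minimal sketch of the usual setup, assume the standard linear model Y = AX, where Y holds the link counts, A is the 0/1 routing matrix, and X holds the OD demands. The three-node chain A-B-C and all traffic numbers below are made up for illustration.

```python
# Toy topology (an assumption for illustration): a chain A -> B -> C
# with two directed links and three OD pairs.
links = ["A->B", "B->C"]
od_paths = {
    ("A", "B"): ["A->B"],
    ("B", "C"): ["B->C"],
    ("A", "C"): ["A->B", "B->C"],  # the two-hop pair crosses both links
}

def routing_matrix(links, od_paths):
    """Return A as a list of rows: A[l][p] = 1 if OD pair p crosses link l."""
    pairs = list(od_paths)
    return [[1 if link in od_paths[p] else 0 for p in pairs] for link in links]

def link_counts(A, x):
    """Compute Y = A X for a demand vector x (one entry per OD pair)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = routing_matrix(links, od_paths)

# Two different traffic matrices (made-up numbers) ...
x1 = [10, 20, 5]
x2 = [15, 25, 0]
# ... give the same link counts, so Y alone cannot tell them apart.
y1 = link_counts(A, x1)
y2 = link_counts(A, x2)
print(A, y1, y2)
```

With more OD pairs than links the system is under-determined, which is why each scheme needs extra assumptions or a prior TM.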
Scheme #1: Linear Programming (LP)
• Linear program:
– Objective function + constraints
• Main idea:
– Try to maximize the total amount of traffic routed through the
network
– Given constraints:
• Total traffic on each link must not exceed the measured link count
• Flow conservation
• Observations:
– Leads to solutions where OD pairs with few intermediate hops are
assigned large amounts of bandwidth, while more distant pairs
get much less
– Fix: put more weight on pairs separated by greater distances
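The observation above can be reproduced on a toy case. The sketch below assumes a hypothetical chain A-B-C with OD pairs A→B, B→C and A→C, and assumes the link counts are matched exactly, so the feasible set is a line segment in t = x_AC. An LP optimum lies at a vertex, so comparing the two endpoints is enough here; a real network would need a general LP solver. The weights and link counts are illustrative values only.

```python
def solve_chain_lp(y_ab, y_bc, w_short=1.0, w_long=1.0):
    """Maximize w_short*(x_ab + x_bc) + w_long*x_ac on the chain A-B-C,
    subject to x_ab + x_ac = y_ab, x_bc + x_ac = y_bc, all demands >= 0.

    The feasible set is the segment 0 <= t <= min(y_ab, y_bc) with
    t = x_ac, so it is enough to compare its two endpoint vertices.
    """
    def point(t):
        return (y_ab - t, y_bc - t, t)

    def objective(x):
        return w_short * (x[0] + x[1]) + w_long * x[2]

    vertices = [point(0), point(min(y_ab, y_bc))]
    return max(vertices, key=objective)

# Unweighted objective: the two-hop pair A->C is assigned zero.
plain = solve_chain_lp(15, 25)
# Extra weight on the distant pair flips the answer: now A->B gets zero.
weighted = solve_chain_lp(15, 25, w_long=3.0)
print(plain, weighted)
```

Either way the optimum sits at a vertex where some OD pair is exactly zero, which hints at why weighting alone may not be enough.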
Scheme #2: Bayesian Inference
• See whiteboard
Scheme #3: Expectation
Maximization (EM)
• See whiteboard
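The EM details are left to the whiteboard, so the sketch below is not necessarily the variant evaluated in this talk. It assumes Poisson OD flows and uses the classic multiplicative EM update for that model, on a hypothetical chain A-B-C with made-up link counts. It also shows why the seed TM matters: different priors converge to different matrices that fit the same link counts.

```python
def em_step(A, y, x):
    """One EM update, assuming y[l] ~ Poisson((A x)[l]) and A has 0/1 entries."""
    predicted = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    updated = []
    for j, xj in enumerate(x):
        col_sum = sum(row[j] for row in A)  # number of links pair j crosses
        correction = sum(row[j] * y[l] / predicted[l] for l, row in enumerate(A))
        updated.append(xj * correction / col_sum)
    return updated

def em_estimate(A, y, prior, iterations=200):
    """Iterate EM from a strictly positive prior TM (the seed)."""
    x = list(prior)
    for _ in range(iterations):
        x = em_step(A, y, x)
    return x

# Routing matrix for OD pairs (A->B, B->C, A->C) over links (A->B, B->C).
A = [[1, 0, 1],
     [0, 1, 1]]
y = [15, 25]  # made-up link counts

flat = em_estimate(A, y, prior=[1, 1, 1])        # uninformative seed
informed = em_estimate(A, y, prior=[10, 20, 5])  # seed already consistent with y
print(flat, informed)
```

Both estimates reproduce the link counts, yet they disagree on the A→C demand, so the quality of the result depends on the quality of the prior.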
Evaluation Method
• Impossible to obtain “real” traffic matrix via
direct measurement.
– Therefore, use simulations
• How to characterize flow between OD
pairs?
– Tried Constant, Poisson, Gaussian, Uniform
and Bimodal (flash crowd) TMs
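A minimal sketch of how such synthetic TMs could be generated. The function names, the mean of 20, the standard deviation, and the 20% / 5x flash-crowd mixture are all assumptions for illustration, not the parameters used in the study.

```python
import math
import random

def poisson_sample(rng, mean):
    """Draw one Poisson variate with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def synthetic_tm(n_pairs, kind, rng, mean=20.0):
    """Return one demand per OD pair, drawn from the named distribution."""
    if kind == "constant":
        return [mean] * n_pairs
    if kind == "poisson":
        return [float(poisson_sample(rng, mean)) for _ in range(n_pairs)]
    if kind == "gaussian":
        # Clip at zero: traffic volumes cannot be negative.
        return [max(0.0, rng.gauss(mean, mean / 4)) for _ in range(n_pairs)]
    if kind == "uniform":
        return [rng.uniform(0.0, 2 * mean) for _ in range(n_pairs)]
    if kind == "bimodal":
        # Mixture: most pairs near the mean, a few "flash crowd" pairs at 5x.
        return [
            max(0.0, rng.gauss(5 * mean if rng.random() < 0.2 else mean, mean / 4))
            for _ in range(n_pairs)
        ]
    raise ValueError(f"unknown TM kind: {kind}")

rng = random.Random(0)  # fixed seed so runs are repeatable
kinds = ["constant", "poisson", "gaussian", "uniform", "bimodal"]
tms = {kind: synthetic_tm(6, kind, rng) for kind in kinds}
print(tms)
```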
Results: Linear programming vs.
Statistical methods
• Linear programming method performs poorly
– Assigns zero to many OD pairs, increasing error
– Problem: tries to match OD pairs to link counts
– Different objective functions give similar results
– → Error too high for use in practical networks
• Bayesian and EM:
– EM beats Bayesian in terms of average error and worst
case error
– Estimation errors correlated with heavily shared links
(links with many OD flows are more likely to be misestimated)
Results: Goodness of prior
• Goodness of prior matrix (seed values)
– Bayesian is much more sensitive to the prior
matrix than EM
• However, EM is also quite sensitive
• Possible reason: EM has deterministic convergence
behavior (can be analyzed), while Bayesian has
stochastic convergence (it oscillates)
– After a certain point, additional measurements
don’t provide additional gain
• Measuring over long periods of time only gives
small additional improvement
Results: Marginal gains
• What improvement could be gained if we could
measure some components of the traffic matrix
directly?
– Carrier may have the option to deploy a certain amount
of monitoring equipment
• Three ways to choose which TM rows to measure directly:
– Randomly, by row sum (traffic volume), or by error
magnitude
• Results:
– Error rate drops off roughly linearly with each
additional row added
– Bayesian is not sensitive to the order in which rows are added
– EM does better when rows are added largest-error first
– → Error reduction from adding a row is 2% for 13 OD pairs
Other results
• Which OD pairs are most difficult to
estimate?
– Error increases as the link-sharing factor
increases, also as path length increases
• How to characterize OD flows?
– Poisson and Gaussian assumptions hold well,
but only for certain hours during the day.
Recommendations
• Network operators know a lot about their
network. We need to devise methods to
allow incorporation of network specific
information into the estimation scheme.
• We need a better model of OD flows
through an ISP.
– Possible solution: “gravity models” based on
utility factor (see whiteboard)
• We need a good way to generate good prior
TMs.
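The utility-factor variant is left to the whiteboard. As a sketch of the simplest gravity form only, the code below assumes the demand from node i to node j is proportional to the traffic entering at i times the traffic leaving at j. The function name and the per-node totals are hypothetical. A matrix like this could also serve as a prior TM.

```python
def gravity_prior(ingress, egress):
    """Simple gravity estimate: x[i][j] = ingress[i] * egress[j] / total.

    ingress[i] is the traffic entering the network at node i and
    egress[j] the traffic leaving at node j (both made-up inputs here).
    This basic form does not zero out the i == j entries.
    """
    total = sum(egress)
    return [[t_in * t_out / total for t_out in egress] for t_in in ingress]

ingress = [30.0, 10.0, 20.0]  # hypothetical per-node totals
egress = [20.0, 25.0, 15.0]
prior = gravity_prior(ingress, egress)

# When total ingress equals total egress, the row and column sums
# of the result reproduce the per-node measurements.
row_sums = [sum(row) for row in prior]
col_sums = [sum(row[j] for row in prior) for j in range(len(egress))]
print(prior, row_sums, col_sums)
```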
References:
Statistical Inference:
• http://ic.arc.nasa.gov/ic/projects/bayes-group/html/bayestheorem-long.html
• http://www.math.uah.edu/stat/prob/prob5.html
• http://www.statisticalengineering.com/bayes_thinking.htm
• http://www.stat.psu.edu/~jls/stat544/2001/lec22.pdf
• http://wwweksl.cs.umass.edu/library/Statistics/ExpectationMaximization/
• http://www.owlnet.rice.edu/~msmiley/elec431/em.htm
Traffic Matrix Estimation:
• http://dimacs.rutgers.edu/Workshops/MiningTutorial/grossglauser-slides.ppt