Time Series from their Observed Sums: Network Tomography

Edoardo M. Airoldi
School of Computer Science, Carnegie Mellon University
(joint work with Christos Faloutsos)
SIGKDD, Seattle, WA, August 23rd, 2004

Acknowledgements
Srinivasan Seshan, CSD, CMU
Russel Yount and Frank Kietzke, Network Development, CMU
Stephen Fienberg, Statistics, CMU
Jin Cao, Bell Labs
Claudia Tebaldi, NCAR
Yin Zhang, AT&T Labs

Outline
Introduction / Motivation
Survey
Proposed Methods
Results
Conclusions

Application Domains
Communication Networks
  goal: Who is sending to whom
  refs: Cao et al (2001), Liang & Yu (2003), Zhang et al (2004)
Transportation Networks
  goal: Who is going where
Network Probing (Rish et al, IBM)
  goal: Which server is down
  refs: Rish et al (2002, 2004)

Communication Networks
A large ISP network has 100s of nodes, 1000s of links, 10000s of routes, and carries over 1 petabyte (10^15 bytes) per day.
[Diagram: network with nodes A, B, C, showing OD flows and link loads]
Reliability analysis: predict link loads under unexpected/planned router/link failures
Traffic engineering: optimize routes to minimize congestion
Capacity planning: forecast future capacity requirements

Mathematical Formulation
[Diagram: situation at time t; OD flows X1, X2, X3, X4 (X) pass through a LINK and produce the link flows Y]
One constraint: total $\sum_i Y_i = 0$
Link flows = routing matrix A times OD flows:
$$\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}$$

Problem Definition
Given: topology, fixed routing scheme A[nxm], and traffic on the links of the network Y(t) = [Y1(t), …, Yn(t)] over time t = 1, …, T
Find: non-observable traffic between origin-destination (OD) pairs X(t) = [X1(t), …, Xm(t)] over time t = 1, …, T
Y(t) = A·X(t)
Under-constrained

A Glance at the Data
Find OD flows X(t) (?), columns ordered by time:
  X1(t1) X1(t2) X1(t3) X1(t4)
  X2(t1) X2(t2) X2(t3) X2(t4)
  X3(t1) X3(t2) X3(t3) X3(t4)
  X4(t1) X4(t2) X4(t3) X4(t4)
Measure link flows Y(t), columns ordered by time:
  Y1(t1) Y1(t2) Y1(t3) Y1(t4)
  Y2(t1) Y2(t2) Y2(t3) Y2(t4)
  Y3(t1) Y3(t2) Y3(t3) Y3(t4)
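The relation Y(t) = A·X(t) and the reason it is under-constrained can be checked on the 3-by-4 routing matrix of the star-network example. The following is a minimal sketch in plain Python; the OD-flow values and the helper name `link_loads` are assumptions made for illustration and do not come from the data sets.

```python
# Routing matrix of the star-network example: 3 link loads, 4 OD flows.
A = [
    [1, 1, 0, 0],  # Y1 = X1 + X2
    [0, 0, 1, 1],  # Y2 = X3 + X4
    [1, 0, 1, 0],  # Y3 = X1 + X3
]


def link_loads(A, x):
    """Return Y = A·X for a routing matrix A and an OD-flow vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical OD flows (illustrative numbers only).
x_a = [10, 20, 30, 40]

# The direction (1, -1, -1, 1) satisfies A·d = 0, so shifting traffic
# along it changes the OD flows without changing any link load.
d = [1, -1, -1, 1]
x_b = [xi + 5 * di for xi, di in zip(x_a, d)]

y_a = link_loads(A, x_a)
y_b = link_loads(A, x_b)

print("OD flows a:", x_a, "-> link loads:", y_a)
print("OD flows b:", x_b, "-> link loads:", y_b)
```

Both OD-flow vectors are non-negative and produce identical link loads, so the link measurements alone cannot tell them apart.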
[Figure: traffic in Kb against hour of the day]

Our Problem: No Traffic Matrix
Traffic matrix
  Gives traffic volumes between origin and destination
  Very difficult to measure directly
Direct measurement [Feldmann et al. 2000]
  Semi-standard router feature: Netflow
  Collect flow-level data around the whole edge of the network
  Combine with routing data
  Cisco, Juniper, etc.
  Not always well supported
  Potential performance impact on routers
  Huge amount of data (500GB/day)
Widely available SNMP data gives only link loads
  Even this data is not perfect (glitches, loss, …)

Infinite Exact Solutions
Measurements (Yt) and routing scheme A[3x4] allow for many feasible OD flows (Xt)
For example:
[Figure: example OD flows and links; extracted values: 29, 139, 1, 167, 37, 4, 9, 32]
The problem is under-constrained and we need some assumptions

Related Work
Solutions in the past: y = Ax
Direct solution: SVD
Scoring criterion: GLS, maximum likelihood, entropy, Bayesian methods, …
Regularization: assume independent OD flows
Estimate OD flows xt from a window of link loads around time t
[Figure: estimated OD flows, Kb against hour of the day]

Pitfalls of Past Approaches
Unrealistic Models: Gaussian or Poisson OD traffic flows. But we observe bursty, log-Normal traffic flows.
Time Dependence across Epochs: never explicitly addressed; xt is typically assumed independent over time. But we observe time dependence within single OD flows.

Empirical Laws: log-Normality
Aggregate OD flows look log-log Normal
[Figure: histograms of counts against log bytes and against log-log bytes]
[ 12321 OD time series. CMU validation data. ]

Outline
Introduction / Motivation
Survey
Proposed Method
  1st Stage - Linear Dynamical Systems
  2nd Stage - Bayesian Dynamical Systems
Results
Conclusions

The Model
A smooth average process $\{ \lambda_t : t > 0 \}$
A possibly bursty process $\{ x_t : t > 0 \}$ to model the OD traffic flows

Parameter Estimation
Estimate the parameters underlying the average process $\{ \lambda_t : t > 0 \}$
Calibrate priors for the parameters driving the dynamics of the OD flows process $\{ x_t : t > 0 \}$
Estimate the OD flows using a Particle Filter

Introducing Time Dependence
We introduce explicit time dependence: $\lambda(t) = F_{[n \times n]} \, \lambda(t-1) + e(t)$
The distinct OD flows, components of $\lambda(t)$, are assumed to be independent
Use the EM algorithm

Introducing Time Dependence
Our Linear Dynamical System contains the models by Cao et al. as a special case

Bayesian Dynamical System
Gamma and log-Normal OD flows (Xt)
Use preliminary estimates of $\{ \lambda_t : t > 0 \}$, the average OD flows, to softly constrain the dynamical behavior of the OD flows and so identify the correct solution for Xt

Non-Deterministic Dynamics
Introduce explicit non-deterministic dynamics (F) on the average OD flows: $\lambda'(t+1) = F'_{[n \times n]} \cdot \lambda'(t)$
Diagonal matrix $F'_{[n \times n]}$: $F'[i,i] \sim$ log-Normal

Learning Latent Dynamics
We want a preliminary estimate for $F_t$ in: $\lambda_{t+1} = F_{t+1} \, \lambda_t$ ?
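When F is diagonal, the relation between consecutive average OD flows, lam(t+1) = F(t+1) · lam(t), can be solved one component at a time. The sketch below illustrates only that algebra: the names `diagonal_dynamics`, `lam_246` and `lam_247` and all numbers are hypothetical, and it is not claimed to be the estimation procedure used in the talk.

```python
import math


def diagonal_dynamics(lam_prev, lam_next):
    """Solve lam_next = F · lam_prev for a diagonal F, one OD flow at a time.

    Assumes every entry of lam_prev is strictly positive.
    """
    return [nxt / prev for prev, nxt in zip(lam_prev, lam_next)]


# Hypothetical preliminary estimates of the average OD flows at two
# consecutive time points (illustrative values only).
lam_246 = [120.0, 40.0, 15.0, 300.0]
lam_247 = [150.0, 30.0, 15.0, 330.0]

# Diagonal entries of F at time 247.
F_247 = diagonal_dynamics(lam_246, lam_247)

# On the log scale the multiplicative dynamics become additive:
# log lam(t+1) = log F + log lam(t).
log_F_247 = [math.log(f) for f in F_247]

# Applying F to the earlier estimate recovers the later one.
recovered = [f * prev for f, prev in zip(F_247, lam_246)]

print("diagonal of F:", F_247)
print("log scale:", log_F_247)
```

A log-Normal prior on the diagonal entries corresponds to a Normal prior on the entries of `log_F_247`, which is why the log scale is shown alongside the ratios.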
[Diagram: $P(\lambda_{246} \mid Y_{246})$ and $P(\lambda_{247} \mid Y_{247})$; solve for $F_{247}$]

Outline
Introduction / Motivation
Survey
Proposed Methods
Results
  Datasets
  Importance of Time Dependence
  Importance of non-Gaussianity
  Informative Priors for non-Gaussian BDS
Conclusions

Validation Data Sets
Consider star network topologies [ 4 OD flows, 9 OD flows and 16 OD flows ]
Carnegie Mellon [ 12321 time series ]
Lucent Technologies [ 32 time series ]
[Diagram: situation at time t; OD flows X1, X2, X3, X4 (X) pass through a LINK and produce the link flows Y]

Log-Normal OD Traffic Flows
The validation OD traffic flows are skewed on both data sets

Reduce Variability
Narrower range of possible values for the OD traffic flows: those which receive positive posterior probability

Robust Estimates
Capture sharp changes in the distribution of the OD traffic flows

Capture Several Bursts
[Figure: Kb against time]

Priors and Bayesian inference
Informative priors on $\{ \lambda_t : t > 0 \}$ lead to uni-modal posteriors
[Figure: posteriors with the true values marked]

Speed and Scalability
The computing time is about 3 minutes [ 4 OD - 3 Links using R on Mac G4 667 ]
Linear in (#OD) for each time point
1 day worth of data in 45 minutes

Model Comparison

Numerical Comparison
[Table: l2]

Past Approaches
Unreasonable Models: Gaussian or Poisson arrivals
Time Dependence: never explicitly addressed

Conclusions
Log-Normal models account for skewed and bursty, non-observable OD flows
Novel BDS captures the time dependence of the data, thus reducing the variability of the estimates
Informative priors serve as soft constraints to overcome the under-determinacy of the problem

Future Work
More tests on bigger networks, from 2-star (4-D) to 4-star (16-D)
Fit non-parametric seasonal components for the non-observable OD flows

BACK-UP

Network Engineering
State-of-the-Art: guess and tweak
  Guess based on experience & intuition
  Manually tweak things, and hope for the best
Disadvantages
  Manual process: time consuming, error prone
  Not very reliable: intuition may be wrong, unexpected side effects
  Suboptimal performance: wastes resources/time
  Need to repeat the exercise when the traffic pattern changes

A More Scientific Approach?
[Labels: Feldmann et al. 2000; Shaikh et al. 2002; Tomography; Fortz et al. 2002]
A: "Well, we don't know the topology, we don't know the traffic matrix, the routers don't automatically adapt the routes to the traffic, and we don't know how to optimize the routing configuration. But, other than that, we're all set!" [Rexford2000, Kurose2003]

Contributions
Realistic Models: Gamma and log-Normal $P(\text{OD Flows}(t) \mid \lambda(t))$
Explicit Time Dependence: $E(\text{OD Flows}(t) \mid y(t), \ldots, y(1))$

Contributions
Informative priors in a Bayesian Dynamical System for an under-constrained problem
  Drive our inferences to the correct solution
  Get high quality particles
  Easy solution for Sparse Traffic

Exploring the OD space
Gibbs sampler with Metropolis steps is able to explore P(Xt|Yt)
We prove irreducibility of the chains
[Diagram: regions of the OD space where P(Xt|Yt) > 0 (Gamma, log-Normal) and where P(Xt|Yt) = 0]

Non-Deterministic Dynamics
Introduce explicit non-deterministic dynamics (F) on the average OD flows: $\lambda'(t+1) = F'_{[n \times n]} \cdot \lambda'(t)$
Diagonal matrix $F'_{[n \times n]}$: $F'[i,i] \sim$ log-Normal
This leads to:
$\lambda'(t+1) = F' \cdot \lambda'(t)$
$e^{\lambda(t+1)} = e^{F} \cdot e^{\lambda(t)}$
$\lambda(t+1) = F + \lambda(t)$

Better OD Flows in 4 Steps
[Diagram: four numbered steps]

Immanuel Kant + o(1)
In making inferences on non-observable quantities we find the model we look for!
Assume a model that reasonably approximates real OD flows, and of course it does not hurt to have a prior opinion about it …

Learning OD Flows
Typical solutions are based on:
  Generalized Least Squares
  Maximum Likelihood
  Bayesian methods
  Entropy
These methods generate one set of OD flows X from multiple observations {Y1, …, YT}.
In general:
$\max_X \; p \cdot D_1[X, X_{obs}] + q \cdot D_2[\{Y\}, \{Y_{obs}\}]$
s.t. $Y = A X$, $X \ge 0$ (random), with $p, q \in [0,1]$ fixed

Intrinsic Dimensionality
The routing matrix A has m rows < n columns, and its m rows are linearly independent
The space $\mathbb{R}^n_+$ where the OD flows live can be decomposed into a sub-space $\mathbb{R}^{(n-m)}_+$ with an open interior, and a degenerate sub-space $\mathbb{R}^m_+$
It is possible to rearrange A = [A1, A2], and X = [X1, X2] accordingly, so that given $X_2 \in \mathbb{R}^{(n-m)}_+$:
$X_1 = A_1^{-1} \cdot (Y - A_2 X_2) \in \mathbb{R}^m_+$

Doubly Stochastic BDS
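The decomposition on the Intrinsic Dimensionality slide, X1 = A1^{-1}·(Y - A2·X2), can be illustrated on the 3-by-4 star-network routing matrix (m = 3 rows, n = 4 columns). This is a sketch under stated assumptions: the choice of X4 as the free block, the helper names `solve` and `complete`, and the link loads are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def solve(M, b):
    """Solve the square system M·z = b by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def complete(A, y, x2, basic, free):
    """Return the full OD vector X given the free block X2, via X1 = A1^{-1}(Y - A2·X2)."""
    A1 = [[row[j] for j in basic] for row in A]
    rhs = [yi - sum(row[j] * v for j, v in zip(free, x2)) for row, yi in zip(A, y)]
    x1 = solve(A1, rhs)
    x = [Fraction(0)] * len(A[0])
    for j, v in zip(basic, x1):
        x[j] = v
    for j, v in zip(free, x2):
        x[j] = Fraction(v)
    return x


A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
basic = [0, 1, 2]  # columns forming A1 (invertible for this A)
free = [3]         # column forming A2, i.e. X4 is the free coordinate
y = [30, 70, 40]   # hypothetical link loads

x_ok = complete(A, y, [40], basic, free)
x_bad = complete(A, y, [80], basic, free)

print("X4 = 40:", [int(v) for v in x_ok], "feasible:", all(v >= 0 for v in x_ok))
print("X4 = 80:", [int(v) for v in x_bad], "feasible:", all(v >= 0 for v in x_bad))
```

Only one coordinate is free here (n - m = 1), and not every non-negative choice of it yields a non-negative X1, which is why the feasible set has to be checked rather than assumed.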