Basics of traffic modeling II
Lecturer: Dmitri A. Moltchanov
E-mail: dmitri.moltchanov@tut.fi
http://www.cs.tut.fi/kurssit/ELT-53606/
Networking analysis and dimensioning I
D.Moltchanov, TUT, 2013

Outline
• Facts about the Internet traffic;
• Points of interest in traffic modeling;
• Step-by-step modeling procedure:
  – a point of interest;
  – level (layer) of interest;
  – what statistics to capture;
  – choosing a candidate model;
  – fitting parameters;
  – testing the accuracy of a model;
• Example of traffic modeling using an AR(1) process.

1. Facts about the Internet traffic
The most important fact: Internet traffic constantly changes over time (roughly every 5 years).
Observations, trends and facts on Internet traffic up to 2005:
• TCP accounts for most of the packet traffic in the Internet;
• traffic flows are bidirectional, but often asymmetric;
• most TCP sessions are short-lived;
• the packet arrival process in the Internet is not Poisson;
• the session arrival process may be approximated by a Poisson process;
• packet sizes are bimodally distributed;
• the stochastic properties of packet arrivals are still not well understood;
• Internet traffic continues to change.

1.1. Domination of TCP
TCP accounts for most of the packet traffic in the Internet:
• beginning of the 1990s:
  – it was first observed that TCP dominates.
• middle of the 1990s:
  – introduction of multimedia services;
  – development of RTP, RTCP, RTSP...: the UDP share was expected to grow.
• beginning of the 2000s:
  – TCP is still the dominant protocol;
  – multimedia content also extensively uses TCP.
Reasons for TCP dominance:
• p2p technologies: up to 60% of all traffic;
• multimedia is usually embedded in web pages;
• availability of TCP.

1.2. Bidirectional asymmetric traffic flows
Traffic flows are bidirectional, but often asymmetric:
• bidirectional exchange of data:
  – ftp, http, ssh, e-mail, etc.;
  – request-response patterns;
  – usually more traffic in the downstream direction.
• asymmetric access:
  – xDSL technologies;
  – cable modems;
  – GPRS, etc.
• current trends:
  – p2p applications may generate bidirectional asymmetric traffic:
    ∗ users usually limit their upstream capacity;
    ∗ access technologies are asymmetric.

1.3. Short-lived TCP sessions
Most TCP sessions are short-lived:
• almost 90% of TCP connections exchange fewer than 10 Kbytes:
  – WWW service: request-response;
  – HTTP 1.0: a separate connection for each object on a page;
  – HTTP 1.1: a single connection for a page;
  – most pages and objects are less than 10 Kbytes in length.
• most TCP connections last less than a few seconds:
  – driving force: HTTP;
  – this may change due to p2p technologies.
• the effect:
  – a heavy-tailed distribution of session sizes;
  – heavy tail: non-negligible relative frequencies in the histogram bins corresponding to very large sizes;
  – reason for the heavy tail: most sessions are small, but some are very large (e.g., ftp).

1.4. Session arrivals are Poisson
The session arrival process may be approximated by a Poisson process:
• reasons:
  – there are many users accessing a given site;
  – users can be assumed to be independent;
  – the situation is similar to the telephone network, where the Poisson assumption holds.

1.5. Special distribution of packet sizes
Packet sizes are bimodally distributed:
• around 50% of packets are as large as possible:
  – these are TCP data packets;
  – recall that the maximum size is determined by the Ethernet MTU: 1500 bytes;
  – around 50% of packets are 1500 bytes long.
• around 40% of packets are as small as possible:
  – these are TCP ACKs;
  – recall that the minimum size is determined by the TCP (20 bytes) and IP (20 bytes) headers: 40 bytes;
  – around 40% of packets are 40 bytes long.
• around 10% of packet lengths are uniformly distributed between 40 and 1500 bytes;
• additional peaks are caused by fragmentation of IP packets.

1.6. Packet arrivals are not homogeneous Poisson
The packet arrival process in the Internet is not homogeneous Poisson:
• the common belief was:
  – aggregated traffic is Poisson (or at least Markovian) in nature;
  – many studies have been carried out under the Poisson assumption;
  – why? it is easy to use in performance evaluation studies.
• reality:
  – the arrival process is not homogeneous Poisson;
  – interarrival times can be correlated;
  – the packet arrival process may not even be covariance stationary;
  – there can be so-called packet 'clumps' or 'batches'.
• result: the packet arrival process is far from the common assumptions:
  – what is the nature of IP traffic?

1.7. Unknown properties of the IP traffic
What has been suggested over the decades:
• 1980s: Poisson nature of the aggregated packet traffic:
  – common agreement: this assumption is no longer valid!
• 1990s: self-similar nature of the aggregated traffic:
  – the most respected hypothesis today;
  – seems a little strange.
• 2000s: is it simply non-stationary?
  – probably the correct answer:
  – small timescales: stationary;
  – long timescales: non-stationary.

1.8. Changing nature of Internet traffic
Internet traffic continues to change:
• applications that dominated over the decades:
  – 1980-1995: e-mail, remote access;
  – 1995-2005: WWW, large file transfers;
  – 2005-2010: p2p/WWW/video: 60%/30%/10%;
  – 2010 onwards: video is taking over the Internet.
• how to deal with it:
  – you cannot rely upon 'old measurements';
  – new measurements are always required;
  – measurements made in the early 2000s may no longer be representative.

1.9. Find more about Internet traffic
Where you may find more about current Internet traffic:
• Internet traffic archive: http://ita.ee.lbl.gov;
• Internet traffic report: http://www.internettrafficreport.com;
• National laboratory for applied network research (NLANR): http://www.nlanr.net;
• NLANR measurement and operations analysis team (MOAT): http://moat.nlanr.net;
• National Internet measurement infrastructure (NIMI): http://www.ncne.nlanr.net/nimi;
• tcpdump measurement software: http://www.tcpdump.org;
• wireshark software: http://www.wireshark.com;
• research papers:
  – free search engine: http://researchindex.org/;
  – free search engine: http://scholar.google.com/;
  – IEEE: http://ieeexplore.ieee.org/Xplore/guesthome.jsp.

2. Step-by-step traffic modeling procedure
Step-by-step procedure:
• determine the point of interest;
• determine the level of interest;
• measure traffic at the point of interest;
• decide which statistics should be captured;
• estimate the statistics of the traffic observations;
• choose a candidate model;
• fit the parameters of the model;
• test the accuracy of the model.

3. Points of interest in traffic modeling
You have to take into account:
• where are you asked to model the traffic (to evaluate performance or to dimension a system)?
Figure 1: Points at which traffic is usually measured and modeled (points 1, 2 and 3, spanning the customer side and the network side).

3.1. Point 1: particular application
We distinguish between:
• voice applications;
• video applications;
• data transfers:
  – ftp;
  – http;
  – ssh.
What is important: the properties of both the transport layer protocol and the application:
• UDP: no specific pattern:
  – does not significantly affect the properties of the application traffic;
  – you may model the traffic of the application only.
• TCP: a very specific pattern:
  – affects data transmission;
  – should be taken into account.
Figure 2: TCP traffic pattern: congestion window (in MSSs) versus time for TCP Reno and TCP Tahoe.
• how much traffic does the application generate?
• ftp: large files; ssh, e-mail, http: both large and small transfers.

3.2. Point 2: aggregated traffic from a number of applications
We distinguish between:
• heterogeneous applications;
• homogeneous applications.
Figure 3: Homogeneous (voip only) and heterogeneous (voip, video, ftp) traffic aggregates on the customer side.

3.3. Point 3: aggregated network traffic
What it is:
• aggregation of a large number of flows.
Figure 4: Aggregated backbone traffic (access routers connected to a backbone router).
• may have quite sophisticated properties;
• in practice, cannot be obtained as a superposition of individual flows.
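Section 3.1 notes that TCP imposes a very specific pattern on application traffic (Figure 2). The sketch below is a deliberately simplified per-round model of the congestion window (one step per RTT), assuming that slow start doubles the window each round up to the threshold, congestion avoidance adds one MSS per round, and losses occur at rounds chosen by hand. On a loss, Tahoe restarts from one MSS, while Reno (idealized fast recovery) continues from half the window. The function name `cwnd_trace` and the initial threshold of 16 MSSs are illustrative assumptions, not values taken from a real TCP stack.

```python
def cwnd_trace(variant, rounds, loss_rounds, ssthresh=16):
    """Congestion window (in MSSs) at the start of each RTT round.

    Simplified model (assumption): a loss is detected in every round listed
    in `loss_rounds`; timeouts, delayed ACKs and receiver windows are ignored.
    """
    if variant not in ("tahoe", "reno"):
        raise ValueError("variant must be 'tahoe' or 'reno'")
    cwnd = 1
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            # Multiplicative decrease: threshold becomes half the current window.
            ssthresh = max(cwnd // 2, 2)
            # Tahoe falls back to slow start; Reno resumes from the threshold.
            cwnd = 1 if variant == "tahoe" else ssthresh
        elif cwnd < ssthresh:
            # Slow start: double per round without overshooting the threshold.
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: additive increase of one MSS per round.
            cwnd += 1
    return trace


if __name__ == "__main__":
    losses = {6, 14}  # hypothetical loss rounds
    print("Tahoe:", cwnd_trace("tahoe", 20, losses))
    print("Reno: ", cwnd_trace("reno", 20, losses))
```

With a loss in round 6, both variants reach 18 MSSs; Tahoe then drops to 1 MSS and repeats slow start, whereas Reno drops only to 9 MSSs and grows linearly, which is the qualitative difference shown in Figure 2.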
4. Level (layer) of interest for traffic modeling
Packet traffic can be represented:
• at the session level:
  – requests for downloading files from an ftp server;
  – requests for downloading pages from a www server.
• at the packet level.
Which level to choose:
• depends on your task.
General notes:
• session level:
  – usually claimed to follow a Poisson process;
  – reality might be quite different!
• packet level: any behavior should be expected.

5. What statistics to capture?
The general answer is not straightforward:
• what are the aims of traffic modeling:
  – just to propose a new, better traffic model?
  – to carry out a performance evaluation?
  – what kind of performance evaluation: simulation or analytic?
• how closely are you going to describe the traffic:
  – trade-off between accuracy and complexity!
  – is it sufficient just to get basic ideas?
  – is there interest in precise parameters?
• what statistics are important:
  – mean, variance, distribution, ACF?
  – you can never say before you get results;
  – you can never say before you capture a certain parameter.

5.1. What statistics are important?
Note the following:
• mean value:
  – must be captured.
• variance:
  – must be captured;
  – one may use the standard deviation or the coefficient of variation instead.
• lag-1 ACF:
  – has been found to be important;
  – may have unexpected effects.
• structure of the ACF:
  – sometimes has a significant effect (e.g., long-range dependence).
• histogram of relative frequencies:
  – captures all moments of the one-dimensional distribution;
  – required when you have to be quite sure.

5.2. Common matching schemes
Common matching:
• mean and variance:
  – usually easy to do;
  – used to get mean performance parameters.
• mean, variance and lag-1 ACF:
  – there are a number of models and algorithms;
  – relatively easy to do.
• mean and ACF:
  – there are a number of models and algorithms;
  – sometimes not easy to do.
• histogram:
  – one may look for an analytical distribution;
  – usually easy to do using a discrete distribution.
• histogram and ACF.

6. Choosing a candidate model
Input information:
• parameters of the traffic that have to be captured;
• a set of traffic models.
What you have to know:
• traffic models and their properties;
• analytical tractability of models:
  – simulation: any model is suitable;
  – analysis: only tractable models are suitable.
Examples:
• analytically tractable: renewal models, Markovian models;
• analytically intractable: most non-Markovian models.

6.1. Classes of models and characteristics
Classes of models:
• renewal class of models:
  – distribution can be arbitrary;
  – ACF is zero for all non-zero lags: no autocorrelation.
• autoregressive class:
  – distribution is normal;
  – ACF is a sum of exponential/geometric terms.
• Markov-modulated models:
  – distribution can be arbitrary;
  – ACF is a sum of exponential/geometric terms.
• models with self-similar properties:
  – distribution can be either normal (FBM) or arbitrary (F-ARIMA);
  – ACF is non-zero for large lags: long-range dependence.

6.2. Recipes
You may use the following when you need to capture:
• first two moments:
  – Erlang, hyperexponential, exponential distributions;
  – approximation by a discrete distribution $p_1, p_2, \dots, p_k$ such that $\sum_i p_i = 1$.
• first m moments:
  – special case of a phase-type distribution;
  – approximation by a discrete distribution.
• first two moments and the lag-1 value of the ACF:
  – Markov-modulated processes;
  – autoregressive processes.
• first two moments and the ACF:
  – Markov-modulated processes;
  – autoregressive processes.

6.3. Example of the problem
Assume that:
• we have observations of an RV X that is defined on [0, ∞);
• we have to capture the first two moments, i.e., the mean and the squared coefficient of variation (SCV):
$$E[X] = \mu, \qquad C^2 = \frac{\sigma^2}{\mu^2} < 1. \tag{1}$$
What one may guess:
• Erlang distribution ($E_2$):
  – defined on [0, ∞);
  – has $C^2 < 1$ (for $E_2$, $C^2 = 1/2$).
• shifted exponential distribution:
  – defined on [d, ∞);
  – has $C^2 < 1$.
• what to choose?
Figure 5: pdfs of the shifted exponential distribution, $f_X(x) = b e^{-b(x-d)}$ for $x \ge d$, and the $E_2$ distribution, $f_X(x) = b^2 x e^{-bx}$ for $x \ge 0$.
Conclusion:
• the shifted exponential assigns zero probability to [0, d), so it does not satisfy the implicit requirement $X \in [0, \infty)$;
• we choose the Erlang distribution.

7. Fitting parameters of models
Note the following:
• there are no general algorithms;
• algorithms are specific to a class of models;
• there could be more than one algorithm for a chosen model;
• there could be no algorithm for a chosen model.
General procedure:
• determine the parameters of the model:
  – these parameters must completely characterize the model;
  – not only the parameters you are going to capture.
• derive equations relating the measured statistics and the parameters:
  – note that some parameters can be left free.

8. Testing the accuracy of the model
Note the following:
• sometimes it is not performed:
  – when you exactly match the parameters and are sure about the traffic properties.
• otherwise, it is needed:
  – an approximation is used at some step.
Tests:
• compare distributions:
  – χ² test;
  – Smirnov's test.
• compare autocorrelations:
  – just visually;
  – using specific tests.

9. Example
What we have to do:
• propose a model of the aggregated traffic;
• capture the histogram and the ACF as closely as possible;
• the model will then be used in a simulation study.
Figure 6: The point at which traffic is to be modeled (network side).

9.1. Measuring the traffic at the point of interest
We carried out two measurements, assumed to be sufficiently long:
• in reality, 2000 observations may not be sufficient!
• disclaimer: these observations do not represent real traffic of any kind!
Figure 7: Traffic observations Y(i), i = 0, ..., 2000, obtained in two experiments: (a) Experiment 1; (b) Experiment 2.

9.2. Estimating statistics
What you may ask:
• are the traces stationary and ergodic?
• which distribution do these traces come from?
• is the same approximating distribution suitable for both traces?
• which model should be used to capture the statistics?
What to do to get basic knowledge:
• compute statistics;
• analyze the statistics to identify properties.
The statistics we usually start with:
• histogram of relative frequencies;
• normalized autocorrelation function (NACF).
The histograms look as follows:
• are they really normal?
• χ² test: the normality hypothesis is accepted at confidence level 0.9 (significance level 0.1)!
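The statistics listed above (histogram of relative frequencies, NACF, and a χ² check of normality) can be estimated in a few lines. This is a minimal sketch using only the Python standard library, assuming the trace is a list of numbers and the bins have equal width `delta`; the helper names are hypothetical. The χ² statistic compares the observed bin counts with those expected under a normal distribution fitted by the sample mean and standard deviation; comparing it with a χ² quantile for the returned degrees of freedom (bins − 3, since two parameters were estimated) is left to tables or a statistics package.

```python
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def relative_frequencies(trace, delta):
    """Histogram of relative frequencies over bins [i*delta, (i+1)*delta)."""
    counts = Counter(int(x // delta) for x in trace)
    n = len(trace)
    return {i: c / n for i, c in sorted(counts.items())}


def nacf(trace, max_lag):
    """Normalized autocorrelation K(i), i = 0..max_lag (biased estimator)."""
    n = len(trace)
    mu = fmean(trace)
    dev = [x - mu for x in trace]
    var_sum = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + i] for t in range(n - i)) / var_sum
        for i in range(max_lag + 1)
    ]


def chi2_normal_statistic(trace, delta):
    """Chi-square statistic of the histogram against a fitted normal law.

    Returns (statistic, degrees_of_freedom). The two outermost bins absorb
    the normal tails, so the expected counts sum to len(trace). Merging bins
    with small expected counts is omitted in this sketch.
    """
    n = len(trace)
    fitted = NormalDist(fmean(trace), pstdev(trace))
    counts = Counter(int(x // delta) for x in trace)
    lo, hi = min(counts), max(counts)
    stat = 0.0
    for i in range(lo, hi + 1):
        left = float("-inf") if i == lo else i * delta
        right = float("inf") if i == hi else (i + 1) * delta
        expected = n * (fitted.cdf(right) - fitted.cdf(left))
        stat += (counts[i] - expected) ** 2 / expected  # Counter gives 0 if empty
    # Two parameters (mean and variance) were estimated from the data.
    return stat, (hi - lo + 1) - 3
```

Empty bins inside the observed range are kept on purpose: a bimodal trace leaves many of them empty where the fitted normal expects mass, which is exactly what drives the statistic up.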
Figure 8: Histograms of relative frequencies of the presented traces with normal approximations: (a) Experiment 1; (b) Experiment 2.

The NACFs look as follows:
• there are no anomalies;
• such NACFs are typical of stationary processes.
Figure 9: Normalized ACFs $K_Y(i)$ of the presented traces for lags i = 0, ..., 10, with geometric approximations: (a) Experiment 1; (b) Experiment 2.

9.3. Choosing a candidate model
Our observations:
• the observations are stationary and ergodic: an assumption;
• the empirical distribution is normal;
• the NACF decays as a single exponential/geometric term.
Which model to choose:
• autoregressive model of order 1, AR(1):
$$Y(n) = \phi_0 + \phi_1 Y(n-1) + \epsilon(n), \quad n = 1, 2, \dots, \tag{2}$$
  – $\phi_0$ and $\phi_1$ are parameters, $\epsilon(n) \sim N(0, \sigma^2[\epsilon])$;
  – the marginal distribution is normal, and the NACF is $K(i) = \phi_1^i$, $i = 0, 1, \dots$
• Markov-modulated model:
  – may approximate a normal distribution;
  – the NACF is a sum of exponential/geometric terms.

9.4. Fitting the AR(1) model
What we have to do: estimate the parameters
$$\phi_0, \quad \phi_1, \quad \sigma^2[\epsilon]. \tag{3}$$
Properties of the AR(1) model:
• if the AR(1) process is covariance stationary ($|\phi_1| < 1$), we have:
$$E[Y] = \mu_Y, \quad \sigma^2[Y] = \gamma_Y(0), \quad \mathrm{Cov}(Y_0, Y_i) = \gamma_Y(i). \tag{4}$$
• $\mu_Y$, $\sigma^2[Y]$ and $\gamma_Y(i)$ of AR(1) are related to $\phi_0$, $\phi_1$ and $\sigma^2[\epsilon]$ as
$$\mu_Y = \frac{\phi_0}{1 - \phi_1}, \quad \sigma^2[Y] = \frac{\sigma^2[\epsilon]}{1 - \phi_1^2}, \quad \gamma_Y(i) = \phi_1^i \gamma_Y(0). \tag{5}$$
• $\phi_0$, $\phi_1$ and $\sigma^2[\epsilon]$ are related to the statistics of the observations as:
$$\phi_1 = K_X(1), \quad \phi_0 = \mu_X (1 - \phi_1), \quad \sigma^2[\epsilon] = \sigma^2[X] (1 - \phi_1^2), \tag{6}$$
  – where $K_X(1)$, $\mu_X$ and $\sigma^2[X]$ are the lag-1 value of the NACF, the mean and the variance of the observations.

9.5. Testing the accuracy of the fit
Why we need it:
• we decided to capture the histogram and the NACF;
• but we fit only the first two moments and the lag-1 value of the ACF!
Is there a case when testing is not needed:
• assume we were asked to capture only $\mu_X$, $\sigma^2[X]$ and $K_X(1)$;
• since we fit them explicitly, the AR(1) model represents them exactly.
What allows us to expect a fair approximation:
• the AR(1) model is characterized by only three parameters, all of which were matched;
• the distribution of the AR(1) model is normal;
• the NACF of the AR(1) model decays geometrically.

The first step:
• generate a trace from the model:
  – for simplicity, you may generate exactly the same number of observations.
Test the histograms using the χ² test or Smirnov's two-sample test:
• first sample: empirical observations;
• second sample: generated from the model;
• hypotheses to be tested:
  – $H_0$: the distributions of the two samples are the same;
  – $H_1$: the distributions of the two samples are different.
Test the NACFs:
• you may carry out a visual test by plotting the NACFs of both samples;
• you may test for significant correlation using the Box-Ljung statistic.
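The fitting rule (6) and the testing steps above can be sketched end to end in pure Python. This is a minimal illustration under stated assumptions: a synthetic AR(1) trace with hypothetical parameters stands in for real measurements (no data from the lecture is reproduced), the ACF uses the usual biased estimator, and the generated trace starts in the stationary distribution (5) to avoid a transient. Instead of complete hypothesis tests, the sketch only computes the two-sample Kolmogorov-Smirnov (Smirnov) distance and the Box-Ljung statistic; applying the latter to the one-step residuals of the fitted model is our choice. Both values would then be compared with tabulated critical values. All function names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def lag1_nacf(x):
    """Lag-1 value of the normalized ACF (biased estimator)."""
    mu = fmean(x)
    num = sum((x[t] - mu) * (x[t + 1] - mu) for t in range(len(x) - 1))
    den = sum((v - mu) ** 2 for v in x)
    return num / den


def fit_ar1(x):
    """Moment-matching fit of the AR(1) parameters, following eq. (6)."""
    phi1 = lag1_nacf(x)
    phi0 = fmean(x) * (1.0 - phi1)
    noise_var = pvariance(x) * (1.0 - phi1 ** 2)
    return phi0, phi1, noise_var


def generate_ar1(phi0, phi1, noise_var, n, rng):
    """n values of Y(n) = phi0 + phi1*Y(n-1) + eps(n), eps ~ N(0, noise_var)."""
    if not abs(phi1) < 1.0:
        raise ValueError("covariance stationarity requires |phi1| < 1")
    mean = phi0 / (1.0 - phi1)                      # mu_Y from eq. (5)
    std = math.sqrt(noise_var / (1.0 - phi1 ** 2))  # sigma[Y] from eq. (5)
    noise_std = math.sqrt(noise_var)
    y = rng.gauss(mean, std)  # start in the stationary regime (no transient)
    out = []
    for _ in range(n):
        out.append(y)
        y = phi0 + phi1 * y + rng.gauss(0.0, noise_std)
    return out


def residuals(x, phi0, phi1):
    """One-step prediction errors of the fitted model; ideally white noise."""
    return [x[t] - phi0 - phi1 * x[t - 1] for t in range(1, len(x))]


def smirnov_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup |F_a - F_b|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:  # advance past all ties at v
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ljung_box(x, max_lag):
    """Box-Ljung Q = n(n+2) * sum_k rho_k^2 / (n-k), k = 1..max_lag."""
    n = len(x)
    mu = fmean(x)
    dev = [v - mu for v in x]
    den = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho = sum(dev[t] * dev[t + k] for t in range(n - k)) / den
        q += rho ** 2 / (n - k)
    return n * (n + 2) * q


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for measurements: a synthetic AR(1) trace.
    observed = generate_ar1(2.0, 0.8, 1.0, 2000, rng)
    phi0, phi1, noise_var = fit_ar1(observed)
    model = generate_ar1(phi0, phi1, noise_var, len(observed), rng)
    print(f"phi0={phi0:.3f} phi1={phi1:.3f} var_eps={noise_var:.3f}")
    print("Smirnov D:", round(smirnov_distance(observed, model), 4))
    print("lag-1 NACF, observed vs model:",
          round(lag1_nacf(observed), 3), round(lag1_nacf(model), 3))
    # Compare Q with a chi-square quantile with max_lag - 1 degrees of freedom.
    print("Box-Ljung Q (residuals):",
          round(ljung_box(residuals(observed, phi0, phi1), 10), 2))
```

Generating the model trace with the same length as the observations, as suggested above, keeps the two samples directly comparable in the Smirnov test.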