Basics of traffic modeling II
Lecturer: Dmitri A. Moltchanov
E-mail: dmitri.moltchanov@tut.fi
http://www.cs.tut.fi/kurssit/ELT-53606/
Networking analysis and dimensioning I
D.Moltchanov, TUT, 2013
Outline
• Facts about the Internet traffic;
• Points of interest in traffic modeling;
• Step-by-step modeling procedure;
– a point of interest;
– level (layer) of interest;
– what statistics to capture;
– choosing a candidate model;
– fitting parameters;
– testing accuracy of a model;
• Example of traffic modeling using AR(1) process.
Lecture: Basics of traffic modeling II
1. Facts about the Internet traffic
The most important fact: Internet traffic changes constantly over time (roughly every 5 years).
Observations, trends and facts on the Internet traffic up to 2005:
• TCP accounts for most of the packet traffic in the Internet;
• traffic flows are bidirectional, but often asymmetric;
• most TCP sessions are short-lived;
• the packet arrival process in the Internet is not Poisson;
• the session arrival process may be approximated by a Poisson process;
• packet sizes are bimodally distributed;
• unknown stochastic properties of packet arrivals;
• Internet traffic continues to change.
1.1. Domination of TCP
TCP accounts for most of the packet traffic in the Internet:
• early 1990s:
– it was first observed that TCP dominates.
• mid-1990s:
– introduction of multimedia services;
– development of RTP, RTCP, RTSP... UDP share was expected to grow.
• early 2000s:
– TCP is still the dominant protocol;
– multimedia content also extensively uses TCP.
Reasons for TCP dominance:
• p2p technologies – up to 60% of all traffic;
• multimedia is usually placed on web pages;
• availability of TCP.
1.2. Bidirectional asymmetric traffic flows
Traffic flows are bidirectional, but often asymmetric:
• bidirectional exchange of data:
– ftp, http, ssh, e-mail, etc;
– request-response patterns;
– usually more traffic in downstream direction.
• asymmetric access:
– xDSL technologies;
– cable modems;
– GPRS, etc.
• what are the current trends:
– p2p applications may generate bidirectional asymmetric traffic;
∗ users usually limit their upstream capacity;
∗ asymmetric access technologies.
1.3. Short-lived TCP sessions
Most TCP sessions are short-lived:
• almost 90% of TCP connections exchange fewer than 10 Kbytes:
– WWW service: request-response;
– http v1.0: a separate connection for each object on a page;
– http v1.1: a single connection for a page;
– most pages and objects are less than 10 Kbytes in length.
• most TCP connections last less than a few seconds:
– driving force: http;
– may change due to p2p technologies.
• what is the effect:
– heavy-tailed distribution of session sizes;
– heavy tail: histogram bins for very large sizes still have non-negligible relative frequencies;
– reason for the heavy tail: most of the sessions are small, some are big (ftp).
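The "many small, a few big" effect can be illustrated with a minimal sketch. It assumes, purely for illustration, that session sizes follow a Pareto distribution; the shape `alpha`, the minimum size, and the function name `pareto_session_sizes` are hypothetical choices, not values from measurements.

```python
import random

def pareto_session_sizes(n=20000, alpha=1.2, min_kbytes=1.0, seed=7):
    """Draw hypothetical session sizes (Kbytes) from a Pareto distribution."""
    rng = random.Random(seed)
    return [min_kbytes * rng.paretovariate(alpha) for _ in range(n)]

sizes = sorted(pareto_session_sizes())
total = sum(sizes)

# Fraction of sessions that are small (below 10 Kbytes).
small_fraction = sum(1 for s in sizes if s < 10.0) / len(sizes)

# Share of all bytes carried by the largest 1% of sessions.
k = len(sizes) // 100
top_share = sum(sizes[-k:]) / total

print(f"sessions below 10 Kbytes: {small_fraction:.1%}")
print(f"bytes carried by the largest 1% of sessions: {top_share:.1%}")
```

With these illustrative parameters most sessions are small, while a small fraction of large sessions carries a disproportionate share of the bytes.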
1.4. Session arrivals are Poisson
The session arrival process may be approximated by a Poisson process:
• what are the reasons:
– there are a lot of users getting access to a certain site;
– users can be assumed to be independent;
– the situation is similar to the telephone network, where the Poisson assumption holds.
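A minimal sketch of this argument, assuming a hypothetical population of 200 independent users, each starting a session in a time slot with a small probability. If the aggregate is close to Poisson, the mean and the variance of the per-slot counts should be nearly equal. All numbers and names are illustrative.

```python
import random

def session_counts(num_users=200, p_start=0.01, num_slots=5000, seed=1):
    """Count sessions started per time slot by independent users.

    Each user starts a session in a slot with small probability p_start,
    independently of all other users (an assumption of this sketch).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(num_slots):
        counts.append(sum(1 for _ in range(num_users) if rng.random() < p_start))
    return counts

def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

counts = session_counts()
m, v = mean_and_variance(counts)
# For a Poisson distribution the ratio variance/mean equals 1.
print(f"mean={m:.3f} variance={v:.3f} ratio={v / m:.3f}")
```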
1.5. Special distribution of packet sizes
Packet sizes are bimodally distributed:
• around 50% of packets are as large as possible:
– these are TCP data packets;
– recall, it is determined by MTU of Ethernet: 1500 bytes;
– around 50% of packets are 1500 bytes in length.
• around 40% of packets are as small as possible:
– these are TCP ACKs;
– recall, it is determined by headers of TCP (20 bytes), IP (20 bytes): 40 bytes;
– around 40% of packets are 40 bytes in length.
• around 10% of packet lengths are uniformly distributed between 40 and 1500;
• additional peaks: fragmentation of IP packets.
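As a worked example, the mean packet size implied by the mixture above can be computed directly. The sketch ignores the additional fragmentation peaks and uses the midpoint of [40, 1500] as the mean of the uniform component; it gives a mean of about 843 bytes.

```python
# Mixture from the list above: (probability, mean size in bytes) per component.
MIN_SIZE, MAX_SIZE = 40, 1500
components = [
    (0.5, MAX_SIZE),                    # full-size TCP data packets
    (0.4, MIN_SIZE),                    # TCP ACKs (TCP + IP headers only)
    (0.1, (MIN_SIZE + MAX_SIZE) / 2),   # uniform on [40, 1500]
]

mean_size = sum(p * size for p, size in components)

# Share of all bytes that is carried by full-size packets.
byte_share_max = 0.5 * MAX_SIZE / mean_size

print(f"mean packet size: {mean_size:.1f} bytes")
print(f"byte share of full-size packets: {byte_share_max:.1%}")
```

Although small packets are 40% of all packets, they carry only a small share of the bytes, which matters when deciding whether to model packets or bytes.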
1.6. Packet arrivals are not homogeneous Poisson
The packet arrival process in the Internet is not homogeneous Poisson:
• common belief was:
– aggregated traffic is Poisson (or at least Markovian) in nature;
– a lot of studies have been made under the Poisson assumption;
– why? It is easy to use in performance evaluation studies.
• reality:
– the arrival process is not homogeneous Poisson;
– interarrival times can be correlated;
– the packet arrival process may not even be covariance stationary;
– there can be so-called packet 'clumps' or 'batches'.
• result: the packet arrival process is far from the common assumptions:
– what is the nature of IP traffic?
1.7. Unknown properties of the IP traffic
What was suggested over decades:
• 1980s: Poisson nature of the aggregated packet traffic:
– common agreement: this assumption is no longer valid!
• 1990s: self-similar nature of the aggregated traffic:
– the most respected hypothesis today;
– seems a little bit strange.
• 2000s: is it simply non-stationary?
– probably the correct answer:
– small timescales: stationary;
– long timescales: non-stationary.
1.8. Changing nature of Internet traffic
Internet traffic continues to change:
• what applications dominated over the decades:
– 1980−1995: e-mail, remote access;
– 1995−2005: WWW, large file transfers;
– 2005−2010: p2p/WWW/video: 60%/30%/10%;
– 2010 onwards: video is taking over the Internet.
• how to deal with it:
– you cannot rely upon 'old measurements';
– new measurements are always required;
– measurements made in the early 2000s may no longer be representative.
1.9. Find more about Internet traffic
Where you may find more about current Internet traffic:
• Internet traffic archive: http://ita.ee.lbl.gov;
• Internet traffic report: http://www.internettrafficreport.com;
• National laboratory for applied network research (NLANR): http://www.nlanr.net;
• NLANR measurement and operations analysis team (MOAT): http://moat.nlanr.net;
• National Internet measurement infrastructure (NIMI): http://www.ncne.nlanr.net/nimi;
• tcpdump measurement software: http://www.tcpdump.org;
• wireshark software: http://www.wireshark.com;
• research papers:
– free search engine: http://researchindex.org/;
– free search engine: http://scholar.google.com/;
– ieee: http://ieeexplore.ieee.org/Xplore/guesthome.jsp.
2. Step-by-step traffic modeling procedure
Step-by-step procedure:
• determine the point of interest;
• determine the level of interest;
• measure traffic at the point of interest;
• decide what statistics should be captured;
• estimate statistics of traffic observations;
• choose a candidate model;
• fit parameters of the model;
• test accuracy of the model.
3. Points of interest in traffic modeling
You have to take into account:
• where you are asked to model the traffic (evaluate performance, dimension a system)?
Figure 1: Points at which traffic is usually measured and modeled (points 1, 2 and 3, between the customer side and the network side).
3.1. Point 1: particular application
We distinguish between:
• voice application;
• video application;
• data transfers:
– ftp information;
– http information;
– ssh information.
What is important: properties of the transport layer protocol and of the application:
• UDP: no specific pattern:
– does not much affect the properties of the application traffic;
– you may model the traffic of the application only.
• TCP: very specific pattern:
– affects data transmission;
– should be taken into account.
Figure 2: TCP traffic pattern: TCP Reno and TCP Tahoe (congestion window in MSSs versus time).
• how much traffic does the application have?
• ftp: large files; ssh, e-mail, http: large and small transfers.
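The TCP pattern of Figure 2 can be reproduced with a rough sketch under strong simplifying assumptions: one window update per round-trip time, an initial slow-start threshold of 16 MSSs, a single loss detected by duplicate ACKs, and no timeouts. The function `cwnd_trace` and all parameter values are hypothetical and only illustrate the sawtooth shape.

```python
def cwnd_trace(variant, rounds, loss_rounds, ssthresh=16):
    """Simplified congestion window (in MSSs) per round-trip time.

    Assumes every loss is detected by duplicate ACKs; timeouts, which
    reset the window to 1 MSS in both variants, are not modeled.
    """
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 2)
            # Tahoe restarts slow start; Reno continues from the halved window.
            cwnd = 1 if variant == "tahoe" else ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return trace

tahoe = cwnd_trace("tahoe", rounds=12, loss_rounds={6})
reno = cwnd_trace("reno", rounds=12, loss_rounds={6})
print("Tahoe:", tahoe)
print("Reno: ", reno)
```

Both variants behave identically up to the loss; afterwards Tahoe rebuilds the window from 1 MSS, while Reno resumes linear growth from half of the window at the time of the loss.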
3.2. Point 2: aggregated traffic from a number of applications
We distinguish between:
• heterogeneous applications;
• homogeneous applications.
Figure 3: Homogeneous and heterogeneous traffic aggregates (customer side; sources labeled voip, video and ftp).
3.3. Point 3: aggregated network traffic
What it is:
• aggregation of a large number of flows.
Figure 4: Aggregated backbone traffic (access routers connected to a backbone router).
• may have quite sophisticated properties;
• practically, cannot be obtained as superposition of individual flows.
4. Level (layer) of interest for traffic modeling
Packet traffic can be represented:
• at the session level:
– request for downloading files from ftp server;
– request for downloading pages from www server.
• at the packet level.
Which level to choose:
• depends on your task.
General notes:
• session level:
– usually claimed to follow a Poisson process;
– reality might be quite different!
• packet level: any behavior should be expected.
5. What statistics to capture?
General answer is not straightforward:
• what are the aims of traffic modeling:
– just propose a new, better traffic model?
– carry out performance evaluation?
– what kind of performance evaluation: simulation or analytical?
• how closely you are going to describe the traffic:
– trade-off between accuracy and complexity!
– is it sufficient just to get basic ideas?
– is there interest in precise parameters?
• what statistics are important:
– mean, variance, distribution, ACF?
– you can never say before you get results;
– you can never say before you capture a certain parameter.
5.1. What statistics are important?
Note the following:
• mean value:
– must be captured;
• variance:
– must be captured;
– one may use standard deviation or coefficient of variation instead.
• lag-1 ACF:
– was found to be important;
– may have an unexpected effect.
• structure of the ACF:
– sometimes affects performance significantly (e.g. long-range dependence).
• histogram of relative frequencies:
– captures all moments of one-dimensional distribution;
– required when you have to be pretty sure.
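The statistics listed above can be estimated from a trace with a few lines of code. The sketch below uses a hypothetical trace of uncorrelated Gaussian samples as a stand-in for measurements; the helper names and the bin width are illustrative choices.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def nacf(xs, lag):
    """Normalized autocorrelation at the given lag (biased estimator)."""
    m = mean(xs)
    c0 = sum((x - m) ** 2 for x in xs)
    ck = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return ck / c0

def relative_histogram(xs, bin_width):
    """Map bin index -> relative frequency; bin b covers [b*w, (b+1)*w)."""
    freq = {}
    for x in xs:
        b = math.floor(x / bin_width)
        freq[b] = freq.get(b, 0) + 1
    return {b: c / len(xs) for b, c in sorted(freq.items())}

# Hypothetical trace: uncorrelated Gaussian samples, for illustration only.
rng = random.Random(3)
trace = [rng.gauss(15.0, 3.0) for _ in range(2000)]

m, v = mean(trace), variance(trace)
print("mean:", round(m, 3), "variance:", round(v, 3))
print("coefficient of variation:", round(math.sqrt(v) / m, 3))
print("lag-1 NACF:", round(nacf(trace, 1), 3))
hist = relative_histogram(trace, bin_width=3.0)
print("histogram:", {b: round(f, 3) for b, f in hist.items()})
```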
5.2. Common matching schemes
Common matching:
• mean and variance:
– usually easy to do;
– used to get mean performance parameters.
• mean, variance and lag-1 ACF:
– there are a number of models and algorithms;
– relatively easy to do.
• mean and ACF:
– there are a number of models and algorithms;
– sometimes not easy to do.
• histogram:
– one may look for an analytical distribution;
– usually easy to do using a discrete distribution.
• histogram and ACF.
6. Choosing a candidate model
Input information:
• parameters of the traffic that have to be captured;
• a set of traffic models.
What you have to know:
• traffic models and their properties;
• analytical tractability of models:
– simulation: any model is suitable;
– analytical: only tractable models are suitable.
Examples:
• analytically tractable: renewal models, Markovian models;
• analytically intractable: most non-Markovian models.
6.1. Classes of models and characteristics
Classes of models:
• renewal class of models:
– distribution can be arbitrary;
– ACF is zero for all non-zero lags: no autocorrelation.
• autoregressive class:
– distribution is normal;
– ACF is a sum of exponential/geometric terms.
• Markov-modulated models:
– distribution can be arbitrary;
– ACF is a sum of exponential/geometric terms.
• models with self-similar properties:
– distribution can be either normal (FBM) or arbitrary (F-ARIMA);
– ACF non-zero for large lags: long-range dependence.
6.2. Recipes
You may use the following when you are to capture:
• first two moments:
– Erlang, hyperexponential, exponential distributions;
– approximation by discrete distribution: $p_1, p_2, \dots, p_k$ such that $\sum_i p_i = 1$.
• first m moments:
– special case of phase-type distribution;
– approximation by discrete distribution.
• first two moments and lag-1 value of ACF:
– Markov modulated processes;
– autoregressive processes.
• first two moments and ACF:
– Markov modulated processes;
– autoregressive processes.
6.3. Example of the problem
Assume we have:
• observations of RV X that is defined on [0, ∞);
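• we have to capture the first two moments: the mean and the squared coefficient of variation (SCV), with $\mu = E[X]$ and $\sigma^2$ the variance of X:
$E[X], \quad C^2 = \sigma^2/\mu^2 < 1.$ (1)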
What one may guess:
• Erlang distribution ($E_2$):
– defined on [0, ∞);
– has $C^2 < 1$.
• shifted exponential distribution:
– defined on [d, ∞);
– has $C^2 < 1$.
• what to choose?
Figure 5: pdfs of shifted exponential and E2 distributions. E2: $f_X(x) = b^2 x e^{-bx}$, $x \ge 0$; shifted exponential: $f_X(x) = b e^{-b(x-d)}$, $x \ge d$.
Conclusion:
• the shifted exponential puts no probability mass on [0, d), so it does not satisfy the implicit requirement X ∈ [0, ∞);
• we choose the Erlang distribution.
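A minimal sketch of how Erlang parameters could be chosen from the two target statistics. It relies on the facts that an Erlang-k distribution with phase rate b has mean k/b and SCV 1/k, so the SCV is matched only approximately (k must be an integer). The function names and target values are hypothetical.

```python
import random

def fit_erlang(mean, scv):
    """Pick Erlang-k parameters (k, b) from a mean and an SCV below 1.

    Erlang-k with rate b per phase has mean k/b and SCV 1/k, so the
    mean is matched exactly and the SCV only to the nearest 1/k.
    """
    if not 0.0 < scv < 1.0:
        raise ValueError("Erlang matching needs 0 < SCV < 1")
    k = max(2, round(1.0 / scv))
    b = k / mean
    return k, b

def sample_erlang(k, b, rng):
    """Sum of k independent exponential phases, each with rate b."""
    return sum(rng.expovariate(b) for _ in range(k))

k, b = fit_erlang(mean=4.0, scv=0.5)   # illustrative targets
rng = random.Random(11)
xs = [sample_erlang(k, b, rng) for _ in range(20000)]

m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
print(f"k={k} b={b} sample mean={m:.3f} sample SCV={v / m ** 2:.3f}")
```

For an SCV that is not of the form 1/k, a different family (for example a mixture of Erlang distributions) would be needed for an exact match; that is outside this sketch.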
7. Fitting parameters of models
Note the following:
• no general algorithms;
• algorithms are specific for a class of models;
• there could be more than a single algorithm for a chosen model;
• there could be no algorithms for a chosen model.
General procedure:
• determine parameters of the model:
– these parameters must completely characterize a model;
– not only parameters you are going to capture.
• derive equations relating the measured statistics to the parameters:
– note that some parameters can be free.
8. Testing accuracy of the model
Note the following:
• sometimes it is not performed:
– when you match the parameters exactly and are sure about the traffic properties.
• otherwise it is always needed:
– an approximation is always used at a certain step.
Tests:
• compare distributions:
– $\chi^2$ test;
– Smirnov’s test.
• compare autocorrelations:
– just visually;
– use specific tests.
9. Example
What we have to do:
• propose a model of the aggregated traffic;
• capture histogram and ACF as close as possible;
• model should be further used in simulation study.
Figure 6: The point at which traffic is to be modeled (point 1, network side).
9.1. Measuring the traffic at the point of interest
We carried out two sufficiently long measurements:
• reality: 2000 observations may not be sufficient!
• disclaimer: these observations do not represent real traffic of any kind!
Figure 7: Traffic observations Y(i), i = 0, ..., 2000, obtained in two experiments: (a) Experiment 1; (b) Experiment 2.
9.2. Estimating statistics
What one may ask:
• are they stationary ergodic?
• what kind of distribution do these traces come from?
• is the approximating distribution the same for both traces?
• which model to use to capture statistics?
What to do to get basic knowledge:
• compute statistics;
• analyze statistics to identify properties.
What statistics we usually start with:
• histogram of relative frequencies;
• normalized autocorrelation function (NACF).
The histograms look as follows:
• are they really normal?
• testing using the $\chi^2$ test: yes, with level of significance 0.9!
Figure 8: Histograms of presented traces with normal approximations (relative frequencies $f_{i,E}(\Delta)$ versus $i\Delta$): (a) Experiment 1; (b) Experiment 2.
The NACFs look as follows:
• we have no anomalies;
• such NACFs are typical of stationary processes.
Figure 9: Normalized ACFs $K_Y(i)$ of presented traces with geometric approximations, lags i = 0, ..., 10: (a) Experiment 1; (b) Experiment 2.
9.3. Choosing a candidate model
What are our observations:
• observations are stationary ergodic: assumption;
• empirical distribution is normal;
• ACF decays as a single exponential/geometric term.
Which model to guess:
• autoregressive model of order 1, AR(1):
$Y(n) = \phi_0 + \phi_1 Y(n-1) + \epsilon(n), \quad n = 1, 2, \dots,$ (2)
– $\phi_0$ and $\phi_1$ are some parameters, $\epsilon(n) \sim N(0, \sigma^2[\epsilon])$;
– marginal distribution is normal, NACF $K(i) = \phi_1^i$, $i = 0, 1, \dots$.
• Markov modulated model:
– may approximate Normal distribution;
– NACF is a sum of exponential/geometric terms.
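To see the AR(1) properties quoted above in action, the following sketch generates a trace from equation (2) and compares the sample NACF with the geometric term $\phi_1^i$. The parameter values (3.0, 0.8, 2.0) and the helper names are illustrative assumptions, not fitted values.

```python
import math
import random

def generate_ar1(phi0, phi1, sigma_eps, n, seed=0):
    """Generate n samples of Y(n) = phi0 + phi1*Y(n-1) + eps(n)."""
    if abs(phi1) >= 1.0:
        raise ValueError("covariance stationarity requires |phi1| < 1")
    rng = random.Random(seed)
    # Start from the stationary distribution to avoid a warm-up transient.
    mu = phi0 / (1.0 - phi1)
    sd = sigma_eps / math.sqrt(1.0 - phi1 ** 2)
    y = rng.gauss(mu, sd)
    out = []
    for _ in range(n):
        y = phi0 + phi1 * y + rng.gauss(0.0, sigma_eps)
        out.append(y)
    return out

def nacf(xs, lag):
    """Normalized autocorrelation at the given lag (biased estimator)."""
    m = sum(xs) / len(xs)
    c0 = sum((x - m) ** 2 for x in xs)
    ck = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return ck / c0

trace = generate_ar1(phi0=3.0, phi1=0.8, sigma_eps=2.0, n=20000)
for lag in range(4):
    # Columns: lag, sample NACF, geometric term phi1**lag.
    print(lag, round(nacf(trace, lag), 3), round(0.8 ** lag, 3))
```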
9.4. Fitting AR(1) model
What we have to do: estimate the following parameters:
$\phi_0, \quad \phi_1, \quad \sigma^2[\epsilon].$ (3)
Properties of AR(1) model:
• if AR(1) process is covariance stationary we have:
$E[Y] = \mu_Y, \quad \sigma^2[Y] = \gamma_Y(0), \quad Cov(Y_0, Y_i) = \gamma_Y(i).$ (4)
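• $\mu_Y$, $\sigma^2[Y]$ and $\gamma_Y(i)$ of AR(1) are related to $\phi_0$, $\phi_1$ and $\sigma^2[\epsilon]$ as
$\mu_Y = \frac{\phi_0}{1 - \phi_1}, \quad \sigma^2[Y] = \frac{\sigma^2[\epsilon]}{1 - \phi_1^2}, \quad \gamma_Y(i) = \phi_1^i \gamma_Y(0).$ (5)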
• $\phi_0$, $\phi_1$ and $\sigma^2[\epsilon]$ are related to statistics of observations as:
$\phi_1 = K_X(1), \quad \phi_0 = \mu_X (1 - \phi_1), \quad \sigma^2[\epsilon] = \sigma^2[X](1 - \phi_1^2),$ (6)
– $K_X(1)$, $\mu_X$ and $\sigma^2[X]$ are the lag-1 value of ACF, mean and variance of observations.
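A minimal sketch of the fitting step in equation (6). Since the measured traces are not available here, the "observations" are a hypothetical stand-in generated by an AR(1) process with known parameters, so the fit can be checked against them; `fit_ar1` is an illustrative name.

```python
import random

def fit_ar1(xs):
    """Moment-matching fit of AR(1) following equation (6)."""
    n = len(xs)
    mu_x = sum(xs) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / (n - 1)
    c0 = sum((x - mu_x) ** 2 for x in xs)
    c1 = sum((xs[i] - mu_x) * (xs[i + 1] - mu_x) for i in range(n - 1))
    phi1 = c1 / c0                         # phi1 = K_X(1)
    phi0 = mu_x * (1.0 - phi1)             # phi0 = mu_X (1 - phi1)
    var_eps = var_x * (1.0 - phi1 ** 2)    # sigma^2[eps] = sigma^2[X] (1 - phi1^2)
    return phi0, phi1, var_eps

# Hypothetical stand-in for measurements: AR(1) with phi0=3, phi1=0.8,
# noise standard deviation 2 (so sigma^2[eps] = 4).
rng = random.Random(21)
y, observations = 15.0, []
for _ in range(20000):
    y = 3.0 + 0.8 * y + rng.gauss(0.0, 2.0)
    observations.append(y)

phi0, phi1, var_eps = fit_ar1(observations)
print(f"phi0={phi0:.3f} phi1={phi1:.3f} var_eps={var_eps:.3f}")
```

With a trace of this length the estimates should land close to the generating values; with 2000 observations, as in the example, the scatter is noticeably larger.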
9.5. Testing for accuracy of fitting
Why we need it:
• we decided to capture histogram and NACF;
• we fit only the first two moments and the lag-1 value of the ACF!
Is there a case when testing is not needed:
• assume we were to capture only $\mu_X$, $\sigma^2[X]$ and $K_X(1)$;
• since we explicitly fit them, the AR(1) model represents them exactly.
What allows us to assume we get a fair approximation:
• the AR(1) model is characterized by only three parameters, all of which were matched;
• the distribution of the AR(1) model is normal;
• the NACF of the AR(1) model decays geometrically.
The first step:
• generate trace from the model:
– for simplicity you may generate exactly the same number of observations.
Test histograms using the $\chi^2$ test or Smirnov’s test for two samples:
• first sample: empirical observations;
• second sample: generated from model;
• hypotheses to be tested:
– $H_0$: distributions of the two samples are the same;
– $H_1$: distributions of the two samples are different.
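A sketch of the two-sample Smirnov test, using the standard large-sample critical value $c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$. The samples below are hypothetical, independent Gaussian draws; for correlated traces such as AR(1) output the test assumes more independence than is present, so its verdict should be read as approximate.

```python
import math
import random
from bisect import bisect_right

def smirnov_statistic(a, b):
    """Largest distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def critical_value(n, m, alpha=0.05):
    """Large-sample critical value; H0 is rejected if the statistic exceeds it."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))

rng = random.Random(5)
# Hypothetical samples standing in for the empirical and model traces.
empirical = [rng.gauss(15.0, 3.0) for _ in range(2000)]
model_same = [rng.gauss(15.0, 3.0) for _ in range(2000)]
model_shifted = [rng.gauss(17.0, 3.0) for _ in range(2000)]

d_crit = critical_value(2000, 2000)
for name, sample in [("same", model_same), ("shifted", model_shifted)]:
    d = smirnov_statistic(empirical, sample)
    verdict = "reject H0" if d > d_crit else "do not reject H0"
    print(name, round(d, 4), verdict)
```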
Test NACFs:
• you may carry out visual test by plotting NACFs of both samples;
• you may test for significant correlation using the Ljung-Box statistic.
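A sketch of the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the sample NACF at lag k. Under the hypothesis of no correlation, Q approximately follows a $\chi^2$ distribution with h degrees of freedom. One possible use, assumed here rather than prescribed by the procedure above, is to apply it to a series that should be uncorrelated, such as the residuals of a fitted model. Both series below are hypothetical.

```python
import random

def nacf(xs, lag):
    """Normalized autocorrelation at the given lag (biased estimator)."""
    m = sum(xs) / len(xs)
    c0 = sum((x - m) ** 2 for x in xs)
    ck = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return ck / c0

def ljung_box_q(xs, max_lag):
    """Q = n(n+2) * sum over k=1..max_lag of r_k^2 / (n-k)."""
    n = len(xs)
    return n * (n + 2) * sum(
        nacf(xs, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# 95% quantile of the chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.307

rng = random.Random(9)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # uncorrelated series

ar1, y = [], 0.0                                      # strongly correlated series
for _ in range(2000):
    y = 0.8 * y + rng.gauss(0.0, 1.0)
    ar1.append(y)

for name, series in [("white noise", white), ("AR(1)", ar1)]:
    q = ljung_box_q(series, 10)
    verdict = "significant correlation" if q > CHI2_95_DF10 else "no significant correlation"
    print(name, round(q, 2), verdict)
```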