Modelling Global Internet Dynamics - SLAC

Modelling Global Internet Dynamics
Dr. Robert Baker
Troy Mackay, Brett Carson, Dr. Rajanathan Rajaratnam
University of New England
Professor Les Cottrell
Stanford University
The Internet (Cheswick, 1999)
A Planned Shopping Centre
Space-Time Convergence
• This convergence, connecting origin-destination pairs, is defined by the rate of time discounting (and distance minimisation), and its rate is a function of the technology of transfer.
• The space-time convergence means that, at least theoretically, the mathematical operators can be projected beyond this interaction to larger distance scales and smaller time scales.
• It suggests that the trip operators are the same for the Internet as for a shopping centre.
• As we approach the singularity (for Internet trips), special features emerge, such as ‘virtual distance’, ‘virtual trips’ and ‘time reversal’.
The Stanford Internet Experiments
• The Stanford experiments were undertaken by Professor Les Cottrell at the Stanford Linear Accelerator Center (SLAC), Stanford, USA.
• The experiments ran from 1998 to 2004 with varying numbers of monitoring sites and remote hosts. The year 2000 had the greatest connectivity between monitoring sites and remote hosts and presents the best opportunity to test the model.
• In 2000 the experiment featured 27 global monitoring sites pinging 170 remote hosts every hour. It measures the time taken between these origin-destination pairs and the number of packets dropped because of congestion on the route.
Definitions
Latency
Latency is a synonym for delay and measures how much time it takes for
a packet of data to get from one designated point to another in a network.
• Propagation: constrained by the speed of light
• Transmission: the medium and the size of the packet can introduce delay
• Router and other processing: each hub takes time to examine the packet
• Internal connectivity: delay within networks from intermediate devices
Latency and latitude/longitude co-ordinates will be the time-space variables.
Packet Loss
When too many packets arrive on an origin-destination trip, routers hold them in buffers until the traffic decreases. When a buffer fills up during times of congestion, the router drops packets. This is part of the best-effort delivery of the ‘Internet Protocol’ (IP).
Packet loss is measured here as a proxy for peak demand.
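The buffering behaviour described above can be illustrated with a minimal sketch. It assumes a single first-in-first-out buffer that drops any arrival once it is full (tail drop); the function name, the tick-based timing and all the numbers are hypothetical and are not taken from the Stanford experiments.

```python
from collections import deque

def simulate_tail_drop(arrivals_per_tick, buffer_size, service_per_tick):
    """Return (forwarded, dropped) packet counts for a finite FIFO buffer."""
    queue = deque()
    forwarded = dropped = 0
    for arriving in arrivals_per_tick:
        # Enqueue arrivals; anything beyond the buffer capacity is dropped.
        for _ in range(arriving):
            if len(queue) < buffer_size:
                queue.append(1)
            else:
                dropped += 1
        # Forward up to service_per_tick packets during this tick.
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()
            forwarded += 1
    return forwarded, dropped

# Illustrative loads only: demand below and above the service rate.
quiet = simulate_tail_drop([2] * 10, buffer_size=5, service_per_tick=3)
busy = simulate_tail_drop([6] * 10, buffer_size=5, service_per_tick=3)
loss_fraction = busy[1] / (6 * 10)
```

Under light load nothing is dropped, while under sustained overload the dropped share rises, which is the sense in which packet loss can stand in for peak demand.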
Chaos 2000 (Jan 1 and Jan 2, 2000)
[Figure: time series plot; vertical axis from -0.2 to 1.2, horizontal axis from 1 to 211]
General Space-Time Trip Differential Equation
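$$J[\phi] = A\frac{\partial^2\phi}{\partial x^2} + B\frac{\partial^2\phi}{\partial x\,\partial t} + C\frac{\partial^2\phi}{\partial t^2} + D\frac{\partial\phi}{\partial x} + E\frac{\partial\phi}{\partial t} + F$$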
A Classification of Relevant Trip Equations using Space-Time
Operator Matrix
Variable φ        ∂/∂x                              ∂²/∂x²              ∂²/∂x² + ∂/∂x
∂/∂t              Continuity Eq                     Diffusion Eq
∂²/∂t²            Supermarket Eq (Gravity Model)    Wave Eq
∂²/∂t² + ∂/∂t     Internet Eq                       Telegrapher’s Eq
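Trips to Supermarkets or Planned Shopping Centres (Time Discounting Behaviour)
$$\frac{\partial}{\partial x}\,\text{Population Variable}\ (\phi) = -\beta \times \text{Population Variable} \;\Rightarrow\; \phi = c\,\exp(-\beta x) \quad \text{(Gravity Model)}$$
$$\frac{\partial^2}{\partial t^2}\,\text{Population Variable}\ (\phi) = -k^2 \times \text{Population Variable} \;\Rightarrow\; \phi = c\,\sin kt \ (\text{or } \cos kt) \quad \text{(Internet Demand Wave)}$$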
Does this type of model apply to Internet traffic?
Time-based Random Walk
Each monitoring site serves a number of remote hosts at a particular locality, and there are $\phi_i$ remote hosts linked to each monitoring site i. Assume that each of these remote hosts can hop to adjacent sites with a frequency Γ that does not depend on the characteristics of i. These hops can access sites forwards or backwards in time, and movement forwards and backwards is assumed to be equally likely.
Assumptions
1. The jump frequency of transactions between sites is constant and is assumed independent of the site index i and its location in space.
2. This frequency of movement does not depend on the distribution of remote hosts or users in the neighbourhood of the ith site.
3. The time distance between sites and the type of transfer network do not influence the process; the only thing that is important is the time-based ordering of the points.
4. The time distance between sites is very much smaller than the smallest significant wavelength, and the amplitude A of the demand wave (A sin kt) must be insignificant outside $k_{max}$.
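A small simulation can make the random walk concrete. The sketch below assumes unit hops on an evenly spaced time grid, with forward and backward hops equally likely and independent of the site, as in the assumptions above; the function name and the counts of hops and walkers are illustrative choices only.

```python
import random
import statistics

def time_random_walk(n_hops, n_walkers, seed=0):
    """Final time offsets (in grid steps) after n_hops symmetric +/-1 hops."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_walkers):
        position = 0
        for _ in range(n_hops):
            # Forward and backward hops in time are equally likely.
            position += rng.choice((-1, 1))
        finals.append(position)
    return finals

finals = time_random_walk(n_hops=100, n_walkers=5000)
mean_offset = statistics.fmean(finals)   # stays close to zero (no drift)
variance = statistics.pvariance(finals)  # grows in proportion to the number of hops
```

With no preferred direction the mean offset stays near zero and the variance grows linearly with the number of hops, which is the behaviour that leads to a Gaussian spread in time.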
Map of Monitoring Sites
1   143.108.25.100
2   cmuhep2.phys.cmu.edu
3   fermi.physics.umd.edu
4   gull.cs.wisc.edu
5   hepnrc.hep.net
6   jasper.cacr.caltech.edu
7   jlab7.jlab.org
8   missinglink.transpac.org
9   netdb3.es.net
10  netmon.physics.carleton.ca
11  otf1.er.doe.gov
12  vicky.stanford.edu
12  patabwa.stanford.edu
13  pitcairn.mcs.anl.gov
14  wwwmics.er.doe.gov
15  dxcnaf.cnaf.infn.it
16  gate.itep.ru
17  icfamon.dl.ac.uk
18  netmon.desy.de
19  sgiserv.rmki.kfki.hu
20  suncs02.cern.ch
20  sunstats.cern.ch
21  noc.ilan.net.il
22  rainbow.inp.nsk.su
23  ccjsun.riken.go.jp
24  cloud.kaist.kr.apan.net
25  yumj2.kek.jp
Map of Host Sites
The Internet Demand Wave
Data Reduction Method (Time)
• 1. We take the raw hourly packet loss data for a given period and perform an average for each hour.
• 2. This is plotted for the average week (per hour) for the monitoring host/remote host pairs.
• 3. The data begins Monday 00:00 local time of the remote host and is truncated to extract the first five days (Monday to Friday 00:00-24:00 local time). An inverse temporal translation is made back to GMT. We then can view the graph of the Internet demand wave. For example Vicky.stanford.edu
• 4. Weekends tend to have significantly less congestion, so extracting week days gives a cleaner Fourier spectrum.
• 5. We apply a Discrete Fourier Transform to the data for each monitoring host/remote host pair. For example Vicky.stanford.edu-pinglafex.cbpf.br
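The reduction steps can be sketched as follows. This is an illustration on synthetic data, not the code used for the experiments: the helper names, the four-week sample and the 15:00 daily peak are assumptions, the transform is a plain O(N²) DFT, and the translation between local time and GMT is left out.

```python
import cmath
import math

def average_week(samples):
    """Average packet loss for each of the 168 hours of the week.

    samples: iterable of (hour_of_week, loss_fraction), hour 0 = Monday 00:00.
    """
    totals = [0.0] * 168
    counts = [0] * 168
    for hour, loss in samples:
        totals[hour % 168] += loss
        counts[hour % 168] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

def dft(values):
    """Plain discrete Fourier transform of a real sequence."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
            for k in range(n)]

# Synthetic data: four weeks of hourly loss peaking at 15:00 every day.
samples = [(h % 168, 0.05 + 0.03 * math.cos(2 * math.pi * ((h % 24) - 15) / 24))
           for h in range(4 * 168)]
weekdays = average_week(samples)[:120]   # Monday 00:00 to Friday 24:00
spectrum = dft(weekdays)
daily = spectrum[5]                      # 5 cycles in 120 hours = 24-hour period
amplitude = 2 * abs(daily) / len(weekdays)
peak_hour = (-cmath.phase(daily) * 24 / (2 * math.pi)) % 24
```

Because the five weekdays span 120 hours, the 24-hour cycle falls on Fourier index 5, and its phase gives the hour at which the demand wave peaks.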
Phase vs Longitude Linear Least Squares Regression
Let $\lambda$ and $\theta$ be longitude and phase respectively.
Let $(\lambda_i, \theta_i)$ be the longitude and phase of the ith data point.
Let $\theta_G(\lambda)$ and $\theta_L(\lambda)$ be the linear least squares regressions of the n data points $(\lambda_i, \theta_i)$, where $i = 1 \ldots n$.
We must have $\frac{d\theta}{d\lambda} \in \{0, 1\}$ to satisfy the boundary conditions of continuity over the 24 hour boundary.
The case $\frac{d\theta}{d\lambda} = 0$ corresponds to local congestion dominated data, where the packet loss distribution is not strongly dependent on remote host longitude and is most probably due to local effects only.
The case $\frac{d\theta}{d\lambda} = 1$ corresponds to remote congestion dominated data, where the packet loss distribution is correlated with the remote host longitude.
We wish to find $\theta_G$ and $\theta_L$ such as to minimize the normalized sum of the squares of the residuals $R_G$ and $R_L$. We have:
$$R_G = \frac{1}{n}\sum_{i=1}^{n} \min\left\{ \left(\theta_G(\lambda_i) - \theta_i\right)^2,\ \left(\left|\theta_G(\lambda_i) - \theta_i\right| - 2\pi\right)^2 \right\}$$
$$R_L = \frac{1}{n}\sum_{i=1}^{n} \min\left\{ \left(\theta_L(\lambda_i) - \theta_i\right)^2,\ \left(\left|\theta_L(\lambda_i) - \theta_i\right| - 2\pi\right)^2 \right\}$$
Critical points can occur at the boundaries ($\theta_G, \theta_L = -\pi, \pi$), at discontinuities or at local extrema.
Thus we are able to minimize $R_G$ and $R_L$ and find suitable $\theta_G$ and $\theta_L$ by comparing the values at the possible critical points.
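The fit can be sketched numerically. The example below is an assumption-laden illustration: it treats the slope-1 line as the "global" candidate and the slope-0 line as the "local" candidate, works in radians, and replaces the comparison of critical points described above with a brute-force grid search over the offset. The function names and the synthetic data are hypothetical.

```python
import math

def wrap(angle):
    """Wrap an angle in radians onto [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def fit_offset(longitudes, phases, slope, steps=3600):
    """Grid-search the offset a that minimises the mean squared wrapped
    residual of phase = slope * longitude + a (all angles in radians)."""
    best_offset, best_r = 0.0, float("inf")
    for j in range(steps):
        a = -math.pi + 2 * math.pi * j / steps
        r = sum(wrap(p - slope * lon - a) ** 2
                for lon, p in zip(longitudes, phases)) / len(phases)
        if r < best_r:
            best_offset, best_r = a, r
    return best_offset, best_r

# Synthetic example: phase tracks longitude exactly (slope 1, offset 0.5 rad).
lons = [math.radians(x) for x in range(-170, 180, 20)]
phs = [wrap(lon + 0.5) for lon in lons]
offset_g, r_g = fit_offset(lons, phs, slope=1)   # slope-1 ("global") candidate
offset_l, r_l = fit_offset(lons, phs, slope=0)   # slope-0 ("local") candidate
```

For data that follow longitude, the slope-1 residual is close to zero while the slope-0 residual is large, so comparing the two residuals separates the two cases.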
Scaling to the Earth’s Rotation: Global Periodicity [0,1]
We also wish to scale $R_G$ and $R_L$ so as to produce a useful statistic for comparison. To this end we multiply by a scale factor so that $R_G$ and $R_L$ take values on the interval $[0, 1]$. The maximum sum of squares of angular residuals $R_{MAX}$ occurs when the data points are uniformly distributed in the direction perpendicular to the regression line, so that the residuals $r$ are uniformly distributed on the interval $[-\pi, \pi]$. Thus
$$R_{MAX} = \frac{1}{2\pi}\int_{-\pi}^{\pi} r^2\,dr = \frac{\pi^2}{3}$$
Therefore, define the statistic which will define global (and local) periodicity:
$$\chi^2 = \frac{R}{R_{MAX}} = \frac{3R}{\pi^2}$$
Table for Global and Local Periodicity for Internet Traffic 2000
SRC_HOST                    SRC_LAT  SRC_LONG    N  N_0.05  N_0.05/N   PHASE  CHI^2/CHI_0^2
143.108.25.100                -22.0     -46.0   65      62     0.954   151.7          0.339
ccjsun.riken.go.jp             35.7     139.8   95      60     0.632   151.5          0.418
cloud.kaist.kr.apan.net        37.6     127.0   92      74     0.804   134.3          0.665
cmuhep2.phys.cmu.edu           40.4     -80.0   44      38     0.864   142.7          0.139
dxcnaf.cnaf.infn.it            44.5      11.3   63      47     0.746  -171.2          0.410
fermi.physics.umd.edu          39.0     -76.9   40      24     0.600   143.3          0.066
gate.itep.ru                   55.0      37.0   69      63     0.913  -171.5          0.499
gull.cs.wisc.edu               43.1     -89.4   56      36     0.643   149.8          0.293
hepnrc.hep.net                 41.0     -88.0  170      87     0.512   128.0          0.178
icfamon.dl.ac.uk               53.0      -2.0   60      59     0.983   173.2          0.324
jasper.cacr.caltech.edu        34.1    -118.1   64      41     0.641   128.1          0.167
jlab7.jlab.org                 37.0     -76.0  209     130     0.622   136.6          0.186
missinglink.transpac.org       41.0     -87.0   45      27     0.600   156.1          0.061
netdb3.es.net                  38.0    -122.0   70      47     0.671   134.5          0.186
netmon.desy.de                 53.0       9.0   56      54     0.964  -177.0          0.416
netmon.physics.carleton.ca     45.4     -75.7   45      43     0.956   138.0          0.220
noc.ilan.net.il                31.8      35.2   45      34     0.756   170.1          0.231
otf1.er.doe.gov                38.9     -77.0   51      42     0.824   177.3          0.447
patabwa.stanford.edu           37.4    -122.2   81      38     0.469   146.3          0.064
pitcairn.mcs.anl.gov           41.9     -88.0  214     175     0.818   131.5          0.165
rainbow.inp.nsk.su             55.1      83.1   54      41     0.759  -136.6          0.621
sgiserv.rmki.kfki.hu           47.4      19.3   44      39     0.886  -160.6          0.555
suncs02.cern.ch                46.2       6.1   23       2     0.087   147.5          0.023
sunstats.cern.ch               46.2       6.1  142      83     0.585   144.2          0.080
vicky.stanford.edu             37.4    -122.2   83      40     0.482   143.1          0.060
wwwmics.er.doe.gov             38.9     -77.0   68      53     0.779   161.0          0.214
yumj2.kek.jp                   36.1     140.3   47      47     1.000  -112.5          0.819
Avg                                                                    160.0
Vicky.stanford.edu (2000)
[Figure: vicky.stanford.edu; phase_5 plotted against dst_long, both axes from -180 to 180]
Case Studies (2000)
• vicky.stanford.edu (West USA)
• hepnrc.hep.net (East USA)
• sunstats.cern.ch (Switzerland)
• icfamon.dl.ac.uk (UK)
• yumj2.kek.jp (Japan)
hepnrc.hep.net (East USA)
[Figure panels: Internet Demand Wave (2000); Global/Local Periodicity Regression Plot (2000) for 5% Periodicity. hepnrc.hep.net, phase_5 plotted against dst_long, both axes from -180 to 180]
sunstats.cern.ch (Switzerland)
[Figure panels: Internet Demand Wave (2000); Global/Local Periodicity Regression Plot (2000) for 5% Periodicity. sunstats.cern.ch, phase_5 plotted against dst_long, both axes from -180 to 180]
icfamon.dl.ac.uk (UK)
[Figure panels: Internet Demand Wave (2000); Global/Local Periodicity Regression Plot (2000) for 5% Periodicity. icfamon.dl.ac.uk, phase_5 plotted against dst_long, both axes from -180 to 180]
yumj2.kek.jp (Japan)
[Figure panels: Internet Demand Wave (2000); Global/Local Periodicity Regression Plot (2000) for 5% Periodicity. yumj2.kek.jp, phase_5 plotted against dst_long, both axes from -180 to 180]
(2) Time Gaussian Behaviour
What is the relationship between distance and ping time latencies?
Is Internet traffic normally distributed?
Spatial and Time Partitioning
Same as Padmanabhan and Subramanian (2001), Microsoft
Ping Times: 5-15ms; 16-25ms; 26-35ms; …
Distance Units: Concentric Aggregation, 75km; 150km; 225km; …
(a) The cumulative probability for a gravity-type distribution for the distance between client and proxy for America Online (Source: Padmanabhan and Subramanian, 2001)
(b) The cumulative probability for a gravity-type distribution for a regional shopping mall (Bankstown Square, 1998 afternoon distribution; Baker 2000)
(c) The results of a probe machine at Seattle, USA, measuring transaction delay in four categories (5-15ms; 25-35ms; 45-55ms; 65-75ms) relative to geographic distance. (Source: Padmanabhan and Subramanian, 2001; Baker 2001)
A Time Gaussian is a Solution of the Time Discounting Differential Equation
$$p(t, x) = \frac{\phi_o}{2(\pi M x)^{1/2}}\exp\left(-\frac{t^2}{4Mx}\right)$$
Key relationship: $\Delta t^2 = 2M\Delta x$.
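The key relationship can be checked numerically: a Gaussian of this form has a variance in t equal to 2Mx. The sketch below integrates the distribution with a midpoint rule; the values of M and x are illustrative only, the amplitude is set to 1, and the function names are hypothetical.

```python
import math

def time_gaussian(t, x, m, amplitude=1.0):
    """amplitude / (2 sqrt(pi M x)) * exp(-t^2 / (4 M x))."""
    return amplitude / (2 * math.sqrt(math.pi * m * x)) * math.exp(-t * t / (4 * m * x))

def moments(x, m, half_width=200.0, n=40_000):
    """Midpoint-rule total mass and variance in t over [-half_width, half_width]."""
    dt = 2 * half_width / n
    ts = [-half_width + (j + 0.5) * dt for j in range(n)]
    mass = sum(time_gaussian(t, x, m) for t in ts) * dt
    var = sum(t * t * time_gaussian(t, x, m) for t in ts) * dt / mass
    return mass, var

M, X = 0.5, 100.0                # illustrative values, not fitted to any data
mass, variance = moments(X, M)   # mass close to 1, variance close to 2 * M * X
```

The spread in time therefore grows with distance, which is what the latency band plots are used to test.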
vicky.stanford.edu (West USA) (1998-2003)
Testing the Relationship $\Delta t^2 = 2M\Delta x$, 1998-2003
hepnrc.hep.net (East USA)
Cumulative Frequency of Latency Bands and Distance Mid-points 2000
Testing the Relationship $\Delta t^2 = 2M\Delta x$, 1998-2003
sunstats.cern.ch (Switzerland)
Cumulative Frequency of Latency Bands and Distance Mid-points 2000
Testing the Relationship $\Delta t^2 = 2M\Delta x$, 1998-2003
icfamon.dl.ac.uk (UK)
Testing the Relationship $\Delta t^2 = 2M\Delta x$, 1998-2003
(3) Distance Decay
The distance decay metric is a corollary of a time Gaussian.
For example: hepnrc.hep.net (East USA)
(a) Log-linear Gravity Model
(b) 3-D Contour Model Showing Gaussian Distribution
(c) 2-D Density Plot Showing Gaussian Distribution
Space-Time Convergence
• This convergence, connecting origin-destination pairs, is defined by
the rate of time discounting (and distance minimisation) and its rate is
a function of the technology of transfer
• The space-time convergence means that, at least theoretically, the
mathematical operators can be projected beyond this interaction to
larger distance scales and smaller time scales
• It suggests that the trip operators are the same for the Internet as for a
shopping centre.
• As we approach the singularity (for Internet trips), special features emerge, such as ‘virtual distance’, ‘virtual trips’ and ‘time reversal’.
Finite Difference Form
A continuous distribution can also be ‘sampled’, where we can work backwards and derive the ‘finite difference’ form, which can be solved numerically.
To this end, introduce a constant space-time rectangular grid for the independent variables (t, x) by choosing points for integers n and i:
$x_n = n\Delta x$
$t_i = i\Delta t$
This grid system is shown below and is arbitrarily determined by $\Delta x$ and $\Delta t$. This system could represent the sampling mesh constructed to provide data for space-time distributions in the space-time convergence.
Time and Space Estimation
The time derivative is estimated by taking a Taylor expansion around the point $t_i$:
$$\phi_{i+1} = \phi_i + \Delta t\left.\frac{\partial\phi}{\partial t}\right|_{t_i} + \frac{1}{2}\Delta t^2\left.\frac{\partial^2\phi}{\partial t^2}\right|_{t_i} + O(\Delta t^3)$$
Taking the differences yields the central difference system:
$$\left.\frac{\partial\phi}{\partial t}\right|_{t_i} = \frac{\phi_{i+1} - \phi_{i-1}}{2\Delta t} + O(\Delta t^2)$$
The central second difference is stated as:
$$M\left.\frac{\partial^2\phi}{\partial t^2}\right|_{t_i} = \frac{M}{\Delta t^2}\left[\phi_{i+1}(x) + \phi_{i-1}(x) - 2\phi_i(x)\right] + O(\Delta t^2)$$
Similarly, the space derivative around a point $x_n$ is estimated from data forward over space from the rotation of the Earth (the Euler forward scheme):
$$\left.\frac{\partial\phi}{\partial x}\right|_{x_n} = \frac{\phi^{n+1} - \phi^{n}}{\Delta x}$$
Re-arranging the terms yields the finite difference equation equivalent to the supermarket differential equation:
$$\phi_i^{n+1} = \phi_i^n + \mu\left(\phi_{i+1}^n + \phi_{i-1}^n - 2\phi_i^n\right)$$
where $\mu$ is the modulus representing the ratio of space to time mesh (Ghez, 1988) and is defined by:
$$\mu = M\Delta x/\Delta t^2$$
The trip to the destination (the n+1 site) requires convergence without any oscillations, and the finite difference trip back to the origin must be stable.
The finite difference equation must not have oscillatory solutions; this is ensured if all the coefficients have the same sign.
The modulus of the space-time grid for the data collection is positive, like M, and the coefficients of $\phi$ must be positive. Therefore, the modulus must obey the inequality $0 < \mu \le 1$ and the trip from the destination to the residence is restricted by:
$$M\Delta x \le \Delta t^2$$
or for the gravity coefficient
$$\frac{k^2}{\beta} \ge \frac{\Delta x}{\Delta t^2}$$
This is the gravity inequality for spatial interaction modelling for one time zone and applies to distance minimisation strategies.
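The behaviour of the finite difference update can be explored with a short sketch. It assumes the explicit update written above, a unit spike as the starting distribution and end points held at zero; the function names and the two modulus values are illustrative choices, one well inside and one well outside the bound. Written out, the update weights on the three neighbouring time points are μ, 1 − 2μ and μ, so the sign condition can be read off directly.

```python
def step(phi, mu):
    """One space step: phi_i^{n+1} = phi_i^n + mu*(phi_{i+1}^n + phi_{i-1}^n - 2*phi_i^n).

    The two end points of the time grid are held fixed.
    """
    new = phi[:]
    for i in range(1, len(phi) - 1):
        new[i] = phi[i] + mu * (phi[i + 1] + phi[i - 1] - 2 * phi[i])
    return new

def march(mu, n_steps=50, size=41):
    """Start from a unit spike in the middle of the time grid and march in x."""
    phi = [0.0] * size
    phi[size // 2] = 1.0
    for _ in range(n_steps):
        phi = step(phi, mu)
    return phi

def modulus(m, dx, dt):
    """Ratio of space to time mesh, mu = M * dx / dt**2."""
    return m * dx / dt ** 2

smooth = march(mu=0.4)   # every update weight is positive: no oscillation
rough = march(mu=1.5)    # the centre weight 1 - 2*mu is negative: oscillation grows
```

With the smaller modulus the spike spreads into a smooth, non-negative bump, while the larger modulus produces sign changes that grow with every step.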
There is a Gaussian inequality derived similarly for time minimisation strategies.
Is there evidence for this inequality in the Internet Experiments?
$$2M\Delta x \le \Delta t^2$$
Is there evidence for a gravity inequality?
$$\frac{k^2}{\beta} \ge \frac{\Delta x}{\Delta t^2}$$
hepnrc.hep.net (East USA)
For the 5-15ms ping times (the ping times of least congestion), the distance decay is a negative exponential function with an R-squared value of 0.73 and a β value of 0.015. The frequency for this distribution is calculated at 0.208, and this corresponds to a more localised spatial interaction (less than 350km).
For the 15-25ms latency, the log-linear regression still showed a significant line of best fit, where the R-squared value is 0.53 and the β value is now 0.004, meaning that the destinations were dispersed over a wider area (less than 1000km). The corollary is a lower interaction frequency (where k = 0.11).
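A log-linear fit of this kind can be sketched as below. The data are synthetic, generated from a decay coefficient of 0.015 on 75km rings with small made-up deviations, so the output illustrates the method only and does not reproduce the hepnrc.hep.net results; the function name is hypothetical.

```python
import math

def fit_log_linear(distances_km, frequencies):
    """Least-squares fit of ln(f) = ln(c) - beta * x; returns (c, beta, r_squared)."""
    ys = [math.log(f) for f in frequencies]
    n = len(ys)
    mean_x = sum(distances_km) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_km)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_km, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope, sxy ** 2 / (sxx * syy)

# Synthetic counts on 75 km rings, built from beta = 0.015 with small deviations.
xs = [75.0 * j for j in range(1, 6)]
deviation = [1.05, 0.95, 1.02, 0.98, 1.0]
fs = [100.0 * math.exp(-0.015 * x) * d for x, d in zip(xs, deviation)]
c_hat, beta_hat, r2 = fit_log_linear(xs, fs)
```

Taking logarithms turns the negative exponential into a straight line, so the slope gives the decay coefficient and the R-squared value measures how well the gravity form fits.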
Conclusion
The space-time convergence suggests that Internet transactions should be part
of spatial interaction modelling
Using the packet loss demand proxy from the Stanford Internet experiments,
there is an Internet demand wave and it has similar features found in shopping
trip modelling
The Internet equation is defined by:
$$\frac{\partial\phi}{\partial x} = \frac{\partial^2\phi}{\partial t^2} + \frac{\partial\phi}{\partial t}$$
This equation has two components
There is a local time gaussian component with distance decay: Distance does
matter!
There is a global drift component from the 24-hour rotation of the Earth.
There is a statistic that can classify sites as global or local periodic by
standardising to unity the Earth’s rotation as the slope of the regression line.
Conclusion (cont.)
• The Internet allows us to look at trip behaviour near the space-time convergence.
• The finite difference form allows for the examination of the convergence of the space-time mesh near this point.
• The result is an inequality for the convergence to be stable and the definition of a gravity inequality.
• Examination of the ping latency data from 1998-2004 for the Stanford Internet experiments suggests the inequality for convergence exists and there is a fundamental boundary from the speed of light in transmission.
• The space-time distributions for one monitoring site, hepnrc.hep.net (East USA), suggest that the gravity inequality is robust for this site.