Modelling Global Internet Dynamics
Dr. Robert Baker, Troy Mackay, Brett Carson, Dr. Rajanathan Rajaratnam (University of New England)
Professor Les Cottrell (Stanford University)

[Figure: The Internet (Cheswick, 1999)]
[Figure: A Planned Shopping Centre]

Space-Time Convergence
• This convergence, connecting origin-destination pairs, is defined by the rate of time discounting (and distance minimisation), and its rate is a function of the technology of transfer.
• The space-time convergence means that, at least theoretically, the mathematical operators can be projected beyond this interaction to larger distance scales and smaller time scales.
• It suggests that the trip operators are the same for the Internet as for a shopping centre.
• As we approach the singularity (for Internet trips), special features emerge, such as 'virtual distance', 'virtual trips' and 'time reversal'.

The Stanford Internet Experiments
• The Stanford experiments were undertaken by Professor Les Cottrell at the Stanford Linear Accelerator Center (SLAC), USA.
• The experiments ran from 1998 to 2004 with varying numbers of monitoring sites and remote hosts. The year 2000 had the greatest connectivity between monitoring sites and remote hosts and so presents the best opportunity to test the model.
• In 2000 there were 27 global monitoring sites pinging 170 remote hosts every hour. The experiment measures the time taken for these origin-destination pairs and also the number of packets shed through congestion on the route.

Definitions: Latency
Latency is a synonym for delay and measures how much time it takes for a packet of data to get from one designated point to another in a network. Its components are:
• Propagation: constrained by the speed of light
• Transmission: the medium and the size of the packet can introduce delay
• Router and other processing: each hop takes time to examine the packet
• Internal connectivity: delay within networks from intermediate devices
Latency and latitude/longitude co-ordinates will be the time-space variables.

Definitions: Packet Loss
When too many packets arrive on an origin-destination trip, routers hold them in buffers until the traffic decreases. When a buffer fills up during times of congestion, the router drops packets. This behaviour is part of the Internet Protocol (IP). Packet loss is what is measured here as a proxy for peak demand. (A minimal measurement sketch is given at the end of this slide group.)

[Figure: Chaos 2000 (1 and 2 January 2000): hourly packet-loss series]

General Space-Time Trip Differential Equation
J[ρ] = A ∂²ρ/∂x² + B ∂²ρ/∂x∂t + C ∂²ρ/∂t² + D ∂ρ/∂x + E ∂ρ/∂t + Fρ

A Classification of Relevant Trip Equations using the Space-Time Operator Matrix
• ∂ρ/∂t with ∂ρ/∂x: Continuity Eq; with ∂²ρ/∂x²: Diffusion Eq
• ∂²ρ/∂t² with ∂ρ/∂x: Supermarket Eq (Gravity Model); with ∂²ρ/∂x²: Wave Eq
• ∂²ρ/∂t² + ∂ρ/∂t with ∂ρ/∂x: Internet Eq; with ∂²ρ/∂x²: Telegrapher's Eq

Trips to Supermarkets or Planned Shopping Centres (Time-Discounting Behaviour)
• Population variable ρ(x): ∂ρ/∂x ∝ -ρ gives ρ(x) = c exp(-βx), the Gravity Model.
• Population variable ρ(t): ∂²ρ/∂t² ∝ -ρ gives ρ(t) = c sin kt or c cos kt, the Internet Demand Wave.

Does this type of model apply to Internet traffic?
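To make the hourly measurement concrete, the following is a minimal sketch of a PingER-style probe from a monitoring site to its remote hosts. It is not the SLAC tooling: the host names, the probe count and the output parsing (which assumes a Unix-like `ping` summary) are illustrative assumptions only.

```python
# A minimal PingER-style probe sketch; not the SLAC tooling.
# Assumptions: host names and probe count are illustrative, and the output
# parsing expects a Unix-like `ping` summary.
import re
import subprocess
import time

REMOTE_HOSTS = ["www.example.org", "www.example.net"]  # hypothetical remote hosts
PINGS_PER_SAMPLE = 10                                  # probes per host per hourly sample


def ping_host(host: str, count: int = PINGS_PER_SAMPLE):
    """Return (average RTT in ms or None, packet-loss fraction) for one host."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True).stdout
    loss = re.search(r"([\d.]+)% packet loss", out)   # e.g. '10% packet loss'
    rtt = re.search(r"= [\d.]+/([\d.]+)/", out)       # min/AVG/max summary line
    loss_frac = float(loss.group(1)) / 100 if loss else 1.0
    avg_rtt = float(rtt.group(1)) if rtt else None
    return avg_rtt, loss_frac


if __name__ == "__main__":
    # One hourly sample over the origin-destination pairs; the experiment
    # repeats this every hour from each monitoring site.
    stamp = time.strftime("%Y-%m-%d %H:%M")
    for host in REMOTE_HOSTS:
        rtt, loss = ping_host(host)
        print(f"{stamp} {host} latency={rtt} ms loss={loss:.0%}")
```

Repeating this loop hourly and archiving the results gives the kind of latency and packet-loss series analysed in the rest of the talk.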
Time-Based Random Walk
Each monitoring site serves a number of remote hosts at a particular locality, and there are ρ_i remote hosts linked to each monitoring site i. Assume that each of these remote hosts can hop to adjacent sites with a frequency Γ that does not depend on the characteristics of i. These hops can access sites forward in time or backwards in time, and it is assumed that movements forwards and backwards are equally likely.

Assumptions
1. The jump frequency of transactions between sites is constant and is assumed independent of the site index i and its location in space.
2. This frequency of movement does not depend on the distribution of remote hosts or users in the neighbourhood of the i-th site.
3. The time distance between sites and the type of transfer network do not influence the process; the only thing that matters is the time-based ordering of the points.
4. The time distance between sites is very much smaller than the smallest significant wavelength, and the amplitude A of the demand wave (A sin kt) must be insignificant outside k_max.

Map of Monitoring Sites
1. 143.108.25.100
2. cmuhep2.phys.cmu.edu
3. fermi.physics.umd.edu
4. gull.cs.wisc.edu
5. hepnrc.hep.net
6. jasper.cacr.caltech.edu
7. jlab7.jlab.org
8. missinglink.transpac.org
9. netdb3.es.net
10. netmon.physics.carleton.ca
11. otf1.er.doe.gov
12. vicky.stanford.edu, patabwa.stanford.edu
13. pitcairn.mcs.anl.gov
14. wwwmics.er.doe.gov
15. dxcnaf.cnaf.infn.it
16. gate.itep.ru
17. icfamon.dl.ac.uk
18. netmon.desy.de
19. sgiserv.rmki.kfki.hu
20. suncs02.cern.ch, sunstats.cern.ch
21. noc.ilan.net.il
22. rainbow.inp.nsk.su
23. ccjsun.riken.go.jp
24. cloud.kaist.kr.apan.net
25. yumj2.kek.jp

[Figure: Map of Host Sites]

The Internet Demand Wave: Data Reduction Method (Time)
1. We take the raw hourly packet-loss data for a given period and average it for each hour.
2. This is plotted for the average week (per hour) for the monitoring host / remote host pairs.
3. The data begins Monday 00:00 local time of the remote host and is truncated to the first five days (Monday to Friday, 00:00-24:00 local time). An inverse temporal translation is made back to GMT. We can then view the graph of the Internet demand wave (for example, vicky.stanford.edu).
4. Weekends tend to have significantly less congestion, so extracting weekdays gives a cleaner Fourier spectrum.
5. We apply a Discrete Fourier Transform to the data for each host pair (for example, vicky.stanford.edu-pinglafex.cbpf.br).

Phase vs Longitude: Linear Least-Squares Regression
Let λ and φ be longitude and phase respectively, and let (λ_i, φ_i) be the longitude and phase of the i-th data point. Let Φ_G and Φ_L be the linear least-squares regressions of the n data points (λ_i, φ_i), i = 1, ..., n.
Since the fit must be continuous over the 24-hour boundary, we must have dΦ/dλ ∈ {0, 1}.
The case dΦ/dλ = 0 corresponds to local-congestion-dominated data, where the packet-loss distribution is not strongly dependent on remote-host longitude and is most probably due to local effects only.
The case dΦ/dλ = 1 corresponds to remote-congestion-dominated data, where the packet-loss distribution is correlated with remote-host longitude.
We wish to find Φ_G and Φ_L so as to minimise the normalised sums of squared residuals R_G and R_L, where each residual is taken as the smaller of the direct and wrapped (±2π) angular differences:

R_G = (1/n) Σ_{i=1..n} min{ (Φ_G(λ_i) - φ_i)², (Φ_G(λ_i) - φ_i - 2π)², (Φ_G(λ_i) - φ_i + 2π)² }
R_L = (1/n) Σ_{i=1..n} min{ (Φ_L(λ_i) - φ_i)², (Φ_L(λ_i) - φ_i - 2π)², (Φ_L(λ_i) - φ_i + 2π)² }

Critical points can occur at boundaries of Φ_G and Φ_L, at discontinuities, or at local extrema. Thus we are able to minimise R_G and R_L and find suitable Φ_G and Φ_L by comparing the values at the possible critical points. (A numerical sketch of this fit follows below.)
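The data-reduction and regression steps above can be sketched numerically. The sketch below is illustrative rather than the authors' code: it extracts the phase of the 24-hour component of an averaged weekday packet-loss series with a discrete Fourier transform, then evaluates the two constrained fits, a 'global' line with the slope fixed at the Earth's rotation and a 'local' line with zero slope, using wrapped residuals. The host longitudes and placeholder loss series are assumptions, and the intercept scan stands in for comparing the analytic critical points.

```python
# Illustrative sketch (not the authors' code) of the phase extraction and the
# two constrained phase-longitude fits. Assumed inputs: a 120-value weekday
# hourly packet-loss profile per host pair, and the remote hosts' longitudes.
import numpy as np

HOURS = 24


def diurnal_phase(hourly_loss: np.ndarray) -> float:
    """Phase (radians) of the one-cycle-per-day component of a weekday
    hourly packet-loss series (length a multiple of 24, e.g. 120)."""
    n = hourly_loss.size
    spectrum = np.fft.rfft(hourly_loss - hourly_loss.mean())
    return float(np.angle(spectrum[n // HOURS]))   # bin with a 24-hour period


def wrap(residual: np.ndarray) -> np.ndarray:
    """Fold angular residuals into (-pi, pi], the smaller wrapped difference."""
    return (residual + np.pi) % (2 * np.pi) - np.pi


def normalised_residual(lon_rad: np.ndarray, phase_rad: np.ndarray,
                        slope: float) -> float:
    """R = (1/n) * sum of squared wrapped residuals for the best intercept
    at a fixed slope (1 = Earth-rotation 'global' fit, 0 = 'local' fit)."""
    candidates = np.linspace(-np.pi, np.pi, 721)   # scan candidate intercepts
    return min(float(np.mean(wrap(slope * lon_rad + c - phase_rad) ** 2))
               for c in candidates)


if __name__ == "__main__":
    lon = np.radians([-122.2, -77.0, 6.1, 139.8, -46.0])   # illustrative longitudes
    weekday_losses = [np.random.rand(120) for _ in lon]    # placeholder series
    phases = np.array([diurnal_phase(s) for s in weekday_losses])
    R_G = normalised_residual(lon, phases, slope=1.0)
    R_L = normalised_residual(lon, phases, slope=0.0)
    print(f"R_G = {R_G:.3f}, R_L = {R_L:.3f}")
```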
Scaling to the Earth's Rotation: Global Periodicity on [0, 1]
We also wish to scale R_G and R_L so as to produce a useful statistic for comparison. To this end we multiply by a scale factor so that R_G and R_L take values on the interval [0, 1]. The maximum normalised sum of squared angular residuals, R_MAX, occurs when the data points are uniformly distributed in the direction perpendicular to the regression line, so that the residuals r are distributed uniformly on the interval (-π, π]. Thus

R_MAX = (1/2π) ∫_{-π}^{π} r² dr = π²/3.

Therefore, define the statistic which measures global (and local) periodicity:

χ² = R / R_MAX = 3R / π².

Table of Global and Local Periodicity for Internet Traffic, 2000
SRC_HOST SRC_LAT SRC_LONG N N_0.05 N_0.05/N PHASE CHI^2/CHI_0^2
143.108.25.100 -22.0 -46.0 65 62 0.954 151.7 0.339
ccjsun.riken.go.jp 35.7 139.8 95 60 0.632 151.5 0.418
cloud.kaist.kr.apan.net 37.6 127.0 92 74 0.804 134.3 0.665
cmuhep2.phys.cmu.edu 40.4 -80.0 44 38 0.864 142.7 0.139
dxcnaf.cnaf.infn.it 44.5 11.3 63 47 0.746 -171.2 0.410
fermi.physics.umd.edu 39.0 -76.9 40 24 0.600 143.3 0.066
gate.itep.ru 55.0 37.0 69 63 0.913 -171.5 0.499
gull.cs.wisc.edu 43.1 -89.4 56 36 0.643 149.8 0.293
hepnrc.hep.net 41.0 -88.0 170 87 0.512 128.0 0.178
icfamon.dl.ac.uk 53.0 -2.0 60 59 0.983 173.2 0.324
jasper.cacr.caltech.edu 34.1 -118.1 64 41 0.641 128.1 0.167
jlab7.jlab.org 37.0 -76.0 209 130 0.622 136.6 0.186
missinglink.transpac.org 41.0 -87.0 45 27 0.600 156.1 0.061
netdb3.es.net 38.0 -122.0 70 47 0.671 134.5 0.186
netmon.desy.de 53.0 9.0 56 54 0.964 -177.0 0.416
netmon.physics.carleton.ca 45.4 -75.7 45 43 0.956 138.0 0.220
noc.ilan.net.il 31.8 35.2 45 34 0.756 170.1 0.231
otf1.er.doe.gov 38.9 -77.0 51 42 0.824 177.3 0.447
patabwa.stanford.edu 37.4 -122.2 81 38 0.469 146.3 0.064
pitcairn.mcs.anl.gov 41.9 -88.0 214 175 0.818 131.5 0.165
rainbow.inp.nsk.su 55.1 83.1 54 41 0.759 -136.6 0.621
sgiserv.rmki.kfki.hu 47.4 19.3 44 39 0.886 -160.6 0.555
suncs02.cern.ch 46.2 6.1 23 2 0.087 147.5 0.023
sunstats.cern.ch 46.2 6.1 142 83 0.585 144.2 0.080
vicky.stanford.edu 37.4 -122.2 83 40 0.482 143.1 0.060
wwwmics.er.doe.gov 38.9 -77.0 68 53 0.779 161.0 0.214
yumj2.kek.jp 36.1 140.3 47 47 1.000 -112.5 0.819
Average phase: 160.0

[Figure: vicky.stanford.edu (2000): phase_5 vs dst_long regression plot]

Case Studies (2000)
• vicky.stanford.edu (West USA)
• hepnrc.hep.net (East USA)
• sunstats.cern.ch (Switzerland)
• icfamon.dl.ac.uk (UK)
• yumj2.kek.jp (Japan)

[Figure: hepnrc.hep.net (East USA): Internet Demand Wave (2000) and Global/Local Periodicity Regression Plot (2000) for 5% Periodicity, phase_5 vs dst_long]
[Figure: sunstats.cern.ch (Switzerland): Internet Demand Wave (2000) and Global/Local Periodicity Regression Plot (2000) for 5% Periodicity, phase_5 vs dst_long]
[Figure: icfamon.dl.ac.uk (UK): Internet Demand Wave (2000) and Global/Local Periodicity Regression Plot (2000) for 5% Periodicity, phase_5 vs dst_long]
[Figure: yumj2.kek.jp (Japan): Internet Demand Wave (2000) and Global/Local Periodicity Regression Plot (2000) for 5% Periodicity, phase_5 vs dst_long]
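A small numerical sketch of the scaling used for the table above: R is divided by R_MAX = π²/3 so the periodicity statistic lies on [0, 1]. This is illustrative, not the authors' code; the example residual values are hypothetical.

```python
# Illustrative sketch of the scaling to [0, 1]: divide R by R_MAX = pi^2 / 3.
# The example residual values are hypothetical.
import numpy as np

R_MAX = np.pi ** 2 / 3   # (1 / 2*pi) * integral of r^2 over (-pi, pi]


def periodicity_statistic(R: float) -> float:
    """chi^2 = R / R_MAX = 3R / pi^2, lying on [0, 1]."""
    return 3.0 * R / np.pi ** 2


if __name__ == "__main__":
    # Numerical check of R_MAX for residuals uniform on (-pi, pi].
    r = np.linspace(-np.pi, np.pi, 100_001)
    print("R_MAX:", R_MAX, "numeric:", np.trapz(r ** 2, r) / (2 * np.pi))
    for R in (0.2, 1.0, 2.9):            # hypothetical residual sums
        print(f"R = {R}: statistic = {periodicity_statistic(R):.3f}")
```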
(2) Time Gaussian Behaviour
What is the relationship between distance and ping-time latencies? Is Internet traffic normally distributed?

Spatial and Time Partitioning
The partitioning is the same as Padmanabhan and Subramanian (2001), Microsoft.
Ping times: 5-15 ms; 16-25 ms; 26-35 ms; ...
Distance units: concentric aggregation at 75 km; 150 km; 225 km; ...

[Figure: (a) The cumulative probability of a gravity-type distribution for the distance between client and proxy for America Online (Source: Padmanabhan and Subramanian, 2001). (b) The cumulative probability of a gravity-type distribution for a regional shopping mall (Bankstown Square, 1998 afternoon distribution; Baker, 2000). (c) The results of a probe machine at Seattle, USA, measuring transaction delay in four categories (5-15 ms; 25-35 ms; 45-55 ms; 65-75 ms) relative to geographic distance (Source: Padmanabhan and Subramanian, 2001; Baker, 2001).]

A Time Gaussian is a Solution of the Time-Discounting Differential Equation
ρ(t, x) = (ρ₀ / (2(πMx)^(1/2))) exp(-t² / (4Mx))
Key relationship: t² = 2MΔx.

[Figures: Testing the relationship t² = 2MΔx, 1998-2003, and the cumulative frequency of latency bands and distance mid-points (2000), for vicky.stanford.edu (West USA), hepnrc.hep.net (East USA), sunstats.cern.ch (Switzerland) and icfamon.dl.ac.uk (UK)]

(3) Distance Decay
The distance-decay metric is a corollary of a time Gaussian. For example, for hepnrc.hep.net (East USA):
(a) log-linear gravity model;
(b) 3-D contour model showing a Gaussian distribution;
(c) 2-D density plot showing a Gaussian distribution.
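The relationship t² = 2MΔx tested in the figures above can be sketched as a one-parameter fit: regress the squared latency-band mid-points on twice the distance mid-points and read off M. This is an illustration only; the pairing of bands with distances below is hypothetical, not the experimental values.

```python
# Hedged sketch: estimate M in t^2 = 2*M*dx by a least-squares line through
# the origin. The latency-band mid-points follow the partition above; the
# distances paired with each band are hypothetical, for illustration only.
import numpy as np

t_mid = np.array([10.0, 20.5, 30.5, 40.5])       # ms: 5-15, 16-25, 26-35, 36-45 bands
x_mid = np.array([75.0, 300.0, 700.0, 1200.0])   # km: illustrative pairings

# Minimise sum (t^2 - 2*M*x)^2 over M: a regression through the origin.
M_hat = np.sum(t_mid ** 2 * (2 * x_mid)) / np.sum((2 * x_mid) ** 2)
residuals = t_mid ** 2 - 2 * M_hat * x_mid
print(f"M ~ {M_hat:.3f} ms^2/km; residuals (ms^2): {np.round(residuals, 1)}")
```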
Space-Time Convergence (recap)
• This convergence, connecting origin-destination pairs, is defined by the rate of time discounting (and distance minimisation), and its rate is a function of the technology of transfer.
• The space-time convergence means that, at least theoretically, the mathematical operators can be projected beyond this interaction to larger distance scales and smaller time scales.
• It suggests that the trip operators are the same for the Internet as for a shopping centre.
• As we approach the singularity (for Internet trips), special features emerge, such as 'virtual distance', 'virtual trips' and 'time reversal'.

Finite Difference Form
A continuous distribution can also be 'sampled', where we work backwards and derive the 'finite difference' form, which can be solved numerically. To this end, introduce a constant space-time rectangular grid for the independent variables (t, x) by choosing, for integers n and i, the points
x_n = nΔx, t_i = iΔt.
This grid system is shown below; Δx and Δt are chosen arbitrarily. It could represent the sampling mesh constructed to provide data for space-time distributions in the space-time convergence.

Time and Space Estimation
The time derivative is estimated by taking a Taylor expansion around the point t_i:
ρ_{i±1} = ρ_i ± Δt (∂ρ/∂t)_i + (Δt²/2)(∂²ρ/∂t²)_i + O(Δt³).
Taking the difference yields the central-difference estimate
(∂ρ/∂t)_i = (ρ_{i+1} - ρ_{i-1}) / 2Δt + O(Δt²).
The central second difference in time, substituted into the supermarket equation, gives
∂ρ_i(x)/∂x = (M/Δt²) [ρ_{i+1}(x) + ρ_{i-1}(x) - 2ρ_i(x)] + O(Δt²).
Similarly, the space derivative around a point x_n is estimated from data forward over space, following the revolution of the Earth (the forward Euler scheme):
(∂ρ_i/∂x)^n = (ρ_i^{n+1} - ρ_i^n)/Δx + O(Δx).
Re-arranging the terms yields the finite-difference equation equivalent to the supermarket differential equation:
ρ_i^{n+1} = ρ_i^n + λ(ρ_{i+1}^n + ρ_{i-1}^n - 2ρ_i^n),
where λ is the modulus representing the ratio of the space to time mesh (Ghez, 1988), defined by
λ = MΔx / Δt².
The trip to the destination (the n+1 site) requires convergence without oscillations, and the finite-difference trip back to the origin must be stable. The finite-difference equation cannot have oscillatory solutions, and this requires all the coefficients to have the same sign. The modulus of the space-time grid for the data collection is positive, like M, so the coefficient (1 - 2λ) of ρ_i^n must also be non-negative. Therefore the modulus must obey 0 ≤ λ ≤ ½, and the trip from the destination back to the residence is restricted by
2MΔx ≤ Δt²,
with an equivalent bound on the gravity coefficient k. This is the gravity inequality for spatial interaction modelling for one time zone, and it applies to distance-minimisation strategies. A Gaussian inequality is derived similarly for time-minimisation strategies.

Is there evidence for this inequality (2MΔx ≤ Δt²), and for the gravity inequality in k, in the Internet experiments? For example, hepnrc.hep.net (East USA):
The distance decay for the 5-15 ms ping times (the ping times of least congestion) is a negative exponential function with an R-squared value of 0.73 and a β value of 0.015. The frequency for this distribution is calculated at 0.208, which corresponds to a more localised spatial interaction (less than 350 km). For the 16-25 ms latencies, the log-linear regression still showed a significant line of best fit, with an R-squared value of 0.53 and a β value now of 0.004, meaning that the destinations were dispersed over a wider area (less than 1000 km). The corollary is a lower interaction frequency (k = 0.11). (A numerical sketch of the stability condition follows below.)
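The explicit scheme and its stability condition can be sketched numerically. The mesh sizes, modulus and initial profile below are illustrative assumptions, not experimental values; the assertion enforces the gravity inequality 2MΔx ≤ Δt² (λ ≤ ½), and the printed variance grows by 2MΔx per step, echoing the key relationship t² = 2MΔx.

```python
# Illustrative sketch of the explicit scheme above; the mesh sizes, modulus
# and initial profile are assumptions, not experimental values.
import numpy as np


def step_forward(rho: np.ndarray, lam: float) -> np.ndarray:
    """One forward step in x: rho_i^{n+1} = rho_i^n
    + lam * (rho_{i+1}^n + rho_{i-1}^n - 2 * rho_i^n); endpoints held fixed."""
    new = rho.copy()
    new[1:-1] = rho[1:-1] + lam * (rho[2:] + rho[:-2] - 2 * rho[1:-1])
    return new


if __name__ == "__main__":
    M, dx, dt = 1.0, 0.01, 0.25               # illustrative mesh and modulus inputs
    lam = M * dx / dt ** 2                    # modulus of the space-time grid
    assert 2 * M * dx <= dt ** 2, "gravity inequality violated: scheme oscillates"

    t = np.arange(-12.0, 12.0 + dt, dt)       # time grid (arbitrary units)
    rho = np.exp(-t ** 2)                     # sharp demand profile at the origin x = 0
    steps = 20
    for _ in range(steps):                    # march away from the origin in x
        rho = step_forward(rho, lam)
    # The spread in t grows by 2*M*dx per step, echoing t^2 = 2*M*dx.
    variance = float(np.sum(rho * t ** 2) / np.sum(rho))
    print(f"variance at x = {steps * dx}: {variance:.3f} "
          f"(initial 0.5 + 2*M*x = {0.5 + 2 * M * steps * dx:.3f})")
```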
Conclusion
• The space-time convergence suggests that Internet transactions should be part of spatial interaction modelling.
• Using the packet-loss demand proxy from the Stanford Internet experiments, there is an Internet demand wave, and it has features similar to those found in shopping-trip modelling.
• The Internet equation is defined by: ∂ρ/∂x = ∂²ρ/∂t² + ∂ρ/∂t. This equation has two components:
  - a local time-Gaussian component with distance decay: distance does matter!
  - a global drift component from the 24-hour rotation of the Earth.
• There is a statistic that can classify sites as globally or locally periodic by standardising the Earth's rotation to unity as the slope of the regression line.

Conclusion (cont.)
• The Internet allows us to look at trip behaviour near the space-time convergence.
• The finite-difference form allows the examination of the convergence of the space-time mesh near this point.
• The result is an inequality for the convergence to be stable and the definition of a gravity inequality.
• Examination of the ping latency data from 1998-2004 for the Stanford Internet experiments suggests that the inequality for convergence exists and that there is a fundamental boundary from the speed of light in transmission.
• The space-time distributions for one monitoring site, hepnrc.hep.net (East USA), suggest that the gravity inequality is robust for this site.