Information Transfer and Phase Transitions in a Model of Internet Traffic

Ricard V. Solé
Sergi Valverde

SFI WORKING PAPER: 2000-03-020
SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the
views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or
proceedings volumes, but not papers that have already appeared in print. Except for papers by our external
faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or
funded by an SFI grant.
©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure
timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights
therein are maintained by the author(s). It is understood that all persons copying this information will
adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only
with the explicit permission of the copyright holder.
Information Transfer and Phase Transitions in a Model of Internet Traffic

Ricard V. Solé (1,2) and Sergi Valverde (1)

(1) Complex Systems Research Group, Department of Physics, FEN, Universitat Politècnica de Catalunya, Campus Nord B4, 08034 Barcelona, Spain
(2) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
In a recent study, Ohira and Sawatari presented a simple model of computer network traffic dynamics. These authors showed that a phase transition point is present, separating the low-traffic phase with no congestion from the congested phase, as the packet creation rate increases. We further investigate this model by relaxing the network topology, using a random location of routers. It is shown that the model exhibits nontrivial scaling properties close to the critical point, which reproduce some of the observed features of real Internet traffic. At criticality the network shows maximum information transfer and efficiency. It is shown that some of the key properties of this model are shared by highway traffic models, as previously conjectured by some authors. The relevance to Internet dynamics and to the performance of parallel arrays of processors is discussed.
PACS number(s): 87.10.+e, 05.50.+q, 64.60.Cn
I. INTRODUCTION
The exchange of information in complex networks, and how these networks evolve in time, has been receiving increasing attention from physicists over the last years [1,2]. In particular, it has been shown that the growth dynamics of the World Wide Web (WWW) follows some characteristic traits displayed by generic models of growth in random graphs [3]. The presence of scaling in the distribution of connections between nodes of the WWW [3], or in the number of pages per web site [4], is consistent with other analyses involving the dynamical patterns displayed, such as the download relaxation dynamics [5], which also decays as a power law.

The WWW is a virtual graph connecting nodes containing different amounts of information. This information flows through a physical support which also displays scale-free behavior. The network of computers is a complex system by itself, and complex dynamics has been detected, suggesting that self-similar patterns are also at work [6].

Some previous studies have shown evidence for critical-like dynamics in computer networks [7] in terms of fractal, 1/f noise spectra as well as long-tail distributions of some characteristic quantities. Some authors have even speculated about the possibility that the traffic of information through computer networks (such as the Internet) can display the critical features already reported in cellular automata models of traffic flow, such as the Nagel-Schreckenberg (NS) model [8]. The NS model shows that as one increases the density of cars ρ, a well-defined transition occurs at a critical density ρ_c. This transition separates a fluid phase showing no jams from the jammed phase where traffic jams emerge. At the critical boundary, the first jams are observed as back-propagating waves with fractal properties.
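For readers who want to see the jamming transition referred to here, a minimal sketch of the NS update rule follows. This is our own illustration, not code from [8]; the values of vmax, the braking probability, the ring length and the flow measurement are arbitrary choices.

```python
import random

def ns_step(pos, vel, L, vmax=5, p_brake=0.3):
    """One parallel update of the Nagel-Schreckenberg ring:
    accelerate, slow down to the headway, brake at random, then move."""
    n = len(pos)
    new_vel = []
    for k in range(n):
        gap = (pos[(k + 1) % n] - pos[k] - 1) % L   # empty cells to the car ahead
        v = min(vel[k] + 1, vmax)                   # acceleration
        v = min(v, gap)                             # no collisions
        if v > 0 and random.random() < p_brake:     # random braking
            v -= 1
        new_vel.append(v)
    new_pos = [(pos[k] + new_vel[k]) % L for k in range(n)]
    return new_pos, new_vel                         # ring order of cars is preserved

# Density sweep: the mean flow (cars moved per cell and step) peaks near the
# critical density, where the first jams appear.
L_ring, steps = 1000, 400
for density in (0.05, 0.10, 0.20, 0.40):
    ncars = int(density * L_ring)
    pos = sorted(random.sample(range(L_ring), ncars))
    vel = [0] * ncars
    moved = 0
    for _ in range(steps):
        pos, vel = ns_step(pos, vel, L_ring)
        moved += sum(vel)
    print(density, moved / (steps * L_ring))
```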
FIG. 1. Model network architecture (two-dimensional lattice, periodic boundary conditions). Two types of nodes are considered: hosts (gray squares), which can generate and receive messages, and routers (open circles), which can store and forward messages.
A number of both quantitative and qualitative observations of real computer network dynamics reveal some features of interest:

1. Extensive data mining from Internet/Ethernet traffic shows that it displays long-range correlations [6] with well-defined persistence, as measured by means of the Hurst exponent. This analysis totally rejected the previous theoretical approach of Poisson-based (Markovian) models, which assume statistical independence of the arrival process of information.

2. Fluctuations in the density of packets show well-defined self-similar behavior over long time scales. This has been measured by several authors [7,9]. The power spectrum is typically a power law, although local (spatial) differences have been shown to be involved.

3. The statistical properties of Internet congestion reveal long-tailed (lognormal) distributions of latencies [refs]. Here latency times T_L are thus given by

P(T_L) = (1 / (T_L σ √(2π))) exp[ −(ln T_L − ⟨ln T_L⟩)² / (2σ²) ]

Latencies are measured by performing series of experiments in which the round-trip times of ping packets are averaged over many sent messages between two given nodes.

4. There is a clear feedback between the bottom level, where users send their messages through the net and increase network activity (and congestion), and the top level, described by the overall network activity. Users are responsible for the global behavior (since packets are generated by users) and the latter modifies the individual decisions (users will tend to leave the net if it becomes too congested).

On the other hand, previous studies on highway traffic dynamics revealed that the phase transition point presented by the models as the density of cars increased was linked with a high degree of unpredictability [10]. Interestingly, this is maximum at criticality [11], as is the flow rate. In other words, efficiency and unpredictability are connected by the phase transition. In this paper the previous conjecture linking Internet dynamics with critical points in highway traffic is further explored. By considering a generalization of the Ohira-Sawatari model, we show that all the previously reported features of real traffic dynamics are recovered by the model. The paper is organized as follows. In section II, the basic model and its phase transition are presented, together with a continuous mean field approximation. In section III the self-similar character of the time dynamics is shown by means of the calculation of the latency times and queue distributions, as well as by means of spectral and Hurst analysis. In section IV the efficiency and information transfer are calculated for different network sizes. In section V our main conclusions and a discussion of their implications are presented.
II. MODEL OF COMPUTER NETWORK TRAFFIC

Following the work by Ohira and Sawatari (OS), let us consider a two-dimensional network with a square lattice topology with four nearest neighbors [12]. The network involves two types of nodes: hosts and routers. The first are nodes that can generate and receive messages, and the second can only store and forward messages. Our square, L × L lattice will be indicated as L(L), following previous notation [13]. All our simulations are performed using periodic boundary conditions. In previous papers, either the hosts were distributed along the boundary [12] (and thus the inner nodes were routers) or all nodes were both hosts and routers [13]. Here we consider a more realistic situation, where only a fraction ρ of the nodes are hosts and the rest are routers [14].

The location of each object, r ∈ L(L), will be given by r = i c_x + j c_y, where c_x, c_y are Cartesian unit vectors. So the set of nearest neighbors C(r) is given by

C(r) = { r − c_x, r + c_x, r − c_y, r + c_y }    (1)

Each node maintains a queue of unlimited length where the arriving packets are stored. The local number of packets will be indicated as n(r,t), and thus the total number of packets in the system will be

N(t) = Σ_{r ∈ L(L)} n(r,t)    (2)

The metric used in our system will be the Manhattan metric defined for lattices with periodic boundaries [ref]:

d_pm(r_1, r_2) = L − | |i_1 − i_2| − L/2 | − | |j_1 − j_2| − L/2 |

where r_k = (i_k, j_k), i.e. the number of lattice steps along the shortest path on the torus.

The rules are defined as in the OS deterministic model (the stochastic version only shows the differences already reported by those authors [12]). The rules are defined as follows [14]:

Creation: The hosts create packets following a random uniform distribution with probability λ. Only another host can be the destination of a packet, which is also selected at random. Finally, this new packet is appended at the end of the host's queue.

Routing: Each node picks up the packet at the head of its queue and decides which outgoing link is better suited to the packet destination. Here, the objective is to minimize the communication time for any single message, taking into account only shortest paths and avoiding congested links as well. First, the selected link is the one that points to a neighbour node that is nearer to the packet destination. Second, when two choices are possible, the less congested link is selected. The measure of congestion of a link is simply defined as the amount of packets forwarded through that link. Once the node has made the routing decision, the packet is inserted at the end of the queue of the selected node and the counter of the outgoing link is incremented by one.

These rules are applied to each site, and L × L such updatings define our time step.
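To make the update rules concrete, the following schematic Python implementation of one time step follows our reading of the rules above. The class and function names (Node, d_pm, step) are our own, and details such as checking for delivery when the destination host serves its queue, or the illustrative values of L, ρ and λ, are simplifying assumptions rather than the authors' code.

```python
import random
from collections import deque

L = 32        # lattice side (illustrative; figure 2 uses L = 32)
RHO = 0.08    # density of hosts, as in the simulations reported here
LAM = 0.10    # packet creation probability per host and time step (arbitrary example)

class Node:
    def __init__(self, pos, is_host):
        self.pos = pos              # (i, j) coordinates on the torus
        self.is_host = is_host
        self.queue = deque()        # unbounded FIFO queue of packet destinations
        self.link_load = {}         # packets forwarded so far through each outgoing link

def d_pm(r1, r2):
    """Manhattan distance with periodic boundaries (the metric defined above)."""
    di, dj = abs(r1[0] - r2[0]), abs(r1[1] - r2[1])
    return min(di, L - di) + min(dj, L - dj)

def neighbors(r):
    i, j = r
    return [((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L)]

lattice = {(i, j): Node((i, j), random.random() < RHO) for i in range(L) for j in range(L)}
hosts = [node for node in lattice.values() if node.is_host]

def step():
    """One time step: L*L node updates (creation at hosts, routing at every node)."""
    delivered = 0
    order = list(lattice.values())
    random.shuffle(order)           # random update order, to avoid spurious priorities [14]
    for node in order:
        # Creation rule: with probability LAM a host emits a packet to a random other host.
        if node.is_host and random.random() < LAM and len(hosts) > 1:
            dest = random.choice([h for h in hosts if h is not node]).pos
            node.queue.append(dest)
        # Routing rule: forward the packet at the head of the queue along one link.
        if not node.queue:
            continue
        dest = node.queue.popleft()
        if dest == node.pos:
            delivered += 1          # the packet has reached its destination host
            continue
        candidates = neighbors(node.pos)
        best = min(d_pm(c, dest) for c in candidates)
        candidates = [c for c in candidates if d_pm(c, dest) == best]   # shortest paths only
        nxt = min(candidates, key=lambda c: node.link_load.get(c, 0))   # least congested link
        node.link_load[nxt] = node.link_load.get(nxt, 0) + 1
        lattice[nxt].queue.append(dest)
    return delivered
```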
This model exhibits a phase transition similar to the one reported in previous studies [12,13]. It is shown in figure 2 for a L = 32 system with ρ = 0.08 (the same density is used in all our simulations). We can see that the transition occurs at a given λ_c ≈ 0.2. As occurs with models of highway traffic, the flow of packets is maximized at criticality, as shown in figure 2B, where the number of delivered packets (indicated as N_dp) is plotted.

FIG. 2. (A) Phase transition in network traffic. Here a L = 32 lattice has been used and the average latency has been computed over different, increasing intervals of T time steps, as indicated. The density of hosts is ρ = 0.08. (B) As a measure of efficiency, the number of delivered packets N_dp has been measured under the same conditions. We can see the optimum at the critical point λ_c ≈ 0.2. For λ < λ_c we have a linear increase N_dp = λΔ with Δ = ρL²T, corresponding to the number of released packets.

A simple mean field model can be obtained for the total number of packets N(t). The number of traveling packets increases as a consequence of the constant pumping from the hosts, which occurs at a rate λρL². On the other hand, packets are removed from the system if the lattice is not too congested (i.e. if free space for movement is available) but accumulate as a consequence of already jammed nodes. This can be mathematically written as:

dN/dt = λρL² − μN(1 − N/L²)    (3)

where μ will be the inverse of the average latency τ. We obtain the fixed points

N* = (L²/2) [ 1 − (1 − 4λρ/μ)^{1/2} ]    (4)

The previous result gives us the critical line separating the two phases in our system:

λ_c = μ/(4ρ)    (5)

For λ < λ_c, finite values of N are obtained, corresponding to the non-jammed phase. Once the threshold is reached, packets accumulate and cannot be successfully delivered to their destinations. For this phase, non-bounded values of N are obtained (consistently with the simulation model). The last result is also consistent with previous studies [13] for ρ = 1, where the critical (free packet) delay was estimated as τ_c = L/2. Using this value, we obtain a critical parameter λ_c = 2/L, as reported by Fuks and Lawniczak [13].

FIG. 3. Phase diagram obtained from the mean field model for L = 32. The continuous line (separating the free phase from the congestion phase) shows the transition curve for the mean field. The black circles correspond to the simulated system with the same size. We can see that both approach the same λ values for high densities of hosts.
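As a quick numerical illustration of Eqs. (3)-(5), one can integrate the mean-field equation and compare the outcome with the fixed point and the critical line. This is a sketch only; the values of L and ρ and the assumption μ = 2/L below are chosen for the example.

```python
import numpy as np

L, rho = 32, 0.08
mu = 2.0 / L                     # inverse of the free-packet latency; tau = L/2 assumed, as in the text

def n_dot(N, lam):
    """Right-hand side of Eq. (3): dN/dt = lam*rho*L^2 - mu*N*(1 - N/L^2)."""
    return lam * rho * L**2 - mu * N * (1.0 - N / L**2)

def fixed_point(lam):
    """Stable branch of Eq. (4); real only below the critical line of Eq. (5)."""
    disc = 1.0 - 4.0 * lam * rho / mu
    return None if disc < 0 else 0.5 * L**2 * (1.0 - np.sqrt(disc))

lam_c = mu / (4.0 * rho)         # Eq. (5)

for lam in (0.5 * lam_c, 0.9 * lam_c, 1.2 * lam_c):
    N, dt, steps = 0.0, 0.05, 0
    while steps < 100_000 and N < 2 * L**2:      # stop once the congested phase runs away
        N += dt * n_dot(N, lam)
        steps += 1
    print(f"lam/lam_c = {lam/lam_c:.2f}   N* = {fixed_point(lam)}   N(final) = {N:.1f}")
```

Below the critical line N(t) settles onto the finite fixed point N*, while above it the integration is stopped because N grows without bound, reproducing the two phases of figure 3.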
III. SCALING AND SELF-SIMILARITY
An example of the time series at criticality for the previous system is shown in figure 4a-b. It confirms our expectations and previous observations from real computer traffic: the local fluctuations in the number of packets n(r,t) are self-affine, as we can appreciate from an enlargement of the first plot. This is confirmed by the calculation of the power spectrum P(f), shown in figure 5, which scales as P(f) ~ f^{−β} with β = 0.97 ± 0.06. There is some local variability in the value of the scaling exponent through space, but it is typically inside the interval 0.75 < β < 1.0, in agreement with data analysis [7,9].

FIG. 4. An example of the time series dynamics of the number of packets n(r,t) at a given arbitrary node. Here L = 256, ρ = 0.08 and λ = λ_c = 0.055. We can see in (A) fluctuations of many sizes, which display self-affinity, as we can see from (B), where the fraction of the previous time series indicated by means of a window has been enlarged.

An additional, complementary measure is obtained by means of "random walk" methods able to detect long-range correlations [15]. This has been used in Ethernet data analysis [6]. Starting from the time series of the number of packets at a given host, n(r,t), the running sum

y(τ) = Σ_{t=1}^{τ} n(r,t)    (6)

is computed, and the root mean square (rms) fluctuation F(τ) is obtained from

F(τ) = ⟨(δy(τ))²⟩^{1/2}    (7)

where δy(τ) = y(t_0 + τ) − y(t_0). The angular brackets indicate an average over all times t_0. If n(r,t) is random or a simple Markov process, then we have F(t) ~ t^{1/2}. But if no characteristic time scale is involved, then a scaling F(t) ~ t^H is observed with H ≠ 1/2. In our system, persistent fluctuations are found, with an exponent H ∈ (0.8, 1) for most nodes in the lattice.

FIG. 5. Top: Power spectrum P(f) computed from the time series shown in figure 4a. A well-defined scaling is at work over four decades, with β ≈ 1. Bottom: the rms fluctuation F(t), as defined in (7). A power-law fit F(t) ~ t^H has been calculated for 10² < t < 10⁴, which gives H = 1.01 ± 0.02.
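A possible way to estimate H from a recorded occupation series n(r,t), following Eqs. (6)-(7), is sketched below. The synthetic uncorrelated series used as input is only a sanity check, and removing the mean of n before summing is our (standard) reading of Eq. (6); the helper names are ours.

```python
import numpy as np

def rms_fluctuation(n_series, taus):
    """F(tau) of Eq. (7) for the running sum y(tau) of Eq. (6).
    The mean of n is removed before summing (standard in fluctuation analysis,
    and consistent with the drift-free scaling F ~ t^(1/2) quoted for random series)."""
    y = np.cumsum(np.asarray(n_series, dtype=float) - np.mean(n_series))
    out = []
    for tau in taus:
        dy = y[tau:] - y[:-tau]               # delta y(tau) over all starting times t0
        out.append(np.sqrt(np.mean(dy ** 2)))
    return np.array(out)

def hurst_exponent(n_series, taus):
    """Slope of log F(tau) versus log tau, i.e. the exponent H in F(t) ~ t^H."""
    F = rms_fluctuation(n_series, taus)
    H, _ = np.polyfit(np.log(taus), np.log(F), 1)
    return H

# Sanity check on an uncorrelated series: the estimate should be close to 1/2,
# whereas the congested-lattice series n(r,t) give H in (0.8, 1).
rng = np.random.default_rng(0)
series = rng.poisson(3.0, size=100_000)
taus = np.unique(np.logspace(1, 4, 20).astype(int))
print(hurst_exponent(series, taus))
```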
The statistics of latencies and queue lengths leads to long tails close to criticality. Some examples of the results obtained are shown in figure 6. Here a L = 256 lattice has been used. Latencies are measured as the number of steps needed to travel from the emitting hosts to their destinations. The distribution of latencies close to λ_c is a log-normal, in agreement with the study by Huberman and Adamic for round-trip times of ping packets [10]. This means that there is a characteristic latency time but also very long tails: a high-fluctuation regime is present. As λ goes into the congestion phase, longer times are present but also long tails. This is due to the fact that at this phase the number of packets is always increasing with time.

The distribution of queue lengths is equivalent to the distribution of jam sizes in the highway traffic model. As with the Nagel-Schreckenberg model, the distribution approaches a power law for λ ≈ λ_c, but it also displays some bending at small values (and a characteristic cutoff at large values). This is probably the result of the presence of spatial structures, which propagate as waves of congestion and will be analysed elsewhere (Valverde and Solé, in preparation).

FIG. 6. (A) Log-normal distribution of latency times at criticality for a L = 256 system. Here λ_c ≈ 0.055. Inset: three examples of these distributions in log-log scale for three different λ values (as indicated). (B) Distributions of queue lengths for the same system at different λ rates. Scaling is observable at intermediate λ values close to criticality.
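As an illustration of the lognormal check, the following sketch fits the two parameters of P(T_L) to a list of measured latencies. The data here are synthetic and the helper names are ours.

```python
import numpy as np

def fit_lognormal(latencies):
    """Maximum-likelihood lognormal parameters (mean and std of ln T_L)."""
    logs = np.log(np.asarray(latencies, dtype=float))
    return logs.mean(), logs.std()

def lognormal_pdf(t, m, sigma):
    """P(T_L) = exp(-(ln T_L - m)^2 / (2 sigma^2)) / (T_L sigma sqrt(2 pi))."""
    t = np.asarray(t, dtype=float)
    return np.exp(-(np.log(t) - m) ** 2 / (2 * sigma ** 2)) / (t * sigma * np.sqrt(2 * np.pi))

# Synthetic check: latencies drawn from a known lognormal are recovered by the fit.
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=5.0, sigma=0.8, size=50_000)
print(fit_lognormal(sample))     # expected to be close to (5.0, 0.8)
```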
IV. EFFICIENCY, UNCERTAINTY AND INFORMATION TRANSFER

As mentioned in the introduction, models of highway traffic flow revealed that the flow of cars (and thus the system's efficiency) is maximal at the critical point, but that the unpredictability is also maximal.

Efficiency can be measured in several ways. One is close to our model properties: efficiency is directly linked to information transfer, and thus information-based measures can be used. Here we consider an information-based characterization of the different phases by means of the Markov partition Π. Specifically, the following binary choice is performed:

Π = { n(r) = 0 ⇒ S(r) = 0 ; n(r) > 0 ⇒ S(r) = 1 }

which essentially separates non-jammed from jammed nodes.
FIG. 7. Information transfer for three different lattice sizes (as indicated). Information transfer grows rapidly close to criticality but reaches a maximum at some point λ* close to λ_c.

Information transfer is maximized close to second-order phase transitions [16] and should be maximum at λ_c. In order to compute this quantity we will make use of the previous partition Π. Let S(r) and S(k) be the binary states associated with two given hosts in L. The Π-entropy for each host is given by:

H(r) = − Σ_{S(r)=0,1} P(S(r)) log P(S(r))    (8)

and the joint entropy for each pair of hosts,

H(r,k) = − Σ_{S(r),S(k)=0,1} P(r,k) log P(r,k)    (9)

where for simplicity we use P(r,k) ≡ P(S(r), S(k)) to indicate the joint probability.
From the previous quantities, we can compute the information transfer between two given hosts. It will be given by:

M(r,k) = H(r) + H(k) − H(r,k)    (10)

The average information transfer will be computed from M_q = ⟨M(r,k)⟩, where the brackets indicate an average over a sample of q hosts randomly chosen from the whole set (here q = 100).

At the sub-critical domain, in terms of information transfer under the Markov partition, all pairs of nodes will typically be in the non-congested (free) state and P(i,j) ≈ δ_00, so it is easy to see that in this phase we have vanishing entropies and the mutual information is small. The information is totally defined by the entropy of the single nodes, as far as the correlations are trivial. A similar situation holds at the congestion phase, where nodes are typically congested. At intermediate λ values, the fluctuations inherent to the system lead to a diversity of states that gives a maximum information transfer at some λ* close to λ_c. It should be noted, however, that this measure is not very good for small systems, where λ* > λ_c, but we can see that λ* → λ_c as L increases.
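The quantities in Eqs. (8)-(10) can be estimated directly from recorded occupation series. The sketch below uses base-2 logarithms (the text does not state the base) and helper names of our own choosing.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (base-2 logs assumed here)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_transfer(n_r, n_k):
    """M(r,k) = H(r) + H(k) - H(r,k), Eqs. (8)-(10), from two occupation series."""
    s_r = (np.asarray(n_r) > 0).astype(int)       # partition Pi: 0 = free, 1 = jammed
    s_k = (np.asarray(n_k) > 0).astype(int)
    p_r = np.bincount(s_r, minlength=2) / s_r.size
    p_k = np.bincount(s_k, minlength=2) / s_k.size
    p_rk = np.bincount(2 * s_r + s_k, minlength=4) / s_r.size   # joint P(S(r), S(k))
    return entropy(p_r) + entropy(p_k) - entropy(p_rk)

def average_information(series_by_host, q=100, seed=0):
    """<M(r,k)> over q randomly chosen host pairs, as used for the curves of figure 7."""
    rng = np.random.default_rng(seed)
    keys = list(series_by_host)
    samples = []
    for _ in range(q):
        a, b = rng.choice(len(keys), size=2, replace=False)
        samples.append(information_transfer(series_by_host[keys[a]],
                                             series_by_host[keys[b]]))
    return float(np.mean(samples))
```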
Unpredictability will be measured, following Nagel and Rasmussen [11], by means of the normalized variance of latencies:

σ(T_L) = [ (T_L − ⟨T_L⟩)² ]^{1/2} / ⟨T_L⟩    (11)

where ⟨T_L⟩ is the average over a given number of steps.
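Eq. (11) is simply the coefficient of variation of the measured latencies; a small helper (ours) makes the definition explicit.

```python
import numpy as np

def latency_variance(latencies):
    """sigma(T_L) of Eq. (11): rms deviation of the latencies normalized by their mean."""
    t = np.asarray(latencies, dtype=float)
    return float(np.sqrt(np.mean((t - t.mean()) ** 2)) / t.mean())
```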
FIG. 8. The variance plot for the L = 32 system. The critical point λ_c is perfectly indicated by this measure, with a sharp maximum. Three different averaging times T have been used.

The unpredictable nature of the critical point is sharply revealed by the plot of the variance σ(T_L) (figure 8). We can see that, as was shown by Nagel and co-workers for highway traffic, the system shows the highest unpredictability close to the critical point. At the subcritical regime λ < λ_c, the packets reach their destinations in a time close to the characteristic, average time of traveling. This situation sharply changes in the neighborhood of λ_c, where the fluctuations (experienced as local congestion) lead to a rapid increase in the variance. As λ grows beyond the transition, these fluctuations are damped and σ(T_L) decays slowly. This result also confirms the study by Nagel and co-workers, who analysed the behavior of the variance of travel times [11] for a closed-loop system. They found that there was a nontrivial implication of this result: increasing efficiency (i.e. traffic flow) tunes the system to criticality and, as a consequence, to unpredictable behavior.

V. DISCUSSION

In this paper we have analysed the statistical properties of a computer network traffic model. This model is a simple extension of the Ohira-Sawatari system, but with a random distribution of hosts scattered through L with a density ρ. One of the goals of our study was to see if the reported regularities from real networks of computers (such as the Internet) were similar to those observed in highway traffic and reproducible by our model. The second was to explore the possibility that the observed features correspond to those expected from a near-to-critical system. We have presented evidence that real Internet traffic takes place close to a phase transition point.

The model has been shown to match some basic properties of Internet dynamics: (i) it shows self-affine patterns of activity close to criticality, consistent with the fractal nature of computer traffic; (ii) the observed time series display 1/f behavior, and the corresponding Hurst exponents reveal the presence of persistence and long-range correlations in congestion dynamics, as reported from real data; (iii) the distribution of latency times close to the transition point is a lognormal, and the distribution of queue lengths approaches power laws with some bending for small lengths (as in highway traffic models).

The model confirms the previous conjecture [7] suggesting some deep links between the NS model and the dynamics displayed by computer networks. In this sense the previous measures and other quantitative characterizations support the idea that the two types of traffic share some generic features. The model also exhibits the same kind of variance plot shown by the NS and related models: it is almost zero at the subcritical (free) regime and it abruptly grows close to λ_c. This leads to the same conclusion pointed out by Nagel and co-workers: maximum efficiency leads to complex dynamics and unpredictable behavior.

Some authors have discussed the origins of Internet congestion in terms of the interactions among users [10]. Huberman and Lukose suggested that this is a particularly interesting illustration of a social dilemma. Our study suggests a somewhat complementary view: there is a feedback between the system's activity and the users' behavior. Users introduce new packets into the system,
thus enhancing the congestion of the net. As congestion increases, users tend to leave the net, thus reducing local activity. This type of feedback is similar to the dynamics characteristic of self-organized critical systems (such as sandpiles) [17]. The main difference arises from the driving: activity is being introduced into the system without a complete temporal separation between the two scales. In this sense, this is not a self-organized critical system, but it is close enough for that to be the appropriate theoretical framework. An immediate extension of this model should contain a self-tuning of λ: users might increase their levels of activity if congestion is low and decrease it (or leave the system) in a too congested situation. In this way the system might self-organize close to the phase transition.

Our previous results can also be applied to other, similar networks. This is the case of large, parallel arrays of processors. In this sense, some previous studies [18] have shown the validity of the Ohira-Sawatari model in describing the overall dynamics of small arrays of processors with simple topologies. They also found some additional phenomena, such as the presence of hot spots, which we have also observed in our model. This work can be extended to high-dimensional parallel systems such as the Connection Machine [19] in order to test the presence of phase transitions and their dependence on dimensionality (in particular, this will help to determine the upper critical dimension for this system).

Internet dynamics and the WWW growth provide an extremely interesting, real evolution experiment of a complex adaptive system [10]. In the near future, we are likely to see new types of behavior in the web. As Daniel Hillis predicts, as the information available on the Internet becomes richer, and the types of interactions among computers become more complex, we should expect to see new emergent phenomena going beyond any that have been explicitly programmed into the system [20]. Models based on phase transitions in far-from-equilibrium systems will be of great help in providing an appropriate theoretical framework.
Acknowledgments

We thank B. Luque for many useful discussions and his earlier participation in this work. This work has been supported by grant PB97-0693 and by the Santa Fe Institute (RVS).

1. The Ecology of Computation, B. Huberman (ed.), North-Holland, Amsterdam (1989).
2. J. O. Kephart, T. Hogg and B. A. Huberman, Phys. Rev. A 40, 404 (1989).
3. R. Albert, H. Jeong and A.-L. Barabási, Nature 401, 130 (1999); A.-L. Barabási and R. Albert, Science 286, 509 (1999).
4. B. A. Huberman and L. A. Adamic, Nature 401, 131 (1999).
5. A. Johansen and D. Sornette, preprint cond-mat/9907371.
6. W. E. Leland, M. S. Taqqu and W. Willinger, IEEE/ACM Trans. Networking 2, 1 (1994).
7. I. Csabai, J. Phys. A: Math. Gen. 27, L417 (1994).
8. K. Nagel and M. Schreckenberg, J. Phys. I France 2, 2221 (1992); ibid. J. Phys. A 26, L679 (1993). See also K. Nagel and M. Paczuski, Phys. Rev. E 51, 2909 (1995).
9. M. Takayasu, H. Takayasu and T. Sato, Physica A 233, 824 (1996).
10. B. A. Huberman and R. M. Lukose, Science 277, 535 (1997).
11. K. Nagel and S. Rasmussen, in: Artificial Life IV, R. A. Brooks and P. Maes (eds.), p. 222, MIT Press (Cambridge, MA, 1994).
12. T. Ohira and R. Sawatari, Phys. Rev. E 58, 193 (1998).
13. H. Fuks and A. T. Lawniczak, preprint adap-org/9909006.
14. When applying the rules it is important to pick the nodes at random and avoid external orderings. In this way, prioritization effects that can alter the results of the simulation are avoided.
15. H. E. Stanley et al., Physica A 205, 214-253 (1996).
16. R. V. Solé, S. C. Manrubia, B. Luque, J. Delgado and J. Bascompte, Complexity 1 (4) (1996). An example of the maximum information close to transition points is given by the fluctuations displayed by some ant species: see R. V. Solé and O. Miramontes, Physica D 80, 171 (1995).
17. P. Bak, C. Tang and K. Wiesenfeld, Phys. Rev. Lett. 59, 381 (1987).
18. K. Bolding, M. L. Fulgham and L. Snyder, Technical Report CSE-94-02-04.
19. W. D. Hillis, The Connection Machine, MIT Press (Cambridge, MA, 1985); B. M. Boghosian, Comp. Phys. Jan/Feb, 14 (1990). The Connection Machine is a highly parallel system with a high-dimensional topology. As shown by Feynman, it is in fact a twelve-dimensional hypercube.
20. W. D. Hillis, The Pattern on the Stone, Weidenfeld and Nicolson (London, 1998).