Phase Transitions in a Model of Internet Traffic

Ricard V. Solé and Sergi Valverde

SFI WORKING PAPER: 2000-03-020

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

Information Transfer and Phase Transitions in a Model of Internet Traffic

Ricard V. Solé(1,2) and Sergi Valverde(1)

(1) Complex Systems Research Group, Department of Physics, FEN, Universitat Politècnica de Catalunya, Campus Nord B4, 08034 Barcelona, Spain
(2) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA

In a recent study, Ohira and Sawatari presented a simple model of computer network traffic dynamics. These authors showed that a phase transition point is present, separating the low-traffic phase with no congestion from the congested phase as the packet creation rate increases. We further investigate this model by relaxing the network topology through a random location of routers. It is shown that the model exhibits nontrivial scaling properties close to the critical point, which reproduce some of the observed features of real Internet traffic. At criticality the network shows maximum information transfer and efficiency.
It is shown that some of the key properties of this model are shared by highway traffic models, as previously conjectured by some authors. The relevance to Internet dynamics and to the performance of parallel arrays of processors is discussed.

PACS number(s): 87.10.+e, 05.50.+q, 64.60.Cn

I. INTRODUCTION

The exchange of information in complex networks, and how these networks evolve in time, has received increasing attention from physicists in recent years [1,2]. In particular, it has been shown that the growth dynamics of the World Wide Web (WWW) follows some characteristic traits displayed by generic models of growth in random graphs [3]. The presence of scaling in the distribution of connections between nodes of the WWW [3], or in the number of pages per web site [4], is consistent with other analyses involving the dynamical patterns displayed, such as the download relaxation dynamics [5], which also decays as a power law.

The WWW is a virtual graph connecting nodes containing different amounts of information. This information flows through a physical support which also displays scale-free behavior. The network of computers is a complex system by itself, and complex dynamics has been detected, suggesting that self-similar patterns are also at work [6]. Previous studies have shown evidence for critical-like dynamics in computer networks [7] in terms of a fractal, 1/f noise spectrum as well as long-tailed distributions of some characteristic quantities. Some authors have even speculated that the traffic of information through computer networks (such as the Internet) can display the critical features already reported in cellular automata models of traffic flow, such as the Nagel-Schreckenberg (NS) model [8]. The NS model shows that as one increases the density of cars ρ, a well-defined transition occurs at a critical density ρc. This transition separates a fluid phase showing no jams from the jammed phase, where traffic jams emerge.
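As a point of reference for readers unfamiliar with the NS model, one parallel update of its four rules (acceleration, slowing down, random braking, motion) can be sketched as follows. This is an illustrative sketch, not code from the original studies; the ring length, car number and braking probability are arbitrary choices.

```python
import random

def ns_step(pos, vel, road_len, v_max=5, p_brake=0.3, rng=random):
    """One parallel update of the Nagel-Schreckenberg rules on a ring road."""
    n = len(pos)
    order = sorted(range(n), key=lambda k: pos[k])   # cars ordered along the road
    new_pos, new_vel = list(pos), list(vel)
    for idx, k in enumerate(order):
        ahead = order[(idx + 1) % n]                 # next car downstream
        gap = (pos[ahead] - pos[k] - 1) % road_len   # empty cells in between
        v = min(vel[k] + 1, v_max)                   # 1. acceleration
        v = min(v, gap)                              # 2. slow down, no collisions
        if v > 0 and rng.random() < p_brake:         # 3. random braking
            v -= 1
        new_vel[k] = v
        new_pos[k] = (pos[k] + v) % road_len         # 4. motion
    return new_pos, new_vel

# Toy run: 10 cars on a ring of 50 cells.
random.seed(1)
pos = random.sample(range(50), 10)
vel = [0] * 10
for _ in range(100):
    pos, vel = ns_step(pos, vel, 50)
```

Because each car moves at most `gap` cells, positions stay distinct; varying the car density across runs reproduces the fluid/jammed transition discussed in the text.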
At the critical boundary, the first jams are observed as back-propagating waves with fractal properties.

FIG. 1. Model network architecture (two-dimensional lattice, periodic boundary conditions). Two types of nodes are considered: hosts (gray squares), which can generate and receive messages, and routers (open circles), which can only store and forward messages.

A number of quantitative and qualitative observations of real computer network dynamics reveal some features of interest:

1. Extensive data mining from Internet/Ethernet traffic shows that it displays long-range correlations [6] with well-defined persistence, as measured by means of the Hurst exponent. This analysis clearly rejected the previous theoretical approach based on Poisson (Markovian) models, which assume statistical independence of the arrival process of information.

2. Fluctuations in the density of packets show well-defined self-similar behavior over long time scales. This has been measured by several authors [7,9]. The power spectrum is typically a power law, although local (spatial) differences have been shown to be involved.

3. The statistical properties of Internet congestion reveal long-tailed (lognormal) distributions of latencies [refs]. Here the latency times T_L are distributed as

   P(T_L) = (1 / (σ T_L √(2π))) exp( −(ln T_L − m)² / (2σ²) )

   Latencies are measured by performing series of experiments in which the round-trip times of ping packets are averaged over many messages sent between two given nodes.

4. There is a clear feedback between the bottom level, where users send their messages through the net and increase network activity (and congestion), and the top level, described by the overall network activity. Users are responsible for the global behavior (since packets are generated by users), and the latter modifies individual decisions (users will tend to leave the net if it becomes too congested).

On the other hand, previous studies on highway traffic dynamics revealed that the phase transition point displayed by the models as the density of cars increases is linked with a high degree of unpredictability [10]. Interestingly, unpredictability is maximal at criticality [11], as is the flow rate. In other words, efficiency and unpredictability are connected by the phase transition.

In this paper the previous conjecture linking Internet dynamics with critical points in highway traffic is further explored. By considering a generalization of the Ohira-Sawatari model, we show that all the previously reported features of real traffic dynamics are recovered by the model. The paper is organized as follows. In section II the basic model and its phase transition are presented, together with a continuous mean field approximation. In section III the self-similar character of the time dynamics is shown by means of the calculation of the latency times and queue distributions, as well as by means of spectral and Hurst analyses. In section IV the efficiency and information transfer are calculated for different network sizes. In section V our main conclusions and a discussion of their implications are presented.

II. MODEL OF COMPUTER NETWORK TRAFFIC

Following the work by Ohira and Sawatari (OS), let us consider a two-dimensional network with a square lattice topology with four nearest neighbors [12]. The network involves two types of nodes: hosts and routers. The first are nodes that can generate and receive messages; the second can only store and forward messages. Our square, L × L lattice will be indicated as L(L), following previous notation [13]. All our simulations are performed using periodic boundary conditions. In previous papers, either the hosts were distributed along the boundary [12] (and thus the inner nodes were routers) or all nodes were both hosts and routers [13]. Here we consider a more realistic situation, where only a fraction ρ of the nodes are hosts and the rest are routers [14]. The location of each object, r ∈ L(L), will be given by r = i c_x + j c_y, where c_x, c_y are Cartesian unit vectors. So the set of nearest neighbors C(r) is given by

   C(r) = { r − c_x, r + c_x, r − c_y, r + c_y }    (1)

Each node maintains a queue of unlimited length where the arriving packets are stored. The local number of packets will be indicated as n(r, t), and thus the total number of packets in the system will be

   N(t) = Σ_{r ∈ L(L)} n(r, t)    (2)

The metric used in our system will be the Manhattan metric defined for lattices with periodic boundaries [ref]:

   d_pm(r_1, r_2) = ( L/2 − | |i_1 − i_2| − L/2 | ) + ( L/2 − | |j_1 − j_2| − L/2 | )

where r_k = (i_k, j_k). The rules are defined as in the OS deterministic model (the stochastic version only shows the differences already reported by those authors [12]). The rules are defined as follows [14]:

Creation: The hosts create packets following a uniform random distribution with probability λ. Only another host can be the destination of a packet, and it is also selected at random. Finally, the new packet is appended at the end of the host queue.

Routing: Each node picks up the packet at the head of its queue and decides which outgoing link is best suited to the packet destination. Here the objective is to minimize the communication time for any single message, taking into account only shortest paths and also avoiding congested links. First, the selected link is one that points to a neighbor node nearer to the packet destination. Second, when two choices are possible, the less congested link is selected. The measure of congestion of a link is simply defined as the number of packets forwarded through that link. Once the node has made the routing decision, the packet is inserted at the end of the queue of the selected node and the counter of the outgoing link is incremented by one.

These rules are applied to each site, and each set of L × L updates defines our time step.

A simple mean field model can be obtained for the total number of packets N(t). The number of traveling packets increases as a consequence of the constant pumping from the hosts, which occurs at a rate λρL². On the other hand, packets are removed from the system if the lattice is not too congested (i.e. if free space for movement is available), but accumulate as a consequence of already jammed nodes. This can be mathematically written as:

   dN/dt = λρL² − μN ( 1 − N/L² )    (3)

where μ will be the inverse of the average latency. We obtain the fixed points

   N*_± = (L²/2) [ 1 ± ( 1 − 4λρ/μ )^{1/2} ]    (4)

The previous result gives us the critical line separating the two phases in our system:

   λ_c = μ / (4ρ)    (5)

For λ < λ_c, finite values of N are obtained, corresponding to the non-jammed phase. Once the threshold is reached, packets accumulate and cannot be successfully delivered to their destinations. In this phase, unbounded values of N are obtained (consistent with the simulation model). The last result is also consistent with previous studies [13] for ρ = 1, where the critical (free packet) delay was estimated as τ_c = L/2. Using this value, we obtain a critical parameter λ_c = 2/L, as reported by Fuks and Lawniczak [13].

FIG. 2. (A) Phase transition in network traffic. Here an L = 32 lattice has been used, and the average latency has been computed over different, increasing intervals of T time steps, as indicated. The density of hosts is ρ = 0.08. (B) As a measure of efficiency, the number of delivered packets N_dp has been measured under the same conditions. We can see the optimum at the critical point λ_c ≈ 0.2. For λ < λ_c we have a linear increase N_dp = ηλ, with η = ρL²T, corresponding to the number of released packets.

This model exhibits a phase transition similar to the one reported in previous studies [12,13]. It is shown in figure 2 for an L = 32 system with ρ = 0.08 (the same density is used in all our simulations). We can see that the transition occurs at a given λ_c ≈ 0.2. As occurs with models of highway traffic, the flow of packets is maximized at criticality, as shown in figure 2B, where the number of delivered packets (indicated as N_dp) is plotted.

FIG. 3. Phase diagram obtained from the mean field model for L = 32. The continuous line shows the transition curve for the mean field. The black circles correspond to the simulated system of the same size. We can see that both approach the same values for high densities of hosts.

III. SCALING AND SELF-SIMILARITY

An example of the time series at criticality for the previous system is shown in figure 4a-b. It confirms our expectations and previous observations from real computer traffic: the local fluctuations in the number of packets n(r, t) are self-affine, as we can appreciate from an enlargement of the first plot. This is confirmed by the calculation of the power spectrum P(f), shown in figure 5, which scales as P(f) ∝ f^(−β) with β = 0.97 ± 0.06. There is some local variability in the value of the scaling exponent through space, but it is typically inside the interval 0.75 < β < 1.0, in agreement with data analysis [7,9].

FIG. 4. An example of the time series dynamics of the number of packets n(r, t) at a given arbitrary node. Here L = 256, ρ = 0.08 and λ = λ_c ≈ 0.055. We can see in (A) fluctuations of many sizes, which display self-affinity, as we can see from (B), where the fraction of the previous time series indicated by means of a window has been enlarged.

An additional, complementary measure is obtained by means of "random walk" methods able to detect long-range correlations [15]. This approach has been used in Ethernet data analysis [6]. Starting from the time series of the number of packets at a given host, n(r, t), the running sum

   y(τ) = Σ_{t=1}^{τ} n(r, t)    (6)

is computed, and the root mean square (rms) fluctuation F(τ) is obtained from

   F(τ) = [ ⟨(δy(τ))²⟩ − ⟨δy(τ)⟩² ]^{1/2}    (7)

where δy(τ) = y(t_0 + τ) − y(t_0). The angular brackets indicate an average over all times t_0. If n(r, t) is random or a simple Markov process, then we have F(τ) ∼ τ^{1/2}. But if no characteristic time scale is involved, then a scaling F(τ) ∼ τ^α is observed with α ≠ 1/2. In our system, persistent fluctuations are found, with an exponent α ∈ (0.8, 1) for most nodes in the lattice.

FIG. 5. Top: Power spectrum P(f) computed from the time series shown in figure 4a. A well-defined scaling is at work over four decades, with β ≈ 1. Bottom: the rms fluctuation F(τ), as defined in (7). A power-law fit F(τ) ∼ τ^α has been calculated for 10² < τ < 10⁴, which gives α = 1.01 ± 0.02.

The statistics of latencies and queue lengths lead to long tails close to criticality. Some examples of the results obtained are shown in figure 6, where an L = 256 lattice has been used. Latencies are measured as the number of steps needed to travel from the emitting hosts to their destinations. The distribution of latencies close to λ_c is lognormal, in agreement with the study of Huberman and Adamic for round-trip times of ping packets [10]. This means that there is a characteristic latency time but also very long tails: a high-fluctuation regime is present. As λ goes into the congestion phase, longer times appear, but also long tails. This is due to the fact that in this phase the number of packets is always increasing with time.

IV. EFFICIENCY, UNCERTAINTY AND INFORMATION TRANSFER

As mentioned in the introduction, models of highway traffic flow revealed that the flow of cars (and thus the system's efficiency) is maximal at the critical point, but that the unpredictability is also maximal.
FIG. 6. (A) Lognormal distribution of latency times at criticality for an L = 256 system. Here λ_c ≈ 0.055. Inset: three examples of these distributions in log-log scale for three different λ values (as indicated). (B) Distributions of queue lengths for the same system at different λ rates. Scaling is observable at intermediate values close to criticality.

The distribution of queue lengths is equivalent to the distribution of jam sizes in the highway traffic model. As with the Nagel-Schreckenberg model, the distribution approaches a power law for λ → λ_c, but it also displays some bending at small values (and a characteristic cutoff at large values). This is probably the result of the presence of spatial structures, which propagate as waves of congestion and will be analysed elsewhere (Valverde and Sole, in preparation).

Efficiency can be measured in several ways. One is close to our model properties: efficiency is directly linked to information transfer, and thus information-based measures can be used. Here we consider an information-based characterization of the different phases by means of the Markov partition Π. Specifically, the following binary choice is performed:

   Π = { n(r) = 0 ⇒ S(r) = 0 ; n(r) > 0 ⇒ S(r) = 1 }

which essentially separates non-jammed from jammed nodes.

Information transfer is maximized close to second-order phase transitions [16] and should be maximal at λ_c. In order to compute this quantity we make use of the previous partition Π. Let S(r) and S(k) be the binary states associated with two given hosts in L(L). The Π-entropy for each host is given by:

   H(r) = − Σ_{S(r)=0,1} P(S(r)) log P(S(r))    (8)

and the joint entropy for each pair of hosts by

   H(r, k) = − Σ_{S(r),S(k)=0,1} P(r, k) log P(r, k)    (9)

where for simplicity we use P(r, k) ≡ P(S(r), S(k)) to indicate the joint probability. From the previous quantities we can compute the information transfer between two given hosts. It will be given by:

   M(r, k) = H(r) + H(k) − H(r, k)    (10)

The average information transfer will be computed as M_q = ⟨M(r, k)⟩, where the brackets indicate an average over a sample of q hosts randomly chosen from the whole set (here q = 100).

In the sub-critical domain, in terms of information transfer under the Markov partition, all pairs of nodes will typically be in the non-congested (free) state, so P(i, j) ≈ δ_{i0} δ_{j0}, and it is easy to see that in this phase we have vanishing entropies and the mutual information is small: the information is totally defined by the entropy of the single nodes, as far as the correlations are trivial. A similar situation holds in the congestion phase, where nodes are typically congested. At intermediate λ values, the fluctuations inherent to the system lead to a diversity of states that gives a maximum information transfer at some λ*_c.

FIG. 7. Information transfer for three different lattice sizes (as indicated). Information transfer grows rapidly close to criticality and reaches a maximum at some point close to λ_c.

It should be noted, however, that this measure is not very good for small systems, where λ*_c > λ_c, but we can see that λ*_c → λ_c as L increases.

Unpredictability will be measured, following Nagel and Rasmussen [11], by means of the normalized variance of latencies:

   σ(T_L) = [ ⟨(T_L − ⟨T_L⟩)²⟩ ]^{1/2} / ⟨T_L⟩    (11)

where ⟨T_L⟩ is the average over a given number of steps.

FIG. 8. The variance plot for the L = 32 system. The critical point λ_c is perfectly indicated by this measure, with a sharp maximum. Three different times have been used in the averages.

The unpredictable nature of the critical point is sharply revealed by the plot of the variance σ(T_L) (figure 8). We can see that, as shown by Nagel and co-workers for highway traffic, the system shows the highest unpredictability close to the critical point. In the subcritical regime λ < λ_c, the packets reach their destinations in a time close to the characteristic average travel time. This situation sharply changes in the neighborhood of λ_c, where the fluctuations (experienced as local congestion) lead to a rapid increase in the variance. As λ grows beyond the transition, these fluctuations are damped and σ(T_L) decays slowly. This result also confirms the study by Nagel and co-workers, who analysed the behavior of the variance of travel times [11] for a closed-loop system. They found a nontrivial implication of this result: increasing efficiency (i.e. traffic flow) tunes the system to criticality and, as a consequence, to unpredictable behavior.

V. DISCUSSION

In this paper we have analysed the statistical properties of a computer network traffic model. This model is a simple extension of the Ohira-Sawatari system, but with a random distribution of hosts scattered through L(L) with a density ρ. One of the goals of our study was to see whether the reported regularities from real networks of computers (such as the Internet) were similar to those observed in highway traffic and reproducible by our model. The second was to explore the possibility that the observed features correspond to those expected from a near-to-critical system.

We have presented evidence that real Internet traffic takes place close to a phase transition point. The model has been shown to match some basic properties of Internet dynamics: (i) it shows self-affine patterns of activity close to criticality, consistent with the fractal nature of computer traffic; (ii) the observed time series display 1/f behavior, and the corresponding Hurst exponents reveal the presence of persistence and long-range correlations in congestion dynamics, as reported from real data; (iii) the distribution of latency times close to the transition point is lognormal, and the distribution of queue lengths approaches power laws with some bending for small lengths (as in highway traffic models). The model confirms the previous conjecture [7] suggesting some deep links between the NS model and the dynamics displayed by computer networks. In this sense the previous measures and other quantitative characterizations support the idea that the two types of traffic share some generic features.

The model also exhibits the same kind of variance plot shown by the NS and related models: it is almost zero in the subcritical (free) regime and grows abruptly close to λ_c. This leads to the same conclusion reached by Nagel and co-workers: maximum efficiency leads to complex dynamics and unpredictable behavior.

Some authors have discussed the origins of Internet congestion in terms of the interactions among users [10]. Huberman and Lukose suggested that this is a particularly interesting illustration of a social dilemma. Our study suggests a somewhat complementary view: there is a feedback between the system's activity and the users' behavior. Users introduce new packets into the system, thus enhancing the congestion of the net. As congestion increases, users tend to leave the net, thus reducing local activity. This type of feedback is similar to the dynamics characteristic of self-organized critical systems (such as sandpiles) [17]. The main difference arises from the driving: activity is being introduced into the system without a complete temporal separation between the two time scales. In this sense, this is not a self-organized critical system, but it is close enough for that to be an appropriate theoretical framework.
An immediate extension of this model should contain a self-tuning of λ: users might increase their levels of activity if congestion is low, and decrease them (or leave the system) in a too-congested situation. In this way the system might self-organize close to the phase transition.

Our previous results can also be applied to other, similar networks. This is the case of large parallel arrays of processors. In this sense, some previous studies [18] have shown the validity of the Ohira-Sawatari model in describing the overall dynamics of small arrays of processors with simple topologies. They also found some additional phenomena, such as the presence of hot spots, which we have also observed in our model. This work can be extended to high-dimensional parallel systems such as the Connection Machine [19] in order to test the presence of phase transitions and their dependence on dimensionality (in particular, such studies will help to determine the upper critical dimension for this system).

Internet dynamics and WWW growth provide an extremely interesting, real evolution experiment of a complex adaptive system [10]. In the near future we are likely to see new types of behavior on the web. As Daniel Hillis predicts, as the information available on the Internet becomes richer, and the types of interactions among computers become more complex, we should expect to see new emergent phenomena going beyond any that have been explicitly programmed into the system [20]. Models based on phase transitions in far-from-equilibrium systems will be of great help in providing an appropriate theoretical framework.

Acknowledgments

We thank B. Luque for many useful discussions and his earlier participation in this work. This work has been supported by grant PB97-0693 and by the Santa Fe Institute (RVS).

1. The Ecology of Computation, B. Huberman (ed.), North-Holland, Amsterdam (1989)
2. J. O. Kephart, T. Hogg and B. A. Huberman, Phys. Rev. A 40, 404 (1989)
3. R. Albert, H. Jeong and A.-L. Barabasi, Nature 401, 130 (1999); A.-L. Barabasi and R. Albert, Science 286, 509 (1999)
4. B. A. Huberman and L. A. Adamic, Nature 401, 131 (1999)
5. A. Johansen and D. Sornette, preprint cond-mat/9907371
6. W. E. Leland, M. S. Taqqu and W. Willinger, IEEE Trans. Networking 2, 1 (1994)
7. I. Csabai, J. Phys. A: Math. Gen. 27, L417 (1994)
8. K. Nagel and M. Schreckenberg, J. Phys. I France 2, 2221 (1992); ibid. J. Phys. A 26, L679 (1993). See also K. Nagel and M. Paczuski, Phys. Rev. E 51, 2909 (1995)
9. M. Takayasu, H. Takayasu and T. Sato, Physica A 233, 824 (1996)
10. B. A. Huberman and R. M. Lukose, Science 277, 535 (1997)
11. K. Nagel and S. Rasmussen, in: Artificial Life IV, R. A. Brooks and P. Maes (eds.), p. 222, MIT Press (Cambridge, MA, 1994)
12. T. Ohira and R. Sawatari, Phys. Rev. E 58, 193 (1998)
13. H. Fuks and A. T. Lawniczak, preprint adap-org/9909006
14. When applying the rules it is important to pick the nodes at random and avoid external orderings. In this way, prioritization effects that could alter the results of the simulation are avoided.
15. H. E. Stanley et al., Physica A 205, 214-253 (1996)
16. R. V. Sole, S. C. Manrubia, B. Luque, J. Delgado and J. Bascompte, Complexity 1 (4) (1996). An example of maximum information close to transition points is given by the fluctuations displayed by some ant species: see R. V. Sole and O. Miramontes, Physica D 80, 171 (1995)
17. P. Bak, C. Tang and K. Wiesenfeld, Phys. Rev. Lett. 59, 381 (1987)
18. K. Bolding, M. L. Fulgham and L. Snyder, Technical Report CSE-94-02-04
19. W. D. Hillis, The Connection Machine, MIT Press (Cambridge, MA, 1985); B. M. Boghosian, Comp. Phys. Jan/Feb, 14 (1990). The Connection Machine is a highly parallel system with a high-dimensional topology. As shown by Feynman, it is in fact a twelve-dimensional hypercube.
20. W. D. Hillis, The Pattern on the Stone, Weidenfeld and Nicolson (London, 1998)