The Power-Law of Top500

Matei Ripeanu (matei@cs.uchicago.edu)

Most natural and technical phenomena are characterized by highly unbalanced distributions: there are few powerful, devastating earthquakes and countless unnoticeable ones; there are few machines with a peak rate above 1 TFLOPS, while millions of machines work at around 1 MFLOPS. Many of these phenomena (city sizes, incomes, word frequencies [1, 2]) fit power-law distributions: the number of events of a certain size is proportional to the size of the event raised to a negative constant power.

Figure 1: Peak processing rate (GFLOPS) for the world’s fastest supercomputers in the Top500 list from 1995 to 2001. Each series of points represents one year on this log-log plot. [Plot omitted; horizontal axis: Rank (log scale); vertical axis: GFLOPS (log scale).]

There are many dimensions of variation for entities participating in the Internet: from the obvious ones, like CPU speed, available disk space, and network bandwidth, to more elaborate ones, such as inter-failure time, node trustworthiness, or reliability. We conjecture that these follow similar, highly heterogeneous distributions. Preliminary results support our intuition: the sizes of the Internet’s autonomous systems [3], the bandwidth of nodes in the Gnutella network [4, 5], and the CPU power of machines in the Top500 list [6] all follow power-law distributions (or at least highly variable distributions that can be well approximated by power laws).

We use an example to depict the characteristics of this distribution: the peak FLOPS rate of the world’s most powerful supercomputers follows a power-law distribution for all years for which data is available (see Figure 1). If we assume that the same distribution extends to machines beyond the Top500, perhaps even more interesting is the heavy-tail property.
As a result, aggregating computers in the tail yields a powerful machine comparable with the top ones. Analyzing data for the last seven years reveals a fascinating trend: the heavy-tail property becomes more accentuated (Figure 2 shows that the power constant of these distributions gets closer to zero). If this trend persists, interest will continue to shift from building large machines to large-scale integrations of less powerful systems.

Figure 2: Evolution of the power-law distribution coefficient k over time. As k gets closer to 0, the distribution is more heavy-tailed. Note that k decreases in magnitude by 2% per year on average.

References

[1] M. Schroeder, Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W.H. Freeman and Company, 1991.
[2] N. Shiode and M. Batty, "Power Law Distributions in Real and Virtual Worlds," presented at INET 2000, Yokohama, Japan, 2000.
[3] H. Tangmunarunkit, S. Doyle, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, "Does AS Size Determine Degree in AS Topology?," ACM Computer Communication Review, 2001.
[4] S. Saroiu, P. K. Gummadi, and S. D. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems," presented at Multimedia Computing and Networking Conference (MMCN), San Jose, CA, USA, 2002.
[5] M. Ripeanu, I. Foster, and A. Iamnitchi, "Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design," Internet Computing Journal, vol. 6, 2002.
[6] "TOP500 Supercomputer Sites - http://www.top500.org/," 2002.
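
The power-law fit and the tail aggregation discussed in the text can be illustrated with a minimal sketch. It assumes that k denotes the slope of the log-log rank plot (rate = c * rank^k) and estimates it by ordinary least squares in log space; this estimator is our choice for illustration and is not necessarily the method used to produce the figures. The data is synthetic, generated from hypothetical values of c and k, and is not Top500 data. The function names are likewise hypothetical.

```python
import math


def fit_power_law(rates):
    """Fit rate = c * rank**k by least squares on log(rate) vs. log(rank).

    `rates` is any list of positive peak rates; it is ranked here from
    largest (rank 1) to smallest. Returns the pair (k, c).
    """
    ranked = sorted(rates, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(rate) for rate in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                          # slope of the log-log line
    c = math.exp(mean_y - k * mean_x)      # intercept, back in linear space
    return k, c


def aggregate_rate(c, k, first_rank, last_rank):
    """Sum the modeled rates c * rank**k over an inclusive range of ranks."""
    return sum(c * rank ** k for rank in range(first_rank, last_rank + 1))


# Synthetic example only: TRUE_K and TRUE_C are hypothetical values,
# not measurements taken from the Top500 list.
TRUE_K = -0.75
TRUE_C = 1000.0
synthetic_rates = [TRUE_C * rank ** TRUE_K for rank in range(1, 501)]

k_hat, c_hat = fit_power_law(synthetic_rates)

# Assumption from the text: the same distribution extends beyond rank 500.
top_rate = aggregate_rate(c_hat, k_hat, 1, 1)          # the single top machine
tail_rate = aggregate_rate(c_hat, k_hat, 501, 5000)    # machines past the list

print("fitted k:", k_hat)
print("top machine rate:", top_rate)
print("aggregate rate of ranks 501..5000:", tail_rate)
```

Under these assumptions, the aggregate of the extrapolated tail exceeds the modeled rate of the top-ranked machine, which is the heavy-tail effect the text describes. On real measurements, a least-squares fit in log space is only a rough estimator, and the extrapolation beyond rank 500 remains an assumption.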