Analysis of Large-Scale Peer-to-Peer Network Topology Chao Xie Department of Computer Science Georgia State University Atlanta, Georgia 30302-3994 Email: {cxie}@cs.gsu.edu Abstract— Modeling peer-to-peer (P2P) networks is a challenge for P2P researchers. It is the key to provide insight into the nature of the underlying system and building a successful model able to generate realistic topologies for simulation purpose. In this paper, we provide a detailed analysis of large-scale P2P network topology, using Gnutella as a case study. First, we re-examine the power-law distributions of the Gnutella network discovered by previous researchers. Our results show that the current Gnutella network deviates from the early power-laws, suggesting that the Gnutella network topology may have evolved a lot over time. Second, we identify important trends with regard to the evolution of the Gnutella network between September 2005 and February 2006. Third, we provide a novel two-layered approach to study the topology of the Gnutella network. Due to the limitations of the power-laws, we divide the Gnutella network into two layers, namely the mesh and the forest, to model the hybrid and highly dynamic architecture of the current Gnutella network. We give a detailed analysis of the topology of the mesh and present four power-laws concerning the mesh topology. Moreover, we examine the topology properties of the forest and provide one empirical law concerning the tree size. Using the two-layered approach and laws proposed, we can generate realistic topologies easily. I. I NTRODUCTION Modeling the topologies of peer-to-peer (P2P) networks is an important open problem. An accurate topological model can have significant influence on P2P research. First, we can gain detailed insight into the nature of the underlying system. Second, the model can enable analytical analysis of algorithms and facilitate design of more efficient protocols that take advantage of topology properties. Third, we can generate more accurate artificial topologies for simulation purpose. Further more, we can predict future trends and thereby address potential problems in advance. In this paper, we provide a detailed analysis of large-scale P2P network topology, giving results concerning major topology properties and main distributions. In our study, we choose Gnutella as a case study, as it has a large user community and open architecture. Our work can be summarized by the following points. First, we re-examine the power-law distributions of the Gnutella network discovered by previous researchers. Our results show that the current Gnutella network deviates from the early power-laws. This observation suggests that the Gnutella network topology may have evolved a lot over time. Second, we identify important trends with regard to the evolution of the Gnutella network between September 2005 and February 2006. As our primary contribution, we provide a novel two-layered approach to study the topology of the Gnutella network. Due to the limitations of the power-laws, we divide the Gnutella network into two layers, namely the mesh and the forest, to model the hybrid and highly dynamic architecture of the current Gnutella network. We give a detailed analysis of the topological properties of the mesh and present four powerlaws concerning the mesh topology. Moreover, we examine the topology properties of the forest and provide one empirical law concerning the tree size. Finally, we focus on the generation of realistic topologies using our approach and laws proposed. The rest of this paper is organized as follows. Section II presents background and previous work. In Section III, we present our instances of the Gnutella network. In Section IV, we re-examine the power-law distributions discovered by previous researchers, identify the trends concerning the evolution of Gnutella network. In Section V, we analyze the limitations of the power-laws and introduce our new twolayered approach to study the topology of Gnutella network. In Section VI, we analyze the topological properties of the mesh and present four power-laws concerning the mesh topology. In Section VII, we examine the topology properties of the forest and provide one empirical law concerning the tree size. In Section VIII, we discuss the practical uses of our approach and laws. Finally, Section IX concludes our work. II. BACKGROUND AND P REVIOUS W ORK A. Gnutella Protocol and the Crawler Gnutella protocol 0.4 [1] employs a pure decentralized model. In this model, individual nodes, also called servents are equal in terms of functionality. They not only perform server-side roles such as matching incoming queries against their local resources and respond with applicable results, but also offer client-side functions such as issuing queries and collecting search results. All servents are connected to each other randomly. Figure 1 illustrates the topology of the Gnutella 0.4 network. Gnutella protocol 0.6 [2] employs a hybrid architecture combining centralized and decentralized model. Servents are categorized into leaf and ultrapeer. A leaf keeps only a small TABLE I BASIC S TATISTICS OF THE G NUTELLA N ETWORK number of connections to ultrapeers. An ultrapeer maintains connections with other ultrapeers and acts as a proxy to the Gnutella network for the leaves connected to it. An ultrapeer only forwards a query to a leaf if it believes the leaf can answer it, and leaves never relay queries between ultrapeers. Figure 2 illustrates the topology of the Gnutella 0.6 network. Protocol 0.6 is compatible with protocol 0.4, which implies that the current Gnutella network can contain some fraction of nodes of former protocol specification 0.4. We developed a crawler to collect topology information of the Gnutella network, based on message communication mechanism of both protocol 0.4 [1] and protocol 0.6 [2]. The crawler is based on the Limewire [3] open source client and performs a breadth first searching on the network in parallel. It can discover more than 100,000 nodes in half an hour. B. Power-law Power-laws have been found in numerous diverse fields spanning sociological, geological, natural and biological systems. Power-laws of the form y ∝ xα enables a compact characterization of topologies through their exponents. Faloutsos et al. [4] discovered four power-laws characterizing the topology of the Internet, while Magoni et al. [5] found another four power-laws of the Internet. In [6] [7] and [8], several power-laws were found with regard to the topology of the Gnutella network. P2P studies usually assume that these power-laws characterize the topology of P2P networks and use synthetically generated topologies following these power-laws, although Ripeanu et al. [9] argued that the connection distribution of the more recent Gnutella network deviates from a pure power-law. In Section IV, we will re-examine these power-laws. III. O UR G NUTELLA N ETWORK I NSTANCES We have studied the topology of the Gnutella network from September 2005 until February 2006. We can build the graph of nodes by analyzing the collected data on the Gnutella network. We model two adjacent nodes that have at least one connection between each other by an edge. We treat the Gnutella network as a undirected graph. In this paper, we provide two snapshots of the Gnutella network that correspond to five-month intervals approximately, namely the 091505 instance and the 021106 instance. In Table I, we present some basic statistics about our instances and previous work [6] [7]. In Table I, l represents the average shortest distance and k represents the average degree. Ours 091505 021106 09/ 05 02/ 06 107,205 118,925 118,187 130,612 6.4 7.9 22 24 2.20 2.20 [7] V34206 09/ 03 34,206 43,958 5.4 16 2.57 [6] V57926 10/ 03 57,926 80,276 5.8 15 2.72 11/ 00 992 2,465 3.7 9 4.97 12/ 00 1,125 4,080 3.3 8 7.25 3 4 10 10 091505.rank 021106.rank exp(3.09197) *x** (-0.64268) exp(2.93379) *x** (-0.60681) 3 10 2 10 degree Fig. 2. Topology of the Gnutella 0.6 Network. degree Fig. 1. Topology of the Gnutella 0.4 Network. Stat. Data Time Nodes Edges l Diam. k 2 10 1 10 1 10 0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 rank (a) 091505 ACC=0.92178 6 10 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 rank (b) 021106 ACC=0.88120 Fig. 3. Log-log plot of the degree dv versus the rank rv in the sequence of decreasing degree. IV. C URRENT G NUTELLA N ETWORK T OPOLOGY In this section, we examine the power-laws of the Gnutella network described in previous literatures against our two instances. Note that in this paper we only present the examination of two early power-laws that give strong evidence for the representativeness of a topology [10]. The goal of our work is to find out whether the topology of the current Gnutella network accords with the early power-laws. We use linear regression to fit a line in a set of twodimensional points using the least-square errors method. The validity of the approximation is quantified by the correlation coefficient ranging from -1.0 and 1.0. The absolute value of the correlation coefficient is ACC. An ACC value of 1.0 indicates perfect linear correlation. In general, the ACC level should be great than 0.90 to validate linear correlation. A. Rank Distribution In this section, we study the degrees of the nodes in the Gnutella network. Power-Law of Rank Exponent R: The degree dv of a node v is proportional to the rank of the node rv to the power of a constant R : dv ∝ rvR . The rank rv of a node v is defined as its index in the order of decreasing degree. Jovanovic [6] found that the early Gnutella network followed the above power-law with rank exponent of -0.98 and ACC of 0.94. For our two instances, the rank exponent is -0.64268 and -0.60681 and ACC is 0.92178 and 0.88120 in chronological order as we see in Figure 3. The low ACC values imply that this power-law is relatively weak in the 091505 graph and even invalidated in the 021106 graph. Compared with a pure power-law distribution, the two graphs deviate from the linear regression with similar patterns. On the one hand, the nodes with high rank are of too small degree. This is because the Gnutella protocol 0.6 imposes a limit on maximal connections of an ultrapeer. On the other hand, there are too many nodes with degree around 30, resulting the curve breakouts from the linear regression. This pattern suggests that ultrapeers in the Gnutella 0.6 network tend to have the connection limit around 30. Moreover, the 021106 graph is somewhat different from the 091505 graph. First, the nodes with high rank in the former graph are of smaller degree compared with the counterparts in the latter, implying that protocol 0.6 is effectively replacing protocol 0.4. Secondly, the curve after a degree of approximately 30 drops much more sharply in the former graph than in the latter, which suggests that ultrapeers tend to employ as many connections as they can. B. Degree Distribution In this section, we study the distribution of the degrees of the nodes. Note that the degree power law we present here is different from the one in earlier work [6]. However, they both refer to the same distribution. The difference is that the former uses the cumulative probability distribution function, while the latter uses the probability distribution function. As a result, the exponents of the two power-laws differ approximately by one. The cumulative distribution is preferable because it can be estimated in a statistically robust way. Power-Law of Degree Exponent D: The CCDF Dd of a degree d, is proportional to the degree to the power of a constant D : Dd ∝ dD . The complementary cumulative distribution function (CCDF) of a degree d is the percentage of nodes that have degree greater than the degree d. Jovanovic [6] showed degree exponent of -1.4 and ACC of 0.96 for the early Gnutella network by probability distribution. Chen et al. [7] argued that cumulative probability distribution of the node degree follows a power-law. But they did not provide any information about the ACC value and the exponent value. For our two instances, the degree exponent is -2.25926 and -2.31074 and ACC is 0.91744 and 0.87718 in chronological order as we see in Figure 4. Again, the low ACC values imply that this power-law is relatively weak in the 091505 graph and even invalidate in the 021106 graph. 0 0 10 10 091505.degree exp(0.55475) *x** (-2.25926) -1 10 10 -2 -2 10 CCDF CCDF 10 -3 10 -3 10 -4 -4 10 10 -5 V. T HE T WO -L AYERED A PPROACH In this section, we first discuss the limitations of the powerlaws and then present a new approach to study the topology of the Gnutella network. A. Limitations of the Power-laws Previous researches [11] and [10] suggest two key causes for power-law distributions in network topologies: incremental growth and preferential connectivity. Incremental growth refers to open networks that form by the continual addition of new nodes, and thus the gradual increase in the size of the network. Preferential connectivity refers to the tendency of a new node to connect to existing nodes that are highly connected or popular. The topology of the Gnutella network is highly dynamic, since a node can join or leave the Gnutella network at any time. More specifically, most leaves tend to disconnect from the Gnutella network in several minutes after they connect to the network. The transient life-time of the leaves works against incremental growth. Moreover, due to the hybrid architecture of Gnutella protocol 0.6 [2], a leaf keeps only a small number of connections to ultrapeers and cannot connect to other leaves. This limitation on leaves also works against preferential connectivity, because leaves can never become highly connected. Combining the above factors, we can explain why the current Gnutella network does not follow the early power-law distributions. It is the limitations of the power-laws that make them inappropriate for modeling hybrid and highly dynamic topologies. As we mentioned earlier, P2P studies usually use synthetically generated topologies characterize by the early powerlaws. These topologies may not reflect properties of current P2P networks. So there should be a new approach to model current P2P networks. B. Our Approach -5 10 10 -6 10 021106.degree exp(0.72008) *x** (-2.31074) -1 are too few ultrapeers with a degree in this interval. This confirms our previous conclusion that ultrapeers try to hold more connections up to the limit. The curve of higher degree in the 021106 graph drops much more sharply, which agrees with our previous comment that the Gnutella protocol 0.6 prevents ultrapeers from employing a large number of connections. -6 0 10 1 2 10 10 degree (a) 091505 ACC=0.91744 Fig. 4. 3 10 10 0 10 1 2 10 10 3 10 degree (b) 021106 ACC=0.87718 Log-log plot of Dd versus the degree d. Compared with a pure power-law distribution, the graphs share some common patterns. There are too many nodes with degree around 30, and the resulting curves deviate from the linear regression. This is coincident with what we found in rank distribution. Furthermore, in the 021106 graph, degrees in interval 5 to 20 follow an almost constant distribution, which means there In our study, we propose a new two-layered approach to model the topology of the current Gnutella network. We split the Gnutella network into two layers, namely the mesh and the forest. Before we present the analysis of our approach, we provide below a few definitions. Note that Magoni et al. [5] proposed some definitions to describe the AS network. We keep these definitions and modify them into the following ones. Figure 5 shows different kinds of nodes in a sample graph. • Cycle node: a node that belongs to a cycle (i.e. it is on a closed path of disjoint nodes; in Figure 5, there are eleven cycle nodes). TABLE II BASIC S TATISTICS OF THE M ESH Stat. Nodes p(m) Edges l Diam. k VI. M ESH T OPOLOGY A NALYSIS In this section, we study the topology properties concerning the mesh in the Gnutella network. In Table II, we present some basic statistics about the mesh in our instances. In Table II, p(m) represents the percentage of nodes in the mesh, l represents average shortest distance, and k represents average degree. A. Mesh Node Rank Exponent Rm In this section, we study the degrees of the nodes in the mesh. We sort the nodes in the mesh in decreasing order of 021106 11,852 10.0% 23,539 6.5 17 3.97 degree dvm and define the mesh node rank rvm as the index of the node in the sequence. We plot the (dvm , rvm ) pairs in log-log scale. The plots are shown in Figure 6. The data values are represented by points, while the solid lines represent the least-squares approximation. 3 4 10 10 exp(2.66303) *x** (-0.60680) 3 10 2 10 1 10 2 10 1 10 0 0 10 021106.mesh.rank 091505.mesh.rank exp(2.15411) *x** (-0.45877) mesh node degree Different kinds of nodes. mesh node degree Fig. 5. Bridge node: a node which is not a cycle node and is on a path connecting 2 cycle nodes (in Figure 5, there is one bridge node). • In-mesh node: a node which is a cycle node or a bridge node (in Figure 5, the mesh has twelve in-mesh nodes). • In-tree node: a node which is not an in-mesh node (i.e. it belongs to a tree; in Figure 5, each tree has four in-tree nodes). Mesh is the set of in-mesh nodes and forest is the set of in-tree nodes. • Branch node: an in-tree node of degree at least 2. • Leaf node: an in-tree AS of degree 1. • Root node: an in-mesh node which is the root of a tree. • Relay node: a node having exactly 2 connections. • Border node: a node located on the diameter of the network. If we split the Gnutella network into the mesh and the forest, we can analyze the topological properties of the mesh and the forest respectively. After a careful comparison between Figure 2 and Figure 5, we can find that the mesh in Figure 5 is composed merely of ultrapeers and acts as the backbone of the Gnutella network. Since ultrapeers are relatively stable and tend to stay in the Gnutella network for a longer time, it can meet the requirement of incremental growth. Further more, since ultrapeers can connect to other ultrapeers, it can meet the requirement of preferential connectivity. Hence, the topology of the mesh theoretically should comply with power-laws (see Section VI for detailed validation). On the other hand, we can also obtain major topology properties and distributions of the forest (see Section VII). With the knowledge of both the topology of the mesh and the topology of the forest, we can model the topology of the Gnutella network easily by merging these two layers. • 091505 16,487 15.4% 27,467 5.2 14 3.33 0 10 1 10 2 10 3 10 4 10 5 10 10 0 10 1 10 2 10 3 10 4 10 5 10 mesh node rank mesh node rank (a) 091505 ACC=0.96425 (b) 021106 ACC=0.96580 Fig. 6. Log-log plot of the mesh node degree dmv versus the rank rmv in the sequence of decreasing degree. The points of Figure 6 are well approximated by the linear regression. The ACC is 0.96425 for the 091505 instance and 0.96580 for the 021106 instance. This leads us to the following power law and definition. Power-Law 1 (Mesh Node Rank Exponent): The degree dvm of a mesh node vm is proportional to the rank of the mesh node rvm to the power of a constant Rm : dvm ∝ rvRmm . Definition 1: Let us sort the mesh nodes of a graph in decreasing order of degree. We define the mesh rank exponent Rm to be the slope of the plot of the degrees of the mesh nodes versus the rank of the nodes in log-log scale. B. Mesh Node Degree Exponent Om In this section, we study the distribution of the degrees of the nodes in the mesh. We define the frequency fdm of a mesh node degree dm as the number of nodes in the mesh with degree dm . We plot the (fdm , dm ) pairs in log-log scale in Figure 7. In these plots, we exclude a small percentage of nodes of higher degree that have frequency of one, but still plot 99.9% of the total number of nodes. As we saw earlier, the higher degrees are described and captured by the mesh rank exponent. The major observation of Figure 7 is that the plots are approximately linear with ACC of 0.97171 for the 091505 instance and 0.96016 for the 021106 instance. We infer the following power-law and definition. number of the mesh nodes exp(4.58382) *x** (-2.71269) 4 10 021106.mesh.degree exp(3.97970) *x** (-2.02982) 3 10 2 10 2 10 1 10 1 10 0 10 0 1 10 10 node degree 0 1 10 2 10 mesh (a) 091505 ACC=0.97171 Fig. 7. 0 10 2 10 mesh 10 node degree (b) 021106 ACC=0.96016 Log-log plot of frequency fdm versus the mesh node degree dm . Power-Law 2 (Mesh Node Degree Exponent): The frequency fdm of a mesh node degree dm , is proportional to the degree to the power of a constant Om : m fdm ∝ dO m . Definition 2: We define the mesh node degree exponent Om to be the slope of the plot of the frequency of the mesh node degrees versus the degrees in log-log scale. C. Mesh Pair Rank Exponent Pm In this section, we study the Number of distinct Shortest Paths (NSP) of each pair of vertices in the mesh. The number of distinct shortest paths between two vertices is the number of shortest paths such as any of these paths have at least one vertex not in common [5]. The distribution of NSP is useful for evaluating the amount of redundancy edges involved in shortest path. Higher NSP values mean that if one edge of a shortest path between a pair of nodes is removed, there is still a probability for another shortest path of the same length to exist for this pair. We sort the pairs in the mesh in decreasing NSP npm and define the pair rank rpm as the index of the pair in the sequence. We plot the (npm , rpm ) pairs in log-log scale. The plots are shown in Figure 8. Due to the enormous amount of node pairs, we plot the first 106 pairs only. 6 5 10 10 5 10 4 10 mesh NSP mesh NSP Stat. Nodes p(t) Nb of trees Mean tree size Max tree size Mean tree depth Max tree depth 3 10 3 10 4 10 021106 107,073 90.0% 6,830 16.68 231 1.30 10 D. Mesh NSP Exponent Nm In this section, we study the distribution of NSP in the mesh. We define the frequency fnm of a NSP nm as the number of pairs with NSP of nm in the mesh. We plot the (fnm , nm ) pairs in log-log scale in Figure 9. In these plots, we exclude a small percentage of pairs of higher NSP that have lowest frequency, but still plot more than 99.9% of the total number of pairs. The solid lines are the result of the linear regression. 9 9 10 8 10 7 10 6 10 5 10 4 10 3 10 2 10 091506.mesh.nsp 1 10 exp(10.11262) *x** (-1.22241) 0 10 0 10 1 10 2 10 3 10 4 10 10 8 10 7 10 6 10 5 10 4 10 3 10 2 10 021106.mesh.nsp 1 10 exp(9.46422) *x** (-1.48623) 0 10 0 10 (a) 091505 ACC=0.94301 Fig. 9. 1 10 2 10 3 10 4 10 NSP mesh NSP (b) 021106 ACC=0.99840 Log-log plot of frequency fnm versus the mesh NSP nm . The major observation of Figure 9 is that the plots are approximately linear with ACC of 0.94301 for the 091505 instance and 0.99840 for the 021106 instance. We infer the following power-law and definition. Power-Law 4 (Mesh NSP Exponent): The frequency fnm of a NSP between a pair of nodes in the mesh, nm , is proportional to the NSP to the power of a constant Nm : 3 10 091505.mesh.NSP.rank exp(6.34358) *x** (-0.63079) 2 0 10 1 10 2 10 3 10 4 10 m fnm ∝ nN m . 021106.mesh.NSP.rank exp(5.44708) *x** (-0.39529) 2 10 091505 90,718 84.6% 9,886 10.18 4,824 1.52 8 Definition 3: Let us sort the pairs of nodes in the mesh of a graph in decreasing order of NSP. We define the mesh pair rank exponent Pm to be the slope of the plot of the NSP versus the rank of the mesh node pairs in log-log scale. number of mesh node pairs number of the mesh nodes TABLE III BASIC S TATISTICS OF THE F OREST 4 10 091505.mesh.degree number of mesh node pairs 5 10 5 10 6 10 7 10 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 mesh node pair rank mesh node pair rank (a) 091505 ACC=0.99157 (b) 021106 ACC=0.99632 Fig. 8. Log-log plot of the mesh NSP npm versus the rank rpm in the sequence of decreasing degree. The points of Figure 8 are well approximated by the linear regression with ACC of 0.99157 for the 091505 instance and 0.99632 for the 021106 instance. This leads us to the following power law and definition. Power-Law 3 (Mesh Pair Rank Exponent): The NSP npm between a pair of mesh nodes pm , is proportional to the rank of the pair rpm to the power of a constant Pm : npm ∝ rpPmm . Definition 4: We define the Mesh NSP exponent Nm to be the slope of the plot of the frequency of the mesh NSP versus the mesh NSP in log-log scale. VII. F OREST T OPOLOGY A NALYSIS In this section, we study the topology properties concerning the forest in the Gnutella network. In Table III, we present some basic statistics about the forest in our instances. In Table III, p(t) represents the percentage of nodes in the forest. A. Tree Depth Distribution We define the probability p(td ) of a tree depth td as the percentage of trees in the forest with depth td . Figure 10 describes the tree depth distribution. 0.8 091505.tree.depth 0.7 021106.tree.depth 0.6 d p(t ) 0.5 0.4 0.3 0.2 0.1 0.0 -0.1 0 2 4 6 8 10 t d Fig. 10. Tree depth distribution From Figure 10, we notice that more than 56% of trees is simply composed of leaves directly connected to their corresponding root and more than 27% of trees has depth 2. In addition, less than 4% of trees have depth larger than 3. B. Tree Rank Distribution In this section, we study the size of each tree, which is defined as the sum of the vertices composing the tree plus the root. We sort the trees in decreasing tree size st and define tree rank rt as the index of the tree in the sequence. We plot the (st , rt ) pairs in Figure 11, applying log-scale only on the y-axis. The solid lines are given by linear regression. 3 4 10 10 091505.tree.rank 021106.tree.rank exp((-2.64284E-4)*x+1.84179) exp((-1.49779E-4)*x+1.48667) 3 2 tree size tree size 10 2 10 10 1 10 1 10 0 0 10 2000 4000 6000 8000 10000 10 2000 4000 6000 8000 could reasonably conjecture that our laws might continue to hold, at least for the near future. Our work can facilitate the generation of realistic topologies of P2P networks, specially those which employ a hybrid and highly dynamic architecture like the Gnutella network. As an overview, we list the following guidelines for creating P2P network topologies. First, a small percentage of the nodes (15.4% or 10.0%) belong to the mesh. Second, the degree distribution of the mesh is skewed following our power-law 1 and 2, and the NSP distribution of the mesh is skewed following our power-law 3 and 4. Third, a large percentage of the nodes (84.6% or 90.0%) belong to the forest. Fourth, more than 56% of the trees have depth one, less than 4% of the trees have depth larger than 3, and the maximum depth is 7 or 10. Fifth, the size distribution of the trees is skewed following our empirical law 1. As a final step, we merge the generated mesh and the generated forest together to get the P2P network topology. If we finetune the parameters, we can get specific topologies that meet our needs. IX. C ONCLUSION In this paper, we give a detailed review of the current Gnutella network topology as well as its on-going evolution. We present a novel two-layered approach to study the topology of the Gnutella network, analyzing it through the mesh perspective and the forest perspective respectively. Furthermore, we provide detailed topology properties as well as additional power-laws and the empirical law concerning these two layers. Using the two-layered approach and laws proposed, we can generate realistic topologies easily. tree rank tree rank (a) 091505 ACC=0.95621 (b) 021106 ACC=0.95465 Fig. 11. Plot of the tree size st (log-scale) versus the rank rt in the sequence of decreasing size. The plots of Figure 11 match the linear regression line. The ACC is 0.95621 for the 091505 instance and 0.95465 for the 021106 instance. Consequently, we infer the following empirical law and definition. Empirical Law 1: The size st of a tree t, is proportional to an exponential function with exponent being the product of the rank of the tree rt and a constant T : st ∝ exp(T rt ). Definition 5. Let us sort the trees of a graph in decreasing order of size. We define T to be the slope of the plot of the sizes of trees versus the rank of the trees with log-scale applied on the sizes of trees. This empirical law provides the formula on the sizes of trees in a sequence of trees. VIII. D ISCUSSION The regularity observed in our instances of the Gnutella network between September 2005 and February 2006 (including but not restricted to the two instances specifically discussed in this paper) is unlikely to be a coincidence. We R EFERENCES [1] Clip2. (2001) The gnutella protocol specification v0.4. [Online]. Available: http://www9.limewire.com/developer/gnutella protocol 0.4. pdf [2] Gnutella. (2002) The gnutella protocol v0.6. [Online]. Available: http://rfc-gnutella.sourceforge.net/src/rfc-0 6-draft.html [3] (2006) The Limewire website. [Online]. Available: http://www.limewire. org/ [4] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the internet topology,” in Proc. ACM SIGCOMM’99, New York, NY, 1999, pp. 251–262. [5] D. Magoni and J.-J. Pansiot, “Analysis of the autonomous system network topology,” ACM SIGCOMM Computer Communication Review, vol. 31, no. 3, pp. 26–37, July 2001. [6] M. A. Jovanovic, “Modelling large-scale peer-to-peer networks and a case study of gnutella,” Master’s thesis, University of Cincinnati, Cambridge, June 2000. [7] H. Chen, H. Jin, J. Sun, D. Deng, and X. Liao, “Analysis of large-scale topological properties for peer-to-peer networks,” in Proc. IEEE CCGrid ’04, Apr. 19-22, 2004, pp. 27–34. [8] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, “Search in power-law networks,” Physical Review E, vol. 64, pp. 46 135– 46 143, 2001. [9] M. Ripeanu, I. Foster, and A. Iamnitchi, “Mapping the gnutella network: Properties of large-scale peer-to-peer systems and implications for system design,” IEEE Internet Computing Journal, vol. 6(1), pp. 50– 57, 2002. [10] A. Medina, I. Matta, and J. Byers, “On the origin of power laws in internet topologies,” ACM SIGCOMM Computer Communication Review, vol. 30, no. 2, pp. 18–28, 2000. [11] A. L. Barabasi and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, p. 509, 1999.