Analysis of Large-Scale Peer-to

advertisement
Analysis of Large-Scale Peer-to-Peer Network
Topology
Chao Xie
Department of Computer Science
Georgia State University
Atlanta, Georgia 30302-3994
Email: {cxie}@cs.gsu.edu
Abstract— Modeling peer-to-peer (P2P) networks is a challenge
for P2P researchers. It is the key to provide insight into the nature
of the underlying system and building a successful model able
to generate realistic topologies for simulation purpose. In this
paper, we provide a detailed analysis of large-scale P2P network
topology, using Gnutella as a case study. First, we re-examine the
power-law distributions of the Gnutella network discovered by
previous researchers. Our results show that the current Gnutella
network deviates from the early power-laws, suggesting that the
Gnutella network topology may have evolved a lot over time.
Second, we identify important trends with regard to the evolution
of the Gnutella network between September 2005 and February
2006. Third, we provide a novel two-layered approach to study
the topology of the Gnutella network. Due to the limitations of
the power-laws, we divide the Gnutella network into two layers,
namely the mesh and the forest, to model the hybrid and highly
dynamic architecture of the current Gnutella network. We give
a detailed analysis of the topology of the mesh and present four
power-laws concerning the mesh topology. Moreover, we examine
the topology properties of the forest and provide one empirical
law concerning the tree size. Using the two-layered approach and
laws proposed, we can generate realistic topologies easily.
I. I NTRODUCTION
Modeling the topologies of peer-to-peer (P2P) networks is
an important open problem. An accurate topological model
can have significant influence on P2P research. First, we
can gain detailed insight into the nature of the underlying
system. Second, the model can enable analytical analysis of
algorithms and facilitate design of more efficient protocols that
take advantage of topology properties. Third, we can generate
more accurate artificial topologies for simulation purpose.
Further more, we can predict future trends and thereby address
potential problems in advance.
In this paper, we provide a detailed analysis of large-scale
P2P network topology, giving results concerning major topology properties and main distributions. In our study, we choose
Gnutella as a case study, as it has a large user community
and open architecture. Our work can be summarized by the
following points.
First, we re-examine the power-law distributions of the
Gnutella network discovered by previous researchers. Our results show that the current Gnutella network deviates from the
early power-laws. This observation suggests that the Gnutella
network topology may have evolved a lot over time.
Second, we identify important trends with regard to the
evolution of the Gnutella network between September 2005
and February 2006.
As our primary contribution, we provide a novel two-layered
approach to study the topology of the Gnutella network. Due
to the limitations of the power-laws, we divide the Gnutella
network into two layers, namely the mesh and the forest,
to model the hybrid and highly dynamic architecture of the
current Gnutella network. We give a detailed analysis of the
topological properties of the mesh and present four powerlaws concerning the mesh topology. Moreover, we examine
the topology properties of the forest and provide one empirical
law concerning the tree size.
Finally, we focus on the generation of realistic topologies
using our approach and laws proposed.
The rest of this paper is organized as follows. Section II
presents background and previous work. In Section III, we
present our instances of the Gnutella network. In Section
IV, we re-examine the power-law distributions discovered
by previous researchers, identify the trends concerning the
evolution of Gnutella network. In Section V, we analyze the
limitations of the power-laws and introduce our new twolayered approach to study the topology of Gnutella network. In
Section VI, we analyze the topological properties of the mesh
and present four power-laws concerning the mesh topology. In
Section VII, we examine the topology properties of the forest
and provide one empirical law concerning the tree size. In
Section VIII, we discuss the practical uses of our approach
and laws. Finally, Section IX concludes our work.
II. BACKGROUND AND P REVIOUS W ORK
A. Gnutella Protocol and the Crawler
Gnutella protocol 0.4 [1] employs a pure decentralized
model. In this model, individual nodes, also called servents
are equal in terms of functionality. They not only perform
server-side roles such as matching incoming queries against
their local resources and respond with applicable results,
but also offer client-side functions such as issuing queries
and collecting search results. All servents are connected to
each other randomly. Figure 1 illustrates the topology of the
Gnutella 0.4 network.
Gnutella protocol 0.6 [2] employs a hybrid architecture
combining centralized and decentralized model. Servents are
categorized into leaf and ultrapeer. A leaf keeps only a small
TABLE I
BASIC S TATISTICS OF THE G NUTELLA N ETWORK
number of connections to ultrapeers. An ultrapeer maintains
connections with other ultrapeers and acts as a proxy to the
Gnutella network for the leaves connected to it. An ultrapeer
only forwards a query to a leaf if it believes the leaf can
answer it, and leaves never relay queries between ultrapeers.
Figure 2 illustrates the topology of the Gnutella 0.6 network.
Protocol 0.6 is compatible with protocol 0.4, which implies
that the current Gnutella network can contain some fraction
of nodes of former protocol specification 0.4.
We developed a crawler to collect topology information
of the Gnutella network, based on message communication
mechanism of both protocol 0.4 [1] and protocol 0.6 [2]. The
crawler is based on the Limewire [3] open source client and
performs a breadth first searching on the network in parallel.
It can discover more than 100,000 nodes in half an hour.
B. Power-law
Power-laws have been found in numerous diverse fields
spanning sociological, geological, natural and biological systems. Power-laws of the form y ∝ xα enables a compact characterization of topologies through their exponents. Faloutsos et
al. [4] discovered four power-laws characterizing the topology
of the Internet, while Magoni et al. [5] found another four
power-laws of the Internet.
In [6] [7] and [8], several power-laws were found with
regard to the topology of the Gnutella network. P2P studies usually assume that these power-laws characterize the
topology of P2P networks and use synthetically generated
topologies following these power-laws, although Ripeanu et al.
[9] argued that the connection distribution of the more recent
Gnutella network deviates from a pure power-law. In Section
IV, we will re-examine these power-laws.
III. O UR G NUTELLA N ETWORK I NSTANCES
We have studied the topology of the Gnutella network from
September 2005 until February 2006. We can build the graph
of nodes by analyzing the collected data on the Gnutella
network. We model two adjacent nodes that have at least
one connection between each other by an edge. We treat the
Gnutella network as a undirected graph.
In this paper, we provide two snapshots of the Gnutella
network that correspond to five-month intervals approximately,
namely the 091505 instance and the 021106 instance. In Table
I, we present some basic statistics about our instances and
previous work [6] [7]. In Table I, l represents the average
shortest distance and k represents the average degree.
Ours
091505
021106
09/ 05
02/ 06
107,205
118,925
118,187
130,612
6.4
7.9
22
24
2.20
2.20
[7]
V34206
09/ 03
34,206
43,958
5.4
16
2.57
[6]
V57926
10/ 03
57,926
80,276
5.8
15
2.72
11/ 00
992
2,465
3.7
9
4.97
12/ 00
1,125
4,080
3.3
8
7.25
3
4
10
10
091505.rank
021106.rank
exp(3.09197) *x** (-0.64268)
exp(2.93379) *x** (-0.60681)
3
10
2
10
degree
Fig. 2. Topology of the Gnutella 0.6
Network.
degree
Fig. 1. Topology of the Gnutella 0.4
Network.
Stat.
Data
Time
Nodes
Edges
l
Diam.
k
2
10
1
10
1
10
0
0
10
0
10
1
10
2
10
3
10
4
10
5
10
rank
(a) 091505 ACC=0.92178
6
10
10
0
10
1
10
2
10
3
10
4
10
5
10
6
10
rank
(b) 021106 ACC=0.88120
Fig. 3. Log-log plot of the degree dv versus the rank rv in the sequence of
decreasing degree.
IV. C URRENT G NUTELLA N ETWORK T OPOLOGY
In this section, we examine the power-laws of the Gnutella
network described in previous literatures against our two
instances. Note that in this paper we only present the examination of two early power-laws that give strong evidence for the
representativeness of a topology [10]. The goal of our work
is to find out whether the topology of the current Gnutella
network accords with the early power-laws.
We use linear regression to fit a line in a set of twodimensional points using the least-square errors method. The
validity of the approximation is quantified by the correlation
coefficient ranging from -1.0 and 1.0. The absolute value of the
correlation coefficient is ACC. An ACC value of 1.0 indicates
perfect linear correlation. In general, the ACC level should be
great than 0.90 to validate linear correlation.
A. Rank Distribution
In this section, we study the degrees of the nodes in the
Gnutella network.
Power-Law of Rank Exponent R: The degree dv of a
node v is proportional to the rank of the node rv to the power
of a constant R : dv ∝ rvR . The rank rv of a node v is defined
as its index in the order of decreasing degree.
Jovanovic [6] found that the early Gnutella network followed the above power-law with rank exponent of -0.98 and
ACC of 0.94. For our two instances, the rank exponent is
-0.64268 and -0.60681 and ACC is 0.92178 and 0.88120 in
chronological order as we see in Figure 3. The low ACC values
imply that this power-law is relatively weak in the 091505
graph and even invalidated in the 021106 graph.
Compared with a pure power-law distribution, the two
graphs deviate from the linear regression with similar patterns.
On the one hand, the nodes with high rank are of too small
degree. This is because the Gnutella protocol 0.6 imposes a
limit on maximal connections of an ultrapeer. On the other
hand, there are too many nodes with degree around 30,
resulting the curve breakouts from the linear regression. This
pattern suggests that ultrapeers in the Gnutella 0.6 network
tend to have the connection limit around 30.
Moreover, the 021106 graph is somewhat different from the
091505 graph. First, the nodes with high rank in the former
graph are of smaller degree compared with the counterparts
in the latter, implying that protocol 0.6 is effectively replacing
protocol 0.4. Secondly, the curve after a degree of approximately 30 drops much more sharply in the former graph than
in the latter, which suggests that ultrapeers tend to employ as
many connections as they can.
B. Degree Distribution
In this section, we study the distribution of the degrees of
the nodes. Note that the degree power law we present here is
different from the one in earlier work [6]. However, they both
refer to the same distribution. The difference is that the former
uses the cumulative probability distribution function, while the
latter uses the probability distribution function. As a result, the
exponents of the two power-laws differ approximately by one.
The cumulative distribution is preferable because it can be
estimated in a statistically robust way.
Power-Law of Degree Exponent D: The CCDF Dd of
a degree d, is proportional to the degree to the power of
a constant D : Dd ∝ dD . The complementary cumulative
distribution function (CCDF) of a degree d is the percentage
of nodes that have degree greater than the degree d.
Jovanovic [6] showed degree exponent of -1.4 and ACC
of 0.96 for the early Gnutella network by probability distribution. Chen et al. [7] argued that cumulative probability
distribution of the node degree follows a power-law. But they
did not provide any information about the ACC value and the
exponent value. For our two instances, the degree exponent
is -2.25926 and -2.31074 and ACC is 0.91744 and 0.87718
in chronological order as we see in Figure 4. Again, the low
ACC values imply that this power-law is relatively weak in
the 091505 graph and even invalidate in the 021106 graph.
0
0
10
10
091505.degree
exp(0.55475) *x** (-2.25926)
-1
10
10
-2
-2
10
CCDF
CCDF
10
-3
10
-3
10
-4
-4
10
10
-5
V. T HE T WO -L AYERED A PPROACH
In this section, we first discuss the limitations of the powerlaws and then present a new approach to study the topology
of the Gnutella network.
A. Limitations of the Power-laws
Previous researches [11] and [10] suggest two key causes
for power-law distributions in network topologies: incremental growth and preferential connectivity. Incremental growth
refers to open networks that form by the continual addition
of new nodes, and thus the gradual increase in the size of
the network. Preferential connectivity refers to the tendency
of a new node to connect to existing nodes that are highly
connected or popular.
The topology of the Gnutella network is highly dynamic,
since a node can join or leave the Gnutella network at any
time. More specifically, most leaves tend to disconnect from
the Gnutella network in several minutes after they connect to
the network. The transient life-time of the leaves works against
incremental growth. Moreover, due to the hybrid architecture
of Gnutella protocol 0.6 [2], a leaf keeps only a small number
of connections to ultrapeers and cannot connect to other
leaves. This limitation on leaves also works against preferential
connectivity, because leaves can never become highly connected. Combining the above factors, we can explain why the
current Gnutella network does not follow the early power-law
distributions. It is the limitations of the power-laws that make
them inappropriate for modeling hybrid and highly dynamic
topologies.
As we mentioned earlier, P2P studies usually use synthetically generated topologies characterize by the early powerlaws. These topologies may not reflect properties of current
P2P networks. So there should be a new approach to model
current P2P networks.
B. Our Approach
-5
10
10
-6
10
021106.degree
exp(0.72008) *x** (-2.31074)
-1
are too few ultrapeers with a degree in this interval. This
confirms our previous conclusion that ultrapeers try to hold
more connections up to the limit. The curve of higher degree in
the 021106 graph drops much more sharply, which agrees with
our previous comment that the Gnutella protocol 0.6 prevents
ultrapeers from employing a large number of connections.
-6
0
10
1
2
10
10
degree
(a) 091505 ACC=0.91744
Fig. 4.
3
10
10
0
10
1
2
10
10
3
10
degree
(b) 021106 ACC=0.87718
Log-log plot of Dd versus the degree d.
Compared with a pure power-law distribution, the graphs
share some common patterns. There are too many nodes with
degree around 30, and the resulting curves deviate from the
linear regression. This is coincident with what we found in
rank distribution.
Furthermore, in the 021106 graph, degrees in interval 5 to
20 follow an almost constant distribution, which means there
In our study, we propose a new two-layered approach to
model the topology of the current Gnutella network. We split
the Gnutella network into two layers, namely the mesh and
the forest.
Before we present the analysis of our approach, we provide
below a few definitions. Note that Magoni et al. [5] proposed
some definitions to describe the AS network. We keep these
definitions and modify them into the following ones. Figure 5
shows different kinds of nodes in a sample graph.
• Cycle node: a node that belongs to a cycle (i.e. it is on
a closed path of disjoint nodes; in Figure 5, there are
eleven cycle nodes).
TABLE II
BASIC S TATISTICS OF THE M ESH
Stat.
Nodes
p(m)
Edges
l
Diam.
k
VI. M ESH T OPOLOGY A NALYSIS
In this section, we study the topology properties concerning
the mesh in the Gnutella network. In Table II, we present
some basic statistics about the mesh in our instances. In Table
II, p(m) represents the percentage of nodes in the mesh, l
represents average shortest distance, and k represents average
degree.
A. Mesh Node Rank Exponent Rm
In this section, we study the degrees of the nodes in the
mesh. We sort the nodes in the mesh in decreasing order of
021106
11,852
10.0%
23,539
6.5
17
3.97
degree dvm and define the mesh node rank rvm as the index
of the node in the sequence. We plot the (dvm , rvm ) pairs in
log-log scale. The plots are shown in Figure 6. The data values
are represented by points, while the solid lines represent the
least-squares approximation.
3
4
10
10
exp(2.66303) *x** (-0.60680)
3
10
2
10
1
10
2
10
1
10
0
0
10
021106.mesh.rank
091505.mesh.rank
exp(2.15411) *x** (-0.45877)
mesh node degree
Different kinds of nodes.
mesh node degree
Fig. 5.
Bridge node: a node which is not a cycle node and is on
a path connecting 2 cycle nodes (in Figure 5, there is one
bridge node).
• In-mesh node: a node which is a cycle node or a bridge
node (in Figure 5, the mesh has twelve in-mesh nodes).
• In-tree node: a node which is not an in-mesh node (i.e. it
belongs to a tree; in Figure 5, each tree has four in-tree
nodes).
Mesh is the set of in-mesh nodes and forest is the set of
in-tree nodes.
• Branch node: an in-tree node of degree at least 2.
• Leaf node: an in-tree AS of degree 1.
• Root node: an in-mesh node which is the root of a tree.
• Relay node: a node having exactly 2 connections.
• Border node: a node located on the diameter of the
network.
If we split the Gnutella network into the mesh and the forest,
we can analyze the topological properties of the mesh and the
forest respectively.
After a careful comparison between Figure 2 and Figure 5,
we can find that the mesh in Figure 5 is composed merely of
ultrapeers and acts as the backbone of the Gnutella network.
Since ultrapeers are relatively stable and tend to stay in the
Gnutella network for a longer time, it can meet the requirement
of incremental growth. Further more, since ultrapeers can
connect to other ultrapeers, it can meet the requirement of
preferential connectivity. Hence, the topology of the mesh
theoretically should comply with power-laws (see Section VI
for detailed validation). On the other hand, we can also obtain
major topology properties and distributions of the forest (see
Section VII).
With the knowledge of both the topology of the mesh and
the topology of the forest, we can model the topology of the
Gnutella network easily by merging these two layers.
•
091505
16,487
15.4%
27,467
5.2
14
3.33
0
10
1
10
2
10
3
10
4
10
5
10
10
0
10
1
10
2
10
3
10
4
10
5
10
mesh node rank
mesh node rank
(a) 091505 ACC=0.96425
(b) 021106 ACC=0.96580
Fig. 6. Log-log plot of the mesh node degree dmv versus the rank rmv in
the sequence of decreasing degree.
The points of Figure 6 are well approximated by the linear
regression. The ACC is 0.96425 for the 091505 instance and
0.96580 for the 021106 instance. This leads us to the following
power law and definition.
Power-Law 1 (Mesh Node Rank Exponent): The degree
dvm of a mesh node vm is proportional to the rank of the mesh
node rvm to the power of a constant Rm :
dvm ∝ rvRmm .
Definition 1: Let us sort the mesh nodes of a graph in
decreasing order of degree. We define the mesh rank exponent
Rm to be the slope of the plot of the degrees of the mesh nodes
versus the rank of the nodes in log-log scale.
B. Mesh Node Degree Exponent Om
In this section, we study the distribution of the degrees of
the nodes in the mesh. We define the frequency fdm of a
mesh node degree dm as the number of nodes in the mesh
with degree dm . We plot the (fdm , dm ) pairs in log-log scale
in Figure 7. In these plots, we exclude a small percentage of
nodes of higher degree that have frequency of one, but still
plot 99.9% of the total number of nodes. As we saw earlier,
the higher degrees are described and captured by the mesh
rank exponent.
The major observation of Figure 7 is that the plots are
approximately linear with ACC of 0.97171 for the 091505
instance and 0.96016 for the 021106 instance. We infer the
following power-law and definition.
number of the mesh nodes
exp(4.58382) *x** (-2.71269)
4
10
021106.mesh.degree
exp(3.97970) *x** (-2.02982)
3
10
2
10
2
10
1
10
1
10
0
10
0
1
10
10
node degree
0
1
10
2
10
mesh
(a) 091505 ACC=0.97171
Fig. 7.
0
10
2
10
mesh
10
node degree
(b) 021106 ACC=0.96016
Log-log plot of frequency fdm versus the mesh node degree dm .
Power-Law 2 (Mesh Node Degree Exponent): The frequency fdm of a mesh node degree dm , is proportional to the
degree to the power of a constant Om :
m
fdm ∝ dO
m .
Definition 2: We define the mesh node degree exponent Om
to be the slope of the plot of the frequency of the mesh node
degrees versus the degrees in log-log scale.
C. Mesh Pair Rank Exponent Pm
In this section, we study the Number of distinct Shortest
Paths (NSP) of each pair of vertices in the mesh. The number
of distinct shortest paths between two vertices is the number
of shortest paths such as any of these paths have at least one
vertex not in common [5]. The distribution of NSP is useful
for evaluating the amount of redundancy edges involved in
shortest path. Higher NSP values mean that if one edge of a
shortest path between a pair of nodes is removed, there is still
a probability for another shortest path of the same length to
exist for this pair. We sort the pairs in the mesh in decreasing
NSP npm and define the pair rank rpm as the index of the
pair in the sequence. We plot the (npm , rpm ) pairs in log-log
scale. The plots are shown in Figure 8. Due to the enormous
amount of node pairs, we plot the first 106 pairs only.
6
5
10
10
5
10
4
10
mesh NSP
mesh NSP
Stat.
Nodes
p(t)
Nb of trees
Mean tree size
Max tree size
Mean tree depth
Max tree depth
3
10
3
10
4
10
021106
107,073
90.0%
6,830
16.68
231
1.30
10
D. Mesh NSP Exponent Nm
In this section, we study the distribution of NSP in the mesh.
We define the frequency fnm of a NSP nm as the number of
pairs with NSP of nm in the mesh. We plot the (fnm , nm )
pairs in log-log scale in Figure 9. In these plots, we exclude
a small percentage of pairs of higher NSP that have lowest
frequency, but still plot more than 99.9% of the total number
of pairs. The solid lines are the result of the linear regression.
9
9
10
8
10
7
10
6
10
5
10
4
10
3
10
2
10
091506.mesh.nsp
1
10
exp(10.11262) *x** (-1.22241)
0
10
0
10
1
10
2
10
3
10
4
10
10
8
10
7
10
6
10
5
10
4
10
3
10
2
10
021106.mesh.nsp
1
10
exp(9.46422) *x** (-1.48623)
0
10
0
10
(a) 091505 ACC=0.94301
Fig. 9.
1
10
2
10
3
10
4
10
NSP
mesh NSP
(b) 021106 ACC=0.99840
Log-log plot of frequency fnm versus the mesh NSP nm .
The major observation of Figure 9 is that the plots are
approximately linear with ACC of 0.94301 for the 091505
instance and 0.99840 for the 021106 instance. We infer the
following power-law and definition.
Power-Law 4 (Mesh NSP Exponent): The frequency fnm
of a NSP between a pair of nodes in the mesh, nm , is
proportional to the NSP to the power of a constant Nm :
3
10
091505.mesh.NSP.rank
exp(6.34358) *x** (-0.63079)
2
0
10
1
10
2
10
3
10
4
10
m
fnm ∝ nN
m .
021106.mesh.NSP.rank
exp(5.44708) *x** (-0.39529)
2
10
091505
90,718
84.6%
9,886
10.18
4,824
1.52
8
Definition 3: Let us sort the pairs of nodes in the mesh
of a graph in decreasing order of NSP. We define the mesh
pair rank exponent Pm to be the slope of the plot of the NSP
versus the rank of the mesh node pairs in log-log scale.
number of mesh node pairs
number of the mesh nodes
TABLE III
BASIC S TATISTICS OF THE F OREST
4
10
091505.mesh.degree
number of mesh node pairs
5
10
5
10
6
10
7
10
10
0
10
1
10
2
10
3
10
4
10
5
10
6
10
7
10
mesh node pair rank
mesh node pair rank
(a) 091505 ACC=0.99157
(b) 021106 ACC=0.99632
Fig. 8. Log-log plot of the mesh NSP npm versus the rank rpm in the
sequence of decreasing degree.
The points of Figure 8 are well approximated by the linear
regression with ACC of 0.99157 for the 091505 instance and
0.99632 for the 021106 instance. This leads us to the following
power law and definition.
Power-Law 3 (Mesh Pair Rank Exponent): The NSP npm
between a pair of mesh nodes pm , is proportional to the rank
of the pair rpm to the power of a constant Pm :
npm ∝ rpPmm .
Definition 4: We define the Mesh NSP exponent Nm to be
the slope of the plot of the frequency of the mesh NSP versus
the mesh NSP in log-log scale.
VII. F OREST T OPOLOGY A NALYSIS
In this section, we study the topology properties concerning
the forest in the Gnutella network. In Table III, we present
some basic statistics about the forest in our instances. In Table
III, p(t) represents the percentage of nodes in the forest.
A. Tree Depth Distribution
We define the probability p(td ) of a tree depth td as the
percentage of trees in the forest with depth td . Figure 10
describes the tree depth distribution.
0.8
091505.tree.depth
0.7
021106.tree.depth
0.6
d
p(t )
0.5
0.4
0.3
0.2
0.1
0.0
-0.1
0
2
4
6
8
10
t
d
Fig. 10.
Tree depth distribution
From Figure 10, we notice that more than 56% of trees
is simply composed of leaves directly connected to their
corresponding root and more than 27% of trees has depth 2.
In addition, less than 4% of trees have depth larger than 3.
B. Tree Rank Distribution
In this section, we study the size of each tree, which is
defined as the sum of the vertices composing the tree plus the
root. We sort the trees in decreasing tree size st and define
tree rank rt as the index of the tree in the sequence. We plot
the (st , rt ) pairs in Figure 11, applying log-scale only on the
y-axis. The solid lines are given by linear regression.
3
4
10
10
091505.tree.rank
021106.tree.rank
exp((-2.64284E-4)*x+1.84179)
exp((-1.49779E-4)*x+1.48667)
3
2
tree size
tree size
10
2
10
10
1
10
1
10
0
0
10
2000
4000
6000
8000
10000
10
2000
4000
6000
8000
could reasonably conjecture that our laws might continue to
hold, at least for the near future.
Our work can facilitate the generation of realistic topologies
of P2P networks, specially those which employ a hybrid and
highly dynamic architecture like the Gnutella network. As an
overview, we list the following guidelines for creating P2P
network topologies. First, a small percentage of the nodes
(15.4% or 10.0%) belong to the mesh. Second, the degree
distribution of the mesh is skewed following our power-law
1 and 2, and the NSP distribution of the mesh is skewed
following our power-law 3 and 4. Third, a large percentage
of the nodes (84.6% or 90.0%) belong to the forest. Fourth,
more than 56% of the trees have depth one, less than 4% of
the trees have depth larger than 3, and the maximum depth
is 7 or 10. Fifth, the size distribution of the trees is skewed
following our empirical law 1. As a final step, we merge the
generated mesh and the generated forest together to get the
P2P network topology. If we finetune the parameters, we can
get specific topologies that meet our needs.
IX. C ONCLUSION
In this paper, we give a detailed review of the current
Gnutella network topology as well as its on-going evolution.
We present a novel two-layered approach to study the topology
of the Gnutella network, analyzing it through the mesh perspective and the forest perspective respectively. Furthermore,
we provide detailed topology properties as well as additional
power-laws and the empirical law concerning these two layers.
Using the two-layered approach and laws proposed, we can
generate realistic topologies easily.
tree rank
tree rank
(a) 091505 ACC=0.95621
(b) 021106 ACC=0.95465
Fig. 11. Plot of the tree size st (log-scale) versus the rank rt in the sequence
of decreasing size.
The plots of Figure 11 match the linear regression line.
The ACC is 0.95621 for the 091505 instance and 0.95465
for the 021106 instance. Consequently, we infer the following
empirical law and definition.
Empirical Law 1: The size st of a tree t, is proportional
to an exponential function with exponent being the product of
the rank of the tree rt and a constant T :
st ∝ exp(T rt ).
Definition 5. Let us sort the trees of a graph in decreasing
order of size. We define T to be the slope of the plot of the
sizes of trees versus the rank of the trees with log-scale applied
on the sizes of trees.
This empirical law provides the formula on the sizes of trees
in a sequence of trees.
VIII. D ISCUSSION
The regularity observed in our instances of the Gnutella
network between September 2005 and February 2006 (including but not restricted to the two instances specifically
discussed in this paper) is unlikely to be a coincidence. We
R EFERENCES
[1] Clip2. (2001) The gnutella protocol specification v0.4. [Online].
Available: http://www9.limewire.com/developer/gnutella protocol 0.4.
pdf
[2] Gnutella. (2002) The gnutella protocol v0.6. [Online]. Available:
http://rfc-gnutella.sourceforge.net/src/rfc-0 6-draft.html
[3] (2006) The Limewire website. [Online]. Available: http://www.limewire.
org/
[4] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the internet topology,” in Proc. ACM SIGCOMM’99, New York,
NY, 1999, pp. 251–262.
[5] D. Magoni and J.-J. Pansiot, “Analysis of the autonomous system
network topology,” ACM SIGCOMM Computer Communication Review,
vol. 31, no. 3, pp. 26–37, July 2001.
[6] M. A. Jovanovic, “Modelling large-scale peer-to-peer networks and
a case study of gnutella,” Master’s thesis, University of Cincinnati,
Cambridge, June 2000.
[7] H. Chen, H. Jin, J. Sun, D. Deng, and X. Liao, “Analysis of large-scale
topological properties for peer-to-peer networks,” in Proc. IEEE CCGrid
’04, Apr. 19-22, 2004, pp. 27–34.
[8] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman,
“Search in power-law networks,” Physical Review E, vol. 64, pp. 46 135–
46 143, 2001.
[9] M. Ripeanu, I. Foster, and A. Iamnitchi, “Mapping the gnutella network:
Properties of large-scale peer-to-peer systems and implications for
system design,” IEEE Internet Computing Journal, vol. 6(1), pp. 50–
57, 2002.
[10] A. Medina, I. Matta, and J. Byers, “On the origin of power laws
in internet topologies,” ACM SIGCOMM Computer Communication
Review, vol. 30, no. 2, pp. 18–28, 2000.
[11] A. L. Barabasi and R. Albert, “Emergence of scaling in random
networks,” Science, vol. 286, p. 509, 1999.
Download