Mapping peer2peer networks

Topology Mapping of
Peer-to-Peer Systems
Suat Mercan
Sep 23rd, 2009
CS 790G: COMPLEX NETWORKS
Outline

Characterization of users of P2P systems
  Saroiu, et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002.
Effect of P2P traffic on the underlying network
  Sen, et al., “Analyzing peer-to-peer traffic across large networks”, IMW’02
Peer-to-Peer Topologies
  Ripeanu, et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002.
Searching on the P2P network
  Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001
Deciphering proprietary P2P systems (like Kazaa)
  Leibowitz, et al., “Deconstructing the Kazaa Network”, WIAPP, 2003.
Introduction to Peer-to-Peer (P2P) systems

End-systems (or peers) can act as both clients and servers of data, hence the system is scalable and reliable
Peer participation is voluntary and membership is dynamic, hence the topology keeps changing
Most popularly used for file sharing, hence peer-to-peer systems have become synonymous with peer-to-peer file-sharing networks
Classification of P2P systems

Centralized (e.g. Napster)

Decentralized


Structured (e.g. Chord, CAN, Pastry, Tapestry)
Unstructured (e.g. Gnutella, Kazaa, Freenet, eDonkey,
eMule, Direct Connect, …)
Popularity of unstructured decentralized P2P networks

Figure: Gnutella host count, maintained by Limewire (http://www.limewire.com)
good scope for measurement studies because:
  deployed and widely used
  use a lot of bandwidth during data transfer, hence a concern for network operators
quite a few measurement studies have been done on these systems
Gnutella protocol overview

Connecting to the Gnutella network
  bootstrap using GWebCache system and locally cached hostlist
  Ping/Pong messages are exchanged with potential neighbors
Searching on the network
  Query messages are flooded on the network
  QueryHit messages are received (back-propagated along Query path) from peers having the requested content
Downloading the content
  peers download files directly from peers having the requested content
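The search step above can be sketched as a small simulation. This is only an illustrative model, not the Gnutella wire protocol: the overlay is a hypothetical adjacency dictionary, `has_content` is a hypothetical predicate, and the default hop limit of 7 is an assumption rather than a value taken from these slides.

```python
from collections import deque

def flood_query(graph, source, has_content, ttl=7):
    """Flood a Query from `source`; return {responder: path back to source}."""
    # parent[n] is the neighbor from which n first received the Query.
    parent = {source: None}
    frontier = deque([(source, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node != source and has_content(node):
            # The QueryHit travels back along the path the Query arrived on.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = parent[hop]
            hits[node] = path
        if remaining == 0:
            continue  # hop limit exhausted: do not forward further
        for neighbor in graph[node]:
            if neighbor not in parent:  # duplicate Queries are dropped
                parent[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits

# Hypothetical five-peer overlay; peers D and E hold the requested file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits = flood_query(overlay, "A", lambda peer: peer in {"D", "E"}, ttl=2)
```

With a hop limit of 2 only peer D answers; peer E is three hops away and never sees the Query, which shows how the hop limit bounds the search horizon.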
Characterization of Users of P2P systems





Latency
Lifetime of peers
Bottleneck bandwidth
Number of files shared and downloaded
Degree of cooperation
Measurement Methodology

active crawling of the Napster and Gnutella systems


Napster: issued queries for popular content, and then queried
central server for peer information
Gnutella: used ping/pong messages in protocol to get metadata
about peers, and then their neighbors and so on
Saroiu et.al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
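The Gnutella crawl described above amounts to a breadth-first traversal of the overlay. A minimal sketch follows, assuming a hypothetical `get_neighbors(peer)` callable that stands in for the ping/pong exchange and raises `ConnectionError` for unreachable peers; it is not the crawler used in the study.

```python
from collections import deque

def crawl(seed_peers, get_neighbors, max_peers=10000):
    """Breadth-first crawl; returns {peer: set of neighbors it reported}."""
    topology = {}
    seen = set(seed_peers)
    queue = deque(seed_peers)
    while queue and len(topology) < max_peers:
        peer = queue.popleft()
        try:
            neighbors = set(get_neighbors(peer))
        except ConnectionError:
            # Unreachable peers are recorded with no known neighbors.
            neighbors = set()
        topology[peer] = neighbors
        for neighbor in neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return topology

# Hypothetical overlay used in place of real network probes.
_fake_overlay = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p4"]}

def fake_get_neighbors(peer):
    if peer not in _fake_overlay:
        raise ConnectionError(peer)
    return _fake_overlay[peer]

snapshot = crawl(["p1"], fake_get_neighbors)
```

Because membership changes while the crawl runs, a snapshot gathered this way is only an approximation of the overlay at any single instant.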
Host Lifetime analysis




20% of the peers in both Napster and Gnutella have an IP-level uptime of 93% or more
Napster peers have higher application-level uptimes than Gnutella peers
the best 20% of Napster peers have an uptime of 83% or more, and the best 20% of Gnutella peers have an uptime of 45% or more
median session duration is 60 minutes for both Napster and Gnutella
Saroiu et.al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
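Statements of the form "the best 20% of peers have an uptime of X or more" are percentile readings of the uptime distribution. A minimal sketch of that reading, with a hypothetical function name and made-up sample values:

```python
def uptime_of_best_fraction(uptimes, fraction=0.2):
    """Return the lowest uptime found among the best `fraction` of peers."""
    if not uptimes:
        raise ValueError("no uptime samples")
    ranked = sorted(uptimes, reverse=True)
    # Number of peers that make up the best `fraction` (at least one).
    count = max(1, int(len(ranked) * fraction))
    return ranked[count - 1]

# Hypothetical per-peer uptime fractions, not measured data.
sample = [0.95, 0.90, 0.50, 0.40, 0.30, 0.20, 0.20, 0.10, 0.10, 0.05]
threshold = uptime_of_best_fraction(sample, 0.2)
```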
Latency analysis (Gnutella)


20% of peers have a latency of at most 70 ms, and 20% have a latency of at least 280 ms
correlation between downstream bottleneck bandwidth and latency: two clusters, for modems (20-60 Kbps, 100-1000 ms) and broadband (1 Mbps, 60-300 ms)
Saroiu et.al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Downloads, Uploads and Shared Files


relative number of downloads and uploads varies significantly across bandwidth classes
clear client/server behavior of different classes
Saroiu et.al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Shared Files vs. Shared Data (Napster and Gnutella)


Strong correlation between the number of files shared and the amount of shared data (in MB)
slope of both lines is 3.7 MB per file, the size of a typical MP3 audio file
Saroiu et.al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
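The slope quoted above can be read as the average size per shared file. The sketch below estimates it with a least-squares line forced through the origin; that fitting choice and the sample points are assumptions for illustration, not the method or data of the study.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of y = m * x (no intercept term)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equally sized, non-empty sequences")
    denominator = sum(x * x for x in xs)
    if denominator == 0:
        raise ValueError("all x values are zero")
    return sum(x * y for x, y in zip(xs, ys)) / denominator

# Hypothetical peers: number of shared files and shared data in MB.
files_shared = [10, 100, 1000]
megabytes_shared = [37.0, 370.0, 3700.0]
mb_per_file = slope_through_origin(files_shared, megabytes_shared)
```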
Degree of Cooperation (Napster)


30% of the peers report bandwidth as 64Kbps or less, but actually have significantly higher bandwidths
10% of the peers reporting higher bandwidths (3Mbps or higher) actually have significantly lower bandwidth
Saroiu et.al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Effect of P2P traffic on underlying network




host distribution and host connectivity
traffic volume and mean bandwidth usage
traffic patterns over time
connection duration and on-time
methodology: passive measurements at routers (port based)
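Port-based classification can be sketched as a lookup on each flow record. The port numbers below are the commonly cited default ports for these systems and are an assumption here, since the slides do not list them; the `(src_port, dst_port, bytes)` record layout is also hypothetical.

```python
# Assumed default ports (not taken from the slides).
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}

def classify_flow(src_port, dst_port):
    """Return the P2P system a flow belongs to, or None if neither port matches."""
    return P2P_PORTS.get(src_port) or P2P_PORTS.get(dst_port)

def bytes_per_system(flows):
    """Sum traffic volume per P2P system over (src_port, dst_port, nbytes) records."""
    totals = {}
    for src_port, dst_port, nbytes in flows:
        system = classify_flow(src_port, dst_port)
        if system is not None:
            totals[system] = totals.get(system, 0) + nbytes
    return totals

# Hypothetical flow records.
flows = [(1214, 40000, 500), (40001, 6346, 200), (80, 40002, 999), (412, 40003, 50)]
volume = bytes_per_system(flows)
```

A classifier of this kind misses peers that run on non-default ports, so the resulting volumes are best treated as lower bounds.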
Datasets used for analysis



FastTrack is most popular in terms of number of hosts participating
and average traffic volume per day
rapid growth of P2P traffic is mainly caused by increasing number of
hosts in the system
Direct Connect systems have higher traffic volume per IP address
S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Host distribution analysis





# of IP addresses in FastTrack ranges from 0.5 to 2 million
ratio of # of IP addresses in FastTrack:Gnutella:DirectConnect is 150:30:1
Density of a prefix is the number of unique active IP addresses belonging to it
Density of an AS is the number of unique prefixes belonging to it
FastTrack hosts are distributed more densely than Gnutella and Direct Connect hosts (64:16:4)
S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
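The two density definitions above translate directly into counting. In this sketch every address is grouped under a fixed /24 prefix, which is a simplifying assumption (a real analysis would map addresses to routed prefixes), and the prefix-to-AS mapping is a hypothetical input.

```python
import ipaddress
from collections import defaultdict

def prefix_density(ip_addresses, prefix_len=24):
    """Number of unique active IP addresses per prefix."""
    hosts = defaultdict(set)
    for ip in ip_addresses:
        # strict=False lets a host address stand in for its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        hosts[str(network)].add(ip)
    return {prefix: len(members) for prefix, members in hosts.items()}

def as_density(prefix_to_as):
    """Number of unique prefixes per AS, following the slide's definition."""
    prefixes = defaultdict(set)
    for prefix, as_number in prefix_to_as.items():
        prefixes[as_number].add(prefix)
    return {as_number: len(members) for as_number, members in prefixes.items()}

# Hypothetical observations using documentation address ranges.
observed = ["192.0.2.1", "192.0.2.7", "192.0.2.1", "198.51.100.3"]
per_prefix = prefix_density(observed)
per_as = as_density({"192.0.2.0/24": 64512, "198.51.100.0/24": 64512,
                     "203.0.113.0/24": 64513})
```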
Host connectivity analysis (FastTrack)



48% of individual IPs communicate with at most one IP and 89% with at most 10 IPs
75% of prefixes and ASes communicate with at least 2 prefixes or ASes
very few hosts have very high connectivity and most hosts have very low connectivity
S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Connection duration and On-time (FastTrack)




50% of the IPs are online for less than one minute/day
60% of IPs, 40% of prefixes and 30% of ASes stay for less than 10 minutes/day
65% of the IPs join only once
at the AS and prefix level, membership is not very transient
S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Peer-to-Peer Topologies


Goal: To discover and analyze the Gnutella overlay
topology and evaluate generated traffic
methodology: active crawling
Gnutella Network Growth



Figure: number of nodes in the largest connected component in the Gnutella network
significantly larger network found during Memorial Day and Thanksgiving
50-fold increase within 6 months
Ripeanu, et.al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer
Systems”, 2002
Distribution of node-to-node shortest paths

more than 95% of node pairs are at most 7 hops apart
longest node-to-node path is 12 hops
Ripeanu, et.al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer
Systems”, 2002
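Given a crawled topology, the path-length distribution can be obtained with a breadth-first search from every node. The sketch below assumes a connected, undirected overlay stored as a hypothetical adjacency dictionary; it is not the tooling used in the paper.

```python
from collections import Counter, deque

def hop_distribution(graph):
    """Count unordered node pairs by shortest-path length (undirected graph)."""
    counts = Counter()
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        for hops in distance.values():
            if hops > 0:
                counts[hops] += 1
    # Each pair was counted once from each endpoint.
    return {hops: count // 2 for hops, count in sorted(counts.items())}

def fraction_within(distribution, max_hops):
    """Fraction of node pairs whose shortest path is at most `max_hops`."""
    total = sum(distribution.values())
    near = sum(count for hops, count in distribution.items() if hops <= max_hops)
    return near / total

# Hypothetical four-node chain A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
distribution = hop_distribution(chain)
```

The largest key of the returned dictionary is the longest node-to-node path, and `fraction_within(distribution, 7)` gives the kind of "pairs within 7 hops" figure quoted on the slide.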
Average node connectivity

average number of connections per node remains constant at 3.4
Ripeanu, et.al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer
Systems”, 2002
Node connectivity distribution


Nov 2000: connectivity of Gnutella nodes follows a power-law distribution
March 2001: connectivity no longer follows a power law for all nodes; the power-law distribution is preserved for nodes with more than 10 links, while for nodes with fewer than 10 links the distribution is almost constant
Ripeanu, et.al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer
Systems”, 2002
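Both connectivity measures discussed above, the average number of connections per node and the connectivity distribution, can be computed from an adjacency list. A minimal sketch on a hypothetical star-shaped overlay:

```python
from collections import Counter

def degree_histogram(graph):
    """Map each degree value to the number of nodes having that degree."""
    return dict(Counter(len(neighbors) for neighbors in graph.values()))

def average_degree(graph):
    """Average number of connections per node."""
    if not graph:
        raise ValueError("empty graph")
    return sum(len(neighbors) for neighbors in graph.values()) / len(graph)

# Hypothetical overlay: one hub connected to four leaf peers.
star = {
    "hub": ["l1", "l2", "l3", "l4"],
    "l1": ["hub"], "l2": ["hub"], "l3": ["hub"], "l4": ["hub"],
}
histogram = degree_histogram(star)
mean_links = average_degree(star)
```

Plotting such a histogram on log-log axes is the usual way to check visually whether the distribution resembles a power law.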
Searching on the P2P network

methodology: passive measurements at one or two peers that were made part of the Gnutella network, to log queries and query messages routed through them
Query popularity distribution



two distinct distributions of document popularity, with a break at query rank 100
most popular documents are equally popular
less popular documents follow a Zipf-like distribution, with alpha between 0.63 and 1.24
K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”,
2001.
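A Zipf-like distribution has frequency proportional to 1/rank^alpha, so alpha can be estimated as the negated slope of a straight-line fit in log-log space. The sketch below uses ordinary least squares on ranks at or beyond `min_rank`, mirroring the break at rank 100; the fitting method and the synthetic counts are assumptions, not the procedure of the cited paper.

```python
import math

def fit_zipf_alpha(counts, min_rank=1):
    """Estimate alpha in count ~ C / rank**alpha for ranks >= min_rank."""
    ranked = sorted(counts, reverse=True)
    points = [
        (math.log(rank), math.log(count))
        for rank, count in enumerate(ranked, start=1)
        if rank >= min_rank and count > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return -covariance / variance

# Synthetic query counts following an exact power law with alpha = 0.8.
synthetic_counts = [1000.0 / rank ** 0.8 for rank in range(1, 501)]
alpha = fit_zipf_alpha(synthetic_counts, min_rank=100)
```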
Deciphering P2P systems
File download distribution by bytes



CDF of byte popularity distribution for the 10% and 1% most popular files
0.8% of all files account for 80% of the generated traffic
0.1% of the most bandwidth hungry files (top 1% of all files) generate 50% of the traffic
Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
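Concentration figures like those above are read off the cumulative distribution of bytes per file. A minimal sketch of that computation, with a hypothetical function name and made-up byte counts:

```python
def traffic_share_of_top(bytes_per_file, top_fraction):
    """Share of total traffic generated by the top `top_fraction` of files."""
    total = sum(bytes_per_file)
    if total == 0:
        raise ValueError("no traffic recorded")
    ranked = sorted(bytes_per_file, reverse=True)
    # Number of files in the top fraction (at least one).
    count = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:count]) / total

# Hypothetical bytes transferred per distinct file.
transferred = [900, 50, 30, 10, 10]
share = traffic_share_of_top(transferred, 0.2)
```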
File size distribution


note the log-scale on X-axis
3 distinct modes
  100KB for pictures
  2-5MB for music files
  700MB for movies
Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
Quantity and Rate of Distinct Files





new files seen at different time scales: every day, hour and minute
150,000 distinct files during a 17-day period
daily graph: the number of new files seen continued to decrease, but no steady-state value (rate of injection of files in the network) was reached
hourly graph: time-of-day effect
per-minute graph: 50 new files seen every minute on average
Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
Rate of change of popularity of files



percentage of files that make it to the list of the N most popular files: (a) in consecutive intervals and (b) after T intervals, compared with the first list
measurement interval is 24 hours
15% of the highly popular files remain popular throughout the experiment, and the rest are popular only for short time intervals
Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
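The two comparisons described above, (a) against the previous interval and (b) against the first interval, can be sketched as set overlaps between top-N lists. The per-interval download counts below are hypothetical, and the function names are illustrative.

```python
from collections import Counter

def top_n(download_counts, n):
    """Set of the n most downloaded file names in one interval."""
    return {name for name, _ in Counter(download_counts).most_common(n)}

def persistence(intervals, n):
    """Return (consecutive, vs_first) overlap percentages for intervals 2..k."""
    tops = [top_n(counts, n) for counts in intervals]
    consecutive = [100 * len(tops[i] & tops[i - 1]) / n for i in range(1, len(tops))]
    vs_first = [100 * len(top & tops[0]) / n for top in tops[1:]]
    return consecutive, vs_first

# Hypothetical download counts for three 24-hour intervals.
day1 = {"a": 10, "b": 8, "c": 5, "d": 1}
day2 = {"a": 9, "c": 7, "d": 6, "b": 2}
day3 = {"b": 9, "d": 8, "a": 3, "c": 1}
consecutive, vs_first = persistence([day1, day2, day3], n=2)
```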
Open Questions




Mapping a global snapshot of the entire Gnutella topology
Bootstrapping of peers in unstructured peer-to-peer systems
More efficient searching on P2P networks: efforts in this direction include random walks, Bloom-filter based techniques, etc.
End-point privacy/anonymity is absent in most of these peer-to-peer networks
References

Papers covered in the seminar:

S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN 2002.
S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW 2002.
M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002.
K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.
N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP 2003.

Papers not covered in the seminar:

J. Chu, K. Labonte and B. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems”, SPIE, July 2002.
F. Bustamante and Y. Qiao, “Friendships that last: Peer lifespan and its role in P2P protocols”, WCW 2003.
R. Bhagwan, S. Savage and G. Voelker, “Understanding Availability”, IPTPS 2003.
Saroiu, et al., “An Analysis of Internet Content Delivery Systems”, OSDI 2002.
Markatos, et al., “Tracing a large-scale Peer-to-Peer System: An hour in the life of Gnutella”, CCGrid 2002.