Characterizing Unstructured Overlay Topologies in Modern P2P File

advertisement
Characterizing Overlay Topologies &
Dynamics in Peer-to-Peer Networks
Daniel Stutzbach, Reza Rejaie
University of Oregon
Subhabrata Sen
AT&T Labs
IEEE Computer & Communications Workshop, Huntington Beach
October 25th, 2005
Motivation

P2P file-sharing systems are very popular in practice.




Most use an unstructured overlay.
Understanding overlay properties & dynamics is important:




Several million simultaneous users collectively.
60% of all Internet traffic [CacheLogic Research 2005]
Understanding how existing P2P systems function
Developing and evaluating new systems
Unstructured overlays are not well-understood.
We characterized overlay topology in Gnutella because



Size: one of the largest P2P systems; more than 1 million users
Mature: In use for several years; older studies for comparisons
Open: No reverse-engineering needed
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 2/18
Defining the Problem
Ultrapeer

Gnutella uses a two-tier overlay.





Improves scalability.
Ultrapeers form an unstructured mesh.
Leaf peers connect to the ultrapeers.
eDonkey, FastTrack are similar.
Studying the overlay requires snapshots.




Top-level overlay
Snapshots capture the overlay as a graph.
Individual snapshots reveal graph properties.
Consecutive snapshots reveal dynamics.
Leaf
However, capturing accurate snapshots is
difficult.
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 3/18
Challenges in Capturing Accurate Snapshots

Snapshots are captured iteratively by a crawler.
 An ideal snapshot is instantaneous.
 But the overlay is large and rapidly changing.


Captured snapshots are likely to be distorted.
Previous studies captured either


Complete snapshots with slow crawler => distorted
Partial snapshots => less distorted, but unrepresentative

Some types of analysis require the whole graph.
 Increasing crawler speed reduces distortion in
captured snapshots.
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 4/18
Cruiser: a Fast Gnutella Crawler

Features:



Cruiser is orders of magnitude faster than other P2P
crawlers:



Distributed, highly parallelized implementation
Dynamic adaptation to bandwidth & CPU constraints
Captures one million nodes in around 7 minutes
140,000 peers/min, compared to 2,500 peers/min [Saroiu 02]
We investigated the effects of speed on distortion.

4% node distortion and 15% edge distortion
 Daniel Stutzbach and Reza Rejaie, “Capturing Accurate Snapshots
of the Gnutella Network”, the Global Internet Symposium, March,
2005.
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 5/18
Data Set


More than 80,000 snapshots, over the past year.
To examine static properties, we focus on four:
Date
Total Nodes
Leaves
Ultrapeers
Top-level
Edges
9/27/04
725,120
614,912
110,208
1,212,772
10/11/04
779,535
662,568
116,967
1,244,219
10/18/04
806,948
686,719
120,229
1,331,745
2/2/05
1,031,471
873,130
158,345
1,964,121

To examine dynamic properties, we use slices:


CCW 2005
Each slice is 2 days of ~500 back-to-back snapshots
Captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04,
and 12/27/04
http://mirage.cs.uoregon.edu/P2P
Slide 6/18
Summary of Characterizations


Graph Properties


Implementation
heterogeneity
Degree Distribution:




Reachability:




CCW 2005
Top-level degree
distribution
Ultrapeer-leaf connectivity
Degree-distance
correlation
Dynamic Properties

Existence of stable core:



Uptime distribution
Biased connectivity
Properties of stable core:



Largest connected
component
Path lengths
Clustering coefficient
Path lengths
Eccentricity
Small world properties
Resiliency
http://mirage.cs.uoregon.edu/P2P
Slide 7/18
Top-level Degree
Max 30 in most clients
Max 75 in some clients
Custom




This is the degree distribution among ultrapeers.
There are obvious peaks at 30 and 70 neighbors.
A substantial number of ultrapeers have fewer than 30.
What happened to the power-law reported by prior studies?
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 8/18
What happened to power-law?
[Ripeanu 02 ICJ]



When a crawl is slow, many short-lived peers report long-lived
peers as neighbors.
But those neighbors are not all present at the same time.
Degree distribution from a slow crawl resembles prior results.
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 9/18
Shortest-Path Distances

Distribution of distances among ultrapeers (left)


Distribution of distances among all peers (right)



70% of distances are exactly 4 hops.
Most distances are 5 or 6 hops.
Shows the effect of the two-tier with multiple parents
Despite large size, pair-wise distances are short.
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 10/18
Is Gnutella a Small World?

Small worlds arise naturally in many places.


Movies actors, power grid, co-authors of papers
Small world graphs have short distances, but significant
clustering, compared to a similar random graph.
Mean
Clustering
Distance Coefficient

Gnutella
4.2
0.018
Random
3.8
0.00038
Gnutella is a small world.


CCW 2005
Very high clustering adversely affects flooding queries.
But Gnutella isn’t too clustered to affect performance.
http://mirage.cs.uoregon.edu/P2P
Slide 11/18
Resiliency to Node Failure
Random
Highest degree first




Ratio of connected peers after node failure.
The Gnutella topology is extremely resilient to random node failure.
It’s resilient even when the highest-degree nodes are removed.
Complex algorithms are not necessary to achieve resiliency.
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 12/18
Dynamic Properties

How does node churn affect overlay dynamics?
Are some “regions” of the overlay more stable?
How can we identify such a region?
Methodology:







Capture a long series of back-to-back snapshots
Estimate the uptime of individual peers in the last snapshot
Group peers with uptime higher than a threshold
Examine biased connectivity within each group
Present for 5 snapshots
Present for 2 snapshots
Departed peer
Newly arrived peer
CCW 2005
Time
http://mirage.cs.uoregon.edu/P2P
Slide 13/18
Stable Core
T > 20 h



Most peers have a short uptime.
T > 10 h
Other peers have been around for a long time.
Stable core: a set of peers with uptime higher than a threshold (T).

CCW 2005
Higher threshold => more stable group of peers
http://mirage.cs.uoregon.edu/P2P
Slide 14/18
Biased Connectivity

Hypothesis: long-lived nodes tend to be more
connected to other long-lived nodes



Rationale: Once connected, they stay connected.
Long-lived peers have more opportunities to
become neighbor.
To quantify bias in the connectivity of the
stable core:


CCW 2005
Randomize the edges to create a graph without
biased connectivity.
Compare the edges in the observed stable core
with the randomized graph.
http://mirage.cs.uoregon.edu/P2P
Slide 15/18
Stable Core Edges




20%—40% more edges in the stable core compared to random.
Connectivity exhibits an onion-like biased connectivity where peers
are more likely to connect to other peers with same/higher uptime.
We examined other properties of the stable core.
Despite high churn, there is a relatively stable “backbone”.
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 16/18
Summary


Characterizations of Gnutella overlay based on recent
and accurate snapshots.
Graph properties:




Dynamic properties:




The degree distribution in Gnutella is not power law.
Gnutella exhibits small world characteristics.
Gnutella is resilient.
There is a stable core within the overlay topology.
Peer churn causes the stable core to exhibit an onion-like biased
connectivity.
This effect is likely to occur in other unstructured P2P systems.
Daniel Stutzbach, Reza Rejaie, Subhabrata Sen, “Characterizing
Unstructured Overlay Topologies in Modern P2P File-Sharing Systems”,
Internet Measurement Conference, Berkeley, 2005
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 17/18
Future Work

Examining underlying causes of the biased
connectivity.
 Exploring long-term trends in overlay properties.
 Characterizing churn
 Characterizing properties of other widelydeployed P2P systems



Kad (a DHT with more than 1 million users)
BitTorrent
Developing sampling techniques for P2P
CCW 2005
http://mirage.cs.uoregon.edu/P2P
Slide 18/18
Download