Characterizing Unstructured Overlay Topologies in Modern P2P File

advertisement
Characterizing Unstructured
Overlay Topologies in Modern
P2P File-Sharing Systems
Daniel Stutzbach – University of Oregon
Reza Rejaie – University of Oregon
Subhabrata Sen – AT&T Labs
Internet Measurement Conference
Berkeley, CA, USA
October 19th, 2005
Motivation

P2P file-sharing systems are very popular in practice.




Most use an unstructured overlay
Understanding overlay properties is important:




Several million simultaneous users collectively.
60% of all Internet traffic [CacheLogic Research 2005]
Understanding how existing P2P systems function
Developing and evaluating new systems
Unstructured overlays are not well-understood.
We studied overlay properties in Gnutella.



Size: one of the largest P2P systems; more than 1 million users
Mature: In use for several years; older studies for comparisons
Open: No reverse-engineering needed
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 2/18
Defining the Problem
Ultrapeer

Gnutella uses a two-tier overlay.





Improves scalability.
Ultrapeers form an unstructured mesh.
Leaf peers connect to the ultrapeers.
eDonkey, FastTrack are similar.
Studying the overlay requires snapshots.




Top-level overlay
Snapshots capture the overlay as a graph.
Individual snapshots reveal graph properties.
Consecutive snapshots reveal dynamics.
Leaf
However, capturing accurate snapshots is
difficult.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 3/18
Challenges in Capturing Accurate Snapshots





Snapshots are captured iteratively by a crawler.
An ideal snapshot is instantaneous.
But the overlay is large and rapidly changing.
Therefore, captured snapshots are distorted.
Sampling:



Partial snapshots are less distorted, but may be unrepresentative
For some types of analysis, the whole graph is needed.
Previous studies capture either:


Complete snapshots slowly, or
Partial snapshots.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 4/18
Cruiser: a Fast Gnutella Crawler

Features:



Cruiser is orders of magnitude faster.



Distributed, highly parallelized implementation
Dynamic adaptation to bandwidth and CPU constraints
Captures one million nodes in around 7 minutes
140,000 peers/min, compared to 2,500 peers/min [Saroiu 02]
We investigated the effects of speed on distortion.

Daniel Stutzbach and Reza Rejaie, “Capturing Accurate Snapshots of the
Gnutella Network”, the Global Internet Symposium, March, 2005.

4% node distortion
15% edge distortion

October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 5/18
Data Set


More than 80,000 snapshots, over the past year.
To examine static properties, we focus on four:
Date
Total Nodes
Leaves
Ultrapeers
Top-level
Edges
9/27/04
725,120
614,912
110,208
1,212,772
10/11/04
779,535
662,568
116,967
1,244,219
10/18/04
806,948
686,719
120,229
1,331,745
2/2/05
1,031,471
873,130
158,345
1,964,121

To examine dynamic properties, we use slices:


Each slice is 2 days of ~500 back-to-back snapshots
Captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04,
and 12/27/04
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 6/18
Summary of Characterizations


Graph Properties


Implementation
heterogeneity
Degree Distribution:




Reachability:




Top-level degree
distribution
Ultrapeer-leaf connectivity
Degree-distance
correlation
Dynamic Properties

Existence of stable core:



Uptime distribution
Biased connectivity
Properties of stable core:



Largest connected
component
Path lengths
Clustering coefficient
Path lengths
Eccentricity
Small world properties
Resiliency
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 7/18
Top-level Degree
Max 30 in most clients
Max 75 in some clients
Custom




This is the degree distribution among ultrapeers.
There are obvious peaks at 30 and 70 neighbors.
A substantial number of ultrapeers have fewer than 30.
What happened to the power-law seen in prior studies?
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 8/18
What happened to power-law?
[Ripeanu 02 ICJ]



When a crawl is slow, many short-lived peers report long-lived
peers as neighbors.
However, those neighbors are not all present at the same time.
Degree distribution from a slow crawl resembles prior results.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 9/18
Shortest-Path Distances



Distribution of distances among ultrapeers and among all peers
In the top-level, 70% of distances are exactly 4 hops.
Across all peers, most distances are 5 or 6 hops.


Shows the effect of the two-tier with multiple parents
Despite large size, distances are short.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 10/18
Is Gnutella a Small World?

Small worlds arise naturally in many places.


Movies actors, power grid, co-authors of papers
They have short distances, but significant clustering,
compared to a similar random graph.
Mean
Clustering
Distance Coefficient

Gnutella
4.2
0.018
Random
3.8
0.00038
Conclusion: Gnutella is a small world.


Very high clustering adversely affects flooding queries
But Gnutella isn’t clustered enough to affect performance.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 11/18
Resiliency to Node Failure
Random
Highest degree first




After removing nodes, this figure shows how many remain connected.
The Gnutella topology is extremely resilient to random node failure.
It’s resilient even when the highest-degree nodes are removed first.
Complex algorithms are not necessary for ensuring resilience.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 12/18
What about Dynamic Properties?



Prior work suggests many peers are short-lived while
others are very long-lived.
How do these nodes interact?
Methodology:




Capture a long series of back-to-back snapshots
Annotate the last snapshot with the uptime of each peer
Examine the properties of the annotated topology
Group peers by uptime
Present for 5 snapshots
Present for 2 snapshots
Departed peer
Newly arrived peer
Time
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 13/18
Stable Core
> 20 h


Most peers are recent arrivals.
> 10 h
Other peers have been around for a long time.
 We can select a set of peers based on a minimum uptime threshold.
 We call this the stable core.
 Does the longevity of a peer affect who its neighbors are?
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 14/18
Biased Connectivity

Hypothesis: long-lived nodes tend to be more
connected to other long-lived nodes



Rationale: Once connected, they stay connected.
The longer they’re around, the more opportunities
they have to neighbor.
Approach: Check for biased connectivity



Randomize the edges to create a graph without
biased connectivity
Then compare
Are there more edges in the observed stable core
compared to random?
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 15/18
Stable Core Edges




20%—40% more edges in the stable core compared to random.
There is an onion-like bias where long-lived peers are more likely
to be connected to other long-lived peers.
We examined other properties of the stable core.
Despite high churn, there is a relatively stable “backbone”.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 16/18
Summary


Characterizations of recent and accurate snapshots
Graph properties:




The degree distribution in Gnutella is not power law.
Gnutella exhibits small world characteristics.
Gnutella is resilient.
Dynamic properties:



There is a stable core within the topology
Peer churn causes the stable core to have an onion-like shape.
This effect is likely to occur in any unstructured system.
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 17/18
Future Work
Examining long-term trends in Gnutella
using many snapshots.
 Characterizing churn
 Characterizing properties of other widelydeployed P2P systems

Kad (a DHT with more than 1 million users)
 BitTorrent


Developing sampling techniques for P2P
October 19th, 2005
http://mirage.cs.uoregon.edu/P2P
IMC 2005
Slide 18/18
Download