Characterizing Overlay Topologies & Dynamics in Peer-to-Peer Networks

Daniel Stutzbach, Reza Rejaie (University of Oregon)
Subhabrata Sen (AT&T Labs)

IEEE Computer & Communications Workshop, Huntington Beach
October 25th, 2005

Motivation

- P2P file-sharing systems are very popular in practice.
  - Several million simultaneous users collectively
  - 60% of all Internet traffic [CacheLogic Research 2005]
- Most use an unstructured overlay.
- Understanding overlay properties & dynamics is important for:
  - Understanding how existing P2P systems function
  - Developing and evaluating new systems
- Unstructured overlays are not well understood.
- We characterized the overlay topology in Gnutella because of its:
  - Size: one of the largest P2P systems; more than 1 million users
  - Maturity: in use for several years; older studies available for comparison
  - Openness: no reverse-engineering needed

CCW 2005 http://mirage.cs.uoregon.edu/P2P Slide 2/18

Defining the Problem

- Gnutella uses a two-tier overlay.
  - Improves scalability.
  - Ultrapeers form an unstructured mesh.
  - Leaf peers connect to the ultrapeers.
  - eDonkey and FastTrack are similar.
- Studying the overlay requires snapshots.
  - Snapshots capture the overlay as a graph.
  - Individual snapshots reveal graph properties.
  - Consecutive snapshots reveal dynamics.
- However, capturing accurate snapshots is difficult.

[Figure: two-tier overlay; labels: Ultrapeer, Top-level overlay, Leaf]

Challenges in Capturing Accurate Snapshots

- Snapshots are captured iteratively by a crawler.
- An ideal snapshot is instantaneous.
- But the overlay is large and rapidly changing.
- Captured snapshots are likely to be distorted.
- Previous studies captured either:
  - Complete snapshots with a slow crawler => distorted
  - Partial snapshots => less distorted, but unrepresentative
- Some types of analysis require the whole graph.
- Increasing crawler speed reduces distortion in captured snapshots.
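The iterative capture described above can be illustrated with a minimal, single-threaded sketch. This is not Cruiser's implementation (which is distributed and highly parallel); `get_neighbors` is a hypothetical stand-in for asking a live peer for its neighbor list, and the toy `overlay` dictionary is invented for illustration.

```python
from collections import deque


def crawl(seeds, get_neighbors):
    """Breadth-first crawl: return the (nodes, edges) reachable from the seeds."""
    nodes = set(seeds)
    edges = set()
    queue = deque(seeds)
    while queue:
        peer = queue.popleft()
        # In a real crawl this is a network round trip to a live peer.
        for nbr in get_neighbors(peer):
            edges.add(frozenset((peer, nbr)))  # store edges as undirected
            if nbr not in nodes:
                nodes.add(nbr)
                queue.append(nbr)
    return nodes, edges


# Hypothetical four-peer overlay used only to exercise the function.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
nodes, edges = crawl(["A"], lambda peer: overlay.get(peer, []))
print(len(nodes), "nodes,", len(edges), "edges")
```

Because each neighbor query takes time on a real network, peers that join or leave while the traversal is in progress end up in the snapshot inconsistently; a faster crawl shrinks that window.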
Cruiser: a Fast Gnutella Crawler

- Features:
  - Distributed, highly parallelized implementation
  - Dynamic adaptation to bandwidth & CPU constraints
- Cruiser is orders of magnitude faster than other P2P crawlers:
  - Captures one million nodes in around 7 minutes
  - 140,000 peers/min, compared to 2,500 peers/min [Saroiu 02]
- We investigated the effects of speed on distortion.
  - 4% node distortion and 15% edge distortion

Daniel Stutzbach and Reza Rejaie, “Capturing Accurate Snapshots of the Gnutella Network”, Global Internet Symposium, March 2005.

Data Set

- More than 80,000 snapshots, over the past year.
- To examine static properties, we focus on four:

  Date       Total Nodes   Leaves    Ultrapeers   Top-level Edges
  9/27/04    725,120       614,912   110,208      1,212,772
  10/11/04   779,535       662,568   116,967      1,244,219
  10/18/04   806,948       686,719   120,229      1,331,745
  2/2/05     1,031,471     873,130   158,345      1,964,121

- To examine dynamic properties, we use slices:
  - Each slice is 2 days of ~500 back-to-back snapshots
  - Captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04, and 12/27/04

Summary of Characterizations

- Graph Properties
  - Implementation heterogeneity
  - Degree distribution:
    - Top-level degree distribution
    - Ultrapeer-leaf connectivity
    - Degree-distance correlation
  - Reachability:
    - Path lengths
    - Eccentricity
  - Small world properties
  - Resiliency
- Dynamic Properties
  - Existence of stable core:
    - Uptime distribution
    - Biased connectivity
  - Properties of stable core:
    - Largest connected component
    - Path lengths
    - Clustering coefficient

Top-level Degree

[Figure: degree distribution among ultrapeers; annotations: Max 30 in most clients, Max 75 in some clients, Custom]

- This is the degree distribution among ultrapeers.
- There are obvious peaks at 30 and 70 neighbors.
- A substantial number of ultrapeers have fewer than 30.
- What happened to the power-law reported by prior studies?
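As a rough illustration of how a top-level degree distribution can be tabulated from a snapshot, here is a minimal sketch. It assumes the snapshot is available as a list of undirected ultrapeer-to-ultrapeer edges with each edge listed once; the function name and the toy edge list are hypothetical, not taken from the study.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of nodes that have that degree."""
    degree = Counter()
    for u, v in edges:
        # Each undirected edge contributes one to both endpoints.
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical top-level edges among four ultrapeers.
edges = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u3")]
hist = degree_distribution(edges)
for deg in sorted(hist):
    print(deg, hist[deg])
```

Plotting such a histogram on log-log axes is the usual way to check for a power law: a power-law distribution would appear roughly as a straight line, whereas peaks at particular degree values would not.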
What happened to the power-law? [Ripeanu 02 ICJ]

- When a crawl is slow, many short-lived peers report long-lived peers as neighbors.
- But those neighbors are not all present at the same time.
- The degree distribution from a slow crawl resembles prior results.

Shortest-Path Distances

- Distribution of distances among ultrapeers (left)
  - 70% of distances are exactly 4 hops.
- Distribution of distances among all peers (right)
  - Most distances are 5 or 6 hops.
  - Shows the effect of the two-tier overlay with multiple parents
- Despite the large size, pair-wise distances are short.

Is Gnutella a Small World?

- Small worlds arise naturally in many places.
  - Movie actors, power grid, co-authors of papers
- Small world graphs have short distances but significant clustering, compared to a similar random graph.

  Graph      Mean Distance   Clustering Coefficient
  Gnutella   4.2             0.018
  Random     3.8             0.00038

- Gnutella is a small world.
- Very high clustering adversely affects flooding queries.
- But Gnutella is not clustered enough to affect performance.

Resiliency to Node Failure

[Figure: ratio of connected peers after node failure; curves: Random, Highest degree first]

- The Gnutella topology is extremely resilient to random node failure.
- It is resilient even when the highest-degree nodes are removed.
- Complex algorithms are not necessary to achieve resiliency.

Dynamic Properties

- How does node churn affect overlay dynamics?
  - Are some “regions” of the overlay more stable?
  - How can we identify such a region?
- Methodology:
  - Capture a long series of back-to-back snapshots
  - Estimate the uptime of individual peers in the last snapshot
  - Group peers with uptime higher than a threshold
  - Examine biased connectivity within each group

[Figure: peers across snapshots over time; legend: Present for 5 snapshots, Present for 2 snapshots, Departed peer, Newly arrived peer]

Stable Core

[Figure annotations: T > 10 h, T > 20 h]

- Most peers have a short uptime.
- Other peers have been around for a long time.
- Stable core: a set of peers with uptime higher than a threshold (T).
  - Higher threshold => more stable group of peers

Biased Connectivity

- Hypothesis: long-lived nodes tend to be more connected to other long-lived nodes.
- Rationale:
  - Once connected, they stay connected.
  - Long-lived peers have more opportunities to become neighbors.
- To quantify bias in the connectivity of the stable core:
  - Randomize the edges to create a graph without biased connectivity.
  - Compare the edges in the observed stable core with the randomized graph.

Stable Core Edges

- 20% to 40% more edges in the stable core compared to random.
- Connectivity is biased in an onion-like fashion: peers are more likely to connect to other peers with the same or higher uptime.
- We examined other properties of the stable core.
- Despite high churn, there is a relatively stable “backbone”.

Summary

- Characterizations of the Gnutella overlay based on recent and accurate snapshots.
- Graph properties:
  - The degree distribution in Gnutella is not power law.
  - Gnutella exhibits small world characteristics.
  - Gnutella is resilient.
- Dynamic properties:
  - There is a stable core within the overlay topology.
  - Peer churn causes the stable core to exhibit an onion-like biased connectivity.
  - This effect is likely to occur in other unstructured P2P systems.
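The comparison between the observed stable core and a randomized graph, described under Biased Connectivity, can be sketched as follows. The talk does not say how the edges were randomized; this sketch assumes degree-preserving randomization by repeated double-edge swaps, which is one common choice. All function names, the toy graph, and the chosen `core` set are hypothetical.

```python
import random


def degrees(edges):
    """Return a dict mapping each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def core_edge_count(edges, core):
    """Count edges whose two endpoints are both in the stable core."""
    return sum(1 for u, v in edges if u in core and v in core)


def randomize_edges(edges, swaps, rng):
    """Shuffle edges with double-edge swaps, keeping every node's degree."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < swaps and attempts < 100 * swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2:
            continue  # swap would create a self-loop
        if new1 in present or new2 in present or new1 == new2:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Toy overlay: peers 1-4 play the role of the stable core (assumption).
core = {1, 2, 3, 4}
observed = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
            (5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 5),
            (1, 5), (2, 7), (3, 9)]
shuffled = randomize_edges(observed, swaps=50, rng=random.Random(1))
print("core edges observed:", core_edge_count(observed, core))
print("core edges randomized:", core_edge_count(shuffled, core))
```

Averaging the randomized count over many independent shuffles gives a more stable baseline; the excess of observed core edges over that baseline is the bias being measured.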
Daniel Stutzbach, Reza Rejaie, Subhabrata Sen, “Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems”, Internet Measurement Conference, Berkeley, 2005.

Future Work

- Examining underlying causes of the biased connectivity
- Exploring long-term trends in overlay properties
- Characterizing churn
- Characterizing properties of other widely deployed P2P systems
  - Kad (a DHT with more than 1 million users)
  - BitTorrent
- Developing sampling techniques for P2P