A Measurement Study of Peer-to-Peer File Sharing Systems
Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble
Presented by Zhengxiang Pan
March 18th, 2003

Introduction
• Napster & Gnutella
• Population of users
• Bottleneck bandwidth of hosts & latencies
• How long peers remain connected
• Number of files shared & downloaded

Methodology-architecture
• Napster’s architecture
  – A cluster of central servers
  – Each peer connects to one server
  – Servers cooperate to process queries
• Gnutella’s architecture
  – No centralized servers
  – Peers form an overlay network
  – Queries are sent by a controlled flood

Methodology-crawler
• Napster crawler
  – Opens a large number of connections to a single server
  – Issues popular queries in parallel
  – Captured 40%-60% of the server's local users
• Gnutella crawler
  – Iteratively sends ping messages with large TTLs
  – Discovers new hosts from the pong messages it receives
  – Captured 25%-50% of the total population

Methodology-directly measured characteristics
• Latency
  – Measure the round-trip time of a 40-byte TCP packet exchanged with each peer
• Lifetime
  – Offline: does not respond to TCP SYN packets
  – Inactive: responds with a TCP RST
  – Active: accepts the connection
• Bottleneck bandwidth
  – Used as an approximation of the available bandwidth
  – Upstream and downstream measured actively using a few TCP packets

Results-bandwidth
Downstream & upstream bottleneck bandwidth
-50% in Napster & 60% in Gnutella use broadband connections
-25% in Napster & 8% in Gnutella use modems
-20% in Napster & 30% in Gnutella have high bandwidth (>3 Mbps)

Results-reported bandwidth
-22% of Napster users report “unknown” bandwidth

Results-latency
Latencies for Gnutella users
-Unstructured, ad-hoc overlay: a substantial fraction of peers suffer from high latency
-Latency differences are largely due to trans-oceanic peers

Results-availability
-Only 20% of peers had an IP-level uptime of 93% or more
-Median session duration: 60 minutes

Results-files
-25% of Gnutella peers do not share any files
-40%-60% of peers share only 5%-20% of the shared files

Results-download & upload
The percentage of peers in each bandwidth class is roughly the same as the percentage of files shared by that bandwidth class.

Results-cooperation
-30% of the users who report their bandwidth as 64 Kbps or less actually have significantly greater bandwidth.
-10% of the users who report high bandwidth (3 Mbps or higher) actually have significantly lower bandwidth.

Results-resilience of the Gnutella overlay
Although highly resilient to random breakdowns, Gnutella is highly vulnerable to well-orchestrated, targeted attacks.

Conclusion
• Hosts are heterogeneous
  – Delegate responsibilities carefully
• Clear evidence of both client-like and server-like behaviors
• Peers tend to misreport information if there is an incentive to do so
  – Build in incentives for telling the truth
  – Verify reported information
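The Gnutella crawler from the Methodology-crawler slide can be pictured as a breadth-first traversal of the overlay. The sketch below is a simplification under stated assumptions: the overlay is a hypothetical in-memory dict mapping each peer to the neighbours it would reveal in its pong replies, and a hop limit stands in for the ping TTL. It does not model the Gnutella wire protocol or the crawler actually used in the study.

```python
from collections import deque


def crawl(overlay, seeds, max_hops):
    """Discover peers reachable within max_hops of the seed peers.

    `overlay` is a hypothetical dict: peer -> peers it would report via pongs.
    Each hop corresponds to one TTL decrement of a forwarded ping.
    """
    discovered = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        peer, hops = frontier.popleft()
        if hops == max_hops:
            continue  # TTL exhausted: this ping is not forwarded any further
        for neighbour in overlay.get(peer, ()):
            if neighbour not in discovered:
                discovered.add(neighbour)  # learned from a pong reply
                frontier.append((neighbour, hops + 1))
    return discovered
```

With `overlay = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}`, a 2-hop crawl from `"a"` finds `a, b, c, d` but misses `e`; peers that never answer or sit beyond the TTL are one reason a real crawl captures only part of the population.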
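The three lifetime states on the Methodology slide map directly onto the outcome of a single TCP connection attempt, and the IP-level uptime on the Results-availability slide is the fraction of probes in which the host answered at all (with either an accepted connection or a RST). A minimal sketch, assuming the peer's address and port are already known from the crawl and that probes are taken at evenly spaced intervals; `probe` and `uptime` are hypothetical helper names, not the study's tooling.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify a peer from one TCP connection attempt (sketch).

    "active":   the connection is accepted
    "inactive": the host answers the SYN with a RST (connection refused)
    "offline":  no answer within `timeout`, or the host is unreachable
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "active"
    except ConnectionRefusedError:
        return "inactive"
    except OSError:
        # Timeouts are OSError subclasses; treating them and other network
        # errors as "offline" is an assumption of this sketch.
        return "offline"


def uptime(samples):
    """Fraction of evenly spaced probes in which the host was online at the IP level."""
    if not samples:
        return 0.0
    online = sum(1 for s in samples if s != "offline")
    return online / len(samples)
```

A host that is "inactive" still counts toward IP-level uptime, since a RST proves the machine is up even though no client is listening.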
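Bottleneck bandwidth can be estimated "using a few TCP packets" because packets sent back to back get spread out by the slowest link on the path. The sketch below assumes a packet-pair style estimate (bandwidth = packet size / arrival gap), which is the general dispersion technique and not necessarily the exact tool used in the study. It also adds a hypothetical `misreported` check in the spirit of the Results-cooperation slide and the "verify reported information" conclusion; the factor-of-2 threshold is an illustrative assumption.

```python
from statistics import median


def packet_pair_bandwidth(packet_bytes, arrival_times):
    """Estimate bottleneck bandwidth in bits/s from back-to-back packet arrivals.

    Assumes the packets left the sender back to back, so the spacing on
    arrival is set by the bottleneck link. The median gap damps noise
    from cross traffic.
    """
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    gaps = [g for g in gaps if g > 0]
    if not gaps:
        raise ValueError("need at least two distinct arrival times")
    return packet_bytes * 8 / median(gaps)


def misreported(reported_bps, measured_bps, factor=2.0):
    """Flag a peer whose measured bandwidth differs from its report by more than `factor`."""
    return measured_bps > factor * reported_bps or measured_bps * factor < reported_bps
```

For example, 1250-byte packets arriving 1 ms apart give 10,000 bits / 0.001 s, about 10 Mbps; a peer reporting 64 Kbps with that measurement would be flagged.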
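The resilience result contrasts two ways of removing peers: random breakdowns versus a targeted attack on the best-connected peers. A minimal sketch of that comparison, assuming the overlay is an undirected adjacency dict in which every neighbour is also a key; the helper names are hypothetical and this is not the study's analysis code.

```python
import random
from collections import deque


def largest_component(adj, removed):
    """Size of the largest connected component after deleting the `removed` peers."""
    seen = set(removed)  # removed peers are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def targeted(adj, k):
    """The k highest-degree peers: the nodes a targeted attack would remove."""
    return set(sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k])


def random_failures(adj, k, rng=None):
    """k peers chosen uniformly at random, modelling random breakdowns."""
    if rng is None:
        rng = random.Random()
    return set(rng.sample(list(adj), k))
```

On a star-shaped overlay with one hub and nine leaves, removing one random leaf leaves a component of 9 peers, while removing the single highest-degree peer leaves only isolated leaves. Overlays whose connectivity depends on a few high-degree peers show the same asymmetry at larger scale.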