CS7701: Research Seminar on Networking
http://arl.wustl.edu/~jst/cse/770/
Review of:
Making Gnutella-like P2P Systems Scalable
• Paper by:
– Yatin Chawathe (AT&T)
– Sylvia Ratnasamy (Intel)
– Lee Breslau (AT&T)
– Nick Lanham (UC Berkeley)
– Scott Shenker (ICSI)
• Published in:
– ACM SIGCOMM 2003
• Reviewed by:
– Todd Sproull
• Discussion Leader:
– Christoph Jechlitschek
Outline
• Introduction
• Problem Description
• Gia Design
• Simulation Results
• Implementation
• Conclusions
Introduction
• Peer-to-Peer (P2P) Networks
– “Systems serving other Systems”
– Potential for millions of users
– Gained consumer popularity through Napster
• Napster
– Started in 1999 by Shawn Fanning
– Enabled music fans to trade songs over a P2P network
– Clients connected to centralized Napster Servers to locate music
– In 2001, a judge ruled Napster had to block all copyrighted material
– In 2002, Napster folded
• The RIAA continued to pursue Napster clones
• Gnutella
– On March 14, 2000, Nullsoft released the first version of the software
• Created by Justin Frankel and Tom Pepper
• Nullsoft pulled the software the next day
– Software was reverse engineered
– Open Source clients became available
– Built around decentralized approach
Gnutella
• Distributed search and download
• Unstructured: ad-hoc topology
– Peers connect to random nodes
• Random search
– Flood queries across network
• Scaling problems
– As network grows, search overhead increases
[Figure: a query for “madonna ray-of-light.mp3” is flooded among peers P1–P6; P2 responds, “P2 has madonna ray-of-light.mp3”.]
Problem
• Gnutella has notoriously poor scaling
– Caused by its flooding-based search
– Just using Distributed Hash Tables does not
necessarily fix the problem
• Challenge
– Improve scaling while maintaining Gnutella’s simplicity
• Propose new mechanisms to fix scalability issues
• Evaluate performance of these individual
components and the entire network
What about DHTs?
• Distributed Hash Tables (DHTs)
– Provide a hash-table abstraction over multiple compute nodes
• How it works (see the sketch below)
– Each DHT node can store data items
– Data items are indexed via a lookup key
– Overlay routing delivers requests for a given key to the responsible node
– O(log N) message hops in a network of N nodes
– The DHT adjusts its mapping of keys and its neighbor tables when the node set changes
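A minimal sketch of these ideas, assuming a Chord-style identifier ring. The identifier size M, the node names, and every helper below are illustrative assumptions, not any particular DHT's API:

import hashlib

M = 16                                    # bits in the identifier space

def ident(s):
    """Hash a node address or a data key onto the identifier ring."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** M)

def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % (2 ** M)

def successor(ring, key_id):
    """The responsible node: the first node clockwise from key_id."""
    return min(ring, key=lambda n: dist(key_id, n.id))

class Node:
    def __init__(self, addr, ring):
        self.id = ident(addr)
        self.ring = ring                  # global view, used to build fingers

    def fingers(self):
        # Chord-style finger table: successor of id + 2^i for each i;
        # this is what makes lookups take O(log N) hops.
        return [successor(self.ring, (self.id + 2 ** i) % (2 ** M))
                for i in range(M)]

def lookup(start, key):
    """Overlay routing: hop to the farthest finger that does not overshoot
    the key; fall back to the immediate successor to guarantee progress."""
    key_id, node, hops = ident(key), start, 0
    while node is not successor(node.ring, key_id):
        preceding = [f for f in node.fingers()
                     if 0 < dist(node.id, f.id) < dist(node.id, key_id)]
        node = (max(preceding, key=lambda f: dist(node.id, f.id))
                if preceding else node.fingers()[0])
        hops += 1
    return node, hops

ring = []
for addr in ("A", "B", "C", "D", "E"):
    ring.append(Node(addr, ring))
owner, hops = lookup(ring[0], "key-6")
print(f"key-6 maps to node id {owner.id}, reached in {hops} hops")

Because each finger roughly halves the remaining distance to the key, the hop count stays logarithmic in the number of nodes, which is where the O(log N) bound on message hops comes from.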
Example
[Figure: DHT lookup walkthrough. Node A issues “Key 6?”; B’s routing table (key 7 → C, key 8 → D) forwards the query, C answers “Nope!”, and D’s routing table (key 6 → E) delivers it to E, which responds “I have key 6!”.]
DHT-only P2P network?
• Problems
– P2P clients are transient
• Clients join and leave at rates that cause a fair amount of “churn”
• Route failures require O(log N) repair operations
– Keyword searches are more prevalent, and more important, than exact-match queries
• “Madonna Ray of Light mp3” or “Madona Ray Light mp3” ..
– Queries are for hay, not needles
• Most requests for popular content
• 50% of requests are for content with more than 100 replicas
• 80% of requests are for content with more than 80 replicas
The Solution
• Design a new Gnutella-like P2P system, “Gia”
– Short for gianduia, the generic form of the hazelnut spread Nutella
• What’s so great about it?
– Dynamic Topology Adaptation
• Accounts for heterogeneity among nodes
– Active Flow Control Scheme
• Implements token based allocation for queries
– One-hop replication
• Keep small nodes next to well-connected, higher-capacity nodes
– Capacity refers to a node’s message-processing capability per unit time
– Search Protocol based on Random Walks
• No longer flooding the network with requests
Example
• Make high-capacity nodes easily reachable
– Dynamic topology adaptation
• Make high-capacity nodes have more answers
– One-hop replication
• Search efficiently
– Biased random walks
• Prevent overloaded nodes
– Active flow control
Dynamic Topology Adaptation
• Core Component of Gia
• Goals
– Ensure high capacity nodes are ones with high
degree
– Keep low capacity nodes within short reach of
high capacity nodes
• Accomplished through a satisfaction level S
– When S = 0, the node is fully dissatisfied
– As the node accumulates neighbors, satisfaction rises until it reaches a level of 1
Adding new neighbors
• Adding neighbor Y to X
– Add the new neighbor if room exists
– If no room, check whether an existing neighbor can be replaced
– Goal:
• Find an existing neighbor with capacity less than or equal to the new neighbor’s, and with the highest degree
• Do not drop an already poorly connected neighbor
• Assumptions in the figure:
– Max neighbors of X = 3
– Capacity of all nodes the same
[Figure: Y requests to join X, which already has neighbors A, B, and C. See the sketch below.]
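A hedged sketch of this replacement rule. The Peer shape, the MAX_NEIGHBORS value, and the final degree safeguard are illustrative assumptions, not the paper's code:

from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    capacity: float          # messages processed per unit time
    degree: int              # current number of neighbors

MAX_NEIGHBORS = 3            # illustrative; X's limit in the figure

def pick_replacement(neighbors, new):
    """Choose which existing neighbor to drop in favor of `new`, if any.

    Candidates are neighbors with capacity <= the new node's; among them
    the highest-degree one is dropped, so an already poorly connected
    (low-degree) neighbor is never the victim."""
    if len(neighbors) < MAX_NEIGHBORS:
        return None                       # room exists: just add `new`
    candidates = [n for n in neighbors if n.capacity <= new.capacity]
    if not candidates:
        return None                       # `new` is turned away
    victim = max(candidates, key=lambda n: n.degree)
    # Illustrative safeguard: never strand a neighbor that has no other
    # connections left.
    return victim if victim.degree > 1 else None

Under the figure's assumptions (equal capacities, X already full with A, B, and C), Y displaces whichever existing neighbor currently has the most neighbors of its own.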
Token Based Flow Control
• A node may send a query to a neighbor only if the neighbor has allowed it
– The client must hold a token from that neighbor
• Each node hands out tokens to its neighbors periodically
– The token allocation rate is based on the node’s ability to process queries (see the sketch below)
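A minimal sketch of the token scheme as a per-neighbor token bucket. The class shape and the even rate split are illustrative simplifications, not the paper's implementation:

import time

class TokenFlowControl:
    def __init__(self, capacity_qps, neighbors):
        # Split this node's query-processing capacity across its neighbors
        # (an even split is an illustrative simplification).
        self.rate = capacity_qps / max(len(neighbors), 1)
        self.tokens = {n: 0.0 for n in neighbors}
        self.last = time.monotonic()

    def refill(self):
        """Grant tokens to each neighbor at the allowed rate."""
        now = time.monotonic()
        elapsed, self.last = now - self.last, now
        for n in self.tokens:
            self.tokens[n] += self.rate * elapsed

    def accept_query(self, neighbor):
        """Process a neighbor's query only if it spends a token."""
        self.refill()
        if self.tokens[neighbor] >= 1.0:
            self.tokens[neighbor] -= 1.0
            return True
        return False          # the sender must queue the query and wait

Because tokens are minted at the node's own processing rate, no neighbor can push queries faster than the node can handle them.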
One Hop Replication
• Gia nodes maintain an index of their neighbors’ content
– Improves the efficiency of the search process
– Allows nodes to respond to queries on their neighbors’ behalf
• Being “close” to content is useful
– You need not have the requested content yourself, just a pointer to it (see the sketch below)
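A sketch of the one-hop index, with illustrative data structures (not the paper's code):

from collections import defaultdict

class GiaIndex:
    def __init__(self, node_id, local_files):
        self.node_id = node_id
        self.local = set(local_files)
        # file name -> set of neighbor ids that hold it
        self.neighbor_files = defaultdict(set)

    def learn_neighbor(self, neighbor_id, files):
        """On connect, a neighbor shares its file list (one-hop replication)."""
        for f in files:
            self.neighbor_files[f].add(neighbor_id)

    def answer(self, query):
        """Answer with pointers: ourselves and/or neighbors holding the file."""
        hits = []
        if query in self.local:
            hits.append(self.node_id)
        hits.extend(self.neighbor_files.get(query, ()))
        return hits

This is why being “close” to content suffices: the answer a walk collects is a pointer to whichever neighbor actually stores the file.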
Search Protocol
• Based on biased random walks
– A Gia node forwards the query to the highest-capacity neighbor it has tokens for
– It queues the message if no tokens are available for any neighbor
• Uses two mechanisms for control
– A TTL bounds the duration of walks
– A MAX_RESPONSES parameter caps the number of answers a query searches for (see the sketch below)
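A hedged sketch of the biased walk, combining the TTL and MAX_RESPONSES controls. The GiaNode shape is an illustrative assumption, not the paper's code (one-hop answers from the neighbor index are omitted for brevity):

from dataclasses import dataclass, field

@dataclass
class GiaNode:
    name: str
    capacity: float
    files: set = field(default_factory=set)
    neighbors: list = field(default_factory=list)
    tokens: dict = field(default_factory=dict)   # tokens we hold, per neighbor

def biased_walk(node, query, ttl, max_responses, hits=None):
    hits = [] if hits is None else hits
    if query in node.files:
        hits.append(node.name)
    if ttl == 0 or len(hits) >= max_responses:
        return hits                      # walk terminates
    # Bias: highest-capacity neighbor among those we hold tokens for.
    ready = [n for n in node.neighbors if node.tokens.get(n.name, 0) > 0]
    if not ready:
        return hits                      # real Gia queues the query here
    best = max(ready, key=lambda n: n.capacity)
    node.tokens[best.name] -= 1          # spend a token to forward
    return biased_walk(best, query, ttl - 1, max_responses, hits)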
Simulations
• Four basic models
– FLOOD
• Gnutella model
– RWRT
• Random Walks over Random Topologies
• Proposed by Lv et al.
– SUPER
• Classifies some nodes as “supernodes”, based on capacity (> 1000x)
– GIA
• The Gia protocol suite
• Capacity (see the sketch below)
– The number of messages (queries or add/drop requests) a node can process per unit time
– Derived from measured bandwidth distributions from Saroiu et al.
• A fair number of clients have dialup connections
• The majority use cable modems or DSL
• Few have “high-speed” connections
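An illustrative sketch of drawing node capacities from a discrete distribution in the spirit of the paper's (derived from Saroiu et al.'s measurements). The exact levels and probabilities below are assumptions for illustration; consult the paper's table for the real values:

import random

CAPACITY_LEVELS = {      # relative capacity -> assumed fraction of nodes
    1:     0.20,         # dialup-class
    10:    0.45,         # cable-modem / DSL-class majority
    100:   0.30,
    1000:  0.049,
    10000: 0.001,        # "high-speed" hosts are rare
}

def sample_capacity():
    """Pick one node's capacity according to the weights above."""
    levels, weights = zip(*CAPACITY_LEVELS.items())
    return random.choices(levels, weights=weights, k=1)[0]

capacities = [sample_capacity() for _ in range(10_000)]   # N = 10,000 nodes

The heavy skew is the point: a few nodes can process orders of magnitude more messages per unit time than the dialup-class majority, which is what the topology adaptation exploits.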
Performance Metrics
• Collapse Point (CP)
– The per-node query rate beyond which the success rate drops below 90% (see the sketch below)
– Referred to as the “knee” of the curve
• Hop-count before collapse (CP-HP)
– Average hop count prior to collapse
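A tiny sketch of extracting the collapse point from simulation output, i.e. the highest per-node query rate whose success rate is still at least 90%. The (rate, success) pairs are made-up illustrative data:

def collapse_point(measurements):
    """measurements: (query_rate_qps, success_rate in [0, 1]) pairs."""
    ok = [rate for rate, success in measurements if success >= 0.90]
    return max(ok, default=None)

points = [(0.1, 0.99), (1.0, 0.97), (10.0, 0.91), (100.0, 0.40)]
print(collapse_point(points))   # -> 10.0, the knee before collapse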
Performance Comparison
[Figure: collapse point (qps/node, log scale from 0.00001 to 1000) versus replication rate (0.01% to 1%) for GIA, SUPER, RWRT, and FLOOD, each at N = 10,000; GIA’s collapse point sits several orders of magnitude above the others.]
Factor Analysis
• Effects of individual components
– Remove each component from Gia, one at a time
– Add each component to RWRT
– No single component accounts for Gia’s success on its own
Multiple Searches
• CP changes with MAX_RESPONSES
• Replication Factor and MAX_RESPONSES
Robustness
[Figure: collapse point (qps/node, log scale from 0.001 to 1000) versus per-node max-lifetime (10 to 10,000 seconds) under churn, for replication rates of 1.0%, 0.5%, and 0.1%; reference lines mark static SUPER and static RWRT (1% replication).]
Active Replication
• Allow higher-capacity nodes to replicate files
– On-demand replication when a high-capacity node receives a query and download request
• Active replication can increase the capacity of the nodes serving a file by a factor of 38 to 50
Implementation
• Satisfaction Level
– Aggressiveness of Adaptation
– Exponential relationship between satisfaction
level S and adaptation interval I
– Define:
• I = adaptation interval
• S = satisfaction level
• T = maximum interval between adaptation iterations
• K = aggressiveness of adaptation
– Let I = T · K^(-(1-S))
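A worked sketch of the formula I = T · K^(-(1-S)). The values of T and K below are illustrative assumptions, not the paper's settings:

def adaptation_interval(S, T=10.0, K=256.0):
    """Dissatisfied nodes (S near 0) adapt aggressively (small I);
    satisfied nodes (S near 1) adapt rarely (I approaches T)."""
    return T * K ** -(1.0 - S)

for S in (0.0, 0.5, 1.0):
    print(f"S={S:.1f} -> I={adaptation_interval(S):.4f}s")
# S=0.0 -> I = T/K  = 0.0391s  (adapt almost continuously)
# S=0.5 -> I = T/16 = 0.6250s
# S=1.0 -> I = T    = 10.0000s (fully satisfied; adapt at the max interval)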
Satisfaction Level
• Calculating the satisfaction level
– S = 0 initially, and whenever the number of neighbors is below a predefined minimum
– The satisfaction algorithm does the following (see the sketch below):
• Sums the capacity of each neighbor, normalized by that neighbor’s degree
– A high-capacity neighbor with low degree is worth more than a high-capacity, high-degree one
• Divides the total by the node’s own capacity to obtain S
• Returns S = 1 if S > 1 or the number of neighbors exceeds a predefined maximum
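A hedged sketch of this computation. MIN_NBRS and MAX_NBRS are illustrative thresholds, not values from the paper:

MIN_NBRS, MAX_NBRS = 3, 128

def satisfaction(own_capacity, neighbors):
    """neighbors: (capacity, degree) pairs for each current neighbor.

    A high-capacity, low-degree neighbor contributes more because its
    capacity is shared among fewer nodes."""
    if len(neighbors) < MIN_NBRS:
        return 0.0
    if len(neighbors) > MAX_NBRS:
        return 1.0
    total = sum(cap / degree for cap, degree in neighbors)
    return min(total / own_capacity, 1.0)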
Deployment
• PlanetLab
– Wide-area service deployment testbed spanning North America, Europe, Asia, and the South Pacific
– Deployed Gia on 83 clients
– Measured the time to reach “steady state”
Related Work
• KaZaA
– At the time of SIGCOMM 2003, little had been published on KaZaA
– “Understanding KaZaA,” Liang et al., 2004
• CAP
– Cluster based approach to handle scaling in Gnutella
• Based on a central clustering server
• Clusters act as directory servers
• PierSearch
– Published in SIGCOMM 2004
– PIER + Gnutella
• PIER uses a DHT for hard-to-find content and Gnutella for more popular content
• Gnutella2
– Aimed at fixing many of the problems with Gnutella
– Not created by Gnutella founders, causing some
controversy in the community
Conclusion
• Gia proves to be a scalable Gnutella
– 3 to 5 orders of magnitude improvement
• Unstructured systems work well for popular content
– A DHT is not necessary in most cases
• Working implementation on PlanetLab