Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent CS 640 1 Peer-to-Peer Networks Overview • A peer-to-peer (P2P) network allows a community of users to pool their resources – Content – Storage – CPU,… • A P2P network is both decentralized and self-organizing – Just like the Internet itself! • Why do we care about these networks? – It is challenging to achieve decentralization and scalability at the same time. CS 640 2 Gnutella • One of the first decentralized P2P networks for file sharing • No central registry of objects. • Example topology of a Gnutella P2P network • Edges of the graph correspond to the relationship “A and B know each other” CS 640 3 Gnutella cont. • The simple idea in Gnutella is to distribute the method for finding data – Great idea! – Lots of fun architectural possibilities! • Gnutella is a distributed search protocol with a decentralized model – Clients can issue/view query results – Clients can serve/request data – Clients accept queries and respond with matches from their local data store CS 640 4 Gnutella Protocol • Protocol defines method of client communication – Set of descriptors used for communicating data – Set of rules governing inter-client exchange of descriptors • Descriptors – – – – Ping: active discovery of hosts on a network Pong: response to Ping includes client address and metadata Query: Ask for an object Query Response: response to Query includes info necessary to get data • A Gnutella client connects to network by establishing a connection with another client on the network – Finding another client is not part of Gnutella spec. • Host cache services are the typical way this is done CS 640 5 Gnutella Protocol • New client then creates connection to the Gnutella client and thereby becomes part of the network – Gnutella client can reject the connect request – Successful new client can then send/receive descriptors • Pings/pongs are then sent to establish network – No specification as to how much/often to probe – Network data can/is cached • Message routing should be well behaved – Ping/Query descriptors should be sent to all directly connected clients – Pong/Query Response descriptors should be sent back along same path – TTL is mechanism to limit distance • File downloads via HTTP/1.0 protocol via direct connect CS 640 6 Gnutella Protocol • New client then creates connection to the Gnutella client and thereby becomes part of the network – Gnutella client can reject the connect request – Successful new client can then send/receive descriptors • Pings/pongs are then sent to establish network – No specification as to how much/often to probe – Network data can/is cached • Message routing should be well behaved – Ping/Query descriptors should be sent to all directly connected clients – Pong/Query Response descriptors should be sent back along same path – TTL is mechanism to limit distance • File downloads via HTTP/1.0 protocol via direct connect CS 640 7 Gnutella’s Downside • Flooding does not scale well! • Alternatives: – Forward queries randomly or according to probability of success based on past results – Proactively replicate objects to make them easier to find – OR – Structured Overlays CS 640 8 Structured Overlays • Designed to conform to a particular graph structure that allows reliable and efficient object location. • However, with the cost of additional complexity in overlay construction and maintenance. • 2 questions to answer: – How do we map objects to nodes that should serve them? – How do we find an object? CS 640 9 Mapping Objects to Nodes • Hashing is used to map objects to n nodes • Regular hashing: Hash(x,n){ Return x % n } • Challenge: – What if a node joins the network? – What if a node leaves the network? – How do we choose the proper n? CS 640 10 Consistent Hashing • Hash a set of objects x uniformly across a large ID space • Each object is maintained on the node whose ID is closest • Advantages: – Distributes objects fairly evenly across nodes – Only a small number of objects have to move when a node joins or leaves CS 640 11 Consistent Hashing cont. • Example 0 14 Bucket/Node 4 12 8 CS 640 12 How to find an object? • Idea: Route the query message to the node that has the object – Each node maintains a routing table and the IP addresses of a small set of numerically larger and smaller node IDs. – Forward the query message to the node that is closer than you to the destination node – This is repeated until you get to destination CS 640 13 How to find an object: Cont. • Structured Overlays provide a logarithmic bound on the number of routing hops required to locate an object • However, Nodes might be arbitrarily far away from each other in the Internet! • Optimizations: – Route to the physically closest node when possible – Replicate data across the nodes CS 640 14